But how should you retry?
Simple retries
The simplest retry strategy is to wrap your code in a loop that executes it up to a predefined number of times, à la:

```cpp
int retry = 0;
do {
    // Operation: stop as soon as it succeeds
    if (MyFlakyOperation())
        break;
} while (++retry < 6);  // at most 6 attempts
```
The problem with this approach is that it completely ignores the most likely underlying cause of the failure. Congestion or heavy load on the remote end could be causing your calls (and many others) to fail intermittently because the server cannot keep up with incoming requests. In that case, your naive implementation, which retries immediately, may actually be making the situation worse.
So how do we solve this?
Spacing out retries
One common approach to spacing out retries is called exponential backoff. This algorithm uses feedback, such as the retry count, to progressively increase the wait time between repeated executions of the same code so that retries do not add to the congestion.

Figure: Example of exponential spacing based on a 4 sec base wait time. The vertical bars indicate retry points.
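In the variant this post's examples use, the wait before retry $n$ grows as a power of a base interval $b$ (a common alternative instead doubles a fixed initial delay each time):

$$t_n = b^{\,n}\ \text{seconds}, \qquad n = 1, 2, \ldots, N$$

With $b = 4$, the first five retries wait 4, 16, 64, 256 and 1024 seconds.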
Example of backoff retries
Below is an example of a very simple C++ snippet that performs this kind of exponential backoff based on 4 sec intervals:

```cpp
#include <chrono>
#include <cmath>
#include <thread>

int success = OK;
int retry = 0;
do {
    // Operation
    success = MyFlakyOperation();
    // Sleep only if the operation failed and another attempt will follow
    if (success != OK && retry + 1 < 6) {
        // Waits 4, 16, 64, 256 and 1024 seconds
        int sec = static_cast<int>(std::pow(4, retry + 1));
        std::this_thread::sleep_for(std::chrono::seconds(sec));
    }
} while (++retry < 6 && success != OK);
```
In this example, the algorithm's maximum running time with all retries used is a whopping 22 minutes and 44 seconds of waiting alone! (4 + 16 + 64 + 256 + 1024 = 1364 sec)
How much does the waiting time increase?
Care must be taken when choosing the base interval with a naive approach like my example above. Below is a table listing the waiting time in seconds before each retry for base intervals of 2 to 7 seconds. Remember that your maximum running time is the cumulative sum of the waiting times for all retries!
Retry # | Base 2 sec | Base 3 sec | Base 4 sec | Base 5 sec | Base 6 sec | Base 7 sec |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 | 7 |
2 | 4 | 9 | 16 | 25 | 36 | 49 |
3 | 8 | 27 | 64 | 125 | 216 | 343 |
4 | 16 | 81 | 256 | 625 | 1,296 | 2,401 |
5 | 32 | 243 | 1,024 | 3,125 | 7,776 | 16,807 |
6 | 64 | 729 | 4,096 | 15,625 | 46,656 | 117,649 |
7 | 128 | 2,187 | 16,384 | 78,125 | 279,936 | 823,543 |
8 | 256 | 6,561 | 65,536 | 390,625 | 1,679,616 | 5,764,801 |
9 | 512 | 19,683 | 262,144 | 1,953,125 | 10,077,696 | 40,353,607 |
10 | 1,024 | 59,049 | 1,048,576 | 9,765,625 | 60,466,176 | 282,475,249 |
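Because each column is a geometric series, the maximum cumulative wait for base $b$ and $N$ retries has a closed form:

$$T(b, N) = \sum_{n=1}^{N} b^{\,n} = \frac{b\,(b^{N} - 1)}{b - 1}, \qquad b > 1$$

For example, $T(4, 5) = 4(1024 - 1)/3 = 1364$ s, and $T(7, 10) = 7(282{,}475{,}249 - 1)/6 = 329{,}554{,}456$ s, roughly 10.45 years of 365 days.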
* So, using 7 sec as the base and allowing up to 10 retries, the total maximum waiting time would be just shy of 10.5 years!