Peter Bengtsson: Best practice with retries with requests

Datetime:2017-04-20 05:57:23         Topic: Python          Share        Original >>
Here to See The Original Article!!!

tl;dr; I have a lot of code that does response = requests.get(...) in various Python projects. This is nice and simple but the problem is that networks are unreliable. So it's a good idea to wrap these network calls with retries. Here's one such implementation.

The First Hack

import time
import requests

# DON'T ACTUALLY DO THIS. 
# THERE ARE BETTER WAYS. HANG ON!

def get(url):
    try:
        return requests.get(url)
    except Exception:
        # sleep for a bit in case that helps
        time.sleep(1)
        # try again
        return get(url)

This, above, is a terrible solution. It might fail for sooo many reasons. For example SSL errors due to missing Python libraries. Or the URL might have a typo in it, like get('http:/www.example.com') .

Also, perhaps it did work but the response is a 500 error from the server and you know that if you just tried again, the problem would go away.

# ALSO A TERRIBLE SOLUTION

while True:
    response = get('http://www.example.com')
    if response.status_code != 500:
        break
    else:
        # Hope it won't 500 a little later
        time.sleep(1)

What we need is a solution that does this right. Both for 500 errors and for various network errors.

The Solution

Here's what I propose:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

Usage example...

response = requests_retry_session().get('https://www.peterbe.com/')
print(response.status_code)

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

response = requests_retry_session(sesssion=s).get(
    'https://www.peterbe.com'
)

It's an opinionated solution but by its existence it demonstrates how it works so you can copy and modify it.

Testing The Solution

Suppose you try to connect to a URL that will definitely never work, like this:

t0 = time.time()
try:
    response = requests_retry_session().get(
        'http://localhost:9999',
    )
except Exception as x:
    print('It failed :(', x.__class__.__name__)
else:
    print('It eventually worked', response.status_code)
finally:
    t1 = time.time()
    print('Took', t1 - t0, 'seconds')

There is no server running in :9999 here on localhost . So the outcome of this is...

It failed :( ConnectionError
Took 1.8215010166168213 seconds

Where...

1.8 = 0 + 0.6 + 1.2

The algorithm for that backoff is documented here and it says:

A backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay). urllib3 will sleep for: {backoff factor} * (2 ^ ({number of total retries} - 1)) seconds. If the backoff_factor is 0.1, then sleep() will sleep for [0.0s, 0.2s, 0.4s, ...] between retries. It will never be longer than Retry.BACKOFF_MAX. By default, backoff is disabled (set to 0).

It does 3 retry attempts, after the first failure, with a backoff sleep escalation of: 0.6s, 1.2s.

So if the server never responds at all, after a total of ~1.8 seconds it will raise an error:

In this example, the simulation is matching the expectations (1.82 seconds) because my laptop's DNS lookup is near instant for localhost . If it had to do a DNS lookup, it'd potentially be slightly more on the first failure.

Works In Conjunction With timeout

Timeout configuration is not something you set up in the session. It's done on a per-request basis. httpbin makes this easy to test. With a sleep delay of 10 seconds it will never work (with a timeout of 5 seconds) but it does use the timeout this time. Same code as above but with a 5 second timeout:

t0 = time.time()
try:
    response = requests_retry_session().get(
        'http://httpbin.org/delay/10',
        timeout=5
    )
except Exception as x:
    print('It failed :(', x.__class__.__name__)
else:
    print('It eventually worked', response.status_code)
finally:
    t1 = time.time()
    print('Took', t1 - t0, 'seconds')

And the output of this is:

It failed :( ConnectionError
Took 21.829053163528442 seconds

That makes sense. Same backoff algorithm as before but now with 5 seconds for each attempt:

21.8 = 5 + 0 + 5 + 0.6 + 5 + 1.2 + 5

Works For 500ish Errors Too

This time, let's run into a 500 error:

t0 = time.time()
try:
    response = requests_retry_session().get(
        'http://httpbin.org/status/500',
    )
except Exception as x:
    print('It failed :(', x.__class__.__name__)
else:
    print('It eventually worked', response.status_code)
finally:
    t1 = time.time()
    print('Took', t1 - t0, 'seconds')

The output becomes:

It failed :( RetryError
Took 2.353440046310425 seconds

Here, the reason the total time is 2.35 seconds and not the expected 1.8 is because there's a delay between my laptop and httpbin.org . I tested with a local Flask server to do the same thing and then it took a total of 1.8 seconds.

Discussion

Yes, this suggested implementation is very opinionated. But when you've understood how it works, understood your choices and have the documentation at hand you can easily implement your own solution.

Personally, I'm trying to replace all my requests.get(...) with requests_retry_session().get(...) and when I'm making this change I make sure I set a timeout on the .get() too.

The choice to consider a 500, 502 and 504 errors "retry'able" is actually very arbitrary. It totally depends on what kind of service you're reaching for. Some services only return 500'ish errors if something really is broken and is likely to stay like that for a long time. But this day and age, with load balancers protecting a cluster of web heads, a lot of 500 errors are just temporary. Obivously, if you're trying to do something very specific like requests_retry_session().post(...) with very specific parameters you probably don't want to retry on 5xx errors.








New