
Requests are not retried when received body length is shorter than Content-Length #6512

Open
zweger opened this issue Aug 23, 2023 · 5 comments


@zweger

zweger commented Aug 23, 2023

When a server sends fewer bytes than indicated by Content-Length, requests raises a ChunkedEncodingError instead of retrying the request.

urllib3 supports retrying requests in this situation when preload_content=True. When a user specifies stream=True, all bets are naturally off: the response cannot be preloaded, so the request cannot be retried. However, even when stream=False, requests still does not preload the response, so urllib3's retry mechanism is bypassed.
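The urllib3-level retry described above can be demonstrated end to end. The sketch below (flaky_server is a made-up helper; the behavior assumes urllib3 2.x) serves a truncated body on the first connection and a complete one on the second. With preload_content left at its default of True, the short read surfaces inside urlopen(), where the Retry machinery can see it:

```python
import socket
import threading

import urllib3
from urllib3.util.retry import Retry

DATA = b"hello, retried world!"

def flaky_server():
    """Serve two connections: truncate the first response, complete the second."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    srv.listen(2)

    def handle():
        for cut in (5, len(DATA)):  # send 5 of len(DATA) bytes, then all of them
            conn, _ = srv.accept()
            conn.recv(65536)  # read (and discard) the request
            header = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(DATA)
            conn.sendall(header + DATA[:cut])
            conn.close()
        srv.close()

    threading.Thread(target=handle, daemon=True).start()
    return srv.getsockname()

host, port = flaky_server()
http = urllib3.PoolManager()
# preload_content=True is the default: the body is read inside urlopen(),
# so the IncompleteRead is caught there and the request is retried.
resp = http.request("GET", f"http://{host}:{port}/test", retries=Retry(read=2))
print(resp.status, resp.data)
```

The same request made through requests with stream=False never reaches this code path, which is the bug being reported.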


As background to this issue: I've been investigating rare failures in my CI builds during pip install, and I believe this issue is the proximate cause. pip makes requests to PyPI with stream=False and retries configured, but still fails.

In the current version of pip (which bundles an out-of-date urllib3), pip falls victim to #4956 and fails to parse the PyPI metadata with a JSONDecodeError. Upgrading pip's bundled urllib3 results in a ChunkedEncodingError as below.

Expected Result

The request is retried according to the specified retry policy.

Actual Result

requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(10 bytes read, 26227 more expected)', IncompleteRead(10 bytes read, 26227 more expected))

Because the response is not preloaded, urllib3 cannot retry the request, and requests has no retry functionality of its own.

Reproduction Steps

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount("http://", HTTPAdapter(max_retries=5))

r = s.get('http://127.0.0.1:5000/test', stream=False)

I'm using an intentionally broken local server for testing. See here for an example.

System Information

$ python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "3.2.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.4"
  },
  "platform": {
    "release": "6.4.11-100.fc37.x86_64",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.31.0"
  },
  "system_ssl": {
    "version": "30000090"
  },
  "urllib3": {
    "version": "2.0.4"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

Proposed Patch

I have a proposed patch which I believe fixes this problem. Unfortunately, it breaks a number of tests (and probably backwards compatibility as well; in particular, it causes requests to start leaking urllib3 exceptions). On the off chance it's useful in coming up with a proper fix, here it is:

diff --git a/src/requests/adapters.py b/src/requests/adapters.py
index eb240fa9..ce01c2a5 100644
--- a/src/requests/adapters.py
+++ b/src/requests/adapters.py
@@ -489,8 +489,8 @@ class HTTPAdapter(BaseAdapter):
                 headers=request.headers,
                 redirect=False,
                 assert_same_host=False,
-                preload_content=False,
-                decode_content=False,
+                preload_content=not stream,
+                decode_content=not stream,
                 retries=self.max_retries,
                 timeout=timeout,
                 chunked=chunked,
diff --git a/src/requests/models.py b/src/requests/models.py
index 44556394..f43f1bf8 100644
--- a/src/requests/models.py
+++ b/src/requests/models.py
@@ -893,6 +893,8 @@ class Response:
 
             if self.status_code == 0 or self.raw is None:
                 self._content = None
+            elif getattr(self.raw, "data", None) is not None:
+                self._content = self.raw.data
             else:
                 self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
 
diff --git a/tests/test_lowlevel.py b/tests/test_lowlevel.py
index 859d07e8..39a1175e 100644
--- a/tests/test_lowlevel.py
+++ b/tests/test_lowlevel.py
@@ -4,6 +4,7 @@ import pytest
 from tests.testserver.server import Server, consume_socket_content
 
 import requests
+from requests.adapters import HTTPAdapter
 from requests.compat import JSONDecodeError
 
 from .utils import override_environ
@@ -426,3 +427,33 @@ def test_json_decode_compatibility_for_alt_utf_encodings():
     assert isinstance(excinfo.value, requests.exceptions.RequestException)
     assert isinstance(excinfo.value, JSONDecodeError)
     assert r.text not in str(excinfo.value)
+
+
+def test_retry_truncated_response():
+    data = b"truncated before retry"
+    response_lengths = [len(data), 9]
+
+    def retry_handler(sock):
+        request_content = consume_socket_content(sock, timeout=0.5)
+
+        response = (
+            b"HTTP/1.1 200 OK\r\n"
+            b"Content-Length: %d\r\n\r\n"
+            b"%s"
+        ) % (len(data), data[:response_lengths.pop()])
+        sock.send(response)
+
+        return request_content
+
+    close_server = threading.Event()
+    server = Server(retry_handler, wait_to_close_event=close_server, requests_to_handle=2)
+
+    s = requests.Session()
+    s.mount("http://", HTTPAdapter(max_retries=2))
+
+    with server as (host, port):
+        url = f"http://{host}:{port}/"
+        r = s.get(url, stream=False)
+        assert r.status_code == 200
+        assert r.content == data
+        close_server.set()
@alain-khalil

alain-khalil commented Oct 17, 2023

I really want to work on this issue, but I can't reproduce the bug. For me, ChunkedEncodingError is not raised when the server sends fewer bytes than indicated by Content-Length. Here is my test case:

import requests
from requests.exceptions import ChunkedEncodingError

def test_chunked_encoding_error():
    # Create a mock server response with an incorrect Content-Length header
    response = requests.Response()
    response.status_code = 200
    response.headers['Content-Length'] = '10000000'  # Set an incorrect length

    # Simulate the content stream
    def mock_stream():
        yield b'Chunk 1'
        yield b'Chunk 2'
        yield b'Chunk 3'

    response._content = b''.join(mock_stream())

    # Attempt to read the content
    try:
        response.content
    except ChunkedEncodingError:
        print("Failed with ChunkedEncodingError")
    else:
        assert False, "Expected a ChunkedEncodingError"

if __name__ == "__main__":
    test_chunked_encoding_error()

@nerdvegas

Hi, I'm having the same problem, with urllib3 v2.0.6 and requests v2.31.0.

Specifically I am getting this when communicating with a gitlab server:
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 2 more expected)', IncompleteRead(0 bytes read, 2 more expected))

When I pin urllib3 back to v1.26.16, the problem goes away.

@zweger
Author

zweger commented Oct 20, 2023

Hi @alain-khalil, thanks for taking a look at this issue. The test case included in my proposed patch in the issue description is sufficient to reproduce it: requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(9 bytes read, 13 more expected)', IncompleteRead(9 bytes read, 13 more expected)).

I was originally testing on commit 2ee5b0b but can confirm the issue still reproduces on main (839a8ed). I think the way you're setting response._content in your test case is too dissimilar to how requests works in normal operation to trigger the bug.

@nerdvegas, there were some changes in urllib3 2.x (urllib3/urllib3#2514) which cause urllib3 to (correctly) raise an error when the read comes up short. The previous behavior was to return the truncated data but (incorrectly) not raise an error.
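Until requests can retry this itself, one stopgap is to retry at the application level on ChunkedEncodingError. A rough sketch (get_with_retries is a hypothetical helper, not part of the requests API):

```python
import requests
from requests.exceptions import ChunkedEncodingError

def get_with_retries(session, url, attempts=5, **kwargs):
    """Retry truncated responses at the requests level, since urllib3's
    read retries are bypassed when the body is not preloaded."""
    for attempt in range(attempts):
        try:
            return session.get(url, **kwargs)
        except ChunkedEncodingError:
            # Re-raise on the final attempt; otherwise try again.
            if attempt == attempts - 1:
                raise
```

Anything with a .get method works here, so it composes with a Session that already has an HTTPAdapter mounted for connect- and status-level retries.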

@georgruetsche

Maybe try another approach, like:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_strategy = Retry(
    total=5,
    backoff_factor=0.1,
    status_forcelist=[500, 502, 503, 504],
    allowed_methods=["GET"],  # renamed from method_whitelist; the old name was removed in urllib3 2.0
)

adapter = HTTPAdapter(max_retries=retry_strategy)

s = requests.Session()
s.mount("http://", adapter)

r = s.get('http://127.0.0.1:5000/test', stream=False)

print(r)

@jameschristopher
Copy link

@zweger did you ever find a workaround for this?
