fix: prevent raw decoder from closing before sync completion #366
Conversation
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1)

Lines 154-156: Good improvement to resource management!

This change properly addresses the premature closing issue by disabling auto_close and explicitly closing the response.raw stream after all parsing is complete. This should prevent the ValueError about I/O operations on closed files during sync operations.

Have you considered adding a try-finally block to ensure the raw stream always gets closed even if parsing raises an exception? Something like:

```diff
 if self.is_stream_response():
     response.raw.auto_close = False
-    yield from self.parser.parse(data=response.raw)  # type: ignore[arg-type]
-    response.raw.close()
+    try:
+        yield from self.parser.parse(data=response.raw)  # type: ignore[arg-type]
+    finally:
+        response.raw.close()
```

This would ensure proper cleanup even in error scenarios. Wdyt?
@maxi297 to add a test on that
/autofix
Actionable comments posted: 0
🔭 Outside diff range comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

Lines 1-258: ⚠️ Potential issue: Fix Ruff formatting issues in this file.

The pipeline is reporting Ruff formatting issues that need to be addressed:

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Run 'ruff format' to fix code style issues in this file.

Run `ruff format` on this file to fix the formatting issues and ensure the CI pipeline passes.
🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

Lines 215-226: Consider making the test more robust against port conflicts, wdyt?

The test correctly sets up a real HTTP server to verify the decoder doesn't prematurely close the response stream. However, using a hardcoded port (8080) might cause test failures if that port is already in use on the CI system or developer machine.

Could you consider using port 0 to let the OS assign an available port, then retrieve the assigned port?

```diff
-httpd = HTTPServer(('localhost', 8080), TestServer)
+httpd = HTTPServer(('localhost', 0), TestServer)
+port = httpd.server_port
 thread = Thread(target=httpd.serve_forever, args=())
 thread.start()
-response = requests.get("http://localhost:8080", stream=True)
+response = requests.get(f"http://localhost:{port}", stream=True)
```
🔇 Additional comments (3)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (3)

Lines 7-9: New imports support the HTTP server test implementation, looks good.

The addition of the BaseHTTPRequestHandler, HTTPServer, and Thread imports supports the new test case that verifies the fix for premature raw decoder closing.

Lines 207-213: Simple test server implementation that serves CSV data, looks good!

This TestServer class provides a minimal HTTP server that returns a simple CSV response. It's a good approach for testing the decoder with a real HTTP response stream rather than just using mocks.
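For readers without the diff at hand, a minimal sketch of what such a handler might look like; the class body and the exact payload are assumptions, since the review excerpts only confirm a handler serving a short CSV without a trailing newline:

```python
from http.server import BaseHTTPRequestHandler


class TestServer(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.end_headers()
        # Intentionally no trailing newline, to reproduce the reported edge case.
        self.wfile.write(b"col1,col2\nval1,val2")
```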
Lines 215-226: This test directly addresses the PR objective - good work!

This new test case directly verifies the fix for the issue described in the PR: "prevent raw decoder from closing before sync completion". By setting up a real HTTP server and response stream, it tests the actual behavior of the CompositeRawDecoder with CSV data in a way that would reveal if the file is closed prematurely.

The test is simple but focused on the specific issue, which is exactly what was needed based on the reviewer's comment requesting a test.
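Putting the review threads together, the final test likely ends up shaped roughly like this. This is a sketch only: the test name, dynamic port, join timeout, and import paths are assumptions drawn from the suggestions in this conversation, and it reuses the TestServer sketched above.

```python
from http.server import HTTPServer
from threading import Thread

import requests

from airbyte_cdk.sources.declarative.decoders.composite_raw_decoder import (
    CompositeRawDecoder,
    CsvParser,
)


def test_csv_decoder_with_real_http_stream() -> None:
    # Port 0 lets the OS pick a free port, avoiding conflicts on CI machines.
    httpd = HTTPServer(("localhost", 0), TestServer)
    port = httpd.server_port
    thread = Thread(target=httpd.serve_forever)
    thread.start()
    try:
        response = requests.get(f"http://localhost:{port}", stream=True)
        # Fully consuming the generator would raise
        # "ValueError: I/O operation on closed file" before the fix.
        result = list(CompositeRawDecoder(parser=CsvParser()).decode(response))
        assert len(result) == 1
    finally:
        httpd.shutdown()
        thread.join(timeout=5)
```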
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

Lines 214-228: Good test for reproducing the closed file issue.

This test does a great job at simulating the real-world scenario where ValueError: I/O operation on closed file was occurring. Using a real HTTP server instead of mocks is an excellent approach to catch the actual behavior.

A couple of suggestions to make this even more robust:
- Would it be helpful to use a random port instead of hardcoded 8080 to avoid potential conflicts during testing? Use httpd = HTTPServer(('localhost', 0), TestServer) and then get the port with port = httpd.server_port.
- Maybe add an assertion on the actual content of the parsed result to ensure data integrity? Something like assert result[0] == {'col1': 'val1', 'col2': 'val2'}.

What do you think? Particularly for CI environments where port conflicts might arise.

Lines 227-228: Clean server shutdown implementation.

Good job on the proper cleanup with both httpd.shutdown() and thread.join(). This ensures the test doesn't leave hanging threads or ports in use.

You might consider adding a timeout to thread.join() to prevent tests from hanging if the server doesn't shut down properly, wdyt?

```diff
-thread.join()
+thread.join(timeout=5)  # Add timeout to prevent hanging tests
```
🔇 Additional comments (4)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (4)

Lines 207-212: Well-structured test server implementation.

Nice implementation of the TestServer class. It's simple and focused on just what's needed to test the CSV parsing functionality.

Lines 215-217: Clear docstring explaining the test's purpose.

Great job clearly documenting the issue this test is addressing. The docstring effectively explains why this test is needed beyond the existing mocked tests.

Lines 219-221: Thread management looks good.

The server setup and thread management approach is solid. You're properly creating the server and starting it in a background thread.

Lines 223-226: Core test functionality looks good.

This is testing the key functionality: ensuring that the CSV parser can properly process a streamed response without encountering closed file errors.
During sync or during the test_read? I can imagine that when we use … @ChristoGrab, could you share the link with this error in sync during the read?
UPD: adding \n to Maxime's test allows the parser to work even without closing the response.
LGTM!

Approving to unblock; we might need to introduce another buffer in the CSV parser in order to handle and check for \n at the end of the line (one possible shape is sketched below).
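Purely a hypothetical sketch of that idea, not something this PR implements: wrap the incoming lines in a generator that buffers one line at a time and guarantees the final line is newline-terminated before the csv module consumes it.

```python
import csv
from typing import Iterable, Iterator


def ensure_trailing_newline(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines unchanged, buffering one line so a missing final newline can be appended."""
    previous = None
    for line in lines:
        if previous is not None:
            yield previous
        previous = line
    if previous is not None:
        yield previous if previous.endswith("\n") else previous + "\n"


# csv readers accept any iterable of strings, so usage could look like:
# reader = csv.DictReader(ensure_trailing_newline(text_stream))
```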
Actionable comments posted: 1
🔭 Outside diff range comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

Lines 1-267: 💡 Verification agent, 🧩 Analysis chain

Fix formatting issues flagged by the pipeline.

The pipeline reports that the Ruff formatting check failed. Please run `ruff format` to fix the code style issues in this file.

🏁 Script executed:

```bash
#!/bin/bash
# Run formatter on the file to see what would change
ruff format --diff unit_tests/sources/declarative/decoders/test_composite_decoder.py
```

Length of output: 837

Action: Please run `ruff format` to adjust the file's formatting.

The diff from the formatter indicates that one file (specifically, around line 222 in unit_tests/sources/declarative/decoders/test_composite_decoder.py) would be modified by removing an extra newline. Could you run `ruff format` on this file to fix the style issues as flagged by the pipeline? wdyt?
🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

Lines 207-212: Simple server implementation looks good, but could potentially be enhanced.

The TestServer class implements a minimal HTTP server that returns a CSV string without a trailing newline character, which appears intentional based on the test's docstring. This is a good approach for reproducing the specific issue.

Would you consider adding some basic error handling or logging to make debugging easier if the test fails? Just a thought - this is fine as is for the specific use case.
🔇 Additional comments (3)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (3)

Lines 7-9: Appropriate imports for the new test functionality.

The added imports for HTTP server functionality and threading are well-aligned with the new test requirements. These are necessary to create a real HTTP server for testing the CSV parser without mocked responses.

Lines 214-224: Well-documented test case explaining the issue being fixed.

Excellent documentation explaining the rationale for this test and the specific conditions needed to reproduce the issue. This context is very valuable for future maintainers.

The commented-out CSV content is useful for reference, though I wonder if we might want to try reproducing with that exact content in a future test?

Lines 227-229: Consider using a dynamic port instead of hardcoded 8080.

Using port 8080 hardcoded could cause test failures if that port is already in use by another process. This could lead to intermittent CI failures that are hard to debug.

Would you consider using a dynamic port allocation approach instead? Maybe something like:

```python
with socket.socket() as s:
    s.bind(('localhost', 0))
    port = s.getsockname()[1]
httpd = HTTPServer(("localhost", port), TestServer)
```

This would help prevent port conflicts, wdyt?
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
/autofix
Actionable comments posted: 2
🧹 Nitpick comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

Lines 231-234: Consider adding more assertions to validate CSV parsing.

The test currently only verifies that the result has a length of 1, but it doesn't check the actual content of the parsed CSV.

Would it be helpful to add more specific assertions to verify the parsed data structure? Perhaps:

```diff
 result = list(CompositeRawDecoder(parser=CsvParser()).decode(response))
 assert len(result) == 1
+assert result[0] == {'col1': 'val1', 'col2': 'val2'}
```

This would ensure not just that some data was parsed, but that it was parsed correctly. wdyt?
Lines 225-237: ❓ Verification inconclusive

Consider adding a port finder to avoid conflicts.

Using a hardcoded port (8080) might lead to test failures if that port is already in use on the test system.

Would you consider implementing a dynamic port allocation? Something like:

```diff
-httpd = HTTPServer(("localhost", 8080), TestServer)
+def find_free_port():
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.bind(('', 0))
+        return s.getsockname()[1]
+
+port = find_free_port()
+httpd = HTTPServer(("localhost", port), TestServer)
```

This would make the test more reliable across different environments. What do you think?
🌐 Web query: How to find a free port in Python for HTTP server testing?

💡 Result:

To find a free port in Python for HTTP server testing, there are several reliable methods depending on your use case:

1. Using sockets with OS-assigned ports

The most common method involves creating a socket and letting the OS assign a free port automatically via bind(('', 0)):

```python
import socket

def get_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))
        return s.getsockname()[1]
```
- Advantages: Simple, returns an available port immediately[1][4]
- Limitations: Port may be reclaimed by the OS after socket closure, creating a race condition[1][4]
- Solution: Use the socket before closing it to reserve the port[1][7]
2. Atomic port allocation for servers

For HTTP servers, create and bind the socket first, then pass it to the server:

```python
from http.server import HTTPServer

# Create and bind socket first
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('', 0))
port = sock.getsockname()[1]

# Create server using existing socket
server = HTTPServer(('', port), YourHandlerClass, bind_and_activate=False)
server.socket = sock
server.server_bind = lambda: None  # Disable default binding
server.server_activate()
```

This prevents race conditions by keeping the port reserved[1][7].
3. Using testing frameworks

For pytest integration, pytest-httpserver automatically handles port allocation:

```python
def test_server(httpserver):
    httpserver.expect_request("/test").respond_with_data("OK")
    # Access via httpserver.port
    assert requests.get(httpserver.url_for("/test")).status_code == 200
```
- Automatically selects and reserves a port[5][10]
- No manual port management required[10]
4. Built-in Python HTTP server

When using python -m http.server, specify 0 to get an OS-assigned port:

```
$ python -m http.server 0
Serving HTTP on 0.0.0.0 port 12345  # Random port shown here
```
- Port is printed after server starts[8][9]
- No race condition since server keeps port occupied[9]
| Method | Best Use Case | Race Condition Risk |
| --- | --- | --- |
| Socket binding | Quick port checks | High |
| Atomic server setup | Production-grade servers | None |
| pytest-httpserver | Test environments | None |
| Built-in HTTP server | Manual testing | None |

For testing scenarios, Method 3 (pytest-httpserver) is recommended as it handles port management automatically. For manual server creation, Method 2 provides the safest approach by reserving the port before server initialization[1][5][9].
Citations:
- 1: https://gist.github.com/gabrielfalcao/20e567e188f588b65ba2
- 2: https://web-platform-tests.org/running-tests/from-local-system.html
- 3: https://gist.github.com/betrcode/0248f0fda894013382d7
- 4: https://www.scivision.dev/get-available-port-for-shell-script/
- 5: https://pytest-httpserver.readthedocs.io
- 6: https://www.redhat.com/en/blog/test-tcp-python-scapy
- 7: https://python-forum.io/thread-11375.html
- 8: https://developer.mozilla.org/en-US/docs/Learn_web_development/Howto/Tools_and_setup/set_up_a_local_testing_server
- 9: https://realpython.com/python-http-server/
- 10: https://dev.to/bowmanjd/two-methods-for-testing-https-api-calls-with-python-and-pytest-and-also-communicating-with-the-in-laws-388n
Dynamic Port Allocation Suggestion

Currently, the test uses a hardcoded port (8080) which could lead to conflicts if that port is already in use on the test system. Would you consider implementing a dynamic port finder to improve reliability? For example, you might replace the fixed port with something like:

```diff
-httpd = HTTPServer(("localhost", 8080), TestServer)
+def find_free_port():
+    import socket
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.bind(('', 0))
+        return s.getsockname()[1]
+
+port = find_free_port()
+httpd = HTTPServer(("localhost", port), TestServer)
```

This approach, as verified by current best practices, should help avoid potential test failures. Alternatively, if you're concerned about race conditions (i.e., the port might be reclaimed before the server starts), you might consider binding the socket first and passing it to the server for an atomic allocation. What do you think?
🔇 Additional comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

Lines 207-212: Clean implementation of the test server.

The TestServer class is well-structured and focused on its single responsibility - providing a simple CSV response. Nice job!

Lines 214-224: Great documentation on the test case background and motivation.

The comprehensive docstring really helps explain why this test is necessary and the history of the issue. This context is invaluable for future maintainers.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

Line 226: Consider implementing dynamic port allocation.

The hardcoded port (8080) might cause test failures if that port is already in use by another process. Would you consider implementing a utility to find an available port dynamically? This would make the test more robust in CI environments and when running locally.

```diff
-httpd = HTTPServer(("localhost", 8080), TestServer)
+import socket
+
+def find_available_port():
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.bind(('localhost', 0))
+        return s.getsockname()[1]
+
+port = find_available_port()
+httpd = HTTPServer(("localhost", port), TestServer)
```

wdyt?
🔇 Additional comments (5)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (5)

Lines 7-9: Well-chosen imports for the new test.

The addition of the BaseHTTPRequestHandler, HTTPServer, and Thread imports is appropriate for implementing the local HTTP server test. This approach allows for testing with actual HTTP responses rather than mocks.

Lines 207-212: Nice implementation of the TestServer class.

The TestServer implementation is focused and concise. I notice you're intentionally providing a CSV without a trailing newline character to reproduce the specific issue mentioned in the test docstring.

Lines 214-224: Excellent detailed docstring explaining the test purpose and issue context.

The docstring thoroughly explains the context, the problem being solved, and the reasoning behind the specific test implementation. This kind of documentation is invaluable for future maintainers who might need to understand why this test exists.

Lines 228-236: Great error handling and resource cleanup.

I appreciate the proper thread management with try-finally and thread.join(). This ensures resources are properly cleaned up even if an exception occurs during the test.

Lines 230-231: Ensure consistent port reference.

If you implement dynamic port allocation as suggested earlier, don't forget to update the port used in the request URL as well.

```diff
-response = requests.get(f"http://localhost:8080", stream=True)
+response = requests.get(f"http://localhost:{port}", stream=True)
```

wdyt?
Actionable comments posted: 0
🔭 Outside diff range comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

Line 1: 💡 Verification agent, 🧩 Analysis chain

Fix formatting issue flagged by pipeline.

The CI pipeline indicates there's a formatting issue: Ruff formatting check failed. 1 file would be reformatted.

Would you consider running `ruff format` on this file to fix the style issues?

🏁 Script executed:

```bash
#!/bin/bash
# Check what formatting changes ruff would make
ruff format --diff unit_tests/sources/declarative/decoders/test_composite_decoder.py
```

Length of output: 581

Please run `ruff format` on this file to update the quote style, wdyt?

I verified the formatting diff and it shows that the only change required is to switch the quotes in the s.bind call from single quotes to double quotes:

- Before: s.bind(('localhost', 0))
- After: s.bind(("localhost", 0))

Could you run `ruff format unit_tests/sources/declarative/decoders/test_composite_decoder.py` so that the file meets the CI formatting standards?
🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

Lines 232-244: Well-structured test with proper resource cleanup.

The test correctly:
- Finds a dynamic port
- Starts the server in a thread
- Uses try-finally to ensure cleanup
- Joins the thread with a timeout

This addresses previous review feedback about thread cleanup and error handling. Good job!

One small suggestion though: would you consider adding a small sleep after starting the thread to ensure the server is fully up before making the request? This might prevent occasional flaky test failures, wdyt?

```diff
 thread.start()
+import time
+time.sleep(0.1)  # Short delay to ensure server is ready
 try:
```
🔇 Additional comments (4)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (4)

Lines 7-10: Good addition of necessary imports for the server-based test.

These imports support the new HTTP server implementation and dynamic port finding, both needed for testing with a real server rather than mocked responses.

Lines 26-29: Nice implementation of dynamic port finding!

This is a clean solution for finding an available port dynamically, which prevents potential port conflicts during test runs. The type ignore comment is also properly used.

Lines 214-219: Simple yet effective test server implementation.

The test server correctly serves a minimal CSV payload. I notice the CSV intentionally doesn't end with a newline character, which helps reproduce the specific edge case you're testing for.

Lines 221-231: Thorough docstring explaining the test purpose and reproduction steps.

The docstring clearly explains why this test is needed and the specific issue it's designed to reproduce. The commented-out data sample is helpful for future reference.
/autofix
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

Line 228: Minor wording issue in docstring.

Looks like there's a small typo in the docstring: "this is the only we were able to reproduce locally" seems to be missing a word. Did you mean "the only way" perhaps? wdyt?

Line 230: Consider externalizing the commented test data.

This commented line contains a large CSV string that might make the test harder to read. Would you consider moving it to a constant at the top of the file or to a separate fixture file to improve readability? wdyt?
🔇 Additional comments (4)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (4)

Lines 26-29: Nice implementation of port finder!

This is a clean and effective way to dynamically find an available port for testing. Using a context manager ensures proper socket cleanup.

Lines 214-219: Simple and effective test server implementation!

The test server implementation is concise and serves its purpose well. It correctly responds with a minimal CSV payload that helps reproduce the issue.

Lines 237-241: Excellent use of try/finally for cleanup!

Great job implementing the try/finally pattern to ensure proper resource cleanup. This addresses the previous review comment about thread cleanup and error handling.

Lines 232-244: Consider adding a server readiness check.

The test starts a server in a separate thread but immediately makes a request without ensuring the server is ready. In rare cases, this might lead to intermittent test failures.

Would you consider adding a simple readiness check before making the request? Something like:

```diff
 thread.start()
+# Ensure server is ready before making request
+def is_server_ready():
+    try:
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            return s.connect_ex(('localhost', port)) == 0
+    except:
+        return False
+
+# Wait for server to be ready with timeout
+import time
+start_time = time.time()
+while not is_server_ready() and time.time() - start_time < 3:
+    time.sleep(0.1)
+
 try:
     response = requests.get(f"http://localhost:{port}", stream=True)
```

This would make the test more robust against timing issues, wdyt?
What
We've run into an issue with the CsvDecoder closing files prematurely during syncs, leading to crashes when syncing and/or setting up sources.

The problem seems to be that the connection is automatically closed once the last chunk is being processed, preventing the process from completing gracefully. By explicitly closing the connection AFTER we have yielded the data, we can ensure that the connection remains open until all data has been parsed and read.
Note that not only csv fails with this error but pandas does as well, which seems to indicate an issue with how urllib3 handles the closing of IO rather than an issue with the Python csv standard lib.
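Concretely, the streaming branch of the decoder ends up looking roughly like this. This is a minimal sketch based on the diff quoted in the review above; the surrounding class scaffolding, method signature, and the non-streaming branch are assumptions:

```python
import io
from typing import Any, Generator, MutableMapping

import requests


class CompositeRawDecoder:
    # ... parser, is_stream_response(), etc. elided in this sketch ...

    def decode(
        self, response: requests.Response
    ) -> Generator[MutableMapping[str, Any], None, None]:
        if self.is_stream_response():
            # Keep urllib3 from auto-closing the raw stream after the last
            # byte is read; close it ourselves once parsing has completed.
            response.raw.auto_close = False
            yield from self.parser.parse(data=response.raw)  # type: ignore[arg-type]
            response.raw.close()
        else:
            # Non-streaming branch (an assumption in this sketch): parse the
            # fully buffered response body.
            yield from self.parser.parse(data=io.BytesIO(response.content))
```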
Note

From the urllib3 docs:

> Sometimes you want to use io.TextIOWrapper or similar objects like a CSV reader directly with HTTPResponse data. Making these two interfaces play nice together requires using the auto_close attribute by setting it to False. By default HTTP responses are closed after reading all bytes; this disables that behavior:
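A short illustration of that documented pattern, adapted here for a CSV reader (the URL is a placeholder):

```python
import csv
import io

import urllib3

http = urllib3.PoolManager()
resp = http.request("GET", "https://example.com/data.csv", preload_content=False)
resp.auto_close = False  # keep the stream usable after the last byte is read

for row in csv.reader(io.TextIOWrapper(resp)):
    print(row)

resp.release_conn()  # return the connection to the pool when done
```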
Summary by CodeRabbit

Bug Fixes

New Features

- CompositeRawDecoder