Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add ClpKeyValuePairStreamHandler which supports logging dictionary-type log events into CLP's key-value pair IR format. #46

Merged
merged 26 commits into from
Feb 17, 2025

Conversation

LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Nov 30, 2024

Description

This PR adds ClpKeyValuePairStreamHandler to support serializing key-value pair log events using the latest CLP IR format. It implements everything that CLPStreamHandler supports, except that the log event given by users is a Python dictionary but not a text string:

This PR also adds new unit test cases to test ClpKevValuePairStreamHandler's functionalities:

  • Ensure the serialized log events can be successfully deserialized and the results match the expected results.

NOTE: due to the limitation of the current unit test files, the current unit test cases duplicate some existing code, as explained in #47.

Validation performed

  • Ensure the newly added unit tests passed locally
  • Ensure workflows are all passed

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a new logging handler for processing key-value pairs, enhancing structured logging capabilities.
    • Added functionality for generating auto-generated key-value pairs for logging events.
    • Added a new Timestamp class for improved time handling in logs.
    • Updated dependency version for improved version control.
  • Tests

    • Expanded test coverage for new logging functionalities and key-value pair handling, including various scenarios and conditions.

Copy link

coderabbitai bot commented Nov 30, 2024

"""

Walkthrough

The changes in this pull request involve updates to the pyproject.toml file, the introduction of new logging utilities in src/clp_logging, and enhancements to the testing framework in tests. Specifically, the dependency on clp-ffi-py has been fixed to a specific version. New classes and methods have been added to facilitate structured logging with key-value pairs, including a new logging handler. The testing suite has been expanded to include tests for these new features and ensure proper functionality.

Changes

File Change Summary
pyproject.toml Dependency clp-ffi-py updated from >= 0.0.11 to >= 0.0.14.
src/clp_logging/auto_generated_kv_pairs_utils.py New class AutoGeneratedKeyValuePairsBuffer added; methods for generating log metadata introduced.
src/clp_logging/handlers.py New ClpKeyValuePairStreamHandler class added; methods for handling key-value pair logging introduced.
tests/__init__.py Import of TestClpKeyValuePairLoggingBase added to include new test cases in the test suite.
tests/test_handlers.py New classes and tests for logging functionality added, including ExpectedLogEvent and various test handlers.
src/clp_logging/utils.py New class Timestamp added; methods for managing Unix timestamps and UTC offsets introduced.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Logger
    participant Handler

    User->>Logger: Log event with key-value pairs
    Logger->>Handler: Process log event
    Handler->>Handler: Serialize log event to CLP format
    Handler-->>User: Log event confirmation
Loading

"""

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (9)
src/clp_logging/auto_generated_kv_pairs_utils.py (3)

19-44: Consider making the buffer immutable

The buffer is initialized with a mutable dictionary that could be modified externally. Consider using a frozen or immutable data structure to prevent accidental modifications.


45-76: Add thread safety warning in docstring

The generate method modifies shared state (self._buf) which could lead to race conditions in multi-threaded environments. Consider adding a warning in the docstring about thread safety, or implement thread-safe mechanisms.

Example addition to docstring:

        """
        Generated auto-generated key-value pairs by populating the underlying
        buffer with the given log event metadata.

+       Note: This class is not thread-safe. Each thread should maintain its own
+       instance of AutoGeneratedKeyValuePairsBuffer.
+
        :param timestamp: The Unix epoch timestamp in millisecond of the log
            event.

1-94: Consider adding schema validation

The key-value pair structure is critical for the CLP IR format. Consider adding schema validation to ensure the generated structures always conform to the expected format, especially for the user-generated key-value pairs that will interact with this module.

src/clp_logging/handlers.py (3)

858-866: Avoid storing unsupported formatters

Since formatters are not supported by ClpKeyValuePairStreamHandler, assigning the formatter to self._formatter may cause confusion. Consider removing this assignment to prevent misleading users.

Apply this diff to remove the unnecessary assignment:

def setFormatter(self, fmt: Optional[logging.Formatter]) -> None:
    if fmt is None:
        return
    warnings.warn(
        f"Formatter is currently not supported in the current {self.__class__.__name__}",
        category=RuntimeWarning,
    )
-   self._formatter = fmt

929-942: Use record.created for consistent timestamps

Instead of using time.time(), utilize record.created to obtain the timestamp when the LogRecord was created. Multiplying record.created by 1000 will provide the timestamp in milliseconds, ensuring consistency and accuracy in log event timing.

Apply this diff to use record.created:

def _write(self, record: logging.LogRecord) -> None:
    if self._is_closed():
        raise IOError("The handler has been closed.")

    if not isinstance(record.msg, dict):
        raise TypeError("The log msg must be a valid Python dictionary.")

-   curr_ts: int = floor(time.time() * 1000)
+   curr_ts: int = floor(record.created * 1000)

    if self._loglevel_timeout is not None:
        self._loglevel_timeout.update(record.levelno, curr_ts, self._log_level_timeout_callback)

    self._serialize_kv_pair_log_event(
        self._auto_generated_kv_pairs_buf.generate(curr_ts, self._tz, record), record.msg
    )

964-967: Maintain consistent timestamping in callbacks

For consistency with other methods, consider using time.time() directly in _log_level_timeout_callback, or refactor the method to accept a LogRecord if applicable.

tests/test_handlers.py (3)

774-797: Consider using @dataclass to simplify ExpectedLogEvent.

The ExpectedLogEvent class can be simplified by using the @dataclass decorator from the dataclasses module. This reduces boilerplate code and enhances readability.

You can refactor the class as follows:

+from dataclasses import dataclass

-class ExpectedLogEvent:
-
-    def __init__(
-        self,
-        utc_epoch_ms: int,
-        tz: Optional[str],
-        level_no: int,
-        level_name: str,
-        path: Path,
-        line: Optional[int],
-        user_generated_kv_pairs: Dict[str, Any],
-    ) -> None:
-        self.utc_epoch_ms: int = utc_epoch_ms
-        self.tz: Optional[str] = tz
-        self.level_no: int = level_no
-        self.level_name: str = level_name
-        self.path: Path = path
-        self.line: Optional[int] = line
-        self.user_generated_kv_pairs: Dict[str, Any] = user_generated_kv_pairs
+@dataclass
+class ExpectedLogEvent:
+    utc_epoch_ms: int
+    tz: Optional[str]
+    level_no: int
+    level_name: str
+    path: Path
+    line: Optional[int]
+    user_generated_kv_pairs: Dict[str, Any]

803-805: Convert TODO comment into a GitHub issue for tracking refactoring.

The TODO comment suggests refactoring TestCLPBase to support both raw-text and key-value pair logging. Consider creating a GitHub issue to formalize and track this enhancement.

Would you like me to open a new GitHub issue to track this refactoring effort?


927-930: Remove unnecessary copy.deepcopy() for immutable data structures.

Since primitive_dict and primitive_array contain immutable primitive types, using copy.deepcopy() is unnecessary. You can use a shallow copy with copy() or simply assign the variables directly to improve performance.

Apply this diff to simplify the code:

-            dict_with_array: Dict[str, Any] = copy.deepcopy(primitive_dict)
+            dict_with_array: Dict[str, Any] = primitive_dict.copy()

             dict_with_array["array"] = primitive_array

-            array_with_dict: List[Any] = copy.deepcopy(primitive_array)
+            array_with_dict: List[Any] = primitive_array.copy()

             array_with_dict.append(primitive_dict)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 43d1eaa and ec36f20.

📒 Files selected for processing (5)
  • pyproject.toml (1 hunks)
  • src/clp_logging/auto_generated_kv_pairs_utils.py (1 hunks)
  • src/clp_logging/handlers.py (4 hunks)
  • tests/__init__.py (2 hunks)
  • tests/test_handlers.py (4 hunks)
🔇 Additional comments (6)
pyproject.toml (1)

17-17: Verify the necessity of pinning to a beta version

The dependency clp-ffi-py has been pinned to a specific beta version (0.1.0b1). While this ensures reproducible builds, using beta versions in production dependencies could introduce stability risks.

Let's verify the changes and requirements:

Consider:

  1. Is this beta version specifically required for the new key-value pair functionality?
  2. Are there any known issues or breaking changes in this version?
  3. Would a stable version constraint (e.g., >=0.1.0) be more appropriate?
✅ Verification successful

Let me gather more information about the dependency requirements and changes.


Based on the gathered information, I can now provide the final response:

Pinning to beta version 0.1.0b1 is necessary

The codebase heavily integrates with clp-ffi-py for key-value pair logging functionality, as evidenced by the extensive usage in handlers and tests. The README explicitly states that the library relies on clp-ffi-py for IR analytics interface and advanced features. The beta version requirement appears intentional as it introduces necessary features for the key-value pair functionality implemented throughout the codebase.

  • The library uses KeyValuePairLogEvent and IR-related features from clp-ffi-py in multiple files
  • The README specifically references clp-ffi-py for "newest and improved CLP IR analytics interface"
  • The implementation shows tight coupling with specific IR format handling capabilities
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the changelog and related issues for breaking changes and verify if this version is required for the key-value pair functionality.

# Check for related issues or PRs mentioning the version requirement
gh search issues "clp-ffi-py 0.1.0b1" --repo y-scope/clp-loglib-py

# Check the changelog in clp-ffi-py repository
gh api repos/y-scope/clp-ffi-py/contents/CHANGELOG.md 2>/dev/null | jq -r '.content' | base64 -d || echo "Changelog not found"

# Look for any references to key-value pair functionality in the codebase
rg -i "key.*value|kv.*pair" --type py

Length of output: 8118


Script:

#!/bin/bash
# Check imports and usage of clp-ffi-py in the codebase
rg -l "from clp_ffi_py" --type py

# Check the specific version requirement in git history
git log -p pyproject.toml | grep -A 5 -B 5 "clp-ffi-py"

# Look for any documentation about dependencies
fd README.md | xargs cat

Length of output: 10770

tests/__init__.py (2)

4-8: LGTM! Clean import organization.

The grouped import of test base classes maintains a clear and organized structure.


42-44: LGTM! Verify test discovery for the new handler.

The addition follows the established pattern for test collection and properly integrates with the existing test loading mechanism.

Let's verify that the test discovery is working as expected:

✅ Verification successful

Test discovery is working correctly for the new handler

The test discovery is properly implemented with multiple test classes inheriting from TestClpKeyValuePairLoggingBase:

  • TestClpKeyValuePairHandlerBase - Base test class for key-value pair handlers
  • TestClpKeyValuePairHandlerLogLevelTimeoutBase - Base test class for log level timeout functionality
  • Concrete test implementations:
    • TestClpKeyValuePairStreamingHandlerRaw
    • TestClpKeyValuePairStreamingHandlerZstd
    • TestClpKeyValuePairStreamingHandlerLogLevelTimeoutRaw
    • TestClpKeyValuePairStreamingHandlerLogLevelTimeoutZstd

The test discovery mechanism in __init__.py will correctly collect all these test classes through __subclasses__().

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test discovery for ClpKeyValuePairLoggingBase

# Check for test class implementations
echo "Checking for concrete test implementations:"
ast-grep --pattern 'class $_($$$TestClpKeyValuePairLoggingBase$$$):'

# Verify test file structure
echo -e "\nChecking test file organization:"
fd -e py . tests/

Length of output: 309


Script:

#!/bin/bash
# Let's check the content of test files to verify the test class hierarchy

# Check the base test class definition
echo "Checking TestClpKeyValuePairLoggingBase definition:"
rg "class TestClpKeyValuePairLoggingBase" tests/ -A 5

# Look for any test classes that inherit from TestClpKeyValuePairLoggingBase
echo -e "\nChecking for test classes inheriting from TestClpKeyValuePairLoggingBase:"
rg "class.*\(.*TestClpKeyValuePairLoggingBase.*\)" tests/

# Check the test handler implementations
echo -e "\nChecking test handler implementations:"
cat tests/test_handlers.py

Length of output: 45299

src/clp_logging/auto_generated_kv_pairs_utils.py (2)

4-16: LGTM! Well-structured constant definitions.

The constants are logically grouped, clearly named, and properly typed. They align well with the PR objectives for handling auto-generated key-value pairs.


78-94: Verify consistent usage of timestamp format

The implementation looks good. Let's verify that the timestamp format is consistent across the codebase.

✅ Verification successful

Timestamp format is consistently implemented across the codebase

The verification shows that the timestamp implementation in create_loglib_generated_log_event_as_auto_generated_kv_pairs is consistent with the rest of the codebase:

  • All timestamps are consistently handled as Unix epoch milliseconds (integer type)
  • The function follows the same timestamp structure used throughout:
    • Uses ZONED_TIMESTAMP_KEY with nested ZONED_TIMESTAMP_UTC_EPOCH_MS_KEY and ZONED_TIMESTAMP_TZ_KEY
    • Accepts timezone as an optional string parameter
  • The implementation aligns with the protocol's timestamp handling (ID_TIMESTAMP, TIMESTAMP_DELTA_* constants) and reader/handler components
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for timestamp-related code to ensure consistent format usage
rg -A 2 "timestamp.*:.*int" --type python

Length of output: 74


Script:

#!/bin/bash
# Search for timestamp-related code using ripgrep without file type filter
rg -A 2 "timestamp.*:.*int"

Length of output: 3170


Script:

#!/bin/bash
# Search for timestamp usage in function parameters and type hints
ast-grep --pattern 'def $FUNC($$$, timestamp: int, $$$)'

Length of output: 58


Script:

#!/bin/bash
# Search for timestamp-related constants and type definitions
rg "TIMESTAMP|timestamp" -A 2

Length of output: 29362

src/clp_logging/handlers.py (1)

227-231: Update CLPLogLevelTimeout to support Serializer

The inclusion of Serializer in the ostream attribute and the set_ostream method enhances the flexibility of CLPLogLevelTimeout, allowing it to work with a broader range of output streams.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
src/clp_logging/handlers.py (3)

858-865: Consider removing unused formatter assignment

In the setFormatter method, after issuing a warning that formatters are not supported, you assign the formatter to self._formatter. Since the formatter isn't utilized elsewhere in the class, consider removing this assignment to prevent potential confusion.


862-864: Adjust warning message for clarity

In the warning message within setFormatter, the phrasing can be improved for clarity. Consider rephrasing it to:

"Formatter is currently not supported in ClpKeyValuePairStreamHandler"


916-919: Optimize compressor instantiation

The ZstdCompressor instance cctx is instantiated even when compression is disabled. To enhance performance, consider creating the compressor only when self._enable_compression is True.

Apply this change:

def _init_new_serializer(self, stream: IO[bytes]) -> None:
    """
    Initializes a new serializer that writes to the given stream.

    :param stream: The stream that the underlying serializer will write to.
    """
    if self._enable_compression:
        cctx: ZstdCompressor = ZstdCompressor()
        self._serializer = Serializer(cctx.stream_writer(stream))
    else:
        self._serializer = Serializer(stream)
tests/test_handlers.py (1)

1028-1031: Adhere to PEP 8 line length guidelines for readability

Lines 1028-1031 exceed the recommended maximum line length of 79 characters. Wrapping these lines according to PEP 8 guidelines would improve code readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ec36f20 and 06608a7.

📒 Files selected for processing (3)
  • src/clp_logging/auto_generated_kv_pairs_utils.py (1 hunks)
  • src/clp_logging/handlers.py (4 hunks)
  • tests/test_handlers.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_logging/auto_generated_kv_pairs_utils.py
🧰 Additional context used
📓 Learnings (1)
src/clp_logging/handlers.py (1)
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:883-894
Timestamp: 2024-11-30T03:17:38.238Z
Learning: `clp_ffi_py.ir.Serializer` needs to allow closing the serializer without closing the underlying output stream.

@kirkrodrigues
Copy link
Member

Sorry, haven't yet finished a review. Will get back to you today.

@kirkrodrigues kirkrodrigues self-requested a review December 3, 2024 15:05
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The wrapping of docstrings is inconsistent across this PR. Can we stick to 100 characters and open an issue to refactor the existing docstrings?
  • Do we need indents for multiline docstrings?

from typing import Any, Dict, Optional

ZONED_TIMESTAMP_KEY: str = "zoned_timestamp"
ZONED_TIMESTAMP_UTC_EPOCH_MS_KEY: str = "utc_epoch_ms"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Aren't all epoch timestamps UTC?
  • How about unix_timestamp_ms?
    • Technically, ms is ambiguous since it could mean Ms or ms due to capitalization, so I would rather use millisecs for the least ambiguity; but I guess that would be a larger change across our codebase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the structure to:

{
  "timestamp": {
    "unix_ts_ms": 0,
    "utc_offset_sec": 0
  }
}

Dropped ZONE_ prefix as suggested in the previous discussion.


self._close_and_compare()

def test_empty(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test creating an entirely empty file? I.e., one without a log event.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Moved this test case to test_basic.
  • Make test_empty to test serializing an empty file instead.

@LinZhihao-723
Copy link
Member Author

  • The wrapping of docstrings is inconsistent across this PR. Can we stick to 100 characters and open an issue to refactor the existing docstrings?
  • Do we need indents for multiline docstrings?
  • Yeah, I'm aware of the inconsistent docstring style. For copied code I didn't touch the docstrings so they were 80-char long. I will create an issue to track the refactoring.
  • Yes, my bad. I will fix them in the coming commits.

LinZhihao-723 and others added 2 commits February 2, 2025 21:08
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/clp_logging/handlers.py (2)

805-833: Consider moving key-value pair rules to documentation.

The docstring is comprehensive but quite long. Consider:

  1. Moving the key-value pair rules section to a separate documentation file.
  2. Keeping only the essential class description and parameters in the docstring.

This would improve readability while maintaining the documentation's completeness.


955-956: Replace assertion with error handling.

The assertion could be replaced with proper error handling to provide a better user experience in production.

-        assert self._serializer is not None
+        if self._serializer is None:
+            raise IOError("Stream already closed.")
         self._serializer.serialize_log_event_from_msgpack_map(serialize_dict_to_msgpack(log_event))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 096c3ce and e854fa1.

📒 Files selected for processing (1)
  • src/clp_logging/handlers.py (4 hunks)
🧰 Additional context used
📓 Learnings (1)
src/clp_logging/handlers.py (1)
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:883-894
Timestamp: 2024-11-30T03:17:38.238Z
Learning: `clp_ffi_py.ir.Serializer` needs to allow closing the serializer without closing the underlying output stream.
🔇 Additional comments (6)
src/clp_logging/handlers.py (6)

40-41: LGTM! Constants are well-defined.

The constants follow Python naming conventions and their values align with the PR objectives for separating auto-generated and user-generated key-value pairs.


227-227: LGTM! Type annotations updated correctly.

The ostream attribute and set_ostream method have been properly updated to support the Serializer type for the new key-value pair handler.

Also applies to: 231-231


835-856: LGTM! Constructor and attributes are well-structured.

The initialization is thorough and attributes are properly typed. The use of AutoGeneratedKeyValuePairsBuffer aligns with the PR objectives for structured logging.


858-866: LGTM! Formatter warning is clear and appropriate.

The warning effectively communicates that formatters are not supported in this handler.


883-894: Track the serializer closing behaviour issue.

The TODO comment indicates a known limitation where the serializer closes the old stream, which differs from logging.StreamHandler's behaviour.

Would you like me to create an issue to track this limitation in clp_ffi_py.ir.Serializer?


921-934: LGTM! Input validation is thorough.

The method properly validates that:

  1. The stream is not closed
  2. The log message is a dictionary

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/test_handlers.py (1)

795-802: Consider implementing the suggested refactoring.

As noted in the TODO comment, consider refactoring TestCLPBase to support both raw-text and key-value pair logging to reduce code duplication.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e854fa1 and b310993.

📒 Files selected for processing (5)
  • pyproject.toml (1 hunks)
  • src/clp_logging/auto_generated_kv_pairs_utils.py (1 hunks)
  • src/clp_logging/handlers.py (4 hunks)
  • src/clp_logging/utils.py (1 hunks)
  • tests/test_handlers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
🧰 Additional context used
📓 Learnings (1)
src/clp_logging/handlers.py (1)
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:883-894
Timestamp: 2024-11-30T03:17:38.238Z
Learning: `clp_ffi_py.ir.Serializer` needs to allow closing the serializer without closing the underlying output stream.
🪛 GitHub Actions: Lint, build, and test package
src/clp_logging/auto_generated_kv_pairs_utils.py

[warning] 12-12: Code does not adhere to PEP 8 standards. Please format the code using Black.

🔇 Additional comments (9)
src/clp_logging/utils.py (3)

24-34: LGTM!

The constructor and member variables are well-designed with proper type hints and documentation.


35-40: LGTM!

The getter method is properly implemented with clear documentation.


41-45: LGTM!

The getter method is properly implemented with clear documentation.

src/clp_logging/auto_generated_kv_pairs_utils.py (2)

1-17: LGTM!

The constants are well-defined with clear naming conventions.

🧰 Tools
🪛 GitHub Actions: Lint, build, and test package

[warning] 12-12: Code does not adhere to PEP 8 standards. Please format the code using Black.


42-67: LGTM!

The generate method is well-implemented with proper type hints and documentation.

tests/test_handlers.py (2)

772-793: LGTM!

The ExpectedLogEvent class is well-designed with proper type hints and documentation.


933-936: Add test for empty file creation.

As suggested in the past review comments, add a test case for creating an entirely empty file without any log events.

src/clp_logging/handlers.py (2)

804-831: LGTM!

The class documentation is clear and comprehensive, properly explaining the differences from logging.StreamHandler and the rules for key-value pair representation.


893-900: ⚠️ Potential issue

Fix stream closure behavior.

As noted in the TODO comment and past review comments, the stream closure behavior differs from logging.StreamHandler. This needs to be fixed once clp_ffi_py.ir.Serializer supports closing the serializer without closing the underlying stream.

@LinZhihao-723
Copy link
Member Author

LinZhihao-723 commented Feb 8, 2025

  • The wrapping of docstrings is inconsistent across this PR. Can we stick to 100 characters and open an issue to refactor the existing docstrings?
  • Do we need indents for multiline docstrings?

Update: The inconsistency in the docstring is caused by the outdated linting workflow. If you don't mind, let's fix this in the subsequent PR immediately after merging this PR. The gh issue already created: #53
@kirkrodrigues

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test_handlers.py (3)

772-793: Enhance the docstring with more details.

While the class is well-structured, consider expanding the docstring to include:

  • Purpose of each field
  • Example usage
  • Relationship with the logging system

Apply this diff to improve the documentation:

 class ExpectedLogEvent:
     """
-    Simple class to represent an expected log event, which contains all relevant
-    log event metadata and user-generated kv-pairs.
+    Class representing an expected log event for testing key-value pair logging.
+
+    This class encapsulates all metadata and user-generated key-value pairs that
+    should be present in a log event, allowing for precise verification of logging
+    behaviour.
+
+    Fields:
+        ts: Timestamp of the log event
+        level_no: Numeric logging level (e.g., 10 for DEBUG)
+        level_name: String representation of logging level (e.g., "DEBUG")
+        path: Source file path where the log was generated
+        line: Line number where the log was generated (optional)
+        user_generated_kv_pairs: Dictionary of user-provided key-value pairs
+
+    Example:
+        event = ExpectedLogEvent(
+            ts=Timestamp.now(),
+            level_no=logging.INFO,
+            level_name="INFO",
+            path=Path("test.py"),
+            line=42,
+            user_generated_kv_pairs={"message": "test"}
+        )
     """

795-802: Consider creating an issue for the TODO refactoring.

The TODO comment correctly identifies the need to refactor TestCLPBase. This would improve code maintainability by unifying the testing infrastructure for both structured and unstructured logging.

Would you like me to create an issue to track this refactoring task?


853-889: Consider extracting comparison logic into separate methods.

The comparison logic in _compare method is comprehensive but could be more maintainable if split into smaller, focused methods.

Consider this refactoring:

     def _compare(self) -> None:
         actual_log_events: List[KeyValuePairLogEvent] = self._read_clp()
         self.assertEqual(len(self._expected_log_events), len(actual_log_events))
         for actual, expected in zip(actual_log_events, self._expected_log_events):
             auto_generated_kv_pairs: Dict[str, Any]
             user_generated_kv_pairs: Dict[str, Any]
             auto_generated_kv_pairs, user_generated_kv_pairs = actual.to_dict()
+            self._compare_user_generated_pairs(user_generated_kv_pairs, expected)
+            self._compare_auto_generated_pairs(auto_generated_kv_pairs, expected)
 
-            # Check user generated kv pairs
-            self.assertEqual(user_generated_kv_pairs, expected.user_generated_kv_pairs)
-
-            # Check auto generated kv pairs
-            self.assertAlmostEqual(
-                auto_generated_kv_pairs[TIMESTAMP_KEY][TIMESTAMP_UNIX_TS_MS],
-                expected.ts.get_unix_ts(),
-                delta=ASSERT_TIMESTAMP_DELTA_MS,
-            )
-            self.assertEqual(
-                auto_generated_kv_pairs[TIMESTAMP_KEY][TIMESTAMP_UTC_OFFSET_SEC],
-                expected.ts.get_utc_offset(),
-            )
-
-            self.assertEqual(auto_generated_kv_pairs[LEVEL_KEY][LEVEL_NO_KEY], expected.level_no)
-            self.assertEqual(
-                auto_generated_kv_pairs[LEVEL_KEY][LEVEL_NAME_KEY], expected.level_name
-            )
-
-            self.assertEqual(
-                Path(auto_generated_kv_pairs[SOURCE_CONTEXT_KEY][SOURCE_CONTEXT_PATH_KEY]),
-                expected.path,
-            )
-            if expected.line is not None:
-                self.assertEqual(
-                    auto_generated_kv_pairs[SOURCE_CONTEXT_KEY][SOURCE_CONTEXT_LINE_KEY],
-                    expected.line,
-                )
+
+    def _compare_user_generated_pairs(
+        self, actual: Dict[str, Any], expected: ExpectedLogEvent
+    ) -> None:
+        self.assertEqual(actual, expected.user_generated_kv_pairs)
+
+    def _compare_auto_generated_pairs(
+        self, actual: Dict[str, Any], expected: ExpectedLogEvent
+    ) -> None:
+        self._compare_timestamp(actual[TIMESTAMP_KEY], expected.ts)
+        self._compare_level(actual[LEVEL_KEY], expected)
+        self._compare_source_context(actual[SOURCE_CONTEXT_KEY], expected)
+
+    def _compare_timestamp(self, actual: Dict[str, Any], expected: Timestamp) -> None:
+        self.assertAlmostEqual(
+            actual[TIMESTAMP_UNIX_TS_MS],
+            expected.get_unix_ts(),
+            delta=ASSERT_TIMESTAMP_DELTA_MS,
+        )
+        self.assertEqual(
+            actual[TIMESTAMP_UTC_OFFSET_SEC],
+            expected.get_utc_offset(),
+        )
+
+    def _compare_level(self, actual: Dict[str, Any], expected: ExpectedLogEvent) -> None:
+        self.assertEqual(actual[LEVEL_NO_KEY], expected.level_no)
+        self.assertEqual(actual[LEVEL_NAME_KEY], expected.level_name)
+
+    def _compare_source_context(
+        self, actual: Dict[str, Any], expected: ExpectedLogEvent
+    ) -> None:
+        self.assertEqual(
+            Path(actual[SOURCE_CONTEXT_PATH_KEY]),
+            expected.path,
+        )
+        if expected.line is not None:
+            self.assertEqual(
+                actual[SOURCE_CONTEXT_LINE_KEY],
+                expected.line,
+            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 715961b and 1bcf7d7.

📒 Files selected for processing (1)
  • tests/test_handlers.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.8)
  • GitHub Check: test-python37 (ubuntu-22.04, 3.7)
  • GitHub Check: test (macos-latest, 3.8)
  • GitHub Check: test-python37 (macos-13, 3.7)
🔇 Additional comments (2)
tests/test_handlers.py (2)

1-2: LGTM! Clean imports and constants organization.

The new imports and constants are well-organized and properly support the key-value pair logging functionality.

Also applies to: 10-10, 14-15, 27-37, 73-73


904-968: LGTM! Comprehensive test coverage.

The test implementation is thorough, covering:

  • Basic key-value pair logging
  • Various data types (strings, numbers, booleans, null)
  • Nested structures (dictionaries and arrays)
  • Edge cases (empty dictionaries and no logs)
  • Both raw and compressed configurations

LinZhihao-723 and others added 3 commits February 16, 2025 16:33
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
src/clp_logging/handlers.py (2)

852-859: Consider using NotImplementedError instead of a warning.

Since formatters are not supported, it might be clearer to raise a NotImplementedError instead of just warning the user. This would prevent potential issues where users might assume their formatter is being applied.

-        warnings.warn(
-            f"{self.__class__.__name__} doesn't currently support Formatters",
-            category=RuntimeWarning,
-        )
+        raise NotImplementedError(f"{self.__class__.__name__} doesn't currently support Formatters")

954-967: Consider adding error handling for serialization failures.

The method should handle potential serialization errors from the FFI layer to provide better error messages to users.

    def _serialize_kv_pair_log_event(
        self, auto_gen_kv_pairs: Dict[str, Any], user_gen_kv_pairs: Dict[str, Any]
    ) -> None:
        if self._is_closed():
            raise RuntimeError("Stream already closed.")
        assert self._serializer is not None
+       try:
            self._serializer.serialize_log_event_from_msgpack_map(
                serialize_dict_to_msgpack(auto_gen_kv_pairs),
                serialize_dict_to_msgpack(user_gen_kv_pairs),
            )
+       except Exception as e:
+           raise RuntimeError(f"Failed to serialize log event: {str(e)}") from e
tests/test_handlers.py (3)

795-802: Track the refactoring of base test classes.

The TODO comment indicates a need to refactor TestCLPBase. Consider creating an issue to track this technical debt.

Would you like me to create an issue to track the refactoring of TestCLPBase to support both unstructured and structured logging?


833-844: Consider adding docstring for _log method.

The method is crucial for test cases but lacks documentation explaining its purpose and parameters.

    def _log(self, level: int, kv_pairs: Dict[str, Any]) -> None:
+       """
+       Log a message with the specified level and key-value pairs.
+       
+       :param level: The log level to use (e.g., logging.INFO)
+       :param kv_pairs: Dictionary of key-value pairs to log
+       """
        level_name: str = logging.getLevelName(level)
        path: Path = Path(__file__).resolve()
        curr_frame: Optional[FrameType] = inspect.currentframe()

906-934: Consider adding more edge cases to test_basic.

While the current test cases cover the basic functionality, consider adding tests for:

  • Very large dictionaries
  • Deeply nested structures
  • Unicode strings
  • Special characters in keys
    def test_basic(self) -> None:
+       # Test large dictionary
+       large_dict = {f"key_{i}": f"value_{i}" for i in range(1000)}
+       self._log(logging.INFO, {"large": large_dict})
+
+       # Test deeply nested structure
+       nested_dict = {"level1": {"level2": {"level3": {"level4": "value"}}}}
+       self._log(logging.INFO, {"nested": nested_dict})
+
+       # Test Unicode and special characters
+       unicode_dict = {"unicode_key": "🐰", "special_chars": "!@#$%^&*()"}
+       self._log(logging.INFO, {"special": unicode_dict})
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f03629 and 5ab34d4.

📒 Files selected for processing (4)
  • pyproject.toml (1 hunks)
  • src/clp_logging/handlers.py (4 hunks)
  • src/clp_logging/utils.py (1 hunks)
  • tests/test_handlers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • src/clp_logging/utils.py
🧰 Additional context used
🧠 Learnings (1)
src/clp_logging/handlers.py (2)
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:883-894
Timestamp: 2024-11-30T03:17:38.238Z
Learning: `clp_ffi_py.ir.Serializer` needs to allow closing the serializer without closing the underlying output stream.
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:943-951
Timestamp: 2025-02-08T22:26:24.887Z
Learning: In clp-loglib-py, validation of key-value pairs in log messages is handled by the FFI layer (clp-ffi-py) rather than the Python layer.
🪛 GitHub Actions: Lint, build, and test package
tests/test_handlers.py

[warning] 771-774: Line length exceeds the recommended limit. Please format the code to adhere to style guidelines.

src/clp_logging/handlers.py

[warning] 875-878: Line length exceeds the recommended limit. Please format the code to adhere to style guidelines.

🔇 Additional comments (6)
src/clp_logging/handlers.py (4)

804-831: LGTM! Clear and comprehensive docstring.

The docstring effectively communicates:

  • The purpose of the handler
  • Key differences from logging.StreamHandler
  • Rules for key-value pairs
  • Parameter descriptions

833-850: LGTM! Clean constructor implementation.

The constructor:

  • Properly initializes base class
  • Sets up required attributes
  • Creates auto-generated key-value pairs buffer
  • Initializes the serializer

936-953: LGTM! Robust error handling in _write.

The method:

  • Checks for closed state
  • Validates input type
  • Delegates to serialization method

875-902:

❓ Verification inconclusive

Verify the stream handling in setStream.

The method correctly mirrors CPython's implementation but has a known limitation with stream closure. This needs to be addressed in the FFI layer.


🏁 Script executed:

#!/bin/bash
# Description: Check if there are any other handlers in the codebase that might be affected by this stream handling behavior

# Search for handlers that use setStream
rg -A 5 "def setStream" --type python

Length of output: 70


Attention: Verify and Align Stream Closure Behaviour

The current implementation of setStream in src/clp_logging/handlers.py mirrors CPython’s approach by closing the old stream, in contrast with logging.StreamHandler which only flushes it. This discrepancy should be resolved in the FFI layer to ensure that closing the serializer does not inadvertently close the underlying output stream.

  • Action: Manually verify whether any other parts of the codebase depend on the non-closing behaviour of streams.
  • Action: Confirm that the FFI layer is updated to support closing the serializer without terminating the stream if a flush is required instead.
🧰 Tools
🪛 GitHub Actions: Lint, build, and test package

[warning] 875-878: Line length exceeds the recommended limit. Please format the code to adhere to style guidelines.

tests/test_handlers.py (2)

772-793: LGTM! Well-structured test helper class.

The class effectively encapsulates all the necessary fields for validating log events.


939-969: LGTM! Good test coverage for both raw and compressed handlers.

The test classes properly verify the handler's functionality with and without compression.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
tests/test_handlers.py (6)

772-776: Enhance the docstring to better describe the class's purpose and usage.

The current docstring could be more descriptive. Consider adding:

  • The purpose of this class in testing
  • Description of each field
  • Example usage
-    """
-    An expected kv-pair log event, which contains all relevant log event
-    metadata and user-generated kv-pairs.
-    """
+    """
+    A test helper class that represents an expected key-value pair log event for comparison
+    with actual log events during testing.
+
+    This class encapsulates all the metadata and user-generated key-value pairs that should
+    be present in a log event, including:
+    - Timestamp with timezone information
+    - Log level number and name
+    - Source context (file path and line number)
+    - User-generated key-value pairs
+
+    Example:
+        expected = ExpectedLogEvent(
+            ts=Timestamp.now(),
+            level_no=logging.INFO,
+            level_name="INFO",
+            path=Path("test.py"),
+            line=42,
+            user_generated_kv_pairs={"message": "test"}
+        )
+    """

798-802: Create an issue to track the TODO for refactoring.

The TODO comment indicates that this class mirrors TestCLPBase functionality. Create an issue to track the refactoring of TestCLPBase to support both unstructured and structured logging.

Would you like me to create an issue to track this refactoring task?


854-890: Enhance error handling in the _compare method.

The comparison method could be more robust by:

  1. Validating dictionary types before accessing them
  2. Adding descriptive error messages for assertion failures
  3. Handling potential KeyError exceptions
     def _compare(self) -> None:
         actual_log_events: List[KeyValuePairLogEvent] = self._read_clp()
-        self.assertEqual(len(self._expected_log_events), len(actual_log_events))
+        self.assertEqual(
+            len(self._expected_log_events),
+            len(actual_log_events),
+            "Number of actual log events does not match expected",
+        )
         for actual, expected in zip(actual_log_events, self._expected_log_events):
             auto_generated_kv_pairs: Dict[str, Any]
             user_generated_kv_pairs: Dict[str, Any]
-            auto_generated_kv_pairs, user_generated_kv_pairs = actual.to_dict()
+            try:
+                auto_generated_kv_pairs, user_generated_kv_pairs = actual.to_dict()
+            except Exception as e:
+                self.fail(f"Failed to convert log event to dict: {e}")

             # Check user generated kv pairs
-            self.assertEqual(user_generated_kv_pairs, expected.user_generated_kv_pairs)
+            self.assertEqual(
+                user_generated_kv_pairs,
+                expected.user_generated_kv_pairs,
+                "User-generated key-value pairs do not match expected",
+            )

             # Check auto generated kv pairs
+            self.assertIn(
+                TIMESTAMP_KEY,
+                auto_generated_kv_pairs,
+                f"Missing required key: {TIMESTAMP_KEY}",
+            )
             self.assertAlmostEqual(
                 auto_generated_kv_pairs[TIMESTAMP_KEY][TIMESTAMP_UNIX_TS_MS],
                 expected.ts.get_unix_ts(),
                 delta=ASSERT_TIMESTAMP_DELTA_MS,
+                msg="Timestamp does not match expected within delta",
             )

905-937: Add test cases for error conditions and edge cases.

The current test suite could be enhanced with additional test cases:

  1. Error conditions:
    • Invalid dictionary values
    • Non-dictionary messages
    • Closed handler
  2. Edge cases:
    • Large dictionaries
    • Deep nested structures
    • Unicode keys and values

Add these test cases:

def test_error_conditions(self) -> None:
    """Test various error conditions."""
    # Test non-dictionary message
    with self.assertRaises(TypeError):
        self._log(logging.INFO, "not a dict")  # type: ignore

    # Test closed handler
    self._close()
    with self.assertRaises(RuntimeError):
        self._log(logging.INFO, {"message": "test"})

def test_edge_cases(self) -> None:
    """Test edge cases with large and complex data."""
    # Large dictionary
    large_dict = {f"key_{i}": f"value_{i}" for i in range(1000)}
    self._log(logging.INFO, large_dict)

    # Deep nested structure
    nested_dict = {"level1": {"level2": {"level3": {"level4": "value"}}}}
    self._log(logging.INFO, nested_dict)

    # Unicode
    unicode_dict = {"🔑": "value", "key": "📝"}
    self._log(logging.INFO, unicode_dict)

    self._close_and_compare()

939-942: Enhance the docstring with more details.

The current docstring could be more descriptive about what aspects of the uncompressed handler are being tested.

-    """
-    Test `ClpKeyValuePairStreamingHandler` without compression.
-    """
+    """
+    Test suite for `ClpKeyValuePairStreamingHandler` without compression.
+
+    This test class verifies that the handler correctly:
+    - Serializes key-value pairs into the CLP IR format
+    - Writes the serialized data directly to the output stream
+    - Maintains the correct log event structure without compression
+    - Handles all supported data types in their raw form
+    """

955-958: Enhance the docstring with more details.

The current docstring could be more descriptive about what aspects of the compressed handler are being tested.

-    """
-    Test `ClpKeyValuePairStreamingHandler` with zstd compression.
-    """
+    """
+    Test suite for `ClpKeyValuePairStreamingHandler` with zstd compression.
+
+    This test class verifies that the handler correctly:
+    - Serializes key-value pairs into the CLP IR format
+    - Compresses the serialized data using zstd
+    - Writes the compressed data to the output stream
+    - Maintains data integrity through the compression process
+    - Handles all supported data types with compression enabled
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ab34d4 and 47d3af7.

📒 Files selected for processing (2)
  • src/clp_logging/handlers.py (4 hunks)
  • tests/test_handlers.py (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
src/clp_logging/handlers.py (2)
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:883-894
Timestamp: 2024-11-30T03:17:38.238Z
Learning: `clp_ffi_py.ir.Serializer` needs to allow closing the serializer without closing the underlying output stream.
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:943-951
Timestamp: 2025-02-08T22:26:24.887Z
Learning: In clp-loglib-py, validation of key-value pairs in log messages is handled by the FFI layer (clp-ffi-py) rather than the Python layer.
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test-python37 (ubuntu-22.04, 3.7)
  • GitHub Check: test (ubuntu-latest, 3.8)
  • GitHub Check: test (macos-latest, 3.11)
  • GitHub Check: test-python37 (macos-13, 3.7)
  • GitHub Check: test (macos-latest, 3.8)

LinZhihao-723 and others added 2 commits February 16, 2025 23:12
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_handlers.py (2)

795-802: Track the refactoring of test base classes.

The class is well-implemented. However, as noted in the TODO comment, there's potential code duplication with TestCLPBase. Consider creating a tracking issue for refactoring both classes to share common functionality.

Would you like me to create an issue to track the refactoring of TestCLPBase and TestClpKeyValuePairLoggingBase?


905-937: Add test cases for error scenarios.

The test coverage is good for happy paths, but consider adding test cases for:

  • Invalid dictionary values
  • Nested dictionaries exceeding maximum depth
  • Large dictionaries that might exceed buffer limits
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47d3af7 and 1ea69de.

📒 Files selected for processing (3)
  • src/clp_logging/auto_generated_kv_pairs_utils.py (1 hunks)
  • src/clp_logging/handlers.py (4 hunks)
  • tests/test_handlers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_logging/auto_generated_kv_pairs_utils.py
🧰 Additional context used
🧠 Learnings (1)
src/clp_logging/handlers.py (2)
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:883-894
Timestamp: 2024-11-30T03:17:38.238Z
Learning: `clp_ffi_py.ir.Serializer` needs to allow closing the serializer without closing the underlying output stream.
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:943-951
Timestamp: 2025-02-08T22:26:24.887Z
Learning: In clp-loglib-py, validation of key-value pairs in log messages is handled by the FFI layer (clp-ffi-py) rather than the Python layer.
🔇 Additional comments (5)
tests/test_handlers.py (2)

772-793: LGTM!

The class is well-structured with clear type hints and encapsulates all necessary metadata for log event validation.


854-890:

❓ Verification inconclusive

Verify the comparison of auto-generated key-value pairs.

The comparison logic for auto-generated key-value pairs needs to handle potential timezone differences correctly.

Run this script to verify timezone handling:


🏁 Script executed:

#!/bin/bash
# Description: Check for timezone-related test failures across different environments.

# Test: Search for timezone-related test failures in the test history.
gh search issues "timezone test failure in:title,body repo:y-scope/clp-loglib-py"

# Test: Search for timezone-related code.
rg -A 5 "timezone|tzinfo|UTC"

Length of output: 22904


Ensure Consistent Timezone Handling in Auto-Generated Log Comparisons

The current test in tests/test_handlers.py verifies that the Unix timestamp (with an allowed delta) and the UTC offset match the expected values. Please review that both the timestamp and UTC offset comparisons correctly account for any differences arising from environment-specific timezone settings. In particular, confirm that the functions responsible for retrieving and converting timezone information (for example, those in clp_logging/auto_generated_kv_pairs_utils.py and clp_logging/handlers.py) are consistently normalising these values across different environments.

  • Verify that the UTC offset obtained from expected.ts.get_utc_offset() matches the one generated by the log event even when different system timezones are in effect.
  • Reassess if an explicit normalisation step is required before the comparison, to prevent potential flaky tests on CI due to varying default timezones.
src/clp_logging/handlers.py (3)

39-40: LGTM!

The constants have clear, descriptive names and follow the project's naming convention.


804-831: LGTM!

The class is well-documented with clear docstrings that explain:

  • The differences from logging.StreamHandler
  • The rules for key-value pairs
  • The parameter requirements

944-949: LGTM!

The error handling is consistent and uses appropriate error types with clear error messages.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feat: Add ClpKeyValuePairStreamHandler which supports logging dictionary-type log events into CLP's key-value pair IR format.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
tests/test_handlers.py (5)

772-776: Enhance the docstring to better describe the class purpose and attributes.

The current docstring could be more descriptive to help users understand the class's purpose and its attributes.

Apply this diff to improve the docstring:

-    """
-    An expected kv-pair log event, which contains all relevant log event
-    metadata and user-generated kv-pairs.
-    """
+    """
+    Represents an expected key-value pair log event for testing purposes.
+
+    This class encapsulates both the metadata (timestamp, log level, source location)
+    and user-generated key-value pairs that are expected to be present in a log event.
+    It is used to validate that the actual log events match the expected format and content.
+
+    Attributes:
+        ts: The timestamp of the log event
+        level_num: The numeric log level (e.g., 10 for DEBUG)
+        level_name: The string representation of the log level (e.g., "DEBUG")
+        path: The source file path where the log was generated
+        line: The line number where the log was generated (optional)
+        user_generated_kv_pairs: Dictionary of user-provided key-value pairs
+    """

795-802: Track the refactoring task.

The TODO comment indicates a need to refactor TestCLPBase to support both unstructured and structured logging.

Would you like me to create an issue to track this refactoring task? The issue would outline the steps needed to:

  1. Refactor TestCLPBase to support both logging types
  2. Share common functionality between the base classes
  3. Ensure backward compatibility

833-845: Improve variable name clarity in _log method.

The variable curr_frame could be more descriptive to better indicate its purpose.

Apply this diff to improve the variable name:

-        curr_frame: Optional[FrameType] = inspect.currentframe()
+        current_stack_frame: Optional[FrameType] = inspect.currentframe()
 
         # NOTE: This line must be right before the actual logging statement
-        line: Optional[int] = curr_frame.f_lineno + 1 if curr_frame is not None else None
+        line: Optional[int] = current_stack_frame.f_lineno + 1 if current_stack_frame is not None else None

905-937: Consider adding edge case tests.

While the current test coverage is good, consider adding tests for the following edge cases:

  1. Very large dictionaries/arrays
  2. Special characters in keys/values
  3. Nested structures with maximum depth
  4. Unicode characters

Example test case:

def test_edge_cases(self) -> None:
    # Test very large dictionary
    large_dict = {f"key_{i}": f"value_{i}" for i in range(1000)}
    self._log(logging.INFO, {"large_dict": large_dict})

    # Test special characters
    special_chars = {
        "key.with.dots": "value",
        "key with spaces": "value",
        "key_with_unicode": "🐰",
        "nested.key": {"deeply.nested": {"more.nesting": "value"}}
    }
    self._log(logging.INFO, special_chars)

    self._close_and_compare()

939-969: Consider using constants for file extensions.

The .clp and .clp.zst file extensions are used in multiple places. Consider extracting them into constants at the module level.

Apply this diff to add constants and use them:

+# File extensions for different compression types
+CLP_RAW_EXTENSION = ".clp"
+CLP_ZSTD_EXTENSION = ".clp.zst"

 class TestClpKeyValuePairStreamingHandlerRaw(TestClpKeyValuePairHandlerBase):
     # override
     def setUp(self) -> None:
         self._enable_compression = False
         super().setUp()
         self._clp_kv_pair_handler = ClpKeyValuePairStreamHandler(
             io_open(self._clp_log_path, "wb"),
             self._enable_compression,
         )
         self._setup_logging()

 class TestClpKeyValuePairStreamingHandlerZstd(TestClpKeyValuePairHandlerBase):
     # override
     def setUp(self) -> None:
         self._enable_compression = True
         super().setUp()
         self._clp_kv_pair_handler = ClpKeyValuePairStreamHandler(
             io_open(self._clp_log_path, "wb"),
             self._enable_compression,
         )
         self._setup_logging()

Then update the file extension selection in setUp:

     def setUp(self) -> None:
-        file_extension: str = ".clp.zst" if self._enable_compression else ".clp"
+        file_extension: str = CLP_ZSTD_EXTENSION if self._enable_compression else CLP_RAW_EXTENSION
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ea69de and cb8be0a.

📒 Files selected for processing (2)
  • src/clp_logging/auto_generated_kv_pairs_utils.py (1 hunks)
  • tests/test_handlers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_logging/auto_generated_kv_pairs_utils.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: test (ubuntu-latest, 3.11)
  • GitHub Check: test (ubuntu-latest, 3.8)
  • GitHub Check: test-python37 (ubuntu-22.04, 3.7)
  • GitHub Check: test (macos-latest, 3.8)
  • GitHub Check: test-python37 (macos-13, 3.7)

@LinZhihao-723 LinZhihao-723 changed the title feat: Add ClpKeyValuePairStreamHandler to support logging dictionary type log events into CLP key-value pair IR format. feat: Add ClpKeyValuePairStreamHandler which supports logging dictionary-type log events into CLP's key-value pair IR format. Feb 17, 2025
@LinZhihao-723 LinZhihao-723 merged commit 0359ef6 into y-scope:main Feb 17, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants