Improved docs #335

tony · 2025-02-27T22:05:12Z

Changes

Improved Docs

Summary by Sourcery

Adds example code demonstrating various use cases of the unihan-etl library, including linguistic analysis, educational tools, data integration, software development, research analysis, input method development, stroke order extraction, and API development.

Tests:

Adds example tests showcasing the usage of the unihan-etl library for different applications.
Adds tests for extracting character learning data for educational applications.
Adds tests for database population with UNIHAN data.
Adds tests for extracting dictionary data from UNIHAN for software development.
Adds tests for extracting and analyzing etymology data with UNIHAN.
Adds tests for input method development with UNIHAN data.
Adds tests for extracting and analyzing stroke order data.
Adds tests for custom data processing with UNIHAN data.
Adds tests for developing an API with unihan-etl data.
Adds tests for using custom fields with UNIHAN data.
Adds tests for filtering characters in the UNIHAN dataset.
Adds tests for accessing UNIHAN fields metadata.
Adds tests for basic usage of the Packager class to get data.
Adds tests for retrieving specific character information.

sourcery-ai · 2025-02-27T22:05:16Z

Reviewer's Guide by Sourcery

This pull request adds a comprehensive suite of example tests to the unihan-etl library. These tests demonstrate various use cases, including linguistic analysis, educational tools, data integration, software development, research analysis, input method development, stroke order extraction, advanced API usage, custom fields, character filtering, and basic data retrieval. Each test provides a practical example of how to leverage the library for specific tasks, enhancing its usability and showcasing its versatility.

No diagrams generated as the changes look simple and do not need a visual representation.

File-Level Changes

Change:
Added example tests demonstrating various use cases of the `unihan-etl` library, such as linguistic analysis, educational tools, data integration, software development, research analysis, input method development, stroke order extraction, advanced API usage, custom fields, character filtering, and character lookup.

Details:
Created `test_linguistic_analysis.py` to demonstrate linguistic analysis using UNIHAN data.
Created `test_educational_tools.py` to demonstrate extracting character learning data for educational applications.
Created `test_data_integration.py` to demonstrate integrating UNIHAN data with database systems.
Created `test_software_dev.py` to demonstrate extracting dictionary data for software development.
Created `test_research_analysis.py` to demonstrate extracting etymology data for research analysis.
Created `test_input_method.py` to demonstrate input method development using UNIHAN data.
Created `test_stroke_order.py` to demonstrate extracting and analyzing stroke order data.
Created `test_advanced_api.py` to demonstrate building advanced processing pipelines with UNIHAN data.
Created `test_api_development.py` to demonstrate developing an API with unihan-etl data.
Created `test_custom_fields.py` to demonstrate using custom fields with UNIHAN data.
Created `test_character_filtering.py` to demonstrate filtering characters in the UNIHAN dataset.
Created `test_unihan_fields.py` to demonstrate working with UNIHAN field metadata.
Created `test_basic_usage.py` to demonstrate basic usage of the Packager class to get data.
Created `test_character_lookup.py` to demonstrate retrieving specific character information.

Files:
`tests/examples/test_linguistic_analysis.py`
`tests/examples/test_educational_tools.py`
`tests/examples/test_data_integration.py`
`tests/examples/test_software_dev.py`
`tests/examples/test_research_analysis.py`
`tests/examples/test_input_method.py`
`tests/examples/test_stroke_order.py`
`tests/examples/test_advanced_api.py`
`tests/examples/test_api_development.py`
`tests/examples/test_custom_fields.py`
`tests/examples/test_character_filtering.py`
`tests/examples/test_unihan_fields.py`
`tests/examples/test_basic_usage.py`
`tests/examples/test_character_lookup.py`

codecov · 2025-02-27T22:06:02Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.53%. Comparing base (a47f637) to head (b15dcfb).

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #335       +/-   ##
===========================================
- Coverage   70.03%   59.53%   -10.51%     
===========================================
  Files          13        8        -5     
  Lines        1325      939      -386     
  Branches      114       99       -15     
===========================================
- Hits          928      559      -369     
+ Misses        372      361       -11     
+ Partials       25       19        -6
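
The percentages in the diff can be reproduced from the raw counts. The following is a minimal sketch, assuming coverage is hits divided by all tracked lines (hits + misses + partials), which is consistent with the totals in the table; the function name `coverage_percent` is illustrative and not part of any tool.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Return coverage as a percentage of all tracked lines."""
    total = hits + misses + partials
    if total == 0:
        return 0.0
    return 100.0 * hits / total


# Counts copied from the coverage diff: base (master) and head (#335).
base = coverage_percent(hits=928, misses=372, partials=25)
head = coverage_percent(hits=559, misses=361, partials=19)

print(f"base={base:.4f}% head={head:.4f}% delta={head - base:.4f}")
```

Note that the totals (1325 and 939 lines) equal hits + misses + partials in each column, so the drop in percentage comes with a smaller set of tracked lines, not only fewer hits.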

sourcery-ai

Hey @tony - I've reviewed your changes - here's some feedback:

Overall Comments:

These examples are great, but consider adding a README or tutorial to guide users on how to run them.
It might be helpful to include a section on error handling and edge cases within the examples.

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Complexity: all looks good
🟢 Documentation: all looks good

sourcery-ai · 2025-02-27T22:06:14Z

tests/examples/test_character_filtering.py

+        filtered_packager = Packager(options)
+
+        # Download the filtered data
+        filtered_packager.download()


issue (code-quality): We've found these issues:

Extract duplicate code into function (extract-duplicate-method)

Simplify sequence length comparison (simplify-len-comparison)

sourcery-ai · 2025-02-27T22:06:14Z

tests/examples/test_data_integration.py

+
+    try:
+        # Create a table for the UNIHAN data
+        cursor.execute("""


issue (code-quality): Extract code out into function (extract-method)
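
One way to act on the extract-method hint is to move the table setup and the inserts into small helpers. This is only a sketch: the full test is not shown in this thread, so the table name `unihan`, its columns, and the helper names are assumptions made for illustration.

```python
import sqlite3


def create_unihan_table(cursor: sqlite3.Cursor) -> None:
    """Create the (hypothetical) table that holds character rows."""
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS unihan (
            char TEXT PRIMARY KEY,
            kDefinition TEXT
        )
        """
    )


def insert_rows(cursor: sqlite3.Cursor, rows: list[dict[str, str]]) -> int:
    """Insert rows and return how many were submitted."""
    cursor.executemany(
        "INSERT OR REPLACE INTO unihan (char, kDefinition) VALUES (?, ?)",
        [(row["char"], row.get("kDefinition", "")) for row in rows],
    )
    return len(rows)


connection = sqlite3.connect(":memory:")
try:
    cursor = connection.cursor()
    create_unihan_table(cursor)
    # Sample row for illustration only, not real exported data.
    written = insert_rows(cursor, [{"char": "好", "kDefinition": "good"}])
    connection.commit()
    stored = cursor.execute("SELECT COUNT(*) FROM unihan").fetchone()[0]
finally:
    connection.close()
```

With the SQL isolated in helpers, the `try` block in the test reads as a short sequence of named steps.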

sourcery-ai · 2025-02-27T22:06:14Z

tests/examples/test_educational_tools.py

+        )
+
+    # Verify we created some educational data
+    assert len(educational_data) > 0


suggestion (code-quality): Simplify sequence length comparison (simplify-len-comparison)

Suggested change

- assert len(educational_data) > 0
+ assert educational_data
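
The simplify-len-comparison suggestion relies on Python's truthiness rules: built-in containers are falsy when empty. A small sketch with made-up sample data:

```python
# Hypothetical sample data standing in for the test's real output.
educational_data: list[dict[str, str]] = [{"char": "好"}]
empty: list[dict[str, str]] = []

# Both forms pass for a non-empty list.
assert len(educational_data) > 0
assert educational_data

# Both forms agree for an empty list as well.
assert not empty
assert bool(empty) == (len(empty) > 0)
```

The two forms are interchangeable for lists, dicts, tuples, and strings. They differ when the value may be `None` (where `len()` raises `TypeError` but the truthiness check simply fails) or for types that define their own truth value.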

sourcery-ai · 2025-02-27T22:06:14Z

tests/examples/test_input_method.py

+        pinyin_to_chars[pinyin_key].append(item["char"])
+
+    # Verify our input method dictionary has entries
+    assert len(pinyin_to_chars) > 0


suggestion (code-quality): Simplify sequence length comparison (simplify-len-comparison)

Suggested change

- assert len(pinyin_to_chars) > 0
+ assert pinyin_to_chars
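
For context, the `pinyin_to_chars` mapping in the excerpt groups characters by reading. The sketch below shows one way such a lookup table could be built; the sample rows and the choice of the `kMandarin` field as the key are assumptions, since the full test is not shown here.

```python
from collections import defaultdict

# Hypothetical rows; real exported data may be shaped differently.
items = [
    {"char": "好", "kMandarin": "hǎo"},
    {"char": "郝", "kMandarin": "hǎo"},
    {"char": "你", "kMandarin": "nǐ"},
    {"char": "丂"},  # sample row with the reading missing
]

pinyin_to_chars: defaultdict[str, list[str]] = defaultdict(list)
for item in items:
    pinyin_key = item.get("kMandarin")
    if not pinyin_key:
        continue  # skip rows that cannot be keyed by a reading
    pinyin_to_chars[pinyin_key].append(item["char"])

# An input method would offer these as candidates for the typed reading.
candidates = pinyin_to_chars["hǎo"]

assert pinyin_to_chars
```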

sourcery-ai · 2025-02-27T22:06:14Z

tests/examples/test_linguistic_analysis.py

+        )
+
+    # Verify we found some correspondences
+    assert len(sound_correspondences) > 0


suggestion (code-quality): Simplify sequence length comparison (simplify-len-comparison)

Suggested change

- assert len(sound_correspondences) > 0
+ assert sound_correspondences

sourcery-ai · 2025-02-27T22:06:14Z

tests/examples/test_research_analysis.py

+
+    # Verify we extracted data
+    if data is not None:
+        assert len(variants_data) > 0


suggestion (code-quality): Simplify sequence length comparison (simplify-len-comparison)

Suggested change

- assert len(variants_data) > 0
+ assert variants_data

sourcery-ai · 2025-02-27T22:06:15Z

tests/examples/test_stroke_order.py

+                f"for {item.get('char', 'Unknown')}"
+            )
+
+            char = item.get("char", "")


issue (code-quality): We've found these issues:

Use named expression to simplify assignment and conditional (use-named-expression)

Simplify sequence length comparison (simplify-len-comparison)
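
The use-named-expression hint refers to the assignment expression (`:=`, Python 3.8+). A before/after sketch, assuming the line `char = item.get("char", "")` is followed by a truthiness check; the sample rows are made up:

```python
# Hypothetical rows; the second one has no "char" key.
items = [{"char": "好", "kTotalStrokes": "6"}, {"kTotalStrokes": "3"}]

# Before: assign first, then test.
chars_before = []
for item in items:
    char = item.get("char", "")
    if char:
        chars_before.append(char)

# After: bind and test in a single expression.
chars_after = []
for item in items:
    if char := item.get("char", ""):
        chars_after.append(char)
```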

sourcery-ai · 2025-02-27T22:06:15Z

tests/examples/test_unihan_fields.py

+    fields_per_file = {}
+    for filename, fields in UNIHAN_MANIFEST.items():
+        fields_per_file[filename] = len(fields)
+
+    # Verify we have field counts
+    assert len(fields_per_file) > 0


issue (code-quality): We've found these issues:

Convert for loop into dictionary comprehension (dict-comprehension)

Simplify sequence length comparison (simplify-len-comparison)
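
Applied to the loop in the excerpt, both hints could look like the sketch below. `SAMPLE_MANIFEST` is a stand-in with illustrative contents, not the library's real `UNIHAN_MANIFEST`.

```python
# Stand-in mapping of file name to field names (illustrative only).
SAMPLE_MANIFEST: dict[str, tuple[str, ...]] = {
    "Unihan_Readings.txt": ("kCantonese", "kDefinition", "kMandarin"),
    "Unihan_DictionaryLikeData.txt": ("kCangjie",),
}

# Dictionary comprehension replaces the explicit for loop.
fields_per_file = {
    filename: len(fields) for filename, fields in SAMPLE_MANIFEST.items()
}

# Truthiness check replaces `len(fields_per_file) > 0`.
assert fields_per_file
```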

… examples

This commit resolves all test failures in the example test suite by:

1. Adding proper type annotations across all example tests:
- Use modern Python type hints (e.g., `list[dict[str, Any]]` instead of `List[Dict[str, Any]]`)
- Add proper type casts (`cast()`) for handling ambiguous return types
- Fix incorrect type signatures for function parameters and return values
- Ensure consistent type annotation style across all test files

2. Fixing test implementation issues:
- Replace invalid field 'kFrequency' with supported fields in educational_tools test
- Change 'kRSKangXi' to 'kRSUnicode' in stroke_order test
- Simplify stroke_order test to use existing data rather than creating a new packager
- Fix handling of list-type values by properly converting them to strings
- Add proper null-checks and defensive programming for external data

3. Improving code quality:
- Replace if-else blocks with ternary operators for conciseness
- Convert for-loops to list comprehensions where appropriate
- Add proper error handling for data conversion operations
- Fix line length issues to comply with style guidelines
- Add meaningful debug output for troubleshooting

4. Ensuring test robustness:
- Add fallback mechanisms for tests that depend on specific data patterns
- Improve assertions to verify data integrity
- Add type guards to prevent runtime errors with ambiguous types

All tests now pass consistently, type checking with mypy succeeds with zero issues, and code formatting conforms to project standards. These changes improve code maintainability, readability, and reliability while providing example code that demonstrates best practices for using the unihan-etl library.
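
Several of the points in this commit message (modern type hints, `cast()`, type guards, converting list-type values to strings) can be sketched together. The helper names `value_to_str` and `normalize` and the sample data are hypothetical; this is not the code from the commit.

```python
from typing import Any, cast


def value_to_str(value: Any) -> str:
    """Flatten a field value to a string; lists are joined with spaces."""
    if value is None:
        return ""
    if isinstance(value, list):
        return " ".join(str(part) for part in value)
    return str(value)


def normalize(rows: object) -> list[dict[str, str]]:
    """Narrow an ambiguously typed result and stringify every field."""
    if not isinstance(rows, list):  # type guard for unexpected return types
        return []
    # cast() only informs the type checker; it does nothing at runtime.
    typed_rows = cast(list[dict[str, Any]], rows)
    return [
        {key: value_to_str(value) for key, value in row.items()}
        for row in typed_rows
        if isinstance(row, dict)  # null-check for malformed entries
    ]


# Hypothetical input: one valid row with a list-type value, one bad entry.
raw: object = [{"char": "好", "kMandarin": ["hǎo", "hào"]}, None]
rows = normalize(raw)
```

Because `cast()` performs no runtime validation, the `isinstance` checks are what actually protect against malformed data.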

sourcery-ai bot reviewed Feb 27, 2025

tony force-pushed the improved-docs branch 2 times, most recently from 1ee19cc to 34763ba Compare February 28, 2025 10:00

tony added 5 commits February 28, 2025 13:34

pyproject(coverage[omit]) Omit test and other paths

1609d9d

docs: Examples

9586ef2

!squash examples

7fdc4bb

!squash test examples

b15dcfb

tony force-pushed the improved-docs branch from 34763ba to b15dcfb Compare February 28, 2025 19:34
