Xml Content Handlers #240

Merged
merged 7 commits into from
Sep 1, 2020

tefra (Owner) commented Aug 29, 2020

The goal is to stop depending on lxml for parsing and to make it possible to plug in other XML content handlers, whether for performance, custom processing, compatibility issues, or any other reason.

from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.parsers.handlers import LxmlEventHandler
from xsdata.formats.dataclass.parsers.handlers import LxmlSaxHandler
from xsdata.formats.dataclass.parsers.handlers import XmlEventHandler
from xsdata.formats.dataclass.parsers.handlers import XmlSaxHandler

# Books is a generated dataclass model (import not shown)

parser = XmlParser(handler=LxmlEventHandler)  # default handler, based on lxml
books = parser.parse("some/file.xml", Books)

parser = XmlParser(handler=LxmlSaxHandler)  # based on lxml
books = parser.parse("some/file.xml", Books)

parser = XmlParser(handler=XmlEventHandler)  # native (Python standard library)
books = parser.parse("some/file.xml", Books)

parser = XmlParser(handler=XmlSaxHandler)  # native (Python standard library)
books = parser.parse("some/file.xml", Books)
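To illustrate the idea of swappable content handlers, here is a minimal sketch using only the standard library. `EventHandler`, `SaxHandler` and `collect_tags` are hypothetical names chosen for illustration, not xsdata's actual classes; they only show how two different parsing back ends (`xml.etree.ElementTree.iterparse` and `xml.sax`) can feed the same consumer with an identical stream of start/end events. Namespaces are ignored to keep it short.

```python
import io
import xml.sax
from typing import Iterator, List, Tuple
from xml.etree import ElementTree

# Hypothetical event format shared by all handlers: ("start", tag) or ("end", tag)
Event = Tuple[str, str]


class EventHandler:
    """Hypothetical handler built on the pull-based ElementTree.iterparse."""

    def events(self, source: bytes) -> Iterator[Event]:
        for event, element in ElementTree.iterparse(
            io.BytesIO(source), events=("start", "end")
        ):
            yield event, element.tag


class _Collector(xml.sax.ContentHandler):
    """Records SAX callbacks as (event, tag) tuples."""

    def __init__(self) -> None:
        super().__init__()
        self.queue: List[Event] = []

    def startElement(self, name, attrs):
        self.queue.append(("start", name))

    def endElement(self, name):
        self.queue.append(("end", name))


class SaxHandler:
    """Hypothetical handler built on the push-based xml.sax API."""

    def events(self, source: bytes) -> Iterator[Event]:
        collector = _Collector()
        xml.sax.parseString(source, collector)
        # xml.sax pushes callbacks, so events are buffered and replayed here
        yield from collector.queue


def collect_tags(handler, source: bytes) -> List[Event]:
    """The 'parser' side: it depends only on the handler's events() method."""
    return list(handler.events(source))
```

For example, `collect_tags(EventHandler(), doc)` and `collect_tags(SaxHandler(), doc)` return the same list for a plain document such as `b"<books><book><title>A</title></book></books>"`. Because `xml.sax` pushes callbacks instead of yielding them, this sketch buffers the whole event list before replaying it; a real handler would more likely drive the object builder directly from the callbacks.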

Benchmark results from Travis CI:

----------------------------------------------------------------------------- benchmark 'size: 53.21 KB': 4 tests -----------------------------------------------------------------------------
Name (time in ms)                   Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_small[LxmlIterHandler]     11.8074 (1.0)      12.7203 (1.0)      12.0336 (1.0)      0.1816 (1.10)     11.9881 (1.0)      0.1653 (1.17)         12;6  83.1004 (1.0)          80           1
test_small[XmlIterHandler]      12.0386 (1.02)     12.9314 (1.02)     12.2282 (1.02)     0.1650 (1.0)      12.1787 (1.02)     0.1408 (1.0)          15;6  81.7782 (0.98)         82           1
test_small[LxmlSaxHandler]      13.4678 (1.14)     14.4815 (1.14)     13.7428 (1.14)     0.1965 (1.19)     13.7141 (1.14)     0.1890 (1.34)         12;6  72.7652 (0.88)         73           1
test_small[XmlSaxHandler]       15.9125 (1.35)     17.6129 (1.38)     16.1757 (1.34)     0.2551 (1.55)     16.1069 (1.34)     0.2191 (1.56)          7;4  61.8212 (0.74)         60           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------- benchmark 'size: 531.33 KB': 4 tests ------------------------------------------------------------------------------
Name (time in ms)                     Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_medium[LxmlIterHandler]     119.8988 (1.0)      121.8568 (1.0)      120.4409 (1.0)      0.6877 (1.0)      120.2558 (1.0)      0.7342 (1.39)          2;1  8.3028 (1.0)           9           1
test_medium[XmlIterHandler]      120.1281 (1.00)     129.9892 (1.07)     121.8156 (1.01)     3.0858 (4.49)     121.0813 (1.01)     0.5286 (1.0)           1;1  8.2091 (0.99)          9           1
test_medium[LxmlSaxHandler]      133.1965 (1.11)     135.3319 (1.11)     133.9587 (1.11)     0.7241 (1.05)     133.7970 (1.11)     1.0403 (1.97)          2;0  7.4650 (0.90)          8           1
test_medium[XmlSaxHandler]       162.4670 (1.36)     171.2611 (1.41)     166.1869 (1.38)     3.0557 (4.44)     166.8058 (1.39)     3.9920 (7.55)          3;0  6.0173 (0.72)          7           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


------------------------------------------------------------------------- benchmark 'size: 5312.58 KB': 4 tests --------------------------------------------------------------------------
Name (time in s)                   Min               Max              Mean            StdDev            Median               IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_large[LxmlIterHandler]     1.2073 (1.0)      1.2377 (1.02)     1.2163 (1.00)     0.0129 (5.43)     1.2089 (1.0)      0.0156 (4.86)          1;0  0.8222 (1.00)          5           1
test_large[XmlIterHandler]      1.2117 (1.00)     1.2180 (1.0)      1.2146 (1.0)      0.0024 (1.0)      1.2147 (1.00)     0.0032 (1.0)           2;0  0.8233 (1.0)           5           1
test_large[LxmlSaxHandler]      1.3448 (1.11)     1.3568 (1.11)     1.3526 (1.11)     0.0046 (1.94)     1.3541 (1.12)     0.0042 (1.30)          1;0  0.7393 (0.90)          5           1
test_large[XmlSaxHandler]       1.6082 (1.33)     1.6261 (1.34)     1.6171 (1.33)     0.0077 (3.24)     1.6186 (1.34)     0.0135 (4.20)          2;0  0.6184 (0.75)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
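The numbers in parentheses are each value relative to the best one in its column, so the Mean column reads directly as a relative slowdown. A small sketch that recomputes those ratios from the Mean values reported above (the `MEANS` dict simply copies the table; units are ms for the first two sizes and seconds for the largest, which does not matter for ratios):

```python
from typing import Dict

# Mean times copied from the three benchmark tables above
MEANS: Dict[str, Dict[str, float]] = {
    "53.21 KB": {
        "LxmlIterHandler": 12.0336,
        "XmlIterHandler": 12.2282,
        "LxmlSaxHandler": 13.7428,
        "XmlSaxHandler": 16.1757,
    },
    "531.33 KB": {
        "LxmlIterHandler": 120.4409,
        "XmlIterHandler": 121.8156,
        "LxmlSaxHandler": 133.9587,
        "XmlSaxHandler": 166.1869,
    },
    "5312.58 KB": {
        "LxmlIterHandler": 1.2163,
        "XmlIterHandler": 1.2146,
        "LxmlSaxHandler": 1.3526,
        "XmlSaxHandler": 1.6171,
    },
}


def relative(means: Dict[str, float]) -> Dict[str, float]:
    """Divide every mean by the fastest (lowest) mean in the same table."""
    fastest = min(means.values())
    return {name: round(value / fastest, 2) for name, value in means.items()}


for size, means in MEANS.items():
    print(size, relative(means))
```

Read this way, the two event/iterparse-based handlers are within about 2% of each other at every size, while the lxml SAX handler is roughly 11-14% slower and the native SAX handler roughly 33-38% slower.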

codecov bot commented Aug 29, 2020

Codecov Report

Merging #240 into master will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##            master      #240    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           58        62     +4     
  Lines         4473      4626   +153     
  Branches       767       780    +13     
==========================================
+ Hits          4473      4626   +153     
Impacted Files Coverage Δ
xsdata/codegen/mappers/definitions.py 100.00% <ø> (ø)
xsdata/codegen/mappers/schema.py 100.00% <ø> (ø)
xsdata/formats/dataclass/models/generics.py 100.00% <ø> (ø)
xsdata/models/mixins.py 100.00% <ø> (ø)
xsdata/codegen/handlers/class_extension.py 100.00% <100.00%> (ø)
xsdata/codegen/parsers/schema.py 100.00% <100.00%> (ø)
xsdata/exceptions.py 100.00% <100.00%> (ø)
xsdata/formats/bindings.py 100.00% <100.00%> (ø)
xsdata/formats/dataclass/context.py 100.00% <100.00%> (ø)
...ata/formats/dataclass/parsers/handlers/__init__.py 100.00% <100.00%> (ø)
... and 10 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 623ee6d...39345e8.

tefra added the wip label Aug 29, 2020
tefra force-pushed the sax-parser branch 2 times, most recently from f2316e7 to 8ef2034, August 29, 2020 22:11

tefra (Owner, Author) commented Aug 29, 2020

Interesting: I was sure the lxml SAX approach, without the element tree builder, would be faster.
https://travis-ci.com/github/tefra/xsdata-samples/jobs/379120860
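For context, a "SAX approach without the element tree builder" means the parser calls back into a target object instead of creating element objects. lxml's `etree.XMLParser(target=...)` supports this target interface; the sketch below uses the standard library's `xml.etree.ElementTree.XMLParser`, which accepts a target with the same `start`/`end`/`data`/`close` methods, so it runs without lxml. `TagCounter` and `count_tags` are hypothetical examples, not code from this PR.

```python
from collections import Counter
from typing import List
from xml.etree import ElementTree


class TagCounter:
    """Hypothetical parser target: receives callbacks, never builds Elements."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.text: List[str] = []

    def start(self, tag, attrib):
        self.counts[tag] += 1

    def end(self, tag):
        pass

    def data(self, data):
        self.text.append(data)

    def close(self):
        # Whatever is returned here becomes the return value of XMLParser.close()
        return self.counts


def count_tags(source: bytes) -> Counter:
    parser = ElementTree.XMLParser(target=TagCounter())
    parser.feed(source)
    return parser.close()
```

Skipping element creation is not automatically a win: every callback still crosses from the C parser into Python code, which may be one reason the SAX variants benchmarked slower, although the PR does not state the cause.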

tefra (Owner, Author) commented Aug 29, 2020

Next: handlers built on Python's built-in XML libraries 👍

tefra force-pushed the sax-parser branch 7 times, most recently from 92ddfa1 to a812122, August 30, 2020 20:51
tefra mentioned this pull request Aug 30, 2020

tefra (Owner, Author) commented Aug 30, 2020

It looks good so far. A few tests in the W3C suite are failing, but they also fail with lxml when recover mode is off, because of some illegal characters.
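As an illustration of the kind of input involved: XML 1.0 forbids most C0 control characters, so a strict parser rejects any document containing one. The sketch below uses the standard library's expat-based `ElementTree.fromstring`; the exact characters in the failing W3C tests are not stated here, so U+0001 is only an assumed example, and `is_well_formed` is a hypothetical helper. lxml can skip over such errors with `etree.XMLParser(recover=True)`; the standard library parsers have no direct equivalent.

```python
from xml.etree import ElementTree


def is_well_formed(source: bytes) -> bool:
    """Return False when the strict, expat-based parser rejects the document."""
    try:
        ElementTree.fromstring(source)
    except ElementTree.ParseError:
        # Raised for illegal characters, e.g. a raw U+0001 or the reference &#1;
        return False
    return True
```

With this helper, `is_well_formed(b"<a>ok</a>")` is true, while both `b"<a>\x01</a>"` and `b"<a>&#1;</a>"` are rejected.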

tefra force-pushed the sax-parser branch 2 times, most recently from c821ca8 to 5436b18, September 1, 2020 00:26
tefra removed the wip label Sep 1, 2020
tefra merged commit 6d4c8f2 into master Sep 1, 2020
tefra deleted the sax-parser branch September 1, 2020 23:02