-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for "non traditional" dataclasses #495
Comments
That's unfortunate. You're correct in your assessment of the cause here. Two options:
In the short term I suspect that option 2 will be easier (if it's possible). In the long term option 1 may be a better choice. |
Ok, I did some experimenting with edgedb last night. It's annoying that these look like dataclasses but don't make use of any of the standard python dataclass machinery. In short:
If you look at the dataclass docs there's no mention of Anyway, two options for how to support these here: Option 1Any dataclass-like thing that looks magical (custom Option 2We redo our dataclass encoding support to always lookup fields in The nice thing about this design is it fixes a wart on our current dataclass support. Currently we encode any attributes on a dataclass that don't start with a The downside is it comes at a performance cost. I wrote up a POC implementation - here's a quick benchmark comparing msgspec & orjson for encoding dataclasses with Currently
With this change
Currently we're faster than orjson for encoding all dataclasses. With this change we're still always faster for I could go either way here and would love a second opinion @provinzkraut if you have the time. |
Does that information need to necessarily be pulled from the class? I'm thinking that for decoding, having the instance
My initial thought would be that option 2 is preferable, as it is the "correct" way of doing things, even though it comes at a performance cost. Would you be able to make the POC implementation of this available somewhere so I can play around with it a bit? |
No, that's sufficient for encoding - these bullets were mostly just notes on their implementation on how they differed from the standard library.
Sure thing - I pushed up a WIP PR at #501. Only the JSON encoder has been updated (and still no tests/docs), but I think the implementation there is solid if this is the path we want to take. |
@jcrist you're too fast for me (= Anyway, I got to run some tests with your WIP, specifically comparing its performance to orjson. I have a project that has about ~170 dataclasses of varying sizes, complexity, levels of nesting, slots, no slots. I tested the serialization of those with real world data and found that on average, msgspec performs the same as orjson here. I was planning to have more detailed stats for this, but seeing as you already completed the work on this it seems like that's not needed anymore. Thank you once more for the fast turnaround! ❤️ |
Description
I ran into an issue encoding EdgeDB's Objects with msgspec, where it would always come out as an empty dict.
However, according to their docs, these support the dataclass protocol, and inspecting them manually reveals this to be the case. All conversion / inspection methods from the
dataclasses
module (asdict
,astuple
,fields
, etc.) work as expected on them.I checked how msgspec encodes these and I believe the issue might have something to do with the fact that msgspec tries to serialise any object that reports being a dataclass (i.e. has the respective
__dataclass_fields__
set) [1], using its__dict__
and__slots__
[2]. This however seems to cause issues because the EdgeDBObject
does not meet this expectation.I realise this might be a bit tricky, but at the moment I can't really think of any workaround, since msgspec thinks these are a supported type, therefore preventing the use of a custom
enc_hook
.Do you see a way for msgspec to either fully support dataclasses like these or raise an exception so that one could use an
enc_hook
?The text was updated successfully, but these errors were encountered: