-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write DAT(s) that test different type combinations #9443
Comments
Should this be a warning instead? Users may be fine with having some improperly formatted data, especially if it is an optionally supported type. |
there's a longish comment thread about this :/ I think they're both valid, but generally I lean towards telling the user that something will break before doing the sync, rather than the user waiting for the sync to finish and then discovering that it didn't work. Though I guess we don't currently have a way to exclude fields from streams, right? So it's actually impossible for the user to sync the entire stream, even if they only care about the fields that are supported |
@edgao This looks good! my only comment is that the test cases seem overly focused on happy paths e.g: for string why don't we test UTFs, emojis, a bunch of weird character patterns etc. similarly for numeric, a negative number, null, positive number, |
added some entries to the base type/value table with odd values (this won't cost much performance). For nulls specifically, added some options to the properties/additionalProperties lists. This will generate a few more test cases, so keeping them a bit limited. |
@oustynova fyi this is ready to be worked on now |
Depends on #9442
Write a new DestinationAcceptanceTest method to check that the destination has support for all of the types that can be passed in an AirbyteRecordMessage. The test should verify that for each type, the destination either handles that type (i.e. can accept+read back records containing those types) or falls back to string-typing.
String-typing a record is defined as finding the deepest element(s) that the destination can't handle, then replacing them with
{"type": "string"}
. For example, if a destination that can't handle arrays receives a schema of{"type": "object", "properties": {"prop1": {"type": "object", "properties": {"prop2": {"type": "array"}}}}}
, it should treat that schema as{"type": "object", "properties": {"prop1": {"type": "object", "properties": {"prop2": {"type": "string"}}}}}
(note that thearray
is now astring
). Then it would treat an input message of{"prop1": {"prop2": [1, 2, 3]}}
as{"prop1": {"prop2": "[1, 2, 3]"}}
(i.e. serialize the array[1, 2, 3]
to a string"[1, 2, 3]"
).Each test case should verify a schema/record pair, which are built up from these base types (and corresponding values). It should use the schema/record to construct an
AirbyteCatalog
andAirbyteMessage
, run the sync, and assert that the messages were retrieved correctly (similarly to howtestSync
works).{"type": "string"}
"foo bar"
{"type": "string"}
""
{"type": "string"}
"some random special characters: ࠈൡሗ"
{"type": "string", "format": "date"}
"2021-01-23"
{"type": "string", "format": "date"}
"1504-02-29"
{"type": "string", "format": "date"}
"9999-12-23"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone"}
"2022-11-22T01:23:45+05:00"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone"}
"1504-02-29T01:23:45+05:00"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone"}
"9999-12-21T01:23:45+05:00"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone"}
"2022-11-22T01:23:45"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone"}
"1504-02-29T01:23:45"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone"}
"9999-12-21T01:23:45"
{"type": "number"}
56.78
{"type": "number"}
0
{"type": "number"}
-12345.678
{"type": "string", "airbyte_type": "big_number"}
"1,000,000,...,000.1234"
with 5000
's, i.e."100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.1234"
{"type": "string", "airbyte_type": "big_number"}
"0"
{"type": "string", "airbyte_type": "big_number"}
"-12345.678"
{"type": "integer"}
42
{"type": "integer"}
0
{"type": "integer"}
-12345
{"type": "string", "airbyte_type": "big_integer"}
"123141241234124123141241234124123141241234124123141241234124123141241234124"
{"type": "string", "airbyte_type": "big_integer"}
"0"
{"type": "string", "airbyte_type": "big_integer"}
"-12345"
{"type": "boolean"}
true
Each schema/record pair should use a
properties
and anadditionalProperties
taken from these options:properties
:{"type": ["null", "string"]}
), and the record contains those properties{"type": ["null", "string"]}
), and the record does not contain those properties{"type": ["null", "string"]}
), and the record contains those properties, but set tonull
additionalProperties
:true
- record contains one additional property for each base typetrue
- but do not set any additional properties{"type": "integer"}
- record contains a single additional property with value42
[{"type": "integer"}, {"type": "object"}]
- record contains two additional properties, with values42
and{"foo": "bar"}
false
example test case
For example, consider the case where
properties
is "one property of each type" andadditionalProperties
is{"type": "integer"}
. The schema would be:And the record would be:
The text was updated successfully, but these errors were encountered: