Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write DAT(s) that test different type combinations #9443

Closed
edgao opened this issue Jan 12, 2022 · 5 comments · Fixed by #10325
Closed

Write DAT(s) that test different type combinations #9443

edgao opened this issue Jan 12, 2022 · 5 comments · Fixed by #10325
Assignees

Comments

@edgao
Copy link
Contributor

edgao commented Jan 12, 2022

Depends on #9442

Write a new DestinationAcceptanceTest method to check that the destination has support for all of the types that can be passed in an AirbyteRecordMessage. The test should verify that for each type, the destination either handles that type (i.e. can accept+read back records containing those types) or falls back to string-typing.

String-typing a record is defined as finding the deepest element(s) that the destination can't handle, then replacing them with {"type": "string"}. For example, if a destination that can't handle arrays receives a schema of {"type": "object", "properties": {"prop1": {"type": "object", "properties": {"prop2": {"type": "array"}}}}}, it should treat that schema as {"type": "object", "properties": {"prop1": {"type": "object", "properties": {"prop2": {"type": "string"}}}}} (note that the array is now a string). Then it would treat an input message of {"prop1": {"prop2": [1, 2, 3]}} as {"prop1": {"prop2": "[1, 2, 3]"}} (i.e. serialize the array [1, 2, 3] to a string "[1, 2, 3]").

Each test case should verify a schema/record pair, which are built up from these base types (and corresponding values). It should use the schema/record to construct an AirbyteCatalog and AirbyteMessage, run the sync, and assert that the messages were retrieved correctly (similarly to how testSync works).

Base type Value
{"type": "string"} "foo bar"
{"type": "string"} ""
{"type": "string"} "some random special characters: ࠈൡሗ"
{"type": "string", "format": "date"} "2021-01-23"
{"type": "string", "format": "date"} "1504-02-29"
{"type": "string", "format": "date"} "9999-12-23"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone"} "2022-11-22T01:23:45+05:00"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone"} "1504-02-29T01:23:45+05:00"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone"} "9999-12-21T01:23:45+05:00"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone"} "2022-11-22T01:23:45"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone"} "1504-02-29T01:23:45"
{"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone"} "9999-12-21T01:23:45"
{"type": "number"} 56.78
{"type": "number"} 0
{"type": "number"} -12345.678
{"type": "string", "airbyte_type": "big_number"} "1,000,000,...,000.1234" with 500 0's, i.e. "100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.1234"
{"type": "string", "airbyte_type": "big_number"} "0"
{"type": "string", "airbyte_type": "big_number"} "-12345.678"
{"type": "integer"} 42
{"type": "integer"} 0
{"type": "integer"} -12345
{"type": "string", "airbyte_type": "big_integer"} "123141241234124123141241234124123141241234124123141241234124123141241234124"
{"type": "string", "airbyte_type": "big_integer"} "0"
{"type": "string", "airbyte_type": "big_integer"} "-12345"
{"type": "boolean"} true

Each schema/record pair should use a properties and an additionalProperties taken from these options:

  • properties:
    • Schema contains one property of each base type, and the record contains those properties (with the values taken from the table above)
    • Schema contains one nullable property of each base type (e.g. {"type": ["null", "string"]}), and the record contains those properties
    • Schema contains one nullable property of each base type (e.g. {"type": ["null", "string"]}), and the record does not contain those properties
    • Schema contains one nullable property of each base type (e.g. {"type": ["null", "string"]}), and the record contains those properties, but set to null
    • Schema contains no explicit properties
  • additionalProperties:
    • true - record contains one additional property for each base type
    • true - but do not set any additional properties
    • {"type": "integer"} - record contains a single additional property with value 42
    • [{"type": "integer"}, {"type": "object"}] - record contains two additional properties, with values 42 and {"foo": "bar"}
    • false
    • not set
example test case

For example, consider the case where properties is "one property of each type" and additionalProperties is {"type": "integer"}. The schema would be:

{
  "type": "object",
  "properties": {
    "property_string": {"type": "string"},
    "property_date": {"type": "string", "format": "date"},
    "property_timestamp_with_timezone": {"type": "string", "format": "date-time", "airbyte_type": "timestamp_with_timezone"},
    "property_timestamp_without_timezone": {"type": "string", "format": "date-time", "airbyte_type": "timestamp_without_timezone"},
    "property_number": {"type": "number"},
    "property_big_number": {"type": "string", "airbyte_type": "big_number"},
    "property_integer": {"type": "integer"},
    "property_big_integer": {"type": "string", "airbyte_type": "big_integer"},
    "property_boolean": {"type": "boolean"}
  },
  "additionalProperties": {"type": "integer"}
}

And the record would be:

{
  "property_string": "foo bar",
  "property_date": "2021-01-23",
  "property_timestamp_with_timezone": "2022-11-22T01:23:45+00:00",
  "property_timestamp_without_timezone": "2022-11-22T01:23:45",
  "property_number": 56.78,
  "property_big_number": "100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.1234",
  "property_integer": 42,
  "property_big_integer": "123141241234124123141241234124123141241234124123141241234124123141241234124",
  "property_boolean": true,
  "additional_property1": 42
}
@tuliren
Copy link
Contributor

tuliren commented Jan 15, 2022

throws an error during startup (if it sees a stream containing a type it can't handle)

Should this be a warning instead? Users may be fine with having some improperly formatted data, especially if it is an optionally supported type.

@edgao
Copy link
Contributor Author

edgao commented Jan 17, 2022

there's a longish comment thread about this :/ I think they're both valid, but generally I lean towards telling the user that something will break before doing the sync, rather than the user waiting for the sync to finish and then discovering that it didn't work.

Though I guess we don't currently have a way to exclude fields from streams, right? So it's actually impossible for the user to sync the entire stream, even if they only care about the fields that are supported

@alexandr-shegeda alexandr-shegeda moved this to Ready for implementation in GL Roadmap Jan 18, 2022
@sherifnada
Copy link
Contributor

@edgao This looks good! my only comment is that the test cases seem overly focused on happy paths e.g:

for string why don't we test UTFs, emojis, a bunch of weird character patterns etc.

similarly for numeric, a negative number, null, positive number, 42.0, etc...

@edgao
Copy link
Contributor Author

edgao commented Jan 19, 2022

added some entries to the base type/value table with odd values (this won't cost much performance).

For nulls specifically, added some options to the properties/additionalProperties lists. This will generate a few more test cases, so keeping them a bit limited.

@edgao
Copy link
Contributor Author

edgao commented Jan 26, 2022

@oustynova fyi this is ready to be worked on now

@alexandr-shegeda alexandr-shegeda moved this from Ready for implementation to Implementation in progress in GL Roadmap Jan 28, 2022
@igrankova igrankova moved this to Ready for implementation (prioritized) in GL Roadmap Feb 2, 2022
@alexandr-shegeda alexandr-shegeda moved this from Ready for implementation (prioritized) to Implementation in progress in GL Roadmap Feb 4, 2022
@DoNotPanicUA DoNotPanicUA linked a pull request Feb 14, 2022 that will close this issue
@DoNotPanicUA DoNotPanicUA moved this from Implementation in progress to In review (internal) in GL Roadmap Feb 18, 2022
@oustynova oustynova moved this from In review (internal) to In review (Airbyte) in GL Roadmap Feb 22, 2022
@DoNotPanicUA DoNotPanicUA moved this from In review (Airbyte) to Done in GL Roadmap Feb 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants