Add metadata to segment tracking #8872

Merged
ChristopheDuong merged 6 commits into master from chris/add-tracking-metadata on Jan 14, 2022

Conversation

ChristopheDuong
Contributor

@ChristopheDuong ChristopheDuong commented Dec 17, 2021

What

I'm proposing to add some properties to the tracking data we send to Segment.

If I look at data sent by Segment, for example:

SELECT *
FROM `ab-analytics-308500.api.connector_jobs`
WHERE job_id = 853 AND connection_id = '4997a08c-5f27-4d1b-afd0-427177c0ea8d'
ORDER BY timestamp

I see that tracking data for a single job was sent on several different dates (all date columns are auto-generated and handled by Segment).

see results:
query_results.txt

(Looking at the full source dataset from Segment, I find 6 different dates for the same ids, going back as far as 2021-09-22.)

How

  • Airbyte should attach its own date data to the metadata payload, recording when the tracking call is triggered, so we don't rely on the common fields handled by Segment: https://segment.com/docs/connections/spec/common/#timestamps
  • Additionally, some docker repositories may be masked and overridden (for example in the case of SSL-strict destinations), so the connector definition id alone is not enough to distinguish between actual connector implementations. Surfacing the docker repository would help track this properly.
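
For illustration, the two points above could look roughly like the sketch below. This is not the code from this PR: the class name, the `buildMetadata` helper, and the property keys `connector_definition_id`, `docker_repository` and `tracked_at` are all hypothetical, chosen only to show a payload that carries its own timestamp and the docker repository.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class TrackingMetadataSketch {

  // Hypothetical helper: builds the extra properties attached to a tracking call.
  // The property keys are illustrative assumptions, not the ones used in the PR.
  static Map<String, Object> buildMetadata(final String connectorDefinitionId,
                                           final String dockerRepository,
                                           final Instant trackedAt) {
    final Map<String, Object> metadata = new HashMap<>();
    metadata.put("connector_definition_id", connectorDefinitionId);
    // The definition id alone cannot tell apart an overridden image,
    // so the docker repository is sent alongside it.
    metadata.put("docker_repository", dockerRepository);
    // Timestamp taken by the caller when the tracking call is triggered,
    // independent of the timestamps Segment adds on its side.
    metadata.put("tracked_at", trackedAt.toEpochMilli());
    return metadata;
  }

  public static void main(final String[] args) {
    // Placeholder values, for demonstration only.
    final Map<String, Object> metadata =
        buildMetadata("example-definition-id", "airbyte/example-destination", Instant.now());
    System.out.println(metadata);
  }
}
```

Passing the `Instant` in as a parameter, rather than calling `Instant.now()` inside the helper, keeps the sketch easy to test with a fixed time.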

@github-actions github-actions bot added area/platform issues related to the platform area/scheduler labels Dec 17, 2021
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets December 17, 2021 14:29 Inactive
@ChristopheDuong ChristopheDuong marked this pull request as ready for review December 17, 2021 14:45
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets December 17, 2021 16:44 Inactive
@@ -96,6 +96,7 @@
final JobOutput jobOutput = lastAttempt.getOutput().get();
if (jobOutput.getSync() != null) {
final StandardSyncSummary syncSummary = jobOutput.getSync().getStandardSyncSummary();
metadata.put("sync_start_time", syncSummary.getStartTime());
Contributor

Should we also add the end time?

Contributor Author

The duration is computed using the end time, and we now send the start time as well, so the end time can always be recomputed from the two.
Let's leave it out to keep the amount of data transiting through Segment a little smaller.
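
As a rough sketch of that recomputation: the snippet below assumes the start time is in epoch milliseconds and the tracked duration is in whole seconds. Those units, and the `SyncTimes` class with its method names, are assumptions for illustration and are not confirmed by the diff above.

```java
final class SyncTimes {

  // Assumption: duration is tracked in whole seconds, rounded from a millisecond difference.
  static long durationSeconds(final long startTimeMillis, final long endTimeMillis) {
    return Math.round((endTimeMillis - startTimeMillis) / 1000.0);
  }

  // Reverse computation: recover the end time from the start time and the duration.
  static long endTimeMillis(final long startTimeMillis, final long durationSeconds) {
    return startTimeMillis + durationSeconds * 1000L;
  }

  public static void main(final String[] args) {
    final long start = 1_000_000L;
    final long end = 1_090_400L;
    final long duration = durationSeconds(start, end);
    System.out.println("duration (s): " + duration);
    System.out.println("recomputed end (ms): " + endTimeMillis(start, duration));
  }
}
```

Under these assumptions the recomputed end time is only as precise as the rounded duration, so it can differ from the true end time by up to half a second.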

@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets January 5, 2022 10:42 Inactive
@github-actions github-actions bot added the area/connectors Connector related issues label Jan 14, 2022
@ChristopheDuong ChristopheDuong temporarily deployed to more-secrets January 14, 2022 12:06 Inactive
@ChristopheDuong ChristopheDuong merged commit dbddd7b into master Jan 14, 2022
@ChristopheDuong ChristopheDuong deleted the chris/add-tracking-metadata branch January 14, 2022 12:37
Labels
area/connectors Connector related issues area/platform issues related to the platform area/scheduler

3 participants