Performance issues in new Mongo Source Connector #6544
Comments
About 20 collections; some have tens of millions of records.
On Fri, Oct 1, 2021 at 7:24 PM Marcos Marx wrote:
Hi @JCWahoo, I was able to sync the sample_training dataset in 8 min. Can you give more context about your dataset (rows/MB)?
The old connector runs around 16 collections hourly in ~5 minutes (~10k records). It doesn't have the same issues with schema discovery or the initial sync.
Tried the latest version and the issue is still present.
Are there any updates that can be shared?
Same behavior in the latest version, 0.1.3.
@JCWahoo, sorry for not answering you sooner. I'll raise this with the connector team.
I upgraded both the Strict and the V2 connectors and see no change in performance.
@JCWahoo, could you please provide more details so that we can improve this for you? Our team tested the new changes on multiple MongoDB databases with multiple collections and different field types, and we currently see faster retrieval of collections and their fields. I've attached a video showing how fast it runs. The test was carried out on a database with one collection containing many different fields and 10 thousand records. mongodbtest.mp4
Yes, I provided those stats previously. The database has around 40-50 collections, some of which have upwards of 100 million records. The old connector returns the schema in ~60-90 seconds. The new connector returns the schema in around 30 minutes, if at all. Monitoring on the Mongo side, the new connector appears to parse the entire collection for schema discovery rather than just a 10,000-record sample. I recommend testing on larger collections and watching real-time monitoring on the Mongo side to see what I'm talking about. I'm currently running a nightly incremental sync of 100k-150k records with the old connector in about 20 minutes. With the new connector I cannot complete an incremental sync in less than an hour, even when testing a single stream and running the incremental sync immediately after a full refresh.
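To illustrate the difference being described between a full scan and sample-based schema discovery, here is a minimal sketch. It is not the connector's actual code; `SAMPLE_SIZE`, `sample_pipeline`, and `infer_schema` are hypothetical names. It builds a MongoDB aggregation pipeline whose first stage is `$sample`, so the server returns a random subset instead of every document, then records which value types appear for each top-level field. According to the MongoDB documentation, when `$sample` is the first stage, the sample is under 5% of the collection, and the collection has more than 100 documents, the server uses a pseudo-random cursor rather than scanning the whole collection. A 10,000-document sample from a collection of 100 million records falls well within that case.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set

# Hypothetical default, mirroring the ~10,000-record sample mentioned in the thread.
SAMPLE_SIZE = 10_000


def sample_pipeline(size: int = SAMPLE_SIZE) -> List[Dict[str, Any]]:
    """Aggregation pipeline asking MongoDB for `size` random documents.

    Because $sample is the first (and only) stage, the server can avoid
    reading the whole collection when `size` is small relative to it.
    """
    return [{"$sample": {"size": size}}]


def infer_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each top-level field name to the set of type names seen in `docs`."""
    schema: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            # type name, e.g. "str", "int", "NoneType" (or BSON types such as "ObjectId")
            schema[field].add(type(value).__name__)
    return dict(schema)


# With pymongo (assumed), the sample would be fed straight into the inference:
#   schema = infer_schema(db["my_collection"].aggregate(sample_pipeline()))
```

The cost of discovery then scales with the sample size rather than the collection size, which is the behavior the old connector appears to have had.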
+1. I didn't have an opportunity to try the old connector, but the new one feels slow. It took around 10-15 minutes to return the schema for a database with 10 collections and around 13 million documents.
The 0.1.9 update resolved the issue. I'll give this one a shot next week to confirm.
On Fri, Dec 17, 2021 at 10:47 AM andriikorotkov wrote:
@JCWahoo, @TSkrebe: we added a new version, 0.1.19. We tested these changes on 20 collections holding between 110 and 250 thousand documents. Please try the new version and report your results.
Environment
Current Behavior
When setting up a new source with this connector, schema discovery takes close to 50 minutes and appears to scan entire collections in the source Mongo database.
When syncing records, an incremental load of a single stream/collection with fewer than 10k records takes more than 1 hour. By comparison, with the old Mongo connector I can refresh 20 streams/collections in around 8 minutes. The new connector seems to scan the entire collection, unlike the old Ruby source.
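For context on why an incremental load should not need to touch the whole collection, here is a minimal sketch of a cursor-based incremental read. It is an assumption about how such a read can be structured, not a description of either connector's code; `incremental_query`, `next_state`, and the `updated_at` cursor field are hypothetical. Only documents whose cursor field is greater than the saved state are requested, sorted ascending, and the largest value seen becomes the next state. The filter only avoids a full scan if the cursor field is indexed; without an index MongoDB still has to examine every document to evaluate `$gt`.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def incremental_query(
    cursor_field: str, last_value: Optional[Any]
) -> Tuple[Dict[str, Any], List[Tuple[str, int]]]:
    """Build a (filter, sort) pair for reading only new or changed documents.

    With no saved state (first sync) the filter is empty, i.e. a full read.
    """
    query: Dict[str, Any] = {} if last_value is None else {cursor_field: {"$gt": last_value}}
    sort = [(cursor_field, 1)]  # 1 == ascending (pymongo.ASCENDING)
    return query, sort


def next_state(
    records: Iterable[Dict[str, Any]], cursor_field: str, last_value: Optional[Any]
) -> Optional[Any]:
    """Return the largest cursor value seen, falling back to the previous state."""
    state = last_value
    for rec in records:
        value = rec.get(cursor_field)
        if value is not None and (state is None or value > state):
            state = value
    return state


# With pymongo (assumed), an index on the cursor field lets the $gt filter use an
# index scan instead of a collection scan:
#   coll.create_index([("updated_at", 1)])
#   query, sort = incremental_query("updated_at", saved_state)
#   docs = list(coll.find(query).sort(sort))
#   saved_state = next_state(docs, "updated_at", saved_state)
```

Running `explain()` on such a query in the Mongo shell and checking for `IXSCAN` versus `COLLSCAN` is one way to confirm on the Mongo side whether a sync is reading the whole collection.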
Expected Behavior
Performance comparable to the old connector, with no 50-minute delay before records are retrieved.
Logs
Attaching logs from the initial full sync. Note the 50-minute gap before records are returned.
Also including logs from the next incremental sync, which shows the same gap.
LOG from initial Full Sync
LOG from incremental sync
Steps to Reproduce
Are you willing to submit a PR?
Unfortunately, I cannot at this time.