Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve sync_ch performance in Django debug mode #265

Merged
merged 12 commits into from
Jun 21, 2017

Conversation

reupen
Copy link
Contributor

@reupen reupen commented Jun 16, 2017

In debug mode, django keeps a history of all SQL statements executed. This was causing the sync_ch command to exhaust the available memory. The main change here is to call django.db.reset_queries() to clear that history after each batch is inserted in the sync_ch command.

There's also more logging added to give more visibility of what the command is doing, and changes to the local logging config so that standard Python logging works more seamlessly.

@reupen reupen requested review from canni and elcct June 16, 2017 15:46
@codecov-io
Copy link

codecov-io commented Jun 16, 2017

Codecov Report

Merging #265 into develop will increase coverage by 0.25%.
The diff coverage is 92%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #265      +/-   ##
===========================================
+ Coverage    93.39%   93.64%   +0.25%     
===========================================
  Files           56       56              
  Lines         2044     2047       +3     
  Branches       189      190       +1     
===========================================
+ Hits          1909     1917       +8     
+ Misses         114      109       -5     
  Partials        21       21
Impacted Files Coverage Δ
datahub/core/utils.py 80% <100%> (-0.65%) ⬇️
datahub/company/management/commands/sync_ch.py 86.25% <87.5%> (+7.3%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9135011...2fe0bbc. Read the comment docs.

class Command(BaseCommand):
"""Companies House sync command."""

def handle(self, *args, **options):
"""Handle."""
try:
sync_ch(tmp_file_creator=tempfile.TemporaryFile, truncate_first=True)
except Exception as e:
with log_and_ignore_exceptions():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove that? If sentry is down (and this happened a couple of times), the whole process will blow up without this.

Copy link
Contributor Author

@reupen reupen Jun 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what? It's already blown up by this point and the exception chain will be there.

The old code was broken because self.stderr.write(e) takes a string and not an exception. Even if you fixed it, it would mess up the exit code because it was swallowing up any errors.

@reupen
Copy link
Contributor Author

reupen commented Jun 19, 2017

The explicit capturing of CLI command exceptions for Sentry is no longer needed, because they fixed the bug stopping that happening automatically in Raven 6.1.

@reupen reupen merged commit 7be9a77 into develop Jun 21, 2017
@reupen reupen deleted the feature/DH-342-improve-sync-ch branch June 21, 2017 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants