Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

dm-worker: refine db operation #236

Merged
merged 70 commits into from
Aug 26, 2019
Merged

Conversation

3pointer
Copy link
Contributor

@3pointer 3pointer commented Aug 12, 2019

What problem does this PR solve?

  1. Unify db operation in Syncer and Loader into one pkg
  2. Remove syncer.maxRetry config

What is changed and how it works?

  1. Use baseConn struct to connect db, both Syncer and Loader hold the baseConn, which can take same retry policy in the future.
  2. Remove syncer.maxRetry config, In my option, if we can define error type in the future, when we met error we need to retry, we should retry. user may not realize the error we met, and don't know how many retry times need to be set, So make the user controlling the maxRetry times is a bit improperly.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

@3pointer 3pointer added status/WIP This PR is still work in progress type/feature New feature type/feature-request This issue is a feature request labels Aug 12, 2019
@3pointer
Copy link
Contributor Author

/run-all-tests

2 similar comments
@3pointer
Copy link
Contributor Author

/run-all-tests

@3pointer
Copy link
Contributor Author

/run-all-tests

@3pointer 3pointer force-pushed the refine_db_operation branch 2 times, most recently from 5b2d3e2 to a0ad3d3 Compare August 13, 2019 07:59
@3pointer
Copy link
Contributor Author

/run-all-tests

2 similar comments
@3pointer
Copy link
Contributor Author

/run-all-tests

@3pointer
Copy link
Contributor Author

/run-all-tests

@3pointer 3pointer force-pushed the refine_db_operation branch from a2cb07a to 837e0a4 Compare August 14, 2019 07:56
@3pointer
Copy link
Contributor Author

/run-all-tests

@codecov
Copy link

codecov bot commented Aug 14, 2019

Codecov Report

Merging #236 into master will increase coverage by 1.195%.
The diff coverage is 62.6582%.

@@               Coverage Diff               @@
##             master       #236       +/-   ##
===============================================
+ Coverage   58.6197%   59.8148%   +1.195%     
===============================================
  Files           125        128        +3     
  Lines         15360      14366      -994     
===============================================
- Hits           9004       8593      -411     
+ Misses         5468       4924      -544     
+ Partials        888        849       -39

@3pointer 3pointer added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Aug 15, 2019
@3pointer 3pointer changed the title Refine db operation dm-worker: refine db operation Aug 15, 2019
@3pointer
Copy link
Contributor Author

/run-all-tests

@3pointer 3pointer added the priority/important Major change, requires approval from ≥2 primary reviewers label Aug 15, 2019
pkg/utils/db.go Outdated
func (conn *BaseConn) NormalRetryOperation(ctx *tcontext.Context, operateFn func(*tcontext.Context, int) (interface{}, error), retryFn func(int, error) bool) (interface{}, error) {
var err error
var ret interface{}
for i := 0; i < 100; i++ {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use hard code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just let retryFn to control when to exit?

Copy link
Collaborator

@IANTHEREAL IANTHEREAL Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retryFn may also need a tcontext argument

Copy link
Collaborator

@IANTHEREAL IANTHEREAL Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we sleep a duration?

pkg/utils/db.go Outdated
}

// InfinityRetryOperation retry operation until succeed
func (conn *BaseConn) InfinityRetryOperation(operateFn func() (interface{}, error), retryFn func(error) bool) (interface{}, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do operateFn and retryFn need tcontext argument?

pkg/utils/db.go Outdated

// QuerySQL query sql
func (conn *BaseConn) QuerySQL(tctx *tcontext.Context, query string) (*sql.Rows, error) {
if conn == nil || conn.DB == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also check it in NormalRetryOperation and InfinityRetryOperation

pkg/utils/db.go Outdated
tctx.L().Error("rollback failed", zap.String("Query", sqls[i].Query), zap.Reflect("argument", sqls[i].Args), log.ShortError(rerr))
}
// we should return the exec err, instead of the rollback rerr.
return i, errors.Trace(err)
Copy link
Collaborator

@IANTHEREAL IANTHEREAL Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the new error system in #216

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR need rebase master after #216 merged

"github.com/pingcap/dm/dm/config"
tcontext "github.com/pingcap/dm/pkg/context"
"github.com/pingcap/dm/pkg/log"
"github.com/pingcap/dm/pkg/utils"
Copy link
Collaborator

@IANTHEREAL IANTHEREAL Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We divide the import package path into three sections

  • built-in packages
  • dm packages
  • other packages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

syncer/relay.go Outdated
@@ -60,7 +60,7 @@ func (s *Syncer) setInitActiveRelayLog() error {
}
} else {
// start from dumper or loader, get current pos from master
pos, _, err = utils.GetMasterStatus(s.fromDB.db, s.cfg.Flavor)
pos, _, err = utils.GetMasterStatus(s.fromDB.baseConn.DB, s.cfg.Flavor)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap a function to make it like s.fromDB.GetMasterStatus?

loader/db.go Outdated
func (conn *Conn) executeSQL2(ctx *tcontext.Context, stmt string, maxRetry int, args ...interface{}) error {
if conn == nil || conn.db == nil {
return errors.NotValidf("database connection")
func (conn *Conn) executeSQL(ctx *tcontext.Context, ddl bool, queries []string, args ...[]interface{}) error {
Copy link
Collaborator

@IANTHEREAL IANTHEREAL Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we replace ddl bool argument by other argument like retryFn?

@IANTHEREAL
Copy link
Collaborator

how to reset all connections in one unit to handle error invalid connection/bad connection?

pkg/utils/db.go Outdated
}

// InfinityRetryOperation retry operation until succeed
func (conn *BaseConn) InfinityRetryOperation(operateFn func() (interface{}, error), retryFn func(error) bool) (interface{}, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems this function has no relationship with BaseConn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, its useless, removed already

@IANTHEREAL
Copy link
Collaborator

IANTHEREAL commented Aug 16, 2019

one important question: under what circumstances we will receive an invalid connection error?you can learn from code of go-mysql.
And I think we need to have a strategy to avoid the situation that cannot do replication because of frequent invalid connection. We prefer slow replication than going into the error recovery process frequently (because binlog replication unit doesn't save checkpoint ASAP, it would re-replicate from old checkpoint again and again)

@IANTHEREAL IANTHEREAL removed the status/PTAL This PR is ready for review. Add this label back after committing new changes label Aug 24, 2019
Copy link
Member

@csuzhangxc csuzhangxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM.
also see #236 (comment) again.

@csuzhangxc
Copy link
Member

how about adding slow log (if cost > 1) for syncer/db.go?

@3pointer
Copy link
Contributor Author

3pointer commented Aug 25, 2019

how about adding slow log (if cost > 1) for syncer/db.go?

you mean log as same as loader/db.go such like ctx.L().Warn("transaction execute successfully", zap.Duration("cost time", cost))?

@csuzhangxc
Copy link
Member

how about adding slow log (if cost > 1) for syncer/db.go?

you mean log as same as loader/d.go such like ctx.L().Warn("transaction execute successfully", zap.Duration("cost time", cost))?

YES, it is.

@3pointer
Copy link
Contributor Author

how about adding slow log (if cost > 1) for syncer/db.go?

you mean log as same as loader/d.go such like ctx.L().Warn("transaction execute successfully", zap.Duration("cost time", cost))?

YES, it is.

ok, fixed

Copy link
Member

@csuzhangxc csuzhangxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@csuzhangxc csuzhangxc added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Aug 25, 2019
Copy link
Contributor

@amyangfei amyangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amyangfei amyangfei added status/LGT3 Three reviewers already commented LGTM, ready for merge and removed status/LGT2 Two reviewers already commented LGTM, ready for merge labels Aug 25, 2019
@3pointer 3pointer merged commit 2ed7cb6 into pingcap:master Aug 26, 2019
@3pointer 3pointer deleted the refine_db_operation branch August 26, 2019 02:38
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020
* syncer db conn

* loader db refine

* remove useless code

* add db retry policy

* remove syncer max retry config

* add normal retry policy

* update retry strategy

* add context for retry

* update to terror

* add invalid strategy

* update retry invalid connection

* update retry strategy

* add retry strategy comment

* add reset connection logic

* use same baseConn in Conns

* udpate syncer dbs

* unify query in log

* share same *DB by all conns

* split load execute sql with different retryFn

* unify retry logic to pkg

* put retry error together

* unify syncer conn helper function

* update retry logic and some tests error formats

* add baseConn test

* make baseconn a pkg

* abstract retry strategy

* add querySQL args

* add whiteList error retry logic

* unify retry error in syncer/loader

* keep max-retry in syncerConifg
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
priority/important Major change, requires approval from ≥2 primary reviewers status/LGT3 Three reviewers already commented LGTM, ready for merge type/feature New feature type/feature-request This issue is a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants