
core/rawdb, cmd/geth, core/eth, vendor: fixes and inspect tool #15

Merged May 14, 2019

Conversation


@rjl493456442 rjl493456442 commented Apr 26, 2019

This PR adds a few things:
(1) a database inspector
(2) an ancient store path checker
(3) an ancient store migrator
(4) support for an external ancient store
(5) a few miscellaneous fixes

@rjl493456442 rjl493456442 requested a review from karalabe as a code owner April 26, 2019 10:38
@rjl493456442 rjl493456442 force-pushed the freezer-fix branch 6 times, most recently from 3a5d4e4 to 639ab04 on April 29, 2019 10:59
@@ -166,6 +167,36 @@ Remove blockchain and state databases`,
The arguments are interpreted as block numbers or hashes.
Use "ethereum dump 0" to dump the genesis block.`,
}
upgradeAncientCommand = cli.Command{
karalabe (Owner):

I'm unsure that we want to have an upgrade command. I'd rather rely on the user messing up their db manually than us facilitating it. Upgrading the ancients is essentially moving some files on disk.

What we might think about is implicitly moving it when the user specified a different ancient dir than last time. Or is that too dangerous?

rjl493456442 (Author) on May 2, 2019:

Ancient path inconsistencies are mostly caused by user error. For example, datadir.ancient == ./foo resolves differently from datadir.ancient == foo. So I think we should prevent this kind of mistake: record the path in the database, and if the recorded path doesn't match, print a warning to the user and terminate the node.

Regarding the ancient upgrade, users may also legitimately need to move the ancient data (e.g. from SSD to HDD), so I offer this command as well.

But maybe you are right that with this approach users have to know about the ancient store internals; we can give the implicit-move approach a try.

However, if we move the ancient data to another directory and the process crashes before the operation completes, we will never know where the ancient store is.
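
A minimal sketch, in Go, of the consistency check described above. ReadAncientPath is the accessor added in this PR (visible in a later snippet); the function name, the string-based signature and the idea of a write counterpart for fresh databases are illustrative assumptions, not the merged code:

package ancientcheck

import (
    "fmt"

    "github.com/ethereum/go-ethereum/log"
)

// ensureAncientPath compares the ancient store path recorded in the key-value
// database (read by the caller via ReadAncientPath) against the path requested
// on this run. An empty stored path means a fresh database, in which case the
// caller is expected to persist the requested path for future runs.
func ensureAncientPath(stored, requested string) error {
    if stored == "" {
        return nil // fresh database, nothing recorded yet
    }
    if stored != requested {
        log.Warn("Ancient store path mismatch", "stored", stored, "requested", requested)
        return fmt.Errorf("ancient store previously at %q, now %q: refusing to start", stored, requested)
    }
    return nil
}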

karalabe (Owner):

On the other hand, it makes sense to run it as a separate command, because it may take an arbitrary amount of time. Let's keep it for now, but rename it to geth migratedb and support moving both the ancients as well as the main db (just for completeness).

}()
rawdb.InspectDatabase(chainDb, stopCh)
return nil
}
karalabe (Owner):

This is a problematic wrapper (not too useful), because we can't print progress. Imho we should just call InspectDatabase here and, inside that, periodically (e.g. every 1k/10k keys) check whether X seconds have passed since the last printout and ping the user then, along with the key the inspection is currently at; see the sketch after the log excerpt below. Otherwise the progress report is mostly useless:

INFO [05-02|12:19:18.977] Inspecting database...
INFO [05-02|12:19:28.977] Inspecting database...
INFO [05-02|12:19:38.977] Inspecting database...
[... the same line repeated every 10 seconds ...]
INFO [05-02|12:26:48.977] Inspecting database...
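
A rough sketch of the time-throttled report described above. The keyIterator stand-in and the 8-second threshold are assumptions for illustration; the point is checking the wall clock inside the key loop and logging the count, the current key and the elapsed time:

package inspect

import (
    "time"

    "github.com/ethereum/go-ethereum/log"
)

// keyIterator is a stand-in for the database iterator used by InspectDatabase.
type keyIterator interface {
    Next() bool
    Key() []byte
}

// inspectWithProgress walks all keys and logs progress at most every 8 seconds,
// including the number of keys seen, the key currently being inspected and the
// elapsed time since the start.
func inspectWithProgress(it keyIterator) {
    var (
        count  uint64
        start  = time.Now()
        logged = time.Now()
    )
    for it.Next() {
        count++
        // ... accumulate per-category sizes here, as InspectDatabase does ...
        if time.Since(logged) > 8*time.Second {
            log.Info("Inspecting database", "count", count,
                "at", it.Key(), "elapsed", time.Since(start))
            logged = time.Now()
        }
    }
}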

rjl493456442 (Author) commented May 5, 2019:

I updated the PR, but a few things remain:

  • Regarding the user-friendly log for database inspection, I dump the count and elapsed time every 100M keys. On my laptop it takes about 1s to iterate 1M keys, so I think that is a reasonable interval.
    But we can't print the latest key, since some of the keys are garbled.
  • Regarding migratedb, I think it is unnecessary for the main db (leveldb), because the main db is embedded in the datadir.

// ----------------+------------------+----------------------
// freezer == nil | non-freezer mode | ancient store missing
// freezer != nil | initialize | ensure consistency
stored := ReadAncientPath(kvdb)

Should we do some sort of canonicalization here? I mean, the path can be unexpanded (~/ancient), relative ./foo/ancient or a soft link, right?

rjl493456442 (Author):

We only store the absolute path for the ancient store.

But for soft links... hmmm
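
A possible canonicalization step for the stored path, covering the unexpanded ~/, relative-path and symlink cases raised above. This is purely a standard-library sketch, not part of the PR:

package ancientpath

import (
    "os"
    "path/filepath"
    "strings"
)

// canonicalAncientPath expands a leading "~/", makes the path absolute and
// resolves symlinks, so that ./foo, foo and /home/user/foo compare equal.
func canonicalAncientPath(path string) (string, error) {
    if strings.HasPrefix(path, "~/") {
        home, err := os.UserHomeDir()
        if err != nil {
            return "", err
        }
        path = filepath.Join(home, path[2:])
    }
    abs, err := filepath.Abs(path)
    if err != nil {
        return "", err
    }
    // EvalSymlinks fails if the path does not exist yet; in that case fall
    // back to the plain absolute path.
    if resolved, err := filepath.EvalSymlinks(abs); err == nil {
        return resolved, nil
    }
    return abs, nil
}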

log.Warn("Ancient database migration aborted")
default:
// TODO(rjl493456442) This only works on the unix/linux, but not windows.
if err := os.Rename(stored, freezer); err != nil {

I also think that the rename operation does not work across filesystem boundaries, so if you rename from an SSD to an HDD it will break. I've hit that problem in Python with the evmfuzzer, trying to move stuff via rename.
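
A sketch of a move that tolerates both Windows rename semantics and cross-filesystem boundaries: try os.Rename first and fall back to copy-then-delete. The copyFile parameter stands in for a streaming, fsync-ing helper such as the copyFileSynced reviewed below; the function itself is illustrative, not the merged code:

package ancientmove

import (
    "os"
    "path/filepath"
)

// moveDir first tries an atomic rename; if that fails (e.g. the destination is
// on a different filesystem, or Windows refuses the rename), it falls back to
// copying every file into the destination and removing the source afterwards.
func moveDir(src, dst string, copyFile func(src, dst string, info os.FileInfo) error) error {
    if err := os.Rename(src, dst); err == nil {
        return nil
    }
    err := filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        rel, err := filepath.Rel(src, path)
        if err != nil {
            return err
        }
        target := filepath.Join(dst, rel)
        if info.IsDir() {
            return os.MkdirAll(target, info.Mode())
        }
        return copyFile(path, target, info)
    })
    if err != nil {
        return err
    }
    return os.RemoveAll(src)
}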

@rjl493456442 rjl493456442 force-pushed the freezer-fix branch 3 times, most recently from 5985660 to 804500f on May 9, 2019 05:58
holiman left a comment:

It's looking good, I left a few minor comments.

// copyFileSynced copies data from source file to destination
// and syncs the dest file forcibly.
func copyFileSynced(src string, dest string, info os.FileInfo) error {
data, err := ioutil.ReadFile(src)

If these are ancient files, they will most likely be 2GB each. Could it be a problem to do ReadFile on such large files? Not sure myself, just making a note here.
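
A streaming alternative, which is the direction the PR eventually took (see the "streaming file copy" commit at the bottom): copy through io.Copy instead of slurping the whole 2GB file into memory, then fsync the destination. A sketch, not the exact merged code:

package filecopy

import (
    "io"
    "os"
)

// copyFileStreamed copies src to dest without loading the whole file into
// memory, preserves the source file mode and fsyncs the destination before
// returning.
func copyFileStreamed(src, dest string, info os.FileInfo) error {
    in, err := os.Open(src)
    if err != nil {
        return err
    }
    defer in.Close()

    out, err := os.OpenFile(dest, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, info.Mode())
    if err != nil {
        return err
    }
    defer out.Close()

    if _, err := io.Copy(out, in); err != nil {
        return err
    }
    return out.Sync()
}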

// All rights reserved.
//
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

Our LICENSE file does not contain a BSD-style license, so maybe leave the rest intact, but remove the "that can be found in the LICENSE file" bit?

// All rights reserved.
//
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

same here


// Inject all hash<->number mapping.
for i := uint64(0); i < frozen; i++ {
rawdb.WriteHeaderNumber(bc.db, rawdb.ReadCanonicalHash(bc.db, i), i)

At this point we will iterate through the entire header chain from zero. Since we're doing that anyway, could we remove the first check (the last header) and instead check that every header is present here? And if we ever encounter an empty common.Hash, we can simply write the last one and then either truncate the ancients or error out.
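
A hedged sketch of that suggestion as a drop-in variation of the loop above, reusing the rawdb accessors already visible in the snippet; the empty-hash check and the error message are assumptions (truncating the ancients would be the alternative to erroring out):

// Inject all hash<->number mappings, verifying along the way that every frozen
// block actually has a canonical hash. On the first gap, bail out (truncating
// the ancients back to this point would be the other option).
for i := uint64(0); i < frozen; i++ {
    hash := rawdb.ReadCanonicalHash(bc.db, i)
    if hash == (common.Hash{}) {
        return nil, fmt.Errorf("broken ancient database: missing canonical hash at #%d", i)
    }
    rawdb.WriteHeaderNumber(bc.db, hash, i)
}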


// errInvalidDatadir is returned if the ancient directory specified by user
// is a symbolic link.
errInvalidDatadir = errors.New("symbolic link datadir is not supported")

Perhaps give it a slightly more descriptive name.

Suggested change
errInvalidDatadir = errors.New("symbolic link datadir is not supported")
errSymlinkDatadir = errors.New("symbolic link datadir is not supported")

karalabe (Owner) left a comment:

Minor nits

rawdb.WriteHeadHeaderHash(bc.db, hash)
rawdb.WriteHeadFastBlockHash(bc.db, hash)

log.Info("Initialize chain with ancient", "number", frozen-1, "hash", hash)
karalabe (Owner):

"Initialize" -> "Initialized"

karalabe (Owner):

"ancient" -> "ancients"

return nil, errors.New("broken ancient database")
}
rawdb.WriteHeaderNumber(bc.db, hash, i)
}
karalabe (Owner):

Don't we need to write the tx lookups too?
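
Something along these lines inside the same loop would cover the tx lookups; rawdb.ReadBlock and rawdb.WriteTxLookupEntries existed with roughly these shapes at the time, but treat the exact names and signatures as an assumption (the "write txlookup during ancient init" commit at the bottom is what actually landed):

// Re-create the tx hash -> block number lookups for the frozen blocks too,
// otherwise transaction lookups for ancient blocks would come back empty.
if block := rawdb.ReadBlock(bc.db, hash, i); block != nil {
    rawdb.WriteTxLookupEntries(bc.db, block)
}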

trieSize += size
}
count += 1
if count%100000000 == 0 {
karalabe (Owner):

I think this is not the best way, because on a slow machine (HDD) this interval may be too long. I'd rather maintain a time.Time of the last time we logged, and if more than 8 seconds have passed, log again.

karalabe (Owner):

If we want to avoid doing too many time.Since ops, we can perhaps do: if count%1000 == 0 && time.Since(logged) > 8*time.Second
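
I.e. roughly this inside the inspection loop (a sketch; logged and start are assumed time.Time variables initialized before the loop):

count++
if count%1000 == 0 && time.Since(logged) > 8*time.Second {
    log.Info("Inspecting database", "count", count, "elapsed", time.Since(start))
    logged = time.Now()
}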

rjl493456442 (Author) commented May 14, 2019:

Updated, and also bumped the database version.

karalabe (Owner):
SGTM, let's merge and continue the discussion on the main PR.

@karalabe karalabe merged this pull request into karalabe:freezer-2 May 14, 2019
karalabe pushed a commit that referenced this pull request May 16, 2019
* core, eth: some fixes for freezer

* vendor, core/rawdb, cmd/geth: add db inspector

* core, cmd/utils: check ancient store path forcibly

* cmd/geth, common, core/rawdb: a few fixes

* cmd/geth: support windows file rename and fix rename error

* core: support ancient plugin

* core, cmd: streaming file copy

* cmd, consensus, core, tests: keep genesis in leveldb

* core: write txlookup during ancient init

* core: bump database version