In-place OSS 1.x to 2.x upgrade #19308

Closed
stuartcarnie opened this issue Aug 12, 2020 · 4 comments
Labels: area/2.x (OSS 2.0 related issues and PRs), team/storage


@stuartcarnie
Contributor

stuartcarnie commented Aug 12, 2020

What

Provide tooling to upgrade a user running InfluxDB 1.7 / 1.8 to InfluxDB 2.0.

Requirements

  • Provide a subcommand, influxd upgrade, that can restructure existing TSM data and update the boltdb database with the metadata from meta.db

Upgrade Steps

It is assumed that the user has progressed through the upgrade to the point where their org, user, and other required metadata have been created in the local Bolt database.

The tool expects to be pointed to an existing InfluxDB 1.x directory (the "source directory") where the meta.db and time-series data is stored.

  • Read meta.db
  • Scan "source directory" for wal and data
  • For each of the DB/RP pair, create
    • a bucket named "/" in the 2.0 metadata store (boltdb)
    • a DB/RP mapping in the DBRPMappingV2 service
  • Reshape the meta.db data, which stores database, retention policy and shard information.
    • As in 1.x, this is still serialized as a single blob, stored under a single key within boltdb.
    • The meta client and serialization are available in 2.x and can be used to read from and write to a meta.db file or bolt db.
    • Reshaping entails creating a database and retention policy for each bucket and copying their respective shard groups from the original meta.db
  • Move or copy (user option?) the wal and TSM data directories to their respective destination directories, as described in the design below. Do not copy the _series and TSI index directories; these will be regenerated on startup.
    • A future improvement may be to copy the _series and index directories, if they exist, to reduce initial startup time.
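The steps above can be sketched in Go against simplified, in-memory stand-ins for the meta.db structures. V1Database, V1RetentionPolicy, NewDatabase, reshape and shouldCopy are hypothetical names, not the influxd types or the meta client API. Following the layout in the Design section, the sketch assumes each reshaped database is named after its bucket ID and holds a single autogen retention policy that keeps the original shard groups; the source DB/RP is carried along as the DBRP mapping to register.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// V1RetentionPolicy and V1Database are hypothetical, simplified views of meta.db.
type V1RetentionPolicy struct {
	Name          string
	ShardGroupIDs []uint64
}

type V1Database struct {
	Name              string
	RetentionPolicies []V1RetentionPolicy
}

// NewDatabase is the reshaped metadata for one bucket: a database named after
// the bucket ID with a single "autogen" retention policy.
type NewDatabase struct {
	Name          string // bucket ID
	RP            string // always "autogen"
	ShardGroupIDs []uint64
	SourceDB      string // DBRP mapping: original 1.x database
	SourceRP      string // DBRP mapping: original 1.x retention policy
}

// reshape creates one bucket database per 1.x DB/RP pair and copies that
// pair's shard groups. newID is a hypothetical bucket ID generator.
func reshape(dbs []V1Database, newID func() string) []NewDatabase {
	var out []NewDatabase
	for _, db := range dbs {
		for _, rp := range db.RetentionPolicies {
			out = append(out, NewDatabase{
				Name:          newID(),
				RP:            "autogen",
				ShardGroupIDs: append([]uint64(nil), rp.ShardGroupIDs...),
				SourceDB:      db.Name,
				SourceRP:      rp.Name,
			})
		}
	}
	return out
}

// shouldCopy reports whether a path relative to the 1.x data or wal directory
// should be moved/copied; _series and TSI index directories are skipped
// because they are regenerated on startup.
func shouldCopy(relPath string) bool {
	for _, part := range strings.Split(filepath.ToSlash(relPath), "/") {
		if part == "_series" || part == "index" {
			return false
		}
	}
	return true
}

func main() {
	n := 0
	newID := func() string { n++; return fmt.Sprintf("%016x", n) }
	dbs := []V1Database{{Name: "my-db-1", RetentionPolicies: []V1RetentionPolicy{
		{Name: "default", ShardGroupIDs: []uint64{1, 2}},
		{Name: "1year", ShardGroupIDs: []uint64{3}},
	}}}
	for _, d := range reshape(dbs, newID) {
		fmt.Printf("%s/%s <- %s/%s shard groups %v\n", d.Name, d.RP, d.SourceDB, d.SourceRP, d.ShardGroupIDs)
	}
	fmt.Println(shouldCopy("my-db-1/default/1/000000001-000000001.tsm"))
	fmt.Println(shouldCopy("my-db-1/_series/00/0000"))
}
```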

Design

The layout described in the following sections was discussed at length with Paul Dix, Edd and others.

On Disk Structure

Every bucket will have its own database and retention policy. To state it another way, every database will have exactly one retention policy, which is the bucket.

Creating two buckets, bucket-a and bucket-b, would result in the following:

data/
  bucket-a/
    autogen/
  bucket-b/
    autogen/

NOTE: The bucket metadata will be separate from the TSM 1 metadata (database name, retention policy name, shards, etc.). As with 1.x, the database and retention policy names are immutable. Other metadata, such as shard duration, may be exposed via subcommands of the influx CLI tool.

Migration

Given the above property, if the user has the following existing structure:

data/
    my-db-1/
        default/
        1year/
    my-db-2/
        autogen/
        1year/

The migration process will generate 4 buckets, one for each of the above database / retention policy pairs.

Imagine the migration process yields the following metadata for the 4 buckets:

Bucket ID         Bucket Name (derived)  DB       RP
7425a44ac4110001  my-db-1-default        my-db-1  default
7425a44ac4110002  my-db-1-1year          my-db-1  1year
7425a44ac4110003  my-db-2-autogen        my-db-2  autogen
7425a44ac4110004  my-db-2-1year          my-db-2  1year
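The table implies that derived bucket names join the database and retention policy names with a hyphen. The Go sketch below assumes exactly that rule (the issue does not spell it out) and shows one consequence worth checking during an upgrade: since 1.x database and retention policy names may themselves contain hyphens, two different DB/RP pairs can derive the same bucket name. derivedBucketName and findCollisions are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// derivedBucketName joins a 1.x database and retention policy with a hyphen,
// matching the example table ("my-db-1" + "default" -> "my-db-1-default").
func derivedBucketName(db, rp string) string {
	return db + "-" + rp
}

// findCollisions groups DB/RP pairs by derived name and keeps only the names
// produced by more than one pair.
func findCollisions(pairs [][2]string) map[string][][2]string {
	byName := make(map[string][][2]string)
	for _, p := range pairs {
		name := derivedBucketName(p[0], p[1])
		byName[name] = append(byName[name], p)
	}
	for name, ps := range byName {
		if len(ps) < 2 {
			delete(byName, name) // deleting during range is safe in Go
		}
	}
	return byName
}

func main() {
	pairs := [][2]string{
		{"my-db-1", "default"}, {"my-db-1", "1year"},
		{"my-db-2", "autogen"}, {"my-db-2", "1year"},
		{"my-db", "1-default"}, // hypothetical pair that collides with my-db-1/default
	}
	for _, p := range pairs {
		fmt.Printf("%-8s %-10s -> %s\n", p[0], p[1], derivedBucketName(p[0], p[1]))
	}
	dups := findCollisions(pairs)
	names := make([]string, 0, len(dups))
	for n := range dups {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, n := range names {
		fmt.Println("collision:", n, dups[n])
	}
}
```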

The resulting layout on disk will be:

data/
    7425a44ac4110001/
        autogen/
    7425a44ac4110002/
        autogen/
    7425a44ac4110003/
        autogen/
    7425a44ac4110004/
        autogen/
@stuartcarnie stuartcarnie added area/2.x OSS 2.0 related issues and PRs team/storage labels Aug 12, 2020
@stuartcarnie
Contributor Author

I expect this will take more than a single sprint, as there will be iteration and testing to ensure the process is as smooth as possible.

@sebito91 sebito91 self-assigned this Aug 14, 2020
@ivankudibal ivankudibal self-assigned this Aug 26, 2020
@russorat
Contributor

This is the process for upgrading users to 2.0:

If there are 0 or 1 users in the 1.x instance:

  • During the upgrade process, we will prompt the user to create a new username, password, and org, just like the normal setup process. This org will contain the operator token, which has all permissions.

If there are >= 2 users in the 1.x instance:

  • During the upgrade process, we will prompt the user to create a new username, password, and org, just like the normal setup process. This org will contain the operator token, which has all permissions. We will then generate a new token for each user in the 1.x instance, with the same access to the buckets as that user had to the corresponding databases in 1.x, and set each token's description to the name of the 1.x user.

Once the upgrade is complete, the admin will log into the new 2.0 instance, and either manually set up new users in the org (with their own username/password and tokens) or distribute the newly created token to the appropriate user (in the case that the 1.x user was only used as an integration).

Examples:
I am a 1.x user with 5 users (named a, b, c, d, e) configured in my 1.x instance, all with read/write access to all databases. When I run the upgrade process, my 2.0 instance contains 1 org, 1 admin user, 1 operator token, and a token for each user with the descriptions a, b, c, d, e, each with r/w permissions on the buckets in 2.0 that correspond to the db/rp combos from 1.x.

I am a 1.x user with 2 users (named a, b) configured in my 1.x instance, where a has read/write access to all databases and b has write access to a single database. When I run the upgrade process, my 2.0 instance contains 1 org, 1 admin user, 1 operator token, and a token for each user with the descriptions a and b, where token a has r/w permissions on all buckets in 2.0 and token b has write access to a single bucket.
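The user-to-token rules above can be sketched in Go. This is a minimal illustration with hypothetical types (V1User, Privilege, Permission, TokenPlan), not the 1.x or 2.x authorization APIs. It assumes a 1.x per-database privilege applies to every bucket created from that database's retention policies, and it models a user with access to all databases as holding All on each of them. With fewer than two users it plans no extra tokens, since the operator token created during setup is the only one needed.

```go
package main

import (
	"fmt"
	"sort"
)

// Privilege is a simplified 1.x per-database privilege.
type Privilege int

const (
	Read Privilege = iota + 1
	Write
	All // read and write
)

// V1User is a hypothetical, simplified 1.x user record.
type V1User struct {
	Name  string
	Privs map[string]Privilege // database name -> privilege
}

// Permission is a hypothetical bucket-level permission on a 2.x token.
type Permission struct {
	BucketID string
	Action   string // "read" or "write"
}

// TokenPlan describes one token to create for a 1.x user.
type TokenPlan struct {
	Description string // the 1.x user name
	Permissions []Permission
}

// planTokens returns one token per 1.x user when there are two or more users.
// bucketsByDB maps each 1.x database to the IDs of the buckets made from its RPs.
func planTokens(users []V1User, bucketsByDB map[string][]string) []TokenPlan {
	if len(users) < 2 {
		return nil
	}
	var plans []TokenPlan
	for _, u := range users {
		plan := TokenPlan{Description: u.Name}
		dbs := make([]string, 0, len(u.Privs))
		for db := range u.Privs {
			dbs = append(dbs, db)
		}
		sort.Strings(dbs) // deterministic permission order
		for _, db := range dbs {
			p := u.Privs[db]
			for _, id := range bucketsByDB[db] {
				if p == Read || p == All {
					plan.Permissions = append(plan.Permissions, Permission{id, "read"})
				}
				if p == Write || p == All {
					plan.Permissions = append(plan.Permissions, Permission{id, "write"})
				}
			}
		}
		plans = append(plans, plan)
	}
	return plans
}

func main() {
	bucketsByDB := map[string][]string{
		"my-db-1": {"7425a44ac4110001", "7425a44ac4110002"},
		"my-db-2": {"7425a44ac4110003", "7425a44ac4110004"},
	}
	users := []V1User{
		{Name: "a", Privs: map[string]Privilege{"my-db-1": All, "my-db-2": All}},
		{Name: "b", Privs: map[string]Privilege{"my-db-2": Write}},
	}
	for _, t := range planTokens(users, bucketsByDB) {
		fmt.Println(t.Description, t.Permissions)
	}
}
```

Note that under this assumption, write access to a single 1.x database becomes write access to every bucket derived from it, which is one bucket only when that database had a single retention policy.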

@vlastahajek
Contributor

@russorat, by "we will prompt the user", do you mean that upgrade should provide interactive input for parameters, with no CLI options? Or both?
Current options list:

  -m, --bolt-path string         path for boltdb database (default "/home/ubuntu/.influxdbv2/influxd.bolt")
  -b, --bucket string            primary bucket name
      --config-file string       optional: Custom InfluxDB 1.x config file path, else the default config file
  -e, --engine-path string       path for persistent engine files (default "/home/ubuntu/.influxdbv2/engine")
  -h, --help                     help for upgrade
      --log-path string          optional: custom log file path (default "/home/ubuntu/upgrade.log")
  -o, --org string               primary organization name
  -p, --password string          password for username
  -r, --retention string         optional: duration bucket will retain data. 0 is infinite. Default is 0.
      --security-script string   optional: generated security upgrade script path (default "/home/ubuntu/influxd-upgrade-security.sh")
  -t, --token string             optional: token for username, else auto-generated
  -u, --username string          primary username
      --v1-dir string            path to source 1.x db directory containing meta,data and wal sub-folders (default "/home/ubuntu/.influxdb")
  -v, --verbose                  verbose output (default true)

@russorat
Contributor

@vlastahajek I would imagine it would work similarly to the influx setup command today. You can run influx setup with no parameters, and you are prompted to enter the required parameters. You can also provide them on the command line (influx setup -f -b telegraf -o influxdata -u russ -p something), which will continue in non-interactive mode with no confirmation prompts (-f).

 ✗ influx setup -h
Setup instance with initial user, org, bucket

Usage:
  influx setup [flags]
  influx setup [command]

Available Commands:
  user        Setup instance with user, org, bucket

Flags:
  -c, --active-config string   Config name to use for command; Maps to env var $INFLUX_ACTIVE_CONFIG
  -b, --bucket string          primary bucket name
      --configs-path string    Path to the influx CLI configurations; Maps to env var $INFLUX_CONFIGS_PATH (default "/Users/rsavage/.influxdbv2/configs")
  -f, --force                  skip confirmation prompt
  -h, --help                   Help for the setup command
      --hide-headers           Hide the table headers; defaults false; Maps to env var $INFLUX_HIDE_HEADERS
      --host string            HTTP address of InfluxDB; Maps to env var $INFLUX_HOST
      --json                   Output data as json; defaults false; Maps to env var $INFLUX_OUTPUT_JSON
  -n, --name string            config name, only required if you already have existing configs
  -o, --org string             primary organization name
  -p, --password string        password for username
  -r, --retention string       Duration bucket will retain data. 0 is infinite. Default is 0.
      --skip-verify            Skip TLS certificate chain and host name verification.
  -t, --token string           token for username, else auto-generated
  -u, --username string        primary username

Use "influx setup [command] --help" for more information about a command.

5 participants