
InfluxDB 2.0 sometimes uses inmem index instead of TSI #20257

Closed
pierrekin opened this issue Dec 4, 2020 · 2 comments · Fixed by #20313
Labels
area/compat-v1x v1.x compatibility related work in v2.x area/2.x OSS 2.0 related issues and PRs kind/bug

Comments


pierrekin commented Dec 4, 2020

Steps to reproduce:
We ran into this issue while trying to do something else; I have not attempted to reduce it to a minimal set of reproduction steps.

  1. Upgrade an influx 1.8 database with > 1M series cardinality
  2. Start inserting data via the new v2 write API (line protocol)
  3. Wait for a fair amount of time
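For reference, step 2 amounts to a single write against the 2.0 HTTP API. This is a sketch, not taken from the report: the org, bucket, and token below are placeholders, and the request is only attempted if a local influxd answers its health check.

```shell
# A point in line protocol (measurement, tags, one field, ns timestamp).
point="m,tag=a value=1i 1606780800000000000"
# Placeholder org/bucket -- substitute your own values and token.
url="http://localhost:8086/api/v2/write?org=my-org&bucket=my-bucket&precision=ns"
if curl -fsS -m 2 "http://localhost:8086/health" >/dev/null 2>&1; then
  curl -sS -XPOST "$url" --header "Authorization: Token $INFLUX_TOKEN" --data-raw "$point"
else
  echo "no local influxd; would POST: $point"
fi
```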

Expected behavior:
Data to be ingested as normal

Actual behavior:
We are inserting from many different threads, and occasionally some threads get stuck with the following behavior.

  • We occasionally get an (HTTP 500) error with a body like {"code":"internal error","message":"unexpected error writing points to database: partial write: max-series-per-database limit exceeded: (1000000) dropped=5"} when writing to InfluxDB

Environment info:

  • System info (uname -srm): Linux 5.4.0-1029-aws x86_64

  • InfluxDB version (influxd version): InfluxDB 2.0.2 (git: 84496e507a) build_date: 2020-11-19T03:59:35Z

  • Other relevant environment details: none provided

The max-series-per-database setting is not documented for InfluxDB v2, and it only exists for the inmem index type, which we don't use (we used tsi1 on InfluxDB v1.8, and as far as I know there is no option to change the index type on InfluxDB v2.0).

It seems we get a message like this whenever InfluxDB opens a new shard or is restarted. Here are those counts over time: tsi1_count does seem to be increasing, while inmem_count decreased after a restart.

Dec 01 12:34:14 influxdb-experiment01 influxd[55446]: ts=2020-12-01T12:34:14.272747Z lvl=warn msg="Mixed shard index types" log_id=0QfXUnBl000 service=storage-engine inmem_count=15 tsi1_count=1 db_instance=e06881594d095d90
Dec 01 12:34:31 influxdb-experiment01 influxd[55446]: ts=2020-12-01T12:34:31.212353Z lvl=warn msg="Mixed shard index types" log_id=0QfXUnBl000 service=storage-engine inmem_count=15 tsi1_count=2 db_instance=e06881594d095d90
Dec 01 12:35:35 influxdb-experiment01 influxd[55446]: ts=2020-12-01T12:35:35.135699Z lvl=warn msg="Mixed shard index types" log_id=0QfXUnBl000 service=storage-engine tsi1_count=3 inmem_count=15 db_instance=e06881594d095d90
Dec 01 13:07:36 influxdb-experiment01 influxd[55446]: ts=2020-12-01T13:07:36.543219Z lvl=warn msg="Mixed shard index types" log_id=0QfXUnBl000 service=storage-engine inmem_count=15 tsi1_count=4 db_instance=e06881594d095d90
Dec 01 23:02:15 influxdb-experiment01 influxd[55446]: ts=2020-12-01T23:02:15.668045Z lvl=warn msg="Mixed shard index types" log_id=0QfXUnBl000 service=storage-engine inmem_count=15 tsi1_count=5 db_instance=e06881594d095d90
Dec 02 09:31:01 influxdb-experiment01 influxd[2116776]: ts=2020-12-02T09:31:01.280938Z lvl=warn msg="Mixed shard index types" log_id=0QphRlrW000 service=storage-engine tsi1_count=5 inmem_count=11 db_instance=e06881594d095d90
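Those warnings are easy to track mechanically. A quick sketch of pulling the two counts out of a line like the ones above (the sample line is trimmed from the excerpt; field names are as logged):

```shell
# One "Mixed shard index types" warning, trimmed to its structured fields.
logline='ts=2020-12-01T12:34:14.272747Z lvl=warn msg="Mixed shard index types" service=storage-engine inmem_count=15 tsi1_count=1'
# Extract the key=value counts with grep/cut.
inmem=$(printf '%s\n' "$logline" | grep -o 'inmem_count=[0-9]*' | cut -d= -f2)
tsi1=$(printf '%s\n' "$logline" | grep -o 'tsi1_count=[0-9]*' | cut -d= -f2)
echo "inmem=$inmem tsi1=$tsi1"   # inmem=15 tsi1=1
```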

Some notes from our engineer investigating this:

I think possibly what’s happening is that the upgrade process created inmem shard indexes instead of tsi1

ok staring at more influxdb source code will make me crazy.. I don’t know why we’re getting inmem indexes--those are the only kind that have the max-series-per-database error and the only place we use them is the legacy influx05

In InfluxDB v1.8 there was a tool for transferring between inmem and tsi1 and inspecting the current indexes (influx_inspect). I can’t find that in InfluxDB v2.0 (the binary is not provided in any of their downloads and while it looks like the command still exists in the source, most of the functionality has been stripped out)
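Lacking the inspection tool, one can guess a shard's index type from its on-disk layout. This relies on an assumption on my part, not something stated in the issue: tsi1 shards keep their index files under an index/ subdirectory of the shard path, while inmem shards have no such directory.

```shell
# Hypothetical helper: infer a shard's index type from its layout.
# Assumption (not from the issue): tsi1 shards have an index/ subdirectory.
shard_index_type() {
  if [ -d "$1/index" ]; then echo tsi1; else echo inmem; fi
}

# Demo against throwaway directories standing in for real shard paths
# like .influxdbv2/engine/data/<db>/autogen/<shard-id>.
mkdir -p /tmp/shard-demo/3/index /tmp/shard-demo/4
shard_index_type /tmp/shard-demo/3   # tsi1
shard_index_type /tmp/shard-demo/4   # inmem
```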

Config:

  • Tried setting the (undocumented) max-series-per-database option in the config file; it had no effect

Logs:
see above

Performance:
n/a

@danxmoran danxmoran assigned danxmoran and unassigned psteinbachs Dec 7, 2020
@danxmoran danxmoran added area/2.x OSS 2.0 related issues and PRs area/compat-v1x v1.x compatibility related work in v2.x labels Dec 7, 2020
@danxmoran (Contributor) commented:

Did a quick test on my laptop:

  1. Ran influxd upgrade to upgrade my 1.8 DB to a fresh 2.0 instance
  2. Started influxd
  3. Saw this in the logs:
2020-12-07T20:58:40.841899Z     info    Opened shard    {"log_id": "0QwkpkMG000", "service": "storage-engine", "op_name": "tsdb_open", "index_version": "inmem", "path": "/Users/dan/.influxdbv2/engine/data/af0736c5aba3a58c/autogen/3", "duration": "284.769ms"}

Emphasis on "index_version": "inmem". Still need to figure out why this is happening...

@danxmoran (Contributor) commented:

Looks like:

  1. influxd upgrade runs don't always (possibly never) generate tsi1 index files
  2. There was some fallback code in the DB to handle this case by injecting the inmem index when no tsi1 files were present.

We're going to use this as our excuse to delete all of the inmem index code. There will be different fallback logic to generate tsi1 files on startup if needed, so users shouldn't need to worry about re-upgrading, etc.
