Failed to start nats streaming server #20510

Open
felipecanhedo opened this issue Jan 13, 2021 · 13 comments
Labels
area/2.x OSS 2.0 related issues and PRs kind/bug team/edge

Comments

@felipecanhedo

Steps to reproduce:
After a server reboot I'm getting the following error on InfluxDB startup:

2021-01-13T16:38:31.182037Z error Failed to start nats streaming server {"log_id": "0Rg9e0yl000", "error": "nats: no servers available for connection"}
Error: nats: no servers available for connection

The command I'm using to start is:

/usr/local/bin/influxd --engine-path=/ifdata/engine --bolt-path=/ifdata/influxd.bolt

The error does not happen if I don't pass my persistent engine-path as a parameter

Environment info:

  • System info: Linux 4.18.0-193.14.3.el8_2.x86_64 x86_64
  • InfluxDB version: InfluxDB 2.0.2 (git: 84496e5) build_date: 2020-11-19T03:59:35Z

Config:
Only non-default variables are the engine path and bolt path:

--engine-path=/ifdata/engine --bolt-path=/ifdata/influxd.bolt

Logs:
2021-01-13T16:38:31.138603Z info Starting retention policy enforcement service {"log_id": "0Rg9e0yl000", "service": "retention", "check_interval": "30m"}
2021-01-13T16:38:31.138640Z info Starting precreation service {"log_id": "0Rg9e0yl000", "service": "shard-precreation", "check_interval": "10m", "advance_period": "30m"}
2021-01-13T16:38:31.138861Z info Starting query controller {"log_id": "0Rg9e0yl000", "service": "storage-reads", "concurrency_quota": 10, "initial_memory_bytes_quota_per_query": 9223372036854775807, "memory_bytes_quota_per_query": 9223372036854775807, "max_memory_bytes": 0, "queue_size": 10}
2021-01-13T16:38:31.139609Z info Configuring InfluxQL statement executor (zeros indicate unlimited). {"log_id": "0Rg9e0yl000", "max_select_point": 0, "max_select_series": 0, "max_select_buckets": 0}
2021-01-13T16:38:31.182037Z error Failed to start nats streaming server {"log_id": "0Rg9e0yl000", "error": "nats: no servers available for connection"}
Error: nats: no servers available for connection
See 'influxd -h' for help

@russorat
Contributor

russorat commented Mar 2, 2021

@felipecanhedo thanks for the issue. can you try the latest 2.0.4 and see if this is still an issue?

@luizmendesalmeida

Hi, I'm facing the same issue.

Mar 4 18:35:28 myHOST influxd[2102]: ts=2021-03-04T18:35:28.265180Z lvl=error msg="Failed to start nats streaming server" log_id=0SgdDP~W000 error="nats: no servers available for connection"
Mar 4 18:35:28 myHOST influxd[2102]: Error: nats: no servers available for connection
Mar 4 18:35:28 myHOST influxd[2102]: See 'influxd -h' for help

I'm running on a Raspberry Pi, and the version is 2.0.4 (influxdb2-2.0.4-arm64.deb).

@danxmoran danxmoran added the area/2.x OSS 2.0 related issues and PRs label Mar 4, 2021
@danxmoran
Contributor

@luizmendesalmeida does the problem only affect you when you change the --engine-path, or does it always happen?

@luizmendesalmeida

Hi @danxmoran,

I didn't change anything.
It was a straightforward install.

@felipecanhedo
Author

Hi @russorat ,

In my case it was "fixed" by starting over with a fresh --engine-path. It was seemingly corrupted after a server reboot with the service running.

@luizmendesalmeida

Hi,

I forgot to mention that I have one 2.0.4 instance running properly on Debian 10.

@danxmoran danxmoran self-assigned this May 5, 2021
@danxmoran
Contributor

A customer has reported that this error can happen if you hit the max-open-file limit on your system.

At minimum, we should attach a logger to the NATS server to allow for better debugging. If possible, the error message included in the log should be improved.
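
To check whether a process is near its open-file limit on Linux, something like the following sketch may help. It assumes a Linux /proc filesystem; the function name `nofile_report` is made up for illustration and is not part of InfluxDB.

```shell
#!/bin/sh
# Report the soft open-file limit and the number of open descriptors for a PID.
# Assumes Linux /proc; the PID defaults to the current shell.
nofile_report() {
    pid="${1:-$$}"
    # The "Max open files" row of /proc/<pid>/limits lists the soft limit in
    # column 4 and the hard limit in column 5.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    # Each entry under /proc/<pid>/fd is one open descriptor.
    used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    echo "pid=$pid soft_limit=$soft open_fds=$used"
}

nofile_report "$@"
```

For a running server this could be called as `nofile_report "$(pidof influxd)"`, assuming `pidof` is installed and influxd is up. If influxd exits at startup, running `ulimit -n` in the environment that launches it shows the limit it would inherit.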

@dgnorton
Contributor

@danxmoran let's look into removing NATS.

@russorat
Contributor

If we implemented this: #15445 we could remove it

@Ing-Med

Ing-Med commented May 31, 2021

> A customer has reported that this error can happen if you hit the max-open-file limit on your system.
>
> At minimum, we should attach a logger to the NATS server to allow for better debugging. If possible, the error message included in the log should be improved.

I have the same problem. Yesterday InfluxDB stopped working and spewed out errors about too many open HTTP connections. The interwebz said this was due to the limit on open files that Linux allows (1024). I increased the value for both my user and the docker user, and now it throws this error. Reverting the ulimits doesn't help either; the issue with the NATS streaming server persists.
Welp!

@fitch

fitch commented Aug 3, 2021

I'm getting the same error. First, there was an error about too many open files. Then I raised the limit from 1024 to 65536 with ulimit -n 65536, and now I'm getting this error, as is @Ing-Med.

@fitch

fitch commented Aug 3, 2021

Well, it seems that adding LimitNOFILE=65536 to the influxdb.service file did the trick.
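
For anyone who prefers not to edit the packaged unit file directly, a systemd drop-in can carry the same setting. This is a minimal sketch, assuming the unit is named influxdb.service as above; the function name `write_nofile_dropin` and the `DROPIN_ROOT` variable are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Write a systemd drop-in that sets LimitNOFILE for a unit.
# DROPIN_ROOT is a hypothetical override used for dry runs; it defaults to
# /etc/systemd/system, where systemd looks for <unit>.d/*.conf drop-ins.
write_nofile_dropin() {
    unit="$1"
    limit="$2"
    dir="${DROPIN_ROOT:-/etc/systemd/system}/$unit.d"
    mkdir -p "$dir" || return 1
    # A drop-in only needs the section header and the setting to override.
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/override.conf" || return 1
    echo "$dir/override.conf"
}

# Example (needs root when writing under /etc), followed by a reload and restart:
#   write_nofile_dropin influxdb.service 65536
#   systemctl daemon-reload
#   systemctl restart influxdb
```

Note that `ulimit -n` in a login shell does not affect services started by systemd, which may be why raising it alone did not help here.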

@Ing-Med

Ing-Med commented Aug 3, 2021

> Well, it seems that adding LimitNOFILE=65536 to the influxdb.service file did the trick.

Thanks for telling us your solution! Unfortunately I am running InfluxDB on Docker, so I wouldn't know where to add that parameter.
And unfortunately this happened to me again, so if no one is able to pull that magic rabbit out of their hat, I'm switching back to v1.

Edit: I saw that the container itself still has a file limit of 1024, even though Docker should allow more at the system level.
I added

ulimits:
  nproc: 65535
  nofile:
    soft: 20000
    hard: 40000

to the influx container in my docker compose file and it suddenly started again.
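
To confirm that a container actually picked up the new limit, one option is to read the limits of its PID 1 and compare the soft value against a threshold. The sketch below assumes input in the Linux /proc/<pid>/limits format; the function name `nofile_at_least` and the container name `influxdb` in the usage comment are hypothetical.

```shell
#!/bin/sh
# Succeed only if the soft "Max open files" limit read from stdin is at least $1.
# Input is expected in the format of Linux /proc/<pid>/limits.
nofile_at_least() {
    min="$1"
    awk -v min="$min" '
        # Column 4 of the "Max open files" row is the soft limit.
        /^Max open files/ { found = 1; ok = ($4 == "unlimited" || $4 + 0 >= min + 0) }
        END { exit (found && ok) ? 0 : 1 }
    '
}

# Example, with "influxdb" standing in for the real container name:
#   docker exec influxdb cat /proc/1/limits | nofile_at_least 20000 && echo "limit ok"
```

Reading /proc/1/limits inside the container reports what the main process is running with, which can differ from the host or Docker daemon settings.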

It is beyond me how this has not been documented yet. wtf

@danxmoran danxmoran removed their assignment Oct 26, 2021
7 participants