Unable to write Pandas Dataframe #169

Closed
stepanurban opened this issue Nov 23, 2020 · 7 comments · Fixed by #170
Labels
bug Something isn't working

@stepanurban

stepanurban commented Nov 23, 2020

Hi there,
I'm not able to write a pandas DataFrame into InfluxDB 2.0:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days, 'col2': data})
df = df.set_index('date')
print(df)
_write_client.write("bucket", org, record=df, data_frame_measurement_name='meas')

Only an empty measurement meas is created. The command produces no output, even with Debug=True. What's wrong?

@bednar
Contributor

bednar commented Nov 24, 2020

Hi @stepanurban,

Thanks for using our client.

How do you create the _write_client?

The default instance of WriteApi uses batching. At the end of your writes, you should either call _write_client.__del__() to flush the buffered records or use the synchronous version of WriteApi: _write_client = client.write_api(write_options=SYNCHRONOUS).

Please check the following code:

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

from influxdb_client import InfluxDBClient
from influxdb_client.client.write.dataframe_serializer import data_frame_to_list_of_points
from influxdb_client.client.write_api import PointSettings

"""
Prepare Client
"""
org = "my-org"
bucket = "my-bucket"
token = "my-token"

client = InfluxDBClient(url="http://localhost:8086", token=token, org=org)

"""
Prepare DataFrame
"""
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days, 'col2': data})
df = df.set_index('date')
print(df)

"""
Check Generated LineProtocol
"""
points = data_frame_to_list_of_points(data_frame=df,
                                      point_settings=PointSettings(),
                                      data_frame_measurement_name='meas')
print(points)

"""
Ingest DataFrame
"""
_write_client = client.write_api()
_write_client.write(bucket, org, record=df, data_frame_measurement_name='meas')
# Flush changes
_write_client.__del__()

"""
Querying ingested data
"""
query = f'from(bucket:"{bucket}")' \
        ' |> range(start: 0, stop: 10d)' \
        ' |> filter(fn: (r) => r._measurement == "meas")'
result = client.query_api().query(query=query)

"""
Processing results
"""
print()
print("=== results ===")
print()
for table in result:
    for record in table.records:
        print('{0}: {1} = {2}'.format(record["_time"], record["_field"], record["_value"]))

"""
Close client
"""
client.__del__()

Regards
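
To picture why the measurement stayed empty until the flush, here is a toy model of a batching writer. It is only a sketch: ToyBatchingWriter and its attributes are hypothetical names, not part of influxdb_client, and the model ignores time-based flushing and everything else the real WriteApi does.

```python
class ToyBatchingWriter:
    """Toy stand-in for a batching writer (hypothetical, not WriteApi)."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.buffer = []  # records waiting to be sent
        self.sent = []    # records that "reached the server"

    def write(self, record):
        self.buffer.append(record)
        # Only a full batch triggers a send on its own.
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send everything that is still buffered, then empty the buffer.
        self.sent.extend(self.buffer)
        self.buffer.clear()


writer = ToyBatchingWriter(batch_size=1000)
for value in range(8):
    writer.write(value)

print(len(writer.sent))  # 0: the batch never filled up, nothing was sent
writer.flush()
print(len(writer.sent))  # 8: all records are sent after the explicit flush
```

With only eight rows, as in the DataFrame above, a script that exits without flushing can leave every record in the buffer, which matches the empty measurement reported in the first comment.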

bednar added the "question" label (Further information is requested) on Nov 24, 2020
@stepanurban
Author

Hi @bednar,
Thanks for your response! The problem really was with flushing the data. Unfortunately, another issue arises when sending data. I'm getting:

The batch item wasn't processed successfully because: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json; charset=utf-8', 'X-Platform-Error-Code': 'invalid', 'Date': 'Tue, 24 Nov 2020 10:59:34 GMT', 'Content-Length': '565'})
HTTP response body: {"code":"invalid","message":"unable to parse 'meas  1606219156423525000': invalid field format\nunable to parse 

due to NaN values in the DataFrame. If you change the data to

data = np.empty(len(days))
data[:] = np.nan

you get this response. How can I cope with NaNs?

@bednar
Contributor

bednar commented Nov 24, 2020

data_frame_to_list_of_points is able to handle NaNs, but your DataFrame contains no data at all.

The following data works fine:

data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days, 'col2': data, 'col3': np.nan})

@stepanurban
Author

Sorry for the wrong example. For integers it works fine; unfortunately, the error arises when there are floats with NaNs:

data = np.random.random(len(days))
data[0] = np.nan
df = pd.DataFrame({'date': days, 'col2': data})
df = df.set_index('date')

@bednar
Contributor

bednar commented Nov 24, 2020

Thanks, I see... the generated line protocol contains a record without any fields (the first line):

meas  1606231210460024000
meas col2=0.9250037018616327 1606317610460024000
meas col2=0.34357342320721895 1606404010460024000
meas col2=0.3104769418698441 1606490410460024000
meas col2=0.002009839949097536 1606576810460024000
meas col2=0.23559472439157547 1606663210460024000
meas col2=0.23779172015837957 1606749610460024000
meas col2=0.7359158732407093 1606836010460024000

I will take a look
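
As a rough illustration of the problem, the sketch below builds line protocol by hand and skips rows whose fields are all NaN instead of emitting a line with an empty field set. It assumes NaN cannot be written as a field value, so such fields have to be dropped. The helper to_line_protocol is hypothetical: it is neither the client's serializer nor the change made in #170, and it ignores escaping, tags, and integer suffixes.

```python
import math


def to_line_protocol(measurement, fields, timestamp_ns):
    """Build one line-protocol line, or return None if no field survives.

    Hypothetical helper for illustration only; not the client's serializer.
    """
    # Drop NaN values: they cannot be written as field values.
    kept = {
        key: value
        for key, value in fields.items()
        if not (isinstance(value, float) and math.isnan(value))
    }
    if not kept:
        # A line such as "meas  1606231210460024000" has no field set,
        # which the server rejects with "invalid field format".
        return None
    field_set = ",".join(f"{key}={value}" for key, value in kept.items())
    return f"{measurement} {field_set} {timestamp_ns}"


rows = [
    ({"col2": float("nan")}, 1606231210460024000),
    ({"col2": 0.9250037018616327}, 1606317610460024000),
]
lines = [to_line_protocol("meas", fields, ts) for fields, ts in rows]
lines = [line for line in lines if line is not None]
print("\n".join(lines))
```

Only the second row produces a line here; the first row is skipped because its single field is NaN.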

@bednar
Contributor

bednar commented Nov 25, 2020

Hi @stepanurban,

the issue is fixed in #170.

You can try the dev version by running:

pip install git+https://github.com/influxdata/influxdb-client-python.git@fix/dataframe-rows-nan

Regards

@stepanurban
Author

Works perfectly, thanks!

bednar added this to the 1.13.0 milestone on Nov 25, 2020
bednar added the "bug" label (Something isn't working) and removed the "question" label (Further information is requested) on Nov 25, 2020