Unable to write Pandas Dataframe #169

stepanurban · 2020-11-23T14:01:59Z

Hi there,
I'm not able to write a pandas DataFrame into InfluxDB 2.0:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days, 'col2': data})
df = df.set_index('date')
print(df)
_write_client.write("bucket", org, record=df, data_frame_measurement_name='meas')

Only an empty measurement meas is created. The command produces no output, even when Debug=True. What's wrong?

bednar · 2020-11-24T09:02:42Z

Hi @stepanurban,

Thanks for using our client.

How do you create the _write_client?

The default instance of WriteApi uses batching. At the end of your writes you should call _write_client.__del__() to flush the records, or use the synchronous version of WriteApi: _write_client = client.write_api(write_options=SYNCHRONOUS).

Please check the following code:

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

from influxdb_client import InfluxDBClient
from influxdb_client.client.write.dataframe_serializer import data_frame_to_list_of_points
from influxdb_client.client.write_api import PointSettings

"""
Prepare Client
"""
org = "my-org"
bucket = "my-bucket"
token = "my-token"

client = InfluxDBClient(url="http://localhost:8086", token=token, org=org)

"""
Prepare DataFrame
"""
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days, 'col2': data})
df = df.set_index('date')
print(df)

"""
Check Generated LineProtocol
"""
points = data_frame_to_list_of_points(data_frame=df,
                                      point_settings=PointSettings(),
                                      data_frame_measurement_name='meas')
print(points)

"""
Ingest DataFrame
"""
_write_client = client.write_api()
_write_client.write(bucket, org, record=df, data_frame_measurement_name='meas')
# Flush changes
_write_client.__del__()

"""
Querying ingested data
"""
query = f'from(bucket:"{bucket}")' \
        ' |> range(start: 0, stop: 10d)' \
        ' |> filter(fn: (r) => r._measurement == "meas")'
result = client.query_api().query(query=query)

"""
Processing results
"""
print()
print("=== results ===")
print()
for table in result:
    for record in table.records:
        print('{0}: {1} = {2}'.format(record["_time"], record["_field"], record["_value"]))

"""
Close client
"""
client.__del__()

Regards
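
The batching behaviour described above can be pictured with a toy buffer. This is only a minimal sketch: ToyBatchingWriter and its attributes are hypothetical names made up for illustration, not part of influxdb_client, and the real client's flush triggers are more involved. It shows why records handed to a batching writer are not sent until the batch fills up or a flush is requested.

```python
class ToyBatchingWriter:
    """Hypothetical stand-in for a batching writer (not the influxdb_client code)."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self._buffer = []  # records accepted but not yet sent
        self.sent = []     # records that were actually "sent"

    def write(self, record):
        # Records are only buffered; sending happens once the batch is full.
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send everything that is still waiting in the buffer.
        self.sent.extend(self._buffer)
        self._buffer.clear()


writer = ToyBatchingWriter(batch_size=1000)
for value in range(8):
    writer.write(f"meas col2={value}")

sent_before_flush = len(writer.sent)  # buffer never reached batch_size
writer.flush()
sent_after_flush = len(writer.sent)
print(sent_before_flush, sent_after_flush)
```

With eight rows and a batch size of 1000, nothing leaves the buffer until flush() is called, which mirrors the empty measurement reported in the first post.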

stepanurban · 2020-11-24T11:44:47Z

Hi @bednar,
Thanks for your response! The problem really was with flushing the data. Unfortunately, another issue arose when sending data. I'm getting

The batch item wasn't processed successfully because: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json; charset=utf-8', 'X-Platform-Error-Code': 'invalid', 'Date': 'Tue, 24 Nov 2020 10:59:34 GMT', 'Content-Length': '565'})
HTTP response body: {"code":"invalid","message":"unable to parse 'meas  1606219156423525000': invalid field format\nunable to parse

due to NaN values in the DataFrame. If you change the data to

data = np.empty(len(days))
data[:] = np.nan

you get this response. How can I cope with NaNs?

bednar · 2020-11-24T13:13:39Z

data_frame_to_list_of_points is able to handle NaNs, but your DataFrame contains no data at all.

The following data works fine:

data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days, 'col2': data, 'col3': np.nan})

stepanurban · 2020-11-24T13:50:48Z

Sorry for the wrong example. Integers work fine; unfortunately, the error arises when there are floats with NaNs:

data = np.random.random(len(days))
data[0] = np.nan
df = pd.DataFrame({'date': days, 'col2': data})
df = df.set_index('date')

bednar · 2020-11-24T14:21:04Z

Thanks, I see... the generated LineProtocol contains a row without fields:

meas  1606231210460024000
meas col2=0.9250037018616327 1606317610460024000
meas col2=0.34357342320721895 1606404010460024000
meas col2=0.3104769418698441 1606490410460024000
meas col2=0.002009839949097536 1606576810460024000
meas col2=0.23559472439157547 1606663210460024000
meas col2=0.23779172015837957 1606749610460024000
meas col2=0.7359158732407093 1606836010460024000

I will take a look
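
The first line of the output above has a measurement and a timestamp but no field set, which is what the server rejects with "invalid field format". A minimal sketch of one way to avoid it is to drop NaN fields and skip rows that end up with no fields. The helper rows_to_line_protocol is a hypothetical name written for illustration, not the client's serializer; it ignores tags, escaping, and non-float field types.

```python
import math


def rows_to_line_protocol(measurement, rows):
    """Serialize (timestamp_ns, fields) pairs to line protocol strings.

    Hypothetical helper for illustration only: NaN fields are dropped and
    rows left without any field are skipped.
    """
    lines = []
    for timestamp_ns, fields in rows:
        kept = {
            key: value
            for key, value in fields.items()
            if not (isinstance(value, float) and math.isnan(value))
        }
        if not kept:
            # Would otherwise produce "meas  <timestamp>", a line with no fields.
            continue
        field_set = ",".join(f"{key}={value}" for key, value in kept.items())
        lines.append(f"{measurement} {field_set} {timestamp_ns}")
    return lines


rows = [
    (1606231210460024000, {"col2": float("nan")}),
    (1606317610460024000, {"col2": 0.9250037018616327}),
    (1606404010460024000, {"col2": 0.34357342320721895}),
]
lines = rows_to_line_protocol("meas", rows)
print("\n".join(lines))
```

The first row is skipped because its only field is NaN, so every emitted line carries at least one field.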

bednar · 2020-11-25T09:15:27Z

Hi @stepanurban,

the issue is fixed in #170.

You can try the dev version with:

pip install git+https://github.com/influxdata/influxdb-client-python.git@fix/dataframe-rows-nan

Regards

stepanurban · 2020-11-25T10:20:30Z

Works perfectly thanks!

bednar added the "question" label Nov 24, 2020

bednar mentioned this issue Nov 25, 2020: fix: skip DataFrame rows without data #170 (Merged)

stepanurban closed this as completed Nov 25, 2020

bednar added this to the 1.13.0 milestone Nov 25, 2020

bednar added the "bug" label and removed the "question" label Nov 25, 2020
