ENH: Include `df.attrs` metadata in `to_csv` output #53740

canthonyscott · 2023-06-20T15:47:38Z

- closes ENH: Include df.attrs in to_csv output #53577
- Tests added and passed if fixing a bug or adding a new feature.
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This new functionality writes DataFrame metadata stored in df.attrs as header/comment lines at the top of a CSV file.

to_csv() gains a comment parameter, mirroring the one in the read_csv() signature.
If this parameter is set, every key/value pair in df.attrs is written at the beginning of the CSV file in the format [comment][key]:[value]
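
For readers who want to see the intended file layout, here is a rough sketch of the format described above. It is not the code from this PR: `to_csv_with_attrs` is a hypothetical standalone helper, and stripping the delimiter from the metadata text is an assumption based on the commit message about removing commas from the attr output. Only the existing `DataFrame.to_csv`, `DataFrame.attrs` and `read_csv(comment=...)` APIs are used.

```python
import io

import pandas as pd


def to_csv_with_attrs(df: pd.DataFrame, comment: str = "#", sep: str = ",") -> str:
    """Hypothetical helper: emit df.attrs as comment lines, then the CSV body."""
    header_lines = []
    for key, value in df.attrs.items():
        # Assumption: drop the delimiter from the metadata text so a comment
        # line cannot be split into extra fields by a downstream parser.
        text = f"{key}:{value}".replace(sep, "")
        header_lines.append(f"{comment}{text}\n")
    return "".join(header_lines) + df.to_csv(index=False, sep=sep)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.attrs = {"source": "sensor A, lab 2", "version": 1}

text = to_csv_with_attrs(df)

# read_csv skips fully commented lines, so the data round-trips unchanged.
roundtrip = pd.read_csv(io.StringIO(text), comment="#")
```

Note that the metadata is only written out here; reading it back into `df.attrs` would need separate parsing of the comment lines.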

canthonyscott · 2023-06-20T15:50:26Z

Given the other discussions around df.attrs (#52166), this implementation does not use or depend on __finalize__ at all.

…in Numpy 1.25 (pandas-dev#53548) * DEBUG: npdev build * Address tests where sorting changed * Adjust more tests * Undo everything, even nanargsort * xfail the relevant tests * Add xfail to test_sort_column_level_and_index_label

* Adding implementation for deprecation and entry in whatsnew file * Updating unit tests to account for deprecating of series.last() and DataFrame.last() * Added deprecation message in doc string * Adding PR number to new unit test * Removing duplicate "Parameters" docstring header * Adding doctest skip for call to last()

* examples TimedeltaArray, Period.asfreq * Examples for categoricals * Exampl Categorical.codes and .__array__, edited code_checks * Corrected TimedeltaArray and wording to categorical * remove multiline --------- Co-authored-by: MarcoGorelli <33491632+MarcoGorelli@users.noreply.github.com>

canthonyscott · 2023-06-23T15:35:10Z

Closing to clean up my forked branch. Will re-open as a new PR.

canthonyscott and others added 3 commits June 8, 2023 09:31

base fixture and test for added function

c679b9f

Framework for to_csv to function with comment writing

1e5747e

Feature added, comment lines can be written out to csv files

9d8727b

canthonyscott changed the title ~~Csv write comments~~ ENH: Include df.attrs metadata in to_csv output Jun 20, 2023

mroeschke added the "metadata (_metadata, .attrs)" and "IO CSV (read_csv, to_csv)" labels Jun 20, 2023

canthonyscott and others added 23 commits June 23, 2023 10:33

Feature added, comment lines can be written out to csv files

64d10a8

removed testing file removed testing file removed testing file cleaned up comments modified docstring

Update 02_read_write.rst (pandas-dev#53559)

1baa312

Grammatical error corrected.

DOC: Fixing EX01 - Added examples (pandas-dev#53561)

d066909

* testing example in template * Added max example * Added groupby examples * Updated code_checks * Corrected alignment

prevent errors when comment is supplied, but comment_lines is None

7275637

fixed missing return type hint fixed failing docstring ci tests

removed file write call that was used for testing

d3a2d90

Change for comments to be sourced from df.attrs -- Not sure how to handle quotes on the write line yet. Need to test how it handles on the reads

8f5d12a

renamed func

3d3e9fc

when saving as csv with comma delim, remove commas from the attr outputs to prevent downstream parsing errors

00059f5

moving tests to new location and not using static data files

14abcee

fix precommit

b6eea23

Added fixtures for testing comments

0d2b004

removed spacing

efc9269

refactored tests to test writing outputs with df.attrs as comment lines

c1c1266

removed todo line

11fcdc4

updated docstring

b6bb7a9

updated docstring

d42347a

fixed failing cicd checks

0a8efaf

DOC: Fixing EX01 - Added examples (pandas-dev#53564)

c77f0df

* Example for pct_change * Added examples for groupby sem, shift, size * Updated code_checks * Corrected error on groupby size

Upload nightlies to new location (pandas-dev#53341)

5c64661

CLN: Cleanup after CoW setitem PRs (pandas-dev#53142)

8eefe00

DOC: Fixing EX01 - Added examples (pandas-dev#53573)

47ce2da

Added examples

TST: Use more pytest fixtures (pandas-dev#53567)

73d0fda

* TST: Use more fixtures * Use more fixtures in test_indexing_slow * Move addition compression_to_extension * Use more fixture in sparse test_indexing * fixturize libsparse

DeaMariaLeon and others added 21 commits June 23, 2023 10:33

DOC: Fixing EX01 - Added examples (pandas-dev#53725)

ff612aa

* Examples Timestamp.time, timetuple, timetz, to_datetime64, toordinal * Added tests and updated code_checks.sh * Corrected time and timetz * Corrected Timestamp.time and timetz

BUG / CoW: Series.transform not respecting CoW (pandas-dev#53747)

d700bcd

Revert "BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels" (pandas-dev#53760)

826f205

Revert "BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels (pandas-dev#53637)" This reverts commit 5edc2cc.

TST: refactor data path for xml tests (pandas-dev#53766)

d8260bf

* TST: refactor data path for xml tests * fix style * fix typo

TST: Reduce memory pressure of plotting tests (pandas-dev#53660)

9a1e1a0

* TST: Reduce memory pressure of plotting tests * Trigger ci * Remove gc call

DOC note pytest bump (pandas-dev#53768)

8ff4879

* note pytest bump * update install.rst

COMPAT: Remove np.compat (pandas-dev#53774)

6890cf2

COMPAT: Remove np.compat

Added suggested new line to fix doc code example (pandas-dev#53775)

59881f3

* Added suggested new line to fix doc code example * Removed the newline * Done

TST: Make test_complibs deterministic (pandas-dev#53754)

bf76e30

* TST: Make test_complibs deterministic * Make sure files are unique? * Try unique key * Just xfail test * Check what is taking long * Remove -v * Filter warning

TST: Refactor some slow tests (pandas-dev#53784)

7c2bdd2

* Cleanup single used method * Clean plotting test * Improve test_series_groupby_nunique * Address more slow tests * Undo changes

TYP: reshape.merge (pandas-dev#53780)

40aa4b6

TYP: annotate testing decorators with pytest.MarkDecorator (pandas-dev#53794)

70f0558

TYP: type pytest.MarkDecorator

TST/CLN: use fixture for data path in all xml tests (pandas-dev#53790)

bb923c2

TST/CLN: use fixture path for all xml tests

REF: remove unused merge args (pandas-dev#53789)

e4ba598

BUG: combine_first ignoring others columns if other is empty (pandas-dev#53792)

8c68943

* BUG: combine_first ignoring others columns if other is empty * Fix

PERF: concat in no-reindexing case (pandas-dev#53772)

4c254b5

BUG: bad display for complex series with nan (pandas-dev#53764)

b0baa2e

* BUG: bad display for complex series with nan * added comments * added more test cases

CLN: assorted (pandas-dev#53742)

931dc4b

* CLN: assorted * revert pivot edits * revert np.float128 check

Update governance.md (pandas-dev#53814)

b865253

* Update governance.md Misspelling of Subcommittee * Update web/pandas/about/governance.md --------- Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>

canthonyscott force-pushed the csv-write-comments branch from 93b0e97 to b865253 Compare June 23, 2023 15:33

canthonyscott requested review from rhshadrach, MarcoGorelli, WillAyd, datapythonista and mroeschke as code owners June 23, 2023 15:33

canthonyscott closed this Jun 23, 2023

canthonyscott mentioned this pull request Jun 23, 2023

ENH: Include df.attrs metadata in to_csv output #53816

Closed

5 tasks
