
ENH: Include df.attrs metadata in to_csv output #53740

Closed
wants to merge 123 commits into from


canthonyscott commented Jun 20, 2023


This new functionality writes DataFrame metadata stored in df.attrs as header/comment lines at the top of a CSV file.

  1. The to_csv() method gains a comment parameter, mirroring the parameter of the same name in the read_csv() signature.
  2. If this parameter is set, every key/value pair in df.attrs is written at the beginning of the CSV file, one per line, in the format [comment][key]:[value]
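
For illustration, the output format described above can be sketched with a standalone helper. This is not the code from this PR: to_csv_with_attrs is a hypothetical name, and the sketch assumes every attrs value has a single-line string form that needs no quoting. It only relies on DataFrame.attrs, DataFrame.to_csv, and the existing comment parameter of read_csv.

```python
import io

import pandas as pd


def to_csv_with_attrs(df, buf, comment="#"):
    """Hypothetical helper: write df.attrs as comment lines, then the CSV body."""
    # One line per metadata entry, in the form [comment][key]:[value].
    for key, value in df.attrs.items():
        buf.write(f"{comment}{key}:{value}\n")
    # The regular CSV output follows the metadata lines.
    df.to_csv(buf, index=False)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.attrs = {"source": "sensor-7", "units": "mm"}

buf = io.StringIO()
to_csv_with_attrs(df, buf)
text = buf.getvalue()

# read_csv skips lines that start with the comment character,
# so the data round-trips even though the metadata is not restored.
round_trip = pd.read_csv(io.StringIO(text), comment="#")
```

Because read_csv drops fully commented lines, a file written this way stays readable by existing callers that pass the same comment character; restoring the metadata into attrs on read would need separate parsing.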

canthonyscott (Author) commented

Given the other discussions around df.attrs (#52166), this implementation does not use or depend on __finalize__ at all.

canthonyscott changed the title from "Csv write comments" to "ENH: Include df.attrs metadata in to_csv output" on Jun 20, 2023
mroeschke added the "metadata" (_metadata, .attrs) and "IO CSV" (read_csv, to_csv) labels on Jun 20, 2023
canthonyscott and others added 23 commits June 23, 2023 10:33
removed testing file

removed testing file

removed testing file

cleaned up comments

modified docstring
Grammatical error corrected.
* testing example in template

* Added max example

* Added groupby examples

* Updated code_checks

* Corrected alignment
…in Numpy 1.25 (pandas-dev#53548)

* DEBUG: npdev build

* Address tests where sorting changed

* Adjust more tests

* Undo everything, even nanargsort

* xfail the relevant tests

* Add xfail to test_sort_column_level_and_index_label
fixed missing return type hint

fixed failing docstring ci tests
…ndle quotes on the write line yet. Need to test how it handles on the reads
* Example for pct_change

* Added examples for groupby sem, shift, size

* Updated code_checks

* Corrected error on groupby size
* TST: Use more fixtures

* Use more fixtures in test_indexing_slow

* Move addition compression_to_extension

* Use more fixture in sparse test_indexing

* fixturize libsparse
DeaMariaLeon and others added 21 commits June 23, 2023 10:33
* Examples Timestamp.time, timetuple, timetz, to_datetime64, toordinal

* Added tests and updated code_checks.sh

* Corrected time and timetz

* Corrected Timestamp.time and timetz
* Adding implementation for deprecation and entry in whatsnew file

* Updating unit tests to account for deprecating of series.last() and DataFrame.last()

* Added deprecation message in doc string

* Adding PR number to new unit test

* Removing duplicate "Parameters" docstring header

* Adding doctest skip for call to last()
…evels" (pandas-dev#53760)

Revert "BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels (pandas-dev#53637)"

This reverts commit 5edc2cc.
* TST: refactor data path for xml tests

* fix style

* fix typo
* TST: Reduce memory pressure of plotting tests

* Trigger ci

* Remove gc call
* note pytest bump

* update install.rst
* Added suggested new line to fix doc code example

* Removed the newline

* Done
* TST: Make test_complibs deterministic

* Make sure files are unique?

* Try unique key

* Just xfail test

* Check what is taking long

* Remove -v

* Filter warning
* Cleanup single used method

* Clean plotting test

* Improve test_series_groupby_nunique

* Address more slow tests

* Undo changes
…dev#53792)

* BUG: combine_first ignoring others columns if other is empty

* Fix
* BUG: bad display for complex series with nan

* added comments

* added more test cases
* CLN: assorted

* revert pivot edits

* revert np.float128 check
* examples TimedeltaArray, Period.asfreq

* Examples for categoricals

* Example Categorical.codes and .__array__, edited code_checks

* Corrected TimedeltaArray and wording to categorical

* remove multiline

---------

Co-authored-by: MarcoGorelli <33491632+MarcoGorelli@users.noreply.github.com>
* Update governance.md

Misspelling of Subcommittee

* Update web/pandas/about/governance.md

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>

canthonyscott commented Jun 23, 2023

Closing to clean up my forked branch. Will re-open as a new PR.

Labels
IO CSV read_csv, to_csv metadata _metadata, .attrs
Development

Successfully merging this pull request may close these issues.

ENH: Include df.attrs in to_csv output