BUG: output formatting with to_html(), index=False and/or index_names=False (#22579, #22747) #22655

Merged
WillAyd merged 109 commits into pandas-dev:master from the issue22579 branch on Jan 1, 2019

Conversation

simonjayhawkins
Member

@simonjayhawkins simonjayhawkins commented Sep 10, 2018

@pep8speaks

pep8speaks commented Sep 10, 2018

Hello @simonjayhawkins! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 28, 2018 at 15:37 Hours UTC

@codecov

codecov bot commented Sep 10, 2018

Codecov Report

Merging #22655 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #22655      +/-   ##
==========================================
+ Coverage   92.17%   92.18%   +<.01%     
==========================================
  Files         169      169              
  Lines       50708    50697      -11     
==========================================
- Hits        46740    46734       -6     
+ Misses       3968     3963       -5
| Flag | Coverage Δ |
|---|---|
| #multiple | 90.59% <100%> (ø) ⬆️ |
| #single | 42.36% <0%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/formats/html.py | 91.96% <100%> (+1.27%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0976e12...6b441df. Read the comment docs.

@codecov

codecov bot commented Sep 10, 2018

Codecov Report

Merging #22655 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #22655      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         163      163              
  Lines       51948    51956       +8     
==========================================
+ Hits        47945    47953       +8     
  Misses       4003     4003
| Flag | Coverage Δ |
|---|---|
| #multiple | 90.7% <100%> (ø) ⬆️ |
| #single | 42.99% <0%> (-0.01%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/formats/html.py | 98.67% <100%> (+0.03%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab55d05...5b635e4. Read the comment docs.

@simonjayhawkins
Member Author

@WillAyd @jreback comments addressed. ptal.

# Determine if ANY column names need to be displayed
# since if the row index is not displayed a column of
# blank cells need to be included before the DataFrame values.
self.show_col_idx_names = all((self.fmt.has_column_names,
Member


Sorry for the questions, but I'm still trying to wrap my head around the implementation. Based on the comment, why is this all() here and not any()? Wouldn't any one of these conditions require a cell where a column index name would be placed?

Member Author


Consider index=False with a single-level row index and a multi-level column index in which some, but not all, of the levels are named:

import numpy as np
import pandas as pd

# Three-level column index in which the middle level is unnamed
index = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd'], ['e', 'f']],
                                   names=['foo', None, 'baz'])
df = pd.DataFrame(np.arange(64).reshape(8, 8), columns=index)
result = df.to_html(max_rows=4, max_cols=4, index=False)
print(result)
foo a ... b
c ... d
baz e f ... e f
0 1 6 7
8 9 14 15
48 49 54 55
56 57 62 63

Note: the missing truncation indicators in the data rows are now fixed in master.

The misalignment of the column names is due to the logic being applied within the level-generating loop:

name = self.columns.names[lnum]
row = [''] * (row_levels - 1) + ['' if name is None else
                                 pprint_thing(name)]
if row == [""] and self.fmt.index is False:
    row = []

Hence a class-level variable is needed to check whether ANY names need to be displayed, so that the alignment can be determined once for all levels.
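
The effect can be reproduced without pandas. The following is a minimal sketch, not the pandas implementation: `header_prefix_per_level` and `header_prefix_fixed` are hypothetical helpers, and plain `str` stands in for `pprint_thing`. It shows the per-level rule quoted above dropping the leading cell only for the unnamed level, whereas a single flag computed once keeps every header row the same width.

```python
def header_prefix_per_level(names, row_levels, index):
    """Leading cells of each header row, decided level by level."""
    prefixes = []
    for name in names:
        row = [''] * (row_levels - 1) + ['' if name is None else str(name)]
        if row == [''] and index is False:
            row = []
        prefixes.append(row)
    return prefixes


def header_prefix_fixed(names, row_levels, index):
    """Same rows, but the keep/drop decision is made once for all levels."""
    # Assumption: with index=False a leading cell is kept when any level is named.
    show_names = any(name is not None for name in names)
    prefixes = []
    for name in names:
        row = [''] * (row_levels - 1) + ['' if name is None else str(name)]
        if index is False and not show_names:
            row = []
        prefixes.append(row)
    return prefixes


names = ['foo', None, 'baz']
per_level = header_prefix_per_level(names, row_levels=1, index=False)
fixed = header_prefix_fixed(names, row_levels=1, index=False)
print([len(row) for row in per_level])  # widths differ between levels
print([len(row) for row in fixed])      # widths are uniform
```

With the per-level rule the unnamed middle level loses its leading cell, which is what shifts that header row relative to the others.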

The ALL condition determines whether ANY names should be displayed given the to_html parameters, and uses logic similar to that of to_string etc.:

def _get_formatted_index(self, frame):
    # Note: this is only used by to_string() and to_latex(), not by
    # to_html().
    index = frame.index
    columns = frame.columns
    show_index_names = self.show_index_names and self.has_index_names
    show_col_names = (self.show_index_names and self.has_column_names)

and to the logic used for the rows in to_html:

if all((self.fmt.has_index_names,
        self.fmt.index,
        self.fmt.show_index_names)):
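
To make the any/all distinction concrete, here is a small sketch under stated assumptions. The helper names and the set of display flags are hypothetical (the condition quoted from the diff is truncated, so it is not reproduced here). The point is only that `any` answers "is at least one level named?", while `all` combines that answer with the display options.

```python
def has_names(names):
    """True if at least one level of an index carries a name."""
    return any(name is not None for name in names)


def show_names(names, *display_flags):
    """Show names only if some name exists AND every display flag allows it."""
    return all((has_names(names),) + tuple(display_flags))


column_names = ['foo', None, 'baz']

# Some levels are named and both (hypothetical) display flags are on.
print(show_names(column_names, True, True))
# A single disabled flag suppresses the names even though names exist.
print(show_names(column_names, True, False))
# No names at all: nothing to show, whatever the flags say.
print(show_names([None, None], True, True))
```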

There is currently no test that explicitly covers this example, so I think the best way forward is to fully parametrize the truncation tests, in line with the parametrized basic_alignment tests, for added assurance.
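
As a rough sketch of what "fully parametrized" could mean, the snippet below enumerates every combination of a few options. The option names are illustrative assumptions, not the actual test parameters of this PR, and `itertools.product` is used only to keep the example self-contained; in a pytest suite such a grid would normally be fed to `pytest.mark.parametrize`.

```python
from itertools import product

# Hypothetical option grid; the real tests may vary different parameters.
OPTIONS = {
    'index': [True, False],
    'index_names': [True, False],
    'header': [True, False],
    'multi_level_columns': [True, False],
}


def parameter_grid(options):
    """Yield one dict per combination of option values."""
    keys = list(options)
    for values in product(*(options[key] for key in keys)):
        yield dict(zip(keys, values))


cases = list(parameter_grid(OPTIONS))
print(len(cases))
print(cases[0])
```

Each case would then be rendered and compared against its own expected HTML file, so a gap such as the partially named column index cannot go untested.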

I'll make show_col_idx_names a class property for clarity and add a note to refactor and 'inherit' from the DataFrameFormatter class. 'Inherit' is in quotes because the HTMLFormatter class does not directly inherit from DataFrameFormatter. In the first refactor, just use mock inheritance like:

@property
def is_truncated(self):
    return self.fmt.is_truncated
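
The "mock inheritance" idea is plain delegation through a property. A self-contained sketch follows; `FormatterSketch` and `HTMLWriterSketch` are hypothetical stand-ins, not the pandas classes, and the truncation rule is invented for illustration.

```python
class FormatterSketch:
    """Stand-in for the formatter that owns the display state."""

    def __init__(self, n_rows, max_rows):
        self.n_rows = n_rows
        self.max_rows = max_rows

    @property
    def is_truncated(self):
        # Illustrative rule only: truncated when a row limit is exceeded.
        return self.max_rows is not None and self.n_rows > self.max_rows


class HTMLWriterSketch:
    """Holds a formatter (composition) and forwards selected attributes to it."""

    def __init__(self, formatter):
        self.fmt = formatter

    @property
    def is_truncated(self):
        # Delegation: read through to the wrapped formatter on every access.
        return self.fmt.is_truncated


writer = HTMLWriterSketch(FormatterSketch(n_rows=8, max_rows=4))
print(writer.is_truncated)
writer.fmt.max_rows = None
print(writer.is_truncated)
```

Because the property reads through on each access, the writer never holds a stale copy of the formatter's state, which is the main attraction over copying attributes in `__init__`.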

@simonjayhawkins
Member Author

@jreback @WillAyd With the additional parametrization of the truncation tests, we now have test coverage for multi-indexes with more than 2 rows, missing column index names, and truncation with standard row indexes. This coverage allows row_levels to be refactored into a class property in this PR, for use by _write_header and _write_regular_rows. I've added a TODO in _write_hierarchical_rows to refactor after #22887 is fixed.

@simonjayhawkins
Member Author

@WillAyd @jreback Could you please take another look? Thanks.

@jreback
Contributor

jreback commented Dec 28, 2018

I would make a sub-dir of data/html to hold all of this test data (and move the original .html files as well).

@jreback
Contributor

jreback commented Dec 28, 2018

@WillAyd over to you

@jreback jreback added this to the 0.24.0 milestone Dec 28, 2018
@jreback
Contributor

jreback commented Jan 1, 2019

@WillAyd

@WillAyd WillAyd merged commit b9284a2 into pandas-dev:master Jan 1, 2019
@WillAyd
Member

WillAyd commented Jan 1, 2019

Thanks @simonjayhawkins !

@jreback
Contributor

jreback commented Jan 1, 2019

thanks @simonjayhawkins !

thoo added a commit to thoo/pandas that referenced this pull request Jan 1, 2019
* upstream/master:
  BUG: output formatting with to_html(), index=False and/or index_names=False (pandas-dev#22579, pandas-dev#22747) (pandas-dev#22655)
  MAINT: Port _timelex in codebase (pandas-dev#24520)
  Implement unique+array parts of 24024 (pandas-dev#24527)
  Integer NA docs (pandas-dev#23617)
@simonjayhawkins simonjayhawkins deleted the issue22579 branch January 1, 2019 21:00
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
4 participants