
Configurable truncation when rendering text #53

Closed
shoyer opened this issue Jan 9, 2025 · 5 comments
Assignees
Labels
feature-request New feature or request

Comments


shoyer commented Jan 9, 2025

Hierarchical models can create really large trees of objects.

Treescope has nice control for shrinking reprs via using_expansion_strategy(max_height=X), but this only works well for interactive HTML and for displays that never wrap text (e.g., Google Colab). On the command line, long lines wrap around and make reprs unreadable.

It would be really nice if Treescope had some built-in support for truncating such long lines, perhaps by replacing objects with truncated reprs like <dict> or {...}. I would use this for implementing __repr__ methods, e.g., for neural network modules.

To reproduce:

from __future__ import annotations
import dataclasses
import numpy as np
import treescope

@dataclasses.dataclass
class Config:
  batch_size: int = 128
  num_features: int = 64
  height: int = 1024
  width: int = 1024
  model_name: str = 'magic'
  nested_config: Config | None = None

nested = {
    f'x{i}': {
        'foo': np.random.randn(100, 100),
        'bar': {'x': 1111, 'y': 2222, 'z': 3333, 'c': Config(nested_config=Config())},
        'baz': [3] * 7,
    }
    for i in range(5)
}

Default rendering -- fine on GitHub, but unreadable when pasted into a console:

with treescope.using_expansion_strategy(max_height=10):
  print(treescope.render_to_text(nested))
{
  'x0': {'foo': <np.ndarray float64(100, 100) ≈-0.0016 ±1.0 [≥-3.9, ≤3.9] nonzero:10_000>, 'bar': {'x': 1111, 'y': 2222, 'z': 3333, 'c': Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=None))}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x1': {'foo': <np.ndarray float64(100, 100) ≈0.0019 ±1.0 [≥-3.5, ≤3.8] nonzero:10_000>, 'bar': {'x': 1111, 'y': 2222, 'z': 3333, 'c': Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=None))}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x2': {'foo': <np.ndarray float64(100, 100) ≈0.01 ±1.0 [≥-3.5, ≤3.9] nonzero:10_000>, 'bar': {'x': 1111, 'y': 2222, 'z': 3333, 'c': Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=None))}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x3': {'foo': <np.ndarray float64(100, 100) ≈0.003 ±0.99 [≥-3.8, ≤3.8] nonzero:10_000>, 'bar': {'x': 1111, 'y': 2222, 'z': 3333, 'c': Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=None))}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x4': {'foo': <np.ndarray float64(100, 100) ≈0.022 ±1.0 [≥-3.7, ≤3.7] nonzero:10_000>, 'bar': {'x': 1111, 'y': 2222, 'z': 3333, 'c': Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=Config(batch_size=128, num_features=64, height=1024, width=1024, model_name='magic', nested_config=None))}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
}

One very naive attempt at truncation:

def truncate_lines(text):
  return '\n'.join(t[:77] + '...' if len(t) > 80 else t for t in text.split('\n'))

with treescope.using_expansion_strategy(max_height=10):
  print(truncate_lines(treescope.render_to_text(nested)))
{
  'x0': {'foo': <np.ndarray float64(100, 100) ≈-0.0016 ±1.0 [≥-3.9, ≤3.9] non...
  'x1': {'foo': <np.ndarray float64(100, 100) ≈0.0019 ±1.0 [≥-3.5, ≤3.8] nonz...
  'x2': {'foo': <np.ndarray float64(100, 100) ≈0.01 ±1.0 [≥-3.5, ≤3.9] nonzer...
  'x3': {'foo': <np.ndarray float64(100, 100) ≈0.003 ±0.99 [≥-3.8, ≤3.8] nonz...
  'x4': {'foo': <np.ndarray float64(100, 100) ≈0.022 ±1.0 [≥-3.7, ≤3.7] nonze...
}

Example of what a better truncated repr might look like, with the tree truncated at a consistent depth:

{
  'x0': {'foo': <np.ndarray>, 'bar': {...}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x1': {'foo': <np.ndarray>, 'bar': {...}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x2': {'foo': <np.ndarray>, 'bar': {...}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x3': {'foo': <np.ndarray>, 'bar': {...}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
  'x4': {'foo': <np.ndarray>, 'bar': {...}, 'baz': [3, 3, 3, 3, 3, 3, 3]},
}
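A fixed-depth truncation like this could be sketched for plain Python containers with a short recursive helper. This is only an illustrative sketch, independent of Treescope's internals; the `abbreviate` function and its `max_depth` parameter are hypothetical names, not part of any library API:

```python
import numpy as np

def abbreviate(obj, max_depth=2):
    """Return a repr string truncated at a fixed depth (hypothetical sketch).

    Containers deeper than max_depth collapse to placeholders like {...}
    or [...], and arrays always collapse to <np.ndarray>.
    """
    if isinstance(obj, np.ndarray):
        return '<np.ndarray>'
    if isinstance(obj, dict):
        if max_depth <= 0:
            return '{...}'
        items = ', '.join(
            f'{k!r}: {abbreviate(v, max_depth - 1)}' for k, v in obj.items())
        return '{' + items + '}'
    if isinstance(obj, list):
        if max_depth <= 0:
            return '[...]'
        return '[' + ', '.join(abbreviate(v, max_depth - 1) for v in obj) + ']'
    return repr(obj)

print(abbreviate({'x0': {'foo': np.zeros(3), 'bar': {'x': 1}, 'baz': [3] * 3}},
                 max_depth=2))
# → {'x0': {'foo': <np.ndarray>, 'bar': {...}, 'baz': [...]}}
```

Unlike the line-clipping approach above, every line this produces is syntactically complete, so the output stays readable at any terminal width.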
danieldjohnson (Collaborator) commented:

This is a great idea, I've been thinking about doing something like this for a while but never got around to it.

I had some time to take a stab at a basic implementation in #56. It's currently missing tests and I haven't gotten a chance to play around with it much so there may be bugs. I probably won't be able to polish this up for a few weeks, but I figured I'd share now for early feedback.

If you want to try it out you should be able to install it with

pip install treescope@git+https://github.com/google-deepmind/treescope.git@abbreviation_system

danieldjohnson self-assigned this Jan 21, 2025
danieldjohnson (Collaborator) commented:

I actually ran into a design question while prototyping this that I'd appreciate your thoughts on. There are essentially two choices for when to abbreviate:

  1. Abbreviate at a consistent depth (what I implemented)
  2. Abbreviate only if the contents are very long (not implemented)

Option 1 is the easiest to implement. We could approximate option 2 if we were OK with the following heuristic:

If a node could be fully rendered in less than K columns without any abbreviation, don't abbreviate it. Otherwise abbreviate it at fixed depth.
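For plain Python dicts, that heuristic could be sketched roughly as follows. The names here are hypothetical and this is not Treescope's actual layout logic, just an illustration of the width-gated rule:

```python
def render_with_width_heuristic(obj, max_columns=60, max_depth=2):
    """Render obj fully if its repr fits in max_columns characters;
    otherwise abbreviate, collapsing nodes past max_depth to placeholders.
    (Hypothetical sketch, not Treescope's implementation.)"""
    full = repr(obj)
    if len(full) <= max_columns:
        return full  # small enough: render fully, no abbreviation
    if max_depth <= 0:
        # too long and out of depth budget: collapse to a placeholder
        return '{...}' if isinstance(obj, dict) else '<...>'
    if isinstance(obj, dict):
        items = ', '.join(
            f'{k!r}: {render_with_width_heuristic(v, max_columns, max_depth - 1)}'
            for k, v in obj.items())
        return '{' + items + '}'
    return full  # non-container leaf: nothing to recurse into
```

Note how this sketch already exhibits the limitation described below: a node with many children that each individually fit under max_columns still renders on one long line.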

But it would be quite difficult to extend this to the kind of full recursive analysis the layout algorithms currently do, where abbreviation of one node is based on the size of that node after abbreviating its children, with a target width we are trying to reach. (This is because users can change the expand states interactively in HTML mode, which interacts with the abbreviation depth levels, and I don't want to make that logic more complicated or give users extra toggles for abbreviation levels.)

So, I'm curious whether you think it would be useful to implement something like option 2 in this heuristic form. A limitation is that lines might still become "too long" under this rule if a node has many children that are each just short enough to avoid abbreviation.

shoyer (Author) commented Jan 21, 2025:

Daniel, thanks so much for taking a look at this!

It would be fantastic to land a simple version of this soon, which we could start testing out and incrementally improve. Constant depth seems like a totally reasonable place to start.

A limitation would be that lines might still become "too long" under this rule if they have many children and the children are all just short enough to not be abbreviated.

This sounds potentially problematic for the use case I am most concerned about (readable text reprs with line wrapping). So I would lean towards option 1.

danieldjohnson (Collaborator) commented:

Ok, I've merged a simple version of this; it will be included in v0.1.9.

Thanks for the idea and feedback, and let me know if you run into any issues with it!

shoyer (Author) commented Feb 17, 2025 via email
