Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compress #123

Merged
merged 13 commits into from
Feb 4, 2021
Merged

Compress #123

merged 13 commits into from
Feb 4, 2021

Conversation

HoWol76
Copy link
Contributor

@HoWol76 HoWol76 commented Oct 1, 2020

Allows output in compressed form, resolves issue #122

New namelist property: compressed. Defaults to False, if True, arrays are compressed on output.

Changes method _var_strings by two methods:

_compress converts the v_values list to a compressed list where each element is a 2-element list with the number of repetitions and the original element.

_f90comprepr converts each of these individual 2-element lists into the correct compressed format.

Several tests were added for this feature.

Holger Wolff added 6 commits October 1, 2020 13:33
A new property added to namelist: compressed. Defaults to False.

If True, the output of the namelist is compressed by writing
successive identical values of arrays as n*v, where n is the number
of successive values and v is the value.

In order to do so, two new methods were added:

_compress: converts a list of values into a list of list of values,
e.g. [1, 1, 1, 2, 3, 3] -> [[3, 1], [1, 2], [2, 3]]

_f90comprepr: takes the elements of this compressed list and converts
them to their string components. In the above example:
"3*1", "2", and "2*3"

Call to these methods has been added to the _var_strings method
@HoWol76
Copy link
Contributor Author

HoWol76 commented Oct 1, 2020

TODO: No changes have yet been made to the version number or the documentation.

@coveralls
Copy link

coveralls commented Oct 1, 2020

Coverage Status

Coverage decreased (-0.2%) to 99.442% when pulling f47e71d on HoWol76:compress into 7de6c44 on marshallward:master.

@HoWol76
Copy link
Contributor Author

HoWol76 commented Oct 1, 2020

The coverage decrease is not due to my code changes. All my code is covered by unit tests.

@marshallward
Copy link
Owner

marshallward commented Oct 1, 2020

Thanks Holger, this looks like very nice work, and thank you also for adding the tests. I am very happy to merge this.

The code is good, the only real issue I have is with describing it as compression. The Fortran standard often describes as a "repeat" or "repeat factor". For example, the "data-stmt-repeat" is the label used for the repeat counter in a data statement. (Annoyingly, the IO section calls them "r* form" which really doesn't help with naming conventions. My old Metcalf/Reid/Cohen book also calls them "repeat counters". (That name is suspiciously absent from the standard, but nevermind that for now...)

The other reason is that "compression" can also refer to more elaborate forms of data compression. I guess this is a kind of compression, but might not be what comes to mind for most people.

I can make changes here if you are happy with them, or you are welcome to do them yourself.

In the future, we could modify this so that repeat_counter also takes an integer argument, which could represent a minimum count to enable repeated values. But let's leave that for another time.


No worries about the coverage. I think something changed with respect to YAML support on either the Travis or Ubuntu side of things.

@marshallward
Copy link
Owner

Also _compress could probably be replaced with something like this:

list([len(list(x)),i] for i, x in itertools.groupby(values))

@marshallward
Copy link
Owner

I have one more comment which is more of a long-term concern and which need not be addressed right now.

I am a little concerned that we are now implicitly purposing the list-of-list [[r],[c]] as a temporary representation of a repeated counter, while also using also use list-of-lists in other plafces to represent multidimensional arrays. Because we process multidimensional list-of-lists before reaching this point, and because we use the flag to catch the _f90repr call and instead pass to _f90comprepr, I think we avoid the issue of duplicate representation.

I have not been able to find any case which causes problems, and it obviously passes all the current tests, so I don't mind merging this. But we may want to at least add a comment describing the potential conflict in any future work.

Holger Wolff added 2 commits October 2, 2020 10:24
Instead of storing repeat lists as lists of lists,
use a new subclass RepeatValue. This makes it easier to
distinguish between repeat lists and multidimensional lists.

Also, use itertools.groupby to replace the whole _repeat method.
@HoWol76
Copy link
Contributor Author

HoWol76 commented Oct 2, 2020

I've taken your suggestions to heart.

  1. It's now called repeat, not compress. This might still be counterintuitive, as I would call 1, 1, 1 repeating, and 3*1 non-repeating.
  2. I've created a new tiny class RepeatValue that simply stores the value and the times that value is repeated. This also means that we can move this back into the _f90repr method and distinguish by the type.
  3. I'm using itertools as you suggested.
  4. I just noticed that I didn't run the codestyle checker. Let me to this right now.

@HoWol76
Copy link
Contributor Author

HoWol76 commented Oct 2, 2020

Still TODO:

  • Fix version number
  • Update documentation
  • when reading namelist, set repeat property correctly.

@marshallward
Copy link
Owner

Thanks, I like this very much. I think there could also be additional benefit by introducing more of these micro-classes.

I agree that repeat is still a bit counterintuitive. What do you think of repeat_counter? It is heavier than repeat, but I think it says what it means. (Like the other logical flags, I suppose the use_ is implicit...)


Don't worry about versioning or version numbers, I will handle that after this is merged.

@marshallward
Copy link
Owner

I think I will go ahead and merge this, despite a few minor issues (repeat counters with nulls?). This should not effect default behaviour and will only be noticed when this feature is enabled. The other issues like documentation can be sorted out in future PRs. If no objections from you, I'd like to merge it.

@marshallward marshallward merged commit c126d83 into marshallward:master Feb 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants