
Cleanup and simplification of code structure (Suggestion) #4

Merged
merged 2 commits into emdb-empiar:dev from clean-file-attr-loading on Jan 18, 2021

Conversation

hernot
Contributor

@hernot hernot commented Dec 15, 2019

Hi Paul

I finally managed to come up with a suggestion for simplifying and cleaning up the code structure, so that future extensions become simple and easy and dropping Python 2 support is nearly as simple as removing the related code sections from core.py and converting some other places marked with TODO back to proper, clean Python >= 3 code.

Here is a short summary of the changes; the elaborate details are in the commit message.
Admittedly, some of the code may still need comments and doc strings.

  • All parsing is now handled exclusively by the grammar.py file, including the parsing of AmiraMesh data streams.
  • The HyperSurface data structure is parsed along with the file header, removing the asymmetry between HyperSurface and AmiraMesh file trees.
  • HyperSurface data streams are loaded along with the structure information, unless stream loading is set to HEADERONLY or False, in which case they are not loaded at all.
  • Added functions for defining and modifying the global stream-loading behaviour; possible values are HEADERONLY, ONDEMMAND (current default, open to discussion) and IMMEDIATE. The global setting can be overridden by the load_streams argument of the Header and AmiraFile classes.
  • AmiraFile is a subclass of Header, not of Wrapper.
  • Introduced meta array_declarations, which allow proper mapping of the BoundaryCurves, Patches and Surfaces declarations of HyperSurface files.
  • Extended AmiraDispatchProcessor to group lists of array_declarations which share the same base name and differ only by an integer counter, using meta_array_declarations. This can in future be extended with more complex and sophisticated heuristics for deciding whether grouping is valid or should not happen; HxSpreadSheet files would be an example.
  • Improved access to attributes added using add_attr; there is no more need to fall back to __getattr__, except for loading data streams on demand.
  • The parent attribute is now a weak reference which is automatically reset to None when the parent is deleted.
  • Some metaclass magic to flag attributes declared on __slots__ as read-only when accessed from outside the Block class or a subclass.
  • Renamed decodersmodule.cpp to decodersmodule.c, dropping the single static_cast which was the only thing requiring compilation as C++ and which caused a compiler error when compiling for Python 2.
  • All tests should run OK on Python 3 and Python 2, as long as they are still valid; all others are currently commented out. Please verify.
  • Extended the lists of relative imports with an `if __package__:` / `else:` clause switching between the relative import (when imported from the installed ahds package) and the plain import (when run from within the ahds directory for debugging purposes); see the sketch after this list. In contrast to a `try:`/`except:` clause, this works flawlessly for both Python 2 and Python 3.
  • In core.py, changed your assert-based argument checks back to simple if-based checks, because asserts are converted to no-ops when Python is called with the -O or -OO optimization options, and then the checks always "succeed". We should not rely on people playing fair and either using the code properly or not asking Python for optimization.
  • A first suggestion on how to identify and support various ContentTypes, especially for AmiraMesh files; see the suggested implementation using HxSpreadSheet as an example.
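To illustrate the import switching mentioned in the list above, here is a minimal sketch of the pattern; the module and symbol names are examples only, not the exact imports used in ahds:

```
# sketch of the __package__ based import switch; names are illustrative
if __package__:
    # the module is imported as part of the installed ahds package
    from .core import Block
    from .grammar import get_parsed_data
else:
    # the module is run directly from within the ahds source directory (debugging)
    from core import Block
    from grammar import get_parsed_data
```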

@hernot
Contributor Author

hernot commented Dec 15, 2019

Regarding the Travis setup for Python 2: I have no idea what the pywavelets package is used for; it is not in the installation requirements. Is it really needed, or could it be dropped for Python 2?
For Python 3 I fixed it by limiting scikit-image to <= 0.15, one more pin as for Python 2.7. The builds for Python 3.6 and Python 3.7 are working after undoing a temporary direct debugging hack.
For test coverage we should make a plan on how to best cover the new stuff.

Will continue to improve this pull request if I come across something while working on other projects which could be beneficial for ahds or further simplify things.

@hernot
Contributor Author

hernot commented Jan 3, 2020

The list of packages whose latest versions do not support anything below Python 3.5, or even 3.6, has grown again. I fear that by the end of the first quarter of 2020 we will be doing nothing but identifying the latest versions of indirect dependencies that still support Python 2.7. This time Pillow dropped support for Python 2.7 starting with version 7.0.0.

@hernot hernot closed this Feb 11, 2020
@hernot hernot reopened this Feb 11, 2020
@hernot
Contributor Author

hernot commented Feb 11, 2020

Any idea why Travis is not running?

1 similar comment

@hernot
Contributor Author

hernot commented Aug 31, 2020

Not sure why pytest hiccups on Python 3.5; possibly because travis-ci.org is not hosted any more and we should finally switch to travis-ci.com. Whatever the reason is, I do not have any idea.

Apart from that, I added unit tests to achieve 100 % coverage for the core modules; only extra is not tested at all, those tests you will have to write.

There are several lines which I marked with nocover for one of the following reasons:

  1. not reached unless the code is broken elsewhere -> raises an exception if executed
  2. lack of test data to check whether the line is correct or should throw an exception
     if the code is broken -> emits a warning if executed, as a reminder to collect test
     data triggering the line and to record what happened to reach it
  3. exclusively relevant for Python 2 or Python >= 3; marked with cover_py2 or cover_py3

EDIT: pathlib2, required by pytest for Python < 3.6, tries to compare the three-element sys.version_info with a two-element tuple, which is valid and possible on Python 2 but not on Python 3. I fear that fixing this would require opening a related issue on pytest, which would be rejected anyway (see). The alternative is to drop Python 3.5 testing on Travis to avoid pytest switching back to pathlib2 for Python 3.5. How many installations of ahds would you expect to break if it is not tested under Python 3.5? I will remove it for now, as that is the only fix I could come up with.

@paulkorir
Member

Hi Christoph,
I'm very happy to merge these. I would like to suggest some cleanup:

  • squash commits so that we don't have 'oh no!' commits
  • make the commit message helpful

Thanks so much for your contribution.
Paul

@hernot
Contributor Author

hernot commented Sep 18, 2020

Hi Paul,
good to hear from you. Yes, same procedure as every time. Maybe, if I encounter some more things that would make it a less complicated hassle, I will do those in addition. But I think I have everything so far.

Parsing:
========
Collects all code dealing with parsing in grammar.py. This includes the
regular expressions and the code handling the parsing of data streams,
ASCII and binary, for AmiraMesh and HyperSurface files. As a result the
structure of HyperSurface files is now fully parsed before the
get_parsed_data method returns. This removes an annoying asymmetry
between HyperSurface and AmiraMesh files; for both, the header now shows
the full structure.
For extracting binary and ASCII AmiraMesh data streams two new methods
are introduced in grammar.py: next_amiramesh_binary_stream and
next_amiramesh_ascii_stream. The encoded stream data for HyperSurface
files is read and loaded while the file structure is parsed; no second
pass for reading HyperSurface data is used.

Data stream loading:
====================
Added the set_stream_policy and get_stream_policy methods. These allow
selecting the global default setting for loading streams. It can be
overridden by the load_streams argument of the Header and AmiraFile
classes. The following values can be used to control stream loading
globally via set_stream_policy and per instance via the load_streams
attribute.

   HEADERONLY ... Only header data and structural information is loaded;
                  trying to access the data attribute of AmiraDataStream
                  type objects triggers an AttributeError exception.
   ONDEMMAND .... Load the stream data as soon as the data attribute of
                  a specific AmiraDataStream is accessed for the first
                  time. The stream data of any stream encountered while
                  searching for the targeted one is loaded implicitly.
                  (*) (**)
   IMMEDIATE .... Load the stream data immediately during creation of the
                  Header or AmiraFile object. (*) (**)
   (*) For HyperSurface files ONDEMMAND and IMMEDIATE expose the same
       behaviour (see above) and can be considered synonymous.
   (**) Decoding of data streams happens in any case on first access of
        the data attribute.

The load_streams attribute of the Header and AmiraFile classes may in
addition be set to the following values:
   True ......... Selects IMMEDIATE for the specific instance
   False ........ Selects HEADERONLY for the specific instance
   None ......... Selects the global default setting as returned by the
                  get_stream_policy method
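A minimal usage sketch of the policy interface described above; the
import locations and the file name are assumptions for illustration only:

```
import ahds
# assumed import locations for the policy helpers and constants; adjust
# to the actual module layout
from ahds import set_stream_policy, get_stream_policy, ONDEMMAND

set_stream_policy(ONDEMMAND)        # global default for all new instances
print(get_stream_policy())          # reports the currently active policy

# per-instance override of the global policy via load_streams
header_only = ahds.AmiraFile('example.am', load_streams=False)  # HEADERONLY
eager = ahds.AmiraFile('example.am', load_streams=True)         # IMMEDIATE
lazy = ahds.AmiraFile('example.am', load_streams=None)          # global default
```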

AmiraFile:
==========
The AmiraFile class is now a subclass of the Header class.

Meta array_declarations:
========================
In order to simplify parsing and the creation of the Header structure of
HyperSurface files, parsing prepends so-called meta_array_declarations to
the array_declarations structure. Their array_dimension is set to the
number of BoundaryCurves, Patches or Surfaces, as for any other array
declaration. In addition they have an array_blocktype field reading
'list' which is used by the header load_declarations method to create a
ListBlock instead of a normal Block attribute. Further, for every
following BoundaryCurve, Patch and Surface an additional declaration is
added for which the array_dimension is set to 0, the additional
array_parent field references the corresponding meta_array_declaration
and the additional array_itemid field hosts the item index at which the
load_declarations method shall insert it into the corresponding
BoundaryCurves, Patches or Surfaces ListBlock. The Block for each
BoundaryCurve, Patch and/or Surface is temporarily added to the Header
class instance __dict__ with its name prefixed by the '_@' sequence;
this ensures that it can only be found and retrieved using getattr and
not the '.' operator. The load_definitions method is extended such that
it also tests for hidden attributes prefixed by '_@'.
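Schematically, a grouped pair of declarations produced this way could
look like the following sketch; the array_dimension, array_blocktype,
array_parent and array_itemid fields are taken from the description
above, while the key holding the name is assumed for illustration:

```
# illustrative sketch only: a meta declaration for three Patches and the
# declaration of the first Patch referencing it
meta_patches = {
    'array_name': 'Patches',       # key name assumed for this sketch
    'array_dimension': 3,          # number of Patch blocks in the file
    'array_blocktype': 'list',     # load_declarations creates a ListBlock
}
patch_1 = {
    'array_name': 'Patch1',        # key name assumed for this sketch
    'array_dimension': 0,
    'array_parent': meta_patches,  # links back to the meta declaration
    'array_itemid': 0,             # insertion index within the ListBlock
}
```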

Extended the AmiraDispatchProcessor class such that the array_declaration
method now applies ContentType-specific filters to every array
declaration. Such a filter would, for example, check the name of an
array_declaration for trailing digits which could form a valid
consecutive counter, such as __Column1, __Column2, ... __ColumnN used
by HxSpreadSheet type AmiraMesh files.
If the filter finds a match, the triggering array_declaration is extended
as described above for the HyperSurface file structure.
In addition, on the first hit a meta_array_declaration is created. It
additionally contains a field counting the number of array_declarations
belonging to the same set of array_declarations forming the
meta_array_declaration.
If the final ContentType of the file corresponds to the ContentType
assumed by the filter function, the meta_array_declaration is prepended
to the list of array_declarations, and discarded otherwise.

A meta_array_declaration is not prepended to the list of
array_declarations unless a corresponding ContentType parameter entry is
encountered and the meta_array_declaration refers to more than one
array_declaration read from the file. All other collected
meta_array_declarations are ignored.

Blocks representing meta arrays such as Patches, Surfaces and
HxSpreadSheet have to register a dedicated filter function. This
function has to return a tuple containing
```
(base_name, array_index, block_type, extra_meta_data, extra_declaration_data)
```
in order to indicate that the inspected array_declaration shall be
linked to the meta_array_declaration representing the specified
ContentType.
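As an illustration, a hedged sketch of such a filter function for
counter-suffixed array names follows; the declaration is treated as a
plain dict, and the 'array_name' key as well as the "no match means
return None" convention are assumptions of this sketch, not the actual
ahds interface:

```
import re

# counter-suffixed names such as __Column1, __Column2, ... __ColumnN
_COUNTER = re.compile(r'^(?P<base>.+?)(?P<index>\d+)$')

def column_filter(array_declaration):
    """Sketch of a ContentType specific array_declaration filter."""
    match = _COUNTER.match(array_declaration['array_name'])   # key assumed
    if match is None:
        return None   # not part of a meta array (convention assumed)
    return (
        match.group('base'),         # base_name of the meta_array_declaration
        int(match.group('index')),   # array_index of this child declaration
        'list',                      # block_type used for the resulting ListBlock
        {},                          # extra_meta_data for the meta declaration
        {},                          # extra_declaration_data for this declaration
    )
```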

As a first example, the HxSpreadSheet ContentType and the hxsurface
ContentType assumed for HyperSurface files are implemented. The
AmiraSpreadSheet block provides a static array_declaration filter
function which returns the constant string 'Sheet1' instead of simply
stripping the counter and returning the base_name. The aim is to
demonstrate that filter functions may not only identify
array_declarations which shall be grouped together into a meta array,
but can also specify the name of this meta array, rename the child
arrays by modifying the inspected array declaration, and inject
additional parameters into any of the array_declarations even when not
triggering the creation of a meta_array_declaration.

For all other AmiraMesh files the ContentType and the corresponding
array_declarations still have to be captured and the appropriate filter
functions implemented, just like for other explicit ContentTypes.

Dynamic attributes created by add_attr
======================================
All dynamic attributes are now accessible through a custom data
descriptor which is created and maintained by the add_attr method and
automatically vanishes from the Block class hierarchy as soon as the
last Block type object hosting the corresponding attribute is deleted or
the attribute of that specific instance is deleted or renamed. This
makes calls to __getattr__, including the repeated AttributeError raised
by the builtin object.__getattribute__ method, obsolete. The data
descriptor is configured such that it behaves like an instance
attribute. As a consequence, an attribute which was added to an instance
of a Block type class using setattr can not be added to it using
add_attr, and vice versa. Attributes whose names are listed on the
__slots__ special attribute of the class or any of its base classes can
not be added to an instance using add_attr in any case.
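A short usage sketch of the descriptor-backed dynamic attributes; the
import path and the exact call signature of add_attr are assumed here:

```
from ahds.core import Block   # import path assumed for this sketch

materials = Block('Materials')
exterior = Block('Exterior')
materials.add_attr('Exterior', exterior)   # backed by the custom data descriptor
assert materials.Exterior is exterior      # plain attribute access, no __getattr__ fallback
# the descriptor disappears from the Block class hierarchy again once the
# last Block instance carrying an 'Exterior' attribute is deleted
```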

parent attribute:
=================
The parent attribute is now a weak reference which switches back to None
as soon as the parent Block type class instance is deleted or garbage
collected. The parent attribute has two interfaces: a public one named
'parent' and a protected one named '_parent'. Like the read-only
attributes below, the 'parent' attribute is read-only and may only be
modified by assigning to the '_parent' attribute.

Read only attributes:
=====================
Replaced @property accessor methods which merely access protected
__slots__ attributes, with little other logic, by a
read_only_datadescriptor which directly guards the corresponding public
attribute. From within the Block type class and its subclasses it can
still be freely assigned through the protected __slots__ attribute. For
example, the public 'name' attribute allows reading the name of the
block, but assigning a new name to the Block is only possible by
assigning to the '_name' attribute.
To flag a __slots__ attribute as read-only, its name has to be decorated
with the READONLY method when listed on the __slots__ attribute of the
new Block type class. The rest is handled by some metaclass magic, which
also ensures that the READONLY method is available from within the Block
class declaration code.
For details see the core.py file.

NOTE: Protected, private and special attributes, whose names by
convention start with at least one leading '_', are ignored by the
READONLY decorator.
NOTE: In Python 2, due to the lack of the __prepare__ method, READONLY is
injected into the list of __builtins__ methods and functions.
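A sketch of how a read-only slot would be declared under this scheme,
based purely on the description above; the class, slot and import names
are made up:

```
from ahds.core import Block   # import path assumed for this sketch

# READONLY is made available inside the class body by the metaclass
# (via __prepare__ on Python 3)
class Vertices(Block):
    __slots__ = (READONLY('length'),)

    def __init__(self, name, length):
        super(Vertices, self).__init__(name)
        self._length = length    # assignable through the protected slot name

v = Vertices('Vertices', 42)
print(v.length)                  # 42: public, read-only access
# v.length = 0                   # would fail: read-only outside the class hierarchy
```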

Byte-RLE decoders:
==================

Dropped the state machine used by the byterle_decoder methods to decode
hxbyterle streams, which switched between reading the counter, expanding
an RLE-encoded block and copying a non-compressed block. The
implementation now resorts to a memcpy- and memset-based approach for
the native C decoder function and to slicing for the slow pure-Python
implementation. Both changes yield a significant speedup.
The arguments accepted by both implementations can also be passed as
keywords and have been adjusted to allow using the byterle_decoder
methods interchangeably with the numpy.frombuffer method, at least when
called with the count and dtype keyword arguments only.
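A small sketch of the keyword-interchangeable use described above; the
import path and the helper function are assumptions:

```
import numpy
from ahds.decoders import byterle_decoder   # import path assumed

def stream_to_array(raw_bytes, dtype, count, hxbyterle_encoded):
    """Decode a raw data stream into a numpy array (illustrative helper)."""
    if hxbyterle_encoded:
        # same keyword interface as numpy.frombuffer (dtype and count)
        return byterle_decoder(raw_bytes, dtype=dtype, count=count)
    return numpy.frombuffer(raw_bytes, dtype=dtype, count=count)
```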

Python & packaging:
===================

Renamed decodersmodule source:
------------------------------
As I got a gcc compiler error when calling python setup.py build, about
"ISO C++ forbids" something or other, I renamed the decodersmodule.cpp
file to decodersmodule.c. It was a cpp file only to be able to use
static_cast in one single place; a plain C cast works there too.

Pytest:
------
Improved the pytest unit-test collection required to properly test and
verify the suggested approach. All AmiraMesh and HyperSurface files
necessary for proper testing are included.

Added warnings for code paths which currently can not be tested, or
checked whether they are required at all, due to the lack of appropriate
test data. These code paths are hidden from the coverage check using the
nocover pragma.

Introduced a dedicated coverage configuration file to ensure coverage
does not report code paths which are not supported by Python 2 and thus
have to be ignored by coverage. Code paths only relevant for Python 3
are marked with the cover_py3 pragma and those only relevant for Python
2 are marked with the cover_py2 pragma. Updated tox.ini, .coveragerc and
.coverage_27 accordingly. tox.ini is also used to configure pytest.
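A sketch of how the pragma handling could look in the two coverage
configuration files; the exact pragma strings and section contents in
the pull request may differ:

```
# .coveragerc (Python 3 runs): hide untestable and Python-2-only lines
[report]
exclude_lines =
    pragma: nocover
    pragma: cover_py2

# .coverage_27 (Python 2.7 runs): hide untestable and Python-3-only lines
[report]
exclude_lines =
    pragma: nocover
    pragma: cover_py3
```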

Travis:
-------
Switched to tox-travis for seamless interaction between tox and Travis
and to make the pytest and pytest-cov changes take effect on Travis too.

Python 2.7:
-----------
Additional scikit-image requirements:
   PyWavelets<=1.0.3, Pillow<=6.2.1, kiwisolver<=1.1.0
   total: 9

A minor mocking fix for object.__dir__, which was introduced in Python 3
but caused an exception on Python 2 for the overloaded __dir__ method
filtering file-defined attributes which are not hosted by the inspected
Block instance.

Python 3.5:
-----------
Additional scikit-image requirements:
   scikit-image<=0.15, scipy<=1.4
   total: 3

Disabled pytest and pytest-cov:
Disabling was necessary because the pytest team decided to switch back
to the pathlib2 Python 2 backport on Python 3.5, as the builtin pathlib
does not meet their requirements until Python 3.6. At the same or some
later time the pathlib2 team decided not to care any more whether
pathlib2 is usable under Python 3, as Python 3 versions have native
pathlib anyway. These two opposite decisions lead to pytest being
broken, at least on Travis, for Python 3.5. The reason is that while in
Python 2 the comparison of a list and a tuple is possible, in Python 3
the two can not be compared, and pathlib2 does exactly that when
checking whether it has to apply some compatibility handling for
Python <= 2.7. The offending line is
    ```sys.version_info <= (2,7)```
which causes a "TypeError: unorderable types" in Python 3.5.

Thus I suggest not to test on Python 3.5 anymore and to rely on testing
on >= 3.6 being sufficient. In the worst case drop Python 3.5 support,
if it turns out not to be.
@hernot hernot force-pushed the clean-file-attr-loading branch from 3b452d4 to ccf1ea7 on September 18, 2020 20:33
@hernot
Contributor Author

hernot commented Sep 18, 2020

Unless Travis finds some packages which have meanwhile dropped Python 2 or Python 3.5 support, the squashing should be done and the commit message should be updated accordingly.

… proper build of byterle_decoder from C source in the tox virtual environment with pytest. The old structure caused the source tree, and not the installed package, to be imported.
@paulkorir paulkorir merged commit 71a1477 into emdb-empiar:dev Jan 18, 2021
@hernot
Contributor Author

hernot commented Jan 18, 2021

Thank you very much, even though I meant that you would just update dev with your changes from master related to Travis moving to .com only, so that I could then amend the pull request before you finally merge it.

@paulkorir
Member

We ran out of free Travis credits and I don't have the bandwidth to figure it out. Therefore, I've got things working with GitHub Actions. We can then unmerge your changes, and you can simply copy the .github folder and make the necessary changes to get the tests running. Is that OK?

@hernot
Contributor Author

hernot commented Jan 19, 2021

Sure, it would have been easier to just put the file in dev, because then I would have fetched the changes and force-pushed them back to my fork, but no worry. I have already fetched the updated master and the updated dev, so is the unmerge still necessary, or which files did you change to make GitHub Actions work? Maybe I can merge over to dev on my fork and then open a pull request against upstream dev. That would presumably be easier than unmerging again.

@paulkorir
Member

That's OK. I'll find a workaround.
