Inefficient order of checks in `verdi import` #2979

sphuber · 2019-06-08T15:43:54Z

verdi import currently first extracts the node data before checking the metadata. When trying to import a big archive with an old version, one first has to wait for the data to be extracted before getting the message that the version is incompatible. Only then does it go automatically to verdi export migrate, but there the same problem happens. Version compatibility happens after all the data is extracted. The Archive class provides useful context managers to read just a single file. This should be used in both these commands to first read the metadata.json and report any errors if it is incompatible before doing anything else.

The text was updated successfully, but these errors were encountered:

ltalirz · 2020-07-06T10:20:09Z

Other issues/feature requests:

verdi export inspect has a way of looking inside the metadata and we should simply be reusing this for verdi export migrate (I just checked and it doesn't seem to be terribly fast either; perhaps the same mistake is made there)
For archives that already are at the latest version, verdi export migrate currently fails with:

$ verdi export migrate aiida_tutorial_2020_07_perovskites_v0.9.aiida out.aiida
Critical: Your export file is already at the version 0.9

This is not the behavior you want e.g. when you try migrating files automatically - this should be considered a Success

I would like to see an -i/--in-place flag to migrate the archive in place

Since we're about to migrate a number of archives for Materials Cloud, I will have a brief look into this.

ltalirz · 2020-07-06T10:41:05Z

The issue seems to be here:

aiida-core/aiida/tools/importexport/common/archive.py

Lines 213 to 222 in 9d918e0

    
               @ensure_within_context 
        
               @ensure_unpacked 
        
               def _read_json_file(self, filename): 
        
                   """Read the contents of a JSON file from the unpacked archive contents. 
        
                   :param filename: the filename relative to the sandbox folder 
        
                   :return: a dictionary with the loaded JSON content 
        
                   """ 
        
                   with open(self.folder.get_abs_path(filename), 'r', encoding='utf8') as fhandle: 
        
                       return json.load(fhandle)

The @ensure_unpacked decorator extracts the whole archive although we would just like to read a single file.

The Archive class provides useful context managers to read just a single file.

@sphuber Could you please point me to this context manager?

sphuber · 2020-07-06T10:45:36Z

That phrase might be a bit misleading. It does allow you to retrieve the contents of a single file, but it still requires unpacking the whole thing. Currently, AiiDA archives can be either .zip or .tar.gz archives, and I am not sure if it is possible to even extract just a single file out of those. I think for .zip files it is, but not sure for tarballs.

ltalirz · 2020-07-06T11:02:13Z

I am not sure if it is possible to even extract just a single file out of those.

It certainly is possible, both for .zip and for .tar.gz files.

Python syntax:

tarfile.open('tarfile.tar').extract('file/to/extract', path='dest')

for zipfile see https://stackoverflow.com/a/17729939/1069467

So it seems that we should generalize the extraction code a bit to allow extracting single files and then let this functionality bubble up here...

ltalirz · 2020-08-26T11:33:53Z

Just to mention that

Calling verdi export migrate on an up-to-date file should return exit code 0 #4322 was opened for the error when trying to migrate up-to-date archives
Add an --in-place flag for verdi export migrate #4233 was opened for the --in-place flag

I.e. we can keep discussions on this thread to speed issues in inspecting/loading archive files.

ltalirz · 2020-10-29T14:47:46Z

Thanks to #4510, the archive reader is now just reading (and caching) the files that are needed.

In particular, the version check will just read the metadata

aiida-core/aiida/tools/importexport/archive/readers.py

Lines 123 to 147 in 2f8e845

    
               def check_version(self): 
        
                   """Check the version compatibility of the archive. 
        
                   :raises: `~aiida.tools.importexport.common.exceptions.IncompatibleArchiveVersionError`: 
        
                       If the version is not compatible 
        
                   """ 
        
                   file_version = StrictVersion(self.export_version) 
        
                   expected_version = StrictVersion(self.compatible_export_version) 
        
                   try: 
        
                       if file_version != expected_version: 
        
                           msg = f'Archive file version is {file_version}, can read only version {expected_version}' 
        
                           if file_version < expected_version: 
        
                               msg += "\nUse 'verdi export migrate' to update this archive file." 
        
                           else: 
        
                               msg += '\nUpdate your AiiDA version in order to import this file.' 
        
                           raise IncompatibleArchiveVersionError(msg) 
        
                   except AttributeError: 
        
                       msg = ( 
        
                           f'Archive file version is {self.export_version}, ' 
        
                           f'can read only version {self.compatible_export_version}' 
        
                       ) 
        
                       raise IncompatibleArchiveVersionError(msg)

At the moment verdi export inspect defaults to a statistics overview that requires loading the data.json as well.
This is certainly useful but I propose to change the default to a mode that only requires reading the metadata.json (e.g. printing just the export version and the AiiDA version that was used to generate it.
Opened #4531 for this.

sphuber added topic/verdi priority/nice-to-have type/enhancement topic/archive aiida-core 1.x labels Jun 8, 2019

sphuber removed the aiida-core 1.x label Jun 8, 2020

ltalirz closed this as completed Oct 29, 2020

ltalirz mentioned this issue Oct 29, 2020

Refactor Import Archive #4510

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inefficient order of checks in `verdi import` #2979

Inefficient order of checks in `verdi import` #2979

sphuber commented Jun 8, 2019

ltalirz commented Jul 6, 2020

ltalirz commented Jul 6, 2020 •

edited

Loading

sphuber commented Jul 6, 2020

ltalirz commented Jul 6, 2020

ltalirz commented Aug 26, 2020

ltalirz commented Oct 29, 2020 •

edited

Loading

Inefficient order of checks in verdi import #2979

Inefficient order of checks in verdi import #2979

Comments

sphuber commented Jun 8, 2019

ltalirz commented Jul 6, 2020

ltalirz commented Jul 6, 2020 • edited Loading

sphuber commented Jul 6, 2020

ltalirz commented Jul 6, 2020

ltalirz commented Aug 26, 2020

ltalirz commented Oct 29, 2020 • edited Loading

Inefficient order of checks in `verdi import` #2979

Inefficient order of checks in `verdi import` #2979

ltalirz commented Jul 6, 2020 •

edited

Loading

ltalirz commented Oct 29, 2020 •

edited

Loading