Address Issue 27 - Selective processing and uploading of articles and collections mentioned in the command-line argument #45

Merged: 13 commits, Jun 3, 2023

pavithraarizona (Contributor) commented Jun 2, 2023

This pull request must be merged after pull request #44. Merging out of order will cause errors during code execution.

This pull request adds the following functionality:

  1. Accepting specific article and collection IDs from a command-line argument was already implemented in #43 (Enhance ReBACH to accept specific article and collection IDs for selective processing). This pull request adds the code that processes and uploads only those articles and collections.

See the commits titled 'Add processing and upload of collections based on IDs' and 'Add processing and upload of articles based on IDs' for the exact changes and their locations.

  2. Currently, the required space is calculated by summing the 'curation folder size' and the 'total size of all articles', where the 'curation folder size' covers all curation folders. When article IDs are provided explicitly, only the curation folder(s) matching those IDs need to be counted. This pull request adds that refinement, which gives a more accurate estimate of the required space.

See the commits titled 'Capture curation folder names for matched articles' and 'Calculate curation folder size based on matched curation folders' for the exact changes and their locations.

  3. Code has been added to count and display the number of articles for which a curation folder exists and the number for which it does not.

See the commits titled 'Add code to count and display matched and unmatched article Ids' and 'Add log messages for tracking total size of articles, curation folder size, and required space' for the exact changes and their locations.

Add code to filter and process only the collections whose IDs are explicitly provided via the command-line argument.
1. Add code to filter and process only the articles whose IDs are explicitly provided.
2. Create a variable that indicates whether article IDs were passed explicitly. The value 'True' means they were.
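The filtering described above could look roughly like the following. This is a minimal sketch, not the code from the commit: the function name `filter_by_ids`, the `"id"` key, and the returned flag are all hypothetical stand-ins for whatever the project actually uses.

```python
from typing import Optional


def filter_by_ids(items: list[dict], ids: Optional[list[int]]) -> tuple[list[dict], bool]:
    """Return the items to process and a flag saying whether IDs were passed explicitly."""
    # Hypothetical flag mirroring the "explicitly passed" variable described above.
    ids_passed = ids is not None
    if not ids_passed:
        # No IDs on the command line: process everything.
        return items, False
    wanted = {int(i) for i in ids}
    # Keep only the items whose ID was requested; unknown IDs are simply ignored.
    selected = [item for item in items if int(item["id"]) in wanted]
    return selected, True


articles = [{"id": 101}, {"id": 102}, {"id": 103}]
selected, explicit = filter_by_ids(articles, [102, 103, 999])
everything, not_explicit = filter_by_ids(articles, None)
```

The same helper would work for collections, assuming they expose an ID in the same way.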
1. Add code to capture curation folder names for the articles that have been matched.
2. If a curation folder exists for a matched article, the folder name is stored in the 'self.matched_curation_folder_list' variable.
3. This variable is used later (in the process_articles function): when article IDs are passed explicitly, the curation folder size is calculated from the matched curation folders only.
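As an illustration of the matching step, the sketch below pairs article IDs with curation folder names. It assumes, purely for the example, that a curation folder name ends in `_<article_id>`; the real naming convention and the `match_curation_folders` helper are not taken from the commit.

```python
import os
import tempfile


def match_curation_folders(curation_root: str, article_ids: list[int]) -> tuple[dict[int, str], list[int]]:
    """Map each article ID to its curation folder name, and list the IDs with no folder."""
    folder_names = [entry.name for entry in os.scandir(curation_root) if entry.is_dir()]
    matched: dict[int, str] = {}
    unmatched: list[int] = []
    for article_id in article_ids:
        # Assumed naming convention: "<anything>_<article_id>".
        name = next((n for n in folder_names if n.endswith(f"_{article_id}")), None)
        if name is None:
            unmatched.append(article_id)
        else:
            matched[article_id] = name
    return matched, unmatched


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "Smith_101"))
    os.mkdir(os.path.join(root, "Lee_102"))
    matched, unmatched = match_curation_folders(root, [101, 103])
    # Plays the role of 'self.matched_curation_folder_list' in the description above.
    matched_curation_folder_list = list(matched.values())
```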
Currently, the required space is calculated by summing the curation folder size and the total size of all articles, where the curation folder size covers all curation folders. When article IDs are provided explicitly, only the curation folder(s) matching those IDs should be counted toward the 'curation folder size'. This commit adds that refinement, which gives a more accurate estimate of the required space.
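The calculation can be sketched as follows. The helpers `folder_size` and `required_space` are hypothetical names, and the layout (one subfolder per article under a curation root) is an assumption made for the example.

```python
import os
import tempfile


def folder_size(path: str) -> int:
    """Total size in bytes of all files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total


def required_space(curation_root, total_article_size, matched_folders=None):
    """Curation folder size plus article size, in bytes."""
    if matched_folders is None:
        # No explicit IDs: count every curation folder.
        curation_size = folder_size(curation_root)
    else:
        # Explicit IDs: count only the folders matched to those articles.
        curation_size = sum(
            folder_size(os.path.join(curation_root, name)) for name in matched_folders
        )
    return curation_size + total_article_size


with tempfile.TemporaryDirectory() as root:
    for name, size in (("Smith_101", 10), ("Lee_102", 20)):
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, "data.bin"), "wb") as fh:
            fh.write(b"x" * size)
    selective = required_space(root, 100, ["Smith_101"])
    full = required_space(root, 100)
```

With one 10-byte and one 20-byte curation folder and 100 bytes of articles, the selective estimate counts only the matched folder, while the default counts both.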
Add log messages for tracking total size of articles, curation folder size, and required space
Implement code to count and display matched and unmatched article IDs after performing the curation folder check for articles.
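A rough sketch of such reporting, using Python's standard `logging` module. The logger name, message wording, and the `summarize` helper are assumptions for illustration; the project's own logger and messages may differ.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("rebach_sketch")  # hypothetical logger name


def summarize(matched, unmatched, total_article_size, curation_folder_size):
    """Log the match counts and sizes, and return them for further use."""
    required = total_article_size + curation_folder_size
    log.info("Articles with a curation folder: %d", len(matched))
    log.info("Articles without a curation folder: %d", len(unmatched))
    log.info("Total size of articles: %d bytes", total_article_size)
    log.info("Curation folder size: %d bytes", curation_folder_size)
    log.info("Required space: %d bytes", required)
    return {
        "matched": len(matched),
        "unmatched": len(unmatched),
        "required_space": required,
    }


# Empty lists are handled naturally because len() of an empty list is 0.
summary = summarize([101, 102], [103], total_article_size=100, curation_folder_size=30)
empty_summary = summarize([], [], total_article_size=0, curation_folder_size=0)
```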
@pavithraarizona pavithraarizona self-assigned this Jun 2, 2023
@pavithraarizona pavithraarizona marked this pull request as draft June 2, 2023 22:24
@pavithraarizona pavithraarizona changed the title from "Issue 27" to "Adress Issue 27" Jun 3, 2023
@pavithraarizona pavithraarizona changed the title from "Adress Issue 27" to "Address Issue 27 - Selective processing and uploading of articles and collections mentioned in the command-line argument" Jun 3, 2023
1. The parameter 'total_file_size' passed to the function '__initial_process' has been removed since it is not utilized within the function.
2. To prevent errors when the values for 'matched_articles' and 'unmatched_articles' variables are empty, their declarations have been adjusted. 
3. Lint errors have been fixed
Fix lint errors
fix lint errors
fix lint errors
@pavithraarizona pavithraarizona marked this pull request as ready for review June 3, 2023 04:30
@zoidy zoidy assigned zoidy and unassigned pavithraarizona Jun 3, 2023
zoidy added 3 commits June 3, 2023 18:56
Avoid manual parsing of arguments via the get_id_list function (which is now removed). Pass in article ids to functions instead
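One way to avoid manual argument parsing is to let `argparse` convert the ID list and then pass the result into the processing functions. The sketch below assumes a comma-separated `--ids` option; that flag name, the `id_list` converter, and the `process_articles` stub are hypothetical and not taken from the commit.

```python
import argparse
from typing import Optional


def id_list(value: str) -> list[int]:
    """argparse type converter: '101, 102' -> [101, 102]."""
    try:
        return [int(part) for part in value.split(",") if part.strip()]
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"invalid ID list: {value!r}") from exc


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Selective processing sketch")
    # Hypothetical option name; None means "no IDs given, process everything".
    parser.add_argument("--ids", type=id_list, default=None,
                        help="comma-separated article or collection IDs")
    return parser


def process_articles(article_ids: Optional[list[int]]) -> str:
    """Stub showing IDs passed in as a parameter rather than re-parsed from argv."""
    if article_ids is None:
        return "processing all articles"
    return f"processing {len(article_ids)} selected article(s)"


args = build_parser().parse_args(["--ids", "101, 102"])
selected_message = process_articles(args.ids)
default_args = build_parser().parse_args([])
default_message = process_articles(default_args.ids)
```

Passing the parsed IDs as a function argument keeps the processing code independent of how the command line is structured.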
@zoidy zoidy merged commit 619cbbd into main Jun 3, 2023
@zoidy zoidy deleted the Issue_27 branch June 4, 2023 13:12
Linked issue: Selective processing and uploading of articles and collections mentioned in the command-line argument