
Update/operation tmp dirs #81


Merged
merged 6 commits into from Apr 14, 2025
39 changes: 23 additions & 16 deletions datalab/datalab_session/data_operations/data_operation.py
@@ -23,22 +23,6 @@ def __init__(self, input_data: dict = None):
self.cache_key = self.generate_cache_key()
self.temp = settings.TEMP_FITS_DIR # default fallback

try:
tmp_hash_path = os.path.join(self.temp, self.cache_key)
# If tmp dir already exists, append a random hash to avoid collision
if os.path.exists(tmp_hash_path):
tmp_hash_path = os.path.join(tmp_hash_path, hashlib.sha256(os.urandom(8)).hexdigest())

os.makedirs(tmp_hash_path)
self.temp = tmp_hash_path
except Exception as e:
log.warning(f"Failed to create temp dir for operation {self.cache_key}: {e} using default {self.temp}")

def __del__(self):
""" Clear the tmp dir for the operation """
if self.temp and os.path.exists(self.temp):
shutil.rmtree(self.temp)

def _normalize_input_data(self, input_data):
if input_data == None:
return {}
@@ -72,6 +56,29 @@ def operate(self):
It should periodically update the percent completion during its operation.
It should set the output and status into the cache when done.
"""

def allocate_operate(self):
"""
Wraps the operate() method, creates a unique temp directory for the operation
"""
# Create the temp directory for the operation
try:
tmp_hash_path = os.path.join(self.temp, self.cache_key)
# If tmp dir already exists, append a random hash to avoid collision
if os.path.exists(tmp_hash_path):
tmp_hash_path = os.path.join(tmp_hash_path, hashlib.sha256(os.urandom(8)).hexdigest())
Contributor:

This shouldn't be possible since it's just the cache_key, which should ensure we are only operating on it once... But I guess it's fine to have as extra protection.

Contributor Author (@LTDakin, Apr 14, 2025):

I was thinking of the case where someone runs the same operation on another computer at the same time, e.g. Median on the same two images. I thought it might be more likely in a classroom setting. It is a very rare edge case, though.


os.makedirs(tmp_hash_path)
self.temp = tmp_hash_path
except Exception as e:
log.warning(f"Failed to create temp dir for operation {self.cache_key}: {e} using default {self.temp}")

# Run the operation
self.operate()

# Clean up the temp directory
if self.temp and os.path.exists(self.temp):
shutil.rmtree(self.temp)

def perform_operation(self):
""" The generic method to perform the operation if its not in progress """
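The lifecycle the diff introduces — allocate a unique per-operation temp directory (with a random-hash suffix if the cache-key directory already exists), run the operation, then remove the directory — can be sketched roughly as follows. This is an illustrative stand-in, not the project's real class: `SketchOperation` and its constructor arguments are hypothetical, and only the stdlib is used.

```python
# Minimal sketch of the allocate_operate() temp-dir lifecycle (stdlib only).
# SketchOperation is a hypothetical stand-in for the project's operation class.
import hashlib
import os
import shutil
import tempfile


class SketchOperation:
    def __init__(self, cache_key: str, base_dir: str):
        self.cache_key = cache_key
        self.temp = base_dir  # default fallback, as in the PR

    def operate(self):
        # Placeholder for the real work; subclasses would override this.
        with open(os.path.join(self.temp, "result.txt"), "w") as f:
            f.write("done")

    def allocate_operate(self):
        try:
            tmp_hash_path = os.path.join(self.temp, self.cache_key)
            # If the dir already exists (e.g. the same operation started
            # elsewhere at the same time), nest a random hash to avoid
            # colliding with the other run's files.
            if os.path.exists(tmp_hash_path):
                tmp_hash_path = os.path.join(
                    tmp_hash_path, hashlib.sha256(os.urandom(8)).hexdigest()
                )
            os.makedirs(tmp_hash_path)
            self.temp = tmp_hash_path
        except OSError:
            pass  # fall back to the shared default directory

        try:
            self.operate()
        finally:
            # Remove the per-operation directory when done.
            if self.temp and os.path.exists(self.temp):
                shutil.rmtree(self.temp)
```

One design note: the sketch wraps the cleanup in `try/finally` so the directory is removed even when `operate()` raises; the merged code removes it after `operate()` returns, so a raising operation would leave its temp directory behind until some other cleanup runs.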
2 changes: 1 addition & 1 deletion datalab/datalab_session/tasks.py
@@ -25,7 +25,7 @@ def execute_data_operation(data_operation_name: str, input_data: dict):
raise NotImplementedError("Operation not implemented!")
else:
try:
operation_class(input_data).operate()
operation_class(input_data).allocate_operate()
except ClientAlertException as error:
log.error(f"Client Error executing {data_operation_name}: {error}")
operation_class(input_data).set_failed(str(error))
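The tasks.py change routes task execution through `allocate_operate()` instead of calling `operate()` directly. A minimal sketch of that dispatch shape, under the assumption of a simple name-to-class registry (the registry, `NoopOperation`, and the exception class here are hypothetical stand-ins for the project's real ones):

```python
# Sketch of the tasks.py dispatch pattern; names below are illustrative.
import logging

log = logging.getLogger(__name__)


class ClientAlertException(Exception):
    """Illustrative stand-in for the project's client-facing error type."""


class NoopOperation:
    """Hypothetical operation class; real ones implement the full lifecycle."""

    def __init__(self, input_data: dict):
        self.input_data = input_data
        self.failed_message = None

    def allocate_operate(self):
        # Real implementations create a temp dir, call operate(), clean up.
        pass

    def set_failed(self, message: str):
        self.failed_message = message


# Hypothetical name -> class registry; the real project resolves classes
# differently.
OPERATIONS = {"noop": NoopOperation}


def execute_data_operation(data_operation_name: str, input_data: dict):
    operation_class = OPERATIONS.get(data_operation_name)
    if operation_class is None:
        raise NotImplementedError("Operation not implemented!")
    try:
        operation_class(input_data).allocate_operate()
    except ClientAlertException as error:
        log.error(f"Client Error executing {data_operation_name}: {error}")
        operation_class(input_data).set_failed(str(error))
```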