I have some complex code that creates params and metrics for my Aim run. Sometimes the code fails partway through with an exception, in which case I want to discard the run. I am having trouble doing that: a resource thread running as a daemon raises an exception and at the same time prevents my program from exiting.
To reproduce
from aim import Run

run = Run()
exception_occurred = False
try:
    ...  # complex code for ML experiments
    1 / 0  # simulate exception being raised
except:
    exception_occurred = True
finally:
    run.finalize()
    run.close()

if exception_occurred:
    run.repo.delete_run(run.hash)
    print(f"Run {run.hash} discarded")
With this code I get aimrocks.errors.RocksIOError: b'IO error: No such file or directory: While open a file for appending: /home/.../.aim/seqs/chunks/d94288337184963820667c/000008.log: No such file or directory'. After this exception is raised, the Python program hangs because of a daemon thread.
Expected behavior
I should be able to discard a run when an exception is raised in my code, without this second exception being raised and without my program hanging in the process.
Environment
Aim Version: 3.22.0
Python version: 3.10.14
pip version: 24.0
OS: Linux
Hey @luispsantos! Thanks a lot for opening the issue. Resource cleanup is kind of a messy area currently, since we also do some magic tricks to close the run resources implicitly. We'll revisit that in the upcoming releases, as we have some other issues there as well. But for now the script below should do the job:
from aim import Run, Repo

repo = Repo.from_path('.')
run = Run(repo=repo)
exception_occurred = False
try:
    ...  # complex code for ML experiments
    1 / 0  # simulate exception being raised
except:
    exception_occurred = True
finally:
    run.finalize()
    run.close()

run_hash = run.hash
run = None
if exception_occurred:
    repo.delete_run(run_hash)
    print(f"Run {run_hash} discarded")
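If this pattern is needed in several places, it could be wrapped in a small helper. The sketch below is only an illustration of the workaround above, not part of Aim: `track_or_discard`, `FakeRun` and `FakeRepo` are hypothetical names, and the fake classes exist only so the example runs without Aim installed. It assumes the run object exposes `hash`, `finalize()` and `close()`, and the repo exposes `delete_run()`, as used in this thread.

```python
import traceback


def track_or_discard(repo, make_run, body):
    """Call body(run); if it raises, delete the run from repo.

    Hypothetical helper. Returns True if the run was kept, False if discarded.
    """
    run = make_run()
    run_hash = run.hash
    error = None
    try:
        body(run)
    except Exception:
        # Keep only the formatted text so no traceback frame keeps the run alive.
        error = traceback.format_exc()
    finally:
        run.finalize()
        run.close()
    run = None  # drop our reference before deleting, as in the workaround
    if error is None:
        return True
    repo.delete_run(run_hash)
    print(f"Run {run_hash} discarded:\n{error}")
    return False


# Stand-ins for aim.Run and aim.Repo (assumptions, for demonstration only).
class FakeRun:
    def __init__(self, run_hash, log):
        self.hash = run_hash
        self._log = log

    def finalize(self):
        self._log.append("finalize")

    def close(self):
        self._log.append("close")


class FakeRepo:
    def __init__(self):
        self.log = []

    def delete_run(self, run_hash):
        self.log.append(f"delete {run_hash}")


def failing_experiment(run):
    1 / 0  # simulate exception being raised


def passing_experiment(run):
    pass  # experiment code that completes normally


repo = FakeRepo()
kept_failed = track_or_discard(
    repo, lambda: FakeRun("abc123", repo.log), failing_experiment
)

repo_ok = FakeRepo()
kept_ok = track_or_discard(
    repo_ok, lambda: FakeRun("def456", repo_ok.log), passing_experiment
)
```

With Aim itself, the equivalent call would presumably pass `Repo.from_path('.')` as `repo` and `lambda: Run(repo=repo)` as `make_run`. I have not tested the helper against Aim, and the body should not store the run anywhere that outlives the call.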
Hi @mihran113. Thanks a lot for your fix, this indeed fixed my issue! The current way of discarding runs is not very straightforward in my opinion, but I guess there'll be a rework of this in future releases. We can close this ticket 👍