Discard Aim run when exception is raised #3183

Closed
luispsantos opened this issue Jul 10, 2024 · 2 comments
Labels
help wanted Extra attention is needed type / bug Issue type: something isn't working

Comments


luispsantos commented Jul 10, 2024

🐛 Bug

I have some complex code that creates params and metrics for my Aim run. Sometimes the code fails partway through with an exception, in which case I want to discard the run. I am having trouble doing that: a resource thread running as a daemon raises an exception and, at the same time, blocks my program from exiting.

To reproduce

from aim import Run

run = Run()
exception_occurred = False
try:
    ...  # complex code for ML experiments
    1 / 0  # simulate exception being raised
except:
    exception_occurred = True
finally:
    run.finalize()
    run.close()
    if exception_occurred:
        run.repo.delete_run(run.hash)
        print(f"Run {run.hash} discarded")

With this code I get aimrocks.errors.RocksIOError: b'IO error: No such file or directory: While open a file for appending: /home/.../.aim/seqs/chunks/d94288337184963820667c/000008.log: No such file or directory'. After this exception is raised, the Python program hangs because of a daemon thread.

Expected behavior

I should be able to discard a run when my code raises an exception, without the error above being raised and without my program hanging.

Environment

  • Aim Version: 3.22.0
  • Python version: 3.10.14
  • pip version: 24.0
  • OS: Linux

Additional context

@luispsantos luispsantos added help wanted Extra attention is needed type / bug Issue type: something isn't working labels Jul 10, 2024
@mihran113
Contributor

Hey @luispsantos! Thanks a lot for opening the issue. Resource cleanup is a somewhat messy area currently, as we do some magic to close the run resources implicitly as well. We'll revisit that in upcoming releases, since we have some other issues there too. But for now the script below should do the job:

from aim import Run, Repo

repo = Repo.from_path('.')
run = Run(repo=repo)
exception_occurred = False
try:
    ...  # complex code for ML experiments
    1 / 0  # simulate exception being raised
except:
    exception_occurred = True
finally:
    run.finalize()
    run.close()
    run_hash = run.hash
    run = None
    if exception_occurred:
        repo.delete_run(run_hash)
        print(f"Run {run_hash} discarded")
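
For reuse across experiments, the same sequence (finalize, close, read the hash, drop the reference, then delete through the separately held repo) could be wrapped in a small helper. This is only a sketch: `run_and_discard_on_error`, `make_run` and `body` are hypothetical names and not part of Aim, and the helper relies only on the calls already used in the script above.

```python
from typing import Any, Callable


def run_and_discard_on_error(
    repo: Any,
    make_run: Callable[[Any], Any],
    body: Callable[[Any], None],
) -> bool:
    """Call body(run); delete the run from repo if body raises.

    Hypothetical helper, not part of Aim.
    Returns True if the run was kept, False if it was discarded.
    """
    run = make_run(repo)
    failed = False
    try:
        body(run)
    except Exception:
        failed = True  # swallow the error, like the script above does
    finally:
        run.finalize()
        run.close()
        run_hash = run.hash
        run = None  # drop our reference before deleting via the repo
    if failed:
        repo.delete_run(run_hash)
    return not failed


# Intended use with Aim (calls taken from the script above):
#   from aim import Run, Repo
#   repo = Repo.from_path('.')
#   kept = run_and_discard_on_error(repo, lambda r: Run(repo=r), train)
# where `train` is a placeholder for your own function that takes the run.
```

The experiment code is passed in as a function rather than used inside a `with` block, so that no variable in the calling code still refers to the run when it is deleted.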

@luispsantos
Author

Hi @mihran113. Thanks a lot for your fix, it did indeed solve my issue! The current way of discarding runs is not very straightforward in my opinion, but I guess this will be reworked in future releases. We can close this ticket 👍
