Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add async support for temporary file handling. Closes #344 #867

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

11kkw
Copy link

@11kkw 11kkw commented Feb 10, 2025

Changes

feat: Add async support for temporary file handling (#344)

  • Implemented async support for TemporaryFile, NamedTemporaryFile, SpooledTemporaryFile, and TemporaryDirectory.
  • Integrated to_thread for handling async operations with tempfiles.
  • Added tests to ensure async compatibility.

Checklist

  • You've added tests (in tests/) which would fail without your patch.
  • You've updated the documentation (in docs/, in case of behavior changes or new features). (Pending)
  • You've added a new changelog entry (in docs/versionhistory.rst). (Pending)

Updating the changelog

(Changelog not updated)


Additional Notes

I have implemented async support for temporary file handling.
Currently, I haven't updated the documentation or changelog yet.

I'm unsure if this implementation follows the best approach, so I would appreciate feedback on the current implementation first.
Once I receive input from the review, I will update the documentation and changelog accordingly if needed.

Copy link
Owner

@agronholm agronholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, but I left several change requests.

11kkw and others added 2 commits February 12, 2025 02:39
- Remove redundant  decorators
- Adjust type annotations by removing unnecessary parentheses
- Import  from its correct location
- Remove unnecessary async versions of
- Add blank lines for better readability after control blocks
- Remove redundant type casts and isinstance checks
- Group related tests under a test class for better organization
- Fix path variable naming consistency
- Ensure proper handling of blocking I/O within async functions
@agronholm
Copy link
Owner

You also need to fix the pyright errors.

@agronholm agronholm requested a review from graingert February 11, 2025 18:15
@agronholm
Copy link
Owner

Given that this is a new feature, I think it warrants a new documentation chapter. The code examples don't need to be comprehensive, but should make readers aware of this feature.

…nore ASYNC230

- Insert a blank line after the end of every control block to separate it from the following code (per review instructions).
- Explicitly specify return types and adjust type annotations to resolve Pyright errors.
- Maintain blocking I/O inside to_thread.run_sync() as per ASYNC230 compliance for runtime code.
- Suppress ASYNC230 warnings in the test suite by adding per-file ignores in pyproject.toml.
@11kkw
Copy link
Author

11kkw commented Feb 11, 2025

Given that this is a new feature, I think it warrants a new documentation chapter. The code examples don't need to be comprehensive, but should make readers aware of this feature.

I will add the document work and commit it as soon as possible. Thank you for your thorough review!

self._async_file = None


class SpooledTemporaryFile(Generic[AnyStr]):
Copy link
Collaborator

@graingert graingert Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally for SpooledTemporaryFile is that writes/reads to/from the BytesIO buffer will not go to a thread

SpooledTemporaryFile needs to not go to a thread for these 'small' file things because the usecase is starlette multipart form file uploads which needs to remain fast for tiny uploads

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kludex would you like to participate in the review of this feature too?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the necessary changes based on your feedback, but I'm not entirely sure if I've implemented them correctly due to my limited experience. I'd really appreciate it if you could review my changes and provide any further suggestions. Thank you for your guidance!"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class needs an aclose()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it needs the aclose().

- Optimized  to avoid unnecessary thread switching for small files
- Fixed blank lines after control blocks for better readability
- Resolved Pyright type errors
- Updated async tempfile documentation based on review feedback
@agronholm
Copy link
Owner

Tests on 3.14 are failing because a new alpha added yet more attributes to pathlib.Path.

@agronholm
Copy link
Owner

Oh, before I forget: the API reference needs to be updated to include the new classes and functions.

11kkw and others added 5 commits February 15, 2025 01:37
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
…te API docs

- Improved type annotations for mkstemp and mkdtemp functions using overloads per Typeshed.
- Fixed test failures on Python 3.14 caused by additional attributes in pathlib.Path.
- Updated API reference to include new classes and functions.
Copy link
Owner

@agronholm agronholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some minor clean-ups as suggestions. The only real piece missing is the additions to the API reference documentation, and then @graingert needs to re-check the work you did on SpooledTemporaryFile. We're almost done now!

11kkw and others added 6 commits February 16, 2025 22:37
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
11kkw and others added 5 commits February 16, 2025 22:38
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
Co-authored-by: Alex Grönholm <alex.gronholm@nextday.fi>
if not getattr(self._fp, "_rolled", True):
result = self._fp.write(s)
if self._fp._max_size and self._fp.tell() > self._fp._max_size:
self._fp.rollover()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this do blocking IO?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write will do the rollover for us in this situation, which will result in blocking disk IO on the event loop there, so we need to make sure it won't.

You don't have to use SpooledTemporyFile here you can just copy the source code here with attribution, and modify it to use an async file

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@11kkw Yeah, so instead of delegating to tempfile.SpooledTemporaryFile, @graingert is suggesting that you roll an independent implementation in order to get more control over async checkpoints and such.

) -> SpooledTemporaryFile[AnyStr]:
fp = await to_thread.run_sync(
partial(
tempfile.SpooledTemporaryFile,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the SpooledTemporyFile doesn't perform blocking disk IO so this shouldn't need a thread

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, you already mentioned that ^ here.

self.dir: AnyStr | None = dir
self.errors = errors
self._async_file: AsyncFile | None = None
self._fp: Any = None
Copy link
Collaborator

@graingert graingert Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think should be initialized here, how does the upstream SpooledTemporyFile behave?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't do any file I/O if that's what you mean, so yes, self._fp should be initialized to tempfile.SpooledTemporaryFile here.

raise RuntimeError("Underlying file is not initialized.")

if not getattr(self._fp, "_rolled", True):
result = self._fp.write(s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = self._fp.write(s)
return self._fp.write(s)

@@ -90,6 +90,9 @@ extend-select = [
[tool.ruff.lint.isort]
"required-imports" = ["from __future__ import annotations"]

[tool.ruff.lint.per-file-ignores]
"tests/test_tempfile.py" = ["ASYNC230"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which open call is blocking?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one in test_mkstemp().

async def test_exception_handling(self) -> None:
async with NamedTemporaryFile[bytes]() as af:
filename = af.name
assert os.path.exists(filename) # type: ignore[arg-type]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a mypy error?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the name attribute is retrieved using __getattr__() which is declared to return object. I think this has been used elsewhere in anyio too – perhaps we should declare it to return Any instead?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, that won't be necessary if this is just reimplemented by copying stdlib code.

raise RuntimeError("Underlying file is not initialized.")

if not getattr(self._fp, "_rolled", True):
result = self._fp.write(s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to have a bit of a think here. According to the rules of trio every async function should always checkpoint, however for SpooledTemporyFile to be useful it should be as close to a BytesIO in terms of performance at small sizes. The documentation should have a big red note here or write should include a checkpoint if we're writing to the memory buffer



@pytest.mark.parametrize("prefix", [b"mkdtemp_", "mkdtemp_"])
async def test_mkdtemp(prefix: AnyStr) -> None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typevar is unwarranted here.

Suggested change
async def test_mkdtemp(prefix: AnyStr) -> None:
async def test_mkdtemp(prefix: str | bytes) -> None:

@agronholm
Copy link
Owner

@11kkw Let us know if you need help. I understand all this back and forth may feel pretty exhausting!

self._async_file = None


class SpooledTemporaryFile(Generic[AnyStr]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a generic? The tempfile.SpooledTemporaryFile is not a generic.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something different happening here...

from tempfile import SpooledTemporaryFile

from anyio import SpooledTemporaryFile as anyio_SpooledTemporaryFile

a = SpooledTemporaryFile()
b = anyio_SpooledTemporaryFile()

I think it defaults to bytes. My IDE is running pyright (strict mode):

Screenshot 2025-02-22 at 13 53 19

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, mypy reports the type as SpooledTemporaryFile[builtins.str] while pyright reports it as SpooledTemporaryFile[Unknown].

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from tempfile import SpooledTemporaryFile

from anyio import SpooledTemporaryFile as AnyIOSpooledTemporaryFile

async def main() -> None:
    with SpooledTemporaryFile() as f1:
        reveal_type(f1)

    async with AnyIOSpooledTemporaryFile() as f2:
        reveal_type(f2)

Gives me:

  /home/alex/workspace/anyio/client.py:7:21 - information: Type of "f1" is "SpooledTemporaryFile[bytes]"
  /home/alex/workspace/anyio/client.py:10:21 - information: Type of "f2" is "SpooledTemporaryFile[Unknown]"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh...I didn't notice before that those TemporaryFile etc. in typeshed were actually functions! That explains a few things. I'll work on fixing the SpooledTemporaryFile implementation tomorrow.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored SpooledTemporaryFile in a way that should satisfy @graingert. However, this caused a lot fo mypy errors that I can't understand.

Copy link
Author

@11kkw 11kkw Feb 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for bringing this up so late. However, since the official documentation states, The dir, prefix and suffix parameters have the same meaning and defaults as with mkstemp() I assumed that the parameters for mkstemp are of type AnyStr, and I proceeded with that approach.

https://docs.python.org/3/library/tempfile.html#tempfile.mkstemp

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the reason, using AnyStr in both the class type argument AND the initializer arguments makes no sense because the type parameter in the class is supposed to determine whether the input and output type is str or bytes. But the reason for the initializer arguments to be parametrized is because they in part determine file name and you want them to be either str or bytes. Two different, unrelated uses.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the reason, using AnyStr in both the class type argument AND the initializer arguments makes no sense because the type parameter in the class is supposed to determine whether the input and output type is str or bytes. But the reason for the initializer arguments to be parametrized is because they in part determine file name and you want them to be either str or bytes. Two different, unrelated uses.

I agree with this change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants