Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-127647: Add typing.Reader and Writer protocols #127648

Merged
merged 37 commits into from
Mar 6, 2025

Conversation

srittau
Copy link
Contributor

@srittau srittau commented Dec 5, 2024

@srittau
Copy link
Contributor Author

srittau commented Dec 5, 2024

A few design considerations:

  • I used the names Reader and Writer to match existing ABC names in typing/collections.abc. Alternatives are Readable and Writable, but I think they are used more rarely for these kind of (pseudo-)protocols. SupportsX would work for Writer, but not for Reader, since the latter supports multiple methods. (Any we maybe want to use SupportsRead etc. at some point for tighter protocols.)
  • I would prefer these protocols to be in io, where they fit better thematically, but that would require importing typing with unforseeable (performance) implications for such a foundational module.
  • I deliberated using tighter protocols, but for ease of use reasons – and since they are mostly an alternative to using IO et al. – I went with larger protocols for now.
  • Seekable (with methods seek and tell) would be an interesting addition as well, but I think this should wait for easy protocol composition.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit sad that we can't use covariance and contravariance in the protocols since a subclass could use it with an invariant type.

Small improvements to the docstrings and signature

Protocol for reading from a file or other input stream.

.. method:: read(size=...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. method:: read(size=...)
.. method:: read(size=..., /)

(Same for other methods)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure of how much of the signature to document. For example, would it also make sense to add argument and/or return types? I've added the slashes for now.

@AlexWaygood
Copy link
Member

I would prefer these protocols to be in io, where they fit better thematically, but that would require importing typing with unforseeable (performance) implications for such a foundational module.

io feels like a more natural home to me too. Could you not do something similar to the pseudo-protocol ABCs in collections.abc, contextlib and os.path? The approach there is:

  • Make the classes ABCs rather than protocols
  • But document them as protocols
  • And write __subclasshook__ methods so that they behave identically to runtime-checkable Protocols in their isinstance()/issubclass() behaviour:
    @classmethod
    def __subclasshook__(cls, C):
    if cls is AbstractContextManager:
    return _collections_abc._check_methods(C, "__enter__", "__exit__")
    return NotImplemented
  • And special-case the ABCs in typing.py so that the typing module treats them identically to protocols:

    cpython/Lib/typing.py

    Lines 1940 to 1948 in 023b7d2

    _PROTO_ALLOWLIST = {
    'collections.abc': [
    'Callable', 'Awaitable', 'Iterable', 'Iterator', 'AsyncIterable',
    'AsyncIterator', 'Hashable', 'Sized', 'Container', 'Collection',
    'Reversible', 'Buffer',
    ],
    'contextlib': ['AbstractContextManager', 'AbstractAsyncContextManager'],
    'os': ['PathLike'],
    }
  • And pretend in typeshed that they're protocols, so that type checkers treat them as protocols (I feel like python/typeshed@f554f54 missed the point there slightly)

@srittau
Copy link
Contributor Author

srittau commented Dec 6, 2024

That feels terribly hackish to me, even if there is precedence. And in my experience, hackish stuff often falls on one's feet one way or the other. But I can move it to io if that's the consensus.

@AlexWaygood
Copy link
Member

This has been the way the collections.abc and contextlib ABCs have always worked. As far as I'm aware, it's worked pretty well up to now. And it doesn't actually feel like the most hackish part of collections.abc ;)

_sys.modules['collections.abc'] = _collections_abc
abc = _collections_abc

@srittau
Copy link
Contributor Author

srittau commented Dec 6, 2024

Well, considering that typing.is_protocol uses the _is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived from Protocol. Which contradicts the typing spec at the moment. This all seems a bit precarious to me.

@AlexWaygood
Copy link
Member

I'm open to something less hacky if you can find a generalized solution that doesn't make _collections_abc depend on typing (which would significantly slow down startup for all Python applications, since _collections_abc is imported as part of Python's initialisation).

@AlexWaygood
Copy link
Member

considering that typing.is_protocol uses the _is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived from Protocol. Which contradicts the typing spec at the moment.

I don't think the typing spec needs to be cognisant of the hacks we use to make things work at runtime in the stdlib! This knowledge is irrelevant for third-party users of protocols, to users of type checkers and to implementers of type checkers. It's definitely not encouraged for any external users to make use of the _is_protocol attribute.

@srittau
Copy link
Contributor Author

srittau commented Dec 6, 2024

CI seems flaky, failures are unrelated.

@srittau
Copy link
Contributor Author

srittau commented Feb 25, 2025

@AlexWaygood I've moved the protocols to io, and removed Reader.readline(), and Reader.__iter__(). But now the tests fail:

======================================================================
FAIL: test_startup_imports (test.test_site.StartupImportTests.test_startup_imports)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/cpython/cpython/Lib/test/test_site.py", line 624, in test_startup_imports
    self.assertNotIn('copyreg', modules, stderr)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'copyreg' unexpectedly found in {'_thread', 'collections', '_weakref', 'reprlib', '__main__', '_io', 'contextlib', '_collections_abc', 'posixpath', 'time', 'zipimport', '_signal', '_warnings', 'collections.abc', '_operator', 'encodings.aliases', 'annotationlib', '_frozen_importlib', '_imp', 'io', 'abc', 'operator', 'linecache', 'functools', 'genericpath', 'builtins', '_functools', '_abc', 'site', 'sys', '_codecs', '_typing', 'codecs', 'copyreg', 'keyword', 'stat', '_collections', 'itertools', 'encodings', 'errno', 'ast', 'types', 'marshal', 'typing', 'posix', 'enum', 'os.path', 'encodings.utf_8', '_stat', '_ast', '_sitebuiltins', 'os', '_frozen_importlib_external'} : import _frozen_importlib # frozen

I'm unsure why that is. Originally I thought it was because I added a _collections_abc import to io, but even after moving the import into the __subclasshook__s, the error persists.

@srittau
Copy link
Contributor Author

srittau commented Feb 25, 2025

It seems that by using the new generic syntax (class Reader[T]), the typing module was implicitly imported. Removing that and using the __class_getitem__ = classmethod(GenericAlias) trick has worked.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
srittau and others added 2 commits February 27, 2025 13:08
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM now but it might be good to have one or two simple tests covering the new code. It probably doesn't need to be much more than this (added to test_io.py)

def test_supports_index(self):
self.assertIsSubclass(int, typing.SupportsIndex)
self.assertNotIsSubclass(str, typing.SupportsIndex)

and maybe an addition to the test_typing tests similar to this one:

def test_collections_protocols_allowed(self):
@runtime_checkable
class Custom(collections.abc.Iterable, Protocol):
def close(self): ...
class A: pass
class B:
def __iter__(self):
return []
def close(self):
return 0
self.assertIsSubclass(B, Custom)
self.assertNotIsSubclass(A, Custom)
@runtime_checkable
class ReleasableBuffer(collections.abc.Buffer, Protocol):
def __release_buffer__(self, mv: memoryview) -> None: ...
class C: pass
class D:
def __buffer__(self, flags: int) -> memoryview:
return memoryview(b'')
def __release_buffer__(self, mv: memoryview) -> None:
pass
self.assertIsSubclass(D, ReleasableBuffer)
self.assertIsInstance(D(), ReleasableBuffer)
self.assertNotIsSubclass(C, ReleasableBuffer)
self.assertNotIsInstance(C(), ReleasableBuffer)

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thank you! Just a few docs nits remaining

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 27, 2025

I'll leave it a little bit in case any of the other reviewers here have any remaining concerns before merging

__slots__ = ()

@abc.abstractmethod
def read(self, size=..., /):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this has been discussed before, but I'm unsure on the runtime use of size=... (I didn't notice this earlier in my documentation review, sorry).

Almost every other read(size) method I can find has a default of either None or -1. I also can't find another method in the stdlib with a default of ... (outside of the recently-added protocols in wsgiref.types).

Would it be better to have size=-1, to indicate that the method takes an int? I'm not sure how much we want typeshed-like practices to leak into the standard library.

Suggested change
def read(self, size=..., /):
def read(self, size=-1, /):

A

Copy link
Contributor Author

@srittau srittau Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mandating defaults is not really something you can do in a protocol. I also wouldn't want to mandate that implementors have to use a default of -1, because – as you said – some implementations use None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but this isn't a protocol -- it's an ABC, which do have defaults -- see e.g. collections.abc.Generator. All of the read() methods in io are documented as having read(size=-1, /), and given these ABCs are going into io, I think we should be consistent with that interface, or have good reason to diverge from it (& document why).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supposed to be a protocol, not an ABC. (Notwithstanding the fact that all protocols are ABCs.) It's just a protocol in the implementation for performance reasons. And using -1 as a default would give users the very wrong impression that they can use read(-1) when that may or may not actually be supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using -1 as a default would give users the very wrong impression that they can use read(-1) when that may or may not actually be supported.

From the documentation of io.RawIOBase.read():

Read up to size bytes from the object and return them. As a convenience, if size is unspecified or -1, all bytes until EOF are returned.

I would expect the io.Reader.read() ABC/protocol to have this same guarantee, for a 'properly' implemented read() method (according to the io expecations). In the proposed documentation, we say:

Read data from the input stream and return it. If size is specified, it should be an integer, and at most size items (bytes/characters) will be read.

This forbids None (good!), but is silent on what happens should size be omitted. I still think we should use -1 instead of ..., but at the very least we should include in the documentation the contract for what happens when read() is called with no arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. -1 is the default for some implementations, but the protocol should not make any default mandatory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure I follow, though -- currently the ABC/protocol has the mandatory default of size=..., which is always invalid and not the correct type. -1 is valid for all interfaces specified by the io documentation, which is where this new type is being added.

I ran the following with mypy --strict and it passed, so I don't think type checkers care about default values (and as discussed, the type forbids using non-integer types):

import abc

class Reader(metaclass=abc.ABCMeta):
    __slots__ = ()

    @abc.abstractmethod
    def read(self, size: int = -1, /) -> bytes: pass

class CustomReader(Reader):
    def read(self, size: int = -1, /) -> bytes:
        return b''

class CustomReaderZero(Reader):
    def read(self, size: int = 0, /) -> bytes:
        return b''

assert issubclass(CustomReader, Reader)
assert issubclass(CustomReaderZero, Reader)
assert isinstance(CustomReader(), Reader)
assert isinstance(CustomReaderZero(), Reader)

def reader_func(r: Reader) -> None:
    r.read()

reader_func(CustomReader())
reader_func(CustomReaderZero())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Sebastian here; we should use ... because the protocol need not mandate any particular default.

-1 is valid for all interfaces specified by the io documentation

The protocol should also match other file-like classes defined elsewhere in the stdlib or even in third-party libraries. When defining a protocol it's often useful to be permissive, so that all objects that are intended to match the protocol actually match it.

@srittau
Copy link
Contributor Author

srittau commented Mar 6, 2025

Could this be merged before the next alpha next week?

@JelleZijlstra JelleZijlstra merged commit c6dd234 into python:main Mar 6, 2025
39 checks passed
@srittau srittau deleted the typing-readable-writable branch March 6, 2025 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants