One pitfall of type hints and generic types is that they differ from what Python coders already know. Even if you diligently read the entire tutorial, they don't get a mention there, and the standard library reference has them squirreled away under "development tools." They're obscure, but we need them [1], so we ought to explain them.
Type hints are used by static type checkers like mypy and Pyre to prove that functions are passing the correct type of data to each other. They are the same concept as TypeScript and Flow in the JavaScript world.
The premise of "gradual typing" is that it's optional. If code works, leave it alone. If you chase down a TypeError, though, you can add a few annotations directly in the source rather than write yet another unit test.
Generic types are the weird capitalized, square-bracketed types like Dict[str, Tuple[int, ...]] provided by the typing module.
In Python, the primary distinction is that type hints and generic types are not native to the interpreter.
To summarize them:
- Types
  - The regular int, bool, set, Decimal you already know.
  - A value always has a type, so 5 is implicitly int.
  - Used extensively by the interpreter.
- Type hints
  - Usually look like name: hint.
  - Use either a type or a generic type.
  - A variable may have a hint.
  - Largely ignored by the interpreter.
  - (An argument to a function or a member of a class may also have a hint.)
- Generic types
  - Imported from typing.
  - Look like FrozenSet[Decimal], Dict[str, Tuple[int, str]].
  - Used in hints to describe the type of a variable with greater precision.
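A minimal sketch of the three ideas side by side, using only the standard library. The function `label` and its argument are made up for illustration.

```python
from typing import Dict, Tuple, get_type_hints

# A type: every value has one, and the interpreter relies on it.
assert type(5) is int

# A type hint built from a generic type: the interpreter records it
# on the function but does not enforce it.
def label(scores: Dict[str, Tuple[int, ...]]) -> str:
    return ", ".join(sorted(scores))

# The recorded hints can be read back as ordinary objects.
hints = get_type_hints(label)

# Nothing stops a call that contradicts the hint at runtime;
# only a static type checker would be expected to complain.
result = label({"b": "not a tuple", "a": None})
```

The hints are just data attached to the function, which is what lets tools other than the interpreter make use of them.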
The reason for all this is that if you can nail down what kind of data is coming into a function, your code doesn't have to deal with all kinds of exceptional cases.
Python doesn't have a problem with a list like [1, 2, 'three', 'four'], but if you're trying to sum the elements of the list, it's going to fail because summation is only defined for numbers.
A generic type like List[int] is an assertion that the specific list will only contain ints. A type checker scans your code for those assertions, looks for contradictions, and tries to generate a proof that your code is sound before you run it.
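To make the failure concrete, here is a small sketch. The function `total` is a hypothetical example, not part of any library.

```python
from typing import List

mixed = [1, 2, 'three', 'four']  # perfectly legal at runtime

try:
    sum(mixed)
except TypeError as exc:
    # Adding an int and a str is not defined, so sum() gives up partway.
    error = str(exc)

def total(numbers: List[int]) -> int:
    # The hint asserts every element is an int, so no special cases here.
    return sum(numbers)

total_ok = total([1, 2, 3])
# A checker such as mypy would be expected to flag total(mixed)
# before the program ever runs.
```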
And just as type checkers can use type hints to generate proofs, json-syntax can unpack such assertions and write a converter based on the structure of the data.
This document won't go into how type checkers use hints; mypy and Pyre both have tutorials for that. In a nutshell, though, you can put hints in your function signatures.
For what we're trying to do, which is describe your data so you can convert it to and from JSON, the nicest way is through either the attrs package or the dataclasses module (in the standard library since Python 3.7). They're similar because dataclasses is a standardized take on attrs. It typically looks something like this:
import attr
from dataclasses import dataclass
from typing import List

# frozen=True makes instances hashable (and immutable).
@attr.s(auto_attribs=True, frozen=True)
class Employee:
    name: str
    widgets_made: int

# Don't actually mix attrs and dataclasses,
# this is just to show they're similar.
@dataclass
class Department:
    name: str
    budget: float
    staff: List[Employee]

    @property
    def widgets_made(self):
        return sum(peon.widgets_made for peon in self.staff)
And what they do is write the __dunder__ methods for you:
>>> Employee('Bob', 55)  # __init__ and __repr__
Employee(name='Bob', widgets_made=55)
>>> Employee('Bob', 55) == Employee('Bob', 55)  # comparisons
True
>>> {Employee('Bob', 55), Employee('Liz', 56)}  # __hash__ (attrs needs frozen=True or hash=True for this)
{Employee(name='Bob', widgets_made=55), Employee(name='Liz', widgets_made=56)}
That said, the type hints don't enforce anything by themselves:
>>> Employee(name=123, widgets_made='wat?')
Employee(name=123, widgets_made='wat?')
But mypy and Pyre [4] can use them to check the correctness of your code, and json-syntax uses them to write converters for you.
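As a rough illustration of how a library can read hints without enforcing them, the standard `dataclasses.fields` function exposes each field's name and hint. The `Widget` class is hypothetical, and this is only a sketch of the general idea, not json-syntax's actual code.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class Widget:  # hypothetical class, for illustration only
    name: str
    sizes: List[int]

# Construction does not check the hints...
w = Widget(name=123, sizes="wat?")

# ...but the hints are still available for a tool to inspect.
field_hints = {f.name: f.type for f in fields(Widget)}
```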
Let's ask Python:
>>> issubclass(List[int], list)
TypeError: issubclass() arg 1 must be a class
>>> isinstance([1, 2, 3], List[int])
TypeError: Subscripted generics cannot be used with class and instance checks
>>> List[int]([1, 2, 3])
TypeError: Type List cannot be instantiated; use list() instead
>>> type(List[int])
<class 'typing._GenericAlias'>
Generic types are special objects that describe types, but there's a twist. Let's check the method-resolution order of List[int] to list all the known base classes:
>>> List[int].mro()
[<class 'list'>, <class 'object'>]
The mro method is only defined on types, and it turns out List[int] does inherit from list. Weirder still:
>>> class MyList(List[int]):
...     def average(self):
...         return sum(self) / len(self)
...
>>> MyList([1, 2, 3]).average()
2.0
>>> MyList.mro()
[<class '__main__.MyList'>, <class 'list'>, <class 'typing.Generic'>, <class 'object'>]
So it's valid for your own class to inherit from List[int], whereupon it will behave like a list.
Your type checker can then enforce that your code only stores ints in that class for you.
At the time of writing, inheriting from a generic type won't work with json-syntax; we'll have to see if and how people want to use that.
As an example, let's suppose we have a type hint Set[date] and we want to convert that back and forth between the Python representation and a reasonable [2] JSON representation.
>>> json.loads('["2020-02-02", "2020-03-03", "2020-04-04"]')
['2020-02-02', '2020-03-03', '2020-04-04']
We want a decoder that will convert this to a Python set. And json-syntax will write us a function to do that based on the type hints:
decoder = lookup(verb='json_to_python', typ=Set[date])

# Should result in the equivalent of:
def decoder(value):
    return {date.fromisoformat(elem) for elem in value}

# And so we get our desired Python values:
>>> decoder(['2020-02-02', '2020-03-03', '2020-04-04'])
{date(2020, 2, 2), date(2020, 3, 3), date(2020, 4, 4)}
The algorithm can be visualized as transforming one tree into another.
     Type                         convert_type
    /    \          --->            /      \
 Type    Type             convert_type    convert_type

     Set                          convert_set
      |             --->               |
     date                         convert_date
We can deconstruct complex types, like an attrs class:
>>> [(a.name, a.type) for a in attr.fields(Employee)]
[('name', str), ('widgets_made', int)]
Back to our example:
decoder = lookup(verb='json_to_python', typ=Set[date])
We first need to take apart that generic Set[date]:
>>> from typing import Set
>>> Set[date].__origin__
set
>>> Set[date].__args__
(date,)
We know it's a Python set of something, and that it takes a single argument date.
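The same decomposition can be written with the public helpers `typing.get_origin` and `typing.get_args`. This sketch assumes Python 3.8 or later, where those helpers were added; the dunder attributes shown above are what's available on older versions.

```python
from datetime import date
from typing import Dict, Set, get_args, get_origin

# Same information as __origin__ and __args__, via the public helpers.
set_origin = get_origin(Set[date])   # the plain set class
set_args = get_args(Set[date])       # a 1-tuple holding date

# A generic with two parameters yields two arguments.
dict_args = get_args(Dict[str, int])

# A plain class is not a generic, so it has no origin.
plain_origin = get_origin(date)
```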
The sets rule catches that we're dealing with a set, but it doesn't know how dates work, so it internally calls:
inner = lookup(verb='json_to_python', typ=date)
The dates rule knows that date is an atom: it has no inner types to deal with. So it can simply return:
def convert_date(value):
    return date.fromisoformat(value)
The date.fromisoformat method will parse a correctly formatted str to a date.
Now we're back in the sets rule and it knows that in the JSON representation it will have a list of something that it should convert to a set. Its action is a little less elegant than our original set comprehension:
def convert_set(value, inner):
    return set(map(inner, value))
We use functools.partial from the standard library [3] to put this together, and wind up with an expression like:
decoder = partial(convert_set, inner=convert_date)
# Same as:
def decoder(value):
    return convert_set(value, inner=convert_date)
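Putting the walk-through together, a toy version of the whole lookup might look like the sketch below. It reuses the names `lookup`, `convert_set` and `convert_date` from the text, but the dispatch logic is a simplified assumption for illustration and not json-syntax's real implementation. It also assumes Python 3.8 or later for `get_origin` and `get_args`.

```python
from datetime import date
from functools import partial
from typing import Set, get_args, get_origin

def convert_date(value):
    return date.fromisoformat(value)

def convert_set(value, inner):
    return set(map(inner, value))

def lookup(verb, typ):
    """Toy rule dispatcher: walks the type tree and builds a converter tree."""
    if verb != 'json_to_python':
        raise NotImplementedError(verb)
    if typ is date:
        # "dates" rule: an atom, nothing further to look up.
        return convert_date
    if get_origin(typ) is set:
        # "sets" rule: look up a converter for the element type first.
        (elem_type,) = get_args(typ)
        return partial(convert_set, inner=lookup(verb, elem_type))
    raise TypeError(f"no rule for {typ!r}")

decoder = lookup(verb='json_to_python', typ=Set[date])
decoded = decoder(['2020-02-02', '2020-03-03', '2020-04-04'])
```

Each rule only handles its own layer and delegates the inner types back to `lookup`, which is what makes the tree-to-tree picture work.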
Some of the generic types are generic versions of the abstract base classes from collections.abc and elsewhere. These can be used to write custom classes, or to declare as little as possible. In the latter case, if your function just uses for to walk through the contents of an argument, it could hint that argument with Iterable[Whatever].
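For instance, a function that only loops over its argument can accept any iterable. The function `count_long_words` is a made-up example.

```python
from typing import Iterable

def count_long_words(words: Iterable[str], min_length: int = 5) -> int:
    # Only a for-style walk is needed, so Iterable[str] promises enough.
    return sum(1 for word in words if len(word) >= min_length)

from_list = count_long_words(["alpha", "be", "gamma"])
from_set = count_long_words({"alpha", "be"})
from_generator = count_long_words(w for w in ("delta", "epsilon"))
```

Hinting with the abstract type keeps callers free to pass a list, a set, or a generator without the checker objecting.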
This package doesn't have any standard rules supporting abstract types, as they seem like they'd suit specific use cases.
Type variables are used to allow types to change in lockstep. You might define a function first like this:
from typing import Iterable, TypeVar

T = TypeVar('T')

def first(elems: Iterable[T]) -> T:
    for elem in elems:
        return elem
The T may be different when the function is invoked in different contexts, but a type checker could infer from this that if a: Set[str] and b = first(a) that b's type is str.
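A small usage sketch of the same idea. The explicit `raise` for an empty iterable is an addition made here so that the `-> T` hint holds on every path; it is not in the definition above.

```python
from typing import Iterable, Set, TypeVar

T = TypeVar('T')

def first(elems: Iterable[T]) -> T:
    for elem in elems:
        return elem
    # Added for this sketch: without it, an empty iterable returns None.
    raise ValueError("empty iterable")

a: Set[str] = {"only"}
b = first(a)          # a checker would be expected to infer str here
c = first([3, 1, 2])  # and int here
```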
You can create a generic user-defined class with type variables. This package doesn't support type variables yet.
from dataclasses import dataclass
from typing import Generic, Set, TypeVar

T = TypeVar('T')

@dataclass
class Pair(Generic[T]):
    a: T
    b: Set[T]

@dataclass
class Info:
    x: Pair[int]
    y: Pair[str]

# Effectively the same as:
@dataclass
class PairInt:
    a: int
    b: Set[int]

@dataclass
class PairStr:
    a: str
    b: Set[str]

@dataclass
class Info:
    x: PairInt
    y: PairStr
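To see why the two spellings are effectively the same, here is a sketch of substituting the type variable by hand. The helper `resolve_fields` is hypothetical and is not something this package provides; it assumes Python 3.8 or later for `get_origin` and `get_args`.

```python
from dataclasses import dataclass, fields
from typing import Generic, Set, TypeVar, get_args, get_origin

T = TypeVar('T')

@dataclass
class Pair(Generic[T]):
    a: T
    b: Set[T]

def resolve_fields(alias):
    """Hypothetical helper: map each field of a subscripted generic
    dataclass to its hint with the type variables filled in."""
    origin = get_origin(alias)                      # e.g. Pair
    mapping = dict(zip(origin.__parameters__, get_args(alias)))
    resolved = {}
    for f in fields(origin):
        hint = f.type
        if isinstance(hint, TypeVar):
            hint = mapping[hint]                    # T -> int
        elif getattr(hint, '__parameters__', ()):
            # Set[T] -> Set[int], by subscripting the open generic.
            hint = hint[tuple(mapping[p] for p in hint.__parameters__)]
        resolved[f.name] = hint
    return resolved

pair_int = resolve_fields(Pair[int])
pair_str = resolve_fields(Pair[str])
```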
The Union generic type lets you select alternate types, and this is supported by json-syntax. There are some caveats, mentioned in the top level README.
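A Union can be taken apart like any other generic. The sketch below only shows the decomposition with the standard typing helpers (Python 3.8 or later); it says nothing about how json-syntax chooses between alternatives, and `matches_any` is a made-up helper.

```python
from typing import Optional, Union, get_args, get_origin

IntOrStr = Union[int, str]

origin = get_origin(IntOrStr)             # typing.Union itself
alternatives = get_args(IntOrStr)         # the alternate types, in order
optional_args = get_args(Optional[int])   # Optional[X] is Union[X, None]

def matches_any(value, hint):
    """Made-up helper: is value an instance of one of the alternatives?"""
    return isinstance(value, get_args(hint))
```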
1: It's trivial to write an encoder that asks Python types to convert themselves to JSON, and attrs, simplejson and other libraries support this. Writing the decoder is trickier because you have to reconstruct that information. It can be done, and it's how we did it before writing this library, but our experience was that it became a giant kludge over time.
2: This package defines "reasonable" as representing a set of dates as a JSON array of strings in the common ISO 8601 format. You may have different needs, so you can swap in your own rules, and please submit a PR if you think they're addressing a broader need.
3: Using partial ensures that the converter can be pickled; not sure at this time if that's really helpful but it's easy to do. It should also make an explain function relatively easy to write.
4: Pyre only seems to support dataclasses.