Complete PEP 670 #2156

Merged (4 commits), Nov 23, 2021
172 changes: 126 additions & 46 deletions pep-0670.rst
@@ -60,13 +60,13 @@ The `GCC documentation
<https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html>`_ lists several
common macro pitfalls:

- Misnesting
- Operator precedence problems
- Swallowing the semicolon
- Duplication of side effects
- Self-referential macros
- Argument prescan
- Newlines in arguments
- Misnesting;
- Operator precedence problems;
- Swallowing the semicolon;
- Duplication of side effects;
- Self-referential macros;
- Argument prescan;
- Newlines in arguments.
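
Two of these pitfalls can be reproduced in a few lines. The sketch below is
illustrative only: ``SQUARE_MACRO``, ``square_inline`` and ``next_value`` are
hypothetical names, not part of the Python C API.

```c
#include <assert.h>

/* Hypothetical macro with two classic pitfalls: the argument is not
   parenthesized, and it is evaluated twice. */
#define SQUARE_MACRO(x) x * x

/* Equivalent static inline function: the argument is evaluated once
   and has a well defined type. */
static inline int square_inline(int x)
{
    return x * x;
}

/* Helper with a side effect: counts how many times it is called. */
static int call_count = 0;

static int next_value(void)
{
    call_count++;
    return 3;
}

static void demo_pitfalls(void)
{
    /* Operator precedence: expands to 1 + 2 * 1 + 2, which is 5, not 9. */
    assert(SQUARE_MACRO(1 + 2) == 5);
    assert(square_inline(1 + 2) == 9);

    /* Duplication of side effects: next_value() is called twice. */
    call_count = 0;
    assert(SQUARE_MACRO(next_value()) == 9);
    assert(call_count == 2);

    /* The function evaluates its argument exactly once. */
    call_count = 0;
    assert(square_inline(next_value()) == 9);
    assert(call_count == 1);
}
```

Parenthesizing the macro body fixes the precedence problem, but the double
evaluation remains as long as the argument appears twice in the expansion.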


Performance and inlining
@@ -77,19 +77,39 @@ compilers have efficient heuristics to decide if a function should be
inlined or not.

When a C compiler decides to not inline, there is likely a good reason.
For example, inlining would reuse a register which require to
save/restore the register value on the stack and so increase the stack
memory usage or be less efficient.
For example, inlining might reuse a register, which requires saving and
restoring the register value on the stack, and so increases stack
memory usage or is less efficient.


Debug build
-----------

When Python is built in debug mode, most compiler optimizations are
disabled. For example, Visual Studio disables inlining. Benchmarks must
not be run on a Python debug build, only on release build: using LTO and
PGO is recommended for reliable benchmarks. PGO helps the compiler to
decide if function should be inlined or not.
Benchmarks must not be run on a Python debug build, only on a release
build. Moreover, using LTO and PGO optimizations is recommended for best
performance and reliable benchmarks. PGO helps the compiler to decide
if a function should be inlined or not.

``./configure --with-pydebug`` uses the ``-Og`` compiler option if the
compiler supports it (GCC and LLVM clang do): this option optimizes the
debugging experience. Otherwise, the ``-O0`` compiler option is used,
which disables most optimizations.

With GCC 11, ``gcc -Og`` can inline static inline functions, whereas
``gcc -O0`` does not inline static inline functions. Examples:

* Call ``Py_INCREF()`` in ``PyBool_FromLong()``:

  * ``gcc -Og``: inlined
  * ``gcc -O0``: not inlined, call ``Py_INCREF()`` function

* Call ``_PyErr_Occurred()`` in ``_Py_CheckFunctionResult()``:

  * ``gcc -Og``: inlined
  * ``gcc -O0``: not inlined, call ``_PyErr_Occurred()`` function

On Windows, when Python is built in debug mode by Visual Studio, static
inline functions are not inlined.


Force inlining
@@ -154,6 +174,11 @@ functions should be measured with benchmarks. If there is a significant
slowdown, there should be a good reason to do the conversion. One reason
can be hiding implementation details.

To avoid any risk of performance slowdown on Python built without LTO,
it is possible to keep a private static inline function in the internal
C API and use it in Python, but expose a regular function in the public
C API.
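
A minimal sketch of this layout, assuming hypothetical names
(``demo_object``, ``demo_private_get_refcnt``, ``demo_get_refcnt``) that do
not exist in CPython: the private static inline function is used directly
inside the project, while external code only sees the regular function.

```c
#include <assert.h>

/* Hypothetical stand-in for an object structure. */
typedef struct {
    long refcnt;
} demo_object;

/* "Public C API": only this declaration would be in the public header. */
long demo_get_refcnt(const demo_object *ob);

/* "Internal C API": private static inline function, usable inside the
   project so that hot paths can be inlined even without LTO. */
static inline long demo_private_get_refcnt(const demo_object *ob)
{
    return ob->refcnt;
}

/* The public function is a regular function wrapping the private one. */
long demo_get_refcnt(const demo_object *ob)
{
    return demo_private_get_refcnt(ob);
}

static void demo_usage(void)
{
    demo_object ob = { 2 };
    /* Both entry points return the same value. */
    assert(demo_get_refcnt(&ob) == demo_private_get_refcnt(&ob));
}
```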

Using static inline functions in the internal C API is fine: the
internal C API exposes implementation details by design and should not be
used outside Python.
@@ -164,8 +189,8 @@ Cast to PyObject*
When a macro is converted to a function and the macro casts its
arguments to ``PyObject*``, the new function comes with a new macro
which casts arguments to ``PyObject*`` to prevent emitting new compiler
warnings. So the converted functions still accept pointers to structures
inheriting from ``PyObject`` (ex: ``PyTupleObject``).
warnings. So the converted functions still accept pointers to other
structures inheriting from ``PyObject`` (ex: ``PyTupleObject``).
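
The pattern can be sketched without the Python headers. In the sketch below,
``base_object``, ``tuple_like``, ``get_type_id_func`` and ``GET_TYPE_ID`` are
hypothetical stand-ins, not CPython names; the "inheriting" structure embeds
the base structure as its first member.

```c
#include <assert.h>

/* Hypothetical stand-ins for PyObject and a structure inheriting from it. */
typedef struct { int type_id; } base_object;
typedef struct { base_object ob_base; int length; } tuple_like;

/* Converted function: accepts the base pointer type only. */
static inline int get_type_id_func(const base_object *ob)
{
    return ob->type_id;
}

/* Companion macro: casts its argument, so existing callers passing a
   tuple_like* compile without a new incompatible-pointer warning. */
#define GET_TYPE_ID(ob) get_type_id_func((const base_object *)(ob))

static void demo_cast(void)
{
    tuple_like t = { { 7 }, 3 };
    assert(GET_TYPE_ID(&t) == 7);          /* derived pointer accepted */
    assert(GET_TYPE_ID(&t.ob_base) == 7);  /* base pointer still works */
}
```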

For example, the ``Py_TYPE(obj)`` macro casts its ``obj`` argument to
``PyObject*``::
@@ -224,9 +249,47 @@ the macro.
People using macros should be considered "consenting adults". People who
feel unsafe with macros should simply not use them.

The idea was rejected because macros are error-prone and it is too easy
to miss a macro pitfall when writing a macro. Moreover, macros are
harder to read and to maintain than functions.


Examples of hard to read macros
===============================

PyObject_INIT()
---------------

Example showing the usage of commas in a macro which has a return value.

Python 3.7 macro::

    #define PyObject_INIT(op, typeobj) \
        ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )

Python 3.8 function (simplified code)::

    static inline PyObject*
    _PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
    {
        Py_TYPE(op) = typeobj;
        _Py_NewReference(op);
        return op;
    }

    #define PyObject_INIT(op, typeobj) \
        _PyObject_INIT(_PyObject_CAST(op), (typeobj))

* The function doesn't need the line continuation character ``"\"``.
* It has an explicit ``"return op;"`` rather than the surprising
  ``", (op)"`` syntax at the end of the macro.
* It uses short statements on multiple lines, rather than being written
  as a single long line.
* Inside the function, the *op* argument has the well defined type
  ``PyObject*`` and so doesn't need casts like ``(PyObject *)(op)``.
* Arguments don't need to be put inside parentheses: use ``typeobj``,
  rather than ``(typeobj)``.

_Py_NewReference()
------------------

@@ -254,35 +317,6 @@ Python 3.8 function (simplified code)::
        Py_REFCNT(op) = 1;
    }

PyObject_INIT()
---------------

Example showing the usage of commas in a macro.

Python 3.7 macro::

    #define PyObject_INIT(op, typeobj) \
        ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )

Python 3.8 function (simplified code)::

    static inline PyObject*
    _PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
    {
        Py_TYPE(op) = typeobj;
        _Py_NewReference(op);
        return op;
    }

    #define PyObject_INIT(op, typeobj) \
        _PyObject_INIT(_PyObject_CAST(op), (typeobj))

The function doesn't need the line continuation character. It has an
explicit ``"return op;"`` rather than a surprising ``", (op)"`` at the
end of the macro. It uses one short statement per line, rather than a
single long line. Inside the function, the *op* argument has a well
defined type: ``PyObject*``.


Macros converted to functions since Python 3.8
==============================================
@@ -346,6 +380,52 @@ private static inline function has been added to the internal C API:
* ``_PyVectorcall_FunctionInline()``


Benchmarks
==========

Benchmarks were run on Fedora 35 (Linux) with GCC 11 on a laptop with 8
logical CPUs (4 physical CPU cores).


gcc -O0 versus gcc -Og
----------------------

Benchmark of the ``./python -m test -j10`` command on a Python debug
build:

* ``gcc -Og``: 220 sec ± 3 sec
* ``gcc -O0``: 360 sec ± 6 sec

Python built with ``gcc -O0`` is **1.6x slower** than Python built with
``gcc -Og``.

Replace macros with static inline functions
-------------------------------------------

`PR 29728 <https://github.com/python/cpython/pull/29728>`_ replaces
the following existing static inline functions with macros:

* ``PyObject_TypeCheck()``
* ``PyType_Check()``, ``PyType_CheckExact()``
* ``PyType_HasFeature()``
* ``PyVectorcall_NARGS()``
* ``Py_DECREF()``, ``Py_XDECREF()``
* ``Py_INCREF()``, ``Py_XINCREF()``
* ``Py_IS_TYPE()``
* ``Py_NewRef()``
* ``Py_REFCNT()``, ``Py_TYPE()``, ``Py_SIZE()``

Benchmark of the ``./python -m test -j10`` command on a Python debug
build:

* Macros (PR 29728), ``gcc -O0``: 345 sec ± 5 sec
* Static inline functions (reference), ``gcc -O0``: 360 sec ± 6 sec

Replacing macros with static inline functions makes Python
**1.04x slower** when the compiler **does not inline** static inline
functions.
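
For reference, the slowdown factors quoted in this section are simply the
ratios of the mean timings listed above (the measurement uncertainty is
ignored in this rough check):

```latex
\frac{360\ \text{sec}}{220\ \text{sec}} \approx 1.64
\qquad
\frac{360\ \text{sec}}{345\ \text{sec}} \approx 1.04
```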


References
==========
