PEP 686: Update based on discussion #2446

Merged 6 commits on Mar 22, 2022.
Changes from 3 commits.
48 changes: 19 additions & 29 deletions pep-0686.rst
@@ -30,7 +30,7 @@ UTF-8 becomes de-facto standard text encoding.
default.
* Most websites and text data on the internet use UTF-8.
* And many other popular programming languages, including node.js, Go, Rust,
Ruby, and Java use UTF-8 by default.
and Java use UTF-8 by default.
Review comment (PR author): I removed Ruby here because Ruby still uses locale encoding on Unix.


Changing the default encoding to UTF-8 makes Python easier to interoperate
with them.
@@ -44,28 +44,6 @@ source files). Inconsistent default encoding caused many bugs.
Specification
=============

Changes to UTF-8 mode
---------------------

Currently, UTF-8 mode affects ``locale.getpreferredencoding()``.

This PEP proposes to remove this override. UTF-8 mode will not affect the
``locale`` module.

After this change, UTF-8 mode affects:

* stdin, stdout, stderr

* Users can override it with ``PYTHONIOENCODING``.

* filesystem encoding

* ``TextIOWrapper`` and APIs using it, including ``open()``,
``Path.read_text()``, ``subprocess.Popen(cmd, text=True)``, etc.

This change will be introduced in Python 3.11 if possible.


Enable UTF-8 mode by default
----------------------------

@@ -74,6 +52,15 @@ Python enables UTF-8 mode by default.
Users can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
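
A minimal sketch of how the ``-X utf8`` switch can be observed, assuming Python 3.7 or later where ``sys.flags.utf8_mode`` is available. The helper name ``child_utf8_mode`` is hypothetical and not part of the PEP:

```python
import subprocess
import sys


def child_utf8_mode(x_option: str) -> int:
    """Start a child interpreter with ``-X <x_option>`` and return its sys.flags.utf8_mode.

    ``child_utf8_mode`` is an illustrative helper, not an API from the PEP.
    """
    result = subprocess.run(
        [sys.executable, "-X", x_option, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Explicitly enable, then explicitly disable, UTF-8 mode in a child process.
    print("utf8=1 ->", child_utf8_mode("utf8=1"))
    print("utf8=0 ->", child_utf8_mode("utf8=0"))
```

Setting ``PYTHONUTF8=0`` in the child's environment should have the same effect as ``-X utf8=0`` when no ``-X utf8`` option is given.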


``locale.get_locale_encoding()``
--------------------------------

Add ``locale.get_locale_encoding()``. It is the same as
``locale.getpreferredencoding(False)`` except that it does not follow UTF-8 mode.

This API will be used by ``io.TextIOWrapper`` to support the ``encoding="locale"`` option.
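
A hedged sketch of how calling code might query the locale encoding. ``locale.get_locale_encoding()`` is only the name proposed in this diff and may not exist in any given interpreter, so the hypothetical helper ``locale_encoding`` below looks it up defensively. It then tries ``locale.getencoding()``, which is available from Python 3.11, and finally falls back to ``locale.getpreferredencoding(False)``:

```python
import locale


def locale_encoding() -> str:
    """Return the locale encoding, ignoring UTF-8 mode where the runtime allows it.

    ``locale_encoding`` is an illustrative helper, not an API from the PEP.
    """
    # "get_locale_encoding" is the name proposed in this diff (may be absent);
    # "getencoding" exists in Python 3.11 and later.
    for name in ("get_locale_encoding", "getencoding"):
        getter = getattr(locale, name, None)
        if getter is not None:
            return getter()
    # Fallback: this call follows UTF-8 mode, so it may report UTF-8
    # even when the actual locale encoding differs.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(locale_encoding())
```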


Backward Compatibility
======================

@@ -86,10 +73,14 @@ should be announced very loudly.

To resolve this backward incompatibility, users can do:

* Disable UTF-8 mode
* Disable UTF-8 mode.
* Use ``EncodingWarning`` to find where the default encoding is used and use the
``encoding="locale"`` option to keep using the locale encoding
``encoding="locale"`` option if the locale encoding should be used
(as defined in :pep:`597`).
* Find every occurrence of ``locale.getpreferredencoding(False)`` in the
application, and replace it with ``locale.get_locale_encoding()`` if
the locale encoding should be used.
* Test the application with UTF-8 mode.
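
The migration steps above can be sketched as follows, assuming a file known to be written in the locale encoding. ``encoding="locale"`` is accepted by ``open()`` from Python 3.10 (:pep:`597`). The ``LOCALE`` constant and the ``write_and_read`` helper are hypothetical names used only for illustration:

```python
import locale
import os
import sys
import tempfile

# encoding="locale" is understood by open() from Python 3.10 onward.
# On older versions, name the locale encoding explicitly instead.
LOCALE = "locale" if sys.version_info >= (3, 10) else locale.getpreferredencoding(False)


def write_and_read(text, encoding):
    """Round-trip ``text`` through a temporary file using an explicit encoding."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Passing encoding explicitly avoids relying on the default encoding.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Legacy data that must stay in the locale encoding.
    print(write_and_read("hello", LOCALE))
    # New data that should be UTF-8 regardless of the locale.
    print(write_and_read("héllo", "utf-8"))
```

To locate calls that still rely on the default encoding, the application can be run with ``python -X warn_default_encoding`` or with ``PYTHONWARNDEFAULTENCODING=1`` set (Python 3.10 and later), which enables ``EncodingWarning``.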


Preceding examples
@@ -125,11 +116,10 @@ How to teach this
=================

For new users, this change reduces what needs to be taught.
Users don't need to learn about text encoding in their first year.
They need to learn it only when they first need to use non-UTF-8 text files.

Users can delay learning about text encoding until they need to handle
non-UTF-8 text files.

For existing users, see `Backward compatibility`_ section.
For existing users, see the `Backward compatibility`_ section.


References