urlstd is a Python implementation of the WHATWG URL Living Standard.

This library provides the URL class, the URLSearchParams class, and low-level APIs that comply with the URL specification.

URL class
- class urlstd.parse.URL(url: str, base: Optional[str | URL] = None)
  - canParse: classmethod can_parse(url: str, base: Optional[str | URL] = None) -> bool
  - stringifier: __str__() -> str
  - href: readonly property href: str
  - origin: readonly property origin: str
  - protocol: property protocol: str
  - username: property username: str
  - password: property password: str
  - host: property host: str
  - hostname: property hostname: str
  - port: property port: str
  - pathname: property pathname: str
  - search: property search: str
  - searchParams: readonly property search_params: URLSearchParams
  - hash: property hash: str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URL, exclude_fragments: bool = False) -> bool
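The URL Standard defines URL equivalence as comparing the two URLs' serializations, optionally with fragments excluded. The following is a minimal sketch of that rule only, assuming both inputs are already-serialized href strings (for example, the value of URL.href); url_equals is a hypothetical helper, not part of urlstd, whose own equals() works on URL objects.

```python
def url_equals(a_href: str, b_href: str, exclude_fragments: bool = False) -> bool:
    """Hypothetical helper: compare two serialized URLs per the URL equivalence rule.

    Assumes both arguments come from a URL serializer, so the first "#" (if any)
    always starts the fragment.
    """
    if exclude_fragments:
        # Drop the fragment together with its "#" delimiter.
        a_href = a_href.split("#", 1)[0]
        b_href = b_href.split("#", 1)[0]
    return a_href == b_href
```

An empty fragment still counts: 'http://example.org/#' and 'http://example.org/' are different URLs unless fragments are excluded.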

URLSearchParams class
- class urlstd.parse.URLSearchParams(init: Optional[str | Sequence[Sequence[str | int | float]] | dict[str, str | int | float] | URLRecord | URLSearchParams] = None)
  - size: __len__() -> int
  - append: append(name: str, value: str | int | float) -> None
  - delete: delete(name: str, value: Optional[str | int | float] = None) -> None
  - get: get(name: str) -> str | None
  - getAll: get_all(name: str) -> tuple[str, ...]
  - has: has(name: str, value: Optional[str | int | float] = None) -> bool
  - set: set(name: str, value: str | int | float) -> None
  - sort: sort() -> None
  - iterable<USVString, USVString>: __iter__() -> Iterator[tuple[str, str]]
  - stringifier: __str__() -> str

Low-level APIs
- urlstd.parse.parse_url(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> URLRecord
- class urlstd.parse.BasicURLParser
  - classmethod parse(urlstring: str, base: Optional[URLRecord] = None, encoding: str = "utf-8", url: Optional[URLRecord] = None, state_override: Optional[URLParserState] = None) -> URLRecord
- class urlstd.parse.URLRecord
  - scheme: property scheme: str = ""
  - username: property username: str = ""
  - password: property password: str = ""
  - host: property host: Optional[str | int | tuple[int, ...]] = None
  - port: property port: Optional[int] = None
  - path: property path: list[str] | str = []
  - query: property query: Optional[str] = None
  - fragment: property fragment: Optional[str] = None
  - origin: readonly property origin: Origin | None
  - is special: is_special() -> bool
  - is not special: is_not_special() -> bool
  - includes credentials: includes_credentials() -> bool
  - has an opaque path: has_opaque_path() -> bool
  - cannot have a username/password/port: cannot_have_username_password_port() -> bool
  - URL serializer: serialize_url(exclude_fragment: bool = False) -> str
  - host serializer: serialize_host() -> str
  - URL path serializer: serialize_path() -> str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URLRecord, exclude_fragments: bool = False) -> bool
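The URL serializer rebuilds a string from the record fields listed above. As an illustration, here is a minimal sketch of the spec's serializer steps over a hypothetical SimpleURLRecord dataclass. It assumes host is already a serialized string (urlstd's URLRecord.host may also be an int or a tuple), and it is not urlstd's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class SimpleURLRecord:
    """Hypothetical stand-in for a URL record; host is assumed to be pre-serialized."""

    scheme: str = ""
    username: str = ""
    password: str = ""
    host: Optional[str] = None
    port: Optional[int] = None
    path: Union[List[str], str] = field(default_factory=list)  # a str means an opaque path
    query: Optional[str] = None
    fragment: Optional[str] = None

    def has_opaque_path(self) -> bool:
        return isinstance(self.path, str)

    def serialize_path(self) -> str:
        # An opaque path is emitted verbatim; otherwise each segment gets a leading "/".
        if self.has_opaque_path():
            return self.path
        return "".join("/" + segment for segment in self.path)

    def serialize_url(self, exclude_fragment: bool = False) -> str:
        output = self.scheme + ":"
        if self.host is not None:
            output += "//"
            if self.username or self.password:  # the URL "includes credentials"
                output += self.username
                if self.password:
                    output += ":" + self.password
                output += "@"
            output += self.host
            if self.port is not None:
                output += ":" + str(self.port)
        elif not self.has_opaque_path() and len(self.path) > 1 and self.path[0] == "":
            # Keeps a path like ["", "x"] from being re-parsed as a host ("//x").
            output += "/."
        output += self.serialize_path()
        if self.query is not None:
            output += "?" + self.query
        if not exclude_fragment and self.fragment is not None:
            output += "#" + self.fragment
        return output
```

With scheme 'http', username 'user', password 'pass', host 'foo', port 21, path ['bar;par'], query 'b' and fragment 'c', the sketch returns 'http://user:pass@foo:21/bar;par?b#c'.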

Hosts (domains and IP addresses)
- class urlstd.parse.IDNA
  - domain to ASCII: classmethod domain_to_ascii(domain: str, be_strict: bool = False) -> str
  - domain to Unicode: classmethod domain_to_unicode(domain: str, be_strict: bool = False) -> str
- class urlstd.parse.Host
  - host parser: classmethod parse(host: str, is_not_special: bool = False) -> str | int | tuple[int, ...]
  - host serializer: classmethod serialize(host: str | int | Sequence[int]) -> str
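Per the return type above, Host.parse() yields an int for an IPv4 address and a tuple of ints (the IPv6 pieces) for an IPv6 address, and Host.serialize() turns those back into strings. The sketch below follows the spec's IPv4 parser, which also accepts hexadecimal, octal, and shortened forms, plus the IPv4/IPv6 host serializer. The function names are hypothetical, validation errors are not reported, and failure is signalled with None; this is not urlstd's code.

```python
from typing import Optional, Sequence, Union

_RADIX_DIGITS = {
    8: set("01234567"),
    10: set("0123456789"),
    16: set("0123456789abcdefABCDEF"),
}


def parse_ipv4_number(part: str) -> Optional[int]:
    """Spec-style IPv4 number parser: "0x" prefix means hex, a leading "0" means octal."""
    if part == "":
        return None
    radix = 10
    if len(part) >= 2 and part[:2] in ("0x", "0X"):
        part, radix = part[2:], 16
    elif len(part) >= 2 and part[0] == "0":
        part, radix = part[1:], 8
    if part == "":
        return 0  # e.g. "0x" on its own
    if not set(part) <= _RADIX_DIGITS[radix]:
        return None
    return int(part, radix)


def parse_ipv4(host: str) -> Optional[int]:
    """Return the 32-bit address, or None where the spec's IPv4 parser fails."""
    parts = host.split(".")
    if parts[-1] == "" and len(parts) > 1:
        parts.pop()  # a single trailing dot is tolerated
    if len(parts) > 4:
        return None
    numbers = []
    for part in parts:
        n = parse_ipv4_number(part)
        if n is None:
            return None
        numbers.append(n)
    if any(n > 255 for n in numbers[:-1]):
        return None
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return None  # the last number fills all remaining bytes
    ipv4 = numbers[-1]
    for counter, n in enumerate(numbers[:-1]):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4


def serialize_ipv6(pieces: Sequence[int]) -> str:
    # Find the first longest run of zero pieces; only runs longer than 1 are compressed.
    compress, best_len, i = None, 1, 0
    while i < 8:
        j = i
        while j < 8 and pieces[j] == 0:
            j += 1
        if j - i > best_len:
            compress, best_len = i, j - i
        i = max(j, i + 1)
    output, ignore0 = "", False
    for index in range(8):
        if ignore0 and pieces[index] == 0:
            continue
        ignore0 = False
        if index == compress:
            output += "::" if index == 0 else ":"
            ignore0 = True
            continue
        output += format(pieces[index], "x")  # shortest lowercase hexadecimal
        if index != 7:
            output += ":"
    return output


def serialize_host(host: Union[str, int, Sequence[int]]) -> str:
    if isinstance(host, int):  # IPv4: four dotted decimal bytes
        return ".".join(str((host >> shift) & 0xFF) for shift in (24, 16, 8, 0))
    if isinstance(host, str):  # domain, opaque host, or empty host
        return host
    return "[" + serialize_ipv6(host) + "]"
```

For example, parse_ipv4('0x7f.1') gives 2130706433, which serialize_host() writes as '127.0.0.1'.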

- urlstd.parse.string_percent_decode(s: str) -> bytes
- urlstd.parse.string_percent_encode(s: str, safe: str, encoding: str = "utf-8", space_as_plus: bool = False) -> str
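string_percent_decode() returns bytes because percent-decoding operates on the UTF-8 encoding of the input string. A minimal sketch of the spec's string percent-decode algorithm follows (string_percent_decode_sketch is a hypothetical name, not urlstd's code): "%" followed by two ASCII hex digits becomes one byte, and any other "%" is kept unchanged.

```python
_HEX = b"0123456789abcdefABCDEF"


def string_percent_decode_sketch(s: str) -> bytes:
    """Percent-decode the UTF-8 encoding of s, as in the spec's string percent-decode."""
    data = s.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if (byte == 0x25 and i + 2 < len(data)
                and data[i + 1] in _HEX and data[i + 2] in _HEX):
            out.append(int(data[i + 1:i + 3], 16))  # "%XX" -> one byte
            i += 3
        else:
            out.append(byte)  # plain byte, or a "%" not followed by two hex digits
            i += 1
    return bytes(out)
```

Malformed sequences are not an error: '%41%zz%' decodes to b'A%zz%'.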

application/x-www-form-urlencoded parser
- urlstd.parse.parse_qsl(query: bytes) -> list[tuple[str, str]]
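parse_qsl() takes bytes because the application/x-www-form-urlencoded parser operates on a byte sequence. The following sketch of the spec steps (parse_form_urlencoded is a hypothetical name, not urlstd's code) splits on "&", skips empty chunks, splits each chunk at the first "=", turns "+" into a space, then percent-decodes and decodes as UTF-8 with replacement characters. It assumes urllib.parse.unquote_to_bytes for the percent-decoding step, which keeps malformed "%" sequences unchanged as the spec does.

```python
from typing import List, Tuple
from urllib.parse import unquote_to_bytes


def parse_form_urlencoded(query: bytes) -> List[Tuple[str, str]]:
    """Sketch of the application/x-www-form-urlencoded parser steps."""
    output = []
    for chunk in query.split(b"&"):
        if not chunk:
            continue  # skip empty sequences such as the one in "a=1&&b=2"
        name, _, value = chunk.partition(b"=")  # split at the first "=" only
        # "+" becomes a space *before* percent-decoding, so "%2B" still yields "+".
        name = name.replace(b"+", b" ")
        value = value.replace(b"+", b" ")
        output.append((
            unquote_to_bytes(name).decode("utf-8", errors="replace"),
            unquote_to_bytes(value).decode("utf-8", errors="replace"),
        ))
    return output
```

Because of that ordering, b'%2B=+' parses to [('+', ' ')].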

application/x-www-form-urlencoded serializer
- urlstd.parse.urlencode(query: Sequence[tuple[str, str]], encoding: str = "utf-8") -> str
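The serializer percent-encodes each name and value with the application/x-www-form-urlencoded percent-encode set and writes spaces as "+". In that set, only ASCII alphanumerics and "*", "-", ".", "_" are left as-is. The sketch below assumes UTF-8 output only (urlstd's urlencode() also takes an encoding argument, which this sketch does not handle); serialize_form_urlencoded is a hypothetical name.

```python
from typing import Iterable, Tuple

# Bytes left as-is by the application/x-www-form-urlencoded percent-encode set.
_FORM_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789*-._"
)


def _form_encode(s: str) -> str:
    out = []
    for byte in s.encode("utf-8"):  # assumption: UTF-8 only
        if byte == 0x20:
            out.append("+")  # spaceAsPlus
        elif byte in _FORM_SAFE:
            out.append(chr(byte))
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


def serialize_form_urlencoded(pairs: Iterable[Tuple[str, str]]) -> str:
    return "&".join(_form_encode(name) + "=" + _form_encode(value) for name, value in pairs)
```

Note that "~" is percent-encoded here, unlike in urllib.parse.quote_plus().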

Validation
- class urlstd.parse.HostValidator
  - valid host string: classmethod is_valid(host: str) -> bool
  - valid domain string: classmethod is_valid_domain(domain: str) -> bool
  - valid IPv4-address string: classmethod is_valid_ipv4_address(address: str) -> bool
  - valid IPv6-address string: classmethod is_valid_ipv6_address(address: str) -> bool
- class urlstd.parse.URLValidator
  - valid URL string: classmethod is_valid(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> bool
  - valid URL-scheme string: classmethod is_valid_url_scheme(value: str) -> bool
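Validity is stricter than parseability: the host parser accepts "0x7f.1" as 127.0.0.1, but that string is not a valid IPv4-address string. Below is a sketch of two of the definitions behind HostValidator and URLValidator, assuming the spec's wording: a valid IPv4-address string is four shortest-possible decimal numbers from 0 to 255 separated by "."; a valid URL-scheme string is an ASCII letter followed by ASCII alphanumerics, "+", "-", or ".". The function names are hypothetical and this is not urlstd's code.

```python
import string

_SCHEME_REST = frozenset(string.ascii_letters + string.digits + "+-.")


def is_valid_ipv4_address_string(address: str) -> bool:
    """Four shortest-possible decimal numbers (0-255) separated by "."."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # ASCII digits only (str.isdigit() would also accept non-ASCII digits).
        if not part or any(c not in string.digits for c in part):
            return False
        if len(part) > 1 and part[0] == "0":  # "01" is not the shortest form
            return False
        if int(part) > 255:
            return False
    return True


def is_valid_url_scheme_string(value: str) -> bool:
    """An ASCII letter followed by ASCII alphanumerics, "+", "-", or "."."""
    return (
        value != ""
        and value[0] in string.ascii_letters
        and all(c in _SCHEME_REST for c in value[1:])
    )
```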

Compatibility with standard library urllib
- urlstd.parse.urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult
  urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). It parses a string representation of a URL using the basic URL parser and returns a urllib.parse.ParseResult.

To parse a string into a URL:
from urlstd.parse import URL
URL('http://user:pass@foo:21/bar;par?b#c')
# → <URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
To parse a string into a URL using a base URL:
url = URL('?ﬃ&🌈', base='http://example.org')
url # → <URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
url.search # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url) # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
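The sort order in this example can look surprising: '🌈' (U+1F308) has a higher code point than 'ﬃ' (U+FB03), yet it sorts first. The URL Standard sorts names by UTF-16 code units, and '🌈' starts with the surrogate 0xD83C, which is less than 0xFB03. A minimal sketch of that ordering on plain Python tuples follows (sort_params is a hypothetical helper, not urlstd's code); since sorted() is stable, pairs with equal names keep their relative order, which the spec also requires.

```python
from typing import List, Tuple


def sort_params(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: order pairs by name as UTF-16 code units (stable)."""
    # Big-endian UTF-16 bytes compare in the same order as the code units themselves.
    return sorted(pairs, key=lambda pair: pair[0].encode("utf-16-be"))
```

Sorting by code point (Python's default string ordering) would put 'ﬃ' first instead.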
To validate a URL string:
from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('https://user:password@example.org/') # → True
URLValidator.is_valid('https://user:password@example.org/') # → False
validity = ValidityState()
URLValidator.is_valid('https://user:password@example.org/', validity=validity)
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"
URL.can_parse('file:///C|/demo') # → True
URLValidator.is_valid('file:///C|/demo') # → False
validity = ValidityState()
URLValidator.is_valid('file:///C|/demo', validity=validity) # → False
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-URL-unit: code point is found that is not a URL unit: U+007C (|) in 'file:///C|/demo' at position 9"
To parse a string into a urllib.parse.ParseResult using a base URL:
import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query) # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251') # → 'a&#255;b'
html.unescape('a&#255;b') # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252') # → 'aÿb'
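The windows-1251 result above follows from how the spec percent-encodes a query after encoding it: a code point the encoding cannot represent is written as an HTML numeric character reference, emitted as "%26%23", the decimal code point, and "%3B". The sketch below reproduces the three query strings above with Python codecs. It assumes a special URL (so the special-query percent-encode set applies) and a stateless encoding; percent_encode_query is a hypothetical name, and Python codec names only approximate WHATWG encoding labels.

```python
def percent_encode_query(query: str, encoding: str) -> str:
    """Percent-encode a special URL's query after encoding it with a Python codec."""
    out = []
    for ch in query:  # assumption: a stateless encoding, so per-character encoding is safe
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            # Unmappable code point: "&#<decimal>;" written as percent-encoded bytes.
            out.append("%26%23" + str(ord(ch)) + "%3B")
            continue
        for byte in encoded:
            # Special-query percent-encode set: C0 controls, space, '"', "#", "<", ">",
            # "'", and bytes above 0x7E.
            if byte < 0x21 or byte > 0x7E or byte in b"\"#<>'":
                out.append("%{:02X}".format(byte))
            else:
                out.append(chr(byte))
    return "".join(out)
```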
urlstd uses the standard library logging module to report validation errors. Change the log level of the urlstd logger if needed:
import logging
logging.getLogger('urlstd').setLevel(logging.ERROR)

Requirements
- icupy >= 0.11.0 (pre-built packages are available)
  icupy requirements:
  - ICU4C (ICU - International Components for Unicode), latest version recommended
  - C++17 compatible compiler (see supported compilers)
  - CMake >= 3.7

Configuring environment variables for icupy (ICU):
- Windows:
  - Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if ICU is located in C:\icu4c:
    set ICU_ROOT=C:\icu4c
    or in PowerShell:
    $env:ICU_ROOT = "C:\icu4c"
  - To verify the settings using icuinfo (64-bit):
    %ICU_ROOT%\bin64\icuinfo
    or in PowerShell:
    & $env:ICU_ROOT\bin64\icuinfo
- Linux/POSIX:
  - If ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if ICU is located in /usr/local:
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
  - To verify the settings using pkg-config:
    $ pkg-config --cflags --libs icu-uc
    -I/usr/local/include -L/usr/local/lib -licuuc -licudata
Installing from PyPI:
pip install urlstd
To run the tests, first install the dependencies:
pipx install tox
# or
pip install --user tox
To run tests and generate a report:
git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt
See the results in tests/wpt/report.html.