Ravenport: python-url-normalize

python-url-normalize

Port variant	v13
Summary	URL normalization for Python (3.13)
Package version	3.0.0
Homepage	https://github.com/niksite/url-normalize
Keywords	python
Maintainer	Python Automaton
License	Not yet specified
Other variants	v14
Ravenports	Buildsheet | History
Ravensource	Port Directory | History
Last modified	22 MAY 2026, 14:41:05 UTC
Port created	25 APR 2024, 22:29:23 UTC

Subpackage Descriptions

single

# url-normalize

A Python library for standardizing and normalizing URLs. Ideal for database deduplication, caching, web crawling, and anywhere you need to ensure that equivalent URLs resolve to the exact same string.

```python
from url_normalize import url_normalize

# Fixes IDN, lowercases host/scheme, removes default ports, resolves path segments
url_normalize("HTTP://User:Pass@www.FOO.com:80///foo/../bar/./baz?q=1#frag")
# -> 'http://User:Pass@www.foo.com/bar/baz?q=1#frag'
```

## Features

url-normalize provides a robust URI normalization function that handles IDN domains, scheme/host lowercasing, and RFC-compliant path normalization.

- **IDN Support**: Full internationalized domain name handling (using IDNA2008 with UTS46).
- **Humanization**: Convert normalized URLs to a readable display format while preserving round-trip normalization.
- **RFC Compliance**:
  - Proper percent-encoding (minimal, uppercase hex).
  - Dot-segment removal in paths.
  - Default port and authority handling.
  - UTF-8 NFC normalization.
- **Configurable Defaults**:
  - Customizable default scheme (https by default).
  - Configurable default domain for absolute paths.
- **Query Parameter Control**:
  - Parameter filtering with allowlists.
  - Support for domain-specific parameter rules.
- **Versatile URL Handling**: Handles empty strings, double-slash URLs (//domain.tld), and shebang (#!) URLs.
- **Developer Friendly**:
  - Python 3.10+ compatibility.
  - 100% test coverage.
  - Modern type hints and string handling.

Inspired by Sam Ruby's urlnorm.py.
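
To make the RFC-style steps listed above more concrete, here is a minimal standard-library sketch of three of them: host lowercasing, default-port removal, and dot-segment removal. It is not url-normalize's implementation; the names `normalize_sketch`, `remove_dot_segments`, and `DEFAULT_PORTS` are hypothetical, and the sketch assumes an absolute `http`/`https` URL without userinfo or an IPv6 host.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports, used only by this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments in a URL path."""
    segments = path.split("/")
    output: list[str] = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            # Never pop the leading empty segment that marks an absolute path.
            if len(output) > 1 or (output and output[0] != ""):
                output.pop()
            continue
        output.append(segment)
    # A trailing '.' or '..' names a directory, so keep the trailing slash.
    if segments[-1] in (".", ".."):
        output.append("")
    return "/".join(output)


def normalize_sketch(url: str) -> str:
    """Lowercase scheme and host, drop the default port, clean the path."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # SplitResult.hostname is already lowercased; userinfo is ignored here.
    netloc = parts.hostname or ""
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{netloc}:{parts.port}"
    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize_sketch("HTTP://www.FOO.com:80/foo/../bar/./baz?q=1#frag"))
# -> http://www.foo.com/bar/baz?q=1#frag
```

The sketch leaves out IDN handling, percent-encoding, and NFC normalization, which the library covers according to the feature list.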
## Installation

Install as a library:

```sh
pip install url-normalize
```

Or install as a standalone CLI tool using uv:

```sh
uv tool install url-normalize
```

## Usage

### Python API

#### Basic Normalization

```python
from url_normalize import url_normalize

# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com/foo

# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo
```

#### Query Parameter Filtering

You can strip out tracking parameters and only keep the ones you care about using allowlists.

```python
# With query parameter filtering enabled (strips all params by default)
print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True))
# Output: https://www.google.com/search?q=test

# With custom parameter allowlist as a list
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist=["page", "id"]
))
```
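
For readers curious about what allowlist filtering amounts to, the following is a rough standard-library sketch of the idea. It does not reproduce url-normalize's internals or its domain-specific rules; `filter_query_sketch` is a hypothetical helper, and it assumes the URL already carries a scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def filter_query_sketch(url: str, allowlist: set[str]) -> str:
    """Keep only query parameters whose names appear in the allowlist."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowlist
    ]
    # Rebuild the URL with the filtered query; the original order is preserved.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(filter_query_sketch("https://example.com/list?page=1&id=123&ref=test", {"page", "id"}))
# -> https://example.com/list?page=1&id=123
```

Passing an empty allowlist to this sketch drops the query string entirely.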

Configuration Switches (platform-specific settings discarded)

PY313	ON	Build using Python 3.13
PY314	OFF	Build using Python 3.14

Package Dependencies by Type

Build (only)	python313:dev:std python-pip:single:v13 autoselect-python:single:std
Build and Runtime	python313:primary:std
Runtime (only)	python-idna:single:v13

Download groups

main	mirror://PYPIWHL/13/8a/f72344eab18674fd7b174f35abbce41ed88fea72927f111726732d0ca779

Distribution File Information

95234bd359f86831c1fd87c248877f2a6887db2f3b5087120083f2fffcba4889 16854 python-src/url_normalize-3.0.0-py3-none-any.whl

Ports that require python-url-normalize:v13

python-requests-cache:v13	Persistent cache for python requests (3.13)