python-url-normalize
Port variant v13
Summary URL normalization for Python (3.13)
Package version 3.0.0
Homepage https://github.com/niksite/url-normalize
Keywords python
Maintainer Python Automaton
License Not yet specified
Other variants v14
Ravenports Buildsheet | History
Ravensource Port Directory | History
Last modified 22 MAY 2026, 14:41:05 UTC
Port created 25 APR 2024, 22:29:23 UTC
Subpackage Descriptions
single # url-normalize [tests] [Coveralls] [PyPI] [Python Versions] [License] [Ruff] A Python library for standardizing and normalizing URLs. Ideal for database deduplication, caching, web crawling, and anywhere you need to ensure that equivalent URLs resolve to the exact same string. ```python from url_normalize import url_normalize # Fixes IDN, lowercases host/scheme, removes default ports, resolves path segments url_normalize("HTTP://User:Pass@www.FOO.com:80///foo/../bar/./baz?q=1#frag") # -> 'http://User:Pass@www.foo.com/bar/baz?q=1#frag' ``` ## Features url-normalize provides a robust URI normalization function that handles IDN domains, scheme/host lowercasing, and RFC-compliant path normalization. - **IDN Support**: Full internationalized domain name handling (using IDNA2008 with UTS46). - **Humanization**: Convert normalized URLs to a readable display format while preserving round-trip normalization. - **RFC Compliance**: - Proper percent-encoding (minimal, uppercase hex). - Dot-segment removal in paths. - Default port and authority handling. - UTF-8 NFC normalization. - **Configurable Defaults**: - Customizable default scheme (https by default). - Configurable default domain for absolute paths. - **Query Parameter Control**: - Parameter filtering with allowlists. - Support for domain-specific parameter rules. - **Versatile URL Handling**: Handles empty strings, double-slash URLs (//domain.tld), and shebang (#!) URLs. - **Developer Friendly**: - Python 3.10+ compatibility. - 100% test coverage. - Modern type hints and string handling. Inspired by Sam Ruby's [urlnorm.py]. ## Installation Install as a library: ```sh pip install url-normalize ``` Or install as a standalone CLI tool using [uv]: ```sh uv tool install url-normalize ``` ## Usage ### Python API #### Basic Normalization ```python from url_normalize import url_normalize # Basic normalization (uses https by default) print(url_normalize("www.foo.com:80/foo")) # Output: https://www.foo.com/foo # With custom default scheme print(url_normalize("www.foo.com/foo", default_scheme="http")) # Output: http://www.foo.com/foo ``` #### Query Parameter Filtering You can strip out tracking parameters and only keep the ones you care about using allowlists. ```python # With query parameter filtering enabled (strips all params by default) print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True)) # Output: https://www.google.com/search?q=test # With custom parameter allowlist as a list print(url_normalize( "example.com?page=1&id=123&ref=test", filter_params=True, param_allowlist=["page", "id"] ))
Configuration Switches (platform-specific settings discarded)
PY313 ON Build using Python 3.13 PY314 OFF Build using Python 3.14
Package Dependencies by Type
Build (only) python313:dev:std
python-pip:single:v13
autoselect-python:single:std
Build and Runtime python313:primary:std
Runtime (only) python-idna:single:v13
Download groups
main mirror://PYPIWHL/13/8a/f72344eab18674fd7b174f35abbce41ed88fea72927f111726732d0ca779
Distribution File Information
95234bd359f86831c1fd87c248877f2a6887db2f3b5087120083f2fffcba4889 16854 python-src/url_normalize-3.0.0-py3-none-any.whl
Ports that require python-url-normalize:v13
python-requests-cache:v13 Persistent cache for python requests (3.13)