single |
mwparserfromhell
================
**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for
MediaWiki_
wikicode. It supports Python 3.9+.
Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others.
Full documentation is available on ReadTheDocs_. Development occurs on
GitHub_.
Installation
------------
The easiest way to install the parser is from `PyPI`_; you can install the
latest release with pip install mwparserfromhell.
Prebuilt wheels are available on PyPI with a fast, compiled C tokenizer
extension for most environments (Linux x86_64 and arm64, macOS x86_64 and
arm64, Windows x86 and x86_64). If building from source and the C tokenizer
cannot be built, you can fall back to the slower pure-Python implementation
by
setting the environment variable ``WITH_EXTENSION=0`` when installing.
To get the latest development version (with `uv`_):
.. code-block:: sh
git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
uv sync
uv run python -c 'import mwparserfromhell;
print(mwparserfromhell.__version__)'
The comprehensive test suite can be run with pytest. If using uv, pass
``--reinstall-package`` so updates to the extension module are properly
tested:
.. code-block:: sh
uv run --reinstall-package mwparserfromhell pytest
Usage
-----
Normal usage is rather straightforward (where text is page text):
.. code-block:: python
>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)
wikicode is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary str object with some extra methods. For example:
.. code-block:: python
>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam
Since nodes can contain other nodes, getting nested templates is trivial:
.. code-block:: python
>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}',
'{{spam}}']
You can also pass ``recursive=False to filter_templates()`` and explore
templates manually. This is possible because nodes can contain additional
Wikicode objects:
.. code-block:: python
>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template
|