
Markup HTML

HTML manipulation and sanitization utilities.

HTML Module

HTML processing utilities.

This submodule provides utilities for cleaning, escaping, and manipulating HTML content.
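The function Notes below say html_escape and html_unescape delegate to Python's standard html module. The escape/unescape round trip can therefore be illustrated with the standard library alone. This minimal sketch calls html.escape and html.unescape directly rather than the rite wrappers, so it runs without the package installed. It assumes the wrappers add no extra processing of their own.

```python
import html

# A string containing every character html.escape handles by default.
original = "<a href=\"x\">Tom & Jerry's</a>"

escaped = html.escape(original)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;

# Unescaping the escaped text restores the original exactly.
restored = html.unescape(escaped)
assert restored == original
```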

Examples

>>> from rite.markup.html import (
...     html_clean,
...     html_escape,
...     html_unescape,
... )
>>> html_clean("<p>Hello</p>")
'Hello'

Modules

html_clean

HTML Cleaner

Remove HTML tags from text.

Examples

>>> from rite.markup.html import html_clean
>>> html_clean("<p>Hello World</p>")
'Hello World'

Functions

html_clean(raw_html: str, strip: bool = True) -> str

Remove HTML tags from string.

Parameters:

Name      Type  Description                     Default
raw_html  str   Raw HTML string to clean.       required
strip     bool  Strip whitespace from result.   True

Returns:

Type  Description
str   Cleaned text without HTML tags.

Examples:

>>> html_clean("<p>Hello</p>")
'Hello'
>>> html_clean("<div>  Text  </div>", strip=False)
'  Text  '
Notes

Uses regex to remove tags. Does not parse HTML structure.
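The regex approach can be sketched as follows. The pattern <[^>]+> and the name clean_sketch are assumptions for illustration; they are not the library's actual code. The last call shows a practical consequence of not parsing HTML structure: a bare "<" in ordinary text is treated as the start of a tag.

```python
import re

# Hypothetical pattern: "<" followed by anything up to the next ">".
# The pattern actually used by rite.markup.html.html_clean may differ.
TAG_RE = re.compile(r"<[^>]+>")


def clean_sketch(raw_html: str, strip: bool = True) -> str:
    """Remove anything that looks like a tag; optionally trim whitespace."""
    text = TAG_RE.sub("", raw_html)
    return text.strip() if strip else text


print(repr(clean_sketch("<p>Hello <b>World</b></p>")))        # 'Hello World'
print(repr(clean_sketch("<div>  Text  </div>", strip=False)))  # '  Text  '
# No HTML parsing happens, so "< 2 and 3 >" is mistaken for a tag:
print(repr(clean_sketch("1 < 2 and 3 > 2")))                   # '1  2'
```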

html_escape

HTML Escaper

Escape special HTML characters.

Examples

>>> from rite.markup.html import html_escape
>>> html_escape("<div>Hello & goodbye</div>")
'&lt;div&gt;Hello &amp; goodbye&lt;/div&gt;'

Functions

html_escape(text: str) -> str

Escape special HTML characters.

Parameters:

Name  Type  Description      Default
text  str   Text to escape.  required

Returns:

Type  Description
str   HTML-escaped text.

Examples:

>>> html_escape("5 < 10 & 10 > 5")
'5 &lt; 10 &amp; 10 &gt; 5'
>>> html_escape('"quoted"')
'&quot;quoted&quot;'
Notes

Escapes &, <, >, ", and '. Uses html.escape from the standard library.
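Because html_escape takes no quote argument, it presumably calls html.escape with its default quote=True. That is consistent with the '"quoted"' example above. The standard library encodes the single quote as &#x27; rather than &apos;. This sketch calls the standard library directly to show both settings of quote. It assumes html_escape uses the default.

```python
import html

# Default quote=True: both quote characters are escaped as well.
print(html.escape("it's \"fine\""))
# it&#x27;s &quot;fine&quot;

# quote=False: only &, <, and > are escaped; quotes pass through.
print(html.escape("it's \"fine\" & <ok>", quote=False))
# it's "fine" &amp; &lt;ok&gt;
```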

html_strip_tags

HTML Tag Stripper

Strip specific HTML tags.

Examples

>>> from rite.markup.html import html_strip_tags
>>> html_strip_tags("<div>Keep</div><script>alert()</script>", ["script"])
'<div>Keep</div>'

Functions

html_strip_tags(html: str, tags: list[str]) -> str

Strip specific HTML tags and their content.

Parameters:

Name  Type       Description                  Default
html  str        HTML string.                 required
tags  list[str]  List of tag names to strip.  required

Returns:

Type  Description
str   HTML with specified tags removed.

Examples:

>>> html_strip_tags("<div>Keep</div><style>Remove</style>", ["style"])
'<div>Keep</div>'
>>> html_strip_tags(
...     "<p>Text</p><script>alert()</script>",
...     ["script", "style"]
... )
'<p>Text</p>'
Notes

Removes the opening and closing tags together with everything between them. Tag names are matched case-insensitively.
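One way to implement tag-and-content removal with case-insensitive matching is one non-greedy regular expression per tag. This is a hypothetical sketch: strip_tags_sketch is not part of rite, and the library's implementation may differ. Like any regex approach, it does not handle nested elements of the same name correctly.

```python
import re


def strip_tags_sketch(html: str, tags: list[str]) -> str:
    """Remove each <tag ...>...</tag> element, including its content."""
    for tag in tags:
        name = re.escape(tag)  # guard against regex metacharacters
        pattern = re.compile(
            # \b stops "script" from matching "<scripts>"; [^>]* allows attributes.
            rf"<{name}\b[^>]*>.*?</{name}\s*>",
            # IGNORECASE matches <SCRIPT>; DOTALL lets content span lines.
            re.IGNORECASE | re.DOTALL,
        )
        html = pattern.sub("", html)
    return html


print(strip_tags_sketch("<div>Keep</div><style>Remove</style>", ["style"]))
# <div>Keep</div>
print(strip_tags_sketch("<SCRIPT type='x'>\nbad()\n</Script>ok", ["script"]))
# ok
```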

html_unescape

HTML Unescaper

Unescape HTML entities.

Examples

>>> from rite.markup.html import html_unescape
>>> html_unescape("&lt;div&gt;Hello&lt;/div&gt;")
'<div>Hello</div>'

Functions

html_unescape(text: str) -> str

Unescape HTML entities.

Parameters:

Name  Type  Description         Default
text  str   HTML-escaped text.  required

Returns:

Type  Description
str   Unescaped text.

Examples:

>>> html_unescape("&lt;p&gt;Hello&lt;/p&gt;")
'<p>Hello</p>'
>>> html_unescape("&amp;")
'&'
Notes

Converts entities such as &lt; back to <. Uses html.unescape from the standard library.
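The standard library's html.unescape decodes named entities as well as decimal and hexadecimal character references. It makes a single pass, so double-escaped input comes back only one level unescaped. This sketch calls the standard library directly and assumes html_unescape adds nothing on top of it.

```python
import html

# Named entities.
print(html.unescape("&lt;p&gt;Hello&lt;/p&gt;"))  # <p>Hello</p>
# Decimal and hexadecimal character references both decode to "<".
print(html.unescape("&#60; &#x3C;"))  # < <
# Named entities beyond the escaping set are supported too.
print(html.unescape("&copy;"))  # ©
# Only one level is removed from double-escaped text.
print(html.unescape("&amp;lt;"))  # &lt;
```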
