Markup HTML¶
HTML manipulation and sanitization utilities.
HTML Module¶
HTML processing utilities.
This submodule provides utilities for cleaning, escaping, and manipulating HTML content.
Examples¶
>>> from rite.markup.html import (
...     html_clean,
...     html_escape,
...     html_unescape
... )
>>> html_clean("<p>Hello</p>")
'Hello'
Modules¶
html_clean¶
HTML Cleaner¶
Remove HTML tags from text.
Examples¶
>>> from rite.markup.html import html_clean
>>> html_clean("<p>Hello World</p>")
'Hello World'
Functions¶
html_clean(raw_html: str, strip: bool = True) -> str¶
Remove HTML tags from string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_html | str | Raw HTML string to clean. | required |
| strip | bool | Strip whitespace from result. | True |
Returns:
| Type | Description |
|---|---|
| str | Cleaned text without HTML tags. |
Examples:
Notes
Uses regex to remove tags. Does not parse HTML structure.
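The regex approach described in the notes can be sketched as follows. This is a minimal illustration and not the library's actual source: the name `clean_sketch` and the pattern `<[^>]+>` are assumptions chosen for the example.

```python
import re

# Assumed pattern: anything between "<" and the next ">" is treated as a tag.
_TAG_RE = re.compile(r"<[^>]+>")


def clean_sketch(raw_html: str, strip: bool = True) -> str:
    """Hypothetical stand-in for html_clean: drop tags with a regex."""
    text = _TAG_RE.sub("", raw_html)
    # Optionally trim leading and trailing whitespace, as the strip flag suggests.
    return text.strip() if strip else text


print(clean_sketch("<p>Hello World</p>"))               # Hello World
print(repr(clean_sketch(" <b>Hi</b> ", strip=False)))   # ' Hi '
```

Because a pattern like this never parses the document, it treats any `<...>` run as a tag. In this sketch, plain text such as `1 < 2 and 3 > 2` would lose the span between the two angle brackets.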
html_escape¶
HTML Escaper¶
Escape special HTML characters.
Examples¶
>>> from rite.markup.html import html_escape
>>> html_escape("<div>Hello & goodbye</div>")
'&lt;div&gt;Hello &amp; goodbye&lt;/div&gt;'
Functions¶
html_escape(text: str) -> str¶
Escape special HTML characters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to escape. | required |
Returns:
| Type | Description |
|---|---|
| str | HTML-escaped text. |
Examples:
>>> html_escape("5 < 10 & 10 > 5")
'5 &lt; 10 &amp; 10 &gt; 5'
>>> html_escape('"quoted"')
'&quot;quoted&quot;'
Notes
Escapes: &, <, >, ", and '. Uses html.escape from the standard library.
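Since the notes name the standard library's `html.escape`, its behavior can be checked directly. The snippet below calls the standard-library function itself, not `rite`; the sample string is made up for illustration.

```python
import html

# html.escape replaces &, <, > and, by default (quote=True), both quote characters.
escaped = html.escape("<a href='x'>Tom & \"Jerry\"</a>")
print(escaped)
# &lt;a href=&#x27;x&#x27;&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;

# With quote=False the quote characters are left untouched.
print(html.escape('"quoted"', quote=False))
# "quoted"
```

Whether `html_escape` exposes the `quote` option is not stated in this page; the documented signature takes only `text`.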
html_strip_tags¶
HTML Tag Stripper¶
Strip specific HTML tags.
Examples¶
>>> from rite.markup.html import html_strip_tags
>>> html_strip_tags("<div>Keep</div><script>alert()</script>", ["script"])
'<div>Keep</div>'
Functions¶
html_strip_tags(html: str, tags: list[str]) -> str¶
Strip specific HTML tags and their content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| html | str | HTML string. | required |
| tags | list[str] | List of tag names to strip. | required |
Returns:
| Type | Description |
|---|---|
| str | HTML with specified tags removed. |
Examples:
>>> html_strip_tags("<div>Keep</div><style>Remove</style>", ["style"])
'<div>Keep</div>'
>>> html_strip_tags(
... "<p>Text</p><script>alert()</script>",
... ["script", "style"]
... )
'<p>Text</p>'
Notes
Removes both opening and closing tags plus content. Case-insensitive tag matching.
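One way to get the behavior described in the notes (tag plus content removed, case-insensitive matching) is a per-tag regular expression. The sketch below is an assumption about how such a function could work, not the library's implementation; `strip_tags_sketch` is a hypothetical name.

```python
import re


def strip_tags_sketch(html_text: str, tags: list[str]) -> str:
    """Hypothetical stand-in for html_strip_tags."""
    for tag in tags:
        name = re.escape(tag)
        # Match the opening tag, everything up to the first matching closing
        # tag (non-greedy), and the closing tag itself.
        pattern = re.compile(
            rf"<{name}\b[^>]*>.*?</{name}\s*>",
            re.IGNORECASE | re.DOTALL,
        )
        html_text = pattern.sub("", html_text)
    return html_text


print(strip_tags_sketch("<div>Keep</div><style>Remove</style>", ["style"]))
# <div>Keep</div>
print(strip_tags_sketch("<p>Text</p><SCRIPT>alert()</SCRIPT>", ["script", "style"]))
# <p>Text</p>
```

`re.DOTALL` lets the removed content span multiple lines, and the non-greedy `.*?` stops at the first closing tag, so nested elements of the same name are not handled by this sketch.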
html_unescape¶
HTML Unescaper¶
Unescape HTML entities.
Examples¶
>>> from rite.markup.html import html_unescape
>>> html_unescape("&lt;div&gt;Hello&lt;/div&gt;")
'<div>Hello</div>'
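Unescaping is the inverse of escaping, which a round trip makes concrete. This page does not say how `html_unescape` is implemented, so the snippet below assumes behavior comparable to the standard library's `html.unescape` and uses only standard-library calls.

```python
import html

original = "<div>Hello & goodbye</div>"

# Escape, then unescape; the result should match the input exactly.
escaped = html.escape(original)
restored = html.unescape(escaped)

print(escaped)               # &lt;div&gt;Hello &amp; goodbye&lt;/div&gt;
print(restored == original)  # True
```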