Markup Module¶

The rite.markup module provides utilities for working with HTML, XML, and Markdown.

Overview¶

markup ¶

Markup Module¶

Comprehensive markup language processing utilities.

This module provides utilities for HTML, XML, Markdown processing, entity encoding/decoding, and content sanitization.

Submodules¶

html: HTML cleaning, escaping, unescaping, tag stripping
xml: XML escaping, unescaping, formatting
markdown: Markdown to HTML conversion, escaping
entities: HTML entity encoding and decoding
sanitize: URL, filename, and HTML sanitization

Examples¶

HTML

>>> from rite.markup import html_clean, html_escape
>>> html_clean("<p>Hello</p>")
'Hello'
>>> html_escape("<tag>")
'&lt;tag&gt;'

XML

>>> from rite.markup import xml_escape
>>> xml_escape("<tag>value</tag>")
'&lt;tag&gt;value&lt;/tag&gt;'

Markdown

>>> from rite.markup import markdown_to_html
>>> markdown_to_html("**bold**")
'<strong>bold</strong>'

Entities

>>> from rite.markup import entities_encode
>>> entities_encode("©")
'&#169;'

Sanitize

>>> from rite.markup import sanitize_url
>>> sanitize_url("javascript:alert(1)")
''

Modules¶

entities ¶

Entities Module¶

HTML entity encoding and decoding utilities.

This submodule provides utilities for encoding text to HTML entities and decoding entities back to text.

Examples¶

>>> from rite.markup.entities import (
...     entities_encode,
...     entities_decode
... )
>>> entities_encode("©")
'&#169;'

Modules¶

entities_decode ¶

Entity Decoder¶

Decode HTML entities to text.

Examples¶

>>> from rite.markup.entities import entities_decode
>>> entities_decode("caf&eacute;")
'café'

Functions¶

entities_decode ¶

entities_decode(text: str) -> str

Decode HTML entities to text.

Parameters:

Name	Type	Description	Default
`text`	`str`	Entity-encoded text.	required

Returns:

Type	Description
`str`	Decoded text.

Examples:

>>> entities_decode("&#169;")
'©'
>>> entities_decode("&copy;")
'©'

Notes

Decodes both numeric (&#N;) and named (&copy;) references. Uses html.unescape from the standard library.
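Because the note says decoding delegates to html.unescape, calling that standard-library function directly shows which references are understood. This sketch demonstrates the stdlib behavior the note names. It is not the rite source, and the sample strings are illustrative.

```python
import html

# Decimal, hexadecimal, and named references all resolve to the same character.
for ref in ["&#169;", "&#xA9;", "&copy;"]:
    print(ref, "->", html.unescape(ref))

# Unknown names are left untouched instead of raising an error.
print(html.unescape("&bogus;"))  # &bogus;

# HTML5 lets a few legacy names (such as &not) omit the semicolon,
# so the longest known prefix is decoded and the rest is kept as text.
print(html.unescape("&notit;"))  # ¬it;
```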

entities_encode ¶

Entity Encoder¶

Encode text to HTML entities.

Examples¶

>>> from rite.markup.entities import entities_encode
>>> entities_encode("café")
'café'

Functions¶

entities_encode ¶

entities_encode(text: str, ascii_only: bool = False) -> str

Encode text to HTML entities.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to encode.	required
`ascii_only`	`bool`	Encode only non-ASCII characters.	`False`

Returns:

Type	Description
`str`	Entity-encoded text.

Examples:

>>> entities_encode("©")
'&#169;'
>>> entities_encode("Hello", ascii_only=True)
'Hello'

Notes

Converts characters to &#N; format. Useful for encoding special characters.
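The parameter table does not state exactly which characters are encoded when ascii_only is False. The sketch below shows decimal &#N; encoding under one assumed reading: the flag off encodes every character, and the flag on encodes only code points above 127. The helper names to_numeric_refs and non_ascii_refs are hypothetical, and the flag is deliberately renamed so it is not mistaken for the rite API.

```python
def to_numeric_refs(text: str, non_ascii_only: bool = False) -> str:
    """Hypothetical helper: replace characters with decimal &#N; references.

    Assumption: non_ascii_only=False encodes every character;
    non_ascii_only=True encodes only code points above 127.
    """
    return "".join(
        f"&#{ord(ch)};" if (not non_ascii_only or ord(ch) > 127) else ch
        for ch in text
    )


def non_ascii_refs(text: str) -> str:
    # The built-in "xmlcharrefreplace" error handler yields the non-ASCII-only variant.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_numeric_refs("©"))                          # &#169;
print(to_numeric_refs("Hi"))                         # &#72;&#105;
print(to_numeric_refs("café", non_ascii_only=True))  # caf&#233;
print(non_ascii_refs("café"))                        # caf&#233;
```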

html ¶

HTML Module¶

HTML processing utilities.

This submodule provides utilities for cleaning, escaping, and manipulating HTML content.

Examples¶

>>> from rite.markup.html import (
...     html_clean,
...     html_escape,
...     html_unescape
... )
>>> html_clean("<p>Hello</p>")
'Hello'

Modules¶

html_clean ¶

HTML Cleaner¶

Remove HTML tags from text.

Examples¶

>>> from rite.markup.html import html_clean
>>> html_clean("<p>Hello World</p>")
'Hello World'

Functions¶

html_clean ¶

html_clean(raw_html: str, strip: bool = True) -> str

Remove HTML tags from string.

Parameters:

Name	Type	Description	Default
`raw_html`	`str`	Raw HTML string to clean.	required
`strip`	`bool`	Strip whitespace from result.	`True`

Returns:

Type	Description
`str`	Cleaned text without HTML tags.

Examples:

>>> html_clean("<p>Hello</p>")
'Hello'
>>> html_clean("<div>  Text  </div>", strip=False)
'  Text  '

Notes

Uses regex to remove tags. Does not parse HTML structure.
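"Does not parse HTML structure" has a concrete consequence: a ">" inside a quoted attribute value ends a naive tag match early. The sketch below compares a hypothetical regex stripper (an assumed approximation of the approach, not the rite code) with a parser-based alternative built on the standard-library html.parser. The helper names strip_tags_regex and strip_tags_parser are illustrative. Note that the parser version also decodes entities, and it would keep the text inside script elements.

```python
import re
from html.parser import HTMLParser

_TAG_RE = re.compile(r"<[^>]+>")


def strip_tags_regex(raw_html: str, strip: bool = True) -> str:
    """Hypothetical regex approach: delete anything that looks like <...>."""
    text = _TAG_RE.sub("", raw_html)
    return text.strip() if strip else text


class _TextCollector(HTMLParser):
    """Collects text nodes; the parser understands quoted attribute values."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # entities are decoded in handle_data
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags_parser(raw_html: str, strip: bool = True) -> str:
    """Hypothetical parser-based alternative using the stdlib html.parser."""
    collector = _TextCollector()
    collector.feed(raw_html)
    collector.close()  # flush any buffered text
    text = "".join(collector.parts)
    return text.strip() if strip else text


tricky = '<a title="1 > 0">link</a>'
print(strip_tags_regex(tricky))   # 0">link  (the ">" inside the attribute ends the match)
print(strip_tags_parser(tricky))  # link
```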

html_escape ¶

HTML Escaper¶

Escape special HTML characters.

Examples¶

>>> from rite.markup.html import html_escape
>>> html_escape("<div>Hello & goodbye</div>")
'&lt;div&gt;Hello &amp; goodbye&lt;/div&gt;'

Functions¶

html_escape ¶

html_escape(text: str) -> str

Escape special HTML characters.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to escape.	required

Returns:

Type	Description
`str`	HTML-escaped text.

Examples:

>>> html_escape("5 < 10 & 10 > 5")
'5 &lt; 10 &amp; 10 &gt; 5'
>>> html_escape('"quoted"')
'&quot;quoted&quot;'

Notes

Escapes &, <, >, ", and '. Uses html.escape from the standard library.
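Because the note names html.escape, the standard-library call shows the exact replacements. In particular, the single quote becomes a hexadecimal reference rather than a named entity. This sketch demonstrates only the stdlib behavior, and the quote=False variant is shown for comparison. It is not a documented rite option.

```python
import html

print(html.escape("5 < 10 & 10 > 5"))  # 5 &lt; 10 &amp; 10 &gt; 5
print(html.escape('"quoted"'))         # &quot;quoted&quot;
print(html.escape("it's"))             # it&#x27;s

# quote=False leaves both quote characters alone: enough for element text,
# but not safe inside attribute values.
print(html.escape("it's \"fine\"", quote=False))  # it's "fine"
```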

html_strip_tags ¶

HTML Tag Stripper¶

Strip specific HTML tags.

Examples¶

>>> from rite.markup.html import html_strip_tags
>>> html_strip_tags("<div>Keep</div><script>alert()</script>", ["script"])
'<div>Keep</div>'

Functions¶

html_strip_tags ¶

html_strip_tags(html: str, tags: list[str]) -> str

Strip specific HTML tags and their content.

Parameters:

Name	Type	Description	Default
`html`	`str`	HTML string.	required
`tags`	`list[str]`	List of tag names to strip.	required

Returns:

Type	Description
`str`	HTML with specified tags removed.

Examples:

>>> html_strip_tags("<div>Keep</div><style>Remove</style>", ["style"])
'<div>Keep</div>'
>>> html_strip_tags(
...     "<p>Text</p><script>alert()</script>",
...     ["script", "style"]
... )
'<p>Text</p>'

Notes

Removes both opening and closing tags plus content. Case-insensitive tag matching.
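The note describes removing whole elements with case-insensitive matching. A minimal regex sketch of that behavior follows. The helper name strip_elements is hypothetical, and the sketch assumes a regex approach similar to html_clean. It does not handle nested elements with the same name, and it will not catch unclosed tags.

```python
import re


def strip_elements(html: str, tags: list[str]) -> str:
    """Hypothetical helper: remove <tag ...>...</tag> blocks, case-insensitively."""
    for tag in tags:
        name = re.escape(tag)
        pattern = re.compile(
            rf"<{name}\b[^>]*>.*?</{name}\s*>",  # opening tag, lazy content, closing tag
            re.IGNORECASE | re.DOTALL,           # match <SCRIPT> and multi-line bodies
        )
        html = pattern.sub("", html)
    return html


print(strip_elements("<div>Keep</div><style>Remove</style>", ["style"]))  # <div>Keep</div>
# \b stops "script" from matching a different tag such as <scripts>.
print(strip_elements("<scripts>x</scripts>", ["script"]))  # <scripts>x</scripts>
```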

html_unescape ¶

HTML Unescaper¶

Unescape HTML entities.

Examples¶

>>> from rite.markup.html import html_unescape
>>> html_unescape("&lt;div&gt;Hello&lt;/div&gt;")
'<div>Hello</div>'

Functions¶

html_unescape ¶

html_unescape(text: str) -> str

Unescape HTML entities.

Parameters:

Name	Type	Description	Default
`text`	`str`	HTML-escaped text.	required

Returns:

Type	Description
`str`	Unescaped text.

Examples:

>>> html_unescape("&lt;p&gt;Hello&lt;/p&gt;")
'<p>Hello</p>'
>>> html_unescape("&amp;")
'&'

Notes

Converts entities such as &lt; back to <. Uses html.unescape from the standard library.
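Since the note names html.unescape, this stdlib-only sketch shows two properties worth knowing. Escaping and then unescaping returns the original string. However, unescaping is not idempotent, so running it twice over already-decoded text can reintroduce markup. The sample strings are illustrative.

```python
import html

samples = ["<p>Hello</p>", "Tom & Jerry", 'say "hi"', "it's"]
for s in samples:
    # escape() followed by unescape() gives back the original string.
    assert html.unescape(html.escape(s)) == s

print(html.unescape("&lt;p&gt;Hello&lt;/p&gt;"))  # <p>Hello</p>

# One pass turns "&amp;lt;" into "&lt;"; a second pass turns that into "<".
print(html.unescape("&amp;lt;"))                 # &lt;
print(html.unescape(html.unescape("&amp;lt;")))  # <
```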

markdown ¶

Markdown Module¶

Markdown processing utilities.

This submodule provides utilities for converting and escaping Markdown content.

Examples¶

>>> from rite.markup.markdown import (
...     markdown_to_html,
...     markdown_escape
... )
>>> markdown_to_html("**bold**")
'<strong>bold</strong>'

Modules¶

markdown_escape ¶

Markdown Escape¶

Escape Markdown special characters.

Examples¶

>>> from rite.markup.markdown import markdown_escape
>>> markdown_escape("*not italic*")
'\\*not italic\\*'

Functions¶

markdown_escape ¶

markdown_escape(text: str) -> str

Escape Markdown special characters.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to escape.	required

Returns:

Type	Description
`str`	Escaped text.

Examples:

>>> markdown_escape("# Not a heading")
'\\# Not a heading'
>>> markdown_escape("[not](link)")
'\\[not\\]\\(link\\)'

Notes

Escapes *, _, #, [, ], (, ), `, and ~ to prevent Markdown interpretation.
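The escaping rule above amounts to prefixing each listed character with a backslash. This minimal sketch, with the hypothetical helper escape_markdown, mirrors only the documented character set. CommonMark allows any ASCII punctuation to be backslash-escaped, so a stricter escaper would also cover the backslash itself and line-start markers such as >, -, and +.

```python
# The characters listed in the note above.
MARKDOWN_SPECIALS = frozenset("*_#[]()`~")


def escape_markdown(text: str) -> str:
    """Hypothetical helper: prefix each Markdown special character with a backslash."""
    return "".join("\\" + ch if ch in MARKDOWN_SPECIALS else ch for ch in text)


print(escape_markdown("# Not a heading"))  # \# Not a heading
print(escape_markdown("[not](link)"))      # \[not\]\(link\)
```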

markdown_to_html ¶

Markdown to HTML¶

Convert Markdown to HTML (basic).

Examples¶

>>> from rite.markup.markdown import markdown_to_html
>>> markdown_to_html("**bold text**")
'<strong>bold text</strong>'

Functions¶

markdown_to_html ¶

markdown_to_html(markdown: str) -> str

Convert basic Markdown to HTML.

Parameters:

Name	Type	Description	Default
`markdown`	`str`	Markdown text.	required

Returns:

Type	Description
`str`	HTML string.

Examples:

>>> markdown_to_html("# Heading")
'<h1>Heading</h1>'
>>> markdown_to_html("**bold** and *italic*")
'<strong>bold</strong> and <em>italic</em>'

Notes

Basic conversion only: supports headings, bold, italic, and inline code. For full Markdown support, use an external library.
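The "basic conversion" described above can be sketched as a line-based converter that handles ATX headings and inline bold, italic, and code spans. This is a hypothetical illustration (basic_markdown_to_html and _inline are made-up names), not the rite implementation. Two design choices are assumptions: plain text is HTML-escaped, and bold and italic rules are not applied inside code spans. Paragraphs and lists are not handled.

```python
import html
import re

_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
_CODE = re.compile(r"`([^`]+)`")
_BOLD = re.compile(r"\*\*(.+?)\*\*")
_ITALIC = re.compile(r"\*(.+?)\*")


def _inline(text: str) -> str:
    # re.split with one capture group alternates: plain, code, plain, code, ...
    parts = _CODE.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Code span: escape it, but do not apply bold/italic inside it.
            out.append(f"<code>{html.escape(part, quote=False)}</code>")
        else:
            part = html.escape(part, quote=False)
            part = _BOLD.sub(r"<strong>\1</strong>", part)  # ** before * so bold wins
            part = _ITALIC.sub(r"<em>\1</em>", part)
            out.append(part)
    return "".join(out)


def basic_markdown_to_html(markdown: str) -> str:
    """Hypothetical line-based converter: ATX headings plus inline bold/italic/code."""
    lines = []
    for line in markdown.split("\n"):
        match = _HEADING.match(line)
        if match:
            level = len(match.group(1))
            lines.append(f"<h{level}>{_inline(match.group(2))}</h{level}>")
        else:
            lines.append(_inline(line))
    return "\n".join(lines)


print(basic_markdown_to_html("# Heading"))              # <h1>Heading</h1>
print(basic_markdown_to_html("**bold** and *italic*"))  # <strong>bold</strong> and <em>italic</em>
```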

sanitize ¶

Sanitize Module¶

Content sanitization utilities.

This submodule provides utilities for sanitizing URLs, filenames, and HTML content for security.

Examples¶

>>> from rite.markup.sanitize import (
...     sanitize_url,
...     sanitize_filename,
...     sanitize_html
... )
>>> sanitize_url("javascript:alert(1)")
''

Modules¶

sanitize_filename ¶

Filename Sanitizer¶

Sanitize filenames for safe filesystem use.

Examples¶

>>> from rite.markup.sanitize import sanitize_filename
>>> sanitize_filename("file:name?.txt")
'filename.txt'

Functions¶

sanitize_filename ¶

sanitize_filename(filename: str, replacement: str = '') -> str

Sanitize filename by removing unsafe characters.

Parameters:

Name	Type	Description	Default
`filename`	`str`	Filename to sanitize.	required
`replacement`	`str`	Character to replace unsafe chars with.	`''`

Returns:

Type	Description
`str`	Safe filename.

Examples:

>>> sanitize_filename("my/file:name.txt")
'myfilename.txt'
>>> sanitize_filename("file<>name.txt", "_")
'file__name.txt'

Notes

Removes / : * ? " < > | (or substitutes replacement for each one). Preserves the file extension.
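The rule above is a single character-class substitution. This sketch uses a hypothetical helper, safe_filename, and mirrors only the characters listed in the note. It passes the replacement through a function so that backslashes in replacement are not treated as regex escapes. Windows also reserves the backslash, control characters, and device names such as CON and NUL, none of which this sketch covers.

```python
import re

# The characters listed in the note above.
_UNSAFE = re.compile(r'[/:*?"<>|]')


def safe_filename(filename: str, replacement: str = "") -> str:
    """Hypothetical helper: replace each unsafe character with `replacement`."""
    # A callable replacement is inserted literally (no backreference parsing).
    return _UNSAFE.sub(lambda _match: replacement, filename)


print(safe_filename("my/file:name.txt"))     # myfilename.txt
print(safe_filename("file<>name.txt", "_"))  # file__name.txt
```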

sanitize_html ¶

HTML Sanitizer¶

Sanitize HTML by removing dangerous elements.

Examples¶

>>> from rite.markup.sanitize import sanitize_html
>>> sanitize_html("<p>Safe</p><script>Bad</script>")
'<p>Safe</p>'

Functions¶

sanitize_html ¶

sanitize_html(html: str, allowed_tags: list[str] | None = None) -> str

Sanitize HTML by removing dangerous tags.

Parameters:

Name	Type	Description	Default
`html`	`str`	HTML to sanitize.	required
`allowed_tags`	`list[str] \| None`	List of allowed tags (default: p, br, strong, em).	`None`

Returns:

Type	Description
`str`	Sanitized HTML.

Examples:

>>> sanitize_html("<p>Safe</p><script>Bad</script>")
'<p>Safe</p>'
>>> sanitize_html("<div>Text</div>", ["div"])
'<div>Text</div>'

Notes

Removes script, iframe, object, embed by default. Only allows whitelisted tags.
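The note describes an allowlist. The sketch below shows that idea with the standard-library html.parser. It is not the rite implementation, and all names (allowlist_sanitize, DEFAULT_ALLOWED, DROP_WITH_CONTENT) are hypothetical. The following behaviors are assumptions: style content is dropped along with script, iframe, and object content; all attributes are discarded; and an embed tag vanishes simply because it is not on the allowlist. For untrusted input, a maintained sanitizer library remains the safer choice.

```python
from __future__ import annotations

from html import escape
from html.parser import HTMLParser

DEFAULT_ALLOWED = {"p", "br", "strong", "em"}
# Elements whose content is dropped along with the tags themselves.
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}


class _AllowlistSanitizer(HTMLParser):
    def __init__(self, allowed: set[str]) -> None:
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        # Checked first, so an allowlist cannot re-enable these elements.
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes (onclick, style, ...) are discarded

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>; never opens a dropped region.
        if self.skip_depth == 0 and tag in self.allowed:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))  # re-escape decoded text


def allowlist_sanitize(html: str, allowed_tags: list[str] | None = None) -> str:
    allowed = DEFAULT_ALLOWED if allowed_tags is None else {t.lower() for t in allowed_tags}
    parser = _AllowlistSanitizer(allowed)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(allowlist_sanitize("<p>Safe</p><script>Bad</script>"))  # <p>Safe</p>
print(allowlist_sanitize('<p onclick="steal()">Hi</p>'))      # <p>Hi</p>
```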

sanitize_url ¶

URL Sanitizer¶

Sanitize and validate URLs.

Examples¶

>>> from rite.markup.sanitize import sanitize_url
>>> sanitize_url("javascript:alert('xss')")
''

Functions¶

sanitize_url ¶

sanitize_url(url: str, allowed_schemes: list[str] | None = None) -> str

Sanitize URL by checking scheme.

Parameters:

Name	Type	Description	Default
`url`	`str`	URL to sanitize.	required
`allowed_schemes`	`list[str] \| None`	Allowed URL schemes (default: http, https).	`None`

Returns:

Type	Description
`str`	Sanitized URL or empty string if invalid.

Examples:

>>> sanitize_url("https://example.com")
'https://example.com'
>>> sanitize_url("javascript:void(0)")
''
>>> sanitize_url("ftp://server.com", ["ftp"])
'ftp://server.com'

Notes

Blocks dangerous schemes like javascript:. Returns empty string for invalid URLs.
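A scheme allowlist is straightforward with urllib.parse.urlsplit. Browsers also ignore tabs, newlines, and leading spaces in a URL, so "java\tscript:" must be caught as well. This sketch uses a hypothetical helper, safe_url. Two behaviors are assumptions rather than documented rite behavior: scheme-less relative URLs are rejected, and an unparseable URL such as one with an unbalanced IPv6 bracket yields "".

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

_IGNORED = re.compile(r"[\x00-\x20\x7f]")  # control characters and spaces


def safe_url(url: str, allowed_schemes: list[str] | None = None) -> str:
    """Hypothetical helper: return the URL if its scheme is allowed, else ""."""
    if allowed_schemes is None:
        allowed = {"http", "https"}
    else:
        allowed = {s.lower() for s in allowed_schemes}
    candidate = url.strip()
    # Check the scheme on a copy with ignorable characters removed.
    probe = _IGNORED.sub("", candidate)
    try:
        scheme = urlsplit(probe).scheme.lower()
    except ValueError:  # e.g. "http://[::1" (unbalanced IPv6 bracket)
        return ""
    return candidate if scheme in allowed else ""


print(safe_url("https://example.com"))           # https://example.com
print(repr(safe_url("java\tscript:alert(1)")))   # ''
print(safe_url("ftp://server.com", ["ftp"]))     # ftp://server.com
```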

xml ¶

XML Module¶

XML processing utilities.

This submodule provides utilities for escaping, unescaping, and formatting XML content.

Examples¶

>>> from rite.markup.xml import (
...     xml_escape,
...     xml_unescape
... )
>>> xml_escape("<tag>value</tag>")
'&lt;tag&gt;value&lt;/tag&gt;'

Modules¶

xml_escape ¶

XML Escaper¶

Escape special XML characters.

Examples¶

>>> from rite.markup.xml import xml_escape
>>> xml_escape("<tag>value & more</tag>")
'&lt;tag&gt;value &amp; more&lt;/tag&gt;'

Functions¶

xml_escape ¶

xml_escape(text: str) -> str

Escape special XML characters.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to escape.	required

Returns:

Type	Description
`str`	XML-escaped text.

Examples:

>>> xml_escape("5 < 10 & 10 > 5")
'5 &lt; 10 &amp; 10 &gt; 5'
>>> xml_escape('"quoted"')
'&quot;quoted&quot;'

Notes

Escapes &, <, >, ", and '. Uses xml.sax.saxutils.escape.
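By default, xml.sax.saxutils.escape only handles &, <, and >. Quote characters are escaped only when an extra-entities mapping is passed. This stdlib-only sketch shows the difference. That xml_escape supplies such a mapping is an inference from its documented output, not something confirmed here. The QUOTES dict is an illustrative name.

```python
from xml.sax.saxutils import escape, quoteattr

print(escape("5 < 10 & 10 > 5"))   # 5 &lt; 10 &amp; 10 &gt; 5
print(escape('"quoted"'))          # "quoted"  (quotes untouched by default)

# An extra-entities mapping also covers both quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}
print(escape('"quoted"', QUOTES))  # &quot;quoted&quot;
print(escape("it's", QUOTES))      # it&apos;s

# quoteattr() escapes a value and wraps it in quotes for use as an attribute.
print(quoteattr("a < b"))          # "a &lt; b"
```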

xml_format ¶

XML Formatter¶

Format XML with proper indentation.

Examples¶

>>> from rite.markup.xml import xml_format
>>> xml_format("<root><child>text</child></root>")  # doctest: +SKIP
'<root>\n  <child>text</child>\n</root>'

Functions¶

xml_format ¶

xml_format(xml_string: str, indent: str = '  ') -> str

Format XML string with indentation.

Parameters:

Name	Type	Description	Default
`xml_string`	`str`	Unformatted XML string.	required
`indent`	`str`	Indentation string (default: 2 spaces).	`'  '`

Returns:

Type	Description
`str`	Formatted XML string.

Examples:

>>> xml = "<root><child>text</child></root>"
>>> formatted = xml_format(xml)
>>> print(formatted)
<root>
  <child>text</child>
</root>

Notes

Uses xml.dom.minidom for formatting. May fail on malformed XML.
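Here is a minidom-based formatter that produces output matching the example above. It is a sketch with a hypothetical name, pretty_xml, not the rite source. Calling toprettyxml on the root element rather than on the document omits the <?xml ...?> declaration. Dropping blank lines is an assumed design choice: it tidies input that was already indented, but it would also discard whitespace-only text content. As the note warns, malformed input raises ExpatError.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def pretty_xml(xml_string: str, indent: str = "  ") -> str:
    """Hypothetical formatter built on xml.dom.minidom."""
    root = minidom.parseString(xml_string).documentElement
    # Element.toprettyxml omits the <?xml ...?> declaration that a Document adds.
    pretty = root.toprettyxml(indent=indent)
    # Whitespace-only text nodes in pre-indented input become blank lines; drop them.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


print(pretty_xml("<root><child>text</child></root>"))

try:
    pretty_xml("<root>")  # unclosed element
except ExpatError as exc:
    print(f"malformed XML: {exc}")
```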

xml_unescape ¶

XML Unescaper¶

Unescape XML entities.

Examples¶

>>> from rite.markup.xml import xml_unescape
>>> xml_unescape("&lt;tag&gt;value&lt;/tag&gt;")
'<tag>value</tag>'

Functions¶

xml_unescape ¶

xml_unescape(text: str) -> str

Unescape XML entities.

Parameters:

Name	Type	Description	Default
`text`	`str`	XML-escaped text.	required

Returns:

Type	Description
`str`	Unescaped text.

Examples:

>>> xml_unescape("&lt;root&gt;&lt;/root&gt;")
'<root></root>'
>>> xml_unescape("&amp;&apos;&quot;")
'&\'"'

Notes

Converts entities such as &lt; back to <. Uses xml.sax.saxutils.unescape.
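xml.sax.saxutils.unescape resolves only &lt;, &gt;, and &amp; unless an extra mapping is supplied, and it replaces &amp; last. This stdlib-only sketch shows both points. That xml_unescape passes a mapping for &apos; and &quot; is inferred from its documented example, not confirmed here.

```python
from xml.sax.saxutils import unescape

print(unescape("&lt;root&gt;&lt;/root&gt;"))  # <root></root>

# Without an extra mapping, &apos; and &quot; are left as-is.
print(unescape("&amp;&apos;&quot;"))           # &&apos;&quot;
QUOTES = {"&apos;": "'", "&quot;": '"'}
print(unescape("&amp;&apos;&quot;", QUOTES))   # &'"

# &amp; is replaced last, so "&amp;lt;" becomes "&lt;", not "<".
print(unescape("&amp;lt;"))                    # &lt;
```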

Submodules¶

HTML¶

HTML manipulation and sanitization.

XML¶

XML parsing and formatting.

Markdown¶

Markdown processing.

Entities¶

HTML entity encoding/decoding.

Examples¶

from rite.markup import (
    html_escape,
    xml_format,
    markdown_to_html
)

# Escape HTML
safe = html_escape("<script>alert('xss')</script>")

# Format XML
formatted = xml_format("<root><child>text</child></root>")

# Convert Markdown
html = markdown_to_html("# Heading\n\nParagraph")