The Case for Converting HTML to Markdown
The web is written in HTML. But HTML is not pleasant to read or edit in its raw form — it’s a delivery format, not a writing format. Markdown is the inverse: designed specifically for writing and reading as plain text, while still producing structured HTML when rendered.
Converting HTML to Markdown makes content portable, editable, and compatible with the growing ecosystem of Markdown-native tools — note-taking apps, documentation platforms, AI systems, and version control workflows.
When You Need This Tool
Saving web articles for offline or personal use. Paste the article HTML and get a clean .md file you can save to Obsidian, Notion, or any Markdown-aware notes app. The Markdown version is smaller, more portable, and readable even without rendering.
CMS migration. Moving content from an HTML-first CMS (like WordPress or Drupal) to a Markdown-based system (like Docusaurus, Gatsby, or a custom static site)? Export the HTML content from your old system, convert to Markdown, and import into the new one.
Preparing content for AI prompts. Large language models work with plain text. Converting a structured HTML page to Markdown preserves the document hierarchy (headings, lists, emphasis) while dropping the tags and attributes that add length without adding meaning, so the result is usually easier for AI tools to work with than raw HTML.
Cleaning up rich text. Open the content from Google Docs, Confluence, or Notion in a browser and copy the HTML of the section you need. The HTML will be messy, but the converter strips away the presentation layer and leaves you with structured Markdown.
Content editing. Some web pages are easier to edit as Markdown than as HTML — particularly long-form content with many headings and lists.
What the Converter Handles
The converter walks the HTML DOM tree and translates recognized elements to their Markdown equivalents:
| HTML | Markdown |
|---|---|
| <h1> through <h6> | # through ###### |
| <strong>, <b> | **bold** |
| <em>, <i> | *italic* |
| <del>, <s> | ~~strikethrough~~ |
| <code> | `inline code` |
| <pre><code> | Fenced code block |
| <a href="..."> | [text](url) |
| <img alt="" src=""> | ![alt](src) |
| <blockquote> | > blockquote |
| <ul> / <li> | - list item |
| <ol> / <li> | 1. ordered item |
| <table> | Markdown table |
| <hr> | --- |
| <br> | Line break |
Elements with no Markdown equivalent — <div>, <span>, <nav>, <header>, <footer>, <script>, <style> — are either traversed for their text content or skipped.
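As a rough illustration of the tree-walking approach described above, here is a minimal Python sketch built on the standard library's html.parser. It is not this tool's actual implementation: the names MiniMarkdown and html_to_markdown are hypothetical, only a handful of elements are covered, and the choice to drop <script> and <style> content while passing through other unrecognized tags is an assumption.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Hypothetical, minimal HTML-to-Markdown translator (illustration only)."""

    # Inline elements map to a marker emitted on both the open and close tag.
    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*",
              "del": "~~", "s": "~~", "code": "`"}
    HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}
    SKIPPED = {"script", "style"}  # assumption: dropped along with their text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments, joined at the end
        self.skipping = 0    # > 0 while inside <script> or <style>
        self.hrefs = []      # stack of pending link targets

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skipping += 1
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag] + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "li":
            # Simplification: every list item becomes a bullet, even inside <ol>.
            self.out.append("\n- ")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")
        # Any other tag (div, span, nav, ...) adds nothing of its own;
        # class and style attributes are never read.

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skipping = max(0, self.skipping - 1)
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        if not self.skipping:
            # Collapse runs of whitespace, as a browser would when rendering.
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    return "\n".join(lines).strip()
```

Ordered lists, images, tables, blockquotes, and code blocks are left out to keep the sketch short; a real converter needs extra state for each of them.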
Tips for Better Results
Use browser developer tools to select only the article content. Right-click the main article body, choose “Inspect”, and copy just the <article> or <main> element’s outer HTML. This avoids converting navigation, sidebars, footers, and ads.
Paste partial HTML, not full page source. Full page source includes <head>, scripts, and layout structure, none of which converts meaningfully. The converter will process it, but the output will contain a lot of noise from non-content elements.
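If you already have the full page source saved and would rather not go back to the browser, the same selection can be approximated in code. The Python sketch below, using only the standard library, returns the first <article> or <main> element it finds. The names ContentExtractor and extract_content are hypothetical, and the sketch assumes the page's tags are reasonably well formed; it rebuilds the element from parser events, so comments inside it are dropped.

```python
from html.parser import HTMLParser
from typing import Optional


class ContentExtractor(HTMLParser):
    """Hypothetical helper: collects the first <article> or <main> element."""

    def __init__(self, targets=("article", "main")):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.targets = targets
        self.inside = None   # name of the target tag we are collecting
        self.depth = 0       # nesting depth of that same tag name
        self.done = False    # True once the target element has closed
        self.parts = []

    def _active(self):
        return self.inside is not None and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.inside is None and tag in self.targets:
            self.inside = tag
        if self.inside is not None:
            if tag == self.inside:
                self.depth += 1
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, with no end tag.
        if self._active():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._active():
            return
        self.parts.append(f"</{tag}>")
        if tag == self.inside:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self._active():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._active():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._active():
            self.parts.append(f"&#{name};")


def extract_content(page_source: str) -> Optional[str]:
    """Return the element's HTML, or None if no complete target was found."""
    parser = ContentExtractor()
    parser.feed(page_source)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

The returned string can then be pasted into the converter in place of the full page source.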
Post-process the output. The Markdown output may benefit from a quick cleanup: removing empty lines, tidying table alignment, or re-adding context that was implied by the page’s layout but not present in the content structure.
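Part of that cleanup is mechanical and can be scripted. The Python sketch below covers only the blank-line step: it trims trailing whitespace and collapses runs of empty lines into one. The function name tidy_markdown is hypothetical. It assumes you do not rely on two trailing spaces for hard line breaks, and it makes no attempt to protect blank lines inside fenced code blocks.

```python
import re


def tidy_markdown(md: str) -> str:
    """Trim trailing whitespace and collapse repeated blank lines."""
    # Remove trailing spaces and tabs from every line.
    lines = [line.rstrip() for line in md.splitlines()]
    text = "\n".join(lines)
    # Three or more consecutive newlines mean two or more blank lines;
    # reduce them to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # End the document with exactly one newline.
    return text.strip() + "\n"
```

Table alignment and missing context still need a human eye; this only handles the whitespace.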
Frequently Asked Questions
Why would I convert HTML to Markdown?
Common reasons: saving web articles to a personal knowledge base, migrating content from a CMS to a Markdown-based system, cleaning up HTML for use in AI prompts, or creating an editable version of published web content you own.
How do I get the HTML source of a web page?
In Chrome, Firefox, or Edge: right-click the page and choose “View Page Source”, or press Ctrl+U on Windows and Linux. On a Mac the shortcut is Cmd+Option+U in Chrome and Edge, and Cmd+U in Firefox. You can also right-click a specific element and choose “Inspect”, then copy the outer HTML from the Elements panel.
Does the converter handle complex HTML?
It handles the common elements found in articles and documents: headings, paragraphs, bold, italic, links, images, lists, tables, code blocks, and blockquotes. Complex layouts with nested divs, grid systems, navigation menus, and scripts are simplified or omitted — only content elements are converted.
What happens to inline styles and CSS classes?
They are ignored. The converter extracts content structure from the HTML element hierarchy and translates it to Markdown — CSS classes and inline style attributes have no Markdown equivalent and are discarded.
Can I convert HTML emails to Markdown?
Yes, with caveats. HTML emails often use table-based layouts for compatibility, which will produce nested table Markdown that may look unusual. Extracting only the main text content from an HTML email usually works better than converting the full source.