MkDocs Themes HTML Structure Analysis¶
Objective: Analyze MkDocs theme HTML structure to improve librovore content extraction and enable custom markdownify extensions for code blocks.
Method: Direct HTML download with curl + BeautifulSoup analysis to extract precise CSS selectors and structural patterns.
Status: ✅ ANALYSIS COMPLETE | All 3 MkDocs themes analyzed with comprehensive patterns discovered!
Analyzed Themes¶
Successfully analyzed with 10 representative documentation sites:
- Material for MkDocs - Most popular theme (5 sites analyzed)
Material docs, code blocks page, reference docs
Pydantic API docs, HTTPX API docs (with mkdocstrings)
- ReadTheDocs - Built-in default theme (3 sites analyzed)
MkDocs main site, configuration docs, writing guide
- MkDocs (default) - Built-in Bootstrap-based theme (2 sites analyzed)
FastAPI docs, MkDocs getting started
Analysis Scripts¶
Location: .auxiliary/scripts/mkdocs-analysis/
- Core Analysis Script:
analyze_mkdocs_html.py Main analysis script using BeautifulSoup. Analyzes code blocks, API documentation, and section structure. Outputs
analysis_results.jsonwith detailed findings.- Helper Scripts
extract_code_patterns.py- Displays code block patterns from analysis resultssection_analysis.py- Shows section structure and navigation patternscomprehensive_summary.py- ✅ COMPLETE ANALYSIS of all 3 themesdownload_mkdocs_themes.sh- Downloads all theme samples systematically
- Usage
# Download samples .auxiliary/scripts/mkdocs-analysis/download_mkdocs_themes.sh # Run analysis hatch --env develop run python .auxiliary/scripts/mkdocs-analysis/analyze_mkdocs_html.py # View specific patterns hatch --env develop run python .auxiliary/scripts/mkdocs-analysis/extract_code_patterns.py # Full comprehensive summary hatch --env develop run python .auxiliary/scripts/mkdocs-analysis/comprehensive_summary.py
Key Findings¶
1. Code Block Language Detection ✅ SOLVED¶
Key Discovery: MkDocs uses language-{lang} CSS classes directly on code elements!
HTML Pattern (confirmed across all themes)¶
<div class="highlight language-python">
<pre><!-- code content --></pre>
</div>
Language Identification¶
Pattern: element.class includes 'language-{lang}'
- Languages found:
language-python- Python code blockslanguage-yaml- YAML configurationlanguage-json- JSON exampleslanguage-bash- Shell commandslanguage-html- HTML markuplanguage-css- CSS stylinglanguage-javascript- JavaScript code
Universal container class: highlight (Same as Sphinx!)
2. API Documentation Structure ✅ SOLVED - SIGNATURE PATTERNS DISCOVERED!¶
🔥 CRITICAL DISCOVERY: mkdocstrings provides precise signature identification patterns similar to Sphinx’s dt.sig.sig-object.py!
mkdocstrings Signature Pattern (from HTTPX API docs)¶
<div class="autodoc">
<div class="autodoc-signature">
<code>httpx.<strong>request</strong></code>
<span class="autodoc-punctuation">(</span>
<em class="autodoc-param">method</em>
<span class="autodoc-punctuation">, </span>
<em class="autodoc-param">url</em>
<span class="autodoc-punctuation">, ...</span>
</div>
<div class="autodoc-docstring">
<p>Sends an HTTP request.</p>
<!-- Full documentation -->
</div>
</div>
Complete mkdocstrings Structure¶
- Primary containers:
Signature container:
div.autodocSignature element:
div.autodoc-signatureDocstring element:
div.autodoc-docstring
- Signature components:
Function name:
code > strongParameters:
em.autodoc-paramPunctuation:
span.autodoc-punctuation
- Object type indicators:
Class indicator:
em:contains("class")- Classes prefixed with “class”Function signature:
code- Function signatures in code tags
🎯 SPHINX INVENTORY INTEGRATION¶
When mapping Sphinx objects.inv entries to MkDocs URLs, you can now precisely locate function and class signatures using div.autodoc-signature patterns - just as reliable as Sphinx’s dt.sig.sig-object.py!
3. Section Structure for Query Results ✅ CONFIRMED¶
Actual Patterns (confirmed through analysis)¶
mkdocs_content_patterns = {
'material': [
'main.md-main', # Primary container
'article.md-content__inner', # Main content article
'div.md-content' # Content wrapper
],
'readthedocs': [
'div.col-md-9[role="main"]', # Bootstrap main column
'div.container' # Bootstrap container
],
'mkdocs_default': [
'div.col-md-9[role="main"]', # Bootstrap main column (same as RTD)
'div.container' # Bootstrap container
]
}
Comprehensive Analysis Results - All 3 Themes¶
✅ CLEAR PATTERNS DISCOVERED!
Themes Analyzed: Material for MkDocs, ReadTheDocs, MkDocs Default (10 documentation sites total)
Code Block Patterns - CONSISTENT CONTAINER, VARIABLE LANGUAGE DETECTION¶
Universal Container Class: .highlight (identical to Sphinx!)
Language Detection Patterns¶
mkdocs_code_patterns = {
# Material Theme: Most comprehensive language support
'material': [
'language-python', 'language-yaml', 'language-json',
'language-html', 'language-css', 'language-javascript'
],
# ReadTheDocs Theme: Standard language support
'readthedocs': [
'language-python', 'language-yaml', 'language-bash'
],
# Default Theme: Basic language support
'mkdocs_default': [
'language-bash', 'language-yaml'
],
# Universal pattern: Check element classes for 'language-{lang}'
'detection_method': 'element.class includes language-{lang}'
}
Main Content Containers - THEME-SPECIFIC BUT PREDICTABLE¶
Content Extraction Patterns¶
mkdocs_content_extraction = {
'material': {
'primary': 'main.md-main',
'content_article': 'article.md-content__inner',
'content_wrapper': 'div.md-content'
},
'readthedocs': {
'primary': 'div.col-md-9[role="main"]',
'container': 'div.container'
},
'mkdocs_default': {
'primary': 'div.col-md-9[role="main"]', # Same as ReadTheDocs
'container': 'div.container'
}
}
mkdocstrings API Documentation - CUSTOM PATTERN¶
API Documentation Structure (different from Sphinx dt/dd)¶
mkdocstrings_patterns = {
'container_classes': ['autodoc', 'autodoc-docstring', 'autodoc-members'],
'code_examples': '.highlight', # Uses same code container
'content_wrapper': '.md-content__inner.md-typeset'
}
Key Difference: mkdocstrings uses custom div containers, not dt/dd like Sphinx.
Universal MkDocs Patterns Summary¶
Code Blocks
Selector:
.highlightLanguage detection:
element_class_prefix:language-Supported languages:
['python', 'yaml', 'json', 'bash', 'html', 'css', 'javascript']Fallback detection: Check parent classes if no explicit language class
API Documentation (mkdocstrings)
Signature container:
div.autodocSignature element:
div.autodoc-signatureDocstring element:
div.autodoc-docstringFunction name selector:
code > strongParameters selector:
em.autodoc-paramPattern type: div_containers (Not dt/dd like Sphinx)
Sphinx inventory compatible: True (Can pinpoint specific functions/classes!)
Content Containers (Theme-specific)
content_containers = {
'material': ['main.md-main', 'article.md-content__inner', 'div.md-content'],
'readthedocs': ['div.col-md-9[role="main"]', 'div.container'],
'mkdocs_default': ['div.col-md-9[role="main"]', 'div.container'],
'generic_fallback': [
'main',
'article',
'[role="main"]',
'.md-content',
'.container'
]
}
Navigation Cleanup (Theme-specific)
navigation_cleanup = {
'material': ['nav.md-nav', 'div.md-sidebar', 'nav.md-header__inner'],
'readthedocs': ['div.navbar', 'ul.nav.navbar-nav'],
'mkdocs_default': ['div.navbar', 'ul.nav.navbar-nav'],
'generic': ['nav', '.navbar', '.navigation', '.sidebar']
}
MkDocs vs Sphinx Comparison¶
📊 Key Similarities
Code Container: Both use
.highlightclass ✅Theme-Specific Content: Both require theme-aware extraction ✅
Navigation Cleanup: Both need theme-specific navigation removal ✅
📊 Key Differences
- Language Detection:
MkDocs:
language-{lang}directly on code elementSphinx:
highlight-{lang}on parent container
- API Documentation Signatures:
MkDocs/mkdocstrings:
div.autodoc > div.autodoc-signature✅ Precise function/class targetingSphinx:
dt.sig.sig-object.py✅ Precise function/class targetingVERDICT: ✅ Both provide excellent signature identification for Sphinx inventory mapping!
- Theme Consistency:
MkDocs: More variation between themes (Material vs Bootstrap)
Sphinx: More standardized patterns across themes
Final Summary¶
✅ MISSION ACCOMPLISHED!
🔥 Clear Patterns Discovered
Code Block Language Detection:
element.class includes 'language-{lang}'- consistent across themesContent Container Selection: Clear theme-specific patterns with predictable fallbacks
Navigation Cleanup: Theme-aware patterns identified and documented
🎯 mkdocstrings Signature Targeting:
div.autodoc-signatureprovides precise function/class identification for Sphinx inventory mapping!
Analysis Completeness
Themes Analyzed: 3/3 (100%)
Documentation Sites: 10 representative sites
Code Block Consistency: High (universal
.highlightcontainer)Language Detection: Theme-dependent but predictable
Pattern Reliability: High
Implementation Readiness: Complete
Session Handoff Information
- Context:
COMPLETE analysis of all 3 major MkDocs themes
- Status:
✅ ANALYSIS COMPLETE - All patterns discovered and documented
- Scripts:
Comprehensive analysis toolchain in
.auxiliary/scripts/mkdocs-analysis/- Key Achievement:
Discovered clear patterns despite theme variation
- Next Phase:
Implementation in librovore structure extractors alongside Sphinx patterns