Sphinx Themes HTML Structure Analysis
Objective: Analyze the HTML structure of Sphinx themes to improve librovore content extraction and enable custom markdownify extensions for code blocks.
Method: Download sample pages directly with curl, then analyze them with BeautifulSoup to extract precise CSS selectors and structural patterns.
Status: ✅ All 8 themes analyzed. Code block and API documentation markup is consistent across all of them; section containers are theme-specific.
Analysis Scripts
Location: .auxiliary/scripts/mkdocs-analysis/
- Core Analysis Script:
  analyze_sphinx_html.py - Main analysis script using BeautifulSoup. Analyzes code blocks, API documentation, and section structure. Outputs analysis_results.json with detailed findings.
- Helper Scripts:
  - extract_code_patterns.py - Displays code block patterns from analysis results
  - section_analysis.py - Shows section structure and navigation patterns
  - comprehensive_summary.py - ✅ Complete analysis of all 8 themes
  - download_remaining_themes.sh - Downloads all theme samples systematically
- Usage:
  # Download samples
  curl -s "https://sphinx-themes.org/sample-sites/{theme}/kitchen-sink/blocks/" \
    -o .auxiliary/scribbles/sphinx-samples/{theme}-blocks.html
  # Run analysis
  hatch --env develop run python .auxiliary/scribbles/analyze_sphinx_html.py
  # View specific patterns
  hatch --env develop run python .auxiliary/scribbles/extract_code_patterns.py
Key Findings
1. Code Block Language Detection ✅ SOLVED
Critical Discovery: Sphinx records the language of a code block as a CSS class on the block's outer container div, not on the pre element that holds the code.
HTML Pattern
<div class="highlight-python notranslate">
<div class="highlight">
<pre><!-- actual code content --></pre>
</div>
</div>
Language Identification
Pattern: parent.class.startswith('highlight-'), that is, the outer container div carries a class beginning with 'highlight-', and the rest of that class name is the language.
- Languages found:
  - highlight-python - Python code blocks
  - highlight-json - JSON code blocks
  - highlight-text - Plain text blocks
  - highlight-default - Default/unknown language
- Additional classes:
  - doctest - Python doctest blocks
  - notranslate - Prevents translation
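The prefix rule above can be sketched as a small pure function. This is a minimal illustration, not librovore's implementation: the name `detect_language` is hypothetical, and mapping `highlight-default` to `None` is an assumption based on the "Default/unknown language" note. With BeautifulSoup, the class list would typically come from something like `tag.get("class", [])` on the outer div.

```python
from typing import Iterable, Optional

HIGHLIGHT_PREFIX = "highlight-"


def detect_language(container_classes: Iterable[str]) -> Optional[str]:
    """Return the language named by a 'highlight-<lang>' class, if any.

    Hypothetical helper: 'container_classes' is the class list of the
    outer div that wraps <div class="highlight">.
    """
    for css_class in container_classes:
        if css_class.startswith(HIGHLIGHT_PREFIX):
            language = css_class[len(HIGHLIGHT_PREFIX):]
            # Assumption: 'default' means no specific language was recorded.
            if language and language != "default":
                return language
            return None
    return None


print(detect_language(["highlight-python", "notranslate"]))
print(detect_language(["highlight-default", "notranslate"]))
print(detect_language(["highlight"]))
```

Note that the inner `highlight` class never matches, because it lacks the trailing hyphen and language suffix.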
2. API Documentation Structure ✅ CONSISTENT
Pattern (Identical across Furo/RTD themes)
- API documentation structure:
  - Definition list: dl
  - Signature element: dt.sig.sig-object.py
  - Description element: dd
  - Signature classes: ['sig', 'sig-object', 'py']
  - Anchor ID pattern: module.function_name
HTML Structure
<dl>
<dt class="sig sig-object py" id="my_module.my_function">
async my_module.my_function(parameter: ParameterT = default_value) → ReturnT¶
</dt>
<dd>
The py:function directive.
</dd>
</dl>
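As a rough illustration of how this structure could be consumed, the sketch below collects signature anchors using only the standard library's `html.parser`. The analysis itself used BeautifulSoup; `SignatureCollector` and `split_anchor` are hypothetical names introduced here, and the sample markup is a trimmed copy of the structure shown above.

```python
from html.parser import HTMLParser

SIGNATURE_CLASSES = {"sig", "sig-object", "py"}


class SignatureCollector(HTMLParser):
    """Collect (anchor_id, signature_text) pairs from dt.sig.sig-object.py."""

    def __init__(self):
        super().__init__()
        self.signatures = []        # finished (anchor_id, text) pairs
        self._anchor_id = None      # id of the <dt> currently being read
        self._in_signature = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "dt":
            return
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        if SIGNATURE_CLASSES <= classes:  # all three classes present
            self._in_signature = True
            self._anchor_id = attributes.get("id")
            self._parts = []

    def handle_data(self, data):
        if self._in_signature:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "dt" and self._in_signature:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._parts).split())
            self.signatures.append((self._anchor_id, text))
            self._in_signature = False


def split_anchor(anchor_id):
    """Split 'module.function_name' into (module, function_name)."""
    module, _, name = anchor_id.rpartition(".")
    return module, name


SAMPLE = """
<dl>
<dt class="sig sig-object py" id="my_module.my_function">
async my_module.my_function(parameter: ParameterT = default_value)
</dt>
<dd>The py:function directive.</dd>
</dl>
"""

collector = SignatureCollector()
collector.feed(SAMPLE)
for anchor_id, text in collector.signatures:
    print(split_anchor(anchor_id), text)
```

Splitting on the last dot is a design choice that follows the `module.function_name` anchor pattern; anchors with deeper nesting would place everything before the final dot in the first element.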
3. Section Structure for Query Results ✅ THEME-SPECIFIC
Furo Theme Patterns
- Furo section structure:
  - Main content: article[role="main"]
  - Content wrapper: div.content
  - Sections: section
- Extraction selectors (in priority order):
  - article[role="main"] section (Primary)
  - div.content section (Fallback)
  - section (Generic)
RTD Theme Patterns
- RTD section structure:
  - Main wrapper: section.wy-nav-content-wrap
  - Sections: section
  - Navigation sidebar: nav.wy-nav-side
  - Navigation top: nav.wy-nav-top
- Extraction selectors (in priority order):
  - section.wy-nav-content-wrap section (Primary)
  - section (Fallback)
Complete Analysis Results - All 8 Themes
✅ Code block and API documentation markup is consistent across all 8 themes.
Themes Analyzed: Furo, RTD, PyData, Python Documentation, Alabaster, agogo, classic, nature
Code Block Patterns - 100% CONSISTENT
Universal Classes Found Across ALL Themes
code_block_classes = [
'highlight', # The actual code content container
'highlight-default', # Default/unknown language
'highlight-python', # Python syntax highlighting
'highlight-json', # JSON syntax highlighting
'highlight-text', # Plain text blocks
'doctest', # Python doctest blocks
'notranslate' # Prevents translation
]
HTML Pattern (IDENTICAL across all 8 themes)
<div class="highlight-python notranslate">
<div class="highlight">
<pre><!-- code content --></pre>
</div>
</div>
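Because this wrapper is the same in every theme, a single extractor could in principle turn any Sphinx code block into fenced Markdown. The following is a minimal sketch of that idea using the standard library's `html.parser` rather than BeautifulSoup or markdownify; `CodeBlockExtractor` and `to_fenced_markdown` are hypothetical names, and treating `default` as "no language" is an assumption.

```python
from html.parser import HTMLParser

FENCE = "`" * 3  # Markdown code fence, built here to keep the example embeddable
PREFIX = "highlight-"


class CodeBlockExtractor(HTMLParser):
    """Collect (language, code) pairs from 'highlight-<lang>' wrapper divs."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._language = None   # language of the most recent wrapper div
        self._in_pre = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            for css_class in classes:
                if css_class.startswith(PREFIX):
                    self._language = css_class[len(PREFIX):]
        elif tag == "pre":
            self._in_pre = True
            self._parts = []

    def handle_data(self, data):
        # Pygments <span> tags are skipped; only their text is kept.
        if self._in_pre:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_pre:
            self.blocks.append((self._language, "".join(self._parts)))
            self._in_pre = False
            self._language = None


def to_fenced_markdown(language, code):
    """Render one block as a fenced Markdown code block."""
    info = "" if language in (None, "default") else language
    return f"{FENCE}{info}\n{code.rstrip()}\n{FENCE}"


SAMPLE = (
    '<div class="highlight-python notranslate">'
    '<div class="highlight">'
    '<pre><span class="k">def</span> <span class="nf">greet</span>():\n'
    '    <span class="k">return</span> "hi"\n</pre>'
    '</div></div>'
)

extractor = CodeBlockExtractor()
extractor.feed(SAMPLE)
for language, code in extractor.blocks:
    print(to_fenced_markdown(language, code))
```

The sketch assumes every wrapper div contains exactly one `pre`; a wrapper without one would leak its language to the next block, so a production extractor would need to track nesting more carefully.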
API Documentation - 100% CONSISTENT
Universal API Classes (IDENTICAL across all 8 themes):
api_classes = ['sig', 'sig-object', 'py']
Function Signatures: In the analyzed sample pages, every theme has exactly 19 function signatures, all with identical structure.
Section Structure - THEME-SPECIFIC BUT PREDICTABLE
Main Content Container Patterns
section_extraction_priorities = {
'furo': ['article[role="main"]', 'div.content', 'section'],
'rtd': ['section.wy-nav-content-wrap', 'section'],
'pydata': ['main.bd-main', 'article.bd-article', 'section'],
'python-docs': ['div.body[role="main"]', 'section'],
'alabaster': ['div.body[role="main"]', 'section'],
'agogo': ['div.body[role="main"]', 'div.content', 'section'],
'classic': ['div.body[role="main"]', 'section'],
'nature': ['div.body[role="main"]', 'section'],
}
Universal Sphinx Patterns Summary
Code Blocks (100% consistent)
- Selector: .highlight
- Language detection: parent_class_prefix:highlight-
- Supported languages: ['python', 'json', 'text', 'default']
- Additional classes: ['doctest', 'notranslate']
API Documentation (100% consistent)
- Signature selector: dt.sig.sig-object.py
- Description selector: dd
- Anchor pattern: id_attribute
- Universal classes: ['sig', 'sig-object', 'py']
Content Containers (Theme-specific)
content_containers = {
'furo': ['article[role="main"]', 'div.content', 'section'],
'sphinx_rtd_theme': ['section.wy-nav-content-wrap', 'section'],
'pydata_sphinx_theme': ['main.bd-main', 'article.bd-article', 'section'],
'python_docs_theme': ['div.body[role="main"]', 'section'],
'alabaster': ['div.body[role="main"]', 'section'],
'agogo': ['div.body[role="main"]', 'div.content', 'section'],
'classic': ['div.body[role="main"]', 'section'],
'nature': ['div.body[role="main"]', 'section'],
'generic_fallback': [
'div.body[role="main"]',
'section',
'div.content',
'article[role="main"]'
]
}
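One way such a table might be consumed is to try the theme-specific selectors first and then the generic fallbacks. The sketch below is an assumption about usage, not librovore's actual extractor: `selectors_for` and `first_match` are hypothetical helpers, and the table is abridged to three entries for brevity. The `select_one` argument stands in for any callable that returns a match or `None`, such as BeautifulSoup's `soup.select_one`.

```python
# Abridged copy of the content container table, for illustration only.
CONTENT_CONTAINERS = {
    "furo": ['article[role="main"]', "div.content", "section"],
    "sphinx_rtd_theme": ["section.wy-nav-content-wrap", "section"],
    "generic_fallback": [
        'div.body[role="main"]',
        "section",
        "div.content",
        'article[role="main"]',
    ],
}


def selectors_for(theme):
    """Theme-specific selectors first, then generic ones, without duplicates."""
    candidates = (
        CONTENT_CONTAINERS.get(theme, [])
        + CONTENT_CONTAINERS["generic_fallback"]
    )
    ordered = []
    for selector in candidates:
        if selector not in ordered:
            ordered.append(selector)
    return ordered


def first_match(theme, select_one):
    """Return (selector, element) for the first selector that matches.

    'select_one' is any callable mapping a CSS selector to an element
    or None.
    """
    for selector in selectors_for(theme):
        element = select_one(selector)
        if element is not None:
            return selector, element
    return None, None


print(selectors_for("furo"))
print(selectors_for("some_unknown_theme"))
```

Unknown themes fall straight through to the generic list, which keeps extraction working for themes outside the 8 analyzed here.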
Navigation Cleanup (Theme-specific)
navigation_cleanup = {
'sphinx_rtd_theme': ['nav.wy-nav-side', 'nav.wy-nav-top'],
'pydata_sphinx_theme': ['nav.bd-docs-nav', 'nav.d-print-none'],
'python_docs_theme': ['nav.menu', 'nav.nav-content'],
'agogo': ['div.sidebar'],
'generic': ['nav', '.navigation', '.sidebar', '.toc']
}
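To show how these cleanup selectors could be applied, here is a small sketch that decides whether an element counts as navigation. It handles only the simple `tag`, `.class`, and `tag.class` forms that appear in the table; `parse_simple_selector`, `cleanup_selectors`, and `is_navigation` are hypothetical names, and the table is abridged. With BeautifulSoup, the usual alternative would be to call `soup.select(selector)` and `decompose()` each match.

```python
# Abridged copy of the navigation cleanup table, for illustration only.
NAVIGATION_CLEANUP = {
    "sphinx_rtd_theme": ["nav.wy-nav-side", "nav.wy-nav-top"],
    "agogo": ["div.sidebar"],
    "generic": ["nav", ".navigation", ".sidebar", ".toc"],
}


def parse_simple_selector(selector):
    """Split 'nav.wy-nav-side' into ('nav', {'wy-nav-side'}).

    The tag part is '' for class-only selectors such as '.toc'.
    """
    tag, *classes = selector.split(".")
    return tag, set(classes)


def cleanup_selectors(theme):
    """Theme-specific cleanup selectors followed by the generic ones."""
    return NAVIGATION_CLEANUP.get(theme, []) + NAVIGATION_CLEANUP["generic"]


def is_navigation(tag, classes, selectors):
    """Return True if an element with this tag and classes matches any selector."""
    element_classes = set(classes)
    for selector in selectors:
        wanted_tag, wanted_classes = parse_simple_selector(selector)
        if wanted_tag and wanted_tag != tag:
            continue
        if wanted_classes <= element_classes:
            return True
    return False


rtd = cleanup_selectors("sphinx_rtd_theme")
print(is_navigation("nav", ["wy-nav-side"], rtd))
print(is_navigation("section", ["wy-nav-content-wrap"], rtd))
```

Because the generic list includes a bare `nav`, every `nav` element matches regardless of theme; the theme-specific entries matter mainly for non-`nav` containers such as agogo's `div.sidebar`.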
Final Summary
✅ MISSION ACCOMPLISHED!
🔥 Universal Patterns Discovered
- Code Block Language Detection: parent.class.startswith('highlight-') - 100% consistent across all 8 themes
- API Documentation Structure: dt.sig.sig-object.py + dd - 100% consistent across all 8 themes
- Section Content Extraction: Theme-specific selectors with predictable fallback patterns
Analysis Completeness
Themes Analyzed: 8/8 (100%)
Code Block Consistency: 100%
API Documentation Consistency: 100%
Pattern Reliability: Extremely High
Implementation Readiness: Complete
Session Handoff Information
- Context:
  COMPLETE analysis of all 8 major Sphinx themes
- Status:
  ✅ ANALYSIS COMPLETE - All patterns discovered and documented
- Scripts:
  Comprehensive analysis toolchain in .auxiliary/scribbles/
- Key Achievement:
  Discovered universal consistency in Sphinx theme structure
- Next Phase:
  Implementation in librovore structure extractors