Theme Structure Research¶
This section contains empirical research findings on the HTML structure and patterns of documentation generation systems. These analyses inform the content extraction strategies and structure processor implementations.
Research Methodology¶
The research documented here was conducted through systematic analysis of multiple documentation sites built with various themes:
Direct HTML downloads using curl
BeautifulSoup-based structural analysis
Pattern extraction and consistency verification across themes
CSS selector identification for reliable content targeting
Analysis scripts are preserved in .auxiliary/scripts/ for reproducibility
and future validation.
Theme Analysis Documents¶
Sphinx Themes¶
Analysis of 8 major Sphinx themes revealing 100% consistent patterns for:
Code block language detection (
highlight-{lang}parent container classes)API documentation structure (
dt.sig.sig-object.py+ddpatterns)Theme-specific content container selectors
The universal consistency across Sphinx themes enables highly reliable content extraction for Sphinx-based documentation sites.
MkDocs Themes¶
Analysis of 3 major MkDocs themes (Material, ReadTheDocs, default) across 10 representative documentation sites, discovering:
Code block language detection (
language-{lang}element classes)mkdocstrings API documentation patterns (
div.autodoc-signaturestructure)Theme-specific content and navigation patterns
The mkdocstrings plugin provides Sphinx inventory compatibility with precise function and class signature targeting capabilities.
Implementation Impact¶
The patterns documented in these research findings are implemented in:
- Structure Processors
Located in
sources/librovore/structures/, these processors use the documented CSS selectors and HTML patterns for reliable content extraction.- Markdownify Extensions
Custom extensions leverage the language detection patterns to preserve code block formatting and metadata during HTML-to-Markdown conversion.
- Theme Detection
Automated theme identification enables selection of optimal extraction strategies based on documented patterns.
Future Research¶
Potential areas for additional analysis:
Comparative analysis of custom Sphinx theme implementations
MkDocs plugin ecosystem (beyond mkdocstrings)
Jupyter Book and other Sphinx-derived documentation systems
Version-specific theme variations and compatibility
As new documentation generation systems emerge, similar empirical analysis can be conducted using the established methodology and analysis scripts.