002. Detector Registry Specification¶
Overview¶
This document specifies the detector registry architecture for pluggable backend support in the detextive library. The registry system enables configurable detector precedence, graceful degradation with optional dependencies, and dynamic fallback strategies for robust detection across diverse environments.
The design follows established project practices for type aliases, interface contracts, and module organization while providing extensibility for third-party detection backends.
Registry Architecture¶
Core Registry Types¶
Detector Function Signatures
CharsetDetector: __.typx.TypeAlias = __.cabc.Callable[
    [ Content, Behaviors ],
    CharsetResult | __.types.NotImplementedType
]
MimetypeDetector: __.typx.TypeAlias = __.cabc.Callable[
    [ Content, Behaviors ],
    MimetypeResult | __.types.NotImplementedType
]
Registry Container Types
charset_detectors: __.accret.Dictionary[ str, CharsetDetector ]
mimetype_detectors: __.accret.Dictionary[ str, MimetypeDetector ]
Registry Contract Specifications:
- Detectors return specific result types with confidence scoring
- A NotImplemented return value indicates a missing optional dependency
- Registry keys provide user-configurable detector ordering
- Detector functions accept standardized parameters for consistent interfaces
Registry Registration Pattern¶
Dynamic Registration System
def _detect_via_chardet(
    content: Content, behaviors: Behaviors
) -> CharsetResult | __.types.NotImplementedType:
    ''' Detects charset using chardet library. '''
    try:
        from chardet import detect as _chardet_detect
    except ImportError:
        return NotImplemented
    # Detection implementation would follow here

def _detect_via_charset_normalizer(
    content: Content, behaviors: Behaviors
) -> CharsetResult | __.types.NotImplementedType:
    ''' Detects charset using charset-normalizer library. '''
    try:
        from charset_normalizer import from_bytes
    except ImportError:
        return NotImplemented
    # Detection implementation would follow here

# Registration at module initialization
charset_detectors[ 'chardet' ] = _detect_via_chardet
charset_detectors[ 'charset-normalizer' ] = _detect_via_charset_normalizer
Registration Design Principles:
- Lazy import strategy with graceful ImportError handling
- Consistent function signature across all detector implementations
- Registry key naming matches common library names for intuitive configuration
- Module-level registration enables import-time detector discovery
Optional Dependency Strategy¶
Graceful Degradation Pattern¶
NotImplemented Return Protocol
The registry system implements graceful degradation where:
- Detectors return NotImplemented for missing optional dependencies
- Registry iteration continues until successful detection
- Exception raising occurs only when all configured detectors fail
- User-configurable detector ordering enables fallback preferences
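The protocol above can be sketched as a small dispatch loop. This is an illustration under assumptions, not the library's implementation: `CharsetResult`, `DetectionFailure`, `detect_charset`, and the two toy detectors are hypothetical names, and a plain `dict` stands in for the registry.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class CharsetResult:
    """Hypothetical stand-in; the field names are assumptions."""
    charset: str
    confidence: float


class DetectionFailure(Exception):
    """Hypothetical exception for the case where no detector succeeds."""


def detect_charset(
    content: bytes,
    order: Sequence[str],
    registry: Mapping[str, Callable[[bytes, Any], Any]],
) -> CharsetResult:
    """Tries detectors in the configured order; returns the first result."""
    for name in order:
        detector = registry.get(name)
        if detector is None:
            continue  # unregistered name: skipped in this sketch
        result = detector(content, None)
        if result is NotImplemented:
            continue  # optional dependency missing: fall through to the next
        return result
    raise DetectionFailure(
        "No usable detector among: " + ", ".join(order))


def _unavailable(content: bytes, behaviors: Any):
    """Toy detector that simulates a missing optional dependency."""
    return NotImplemented


def _always_utf8(content: bytes, behaviors: Any):
    """Toy detector that always reports UTF-8."""
    return CharsetResult(charset="utf-8", confidence=0.5)


registry = {"unavailable": _unavailable, "always-utf8": _always_utf8}
found = detect_charset(b"abc", ("unavailable", "always-utf8"), registry)
```

Note the identity check `result is NotImplemented`. The sentinel is a singleton, so identity comparison is the appropriate test.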
Configuration Integration¶
Behavior-Driven Detector Selection
class Behaviors( __.immut.DataclassObject ):
    ''' Configuration for detector registry usage. '''

    charset_detectors_order: __.typx.Annotated[
        __.cabc.Sequence[ str ],
        __.ddoc.Doc( ''' Order in which charset detectors are applied. ''' ),
    ] = ( 'chardet', 'charset-normalizer' )
    mimetype_detectors_order: __.typx.Annotated[
        __.cabc.Sequence[ str ],
        __.ddoc.Doc( ''' Order in which MIME type detectors are applied. ''' ),
    ] = ( 'magic', 'puremagic' )
Configuration Design Features:
- User-configurable detector precedence through sequence ordering
- Default ordering based on library reliability and performance characteristics
- Runtime modification support for dynamic behavior adjustment
- Validation ensures that only registered detectors are attempted
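One way the ordering and validation features could combine is sketched below. The frozen standard-library dataclass is a stand-in for the immutable `Behaviors` class, `select_registered` is a hypothetical helper, and dropping unknown names silently (rather than raising) is an assumption, since the specification does not say which policy applies.

```python
from dataclasses import dataclass, replace
from typing import Any, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class Behaviors:
    """Stand-in for the immutable configuration class."""
    charset_detectors_order: Sequence[str] = ("chardet", "charset-normalizer")
    mimetype_detectors_order: Sequence[str] = ("magic", "puremagic")


def select_registered(
    order: Sequence[str], registry: Mapping[str, Any]
) -> Tuple[str, ...]:
    """Keeps the configured order but drops names absent from the registry."""
    return tuple(name for name in order if name in registry)


# Placeholder registry; the values would be detector functions.
registry = {"chardet": object(), "charset-normalizer": object()}

# Adjusting precedence yields a new configuration object, leaving the
# original untouched.
defaults = Behaviors()
custom = replace(
    defaults,
    charset_detectors_order=("charset-normalizer", "unknown", "chardet"),
)
selected = select_registered(custom.charset_detectors_order, registry)
```

With an immutable configuration, "runtime modification" here means deriving a new object with a different order, which keeps the defaults safe to share.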
Multiple Backend Support¶
Charset Detection Backends¶
Supported Charset Libraries
# Standard charset detection backends
charset_detectors[ 'chardet' ] # Statistical analysis, UTF-8 bias
charset_detectors[ 'charset-normalizer' ] # Enhanced heuristics, multiple algorithms
Backend Characteristics:
- chardet: Mature statistical analysis with proven UTF-8 bias handling
- charset-normalizer: Enhanced detection algorithms with multiple confidence scoring

Registration Strategy:
- Both libraries registered with graceful ImportError handling
- Default ordering prioritizes chardet for proven reliability
- User configuration enables alternative precedence based on use case requirements
MIME Type Detection Backends¶
Supported MIME Type Libraries
# MIME type detection backends
mimetype_detectors[ 'magic' ] # python-magic (libmagic bindings)
mimetype_detectors[ 'puremagic' ] # Pure Python magic byte detection
Backend Selection Strategy:
- python-magic: Comprehensive magic byte database via libmagic
- puremagic: Pure Python implementation for deployment simplicity
- Fallback ordering ensures detection capability across diverse environments

Detection Priority Logic:
- Primary detection via content analysis (magic bytes)
- Secondary detection via filename extension analysis
- Default MIME type assignment based on available context
Interface Contract Design¶
Detector Function Contracts¶
Standardized Parameters
def detector_function(
    content: Content,      # Raw byte content for analysis
    behaviors: Behaviors,  # Configuration object with detection preferences
) -> DetectionResult | __.types.NotImplementedType:
    ''' Standard detector function signature. '''
Return Value Specifications:
- Successful detection returns structured result with confidence scoring
- Missing dependencies indicated by NotImplemented return value
- Exception raising reserved for genuine detection failures
- Result types provide consistent interface across all detection backends

Parameter Design Principles:
- Wide parameter acceptance for maximum backend flexibility
- Behavior-driven configuration enables detector-specific optimization
- Content parameter accepts any bytes-like input for broad compatibility
Result Type Integration¶
Registry Return Value Contracts:
- Successful detection returns CharsetResult or MimetypeResult (defined in API design)
- Missing dependencies indicated by NotImplemented return value
- Exception raising reserved for genuine detection failures
- Confidence scoring enables quality-based selection among multiple results
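Quality-based selection among multiple results might look like the sketch below. This is an alternative to stopping at the first successful detector and is shown only to illustrate how confidence scores could be used. `CharsetResult`, `select_best`, and the toy detectors are hypothetical, and a plain `dict` stands in for the registry.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class CharsetResult:
    """Hypothetical stand-in; the field names are assumptions."""
    charset: str
    confidence: float


def select_best(
    content: bytes,
    order: Sequence[str],
    registry: Mapping[str, Callable[[bytes, Any], Any]],
) -> Optional[CharsetResult]:
    """Runs every available detector and keeps the most confident result."""
    results = []
    for name in order:
        result = registry[name](content, None)
        if result is not NotImplemented:
            results.append(result)
    if not results:
        return None
    # max() returns the first maximal item, so on ties the detector that
    # appears earlier in the configured order wins.
    return max(results, key=lambda item: item.confidence)


registry = {
    "cautious": lambda content, behaviors: CharsetResult("latin-1", 0.6),
    "confident": lambda content, behaviors: CharsetResult("utf-8", 0.9),
    "absent": lambda content, behaviors: NotImplemented,
}
best = select_best(b"abc", ("cautious", "absent", "confident"), registry)
```

Running every backend costs more than first-success dispatch, so a design like this trades speed for result quality.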
Registry Architecture Summary¶
Key Design Features:
- Pluggable backend system with standardized detector function signatures
- Graceful degradation through NotImplemented return protocol
- User-configurable detector precedence via Behaviors configuration
- Support for multiple optional dependencies per detection type

Implementation Architecture:
- Registry containers in detectors.py module
- Type aliases for detector function signatures
- Dynamic registration with import-time discovery
- Registry-based dispatch in core detection functions