.. vim: set fileencoding=utf-8:
.. -*- coding: utf-8 -*-
.. +--------------------------------------------------------------------------+
   |                                                                          |
   | Licensed under the Apache License, Version 2.0 (the "License");          |
   | you may not use this file except in compliance with the License.         |
   | You may obtain a copy of the License at                                  |
   |                                                                          |
   |     http://www.apache.org/licenses/LICENSE-2.0                           |
   |                                                                          |
   | Unless required by applicable law or agreed to in writing, software      |
   | distributed under the License is distributed on an "AS IS" BASIS,        |
   | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
   | See the License for the specific language governing permissions and      |
   | limitations under the License.                                           |
   |                                                                          |
   +--------------------------------------------------------------------------+


*******************************************************************************
002. Detector Registry Specification
*******************************************************************************

Overview
===============================================================================

This document specifies the detector registry architecture for pluggable
backend support in the detextive library. The registry system enables
configurable detector precedence, graceful degradation with optional
dependencies, and dynamic fallback strategies for robust detection across
diverse environments.

The design follows established project practices for type aliases, interface
contracts, and module organization while providing extensibility for
third-party detection backends.

Registry Architecture
===============================================================================

Core Registry Types
-------------------------------------------------------------------------------

**Detector Function Signatures**

.. code-block:: python

    CharsetDetector: __.typx.TypeAlias = __.cabc.Callable[
        [ Content, Behaviors ],
        CharsetResult | __.types.NotImplementedType
    ]
    MimetypeDetector: __.typx.TypeAlias = __.cabc.Callable[
        [ Content, Behaviors ],
        MimetypeResult | __.types.NotImplementedType
    ]

**Registry Container Types**

.. code-block:: python

    charset_detectors: __.accret.Dictionary[ str, CharsetDetector ]
    mimetype_detectors: __.accret.Dictionary[ str, MimetypeDetector ]

**Registry Contract Specifications:**

- Detectors return specific result types with confidence scoring
- `NotImplemented` return value indicates missing optional dependency
- Registry keys provide user-configurable detector ordering
- Detector functions accept standardized parameters for consistent interfaces

Registry Registration Pattern
-------------------------------------------------------------------------------

**Dynamic Registration System**

.. code-block:: python

    def _detect_via_chardet(
        content: Content, behaviors: Behaviors
    ) -> CharsetResult | __.types.NotImplementedType:
        ''' Detects charset using chardet library. '''
        try: from chardet import detect as _chardet_detect
        except ImportError: return NotImplemented
        # Detection implementation would follow here

    def _detect_via_charset_normalizer(
        content: Content, behaviors: Behaviors
    ) -> CharsetResult | __.types.NotImplementedType:
        ''' Detects charset using charset-normalizer library. '''
        try: from charset_normalizer import from_bytes
        except ImportError: return NotImplemented
        # Detection implementation would follow here

    # Registration at module initialization
    charset_detectors[ 'chardet' ] = _detect_via_chardet
    charset_detectors[ 'charset-normalizer' ] = _detect_via_charset_normalizer

**Registration Design Principles:**

- Lazy import strategy with graceful ImportError handling
- Consistent function signature across all detector implementations
- Registry key naming matches common library names for intuitive configuration
- Module-level registration enables import-time detector discovery

Optional Dependency Strategy
===============================================================================

Graceful Degradation Pattern
-------------------------------------------------------------------------------

**NotImplemented Return Protocol**

The registry system implements graceful degradation where:

- Detectors return `NotImplemented` for missing optional dependencies
- Registry iteration continues until successful detection
- Exception raising occurs only when all configured detectors fail
- User-configurable detector ordering enables fallback preferences

Configuration Integration
-------------------------------------------------------------------------------

**Behavior-Driven Detector Selection**

.. code-block:: python

    class Behaviors( __.immut.DataclassObject ):
        ''' Configuration for detector registry usage. '''

        charset_detectors_order: __.typx.Annotated[
            __.cabc.Sequence[ str ],
            __.ddoc.Doc( ''' Order in which charset detectors are applied. ''' ),
        ] = ( 'chardet', 'charset-normalizer' )
        mimetype_detectors_order: __.typx.Annotated[
            __.cabc.Sequence[ str ],
            __.ddoc.Doc( ''' Order in which MIME type detectors are applied. ''' ),
        ] = ( 'magic', 'puremagic' )

**Configuration Design Features:**

- User-configurable detector precedence through sequence ordering
- Default ordering based on library reliability and performance characteristics
- Runtime modification support for dynamic behavior adjustment
- Validation ensures only registered detectors attempted

Multiple Backend Support
===============================================================================

Charset Detection Backends
-------------------------------------------------------------------------------

**Supported Charset Libraries**

.. code-block:: python

    # Standard charset detection backends
    charset_detectors[ 'chardet' ]             # Statistical analysis, UTF-8 bias
    charset_detectors[ 'charset-normalizer' ]  # Enhanced heuristics, multiple algorithms

**Backend Characteristics:**

- `chardet`: Mature statistical analysis with proven UTF-8 bias handling
- `charset-normalizer`: Enhanced detection algorithms with multiple confidence scoring

**Registration Strategy:**

- Both libraries registered with graceful ImportError handling
- Default ordering prioritizes `chardet` for proven reliability
- User configuration enables alternative precedence based on use case requirements

MIME Type Detection Backends
-------------------------------------------------------------------------------

**Supported MIME Type Libraries**

.. code-block:: python

    # MIME type detection backends
    mimetype_detectors[ 'magic' ]      # python-magic (libmagic bindings)
    mimetype_detectors[ 'puremagic' ]  # Pure Python magic byte detection

**Backend Selection Strategy:**

- `python-magic`: Comprehensive magic byte database via libmagic
- `puremagic`: Pure Python implementation for deployment simplicity
- Fallback ordering ensures detection capability across diverse environments

**Detection Priority Logic:**

- Primary detection via content analysis (magic bytes)
- Secondary detection via filename extension analysis
- Default MIME type assignment based on available context

Interface Contract Design
===============================================================================

Detector Function Contracts
-------------------------------------------------------------------------------

**Standardized Parameters**

.. code-block:: python

    def detector_function(
        content: Content,      # Raw byte content for analysis
        behaviors: Behaviors   # Configuration object with detection preferences
    ) -> DetectionResult | __.types.NotImplementedType:
        ''' Standard detector function signature. '''

**Return Value Specifications:**

- Successful detection returns structured result with confidence scoring
- Missing dependencies indicated by `NotImplemented` return value
- Exception raising reserved for genuine detection failures
- Result types provide consistent interface across all detection backends

**Parameter Design Principles:**

- Wide parameter acceptance for maximum backend flexibility
- Behavior-driven configuration enables detector-specific optimization
- Content parameter accepts any bytes-like input for broad compatibility

Result Type Integration
-------------------------------------------------------------------------------

**Registry Return Value Contracts:**

- Successful detection returns `CharsetResult` or `MimetypeResult` (defined in API design)
- Missing dependencies indicated by `NotImplemented` return value
- Exception raising reserved for genuine detection failures
- Confidence scoring enables quality-based selection among multiple results

Registry Architecture Summary
===============================================================================

**Key Design Features:**

- Pluggable backend system with standardized detector function signatures
- Graceful degradation through `NotImplemented` return protocol
- User-configurable detector precedence via `Behaviors` configuration
- Support for multiple optional dependencies per detection type

**Implementation Architecture:**

- Registry containers in `detectors.py` module
- Type aliases for detector function signatures
- Dynamic registration with import-time discovery
- Registry-based dispatch in core detection functions
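
**Illustrative Dispatch Sketch**

The registry-based dispatch named in the summary is not spelled out in this
specification, so the following is only a minimal, self-contained sketch of
how the `NotImplemented` protocol and the `charset_detectors_order` setting
might combine. It makes several assumptions: a plain ``dict`` and frozen
dataclasses stand in for `__.accret.Dictionary` and
`__.immut.DataclassObject`; the names `detect_charset`, `DetectionFailure`,
`_detect_unavailable`, and `_probe_utf8`, the registry keys `unavailable` and
`utf8-probe`, and the fields of `CharsetResult` are hypothetical; and skipping
unregistered names is just one possible reading of the validation rule.

.. code-block:: python

    from collections.abc import Callable, Sequence
    from dataclasses import dataclass
    from types import NotImplementedType  # Requires Python 3.10+

    @dataclass( frozen = True )
    class CharsetResult:
        ''' Hypothetical stand-in for the result type from the API design. '''
        charset: str
        confidence: float

    @dataclass( frozen = True )
    class Behaviors:
        ''' Reduced stand-in carrying only the charset detector ordering. '''
        charset_detectors_order: Sequence[ str ] = (
            'chardet', 'charset-normalizer' )

    CharsetDetector = Callable[
        [ bytes, Behaviors ], CharsetResult | NotImplementedType ]

    # Plain dict used here in place of the accretive dictionary.
    charset_detectors: dict[ str, CharsetDetector ] = { }

    class DetectionFailure( Exception ):
        ''' Hypothetical exception for when no detector yields a result. '''

    def _detect_unavailable( content, behaviors ):
        ''' Simulates a backend whose optional dependency is missing. '''
        return NotImplemented

    def _probe_utf8( content, behaviors ):
        ''' Toy detector: reports UTF-8 if the content decodes cleanly. '''
        try: content.decode( 'utf-8' )
        except UnicodeDecodeError as exc:
            raise DetectionFailure( 'Content is not valid UTF-8.' ) from exc
        return CharsetResult( charset = 'utf-8', confidence = 1.0 )

    # Registration at module initialization, mirroring the pattern above.
    charset_detectors[ 'unavailable' ] = _detect_unavailable
    charset_detectors[ 'utf8-probe' ] = _probe_utf8

    def detect_charset( content, behaviors = Behaviors( ) ):
        ''' Tries detectors in configured order until one yields a result. '''
        for name in behaviors.charset_detectors_order:
            detector = charset_detectors.get( name )
            if detector is None: continue  # Assumption: skip unregistered names
            result = detector( content, behaviors )
            if result is NotImplemented: continue  # Dependency missing
            return result
        raise DetectionFailure( 'No configured charset detector succeeded.' )

    behaviors = Behaviors(
        charset_detectors_order = ( 'unavailable', 'utf8-probe' ) )
    result = detect_charset( b'hello', behaviors )

In this sketch the first configured detector returns `NotImplemented`, so
dispatch falls through to the second, and an exception is raised only when
the whole ordering is exhausted without a result.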