Test Plan: Version 2.0 Complete Test Suite
Testing Philosophy
Coverage-Gap-First Approach: Use doctests for examples and happy paths; use pytest only for coverage gaps and edge cases.
Focus Areas:
- Default return behavior patterns (DetectFailureActions enum)
- Exception location parameter handling
- Enhanced detection and inference capabilities
- Cross-platform compatibility considerations
Windows Compatibility Considerations:
- python-magic vs python-magic-bin MIME type detection differences
- Cross-platform line separator handling
- Cygwin buffer issue mitigations
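The doctest/pytest split described above can be illustrated with a small sketch. `is_textual_mimetype` here is a hypothetical, simplified stand-in and not the package's implementation. Its happy paths live in the docstring as doctests, and an edge case that the doctests do not cover becomes a plain-assert, pytest-style test. The `run_doctests` helper is also hypothetical. In a real suite, `pytest --doctest-modules` would collect the docstring examples instead.

```python
import doctest


def is_textual_mimetype(mimetype: str) -> bool:
    """Return True if the MIME type is treated as text (simplified stand-in).

    Happy paths are documented and tested as doctests:

    >>> is_textual_mimetype("text/plain")
    True
    >>> is_textual_mimetype("image/png")
    False
    """
    return mimetype.lower().startswith("text/")


def run_doctests() -> int:
    """Run the docstring examples above and return the number of failures."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner()
    failed = 0
    for test in finder.find(
        is_textual_mimetype, globs={"is_textual_mimetype": is_textual_mimetype}
    ):
        failed += runner.run(test).failed
    return failed


def test_100_empty_mimetype_is_not_textual():
    # Coverage gap: empty input is an edge case, so it lives in pytest
    # rather than in the user-facing docstring.
    assert not is_textual_mimetype("")
```

This keeps the docstring readable as documentation. The pytest module then carries only the cases that would clutter it.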
Test Strategy Overview
Coverage-Gap-First Approach:
- Target specific uncovered lines identified in coverage analysis
- Replace existing commented-out tests with minimal effective coverage
- Focus on default return behavior patterns (DetectFailureActions enum)
- Essential edge cases and error paths only
- Avoid comprehensive testing that duplicates doctest coverage
Test Module Organization:
- test_100_nomina: Type aliases and common types (minimal - may skip)
- test_110_exceptions: Exception hierarchy and location parameter handling
- test_120_core: Core types, enums, and behaviors
- test_200_lineseparators: Line separator detection and normalization
- test_210_mimetypes: MIME type utility functions
- test_220_charsets: Charset detection utilities and codec handling
- test_300_validation: Text validation and reasonableness checking
- test_310_detectors: Core detection functions (highest priority)
- test_400_inference: Context-aware inference functions
- test_500_decoders: High-level decoding and integration functions
Test Module Specifications
test_100_nomina (Optional)
Scope: Type aliases and common definitions
Assessment: Minimal testing is needed, since type aliases don't require extensive testing. This module may be skipped unless coverage tools require it.
Basic Tests (000-099):
- Import verification
- Type alias accessibility
test_110_exceptions
Scope: Exception hierarchy and location parameter handling
Basic Tests (000-099):
- Exception hierarchy verification
- Import and inheritance structure validation
CharsetDetectFailure Tests (100-119):
- Construction with and without location parameter
- String location message formatting
- pathlib.Path location handling
- Absential location handling (__.absent)
CharsetInferFailure Tests (120-139):
- Construction with and without location parameter
- Location context in inference failure messages
MimetypeDetectFailure Tests (140-159):
- Construction with and without location parameter
- Various location types (str, Path) in messages
ContentDecodeFailure Tests (160-179):
- Construction with charset and location details
- Exception chaining preservation
Exception Hierarchy Tests (180-199):
- Omniexception base class behavior
- Omnierror inheritance and catching patterns
- Multiple inheritance with built-in exception types
- Package-wide exception catching via Omnierror
Implementation Notes:
- Test all exception types with both present and absent location parameters
- Verify that messages include the location when one is provided
- Test exception chaining with 'from' clauses
- Cross-platform path handling in location parameters
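Here is a minimal sketch of the location-parameter tests under a simplified exception design. The class names come from this plan. The constructor signature, the message wording, the `RuntimeError` mix-in, and the use of `None` in place of the package's `__.absent` sentinel are all assumptions, made so the example runs on its own.

```python
from pathlib import Path
from typing import Optional, Union

Location = Union[str, Path]


class Omniexception(Exception):
    """Stand-in for the package-wide exception base."""


class Omnierror(Omniexception):
    """Stand-in for the base of all package errors."""


class CharsetDetectFailure(Omnierror, RuntimeError):
    """Charset detection failed; ``location`` is optional context (assumed API)."""

    def __init__(self, location: Optional[Location] = None) -> None:
        # None plays the role of the package's __.absent sentinel here.
        message = "Could not detect charset"
        if location is not None:
            message = f"{message} for '{location}'"
        super().__init__(f"{message}.")
        self.location = location


def test_100_message_without_location():
    assert str(CharsetDetectFailure()) == "Could not detect charset."


def test_101_message_with_string_location():
    assert "'data.txt'" in str(CharsetDetectFailure("data.txt"))


def test_102_message_with_path_location():
    path = Path("docs") / "readme.rst"
    # str(path) uses the platform separator, so compare against it directly.
    assert str(path) in str(CharsetDetectFailure(path))


def test_180_catch_via_omnierror_with_chaining():
    try:
        try:
            b"\xff".decode("utf-8")
        except UnicodeDecodeError as exc:
            raise CharsetDetectFailure("blob.bin") from exc
    except Omnierror as error:
        assert isinstance(error, RuntimeError)
        assert isinstance(error.__cause__, UnicodeDecodeError)
    else:
        raise AssertionError("expected CharsetDetectFailure")
```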
test_120_core
Current Coverage: 100%. Maintain this coverage while expanding tests.
Basic Tests (000-099):
- Module import verification
- Constant value validation (CHARSET_DEFAULT, MIMETYPE_DEFAULT)
Enum Tests (100-199):
- BehaviorTristate enum values and behavior
- CodecSpecifiers enum values and usage
- DetectFailureActions enum values and semantics
- Enum string representations and comparisons
Behaviors Configuration Tests (200-299):
- Default Behaviors instance validation
- Custom Behaviors instance creation
- Field defaults and validation
- Detector order sequence handling
- Tristate behavior configurations
Result Types Tests (300-399):
- CharsetResult construction and field access
- MimetypeResult construction and field access
- Confidence value validation (0.0 to 1.0 range)
- Optional charset handling in CharsetResult
Confidence Calculation Tests (400-499):
- confidence_from_bytes_quantity with various content lengths
- Confidence divisor behavior testing
- Edge cases: empty content, very long content
- Custom behavior configuration effects
Implementation Notes:
- Test all enum values and their auto-generated identities
- Test confidence calculation formula and edge cases
- Validate behavior configuration precedence and defaults
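Boundary tests can pin down the confidence edge cases listed above. This sketch assumes a linear formula, `min(1.0, len(content) / divisor)`, and a hypothetical `confidence_divisor` field on a cut-down `Behaviors`. The real formula and field name may differ. Treat the expected values as illustrations of the boundary-testing pattern, not as the package's actual numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviors:
    """Cut-down stand-in; the field name and default are assumptions."""

    confidence_divisor: int = 1024


def confidence_from_bytes_quantity(
    content: bytes, behaviors: Behaviors = Behaviors()
) -> float:
    """Assumed formula: linear in content length, capped at 1.0."""
    return min(1.0, len(content) / behaviors.confidence_divisor)


def test_400_empty_content_has_zero_confidence():
    assert confidence_from_bytes_quantity(b"") == 0.0


def test_401_confidence_scales_below_divisor():
    assert confidence_from_bytes_quantity(b"x" * 512) == 0.5


def test_402_very_long_content_is_capped():
    assert confidence_from_bytes_quantity(b"x" * 10_000) == 1.0


def test_403_custom_divisor_changes_scale():
    behaviors = Behaviors(confidence_divisor=4)
    assert confidence_from_bytes_quantity(b"ab", behaviors) == 0.5
```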
test_200_lineseparators
Scope: Line separator detection and normalization
Basic Tests (000-099):
- Enum structure and values validation
- Import accessibility verification
Detection Tests (100-199):
- Unix LF detection from byte content
- Windows CRLF detection from byte content
- Classic Mac CR detection from byte content
- Mixed line ending detection (first-wins behavior)
- Empty content detection (returns None)
- Content without line endings (returns None)
- Integer sequence input handling
- Detection limit parameter behavior
Normalization Tests (200-299):
- normalize_universal: conversion of all line endings to LF
- normalize_universal: content without line endings (returned unchanged)
- normalize_universal: empty content handling
- Individual enum normalize methods (CR, CRLF, LF)
- Already-normalized content preserved unchanged
Platform Conversion Tests (300-399):
- nativize method behavior per platform
- Unix LF to platform-specific conversion
- Edge cases in platform conversion
- Content without line endings in nativize
Edge Case Tests (400-499):
- Very long content with mixed endings
- Consecutive line separators
- Line separators at content boundaries
- Invalid or malformed line ending sequences
Windows Compatibility Tests (500-599):
- CRLF detection accuracy on Windows
- Cross-platform nativize behavior consistency
- Large content handling (Cygwin buffer considerations)
Implementation Notes:
- Use the patterns module for consistent test data
- Test detection precedence (which separator wins in mixed content)
- Verify immutability of enum instances
- Cross-platform testing considerations for nativize behavior
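The first-wins detection rule and universal normalization can be sketched as follows. `LineSeparators`, `detect_line_separator`, and its `limit` parameter are hypothetical stand-ins modeled on the behaviors listed above. One design choice in the sketch deserves an explicit test: it reads one byte past `limit`, so a CRLF that straddles the limit is still reported as CRLF rather than CR.

```python
import enum
from collections.abc import Sequence
from typing import Optional


class LineSeparators(enum.Enum):
    """Hypothetical stand-in for the package's line separator enum."""

    CR = "\r"
    CRLF = "\r\n"
    LF = "\n"


def detect_line_separator(
    content: Sequence[int], limit: int = 1024
) -> Optional[LineSeparators]:
    """Return the first separator found in the first ``limit`` bytes."""
    # bytes() accepts both bytes and integer sequences; keep one lookahead byte.
    data = bytes(content[: limit + 1])
    for index, byte in enumerate(data[:limit]):
        if byte == 0x0A:  # LF
            return LineSeparators.LF
        if byte == 0x0D:  # CR, possibly the start of CRLF
            if data[index + 1 : index + 2] == b"\n":
                return LineSeparators.CRLF
            return LineSeparators.CR
    return None


def normalize_universal(text: str) -> str:
    """Convert all line endings to LF; CRLF must be replaced before lone CR."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def test_100_detects_each_separator():
    assert detect_line_separator(b"a\nb") is LineSeparators.LF
    assert detect_line_separator(b"a\r\nb") is LineSeparators.CRLF
    assert detect_line_separator(b"a\rb") is LineSeparators.CR


def test_103_mixed_content_first_wins():
    assert detect_line_separator(b"a\r\nb\nc") is LineSeparators.CRLF


def test_104_no_separator_returns_none():
    assert detect_line_separator(b"") is None
    assert detect_line_separator(b"abc") is None


def test_106_integer_sequence_and_limit():
    assert detect_line_separator([97, 13, 10, 98]) is LineSeparators.CRLF
    assert detect_line_separator(b"abc\n", limit=3) is None
    assert detect_line_separator(b"ab\r\n", limit=3) is LineSeparators.CRLF


def test_200_normalize_universal():
    assert normalize_universal("a\r\nb\rc\nd") == "a\nb\nc\nd"
    assert normalize_universal("no endings") == "no endings"
    assert normalize_universal("") == ""
```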
test_210_mimetypes
Scope: MIME type utility functions
Basic Tests (000-099):
- Module import and function accessibility
Textual MIME Type Tests (100-199):
- is_textual_mimetype with text/* prefixes
- Known textual application types (json, xml, javascript, yaml)
- Textual suffixes (+json, +xml, +yaml, +toml)
- Non-textual types rejection (image/*, video/*, audio/*)
- Empty and malformed MIME type handling
- Case sensitivity in MIME type evaluation
Edge Case Tests (200-299):
- MIME types with parameters (text/plain; charset=utf-8)
- Vendor-specific MIME types (application/vnd.*)
- Custom and unknown MIME types
- Very long MIME type strings
- MIME types with unusual characters
Implementation Notes:
- Comprehensive coverage of textual vs non-textual classification
- Test MIME type parameter handling if applicable
- Edge cases for malformed input
- Performance testing with large MIME type lists
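Here is a sketch of the textual classification rules listed above, which can serve as an oracle when writing the edge-case tests. `is_textual_mimetype` here is a hypothetical reimplementation, not the package's code. It assumes that parameters are stripped before classification and that matching is case-insensitive, since RFC 2045 treats MIME type and subtype names as case-insensitive.

```python
TEXTUAL_APPLICATION_TYPES = frozenset({
    "application/json",
    "application/xml",
    "application/javascript",
    "application/yaml",
})
TEXTUAL_SUFFIXES = ("+json", "+xml", "+yaml", "+toml")


def is_textual_mimetype(mimetype: str) -> bool:
    """Classify a MIME type as textual (hypothetical reimplementation)."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = mimetype.split(";", 1)[0].strip().lower()
    if base.count("/") != 1 or base.startswith("/") or base.endswith("/"):
        return False  # empty or malformed
    if base.startswith("text/"):
        return True
    if base in TEXTUAL_APPLICATION_TYPES:
        return True
    return base.endswith(TEXTUAL_SUFFIXES)


def test_100_textual_types_accepted():
    for mimetype in (
        "text/plain",
        "TEXT/HTML",
        "application/json",
        "application/vnd.api+json",
        "application/atom+xml",
        "text/plain; charset=utf-8",
    ):
        assert is_textual_mimetype(mimetype), mimetype


def test_103_non_textual_and_malformed_rejected():
    for mimetype in (
        "image/png",
        "video/mp4",
        "audio/mpeg",
        "application/vnd.ms-excel",
        "",
        "text",
        "/plain",
        "text/",
    ):
        assert not is_textual_mimetype(mimetype), mimetype
```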
test_220_charsets
Scope: Charset detection utilities and codec handling
Basic Tests (000-099):
- Module import verification
- Function accessibility validation
OS Charset Detection Tests (100-199):
- discover_os_charset_default function behavior
- Cross-platform charset default handling
- Caching behavior for OS charset detection
- Environment variable influence testing
Codec Resolution Tests (200-299):
- CodecSpecifiers enum handling in attempt_decodes
- OsDefault codec specifier behavior
- PythonDefault codec specifier behavior
- UserSupplement codec specifier behavior
- FromInference codec specifier behavior
- Invalid codec name handling
Trial Decode Tests (300-399):
- attempt_decodes with valid charset inference
- attempt_decodes with malformed content
- attempt_decodes with unsupported charset names
- trial_decode_as_confident function behavior
- Confidence calculation in trial decoding
- Exception handling in decode failures
Charset Promotion Tests (400-499):
- ASCII to UTF-8 promotion behavior
- UTF-8 to UTF-8-sig promotion behavior
- Custom promotion mapping handling
- Promotion precedence and conflict resolution
Implementation Notes:
- Mock environment for OS charset testing
- Test all CodecSpecifiers enum variants
- Verify confidence calculation accuracy
- Cross-platform charset handling differences
- Error path testing for decode failures
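Two of these items lend themselves to short sketches: mocking the OS charset query (including its cache) and trial decoding with charset promotion. Both functions below are hypothetical stand-ins. Several details are illustrative assumptions rather than the package's documented behavior: that the OS default comes from `locale.getpreferredencoding(False)`, the single-entry `PROMOTIONS` mapping, and raising `ValueError` when every candidate fails. The `mock.patch` target works because the stand-in looks up `locale.getpreferredencoding` at call time.

```python
import functools
import locale
from collections.abc import Sequence
from unittest import mock

# Assumed promotion mapping: ASCII content is decoded as its UTF-8 superset.
PROMOTIONS = {"ascii": "utf-8"}


@functools.cache
def discover_os_charset_default() -> str:
    """Query the OS default charset once and cache it (stand-in)."""
    return locale.getpreferredencoding(False).lower()


def attempt_decodes(content: bytes, charsets: Sequence[str]) -> tuple[str, str]:
    """Return (text, charset) for the first charset that decodes cleanly."""
    for charset in charsets:
        charset = PROMOTIONS.get(charset, charset)
        try:
            return content.decode(charset), charset
        except LookupError:
            continue  # unknown or unsupported charset name
        except UnicodeDecodeError:
            continue  # content is malformed for this charset
    raise ValueError("no candidate charset could decode the content")


def test_100_os_charset_is_mocked_and_cached():
    discover_os_charset_default.cache_clear()
    with mock.patch("locale.getpreferredencoding", return_value="CP1252") as fake:
        assert discover_os_charset_default() == "cp1252"
        assert discover_os_charset_default() == "cp1252"
    fake.assert_called_once_with(False)  # second call was served from the cache
    discover_os_charset_default.cache_clear()  # don't leak the fake value


def test_300_trial_decodes():
    assert attempt_decodes(b"caf\xc3\xa9", ["ascii"]) == ("café", "utf-8")
    assert attempt_decodes(b"caf\xe9", ["utf-8", "latin-1"]) == ("café", "latin-1")
    try:
        attempt_decodes(b"data", ["no-such-codec"])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```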
test_300_validation
Scope: Text validation and reasonableness checking
Basic Tests (000-099):
- Module import and function accessibility
Text Validation Profile Tests (100-199):
- Default profile behavior and validation
- Custom profile creation and application
- Profile parameter validation
- Immutable profile handling
Text Reasonableness Tests (200-299):
- is_valid_text with normal textual content
- is_valid_text with control-character-heavy content
- is_valid_text with whitespace-only content
- is_valid_text rejection of binary data
- Unicode normalization considerations
- Very long text validation performance
BOM Handling Tests (300-399):
- BOM detection and handling in validation
- UTF-8, UTF-16, UTF-32 BOM recognition
- BOM removal in validation process
- Invalid BOM sequence handling
Character Ratio Tests (400-499):
- Character ratio calculations at boundaries
- Threshold validation for ratio limits
- Edge cases with minimal content
- Ratio calculation with various character sets
Implementation Notes:
- Test validation profiles with extreme content
- BOM handling across different Unicode encodings
- Character ratio boundary condition testing
- Performance considerations with large text
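The ratio and BOM cases can be exercised against a small model of the validator. Several details are assumptions chosen for illustration: `TextValidationProfile`, its field names, the 0.1 default threshold, and the rule that empty or whitespace-only text is rejected by default. The point is the boundary pattern: test exactly at the configured ratio, then just past it.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class TextValidationProfile:
    """Hypothetical validation profile; names and defaults are assumptions."""

    control_ratio_max: float = 0.1
    allow_whitespace_only: bool = False


def is_valid_text(
    text: str, profile: TextValidationProfile = TextValidationProfile()
) -> bool:
    """Judge whether decoded text looks reasonable under ``profile``."""
    text = text.removeprefix("\ufeff")  # ignore a decoded BOM
    if not text:
        return False
    if text.isspace():
        return profile.allow_whitespace_only
    # Count control characters (category Cc), excluding tab, LF, and CR.
    controls = sum(
        1
        for char in text
        if unicodedata.category(char) == "Cc" and char not in "\t\n\r"
    )
    return controls / len(text) <= profile.control_ratio_max


def test_200_ordinary_text_is_valid():
    assert is_valid_text("Hello, world\n")
    assert is_valid_text("\ufeffHello")


def test_202_whitespace_only_depends_on_profile():
    assert not is_valid_text("  \n\t")
    permissive = TextValidationProfile(allow_whitespace_only=True)
    assert is_valid_text("  \n\t", permissive)


def test_203_binary_like_text_is_rejected():
    assert not is_valid_text(bytes(range(32)).decode("latin-1"))


def test_400_ratio_boundaries():
    assert is_valid_text("\x00" + "a" * 9)  # exactly 0.1: accepted
    assert not is_valid_text("\x00\x00" + "a" * 8)  # 0.2: rejected
```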
test_310_detectors (HIGHEST PRIORITY)
Scope: Core detection functions and default return behavior
Basic Tests (000-099):
- Module import verification
- Registry container initialization
- Detector registration verification
DEFAULT RETURN BEHAVIOR TESTS (100-199, CRITICAL):
- DetectFailureActions.Default returns default with confidence 0.0
- DetectFailureActions.Error raises appropriate exceptions
- charset_on_detect_failure configuration behavior
- mimetype_on_detect_failure configuration behavior
- Mixed failure behaviors (charset defaults, mimetype errors)
- Empty content handling in both failure modes
- Failed detection with various default values
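Because this group is the top priority, a compact model of the pattern helps fix the expected semantics before the real tests are written. Everything below is a hypothetical stand-in. `DetectFailureActions` mirrors the enum named in this plan. The `detector` injection parameter, the `default` keyword, and the result type are assumptions made so the sketch runs without the package. The sketch encodes three rules from this plan:
- Under `Default`, failed detection returns the supplied default with confidence 0.0.
- Under `Error`, failed detection raises.
- Empty content short-circuits to UTF-8 with confidence 1.0 in either mode.

```python
import enum
from dataclasses import dataclass
from typing import Callable, Optional


class DetectFailureActions(enum.Enum):
    """Stand-in for the enum named in this plan; members assumed."""

    Default = enum.auto()
    Error = enum.auto()


@dataclass(frozen=True)
class CharsetResult:
    charset: Optional[str]
    confidence: float


@dataclass(frozen=True)
class Behaviors:
    charset_on_detect_failure: DetectFailureActions = DetectFailureActions.Default


class CharsetDetectFailure(Exception):
    """Stand-in for the package exception."""


def detect_charset_confidence(
    content: bytes,
    detector: Callable[[bytes], Optional[CharsetResult]],
    behaviors: Behaviors = Behaviors(),
    default: str = "utf-8",
) -> CharsetResult:
    """Apply the configured failure action when ``detector`` finds nothing."""
    if not content:
        return CharsetResult("utf-8", 1.0)  # empty content short-circuits
    result = detector(content)
    if result is not None:
        return result
    if behaviors.charset_on_detect_failure is DetectFailureActions.Error:
        raise CharsetDetectFailure("charset detection failed")
    return CharsetResult(default, 0.0)  # graceful degradation


def never_detects(content: bytes) -> Optional[CharsetResult]:
    return None


ERROR_BEHAVIORS = Behaviors(charset_on_detect_failure=DetectFailureActions.Error)


def test_100_default_action_returns_default_with_zero_confidence():
    result = detect_charset_confidence(b"\x81\x8d", never_detects)
    assert result == CharsetResult("utf-8", 0.0)


def test_101_custom_default_value_is_returned():
    result = detect_charset_confidence(b"\x81", never_detects, default="cp1252")
    assert result == CharsetResult("cp1252", 0.0)


def test_102_error_action_raises():
    try:
        detect_charset_confidence(b"\x81", never_detects, ERROR_BEHAVIORS)
    except CharsetDetectFailure:
        pass
    else:
        raise AssertionError("expected CharsetDetectFailure")


def test_103_empty_content_ignores_failure_action():
    for behaviors in (Behaviors(), ERROR_BEHAVIORS):
        result = detect_charset_confidence(b"", never_detects, behaviors)
        assert result == CharsetResult("utf-8", 1.0)


def test_104_successful_detection_is_passed_through():
    found = CharsetResult("latin-1", 0.9)
    assert detect_charset_confidence(b"\xe9", lambda _: found, ERROR_BEHAVIORS) is found
```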
Charset Detection Tests (200-299):
- detect_charset with UTF-8 content
- detect_charset with ASCII content (promotion to UTF-8)
- detect_charset with Latin-1 content
- detect_charset with malformed content
- detect_charset_confidence function behavior
- Empty content handling (returns UTF-8 with confidence 1.0)
- Supplement parameter usage
- Location parameter context
MIME Type Detection Tests (300-399):
- detect_mimetype with magic byte detection
- detect_mimetype with extension fallback
- detect_mimetype_confidence function behavior
- Empty content handling (returns text/plain with confidence 1.0)
- Charset parameter influence on MIME detection
- Binary content detection and classification
Registry System Tests (400-499):
- Detector registration and retrieval
- NotImplemented return handling for missing dependencies
- Detector ordering configuration via Behaviors
- Registry iteration and fallback behavior
- Custom detector registration
- Detector failure and recovery patterns
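The `NotImplemented` fallthrough and ordering rules can be tested against an injected registry. This sketch assumes three things: detectors are plain callables keyed by name, returning `NotImplemented` means an optional dependency is missing, and returning `None` means the detector ran but found nothing. `detect_via_registry` and the detector names are hypothetical. Passing the registry in as an argument, rather than mutating a global one, keeps each test isolated.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Callable, Optional

# A detector returns a charset name, None (ran but found nothing),
# or NotImplemented (its optional dependency is unavailable).
Detector = Callable[[bytes], Any]


def detect_via_registry(
    content: bytes, registry: Mapping[str, Detector], order: Sequence[str]
) -> Optional[str]:
    """Try detectors in the configured order; skip unavailable ones."""
    for name in order:
        detector = registry.get(name)
        if detector is None:
            continue  # name configured but not registered
        result = detector(content)
        if result is NotImplemented or result is None:
            continue  # fall through to the next detector
        return result
    return None


def missing_dependency(content: bytes) -> Any:
    return NotImplemented


def ascii_only(content: bytes) -> Any:
    return "ascii" if content.isascii() else None


def test_400_not_implemented_falls_through():
    registry = {"first": missing_dependency, "second": ascii_only}
    assert detect_via_registry(b"plain", registry, ("first", "second")) == "ascii"


def test_401_configured_order_wins():
    registry = {"a": lambda _: "from-a", "b": lambda _: "from-b"}
    assert detect_via_registry(b"x", registry, ("b", "a")) == "from-b"


def test_402_no_available_detector_returns_none():
    registry = {"first": missing_dependency, "second": ascii_only}
    order = ("first", "unknown", "second")
    assert detect_via_registry(b"\xff", registry, order) is None
```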
Integration Tests (500-599):
- Combined charset and MIME type detection workflows
- Context-aware detection with location hints
- Behavior configuration influence on detection
- Error recovery and fallback strategies
- Performance testing with large content
Windows Compatibility Tests (600-699):
- python-magic vs python-magic-bin MIME type differences
- Cross-platform magic byte interpretation
- Cygwin buffer handling for large content
- Platform-specific charset detection differences
Implementation Notes:
- Test all DetectFailureActions enum variants in isolation and combination
- Test default return behavior with various custom default values
- Validate confidence scoring for failure scenarios (must be 0.0)
- Mock detector registry for dependency injection testing
- Cross-platform testing considerations for magic libraries
- Property-based testing for detection determinism
test_400_inference
Scope: Context-aware inference functions
Basic Tests (000-099):
- Module import and function accessibility
Charset Inference Tests (100-199):
- infer_charset with HTTP Content-Type headers
- infer_charset with location extension hints
- infer_charset with charset supplement parameters
- infer_charset_confidence function behavior
- Context priority resolution (HTTP > location > content)
- Default parameter usage in inference
MIME Type and Charset Inference Tests (200-299):
- infer_mimetype_charset combined detection
- infer_mimetype_charset_confidence function behavior
- HTTP Content-Type parsing and validation
- Location-based inference precedence
- Supplement parameter handling
- Default value application
HTTP Content-Type Parsing Tests (300-399):
- Valid Content-Type header parsing
- Malformed Content-Type header handling
- Charset parameter extraction from headers
- MIME type parameter handling
- Case sensitivity in header parsing
- Missing or incomplete headers
Context Resolution Tests (400-499):
- Multiple context source priority handling
- Conflicting context resolution
- Context validation and sanitization
- Context-aware confidence scoring
- Error handling in context processing
Enhanced Default Behavior Tests (500-599):
- Custom charset_default and mimetype_default parameters
- Default behavior with inference failures
- Mixed default and error behaviors
- Context-aware default selection
Implementation Notes:
- Test HTTP Content-Type parsing with malformed headers
- Verify context priority: HTTP > location > content analysis
- Test inference with conflicting context indicators
- Default behavior testing with new parameter patterns
- Integration testing with complete inference workflows
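Here is a sketch of the HTTP > location > content priority. It uses the standard library's `email.message.Message` to parse the `Content-Type` header. That class's `get_content_charset()` returns the charset parameter in lower case, or `None` when the parameter is absent. `infer_charset` here is a hypothetical simplification with an assumed signature. The one-entry `EXTENSION_CHARSETS` table is purely illustrative.

```python
from email.message import Message
from pathlib import PurePath
from typing import Callable, Optional

# Illustrative extension hints (assumption, not the package's table).
EXTENSION_CHARSETS = {".json": "utf-8"}


def charset_from_content_type(header: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = header
    return message.get_content_charset()


def no_detection(content: bytes) -> Optional[str]:
    return None


def infer_charset(
    content: bytes,
    http_content_type: str = "",
    location: str = "",
    detect: Callable[[bytes], Optional[str]] = no_detection,
) -> Optional[str]:
    """Resolve a charset with priority HTTP header > location > content."""
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    if location:
        charset = EXTENSION_CHARSETS.get(PurePath(location).suffix.lower())
        if charset:
            return charset
    return detect(content)


def test_300_header_parsing():
    assert charset_from_content_type("text/html; charset=ISO-8859-1") == "iso-8859-1"
    assert charset_from_content_type('text/plain; charset="UTF-8"') == "utf-8"
    assert charset_from_content_type("text/plain") is None


def test_400_context_priority():
    header = "text/plain; charset=latin-1"
    assert infer_charset(b"", header, "data.json") == "latin-1"
    assert infer_charset(b"", "text/plain", "data.json") == "utf-8"
    assert infer_charset(b"", "not a header", "DATA.JSON") == "utf-8"
    detected = infer_charset(b"abc", location="notes.txt", detect=lambda _: "ascii")
    assert detected == "ascii"
```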
test_500_decoders
Scope: High-level decoding and integration functions
Basic Tests (000-099):
- Module import and function accessibility
High-Level Decode Tests (100-199):
- decode function with valid content and detection
- decode function with malformed content
- decode function with custom charset_default parameter
- decode function with custom mimetype_default parameter
- decode function with validation profile parameters
- decode function error handling and fallback
Default Parameter Tests (200-299):
- Custom default values in decode function
- Default behavior with detection failures
- Graceful degradation with default parameters
- Validation of default parameter precedence
- Error handling when defaults are insufficient
Integration Workflow Tests (300-399):
- Complete detection → validation → decode pipeline
- HTTP Content-Type integration in decode
- Location context usage in decode
- Supplement parameter propagation
- Behavior configuration effects on decode
Error Handling Tests (400-499):
- ContentDecodeFailure exception scenarios
- Decode error recovery with fallback charsets
- Validation failure handling in decode
- Exception chaining in decode failures
- Location context in error messages
Performance Tests (500-599):
- Large content decoding performance
- Memory usage with large content
- Decode timeout behavior (if applicable)
- Streaming decode considerations
Implementation Notes:
- Test new default parameter patterns comprehensively
- Integration testing with complete detection pipeline
- Error path testing with proper exception chaining
- Performance testing with various content sizes
- Validation profile integration testing
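Here is a sketch of the fallback-and-chaining behavior these tests should pin down. `decode` and `ContentDecodeFailure` below are hypothetical stand-ins with assumed signatures and message wording. Candidate charsets are tried in order. If all of them fail, the last `UnicodeDecodeError` is chained as `__cause__` and the location appears in the message.

```python
from collections.abc import Sequence
from typing import Optional


class ContentDecodeFailure(Exception):
    """Stand-in for the package exception; message wording is assumed."""

    def __init__(self, charsets: Sequence[str], location: Optional[str] = None):
        where = f" at {location}" if location else ""
        super().__init__(f"Could not decode content as {', '.join(charsets)}{where}.")
        self.charsets = tuple(charsets)
        self.location = location


def decode(
    content: bytes,
    charsets: Sequence[str] = ("utf-8",),
    location: Optional[str] = None,
) -> str:
    """Decode with the first charset that succeeds; chain the last failure."""
    last_error: Optional[UnicodeDecodeError] = None
    for charset in charsets:
        try:
            return content.decode(charset)
        except UnicodeDecodeError as exc:
            last_error = exc
    raise ContentDecodeFailure(charsets, location) from last_error


def test_100_decode_with_fallback():
    assert decode(b"caf\xc3\xa9") == "café"
    assert decode(b"caf\xe9", ("utf-8", "cp1252")) == "café"


def test_400_failure_chains_cause_and_location():
    try:
        decode(b"\xff\xfe\xfa", ("utf-8", "ascii"), location="blob.bin")
    except ContentDecodeFailure as failure:
        assert isinstance(failure.__cause__, UnicodeDecodeError)
        assert "blob.bin" in str(failure)
    else:
        raise AssertionError("expected ContentDecodeFailure")
```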
Test Data and Patterns
Content Patterns Module: tests/test_000_detextive/patterns.py
Provides curated byte sequences covering:
- Charset detection samples (UTF-8, ASCII, Latin-1, Windows-1252, malformed)
- MIME type detection samples (text, JSON, binary magic bytes)
- Line separator patterns (Unix, Windows, Mac, mixed)
- Content length patterns (empty, minimal, short, long)
- Validation patterns (reasonable text, control characters, binary)
- Error condition patterns (undetectable content, decode failures)
- Windows compatibility patterns (platform-specific detection differences)
Test Fixtures:
- Behaviors configurations for various testing scenarios
- Mock detector functions for registry testing
- Cross-platform expected outcomes
- Performance benchmarking baselines
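Here is a hypothetical excerpt showing how the patterns module might organize its byte sequences. The constant names are invented for illustration. The byte values themselves are standard: the PNG file signature, a PDF header, and a UTF-8 lead byte followed by an invalid continuation byte.

```python
"""Hypothetical excerpt of tests/test_000_detextive/patterns.py."""

# Charset samples: the same word in different encodings.
UTF8_CAFE = "café".encode("utf-8")  # b"caf\xc3\xa9"
LATIN1_CAFE = "café".encode("latin-1")  # b"caf\xe9"
ASCII_PLAIN = b"plain ascii text"
MALFORMED_UTF8 = b"\xc3\x28"  # lead byte followed by a non-continuation byte

# MIME samples: well-known file signatures and textual content.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PDF_HEADER = b"%PDF-1.7\n"
JSON_OBJECT = b'{"key": "value"}'

# Line separator samples.
LF_LINES = b"one\ntwo\n"
CRLF_LINES = b"one\r\ntwo\r\n"
CR_LINES = b"one\rtwo\r"
MIXED_LINES = b"one\r\ntwo\nthree\r"

# Content length samples.
EMPTY = b""
MINIMAL = b"a"
LONG = b"x" * 100_000
```

Keeping every sample in one module, as the plan requires, means a detection test reads as `detect(patterns.CRLF_LINES)` rather than as an opaque byte literal.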
Cross-Platform Testing Strategy
Windows Compatibility:
- python-magic vs python-magic-bin detection differences
- Cygwin buffer handling validation
- Platform-specific line separator handling
- Unicode handling across platforms
Testing Approach:
- Platform variant patterns for content with different expected outcomes
- Conditional test expectations based on platform
- Mock detector behavior for consistent cross-platform testing
- Performance considerations for platform-specific libraries
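One way to express platform variant patterns is a table of accepted outcomes keyed by `sys.platform` prefix, with a `default` entry. The helper and table below are hypothetical. The MIME values are placeholders, not documented python-magic or python-magic-bin results; real entries would come from detector output observed on each platform. The nativize expectation assumes LF is mapped to `os.linesep`.

```python
import os
import sys
from collections.abc import Mapping

# Placeholder table: accepted MIME outcomes per platform prefix.
# These values are illustrative, not observed python-magic results.
PLAIN_TEXT_OUTCOMES: Mapping[str, frozenset[str]] = {
    "win32": frozenset({"text/plain", "text/x-placeholder"}),
    "cygwin": frozenset({"text/plain", "text/x-placeholder"}),
    "default": frozenset({"text/plain"}),
}


def expected_for_platform(
    variants: Mapping[str, frozenset[str]], platform: str = sys.platform
) -> frozenset[str]:
    """Return accepted outcomes for ``platform``, else the 'default' entry."""
    for prefix, outcomes in variants.items():
        if prefix != "default" and platform.startswith(prefix):
            return outcomes
    return variants["default"]


def expected_nativized(text: str) -> str:
    """Expected nativize output, assuming LF is mapped to os.linesep."""
    return text.replace("\n", os.linesep)


def test_600_detected_type_is_accepted_on_this_platform():
    detected = "text/plain"  # would come from the detector under test
    assert detected in expected_for_platform(PLAIN_TEXT_OUTCOMES)
```

Using a set of accepted outcomes, rather than a single expected value, lets one test body run unchanged on every platform.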
Implementation Priorities
Priority 1 (CRITICAL):
- Default return behavior patterns (DetectFailureActions enum)
- Exception location parameter handling
- Default parameter paths in decoding functions
Priority 2 (HIGH):
- Charset codec edge cases and specifier handling
- Enhanced inference functions with context awareness
Priority 3 (MEDIUM):
- Text validation edge cases
- Line separator detection edge cases
- MIME type detection edge cases
Success Metrics
Functional Validation:
- All DetectFailureActions enum variants tested
- Default return behavior patterns comprehensively covered
- Exception handling with location parameters complete
- Enhanced inference functions tested
- Cross-platform compatibility patterns established
Quality Assurance:
- Coverage-gap-first methodology applied
- Test data centralized in patterns module
- Clean test structure with numbered organization
- Cross-platform compatibility validated
Implementation Notes
Dependencies Requiring Injection:
- OS charset detection for platform testing
- Magic library detection for cross-platform testing
- Registry detector functions for failure scenario testing
Filesystem Operations:
- All test content provided via patterns module (no filesystem reads)
- Location context testing with mock paths
- Cross-platform path handling validation
External Services:
- No external network testing required
- All magic byte detection with local libraries
- HTTP Content-Type testing with direct header values (no mocking needed)
Architectural Considerations:
- Immutable object testing requires constructor-based injection
- Registry testing through public API detector configuration
- Behavior configuration testing via Behaviors dataclass
- Exception testing through expected failure scenarios
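Constructor-based injection for immutable objects can be sketched with a frozen dataclass. `CharsetService` is a hypothetical example, not a package class. A frozen instance rejects attribute patching, so a test supplies its fake through the constructor or derives a modified copy with `dataclasses.replace`.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class CharsetService:
    """Hypothetical immutable service; collaborators arrive via the constructor."""

    os_charset_provider: Callable[[], str]

    def default_charset(self) -> str:
        return self.os_charset_provider().lower()


def test_inject_fake_provider_through_constructor():
    service = CharsetService(os_charset_provider=lambda: "CP1252")
    assert service.default_charset() == "cp1252"


def test_frozen_instance_requires_replace_not_patching():
    service = CharsetService(os_charset_provider=lambda: "UTF-8")
    try:
        service.os_charset_provider = lambda: "ascii"  # type: ignore[misc]
    except FrozenInstanceError:
        pass
    else:
        raise AssertionError("frozen instance accepted attribute assignment")
    patched = replace(service, os_charset_provider=lambda: "ASCII")
    assert patched.default_charset() == "ascii"
    assert service.default_charset() == "utf-8"  # original is untouched
```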
CRITICAL Testing Focus: Testing the default return behavior pattern (DetectFailureActions enum) is essential for verifying system reliability under the new graceful degradation capabilities.