Test Plan: Version 2.0 Complete Test Suite

Testing Philosophy

Coverage-Gap-First Approach: Use doctests for examples and happy paths, pytest for coverage gaps and edge cases only.

Focus Areas:
- Default return behavior patterns (DetectFailureActions enum)
- Exception location parameter handling
- Enhanced detection and inference capabilities
- Cross-platform compatibility considerations

Windows Compatibility Considerations:
- python-magic vs python-magic-bin MIME type detection differences
- Cross-platform line separator handling
- Cygwin buffer issue mitigations

Test Strategy Overview

Coverage-Gap-First Approach:
- Target specific uncovered lines identified in coverage analysis
- Replace existing commented-out tests with minimal effective coverage
- Focus on default return behavior patterns (DetectFailureActions enum)
- Essential edge cases and error paths only
- Avoid comprehensive testing that duplicates doctest coverage

Test Module Organization:
- test_100_nomina: Type aliases and common types (minimal - may skip)
- test_110_exceptions: Exception hierarchy and location parameter handling
- test_120_core: Core types, enums, and behaviors
- test_200_lineseparators: Line separator detection and normalization
- test_210_mimetypes: MIME type utility functions
- test_220_charsets: Charset detection utilities and codec handling
- test_300_validation: Text validation and reasonableness checking
- test_310_detectors: Core detection functions (highest priority)
- test_400_inference: Context-aware inference functions
- test_500_decoders: High-level decoding and integration functions

Test Module Specifications

test_100_nomina (Optional)

Scope: Type aliases and common definitions

Assessment: Minimal testing needed - type aliases don’t require extensive testing. May skip this module unless coverage tools require it.

Basic Tests (000-099):
- Import verification
- Type alias accessibility

test_110_exceptions

Scope: Exception hierarchy and location parameter handling

Basic Tests (000-099):
- Exception hierarchy verification
- Import and inheritance structure validation

CharsetDetectFailure Tests (100-119):
- Construction with and without location parameter
- String location message formatting
- pathlib.Path location handling
- Absential location handling (__.absent)

CharsetInferFailure Tests (120-139):
- Construction with and without location parameter
- Location context in inference failure messages

MimetypeDetectFailure Tests (140-159):
- Construction with and without location parameter
- Various location types (str, Path) in messages

ContentDecodeFailure Tests (160-179):
- Construction with charset and location details
- Exception chaining preservation

Exception Hierarchy Tests (180-199):
- Omniexception base class behavior
- Omnierror inheritance and catching patterns
- Multiple inheritance with built-in exception types
- Package-wide exception catching via Omnierror

Implementation Notes:
- Test all exception types with both present and absent location parameters
- Verify proper message formatting includes location when provided
- Test exception chaining with ‘from’ clauses
- Cross-platform path handling in location parameters
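The location-parameter cases above could be exercised roughly as follows. This is a minimal sketch, not the package's code: the `absent` sentinel, the `Omnierror` and `CharsetDetectFailure` classes, the constructor signature, and the message wording are all local stand-ins and assumptions. The tests avoid pytest helpers so that the sketch runs standalone.

```python
import pathlib


class _Absent:
    ''' Hypothetical stand-in for the package's ``__.absent`` sentinel. '''


absent = _Absent()


class Omnierror(Exception):
    ''' Stand-in for the package-wide error base class. '''


class CharsetDetectFailure(Omnierror, ValueError):
    ''' Stand-in; the real signature and message wording may differ. '''

    def __init__(self, location=absent):
        message = 'Could not detect character set'
        if location is not absent:
            message = f"{message} for content at '{location}'"
        super().__init__(f"{message}.")


def test_100_construction_without_location():
    assert str(CharsetDetectFailure()) == 'Could not detect character set.'


def test_101_string_location_in_message():
    assert "'data/sample.txt'" in str(CharsetDetectFailure('data/sample.txt'))


def test_102_path_location_in_message():
    # Compare against the formatted path so the check holds on any platform.
    path = pathlib.Path('data') / 'sample.txt'
    assert f"'{path}'" in str(CharsetDetectFailure(path))


def test_180_catchable_via_base_and_builtin():
    for catcher in (Omnierror, ValueError):
        try:
            raise CharsetDetectFailure('x')
        except catcher as exc:
            assert isinstance(exc, CharsetDetectFailure)
```

Building the expected path text from the `Path` object itself, rather than hard-coding a separator, keeps the assertion valid on both Windows and POSIX.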

test_120_core

Current Coverage: 100% - Maintain coverage while expanding tests

Basic Tests (000-099):
- Module import verification
- Constant value validation (CHARSET_DEFAULT, MIMETYPE_DEFAULT)

Enum Tests (100-199):
- BehaviorTristate enum values and behavior
- CodecSpecifiers enum values and usage
- DetectFailureActions enum values and semantics
- Enum string representations and comparisons

Behaviors Configuration Tests (200-299):
- Default Behaviors instance validation
- Custom Behaviors instance creation
- Field defaults and validation
- Detector order sequence handling
- Tristate behavior configurations

Result Types Tests (300-399):
- CharsetResult construction and field access
- MimetypeResult construction and field access
- Confidence value validation (0.0 to 1.0 range)
- Optional charset handling in CharsetResult

Confidence Calculation Tests (400-499):
- confidence_from_bytes_quantity with various content lengths
- Confidence divisor behavior testing
- Edge cases: empty content, very long content
- Custom behavior configuration effects

Implementation Notes:
- Test all enum values and their auto-generated identities
- Test confidence calculation formula and edge cases
- Validate behavior configuration precedence and defaults
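The confidence calculation edge cases can be illustrated with a stand-in. The linear formula, the `divisor` parameter, and its default of 1024 below are assumptions made for illustration only; the plan does not state the actual formula behind `confidence_from_bytes_quantity`.

```python
def confidence_from_bytes_quantity(content: bytes, divisor: int = 1024) -> float:
    ''' Stand-in: confidence grows linearly with length and is capped at 1.0.

        The formula and the default divisor are assumptions.
    '''
    return min(1.0, len(content) / divisor)


def test_400_empty_content_has_zero_confidence():
    assert confidence_from_bytes_quantity(b'') == 0.0


def test_401_confidence_scales_with_length():
    assert confidence_from_bytes_quantity(b'x' * 512) == 0.5


def test_402_very_long_content_is_capped():
    assert confidence_from_bytes_quantity(b'x' * 1_000_000) == 1.0


def test_403_custom_divisor_changes_scale():
    assert confidence_from_bytes_quantity(b'x' * 5, divisor=10) == 0.5
```

Whatever the real formula is, the same four probes (empty, mid-range, saturated, custom divisor) should also confirm that results stay within the 0.0 to 1.0 range.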

test_200_lineseparators

Scope: Line separator detection and normalization

Basic Tests (000-099):
- Enum structure and values validation
- Import accessibility verification

Detection Tests (100-199):
- Unix LF detection from byte content
- Windows CRLF detection from byte content
- Classic Mac CR detection from byte content
- Mixed line ending detection (first-wins behavior)
- Empty content detection (returns None)
- Content without line endings (returns None)
- Integer sequence input handling
- Detection limit parameter behavior

Normalization Tests (200-299):
- normalize_universal: all endings to LF conversion
- normalize_universal: content without endings (unchanged)
- normalize_universal: empty content handling
- Individual enum normalize methods (CR, CRLF, LF)
- Preserve content that’s already normalized

Platform Conversion Tests (300-399):
- nativize method behavior per platform
- Unix LF to platform-specific conversion
- Edge cases in platform conversion
- Content without line endings in nativize

Edge Case Tests (400-499):
- Very long content with mixed endings
- Consecutive line separators
- Line separators at content boundaries
- Invalid or malformed line ending sequences

Windows Compatibility Tests (500-599):
- CRLF detection accuracy on Windows
- Cross-platform nativize behavior consistency
- Large content handling (Cygwin buffer considerations)

Implementation Notes:
- Use content patterns for consistent test data
- Test detection precedence (which separator wins in mixed content)
- Verify immutability of enum instances
- Cross-platform testing considerations for nativize behavior
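The detection and normalization expectations listed above (first-wins, `None` for empty or separator-free content, integer sequence input, detection limit) can be pinned down against a small stand-in. The `LineSeparators` enum, its `detect_bytes` method name, and the default limit below are hypothetical and only mirror the behaviors this plan describes.

```python
import enum


class LineSeparators(enum.Enum):
    ''' Stand-in enum; member values and method names are assumptions. '''

    CR = '\r'
    CRLF = '\r\n'
    LF = '\n'

    @classmethod
    def detect_bytes(cls, content, limit=1024):
        ''' Returns the first separator found within the limit, else None. '''
        sample = bytes(content[:limit])  # also accepts a sequence of integers
        for index, byte in enumerate(sample):
            if byte == 0x0A:
                return cls.LF
            if byte == 0x0D:
                # A CR that ends the sample is reported as CR in this sketch.
                if sample[index + 1:index + 2] == b'\n':
                    return cls.CRLF
                return cls.CR
        return None

    @classmethod
    def normalize_universal(cls, text):
        ''' Converts CRLF and CR endings to LF. '''
        return text.replace('\r\n', '\n').replace('\r', '\n')


def test_100_detects_each_separator():
    assert LineSeparators.detect_bytes(b'a\nb') is LineSeparators.LF
    assert LineSeparators.detect_bytes(b'a\r\nb') is LineSeparators.CRLF
    assert LineSeparators.detect_bytes(b'a\rb') is LineSeparators.CR


def test_110_mixed_content_first_wins():
    assert LineSeparators.detect_bytes(b'a\rb\nc\r\n') is LineSeparators.CR


def test_120_no_separator_returns_none():
    assert LineSeparators.detect_bytes(b'') is None
    assert LineSeparators.detect_bytes(b'abc') is None


def test_130_integer_sequence_and_limit():
    assert LineSeparators.detect_bytes([0x61, 0x0D, 0x0A]) is LineSeparators.CRLF
    assert LineSeparators.detect_bytes(b'abc\n', limit=3) is None


def test_200_normalize_universal():
    assert LineSeparators.normalize_universal('a\r\nb\rc\nd') == 'a\nb\nc\nd'
    assert LineSeparators.normalize_universal('abc') == 'abc'
```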

test_210_mimetypes

Scope: MIME type utility functions

Basic Tests (000-099):
- Module import and function accessibility

Textual MIME Type Tests (100-199):
- is_textual_mimetype with text/* prefixes
- Known textual application types (json, xml, javascript, yaml)
- Textual suffixes (+json, +xml, +yaml, +toml)
- Non-textual types rejection (image/*, video/*, audio/*)
- Empty and malformed MIME type handling
- Case sensitivity in MIME type evaluation

Edge Case Tests (200-299):
- MIME types with parameters (text/plain; charset=utf-8)
- Vendor-specific MIME types (application/vnd.*)
- Custom and unknown MIME types
- Very long MIME type strings
- MIME types with unusual characters

Implementation Notes:
- Comprehensive coverage of textual vs non-textual classification
- Test MIME type parameter handling if applicable
- Edge cases for malformed input
- Performance testing with large MIME type lists
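A stand-in classifier makes the textual versus non-textual cases concrete. The function body below is an assumption: in particular, stripping parameters and lowercasing before classification are design choices of this sketch, and the real `is_textual_mimetype` may treat those inputs differently.

```python
_TEXTUAL_APPLICATION_TYPES = frozenset((
    'application/json',
    'application/xml',
    'application/javascript',
    'application/yaml',
))
_TEXTUAL_SUFFIXES = ('+json', '+xml', '+yaml', '+toml')


def is_textual_mimetype(mimetype: str) -> bool:
    ''' Stand-in classifier; parameter and case handling are assumptions. '''
    essence = mimetype.split(';', 1)[0].strip().lower()
    if '/' not in essence:
        return False  # empty or malformed input
    if essence.startswith('text/'):
        return True
    if essence in _TEXTUAL_APPLICATION_TYPES:
        return True
    return essence.endswith(_TEXTUAL_SUFFIXES)


def test_100_textual_types_accepted():
    for mimetype in (
        'text/plain', 'text/html', 'application/json',
        'application/vnd.api+json', 'image/svg+xml',
    ):
        assert is_textual_mimetype(mimetype), mimetype


def test_110_non_textual_types_rejected():
    for mimetype in (
        'image/png', 'video/mp4', 'audio/mpeg', 'application/octet-stream',
    ):
        assert not is_textual_mimetype(mimetype), mimetype


def test_120_empty_and_malformed_rejected():
    assert not is_textual_mimetype('')
    assert not is_textual_mimetype('plain')


def test_200_parameters_and_case():
    assert is_textual_mimetype('text/plain; charset=utf-8')
    assert is_textual_mimetype('TEXT/PLAIN')
```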

test_220_charsets

Scope: Charset detection utilities and codec handling

Basic Tests (000-099):
- Module import verification
- Function accessibility validation

OS Charset Detection Tests (100-199):
- discover_os_charset_default function behavior
- Cross-platform charset default handling
- Caching behavior for OS charset detection
- Environment variable influence testing

Codec Resolution Tests (200-299):
- CodecSpecifiers enum handling in attempt_decodes
- OsDefault codec specifier behavior
- PythonDefault codec specifier behavior
- UserSupplement codec specifier behavior
- FromInference codec specifier behavior
- Invalid codec name handling

Trial Decode Tests (300-399):
- attempt_decodes with valid charset inference
- attempt_decodes with malformed content
- attempt_decodes with unsupported charset names
- trial_decode_as_confident function behavior
- Confidence calculation in trial decoding
- Exception handling in decode failures

Charset Promotion Tests (400-499):
- ASCII to UTF-8 promotion behavior
- UTF-8 to UTF-8-sig promotion behavior
- Custom promotion mapping handling
- Promotion precedence and conflict resolution

Implementation Notes:
- Mock environment for OS charset testing
- Test all CodecSpecifiers enum variants
- Verify confidence calculation accuracy
- Cross-platform charset handling differences
- Error path testing for decode failures
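The trial-decode error paths and the mocked OS charset lookup might look like the sketch below. Both functions are stand-ins with assumed signatures: the real `attempt_decodes` works with CodecSpecifiers and behaviors rather than a plain list of names, and the real `discover_os_charset_default` may not consult `locale` at all. Only `codecs.lookup`, `locale.getpreferredencoding`, and `unittest.mock.patch` are actual standard library calls.

```python
import codecs
import locale
from unittest import mock


def discover_os_charset_default():
    ''' Stand-in: asks the locale module for the preferred encoding. '''
    return locale.getpreferredencoding(False)


def attempt_decodes(content, charsets):
    ''' Stand-in: returns (text, charset) for the first candidate that works. '''
    failures = []
    for charset in charsets:
        try:
            codecs.lookup(charset)
        except LookupError:  # unsupported or invalid codec name
            failures.append(charset)
            continue
        try:
            return content.decode(charset), charset
        except UnicodeDecodeError:
            failures.append(charset)
    raise ValueError(f"No candidate charset decoded the content: {failures}")


def test_100_os_charset_default_is_mockable():
    with mock.patch('locale.getpreferredencoding', return_value='cp1252'):
        assert discover_os_charset_default() == 'cp1252'


def test_300_falls_through_to_working_charset():
    content = 'caf\u00e9'.encode('latin-1')
    assert attempt_decodes(content, ['utf-8', 'latin-1']) == ('caf\u00e9', 'latin-1')


def test_310_unsupported_charset_name_is_skipped():
    assert attempt_decodes(b'abc', ['no-such-codec', 'ascii']) == ('abc', 'ascii')


def test_320_all_candidates_failing_raises():
    try:
        attempt_decodes(b'\xff', ['ascii', 'utf-8'])
    except ValueError:
        pass
    else:
        raise AssertionError('expected ValueError')
```

Patching the lookup keeps the OS charset tests deterministic across platforms, which is the point of the "mock environment" note above.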

test_300_validation

Scope: Text validation and reasonableness checking

Basic Tests (000-099):
- Module import and function accessibility

Text Validation Profile Tests (100-199):
- Default profile behavior and validation
- Custom profile creation and application
- Profile parameter validation
- Immutable profile handling

Text Reasonableness Tests (200-299):
- is_valid_text with normal textual content
- is_valid_text with control character heavy content
- is_valid_text with whitespace-only content
- is_valid_text with binary data rejection
- Unicode normalization considerations
- Very long text validation performance

BOM Handling Tests (300-399):
- BOM detection and handling in validation
- UTF-8, UTF-16, UTF-32 BOM recognition
- BOM removal in validation process
- Invalid BOM sequence handling

Character Ratio Tests (400-499):
- Character ratio calculations at boundaries
- Threshold validation for ratio limits
- Edge cases with minimal content
- Ratio calculation with various character sets

Implementation Notes:
- Test validation profiles with extreme content
- BOM handling across different Unicode encodings
- Character ratio boundary condition testing
- Performance considerations with large text
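BOM recognition and character ratio boundaries can be probed with helpers like these. `detect_bom` and `control_ratio` are hypothetical names, and the choice to exempt tab, LF, and CR from the control-character count is an assumption of the sketch; the BOM constants come from the standard `codecs` module.

```python
import codecs
import unicodedata

# Longer marks first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = (
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
)


def detect_bom(content):
    ''' Returns a label for the leading BOM, or None if there is none. '''
    for bom, label in _BOMS:
        if content.startswith(bom):
            return label
    return None


def control_ratio(text):
    ''' Fraction of characters that are controls other than tab, LF, CR. '''
    if not text:
        return 0.0
    controls = sum(
        1 for character in text
        if unicodedata.category(character) == 'Cc' and character not in '\t\n\r'
    )
    return controls / len(text)


def test_300_bom_recognition():
    assert detect_bom(codecs.BOM_UTF8 + b'hi') == 'utf-8-sig'
    assert detect_bom(codecs.BOM_UTF16_LE + b'h\x00') == 'utf-16-le'
    assert detect_bom(codecs.BOM_UTF32_LE + b'h\x00\x00\x00') == 'utf-32-le'
    assert detect_bom(b'hi') is None


def test_400_control_ratio_boundaries():
    assert control_ratio('') == 0.0
    assert control_ratio('line one\n\tline two\r\n') == 0.0
    assert control_ratio('a\x00b\x01') == 0.5
    assert control_ratio('\x00') == 1.0
```

The ordering comment matters for the "Invalid BOM sequence handling" case: checking the UTF-16 LE mark first would misreport UTF-32 LE content.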

test_310_detectors (HIGHEST PRIORITY)

Scope: Core detection functions and default return behavior

Basic Tests (000-099):
- Module import verification
- Registry container initialization
- Detector registration verification

DEFAULT RETURN BEHAVIOR TESTS (100-199) - CRITICAL:
- DetectFailureActions.Default returns default with confidence 0.0
- DetectFailureActions.Error raises appropriate exceptions
- charset_on_detect_failure configuration behavior
- mimetype_on_detect_failure configuration behavior
- Mixed failure behaviors (charset defaults, mimetype errors)
- Empty content handling in both failure modes
- Failed detection with various default values

Charset Detection Tests (200-299):
- detect_charset with UTF-8 content
- detect_charset with ASCII content (promotion to UTF-8)
- detect_charset with Latin-1 content
- detect_charset with malformed content
- detect_charset_confidence function behavior
- Empty content handling (returns UTF-8 with confidence 1.0)
- Supplement parameter usage
- Location parameter context

MIME Type Detection Tests (300-399):
- detect_mimetype with magic byte detection
- detect_mimetype with extension fallback
- detect_mimetype_confidence function behavior
- Empty content handling (returns text/plain with confidence 1.0)
- Charset parameter influence on MIME detection
- Binary content detection and classification

Registry System Tests (400-499):
- Detector registration and retrieval
- NotImplemented return handling for missing dependencies
- Detector ordering configuration via Behaviors
- Registry iteration and fallback behavior
- Custom detector registration
- Detector failure and recovery patterns

Integration Tests (500-599):
- Combined charset and MIME type detection workflows
- Context-aware detection with location hints
- Behavior configuration influence on detection
- Error recovery and fallback strategies
- Performance testing with large content

Windows Compatibility Tests (600-699):
- python-magic vs python-magic-bin MIME type differences
- Cross-platform magic byte interpretation
- Cygwin buffer handling for large content
- Platform-specific charset detection differences

Implementation Notes:
- Test all DetectFailureActions enum variants in isolation and combination
- Test default return behavior with various custom default values
- Validate confidence scoring for failure scenarios (must be 0.0)
- Mock detector registry for dependency injection testing
- Cross-platform testing considerations for magic libraries
- Property-based testing for detection determinism
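The critical default-return cases reduce to a small pattern: when every detector fails, `Default` yields the default value with confidence 0.0 and `Error` raises. The sketch below shows that pattern with mock detectors. Every class and function in it is a local stand-in; the names follow this plan, but the signatures (such as passing `detectors` directly) are assumptions rather than the package's API.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class DetectFailureActions(enum.Enum):
    Default = enum.auto()
    Error = enum.auto()


class CharsetDetectFailure(Exception):
    ''' Stand-in exception. '''


@dataclass(frozen=True)
class CharsetResult:
    charset: Optional[str]
    confidence: float


@dataclass(frozen=True)
class Behaviors:
    charset_on_detect_failure: DetectFailureActions = DetectFailureActions.Default


def detect_charset_confidence(
    content, detectors, behaviors=Behaviors(), default='utf-8'
):
    ''' Stand-in: tries detectors in order, then applies the failure action. '''
    for detector in detectors:
        result = detector(content)
        if result is NotImplemented:  # dependency not installed
            continue
        if result is not None:
            return result
    if behaviors.charset_on_detect_failure is DetectFailureActions.Error:
        raise CharsetDetectFailure('Could not detect character set.')
    return CharsetResult(charset=default, confidence=0.0)


# Mock detectors for registry-style injection.
def unavailable(content): return NotImplemented
def failing(content): return None
def succeeding(content): return CharsetResult('utf-8', 0.9)


def test_100_default_action_returns_default_with_zero_confidence():
    result = detect_charset_confidence(b'\x00\x01', [failing], default='latin-1')
    assert result == CharsetResult('latin-1', 0.0)


def test_110_error_action_raises():
    behaviors = Behaviors(charset_on_detect_failure=DetectFailureActions.Error)
    try:
        detect_charset_confidence(b'\x00\x01', [failing], behaviors)
    except CharsetDetectFailure:
        pass
    else:
        raise AssertionError('expected CharsetDetectFailure')


def test_400_not_implemented_falls_through_to_next_detector():
    result = detect_charset_confidence(b'abc', [unavailable, succeeding])
    assert result.confidence == 0.9
```

In the actual suite, `pytest.raises` would be the natural replacement for the try/except/else block.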

test_400_inference

Scope: Context-aware inference functions

Basic Tests (000-099):
- Module import and function accessibility

Charset Inference Tests (100-199):
- infer_charset with HTTP Content-Type headers
- infer_charset with location extension hints
- infer_charset with charset supplement parameters
- infer_charset_confidence function behavior
- Context priority resolution (HTTP > location > content)
- Default parameter usage in inference

MIME Type and Charset Inference Tests (200-299):
- infer_mimetype_charset combined detection
- infer_mimetype_charset_confidence function behavior
- HTTP Content-Type parsing and validation
- Location-based inference precedence
- Supplement parameter handling
- Default value application

HTTP Content-Type Parsing Tests (300-399):
- Valid Content-Type header parsing
- Malformed Content-Type header handling
- Charset parameter extraction from headers
- MIME type parameter handling
- Case sensitivity in header parsing
- Missing or incomplete headers

Context Resolution Tests (400-499):
- Multiple context source priority handling
- Conflicting context resolution
- Context validation and sanitization
- Context-aware confidence scoring
- Error handling in context processing

Enhanced Default Behavior Tests (500-599):
- Custom charset_default and mimetype_default parameters
- Default behavior with inference failures
- Mixed default and error behaviors
- Context-aware default selection

Implementation Notes:
- Test HTTP Content-Type parsing with malformed headers
- Verify context priority: HTTP > location > content analysis
- Test inference with conflicting context indicators
- Default behavior testing with new parameter patterns
- Integration testing with complete inference workflows
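Header parsing and the HTTP > location > content priority can be sketched with the standard library's `email.message.Message`, which parses Content-Type parameters. The helper names `parse_http_content_type` and `resolve_by_priority` are hypothetical, and returning `(None, None)` for malformed headers is a choice of this sketch, not documented package behavior.

```python
from email.message import Message


def parse_http_content_type(header):
    ''' Returns (mimetype, charset); both are None for unusable headers. '''
    if not header or '/' not in header.split(';', 1)[0]:
        return None, None
    message = Message()
    message['Content-Type'] = header
    return message.get_content_type(), message.get_content_charset()


def resolve_by_priority(from_http, from_location, from_content, default):
    ''' Picks the first available source: HTTP, then location, then content. '''
    for candidate in (from_http, from_location, from_content):
        if candidate is not None:
            return candidate
    return default


def test_300_valid_header():
    assert parse_http_content_type('text/html; charset=UTF-8') == (
        'text/html', 'utf-8')


def test_310_header_without_charset():
    assert parse_http_content_type('application/json') == (
        'application/json', None)


def test_320_malformed_or_missing_header():
    assert parse_http_content_type('garbage') == (None, None)
    assert parse_http_content_type('') == (None, None)
    assert parse_http_content_type(None) == (None, None)


def test_400_http_wins_over_location_and_content():
    _, charset = parse_http_content_type('text/plain; charset=iso-8859-1')
    assert resolve_by_priority(charset, 'utf-8', 'ascii', 'utf-8') == 'iso-8859-1'


def test_410_falls_back_in_order():
    assert resolve_by_priority(None, 'utf-16', 'ascii', 'utf-8') == 'utf-16'
    assert resolve_by_priority(None, None, 'ascii', 'utf-8') == 'ascii'
    assert resolve_by_priority(None, None, None, 'utf-8') == 'utf-8'
```

Because the header values are passed in directly, no HTTP mocking is needed for these cases.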

test_500_decoders

Scope: High-level decoding and integration functions

Basic Tests (000-099):
- Module import and function accessibility

High-Level Decode Tests (100-199):
- decode function with valid content and detection
- decode function with malformed content
- decode function with custom charset_default parameter
- decode function with custom mimetype_default parameter
- decode function with validation profile parameters
- decode function error handling and fallback

Default Parameter Tests (200-299):
- Custom default values in decode function
- Default behavior with detection failures
- Graceful degradation with default parameters
- Validation of default parameter precedence
- Error handling when defaults are insufficient

Integration Workflow Tests (300-399):
- Complete detection → validation → decode pipeline
- HTTP Content-Type integration in decode
- Location context usage in decode
- Supplement parameter propagation
- Behavior configuration effects on decode

Error Handling Tests (400-499):
- ContentDecodeFailure exception scenarios
- Decode error recovery with fallback charsets
- Validation failure handling in decode
- Exception chaining in decode failures
- Location context in error messages

Performance Tests (500-599):
- Large content decoding performance
- Memory usage with large content
- Decode timeout behavior (if applicable)
- Streaming decode considerations

Implementation Notes:
- Test new default parameter patterns comprehensively
- Integration testing with complete detection pipeline
- Error path testing with proper exception chaining
- Performance testing with various content sizes
- Validation profile integration testing
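Exception chaining in decode failures is checked through `__cause__`, which Python sets when an exception is raised with a `from` clause. The `decode` function and `ContentDecodeFailure` class below are simplified stand-ins with assumed signatures; the real `decode` performs detection and validation that this sketch omits.

```python
class ContentDecodeFailure(Exception):
    ''' Stand-in; constructor arguments and wording are assumptions. '''

    def __init__(self, charset, location=None):
        message = f"Could not decode content with character set '{charset}'"
        if location is not None:
            message = f"{message} at '{location}'"
        super().__init__(f"{message}.")
        self.charset = charset


def decode(content, charset, location=None):
    ''' Stand-in: decodes with one charset and chains any decode error. '''
    try:
        return content.decode(charset)
    except UnicodeDecodeError as exc:
        raise ContentDecodeFailure(charset, location) from exc


def test_100_valid_content_decodes():
    assert decode('caf\u00e9'.encode('utf-8'), 'utf-8') == 'caf\u00e9'


def test_400_failure_preserves_cause_and_location():
    try:
        decode(b'\xff\xfe\xfd', 'utf-8', location='data/sample.bin')
    except ContentDecodeFailure as exc:
        # The original UnicodeDecodeError stays reachable for diagnostics.
        assert isinstance(exc.__cause__, UnicodeDecodeError)
        assert exc.charset == 'utf-8'
        assert "'data/sample.bin'" in str(exc)
    else:
        raise AssertionError('expected ContentDecodeFailure')
```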

Test Data and Patterns

Content Patterns Module: tests/test_000_detextive/patterns.py

Provides curated byte sequences covering:
- Charset detection samples (UTF-8, ASCII, Latin-1, Windows-1252, malformed)
- MIME type detection samples (text, JSON, binary magic bytes)
- Line separator patterns (Unix, Windows, Mac, mixed)
- Content length patterns (empty, minimal, short, long)
- Validation patterns (reasonable text, control characters, binary)
- Error condition patterns (undetectable content, decode failures)
- Windows compatibility patterns (platform-specific detection differences)

Test Fixtures:
- Behaviors configurations for various testing scenarios
- Mock detector functions for registry testing
- Cross-platform expected outcomes
- Performance benchmarking baselines
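A patterns module along these lines could be as simple as a set of named byte constants. The excerpt below is illustrative only: the constant names and sample values are hypothetical and do not describe the contents of the actual patterns.py.

```python
# Hypothetical excerpt of a patterns module; names and samples are illustrative.

# Charset detection samples.
UTF8_BASIC = 'caf\u00e9 \u2615'.encode('utf-8')
ASCII_BASIC = b'Hello, world!'
LATIN1_BASIC = 'caf\u00e9'.encode('latin-1')
MALFORMED_UTF8 = b'\xc3\x28'  # lead byte followed by a non-continuation byte

# MIME type detection samples.
JSON_BASIC = b'{"key": "value"}'
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # the 8-byte PNG file signature

# Line separator patterns.
LINES_UNIX = b'one\ntwo\n'
LINES_WINDOWS = b'one\r\ntwo\r\n'
LINES_CLASSIC_MAC = b'one\rtwo\r'
LINES_MIXED = b'one\r\ntwo\nthree\r'

# Content length patterns.
EMPTY = b''
MINIMAL = b'a'
LONG = b'line of text\n' * 10_000


def decodes_as(content, charset):
    ''' Sanity helper: reports whether a sample decodes with a charset. '''
    try:
        content.decode(charset)
    except UnicodeDecodeError:
        return False
    return True
```

Keeping samples as in-memory constants matches the plan's goal of avoiding filesystem reads, and a helper like `decodes_as` lets the patterns module verify its own samples.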

Cross-Platform Testing Strategy

Windows Compatibility:
- python-magic vs python-magic-bin detection differences
- Cygwin buffer handling validation
- Platform-specific line separator handling
- Unicode handling across platforms

Testing Approach:
- Platform variant patterns for content with different expected outcomes
- Conditional test expectations based on platform
- Mock detector behavior for consistent cross-platform testing
- Performance considerations for platform-specific libraries
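Conditional expectations can be kept out of test bodies by storing platform variants alongside the pattern and selecting one at run time. The helper below is a hypothetical sketch: `expected_for_platform`, the `'default'` key convention, and the `nativize` stand-in are assumptions. The platform prefixes `win32` and `cygwin` are real `sys.platform` values.

```python
import sys


def expected_for_platform(variants, platform=None):
    ''' Picks the expected outcome for a platform, else the 'default' one. '''
    platform = sys.platform if platform is None else platform
    for prefix, expected in variants.items():
        if prefix != 'default' and platform.startswith(prefix):
            return expected
    return variants['default']


def nativize(text, linesep):
    ''' Stand-in: converts LF endings to the given platform separator. '''
    return text.replace('\n', linesep)


# Expected outcomes keyed by sys.platform prefix.
NATIVIZED_TWO_LINES = {
    'win32': 'one\r\ntwo\r\n',
    'default': 'one\ntwo\n',
}


def test_300_nativize_matches_platform_variant():
    # The platform is injected so both branches run on any host.
    for platform, linesep in (('win32', '\r\n'), ('linux', '\n')):
        expected = expected_for_platform(NATIVIZED_TWO_LINES, platform)
        assert nativize('one\ntwo\n', linesep) == expected
```

Injecting the platform name, rather than reading it inside the helper only, lets a single host exercise every variant.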

Implementation Priorities

Priority 1 (CRITICAL):
- Default return behavior patterns (DetectFailureActions enum)
- Exception location parameter handling
- Default parameter paths in decoding functions

Priority 2 (HIGH):
- Charset codec edge cases and specifier handling
- Enhanced inference functions with context awareness

Priority 3 (MEDIUM):
- Text validation edge cases
- Line separator detection edge cases
- MIME type detection edge cases

Success Metrics

Functional Validation:
- All DetectFailureActions enum variants tested
- Default return behavior patterns comprehensively covered
- Exception handling with location parameters complete
- Enhanced inference functions tested
- Cross-platform compatibility patterns established

Quality Assurance:
- Coverage-gap-first methodology applied
- Test data centralized in patterns module
- Clean test structure with numbered organization
- Cross-platform compatibility validated

Implementation Notes

Dependencies Requiring Injection:
- OS charset detection for platform testing
- Magic library detection for cross-platform testing
- Registry detector functions for failure scenario testing

Filesystem Operations:
- All test content provided via patterns module (no filesystem reads)
- Location context testing with mock paths
- Cross-platform path handling validation

External Services:
- No external network testing required
- All magic byte detection with local libraries
- HTTP Content-Type testing with direct header values (no mocking needed)

Architectural Considerations:
- Immutable object testing requires constructor-based injection
- Registry testing through public API detector configuration
- Behavior configuration testing via Behaviors dataclass
- Exception testing through expected failure scenarios
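Constructor-based injection means a test never mutates a configuration object; it builds a new one with the values it needs. The sketch below uses a frozen dataclass as a stand-in for Behaviors. The field name `charset_detectors_order`, its default value, and the `ordered_detectors` helper are hypothetical; `dataclasses.replace` and `FrozenInstanceError` are standard library features.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Behaviors:
    ''' Stand-in configuration; the field name and default are assumptions. '''

    charset_detectors_order: tuple = ('chardet', 'charset-normalizer')


def ordered_detectors(registry, behaviors):
    ''' Returns registered detectors in configured order, skipping unknowns. '''
    return [
        registry[name]
        for name in behaviors.charset_detectors_order
        if name in registry
    ]


def test_200_instances_reject_mutation():
    behaviors = Behaviors()
    try:
        behaviors.charset_detectors_order = ()
    except dataclasses.FrozenInstanceError:
        pass
    else:
        raise AssertionError('expected FrozenInstanceError')


def test_210_order_is_injected_through_constructor():
    registry = {'first': 'detector-a', 'second': 'detector-b'}
    behaviors = Behaviors(charset_detectors_order=('second', 'missing', 'first'))
    assert ordered_detectors(registry, behaviors) == ['detector-b', 'detector-a']


def test_220_replace_leaves_original_untouched():
    original = Behaviors()
    variant = dataclasses.replace(original, charset_detectors_order=('only',))
    assert variant.charset_detectors_order == ('only',)
    assert original.charset_detectors_order == ('chardet', 'charset-normalizer')
```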

CRITICAL Testing Focus: Testing the default return behavior pattern (DetectFailureActions enum) is essential for verifying system reliability under the new graceful degradation capabilities.