Core Functionality Test Plan¶

Test Plan: detection.py and lineseparators.py

Coverage Analysis Summary¶

detection.py¶

Current coverage: 77%
Target coverage: 95%+ (focused on critical paths)
Remaining uncovered lines: 77-81, 111, 121, 124-128, 173-174, 176
Critical gaps: ASCII charset fallback, parameter overrides, exception paths

lineseparators.py¶

Current coverage: 91%
Target coverage: 95%+ (focused on critical paths)
Remaining uncovered branches: 4 exit conditions in enum methods
Status: Good coverage, mainly missing edge case branches

Focused Test Cases for Remaining Coverage Gaps¶

Priority Test Cases to Close Critical Coverage Gaps¶

ASCII Charset Detection (Lines 77-81)

Test content that chardet detects as ‘ascii’ → should return ‘utf-8’
Test content that chardet detects as ‘MacRoman’ but decodes as UTF-8 → should return ‘utf-8’
Test content that chardet detects as ‘iso-8859-1’ and fails UTF-8 decode → should return ‘iso-8859-1’
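
The three cases above can be expressed as a small table-driven check. The sketch below is an assumption about how the bias step behaves, not the code on lines 77-81 of detection.py: resolve_charset is a hypothetical helper that receives the detector's answer as an argument, so chardet itself is not called.

```python
from typing import Optional


def resolve_charset(content: bytes, detected: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for the UTF-8 bias step (not the real detection.py)."""
    if detected is None:
        return None  # the detector found nothing
    if detected.lower() == 'ascii':
        return 'utf-8'  # ASCII is a strict subset of UTF-8
    try:
        content.decode('utf-8')
    except UnicodeDecodeError:
        return detected.lower()  # UTF-8 is ruled out, keep the detector's answer
    return 'utf-8'  # the detector guessed otherwise, but UTF-8 decodes cleanly


# (content, what the detector is assumed to report, expected result)
CASES = [
    (b'Simple ASCII text', 'ascii', 'utf-8'),
    (b'caf\xc3\xa9', 'MacRoman', 'utf-8'),
    (b'caf\xe9', 'iso-8859-1', 'iso-8859-1'),
]
for content, detected, expected in CASES:
    assert resolve_charset(content, detected) == expected
```

Passing the detector's answer in keeps each case deterministic; the real tests would instead need inputs that chardet actually classifies this way.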

Parameter Override Cases (Line 111)

Test detect_mimetype_and_charset() with explicit mimetype override
Test with both mimetype and charset overrides

Fallback to Octet-Stream (Line 121)

Test with binary content that has no detectable mimetype or charset

Exception Path Testing (Lines 124-128, 173-174, 176)

Test non-textual mimetype (e.g., ‘image/jpeg’) with detected charset but no reasonable text content
Test invalid charset name (LookupError) in validation
Test content that can’t be decoded with detected charset (UnicodeDecodeError)
Test decoded content that fails reasonableness checks

Exception Constructor Coverage (exceptions.py Lines 43, 52, 61)

Raise each exception type to test constructor message formatting

Test Strategy¶

detection.py Component-Specific Tests¶

Function: detect_charset (Tests 100-199)¶

Happy path: Valid text content with various encodings (UTF-8, ASCII, latin-1, cp1252)
UTF-8 bias logic: Content that could be multiple encodings but should return UTF-8
ASCII superset handling: ASCII content should return ‘utf-8’
chardet failure: Content where chardet returns None
False positive elimination: Content detected as MacRoman but actually UTF-8
Edge cases: Empty content, binary content, mixed encoding markers

Function: detect_mimetype (Tests 200-299)¶

Content-based detection: Files with clear magic numbers (JPEG, PNG, PDF)
Extension fallback: Files without magic numbers falling back to mimetypes.guess_type
PureError handling: Content that triggers puremagic.PureError
ValueError handling: Malformed content triggering ValueError
Location parameter variations: str and Path inputs
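
As a rough illustration of the content-first, extension-second order described above, the sketch below uses a tiny hand-written magic-number table in place of puremagic. sniff_mimetype and its table are hypothetical; only mimetypes.guess_type is the real standard library call.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Union

# Magic-number prefixes for the three formats named above.
_MAGIC = (
    (b'\xff\xd8\xff', 'image/jpeg'),
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'%PDF-', 'application/pdf'),
)


def sniff_mimetype(content: bytes, location: Union[str, Path]) -> Optional[str]:
    """Hypothetical stand-in: magic numbers first, then the file extension."""
    for prefix, mimetype in _MAGIC:
        if content.startswith(prefix):
            return mimetype
    guessed, _encoding = mimetypes.guess_type(str(location))
    return guessed


jpeg = b'\xff\xd8\xff\xe0\x00\x10JFIF'
assert sniff_mimetype(jpeg, 'photo.bin') == 'image/jpeg'            # content wins
assert sniff_mimetype(b'plain words', 'notes.txt') == 'text/plain'  # extension fallback
assert sniff_mimetype(b'plain words', Path('notes.txt')) == 'text/plain'
assert sniff_mimetype(b'plain words', 'no-extension') is None
```

The str and Path cases share one assertion shape, which suits a parametrized test in the real suite.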

Function: detect_mimetype_and_charset (Tests 300-399)¶

Both detected: Content with both clear mimetype and charset
Mimetype override: Using the absential mimetype parameter to override detection
Charset override: Using the absential charset parameter to override detection
Text/plain fallback: Charset detected but no mimetype
Octet-stream fallback: Neither detected
TextualMimetypeInvalidity cases: Non-textual mimetype with charset but validation fails
Validation success: Non-textual mimetype with valid charset and reasonable content
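
The override and fallback cases above amount to a small decision table. The sketch below is one possible reading of it, with assumptions stated plainly: ABSENT is a hypothetical sentinel standing in for whatever "absent" value the real API accepts, resolve_pair is not a function in detection.py, and the validation step for non-textual mimetypes is left out.

```python
from typing import Optional, Tuple

ABSENT = object()  # hypothetical sentinel meaning "caller supplied nothing"


def resolve_pair(
    detected_mimetype: Optional[str],
    detected_charset: Optional[str],
    mimetype: object = ABSENT,
    charset: object = ABSENT,
) -> Tuple[object, object]:
    """Hypothetical stand-in: apply overrides first, then the fallbacks."""
    final_mimetype = detected_mimetype if mimetype is ABSENT else mimetype
    final_charset = detected_charset if charset is ABSENT else charset
    if final_mimetype is None:
        # A charset without a mimetype suggests plain text; nothing at all
        # suggests opaque binary data.
        if final_charset is not None:
            final_mimetype = 'text/plain'
        else:
            final_mimetype = 'application/octet-stream'
    return final_mimetype, final_charset


assert resolve_pair('text/html', 'utf-8') == ('text/html', 'utf-8')
assert resolve_pair(None, 'utf-8') == ('text/plain', 'utf-8')
assert resolve_pair(None, None) == ('application/octet-stream', None)
assert resolve_pair('image/jpeg', None, mimetype='text/csv') == ('text/csv', None)
```

A dedicated sentinel rather than None lets a caller distinguish "no override" from "override with None", which is the usual reason for an absential parameter.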

Function: is_textual_mimetype (Tests 400-499)¶

text/* prefix: text/plain, text/html, text/x-custom
Specific application types: All types in _TEXTUAL_MIME_TYPES frozenset
Textual suffixes: Custom types with +xml, +json, +yaml, +toml suffixes
Non-textual types: image/jpeg, video/mp4, application/octet-stream
Edge cases: Empty string, malformed MIME types like “text” or “text//html”
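
The categories above lend themselves to one table of inputs and expected results. In the sketch below, is_textual and the contents of its type set are illustrative assumptions; the real _TEXTUAL_MIME_TYPES frozenset may list different types.

```python
# Illustrative subset only; the real frozenset is defined in detection.py.
_TEXTUAL_TYPES = frozenset({'application/json', 'application/xml'})
_TEXTUAL_SUFFIXES = ('+xml', '+json', '+yaml', '+toml')


def is_textual(mimetype: str) -> bool:
    """Hypothetical stand-in for the three rules listed above."""
    if mimetype.startswith('text/'):
        return True
    if mimetype in _TEXTUAL_TYPES:
        return True
    return mimetype.endswith(_TEXTUAL_SUFFIXES)


CASES = [
    ('text/plain', True),
    ('text/x-custom', True),
    ('application/json', True),
    ('application/vnd.api+json', True),
    ('image/svg+xml', True),
    ('image/jpeg', False),
    ('video/mp4', False),
    ('application/octet-stream', False),
    ('', False),
    ('text', False),  # no slash, so the text/ prefix rule does not apply
]
for mimetype, expected in CASES:
    assert is_textual(mimetype) is expected, mimetype
```

The expected result for "text//html" is deliberately omitted because the plan does not say how the real function treats it.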

Function: is_reasonable_text_content (Tests 500-599)¶

Valid text content: Normal readable text with proper character distribution
Empty content rejection: Empty strings should return False
Control character limits: Content with >10% control characters (excluding \t\n\r)
Printable character ratio: Content with <80% printable characters
Common whitespace handling: Content with tabs, newlines, carriage returns
Binary-like content: Content that appears to be binary data
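
To make the thresholds concrete, the sketch below encodes them in a stand-in function and checks one input per rule. The function body is an assumption built from the numbers above (10% control characters, 80% printable), not the implementation in detection.py.

```python
COMMON_WHITESPACE = '\t\n\r'


def looks_like_text(content: str) -> bool:
    """Hypothetical stand-in built from the thresholds listed above."""
    if not content:
        return False
    total = len(content)
    # Control characters, not counting tab, newline, and carriage return.
    controls = sum(
        1 for char in content
        if ord(char) < 32 and char not in COMMON_WHITESPACE)
    if controls > total * 0.1:
        return False
    printable = sum(
        1 for char in content
        if char.isprintable() or char in COMMON_WHITESPACE)
    return printable >= total * 0.8


assert looks_like_text('Hello, world!\n') is True
assert looks_like_text('') is False
assert looks_like_text('\x00\x01\x02abc') is False  # 50% control characters
assert looks_like_text('abc\x7f\x7f') is False      # only 60% printable
assert looks_like_text('a\tb\nc\r\n') is True       # common whitespace is fine
```

The DEL character (\x7f) is useful test data here: it is not printable, yet it falls outside the ord < 32 range, so it exercises the printable ratio without tripping the control character limit.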

Function: _validate_mimetype_with_trial_decode (Tests 600-699)¶

Successful decode and validation: Valid charset and reasonable text content
UnicodeDecodeError: Invalid charset for the content
LookupError: Unknown/invalid charset name
Unreasonable content: Valid decode but content fails reasonableness test
Exception chaining: Verify TextualMimetypeInvalidity is raised with proper cause
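
A sketch of how the exception chaining check might look follows. The TextualMimetypeInvalidity class here is a local stand-in with an assumed constructor signature, and validate_with_trial_decode is a hypothetical simplification that omits the reasonableness check.

```python
class TextualMimetypeInvalidity(Exception):
    """Local stand-in; the real class and its signature live in exceptions.py."""

    def __init__(self, mimetype: str, charset: str) -> None:
        super().__init__(
            f"Mimetype {mimetype!r} is not valid text with charset {charset!r}.")


def validate_with_trial_decode(content: bytes, mimetype: str, charset: str) -> str:
    """Hypothetical simplification: trial decode only."""
    try:
        return content.decode(charset)
    except (UnicodeDecodeError, LookupError) as exc:
        # "from exc" records the original error as __cause__.
        raise TextualMimetypeInvalidity(mimetype, charset) from exc


def capture(content: bytes, charset: str):
    """Return the raised exception, or None if validation succeeds."""
    try:
        validate_with_trial_decode(content, 'application/x-custom', charset)
    except TextualMimetypeInvalidity as exc:
        return exc
    return None


assert isinstance(capture(b'\xff\xfe\xfd', 'utf-8').__cause__, UnicodeDecodeError)
assert isinstance(capture(b'abc', 'no-such-charset').__cause__, LookupError)
assert capture(b'abc', 'utf-8') is None
```

In a pytest suite the same check is usually written with pytest.raises and an assertion on excinfo.value.__cause__.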

lineseparators.py Component-Specific Tests¶

LineSeparators Enum Basic Tests (Tests 100-199)¶

Enum members: CR, CRLF, LF values and string representations
Enum behavior: Comparison, hashing, iteration

Method: LineSeparators.detect_bytes (Tests 200-299)¶

LF detection: Unix-style \n line endings
CRLF detection: Windows-style \r\n line endings
CR detection: Classic Mac \r line endings
Mixed content: Content with multiple line ending types (the first separator found wins)
No line endings: Content without any line separators
Limit parameter: Content longer than limit with line endings beyond limit
Edge cases: Empty content, single character content
Byte vs int sequence: Both bytes objects and Sequence[int] inputs
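
The detection cases above can be tried against a minimal stand-in enum. Everything in the sketch below is an assumption for illustration: the real LineSeparators class may differ in member values, return type, and the default limit (1024 here is a guess).

```python
import enum
from typing import Optional, Sequence


class LineSeparators(enum.Enum):
    """Hypothetical stand-in for the enum in lineseparators.py."""

    CR = '\r'
    CRLF = '\r\n'
    LF = '\n'

    @classmethod
    def detect_bytes(
        cls, content: Sequence[int], limit: int = 1024
    ) -> Optional['LineSeparators']:
        """Return the first separator found within the first `limit` bytes."""
        found_cr = False
        for byte in content[:limit]:
            if byte == 0x0D:          # carriage return
                if found_cr:
                    return cls.CR     # two CRs in a row: bare CR endings
                found_cr = True
            elif byte == 0x0A:        # line feed
                return cls.CRLF if found_cr else cls.LF
            elif found_cr:
                return cls.CR         # CR followed by ordinary content
        return cls.CR if found_cr else None


assert LineSeparators.detect_bytes(b'a\nb') is LineSeparators.LF
assert LineSeparators.detect_bytes(b'a\r\nb') is LineSeparators.CRLF
assert LineSeparators.detect_bytes(b'a\rb') is LineSeparators.CR
assert LineSeparators.detect_bytes(b'a\r\nb\nc') is LineSeparators.CRLF  # first wins
assert LineSeparators.detect_bytes(b'abc') is None
assert LineSeparators.detect_bytes(b'x' * 10 + b'\n', limit=5) is None
assert LineSeparators.detect_bytes([97, 13, 10]) is LineSeparators.CRLF
```

Iterating a bytes object yields ints, which is why the same loop handles both bytes and Sequence[int] inputs.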

Method: LineSeparators.normalize_universal (Tests 300-399)¶

CRLF to LF: Windows line endings converted to Unix
CR to LF: Classic Mac line endings converted to Unix
Mixed line endings: Content with both CRLF and CR converted
Already LF: Unix content unchanged
No line endings: Content without line separators unchanged
Edge cases: Empty string, single line ending character
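
One way to realize the universal normalization described above is two ordered replacements. The function below is a hypothetical stand-in, not the method body from lineseparators.py; the order matters, because replacing CR first would turn each CRLF into two line feeds.

```python
def normalize_universal(content: str) -> str:
    """Hypothetical stand-in: convert CRLF and bare CR to LF."""
    # CRLF must be handled before bare CR, or "\r\n" would become "\n\n".
    return content.replace('\r\n', '\n').replace('\r', '\n')


assert normalize_universal('a\r\nb') == 'a\nb'
assert normalize_universal('a\rb') == 'a\nb'
assert normalize_universal('a\r\nb\rc\nd') == 'a\nb\nc\nd'
assert normalize_universal('a\nb') == 'a\nb'
assert normalize_universal('abc') == 'abc'
assert normalize_universal('') == ''
assert normalize_universal('\r') == '\n'
```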

Method: LineSeparators.normalize (Tests 400-499)¶

CR instance normalization: CR enum member converting \r to \n
CRLF instance normalization: CRLF enum member converting \r\n to \n
LF instance normalization: LF enum member should return unchanged
Multiple occurrences: Content with multiple instances of the separator
No matching separators: Content without the specific separator

Method: LineSeparators.nativize (Tests 500-599)¶

CR instance nativization: Converting \n to \r
CRLF instance nativization: Converting \n to \r\n
LF instance nativization: LF enum member should return unchanged
Multiple line endings: Content with multiple \n converted appropriately
No line endings: Content without \n unchanged
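
The normalize and nativize cases mirror each other, so a round-trip check covers both. The enum below is a hypothetical stand-in with assumed method bodies; the real class in lineseparators.py may be implemented differently.

```python
import enum


class LineSeparators(enum.Enum):
    """Hypothetical stand-in for the enum in lineseparators.py."""

    CR = '\r'
    CRLF = '\r\n'
    LF = '\n'

    def normalize(self, content: str) -> str:
        """Convert this member's separator to LF."""
        if self is LineSeparators.LF:
            return content
        return content.replace(self.value, '\n')

    def nativize(self, content: str) -> str:
        """Convert LF to this member's separator."""
        if self is LineSeparators.LF:
            return content
        return content.replace('\n', self.value)


assert LineSeparators.CR.normalize('a\rb\rc') == 'a\nb\nc'
assert LineSeparators.CRLF.normalize('a\r\nb') == 'a\nb'
assert LineSeparators.CRLF.normalize('a\rb') == 'a\rb'  # no matching separator
assert LineSeparators.LF.normalize('a\nb') == 'a\nb'
assert LineSeparators.CR.nativize('a\nb') == 'a\rb'
assert LineSeparators.CRLF.nativize('a\nb\nc') == 'a\r\nb\r\nc'

# Round trip: nativize then normalize returns the LF original.
for member in LineSeparators:
    assert member.normalize(member.nativize('one\ntwo\nthree')) == 'one\ntwo\nthree'
```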

Implementation Notes¶

Dependencies requiring injection: None¶

All functions are pure; their only dependencies are the standard library plus chardet and puremagic
chardet, puremagic, mimetypes can be mocked if needed but may not be necessary

Filesystem operations needing pyfakefs: None¶

Functions operate on in-memory content, no file I/O required

External services requiring mocking: None¶

No external network calls or services

Test data strategy¶

Primary approach: Inline byte literals in test code (100% of tests)
- b"Hello \xc3\xa9 world" for UTF-8 content
- b"Simple ASCII text" for ASCII content
- b"Line 1\r\nLine 2\r\nLine 3" for line ending tests
- b'\xff\xd8\xff\xe0\x00\x10JFIF' for JPEG magic number testing
No file fixtures needed: All test data can be represented as byte literals
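
A quick sanity check of the sample literals can catch escaping mistakes before they reach the tests. The variable names below are illustrative; the literals are written with single backslashes so that Python interprets the escapes as raw byte values.

```python
utf8_sample = b"Hello \xc3\xa9 world"
ascii_sample = b"Simple ASCII text"
crlf_sample = b"Line 1\r\nLine 2\r\nLine 3"
jpeg_sample = b'\xff\xd8\xff\xe0\x00\x10JFIF'

# The two bytes C3 A9 are the UTF-8 encoding of U+00E9 (e with acute accent).
assert utf8_sample.decode('utf-8') == 'Hello \u00e9 world'
assert ascii_sample.decode('ascii') == 'Simple ASCII text'
assert crlf_sample.split(b'\r\n') == [b'Line 1', b'Line 2', b'Line 3']
assert jpeg_sample.startswith(b'\xff\xd8\xff')

# A doubled backslash would leave a literal backslash byte in the data.
assert b'\\' not in utf8_sample
```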

Private functions/methods testable via public API¶

_validate_mimetype_with_trial_decode() is called by detect_mimetype_and_charset()
Test through public API by providing scenarios that trigger validation

Areas requiring immutability constraint violations: None¶

All code is testable through public interfaces without monkey-patching

Third-party testing patterns to research¶

Mock puremagic.from_string() exceptions if needed
Mock chardet.detect() return values for edge cases
Mock mimetypes.guess_type() for extension fallback testing
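
Of the three, mimetypes.guess_type() is in the standard library, so its mocking pattern can be sketched without third-party packages. guess_from_extension below is a hypothetical wrapper used only to give the mock something to drive; the same unittest.mock.patch approach would presumably apply to chardet.detect() and puremagic.from_string().

```python
import mimetypes
from unittest import mock


def guess_from_extension(location: str) -> str:
    """Hypothetical wrapper around the extension fallback."""
    mimetype, _encoding = mimetypes.guess_type(location)
    return mimetype or 'application/octet-stream'


# Force the "nothing recognized" branch even for a well-known extension.
with mock.patch('mimetypes.guess_type', return_value=(None, None)) as guess:
    forced = guess_from_extension('notes.txt')

guess.assert_called_once_with('notes.txt')
assert forced == 'application/octet-stream'

# Outside the context manager the real function is restored.
assert guess_from_extension('notes.txt') == 'text/plain'
```

The patch works because the wrapper looks up mimetypes.guess_type at call time; code that imported guess_type by name would need the patch target to be the importing module instead.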

Test module numbering¶

Current test structure:
- test_000_package.py - package sanity checks (existing)
- test_010_base.py - imports testing (existing)

Needed test modules for 100% coverage:
- test_100_exceptions.py - exception classes testing
- test_200_detection.py - detection module functional testing
- test_210_lineseparators.py - line separators enum functional testing

Anti-patterns to avoid¶

Testing against real external sites (not applicable)
Monkey-patching internal code (use mocking of external deps only if needed)
Over-mocking (prefer real function execution with varied inputs)

Success Metrics¶

Target line coverage: 100% for both detection.py and lineseparators.py
Target branch coverage: 100% for both modules
Specific gaps to close: Lines 77-81, 111, 121, 124-128, 173-174, 176 in detection.py
Exception testing: Ensure all 3 exception classes are instantiated and tested

100% Coverage Approach

Since all uncovered lines are testable without complex mocking:
- Target: 100% line and branch coverage
- Estimated: 15-20 focused test cases across 3 new test modules
- Strategy: Direct testing of edge cases and error paths
- No #pragma: no cover needed - all code paths are legitimately testable