005. Validation Behavior Configuration
Status
Accepted
Context
The v1.x functional approach provides no control over validation execution, leading to inappropriate validation overhead and inflexible error handling for different use cases. Analysis of integration patterns revealed that validation requirements vary significantly based on context:
Performance-Critical Scenarios: Quick charset detection for decoding workflows should skip expensive printable character analysis.
Security-Sensitive Contexts: Comprehensive validation, including trial decoding and character analysis, is required to prevent processing of malicious content.
Batch Processing Workflows: Different validation thresholds are appropriate for automated processing than for interactive validation.
Current Limitations:
All validation logic hardcoded with no runtime configuration
No ability to skip expensive validations for performance-critical paths
Fixed printable character thresholds that do not suit every content type
Trial decoding always performed regardless of use case requirements
Requirements Analysis:
Selective Validation: Control which validation steps execute
Configurable Thresholds: Adjust validation parameters for different content types
Performance Control: Skip expensive operations when not required
Default Behavior: Zero-configuration defaults for common use cases
Backward Compatibility: Existing behavior preserved as default
Decision
We will implement a Behaviors Configuration Pattern that provides fine-grained control over validation execution through a structured configuration object.
Evolved Configuration Design:
class BehaviorTristate(enum.Enum):
    Never = enum.auto()
    AsNeeded = enum.auto()
    Always = enum.auto()

class Behaviors(immut.Dataclass):
    # Core detection controls
    charset_detect: BehaviorTristate = BehaviorTristate.AsNeeded
    mimetype_detect: BehaviorTristate = BehaviorTristate.AsNeeded
    # Charset handling sophistication
    charset_promotions: Mapping[str, str] = {'ascii': 'utf-8'}
    charset_trial_codecs: Sequence[str | CodecSpecifiers] = (
        CodecSpecifiers.Inference, CodecSpecifiers.UserDefault)
    charset_trial_decode: BehaviorTristate = BehaviorTristate.AsNeeded
BehaviorTristate Control:
Never: Skip behavior entirely for maximum performance
AsNeeded: Apply behavior based on detection confidence and context (default)
Always: Force behavior regardless of confidence or context
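The three states above can be read as a small decision function. The following is a minimal sketch, not the project's implementation: the helper name should_apply, the confidence argument, and the 0.8 cutoff are all assumptions introduced for illustration, since the ADR does not say how "confidence and context" are measured.

```python
import enum


class BehaviorTristate(enum.Enum):
    Never = enum.auto()
    AsNeeded = enum.auto()
    Always = enum.auto()


def should_apply(
    behavior: BehaviorTristate,
    confidence: float,
    confidence_floor: float = 0.8,  # hypothetical cutoff, not from the ADR
) -> bool:
    """Decide whether an optional step runs for a given tristate."""
    if behavior is BehaviorTristate.Never:
        return False
    if behavior is BehaviorTristate.Always:
        return True
    # AsNeeded: run the step only when earlier detection was not confident.
    return confidence < confidence_floor
```

Keeping the decision in one helper means each validation step can ask the same question, so Never and Always stay cheap constant-time checks and only AsNeeded consults detection results.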
Advanced Charset Handling:
charset_promotions: Mapping for upgrading detected charsets (e.g., ASCII→UTF-8)
charset_trial_codecs: Sequence of codecs to try during trial decoding
CodecSpecifiers: Enum for dynamic codec resolution (Inference, OsDefault, UserDefault)
Sophisticated Detection Control:
charset_detect: Controls when charset detection from content occurs
mimetype_detect: Controls when MIME type detection from content occurs
charset_trial_decode: Controls when trial decoding validation occurs
Integration Pattern:
def detect_mimetype_charset(
    content: Content,
    location: Absential[Location] = absent, *,
    behaviors: Absential[Behaviors] = absent,
    # ... other parameters
) -> tuple[Absential[str], Absential[str]]:
    ...
Default Behavior Design:
BEHAVIORS_DEFAULT = Behaviors(
    trial_decode='as-needed',
    validate_printable='as-needed',
    printable_threshold=0.0,
    assume_utf8_superset=True,
)
Alternatives
Individual Boolean Parameters
Benefits: Simple parameter interface, clear enable/disable semantics
Drawbacks: Parameter proliferation, no structured configuration
Rejection Reason: Leads to unwieldy function signatures as validation options grow
Global Configuration Object
Benefits: One-time configuration affects all function calls
Drawbacks: Global state, less flexible per-call control, testing complexity
Rejection Reason: Global state conflicts with functional approach
Validation Profile Enums
Benefits: Simple selection between predefined validation sets
Drawbacks: Limited flexibility, configuration coupling
Rejection Reason: Insufficient granularity for diverse use case requirements
Builder Pattern Configuration
Benefits: Fluent interface, incremental configuration building
Drawbacks: Over-engineering for configuration object, additional complexity
Rejection Reason: Functional configuration object simpler and more maintainable
Consequences
Positive Consequences
Performance Control: Skip expensive validations for performance-critical workflows
Use Case Flexibility: Appropriate validation for security, performance, or accuracy requirements
Threshold Configurability: Adjust validation parameters for different content types
Default Behavior: Zero-configuration operation for common use cases
Structured Configuration: Clear configuration object with documented semantics
Negative Consequences
Configuration Complexity: Additional parameter and configuration object increase cognitive load
Testing Matrix: Behavior combinations create large test space requiring systematic coverage
Documentation Overhead: Configuration options require comprehensive documentation and examples
Implementation Complexity: Conditional validation logic increases internal implementation complexity
Neutral Consequences
Migration Strategy: Existing code continues working with default behaviors
Future Extensibility: Configuration pattern provides foundation for additional validation options
Performance Characteristics: Behavior selection affects performance profiles predictably
Implementation Guidance
Performance-Optimized Configuration:
# Quick charset detection for decoding
fast_behaviors = Behaviors(
    trial_decode='never',
    validate_printable='never',
)
Security-Focused Configuration:
# Comprehensive validation for untrusted content
secure_behaviors = Behaviors(
    trial_decode='always',
    validate_printable='always',
    printable_threshold=0.05,  # Allow 5% non-printable
)
Content-Specific Configuration:
# Relaxed validation for code/data content
code_behaviors = Behaviors(
    printable_threshold=0.15,  # Allow more control characters
    validate_printable='as-needed',
)
Conditional Logic Implementation:
The internal implementation will evaluate the behavior configuration to determine which validation steps to execute, so that each configuration profile pays only for the steps it enables.
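A minimal sketch of such conditional logic follows, using the field names from the configuration examples in this section. Everything here is an assumption made for illustration: the ValidationBehaviors stand-in class, the validate and nonprintable_ratio helpers, the confident flag, and the reading of printable_threshold as the maximum allowed fraction of non-printable characters (suggested by the "Allow 5% non-printable" comment).

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ValidationBehaviors:
    # Hypothetical stand-in mirroring the example field names in this section.
    trial_decode: str = 'as-needed'
    validate_printable: str = 'as-needed'
    printable_threshold: float = 0.0


def nonprintable_ratio(text: str) -> float:
    """Fraction of characters that are neither printable nor common whitespace."""
    if not text:
        return 0.0
    bad = sum(1 for ch in text if not (ch.isprintable() or ch in '\t\n\r'))
    return bad / len(text)


def validate(
    content: bytes,
    charset: str,
    behaviors: ValidationBehaviors,
    confident: bool,
) -> bool:
    """Run only the validation steps that the behaviors enable."""

    def enabled(mode: str) -> bool:
        # 'always' forces the step; 'as-needed' runs it only without confidence.
        return mode == 'always' or (mode == 'as-needed' and not confident)

    text = None
    if enabled(behaviors.trial_decode):
        try:
            text = content.decode(charset)
        except (UnicodeDecodeError, LookupError):
            return False
    if enabled(behaviors.validate_printable):
        if text is None:
            # Trial decoding was skipped, so decode leniently for analysis only.
            try:
                text = content.decode(charset, errors='replace')
            except LookupError:
                return False
        if nonprintable_ratio(text) > behaviors.printable_threshold:
            return False
    return True
```

With both modes set to 'never', validate returns immediately without decoding, which is the performance-critical path; with both set to 'always', every step runs regardless of confidence.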
Integration with Error Class Provider:
The behaviors configuration works in conjunction with the error class provider pattern to provide complete control over validation execution and error handling:
result = detect_mimetype_charset(
    content, location,
    behaviors=secure_behaviors,
    error_class_provider=security_error_mapper,
)
This decision provides the foundation for performance-aware and context-sensitive validation that addresses the rigid validation limitations of the v1.x functional approach while maintaining backward compatibility through sensible defaults.