.. vim: set fileencoding=utf-8:
.. -*- coding: utf-8 -*-
.. +--------------------------------------------------------------------------+
   |                                                                          |
   | Licensed under the Apache License, Version 2.0 (the "License");          |
   | you may not use this file except in compliance with the License.         |
   | You may obtain a copy of the License at                                  |
   |                                                                          |
   |     http://www.apache.org/licenses/LICENSE-2.0                           |
   |                                                                          |
   | Unless required by applicable law or agreed to in writing, software      |
   | distributed under the License is distributed on an "AS IS" BASIS,        |
   | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
   | See the License for the specific language governing permissions and      |
   | limitations under the License.                                           |
   |                                                                          |
   +--------------------------------------------------------------------------+


*******************************************************************************
System Overview
*******************************************************************************

The **detextive** library consolidates text detection capabilities from
multiple packages as a faithful functional reproduction of their existing
behavior. The first iteration prioritizes behavioral fidelity and minimal
migration effort over architectural sophistication.

Major Components
===============================================================================

Core Detection Functions
-------------------------------------------------------------------------------

**Public Functional API**
  Direct consolidation of proven functions providing drop-in compatibility:
  
  * ``detect_charset(content)`` - Character encoding detection with UTF-8 bias
  * ``detect_mimetype(content, location)`` - MIME type detection with fallback
    chains
  * ``detect_mimetype_and_charset(content, location, *, mimetype=absent,
    charset=absent)`` - Complex parameter handling from mimeogram
  * ``is_textual_mimetype(mimetype)`` - Textual MIME type validation
  * ``is_reasonable_text_content(content)`` - Heuristic text-versus-binary
    check
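
  The last two functions are described only by name here, so the following is
  a minimal sketch of how such checks might work, not the library's actual
  logic. The names ``looks_textual_mimetype`` and ``looks_like_text``, the
  pattern tables, and the 10% control-character threshold are all assumptions
  made for illustration.

  .. code-block:: python

      # Hypothetical sketch; not the detextive implementation.
      _TEXTUAL_APPLICATION_TYPES = frozenset((
          'application/json',
          'application/xml',
          'application/javascript',
      ))
      _TEXTUAL_SUFFIXES = ('+json', '+xml', '+yaml')


      def looks_textual_mimetype(mimetype: str) -> bool:
          """Returns True for MIME types that usually carry text."""
          if mimetype.startswith('text/'):
              return True
          if mimetype in _TEXTUAL_APPLICATION_TYPES:
              return True
          # Structured-syntax suffixes such as image/svg+xml.
          return mimetype.endswith(_TEXTUAL_SUFFIXES)


      def looks_like_text(content: bytes, threshold: float = 0.10) -> bool:
          """Heuristic: rejects NUL bytes and control-character-heavy data."""
          if not content:
              return False  # Assumption: empty content is not treated as text.
          if b'\x00' in content:
              return False
          allowed = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
          controls = sum(
              1 for byte in content if byte < 0x20 and byte not in allowed)
          return controls / len(content) <= threshold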

**Line Separator Processing**
  Direct migration of proven enumeration and utilities:
  
  * ``LineSeparators`` enum - Detection, normalization, and nativization methods
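
  To illustrate the three responsibilities named above, here is a rough
  sketch of an enumeration with detection, normalization, and nativization
  methods. The class name ``LineSeparatorsSketch``, the method signatures, and
  the 1024-byte scan limit are assumptions; the real ``LineSeparators`` enum
  may differ.

  .. code-block:: python

      import enum


      class LineSeparatorsSketch(enum.Enum):
          """Hypothetical stand-in for the ``LineSeparators`` enum."""

          CR = '\r'
          CRLF = '\r\n'
          LF = '\n'

          @classmethod
          def detect(cls, content: bytes, limit: int = 1024):
              """Returns the first separator seen in the sample, else None."""
              for index, byte in enumerate(content[:limit]):
                  if byte == 0x0A:
                      return cls.LF
                  if byte == 0x0D:
                      # A CR followed directly by LF is a CRLF pair.
                      if content[index + 1:index + 2] == b'\n':
                          return cls.CRLF
                      return cls.CR
              return None

          def normalize(self, text: str) -> str:
              """Converts this separator to LF."""
              if self is LineSeparatorsSketch.LF:
                  return text
              return text.replace(self.value, '\n')

          def nativize(self, text: str) -> str:
              """Converts LF back to this separator."""
              if self is LineSeparatorsSketch.LF:
                  return text
              return text.replace('\n', self.value)

  Under these assumptions, ``normalize`` and ``nativize`` are inverses for
  text that uses a single separator consistently.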

Component Relationships
===============================================================================

**Functional Architecture**

.. code-block::

    ┌─────────────────────────────────────────────────┐
    │                Public Functions                 │
    │  detect_mimetype()  detect_charset()  etc...    │
    └─────────────────────────────────────────────────┘
                            │
    ┌─────────────────────────────────────────────────┐
    │          Consolidated Detection Logic           │
    │     Faithful reproduction of existing logic     │
    └─────────────────────────────────────────────────┘
                            │
    ┌─────────────────────────────────────────────────┐
    │              External Dependencies              │
    │     chardet  puremagic  mimetypes (stdlib)      │
    └─────────────────────────────────────────────────┘

**Data Flow**

1. **Input Processing**: Functions receive byte content and optional metadata.
2. **Direct Analysis**: Functions apply statistical analysis, pattern matching,
   and heuristics using logic consolidated from the existing implementations.
3. **Validated Logic**: All detection behavior is reproduced exactly from the
   proven mimeogram, cache proxy, and ai-experiments implementations.
4. **Output**: Return values and types are identical to those of the existing
   implementations.
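
As one possible reading of this flow for charset detection, the sketch below
tries UTF-8 first, then consults a statistical detector, and validates the
result with a trial decode. The function name ``detect_charset_sketch`` and the
``guess`` callable (standing in for a detector such as chardet) are
hypothetical; the consolidated logic may order or validate these steps
differently.

.. code-block:: python

    from typing import Callable, Optional


    def detect_charset_sketch(
        content: bytes,
        guess: Callable[[bytes], Optional[str]] = lambda content: None,
    ) -> Optional[str]:
        """Hypothetical sketch of UTF-8-biased charset detection."""
        # 1. UTF-8 bias: accept UTF-8 whenever the bytes decode cleanly.
        try:
            content.decode('utf-8')
        except UnicodeDecodeError:
            pass
        else:
            return 'utf-8'
        # 2. Otherwise ask the statistical detector (assumed interface).
        charset = guess(content)
        if charset is None:
            return None
        # 3. Validate the guess with a trial decode before returning it.
        try:
            content.decode(charset)
        except (LookupError, UnicodeDecodeError):
            return None
        return charset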

Integration Patterns
===============================================================================

**Drop-in Replacement Strategy**
  Existing code can replace imports with minimal changes:
  
  .. code-block:: python
  
      # Before: from mimeogram.acquirers import _detect_charset
      # After:  from detextive import detect_charset
      charset = detect_charset(content_bytes)

**Behavioral Fidelity**
  Preserves exact existing behavior:
  
  * UTF-8 bias with validation from mimeogram charset detection
  * Extensible textual MIME type patterns from all implementations
  * Fallback chains (puremagic → mimetypes) from mimeogram  
  * Complex parameter handling from ``detect_mimetype_and_charset``
  * Heuristic validation from ``is_reasonable_text_content``
  * Error handling and exception types maintained
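
  The fallback chain can be sketched roughly as follows. To keep the example
  free of third-party imports, a tiny hand-written signature table stands in
  for puremagic; ``sniff_magic``, ``detect_mimetype_sketch``, and the table
  contents are assumptions, and only ``mimetypes.guess_type`` is a real
  standard library call.

  .. code-block:: python

      import mimetypes
      from typing import Optional

      # Hypothetical stand-in for content-based (magic byte) detection.
      _MAGIC_SIGNATURES = (
          (b'\x89PNG\r\n\x1a\n', 'image/png'),
          (b'%PDF-', 'application/pdf'),
          (b'GIF89a', 'image/gif'),
      )


      def sniff_magic(content: bytes) -> Optional[str]:
          """Returns a MIME type if the content starts with a known signature."""
          for signature, mimetype in _MAGIC_SIGNATURES:
              if content.startswith(signature):
                  return mimetype
          return None


      def detect_mimetype_sketch(
          content: bytes, location: str
      ) -> Optional[str]:
          """Tries content-based detection first, then the file extension."""
          mimetype = sniff_magic(content)
          if mimetype is not None:
              return mimetype
          # Fallback: extension-based lookup from the standard library.
          mimetype, _encoding = mimetypes.guess_type(location)
          return mimetype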

**Implementation Strategy**
  * Direct consolidation of proven function logic
  * Minimal abstraction to preserve performance characteristics
  * Same dependencies and detection libraries as existing implementations

Architectural Patterns
===============================================================================

**Faithful Functional Reproduction**
  Direct consolidation of existing functional implementations without
  architectural changes (see ADR-001).

**Consolidation Pattern**
  Multiple implementations merged into single functions, built on the same
  underlying components:
  
  * **chardet**: Statistical charset detection with UTF-8 bias
  * **puremagic**: Pure Python magic byte detection (primary)
  * **mimetypes**: Standard library extension-based fallback
  * **LineSeparators**: Byte-level line ending detection and normalization

**Future Extensibility**
  ADR-002 documents deferred architectural enhancements for future iterations:
  
  * Internal detector classes for configuration and testing
  * Consolidated result objects for multi-value operations
  * Plugin architecture for alternative detection backends