.. vim: set fileencoding=utf-8:
.. -*- coding: utf-8 -*-
.. +--------------------------------------------------------------------------+
   |                                                                          |
   | Licensed under the Apache License, Version 2.0 (the "License");          |
   | you may not use this file except in compliance with the License.         |
   | You may obtain a copy of the License at                                  |
   |                                                                          |
   |     http://www.apache.org/licenses/LICENSE-2.0                           |
   |                                                                          |
   | Unless required by applicable law or agreed to in writing, software      |
   | distributed under the License is distributed on an "AS IS" BASIS,        |
   | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
   | See the License for the specific language governing permissions and      |
   | limitations under the License.                                           |
   |                                                                          |
   +--------------------------------------------------------------------------+


*******************************************************************************
001. Python API Design Specification
*******************************************************************************

Overview
===============================================================================

This document specifies the Python API design for the initial feature set of
the detextive library. The design faithfully reproduces the text detection
capabilities that already exist in the mimeogram, cache proxy, and
ai-experiments packages.

The design prioritizes behavioral fidelity and minimal migration effort while
following established project practices for interface contracts, module
organization, and naming conventions.

Public Interface Specification
===============================================================================

Core Detection Functions
-------------------------------------------------------------------------------

**Character Encoding Detection**

.. code-block:: python

    def detect_charset( content: bytes ) -> __.typx.Optional[ str ]:
        ''' Detects character encoding with UTF-8 preference and validation.

            Returns None if no reliable encoding can be determined.
        '''

**MIME Type Detection**

.. code-block:: python

    def detect_mimetype(
        content: bytes,
        location: __.cabc.Sequence[ str ] | __.Path | str
    ) -> __.typx.Optional[ str ]:
        ''' Detects MIME type using content analysis and extension fallback.

            Returns standardized MIME type strings or None if detection fails.
        '''

**Combined Detection with Parameter Overrides**

.. code-block:: python

    def detect_mimetype_and_charset(
        content: bytes,
        location: __.cabc.Sequence[ str ] | __.Path | str, *,
        mimetype: __.Absential[ str ] = __.absent,
        charset: __.Absential[ str ] = __.absent,
    ) -> tuple[ str, __.typx.Optional[ str ] ]:
        ''' Detects MIME type and charset with optional parameter overrides.

            Returns tuple of (mimetype, charset). MIME type defaults to
            'text/plain' if charset detected but MIME type unknown, or
            'application/octet-stream' if neither detected.
        '''

**Textual Content Validation**

.. code-block:: python

    def is_textual_mimetype( mimetype: str ) -> bool:
        ''' Validates if MIME type represents textual content.

            Consolidates textual MIME type patterns from all source
            implementations. Supports text/* prefix, specific application
            types (JSON, XML, JavaScript, etc.), and textual suffixes
            (+xml, +json, +yaml, +toml).

            Returns True for MIME types representing textual content.
        '''

    def is_textual_content( content: bytes ) -> bool:
        ''' Determines if byte content represents textual data.

            Returns True for content that can be reliably processed as text.
        '''

Line Separator Processing
-------------------------------------------------------------------------------

**LineSeparators Enum**

.. code-block:: python

    class LineSeparators( __.enum.Enum ):
        ''' Line separators for cross-platform text processing. '''

        CR = '\r'     # Classic MacOS (0xD)
        CRLF = '\r\n' # DOS/Windows (0xD 0xA)
        LF = '\n'     # Unix/Linux (0xA)

        @classmethod
        def detect_bytes(
            selfclass,
            content: __.cabc.Sequence[ int ] | bytes,
            limit: int = 1024
        ) -> __.typx.Optional[ 'LineSeparators' ]:
            ''' Detects line separator from byte content sample.

                Returns detected LineSeparators enum member or None.
            '''

        @classmethod
        def normalize_universal( selfclass, content: str ) -> str:
            ''' Normalizes all line separators to Unix LF format.
            '''

        def normalize( self, content: str ) -> str:
            ''' Normalizes specific line separator to Unix LF format.
            '''

        def nativize( self, content: str ) -> str:
            ''' Converts Unix LF to this platform's line separator.
            '''
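
The behavior specified above can be sketched as follows. This is only an
illustration under stated assumptions, not the migrated implementation: it
uses the standard-library ``enum`` module directly instead of the ``__``
import hub, and it assumes that the first separator found within the sample
decides the result.

.. code-block:: python

    import enum
    from typing import Optional, Sequence, Union


    class LineSeparators( enum.Enum ):
        ''' Illustrative stand-in for the enum specified above. '''

        CR = '\r'
        CRLF = '\r\n'
        LF = '\n'

        @classmethod
        def detect_bytes(
            selfclass,
            content: Union[ Sequence[ int ], bytes ],
            limit: int = 1024,
        ) -> Optional[ 'LineSeparators' ]:
            # Assumption: the first separator seen within the sample wins.
            found_cr = False
            for byte in content[ : limit ]:
                if byte == 0x0D:
                    if found_cr: return selfclass.CR  # CR followed by CR
                    found_cr = True
                elif byte == 0x0A:
                    return selfclass.CRLF if found_cr else selfclass.LF
                elif found_cr: return selfclass.CR  # CR then ordinary byte
            # A trailing CR at the end of the sample still counts as CR.
            return selfclass.CR if found_cr else None

        @classmethod
        def normalize_universal( selfclass, content: str ) -> str:
            # Replace CRLF first so its CR is not converted a second time.
            return content.replace( '\r\n', '\n' ).replace( '\r', '\n' )

        def normalize( self, content: str ) -> str:
            if self is LineSeparators.LF: return content
            return content.replace( self.value, '\n' )

        def nativize( self, content: str ) -> str:
            if self is LineSeparators.LF: return content
            return content.replace( '\n', self.value )

Under these assumptions, a caller would detect the separator from the raw
bytes, decode, and then call ``normalize`` on the detected member, falling
back to ``normalize_universal`` when detection returns ``None``.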

Interface Contract Principles
===============================================================================

Wide Parameters, Narrow Returns
-------------------------------------------------------------------------------

**Parameter Design:**

- Accept abstract base classes for maximum flexibility
- Support multiple input formats (bytes, Path, str, Sequence[str])
- Use Union types for naturally variable inputs

**Return Design:**

- Return concrete, immutable types (str, tuple, enum members)
- Prefer specific types over generic containers
- Use None for explicit "not detected" semantics

**Examples:**

.. code-block:: python

    # Wide parameters: accept any sequence-like or path-like input
    location: __.cabc.Sequence[ str ] | __.Path | str
    content: __.cabc.Sequence[ int ] | bytes

    # Narrow returns: specific immutable types
    -> __.typx.Optional[ str ]                        # Explicit None for "not detected"
    -> tuple[ str, __.typx.Optional[ str ] ]          # Immutable tuple with concrete types
    -> __.typx.Optional[ LineSeparators ]             # Specific enum member
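
One practical consequence of the wide ``location`` parameter is that ``str``
is itself a ``Sequence[ str ]``, so any normalization has to test for ``str``
before treating the value as path components. The helper below is a
hypothetical sketch (the name ``normalize_location`` and the choice to join
components with forward slashes are assumptions, not part of the specified
API):

.. code-block:: python

    from __future__ import annotations

    from collections.abc import Sequence
    from pathlib import Path, PurePosixPath


    def normalize_location( location: Sequence[ str ] | Path | str ) -> str:
        ''' Collapses any accepted location form into one narrow type. '''
        # str is itself a Sequence[ str ], so it must be tested first.
        if isinstance( location, str ): return location
        if isinstance( location, Path ): return location.as_posix( )
        # Assumption: any remaining sequence holds path components.
        return PurePosixPath( *location ).as_posix( )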

Type Annotation Patterns
-------------------------------------------------------------------------------

**Function Signatures:**

.. code-block:: python

    # Use Annotated for documented parameter types
    Content: __.typx.TypeAlias = __.typx.Annotated[
        bytes,
        __.ddoc.Doc( "Raw byte content for analysis." )
    ]

    Location: __.typx.TypeAlias = __.typx.Annotated[
        __.typx.Union[ str, __.Path, __.cabc.Sequence[ str ] ],
        __.ddoc.Doc( "File path, URL, or path components for context." )
    ]

    # Comprehensive annotations with Absential pattern
    def detect_mimetype_and_charset(
        content: Content,
        location: Location, *,
        mimetype: __.Absential[ str ] = __.absent,
        charset: __.Absential[ str ] = __.absent,
    ) -> tuple[ str, __.typx.Optional[ str ] ]:
        ''' Detects MIME type and charset with optional parameter overrides. '''

**Absential Pattern Usage:**

- Distinguish "not provided" (absent) from "explicitly None"
- Enable three-state parameters: absent | None | value
- Preserve complex parameter handling from mimeogram
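
The three-state idea can be illustrated with a plain sentinel. The sketch
below does not use the project's ``Absential`` and ``absent`` names from the
``__`` hub; the ``_Absent`` enum and the ``resolve_charset`` helper are
hypothetical stand-ins, and the way an explicit ``None`` is honored here is
an assumption about the intended semantics.

.. code-block:: python

    from __future__ import annotations

    import enum
    from typing import Optional, Union


    class _Absent( enum.Enum ):
        ''' Stand-in sentinel type; not the project's actual sentinel. '''

        ABSENT = enum.auto( )


    absent = _Absent.ABSENT


    def resolve_charset(
        charset: Union[ _Absent, Optional[ str ] ] = absent,
        detected: Optional[ str ] = None,
    ) -> Optional[ str ]:
        ''' Resolves the three parameter states to an effective charset. '''
        # Not provided: fall back to whatever detection produced.
        if charset is absent: return detected
        # Explicit None or explicit value: honor the caller's choice.
        return charset

A default of ``None`` could not express this, because the function would be
unable to tell an omitted argument from a deliberate ``None``.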

Module Organization Design
===============================================================================

Package Structure
-------------------------------------------------------------------------------

.. code-block:: text

    sources/detextive/
    ├── __/
    │   ├── __init__.py          # Re-exports: cabc, typx, enum, Absential
    │   ├── imports.py           # chardet, puremagic, mimetypes
    │   └── nomina.py            # Project-specific constants
    ├── __init__.py              # Public API re-exports from implementation modules
    ├── py.typed                 # Type checking marker
    ├── detection.py             # Core detection function implementations
    ├── exceptions.py            # Package exception hierarchy
    └── lineseparators.py        # LineSeparators enum and utilities

**Module Responsibilities:**

**`__init__.py` (Main Module):**

- Re-exports public API from implementation modules
- Module organization: imports → re-exports

**`detection.py`:**

- Core detection function implementations: ``detect_charset``,
  ``detect_mimetype``, ``detect_mimetype_and_charset``
- Textual content validation: ``is_textual_mimetype``, ``is_textual_content``
- Private heuristic functions: ``_is_probable_textual_content`` (used
  internally by validation logic)
- Consolidates detection logic from all source implementations

**`lineseparators.py`:**

- LineSeparators enum class with all methods
- Direct migration preserving existing byte-level detection logic
- Cross-platform line ending handling utilities

**`exceptions.py`:**

- Package exception hierarchy: Omniexception → Omnierror → specific exceptions
- Detection-specific exceptions following nomenclature patterns

**Additional Dependencies:**

The implementation requires the external libraries ``chardet`` and
``puremagic``, the standard-library ``mimetypes`` module, and ``dynadoc`` for
parameter documentation annotations.

**Private Constants Organization:**

.. code-block:: python

    # Textual MIME type patterns (consolidated from all sources)
    _TEXTUAL_MIME_TYPES = frozenset((
        'application/json',
        'application/xml',
        'application/javascript',
        'application/ecmascript',
        'application/graphql',          # From ai-experiments
        'application/ld+json',          # From cache proxy
        'application/x-httpd-php',      # From ai-experiments
        'application/x-latex',          # From ai-experiments
        'application/x-perl',           # From mimeogram
        'application/x-python',         # From mimeogram
        'application/x-ruby',           # From mimeogram
        'application/x-shell',          # From mimeogram
        'application/x-tex',            # From ai-experiments
        'application/x-yaml',           # From cache proxy
        'application/yaml',             # From cache proxy
        'image/svg+xml',
    ))

    _TEXTUAL_SUFFIXES = ('+xml', '+json', '+yaml', '+toml')
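
A minimal sketch of how ``is_textual_mimetype`` might consult these constants
is shown below. The set is abbreviated for brevity, and the order of checks
and the lack of case or parameter normalization (for example, stripping
``; charset=...``) are assumptions rather than specified behavior.

.. code-block:: python

    # Abbreviated; the full consolidated set is listed above.
    _TEXTUAL_MIME_TYPES = frozenset((
        'application/json',
        'application/xml',
        'image/svg+xml',
    ))

    _TEXTUAL_SUFFIXES = ( '+xml', '+json', '+yaml', '+toml' )


    def is_textual_mimetype( mimetype: str ) -> bool:
        ''' Checks prefix, then exact matches, then structured suffixes. '''
        if mimetype.startswith( 'text/' ): return True
        if mimetype in _TEXTUAL_MIME_TYPES: return True
        # str.endswith accepts a tuple and matches any of its members.
        return mimetype.endswith( _TEXTUAL_SUFFIXES )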

Exception Hierarchy Design
===============================================================================

Following Omniexception → Omnierror Pattern
-------------------------------------------------------------------------------

.. code-block:: python

    class Omniexception( __.immut.Object, BaseException ):
        ''' Base for all exceptions raised by detextive package. '''

    class Omnierror( Omniexception, Exception ):
        ''' Base for error exceptions raised by detextive package. '''

    # Specific exceptions following nomenclature patterns
    class CharsetDetectFailure( Omnierror, RuntimeError ):
        ''' Raised when character encoding detection fails. '''

    class ContentDecodeFailure( Omnierror, UnicodeError ):
        ''' Raised when content cannot be decoded with detected charset. '''

    class TextualMimetypeInvalidity( Omnierror, ValueError ):
        ''' Raised when MIME type is invalid for textual content processing. '''
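
Because each specific exception also derives from a built-in exception type,
callers can catch either the package base or the familiar built-in. The
sketch below demonstrates this with simplified stand-ins: the bases omit the
``__.immut.Object`` mixin, and the ``decode`` helper is hypothetical.

.. code-block:: python

    class Omniexception( BaseException ):
        ''' Stand-in base; the specified class also mixes in immutability. '''


    class Omnierror( Omniexception, Exception ):
        ''' Stand-in base for error exceptions. '''


    class ContentDecodeFailure( Omnierror, UnicodeError ):
        ''' Raised when content cannot be decoded with detected charset. '''


    def decode( content: bytes, charset: str ) -> str:
        ''' Hypothetical helper which wraps decode errors. '''
        try: return content.decode( charset )
        except UnicodeDecodeError as exc:
            # Chain the original error so the cause remains inspectable.
            raise ContentDecodeFailure(
                f"Could not decode content as {charset!r}." ) from exc

Code that already handles ``UnicodeError`` keeps working, while code written
against this package can catch ``Omnierror`` to cover every package error.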

Implementation Considerations
===============================================================================

Behavioral Fidelity Requirements
-------------------------------------------------------------------------------

**UTF-8 Bias Logic:**

- Prefer UTF-8 for ASCII-compatible content
- Validate detected charsets through trial decoding
- Return 'utf-8' when content detected as a non-UTF charset still decodes
  successfully as UTF-8
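
One way to read the rules above is as a post-processing step applied to a
detector's guess. The sketch below assumes that reading: ``prefer_utf8`` is a
hypothetical helper, the ``detected`` argument stands in for whatever a
detector such as chardet reports, and returning ``None`` on a failed trial
decode is an assumption.

.. code-block:: python

    from typing import Optional


    def prefer_utf8(
        content: bytes, detected: Optional[ str ]
    ) -> Optional[ str ]:
        ''' Applies UTF-8 preference and trial decoding to a guess. '''
        if detected is None: return None
        try:
            content.decode( 'utf-8' )
        except UnicodeDecodeError:
            pass
        else:
            # UTF-8 wins whenever the content decodes cleanly as UTF-8,
            # which always holds for pure ASCII content.
            return 'utf-8'
        try:
            content.decode( detected )
        except ( UnicodeDecodeError, LookupError ):
            # Guess is wrong or names an unknown codec: nothing reliable.
            return None
        return detected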

**MIME Type Fallback Chain:**

- Primary: puremagic content-based detection
- Fallback: mimetypes extension-based detection
- Default: 'text/plain' if charset detected, 'application/octet-stream' otherwise
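
The chain can be sketched as three ordered steps. To keep the example free of
third-party imports, the content-based step is injected as a callable; the
``content_detector`` parameter and the ``detect_with_fallback`` name are
hypothetical, and in the real implementation that step would be backed by
puremagic.

.. code-block:: python

    import mimetypes
    from typing import Callable, Optional


    def detect_with_fallback(
        content: bytes,
        location: str,
        charset: Optional[ str ],
        content_detector: Callable[ [ bytes ], Optional[ str ] ],
    ) -> str:
        ''' Walks the primary, fallback, and default steps in order. '''
        # Primary: content-based detection (stand-in for puremagic).
        mimetype = content_detector( content )
        if mimetype is None:
            # Fallback: extension-based detection from the location.
            mimetype, _ = mimetypes.guess_type( location )
        if mimetype is None:
            # Default: depends on whether a charset was detected.
            mimetype = (
                'text/plain' if charset else 'application/octet-stream' )
        return mimetype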

**Parameter Validation:**

- Preserve complex logic from ``detect_mimetype_and_charset``
- Apply textual MIME type validation with trial decoding
- Handle mixed parameter states using Absential pattern

**Performance Characteristics:**

- Sample-based line separator detection (default 1KB limit) for performance on large files
- Lazy evaluation of detection operations
- Minimal abstraction to preserve existing performance