.. vim: set fileencoding=utf-8:
.. -*- coding: utf-8 -*-
.. +--------------------------------------------------------------------------+
   |                                                                          |
   | Licensed under the Apache License, Version 2.0 (the "License");          |
   | you may not use this file except in compliance with the License.         |
   | You may obtain a copy of the License at                                  |
   |                                                                          |
   |     http://www.apache.org/licenses/LICENSE-2.0                           |
   |                                                                          |
   | Unless required by applicable law or agreed to in writing, software      |
   | distributed under the License is distributed on an "AS IS" BASIS,        |
   | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
   | See the License for the specific language governing permissions and      |
   | limitations under the License.                                           |
   |                                                                          |
   +--------------------------------------------------------------------------+


*******************************************************************************
Basic Usage
*******************************************************************************

This section demonstrates core text detection capabilities. Examples progress
from simple detection to combined inference and high-level text processing.

Character Encoding Detection
===============================================================================

Basic Encoding Detection
-------------------------------------------------------------------------------

Detect character encoding from byte content:

.. doctest:: BasicUsage

    >>> import detextive
    >>> content = b'Hello, world!'
    >>> charset = detextive.detect_charset( content )
    >>> charset
    'utf-8'

UTF-8 content containing non-ASCII characters is detected as well:

.. doctest:: BasicUsage

    >>> content = b'Caf\xc3\xa9 \xe2\x98\x85'
    >>> charset = detextive.detect_charset( content )
    >>> charset
    'utf-8'

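Non-UTF-8 encodings can be detected given sufficient content. Detection is
heuristic, so the reported charset may be a close relative of the one actually
used; here, ISO-8859-1 input is reported as ISO-8859-9: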

.. doctest:: BasicUsage

    >>> content = 'Café Restaurant Menu\nEntrées: Soupe, Salade'.encode( 'iso-8859-1' )
    >>> charset = detextive.detect_charset( content )
    >>> charset
    'iso8859-9'
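
ISO-8859-1 and ISO-8859-9 disagree on only six byte values, none of which occur
in this sample, so either label decodes it to the same text. The following
standard-library check, which does not involve ``detextive`` at all, is a small
sketch of that overlap:

.. code-block:: python

    text = 'Café Restaurant Menu\nEntrées: Soupe, Salade'
    content = text.encode( 'iso-8859-1' )

    # Both codecs map the byte for 'é' (0xE9) to the same character,
    # so the sample decodes identically under either label.
    same = content.decode( 'iso8859-9' ) == content.decode( 'iso-8859-1' )
    print( same )  # True

    # Byte values on which the two codecs disagree.
    differing = [
        byte for byte in range( 256 )
        if bytes( [ byte ] ).decode( 'iso-8859-1' )
        != bytes( [ byte ] ).decode( 'iso8859-9' ) ]
    print( [ hex( byte ) for byte in differing ] )
    # ['0xd0', '0xdd', '0xde', '0xf0', '0xfd', '0xfe']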

MIME Type Detection
===============================================================================

Content-Based Detection
-------------------------------------------------------------------------------

Detect MIME types from content alone, using magic bytes and other content
signatures:

.. doctest:: BasicUsage

    >>> import detextive
    >>> json_content = b'{"name": "example", "value": 42}'
    >>> mimetype = detextive.detect_mimetype( json_content )
    >>> mimetype in ('application/json', 'text/plain')  # text/plain on Windows with python-magic-bin
    True

Location-aware detection combines content analysis with file extension:

.. code-block:: python

    # For plain text without magic bytes, location helps determine MIME type
    text_content = b'Plain text content'
    try:
        mimetype = detextive.detect_mimetype( text_content, location = 'document.txt' )
        print( f"Text file MIME type: {mimetype}" )
    except detextive.exceptions.MimetypeDetectFailure:
        print( "Could not detect MIME type - need more distinctive content" )
    # Note: Plain text without magic bytes may require charset detection
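
As a rough illustration of what a location hint can contribute, the standard
library's ``mimetypes`` module maps a file extension to a MIME type. This is
only a sketch of extension-based guessing. It is not necessarily how
``detextive`` uses ``location`` internally, and the helper name
``guess_from_location`` is hypothetical:

.. code-block:: python

    import mimetypes

    def guess_from_location( location ):
        ''' Guesses MIME type from the file extension alone.

            Returns None when the extension is missing or unknown.
            (Hypothetical helper; not part of detextive.)
        '''
        mimetype, _encoding = mimetypes.guess_type( location )
        return mimetype

    print( guess_from_location( 'document.txt' ) )   # text/plain
    print( guess_from_location( 'no-extension' ) )   # None

An extension-only guess never inspects the bytes, which is why combining it
with content analysis is more robust than either signal alone.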

Binary formats are identified from their signatures, such as the PDF header:

.. doctest:: BasicUsage

    >>> pdf_header = b'%PDF-1.4'
    >>> mimetype = detextive.detect_mimetype( pdf_header )
    >>> mimetype
    'application/pdf'

Combined Inference
===============================================================================

MIME Type and Charset Together
-------------------------------------------------------------------------------

For best accuracy, detect both MIME type and charset simultaneously:

.. doctest:: BasicUsage

    >>> import detextive
    >>> content = b'{"message": "Hello"}'
    >>> mimetype, charset = detextive.infer_mimetype_charset( content, location = 'data.json' )
    >>> mimetype
    'application/json'
    >>> charset
    'utf-8'

Plain text files with location context:

.. doctest:: BasicUsage

    >>> content = b'Sample document content'
    >>> mimetype, charset = detextive.infer_mimetype_charset( content, location = 'readme.txt' )
    >>> mimetype
    'text/plain'
    >>> charset
    'utf-8'

Confidence-Based Detection
-------------------------------------------------------------------------------

Access confidence scores for detection decisions using the confidence API:

.. doctest:: BasicUsage

    >>> import detextive
    >>> content = b'{"name": "example", "data": "test"}'
    >>> mimetype_result, charset_result = detextive.infer_mimetype_charset_confidence( content, location = 'config.json' )
    >>> mimetype_result.mimetype
    'application/json'
    >>> mimetype_result.confidence > 0.8
    True
    >>> charset_result.charset
    'utf-8'
    >>> charset_result.confidence > 0.8
    True

Confidence scores let callers assess detection quality and decide whether to
trust a result or fall back to a default:

.. doctest:: BasicUsage

    >>> text_content = b'Plain text without magic bytes'
    >>> mimetype_result, charset_result = detextive.infer_mimetype_charset_confidence( text_content, location = 'notes.txt' )
    >>> mimetype_result.mimetype
    'text/plain'
    >>> mimetype_result.confidence > 0.7
    True
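
One way to act on a confidence score is a simple threshold with a fallback.
The sketch below assumes only that a result exposes ``charset`` and
``confidence`` attributes, as in the examples above. The ``StandInResult``
class, the ``choose_charset`` helper, the ``0.8`` threshold, and the
``'utf-8'`` fallback are all illustrative assumptions, not part of
``detextive``:

.. code-block:: python

    from typing import NamedTuple

    class StandInResult( NamedTuple ):
        ''' Stand-in for a charset result (hypothetical; for illustration). '''
        charset: str
        confidence: float

    def choose_charset( result, threshold = 0.8, fallback = 'utf-8' ):
        ''' Returns detected charset if confident enough, else fallback. '''
        if result.confidence >= threshold: return result.charset
        return fallback

    confident = StandInResult( charset = 'iso8859-9', confidence = 0.95 )
    uncertain = StandInResult( charset = 'iso8859-9', confidence = 0.55 )
    print( choose_charset( confident ) )  # iso8859-9
    print( choose_charset( uncertain ) )  # utf-8

An appropriate threshold depends on the application; stricter pipelines may
prefer to reject low-confidence content rather than fall back silently.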

High-Level Decoding
===============================================================================

Automatic Text Decoding
-------------------------------------------------------------------------------

The ``decode`` function provides complete bytes-to-text processing:

.. doctest:: BasicUsage

    >>> import detextive
    >>> content = b'Hello, world!'
    >>> text = detextive.decode( content )
    >>> text
    'Hello, world!'

UTF-8 content is properly decoded:

.. doctest:: BasicUsage

    >>> content = b'Caf\xc3\xa9 \xe2\x98\x85'
    >>> text = detextive.decode( content )
    >>> text
    'Café ★'

Location context improves decoding decisions:

.. doctest:: BasicUsage

    >>> content = b'Sample content for analysis'
    >>> text = detextive.decode( content, location = 'document.txt' )
    >>> text
    'Sample content for analysis'

Content Validation
===============================================================================

MIME Type Classification
-------------------------------------------------------------------------------

Check whether a MIME type represents textual content:

.. doctest:: BasicUsage

    >>> import detextive
    >>> detextive.is_textual_mimetype( 'text/plain' )
    True
    >>> detextive.is_textual_mimetype( 'application/json' )
    True
    >>> detextive.is_textual_mimetype( 'image/jpeg' )
    False
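
For intuition, one common heuristic for this kind of classification treats
``text/*`` types, a short list of known textual ``application/*`` types, and
structured-syntax suffixes such as ``+json`` or ``+xml`` as textual. The sketch
below is an assumption about how such a check could work. The rules actually
applied by ``is_textual_mimetype`` are not shown in this section and may
differ, and the name ``looks_textual`` and its type lists are hypothetical:

.. code-block:: python

    # Illustrative lists only; not taken from detextive.
    TEXTUAL_APPLICATION_TYPES = frozenset( (
        'application/json',
        'application/xml',
        'application/javascript',
    ) )
    TEXTUAL_SUFFIXES = ( '+json', '+xml' )

    def looks_textual( mimetype ):
        ''' Heuristically classifies a MIME type as textual or not. '''
        # Drop parameters such as '; charset=utf-8' before comparing.
        essence = mimetype.split( ';', 1 )[ 0 ].strip( ).lower( )
        if essence.startswith( 'text/' ): return True
        if essence in TEXTUAL_APPLICATION_TYPES: return True
        return essence.endswith( TEXTUAL_SUFFIXES )

    print( looks_textual( 'text/plain' ) )        # True
    print( looks_textual( 'application/json' ) )  # True
    print( looks_textual( 'image/jpeg' ) )        # False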

Text Quality Validation
-------------------------------------------------------------------------------

Validate that decoded text meets quality standards:

.. doctest:: BasicUsage

    >>> import detextive
    >>> text = "Hello, world!"
    >>> detextive.is_valid_text( text )
    True

Text with control characters fails validation:

.. doctest:: BasicUsage

    >>> text_with_controls = "Hello\x00\x01world"
    >>> detextive.is_valid_text( text_with_controls )
    False

Validation results for several kinds of input, including whitespace-only and
empty strings:

.. doctest:: BasicUsage

    >>> detextive.is_valid_text( "Hello, world!" )
    True
    >>> detextive.is_valid_text( "Hello\x00\x01world" )
    False
    >>> detextive.is_valid_text( "   \n\t  " )
    True
    >>> detextive.is_valid_text( "" )
    True
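
The results above are consistent with a rule that rejects control characters
other than common whitespace. The following standard-library sketch implements
only that rule. It is an assumption for illustration, since the actual checks
performed by ``is_valid_text`` may be broader, and the helper name
``has_no_control_characters`` is hypothetical:

.. code-block:: python

    import unicodedata

    # Control characters that ordinary text legitimately contains.
    ALLOWED_CONTROLS = frozenset( '\t\n\r' )

    def has_no_control_characters( text ):
        ''' Returns True if text has no disallowed control characters.

            Characters in Unicode category 'Cc' are rejected unless they
            are tab, newline, or carriage return. Empty text passes.
        '''
        return all(
            character in ALLOWED_CONTROLS
            or unicodedata.category( character ) != 'Cc'
            for character in text )

    print( has_no_control_characters( 'Hello, world!' ) )       # True
    print( has_no_control_characters( 'Hello\x00\x01world' ) )  # False
    print( has_no_control_characters( '   \n\t  ' ) )           # True
    print( has_no_control_characters( '' ) )                    # True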