detextive

🕵️ A Python library which provides consolidated text detection capabilities for reliable content analysis. Offers MIME type detection, character set detection, and line separator processing.

Key Features ⭐

🔍 MIME Type Detection

Content-based detection using magic bytes, with a file extension fallback, for comprehensive format identification.
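To make "magic bytes with a file extension fallback" concrete, here is a minimal standard-library sketch. It is not detextive's implementation: the `sniff_mimetype` function, the `MAGIC_SIGNATURES` table, and the default type are hypothetical names chosen for illustration, and a real detector knows far more signatures.

```python
import mimetypes

# Hypothetical, deliberately tiny signature table (leading bytes -> MIME type).
MAGIC_SIGNATURES = (
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'\xff\xd8\xff', 'image/jpeg'),
    (b'%PDF-', 'application/pdf'),
    (b'PK\x03\x04', 'application/zip'),
)

# Assumption: unknown content is reported as generic binary data.
DEFAULT_MIMETYPE = 'application/octet-stream'


def sniff_mimetype(content: bytes, location: str = '') -> str:
    """Guess a MIME type from leading bytes, then from the file extension."""
    for signature, mimetype in MAGIC_SIGNATURES:
        if content.startswith(signature):
            return mimetype
    # No signature matched, so fall back to the extension, if there is one.
    guessed, _ = mimetypes.guess_type(location)
    return guessed or DEFAULT_MIMETYPE
```

Because the content is checked first, a PNG saved under a misleading name such as `mislabeled.txt` is still reported as `image/png` in this sketch; the extension only matters when no signature matches.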

📝 Character Encoding Detection

Statistical analysis with a UTF-8 fast path and validation by trial decoding for reliable text processing.
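The "validation through decode operations" idea can be sketched with the standard library alone. The sketch below swaps statistical analysis for a much simpler technique, trying a fixed list of candidates in order, and it assumes that the UTF-8 optimization means UTF-8 is tried first. The `guess_charset` function and `CANDIDATE_CHARSETS` tuple are hypothetical and are not part of detextive.

```python
from typing import Optional

# Hypothetical candidate order: UTF-8 first, then one common legacy charset.
CANDIDATE_CHARSETS = ('utf-8', 'cp1252')


def guess_charset(content: bytes) -> Optional[str]:
    """Return the first candidate charset that decodes the content cleanly."""
    for charset in CANDIDATE_CHARSETS:
        try:
            # Validation step: a charset only counts if decoding succeeds.
            content.decode(charset)
        except UnicodeDecodeError:
            continue
        return charset
    return None
```

Trial decoding can only rule charsets out. Many single-byte charsets accept almost any byte sequence, which is why a real detector pairs this check with statistical analysis.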

📄 Line Separator Processing

Cross-platform line ending detection and normalization supporting CR, LF, and CRLF formats with mixed-content handling.
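As a rough illustration of detection and normalization on mixed content, the following standard-library sketch counts each separator and rewrites all of them to LF. The `dominant_separator` and `normalize_separators` functions are hypothetical helpers, not detextive's API, and choosing the most frequent separator is an assumption about how mixed content might be handled.

```python
import re
from collections import Counter
from typing import Optional

# Match CRLF before a lone CR or LF so that a CRLF pair counts once.
_SEPARATOR_PATTERN = re.compile(r'\r\n|\r|\n')


def dominant_separator(text: str) -> Optional[str]:
    """Return the most frequent line separator, or None if there is none."""
    counts = Counter(_SEPARATOR_PATTERN.findall(text))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def normalize_separators(text: str) -> str:
    """Convert CRLF and lone CR separators to LF."""
    return _SEPARATOR_PATTERN.sub('\n', text)
```

The order of alternatives in the pattern matters: with `\r` listed first, every CRLF would be counted as a CR followed by an LF and would normalize to two line breaks.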

✅ Textual Content Validation

Smart classification of MIME types and content reasonableness assessment using control character and printability heuristics.
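A control character and printability heuristic of this kind might look like the sketch below. It only illustrates the general idea: the `looks_textual` function, the 5% threshold, the UTF-8 decoding step, and the treatment of empty content are all assumptions made here, not detextive's actual rules.

```python
# Control characters that routinely appear in ordinary text.
_COMMON_WHITESPACE = frozenset('\t\n\r\f')


def looks_textual(content: bytes, max_control_ratio: float = 0.05) -> bool:
    """Heuristically decide whether bytes look like human-readable text."""
    if not content:
        return False  # Assumption: empty content is not treated as text.
    try:
        text = content.decode('utf-8')  # Assumption: content is UTF-8.
    except UnicodeDecodeError:
        return False
    # Count characters that are neither printable nor common whitespace.
    suspicious = sum(
        1 for char in text
        if not char.isprintable() and char not in _COMMON_WHITESPACE
    )
    return suspicious / len(text) <= max_control_ratio
```

Allowing a small ratio rather than requiring zero control characters keeps text with an occasional stray byte from being rejected outright.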

Installation 📦

Method: Install Python Package

Install via the uv pip command:

uv pip install detextive

Or, install via pip:

pip install detextive

Examples 💡

Basic Usage

MIME Type and Charset Detection:

import detextive

with open( 'document.txt', 'rb' ) as file:
    content = file.read( )

# Individual detection
mimetype = detextive.detect_mimetype( content, 'document.txt' )
charset = detextive.detect_charset( content )

# Combined detection
mimetype, charset = detextive.detect_mimetype_and_charset(
    content, 'document.txt' )
print( "Detected: {mimetype} with {charset} encoding".format(
    mimetype = mimetype, charset = charset ) )

Line Separator Processing:

import detextive

content = 'Line 1\r\nLine 2\rLine 3\n'
separator = detextive.LineSeparators.detect_bytes( content.encode( ) )

# Normalize line separators to Python standard.
normalized = detextive.LineSeparators.normalize_universal( content )

# Convert to specific line separators.
native = detextive.LineSeparators.CRLF.nativize( normalized )

Content Classification:

import detextive

# Check if MIME type represents textual content
detextive.is_textual_mimetype( 'application/json' )  # True
detextive.is_textual_mimetype( 'image/jpeg' )        # False

# Validate text content from bytes
detextive.is_textual_content( b'Hello world!' )      # True
detextive.is_textual_content( b'\x00\x01\x02\x03' )  # False

Contribution 🤝

Contributions to this project are welcome! However, they must follow the code of conduct for the project.

Please file bug reports and feature requests in the issue tracker or submit pull requests to improve the source code or documentation.

For development guidance and standards, please see the development guide.

Other Projects by This Author 🌟

  • python-absence (absence on PyPI)

    🕳️ A Python library package which provides a sentinel for absent values - a falsey, immutable singleton that represents the absence of a value in contexts where None or False may be valid values.

  • python-accretive (accretive on PyPI)

    🌌 A Python library package which provides accretive data structures - collections which can grow but never shrink.

  • python-classcore (classcore on PyPI)

    🏭 A Python library package which provides foundational class factories and decorators for providing classes with attributes immutability and concealment and other custom behaviors.

  • python-dynadoc (dynadoc on PyPI)

    📝 A Python library package which bridges the gap between rich annotations and automatic documentation generation with configurable renderers and support for reusable fragments.

  • python-falsifier (falsifier on PyPI)

    🎭 A very simple Python library package which provides a base class for falsey objects - objects that evaluate to False in boolean contexts.

  • python-frigid (frigid on PyPI)

    🔒 A Python library package which provides immutable data structures - collections which cannot be modified after creation.

  • python-icecream-truck (icecream-truck on PyPI)

    🍦 Flavorful Debugging - A Python library which enhances the powerful and well-known icecream package with flavored traces, configuration hierarchies, customized outputs, ready-made recipes, and more.

  • python-mimeogram (mimeogram on PyPI)

    📨 A command-line tool for exchanging collections of files with Large Language Models - bundle multiple files into a single clipboard-ready document while preserving directory structure and metadata… good for code reviews, project sharing, and LLM interactions.
