API

Package detextive

Detects textual content.

Module detextive.detection

Core detection function implementations.

type detextive.detection.Content = bytes
type detextive.detection.Location = str | pathlib.Path
detextive.detection.detect_charset(content)

Detects character encoding with UTF-8 preference and validation.

Returns None if no reliable encoding can be determined.

Parameters:

content (bytes) – Raw byte content for analysis.

Return type:

str | None

detextive.detection.detect_mimetype(content, location)

Detects MIME type using content analysis and extension fallback.

Returns standardized MIME type strings or None if detection fails.

Parameters:
  • content (bytes) – Raw byte content for analysis.

  • location (str | pathlib.Path) – File path, URL, or path components for context.

Return type:

str | None

detextive.detection.detect_mimetype_and_charset(content, location, *, mimetype=absence.absent, charset=absence.absent)

Detects MIME type and charset with optional parameter overrides.

Returns tuple of (mimetype, charset). MIME type defaults to ‘text/plain’ if charset detected but MIME type unknown, or ‘application/octet-stream’ if neither detected.

Parameters:
  • content (bytes) – Raw byte content for analysis.

  • location (str | pathlib.Path) – File path, URL, or path components for context.

  • mimetype (str | absence.objects.AbsentSingleton)

  • charset (str | absence.objects.AbsentSingleton)

Return type:

tuple[ str, str | None ]

detextive.detection.is_textual_content(content)

Determines if byte content represents textual data.

Returns True for content that can be reliably processed as text.

Parameters:

content (bytes)

Return type:

bool

detextive.detection.is_textual_mimetype(mimetype)

Validates if MIME type represents textual content.

Consolidates textual MIME type patterns from all source implementations. Supports text/* prefix, specific application types (JSON, XML, JavaScript, etc.), and textual suffixes (+xml, +json, +yaml, +toml).

Returns True for MIME types representing textual content.

Parameters:

mimetype (str)

Return type:

bool

Module detextive.lineseparators

Line separator enumeration and utilities.

class detextive.lineseparators.LineSeparators(value)

Bases: Enum

Line separators for cross-platform text processing.

Variables:
classmethod detect_bytes(content, limit=1024)

Detects line separator from byte content sample.

Returns detected LineSeparators enum member or None.

nativize(content)

Converts Unix LF to this platform’s line separator.

normalize(content)

Normalizes specific line separator to Unix LF format.

classmethod normalize_universal(content)

Normalizes all line separators to Unix LF format.

Module detextive.exceptions

Family of exceptions for package API.

exception detextive.exceptions.CharsetDetectFailure(location)

Bases: Omnierror, RuntimeError

Character encoding detection fails.

exception detextive.exceptions.ContentDecodeFailure(location, charset)

Bases: Omnierror, UnicodeError

Content cannot be decoded with detected charset.

exception detextive.exceptions.Omnierror

Bases: Omniexception, Exception

Base for error exceptions raised by package API.

exception detextive.exceptions.Omniexception

Bases: BaseException

Base for all exceptions raised by package API.

exception detextive.exceptions.TextualMimetypeInvalidity(location, mimetype)

Bases: Omnierror, ValueError

MIME type is invalid for textual content processing.