API¶
Package detextive
¶
Detects textual content.
Module detextive.detection
¶
Core detection function implementations.
- type detextive.detection.Location = str | pathlib.Path¶
- detextive.detection.detect_charset(content)¶
Detects character encoding with UTF-8 preference and validation.
Returns None if no reliable encoding can be determined.
- detextive.detection.detect_mimetype(content, location)¶
Detects MIME type using content analysis and extension fallback.
Returns standardized MIME type strings or None if detection fails.
- Parameters:
content (bytes) – Raw byte content for analysis.
location (str | pathlib.Path) – File path, URL, or path components for context.
- Return type:
str | None
- detextive.detection.detect_mimetype_and_charset(content, location, *, mimetype=absence.absent, charset=absence.absent)¶
Detects MIME type and charset with optional parameter overrides.
Returns tuple of (mimetype, charset). MIME type defaults to ‘text/plain’ if charset detected but MIME type unknown, or ‘application/octet-stream’ if neither detected.
- detextive.detection.is_textual_content(content)¶
Determines if byte content represents textual data.
Returns True for content that can be reliably processed as text.
- detextive.detection.is_textual_mimetype(mimetype)¶
Validates if MIME type represents textual content.
Consolidates textual MIME type patterns from all source implementations. Supports text/* prefix, specific application types (JSON, XML, JavaScript, etc.), and textual suffixes (+xml, +json, +yaml, +toml).
Returns True for MIME types representing textual content.
Module detextive.lineseparators
¶
Line separator enumeration and utilities.
- class detextive.lineseparators.LineSeparators(value)¶
Bases:
Enum
Line separators for cross-platform text processing.
- Variables:
- classmethod detect_bytes(content, limit=1024)¶
Detects line separator from byte content sample.
Returns detected LineSeparators enum member or None.
- nativize(content)¶
Converts Unix LF to this platform’s line separator.
- normalize(content)¶
Normalizes specific line separator to Unix LF format.
- classmethod normalize_universal(content)¶
Normalizes all line separators to Unix LF format.
Module detextive.exceptions
¶
Family of exceptions for package API.
- exception detextive.exceptions.CharsetDetectFailure(location)¶
Bases:
Omnierror
,RuntimeError
Character encoding detection fails.
- exception detextive.exceptions.ContentDecodeFailure(location, charset)¶
Bases:
Omnierror
,UnicodeError
Content cannot be decoded with detected charset.
- exception detextive.exceptions.Omnierror¶
Bases:
Omniexception
,Exception
Base for error exceptions raised by package API.
- exception detextive.exceptions.Omniexception¶
Bases:
BaseException
Base for all exceptions raised by package API.
- exception detextive.exceptions.TextualMimetypeInvalidity(location, mimetype)¶
Bases:
Omnierror
,ValueError
MIME type is invalid for textual content processing.