Line Separator Processing

This section demonstrates cross-platform line ending detection and normalization. Examples cover mixed content handling and platform-specific conversions.

Line Separator Detection

Detecting Line Endings in Bytes

Detect the line separator used in byte content:

>>> import detextive
>>> from detextive import LineSeparators

>>> unix_content = b'Line 1\nLine 2\nLine 3'
>>> separator = LineSeparators.detect_bytes( unix_content )
>>> separator
<LineSeparators.LF: '\n'>

Windows-style line endings:

>>> windows_content = b'Line 1\r\nLine 2\r\nLine 3'
>>> separator = LineSeparators.detect_bytes( windows_content )
>>> separator
<LineSeparators.CRLF: '\r\n'>

Detecting Line Endings in Text

Detection also works with text strings:

>>> mixed_content = 'Line 1\r\nLine 2\rLine 3\n'
>>> separator = LineSeparators.detect_text( mixed_content )
>>> separator
<LineSeparators.CRLF: '\r\n'>

When line endings are mixed, the first detected type is returned:

>>> mixed_unix_first = 'A\nB\nC\nD\r\nE'
>>> separator = LineSeparators.detect_text( mixed_unix_first )
>>> separator
<LineSeparators.LF: '\n'>
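The "first detected type" rule can be illustrated with the standard library alone. The following is a minimal sketch, not the library's implementation: the helper name `first_separator` is hypothetical, and it assumes detection simply scans from the start of the text and stops at the first separator.

```python
import re
from typing import Optional

# List CRLF before a lone CR so that '\r\n' is matched as one separator.
_SEPARATOR_PATTERN = re.compile(r'\r\n|\r|\n')


def first_separator(text: str) -> Optional[str]:
    """Return the first line separator found in text, or None if there is none.

    Hypothetical helper for illustration; not part of detextive.
    """
    match = _SEPARATOR_PATTERN.search(text)
    return match.group(0) if match else None


print(repr(first_separator('A\nB\nC\nD\r\nE')))             # '\n'
print(repr(first_separator('Line 1\r\nLine 2\rLine 3\n')))  # '\r\n'
print(repr(first_separator('no separators here')))          # None
```

Under this rule the order of separators matters, not how often each one occurs.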

Line Ending Normalization

Universal Normalization

Normalize any line endings to Python’s standard (LF):

>>> mixed_content = 'Line 1\r\nLine 2\rLine 3\n'
>>> normalized = LineSeparators.normalize_universal( mixed_content )
>>> normalized
'Line 1\nLine 2\nLine 3\n'

The normalization handles all three line ending types (LF, CRLF, and CR):

>>> complex_content = 'Unix\nWindows\r\nMac\rMixed'
>>> normalized = LineSeparators.normalize_universal( complex_content )
>>> normalized
'Unix\nWindows\nMac\nMixed'
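For comparison, the same result can be produced with plain string replacement. This is a sketch of the general technique, assuming nothing about how `normalize_universal` is implemented; the function name `normalize_newlines` is hypothetical.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR separators to LF.

    Hypothetical helper for illustration; not part of detextive.
    """
    # Replace CRLF first. If lone CR were replaced first, every '\r\n'
    # would turn into two newlines instead of one.
    return text.replace('\r\n', '\n').replace('\r', '\n')


print(repr(normalize_newlines('Unix\nWindows\r\nMac\rMixed')))
# 'Unix\nWindows\nMac\nMixed'
```

The order of the two replacements is the only subtle point: CRLF has to be handled before lone CR.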

Platform-Specific Conversion

Convert normalized text to specific line ending formats:

>>> normalized = 'Line 1\nLine 2\nLine 3'
>>> windows_format = LineSeparators.CRLF.nativize( normalized )
>>> windows_format
'Line 1\r\nLine 2\r\nLine 3'

Unix format (no change needed):

>>> unix_format = LineSeparators.LF.nativize( normalized )
>>> unix_format
'Line 1\nLine 2\nLine 3'
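The conversion step can be sketched as a single replacement over LF-normalized text. The helper `to_separator` below is hypothetical and only illustrates why the input should be normalized first; it makes no claim about how `nativize` treats input that is not normalized.

```python
def to_separator(text: str, separator: str) -> str:
    """Replace every LF in text with the given separator.

    Hypothetical helper for illustration; assumes text uses LF only.
    """
    if separator == '\n':
        return text  # already in the target format
    return text.replace('\n', separator)


normalized = 'Line 1\nLine 2\nLine 3'
print(repr(to_separator(normalized, '\r\n')))  # 'Line 1\r\nLine 2\r\nLine 3'
print(repr(to_separator(normalized, '\n')))    # 'Line 1\nLine 2\nLine 3'

# In this sketch, skipping normalization corrupts text that already has CRLF:
print(repr(to_separator('a\r\nb', '\r\n')))    # 'a\r\r\nb'
```

The last call shows why a naive converter should only ever receive normalized text.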

Complete Processing Workflow

Detection and Normalization Pipeline

A typical workflow for handling text with unknown line endings:

>>> import detextive
>>> from detextive import LineSeparators

>>> # Content with mixed line endings
>>> raw_content = 'Header\r\nUnix line\nMac line\rFooter'

>>> # Detect the line ending (the first type encountered)
>>> detected = LineSeparators.detect_text( raw_content )
>>> print( f"Detected line ending: {detected.name}" )
Detected line ending: CRLF

>>> # Normalize to Python standard
>>> normalized = LineSeparators.normalize_universal( raw_content )
>>> print( f"Normalized: {repr( normalized )}" )
Normalized: 'Header\nUnix line\nMac line\nFooter'

>>> # Convert to target platform
>>> target_format = LineSeparators.CRLF.nativize( normalized )
>>> print( f"Target format: {repr( target_format )}" )
Target format: 'Header\r\nUnix line\r\nMac line\r\nFooter'

Processing Binary Content

Detect line endings in binary data, then decode it to text for normalization:

>>> import detextive
>>> from detextive import LineSeparators

>>> # Binary content with mixed line endings
>>> binary_content = b'Data\r\nMore data\nFinal data\r'

>>> # Detect line separator
>>> separator = LineSeparators.detect_bytes( binary_content )
>>> print( f"Binary line ending: {separator.name}" )
Binary line ending: CRLF

>>> # Decode to text for normalization
>>> text_content = binary_content.decode( 'utf-8' )
>>> normalized = LineSeparators.normalize_universal( text_content )
>>> print( f"Normalized text: {repr( normalized )}" )
Normalized text: 'Data\nMore data\nFinal data\n'
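Working from bytes matters because Python's default text mode already translates line endings on read, which would hide the original separators from any detection step. The sketch below uses only the standard library to show the difference; the file name `sample.txt` and the temporary directory are illustrative assumptions.

```python
import os
import tempfile

raw = b'Data\r\nMore data\nFinal data\r'

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'sample.txt')
    with open(path, 'wb') as stream:
        stream.write(raw)

    # Default text mode: '\r\n' and '\r' are translated to '\n' on read.
    with open(path, encoding='utf-8') as stream:
        translated = stream.read()

    # newline='' disables translation, so the original separators survive.
    with open(path, encoding='utf-8', newline='') as stream:
        preserved = stream.read()

print(repr(translated))  # 'Data\nMore data\nFinal data\n'
print(repr(preserved))   # 'Data\r\nMore data\nFinal data\r'
```

Reading in binary mode, or in text mode with `newline=''`, keeps the separators intact so they can still be detected.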

Edge Cases and Special Handling

Empty and Single-Line Content

Detection returns None when the content contains no line separator:

>>> # Empty content
>>> empty_separator = LineSeparators.detect_text( '' )
>>> empty_separator is None
True

>>> # Single line without ending
>>> single_line = 'Just one line'
>>> single_separator = LineSeparators.detect_text( single_line )
>>> single_separator is None
True
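Because detection can return None, calling code needs a fallback before converting text. One possible pattern is sketched below with plain strings; the helper `separator_or_default` is hypothetical, and falling back to `os.linesep` is an assumption about what a caller might want, not library behavior.

```python
import os
from typing import Optional


def separator_or_default(detected: Optional[str],
                         default: str = os.linesep) -> str:
    """Return the detected separator, or a default when none was found.

    Hypothetical helper for illustration; not part of detextive.
    """
    return default if detected is None else detected


print(repr(separator_or_default('\r\n')))              # '\r\n'
print(repr(separator_or_default(None, default='\n')))  # '\n'
```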