001. Core Engine and Rule Framework Architecture

Status

Accepted

Context

The Python linter needs a core architecture that supports the PRD requirements for:

  • Extensible rule system: Four initial rules with ability to add more

  • Performance targets: Process 1000 lines in <1000ms

  • Configurable behavior: Rule enable/disable and parameter customization

  • Precise error reporting: Line/column positions with actionable messages

  • Incremental development: Rules implemented and deployed individually

The system must balance simplicity for initial development with extensibility for future growth. Analysis of prototype implementations reveals three approaches with varying architectural merit:

  1. Single-file monolithic design: All rules in one visitor class (from GPT-5 sketch) - demonstrates implementation patterns but has architectural limitations

  2. Phased incremental approach: Start simple, add complexity iteratively (from Grok sketch) - excellent development strategy

  3. Modular plugin architecture: Separate rule classes with registry system (from Opus sketch) - superior production architecture

Key architectural forces:

  • Development speed vs. maintainability: Simple designs enable faster initial development but become unmaintainable as rules grow

  • Performance vs. flexibility: Single-pass analysis is faster but multiple rule instances may provide better isolation

  • Configuration complexity: Granular rule control requires more sophisticated configuration management

Decision

We will implement a hybrid modular architecture based primarily on the Opus and Grok approaches, incorporating the GPT-5 implementation patterns where valuable:

Core Engine Design:

  • Central LinterEngine class orchestrates the analysis pipeline

  • Single-pass CST traversal with metadata providers (PositionProvider, ScopeProvider, QualifiedNameProvider)

  • Rule registry system for automatic discovery and configuration

  • Violation collection and deduplication across all rules
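The registry and deduplication items above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: RULE_REGISTRY, register_rule, deduplicate, the RULE_ID attribute, and the "E001" identifier are hypothetical names, and Violation is declared frozen here (an assumption) so that instances are hashable.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)  # frozen (assumption) so violations can live in a set
class Violation:
    rule_id: str
    filename: str
    line: int
    column: int
    message: str
    severity: str = "error"


# Hypothetical registry mapping rule IDs to rule classes.
RULE_REGISTRY: Dict[str, type] = {}


def register_rule(cls: type) -> type:
    """Class decorator that records a rule class under its RULE_ID."""
    rule_id = cls.RULE_ID
    if rule_id in RULE_REGISTRY:
        raise ValueError(f"duplicate rule id: {rule_id}")
    RULE_REGISTRY[rule_id] = cls
    return cls


def deduplicate(violations: Iterable[Violation]) -> List[Violation]:
    """Drop exact duplicates, then order by file, line, column, and rule."""
    unique = set(violations)
    return sorted(unique, key=lambda v: (v.filename, v.line, v.column, v.rule_id))


@register_rule
class LineTooLong:
    RULE_ID = "E001"  # hypothetical identifier
```

Registering at import time means a rule becomes discoverable as soon as its module is imported; other discovery mechanisms would fit the same decision.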

Rule Framework Design:

  • Abstract BaseRule class providing common metadata access and violation reporting

  • Each rule inherits from BaseRule, which itself extends libcst.CSTVisitor

  • Rules operate independently with no cross-dependencies

  • Collection-then-analysis pattern: Rules collect data during traversal, then analyze the collections in post-processing (pattern validated in proof-of-concept)

  • Context extraction pattern validated for enhanced error reporting

  • Standardized violation format with rule ID, location, and message
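To illustrate the collection-then-analysis pattern, here is a hedged sketch of an unused-import check. To stay dependency-free it uses the standard library's ast.NodeVisitor as a stand-in for libcst.CSTVisitor, so it does not show LibCST metadata access. UnusedImportRule, analyze_collections, run_rule, and the "W001" identifier are hypothetical.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Violation:
    rule_id: str
    filename: str
    line: int
    column: int
    message: str
    severity: str = "error"


class UnusedImportRule(ast.NodeVisitor):
    """Hypothetical rule: collect during traversal, analyze afterwards."""

    rule_id = "W001"  # hypothetical identifier

    def __init__(self, filename: str) -> None:
        self.filename = filename
        self.violations: List[Violation] = []
        # Collections filled during traversal.
        self.imported: Dict[str, Tuple[int, int]] = {}  # bound name -> (line, column)
        self.used: Set[str] = set()

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            # "import a.b" binds only the top-level name "a".
            bound = alias.asname or alias.name.split(".")[0]
            self.imported[bound] = (node.lineno, node.col_offset)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        for alias in node.names:
            if alias.name != "*":  # star imports bind unknown names; skip them
                bound = alias.asname or alias.name
                self.imported[bound] = (node.lineno, node.col_offset)

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Load):
            self.used.add(node.id)

    def analyze_collections(self) -> None:
        """Post-processing step: compare the two collections."""
        for name, (line, column) in sorted(self.imported.items()):
            if name not in self.used:
                self.violations.append(
                    Violation(self.rule_id, self.filename, line, column,
                              f"'{name}' imported but unused")
                )


def run_rule(source: str, filename: str = "<string>") -> List[Violation]:
    rule = UnusedImportRule(filename)
    rule.visit(ast.parse(source))  # collection phase
    rule.analyze_collections()     # analysis phase
    return rule.violations
```

Deferring the comparison until traversal ends is what lets the rule see a usage that appears after the import it belongs to.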

Development Progression:

  • Start with simplified single-file prototype for rapid validation (Phase 1)

  • Refactor to modular architecture as rule complexity grows (Phase 2)

  • Add advanced features (configuration, auto-fixes) incrementally (Phase 3+)

Key Architectural Components:

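# Core abstractions
from __future__ import annotations  # lets annotations name types defined elsewhere

from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import List, Type

import libcst as cst
from libcst.metadata import (
    MetadataWrapper,
    PositionProvider,
    QualifiedNameProvider,
    ScopeProvider,
)

@dataclass
class Violation:
    rule_id: str
    filename: str
    line: int
    column: int
    message: str
    severity: str = "error"

# CSTVisitor is listed first so the MRO stays consistent even if LibCST's
# own base classes already derive from ABC.
class BaseRule(cst.CSTVisitor, ABC):
    """Base class for all linting rules implementing collection-then-analysis pattern."""
    METADATA_DEPENDENCIES = (PositionProvider, ScopeProvider, QualifiedNameProvider)

    def __init__(self, filename: str, wrapper: MetadataWrapper) -> None:
        super().__init__()  # let LibCST initialize its visitor state
        self.filename = filename
        self.wrapper = wrapper
        self.violations: List[Violation] = []
        # Subclasses add collection attributes here

    @property
    @abstractmethod
    def rule_id(self) -> str: ...

    def leave_Module(self, original_node: cst.Module) -> None:
        """Trigger analysis of the collected data once traversal completes."""
        self._analyze_collections()

    @abstractmethod
    def _analyze_collections(self) -> None:
        """Analyze collected data and generate violations."""
        ...

class LinterEngine:
    """Central orchestrator for linting analysis."""
    # Takes rule classes rather than instances: a rule is constructed per file,
    # because BaseRule.__init__ needs that file's name and MetadataWrapper.
    # Configuration is provided by the configuration system (Phase 3).
    def __init__(self, rules: List[Type[BaseRule]], config: Configuration) -> None: ...
    def lint_files(self, paths: List[Path]) -> List[Violation]: ...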
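
The Configuration type referenced by LinterEngine is not defined in this record. One possible shape, covering rule enable/disable and parameter customization, is sketched below. Every name here (disabled_rules, rule_options, is_enabled, options_for, select_rules, and the rule IDs and option keys in the usage note) is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Iterable, List, Mapping


@dataclass(frozen=True)
class Configuration:
    """Hypothetical configuration: disabled rules plus per-rule parameters."""

    disabled_rules: FrozenSet[str] = frozenset()
    rule_options: Mapping[str, Mapping[str, Any]] = field(default_factory=dict)

    def is_enabled(self, rule_id: str) -> bool:
        return rule_id not in self.disabled_rules

    def options_for(self, rule_id: str, defaults: Mapping[str, Any]) -> Dict[str, Any]:
        """Merge user overrides on top of a rule's default parameters."""
        merged = dict(defaults)
        merged.update(self.rule_options.get(rule_id, {}))
        return merged


def select_rules(rule_ids: Iterable[str], config: Configuration) -> List[str]:
    """Keep only the rule IDs that the configuration leaves enabled."""
    return [rule_id for rule_id in rule_ids if config.is_enabled(rule_id)]
```

For example, Configuration(disabled_rules=frozenset({"W001"}), rule_options={"E001": {"max_length": 100}}) would switch off a hypothetical W001 rule and override one parameter of a hypothetical E001 rule while leaving its other defaults intact.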

Alternatives

Alternative 1: Pure Single-File Design

Implement all rules as methods in a single CSTVisitor class.

Rejected because:

  • Poor separation of concerns as rule count grows

  • Difficult to configure individual rules

  • Testing becomes complex with intermingled rule logic

  • Violates single responsibility principle

Alternative 2: Pure Plugin Architecture

Implement each rule as completely independent modules with their own CST parsing.

Rejected because:

  • Performance overhead from multiple parse passes

  • Increased memory usage and complexity

  • Harder to share common metadata and utilities

  • Overly complex for initial implementation needs

Alternative 3: Event-Driven Architecture

Use observer pattern with CST events triggering rule callbacks.

Rejected because:

  • Added complexity for minimal benefit in this domain

  • Harder to debug and trace rule execution

  • Potential performance overhead from event dispatch

  • LibCST visitor pattern already provides needed traversal

Consequences

Positive Consequences:

  • Rapid prototyping: Can start with simple implementations and refactor incrementally

  • Good separation of concerns: Each rule focuses on specific code patterns

  • Performance optimization: Single-pass analysis with multiple rule evaluation

  • Easy testing: Rules can be tested independently with focused test cases

  • Configuration flexibility: Individual rule enable/disable and parameter control

  • Future extensibility: New rules require minimal framework changes
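
The single-pass benefit listed above can be illustrated with a small sketch in which the source is parsed once, the tree is walked once, and every node is offered to each independent rule. It uses the standard library's ast module as a stand-in for LibCST, and Rule, PrintCallRule, BareExceptRule, and run_single_pass are hypothetical names. LibCST provides BatchableCSTVisitor and MetadataWrapper.visit_batched for batched traversal; whether the engine adopts them is not stated in this record.

```python
import ast
from typing import Iterable, List


class Rule:
    """Hypothetical minimal rule interface: optional visit_<NodeType> handlers."""

    def __init__(self) -> None:
        self.findings: List[str] = []


class PrintCallRule(Rule):
    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            self.findings.append(f"print() call at line {node.lineno}")


class BareExceptRule(Rule):
    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        if node.type is None:
            self.findings.append(f"bare except at line {node.lineno}")


def run_single_pass(source: str, rules: Iterable[Rule]) -> int:
    """Parse once, walk once, offer each node to every rule.

    Returns the number of nodes visited.
    """
    rule_list = list(rules)
    tree = ast.parse(source)  # the only parse
    visited = 0
    for node in ast.walk(tree):  # the only traversal (breadth-first order)
        visited += 1
        handler_name = f"visit_{type(node).__name__}"
        for rule in rule_list:
            handler = getattr(rule, handler_name, None)
            if handler is not None:
                handler(node)
    return visited
```

Each rule keeps its own state, so isolation is preserved while parse and traversal costs are paid once per file rather than once per rule.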

Negative Consequences:

  • Initial architecture complexity: More complex than pure single-file approach

  • Development overhead: Base classes and abstractions require more initial setup

  • Memory usage: Multiple rule instances consume more memory than single visitor

  • Potential performance impact: Rule isolation may have minor overhead vs. monolithic design

Risks and Mitigations:

  • Risk: Framework complexity slows initial development. Mitigation: Start with minimal viable abstractions and add complexity incrementally.

  • Risk: Performance does not meet the 1000ms target. Mitigation: Prototype validation confirmed the target is achievable with a comfortable margin; performance benchmarks guard against regressions as rules are added.

  • Risk: Rule interactions create unexpected behaviors. Mitigation: Enforce rule isolation and maintain comprehensive integration testing.

Implementation Guidance:

  1. Phase 1: Implement working prototype with 2-3 core rules to validate approach

  2. Phase 2: Refactor to full modular architecture with remaining rules

  3. Phase 3: Add configuration system and advanced features

  4. Testing strategy: Unit tests per rule, integration tests for engine, performance benchmarks

  5. Performance monitoring: Track analysis time per file size, memory usage per rule count
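
A performance benchmark for the 1000-lines-in-under-1000ms target might be structured as follows. This is only a sketch: make_source, best_time_ms, and stand_in_analysis are hypothetical helpers, and the workload parses and walks the tree with the standard library's ast module as a placeholder for the real linter.

```python
import ast
import time
from typing import Callable


def make_source(lines: int) -> str:
    """Generate a synthetic module with exactly `lines` lines."""
    return "".join(f"value_{i} = {i} + 1\n" for i in range(lines))


def best_time_ms(analyze: Callable[[str], object], source: str, repeats: int = 5) -> float:
    """Return the fastest of `repeats` runs, in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        analyze(source)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


def stand_in_analysis(source: str) -> int:
    """Placeholder workload: parse and walk the tree (not the real linter)."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


TARGET_MS = 1000.0  # PRD target: 1000 lines in under 1000 ms
source = make_source(1000)
elapsed = best_time_ms(stand_in_analysis, source)
within_target = elapsed < TARGET_MS
```

Taking the best of several runs reduces noise from other processes. Swapping stand_in_analysis for the engine's entry point would turn this into a regression check on the actual target.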