# 001. Core Engine and Rule Framework Architecture
## Status
Accepted
## Context
The Python linter needs a core architecture that supports the PRD requirements for:
- Extensible rule system: Four initial rules with ability to add more
- Performance targets: Process 1000 lines in <1000ms
- Configurable behavior: Rule enable/disable and parameter customization
- Precise error reporting: Line/column positions with actionable messages
- Incremental development: Rules implemented and deployed individually
The system must balance simplicity for initial development with extensibility for future growth. Analysis of prototype implementations reveals three approaches with varying architectural merit:
- Single-file monolithic design: All rules in one visitor class (from GPT-5 sketch) - demonstrates implementation patterns but has architectural limitations
- Phased incremental approach: Start simple, add complexity iteratively (from Grok sketch) - excellent development strategy
- Modular plugin architecture: Separate rule classes with registry system (from Opus sketch) - superior production architecture
Key architectural forces:
- Development speed vs. maintainability: Simple designs enable faster initial development but become unmaintainable as rules grow
- Performance vs. flexibility: Single-pass analysis is faster, but multiple rule instances may provide better isolation
- Configuration complexity: Granular rule control requires more sophisticated configuration management
## Decision
We will implement a hybrid modular architecture based primarily on the Opus and Grok approaches, incorporating the GPT-5 implementation patterns where valuable:
Core Engine Design:
- Central LinterEngine class orchestrates the analysis pipeline
- Single-pass CST traversal with metadata providers (PositionProvider, ScopeProvider, QualifiedNameProvider)
- Rule registry system for automatic discovery and configuration
- Violation collection and deduplication across all rules (see the pipeline sketch after this list)
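
A minimal sketch of how this single-pass pipeline could look, assuming the `BaseRule` and `Violation` interfaces defined under Key Architectural Components below. The decorator-based registry and the helper names (`RULE_REGISTRY`, `register_rule`, `lint_file`) are illustrative, not part of the decided API:

```python
# Illustrative only: parse each file once, resolve metadata once, and run every
# registered rule as a visitor over the same wrapped tree.
from pathlib import Path
from typing import List, Type

import libcst as cst
from libcst.metadata import MetadataWrapper

RULE_REGISTRY: List[Type["BaseRule"]] = []


def register_rule(cls: Type["BaseRule"]) -> Type["BaseRule"]:
    """Hypothetical decorator so rule modules self-register on import."""
    RULE_REGISTRY.append(cls)
    return cls


def lint_file(path: Path, rule_classes: List[Type["BaseRule"]]) -> List["Violation"]:
    """Single-pass analysis: one parse, one metadata wrapper, many rule visitors."""
    wrapper = MetadataWrapper(cst.parse_module(path.read_text(encoding="utf-8")))

    violations: List["Violation"] = []
    for rule_cls in rule_classes:
        rule = rule_cls(filename=str(path), wrapper=wrapper)
        wrapper.visit(rule)  # metadata providers are resolved and cached on the wrapper
        violations.extend(rule.violations)

    # Deduplicate identical findings before reporting.
    unique = {(v.rule_id, v.line, v.column, v.message): v for v in violations}
    return sorted(unique.values(), key=lambda v: (v.line, v.column, v.rule_id))
```

The `LinterEngine` described below would wrap this per-file routine, using the configuration to select enabled rule classes and their parameters.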
Rule Framework Design:
- Abstract BaseRule class providing common metadata access and violation reporting
- Each rule inherits from both BaseRule and libcst.CSTVisitor
- Rules operate independently with no cross-dependencies
- Collection-then-analysis pattern: Rules collect data during traversal, analyze collections in post-processing (pattern validated in proof-of-concept)
- Context extraction pattern validated for enhanced error reporting
- Standardized violation format with rule ID, location, and message
Development Progression:
- Start with simplified single-file prototype for rapid validation (Phase 1)
- Refactor to modular architecture as rule complexity grows (Phase 2)
- Add advanced features (configuration, auto-fixes) incrementally (Phase 3+)
Key Architectural Components:
```python
# Core abstractions
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import List

import libcst as cst
from libcst.metadata import (
    MetadataWrapper,
    PositionProvider,
    QualifiedNameProvider,
    ScopeProvider,
)


@dataclass
class Violation:
    rule_id: str
    filename: str
    line: int
    column: int
    message: str
    severity: str = "error"


class BaseRule(ABC, cst.CSTVisitor):
    """Base class for all linting rules implementing collection-then-analysis pattern."""

    METADATA_DEPENDENCIES = (PositionProvider, ScopeProvider, QualifiedNameProvider)

    def __init__(self, filename: str, wrapper: MetadataWrapper):
        super().__init__()  # let CSTVisitor set up its metadata storage
        self.filename = filename
        self.wrapper = wrapper
        self.violations: List[Violation] = []
        # Subclasses add collection attributes here

    @property
    @abstractmethod
    def rule_id(self) -> str: ...

    def leave_Module(self, node: cst.Module) -> None:
        """Override to analyze collected data after traversal completes."""
        self._analyze_collections()

    @abstractmethod
    def _analyze_collections(self) -> None:
        """Analyze collected data and generate violations."""
        ...


class LinterEngine:
    """Central orchestrator for linting analysis."""

    def __init__(self, rules: List[BaseRule], config: "Configuration"): ...

    def lint_files(self, paths: List[Path]) -> List[Violation]: ...
```
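
To illustrate the collection-then-analysis pattern from the Rule Framework Design, here is a sketch of one possible concrete rule built on `BaseRule`. The rule ID, the threshold, and the "too many parameters" check itself are illustrative examples, not rules taken from the PRD:

```python
import libcst as cst
from libcst.metadata import MetadataWrapper, PositionProvider


class TooManyParametersRule(BaseRule):
    """Hypothetical rule: flag functions whose parameter count exceeds a limit."""

    def __init__(self, filename: str, wrapper: MetadataWrapper, max_params: int = 5):
        super().__init__(filename, wrapper)
        self.max_params = max_params
        self._functions: list[tuple[cst.FunctionDef, int]] = []  # collected during traversal

    @property
    def rule_id(self) -> str:
        return "LNT001"  # illustrative ID

    def visit_FunctionDef(self, node: cst.FunctionDef) -> None:
        # Collection phase: only record the node and its positional parameter count.
        self._functions.append((node, len(node.params.params)))

    def _analyze_collections(self) -> None:
        # Analysis phase: runs once from leave_Module, after traversal completes.
        for node, count in self._functions:
            if count > self.max_params:
                pos = self.get_metadata(PositionProvider, node).start
                self.violations.append(
                    Violation(
                        rule_id=self.rule_id,
                        filename=self.filename,
                        line=pos.line,
                        column=pos.column,
                        message=f"Function has {count} parameters (max {self.max_params})",
                    )
                )
```

Because the rule only records data during traversal and reports in `_analyze_collections`, it can be unit-tested in isolation by feeding small code snippets through a `MetadataWrapper`.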
## Alternatives
Alternative 1: Pure Single-File Design
Implement all rules as methods in a single CSTVisitor class.
Rejected because:
- Poor separation of concerns as rule count grows
- Difficult to configure individual rules
- Testing becomes complex with intermingled rule logic
- Violates single responsibility principle
Alternative 2: Pure Plugin Architecture
Implement each rule as completely independent modules with their own CST parsing.
Rejected because:
- Performance overhead from multiple parse passes
- Increased memory usage and complexity
- Harder to share common metadata and utilities
- Overly complex for initial implementation needs
Alternative 3: Event-Driven Architecture
Use observer pattern with CST events triggering rule callbacks.
Rejected because:
- Added complexity for minimal benefit in this domain
- Harder to debug and trace rule execution
- Potential performance overhead from event dispatch
- LibCST visitor pattern already provides the needed traversal
## Consequences
Positive Consequences:
- Rapid prototyping: Can start with simple implementations and refactor incrementally
- Good separation of concerns: Each rule focuses on specific code patterns
- Performance optimization: Single-pass analysis with multiple rule evaluation
- Easy testing: Rules can be tested independently with focused test cases
- Configuration flexibility: Individual rule enable/disable and parameter control (see the configuration sketch after this list)
- Future extensibility: New rules require minimal framework changes
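
To make the configuration flexibility point concrete, one possible shape for the `Configuration` object referenced by `LinterEngine` is sketched below. The field and method names (`enabled_rules`, `rule_params`, `is_enabled`, `params_for`) are assumptions for illustration, not a finalized schema:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Configuration:
    """Hypothetical configuration: per-rule enable/disable plus parameter overrides."""

    enabled_rules: Set[str] = field(default_factory=set)  # empty set means "all rules"
    rule_params: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def is_enabled(self, rule_id: str) -> bool:
        return not self.enabled_rules or rule_id in self.enabled_rules

    def params_for(self, rule_id: str) -> Dict[str, Any]:
        # Passed as keyword arguments to the rule's constructor.
        return self.rule_params.get(rule_id, {})


# Usage sketch: enable one rule and tighten its threshold.
config = Configuration(enabled_rules={"LNT001"}, rule_params={"LNT001": {"max_params": 4}})
```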
Negative Consequences:
- Initial architecture complexity: More complex than pure single-file approach
- Development overhead: Base classes and abstractions require more initial setup
- Memory usage: Multiple rule instances consume more memory than single visitor
- Potential performance impact: Rule isolation may have minor overhead vs. monolithic design
Risks and Mitigations:
- Risk: Framework complexity slows initial development. Mitigation: Start with minimal viable abstractions and add complexity incrementally.
- Risk: Performance doesn't meet the 1000ms target. Mitigation: Proof-of-concept validation showed the target is achievable with a comfortable margin for the implementation.
- Risk: Rule interactions create unexpected behaviors. Mitigation: Enforce rule isolation and run comprehensive integration testing.
Implementation Guidance:
- Phase 1: Implement working prototype with 2-3 core rules to validate approach
- Phase 2: Refactor to full modular architecture with remaining rules
- Phase 3: Add configuration system and advanced features
- Testing strategy: Unit tests per rule, integration tests for engine, performance benchmarks (see the harness sketch after this list)
- Performance monitoring: Track analysis time per file size, memory usage per rule count
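
An illustrative testing and benchmarking harness, assuming the hypothetical `TooManyParametersRule` sketched earlier; the snippets, thresholds, and wall-clock check are examples only, and the timing assertion is machine-dependent rather than a formal benchmark:

```python
import time

import libcst as cst
from libcst.metadata import MetadataWrapper


def run_rule(source: str, rule_cls, **params):
    """Run a single rule over a source string and return its violations."""
    wrapper = MetadataWrapper(cst.parse_module(source))
    rule = rule_cls(filename="<test>", wrapper=wrapper, **params)
    wrapper.visit(rule)
    return rule.violations


def test_too_many_parameters_flagged():
    # Focused unit test: one rule, one snippet, one expected violation.
    violations = run_rule("def f(a, b, c):\n    pass\n", TooManyParametersRule, max_params=2)
    assert [v.rule_id for v in violations] == ["LNT001"]


def test_performance_budget():
    # Coarse check against the "1000 lines in <1000ms" target on a synthetic module.
    source = "x = 1\n" * 1000
    start = time.perf_counter()
    run_rule(source, TooManyParametersRule)
    assert time.perf_counter() - start < 1.0
```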