Parse PNC bank statement PDFs to CSV with automatic categorization, year processing, and validation. Supports monthly/yearly batch processing with cross-year boundary handling.
A Python application that parses PNC bank statement PDFs and converts them to CSV format for import into Google Sheets or Excel. Features JSON-based categorization, year processing mode, and comprehensive validation.
This skill helps you work with the PNC Statement Parser codebase, which is organized as follows:
```
src/
├── parsers/                        # Modular parser system
│   ├── base_parser.py              # Abstract interface
│   ├── pnc_statement_parser.py     # PNC implementation
│   ├── pnc_patterns.py             # Regex patterns
│   ├── section_extractor.py        # Section handling
│   ├── transaction_parser.py       # Core parsing
│   ├── categorization.py           # Auto-categorization
│   └── text_utils.py               # Text cleaning
└── categories.json                 # Category definitions
experiments/                        # Deprecated enhanced parsing
tests/                              # Test suite
docs/                               # Documentation
data/                               # Input PDFs (gitignored)
output/                             # Output CSVs (gitignored)
```
When working with this codebase, follow these steps:
Always test after making changes:
```bash
source venv/bin/activate
python tests/test_basic.py
python tests/test_extraneous_filtering.py
```
**Basic Usage:**
```bash
python parse_statements.py --file statement.pdf --output output.csv
python parse_statements.py --directory statements/ --output all.csv --monthly
```
**Year Processing Mode (Recommended):**
```bash
python parse_statements.py --year 2023 --output output/2023.csv
python parse_statements.py --year 2023 --include-next-month --output output/2023_complete.csv
python parse_statements.py --year 2023 --base-path /path/to/statements --output 2023.csv
python parse_statements.py --year 2023 --summary report.txt
```
```
PNC_Documents/
├── 2023/
│   ├── Spend_x2157_Statement_01_January_2023.pdf
│   ├── Spend_x2157_Statement_02_February_2023.pdf
│   └── ... (all 12 monthly statements)
└── 2024/
    ├── Spend_x2157_Statement_01_January_2024.pdf
    └── ...
```
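To illustrate how a year of statements could be collected from this layout, here is a minimal sketch. It is not the project's actual discovery code: the `find_statements` helper, the `STATEMENT_NAME` regex, and the assumption that the account suffix is always digits are all hypothetical, inferred only from the filenames shown above. The optional flag mirrors the idea behind `--include-next-month`, where the following January statement is added because it may contain late-December transactions.

```python
import re
from pathlib import Path

# Filename shape inferred from the listing above; the regex is an assumption.
STATEMENT_NAME = re.compile(
    r"^Spend_x\d+_Statement_(\d{2})_([A-Za-z]+)_(\d{4})\.pdf$"
)


def find_statements(base_path, year, include_next_month=False):
    """Return statement PDFs for `year`, ordered by month number.

    With include_next_month=True, the January statement of year + 1
    (if present) is appended to the end of the list.
    """
    base = Path(base_path)
    found = []
    year_dir = base / str(year)
    if year_dir.is_dir():
        for pdf in year_dir.iterdir():
            match = STATEMENT_NAME.match(pdf.name)
            if match and int(match.group(3)) == year:
                found.append((int(match.group(1)), pdf))
    found.sort(key=lambda item: item[0])  # order by month number
    ordered = [pdf for _, pdf in found]

    if include_next_month:
        next_dir = base / str(year + 1)
        if next_dir.is_dir():
            for pdf in sorted(next_dir.iterdir()):
                match = STATEMENT_NAME.match(pdf.name)
                if (
                    match
                    and match.group(1) == "01"
                    and int(match.group(3)) == year + 1
                ):
                    ordered.append(pdf)
    return ordered
```

For example, `find_statements("PNC_Documents", 2023, include_next_month=True)` would list the 2023 statements followed by the January 2024 statement, if those files exist.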
**Date Pattern:**
```python
r'^(\d{1,2}/\d{1,2})\s+' # MM/DD at start of line
```
**Amount Pattern:**
```python
r'(\.?\d{1,3}(?:,\d{3})*\.?\d{0,2})' # Handles .14, 6,250.00, etc.
```
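The two patterns above can be tried together on a single line of extracted text. The sketch below is only an illustration: it assumes a line layout of `MM/DD amount description`, and the `split_transaction_line` helper and the sample descriptions are hypothetical, not taken from the parser. The regexes themselves are copied unchanged.

```python
import re
from decimal import Decimal

# Patterns copied verbatim from the sections above.
DATE_PATTERN = re.compile(r'^(\d{1,2}/\d{1,2})\s+')
AMOUNT_PATTERN = re.compile(r'(\.?\d{1,3}(?:,\d{3})*\.?\d{0,2})')


def split_transaction_line(line):
    """Return (date, amount, description), or None if the line does not fit.

    Assumes the amount immediately follows the date (hypothetical layout).
    """
    date_match = DATE_PATTERN.match(line)
    if date_match is None:
        return None  # no MM/DD at the start of the line
    rest = line[date_match.end():]
    amount_match = AMOUNT_PATTERN.match(rest)
    if amount_match is None:
        return None  # date found, but no amount right after it
    # Drop thousands separators before converting to an exact decimal.
    amount = Decimal(amount_match.group(1).replace(",", ""))
    description = rest[amount_match.end():].strip()
    return date_match.group(1), amount, description


print(split_transaction_line("01/15 6,250.00 Direct Deposit Payroll"))
print(split_transaction_line("3/7 .14 Interest Payment"))
print(split_transaction_line("Daily Balance Detail"))
```

`Decimal` is used here instead of `float` so that amounts such as `.14` are represented exactly, which matters for financial totals.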
**Text Filtering:**
When modifying the codebase:
1. **Parser Logic**: Edit files in `src/parsers/`
2. **Categories**: Update `src/categories.json` (JSON format)
3. **Test Immediately**: Run relevant tests after changes
4. **Validate**: Use real PDFs if available
5. **Check Output**: Review JSON validation reports
```python
import re

# BaseStatementParser and TransactionCategorizer come from src/parsers/
# (import lines omitted here).


class BankOfAmericaPatterns:
    def __init__(self):
        self.DATE_PATTERN = re.compile(r'...')


class BOAStatementParser(BaseStatementParser):
    def __init__(self):
        self.patterns = BankOfAmericaPatterns()
        self.categorizer = TransactionCategorizer()  # Reuse
```
**"No text extracted from PDF"**
**Wrong transaction types or page ordering**
**Type errors in imports**
**CRITICAL: This is a PUBLIC repository**
NEVER include Personally Identifiable Information (PII):
**Gitignore protections:**
The parser includes comprehensive validation:
Users can extend categories via PRs to `src/categories.json`:
```json
{
"categories": {
"Medical": {
"patterns": ["Cleveland Clinic", "MetroHealth", "Mhs\\*Metrohealth"]
}
}
}
```
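As a rough illustration of how such pattern lists could drive categorization, here is a minimal sketch. It assumes the patterns are regular expressions matched case-insensitively against the transaction description (the escaped `\\*` in the example suggests regex syntax, but this is not confirmed by the source). The `build_matchers` and `categorize` helpers and the `"Uncategorized"` fallback are hypothetical names, not the project's `categorization.py` API.

```python
import json
import re

# Same structure as the JSON example above, inlined so the sketch is self-contained.
CATEGORIES_JSON = r'''
{
  "categories": {
    "Medical": {
      "patterns": ["Cleveland Clinic", "MetroHealth", "Mhs\\*Metrohealth"]
    }
  }
}
'''


def build_matchers(config):
    """Compile each category's patterns once (assumed to be regexes)."""
    return {
        name: [re.compile(p, re.IGNORECASE) for p in spec.get("patterns", [])]
        for name, spec in config["categories"].items()
    }


def categorize(description, matchers, default="Uncategorized"):
    """Return the first category with a matching pattern, else `default`."""
    for name, patterns in matchers.items():
        if any(p.search(description) for p in patterns):
            return name
    return default


matchers = build_matchers(json.loads(CATEGORIES_JSON))
print(categorize("MHS*METROHEALTH PHARMACY", matchers))
print(categorize("Corner Coffee Shop", matchers))
```

Note that in JSON the backslash must be doubled (`\\*`) so that the decoded pattern is `\*`, which matches a literal asterisk rather than acting as a regex quantifier.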
1. **Always test after changes**: Run test suite before committing
2. **Prioritize accuracy**: Financial data must be 100% accurate
3. **Use modular parser**: Deprecated enhanced parser has reliability issues
4. **Preserve critical patterns**: Date/amount regex patterns are battle-tested
5. **Validate with real data**: Test against actual statements when possible
6. **Follow security rules**: Never commit PII or sensitive financial data
7. **Use year mode for complete records**: `--year` with `--include-next-month` captures cross-year transactions