Production-ready Python application for bulk loading CSV files into Microsoft SQL Server with dynamic metadata removal, flexible YAML configuration, and filename sanitization.
**Author**: Quentin Casares
This skill helps you work with the `load2mssql` application, which features:
1. **Main Loader** (`load_csv_to_mssql.py`)
   - Entry point: `CSVToMSSQLLoader` class
   - Dataclass-based configuration parsing
   - Workflow: Configuration → File Discovery → CSV Processing → Bulk Loading
   - Metadata handling via `read_csv_with_metadata_removal()`
2. **Filename Sanitizer** (`filename_sanitizer.py`)
   - Standalone, reusable module
   - `FilenameSanitizer` class with `SanitizationRules`
   - Removes timestamps, dates, and special characters
   - Converts to PascalCase
   - SQL Server compliance validation
3. **Configuration** (`config.yaml`)
   - Sections: `database`, `csv_processing`, `file_selection`, `table_loading`, `filename_sanitization`, `logging`
   - Supports both Windows and SQL Server authentication
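The dataclass-based configuration parsing can be pictured roughly as follows. This is a minimal sketch, not the loader's actual code: the class names `DatabaseConfig`, `TableLoadingConfig`, and `LoaderConfig` are hypothetical, the field names are taken from the YAML examples in this document, and the `fast_executemany` default is assumed. It takes the mapping produced by parsing `config.yaml` and validates it up front.

```python
from dataclasses import dataclass, field
from typing import Optional

VALID_AUTH_MODES = ("trusted", "sql")
VALID_IF_EXISTS = ("fail", "replace", "append")


@dataclass
class DatabaseConfig:
    """Hypothetical mirror of the `database:` section."""
    server: str
    port: int = 1433
    auth_mode: str = "trusted"
    username: Optional[str] = None
    password: Optional[str] = None
    fast_executemany: bool = False  # default assumed, not documented

    def __post_init__(self) -> None:
        if self.auth_mode not in VALID_AUTH_MODES:
            raise ValueError(f"auth_mode must be one of {VALID_AUTH_MODES}, got {self.auth_mode!r}")
        if self.auth_mode == "sql" and not (self.username and self.password):
            raise ValueError("auth_mode 'sql' requires username and password")


@dataclass
class TableLoadingConfig:
    """Hypothetical mirror of the `table_loading:` section."""
    table_prefix: str = ""
    if_exists: str = "fail"
    create_indexes: dict = field(default_factory=dict)
    dtype_overrides: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.if_exists not in VALID_IF_EXISTS:
            raise ValueError(f"if_exists must be one of {VALID_IF_EXISTS}, got {self.if_exists!r}")


@dataclass
class LoaderConfig:
    database: DatabaseConfig
    table_loading: TableLoadingConfig

    @classmethod
    def from_dict(cls, raw: dict) -> "LoaderConfig":
        # `raw` is the mapping parsed from config.yaml (e.g. by yaml.safe_load).
        # Unknown keys raise TypeError in the dataclass constructor, surfacing typos early.
        return cls(
            database=DatabaseConfig(**raw["database"]),
            table_loading=TableLoadingConfig(**(raw.get("table_loading") or {})),
        )
```

Validating in `__post_init__` means a bad `auth_mode` or `if_exists` value fails before any file is read or any connection is opened.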
When setting up the development environment:
1. Create and activate a virtual environment (run only the activation line for your OS):
   ```bash
   python -m venv venv
   source venv/bin/activate  # Mac/Linux
   venv\Scripts\activate     # Windows
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
Execute with default or custom configuration:
```bash
python load_csv_to_mssql.py
python load_csv_to_mssql.py --config my_config.yaml
```
Run tests for sanitization logic:
```bash
python test_sanitization.py
python filename_sanitizer.py
```
**Windows Authentication (Recommended)**:
```yaml
database:
  server: "localhost"
  port: 1433  # Optional, defaults to 1433
  auth_mode: "trusted"
```
**SQL Server Authentication**:
```yaml
database:
  server: "localhost"
  port: 1433  # Specify a custom port if needed
  auth_mode: "sql"
  username: "your_username"
  password: "your_password"
```
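The two `auth_mode` values map naturally onto an ODBC connection string. The sketch below is an illustration under assumptions, not the loader's code: the function name `build_connection_string` is hypothetical, and the driver name `ODBC Driver 17 for SQL Server` is assumed (use whichever Microsoft ODBC driver is installed). The resulting string could be passed to `pyodbc.connect()`.

```python
def build_connection_string(db: dict, driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Build an ODBC connection string from a `database:` config mapping (hypothetical helper)."""
    parts = [
        f"DRIVER={{{driver}}}",                       # driver name is wrapped in braces
        f"SERVER={db['server']},{db.get('port', 1433)}",  # SQL Server ODBC uses "host,port"
    ]
    auth_mode = db.get("auth_mode", "trusted")
    if auth_mode == "trusted":
        # Windows Authentication: credentials come from the current Windows login
        parts.append("Trusted_Connection=yes")
    elif auth_mode == "sql":
        # SQL Server Authentication: explicit login and password
        parts.append(f"UID={db['username']}")
        parts.append(f"PWD={db['password']}")
    else:
        raise ValueError(f"Unsupported auth_mode: {auth_mode!r}")
    return ";".join(parts) + ";"
```

Note that a port is joined to the host with a comma, not a colon. Passwords containing `;` or `}` would additionally need brace-escaping, which this sketch does not handle.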
Add consistent prefixes to all table names:
```yaml
table_loading:
  table_prefix: "tbl_"  # All tables prefixed with "tbl_"
```
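Conceptually, the prefix is prepended to the sanitized base name before the table is created. A minimal sketch, assuming tables land in the `dbo` schema (an assumption; the schema is not configurable in the examples above). The helper name `qualified_table_name` is hypothetical; bracket quoting with `]` doubled is standard T-SQL.

```python
def qualified_table_name(base_name: str, prefix: str = "", schema: str = "dbo") -> str:
    """Return a bracket-quoted [schema].[prefix+base] name (hypothetical helper)."""
    def quote(identifier: str) -> str:
        # T-SQL delimited identifier: a literal "]" is escaped by doubling it
        return "[" + identifier.replace("]", "]]") + "]"
    return f"{quote(schema)}.{quote(prefix + base_name)}"
```

For example, `qualified_table_name("CustomerAccount", "tbl_")` yields `[dbo].[tbl_CustomerAccount]`.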
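Control what happens when the target table already exists. Set exactly one value; YAML does not allow the same key to appear twice in a mapping:
```yaml
table_loading:
  if_exists: "fail"        # Error if table exists (safe default)
  # if_exists: "replace"   # Drop and recreate table
  # if_exists: "append"    # Add rows to existing table
```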
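These three values mirror the `if_exists` options of pandas' `DataFrame.to_sql`, which the loader plausibly passes them through to (an assumption). The sketch below shows the semantics using Python's built-in `sqlite3` as a stand-in for SQL Server, so it runs without a database server; `load_rows` is a hypothetical helper and every column is stored as `TEXT` for simplicity.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, table: str, columns: list, rows: list,
              if_exists: str = "fail") -> None:
    """Illustrate fail/replace/append semantics (SQLite stand-in, hypothetical helper)."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"Invalid if_exists: {if_exists!r}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None
    if exists and if_exists == "fail":
        raise ValueError(f"Table {table!r} already exists")
    if exists and if_exists == "replace":
        conn.execute(f'DROP TABLE "{table}"')  # drop, then recreate below
        exists = False
    if not exists:
        col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
```

With `"fail"` as the default, re-running the loader against the same database never silently overwrites or duplicates data.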
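Filename sanitization is enabled by default. It turns each CSV filename into a table name by removing timestamps, dates, and special characters, then converting the result to PascalCase.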
Customize via:
```yaml
filename_sanitization:
  use_pascal_case: true
  custom_patterns: ["_production$"]  # Additional regex patterns
  custom_replacements:
    "cust": "Customer"
```
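To show how these options could interact, here is a simplified re-implementation of the pipeline. It is a sketch under assumptions, not the `filename_sanitizer.py` code: `Rules` is a hypothetical stand-in for `SanitizationRules`, the built-in date/timestamp regexes are guesses, and the `T_` prefix for names starting with a digit is an arbitrary choice. The 128-character cap and the no-leading-digit rule come from SQL Server's identifier rules.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rules:
    """Hypothetical stand-in for SanitizationRules; field names follow config.yaml."""
    use_pascal_case: bool = True
    custom_patterns: list = field(default_factory=list)
    custom_replacements: dict = field(default_factory=dict)


# Assumed built-in patterns: trailing _20251114, _20251114_093000, or _2025-11-14
BUILTIN_PATTERNS = [r"[_-]?\d{8}(?:[_-]?\d{6})?$", r"[_-]?\d{4}-\d{2}-\d{2}$"]
MAX_IDENTIFIER_LEN = 128  # SQL Server identifier limit


def sanitize(filename: str, rules: Optional[Rules] = None) -> str:
    rules = rules or Rules()
    stem = re.sub(r"\.csv$", "", filename, flags=re.IGNORECASE)
    # Strip date/timestamp suffixes first, then any user-supplied patterns
    for pattern in BUILTIN_PATTERNS + rules.custom_patterns:
        stem = re.sub(pattern, "", stem)
    # Split on anything that is not a letter or digit (drops special characters)
    words = [w for w in re.split(r"[^A-Za-z0-9]+", stem) if w]
    # Expand abbreviations word by word (case-insensitive lookup)
    words = [rules.custom_replacements.get(w.lower(), w) for w in words]
    if rules.use_pascal_case:
        # Upper-case only the first letter so acronyms like "ID" are preserved
        name = "".join(w[:1].upper() + w[1:] for w in words)
    else:
        name = "_".join(words)
    # Regular identifiers cannot start with a digit
    if not name or name[0].isdigit():
        name = "T_" + name
    return name[:MAX_IDENTIFIER_LEN]
```

Applying custom patterns after the date patterns lets a pattern anchored with `$`, such as `_production$`, still match when a date followed it in the original filename.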
To strip an additional pattern from table names:
1. Edit `config.yaml` and add an entry under `filename_sanitization.custom_patterns`.
2. Write the entry as a regex (e.g., `"_production$"`).
3. Test with `python test_sanitization.py`.
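Before editing the config, a candidate regex can be checked against sample filename stems. The helper `preview_pattern` below is hypothetical and uses only the standard `re` module. One practical caution: inside double-quoted YAML strings a backslash starts an escape sequence, and sequences like `"\d"` are invalid YAML, so patterns containing backslashes are safer in single quotes (for example `'_\d{4}$'`).

```python
import re


def preview_pattern(pattern: str, stems: list) -> dict:
    """Map each stem the pattern matches to its result after removal (hypothetical helper)."""
    compiled = re.compile(pattern)  # raises re.error if the regex is invalid
    return {s: compiled.sub("", s) for s in stems if compiled.search(s)}
```

For example, `preview_pattern(r"_production$", ["orders_production", "production_log"])` reports only `orders_production`, which becomes `orders`.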
Create indexes on specific columns:
```yaml
table_loading:
  table_prefix: "tbl_"  # If using a prefix
  create_indexes:
    # Use base table names (without prefix)
    CustomerAccount: ["CustomerID", "AccountNumber"]
    OrderHistory: ["OrderID"]
```
**Index Naming**: Format is `idx_{table}_{column}` (includes prefix if configured)
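Given that naming format, each listed column gets its own single-column index on the prefixed table. A minimal sketch of the DDL this implies; `index_statements` is a hypothetical helper, and the exact statements the loader issues (for example, nonclustered vs. clustered, or schema qualification) are not documented here.

```python
def index_statements(create_indexes: dict, prefix: str = "") -> list:
    """Generate one CREATE INDEX per configured column (hypothetical helper)."""
    statements = []
    for base_table, columns in create_indexes.items():
        table = f"{prefix}{base_table}"  # config uses base names; the prefix is added here
        for column in columns:
            index_name = f"idx_{table}_{column}"  # idx_{table}_{column}, prefix included
            statements.append(f"CREATE INDEX [{index_name}] ON [{table}] ([{column}]);")
    return statements
```

With `table_prefix: "tbl_"`, the `OrderHistory: ["OrderID"]` entry produces `CREATE INDEX [idx_tbl_OrderHistory_OrderID] ON [tbl_OrderHistory] ([OrderID]);`.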
Force specific SQL Server data types:
```yaml
table_loading:
  dtype_overrides:
    Sales:
      OrderDate: "DATETIME"
      Amount: "DECIMAL(18,2)"
```
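An override replaces whatever type would otherwise be chosen for that column. The sketch below makes this concrete by emitting a `CREATE TABLE` statement; it assumes a fallback of `NVARCHAR(MAX)` for columns without an override, whereas the real loader may infer types from the data instead. `create_table_sql` is a hypothetical helper.

```python
def create_table_sql(table: str, columns: list, overrides: dict,
                     default_type: str = "NVARCHAR(MAX)") -> str:
    """Build CREATE TABLE DDL, applying per-column type overrides (hypothetical helper)."""
    unknown = set(overrides) - set(columns)
    if unknown:
        # Catch overrides that name columns the CSV does not have (likely typos)
        raise ValueError(f"Overrides for unknown columns: {sorted(unknown)}")
    col_defs = ", ".join(f"[{c}] {overrides.get(c, default_type)}" for c in columns)
    return f"CREATE TABLE [{table}] ({col_defs});"
```

For the `Sales` example, a CSV with columns `OrderDate`, `Amount`, and `Region` would yield `DATETIME` and `DECIMAL(18,2)` for the first two and the fallback type for `Region`.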
**For Large Files (>100MB)**:
```yaml
csv_processing:
  chunk_size: 50000  # Larger chunks
database:
  fast_executemany: true  # Bulk insert optimization
```
**For Memory-Constrained Environments**:
```yaml
csv_processing:
  chunk_size: 5000  # Smaller chunks
```
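`chunk_size` is the trade-off knob: at most that many rows are held in memory and sent per batch, so larger chunks mean fewer round trips and smaller chunks mean a lower memory ceiling. The sketch below shows chunked reading with the standard `csv` module; the loader itself more likely uses pandas `read_csv(chunksize=...)` (an assumption), and `fast_executemany` corresponds to the pyodbc cursor attribute of the same name, which sends parameters in bulk rather than row by row. `read_in_chunks` is a hypothetical helper.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator


def read_in_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[list]:
    """Yield lists of at most `chunk_size` row dicts; only one chunk is in memory at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk
```

In practice `lines` would be an open file, e.g. `open(path, newline="", encoding="utf-8")`, and each yielded chunk would be inserted before the next is read.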
The `filename_sanitizer.py` module can be used standalone:
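```python
from filename_sanitizer import FilenameSanitizer, SanitizationRules

# Default rules: strip timestamps, dates, and special characters; convert to PascalCase
sanitizer = FilenameSanitizer()
table_name = sanitizer.sanitize("file_20251114.csv")

# Custom rules: disable PascalCase conversion and expand "acct" to "Account"
rules = SanitizationRules(
    use_pascal_case=False,
    custom_replacements={"acct": "Account"},
)
sanitizer = FilenameSanitizer(rules)
```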