Expert guide for a multi-insurer vehicle catalog homologation system using n8n, Supabase, and token-overlap matching for intelligent deduplication across insurance company databases.
You are an expert assistant for a vehicle homologation system that unifies vehicle catalogs from multiple insurance companies into a single master catalog. The system uses n8n workflows for ETL, Supabase PostgreSQL for storage, and intelligent token-overlap matching for deduplication.
```
Insurance DB → SQL Extraction → n8n Normalization → Batch Processing → Supabase RPC → Master Catalog
```
Each insurer directory contains:
The master table `catalogo_homologado` uses this schema:
- Format: `[TRIM] [BODY] [POWER] [DISPLACEMENT] [CYLINDERS] [DOORS] [TRACTION]`
- Example: `"ADVANCE SEDAN 145HP 2L 4CIL 4PUERTAS AWD"`
- `hash_comercial`: SHA-256 of `marca|modelo|anio|transmision`
- `id_canonico`: SHA-256 of complete record including version
**Core Principle**: Compare normalized token sets to find matches despite format variations across insurers.
**Process**:
1. Find records with exact `hash_comercial` match (same marca/modelo/anio/transmision)
2. Tokenize version strings using `tokenize_version()` (lowercase, strip punctuation, deduplicate)
3. Calculate overlap ratio: `|tokens_existing ∩ tokens_incoming| / max(|tokens_existing|, |tokens_incoming|)`
4. Apply thresholds:
- Same insurer reprocess: ≥0.92 similarity
- Cross-insurer merge: ≥0.50 similarity
5. Match: update availability | No match: create new record
**PostgreSQL Helpers**:
```sql
-- Tokenization function
CREATE FUNCTION tokenize_version(text) RETURNS text[];
-- Stored fields for fast comparison
version_tokens text -- space-separated tokens
version_tokens_array text[] -- array for overlap calculations
-- GIN index for performance
CREATE INDEX idx_version_tokens ON catalogo_homologado USING gin(version_tokens_array);
```
Every insurer normalization script must:
**Generate Hashes**:
**Create Integrated Version Field**:
**Apply Consistent Normalization**:
**Preserve Traceability**:
**Main Endpoint**: `/rest/v1/rpc/procesar_batch_homologacion`
**Input Schema**:
```json
{
"vehiculos_json": [
{
"id_canonico": "string (SHA-256)",
"hash_comercial": "string (SHA-256)",
"string_comercial": "marca|modelo|anio|transmision",
"string_tecnico": "integrated version string",
"marca": "string",
"modelo": "string",
"anio": integer,
"transmision": "MANUAL|AUTO|null",
"version": "string - ALL specs integrated",
"origen_aseguradora": "string",
"id_original": "string",
"version_original": "string",
"activo": boolean
}
]
}
```
**Processing Logic**:
1. Load records into staging table with deduplication
2. **Exact Match**: Same `id_canonico` → update availability only
3. **Token Match**: Same `hash_comercial` + overlap ratio ≥ threshold
4. **New Record**: No overlap exceeds threshold → create new entry
5. **Conflict Detection**: Log low-overlap candidates for manual review
**Returns**: Processing metrics (new/updated/matched counts, warnings, errors)
Follow this workflow:
1. **Create directory**: `/src/insurers/[name]/`
2. **Analyze source data**: Create `[name]-analisis.md` with field mappings
3. **Write extraction query**: `[name]-query-de-extraccion.sql`
4. **Build normalization code**: `[name]-codigo-de-normalizacion.js`
- Follow existing patterns from other insurers
- Ensure `version` field integrates ALL technical specs as a single string
- Test token-overlap scores with existing data before full processing
5. **Create n8n workflow**: `ETL - [Name].json`
- SQL extraction node
- Normalization code node (JavaScript)
- Batch processing (10k-50k records)
- Supabase RPC call node
6. **Validate output**: Check master catalog and processing metrics
**Dictionary-Based Cleaning**:
```javascript
const REMOVE_TOKENS = ['AA', 'EE', 'CD', 'ABS', 'BA', 'AIRBAG', 'MP3', '5OCUP'];
```
**Protected Token Handling**:
```javascript
// Preserve hyphenated trims during processing
const PROTECTED_TOKENS = ['A-SPEC', 'TYPE-S', 'S-LINE', 'E-TENSE'];
```
**Engine Specification Normalization**:
```javascript
// Standardize displacement: 1.8L, 2.0L TURBO
// Standardize power: 150HP, 200HP
// Format cylinders: 4CIL, V6, V8
```
**Transmission Mapping**:
```javascript
const transmissionMap = {
'1': 'MANUAL',
'2': 'AUTO',
'0': null,
'AUTOMATICA': 'AUTO',
'ESTANDAR': 'MANUAL'
};
```
**Batch Processing**:
**Check processing results**:
```sql
SELECT origen_aseguradora, COUNT(*)
FROM catalogo_homologado
GROUP BY origen_aseguradora;
```
**Token overlap analysis**:
```sql
SELECT hash_comercial,
array_length(version_tokens_array, 1) as token_count,
version_tokens_array
FROM catalogo_homologado
WHERE marca = 'TOYOTA' AND modelo = 'CAMRY'
ORDER BY token_count DESC;
```
**Analyze version field quality**:
```sql
SELECT COUNT(*) total,
COUNT(DISTINCT marca) d_marcas,
COUNT(DISTINCT modelo) d_modelos,
AVG(array_length(version_tokens_array, 1)) avg_tokens
FROM catalogo_homologado;
```
1. The master catalog has **NO separate technical specification fields** — everything goes into the `version` string
2. Token-overlap thresholds are configurable (default: 0.92 same insurer, 0.50 cross insurer)
3. Hash generation must be consistent across all insurers for proper grouping
4. Always preserve original data for audit and debugging
5. System handles missing/null values gracefully — avoid default placeholders
6. Exclude security features (ABS, BA) and occupant info (5OCUP) from version normalization
7. No automated testing framework — test workflows manually in n8n interface and validate Supabase output
```javascript
// Map transmission
const transmision = record.cod_transmision === '1' ? 'MANUAL' :
record.cod_transmision === '2' ? 'AUTO' : null;
// Build integrated version string (ALL specs in one field)
const version = [
record.trim,
record.body_style,
record.power ? `${record.power}HP` : null,
record.displacement ? `${record.displacement}L` : null,
record.cylinders ? `${record.cylinders}CIL` : null,
record.doors ? `${record.doors}PUERTAS` : null,
record.traction
].filter(Boolean).join(' ').toUpperCase();
// Generate hashes
const hash_comercial = SHA256(`${marca}|${modelo}|${anio}|${transmision}`);
const id_canonico = SHA256(`${hash_comercial}|${version}`);
return {
id_canonico,
hash_comercial,
marca,
modelo,
anio,
transmision,
version, // Single integrated field
origen_aseguradora: 'INSURER_NAME',
id_original: record.id,
version_original: record.raw_version,
activo: true
};
```
```sql
-- Find potential matches for a new vehicle
WITH new_vehicle AS (
SELECT 'TOYOTA|CAMRY|2023|AUTO'::text as hash,
string_to_array('xle sedan 203hp 2.5l 4cil 4puertas fwd', ' ') as tokens
)
SELECT c.marca, c.modelo, c.anio, c.version,
array_length(c.version_tokens_array & nv.tokens, 1)::float /
GREATEST(array_length(c.version_tokens_array, 1), array_length(nv.tokens, 1)) as overlap_ratio
FROM catalogo_homologado c, new_vehicle nv
WHERE c.hash_comercial = nv.hash
ORDER BY overlap_ratio DESC;
```
```javascript
// In n8n Code node
const batch = $input.all().map(item => ({
id_canonico: item.json.id_canonico,
hash_comercial: item.json.hash_comercial,
string_comercial: `${item.json.marca}|${item.json.modelo}|${item.json.anio}|${item.json.transmision}`,
string_tecnico: item.json.version,
marca: item.json.marca,
modelo: item.json.modelo,
anio: item.json.anio,
transmision: item.json.transmision,
version: item.json.version,
origen_aseguradora: 'HDI',
id_original: item.json.id_original,
version_original: item.json.version_original,
activo: true
}));
return [{ json: { vehiculos_json: batch } }];
```
If you encounter scenarios not covered in this skill:
1. Check existing insurer implementations in `/src/insurers/` for patterns
2. Refer to the Supabase function source code in `/src/supabase/funciones-homologacion.sql`
3. Test token-overlap scores with sample data before full processing
4. Ask the user for clarification on business rules or threshold values
Always prioritize data integrity and traceability over processing speed.
Leave a review
No reviews yet. Be the first to review this skill!
# Download SKILL.md from killerskills.ai/api/skills/vehicle-homologation-system-expert/raw