Expert guidance for working with Zorro's dual ICN/AMIL malware detection framework for package ecosystems. Provides context on architecture, training pipelines, benchmarking, and development workflows.
This skill provides expert guidance for working with the Zorro malware detection framework, a comprehensive system for detecting malicious packages in software ecosystems using two complementary approaches: ICN (Intent Convergence Networks) and AMIL (Attention-based Multiple Instance Learning).
Assists developers and security researchers working with the Zorro framework by providing context on architecture, training pipelines, benchmarking, and development workflows.
When the user asks about Zorro, first identify which component they're working with:
For setup questions:
1. **Check environment variables** needed:
- `GITHUB_TOKEN` - Required for advisory scraping
- `OPENROUTER_API_KEY` - Optional, for LLM benchmarking
- `WANDB_API_KEY` - Optional, for experiment tracking
2. **Verify package management** - Zorro uses `uv` (not pip):
```bash
uv sync # Install dependencies
uv add <package> # Add new dependency
uv lock --upgrade # Update dependencies
```
3. **Check data paths** are correctly set:
- `malicious-software-packages-dataset/` - Primary dataset
- `data/` - Processed training data
- `checkpoints/` - Model checkpoints
- `logs/` - Training logs
- `test_results/` - Benchmark results
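The setup checks above can be collapsed into a small preflight script. This is a hypothetical helper sketch, not part of the repo; the function name `preflight` and constants are illustrative, while the env var names and directory names come from the lists above:

```python
import os
from pathlib import Path

REQUIRED_ENV = ["GITHUB_TOKEN"]
OPTIONAL_ENV = ["OPENROUTER_API_KEY", "WANDB_API_KEY"]
DATA_DIRS = [
    "malicious-software-packages-dataset",
    "data",
    "checkpoints",
    "logs",
    "test_results",
]

def preflight(root: str = ".") -> list[str]:
    """Return human-readable problems; an empty list means the environment is ready."""
    problems = []
    for var in REQUIRED_ENV:
        if not os.environ.get(var):
            problems.append(f"missing required env var: {var}")
    for var in OPTIONAL_ENV:
        if not os.environ.get(var):
            problems.append(f"optional env var unset (some features disabled): {var}")
    for d in DATA_DIRS:
        if not (Path(root) / d).is_dir():
            problems.append(f"missing directory: {d}")
    return problems
```

Run it from the project root before training or benchmarking and fix anything it reports.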
For ICN-related tasks:
**Architecture Components:**
**Training Pipeline (4-stage curriculum):**
```bash
python train_icn.py
python test_icn_structure.py
python verify_icn_pipeline.py
```
**Training Stages:**
1. Intent Pretraining - Learn intent classification on benign packages
2. Convergence Training - Train stable convergence on benign packages
3. Malicious Training - Introduce real malware, train divergence detection
4. Robustness Training - Adversarial examples and obfuscated code
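The four stages above can be sketched as an ordered curriculum that flips data flags per stage. This is a schematic sketch under assumed names (`Stage`, `ICN_CURRICULUM`, the epoch counts are illustrative), not Zorro's actual trainer code:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    epochs: int          # illustrative epoch counts, not the real config
    use_malicious: bool  # stages 3-4 introduce real malware
    use_adversarial: bool  # stage 4 adds adversarial/obfuscated samples

ICN_CURRICULUM = [
    Stage("intent_pretraining", epochs=5, use_malicious=False, use_adversarial=False),
    Stage("convergence_training", epochs=5, use_malicious=False, use_adversarial=False),
    Stage("malicious_training", epochs=10, use_malicious=True, use_adversarial=False),
    Stage("robustness_training", epochs=5, use_malicious=True, use_adversarial=True),
]

def run_curriculum(train_stage: Callable[[Stage], float]) -> dict[str, float]:
    """Run each stage in order; train_stage returns the stage's final loss."""
    return {s.name: train_stage(s) for s in ICN_CURRICULUM}
```

The point of the ordering is that convergence behavior is learned on benign packages first, so divergence on malicious packages is meaningful when it is introduced in stage 3.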
**Key Files:**
For AMIL-related tasks:
**Architecture Components:**
**Training Pipeline (3-stage curriculum):**
```bash
python amil_demo.py
python -m amil.trainer --config-file amil_config.json
python -m amil.evaluator --model-path checkpoints/amil_model.pth --test-data data/test_samples/
python amil_benchmark_integration.py
```
**Training Stages:**
1. Stage A - Balanced (5:1 benign:malicious, clean samples)
2. Stage B - Augmented (add obfuscation variants)
3. Stage C - Realistic (10:1 ratio for production calibration)
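The per-stage class ratios above can be enforced with a simple subsampler. A hedged sketch, not AMIL's actual data loader; the Stage B ratio is unspecified in the curriculum, so keeping it at 5:1 is an assumption:

```python
import random

# Benign:malicious ratios per stage. A and C come from the curriculum above;
# B's ratio is an assumption (the curriculum only says obfuscation variants are added).
STAGE_RATIOS = {"A": 5, "B": 5, "C": 10}

def build_stage_sample(benign, malicious, stage, seed=0):
    """Subsample benign packages so the stage's benign:malicious ratio holds."""
    rng = random.Random(seed)
    ratio = STAGE_RATIOS[stage]
    n_benign = min(len(benign), ratio * len(malicious))
    return rng.sample(benign, n_benign) + list(malicious)
```

Stage C's 10:1 ratio matters for calibration: probability thresholds tuned on a balanced set will over-flag in production, where benign packages dominate.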
**Key Files:**
**Success Criteria:**
For benchmarking tasks:
**Quick Tests:**
```bash
python test_benchmark_framework.py
python icn/evaluation/test_granularity.py
```
**Full Benchmark Study:**
```bash
python run_icn_benchmark.py \
--include-llms \
--include-huggingface \
--include-baselines
```
**Comparison Targets:**
**Evaluation Modes:**
**Metrics:**
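The headline metrics reported in the comparison workflows below, F1 and ROC-AUC, can be computed from scratch as follows. This is a generic, self-contained sketch (labels use 1 = malicious), not Zorro's evaluation code:

```python
def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 label lists (1 = malicious)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """ROC-AUC via the rank formulation: fraction of (malicious, benign)
    score pairs ranked correctly, with ties counted as half-wins."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

F1 evaluates hard predictions at a fixed threshold, while ROC-AUC evaluates the raw score ranking, which is why both appear in the comparison reports.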
For data-related tasks:
```bash
python extract_malicious_samples.py
python icn_demo.py
python icn_phase2_demo.py
```
**Package Manager Support:**
**ICN Configuration (`icn/training/config.py`):**
**AMIL Configuration (`amil/config.py`):**
**Intent Vocabulary (ICN):**
**Import Errors:**
**GPU Requirements:**
**API Access:**
**Module Structure:**
**Example 1: User wants to train ICN model**
```
User: "How do I train the ICN model?"
Agent:
1. Verify GPU is available and environment is set up
2. Check data is prepared in malicious-software-packages-dataset/
3. Run: python train_icn.py
4. Monitor training through 4 curriculum stages
5. Check logs/ for training progress and checkpoints/ for saved models
```
**Example 2: User wants to benchmark AMIL against ICN**
```
User: "How do I compare AMIL performance to ICN?"
Agent:
1. Train both models if not already done
2. Run: python amil_benchmark_integration.py
3. Review metrics: F1, ROC-AUC, speed, interpretability
4. Check test_results/ for detailed comparison reports
```
**Example 3: User asks about intent vocabulary**
```
User: "What intents does ICN detect?"
Agent:
ICN uses a dual intent system:
```
1. **Always run scripts from project root** - Relative imports assume this
2. **Use `uv` not `pip`** - Project uses uv for dependency management
3. **GPU recommended for ICN** - Training requires a CUDA-capable GPU
4. **OpenRouter API key optional** - Only needed for LLM benchmarking
5. **Statistical significance** - Benchmark results include significance testing for research publication
6. **Both models support the same ecosystems** - ICN and AMIL support npm and PyPI, with Cargo support planned
7. **Virtual environment** - Always activate `.venv/` before running scripts
8. **Cross-ecosystem evaluation** - Tests generalization between package managers