Instructions for working with Monarch Initiative data ingest projects that use the Koza framework to transform biomedical data into standardized knowledge graph formats (Biolink Model).
You are working on a **Monarch Initiative data ingest project** that uses Koza 2.x for ETL transformations. The project processes biological/biomedical data into knowledge graph formats compatible with the Biolink Model.
Common project commands:
```bash
just setup
just download # or: uv run mmrrc-ingest download
just transform # or: uv run mmrrc-ingest transform
just test
just check-config
```
Follow this sequence when working on the project:
1. **Define Data Sources**: Update `download.yaml` with data source URLs, formats, and metadata
2. **Configure Transformations**:
   - Update `transform.yaml` with Koza 2.x nested structure (reader/transform/writer)
   - Implement transformation logic in `transform.py`
3. **Add Tests**: Create tests in `tests/` directory
4. **Validate**: Run `just check-config` to validate YAML configurations with Pydantic models
5. **Execute Pipeline**: Run `just transform` and verify output
6. **Update Documentation**: Keep `README.md` synchronized with changes
Always use the nested `reader`/`transform`/`writer` structure in YAML configs:
```yaml
name: "my_transform"
reader:
  format: "csv"  # or json, jsonl, tsv, etc.
  files: ["data.csv"]
  delimiter: ","
transform:
  code: "./transform.py"
writer:
  node_properties: [...]
  min_node_count: 100
```
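The nested layout can be sanity-checked by hand before running `just check-config`. The sketch below is a stand-alone illustration, not Koza's own Pydantic validation: `check_nested_sections` and both sample dictionaries are hypothetical, and the config is shown as an already-parsed dict so that no particular YAML loader is assumed.

```python
from typing import Any

# Sections that the instructions above say every config should nest under.
REQUIRED_SECTIONS = ("reader", "transform", "writer")


def check_nested_sections(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    problems: list[str] = []
    for section in REQUIRED_SECTIONS:
        value = config.get(section)
        if value is None:
            problems.append(f"missing section: {section}")
        elif not isinstance(value, dict):
            problems.append(f"section is not a mapping: {section}")
    return problems


# Hypothetical parsed config mirroring the YAML example above.
example_config = {
    "name": "my_transform",
    "reader": {"format": "csv", "files": ["data.csv"], "delimiter": ","},
    "transform": {"code": "./transform.py"},
    "writer": {"min_node_count": 100},
}

# A flat layout with reader keys at the top level, which the check should reject.
flat_config = {"name": "my_transform", "format": "csv", "files": ["data.csv"]}

print(check_nested_sections(example_config))
print(check_nested_sections(flat_config))
```

A check like this only catches a missing or mis-typed top-level section; field-level validation is still left to `just check-config`.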
**CRITICAL REQUIREMENT**: All transform functions MUST return a list of entities, never `None` or bare entity objects:
```python
from typing import Any

import koza
from koza import KozaTransform
from koza.model.entity import Entity


@koza.transform_record()
def transform_record(koza_transform: KozaTransform, row: dict[str, Any]) -> list[Entity]:
    # Validation: return an empty list for invalid records
    if not row.get('required_field'):
        return []  # CORRECT: return an empty list, NOT None

    # Create entity
    entity = Entity(...)

    # CORRECT: return a list containing the entity
    return [entity]

    # INCORRECT: do NOT return a bare entity or None
    # return entity  # ❌ Wrong
    # return None    # ❌ Wrong
```
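The list contract can also be enforced mechanically, for example in a test helper. The following is a minimal sketch under stated assumptions: `check_returns_list`, `good_transform`, and `bad_transform` are hypothetical names, the stand-in transforms take only a row (the real signature also receives a `KozaTransform`), and plain dicts stand in for Biolink entities so the snippet runs without Koza installed.

```python
from typing import Any, Callable


def check_returns_list(
    transform: Callable[[dict[str, Any]], Any], row: dict[str, Any]
) -> list[Any]:
    """Call a transform on one row and fail loudly if it breaks the list contract."""
    result = transform(row)
    if result is None:
        raise TypeError("transform returned None; return [] for skipped records")
    if not isinstance(result, list):
        raise TypeError(
            f"transform returned a bare {type(result).__name__}; wrap it in a list"
        )
    return result


# Hypothetical stand-in that follows the contract.
def good_transform(row: dict[str, Any]) -> list[dict[str, Any]]:
    if not row.get("required_field"):
        return []  # skipped record: empty list, not None
    return [{"id": row["required_field"]}]


# Hypothetical stand-in that violates the contract by returning a bare object.
def bad_transform(row: dict[str, Any]) -> dict[str, Any]:
    return {"id": row.get("required_field")}
```

Calling `check_returns_list(bad_transform, {})` raises `TypeError`, while the same call with `good_transform` returns an empty list.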
Always include proper type hints for mypy compatibility:
```python
from typing import Any

import koza
from koza import KozaTransform
from koza.model.entity import Entity


@koza.transform_record()
def transform_record(koza_transform: KozaTransform, row: dict[str, Any]) -> list[Entity]:
    """Transform a single record into knowledge graph entities."""
    entity = Entity(...)  # Implementation here: build the entity from `row`
    return [entity]
```
For projects with multiple transformation pipelines:
When making changes, prioritize files in this order:
1. **High Priority** (core ingest logic):
   - `download.yaml` - data source definitions
   - `transform.yaml` - transformation configuration
   - `transform.py` - transformation implementation
2. **Medium Priority**:
   - `metadata.yaml` - project metadata
   - `cli.py` - CLI commands
   - Test files in `tests/`
3. **Low Priority** (unless specifically requested):
   - Documentation files
   - GitHub workflows
   - Configuration files
Update `download.yaml` with new source definitions:
```yaml
format: "csv"
delimiter: ","
```
To change transformation logic:
1. Modify `transform.py` with new transformation code
2. Update `transform.yaml` if reader/writer config changes needed
3. Run `just check-config` to validate
4. Run `just transform` to test
5. Add/update tests in `tests/`
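For the last step, one possible shape for a test file is sketched below. It assumes the row-level logic is factored into a plain function that can be called without a running Koza pipeline; `GeneStub`, `transform_row`, the column names `gene_id` and `symbol`, and the sample identifier are all hypothetical and not taken from this project.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class GeneStub:
    """Hypothetical stand-in for a Biolink entity class."""

    id: str
    name: str


def transform_row(row: dict[str, Any]) -> list[GeneStub]:
    """Hypothetical row-level logic, kept free of Koza so it can be tested directly."""
    if not row.get("gene_id"):
        return []  # skip rows without an identifier
    return [GeneStub(id=row["gene_id"], name=row.get("symbol", ""))]


def test_valid_row_yields_one_entity() -> None:
    result = transform_row({"gene_id": "EX:0001", "symbol": "Abc1"})
    assert result == [GeneStub(id="EX:0001", name="Abc1")]


def test_row_without_id_is_skipped() -> None:
    assert transform_row({"symbol": "Abc1"}) == []
```

Functions named `test_*` are collected by pytest, so a file like this under `tests/` would presumably be picked up by `just test`, assuming that recipe runs pytest.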
Extend `cli.py` with new command functions following existing patterns.
When a transform fails:
1. Check Koza logs for error messages
2. Validate YAML syntax with `just check-config`
3. Run transform with verbose logging
4. Verify input data format matches reader configuration
5. Ensure transform functions return lists
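Step 4 can be checked quickly without running the pipeline. The sketch below uses only the standard library to compare a file's header row against the columns the transform expects; `missing_columns`, the sample text, and the column names are hypothetical, and the delimiter would come from the `reader` section of `transform.yaml`.

```python
import csv
import io


def missing_columns(text: str, delimiter: str, expected: list[str]) -> list[str]:
    """Return the expected column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in expected if name not in header]


# Hypothetical sample: a tab-separated file with two columns.
sample = "gene_id\tsymbol\nEX:0001\tAbc1\n"

# Parsed with the right delimiter, nothing is missing.
print(missing_columns(sample, "\t", ["gene_id", "symbol"]))

# Parsed with the wrong delimiter, the header collapses into a single field,
# so every expected column is reported as missing.
print(missing_columns(sample, ",", ["gene_id", "symbol"]))
```

If every expected column is reported missing at once, a delimiter mismatch between the file and the reader configuration is a likely cause.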