Assemble and refine viral genomes from NGS data using SPAdes, Gap2Seq, and reference-guided workflows
This skill enables you to work with the viral-assemble codebase, a set of scripts and tools for assembling viral genomes from NGS data. Primarily used for Lassa and Ebola virus analysis, it is a Docker-centric Python project built on top of viral-core, with a modular architecture for viral genome assembly workflows.
You can assist with:
Execute all unit tests in parallel:
```bash
pytest -rsxX -n auto test/unit
```
Run a specific test file:
```bash
pytest test/unit/test_assembly.py
```
Run integration tests:
```bash
pytest test/unit/test_assembly_integration.py
```
Generate coverage report:
```bash
pytest --cov
```
Run slow tests (marked with `@pytest.mark.slow`):
```bash
pytest --runslow
```
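The `--runslow` flag is not built into pytest; a project normally provides it through hooks in `conftest.py`. The following is a minimal sketch of that common pattern, adapted from the pytest documentation. It is an assumption about how the option could be wired up, not a copy of this repository's actual `conftest.py`.

```python
# conftest.py (sketch; the project's real file may differ)


def pytest_addoption(parser):
    # Register the custom command-line flag used as `pytest --runslow`.
    parser.addoption(
        "--runslow", action="store_true", default=False, help="run slow tests"
    )


def pytest_configure(config):
    # Declare the marker so `@pytest.mark.slow` does not trigger warnings.
    config.addinivalue_line("markers", "slow: mark test as slow to run")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        # Flag given: leave slow tests enabled.
        return
    # Local import keeps this sketch loadable where pytest is not installed.
    import pytest

    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With hooks like these in place, a test decorated with `@pytest.mark.slow` is reported as skipped by a plain `pytest` run and executed only when `--runslow` is passed.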
The project uses a docker-centric development paradigm. To work on code changes:
1. **Mount local code into viral-core container:**
```bash
docker run -it --rm -v $(pwd):/opt/viral-ngs/viral-assemble quay.io/broadinstitute/viral-core
```
2. **Update conda dependencies if changed:**
```bash
/opt/viral-ngs/source/docker/install-conda-dependencies.sh /opt/viral-ngs/viral-assemble/requirements-conda.txt
```
3. **Or install full dev layer:**
```bash
/opt/viral-ngs/viral-assemble/docker/install-dev-layer.sh
```
4. **Test interactively within container:**
```bash
cd /opt/viral-ngs/viral-assemble
pytest -rsxX -n auto test/unit
```
To build the Docker image locally:
```bash
docker build -t viral-assemble .
```
The Dockerfile layers viral-assemble on top of viral-core:2.4.2.
Available via `assembly.py <command>`:
**Assembly Tool Wrappers** (`assemble/` directory):
**Testing** (`test/` directory):
**Dependencies from viral-core:**
Core tools specified in `requirements-conda.txt`:
Standard viral genome assembly pipeline:
1. **Preprocess reads:**
```bash
assembly.py trim_rmdup_subsamp input.bam output.bam
```
2. **De novo assembly:**
```bash
assembly.py assemble_spades reads.bam output.fasta
```
3. **Scaffold against reference:**
```bash
assembly.py order_and_orient contigs.fasta reference.fasta scaffolded.fasta
```
4. **Fill gaps from reference:**
```bash
assembly.py impute_from_reference scaffolded.fasta reference.fasta filled.fasta
```
5. **Iterative refinement:**
```bash
assembly.py refine_assembly reads.bam reference.fasta refined.fasta
```
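The five steps above can be chained from a small driver script. The sketch below only reuses the subcommands and positional-argument shapes shown in the steps; the function names, the intermediate file names, and the way each step consumes the previous step's output (including refining the gap-filled scaffold in the last step) are assumptions for illustration. The real commands may accept or require additional arguments, so check `assembly.py <command> --help` before relying on it.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(raw_bam, reference, workdir):
    """Return the five assembly.py invocations, in order, as argv lists.

    Intermediate file names are hypothetical; only the subcommand names
    and argument order come from the documented steps.
    """
    work = Path(workdir)
    cleaned = str(work / "cleaned.bam")
    contigs = str(work / "contigs.fasta")
    scaffolded = str(work / "scaffolded.fasta")
    filled = str(work / "filled.fasta")
    refined = str(work / "refined.fasta")
    return [
        ["assembly.py", "trim_rmdup_subsamp", raw_bam, cleaned],
        ["assembly.py", "assemble_spades", cleaned, contigs],
        ["assembly.py", "order_and_orient", contigs, reference, scaffolded],
        ["assembly.py", "impute_from_reference", scaffolded, reference, filled],
        # Assumption: refinement starts from the gap-filled scaffold.
        ["assembly.py", "refine_assembly", cleaned, filled, refined],
    ]


def run_pipeline(commands, dry_run=False):
    """Print each command, then run it unless dry_run is set."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(argv, check=True)
```

Calling `run_pipeline(build_pipeline("sample.bam", "reference.fasta", "work"), dry_run=True)` prints the five commands without executing anything, which is a cheap way to review the plan before running it inside the container.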
When developing new features:
Each command follows this structure:
```python
def parser_my_command(parser=argparse.ArgumentParser()):
    parser.add_argument('input', help='Input file')
    parser.add_argument('output', help='Output file')
    # split_args=True passes the parsed arguments to the main function as keyword arguments
    util.cmd.attach_main(parser, main_my_command, split_args=True)
    return parser


def main_my_command(input, output):
    # Implementation here
    pass


# Register the command so it is exposed as `assembly.py my_command`
__commands__.append(('my_command', parser_my_command))
```
Standard exceptions used throughout:
GitHub Actions workflow (`.github/workflows/build.yml`) runs on push/PR:
When helping users with this codebase:
1. **Understand the request** - Identify whether they need to run assembly workflows, develop new features, or debug issues
2. **Check the environment** - Determine whether they're working locally or in a Docker container
3. **For assembly tasks:**
   - Identify which assembly command(s) are needed
   - Verify input file formats and locations
   - Construct appropriate command-line invocations
   - Explain expected outputs
4. **For development tasks:**
   - Locate the relevant module in `assembly.py` or the `assemble/` directory
   - Follow the command registration pattern if adding new commands
   - Write tests following the testing guidelines
   - Test in the Docker container before committing
5. **For debugging:**
   - Check test files in `test/unit/` for examples
   - Run specific test files to isolate issues
   - Use the Docker container for a reproducible debugging environment
6. **For testing:**
   - Run the appropriate pytest command with the proper flags
   - Ensure new tests meet time requirements or are marked `@pytest.mark.slow`
   - Verify coverage for new code