Evaluate and secure LLM applications with automated testing, red teaming, and vulnerability scanning using Promptfoo. Compare models, run security checks, and automate LLM testing in CI/CD.
This skill helps you set up and run Promptfoo, a developer-friendly local tool for testing LLM applications. It enables you to evaluate prompts, compare models side by side, scan for security vulnerabilities, and automate LLM testing in CI/CD pipelines.
When the user requests Promptfoo testing or evaluation:
1. **Determine the user's goal**:
- Ask what they want to test: prompt evaluation, model comparison, security scanning, or CI/CD integration
- If unclear, suggest starting with basic evaluation (`promptfoo init` → `promptfoo eval`)
2. **Install and initialize Promptfoo**:
- Run `npx promptfoo@latest init` to create a new project
- This generates a config file (`promptfooconfig.yaml`) with example prompts and test cases
- Explain the generated config structure to the user
3. **For prompt/model evaluation**:
- Help configure `promptfooconfig.yaml` with their prompts and test cases
- Set up providers (OpenAI, Anthropic, Azure, Bedrock, Ollama, etc.)
- Run `npx promptfoo eval` to execute tests
- Run `npx promptfoo view` to open the web UI for results
- Explain how to interpret the evaluation matrix
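A minimal `promptfooconfig.yaml` for this step might look like the following sketch; the prompt text, model IDs, and assertion values are illustrative placeholders, not requirements:

```yaml
description: Customer support prompt evaluation

prompts:
  - "You are a helpful support agent. Answer the customer's question: {{question}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      question: "How do I reset my password?"
    assert:
      # Case-insensitive substring check
      - type: icontains
        value: "password"
      # Model-graded check against a plain-language rubric
      - type: llm-rubric
        value: "Response is polite and gives actionable steps"
```

`npx promptfoo eval` then runs every prompt × provider × test combination, and `npx promptfoo view` renders the results as a matrix.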
4. **For security red teaming**:
- Run `npx promptfoo redteam init` to set up vulnerability scanning
- Configure the red team config with target endpoints and security policies
- Run `npx promptfoo redteam run` to execute security tests
- Generate and review the security vulnerability report
- Explain findings and suggest remediation strategies
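After `promptfoo redteam init`, the generated config can be pointed at your target and scoped to specific vulnerability classes. A hedged sketch, where the target, label, and purpose are placeholders and the plugin/strategy names are examples from Promptfoo's catalog (check the docs for the full list):

```yaml
targets:
  - id: openai:gpt-4o-mini
    label: support-chatbot

redteam:
  # Describing the app's purpose helps generate relevant attacks
  purpose: "A customer support assistant for an online retailer"
  plugins:
    - harmful   # harmful-content probes
    - pii       # attempts to extract personal data
  strategies:
    - jailbreak         # wraps plugin payloads in jailbreak framings
    - prompt-injection  # injection-style delivery of payloads
```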
5. **For model comparison**:
- Configure multiple providers in the config file
- Set up test cases that evaluate quality, consistency, or specific behaviors
- Run evaluation and help interpret side-by-side results
- Provide recommendations based on metrics
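For a side-by-side comparison, list each candidate model as a provider; the same prompts and tests run against all of them. A sketch, assuming these model IDs are available in your environment:

```yaml
providers:
  - openai:gpt-4o
  - anthropic:messages:claude-3-5-sonnet-20241022
  - ollama:chat:llama3   # local model served via Ollama

prompts:
  - "Summarize the following ticket in one sentence: {{ticket}}"

tests:
  - vars:
      ticket: "Customer reports being double-charged for their March invoice."
    assert:
      - type: icontains
        value: "double-charged"
      - type: latency
        threshold: 5000   # milliseconds; flags models that respond too slowly
```

The resulting matrix shows each model's output for the same input, which makes quality and latency differences easy to compare.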
6. **For CI/CD integration**:
- Help set up Promptfoo commands in CI pipeline files
- Configure assertion thresholds for pass/fail criteria
- Set up code scanning for PR reviews if requested
- Test the pipeline locally before committing
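As one concrete option, a GitHub Actions job can run the evaluation on every pull request; `promptfoo eval` exits with a non-zero code when assertions fail, which should fail the job. A sketch in which the workflow name, secret name, and config path are assumptions to adapt:

```yaml
name: llm-regression-tests
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # A failed assertion makes this step (and the job) fail
      - run: npx promptfoo@latest eval --config promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```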
7. **After running tests**:
- Analyze results and identify issues or improvements
- Suggest configuration tweaks to improve coverage
- Help iterate on prompts or system configurations
- Export or share results as needed
```bash
# Scaffold a new project with an example config
npx promptfoo@latest init

# Run the evaluation defined in promptfooconfig.yaml
npx promptfoo eval

# Open the web UI to browse results
npx promptfoo view

# Set up red team / vulnerability scanning
npx promptfoo redteam init

# Execute the red team security tests
npx promptfoo redteam run

# Re-run the evaluation with response caching
npx promptfoo eval --cache
```
1. **Prompt optimization**: "Help me test which prompt variant performs best for customer support responses"
2. **Security audit**: "Run red team tests to find vulnerabilities in my chatbot"
3. **Model selection**: "Compare GPT-4, Claude, and Llama 3 for my use case"
4. **Regression testing**: "Set up automated tests to catch prompt degradation in CI"
5. **Code review**: "Scan my LLM integration PR for security issues"