Evaluate and secure LLM applications with automated testing, red teaming, and vulnerability scanning using Promptfoo. Compare models, run security checks, and automate LLM testing in CI/CD.
This skill helps you set up and run Promptfoo, a developer-friendly local tool for testing LLM applications. It enables you to evaluate prompts, compare models side by side, scan for security vulnerabilities, and automate LLM testing in CI/CD pipelines.
When the user requests Promptfoo testing or evaluation:
1. **Determine the user's goal**:
- Ask what they want to test: prompt evaluation, model comparison, security scanning, or CI/CD integration
- If unclear, suggest starting with basic evaluation (`promptfoo init` → `promptfoo eval`)
2. **Install and initialize Promptfoo**:
- Run `npx promptfoo@latest init` to create a new project
- This generates a config file (`promptfooconfig.yaml`) with example prompts and test cases
- Explain the generated config structure to the user
3. **For prompt/model evaluation**:
- Help configure `promptfooconfig.yaml` with their prompts and test cases
- Set up providers (OpenAI, Anthropic, Azure, Bedrock, Ollama, etc.)
- Run `npx promptfoo eval` to execute tests
- Run `npx promptfoo view` to open the web UI for results
- Explain how to interpret the evaluation matrix
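A minimal `promptfooconfig.yaml` for this step might look like the following sketch; the prompt text, model IDs, and assertion values are illustrative placeholders, not requirements:

```yaml
description: Customer support prompt evaluation

prompts:
  - "You are a helpful support agent. Answer the customer's question: {{question}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      question: "How do I reset my password?"
    assert:
      # Case-insensitive substring check
      - type: icontains
        value: "password"
      # Model-graded check against a plain-language rubric
      - type: llm-rubric
        value: "Response is polite and gives actionable steps"
```

`npx promptfoo eval` then runs every prompt × provider × test combination, and `npx promptfoo view` renders the results as a matrix.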
4. **For security red teaming**:
- Run `npx promptfoo redteam init` to set up vulnerability scanning
- Configure the red team config with target endpoints and security policies
- Run `npx promptfoo redteam run` to execute security tests
- Generate and review the security vulnerability report
- Explain findings and suggest remediation strategies
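After `promptfoo redteam init`, the generated config can be pointed at your target and scoped to specific vulnerability classes. A hedged sketch, where the target, label, and purpose are placeholders and the plugin/strategy names are examples from Promptfoo's catalog (check the docs for the full list):

```yaml
targets:
  - id: openai:gpt-4o-mini
    label: support-chatbot

redteam:
  # Describing the app's purpose helps generate relevant attacks
  purpose: "A customer support assistant for an online retailer"
  plugins:
    - harmful   # harmful-content probes
    - pii       # attempts to extract personal data
  strategies:
    - jailbreak         # wraps plugin payloads in jailbreak framings
    - prompt-injection  # injection-style delivery of payloads
```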
5. **For model comparison**:
- Configure multiple providers in the config file
- Set up test cases that evaluate quality, consistency, or specific behaviors
- Run evaluation and help interpret side-by-side results
- Provide recommendations based on metrics
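For a side-by-side comparison, list each candidate model as a provider; the same prompts and tests run against all of them. A sketch, assuming these model IDs are available in your environment:

```yaml
providers:
  - openai:gpt-4o
  - anthropic:messages:claude-3-5-sonnet-20241022
  - ollama:chat:llama3   # local model served via Ollama

prompts:
  - "Summarize the following ticket in one sentence: {{ticket}}"

tests:
  - vars:
      ticket: "Customer reports being double-charged for their March invoice."
    assert:
      - type: icontains
        value: "double-charged"
      - type: latency
        threshold: 5000   # milliseconds; flags models that respond too slowly
```

The resulting matrix shows each model's output for the same input, which makes quality and latency differences easy to compare.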
6. **For CI/CD integration**:
- Help set up Promptfoo commands in CI pipeline files
- Configure assertion thresholds for pass/fail criteria
- Set up code scanning for PR reviews if requested
- Test the pipeline locally before committing
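As one concrete option, a GitHub Actions job can run the evaluation on every pull request; `promptfoo eval` exits with a non-zero code when assertions fail, which should fail the job. A sketch in which the workflow name, secret name, and config path are assumptions to adapt:

```yaml
name: llm-regression-tests
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # A failed assertion makes this step (and the job) fail
      - run: npx promptfoo@latest eval --config promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```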
7. **After running tests**:
- Analyze results and identify issues or improvements
- Suggest configuration tweaks to improve coverage
- Help iterate on prompts or system configurations
- Export or share results as needed
```bash
# Scaffold a new project with an example config
npx promptfoo@latest init

# Run the evaluation defined in promptfooconfig.yaml
npx promptfoo eval

# Open the web UI to browse results
npx promptfoo view

# Set up red team / vulnerability scanning
npx promptfoo redteam init

# Execute the red team security tests
npx promptfoo redteam run

# Re-run the evaluation with response caching
npx promptfoo eval --cache
```
1. **Prompt optimization**: "Help me test which prompt variant performs best for customer support responses"
2. **Security audit**: "Run red team tests to find vulnerabilities in my chatbot"
3. **Model selection**: "Compare GPT-4, Claude, and Llama 3 for my use case"
4. **Regression testing**: "Set up automated tests to catch prompt degradation in CI"
5. **Code review**: "Scan my LLM integration PR for security issues"