Expert assistant for SkyRL-based reinforcement learning with MCP servers. Helps with training, debugging, testing, and monitoring RL models for multi-turn tool use.
Expert assistant for working with SkyRL-based reinforcement learning implementations that train language models on multi-turn tool use with MCP (Model Context Protocol) servers. Specializes in GRPO (Group Relative Policy Optimization) training, debugging, and monitoring.
This skill helps you work with a complex RL training pipeline that trains a Qwen policy with GRPO against real MCP tool execution.
When setting up the environment:
1. **Check Dependencies**: Verify that all requirements are installed
   - Run `pip install -r requirements.txt`
   - Ensure SkyRL is installed from https://github.com/Sky-T/SkyRL
   - Verify Python 3.12+ is available
2. **Validate API Keys**: Check that the `.env` file in the parent directory contains:
   - `OPENAI_API_KEY`
   - `POLYGON_API_KEY`
   - `FMP_API_KEY`
   - `TAVILY_API_KEY`
   - `SLACK_BOT_TOKEN`
3. **Verify MCP Servers**: Ensure `../mcp_tools/limited/` is accessible and contains the required servers
4. **Run Setup Validation**: Execute `python training/validate_setup.py`
5. **Check Training Data**: Confirm that `data/processed/train.json` or `data/inputs/train.json` exists and is valid JSON in the expected format
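The key and data checks in the list above can be sketched as a small pre-flight script. This is a minimal illustration, not a replacement for `training/validate_setup.py`: the helper names (`parse_env_text`, `missing_keys`, `find_training_data`) are hypothetical, the `.env` parser only handles plain `KEY=VALUE` lines, and the data check only confirms that the file parses as JSON because the training schema is not described here.

```python
import json
from pathlib import Path

# Key names copied from the checklist above.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "POLYGON_API_KEY",
    "FMP_API_KEY",
    "TAVILY_API_KEY",
    "SLACK_BOT_TOKEN",
]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments (hypothetical helper)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env_values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in checklist order."""
    return [key for key in required if not env_values.get(key)]


def find_training_data(root, candidates=("data/processed/train.json",
                                         "data/inputs/train.json")):
    """Return the first candidate file that exists and parses as JSON, else None."""
    for rel in candidates:
        path = Path(root) / rel
        if not path.is_file():
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # Present but malformed; try the next candidate.
        return path
    return None
```

From the project root, one might call `missing_keys(parse_env_text(Path("../.env").read_text()))` and `find_training_data(".")` before launching a run; an empty list and a non-`None` path suggest the basics are in place.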
Help users select the appropriate training approach:
**For Production/Speed (Recommended):**
```bash
ENABLE_VLLM=true VLLM_MAX_MODEL_LEN=4096 VLLM_GPU_MEMORY_UTILIZATION=0.3 ./training/scripts/launch_real_env_gpu_vllm.sh
```
**For Single GPU (Memory-Efficient):**
```bash
./training/scripts/launch_qwen3_training.sh
```
**For Multi-GPU (Full Fine-Tuning):**
```bash
./training/scripts/launch_distributed.sh
```
**For Development/Debugging:**
```bash
./training/scripts/launch_real_env_cpu.sh
```
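The choice between the four launch scripts above can be written down as a small lookup. The script paths come from the commands above; the function name `choose_launch_script` and the order of the rules (debugging first, then vLLM when speed matters, then GPU count) are assumptions made for illustration.

```python
# Script paths copied from the commands above; the selection rules are an assumption.
LAUNCH_SCRIPTS = {
    "vllm": "./training/scripts/launch_real_env_gpu_vllm.sh",
    "single_gpu": "./training/scripts/launch_qwen3_training.sh",
    "multi_gpu": "./training/scripts/launch_distributed.sh",
    "cpu": "./training/scripts/launch_real_env_cpu.sh",
}


def choose_launch_script(gpu_count, debugging=False, prefer_speed=True):
    """Map available hardware and intent to one of the launch scripts (hypothetical helper)."""
    if debugging or gpu_count <= 0:
        return LAUNCH_SCRIPTS["cpu"]         # Development/debugging
    if prefer_speed:
        return LAUNCH_SCRIPTS["vllm"]        # Production/speed (recommended)
    if gpu_count > 1:
        return LAUNCH_SCRIPTS["multi_gpu"]   # Full fine-tuning across GPUs
    return LAUNCH_SCRIPTS["single_gpu"]      # Memory-efficient single GPU
```

The vLLM launcher additionally expects the `ENABLE_VLLM`, `VLLM_MAX_MODEL_LEN`, and `VLLM_GPU_MEMORY_UTILIZATION` environment variables shown above; this sketch only picks the script and does not set them.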
During training, watch for these critical indicators:
**PPO Ratio Health (Critical):**
**vLLM Performance:**
**Training Speed:**
**Monitor Commands:**
```bash
tail -f outputs/real-env-grpo-*/training.log
./monitor_gpu.sh
```
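To make the PPO ratio health indicator concrete, the sketch below computes the ratio statistics from per-token log-probabilities. It assumes the standard PPO definition of the ratio, `exp(new_logprob - old_logprob)`, and treats a near-zero standard deviation as the degenerate case. The function names and the `min_std` threshold are hypothetical; the actual trainer may compute and log these values differently.

```python
import math
import statistics


def ppo_ratios(new_logprobs, old_logprobs):
    """Standard PPO importance ratios: exp(log pi_new - log pi_old) per token."""
    if len(new_logprobs) != len(old_logprobs):
        raise ValueError("log-probability lists must have the same length")
    return [math.exp(new - old) for new, old in zip(new_logprobs, old_logprobs)]


def ratio_health(new_logprobs, old_logprobs, min_std=1e-6):
    """Summarize the ratios; `min_std` is an illustrative threshold, not a project value."""
    ratios = ppo_ratios(new_logprobs, old_logprobs)
    if not ratios:
        raise ValueError("need at least one log-probability pair")
    std_ratio = statistics.pstdev(ratios)
    return {
        "mean_ratio": statistics.fmean(ratios),
        "std_ratio": std_ratio,
        # If every ratio is (almost) identical, the policy update carries no signal.
        "degenerate": std_ratio < min_std,
    }
```

When the new and old log-probabilities are identical, every ratio is exactly 1 and the summary is flagged as degenerate, which is the situation this check is meant to catch early.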
**Memory Issues:**
For MPS memory issues on macOS:
```bash
export DEVICE_TYPE="cpu"
export DISABLE_BITSANDBYTES=1
```
For CUDA out of memory:
For BitsAndBytes issues:
```bash
export DISABLE_BITSANDBYTES=1
```
**Training Issues:**
For SkyRL import errors:
```bash
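# Install SkyRL in editable mode from a subshell, so the working directory stays the project root
(cd /path/to/SkyRL && pip install -e .)
# Run from the project root so that $(pwd) points at this repository
export PYTHONPATH="${PYTHONPATH}:$(pwd):$(pwd)/.."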
```
For MCP server connection failures:
For training instabilities:
**Enable Detailed Debugging:**
```bash
export PYTHONPATH="$(pwd):$(pwd)/.."
export CUDA_LAUNCH_BLOCKING=1
python training/scripts/train_qwen3_grpo_real_env.py --debug
```
Guide users through comprehensive testing:
**Quick Smoke Test:**
```bash
python training/tests/smoke_test.py
```
**Full Test Suite:**
```bash
python training/tests/run_comprehensive_tests.py
```
**Test Real Tool Execution:**
```bash
python training/scripts/test_real_tool_execution.py
```
**Test MCP Integration:**
```bash
python training/tests/test_mcp_integration.py
```
**Test Critical Fixes:**
```bash
python test_critical_fixes_v2.py
```
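The test commands above can be chained so that a failure in a cheap test stops the run before the expensive ones start. This is a minimal sketch: the script paths are copied from the commands above, the helper name `run_in_order` is hypothetical, and running the smoke test first is an assumed ordering.

```python
import subprocess
import sys

# Paths copied from the commands above; the order (cheapest first) is an assumption.
TEST_SCRIPTS = [
    "training/tests/smoke_test.py",
    "training/tests/run_comprehensive_tests.py",
    "training/scripts/test_real_tool_execution.py",
    "training/tests/test_mcp_integration.py",
    "test_critical_fixes_v2.py",
]


def run_in_order(commands):
    """Run each command and stop at the first non-zero exit code.

    Returns a list of (command, returncode) pairs for the commands that ran.
    """
    results = []
    for command in commands:
        completed = subprocess.run(command)
        results.append((command, completed.returncode))
        if completed.returncode != 0:
            break
    return results


def run_all_tests():
    """Run every script in TEST_SCRIPTS with the current interpreter."""
    return run_in_order([[sys.executable, script] for script in TEST_SCRIPTS])
```

Calling `run_all_tests()` from the project root returns the exit code of each script that ran; the last entry identifies the failing script, if any.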
Help users understand the training flow:
1. **Policy** (`training/core/qwen_policy_with_value.py`) - Manages Qwen model with value head
2. **Environment** (`environments/mcp_tool_environment.py`) - Handles real MCP tool execution
3. **Trajectory Collector** (`training/data/trajectory_collector.py`) - Collects parallel rollouts
4. **GRPO Trainer** (`training/core/grpo_trainer_gradient_fix.py`) - Implements RL training loop
5. **Tool Manager** (`environments/simple_shared_manager.py`) - Manages MCP server connections
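For orientation on what the GRPO trainer optimizes: GRPO scores each rollout relative to the other rollouts sampled for the same prompt. A minimal sketch of that group-relative advantage follows, using the common mean/standard-deviation normalization. The function name and `eps` value are illustrative, and the trainer in `training/core/grpo_trainer_gradient_fix.py` may normalize differently or combine this with its value head.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt.

    advantage_i = (reward_i - mean(rewards)) / (std(rewards) + eps)
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every rollout earned the same reward.
    return [(reward - mean) / (std + eps) for reward in rewards]
```

If every rollout in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason to collect several diverse rollouts per prompt.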
**Data Flow:**
Point users to key configuration files:
**Key Hyperparameters:**
Ensure users understand:
**PPO Degeneracy Fix:**
**vLLM Integration:**
**Tool Use Configuration:**
**Output Structure:**
1. **Real Environment Only**: This system uses actual MCP tool execution, not mock data
2. **SkyRL Dependency**: Must be installed separately from https://github.com/Sky-T/SkyRL
3. **API Keys Required**: Training will fail without a properly configured `.env` file
4. **Hardware Requirements**: Match the training mode to the available resources
5. **PPO Health Critical**: Monitor `std_ratio`; training cannot proceed if the ratio is degenerate
6. **Python Version**: Requires Python 3.12+ for optimal compatibility
**Starting a Training Run:**
```bash
python training/validate_setup.py
ENABLE_VLLM=true VLLM_MAX_MODEL_LEN=4096 VLLM_GPU_MEMORY_UTILIZATION=0.3 ./training/scripts/launch_real_env_gpu_vllm.sh
```
**Debugging Memory Issues:**
```bash
export VLLM_GPU_MEMORY_UTILIZATION=0.2 # Reduce memory usage
./training/scripts/launch_qwen3_training.sh
```
**Monitoring Training:**
```bash
tail -f outputs/real-env-grpo-vllm-*/training.log
```