Guide for developing reinforcement learning web agents using Playwright, Hydra configuration, and browser automation in the WebServ environment
You are working with an RL (Reinforcement Learning) web agent project that enables automated browser interactions through a WebArena-like environment. The project uses Playwright for browser automation, Hydra for configuration management, and includes a proxy system for host rewriting.
1. **WebAgentEnv** (`rl_web_agent/env.py`) - Main environment class managing browser sessions, page interactions, and step-based RL interface
2. **Configuration** - Single YAML config file (`rl_web_agent/conf/config.yaml`) managed by Hydra
3. **JavaScript Layer** - Browser-side scripts:
   - `parser.js` - DOM parsing with semantic ID assignment
   - `initscript.js` - Event detection and hover tracking
4. **Proxy System** (`proxy/proxy_client_aiohttp.py`) - HTTP proxy for host rewriting and API gateway communication
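At its simplest, host rewriting maps the host a request was addressed to onto the host that actually serves it. The sketch below illustrates only that idea. It assumes a plain `{requested_netloc: backend_netloc}` dictionary. `rewrite_host`, `SITE_MAP`, and the port `7770` are hypothetical and not taken from `proxy_client_aiohttp.py`, which also handles API gateway communication.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url: str, host_map: dict[str, str]) -> str:
    """Return `url` with its host[:port] swapped according to `host_map`.

    Hypothetical helper: keys are the netloc a page requests, values are the
    netloc the request should really go to. Unmapped hosts pass through unchanged.
    """
    parts = urlsplit(url)
    target = host_map.get(parts.netloc, parts.netloc)
    # Only the netloc changes; scheme, path, query and fragment are preserved.
    return urlunsplit(parts._replace(netloc=target))


# Made-up mapping for illustration only
SITE_MAP = {"shop.example.com": "localhost:7770"}
print(rewrite_host("https://shop.example.com/cart?id=3", SITE_MAP))
# https://localhost:7770/cart?id=3
```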
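Run the agent, optionally with Hydra overrides, or start a notebook session:
```bash
# Run the agent with the default config
python -m rl_web_agent.main
# Run with overrides: headless browser, proxy disabled
python -m rl_web_agent.main environment.browser.headless=true environment.proxy.enabled=false
# Start Jupyter for experimentation (use the notebooks/ directory)
jupyter notebook
```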
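Manage dependencies with `uv`:
```bash
# Sync the environment with the project's locked dependencies
uv sync
# Add a new dependency to the project
uv add package_name
# Also install the "gpu" dependency group
uv sync --group gpu
# Also install the "webarena" optional extra
uv sync --extra webarena
```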
Lint and format with Ruff:
```bash
# Lint the whole project
ruff check .
# Auto-format the whole project
ruff format .
```
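All configuration is centralized in `rl_web_agent/conf/config.yaml` using Hydra's single-config approach. Any value can be overridden from the command line with a dotted `key=value` path:
```bash
# Show the browser window
python -m rl_web_agent.main environment.browser.launch_options.headless=false
# Point the proxy at a different server
python -m rl_web_agent.main environment.proxy.server=http://localhost:9090
# Map a site name to host:port (Hydra requires a leading + if the key is not already in config.yaml)
python -m rl_web_agent.main environment.sites.custom_site=example.com:8080
```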
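Each override addresses a key in the nested YAML tree by its dotted path. The sketch below is a deliberately simplified model of that mapping. It is not Hydra's override parser: it handles only `a.b.c=value` with booleans, integers, and strings, and ignores Hydra's prefixes, lists, and quoting. The config dict is a made-up stand-in for `config.yaml`. It does mirror one real Hydra rule: overriding a key that is not in the config is rejected unless the override is prefixed with `+`.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Minimal type coercion: true/false -> bool, integers -> int, else str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_override(cfg: dict[str, Any], override: str) -> None:
    """Apply one `a.b.c=value` override to a nested dict, in place."""
    path, raw = override.split("=", 1)  # split once: values may contain "="-free URLs with ":"
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node[key]  # direct access: a mistyped path raises KeyError
    if leaf not in node:
        # Hydra likewise refuses to create new keys without a leading "+"
        raise KeyError(f"{path!r} is not in the config; Hydra would need '+{path}=...'")
    node[leaf] = parse_value(raw)


# Made-up defaults standing in for config.yaml
cfg = {
    "environment": {
        "browser": {"launch_options": {"headless": True}},
        "proxy": {"enabled": True, "server": "http://localhost:8080"},
    }
}
apply_override(cfg, "environment.browser.launch_options.headless=false")
apply_override(cfg, "environment.proxy.server=http://localhost:9090")
```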
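Actions are passed to `env.step()` as JSON strings, each with an `"action"` field plus that action's parameters. The examples use top-level `await`, which works in Jupyter; in a plain script, call them from inside an `async def` function:
```python
# Click the element with semantic ID "login_button"
await env.step('{"action": "click", "target": "login_button"}')
# Type into the "username" field, then press Enter
await env.step('{"action": "type", "target": "username", "text": "john_doe", "enter": true}')
# Navigate the current tab
await env.step('{"action": "goto_url", "url": "https://example.com"}')
# Open a new tab at a URL, then switch tabs by ID
await env.step('{"action": "new_tab", "url": "https://example.com"}')
await env.step('{"action": "switch_tab", "tab_id": 1}')
```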
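Hand-writing JSON inside string literals is easy to get wrong (quoting, JSON `true` versus Python `True`). One option is a small helper that builds the same strings with `json.dumps`. `make_action` is hypothetical, not part of the project. The field names come from the examples above, and no further validation is assumed.

```python
import json
from typing import Any


def make_action(action: str, **fields: Any) -> str:
    """Serialize an action for env.step() (hypothetical helper).

    Python values are converted to JSON, so enter=True becomes "enter": true.
    """
    return json.dumps({"action": action, **fields})


click = make_action("click", target="login_button")
typed = make_action("type", target="username", text="john_doe", enter=True)
switch = make_action("switch_tab", tab_id=1)
```

With it, `await env.step(make_action("goto_url", url="https://example.com"))` sends the same payload as the literal string above.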
The current page state is read the same way:
```python
# Fetch the current observation of the active page
observation = await env.observation()
```
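Putting `step()` and `observation()` together, an episode is a sequence of awaited calls. The sketch assumes only the two coroutines shown above and nothing about what `step()` returns. `FakeEnv` is a stand-in written for illustration so that the loop runs without a browser. `run_episode` is also hypothetical.

```python
import asyncio
import json
from typing import Any


class FakeEnv:
    """Illustrative stand-in for WebAgentEnv: records actions, returns dummy observations."""

    def __init__(self) -> None:
        self.history: list[dict[str, Any]] = []

    async def step(self, action: str) -> None:
        # json.loads raises immediately on a malformed action string
        self.history.append(json.loads(action))

    async def observation(self) -> str:
        return f"page-state-{len(self.history)}"


async def run_episode(env: Any, actions: list[str]) -> list[Any]:
    """Observe, act, observe again; every environment call is awaited."""
    observations = [await env.observation()]
    for action in actions:
        await env.step(action)
        observations.append(await env.observation())
    return observations


env = FakeEnv()
obs = asyncio.run(run_episode(env, ['{"action": "click", "target": "login_button"}']))
```

Inside Jupyter an event loop is already running, so `asyncio.run()` fails there; use `obs = await run_episode(env, [...])` instead.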
Pages are processed in three stages:
1. **Initialization** (`initscript.js`) - Detects hover events and marks hoverable elements
2. **DOM Processing** (`parser.js`) - Strips the page down to essential interactive elements, assigns semantic IDs, and preserves form state
3. **Element Interaction** - Targets elements by semantic ID for reliability and auto-scrolls them into view
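The IDs in the action examples (`login_button`, `username`) suggest readable, label-derived identifiers. The real assignment logic lives in `parser.js` and is not shown here. The Python sketch below only illustrates one plausible scheme. `assign_semantic_ids`, the slug rule, and the numeric-suffix deduplication are all assumptions. The point it shows is that IDs must be unique per page to be usable as targets.

```python
import re


def assign_semantic_ids(labels: list[str]) -> list[str]:
    """Turn element labels into unique, readable IDs (illustrative scheme only).

    "Login button" -> "login_button"; repeats get numeric suffixes
    ("search", "search_2", ...); labels with no usable characters become "element".
    """
    used: set[str] = set()
    ids = []
    for label in labels:
        base = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_") or "element"
        candidate, n = base, 1
        # Keep bumping the suffix until the ID is unused, so uniqueness always holds
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        ids.append(candidate)
    return ids


ids = assign_semantic_ids(["Login button", "Username", "Search", "Search", "!!"])
```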
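Prompt templates are loaded by name from `rl_web_agent/prompts/` and filled in with `str.format`:
```python
from rl_web_agent.prompts import load_prompt

# Example values; in practice these come from the current task
task_objective = "Find the cheapest laptop and add it to the cart"
context_info = "Logged in as john_doe"
prompt = load_prompt("system_prompt").format(
    objective=task_objective,
    context=context_info,
)
```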
When working with this codebase:
1. **Start with Jupyter notebooks** in `notebooks/` for experimentation
2. **Use Hydra overrides** for configuration changes instead of editing YAML directly
3. **Follow fail-fast principles** - Let errors surface immediately; don't hide them
4. **Never truncate logs** - Always show full content for debugging
5. **Load prompts from files** - Never hardcode prompts in Python code
6. **Use async/await consistently** - All browser operations must be asynchronous
7. **Test with Playwright notebooks** - Iterate quickly in an interactive environment
8. **Leverage semantic IDs** - Use the DOM's semantic ID system for reliable element targeting
When adding new features:
1. Prototype in a Jupyter notebook first
2. Use direct key access (`dict["key"]`) rather than `.get()`, so a missing key raises `KeyError` immediately
3. Load any prompts from `rl_web_agent/prompts/` directory
4. Test with `python -m rl_web_agent.main` with appropriate config overrides
5. Update `config.yaml` if new configuration options are needed
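Rule 2 is about the following difference. `.get()` turns a typo into a silent `None` that fails much later, while direct indexing fails at the line with the bug. The config dict and key names here are made up for illustration. The `try` block exists only so that the example runs to completion; in project code the `KeyError` should simply propagate.

```python
config = {"browser": {"headless": True}}

# Lenient lookup: the typo "headles" silently yields None; the bug surfaces later, if ever.
lenient = config["browser"].get("headles")

# Strict lookup: the same typo raises KeyError at this exact line, naming the bad key.
try:
    config["browser"]["headles"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'headles'
```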
When debugging:
1. Enable visible browser: `environment.browser.headless=false`
2. Check full logs - never truncate message history
3. Let exceptions propagate to see the full error context
4. Use notebooks for interactive debugging sessions