WebServ RL Web Agent Development

You are working with an RL (Reinforcement Learning) web agent project that enables automated browser interactions through a WebArena-like environment. The project uses Playwright for browser automation, Hydra for configuration management, and includes a proxy system for host rewriting.

Core Architecture

Main Components

1. **WebAgentEnv** (`rl_web_agent/env.py`) - Main environment class managing browser sessions, page interactions, and step-based RL interface

2. **Configuration** - Single YAML config file (`rl_web_agent/conf/config.yaml`) managed by Hydra

3. **JavaScript Layer** - Browser-side scripts:

- `parser.js` - DOM parsing with semantic ID assignment

- `initscript.js` - Event detection and hover tracking

4. **Proxy System** (`proxy/proxy_client_aiohttp.py`) - HTTP proxy for host rewriting and API gateway communication

Design Patterns

**Async/Await** - All browser operations use Playwright's async API

**Semantic IDs** - DOM elements tagged with `data-semantic-id` for reliable interaction

**Action-Observation Loop** - JSON-based action interface with structured observations

**Shared Playwright Instance** - ClassVar pattern for efficient resource management
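
The shared-instance pattern can be sketched as follows. This is a minimal illustration, not the project's `WebAgentEnv`: `FakeBrowserDriver` and `SketchEnv` are hypothetical stand-ins, and the real class presumably stores a Playwright object in the class-level slot instead.

```python
import asyncio
from typing import ClassVar, Optional


class FakeBrowserDriver:
    """Hypothetical stand-in for an expensive resource such as a Playwright instance."""

    instances_started = 0

    async def start(self) -> "FakeBrowserDriver":
        FakeBrowserDriver.instances_started += 1
        return self


class SketchEnv:
    """Illustrative environment class (an assumption, not the real WebAgentEnv)."""

    # ClassVar slots live on the class, so every SketchEnv instance sees the same values.
    _driver: ClassVar[Optional[FakeBrowserDriver]] = None
    _lock: ClassVar[Optional[asyncio.Lock]] = None

    async def setup(self) -> FakeBrowserDriver:
        cls = type(self)
        if cls._lock is None:
            cls._lock = asyncio.Lock()
        # The lock keeps two environments from starting the driver at the same time.
        async with cls._lock:
            if cls._driver is None:
                cls._driver = await FakeBrowserDriver().start()
        return cls._driver


async def demo() -> int:
    """Set up three environments and report how many drivers were started."""
    envs = [SketchEnv() for _ in range(3)]
    drivers = await asyncio.gather(*(env.setup() for env in envs))
    assert all(driver is drivers[0] for driver in drivers)
    return FakeBrowserDriver.instances_started
```

Under these assumptions, `asyncio.run(demo())` starts the driver once no matter how many environments are created. In a notebook, `await demo()` would be used instead.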

Development Workflow

Running the Agent

```bash
# Basic execution with default config
python -m rl_web_agent.main

# Override configuration values
python -m rl_web_agent.main environment.browser.headless=true environment.proxy.enabled=false

# Development with Jupyter (primary workflow)
jupyter notebook  # Use notebooks/ directory
```

Dependency Management (UV)

```bash
# Install dependencies
uv sync

# Add new packages
uv add package_name

# GPU support (transformers, torch)
uv sync --group gpu

# WebArena support
uv sync --extra webarena
```

Code Quality

```bash
# Lint and format with ruff
ruff check .
ruff format .
```

Configuration System

All configuration is centralized in `rl_web_agent/conf/config.yaml` using Hydra's single config approach.

Key Sections

`environment.browser` - Playwright launch and context options

`environment.proxy` - Proxy server settings

`environment.sites` - Site name to hostname mappings for WebArena

`hydra.run.dir` - Output directory for logs

Override Examples

```bash
# Disable headless mode
python -m rl_web_agent.main environment.browser.launch_options.headless=false

# Change proxy settings
python -m rl_web_agent.main environment.proxy.server=http://localhost:9090

# Add a site mapping (the + prefix adds a key that is not already in the config)
python -m rl_web_agent.main +environment.sites.custom_site=example.com:8080
```
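
To make the dotted-path syntax concrete, the sketch below applies a single `key.path=value` override to a nested dictionary. It is a simplified illustration of the idea only, not Hydra's implementation: `apply_override` is a hypothetical helper, the base values are made up, and Hydra's real override grammar handles many more value types.

```python
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one `dotted.key=value` override applied."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    updated = copy.deepcopy(config)
    node = updated
    for name in parents:
        node = node[name]  # direct access: a misspelled path raises KeyError
    # Only booleans are converted in this sketch; everything else stays a string.
    if raw_value in ("true", "false"):
        node[leaf] = raw_value == "true"
    else:
        node[leaf] = raw_value
    return updated


# Hypothetical base config; the real values live in rl_web_agent/conf/config.yaml.
base_config = {"environment": {"proxy": {"enabled": True, "server": "http://localhost:8080"}}}
overridden = apply_override(base_config, "environment.proxy.server=http://localhost:9090")
```

The override returns a new dictionary and leaves `base_config` untouched, which mirrors how command-line overrides change a single run without editing the YAML file.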

WebAgentEnv API

Action Format

Actions are JSON strings:

```python
# Click action
await env.step('{"action": "click", "target": "login_button"}')

# Text input with optional enter
await env.step('{"action": "type", "target": "username", "text": "john_doe", "enter": true}')

# Navigation
await env.step('{"action": "goto_url", "url": "https://example.com"}')

# Tab management
await env.step('{"action": "new_tab", "url": "https://example.com"}')
await env.step('{"action": "switch_tab", "tab_id": 1}')
```
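
A caller-side check of these action strings might look like the sketch below. `parse_action` and `REQUIRED_FIELDS` are hypothetical names, and the required fields are inferred from the examples above rather than taken from `env.py`, so treat the table as an assumption.

```python
import json

# Required fields per action type, inferred from the documented examples (assumption).
REQUIRED_FIELDS = {
    "click": ("target",),
    "type": ("target", "text"),
    "goto_url": ("url",),
    "new_tab": ("url",),
    "switch_tab": ("tab_id",),
}


def parse_action(action_json: str) -> tuple[str, dict]:
    """Decode a JSON action string into (action_name, arguments)."""
    payload = json.loads(action_json)
    name = payload["action"]          # KeyError if the action name is missing
    required = REQUIRED_FIELDS[name]  # KeyError for an unknown action type
    args = {field: payload[field] for field in required}
    # "enter" is documented as optional for the type action.
    if name == "type" and "enter" in payload:
        args["enter"] = payload["enter"]
    return name, args
```

Missing fields and unknown action names raise `KeyError` immediately, in line with the fail-fast rules later in this document.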

Observation Structure

```python
observation = await env.observation()

# Returns:
# {
#     "html": "...",  # Processed DOM with semantic IDs
#     "clickable_elements": ["button1", "link2", ...],
#     "input_elements": [{"id": "username", "type": "text", "value": "", ...}],
#     "tabs": [{"id": 0, "title": "Page Title", "url": "...", "is_active": True}]
# }
```
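
As an illustration of consuming this structure, the sketch below picks out the active tab and the inputs that are still empty. The helper names and `sample_observation` are hypothetical; only the keys shown in the observation above are assumed to exist.

```python
def active_tab(observation: dict) -> dict:
    """Return the one tab whose is_active flag is set; fail if there is not exactly one."""
    (tab,) = [t for t in observation["tabs"] if t["is_active"]]
    return tab


def empty_inputs(observation: dict) -> list[str]:
    """Return the IDs of input elements that currently hold no value."""
    return [el["id"] for el in observation["input_elements"] if el["value"] == ""]


# Hand-written sample that mirrors the documented shape (not real environment output).
sample_observation = {
    "html": "...",
    "clickable_elements": ["button1", "link2"],
    "input_elements": [
        {"id": "username", "type": "text", "value": ""},
        {"id": "search", "type": "text", "value": "shoes"},
    ],
    "tabs": [
        {"id": 0, "title": "Page Title", "url": "...", "is_active": True},
        {"id": 1, "title": "Other", "url": "...", "is_active": False},
    ],
}
```

The tuple unpacking in `active_tab` raises `ValueError` when zero or several tabs claim to be active, so a malformed observation is noticed at once.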

Browser Automation Pipeline

1. **Initialization** (`initscript.js`) - Detects hover events, marks hoverable elements

2. **DOM Processing** (`parser.js`) - Strips to essential interactive elements, assigns semantic IDs, preserves form state

3. **Element Interaction** - Uses semantic IDs for reliable targeting, auto-scrolls elements into view
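
Targeting by semantic ID comes down to an attribute selector on `data-semantic-id`. The sketch below builds that selector. `semantic_selector` is a hypothetical helper, and how `env.py` actually resolves targets is not shown in this document.

```python
def semantic_selector(semantic_id: str) -> str:
    """Build a CSS attribute selector for an element tagged with data-semantic-id."""
    # Escape backslashes and double quotes so the ID stays inside the quoted CSS string.
    escaped = semantic_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'[data-semantic-id="{escaped}"]'


login_selector = semantic_selector("login_button")
```

With Playwright's async API, a selector like this could be passed to `page.locator(...)`, followed by `await locator.scroll_into_view_if_needed()` and `await locator.click()`. Whether the project uses exactly these calls is an assumption.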

Critical Development Rules

Code Style

Use async/await for ALL browser operations

Follow PEP 8 with 4-space indentation

Use `pathlib.Path` instead of `os.path`

**NEVER use dict.get()** - Always use direct key access (`dict["key"]`) to fail fast

**No graceful error handling for missing keys** - Let KeyErrors expose bugs early

Error Handling (FAIL FAST)

**NO try/except blocks** unless absolutely necessary (e.g., LLM API retries for rate limits)

**NEVER use getattr() or dict.get()** - Direct key access only

Let KeyErrors and AttributeErrors happen to catch bugs early

Use structured logging with semantic context
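
The contrast between lenient and fail-fast access, and the one sanctioned use of `try`/`except`, can be sketched as below. All names are hypothetical: `RateLimitError` stands in for whatever exception the LLM client raises, and the config keys are made up for the example.

```python
import time


def proxy_server_lenient(config: dict) -> str:
    # Discouraged: the misspelled key silently falls back to a default.
    return config.get("servr", "http://localhost:8080")


def proxy_server_strict(config: dict) -> str:
    # Preferred: a missing or misspelled key raises KeyError right here.
    return config["server"]


class RateLimitError(Exception):
    """Hypothetical stand-in for an LLM client's rate-limit exception."""


def call_with_retries(call, attempts: int = 3, delay_seconds: float = 0.0):
    """Retry only on rate limits; every other exception propagates untouched."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(delay_seconds)
```

The lenient version hides the typo and returns a plausible default, which is exactly the kind of bug the rules above are meant to expose.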

Logging Rules (NO TRUNCATION)

**NEVER truncate conversation history or messages** - No `[:200]` or similar truncation

**NEVER limit message history** - Log all messages, not just recent (avoid `[-4:]` slicing)

**ALWAYS show full content** - Complete debugging information required

**NO shortcuts** - Show full conversation history, not "last N messages"
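
One way to follow these rules is to serialize the whole message list on every log call. The sketch below uses the standard `logging` module. The logger name and helper functions are hypothetical, and the message shape (`role`/`content` dictionaries) is an assumption.

```python
import json
import logging

logger = logging.getLogger("rl_web_agent.sketch")  # hypothetical logger name


def format_history(messages: list[dict]) -> str:
    """Serialize every message in full: no slicing of the list, no clipping of content."""
    return json.dumps(messages, indent=2, ensure_ascii=False)


def log_history(messages: list[dict], step: int) -> None:
    """Log the complete conversation history together with the step it belongs to."""
    logger.info(
        "conversation history at step %d (%d messages):\n%s",
        step,
        len(messages),
        format_history(messages),
    )
```

Passing `step` and the message count as separate arguments keeps the semantic context next to the full payload instead of replacing it.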

Prompt Management Rules

**NEVER hardcode prompts in code** - Always load from `.txt` files in `rl_web_agent/prompts/`

**Use load_prompt() function** - Import from `rl_web_agent.prompts`

**Descriptive filenames** - e.g., `fuzzy_match_evaluator.txt`, `system_prompt.txt`

**Use format strings** - Placeholders like `{objective}`, `{question}`, `{reference}`

Example:

```python
from rl_web_agent.prompts import load_prompt

prompt = load_prompt("system_prompt").format(
    objective=task_objective,
    context=context_info,
)
```
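
The body of `load_prompt` is not shown in this document. A minimal sketch of how such a loader could work is given below, assuming prompts are plain UTF-8 `.txt` files named after the prompt. `load_prompt_from` is a hypothetical helper that takes the directory explicitly so the example can run against a temporary folder.

```python
import tempfile
from pathlib import Path


def load_prompt_from(prompts_dir: Path, name: str) -> str:
    """Read `<name>.txt` from prompts_dir; a missing file raises FileNotFoundError."""
    return (prompts_dir / f"{name}.txt").read_text(encoding="utf-8")


def demo() -> str:
    """Write a throwaway prompt file, load it, and fill in its placeholders."""
    with tempfile.TemporaryDirectory() as tmp:
        prompts_dir = Path(tmp)
        (prompts_dir / "system_prompt.txt").write_text(
            "Objective: {objective}\nContext: {context}\n", encoding="utf-8"
        )
        template = load_prompt_from(prompts_dir, "system_prompt")
    return template.format(objective="Log in", context="Login page is open")
```

Because `str.format` raises `KeyError` for a placeholder with no matching argument, a prompt file and its call site cannot drift apart silently.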

Third-Party Integration

WebArena (`thirdparty/webarena/`)

Editable dependency for web-based RL environments

Task configurations define start URLs, evaluation criteria, required actions

VERL (`thirdparty/verl/`)

Reinforcement learning workflows

GPU dependency group includes ML packages

Supports distributed training with Ray

Testing Approach

**Primary development via Jupyter notebooks** in `notebooks/` directory

**No formal test suite** - notebooks serve as interactive testing

Use `notebooks/playwright-test.ipynb` for browser automation experiments

Performance Considerations

Shared Playwright instance across multiple environments

Image blocking enabled by default for faster page loads

Browser args optimized for automation (disable extensions, autofill, etc.)

Instructions

When working with this codebase:

1. **Start with Jupyter notebooks** in `notebooks/` for experimentation

2. **Use Hydra overrides** for configuration changes instead of editing YAML directly

3. **Follow fail-fast principles** - Let errors surface immediately, don't hide them

4. **Never truncate logs** - Always show full content for debugging

5. **Load prompts from files** - Never hardcode prompts in Python code

6. **Use async/await consistently** - All browser operations must be asynchronous

7. **Test with Playwright notebooks** - Iterate quickly in interactive environment

8. **Leverage semantic IDs** - Use the DOM's semantic ID system for reliable element targeting

When adding new features:

1. Prototype in a Jupyter notebook first

2. Use direct key access (`dict["key"]`) to catch bugs early

3. Load any prompts from `rl_web_agent/prompts/` directory

4. Test with `python -m rl_web_agent.main` with appropriate config overrides

5. Update `config.yaml` if new configuration options are needed

When debugging:

1. Enable visible browser: `environment.browser.headless=false`

2. Check full logs - never truncate message history

3. Let exceptions propagate to see the full error context

4. Use notebooks for interactive debugging sessions
