Expert guidance for building AI agents with the Scrapybara TypeScript SDK. Scrapybara provides remote desktop instances (Ubuntu, Browser, Windows) that AI agents can control through bash, computer use, file editing, and browser automation tools.
You are working with Scrapybara, a TypeScript SDK for deploying and managing remote desktop instances for AI agents. This skill teaches you to properly interact with the SDK for building computer use agents.
Always start by importing and initializing the Scrapybara client:
```typescript
import { ScrapybaraClient } from "scrapybara";
const client = new ScrapybaraClient({ apiKey: "YOUR_API_KEY" });
```
Choose the appropriate instance type based on your automation needs:
1. **Ubuntu Instance** - Full Linux environment with bash, computer use, file editing, and browser support:
```typescript
const ubuntuInstance = await client.startUbuntu({ timeoutHours: 1 });
```
2. **Browser Instance** - Lightweight browser-only environment:
```typescript
const browserInstance = await client.startBrowser();
```
3. **Windows Instance** - Windows desktop environment:
```typescript
const windowsInstance = await client.startWindows();
```
Manage instance lifecycle to optimize costs:
```typescript
await instance.pause(); // Pause to save resources
await instance.resume({ timeoutHours: 1 }); // Resume work
await instance.stop(); // Terminate and clean up
```
**Critical**: Always stop instances after use to prevent unnecessary billing. Instances auto-terminate after 1 hour by default.
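One way to make the "always stop" rule hard to forget is to wrap the start/stop pair in a helper. The sketch below is a minimal illustration, not part of the SDK: `withInstance` and the `Stoppable` interface are hypothetical names, and the only assumption about an instance is that it exposes an async `stop()` method as shown above.

```typescript
// Local structural type: anything with an async stop() method.
// This is a sketch for illustration, not a type exported by the SDK.
interface Stoppable {
  stop(): Promise<unknown>;
}

// Hypothetical helper: starts an instance, runs the task, and always
// stops the instance afterwards, even when the task throws.
async function withInstance<I extends Stoppable, R>(
  start: () => Promise<I>,
  task: (instance: I) => Promise<R>,
): Promise<R> {
  const instance = await start();
  try {
    return await task(instance);
  } finally {
    await instance.stop();
  }
}
```

With the real client this could be called as `withInstance(() => client.startUbuntu({ timeoutHours: 1 }), async (instance) => { /* work */ })`, so cleanup no longer depends on every code path remembering to call `stop()`.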
Import types based on your needs:
```typescript
// Core types
import { ScrapybaraClient, UbuntuInstance, BrowserInstance, WindowsInstance } from "scrapybara";
// Tool types
import { bashTool, computerTool, editTool } from "scrapybara/tools";
// Model types
import { anthropic } from "scrapybara/anthropic";
// Schema validation (used for structured output)
import { z } from "zod";
// Error types
import { ApiError } from "scrapybara/core";
// Request/Response types (namespace)
import { Scrapybara } from "scrapybara";
```
Capture screenshots to understand the current state:
```typescript
// Await the call first, then read the field from the response
const { base64Image } = await instance.screenshot();
```
Execute bash commands for system operations:
```typescript
await instance.bash({ command: "ls -la" });
```
Perform mouse and keyboard actions:
```typescript
// Mouse movement
await instance.computer({ action: "move_mouse", coordinates: [x, y] });

// Click actions (left, right, middle)
await instance.computer({
  action: "click_mouse",
  button: "right",
  coordinates: [x, y]
});

// Drag operations
await instance.computer({
  action: "drag_mouse",
  path: [[x1, y1], [x2, y2]]
});

// Scrolling
await instance.computer({
  action: "scroll",
  coordinates: [x, y],
  delta_x: 0,
  delta_y: -100
});

// Keyboard input
await instance.computer({ action: "press_key", keys: ["ctrl", "c"] });
await instance.computer({ action: "type_text", text: "Hello world" });

// Wait for UI updates
await instance.computer({ action: "wait", duration: 3 });

// Get cursor position
await instance.computer({ action: "get_cursor_position" });
```
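Multi-step interactions are easier to review when the action payloads are built as data first and sent afterwards. The sketch below assumes only the request shapes shown above; `ComputerAction`, `replaceFieldText`, and `runActions` are hypothetical local names, and the union is a trimmed approximation rather than the SDK's own request type.

```typescript
// Local sketch of some request shapes used above; not the SDK's exported types.
type Point = [number, number];

type ComputerAction =
  | { action: "move_mouse"; coordinates: Point }
  | { action: "click_mouse"; button: "left" | "right" | "middle"; coordinates: Point }
  | { action: "press_key"; keys: string[] }
  | { action: "type_text"; text: string }
  | { action: "wait"; duration: number };

// Hypothetical helper: click a field, select its contents, and type new text.
function replaceFieldText(at: Point, text: string): ComputerAction[] {
  return [
    { action: "click_mouse", button: "left", coordinates: at },
    { action: "press_key", keys: ["ctrl", "a"] },
    { action: "type_text", text },
  ];
}

// Sends the actions one at a time, in order, through the supplied function.
async function runActions(
  computer: (request: ComputerAction) => Promise<unknown>,
  actions: ComputerAction[],
): Promise<void> {
  for (const request of actions) {
    await computer(request);
  }
}
```

A call might look like `runActions((req) => instance.computer(req), replaceFieldText([400, 300], "hello"))`, assuming the SDK's request type accepts these shapes.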
Read and write files on the remote instance:
```typescript
// Write file
await instance.file.write({
  path: "/tmp/data.txt",
  content: "file content"
});

// Read file (await the call first, then read `content` from the response)
const { content } = await instance.file.read({
  path: "/tmp/data.txt"
});
```
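Because file contents travel as strings, structured data needs a serialization step on each side. The following sketch layers JSON helpers over the `write`/`read` shapes shown above; `RemoteFiles`, `writeJson`, and `readJson` are hypothetical names, and the interface is a local approximation of `instance.file`, not an SDK type.

```typescript
// Local structural sketch of the `instance.file` calls shown above.
interface RemoteFiles {
  write(request: { path: string; content: string }): Promise<unknown>;
  read(request: { path: string }): Promise<{ content: string }>;
}

// Serializes a value to pretty-printed JSON and writes it to the given path.
async function writeJson(files: RemoteFiles, path: string, value: unknown): Promise<void> {
  await files.write({ path, content: JSON.stringify(value, null, 2) });
}

// Reads the file and parses it as JSON. The cast is unchecked, so validate
// the result if the file may have been modified by the agent.
async function readJson<T>(files: RemoteFiles, path: string): Promise<T> {
  const { content } = await files.read({ path });
  return JSON.parse(content) as T;
}
```

Usage would be along the lines of `await writeJson(instance.file, "/tmp/config.json", { retries: 3 })`.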
The Act SDK is the **primary focus** for building computer use agents. It provides a unified interface combining models, tools, and structured workflows.
1. **Model**: LLM integration (currently Anthropic Claude)
```typescript
import { anthropic } from "scrapybara/anthropic";
// Use Scrapybara's API key
const model = anthropic();
// Or use your own Anthropic API key
const modelWithOwnKey = anthropic({ apiKey: "YOUR_ANTHROPIC_KEY" });
```
2. **Tools**: Interface for computer interactions
```typescript
import { bashTool, computerTool, editTool } from "scrapybara/tools";
const tools = [
  bashTool(instance),     // Shell command execution
  computerTool(instance), // Mouse/keyboard control
  editTool(instance),     // File editing
];
```
3. **Prompts**: Task instructions
```typescript
import { UBUNTU_SYSTEM_PROMPT, BROWSER_SYSTEM_PROMPT, WINDOWS_SYSTEM_PROMPT } from "scrapybara/prompts";
// Option 1: Simple prompt
const response = await client.act({
  model: anthropic(),
  tools,
  system: UBUNTU_SYSTEM_PROMPT,
  prompt: "Your task description here"
});

// Option 2: Multi-turn conversation with messages array
const conversation = await client.act({
  model: anthropic(),
  tools,
  system: UBUNTU_SYSTEM_PROMPT,
  messages: [
    { role: "user", content: [{ type: "text", text: "First task" }] },
    { role: "assistant", content: [{ type: "text", text: "Response" }] },
    { role: "user", content: [{ type: "text", text: "Follow-up task" }] }
  ]
});
```
**Important**: Only include either `prompt` OR `messages`, not both.
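The either/or rule can be encoded in your own wrapper types so the compiler rejects requests that carry both fields. This is a sketch under assumptions: `ActInput`, `TextMessage`, and `assertActInput` are hypothetical names defined here, the message shape is reduced to text parts, and nothing below reflects how the SDK itself validates input.

```typescript
// Local sketch of a message shape, reduced to text parts for brevity.
type TextMessage = {
  role: "user" | "assistant";
  content: { type: "text"; text: string }[];
};

// Exactly one of `prompt` or `messages` may be present.
type ActInput =
  | { prompt: string; messages?: never }
  | { messages: TextMessage[]; prompt?: never };

// Hypothetical runtime guard for inputs that arrive untyped (e.g. parsed JSON).
function assertActInput(input: { prompt?: unknown; messages?: unknown }): void {
  const hasPrompt = input.prompt !== undefined;
  const hasMessages = input.messages !== undefined;
  if (hasPrompt === hasMessages) {
    throw new Error("Provide exactly one of `prompt` or `messages`.");
  }
}

// Compiles: only `prompt` is set. Adding `messages` here would be a type error.
const single: ActInput = { prompt: "Open the settings page" };
assertActInput(single);
```

The type catches mistakes at compile time, while the runtime guard covers configuration loaded from files or user input.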
Monitor agent progress in real-time using the `onStep` callback:
```typescript
const handleStep = (step: Step) => {
  console.log(`Agent output: ${step.text}`);

  if (step.toolCalls) {
    for (const call of step.toolCalls) {
      console.log(`Calling tool: ${call.toolName}`);
      console.log(`Arguments: ${JSON.stringify(call.args)}`);
    }
  }

  if (step.toolResults) {
    for (const result of step.toolResults) {
      console.log(`Tool result: ${result.result}`);
      if (result.isError) {
        console.error(`Tool error occurred`);
      }
    }
  }

  console.log(`Tokens used: ${step.usage?.totalTokens ?? 'N/A'}`);
};

const { messages, steps, text, output, usage } = await client.act({
  model: anthropic(),
  tools,
  system: UBUNTU_SYSTEM_PROMPT,
  prompt: "Your task",
  onStep: handleStep
});
```
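Beyond logging, the same callback can accumulate statistics for a run. The sketch below is illustrative only: `createStepTracker` and `StepLike` are hypothetical names, and `StepLike` mirrors just the step fields used in the callback above rather than the SDK's full step type.

```typescript
// Local structural sketch of the step fields used in the callback above.
interface StepLike {
  text: string;
  toolCalls?: { toolName: string; args: unknown }[];
  toolResults?: { result: unknown; isError?: boolean }[];
}

// Hypothetical factory: returns a callback plus the stats it accumulates.
function createStepTracker() {
  const stats = {
    steps: 0,
    errors: 0,
    callsByTool: new Map<string, number>(),
  };

  const onStep = (step: StepLike): void => {
    stats.steps += 1;
    for (const call of step.toolCalls ?? []) {
      const previous = stats.callsByTool.get(call.toolName) ?? 0;
      stats.callsByTool.set(call.toolName, previous + 1);
    }
    for (const result of step.toolResults ?? []) {
      if (result.isError) stats.errors += 1;
    }
  };

  return { onStep, stats };
}
```

Passing `tracker.onStep` as the `onStep` option and reading `tracker.stats` after the run gives a quick view of which tools the agent relied on and how often they failed.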
The Act SDK uses structured messages with typed content:
**Content Types:**
1. **TextPart** - Simple text content:
```typescript
{ type: "text", text: "content" }
```
2. **ImagePart** - Base64 or URL images:
```typescript
{ type: "image", image: "base64...", mimeType: "image/png" }
```
3. **ReasoningPart** - Model reasoning (extended thinking):
```typescript
{
  type: "reasoning",
  id: "id",
  reasoning: "reasoning process",
  signature: "signature",
  instructions: "instructions"
}
```
4. **ToolCallPart** - Tool invocations:
```typescript
{
  type: "tool-call",
  toolCallId: "unique-id",
  toolName: "bash",
  args: { command: "ls -la" }
}
```
5. **ToolResultPart** - Tool execution results:
```typescript
{
  type: "tool-result",
  toolCallId: "unique-id",
  toolName: "bash",
  result: "file1.txt\nfile2.txt",
  isError: false
}
```
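Since every part carries a `type` tag, a discriminated union lets TypeScript narrow each part safely when walking a message history. The sketch below assumes only the part shapes listed above, with the reasoning part trimmed; `Part`, `MessageLike`, `collectText`, and `failedToolNames` are hypothetical local names, not SDK exports.

```typescript
// Local sketch of the part shapes listed above (reasoning part trimmed).
type Part =
  | { type: "text"; text: string }
  | { type: "image"; image: string; mimeType?: string }
  | { type: "reasoning"; reasoning: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool-result"; toolCallId: string; toolName: string; result: unknown; isError?: boolean };

interface MessageLike {
  role: string;
  content: Part[];
}

// Joins all text parts in order, one per line.
function collectText(messages: MessageLike[]): string {
  const texts: string[] = [];
  for (const message of messages) {
    for (const part of message.content) {
      if (part.type === "text") texts.push(part.text);
    }
  }
  return texts.join("\n");
}

// Lists the tool names whose results were flagged as errors.
function failedToolNames(messages: MessageLike[]): string[] {
  const names: string[] = [];
  for (const message of messages) {
    for (const part of message.content) {
      if (part.type === "tool-result" && part.isError) names.push(part.toolName);
    }
  }
  return names;
}
```

Checking `part.type` first is what allows access to `part.text` or `part.toolName` without casts.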
Define expected output structure using Zod schemas for type-safe results:
```typescript
import { z } from "zod";
const schema = z.object({
  posts: z.array(z.object({
    title: z.string(),
    url: z.string(),
    points: z.number(),
  })),
});

const { output } = await client.act({
  model: anthropic(),
  tools,
  schema, // Validates and types the output
  system: UBUNTU_SYSTEM_PROMPT,
  prompt: "Get the top 10 posts on Hacker News",
});

// output.posts is now fully typed
const posts = output.posts; // TypeScript knows this is an array of posts
```
The `output` field contains the validated, typed data returned by the model matching your schema.
Monitor token consumption through `TokenUsage` objects:
```typescript
const { usage, steps } = await client.act({
  model: anthropic(),
  tools,
  system: UBUNTU_SYSTEM_PROMPT,
  prompt: "Your task"
});

// Overall usage
console.log(`Total tokens: ${usage.totalTokens}`);
console.log(`Prompt tokens: ${usage.promptTokens}`);
console.log(`Completion tokens: ${usage.completionTokens}`);

// Per-step usage
for (const step of steps) {
  console.log(`Step tokens: ${step.usage?.totalTokens}`);
}
```
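When several runs or partial step lists need to be combined, per-step usage can be summed by hand. The sketch below assumes only the three usage fields printed above and treats a missing `usage` as zero; `UsageLike` and `sumUsage` are hypothetical local names.

```typescript
// Local sketch of the usage fields printed above.
interface UsageLike {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Sums usage across steps, skipping steps that report no usage.
function sumUsage(steps: { usage?: UsageLike }[]): UsageLike {
  const total: UsageLike = { promptTokens: 0, completionTokens: 0, totalTokens: 0 };
  for (const step of steps) {
    if (!step.usage) continue;
    total.promptTokens += step.usage.promptTokens;
    total.completionTokens += step.usage.completionTokens;
    total.totalTokens += step.usage.totalTokens;
  }
  return total;
}
```

For example, `sumUsage([...firstRun.steps, ...secondRun.steps])` would give a combined figure for two consecutive `act` calls, assuming each step exposes `usage` in this shape.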
```typescript
import { ScrapybaraClient } from "scrapybara";
import { anthropic } from "scrapybara/anthropic";
import { UBUNTU_SYSTEM_PROMPT } from "scrapybara/prompts";
import { bashTool, computerTool, editTool } from "scrapybara/tools";
async function scrapeYCombinator() {
  const client = new ScrapybaraClient();
  const instance = await client.startUbuntu();

  try {
    // Start browser on the instance
    await instance.browser.start();

    const { messages, steps, text, output, usage } = await client.act({
      model: anthropic(),
      tools: [
        bashTool(instance),
        computerTool(instance),
        editTool(instance),
      ],
      system: UBUNTU_SYSTEM_PROMPT,
      prompt: "Go to the YC website and fetch the HTML",
      // Log the step's text; interpolating the step object itself prints "[object Object]"
      onStep: (step) => console.log(`${step.text}\n`),
    });

    console.log(`Task completed. Total tokens: ${usage.totalTokens}`);
    await instance.browser.stop();
  } finally {
    // Clean up the instance even if the task above throws
    await instance.stop();
  }
}
```
Save and reuse browser authentication state:
```typescript
const instance = await client.startUbuntu();
// Start browser and get CDP URL
const { cdpUrl } = await instance.browser.start();
// Perform login, then save auth state
const { authStateId } = await instance.browser.saveAuth({
  name: "default"
});
// Later sessions can reuse the auth
await instance.browser.authenticate({ authStateId });
```
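The save-then-reuse flow can be folded into one function that decides which branch to take. The sketch below assumes only the `saveAuth` and `authenticate` shapes shown above; `AuthBrowser` and `ensureAuth` are hypothetical names, and where the saved ID is persisted between sessions (a file, a database, an environment variable) is left to the caller.

```typescript
// Local structural sketch of the two browser auth calls shown above.
interface AuthBrowser {
  saveAuth(request: { name: string }): Promise<{ authStateId: string }>;
  authenticate(request: { authStateId: string }): Promise<unknown>;
}

// Hypothetical helper: reuse a saved auth state when one exists,
// otherwise run the login routine and save a new state.
async function ensureAuth(
  browser: AuthBrowser,
  savedId: string | undefined,
  login: () => Promise<void>,
  name = "default",
): Promise<string> {
  if (savedId) {
    await browser.authenticate({ authStateId: savedId });
    return savedId;
  }
  await login();
  const { authStateId } = await browser.saveAuth({ name });
  return authStateId;
}
```

The returned ID is the value to persist, so the next session can pass it back as `savedId` and skip the login routine.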
Manage environment variables on the remote instance:
```typescript
// Set variables
await instance.env.set({
  API_KEY: "secret-key",
  DATABASE_URL: "postgres://..."
});
// Get all variables
const { variables } = await instance.env.get();
// Delete variables
await instance.env.delete(["API_KEY"]);
```
Always wrap SDK calls in try-catch blocks:
```typescript
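import { UbuntuInstance } from "scrapybara";
import { ApiError } from "scrapybara/core";

let instance: UbuntuInstance | undefined;
try {
  instance = await client.startUbuntu();
  // ... perform operations
} catch (e) {
  if (e instanceof ApiError) {
    console.error(`API Error ${e.statusCode}: ${JSON.stringify(e.body)}`);
  } else {
    console.error(`Unexpected error: ${e}`);
  }
} finally {
  // Stop the instance even if an operation above threw
  await instance?.stop();
}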
```
1. **Always stop instances** after use to prevent unnecessary billing
2. **Use async/await** for all operations as they are asynchronous
3. **Handle errors** with try-catch blocks around all SDK calls
4. **Customize instance timeouts** when needed (instances auto-terminate after 1 hour by default):
```typescript
await client.startUbuntu({ timeoutHours: 2 });
```
5. **Start browser before browserTool usage** on Ubuntu instances:
```typescript
await instance.browser.start();
// Now use browserTool
```
6. **Prefer bash commands over GUI** interactions for launching applications (faster and more reliable)
7. **Use appropriate system prompts**: `UBUNTU_SYSTEM_PROMPT`, `BROWSER_SYSTEM_PROMPT`, or `WINDOWS_SYSTEM_PROMPT` based on instance type
```typescript
const { messages, steps, text, output, usage } = await client.act({
  model: anthropic(),
  tools: [bashTool(instance), computerTool(instance), editTool(instance)],
  system: UBUNTU_SYSTEM_PROMPT,
  prompt: "Task description"
});
```
```typescript
const schema = z.object({
  data: z.array(z.object({
    title: z.string(),
    link: z.string()
  }))
});

const { output } = await client.act({
  model: anthropic(),
  tools,
  schema,
  system: UBUNTU_SYSTEM_PROMPT,
  prompt: "Collect all article titles and links from the homepage"
});
```
```typescript
const instance = await client.startUbuntu();
await instance.browser.start();
// First time: login and save
const { authStateId } = await instance.browser.saveAuth({ name: "site-auth" });
// Subsequent runs: reuse auth
await instance.browser.authenticate({ authStateId });
```