Connect to LLM providers (OpenAI, Anthropic, etc.) and trace API calls. Includes vision utilities for querying images and detecting UI elements via bounding boxes.
This skill helps you integrate LLM APIs into your application with built-in tracing capabilities, along with vision utilities for querying images and locating UI elements. To get started:
1. Install the package:
```bash
npm install @empiricalrun/llm
```
2. Import and initialize the LLM client:
```typescript
import { LLM } from "@empiricalrun/llm";
const llm = new LLM({
  provider: "openai",
  defaultModel: "gpt-4o",
});
```
3. Make a chat completion request:
```typescript
const llmResponse = await llm.createChatCompletion({
  messages: [
    { role: "user", content: "Your prompt here" },
  ],
});
```
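The response object's shape isn't documented above; assuming the client mirrors the OpenAI chat completion shape (an assumption, not confirmed by the package docs), the reply text could be read with a small helper like this:

```typescript
// Hypothetical response shape, assuming an OpenAI-compatible client.
type ChatResponse = { choices: { message: { content: string } }[] };

// Extract the assistant's reply text, defaulting to "" if absent.
function firstMessageText(response: ChatResponse): string {
  return response.choices[0]?.message?.content ?? "";
}
```

Check the package's type definitions for the actual response type before relying on this shape.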
Use the `query` vision utility to ask questions about an image and get text answers.
```typescript
import { query } from "@empiricalrun/llm/vision";
// Example with Appium screenshot (base64 encoded)
const data = await driver.saveScreenshot("dummy.png");
const instruction = "Extract number of ATOM tokens from the image. Return only the number.";
const text = await query(data.toString("base64"), instruction);
// Returns: "0.01"
```
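Since `query` returns plain text, a small helper (hypothetical, not part of the package) can turn answers like `"0.01"` into numbers defensively:

```typescript
// Parse a numeric answer returned as text (e.g. "0.01" or "1,234"),
// stripping commas and surrounding whitespace; throws on non-numeric text.
function parseNumericAnswer(text: string): number {
  const n = Number(text.trim().replace(/,/g, ""));
  if (Number.isNaN(n)) throw new Error(`Not a number: ${text}`);
  return n;
}
```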
Use `getBoundingBox` to get coordinates for UI elements described in natural language.
```typescript
import { getBoundingBox } from "@empiricalrun/llm/vision/bbox";
const data = await driver.saveScreenshot("dummy.png");
const instruction = "This screenshot shows a screen to send crypto tokens. What is the bounding box for the dropdown to select the token?";
const bbox = await getBoundingBox(data.toString("base64"), instruction);
const centerToTap = bbox.center; // { x: 342, y: 450 }
```
**Important:** Coordinates are relative to the image dimensions. Scale them to your target system's coordinates before using (e.g., for Appium tap actions).
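A minimal sketch of that scaling step (the helper name and parameters are illustrative): given the screenshot's pixel dimensions and the device's logical size (e.g. from `driver.getWindowSize()`), map the center point:

```typescript
interface Point { x: number; y: number; }

// Map a point from screenshot pixel space to device coordinate space.
// imageWidth/imageHeight are the screenshot's pixel dimensions;
// deviceWidth/deviceHeight are the device's logical dimensions.
function toDeviceCoords(
  center: Point,
  imageWidth: number,
  imageHeight: number,
  deviceWidth: number,
  deviceHeight: number
): Point {
  return {
    x: Math.round(center.x * (deviceWidth / imageWidth)),
    y: Math.round(center.y * (deviceHeight / imageHeight)),
  };
}
```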
Iterate on bounding box detection with visual feedback:
```typescript
const bbox = await getBoundingBox(data.toString("base64"), instruction, {
  debug: true,
});
// Returns base64 image with bounding box drawn
console.log(bbox.annotatedImage);
```
Save the annotated image to verify the detection is correct, then refine your instruction if needed.
Combine bounding box detection with Appium gestures to tap elements by description:

```typescript
// Verify a button exists and tap it
const screenshot = await driver.saveScreenshot("screen.png");
const bbox = await getBoundingBox(
  screenshot.toString("base64"),
  "Find the 'Continue' button"
);
// scaleX/scaleY convert screenshot pixel coordinates to device
// coordinates (see the scaling note above)
await driver.touchAction({
  action: "tap",
  x: Math.round(bbox.center.x * scaleX),
  y: Math.round(bbox.center.y * scaleY),
});
```
Read on-screen values directly with `query`:

```typescript
const screenshot = await takeScreenshot();
const balance = await query(
  screenshot,
  "What is the account balance shown? Return only the number."
);
console.log(`Current balance: $${balance}`);
```