Connect to LLM providers (OpenAI, Anthropic, etc.) and trace API calls. Includes vision utilities for querying images and detecting UI elements via bounding boxes.
This skill helps you integrate LLM APIs into your application with built-in tracing capabilities, along with vision utilities for querying images and locating UI elements. To get started:
1. Install the package:
```bash
npm install @empiricalrun/llm
```
2. Import and initialize the LLM client:
```typescript
import { LLM } from "@empiricalrun/llm";
const llm = new LLM({
  provider: "openai",
  defaultModel: "gpt-4o",
});
```
3. Make a chat completion request:
```typescript
const llmResponse = await llm.createChatCompletion({
  messages: [
    { role: "user", content: "Your prompt here" },
  ],
});
```
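The response object's shape isn't documented above; assuming the client mirrors the OpenAI chat completion shape (an assumption, not confirmed by the package docs), the reply text could be read with a small helper like this:

```typescript
// Hypothetical response shape, assuming an OpenAI-compatible client.
type ChatResponse = { choices: { message: { content: string } }[] };

// Extract the assistant's reply text, defaulting to "" if absent.
function firstMessageText(response: ChatResponse): string {
  return response.choices[0]?.message?.content ?? "";
}
```

Check the package's type definitions for the actual response type before relying on this shape.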
Use the `query` vision utility to ask questions about an image and get text answers.
```typescript
import { query } from "@empiricalrun/llm/vision";
// Example with Appium screenshot (base64 encoded)
const data = await driver.saveScreenshot("dummy.png");
const instruction = "Extract number of ATOM tokens from the image. Return only the number.";
const text = await query(data.toString("base64"), instruction);
// Returns: "0.01"
```
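Since `query` returns plain text, a small helper (hypothetical, not part of the package) can turn answers like `"0.01"` into numbers defensively:

```typescript
// Parse a numeric answer returned as text (e.g. "0.01" or "1,234"),
// stripping commas and surrounding whitespace; throws on non-numeric text.
function parseNumericAnswer(text: string): number {
  const n = Number(text.trim().replace(/,/g, ""));
  if (Number.isNaN(n)) throw new Error(`Not a number: ${text}`);
  return n;
}
```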
Use `getBoundingBox` to get coordinates for UI elements described in natural language.
```typescript
import { getBoundingBox } from "@empiricalrun/llm/vision/bbox";
const data = await driver.saveScreenshot("dummy.png");
const instruction = "This screenshot shows a screen to send crypto tokens. What is the bounding box for the dropdown to select the token?";
const bbox = await getBoundingBox(data.toString("base64"), instruction);
const centerToTap = bbox.center; // { x: 342, y: 450 }
```
**Important:** Coordinates are relative to the image dimensions. Scale them to your target system's coordinates before using (e.g., for Appium tap actions).
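A minimal sketch of that scaling step (the helper name and parameters are illustrative): given the screenshot's pixel dimensions and the device's logical size (e.g. from `driver.getWindowSize()`), map the center point:

```typescript
interface Point { x: number; y: number; }

// Map a point from screenshot pixel space to device coordinate space.
// imageWidth/imageHeight are the screenshot's pixel dimensions;
// deviceWidth/deviceHeight are the device's logical dimensions.
function toDeviceCoords(
  center: Point,
  imageWidth: number,
  imageHeight: number,
  deviceWidth: number,
  deviceHeight: number
): Point {
  return {
    x: Math.round(center.x * (deviceWidth / imageWidth)),
    y: Math.round(center.y * (deviceHeight / imageHeight)),
  };
}
```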
Iterate on bounding box detection with visual feedback:
```typescript
const bbox = await getBoundingBox(data.toString("base64"), instruction, {
  debug: true,
});
// Returns base64 image with bounding box drawn
console.log(bbox.annotatedImage);
```
Save the annotated image to verify the detection is correct, then refine your instruction if needed.
Combine bounding box detection with Appium gestures to tap elements by description:

```typescript
// Verify a button exists and tap it
const screenshot = await driver.saveScreenshot("screen.png");
const bbox = await getBoundingBox(
  screenshot.toString("base64"),
  "Find the 'Continue' button"
);
// scaleX/scaleY convert screenshot pixel coordinates to device
// coordinates (see the scaling note above)
await driver.touchAction({
  action: "tap",
  x: Math.round(bbox.center.x * scaleX),
  y: Math.round(bbox.center.y * scaleY),
});
```
Read on-screen values directly with `query`:

```typescript
const screenshot = await takeScreenshot();
const balance = await query(
  screenshot,
  "What is the account balance shown? Return only the number."
);
console.log(`Current balance: $${balance}`);
```