Extract reliable, validated structured data from any LLM using Pydantic models. Instructor handles JSON schema generation, validation, retries, and error correction automatically.
This skill guides you through using the Instructor library to get structured outputs from LLMs. Instead of parsing raw JSON and handling validation manually, you define Pydantic models and Instructor ensures the LLM returns properly typed, validated data.
```bash
pip install instructor
```
Or with modern package managers:
```bash
uv add instructor
poetry add instructor
```
Create a Pydantic model that describes the structure you want to extract:
```python
from pydantic import BaseModel, Field, field_validator
from typing import List, Optional

class User(BaseModel):
    name: str = Field(description="Full name of the user")
    age: int = Field(description="Age in years", gt=0, lt=150)
    email: Optional[str] = Field(default=None, description="Email address")

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if v and '@' not in v:
            raise ValueError('Email must contain @')
        return v
```
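Because this is an ordinary Pydantic model, you can sanity-check the validation logic locally before involving an LLM. A minimal sketch (the `Jane Doe` values are illustrative):

```python
from pydantic import BaseModel, Field, ValidationError, field_validator
from typing import Optional

class User(BaseModel):
    name: str = Field(description="Full name of the user")
    age: int = Field(description="Age in years", gt=0, lt=150)
    email: Optional[str] = Field(default=None, description="Email address")

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if v and '@' not in v:
            raise ValueError('Email must contain @')
        return v

# Valid data passes straight through
user = User(name="Jane Doe", age=30, email="jane@example.com")
print(user.age)  # 30

# Invalid data raises ValidationError with a descriptive message
try:
    User(name="Jane Doe", age=200)  # violates lt=150
except ValidationError as e:
    print(e.error_count(), "validation error(s)")
```

The same `ValidationError` messages are what Instructor uses to steer the LLM when a response fails validation.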
Choose your LLM provider using the unified `from_provider` syntax:
```python
import instructor

# Any one of the following returns a ready-to-use client; pick your provider.
client = instructor.from_provider("openai/gpt-4o-mini")
client = instructor.from_provider("anthropic/claude-3-5-sonnet")
client = instructor.from_provider("google/gemini-pro")
client = instructor.from_provider("ollama/llama3.2")

# API keys can be passed explicitly instead of via environment variables
client = instructor.from_provider("openai/gpt-4o", api_key="sk-...")
```
Use the client to extract data from natural language:
```python
user = client.chat.completions.create(
    response_model=User,
    messages=[
        {"role": "user", "content": "Extract: John Doe is 32 years old, email john@example.com"}
    ],
)
print(user)
print(type(user))  # <class '__main__.User'>
```
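The returned object is a plain Pydantic instance, so all the usual Pydantic v2 serialization helpers apply. A sketch using a locally constructed `User` (same model as above, no API call):

```python
from pydantic import BaseModel, Field
from typing import Optional

class User(BaseModel):
    name: str = Field(description="Full name of the user")
    age: int = Field(description="Age in years", gt=0, lt=150)
    email: Optional[str] = None

user = User(name="John Doe", age=32, email="john@example.com")

# Standard Pydantic serialization works on extracted objects
data = user.model_dump()           # plain dict
json_str = user.model_dump_json()  # JSON string
print(data["name"])  # John Doe
```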
Instructor automatically handles nested Pydantic models:
```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    employees: List[User]
    headquarters: Address

company = client.chat.completions.create(
    response_model=Company,
    messages=[
        {"role": "user", "content": """
        Acme Corp has two employees: Alice (28) and Bob (35).
        Located at 123 Main St, San Francisco, USA.
        """}
    ],
)
```
Instructor automatically retries failed validations with error feedback:
```python
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "Extract user info..."}],
    max_retries=3,  # retry up to 3 times if validation fails
)
```
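The retry loop is driven by your validators: when one raises, Instructor reinjects the error message into the conversation and asks the model to correct itself. A pure-Pydantic sketch of a validator that would trigger such a retry (the `UserDetail` model and uppercase rule are illustrative):

```python
from pydantic import BaseModel, ValidationError, field_validator

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator('name')
    @classmethod
    def name_must_be_uppercase(cls, v: str) -> str:
        # When this raises during an Instructor call, the error text below
        # is sent back to the LLM as feedback before the next attempt.
        if v != v.upper():
            raise ValueError('name must be in UPPERCASE')
        return v

good = UserDetail(name="JASON", age=25)  # passes validation

try:
    UserDetail(name="jason", age=25)
except ValidationError as e:
    print("would trigger a retry:", e.errors()[0]["msg"])
```

Keep validator error messages descriptive: the LLM only sees the message text when correcting its output.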
Stream partial objects as the LLM generates them:
```python
from instructor import Partial

for partial_user in client.chat.completions.create(
    response_model=Partial[User],
    messages=[{"role": "user", "content": "John is 25"}],
    stream=True,
):
    print(partial_user)
# User(name=None, age=None)
# User(name='John', age=None)
# User(name='John', age=25)
```
Add custom validators to enforce business rules:
```python
from pydantic import model_validator

class Order(BaseModel):
    item: str
    quantity: int
    price: float

    @model_validator(mode='after')
    def check_total(self):
        if self.quantity * self.price > 10000:
            raise ValueError('Order total cannot exceed $10,000')
        return self
```
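Exercising the rule locally shows the behavior; the same error text is what Instructor would feed back to the model on a failed extraction (item names below are illustrative):

```python
from pydantic import BaseModel, ValidationError, model_validator

class Order(BaseModel):
    item: str
    quantity: int
    price: float

    @model_validator(mode='after')
    def check_total(self):
        if self.quantity * self.price > 10000:
            raise ValueError('Order total cannot exceed $10,000')
        return self

ok = Order(item="widget", quantity=10, price=9.99)   # total $99.90, passes

try:
    Order(item="server", quantity=100, price=500.0)  # total $50,000, rejected
except ValidationError as e:
    print("rejected:", e.errors()[0]["msg"])
```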
**Data Extraction**: Extract structured data from documents, emails, or chat messages
**API Response Parsing**: Convert unstructured LLM responses into typed objects
**Form Filling**: Populate forms from natural language input
**Entity Recognition**: Extract named entities with validation
**Classification**: Classify text into predefined categories with confidence scores
Extract lists of objects:
```python
class UserList(BaseModel):
    users: List[User]

result = client.chat.completions.create(
    response_model=UserList,
    messages=[{"role": "user", "content": "Extract all users from this text..."}],
)
```
Use enums for categorical data:
```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Analysis(BaseModel):
    text: str
    sentiment: Sentiment
    confidence: float = Field(ge=0.0, le=1.0)
```
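Because `Sentiment` subclasses `str`, string values coerce cleanly into the enum, and the `confidence` bounds are enforced at parse time. A quick local check (sample text is illustrative):

```python
from enum import Enum
from pydantic import BaseModel, Field, ValidationError

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Analysis(BaseModel):
    text: str
    sentiment: Sentiment
    confidence: float = Field(ge=0.0, le=1.0)

# Plain strings coerce into the enum
a = Analysis(text="Great product!", sentiment="positive", confidence=0.95)
print(a.sentiment == Sentiment.POSITIVE)  # True

# Out-of-range confidence is rejected before you ever see bad data
try:
    Analysis(text="hmm", sentiment="neutral", confidence=1.5)
except ValidationError:
    print("confidence must be between 0 and 1")
```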
Handle missing data gracefully:
```python
class Product(BaseModel):
    name: str
    price: Optional[float] = None
    in_stock: bool = True
    description: Optional[str] = Field(default=None, max_length=500)
```
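Fields the LLM cannot find simply take their defaults rather than failing validation. Constructing the model with partial data shows the behavior:

```python
from typing import Optional
from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str
    price: Optional[float] = None
    in_stock: bool = True
    description: Optional[str] = Field(default=None, max_length=500)

# Only the required field is supplied; the rest fall back to defaults.
p = Product(name="Widget")
print(p.price, p.in_stock, p.description)  # None True None
```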
**Validation errors persist**: Increase `max_retries` or simplify your model
**Slow responses**: Use streaming for better user experience
**Missing fields**: Add `Optional` types or default values
**Type errors**: Ensure Pydantic v2 is installed (`pip install -U pydantic`)