Extract structured data from PyPI packages using Marvin AI
Use Marvin AI to analyze PyPI packages and extract structured information from their documentation, READMEs, and metadata.
This skill helps you leverage the Marvin Python framework to extract, classify, cast, and generate structured data from PyPI package information. Marvin provides an intuitive API for working with AI to produce type-safe, validated outputs from unstructured package documentation.
1. Install Marvin via pip or uv:
```bash
# With uv:
uv add marvin
# Or with pip:
pip install marvin
```
2. Configure your LLM provider:
```bash
export OPENAI_API_KEY=your-api-key
```
Marvin uses OpenAI by default but natively supports all Pydantic AI models.
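A missing key may only surface on the first model call, so a small pre-flight check can fail earlier with a clearer message. The following is a minimal sketch using only the standard library. The helper name `require_api_key` is hypothetical and not part of Marvin, and `OPENAI_API_KEY` applies only to the default OpenAI setup.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var_name`, or raise if it is missing or blank.

    Hypothetical helper (not a Marvin API); other providers read other variables.
    """
    value = os.environ.get(var_name, "").strip()
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; export it before running Marvin calls."
        )
    return value
```

Calling `require_api_key()` at the top of a script keeps configuration errors separate from errors raised during extraction.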
Create a Python script and import Marvin:
```python
import marvin
from typing import TypedDict
from enum import Enum
```
Create Pydantic models or TypedDict classes for the data you want to extract:
```python
# Fields to pull out of a package's metadata
class PackageMetadata(TypedDict):
    name: str
    version: str
    description: str
    keywords: list[str]


# Closed set of labels used for classification
class PackageCategory(Enum):
    WEB_FRAMEWORK = "web_framework"
    DATA_SCIENCE = "data_science"
    AI_ML = "ai_ml"
    DEVTOOLS = "devtools"
    TESTING = "testing"
```
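A `TypedDict` is a plain `dict` at runtime, so its annotations are not enforced when a value is created. The sketch below shows one way to check a result against the schema using only the standard library. The helper `validation_errors` is hypothetical, and it handles only plain types and `list[T]` fields, which is all `PackageMetadata` needs.

```python
from typing import TypedDict, get_args, get_origin, get_type_hints


class PackageMetadata(TypedDict):
    name: str
    version: str
    description: str
    keywords: list[str]


def validation_errors(data: dict, schema: type) -> list[str]:
    """Return the problems found when checking `data` against a TypedDict.

    Hypothetical helper: supports plain types and `list[T]` fields only.
    """
    errors = []
    for field, expected in get_type_hints(schema).items():
        if field not in data:
            errors.append(f"missing field: {field}")
            continue
        value = data[field]
        if get_origin(expected) is list:
            # For list[T], check the container and every item in it.
            (item_type,) = get_args(expected)
            if not isinstance(value, list) or not all(
                isinstance(item, item_type) for item in value
            ):
                errors.append(f"{field}: expected list of {item_type.__name__}")
        elif not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors
```

An empty list means the value has the expected shape. It says nothing about whether the content is factually right.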
Use Marvin's extraction utilities to parse unstructured package documentation:
```python
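# Pull a list of feature descriptions out of the README text
features = marvin.extract(
    package_readme_text,
    list[str],
    instructions="Extract main features and capabilities"
)

# Assign the description to exactly one PackageCategory member
category = marvin.classify(
    package_description,
    PackageCategory
)

# Convert loosely formatted package info into the PackageMetadata shape
metadata = marvin.cast(
    raw_package_info,
    PackageMetadata
)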
```
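The calls above assume that `package_readme_text`, `package_description`, and `raw_package_info` already hold text. One possible source is PyPI's JSON API at `https://pypi.org/pypi/<name>/json`, whose `info` object carries `name`, `version`, `summary`, `description` (the long description, usually the README), and `keywords` as a single string. The sketch below uses only the standard library. The helpers `fetch_pypi_info`, `split_keywords`, and `to_inputs` are hypothetical names chosen for illustration.

```python
import json
from typing import Optional
from urllib.request import urlopen


def fetch_pypi_info(package: str) -> dict:
    """Download the `info` section of PyPI's JSON API for `package`."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urlopen(url, timeout=10) as response:
        return json.load(response)["info"]


def split_keywords(raw: Optional[str]) -> list[str]:
    """Split PyPI's single keywords string on commas, else on whitespace."""
    if not raw:
        return []
    parts = raw.split(",") if "," in raw else raw.split()
    return [part.strip() for part in parts if part.strip()]


def to_inputs(info: dict) -> dict:
    """Map a PyPI `info` dict onto the variable names used in this document."""
    core = {key: info.get(key) for key in ("name", "version", "summary", "keywords")}
    return {
        "package_readme_text": info.get("description") or "",
        "package_description": info.get("summary") or "",
        "raw_package_info": json.dumps(core),
    }
```

For example, `inputs = to_inputs(fetch_pypi_info("marvin"))` yields the three strings, which can then be passed to `marvin.extract`, `marvin.classify`, and `marvin.cast`. Fields that PyPI already provides in structured form, such as `version`, do not need a model call at all.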
For more sophisticated package analysis, create specialized agents:
```python
analyst = marvin.Agent(
    name="Package Analyst",
    instructions="Analyze Python packages for security, quality, and usability"
)
analysis = analyst.run(
    f"Analyze the marvin package: {package_info}",
    result_type=dict
)
```
Generate structured comparisons between packages:
```python
alternatives = marvin.generate(
    PackageMetadata,
    n=5,
    instructions="Generate similar packages to Marvin for AI/ML workflows"
)
```
```python
# List the dependencies and libraries mentioned in a README
dependencies = marvin.extract(
    readme_content,
    list[str],
    instructions="Extract all mentioned dependencies and libraries"
)
```
```python
# Classify how mature a package is from its changelog and version info
class Maturity(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    STABLE = "stable"
    MATURE = "mature"


maturity = marvin.classify(
    package_changelog + package_version_info,
    Maturity
)
```
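Many packages also declare their own status through a `Development Status :: ...` Trove classifier, which can serve as a deterministic cross-check on the model's answer. The sketch below maps those classifiers onto the `Maturity` enum. The function `declared_maturity` is hypothetical, and the mapping is an assumption: folding "Planning" and "Pre-Alpha" into `ALPHA` and returning `None` for "Inactive" are choices made for illustration.

```python
from enum import Enum
from typing import Optional


class Maturity(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    STABLE = "stable"
    MATURE = "mature"


# Assumed mapping from Trove status numbers to the four labels above.
# 1 = Planning, 2 = Pre-Alpha, 3 = Alpha, 4 = Beta,
# 5 = Production/Stable, 6 = Mature. 7 (Inactive) is left unmapped.
_STATUS_TO_MATURITY = {
    "1": Maturity.ALPHA,
    "2": Maturity.ALPHA,
    "3": Maturity.ALPHA,
    "4": Maturity.BETA,
    "5": Maturity.STABLE,
    "6": Maturity.MATURE,
}


def declared_maturity(classifiers: list[str]) -> Optional[Maturity]:
    """Return the maturity a package declares about itself, if any."""
    prefix = "Development Status :: "
    for classifier in classifiers:
        if classifier.startswith(prefix):
            # "Development Status :: 4 - Beta" -> "4"
            status_number = classifier[len(prefix):].split(" - ")[0].strip()
            return _STATUS_TO_MATURITY.get(status_number)
    return None
```

When the declared value and the classified value disagree, that is a useful signal to review the changelog by hand.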
```python
# Summarize a package's API surface as plain text
summary = marvin.run(
    f"Summarize the API surface of this package: {api_docs}",
    result_type=str
)
```
```python
# Flag potential security concerns in the code and its dependency list
security_issues = marvin.extract(
    package_code + package_dependencies,
    list[str],
    instructions="Identify potential security vulnerabilities or concerns"
)
```
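Joining inputs with `+`, as in the maturity and security examples, leaves no boundary between the pieces and no limit on their size. A labeled, length-capped context string may work better. The following is a minimal sketch; `build_context`, the `## ` section markers, and the 4000-character default are all assumptions for illustration, not Marvin features.

```python
def build_context(sections: dict[str, str], max_chars_per_section: int = 4000) -> str:
    """Join labeled sections into one string, truncating any that run long."""
    parts = []
    for title, text in sections.items():
        body = text.strip()
        if len(body) > max_chars_per_section:
            # Mark the cut so the model knows the section is incomplete.
            body = body[:max_chars_per_section] + "\n[truncated]"
        parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)
```

The result can be passed wherever a single string is expected, for example `build_context({"Source code": package_code, "Dependencies": package_dependencies})` as the first argument to `marvin.extract`.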
1. **Use Type Hints**: Always define clear Pydantic models or TypedDict structures for extraction targets
2. **Provide Context**: Use the `instructions` parameter to guide extraction with domain-specific requirements
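3. **Validate Results**: Marvin returns outputs that match the requested type, but the values are still model-generated, so check critical data such as version numbers or security findings against the source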
4. **Combine Utilities**: Chain `extract`, `classify`, and `cast` for multi-stage analysis pipelines
5. **Use Agents for Complex Tasks**: For workflows requiring multiple steps, use `marvin.Agent` and `marvin.Task`
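
The "Combine Utilities" practice can be sketched as one function that chains the three calls. To keep the example runnable without an API key, the function accepts any object that exposes `extract`, `classify`, and `cast` with the call shapes used in this document, and imports `marvin` only when none is supplied. The names `analyze_package` and `fake_ai` are hypothetical, and `fake_ai` is an offline stand-in that returns canned values.

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any, TypedDict


class PackageCategory(Enum):
    WEB_FRAMEWORK = "web_framework"
    DATA_SCIENCE = "data_science"
    AI_ML = "ai_ml"
    DEVTOOLS = "devtools"
    TESTING = "testing"


class PackageMetadata(TypedDict):
    name: str
    version: str
    description: str
    keywords: list[str]


def analyze_package(readme: str, description: str, raw_info: str, ai: Any = None) -> dict:
    """Chain extract -> classify -> cast and collect the three results."""
    if ai is None:
        import marvin as ai  # imported lazily so the function can run offline

    features = ai.extract(
        readme, list[str], instructions="Extract main features and capabilities"
    )
    category = ai.classify(description, PackageCategory)
    metadata = ai.cast(raw_info, PackageMetadata)
    return {"features": features, "category": category, "metadata": metadata}


# Offline stand-in for Marvin, for illustration and tests only.
fake_ai = SimpleNamespace(
    extract=lambda data, target, instructions=None: ["async support"],
    classify=lambda data, labels: PackageCategory.AI_ML,
    cast=lambda data, target: {
        "name": "demo", "version": "0.1.0", "description": data, "keywords": [],
    },
)
report = analyze_package("README text", "An AI toolkit", "demo 0.1.0", ai=fake_ai)
```

Omitting the `ai` argument makes the same function call Marvin itself, which requires the package to be installed and a provider key to be configured.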
```python
import marvin
from marvin import Agent, Thread

# One agent per area of expertise
security_analyst = Agent(
    name="Security Analyst",
    instructions="Focus on security vulnerabilities and best practices"
)
quality_analyst = Agent(
    name="Code Quality Analyst",
    instructions="Evaluate code quality, testing, and maintainability"
)

# Run all three steps inside one thread so they share context
with Thread() as thread:
    security_report = marvin.run(
        f"Analyze security of package: {package_info}",
        agents=[security_analyst]
    )
    quality_report = marvin.run(
        f"Evaluate code quality: {package_info}",
        agents=[quality_analyst]
    )
    # Feed both reports into the final recommendation
    final_recommendation = marvin.run(
        "Provide adoption recommendation",
        context={
            "security": security_report,
            "quality": quality_report
        }
    )
```