Generate and work with LangChain.js text splitters for retrieval-augmented generation (RAG) pipelines. Includes character, recursive, token-based, and semantic text splitting implementations.
This skill helps you implement various text splitting strategies using the `@langchain/textsplitters` package. Text splitters are essential for breaking down large documents into manageable chunks for vector databases, embeddings, and RAG systems.
When the user requests help with text splitting, LangChain text splitters, or RAG pipelines:
First, determine what type of text splitting is needed:
If not already installed, add the required packages:
```bash
npm install @langchain/textsplitters @langchain/core
```
Choose and implement the right splitter based on the use case:
**For general text (Recursive Character Splitter - RECOMMENDED):**
```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const docs = await splitter.createDocuments([text]);
```
**For token-based splitting:**
```typescript
import { TokenTextSplitter } from "@langchain/textsplitters";
const splitter = new TokenTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});
```
**For code:**
```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
const splitter = RecursiveCharacterTextSplitter.fromLanguage("js", {
  chunkSize: 1000,
  chunkOverlap: 200,
});
```
**For Markdown:**
```typescript
import { MarkdownTextSplitter } from "@langchain/textsplitters";
const splitter = new MarkdownTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
```
Explain and help configure the key parameters: `chunkSize` caps the size of each chunk (measured in characters for character-based splitters and in tokens for `TokenTextSplitter`), and `chunkOverlap` controls how much adjacent chunks share. The examples above use chunk sizes of 500-1000; overlap is typically 10-20% of the chunk size.
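To make the 10-20% rule of thumb concrete, here is a small helper (hypothetical, not part of `@langchain/textsplitters`) that derives an overlap from a chunk size at a given ratio:

```typescript
// Hypothetical helper: derive a chunkOverlap from a chunkSize using the
// 10-20% rule of thumb (15% by default). Not a LangChain API.
function suggestedOverlap(chunkSize: number, ratio = 0.15): number {
  return Math.round(chunkSize * ratio);
}

// suggestedOverlap(1000) yields 150; the examples above pair 1000 with 200 (20%),
// which sits at the top of the same range.
```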
If implementing a full RAG pipeline, show how text splitters fit:
```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
// 1. Split documents
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const splitDocs = await splitter.createDocuments([text]);
// 2. Create embeddings and store in vector database
const vectorStore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  new OpenAIEmbeddings()
);
// 3. Query
const results = await vectorStore.similaritySearch(query, 4);
```
Address common issues, such as chunks that exceed the embedding model's input limit, context lost at chunk boundaries, or a splitter that ignores document structure. Then help verify that the splitting works correctly, for example by checking chunk counts, lengths, and boundary overlap on sample input.
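As one sketch of such a check, a small helper (hypothetical, not part of LangChain) can summarize the chunks any splitter returns and flag ones that exceed the configured size:

```typescript
// Hypothetical helper: sanity-check chunk strings (e.g. the pageContent of
// documents returned by createDocuments) before embedding them.
interface ChunkReport {
  count: number;      // total number of chunks
  maxLength: number;  // length of the longest chunk
  oversized: number;  // chunks longer than the configured chunkSize
}

function reportChunks(chunks: string[], chunkSize: number): ChunkReport {
  const lengths = chunks.map((c) => c.length);
  return {
    count: chunks.length,
    maxLength: Math.max(0, ...lengths),
    oversized: lengths.filter((len) => len > chunkSize).length,
  };
}
```

A nonzero `oversized` count usually means the text contains long runs without the splitter's separators, which is worth inspecting before indexing.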
1. **Default to RecursiveCharacterTextSplitter** for most text - it's the most intelligent general-purpose splitter
2. **Overlap is important** - helps maintain context across chunk boundaries
3. **Match chunk size to your embedding model** and downstream use case
4. **Test with real documents** from the user's domain
5. **Consider document structure** - use specialized splitters (Markdown, Code, HTML) when appropriate
Example use cases include a basic RAG setup, code documentation, and long-form content.