# Compacting Chat Memory Advisor
This skill teaches you how to implement a custom Spring AI advisor that manages conversation memory intelligently by summarizing old messages rather than dropping them, preserving context while optimizing token usage.
## What This Skill Does
Guides you through understanding and extending a Spring Boot application that demonstrates:
- Custom `CallAdvisor` implementation for conversation memory management
- Dual-model architecture: OpenAI GPT-5 for chat, Google Gemini 2.5 Flash for cost-effective summarization
- Automatic compaction of conversation history when thresholds are reached
- Externalized configuration for memory management settings
- REST endpoints demonstrating standard vs. compacting memory approaches

## Prerequisites
- Java 25 installed
- Maven installed
- OpenAI API key (`OPENAI_API_KEY` environment variable)
- Google GenAI API key (`GOOGLE_GENAI_API_KEY` environment variable)
- Spring Boot 3.5.6 knowledge recommended
- Understanding of Spring AI concepts helpful

## Instructions
### 1. Understand the Project Structure
The application is organized into key components:
- **`src/main/java/dev/danvega/compact/advisor/CompactingChatMemoryAdvisor.java`** - Core advisor implementation
- **`src/main/java/dev/danvega/compact/config/ChatMemoryConfiguration.java`** - Bean definitions for memory management
- **`src/main/java/dev/danvega/compact/config/CompactingMemoryProperties.java`** - Configuration properties
- **`src/main/java/dev/danvega/compact/ChatMemoryController.java`** - Standard memory endpoint
- **`src/main/java/dev/danvega/compact/CompactingChatMemoryController.java`** - Compacting memory endpoint
- **`src/main/resources/application.properties`** - Configuration settings

### 2. Review Core Configuration
Examine `application.properties` to understand:
- **Primary chat model** (OpenAI GPT-5): temperature 1.0, handles user-facing responses
- **Summarization model** (Google Gemini 2.5 Flash): cost-effective background summarization
- **Compacting memory settings**: max-messages (100), compact-threshold (80), messages-to-compact (40)

### 3. Study the CompactingChatMemoryAdvisor
Review the advisor implementation at `src/main/java/dev/danvega/compact/advisor/CompactingChatMemoryAdvisor.java`:
- **`adviseCall()`** - Intercepts chat requests to manage the memory lifecycle
- **`compact()`** - Manually triggers compaction of conversation history
- **`clear()`** - Resets conversation history
- **Threshold logic** - Automatically summarizes the oldest messages when the message count reaches the configured threshold
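The threshold logic above can be sketched framework-free. This is a minimal, hypothetical model of the behavior: `CompactionBuffer` and its `summarizer` function are illustrative names, not the project's actual classes, and the real advisor works with Spring AI `Message` objects rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of threshold-based compaction (not the project's API).
class CompactionBuffer {
    private final int compactThreshold;   // e.g. 80: compact once size reaches this
    private final int messagesToCompact;  // e.g. 40: how many oldest messages to fold in
    private final Function<List<String>, String> summarizer; // stands in for the cheap model
    private final List<String> messages = new ArrayList<>();

    CompactionBuffer(int compactThreshold, int messagesToCompact,
                     Function<List<String>, String> summarizer) {
        this.compactThreshold = compactThreshold;
        this.messagesToCompact = messagesToCompact;
        this.summarizer = summarizer;
    }

    void add(String message) {
        messages.add(message);
        if (messages.size() >= compactThreshold) {
            compact();
        }
    }

    // Replace the oldest N messages with a single summary message,
    // preserving context instead of dropping it.
    void compact() {
        List<String> oldest = new ArrayList<>(messages.subList(0, messagesToCompact));
        String summary = summarizer.apply(oldest);
        messages.subList(0, messagesToCompact).clear();
        messages.add(0, "[summary] " + summary);
    }

    void clear() {
        messages.clear();
    }

    List<String> messages() {
        return List.copyOf(messages);
    }
}
```

With the documented defaults (threshold 80, 40 messages per compaction), adding the 80th message triggers one compaction, leaving 41 entries: one summary plus the 40 most recent messages.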
### 4. Understand the Dual-Model Architecture

The application uses two AI models strategically:
1. **OpenAI GPT-5** - Configured via `@Qualifier("openAi")` in `ChatMemoryConfiguration.java`, handles all user-facing chat responses
2. **Google Gemini 2.5 Flash** - Configured separately, handles background summarization to reduce costs
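The division of labor can be illustrated in plain Java. Here `Model` and `DualModelChat` are hypothetical placeholders standing in for Spring AI's chat abstractions; the point is only the routing: expensive model for user-facing answers, cheap model for background summaries.

```java
import java.util.List;

// Hypothetical single-method abstraction standing in for a chat model.
interface Model {
    String generate(String prompt);
}

// Illustrative routing: quality model for chat, cheap model for summaries.
class DualModelChat {
    private final Model chatModel;     // e.g. GPT-5: user-facing responses
    private final Model summaryModel;  // e.g. Gemini 2.5 Flash: background summarization

    DualModelChat(Model chatModel, Model summaryModel) {
        this.chatModel = chatModel;
        this.summaryModel = summaryModel;
    }

    String chat(String userMessage) {
        return chatModel.generate(userMessage);
    }

    String summarize(List<String> history) {
        return summaryModel.generate("Summarize: " + String.join("\n", history));
    }
}
```

In the actual project this split is expressed as two Spring beans, with the primary model selected via `@Qualifier("openAi")` as described above.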
### 5. Build and Run the Application
```bash
# Build
./mvnw clean install

# Run
./mvnw spring-boot:run

# Run tests
./mvnw test
```
Ensure both `OPENAI_API_KEY` and `GOOGLE_GENAI_API_KEY` environment variables are set.
### 6. Test the Endpoints
Compare behavior between standard and compacting memory:
- **Standard memory**: `ChatMemoryController` - Uses typical Spring AI memory (drops old messages)
- **Compacting memory**: `CompactingChatMemoryController` - Summarizes old messages to preserve context
- **Additional operations**: `/compact` to manually trigger compaction, `/clear` to reset history

### 7. Customize Compaction Behavior
Modify settings in `application.properties`:
- **`compact.memory.max-messages`** - Maximum total messages to retain
- **`compact.memory.compact-threshold`** - Message count at which compaction triggers automatically
- **`compact.memory.messages-to-compact`** - How many of the oldest messages to summarize per compaction
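Taken together, these settings would appear in `application.properties` roughly as follows, using the documented default values; verify the exact property prefix against `CompactingMemoryProperties.java`:

```properties
# Maximum total messages to retain
compact.memory.max-messages=100
# Compact automatically once the conversation reaches this many messages
compact.memory.compact-threshold=80
# Number of oldest messages summarized in each compaction pass
compact.memory.messages-to-compact=40
```

Keep the threshold below the maximum so compaction fires before the hard limit is hit.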
### 8. Extend the Advisor

Consider implementing:
- Different summarization strategies (e.g., importance-weighted summarization)
- Multi-tier compaction (different summarization levels for very old vs. moderately old messages)
- Configurable summarization prompts
- Metrics collection (compaction frequency, token savings)
- Alternative models for summarization

### 9. Integration Patterns
The advisor integrates via Spring AI's `CallAdvisor` interface:
- Registered with the `ChatClient.Builder` in configuration
- Intercepts all chat calls automatically
- Augments prompts with conversation history
- Stores responses in memory after generation

### 10. Testing Strategy
Review test patterns in the test directory:
- Unit tests for advisor logic
- Integration tests for controller endpoints
- Mock configuration for testing without API keys
- Assertions on memory state after compaction

## Key Concepts
- **CallAdvisor Pattern**: Spring AI extension point for intercepting chat requests
- **Memory Compaction**: Summarizing old messages instead of dropping them
- **Dual-Model Architecture**: Using different models for different tasks (quality vs. cost)
- **Threshold-Based Automation**: Automatic memory management based on configurable thresholds
- **Context Preservation**: Maintaining conversation continuity even with long histories

## Common Use Cases
- Long-running conversational applications where context matters
- Cost optimization for high-volume chat applications
- Applications requiring audit trails of conversation summaries
- Multi-turn conversations with token budget constraints

## Important Notes
- The advisor manages memory automatically - no manual intervention required
- Summarization uses a separate model for cost efficiency
- Configuration is externalized for easy tuning in production
- The advisor pattern allows for non-invasive memory management
- Both API keys must be valid for full functionality

## Troubleshooting
- **"API key not found"**: Ensure the `OPENAI_API_KEY` and `GOOGLE_GENAI_API_KEY` environment variables are set
- **Compaction not triggering**: Check that `compact.memory.compact-threshold` is less than `compact.memory.max-messages`
- **Build failures**: Verify Java 25 is installed and `JAVA_HOME` is set correctly
- **Test failures**: Check that mock configurations are properly set up for tests