# Compacting Chat Memory Advisor
This skill teaches you how to implement a custom Spring AI advisor that manages conversation memory intelligently by summarizing old messages rather than dropping them, preserving context while optimizing token usage.
## What This Skill Does
Guides you through understanding and extending a Spring Boot application that demonstrates:
- Custom `CallAdvisor` implementation for conversation memory management
- Dual-model architecture: OpenAI GPT-5 for chat, Google Gemini 2.5 Flash for cost-effective summarization
- Automatic compaction of conversation history when thresholds are reached
- Externalized configuration for memory management settings
- REST endpoints demonstrating standard vs. compacting memory approaches

## Prerequisites
- Java 25 installed
- Maven installed
- OpenAI API key (`OPENAI_API_KEY` environment variable)
- Google GenAI API key (`GOOGLE_GENAI_API_KEY` environment variable)
- Spring Boot 3.5.6 knowledge recommended
- Understanding of Spring AI concepts helpful

## Instructions
### 1. Understand the Project Structure
The application is organized into key components:
- **`src/main/java/dev/danvega/compact/advisor/CompactingChatMemoryAdvisor.java`** - Core advisor implementation
- **`src/main/java/dev/danvega/compact/config/ChatMemoryConfiguration.java`** - Bean definitions for memory management
- **`src/main/java/dev/danvega/compact/config/CompactingMemoryProperties.java`** - Configuration properties
- **`src/main/java/dev/danvega/compact/ChatMemoryController.java`** - Standard memory endpoint
- **`src/main/java/dev/danvega/compact/CompactingChatMemoryController.java`** - Compacting memory endpoint
- **`src/main/resources/application.properties`** - Configuration settings

### 2. Review Core Configuration
Examine `application.properties` to understand:
- **Primary chat model** (OpenAI GPT-5): temperature 1.0, handles user-facing responses
- **Summarization model** (Google Gemini 2.5 Flash): cost-effective background summarization
- **Compacting memory settings**: max-messages (100), compact-threshold (80), messages-to-compact (40)

### 3. Study the CompactingChatMemoryAdvisor
Review the advisor implementation at `src/main/java/dev/danvega/compact/advisor/CompactingChatMemoryAdvisor.java`:
- **`adviseCall()`** - Intercepts chat requests to manage the memory lifecycle
- **`compact()`** - Manually triggers compaction of conversation history
- **`clear()`** - Resets conversation history
- **Threshold logic** - Automatically summarizes the oldest messages when the message count reaches the configured threshold
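The threshold logic above can be sketched framework-free. This is a minimal, hypothetical model of the behavior: `CompactionBuffer` and its `summarizer` function are illustrative names, not the project's actual classes, and the real advisor works with Spring AI `Message` objects rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of threshold-based compaction (not the project's API).
class CompactionBuffer {
    private final int compactThreshold;   // e.g. 80: compact once size reaches this
    private final int messagesToCompact;  // e.g. 40: how many oldest messages to fold in
    private final Function<List<String>, String> summarizer; // stands in for the cheap model
    private final List<String> messages = new ArrayList<>();

    CompactionBuffer(int compactThreshold, int messagesToCompact,
                     Function<List<String>, String> summarizer) {
        this.compactThreshold = compactThreshold;
        this.messagesToCompact = messagesToCompact;
        this.summarizer = summarizer;
    }

    void add(String message) {
        messages.add(message);
        if (messages.size() >= compactThreshold) {
            compact();
        }
    }

    // Replace the oldest N messages with a single summary message,
    // preserving context instead of dropping it.
    void compact() {
        List<String> oldest = new ArrayList<>(messages.subList(0, messagesToCompact));
        String summary = summarizer.apply(oldest);
        messages.subList(0, messagesToCompact).clear();
        messages.add(0, "[summary] " + summary);
    }

    void clear() {
        messages.clear();
    }

    List<String> messages() {
        return List.copyOf(messages);
    }
}
```

With the documented defaults (threshold 80, 40 messages per compaction), adding the 80th message triggers one compaction, leaving 41 entries: one summary plus the 40 most recent messages.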
### 4. Understand the Dual-Model Architecture

The application uses two AI models strategically:
1. **OpenAI GPT-5** - Configured via `@Qualifier("openAi")` in `ChatMemoryConfiguration.java`, handles all user-facing chat responses
2. **Google Gemini 2.5 Flash** - Configured separately, handles background summarization to reduce costs
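The division of labor can be illustrated in plain Java. Here `Model` and `DualModelChat` are hypothetical placeholders standing in for Spring AI's chat abstractions; the point is only the routing: expensive model for user-facing answers, cheap model for background summaries.

```java
import java.util.List;

// Hypothetical single-method abstraction standing in for a chat model.
interface Model {
    String generate(String prompt);
}

// Illustrative routing: quality model for chat, cheap model for summaries.
class DualModelChat {
    private final Model chatModel;     // e.g. GPT-5: user-facing responses
    private final Model summaryModel;  // e.g. Gemini 2.5 Flash: background summarization

    DualModelChat(Model chatModel, Model summaryModel) {
        this.chatModel = chatModel;
        this.summaryModel = summaryModel;
    }

    String chat(String userMessage) {
        return chatModel.generate(userMessage);
    }

    String summarize(List<String> history) {
        return summaryModel.generate("Summarize: " + String.join("\n", history));
    }
}
```

In the actual project this split is expressed as two Spring beans, with the primary model selected via `@Qualifier("openAi")` as described above.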
### 5. Build and Run the Application
```bash
# Build
./mvnw clean install

# Run
./mvnw spring-boot:run

# Run tests
./mvnw test
```
Ensure both `OPENAI_API_KEY` and `GOOGLE_GENAI_API_KEY` environment variables are set.
### 6. Test the Endpoints
Compare behavior between standard and compacting memory:
- **Standard memory**: `ChatMemoryController` - Uses typical Spring AI memory (drops old messages)
- **Compacting memory**: `CompactingChatMemoryController` - Summarizes old messages to preserve context
- **Additional operations**: `/compact` to manually trigger compaction, `/clear` to reset history

### 7. Customize Compaction Behavior
Modify settings in `application.properties`:
- **`compact.memory.max-messages`** - Maximum total messages to retain
- **`compact.memory.compact-threshold`** - Message count at which compaction triggers automatically
- **`compact.memory.messages-to-compact`** - How many of the oldest messages to summarize per compaction
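Taken together, these settings would appear in `application.properties` roughly as follows, using the documented default values; verify the exact property prefix against `CompactingMemoryProperties.java`:

```properties
# Maximum total messages to retain
compact.memory.max-messages=100
# Compact automatically once the conversation reaches this many messages
compact.memory.compact-threshold=80
# Number of oldest messages summarized in each compaction pass
compact.memory.messages-to-compact=40
```

Keep the threshold below the maximum so compaction fires before the hard limit is hit.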
### 8. Extend the Advisor

Consider implementing:
- Different summarization strategies (e.g., importance-weighted summarization)
- Multi-tier compaction (different summarization levels for very old vs. moderately old messages)
- Configurable summarization prompts
- Metrics collection (compaction frequency, token savings)
- Alternative models for summarization

### 9. Integration Patterns
The advisor integrates via Spring AI's `CallAdvisor` interface:
- Registered with the `ChatClient.Builder` in configuration
- Intercepts all chat calls automatically
- Augments prompts with conversation history
- Stores responses in memory after generation

### 10. Testing Strategy
Review test patterns in the test directory:
- Unit tests for advisor logic
- Integration tests for controller endpoints
- Mock configuration for testing without API keys
- Assertions on memory state after compaction

## Key Concepts
- **CallAdvisor Pattern**: Spring AI extension point for intercepting chat requests
- **Memory Compaction**: Summarizing old messages instead of dropping them
- **Dual-Model Architecture**: Using different models for different tasks (quality vs. cost)
- **Threshold-Based Automation**: Automatic memory management based on configurable thresholds
- **Context Preservation**: Maintaining conversation continuity even with long histories

## Common Use Cases
- Long-running conversational applications where context matters
- Cost optimization for high-volume chat applications
- Applications requiring audit trails of conversation summaries
- Multi-turn conversations with token budget constraints

## Important Notes
- The advisor manages memory automatically - no manual intervention required
- Summarization uses a separate model for cost efficiency
- Configuration is externalized for easy tuning in production
- The advisor pattern allows for non-invasive memory management
- Both API keys must be valid for full functionality

## Troubleshooting
- **"API key not found"**: Ensure the `OPENAI_API_KEY` and `GOOGLE_GENAI_API_KEY` environment variables are set
- **Compaction not triggering**: Check that `compact.memory.compact-threshold` is less than `compact.memory.max-messages`
- **Build failures**: Verify Java 25 is installed and `JAVA_HOME` is set correctly
- **Test failures**: Check that mock configurations are properly set up for tests