A comprehensive setup guide for developing and maintaining a CrewAI-based proofreading multiagent system, covering environment management, logging, and development workflow patterns for automated review and severity assessment of medical and scientific documents.
This system uses two specialized AI agents to review document issues from Excel files and reassess their severity levels using configurable rulesets. The architecture is optimized for 8GB development environments with production deployment on Google Cloud Run Functions.
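As a sketch of the record each agent works on (the class and field names below are illustrative assumptions, not the project's actual schema in `models/`):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"

@dataclass
class DocumentIssue:
    """One row from the input Excel file; field names are illustrative."""
    issue_id: str
    description: str
    severity: Severity
    reassessed_severity: Optional[Severity] = None  # set by the review agents
```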
**CRITICAL**: Always use the existing virtual environment - never create new ones.
```bash
cd /path/to/proof_reading_multiagent_system
source .virtenv/bin/activate
pip install -r requirements.txt
```
For 8GB laptop development environments, cap the Node.js heap (used by Claude Code) at 3 GB:
```bash
echo 'export NODE_OPTIONS="--max-old-space-size=3072"' >> ~/.zshrc
source ~/.zshrc
```
Ensure the following directory structure exists:
```
proof_reading_multiagent_system/
├── logs/                                    # Root-level logs with archive/
├── proof_reading_multiagent_system/
│   ├── data/                                # Data files (benchmarks/, output/, samples/)
│   ├── docs/                                # Documentation
│   ├── knowledge/                           # CrewAI knowledge base
│   ├── scripts/                             # Utility scripts
│   ├── src/proof_reading_multiagent_system/
│   │   ├── config/                          # Agent and task YAML configs
│   │   ├── models/                          # Data models and schemas
│   │   ├── tools/                           # Custom CrewAI tools
│   │   ├── utils/                           # Logging infrastructure
│   │   ├── config.yaml                      # Main system configuration
│   │   ├── crew.py                          # CrewAI crew definition
│   │   └── main.py                          # Main execution entry point
│   └── tests/                               # Test suite
```
All configuration must be self-contained in config.yaml files - no manual environment variables required:
```python
import os

def _setup_google_cloud_environment(self) -> None:
    """Set up the Google Cloud environment from configuration."""
    try:
        gemini_config = self.config.get('gemini_config', {})
        credentials_path = gemini_config.get('credentials_path')
        if credentials_path:
            # Resolve relative paths against the config directory
            if not os.path.isabs(credentials_path):
                config_dir = os.path.dirname(__file__)
                credentials_path = os.path.join(config_dir, credentials_path)
            os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credentials_path
    except Exception as e:
        self.logger.error(f"Failed to set up Google Cloud environment: {e}")
```
Example config.yaml:

```yaml
document_context:
  document_type: "Clinical Study Report"
  domain: "Oncology"
  urgency: "Standard"

behavior_config:
  verbose: false

logging:
  log_file: "/absolute/path/to/logs/proof_reading_system.log"
  output_directory: "/absolute/path/to/output/"

gemini_config:
  credentials_path: "$HOME/.config/gcloud/credentials.json"
```
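Note that `$HOME` inside a YAML string is a literal, not an expanded variable, so the loader must expand it itself. A minimal loader sketch (the project's actual loader may differ; `load_config` is an illustrative name):

```python
import os
import yaml

def load_config(path: str) -> dict:
    """Load config.yaml and expand $HOME-style variables in path fields."""
    with open(path) as f:
        config = yaml.safe_load(f)
    gemini = config.get('gemini_config', {})
    if 'credentials_path' in gemini:
        # "$HOME/..." arrives as a literal string; expand it before use
        gemini['credentials_path'] = os.path.expandvars(gemini['credentials_path'])
    return config
```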
Always use the structured logging system:
```python
from proof_reading_multiagent_system.utils import (
    get_logger, log_structured, correlation_context,
    performance_monitor, audit_logger, log_execution_time
)

# Track a processing operation with performance metrics
with performance_monitor.track_processing_operation("excel_processing") as metrics:
    # Processing logic
    metrics.records_processed = record_count

# Audit every severity change
audit_logger.log_severity_change(issue_id, old_severity, new_severity, reason, agent_name)

# Time critical functions
@log_execution_time("critical_function")
def process_excel_data():
    pass
```
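The project's `utils` module is not shown here, but `correlation_context` can be sketched with the standard-library `contextvars` module (an illustrative assumption, not the actual implementation):

```python
import contextvars
import uuid
from contextlib import contextmanager

# Holds the correlation id for the current execution context
_correlation_id = contextvars.ContextVar("correlation_id", default=None)

def generate_correlation_id() -> str:
    """Create a new opaque correlation id."""
    return uuid.uuid4().hex

def get_correlation_id():
    """Return the correlation id bound in the current context, if any."""
    return _correlation_id.get()

@contextmanager
def correlation_context(cid: str):
    """Bind a correlation id for the duration of the block."""
    token = _correlation_id.set(cid)
    try:
        yield cid
    finally:
        _correlation_id.reset(token)
```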
Wire session logging into the crew lifecycle hooks:

```python
from typing import Any, Dict

from crewai.project import CrewBase, before_kickoff, after_kickoff

@CrewBase
class ProofReadingCrew:
    def __init__(self):
        self.logger = setup_logging()
        self.session_id = generate_correlation_id()

    @before_kickoff
    def setup_session_logging(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        with correlation_context(self.session_id):
            self.logger.info("Starting proof reading session",
                             extra={'session_id': self.session_id})
        return inputs

    @after_kickoff
    def finalize_session_logging(self, output: Any) -> Any:
        performance_monitor.log_session_summary()
        return output
```
```bash
# Run against a sample file
python -m proof_reading_multiagent_system.main --input "data/samples/issues.xlsx"

# Custom output path
python -m proof_reading_multiagent_system.main --input "issues.xlsx" --output "/custom/path/result.xlsx"

# Verbose agent output
python -m proof_reading_multiagent_system.main --input "issues.xlsx" --verbose

# CrewAI project scripts
run_crew
train
replay
test
```
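The CLI flags above can be sketched with `argparse` (illustrative; `main.py` may parse its arguments differently):

```python
import argparse

def parse_args(argv=None):
    """Parse the CLI flags used in the invocations above."""
    parser = argparse.ArgumentParser(prog="proof_reading_multiagent_system")
    parser.add_argument("--input", required=True,
                        help="Path to the issues XLSX file")
    parser.add_argument("--output",
                        help="Optional path for the result XLSX")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose agent output")
    return parser.parse_args(argv)
```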
```bash
# Always activate the existing environment before committing
source .virtenv/bin/activate
# Stage only the files for the current feature
git add specific_files_for_feature
git commit -m "Add ExcelReaderTool: implement XLSX to DocumentIssue conversion"
```
```python
try:
    # Operation here
    result = process_document_issues(excel_data)
except Exception as e:
    log_error(
        f"Document processing failed: {e}",
        error_type=type(e).__name__,
        operation="document_processing",
        correlation_id=get_correlation_id(),
        additional_context={"file_path": input_file},
    )
    raise  # Re-raise after logging
```
Process only Medium/High severity issues for 60-80% efficiency gain:
```
XLSX Input → SeverityFilter → [Medium/High issues] + [Low issues (untouched)]
                                       ↓                       ↓
                          Process Medium/High only   Keep Low issues separate
                                       ↓                       ↓
                     Updated Medium/High issues → DataMergeTool ← Untouched Low issues
                                                       ↓
                                       XLSX Output (complete dataset)
```
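The split-and-merge flow above can be sketched with pandas (a sketch only; the project's actual SeverityFilter and DataMergeTool may differ, and the "Severity" column name is an assumption):

```python
import pandas as pd

def split_by_severity(df: pd.DataFrame):
    """Split issues into rows the agents process and rows passed through untouched."""
    mask = df["Severity"].isin(["Medium", "High"])
    return df[mask].copy(), df[~mask].copy()

def merge_results(processed: pd.DataFrame, untouched: pd.DataFrame) -> pd.DataFrame:
    """Recombine both partitions and restore the original row order."""
    return pd.concat([processed, untouched]).sort_index()
```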
1. **Check Node.js memory limit**: Ensure NODE_OPTIONS is set to 3GB
2. **Monitor file sizes**: Keep development Excel files under 5MB
3. **Restart when needed**: Don't hesitate to restart Claude Code if memory usage is high
4. **Browser management**: Close unnecessary tabs to free system memory
This setup ensures a robust, scalable proof reading system with proper logging, configuration management, and development workflows optimized for both local development and production deployment.