Go Reddit Storage Library
You are working with a Go library that provides database persistence for Reddit data fetched via `go-reddit-api-wrapper`. The library supports both PostgreSQL and SQLite backends with identical interfaces.
Project Architecture
Storage Interface Pattern
The codebase uses an interface-based design defined in `storage.go` that allows multiple backend implementations. Both PostgreSQL and SQLite implement the same `Storage` interface, providing:
- Post/comment CRUD operations
- Subreddit metadata storage
- Full-text search (PostgreSQL only)
- Query options for filtering and pagination

Archiver Pattern
The `Archiver` type in `archiver.go` combines a Reddit API client with a storage backend to provide high-level operations:
- `ArchiveSubreddit` - Fetch and store posts from a subreddit
- `ArchivePost` - Fetch and store a single post with comments
- `ContinuousArchive` - Monitor and archive new content continuously
- `BackfillSubreddit` - Archive historical posts with pagination
- `UpdateScores` - Refresh scores for recently archived posts

Database Schema & Migrations
- Schema files are embedded in the binary using `go:embed` (`schema/schema.go`)
- Migrations are stored in `schema/migrations/postgres/` and `schema/migrations/sqlite/`
- The migration runner automatically tracks applied versions in the `schema_version` table
- Migrations run transactionally with automatic rollback on failure
- New migrations follow the pattern `NNN_description.sql`, where `NNN` is the version number

Key Implementation Details
- **Idempotent Operations**: All save operations use UPSERT patterns (PostgreSQL: `INSERT ... ON CONFLICT DO UPDATE`, SQLite: `INSERT OR REPLACE`)
- **Batch Operations**: `SavePosts` and `SaveComments` use transactions and prepared statements for performance
- **Comment Threading**: Comments store a `depth` field and `parent_id` references; recursive CTEs query full comment trees
- **Backend Organization**: Each backend (`postgres/`, `sqlite/`) is organized into separate files for connection, posts, comments, and tests
- **Error Handling**: The `StorageError` type is used for all storage-related errors and carries operation context

Common Development Tasks
Testing
```bash
# Run all tests
go test -v ./...

# Run specific package tests
go test -v ./sqlite
go test -v ./postgres

# Run PostgreSQL tests (requires TEST_POSTGRES_URL)
export TEST_POSTGRES_URL="postgres://user:pass@localhost/reddit_test?sslmode=disable"
go test -v ./postgres

# Start test PostgreSQL with Docker
docker run -d --name test-postgres \
  -e POSTGRES_PASSWORD=test \
  -e POSTGRES_DB=reddit_test \
  -p 5432:5432 \
  postgres:15
```
Building
```bash
# Build CLI tool
go build -o reddit-archiver ./cmd/reddit-archiver

# Install CLI globally
go install github.com/jamesprial/go-reddit-storage/cmd/reddit-archiver@latest
```
Running Examples
```bash
# Set required environment variables
export REDDIT_CLIENT_ID="your-client-id"
export REDDIT_CLIENT_SECRET="your-client-secret"

# Run examples
go run examples/basic/main.go
go run examples/continuous/main.go
go run examples/backfill/main.go

# Run CLI tool
./reddit-archiver -subreddit golang -limit 100 -comments
```
Development Patterns
Adding New Storage Methods
1. Add method to `Storage` interface in `storage.go`
2. Implement in `postgres/postgres.go` (or appropriate file)
3. Implement in `sqlite/sqlite.go` (or appropriate file)
4. Add tests to both `postgres_test.go` and `sqlite_test.go`
Adding New Migrations
1. Create `00X_description.sql` in both `schema/migrations/postgres/` and `schema/migrations/sqlite/`
2. Increment version number from last migration
3. Migrations run automatically on next `RunMigrations()` call
4. Test migrations on both backends
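
To make the naming convention concrete, here is a minimal sketch of how a runner might derive versions from file names and pick the migrations still to apply. It is not the library's migration runner: `parseVersion`, `pending`, and the assumption that descriptions use lowercase letters, digits, and underscores are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// migrationName matches files such as "003_add_index.sql".
// The allowed description characters are an assumption of this sketch.
var migrationName = regexp.MustCompile(`^(\d{3})_[a-z0-9_]+\.sql$`)

// parseVersion extracts the numeric version prefix from a migration file name.
func parseVersion(name string) (int, error) {
	m := migrationName.FindStringSubmatch(name)
	if m == nil {
		return 0, fmt.Errorf("invalid migration name %q", name)
	}
	return strconv.Atoi(m[1])
}

// pending returns the file names whose version is greater than current,
// sorted by ascending version.
func pending(names []string, current int) ([]string, error) {
	type entry struct {
		version int
		name    string
	}
	var todo []entry
	for _, n := range names {
		v, err := parseVersion(n)
		if err != nil {
			return nil, err
		}
		if v > current {
			todo = append(todo, entry{version: v, name: n})
		}
	}
	sort.Slice(todo, func(i, j int) bool { return todo[i].version < todo[j].version })
	out := make([]string, 0, len(todo))
	for _, e := range todo {
		out = append(out, e.name)
	}
	return out, nil
}

func main() {
	files := []string{"002_add_search.sql", "001_initial.sql", "003_add_index.sql"}
	// Pretend the schema_version table reports version 1 as already applied.
	todo, err := pending(files, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(todo) // prints [002_add_search.sql 003_add_index.sql]
}
```

Rejecting malformed names up front keeps a mistyped file from being silently skipped on one backend while being applied on the other.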
Testing with Real Databases
- PostgreSQL tests require the `TEST_POSTGRES_URL` environment variable
- SQLite tests use temporary files (no setup required)
- Use `internal/testutil/fixtures.go` for test data generation

Important Guidelines
- Always use context for cancellation support in long-running operations
- Store raw JSON in `raw_json` columns for future schema evolution
- PostgreSQL connection strings require the `sslmode` parameter
- SQLite uses WAL mode for better concurrency (enabled automatically)
- The CLI tool reads credentials from environment variables (never hardcode them)
- Comment depth is stored denormalized for query performance
- Always commit to git after finishing tasks
- All batch operations are atomic (all succeed or all fail)
- Re-archiving the same content is safe due to UPSERT patterns