Expert guidance for developing, testing, and deploying the Snowplow Snowflake Loader, a functional Scala/Cats Effect streaming application that loads Snowplow enriched events into Snowflake with low latency across Azure, AWS, and GCP.
Provides comprehensive development guidance for the Snowplow Snowflake Loader codebase. When working with this codebase, follow these guidelines:
**Multi-Module Structure:**
- `modules/core` — shared stream-processing logic, configuration, and tests
- `modules/kafka`, `modules/kinesis`, `modules/pubsub` — platform-specific entry points for Azure Event Hubs, AWS Kinesis, and GCP Pub/Sub
- Each platform module is packaged as a standard Ubuntu Docker image and a distroless variant
**Key Components:**
- `Config.scala` — application configuration, decoded with Circe and defaulted in `reference.conf`
- `processing/` — the FS2 pipeline that transforms, batches, and loads events
- `LoaderApp` — abstract application class that each platform module extends with its `source` and `sink` constructors
**Building:**
```bash
sbt compile
sbt {core|kafka|kinesis|pubsub}/compile
```
**Testing:**
```bash
sbt test
sbt {core|kafka|kinesis|pubsub}/test
sbt "core/testOnly *ProcessingSpec"
```
**Code Quality:**
```bash
sbt scalafmt
sbt scalafmtCheck
sbt headerCreate
sbt headerCheck
```
**Standard Ubuntu images:**
```bash
sbt kafka/docker:publishLocal
sbt kinesis/docker:publishLocal
sbt pubsub/docker:publishLocal
```
**Distroless production images:**
```bash
sbt kafkaDistroless/docker:publishLocal
sbt kinesisDistroless/docker:publishLocal
sbt pubsubDistroless/docker:publishLocal
```
**Azure (Event Hubs/Kafka):** use the `kafka` module and its Docker images
**AWS (Kinesis):** use the `kinesis` module and its Docker images
**GCP (Pub/Sub):** use the `pubsub` module and its Docker images
The application follows this flow:
1. Read enriched events from cloud message queue (Event Hubs/Kinesis/Pub/Sub)
2. Transform events to Snowflake-compatible format
3. Batch events for efficient loading
4. Detect schema changes and alter tables as needed
5. Upload via Snowflake channels (parallel, transactional)
6. Retry transient failures with backoff
7. Write failed events to bad rows sink
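As a rough illustration of how these steps map onto the codebase's FS2/Cats Effect style, here is a minimal sketch; all type and function names below (`Event`, `Row`, `BadRow`, `pipeline`, the batching limits) are hypothetical stand-ins, not the loader's actual API:
```scala
import scala.concurrent.duration._
import cats.effect.Async
import fs2.{Pipe, Stream}

// Hypothetical stand-ins for the loader's real types
trait Event
trait Row
trait BadRow

// A simplified pipeline mirroring steps 1-7 above
def pipeline[F[_]: Async](
  source: Stream[F, Event],                // 1. events read from the cloud queue
  transform: Event => Either[BadRow, Row], // 2. transform to a Snowflake-compatible row
  load: List[Row] => F[Unit],              // 4-6. schema handling, channel upload, retries
  badSink: Pipe[F, BadRow, Nothing]        // 7. bad rows sink
): Stream[F, Unit] =
  source
    .map(transform)
    .observe(_.collect { case Left(bad) => bad }.through(badSink)) // route failures
    .collect { case Right(row) => row }
    .groupWithin(10000, 5.seconds)         // 3. batch for efficient loading
    .evalMap(batch => load(batch.toList))
```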
**Adding a new configuration field:**
1. Update case class in `Config.scala`
2. Add Circe decoder if custom decoding needed
3. Update `reference.conf` with default value
4. Update example configs in `config/` directory
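A minimal sketch of what these steps can look like; the `Batching` section and its fields are hypothetical examples, not the loader's actual configuration:
```scala
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder

// 1. Add the field to the relevant case class in Config.scala
final case class Batching(maxBytes: Long, uploadParallelism: Int)

object Batching {
  // 2. Semi-automatic derivation usually suffices; write a Decoder by hand
  //    only if the field needs custom decoding
  implicit val batchingDecoder: Decoder[Batching] = deriveDecoder[Batching]
}

// 3. Give the field a default in reference.conf, e.g.
//      batching { uploadParallelism = 3 }
// 4. Mirror the change in the example configs under config/
```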
**Modifying processing logic:**
1. Core logic lives in `modules/core/src/main/scala/com.snowplowanalytics.snowflake/processing/`
2. Use FS2 combinators for stream transformations
3. Ensure proper resource management with `Resource[F, A]`
4. Add unit tests in corresponding test directory
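A minimal sketch of the `Resource` pattern point 3 refers to, assuming a hypothetical channel handle (the real channel and row types differ):
```scala
import cats.effect.{Async, Resource}
import fs2.Stream

// Hypothetical handle for a Snowflake channel
trait Channel[F[_]] {
  def write(rows: List[String]): F[Unit]
  def close: F[Unit]
}

// Acquire the channel as a Resource so it is released even on failure or cancellation
def channelResource[F[_]: Async](open: F[Channel[F]]): Resource[F, Channel[F]] =
  Resource.make(open)(_.close)

// Tie the channel's lifetime to the stream that uses it
def writeAll[F[_]: Async](open: F[Channel[F]], batches: Stream[F, List[String]]): Stream[F, Unit] =
  Stream.resource(channelResource(open)).flatMap(channel => batches.evalMap(channel.write))
```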
**Adding a new cloud platform:**
1. Create new module in `modules/` (follow kafka/kinesis/pubsub pattern)
2. Extend `LoaderApp` abstract class
3. Implement platform-specific `source` and `sink` constructors
4. Add SBT project definition with Docker packaging
5. Create example configs in `config/`
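For step 4, the sbt wiring typically mirrors the existing modules; the sketch below is illustrative only (the module name, settings, and `core` project reference are assumptions — copy from the real kafka/kinesis/pubsub definitions in the build):
```scala
// build.sbt fragment (illustrative): a new platform module depending on core,
// packaged as a Docker image via sbt-native-packager
lazy val myqueue = project
  .in(file("modules/myqueue"))
  .dependsOn(core)
  .enablePlugins(JavaAppPackaging, DockerPlugin)
  .settings(
    name := "snowflake-loader-myqueue"
  )
// A distroless variant is defined the same way with a distroless Docker base image,
// following the existing *Distroless project definitions.
```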
**Building and testing a module:**
```bash
sbt core/clean core/compile core/test
sbt scalafmt headerCheck
```
**Creating a Docker image:**
```bash
sbt kinesisDistroless/docker:publishLocal
docker run \
-v $(pwd)/config/config.kinesis.hocon:/var/config.hocon \
-e ACCEPT_LIMITED_USE_LICENSE=yes \
snowplow/snowflake-loader-kinesis-distroless:latest \
--config /var/config.hocon
```
**Running specific tests:**
```bash
sbt "core/testOnly *ProcessingSpec"
sbt "kinesis/testOnly *ConfigSpec"
```