Expert guidance for developing, testing, and deploying the Snowplow Snowflake Loader, a functional Scala/Cats Effect streaming application that loads Snowplow enriched events into Snowflake with low latency across Azure, AWS, and GCP.
Provides comprehensive development guidance for the Snowplow Snowflake Loader codebase. When working with this codebase, follow these guidelines:
**Multi-Module Structure:**
- `modules/core` — shared stream-processing logic, configuration, and tests
- `modules/kafka`, `modules/kinesis`, `modules/pubsub` — platform-specific entry points for Azure Event Hubs, AWS Kinesis, and GCP Pub/Sub
- Each platform module is packaged as a standard Ubuntu Docker image and a distroless variant
**Key Components:**
- `Config.scala` — application configuration, decoded with Circe and defaulted in `reference.conf`
- `processing/` — the FS2 pipeline that transforms, batches, and loads events
- `LoaderApp` — abstract application class that each platform module extends with its `source` and `sink` constructors
**Building:**
```bash
sbt compile
sbt {core|kafka|kinesis|pubsub}/compile
```
**Testing:**
```bash
sbt test
sbt {core|kafka|kinesis|pubsub}/test
sbt "core/testOnly *ProcessingSpec"
```
**Code Quality:**
```bash
sbt scalafmt
sbt scalafmtCheck
sbt headerCreate
sbt headerCheck
```
**Standard Ubuntu images:**
```bash
sbt kafka/docker:publishLocal
sbt kinesis/docker:publishLocal
sbt pubsub/docker:publishLocal
```
**Distroless production images:**
```bash
sbt kafkaDistroless/docker:publishLocal
sbt kinesisDistroless/docker:publishLocal
sbt pubsubDistroless/docker:publishLocal
```
**Azure (Event Hubs/Kafka):** use the `kafka` module and its Docker images
**AWS (Kinesis):** use the `kinesis` module and its Docker images
**GCP (Pub/Sub):** use the `pubsub` module and its Docker images
The application follows this flow:
1. Read enriched events from cloud message queue (Event Hubs/Kinesis/Pub/Sub)
2. Transform events to Snowflake-compatible format
3. Batch events for efficient loading
4. Detect schema changes and alter tables as needed
5. Upload via Snowflake channels (parallel, transactional)
6. Retry transient failures with backoff
7. Write failed events to bad rows sink
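As a rough illustration of how these steps map onto the codebase's FS2/Cats Effect style, here is a minimal sketch; all type and function names below (`Event`, `Row`, `BadRow`, `pipeline`, the batching limits) are hypothetical stand-ins, not the loader's actual API:
```scala
import scala.concurrent.duration._
import cats.effect.Async
import fs2.{Pipe, Stream}

// Hypothetical stand-ins for the loader's real types
trait Event
trait Row
trait BadRow

// A simplified pipeline mirroring steps 1-7 above
def pipeline[F[_]: Async](
  source: Stream[F, Event],                // 1. events read from the cloud queue
  transform: Event => Either[BadRow, Row], // 2. transform to a Snowflake-compatible row
  load: List[Row] => F[Unit],              // 4-6. schema handling, channel upload, retries
  badSink: Pipe[F, BadRow, Nothing]        // 7. bad rows sink
): Stream[F, Unit] =
  source
    .map(transform)
    .observe(_.collect { case Left(bad) => bad }.through(badSink)) // route failures
    .collect { case Right(row) => row }
    .groupWithin(10000, 5.seconds)         // 3. batch for efficient loading
    .evalMap(batch => load(batch.toList))
```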
**Adding a new configuration field:**
1. Update case class in `Config.scala`
2. Add Circe decoder if custom decoding needed
3. Update `reference.conf` with default value
4. Update example configs in `config/` directory
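A minimal sketch of what these steps can look like; the `Batching` section and its fields are hypothetical examples, not the loader's actual configuration:
```scala
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder

// 1. Add the field to the relevant case class in Config.scala
final case class Batching(maxBytes: Long, uploadParallelism: Int)

object Batching {
  // 2. Semi-automatic derivation usually suffices; write a Decoder by hand
  //    only if the field needs custom decoding
  implicit val batchingDecoder: Decoder[Batching] = deriveDecoder[Batching]
}

// 3. Give the field a default in reference.conf, e.g.
//      batching { uploadParallelism = 3 }
// 4. Mirror the change in the example configs under config/
```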
**Modifying processing logic:**
1. Core logic lives in `modules/core/src/main/scala/com.snowplowanalytics.snowflake/processing/`
2. Use FS2 combinators for stream transformations
3. Ensure proper resource management with `Resource[F, A]`
4. Add unit tests in corresponding test directory
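A minimal sketch of the `Resource` pattern point 3 refers to, assuming a hypothetical channel handle (the real channel and row types differ):
```scala
import cats.effect.{Async, Resource}
import fs2.Stream

// Hypothetical handle for a Snowflake channel
trait Channel[F[_]] {
  def write(rows: List[String]): F[Unit]
  def close: F[Unit]
}

// Acquire the channel as a Resource so it is released even on failure or cancellation
def channelResource[F[_]: Async](open: F[Channel[F]]): Resource[F, Channel[F]] =
  Resource.make(open)(_.close)

// Tie the channel's lifetime to the stream that uses it
def writeAll[F[_]: Async](open: F[Channel[F]], batches: Stream[F, List[String]]): Stream[F, Unit] =
  Stream.resource(channelResource(open)).flatMap(channel => batches.evalMap(channel.write))
```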
**Adding a new cloud platform:**
1. Create new module in `modules/` (follow kafka/kinesis/pubsub pattern)
2. Extend `LoaderApp` abstract class
3. Implement platform-specific `source` and `sink` constructors
4. Add SBT project definition with Docker packaging
5. Create example configs in `config/`
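For step 4, the sbt wiring typically mirrors the existing modules; the sketch below is illustrative only (the module name, settings, and `core` project reference are assumptions — copy from the real kafka/kinesis/pubsub definitions in the build):
```scala
// build.sbt fragment (illustrative): a new platform module depending on core,
// packaged as a Docker image via sbt-native-packager
lazy val myqueue = project
  .in(file("modules/myqueue"))
  .dependsOn(core)
  .enablePlugins(JavaAppPackaging, DockerPlugin)
  .settings(
    name := "snowflake-loader-myqueue"
  )
// A distroless variant is defined the same way with a distroless Docker base image,
// following the existing *Distroless project definitions.
```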
**Building and testing a module:**
```bash
sbt core/clean core/compile core/test
sbt scalafmt headerCheck
```
**Creating a Docker image:**
```bash
sbt kinesisDistroless/docker:publishLocal
docker run \
-v $(pwd)/config/config.kinesis.hocon:/var/config.hocon \
-e ACCEPT_LIMITED_USE_LICENSE=yes \
snowplow/snowflake-loader-kinesis-distroless:latest \
--config /var/config.hocon
```
**Running specific tests:**
```bash
sbt "core/testOnly *ProcessingSpec"
sbt "kinesis/testOnly *ConfigSpec"
```