Part 5/12:
CDC Data Collection: Change Data Capture (CDC) streams originate from the source databases via Debezium, which reads each database's change log and streams row-level insert, update, and delete events downstream.
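A Debezium change event wraps each row change in an envelope with `before`, `after`, `op`, and `ts_ms` fields. A minimal sketch of interpreting one such event (the `orders` table and its columns are hypothetical, and real events also carry a `source` metadata block):

```python
import json

# A simplified Debezium-style change event for a hypothetical `orders` table.
event_json = """
{
  "before": {"order_id": 41, "status": "PLACED"},
  "after":  {"order_id": 41, "status": "DELIVERED"},
  "op": "u",
  "ts_ms": 1700000000000
}
"""

# Debezium op codes: create, update, delete, snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}

def describe(event: dict) -> str:
    """Turn a change event into a human-readable description."""
    kind = OPS[event["op"]]
    # Deletes have no `after` image; fall back to `before`.
    row = event["after"] if event["after"] is not None else event["before"]
    return f"{kind} on order_id={row['order_id']} (status={row.get('status')})"

print(describe(json.loads(event_json)))  # → update on order_id=41 (status=DELIVERED)
```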
Message Queues & Storage: Data is pushed into Kafka (Amazon MSK), with event schemas maintained through a schema registry. From Kafka, data lands in Amazon S3, which acts as a cost-effective, durable external storage layer.
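The Kafka-to-S3 landing zone is typically laid out with date-partitioned object keys so downstream jobs can read incrementally. A stdlib-only sketch of one such key scheme (the `raw/` prefix and naming convention are hypothetical; sink connectors make this configurable):

```python
from datetime import datetime, timezone

def s3_key(topic: str, partition: int, offset: int, ts_ms: int) -> str:
    """Build a date-partitioned object key for a raw CDC record.

    Layout: raw/<topic>/dt=<YYYY-MM-DD>/<partition>-<offset>.json
    (a hypothetical convention for illustration).
    """
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"raw/{topic}/dt={dt}/{partition:03d}-{offset:012d}.json"

print(s3_key("orders.cdc", 3, 4521, 1700000000000))
# → raw/orders.cdc/dt=2023-11-14/003-000000004521.json
```

Encoding the Kafka partition and offset in the key makes writes idempotent: re-delivered records overwrite the same object instead of creating duplicates.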
Data Processing & Transformation: Using Databricks with the Spark and Photon engines, Zepto performs data cleaning, de-duplication, and transformation across the bronze (raw), silver (cleaned), and gold (aggregated) layers of a medallion architecture.
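The bronze-to-silver de-duplication step commonly keeps only the latest change event per key. A pure-Python sketch of that rule (in production this would run as a Spark job; the field names are hypothetical):

```python
def deduplicate(events: list[dict]) -> list[dict]:
    """Keep only the most recent event per order_id, by ts_ms.

    Mirrors the common Spark pattern
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts_ms DESC) = 1.
    """
    latest: dict[int, dict] = {}
    for ev in events:
        key = ev["order_id"]
        if key not in latest or ev["ts_ms"] > latest[key]["ts_ms"]:
            latest[key] = ev
    return sorted(latest.values(), key=lambda e: e["order_id"])

# Raw (bronze) events, including a stale duplicate for order 1.
bronze = [
    {"order_id": 1, "status": "PLACED",    "ts_ms": 100},
    {"order_id": 1, "status": "DELIVERED", "ts_ms": 300},
    {"order_id": 2, "status": "PLACED",    "ts_ms": 200},
]
silver = deduplicate(bronze)
print([e["status"] for e in silver])  # → ['DELIVERED', 'PLACED']
```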
This architecture allows Zepto to:
Scale Indefinitely: Through the open Delta Lake table format on S3 object storage, which decouples storage from compute.
Improve Query Speed: By partitioning data on commonly filtered columns and optimizing file layout in storage.
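Partition pruning is what turns partitioned storage into query speed: a filter on the partition column lets the engine skip entire directories. A minimal stdlib sketch of the idea (the paths are hypothetical):

```python
def prune(paths: list[str], wanted_date: str) -> list[str]:
    """Return only the files under the dt=<wanted_date> partition.

    Real engines (Spark, Photon) prune from table metadata rather than
    string matching, but the effect is the same: partitions that do not
    match the filter are never read.
    """
    return [p for p in paths if f"/dt={wanted_date}/" in p]

files = [
    "gold/orders/dt=2023-11-13/part-0.parquet",
    "gold/orders/dt=2023-11-14/part-0.parquet",
    "gold/orders/dt=2023-11-14/part-1.parquet",
]
print(prune(files, "2023-11-14"))
# → ['gold/orders/dt=2023-11-14/part-0.parquet', 'gold/orders/dt=2023-11-14/part-1.parquet']
```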