RE: LeoThread 2025-10-18 14-48

in LeoFinance · 2 months ago

Part 5/12:

  • CDC Data Collection: Change Data Capture (CDC) streams changes out of source databases via Debezium (DBZ), which reads the databases' change logs and emits each insert, update, and delete as an event.

  • Message Queues & Storage: Events are pushed into Kafka (Amazon MSK), with event schemas enforced through a schema registry. From Kafka, data lands in Amazon S3, which acts as a cost-effective external storage layer.

  • Data Processing & Transformation: Using Databricks with Spark and the Photon engine, Zepto performs data cleaning, de-duplication, and transformation across bronze (raw), silver (cleaned), and gold (aggregated) layers.
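The bronze → silver → gold flow above can be mimicked in plain Python. This is an illustrative sketch only, not Zepto's actual Spark/Databricks code: the event shape loosely follows Debezium's change-event envelope (`op`, `after`, `ts_ms`), and the field names (`id`, `city`, `amount`) are invented for the example.

```python
from collections import defaultdict

def to_silver(bronze_events):
    """Deduplicate raw change events: keep the latest state per primary key."""
    latest = {}
    for ev in sorted(bronze_events, key=lambda e: e["ts_ms"]):
        key = ev["after"]["id"]
        if ev["op"] == "d":          # delete tombstone removes the row
            latest.pop(key, None)
        else:                        # create ("c") or update ("u") replaces it
            latest[key] = ev["after"]
    return list(latest.values())

def to_gold(silver_rows):
    """Aggregate cleaned rows, e.g. totals per city."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)

# Bronze layer: the raw CDC stream as captured, updates included.
bronze = [
    {"op": "c", "ts_ms": 1, "after": {"id": 1, "city": "Mumbai", "amount": 100.0}},
    {"op": "c", "ts_ms": 2, "after": {"id": 2, "city": "Pune",   "amount": 50.0}},
    {"op": "u", "ts_ms": 3, "after": {"id": 1, "city": "Mumbai", "amount": 120.0}},
]

silver = to_silver(bronze)   # latest row per id
gold = to_gold(silver)       # city-level aggregate
print(gold)
```

In the real pipeline the same logic runs as Spark jobs over Delta tables, but the layering idea is identical: bronze keeps everything, silver resolves duplicates and updates, gold serves aggregates.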

This architecture allows Zepto to:

  • Scale Indefinitely: By storing data in the Delta format on S3, decoupling cheap storage from compute.

  • Improve Query Speed: By partitioning data and optimizing the storage layout so queries scan less data.