Part 6/12:
Control Costs: Using budget thresholds and resource tagging.
Enhance Data Governance: Via Unity Catalog for access control and management.
Data Ingestion: From Streams to S3
The ingestion pipeline streams data from various sources into Amazon MSK (Managed Streaming for Apache Kafka), with schema management ensuring data integrity. To land data in Delta Lake, Zepto employs S3 sink connectors: Java-based applications that efficiently transfer data from Kafka to Amazon S3, avoiding unnecessary compute costs.
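To make the connector step concrete, here is a minimal sketch of what a Kafka-to-S3 sink registration might look like. The source does not say which connector implementation or settings Zepto uses. This example assumes the Confluent S3 sink connector running on Kafka Connect, and the connector name, topic names, bucket, region, and flush size are all hypothetical placeholders.

```python
import json


def s3_sink_payload(name, topics, bucket, region, flush_size=10000):
    """Build a Kafka Connect registration payload for an S3 sink.

    Assumption: the Confluent S3 sink connector is used. All argument
    values passed below are illustrative, not taken from the source.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            # Write Parquet files so downstream layers can read them cheaply.
            "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
            "topics": ",".join(topics),
            "s3.bucket.name": bucket,
            "s3.region": region,
            # Number of records written per S3 object before a new one starts.
            "flush.size": str(flush_size),
            "tasks.max": "2",
        },
    }


# Hypothetical values for illustration only.
payload = s3_sink_payload(
    name="orders-s3-sink",
    topics=["orders.cdc", "orders.events"],
    bucket="example-bronze-bucket",
    region="ap-south-1",
)
print(json.dumps(payload, indent=2))
```

A payload of this shape is what Kafka Connect's REST interface accepts when a connector is created. Because the connector only copies records into object storage, no Spark cluster needs to run for the ingestion step itself.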
This process feeds multiple data layers:
Bronze Layer: Raw change logs and event streams.
Silver Layer: Cleaned, schema-validated data suitable for analysis.
Gold Layer: Business-ready datasets for reporting and decision-making.
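The three layers above can be illustrated with a small, self-contained sketch. It uses plain Python rather than Spark or Delta Lake so that it runs anywhere. The record fields (`op`, `order_id`, `city`, `amount`, `ts`) and the revenue-per-city aggregate are assumptions made for illustration, not Zepto's actual schema.

```python
from collections import defaultdict

# Bronze: raw change-log events, kept exactly as they arrived (hypothetical data).
bronze = [
    {"op": "insert", "order_id": 1, "city": "Mumbai", "amount": "250.0", "ts": 1},
    {"op": "update", "order_id": 1, "city": "Mumbai", "amount": "300.0", "ts": 2},
    {"op": "insert", "order_id": 2, "city": "Delhi", "amount": "120.5", "ts": 3},
    {"op": "insert", "order_id": 3, "city": "Delhi", "amount": None, "ts": 4},
    {"op": "delete", "order_id": 2, "city": "Delhi", "amount": "120.5", "ts": 5},
    {"op": "insert", "order_id": 4, "city": "Delhi", "amount": "80.0", "ts": 6},
]

REQUIRED_FIELDS = ("op", "order_id", "city", "amount", "ts")
VALID_OPS = {"insert", "update", "delete"}


def is_valid(event):
    """Schema check: all fields present, known operation, numeric amount."""
    if any(event.get(field) is None for field in REQUIRED_FIELDS):
        return False
    if event["op"] not in VALID_OPS:
        return False
    try:
        float(event["amount"])
    except (TypeError, ValueError):
        return False
    return True


def to_silver(events):
    """Drop invalid events, then replay the change log in timestamp order
    to obtain the latest state of each order."""
    state = {}
    for event in sorted(filter(is_valid, events), key=lambda e: e["ts"]):
        if event["op"] == "delete":
            state.pop(event["order_id"], None)
        else:
            state[event["order_id"]] = {
                "order_id": event["order_id"],
                "city": event["city"],
                "amount": float(event["amount"]),
            }
    return list(state.values())


def to_gold(rows):
    """Business-ready aggregate: total order value per city."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'Mumbai': 300.0, 'Delhi': 80.0}
```

Order 3 is rejected during validation and order 2 is removed by its delete event, so only orders 1 and 4 reach the silver layer. In a real lakehouse each step would typically be a table-to-table job, but the division of responsibility is the same: bronze preserves history, silver enforces the schema and resolves changes, and gold answers a business question.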