RE: LeoThread 2025-10-18 14-48

in LeoFinance · 2 months ago

Part 6/11:

  • This allows the system to handle new event types gracefully, without manual schema tracking.
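As a minimal sketch of this idea (all names here are hypothetical, not from the system described), a new event's field/type signature can be inferred at runtime and used to register a bucket on first sight, so unseen event shapes are absorbed rather than rejected:

```python
import json

def infer_schema(event: dict) -> dict:
    """Map each top-level field to a type name (nested structures elided for brevity)."""
    return {key: type(value).__name__ for key, value in event.items()}

# Hypothetical in-memory registry of schemas seen so far, keyed by field/type set.
known_schemas: dict[frozenset, str] = {}

def bucket_for(event: dict) -> str:
    schema = infer_schema(event)
    key = frozenset(schema.items())
    if key not in known_schemas:
        # A previously unseen event shape: register it instead of failing,
        # so no manual schema tracking is required.
        known_schemas[key] = f"bucket-{len(known_schemas)}"
    return known_schemas[key]

a = bucket_for(json.loads('{"user_id": 1, "country": "US"}'))
b = bucket_for(json.loads('{"user_id": 2, "country": "DE"}'))
c = bucket_for(json.loads('{"user_id": 3, "country": "FR", "device": "ios"}'))
```

Events with the same shape land in the same bucket (`a == b`), while an event carrying a new field gets its own (`c` differs).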

Data Transformation Workflow

The workflow for processing incoming raw events is as follows:

  1. Metadata Extraction: Headers like event source, country, and event type are extracted for contextual information.

  2. Schema Inference & Bucketing: The system infers the schema, compares it with existing ones, and buckets the data accordingly.

  3. Casting & Transformation: The raw JSON data undergoes upcasting (resolving data types to a generic schema) using predefined casting rules.

  4. Compression & Storage: Transformed data is compressed and stored in Apache Parquet format in S3, with file sizes tuned to mitigate the small file problem.
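The four steps above can be sketched end to end. This is an illustrative stand-in, not the system's actual code: the casting rules and header names are assumptions, and stdlib `gzip` stands in for the Parquet write to S3 (which in practice would go through something like `pyarrow.parquet.write_table`):

```python
import gzip
import json

# Hypothetical casting rules that upcast concrete Python types to generic schema types.
CASTING_RULES = {"int": "long", "float": "double", "str": "string", "bool": "boolean"}

def extract_metadata(headers: dict) -> dict:
    # Step 1: pull contextual fields from the message headers (assumed header names).
    return {k: headers.get(k) for k in ("event_source", "country", "event_type")}

def upcast(event: dict) -> dict:
    # Steps 2-3: infer each field's type and resolve it to the generic schema.
    return {k: CASTING_RULES.get(type(v).__name__, "string") for k, v in event.items()}

def process(raw: bytes, headers: dict) -> tuple[dict, dict, bytes]:
    event = json.loads(raw)
    meta = extract_metadata(headers)                       # 1. metadata extraction
    schema = upcast(event)                                 # 2-3. inference + upcasting
    payload = gzip.compress(json.dumps(event).encode())    # 4. compression; a real
    # pipeline would batch events into appropriately sized Parquet files in S3
    return meta, schema, payload
```

Batching before the write matters here: emitting one object per event is exactly what produces the small file problem the section mentions.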