RE: LeoThread 2025-10-18 14-48

in LeoFinance, 2 months ago

Part 5/11:

  • Enormous volume: 57 million events daily from just three key APIs.

  • Cost and performance constraints.

  • Schema variability: Evolving data schemas over time.

  • Small file proliferation: Generating many tiny files that complicate storage and retrieval.

Redbus faced the challenge of storing this raw data cost-effectively while maintaining accessibility and integrity.
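To put the volume figure in perspective, here is a rough back-of-the-envelope sketch. The 57 million events per day comes from the post; the batch sizes used for the file counts are hypothetical values chosen only to show how quickly small files pile up when each file holds few events.

```python
EVENTS_PER_DAY = 57_000_000        # figure quoted in the post
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average arrival rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def files_per_day(events_per_day: int, events_per_file: int) -> int:
    """Number of files written per day if each file holds `events_per_file` events.

    Uses ceiling division so a partially filled last file still counts.
    """
    return -(-events_per_day // events_per_file)


if __name__ == "__main__":
    print(f"average rate: {average_events_per_second(EVENTS_PER_DAY):.0f} events/s")
    # Hypothetical batch sizes, not taken from the post.
    for batch in (1_000, 100_000, 1_000_000):
        print(f"{batch:>9} events/file -> {files_per_day(EVENTS_PER_DAY, batch)} files/day")
```

Real traffic is rarely uniform, so peak rates would be higher than the average shown here. The point is only that small batches turn into tens of thousands of files per day, which is the proliferation problem listed above.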

Exploiting Schema Inference and Optimized Storage

Schema Evolution and Inference

Because API data schemas evolve daily, with new fields and structures being introduced, the team adopted schema inference:

  • When raw data arrives, the system automatically deduces its schema.

  • Each inferred schema is versioned, and storage buckets are created based on timestamps.
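The two steps above can be sketched roughly as follows. This is a minimal illustration, not Redbus's actual implementation: the post does not say which tools or formats the team used, so the function names, the `SchemaRegistry` class, the path layout, and the sample event fields are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def infer_schema(value):
    """Recursively map a decoded JSON value to a simple type description."""
    if isinstance(value, dict):
        return {key: infer_schema(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Simplification: describe a list by the schema of its first element.
        return [infer_schema(value[0])] if value else []
    if isinstance(value, bool):  # checked before int because bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if value is None:
        return "null"
    return "string"


def schema_fingerprint(schema) -> str:
    """Stable short hash of a schema, used to detect when the shape changes."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class SchemaRegistry:
    """Hypothetical in-memory registry: each new schema shape gets the next version."""

    def __init__(self):
        self._versions = {}  # fingerprint -> version number

    def version_for(self, schema) -> int:
        fingerprint = schema_fingerprint(schema)
        if fingerprint not in self._versions:
            self._versions[fingerprint] = len(self._versions) + 1
        return self._versions[fingerprint]


def bucket_path(api_name: str, version: int, epoch_seconds: float) -> str:
    """Hypothetical layout: one bucket per API, schema version, and UTC hour."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{api_name}/schema_v{version}/" + ts.strftime("%Y/%m/%d/%H")


if __name__ == "__main__":
    registry = SchemaRegistry()
    # Made-up events; the second one introduces a new nested field.
    old_event = {"route": "A-B", "fare": 550.0}
    new_event = {"route": "A-B", "fare": 550.0, "operator": {"id": 7}}
    for event in (old_event, new_event):
        version = registry.version_for(infer_schema(event))
        print(bucket_path("search", version, 0))
```

Events with the same shape map to the same version regardless of their values, while a new field produces a new fingerprint and therefore a new version. A production system would also need to persist the registry and decide how to treat optional or mixed-type fields, which this sketch ignores.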