Inference compute refers to the computational resources used to run a trained AI or machine learning model to generate predictions or outputs from new data. It's the "runtime" phase after training, where the model applies its learned patterns, for example classifying images or generating text.

Key differences from training:

  • Training: Builds the model from large datasets; compute-intensive and typically a one-time (or periodic) upfront cost.
  • Inference: Applies the trained model to new inputs, in real time or in batches; optimized for latency, throughput, and cost per request (see the sketch below).
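
As a rough illustration, here is a minimal PyTorch sketch contrasting the two phases. The tiny model, random data, and hyperparameters are hypothetical placeholders, not a specific production setup:

```python
import torch
import torch.nn as nn

# A tiny hypothetical classifier; any trained model is used the same way.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

# --- Training step (illustrative): updates weights via backpropagation ---
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x_train = torch.randn(32, 4)          # batch of 32 example inputs
y_train = torch.randint(0, 3, (32,))  # integer class labels
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x_train), y_train)
loss.backward()                       # gradient computation: the expensive part
optimizer.step()

# --- Inference: apply the trained model to new data ---
model.eval()                          # disable training-only behavior (dropout, etc.)
with torch.no_grad():                 # skip gradient tracking to save compute and memory
    logits = model(torch.randn(1, 4))  # one new input
    prediction = logits.argmax(dim=1)
print(prediction.item())
```

The `eval()` and `no_grad()` calls capture the key difference: inference needs only the forward pass, so it skips the gradient bookkeeping that makes training expensive.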

Inference commonly runs on cloud GPUs/TPUs behind apps like chatbots or recommendation systems. For details, see Cloudflare's guide on AI inference vs. training.