For example, a high-end GPU like the NVIDIA V100 can deliver up to 15.7 teraflops of FP32 compute, but its interconnects can only move data at roughly 100-200 GB/s. The GPU can therefore perform far more operations per second than the interconnects can supply operands for, creating a bottleneck that limits the overall throughput of AI training.
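A quick back-of-the-envelope calculation makes the imbalance concrete. This sketch assumes the figures above (15.7 TFLOPS peak FP32, a mid-range 150 GB/s interconnect) and 4-byte FP32 operands; the exact ratio varies with the real link and precision used:

```python
# Rough compute-vs-bandwidth comparison for a V100-class GPU.
# All figures are approximations taken from the text above.

peak_flops = 15.7e12       # peak FP32 throughput, FLOP/s
link_bw = 150e9            # assumed interconnect bandwidth, bytes/s (midpoint of 100-200 GB/s)
bytes_per_float = 4        # size of one FP32 operand

# How many FP32 values the interconnect can deliver per second
floats_per_sec = link_bw / bytes_per_float

# FLOPs the GPU could execute for each float the link delivers
ratio = peak_flops / floats_per_sec
print(f"~{ratio:.0f} FLOPs available per transferred float")
```

Under these assumptions the GPU can execute on the order of 400 floating-point operations in the time it takes the interconnect to deliver a single operand, which is why communication, not compute, dominates distributed training unless data transfer is carefully overlapped or reduced.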