
RE: LeoThread 2024-10-29 05:12


Nvidia Ethernet technology boosts Elon Musk’s world-class AI training system

Elon Musk recently stated that xAI’s Colossus supercomputer is the “most powerful AI training system in the world.”

US chipmaker Nvidia announced on Monday, October 28, that it has helped Elon Musk’s xAI expand its Colossus supercomputer.

The Colossus supercomputer cluster is now recognized as the largest AI training cluster in the world.

Thanks in part to Nvidia’s Spectrum-X Ethernet networking technology, xAI can take its ChatGPT-rivaling Grok AI to new levels.

#nvidia #elonmusk #ai #xai #colossus #supercomputer #technology


Grok AI: Can generative AI decipher the meaning of life?
Founded by Elon Musk last year, xAI is a startup that offers a service similar to OpenAI’s ChatGPT. In a move typical of Musk, the company has set itself a lofty mission that strikes at the core of our existence. That goal, the company says, is to use generative artificial intelligence “to understand the true nature of the universe.”

Key to achieving that goal is xAI’s Colossus supercomputer. The computing powerhouse was built in Memphis, Tennessee, to train the third generation of Grok. Grok is xAI’s large language model, much like OpenAI’s ChatGPT, and it is available to premium X (formerly Twitter) subscribers.

Impressively, xAI completed Colossus in just 122 days and began training its first models 19 days after installation. According to Nvidia, systems of this scale often take many months, or even years, to build.

Much like ChatGPT, Grok’s large language models are trained by analyzing massive amounts of data, which requires vast computing power. The training data includes text, images, and other content, most of it sourced online.
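To make “analyzing massive amounts of data” concrete, here is a minimal sketch of the next-token-prediction objective behind large language model training, written in PyTorch. It is a toy illustration only: the model, vocabulary size, and random stand-in data are placeholders and have no connection to Grok’s actual architecture or training pipeline.

```python
# Toy sketch of next-token-prediction training (not xAI's setup).
# A real LLM run scales this same cross-entropy objective to billions of
# parameters and trillions of tokens spread across thousands of GPUs.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # placeholder vocabulary
SEQ_LEN = 32
BATCH = 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 64)
        self.lstm = nn.LSTM(64, 64, batch_first=True)
        self.head = nn.Linear(64, VOCAB_SIZE)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.head(hidden)  # logits for the next token at each position

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(100):
    # Random token IDs stand in for real text; shift by one so the model
    # learns to predict token t+1 from tokens 0..t.
    tokens = torch.randint(0, VOCAB_SIZE, (BATCH, SEQ_LEN + 1))
    logits = model(tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The compute demand that motivates clusters like Colossus comes from running this same loop over vastly larger models and datasets, with the gradient work spread across many GPUs.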

In a recent post on X, Elon Musk said, “Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent work by the [xAI] team, NVIDIA, and our many partners/suppliers.”

Now, Nvidia is helping xAI take the world’s most powerful AI training cluster to the next level. In its statement on Monday, the US tech giant said it will help Elon Musk’s xAI double Colossus’s capacity to 200,000 GPUs.

The world’s most advanced AI chatbot
The Colossus AI training cluster comprises an interconnected network of 100,000 Nvidia Hopper GPUs, which communicate over a unified Remote Direct Memory Access (RDMA) network.

The network uses Nvidia’s Spectrum-X technology for low latency. Data moves directly between nodes without passing through the operating system’s network stack, allowing Colossus to handle the massive amounts of data required to train Grok.
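From the application’s point of view, a training job usually reaches an RDMA-capable fabric through a collective-communication library rather than touching the network directly. The sketch below uses PyTorch’s NCCL backend, which picks up RDMA/GPUDirect transports when the fabric exposes them; the launch command, world size, and tensor here are illustrative assumptions, not Colossus’s actual configuration.

```python
# Illustrative multi-node gradient exchange over an RDMA-capable fabric.
# Launch with torchrun, for example:
#   torchrun --nnodes=2 --nproc_per_node=8 allreduce_sketch.py
import os
import torch
import torch.distributed as dist

def main():
    # NCCL selects RDMA/GPUDirect transports automatically when the
    # network (e.g. RoCE over an Ethernet fabric) supports them; the
    # application code is the same either way.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all_reduce sums it across every GPU
    # in the job. On an RDMA fabric this transfer bypasses the host OS.
    grad = torch.ones(1024, device="cuda") * rank
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    if rank == 0:
        print(f"world size {dist.get_world_size()}, grad[0] = {grad[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The point of the kernel bypass is that this collective traffic never detours through the host’s network stack, which is what keeps communication fast enough to keep 100,000 GPUs busy.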

“Across all three tiers of the network fabric, the system has experienced zero application latency degradation or packet loss due to flow collisions,” Nvidia explained in its statement. “It has maintained 95% data throughput enabled by Spectrum-X congestion control.”
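The 95% figure is easy to translate into concrete bandwidth. Purely for illustration, assume a 400 Gb/s Ethernet port per node; the article does not state Colossus’s actual link speed, so the line rate below is an assumption.

```python
# Back-of-the-envelope effective bandwidth at 95% sustained throughput.
# The 400 Gb/s line rate is an assumed value for illustration; only the
# 95% figure comes from Nvidia's statement.
line_rate_gbps = 400      # assumed per-port line rate, gigabits per second
throughput = 0.95         # sustained fraction quoted by Nvidia

effective_gbps = line_rate_gbps * throughput
effective_gigabytes_per_s = effective_gbps / 8  # bits -> bytes

print(f"{effective_gbps:.0f} Gb/s ≈ {effective_gigabytes_per_s:.1f} GB/s per link")
# -> 380 Gb/s ≈ 47.5 GB/s per link
```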

Nvidia describes Spectrum-X as the world’s first Ethernet networking platform for generative AI.

Theoretically, with Nvidia’s help, the expanded Colossus array could eventually achieve about 497.9 exaflops (497,900,000 teraflops). That would set a new benchmark in supercomputing power and make Grok the most impressive AI chatbot on the planet.
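The figure in parentheses is just the metric-prefix conversion: one exaflop is 10^18 floating-point operations per second, a million times a teraflop. A quick check of the quoted number:

```python
# Sanity check of the unit conversion: exa (10**18) is one million times
# tera (10**12), so multiply exaflops by 1,000,000 to get teraflops.
exaflops = 497.9
teraflops = exaflops * 1_000_000
print(f"{exaflops} exaflops = {teraflops:,.0f} teraflops")
# -> 497.9 exaflops = 497,900,000 teraflops
```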