Nvidia fixes Blackwell chip flaw with help from TSMC, mass production back on schedule
Overcoming this issue was crucial for Nvidia, as it aims to maintain its dominant position in the AI chip market.
Nvidia has successfully fixed a design flaw in its latest Blackwell AI chips, according to CEO Jensen Huang. The issue, which caused production delays, has been solved with the assistance of TSMC, Nvidia's long-standing manufacturing partner. In fact, it was TSMC that originally spotted the problem.
As demand for high-performance AI computing solutions continues to surge, the successful launch of Blackwell will play a pivotal role in providing the necessary hardware.
Huang candidly admitted the company's responsibility for the setback. "We had a design flaw in Blackwell," he said. "It was functional, but the design flaw caused the yield to be low. It was 100 percent Nvidia's fault."
The Blackwell chips, unveiled in March, were originally slated for second-quarter shipping. However, the design flaw led to delays, potentially affecting major customers such as Meta, Google, and Microsoft.
The Blackwell project was unusually complex, Huang said, which may have been a factor in the flaw. "In order to make a Blackwell computer work, seven different types of chips were designed from scratch and had to be ramped into production at the same time."
The technical issue stemmed from the intricate packaging technology used in the Blackwell B100 and B200 GPUs. These chips employ TSMC's CoWoS-L packaging, which utilizes an RDL interposer with local silicon interconnect bridges to achieve data transfer rates of about 10 TB/s. The problem arose from a mismatch in thermal expansion properties between various components, causing system warping and failure.
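To give a rough sense of why a thermal expansion mismatch matters in a large package, the sketch below estimates how much two bonded materials would try to move apart as they change temperature. It is a back-of-the-envelope illustration, not Nvidia's or TSMC's analysis: the function names are hypothetical, and the expansion coefficients, temperature swing, and package span are assumed round numbers (silicon at roughly 2.6 ppm/K, an organic substrate at an assumed 15 ppm/K), not measured Blackwell figures.

```python
def mismatch_strain(alpha_a_ppm_per_k, alpha_b_ppm_per_k, delta_t_k):
    """Unconstrained thermal-mismatch strain between two bonded materials.

    Uses the first-order relation strain = |alpha_a - alpha_b| * delta_T,
    with the expansion coefficients given in ppm per kelvin.
    """
    return abs(alpha_a_ppm_per_k - alpha_b_ppm_per_k) * 1e-6 * delta_t_k


def length_mismatch_um(alpha_a_ppm_per_k, alpha_b_ppm_per_k, delta_t_k, span_mm):
    """Difference in free expansion over a given span, in micrometres."""
    strain = mismatch_strain(alpha_a_ppm_per_k, alpha_b_ppm_per_k, delta_t_k)
    return strain * span_mm * 1000.0  # mm -> micrometres


if __name__ == "__main__":
    # All inputs below are illustrative assumptions, not vendor data.
    SILICON_PPM_PER_K = 2.6    # approximate textbook value for silicon
    SUBSTRATE_PPM_PER_K = 15.0 # assumed value for an organic substrate
    DELTA_T_K = 100.0          # assumed temperature swing
    SPAN_MM = 50.0             # assumed package span

    gap = length_mismatch_um(
        SILICON_PPM_PER_K, SUBSTRATE_PPM_PER_K, DELTA_T_K, SPAN_MM
    )
    print(f"Free-expansion mismatch over {SPAN_MM} mm: {gap:.1f} um")
```

With these assumed inputs the two materials would differ in free expansion by about 62 micrometres across the span. Because they are bonded together and cannot move independently, that difference shows up as stress and bending instead, which is the kind of warping the reports describe.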