OCP Educational Webinar: Optical Interconnect for Large Scale AI Clusters
Data and computation are fundamental to the success of AI, which is why hyperscale data center operators are racing to build ever-larger accelerator-based compute clusters. The market jolt caused by LLMs and ChatGPT has further accelerated that race. Scaling clusters requires innovation across many domains: the accelerator chips, their links within a node, the fabrics connecting the nodes, whether InfiniBand- or Ethernet-based, and their topology. The input-output (I/O) growth of each new generation of accelerator, and the growth of bandwidth that must escape a compute node, mean that adopting optical I/O is only a matter of time. Each hyperscale operator has its preference; some back co-packaged optics (CPO), while others favor pluggable optics. Either way, the use of optics in such clusters continues to grow. For example, Google uses its optical circuit switches as part of its TPU v4 accelerator-based AI compute clusters to reduce cost and improve system reliability and flexibility. And Broadcom has already developed CPO-based Tomahawk Ethernet switches; its latest Jericho3-AI chip, developed to scale AI compute clusters, can now also be CPO-based. This webinar will discuss the challenges of scaling AI clusters as well as the emerging solutions for improving optical I/O density, bandwidth, and power efficiency.
During the webinar, you'll learn:
AI workload characteristics
Interconnect requirements of AI workloads
The various Optical Interconnect designs and topologies available to support AI workloads
Optical Interconnect in large-scale AI clusters - how big is the market opportunity?
Who Should Attend?
Switch System Vendors, Silicon Vendors, Optical Transceiver Vendors, Cloud and Telecom Service Providers, IT Buyers and Decision Makers, and Enterprise CIOs and CTOs.
Please visit the OCP Educational Webinar page for more details.