1. Because LLM inference is memory-bandwidth (BW) bound, cluster supercomputing for highly concurrent AI inference is much more efficient than a single server: model weights are loaded from HBM far less often per generated token.
2. Distributed parallel computing is efficient (80%) with tight coupling (large BW, >8 Tb/s), but inefficient (40%) in a loosely coupled cluster because of low inter-node BW.
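The first point can be illustrated with a minimal roofline-style sketch. Every constant below is an illustrative assumption (not a figure from these slides), and the model ignores KV-cache traffic and inter-GPU communication. It only shows why sharing one weight read across many concurrent users raises tokens per GPU.

```python
# Minimal roofline-style sketch of batched decoding.
# All constants are illustrative assumptions, not measurements.
WEIGHT_BYTES = 40e9       # assumed bytes of weights read per decode step
HBM_BYTES_PER_S = 3e12    # assumed HBM bandwidth per GPU
FLOPS_PER_TOKEN = 80e9    # assumed compute per generated token
PEAK_FLOPS = 1e15         # assumed peak compute throughput per GPU


def decode_tokens_per_s(batch_size: int) -> float:
    """Tokens/s when one decode step is shared by `batch_size` users."""
    # Weights are streamed from HBM once per step, whatever the batch size.
    memory_time = WEIGHT_BYTES / HBM_BYTES_PER_S
    # Compute grows linearly with the number of tokens produced per step.
    compute_time = batch_size * FLOPS_PER_TOKEN / PEAK_FLOPS
    # The slower of the two resources sets the step time.
    step_time = max(memory_time, compute_time)
    return batch_size / step_time


for batch in (1, 64, 256):
    print(batch, round(decode_tokens_per_s(batch)))
```

Under these assumptions, throughput grows almost linearly with batch size until compute time exceeds the weight-read time, after which it flattens.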
| Concurrent inference efficiency for DeepSeek-R1 (tokens per GPU), at >30 tokens/s per user | 16 H100 | 320 H100 | 320 H100 with NVL |
|---|---|---|---|
| 5000-token prompt + 1000-token generation | 800 | 3000 | 6000 (BW 1.5 Tb/s) |
| 1000-token prompt + 1000-token generation | 5000 | 15000 (BW 4 Tb/s) | |
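As a quick reading aid, the relative gains in the first table row (5000-token prompt + 1000-token generation) can be computed directly. This is only arithmetic on the figures quoted above. The second row is left out because the source does not say which 320-GPU column its larger value belongs to.

```python
# Tokens per GPU from the first table row
# (5000-token prompt + 1000-token generation).
tokens_per_gpu = {
    "16 H100": 800,
    "320 H100": 3000,
    "320 H100 with NVL": 6000,
}

# Speedup of each configuration relative to the 16-GPU server.
baseline = tokens_per_gpu["16 H100"]
speedup = {name: value / baseline for name, value in tokens_per_gpu.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```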
Moore’s law limits the performance of a single chip, so GPU clusters are the only way forward for AI training, concurrent inference, and AI4S. Interconnect bandwidth (BW) is the NASC (necessary and sufficient condition) for scalability.
Current Cluster Parallelism
DGX GH200
GB200 NVL72
Optical IO chip
Shorten the SerDes-to-optics signal chain into one die, for signal integrity and density
| cake | eat | fit |
|---|---|---|
| Large BW: >4 Tb/s | Range: >100 m (optical fiber link, with WDM to use fewer fibers and thus improve reliability) | Cheap: cost $50, price $300 |
Cutting-Edge Photonic Integration for AI Cluster Interconnection
Disruptive optical materials (LN/SiN/Si) + hybrid bonding with SerDes
850 nm DWDM, for low cost and high yield
12-inch customization.
CMOS foundry compatible.
| Technology feature | TROE solution | Ayar Labs | TROE merits | Current tech |
|---|---|---|---|---|
| Electrical device node; optical material; integration of both | 22 nm electronics; LN-SiN-Si photonics, wafer bonding | 45 nm SOI electronics; silicon photonics, monolithic | Much faster (both electrical and optical) and lower power | 1. Optical modules, 4×800G, more than $2000; 2. Copper cables, more than $500, plus high thermal and power-supply cost |
| Wavelength | 850 nm | 1310 nm | Much cheaper and smaller | |
| WDM | DWDM, 3-fiber coupling | LWDM array, 30-fiber coupling | Much cheaper, with higher yield and reliability; compatible with optical switches | Optical IO is only $500 |
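The cost figures in the table can be put on a common footing as dollars per Tb/s. The sketch below assumes the optical IO delivers the 4 Tb/s quoted as a lower bound elsewhere in this deck, and treats the "more than $2000" module price as exactly $2000. The results are therefore bounds, not precise costs.

```python
def usd_per_tbps(price_usd: float, bandwidth_gbps: float) -> float:
    """Price per Tb/s of bandwidth, given a price and a bandwidth in Gb/s."""
    return price_usd / (bandwidth_gbps / 1000.0)


# Current tech: 4 x 800G optical modules at more than $2000 (lower bound on cost).
modules_usd_per_tbps = usd_per_tbps(2000, 4 * 800)

# Optical IO: $500 at an assumed 4 Tb/s (the deck quotes BW > 4 Tb/s).
optical_io_usd_per_tbps = usd_per_tbps(500, 4000)

print(modules_usd_per_tbps, optical_io_usd_per_tbps)
print(modules_usd_per_tbps / optical_io_usd_per_tbps)
```

With these inputs the modules come out above roughly $625 per Tb/s and the optical IO below roughly $125 per Tb/s, a gap of at least about 5x.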
Low-loss, high-modulation-efficiency LN/SiN waveguide; die-to-wafer bonding and post-via connection with the SerDes wafer
GPT (AI) + AI4S for GPU clusters: 4M chips/year (2024), and 4x that volume in optical IO, a $5 billion/year market
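The market figure can be sanity-checked with simple arithmetic. The sketch assumes "4x volume" means four optical IO units per GPU chip, which is one reading of the line above rather than something the deck states outright.

```python
GPU_CHIPS_PER_YEAR = 4_000_000   # from the slide (2024)
OPTICAL_IO_PER_CHIP = 4          # assumed reading of "4x volume"
MARKET_USD_PER_YEAR = 5e9        # from the slide

# Total optical IO units shipped per year under this assumption.
optical_io_units = GPU_CHIPS_PER_YEAR * OPTICAL_IO_PER_CHIP

# Unit price implied by the stated market size.
implied_price_usd = MARKET_USD_PER_YEAR / optical_io_units

print(optical_io_units, implied_price_usd)
```

Under this reading the implied unit price is about $312, close to the $300 price quoted in the cake/eat/fit table.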
Mr. Zhao, the Manager
Tel: 17280752040
E-mail: zhaozq@xight.ai