Laser Light Computing Cuts AI Workloads to a Single Pass
AI training and inference burn energy on memory traffic and sequential math. A new optical approach collapses that overhead: it encodes matrices into laser light and completes the core calculation in one pass through an optical stack.
The result: matrix multiplications finish in nanoseconds of light propagation with minimal data movement, and real neural networks run without retraining for specialized hardware.
In a nutshell
- Parallel optical matrix-matrix multiplication (POMMM) executes a full matmul via a single propagation of coherent light.
- Running GPU-trained models without modification, it reached 94.44% accuracy on MNIST and 84.11% on Fashion-MNIST; vision transformers showed similar behavior.
- The approach targets the main bottleneck in AI chips: energy-hungry data movement between compute and memory.
- Theoretical analysis indicates multiple orders-of-magnitude gains in parallelism and energy efficiency with purpose-built photonic hardware.
Why light beats electronics for this job
GPUs multiply matrices by iterating through millions of reads, multiplies, accumulations, and writes. Every step moves data and burns power. POMMM sidesteps that pipeline by letting physics do the math.
One matrix is encoded into the amplitude and spatial layout of a laser field. Distinct phase gradients are applied across rows, and cylindrical lenses perform optical Fourier transforms along perpendicular axes. Those transforms separate the partial products into unique spatial locations that a camera captures at once.
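To make the mechanics concrete, here is a toy numerical model in Python/NumPy. It is a conceptual sketch, not the authors' implementation: it assumes one matrix is carried as field amplitude while the other enters through per-row linear phase ramps, and it substitutes a discrete Fourier transform for the cylindrical-lens optics, ignoring camera intensity readout and optical noise.

```python
import numpy as np

def pommm_toy(A, B, pixels_per_element=64):
    """Conceptual single-pass matrix multiply via Fourier separation.

    A: (m, k) real matrix, encoded as per-row phase-ramped amplitudes.
    B: (k, n) real matrix, encoded as the field amplitude along (k, n).
    A discrete FFT along the k-axis stands in for the cylindrical-lens
    Fourier transform; each row's partial sums land in a distinct bin.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    P = pixels_per_element
    assert P > m, "need P > m so the rows separate cleanly"
    x = np.arange(k * P)                  # pixel coordinate along the k-axis
    idx = x // P                          # which matrix element each pixel samples

    # Superpose phase-ramped copies of A's rows: row i rides on frequency i/P.
    ramps = np.exp(2j * np.pi * np.outer(np.arange(m), x) / P)   # (m, k*P)
    field_A = (A[:, idx] * ramps).sum(axis=0)                    # (k*P,)

    # Element-wise modulation by B (upsampled along k) forms the partial products.
    field = field_A[:, None] * B[idx, :]                         # (k*P, n)

    # "Propagation": a Fourier transform along the k-axis sorts the rows apart.
    spectrum = np.fft.fft(field, axis=0)

    # Row i of the product appears at bin i*k (frequency i/P cycles per pixel).
    C = spectrum[np.arange(m) * k, :].real / P
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 10))
B = rng.normal(size=(10, 10))
print(np.max(np.abs(pommm_toy(A, B) - A @ B)))   # ~1e-12: exact up to float error
```

In this idealized discrete model the recovery is exact; the physical system instead reads intensities off a camera and contends with the error sources discussed under the constraints below.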
Across matrix sizes from 10×10 to 50×50, the optical results closely matched GPU outputs with low average error, all completed within a single pass of light.
What the prototype actually did
The lab setup used spatial light modulators (SLMs) to encode inputs on a 532 nm laser, cylindrical lens assemblies for the transforms, and a high-resolution quantitative CMOS camera for readout. Throughput is set by modulator and sensor speeds; the core math happens during propagation.
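Because the multiplication itself takes a single pass of light, sustained throughput is gated by how fast the modulators and the camera can cycle frames. A back-of-envelope sketch, with purely hypothetical device rates rather than figures from the paper:

```python
# Back-of-envelope throughput: the matmul happens in one light pass, so the
# frame rate of the slowest device (SLM or camera) sets sustained throughput.
# All device numbers below are hypothetical placeholders, not from the paper.
def matmul_throughput(m, k, n, frame_rate_hz):
    macs_per_frame = m * k * n          # multiply-accumulates per single pass
    return macs_per_frame * frame_rate_hz

# e.g. a 50x50 by 50x50 product at an assumed 10 kHz modulator/camera frame rate
print(f"{matmul_throughput(50, 50, 50, 10_000):.2e} MAC/s")   # ~1.25e9 MAC/s
```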
For complex-valued ops, the team demonstrated wavelength multiplexing: encoding real and imaginary parts at 540 and 550 nm and processing them in parallel. This points toward single-shot tensor operations across multiple wavelengths for higher-dimensional data common in deep learning.
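One way to read the wavelength trick: a complex matrix product decomposes into four real-valued products that could, in principle, be carried on parallel wavelength channels and recombined electronically. The snippet below shows only that algebraic decomposition; how the experiment actually distributes the work across the 540 nm and 550 nm channels is not spelled out here, so the channel assignment in the comments is an assumption.

```python
import numpy as np

def complex_matmul_two_channels(A, B):
    """Split a complex matmul into real products, as two wavelength channels
    might carry real and imaginary encodings in parallel (illustrative only)."""
    Ar, Ai = A.real, A.imag            # e.g. real parts on the 540 nm channel,
    Br, Bi = B.real, B.imag            # imaginary parts on the 550 nm channel
    # Four real matmuls (each a candidate single optical pass), recombined:
    Cr = Ar @ Br - Ai @ Bi
    Ci = Ar @ Bi + Ai @ Br
    return Cr + 1j * Ci

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
B = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
print(np.allclose(complex_matmul_two_channels(A, B), A @ B))   # True
```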
What this means for your lab or stack
- Energy and bandwidth: By minimizing memory traffic for matmuls, POMMM targets the dominant cost in modern AI workloads.
- Model portability: GPU-trained weights ran directly on the optical system, suggesting near-term use as a co-processor without architectural redesign.
- Error-aware training: Training with POMMM-specific error characteristics can offset optical imperfections; think of it as calibration-aware or noise-aware fine-tuning (see the sketch after this list).
- System design: Practical deployments will hinge on fast, low-noise modulators and sensors, stable alignment, and photonic integration to chain layers.
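What error-aware training could look like in practice: inject a model of the optical error into the forward matmuls while fine-tuning, so the weights adapt to the hardware's imperfections. The sketch below is a generic noise-aware fine-tuning pattern in PyTorch, not the authors' procedure; the Gaussian relative-noise model and its magnitude are placeholders for a measured POMMM error profile.

```python
import torch
import torch.nn as nn

class NoisyMatmulLinear(nn.Linear):
    """Linear layer whose forward matmul injects a simple optical-error model.

    Gaussian relative noise is a stand-in for a measured POMMM error profile
    (calibration residuals, crosstalk, readout noise, ...).
    """
    def __init__(self, in_f, out_f, rel_noise=0.02):
        super().__init__(in_f, out_f)
        self.rel_noise = rel_noise

    def forward(self, x):
        y = x @ self.weight.T + self.bias       # the matmul the optics would perform
        if self.training:                       # perturb only while fine-tuning
            y = y + self.rel_noise * y.abs().mean() * torch.randn_like(y)
        return y

# Fine-tune a GPU-trained classifier with the error model in the loop, so the
# weights it converges to tolerate the hardware's imperfections.
model = nn.Sequential(nn.Flatten(), NoisyMatmulLinear(784, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x, labels = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), labels)
loss.backward(); opt.step()
```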
Performance, accuracy, and scope
The neural networks tested included convolutional models for digit and clothing recognition as well as vision transformers, with comparable outcomes. The prototype validated single-pass matmuls on small-to-moderate matrices, while simulations extended to 256×9,216 operations, evidence of scalability beyond the benchtop rig.
The authors argue that a purpose-built platform could push parallelism and efficiency far beyond current optical approaches that require multiple propagations per operation.
Constraints you should factor in
- Cascading layers: Deep networks require chaining multiple optical stages with precise alignment and phase control, an engineering lift beyond single-layer demos.
- Optical limits: Spectral leakage from discrete sampling, finite apertures, and diffraction boundaries affect accuracy and require careful optical design (a small numerical illustration follows this list).
- Real-valued ops: Enabling real-valued computations adds phase modulation steps and deployment complexity versus simpler diffractive vision systems.
- Hardware ceiling: Current experiments top out at 50×50 physical matrices; scaling will need high-quality, large-area SLMs, robust calibration, and integrated photonics.
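The finite-aperture point is easy to reproduce numerically: an ideal linear phase ramp concentrates into a single spot after a Fourier transform, but clipping it with a finite aperture spreads energy into sinc-like sidelobes that would leak into neighboring output positions. A minimal, idealized illustration, not the paper's optical model:

```python
import numpy as np

N = 1024                                   # pixels across the modulator
x = np.arange(N)
ramp = np.exp(2j * np.pi * 8 * x / N)      # ideal linear phase ramp (frequency bin 8)

aperture = np.zeros(N)
aperture[:N // 2] = 1.0                    # a finite aperture that clips half the field

full = np.abs(np.fft.fft(ramp))**2
clipped = np.abs(np.fft.fft(ramp * aperture))**2

# With the full field, essentially all energy sits in bin 8; clipping spreads
# it into sidelobes that would contaminate neighboring output positions.
def off_bin_fraction(power, target=8):
    return 1 - power[target] / power.sum()

print(f"off-target energy, full aperture:    {off_bin_fraction(full):.3f}")
print(f"off-target energy, clipped aperture: {off_bin_fraction(clipped):.3f}")
```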
Bottom line for researchers
If your workloads bottleneck on memory bandwidth and matmul energy, POMMM is worth tracking as a co-processing path. Its strengths are throughput per joule and parallelism; its challenges are integration, stability, and end-to-end system design.
Next steps to watch: multi-wavelength tensor pipelines, photonic integration to stack layers, and training schemes that bake in optical error models. If those mature, the cost per inference or training step could drop sharply for suitable models.
Publication details
Yufeng Zhang, Xiaobing Liu, Chenguang Yang, Jinlong Xiang, Hao Yan, Tianjiao Fu, Kaizhi Wang, Yikai Su, Zhipei Sun and Xuhan Guo. "Direct tensor processing with coherent light." Nature Photonics (November 14, 2025). DOI: 10.1038/s41566-025-01799-7