Moving Beyond Perception: How AFEELA's AI Learns Relationships and Context
October 16, 2025 · #tech
Welcome to the Sony Honda Mobility Tech Blog. This series opens the hood on AFEELA Intelligent Drive, our Advanced Driver-Assistance System (ADAS), and the in-house AI model behind it. The focus: moving from basic detection to contextual reasoning, so the system can interpret how elements in a scene relate and what that implies for driving decisions.
From Perception to Reasoning
AFEELA's model integrates cameras, LiDAR, radar, SD maps, and odometry into a single pipeline. The objective goes beyond "what is that?" to "how do these elements relate and what action should follow?" We refer to this as Contextual AI: fusing multi-sensor signals with scene-level logic so the vehicle can build a coherent picture from partial, noisy inputs.
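To make "a single pipeline" concrete at the interface level, here is a minimal sketch of how the sensor streams listed above might be bundled into one structure that a scene-level reasoning model consumes. The field names, shapes, and the placeholder `build_scene` function are our own illustration, not AFEELA's actual schema.

```python
# Illustrative multi-sensor input bundle (field names and shapes are assumptions,
# not AFEELA's actual interfaces).
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneInputs:
    camera_images: list[np.ndarray]   # one HxWx3 image per camera
    lidar_points: np.ndarray          # (N, 4): x, y, z, reflection intensity
    radar_tracks: np.ndarray          # (M, 4): x, y, range rate, cross-section
    sd_map_lanes: list[np.ndarray]    # coarse lane centerlines from the SD map
    odometry: np.ndarray              # (6,): x, y, yaw, vx, vy, yaw rate

def build_scene(inputs: SceneInputs) -> dict:
    """Placeholder for the single model that turns raw streams into a scene-level
    description of entities and their relations ("how do these elements relate?"),
    rather than a per-sensor list of detections ("what is that?")."""
    return {"entities": [], "relations": []}
```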
Precision from Above: SPAD LiDAR and Sensor Placement
AFEELA 1 carries 40 sensors. A key piece is a Time-of-Flight LiDAR using a Sony-developed Single Photon Avalanche Diode (SPAD) receiver, producing high-density 3D point clouds at up to 20 Hz for detailed mapping.
In testing, SPAD-based LiDAR improved object recognition in low light and at long range. Reflection intensity added another signal that helped the model detect lane markings and separate pedestrians from vehicles with higher fidelity.
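As a rough illustration of how reflection intensity can act as an extra cue, the sketch below thresholds the intensity channel of a point cloud to pull out lane-marking candidates near the road surface. The thresholds and point-cloud layout are assumptions for the example, not values from AFEELA's stack.

```python
import numpy as np

def lane_marking_candidates(points: np.ndarray,
                            intensity_thresh: float = 0.7,
                            max_height: float = 0.3) -> np.ndarray:
    """Select points that are both highly reflective and close to the ground.

    points: (N, 4) array of x, y, z, intensity (intensity normalized to [0, 1]).
    Painted lane markings are retroreflective, so they return a much stronger
    signal than surrounding asphalt; the thresholds here are illustrative.
    """
    bright = points[:, 3] >= intensity_thresh          # strong reflections
    near_ground = np.abs(points[:, 2]) <= max_height   # roughly road-surface height
    return points[bright & near_ground]

# Tiny synthetic example: two reflective ground points, one dim point, one elevated point.
cloud = np.array([
    [ 5.0, 0.1, 0.02, 0.9],   # likely lane paint
    [12.0, 0.0, 0.05, 0.8],   # likely lane paint
    [ 8.0, 0.2, 0.03, 0.2],   # asphalt
    [ 6.0, 1.5, 1.60, 0.9],   # reflective but elevated (e.g. a sign)
])
print(lane_marking_candidates(cloud))  # keeps only the first two points
```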
We also made a deliberate placement choice: LiDAR and cameras are roof-mounted. This provides a wider, unobstructed field of view and reduces blind spots introduced by the body. It's a design trade-off made with performance as the priority.
Topology: Reasoning About How the Scene Fits Together
AFEELA's system models the scene as structured relationships, which we call topology. Example: Lane Topology infers how lanes merge, split, and intersect, and how signs and traffic lights connect to specific lanes. The point is to interpret the scene as a graph, not a list.
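To make "a graph, not a list" concrete, here is a minimal sketch of a lane-topology structure: lanes as nodes, merge/split/successor relations as edges, and traffic lights attributed to the specific lanes they govern. The class and relation names are our own illustration of the idea, not AFEELA's internal representation.

```python
# Minimal lane-topology graph sketch (illustrative structure, not AFEELA's schema).
from collections import defaultdict

class LaneTopology:
    def __init__(self):
        self.edges = defaultdict(list)      # lane_id -> [(relation, lane_id), ...]
        self.controls = defaultdict(list)   # lane_id -> [traffic element ids]

    def connect(self, src: str, dst: str, relation: str) -> None:
        """relation is e.g. 'successor', 'merge_into', or 'split_from'."""
        self.edges[src].append((relation, dst))

    def attribute_signal(self, signal_id: str, lane_id: str) -> None:
        """Bind a detected traffic light or sign to the lane it governs."""
        self.controls[lane_id].append(signal_id)

    def governing_signals(self, lane_id: str) -> list[str]:
        return self.controls[lane_id]

topo = LaneTopology()
topo.connect("lane_1", "lane_3", "merge_into")   # lane 1 merges into lane 3
topo.connect("lane_2", "lane_3", "merge_into")
topo.attribute_signal("traffic_light_A", "lane_3")
print(topo.governing_signals("lane_3"))          # ['traffic_light_A']
```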
Transformers make this possible. With attention, the model learns long-range associations across complex inputs-even when signals are far apart or in different modalities. It links 3D LiDAR lane geometry with 2D camera traffic lights without heavy pre-alignment. This abstraction level raises the bar on data rigor, so we enforce precise modeling rules and labeling guidelines to keep training consistent and reliable.
For background on the architecture, see the original Transformer paper: Attention Is All You Need.
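For intuition, here is a bare-bones scaled dot-product cross-attention in NumPy: queries come from one modality (say, per-lane features derived from LiDAR geometry) and keys/values from another (say, camera traffic-light detections), so the associations are learned rather than hand-aligned. This is the generic mechanism from the paper above, not AFEELA's production model; shapes and feature meanings are placeholders.

```python
import numpy as np

def cross_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention across two modalities.

    queries: (Nq, d)  e.g. per-lane features from LiDAR geometry
    keys:    (Nk, d)  e.g. per-detection features from camera traffic lights
    values:  (Nk, d)  content to aggregate for each query
    Returns (Nq, d): each lane feature enriched with the camera evidence it attends to.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                        # (Nq, Nk) affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                # row-wise softmax
    return weights @ values

rng = np.random.default_rng(0)
lane_feats = rng.normal(size=(4, 32))    # 4 lane queries (placeholder features)
light_feats = rng.normal(size=(6, 32))   # 6 camera traffic-light detections
fused = cross_attention(lane_feats, light_feats, light_feats)
print(fused.shape)  # (4, 32)
```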
Making Transformers Run in Real Time
Transformers are compute- and memory-intensive. Early versions in our stack ran at roughly one-tenth the efficiency of comparable CNNs. The constraint wasn't raw FLOPs; it was memory access. Attention requires frequent large matrix multiplications, triggering constant memory reads and writes that can underutilize the SoC.
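A quick back-of-envelope calculation shows why attention layers tend to be limited by memory traffic: the script below compares the FLOPs of the attention matmuls against the bytes read and written for the intermediate score matrix, then compares that ratio to a hypothetical accelerator's compute-to-bandwidth balance point. The sequence length, head width, precision, and hardware numbers are illustrative, not measurements of AFEELA's SoC.

```python
def attention_flops_per_byte(seq_len: int, head_dim: int, bytes_per_elem: int = 2) -> float:
    """Rough FLOPs-per-byte estimate for one attention head, counting the QK^T and
    (weights @ V) matmuls and the traffic for the (seq_len x seq_len) score/weight
    matrix they produce and consume."""
    flops = 2 * (2 * seq_len * seq_len * head_dim)           # two matmuls, ~2*N*N*d each
    score_bytes = 2 * seq_len * seq_len * bytes_per_elem     # write scores, read weights back
    return flops / score_bytes

# Illustrative numbers only: 4096 fused scene tokens, 64-wide heads, fp16 intermediates.
layer_intensity = attention_flops_per_byte(4096, 64)

# Hypothetical accelerator roofline: 30 TOPS of compute vs 60 GB/s of DRAM bandwidth.
hardware_budget = 30e12 / 60e9   # FLOPs the chip can perform per byte it can move

print(f"attention: {layer_intensity:.0f} FLOPs/byte, "
      f"hardware balance point: {hardware_budget:.0f} FLOPs/byte")
# When the layer's ratio sits far below the balance point, throughput is capped by
# memory traffic rather than raw compute, which is the behavior described above.
```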
We partnered with Qualcomm to tune architecture and execution. Iterative optimization delivered a 5× efficiency gain over our baseline, enabling large-scale models to run in real time inside AFEELA's ADAS. There's still a gap versus CNNs, and work continues to close it through deeper architectural and runtime improvements. For context on the vehicle compute platform category, see Qualcomm's automotive AI stack overview: Qualcomm ADAS.
Multi-Modal Integration That Holds Up on Real Roads
Road scenes change quickly: lighting, weather, and surface conditions vary. AFEELA's model fuses LiDAR, radar, and SD maps into one reasoning layer. Cross-verification across sources increases accuracy and stability, even when one modality degrades.
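One highly simplified way to picture cross-verification: weight each modality's estimate by its current confidence, so a degraded sensor (for example, a camera in glare) contributes less without being dropped outright. The weighting scheme below is a generic illustration, not the fusion logic AFEELA actually uses.

```python
import numpy as np

def cross_verify(estimates: dict[str, np.ndarray],
                 confidences: dict[str, float]) -> np.ndarray:
    """Confidence-weighted consensus over per-modality estimates of the same quantity
    (e.g. distance to a lead vehicle). Low-confidence modalities, such as a camera
    blinded by low sun, are down-weighted rather than ignored."""
    names = list(estimates)
    weights = np.array([confidences[n] for n in names], dtype=float)
    weights /= weights.sum()
    stacked = np.stack([estimates[n] for n in names])
    return (weights[:, None] * stacked).sum(axis=0)

fused = cross_verify(
    estimates={"lidar": np.array([42.3]), "radar": np.array([41.8]), "camera": np.array([55.0])},
    confidences={"lidar": 0.9, "radar": 0.8, "camera": 0.1},   # camera degraded by glare
)
print(fused)  # dominated by the agreeing LiDAR/radar estimates
```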
By linking perception, topology, and decision context, the system aims for real-world usable intelligence, not just benchmarks.
Practical Notes for Engineers
- Treat the scene as a graph. Lane attribution (signals-to-lanes, merges/splits) is a high-leverage capability for policy and planning.
- Use cross-modal attention to reduce pre-alignment overhead, but invest heavily in consistent labels and a clear ontology. Data rules are a feature, not an afterthought; a minimal label-consistency sketch follows this list.
- Profile memory bandwidth early. Attention layers can be bound by reads/writes more than compute. Optimize op fusion, tiling, and caching with your vendor toolchain.
- Sensor placement is an algorithm decision. Field of view and occlusion patterns can simplify downstream reasoning and improve model reliability.
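Following up on the labeling point above: even a lightweight automated check over annotation files catches ontology drift early. The sketch below validates two illustrative rules, that every signal-to-lane attribution references a lane that exists and that relation names stay within an agreed vocabulary. The rule set and label format are hypothetical, not AFEELA's guidelines.

```python
# Hypothetical label-consistency check (illustrative rules and label format).
ALLOWED_RELATIONS = {"successor", "merge_into", "split_from"}

def validate_scene_labels(label: dict) -> list[str]:
    """Return a list of human-readable violations for one annotated scene."""
    errors = []
    lane_ids = {lane["id"] for lane in label.get("lanes", [])}
    for rel in label.get("lane_relations", []):
        if rel["type"] not in ALLOWED_RELATIONS:
            errors.append(f"unknown relation type: {rel['type']}")
        if rel["src"] not in lane_ids or rel["dst"] not in lane_ids:
            errors.append(f"relation {rel} references a missing lane")
    for sig in label.get("signals", []):
        if sig["controls_lane"] not in lane_ids:
            errors.append(f"signal {sig['id']} attributed to unknown lane {sig['controls_lane']}")
    return errors

scene = {
    "lanes": [{"id": "lane_1"}, {"id": "lane_2"}],
    "lane_relations": [{"type": "merge_into", "src": "lane_1", "dst": "lane_2"}],
    "signals": [{"id": "tl_7", "controls_lane": "lane_3"}],   # typo: lane_3 does not exist
}
print(validate_scene_labels(scene))
```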
What's Next
In the next post, we'll share how we're improving learning efficiency across modalities and tasks.
Note: The statements and information above are based on development-stage data.