Research Day: Is Sensor Fusion Better Than Computer Vision Alone?
- Raffay Hassan
- Feb 4
- 2 min read
One of the key questions explored in this project is whether sensor fusion provides a more reliable collision-prevention strategy than relying solely on computer vision. With modern deep-learning models such as YOLO achieving strong real-time performance, it is reasonable to question whether additional sensors are necessary.
Computer vision is highly effective at identifying what is present in the environment. Vision-based models can detect and classify vehicles, pedestrians, and obstacles with high accuracy under favourable conditions. In clear lighting and low-complexity scenes, vision-only systems can perform well and are widely used in many perception pipelines.
However, collision prevention depends not only on recognising objects, but also on understanding distance, relative speed, and timing. Cameras measure pixel coordinates rather than physical units, so depth and velocity must be inferred indirectly from cues such as apparent size and frame-to-frame motion. This indirection introduces uncertainty, particularly in situations involving poor lighting, glare, fast-moving objects, or crowded scenes (Geiger et al., 2012).
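To make the indirection concrete, a common vision-only trick estimates Time-To-Collision from how fast an object's apparent size grows between frames (since apparent width scales inversely with distance, TTC ≈ w / (dw/dt)). The sketch below is illustrative only, under that assumption; the function name and inputs are hypothetical, not code from this project:

```python
def ttc_from_scale(w_prev: float, w_curr: float, dt: float) -> float:
    """Vision-only TTC estimate from bounding-box expansion.

    w_prev, w_curr: apparent object width (pixels) in two frames dt seconds apart.
    Since width ~ 1/distance, TTC ~ w / (dw/dt). No metric depth is needed,
    but the estimate is sensitive to detection jitter in w.
    """
    dw = w_curr - w_prev
    if dw <= 0:
        # Object not expanding in the image: no approach detected.
        return float("inf")
    return w_curr * dt / dw


# A box growing from 100 px to 110 px over 0.1 s implies roughly 1.1 s to contact.
print(ttc_from_scale(100.0, 110.0, 0.1))  # → 1.1
```

Note that a few pixels of bounding-box noise can swing this estimate substantially, which is exactly the uncertainty the post describes.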
Sensor fusion addresses these limitations by combining complementary sensing modalities. In this project, vision is used to identify and classify objects, LiDAR provides accurate metric distance measurements, and mmWave radar supplies relative velocity information. When fused, these sensors enable more reliable estimation of Time-To-Collision (TTC) using physical measurements rather than visual assumptions. This multi-sensor approach reflects established automotive safety practices and aligns with real-world ADAS design principles (Hasirlioglu et al., 2020).
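With fused sensing, the same quantity reduces to simple physics: metric range divided by closing speed. The sketch below shows this idea in minimal form; the function name and the assumption of a single pre-associated range/velocity pair are mine, not the project's actual fusion pipeline:

```python
def ttc_fused(range_m: float, closing_speed_mps: float) -> float:
    """TTC from a LiDAR range measurement and a radar relative velocity.

    range_m: metric distance to the object (metres), e.g. from LiDAR.
    closing_speed_mps: rate at which the gap shrinks (m/s), e.g. from
    mmWave radar Doppler; positive means the object is approaching.
    """
    if closing_speed_mps <= 0:
        # Object holding distance or receding: no collision predicted.
        return float("inf")
    return range_m / closing_speed_mps


# 30 m ahead, closing at 10 m/s: 3 s to collision.
print(ttc_fused(30.0, 10.0))  # → 3.0
```

Because both inputs are direct physical measurements, this estimate does not degrade with lighting or image clutter in the way the scale-based vision estimate does.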
A sensor-driven digital twin is used to evaluate both approaches under identical simulated conditions. This allows direct comparison between vision-only and fused-sensor systems in terms of TTC accuracy, braking behaviour, and overall robustness, while avoiding the risks of real-world testing. The results so far suggest that while computer vision is a powerful perception tool, sensor fusion provides a more robust and safety-oriented solution for collision prevention, particularly in complex or degraded environments.
In short: vision explains what is happening, but sensor fusion helps determine how dangerous the situation really is.
References
Geiger, A., Lenz, P. and Urtasun, R. (2012) ‘Are we ready for autonomous driving? The KITTI vision benchmark suite’, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361.
Hasirlioglu, S., Kamann, A., Doric, I. and Brandmeier, T. (2020) ‘Test methodology for rain influence on automotive surround sensors’, IEEE Intelligent Vehicles Symposium, pp. 2242–2247.
Grieves, M. and Vickers, J. (2017) ‘Digital Twin: Mitigating unpredictable, undesirable emergent behavior in complex systems’, in Transdisciplinary Perspectives on Complex Systems. Springer, pp. 85–113.