PanoRadar: Enabling Visual Recognition at Radio Frequency

Haowen Lai     Gaoxiang Luo     Yifei Liu     Mingmin Zhao
University of Pennsylvania
MobiCom 2024
🌟 Best Demo Award!
🌟 1st Place in Student Research Competition!

PanoRadar is a novel RF imaging system capable of 3D panoramic sensing. Its high-resolution imaging results enable a variety of visual recognition tasks at radio frequency.

Abstract

PanoRadar is the first RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection.

PanoRadar utilizes a rotating single-chip mmWave radar, along with a combination of novel signal processing and machine learning algorithms, to create high-resolution 3D images of the surroundings. Our system accurately estimates robot motion, allowing for coherent imaging through a dense grid of synthetic antennas. It also exploits the high azimuth resolution to enhance elevation resolution using learning-based methods. Furthermore, PanoRadar tackles 3D learning via 2D convolutions and specifically addresses challenges due to the unique characteristics of RF signals. Our results demonstrate PanoRadar's robust performance across 12 buildings.

Demo Video

System Overview

This figure illustrates the system architecture of PanoRadar, which consists of four components:

  1. 3D imaging with a rotating radar: the radar's 8 antennas are arranged vertically, forming a dense cylindrical synthetic array (8×1200) as the radar rotates.
  2. Motion estimation and compensation: our novel algorithms accurately estimate robot motion and compensate for it during imaging, enabling deployment on mobile robot platforms (see the sketch after this list).
  3. Vertical resolution enhancement: due to the limited number of vertical antennas, a resolution enhancement model is used for range image estimation. Our ML model efficiently learns 3D structures with 2D convolutions.
  4. Downstream visual recognition tasks: the high-resolution range images enable various downstream recognition tasks, including surface normal estimation, semantic segmentation, object detection, and human localization.
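To make steps 1 and 2 concrete, below is a minimal delay-and-sum sketch of coherent imaging over the synthetic cylindrical array, with estimated platform motion compensated before the coherent sum. The function name, the 77 GHz carrier, and the translation-only motion model are illustrative assumptions, not the released pipeline.

```python
# Illustrative delay-and-sum imaging over the rotating cylindrical array.
# The names, the 77 GHz carrier, and the translation-only motion model are
# assumptions for illustration; the actual signal processing pipeline differs.
import numpy as np

C = 3e8            # speed of light (m/s)
FC = 77e9          # assumed single-chip mmWave carrier frequency (Hz)
LAMBDA = C / FC    # wavelength (~3.9 mm)

def beamform_cylindrical(iq, ant_pos, motion, directions):
    """Coherently combine one range bin across the synthetic array.

    iq:         (8, 1200) complex samples: 8 vertical antennas x 1200
                rotation angles, i.e., the dense cylindrical array
    ant_pos:    (8, 1200, 3) nominal 3D position of each synthetic antenna
    motion:     (1200, 3) estimated robot translation at each rotation step
    directions: (n_beams, 3) unit vectors for the azimuth/elevation beams
    returns:    (n_beams,) complex beamformed image values
    """
    # Motion compensation: re-align antenna positions with the estimated
    # platform translation so the samples can be combined coherently.
    pos = ant_pos - motion[None, :, :]
    # Far-field phase of each antenna toward each beam direction.
    phase = 2 * np.pi / LAMBDA * np.einsum('ead,bd->eab', pos, directions)
    # Conjugate (matched-filter) weights align returns from each direction.
    weights = np.exp(-1j * phase)
    # Coherent sum over all 8 x 1200 synthetic antenna positions.
    return np.einsum('ea,eab->b', iq, weights)
```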

Enhancing Elevation Resolution

Through signal processing alone, PanoRadar achieves fine-grained resolution in both the azimuth and range dimensions. However, the elevation resolution remains limited compared to the other two dimensions. Results from signal processing alone are marked with "After SP" below.

Given the structural properties inherent in 3D environments, spatial dimensions are not independent. For instance, consistent depth cues from surfaces, as well as gravity constraints (e.g., humans and objects need support and stand on the floor), provide cross-dimensional information. We leverage machine learning to make use of these cross-dimensional dependencies for elevation resolution enhancement. Final imaging results are marked with "After ML".
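As a rough illustration of how 2D convolutions on range images can exploit these dependencies, here is a minimal PyTorch sketch that upsamples only the elevation axis. The architecture, the `ElevationEnhancer` name, and the 8-to-64 upsampling factor are assumptions for illustration; the model in the paper is more sophisticated.

```python
# Illustrative elevation-enhancement model: a tiny 2D CNN that upsamples the
# elevation axis of a range image. The architecture and shapes are assumptions
# for illustration only; the actual model is more sophisticated.
import torch
import torch.nn as nn

class ElevationEnhancer(nn.Module):
    def __init__(self, up_factor: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Upsample only the elevation (height) axis; azimuth is already
            # fine-grained from signal processing.
            nn.Upsample(scale_factor=(up_factor, 1), mode='bilinear',
                        align_corners=False),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_elev, n_azimuth) range image after signal processing
        return self.net(x)

# Example: 8 coarse elevation rows (one per antenna) -> 64 rows.
model = ElevationEnhancer(up_factor=8)
enhanced = model(torch.rand(1, 1, 8, 1200))   # -> (1, 1, 64, 1200)
```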

The imaging results are presented in both range image and point cloud views, labeled "After SP" and "After ML" as above. For point clouds produced by ML, each point is color-coded by its predicted semantic category.

Code and Dataset



Our system is evaluated across 12 diverse buildings, demonstrating its feasibility, accuracy, and robustness in various environments. All ML methods are evaluated with a cross-building approach to assess the generalization of our model: we hold out one building for testing and train on the remaining eleven, repeating this process 12 times, once for each building.
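The leave-one-building-out protocol can be summarized with the following sketch; `train_fn` and `eval_fn` are hypothetical placeholders standing in for actual training and evaluation routines.

```python
# Sketch of the leave-one-building-out protocol. `train_fn` and `eval_fn`
# are hypothetical placeholders for actual training/evaluation routines.
from typing import Callable, Dict, Sequence

def cross_building_eval(buildings: Sequence[str],
                        train_fn: Callable,
                        eval_fn: Callable) -> Dict[str, float]:
    """Hold out each building once, train on the rest, record the score."""
    scores = {}
    for held_out in buildings:
        train_split = [b for b in buildings if b != held_out]
        model = train_fn(train_split)                 # train on 11 buildings
        scores[held_out] = eval_fn(model, held_out)   # test on the 12th
    return scores

# e.g., scores = cross_building_eval([f"building_{i:02d}" for i in range(12)],
#                                    train_fn=..., eval_fn=...)
```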

We release the code and dataset to facilitate future research in this direction. Instructions can be found in the README of our GitHub repo.

BibTeX

@inproceedings{panoradar,
  title={Enabling Visual Recognition at Radio Frequency},
  author={Lai, Haowen and Luo, Gaoxiang and Liu, Yifei and Zhao, Mingmin},
  booktitle={Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (MobiCom)},
  pages={388--403},
  year={2024}
}

Acknowledgments

This work was carried out in the WAVES Lab, University of Pennsylvania. We sincerely thank the anonymous reviewers and our shepherd for their insightful comments. We are grateful to Xin Yang, Zitong Lan, Dongyin Hu, Ahhyun Yuh, and Zhiwei Zheng for their feedback. We also thank Yiqiao Liao for his contributions during the early development of this project.


This project page template is adapted from Nerfies.