Abstract
Machine vision systems using convolutional neural
networks (CNNs) for robotic applications are increasingly being
developed. Conventional vision CNNs are driven by camera
frames at a constant sample rate, resulting in a fixed latency and
power consumption tradeoff. This paper extends the first experiments
with a closed-loop robotic system that integrates
a CNN with a Dynamic and Active Pixel Vision Sensor
(DAVIS) in a predator/prey scenario. The DAVIS, mounted on the
predator Summit XL robot, produces frames at a fixed 15 Hz
frame rate and Dynamic Vision Sensor (DVS) histograms
containing 5k ON and OFF events at a variable frame rate ranging
from 15 Hz to 500 Hz, depending on the robot speeds. In contrast to
conventional frame-based systems, the latency and processing cost
depend on the rate of change of the image. The CNN is trained
offline on a 1.25 h labeled dataset to recognize the position and
size of the prey robot in the field of view of the predator. During
inference, combining the ten output classes of the CNN allows
extracting the analog position vector of the prey relative to the
predator with a mean 8.7% error in angular estimation. The
system is compatible with conventional deep learning technology,
but achieves a variable latency-power tradeoff that adapts
automatically to the scene dynamics. Finally, the robustness of the
algorithm is investigated, and a human performance comparison
and a deconvolution analysis are presented.
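The variable frame rate described above comes from triggering a frame every fixed number of DVS events rather than on a fixed clock. A minimal sketch of this constant-event-count accumulation is shown below; the function name, the event tuple layout `(x, y, polarity)`, and the 180×240 sensor shape are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def events_to_histograms(events, sensor_shape=(180, 240), count=5000):
    """Accumulate DVS events into two-channel (OFF/ON) histograms,
    emitting one frame every `count` events. The effective frame
    rate therefore scales with scene activity: fast robot motion
    produces events (and frames) more often than a static scene.

    events: iterable of (x, y, polarity) tuples, polarity in {0, 1}.
    """
    frames = []
    hist = np.zeros((2, *sensor_shape), dtype=np.uint16)
    n = 0
    for x, y, p in events:
        hist[p, y, x] += 1  # bin the event by polarity and pixel
        n += 1
        if n == count:      # constant-count frame boundary reached
            frames.append(hist)
            hist = np.zeros_like(hist)
            n = 0
    return frames
```

Because each histogram always contains the same number of events, the CNN input statistics stay roughly constant while latency adapts to the dynamics, which is the tradeoff the abstract highlights.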
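The abstract states that the ten discrete output classes are combined into an analog position vector. One common way to do this, shown as a hedged sketch below, is a probability-weighted average of per-class center angles (a soft argmax); the class-center values and the softmax combination are assumptions for illustration, since the abstract does not specify the exact scheme.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def analog_angle(logits, centers_deg):
    """Combine discrete class outputs into a continuous angle by
    weighting each class-center angle with its softmax probability.

    logits: raw CNN outputs, one per class.
    centers_deg: assumed angular center of each class, in degrees.
    """
    p = softmax(np.asarray(logits, dtype=float))
    return float(p @ np.asarray(centers_deg, dtype=float))
```

With ten classes tiling the field of view, this interpolation yields sub-class angular resolution, consistent with reporting a mean percentage error in angular estimation rather than a classification accuracy.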
Original language | English |
---|---|
Number of pages | 8 |
Publication status | Accepted/In press - 19 May 2018 |
Event | 4th International Conference on Event-Based Control, Communication and Signal Processing, Perpignan, France, 27 Jun 2018 → 29 Jun 2018 (conference number: 4), https://www.ebccsp2018.org |
Conference
Conference | 4th International Conference on Event-Based Control, Communication and Signal Processing |
---|---|
Abbreviated title | EBCCSP 2018 |
Country/Territory | France |
City | Perpignan |
Period | 27/06/18 → 29/06/18 |
Internet address | https://www.ebccsp2018.org |