ML-TN-002 - Real-time Social Distancing estimation
{{AppliesToMachineLearning}}
{{AppliesTo Machine Learning TN}}
{{AppliesTo ORCA TN}}
{{AppliesTo ORCA SBC TN}}
{{InfoBoxBottom}}
==The hardware/software platform==
The choice fell on the [https://wiki.dave.eu/index.php/ORCA_SBC ORCA SBC], which is powered by the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS NXP i.MX8M Plus SoC]. This industrial/automotive-grade SoC is built around a 4-core ARM Cortex-A53 CPU and features a rich set of peripherals and subsystems. It also integrates a 2.3 TOPS Neural Processing Unit (NPU) and native interfaces for connecting image sensors, making it well suited for computer vision applications.
The system software is a Yocto Linux distribution derived from the [https://www.nxp.com/design/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX NXP 5.4.70_2.3.0] BSP. In addition to the default packages, a number of libraries were added to satisfy the application's requirements.
As stated previously, the main application derives from the IIT Social-Distancing project. It was developed in several steps, starting when only a few alpha samples of the i.MX8M Plus were available, thanks to the fact that DAVE Embedded Systems had joined the component's beta program.
==== Step #1 ====
The first step was conducted using the official evaluation kit (EVK) by NXP. The goal was to make the Social-Distancing project work on this platform while maintaining its core functionality. In essence, the code was modified to replace the [https://github.com/CMU-Perceptual-Computing-Lab/openpose OpenPose library] with [https://github.com/tensorflow/tfjs-models/tree/master/posenet PoseNet]. This was required to cope with the operations actually supported by the [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ NXP eIQ] software stack and the NPU. For those who are familiar with embedded software development, this should be unsurprising: when porting applications from PC-like platforms to embedded platforms, handling such hardware/software constraints is common practice.
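To give an idea of what this translates into in practice, the following sketch shows how a PoseNet TensorFlow Lite model can be offloaded to the NPU through the delegate mechanism provided by the eIQ stack. The model file name, the delegate library path, and the pre-processing details are assumptions made for illustration purposes and may differ from the actual application and from one BSP version to another.

<syntaxhighlight lang="python">
import cv2
import numpy as np
import tflite_runtime.interpreter as tflite

# Offload inference to the NPU through an external delegate.
# The library path below is an assumption and may differ per BSP release.
delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")

interpreter = tflite.Interpreter(
    model_path="posenet_mobilenet_v1.tflite",  # hypothetical model file name
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()


def estimate_poses(frame_bgr):
    """Run one BGR frame through PoseNet and return the raw output tensors."""
    h, w = inp["shape"][1], inp["shape"][2]
    resized = cv2.resize(frame_bgr, (w, h))
    interpreter.set_tensor(inp["index"], resized[np.newaxis].astype(inp["dtype"]))
    interpreter.invoke()
    return [interpreter.get_tensor(o["index"]) for o in outs]
</syntaxhighlight>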
It is worth remembering that, even though OpenPose was replaced, the software interface between the high-level layers and PoseNet was not altered, allowing these layers to remain untouched.
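Conceptually, this decoupling can be pictured as a thin adapter: the high-level layers keep calling the same interface regardless of the pose-estimation backend. The class and method names below are purely illustrative and are not taken from the project sources.

<syntaxhighlight lang="python">
class PoseEstimator:
    """Illustrative adapter: the high-level layers only depend on this interface."""

    def __init__(self, backend):
        # backend is any object exposing estimate_poses(frame),
        # e.g. the PoseNet/TFLite wrapper sketched above.
        self._backend = backend

    def keypoints(self, frame):
        # Same call signature the OpenPose-based version exposed,
        # so the distance-estimation layers stay untouched.
        return self._backend.estimate_poses(frame)
</syntaxhighlight>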
==== Step #2 ====
Step #2 consisted of implementing a number of optimizations to increase the overall frame rate, most notably the parallelization of the processing pipeline.
[[File:Ss-main-pipeline-v2-20210204.png|center|thumb|600x600px|Processing pipeline after implementing parallel computations]]
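One possible way to structure such parallel computations is a producer/consumer pipeline in which video decoding, pose estimation, and distance computation/rendering run concurrently in separate workers connected by queues. The following Python sketch only illustrates the idea; the actual partitioning implemented in the application may differ, and <code>estimate_poses</code> is just a placeholder for the inference routine sketched in Step #1.

<syntaxhighlight lang="python">
import queue
import threading

import cv2


def estimate_poses(frame):
    """Placeholder for the PoseNet inference sketched in Step #1."""
    return []


frames_q = queue.Queue(maxsize=4)   # decoded frames waiting for inference
results_q = queue.Queue(maxsize=4)  # inference results waiting for post-processing


def decode_stage(capture):
    # Stage 1: pull frames from the video source.
    while True:
        ok, frame = capture.read()
        if not ok:
            frames_q.put(None)       # end-of-stream marker
            break
        frames_q.put(frame)


def inference_stage():
    # Stage 2: run pose estimation on each decoded frame.
    while True:
        frame = frames_q.get()
        if frame is None:
            results_q.put(None)
            break
        results_q.put((frame, estimate_poses(frame)))


def postprocess_stage():
    # Stage 3: compute inter-person distances and draw the overlay.
    while True:
        item = results_q.get()
        if item is None:
            break
        frame, poses = item
        # ... distance estimation and rendering would go here ...


capture = cv2.VideoCapture("input.mp4")  # hypothetical input file
stages = [
    threading.Thread(target=decode_stage, args=(capture,)),
    threading.Thread(target=inference_stage),
    threading.Thread(target=postprocess_stage),
]
for t in stages:
    t.start()
for t in stages:
    t.join()
</syntaxhighlight>

With this scheme, decoding, inference, and rendering overlap instead of executing sequentially for every frame, which is what allows the workload to spread across the four ARM cores.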
====Step #3====
In this step, the application was migrated to the definitive hardware platform, the aforementioned ORCA SBC, which was designed while the software team was working on the EVK.
==Testing==
The following clip shows the application running on the ORCA SBC.
{| class="wikitable" width="100%"
| {{#ev:youtube|HAAH2bTVrXM|600|center|Social Distancing application running on ORCA SBC|frame}}
|}
In this example, the system was fed with a 640x360 @ 25 fps stream. On average, the frame rate of the processed stream is 23 fps.
The following screenshot illustrates the CPU load during the execution of the application. As expected, the four ARM cores are almost fully loaded because of the parallel computations implemented in the algorithm.
 
[[File:Social-distancing-htop1.png|center|thumb|600px|CPU load during the execution of the application]]
== Future work ==
For convenience, this test was run using an MPEG4 video file as input. The well-known [https://opencv.org/ OpenCV] libraries were used to decompress the video and to retrieve the frames. At the time of this writing, these libraries did not support the i.MX8M Plus's hardware video decoder. As such, it should be taken into account that video decompression is carried out by the ARM cores as well. Thus, in the case of an uncompressed live stream captured from a camera, additional processing headroom is expected to be available for the core computations.
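For reference, retrieving the frames with OpenCV looks roughly like the snippet below; with this configuration the MPEG4 decoding runs entirely on the ARM cores. The input file name is illustrative.

<syntaxhighlight lang="python">
import cv2

# Software-decoded MPEG4 input: with this OpenCV build, decoding
# runs on the ARM cores, not on the i.MX8M Plus hardware video decoder.
capture = cv2.VideoCapture("social_distancing_test.mp4")  # illustrative file name

while True:
    ok, frame = capture.read()   # frame is a BGR numpy array
    if not ok:
        break
    # ... the frame is then handed over to the processing pipeline ...

capture.release()
</syntaxhighlight>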