
DAVE Developer's Wiki β


ML-TN-002 - Real-time Social Distancing estimation

Revision as of 13:01, 1 March 2021
{{AppliesToMachineLearning}}
{{AppliesTo Machine Learning TN}}
{{AppliesTo ORCA TN}}
{{AppliesTo ORCA SBC TN}}
{{InfoBoxBottom}}
To date, though, the computing power required by algorithms this complex has been a hurdle difficult to overcome, hindering the adoption of embedded platforms for these tasks. Recently, however, new systems-on-chip (SoCs) integrating neural network hardware accelerators have appeared on the market. Thanks to this increase in computational power, these devices make it possible to implement novel solutions that satisfy all the above-mentioned requirements.
This Technical Note illustrates one such implementation, addressing real-time social distancing estimation. This work started from the publicly available, open-source Social-Distancing project released by the [https://iit.it/ Istituto Italiano di Tecnologia (IIT)], which is described in this [https://arxiv.org/abs/2011.02018v2 paper]. The goal was to port the IIT code onto one of the DAVE Embedded Systems [https://wiki.dave.eu/index.php/Main_Page#Single_Board_Computers Single Board Computers] (SBCs), in order to build an industrial/automotive-grade automatic machine vision system for social distancing.
==The hardware/software platform==
For the hardware platform, the choice fell on the [https://wiki.dave.eu/index.php/ORCA_SBC ORCA SBC], which is powered by the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS NXP i.MX8M Plus SoC]. This industrial/automotive-grade SoC is built around a quad-core ARM Cortex-A53 CPU and has a rich set of peripherals and systems. It also integrates a 2.3 TOPS Neural Processing Unit (NPU) and native interfaces for connecting image sensors, making it well suited for computer vision applications.
The system software is a Yocto Linux distribution derived from the [https://www.nxp.com/design/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX NXP 5.4.70_2.3.0] BSP. In addition to the default packages, a number of libraries were added to satisfy the application's requirements.
==Application software==
As stated previously, the main application derives from the IIT Social-Distancing project. It was developed in several steps, starting at a time when only a few alpha samples of the i.MX8M Plus existed; DAVE Embedded Systems had access to them because it had joined the component's beta program.
=== Step #1 ===
The first step was conducted using the official NXP evaluation kit (EVK). The goal was to make the Social-Distancing project work on this platform while maintaining its core functionalities. In essence, the code was modified in order to replace the [https://github.com/CMU-Perceptual-Computing-Lab/openpose OpenPose library] with [https://github.com/tensorflow/tfjs-models/tree/master/posenet PoseNet]. This was required to cope with the operations actually supported by the [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ NXP eIQ] software stack and the NPU. For those familiar with embedded software development, this should come as no surprise: when porting applications from PC-like platforms to embedded platforms, handling such hardware/software constraints is common practice.
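PoseNet produces per-keypoint heatmaps and offset vectors, which typically have to be decoded into image coordinates on the CPU side. The following Python code is a minimal single-person decoder sketch. It is not taken from the application described here, which also has to handle several people per frame. It assumes the tensor layout used by the public tfjs PoseNet models: K heatmap channels, then 2K offset channels with the y offsets first. The output stride of 16 is a hypothetical value.

```python
import math
from typing import List, Tuple

OUTPUT_STRIDE = 16  # hypothetical; must match the stride the model was exported with


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_single_pose(heatmaps, offsets, output_stride: int = OUTPUT_STRIDE) -> List[Tuple[float, float, float]]:
    """Decode one pose as a list of (x, y, score) tuples in input-image pixels.

    Assumed layout (tfjs PoseNet convention):
      heatmaps[y][x][k]     -> raw logit of keypoint k at grid cell (y, x)
      offsets[y][x][k]      -> y offset of keypoint k
      offsets[y][x][k + K]  -> x offset of keypoint k
    """
    rows, cols = len(heatmaps), len(heatmaps[0])
    num_keypoints = len(heatmaps[0][0])
    keypoints = []
    for k in range(num_keypoints):
        # Find the grid cell with the strongest response for keypoint k.
        best_y, best_x, best = 0, 0, -math.inf
        for y in range(rows):
            for x in range(cols):
                if heatmaps[y][x][k] > best:
                    best_y, best_x, best = y, x, heatmaps[y][x][k]
        # Refine the coarse grid position with the predicted offset vector.
        off_y = offsets[best_y][best_x][k]
        off_x = offsets[best_y][best_x][k + num_keypoints]
        keypoints.append((best_x * output_stride + off_x,
                          best_y * output_stride + off_y,
                          sigmoid(best)))
    return keypoints
```

For example, with a 2x2 grid whose strongest response for a single keypoint is in row 0, column 1, and whose offsets there are (y=2, x=-1), the decoded position is (15, 2).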
The resulting processing pipeline is shown in the following figure.
[[File:Ss-main-pipeline-20210127.png|center|thumb|600x600px|Processing pipeline]]
The yellow boxes indicate processing performed by the ARM cores, while the green one refers to the computation carried out by the NPU.
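The figure does not detail how distances between people are estimated in the ARM-side post-processing. The following Python code illustrates the kind of work such a stage may perform, under several assumptions:

* each person's ground contact point is approximated by the midpoint of the two ankle keypoints;
* a 3x3 homography <code>h</code>, obtained from a prior camera calibration, maps image pixels to ground-plane meters;
* the threshold is expressed in the same units.

This bird's-eye-view approach is a common one, but it is not necessarily the method used by the IIT code. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def feet_point(left_ankle: Point, right_ankle: Point) -> Point:
    """Hypothetical ground-contact estimate: midpoint of the two ankles."""
    return ((left_ankle[0] + right_ankle[0]) / 2.0,
            (left_ankle[1] + right_ankle[1]) / 2.0)


def to_ground_plane(h: Sequence[Sequence[float]], p: Point) -> Point:
    """Map an image point onto the ground plane with a 3x3 homography h."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)  # divide by w to leave homogeneous coordinates


def find_violations(points: List[Point], h, min_distance: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, that are closer than min_distance on the ground plane."""
    ground = [to_ground_plane(h, p) for p in points]
    return [(i, j) for i, j in combinations(range(len(ground)), 2)
            if math.dist(ground[i], ground[j]) < min_distance]
```

With a pure scaling homography of 0.01 m per pixel, people at x = 100, 250 and 600 px on the same image row are 1.5 m, 5 m and 3.5 m apart. Only the first pair is reported for a 2 m threshold.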
The following screenshots show the application running on the EVK.
[[File:Social-distancing-screenshot2.png|center|thumb|600x600px|The step 1 application running on the EVK (2/2)]]
It is worth remembering that, even though OpenPose was replaced with PoseNet, the software interface between the high-level layers and the pose estimation layer was not altered. This made it possible to leave those layers untouched.
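One common way to keep such an interface unchanged is a thin adapter that converts the new engine's output into the format the upper layers already expect. The following Python code is a sketch under that assumption. It takes PoseNet's 17 COCO-ordered keypoints and assumes, for illustration only, that the upper layers expect OpenPose's 18-keypoint COCO layout; the format actually used by the IIT code is not described here. OpenPose's Neck keypoint has no PoseNet counterpart, so the sketch synthesizes it from the shoulders. The function and table names are hypothetical.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, score)

# OpenPose COCO-18 slot -> PoseNet keypoint index (None = synthesized).
# OpenPose order: Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist,
# RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar.
OPENPOSE18_FROM_POSENET = [0, None, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]
POSENET_LEFT_SHOULDER = 5
POSENET_RIGHT_SHOULDER = 6


def posenet_to_openpose18(kps: List[Keypoint]) -> List[Keypoint]:
    """Reorder 17 PoseNet keypoints into an 18-slot OpenPose-style list."""
    if len(kps) != 17:
        raise ValueError("expected 17 PoseNet keypoints")
    ls, rs = kps[POSENET_LEFT_SHOULDER], kps[POSENET_RIGHT_SHOULDER]
    # Neck: midpoint of the shoulders, with the weaker of the two confidences.
    neck = ((ls[0] + rs[0]) / 2.0, (ls[1] + rs[1]) / 2.0, min(ls[2], rs[2]))
    return [neck if src is None else kps[src] for src in OPENPOSE18_FROM_POSENET]
```

Because the conversion is confined to one function, the code that consumes the keypoints does not need to know which pose estimator produced them.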
=== Step #2 ===
Step #2 concerned implementing some optimizations in order to increase the overall frame rate.
As usual, before implementing any optimization, the code was profiled in order to detect the code portions that were worth optimizing. In addition to traditional, well-known techniques, specific NPU-related tools were used as well. For instance, the following dump shows a detailed report of the execution of a Convolutional Neural Network (CNN) on the accelerator.
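On the CPU side, one of the "traditional" profiling techniques can be as simple as timing each pipeline stage. The following Python code is a minimal sketch of this technique. The class <code>StageProfiler</code> and the stage names are hypothetical and are not part of the application described here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageProfiler:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (stage, mean_ms, share_of_total) tuples, most expensive stage first."""
        grand_total = sum(self.totals.values()) or 1.0
        return [(name, 1000.0 * total / self.counts[name], total / grand_total)
                for name, total in sorted(self.totals.items(),
                                          key=lambda kv: kv[1], reverse=True)]
```

To use it, wrap each stage of the main loop in <code>with prof.stage("pre-processing"):</code>, and do the same for inference and post-processing. Calling <code>prof.report()</code> at the end then lists the stages by total cost, which shows where optimization effort pays off.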
{| class="wikitable"
|}
Combining the profiling results with a manual analysis of the code, it was decided to work on the operations performed before the inference. Basically, these tasks were restructured to run in parallel in order to leverage the quad-core ARM Cortex-A53 cluster. The resulting architecture is depicted in the following figure.
[[File:Ss-main-pipeline-v2-20210204.png|center|thumb|600x600px|Processing pipeline after implementing parallel computations]]
=== Step #3 ===
In this step, the application was migrated to the definitive hardware platform, the aforementioned ORCA SBC, which had been designed while the software team was working on the EVK.
== Testing and results ==
The following clip shows the application running on the ORCA SBC.
{| class="wikitable"
| width="100%"|
{{#ev:youtube|HAAH2bTVrXM|600|center|Social Distancing application running on ORCA SBC|frame}}
|}
In the example, the system was fed with a 640x360, 25 fps stream. On average, the frame rate of the processed stream is 23 fps. The following screenshot illustrates the CPU load during the execution of the application. As expected, the four ARM cores are almost fully loaded because of the parallel computation implemented in the algorithm.
[[File:Social-distancing-htop1.png|center|thumb|600px|CPU load during the execution of the application]]
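The restructuring of the pre-inference operations described in Step #2 is not detailed further. The following Python code sketches the general idea: split the frame into horizontal stripes and process one stripe per Cortex-A53 core. It assumes that the pre-processing is a per-pixel operation, here a hypothetical normalization of 8-bit values to [-1, 1]. In CPython, threads only run such work concurrently when it releases the GIL, as many OpenCV and NumPy routines do. The sketch therefore shows the partitioning scheme, not a speed-up. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Frame = List[List[int]]  # rows of 8-bit pixel values

NUM_WORKERS = 4  # one worker per Cortex-A53 core


def normalize_rows(rows: Frame) -> List[List[float]]:
    """Hypothetical per-pixel pre-processing: map 0..255 to -1.0..1.0."""
    return [[p / 127.5 - 1.0 for p in row] for row in rows]


def split_stripes(frame: Frame, n: int) -> List[Frame]:
    """Split the frame into at most n contiguous horizontal stripes."""
    step = max(1, -(-len(frame) // n))  # ceiling division, at least one row
    return [frame[i:i + step] for i in range(0, len(frame), step)]


def preprocess_parallel(frame: Frame, workers: int = NUM_WORKERS) -> List[List[float]]:
    stripes = split_stripes(frame, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the stripes are reassembled correctly.
        results = list(pool.map(normalize_rows, stripes))
    return [row for stripe in results for row in stripe]
```

Because each stripe is independent and <code>map()</code> preserves ordering, the parallel result is identical to processing the whole frame sequentially.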
== Conclusions ==
== Future work ==
For convenience, this test was run using an MPEG-4 video file as input. The well-known [https://opencv.org/ OpenCV] library was used to decompress the video and to retrieve the frames. At the time of this writing, OpenCV did not support the i.MX8M Plus hardware video decoder, so video decompression is carried out by the ARM cores as well. Therefore, with an uncompressed live stream captured from a camera, further processing headroom is expected to be available for the core computations.
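The test figures reported above (25 fps input, 23 fps processed on average) can be turned into per-frame times with simple arithmetic. On average, each processed frame takes about 3.5 ms longer than the 40 ms available at the input rate. Any headroom freed by offloading video decompression would first have to cover that gap before the full input rate could be sustained. This assumes the average output rate reflects steady-state throughput. The cost of software decoding itself was not measured and is not estimated here.

```python
def frame_period_ms(fps: float) -> float:
    """Average time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


INPUT_FPS = 25.0      # input stream rate reported in the test
PROCESSED_FPS = 23.0  # average processed rate reported in the test

budget = frame_period_ms(INPUT_FPS)      # 40.0 ms per frame at the input rate
actual = frame_period_ms(PROCESSED_FPS)  # about 43.5 ms per processed frame
gap = actual - budget                    # about 3.5 ms to recover for 25 fps
```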