==InceptionV4==
After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model reaches an overall '''''accuracy of 88.87%''''' and an overall weighted average '''''F1-score of 88.91%''''' on the test subset of the dataset. The model still performs well in correctly classifying samples belonging to the ''resistor'' class (97.65% F1-score). On the other hand, for the remaining classes, there is a substantial reduction in the value of this metric. The classes that exhibit the worst results are the ''diode'' class (85.15% F1-score), the ''IC'' class (83.27% F1-score), and the ''transistor'' class (81.97% F1-score). In general, the performance of the model is still good, but it is decidedly lower than that obtained with the ResNet models analyzed previously.
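As a reference for how these figures can be obtained, the following is a minimal sketch that computes the overall accuracy, the weighted average F1-score, and the per-class breakdown from predictions collected on the target device, using scikit-learn. The variable names and the evaluation loop that fills ''y_true'' and ''y_pred'' are assumptions for illustration, not the original evaluation code.
<syntaxhighlight lang="python">
# Minimal sketch: computing the reported metrics from collected predictions.
# y_true and y_pred are assumed to be lists of integer class indices filled
# by a (hypothetical) evaluation loop running over the test subset.
from sklearn.metrics import accuracy_score, classification_report, f1_score

CLASS_NAMES = ["capacitor", "diode", "IC", "inductor", "resistor", "transistor"]

def report_metrics(y_true, y_pred):
    # Overall accuracy, as quoted in the text.
    print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")
    # Overall weighted average F1-score over the six classes.
    print(f"weighted F1: {f1_score(y_true, y_pred, average='weighted'):.4f}")
    # Per-class precision, recall, and F1-score breakdown.
    print(classification_report(y_true, y_pred, target_names=CLASS_NAMES, digits=4))
</syntaxhighlight>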
After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model reaches an overall '''''accuracy of 93.34%''''' and an overall weighted average '''''F1-score of 93.34%''''' on the test subset of the dataset. The model still performs very well in correctly classifying samples belonging to the ''resistor'' class (97.12% F1-score), the ''inductor'' class (97.00% F1-score), and the ''capacitor'' class (96.59% F1-score), keeping an F1-score above 96.00% for all three. However, for the remaining classes, the value of the metric is substantially reduced. The classes that exhibit the worst results are the ''IC'' class (89.41% F1-score), because of a low value of the precision metric (84.12% precision), and the ''transistor'' class (87.75% F1-score), because of a very low value of the recall metric (82.80% recall). In general, the performance of the model is still good and similar to that obtained with the ResNet models.
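For reference, ''vai_q_tensorflow'' performs its calibration through a user-supplied input function passed with the ''--input_fn'' option. The sketch below shows what such a function can look like; the folder name, input node name, image size, batch size, and preprocessing are assumptions for illustration and must match the actual model being quantized.
<syntaxhighlight lang="python">
# input_fn.py -- minimal sketch of a calibration input function for
# vai_q_tensorflow (passed as --input_fn input_fn.calib_input).
# Folder, node name, image size, and preprocessing are illustrative only.
import os
import numpy as np
from PIL import Image

CALIB_DIR = "calib_images"   # hypothetical folder with calibration samples
BATCH_SIZE = 32
IMAGES = sorted(os.listdir(CALIB_DIR))

def calib_input(iter_num):
    """Return one batch of preprocessed images for calibration iteration iter_num."""
    batch = []
    for i in range(BATCH_SIZE):
        name = IMAGES[(iter_num * BATCH_SIZE + i) % len(IMAGES)]
        img = Image.open(os.path.join(CALIB_DIR, name)).convert("RGB").resize((299, 299))
        batch.append(np.asarray(img, dtype=np.float32) / 255.0)
    # The dictionary key must match the --input_nodes name of the model.
    return {"input": np.stack(batch)}
</syntaxhighlight>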
Before performing the quantization with the ''vai_q_tensorflow'' tool, the model has an overall '''''accuracy of 97.53%''''' and an overall weighted average '''''F1-score of 97.53%''''' over the test subset of the dataset, showing a very high generalization capability on unseen samples. Five classes have an F1-score above 96.00%, with particularly high values for the ''inductor'' class (98.66% F1-score) and the ''resistor'' class (98.55% F1-score). The worst result is the one displayed by the ''transistor'' class, with an F1-score below 96.00% but still very close to it (95.86% F1-score), mainly due to a low value of the precision metric (93.36% precision).
After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model reaches an overall '''''accuracy of 93.34%''''' and an overall weighted average '''''F1-score of 93.34%''''' on the test subset of the dataset. The model still performs very well in correctly classifying samples belonging to the ''resistor'' class (98.07% F1-score) and the ''capacitor'' class (96.23% F1-score), keeping an F1-score above 96.00% for both. However, for the remaining classes, the value of the metric is reduced. In particular, the worst results are found in the ''IC'' class (90.80% F1-score), with low values of both the precision and recall metrics (91.73% precision, 89.90% recall), and the ''transistor'' class, with a low value of the precision metric (87.88% precision).
==Comparison==
After reviewing all the created models, showing their performance in terms of accuracy and other classification metrics such as precision, recall, and F1-score, and after evaluating DPU usage and latency for a single inference over the test samples, a comparison between them can be made. The aim is to understand whether one model can be considered the best for solving the problem among the proposed ones.
Since the original dataset was augmented to compensate for the lack of data, resulting in a balanced dataset with the same number of samples for each of the six classes, metrics such as precision, recall, and F1-score can be omitted and only the accuracy taken into account. Note that the accuracy of a model can be enhanced by further tweaking the training hyperparameters or simply by training the model for a higher number of epochs. Thus, the value of this metric can actually be higher (or even lower, in case of overfitting) than the one obtained for this particular configuration (all the models were trained using the same configuration).
For the purpose of this evaluation, it should be noted that considering only the accuracy as a metric might not be the best idea, because there are other elements, arising from the complexity of the models, that make the choice more complex. There are also features that depend exclusively on the chosen network architecture, such as the number of layers or the total number of training parameters (which determines memory occupation), that become fixed parameters in the DPU kernel after model compilation.
Therefore, to proceed with the evaluation, the following features must be taken into account for a better understanding of the whole situation:
*accuracy pre and post quantization,
*DPU Kernel parameters size and total tensor count,
*DPU cores latency,
*DPU throughput.
By initially considering the accuracy of the models before the quantization, it is possible to see that the ones with the highest capability of correctly classifying the test samples are, in descending order, the Inception ResNet V2, the Inception ResNet V1, and the ResNet101. These three models show an accuracy above 97%. In contrast, the models that display two of the lowest accuracy values are the ResNet50 and the Inception V4. After the quantization, the situation changes radically: the ResNet101 moves to the top of the list, followed by the ResNet50 model, while the Inception ResNet V1 and Inception ResNet V2 stand at the bottom, with an accuracy drop of 6.65% for the former and 5.55% for the latter. Moreover, the worst model among those analyzed is the Inception V4, with an accuracy below 90%.
As mentioned before, two other aspects should be taken into account when comparing the models: the DPU Kernel parameters size and the total tensor count. Recall that these two values can easily be retrieved by looking at the Vitis-AI compiler log file when compiling a model, or by executing the ''ddump'' command on the target device.
*'''Parameters size''': indicates, in MB, kB, or bytes, the amount of memory occupied by the DPU Kernel, including weights and biases. It is straightforward to check that the greater the number of parameters of the model implemented on the host, the greater the amount of memory occupied on the target device.
*'''Total tensor count''': is the total number of DPU tensors for a DPU Kernel. This value depends on the number of stacked layers between the input and output layers of the model; obviously, the greater the number of stacked layers, the higher the number of tensors, leading to a more complex computation on the DPU. This is directly responsible for increasing the time requested for a single inference on a single image.
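To make the relationship between parameter count and occupied memory concrete, the short sketch below estimates the weight memory of a hypothetical 25.6-million-parameter model, comparing the 4 bytes per parameter of the original floating-point model with the 1 byte per parameter of the 8-bit quantized DPU Kernel. The parameter count is an illustrative figure, not one taken from the compiler logs.
<syntaxhighlight lang="python">
# Back-of-the-envelope estimate of the DPU Kernel parameters size.
# The parameter count below is hypothetical and used only for illustration.
def params_size_mb(num_params: int, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / (1024 ** 2)

num_params = 25_600_000
print(f"FP32 weights: {params_size_mb(num_params, 4):.1f} MB")  # ~97.7 MB
print(f"INT8 weights: {params_size_mb(num_params, 1):.1f} MB")  # ~24.4 MB
</syntaxhighlight>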
The two figures below show the DPU core latency for 1, 2, and 4 threads. It is interesting to note that the core latency of the Inception ResNet V1 is lower than that of the ResNet152, even though they have a similar ''total tensor count'' and different values of the DPU Kernel ''parameters size'' (actually greater for the ResNet152). Conversely, the ResNet101 and the Inception V4 have a similar DPU Kernel ''parameters size'' and different values of ''total tensor count'', and in this case the core latency is higher for the latter. The same observation can be made for the ResNet50 and Inception ResNet V1 models, leading to the following statements:
*with the same ''total tensor count'', the latency increases along with the DPU Kernel ''parameters size'';
*with the same DPU Kernel ''parameters size'', the latency decreases as the ''total tensor count'' lowers.
Finally, it is possible to evaluate the DPU throughput in relation to the number of threads used by the benchmark application. In the figure below, it is really interesting to observe how all the models have similar FPS values in the case of 1 thread, but the difference becomes more and more evident as the level of concurrency increases.
[[File:DPU throughput for 1-2-4 threads.png|center|thumb|500x500px|Deployed models DPU throughput for 1, 2, and 4 threads]]
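A minimal sketch of how such a multi-threaded throughput measurement can be structured is shown below. The ''run_inference'' function is a hypothetical stand-in for a single call into the DPU runner; the sketch only mirrors the shape of a typical benchmark application, timing the same workload with 1, 2, and 4 worker threads and reporting the resulting FPS.
<syntaxhighlight lang="python">
# Minimal sketch of an FPS benchmark over 1, 2, and 4 threads.
# run_inference is a hypothetical stand-in for a single DPU inference call.
import time
from concurrent.futures import ThreadPoolExecutor

def run_inference(sample):
    time.sleep(0.005)   # placeholder for one inference on one image

def measure_fps(samples, num_threads):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(run_inference, samples))
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed

samples = range(400)    # dummy workload standing in for the test images
for n in (1, 2, 4):
    print(f"{n} thread(s): {measure_fps(samples, n):.1f} FPS")
</syntaxhighlight>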
In conclusion, summing up all the considerations that have been made, it is evident that the solution offering the best compromise between accuracy and inference latency is the ResNet50 model, followed by the ResNet101 and Inception ResNet V1 models.
==Useful links==