As is evident from the figure below, this dataset is highly unbalanced: the vast majority of the samples belong to only two classes, namely capacitor and resistor. This is no surprise, simply because these two component types are mounted on a PCB far more often than the others. Unfortunately, it is not a good idea to use the dataset as it is, because the models would be trained on image batches composed mainly of the most common components and would therefore learn only a restricted set of features. As a consequence, the models would probably be very good at classifying capacitors and resistors and rather poor at classifying the remaining classes. The under-represented classes must therefore be augmented through image augmentation.
Before proceeding further, please note that the number of examples in the DSLR subset is far lower than the number of samples in the Microscope subset. As the two subsets were acquired with two different kinds of instruments, their characteristics (the resolution, for example) differ significantly. In order to work with images that are homogeneous with respect to these characteristics, it is preferable to keep only one of the two subsets, specifically the most numerous one.
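A minimal sketch of how the under-represented classes could be augmented is shown below. It assumes the training images are organized in one sub-folder per class; the folder layout, the target number of samples per class and the transformation parameters are purely illustrative and would have to be tuned for the real dataset.
<syntaxhighlight lang="python">
# Minimal augmentation sketch (illustrative only): generate additional samples
# for the under-represented classes until each class reaches a target size.
import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

DATASET_DIR = "dataset/train"   # hypothetical layout: one sub-folder per class
TARGET_PER_CLASS = 5000         # hypothetical target number of samples per class

datagen = ImageDataGenerator(
    rotation_range=90,          # PCB components can appear in any orientation
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
)

for class_name in os.listdir(DATASET_DIR):
    class_dir = os.path.join(DATASET_DIR, class_name)
    files = [f for f in os.listdir(class_dir) if f.lower().endswith(".png")]
    missing = TARGET_PER_CLASS - len(files)
    i = 0
    while missing > 0:
        # cycle over the original images and save one augmented copy at a time
        img = img_to_array(load_img(os.path.join(class_dir, files[i % len(files)])))
        batch = np.expand_dims(img, axis=0)
        next(datagen.flow(batch, batch_size=1, save_to_dir=class_dir,
                          save_prefix="aug", save_format="png"))
        missing -= 1
        i += 1
</syntaxhighlight>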
===ResNet50===
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Resnet50 train and validation accuracy.png|thumb|500x500px|Train and validation accuracy trend over 1000 training epochs for ResNet50 model]]
|}
The model, before performing the quantization with the ''vai_q_tensorflow'' tool, has an overall '''''accuracy of 94.85%''''' and an overall weighted average '''''F1-score of 94.86%''''' over the test subset of the dataset, showing a good generalization capability on unseen samples. The classes with the highest F1-score, above 96.00%, are ''resistor'' (98.08% F1-score), ''inductor'' (97.10% F1-score) and ''capacitor'' (96.88% F1-score). On the contrary, the class on which the model performs worst is ''diode'' (91.75% F1-score), mainly because of a low precision (88.55%).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Resnet50 host confusion matrix.png|center|thumb|500x500px|Confusion matrix of ResNet50 model on host machine before quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Host machine, classification report
|}
|}

After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model has an overall '''''accuracy of 93.27%''''' and an overall weighted average '''''F1-score of 93.29%''''' on the test subset of the dataset. The model still performs well on the ''resistor'' (98.08% F1-score), ''inductor'' (97.10% F1-score) and ''capacitor'' (96.88% F1-score) classes. However, the worst results are obtained for the ''transistor'' class (89.78% F1-score), where both precision and recall are below 90.00% (89.96% precision, 89.60% recall), and for the ''diode'' class (88.59% F1-score), whose precision is very low (83.77%).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Resnet50 target confusion matrix.png|center|thumb|500x500px|Confusion matrix of ResNet50 model on target device after quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Target device, classification report
|}
|}
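The accuracy, per-class precision, recall and F1-score values reported in these classification reports can be reproduced from the true and predicted labels of the test subset. A minimal sketch with scikit-learn is shown below; the label arrays are placeholders standing in for the real test predictions.
<syntaxhighlight lang="python">
# Minimal sketch: overall accuracy, per-class precision/recall/F1 and their
# weighted averages, plus the confusion matrix. The label arrays below are
# placeholders; in practice they come from the test subset and the model output.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

class_names = ["capacitor", "diode", "IC", "inductor", "resistor", "transistor"]

y_true = [0, 0, 1, 2, 3, 4, 4, 5]   # ground-truth class indices (placeholder)
y_pred = [0, 0, 1, 2, 3, 4, 3, 5]   # predicted class indices (placeholder)

print("accuracy: %.2f%%" % (100.0 * accuracy_score(y_true, y_pred)))
print(classification_report(y_true, y_pred, target_names=class_names, digits=4))
print(confusion_matrix(y_true, y_pred))
</syntaxhighlight>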
 
To perform the inference over the images, only one DPU core is used with 1 thread, leading to almost 55% utilization of the DPU-01 core. By increasing the number of threads, for instance with 4 threads, more cores are used and the utilization gets higher, very close to 100% on the DPU-00 core and close to 90% on the DPU-01 core. Concerning the DPU latency, with 1 thread the average latency for one image is about 12 ms (11526.41 μs). By increasing the concurrency, the latency of both cores increases: about 13 ms (13318.01 μs) for the DPU-00 core and 12 ms (12019.21 μs) for the DPU-01 core when using 2 threads, and about 14 ms (14200.19 μs) for the DPU-00 core and 13 ms (12776.24 μs) for the DPU-01 core with 4 concurrent threads.
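The utilization and latency figures reported here and in the following sections come from running the benchmark application with an increasing number of threads. A minimal sketch of such a multi-threaded measurement is given below, assuming the VART Python API shipped with Vitis AI; the model file name, the frame counts and the dummy input are placeholders, and the actual benchmark used here may rely on a different runtime or on the C++ API.
<syntaxhighlight lang="python">
# Minimal multi-threaded DPU benchmark sketch (assumptions: VART Python API,
# a compiled .xmodel file; the real benchmark application may differ).
import threading
import time
import numpy as np
import vart
import xir

XMODEL = "resnet50.xmodel"   # hypothetical compiled model
NUM_THREADS = 4
NUM_FRAMES = 1000

graph = xir.Graph.deserialize(XMODEL)
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_subgraph = [s for s in subgraphs
                if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]

def worker(n_frames, results, idx):
    # one runner per thread, as in the usual multi-thread examples
    runner = vart.Runner.create_runner(dpu_subgraph, "run")
    in_dims = tuple(runner.get_input_tensors()[0].dims)
    out_dims = tuple(runner.get_output_tensors()[0].dims)
    inp = np.zeros(in_dims, dtype=np.float32)   # dummy input; real code feeds preprocessed images
    out = np.zeros(out_dims, dtype=np.float32)
    start = time.time()
    for _ in range(n_frames):
        job_id = runner.execute_async([inp], [out])
        runner.wait(job_id)
    results[idx] = n_frames / (time.time() - start)   # per-thread throughput in FPS

results = [0.0] * NUM_THREADS
threads = [threading.Thread(target=worker, args=(NUM_FRAMES // NUM_THREADS, results, i))
           for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("aggregate throughput: %.1f FPS" % sum(results))
</syntaxhighlight>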
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Resnet50 cores utilization.png|thumb|500x500px|Utilization of CPU and DPU cores of ResNet50 model for 1, 2, and 4 threads]]
|
|}
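The same ''vai_q_tensorflow'' post-training quantization flow is applied to every network on this page before it is compiled for the DPU. The quantizer runs a calibration pass driven by a user-supplied ''input_fn'', called once per calibration iteration, which must return a dictionary mapping the graph input node to a batch of preprocessed images. A minimal sketch follows; the file paths, input node name, image size and preprocessing are only placeholders. The quantizer is then invoked with its ''quantize'' command, pointing the ''--input_fn'' option at this function.
<syntaxhighlight lang="python">
# Minimal calibration input_fn sketch for vai_q_tensorflow (placeholders only).
import glob
import numpy as np
from PIL import Image

CALIB_IMAGES = sorted(glob.glob("calib/*.png"))   # hypothetical calibration set
BATCH_SIZE = 32
INPUT_NODE = "input_1"                            # hypothetical input node name
IMG_SIZE = (224, 224)                             # hypothetical input resolution

def input_fn(iter_num):
    # return one batch of preprocessed images for calibration iteration iter_num
    batch = []
    for path in CALIB_IMAGES[iter_num * BATCH_SIZE:(iter_num + 1) * BATCH_SIZE]:
        img = np.asarray(Image.open(path).resize(IMG_SIZE), dtype=np.float32) / 255.0
        batch.append(img)
    return {INPUT_NODE: np.stack(batch)}
</syntaxhighlight>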
 
===ResNet101===
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Resnet101 train and validation accuracy.png|thumb|500x500px|Train and validation accuracy trend over 1000 training epochs for ResNet101 model]]
|}
The model, before performing the quantization with the ''vai_q_tensorflow'' tool, has an overall '''''accuracy of 97.10%''''' and an overall weighted average '''''F1-score of 97.11%''''' over the test subset of the dataset, showing a very high generalization capability on unseen samples. Almost all the classes have an F1-score above 96.00%, which is very high for the ''resistor'' class (98.65% F1-score) and the ''inductor'' class (98.50% F1-score), the only exception being the ''diode'' class (95.40% F1-score), mainly because of a low recall (94.40%).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Resnet101 host confusion matrix.png|center|thumb|500x500px|Confusion matrix of ResNet101 model on host machine before quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Host machine, classification report
|}
|}

After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model has an overall '''''accuracy of 93.95%''''' and an overall weighted average '''''F1-score of 93.91%''''' on the test subset of the dataset. The model still performs very well on the ''capacitor'' class, keeping an F1-score above 96.00% (97.03%), but for the remaining classes there is a substantial drop in the value of this metric. The classes that exhibit the worst results are ''diode'' (92.09% F1-score) and ''IC'' (92.06% F1-score), the latter due to a low recall (88.20%). In general, the performance of the model is still good, similar to the one obtained with the ResNet50 model.
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Resnet101 target confusion matrix.png|center|thumb|500x500px|Confusion matrix of ResNet101 model on target device after quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Target device, classification report
|}
|}
 
To perform the inference over the images, only one DPU core is used with 1 thread, leading to almost 70% utilization of the DPU-01 core. By increasing the number of threads, for instance with 4 threads, more cores are used and the utilization gets higher, very close to 100% on the DPU-00 core and close to 95% on the DPU-01 core. Concerning the DPU latency, with 1 thread the average latency for one image is about 21 ms (21339.73 μs). By increasing the concurrency, the latency of both cores increases: about 24 ms (24313.61 μs) for the DPU-00 core and 22 ms (22231.22 μs) for the DPU-01 core when using 2 threads, and about 25 ms (25385.51 μs) for the DPU-00 core and 23 ms (23025.89 μs) for the DPU-01 core with 4 concurrent threads.
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Resnet101 cores utilization.png|thumb|500x500px|Utilization of CPU and DPU cores of ResNet101 model for 1, 2, and 4 threads]]
|
|}
 
===ResNet152===
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Resnet152 train and validation accuracy.png|thumb|500x500px|Train and validation accuracy trend over 1000 training epochs for ResNet152 model]]
|
|}
 
The model, before performing the quantization with the ''vai_q_tensorflow'' tool, has an overall '''''accuracy of 96.46%''''' and an overall weighted average '''''F1-score of 96.48%''''' over the test subset of the dataset, showing a good generalization capability on unseen samples. The classes with the highest F1-score, above 96.00%, are ''resistor'' (98.58% F1-score), ''inductor'' (98.03% F1-score) and ''capacitor'' (96.99% F1-score), a result quite similar to that of the ResNet50 model. The worst performance is displayed by the ''transistor'' class, with an F1-score of "only" about 94.00% (94.18%), mainly due to a low precision (92.89%).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Resnet152 host confusion matrix.png|center|thumb|500x500px|Confusion matrix of ResNet152 model on host machine before quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Host machine, classification report
|}
|}
 
After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model has an overall '''''accuracy of 93.40%''''' and an overall weighted average '''''F1-score of 93.36%''''' on the test subset of the dataset. The model still performs very well on the ''capacitor'' class, keeping an F1-score above 96.00% (96.62%), but for the remaining classes there is a substantial drop in the value of this metric. The classes that exhibit the worst results are ''diode'' (91.65% F1-score), because its recall is very low (87.30%), ''IC'' (91.09% F1-score), with low precision and recall (91.18% precision, 91.00% recall), and ''transistor'' (90.62% F1-score), also with low precision and recall (90.35% precision, 90.62% recall). In general, the performance of the model is still good, similar to the one obtained with the two previous models, especially the ResNet101 model.
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Resnet152 target confusion matrix.png|center|thumb|500x500px|Confusion matrix of ResNet152 model on target device after quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Target device, classification report
|}
|}
 
To perform the inference over the images, only one DPU core is used with 1 thread, leading to almost 80% utilization of the DPU-01 core. By increasing the number of threads, for instance with 4 threads, more cores are used and the utilization gets higher, very close to 100% on both the DPU-00 and DPU-01 cores. Concerning the DPU latency, with 1 thread the average latency for one image is about 29 ms (28867.86 μs). By increasing the concurrency, the latency of both cores increases: about 33 ms (32702.59 μs) for the DPU-00 core and 30 ms (30046.64 μs) for the DPU-01 core when using 2 threads, and about 34 ms (33826.30 μs) for the DPU-00 core and 31 ms (30834.46 μs) for the DPU-01 core with 4 concurrent threads.
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Resnet152 cores utilization.png|thumb|500x500px|Utilization of CPU and DPU cores of ResNet152 model for 1, 2, and 4 threads]]
|
|}
 
===InceptionV4===
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:InceptionV4 train and validation accuracy.png|thumb|500x500px|Train and validation accuracy trend over 1000 training epochs for InceptionV4 model]]
|
|}
 
The model, before performing the quantization with the ''vai_q_tensorflow'' tool, has an overall '''''accuracy of 92.68%''''' and an overall weighted average '''''F1-score of 92.69%''''' over the test subset of the dataset, showing a good generalization capability on unseen samples, although lower than that of the three previous models. The classes with the highest F1-score, above 96.00%, are ''resistor'' (97.56% F1-score), ''capacitor'' (96.81% F1-score) and ''inductor'' (96.38% F1-score). However, the performance of the model on the three remaining classes is poor compared to the previous models, with an F1-score below 90.00% for the ''diode'' class (87.94%) and the ''transistor'' class (87.27%), due to low precision and recall in the former case (88.38% precision, 87.50% recall) and a low precision in the latter (83.67%).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:InceptionV4 host confusion matrix.png|center|thumb|500x500px|Confusion matrix of InceptionV4 model on host machine before quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Host machine, classification report
|}
|}
 
After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model has an overall '''''accuracy of 88.87%''''' and an overall weighted average '''''F1-score of 88.91%''''' on the test subset of the dataset. The model still performs well on the ''resistor'' class (97.65% F1-score) but, for the remaining classes, there is a substantial drop in the value of the metric. The classes that exhibit the worst results are ''diode'' (85.15% F1-score), ''IC'' (83.27% F1-score) and ''transistor'' (81.97% F1-score). In general, the performance of the model is still good, but decidedly lower than that of the models analyzed previously.
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:InceptionV4 target confusion matrix.png|center|thumb|500x500px|Confusion matrix of InceptionV4 model on target device after quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Target device, classification report
|}
|}
 
To perform the inference over the images, only one DPU core is used with 1 thread, leading to almost 70% utilization of the DPU-01 core. By increasing the number of threads, for instance with 4 threads, more cores are used and the utilization gets higher, very close to 100% on both the DPU-00 and DPU-01 cores. Concerning the DPU latency, with 1 thread the average latency for one image is about 30 ms (30127.38 μs). By increasing the concurrency, the latency of both cores increases: about 34 ms (34105.45 μs) for the DPU-00 core and 31 ms (30981.59 μs) for the DPU-01 core when using 2 threads, and about 35 ms (35273.61 μs) for the DPU-00 core and 32 ms (31761.21 μs) for the DPU-01 core with 4 concurrent threads.
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Inception v4 cores utilization.png|thumb|500x500px|Utilization of CPU and DPU cores of InceptionV4 model for 1, 2, and 4 threads]]
|
|}
 
===Inception ResNet V1===
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Inception ResNet V1 train and validation accuracy.png|thumb|500x500px|Train and validation accuracy trend over 1000 training epochs for Inception ResNet V1 model]]
|
|}
 
The model, before performing the quantization with the ''vai_q_tensorflow'' tool, has an overall '''''accuracy of 97.66%''''' and an overall weighted average '''''F1-score of 97.36%''''' over the test subset of the dataset, showing a very high generalization capability on unseen samples. All the classes have an F1-score above 96.00%, which is particularly high for the ''resistor'' class (98.50% F1-score).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Inception ResNet V1 host confusion matrix.png|center|thumb|500x500px|Confusion matrix of Inception ResNet V1 model on host machine before quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Host machine, classification report
|}
|}
 
After performing the quantization with the ''vai_q_tensorflow'' tool and deploying the model on the target device, the model has an overall '''''accuracy of 93.34%''''' and an overall weighted average '''''F1-score of 93.34%''''' on the test subset of the dataset. The model still performs very well on three classes, i.e. ''resistor'' (97.12% F1-score), ''inductor'' (97.00% F1-score) and ''capacitor'' (96.59% F1-score), keeping an F1-score above 96.00%. However, for the remaining classes, the value of the metric is substantially reduced. The classes that exhibit the worst results are ''IC'' (89.41% F1-score), due to a low precision (84.12%), and ''transistor'' (87.75% F1-score), due to a very low recall (82.80%). In general, the performance of the model is still good, similar to the one obtained with the ResNet models.
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Inception ResNet V1 target confusion matrix.png|center|thumb|500x500px|Confusion matrix of Inception ResNet V1 model on target device after quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Target device, classification report
|}
|}
 
To perform the inference over the images, only one DPU core is used with 1 thread, leading to almost 60% utilization of the DPU-01 core. By increasing the number of threads, for instance with 4 threads, more cores are used and the utilization gets higher, very close to 100% on the DPU-00 core and close to 90% on the DPU-01 core. Concerning the DPU latency, with 1 thread the average latency for one image is about 18 ms (17651.31 μs). By increasing the concurrency, the latency of both cores increases: about 21 ms (20511.79 μs) for the DPU-00 core and 18 ms (18466.97 μs) for the DPU-01 core when using 2 threads, and about 22 ms (21654.99 μs) for the DPU-00 core and 20 ms (19503.17 μs) for the DPU-01 core with 4 concurrent threads.
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Inception resnet v1 cores utilization.png|thumb|500x500px|Utilization of CPU and DPU cores of Inception ResNet V1 model for 1, 2, and 4 threads]]
|
|}
 
===Inception ResNet V2===
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Inception ResNet V2 train and validation accuracy.png|thumb|500x500px|Train and validation accuracy trend over 1000 training epochs for Inception ResNet V2 model]]
|
|}
 
The model, before performing the quantization with the ''vai_q_tensorflow'' tool, has an overall '''''accuracy of 97.53%''''' and an overall weighted average '''''F1-score of 97.53%''''' over the test subset of the dataset, showing a very high generalization capability on unseen samples. Five classes have an F1-score above 96.00%, which is very high for the ''inductor'' class (98.66% F1-score) and the ''resistor'' class (98.55% F1-score). The worst result is displayed by the ''transistor'' class, with an F1-score below, but still very close to, 96.00% (95.86%), mainly due to a low precision (93.36%).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Inception ResNet V2 host confusion matrix.png|center|thumb|500x500px|Confusion matrix of Inception ResNet V2 model on host machine before quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Host machine, classification report
|}
|}
 
After the quantization and the deployment on the target device, the model still performs very well on two classes, i.e. ''resistor'' (98.07% F1-score) and ''capacitor'' (96.23% F1-score), keeping an F1-score above 96.00%. However, for the remaining classes, the value of the metric is reduced. In particular, the worst results are found for the ''IC'' class (90.80% F1-score), which has low precision and recall (91.73% precision, 89.90% recall), and for the ''transistor'' class, due to a low precision (87.88%).
{| align="center" style="background: transparent; margin: auto; width: 60%;"
|-
| width="200px" style="vertical-align: middle;" |[[File:Inception ResNet V2 target confusion matrix.png|center|thumb|500x500px|Confusion matrix of Inception ResNet V2 model on target device after quantization]]
| width="200px" style="vertical-align: middle;" |
{| class="wikitable" style="margin: auto; text-align: center;"
|+ Target device, classification report
|}
|}
 
To perform the inference over the images, only one DPU core is used with 1 thread, leading to almost 65% utilization of the DPU-01 core. By increasing the number of threads, for instance with 4 threads, more cores are used and the utilization gets higher, very close to 100% on the DPU-00 core and close to 95% on the DPU-01 core. Concerning the DPU latency, with 1 thread the average latency for one image is about 25 ms (25185.03 μs). By increasing the concurrency, the latency of both cores increases: about 29 ms (28858.88 μs) for the DPU-00 core and 26 ms (26336.11 μs) for the DPU-01 core when using 2 threads, and about 30 ms (30229.27 μs) for the DPU-00 core and 27 ms (27452.70 μs) for the DPU-01 core with 4 concurrent threads.
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:Inception resnet v2 cores utilization.png|thumb|500x500px|Utilization of CPU and DPU cores of Inception ResNet V2 model for 1, 2, and 4 threads]]
|
|}
 
==Comparison==
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:DPU Kernel parameters size.png|thumb|500x500px|Deployed models DPU Kernel parameters size]]
|
|}
 
The two figures below show the DPU core latency for 1, 2, and 4 threads. It is interesting to note that the core latency of Inception ResNet V1 is lower than that of ResNet152, even though the two models have a similar ''total tensor count'' and different DPU kernel ''parameters size'' values (actually greater for ResNet152). Vice versa, ResNet101 and Inception V4 have a similar DPU kernel ''parameters size'' and different values of ''total tensor count'', and in this case the core latency is higher for the latter. The same observation can be made for the ResNet50 and Inception ResNet V1 models, leading to the following statements:
<!--Start of table definition-->
{| style="background:transparent; color:black" border="0" align="center" valign="bottom" cellpadding="10px" cellspacing="0px"
|- align="center"
|
|[[File:DPU-00 core latency for 1-2-4 threads.png|thumb|500x500px|Deployed models DPU-00 core latency for 1, 2, and 4 threads]]
|
|}
 
Finally, it is possible to evaluate the DPU throughput in relation to the number of threads used by the benchmark application. In the figure below, it is interesting to observe that all the models have similar FPS values with 1 thread but, as the level of concurrency increases, the difference becomes more and more evident.
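As a rough cross-check of these throughput values, the aggregate FPS can be estimated from the per-core average latencies reported in the sections above, under the simplifying assumption that both DPU cores are kept fully busy and work in parallel (here with the ResNet50 figures for 4 threads).
<syntaxhighlight lang="python">
# Back-of-the-envelope throughput estimate (assumption: both DPU cores fully
# busy and running in parallel); latencies taken from the ResNet50 section above.
latency_us = {"DPU-00": 14200.19, "DPU-01": 12776.24}   # average per-image latency, 4 threads
fps = sum(1e6 / v for v in latency_us.values())
print("estimated aggregate throughput: about %.0f FPS" % fps)   # roughly 149 FPS
</syntaxhighlight>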