[[File:FICS-PCB samples.png|center|thumb|500x500px|FICS-PCB dataset, examples of six types of components]]
As the figure below makes immediately clear, this dataset is highly unbalanced: only two classes, ''capacitor'' and ''resistor'', have a large number of samples. Using the dataset as it is would be a poor choice, because the training batches would consist mainly of the most common components and the models would learn only a restricted set of features. As a consequence, the models would likely be very good at classifying the ''capacitor'' and ''resistor'' classes and rather poor at classifying the remaining ones. The underrepresented classes must therefore be compensated for with image augmentation and oversampling.
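The core of oversampling is simply duplicating random samples of the minority classes until every class reaches the size of the largest one. A minimal sketch (plain duplication only; in the actual pipeline each duplicate would also be augmented, as described later):

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Balance a dataset by duplicating random samples of minority classes
    until every class matches the size of the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(v) for v in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        extras = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extras:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels

# Toy imbalanced set: 4 capacitors, 1 diode -> balanced to 4 and 4.
s, y = oversample(["c1", "c2", "c3", "c4", "d1"], ["cap"] * 4 + ["diode"])
```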
Before proceeding further, note that the DSLR subset contains far fewer examples than the Microscope subset. Since the two subsets were acquired with two different kinds of instruments, their characteristics, such as resolution, differ significantly. To obtain images with homogeneous characteristics, it is preferable to keep only one of the two subsets, specifically the more numerous one.
[[File:Samples per class in Microscope and DSLR subsets.png|center|thumb|500x500px|FICS-PCB dataset, component count per class in DSLR and Microscope subsets]]
The dataset was created by randomly sampling 150000 component images from the Microscope subset of the FICS-PCB dataset, giving a total of 25000 images per class. 72% of the images were used for training, 24% were used as a validation subset during training, and the remaining 4% were used as a test set, providing exactly 108000 training images, 36000 validation images, and 6000 test images, equally distributed among the six classes of the dataset. Each image was preprocessed and padded with a constant value to adapt its scale to the input tensor size of the models; during this process the aspect ratio of the image was not modified. To increase variety among the examples, random contrast, brightness, saturation, and rotation were applied as well. Classes such as ''diode'', ''inductor'', ''IC'', and ''transistor'' were oversampled.
[[File:Dataset processing and augmentation.png|center|thumb|500x500px|FICS-PCB dataset, an example of image augmentation as compensation for lack of data in IC, diode, inductor, and transistor classes]]
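The resize-and-pad step described above can be sketched as follows. This is an illustrative NumPy-only implementation (the function name and nearest-neighbour resize are assumptions, not the pipeline's actual code; frameworks such as TensorFlow offer equivalents like <code>tf.image.resize_with_pad</code>):

```python
import numpy as np

def pad_to_square(img, size, pad_value=0):
    """Scale an image so its longer side equals `size`, then pad the
    shorter side with a constant value, preserving the aspect ratio."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via pure NumPy index mapping.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Center the resized image on a constant-valued square canvas.
    canvas = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

img = np.ones((60, 120, 3), dtype=np.uint8) * 255  # wide dummy image
out = pad_to_square(img, 224)
```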
==Training configuration and hyperparameters setup==
The training was done in the cloud using Google Colab. All the models were trained with the same configuration for 1000 epochs, with a mini-batch of 32 images at each step of an epoch. The learning rate was initially set at 0.00001 with an exponential decay schedule, and the dropout rate was set at 0.4 for all models. Patience for early stopping was set at 100 epochs. The training images were further augmented with random zoom, shift, and rotation to improve model robustness on the validation and test subsets and reduce the risk of overfitting.
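An exponential decay schedule multiplies the initial learning rate by a fixed factor at a fixed step interval. A minimal sketch, mirroring the formula behind Keras's <code>ExponentialDecay</code> schedule (the <code>decay_rate</code> and <code>decay_steps</code> values here are illustrative assumptions, not the values used in training):

```python
def exponential_decay(step, initial_lr, decay_rate=0.9, decay_steps=1000):
    """Learning rate after `step` optimizer steps:
    lr = initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

lr0 = 1e-5  # initial learning rate from the training configuration
lr_start = exponential_decay(0, lr0)
lr_mid = exponential_decay(1000, lr0)
lr_late = exponential_decay(2000, lr0)
```

The rate decays smoothly and monotonically, so early epochs take larger optimization steps while later epochs fine-tune with smaller ones.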
[[File:Image augmentation for training samples.png|center|thumb|500x500px|FICS-PCB dataset, an example of image augmentation on training images to increase the robustness of the models]]
[[File:Pre and post quantization accuracy.png|center|thumb|500x500px|Models pre and post quantization accuracy with vai_q_tensorflow tool]]
 
As mentioned before, two other aspects should be taken into account when comparing the models: the DPU kernel parameter size and the total tensor count. Recall that these two values can easily be retrieved from the Vitis-AI compiler log file produced when compiling a model, or by executing the ''ddump'' command on the target device.
Finally, it is possible to evaluate the DPU throughput as a function of the number of threads used by the benchmark application. In the figure below, it is interesting to observe that with a single thread all the models achieve similar FPS values, but as the level of concurrency increases the differences between them become more and more evident.
[[File:DPU throughput for 1-2-4 threads.png|center|thumb|500x500px|Deployed models DPU throughput for 1, 2, and 4 threads]]
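A multi-threaded throughput measurement of this kind can be sketched as below. The inference call is a placeholder (a sleep standing in for a DPU runner invocation, which is an assumption; a real benchmark would submit jobs to the Vitis-AI runtime), but the FPS accounting is the same:

```python
import threading
import time

def benchmark(run_inference, n_images=200, n_threads=2):
    """Measure aggregate FPS when `n_threads` workers each push
    `n_images` inputs through `run_inference` concurrently."""
    def worker():
        for _ in range(n_images):
            run_inference()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return (n_images * n_threads) / elapsed

# Placeholder "inference" of ~1 ms; a real run would call the DPU runner.
fps = benchmark(lambda: time.sleep(0.001), n_images=50, n_threads=4)
```

Because the threads overlap their waiting time, the aggregate FPS grows with the thread count until the accelerator saturates, which is exactly the behaviour visible in the figure above.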
 
In conclusion, summing up all the considerations made so far, the solution offering the best compromise between accuracy and inference latency is the ResNet50 model, followed by the ResNet101 and Inception ResNet V1 models.