In this particular case, a good compromise between compression and accuracy drop is to prune only the two dense layers of the model, which account for a large share of the parameters. The pruning schedule starts at epoch 0 and ends at one third of the total number of epochs (i.e., epoch 100), ramping from an initial sparsity of 50% to a final sparsity of 80% with a pruning frequency of 5 steps (i.e., the model is pruned every 5 steps during the training phase).
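A minimal sketch of such a schedule, assuming the TensorFlow Model Optimization toolkit's Keras pruning API (the names <code>model</code> and <code>steps_per_epoch</code> and the compile settings are placeholders, not taken from the original code):
<pre>
import tensorflow as tf
import tensorflow_model_optimization as tfmot

steps_per_epoch = 100                  # placeholder: depends on dataset and batch size
end_step = 100 * steps_per_epoch       # pruning ends at epoch 100 (1/3 of 300 epochs)

schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.50,             # start with 50% of the weights zeroed
    final_sparsity=0.80,               # ramp up to 80% sparsity
    begin_step=0,
    end_step=end_step,
    frequency=5)                       # prune every 5 training steps

def prune_dense_only(layer):
    # Wrap only the dense layers; leave all other layers untouched.
    if isinstance(layer, tf.keras.layers.Dense):
        return tfmot.sparsity.keras.prune_low_magnitude(
            layer, pruning_schedule=schedule)
    return layer

pruned_model = tf.keras.models.clone_model(
    model, clone_function=prune_dense_only)   # 'model' is the trained baseline

pruned_model.compile(optimizer='adam',
                     loss='sparse_categorical_crossentropy',
                     metrics=['accuracy'])
# UpdatePruningStep must be passed as a callback when training the pruned model.
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
</pre>
With this wrapping, only the two dense layers are sparsified, while the feature-extraction layers keep their full weights.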
The weight sparsity of the model after applying pruning:
Inference is computationally expensive and requires high memory bandwidth to satisfy the low-latency, high-throughput requirements of edge applications. Neural networks are generally trained with 32-bit floating-point weights and activations, but with the Vitis AI quantizer the computational complexity can be reduced, without losing prediction accuracy, by converting the 32-bit floating-point values to an 8-bit integer format. The resulting fixed-point network model requires less memory bandwidth and provides higher speed and power efficiency than the floating-point model.
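To make the conversion concrete, here is an illustrative symmetric int8 quantization of a tensor in plain NumPy; this shows the arithmetic idea only, not the Vitis AI quantizer's actual algorithm:
<pre>
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map [-max|x|, +max|x|] to [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float values.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)       # stand-in for a weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())   # bounded by scale / 2
</pre>
Stored as int8, the tensor occupies a quarter of the memory of its float32 counterpart, which is where the bandwidth saving comes from.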
In the quantize calibration process, only a small set of images is required to analyze the distribution of activations. Since no backpropagation is performed, there is no need to provide labels either. Depending on the size of the neural network, the running time of quantize calibration varies from a few seconds to several minutes.
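With vai_q_tensorflow, the calibration images are typically supplied through a user-defined input function passed via the <code>--input_fn</code> option. A minimal sketch, assuming an input node named <code>input_1</code> and images stored in a NumPy file (both names are placeholders):
<pre>
import numpy as np

CALIB_BATCH_SIZE = 32
# Placeholder: a preprocessed image array of shape (N, H, W, C); no labels needed.
calib_images = np.load("calib_images.npy")

def calib_input(iter):
    # Called once per calibration iteration; must return a dict mapping
    # input-node names to a batch of data.
    start = iter * CALIB_BATCH_SIZE
    return {"input_1": calib_images[start:start + CALIB_BATCH_SIZE]}
</pre>
The number of calibration iterations is then controlled by the <code>--calib_iter</code> option of the <code>vai_q_tensorflow quantize</code> command.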
 
After calibration, the quantized model is transformed into a DPU-deployable model (named deploy_model.pb for vai_q_tensorflow), which follows the DPU data format. This model can then be compiled by the Vitis AI compiler and deployed to the DPU. The deployable model cannot be used by the standard TensorFlow framework to evaluate the accuracy loss, so a second file is produced for this purpose (named quantize_eval_model.pb for vai_q_tensorflow).
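A sketch of how quantize_eval_model.pb could be evaluated with the TensorFlow 1.x compatibility API; the node names <code>input_1:0</code> and <code>dense_1/Softmax:0</code> and the test-set files are assumptions for illustration (use the node names reported by the quantizer):
<pre>
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Placeholders for the test dataset, preprocessed the same way as in training.
test_images = np.load("test_images.npy")
test_labels = np.load("test_labels.npy")

# Load the frozen evaluation graph produced by the quantizer.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("quantize_eval_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.compat.v1.Session() as sess:
    tf.import_graph_def(graph_def, name="")
    x = sess.graph.get_tensor_by_name("input_1:0")          # assumed input node
    y = sess.graph.get_tensor_by_name("dense_1/Softmax:0")  # assumed output node
    preds = sess.run(y, feed_dict={x: test_images})
    accuracy = np.mean(np.argmax(preds, axis=1) == test_labels)
    print("graph accuracy with test dataset: %.4f" % accuracy)
</pre>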
 
The accuracy of the '''baseline model''' over the test dataset after applying quantization:
<pre>
graph accuracy with test dataset: 0.7083
</pre>
The accuracy of the '''pruned model''' over the test dataset after applying quantization:
<pre>
graph accuracy with test dataset: 0.7083
</pre>