{{AppliesToMachineLearning}}
{{InfoBoxBottom}}
 
[[File:TBD.png|thumb|center|200px|Work in progress]]
__FORCETOC__
{| class="wikitable"
! Version !! Date !! Notes
|-
|1.0.0
|October 2020
|First public release
|}
The model is trained for a total of 100 epochs, with early stopping to prevent overfitting on the training data and checkpointing of the weights at the best val_loss. Afterwards, a new model is created with all the layers that are only useful during training, such as dropout and batch normalization, disabled (in this case the batch normalization layers are not used).
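A minimal sketch of how such a training setup could look in Keras, assuming the model and the train/validation splits are already defined; the callback parameters and file names are illustrative assumptions, not the article's original code.
<pre>
import tensorflow as tf

# Assumption: `model`, `x_train`, `y_train`, `x_val` and `y_val` are already defined.
callbacks = [
    # Stop training when val_loss stops improving and restore the best weights.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    # Checkpoint the weights of the epoch with the best val_loss.
    tf.keras.callbacks.ModelCheckpoint("best_weights.h5", monitor="val_loss",
                                       save_best_only=True, save_weights_only=True),
]

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=100,
                    callbacks=callbacks)
</pre>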
 
[[File:Train Accuracy.png|thumb|center|500px|Plot of model's accuracy during training phase]]
 
 
[[File:Train Loss.png|thumb|center|500px|Plot of model's loss during training phase]]
===Prune the model===
Weight pruning means eliminating unnecessary values in the weight tensors: the parameters of the neural network are set to zero in order to remove unnecessary connections between its layers. This is done during the training process so that the network can adapt to the changes. An immediate benefit of this work is disk compression: sparse tensors are amenable to compression, so by applying simple file compression to the pruned TensorFlow checkpoint, the size of the model can be reduced for storage and/or transmission.
The following list shows the weight sparsity of the model before pruning is applied. It is notable that there is actually no sparsity in the weights of the model.
<pre>
predictions/bias:0 -- Param: 6 -- Zeros: 00.00%
</pre>
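A per-tensor sparsity report of this kind can be produced with a short loop over the model's weights; a minimal sketch, where the helper name is an assumption:
<pre>
import numpy as np

def print_weight_sparsity(model):
    # For each weight tensor, report its number of parameters and the percentage of zeros.
    for w in model.weights:
        values = w.numpy()
        zeros = np.sum(values == 0)
        print(f"{w.name} -- Param: {values.size} -- "
              f"Zeros: {100.0 * zeros / values.size:05.2f}%")
</pre>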
 
The size in bytes of the compressed model before applying pruning:
<pre>
Size of gzipped loaded model: 17801431.00 bytes
</pre>
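The figure above can be obtained by saving the model to disk and compressing the resulting file; a minimal sketch, assuming a Keras model saved in HDF5 format (the helper name and temporary files are assumptions):
<pre>
import gzip
import os
import shutil
import tempfile

def gzipped_model_size(model):
    # Save the model to a temporary HDF5 file, gzip it and return the compressed size in bytes.
    _, keras_file = tempfile.mkstemp(".h5")
    model.save(keras_file, include_optimizer=False)
    gz_file = keras_file + ".gz"
    with open(keras_file, "rb") as src, gzip.open(gz_file, "wb") as dst:
        shutil.copyfileobj(src, dst)
    size = os.path.getsize(gz_file)
    print(f"Size of gzipped loaded model: {size:.2f} bytes")
    return size
</pre>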
 
The accuracy of the non-pruned model over the test dataset:
The model is loaded and trained once again, resuming its previous state, this time with a pruning schedule applied. As training proceeds, the pruning routine is scheduled to execute, eliminating (i.e. setting to zero) the weights with the lowest magnitude (i.e. those closest to zero) until the current sparsity target is reached. Every time the pruning routine executes, the current sparsity target is recalculated, starting from 0% until it reaches the final target sparsity at the end of the pruning schedule. After the end step, training continues in order to regain the lost accuracy, knowing that the level of sparsity will no longer change.
In this particular case, a good compromise between compression and accuracy drop is to prune only the two dense layers of the model, which hold a large number of parameters, with a pruning schedule that starts at epoch 0 and ends at 1/3 of the total number of epochs (i.e. of 100 epochs), going from an initial sparsity of 50% to a final sparsity of 80%, with a pruning frequency of 5 steps (i.e. the model is pruned every 5 steps during the training phase).
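A minimal sketch of how such a schedule could be set up with the TensorFlow Model Optimization toolkit; the variable names, batch-size arithmetic and training details are assumptions:
<pre>
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Assumption: `model`, `num_train_samples` and `batch_size` are already defined.
# The schedule ends at 1/3 of the 100 training epochs.
end_step = (num_train_samples // batch_size) * (100 // 3)

pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.50,  # start pruning at 50% sparsity...
    final_sparsity=0.80,    # ...and end at 80% sparsity
    begin_step=0,
    end_step=end_step,
    frequency=5)            # prune every 5 training steps

def apply_pruning_to_dense(layer):
    # Wrap only the Dense layers with pruning; every other layer is left untouched.
    if isinstance(layer, tf.keras.layers.Dense):
        return tfmot.sparsity.keras.prune_low_magnitude(
            layer, pruning_schedule=pruning_schedule)
    return layer

pruned_model = tf.keras.models.clone_model(model, clone_function=apply_pruning_to_dense)

# The UpdatePruningStep callback keeps the pruning step counter in sync while training.
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
</pre>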
[[File:Prune Accuracy.png|thumb|center|500px|Plot of model's accuracy during pruning phase]]
 
 
[[File:Prune Loss.png|thumb|center|500px|Plot of model's loss during pruning phase]]
 
The weight sparsity of the model after applying pruning:
The size in bytes of the compressed model after pruning:
<pre>
Size of gzipped loaded model: 5795289.00 bytes
</pre>
The difference in disk usage between the two versions of the same compressed model (before and after pruning) is remarkable: the pruned model is roughly 3 times smaller.
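Note that the reduction only shows up once the pruning wrappers are stripped from the model before saving and compressing it; a minimal sketch using tfmot, reusing the size helper sketched above (names are assumptions):
<pre>
import tensorflow_model_optimization as tfmot

# Remove the pruning wrappers so that only the (now sparse) weights remain,
# then measure the compressed size with the same helper used before pruning.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
gzipped_model_size(final_model)
</pre>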
 
The accuracy of the pruned model over the test dataset:
===Freezing the computational graph===
Freezing the model means producing a single file containing both the graph definition and the checkpoint variables, with these variables saved as constants within the graph structure. This eliminates additional information stored in the checkpoint files, such as the gradients at each point, which are included so that the model can be reloaded and training can resume from a previously saved point. As this is not needed when serving a model purely for inference, it is discarded during freezing.
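A minimal sketch of how a Keras model can be frozen in TensorFlow 2 by converting its variables to constants; the variable and file names follow the earlier sketches and are assumptions:
<pre>
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Wrap the Keras model in a concrete function and freeze its variables into constants.
full_model = tf.function(lambda x: final_model(x))
concrete_func = full_model.get_concrete_function(
    tf.TensorSpec(final_model.inputs[0].shape, final_model.inputs[0].dtype))
frozen_func = convert_variables_to_constants_v2(concrete_func)

# The frozen GraphDef can be serialized to a single file for inference-only serving.
tf.io.write_graph(frozen_func.graph, logdir=".", name="frozen_graph.pb", as_text=False)

print(f"total nodes : {len(frozen_func.graph.as_graph_def().node)}")
</pre>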
The total number of nodes in the frozen computational graph:
<pre>
total nodes : 56
</pre>
 
A much more detailed description of the computational graph, showing all the nodes and the corresponding operations, is provided as follows:
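A minimal sketch of how such a per-node listing can be generated from the frozen graph; the frozen_func name follows the earlier freezing sketch and is an assumption:
<pre>
# Print every node of the frozen graph together with its operation type.
for node in frozen_func.graph.as_graph_def().node:
    print(f"{node.name} : {node.op}")
</pre>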