In this particular case, a good compromise between compression and accuracy drop is to prune only the two dense layers of the model, which hold most of its parameters. The pruning schedule starts at epoch 0 and ends at one third of the total number of training epochs (i.e. 100 epochs), ramping from an initial sparsity of 50% to a final sparsity of 80%, with a pruning frequency of 5 steps (i.e. the model is pruned every 5 steps during training).
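To see how the target sparsity evolves between those two endpoints, here is a minimal sketch of a polynomial sparsity schedule. It assumes the cubic ramp used by `tfmot.sparsity.keras.PolynomialDecay`; the `end_step` value is a placeholder, since the actual number of steps per epoch depends on the dataset and batch size.

```python
def polynomial_sparsity(step, begin_step=0, end_step=1000,
                        initial_sparsity=0.5, final_sparsity=0.8, power=3):
    """Target sparsity at a given training step, ramping from
    initial_sparsity to final_sparsity with a cubic (power=3) curve."""
    if step < begin_step:
        return 0.0  # assumption: no pruning before the schedule begins
    progress = min(1.0, (step - begin_step) / (end_step - begin_step))
    # At progress=0 this yields initial_sparsity; at progress=1, final_sparsity.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power

# With a pruning frequency of 5, the mask is only recomputed every 5 steps:
def should_prune(step, begin_step=0, frequency=5):
    return step >= begin_step and (step - begin_step) % frequency == 0
```

The cubic exponent makes the sparsity rise quickly at first and level off near the end, giving the network time to recover accuracy at the highest sparsity levels.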
The weight sparsity of the model after applying pruning:
<pre>