==Data augmentation with image synthesis==
In the previous articles, a similar lack of data was addressed with standard data augmentation: input pipelines applied image re-mapping operations on the fly, or OpenCV functions were used to enlarge the amount of data held in memory, much in the same fashion as the TensorFlow image APIs.
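For reference, a minimal sketch of such an on-the-fly augmentation pipeline built with the TensorFlow image APIs is shown below; the array names and the chosen transformations are illustrative, not the exact setup used in the previous articles.
<syntaxhighlight lang="python">
import tensorflow as tf

def augment(image, label):
    # Simple image re-mapping operations applied on the fly to each sample.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    return image, label

# 'train_images' and 'train_labels' are assumed to be in-memory NumPy arrays.
train_ds = (tf.data.Dataset.from_tensor_slices((train_images, train_labels))
            .shuffle(1024)
            .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(32)
            .prefetch(tf.data.AUTOTUNE))
</syntaxhighlight>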
However, although this is a good and very common way to increase model robustness for the classification task, each new image is generated directly from an already existing one: the created samples are always based on features, such as shapes, edges and contours, that the dataset already contains. For this reason, it is interesting to explore the possibility of generating new data based on the features of the entire original subset. This can be achieved through ML techniques for artificial image synthesis, in particular with the help of [https://en.wikipedia.org/wiki/Generative_adversarial_network Generative Adversarial Networks (GANs)].
===Generative Adversarial Networks===
===Progressive GAN implementation===
When using [https://towardsdatascience.com/progan-how-nvidia-generated-images-of-unprecedented-quality-51c98ec2cbd2 proGANs] to synthesize images at high resolution, instead of training all layers of the generator and discriminator at once to produce samples at the target resolution, as is usually done, the networks are initially created with a single block containing only a few layers that operate at a very low resolution. They are then progressively grown to output higher-resolution versions of the images by adding new blocks, one at a time, each after the training of the previous one has completed, as illustrated in the figure below.
[[File:Training progression.png|center|thumb|500x500px|ProGAN: training progression]]
This approach leads to a series of advantages:
*The incremental learning process greatly stabilizes training and reduces the chance of mode collapse, since at each stage the networks learn a much simpler piece of the overall problem.
*The low-to-high resolution progression forces the growing networks to focus on high-level structure first and fill in the details later, improving the quality of the final images.
*Increasing the network size gradually is more computationally efficient than the classic approach of using all the layers from the start (fewer layers are faster to train because there are fewer parameters).
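Concretely, a new block is not switched in abruptly: while it is being introduced, its output is blended with the upsampled output of the previous, lower-resolution stage as (1 − alpha) · old + alpha · new, with alpha growing from 0 to 1 during training. A minimal Keras layer implementing this blend could look as follows; this is a sketch assuming tf.keras, and the name ''WeightedSum'' is just a common convention, not a fixed API.
<syntaxhighlight lang="python">
import tensorflow as tf

class WeightedSum(tf.keras.layers.Layer):
    """Blends two tensors of the same shape: (1 - alpha) * old + alpha * new."""

    def __init__(self, alpha=0.0, **kwargs):
        super().__init__(**kwargs)
        # Kept as a non-trainable variable so the training loop can
        # gradually raise it from 0 to 1 during the fade-in phase.
        self.alpha = tf.Variable(alpha, trainable=False, dtype=tf.float32)

    def call(self, inputs):
        old, new = inputs  # upsampled old output, output of the new block
        return (1.0 - self.alpha) * old + self.alpha * new
</syntaxhighlight>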
To implement a proGAN, one possible solution is to pre-define all models prior to training and exploit the TensorFlow Keras functional API, ensuring that layers are shared across the models. This approach requires defining, for each resolution, two models for the generator and two for the discriminator: one named ''straight-through'' and the other named ''fade-in''. The latter, as the name suggests, implements the fade-in mechanism and is used to transition from a lower-resolution ''straight-through'' model to a higher-resolution one. The ''straight-through'' version has a plain architecture and its purpose is to fine-tune all the layers at a given resolution.
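The sketch below shows how such a pair of generator models could be grown with the functional API, reusing the ''WeightedSum'' layer defined above. The filter counts, the 4×4 base resolution and the helper names are illustrative assumptions, not the exact architecture; the discriminator pairs are built analogously, from the higher-resolution input down to the lowest block.
<syntaxhighlight lang="python">
from tensorflow.keras import layers, Model

LATENT_DIM = 128  # illustrative latent-vector size

def build_base_generator():
    """Generator for the lowest resolution (here 4x4)."""
    z = layers.Input(shape=(LATENT_DIM,))
    x = layers.Dense(4 * 4 * 128)(z)
    x = layers.Reshape((4, 4, 128))(x)
    x = layers.Conv2D(128, 3, padding='same')(x)
    x = layers.LeakyReLU(0.2)(x)
    rgb = layers.Conv2D(3, 1, padding='same')(x)   # "to-RGB" output layer
    return Model(z, rgb)

def grow_generator(old_model):
    """Return the (straight-through, fade-in) pair at twice the resolution.

    'old_model' is assumed to be the straight-through generator of the
    previous resolution, whose last layer is its to-RGB convolution.
    """
    block_end = old_model.layers[-2].output        # features just before the old to-RGB layer
    up = layers.UpSampling2D()(block_end)          # start of the new, shared block
    x = layers.Conv2D(128, 3, padding='same')(up)
    x = layers.LeakyReLU(0.2)(x)
    new_rgb = layers.Conv2D(3, 1, padding='same')(x)

    # Straight-through model: plain architecture at the new resolution.
    straight = Model(old_model.input, new_rgb)

    # Fade-in model: blend the upsampled old RGB output with the new block's output.
    old_rgb_up = layers.UpSampling2D()(old_model.output)
    blended = WeightedSum(alpha=0.0)([old_rgb_up, new_rgb])
    fade_in = Model(old_model.input, blended)
    return straight, fade_in

# Usage: grow the generator one resolution at a time, training each pair in turn.
g_4 = build_base_generator()
g_8_straight, g_8_fadein = grow_generator(g_4)     # both share the 4x4 layers with g_4
</syntaxhighlight>
Because the layer objects are created once and reused by both models of each pair, training the fade-in model and then the straight-through model updates the same shared weights, which is what makes the pre-defined set of models behave as a single progressively grown network.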