Model Interpretation: The Five Basic Structures of GAN

"There are three AIs" that first appeared in the official account of WeChat WeChat.

Generative adversarial networks (GANs) are the biggest advance in unsupervised learning in recent years and have been hailed as the next generation of deep learning. Both the research enthusiasm and the number of papers have approached, or even surpassed, those of traditional discriminative CNN architectures.

This article briefly introduces the mainstream model structures of generative adversarial networks, from a single generator and a single discriminator up to multiple generators and multiple discriminators.

Author | 言有三

Editor | 言有三

In this issue we will not cover GANs from scratch, so if you lack the relevant background, take a look at our introduction to GANs in the previous issue:

Technical Review | 有三 on GANs (Part I)

1. One Generator and One Discriminator

This is the structure of the basic GAN used for image generation.

The generator takes noise as input and outputs a generated image. The noise is usually a one-dimensional vector, which is reshaped into a two-dimensional feature map and then upsampled through several learned deconvolution layers.

For example, in the fully convolutional DCGAN model [1], the input is a 1×100 vector, which is first mapped through a learned fully connected layer and reshaped into a 4×4×1024 tensor, and then passed through four upsampling deconvolution layers to generate the image.
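
A rough sketch of such a generator (assuming PyTorch and the 64×64 output resolution used in the DCGAN paper; the layer widths are illustrative):

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """DCGAN-style generator: 100-d noise -> 64x64 RGB image."""
    def __init__(self, z_dim=100):
        super().__init__()
        # Fully connected layer maps the noise vector to a 4x4x1024 tensor.
        self.fc = nn.Linear(z_dim, 4 * 4 * 1024)
        self.net = nn.Sequential(
            # Four deconvolution (transposed convolution) upsampling stages:
            # 4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64
            nn.ConvTranspose2d(1024, 512, 4, stride=2, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 1024, 4, 4)  # reshape to a 4x4 feature map
        return self.net(x)

g = DCGANGenerator()
fake = g(torch.randn(8, 100))  # shape: [8, 3, 64, 64]
```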

The discriminator is an ordinary CNN classifier that takes real samples or generated fake samples as input and classifies them; in DCGAN it likewise consists of four convolution layers.
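
And a matching sketch of the discriminator (again assuming PyTorch; a four-stage strided-convolution classifier mirroring the generator above):

```python
import torch
import torch.nn as nn

class DCGANDiscriminator(nn.Module):
    """DCGAN-style discriminator: 64x64 RGB image -> real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # Four strided-convolution downsampling stages:
            # 64x64 -> 32x32 -> 16x16 -> 8x8 -> 4x4
            nn.Conv2d(3, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, stride=2, padding=1),
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1024, 4, stride=2, padding=1),
            nn.BatchNorm2d(1024), nn.LeakyReLU(0.2, inplace=True),
        )
        self.fc = nn.Linear(1024 * 4 * 4, 1)  # single real/fake logit

    def forward(self, x):
        h = self.net(x).flatten(1)
        return self.fc(h)
```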

2. One Generator and Multiple Discriminators

Using multiple discriminators [2] brings benefits similar to boosting. Training a single discriminator that is too strong harms the generator's performance, which is a major problem facing GANs. If we instead train several discriminators that are not so strong and then boost them, we can achieve good results, and techniques such as dropout can even be applied.

Multiple discriminators can also divide the work: for example, in image classification one can perform coarse-grained classification while another performs fine-grained classification, and in speech tasks they can handle different channels.
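
A minimal sketch of the multi-discriminator idea (assuming PyTorch; GMAN [2] actually aggregates with a tunable softmax-weighted mean, simplified here to a plain average over several hypothetical weak discriminators):

```python
import torch
import torch.nn as nn

def multi_discriminator_g_loss(generator, discriminators, z,
                               bce=nn.BCEWithLogitsLoss()):
    """Generator loss against an ensemble of weak discriminators."""
    fake = generator(z)
    # Each weak discriminator scores the fake batch independently.
    logits = [d(fake) for d in discriminators]
    real_label = torch.ones_like(logits[0])
    # Boosting-like aggregation: average the adversarial losses.
    return sum(bce(l, real_label) for l in logits) / len(logits)
```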

3. Multiple Generators and One Discriminator

Generally speaking, the generator's task is harder than the discriminator's, because it has to fit the probability density of the data while the discriminator only needs to discriminate. This leads to a problem that hurts GAN performance: mode collapse, i.e., generating highly similar samples.

Using multiple generators with a single discriminator [3] can effectively alleviate this problem.

As can be seen from the structure above, the multiple generators adopt the same architecture and share weights in the shallow layers of the network.
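
A minimal sketch of this weight-sharing scheme (assuming PyTorch; the layer sizes are illustrative and not taken from the paper):

```python
import torch
import torch.nn as nn

class SharedShallowGenerators(nn.Module):
    """K generators with a shared shallow stage and separate deep stages."""
    def __init__(self, z_dim=100, k=4):
        super().__init__()
        # Shallow layers: a single module shared by every generator.
        self.shared = nn.Sequential(
            nn.Linear(z_dim, 4 * 4 * 256), nn.ReLU(True),
        )
        # Deep layers: one independent head per generator, identical structure.
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(True),
                nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),
            )
            for _ in range(k)
        ])

    def forward(self, z, i):
        h = self.shared(z).view(-1, 256, 4, 4)
        return self.heads[i](h)  # 16x16 image from generator i
```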

4. Generator, Classifier, and Discriminator

In semi-supervised image classification with GANs, the discriminator has to play two roles at once: discriminating fake samples and predicting class labels, which places high demands on it. Adding a separate classifier shares the discriminator's workload: the task of capturing the conditional distribution of samples and labels is handed to the generator and the classifier, while the discriminator focuses solely on distinguishing real samples from generated ones.

This structure is represented by the Triple Generative Adversarial Network (Triple-GAN) [4], whose architecture is shown in the figure below.
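
A minimal structural sketch of the three players (assuming PyTorch; `generator`, `classifier`, and `discriminator` are hypothetical placeholder networks, and Triple-GAN's actual training objective is not reproduced here):

```python
import torch
import torch.nn as nn

class TripleGAN(nn.Module):
    """Three players: G generates x from (z, y), C predicts y from x,
    and D only judges whether an (x, y) pair is real."""
    def __init__(self, generator, classifier, discriminator):
        super().__init__()
        self.g = generator        # models p(x | y)
        self.c = classifier       # models p(y | x)
        self.d = discriminator    # real/fake on (x, y) pairs only

    def fake_pair(self, z, y):
        """Sample a pair from the generator's conditional distribution."""
        return self.g(z, y), y

    def pseudo_pair(self, x):
        """Pair a real image with the classifier's predicted label."""
        return x, self.c(x).argmax(dim=-1)

    def judge(self, x, y):
        """The discriminator's single job: is this (x, y) pair real?"""
        return self.d(x, y)
```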

5. Multiple Generators and Multiple Discriminators

5.1 Cascade Structure [5]

In the early days, the images generated by networks such as DCGAN had low resolution and poor quality, all below 100×100, typically 32×32 or 64×64. This is because generating a high-resolution sample in one shot is difficult, and the convergence process is prone to instability.

Similar problems exist in image segmentation and object detection. In object detection, cascade networks are widely used to improve detector performance from coarse to fine. In image segmentation, upsampling likewise learns small magnification factors rather than large ones: for example, using two ×2 upsampling layers instead of a single ×4 upsampling layer not only enhances the expressive power of the network but also reduces the learning difficulty.

Based on this, the pyramid GAN structure was proposed and is now widely used. It borrows the pyramid structure from the image-processing field to generate images step by step from coarse to fine, learning residuals along the way.

The figure above shows its structure: starting from the low-resolution noise z3, the resolution rises level by level until the final image I0 is generated, forming a pyramid shape.
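
A minimal sketch of this coarse-to-fine sampling loop (assuming PyTorch; `generators` is a hypothetical list ordered from coarse to fine, each level predicting a residual over the upsampled image, as in LAPGAN [5]):

```python
import torch
import torch.nn.functional as F

def pyramid_sample(generators, z_list):
    """Coarse-to-fine generation: each level upsamples the previous image
    and adds a learned residual, as in a Laplacian pyramid."""
    # Coarsest level: generate a small image directly from noise.
    img = generators[0](z_list[0])  # e.g. an 8x8 image
    for g, z in zip(generators[1:], z_list[1:]):
        # Double the resolution, then let the generator fill in the detail.
        up = F.interpolate(img, scale_factor=2, mode="bilinear",
                           align_corners=False)
        img = up + g(up, z)  # generator predicts only the residual
    return img
```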

5.2 Parallel and Cyclic Structure [6]

One of the major applications of GANs is stylization, which realizes style transfer between two domains, and CycleGAN [6] is a typical representative. It contains multiple generators and multiple discriminators. The typical cyclic structure is as follows:

X and Y represent images from the two domains. As can be seen, there are two generators, G and F, which map X to Y and Y to X respectively, and two discriminators, Dx and Dy. Moreover, the loss adds a cycle-consistency term on top of the adversarial losses; interested readers can study the paper carefully.
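
A minimal sketch of the cycle-consistency term (assuming PyTorch; `G` maps domain X to Y and `F` maps Y back to X, with the L1 norm used as the reconstruction penalty, as in the paper):

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(G, F, x, y, l1=nn.L1Loss()):
    """Cycle loss: x -> G(x) -> F(G(x)) should recover x, and
    y -> F(y) -> G(F(y)) should recover y."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)
```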

In addition, multi-generator, multi-discriminator structures are often used for cross-domain learning, with one generator-discriminator pair per domain. The generators of the different domains, and likewise the discriminators, usually share some weights; the figure below shows the network structure of CoGAN [7].
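
Complementing the shared-trunk generators sketched in section 3, here is a minimal sketch of the discriminator side of this sharing (assuming PyTorch and 64×64 inputs; in CoGAN the generators share their shallow layers while the discriminators share their deep layers):

```python
import torch
import torch.nn as nn

class CoupledDiscriminators(nn.Module):
    """Two domain discriminators sharing deep weights (CoGAN-style)."""
    def __init__(self):
        super().__init__()
        def shallow():  # per-domain shallow layers (kept separate)
            return nn.Sequential(
                nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            )
        self.shallow_a, self.shallow_b = shallow(), shallow()
        # Deep layers: a single module shared across both domains.
        self.shared = nn.Sequential(
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 1),  # assumes 64x64 inputs
        )

    def forward(self, x_a, x_b):
        # Returns one real/fake logit per domain.
        return self.shared(self.shallow_a(x_a)), self.shared(self.shallow_b(x_b))
```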

In addition, there are some scattered structures, such as 3D GAN and RNN GAN; these are variants of the above categories and will not be introduced separately.

References

[1] Radford A, Metz L, Chintala S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks[C]. International Conference on Learning Representations, 2016.

[2] Durugkar I, Gemp I, Mahadevan S. Generative Multi-Adversarial Networks[C]. International Conference on Learning Representations, 2017.

[3] Ghosh A, Kulharia V, Namboodiri V P, et al. Multi-Agent Diverse Generative Adversarial Networks[C]. Computer Vision and Pattern Recognition, 2018: 8513-8521.

[4] Li C, Xu T, Zhu J, et al. Triple Generative Adversarial Nets[C]. Neural Information Processing Systems, 2017: 4088-4098.

[5] Denton E L, Chintala S, Szlam A, et al. Deep Generative Image Models Using a Laplacian Pyramid of Adversarial Networks[C]. Neural Information Processing Systems, 2015: 1486-1494.

[6] Zhu J Y, Park T, Isola P, et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks[C]. International Conference on Computer Vision, 2017: 2242-2251.

[7] Liu M Y, Tuzel O. Coupled Generative Adversarial Networks[C]. Neural Information Processing Systems, 2016: 469-477.

Complete catalogue of the series:

Model Interpretation: From LeNet to VGG, a look at the "convolution + pooling" family of network structures

Model Interpretation: Do you really understand the 1×1 convolutions in Network in Network?

Model Interpretation: Do you really understand the Inception structure in GoogLeNet?

Model Interpretation: MobileNets, the benchmark model for mobile devices

Model Interpretation: Where did pooling go?

Model Interpretation: Are you sure you really understand the residual connections in ResNet?

Model Interpretation: "Irregular" convolutional neural networks

Model Interpretation: What are the advantages of "fully connected" convolutional networks?

Model Interpretation: Neural networks from "local connections" to "full connections"

Model Interpretation: Can a deep learning network have only one input?

Model Interpretation: From 2D convolution to 3D convolution, what is the difference?

Model Interpretation: From RNN to LSTM