CN109902824A - A method for improving generative adversarial network training with adaptive control learning - Google Patents
A method for improving generative adversarial network (GAN) training with adaptive control learning
- Publication number
- CN109902824A CN109902824A CN201910157777.1A CN201910157777A CN109902824A CN 109902824 A CN109902824 A CN 109902824A CN 201910157777 A CN201910157777 A CN 201910157777A CN 109902824 A CN109902824 A CN 109902824A
- Authority
- CN
- China
- Prior art keywords
- training
- gan
- value
- adaptive control
- adversarial network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention belongs to the field of adversarial network technology and discloses a method for improving generative adversarial network (GAN) training with adaptive control learning: an adaptive hyperparameter learning process for GANs that improves training stability across different data sets, thereby guaranteeing the quality of the generated data (such as images or text). This is achieved by using the well-trained learning curves obtained on a data set with relatively simple classes and modes to guide the training process on data sets with relatively complex classes and modes. The invention also analyzes adaptive generative adversarial network (Ak-GAN) models built on multilayer perceptron (MLP) and deep convolutional neural network (DCGAN) architectures. The invention genuinely improves the stability of general GAN training and generalizes well to various improved models and data sets. Future work plans to analyze better generated-sample metrics, which may encourage GAN convergence, and to analyze the effect of the proposed adaptive control mechanism on GAN multimodal learning.
Description
Technical field
The invention belongs to the field of adversarial network technology, and in particular relates to a method for improving generative adversarial network training with adaptive control learning.
Background art
Generative adversarial networks (GANs) can effectively synthesize samples for various applications, such as image generation, industrial design, speech synthesis, and natural language processing. The goal of a GAN is to alternately train a generator model G and a discriminator model D. The G and D of a GAN are usually multi-layer perceptrons (MLPs) or convolutional neural networks (CNNs). The generator receives a noise prior and learns to simulate the real data distribution; the simulated distribution is called the generated data distribution. The discriminator is trained to extract the features that distinguish real data. More specifically, the discriminator judges the probability that a datum comes from the real data distribution; the error between the output probability and the true label is used to guide the parameter updates of the discriminator and generator networks. The process alternates until D can no longer distinguish real from generated data.
In practice, GANs are notorious for unstable behavior and mode collapse, especially when trained on data sets containing diverse visual objects. The two main complicating problems are: 1) the capabilities of the generator and discriminator models grow unevenly during training, which prevents the generator from effectively learning the real data distribution; 2) there is no computable, interpretable measure of convergence.
Recently, some domestic and foreign research and inventions have focused on how to improve the stability of the original GAN. The strategies they adopt are mainly heuristic: the real training data distribution is simplified in the early stage, and the training of G is guided gradually. For example, in the field of image generation, the GAN is first allowed to train on blurred or smaller images, and then the blurred images are gradually sharpened or the image size is gradually enlarged. Such heuristic learning methods can stabilize training to some extent. However, this blur-to-sharp or small-to-large process does not depend on the balanced growth (competition) of the abilities of G and D, but rather on a predefined progressive schedule. Such methods are extremely sensitive to different data sets and GAN model variants, and are difficult to apply to real application scenarios.
Unlike existing research and inventions, the present invention proposes a method that adapts to the data set and dynamically adjusts hyperparameters to guide the learning course of the generator and discriminator, thereby solving the instability problem of GAN training.
A GAN is easy to train successfully on a simple data set: the abilities of G and D grow or shrink in balance, G ultimately achieves the goal of generating photorealistic images, and D ultimately cannot accurately distinguish real images from generated ones. The adaptive parameter-tuning method based on reference values proposed by this project is built on the following observation: when a normal GAN is trained on a simple data set, the two probability outputs of D (P_r and P_g) follow a trend similar to Fig. 6(a), while the loss outputs of G and D (L_G and L_D) follow a trend similar to Fig. 6(b). Fig. 6(c) illustrates the training curves of a failed GAN run.
Based on this observation, this project uses the two groups of curves obtained by training a GAN on MNIST as reference lines: one group shows the two probability outputs of D (P_mr and P_mg); the other shows the loss values of G and D (L_mG and L_mD). These reference curves, combined with a control algorithm, guide the GAN training process on complex images.
In summary, the problems of the existing technology are as follows:

Traditional GAN training is unstable and suffers from mode collapse, especially when training on data sets containing various complex visual objects.

In the prior art, the two main complicating problems are: the model capabilities of the generator and discriminator grow unevenly during training, which prevents effective learning by the generator; and there is no computable, interpretable measure of convergence.
Practical significance of the invention:

An adaptive GAN training method can expand the application fields of GANs and accelerate their deployment. This is reflected in the following aspects: 1) improved model performance — the model can better learn the real data distribution, so generated content is more lifelike; 2) adaptive, interpretable artificial intelligence — today's AI, represented by deep neural networks, lacks generality and adaptivity to data and environment; the adaptive GAN training method for different data sets studied by this project can promote adaptation to data within a certain range and provide theoretical support for general artificial intelligence.
Summary of the invention
In view of the problems in the prior art, the present invention provides a method for improving generative adversarial network training with adaptive control learning.
The invention is realized as follows. The method for improving generative adversarial network training with adaptive control learning comprises the following steps:

Step 1: Train any GAN model on a relatively simple data set until convergence. Record the batch number at convergence (e.g., 500).

Step 2: Output the loss values or decision probability values produced at every batch interval from the first batch to the convergence batch, and fit these values with a curve-fitting method. The fitted values serve as the reference loss values (L_mG and L_mD) or the reference probability values (P_mr and P_mg).

Step 3: Train the GAN on a relatively complex data set. Every c batches (the iteration coefficient), compare the current output value (one of L_G, L_D, P_r, P_g) with the reference value. If the difference exceeds a preset threshold (α), adjust the value of k as appropriate and then train G and D; if the difference is within the threshold, train G and D without changing k.

Step 4: Repeat Step 3 until all reference values have been compared.
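The four steps above can be sketched as a control loop. The following is a minimal illustration under stated assumptions, not the patented implementation: the `observe` and `train_round` callables and the synthetic curves are hypothetical stand-ins, and the k adjustment shown is only the simple increment/decrement rule the description gives for k ≥ 1.

```python
def guided_training(baseline, observe, train_round, n_batches, alpha=0.2, c=10):
    """Sketch of Steps 3-4: every c batches, compare the observed value
    (e.g. P_g on the complex data set) with the fitted reference value
    and adjust the D:G step ratio k when the deviation exceeds alpha."""
    k = 1  # traditional GAN default: one D step per G step
    k_history = []
    for batch in range(n_batches):
        value = observe(batch)       # current P_g (or L_D)
        train_round(k)               # train D for k steps, G for one step
        if batch % c == 0 and batch < len(baseline):
            diff = value - baseline[batch]
            if diff > alpha:         # D too weak: give D more steps
                k += 1
            elif diff < -alpha and k > 1:
                k -= 1               # D too strong: back toward alternation
            k_history.append(k)
    return k_history

# Hypothetical run: observed P_g sits far above the 0.5 reference early on.
ks = guided_training([0.5] * 100, lambda b: 0.9 if b < 50 else 0.55,
                     lambda k: None, n_batches=100)
```

In this toy run, k climbs while the deviation persists and then holds steady once the observed value falls back within the α band.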
Further, the ratio k of the training frequencies of D and G is used as the control variable; the value of k is dynamically adjusted to satisfy the constraints during training on different data sets. The following two inequality constraints are used to control k:

|P_g − P_mg| ≤ α,  |L_D − L_mD| ≤ α

where P_g is the probability that D classifies generated data as real data; P_mg is the probability obtained by training on MNIST, serving as the reference value guiding P_g; L_D is the loss value of D on the current training data; L_mD is the reference loss value from training on MNIST; and α is a predefined threshold.
Symbol definition
Further, the adaptive control method based on probability/loss values comprises:
Another object of the present invention is to provide an adaptive control learning generative adversarial network control system implementing the described method for improving generative adversarial network training with adaptive control learning.
Another object of the present invention is to provide a computer program for improving generative adversarial network training with adaptive control learning, the computer program implementing the method for improving generative adversarial network training with adaptive control learning of any one of claims 1-3.
Another object of the present invention is to provide a terminal carrying a controller for the described method for improving generative adversarial network training with adaptive control learning.
Another object of the present invention is to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the described method for improving generative adversarial network training with adaptive control learning.
Since the standard learning curves of P_g and L_D show the same trend during training, the adaptive control method based on P_g described above can also handle L_D. In the proposed Algorithm 1, when k is greater than 1, D is trained for k steps while G is trained for one step. If the current P_g or L_D exceeds the tolerance defined by threshold α, the number of D steps (denoted by k) is increased or decreased by 1 according to one of two cases: positive deviation (the current value exceeds the reference value by more than α) or negative deviation (the current value falls below the reference value by more than α). In the former case k is increased by 1; in the latter, decreased by 1. When k is 1, G and D are trained alternately, one step each. On a positive deviation, the D steps are increased by making k greater than 1 (k = k + 1); on a negative deviation, the G steps are increased by making k less than 1 (k = 1/(1/k + 1)). When k is less than 1, G trains 1/k steps for each D step; on a positive deviation the number of G steps is reduced by 1, otherwise it is increased by 1.
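The rule can be written as a single update function. A minimal sketch, assuming the printed k = 1/(k + 1) was intended as k = 1/(1/k + 1) (the two agree at k = 1, and only the latter keeps 1/k an integer number of G steps):

```python
def update_k(k, value, ref, alpha=0.2):
    """One adjustment of the D:G training-step ratio k.

    k >= 1: D trains k steps per G step; k < 1: G trains 1/k steps per D step.
    value is the current P_g (or L_D); ref is the reference-curve value.
    """
    if value - ref > alpha:        # positive deviation
        if k >= 1:
            k = k + 1              # one more D step
        else:
            k = 1 / (1 / k - 1)    # one fewer G step
    elif ref - value > alpha:      # negative deviation
        if k > 1:
            k = k - 1              # one fewer D step
        else:
            k = 1 / (1 / k + 1)    # one more G step
    return k                       # within tolerance: k unchanged

# k = 1 with P_g far above its reference -> D gets a second step (k = 2);
# k = 1 with P_g far below its reference -> G gets a second step (k = 1/2).
```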
The advantages and positive effects of the present invention are as follows:

The invention proposes an adaptive hyperparameter learning process for GANs that improves training stability across different data sets, thereby guaranteeing the quality of the generated images. This is achieved by using well-trained learning curves (reference lines) obtained on a relatively simple, homogeneous data set to guide the training process on scene data sets of different complexity. The invention also provides adaptive GAN models with MLP and DCGAN architectures and has carried out various confirmatory experiments. The results show that the method genuinely improves the stability of original GAN training and generalizes well to various GAN variant models and data sets of different scenarios. Future work plans to analyze better generated-sample metrics, which may encourage GAN convergence, and to analyze the effect of the proposed adaptive control mechanism on GAN multimodal learning so as to extend its applicable range.
The present invention improves on the traditional GAN as follows:

Because of the unbalanced growth of the abilities of D and G during training, traditional GAN training struggles to converge to a suitable data-generation capability, causing poor generation results on certain data sets (especially data sets with more complex application scenarios). The present invention clearly demonstrates that adaptively adjusting the training steps or intensity of D and G can improve performance.

Based on this observation, an adaptive control learning algorithm for GAN models is proposed (the resulting new model is called Ak-GAN), which dynamically adjusts the ratio (k value) of the training steps of the generator and discriminator.

A performance comparison between Ak-GAN and the traditional GAN shows the superiority of the algorithm in terms of generated image quality.
Symbol definition
The comparative experimental data are as follows:

The similarity between the generated data distribution and the real data distribution is measured with the industry-standard Inception Score, computed as:

IS = exp( E_{x~p_g} [ D_KL( p(y|x) || p(y) ) ] )

where x is data sampled from the generator's distribution (p_g), p(y|x) is the class distribution given a sample x, and p(y) is the marginal distribution of class y. D_KL(p(y|x) || p(y)) measures the KL divergence between p(y|x) and p(y). The Inception Score quantitatively measures two aspects of generator performance: the generated images must contain clear, recognizable objects, i.e., the entropy of p(y|x) should be low; at the same time, the images the generator produces must be diverse, i.e., the entropy of p(y) should be high. The better an image set scores on these two aspects, the larger the KL divergence between p(y|x) and p(y), and hence the higher the Inception Score.
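The score can be computed directly from the classifier's conditional class probabilities. A small sketch under stated assumptions: the matrix of p(y|x) rows is hypothetical input (in practice it comes from an Inception network applied to generated images).

```python
import numpy as np

def inception_score(p_yx):
    """Inception Score from an (n_samples, n_classes) matrix of p(y|x).

    IS = exp( mean_x KL( p(y|x) || p(y) ) ), where p(y) is the marginal
    class distribution averaged over the samples.
    """
    eps = 1e-12
    p_y = p_yx.mean(axis=0, keepdims=True)   # marginal p(y)
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

s_sharp = inception_score(np.eye(4))               # confident AND diverse: ~4
s_blurry = inception_score(np.full((4, 4), 0.25))  # uniform p(y|x): exactly 1
```

Confident, diverse predictions push the score toward the number of classes; unrecognizable (uniform) predictions pin it at 1.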
The following tables compare the performance of the present invention with the traditional GAN method; each score is the average over 50,000 generated images. The data sets are Anime and CelebA, and the generator and discriminator use the MLP architecture (Table 1) and the CNN architecture (Table 2). The proposed method surpasses the conventional method on both data sets.

Table 1. Performance (Inception Scores) of the traditional GAN and the proposed model (Ak-GAN) based on the MLP architecture.

Table 2. Performance (Inception Scores) of the traditional GAN and the proposed model (Ak-GAN) based on the CNN architecture.
Brief description of the drawings

Fig. 1 is a flowchart of the method for improving generative adversarial network training with adaptive control learning provided by an embodiment of the invention.

Fig. 2 shows the effect of the original GAN on data sets with different diversity levels, provided by an embodiment of the invention.

Fig. 3 shows the learning curves of the traditional GAN model, provided by an embodiment of the invention.

Fig. 4 compares the original GAN (MLP) and Ak-GAN (MLP) on the anime data set provided by an embodiment of the invention (the anime faces come from an image-board website of cartoon wallpapers, cropped so that the images contain only faces; the image size is 96x, and the data set consists of 51,223 color images).

Fig. 5 compares the original GAN (DCGAN) and Ak-GAN (DCGAN) on the anime data set provided by an embodiment of the invention.

Fig. 6 shows GAN training curves provided by an embodiment of the invention: (a) probability curves of a successful GAN training run, where p_real is the probability D outputs for real images and p_generated is the probability D outputs for generated images; (b) loss curves of a successful GAN training run, where D_Loss is the loss of D and G_Loss is the loss of G; (c) probability curves of a failed training run.

Fig. 7 shows the generated images corresponding to the scores in Table 1, provided by an embodiment of the invention.

Fig. 8 shows the generated images corresponding to the scores in Table 2, provided by an embodiment of the invention.
Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the embodiments. It should be understood that the specific embodiments described here serve only to illustrate the present invention, not to limit it.

Traditional GAN training is unstable and suffers from mode collapse, especially when training on data sets containing various visual objects.

In the prior art, the two main complicating problems are: the model capacities of the generator and discriminator grow unevenly during training, which prevents effective learning by the generator; and there is no computable, interpretable measure of convergence.
To solve the problems of the prior art, the application principle of the invention is described in detail below with reference to a concrete scheme.

As shown in Fig. 1, the method for improving generative adversarial network training with adaptive control learning provided by an embodiment of the invention comprises the following steps:

S101: Train any GAN model on a relatively simple data set until convergence; record the batch number at convergence (e.g., 500).

S102: Output the loss values or decision probability values produced at every batch interval from the first batch to the convergence batch, and fit these values with a curve-fitting method. The fitted values serve as the reference loss values (L_mG and L_mD) or the reference probability values (P_mr and P_mg).

S103: Train the GAN on a relatively complex data set. Every c batches (the iteration coefficient), compare the current output value (one of L_G, L_D, P_r, P_g) with the reference value. If the difference exceeds the preset threshold (α), adjust the value of k as appropriate and then train G and D; if the difference is within the threshold, train G and D without changing k.

S104: Repeat S103 until all reference values have been compared.
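Step S102's curve fitting can be as simple as a low-order polynomial fit over the recorded per-batch values. A sketch under stated assumptions: the degree-4 polynomial, numpy, and the synthetic decaying probability curve are illustrative choices, not specified by the invention.

```python
import numpy as np

def fit_baseline(values, degree=4):
    """Fit recorded training outputs (e.g. P_g per batch on MNIST) with a
    least-squares polynomial; the smoothed curve serves as the reference."""
    batches = np.arange(len(values))
    coeffs = np.polyfit(batches, values, degree)
    return np.polyval(coeffs, batches)

# Hypothetical noisy probability curve decaying toward 0.5 (convergence).
rng = np.random.default_rng(0)
raw = 0.5 + 0.4 * np.exp(-np.arange(500) / 100) + rng.normal(0, 0.02, 500)
smooth = fit_baseline(raw)   # reference values for the comparisons in S103
```

The fit suppresses batch-to-batch noise so that S103 compares against the trend rather than a single noisy reading.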
As a preferred embodiment of the present invention, the adaptive control method based on P_g or L_D provided by the invention is as follows:

Since the standard learning curves of P_g and L_D show the same trend during training, the same method is used to control both. In the proposed Algorithm 1, when k is greater than 1, D is trained for k steps while G is trained for one step. If the current P_g or L_D exceeds the tolerance defined by threshold α, the number of D steps (denoted by k) is increased or decreased by 1 according to one of two cases: positive deviation (the current value exceeds the reference value by more than α) or negative deviation (the current value falls below the reference value by more than α). In the former case k is increased by 1; in the latter, decreased by 1. When k is 1, G and D are trained alternately, one step each. On a positive deviation, the D steps are increased by making k greater than 1 (k = k + 1); on a negative deviation, the G steps are increased by making k less than 1 (k = 1/(1/k + 1)). When k is less than 1, G trains 1/k steps for each D step; on a positive deviation the number of G steps is reduced by 1, otherwise it is increased by 1.
To further describe the application principle of the concrete scheme, the invention is further described below with reference to an embodiment.

Embodiment:
1) Generative adversarial networks

A GAN estimates a generative model through an adversarial process in which the generator G and the discriminator D play a game. The input of D comes from two data distributions: real data and synthetic data, the latter produced by G. D is trained to maximize its ability to classify samples as real or synthetic, while G is trained to reduce, as far as possible, D's ability to recognize synthetic data. In the original framework, the training objective is defined as a minimax game:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]

where G is a function mapping an input noise variable p_z(z) to the generated data distribution P_g, and D is a function mapping the data space to a scalar value, each value denoting the probability that a particular sample comes from the real data distribution. The functions G and D constitute the GAN network; they are usually neural network models and are trained simultaneously on their objective functions. The loss functions L_G and L_D of G and D are defined as follows:

L_D = −(1/m) Σ_{i=1..m} [ log P(x^(i); θ_D) + log(1 − P(G(z^(i)); θ_D)) ]

L_G = −(1/m) Σ_{i=1..m} log P(G(z^(i)); θ_D)

where θ_D and θ_G are the parameter sets of D and G, x^(i) denotes real data, and G(z^(i)) is the synthetic datum generated by G after receiving random noise z^(i). Thus P(x^(i); θ_D) is the probability that D classifies the real datum x as real, and P(G(z^(i)); θ_D) is the probability that D classifies the generated datum G(z^(i)) as real. These are hereinafter referred to as P_r and P_g.
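With P_r and P_g so defined, the batch losses can be evaluated directly from the two probabilities. A minimal sketch, assuming the common non-saturating generator loss L_G = −log P_g:

```python
import math

def gan_losses(p_r, p_g):
    """GAN losses from D's two probability outputs for one sample pair.

    p_r: D's probability that a real sample is real.
    p_g: D's probability that a generated sample is real.
    Assumes the non-saturating generator loss L_G = -log p_g.
    """
    l_d = -(math.log(p_r) + math.log(1.0 - p_g))  # D penalized on both inputs
    l_g = -math.log(p_g)                          # small when D is fooled
    return l_d, l_g

# At the equilibrium the description mentions (P_r = P_g = 0.5):
l_d, l_g = gan_losses(0.5, 0.5)   # l_d = 2*ln 2 ≈ 1.386, l_g = ln 2 ≈ 0.693
```

These equilibrium values are the plateaus that the reference loss curves (L_mD, L_mG) approach on a data set where training succeeds.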
2) GANs have been trained for image synthesis on image collections with relatively homogeneous classes, including MNIST handwritten digits, bedroom scenes, and simple anime characters, as shown in Fig. 2(a) and Fig. 2(b). However, for image collections with relatively complex classes, GANs generally cannot produce satisfactory results. For example, when trained on the CIFAR-10 data set, the generated objects are poorly recognizable, as shown in Fig. 2(c) and Fig. 2(d).

Balancing the convergence of D and G is a considerable challenge. If one of them is trained too well, GAN training becomes unstable. In practice, D is usually overtrained, because on complex data sets D wins too easily at the start.
The training curves on MNIST and CIFAR-10 were tracked and recorded, as shown in Fig. 3. The figure depicts the curves, over the course of training, of the probabilities estimated by D that real and generated data come from the real data distribution (P_r and P_g, respectively). Fig. 3(a) shows GAN training reaching convergence on the MNIST data set. At the start there is a large gap between P_r and P_g; as training progresses, the gap gradually shrinks, and both finally converge to a probability of about 0.5. The good stability on MNIST is reflected in the rather high quality of the synthesized images, as shown in Fig. 2(a). By contrast, the poor visual results on CIFAR-10 can be explained by its unstable training curves, as shown in Fig. 3(b).
3) Adaptive training algorithm for GANs

A. Basic idea

The goal of the proposed method is to guide the training process on various data sets by using well-trained learning curves, so as to achieve convincing performance. The way the well-trained learning curves are obtained is described first; then an algorithm is proposed that constrains the difference between the current value of a control variable (for example, the posterior probabilities P_r and P_g, or the loss values L_D and L_G) and a reference value (the probability or loss of a training learning curve commonly regarded as well-behaved).

Because the original GAN training objective function is defined as a minimax two-player game, determining the convergence of G and D is very difficult: D and G each minimize their own loss, and such a game process is not guaranteed to finally reach convergence. Compared with the value at a single moment, the complete learning curve is usually an effective means of identifying whether the training process is benign. Therefore, the above learning curves are used as the reference metric for convergence.
For P_g and L_D the following inequality constraints are proposed, using the ratio (k) of training frequencies between D and G as the only control variable. Traditional GAN implementations set k to 1; the present invention dynamically adjusts the value of k to satisfy the stability control during training on different data sets. The constraint inequalities are as follows:

|P_g − P_mg| ≤ α,  |L_D − L_mD| ≤ α

where P_g is the measured probability that the data currently synthesized by the generator is lifelike; P_mg is the normal probability obtained by training on the MNIST data set; L_D is the loss value of D on the current training data; L_mD is the standard loss value of D when training on the MNIST data set; and α is the threshold limiting the degree of deviation between the current value and the standard value. Note that α can be set manually; extensive experimental results show that setting α to 0.2 is fairly reasonable.
B. Adaptive control based on P_g or L_D

Since the standard learning curves of P_g and L_D show the same trend during training, the same method is used to control both. In the proposed Algorithm 1, when k is greater than 1, D is trained for k steps while G is trained for one step. If the current P_g or L_D exceeds the tolerance defined by threshold α, the number of D steps (denoted by k) is increased or decreased by 1 according to one of two cases: positive deviation (the current value exceeds the reference value by more than α) or negative deviation (the current value falls below the reference value by more than α). In the former case k is increased by 1; in the latter, decreased by 1. When k is 1, G and D are trained alternately, one step each. On a positive deviation, the D steps are increased by making k greater than 1 (k = k + 1); on a negative deviation, the G steps are increased by making k less than 1 (k = 1/(1/k + 1)). When k is less than 1, G trains 1/k steps for each D step; on a positive deviation the number of G steps is reduced by 1, otherwise it is increased by 1.
The Ak-GAN of the invention is evaluated by applying it to GAN models with two different architectures, MLP and CNN.

Fig. 4 shows the experimental results of Ak-GAN based on the MLP architecture on different data sets. Compared with the samples generated by the traditional GAN model, the adaptive control technique enables the GAN to capture recognizable facial features such as face contours and eyes. However, these samples also tend to contain noise, and image sharpness still has room for improvement. Therefore, the proposed adaptive control scheme is also applied to the DCGAN architecture to further improve visual quality, and the results on the anime and CelebA data sets are compared with the traditional DCGAN. Fig. 5 shows that, compared with the traditional DCGAN, the Ak-GAN model generates higher-quality images and alleviates the mode collapse phenomenon.
The above is only a description of preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent replacements, and improvements made within the spirit and principles of the invention shall be included in the protection scope of the present invention.
Claims (7)
1. A method for improving generative adversarial network training with adaptive control learning, characterized in that the method adaptively adjusts the training iteration ratio of the generator network and the discriminator network, the method comprising the following steps:
Step 1: the training on relatively simple data set using any GAN model, until convergence, batch when record is restrained
Number;
Step 2: exporting the penalty values or decision probability value of first number to convergence all generations of batch number interval, and pass through
Curve-fitting method is fitted these values, and the value after fitting is as benchmark penalty values or baseline probability value;Benchmark penalty values
For LmGAnd LmD;Baseline probability value is PmrAnd Pmg;
Step 3: the training GAN on relative complex data set, in training process after every c batch number relatively and calculating is worked as
The difference of preceding output valve and a reference value optionally adjusts the instruction of G and D if the difference is more than pre-set threshold value (α)
Practice iteration ratio k value and training;If the difference in threshold value, does not do any change to k value, by original k value training G and D;
Step 4: operating procedure three repeatedly, until being finished compared with all a reference values.
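The four-step procedure of claim 1 can be sketched as the following loop. This is an illustrative sketch, not the patented implementation: the polynomial curve fit, the ±1 update rule for k, and the synthetic loss curve standing in for an MNIST run are all assumptions.

```python
import numpy as np

def fit_benchmark(values, interval=1, degree=3):
    """Step 2: fit the loss/probability values recorded on the simple
    dataset; the fitted curve supplies the benchmark values.
    (Polynomial fitting is an assumed choice of curve-fitting method.)"""
    x = np.arange(len(values)) * interval
    return np.poly1d(np.polyfit(x, values, degree))

def adaptive_ratio(k, current, benchmark, alpha):
    """Step 3: if the current output deviates from the benchmark by more
    than the threshold alpha, adjust the D:G training-iteration ratio k;
    otherwise keep k unchanged. (The +/-1 update rule is an assumption.)"""
    diff = current - benchmark
    if abs(diff) <= alpha:
        return k  # within threshold: keep the original k
    return max(1, k + (1 if diff > 0 else -1))

# Steps 1-2: record values while training on a simple dataset
# (a synthetic decaying loss curve stands in for a real MNIST run).
simple_losses = [1.0 / (1.0 + 0.1 * t) for t in range(50)]
benchmark = fit_benchmark(simple_losses)

# Steps 3-4: while training on a complex dataset, compare against the
# benchmark after every c batches and adapt k.
k, alpha, c = 1, 0.05, 10
for batch in range(0, 50, c):
    current_loss = simple_losses[batch] + 0.08  # pretend complex-dataset loss
    k = adaptive_ratio(k, current_loss, benchmark(batch), alpha)
```

The comparison step is deliberately cheap: only a scalar difference against a precomputed curve is needed at each checkpoint, so the control adds negligible overhead to training.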
2. The method for improving generative adversarial network training with adaptive control learning according to claim 1, characterized in that the ratio k of the training steps of D to G is used as the control variable; the value of k is dynamically adjusted to satisfy the constraints during training on different datasets; k is controlled using the following two inequality constraints:
|Pg − Pmg| ≤ α
|LD − LmD| ≤ α
where Pg is the probability that D classifies generated data as real data; Pmg is the corresponding probability obtained by training on MNIST, serving as the benchmark value guiding Pg; LD is the loss value of D on the current training data; LmD is the benchmark loss value obtained by training on MNIST; and α is a predetermined threshold.
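The two constraints of claim 2 amount to keeping the current run inside an α-band around the MNIST-derived benchmarks; k is adjusted only when the band is left. A minimal sketch follows; the absolute-difference form of the constraints is inferred from the claim's definitions, since the inequalities themselves are not reproduced in the text.

```python
def within_band(value, reference, alpha):
    """One claim-2 constraint: the current statistic must stay within an
    alpha-band around its MNIST benchmark. (Absolute-difference form is
    an assumption inferred from the surrounding definitions.)"""
    return abs(value - reference) <= alpha

def constraints_satisfied(p_g, p_mg, l_d, l_md, alpha):
    """Both constraints must hold for k to stay unchanged:
    |Pg - Pmg| <= alpha  and  |LD - LmD| <= alpha."""
    return within_band(p_g, p_mg, alpha) and within_band(l_d, l_md, alpha)

# Example: the discriminator probability has drifted beyond the band,
# so the training-iteration ratio k would be adjusted.
print(constraints_satisfied(p_g=0.30, p_mg=0.50, l_d=0.70, l_md=0.68, alpha=0.05))
```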
3. The method for improving generative adversarial network training with adaptive control learning according to claim 1, characterized in that the probability-based/loss-value-based adaptive control method comprises:
4. An adaptive-control-learning-improved generative adversarial network control system implementing the method for improving generative adversarial network training with adaptive control learning according to claim 1.
5. A computer program for improving generative adversarial network training with adaptive control learning, characterized in that the computer program implements the method for improving generative adversarial network training with adaptive control learning according to any one of claims 1 to 3.
6. A terminal, characterized in that the terminal carries a controller applying the method for improving generative adversarial network training with adaptive control learning according to any one of claims 1 to 3.
7. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the method for improving generative adversarial network training with adaptive control learning according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910157777.1A CN109902824A (en) | 2019-03-02 | 2019-03-02 | It is a kind of to generate confrontation network method with self adaptive control learning improvement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109902824A true CN109902824A (en) | 2019-06-18 |
Family
ID=66946174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910157777.1A Pending CN109902824A (en) | 2019-03-02 | 2019-03-02 | It is a kind of to generate confrontation network method with self adaptive control learning improvement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902824A (en) |
2019-03-02: CN CN201910157777.1A patent/CN109902824A/en, Status: Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111651642A (en) * | 2020-04-16 | 2020-09-11 | 南京邮电大学 | Improved TEXT-GAN-based flow data set generation method |
CN111651642B (en) * | 2020-04-16 | 2022-10-04 | 南京邮电大学 | Improved TEXT-GAN-based flow data set generation method |
CN111897809A (en) * | 2020-07-24 | 2020-11-06 | 中国人民解放军陆军装甲兵学院 | Command information system data generation method based on generation countermeasure network |
CN113592553A (en) * | 2021-08-02 | 2021-11-02 | 广西大学 | Cloud energy storage double-layer optimization control method of competition condition generation type countermeasure network |
CN113592553B (en) * | 2021-08-02 | 2023-10-17 | 广西大学 | Cloud energy storage double-layer optimization control method for competitive condition generation type countermeasure network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105184735B (en) | A kind of portrait deformation method and device | |
CN109902824A (en) | It is a kind of to generate confrontation network method with self adaptive control learning improvement | |
CN112199535B (en) | Image classification method based on integrated knowledge distillation | |
CN109214298B (en) | Asian female color value scoring model method based on deep convolutional network | |
CN108171770A (en) | A kind of human face expression edit methods based on production confrontation network | |
CN102567734B (en) | Specific value based retina thin blood vessel segmentation method | |
CN106991408A (en) | The generation method and method for detecting human face of a kind of candidate frame generation network | |
CN106778852A (en) | A kind of picture material recognition methods for correcting erroneous judgement | |
US11068746B2 (en) | Image realism predictor | |
TWI415011B (en) | Facial identification method and system using thereof | |
CN103810490A (en) | Method and device for confirming attribute of face image | |
CN105701468A (en) | Face attractiveness evaluation method based on deep learning | |
CN110458247A (en) | The training method and device of image recognition model, image-recognizing method and device | |
CN109773807B (en) | Motion control method and robot | |
CN111282272B (en) | Information processing method, computer readable medium and electronic device | |
CN110837326B (en) | Three-dimensional target selection method based on object attribute progressive expression | |
CN110176050B (en) | Aesthetic optimization method for text generated image | |
CN110598718A (en) | Image feature extraction method based on attention mechanism and convolutional neural network | |
Chen | Research on college physical education model based on virtual crowd simulation and digital media | |
CN104700416B (en) | The image segmentation threshold that view-based access control model understands determines method | |
CN116824584A (en) | Diversified image description method based on conditional variation transducer and introspection countermeasure learning | |
CN108734116B (en) | Face recognition method based on variable speed learning deep self-coding network | |
CN113657466B (en) | Pre-training model generation method and device, electronic equipment and storage medium | |
CN112419098B (en) | Power grid safety and stability simulation sample screening and expanding method based on safety information entropy | |
CN113744129A (en) | Semantic neural rendering-based face image generation method and system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190618 |