CN108171320A - Image domain conversion network and conversion method based on a generative adversarial network - Google Patents

Image domain conversion network and conversion method based on a generative adversarial network

Info

Publication number
CN108171320A
Authority
CN
China
Prior art keywords
network
image
input
true
false
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711273921.5A
Other languages
Chinese (zh)
Other versions
CN108171320B (en)
Inventor
肖锋
白猛猛
冯飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Technological University
Original Assignee
Xian Technological University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Technological University filed Critical Xian Technological University
Priority to CN201711273921.5A priority Critical patent/CN108171320B/en
Publication of CN108171320A publication Critical patent/CN108171320A/en
Application granted granted Critical
Publication of CN108171320B publication Critical patent/CN108171320B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Abstract

The invention discloses an image domain conversion network and conversion method based on a generative adversarial network. The network comprises a U-shaped generator network, a real/fake discriminator network, and a pairing discriminator network, and the image domain conversion process mainly includes the following steps: 1) train the U-shaped generator network and establish the network model of the U-shaped generator network; 2) normalize the image to be converted and input it into the network model established in step 1) to complete the image domain conversion of the image to be converted. The present invention can realize the domain conversion task for local regions within an image, achieves high conversion quality for local image domains, provides strong network judgement and strong stability of image conversion, and greatly improves the realism of the generated images.

Description

Image domain conversion network and conversion method based on a generative adversarial network
Technical field
The present invention relates to the technical field of image domain conversion, and more particularly to an image domain conversion network and conversion method based on a generative adversarial network.
Background art
Image domain conversion is an important research direction in computer vision and has broad application prospects. At present, the emergence of generative adversarial networks (GANs) has achieved remarkable results in the field of image generation, which also provides a new solution for image domain conversion. Using a generative adversarial network, an input image is converted by a generator network into an image of the target domain, and the training of the network is completed through the game between the generator network and the discriminator network. When first proposed, the generative adversarial network was an unsupervised learning method: through the game between the generator network and the discriminator network, it gradually learns the data distribution of the training set, so that the generator network can take a random value as input and generate data at random according to the learned data distribution; its earliest application was image generation. Later, the Conditional GAN added an artificial condition to the input of the GAN, so that the generated data is no longer produced at random but varies with the input condition. The proposal of the Conditional GAN made it possible to carry out image domain conversion with a generative adversarial network.
The Conditional GAN evolved as an improvement of the original GAN. The Conditional GANs currently under study add an artificial condition to the input of the GAN, so that specific image data can be generated for a specific input rather than arbitrary data for an arbitrary input. The proposal of the Conditional GAN allows image domain conversion to be carried out with a generative adversarial network: under its framework, the input is a source-domain image, and through training the network learns to output the corresponding target-domain image. Image domain conversion GANs realized under this framework include: (1) pix2pix GAN, which uses a supervised method based on one generator network and one adversarial discriminator network and solves the whole-image domain conversion task; (2) Cycle GAN, which uses an unsupervised method with two generator networks and two adversarial networks, training the generator networks and the adversarial discriminator networks with a cycle-consistency loss to realize image domain conversion. Although the unsupervised method no longer requires one-to-one paired training data, its conversion effect is worse than that of the supervised pix2pix network, and it still addresses whole-image domain conversion. Among existing image domain conversion methods, there is no dedicated GAN for the domain conversion task of local regions within an image.
Summary of the invention
The object of the present invention is to provide an image domain conversion network and conversion method based on a generative adversarial network that can realize the domain conversion task for local regions within an image, with high conversion quality for local image domains, strong network judgement, and strong stability of image conversion, greatly improving the realism of the generated images.
The technical solution adopted by the present invention is as follows:
An image domain conversion network based on a generative adversarial network comprises a U-shaped generator network, a real/fake discriminator network, and a pairing discriminator network. The U-shaped generator network comprises an encoding network and a decoding network; the input of the encoding network receives the input image Input, the output of the encoding network is connected to the input of the decoding network, and the output of the decoding network produces the network-generated image Output. Let the true target-domain image paired one-to-one with the input image Input be the target-domain image target. The network-generated image Output serves as the training negative sample of the real/fake discriminator network and is fed to its negative-sample input; the target-domain image target serves as the training positive sample of the real/fake discriminator network and is fed to its positive-sample input; the value output by the real/fake discriminator network is fed back, as the real/fake loss value, to the real/fake loss input of the decoding network. The network-generated image Output together with the corresponding input image Input is fed, as a training negative sample, to the negative-sample input of the pairing discriminator network; the target-domain image target together with the corresponding input image Input is fed, as a training positive sample, to the positive-sample input of the pairing discriminator network; the value output by the pairing discriminator network is fed back, as the pairing loss value, to the pairing loss input of the decoding network. The structural similarity value between the network-generated image Output and the target-domain image target is fed back, as the compensation loss value, to the compensation loss input of the decoding network.
The encoding network comprises eight convolutional layers; each convolutional layer has a 3*3 convolution kernel and a stride of 2*2 and consists of a convolution layer, a Batch Normalization layer, and a Leaky ReLU activation layer whose alpha parameter is 0.2. The decoding network comprises eight deconvolutional layers; each deconvolutional layer has a 3*3 deconvolution kernel and a stride of 2*2 and consists of a deconvolution layer, a Batch Normalization layer, and an activation layer, where the first to seventh deconvolutional layers use ReLU activation layers and the eighth deconvolutional layer uses a tanh activation layer.
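By way of illustration only, the following is a minimal PyTorch sketch of such an encoder and decoder (PyTorch itself, the module names, and the intermediate channel widths are assumptions; the patent fixes only the 3*3 kernels, the 2*2 strides, the layer counts, the activations, and the 1*1*1024 bottleneck described later). The skip connections and Dropout introduced later in the description are omitted here and sketched alongside steps a) and b) below.

```python
import torch
import torch.nn as nn

def enc_block(c_in, c_out):
    # 3*3 convolution, stride 2*2, Batch Normalization, Leaky ReLU (alpha = 0.2)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2))

def dec_block(c_in, c_out, last=False):
    # 3*3 deconvolution, stride 2*2, Batch Normalization,
    # ReLU activation (tanh on the eighth, final layer)
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=3, stride=2,
                           padding=1, output_padding=1),
        nn.BatchNorm2d(c_out),
        nn.Tanh() if last else nn.ReLU())

# Eight encoder layers: a 256*256*3 image down to a 1*1*1024 feature image.
widths = [3, 64, 128, 256, 512, 512, 512, 1024, 1024]  # intermediate widths assumed
encoder = nn.Sequential(*[enc_block(widths[i], widths[i + 1]) for i in range(8)])

# Eight decoder layers back up to a 256*256*3 network-generated image.
decoder = nn.Sequential(*[dec_block(widths[8 - i], widths[7 - i] if i < 7 else 3,
                                    last=(i == 7)) for i in range(8)])

x = torch.randn(2, 3, 256, 256)   # a batch of normalized input images Input
out = decoder(encoder(x))         # network-generated image Output, 256*256*3
```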
The real/fake discriminator network comprises multiple real/fake discriminating convolutional layers connected in sequence; each real/fake discriminating convolutional layer consists of a convolution layer, a Batch Normalization layer, and an activation layer, where the activation layer of the last real/fake discriminating convolutional layer uses the Sigmoid activation function and the activation layers of the other real/fake discriminating convolutional layers use the ReLU function.
The pairing discriminator network comprises a Concat layer followed by multiple pairing discriminating convolutional layers connected in sequence; each pairing discriminating convolutional layer consists of a convolution layer, a Batch Normalization layer, and an activation layer, where the activation layer of the last pairing discriminating convolutional layer uses the Sigmoid activation function and the activation layers of the other pairing discriminating convolutional layers use the ReLU function.
An image domain conversion method based on a generative adversarial network comprises the following steps:
1) Train the U-shaped generator network and establish the network model of the U-shaped generator network, specifically comprising the following steps:
A. Collect a training image set for the domain to be converted; the training image set contains one-to-one paired source-domain images and target-domain images. The source-domain images in the training set are normalized; a normalized image serves as the input image Input during network training, and the target-domain image in the training set is the target-domain image target corresponding to the input image Input;
B. Pass the input image Input obtained in step A through the U-shaped generator network to convert it into the network-generated image Output of the training network;
C. Use the input image Input, the target-domain image target, and the network-generated image Output obtained in steps A and B to carry out the training of the multi-adversarial discriminator networks: this training comprises the training of the real/fake discriminator network and the training of the pairing discriminator network, wherein the training of the real/fake discriminator network includes the following steps:
C11: Initialize the network weights of the real/fake discriminator network using a random initialization method;
C12: Use the network-generated image Output as the negative sample and the target-domain image target corresponding to the input image Input as the positive sample to train the real/fake discriminator network, and update the network weights of the real/fake discriminator network using the cross-entropy loss function and the Adam optimization algorithm;
The training of the pairing discriminator network includes the following steps:
C21: Initialize the network weights of the pairing discriminator network using a random initialization method;
C22: Use the network-generated image Output together with the corresponding input image Input as the negative sample, and the input image Input together with the corresponding target-domain image target as the positive sample, to train the pairing discriminator network, and update the network weights of the pairing discriminator network using the cross-entropy loss function and the Adam optimization algorithm;
D. Repeat step C; after the multi-adversarial discriminator networks have been trained twice, fix the network weights of the real/fake discriminator network and the pairing discriminator network;
E. Use the multi-adversarial discriminator networks obtained after the training of step D to train the U-shaped generator network, specifically comprising the following steps:
E1: Initialize the network weights of the U-shaped generator network using the Xavier random initialization method;
E2: Input the network-generated image Output into the real/fake discriminator network; the real/fake discriminator network outputs the real/fake loss value, and the output real/fake loss value is fed back to the decoding network in the U-shaped generator network to update its network weights. The real/fake discriminator network outputs a 30*30*1 image used to return the loss value of how close the network-generated image Output is to a true image, where each image pixel value output by the real/fake discriminator network ranges from 0 to 1: a pixel value closer to 1 indicates that, within that pixel's receptive field region, the discriminated image is closer to a true image, and a pixel value closer to 0 indicates that, within that pixel's receptive field region, it is further from a true image;
E3: Input the input image Input and the corresponding network-generated image Output into the pairing discriminator network; the pairing discriminator network outputs the pairing loss value, and the output pairing loss value is fed back to the decoding network in the U-shaped generator network to update its network weights. The pairing discriminator network outputs a 30*30*1 image used to return the loss value of whether the input image Input and the network-generated image Output are paired in the way the input image Input and the target-domain image target are, where each image pixel value output by the pairing discriminator network ranges from 0 to 1: a value closer to 1 indicates that the input image Input and the network-generated image Output are better matched, and a value closer to 0 indicates a worse match;
E4: Compute the structural similarity value between the network-generated image Output and the target-domain image target, and feed the computed structural similarity value back to the decoding network in the U-shaped generator network as a loss to update its network weights. The structural similarity value comprises the computation result of the SSIM loss function and the computation result of L1 regularization, where the SSIM loss function derives from the SSIM algorithm: the output value SSIM(x, y) of the SSIM algorithm expresses the similarity between two images, i.e. the structural similarity between the input image x and the target-domain image y; SSIM(x, y) ranges from -1 to 1, a value close to 1 indicates that the similarity of the two images is higher, and when the input image x and the target-domain image y are identical the value of SSIM(x, y) equals 1;
The calculation formula of the SSIM algorithm output value is as follows:

SSIM(x, y) = ((2·μ_x·μ_y + c_1)·(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)·(σ_x² + σ_y² + c_2))    (1)

In formula (1), x is the input image Input, y is the target-domain image target corresponding to the input image Input, μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, and σ_xy is the covariance of x and y; c_1 = (k_1·L)² and c_2 = (k_2·L)² are constants used to maintain stability, L is the dynamic range of the pixel values, k_1 = 0.01, and k_2 = 0.03;
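As a non-authoritative illustration, the SSIM value can be computed in the convolution-kernel sliding-window form that the description later adopts; the uniform (rather than Gaussian) 7*7 window, the helper name, and the loss form 1 - SSIM noted in the comment are assumptions:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=7, L=1.0, k1=0.01, k2=0.03):
    # Mean SSIM between image batches x and y, with local statistics taken
    # over a sliding uniform window (7*7, per the embodiment below).
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    pad = window // 2
    mean = lambda t: F.avg_pool2d(t, window, stride=1, padding=pad)
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean(x * x) - mu_x ** 2          # local variance of x
    var_y = mean(y * y) - mu_y ** 2          # local variance of y
    cov_xy = mean(x * y) - mu_x * mu_y       # local covariance of x and y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return ssim_map.mean()

# A value near 1 means high similarity, so a possible SSIM loss term is:
# loss_ssim = 1 - ssim(output, target)
```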
F. Steps C~E constitute one round of weight training of the U-shaped generator network; after repeating steps C~E and completing the weight training of the U-shaped generator network twice, the training of the U-shaped generator network is completed, and the resulting generator network is the network model of the U-shaped generator network.
2) Normalize the image to be converted and input it into the network model established in step 1) to complete the image domain conversion of the image to be converted: the normalized image is input, as the input image Input, into the network model established in step 1); the encoding network extracts the high-dimensional features of the input image Input, the decoding network then outputs the network-generated image Output, and the output network-generated image Output is the target-domain image after image domain conversion.
In step E, the overall loss function of the U-shaped generator network is:

L_GAN(G, D1, D2) = L_D1 + λ1·L_D2 + λ2·L_ssim + λ3·L1    (2)

The overall loss that needs to be optimized in the entire generative adversarial network is:

G* = arg min_G max_D1 max_D2 (L_GAN(G, D1, D2) + L_D1 + L_D2)    (3)

In formulas (2) and (3), L_D1 denotes the real/fake loss output by the real/fake discriminator network, L_D2 denotes the pairing loss output by the pairing discriminator network, L_ssim denotes the SSIM loss computed by the SSIM loss function, and L1 denotes the L1 regularization loss; x denotes the input image Input, and y denotes the target-domain image target corresponding to the input image Input; λ1 is the parameter for the weight of the pairing loss in the overall loss of the U-shaped generator network, λ2 is the parameter for the weight of the SSIM loss in the overall loss of the U-shaped generator network, and λ3 is the parameter for the weight of the L1 regularization term in the overall loss of the U-shaped generator network;
At the training initial stage of the U-shaped generator network, the ratio of the real/fake loss, pairing loss, SSIM loss, and L1 regularization loss is 1:1:4:1; as the number of network training iterations increases, the ratio gradually becomes 1:1:0.5:1, i.e. the parameter for the weight of the SSIM loss in the overall loss of the U-shaped generator network is decreased continuously according to the set total number of training iterations.
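For instance, a minimal sketch of such a weight schedule (the linear decay between the stated endpoints 4 and 0.5, and the function name, are assumptions; the patent fixes only the two ratios):

```python
def ssim_weight(step: int, total_steps: int,
                start: float = 4.0, end: float = 0.5) -> float:
    # Weight of the SSIM loss term, decayed from 4 to 0.5 over training,
    # while the real/fake, pairing, and L1 weights stay at 1.
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac

# overall loss: l_d1 + 1.0 * l_d2 + ssim_weight(step, total) * l_ssim + 1.0 * l_l1
```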
The cross-entropy loss function in step C is a cross-entropy loss function with a smoothing term; the formula of the cross-entropy loss function with a smoothing term is:

L = -(1/N) · Σ_i [ y_i·log(t_i + ε) + (1 - y_i)·log(1 - t_i + ε) ]    (4)

In formula (4), i runs over a batch of size N, t_i is the predicted sample value, y_i is the true sample value, and ε is the added smoothing term, whose value is chosen as 0.005.
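A minimal sketch of this smoothed cross-entropy in PyTorch (placing the smoothing term ε inside the logarithms is our reading of formula (4); the function name is illustrative):

```python
import torch

def smoothed_bce(t, y, eps=0.005):
    # Cross-entropy with a smoothing term: eps keeps the log arguments away
    # from zero, damping the early-training fluctuations and the degenerate
    # loss values that can abort adversarial training.
    t, y = t.flatten(), y.flatten()
    loss = -(y * torch.log(t + eps) + (1 - y) * torch.log(1 - t + eps))
    return loss.mean()

# Example: a 30*30*1 discriminator map judged against all-ones (real) labels
# d_out = torch.rand(4, 1, 30, 30); loss = smoothed_bce(d_out, torch.ones_like(d_out))
```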
The generation process of the network-generated image Output includes the following steps:
a) Normalize the image to be converted into a 256*256*3-pixel image, and input the normalized image as the input image Input into the encoding network; the input image Input passes in sequence through the 8 convolutional layers of the encoding network, and the finally output data is a 1*1*1024 feature image; the convolution kernel size of every convolutional layer in the encoding network is 3*3 with a stride of 2*2;
b) Input the 1*1*1024 feature image generated in step a) into the decoding network; the feature image passes in sequence through the 8 deconvolutional layers of the decoding network, while the feature image produced after each convolutional layer in step a) is input into the deconvolution layer of identical data-tensor size to take part in the operation, finally generating the complete network-generated image Output; that is, the input of a deconvolution layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of corresponding tensor size; the deconvolution kernel size of every deconvolutional layer is 3*3 with a stride of 2*2.
In step b), for the feature images input to the first three deconvolution layers, a Dropout operation is added when the feature image produced after the corresponding convolutional layer in step a) is input into the deconvolution layer of identical data-tensor size to take part in the operation, as sketched below; the parameter of the Dropout operation is 0.2, i.e. 20% of the connection nodes between the two connected layers are closed at random.
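A minimal sketch of this decoding pass with skip inputs and Dropout, assuming the eight encoder feature images feats[0..7] and the eight deconvolution layers dec[0..7] from the earlier sketch, and reading "first three layers" as the first three deconvolution layers that receive skip features (because of the concatenation, every deconvolution layer after the first would have to be built with twice the input channels of the plain sketch above):

```python
import torch
import torch.nn.functional as F

def decode_with_skips(feats, dec):
    # feats: the eight encoder feature images of step a), coarsest last
    # dec:   the eight deconvolution layers of the decoding network
    y = dec[0](feats[-1])                  # start from the 1*1*1024 feature image
    for i in range(1, 8):
        skip = feats[7 - i]                # encoder feature of identical tensor size
        if i <= 3:                         # the first three skip inputs get Dropout
            skip = F.dropout(skip, p=0.2)  # randomly close 20% of the connections
        y = dec[i](torch.cat([y, skip], dim=1))
    return y                               # complete network-generated image Output
```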
In step E4, the SSIM algorithm is computed using the sliding-window form of a convolution kernel, with a sliding-window size of 7*7.
The present invention has the following advantages:
(1) An adversarial generation network comprising a U-shaped generator network, a pairing discriminator network, and a real/fake discriminator network is used to establish the network model of the U-shaped generator network, and the image domain conversion of local image regions is then realized by the established network model. This fills the current vacancy of adversarial generation networks for local image conversion, broadens the range of use of adversarial generation networks in the field of image domain conversion, and improves the effect and reliability of image domain conversion;
(2) The training of the generator network model is carried out in a multi-adversarial manner with the pairing discriminator network and the real/fake discriminator network. To address the poor judgement of the discriminator networks at the early stage of network training, an SSIM loss function is added: the SSIM algorithm is used to compute the image similarity, and its computation result is used as a loss to update the generator network weights. Taking the computation result of the SSIM algorithm as a loss compensates for the low adversarial strength of the adversarial network at the early stage, so that the generator network converges better and a better image domain conversion effect is obtained;
(3) A cross-entropy loss function with a smoothing term is used in the training of the multi-adversarial discriminator networks, which makes that training more stable. Because the traditional cross-entropy loss function contains a log operation, the loss fluctuates strongly at the initial stage of loss computation, and a loss-function result of 0 may appear during the training stage, causing training to fail; adding the smoothing term reduces the fluctuation during training, prevents training failures, and improves the stability of deep adversarial network training;
(4) In the generator network, the input of a deconvolution layer is arranged to contain not only the feature image from the previous deconvolution operation but also the convolution feature image of corresponding tensor size, so that the information of the image is retained to the greatest extent and the feature information in the original image is preserved better and more completely, improving the effect and authenticity of image domain conversion; moreover, a Dropout operation is added where the first three convolutional layers of the encoding network feed their feature images into the corresponding deconvolution layers, which effectively prevents uniformity of the images obtained after decoding and further improves the image domain conversion quality;
(5) By using Leaky ReLU activation layers with the Leaky ReLU parameter set to 0.2, the source image-domain information of the generator network is better preserved and the network retains residual information as much as possible during backpropagation, which improves the completeness of the converted image and guarantees the conversion effect.
Description of the drawings
Fig. 1 is the network structure diagram of the present invention;
Fig. 2 is the structure diagram of the U-shaped generator network in Fig. 1;
Fig. 3 is the network structure diagram of the real/fake discriminator network in Fig. 1;
Fig. 4 is the network structure diagram of the pairing discriminator network in Fig. 1;
Fig. 5 is the training diagram of the U-shaped generator network in Fig. 1;
Fig. 6 is the network training diagram of the real/fake discriminator network in Fig. 1;
Fig. 7 is the network training diagram of the pairing discriminator network in Fig. 1.
Specific embodiments
For a better understanding of the present invention, the technical solution of the present invention is further described below in conjunction with the accompanying drawings.
As shown in Fig. 1, the present invention comprises a U-shaped generator network U-net, a real/fake discriminator network D1-net, and a pairing discriminator network D2-net, and further comprises a structural-similarity computation part, which includes an SSIM loss function computation part and an L1 regularization part. The U-shaped generator network U-net is used for image domain conversion; the real/fake discriminator network D1-net is the discriminator used to judge whether the network-generated image Output is true; the pairing discriminator network D2-net is the discriminator used to judge whether the network-generated image Output matches the original image.
As shown in Fig. 2, the U-shaped generator network U-net comprises an encoding network F-net and a decoding network G-net: the encoding network F-net performs convolution operations on the image and outputs its high-dimensional feature maps, and the decoding network G-net uses a deconvolution network and completes the generation of the image by deconvolving the feature maps.
The input of the encoding network F-net receives the input image Input, the output of the encoding network F-net is connected to the input of the decoding network G-net, and the output of the decoding network G-net generates the network-generated image Output.
The encoding network F-net comprises eight convolutional layers; each convolutional layer has a 3*3 convolution kernel and a stride of 2*2 and consists of a convolution layer, a Batch Normalization layer, and a Leaky ReLU activation layer. Because the generator network needs to preserve the original image-domain information as much as possible, and in order for the network to retain residual information as much as possible during backpropagation, the present invention selects the Leaky ReLU activation function for every layer of the encoding network F-net in the generator network, with the alpha parameter of the Leaky ReLU activation layers set to 0.2.
The decoding network G-net comprises eight deconvolutional layers; each deconvolutional layer has a 3*3 deconvolution kernel and a stride of 2*2 and consists of a deconvolution layer, a Batch Normalization layer, and an activation layer; the first to seventh deconvolutional layers use ReLU activation layers, and the eighth deconvolutional layer uses a tanh activation layer.
As shown in Fig. 3, the real/fake discriminator network D1-net is used to discriminate whether the generated image is a true image, so its input is a single image and its output is real or fake; the real/fake discriminator network D1-net comprises multiple real/fake discriminating convolutional layers connected in sequence, each consisting of a convolution layer, a Batch Normalization layer, and an activation layer.
The ReLU activation function transmits residuals effectively while maintaining a nonlinear fit; the tanh activation function outputs values between -1 and 1; and the sigmoid activation function outputs values between 0 and 1, which is convenient for computation against the labels. Therefore, in the real/fake discriminator network D1-net, the excitation function used by every layer except the last is the ReLU function, and the last output layer uses the Sigmoid activation function; that is, the activation layer of the last real/fake discriminating convolutional layer uses the Sigmoid activation function, and the activation layers of the other real/fake discriminating convolutional layers use the ReLU function.
The convolution kernels of every real/fake discriminating convolutional layer are designed according to the principle of small kernels, using 3*3 convolution kernels; the layers from the 32*32*256 real/fake discriminating convolutional layer to the 30*30*1 layer use convolutions with a stride of 1, and the remaining layers use a stride of 2. To prevent the gradient-dispersion phenomenon, a Batch Normalization layer is added to every layer; and since the stride-2 convolution computations of some layers already play the role of pooling, no Pooling layers are added to the network.
The pairing discriminator network D2-net is used to discriminate whether the generated image matches the input image, so its input is two images; the pairing discriminator network D2-net comprises a Concat layer followed by multiple pairing discriminating convolutional layers connected in sequence, each consisting of a convolution layer, a Batch Normalization layer, and an activation layer; the activation layer of the last pairing discriminating convolutional layer uses the Sigmoid activation function, and the activation layers of the other pairing discriminating convolutional layers use the ReLU function.
As shown in Fig. 4, the pairing discriminator network D2-net is similar in structure to the real/fake discriminator network D1-net, except that one more image enters at the first layer, i.e. the input of the first layer is a 256*256*6 tensor; ReLU and sigmoid are likewise selected as activation functions, Batch Normalization layers are used, and the loss function used, like that of D1-net, carries a smoothing term.
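A minimal PyTorch sketch of such a patch discriminator, covering both D1-net (in_channels=3) and D2-net (in_channels=6 after the Concat layer); the intermediate channel widths and the exact layer count are assumptions, while the 3*3 kernels, the stride-1 tail from the 32*32*256 stage to the 30*30*1 output, Batch Normalization, and the ReLU/Sigmoid activations follow the text:

```python
import torch
import torch.nn as nn

def d_block(c_in, c_out, stride, padding=1):
    # 3*3 convolution (small-kernel principle), Batch Normalization, ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=padding),
        nn.BatchNorm2d(c_out),
        nn.ReLU())

def make_discriminator(in_channels):
    # in_channels=3 gives D1-net (one image in);
    # in_channels=6 gives D2-net (two images after the Concat layer).
    return nn.Sequential(
        d_block(in_channels, 64, stride=2),          # 256 -> 128
        d_block(64, 128, stride=2),                  # 128 -> 64
        d_block(128, 256, stride=2),                 # 64  -> 32: the 32*32*256 stage
        d_block(256, 512, stride=1, padding=0),      # 32  -> 30: stride 1 from here on
        nn.Conv2d(512, 1, kernel_size=3, stride=1, padding=1),  # 30*30*1 patch map
        nn.Sigmoid())                                # each pixel value in (0, 1)

d1 = make_discriminator(3)
d2 = make_discriminator(6)
pair = torch.cat([torch.randn(2, 3, 256, 256),       # input image Input
                  torch.randn(2, 3, 256, 256)], 1)   # generated or target image
print(d2(pair).shape)  # torch.Size([2, 1, 30, 30])
```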
Let the true target-domain image paired one-to-one with the input image Input be the target-domain image target. The network-generated image Output serves as the training negative sample of the real/fake discriminator network D1-net and is fed to its negative-sample input; the target-domain image target serves as the training positive sample of the real/fake discriminator network D1-net and is fed to its positive-sample input; the value output by the real/fake discriminator network D1-net is fed back, as the real/fake loss value, to the real/fake loss input of the decoding network G-net. The network-generated image Output together with the corresponding input image Input is fed, as a training negative sample, to the negative-sample input of the pairing discriminator network D2-net; the target-domain image target together with the corresponding input image Input is fed, as a training positive sample, to its positive-sample input; the value output by the pairing discriminator network D2-net is fed back, as the pairing loss value, to the pairing loss input of the decoding network G-net. The structural similarity value between the network-generated image Output and the target-domain image target is fed back, as the compensation loss value, to the compensation loss input of the decoding network G-net, where the computation of the structural similarity value includes the SSIM loss function and L1 regularization.
The invention also includes an image domain conversion method based on a generative adversarial network, comprising the following steps:
1) Train the U-shaped generator network U-net and establish the network model of the U-shaped generator network U-net, specifically comprising the following steps:
A. Collect a training image set for the domain to be converted; the training image set contains one-to-one paired source-domain images and target-domain images. The source-domain images in the training set are normalized into 256*256*3-pixel images; a normalized image serves as the input image Input during network training, and the target-domain image in the training set is the target-domain image target corresponding to the input image Input;
B. Pass the input image Input obtained in step A through the U-shaped generator network U-net to convert it into the network-generated image Output of the training network;
C. Use the input image Input, the target-domain image target, and the network-generated image Output obtained in steps A and B to carry out the training of the multi-adversarial discriminator networks: this training comprises the training of the real/fake discriminator network D1-net and the training of the pairing discriminator network D2-net;
As shown in Fig. 6, the training of the real/fake discriminator network D1-net includes the following steps:
C11: Initialize the network weights of the real/fake discriminator network D1-net using a random initialization method;
C12: Use the network-generated image Output as the negative sample and the target-domain image target corresponding to the input image Input as the positive sample to carry out two-class training in the real/fake discriminator network D1-net, and update the network weights of the real/fake discriminator network D1-net using the cross-entropy loss function with a smoothing term and the Adam optimization algorithm;
The cross-entropy can take large values at the beginning of loss computation, and a value of 0 may appear during the training stage, causing training to fail; adding the smoothing term reduces the fluctuation during training and prevents training failures, so the cross-entropy function with a smoothing term improves the stability of deep adversarial network training;
In the two-class training, one-hot labels are first generated for the positive and negative samples; then, from the 0-to-1 values output by the last-layer sigmoid activation function, the cross-entropy loss is computed using SigmoidCrossEntropy with the added smoothing term, and the weights of the real/fake discriminator network D1-net are finally updated according to the fed-back loss;
wherein the formula of the cross-entropy loss function with a smoothing term is:

L = -(1/N) · Σ_i [ y_i·log(t_i + ε) + (1 - y_i)·log(1 - t_i + ε) ]    (1)

In formula (1), i runs over a batch of size N, t_i is the predicted sample value, y_i is the true sample value, and ε is the added smoothing term, whose value is chosen as 0.005;
As shown in Fig. 7, the training of the pairing discriminator network D2-net includes the following steps:
C21: Initialize the network weights of the pairing discriminator network D2-net using a random initialization method;
C22: Use the network-generated image Output together with the corresponding input image Input as the negative sample, and the input image Input together with the corresponding target-domain image target as the positive sample, to carry out two-class training in the pairing discriminator network D2-net, and update the network weights of the pairing discriminator network D2-net using the cross-entropy loss function with a smoothing term and the Adam optimization algorithm;
D. Repeat step C; after the multi-adversarial discriminator networks have been trained twice, fix the network weights of the real/fake discriminator network D1-net and the pairing discriminator network D2-net. To make the training of the U-shaped generator network U-net more stable, the strategy of training the real/fake discriminator network D1-net and the pairing discriminator network D2-net several times and only then training the U-shaped generator network U-net is adopted;
E. Use the multi-adversarial discriminator networks obtained after the training of step D to train the U-shaped generator network U-net; as shown in Fig. 5, the training of the U-shaped generator network U-net includes the following steps:
E1: Initialize the network weights of the U-shaped generator network U-net using the Xavier random initialization method;
E2: Input the network-generated image Output into the real/fake discriminator network D1-net; the real/fake discriminator network D1-net outputs the real/fake loss value, and the output real/fake loss value is fed back to the decoding network G-net in the U-shaped generator network U-net to update its network weights. The real/fake discriminator network D1-net outputs a 30*30*1 image used to return the loss value of how close the network-generated image Output is to a true image, where each image pixel value output by the real/fake discriminator network D1-net ranges from 0 to 1: a pixel value closer to 1 indicates that, within that pixel's receptive field region, the discriminated image is closer to a true image, and a pixel value closer to 0 indicates that, within that pixel's receptive field region, it is further from a true image;
E3: Input the input image Input and the corresponding network-generated image Output into the pairing discriminator network D2-net; the pairing discriminator network D2-net outputs the pairing loss value, and the output pairing loss value is fed back to the decoding network G-net in the U-shaped generator network U-net to update its network weights. The pairing discriminator network D2-net outputs a 30*30*1 image used to return the loss value of whether the input image Input and the network-generated image Output are paired in the way the input image Input and the target-domain image target are, where each image pixel value output by the pairing discriminator network D2-net ranges from 0 to 1: a value closer to 1 indicates that the input image Input and the network-generated image Output are better matched, and a value closer to 0 indicates a worse match;
E4: Compute the structural similarity value between the network-generated image Output and the target-domain image target, and feed the computed structural similarity value back to the decoding network G-net in the U-shaped generator network U-net as a loss to update its network weights. The computation of the structural similarity value includes the SSIM loss function and L1 regularization, where the SSIM loss function derives from the SSIM algorithm, an index that measures the similarity of two images: the output value SSIM(x, y) of the SSIM algorithm expresses the similarity between two images, i.e. the structural similarity between the input image x and the target-domain image y; SSIM(x, y) ranges from -1 to 1, a value close to 1 indicates that the similarity of the two images is higher, and when the input image x and the target-domain image y are identical the value of SSIM(x, y) equals 1. Using SSIM as a loss function enables the generator network to converge better and thus obtain a better image domain conversion effect;
The calculation formula of the SSIM algorithm output value is as follows:

SSIM(x, y) = ((2·μ_x·μ_y + c_1)·(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)·(σ_x² + σ_y² + c_2))    (2)

In formula (2), x is the input image Input, y is the target-domain image target corresponding to the input image Input, μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, and σ_xy is the covariance of x and y; c_1 = (k_1·L)² and c_2 = (k_2·L)² are constants used to maintain stability, L is the dynamic range of the pixel values, k_1 = 0.01, and k_2 = 0.03;
F. Steps C~E constitute one round of weight training of the U-shaped generator network U-net; after repeating steps C~E and completing the weight training of the U-shaped generator network U-net twice, the training of the U-shaped generator network U-net is completed, and the resulting generator network is the network model of the U-shaped generator network U-net;
However, computing the SSIM value of two images requires converting to a sliding-window form, and the results obtained differ with the choice of sliding-window size and the parameter δ. The SSIM algorithm initially proposed by Wang et al. uses an 11*11 sliding window; however, the image produced by the generator network at the initial stage differs greatly from the target image, and the SSIM value between two images computed with a large sliding window is very close to 0 in the initial period, so that the loss cannot be effectively passed back to the generator network, causing the training of the adversarial generation network GAN to fail. In view of this problem, and considering that the network input is a 256*256-pixel photo, the SSIM algorithm in the final version of the present invention is computed in the sliding-window form of a convolution kernel, with a selected sliding-window size of 7*7.
In the above step E of training the U-shaped generator network U-net, the overall loss function of the U-shaped generator network U-net is:

L_GAN(G, D1, D2) = L_D1 + λ1·L_D2 + λ2·L_ssim + λ3·L1    (3)

The overall loss that needs to be optimized in the entire generative adversarial network is:

G* = arg min_G max_D1 max_D2 (L_GAN(G, D1, D2) + L_D1 + L_D2)    (4)

In formulas (3) and (4), L_D1 denotes the real/fake loss output by the real/fake discriminator network D1-net, L_D2 denotes the pairing loss output by the pairing discriminator network D2-net, L_ssim denotes the SSIM loss computed by the SSIM loss function, and L1 denotes the L1 regularization loss; x denotes the input image Input, and y denotes the target-domain image target corresponding to the input image Input; λ1 is the parameter for the weight of the pairing loss in the decoding network G-net of the overall generator network, λ2 is the parameter for the weight of the SSIM loss in the decoding network G-net of the overall generator network, and λ3 is the parameter for the weight of the L1 regularization term in the decoding network G-net of the overall generator network.
At the training initial stage of the U-shaped generator network U-net, the ratio of the real/fake loss, pairing loss, SSIM loss, and L1 regularization loss is 1:1:4:1; as the number of network training iterations increases, the ratio gradually becomes 1:1:0.5:1, i.e. the parameter for the weight of the SSIM loss in the overall loss of the U-shaped generator network U-net is decreased continuously according to the set total number of training iterations.
When the distinguishing abilities of the real/fake discriminator network D1-net and the pairing discriminator network D2-net are low at the early stage of network training, the residual fed back to the U-shaped generator network U-net by the SSIM loss function makes it possible to generate target-domain images effectively; once the abilities of the real/fake discriminator network D1-net and the pairing discriminator network D2-net have improved continuously during training, the weight of the SSIM loss function in the residual fed back to the generator network is reduced, so that most of the residual of the generator network comes from the losses fed back by the real/fake discriminator network D1-net and the pairing discriminator network D2-net, thereby obtaining an effect better than that of conventional image domain conversion methods and generating more realistic images.
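Putting the pieces together, the following is a minimal, non-authoritative sketch of one training round under this strategy: two discriminator updates with the Adam optimization algorithm, followed by a generator update with the weighted overall loss of formula (3). Here generator, d1, d2, smoothed_bce, ssim, and ssim_weight are the modules and helpers from the earlier sketches, and the optimizer settings are assumptions:

```python
import torch

opt_d = torch.optim.Adam(list(d1.parameters()) + list(d2.parameters()))
opt_g = torch.optim.Adam(generator.parameters())

def train_round(inp, target, step, total_steps, lam1=1.0, lam3=1.0):
    # Steps C-D: train both discriminators twice, then their weights stay fixed.
    for _ in range(2):
        out = generator(inp).detach()                  # no generator gradients here
        p_real, p_fake = d1(target), d1(out)
        q_real = d2(torch.cat([inp, target], dim=1))   # paired positive sample
        q_fake = d2(torch.cat([inp, out], dim=1))      # unpaired negative sample
        loss_d = (smoothed_bce(p_real, torch.ones_like(p_real))
                  + smoothed_bce(p_fake, torch.zeros_like(p_fake))
                  + smoothed_bce(q_real, torch.ones_like(q_real))
                  + smoothed_bce(q_fake, torch.zeros_like(q_fake)))
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()
    # Step E: one generator update with the weighted loss of formula (3).
    out = generator(inp)
    p = d1(out)
    q = d2(torch.cat([inp, out], dim=1))
    l_d1 = smoothed_bce(p, torch.ones_like(p))         # real/fake loss
    l_d2 = smoothed_bce(q, torch.ones_like(q))         # pairing loss
    l_ssim = 1 - ssim(out, target)                     # SSIM loss (1 - SSIM, assumed)
    l_l1 = (out - target).abs().mean()                 # L1 regularization loss
    loss_g = l_d1 + lam1 * l_d2 + ssim_weight(step, total_steps) * l_ssim + lam3 * l_l1
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item()
```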
2) The image to be converted is normalized, with its pixels normalized to 256*256, and the normalized image is input into the network model established in step 1) to complete the image domain conversion of the image to be converted: the normalized image is input, as the input image Input, into the network model established in step 1); the encoding network F-net extracts the high-dimensional features of the input image Input, the decoding network G-net then outputs the network-generated image Output, and the output network-generated image Output is the target-domain image after image domain conversion.
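A possible inference sketch under the same assumptions (the torchvision preprocessing and the file name are illustrative choices; the patent requires only normalization to 256*256 pixels, and generator is the trained model from the sketches above):

```python
import torch
from torchvision import transforms
from PIL import Image

to_input = transforms.Compose([
    transforms.Resize((256, 256)),                 # normalize pixels to 256*256
    transforms.ToTensor(),                         # scale to [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3)])   # map to [-1, 1] for the tanh output

img = Image.open("to_convert.png").convert("RGB")  # hypothetical image to convert
with torch.no_grad():
    generator.eval()
    output = generator(to_input(img).unsqueeze(0)) # target-domain image Output
```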
In the image domain conversion process of the U-shaped generator network, the image must first be input into the encoding network F-net for convolution operations, and deconvolution operations are then carried out to realize the image domain conversion. In previous U-shaped generator networks, however, part of the information of the original image is difficult to retain during convolution; therefore, in order to preserve the feature information in the original image better and more completely, the generation process of the network-generated image Output of the present invention includes the following steps:
a) Normalize the image to be converted into a 256*256*3-pixel image, and input the normalized image as the input image Input into the encoding network F-net; the input image Input passes in sequence through the 8 convolutional layers of the encoding network F-net, and the finally output data is a 1*1*1024 feature image; the convolution kernel size of every convolutional layer in the encoding network F-net is 3*3 with a stride of 2*2;
b) Input the 1*1*1024 feature image generated in step a) into the decoding network G-net; the feature image passes in sequence through the 8 deconvolutional layers of the decoding network G-net, while the feature image produced after each convolutional layer in step a) is input into the deconvolution layer of identical data-tensor size to take part in the operation, finally generating the complete network-generated image Output; that is, the input of a deconvolution layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of corresponding tensor size; the deconvolution kernel size of every deconvolutional layer is 3*3 with a stride of 2*2.
For the feature images input to the first three deconvolution layers, a Dropout operation is added when the feature image produced after the corresponding convolutional layer in step a) is input into the deconvolution layer of identical data-tensor size to take part in the operation; the parameter of the Dropout operation is 0.2, i.e. 20% of the connection nodes between the two connected layers are closed at random.
Because the input of a deconvolution layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of corresponding tensor size, the information of the image is retained to the greatest extent and the feature information in the original image is preserved better and more completely, improving the effect and authenticity of image domain conversion; and in order to prevent uniformity of the images obtained by the decoding network G-net, a Dropout operation is added where the first three convolutional layers of the encoding network F-net feed their feature images into the corresponding deconvolution layers, which effectively prevents uniformity of the images obtained after decoding and further improves the image domain conversion quality.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present invention rather than limiting it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solution recorded in the foregoing embodiments may still be modified, or some or all of its technical features may be equivalently replaced, and such modifications or replacements do not make the essence of the corresponding technical solution depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An image domain conversion network based on a generative adversarial network, characterized in that it comprises a U-shaped generator network, a real/fake discriminator network, and a pairing discriminator network; the U-shaped generator network comprises an encoding network and a decoding network, the input of the encoding network receives the input image Input, the output of the encoding network is connected to the input of the decoding network, and the output of the decoding network produces the network-generated image Output; let the true target-domain image paired one-to-one with the input image Input be the target-domain image target; the network-generated image Output serves as the training negative sample of the real/fake discriminator network and is fed to its negative-sample input, the target-domain image target serves as the training positive sample of the real/fake discriminator network and is fed to its positive-sample input, and the value output by the real/fake discriminator network is fed back, as the real/fake loss value, to the real/fake loss input of the decoding network; the network-generated image Output together with the corresponding input image Input is fed, as a training negative sample, to the negative-sample input of the pairing discriminator network, the target-domain image target together with the corresponding input image Input is fed, as a training positive sample, to the positive-sample input of the pairing discriminator network, and the value output by the pairing discriminator network is fed back, as the pairing loss value, to the pairing loss input of the decoding network; the structural similarity value between the network-generated image Output and the target-domain image target is fed back, as the compensation loss value, to the compensation loss input of the decoding network.
2. The image domain conversion network based on a generative adversarial network according to claim 1, characterized in that: the encoding network comprises eight convolutional layers, each with a 3*3 convolution kernel and a stride of 2*2, and each consisting of a convolution layer, a Batch Normalization layer, and a Leaky ReLU activation layer whose alpha parameter is 0.2; the decoding network comprises eight deconvolutional layers, each with a 3*3 deconvolution kernel and a stride of 2*2, and each consisting of a deconvolution layer, a Batch Normalization layer, and an activation layer, the first to seventh deconvolutional layers using ReLU activation layers and the eighth deconvolutional layer using a tanh activation layer.
3. The image domain conversion network based on a generative adversarial network according to claim 2, characterized in that: the real/fake discriminator network comprises multiple real/fake discriminating convolutional layers connected in sequence, each consisting of a convolution layer, a Batch Normalization layer, and an activation layer, the activation layer of the last real/fake discriminating convolutional layer using the Sigmoid activation function and the activation layers of the other real/fake discriminating convolutional layers using the ReLU function.
4. The image domain conversion network based on a generative adversarial network according to claim 3, characterized in that: the pairing discriminator network comprises a Concat layer followed by multiple pairing discriminating convolutional layers connected in sequence, each consisting of a convolution layer, a Batch Normalization layer, and an activation layer, the activation layer of the last pairing discriminating convolutional layer using the Sigmoid activation function and the activation layers of the other pairing discriminating convolutional layers using the ReLU function.
5. An image domain conversion method using the image domain conversion network based on a generative adversarial network according to claim 4, characterized in that it comprises the following steps:
1) Train the U-shaped generator network and establish the network model of the U-shaped generator network, specifically comprising the following steps:
A. Collect a training image set for the domain to be converted; the training image set contains one-to-one paired source-domain images and target-domain images; the source-domain images in the training set are normalized, a normalized image serves as the input image Input during network training, and the target-domain image in the training set is the target-domain image target corresponding to the input image Input;
B. Pass the input image Input obtained in step A through the U-shaped generator network to convert it into the network-generated image Output of the training network;
C. Use the input image Input, the target-domain image target, and the network-generated image Output obtained in steps A and B to carry out the training of the multi-adversarial discriminator networks: this training comprises the training of the real/fake discriminator network and the training of the pairing discriminator network, wherein the training of the real/fake discriminator network includes the following steps:
C11: Initialize the network weights of the real/fake discriminator network using a random initialization method;
C12: Use the network-generated image Output as the negative sample and the target-domain image target corresponding to the input image Input as the positive sample to train the real/fake discriminator network, and update the network weights of the real/fake discriminator network using the cross-entropy loss function and the Adam optimization algorithm;
The training of the pairing discriminator network includes the following steps:
C21: Initialize the network weights of the pairing discriminator network using a random initialization method;
C22: Use the network-generated image Output together with the corresponding input image Input as the negative sample, and the input image Input together with the corresponding target-domain image target as the positive sample, to train the pairing discriminator network, and update the network weights of the pairing discriminator network using the cross-entropy loss function and the Adam optimization algorithm;
D. Repeat step C; after the multi-adversarial discriminator networks have been trained twice, fix the network weights of the real/fake discriminator network and the pairing discriminator network;
E. Use the multi-adversarial discriminator networks obtained after the training of step D to train the U-shaped generator network, specifically comprising the following steps:
E1: Initialize the network weights of the U-shaped generator network using the Xavier random initialization method;
E2: Input the network-generated image Output into the real/fake discriminator network; the real/fake discriminator network outputs the real/fake loss value, and the output real/fake loss value is fed back to the decoding network in the U-shaped generator network to update its network weights: the real/fake discriminator network outputs a 30*30*1 image used to return the loss value of how close the network-generated image Output is to a true image, where each image pixel value output by the real/fake discriminator network ranges from 0 to 1, a pixel value closer to 1 indicating that, within that pixel's receptive field region, the discriminated image is closer to a true image, and a pixel value closer to 0 indicating that, within that pixel's receptive field region, it is further from a true image;
E3: Input the input image Input and the corresponding network-generated image Output into the pairing discriminator network; the pairing discriminator network outputs the pairing loss value, and the output pairing loss value is fed back to the decoding network in the U-shaped generator network to update its network weights: the pairing discriminator network outputs a 30*30*1 image used to return the loss value of whether the input image Input and the network-generated image Output are paired in the way the input image Input and the target-domain image target are, where each image pixel value output by the pairing discriminator network ranges from 0 to 1, a value closer to 1 indicating that the input image Input and the network-generated image Output are better matched and a value closer to 0 indicating a worse match;
E4: Compute the structural similarity value between the network-generated image Output and the target-domain image target, and feed the computed structural similarity value back to the decoding network in the U-shaped generator network as a loss to update its network weights; the structural similarity value comprises the computation result of the SSIM loss function and the computation result of L1 regularization, wherein the SSIM loss function derives from the SSIM algorithm: the output value SSIM(x, y) of the SSIM algorithm expresses the similarity between two images, i.e. the structural similarity between the input image x and the target-domain image y; SSIM(x, y) ranges from -1 to 1, a value close to 1 indicates that the similarity of the two images is higher, and when the input image x and the target-domain image y are identical the value of SSIM(x, y) equals 1;
The output value of the SSIM algorithm is computed as:

SSIM(x, y) = [(2μ_x μ_y + c_1)(2σ_xy + c_2)] / [(μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2)]    (1)

In formula (1), x is the input image Input, y is the target-domain image target corresponding to the input image Input, μ_x is the mean of x, μ_y is the mean of y, σ_x^2 is the variance of x, σ_y^2 is the variance of y, σ_xy is the covariance of x and y, and c_1 = (k_1 L)^2 and c_2 = (k_2 L)^2 are constants that maintain numerical stability, where L is the dynamic range of the pixel values, k_1 = 0.01, and k_2 = 0.03.
F. Steps C~E constitute one round of weight training of the U-shaped generation network. Repeat steps C~E; after two rounds of weight training have been completed, the training of the U-shaped generation network is complete, and the resulting generation network is the network model of the U-shaped generation network.
2) Normalize the image to be converted and input it into the network model established in step 1), which completes the image-domain conversion of the image to be converted: the normalized image is input as the input image Input into the network model established in step 1), the coding network extracts the high-dimensional features of the input image Input, and the decoding network outputs the network-generated image Output, which is the target-domain image after image-domain conversion.
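For orientation only, the following minimal PyTorch sketch renders one training round of steps C~F: both discrimination networks (modeled here as a hypothetical PatchGAN-style convolution stack with a sigmoid head that yields the 30*30*1 score map of steps E2/E3) are each trained twice with cross-entropy and Adam and then frozen, after which the U-shaped generator is updated with the combined losses of step E. UNetGenerator and ssim refer to the sketches given after claims 9 and 10 below; the layer widths, learning rate, the rendering of the SSIM loss as 1 - SSIM, and all helper names are assumptions, not the patented implementation.

```python
# Hedged sketch of one training round (steps C~F); names and hyperparameters
# are assumptions, not the patented implementation.
import torch
import torch.nn as nn

def patch_discriminator(in_ch):
    # Hypothetical PatchGAN-style stack: on a 256*256 input it emits the
    # 30*30*1 per-receptive-field score map described in steps E2/E3.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(256, 512, 4, 1, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(512, 1, 4, 1, 1), nn.Sigmoid())

G = UNetGenerator()              # U-shaped generation network, sketched after claim 9
D1 = patch_discriminator(3)      # true/false network: judges a single image
D2 = patch_discriminator(6)      # pairing network: judges (Input, image) stacked on channels
bce = nn.BCELoss()               # cross-entropy loss of steps C12/C22
opt_d = torch.optim.Adam([*D1.parameters(), *D2.parameters()], lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_round(inp, target, lambda1, lambda2, lambda3):
    # Steps C/D: train both discrimination networks twice, then fix their weights.
    for _ in range(2):
        out = G(inp).detach()                        # network-generated image Output
        r1, f1 = D1(target), D1(out)                 # true/false: target is the positive sample
        r2 = D2(torch.cat([inp, target], 1))         # pairing: (Input, target) is positive
        f2 = D2(torch.cat([inp, out], 1))            # pairing: (Input, Output) is negative
        loss_d = (bce(r1, torch.ones_like(r1)) + bce(f1, torch.zeros_like(f1))
                  + bce(r2, torch.ones_like(r2)) + bce(f2, torch.zeros_like(f2)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    for p in [*D1.parameters(), *D2.parameters()]:
        p.requires_grad_(False)                      # step D: fix discriminator weights
    # Step E: update the generator against the frozen discrimination networks.
    out = G(inp)
    f1, f2 = D1(out), D2(torch.cat([inp, out], 1))
    loss_g = (bce(f1, torch.ones_like(f1))                   # true/false loss (E2)
              + lambda1 * bce(f2, torch.ones_like(f2))       # pairing loss (E3)
              + lambda2 * (1 - ssim(out, target))            # SSIM loss (E4)
              + lambda3 * (out - target).abs().mean())       # L1 regular term (E4)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    for p in [*D1.parameters(), *D2.parameters()]:
        p.requires_grad_(True)                       # release weights for the next round
```

At conversion time (step 2)), the trained generator alone is used: a normalized 256*256*3 image is passed through G, and the returned Output is the target-domain image.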
6. The image-domain conversion method based on a generative adversarial network according to claim 5, characterized in that the overall loss function of the U-shaped generation network in step E is:

L_GAN(G, D_1, D_2) = L_D1 + λ_1 L_D2 + λ_2 L_ssim + λ_3 L_1    (2)

and the overall loss that needs to be optimized in the entire generative adversarial network is:

G* = arg min_G max_D1 max_D2 (L_GAN(G, D_1, D_2) + L_D1 + L_D2)    (3)

In formulas (2) and (3), L_D1 = E_y[log D_1(y)] + E_x[log(1 - D_1(G(x)))] denotes the true/false loss output by the true/false discrimination network, L_D2 = E_{x,y}[log D_2(x, y)] + E_x[log(1 - D_2(x, G(x)))] denotes the pairing loss output by the pairing discrimination network, L_ssim denotes the SSIM loss computed by the SSIM loss function of step E4, and L_1 = E_{x,y}[||y - G(x)||_1] denotes the L1 regular-term loss; x denotes the input image Input and y denotes the target-domain image target corresponding to the input image Input; λ_1 is the parameter weighting the pairing loss within the overall loss of the U-shaped generation network, λ_2 is the parameter weighting the SSIM loss within the overall loss, and λ_3 is the parameter weighting the L1 regular term within the overall loss.
At the initial stage of training the U-shaped generation network, the ratio of the true/false loss, the pairing loss, the SSIM loss and the L1 regular-term loss is 1:1:4:1; as the number of training iterations increases, this ratio fades to 1:1:0.5:1, i.e. the parameter weighting the SSIM loss within the overall loss of the U-shaped generation network is gradually decreased according to the set total number of training iterations.
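A minimal sketch of this weight schedule follows, assuming a linear decay of the SSIM weight from 4 to 0.5 over a preset total number of training rounds; the linear form and the names round_idx and total_rounds are illustrative assumptions, since the claim only states that the weight decreases with the set total training count.

```python
def loss_weights(round_idx, total_rounds):
    # Hypothetical schedule: the true/false, pairing, SSIM and L1 weights start
    # at ratio 1:1:4:1 and fade to 1:1:0.5:1 over the set total training count
    # (linear decay assumed).
    t = min(round_idx / max(total_rounds - 1, 1), 1.0)
    lambda1 = 1.0                     # pairing-loss weight, held fixed
    lambda2 = 4.0 + t * (0.5 - 4.0)   # SSIM weight decays from 4 to 0.5
    lambda3 = 1.0                     # L1 regular-term weight, held fixed
    return lambda1, lambda2, lambda3
```

With these values, train_round from the sketch after claim 5 can be called as train_round(inp, target, *loss_weights(r, total_rounds)).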
7. The image-domain conversion method based on a generative adversarial network according to claim 5, characterized in that the cross-entropy loss function in step C is a cross-entropy loss function with a smoothing term. The formula of the cross-entropy loss function with a smoothing term is:

L = -(1/i) Σ_{j=1..i} [y_j log(t_j + EPS) + (1 - y_j) log(1 - t_j + EPS)]    (4)

In formula (4), i is the size of the batch, t_j is the predicted sample value, y_j is the true sample value, and EPS is the added smoothing term, whose value is chosen as 0.005.
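A direct rendering of formula (4) in PyTorch might look as follows; placing EPS inside each logarithm is an assumption consistent with its role as a smoothing term that avoids log(0).

```python
import torch

def smoothed_bce(t, y, eps=0.005):
    # Rendering of formula (4): binary cross-entropy averaged over a batch,
    # with the smoothing term EPS added inside each logarithm.
    return -(y * torch.log(t + eps) + (1 - y) * torch.log(1 - t + eps)).mean()
```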
8. The image-domain conversion method based on a generative adversarial network according to claim 5, characterized in that the generation process of the network-generated image Output includes the following steps:
a) Normalize the image to be converted into an image of 256*256*3 pixels and input the normalized image as the input image Input into the coding network. The input image Input passes successively through the 8 convolutional layers of the coding network, and the finally output data is a 1*1*1024 feature image; each convolutional layer of the coding network has a convolution kernel of size 3*3 with a stride of 2*2.
b) Input the 1*1*1024 feature image generated in step a) into the decoding network. The feature image passes successively through the 8 deconvolutional layers of the decoding network, while the feature image obtained after each convolutional layer operation in step a) is input into the deconvolutional layer of identical data-tensor size to take part in the operation, finally generating the complete network-generated image Output; that is, the input of a deconvolutional layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of corresponding tensor size. Each deconvolutional layer has a deconvolution kernel of size 3*3 with a stride of 2*2.
9. The image-domain conversion method based on a generative adversarial network according to claim 8, characterized in that in step b), for the feature images input to the first three deconvolutional layers, a Dropout operation is added when the feature image obtained after each convolutional layer operation in step a) is input into the deconvolutional layer of identical data-tensor size; the parameter of the Dropout operation is 0.2, i.e. 20% of the connection nodes between the two connected layers are closed at random.
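The encoder-decoder of claims 8 and 9 could be sketched as below: 8 convolutional layers (3*3 kernels, stride 2*2) down to a 1*1*1024 feature image, then 8 deconvolutional layers that also receive the encoder feature image of identical tensor size, with Dropout(0.2) applied on the inputs to the first three deconvolutional layers. The channel widths other than the 1024 bottleneck, the activation functions, and the exact dropout placement are assumptions.

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    # Sketch of the claimed U-shaped generator: an 8-layer coding network that
    # maps a 256*256*3 input to a 1*1*1024 feature image, and an 8-layer
    # decoding network whose layers also receive the same-size encoder feature
    # image (skip connection). Widths beyond the 1024 bottleneck are assumed.
    def __init__(self):
        super().__init__()
        chs = [3, 64, 128, 256, 512, 512, 512, 512, 1024]   # assumed widths
        self.enc = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chs[i], chs[i + 1], 3, 2, 1), nn.LeakyReLU(0.2))
            for i in range(8))
        self.dec = nn.ModuleList()
        for i in range(8, 0, -1):
            in_ch = chs[i] if i == 8 else chs[i] * 2        # later layers get a skip too
            self.dec.append(nn.Sequential(
                nn.ConvTranspose2d(in_ch, chs[i - 1], 3, 2, 1, output_padding=1),
                nn.ReLU()))
        self.drop = nn.Dropout(0.2)                         # claim 9: 20% of nodes closed

    def forward(self, x):
        skips = []
        for layer in self.enc:
            x = layer(x)
            skips.append(x)                                  # per-layer encoder features
        for k, layer in enumerate(self.dec):
            if k == 0:
                x = layer(self.drop(x))                      # bottleneck: 1*1*1024 input
            else:
                s = skips[7 - k]                             # encoder map of identical size
                if k < 3:
                    s = self.drop(s)                         # Dropout on first three layers
                x = layer(torch.cat([x, s], dim=1))
        return torch.tanh(x)                                 # assumed output activation
```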
10. The image-domain conversion method based on a generative adversarial network according to claim 5, characterized in that the SSIM algorithm in step E4 is computed in the sliding-window form of a convolution kernel, with a sliding-window size of 7*7.
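A sliding-window SSIM consistent with formula (1) and the 7*7 window of this claim can be sketched as follows; a uniform (average-pooling) window is assumed here, and L defaults to 1.0 for images normalized to [0, 1].

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=7, L=1.0, k1=0.01, k2=0.03):
    # Formula (1) evaluated over 7*7 sliding windows via average pooling;
    # a uniform window is assumed. L is the pixel dynamic range.
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    pool = lambda t: F.avg_pool2d(t, window, stride=1)
    mu_x, mu_y = pool(x), pool(y)                    # per-window means
    var_x = pool(x * x) - mu_x ** 2                  # per-window variances
    var_y = pool(y * y) - mu_y ** 2
    cov_xy = pool(x * y) - mu_x * mu_y               # per-window covariance
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return ssim_map.mean()                           # mean SSIM over all windows
```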
CN201711273921.5A 2017-12-06 2017-12-06 Image domain conversion network and conversion method based on generative adversarial network Active CN108171320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711273921.5A CN108171320B (en) 2017-12-06 2017-12-06 Image domain conversion network and conversion method based on generative adversarial network

Publications (2)

Publication Number Publication Date
CN108171320A (en) 2018-06-15
CN108171320B (en) 2021-10-19

Family

ID=62525151





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant