CN107330954A

CN107330954A - Method for manipulating images by sliding attributes based on an attenuation network

Info

Publication number: CN107330954A
Application number: CN201710576667.XA
Authority: CN
Inventors: 夏春秋
Original assignee: Shenzhen Vision Technology Co Ltd
Current assignee: Shenzhen Vision Technology Co Ltd
Priority date: 2017-07-14
Filing date: 2017-07-14
Publication date: 2017-11-07

Abstract

The present invention proposes a method for manipulating images by sliding attributes based on an attenuation network. Its main contents include: the attenuation network; the encoder-decoder architecture; learning attribute-invariant latent representations; the model learning algorithm; and the implementation of the encoder-decoder architecture in neural networks. The process uses a new encoder-decoder architecture that reconstructs images by disentangling the salient information of an image from its attribute values in the latent space. The trained model produces different versions of an input image by changing its attribute values, and by using continuous attribute values it can choose how much of a particular attribute is perceivable in the generated image. This property allows a user to change a facial expression or update the color of an object with a sliding knob. Compared with existing methods, which rely on training adversarial networks in pixel space by swapping attributes during training, the present invention is simpler and scales to many attributes, and the model can substantially change the perceived value of image attributes while preserving the naturalness of the image.

Description

Method for manipulating images by sliding attributes based on an attenuation network

Technical field

The present invention relates to the field of image reconstruction, and in particular to a method for manipulating images by sliding attributes based on an attenuation network.

Background art

Image reconstruction refers to re-establishing an image from data obtained by probing an object. The data used for reconstruction are usually acquired at different times and in several steps, and a two- or three-dimensional image is then created from scattered or incomplete data; some imaging techniques additionally require applied mathematical formulas to regenerate a clearer image that is more readable and useful. Image reconstruction is an important research branch of image processing: its significance lies in obtaining images of the internal structure of an inspected object without causing any physical damage to it. It is of unique importance in many application fields, such as medical radiology, nuclear medicine, electron microscopy, radio and radar astronomy, low-light and holographic imaging, and vision theory.

The present invention proposes a method for manipulating images by sliding attributes based on an attenuation network. A new encoder-decoder architecture is trained to reconstruct images by disentangling the salient information of an image and its attribute values directly in the latent space. After training, the model can produce different versions of an input image by changing its attribute values. By using continuous attribute values, one can choose how much of a particular attribute is perceivable in the generated image; this property allows a user to change the facial expression of a portrait or update the color of an object with a sliding knob. Current state-of-the-art methods rely on training adversarial networks in pixel space by swapping attribute values during training; compared with these methods, the training scheme of the present invention is simpler and scales well to many attributes. In addition, the model of the present invention can substantially change the perceived value of attributes in an image while preserving the naturalness of the image.

Summary of the invention

The present invention trains a new encoder-decoder architecture to reconstruct images by disentangling the salient information of an image and its attribute values directly in the latent space. After training, the model can produce different versions of an input image by changing its attribute values. By using continuous attribute values, one can choose how much of a particular attribute is perceivable in the generated image, which allows a user to change the facial expression of a portrait or update the color of an object with a sliding knob. Current state-of-the-art methods rely on training adversarial networks in pixel space by swapping attribute values during training; compared with these methods, the training scheme of the present invention is simpler and scales well to many attributes. In addition, the model of the present invention can substantially change the perceived value of attributes in an image while preserving the naturalness of the image.

To solve the above problems, the present invention provides a method for manipulating images by sliding attributes based on an attenuation network, whose main contents include:

(1) the attenuation network;

(2) the encoder-decoder architecture;

(3) learning attribute-invariant latent representations;

(4) the model learning algorithm;

(5) the implementation of the encoder-decoder architecture in neural networks.

Regarding the attenuation network: let $\mathcal{X}$ be an image domain and $\mathcal{Y}$ the set of possible attributes associated with images in $\mathcal{X}$. For faces, representative attributes are wearing glasses or not, gender, and young or old. To simplify the presentation, only binary attributes are considered here, although the method also extends to categorical attributes. In this setting $\mathcal{Y} = \{0, 1\}^n$, where $n$ is the number of attributes, and the training set of $m$ image-attribute pairs is $\mathcal{D} = \{(x^1, y^1), \dots, (x^m, y^m)\}$, where $x^i \in \mathcal{X}$ and $y^i \in \mathcal{Y}$. The final goal is to learn from $\mathcal{D}$ a model that, for any attribute vector $y'$, generates a version of an input image $x$ whose attribute values correspond to $y'$.
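To make the setting concrete, the sketch below enumerates the attribute space $\mathcal{Y} = \{0, 1\}^n$ and checks a toy training set against it. It is an illustration only: flattening images to lists of floats and the helper names `attribute_space` and `is_valid_training_set` are assumptions, not part of the described method.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical container for one training pair (x, y): x is an image, flattened
# to a list of pixel values for illustration, and y is its binary attribute vector.
Pair = Tuple[List[float], Tuple[int, ...]]


def attribute_space(n: int) -> List[Tuple[int, ...]]:
    """Enumerate Y = {0, 1}^n, i.e. every possible binary attribute vector."""
    return list(product((0, 1), repeat=n))


def is_valid_training_set(pairs: Sequence[Pair], n: int) -> bool:
    """Check that every attribute vector y^i in the training set belongs to {0, 1}^n."""
    space = set(attribute_space(n))
    return all(tuple(y) in space for _, y in pairs)


# n = 3 attributes (e.g. glasses, gender, young) gives 2**3 = 8 possible targets y'.
Y = attribute_space(3)
D = [([0.1, 0.5], (1, 0, 1)), ([0.9, 0.2], (0, 1, 1))]
```

Any of the eight vectors in `Y` is a valid target $y'$ for the trained model, including combinations that never occur in the training set.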

Regarding the encoder-decoder architecture: the encoder-decoder architecture is trained with domain-adversarial training in the latent space. The encoder $E_{\theta_{enc}}: \mathcal{X} \to \mathbb{R}^N$ is a convolutional neural network with parameters $\theta_{enc}$ that maps an input image to its $N$-dimensional latent representation $E_{\theta_{enc}}(x)$. The decoder $D_{\theta_{dec}}: (\mathbb{R}^N, \mathcal{Y}) \to \mathcal{X}$ is a deconvolutional neural network with parameters $\theta_{dec}$ that, given the latent representation $E_{\theta_{enc}}(x)$ and an arbitrary attribute vector $y'$, produces a new version of the input image. When the context is clear, $D$ and $E$ are written for $D_{\theta_{dec}}$ and $E_{\theta_{enc}}$. The exact architecture of the networks is described below. The auto-encoding loss associated with this architecture is a classical mean squared error (MSE) that measures the reconstruction quality of a training input image $x$ given its true attribute vector $y$, as shown in equation (1):

$$\mathcal{L}_{AE}(\theta_{enc}, \theta_{dec}) = \frac{1}{m} \sum_{(x, y) \in \mathcal{D}} \left\| D_{\theta_{dec}}\big(E_{\theta_{enc}}(x), y\big) - x \right\|_2^2 \tag{1}$$

The exact choice of the reconstruction loss is not essential here. To obtain images with sharper textures at this stage, an adversarial loss, as used in generative adversarial networks (GANs), can be added to the mean squared error; however, a mean absolute error or mean squared error term is still necessary to ensure that the reconstructed image matches the original. Ideally, modifying $y$ in $D(E(x), y)$ would generate images with different perceived attributes that are otherwise similar to the input image $x$.
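A minimal sketch of the reconstruction loss in equation (1), assuming images are flattened to lists of floats and that `encode`/`decode` are stand-ins for $E$ and $D$ (the toy versions below are hypothetical). The per-image term is the squared L2 norm, which equals the pixel-wise MSE up to a constant factor; the mean-absolute-error alternative mentioned above can be swapped in through the `error` argument.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # an image flattened to a list of pixel values (assumption)


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """||a - b||_2^2, the per-image term of Eq. (1)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def absolute_error(a: Sequence[float], b: Sequence[float]) -> float:
    """||a - b||_1, the mean-absolute-error alternative."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def autoencoder_loss(
    pairs: Sequence[Tuple[Image, Tuple[int, ...]]],
    encode: Callable[[Image], List[float]],
    decode: Callable[[List[float], Tuple[int, ...]], Image],
    error: Callable[[Sequence[float], Sequence[float]], float] = squared_error,
) -> float:
    """Eq. (1): average reconstruction error of D(E(x), y) against x over the pairs."""
    return sum(error(decode(encode(x), y), x) for x, y in pairs) / len(pairs)


# Hypothetical toy E and D: E is the identity, D adds 2 to every pixel.
def toy_encode(x: Image) -> List[float]:
    return list(x)


def toy_decode(z: List[float], y: Tuple[int, ...]) -> Image:
    return [v + 2.0 for v in z]


pairs = [([0.0, 0.0], (1,)), ([1.0, 3.0], (0,))]
```

With these toys every 2-pixel image is off by 2 in each pixel, so the squared-error loss is 8.0 and the absolute-error loss is 4.0; a perfect decoder gives 0.0.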

Further, regarding learning attribute-invariant latent representations: without additional constraints, the decoder learns to ignore the attributes of the image, and modifying $y$ at test time then has no effect. To avoid this behavior, the solution is to learn a latent representation that is invariant to the attributes. Invariance here means that for two versions $x$ and $x'$ of the same image that differ only in their attribute values (for example, two images of the same person with and without glasses), the corresponding latent representations $E(x)$ and $E(x')$ should be identical. When this invariance holds, the decoder must use the attributes to reconstruct the original image. Because the training set does not contain different versions of the same image, this constraint cannot simply be added to the loss. It is therefore proposed to enforce it through adversarial training in the latent space: an additional neural network, called the discriminator, is trained to identify the true attributes $y$ of a training pair $(x, y)$ given $E(x)$, and invariance is obtained by learning the encoder $E$ so that the discriminator cannot identify the correct attributes. As in GANs, this corresponds to a two-player game in which the discriminator aims to maximize its ability to identify the attributes, while $E$ aims to prevent it from being a good discriminator.

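Further, regarding the discriminator objective: the discriminator outputs probabilities $p_{\theta_{dis}}(y \mid E(x))$ of an attribute vector, where $\theta_{dis}$ are the parameters of the discriminator. Using the subscript $k$ for the $k$-th attribute, $\log p_{\theta_{dis}}(y \mid E(x)) = \sum_{k=1}^{n} \log p_{\theta_{dis}}(y_k \mid E(x))$. Since the loss of the discriminator depends on the current state of the encoder, it is written as in equation (2):

$$\mathcal{L}_{dis}(\theta_{dis} \mid \theta_{enc}) = -\frac{1}{m} \sum_{(x, y) \in \mathcal{D}} \log p_{\theta_{dis}}\big(y \mid E_{\theta_{enc}}(x)\big) \tag{2}$$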

The objective of the discriminator is thus to predict the attributes of an input image given its latent representation.
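The discriminator loss in equation (2) can be sketched as follows. The sketch assumes, as an illustration, that for each attribute $k$ the discriminator outputs a single probability `q[k]` that the attribute equals 1, so $\log p(y \mid z)$ factorizes into a sum of per-attribute log-probabilities; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_prob(y: Sequence[int], q: Sequence[float]) -> float:
    """log p(y | z) = sum_k log p(y_k | z) for binary attributes.

    q[k] is assumed to be the discriminator's probability that attribute k
    equals 1 given the latent representation z = E(x)."""
    return sum(math.log(qk if yk == 1 else 1.0 - qk) for yk, qk in zip(y, q))


def discriminator_loss(batch: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Eq. (2): -(1/m) * sum over the batch of log p(y | E(x)).

    Each batch item is (y, q): the true attributes and the discriminator's
    outputs computed from E(x) with the encoder held fixed."""
    return -sum(log_prob(y, q) for y, q in batch) / len(batch)
```

An uninformed discriminator (all outputs 0.5) pays $\log 2$ per attribute, while a confident, correct one pays less, which is what minimizing equation (2) rewards.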

Further, regarding the adversarial objective: the objective of the encoder is to compute a latent representation that optimizes two goals. First, the decoder should be able to reconstruct $x$ given $E(x)$ and $y$; second, the discriminator should be unable to predict $y$ given $E(x)$. A mistake is considered to occur when the discriminator predicts $1 - y_k$ for attribute $k$. Given the discriminator parameters, the complete loss of the encoder-decoder architecture is therefore given by equation (3):

$$\mathcal{L}(\theta_{enc}, \theta_{dec} \mid \theta_{dis}) = \frac{1}{m} \sum_{(x, y) \in \mathcal{D}} \Big[ \left\| D_{\theta_{dec}}\big(E_{\theta_{enc}}(x), y\big) - x \right\|_2^2 - \lambda_E \log p_{\theta_{dis}}\big(1 - y \mid E_{\theta_{enc}}(x)\big) \Big] \tag{3}$$

where $\lambda_E > 0$ controls the trade-off between the quality of the reconstruction and the invariance of the latent representation. Large values of $\lambda_E$ restrict the amount of information about $x$ contained in $E(x)$ and result in blurry images, while small values of $\lambda_E$ limit the decoder's dependence on the attribute code $y$ and result in poor effects when attributes are altered.
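The encoder-decoder objective in equation (3) combines the reconstruction error with a term that rewards the discriminator for predicting the flipped attributes $1 - y$. A minimal sketch, assuming the per-image reconstruction error has already been computed and that the (fixed) discriminator outputs per-attribute probabilities `q` as in the previous sketch; all names are illustrative.

```python
import math
from typing import Sequence, Tuple


def log_prob(y: Sequence[int], q: Sequence[float]) -> float:
    """log p(y | z) for binary attributes; q[k] = assumed P(attribute k = 1 | z)."""
    return sum(math.log(qk if yk == 1 else 1.0 - qk) for yk, qk in zip(y, q))


def flip(y: Sequence[int]) -> Tuple[int, ...]:
    """The 'wrong' attribute vector 1 - y that the encoder wants the discriminator to predict."""
    return tuple(1 - yk for yk in y)


def encoder_decoder_loss(
    batch: Sequence[Tuple[float, Sequence[int], Sequence[float]]], lambda_e: float
) -> float:
    """Eq. (3): (1/m) * sum of [ ||D(E(x), y) - x||^2 - lambda_E * log p(1 - y | E(x)) ].

    Each batch item is (reconstruction_error, y, q), where q are the outputs of
    the current discriminator on E(x)."""
    total = 0.0
    for reconstruction_error, y, q in batch:
        total += reconstruction_error - lambda_e * log_prob(flip(y), q)
    return total / len(batch)
```

With $\lambda_E = 0$ the loss reduces to plain auto-encoding; with $\lambda_E > 0$ a latent code that lets the discriminator recover $y$ is penalized, while one that fools it into predicting $1 - y$ lowers the loss.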

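Further, regarding the model learning algorithm: given the current state of the encoder, the optimal discriminator parameters satisfy $\theta_{dis}^*(\theta_{enc}) \in \arg\min_{\theta_{dis}} \mathcal{L}_{dis}(\theta_{dis} \mid \theta_{enc})$. Ignoring problems related to multiple or local minima, the overall objective function is

$$\theta_{enc}^*, \theta_{dec}^* = \arg\min_{\theta_{enc}, \theta_{dec}} \mathcal{L}\big(\theta_{enc}, \theta_{dec} \mid \theta_{dis}^*(\theta_{enc})\big)$$

In practice, it is unreasonable to solve for $\theta_{dis}^*(\theta_{enc})$ at every update of $\theta_{enc}$. Following the usual practice of adversarial training of deep neural networks, the current value of $\theta_{dis}$ is used as an approximation of $\theta_{dis}^*(\theta_{enc})$, and all parameters are updated by stochastic gradient descent. Given a training example $(x, y)$, denote by $\mathcal{L}(\theta_{enc}, \theta_{dec} \mid \theta_{dis}, x, y)$ the auto-encoder loss restricted to $(x, y)$ and by $\mathcal{L}_{dis}(\theta_{dis} \mid \theta_{enc}, x, y)$ the corresponding discriminator loss. For the training example $(x^{(t)}, y^{(t)})$, the parameters at time $t$ are updated as in equations (4) and (5), where $\eta$ is the learning rate:

$$\theta_{dis}^{(t+1)} = \theta_{dis}^{(t)} - \eta \nabla_{\theta_{dis}} \mathcal{L}_{dis}\big(\theta_{dis}^{(t)} \mid \theta_{enc}^{(t)}, x^{(t)}, y^{(t)}\big) \tag{4}$$

$$\big[\theta_{enc}^{(t+1)}, \theta_{dec}^{(t+1)}\big] = \big[\theta_{enc}^{(t)}, \theta_{dec}^{(t)}\big] - \eta \nabla_{\theta_{enc}, \theta_{dec}} \mathcal{L}\big(\theta_{enc}^{(t)}, \theta_{dec}^{(t)} \mid \theta_{dis}^{(t+1)}, x^{(t)}, y^{(t)}\big) \tag{5}$$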
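The alternating updates in equations (4) and (5) can be sketched with parameters as flat lists and gradients supplied by caller-provided functions (in a real model these would come from automatic differentiation). The gradient functions below are hypothetical toys with no adversarial meaning; they only make the update order checkable: the discriminator steps first, and the encoder-decoder step then uses the freshly updated $\theta_{dis}^{(t+1)}$.

```python
from typing import Callable, List, Tuple

Params = List[float]
# Hypothetical gradient oracle: (own params, other params, sample) -> gradient.
GradFn = Callable[[Params, Params, object], Params]


def sgd_step(params: Params, grads: Params, lr: float) -> Params:
    """Plain stochastic-gradient step: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]


def train_step(theta_dis: Params, theta_ae: Params, grad_dis: GradFn,
               grad_ae: GradFn, lr: float, sample: object) -> Tuple[Params, Params]:
    """One iteration on sample (x^(t), y^(t)).

    grad_dis: gradient of L_dis w.r.t. theta_dis with the encoder fixed (Eq. 4).
    grad_ae: gradient of L w.r.t. the concatenated [theta_enc, theta_dec]
    with the discriminator fixed (Eq. 5)."""
    new_dis = sgd_step(theta_dis, grad_dis(theta_dis, theta_ae, sample), lr)  # Eq. (4)
    # Eq. (5) is evaluated with the updated discriminator theta_dis^(t+1).
    new_ae = sgd_step(theta_ae, grad_ae(theta_ae, new_dis, sample), lr)
    return new_dis, new_ae


# Toy gradients (hypothetical), used only to exercise the update order.
def toy_grad_dis(d: Params, a: Params, sample: object) -> Params:
    return [2.0 * (d[0] - a[0])]


def toy_grad_ae(a: Params, d: Params, sample: object) -> Params:
    return [2.0 * (a[0] - d[0])]
```

Starting from `theta_dis=[0.0]`, `theta_ae=[1.0]` and `lr=0.25`, the discriminator moves to 0.5 and the encoder-decoder step, computed against that new value, moves to 0.75.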

Further, regarding the implementation of the encoder-decoder architecture in neural networks: to adapt the encoder-decoder architecture to neural networks, let $C_k$ denote a convolution-rectified linear unit (ReLU) layer with $k$ filters. The convolutions use kernels of size 4 × 4 with a stride of 2 and a padding of 1, so that each layer of the encoder divides the size of its input by 2. Leaky ReLUs with a slope of 0.2 are used in the encoder, and simple ReLUs are used in the decoder. The encoder consists of the following 7 layers, as shown in formula (6):

$$C_{16} - C_{32} - C_{64} - C_{128} - C_{256} - C_{512} - C_{512} \tag{6}$$
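The halving claim can be checked with the standard convolution output-size formula $\lfloor (s + 2p - k)/\text{stride} \rfloor + 1$. The sketch below, assuming a square 256 × 256 input, tracks the output shape of each of the seven layers of formula (6); the helper names are illustrative.

```python
from typing import List, Tuple

# Filter counts of the seven encoder layers in formula (6).
ENCODER_FILTERS = [16, 32, 64, 128, 256, 512, 512]


def conv_output_size(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Side length after a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(input_size: int = 256) -> List[Tuple[int, int, int]]:
    """(channels, height, width) of each encoder layer output for a square input."""
    shapes = []
    size = input_size
    for filters in ENCODER_FILTERS:
        size = conv_output_size(size)  # with k=4, stride=2, p=1 this halves the size
        shapes.append((filters, size, size))
    return shapes
```

The side length goes 256 → 128 → 64 → 32 → 16 → 8 → 4 → 2, so the last layer outputs 512 feature maps of size 2 × 2.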

Since the input image has size 256 × 256, the latent representation of an image consists of 512 feature maps of size 2 × 2. To provide the image attributes to the decoder, the attribute code is appended as an input to every layer of the decoder. The code of an image is the concatenation of one-hot vectors representing its attribute values, a binary attribute being represented by [1, 0] or [0, 1]; the code is therefore appended as additional constant input channels to all convolutions of the decoder. Denoting the number of attributes by $n$, the decoder is symmetric to the encoder but uses transposed convolutions for up-sampling, as shown in formula (7):

$$C_{512+2n} - C_{512+2n} - C_{256+2n} - C_{128+2n} - C_{64+2n} - C_{32+2n} - C_{16+2n} \tag{7}$$

The discriminator is a $C_{512}$ layer followed by a two-layer fully connected neural network of sizes 512 and $n$, respectively.
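The way the attribute code enters the decoder can be sketched with nested lists standing in for tensors. The mapping $y_k = 0 \to [1, 0]$, $y_k = 1 \to [0, 1]$ is an assumption (the text only says binary attributes use [1, 0] and [0, 1]), and writing each pair as $[1 - y_k, y_k]$ is a further assumption that also accepts continuous slider values. The helper names are hypothetical.

```python
from typing import List, Sequence

Tensor = List[List[List[float]]]  # channels x height x width, as nested lists


def attribute_code(y: Sequence[float]) -> List[float]:
    """Concatenate one 2-entry vector per attribute: y_k = 0 -> [1, 0], y_k = 1 -> [0, 1].

    Writing the pair as [1 - y_k, y_k] (an assumption) also accepts continuous
    'slider' values between 0 and 1."""
    code: List[float] = []
    for yk in y:
        code.extend([1.0 - yk, float(yk)])
    return code


def append_attribute_channels(feature_maps: Tensor, y: Sequence[float]) -> Tensor:
    """Append the 2n code entries as constant H x W channels, as done before each
    decoder convolution, turning C input channels into C + 2n."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    constant = [[[value] * width for _ in range(height)] for value in attribute_code(y)]
    return feature_maps + constant


# Latent representation from the encoder: 512 feature maps of size 2 x 2.
z = [[[0.0] * 2 for _ in range(2)] for _ in range(512)]
decoder_input = append_attribute_channels(z, [0, 1])  # n = 2 attributes
```

With $n = 2$ the first decoder layer receives 512 + 2n = 516 channels, matching the $C_{512+2n}$ entry of formula (7).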

Further, regarding discriminator cost scheduling: a variable weight is used for the discriminator loss coefficient $\lambda_E$. $\lambda_E$ is initially set to 0, so that the model is trained as a normal auto-encoder; $\lambda_E$ is then increased linearly to 0.0001 over the first 500,000 iterations, which slowly encourages the model to produce invariant representations. Without this scheduling, the loss from the discriminator was observed to have a very large impact on the encoder, even for very low values of $\lambda_E$.
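The schedule for $\lambda_E$ is a linear warm-up and can be written as a small function. This is a sketch; the text does not say what happens after 500,000 iterations, so holding $\lambda_E$ at 0.0001 afterwards is an assumption.

```python
def lambda_e(step: int, final: float = 1e-4, ramp_steps: int = 500_000) -> float:
    """Linear warm-up of the adversarial weight lambda_E.

    0 at step 0 (pure auto-encoder training), rising linearly to `final` at
    `ramp_steps`, and held constant afterwards (an assumption)."""
    step = max(step, 0)
    if step >= ramp_steps:
        return final
    return final * step / ramp_steps
```

Halfway through the ramp, at iteration 250,000, the weight is 0.00005.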

Further, regarding model selection: model selection is performed automatically using two criteria. The first is the image reconstruction error on original images, measured by MSE. The second is obtained by training a classifier to predict image attributes: at the end of each epoch, the attributes of each image in the validation set are swapped, and the performance of the classifier on the decoded images is measured. These two metrics are used to filter out potentially good models, and the final model is selected based on human evaluation of images from the training set reconstructed with swapped attributes.
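The automatic part of model selection can be sketched as two small functions: one measuring how often a classifier recognizes the swapped attributes in decoded images, and one shortlisting candidates that pass both criteria. `classify_decoded` is a hypothetical helper that decodes an image with new attributes and returns the classifier's prediction, and the thresholds are illustrative; the final choice still rests on human evaluation.

```python
from typing import Callable, List, Sequence, Tuple

Attributes = Tuple[int, ...]


def swap(y: Sequence[int]) -> Attributes:
    """Swap every binary attribute (0 <-> 1)."""
    return tuple(1 - yk for yk in y)


def swap_accuracy(
    validation: Sequence[Tuple[object, Attributes]],
    classify_decoded: Callable[[object, Attributes], Attributes],
) -> float:
    """Share of validation images whose reconstruction D(E(x), swap(y)) is
    classified as having the swapped attributes."""
    hits = sum(classify_decoded(x, swap(y)) == swap(y) for x, y in validation)
    return hits / len(validation)


def shortlist(candidates: Sequence[Tuple[str, float, float]],
              max_mse: float, min_acc: float) -> List[str]:
    """Keep candidates (name, reconstruction MSE, swap accuracy) passing both criteria."""
    return [name for name, mse, acc in candidates if mse <= max_mse and acc >= min_acc]
```

A decoder that fully obeys the new attributes scores 1.0, while one that ignores them (so the classifier still sees the original attributes) scores 0.0.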

Brief description of the drawings

Fig. 1 is a diagram of the encoder-decoder architecture of the method for manipulating images by sliding attributes based on an attenuation network according to the present invention.

Fig. 2 shows examples of image reconstruction of flowers with different values of the pink attribute, using the method for manipulating images by sliding attributes based on an attenuation network according to the present invention.

Embodiment

It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a diagram of the encoder-decoder architecture of the method for manipulating images by sliding attributes based on an attenuation network according to the present invention. The input in the figure is an image-attribute pair (x, y). The encoder maps x to its latent representation z; the discriminator is trained to predict y given z, while the encoder is trained so that the discriminator cannot predict y given z alone. The decoder must therefore be given (z, y) to reconstruct the image x. The method mainly comprises the attenuation network, the encoder-decoder architecture, learning attribute-invariant latent representations, the model learning algorithm, and the implementation of the encoder-decoder architecture in neural networks.

Fig. 2 shows examples of image reconstruction of flowers with different values of the pink attribute, using the method for manipulating images by sliding attributes based on an attenuation network according to the present invention. In the figure, the flowers become pinker as the value of the pink attribute increases, and closer to yellow or orange as it decreases.

It is apparent to those skilled in the art that the present invention is not limited to the details of the above embodiments and can be implemented in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various changes and modifications to the present invention without departing from its spirit and scope, and such improvements and modifications shall also fall within the protection scope of the present invention. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the invention.

Claims

1. A method for manipulating images by sliding attributes based on an attenuation network, characterized by mainly comprising: an attenuation network (1); an encoder-decoder architecture (2); learning attribute-invariant latent representations (3); a model learning algorithm (4); and an implementation of the encoder-decoder architecture in neural networks (5).

2. The attenuation network (1) according to claim 1, characterized in that $\mathcal{X}$ is an image domain and $\mathcal{Y}$ is the set of possible attributes associated with images in $\mathcal{X}$; for faces, representative attributes are wearing glasses or not, gender, and young or old; to simplify the presentation, binary attributes are considered, although the method also extends to categorical attributes; in this setting $\mathcal{Y} = \{0, 1\}^n$, where $n$ is the number of attributes, and the training set of $m$ image-attribute pairs is $\mathcal{D} = \{(x^1, y^1), \dots, (x^m, y^m)\}$, where $x^i \in \mathcal{X}$ and $y^i \in \mathcal{Y}$; the final goal is to learn from $\mathcal{D}$ a model that, for any attribute vector $y'$, generates a version of an input image $x$ whose attribute values correspond to $y'$.

3. The encoder-decoder architecture (2) according to claim 1, characterized in that the encoder-decoder architecture is trained with domain-adversarial training in the latent space, wherein the encoder $E_{\theta_{enc}}: \mathcal{X} \to \mathbb{R}^N$ is a convolutional neural network with parameters $\theta_{enc}$ that maps an input image to its $N$-dimensional latent representation $E_{\theta_{enc}}(x)$, and the decoder $D_{\theta_{dec}}: (\mathbb{R}^N, \mathcal{Y}) \to \mathcal{X}$ is a deconvolutional neural network with parameters $\theta_{dec}$ that, given the latent representation $E_{\theta_{enc}}(x)$ and an arbitrary attribute vector $y'$, produces a new version of the input image; when the context is clear, $D$ and $E$ are written for $D_{\theta_{dec}}$ and $E_{\theta_{enc}}$; the auto-encoding loss associated with this architecture is a classical mean squared error (MSE) that measures the reconstruction quality of a training input image $x$ given its true attribute vector $y$, as shown in equation (1):

$$\mathcal{L}_{AE}(\theta_{enc}, \theta_{dec}) = \frac{1}{m} \sum_{(x, y) \in \mathcal{D}} \left\| D_{\theta_{dec}}\big(E_{\theta_{enc}}(x), y\big) - x \right\|_2^2 \tag{1}$$

The exact choice of the reconstruction loss is not essential. To obtain images with sharper textures, an adversarial loss, as used in generative adversarial networks (GANs), can be added to the mean squared error, but a mean absolute error or mean squared error term is still necessary to ensure that the reconstructed image matches the original; ideally, modifying $y$ in $D(E(x), y)$ would generate images with different perceived attributes that are otherwise similar to the input image $x$.

4. The learning of attribute-invariant latent representations (3) according to claim 1, characterized in that, without additional constraints, the decoder learns to ignore the attributes of the image, and modifying $y$ at test time then has no effect; to avoid this behavior, a latent representation invariant to the attributes is learned, invariance meaning that for two versions $x$ and $x'$ of the same image that differ only in their attribute values (for example, two images of the same person with and without glasses), the corresponding latent representations $E(x)$ and $E(x')$ should be identical; when this invariance holds, the decoder must use the attributes to reconstruct the original image; because the training set does not contain different versions of the same image, this constraint cannot simply be added to the loss, so it is enforced through adversarial training in the latent space: an additional neural network, called the discriminator, is trained to identify the true attributes $y$ of a training pair $(x, y)$ given $E(x)$, and invariance is obtained by learning the encoder $E$ so that the discriminator cannot identify the correct attributes; as in GANs, this corresponds to a two-player game in which the discriminator aims to maximize its ability to identify the attributes, while $E$ aims to prevent it from being a good discriminator.

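5. The discriminator objective according to claim 4, characterized in that the discriminator outputs probabilities $p_{\theta_{dis}}(y \mid E(x))$ of an attribute vector, where $\theta_{dis}$ are the parameters of the discriminator; using the subscript $k$ for the $k$-th attribute, $\log p_{\theta_{dis}}(y \mid E(x)) = \sum_{k=1}^{n} \log p_{\theta_{dis}}(y_k \mid E(x))$; since the loss of the discriminator depends on the current state of the encoder, it is written as in equation (2):

$$\mathcal{L}_{dis}(\theta_{dis} \mid \theta_{enc}) = -\frac{1}{m} \sum_{(x, y) \in \mathcal{D}} \log p_{\theta_{dis}}\big(y \mid E_{\theta_{enc}}(x)\big) \tag{2}$$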

6. The adversarial objective according to claim 4, characterized in that the objective of the encoder is to compute a latent representation that optimizes two goals: first, the decoder should be able to reconstruct $x$ given $E(x)$ and $y$; second, the discriminator should be unable to predict $y$ given $E(x)$; a mistake is considered to occur when the discriminator predicts $1 - y_k$ for attribute $k$, so that, given the discriminator parameters, the complete loss of the encoder-decoder architecture is given by equation (3):

$$\mathcal{L}(\theta_{enc}, \theta_{dec} \mid \theta_{dis}) = \frac{1}{m} \sum_{(x, y) \in \mathcal{D}} \Big[ \left\| D_{\theta_{dec}}\big(E_{\theta_{enc}}(x), y\big) - x \right\|_2^2 - \lambda_E \log p_{\theta_{dis}}\big(1 - y \mid E_{\theta_{enc}}(x)\big) \Big] \tag{3}$$

where $\lambda_E > 0$ controls the trade-off between the quality of the reconstruction and the invariance of the latent representation; large values of $\lambda_E$ restrict the amount of information about $x$ contained in $E(x)$ and result in blurry images, while small values of $\lambda_E$ limit the decoder's dependence on the attribute code $y$ and result in poor effects when attributes are altered.

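7. The model learning algorithm (4) according to claim 1, characterized in that, given the current state of the encoder, the optimal discriminator parameters satisfy $\theta_{dis}^*(\theta_{enc}) \in \arg\min_{\theta_{dis}} \mathcal{L}_{dis}(\theta_{dis} \mid \theta_{enc})$; ignoring problems related to multiple or local minima, the overall objective function is

$$\theta_{enc}^*, \theta_{dec}^* = \arg\min_{\theta_{enc}, \theta_{dec}} \mathcal{L}\big(\theta_{enc}, \theta_{dec} \mid \theta_{dis}^*(\theta_{enc})\big)$$

in practice, it is unreasonable to solve for $\theta_{dis}^*(\theta_{enc})$ at every update of $\theta_{enc}$, so, following the usual practice of adversarial training of deep neural networks, the current value of $\theta_{dis}$ is used as an approximation of $\theta_{dis}^*(\theta_{enc})$ and all parameters are updated by stochastic gradient descent; given a training example $(x, y)$, the auto-encoder loss restricted to $(x, y)$ is denoted $\mathcal{L}(\theta_{enc}, \theta_{dec} \mid \theta_{dis}, x, y)$ and the corresponding discriminator loss $\mathcal{L}_{dis}(\theta_{dis} \mid \theta_{enc}, x, y)$; for the training example $(x^{(t)}, y^{(t)})$, the parameters at time $t$ are updated as in equations (4) and (5), where $\eta$ is the learning rate:

$$\theta_{dis}^{(t+1)} = \theta_{dis}^{(t)} - \eta \nabla_{\theta_{dis}} \mathcal{L}_{dis}\big(\theta_{dis}^{(t)} \mid \theta_{enc}^{(t)}, x^{(t)}, y^{(t)}\big) \tag{4}$$

$$\big[\theta_{enc}^{(t+1)}, \theta_{dec}^{(t+1)}\big] = \big[\theta_{enc}^{(t)}, \theta_{dec}^{(t)}\big] - \eta \nabla_{\theta_{enc}, \theta_{dec}} \mathcal{L}\big(\theta_{enc}^{(t)}, \theta_{dec}^{(t)} \mid \theta_{dis}^{(t+1)}, x^{(t)}, y^{(t)}\big) \tag{5}$$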

8. The implementation (5) of the encoder-decoder architecture in neural networks according to claim 1, characterized in that, to adapt the encoder-decoder architecture to neural networks, $C_k$ denotes a convolution-rectified linear unit (ReLU) layer with $k$ filters; the convolutions use kernels of size 4 × 4 with a stride of 2 and a padding of 1, so that each layer of the encoder divides the size of its input by 2; leaky ReLUs with a slope of 0.2 are used in the encoder and simple ReLUs in the decoder; the encoder consists of the following 7 layers, as shown in formula (6):

$$C_{16} - C_{32} - C_{64} - C_{128} - C_{256} - C_{512} - C_{512} \tag{6}$$

Since the input image has size 256 × 256, the latent representation of an image consists of 512 feature maps of size 2 × 2; to provide the image attributes to the decoder, the attribute code is appended as an input to every layer of the decoder, the code of an image being the concatenation of one-hot vectors representing its attribute values, with a binary attribute represented by [1, 0] or [0, 1]; the code is therefore appended as additional constant input channels to all convolutions of the decoder; denoting the number of attributes by $n$, the decoder is symmetric to the encoder but uses transposed convolutions for up-sampling, as shown in formula (7):

$$C_{512+2n} - C_{512+2n} - C_{256+2n} - C_{128+2n} - C_{64+2n} - C_{32+2n} - C_{16+2n} \tag{7}$$

The discriminator is a $C_{512}$ layer followed by a two-layer fully connected neural network of sizes 512 and $n$, respectively.

9. The discriminator cost scheduling according to claim 8, characterized in that a variable weight is used for the discriminator loss coefficient $\lambda_E$: $\lambda_E$ is initially set to 0, so that the model is trained as a normal auto-encoder, and is then increased linearly to 0.0001 over the first 500,000 iterations, which slowly encourages the model to produce invariant representations; without this scheduling, the loss from the discriminator was observed to have a very large impact on the encoder, even for very low values of $\lambda_E$.

10. The model selection according to claim 8, characterized in that model selection is performed automatically using two criteria: the first is the image reconstruction error on original images, measured by MSE; the second is obtained by training a classifier to predict image attributes, the attributes of each image in the validation set being swapped at the end of each epoch and the performance of the classifier on the decoded images being measured; these two metrics are used to filter out potentially good models, and the final model is selected based on human evaluation of images from the training set reconstructed with swapped attributes.