CN109684973A - Facial image completion system based on symmetry-consistent convolutional neural networks - Google Patents


Info

Publication number
CN109684973A
Authority
CN
China
Prior art keywords
convolutional layer
output
activation
facial image
deconvolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811549357.XA
Other languages
Chinese (zh)
Other versions
CN109684973B (en)
Inventor
左旺孟 (Wangmeng Zuo)
李晓明 (Xiaoming Li)
刘铭 (Ming Liu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201811549357.XA
Publication of CN109684973A
Application granted
Publication of CN109684973B
Active legal status
Anticipated expiration legal status


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A facial image completion system based on symmetry-consistent convolutional neural networks belongs to the technical field of image completion. It solves the problem that existing facial image completion systems based on convolutional neural networks produce poor results because they cannot guarantee the symmetric consistency of the completed face. In this system, an optical-flow network takes a partially occluded facial image and its horizontally flipped image as input, treats the resulting optical-flow vectors as absolute coordinates for deforming the flipped image, and deforms the flipped image by bilinear interpolation to obtain a warped flipped image. An illumination network takes the partially occluded facial image and its horizontally flipped image as input and corrects the illumination distribution of the warped flipped image with the illumination-correction coefficients it produces. A symmetric-missing-pixel filling subsystem takes the illumination-corrected warped flipped image and its corresponding residual occlusion mask as input and outputs the facial image with its missing pixels filled.

Description

Facial image completion system based on symmetry-consistent convolutional neural networks
Technical field
The present invention relates to a facial image completion system and belongs to the technical field of image completion.
Background technique
Facial image completion aims to recover an unoccluded facial image from a partially occluded one. It is mainly used to restore damaged facial images or to remove the occluders from partially occluded facial images. Producing a high-quality unoccluded facial image from a partially occluded one has long been both a research focus and a difficulty in the field of graphics and image processing.
In recent years, to obtain better completion results, researchers have applied convolutional neural networks to facial image completion, filling the missing pixels of a partially occluded facial image with encoders and decoders based on convolutional neural networks, trained under various loss constraints such as perceptual loss, segmentation loss and local discriminator loss. However, existing facial image completion systems based on convolutional neural networks do not consider the symmetry information inherent in facial images, i.e. the left-right symmetry of the face. For example, when some left-face pixels are missing, they could be filled from the right-face pixels that mirror them; likewise, left-right symmetry could be enforced as a constraint while completing a partially occluded facial image. It follows that the completion quality of existing convolutional-neural-network-based facial image completion systems can be further improved.
Summary of the invention
To solve the problem that existing facial image completion systems based on convolutional neural networks produce poor results because they cannot guarantee the symmetric consistency of the completed face, the present invention proposes a facial image completion system based on symmetry-consistent convolutional neural networks.
The facial image completion system based on symmetry-consistent convolutional neural networks according to the present invention fills the missing pixels of a partially occluded facial image to obtain an unoccluded facial image; the missing pixels of the partially occluded facial image comprise symmetric missing pixels and asymmetric missing pixels;
A symmetric missing pixel is a missing pixel whose mirror pixel about the facial midline is also missing;
An asymmetric missing pixel is a missing pixel whose mirror pixel about the facial midline is not missing;
The facial image completion system comprises an asymmetric-missing-pixel filling subsystem and a symmetric-missing-pixel filling subsystem, both implemented with convolutional neural networks;
The asymmetric-missing-pixel filling subsystem comprises an optical-flow network and an illumination network;
The optical-flow network takes the partially occluded facial image and its horizontally flipped image as its input, treats the resulting optical-flow vectors as absolute coordinates for deforming the flipped image, and deforms the flipped image by bilinear interpolation to obtain a warped flipped image;
The illumination network takes the partially occluded facial image and its horizontally flipped image as its input and corrects the illumination distribution of the warped flipped image with the illumination-correction coefficients it produces; the warped flipped image after illumination correction is the partially occluded facial image with its asymmetric missing pixels filled;
The symmetric-missing-pixel filling subsystem takes the partially occluded facial image with its asymmetric missing pixels filled, together with the residual occlusion mask corresponding to that image, as its input, and outputs the facial image with its missing pixels filled.
On the basis of existing convolutional-neural-network-based facial image completion systems, the facial image completion system based on symmetry-consistent convolutional neural networks according to the present invention introduces an asymmetric-missing-pixel filling subsystem composed of an optical-flow network and an illumination network. The optical-flow network takes the partially occluded facial image and its horizontally flipped image as its input, treats the resulting optical-flow vectors as absolute coordinates for deforming the flipped image, and deforms it by bilinear interpolation to obtain a warped flipped image. The face in the warped flipped image has the same pose and expression as the face in the partially occluded facial image. The illumination network likewise takes the partially occluded facial image and its horizontally flipped image as its input, and corrects the illumination distribution of the warped flipped image with the illumination-correction coefficients it produces. After the warped flipped image is multiplied by the illumination-correction coefficients, its illumination distribution matches that of the partially occluded facial image. The symmetric-missing-pixel filling subsystem of the present invention takes the illumination-corrected warped flipped image and its corresponding residual occlusion mask as its input and outputs the facial image with its missing pixels filled.
Through the optical-flow network, the facial image completion system according to the present invention establishes pixel correspondences between the left and right halves of the partially occluded facial image, and fills each asymmetric missing pixel with the non-missing pixel that mirrors it about the facial midline, achieving a preliminary completion of the partially occluded facial image. Through the illumination network, it establishes illumination-correction coefficients between the left and right halves of the face and uses them to correct the illumination distribution of the preliminarily completed facial image, so that its illumination distribution is consistent with that of the partially occluded facial image. This completion strategy, which considers the symmetry information inherent in facial images, provides structural information consistent with the subject's identity. Therefore, compared with existing facial image completion systems based on convolutional neural networks, the facial image completion system based on symmetry-consistent convolutional neural networks according to the present invention guarantees the symmetric consistency of the completed face and achieves better completion results.
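The compositing step described above can be illustrated with a rough NumPy sketch (function and argument names are hypothetical; the optical-flow and illumination networks are replaced by precomputed arrays for the warped flip and the correction coefficient):

```python
import numpy as np

def asymmetric_fill(img, mask, warped_flip, illum_coef):
    """Composite step of the asymmetric-missing-pixel filling subsystem (a sketch).

    img:         H x W x 3 partially occluded face, values in [0, 1]
    mask:        H x W, 1 where pixels are known, 0 where missing
    warped_flip: H x W x 3 horizontally flipped image, assumed already warped
                 by the optical-flow network so pose and expression match img
    illum_coef:  H x W x 1 illumination-correction coefficient R
    """
    corrected = illum_coef * warped_flip      # match illumination of img
    m = mask[..., None]
    # keep the known pixels, fill the missing ones from the corrected flip
    return m * img + (1.0 - m) * corrected
```

Symmetric missing pixels (missing on both sides of the midline) remain unfilled by this step, which is why the reconstruction subsystem follows.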
Brief description of the drawings
Hereinafter, the facial image completion system based on symmetry-consistent convolutional neural networks according to the present invention will be described in more detail on the basis of embodiments and with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the filling process of the asymmetric-missing-pixel filling subsystem referred to in the embodiment;
Fig. 2 shows the network structure of the optical-flow network referred to in the embodiment;
Fig. 3 shows the network structure of the illumination network referred to in the embodiment;
Fig. 4 is a flowchart of how the training network referred to in the embodiment constrains the learning of the symmetric-missing-pixel filling subsystem through a reconstruction loss;
Fig. 5 shows the completion results of the facial image completion system based on symmetry-consistent convolutional neural networks described in the embodiment on real occluded facial images.
Specific embodiment
The facial image completion system based on symmetry-consistent convolutional neural networks according to the present invention is further described below with reference to the accompanying drawings.
Embodiment: the present embodiment is described in detail with reference to Figs. 1 to 5.
The facial image completion system based on symmetry-consistent convolutional neural networks described in the present embodiment fills the missing pixels of a partially occluded facial image to obtain an unoccluded facial image; the missing pixels of the partially occluded facial image comprise symmetric missing pixels and asymmetric missing pixels;
A symmetric missing pixel is a missing pixel whose mirror pixel about the facial midline is also missing;
An asymmetric missing pixel is a missing pixel whose mirror pixel about the facial midline is not missing;
The facial image completion system comprises an asymmetric-missing-pixel filling subsystem and a symmetric-missing-pixel filling subsystem, both implemented with convolutional neural networks;
The asymmetric-missing-pixel filling subsystem comprises an optical-flow network and an illumination network;
The optical-flow network takes the partially occluded facial image and its horizontally flipped image as its input, treats the resulting optical-flow vectors as absolute coordinates for deforming the flipped image, and deforms the flipped image by bilinear interpolation to obtain a warped flipped image;
The illumination network takes the partially occluded facial image and its horizontally flipped image as its input and corrects the illumination distribution of the warped flipped image with the illumination-correction coefficients it produces; the warped flipped image after illumination correction is the partially occluded facial image with its asymmetric missing pixels filled;
The symmetric-missing-pixel filling subsystem takes the partially occluded facial image with its asymmetric missing pixels filled, together with the residual occlusion mask corresponding to that image, as its input, and outputs the facial image with its missing pixels filled.
In the present embodiment, the optical-flow network comprises an optical-flow encoder and an optical-flow decoder; the optical-flow encoder comprises N1 convolutional layers and the optical-flow decoder comprises N1 deconvolution layers;
The illumination network comprises an illumination encoder and an illumination decoder; the illumination encoder comprises N2 convolutional layers and the illumination decoder comprises N2 deconvolution layers;
The symmetric-missing-pixel filling subsystem comprises a reconstruction encoder and a reconstruction decoder; the reconstruction encoder comprises N3 convolutional layers and the reconstruction decoder comprises N3 deconvolution layers;
N1, N2 and N3 are each greater than or equal to 2.
In the present embodiment, N1 = 8;
The optical-flow encoder comprises convolutional layers C1 to C8;
Convolutional layer C1 successively applies the first convolution operation and the first activation operation to the concatenated features of the partially occluded facial image and its horizontally flipped image;
Convolutional layer C2 successively applies the second convolution operation, a batch normalization operation and the second activation operation to the output of convolutional layer C1;
Convolutional layer C3 successively applies the third convolution operation, a batch normalization operation and the third activation operation to the output of convolutional layer C2;
Convolutional layer C4 successively applies the fourth convolution operation, a batch normalization operation and the fourth activation operation to the output of convolutional layer C3;
Convolutional layer C5 successively applies the fifth convolution operation, a batch normalization operation and the fifth activation operation to the output of convolutional layer C4;
Convolutional layer C6 successively applies the sixth convolution operation, a batch normalization operation and the sixth activation operation to the output of convolutional layer C5;
Convolutional layer C7 successively applies the seventh convolution operation, a batch normalization operation and the seventh activation operation to the output of convolutional layer C6;
Convolutional layer C8 successively applies the eighth convolution operation and the eighth activation operation to the output of convolutional layer C7;
The optical-flow decoder comprises deconvolution layers D1 to D8;
Deconvolution layer D1 successively applies the first deconvolution operation, a batch normalization operation and the ninth activation operation to the output of convolutional layer C8;
Deconvolution layer D2 successively applies the second deconvolution operation, a batch normalization operation and the tenth activation operation to the output of deconvolution layer D1;
Deconvolution layer D3 successively applies the third deconvolution operation, a batch normalization operation and the eleventh activation operation to the output of deconvolution layer D2;
Deconvolution layer D4 successively applies the fourth deconvolution operation, a batch normalization operation and the twelfth activation operation to the output of deconvolution layer D3;
Deconvolution layer D5 successively applies the fifth deconvolution operation, a batch normalization operation and the thirteenth activation operation to the output of deconvolution layer D4;
Deconvolution layer D6 successively applies the sixth deconvolution operation, a batch normalization operation and the fourteenth activation operation to the output of deconvolution layer D5;
Deconvolution layer D7 successively applies the seventh deconvolution operation, a batch normalization operation and the fifteenth activation operation to the output of deconvolution layer D6;
Deconvolution layer D8 successively applies the eighth deconvolution operation, the sixteenth activation operation and a bilinear interpolation operation to the output of deconvolution layer D7;
The output of deconvolution layer D8 is the warped flipped image;
The first convolution operation is a 4×4 convolution with 64 filters and stride 2;
The second convolution operation is a 4×4 convolution with 128 filters and stride 2;
The third convolution operation is a 4×4 convolution with 256 filters and stride 2;
The fourth convolution operation is a 4×4 convolution with 512 filters and stride 2;
The fifth to eighth convolution operations are 4×4 convolutions with 1024 filters and stride 2;
The first to third deconvolution operations are 4×4 deconvolutions with 1024 filters and stride 2;
The fourth deconvolution operation is a 4×4 deconvolution with 512 filters and stride 2;
The fifth deconvolution operation is a 4×4 deconvolution with 256 filters and stride 2;
The sixth deconvolution operation is a 4×4 deconvolution with 128 filters and stride 2;
The seventh deconvolution operation is a 4×4 deconvolution with 64 filters and stride 2;
The eighth deconvolution operation is a 4×4 deconvolution with 2 filters and stride 2;
The first to seventh activation operations all use the LReLU function, the eighth to fifteenth activation operations all use the ReLU function, and the sixteenth activation operation uses the Tanh function.
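The patent does not state the padding or the input resolution, but with the common choices of padding 1 and a 256×256 input (both assumptions), each 4×4 stride-2 convolution exactly halves the feature map and each matching deconvolution doubles it, so the eight encoder layers reduce 256×256 to 1×1 and the eight decoder layers restore it. A quick check of that arithmetic:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a 4x4 stride-2 convolution (padding 1 is an assumption)."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of the matching 4x4 stride-2 transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel

size = 256                       # assumed input resolution
enc = [size]
for _ in range(8):               # convolutional layers C1..C8 halve the map
    enc.append(conv_out(enc[-1]))
dec = [enc[-1]]
for _ in range(8):               # deconvolution layers D1..D8 double it back
    dec.append(deconv_out(dec[-1]))
```

The last deconvolution has 2 filters because the flow field carries an x- and a y-coordinate per pixel.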
In the present embodiment, N2 = 8;
The illumination encoder comprises convolutional layers C9 to C16;
Convolutional layer C9 successively applies the ninth convolution operation and the seventeenth activation operation to the concatenated features of the partially occluded facial image and its horizontally flipped image;
Convolutional layer C10 successively applies the tenth convolution operation, a batch normalization operation and the eighteenth activation operation to the output of convolutional layer C9;
Convolutional layer C11 successively applies the eleventh convolution operation, a batch normalization operation and the nineteenth activation operation to the output of convolutional layer C10;
Convolutional layer C12 successively applies the twelfth convolution operation, a batch normalization operation and the twentieth activation operation to the output of convolutional layer C11;
Convolutional layer C13 successively applies the thirteenth convolution operation, a batch normalization operation and the twenty-first activation operation to the output of convolutional layer C12;
Convolutional layer C14 successively applies the fourteenth convolution operation, a batch normalization operation and the twenty-second activation operation to the output of convolutional layer C13;
Convolutional layer C15 successively applies the fifteenth convolution operation, a batch normalization operation and the twenty-third activation operation to the output of convolutional layer C14;
Convolutional layer C16 successively applies the sixteenth convolution operation and the twenty-fourth activation operation to the output of convolutional layer C15;
The illumination decoder comprises deconvolution layers D9 to D16;
Deconvolution layer D9 successively applies the ninth deconvolution operation, a batch normalization operation and the twenty-fifth activation operation to the output of convolutional layer C16;
Deconvolution layer D10 successively applies the tenth deconvolution operation, a batch normalization operation and the twenty-sixth activation operation to the output of deconvolution layer D9;
Deconvolution layer D11 successively applies the eleventh deconvolution operation, a batch normalization operation and the twenty-seventh activation operation to the output of deconvolution layer D10;
Deconvolution layer D12 successively applies the twelfth deconvolution operation, a batch normalization operation and the twenty-eighth activation operation to the output of deconvolution layer D11;
Deconvolution layer D13 successively applies the thirteenth deconvolution operation, a batch normalization operation and the twenty-ninth activation operation to the output of deconvolution layer D12;
Deconvolution layer D14 successively applies the fourteenth deconvolution operation, a batch normalization operation and the thirtieth activation operation to the output of deconvolution layer D13;
Deconvolution layer D15 successively applies the fifteenth deconvolution operation, a batch normalization operation and the thirty-first activation operation to the output of deconvolution layer D14;
Deconvolution layer D16 applies the sixteenth deconvolution operation to the output of deconvolution layer D15;
The output of deconvolution layer D16 is the illumination-correction coefficient;
The ninth convolution operation is a 4×4 convolution with 64 filters and stride 2;
The tenth convolution operation is a 4×4 convolution with 128 filters and stride 2;
The eleventh convolution operation is a 4×4 convolution with 256 filters and stride 2;
The twelfth convolution operation is a 4×4 convolution with 512 filters and stride 2;
The thirteenth to sixteenth convolution operations are 4×4 convolutions with 1024 filters and stride 2;
The ninth to eleventh deconvolution operations are 4×4 deconvolutions with 1024 filters and stride 2;
The twelfth deconvolution operation is a 4×4 deconvolution with 512 filters and stride 2;
The thirteenth deconvolution operation is a 4×4 deconvolution with 256 filters and stride 2;
The fourteenth deconvolution operation is a 4×4 deconvolution with 128 filters and stride 2;
The fifteenth deconvolution operation is a 4×4 deconvolution with 64 filters and stride 2;
The sixteenth deconvolution operation is a 4×4 deconvolution with 2 filters and stride 2;
The seventeenth to twenty-third activation operations all use the LReLU function, and the twenty-fourth to thirty-first activation operations all use the ReLU function.
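The embodiment's networks use only four activation functions (LReLU, ReLU, Tanh, Sigmoid), all of which are a line of NumPy each; the LReLU slope of 0.2 is an assumption, since the patent does not specify it:

```python
import numpy as np

def lrelu(x, slope=0.2):
    """Leaky ReLU: negative inputs are scaled by `slope` instead of zeroed."""
    return np.where(x > 0, x, slope * x)

def relu(x):
    """ReLU: element-wise max(0, x)."""
    return np.maximum(x, 0.0)

def tanh(x):
    """Tanh squashes values into (-1, 1)."""
    return np.tanh(x)

def sigmoid(x):
    """Sigmoid squashes values into (0, 1), suitable for image outputs."""
    return 1.0 / (1.0 + np.exp(-x))
```

The choice of Tanh for the flow decoder's final activation is consistent with flow coordinates normalized to [-1, 1], and Sigmoid for the reconstruction decoder with pixel values in [0, 1].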
In the present embodiment, N3 = 8;
The reconstruction encoder comprises convolutional layers C17 to C24;
Convolutional layer C17 successively applies the seventeenth convolution operation and the thirty-second activation operation to the concatenated features of the partially occluded facial image with its asymmetric missing pixels filled and the residual occlusion mask corresponding to that image;
Convolutional layer C18 successively applies the eighteenth convolution operation, a batch normalization operation and the thirty-third activation operation to the output of convolutional layer C17;
Convolutional layer C19 successively applies the nineteenth convolution operation, a batch normalization operation and the thirty-fourth activation operation to the output of convolutional layer C18;
Convolutional layer C20 successively applies the twentieth convolution operation, a batch normalization operation and the thirty-fifth activation operation to the output of convolutional layer C19;
Convolutional layer C21 successively applies the twenty-first convolution operation, a batch normalization operation and the thirty-sixth activation operation to the output of convolutional layer C20;
Convolutional layer C22 successively applies the twenty-second convolution operation, a batch normalization operation and the thirty-seventh activation operation to the output of convolutional layer C21;
Convolutional layer C23 successively applies the twenty-third convolution operation, a batch normalization operation and the thirty-eighth activation operation to the output of convolutional layer C22;
Convolutional layer C24 successively applies the twenty-fourth convolution operation and the thirty-ninth activation operation to the output of convolutional layer C23;
The reconstruction decoder comprises deconvolution layers D17 to D24;
Deconvolution layer D17 successively applies the seventeenth deconvolution operation, a batch normalization operation, the first dropout operation, the first feature-concatenation operation and the fortieth activation operation to the output of convolutional layer C24;
Deconvolution layer D18 successively applies the eighteenth deconvolution operation, a batch normalization operation, the second dropout operation, the second feature-concatenation operation and the forty-first activation operation to the output of deconvolution layer D17;
Deconvolution layer D19 successively applies the nineteenth deconvolution operation, a batch normalization operation, the third dropout operation, the third feature-concatenation operation and the forty-second activation operation to the output of deconvolution layer D18;
Deconvolution layer D20 successively applies the twentieth deconvolution operation, a batch normalization operation, the fourth feature-concatenation operation and the forty-third activation operation to the output of deconvolution layer D19;
Deconvolution layer D21 successively applies the twenty-first deconvolution operation, a batch normalization operation, the fifth feature-concatenation operation and the forty-fourth activation operation to the output of deconvolution layer D20;
Deconvolution layer D22 successively applies the twenty-second deconvolution operation, a batch normalization operation, the sixth feature-concatenation operation and the forty-fifth activation operation to the output of deconvolution layer D21;
Deconvolution layer D23 successively applies the twenty-third deconvolution operation, a batch normalization operation, the seventh feature-concatenation operation and the forty-sixth activation operation to the output of deconvolution layer D22;
Deconvolution layer D24 successively applies the twenty-fourth deconvolution operation and the forty-seventh activation operation to the output of deconvolution layer D23;
The output of deconvolution layer D24 is the facial image with its missing pixels filled;
The seventeenth convolution operation is a 4×4 convolution with 64 filters and stride 2;
The eighteenth convolution operation is a 4×4 convolution with 128 filters and stride 2;
The nineteenth convolution operation is a 4×4 convolution with 256 filters and stride 2;
The twentieth convolution operation is a 4×4 convolution with 512 filters and stride 2;
The twenty-first to twenty-fourth convolution operations are 4×4 convolutions with 1024 filters and stride 2;
The seventeenth to nineteenth deconvolution operations are 4×4 deconvolutions with 1024 filters and stride 2;
The twentieth deconvolution operation is a 4×4 deconvolution with 512 filters and stride 2;
The twenty-first deconvolution operation is a 4×4 deconvolution with 256 filters and stride 2;
The twenty-second deconvolution operation is a 4×4 deconvolution with 128 filters and stride 2;
The twenty-third deconvolution operation is a 4×4 deconvolution with 64 filters and stride 2;
The twenty-fourth deconvolution operation is a 4×4 deconvolution with 3 filters and stride 2;
The thirty-second to thirty-eighth activation operations and the fortieth to forty-sixth activation operations all use the LReLU function, the thirty-ninth activation operation uses the ReLU function, and the forty-seventh activation operation uses the Sigmoid function;
The first to third dropout operations all use the DropOut function;
The first feature-concatenation operation concatenates the output of convolutional layer C23 with the output of the first dropout operation;
The second feature-concatenation operation concatenates the output of convolutional layer C22 with the output of the second dropout operation;
The third feature-concatenation operation concatenates the output of convolutional layer C21 with the output of the third dropout operation;
The fourth feature-concatenation operation concatenates the output of convolutional layer C20 with the batch-normalization output of deconvolution layer D20;
The fifth feature-concatenation operation concatenates the output of convolutional layer C19 with the batch-normalization output of deconvolution layer D21;
The sixth feature-concatenation operation concatenates the output of convolutional layer C18 with the batch-normalization output of deconvolution layer D22;
The seventh feature-concatenation operation concatenates the output of convolutional layer C17 with the batch-normalization output of deconvolution layer D23.
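The feature-concatenation operations form U-Net-style skip connections: each decoder layer's features are joined with the mirror encoder layer's output before the next deconvolution. Using the filter counts listed above, the channel count that each following deconvolution sees can be tallied (bookkeeping only, not part of the patent text):

```python
# Encoder and decoder output channels as specified for C17..C24 and D17..D24.
enc_out = {"C17": 64, "C18": 128, "C19": 256, "C20": 512,
           "C21": 1024, "C22": 1024, "C23": 1024, "C24": 1024}
dec_out = {"D17": 1024, "D18": 1024, "D19": 1024, "D20": 512,
           "D21": 256, "D22": 128, "D23": 64, "D24": 3}

# Each decoder layer is concatenated with its mirror encoder layer.
skip_pairs = [("D17", "C23"), ("D18", "C22"), ("D19", "C21"),
              ("D20", "C20"), ("D21", "C19"), ("D22", "C18"), ("D23", "C17")]

# Channels entering the next deconvolution = decoder channels + skipped channels.
concat_channels = {d: dec_out[d] + enc_out[c] for d, c in skip_pairs}
```

So, for instance, the eighteenth deconvolution operates on 2048 channels (D17's 1024 plus C23's 1024), while the final deconvolution sees 128.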
The face image filling system of the symmetry-consistency-based convolutional neural network described in this embodiment further includes a training network;
The training network constrains the learning of the optical flow network through a face landmark loss and a total variation regularization term, specifically as follows:
A face landmark detection algorithm is used to detect the L face landmarks (x_1, y_1), …, (x_L, y_L) in the unoccluded face image g corresponding to the partially occluded face image, and, by horizontally flipping the L face landmarks of g, the L face landmarks (x'_1, y'_1), …, (x'_L, y'_L) in the horizontally flipped image g' of the unoccluded face image g;
x_i is the x-axis coordinate of the i-th face landmark in the unoccluded face image g, and y_i is its y-axis coordinate; x'_j is the x-axis coordinate of the j-th face landmark in the horizontally flipped image g', and y'_j is its y-axis coordinate;
To align the horizontally flipped image g' with the unoccluded face image g, the optical flow is expected to map each landmark of g' onto the corresponding landmark of g;
The x-axis and y-axis coordinates are normalized to [-1, 1];
The face landmark loss l_lm is defined as:
l_lm = Σ_{i=1..L} [(Φx(i) − x_i)² + (Φy(i) − y_i)²],
where Φx(i) is the x-axis coordinate value of the optical flow vector at the i-th face landmark in the horizontally flipped image g', and Φy(i) is the y-axis coordinate value of the optical flow vector at the i-th face landmark in the horizontally flipped image g';
Based on the absolute coordinates of the optical flow vectors Φ obtained by the optical flow network, the total variation regularization term l_TV is defined as:
l_TV = ‖∇x Φx‖² + ‖∇y Φx‖² + ‖∇x Φy‖² + ‖∇y Φy‖²,
where ∇x is the gradient along the x-axis direction, ∇y is the gradient along the y-axis direction, Φx is the x-axis coordinate value of the optical flow vectors, and Φy is the y-axis coordinate value of the optical flow vectors;
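The total variation term can be computed directly with forward differences. A small PyTorch sketch, assuming the flow field is stored as a (B, 2, H, W) tensor whose two channels are (Φx, Φy); the function name is illustrative:

```python
import torch

def tv_regularizer(flow):
    """l_TV for a flow field of shape (B, 2, H, W): squared forward differences
    of both coordinate channels along x and y, i.e.
    ||grad_x Phi_x||^2 + ||grad_y Phi_x||^2 + ||grad_x Phi_y||^2 + ||grad_y Phi_y||^2."""
    dx = flow[:, :, :, 1:] - flow[:, :, :, :-1]   # horizontal differences of Phi_x, Phi_y
    dy = flow[:, :, 1:, :] - flow[:, :, :-1, :]   # vertical differences of Phi_x, Phi_y
    return (dx ** 2).sum() + (dy ** 2).sum()
```

A spatially constant flow costs nothing under this term; the penalty grows with how non-smooth the predicted flow field is.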
The training network thus jointly constrains the generation of the optical flow vectors through the face landmark loss and the total variation regularization term;
The training network constrains the learning of the illumination network through an illumination consistency loss, specifically as follows:
From the optical flow vectors Φ output by the optical flow network, the warped flip image I_w corresponding to the partially occluded face image is obtained by bilinear sampling:
I_w(p) = Σ_{q∈N} I_o'(q) · max(0, 1 − |Φx(p) − q_x|) · max(0, 1 − |Φy(p) − q_y|),
where N is the set of the four pixel positions adjacent to the sampling location Φ(p), and I_o' is the horizontally flipped image of the partially occluded face image;
Similarly, the warped flip image I_w' corresponding to the unoccluded face image g can be obtained;
The illumination consistency loss L_l is defined as:
L_l = ‖I_w' ⊙ R − g‖²,
where R is the illumination correction coefficient.
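The warping and correction steps can be sketched with PyTorch's built-in bilinear sampler, assuming (as stated above) that the flow gives absolute coordinates normalized to [-1, 1]; `warp_by_flow` and `illumination_loss` are illustrative names, not the patent's:

```python
import torch
import torch.nn.functional as F

def warp_by_flow(image, flow):
    """Bilinearly sample `image` (B, C, H, W) at the absolute coordinates given
    by `flow` (B, 2, H, W), coordinates normalized to [-1, 1] as in the text."""
    grid = flow.permute(0, 2, 3, 1)  # (B, H, W, 2), last dim = (x, y)
    return F.grid_sample(image, grid, mode='bilinear', align_corners=True)

def illumination_loss(warped_unoccluded, correction, target):
    """L_l = ||I_w' (.) R - g||^2: the corrected warped flip should match g."""
    return ((warped_unoccluded * correction - target) ** 2).sum()
```

`grid_sample` implements exactly the four-neighbour weighted sum written above, so the flow network's output can be plugged in after a channel-to-last permute.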
The training network of this embodiment constrains the learning of the symmetric-missing-pixel filling subsystem through a reconstruction loss, specifically as follows:
The reconstruction loss includes a Euclidean distance loss between the missing-pixel-filled face image and the corresponding unoccluded face image g, and a face recognition network feature loss;
The Euclidean distance loss between the missing-pixel-filled face image ĝ and the corresponding unoccluded face image g is defined as:
l_r = ‖ĝ − g‖²,
where ĝ is the missing-pixel-filled face image;
The face recognition network feature loss is defined as:
l_f = Σ_l (1 / (C_l · H_l · W_l)) · ‖ψ_l(ĝ) − ψ_l(g)‖²,
where ψ_l(ĝ) is the l-th layer convolution feature obtained by a pre-trained VGG-Face network when the missing-pixel-filled face image ĝ is input, ψ_l(g) is the l-th layer convolution feature obtained by the pre-trained VGG-Face network when the unoccluded face image g is input, and C_l, H_l and W_l are respectively the channel number, height and width of the l-th layer convolution feature.
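Both reconstruction terms are straightforward once the features are extracted. A sketch, in which `feats_filled`/`feats_target` stand in for the VGG-Face features ψ_l (any pretrained extractor could supply them; the function names are assumptions):

```python
import torch

def reconstruction_loss(filled, target):
    """Euclidean term ||g_hat - g||^2 between filled and unoccluded faces."""
    return ((filled - target) ** 2).sum()

def feature_loss(feats_filled, feats_target):
    """Per-layer squared feature distance, each layer normalized by
    C_l * H_l * W_l. `feats_*` are lists of (C_l, H_l, W_l) tensors."""
    total = torch.zeros(())
    for a, b in zip(feats_filled, feats_target):
        c, h, w = a.shape[-3:]
        total = total + ((a - b) ** 2).sum() / (c * h * w)
    return total
```

The per-layer normalization keeps deep, wide feature maps from dominating the sum.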
The training network of this embodiment also constrains the learning of the symmetric-missing-pixel filling subsystem through a perceptual symmetry loss, specifically as follows:
The perceptual symmetry loss is a symmetry loss on feature layers. Using a shared sub-network, the asymmetric-missing-pixel-filled partially occluded face image and its horizontally flipped image are input separately to obtain the l-th layer features Ω_l and Ω'_l of the reconstruction decoder, on which the perceptual symmetry loss is defined;
here C_l denotes the channel number of the features Ω_l and Ω'_l, Φ↓ denotes the output Φ of the optical flow network downsampled to the size of Ω_l and Ω'_l, and M̄↓ denotes the residual occlusion mask corresponding to the asymmetric-missing-pixel-filled partially occluded face image downsampled to the size of Ω_l and Ω'_l.
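The exact formula of this term is not reproduced in the text, so the following is only one plausible reading: a masked, channel-normalized squared distance between the two decoder features, with the mask downsampled to feature resolution. The name, the use of nearest-neighbour downsampling, and the weighting scheme are all assumptions:

```python
import torch
import torch.nn.functional as F

def perceptual_symmetry_loss(feat, feat_flip, mask):
    """Hypothetical rendering of the feature-level symmetry term: penalize the
    difference between decoder features Omega_l (image) and Omega'_l (its
    horizontal flip) inside the residual occlusion mask, downsampled to the
    feature size, with the sum normalized by the channel count C_l."""
    c = feat.shape[1]
    m = F.interpolate(mask, size=feat.shape[-2:], mode='nearest')  # M-bar to (H_l, W_l)
    return ((m * (feat - feat_flip)) ** 2).sum() / c
```

Restricting the penalty to the masked (still-occluded) region is what lets the loss enforce symmetry only where pixels had to be invented.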
The training network of this embodiment also constrains the learning of the symmetric-missing-pixel filling subsystem through a discrimination loss, specifically as follows:
The training network obtains the discrimination loss through discriminator networks;
The discriminator networks include a global discriminator network and a part discriminator network, the two having the same network structure;
The global discriminator network takes the partially occluded face image as its input and outputs the global discrimination loss;
The part discriminator network successively takes the left eye, right eye, nose and mouth regions of the partially occluded face image as its input, upsamples each to a uniform size, and successively outputs the discrimination losses of the left eye, right eye, nose and mouth regions;
The global discriminator network includes convolutional layers E1 to E5;
Convolutional layer E1 successively performs the 25th convolution operation and the 48th activation operation on the partially occluded face image;
Convolutional layer E2 successively performs the 26th convolution operation, a batch normalization operation and the 49th activation operation on the output of convolutional layer E1;
Convolutional layer E3 successively performs the 27th convolution operation, a batch normalization operation and the 50th activation operation on the output of convolutional layer E2;
Convolutional layer E4 successively performs the 28th convolution operation, a batch normalization operation and the 51st activation operation on the output of convolutional layer E3;
Convolutional layer E5 successively performs the 29th convolution operation and the 52nd activation operation on the output of convolutional layer E4;
The 25th convolution operation is a convolution with 64 4×4 kernels and stride 2;
The 26th convolution operation is a convolution with 128 4×4 kernels and stride 2;
The 27th convolution operation is a convolution with 256 4×4 kernels and stride 2;
The 28th convolution operation is a convolution with 512 4×4 kernels and stride 1;
The 29th convolution operation is a convolution with 1 4×4 kernel and stride 1;
The 48th to 51st activation operations all use the LReLU function, and the 52nd activation operation uses the Sigmoid function;
The cross-entropy loss between the T×T feature map output by convolutional layer E5 and a T×T map of all 0s or all 1s is the global discrimination loss;
The cross-entropy loss between the T×T feature map output by the part discriminator network and a T×T map of all 0s or all 1s is the part discrimination loss.
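The E1–E5 stack and its cross-entropy objective can be sketched as a PatchGAN-style discriminator in PyTorch; `d_loss` is an illustrative name for the 0/1-target cross-entropy described above, and the helper `conv_block` is my own shorthand:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride):
    """conv -> batch norm -> LReLU, the E2-E4 pattern."""
    return [nn.Conv2d(cin, cout, 4, stride, 1),
            nn.BatchNorm2d(cout), nn.LeakyReLU(0.2)]

class GlobalDiscriminator(nn.Module):
    """E1-E5 as described: 64/128/256 stride-2 convs, a 512 stride-1 conv,
    then a 1-channel stride-1 conv with Sigmoid, yielding a TxT real/fake map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),  # E1 (no batch norm)
            *conv_block(64, 128, 2),                        # E2
            *conv_block(128, 256, 2),                       # E3
            *conv_block(256, 512, 1),                       # E4
            nn.Conv2d(512, 1, 4, 1, 1), nn.Sigmoid(),       # E5
        )

    def forward(self, x):
        return self.net(x)

def d_loss(pred_real, pred_fake):
    """Cross-entropy of the TxT maps against all-ones (real) / all-zeros (fake)."""
    bce = nn.BCELoss()
    return (bce(pred_real, torch.ones_like(pred_real)) +
            bce(pred_fake, torch.zeros_like(pred_fake)))
```

For a 128×128 input this sketch yields a 14×14 output map, which plays the role of the T×T feature compared against the all-0/all-1 targets.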
The training network of this embodiment trains the asymmetric-missing-pixel filling subsystem and the symmetric-missing-pixel filling subsystem end-to-end using the Adam optimization algorithm.
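End-to-end training with a single Adam optimizer over both subsystems might look as follows. The stand-in modules, learning rate and betas are illustrative, and the combined objective (landmark, TV, illumination, reconstruction, symmetry and discrimination terms) is reduced here to the reconstruction term for brevity:

```python
import torch
import torch.nn as nn

def make_optimizer(*nets, lr=2e-4):
    """One Adam optimizer over the parameters of all given subsystems,
    so a single step is end-to-end. lr/betas are illustrative choices."""
    params = [p for n in nets for p in n.parameters()]
    return torch.optim.Adam(params, lr=lr, betas=(0.5, 0.999))

def train_step(flow_illum_net, fill_net, optimizer, occluded, target):
    """Hypothetical joint step: the asymmetric filling result feeds the
    symmetric filling subsystem, and one loss backpropagates through both."""
    optimizer.zero_grad()
    coarse = flow_illum_net(occluded)   # stand-in for flow warp + illumination
    filled = fill_net(coarse)           # stand-in for the symmetric filler
    loss = ((filled - target) ** 2).mean()
    loss.backward()
    optimizer.step()
    return float(loss)
```

Because both subsystems share one optimizer and one backward pass, gradients from the final reconstruction reach the flow and illumination networks as well.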
In the face image filling system of the symmetry-consistency-based convolutional neural network described in this embodiment, the training network constrains the learning of the symmetric-missing-pixel filling subsystem through the perceptual symmetry loss, which constrains the perceptual feature consistency between the left and right halves of the face; as a result, the missing-pixel-filled face image output by the symmetric-missing-pixel filling subsystem is symmetry-consistent.
For the filling of both real occluded face images and synthesized occluded face images, compared with existing face image filling systems based on convolutional neural networks, the face image filling system described in this embodiment achieves a better filling effect in generating face image details, preserving identity features and maintaining facial symmetry, owing to the introduction of the optical flow network, the illumination network and the perceptual symmetry loss.
Fig. 5 shows the filling effect of the face image filling system described in the embodiment on real occluded face images; the first row shows the real occluded face images and the second row shows the corresponding filling results.
Although the present invention is described herein with reference to specific embodiments, it should be understood that these embodiments are merely examples of the principles and applications of the present invention. It should therefore be understood that many modifications can be made to the exemplary embodiments, and that other arrangements can be devised, without departing from the spirit and scope of the present invention as defined by the appended claims. It should also be understood that different dependent claims and the features described herein can be combined in ways other than those described in the original claims, and that features described in connection with separate embodiments can be used in other embodiments.

Claims (10)

1. A face image filling system of a convolutional neural network based on symmetry consistency, for filling a partially occluded face image to obtain an unoccluded face image, characterized in that the missing pixels of the partially occluded face image include symmetric missing pixels and asymmetric missing pixels;
A symmetric missing pixel is a missing pixel whose axially symmetric counterpart with respect to the facial midline is also missing;
An asymmetric missing pixel is a missing pixel whose axially symmetric counterpart with respect to the facial midline is not missing;
The face image filling system includes an asymmetric-missing-pixel filling subsystem and a symmetric-missing-pixel filling subsystem, both implemented with convolutional neural networks;
The asymmetric-missing-pixel filling subsystem includes an optical flow network and an illumination network;
The optical flow network takes the partially occluded face image and the horizontally flipped image of the partially occluded face image as its input, takes the obtained optical flow vectors as the absolute coordinates for deforming the horizontally flipped image, and deforms the horizontally flipped image by bilinear interpolation to obtain a warped flip image;
The illumination network takes the partially occluded face image and the horizontally flipped image of the partially occluded face image as its input, and corrects the illumination of the warped flip image through an illumination correction coefficient; the warped flip image after illumination correction is the asymmetric-missing-pixel-filled partially occluded face image;
The symmetric-missing-pixel filling subsystem takes the asymmetric-missing-pixel-filled partially occluded face image and the residual occlusion mask corresponding to that image as its input, and outputs the missing-pixel-filled face image.
2. The face image filling system of the convolutional neural network based on symmetry consistency as claimed in claim 1, characterized in that the optical flow network includes an optical flow encoder and an optical flow decoder, the optical flow encoder including N1 convolutional layers and the optical flow decoder including N1 deconvolution layers; the illumination network includes an illumination encoder and an illumination decoder, the illumination encoder including N2 convolutional layers and the illumination decoder including N2 deconvolution layers; and the symmetric-missing-pixel filling subsystem includes a reconstruction encoder and a reconstruction decoder, the reconstruction encoder including N3 convolutional layers and the reconstruction decoder including N3 deconvolution layers;
N1, N2 and N3 are each greater than or equal to 2.
3. The face image filling system of the convolutional neural network based on symmetry consistency as claimed in claim 2, characterized in that N1 = 8;
The optical flow encoder includes convolutional layers C1 to C8;
Convolutional layer C1 successively performs the first convolution operation and the first activation operation on the concatenated features of the partially occluded face image and its horizontally flipped image;
Convolutional layer C2 successively performs the second convolution operation, a batch normalization operation and the second activation operation on the output of convolutional layer C1;
Convolutional layer C3 successively performs the third convolution operation, a batch normalization operation and the third activation operation on the output of convolutional layer C2;
Convolutional layer C4 successively performs the fourth convolution operation, a batch normalization operation and the fourth activation operation on the output of convolutional layer C3;
Convolutional layer C5 successively performs the fifth convolution operation, a batch normalization operation and the fifth activation operation on the output of convolutional layer C4;
Convolutional layer C6 successively performs the sixth convolution operation, a batch normalization operation and the sixth activation operation on the output of convolutional layer C5;
Convolutional layer C7 successively performs the seventh convolution operation, a batch normalization operation and the seventh activation operation on the output of convolutional layer C6;
Convolutional layer C8 successively performs the eighth convolution operation and the eighth activation operation on the output of convolutional layer C7;
The optical flow decoder includes deconvolution layers D1 to D8;
Deconvolution layer D1 successively performs the first deconvolution operation, a batch normalization operation and the ninth activation operation on the output of convolutional layer C8;
Deconvolution layer D2 successively performs the second deconvolution operation, a batch normalization operation and the tenth activation operation on the output of deconvolution layer D1;
Deconvolution layer D3 successively performs the third deconvolution operation, a batch normalization operation and the 11th activation operation on the output of deconvolution layer D2;
Deconvolution layer D4 successively performs the fourth deconvolution operation, a batch normalization operation and the 12th activation operation on the output of deconvolution layer D3;
Deconvolution layer D5 successively performs the fifth deconvolution operation, a batch normalization operation and the 13th activation operation on the output of deconvolution layer D4;
Deconvolution layer D6 successively performs the sixth deconvolution operation, a batch normalization operation and the 14th activation operation on the output of deconvolution layer D5;
Deconvolution layer D7 successively performs the seventh deconvolution operation, a batch normalization operation and the 15th activation operation on the output of deconvolution layer D6;
Deconvolution layer D8 successively performs the eighth deconvolution operation, the 16th activation operation and a bilinear interpolation operation on the output of deconvolution layer D7;
The output of deconvolution layer D8 is the warped flip image;
The first convolution operation is a convolution with 64 4×4 kernels and stride 2;
The second convolution operation is a convolution with 128 4×4 kernels and stride 2;
The third convolution operation is a convolution with 256 4×4 kernels and stride 2;
The fourth convolution operation is a convolution with 512 4×4 kernels and stride 2;
The fifth to eighth convolution operations are convolutions with 1024 4×4 kernels and stride 2;
The first to third deconvolution operations are deconvolutions with 1024 4×4 kernels and stride 2;
The fourth deconvolution operation is a deconvolution with 512 4×4 kernels and stride 2;
The fifth deconvolution operation is a deconvolution with 256 4×4 kernels and stride 2;
The sixth deconvolution operation is a deconvolution with 128 4×4 kernels and stride 2;
The seventh deconvolution operation is a deconvolution with 64 4×4 kernels and stride 2;
The eighth deconvolution operation is a deconvolution with 2 4×4 kernels and stride 2;
The first to seventh activation operations all use the LReLU function, the eighth to 15th activation operations all use the ReLU function, and the 16th activation operation uses the Tanh function.
4. The face image filling system of the convolutional neural network based on symmetry consistency as claimed in claim 3, characterized in that N2 = 8;
The illumination encoder includes convolutional layers C9 to C16;
Convolutional layer C9 successively performs the ninth convolution operation and the 17th activation operation on the concatenated features of the partially occluded face image and its horizontally flipped image;
Convolutional layer C10 successively performs the tenth convolution operation, a batch normalization operation and the 18th activation operation on the output of convolutional layer C9;
Convolutional layer C11 successively performs the 11th convolution operation, a batch normalization operation and the 19th activation operation on the output of convolutional layer C10;
Convolutional layer C12 successively performs the 12th convolution operation, a batch normalization operation and the 20th activation operation on the output of convolutional layer C11;
Convolutional layer C13 successively performs the 13th convolution operation, a batch normalization operation and the 21st activation operation on the output of convolutional layer C12;
Convolutional layer C14 successively performs the 14th convolution operation, a batch normalization operation and the 22nd activation operation on the output of convolutional layer C13;
Convolutional layer C15 successively performs the 15th convolution operation, a batch normalization operation and the 23rd activation operation on the output of convolutional layer C14;
Convolutional layer C16 successively performs the 16th convolution operation and the 24th activation operation on the output of convolutional layer C15;
The illumination decoder includes deconvolution layers D9 to D16;
Deconvolution layer D9 successively performs the ninth deconvolution operation, a batch normalization operation and the 25th activation operation on the output of convolutional layer C16;
Deconvolution layer D10 successively performs the tenth deconvolution operation, a batch normalization operation and the 26th activation operation on the output of deconvolution layer D9;
Deconvolution layer D11 successively performs the 11th deconvolution operation, a batch normalization operation and the 27th activation operation on the output of deconvolution layer D10;
Deconvolution layer D12 successively performs the 12th deconvolution operation, a batch normalization operation and the 28th activation operation on the output of deconvolution layer D11;
Deconvolution layer D13 successively performs the 13th deconvolution operation, a batch normalization operation and the 29th activation operation on the output of deconvolution layer D12;
Deconvolution layer D14 successively performs the 14th deconvolution operation, a batch normalization operation and the 30th activation operation on the output of deconvolution layer D13;
Deconvolution layer D15 successively performs the 15th deconvolution operation, a batch normalization operation and the 31st activation operation on the output of deconvolution layer D14;
Deconvolution layer D16 performs the 16th deconvolution operation on the output of deconvolution layer D15;
The output of deconvolution layer D16 is the illumination correction coefficient;
The ninth convolution operation is a convolution with 64 4×4 kernels and stride 2;
The tenth convolution operation is a convolution with 128 4×4 kernels and stride 2;
The 11th convolution operation is a convolution with 256 4×4 kernels and stride 2;
The 12th convolution operation is a convolution with 512 4×4 kernels and stride 2;
The 13th to 16th convolution operations are convolutions with 1024 4×4 kernels and stride 2;
The ninth to 11th deconvolution operations are deconvolutions with 1024 4×4 kernels and stride 2;
The 12th deconvolution operation is a deconvolution with 512 4×4 kernels and stride 2;
The 13th deconvolution operation is a deconvolution with 256 4×4 kernels and stride 2;
The 14th deconvolution operation is a deconvolution with 128 4×4 kernels and stride 2;
The 15th deconvolution operation is a deconvolution with 64 4×4 kernels and stride 2;
The 16th deconvolution operation is a deconvolution with 2 4×4 kernels and stride 2;
The 17th to 23rd activation operations all use the LReLU function, and the 24th to 31st activation operations all use the ReLU function.
5. The face image filling system of the convolutional neural network based on symmetry consistency as claimed in claim 4, characterized in that N3 = 8;
The reconstruction encoder includes convolutional layers C17 to C24;
Convolutional layer C17 successively performs the 17th convolution operation and the 32nd activation operation on the concatenated features of the asymmetric-missing-pixel-filled partially occluded face image and the residual occlusion mask corresponding to that image;
Convolutional layer C18 successively performs the 18th convolution operation, a batch normalization operation and the 33rd activation operation on the output of convolutional layer C17;
Convolutional layer C19 successively performs the 19th convolution operation, a batch normalization operation and the 34th activation operation on the output of convolutional layer C18;
Convolutional layer C20 successively performs the 20th convolution operation, a batch normalization operation and the 35th activation operation on the output of convolutional layer C19;
Convolutional layer C21 successively performs the 21st convolution operation, a batch normalization operation and the 36th activation operation on the output of convolutional layer C20;
Convolutional layer C22 successively performs the 22nd convolution operation, a batch normalization operation and the 37th activation operation on the output of convolutional layer C21;
Convolutional layer C23 successively performs the 23rd convolution operation, a batch normalization operation and the 38th activation operation on the output of convolutional layer C22;
Convolutional layer C24 successively performs the 24th convolution operation and the 39th activation operation on the output of convolutional layer C23;
The reconstruction decoder includes deconvolution layers D17 to D24;
Deconvolution layer D17 successively performs the 17th deconvolution operation, a batch normalization operation, the first dropout operation, the first feature concatenation operation and the 40th activation operation on the output of convolutional layer C24;
Deconvolution layer D18 successively performs the 18th deconvolution operation, a batch normalization operation, the second dropout operation, the second feature concatenation operation and the 41st activation operation on the output of deconvolution layer D17;
Deconvolution layer D19 successively performs the 19th deconvolution operation, a batch normalization operation, the third dropout operation, the third feature concatenation operation and the 42nd activation operation on the output of deconvolution layer D18;
Deconvolution layer D20 successively performs the 20th deconvolution operation, a batch normalization operation, the fourth feature concatenation operation and the 43rd activation operation on the output of deconvolution layer D19;
Deconvolution layer D21 successively performs the 21st deconvolution operation, a batch normalization operation, the fifth feature concatenation operation and the 44th activation operation on the output of deconvolution layer D20;
Deconvolution layer D22 successively performs the 22nd deconvolution operation, a batch normalization operation, the sixth feature concatenation operation and the 45th activation operation on the output of deconvolution layer D21;
Deconvolution layer D23 successively performs the 23rd deconvolution operation, a batch normalization operation, the seventh feature concatenation operation and the 46th activation operation on the output of deconvolution layer D22;
Deconvolution layer D24 successively performs the 24th deconvolution operation and the 47th activation operation on the output of deconvolution layer D23;
The output of deconvolution layer D24 is the missing-pixel-filled face image;
The 17th convolution operation is a convolution with 64 4×4 kernels and stride 2;
The 18th convolution operation is a convolution with 128 4×4 kernels and stride 2;
The 19th convolution operation is a convolution with 256 4×4 kernels and stride 2;
The 20th convolution operation is a convolution with 512 4×4 kernels and stride 2;
The 21st to 24th convolution operations are convolutions with 1024 4×4 kernels and stride 2;
The 17th to 19th deconvolution operations are deconvolutions with 1024 4×4 kernels and stride 2;
The 20th deconvolution operation is a deconvolution with 512 4×4 kernels and stride 2;
The 21st deconvolution operation is a deconvolution with 256 4×4 kernels and stride 2;
The 22nd deconvolution operation is a deconvolution with 128 4×4 kernels and stride 2;
The 23rd deconvolution operation is a deconvolution with 64 4×4 kernels and stride 2;
The 24th deconvolution operation is a deconvolution with 3 4×4 kernels and stride 2;
The 32nd to 38th activation operations and the 40th to 46th activation operations all use the LReLU function, the 39th activation operation uses the ReLU function, and the 47th activation operation uses the Sigmoid function;
The first to third dropout operations all use the DropOut function;
The first feature concatenation operation concatenates the output of convolutional layer C23 with the output of the first dropout operation;
The second feature concatenation operation concatenates the output of convolutional layer C22 with the output of the second dropout operation;
The third feature concatenation operation concatenates the output of convolutional layer C21 with the output of the third dropout operation;
The fourth feature concatenation operation concatenates the output of convolutional layer C20 with the output of the batch normalization operation of deconvolution layer D20;
The fifth feature concatenation operation concatenates the output of convolutional layer C19 with the output of the batch normalization operation of deconvolution layer D21;
The sixth feature concatenation operation concatenates the output of convolutional layer C18 with the output of the batch normalization operation of deconvolution layer D22;
The seventh feature concatenation operation concatenates the output of convolutional layer C17 with the output of the batch normalization operation of deconvolution layer D23.
6. The face image filling system of the convolutional neural network based on symmetry consistency as claimed in claim 5, characterized in that the face image filling system further includes a training network;
The training network constrains the learning of the optical flow network through a face landmark loss and a total variation regularization term, specifically as follows:
A face landmark detection algorithm is used to detect the L face landmarks (x_1, y_1), …, (x_L, y_L) in the unoccluded face image g corresponding to the partially occluded face image, and, by horizontally flipping the L face landmarks of g, the L face landmarks (x'_1, y'_1), …, (x'_L, y'_L) in the horizontally flipped image g' of the unoccluded face image g;
x_i is the x-axis coordinate of the i-th face landmark in the unoccluded face image g, and y_i is its y-axis coordinate; x'_j is the x-axis coordinate of the j-th face landmark in the horizontally flipped image g', and y'_j is its y-axis coordinate;
To align the horizontally flipped image g' with the unoccluded face image g, the optical flow is expected to map each landmark of g' onto the corresponding landmark of g;
The x-axis and y-axis coordinates are normalized to [-1, 1];
The face landmark loss l_lm is defined as:
l_lm = Σ_{i=1..L} [(Φx(i) − x_i)² + (Φy(i) − y_i)²],
where Φx(i) is the x-axis coordinate value of the optical flow vector at the i-th face landmark in the horizontally flipped image g', and Φy(i) is the y-axis coordinate value of the optical flow vector at the i-th face landmark in the horizontally flipped image g';
Based on the absolute coordinates of the optical flow vectors Φ obtained by the optical flow network, the total variation regularization term l_TV is defined as:
l_TV = ‖∇x Φx‖² + ‖∇y Φx‖² + ‖∇x Φy‖² + ‖∇y Φy‖²,
where ∇x is the gradient along the x-axis direction, ∇y is the gradient along the y-axis direction, Φx is the x-axis coordinate value of the optical flow vectors, and Φy is the y-axis coordinate value of the optical flow vectors;
The training network jointly constrains the generation of the optical flow vectors through the face landmark loss and the total variation regularization term;
The training network constrains the learning of the illumination network through an illumination consistency loss, specifically as follows:
From the optical flow vectors Φ output by the optical flow network, the warped flip image I_w corresponding to the partially occluded face image is obtained by bilinear sampling, where N is the set of the four pixel positions adjacent to the sampling location and I_o' is the horizontally flipped image of the partially occluded face image;
Similarly, the warped flip image I_w' corresponding to the unoccluded face image g can be obtained;
The illumination consistency loss L_l is defined as:
L_l = ‖I_w' ⊙ R − g‖²,
where R is the illumination correction coefficient.
7. The face image filling system of the convolutional neural network based on symmetry consistency as claimed in claim 6, characterized in that the training network constrains the learning of the symmetric-missing-pixel filling subsystem through a reconstruction loss, specifically as follows:
The reconstruction loss includes a Euclidean distance loss between the missing-pixel-filled face image and the corresponding unoccluded face image g, and a face recognition network feature loss;
The Euclidean distance loss between the missing-pixel-filled face image ĝ and the corresponding unoccluded face image g is defined as:
l_r = ‖ĝ − g‖²,
where ĝ is the missing-pixel-filled face image;
The face recognition network feature loss is defined as:
l_f = Σ_l (1 / (C_l · H_l · W_l)) · ‖ψ_l(ĝ) − ψ_l(g)‖²,
where ψ_l(ĝ) is the l-th layer convolution feature obtained by a pre-trained VGG-Face network when the missing-pixel-filled face image ĝ is input, ψ_l(g) is the l-th layer convolution feature obtained by the pre-trained VGG-Face network when the unoccluded face image g is input, and C_l, H_l and W_l are respectively the channel number, height and width of the l-th layer convolution feature.
8. The facial image filling system based on a symmetric-consistency convolutional neural network according to claim 7, wherein the training network further constrains the learning of the symmetric missing-pixel filling subsystem through a perceptual symmetry loss, specifically:
The perceptual symmetry loss is a symmetry loss imposed at the feature level. Using shared sub-networks, the horizontally flipped image of the asymmetrically occluded partially occluded facial image and the image after missing-pixel filling are fed in separately, yielding the l-th layer reconstruction features Ωl and Ω'l in the decoder; the perceptual symmetry loss is defined as:
where Cl denotes the number of channels of the feature Ωl or Ω'l, Φ↓ denotes the output Φ of the optical-flow network downsampled to the size of Ωl or Ω'l, and the downsampled mask denotes the residual occlusion mask corresponding to the partially occluded facial image after asymmetric missing-pixel filling, likewise downsampled to the size of Ωl or Ω'l.
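A rough sketch of this term: a squared difference between the decoder features Ωl and Ω'l, restricted to positions still covered by the downsampled residual occlusion mask. The normalisation used here (1 / (Cl · |mask|)) and the omission of the downsampled flow field Φ↓ are simplifying assumptions; the claim only names the ingredients.

```python
import numpy as np

def perceptual_symmetry_loss(omega, omega_p, mask_ds):
    # omega, omega_p: decoder features of shape (C, H, W);
    # mask_ds: residual occlusion mask of shape (H, W), broadcast
    # over the channel dimension.
    C_l = omega.shape[0]
    diff = (omega - omega_p) ** 2
    masked = diff * mask_ds
    return float(masked.sum() / (C_l * max(mask_ds.sum(), 1.0)))
```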
9. The facial image filling system based on a symmetric-consistency convolutional neural network according to claim 8, wherein the training network further constrains the learning of the symmetric missing-pixel filling subsystem through discrimination losses, specifically:
The training network obtains the discrimination losses through discrimination networks;
The discrimination networks comprise a global discrimination network and a part discrimination network, the two having the same network structure;
The global discrimination network takes the partially occluded facial image as its input and outputs the global discrimination loss;
The part discrimination network takes in turn the left-eye, right-eye, nose and mouth regions of the partially occluded facial image as its input, upsamples each to a uniform size, and sequentially outputs the left-eye, right-eye, nose and mouth discrimination losses;
The global discrimination network comprises convolutional layers E1 to E5;
Convolutional layer E1 successively applies the 25th convolution operation and the 48th activation operation to the partially occluded facial image;
Convolutional layer E2 successively applies the 26th convolution operation, a batch normalization operation and the 49th activation operation to the output of convolutional layer E1;
Convolutional layer E3 successively applies the 27th convolution operation, a batch normalization operation and the 50th activation operation to the output of convolutional layer E2;
Convolutional layer E4 successively applies the 28th convolution operation, a batch normalization operation and the 51st activation operation to the output of convolutional layer E3;
Convolutional layer E5 successively applies the 29th convolution operation and the 52nd activation operation to the output of convolutional layer E4;
The 25th convolution operation is a convolution with 64 kernels of size 4*4 and stride 2;
The 26th convolution operation is a convolution with 128 kernels of size 4*4 and stride 2;
The 27th convolution operation is a convolution with 256 kernels of size 4*4 and stride 2;
The 28th convolution operation is a convolution with 512 kernels of size 4*4 and stride 1;
The 29th convolution operation is a convolution with 1 kernel of size 4*4 and stride 1;
The 48th to 51st activation operations all use the LReLU function, and the 52nd activation operation uses the Sigmoid function;
The cross-entropy loss between the T×T feature map output by convolutional layer E5 and a T×T map of 0s or 1s is the global discrimination loss;
The cross-entropy loss between the T×T feature map output by the part discrimination network and a T×T map of 0s or 1s is the part discrimination loss.
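The T of the final T×T map follows from the standard convolution output-size formula applied through E1 to E5. Neither the input resolution nor the padding is stated in the claim; the 128×128 input and padding of 1 below are assumptions used only to illustrate the arithmetic.

```python
def conv_out(n, k=4, s=2, p=1):
    # Output spatial size of a k×k convolution with stride s, padding p.
    return (n + 2 * p - k) // s + 1

# Global discriminator E1-E5: 4*4 kernels with strides 2, 2, 2, 1, 1.
n = 128  # assumed input resolution
for stride in (2, 2, 2, 1, 1):
    n = conv_out(n, s=stride)
# n is now the T of the final T×T feature map under these assumptions
```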
10. The facial image filling system based on a symmetric-consistency convolutional neural network according to claim 9, wherein the training network performs end-to-end training of the asymmetric missing-pixel filling subsystem and the symmetric missing-pixel filling subsystem using the Adam optimization algorithm.
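For reference, one Adam update step sketched in NumPy. Claim 10 only states that Adam trains both filling subsystems end to end; the hyperparameters below are the common published defaults, not values from the patent.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and squared gradient.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In end-to-end training, the reconstruction, perceptual symmetry, illumination-consistency and discrimination losses described in the preceding claims would be combined and back-propagated through both subsystems before each such parameter update.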
CN201811549357.XA 2018-12-18 2018-12-18 Face image filling system based on symmetric consistency convolutional neural network Active CN109684973B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811549357.XA CN109684973B (en) 2018-12-18 2018-12-18 Face image filling system based on symmetric consistency convolutional neural network

Publications (2)

Publication Number Publication Date
CN109684973A true CN109684973A (en) 2019-04-26
CN109684973B CN109684973B (en) 2023-04-07

Family

ID=66186790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811549357.XA Active CN109684973B (en) 2018-12-18 2018-12-18 Face image filling system based on symmetric consistency convolutional neural network

Country Status (1)

Country Link
CN (1) CN109684973B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910247A (en) * 2017-03-20 2017-06-30 厦门幻世网络科技有限公司 Method and apparatus for generating three-dimensional head portrait model
CA2987846A1 (en) * 2016-12-07 2018-06-07 Idemia Identity & Security France Image processing system
CN108334816A (en) * 2018-01-15 2018-07-27 桂林电子科技大学 The Pose-varied face recognition method of network is fought based on profile symmetry constraint production
CN108520503A (en) * 2018-04-13 2018-09-11 湘潭大学 A method of based on self-encoding encoder and generating confrontation network restoration face Incomplete image
CN108932693A (en) * 2018-06-15 2018-12-04 中国科学院自动化研究所 Face editor complementing method and device based on face geological information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANKANG DENG等: "UV-GAN: Adversarial Facial UV Map Completion for Pose-Invariant Face Recognition", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
SATOSHI IIZUKA等: "Globally and locally consistent image completion", 《ACM TRANSACTIONS ON GRAPHICS》 *
XIE PENGCHENG: "Research and Implementation of Multi-pose Face Recognition", 《China Master's Theses Full-text Database (Information Science and Technology)》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488811A (en) * 2020-03-31 2020-08-04 长沙千视通智能科技有限公司 Face recognition method and device, terminal equipment and computer readable medium
CN111488811B (en) * 2020-03-31 2023-08-22 长沙千视通智能科技有限公司 Face recognition method, device, terminal equipment and computer readable medium
CN113569598A (en) * 2020-04-29 2021-10-29 华为技术有限公司 Image processing method and image processing apparatus
WO2021218238A1 (en) * 2020-04-29 2021-11-04 华为技术有限公司 Image processing method and image processing apparatus
CN113989846A (en) * 2021-10-29 2022-01-28 北京百度网讯科技有限公司 Method for detecting key points in image and method for training key point detection model
CN114792295A (en) * 2022-06-23 2022-07-26 深圳憨厚科技有限公司 Method, device, equipment and medium for correcting blocked object based on intelligent photo frame
CN114792295B (en) * 2022-06-23 2022-11-04 深圳憨厚科技有限公司 Method, device, equipment and medium for correcting blocked object based on intelligent photo frame

Also Published As

Publication number Publication date
CN109684973B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN109684973A (en) The facial image fill system of convolutional neural networks based on symmetrical consistency
Song et al. Constructing stronger and faster baselines for skeleton-based action recognition
Gao et al. Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection
CN108537754 Face image restoration system based on deformation guidance maps
CN104933755B Static object reconstruction method and system
CN112465955B (en) Dynamic human body three-dimensional reconstruction and visual angle synthesis method
CN108734194B (en) Virtual reality-oriented single-depth-map-based human body joint point identification method
CN112819947A (en) Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN108932536A Face pose reconstruction method based on deep neural network
CN110427799A Human hand depth image data augmentation method based on generative adversarial network
CN110633628B (en) RGB image scene three-dimensional model reconstruction method based on artificial neural network
CN113033570A (en) Image semantic segmentation method for improving fusion of void volume and multilevel characteristic information
CN112598775B (en) Multi-view generation method based on contrast learning
CN109191366B (en) Multi-view human body image synthesis method and device based on human body posture
CN111914618B (en) Three-dimensional human body posture estimation method based on countermeasure type relative depth constraint network
CN107767357A (en) A kind of depth image super-resolution method based on multi-direction dictionary
CN107194380A Deep convolutional network and learning method for face recognition in complex scenes
CN112861659A (en) Image model training method and device, electronic equipment and storage medium
CN114550308B (en) Human skeleton action recognition method based on space-time diagram
CN116645328A (en) Intelligent detection method for surface defects of high-precision bearing ring
CN113192186B (en) 3D human body posture estimation model establishing method based on single-frame image and application thereof
CN108629781A Hair rendering method
CN114758205A (en) Multi-view feature fusion method and system for 3D human body posture estimation
CN114998520A (en) Three-dimensional interactive hand reconstruction method and system based on implicit expression
CN109658326A (en) A kind of image display method and apparatus, computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant