Disclosure of Invention
In view of the foregoing, it is desirable to provide a multi-angle side face rectification method, apparatus, computer device and storage medium capable of improving the effect of multi-angle side face rectification.
A method of multi-angle side face rectification, the method comprising:
acquiring a training data set for face correction, and preprocessing a multi-angle side face image in the training data set to obtain a side local image and a zoomed side global image;
inputting the side local image and the side global image into a generator network; the generator network comprises a local network and a global network; the local network comprises a first local network and a second local network; the first local network is used for correcting and restoring the side local image to obtain a front local image; the second local network adds an attention weighting adjustment channel after each convolution layer and extracts edge contour information from the side local image; the global network is used for obtaining global detail information according to the side global image and obtaining a restored and corrected global image according to the front local image, the edge contour information and the global detail information;
inputting the restored and corrected global image and a corresponding real global image in the training data set into a discriminator network, and respectively obtaining a prediction probability map of the restored and corrected global image and a prediction probability map of the real global image through the discriminator network;
training the generator network and the discriminator network until convergence according to a preset loss function and the prediction probability maps of the restored and corrected global image and the real global image, to obtain a trained generator network;
and inputting the multi-angle side face image to be corrected into the trained generator network to obtain a multi-angle side face correction image.
In one embodiment, the method further comprises the following steps: acquiring a training data set for face correction;
estimating feature points of the multi-angle side face images in the training data set through a trained deep learning network model;
calculating to obtain the rotation angle of the human face according to the position coordinates of the feature points of the left eye, the right eye and the mouth;
rotating the face to a horizontal position by taking the eyes as a reference according to the rotation angle;
according to the positions of the feature points of the eyes, the nose and the mouth on the rotated image, cutting to obtain a side local image;
and zooming the multi-angle side face image according to a preset size to obtain a zoomed side global image.
In one embodiment, the method further comprises the following steps: extracting side face data according to the positions of the feature points; the side face data comprises side eye data, side nose data, side mouth data, and side contour data;
performing pixel comparison on the side face data and front face data in a training data set through the first local network, and correcting and restoring the side face local image in a network self-learning mode to obtain a front face local image; the front partial view includes a front eye view, a front nose view and a front mouth view.
In one embodiment, the method further comprises the following steps: inputting the side contour data into the second local network;
learning edge contour information from the contour pixels through a self-attention mechanism after each convolution layer, via the attention weighting adjustment channels in the second local network, and outputting the edge contour information.
In one embodiment, the method further comprises the following steps: inserting the front eye image, the front nose image and the front mouth image in the front local image into their original local positions according to a fixed proportion to obtain a local splicing map;
and fusing the edge contour information and the global detail information according to the local splicing map to obtain a restored and corrected global image.
In one embodiment, the method further comprises the following steps: the loss function of the global network comprises an adversarial loss, a synthesis loss and an identity preservation loss.
In one embodiment, the method further comprises the following steps: the prediction probability map output by the discriminator network is a 4 × 4 probability map.
A multi-angle side face rectification device, the device comprising:
a preprocessing module, used for acquiring a training data set for face correction and preprocessing a multi-angle side face image in the training data set to obtain a side local image and a zoomed side global image;
a generator network module, used for inputting the side local image and the side global image into a generator network; the generator network comprises a local network and a global network; the local network comprises a first local network and a second local network; the first local network is used for correcting and restoring the side local image to obtain a front local image; the second local network adds an attention weighting adjustment channel after each convolution layer and extracts edge contour information from the side local image; the global network is used for obtaining global detail information according to the side global image and obtaining a restored and corrected global image according to the front local image, the edge contour information and the global detail information;
a discriminator network module, used for inputting the restored and corrected global image and the corresponding real global image in the training data set into a discriminator network, and respectively obtaining the prediction probability maps of the restored and corrected global image and the real global image through the discriminator network;
a network training module, used for training the generator network and the discriminator network until convergence according to a preset loss function and the prediction probability maps of the restored and corrected global image and the real global image, to obtain a trained generator network;
and a network application module, used for inputting the multi-angle side face image to be corrected into the trained generator network to obtain a multi-angle side face correction image.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a training data set for face correction, and preprocessing a multi-angle side face image in the training data set to obtain a side local image and a zoomed side global image;
inputting the side local image and the side global image into a generator network; the generator network comprises a local network and a global network; the local network comprises a first local network and a second local network; the first local network is used for correcting and restoring the side local image to obtain a front local image; the second local network adds an attention weighting adjustment channel after each convolution layer and extracts edge contour information from the side local image; the global network is used for obtaining global detail information according to the side global image and obtaining a restored and corrected global image according to the front local image, the edge contour information and the global detail information;
inputting the restored and corrected global image and a corresponding real global image in the training data set into a discriminator network, and respectively obtaining a prediction probability map of the restored and corrected global image and a prediction probability map of the real global image through the discriminator network;
training the generator network and the discriminator network until convergence according to a preset loss function and the prediction probability maps of the restored and corrected global image and the real global image, to obtain a trained generator network;
and inputting the multi-angle side face image to be corrected into the trained generator network to obtain a multi-angle side face correction image.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a training data set for face correction, and preprocessing a multi-angle side face image in the training data set to obtain a side local image and a zoomed side global image;
inputting the side local image and the side global image into a generator network; the generator network comprises a local network and a global network; the local network comprises a first local network and a second local network; the first local network is used for correcting and restoring the side local image to obtain a front local image; the second local network adds an attention weighting adjustment channel after each convolution layer and extracts edge contour information from the side local image; the global network is used for obtaining global detail information according to the side global image and obtaining a restored and corrected global image according to the front local image, the edge contour information and the global detail information;
inputting the restored and corrected global image and a corresponding real global image in the training data set into a discriminator network, and respectively obtaining a prediction probability map of the restored and corrected global image and a prediction probability map of the real global image through the discriminator network;
training the generator network and the discriminator network until convergence according to a preset loss function and the prediction probability maps of the restored and corrected global image and the real global image, to obtain a trained generator network;
and inputting the multi-angle side face image to be corrected into the trained generator network to obtain a multi-angle side face correction image.
According to the multi-angle side face correction method, apparatus, computer device and storage medium, a public data set is used as the training set. A multi-angle side face image is preprocessed to obtain a side face local image and a zoomed side face global image, which are input into a generator network comprising a local network and a global network. The side face local image is corrected into a front face local image by the local network; edge contour information is extracted by an added contour network containing an attention mechanism; global detail information is obtained by the global network; and the data of the local and global networks are fused to obtain the corrected restored image output by the generator. Network training is carried out on the training set together with a discriminator, and the trained generator network is used to correct multi-angle side faces. The invention extracts contour information through a network containing a contour-pixel attention mechanism, which effectively preserves the details of contour-salient regions, helps the generator produce salient features and restrains the loss of detail, so that faces at multiple angles can be effectively corrected, helping to identify persons accurately.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a multi-angle side face rectification method is provided, which includes the following steps:
Step 102, acquiring a training data set for face correction, and preprocessing a multi-angle side face image in the training data set to obtain a side local image and a zoomed side global image.
As shown in fig. 2(b), the rotation angle of the face is first calculated using the angle between the eye-to-mouth vectors (vectors a and b); when there is a large face rotation angle with self-occlusion, the missing face information is filled in according to the symmetry of the parts of the face. As shown in fig. 2(c), after the positions of the two eyes are determined, the angle is obtained according to the following formulas:
a = (x_leye, y_leye) − (x_mouth, y_mouth)
b = (x_reye, y_reye) − (x_mouth, y_mouth)
φ = arccos( (a · b) / (|a| |b|) )
wherein a and b are the vectors from the mouth feature point to the left-eye and right-eye feature points respectively; φ is the angle between the two vectors, i.e. the angle between the mouth-to-left-eye and mouth-to-right-eye directions, from which the rotation angle of the face can be roughly calculated; θ represents the angle by which the eye line deviates from the horizontal position; and (x_leye, y_leye) and (x_reye, y_reye) are the pixel coordinates of the left-eye feature point and the right-eye feature point respectively.
After the angle is calculated, the person image is rotated to a horizontal position with the two eyes as the reference, and finally the face part pictures are cropped according to the positions of the feature points of each part in fig. 2(d).
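As an illustrative sketch of this angle computation (the helper name and the (x, y) pixel-coordinate convention are assumptions, not from the original):

```python
import math

def face_angles(leye, reye, mouth):
    # Vectors from the mouth feature point to the left and right eye
    # (the vectors a and b in the formulas above).
    a = (leye[0] - mouth[0], leye[1] - mouth[1])
    b = (reye[0] - mouth[0], reye[1] - mouth[1])
    # phi: angle between a and b, a rough measure of the face rotation.
    dot = a[0] * b[0] + a[1] * b[1]
    na, nb = math.hypot(*a), math.hypot(*b)
    phi = math.degrees(math.acos(dot / (na * nb)))
    # theta: deviation of the eye line from the horizontal position,
    # used to rotate the face back so the eyes are level.
    theta = math.degrees(math.atan2(reye[1] - leye[1], reye[0] - leye[0]))
    return phi, theta
```

After `theta` is obtained, the image is rotated by `-theta` about the eye midpoint so that the eyes land on a horizontal line.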
Fig. 3 shows the side local image, extracted using a preset threshold value, and the resized side global image, based on the feature points shown in fig. 2(b).
Step 104, inputting the side local image and the side global image into a generator network.
The generator network includes a local network and a global network. The local network comprises a first local network and a second local network. The first local network is used for correcting and restoring the side local image to obtain a front local image. The second local network adds an attention weighting adjustment channel after each convolution layer and extracts edge contour information from the side local image. The global network is used for obtaining global detail information according to the side global image and obtaining a restored and corrected global image according to the front local image, the edge contour information and the global detail information.
The first local network comprises a plurality of learning networks for the left eye, the right eye, the mouth and the nose. The corresponding local images are extracted respectively, local detail features are learned through these local networks, and the side images are restored to front images by networks constrained with the loss function. The model of the first local network is shown in fig. 4, with the loss function:
L_pixel = (1 / (W·H)) · Σ_{x=1}^{W} Σ_{y=1}^{H} | G_θ(I_f)^{x,y} − I_gt^{x,y} |
wherein L_pixel is the similar-pixel loss value; W is the width of the processed image and H is its height; I_f is the side image; θ in G_θ(·) is the parameter learned in the generator G; G_θ(I_f)^{x,y} is the pixel value with coordinates (x, y) in the front image generated from the side image; and I_gt^{x,y} is the pixel value with coordinates (x, y) in the real image.
The network is constrained with the similar-pixel loss function. An l1-norm similarity constraint is adopted instead of a cosine constraint because the generated image is a 3-channel image, and whether it agrees with the reference image can only be judged by pixel distance. The output of the first local network is shown in fig. 5.
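A minimal numpy sketch of this similar-pixel (l1) loss, averaging the absolute pixel difference over the image; the array shapes are illustrative:

```python
import numpy as np

def pixel_loss(generated, real):
    # Mean absolute (l1) pixel difference between the generated front
    # image and the real image, averaged over all W*H*C values.
    return np.abs(generated - real).mean()
```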
The second local network is a contour pixel learning network that learns the overall details of the face contour by applying self-attention to the contour pixels. The structure of the second local network is shown in fig. 6: through down-sampling and up-sampling convolutions, the network adds an attention weighting channel after each convolution layer, which improves the attention of the convolutional network. The heat maps of each layer's convolution output, i.e. the attended regions, are shown in fig. 7. The network mainly learns the edge contour regions. For example, fig. 8(c) is the output of the second local network, in which the details of the complete face contour are fully restored after the original image features are learned through the intermediate structure. Without the multi-space and multi-scale attention, as shown in fig. 8(b), only a few local detail features of the face are visible and the features cannot be truly restored.
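The attention weighting adjustment channel can be illustrated as a squeeze-style channel reweighting applied after a convolution; this numpy sketch is a hedged illustration, not the patent's exact layer, and the weight shape is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w):
    # feat: (C, H, W) feature map output by a convolution layer.
    # w:    (C, C) learned weights of the attention channel (assumed shape).
    pooled = feat.mean(axis=(1, 2))       # squeeze: per-channel average
    scores = sigmoid(w @ pooled)          # per-channel attention weights
    return feat * scores[:, None, None]   # reweight each channel
```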
The global network takes the side global image as input: the side global image is first extracted, global detail features are learned through the global network, the network is constrained by the loss function, and finally the local images are fused to restore and correct the global image. The model of the global network is shown in fig. 9.
The similarity calculation for the global path comprises an adversarial loss function, a synthesis loss function and an identity preservation loss function, and the network is constrained by the sum of these losses.
The adversarial loss involves the generator G and the discriminator D: the generator generates face images to deceive the discriminator, while the discriminator distinguishes the real images from the synthesized face images. The loss is calculated as follows:
L_adv = (1/N) · Σ_{n=1}^{N} log( 1 − D_θ( G_θ( I_f^n ) ) )
wherein L_adv is the adversarial loss value; N is the number of pictures in the batch sent into the network; D_θ(·) denotes discrimination using the learned discriminator parameters θ; G_θ(·) denotes generation using the learned generator parameters θ; and I_f^n is the n-th side image.
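To make the adversarial term concrete, a minimal numpy sketch of a standard generator-side adversarial loss follows; it assumes the discriminator outputs probabilities in (0, 1), and the function name is illustrative:

```python
import numpy as np

def adversarial_loss(d_on_generated):
    # d_on_generated: the discriminator's probability that each of the
    # N generated images in the batch is real, i.e. D(G(I_f^n)).
    d = np.asarray(d_on_generated)
    # Averaged log(1 - D(G(x))): minimized as the generator improves.
    return np.mean(np.log(1.0 - d))
```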
Symmetric similarity is an inherent feature of the human face. Using this domain knowledge as a prior and applying a symmetry constraint to the synthesized image can effectively alleviate the self-occlusion problem, greatly improving synthesis performance under large angles. The formula is as follows:
L_sym = (1 / ((W/2)·H)) · Σ_{x=1}^{W/2} Σ_{y=1}^{H} | G_θ(I_f)^{x,y} − G_θ(I_f)^{W−(x−1), y} |
wherein L_sym is the synthesis (symmetry) loss value; θ in G_θ(·) is the parameter learned in the generator G; G_θ(I_f)^{x,y} is the pixel value with coordinates (x, y) in the generated image; and W−(x−1) is the abscissa symmetric to position x in the image. The sum runs only to W/2 because the human face is approximately symmetric, so only half of the width needs to be calculated.
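A minimal numpy sketch of the symmetry constraint, comparing each pixel in the left half of the generated image with its horizontally mirrored counterpart (0-based column indices are used here):

```python
import numpy as np

def symmetry_loss(img):
    # img: (H, W) generated image. Column x is compared with its mirror
    # column; only half of the width is needed because the face is
    # assumed approximately left-right symmetric.
    h, w = img.shape[:2]
    left = img[:, : w // 2]
    right_mirrored = img[:, ::-1][:, : w // 2]
    return np.abs(left - right_mirrored).mean()
```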
The identity preservation loss is:
L_ip = Σ_{x=1}^{W} Σ_{y=1}^{H} | L_e(I_pred)^{x,y} − L_e(G_θ(I_f))^{x,y} | + (1/N) · Σ_{i=1}^{N} | L_v(I_pred)_i − L_v(G_θ(I_f))_i |
wherein L_ip is the identity preservation loss value; L_e(·) is the feature map of the extracted image and L_v(·) is the feature vector of the extracted image; L_e(I_pred)^{x,y} is the feature value at position (x, y) of the real-image feature map, and L_e(G_θ(I_f))^{x,y} is the feature value at position (x, y) of the generated-image feature map; N is the length of the feature vector; L_v(I_pred)_i is the i-th value of the extracted real-image vector, and L_v(G_θ(I_f))_i is the i-th value of the extracted generated-image vector.
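A hedged sketch of the identity preservation term: the feature extractors L_e and L_v are passed in as plain functions, since their actual implementations (e.g. a face-recognition backbone) are not specified here:

```python
import numpy as np

def identity_loss(real_img, gen_img, feat_map, feat_vec):
    # feat_map: L_e(.), returns a 2-D feature map of an image.
    # feat_vec: L_v(.), returns a 1-D feature vector of length N.
    map_term = np.abs(feat_map(real_img) - feat_map(gen_img)).sum()
    v_real, v_gen = feat_vec(real_img), feat_vec(gen_img)
    vec_term = np.abs(v_real - v_gen).mean()   # (1/N) sum over i
    return map_term + vec_term
```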
The output of the global network, fused with the local and global feature outputs, is shown in fig. 10.
Step 106, inputting the restored and corrected global image and the corresponding real global image in the training data set into a discriminator network, and respectively obtaining the prediction probability maps of the restored and corrected global image and the real global image through the discriminator network.
The role of the discriminator is to judge the authenticity of the generated image against the real image, thereby penalizing the generator and pushing its output closer to the real image. The structure of the discriminator is shown in fig. 11. The discriminator uses multiple convolution layers, takes real and synthesized images as input, and finally outputs a 4 × 4 probability map. In a traditional generative adversarial network the discriminator outputs only a scalar value representing how real the generated image is; by contrast, in this model each of the 4 × 4 probability values corresponds to a local receptive field in the image, so unrelated positions of the face can be discriminated separately, which improves the discriminator's handling of local details.
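The 4 × 4 probability map follows naturally from stacking strided convolutions; for example, assuming stride-2 layers that divide the spatial size evenly, a 64 × 64 input reaches 4 × 4 after four layers. A small arithmetic sketch:

```python
def patch_map_size(input_size, num_layers, stride=2):
    # Spatial size of the discriminator output after num_layers
    # stride-`stride` convolutions, assuming padding is chosen so
    # each layer divides the size evenly.
    size = input_size
    for _ in range(num_layers):
        size //= stride
    return size
```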
The discriminator loss function is a cross-entropy loss function as shown in the following equation:
min_G max_D V(D, G) = E_{x∼P_data}[ log D(x) ] + E_{z∼P_z}[ log( 1 − D(G(z)) ) ]
wherein P_data is the real data; D(x) represents the probability that the discriminator judges the real data to be real; G(z) represents the front image generated by the generator from z, and D(G(z)) the probability with which the discriminator judges the generated image to be real or false, where P_z represents the side images and random noise; max_D means the generator is fixed, so that the discriminator distinguishes real data from false data as well as possible; and min_G means the discriminator is fixed, so that the generator generates data realistic enough that the discriminator cannot recognize it as false.
All loss calculations are made at the output of the discriminator D, which is a 4 × 4 map of real/false probabilities; a cross-entropy loss is therefore used to constrain the network.
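As an illustration, the cross-entropy constraint over the 4 × 4 probability map can be sketched as a per-patch binary cross-entropy averaged over the map; this is a hedged sketch, not the exact training code:

```python
import numpy as np

def patch_bce(prob_map, is_real):
    # prob_map: 4x4 map of probabilities that each local receptive
    # field is real; is_real: label for the whole input image.
    p = np.asarray(prob_map)
    eps = 1e-12  # numerical guard against log(0)
    if is_real:
        return -np.mean(np.log(p + eps))
    return -np.mean(np.log(1.0 - p + eps))
```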
Step 108, training the generator network and the discriminator network until convergence according to the preset loss function and the prediction probability maps of the restored and corrected global image and the real global image, to obtain the trained generator network.
Step 110, inputting the multi-angle side face image to be corrected into the trained generator network to obtain the multi-angle side face correction image.
According to the multi-angle side face correction method, a public data set is used as the training set. A multi-angle side face image is preprocessed to obtain a side face local image and a zoomed side face global image, which are input into a generator network comprising a local network and a global network. The side face local image is corrected into a front face local image by the local network; edge contour information is extracted by an added contour network containing an attention mechanism; global detail information is obtained by the global network; and the data of the local and global networks are fused to obtain the corrected restored image output by the generator. Network training is carried out on the training set together with a discriminator, and the trained generator network is used to correct multi-angle side faces. The invention extracts contour information through a network containing a contour-pixel attention mechanism, which effectively preserves the details of contour-salient regions, helps the generator produce salient features and restrains the loss of detail, so that faces at multiple angles can be effectively corrected, helping to identify persons accurately.
In one embodiment, the method further comprises the following steps: acquiring a training data set for face correction; estimating feature points of the multi-angle side face images in the training data set through the trained deep learning network model; calculating to obtain the rotation angle of the human face according to the position coordinates of the feature points of the left eye, the right eye and the mouth; rotating the face to a horizontal position by taking the eyes as a reference according to the rotation angle; according to the positions of the characteristic points of the eyes, the nose and the mouth on the rotated image, a side local image is obtained by cutting; and zooming the multi-angle side face image according to a preset size to obtain a zoomed side global image.
In one embodiment, the method further comprises the following steps: extracting side face data according to the positions of the feature points; the side face data comprises side eye data, side nose data, side mouth data, and side contour data; performing pixel comparison on the side face data and the front face data in the training data set through a first local network, and performing correction reduction on the side face local graph in a network self-learning mode to obtain a front face local graph; the front partial view includes a front eye view, a front nose view and a front mouth view.
In one embodiment, the method further comprises the following steps: inputting the side contour data into the second local network; and learning edge contour information from the contour pixels through a self-attention mechanism after each convolution layer, via the attention weighting adjustment channels in the second local network, and outputting the edge contour information.
In one embodiment, the method further comprises the following steps: inserting the front eye image, the front nose image and the front mouth image in the front local image into their original local positions according to a fixed proportion to obtain a local splicing map; and fusing the edge contour information and the global detail information according to the local splicing map to obtain a restored and corrected global image.
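The splicing step can be sketched with plain array slicing, pasting each corrected local patch back at its original position in the global map; the coordinate convention here is an assumption:

```python
import numpy as np

def splice_local(global_img, patches):
    # patches: list of (patch, (row, col)) pairs giving each corrected
    # local image (eye/nose/mouth) and its top-left position in the
    # global image; positions are assumed to come from the landmarks.
    out = global_img.copy()
    for patch, (r, c) in patches:
        h, w = patch.shape[:2]
        out[r:r + h, c:c + w] = patch
    return out
```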
In one embodiment, the method further comprises the following steps: the loss function of the global network includes an adversarial loss, a synthesis loss and an identity preservation loss.
In one embodiment, the method further comprises the following steps: the prediction probability map output by the discriminator network is a 4 × 4 probability map.
In one embodiment, the overall network of generators is as shown in FIG. 14.
In a specific embodiment, the method of the present invention was applied in simulation: multi-angle gray-scale person images were input and the front faces were restored. The network was tested on several persons at a 30-degree elevation angle; the output results are shown in fig. 12, and the multi-angle restored pictures are shown in fig. 13. Elevation, head-up or overhead views, combined with horizontal rotations of -15, -30 and -45 degrees from the frontal angle, all achieve a good correction effect.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and the order of performing these sub-steps or stages is not necessarily sequential; they may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 15, there is provided a multi-angle side face rectification device, comprising: a pre-processing module 1502, a generator network module 1504, a discriminator network module 1506, a network training module 1508, and a network application module 1510, wherein:
a preprocessing module 1502, configured to obtain a training data set for face correction, and preprocess a multi-angle side face image in the training data set to obtain a side face local image and a zoomed side face global image;
a generator network module 1504, used for inputting the side local image and the side global image into a generator network; the generator network comprises a local network and a global network; the local network comprises a first local network and a second local network; the first local network is used for correcting and restoring the side local image to obtain a front local image; the second local network adds an attention weighting adjustment channel after each convolution layer and extracts edge contour information from the side local image; the global network is used for obtaining global detail information according to the side global image and obtaining a restored and corrected global image according to the front local image, the edge contour information and the global detail information;
a discriminator network module 1506, configured to input the restored and corrected global image and the corresponding real global image in the training data set into a discriminator network, and respectively obtain the prediction probability maps of the restored and corrected global image and the real global image through the discriminator network;
a network training module 1508, configured to train the generator network and the discriminator network until convergence according to a preset loss function and the prediction probability maps of the restored and corrected global image and the real global image, to obtain a trained generator network;
and the network application module 1510 is used for inputting the multi-angle side face image to be corrected into the trained generator network to obtain the multi-angle side face correction image.
The preprocessing module 1502 is further configured to obtain a training data set for face correction; estimating feature points of the multi-angle side face images in the training data set through the trained deep learning network model; calculating to obtain the rotation angle of the human face according to the position coordinates of the feature points of the left eye, the right eye and the mouth; rotating the face to a horizontal position by taking the eyes as a reference according to the rotation angle; according to the positions of the characteristic points of the eyes, the nose and the mouth on the rotated image, a side local image is obtained by cutting; and zooming the multi-angle side face image according to a preset size to obtain a zoomed side global image.
The generator network module 1504 is further configured to extract side face data from the feature point locations; the side face data comprises side eye data, side nose data, side mouth data, and side contour data; performing pixel comparison on the side face data and the front face data in the training data set through a first local network, and performing correction reduction on the side face local graph in a network self-learning mode to obtain a front face local graph; the front partial view includes a front eye view, a front nose view and a front mouth view.
The generator network module 1504 is also used to input the side contour data into the second local network, learning edge contour information from the contour pixels through a self-attention mechanism after each convolution layer, via the attention weighting adjustment channels in the second local network, and outputting the edge contour information.
The generator network module 1504 is further configured to insert the front eye image, the front nose image and the front mouth image in the front local image into their original local positions according to a fixed proportion to obtain a local splicing map; and to fuse the edge contour information and the global detail information according to the local splicing map to obtain a restored and corrected global image.
For specific limitations of the multi-angle side face rectification device, reference may be made to the above limitations of the multi-angle side face rectification method, which is not described herein again. All modules in the multi-angle side face correction device can be completely or partially realized through software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 16. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the multi-angle side face rectification method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 16 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction among the combined technical features, such combinations should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.