WO2019128508A1

WO2019128508A1 - Method and apparatus for processing image, storage medium, and electronic device

Info

Publication number: WO2019128508A1
Application number: PCT/CN2018/115470
Authority: WO
Inventors: 陈岩; 刘耀勇
Original assignee: Oppo广东移动通信有限公司
Priority date: 2017-12-28
Filing date: 2018-11-14
Publication date: 2019-07-04
Also published as: CN109978754A

Abstract

A method and apparatus for processing an image, a storage medium, and an electronic device. The method comprises: matching, according to position information of a face key point in an original image, the corresponding target face image from a preset face database, and aligning the target face image with the face in the original image; revising the target face image on the basis of the trained convolutional neural network model and a face area in the original image; and fusing the revised target face image with the original image.

Description

Image processing method, device, storage medium and electronic device

The present application claims priority to a Chinese patent application filed on December 28, 2017, the entire contents of which are incorporated by reference in this application.

Technical field

The present application relates to the field of image processing technologies, and in particular, to an image processing method, apparatus, storage medium, and electronic device.

Background art

Existing electronic devices generally have a photographing function. With the rapid development of intelligent electronic devices and computer vision technology, users' demands on the cameras of smart electronic devices are no longer limited to traditional photography but lean increasingly toward image processing functions, such as smart beautification and style transfer, which are being adopted by more and more intelligent electronic devices.

Summary of the invention

The embodiment of the present application provides an image processing method, device, storage medium, and electronic device, which can reduce visible traces of image stitching and improve the image synthesis effect.

In a first aspect, an embodiment of the present application provides an image processing method, which is applied to an electronic device, and includes:

Obtaining first position information of a face key point in the original image;

Matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image;

Correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

The corrected target face image is merged with the original image.

In a second aspect, an embodiment of the present application provides an image processing apparatus, which is applied to an electronic device, and includes:

a location obtaining module, configured to acquire first location information of a face key point in the original image;

An alignment module, configured to match a corresponding target face image from the preset face database according to the first location information, and align the target face image with the face in the original image;

a correction module for correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

A fusion module for merging the corrected target face image with the original image.

In a third aspect, the embodiment of the present application further provides a storage medium, where the storage medium stores a plurality of instructions, and the instructions are adapted to be loaded by a processor to perform the following steps:

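Obtaining first position information of a face key point in the original image;

Matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image;

Correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

Fusing the corrected target face image with the original image.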

In a fourth aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory, where the processor is electrically connected to the memory, the memory is used to store instructions and data, and the processor is configured to perform the following steps:

Construct a convolutional neural network;

Obtaining images of multiple angles of the face in the original image as training samples;

Based on the training samples, parameter training is performed on the constructed convolutional neural network to adjust the parameter settings of the content loss function, the illumination loss function, and the smoothing loss function to obtain a trained convolutional neural network model.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly described below. The drawings in the following description obviously show only some embodiments of the present application. Those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of a scenario structure of an electronic device for implementing deep learning according to an embodiment of the present application.

FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application.

FIG. 3 is a schematic diagram of an application scenario of an image processing method provided by an embodiment of the present application.

FIG. 4 is a schematic partial structural diagram of a convolutional neural network provided by an embodiment of the present application.

FIG. 5 is another application scenario diagram of an image processing method provided by an embodiment of the present application.

FIG. 6 is another schematic flowchart of an image processing method according to an embodiment of the present application.

FIG. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.

FIG. 8 is another schematic structural diagram of an image processing apparatus according to an embodiment of the present application.

FIG. 9 is still another schematic structural diagram of an image processing apparatus according to an embodiment of the present application.

FIG. 10 is a schematic diagram of still another structure of an image processing apparatus according to an embodiment of the present application.

FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

FIG. 12 is another schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are clearly and completely described in the following with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative efforts are within the scope of the present application.

The embodiment of the present application provides an image processing method, device, storage medium, and electronic device. The details will be described separately below.

Referring to FIG. 1, FIG. 1 is a schematic diagram of a scenario in which an electronic device implements deep learning according to an embodiment of the present application.

When the user processes an image through the image processing function of the electronic device, the electronic device can record the input and output data of the processing. The electronic device may include a data collection and statistics system and a prediction system with feedback adjustment. Through the data collection system, the electronic device can acquire a large amount of the user's image classification result data, compile the corresponding statistics, extract image features, analyze and process the extracted features based on deep learning, and perform parameter training on the loss function of the convolutional neural network. When an image is input, the electronic device predicts the classification result of the image through the prediction system. After the user makes a final selection, the prediction system feeds the final result of the user behavior back to correct the weights of the weighted terms. After a number of iterative corrections, the weights of the weighted terms of the prediction system converge, forming a learned database.

The electronic device may be a mobile terminal, such as a mobile phone, a tablet computer, or the like, or may be a conventional PC (Personal Computer), etc., which is not limited in this embodiment of the present application.

An embodiment of the present application provides an image processing method, which is applied to an electronic device, where the image processing method includes:

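Obtaining first position information of a face key point in the original image;

Matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image;

Correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

Fusing the corrected target face image with the original image.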

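In some embodiments, before acquiring the first location information of the face key point in the original image, the method further includes:

Constructing a convolutional neural network;

Obtaining images of multiple angles of the same face as training samples;

Performing parameter training on the constructed convolutional neural network based on the training samples, to adjust the parameter settings of the content loss function, the illumination loss function, and the smoothing loss function and obtain a trained convolutional neural network model.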

In some embodiments, the step of correcting the target face image based on the trained convolutional neural network model and the face region in the original image includes:

Extracting content features, illumination features, and smoothing features from the face regions of the original image based on the trained convolutional neural network model;

And generating an adjustment parameter according to the content feature, the illumination feature, and the smooth feature, under the constraint of the content loss function, the illumination loss function, and the smoothing loss function;

Correcting the target face image according to the adjustment parameter.

In some embodiments, the step of matching the corresponding target face image from the preset face database according to the location information, and aligning the target face image with the face in the original image includes:

Obtaining second location information of a face key point of the plurality of sample face images in the preset face database;

Matching the first location information with the second location information;

Selecting a sample face image with the largest matching degree from the sample face image as the target face image;

The target face image is mapped to the face region of the original image by affine transformation to align the target face image with the face in the original image.

In some embodiments, after the target face image is aligned with the face in the original image, and before the corrected target face image is fused with the original image, the method further includes:

Performing edge feature point detection on the face region in the original image, and acquiring third position information of the edge feature point;

The original image is subjected to image segmentation processing according to the third position information to remove the face region, and the remaining image region is used as the background image.

In some embodiments, the step of fusing the corrected target face image with the original image comprises:

Generating a face mask according to the third location information;

The corrected target face image is merged with the background image in the original image by using the face mask.

In some embodiments, the acquiring first location information of a face key point in the original image includes:

Extracting image features of the original image;

Determining a face region in the original image according to the image feature;

The face key point detection is performed on the face area to obtain the first position information of the face key point.

In an embodiment, an image processing method is provided. As shown in FIG. 2, the flow may be as follows:

101. Acquire first location information of a key point of the face in the original image.

In the embodiment of the present application, the original image includes at least one human face. The original image may be an image captured by the electronic device through the camera, or may be an image directly obtained by the electronic device from a storage area of a server or other external device.

If the electronic device collects an image through its own camera, the acquired image may be a selfie taken with the front camera or an image containing a face captured by the rear camera. In some embodiments, the camera can be a digital camera or an analog camera. A digital camera converts the analog image signal generated by the image acquisition device into a digital signal and stores it in a computer. The image signal captured by an analog camera must be converted to digital form by a dedicated image capture card and compressed before it can be transferred to a computer for use. A digital camera captures the image directly and transmits it to the computer via a serial, parallel, or USB interface. In the embodiment of the present application, the electronic device generally uses a digital camera to convert the collected image into data in real time and display it on the display interface of the electronic device (i.e., the preview frame of the camera).

In the embodiment of the present application, face detection is first performed on the original image to determine the face region, and face key points are then detected within the face region to obtain their location information. That is, in some embodiments, the step of "acquiring first location information of a face key point in the original image" may include the following process:

Extracting image features of the original image;

Determining a face region in the original image according to the image features;

Performing face key point detection on the face region to obtain the first location information of the face key points.

Specifically, Dlib can be used to detect the face key points. Dlib is a C++ machine learning library that contains many algorithms commonly used in machine learning. Referring to FIG. 3, FIG. 3 is a schematic diagram of the detected face key points. The chin is marked by 17 key points, the left and right eyebrows by 5 key points each, the nose by 9 key points, the left and right eyes by 6 key points each, and the mouth by 20 key points, for a total of 68 face key points.
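The grouping of the 68 key points can be illustrated with a minimal sketch. The index ranges below follow the widely used 68-point annotation layout and are an assumption, since the text above only states how many key points belong to each facial part; the names `LANDMARK_GROUPS`, `group_sizes`, and `split_landmarks` are hypothetical.

```python
# Index ranges (0-based, end-exclusive) of the widely used 68-point layout.
# Assumption: the text only gives the number of points per facial part.
LANDMARK_GROUPS = {
    "chin": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def group_sizes(groups=LANDMARK_GROUPS):
    """Return the number of key points assigned to each facial part."""
    return {name: len(idx) for name, idx in groups.items()}


def split_landmarks(points, groups=LANDMARK_GROUPS):
    """Split a list of 68 (x, y) tuples into per-part lists."""
    if len(points) != 68:
        raise ValueError("expected 68 key points")
    return {name: [points[i] for i in idx] for name, idx in groups.items()}
```

A detector that returns the 68 points in this order could pass its output to `split_landmarks` to address, for example, only the mouth or eye points.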

102. Match the corresponding target face image from the preset face database according to the first location information, and align the target face image with the face in the original image.

Specifically, the original image may be deformed by the shooting angle, which seriously affects recognition of the image. Certain measures are therefore needed to transform and correct the image so that the machine can identify and register it. For this reason, in the embodiment of the present application, a face database is constructed in advance to store images of reference faces in different postures for identity replacement, in particular photos taken from different angles, so that a reference face image in the face database can serve as the target identity for the original image in the matching and face-swapping operations.

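In the embodiment of the present application, the target face image may be aligned with the face in the original image in various ways. For example, the target face image may be aligned with the face in the original image by using an affine transformation. That is, in some embodiments, the step of "matching the corresponding target face image from the preset face database according to the first location information and aligning the target face image with the face in the original image" may include the following process:

Obtaining second location information of the face key points of the plurality of sample face images in the preset face database;

Matching the first location information with the second location information;

Selecting the sample face image with the largest matching degree from the sample face images as the target face image;

Mapping the target face image to the face region of the original image by affine transformation to align the target face image with the face in the original image.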

Specifically, the first location information is matched with the second location information of the sample face images in the face database. The greater the matching degree, the closer the posture and size of the two images, which reduces the difficulty of computing the transformation matrix in the subsequent affine transformation. To improve the matching degree between the face image and the sample face images, a large number of face images in different postures can be collected, which increases the density of deflection angles among the sample face images in the face database and reduces the interval between adjacent deflection angles.
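The selection of the best-matching sample can be sketched as follows. The text does not define how the matching degree is computed, so the score used here (key points centered on their mean, then the mean point-to-point distance mapped into the range (0, 1]) is only an assumed stand-in, and the function names are hypothetical.

```python
import math


def center(points):
    """Shift key points so that their mean lies at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def matching_degree(first, second):
    """Assumed score in (0, 1]; 1 means identical layouts up to translation."""
    a, b = center(first), center(second)
    mean_dist = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_dist)


def select_target(first, samples):
    """Return the name of the sample whose key points match `first` best.

    samples: dict mapping a sample name to its list of (x, y) key points.
    """
    return max(samples, key=lambda name: matching_degree(first, samples[name]))
```

Because the points are only centered and not rescaled, the score stays sensitive to both posture and size, in line with the description above.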

An affine transformation, also known as an affine mapping, is a geometric operation in which a vector space undergoes a linear transformation followed by a translation, mapping it into another vector space. Affine transformations include translation, rotation, scaling, shearing, and so on. To perform an affine transformation, the transformation matrix must first be obtained, which in turn requires the coordinates, angles, and similar information of the feature points. Methods such as geometric matching and blob analysis can provide the feature point coordinates and angle information.

In a specific implementation, first, coordinate and angle information (ie, second position information) of key points in the target face image are acquired, and then an affine transformation matrix is calculated according to the acquired second position information and the first position information. The target face image is affine transformed according to the calculated transformation matrix, and the target face image is mapped to the face position of the original image.
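One way to compute such a transformation matrix from paired key points is a least-squares fit, sketched below in plain Python. This is an illustrative assumption rather than the method prescribed by the application; `src` stands for the key points of the target face image (second position information) and `dst` for the corresponding key points of the original image (first position information).

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ x = v by Gauss-Jordan elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("key points are collinear; affine map is undetermined")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src key points onto dst."""
    # Normal equations (P^T P) w = P^T t, where each row of P is (x, y, 1).
    rows = [(x, y, 1.0) for x, y in src]
    ptp = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    ptu = [sum(r[i] * u for r, (u, _) in zip(rows, dst)) for i in range(3)]
    ptv = [sum(r[i] * v for r, (_, v) in zip(rows, dst)) for i in range(3)]
    return [solve3(ptp, ptu), solve3(ptp, ptv)]


def apply_affine(matrix, point):
    """Map one (x, y) point with a 2x3 affine matrix."""
    (a, b, c), (d, e, f) = matrix
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)
```

At least three non-collinear point pairs are needed; with all 68 key points the fit averages out small detection errors.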

103. Correct the target face image based on the trained convolutional neural network model and the face region in the original image.

In some embodiments, before acquiring the first location information of the face key point in the original image, the method may further include the following steps:

Constructing a convolutional neural network;

Obtaining images of multiple angles of the same face as training samples;

Based on the training samples, the constructed convolutional neural network is trained to adjust the content loss function, the illumination loss function, and the smoothing loss function parameters to obtain the trained convolutional neural network model.

Referring to FIG. 4, in the embodiment of the present application, the constructed convolutional neural network is a multi-scale architecture with branches that operate on differently sampled versions of the input image, depending on its size. After convolution, the output of a smaller-scale branch is upsampled by a factor of 2 and then joined along the channel axis with the larger-scale branch. Each such branch has a zero-padded convolution module followed by linear rectification. The branches are combined by nearest-neighbor upsampling by a factor of two and concatenation along the channel axis.
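The way two branches are combined can be sketched with nested lists standing in for feature maps. This is a minimal illustration under the assumption that the small branch has exactly half the spatial size of the large one; the function names are hypothetical and no convolution is performed here.

```python
def upsample_nearest_2x(channel):
    """Nearest-neighbor upsampling of one 2-D feature map by a factor of 2."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def merge_branches(small, large):
    """Upsample the small-scale branch and concatenate along the channel axis.

    Both arguments are lists of channels; each channel is a list of rows.
    """
    up = [upsample_nearest_2x(ch) for ch in small]
    if len(up[0]) != len(large[0]) or len(up[0][0]) != len(large[0][0]):
        raise ValueError("upsampled branch must match the large branch size")
    return up + large
```

The merged result has as many channels as both branches together, which is what allows later convolutions to mix coarse and fine information.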

In the specific training process, the training samples are first input and the parameters are initialized. After the convolution and sampling stages, the data reaches the fully connected layer, where the affine transformation and parameter calculation are performed and the processed image is output. The weights of the loss functions are obtained by logistic regression analysis, and the parameter settings of each loss function are corrected through continuous feedback by manually judging whether the output meets expectations.

In some embodiments, "correcting the target face image based on the trained convolutional neural network model and the face region in the original image" may include the following steps:

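Extracting content features, illumination features, and smoothing features from the face region of the original image based on the trained convolutional neural network model;

Generating adjustment parameters according to the content features, the illumination features, and the smoothing features, under the constraints of the content loss function, the illumination loss function, and the smoothing loss function;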

The target face image is corrected according to the adjustment parameters.

In the embodiment of the present application, face swapping can be described as a style transfer problem. The goal of style transfer is to render one image in the style of another. On this basis, the pose and expression of the face in the original image are taken as the content, the target face image is taken as the style, and a loss function is designed that allows the convolutional neural network to generate highly photorealistic results. The loss function of the image is based on feature maps from an already trained neural network.

For the style loss function, a nearest-neighbor method can be used; that is, the patch at a given position in the original image is replaced with the most similar patch in the target image. The search domain is restricted according to the key points extracted from the face. That is, for a given part of the face in the original image, the search for similar patches is performed only near the corresponding part of the target image.
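A restricted nearest-neighbor patch search of this kind can be sketched as follows. Cosine similarity as the similarity measure and a circular search window of a given `radius` are assumptions made for illustration; the text does not specify either, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened patches."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_patch(query, query_pos, candidates, radius):
    """Return the most similar candidate patch located within `radius`.

    candidates: list of (position, patch) pairs taken from the target image(s).
    Returns None when no candidate lies inside the search window.
    """
    nearby = [(pos, patch) for pos, patch in candidates
              if math.dist(pos, query_pos) <= radius]
    if not nearby:
        return None
    return max(nearby, key=lambda item: cosine_similarity(query, item[1]))
```

Limiting the candidates by position keeps, for example, an eye patch from being matched to a visually similar patch elsewhere on the face.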

In some embodiments, multiple images of the target face, i.e., multiple style images, are required. During the similar-patch search, the loss remains restricted to a local image region, but the search can run over patches extracted from all of these images, so that a variety of expressions can be reproduced.

In this embodiment, in order to keep the illumination unchanged during face swapping, changes in illumination must be penalized. To capture illumination changes, a convolutional neural network classifier for illumination is trained. Given two images that are identical except for illumination, the classifier determines whether the pair has undergone an illumination change, and the feature maps obtained from this network are used to calculate the illumination loss.
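How the three losses might be combined into one training objective is sketched below. The text does not give the formulas, so the following are assumptions: the content and illumination losses are taken as mean squared errors between feature vectors, the smoothing loss as a total-variation term, and the weights `w_content`, `w_light`, and `w_smooth` are hypothetical values.

```python
def mse(a, b):
    """Mean squared error between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_variation(image):
    """Sum of squared differences between neighboring pixels of a 2-D image."""
    h, w = len(image), len(image[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += (image[i + 1][j] - image[i][j]) ** 2
            if j + 1 < w:
                tv += (image[i][j + 1] - image[i][j]) ** 2
    return tv


def total_loss(content_out, content_ref, light_out, light_ref, image,
               w_content=1.0, w_light=1.0, w_smooth=0.1):
    """Weighted sum of content, illumination, and smoothing losses."""
    return (w_content * mse(content_out, content_ref)
            + w_light * mse(light_out, light_ref)
            + w_smooth * total_variation(image))
```

Adjusting the three weights corresponds to the parameter settings of the loss functions that the training described above is meant to tune.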

104. The corrected target face image is merged with the original image.

In the embodiment of the present application, there may be multiple ways to fuse the corrected target face image with the original image. The first type is region-based algorithms, which use the relationship between two images to determine the parameters of the coordinate transformation between them, and which include spatial-domain pixel registration algorithms and frequency-domain algorithms. The second type is feature-based stitching algorithms, which use distinctive features in the images (points, lines, edges, contours, corner points) to calculate the transformation between images. The third type is stitching based on maximum mutual information, in which the stitching work is moved from the spatial domain to the wavelet domain and a complete image is obtained through wavelet reconstruction.

In some embodiments, after the target face image is aligned with the face in the original image, before the corrected target face image is merged with the original image, the following steps may be further included:

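Performing edge feature point detection on the face region in the original image, and acquiring third position information of the edge feature points;

Performing image segmentation processing on the original image according to the third position information to remove the face region, and using the remaining image region as the background image.

Specifically, the image is cut along the positions of the detected edge feature points. Since these are face edge feature points, the face region can be separated from the original image, leaving the background image.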

Then the step "merging the corrected target face image with the original image" may include the following process:

Generating a face mask according to the third location information;

The face mask is used to fuse the corrected target face image with the background image in the original image.

The position information of the edge feature points may be relative position information between the edge feature points. A closed pattern is formed based on the edge feature points, and the region outside the closed pattern is used as the face mask. The face mask is superimposed on the target face image mapped onto the face region of the original image and aligned with the face region of the original image, so that the region of the target face image not blocked by the face mask is displayed and the occluded region is not. For example, referring to FIG. 5, a is the target face image, b is the original image, c is the face mask generated based on the face region in the original image b, d is the image obtained by affine transformation of the target face image a, and e is the fused image finally output after the face is swapped.
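Mask generation and masked compositing can be sketched as follows. The sketch assumes the edge feature points are given in order around the face contour and uses a ray-casting point-in-polygon test, which is one possible choice and not named in the text. Here a mask value of 1 marks pixels inside the closed pattern, where the target face is displayed, and 0 marks the occluded outside; the function names are hypothetical.

```python
def inside_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def face_mask(height, width, edge_points):
    """1 inside the closed contour (face shown), 0 outside (face hidden)."""
    return [[1 if inside_polygon(j + 0.5, i + 0.5, edge_points) else 0
             for j in range(width)] for i in range(height)]


def composite(warped_face, original, mask):
    """Take face pixels where the mask is 1 and original pixels elsewhere."""
    return [[f if m else o for f, o, m in zip(frow, orow, mrow)]
            for frow, orow, mrow in zip(warped_face, original, mask)]
```

The hard 0/1 boundary produced here is what a subsequent blending step would smooth out.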

When the face image in the target picture is replaced with the processed target face image, the processed target face image may be fused with the target picture based on the Poisson fusion technique so that it covers the original face image in the target picture. The face image in the target picture is thereby replaced with the processed target face image. Poisson fusion better eliminates the boundary between the target face image and the target picture, making the result look natural and achieving seamless stitching.
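The idea behind Poisson fusion can be illustrated on a one-dimensional signal: inside the replaced region the result keeps the second differences (the discrete Laplacian) of the source, while at the region borders it takes the values of the destination. The sketch below solves this with Gauss-Seidel sweeps; the 1-D setting, the solver, and the function name are simplifying assumptions. For real images, libraries such as OpenCV provide a `seamlessClone` function for this purpose.

```python
def poisson_blend_1d(src, dst, lo, hi, iterations=2000):
    """Blend src[lo:hi] into dst so gradients follow src and borders follow dst.

    Requires 1 <= lo < hi <= len(dst) - 1 so that both borders exist.
    """
    if len(src) != len(dst) or not (1 <= lo < hi <= len(dst) - 1):
        raise ValueError("invalid region or signal lengths")
    out = [float(v) for v in dst]
    # The discrete Laplacian of the source acts as the guidance field.
    lap = {i: src[i - 1] - 2 * src[i] + src[i + 1] for i in range(lo, hi)}
    for _ in range(iterations):  # Gauss-Seidel sweeps over the region
        for i in range(lo, hi):
            out[i] = (out[i - 1] + out[i + 1] - lap[i]) / 2.0
    return out
```

If the source differs from the destination only by a constant brightness offset, the blended result reproduces the destination, which is why a seam caused by such an offset disappears.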

It can be seen that the image processing method obtains the first location information of the face key points in the original image; matches the corresponding target face image from the preset face database according to the first location information and aligns the target face image with the face in the original image; corrects the target face image based on the trained convolutional neural network model and the face region in the original image; and fuses the corrected target face image with the original image. Through deep learning, this solution better preserves certain features of the original image while reducing visible traces of image stitching and improving the image synthesis effect.

In an embodiment, another image processing method is also provided. As shown in FIG. 6, the flow may be as follows:

201. Construct a convolutional neural network.

In the embodiment of the present application, the constructed convolutional neural network is a multi-scale architecture with branches that operate on differently sampled versions of the input image, depending on its size. After convolution, the output of a smaller-scale branch is upsampled by a factor of 2 and then joined along the channel axis with the larger-scale branch. Each such branch has a zero-padded convolution module followed by linear rectification. The branches are combined by nearest-neighbor upsampling by a factor of two and concatenation along the channel axis.

202. Perform parameter training on the constructed convolutional neural network based on the training samples to adjust the parameter settings of the content loss function, the illumination loss function, and the smoothing loss function, and obtain a trained convolutional neural network model.

In the embodiment of the present application, in order to ensure that certain features in the original image are not lost after the subsequent face changing, parameter training of the loss function in the convolutional neural network is performed.

Specifically, face swapping can be described as a style transfer problem. The goal of style transfer is to render one image in the style of another. On this basis, the pose and expression of the face in the original image are taken as the content, the target face image is taken as the style, and a loss function is designed that allows the convolutional neural network to generate highly photorealistic results.

In some embodiments, images of multiple angles of the same face may be acquired as training samples. In the specific training process, the training samples are first input and the parameters are initialized. After the convolution and sampling stages, the data reaches the fully connected layer, where the affine transformation and parameter calculation are performed and the processed image is output. The weights of the loss functions are obtained by logistic regression analysis, and the parameter settings of each loss function are corrected through continuous feedback by manually judging whether the output meets expectations.

203. Perform face key point detection to acquire first position information of the face key points in the original image and second position information of the face key points in the target face image.

In the embodiment of the present application, the original image includes at least one human face. The original image may be an image captured by the electronic device through the camera, or may be an image directly obtained by the electronic device from a storage area of a server or other external device. The target face image is a reference face for the identity replacement of the face in the original image.

Therefore, in the embodiment of the present application, a face database is constructed in advance to store images of reference faces in different postures for identity replacement, in particular photos taken from different angles, so that a reference face image in the face database can serve as the target identity for the original image in the matching and face-swapping operations.

In a specific implementation, face detection is first performed on the original image to determine the face region, and face key points are then detected within the face region to obtain their position information.

204. Calculate an affine transformation matrix according to the first location information and the second location information, and map the target face image to the face region of the original image based on the affine transformation matrix.

Since the original image may be affected by the shooting angle when shooting, the image is deformed, which seriously affects the recognition of the image. Therefore, it is necessary to take certain measures to transform the image and perform a certain degree of correction to facilitate the identification and registration of the machine.

Specifically, the first location information is matched with the second location information, and a one-to-one correspondence is established between the face key points in the original image and those in the target face image. The affine transformation matrix between the two images is then obtained from the face key point pairs, and the target face image is mapped to the face region of the original image based on the affine transformation matrix, so that the target face image is aligned with the face in the original image.

205. Correct the target face image based on the trained convolutional neural network model and the face region in the original image.

Specifically, content features, illumination features, and smoothing features are extracted from the face region of the original image based on the trained convolutional neural network model. According to these features, corresponding adjustment parameters are generated under the constraints of the content loss function, the illumination loss function, and the smoothing loss function, and the target face image is corrected according to the adjustment parameters. Adjusting the parameters through the trained convolutional neural network model better preserves the facial expression, skin color, illumination, and other features of the original image, making the swapped face look more natural.

206. Perform edge feature point detection on the face region in the original image, and obtain third location information of the edge feature point.

Similarly, face detection is performed on the original image, and the third position information of the edge feature points of the face region is then calculated based on a relevant edge detection algorithm.

207. Perform image segmentation processing on the original image according to the third location information to remove the face region, and use the remaining image region as the background image.

208. Generate a face mask according to the third location information, and use the face mask to fuse the corrected target face image with the background image in the original image.

The location information of the edge feature points may be relative location information between the edge feature points. A closed figure is formed from the edge feature points, and the area outside the closed figure is used as the face mask. The face mask is superimposed on the target face image that has been mapped onto the face region of the original image, and is aligned with the face region of the original image; the region of the target face image that is not blocked by the face mask is displayed, and the occluded region is not displayed.
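The masking and fusion just described can be sketched as follows. This is a minimal illustration, assuming the edge feature points are given in order around the face contour and that images are plain nested lists; the names `inside_polygon`, `face_mask`, and `fuse` are hypothetical, and the ray-casting test is a swapped-in standard technique rather than the method of the application.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Image = List[List[int]]


def inside_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def face_mask(height: int, width: int, edge_points: Sequence[Point]) -> List[List[bool]]:
    """True where the mask occludes, i.e. outside the closed figure of edge points."""
    return [[not inside_polygon(c + 0.5, r + 0.5, edge_points) for c in range(width)]
            for r in range(height)]


def fuse(target: Image, background: Image, mask: List[List[bool]]) -> Image:
    """Show the background where the mask occludes and the target face elsewhere."""
    return [[b if m else t for t, b, m in zip(tr, br, mr)]
            for tr, br, mr in zip(target, background, mask)]
```

A hard binary mask like this leaves a visible seam; a production implementation would more likely feather the mask edge before blending.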

It can be seen that the image processing method provided by the embodiment of the present application obtains the first location information of the face key point in the original image; matches the corresponding target face image from the preset face database according to the first location information, and aligns the target face image with the face in the original image; corrects the target face image based on the trained convolutional neural network model and the face region in the original image; and fuses the corrected target face image with the original image. Through deep learning, this scheme better preserves the expression, skin color, and illumination of the original image, while making stitching traces less visible and improving the image synthesis effect.

In another embodiment of the present application, an image processing apparatus is further provided, which may be integrated in an electronic device in the form of software or hardware, and the electronic device may specifically include a mobile phone, a tablet computer, a notebook computer, and the like. As shown in FIG. 7, the image processing apparatus 30 may include a location acquisition module 31, an alignment module 32, a correction module 33, and a fusion module 34, where:

a location obtaining module 31, configured to acquire first location information of a face key point in the original image;

The aligning module 32 is configured to match the corresponding target face image from the preset face database according to the first location information, and align the target face image with the face in the original image;

The correction module 33 is configured to correct the target face image based on the trained convolutional neural network model and the face region in the original image;

The fusion module 34 is configured to fuse the corrected target face image with the original image.
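To make the data flow between the four modules concrete, the following sketch chains them as interchangeable callables. The class name, field names, and call signatures are all hypothetical; the application only specifies what each module is configured to do, not how the modules are wired together in code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageProcessingApparatus:
    """Chains the four modules; each field is a hypothetical callable."""
    acquire_location: Callable[[Any], Any]   # location obtaining module 31
    align: Callable[[Any, Any], Any]         # aligning module 32
    correct: Callable[[Any, Any], Any]       # correction module 33
    fuse: Callable[[Any, Any], Any]          # fusion module 34

    def process(self, original: Any) -> Any:
        # 31: first location information of the face key points
        first_location = self.acquire_location(original)
        # 32: match a target face from the database and align it
        target = self.align(original, first_location)
        # 33: correct the target face using the original face region
        corrected = self.correct(target, original)
        # 34: fuse the corrected target face with the original image
        return self.fuse(corrected, original)
```

Keeping each module behind a plain callable lets the same pipeline run with a software or hardware implementation of any stage.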

In some embodiments, referring to FIG. 8, the image processing apparatus 30 may further include:

a building module 35, configured to construct a convolutional neural network before acquiring first location information of a face key point in the original image;

a sample obtaining module 36, configured to acquire an image of a plurality of angles of a face in the original image as a training sample;

The training module 37 is configured to perform parameter training on the constructed convolutional neural network based on the training samples to adjust the parameter settings of the content loss function, the illumination loss function, and the smoothing loss function to obtain a trained convolutional neural network model.

In some embodiments, referring to FIG. 9, the correction module 33 can include:

The extraction sub-module 331 is configured to extract a content feature, an illumination feature, and a smooth feature from the face region of the original image based on the trained convolutional neural network model;

The generating submodule 332 is configured to generate an adjustment parameter according to the content feature, the illumination feature, and the smoothing feature, under the constraint of the content loss function, the illumination loss function, and the smoothing loss function;

The correction sub-module 333 is configured to correct the target facial image according to the adjustment parameter.

In some embodiments, referring to FIG. 10, the aligning module 32 can include:

The obtaining sub-module 321 is configured to acquire second location information of a face key point of the plurality of sample face images in the preset face database;

a matching sub-module 322, configured to match the first location information with the second location information;

The selecting sub-module 323 is configured to select, from the sample face image, the sample face image with the largest matching degree as the target face image;

The mapping sub-module 324 is configured to map the target face image to the face region of the original image by affine transformation to align the target face image with the face in the original image.
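The matching and selection performed by sub-modules 322 and 323 could look like the sketch below. The application does not define the matching degree, so the score used here (key points normalized for position and scale, then compared by mean distance) is an assumption, as are the names `normalize`, `matching_degree`, and `select_target`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def normalize(points: Sequence[Point]) -> List[Point]:
    """Center key points on their centroid and scale them to unit RMS radius."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def matching_degree(first: Sequence[Point], second: Sequence[Point]) -> float:
    """Higher is better: 1 / (1 + mean distance between normalized key points)."""
    a, b = normalize(first), normalize(second)
    mean_dist = sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_dist)


def select_target(first: Sequence[Point], samples: Dict[str, Sequence[Point]]) -> str:
    """Return the name of the sample face whose key points match best."""
    return max(samples, key=lambda name: matching_degree(first, samples[name]))
```

Here `first` stands for the first location information and each entry of `samples` for the second location information of one sample face image; both are assumed to list the same key points in the same order.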

In some embodiments, with continued reference to FIG. 10, the image processing device 30 can further include:

The edge feature point obtaining module 38 is configured to, after the target face image is aligned with the face in the original image and before the corrected target face image is fused with the original image, perform edge feature point detection on the face region in the original image and acquire third location information of the edge feature points;

The segmentation module 39 is configured to perform image segmentation processing on the original image according to the third position information to remove the face region and use the remaining image region as the background image.

In some embodiments, the fusion module 34 can be used to:

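Generating a face mask according to the third location information;

Fusing the corrected target face image with the background image in the original image using the face mask.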

In some embodiments, the location acquisition module 31 can be used to:

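Extracting image features of the original image;

Determining a face region in the original image according to the image features;

Performing face key point detection on the face region to obtain the first location information of the face key points.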

It can be seen that the image processing apparatus provided by the embodiment of the present application obtains the first location information of the face key point in the original image; matches the corresponding target face image from the preset face database according to the first location information, and aligns the target face image with the face in the original image; corrects the target face image based on the trained convolutional neural network model and the face region in the original image; and fuses the corrected target face image with the original image. Through deep learning, this scheme better preserves certain features of the original image, while making stitching traces less visible and improving the image synthesis effect.

In another embodiment of the present application, an electronic device is further provided, and the electronic device may be a device such as a smart phone or a tablet computer. As shown in FIG. 11, the electronic device 400 includes a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.

The processor 401 is the control center of the electronic device 400. It connects the various parts of the entire electronic device using various interfaces and lines, and executes the various functions of the electronic device and processes data by running or loading applications stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the electronic device as a whole.

In this embodiment, the processor 401 in the electronic device 400 loads the instructions corresponding to the processes of one or more applications into the memory 402 according to the following steps, and runs the applications stored in the memory 402, thereby implementing various functions:

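Obtaining first location information of a face key point in the original image;

Matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image;

Correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

Fusing the corrected target face image with the original image.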

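In some embodiments, before acquiring the first location information of the face key point in the original image, the processor 401 is configured to perform the following steps:

Constructing a convolutional neural network;

Obtaining images of multiple angles of the face in the original image as training samples;

Performing parameter training on the constructed convolutional neural network based on the training samples, to adjust the parameter settings of the content loss function, the illumination loss function, and the smoothing loss function and obtain a trained convolutional neural network model.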

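In some embodiments, when correcting the target face image based on the trained convolutional neural network model and the face region in the original image, the processor 401 is further configured to perform the following steps:

Extracting content features, illumination features, and smoothing features from the face region of the original image based on the trained convolutional neural network model;

Generating adjustment parameters according to the content features, the illumination features, and the smoothing features under the constraints of the content loss function, the illumination loss function, and the smoothing loss function;

Correcting the target face image according to the adjustment parameters.

In some embodiments, when matching the corresponding target face image from the preset face database according to the first location information and aligning the target face image with the face in the original image, the processor 401 is further configured to perform the following steps:

Obtaining second location information of the face key points of the plurality of sample face images in the preset face database;

Matching the first location information with the second location information;

Selecting the sample face image with the largest matching degree from the sample face images as the target face image;

Mapping the target face image to the face region of the original image by affine transformation to align the target face image with the face in the original image.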

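In some embodiments, after aligning the target face image with the face in the original image and before fusing the corrected target face image with the original image, the processor 401 further performs the following steps:

Performing edge feature point detection on the face region in the original image, and acquiring third location information of the edge feature points;

Performing image segmentation processing on the original image according to the third location information to remove the face region, and using the remaining image region as the background image.

In some embodiments, when fusing the corrected target face image with the original image, the processor 401 further performs the following steps:

Generating a face mask according to the third location information;

Fusing the corrected target face image with the background image in the original image using the face mask.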

Memory 402 can be used to store applications and data. The application stored in the memory 402 contains instructions that can be executed in the processor. Applications can form various functional modules. The processor 401 executes various functional applications and data processing by running an application stored in the memory 402.

In some embodiments, as shown in FIG. 12, the electronic device 400 further includes a display screen 403, a control circuit 404, a radio frequency circuit 405, an input unit 406, an audio circuit 407, a sensor 408, and a power source 409. The processor 401 is electrically connected to the display screen 403, the control circuit 404, the radio frequency circuit 405, the input unit 406, the audio circuit 407, the sensor 408, and the power source 409, respectively.

The display screen 403 can be used to display information entered by the user or information provided to the user as well as various graphical user interfaces of the electronic device, which can be composed of images, text, icons, video, and any combination thereof. The display screen 403 can be used as a screen in the embodiment of the present application for displaying information.

The control circuit 404 is electrically connected to the display screen 403 for controlling the display screen 403 to display information.

The radio frequency circuit 405 is configured to transmit and receive radio frequency signals, so as to establish wireless communication with network devices or other electronic devices and exchange signals with them.

The input unit 406 can be configured to receive input digits, character information, or user characteristic information (e.g., fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function controls. The input unit 406 can include a fingerprint identification module.

The audio circuit 407 can provide an audio interface between the user and the electronic device through a speaker and a microphone.

Sensor 408 is used to collect external environmental information. Sensor 408 can include ambient brightness sensors, acceleration sensors, light sensors, motion sensors, and other sensors.

Power source 409 is used to power the various components of electronic device 400. In some embodiments, the power source 409 can be logically coupled to the processor 401 through a power management system, which manages functions such as charging, discharging, and power consumption.

The camera 410 is used for collecting external images, and can be a digital camera or an analog camera. In some embodiments, camera 410 may convert the acquired external picture into data for transmission to processor 401 to perform image processing operations.

Although not shown in FIG. 12, the electronic device 400 may further include a Bluetooth module or the like, and details are not described herein again.

It can be seen that the electronic device provided by the embodiment of the present application obtains the first location information of the face key point in the original image; matches the corresponding target face image from the preset face database according to the first location information, and aligns the target face image with the face in the original image; corrects the target face image based on the trained convolutional neural network model and the face region in the original image; and fuses the corrected target face image with the original image. Through deep learning, this scheme better preserves certain features of the original image, while making stitching traces less visible and improving the image synthesis effect.

A further embodiment of the present application further provides a storage medium having stored therein a plurality of instructions adapted to be loaded by a processor to perform the steps of any of the image processing methods described above.

A person skilled in the art may understand that all or part of the steps of the foregoing embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

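The terms "a", "an", and "the" and similar referents used in describing the concepts of the application are to be construed to cover both the singular and the plural. In addition, unless otherwise stated herein, the recitation of numerical ranges herein merely refers individually to each separate value falling within the range, and each such value is incorporated into this specification as if it were stated separately herein. In addition, the steps of all methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The application is not limited to the described order of steps. The use of any and all examples or exemplary language herein is intended merely to better illustrate the concepts of the application and does not limit its scope unless otherwise claimed. Numerous modifications and adaptations will be apparent to those skilled in the art without departing from the spirit and scope of the application.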

The image processing method, apparatus, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core ideas. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the application.

Claims

An image processing method applied to an electronic device, comprising:

Obtaining first position information of a face key point in the original image;

Matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image;

Correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

The corrected target face image is merged with the original image.
The image processing method according to claim 1, wherein before acquiring the first location information of the face key point in the original image, the method further comprises:

Construct a convolutional neural network;

Obtaining images of multiple angles of the face in the original image as training samples;

Based on the training samples, parameter training is performed on the constructed convolutional neural network to adjust the parameter settings of the content loss function, the illumination loss function, and the smoothing loss function to obtain a trained convolutional neural network model.
The image processing method according to claim 2, wherein the step of correcting the target face image based on the trained convolutional neural network model and the face region in the original image comprises:

Extracting content features, illumination features, and smoothing features from the face regions of the original image based on the trained convolutional neural network model;

Generating an adjustment parameter according to the content features, the illumination features, and the smoothing features under the constraints of the content loss function, the illumination loss function, and the smoothing loss function;

Correcting the target face image according to the adjustment parameter.
The image processing method according to claim 1, wherein the step of matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image, comprises:

Obtaining second location information of a face key point of the plurality of sample face images in the preset face database;

Matching the first location information with the second location information;

Selecting a sample face image with the largest matching degree from the sample face image as the target face image;

The target face image is mapped to the face region of the original image by affine transformation to align the target face image with the face in the original image.
The image processing method according to claim 1, wherein after aligning the target face image with the face in the original image and before fusing the corrected target face image with the original image, the method further comprises:

Performing edge feature point detection on the face region in the original image, and acquiring third position information of the edge feature point;

The original image is subjected to image segmentation processing according to the third position information to remove the face region, and the remaining image region is used as the background image.
The image processing method according to claim 5, wherein the step of fusing the corrected target face image with the original image comprises:

Generating a face mask according to the third location information;

The corrected target face image is merged with the background image in the original image by using the face mask.
The image processing method according to claim 1, wherein the obtaining the first location information of the face key point in the original image comprises:

Extracting image features of the original image;

Determining a face region in the original image according to the image feature;

The face key point detection is performed on the face area to obtain the first position information of the face key point.
An image processing apparatus, wherein the apparatus comprises:

a location obtaining module, configured to acquire first location information of a face key point in the original image;

An alignment module, configured to match a corresponding target face image from the preset face database according to the first location information, and align the target face image with the face in the original image;

a correction module for correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

A fusion module for merging the corrected target face image with the original image.
The image processing device of claim 8, wherein the device further comprises:

a building module, configured to construct a convolutional neural network before acquiring first position information of a face key point in the original image;

a sample obtaining module, configured to acquire an image of a plurality of angles of a face in the original image as a training sample;

The training module is configured to perform parameter training on the constructed convolutional neural network based on the training sample to adjust a parameter of a content loss function, an illumination loss function, and a smoothing loss function to obtain a trained convolutional neural network model.
The image processing device according to claim 9, wherein said correction module comprises:

An extraction sub-module, configured to extract content features, illumination features, and smoothing features from the face region of the original image based on the trained convolutional neural network model;

A generation sub-module, configured to generate an adjustment parameter according to the content features, the illumination features, and the smoothing features under the constraints of the content loss function, the illumination loss function, and the smoothing loss function;

The correction submodule is configured to correct the target facial image according to the adjustment parameter.
The image processing device of claim 8, wherein the alignment module comprises:

An obtaining sub-module, configured to acquire second location information of the face key points of the plurality of sample face images in the preset face database;

A matching sub-module, configured to match the first location information with the second location information;

A selection sub-module, configured to select the sample face image with the largest matching degree from the sample face images as the target face image;

a mapping sub-module for mapping the target face image to the face region of the original image by affine transformation to align the target face image with the face in the original image.
The image processing device according to claim 8, further comprising:

The edge feature point acquiring module is configured to, after the target face image is aligned with the face in the original image and before the corrected target face image is fused with the original image, perform edge feature point detection on the face region in the original image and obtain third location information of the edge feature points;

The segmentation module is configured to perform image segmentation processing on the original image according to the third position information to remove the face region and use the remaining image region as the background image.
The image processing device according to claim 12, wherein said fusion module is configured to:

Generating a face mask according to the third location information;

The corrected target face image is merged with the background image in the original image by using the face mask.
A storage medium, wherein the storage medium stores a plurality of instructions adapted to be loaded by a processor to perform the following steps:

Obtaining first position information of a face key point in the original image;

Matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image;

Correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

The corrected target face image is merged with the original image.
An electronic device, comprising a processor and a memory, the processor being electrically connected to the memory, the memory for storing instructions and data; the processor for performing the following steps:

Obtaining first position information of a face key point in the original image;

Matching the corresponding target face image from the preset face database according to the first location information, and aligning the target face image with the face in the original image;

Correcting the target face image based on the trained convolutional neural network model and the face region in the original image;

The corrected target face image is merged with the original image.

The electronic device of claim 15, wherein the processor is configured to perform the following steps before acquiring the first location information of the face key point in the original image:

Construct a convolutional neural network;

Obtaining images of multiple angles of the face in the original image as training samples;

Based on the training samples, parameter training is performed on the constructed convolutional neural network to adjust the parameter settings of the content loss function, the illumination loss function, and the smoothing loss function to obtain a trained convolutional neural network model.
The electronic device of claim 16, wherein the processor is configured to perform the following steps when the target face image is corrected based on the trained convolutional neural network model and the face region in the original image:

Extracting content features, illumination features, and smoothing features from the face regions of the original image based on the trained convolutional neural network model;

Generating an adjustment parameter according to the content features, the illumination features, and the smoothing features under the constraints of the content loss function, the illumination loss function, and the smoothing loss function;

Correcting the target face image according to the adjustment parameter.
The electronic device according to claim 15, wherein when the corresponding target face image is matched from the preset face database according to the first location information, and the target face image is aligned with the face in the original image, the processor is configured to perform the following steps:

Obtaining second location information of a face key point of the plurality of sample face images in the preset face database;

Matching the first location information with the second location information;

Selecting a sample face image with the largest matching degree from the sample face image as the target face image;

The target face image is mapped to the face region of the original image by affine transformation to align the target face image with the face in the original image.
The electronic device of claim 15, wherein the processor is configured to perform the following steps after aligning the target face image with the face in the original image and before fusing the corrected target face image with the original image:

Performing edge feature point detection on the face region in the original image, and acquiring third position information of the edge feature point;

The original image is subjected to image segmentation processing according to the third position information to remove the face region, and the remaining image region is used as the background image.
The electronic device of claim 19, wherein the processor is configured to perform the following steps when the corrected target face image is merged with the original image:

Generating a face mask according to the third location information;

The corrected target face image is merged with the background image in the original image by using the face mask.