CN111325657A - Image processing method, image processing device, electronic equipment and computer readable storage medium - Google Patents

Image processing method, image processing device, electronic equipment and computer readable storage medium

Info

Publication number
CN111325657A
CN111325657A (application CN202010100146.9A)
Authority
CN
China
Prior art keywords
image
makeup
face
region
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010100146.9A
Other languages
Chinese (zh)
Inventor
关扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010100146.9A
Publication of CN111325657A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention discloses an image processing method, an image processing device, electronic equipment and a computer-readable storage medium. An embodiment of the method comprises: performing face detection on an image to be processed to obtain a first face image; performing makeup processing on the first face image to generate a second face image; detecting a region of interest in the second face image, and determining a corresponding region in the image to be processed that corresponds to the region of interest; replacing the corresponding region with the region of interest to obtain a fused image; and updating the pixel value of each pixel point in the fused image by using an image fusion algorithm to generate a target image. The embodiment can apply makeup processing to the face region in a whole-body image without modifying other regions, which improves the realism of the image after makeup processing.

Description

Image processing method, image processing device, electronic equipment and computer readable storage medium
Technical Field
The embodiments of the present invention relate to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of computer technology, the functions of client applications (APPs) have become increasingly rich. For example, an image processing application can provide makeup special effects in a variety of styles, and a video application can not only play videos but also apply makeup effects to the face in a video frame when recording face videos.
Existing applications are mainly aimed at applying makeup processing to face images. For a whole-body image, generally only global image processing can be applied to the image as a whole, such as skin smoothing, beautification, or adding filters; it is not possible to apply makeup processing only to the face region of the whole-body image without modifying other regions. As a result, processing the whole-body image in this way leads to a large difference between the processed whole-body image and the original image, and the realism of the processed whole-body image is low.
Disclosure of Invention
The embodiments of the invention provide an image processing method, an image processing device, electronic equipment and a computer-readable storage medium, and aim to solve the technical problem that the realism of a processed image is low because the entire image is processed globally. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided an image processing method, including: performing face detection on an image to be processed to obtain a first face image; performing makeup processing on the first face image to generate a second face image; detecting a region of interest in the second face image, and determining a corresponding region in the image to be processed that corresponds to the region of interest; replacing the corresponding region with the region of interest to obtain a fused image; and updating the pixel value of each pixel point in the fused image by using an image fusion algorithm to generate a target image.
In a second aspect of the present invention, there is also provided an image processing apparatus comprising: a face detection unit configured to perform face detection on an image to be processed to obtain a first face image; a makeup processing unit configured to perform makeup processing on the first face image to generate a second face image; a determining unit configured to detect a region of interest in the second face image and determine a corresponding region in the image to be processed that corresponds to the region of interest; a replacement unit configured to replace the corresponding region with the region of interest to obtain a fused image; and a generating unit configured to update the pixel value of each pixel point in the fused image by using an image fusion algorithm to generate a target image.
In a third aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; a memory for storing a computer program; a processor for implementing the method steps described in the first aspect when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute any of the image processing methods described above.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the image processing methods described above.
According to the image processing method, the image processing device, the electronic equipment and the computer-readable storage medium provided by the embodiments of the invention, a first face image is obtained by performing face detection on an image to be processed; makeup processing is then performed on the first face image to generate a second face image; a region of interest is then detected in the second face image, and a corresponding region corresponding to the region of interest is determined in the image to be processed; the corresponding region is then replaced with the region of interest to obtain a fused image; and finally, the pixel value of each pixel point in the fused image is updated by using an image fusion algorithm, thereby generating a target image. In this way, makeup processing can be applied to the face region of a whole-body image without modifying other regions, which improves the realism of the processed image. In addition, because the corresponding region in the image to be processed is replaced with the region of interest from the made-up face image, the replaced region is closer to the actual face region than if a rectangular face region were replaced directly. Meanwhile, processing the resulting fused image with an image fusion algorithm makes the color transition at the boundary of the region of interest more natural and improves the quality of the image after makeup processing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow diagram of one embodiment of an image processing method according to the present application;
FIG. 2 is a flow diagram of yet another embodiment of an image processing method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of an image processing method according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of an image processing apparatus according to the present application;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to FIG. 1, a flow 100 of one embodiment of an image processing method according to the present application is shown. The image processing method comprises the following steps:
step 101, performing face detection on an image to be processed to obtain a first face image.
In this embodiment, an executing subject (e.g., an electronic device such as a mobile phone or a tablet computer) of the image processing method may first acquire an image to be processed. The image to be processed may be an image including a face region, such as a whole body image.
In one scenario, the execution body may be equipped with an image capture device, such as a camera. The image to be processed may be an image captured by the image capturing device, or may be a frame of a video captured by the image capturing device.
In another scenario, the image to be processed may be an image obtained from the internet or an image transmitted by another device.
In this embodiment, the execution subject may perform face detection on the image to be processed by using various face detection methods to obtain a first face image. For example, the face detection may be performed on the image to be processed by an existing face detection model. The first face image is a rectangular image area indicated by a face detection frame obtained after the face detection is carried out on the image to be processed.
In some optional implementation manners of this embodiment, the executing body may use a pre-trained multi-task detection model to detect the image to be processed. The multi-task detection model can be used for detecting the face of an image. The Multi-task detection model may be obtained by training a Multi-task convolutional neural network (MTCNN).
Here, the multi-task convolutional neural network includes three lightweight neural network structures: a P-Net (Proposal Network), an R-Net (Refine Network), and an O-Net (Output Network). The P-Net is a Fully Convolutional Network (FCN) used for preliminary feature extraction and face detection frame calibration on a plurality of images. The plurality of images input to the P-Net may be images obtained by scaling a given image to be processed to different scales. The R-Net is a convolutional neural network used to screen the candidate detection frames produced by the P-Net, and to perform Bounding Box Regression (BBR) and Non-Maximum Suppression (NMS) on the selected face detection frames so as to further refine the detection result. The O-Net is also a convolutional neural network, with one more convolutional layer than the R-Net. The O-Net can further refine the remaining face detection frames to obtain the final face detection frame of the image to be processed.
In this implementation, the executing body may first scale the image to be processed to different scales (e.g., reduce to 0.5 times, enlarge to 2 times, etc.), resulting in a plurality of scaled images, where the plurality of scaled images include the image to be processed itself. Then, the scaled images may be input into the pre-trained multi-task detection model to obtain the first face image used for representing the face region in the image to be processed.
Thus, by scaling the image to be processed, images of multiple sizes can be obtained. The face in the image to be processed may be large, small or moderate in size. When the face is small, the face in the enlarged image becomes larger, so performing face detection on the enlarged image improves the accuracy of the detection result. Likewise, when the face is large, the face in the reduced image becomes smaller, so performing face detection on the reduced image also improves the accuracy of the detection result.
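As a minimal illustration of this multi-scale detection strategy (not the patented model itself), the Python sketch below builds a small image pyramid with OpenCV and feeds each scale to a generic face detector; the detector callable, its return format, and the scale factors are assumptions made for the example.

```python
import cv2

def detect_face_multiscale(image, detector, scales=(0.5, 1.0, 2.0)):
    """Run a face detector on several rescaled copies of the image and map
    the highest-scoring detection box back to original coordinates.

    `detector(img)` is assumed to return (x, y, w, h, score) boxes; in
    practice it could be an MTCNN-style multi-task detection model.
    """
    best_box, best_score = None, -1.0
    for s in scales:
        resized = cv2.resize(image, None, fx=s, fy=s,
                             interpolation=cv2.INTER_LINEAR)
        for (x, y, w, h, score) in detector(resized):
            if score > best_score:
                # Map the box back to the coordinates of the original image.
                best_box = (int(x / s), int(y / s), int(w / s), int(h / s))
                best_score = score
    return best_box  # face detection frame of the first face image, or None

# The first face image is then the rectangular crop:
# x, y, w, h = best_box; first_face = image[y:y + h, x:x + w]
```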
And 102, performing makeup processing on the first face image to generate a second face image.
In this embodiment, the executing body may perform makeup processing on the first face image, and use the resulting face image as the second face image. The makeup processing may be adding a makeup effect to the face in the first face image, or performing processing such as changing the makeup, beautification, skin smoothing, or whitening on the face in the first face image. The embodiments of the present application are described using the addition of a makeup effect as an example.
In one scenario, the face in the first face image is in a plain (bare-faced) state, that is, the first face image is a makeup-free face image. In this case, the makeup processing performed on the first face image may be applying makeup to the face.
In another scenario, the face in the first face image is in a made-up state, that is, the first face image is a face image with makeup. In this case, the makeup processing performed on the first face image may be removing the makeup from the face.
In this embodiment, the execution body may perform makeup processing on the first face image using various existing makeup processing models. Here, the makeup processing model may be used to perform makeup processing on a face image. The makeup treatment model may be trained using various existing convolutional neural network structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.).
In some optional implementations of the embodiment, taking the first face image as a makeup-free face as an example, the second face image may be generated by the following steps:
First, a makeup reference image is acquired. The makeup reference image may be a face image carrying a target makeup. It may be an image selected by the user from a plurality of preset makeup reference images, or it may be a default image among the plurality of preset makeup reference images. Different makeup reference images may carry different makeup styles, which may include, but are not limited to: Hong Kong-style makeup, cartoon makeup, French makeup, European and American makeup, girlish makeup, and the like.
Then, the first face image and the makeup reference image are input into a pre-trained makeup processing model to obtain a second face image corresponding to the first face image, where the second face image carries the target makeup. The makeup processing model can be used to perform makeup processing on a face image and may be obtained in advance by training with a machine learning method, such as supervised learning or adversarial training.
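As a hedged sketch of this inference step, the snippet below feeds the first face image and a makeup reference image through an already-trained generator; the 256x256 input size, the normalization, and the `generator(src, ref)` signature are assumptions for illustration rather than the patent's exact interface.

```python
import torch
from torchvision import transforms
from PIL import Image

# Assumed preprocessing for a trained BeautyGAN-style generator.
to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])

def apply_makeup(generator, first_face_path, reference_path, device="cpu"):
    """Return the second (made-up) face image produced by the generator
    from the plain first face image and the makeup reference image."""
    src = to_tensor(Image.open(first_face_path).convert("RGB")).unsqueeze(0)
    ref = to_tensor(Image.open(reference_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        out = generator(src.to(device), ref.to(device))  # assumed signature
    out = (out.squeeze(0).cpu() * 0.5 + 0.5).clamp(0, 1)  # undo normalization
    return transforms.ToPILImage()(out)
```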
In the above implementation manner, optionally, the makeup processing model may be obtained through the following sub-steps S11 to S13:
in sub-step S11, a sample set is obtained.
Here, each group of samples in the sample set includes a makeup sample image and a makeup-free sample image, and each sample image is a face image.
Note that each sample image may have label information. The label information can be used to indicate whether an image is a real sample image (as opposed to an image generated by a model). For example, if "1" indicates that an image is a real sample image and "0" indicates that it is not, the label information attached to each sample image is "1".
And a sub-step S12 of extracting the pre-established makeup generative adversarial network.
Here, the makeup generative adversarial network (BeautyGAN) is a generative adversarial network used for performing makeup processing on face images. The makeup generative adversarial network includes a generative model (which may also be referred to as a generator), a first discrimination model, and a second discrimination model (each discrimination model may also be referred to as a discriminator).
The generative model can be used to exchange the makeup in two face images. That is, it can be used to generate a makeup removal image corresponding to a makeup sample image and a makeup application image corresponding to a makeup-free sample image. In practice, the generative model may employ a convolutional neural network structure. For example, it may include an encoder, residual blocks, and a decoder. The encoder may include a plurality of convolutional layers and pooling layers, and the decoder may include a plurality of deconvolution (transposed convolution) layers and unpooling layers. The convolutional layers may be used to extract image features, the pooling layers may be used to down-sample the input information, the unpooling layers may be used to up-sample the input information, and the deconvolution layers deconvolve the input information, using the transpose of a convolutional layer's convolution kernel as the deconvolution kernel. Deconvolution is the inverse of convolution and restores the signal. The last deconvolution layer of the convolutional neural network outputs the generated images, i.e., the two images after the makeup has been exchanged.
The first discrimination model is used to discriminate whether the makeup removal image generated by the generative model is a makeup-free sample image. In practice, if the first discrimination model determines that the image input to it is a makeup-free sample image, it may output a certain preset value (e.g., 1); if it determines that the image input to it is not a makeup-free sample image (i.e., it is a generated makeup removal image), it may output another preset value (e.g., 0).
The second discrimination model is used to discriminate whether the makeup application image output by the generative model is a makeup sample image. In practice, if the second discrimination model determines that the image input to it is a makeup sample image, it may output a certain preset value (e.g., 1); if it determines that the image input to it is not a makeup sample image (i.e., it is a generated makeup application image), it may output another preset value (e.g., 0).
The first and second discrimination models may be various existing models that can implement a classification function, for example, a Naive Bayes Model (NBM), a Support Vector Machine (SVM), or a neural network including fully connected (FC) layers and a classification function (e.g., a softmax function).
And a sub-step S13 of training the makeup generative adversarial network by an adversarial training method based on the sample set, and using the trained generative model as the makeup processing model.
Here, adversarial training is a method for training a generative adversarial network. In the adversarial training process, the parameters of the generative model are first fixed and the discrimination models are trained; then the trained discrimination models are fixed and the generative model is trained. These two training steps are executed iteratively in turn to obtain a trained generative model and trained discrimination models.
Thus, the executing entity may specifically iteratively execute the following training steps:
a first training step: the parameters of the generated model are fixed, and the first and second discrimination models are trained by a machine learning method based on the sample set.
A second training step: parameters of the first and second discrimination models are fixed, and the generated model is trained by a machine learning method based on the sample set.
It should be noted that adversarial training is not limited to the implementation described above, and other training schedules may be used as needed. As an example, the generative model may be trained first, followed by the discrimination models. As another example, the first and second discrimination models may be trained twice (i.e., for two epochs) and then the generative model may be trained. As still another example, after the first and second discrimination models are trained once, the generative model may be trained twice in succession.
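The alternation described above can be sketched as the following PyTorch-style loop (fix the generator, update the two discriminators; then fix the discriminators, update the generator). The model interfaces and the loss callables are placeholders standing in for the loss terms detailed below, not the patent's exact implementation.

```python
import torch

def adversarial_train(generator, disc_1, disc_2, loader,
                      d_loss_fn, g_loss_fn, epochs=10, lr=2e-4, device="cpu"):
    """Alternate one discriminator update and one generator update per batch.

    `generator(makeup, plain)` is assumed to return (removal_img, applied_img);
    the loss callables are assumed to combine the loss terms described below.
    """
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(
        list(disc_1.parameters()) + list(disc_2.parameters()),
        lr=lr, betas=(0.5, 0.999))

    for _ in range(epochs):
        for makeup_img, plain_img in loader:
            makeup_img, plain_img = makeup_img.to(device), plain_img.to(device)

            # 1) Fix the generator's parameters and train the discriminators.
            with torch.no_grad():
                removal_img, applied_img = generator(makeup_img, plain_img)
            opt_d.zero_grad()
            d_loss_fn(disc_1, disc_2, removal_img, applied_img,
                      makeup_img, plain_img).backward()
            opt_d.step()

            # 2) Fix the discriminators' parameters and train the generator.
            opt_g.zero_grad()
            removal_img, applied_img = generator(makeup_img, plain_img)
            g_loss_fn(disc_1, disc_2, removal_img, applied_img,
                      makeup_img, plain_img).backward()
            opt_g.step()
```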
Optionally, the first training step, namely the step of training the first and second discrimination models, may be performed with reference to the following steps:
firstly, copying the generated model to obtain a copied model.
Wherein, the input of the copy model is the output of the generation model. Since the generative model can exchange makeup of a set of makeup-bearing face images and makeup-free face images, the replication model can be used to generate a makeup-bearing image (which may be referred to herein as a makeup-bearing restored image) corresponding to a makeup-removing image output by the generative model, and to generate a makeup-free image (which may be referred to herein as a makeup-free restored image) corresponding to a makeup-applying image output by the generative model.
And a second step of inputting the makeup sample image and the makeup-free sample image in the sample as the input of the generated model, inputting the makeup removing image generated by the generated model and the makeup sample image in the sample as the input of the first judgment model, inputting the makeup applying image generated by the generated model and the makeup-free sample image in the sample as the input of the second judgment model and the input of a pre-trained semantic segmentation network, and inputting the makeup removing image and the makeup applying image generated by the generated model as the input of the replica model.
Here, the semantic segmentation network described above may be used to extract histogram features of the facial feature (five sense organ) regions in a face image.
And thirdly, fixing parameters of the generated model, and training the first discrimination model and the second discrimination model by using a machine learning method based on each group of samples and information output by each model after the group of samples are input.
In training the first and second discrimination models for each sample, the following steps may be referred to for each group of samples:
first, a first loss value is determined based on the makeup sample image in the set of samples and the makeup removal image generated by the generative model. In practice, the characteristic information of the makeup sample image input to the generative model and the makeup removal image generated by the generative model described above may be extracted. And then, inputting the characteristic information of the makeup sample image and the characteristic information of the makeup removing image into a preset loss function, thereby obtaining a first loss value. The feature information can be extracted by a preset feature extraction model (such as a VGG network). The above-described loss function may be used to characterize the degree of difference between the makeup sample image and the makeup removal image. The smaller the first loss value is, the smaller the difference between the makeup removal image output from the generated model and the makeup-attached sample image input to the generated model is.
Then, a second loss value is determined based on the cosmetic-free sample image in the set of samples and the cosmetic image generated by the generative model. In practice, the second loss value may be determined in a similar manner as the first loss value. The embodiment of the present application is not described in detail herein.
Then, label information is set for the makeup removal image and the makeup application image generated by the generative model. The label information may be used to indicate whether an image is a real sample image. For example, if "1" indicates that an image is a real sample image and "0" indicates that it is not, the label information attached to each real sample image is "1", and the label information of the makeup removal image and the makeup application image generated by the generative model may be set to "0".
Then, a third loss value is determined based on the determination result output by the first determination model and the label information of each image (including the makeup removal image output by the generative model and the makeup-free sample image in the sample) input to the first determination model. Here, another loss function may be employed to determine the third loss value, for example, a Euclidean distance function or the like.
Then, a fourth loss value is determined based on the determination result output by the second determination model and the label information of each image (including the makeup image output by the generation model and the makeup sample image in the sample) input to the second determination model. In practice, the fourth loss value may be determined in a similar manner as the third loss value. The embodiment of the present application is not described in detail herein.
Then, a fifth loss value is determined based on the cosmetic restoration image and the cosmetic sample images in the set of samples. In practice, the euclidean distance between the reduced image with makeup and the sample image with makeup may be determined as the fifth loss value.
Then, a sixth loss value is determined based on the cosmetic-free restored image and the cosmetic-free sample image in the set of samples. In practice, the euclidean distance between the makeup-free restored image and the makeup-free sample image may be determined as the sixth loss value.
Then, a seventh loss value is determined based on the histogram features (including the histogram features of the facial features in the facial feature region in the makeup image generated by the generative model and the histogram features of the facial feature region in the makeup-free sample image in the sample) output by the semantic segmentation network. Here, a loss function for calculating the difference of the histogram features may be set in advance, and the value of the loss function may be set as a seventh loss value.
Finally, the sum of the loss values is used as a total loss value, and the parameters of the first and second discrimination models are updated based on the total loss value. In practice, during the training process, the executing agent may obtain a gradient of the loss value relative to the model parameter by using a back propagation algorithm, and then update the model parameter based on the gradient by using a gradient descent algorithm.
It can be understood that, in the training process, the total loss value may constrain the way and direction of parameter modification, and the training target is to minimize the value of the loss function, so that the parameters of the first and second discrimination models obtained after training are the corresponding parameters when the total loss value is the minimum value.
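As an informal sketch of how the first loss value (a feature-space distance computed with a VGG network) and the total loss might look in code: the VGG layer choice, the MSE distance, and the plain summation are assumptions; the patent only states that feature information is compared and that the loss values are summed.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(torch.nn.Module):
    """First loss value: distance between VGG features of the makeup sample
    image and of the makeup removal image produced by the generator."""
    def __init__(self, layers=16):
        super().__init__()
        vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:layers].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg

    def forward(self, makeup_sample, removal_image):
        return F.mse_loss(self.vgg(makeup_sample), self.vgg(removal_image))

def total_loss(loss_values):
    """Total loss: plain sum of the seven loss values described above."""
    return sum(loss_values)

# total = total_loss([l1, l2, l3, l4, l5, l6, l7])
# total.backward()      # back-propagation: gradient of loss w.r.t. parameters
# optimizer.step()      # gradient-descent update of the model parameters
```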
Optionally, the second training step, i.e. the step of training the generated model, may be performed with reference to the following steps:
the method comprises the steps of firstly, taking a makeup sample image and a makeup-free sample image in a sample as input of a generated model, taking a makeup removing image generated by the generated model and the makeup sample image in the sample as input of a first discrimination model, taking a makeup applying image generated by the generated model and the makeup-free sample image in the sample as input of a second discrimination model and input of a pre-trained semantic segmentation network, and taking the makeup removing image and the makeup applying image generated by the generated model as input of a copied model obtained after copying the generated model.
And secondly, fixing parameters of the current first judging model and the current second judging model, and training the generated model by using a machine learning method based on each sample image in the sample, the labeling information of each sample image in the sample and the information output by each model.
In training the generative model for each sample, the following steps may be referred to:
first, a first loss value is determined based on the makeup sample image in the sample and the makeup removal image generated by the above-described generation model.
Then, a second loss value is determined based on the makeup-free sample image in the sample and the makeup image generated by the generative model.
And then, setting marking information for the makeup removing image and the makeup applying image generated by the generated model.
Then, a third loss value is determined based on the discrimination result output by the first discrimination model and the label information of each image input to the first discrimination model.
Then, a fourth loss value is determined based on the discrimination result output by the second discrimination model and the label information of each image input to the second discrimination model.
Then, a fifth loss value is determined based on the makeup-bearing restored image and the makeup-bearing sample image in the sample.
Then, a sixth loss value is determined based on the makeup-free restored image and the makeup-free sample image in the sample.
Then, a seventh loss value is determined based on the histogram feature output by the semantic segmentation network.
And finally, taking the sum of the loss values as a total loss value, and updating the parameters of the generated model based on the total loss value.
It should be noted that the method for training the generation model is basically the same as the method for training the discriminant model, and the process for training the generation model is not described in detail in the embodiments of the present application.
It should be noted that, when the first face image is a face image with makeup, the second face image may be a face image after makeup removal processing. In this case, the manner of obtaining the second face image is similar to the manner of obtaining the second face image when the first face image is a makeup-free face image, that is, the makeup processing model can be trained in a similar manner, so that the makeup processing model is used to obtain the face image after makeup removal processing. The embodiment of the present application is not described in detail herein.
Step 103, detecting a region of interest in the second face image, and determining a corresponding region corresponding to the region of interest in the image to be processed.
In this embodiment, the execution subject may first detect a Region of Interest (ROI) in the second face image. The region of interest may be an irregular region surrounded by the face contour in the second face image. In practice, the above-mentioned execution subject may perform the region of interest detection through an existing image processing tool (such as photoshop, OpenCV, etc.). In the detection process, the coordinates of the contour point and the central point of the region of interest in the second face image can be acquired.
Then, the execution subject may determine the corresponding region in the image to be processed that corresponds to the region of interest. Specifically, the second face image is generated by performing makeup processing on the first face image, and thus the second face image has the same size as the first face image. Since the position of the region of interest in the second face image is known, the position of the corresponding region in the first face image is also known. The first face image is obtained by performing face detection on the image to be processed, and the coordinates of the face detection frame can be obtained through the face detection, so the position of the first face image in the image to be processed is known. Given the position of the first face image in the image to be processed and the position of the region corresponding to the region of interest in the first face image, the position of the corresponding region corresponding to the region of interest in the image to be processed can be determined.
In some optional implementations of the embodiment, the executing subject may determine the corresponding region corresponding to the region of interest in the image to be processed through the following steps (a code sketch covering these steps is given after the fourth step):
Firstly, a region of interest in the second face image is detected using an image mask. An image mask controls the region or process of image processing by occluding (wholly or partially) the image to be processed with a selected image, graphic or object. An image mask may be applied to extract the region of interest. In practice, a pixel matrix of the image mask, whose pixel values include 0 and 1, can be preset. Then, the pixel matrix of the preset image mask is multiplied element-wise with the pixel matrix of the second face image to obtain the region of interest. As a result, the pixel values within the region of interest remain unchanged, while the pixel values outside the region of interest are all 0.
And secondly, determining a first coordinate of the central point of the region of interest in the second face image. Here, the center point of the region of interest may be determined by: first, the upper, lower, left and right boundaries of the region of interest are determined. Then, centerlines of the upper and lower boundaries, and the left and right boundaries are determined. And finally, taking the intersection point of the two central lines as the central point of the region of interest.
And thirdly, determining a second coordinate corresponding to the first coordinate in the image to be processed. Here, since the first coordinates are coordinates of the center point of the region of interest in the second face image, the first face image and the second face image have the same size, and the position of the first face image in the image to be processed is known. Therefore, the second coordinate corresponding to the first coordinate in the image to be processed can be determined according to the first coordinate and the position of the first face image in the image to be processed.
And fourthly, determining the corresponding region corresponding to the region of interest in the image to be processed based on the second coordinate. The pixels in the region of interest correspond one-to-one to the pixels in the corresponding region. In practice, if the first coordinate is (a, b), the second coordinate is (a + c, b + d), and the coordinate of a certain point in the region of interest of the second face image is (m, n), then the coordinate of the corresponding point in the image to be processed is (m + c, n + d), where a, b, c, d, m and n are real numbers. In this way, the corresponding region corresponding to the region of interest in the image to be processed can be determined.
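Here is the code sketch referred to above: it extracts the irregular region of interest with a binary mask and computes the first and second coordinates. It is a minimal OpenCV/NumPy example; the contour points and the face detection frame offset are assumed to come from the earlier steps.

```python
import cv2
import numpy as np

def roi_and_coordinates(second_face, contour_points, face_box_xy):
    """Extract the region of interest from the made-up face image and map
    its center point into the coordinates of the image to be processed.

    contour_points: Nx2 face-contour points in the second face image.
    face_box_xy: (x, y) of the face detection frame in the image to be
    processed (so second coordinate = first coordinate + this offset).
    """
    mask = np.zeros(second_face.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.asarray(contour_points, dtype=np.int32)], 255)

    # Multiplying the mask with the face image keeps pixel values inside the
    # region of interest and zeroes everything outside it.
    roi = cv2.bitwise_and(second_face, second_face, mask=mask)

    # First coordinate: center of the ROI bounding box in the second face
    # image; second coordinate: the same point in the image to be processed.
    x, y, w, h = cv2.boundingRect(mask)
    first_coord = (x + w // 2, y + h // 2)
    second_coord = (first_coord[0] + face_box_xy[0],
                    first_coord[1] + face_box_xy[1])
    return roi, mask, first_coord, second_coord
```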
And step 104, replacing the corresponding region with the region of interest to obtain a fused image.
In this embodiment, after detecting a corresponding region corresponding to the region of interest in the image to be processed, the execution main body may replace the corresponding region with the region of interest to obtain a fused image.
The region of interest in the face image after makeup processing is used for replacing the corresponding region in the image to be processed, and compared with a mode of directly replacing a rectangular face region, the method can enable the replaced region to be closer to the actual face region.
In some optional implementations of this embodiment, the executing body may take each pixel in the corresponding region one by one as a target pixel, determine the corresponding pixel in the region of interest that corresponds to the target pixel, and then replace the pixel value of the target pixel with the pixel value of the corresponding pixel. In this way, the corresponding region in the image to be processed is replaced with the region of interest.
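Continuing the sketch above (same assumed variables), the pixel-by-pixel replacement can be written with a boolean mask so that only pixels inside the region of interest overwrite the corresponding region:

```python
import numpy as np

def replace_corresponding_region(image_to_process, second_face, mask, face_box_xy):
    """Replace each pixel of the corresponding region in the image to be
    processed with the matching pixel of the region of interest."""
    fused = image_to_process.copy()
    x0, y0 = face_box_xy
    h, w = mask.shape
    target = fused[y0:y0 + h, x0:x0 + w]
    # One-to-one replacement, restricted to pixels inside the ROI mask.
    target[mask > 0] = second_face[mask > 0]
    return fused
```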
And 105, updating the pixel value of each pixel point in the fused image by using an image fusion algorithm to generate a target image.
In this embodiment, the executing body may adopt various image fusion algorithms to update the pixel values of the pixels in the fused image, thereby generating the target image. In practice, the image fusion algorithms that can be adopted include, but are not limited to, a weighted-average image fusion algorithm, a Poisson image fusion algorithm, a direct-average image fusion algorithm, a median-filtering image fusion algorithm, and the like.
By processing the resulting fused image, the color transition at the boundary of the region of interest can be made more natural, which improves the quality of the image after makeup processing.
In some optional implementations of this embodiment, the target image may be generated by:
in a first step, the gradient field of the fused image is determined.
In practice, the gradient is a vector; at a given point, the directional derivative of a function takes its maximum value along the gradient direction, i.e., the function changes fastest along that direction and the maximum rate of change equals the modulus of the gradient. Here, the gradient field of the fused image can be obtained by invoking a gradient-solving command in an image processing tool (e.g., OpenCV), or the gradient components of the fused image in the x and y directions can be obtained by a finite-difference method, thereby obtaining the gradient field of the fused image. It should be noted that the gradient field of the fused image includes the gradient of each pixel point in the fused image.
And secondly, determining a divergence field of the fused image based on the gradient field.
In practice, divergence can be used to characterize the degree to which a vector field diverges at each point in space. The divergence at each point in the image is a single value (a scalar). Here, the divergence field of the fused image can be obtained by invoking a divergence-solving command in an image processing tool (e.g., OpenCV); the divergence can also be calculated from its definition, i.e., by taking partial derivatives of the gradient. It should be noted that the divergence field of the fused image includes the divergence of each pixel point in the fused image.
And thirdly, updating the pixel value of each pixel point in the fused image by using a Poisson image fusion algorithm based on the divergence field to generate a target image.
In practice, the main idea of the Poisson image fusion algorithm is to re-synthesize the image pixels by interpolation according to the divergence information and the boundary information (i.e., pixel values) of the regions. Therefore, based on the obtained divergence field of the fused image and the original pixel values of the pixels in the fused image, the updated pixel values of the pixels in the fused image can be obtained through the Poisson image fusion algorithm. It should be noted that some pixel points have the same value before and after the update; such pixel points are usually far away from the region of interest.
In practice, the principle of the Poisson image fusion algorithm is to solve the linear system Ax = b, where x is formed by the pixel values of all the pixel points to be solved, b is formed by the divergence of each pixel point, and A is a coefficient matrix obtained from the Poisson equation and the original pixel values of the pixels in the image.
The Poisson image fusion algorithm calculates the pixel value of each pixel point from the divergence field of the image. Compared with simple and direct image fusion methods such as the weighted-average and median-filtering image fusion algorithms, it can make the boundary color transition of the region of interest in the fused image even more natural and improves the quality of the image after makeup processing.
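In practice, OpenCV already ships a Poisson-blending routine, seamlessClone, which performs this gradient-domain fusion; the call below is a sketch reusing the variables from the earlier examples (the second coordinate serves as the clone center) rather than the patent's own solver.

```python
import cv2

def poisson_fuse(image_to_process, second_face, mask, second_coord):
    """Blend the made-up face region into the image to be processed with
    Poisson (gradient-domain) fusion so that the boundary color of the
    region of interest transitions naturally."""
    return cv2.seamlessClone(second_face, image_to_process, mask,
                             second_coord, cv2.NORMAL_CLONE)
```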
According to the method provided by the embodiment of the application, a first face image is obtained by performing face detection on an image to be processed; makeup processing is then performed on the first face image to generate a second face image; a region of interest is then detected in the second face image, and a corresponding region corresponding to the region of interest is determined in the image to be processed; the corresponding region is then replaced with the region of interest to obtain a fused image; and finally, the pixel value of each pixel point in the fused image is updated by using an image fusion algorithm, thereby generating a target image. In this way, makeup processing can be applied to the face region of a whole-body image without modifying other regions, which improves the realism of the processed image. In addition, because the corresponding region in the image to be processed is replaced with the region of interest from the made-up face image, the replaced region is closer to the actual face region than if a rectangular face region were replaced directly. Meanwhile, processing the resulting fused image with an image fusion algorithm makes the color transition at the boundary of the region of interest more natural and improves the quality of the image after makeup processing.
With further reference to fig. 2, a flow 200 of yet another embodiment of an image processing method is shown. The flow 200 of the image processing method comprises the following steps:
step 201, performing face detection on an image to be processed to obtain a first face image.
In this embodiment, an execution subject of the image processing method (e.g., an electronic device such as a mobile phone or a tablet computer) may perform face detection on an image to be processed by using various face detection methods, so as to obtain a first face image. The first face image is a rectangular image area indicated by a face detection frame obtained after the face detection is carried out on the image to be processed.
As an example, the executing entity may use a multi-task detection model trained in advance to detect the image to be processed, so as to obtain the first face image. The image to be processed may be an image including a face region, such as a whole body image. The multi-task detection model can be obtained by training a multi-task convolutional neural network. In practice, the image to be processed may be first scaled to different scales (e.g., down to 0.5 times, up to 2 times, etc.), resulting in a plurality of scaled images. Wherein the plurality of scaled images include the image to be processed. Then, the scaled images may be input to a multi-task detection model trained in advance, so as to obtain a first face image used for representing a face region in the image to be processed.
Thus, by scaling the image to be processed, images of multiple sizes can be obtained. The face in the image to be processed may be large, small or moderate in size. When the face is small, the face in the enlarged image becomes larger, so performing face detection on the enlarged image improves the accuracy of the detection result. Likewise, when the face is large, the face in the reduced image becomes smaller, so performing face detection on the reduced image also improves the accuracy of the detection result.
It should be noted that step 201 may refer to step 101 in the embodiment corresponding to fig. 1, and details of this embodiment are not described herein again.
Step 202, performing makeup processing on the first face image to generate a second face image.
In this embodiment, the executing body may perform makeup processing on the first face image, and use the face image obtained after the makeup processing as the second face image. The makeup processing may refer to adding a makeup effect or removing a makeup effect to the face in the first face image.
As an example, when the first face image is a makeup-free face, a makeup reference image may be acquired first. Wherein, the makeup reference image can be a face image with a target makeup. Then, the first face image and the makeup reference image are input to a pre-trained makeup processing model, and a second face image corresponding to the first face image is obtained. Wherein, the second face image is provided with the target makeup. The makeup treatment model can be used for performing makeup treatment on a face image. Wherein, the makeup processing model can be a generated network obtained after training an antagonistic network for makeup generation.
It should be noted that step 202 may refer to step 102 in the embodiment corresponding to fig. 1, and details of this embodiment are not described herein again.
Step 203, detecting a region of interest in the second face image, and determining a corresponding region corresponding to the region of interest in the image to be processed.
In this embodiment, the execution subject may first detect the region of interest in the second face image. The region of interest may be an irregular region surrounded by the face contour in the second face image. In practice, the above-mentioned execution subject may perform the region of interest detection through an existing image processing tool (such as photoshop, OpenCV, etc.). In the detection process, the coordinates of the contour point and the central point of the region of interest in the second face image can be acquired. Then, the execution subject may determine a corresponding region in the image to be processed corresponding to the region of interest.
It should be noted that step 203 may refer to step 103 in the embodiment corresponding to fig. 1, and details of this embodiment are not described herein again.
And step 204, replacing the corresponding region with a region of interest to obtain a fused image.
In this embodiment, after detecting a corresponding region corresponding to the region of interest in the image to be processed, the execution main body may replace the corresponding region with the region of interest to obtain a fused image.
The region of interest in the face image after makeup processing is used for replacing the corresponding region in the image to be processed, and compared with a mode of directly replacing a rectangular face region, the method can enable the replaced region to be closer to the actual face region.
It should be noted that step 204 may refer to step 104 in the embodiment corresponding to fig. 1, and details of this embodiment are not described herein again.
Step 205, determine the gradient field of the fused image.
In this embodiment, the executing body may obtain the gradient field of the fused image by invoking a gradient-solving command in an image processing tool (e.g., OpenCV), or may obtain the gradient components of the fused image in the x and y directions by a finite-difference method, thereby obtaining the gradient field of the fused image. It should be noted that the gradient field of the fused image includes the gradient of each pixel point in the fused image.
In practice, the gradient is a vector; at a given point, the directional derivative of a function takes its maximum value along the gradient direction, i.e., the function changes fastest along that direction and the maximum rate of change equals the modulus of the gradient.
Step 206, determining a divergence field of the fused image based on the gradient field.
In this embodiment, the executing body may calculate the divergence from its definition formula, i.e., obtain the divergence by taking partial derivatives of the gradient. It should be noted that the divergence field of the fused image includes the divergence of each pixel point in the fused image.
In practice, divergence can be used to characterize the degree to which a vector field diverges at each point in space. The divergence at each point in the image is a single value (a scalar).
And step 207, updating the pixel value of each pixel point in the fused image by using a Poisson image fusion algorithm based on the divergence field, and generating a target image.
In this embodiment, the executing body may update the pixel value of each pixel in the fused image by using a poisson image fusion algorithm based on the divergence field, so as to generate the target image.
In practice, the main idea of the Poisson image fusion algorithm is to re-synthesize the image pixels by interpolation according to the divergence information and the boundary information (i.e., pixel values) of the regions. The principle of the algorithm is to solve the linear system Ax = b, where x is formed by the pixel values of all the pixel points to be solved, b is formed by the divergence of each pixel point, and A is a coefficient matrix obtained from the Poisson equation and the original pixel values of the pixels in the image.
Therefore, based on the obtained divergence field of the fused image and the original pixel values of the pixels in the fusion region, the updated pixel values of the pixels in the fused image can be obtained through the poisson image fusion algorithm.
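For readers who want to see the intermediate quantities, the following NumPy sketch computes a per-pixel gradient field and its divergence for one image channel using simple finite differences; it is a didactic approximation, not the exact discretization a production Poisson solver would use.

```python
import numpy as np

def gradient_and_divergence(channel):
    """Return the gradient field (gx, gy) and the divergence field of a
    single image channel, computed with finite differences."""
    gy, gx = np.gradient(channel.astype(np.float64))  # gradient per pixel
    # Divergence = d(gx)/dx + d(gy)/dy, one scalar value per pixel.
    divergence = np.gradient(gx, axis=1) + np.gradient(gy, axis=0)
    return (gx, gy), divergence
```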
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the image processing method according to the present embodiment. In the application scenario of fig. 3, the execution subject may be an electronic device such as a mobile phone. The electronic equipment can be used for image shooting and video recording. The user can use the electronic equipment to shoot the whole body image of other people.
The electronic device may obtain the whole-body image and then perform scaling processing on it to obtain a plurality of scaled images. The plurality of scaled images are input into the multi-task detection model to obtain the person's face image. The multi-task detection model may be pre-trained using the MTCNN model structure.
Then, the person's face image may be input into the makeup processing model to obtain the face image after makeup. The makeup processing model may be obtained by training a BeautyGAN model; specifically, it is the generative model (generator) in BeautyGAN.
Then, the region surrounded by the face contour in the made-up face image may be extracted and used as the region of interest, and the corresponding region corresponding to the region of interest in the whole-body image may be determined. The corresponding region is then replaced with the region of interest to obtain a fused image. Finally, the fused image can be processed using the Poisson image fusion algorithm to obtain a whole-body image with the makeup applied, in which the color transition around the face contour is natural.
As can be seen from fig. 2, compared with the embodiment corresponding to fig. 1, the flow 200 of the image processing method in this embodiment highlights the step of updating the pixel values of the fused image by using the Poisson fusion algorithm. Therefore, in the scheme described in this embodiment, the pixel value of each pixel point is determined through the Poisson image fusion algorithm; compared with simple and direct image fusion methods such as the weighted-average and median-filtering image fusion algorithms, this can make the boundary color transition of the region of interest in the fused image even more natural and improves the quality of the image after makeup processing.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an image processing apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the image processing apparatus 400 according to the present embodiment includes: a face detection unit 401 configured to perform face detection on an image to be processed to obtain a first face image; a makeup processing unit 402 configured to perform makeup processing on the first face image to generate a second face image; a determining unit 403, configured to detect a region of interest in the second face image, and determine a corresponding region corresponding to the region of interest in the image to be processed; a replacing unit 404 configured to replace the corresponding region with the region of interest, resulting in a fused image; the generating unit 405 is configured to update the pixel value of each pixel point in the fused image by using an image fusion algorithm, and generate a target image.
In some optional implementations of this embodiment, the generating unit 405 may be further configured to: determining a gradient field of the fused image; determining a divergence field of the fused image based on the gradient field; and updating the pixel value of each pixel point in the fused image by using a Poisson image fusion algorithm based on the divergence field to generate a target image.
In some optional implementations of this embodiment, the face detection unit 401 may be further configured to: zooming an image to be processed to different scales to obtain a plurality of zoomed images, wherein the zoomed images comprise the image to be processed; and inputting the plurality of scaled images into a pre-trained multi-task detection model to obtain a first face image for representing a face region in the image to be processed, wherein the multi-task detection model is used for carrying out face detection on the image, and is obtained by training a multi-task convolutional neural network.
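A hedged sketch of the scaling step, assuming an MTCNN-style cascade whose detector consumes an image pyramid that includes the original image. The `multitask_detector` callable is a hypothetical stand-in for the pre-trained multi-task detection model, and the 0.709 scale factor is the value conventionally used with MTCNN rather than one stated in the text.

```python
import cv2

def detect_first_face(image, multitask_detector, scale_factor=0.709, min_size=20):
    """Build an image pyramid that includes the original image and hand it to a
    multi-task detector. The detector is assumed to return one face bounding
    box in the coordinates of the original image."""
    pyramid, scale = [image], 1.0
    while min(image.shape[0], image.shape[1]) * scale * scale_factor >= min_size:
        scale *= scale_factor
        pyramid.append(cv2.resize(image, None, fx=scale, fy=scale))

    x, y, w, h = multitask_detector(pyramid)     # box of the detected face region
    return image[y:y + h, x:x + w]               # the first face image
```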
In some optional implementations of this embodiment, the first face image is a makeup-free face image, and the makeup processing unit 402 may be further configured to: obtain a makeup reference image, wherein the makeup reference image is a face image with a target makeup; and input the first face image and the makeup reference image into a pre-trained makeup processing model to obtain a second face image corresponding to the first face image, wherein the second face image has the target makeup.
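A sketch of this inference step in PyTorch, under assumptions: `makeup_model` stands in for the pre-trained makeup processing model, both faces are 3xHxW tensors normalised to [-1, 1], and the model follows the BeautyGAN convention of returning the made-up source face together with a de-makeup version of the reference face.

```python
import torch

@torch.no_grad()
def apply_target_makeup(first_face, reference_face, makeup_model):
    """Hypothetical inference step for the pre-trained makeup processing model."""
    src = first_face.unsqueeze(0)        # the makeup-free first face image
    ref = reference_face.unsqueeze(0)    # the reference image with the target makeup
    second_face, _ = makeup_model(src, ref)
    return second_face.squeeze(0).clamp(-1, 1)   # the second face image with the target makeup
```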
In some optional implementations of this embodiment, the makeup processing model is trained by: obtaining a sample set, wherein each group of samples in the sample set comprises a with-makeup sample image and a makeup-free sample image, and each sample image is a face image; extracting a pre-established makeup generation adversarial network, wherein the makeup generation adversarial network comprises a generation model, a first discrimination model and a second discrimination model, the generation model is used for generating a makeup-removing image corresponding to the with-makeup sample image and a makeup-applying image corresponding to the makeup-free sample image, the first discrimination model is used for judging whether the makeup-removing image generated by the generation model is a makeup-free sample image, and the second discrimination model is used for judging whether the makeup-applying image generated by the generation model is a with-makeup sample image; and training the makeup generation adversarial network in an adversarial training manner based on the sample set, and taking the trained generation model as the makeup processing model.
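A PyTorch skeleton of the three components just described: one generation model that produces both the makeup-removing and makeup-applying images, and two discrimination models. The layer choices, channel counts and the channel-concatenated input are placeholders, not details taken from the text.

```python
import torch
import torch.nn as nn

class MakeupGAN(nn.Module):
    """Skeleton of the makeup generation adversarial network: generation model G,
    first discrimination model D_A (judges makeup-removing images against
    makeup-free samples) and second discrimination model D_B (judges
    makeup-applying images against with-makeup samples)."""

    def __init__(self):
        super().__init__()
        self.G = nn.Sequential(                      # generation model: both faces, channel-concatenated
            nn.Conv2d(6, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, 6, 7, padding=3), nn.Tanh())

        def disc():                                  # PatchGAN-style discriminator placeholder
            return nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 1, 4, stride=2, padding=1))
        self.D_A, self.D_B = disc(), disc()

    def generate(self, with_makeup, no_makeup):
        out = self.G(torch.cat([with_makeup, no_makeup], dim=1))
        removal, applied = out[:, :3], out[:, 3:]    # makeup-removing image, makeup-applying image
        return removal, applied
```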
In some optional implementations of this embodiment, training the makeup generation adversarial network in an adversarial training manner based on the sample set includes iteratively performing the following training steps: fixing the parameters of the generation model and training the first discrimination model and the second discrimination model by a machine learning method based on the sample set; and fixing the parameters of the first discrimination model and the second discrimination model and training the generation model by a machine learning method based on the sample set.
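The alternating scheme can be sketched as follows, assuming the MakeupGAN skeleton above (or any model exposing G, D_A, D_B and a generate method); `d_loss` and `g_loss` are hypothetical callables that compute the discriminator-side and generator-side losses from a group of samples and the generated images.

```python
import torch

def adversarial_training(model, loader, d_loss, g_loss, epochs=10, lr=2e-4):
    """Sketch of the alternating scheme: update the two discrimination models
    with the generation model frozen, then update the generation model with
    the discrimination models frozen."""
    opt_d = torch.optim.Adam(
        list(model.D_A.parameters()) + list(model.D_B.parameters()), lr=lr)
    opt_g = torch.optim.Adam(model.G.parameters(), lr=lr)

    for _ in range(epochs):
        for with_makeup, no_makeup in loader:
            # Step 1: fix the generation model, train D_A and D_B.
            with torch.no_grad():                       # generator outputs are detached
                removal, applied = model.generate(with_makeup, no_makeup)
            loss_d = d_loss(model, removal, applied, with_makeup, no_makeup)
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            # Step 2: fix D_A and D_B, train the generation model.
            removal, applied = model.generate(with_makeup, no_makeup)
            loss_g = g_loss(model, removal, applied, with_makeup, no_makeup)
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return model
```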
In some optional implementations of this embodiment, each sample image has label information indicating whether an image is a sample image; and fixing the parameters of the generation model and training the first and second discrimination models by a machine learning method based on the sample set includes: copying the generation model to obtain a copy model, wherein the copy model is used for generating a with-makeup restored image corresponding to the makeup-removing image and a makeup-free restored image corresponding to the makeup-applying image; taking the with-makeup sample image and the makeup-free sample image in each group of samples as the input of the generation model, taking the makeup-removing image generated by the generation model and the with-makeup sample image in the group as the input of the first discrimination model, taking the makeup-applying image generated by the generation model and the makeup-free sample image in the group as the input of the second discrimination model and of a pre-trained semantic segmentation network, and taking the makeup-removing image and the makeup-applying image generated by the generation model as the input of the copy model, wherein the semantic segmentation network is used for extracting histogram features of the facial feature regions in a face image; and fixing the parameters of the generation model and training the first and second discrimination models by a machine learning method based on each group of samples and the information output by each model after that group of samples is input.
In some optional implementations of this embodiment, training the first and second discrimination models by a machine learning method based on each group of samples and the information output by each model after that group of samples is input includes performing the following steps for each group of samples: determining a first loss value based on the with-makeup sample image in the group and the makeup-removing image generated by the generation model; determining a second loss value based on the makeup-free sample image in the group and the makeup-applying image generated by the generation model; setting label information for the makeup-removing image and the makeup-applying image generated by the generation model; determining a third loss value based on the discrimination result output by the first discrimination model and the label information of each image input to the first discrimination model; determining a fourth loss value based on the discrimination result output by the second discrimination model and the label information of each image input to the second discrimination model; determining a fifth loss value based on the with-makeup restored image and the with-makeup sample image in the group; determining a sixth loss value based on the makeup-free restored image and the makeup-free sample image in the group; determining a seventh loss value based on the histogram features output by the semantic segmentation network; and taking the sum of the loss values as a total loss value and updating the parameters of the first and second discrimination models based on the total loss value.
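A hedged sketch of how these seven values might combine. The text does not name concrete loss functions, so L1 distance and binary cross-entropy are assumptions here, and the weighting is a plain sum as stated. The containers `pairs`, `outputs` and `hist_features` are illustrative: they hold the images, the discrimination results together with their label information, and the two histogram feature tensors produced by the semantic segmentation network.

```python
import torch.nn.functional as F

def total_discriminator_loss(pairs, outputs, hist_features):
    """Illustrative seven-term total loss; L1 and BCE are assumed choices."""
    l1, bce = F.l1_loss, F.binary_cross_entropy_with_logits
    loss = l1(pairs["removal"], pairs["with_makeup"])                 # 1st loss value
    loss = loss + l1(pairs["applied"], pairs["no_makeup"])            # 2nd loss value
    loss = loss + bce(outputs["d1_scores"], outputs["d1_labels"])     # 3rd: first discrimination model vs labels
    loss = loss + bce(outputs["d2_scores"], outputs["d2_labels"])     # 4th: second discrimination model vs labels
    loss = loss + l1(pairs["restored_makeup"], pairs["with_makeup"])  # 5th: with-makeup restored image
    loss = loss + l1(pairs["restored_plain"], pairs["no_makeup"])     # 6th: makeup-free restored image
    loss = loss + l1(hist_features[0], hist_features[1])              # 7th: histogram features (assumed L1)
    return loss
```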
In some optional implementations of this embodiment, the determining unit 403 is further configured to: detect a region of interest in the second face image by using an image mask; determine a first coordinate of the central point of the region of interest in the second face image; determine a second coordinate corresponding to the first coordinate in the image to be processed; and determine, based on the second coordinate, a corresponding region corresponding to the region of interest in the image to be processed, wherein pixels of the region of interest correspond one-to-one to pixels of the corresponding region.
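An illustrative version of this mapping. `face_box` is the (x, y, w, h) crop returned by the face detector, and `contour_points` is a hypothetical (N, 2) int32 array of face-contour points in the second face image; the text only requires some image mask marking the region of interest, so the convex-hull construction is an assumption.

```python
import cv2
import numpy as np

def locate_corresponding_region(second_face, face_box, contour_points):
    """Sketch of the mask, centre-point and coordinate-mapping steps above."""
    # Image mask: non-zero pixels mark the region of interest.
    mask = np.zeros(second_face.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, cv2.convexHull(contour_points), 255)

    # First coordinate: centre point of the region of interest in the face image.
    ys, xs = np.nonzero(mask)
    first_coord = (int(xs.mean()), int(ys.mean()))

    # Second coordinate: the same point expressed in the image to be processed,
    # obtained by adding the offset of the face crop.
    x0, y0, _, _ = face_box
    second_coord = (x0 + first_coord[0], y0 + first_coord[1])

    # The corresponding region is the ROI translated by the crop offset, so its
    # pixels correspond one-to-one with the ROI pixels.
    return mask, first_coord, second_coord
```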
In some optional implementations of this embodiment, the replacing unit 404 is further configured to: take each pixel in the corresponding region as a target pixel one by one, determine the corresponding pixel of the target pixel in the region of interest, and replace the pixel value of the target pixel with the pixel value of the corresponding pixel.
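The per-pixel replacement can be written as a short vectorised sketch; `face_box` is again the assumed (x, y, w, h) crop offset linking the two coordinate systems.

```python
import numpy as np

def replace_region(to_process, second_face, mask, face_box):
    """Vectorised version of the per-pixel replacement described above; the
    shifted ROI is assumed to stay inside the image to be processed."""
    fused = to_process.copy()
    x, y, _, _ = face_box
    roi_rows, roi_cols = np.nonzero(mask)                  # pixels of the region of interest
    # One-to-one correspondence: same row/column, shifted by the crop offset.
    fused[y + roi_rows, x + roi_cols] = second_face[roi_rows, roi_cols]
    return fused
```

The Poisson step described earlier is then applied to this fused image to smooth the boundary colors.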
According to the device provided by this embodiment of the application, a first face image is obtained by performing face detection on the image to be processed; makeup processing is then performed on the first face image to generate a second face image; a region of interest is detected in the second face image and the corresponding region in the image to be processed is determined; the corresponding region is replaced with the region of interest to obtain a fused image; and finally the pixel value of each pixel point in the fused image is updated with an image fusion algorithm to generate the target image. In this way, makeup processing can be applied to the face region in a whole-body image without modifying other regions, which improves the realism of the processed image. In addition, the corresponding region in the image to be processed is replaced with the region of interest in the made-up face image; compared with directly replacing a rectangular face region, the replaced region is closer to the actual face region. Meanwhile, processing the fused image with an image fusion algorithm makes the color transition at the boundary of the region of interest more natural and improves the quality of the image after makeup processing.
An embodiment of the present invention further provides an electronic device. As shown in fig. 5, the electronic device includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with each other through the communication bus 504;
the memory 503 is configured to store a computer program; and
the processor 501, when executing the program stored in the memory 503, implements the following steps: performing face detection on an image to be processed to obtain a first face image; performing makeup processing on the first face image to generate a second face image; detecting a region of interest in the second face image and determining a corresponding region corresponding to the region of interest in the image to be processed; replacing the corresponding region with the region of interest to obtain a fused image; and updating the pixel value of each pixel point in the fused image by using an image fusion algorithm to generate a target image.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein, and when the instructions are executed on a computer, the instructions cause the computer to execute the image processing method described in any of the above embodiments.
In yet another embodiment, the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the image processing method described in any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. An image processing method, characterized in that the method comprises:
carrying out face detection on an image to be processed to obtain a first face image;
performing makeup processing on the first face image to generate a second face image;
detecting a region of interest in the second face image, and determining a corresponding region corresponding to the region of interest in the image to be processed;
replacing the corresponding region with the region of interest to obtain a fused image;
and updating the pixel value of each pixel point in the fused image by using an image fusion algorithm to generate a target image.
2. The method according to claim 1, wherein the generating a target image by updating the pixel value of each pixel point in the fused image by using an image fusion algorithm comprises:
determining a gradient field of the fused image;
determining a divergence field of the fused image based on the gradient field;
and updating the pixel value of each pixel point in the fused image by using a Poisson image fusion algorithm based on the divergence field to generate a target image.
3. The method according to claim 1, wherein the performing face detection on the image to be processed to obtain a first face image comprises:
zooming an image to be processed to different scales to obtain a plurality of zoomed images, wherein the zoomed images comprise the image to be processed;
and inputting the plurality of scaled images into a pre-trained multi-task detection model to obtain a first face image for representing a face region in the image to be processed, wherein the multi-task detection model is used for carrying out face detection on the image, and is obtained by training a multi-task convolutional neural network.
4. The method of claim 1, wherein the first facial image is a makeup-free facial image;
wherein the performing makeup processing on the first face image to generate a second face image comprises:
acquiring a makeup reference image, wherein the makeup reference image is a face image with a target makeup;
inputting the first face image and the makeup reference image into a pre-trained makeup processing model to obtain a second face image corresponding to the first face image, wherein the second face image is provided with the target makeup.
5. The method of claim 4, wherein the makeup processing model is trained by:
obtaining a sample set, wherein each group of samples in the sample set comprises a sample image with makeup and a sample image without makeup, and each sample image is a human face image;
extracting a pre-established makeup generation adversarial network, wherein the makeup generation adversarial network comprises a generation model, a first discrimination model and a second discrimination model, the generation model is used for generating a makeup-removing image corresponding to the with-makeup sample image and a makeup-applying image corresponding to the makeup-free sample image, the first discrimination model is used for judging whether the makeup-removing image generated by the generation model is a makeup-free sample image, and the second discrimination model is used for judging whether the makeup-applying image generated by the generation model is a with-makeup sample image;
and training the makeup generation adversarial network in an adversarial training manner based on the sample set, and taking the trained generation model as the makeup processing model.
6. The method of claim 5, wherein training the makeup generation adversarial network in an adversarial training manner based on the sample set comprises:
the following training steps are performed iteratively:
fixing parameters of the generation model, and training the first discrimination model and the second discrimination model by using a machine learning method based on the sample set;
and fixing parameters of the first discrimination model and the second discrimination model, and training the generation model by using a machine learning method based on the sample set.
7. The method according to claim 1, wherein the detecting a region of interest in the second face image and determining a corresponding region in the image to be processed corresponding to the region of interest comprises:
detecting a region of interest in the second face image by using an image mask;
determining a first coordinate of the central point of the region of interest in the second face image;
determining a second coordinate corresponding to the first coordinate in the image to be processed;
and determining a corresponding region corresponding to the region of interest in the image to be processed based on the second coordinates, wherein the region of interest corresponds to pixels in the corresponding region one to one.
8. The method of claim 1, wherein replacing the corresponding region with the region of interest, resulting in a fused image, comprises:
and taking each pixel in the corresponding region as a target pixel one by one, determining a corresponding pixel corresponding to the target pixel in the region of interest, and replacing the pixel value of the target pixel with the pixel value of the corresponding pixel.
9. An image processing apparatus, characterized in that the apparatus comprises:
the face detection unit is configured to perform face detection on the image to be processed to obtain a first face image;
a makeup processing unit configured to perform makeup processing on the first face image, generating a second face image;
a determining unit configured to detect a region of interest in the second face image and determine a corresponding region corresponding to the region of interest in the image to be processed;
a replacement unit configured to replace the corresponding region with the region of interest, resulting in a fused image;
the generating unit is configured to update the pixel value of each pixel point in the fused image by using an image fusion algorithm to generate a target image.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 8 when executing a program stored in the memory.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202010100146.9A 2020-02-18 2020-02-18 Image processing method, image processing device, electronic equipment and computer readable storage medium Pending CN111325657A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010100146.9A CN111325657A (en) 2020-02-18 2020-02-18 Image processing method, image processing device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010100146.9A CN111325657A (en) 2020-02-18 2020-02-18 Image processing method, image processing device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111325657A true CN111325657A (en) 2020-06-23

Family

ID=71172696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010100146.9A Pending CN111325657A (en) 2020-02-18 2020-02-18 Image processing method, image processing device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111325657A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200013212A1 (en) * 2017-04-04 2020-01-09 Intel Corporation Facial image replacement using 3-dimensional modelling techniques
CN108257084A (en) * 2018-02-12 2018-07-06 北京中视广信科技有限公司 A kind of automatic cosmetic method of lightweight face based on mobile terminal
CN108898546A (en) * 2018-06-15 2018-11-27 北京小米移动软件有限公司 Face image processing process, device and equipment, readable storage medium storing program for executing
CN109859288A (en) * 2018-12-25 2019-06-07 北京飞搜科技有限公司 Based on the image painting methods and device for generating confrontation network
CN109886881A (en) * 2019-01-10 2019-06-14 中国科学院自动化研究所 Face dressing minimizing technology

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784773A (en) * 2020-07-02 2020-10-16 清华大学 Image processing method and device and neural network training method and device
CN111767924A (en) * 2020-07-03 2020-10-13 杭州睿琪软件有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
WO2022002002A1 (en) * 2020-07-03 2022-01-06 杭州睿琪软件有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN111767924B (en) * 2020-07-03 2024-01-26 杭州睿琪软件有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN111754396B (en) * 2020-07-27 2024-01-09 腾讯科技(深圳)有限公司 Face image processing method, device, computer equipment and storage medium
CN111754396A (en) * 2020-07-27 2020-10-09 腾讯科技(深圳)有限公司 Face image processing method and device, computer equipment and storage medium
WO2022022154A1 (en) * 2020-07-27 2022-02-03 腾讯科技(深圳)有限公司 Facial image processing method and apparatus, and device and storage medium
CN112348765A (en) * 2020-10-23 2021-02-09 深圳市优必选科技股份有限公司 Data enhancement method and device, computer readable storage medium and terminal equipment
CN112767287A (en) * 2021-03-10 2021-05-07 百果园技术(新加坡)有限公司 Model training method, image processing method, device, equipment and medium
CN113344776A (en) * 2021-06-30 2021-09-03 北京字跳网络技术有限公司 Image processing method, model training method, device, electronic device and medium
CN113570689A (en) * 2021-07-28 2021-10-29 杭州网易云音乐科技有限公司 Portrait cartoon method, apparatus, medium and computing device
CN113570689B (en) * 2021-07-28 2024-03-01 杭州网易云音乐科技有限公司 Portrait cartoon method, device, medium and computing equipment
CN113887384A (en) * 2021-09-29 2022-01-04 平安银行股份有限公司 Pedestrian trajectory analysis method, device, equipment and medium based on multi-trajectory fusion
CN113822245B (en) * 2021-11-22 2022-03-04 杭州魔点科技有限公司 Face recognition method, electronic device, and medium
CN113822245A (en) * 2021-11-22 2021-12-21 杭州魔点科技有限公司 Face recognition method, electronic device, and medium
WO2023239302A1 (en) * 2022-06-10 2023-12-14 脸萌有限公司 Image processing method and apparatus, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN111325657A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
JP7446457B2 (en) Image optimization method and device, computer storage medium, computer program, and electronic equipment
JP7301092B2 (en) Image recognition method, apparatus, electronic device, computer storage medium, and program
CN111507333B (en) Image correction method and device, electronic equipment and storage medium
JP6161326B2 (en) Image processing apparatus, image processing method, and program
CN110503704B (en) Method and device for constructing three-dimensional graph and electronic equipment
JP7419080B2 (en) computer systems and programs
US11087514B2 (en) Image object pose synchronization
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112001285B (en) Method, device, terminal and medium for processing beauty images
CN111127309A (en) Portrait style transfer model training method, portrait style transfer method and device
CN116958492B (en) VR editing method for reconstructing three-dimensional base scene rendering based on NeRf
US20240020810A1 (en) UNIVERSAL STYLE TRANSFER USING MULTl-SCALE FEATURE TRANSFORM AND USER CONTROLS
KR102628115B1 (en) Image processing method, device, storage medium, and electronic device
Zhang et al. A light dual-task neural network for haze removal
CN113793259B (en) Image zooming method, computer device and storage medium
CN114638375A (en) Video generation model training method, video generation method and device
CN114331902A (en) Noise reduction method and device, electronic equipment and medium
CN111476741B (en) Image denoising method, image denoising device, electronic equipment and computer readable medium
CN111836058A (en) Method, device and equipment for real-time video playing and storage medium
Peng et al. Mpib: An mpi-based bokeh rendering framework for realistic partial occlusion effects
CN112714337A (en) Video processing method and device, electronic equipment and storage medium
WO2024041235A1 (en) Image processing method and apparatus, device, storage medium and program product
US20220398704A1 (en) Intelligent Portrait Photography Enhancement System
WO2022258013A1 (en) Image processing method and apparatus, electronic device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination