WO2022160701A1 - Special effect generation method, apparatus, device and storage medium - Google Patents

Special effect generation method, apparatus, device and storage medium

Info

Publication number
WO2022160701A1
Authority
WO
WIPO (PCT)
Prior art keywords
map
face
hair
image
mask
Prior art date
Application number
PCT/CN2021/115411
Other languages
English (en)
French (fr)
Inventor
吴文岩
唐斯伟
郑程耀
张丽
钱晨
Original Assignee
北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司
Publication of WO2022160701A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular, to a special effect generation method, apparatus, device and storage medium.
  • the present disclosure provides a special effect generation method, apparatus, device and storage medium.
  • a method for generating special effects includes: blurring a hair region in a target face image to obtain a blurred hair image; generating a texture map, a face mask map and a hair mask map according to the target face image; fusing the face mask map and the hair mask map to obtain a fusion mask map; and, based on a fusion coefficient determined according to the fusion mask map, fusing the blurred hair map and the texture map to obtain a special effect map of the target face image.
  • determining the fusion coefficient according to the fusion mask map includes: based on different regions in the fusion mask map, respectively determining the fusion coefficients of the corresponding regions in the texture map and in the blurred hair map.
  • the fusion of the blurred hair map and the texture map to obtain a special effect map of the target face map includes: determining a first set of pixel values according to the pixel values in the texture map and the fusion coefficient of the texture map; determining a second set of pixel values according to the pixel values in the blurred hair map and the fusion coefficient of the blurred hair map; and determining the pixel values in the special effect map of the target face map based on the first set of pixel values and the second set of pixel values.
  • the method further includes: fusing the special effect map of the target face image with the original face image to obtain the original special effect map.
  • the method further includes: based on the gender information of the human face in the original special effect image, adjusting the outline of the face in the original special effect image, and/or performing beautification processing on the original special effect image.
  • the blurring of the hair region in the target face image to obtain the blurred hair image includes: performing hair segmentation on the target face image to obtain a hair segmentation image; and, based on the hair segmentation image, filling the hair region in the target face image with background pixels to obtain the blurred hair image.
  • the generating of a texture map, a face mask map and a hair mask map according to the target face map includes: acquiring face key point information in the target face map; determining, according to the face key point information, the face heat map corresponding to the target face map; and inputting the face heat map into a pre-trained deep neural network to obtain the texture map, the face mask map and the hair mask map.
  • the deep neural network includes an encoder and a decoder; the encoder is configured to perform an encoding operation on the face heat map according to a convolution filter, and the decoder is configured to perform a decoding operation according to a convolution filter.
  • the acquiring of the face key point information in the target face map includes: acquiring the face key point information in the target face map based on a pre-trained face key point detection network; the face key point detection network is obtained by training on sample face maps, wherein the sample face maps include face images whose face angle is greater than a preset angle threshold.
  • the method further includes: performing color migration processing on the face region in the texture map based on the color of the face region in the target face map, so that the color of the face region in the texture map is consistent with the color of the face region in the target face map.
  • performing color migration processing on the face region in the texture map based on the color of the face region in the target face map includes: obtaining a first color average value according to the color values of the pixels in the face region in the texture map; obtaining a second color average value according to the color values of the pixels in the face region in the target face map; and updating the color values of the pixels in the face region in the texture map based on the first color average value and the second color average value.
  • the method further includes: adjusting the pixel values of the eye region and the mouth region in the fusion mask map.
  • an apparatus for generating special effects includes: a blurring processing module for blurring a hair region in a target face image to obtain a blurred hair image; a generating module for generating a texture map, a face mask map and a hair mask map according to the target face map; and a first fusion module for fusing the face mask map and the hair mask map to obtain a fusion mask map.
  • the second fusion module is configured to fuse the fuzzy hair map and the texture map based on the fusion coefficient determined according to the fusion mask map to obtain a special effect map of the target face map.
  • the second fusion module includes: a fusion coefficient determination sub-module, configured to respectively determine, based on different regions in the fusion mask map, the fusion coefficients of the corresponding regions in the texture map and in the blurred hair map.
  • when the second fusion module is configured to fuse the blurred hair image and the texture image to obtain a special effect image of the target face image, this includes: determining a first set of pixel values according to the pixel values in the texture map and the fusion coefficient of the texture map; determining a second set of pixel values according to the pixel values in the blurred hair map and the fusion coefficient of the blurred hair map; and determining the pixel values in the special effect map of the target face map based on the first pixel value set and the second pixel value set.
  • the apparatus further includes a third fusion module, configured to fuse the special effect image of the target face image with the original face image to obtain the original special effect image.
  • the apparatus further includes an adjustment processing module, configured to adjust the outline of the face in the original special effect image based on the gender information of the face in the original special effect image, and/or to perform beautification processing on the original special effect image.
  • when the blurring processing module is used for blurring the hair region in the target face image to obtain a blurred hair image, this includes: performing hair segmentation on the target face image to obtain a hair segmentation map; and, based on the hair segmentation map, filling the hair region in the target face map with background pixels to obtain the blurred hair map.
  • the generation module includes: a face key point sub-module for acquiring face key point information in the target face map; a face heat map sub-module for determining, according to the face key point information, the face heat map corresponding to the target face map; and a deep neural network sub-module for inputting the face heat map into a pre-trained deep neural network to obtain the texture map, the face mask map and the hair mask map.
  • the deep neural network includes an encoder and a decoder; the encoder is configured to perform an encoding operation on the face heat map according to a convolution filter, and the decoder is configured to perform a decoding operation according to a convolution filter.
  • when the face key point sub-module is used to obtain the face key point information in the target face map, this includes: acquiring the face key point information in the target face map based on a pre-trained face key point detection network; the face key point detection network is obtained by training on sample face maps, wherein the sample face maps include face images whose face angle is greater than a preset angle threshold.
  • the device further includes: a color migration module, configured to perform color migration processing on the face region in the texture map based on the color of the face region in the target face map, so as to make the color of the face region in the texture map consistent with the color of the face region in the target face map.
  • when the color migration module is configured to perform color migration processing on the face region in the texture map based on the color of the face region in the target face map, this includes: obtaining a first color average value according to the color values of the pixels in the face region in the texture map; obtaining a second color average value according to the color values of the pixels in the face region in the target face map; and updating the color values of the pixels in the face region in the texture map based on the first color average value and the second color average value.
  • the apparatus further includes: a pixel value adjustment module, configured to adjust the pixel values of the eye region and the mouth region in the fusion mask map.
  • a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the special effect generation method described in the first aspect when executing the program.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements any one of the special effect generation methods described in the first aspect.
  • a computer program product including a computer program, which implements any one of the special effect generation methods described in the first aspect when the program is executed by a processor.
  • a blurred hair image is obtained by blurring the hair region of the target face image, and a texture image, a face mask image and a hair mask image are generated according to the target face image; the face mask image and the hair mask image are further fused to obtain a fusion mask image, so that the blurred hair image and the texture image can be fused according to the fusion coefficient determined by the fusion mask image to generate a more realistic and natural special effect image.
  • FIG. 2 is a schematic diagram of a fuzzy hair diagram according to an exemplary embodiment
  • FIG. 3 is a schematic diagram of a hair segmentation diagram according to an exemplary embodiment
  • FIG. 4 is a schematic diagram of a texture map, a face mask map, and a hair mask map according to an exemplary embodiment
  • FIG. 5A is a schematic diagram of a fusion mask diagram according to an exemplary embodiment
  • FIG. 5B is a schematic diagram of yet another fusion mask diagram according to an exemplary embodiment
  • FIG. 6 is a schematic diagram of a special effect diagram according to an exemplary embodiment
  • FIG. 7 is a flow chart of processing a target face map according to an exemplary embodiment
  • FIG. 8 is a schematic diagram of a face heat map according to an exemplary embodiment
  • FIG. 9 is a schematic diagram of a deep neural network according to an exemplary embodiment
  • FIG. 10 is a flowchart of yet another special effect generation method according to an exemplary embodiment
  • FIG. 11 is a schematic diagram of an image processing flow according to an exemplary embodiment
  • FIG. 12 is a schematic diagram of an apparatus for generating special effects according to an exemplary embodiment
  • FIG. 13 is a schematic diagram of yet another special effect generating apparatus according to an exemplary embodiment
  • Fig. 14 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish the same type of information from each other.
  • first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure.
  • word "if” as used herein can be interpreted as "at the time of” or "when” or "in response to determining.”
  • special effects generation has gradually become a hot topic in computer vision and image processing.
  • special effects generation technology has important applications in many image generation fields.
  • special effects such as gender conversion, style transfer, and ornament addition can be realized for image processing.
  • a special effect function of gender conversion may be provided to the user, so as to realize gender conversion in the short video or photo obtained by shooting.
  • a feminine photo can be obtained from a male photo taken, or a masculine photo can be obtained from a female photo.
  • the present disclosure provides a special effect generation method, which can blur the hair area in a target face image to obtain a blurred hair image; generate a texture map, a face mask map and a hair mask map according to the target face image; fuse the face mask map and the hair mask map to obtain a fusion mask map; and, based on the fusion coefficient determined according to the fusion mask map, fuse the blurred hair map and the texture map to obtain the special effect map of the target face map.
  • FIG. 1 is a flowchart of a method for generating special effects according to an embodiment provided by the present disclosure. As shown in Figure 1, the process includes:
  • Step 101 blurring the hair region in the target face image to obtain a blurred hair image.
  • the target face map is an image that contains the target face and is to be processed with special effects.
  • an image frame containing a target face can be acquired from a video recording device as a target face map to be subjected to special effects processing according to the special effect generation method of the present disclosure.
  • the target face can include the first face that appears from any direction in the image.
  • the target face may include the first face that appears from the left in the image, or the first face that appears from the right in the image.
  • the target face may include all faces in the image, or include faces in the image that meet specific requirements.
  • the specific requirement may be a pre-defined condition for screening the target face in the image.
  • the face in the image may be selected by the user as the target face that meets the specific requirements.
  • the specific requirement can be a specific face attribute, and only a face in the image that satisfies the specific face attribute can be used as the target face.
  • a specific requirement may be pre-defined as the completeness of the human face, and a face whose facial integrity meets the requirement may be used as the target face.
  • before blurring the hair region in the target face image, the method further includes: acquiring an original face image to be processed; and adjusting the original face image to obtain the target face map of a preset size and/or a preset angle.
  • the image including the target face collected by the image collection device may be referred to as the original face map. Since the size of the original face map is usually not uniform, it is inconvenient to further detect or process the target face in the image. For example, in the case of processing the target face image based on the deep neural network, the deep neural network usually requires the size of the input image to be consistent.
  • the original face image can be adjusted to obtain a target face image conforming to the preset size.
  • the adjustment processing may include cropping processing of the original face image.
  • the original face image can be cropped to obtain an image that meets the preset size requirements as the target face image.
  • the original face image can be cropped into a target face image with a resolution of 512*384.
  • the adjustment process may include angular adjustment of the original face map. Since the angles of the target faces in the original face map are different, the direction of the generated special effects may be inconsistent, which affects the user's perception of the special effects, so it is necessary to adjust the angle of the target face.
  • the angle of the original face image can be adjusted to obtain the target face image that meets the preset angle requirements.
  • the preset angle requirements can be customized in advance according to the actual special effect direction. Exemplarily, it may be pre-defined that the target face does not have left and right roll in the target face map.
  • ensuring that the target face is in the center of the image during the adjustment process can be implemented in various ways, which is not limited in this embodiment.
  • the face frame of the target face in the original face image can be detected, and the range of the face frame can be expanded outward based on the center point of the face frame until a preset size requirement is reached and cropped.
  • the specific manners for ensuring the integrity of the hair region in the image during the adjustment process may include various methods, which are not limited in this embodiment.
  • the size range or inclination angle of the target face can be determined, and the range of the hair region corresponding to the target face in the image can be determined in combination with empirical values, so that the hair region is included during cropping.
  • a deep neural network can be trained with training samples in advance to obtain a hair segmentation model that can be used to identify hair regions in an image, and the hair segmentation model can determine the hair region of the target face in the image, so as to determine the cropping extent of the hair region.
  • the adjustment processing method for the original face image may include various methods, and any adjustment processing method that can obtain an image of a preset size can be used as the adjustment processing method in this embodiment.
  • the size of the target face image obtained in this way is uniform, which is convenient for special effects processing.
  • performing adjustment processing on the original face image to obtain the target face image of a preset size and/or preset angle includes: cropping the original face image to obtain a cropped face image; and performing affine transformation on the cropped face image to obtain the target face image of the preset size.
  • the original face image may first be cropped to obtain a cropped face image including the target face. If the size of the cropped face image conforms to the preset size, the cropped face image can be used as the target face image; if the size of the cropped face image does not conform to the preset size, affine transformation can be performed on the cropped face image to obtain a transformed image of the preset size as the target face image. In this manner, when the cropped image does not conform to the preset size, the cropped image can be subjected to affine transformation to obtain a target face image of a uniform size, so as to facilitate special effects processing on the target face image.
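  • A minimal Python/OpenCV sketch of such a crop-and-affine-transform step is given below for illustration only; it assumes a face box (x, y, w, h) from an unspecified detector, treats the 512*384 preset size as height 512 by width 384, and uses an illustrative outward-expansion factor. The helper name is not part of the disclosure.

```python
import cv2
import numpy as np

TARGET_H, TARGET_W = 512, 384  # preset size mentioned in the description (assumed H x W)

def adjust_to_target(original: np.ndarray, face_box: tuple) -> np.ndarray:
    """Crop around the face box center, then apply an affine warp to the preset size."""
    x, y, w, h = face_box
    cx, cy = x + w / 2.0, y + h / 2.0
    # Expand the face box outward so the hair region and some background are included.
    half_h = int(h * 1.5)                            # expansion factor is illustrative
    half_w = int(half_h * TARGET_W / TARGET_H)       # keep the 512:384 aspect ratio
    x0, y0 = max(int(cx - half_w), 0), max(int(cy - half_h), 0)
    x1 = min(int(cx + half_w), original.shape[1])
    y1 = min(int(cy + half_h), original.shape[0])
    cropped = original[y0:y1, x0:x1]
    # Affine transform (here a plain scaling) to the uniform preset size.
    scale_x = TARGET_W / cropped.shape[1]
    scale_y = TARGET_H / cropped.shape[0]
    M = np.float32([[scale_x, 0, 0], [0, scale_y, 0]])
    return cv2.warpAffine(cropped, M, (TARGET_W, TARGET_H))
```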
  • the hair area in the target face image can be blurred, for example, the hair area is occluded or blurred to hide the hair pixels to obtain a blurred hair image.
  • as shown in FIG. 2, it is a blurred hair image obtained by blurring the hair region of the target face image.
  • the target face image can be subjected to hair segmentation processing to obtain a hair segmentation image; based on the hair segmentation image, the hair region in the target face image is filled with background pixels to obtain the blurred hair image.
  • the hair region in the target face map can be segmented to obtain a hair segment map.
  • Figure 3 shows a hair segmentation map.
  • the specific manner of segmenting the hair region in the target face map is not limited in this embodiment. Exemplarily, based on a learnable machine learning model or a neural network model, a segmentation model that meets the requirements and can be used to segment the hair region in the target face image can be pre-trained, so that the target face image can be input into the segmentation model and the corresponding hair segmentation map can be output from the segmentation model.
  • the hair region corresponding to the target face in the target face map can be further determined.
  • the pixels of the hair region in the target face image can be further filled as background pixels, so that the background pixels are used to block the hair region in the target face image to obtain a blurred hair image.
  • the original pixels of the hair area in the target face image are replaced with background pixels, that is, the background pixels are used to occlude the hair area, and finally a blurred hair map is obtained.
  • the background pixel may be a pixel on the background image behind the human object in the face image.
  • any pixel adjacent to or not adjacent to the hair region in the target face image can be filled into the hair region as a background pixel, so as to realize the blurring of the hair region in the target face image.
  • any pixel that can be used to block the hair region can be used as a background pixel, and this embodiment does not limit the specific value of the background pixel, nor does it limit the acquisition method of the background pixel.
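  • The background-pixel filling described above can be sketched as follows, assuming a binary hair segmentation mask (1 = hair) and simply taking an arbitrary non-hair pixel as the background pixel; that choice is illustrative, since the description does not limit how the background pixel is obtained.

```python
import numpy as np

def blur_hair(target_face: np.ndarray, hair_mask: np.ndarray) -> np.ndarray:
    """Occlude the hair region by filling it with a background pixel value."""
    blurred = target_face.copy()
    ys, xs = np.where(hair_mask == 0)             # all non-hair pixels
    background_pixel = target_face[ys[0], xs[0]]  # an arbitrary non-hair pixel (illustrative)
    blurred[hair_mask == 1] = background_pixel    # replace hair pixels with the background value
    return blurred
```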
  • Step 102 according to the target face map, generate a texture map, a face mask map and a hair mask map.
  • the target face image can be used as the input of a pre-trained deep neural network, and the deep neural network outputs the texture image, face mask image and hair mask image corresponding to the target face image .
  • a large number of sample face maps can be collected in advance as training samples, and the texture map, face mask map and hair mask map corresponding to the sample face map can be used as label values.
  • the sample face map can be input into the deep neural network to be trained, and the predicted texture map, face mask map and hair mask map are output by the deep neural network; the network parameters are adjusted according to the difference between the predicted texture map, face mask map and hair mask map and the corresponding label values.
  • the face heat map corresponding to the target face map can be used as the input of the pre-trained deep neural network, and the deep neural network outputs the corresponding texture map, face mask map and hair mask map according to the face heat map.
  • the face key point information in the target face map can be obtained first; according to the face key point information, the face heat map corresponding to the target face map can be determined; thus, the face heat map can be input into the pre-trained deep neural network.
  • a large number of sample face maps can be collected in advance, and the corresponding face heat map can be obtained as the input of the deep neural network according to the face key point information in the sample face map.
  • the face mask map and the hair mask map corresponding to the sample face map are manually marked, and the texture map corresponding to the sample face map is determined and marked based on the face mask map and the hair mask map.
  • the face heat map corresponding to the sample face map is input into the deep neural network to be trained, and the predicted texture map, face mask map and hair mask map are output by the deep neural network; the network parameters are adjusted according to the difference between the output results and the labeled label values.
  • the corresponding texture map, face mask map and hair mask map can be generated according to the target face map.
  • the face mask map is used to represent the face area in the final special effect map
  • the hair mask map is used to represent the hair area in the final special effect map.
  • in the face mask image, the pixel value of the face area is 1 and the pixel value of other areas is 0; in the hair mask image, the pixel value of the hair area is 1 and the pixel value of other areas is 0.
  • the deep neural network can output a corresponding texture map ImgTexture, a face mask map ImgFace-mask and a hair mask map ImgHair-mask according to the target face map ImgTarget.
  • the texture image ImgTexture is a face image with a different gender than the target face image ImgTarget.
  • the texture map ImgTexture generated in this embodiment may, for example, be a texture map containing female facial features.
  • in the face mask image ImgFace-mask, the pixel value of the face area is “1” and the pixel value of other parts is “0”, so as to distinguish the face area in the special effect image.
  • in the hair mask image ImgHair-mask, the pixel value of the hair area is “1” and the pixel value of other parts is “0”, so as to distinguish the hair area in the special effect image.
  • Step 103 fuse the face mask map and the hair mask map to obtain a fusion mask map.
  • the face mask map and the hair mask map can be fused to obtain a fusion mask map.
  • the corresponding pixels of the face mask map and the hair mask map may be added to obtain a fusion mask map.
  • the pixel values in the face mask map and the hair mask map may be weighted and summed based on a preset weight value, so as to change the face region or hair region in the fusion mask map pixel value.
  • the pixel value of the face region in the fusion mask image can be updated to “0.5” by means of weighted summation, the pixel value of the hair region in the fusion mask image can be kept as “1”, and the pixel value of other parts can be kept as “0”.
  • face regions, hair regions and other background regions can be distinguished.
  • the face mask image ImgFace-mask shown in FIG. 4 and the hair mask image ImgHair-mask may be fused to obtain the fusion mask image shown in FIG. 5A .
  • the method further includes: adjusting the pixel values of the eye region and the mouth region in the fusion mask image. Therefore, the eye features and mouth features of the target face can be retained in the special effect map obtained based on the blurred hair map.
  • the pixel values of the eye region and the mouth region in the fusion mask map can be adjusted to obtain the final fusion mask map.
  • the corresponding eye region and mouth region in the fusion mask map can be determined according to the face key points corresponding to the target face.
  • the pixel values of the eye region and the mouth region in the fusion mask map can be updated.
  • the pixel values of the eye region and mouth region in the fusion mask map can be updated to "0". In this way, the eye region and the mouth region can be distinguished from the face region, so that the eye features and mouth features of the target face can be preserved in the process of generating the special effect map.
  • FIG. 5B it is a fusion mask image after the pixel values of the eye area and the mouth area are updated in FIG. 5A .
  • in FIG. 5A, the pixel values of the face region, including the eye region and the mouth region, are all 0.5, so the eye region and the mouth region cannot be distinguished from the rest of the face region.
  • the pixel values of the mouth area and the eye area may be updated to "0", as shown in FIG. 5B .
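  • A minimal sketch of building such a fusion mask is shown below; it assumes binary face and hair masks and optional binary eye and mouth masks derived from the face key points, and uses the example values above (face 0.5, hair 1, eyes/mouth/background 0). The weight values and helper name are illustrative.

```python
import numpy as np

def build_fusion_mask(face_mask, hair_mask, eye_mask=None, mouth_mask=None):
    """Weighted sum of the face and hair masks, then zero out the eyes and mouth."""
    # Face pixels end up at 0.5, hair pixels at 1.0, background at 0.
    fusion = 0.5 * face_mask.astype(np.float32) + 1.0 * hair_mask.astype(np.float32)
    fusion = np.clip(fusion, 0.0, 1.0)
    if eye_mask is not None:
        fusion[eye_mask == 1] = 0.0    # keep the target face's eye features
    if mouth_mask is not None:
        fusion[mouth_mask == 1] = 0.0  # keep the target face's mouth features
    return fusion
```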
  • the color of the face region in the texture map generated from the target face map is consistent with the color of the face region in the training sample.
  • the colors of the face area in actual target face maps are various. If the texture map generated by the deep neural network is fused directly, the color of the face in the generated special effect map may differ too much from the actual color of the target face and from the color of the neck area, resulting in an unrealistic special effect image.
  • color migration processing may also be performed on the face region in the texture map based on the color of the face region in the target face map, so that the color of the face region in the texture map is consistent with the color of the face region in the target face map.
  • the first color average value can be obtained according to the color value of the pixels in the face region in the texture map;
  • the second color average value can be obtained according to the color values of the pixels in the face region in the target face map; based on the first color average value and the second color average value, the color values of the pixels in the face area in the texture map are updated.
  • the color value of a pixel is used to characterize the color feature of the pixel. For example, in Lab (a color model established according to an international standard for measuring color formulated by the International Commission on Illumination (CIE) in 1931 and improved in 1976), the Lab value of each pixel can be used as the color value of this embodiment.
  • the color of the pixels in the face region in the texture map after color transfer processing and the color of the pixels in the face region in the target face map tend to be consistent in terms of visual effects.
  • after color migration processing, the difference between the Lab values of the pixels in the face region in the texture map and the Lab values of the pixels in the face region in the target face map is close to 0, so in terms of visual effect, the color of the face region in the texture map after the color transfer process is consistent with the color of the face region in the target face map.
  • the Lab values of the pixels in the face region in the texture map can be averaged to obtain the average Lab value of the pixels in the face region in the texture map;
  • similarly, the Lab values of the pixels in the face region in the target face map are averaged to obtain the average Lab value of the face region in the target face map.
  • the average Lab value of the face region in the texture map can be subtracted from the Lab value of each pixel in the face region in the texture map, and the average Lab value of the face region in the target face map can then be added, giving the updated Lab value of the pixel; this implements the update of the Lab values of the pixels in the face region in the texture map.
  • the color migration processing is performed on the face area in the texture image in the Lab color mode, so that the color of the face in the texture image can be made consistent with the color of the face in the target face image. This prevents the face color in the generated special effect image from differing too much from the actual color of the target face and from the neck area, so that the generated special effect image is more real and natural.
  • color migration processing may be performed on the face region in the texture map based on the color of the face region in the target face map, so that the face region in the texture map and the face region in the target face map are the same color.
  • the color of the face in the texture image obtained in this way is consistent with the color of the face in the target face image, the color difference between the face and the neck is reduced, and the final special effect image is more realistic and natural.
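  • A hedged sketch of the Lab-space color migration follows, assuming OpenCV's 8-bit Lab conversion and binary face masks for both images; it implements the mean shift described above (subtract the texture-face Lab mean, add the target-face Lab mean). Function and parameter names are illustrative.

```python
import cv2
import numpy as np

def transfer_face_color(texture_bgr, target_bgr, texture_face_mask, target_face_mask):
    """Shift the Lab mean of the texture map's face region toward the target face's."""
    texture_lab = cv2.cvtColor(texture_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    target_lab = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean_texture = texture_lab[texture_face_mask == 1].mean(axis=0)  # first color average value
    mean_target = target_lab[target_face_mask == 1].mean(axis=0)     # second color average value
    # pixel_Lab - mean(texture face) + mean(target face), applied only inside the face region
    texture_lab[texture_face_mask == 1] += mean_target - mean_texture
    texture_lab = np.clip(texture_lab, 0, 255)
    return cv2.cvtColor(texture_lab.astype(np.uint8), cv2.COLOR_LAB2BGR)
```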
  • Step 104 Based on the fusion coefficient determined according to the fusion mask map, the fuzzy hair map and the texture map are fused to obtain a special effect map of the target face map.
  • the fusion coefficients of the fuzzy hair map and the texture map can be determined according to the fusion mask map.
  • the fusion coefficient is used to represent the proportion of different regions in the image participating in the image fusion.
  • the fusion coefficient of the hair region in the blurred hair map is usually smaller, so the pixels corresponding to the hair region in the blurred hair map have less influence on the hair region in the fused image.
  • the manner of determining the fusion coefficient includes determining according to experience or adjusting according to the actual effect diagram.
  • determining the fusion coefficient according to the fusion mask map includes: based on different regions in the fusion mask map, respectively determining the fusion coefficients of the corresponding regions in the texture map and, correspondingly, the fusion coefficients of the corresponding regions in the blurred hair map.
  • a fusion mask map usually includes multiple distinct regions.
  • the fusion mask map shown in FIG. 5B includes a hair area, a face area, an eye area, a mouth area, and a background area, respectively.
  • the fusion coefficients of the corresponding regions in the texture map can be determined according to empirical values, and the fusion coefficients of the corresponding regions in the fuzzy hair map can be determined accordingly.
  • the fusion coefficient of the hair region in the texture map can be determined to be 1, the fusion coefficient of the face region in the texture map can be determined to be 0.5, and the fusion coefficients of the eye region, mouth region and background region in the texture map can be determined as 0.
  • the sum of the fusion coefficient of the texture map corresponding to the same region of the fusion mask map and the fusion coefficient of the corresponding fuzzy hair map is 1.
  • the pixel values of different pixels in the fusion mask map may be determined in advance as fusion coefficients of corresponding pixels in the texture map.
  • the pixel value of the hair region is set to 1
  • the pixel value of the face region is set to 0.5
  • the pixel values of the eye region, mouth region and background region are all set to 0 .
  • the fuzzy hair map and the texture map can be fused to obtain a special effect map of the target face map. As shown in Figure 6, the special effect map of the target face map is obtained.
  • the fusion of the blurred hair image and the texture image to obtain the special effect image of the target face image includes: determining the first set of pixel values according to the pixel values in the texture map and the fusion coefficient of the texture map; determining the second set of pixel values according to the pixel values of the blurred hair map and the fusion coefficient of the blurred hair map; and determining the pixel values in the special effect map of the target face map based on the first set of pixel values and the second set of pixel values.
  • the fusion coefficients corresponding to different regions of the texture map are different.
  • the fusion coefficient of the face region in the texture map is 0.5
  • the fusion coefficient of the hair region is 1
  • the fusion coefficient of the background region is 0.
  • the pixel values of the face region in the texture map can be weighted by 0.5, the pixel values of the hair region in the texture map by 1, and the pixel values of the background region in the texture map by 0, so that the weighted pixel values corresponding to the complete texture map form the first pixel value set.
  • the fusion coefficients corresponding to different regions in the fuzzy hair map are also different.
  • the fusion coefficient of the face region in the blurred hair map is 0.5
  • the fusion coefficient of the hair region is 0,
  • the fusion coefficient of the background region is 1.
  • the pixel values of the face area in the blurred hair image can be weighted by 0.5, the pixel values of the hair area in the blurred hair image by 0, and the pixel values of the background area in the blurred hair image by 1, so that the weighted pixel values corresponding to the complete blurred hair image form the second pixel value set.
  • the corresponding pixel values in the two pixel sets can then be added to obtain a complete set of pixel values, that is, each pixel in the special effect image of the target face image is obtained, and thus the special effect map of the target face map is obtained.
  • part of the face features in the texture map can be preserved, while part of the face features in the fuzzy hair map can be preserved.
  • fusion processing is performed according to the determined fusion coefficients.
  • the forehead region fusion coefficient, the chin region fusion coefficient, the ear region fusion coefficient, and the like may be predetermined.
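  • The weighted fusion described above can be sketched as follows, assuming the fusion mask value at each pixel is used as the texture-map coefficient and 1 minus that value as the blurred-hair-map coefficient, so the two coefficients for a pixel sum to 1 as stated; the helper name is illustrative.

```python
import numpy as np

def fuse(texture, blurred_hair, fusion_mask):
    """Blend the texture map and blurred hair map with per-pixel fusion coefficients."""
    # fusion_mask holds the texture-map coefficient (hair 1, face 0.5, eyes/mouth/background 0).
    alpha = fusion_mask.astype(np.float32)[..., None]              # broadcast over color channels
    first_set = alpha * texture.astype(np.float32)                 # weighted texture pixels
    second_set = (1.0 - alpha) * blurred_hair.astype(np.float32)   # weighted blurred-hair pixels
    return (first_set + second_set).astype(np.uint8)               # special effect map
```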
  • the method further includes: fusing the special effect image of the target face image with the original face image to obtain the original special effect map.
  • the special effect map can be pasted back to the original face map to realize adding special effects on the basis of the original face map.
  • the pixel value of the special effect image of the target face image can be overlaid with the corresponding pixel value of the original face image to obtain the original special effect image.
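  • A minimal sketch of this paste-back step follows, assuming the crop box used during the earlier adjustment step was recorded; that assumption and the helper name are illustrative, since the disclosure does not specify how the mapping back to the original image is implemented.

```python
import cv2
import numpy as np

def paste_back(original, effect, crop_box):
    """Overlay the special effect map onto the original face map."""
    x0, y0, x1, y1 = crop_box                           # region the target face map came from
    resized = cv2.resize(effect, (x1 - x0, y1 - y0))    # undo the earlier resize/affine step
    result = original.copy()
    result[y0:y1, x0:x1] = resized                      # cover the corresponding original pixels
    return result
```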
  • a blurred hair map is obtained by blurring the hair region in the target face map; a texture map, a face mask map and a hair mask map are generated according to the target face map based on a deep neural network; the face mask map and the hair mask map are fused to obtain a fusion mask map, and the fusion coefficient is determined according to the fusion mask map; based on the determined fusion coefficient, the blurred hair map and the texture map are fused to obtain a more natural and real special effect map.
  • the facial contour in the original special effect image may be adjusted based on the gender information of the face in the original special effect image, and/or the original special effect image may be subjected to beautification processing.
  • when the face in the converted original special effect image is male, the face contour in the special effect image can be adjusted to be more angular, so as to conform to the characteristics of a male face contour; or, when the face in the converted original special effect image is female, the facial contour in the special effect image can be adjusted to be softer or rounder to match the features of a female face contour.
  • the original special effect image may be further beautified, for example, the original special effect image may be whitened, beautified, or added with filters, so as to further beautify the original special effect image.
  • step 102 may include the following steps:
  • Step 201 Obtain face key point information in the target face map.
  • face key point detection may be performed on the original face image to obtain face key point information.
  • the position coordinates of 106 face key points are obtained as face key point information.
  • the original face image is cropped to obtain a target face image of preset size including face key points.
  • face key points can be detected on the original face image, and the target face image can be obtained by cropping the original face image on the basis of the detected face key points.
  • face key point detection may also be performed directly on the target face image, so that the face key point information in the target face image is obtained directly.
  • step 201 may include: acquiring the face key point information in the target face map based on a pre-trained face key point detection network; the face key point detection network is obtained by training according to a sample face map, wherein the sample face map includes a face image whose face angle is greater than a preset angle threshold.
  • Face keypoint detection networks include deep neural networks capable of training and learning.
  • the sample face map is the training sample used to train the face keypoint detection network. Before acquiring the face key point information in the target face image, it is necessary to use the sample face image for training in advance to obtain the face key point detection network.
  • the training samples may include sample face images with a face angle less than or equal to a preset angle threshold, and may also include sample face images with a face angle greater than the preset angle threshold.
  • the preset angle threshold may include a face deflection angle of 70 degrees, and the face deflection angle refers to the angle by which the face rotates left and right when facing the face.
  • the preset angle threshold may further include a face pitch angle of 30 degrees, and the face pitch angle refers to the angle at which the face is turned up and down when facing the face.
  • the training samples used to train the face key point detection network include face images whose face angle is greater than a preset angle threshold, so that the trained face key point detection network can detect face key points even at larger face angles. Therefore, based on the key points detected by the face key point detection network, special effects can be generated for a face image whose face angle is greater than a certain angle threshold.
  • Step 202 Determine a face heat map corresponding to the target face map according to the face key point information.
  • a corresponding face heat map can be generated according to the face key point information.
  • exemplarily, Excel, the R language, Python or MATLAB can be used to generate a corresponding face heat map according to the 106 face key points detected from the target face image, as shown in FIG. 8.
  • the face key points in the target face map may be used as key points in the face heat map, thereby obtaining the face heat map.
  • the pixel value corresponding to the key points of the face detected in the target face map can be set to 255, and the pixel values other than the key points of the face in the target face map can be set to 0, that is, the face heat map can be obtained.
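  • A minimal sketch of building the face heat map in this simple way (key point pixels set to 255, all others to 0) is given below; the 106-point layout and the 512*384 size are taken from the examples above, and the function name is illustrative.

```python
import numpy as np

def face_heat_map(landmarks, height=512, width=384):
    """Build a face heat map from detected key points."""
    # landmarks is an (N, 2) array of (x, y) key point coordinates, e.g. the 106 points above.
    heat = np.zeros((height, width), dtype=np.uint8)
    for x, y in landmarks.astype(int):
        if 0 <= y < height and 0 <= x < width:
            heat[y, x] = 255        # key point pixels are set to 255, all others remain 0
    return heat
```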
  • Step 203 Input the face heat map into a pre-trained deep neural network to obtain the texture map, face mask map and hair mask map.
  • the deep neural network before using the face heat map to obtain the texture map, the face mask map, and the hair mask map, the deep neural network can be trained by using training samples in advance. For example, a large number of sample face maps can be collected in advance, and a corresponding face heat map can be obtained according to the face key point information in the sample face map as the input of the deep neural network.
  • the face mask map and the hair mask map corresponding to the sample face map are manually marked, and the texture map corresponding to the sample face map is determined and marked based on the face mask map and the hair mask map.
  • the face heat map corresponding to the sample face map is input into the deep neural network to be trained, and the predicted texture map, face mask map and hair mask map are output by the deep neural network; the network parameters are adjusted according to the difference between the output results and the labeled label values.
  • the face heat map corresponding to the target face map can be input into the trained deep neural network to obtain the corresponding texture map, face mask map and hair mask map.
  • the deep neural network 900 includes an encoder 910 and a decoder 920; the encoder 910 is configured to perform an encoding operation on the face heat map according to a convolution filter; the decoding The device 920 is configured to perform a decoding operation on the face heat map according to the convolution filter.
  • the face heat map can be used as the input of the deep neural network
  • the encoder 910 performs an encoding operation on the face heat map according to the convolution filter
  • the decoder 920 performs a decoding operation according to the convolution filter, and finally the corresponding texture map, face mask map and hair mask map are output.
  • the encoder 910 may include 6 convolution filters, each of which has a convolution kernel size of 4*4 and a stride of 2. Assuming that the feature size of a convolutional layer input is C*H*W, the size becomes (H/2)*(W/2) after filtering. The first 5 convolutional layers are followed by a weight normalizer and a LeakyReLU activator; the last convolutional layer does not have a LeakyReLU activator.
  • the decoder 920 may include 6 convolution filters, each of which has a convolution kernel size of 3*3 and a stride of 1.
  • Each convolutional layer is followed by a weight normalizer and a Sub-pixel Convolution with a magnification of 2.
  • the feature size of a convolutional layer input is C*(H/2)*(W/2), and the size becomes H*W after filtering.
  • the last convolutional layer is followed by a convolutional layer with a kernel size of 3*3 and a stride of 1.
  • the number of output channels is 5.
  • the first three channels are the generated texture map, the fourth channel is the generated face mask map, and the fifth channel is the generated hair mask map.
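  • A hedged PyTorch sketch of an encoder-decoder with these layer counts is given below; the channel widths, the use of weight_norm as the "weight normalizer", the LeakyReLU slope, the padding values, and PixelShuffle as the sub-pixel convolution are all assumptions made for illustration, since the disclosure does not specify them.

```python
import torch
import torch.nn as nn

def enc_block(c_in, c_out, last=False):
    # 4x4 conv, stride 2 halves H and W; weight norm on every layer,
    # LeakyReLU on all but the last encoder layer (per the description).
    layers = [nn.utils.weight_norm(nn.Conv2d(c_in, c_out, 4, stride=2, padding=1))]
    if not last:
        layers.append(nn.LeakyReLU(0.2))   # slope is illustrative
    return nn.Sequential(*layers)

def dec_block(c_in, c_out):
    # 3x3 conv, stride 1, then a sub-pixel convolution (PixelShuffle) that doubles H and W;
    # the conv outputs 4x the target channels so PixelShuffle(2) leaves c_out channels.
    return nn.Sequential(
        nn.utils.weight_norm(nn.Conv2d(c_in, c_out * 4, 3, stride=1, padding=1)),
        nn.PixelShuffle(2),
    )

class TextureMaskNet(nn.Module):
    """Encoder-decoder following the layer counts above; channel widths are illustrative."""
    def __init__(self, in_ch=1, width=64):
        super().__init__()
        chs = [in_ch, width, width * 2, width * 4, width * 4, width * 4, width * 4]
        self.encoder = nn.Sequential(*[
            enc_block(chs[i], chs[i + 1], last=(i == 5)) for i in range(6)
        ])
        dec_chs = [width * 4] * 7
        self.decoder = nn.Sequential(*[
            dec_block(dec_chs[i], dec_chs[i + 1]) for i in range(6)
        ])
        # final 3x3, stride-1 conv producing 5 channels:
        # channels 0-2 texture map, channel 3 face mask, channel 4 hair mask
        self.head = nn.Conv2d(dec_chs[-1], 5, 3, stride=1, padding=1)

    def forward(self, heat_map):
        out = self.head(self.decoder(self.encoder(heat_map)))
        texture, face_mask, hair_mask = out[:, :3], out[:, 3:4], out[:, 4:5]
        return texture, face_mask, hair_mask
```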
  • a deep neural network obtained by pre-training can be used, with the face heat map determined according to the face key point information detected in the target face map as input, to obtain the texture map, face mask map and hair mask map corresponding to the target face map. Therefore, the fusion mask image can be obtained by fusing the obtained face mask image and hair mask image, and the blurred hair image and the texture image can be fused based on the fusion coefficient determined according to the fusion mask image, so as to obtain a more realistic and natural special effect map corresponding to the target face.
  • FIG. 11 includes the target face image Img1, the hair segmentation image Img2, the blurred hair image Img3, the face heat map Img4, the texture map Img5, the face mask map Img6, the hair mask map Img7, the fusion mask map Img8 and the special effect map Img9 of the target face.
  • step 1001 the original face image can be detected to obtain face key point information.
  • the original face image can be detected, and the position coordinates of 106 face key points can be obtained as face key point information.
  • the detection of the original face map may utilize any network model known to those skilled in the art that can detect the key points of the face. Alternatively, it can be trained based on a learnable machine learning model or a neural network model to obtain a network model that can be used to detect the key points of the face in the original face image.
  • step 1002 the original face image is adjusted to obtain a target face image of a preset size and/or a preset angle.
  • the image including the target face collected by the image collection device may be referred to as the original face map. Since the size of the original face map is usually not uniform, it is inconvenient to further detect or process the target face in the image. For example, in the case of processing the target face image based on the deep neural network, the deep neural network usually requires the size of the input image to be consistent.
  • the original face image can be adjusted.
  • the adjustment processing may include cropping processing of the original face image.
  • the adjustment process may include adjusting the angle of the original face image to obtain a target face image that meets the angle requirement.
  • the original face image can be adjusted to obtain the target face image Img1 in FIG. 11 .
  • step 1003 the target face image is subjected to hair segmentation to obtain a hair segmentation image.
  • the deep neural network can be trained with training samples in advance to obtain a hair segmentation model that can be used to identify the hair region in the image, and the hair segmentation model can determine the hair region of the target face in the image.
  • the target face image Img1 as shown in FIG. 11 can be input into a pre-trained hair segmentation model, and the corresponding hair segmentation image Img2 as shown in FIG. 11 is output from the hair segmentation model.
  • step 1004 based on the hair segmentation map, the hair region in the target face map is filled according to the background pixels to obtain a fuzzy hair map.
  • the hair region corresponding to the target face in the target face map can be further determined.
  • the pixels of the hair region in the target face image can be further filled as background pixels, so that the background pixels are used to cover the hair region in the target face image to obtain a blurred hair image.
  • the pixels of the hair region in the target face image Img1 can be refilled with background pixels to obtain the blurred hair image Img3.
  • step 1005 a face heat map corresponding to the target face image is determined according to the face key point information.
  • Excel, R language, Python or MATLAB can be used to generate a corresponding face heat map according to the 106 face key points detected from the target face image.
  • the face key points in the target face map can be used as the key points in the face heat map, thereby obtaining the face heat map.
  • the pixel value corresponding to the key points of the face detected in the target face map can be set to 255, and the pixel values other than the key points of the face in the target face map can be set to 0, that is, the face heat map can be obtained.
  • the corresponding face heat map Img4 can be obtained according to the face key point information corresponding to the target face image Img1 in FIG. 11 .
  • step 1006 the face heat map is input into the pre-trained deep neural network to obtain a texture map, a face mask map and a hair mask map.
  • before using the face heat map to obtain the texture map, face mask map and hair mask map, the deep neural network can be trained with training samples in advance. For example, a large number of sample face maps can be collected in advance, and a corresponding face heat map can be obtained according to the face key point information in each sample face map as the input of the deep neural network. After a deep neural network that meets the requirements has been trained, in this step the face heat map corresponding to the target face image can be input into the trained deep neural network, and the texture map, face mask map and hair mask map corresponding to the target face image can be obtained.
  • this step can input the face heat map Img4 into the deep neural network obtained by pre-training, and obtain the texture map Img5, the face mask map Img6 and the hair mask map Img7.
  • step 1007 color migration processing is performed on the face region in the texture map based on the color of the face region in the target face map.
  • this step needs to perform color migration processing on the face region in the texture map based on the color of the face region in the target face map.
  • the Lab values of the pixels in the face region in the texture map can be averaged to obtain the first Lab average value; the Lab values of the face region pixels in the target face map can be averaged to obtain the second Lab average value.
  • the Lab value of the pixels in the face region in the texture map can be subtracted from the first Lab average value and added to the second Lab average value to update the Lab value of the pixels in the face region in the texture map.
  • step 1008 the face mask map and the hair mask map are fused to obtain a fusion mask map.
  • the corresponding pixels of the face mask map and the hair mask map may be added to obtain a fusion mask map.
  • the pixel values in the face mask map and the hair mask map may be weighted and summed based on a preset weight value, so as to change the face region or hair region in the fusion mask map pixel value.
  • the pixel values of the eye region and mouth region in the fusion mask map can also be adjusted to obtain the final fusion mask map.
  • the pixel values of the eye region and mouth region in the fusion mask map can be updated to "0". In this way, the eye region and the mouth region can be distinguished from the face region, so that the eye features and mouth features of the target face can be preserved in the process of generating the special effect map.
  • the pixel values in the face mask map Img6 and the hair mask map Img7 may be weighted and summed to obtain the fusion mask map Img8.
  • in the fusion mask image Img8, the pixel value of the face region is updated to "0.5", the pixel value of the hair region remains "1", and the pixel values of the background region, mouth region and eye region remain "0".
  • based on these different pixel values, the face region, the hair region, the background region, the mouth region and the eye region can be distinguished from one another (a numpy sketch of building such a fusion mask follows).
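  • as a rough illustration of this mask fusion, the following numpy sketch weights and sums the face and hair masks and zeroes the eye and mouth regions; the function name build_fusion_mask and the mask conventions (H x W arrays in {0, 1}) are illustrative assumptions.

```python
import numpy as np

def build_fusion_mask(face_mask, hair_mask, eye_mask, mouth_mask,
                      face_weight=0.5, hair_weight=1.0):
    """Weighted fusion of the face and hair masks, with the eyes and mouth zeroed.

    The default weights follow the example values in the text
    (face 0.5, hair 1, background/eyes/mouth 0).
    """
    fusion = face_weight * face_mask.astype(np.float32) \
           + hair_weight * hair_mask.astype(np.float32)
    fusion = np.clip(fusion, 0.0, 1.0)
    # zero out the eye and mouth regions so that the eye and mouth features
    # of the target face are preserved in the special effect map
    fusion[eye_mask.astype(bool)] = 0.0
    fusion[mouth_mask.astype(bool)] = 0.0
    return fusion
```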
  • step 1009 based on the fusion coefficient determined according to the fusion mask image, the fuzzy hair image and the texture image are fused to obtain a special effect image of the target face image.
  • the fusion coefficient of the hair region in the texture map can be determined to be 1, the fusion coefficient of the face region in the texture map to be 0.5, and the fusion coefficients of the eye region, mouth region and background region in the texture map to be 0.
  • correspondingly, the fusion coefficient of the hair region in the fuzzy hair map can be determined to be 0, the fusion coefficient of the face region in the fuzzy hair map to be 0.5, and the fusion coefficients of the eye region, mouth region and background region in the fuzzy hair map to be 1.
  • the pixel values of the face region in the texture map can be weighted by 0.5, the pixel values of the hair region in the texture map by 1, and the pixel values of the other regions in the texture map by 0; the resulting pixel values of the complete texture map form the first pixel value set.
  • similarly, the pixel values of the face region in the fuzzy hair image can be weighted by 0.5, the pixel values of the hair region in the fuzzy hair image by 0, and the pixel values of the other regions in the fuzzy hair image by 1; the resulting pixel values of the complete fuzzy hair image form the second pixel value set.
  • after the first pixel value set and the second pixel value set are obtained, the corresponding pixel values in the two sets can be added to obtain a complete pixel value set, that is, each pixel in the special effect image of the target face image is obtained; in other words, the special effect map of the target face map is obtained.
  • taking FIG. 11 as an example, in this step the pixel values in the fuzzy hair map Img3 and the pixel values in the texture map Img5 can each be weighted by the fusion coefficients determined according to the fusion mask map Img8, and the weighted pixel values then added correspondingly to obtain the pixel values in the special effect map Img9 of the target face map, that is, the special effect map Img9 of the target face map (a blending sketch follows).
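  • the weighted blending described above can be sketched as follows, assuming the fusion mask values are used directly as the texture map's fusion coefficients and the blurred-hair map takes the complementary coefficients; the names and array conventions are illustrative.

```python
import numpy as np

def blend_with_fusion_mask(texture, fuzzy_hair, fusion_mask):
    """Blend the texture map and the blurred-hair map using the fusion mask.

    texture, fuzzy_hair: H x W x 3 float arrays (e.g. Img5 and Img3).
    fusion_mask: H x W array whose values serve as the texture map's fusion
        coefficients (hair 1, face 0.5, eyes/mouth/background 0); the
        blurred-hair map uses the complementary coefficients 1 - fusion_mask.
    """
    coeff = fusion_mask[..., None].astype(np.float32)
    first_set = coeff * texture                # first pixel value set
    second_set = (1.0 - coeff) * fuzzy_hair    # second pixel value set
    return first_set + second_set              # special effect map, e.g. Img9
```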
  • step 1010 the special effect image of the target face image and the original face image are fused to obtain the original special effect image.
  • after the special effect map of the target face map is obtained, the special effect map can be pasted back onto the original face map, so that the special effect is added on the basis of the original face map.
  • for example, the pixel values in the special effect image of the target face image can overwrite the corresponding pixel values of the original face image to obtain the original special effect image (a short paste-back sketch follows).
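  • a minimal sketch of this paste-back step, assuming the position of the cropped target face inside the original face image is known from the earlier cropping step (the offset bookkeeping is an assumption for illustration):

```python
def paste_back(original_face, effect_crop, offset):
    """Overwrite the corresponding region of the original face image with the
    special effect map of the target face image (e.g. Img9)."""
    top, left = offset                      # assumed crop position bookkeeping
    h, w = effect_crop.shape[:2]
    result = original_face.copy()
    result[top:top + h, left:left + w] = effect_crop
    return result
```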
  • step 1011 based on the gender information of the face in the original special effect image, adjust the outline of the face in the original special effect image, and/or perform beautification processing on the original special effect image.
  • for example, in a gender conversion special effect, when the face in the original special effect image is male, the face contour in the special effect image can be adjusted to be more angular, to match the characteristics of a male face contour;
  • when the face in the original special effect image is female, the face contour in the special effect image can be adjusted to be softer or rounder, to conform to the features of a female face contour.
  • the original special effect image may also be further beautified, for example by whitening, applying makeup effects or adding filters (a hypothetical end-to-end sketch of the overall pipeline is given below).
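  • the following hypothetical sketch chains the helpers from the sketches above (blur_hair_region, face_heat_map, transfer_face_color, build_fusion_mask, blend_with_fusion_mask, paste_back) into an end-to-end pipeline; detect_keypoints, segment_hair, run_effect_network and masks_from_keypoints stand in for the pre-trained models and helpers described in the text and are not APIs defined by the present disclosure.

```python
import numpy as np

def generate_special_effect(original_face, crop_box, detect_keypoints,
                            segment_hair, run_effect_network, masks_from_keypoints):
    """Hypothetical end-to-end sketch of the processing flow.

    crop_box = (top, left, height, width) locates the target face image
    inside the original face image (an assumed bookkeeping detail).
    """
    top, left, h, w = crop_box
    target_face = original_face[top:top + h, left:left + w]        # cropped target face (Img1)
    keypoints = detect_keypoints(target_face)                      # face key point information
    hair_seg = segment_hair(target_face)                           # hair segmentation map (Img2)
    fuzzy_hair = blur_hair_region(target_face, hair_seg)           # blurred hair image (Img3)
    heat = face_heat_map(target_face.shape[:2], keypoints)         # face heat map (Img4)
    texture, face_mask, hair_mask = run_effect_network(heat)       # Img5, Img6, Img7 as numpy arrays
    texture = np.clip(texture, 0, 255).astype(np.uint8)
    texture = transfer_face_color(texture, target_face, face_mask > 0.5)   # color migration
    eye_mask, mouth_mask = masks_from_keypoints(target_face.shape[:2], keypoints)
    fusion = build_fusion_mask(face_mask > 0.5, hair_mask > 0.5,
                               eye_mask, mouth_mask)               # fusion mask map (Img8)
    effect = blend_with_fusion_mask(texture.astype(np.float32),
                                    fuzzy_hair.astype(np.float32), fusion)  # Img9
    return paste_back(original_face, effect.astype(np.uint8), (top, left))  # original special effect image
```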
  • the present disclosure provides an apparatus for generating special effects, and the apparatus can execute the method for generating special effects in any embodiment of the present disclosure.
  • the apparatus may include a blurring processing module 1201, a generating module 1202, a first fusion module 1203 and a second fusion module 1204.
  • the blurring processing module 1201 is used for blurring the hair area in the target face image to obtain a blurred hair image
  • the generating module 1202 is used for generating a texture map, a face mask map and a hair mask map according to the target face image
  • the first fusion module 1203 is used to fuse the face mask map and the hair mask map to obtain a fusion mask map
  • the second fusion module 1204 is used to fuse the fuzzy hair image and the texture image based on the fusion coefficient determined according to the fusion mask map, to obtain a special effect image of the target face image.
  • the second fusion module 1204 includes: a fusion coefficient determination sub-module 1301, configured to determine, based on the different regions in the fusion mask map, the fusion coefficients of the corresponding regions in the texture map and the fusion coefficients of the corresponding regions in the fuzzy hair map.
  • the second fusion module 1204, when used to fuse the fuzzy hair map and the texture map to obtain the special effect map of the target face map, is configured to: determine a first pixel value set according to the pixel values of the texture map and the fusion coefficient of the texture map; determine a second pixel value set according to the pixel values of the fuzzy hair map and the fusion coefficient of the fuzzy hair map; and determine the pixel values in the special effect map of the target face map based on the first pixel value set and the second pixel value set.
  • the apparatus further includes a third fusion module 1302, configured to fuse the special effect image of the target face image with the original face image to obtain the original special effect image.
  • the apparatus further includes an adjustment processing module 1303 for adjusting the outline of the face in the original special effect image based on the gender information of the face in the original special effect image, and /or performing beautification processing on the original special effect image.
  • the blurring processing module 1201, when used to blur the hair region in the target face image to obtain the blurred hair image, is configured to: perform hair segmentation on the target face image to obtain a hair segmentation map; and, based on the hair segmentation map, fill the hair region in the target face map with background pixels to obtain the fuzzy hair map.
  • the generation module 1202 includes: a face key point sub-module 1304, which is used to obtain the face key point information in the target face map; a face heat map sub-module 1305, which is used to determine the face heat map corresponding to the target face map according to the face key point information; and a deep neural network sub-module 1306, which is used to input the face heat map into a pre-trained deep neural network to obtain the texture map, face mask map and hair mask map.
  • the deep neural network includes an encoder and a decoder; the encoder is configured to perform an encoding operation on the face heat map according to a convolution filter, and the decoder is configured to perform a decoding operation on the face heat map according to a convolution filter (an architecture sketch is given below).
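  • as one possible reading of this encoder-decoder, here is a hypothetical PyTorch sketch following the layer configuration given in the detailed description (six 4x4, stride-2 encoder convolutions with weight normalization, and LeakyReLU after the first five; six 3x3, stride-1 decoder convolutions each followed by weight normalization and a 2x sub-pixel convolution, plus a final 3x3 convolution with 5 output channels: three for the texture map, one for the face mask map and one for the hair mask map). The channel widths and the single-channel heat-map input are assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class Encoder(nn.Module):
    """Six 4x4, stride-2 convolutions; weight norm on every layer,
    LeakyReLU after the first five (channel widths are assumptions)."""
    def __init__(self, in_channels=1, base=32):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(6):
            out = base * min(2 ** i, 8)
            layers.append(weight_norm(nn.Conv2d(ch, out, 4, stride=2, padding=1)))
            if i < 5:
                layers.append(nn.LeakyReLU(0.2, inplace=True))
            ch = out
        self.net = nn.Sequential(*layers)
        self.out_channels = ch

    def forward(self, heat_map):
        return self.net(heat_map)


class Decoder(nn.Module):
    """Six 3x3, stride-1 convolutions, each followed by a 2x sub-pixel
    convolution (PixelShuffle); a final 3x3 convolution outputs 5 channels."""
    def __init__(self, in_channels, base=256):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(6):
            out = max(base // (2 ** i), 16)
            layers.append(weight_norm(nn.Conv2d(ch, out * 4, 3, stride=1, padding=1)))
            layers.append(nn.PixelShuffle(2))   # doubles H and W
            ch = out
        layers.append(nn.Conv2d(ch, 5, 3, stride=1, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, features):
        out = self.net(features)
        texture = out[:, 0:3]     # first three channels: texture map
        face_mask = out[:, 3:4]   # fourth channel: face mask map
        hair_mask = out[:, 4:5]   # fifth channel: hair mask map
        return texture, face_mask, hair_mask


# usage sketch: a 1 x 1 x 512 x 384 heat-map tensor in, three maps out
encoder = Encoder()
decoder = Decoder(encoder.out_channels)
texture, face_mask, hair_mask = decoder(encoder(torch.zeros(1, 1, 512, 384)))
```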
  • the face key point sub-module 1304, when used to obtain the face key point information in the target face map, is configured to obtain the face key point information in the target face map based on a pre-trained face key point detection network; the face key point detection network is obtained by training on sample face maps, wherein the sample face maps include face images whose face angles are greater than a preset angle threshold.
  • the apparatus further includes: a color migration module 1307, configured to perform color migration processing on the face region in the texture map based on the color of the face region in the target face map, so that the The color of the face region in the texture map is consistent with the color of the face region in the target face map.
  • the color migration module 1307, when configured to perform color migration processing on the face region in the texture map based on the color of the face region in the target face map, is configured to: obtain a first color average value according to the color values of the pixels in the face region of the texture map; obtain a second color average value according to the color values of the pixels in the face region of the target face map; and update the color values of the pixels in the face region of the texture map based on the first color average value and the second color average value.
  • the apparatus further includes: a pixel value adjustment module 1308, configured to adjust the pixel values of the eye region and the mouth region in the fusion mask image.
  • the present disclosure also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, can implement the special effect generation method of any embodiment of the present disclosure.
  • the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050.
  • the processor 1010 , the memory 1020 , the input/output interface 1030 and the communication interface 1040 realize the communication connection among each other within the device through the bus 1050 .
  • the processor 1010 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute relevant programs so as to implement the technical solutions provided by the embodiments of this specification.
  • the memory 1020 may be implemented in the form of a ROM (Read Only Memory, read-only memory), a RAM (Random Access Memory, random access memory), a static storage device, a dynamic storage device, and the like.
  • the memory 1020 may store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, relevant program codes are stored in the memory 1020 and invoked by the processor 1010 for execution.
  • the input/output interface 1030 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 1040 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices.
  • the communication module may implement communication through wired means (eg, USB, network cable, etc.), or may implement communication through wireless means (eg, mobile network, WIFI, Bluetooth, etc.).
  • Bus 1050 includes a path to transfer information between the various components of the device (eg, processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
  • although the above device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of the present specification, rather than all the components shown in the figures.
  • the present disclosure also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the special effect generation method of any embodiment of the present disclosure can be implemented.
  • non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc., which is not limited in the present disclosure.
  • embodiments of the present disclosure provide a computer program product, comprising computer-readable code; when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the special effect generation method provided by any of the above embodiments.
  • the computer program product can be specifically implemented by hardware, software or a combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A special effect generation method, apparatus, device and storage medium, relating to the technical field of computer vision. The method includes: step 101, blurring a hair region in a target face image to obtain a blurred hair image; step 102, generating a texture map, a face mask map and a hair mask map according to the target face image; step 103, fusing the face mask map and the hair mask map to obtain a fusion mask map; and step 104, fusing the blurred hair image and the texture map based on a fusion coefficient determined according to the fusion mask map, to obtain a special effect image of the target face image.

Description

特效生成方法、装置、设备及存储介质
相关申请的交叉引用
本公开要求于2021年1月29日提交的、申请号为202110130196.6、发明名称为“特效生成方法、装置、设备及存储介质”的中国专利申请的优先权,该中国专利申请公开的全部内容以引用的方式并入本文中。
技术领域
本公开涉及计算机视觉技术领域,具体涉及一种特效生成方法、装置、设备及存储介质。
背景技术
在视频创作领域常需要对视频中的对象增加特效。特效生成作为图像生成研究的新兴主题,逐渐成为计算机视觉和图像学中重要的热门主题。并且,特效生成在很多图像生成技术领域有着重要应用,例如性别转换、风格迁移和各类饰品的添加特效。
发明内容
本公开提供了一种特效生成方法、装置、设备及存储介质。
根据本公开实施例的第一方面,提供一种特效生成方法,所述方法包括:将目标人脸图中的头发区域进行模糊处理,得到模糊头发图;根据所述目标人脸图,生成纹理图、人脸掩模图和头发掩模图;将所述人脸掩模图和头发掩模图进行融合,得到融合掩模图;基于根据所述融合掩模图确定的融合系数,将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图。
在一些可选实施例中,根据所述融合掩模图确定所述融合系数,包括:基于所述融合掩模图中的不同区域,分别确定所述纹理图和所述模糊头发图中对应区域的融合系数。
在一些可选实施例中,所述将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图,包括:根据所述纹理图中的像素值和所述纹理图的融合系数,确定第一像素值集合;根据所述模糊头发图中的像素值和所述模糊头发图的融合系数,确定第二像素值集合;基于所述第一像素值集合和所述第二像素值集合,确定所述目标人脸图的特效图中的像素值。
在一些可选实施例中,在所述得到所述目标人脸图的特效图之后,所述方法还包括:将所述目标人脸图的特效图与原始人脸图进行融合,得到原始特效图。
在一些可选实施例中,所述方法还包括:基于所述原始特效图中人脸的性别信息,对所述原始特效图中的人脸轮廓进行调整,和/或对所述原始特效图进行美颜处理。
在一些可选实施例中,所述将目标人脸图中的头发区域进行模糊处理,得到模糊头发图,包括:将所述目标人脸图进行头发分割,得到头发分割图;基于所述头发分割图,将所述目标人脸图中的头发区域按照背景像素进行填充,得到所述模糊头发图。
在一些可选实施例中,所述根据所述目标人脸图,生成纹理图、人脸掩模图和头发 掩模图,包括:获取所述目标人脸图中的人脸关键点信息;根据所述人脸关键点信息,确定所述目标人脸图对应的人脸热图;将所述人脸热图输入预先训练的深度神经网络,得到所述纹理图、人脸掩模图和头发掩模图。
在一些可选实施例中,所述深度神经网络包括编码器和解码器;所述编码器,用于根据卷积滤波器对所述人脸热图进行编码操作;所述解码器,用于根据卷积滤波器对所述人脸热图进行解码操作。
在一些可选实施例中,所述获取所述目标人脸图中的人脸关键点信息,包括:基于预先训练的人脸关键点检测网络,获取所述目标人脸图中的人脸关键点信息;所述人脸关键点检测网络是根据样本人脸图进行训练得到的,其中,所述样本人脸图中包括人脸角度大于预设角度阈值的人脸图像。
在一些可选实施例中,所述方法还包括:基于所述目标人脸图中人脸区域的颜色,对所述纹理图中的人脸区域进行颜色迁移处理,以使得所述纹理图中的人脸区域的颜色与所述目标人脸图中人脸区域的颜色一致。
在一些可选实施例中,所述基于所述目标人脸图中人脸区域的颜色,对所述纹理图中的人脸区域进行颜色迁移处理,包括:根据所述纹理图中人脸区域像素的颜色值,得到第一颜色平均值;根据所述目标人脸图中人脸区域像素的颜色值,得到第二颜色平均值;基于所述第一颜色平均值和所述第二颜色平均值,对所述纹理图中人脸区域像素的颜色值进行更新。
在一些可选实施例中,在所述将所述人脸掩模图和头发掩模图进行融合,得到融合掩模图之后,还包括:对所述融合掩模图中的眼睛区域和嘴巴区域的像素值进行调整。
根据本公开实施例的第二方面,提供一种特效生成装置,所述装置包括:模糊处理模块,用于将目标人脸图中的头发区域进行模糊处理,得到模糊头发图;生成模块,用于根据所述目标人脸图,生成纹理图、人脸掩模图和头发掩模图;第一融合模块,用于将所述人脸掩模图和头发掩模图进行融合,得到融合掩模图;第二融合模块,用于基于根据所述融合掩模图确定的融合系数,将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图。
在一些可选实施例中,所述第二融合模块,包括:融合系数确定子模块,用于基于所述融合掩模图中的不同区域,分别确定所述纹理图和所述模糊头发图中对应区域的融合系数。
在一些可选实施例中,所述第二融合模块,在用于将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图时,包括:根据所述纹理图中的像素值和所述纹理图的融合系数,确定第一像素值集合;根据所述模糊头发图的像素值和所述模糊头发图的融合系数,确定第二像素值集合;基于所述第一像素值集合和所述第二像素值集合,确定所述目标人脸图的特效图中的像素值。
在一些可选实施例中,所述装置还包括第三融合模块,用于将所述目标人脸图的特效图与原始人脸图进行融合,得到原始特效图。
在一些可选实施例中,所述装置还包括调整处理模块,用于基于所述原始特效图中人脸的性别信息,对所述原始特效图中的人脸轮廓进行调整,和/或对所述原始特效图进行美颜处理。
在一些可选实施例中,所述模糊处理模块,在用于将目标人脸图中的头发区域进行模糊处理,得到模糊头发图时,包括:将所述目标人脸图进行头发分割,得到头发分割图;基于所述头发分割图,将所述目标人脸图中的头发区域按照背景像素进行填充,得 到所述模糊头发图。
在一些可选实施例中,所述生成模块,包括:人脸关键点子模块,用于获取所述目标人脸图中的人脸关键点信息;人脸热图子模块,用于根据所述人脸关键点信息,确定所述目标人脸图对应的人脸热图;深度神经网络子模块,用于将所述人脸热图输入预先训练的深度神经网络,得到所述纹理图、人脸掩模图和头发掩模图。
在一些可选实施例中,所述深度神经网络包括编码器和解码器;所述编码器,用于根据卷积滤波器对所述人脸热图进行编码操作;所述解码器,用于根据卷积滤波器对所述人脸热图进行解码操作。
在一些可选实施例中,所述人脸关键点子模块,在用于获取所述目标人脸图中的人脸关键点信息时,包括:基于预先训练的人脸关键点检测网络,获取所述目标人脸图中的人脸关键点信息;所述人脸关键点检测网络是根据样本人脸图进行训练得到的,其中,所述样本人脸图中包括人脸角度大于预设角度阈值的人脸图像。
在一些可选实施例中,所述装置还包括:颜色迁移模块,用于基于所述目标人脸图中人脸区域的颜色,对所述纹理图中的人脸区域进行颜色迁移处理,以使得所述纹理图中的人脸区域的颜色与所述目标人脸图中人脸区域的颜色一致。
在一些可选实施例中,所述颜色迁移模块,在用于基于所述目标人脸图中人脸区域的颜色,对所述纹理图中的人脸区域进行颜色迁移处理时,包括:根据所述纹理图中人脸区域像素的颜色值,得到第一颜色平均值;根据所述目标人脸图中人脸区域像素的颜色值,得到第二颜色平均值;基于所述第一颜色平均值和所述第二颜色平均值,对所述纹理图中人脸区域像素的颜色值进行更新。
在一些可选实施例中,所述装置还包括:像素值调整模块,用于对所述融合掩模图中的眼睛区域和嘴巴区域的像素值进行调整。
根据本公开实施例的第三方面,提供一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现第一方面中任一项所述的特效生成方法。
根据本公开实施例的第四方面,提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现第一方面中任一所述的特效生成方法。
根据本公开实施例的第五方面,提供一种计算机程序产品,包括计算机程序,所述程序被处理器执行时实现第一方面中任一所述的特效生成方法。
本公开实施例中,通过将目标人脸图的头发区域进行模糊处理得到模糊头发图,再根据目标人脸图生成纹理图、人脸掩模图和头发掩模图,并将人脸掩模图和头发掩模图进一步融合得到融合掩模图,从而可以根据融合掩模图确定的融合系数将模糊头发图和纹理图进行融合,生成更加真实、自然的特效图。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。
图1是根据一示例性实施例示出的一种特效生成方法;
图2是根据一示例性实施例示出的一种模糊头发图示意图;
图3是根据一示例性实施例示出的一种头发分割图示意图;
图4是根据一示例性实施例示出的一种纹理图、人脸掩模图和头发掩模图的示意图;
图5A是根据一示例性实施例示出的一种融合掩模图示意图;
图5B是根据一示例性实施例示出的又一种融合掩模图示意图;
图6是根据一示例性实施例示出的一种特效图示意图;
图7是根据一示例性实施例示出的一种目标人脸图处理流程图;
图8是根据一示例性实施例示出的一种人脸热图示意图;
图9是根据一示例性实施例示出的一种深度神经网络示意图;
图10是根据一示例性实施例示出的又一种特效生成方法流程图;
图11是根据一示例性实施例示出的一种图像处理流程示意图;
图12是根据一示例性实施例示出的一种特效生成装置示意图;
图13是根据一示例性实施例示出的又一种特效生成装置示意图;
图14是根据一示例性实施例示出的一种计算机设备的结构示意图。
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的具体方式并不代表与本公开相一致的所有方案。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。
在视频特效领域,常需要对视频中的对象增加各种特效,以达到更佳的视觉效果。特效生成作为图像生成研究领域的新兴主题,逐渐成为计算机视觉和图像学中的热门主题。并且,特效生成技术在很多图像生成领域有着重要应用,例如,基于特效生成技术可以对图像处理实现性别转换、风格迁移和饰品添加等特效。例如,在一些短视频拍摄应用或美颜相机中,可以向用户提供性别转换的特效功能,以在拍摄得到的短视频或照片中实现性别的转换。比如,基于性别转换特效,可以从拍摄的男性照片得到女性化的照片,或者可以从拍摄的女性照片得到男性化的照片。
基于以上,本公开提供了一种特效生成方法,可以将目标人脸图中的头发区域进行模糊处理,得到模糊头发图;根据目标人脸图生成纹理图、人脸掩模图和头发掩模图; 将人脸掩模图和头发掩模图进行融合得到融合掩模图;基于根据融合掩模图确定的融合系数,将模糊头发图和纹理图进行融合,得到目标人脸图的特效图。
为了使本公开提供的特效生成方法更加清楚,下面结合附图和具体实施例对本公开提供的方案执行过程进行详细描述。
参见图1,图1是本公开提供的实施例示出的一种特效生成方法流程图。如图1所示,该流程包括:
步骤101,将目标人脸图中的头发区域进行模糊处理,得到模糊头发图。
本公开实施例中,目标人脸图是包含目标人脸的、待进行特效处理的图像。例如,可以从视频录制设备中获取包含目标人脸的图像帧,作为要按照本公开的特效生成方法进行特效处理的目标人脸图。
其中,目标人脸可以包括图像中从任何方向开始出现的第一张人脸。示例性的,目标人脸可以包括图像中左侧开始出现的第一张人脸,或者图像中右侧开始出现的第一张人脸。可选地,目标人脸可以包括图像中的全部人脸,或者包括图像中符合特定要求的人脸。其中,特定要求可以是预先定义的筛选图像中目标人脸的条件。示例性地,可以由用户自行选择图像中的人脸作为符合特定要求的目标人脸。或者,特定要求可以是特定面部属性,只有在图像中人脸满足特定面部属性的情况下,才可以作为目标人脸。例如,可以预先定义特定要求为人脸面部完整程度,并将人脸面部完整程度符合要求的人脸作为目标人脸。
在一些可选实施例中,在所述将目标人脸图中的头发区域进行模糊处理之前,所述方法还包括:获取待处理的原始人脸图;将所述原始人脸图进行调整处理,得到预设尺寸和/或预设角度的所述目标人脸图。
由于图像采集设备或视频录制设备的规格不同,获取的图像尺寸各不相同。其中,图像采集设备采集的包括目标人脸的图像,可以称为原始人脸图。由于原始人脸图的尺寸通常不统一,不便于对图像中的目标人脸进行进一步检测或处理。例如,基于深度神经网络对目标人脸图进行处理的情况下,深度神经网络通常要求输入的图像的尺寸保持一致。
上述实施例中,可以将原始人脸图进行调整处理,得到符合预设尺寸的目标人脸图。其中,调整处理可以包括对原始人脸图进行裁剪处理。例如,可以将原始人脸图进行裁剪,得到符合预设尺寸要求的图像作为目标人脸图。比如,可以将原始人脸图裁剪为分辨率为512*384的目标人脸图。
或者,调整处理可以包括对原始人脸图进行角度调整。由于原始人脸图中目标人脸的角度各不相同,可能导致生成的特效方向不一致,影响用户对特效的观感,所以有必要对目标人脸的角度进行调整。本实施例中可以对原始人脸图进行角度调整,得到符合预设角度要求的目标人脸图。其中,预设角度要求可以预先根据实际特效方向进行自定义。示例性地,可以预先定义目标人脸在目标人脸图中不存在左右侧倾。
示例性地,可以将原始人脸图裁剪为分辨率为512*384的目标人脸图。在一种可能的实现方式中,可以在调整处理过程中保证目标人脸处于图像中央、保证图像中头发区域的完整性,以便于对目标人脸图进行特效处理。
其中,调整过程中保证目标人脸处于图像中央可由多种形式实现,本实施例不限制。例如,可以检测出原始人脸图中目标人脸的人脸框,以人脸框的中心点为基础将人脸框的范围向外扩展,直到达到预设的尺寸要求并进行裁剪。
其中,调整过程中保证图像中头发区域的完整性的具体方式可以包括多种,本实施 例并不限制。示例性地,可以在检测出原始人脸图中目标人脸的基础上,确定目标人脸的大小范围或倾斜角度,并结合经验值确定目标人脸对应的头发区域在图像中的范围,从而在裁剪过程中包括头发区域。示例性地,可以预先利用训练样本对深度神经网络进行训练以得到可用于识别图像中头发区域的头发分割模型,由该头发分割模型确定图像中目标人脸的头发区域,以在裁剪时确定头发区域的裁剪范围。
可以理解的是,对原始人脸图进行调整处理的方式可以包括多种,凡是可以得到预设尺寸图像的调整处理方式,均可作为本实施例的调整处理的方式。该方式得到的目标人脸图的尺寸统一,便于进行特效处理。
在一种可能的实现方式中,所述将所述原始人脸图进行调整处理,得到预设尺寸和/或预设角度的所述目标人脸图,包括:将所述原始人脸图进行裁剪处理,得到裁剪人脸图;对所述裁剪人脸图进行仿射变换,得到预设尺寸的所述目标人脸图。
上述实现方式中,可以首先对原始人脸图进行裁剪处理,得到包含目标人脸的裁剪人脸图。若裁剪人脸图的尺寸符合预设尺寸,则可将该裁剪人脸图作为目标人脸图;若裁剪人脸图的尺寸不符合预设尺寸,则可对裁剪人脸图进行仿射变换,得到预设尺寸的变换后图像,作为目标人脸图。该方式中,在裁剪处理后的图像不符合预设尺寸的情况下,可以对裁剪处理后的图像进行仿射变换得到尺寸统一的目标人脸图,以便于对目标人脸图进行特效处理。
在得到目标人脸图后,本步骤可以将目标人脸图中的头发区域进行模糊处理,例如使得头发区域被遮挡或模糊化以隐藏头发像素,得到模糊头发图。如图2所示,是对目标人脸图的头发区域进行模糊处理后得到的模糊头发图。
在一种可能的实现方式中,可以将目标人脸图进行头发分割处理,得到头发分割图;基于头发分割图,将目标人脸图中的头发区域按照背景像素进行填充,得到所述模糊头发图。
上述实现方式中,可以将目标人脸图中的头发区域进行分割处理,得到头发分割图。如图3,示出一种头发分割图。其中,将目标人脸图中的头发区域进行分割处理的具体方式,本实施例并不限制。示例性地,可以基于可学习的机器学习模型或神经网络模型,预先训练得到符合要求的可用于对目标人脸图中的头发区域进行分割处理的分割模型,从而可以将目标人脸图输入该分割模型并由该分割模型输出对应的头发分割图。
在得到头发分割图的基础上,可以进一步确定目标人脸图中目标人脸对应的头发区域。本实施例中,进一步可以将目标人脸图中头发区域的像素重新填充为背景像素,以用背景像素遮挡目标人脸图中的头发区域,得到模糊头发图。如图2所示,将目标人脸图中的头发区域的像素均填充为背景像素后,目标人脸图中的头发区域的原有像素被替换为背景像素,即利用背景像素实现了对头发区域的遮挡,最终得到了模糊头发图。
其中,背景像素可以是人脸图像中人体对象之后的背景图像上的一个像素。示例性地,目标人脸图中与头发区域相邻或不相邻的任一像素,均可以作为背景像素填充至头发区域,实现对目标人脸图中头发区域的模糊处理。可以理解的是,凡是可以用来遮挡头发区域的像素均可以作为背景像素,本实施例并不限制背景像素的具体值,也不限制该背景像素的获取方式。
步骤102,根据所述目标人脸图,生成纹理图、人脸掩模图和头发掩模图。
在一种可能的实现方式中,可以将目标人脸图作为预先训练的深度神经网络的输入,由该深度神经网络输出目标人脸图对应的纹理图、人脸掩模图和头发掩模图。该方式中可以预先收集大量样本人脸图作为训练样本,并且将样本人脸图对应的纹理图、人脸掩 模图和头发掩模图作为标签值。在对深度神经网络进行训练的过程中,可以将样本人脸图输入待训练的深度神经网络,由该深度神经网络输出预测的纹理图、人脸掩模图和头发掩模图,并根据预测的纹理图、人脸掩模图和头发掩模图与相应的标签值之间的差异,调整网络参数。
在一种可能的实现方式中,可以将目标人脸图对应的人脸热图作为预先训练的深度神经网络的输入,由该深度神经网络根据人脸热图输出对应的纹理图、人脸掩模图和头发掩模图。其中,可以首先获取目标人脸图中的人脸关键点信息;根据人脸关键点信息,确定目标人脸图对应的人脸热图;从而,可以将人脸热图输入预先训练的深度神经网络。该方式中可以预先收集大量样本人脸图,根据样本人脸图中的人脸关键点信息得到对应的人脸热图作为深度神经网络的输入。并且,由人工对样本人脸图对应的人脸掩模图和头发掩模图进行标注,基于人脸掩模图和头发掩模图确定样本人脸图对应的纹理图并进行标注。在进行训练过程中,将样本人脸图对应的人脸热图输入待训练的深度神经网络,由该深度神经网络输出预测的纹理图、人脸掩模图和头发掩模图,并根据输出结果与标注的标签值之间的差异调整网络参数。
在训练得到深度神经网络后,本步骤可以基于深度神经网络,根据目标人脸图生成对应的纹理图、人脸掩模图和头发掩模图。其中,人脸掩模图用于表示最终特效图中的人脸区域,头发掩模图用于表示最终特效图中的头发区域。例如,人脸掩模图中对应人脸区域的像素值为1,其他区域的像素值为0;头发掩模图中,对应头发区域的像素值为1,其他区域的像素值为0。
示例性地,如图4所示,深度神经网络可以根据目标人脸图ImgTarget输出对应的纹理图ImgTexture、人脸掩模图ImgFace-mask和头发掩模图ImgHair-mask。其中,纹理图ImgTexture为与目标人脸图ImgTarget性别不同的人脸图像。例如,目标人脸图ImgTarget中的人脸是男性的情况下,本实施例生成的纹理图ImgTexture中的人脸可以是包含女性特征的纹理图。人脸掩模图ImgFace-mask中,人脸区域的像素值是“1”,其他部分的像素值是“0”,以此区别特效图中的人脸区域。头发掩模图ImgHair-mask中,头发区域的像素值是“1”,其他部分的像素值是“0”,以此区别特效图中的头发区域。
步骤103,将所述人脸掩模图和头发掩模图进行融合,得到融合掩模图。
在得到目标人脸图对应的纹理图、人脸掩模图和头发掩模图后,本步骤可以将人脸掩模图和头发掩模图进行融合,得到融合掩模图。示例性地,可以将人脸掩模图和头发掩模图进行对应像素相加,得到融合掩模图。
在一种可能的实现方式中,可以基于预先设置的权重值,对人脸掩模图和头发掩模图中的像素值进行加权求和,以改变融合掩模图中人脸区域或头发区域的像素值。例如,可以通过加权求和的方式,将融合掩模图中人脸区域的像素值更新为“0.5”,将融合掩模图中头发区域的像素值保持“1”,其他部分的像素值保持“0”。基于融合掩模图中不同的像素值,可以区别出人脸区域、头发区域和其他背景区域。
示例性地,本步骤可以将图4所示的人脸掩模图ImgFace-mask和头发掩模图ImgHair-mask进行融合处理,得到如图5A所示的融合掩模图。
在一些可选实施例中,在所述将所述人脸掩模图和头发掩模图进行融合,得到融合掩模图之后,还包括:对所述融合掩模图中的眼睛区域和嘴巴区域的像素值进行调整。从而,可以在基于所述模糊头发图得到的特效图中保留目标人脸的眼睛特征和嘴巴特征。
为了最终生成的特效图能够保留目标人脸的更多特征,上述实施例中,可以将融合掩模图中的眼睛区域和嘴巴区域的像素值进行调整,得到最终的融合掩模图。在一种可 能的实现方式中,可以根据目标人脸对应的人脸关键点确定融合掩模图中对应的眼睛区域和嘴巴区域。进一步地,可以将融合掩模图中眼睛区域和嘴巴区域的像素值进行更新。例如,可以将融合掩模图中眼睛区域和嘴巴区域的像素值更新为“0”。以此可以将眼睛区域和嘴巴区域,从人脸区域中区别出来,便于在生成特效图的过程中保留目标人脸的眼睛特征和嘴巴特征。
如图5B所示,为图5A进行眼睛区域和嘴巴区域像素值更新后的融合掩模图。如图5A,其中包括眼睛区域和嘴巴区域的人脸区域的像素值均为0.5,并不能区分出人脸区域中的眼睛区域或嘴巴区域。本实施例中可以将嘴巴区域和眼睛区域的像素值更新为“0”,如图5B所示。
基于预先训练的深度神经网络,根据目标人脸图生成的纹理图中的人脸区域的颜色与训练样本中的人脸区域的颜色保持一致。但是,实际的目标人脸图中的人脸区域的颜色各种各样,若将深度神经网络生成的纹理图直接进行融合处理,可能会导致生成的特效图中的人脸颜色与目标人脸的实际颜色差异过大、与脖子区域的颜色差异过大,造成特效图不够真实。
在一些可选实施例中,还可以基于所述目标人脸图中人脸区域的颜色,对所述纹理图中的人脸区域进行颜色迁移处理,以使得所述纹理图中的人脸区域的颜色与所述目标人脸图中人脸区域的颜色一致。在一种可能的实现方式中,可以根据所述纹理图中人脸区域像素的颜色值,得到第一颜色平均值;根据所述目标人脸图中人脸区域像素的颜色值,得到第二颜色平均值;基于所述第一颜色平均值和所述第二颜色平均值,对所述纹理图中人脸区域像素的颜色值进行更新。其中,像素的颜色值用于表征像素的颜色特征。例如,在Lab(根据国际照明委员会(Commission International Eclairage,CIE)在1931年所制定的一种测定颜色的国际标准建立的、并于1976年被改进的一种色彩模式)色彩模式下,每个像素的Lab值可以作为本实施例的颜色值。
颜色迁移处理后的纹理图中人脸区域的像素的颜色与目标人脸图中人脸区域的像素的颜色,从视觉效果上趋于一致。具体地,在Lab色彩模式下,经过颜色迁移处理后纹理图中人脸区域的像素的Lab值与目标人脸图中人脸区域的像素的Lab值的差异趋近于0,所以从视觉效果上颜色迁移处理后的纹理图中的人脸区域的颜色与目标人脸图中人脸区域的颜色一致。
以Lab色彩模式为例,本实施例可以将纹理图中的人脸区域像素的Lab值求平均值,得到纹理图中的人脸区域像素的平均Lab值;将目标人脸图中人脸区域像素的Lab值求平均值,得到目标人脸图中的人脸区域的平均Lab值。进一步地,可以将纹理图中人脸区域的像素的Lab值,减去纹理图中的人脸区域像素的平均Lab值,再加上目标人脸图中的人脸区域的平均Lab值,得到更新后的纹理图中的人脸区域的像素的Lab值,实现了对纹理图中人脸区域的像素的Lab值的更新。该方式中以Lab色彩模式对纹理图中的人脸区域进行颜色迁移处理,可以使目标人脸图中的人脸颜色与纹理图中的人脸的颜色一致。防止生成的特效图中的人脸颜色与目标人脸的实际颜色差异过大、与脖子区域的颜色差异过大,从而生成的特效图更加真实、自然。
上述实施例中,可以基于目标人脸图中人脸区域的颜色,对纹理图中的人脸区域进行颜色迁移处理,以使得纹理图中的人脸区域的颜色与目标人脸图中人脸区域的颜色一致。该方式得到的纹理图中人脸颜色与目标人脸图中的人脸颜色保持一致,人脸与脖子之间的颜色差异减小,最终生成的特效图更加真实、自然。
步骤104,基于根据所述融合掩模图确定的融合系数,将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图。
在得到融合掩模图后,本步骤可以根据融合掩模图确定模糊头发图和纹理图的融合系数。其中,融合系数用于表示图像中不同区域参与图像融合的比重。例如,模糊头发图中头发区域的融合系数通常较小,所以模糊头发图中头发区域对应的像素对融合后图像中头发区域的影响更小。其中,确定融合系数的方式包括根据经验确定或根据实际效果图进行调整。
在一些可选实施例中,根据所述融合掩模图确定所述融合系数,包括:基于所述融合掩模图中的不同区域,分别确定所述纹理图中对应区域的融合系数,并相应分别确定所述模糊头发图中对应区域的融合系数。
融合掩模图中通常包括多个不同的区域。例如,图5B所示的融合掩模图中分别包括头发区域、人脸区域、眼睛区域、嘴巴区域和背景区域。在一种可能的实现方式中,可以基于融合掩模图中的不同区域,根据经验值分别确定纹理图中对应区域的融合系数,并相应分别确定模糊头发图中对应区域的融合系数。
示例性地,可以将纹理图中头发区域的融合系数确定为1,将纹理图中人脸区域的融合系数确定为0.5,将纹理图中眼睛区域、嘴巴区域和背景区域的融合系数均确定为0。相应地,可以设置模糊头发图中头发区域的融合系数为0,设置模糊头发图中人脸区域的融合系数为0.5,设置模糊头发图中眼睛区域、嘴巴区域和背景区域的融合系数均为1。可选地,融合掩模图的相同区域对应的纹理图的融合系数与对应的模糊头发图的融合系数的和为1。
在一种可能的实现方式中,可以预先将融合掩模图中不同像素的像素值确定为纹理图中对应像素的融合系数。例如,图5B所示的融合掩模图中,将头发区域的像素值设置为1,将人脸区域的像素值设置为0.5,将眼睛区域、嘴巴区域和背景区域的像素值均设置为0。进一步地,可以基于“融合掩模图的相同区域对应的纹理图的融合系数与对应的模糊头发图的融合系数的和为1”的原则,根据纹理图的融合系数确定对应的模糊头发图的融合系数。以纹理图中任一像素为例,如果纹理图中该像素的融合系数是0.5,则可以确定模糊头发图中对应像素的融合系数是0.5(=1-0.5)。
在确定融合系数后,本步骤可以基于确定的融合系数,将模糊头发图和纹理图进行融合处理,得到目标人脸图的特效图。如图6所示,得到了目标人脸图的特效图。
在一种可能的实现方式中,所述将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图,包括:根据所述纹理图中的像素值和所述纹理图的融合系数,确定第一像素值集合;根据所述模糊头发图的像素值和所述模糊头发图的融合系数,确定第二像素值集合;基于所述第一像素值集合和所述第二像素值集合,确定所述目标人脸图的特效图中的像素值。
纹理图的不同区域对应的融合系数不同。例如,纹理图中人脸区域的融合系数为0.5,头发区域的融合系数为1,背景区域的融合系数为0。上述实现方式中,可以将纹理图中人脸区域的像素值取0.5的权重,将纹理图中头发区域的像素值取1的权重,将纹理图中背景区域的像素值取0的权重,得到对应完整的纹理图的像素值为第一像素值集合。
同理,模糊头发图中不同区域对应的融合系数也不同。例如,模糊头发图中人脸区域的融合系数为0.5,头发区域的融合系数为0,背景区域的融合系数为1。上述实现方式中,可以将模糊头发图中人脸区域的像素值取0.5的权重,将模糊头发图中头发区域的像素值取0的权重,将模糊头发图中背景区域的像素值取1的权重,得到对应完整的模糊头发图的像素值为第二像素值集合。
在得到第一像素值集合和第二像素值集合之后,可以将两个像素集合中对应的 像素值相加得到一个完整的像素值集合,即得到目标人脸图的特效图中的各个像素,也即得到了目标人脸图的特效图。
基于不同的融合系数可以保留纹理图中人脸的部分特征,同时保留模糊头发图中人脸的部分特征。对纹理图中头发区域的像素值取1的权重,对模糊头发图中头发区域的像素值取0(=1-1)的权重,实现特效图中完整保留纹理图中头发区域的特征,去掉模糊头发图中头发区域的特征。
需要说明的是,在实际处理中可以预先确定更多区域的对应的融合系数,并根据确定的融合系数进行融合处理。示例性地,可以预先确定额头区域融合系数、下巴区域融合系数、耳朵区域融合系数,等等。
在一些可选实施例中,在所述得到所述目标人脸图的特效图之后,所述方法还包括:将所述目标人脸图的特效图与所述原始人脸图进行融合,得到原始特效图。在得到目标人脸图的特效图后,可以将特效图贴回原始人脸图中,实现在原始人脸图的基础上增加特效。例如,可以将目标人脸图的特效图的像素值,覆盖原始人脸图的对应像素值,得到原始特效图。
在本公开实施例中,通过将目标人脸图中的头发区域进行模糊处理,得到模糊头发图;基于深度神经网络根据目标人脸图生成纹理图、人脸掩模图和头发掩模图;将人脸掩模图和头发掩模图进行融合得到融合掩模图,并根据该融合掩模图确定融合系数;基于确定的融合系数将模糊头发图和纹理图进行融合处理,得到更加自然、真实的特效图。
在一些可选实施例中,在得到目标人脸的原始特效图后,可以基于原始特效图中人脸的性别信息,对原始特效图中的脸部轮廓进行调整,和/或对所述原始特效图进行美颜处理。示例性地,在性别转换特效中,在转换得到的原始特效图中的人脸是男性的情况下,可以将特效图中的脸部轮廓调整得更具棱角,以符合男性脸部轮廓的特征;或者,在转换得到的原始特效图中的人脸是女性的情况下,可以将特效图中的脸部轮廓调整得更柔和或圆润,以符合女性脸部轮廓的特征。示例性地,还可以进一步对原始特效图进行美颜处理,例如可以对原始特效图进行美白处理、美妆处理或者添加滤镜等,以进一步美化原始特效图。
在一些可选实施例中,如图7所示,步骤102的具体实现可以包括以下步骤:
步骤201,获取所述目标人脸图中的人脸关键点信息。
在一种可能的实现方式中,可以对原始人脸图进行人脸关键点检测,得到人脸关键点信息。例如,得到106个人脸关键点的位置坐标,作为人脸关键点信息。在检测得到人脸关键点的基础上,对原始人脸图进行裁剪处理得到预设尺寸的包含人脸关键点的目标人脸图。该方式中,可以对原始人脸图进行人脸关键点检测,在检测得到人脸关键点的基础上对原始人脸图进行裁剪处理得到目标人脸图,从而得到了目标人脸图中的人脸关键点信息。在另一种可能的实现方式中,可以在目标人脸图的基础上进行人脸关键点检测,直接得到目标人脸图中的人脸关键点信息。
在一些可选实施例中,步骤201的具体实现可以包括:基于预先训练的人脸关键点检测网络,获取所述目标人脸图中的人脸关键点信息;所述人脸关键点检测网络是根据样本人脸图进行训练得到的,其中,所述样本人脸图中包括人脸角度大于预设角度阈值的人脸图像。
人脸关键点检测网络包括能够进行训练学习的深度神经网络。样本人脸图是用于训练人脸关键点检测网络的训练样本。在获取目标人脸图中的人脸关键点信息之前, 需要预先利用样本人脸图进行训练得到人脸关键点检测网络。其中,训练样本中可以包括人脸角度小于等于预设的角度阈值的样本人脸图,也可以包括人脸角度大于预设的角度阈值的样本人脸图。示例性地,预设的角度阈值可包括人脸偏转角70度,人脸偏转角指正对人脸情况下人脸左右转动的角度。示例性地,预设的角度阈值还可包括人脸俯仰角30度,人脸俯仰角指正对人脸情况下人脸上下转动的角度。
在利用相关技术对人脸图像中的人脸关键点进行检测的过程中,通常只能检测出一定角度阈值之内的人脸的关键点。即,对于人脸角度大于一定角度阈值的人脸图像,并不能检测出对应人脸的关键点。上述实施例中,用于训练人脸关键点检测网络的训练样本中包括人脸角度大于预设的角度阈值的人脸图像,使得训练得到的人脸关键点检测网络可以检测出更大人脸角度的人脸的关键点。从而,可以基于利用该人脸关键点检测网络检测出的关键点,对人脸角度大于一定角度阈值的人脸图像进行特效生成。
步骤202,根据所述人脸关键点信息,确定所述目标人脸图对应的人脸热图。
在得到目标人脸图中的人脸关键点信息后,可以根据人脸关键点信息生成对应的人脸热图。示例性地,可以利用Excel、R语言、Python或MATLAB,根据从目标人脸图中检测得到的106个人脸关键点,生成对应的人脸热图,如图8所示。在一种可能的实现方式中,可以将目标人脸图中的人脸关键点作为人脸热图中的关键点,从而得到人脸热图。例如,可以将目标人脸图中检测得到的人脸关键点对应的像素值取255,目标人脸图中人脸关键点之外的像素值取0,即得到人脸热图。
步骤203,将所述人脸热图输入预先训练的深度神经网络,得到所述纹理图、人脸掩模图和头发掩模图。
本实施例中,在利用人脸热图得到纹理图、人脸掩模图和头发掩模图之前,可以预先利用训练样本对深度神经网络进行训练。例如,可以预先收集大量样本人脸图,根据样本人脸图中的人脸关键点信息得到对应的人脸热图作为深度神经网络的输入。并且,由人工对样本人脸图对应的人脸掩模图和头发掩模图进行标注,基于人脸掩模图和头发掩模图确定样本人脸图对应的纹理图并进行标注。在进行训练过程中,将样本人脸图对应的人脸热图输入待训练的深度神经网络,由该深度神经网络输出预测的纹理图、人脸掩模图和头发掩模图,并根据输出结果与标注的标签值之间的差异调整网络参数。在训练得到符合要求的深度神经网络之后,本步骤可以将目标人脸图对应的人脸热图输入训练后的深度神经网络,得到对应的纹理图、人脸掩模图和头发掩模图。
在一些可选实施例中,所述深度神经网络900包括编码器910和解码器920;所述编码器910,用于根据卷积滤波器对所述人脸热图进行编码操作;所述解码器920,用于根据卷积滤波器对所述人脸热图进行解码操作。
如图9所示,上述实施例中,可以将人脸热图作为深度神经网络的输入,由编码器910根据卷积滤波器对人脸热图进行编码操作,由解码器920根据卷积滤波器对人脸热图进行解码操作,最终输出对应的纹理图、人脸掩模图和头发掩模图。
示例性地,编码器910可以包括6个卷积滤波器,每个卷积滤波器的卷积核大小为4*4,步长为2。假设某卷积层输入的特征尺寸为C*H*W,经过滤波处理后尺寸变为(H/2)*(W/2)。前5个卷积层后均附带一个权重归一化器和一个LeakyReLU激活器;最后一个卷积层不带LeakyReLU激活器。
示例性地,解码器920可以包括6个卷积滤波器,每个卷积滤波器的卷积核大小为3*3,步长为1。每个卷积层后均附带一个权重归一化器和一个Sub-pixel Convolution,放大倍数为2。假设某卷积层输入的特征尺寸为C*(H/2)*(W/2),经过滤波后尺寸变为H*W。最后一个卷积层后再附带一个核大小为3*3、步长为1的卷积层,输出通道数 为5,其中前三个通道为生成的纹理图,第四个通道为生成的人脸掩模图,第五个通道为生成的头发掩模图。
上述实施例中,可以通过预先训练得到的深度神经网络,将根据目标人脸图中检测出的人脸关键点信息确定的人脸热图作为输入,得到对应目标人脸图的纹理图、人脸掩模图和头发掩模图。从而,可以基于得到的人脸掩模图和头发掩模图进行融合得到融合掩模图,并基于根据融合掩模图确定的融合系数将模糊头发图和纹理图进行融合,得到更加真实、自然的对应目标人脸的特效图。
以下以一个完整的实施例,说明本公开提供的特效生成方法,具体执行步骤可参见如图10所示的流程图。另外,在本实施例的说明过程中,将结合图11所示的图像处理流程,其中图11中包括目标人脸图Img1、头发分割图Img2、模糊头发图Img3、人脸热图Img4、纹理图Img5、人脸掩模图Img6、头发掩模图Img7、融合掩模图Img8和目标人脸的特效图Img9。
在步骤1001中,可以对原始人脸图进行检测,得到人脸关键点信息。
例如,可以对原始人脸图进行检测,得到106个人脸关键点的位置坐标,作为人脸关键点信息。其中,对原始人脸图的检测可以利用本领域技术人员熟知的可进行人脸关键点检测的任意网络模型。或者,可以基于可学习的机器学习模型或神经网络模型进行训练,得到可用于检测原始人脸图中人脸关键点的网络模型。
在步骤1002,对原始人脸图进行调整,得到预设尺寸和/或预设角度的目标人脸图。
由于图像采集设备或视频录制设备的规格不同,获取的图像尺寸各不相同。其中,图像采集设备采集的包括目标人脸的图像,可以称为原始人脸图。由于原始人脸图的尺寸通常不统一,不便于对图像中的目标人脸进行进一步检测或处理。例如,基于深度神经网络对目标人脸图进行处理的情况下,深度神经网络通常要求输入的图像的尺寸保持一致。
本步骤可以将原始人脸图进行调整处理。其中,调整处理可以包括对原始人脸图进行裁剪处理。或者,调整处理可以包括对原始人脸图进行角度调整,得到符合角度要求的目标人脸图。例如,可以对原始人脸图进行调整处理,得到图11中目标人脸图Img1。
在步骤1003,将目标人脸图进行头发分割,得到头发分割图。
在一种可能的实现方式中,可以预先利用训练样本对深度神经网络进行训练以得到可用于识别图像中头发区域的头发分割模型,由该头发分割模型确定图像中目标人脸的头发区域。例如,可以将如图11中目标人脸图Img1输入预先训练得到的头发分割模型,由该头发分割模型输出对应的如图11中头发分割图Img2。
在步骤1004,基于头发分割图,将目标人脸图中的头发区域按照背景像素进行填充,得到模糊头发图。
在得到头发分割图的基础上,可以进一步确定目标人脸图中目标人脸对应的头发区域。本步骤,可以进一步将目标人脸图中头发区域的像素重新填充为背景像素,以用背景像素遮挡目标人脸图中的头发区域,得到模糊头发图。
例如,本步骤可以基于图11中头发分割图Img2,将目标人脸图Img1中的头发区域的像素重新填充为环境像素,得到模糊头发图Img3。
在步骤1005,根据人脸关键点信息,确定目标人脸图对应的人脸热图。
示例性地,可以利用Excel、R语言、Python或MATLAB,根据从目标人脸图中检测得到的106个人脸关键点,生成对应的人脸热图。例如,可以将目标人脸图中的人脸关键点作为人脸热图中的关键点,从而得到人脸热图。例如,可以将目标人脸图中检测得到的人脸关键点对应的像素值取255,目标人脸图中人脸关键点之外的像素值取0,即得到人脸热图。
以图11为例,可以根据图11中目标人脸图Img1对应的人脸关键点信息,得到对应的人脸热图Img4。
在步骤1006,将人脸热图输入预先训练得到的深度神经网络,得到纹理图、人脸掩模图和头发掩模图。
在利用人脸热图得到纹理图、人脸掩模图和头发掩模图之前,可以预先利用训练样本对深度神经网络进行训练。例如,可以预先收集大量样本人脸图,根据样本人脸图中的人脸关键点信息得到对应的人脸热图作为深度神经网络的输入。在训练得到符合要求的深度神经网络之后,本步骤可以将目标人脸图对应的人脸热图输入训练后的深度神经网络,得到目标人脸图对应的纹理图、人脸掩模图和头发掩模图。
以图11为例,本步骤可以将人脸热图Img4输入预先训练得到的深度神经网络,得到纹理图Img5、人脸掩模图Img6和头发掩模图Img7。
在步骤1007,基于目标人脸图中人脸区域的颜色,对纹理图中的人脸区域进行颜色迁移处理。
为了使纹理图中人脸颜色与目标人脸图中的人脸颜色保持一致,本步骤需要基于目标人脸图中人脸区域的颜色,对纹理图中的人脸区域进行颜色迁移处理。例如,可以将纹理图中的人脸区域像素的Lab值求平均值,得到第一Lab平均值;将目标人脸图中人脸区域像素的Lab值求平均值,得到第二Lab平均值。进一步地,可以将纹理图中人脸区域的像素的Lab值减去第一Lab平均值再加上第二Lab平均值,实现对纹理图中人脸区域的像素的Lab值的更新。
在步骤1008,将人脸掩模图和头发掩模图进行融合,得到融合掩模图。
示例性地,可以将人脸掩模图和头发掩模图进行对应像素相加,得到融合掩模图。在一种可能的实现方式中,可以基于预先设置的权重值,对人脸掩模图和头发掩模图中的像素值进行加权求和,以改变融合掩模图中人脸区域或头发区域的像素值。
为了最终生成的特效图能够保留目标人脸的更多特征,还可以将融合掩模图中的眼睛区域和嘴巴区域的像素值进行调整,得到最终的融合掩模图。例如,可以将融合掩模图中眼睛区域和嘴巴区域的像素值更新为“0”。以此可以将眼睛区域和嘴巴区域从人脸区域中区别出来,便于在生成特效图的过程中保留目标人脸的眼睛特征和嘴巴特征。
以图11为例,可以对人脸掩模图Img6和头发掩模图Img7中的像素值进行加权求和,得到融合掩模图Img8。其中,融合掩模图Img8中人脸区域的像素值更新为“0.5”,头发区域的像素值保持“1”,背景区域、嘴巴区域和眼睛区域的像素值保持“0”。基于融合掩模图Img8中不同的像素值,可以区别出人脸区域、头发区域和背景区域、嘴巴区域和眼睛区域。
在步骤1009,基于根据融合掩模图确定的融合系数,将模糊头发图和纹理图进行融合,得到目标人脸图的特效图。
示例性地,可以将纹理图中头发区域的融合系数确定为1,将纹理图中人脸区域的融合系数确定为0.5,将纹理图中眼睛区域、嘴巴区域和背景区域的融合系数确定为0。 相应地,对应纹理图的融合系数,可以确定模糊头发图中头发区域的融合系数为0,确定模糊头发图中人脸区域的融合系数为0.5,确定模糊头发图中眼睛区域、嘴巴区域和背景区域的融合系数均为1。
进一步地,可以将纹理图中人脸区域的像素值取0.5的权重,将纹理图中头发区域的像素值取1的权重,将纹理图中其他区域的像素值取0的权重,得到对应完整的纹理图的像素值为第一像素值集合。同理,可以将模糊头发图中人脸区域的像素值取0.5的权重,将模糊头发图中头发区域的像素值取0的权重,将模糊头发图中其他区域的像素值取1的权重,得到对应完整的模糊头发图的像素值为第二像素值集合。
在得到第一像素值集合和第二像素值集合之后,可以将两个像素集合中对应的像素值相加得到一个完整的像素值集合,即得到目标人脸图的特效图中的各个像素,也即得到了目标人脸图的特效图。
以图11为例,本步骤可以基于根据融合掩模图Img8确定的融合系数,将模糊头发图Img3中的像素值取对应系数的权重,将纹理图Img5中的像素值取对应系数的权重,再将取权重后得到的像素值对应相加,得到目标人脸图的特效图Img9中的像素值,也即得到了目标人脸图的特效图Img9。
在步骤1010,将目标人脸图的特效图与原始人脸图进行融合,得到原始特效图。
在得到目标人脸图的特效图后,可以将特效图贴回原始人脸图中,实现在原始人脸图的基础上增加特效。例如,可以将目标人脸图的特效图中的像素值,覆盖原始人脸图的对应像素值,得到原始特效图。
在步骤1011,基于原始特效图中人脸的性别信息,对原始特效图中的人脸轮廓进行调整,和/或对原始特效图进行美颜处理。
例如,在性别转换特效中,转换得到的原始特效图中人脸是男性的情况下,可以将特效图中的脸部轮廓调整得更具棱角,以符合男性脸部轮廓的特征;或者,转换得到的原始特效图中人脸是女性的情况下,可以将特效图中的脸部轮廓调整得更柔和或圆润,以符合女性脸部轮廓的特征。或者,还可以进一步对原始特效图进行美颜处理,例如可以对原始特效图进行美白处理、美妆处理或者添加滤镜等,以进一步美化原始特效图。
图12所示,本公开提供了一种特效生成装置,该装置可以执行本公开任一实施例的特效生成方法。该装置可以包括模糊处理模块1201、生成模块1202、第一融合模块1203和第二融合模块1204。其中:模糊处理模块1201,用于将目标人脸图中的头发区域进行模糊处理,得到模糊头发图;生成模块1202,用于根据所述目标人脸图,生成纹理图、人脸掩模图和头发掩模图;第一融合模块1203,用于将所述人脸掩模图和头发掩模图进行融合,得到融合掩模图;第二融合模块1204,用于基于根据所述融合掩模图确定的融合系数,将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图。
可选地,如图13所示,所述第二融合模块1204,包括:融合系数确定子模块1301,用于基于所述融合掩模图中的不同区域,分别确定所述纹理图中对应区域的融合系数,以及确定所述模糊头发图中对应区域的融合系数。
可选地,所述第二融合模块1204,在用于将所述模糊头发图和所述纹理图进行融合,得到所述目标人脸图的特效图时,包括:根据所述纹理图中的像素值和所述纹理图的融合系数,确定第一像素值集合;根据所述模糊头发图的像素值和所述模糊头发图的融合系数,确定第二像素值集合;基于所述第一像素值集合和所述第二像素值集合, 确定所述目标人脸图的特效图中的像素值。
可选地,如图13所示,所述装置还包括第三融合模块1302,用于将所述目标人脸图的特效图与原始人脸图进行融合,得到原始特效图。
可选地,如图13所示,所述装置还包括调整处理模块1303,用于基于所述原始特效图中人脸的性别信息,对所述原始特效图中的人脸轮廓进行调整,和/或对所述原始特效图进行美颜处理。
可选地,所述模糊处理模块1201,在用于将目标人脸图中的头发区域进行模糊处理,得到模糊头发图时,包括:将所述目标人脸图进行头发分割,得到头发分割图;基于所述头发分割图,将所述目标人脸图中的头发区域按照背景像素进行填充,得到所述模糊头发图。
可选地,如图13所示,所述生成模块1202,包括:人脸关键点子模块1304,用于获取所述目标人脸图中的人脸关键点信息;人脸热图子模块1305,用于根据所述人脸关键点信息,确定所述目标人脸图对应的人脸热图;深度神经网络子模块1306,用于将所述人脸热图输入预先训练的深度神经网络,得到所述纹理图、人脸掩模图和头发掩模图。
可选地,所述深度神经网络包括编码器和解码器;所述编码器,用于根据卷积滤波器对所述人脸热图进行编码操作;所述解码器,用于根据卷积滤波器对所述人脸热图进行解码操作。
可选地,所述人脸关键点子模块1304,在用于获取所述目标人脸图中的人脸关键点信息时,包括:基于预先训练的人脸关键点检测网络,获取所述目标人脸图中的人脸关键点信息;所述人脸关键点检测网络是根据样本人脸图进行训练得到的,其中,所述样本人脸图中包括人脸角度大于预设角度阈值的人脸图像。
可选地,所述装置还包括:颜色迁移模块1307,用于基于所述目标人脸图中人脸区域的颜色,对所述纹理图中的人脸区域进行颜色迁移处理,以使得所述纹理图中的人脸区域的颜色与所述目标人脸图中人脸区域的颜色一致。
可选地,所述颜色迁移模块1307,在用于基于所述目标人脸图中人脸区域的颜色,对所述纹理图中的人脸区域进行颜色迁移处理时,包括:根据所述纹理图中人脸区域像素的颜色值,得到第一颜色平均值;根据所述目标人脸图中人脸区域像素的颜色值,得到第二颜色平均值;基于所述第一颜色平均值和所述第二颜色平均值,对所述纹理图中人脸区域像素的颜色值进行更新。
可选地,如图13所示,所述装置还包括:像素值调整模块1308,用于对所述融合掩模图中的眼睛区域和嘴巴区域的像素值进行调整。
对于装置实施例而言,由于其基本对应于方法实施例,所以相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本公开至少一个实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
本公开还提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时能够实现本公开任一实施例的特效生成方法。
图14示出了本公开实施例所提供的一种更为具体的计算机设备硬件结构示意图, 该设备可以包括:处理器1010、存储器1020、输入/输出接口1030、通信接口1040和总线1050。其中处理器1010、存储器1020、输入/输出接口1030和通信接口1040通过总线1050实现彼此之间在设备内部的通信连接。
处理器1010可以采用通用的CPU(Central Processing Unit,中央处理器)、微处理器、应用专用集成电路(Application Specific Integrated Circuit,ASIC)、或者一个或多个集成电路等方式实现,用于执行相关程序,以实现本说明书实施例所提供的技术方案。
存储器1020可以采用ROM(Read Only Memory,只读存储器)、RAM(Random Access Memory,随机存取存储器)、静态存储设备,动态存储设备等形式实现。存储器1020可以存储操作系统和其他应用程序,在通过软件或者固件来实现本说明书实施例所提供的技术方案时,相关的程序代码保存在存储器1020中,并由处理器1010来调用执行。
输入/输出接口1030用于连接输入/输出模块,以实现信息输入及输出。输入/输出模块可以作为组件配置在设备中(图中未示出),也可以外接于设备以提供相应功能。其中输入设备可以包括键盘、鼠标、触摸屏、麦克风、各类传感器等,输出设备可以包括显示器、扬声器、振动器、指示灯等。
通信接口1040用于连接通信模块(图中未示出),以实现本设备与其他设备的通信交互。其中通信模块可以通过有线方式(例如USB、网线等)实现通信,也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。
总线1050包括一通路,在设备的各个组件(例如处理器1010、存储器1020、输入/输出接口1030和通信接口1040)之间传输信息。
需要说明的是,尽管上述设备仅示出了处理器1010、存储器1020、输入/输出接口1030、通信接口1040以及总线1050,但是在具体实施过程中,该设备还可以包括实现正常运行所必需的其他组件。此外,本领域的技术人员可以理解的是,上述设备中也可以仅包含实现本说明书实施例方案所必需的组件,而不必包含图中所示的全部组件。
本公开还提供了一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时能够实现本公开任一实施例的特效生成方法。
其中,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等,本公开并不对此进行限制。
在一些可选实施例中,本公开实施例提供了一种计算机程序产品,包括计算机可读代码,当计算机可读代码在设备上运行时,设备中的处理器执行用于实现如上任一实施例提供的特效生成方法。该计算机程序产品可以具体通过硬件、软件或其结合的方式实现。
本领域技术人员在考虑说明书及实践这里申请的发明后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未申请的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围的条件下进行各种修改和改变。本公开的范围仅由所附的权利要求来限制。
以上所述仅为本公开的较佳实施例而已,并不用于限制本公开,凡在本公开的 精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本公开保护的范围之内。

Claims (15)

  1. A special effect generation method, characterized in that the method comprises:
    blurring a hair region in a target face image to obtain a blurred hair image;
    generating a texture map, a face mask map and a hair mask map according to the target face image;
    fusing the face mask map and the hair mask map to obtain a fusion mask map;
    fusing the blurred hair image and the texture map based on a fusion coefficient determined according to the fusion mask map, to obtain a special effect image of the target face image.
  2. The method according to claim 1, characterized in that determining the fusion coefficient according to the fusion mask map comprises:
    determining, based on different regions in the fusion mask map, fusion coefficients of corresponding regions in the texture map and in the blurred hair image respectively.
  3. The method according to claim 2, characterized in that fusing the blurred hair image and the texture map to obtain the special effect image of the target face image comprises:
    determining a first pixel value set according to pixel values in the texture map and the fusion coefficient of the texture map;
    determining a second pixel value set according to pixel values in the blurred hair image and the fusion coefficient of the blurred hair image;
    determining pixel values in the special effect image of the target face image based on the first pixel value set and the second pixel value set.
  4. The method according to claim 1, characterized in that, after obtaining the special effect image of the target face image, the method further comprises:
    fusing the special effect image of the target face image with an original face image to obtain an original special effect image.
  5. The method according to claim 4, characterized in that the method further comprises any one or more of the following:
    adjusting a face contour in the original special effect image based on gender information of the face in the original special effect image,
    performing beautification processing on the original special effect image based on the gender information of the face in the original special effect image.
  6. The method according to any one of claims 1 to 5, characterized in that blurring the hair region in the target face image to obtain the blurred hair image comprises:
    performing hair segmentation on the target face image to obtain a hair segmentation map;
    filling, based on the hair segmentation map, the hair region in the target face image with background pixels to obtain the blurred hair image.
  7. The method according to any one of claims 1 to 6, characterized in that generating the texture map, the face mask map and the hair mask map according to the target face image comprises:
    obtaining face key point information in the target face image;
    determining a face heat map corresponding to the target face image according to the face key point information;
    inputting the face heat map into a pre-trained deep neural network to obtain the texture map, the face mask map and the hair mask map.
  8. The method according to claim 7, characterized in that the deep neural network comprises:
    an encoder configured to perform an encoding operation on the face heat map according to a convolution filter; and
    a decoder configured to perform a decoding operation on the face heat map according to a convolution filter.
  9. The method according to claim 7 or 8, characterized in that obtaining the face key point information in the target face image comprises:
    obtaining the face key point information in the target face image based on a pre-trained face key point detection network; the face key point detection network is trained on sample face images, wherein the sample face images include face images whose face angles are greater than a preset angle threshold.
  10. The method according to any one of claims 1 to 9, characterized in that generating the texture map according to the target face image comprises:
    generating the texture map according to the target face image based on a pre-trained deep neural network;
    performing color migration processing on a face region in the texture map based on a color of a face region in the target face image, so that the color of the face region in the texture map is consistent with the color of the face region in the target face image.
  11. The method according to claim 10, characterized in that performing color migration processing on the face region in the texture map based on the color of the face region in the target face image comprises:
    obtaining a first color average value according to color values of pixels in the face region of the texture map;
    obtaining a second color average value according to color values of pixels in the face region of the target face image;
    updating the color values of the pixels in the face region of the texture map based on the first color average value and the second color average value.
  12. The method according to any one of claims 1 to 11, characterized in that, after fusing the face mask map and the hair mask map to obtain the fusion mask map, the method further comprises:
    adjusting pixel values of an eye region and a mouth region in the fusion mask map.
  13. A special effect generation apparatus, characterized in that the apparatus comprises:
    a blurring processing module, configured to blur a hair region in a target face image to obtain a blurred hair image;
    a generating module, configured to generate a texture map, a face mask map and a hair mask map according to the target face image;
    a first fusion module, configured to fuse the face mask map and the hair mask map to obtain a fusion mask map;
    a second fusion module, configured to fuse the blurred hair image and the texture map based on a fusion coefficient determined according to the fusion mask map, to obtain a special effect image of the target face image.
  14. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 12 when executing the program.
  15. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 12.
PCT/CN2021/115411 2021-01-29 2021-08-30 特效生成方法、装置、设备及存储介质 WO2022160701A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110130196.6A CN112884637B (zh) 2021-01-29 2021-01-29 特效生成方法、装置、设备及存储介质
CN202110130196.6 2021-01-29

Publications (1)

Publication Number Publication Date
WO2022160701A1 true WO2022160701A1 (zh) 2022-08-04

Family

ID=76052019

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115411 WO2022160701A1 (zh) 2021-01-29 2021-08-30 特效生成方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN112884637B (zh)
WO (1) WO2022160701A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984426A (zh) * 2023-03-21 2023-04-18 美众(天津)科技有限公司 发型演示图像的生成的方法、装置、终端及存储介质

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884637B (zh) * 2021-01-29 2023-04-07 北京市商汤科技开发有限公司 特效生成方法、装置、设备及存储介质
CN113205568B (zh) * 2021-04-30 2024-03-19 北京达佳互联信息技术有限公司 图像处理方法、装置、电子设备及存储介质
CN113592970B (zh) * 2021-07-28 2024-04-12 网易(杭州)网络有限公司 毛发造型的生成方法及装置、电子设备、存储介质
CN113673474B (zh) * 2021-08-31 2024-01-12 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
US11954828B2 (en) * 2021-10-14 2024-04-09 Lemon Inc. Portrait stylization framework using a two-path image stylization and blending
CN116152046A (zh) * 2021-11-22 2023-05-23 北京字跳网络技术有限公司 图像处理方法、装置、电子设备及存储介质
CN114219877B (zh) * 2021-12-06 2024-06-25 北京字节跳动网络技术有限公司 人像头发流动特效处理方法、装置、介质和电子设备
CN114339448B (zh) * 2021-12-31 2024-02-13 深圳万兴软件有限公司 光束视频特效的制作方法、装置、计算机设备及存储介质
CN116051386B (zh) * 2022-05-30 2023-10-20 荣耀终端有限公司 图像处理方法及其相关设备
CN115358959A (zh) * 2022-08-26 2022-11-18 北京字跳网络技术有限公司 特效图的生成方法、装置、设备及存储介质
CN115938023B (zh) * 2023-03-15 2023-05-02 深圳市皇家金盾智能科技有限公司 智能门锁人脸识别解锁方法、装置、介质及智能门锁

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160239991A1 (en) * 2013-07-25 2016-08-18 Morphotrust Usa, Llc System and Method for Creating a Virtual Backdrop
CN108198127A (zh) * 2017-11-27 2018-06-22 维沃移动通信有限公司 一种图像处理方法、装置及移动终端
US20200034996A1 (en) * 2017-10-18 2020-01-30 Tencent Technology (Shenzhen) Company Limited Image processing method, apparatus, terminal, and storage medium
CN110738595A (zh) * 2019-09-30 2020-01-31 腾讯科技(深圳)有限公司 图片处理方法、装置和设备及计算机存储介质
CN110782419A (zh) * 2019-10-18 2020-02-11 杭州趣维科技有限公司 一种基于图形处理器的三维人脸融合方法及系统
CN111192201A (zh) * 2020-04-08 2020-05-22 腾讯科技(深圳)有限公司 一种生成人脸图像及其模型训练的方法、装置及电子设备
CN111652828A (zh) * 2020-05-27 2020-09-11 北京百度网讯科技有限公司 人脸图像生成方法、装置、设备和介质
CN112116624A (zh) * 2019-06-21 2020-12-22 华为技术有限公司 一种图像处理方法和电子设备
CN112884637A (zh) * 2021-01-29 2021-06-01 北京市商汤科技开发有限公司 特效生成方法、装置、设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189340B (zh) * 2019-06-03 2022-01-21 北京达佳互联信息技术有限公司 图像分割方法、装置、电子设备及存储介质
CN111445564B (zh) * 2020-03-26 2023-10-27 腾讯科技(深圳)有限公司 人脸纹理图像生成方法、装置、计算机设备和存储介质
CN111652796A (zh) * 2020-05-13 2020-09-11 上海连尚网络科技有限公司 图像处理方法、电子设备及计算机可读存储介质

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160239991A1 (en) * 2013-07-25 2016-08-18 Morphotrust Usa, Llc System and Method for Creating a Virtual Backdrop
US20200034996A1 (en) * 2017-10-18 2020-01-30 Tencent Technology (Shenzhen) Company Limited Image processing method, apparatus, terminal, and storage medium
CN108198127A (zh) * 2017-11-27 2018-06-22 维沃移动通信有限公司 一种图像处理方法、装置及移动终端
CN112116624A (zh) * 2019-06-21 2020-12-22 华为技术有限公司 一种图像处理方法和电子设备
CN110738595A (zh) * 2019-09-30 2020-01-31 腾讯科技(深圳)有限公司 图片处理方法、装置和设备及计算机存储介质
CN110782419A (zh) * 2019-10-18 2020-02-11 杭州趣维科技有限公司 一种基于图形处理器的三维人脸融合方法及系统
CN111192201A (zh) * 2020-04-08 2020-05-22 腾讯科技(深圳)有限公司 一种生成人脸图像及其模型训练的方法、装置及电子设备
CN111652828A (zh) * 2020-05-27 2020-09-11 北京百度网讯科技有限公司 人脸图像生成方法、装置、设备和介质
CN112884637A (zh) * 2021-01-29 2021-06-01 北京市商汤科技开发有限公司 特效生成方法、装置、设备及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984426A (zh) * 2023-03-21 2023-04-18 美众(天津)科技有限公司 发型演示图像的生成的方法、装置、终端及存储介质
CN115984426B (zh) * 2023-03-21 2023-07-04 美众(天津)科技有限公司 发型演示图像的生成的方法、装置、终端及存储介质

Also Published As

Publication number Publication date
CN112884637B (zh) 2023-04-07
CN112884637A (zh) 2021-06-01

Similar Documents

Publication Publication Date Title
WO2022160701A1 (zh) 特效生成方法、装置、设备及存储介质
JP6961811B2 (ja) 画像処理のための方法および装置、ならびにコンピュータ可読記憶媒体
US11189104B2 (en) Generating 3D data in a messaging system
CN111402135B (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
US11250571B2 (en) Robust use of semantic segmentation in shallow depth of field rendering
CN108764091B (zh) 活体检测方法及装置、电子设备和存储介质
JP7413400B2 (ja) 肌質測定方法、肌質等級分類方法、肌質測定装置、電子機器及び記憶媒体
CN107771336B (zh) 基于颜色分布的图像中的特征检测和掩模
CN106682632B (zh) 用于处理人脸图像的方法和装置
WO2019019828A1 (zh) 目标对象的遮挡检测方法及装置、电子设备及存储介质
WO2023109753A1 (zh) 虚拟角色的动画生成方法及装置、存储介质、终端
US20210067756A1 (en) Effects for 3d data in a messaging system
CN111008935B (zh) 一种人脸图像增强方法、装置、系统及存储介质
CN106447604B (zh) 一种变换视频中面部画面的方法和装置
KR20220051376A (ko) 메시징 시스템에서의 3d 데이터 생성
WO2023124391A1 (zh) 妆容迁移及妆容迁移网络的训练方法和装置
CN107944420A (zh) 人脸图像的光照处理方法和装置
CN110660076A (zh) 一种人脸交换方法
CN113628327A (zh) 一种头部三维重建方法及设备
KR20200043432A (ko) 이미지 데이터에 가상 조명 조정들을 제공하기 위한 기술
WO2023066120A1 (zh) 图像处理方法、装置、电子设备及存储介质
CN116917938A (zh) 整个身体视觉效果
CN111836058B (zh) 用于实时视频播放方法、装置、设备以及存储介质
WO2020040061A1 (ja) 画像処理装置、画像処理方法及び画像処理プログラム
CN117136381A (zh) 整个身体分割

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922296

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922296

Country of ref document: EP

Kind code of ref document: A1