CN117241064A - Live-broadcast real-time face replacement method, equipment and storage medium - Google Patents

Live-broadcast real-time face replacement method, equipment and storage medium

Info

Publication number
CN117241064A
Authority
CN
China
Prior art keywords
face
areas
regions
live
replacement
Prior art date
Legal status
Granted
Application number
CN202311517511.6A
Other languages
Chinese (zh)
Other versions
CN117241064B (en)
Inventor
王文峰
杨振
温龙
李慧娟
汤星星
刘淑梅
Current Assignee
Beijing Golden Partner Technology Co ltd
Beijing Jingpaidang Technology Co ltd
Original Assignee
Beijing Golden Partner Technology Co ltd
Beijing Jingpaidang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Golden Partner Technology Co ltd and Beijing Jingpaidang Technology Co ltd
Priority to CN202311517511.6A
Publication of CN117241064A
Application granted
Publication of CN117241064B
Legal status: Active
Anticipated expiration


Landscapes

  • Image Processing (AREA)

Abstract

The invention relates to the technical field of face replacement. The live real-time face replacement method comprises: obtaining a first face model; dividing the first face model into regions to obtain N first face regions; sorting the N first face regions according to their corresponding weight values to obtain N' first face regions with the weight values arranged in order from small to large; acquiring a first frame image in a live video stream and obtaining a second face in the first frame image; dividing the second face into regions according to the N first face regions to obtain M second face regions, wherein the M second face regions are matched with the N first face regions; gradually replacing the M second face regions according to the N' first face regions to obtain a target face, wherein the target face represents the second face at least partly replaced by the first face; and applying the target face to the live video stream to achieve live real-time face replacement.

Description

Live-broadcast real-time face replacement method, equipment and storage medium
Technical Field
The present invention relates to the field of face replacement technologies, and in particular, to a method, an apparatus, and a storage medium for live real-time face replacement.
Background
Face swapping is an artificial-intelligence image processing technique that extracts the facial information of one person and matches it with the facial information of another, generating a new face image that combines the features of both. In recent years, with the development of deep learning, neural-network-based face synthesis algorithms have become mainstream.
However, viewers can still perceive the switch when the face is changed during a live video. Because the human eye exhibits persistence of vision, even though most current face-changing methods add computing power to keep increasing the swap speed, the change remains clearly perceptible to the user.
Disclosure of Invention
The application provides a live face replacement method, a device and a storage medium, which are used to address the problem that, owing to the persistence of vision of the human eye, existing face replacement is perceptible to viewers.
The first aspect of the application provides a live face replacement method, which comprises the following steps:
acquiring a first face model;
performing region division on the first face model to obtain N first face regions;
sorting according to the weight values corresponding to the N first face regions to obtain N' first face regions with the weight values arranged in sequence from small to large;
acquiring a first frame image in a live video stream, and obtaining a second face in the first frame image;
dividing the second face into areas according to the N first face areas to obtain M second face areas, wherein the M second face areas are matched with the N first face areas;
gradually replacing M second face areas according to the N' first face areas to obtain a target face, wherein the target face represents that at least part of the second faces are replaced by the first faces;
and applying the target face to the live video stream to obtain the live real-time face replacement.
In some implementations, the step of obtaining the first face model includes:
acquiring a plurality of first images of different poses, expressions and illumination of a first face;
calibrating and aligning facial feature points in a plurality of first images by using a generative adversarial network, and determining consistency of facial feature point positions, wherein the generative adversarial network comprises a training generator network and a training discriminator network;
adding random noise into the training generator network to obtain a first training image, wherein the first training image represents an image of the first face obtained by adding the random noise into the first face image;
The training discriminator network discriminates the first training image to obtain discrimination results;
according to the judging result, the training generator network is adjusted to obtain a second training image, and the second training image represents the image of the first face obtained by adding random noise into the image of the first face;
and adjusting the training discriminator network according to the discrimination result, and discriminating the second training image to obtain the first face model.
In some implementations, the step of performing region division on the first face model to obtain N first face regions includes:
acquiring facial feature points of the first face model;
and carrying out region division on the facial feature points according to the distribution of the facial feature points to obtain the N first face regions.
In some embodiments, the step of sorting according to the weight values corresponding to the N first face regions to obtain N' first face regions with weight values arranged in order from small to large includes:
obtaining a weight value of each first face region according to a weight average value of facial feature points contained in each first face region;
and sequencing from small to large according to the weight values corresponding to the N first face regions to obtain N' first face regions with the weight values sequentially arranged from small to large.
In some implementations, the step of obtaining a first frame image in the live video stream to obtain a second face in the first frame image includes:
in a live video process, acquiring a first frame image in a live video stream, wherein the first frame image represents a certain frame image in the live video stream;
and carrying out face recognition on the first frame image to obtain the second face in the first frame image.
In some implementations, the step of dividing the second face into regions according to the N first face regions to obtain M second face regions includes:
mirror-imaging the N first face areas and the second face areas into a coordinate system respectively to obtain coordinates corresponding to the N first face areas and the second face areas respectively;
and carrying out division matching on the coordinates corresponding to the second face according to the coordinates of each first face region to obtain M second face regions.
In some implementations, the step of gradually replacing the M second face areas according to the N' first face areas to obtain the target face includes:
gradually replacing the N 'first face regions with the M second face regions according to the sequence from small to large of the weight values of the N' first face regions, so as to obtain the gradually replaced second faces;
and obtaining the target face with the replacement sequence according to the second face replaced step by step.
In some implementations, the step of applying the target face to the live video stream to obtain the live real-time face replacement includes:
gradually applying the target face with the replacement sequence to the live video stream to obtain a live video stream gradually replaced by the target face;
and in the live video stream gradually replaced by the target face, when the second face is completely replaced by the first face, obtaining the live real-time face replacement.
A second aspect of the present application provides a computer device comprising a memory storing a computer program and a processor implementing the aforementioned method of live real-time face replacement when executing the computer program.
A third aspect of the present application provides a computer storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of live real-time face replacement described previously.
The application has the beneficial effects that:
the application provides a live-broadcast real-time face replacement method, equipment and a storage medium, wherein a first face model is acquired; performing region division on the first face model to obtain N first face regions; secondly, sorting is carried out according to the weight values corresponding to the N first face regions, so that N' first face regions with the weight values arranged in sequence from small to large are obtained; then, a first frame image in the live video stream is obtained, and a second face in the first frame image is obtained; next, according to the N first face areas, carrying out area division on the second face to obtain M second face areas; finally, gradually replacing M second face areas according to the N' first face areas to obtain a target face; and finally, applying the target face to the live video stream to obtain live real-time face replacement. Through the method, the first face model is divided, the divided first face model is utilized to gradually replace the second face, namely, the second face is gradually replaced by the first face model in the live video playing process, so that the user has smaller perception of face replacement and even cannot perceive that the second face is replaced in the gradual replacement process, and the perception influence on the user when the face is suddenly replaced is effectively reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for live face replacement in real time according to the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more, unless explicitly defined otherwise. Furthermore, the terms "mounted," "connected," and "coupled" are to be construed broadly and may denote, for example, a fixed connection, a detachable connection, or an integral connection; a mechanical or electrical connection; a direct connection or an indirect connection through an intermediate medium, or the communication between two elements. The specific meaning of the above terms in the present application will be understood by those of ordinary skill in the art according to the specific circumstances.
The following description of some of the terms involved in the present application is provided to make the solution of the present application clear.
A Generative Adversarial Network (GAN) is a framework composed of two neural network models: a Generator network and a Discriminator network. The goal of a GAN is to generate realistic new samples that resemble real data by training the two networks against each other.
The model-fitting-based method models the facial feature point localization problem as an optimization problem that minimizes a fitting error, and obtains the facial feature point positions by solving that optimization problem.
ASM-based methods align facial feature points using an Active Shape Model. An ASM is built on PCA; parameters such as the weights and variances of the model and of each feature point are learned from a face training set, and those parameters are then used to match the target face. The ASM method handles deformation well and achieves high alignment precision and efficiency.
The Active Shape Model is a shape-modelling-based method for identifying and matching the contours of faces or other objects. The model performs Principal Component Analysis (PCA) on the contour data in a training set to obtain eigenvectors and eigenvalues and builds a statistical model that describes contour variation. During the test phase, the ASM automatically adjusts the model parameters to accommodate a new input image.
The expert scoring method is a common quantitative research method in which experts score, rank and grade the facial feature points to obtain their weights.
Haar Cascade refers to a facial feature point detection method based on Haar wavelets; it uses a cascade classifier and the AdaBoost algorithm to locate face positions and facial feature points quickly and accurately.
HOG (Histogram of Oriented Gradients) is a feature extraction method based on local histograms of oriented gradients and is generally more accurate than Haar Cascade.
3DDFA is a face detection and facial feature point localization method based on 3D deep learning; it can perform pose estimation, shape recovery and texture reconstruction of a face.
A deep Convolutional Neural Network (CNN) is a deep learning model widely used in computer vision and image processing. It learns and extracts image features by combining multiple convolutional layers, pooling layers and fully connected layers.
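As a concrete illustration of the detector terms above, the following minimal sketch runs OpenCV's bundled Haar Cascade face detector on a single image. It is offered only to make the glossary concrete and is not part of the patented method; the image path and detection parameters are assumptions.

import cv2

# Haar Cascade model shipped with the opencv-python package.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("anchor_frame.jpg")  # hypothetical input frame
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors are typical values, not prescribed by the patent.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)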
At present, face-changing technology can already swap a real person's face, for example changing A's face to B's face in a video. Now that the technology is mature, most work emphasizes the speed of the swap and the realism after it. The human eye exhibits persistence of vision when watching a sequence of pictures: although each image in a film is independent, when 24 or more frames are shown per second the viewer perceives one second of video as continuous, which is exactly the property that film relies on. However, it is also this property that makes an abrupt face change perceptible, mainly because the human eye retains some continuous perception of a continuously changing image.
The application provides a live face replacement method.
The first aspect of the application provides a live face replacement method, which comprises the following steps:
as shown in fig. 1, S100, a first face model is acquired.
To replace a face in real time during a live broadcast, the first face model is obtained first, which facilitates the subsequent real-time replacement. Specifically, acquiring the first face model includes steps S101 to S106.
S101, acquiring a plurality of first images of different poses, expressions and illumination of a first face.
The first face, that is, the face that will be used for the replacement, is determined first; it may be the anchor's own face or someone else's face. It should be noted that using the anchor's own face as the first face can enhance the live broadcast effect: an anchor usually grooms carefully before going live, for example using cosmetics to cover wrinkles, which obviously takes a long time, and the anchor cannot guarantee doing so before every broadcast. In this case, face image information of the carefully groomed anchor, that is, a plurality of first images of the first face under different poses, expressions and illumination, can be captured by an imaging device. Information about the first face at different angles and with different expressions is then obtained from these first images, which makes the first face model built from them more lifelike.
S102, calibrating and aligning the facial feature points in the first images by using the generative adversarial network, and determining the consistency of the facial feature point positions.
The generative adversarial network includes a training generator network and a training discriminator network.
Specifically, the facial feature points in the plurality of first images are calibrated and aligned using the generative adversarial network so that the facial feature point positions of each first image are consistent. This ensures that the positions and shapes of the eyes, nose, mouth and other parts corresponding to the facial feature points are consistent across the first images, and that the pose and position of the first face are adjusted to be consistent over the plurality of first images, which improves the accuracy of face recognition. Information such as the expression changes of the first face can then be obtained from the differences in facial feature points between two first images. That is, with calibration and alignment, the facial pose information of the first face, i.e., the angle and rotation of the face in the image, can be derived so that the generative adversarial network can estimate the motion of the first face. In addition, by calibrating the facial feature points, changes in facial expression can be quantified and described, enabling expression analysis and recognition.
Facial feature point calibration may be performed with a conventional calibration method, for example the model-fitting-based method described above, and facial feature point alignment may be performed with an ASM-based method.
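As an illustration of the calibration and alignment step, the sketch below detects 68 facial feature points with dlib and applies a simple similarity alignment onto a reference point set. dlib, the standard shape_predictor_68_face_landmarks.dat model file and the alignment scheme are assumptions chosen for illustration; the patent does not prescribe a particular toolkit.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks(image_gray):
    # Return a (68, 2) array of feature points for the first detected face, or None.
    rects = detector(image_gray, 1)
    if not rects:
        return None
    shape = predictor(image_gray, rects[0])
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float64)

def align(points_src, points_ref):
    # Procrustes-style alignment: translate and scale points_src onto points_ref
    # so that feature point positions are consistent across the first images.
    src = points_src - points_src.mean(axis=0)
    ref = points_ref - points_ref.mean(axis=0)
    scale = np.linalg.norm(ref) / np.linalg.norm(src)
    return src * scale + points_ref.mean(axis=0)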
And S103, adding random noise into the training generator network to obtain a first training image.
The first training image represents an image of a first face obtained by adding random noise to the first face image.
S104, discriminating the first training image with the training discriminator network to obtain a discrimination result.
And S105, adjusting the training generator network according to the discrimination result to obtain a second training image.
The second training image represents an image of the first face obtained by adding random noise to the first face image.
And S106, adjusting a training discriminator network according to the discrimination result, and discriminating the second training image to obtain the first face model.
In step S103, random noise is added by the training generator network to obtain a first training image, that is, the first training image is a first face image containing random noise. The first training image is then input into the training discriminator network, which discriminates it to obtain a discrimination result; the result is either that the noise in the first training image is recognized or that it cannot be recognized. When the noise in the first training image is recognized, the first face model is considered complete; when it cannot be recognized, the first face model still needs training. In that case the parameters of the training discriminator network are adjusted, where adjusting the parameters can be understood as adjusting, and thereby continually improving, the initial first face model. That is, the training discriminator network parameters are adjusted so that this random noise can be identified. The training generator network is then adjusted to obtain a second training image, and the training discriminator network is trained again with the second training image. The second training image may be the same as the first training image, so that the discriminator with adjusted parameters can recognize the noise in the first training image, or it may contain noise different from the random noise in the first training image. Training of the discriminator network is repeated in this way, and when the noise that the training generator network adds can be recognized by the training discriminator network every time, the initial training of the first face model is complete and the final first face model is obtained.
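The following is a minimal PyTorch sketch of the training loop described above: the generator injects random noise into first-face images and the discriminator learns to tell noised images from clean ones. The network sizes, optimisers and 64x64 grayscale input format are assumptions for illustration, not values taken from the patent.

import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    # Training generator network: perturbs a first-face image with learned noise.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(64 * 64, 256), nn.ReLU(),
                                 nn.Linear(256, 64 * 64), nn.Tanh())

    def forward(self, face):  # face: (B, 64*64), pixel values in [0, 1]
        return (face + 0.1 * self.net(face)).clamp(0, 1)

class Discriminator(nn.Module):
    # Training discriminator network: outputs a logit for "contains added noise".
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1))

    def forward(self, face):
        return self.net(face)

gen, disc = NoiseGenerator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(clean_faces):  # clean_faces: (B, 64*64) tensor of first-face images
    noised = gen(clean_faces)
    # Discriminator update: label noised images 1, clean images 0.
    d_loss = (bce(disc(noised.detach()), torch.ones(len(clean_faces), 1)) +
              bce(disc(clean_faces), torch.zeros(len(clean_faces), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: try to make the noised image pass as clean.
    g_loss = bce(disc(gen(clean_faces)), torch.zeros(len(clean_faces), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()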
And S200, carrying out region division on the first face model to obtain N first face regions.
When the second face is replaced by the first face, a one-shot replacement easily feels abrupt to the user. The first face model is therefore divided into N first face regions, and the subsequent steps replace the face region by region, which greatly reduces the degree to which the user perceives the replacement and lessens or removes the sense of abruptness.
Specifically, obtaining N first face regions includes steps S201 and S202.
S201, facial feature points of the first face model are acquired.
S202, dividing the facial feature points into areas according to the distribution of the facial feature points to obtain N first face areas.
The facial feature points are obtained with the feature point methods mentioned above. Each facial feature point is then assigned a weight according to how strongly a change at that point affects the face. Illustratively, changes to the eyes and mouth affect the face the most, that is, they are the changes most easily perceived by the user, so the facial feature points of the eyes and mouth are given higher weights, while positions whose changes affect the face less than the eyes and mouth, for example the chin and forehead, are given lower weights.
Specifically, after the whole face area is obtained, the forehead region, eye region, nose region, eyebrow region, cheek region, lip region, chin region and neck region within it are determined, giving the N first face regions, where N denotes the number of regions. Illustratively, the forehead region is the area from the hairline down to the upper edge of the eyebrows. The eye region includes the eye sockets, the upper and lower eyelids, the eyeballs, the eyelashes and the area around the eyes. The nose region includes the bridge, wings and tip of the nose and the nostrils. The eyebrow region lies above the eye sockets, from the upper edge of the eyebrows down to the eye sockets. The cheek region is the area between the maxilla and the mandible, covering both cheeks and the sides of the face. The lip region includes the upper and lower lips and the area around the mouth. The chin region is the area between the mandible and the lower lip. The neck region extends from the mandible to the collarbones and may optionally be included in the replacement as needed.
In addition, the boundaries between the forehead, eye, nose, eyebrow, cheek, lip, chin and neck regions may be determined according to physiological structure and facial features. For example, if a muscle lies on the boundary between the eye region and the cheek region, it can be assigned to the appropriate regions according to the physiology of the face, so that the parts of that muscle are replaced together when regions are replaced in the later steps. As another example, the face may carry a mole, tattoo or scar; such a feature can be assigned its own region by the expert scoring method. Because a mole is easily noticed by the user, the region containing it is given a higher weight, and since the weights determine the replacement order, the replacement of that sub-region is deferred.
The sizes of the forehead, eye, nose, eyebrow, cheek, lip, chin and neck regions may be adjusted according to the actual situation, and the application is not limited in this respect. Furthermore, each of these divided regions may, as needed, serve as a parent region that is further divided into several sub-regions, which further reduces the user's perception of the face change: a parent region is replaced by gradually replacing its sub-regions. The sub-regions under each parent region are likewise replaced in order of their weights, and when weights are equal, contiguous sub-regions are replaced first, as sketched below.
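The following sketch shows one way the parent regions above could be expressed against the common 68-point landmark layout and split into sub-regions. The dlib/iBUG index ranges are an assumption made for illustration; the patent does not fix a landmark scheme, and the forehead and neck have no points in the 68-point set and would need to be extrapolated.

# Parent regions mapped to 68-point landmark indices (dlib/iBUG convention, assumed).
PARENT_REGIONS = {
    "eyebrows": list(range(17, 27)),
    "eyes": list(range(36, 48)),
    "nose": list(range(27, 36)),
    "lips": list(range(48, 68)),
    "cheeks_and_chin": list(range(0, 17)),  # jaw line shared by cheeks and chin
    "forehead": [],  # not covered by the 68-point set; would be extrapolated
    "neck": [],      # optional region, see the text above
}

def split_into_subregions(indices, parts=2):
    # Split a parent region into roughly equal sub-regions for gradual replacement.
    step = max(1, len(indices) // parts)
    return [indices[i:i + step] for i in range(0, len(indices), step)]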
S300, sorting according to the weight values corresponding to the N first face regions, and obtaining N' first face regions with the weight values arranged in sequence from small to large.
The step of obtaining N' first face regions with weight values sequentially arranged from small to large specifically includes steps S301 and S302.
S301, obtaining the weight value of each first face region according to the weight average value of the facial feature points contained in each first face region.
S302, sorting from small to large according to the weight values corresponding to the N first face regions, and obtaining N' first face regions with weight values arranged from small to large.
The facial feature points may be extracted with a facial feature point detection algorithm, for example HOG (Histogram of Oriented Gradients), 3DDFA, a deep Convolutional Neural Network (CNN) or Haar Cascade. The weights of the facial feature points can be obtained by training a deep-learning model built from a deep CNN on a large-scale face data set so that it learns the positions and weights of the feature points; the face used in this application is then fed to that model to obtain the positions of its facial feature points and the corresponding weights. That is, the application relies on conventional methods for extracting facial feature points and computing their weights, and is not limited in this respect.
Next, the weighted average for each first face region is computed; that is, if a first face region contains 5 facial feature points whose weights are known, the average of those 5 weights is taken as the weight value of that region. The weight values of the N first face regions are then sorted from small to large, giving the N' first face regions with weight values arranged in order from small to large, where N' corresponds to N.
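As a sketch of steps S301 and S302, the weight of each first face region can be taken as the mean weight of the feature points it contains, with the regions then sorted in ascending order of weight. The point weights would come from the CNN-based model mentioned above; the helper names here are illustrative only.

import numpy as np

def region_weights(regions, point_weights):
    # regions: {region name: [landmark indices]}; point_weights: {index: weight}.
    return {
        name: float(np.mean([point_weights[i] for i in idx])) if idx else 0.0
        for name, idx in regions.items()
    }

def replacement_order(regions, point_weights):
    # Region names sorted from smallest to largest weight value (the N' order).
    weights = region_weights(regions, point_weights)
    return sorted(weights, key=weights.get)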
S400, acquiring a first frame image in the live video stream, and obtaining a second face in the first frame image.
The first frame image is an image with a second face in the live video stream.
Specifically, obtaining the second face in the first frame image includes steps S401 and S402.
S401, in the live video process, acquiring a first frame image in a live video stream.
Wherein the first frame image represents a certain frame image in the live video stream.
S402, face recognition is carried out on the first frame image, and a second face in the first frame image is obtained.
The first frame image may be the image at some moment in the live video stream, for example at 2 minutes or 30 minutes; the application is not limited in this respect. After the first frame image is obtained, the facial features in it are recognized to obtain the data of the second face. The method used to recognize the facial features in the first frame image may be any of the facial feature point recognition methods above, and the application is not limited in this respect.
It should be noted that the second face is the face to be replaced by the first face. When the first face is the anchor's own carefully groomed face, at least some of its features will match the second face, so the feature points obtained from the first face can be matched against the faces recognized in the first frame image to find the second face. If the second face does not belong to the same person as the first face, the faces in the first frame image are recognized first, and the face to be replaced is then selected among them as the second face. The second face is thus replaced in the video stream from the first frame image of the live video stream onward.
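A sketch of steps S401 and S402 under these assumptions: one frame is pulled from the live stream and the second face is located by comparing the faces found in it against the first face's features, covering the case where the first face belongs to the anchor. The face_recognition library, the reference image path and the stream URL are illustrative choices, not taken from the patent.

import cv2
import face_recognition

ref_image = face_recognition.load_image_file("first_face.jpg")  # hypothetical path
ref_encoding = face_recognition.face_encodings(ref_image)[0]

cap = cv2.VideoCapture("rtmp://example.com/live/stream")  # hypothetical stream URL
ok, frame_bgr = cap.read()  # the "first frame image"
assert ok, "failed to read a frame from the live stream"
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

locations = face_recognition.face_locations(frame_rgb)
encodings = face_recognition.face_encodings(frame_rgb, locations)

# Pick the face most similar to the first face as the second face to replace.
distances = face_recognition.face_distance(encodings, ref_encoding)
second_face_box = locations[int(distances.argmin())] if len(encodings) else None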
S500, dividing the second face into areas according to the N first face areas to obtain M second face areas.
Wherein the M second face regions are matched with the N first face regions.
Specifically, obtaining M second face regions includes steps S501 and S502.
S501, mirroring the N first face areas and the second face areas into a coordinate system respectively to obtain coordinates corresponding to the N first face areas and the second face areas respectively.
S502, dividing and matching the coordinates corresponding to the second face according to the coordinates of each first face area to obtain M second face areas.
After the N first face regions and the second face are obtained in the preceding steps, they are placed in the same coordinate system, from which the coordinates of the N first face regions and of the second face are obtained. The edge coordinates of each first face region and the edge coordinates of the second face are then read from this coordinate system, and the edge coordinates of each first face region are matched against those of the second face; in this process, if a first face region is larger in area than the corresponding part of the second face, the first face region is scaled so that it matches the area of the second face as closely as possible.
The facial feature points of the second face are then extracted, using any of the feature point extraction methods mentioned above, and the facial feature points of the second face regions and of the first face regions are calibrated and aligned, so that when a given region of the second face is to be replaced, the corresponding position can be found on the second face. That is, each of the N first face regions is put into correspondence with a second face region, yielding the M second face regions of the second face. For example, if the first face regions are numbered 1 to N and the second face regions 1 to M, the 1-to-N first face regions correspond one-to-one with the 1-to-M second face regions.
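The matching of steps S501 and S502 might look like the following sketch, which places matched regions in a common pixel coordinate frame and scales the first face's region patch to the corresponding area of the second face. Representing each region by its bounding box is an assumed simplification of the edge-coordinate matching described above.

import cv2
import numpy as np

def region_bbox(points):
    # Axis-aligned bounding box (x, y, w, h) of a region's landmark points.
    x, y = points.min(axis=0).astype(int)
    x2, y2 = points.max(axis=0).astype(int)
    return int(x), int(y), int(x2 - x), int(y2 - y)

def match_region(first_img, first_pts, second_pts):
    # Return the second face's target box and the first face's patch scaled to it.
    fx, fy, fw, fh = region_bbox(first_pts)
    sx, sy, sw, sh = region_bbox(second_pts)
    patch = first_img[fy:fy + fh, fx:fx + fw]
    resized = cv2.resize(patch, (sw, sh))  # scale to match the second face's area
    return (sx, sy, sw, sh), resized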
S600, gradually replacing the M second face areas according to the N' first face areas to obtain the target face.
Wherein the target face represents the second face in which at least part has been replaced by the first face.
Specifically, obtaining the target face includes steps S601 to S602.
S601, gradually replacing the N 'first face regions with the M second face regions according to the sequence of the weight values of the N' first face regions from small to large, and obtaining gradually replaced second faces.
S602, obtaining a target face with a replacement sequence according to the second face replaced step by step.
With the 1-to-N first face regions and the 1-to-M second face regions obtained above in one-to-one correspondence, during the live broadcast after the first frame image of the live video the M second face regions are replaced step by step, at a preset interval, in ascending order of the weight values of the N' first face regions, where "step by step" means one region at a time.
Note that the region of the M second face regions currently being replaced is separated from the previously replaced region by a preset time, which can be set as needed; if it is set to 30 seconds, for example, one of the M second face regions is replaced every 30 seconds. The target face with a replacement order therefore denotes the 1-to-M second face regions being replaced step by step in ascending order of weight value. Compared with existing face replacement, which is completed in a single pass, the application splits the process into several region replacements separated by time intervals, and this replacement mode can also save computing power to a certain extent.
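A sketch of the gradual replacement schedule described above: regions are processed in ascending weight order, with one additional region swapped in after each preset interval (30 seconds in the example). The frame source and the replace_region compositing function are placeholders for illustration, not APIs defined by the patent.

import time

def gradual_replacement(frame_source, ordered_regions, replace_region,
                        interval_s=30.0):
    # ordered_regions: region identifiers sorted by ascending weight (the N' order).
    replaced = []
    next_switch = time.monotonic() + interval_s
    for frame in frame_source:  # frames of the live video stream
        if len(replaced) < len(ordered_regions) and time.monotonic() >= next_switch:
            replaced.append(ordered_regions[len(replaced)])  # swap in one more region
            next_switch = time.monotonic() + interval_s
        for region in replaced:  # re-apply every region replaced so far
            frame = replace_region(frame, region)
        yield frame  # the target face is applied gradually, frame by frame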
And S700, applying the target face to the live video stream to obtain live real-time face replacement.
The step of obtaining live real-time face substitution includes steps S701 and S702.
S701, gradually applying the target face with the replacement sequence to the live video stream to obtain a live video stream gradually replaced by the target face;
S702, in the live video stream gradually replaced by the target face, when the second face is completely replaced by the first face, live real-time face replacement is obtained.
After the first frame image of the live video stream, the second face in the live video is gradually replaced by the target face at the preset interval, finally achieving live real-time face replacement. That is, in the live video after the first frame image, one face region is replaced per interval and the first face is eventually presented in the live video, while during the interval between two replacements the live stream shows an image in which the second face is partially replaced by the first face. The process is slow and not easily perceived by the user, reducing the user's sense of abruptness at the face replacement.
In some embodiments, when the first face is the anchor's own face, the lighting facing the anchor during the live video may make the skin brightness of the anchor's second face differ from that of the first face model. In steps S100 to S700, the skin colour of the replaced areas is therefore kept the same as that of the not-yet-replaced areas, and after step S700 the method further includes step S800.
S800, gradually adjusting the skin color of the first face within a set time period, and obtaining the skin color of the first face which is the same as the skin color of the first face model after the set time period.
For example, if the skin colour of the first face model corresponds to a value of 100 and the current skin colour of the first face is 0, the set period is divided into 100 time points and the skin colour of the first face is adjusted once at each time point; at the 1st time point, for instance, it is adjusted from 0 to 1. In this way, after the set period the skin colour of the first face becomes the same as that of the first face model.
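A sketch of step S800 under these assumptions: the replaced face's tone is moved toward the first face model's tone in equal increments over the set period. The 100-step split mirrors the example above; blending in LAB colour space is an assumed implementation detail, not specified by the patent.

import cv2
import numpy as np

def adjust_skin_tone(face_bgr, target_bgr_mean, step, total_steps=100):
    # Shift the face's mean colour toward target_bgr_mean by step/total_steps.
    alpha = step / float(total_steps)  # 0.0 keeps the current tone, 1.0 reaches the target
    lab = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    target_lab = cv2.cvtColor(
        np.uint8([[target_bgr_mean]]), cv2.COLOR_BGR2LAB)[0, 0].astype(np.float32)
    current_mean = lab.reshape(-1, 3).mean(axis=0)
    lab += alpha * (target_lab - current_mean)  # move toward the target tone
    return cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)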
In steps S100 to S700, if the second face belongs to the same person as the first face model, the second face may simply be replaced with the carefully groomed face of the first face model; if the second face is not the same person as the first face model, steps S100 to S700 are carried out as described above.
A second aspect of the present application provides a computer device comprising a memory storing a computer program and a processor implementing the method of live real-time face replacement described above when executing the computer program.
A third aspect of the present application provides a computer storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of live real-time face replacement described above.
In the description of the embodiments of the present invention, those skilled in the art will appreciate that the embodiments of the present invention may be implemented as a method, an apparatus, an electronic device, and a computer-readable storage medium. Thus, embodiments of the present invention may be embodied in the following forms: complete hardware, complete software (including firmware, resident software, micro-code, etc.), a combination of hardware and software. Furthermore, in some embodiments, embodiments of the invention may also be implemented in the form of a computer program product in one or more computer-readable storage media having computer program code embodied therein.
Any combination of one or more computer-readable storage media may be employed. The computer-readable storage medium includes: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium include the following: a portable computer diskette, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), Flash Memory, an optical fiber, Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any combination thereof. In embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer program code embodied in the computer readable storage medium may be transmitted using any appropriate medium, including: wireless, wire, fiber optic cable, radio Frequency (RF), or any suitable combination thereof.
Computer program code for carrying out operations of embodiments of the present invention may be written in assembly instructions, instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or in one or more programming languages, or combinations thereof, including an object oriented programming language such as: java, smalltalk, C ++, also include conventional procedural programming languages, such as: c language or similar programming language. The computer program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of remote computers, the remote computers may be connected via any sort of network, including: a Local Area Network (LAN) or a Wide Area Network (WAN), which may be connected to the user's computer or to an external computer.
The embodiment of the invention describes a method, a device and electronic equipment through flowcharts and/or block diagrams.
It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions. These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer readable program instructions may also be stored in a computer readable storage medium that can cause a computer or other programmable data processing apparatus to function in a particular manner. Thus, instructions stored in a computer-readable storage medium produce an instruction means which implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The terms first and second and the like in the description and in the claims of embodiments of the invention, are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present invention, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method for live real-time face replacement, the method comprising:
acquiring a first face model;
performing region division on the first face model to obtain N first face regions;
sorting according to the weight values corresponding to the N first face regions to obtain N' first face regions with the weight values arranged in sequence from small to large;
acquiring a first frame image in a live video stream, and obtaining a second face in the first frame image;
dividing the second face into areas according to the N first face areas to obtain M second face areas, wherein the M second face areas are matched with the N first face areas;
gradually replacing M second face areas according to the N' first face areas to obtain a target face, wherein the target face represents that at least part of the second faces are replaced by the first faces;
and applying the target face to the live video stream to obtain the live real-time face replacement.
2. The method of live real-time face replacement of claim 1, wherein the step of obtaining a first face model comprises:
acquiring a plurality of first images of different poses, expressions and illumination of a first face;
calibrating and aligning facial feature points in a plurality of first images by using a generative adversarial network, and determining consistency of facial feature point positions, wherein the generative adversarial network comprises a training generator network and a training discriminator network;
adding random noise into the training generator network to obtain a first training image, wherein the first training image represents an image of the first face obtained by adding the random noise into the first face image;
the training discriminator network discriminates the first training image to obtain discrimination results;
according to the judging result, the training generator network is adjusted to obtain a second training image, and the second training image represents the image of the first face obtained by adding random noise into the image of the first face;
and adjusting the training discriminator network according to the discrimination result, and discriminating the second training image to obtain the first face model.
3. The method for replacing live real-time face according to claim 1, wherein the step of dividing the first face model into N first face areas includes:
acquiring facial feature points of the first face model;
and carrying out region division on the facial feature points according to the distribution of the facial feature points to obtain the N first face regions.
4. The method for replacing live real-time face according to claim 3, wherein the step of sorting according to the weight values corresponding to the N first face regions to obtain N' first face regions with weight values arranged in order from small to large comprises:
obtaining a weight value of each first face region according to a weight average value of facial feature points contained in each first face region;
and sequencing from small to large according to the weight values corresponding to the N first face regions to obtain N' first face regions with the weight values sequentially arranged from small to large.
5. The method for replacing live real-time faces according to claim 1, wherein the step of obtaining a first frame image in a live video stream to obtain a second face in the first frame image comprises:
in a live video process, acquiring a first frame image in a live video stream, wherein the first frame image represents a certain frame image in the live video stream;
and carrying out face recognition on the first frame image to obtain the second face in the first frame image.
6. The method for replacing live real-time face according to claim 1, wherein the step of dividing the second face into M second face regions according to the N first face regions includes:
mirror-imaging the N first face areas and the second face areas into a coordinate system respectively to obtain coordinates corresponding to the N first face areas and the second face areas respectively;
and carrying out division matching on the coordinates corresponding to the second face according to the coordinates of each first face region to obtain M second face regions.
7. The method for replacing live real-time faces according to claim 1, wherein the step of gradually replacing M second face areas according to the N' first face areas to obtain a target face includes:
gradually replacing the N 'first face regions with the M second face regions according to the sequence from small to large of the weight values of the N' first face regions, so as to obtain the gradually replaced second faces;
and obtaining the target face with the replacement sequence according to the second face replaced step by step.
8. The method of live real-time face replacement of claim 7, wherein the step of applying the target face to the live video stream to obtain the live real-time face replacement comprises:
gradually applying the target face with the replacement sequence to the live video stream to obtain a live video stream gradually replaced by the target face;
and in the live video stream gradually replaced by the target face, when the second face is completely replaced by the target face, obtaining the live real-time face replacement.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of live real-time face replacement of any of claims 1 to 7 when the computer program is executed.
10. A computer storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method of live real-time face replacement of any of claims 1 to 7.
CN202311517511.6A 2023-11-15 2023-11-15 Live-broadcast real-time face replacement method, equipment and storage medium Active CN117241064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311517511.6A CN117241064B (en) 2023-11-15 2023-11-15 Live-broadcast real-time face replacement method, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311517511.6A CN117241064B (en) 2023-11-15 2023-11-15 Live-broadcast real-time face replacement method, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117241064A true CN117241064A (en) 2023-12-15
CN117241064B CN117241064B (en) 2024-03-19

Family

ID=89082997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311517511.6A Active CN117241064B (en) 2023-11-15 2023-11-15 Live-broadcast real-time face replacement method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117241064B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681719A (en) * 2018-05-21 2018-10-19 北京微播视界科技有限公司 Method of video image processing and device
CN111583105A (en) * 2020-05-14 2020-08-25 厦门美图之家科技有限公司 Portrait generation method, device, equipment and storage medium
CN112307923A (en) * 2020-10-30 2021-02-02 北京中科深智科技有限公司 Partitioned expression migration method and system
CN112949605A (en) * 2021-04-13 2021-06-11 杭州欣禾圣世科技有限公司 Semantic segmentation based face makeup method and system
WO2022237081A1 (en) * 2021-05-14 2022-11-17 北京市商汤科技开发有限公司 Makeup look transfer method and apparatus, and device and computer-readable storage medium
WO2023124391A1 (en) * 2021-12-30 2023-07-06 上海商汤智能科技有限公司 Methods and apparatuses for makeup transfer and makeup transfer network training


Also Published As

Publication number Publication date
CN117241064B (en) 2024-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant