CN110267079B - Method and device for replacing human face in video to be played - Google Patents

Info

Publication number: CN110267079B
Application number: CN201810276537.9A
Authority: CN (China)
Prior art keywords: face, dimensional model, video, coordinates, played
Legal status: Active (application granted)
Other languages: Chinese (zh)
Other versions: CN110267079A (en)
Inventors: 陈宇, 李洋, 孙晓雨, 高杨, 闫富国, 陈曦
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN201810276537.9A

Classifications

    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T 15/00 3D [Three Dimensional] image rendering → G06T 15/04 Texture mapping
        • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
        • G06T 19/00 Manipulating 3D models or images for computer graphics → G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
        • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data → G06V 40/10 Human or animal bodies; body parts → G06V 40/16 Human faces, e.g. facial parts, sketches or expressions → G06V 40/161 Detection; Localisation; Normalisation → G06V 40/165 using facial parts and geometric relationships
    • H ELECTRICITY → H04 ELECTRIC COMMUNICATION TECHNIQUE → H04N PICTORIAL COMMUNICATION, e.g. TELEVISION → H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD] → H04N 21/40 Client devices specifically adapted for the reception of or interaction with content → H04N 21/43 Processing of content or additional data
        • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
        • H04N 21/44008 Processing of video elementary streams, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
        • H04N 21/4402 Processing of video elementary streams, involving reformatting operations of video signals for household redistribution, storage or real-time display

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Architecture (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides a method and a device for replacing a human face in a video to be played. The method comprises the following steps: identifying a first face from decoded frame data of a video to be played; performing three-dimensional modeling with the identified key points of the first face as vertices to obtain a three-dimensional model, and keeping the vertex coordinates of the three-dimensional model synchronized with the key point coordinates of the first face in the video to be played; acquiring a second face; and applying the acquired second face as a texture to the three-dimensional model. The present disclosure thus provides a technique for replacing a face in a video with another person's face that can restore features such as the orientation and expression of the face in the video.

Description

Method and device for replacing human face in video to be played
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for replacing a face in a video to be played.
Background
In current face image editing, face replacement, commonly called "face changing", is often required. Current "face changing" is mainly aimed at face replacement between static pictures: the face of a user A is "cut out" from a still picture of user A and used to replace the face of user B in a still picture of user B.
This face-changing technique performs poorly when applied to face replacement in videos: because it merely replaces still-picture content, it cannot restore features such as the orientation and expression of the face in the video.
Disclosure of Invention
An object of the present disclosure is to provide a technique for replacing a face in a video with a face of another person, which can restore features such as the orientation and expression of the face in the video.
According to a first aspect of the embodiments of the present disclosure, a method for replacing a face in a video to be played is disclosed, which includes:
identifying a first face from decoded frame data of a video to be played;
performing three-dimensional modeling with the identified key points of the first face as vertices to obtain a three-dimensional model, and keeping the vertex coordinates of the three-dimensional model synchronized with the key point coordinates of the first face in the video to be played;
acquiring a second face;
and applying the acquired second face as a texture to the three-dimensional model.
According to a second aspect of the embodiments of the present disclosure, a device for replacing a face in a video to be played is disclosed, including:
the identification unit is used for identifying a first face from the decoded frame data of the video to be played;
the three-dimensional modeling unit is used for performing three-dimensional modeling with the identified key points of the first face as vertices to obtain a three-dimensional model, so that the vertex coordinates of the three-dimensional model are kept synchronized with the key point coordinates of the first face in the video to be played;
the acquisition unit is used for acquiring a second face;
and the application unit is used for applying the acquired second face as texture to the three-dimensional model.
According to a third aspect of the embodiments of the present disclosure, a device for replacing a face in a video to be played is disclosed, which includes:
a memory storing computer readable instructions;
a processor reading computer readable instructions stored by the memory to perform the above described method.
According to a fourth aspect of embodiments of the present disclosure, a computer program medium is disclosed, having computer readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the above-described method.
In the embodiment of the disclosure, a first face is identified from decoded frame data of a video to be played. Three-dimensional modeling is performed with the identified key points of the first face as vertices to obtain a three-dimensional model, so that the vertex coordinates of the three-dimensional model are kept synchronized with the key point coordinates of the first face in the video to be played. As the orientation and expression of the first face change in the video to be played, the three-dimensional model built from the key points of the first face follows those changes. However, the three-dimensional model is only an outline figure composed of key points; it has no color, i.e., it lacks texture. The disclosed embodiments therefore apply the second face as a texture to the three-dimensional model. The face in the resulting picture thus has the appearance of the second face but the orientation and expression of the first face, achieving the effect of restoring the orientation and expression of the original face in the video during face replacement.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 is a block diagram illustrating an application environment of a method for replacing a face in a video to be played according to an exemplary embodiment of the present disclosure.
Fig. 2 shows a flowchart of a method for replacing a face in a video to be played according to an example embodiment of the present disclosure.
Fig. 3 shows a flowchart of a method for replacing a face in a video to be played according to an example embodiment of the present disclosure.
FIG. 4 illustrates a detailed flow diagram of three-dimensional modeling according to an example embodiment of the present disclosure.
FIG. 5 shows a detailed flow diagram of applying a second face as a texture to the three-dimensional model according to an example embodiment of the present disclosure.
Fig. 6A illustrates key points of a first face identified in decoded frame data of a video to be played according to an example embodiment of the present disclosure.
FIG. 6B illustrates a three-dimensional model built based on the keypoints identified in FIG. 6A, according to an example embodiment of the present disclosure.
Fig. 6C shows a schematic diagram of a second face with keypoints, according to an example embodiment of the present disclosure.
FIG. 6D is a diagram illustrating the result of applying the second face shown in FIG. 6C as a texture to the three-dimensional model shown in FIG. 6B according to an example embodiment of the present disclosure.
Fig. 7 is a block diagram illustrating an apparatus for replacing a face in a video to be played according to an example embodiment of the present disclosure.
Fig. 8 is a block diagram illustrating an apparatus for replacing a human face in a video to be played according to an example embodiment of the present disclosure.
Fig. 9 shows a specific flowchart of a method for replacing a human face in a video to be played in a scene of template face changing according to an example embodiment of the present disclosure.
Fig. 10A illustrates an interface for a user to select a faceting template video in a scene of the faceting template video, according to an example embodiment of the present disclosure.
Fig. 10B illustrates an interface for a user to take a picture in a scene of a face-changing template video according to an example embodiment of the present disclosure.
Fig. 11 is a block diagram illustrating an alternative apparatus for replacing a face in a video to be played according to an example embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a block diagram illustrating an application environment of a method for replacing a face in a video to be played according to an exemplary embodiment of the present disclosure.
The method for replacing a face in a video to be played refers to a method for replacing a face in a video to be played with another face, and specifically, in the present disclosure, refers to a method for replacing a first face in a video to be played with a second face.
The application environment shown in fig. 1 comprises the internet 1, a user device 1 and a user device 2. The user device 2 is the provider of the video whose face is to be replaced. A user 2 records a video with the user device 2 and posts it on the internet 1. A user 1 sees the video on the internet 1 through the user device 1 and wants to replace the face of the user 2 in the video with his own face before sending the video to friends. The user 1 therefore takes a picture of himself with the camera 11 in the user device 1. The picture is then transferred to the image processing unit 12 in the user device 1. At the same time, the image processing unit 12 downloads the video of the user 2 from the internet. Based on the picture of the user 1 and the video of the user 2, the image processing unit 12 replaces the face of the user 2 in the video with the face of the user 1, thereby completing the face replacement. The face-changed video is displayed on the display 13. The user 1 may also send the video to his friends.
As shown in fig. 2, according to an embodiment of the present disclosure, a method for replacing a face in a video to be played is provided, including:
step 110, identifying a first face from decoded frame data of a video to be played;
step 120, performing three-dimensional modeling with the identified key points of the first face as vertices to obtain a three-dimensional model, and keeping the vertex coordinates of the three-dimensional model synchronized with the key point coordinates of the first face in the video to be played;
step 125, acquiring a second face;
and step 130, applying the acquired second face as a texture to the three-dimensional model.
These steps are described in detail below.
In step 110, a first face is identified from decoded frame data of a video to be played.
In one embodiment, step 110 includes:
sequentially decoding frames in a video to be played into decoded frame data;
putting the decoded frame data into a buffer area;
a first face is identified from the decoded frame data of the buffer.
Generally, when a video is played, the format of the stored video to be played is different from the format that can be displayed on a display screen. Therefore, the video to be played is decoded and then played on the display. In order to improve the uniformity of the playing speed, a video to be played is generally put into a buffer area according to the decoding sequence, and then decoded frame data are taken out from the buffer area in sequence for on-screen playing. Therefore, the frames in the video to be played are decoded into decoded frame data in sequence and put into the buffer. The decoded frame data refers to frame data obtained by decoding frames of a video to be played. In this embodiment, the decoded frame data is not directly displayed on the screen, but a "face changing" process is first completed, that is, a first face in the video to be played is replaced by a second face. Therefore, in this embodiment, it is necessary to recognize the first face from the decoded frame data of the buffer, rather than directly displaying the decoded frame data on the screen.
Among the decoded frame data put into the buffer, some frames contain the first face and some do not. In one embodiment, the decoded frame data of each frame is taken out of the buffer in sequence and checked for the presence of the first face; once decoded frame data containing the first face is found, the first face is considered to have been identified from the decoded frame data of the video to be played.
In one embodiment, the first face is identified from the decoded frame data of the buffer by a face recognition technique.
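For illustration only (not part of the patent text), the following Python sketch shows one way the decode-buffer-detect flow of step 110 could look, assuming OpenCV's Haar-cascade face detector is used as the face recognition technique; the video path, buffer size and detector parameters are placeholder assumptions.

```python
import collections
import cv2

def find_first_face(video_path, buffer_size=30):
    """Decode frames in order, buffer them, and return the first decoded frame
    that contains a detectable face together with its bounding box."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    buffer = collections.deque(maxlen=buffer_size)   # buffer of decoded frame data

    while True:
        ok, frame = cap.read()            # decode the next frame
        if not ok:
            break                         # end of video, no face found
        buffer.append(frame)              # keep the decoded frame for later display
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:                # first frame whose data contains a face
            cap.release()
            return frame, faces[0]        # (x, y, w, h) of the detected face

    cap.release()
    return None, None
```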
In step 120, three-dimensional modeling is performed with the identified key points of the first face as vertices to obtain a three-dimensional model, so that the vertex coordinates of the three-dimensional model are kept synchronized with the key point coordinates of the first face in the video to be played.
In face recognition technology, key points are the points that matter most for distinguishing different faces; they are the points most likely to capture the differences between faces. Fig. 6A illustrates the key points of a first face recognized in a frame of a video to be played according to an example embodiment of the present disclosure. As shown in fig. 6A, many key points are distributed along the face contour, eyebrow contours, eye contours, nose contour, and mouth contour. Taking the mouth contour as an example, the left mouth corner, the 1/3 and 2/3 positions from the left corner to the lowest point of the lower lip, the 1/3 and 2/3 positions from the lowest point of the lower lip to the right corner, and the right mouth corner form 6 key points. For example, key point 801 is the key point at the right mouth corner, and key point 802 is the key point halfway between the tip of the chin and the lowest point of the right earlobe.
The key points may be defined in advance by the user.
A three-dimensional model is a model constructed with three-dimensional software and may represent buildings, people, vegetation, machines and the like, such as a three-dimensional model drawing of a building. It is a figure formed by connecting a plurality of vertices in a predetermined order. For example, a three-dimensional model of a building is a figure in which each corner of the building is drawn as a vertex at an appropriate coordinate position in a three-dimensional coordinate system, and the vertices are connected in a predetermined order. A three-dimensional model may be static or dynamic: in a static three-dimensional model the vertex coordinates are fixed, while in a dynamic three-dimensional model the vertex coordinates change. In the embodiments of the present disclosure, the three-dimensional model refers to the latter. Vertices are the points through which the contours of the three-dimensional model pass. In a dynamic three-dimensional model these vertices change over time, causing the shape of the three-dimensional model to change dynamically.
Three-dimensional modeling refers to the process of building a three-dimensional model.
In this embodiment, the key points of the first face identified in step 110 are taken as vertices. These vertices are connected in a predetermined order, thereby constructing the three-dimensional model shown in fig. 6B. The key points in the three-dimensional model shown in fig. 6B are not static; they change as the key points of the first face change from frame to frame in the video to be played.
In one embodiment, the three-dimensional modeling is performed by using the identified key points of the first face as vertices, and specifically includes:
step 1201, identifying key points of the first face;
step 1202, determining coordinates of the identified key points;
and 1203, performing three-dimensional modeling by taking the coordinates of the identified key points as vertex coordinates.
In step 1201, the key points of the first face are identified by a face recognition technique. In one embodiment, step 1202 includes: establishing an x-y coordinate system with the center of the first face as the origin; the x and y coordinates of a key point in this coordinate system are the coordinates of that key point of the first face in the frame of the video to be played.
In one embodiment, the center of the first face may be located at the tip of the nose. In another embodiment, the geometric center of the first face may also be used as the center of the first face.
In one embodiment, step 1203 includes: taking the coordinates of the identified key points as vertex coordinates and connecting the vertices in a predetermined order to obtain the constructed three-dimensional model.
The predetermined order is specified in advance. For example, for the lower-lip contour shown in FIG. 6A, the 6 key points (the left mouth corner, the 1/3 and 2/3 positions from the left corner to the lowest point of the lower lip, the 1/3 and 2/3 positions from the lowest point of the lower lip to the right corner, and the right mouth corner) are connected in sequence. There is likewise a predetermined order for connecting the key points on the face contour, the key points on the eye contours, and so on. The three-dimensional model obtained by connecting the vertices in this predetermined order is shown in fig. 6B. The three-dimensional model of fig. 6B is not static: its vertices are not stationary, and their coordinates are kept synchronized with the coordinates of the key points of the first face in the video to be played.
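As a hedged sketch of steps 1202-1203 (the keypoint indices below are purely illustrative and are not the patent's own numbering), the vertex coordinates and the predetermined connection order could be expressed as follows:

```python
import numpy as np

# Illustrative only: indices of six lower-lip keypoints in some detector's output.
LOWER_LIP_ORDER = [48, 49, 50, 51, 52, 53]

def keypoints_to_vertices(keypoints, face_center):
    """Step 1202: express keypoints as coordinates in a system whose origin is
    the center of the first face (e.g. the tip of the nose)."""
    pts = np.asarray(keypoints, dtype=np.float32)
    return pts - np.asarray(face_center, dtype=np.float32)

def connect_in_order(order):
    """Step 1203: connect consecutive vertices of one contour in the predetermined
    order, producing the edge list of the model for that contour."""
    return list(zip(order[:-1], order[1:]))

# vertices = keypoints_to_vertices(detected_keypoints, nose_tip)
# lower_lip_edges = connect_in_order(LOWER_LIP_ORDER)
```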
In order to implement such synchronization, as shown in fig. 4, in an embodiment, the synchronizing the vertex coordinates of the three-dimensional model with the coordinates of the key point of the first face in the video to be played specifically includes:
step 1204, tracking a key point of a first face in a frame of a video to be played;
step 1205, tracking the coordinates of the key points of the first face in each frame;
and 1206, keeping the vertex coordinates of the three-dimensional model consistent with the coordinates of the corresponding key points in each tracked frame.
In step 1204, tracking the key points of the first face in the frames of the video to be played means that, after a key point of the first face is identified from the decoded frame data of a certain frame of the video to be played in step 1201, that key point continues to be identified in the decoded frame data of the subsequent frames.
For example, after the key point of the right mouth corner of the first face is identified in the decoded frame data of the 4th frame of the video to be played, the key point of the right mouth corner continues to be identified in the decoded frame data of the following 5th, 6th, 7th, ... frames.
In one embodiment, step 1205 includes: establishing an x-y coordinate system with the center of the first face in the frame as the origin, and taking the x and y coordinates of the tracked key points in this coordinate system as the coordinates of the key points of the first face in that frame.
In step 1206, the three-dimensional model vertex coordinates are aligned with the coordinates of the corresponding keypoints in the tracked frames.
For example, if the coordinates of the key point of the right mouth corner of the first face identified in the decoded frame data of the 4th frame of the video to be played are (2, -2), those identified in the decoded frame data of the 5th frame are (2, -2.1), and those identified in the decoded frame data of the 6th frame are (2, -2.2), and so on, then the vertex coordinates at the right mouth corner of the three-dimensional model shown in fig. 6B change from (2, -2) to (2, -2.1), and then to (2, -2.2), and so on.
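A minimal sketch of steps 1204-1206 in Python, assuming a hypothetical track_keypoints function that returns the keypoint positions found in each decoded frame; names and data layout are assumptions for illustration, not the patent's prescribed implementation.

```python
def synchronize_vertices(model_vertices, tracked_points, face_center):
    """Steps 1204-1206: for every tracked keypoint of the current frame, overwrite
    the corresponding model vertex so the two coordinate sets stay synchronized."""
    cx, cy = face_center
    for name, (x, y) in tracked_points.items():   # e.g. {"right_mouth_corner": (px, py)}
        model_vertices[name] = (x - cx, y - cy)   # e.g. (2, -2) -> (2, -2.1) -> (2, -2.2)
    return model_vertices

# For each subsequent decoded frame:
#   points = track_keypoints(decoded_frame)              # hypothetical tracker
#   vertices = synchronize_vertices(vertices, points, face_center_of(points))
```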
In step 125, a second face is obtained.
In one embodiment, the second face is a face recognized from a stored face image of oneself or another person on the user device. In this case, step 125 may include:
calling a face image stored on user equipment;
recognizing a face from the face image;
and acquiring the recognized face from the face image as a second face.
In one case, only one face image is stored on the user device, and the face image may be the image of the user himself who performs the face replacement operation or the image of another person. At this time, the second face recognized from the retrieved face image is the unique face.
In another case, a plurality of face images may be stored on the user device, but the user who performs the face replacement operation needs to replace the face of himself or a certain other person among them onto the face in the video. In this case, the acquiring of the recognized face from the face image includes: if the recognized face is the face of the specific user, the recognized face is obtained from the face image.
Here, the specific user may be the user himself who performs the face replacement operation, or may be another user. For example, user a may wish to replace the face of user B in the video with the face of user C in a photograph of user C stored in the user device. At this time, the specific user is user C.
In one embodiment, the particular user may be specified by the user operating face replacement. In this embodiment, the acquiring of the recognized face from the face image includes:
receiving a designation of a particular user;
if the recognized face is the face of the specified specific user, the recognized face is obtained from the face image.
An advantage of this embodiment is that the flexibility of face selection in face replacement is increased. In this case, the user who performs the face replacement operation may designate a different specific user, thereby completing different face replacements.
In one embodiment, receiving a designation of a particular user includes:
displaying a face identity list corresponding to a face recognized from a face image;
a selection of a face identity from the list of face identities is received as a designation of a particular user.
In this embodiment, the user device may prompt the user to enter the facial identity of the person in the photograph each time the user takes a picture. And the user equipment correspondingly stores the picture and the face identity. Thus, when a face is identified from a face image stored by the user device, the corresponding face identity can be obtained and placed in a face identity list for display. The user performing the face replacement can conveniently select one face identity from the face identity list as the specified specific user. The embodiment has the advantages that the user who performs the face replacement operation can conveniently specify the second face for replacement, and the replacement efficiency is improved.
In another embodiment, receiving a designation of a particular user includes:
displaying a face thumbnail list of faces recognized from the face images;
a designation of a face thumbnail in the face thumbnail list is received as a designation of a particular user.
This embodiment has the advantage that the user is not required to enter the identity of the face in the photograph on the user device each time a picture is taken. When the human faces are recognized from the human face images stored in the user equipment, the human faces are changed into thumbnails and are placed in a human face thumbnail list to be displayed. The user who performs the face replacement operation can specify a face thumbnail in the face thumbnail list, and can also specify a specific user. Therefore, even if the user does not input the face identity in the photo on the user equipment after taking the photo each time, the specific user can be conveniently and quickly specified through the face thumbnail.
In another embodiment, the second face is extracted from a face image that the user is prompted to take; that is, the second face is not extracted from a stored image but from an image captured on the spot. This embodiment has the advantage that, if the user is not satisfied with the face pictures stored on the user device, a face image can be shot on site, which also gives a sense of augmented-reality interaction.
In this embodiment, step 125 may include:
displaying a user photographing option;
in response to selection of the user's photo option, taking an image;
recognizing a human face from the shot image;
and acquiring the recognized face from the shot image as a second face.
In one embodiment, the displayed user photo option includes a displayed camera icon. If the user who performs face replacement presses a camera icon displayed on the screen, shooting of an image starts.
In another embodiment, the displayed user photo options include a text prompt to take a photo. If the user clicks or touches the text prompt, the image starts to be taken.
Those skilled in the art will appreciate that the user's photo option may take other forms, such as representing the user's photo option with an area at a predetermined location on the screen. If the user touches the area, the image starts to be photographed. Other forms of user photo options are also contemplated by those skilled in the art.
In addition, a plurality of faces may be recognized from the photographed image. In this case, in an embodiment, acquiring the recognized face from the captured image as the second face specifically includes:
extracting and displaying a plurality of recognized human faces;
and responding to the selection of the user on the plurality of extracted faces, and taking the face selected by the user as a second face.
For example, the user a wants to replace the face of the user B in the video with the face of the user C, but at the time of shooting, it happens that the user D approaches the user C and enters the shot image. At this time, the faces of the user C and the user D are recognized from the photographed images. And displaying the recognized faces of the user C and the user D to the user A. User a selects by touch the face of listed user C instead of user D. Thus, the face of user C is taken as the second face, rather than the face of user D.
The embodiment has the advantage that the user who carries out face replacement can conveniently specify the face of the person who really wants to replace when someone intrudes into the shot picture.
In step 130, a second face is applied to the three-dimensional model as a texture.
Texture in computer graphics includes both texture in the usual sense, i.e., an object surface exhibiting uneven grooves, and the color pattern on a smooth object surface. The latter is what is referred to in this disclosure. A texture is an array of pixels with rows and columns; the intersection of a row and a column corresponds to one pixel. Each pixel has four values, R, G, B and A, which represent the red (R), green (G), blue (B) and opacity (A) values of the corresponding location. A texture therefore looks like a color picture. The second face shown in fig. 6C can be regarded as a picture, and in fact also as a texture, i.e., an array of pixels in rows and columns, each pixel having the four values R, G, B, A.
Since the key points of the three-dimensional model established in step 120 are dynamically changed, it reflects the orientation and expression changes of the first face in the video to be played. However, the three-dimensional model is a connected-together outline of the key points, which has no color, i.e., lacks texture. Then, the disclosed embodiments apply the second face as a texture to the three-dimensional model. Thus, the face in the obtained picture has the appearance of the second face, but has the orientation and expression of the first face. Therefore, the effect of restoring the orientation and the expression of the original face in the video in the face replacing process is achieved.
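To make the texture notion concrete, here is a small sketch (assuming OpenCV and NumPy, which are not named in the patent) that turns the captured second-face image into an H x W array of (R, G, B, A) pixels:

```python
import numpy as np
import cv2

def image_to_texture(face_image_bgr):
    """Treat the acquired second face as a texture: an H x W array of
    (R, G, B, A) pixels, fully opaque by default."""
    rgb = cv2.cvtColor(face_image_bgr, cv2.COLOR_BGR2RGB)
    alpha = np.full(rgb.shape[:2] + (1,), 255, dtype=np.uint8)
    return np.concatenate([rgb, alpha], axis=2)   # shape (H, W, 4)
```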
In one embodiment, step 130 may include:
filling the human face edge pixels of the three-dimensional model by using the color of the second human face edge pixels in the texture;
and filling the pixels in the face of the three-dimensional model by using the color of the pixels in the second face in the texture.
A texture can be viewed as an array of pixels in rows and columns. The face edge pixels in the texture are the pixels in this array through which the edge of the face passes. The three-dimensional model is a model whose vertex coordinates are kept synchronized with the key point coordinates of the first face in the video to be played and therefore change constantly; its outer boundary still has the shape of a face, and that shape is likewise composed of pixels. The face edge pixels of the three-dimensional model are the pixels on this outer boundary, except that they are blank before color filling.
In one embodiment, if the number of the second face edge pixels in the texture is equal to the number of the face edge pixels of the three-dimensional model, the color of each pixel of the second face edge in the texture may be filled into the blank pixels of the face edge of the three-dimensional model in a one-to-one correspondence.
In another embodiment, if the number of second face edge pixels in the texture is not equal to the number of face edge pixels of the three-dimensional model, the colors of the second face edge pixels in the texture can first be transformed by pixel interpolation. Pixel interpolation is an existing method, so its specific implementation is not described in detail. After this transformation, the number of second face edge pixels in the texture becomes equal to the number of face edge pixels of the three-dimensional model, and the colors of the second face edge pixels in the transformed texture can be filled into the blank face edge pixels of the three-dimensional model in a one-to-one correspondence. For example, if the number of second face edge pixels in the texture is 100 and the number of face edge pixels of the three-dimensional model is 200, a pixel is inserted between each pair of adjacent second-face edge pixels in the texture, and its color (red chroma, green chroma, blue chroma and opacity value) is estimated by pixel interpolation. The number of second face edge pixels in the texture thus becomes 200, and their colors are filled into the blank face edge pixels of the three-dimensional model in a one-to-one correspondence. As another example, suppose the number of second face edge pixels in the texture is 100 and the number of face edge pixels of the three-dimensional model is 150. Take 3 adjacent pixels from the second face edge in the texture, which span 2 inter-pixel intervals; these 2 intervals are re-divided into 3 intervals, i.e., on average 2 new pixels are inserted between the two end pixels. The color values of the 2 inserted pixels can be obtained from the color values of the 3 adjacent pixels of the second face edge. In this way the number of second face edge pixels in the texture becomes 150, and their colors are filled into the blank face edge pixels of the three-dimensional model in a one-to-one correspondence.
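The edge-pixel resampling described above (for example, 100 texture edge pixels stretched to 200 or 150 model edge pixels) can be sketched with simple linear interpolation; this is one possible realization under those assumptions, not the patent's prescribed formula.

```python
import numpy as np

def resample_edge_colors(edge_pixels_rgba, target_count):
    """Resample a strip of RGBA edge-pixel colors to a different length by linear
    interpolation, so the texture edge can fill a model edge of another size."""
    edge = np.asarray(edge_pixels_rgba, dtype=np.float32)   # shape (N, 4)
    src = np.linspace(0.0, 1.0, num=len(edge))
    dst = np.linspace(0.0, 1.0, num=target_count)
    out = np.stack([np.interp(dst, src, edge[:, c]) for c in range(4)], axis=1)
    return out.astype(np.uint8)                              # shape (target_count, 4)

# e.g. 100 texture edge pixels stretched to the model's 200 edge pixels:
# filled = resample_edge_colors(texture_edge, 200)
```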
Then, filling of the pixels inside the face of the three-dimensional model with the color of the pixels inside the second face in the texture is started.
A texture can be viewed as an array of pixels in rows and columns. The pixels inside the face in the texture are the pixels of this array in the inner region enclosed by the edge of the face. The three-dimensional model is a model whose vertex coordinates are always kept synchronized with the key point coordinates of the first face in the video to be played and therefore change constantly; its outer boundary still has the shape of a face, and that shape is likewise composed of pixels. The pixels inside the face of the three-dimensional model are the pixels in the region enclosed by this outer boundary; they are blank before color filling.
In one embodiment, the pixels inside the face of the three-dimensional model are filled with the colors of the pixels inside the second face in the texture by pixel interpolation.
In one example, since the texture is an array of pixels, all pixels of a row of the texture inside the second face may be fetched. The interior of the face of the three-dimensional model is also composed of pixels, and all blank pixels in the corresponding line in the face can be extracted. If the number of the pixels of the line extracted from the texture is equal to the number of the pixels of the corresponding line in the face of the three-dimensional model, the colors of the pixels of the line extracted from the texture can be filled in the blank pixels of the corresponding line in the face of the three-dimensional model in a one-to-one correspondence manner. And if the number of the pixels of the line extracted from the texture is not equal to the number of the pixels of the corresponding line in the human face of the three-dimensional model, performing pixel interpolation conversion by using the colors of the pixels of the line extracted from the texture. After the transformation, the number of pixels of the line taken out of the texture becomes equal to the number of pixels of the corresponding line inside the face of the three-dimensional model. Then, the colors of the pixels of the line taken out from the transformed texture can be filled into the blank pixels of the corresponding line in the human face of the three-dimensional model in a one-to-one correspondence manner.
In another embodiment, as shown in FIG. 5, step 130 comprises:
step 1301, filling the pixels of the corresponding vertexes in the three-dimensional model by using the colors of the pixels of the key points in the texture;
step 1302, filling pixels between corresponding vertices in the three-dimensional model based on colors of pixels between key points in the texture.
In step 1301, the pixels of the corresponding vertex in the three-dimensional model are filled with the color of the pixels of the key point in the texture. In one embodiment, the color of the pixel includes a red (R) chromaticity, a green (G) chromaticity, a blue (B) chromaticity, an opacity (a) of the pixel.
For example, the pixel at the right mouth corner in the three-dimensional model is filled with R, G, B and alpha value of the pixel at the right mouth corner in the texture of the second face, so that the pixel has R, G, B and alpha value. Thus, the pixel of the right mouth corner of the three-dimensional model established by taking the key point of the first face as the vertex has the R, G, B and alpha value of the pixel of the right mouth corner in the texture of the second face.
After all the vertex pixels in the three-dimensional model have been filled in step 1301, the regions of the three-dimensional model other than the vertices are still uncolored; these regions are filled in step 1302. In step 1302, the pixels between corresponding vertices in the three-dimensional model are filled based on the colors of the pixels between the key points in the texture. For example, suppose there are 2 pixels P1 and P2 between the pixels of two key points K1 and K2 in the texture, with R, G, B and alpha values (R1, G1, B1, A1) and (R2, G2, B2, A2) respectively, but in the three-dimensional model there are 3 pixels between the pixels of the corresponding vertices. In this case, the R, G, B and alpha values of those 3 pixels can be obtained by pixel interpolation from (R1, G1, B1, A1) and (R2, G2, B2, A2).
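A hedged sketch of steps 1301-1302 for one contour segment, assuming the model pixels between two vertices are stored as a contiguous RGBA array; function names and the linear-interpolation rule are illustrative assumptions, not the patent's fixed method.

```python
import numpy as np

def resample_colors(colors_rgba, target_count):
    """Linearly resample a run of RGBA colors (e.g. P1, P2) to target_count values;
    one possible pixel-interpolation rule."""
    src = np.asarray(colors_rgba, dtype=np.float32)           # shape (N, 4)
    xs = np.linspace(0.0, 1.0, num=len(src))
    xt = np.linspace(0.0, 1.0, num=target_count)
    return np.stack([np.interp(xt, xs, src[:, c]) for c in range(4)],
                    axis=1).astype(np.uint8)

def fill_segment(model_pixels, tex_color_k1, tex_color_k2, tex_between):
    """Step 1301: copy the texture keypoint colors onto the corresponding model
    vertices. Step 1302: fill the model pixels between them by interpolating the
    texture pixel colors lying between K1 and K2 (e.g. P1, P2 resampled to 3 pixels)."""
    model_pixels[0] = tex_color_k1                     # vertex corresponding to K1
    model_pixels[-1] = tex_color_k2                    # vertex corresponding to K2
    n_between = len(model_pixels) - 2
    if n_between > 0:
        model_pixels[1:-1] = resample_colors(tex_between, n_between)
    return model_pixels
```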
Compared with the previous embodiment of filling with edge pixels and pixels inside the face, the embodiment of filling with the key points and the pixels between the key points can make the face expression after replacement more vivid and real. Because the key points are the points which can represent the facial features in the human face most, the points are firstly subjected to pixel filling, and then the pixel filling is carried out among the points, so that the typical changes of the human face in various expressions can be reflected better, and the presented human face expression is closer to reality.
In one embodiment, as shown in fig. 3, the method further comprises:
step 140, drawing a display frame based on the decoded frame data;
step 150, drawing the result of applying the second face as the texture to the three-dimensional model on the display frame to cover the first face.
These steps are described in detail below.
In step 140, a display frame is rendered based on the decoded frame data.
The decoded frame data is frame data decoded from the video to be played with the first face. The display frame is a frame for display, and corresponds to a frame in the video to be played. The display frame drawn based on the decoded frame data is a display frame with a first face.
In step 150, a second face is rendered over the display frame, overlaying the first face, as a result of the texture applied to the three-dimensional model.
The result of applying the second face as a texture to the three-dimensional model is a face having the texture (color pattern) of the second face, but the orientation and expression of the first face. The face is of equal size to the first face on the display frame. The result of applying the second face as a texture to the three-dimensional model is drawn on the display frame, and just covers the first face, so that the video to be played with the second face replacing the first face is obtained, as shown in fig. 6D. The second face in the video maintains the orientation and expression of the first face.
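For steps 140 and 150, a hedged sketch of drawing the textured result over the display frame so that it covers the first face; the RGBA layout of the rendered patch and the use of the face bounding-box origin are assumptions made only for this illustration.

```python
import numpy as np

def overlay_rendered_face(display_frame_bgr, rendered_rgba, top_left):
    """Steps 140-150: alpha-blend the textured-model output onto the display frame
    at the first face's position so that it exactly covers the first face."""
    x, y = top_left
    h, w = rendered_rgba.shape[:2]
    patch_bgr = rendered_rgba[:, :, 2::-1].astype(np.float32)      # RGB -> BGR
    alpha = rendered_rgba[:, :, 3:4].astype(np.float32) / 255.0     # opacity mask
    roi = display_frame_bgr[y:y + h, x:x + w].astype(np.float32)
    display_frame_bgr[y:y + h, x:x + w] = (alpha * patch_bgr
                                           + (1.0 - alpha) * roi).astype(np.uint8)
    return display_frame_bgr
```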
Fig. 9 shows a specific flowchart of a method for replacing a human face in a video to be played in a scene of template face changing according to an example embodiment of the present disclosure.
In step 901, the terminal device displays a face change template video for the user to select, as shown in fig. 10A.
In step 902, the user selects a face change template video.
In step 903, the terminal device displays an interface to prompt the user to take a picture or select a photo. Fig. 10B shows an interface for prompting the user to take a picture.
In step 904, the user takes or selects a photograph.
In step 905, the terminal device decodes the frame in the face-changing template video into decoded frame data by using a decoder, and places the decoded frame data into a buffer area to wait for being displayed on a screen.
In step 906, the terminal device identifies a human face in the face-changing template video from the decoded frame data in the buffer.
In step 907, the terminal device identifies key points of the face in the face change template video.
In step 908, the terminal device determines the coordinates of the identified keypoints.
In step 909, the terminal device performs three-dimensional modeling with the identified coordinates of the key points as vertex coordinates.
In step 910, the terminal device tracks key points of the human face in the face changing template video.
In step 911, the terminal device tracks the coordinates of the key points of the face in each frame.
In step 912, the terminal device keeps the three-dimensional model vertex coordinates consistent with the coordinates of the corresponding key point in each tracked frame.
In step 913, the terminal device uses the face of the user in the picture taken or selected by the user as a texture, and fills the pixels of the corresponding vertices in the three-dimensional model with the colors of the pixels of the key points in the texture.
In step 914, the terminal device fills in pixels between corresponding vertices in the three-dimensional model based on the colors of the pixels between the keypoints in the texture.
In step 915, the terminal device renders a display frame based on the decoded frame data.
In step 916, the terminal device draws the user's face as a result of applying the texture to the three-dimensional model over the display frame, overlaying the face in the template video.
In step 917, the terminal device displays the display frame on which the user's face, as the result of applying the texture to the three-dimensional model, has been drawn.
As shown in fig. 7, according to an embodiment of the present disclosure, there is provided an apparatus for replacing a human face in a video to be played, including:
the identification unit 710 is configured to identify a first face from decoded frame data of a video to be played;
the three-dimensional modeling unit 720 is configured to perform three-dimensional modeling by using the identified key point of the first face as a vertex to obtain a three-dimensional model, so that coordinates of the vertex of the three-dimensional model and coordinates of the key point of the first face in the video to be played are kept synchronous;
an acquisition unit 725 configured to acquire a second face;
an applying unit 730, configured to apply the obtained second face as a texture to the three-dimensional model.
Optionally, as shown in fig. 8, the apparatus further includes:
a first rendering unit 740 for rendering into a display frame based on the decoded frame data;
a second rendering unit 750 for rendering a second face over the display frame, overlaying the first face, as a result of the texture being applied to the three-dimensional model.
Optionally, the three-dimensional modeling unit 720 is further configured to:
identifying key points of the first face;
determining coordinates of the identified key points;
and performing three-dimensional modeling by taking the coordinates of the identified key points as vertex coordinates.
Optionally, the three-dimensional modeling unit 720 is further configured to:
tracking a key point of a first face in a frame of a video to be played;
tracking coordinates of key points of the first face in each frame;
and enabling the vertex coordinates of the three-dimensional model to be consistent with the coordinates of the corresponding key points in each tracked frame.
Optionally, the application unit 730 is further configured to:
filling pixels of corresponding vertexes in the three-dimensional model by using the colors of the pixels of the key points in the texture;
filling pixels between corresponding vertices in the three-dimensional model based on colors of pixels between key points in the texture.
Optionally, the color of the pixel comprises a red (R) chromaticity, a green (G) chromaticity, a blue (B) chromaticity, an opacity (a) of the pixel.
The following describes the replacement apparatus 9 for a human face in a video to be played according to an embodiment of the present disclosure with reference to fig. 11. The face replacement apparatus 9 shown in fig. 11 is only an example, and should not bring any limitation to the functions and the application range of the embodiment of the present invention.
As shown in fig. 11, the replacing means 9 for the human face in the video to be played is represented in the form of a general-purpose computing device. The components of the replacing device 9 for the face in the video to be played may include, but are not limited to: at least one processing unit 810, at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
The memory unit stores program code that can be executed by the processing unit 810 to cause the processing unit 810 to perform the steps according to various exemplary embodiments of the present invention described in the description part of the above exemplary methods of the present specification. For example, the processing unit 810 may perform the various steps as shown in fig. 2.
The storage unit 820 may include readable media in the form of volatile memory units such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The replacement means 9 of the face in the video to be played may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable the user to interact with the replacement means 9 of the face in the video to be played, and/or with any device (e.g., router, modem, etc.) that enables the replacement means 9 of the face in the video to be played to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the replacement device 9 for the face in the video to be played may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through the network adapter 860. As shown, the network adapter 860 communicates over the bus 830 with other modules of the replacement device 9 for a face in a video to be played. It should be understood that although not shown in the figures, the alternative means 9 for replacing faces in a video to be played may be implemented using other hardware and/or software modules, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer program medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.
According to an embodiment of the present disclosure, there is also provided a program product for implementing the method in the above method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into and embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken into multiple steps.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (for example, a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that enable a computing device (which may be a personal computer, a server, a mobile terminal, a network device, or the like) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method for replacing a human face in a video to be played, characterized by comprising:
identifying a first face from decoded frame data of a video to be played;
performing three-dimensional modeling with key points of the identified first face as vertices to obtain a three-dimensional model, and keeping the vertex coordinates of the three-dimensional model synchronized with the key point coordinates of the first face in the video to be played, wherein the three-dimensional model is an untextured contour graph formed by connecting the key points, and keeping the vertex coordinates of the three-dimensional model synchronized with the key point coordinates of the first face in the video to be played comprises: after the key points of the first face are identified from the decoded frame data of one frame in the video to be played, continuing to track each key point in the decoded frame data of subsequent frames; establishing a coordinate system whose origin is the center of the first face in each frame in which the key points of the first face are tracked, and taking the x and y coordinates of the tracked key points in that coordinate system as the coordinates of the key points of the first face in the frame; and making the vertex coordinates of the three-dimensional model consistent with the coordinates of the corresponding key points in each tracked frame;
acquiring a second face, wherein the second face is a face recognized from a face image of the user or another person stored on the user equipment, or a face extracted from a face image that the user is prompted to take;
taking the second face as a texture, and filling the pixels of the corresponding vertices in the three-dimensional model with the colors of the pixels at the key points in the texture;
and filling the pixels between the corresponding vertices in the three-dimensional model based on the colors of the pixels between the key points in the texture, wherein the three-dimensional model after pixel filling has the orientation and expression of the first face.
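By way of illustration only, and not as part of the claim, the key point tracking and coordinate synchronization recited above could look roughly like the following minimal Python sketch. The detect_landmarks function is a hypothetical stand-in for any face landmark detector; the rest follows the claim: the face center of each tracked frame becomes the origin of the coordinate system, and the resulting x, y values are used directly as the model's vertex coordinates.

```python
import numpy as np

def detect_landmarks(frame):
    """Hypothetical stand-in for a face landmark detector (e.g. a dlib- or
    MediaPipe-style detector could play this role); returns an (N, 2) array
    of the first face's key point pixel positions for one decoded frame."""
    raise NotImplementedError

def track_and_sync_vertices(decoded_frames):
    """For every decoded frame, re-identify the first face's key points,
    express them in a coordinate system whose origin is the face center of
    that frame, and use those x, y values as the three-dimensional model's
    vertex coordinates, keeping the model in sync with the first face."""
    vertex_coords_per_frame = []
    for frame in decoded_frames:
        keypoints = detect_landmarks(frame).astype(np.float32)  # (N, 2)
        face_center = keypoints.mean(axis=0)      # origin of the per-frame coordinate system
        local_xy = keypoints - face_center        # key point coordinates relative to the face center
        vertex_coords_per_frame.append(local_xy)  # vertex coordinates follow the tracked key points
    return vertex_coords_per_frame
```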
2. The method of claim 1, further comprising:
rendering a display frame based on the decoded frame data;
and rendering the second face, as the texture applied to the three-dimensional model, over the display frame so as to cover the first face.
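Again purely as an illustration of the rendering step, the sketch below warps the second face (the texture) triangle by triangle onto the model's projection in the display frame so that it covers the first face. The piecewise-affine warp is one common way to apply a face texture, not the patent's stated method; the triangle list could come from a Delaunay triangulation of the key points (see the sketch after claim 3), and clamping of destination rectangles to the frame bounds is omitted for brevity.

```python
import cv2
import numpy as np

def render_second_face_over_frame(display_frame, second_face, src_pts, dst_pts, triangles):
    """Warp the second face (texture) onto the display frame, triangle by
    triangle, so that it covers the first face.  src_pts are key point
    positions in the second-face image, dst_pts the model's vertex positions
    in the display frame, triangles an (M, 3) array of key point indices."""
    out = display_frame.copy()
    for tri in triangles:
        src_tri = src_pts[tri].astype(np.float32)
        dst_tri = dst_pts[tri].astype(np.float32)
        xs, ys, ws, hs = cv2.boundingRect(src_tri)   # bounding boxes keep each warp small
        xd, yd, wd, hd = cv2.boundingRect(dst_tri)
        src_local = (src_tri - [xs, ys]).astype(np.float32)
        dst_local = (dst_tri - [xd, yd]).astype(np.float32)
        warp = cv2.getAffineTransform(src_local, dst_local)
        patch = cv2.warpAffine(second_face[ys:ys + hs, xs:xs + ws], warp, (wd, hd),
                               flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)
        mask = np.zeros((hd, wd), dtype=np.uint8)
        cv2.fillConvexPoly(mask, np.int32(dst_local), 255)  # only the triangle itself is copied
        roi = out[yd:yd + hd, xd:xd + wd]
        roi[mask > 0] = patch[mask > 0]
    return out
```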
3. The method according to claim 1, wherein performing three-dimensional modeling with the identified key points of the first face as vertices specifically comprises:
identifying key points of the first face;
determining coordinates of the identified keypoints;
and performing three-dimensional modeling by taking the coordinates of the identified key points as vertex coordinates.
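A minimal sketch of the modeling step in claim 3: the identified key point coordinates are used directly as vertex coordinates. The claim only requires a contour graph formed by connecting the key points; the Delaunay triangulation below is one possible choice of connectivity, an assumption for illustration rather than part of the claim.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_contour_model(keypoint_coords):
    """Build the untextured contour model: key point coordinates become the
    vertex coordinates, and a Delaunay triangulation (one possible choice of
    connectivity) links the vertices into a contour mesh."""
    vertices = np.asarray(keypoint_coords, dtype=np.float32)  # (N, 2) vertex coordinates
    faces = Delaunay(vertices).simplices                      # (M, 3) triangles over the vertices
    return vertices, faces
```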
4. The method of claim 1, wherein the color of a pixel comprises a red (R) chromaticity, a green (G) chromaticity, a blue (B) chromaticity, and an opacity (A) of the pixel.
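Claim 4 only lists the four channels of a pixel color. The tiny helper below illustrates one way the opacity channel could be used when a texture pixel is written into the model, namely conventional alpha blending; the blending rule itself is an assumption, not part of the claim.

```python
import numpy as np

def fill_pixel_rgba(dst, y, x, rgba):
    """Write one texture pixel into the model image at (y, x) using its
    R, G, B chromaticities and opacity A (0-255), alpha-blended over the
    existing pixel value."""
    r, g, b, a = rgba
    alpha = a / 255.0
    blended = (1.0 - alpha) * dst[y, x, :3] + alpha * np.array([r, g, b])
    dst[y, x, :3] = blended.astype(dst.dtype)
    return dst
```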
5. A device for replacing a human face in a video to be played, characterized by comprising:
the identification unit is used for identifying a first face from the decoded frame data of the video to be played;
a three-dimensional modeling unit, configured to perform three-dimensional modeling with the identified key points of the first face as vertices to obtain a three-dimensional model, and to keep the vertex coordinates of the three-dimensional model synchronized with the key point coordinates of the first face in the video to be played, wherein the three-dimensional model is an untextured contour graph formed by connecting the key points; the three-dimensional modeling unit is further configured to: after the key points of the first face are identified from the decoded frame data of one frame in the video to be played, continue to track each key point in the decoded frame data of subsequent frames; establish a coordinate system whose origin is the center of the first face in each frame in which the key points of the first face are tracked, and take the x and y coordinates of the tracked key points in that coordinate system as the coordinates of the key points of the first face in the frame; and make the vertex coordinates of the three-dimensional model consistent with the coordinates of the corresponding key points in each tracked frame;
an acquisition unit, configured to acquire a second face, wherein the second face is a face recognized from a face image of the user or another person stored on the user equipment, or a face extracted from a face image that the user is prompted to take;
and an application unit, configured to take the second face as a texture, fill the pixels of the corresponding vertices in the three-dimensional model with the colors of the pixels at the key points in the texture, and fill the pixels between the corresponding vertices in the three-dimensional model based on the colors of the pixels between the key points in the texture, wherein the three-dimensional model after pixel filling has the orientation and expression of the first face.
6. The apparatus of claim 5, further comprising:
a first drawing unit, configured to render a display frame based on the decoded frame data;
and a second drawing unit, configured to draw the second face, as a texture, over the display frame so as to cover the first face.
7. The apparatus of claim 5, wherein the three-dimensional modeling unit is further configured to:
identifying key points of the first face;
determining coordinates of the identified key points;
and performing three-dimensional modeling by taking the coordinates of the identified key points as vertex coordinates.
8. The apparatus of claim 5, wherein the color of a pixel comprises a red (R) chromaticity, a green (G) chromaticity, a blue (B) chromaticity, and an opacity (A) of the pixel.
9. A device for replacing a human face in a video to be played, characterized by comprising:
a memory storing computer readable instructions;
a processor reading computer readable instructions stored by the memory to perform the method of any of claims 1-4.
10. A computer program medium having computer readable instructions stored thereon which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-4.
CN201810276537.9A 2018-03-30 2018-03-30 Method and device for replacing human face in video to be played Active CN110267079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810276537.9A CN110267079B (en) 2018-03-30 2018-03-30 Method and device for replacing human face in video to be played

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810276537.9A CN110267079B (en) 2018-03-30 2018-03-30 Method and device for replacing human face in video to be played

Publications (2)

Publication Number Publication Date
CN110267079A CN110267079A (en) 2019-09-20
CN110267079B true CN110267079B (en) 2023-03-24

Family

ID=67911550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810276537.9A Active CN110267079B (en) 2018-03-30 2018-03-30 Method and device for replacing human face in video to be played

Country Status (1)

Country Link
CN (1) CN110267079B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728621B (en) * 2019-10-17 2023-08-25 北京达佳互联信息技术有限公司 Face changing method and device of face image, electronic equipment and storage medium
CN112188145A (en) * 2020-09-18 2021-01-05 随锐科技集团股份有限公司 Video conference method and system, and computer readable storage medium
CN113988243B (en) * 2021-10-19 2023-10-27 艾斯芸防伪科技(福建)有限公司 Three-dimensional code generation and verification method, system, equipment and medium with verification code

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105118082A (en) * 2015-07-30 2015-12-02 科大讯飞股份有限公司 Personalized video generation method and system
CN107610209A (en) * 2017-08-17 2018-01-19 上海交通大学 Human face countenance synthesis method, device, storage medium and computer equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999942B (en) * 2012-12-13 2015-07-15 清华大学 Three-dimensional face reconstruction method
CN103473804A (en) * 2013-08-29 2013-12-25 小米科技有限责任公司 Image processing method, device and terminal equipment
CN103646416A (en) * 2013-12-18 2014-03-19 中国科学院计算技术研究所 Three-dimensional cartoon face texture generation method and device
US9898835B2 (en) * 2015-02-06 2018-02-20 Ming Chuan University Method for creating face replacement database
CN107067429A (en) * 2017-03-17 2017-08-18 徐迪 Video editing system and method that face three-dimensional reconstruction and face based on deep learning are replaced
CN107146199B (en) * 2017-05-02 2020-01-17 厦门美图之家科技有限公司 Fusion method and device of face images and computing equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105118082A (en) * 2015-07-30 2015-12-02 科大讯飞股份有限公司 Personalized video generation method and system
CN107610209A (en) * 2017-08-17 2018-01-19 上海交通大学 Human face countenance synthesis method, device, storage medium and computer equipment

Also Published As

Publication number Publication date
CN110267079A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN112348969B (en) Display method and device in augmented reality scene, electronic equipment and storage medium
WO2018188499A1 (en) Image processing method and device, video processing method and device, virtual reality device and storage medium
CN108876886B (en) Image processing method and device and computer equipment
CN106355153A (en) Virtual object display method, device and system based on augmented reality
EP3992919A1 (en) Three-dimensional facial model generation method and apparatus, device, and medium
CN111008927B (en) Face replacement method, storage medium and terminal equipment
CN113228625A (en) Video conference supporting composite video streams
KR20130016318A (en) A method of real-time cropping of a real entity recorded in a video sequence
KR20140082610A (en) Method and apaaratus for augmented exhibition contents in portable terminal
CN110267079B (en) Method and device for replacing human face in video to be played
CN114025219B (en) Rendering method, device, medium and equipment for augmented reality special effects
CN109478344A (en) Method and apparatus for composograph
CN102945563A (en) Showing and interacting system and method for panoramic videos
KR102353556B1 (en) Apparatus for Generating Facial expressions and Poses Reappearance Avatar based in User Face
CN106447756B (en) Method and system for generating user-customized computer-generated animations
US11900552B2 (en) System and method for generating virtual pseudo 3D outputs from images
CN113628322B (en) Image processing, AR display and live broadcast method, device and storage medium
CN113965773A (en) Live broadcast display method and device, storage medium and electronic equipment
CN114758027A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111640190A (en) AR effect presentation method and apparatus, electronic device and storage medium
CN109040612B (en) Image processing method, device and equipment of target object and storage medium
CN111107264A (en) Image processing method, image processing device, storage medium and terminal
CN116612518A (en) Facial expression capturing method, system, electronic equipment and medium
CN116363245A (en) Virtual face generation method, virtual face live broadcast method and device
CN110719415A (en) Video image processing method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant