CN116721194B - Face rendering method and device based on generation model

Info

Publication number: CN116721194B
Application number: CN202310996097.5A
Authority: CN
Inventors: 林诗琪; 张磊; 高熙和
Original assignee: Hanbo Semiconductor Shanghai Co ltd
Current assignee: Hanbo Semiconductor Shanghai Co ltd
Priority date: 2023-08-09
Filing date: 2023-08-09
Publication date: 2023-10-24
Anticipated expiration: 2043-08-09
Also published as: CN116721194A

Abstract

The application provides a face rendering method and device based on a generation model. The face rendering method based on the generation model comprises the following steps: S1, acquiring an original face video frame that includes a face and is generated by a rendering pipeline of a 3D engine, together with rendering auxiliary information used by the rendering pipeline; S2, repairing the original face video frame by using a face restoration generation model according to the rendering auxiliary information to obtain a repaired face video frame; S3, replacing the original face video frame with the repaired face video frame to generate a 3D video including the face. The method and device provided by the application can effectively improve the rendering effect of the face in the 3D video without changing the existing 3D engine.

Description

Face rendering method and device based on generation model

Technical Field

The application relates to the field of computer image processing, and in particular to a face rendering method and device based on a generation model.

Background

Face rendering displays a three-dimensional face mesh model as a highly realistic two-dimensional image by means of computation. Face rendering is widely used in 3D games, film and television works, virtual-human live streaming, virtual reality, and similar fields. Traditional graphics rendering is mainly accomplished through complex algorithms such as rasterization or ray tracing, and the algorithm parameters must be adjusted manually for different scenes and requirements to achieve the desired effect. The realism of face rendering also depends on the quality of the 3D engine. The current mainstream modeling methods based on computer graphics, such as methods based on the general models 3DMM and CANDID-3, model details such as facial hair, pupils, and skin texture poorly, struggle to meet current expectations for realism in 3D scenes, and greatly degrade the user experience because complex parameter settings must be adjusted manually.

The current approach to the face modeling problem is to use precision equipment (such as a three-dimensional scanner) to capture facial detail from all angles and accurately reconstruct it in three-dimensional space. However, this approach is expensive and requires rescanning for each different person, so it cannot be applied at scale and its range of application is narrow.

Accordingly, it is desirable to provide a method capable of improving the rendering effect of a 3D engine on a human face conveniently and efficiently.

Disclosure of Invention

In view of the above, the present application provides a face rendering method and device based on a generation model, which are used to solve the above technical problems in the prior art.

According to one aspect of the present application, there is provided a face rendering method based on a generation model, the method comprising:

S1, acquiring an original face video frame comprising a face and generated by a rendering pipeline of a 3D engine and rendering auxiliary information used by the rendering pipeline;

S2, repairing the original face video frame by using a face repair generation model according to the rendering auxiliary information to obtain a repaired face video frame;

S3, replacing the original face video frame with the repaired face video frame to generate a 3D video comprising the face,

wherein the rendering auxiliary information comprises a target effect face texture image.

According to some embodiments of the application, the rendering auxiliary information further comprises face depth information and/or illumination information.

According to some embodiments of the application, the face depth information comprises depth buffer data of the rendering pipeline, and the illumination information comprises position, intensity and/or color information of the light source.

According to some embodiments of the application, step S2 comprises:

S21, extracting an original face image from an original face video frame;

S22, inputting the original face image and rendering auxiliary information into the face restoration generating model to obtain a restored face image;

S23, replacing the original face image in the original face video frame by the repaired face image to obtain the repaired face video frame.

According to some embodiments of the application, the method comprises performing step S2 in sequence for each original face video frame generated by the rendering pipeline.

According to some embodiments of the application, the face restoration generation model includes:

an appearance encoder for encoding the input original face image; and

a texture encoder for encoding the input target effect face texture image.

According to some embodiments of the application, the face restoration generation model further includes:

a depth information encoder for encoding the input face depth information; and/or

an illumination information encoder for encoding the input illumination information.

According to an aspect of the present application, there is provided a face rendering apparatus based on a generative model, the apparatus comprising:

an acquisition unit for acquiring an original face video frame including a face generated by a rendering pipeline of the 3D engine and rendering auxiliary information used by the rendering pipeline;

a repair unit for repairing the original face video frame by using the face restoration generation model according to the rendering auxiliary information to obtain a repaired face video frame;

a replacing unit for replacing the original face video frame with the repaired face video frame to obtain a 3D video including a face,

wherein the rendering auxiliary information comprises a target effect face texture image.

According to an aspect of the present application, there is provided an electronic apparatus including:

one or more processors;

a memory for storing executable instructions;

the one or more processors are configured to implement the methods described above via the executable instructions.

According to an aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the method described above.

According to the face rendering method and device based on the generation model, repair processing is applied after the rendering pipeline of the 3D engine has generated the face video frames, so that the face rendering effect in the 3D video output by the 3D engine is improved without changing the 3D model of the existing 3D engine.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and do not limit it.

FIG. 1 illustrates a flow chart of a face rendering method based on a generative model provided by an exemplary embodiment of the present application;

FIG. 2 is a schematic structural diagram of a face restoration generating model according to an exemplary embodiment of the present application;

FIG. 3 illustrates an exemplary diagram of an unrepaired face video frame in an exemplary embodiment of the application;

FIG. 4 illustrates an exemplary diagram of a repaired face video frame in an exemplary embodiment of the present application;

FIG. 5 is a block diagram illustrating a face rendering apparatus based on a generative model according to an exemplary embodiment of the present application;

FIG. 6 shows a block diagram of an electronic device according to an exemplary embodiment of the present application.

Detailed Description

Various exemplary embodiments of the present application will be described in detail below with reference to the accompanying drawings. The description of the exemplary embodiments is merely illustrative, and is not intended to be any limitation on the application, its application or use. The present application may be embodied in many different forms and is not limited to the embodiments described herein. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.

Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. As used in this specification, the term "plurality" means two or more, and the term "based on/according to" should be interpreted as "based at least in part on/according to". Furthermore, the terms "and/or" and "at least one of …" encompass any and all possible combinations of the listed items.

The face rendering method provided by the embodiment of the application repairs the face video frame generated by the rendering pipeline of the 3D engine by using the generating model, so that the face rendering effect in the video generated by the 3D engine is improved under the condition of not changing the 3D engine.

Referring to FIG. 1, a flowchart of a face rendering method based on a generation model according to an exemplary embodiment of the present application is shown. As shown in FIG. 1, the face rendering method based on the generation model provided by the exemplary embodiment of the application includes:

S1: acquiring an original face video frame including a face generated by a rendering pipeline of a 3D engine and rendering auxiliary information used by the rendering pipeline;

S2: repairing the original face video frame by using a face restoration generation model according to the rendering auxiliary information to obtain a repaired face video frame;

S3: replacing the original face video frame with the repaired face video frame to generate the 3D video including the face.

In step S1, an original face video frame including a face generated by a rendering pipeline of a 3D engine, and rendering auxiliary information used by the rendering pipeline, are acquired. The 3D engine may be an existing conventional 3D engine. The rendering pipeline of the 3D engine is capable of rendering faces to generate video frames that include face images (i.e., face video frames). Because the rendering capability of the rendering pipeline is limited, its output is a low-quality face rendering (e.g., a simplified or cartoon-like face rendering) whose shape (geometry) and texture differ significantly from the target effect. A low-quality face rendering may be understood as a face image rendered with a low-quality face model and a low-quality texture map.

The rendering auxiliary information used when the rendering pipeline renders a face includes, for example, a target effect face texture image. The target effect face texture image is the original face texture image that the rendering pipeline uses as a texture map, and it should be a high-quality image. It should be appreciated that this image is one of the inputs to the rendering pipeline, not the texture as it appears after rendering. In some embodiments, the rendering auxiliary information may further include face depth information and/or illumination information, etc. Face depth information (or a face depth map) refers to the depth buffer data of the rendering pipeline, which represents the depth of objects. Illumination information refers to the information the rendering pipeline uses to calculate illumination, such as the position, intensity, and color of the light source. The rendering auxiliary information guides the face restoration generation model in repairing the face video frames generated by the rendering pipeline, thereby improving the face rendering effect in the 3D video.
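By way of illustration only, the rendering auxiliary information described above could be grouped into a single structure in which the target effect face texture image is required and the depth and illumination information are optional. The following is a minimal Python sketch; the names `RenderingAuxInfo`, `LightSource`, and `available_inputs` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical, simplified image types used only for this sketch.
Image = List[List[Tuple[int, int, int]]]  # rows of RGB pixels
DepthMap = List[List[float]]              # one depth value per pixel


@dataclass
class LightSource:
    """Illumination information for one light: position, intensity, color."""
    position: Tuple[float, float, float]
    intensity: float
    color: Tuple[float, float, float]


@dataclass
class RenderingAuxInfo:
    """Rendering auxiliary information handed to the restoration model."""
    target_texture: Image                      # required: target effect face texture image
    depth_buffer: Optional[DepthMap] = None    # optional: depth buffer data
    lights: List[LightSource] = field(default_factory=list)  # optional

    def available_inputs(self) -> List[str]:
        """List which kinds of auxiliary information are actually present."""
        names = ["target_texture"]
        if self.depth_buffer is not None:
            names.append("depth_buffer")
        if self.lights:
            names.append("lights")
        return names


texture_only = RenderingAuxInfo(target_texture=[[(255, 200, 180)]])
full = RenderingAuxInfo(
    target_texture=[[(255, 200, 180)]],
    depth_buffer=[[0.5]],
    lights=[LightSource((0.0, 1.0, 2.0), 1.0, (1.0, 1.0, 1.0))],
)
```

A model with only an appearance encoder and a texture encoder would consume `texture_only`, while a model with additional depth and illumination encoders could consume `full`.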

In step S2, according to the rendering auxiliary information, the original face video frame is repaired by using the face repair generation model, and a repaired face video frame is obtained. Step S2 may be sequentially performed for each original face video frame, so as to repair each original face video frame, thereby obtaining a repaired face video frame corresponding to each original face video frame. In some embodiments, only a part of original face video frames (for example, key frames) can be repaired without repairing each original face video frame, so that resource consumption required by face image repair is reduced, and execution efficiency of the method provided by the embodiment of the application is improved.
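As a rough illustration of repairing either every frame or only a subset of frames, the following Python sketch repairs every `key_interval`-th frame and leaves the others unchanged. The function name, the fixed-interval key frame rule, and the choice to pass unrepaired frames through untouched are assumptions made for this sketch; the application does not specify how key frames are chosen.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def repair_selected_frames(
    frames: Sequence[Frame],
    repair_fn: Callable[[Frame], Frame],
    key_interval: int = 1,
) -> List[Frame]:
    """Repair every `key_interval`-th frame; pass the others through.

    key_interval=1 repairs each frame in sequence. Larger values trade
    quality for lower resource consumption (assumed selection rule).
    """
    if key_interval < 1:
        raise ValueError("key_interval must be at least 1")
    output: List[Frame] = []
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            output.append(repair_fn(frame))   # repaired frame replaces the original
        else:
            output.append(frame)              # original frame kept as-is
    return output


# Toy usage: frames are integers and "repair" multiplies by 10.
all_repaired = repair_selected_frames([1, 2, 3], lambda f: f * 10)
some_repaired = repair_selected_frames([1, 2, 3, 4, 5], lambda f: f * 10, key_interval=2)
```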

The face restoration generation model in the embodiment of the application generates the corresponding face picture by exploiting the strong ability of a generative adversarial network (GAN) to reproduce the detailed features of real pictures. A generative adversarial network consists of a generator and a discriminator. The generator is trained to produce fake pictures, while the discriminator learns to distinguish the fake pictures from the real pictures of a given data set and feeds this judgment back to the generator. Through this adversarial training, the network gradually learns to imitate the details and characteristics of real pictures until the fake pictures are difficult to tell from real ones. The face restoration generation model is thus capable of restoring a face image with defects or poor quality to a complete, high-quality face image.
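The adversarial feedback between generator and discriminator can be made concrete with the commonly used binary cross-entropy formulation. The sketch below is an assumption for illustration: the application does not state which loss functions are used, and the non-saturating generator loss shown here is one standard choice among several. Scores are discriminator outputs in (0, 1), where values near 1 mean "judged real".

```python
import math
from typing import Sequence

EPS = 1e-12  # avoids log(0)


def discriminator_loss(real_scores: Sequence[float],
                       fake_scores: Sequence[float]) -> float:
    """Low when real pictures score near 1 and generated pictures near 0."""
    real_term = -sum(math.log(s + EPS) for s in real_scores) / len(real_scores)
    fake_term = -sum(math.log(1.0 - s + EPS) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Non-saturating form: low when generated pictures are judged real."""
    return -sum(math.log(s + EPS) for s in fake_scores) / len(fake_scores)


# A confident discriminator has lower loss than an undecided one.
confident = discriminator_loss(real_scores=[0.9], fake_scores=[0.1])
undecided = discriminator_loss(real_scores=[0.5], fake_scores=[0.5])

# The generator is rewarded when its output fools the discriminator.
fooling = generator_loss([0.9])
caught = generator_loss([0.1])
```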

In step S2, the rendering auxiliary information obtained in step S1 is used to assist the face restoration generation model in repairing the original face video frame. The rendering auxiliary information includes, for example, the target effect face texture image, the face depth map, and the illumination information. The target effect face texture image guides the face restoration generation model to generate the desired texture details, such as spots, wrinkles, and pores that are missing from the low-quality rendering. The face depth information provides three-dimensional information about the face rendering, helping the face restoration generation model generate a high-quality face image with a better sense of depth. In addition, because the face texture image shows a face that has not been processed by an illumination algorithm, the illumination information helps the face restoration generation model generate a face image whose lighting better matches the rendering target.

Functionally, the face restoration generation model according to the embodiment of the application comprises an encoder and a generator. The encoder extracts and encodes face features, and the generator uses the features extracted by the encoder to reconstruct and repair the face. The model may comprise a plurality of encoders and a generator connected to them, with each encoder responsible for encoding one type of input. The face restoration generation model shown in FIG. 2 includes two encoders, one for encoding the input low-quality rendered face image (i.e., the original face image) and the other for encoding the target effect face texture image. The two encoders feed the extracted feature codes into the generator, which generates the restored face image. In some embodiments, the face restoration generation model may further include a depth information encoder and/or an illumination information encoder for encoding the depth information and/or the illumination information, respectively. The face restoration generation model may also include only one encoder, in which case the different types of input information may need operations such as dimension alignment and/or concatenation before being fed to the encoder, and the encoded features are then input to the generator.
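The multi-encoder arrangement described above, with one encoder per input type feeding a shared generator, can be sketched structurally as follows. This is only a toy illustration in plain Python: `summary_encoder` and the pass-through generator are hypothetical stand-ins for learned networks, and the class and key names are assumptions rather than part of the application.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


def summary_encoder(values: Sequence[float]) -> Vector:
    """Toy stand-in for a learned encoder: returns [mean, value range]."""
    mean = sum(values) / len(values)
    return [mean, max(values) - min(values)]


class MultiEncoderRestorationModel:
    """One encoder per input type; features are concatenated for the generator."""

    def __init__(self, encoders: Dict[str, Encoder],
                 generator: Callable[[Vector], Vector]) -> None:
        self.encoders = encoders
        self.generator = generator

    def restore(self, inputs: Dict[str, Sequence[float]]) -> Vector:
        missing = [name for name in self.encoders if name not in inputs]
        if missing:
            raise KeyError(f"missing inputs: {missing}")
        features: Vector = []
        # Encode each input with its own encoder, in a fixed order.
        for name, encoder in self.encoders.items():
            features.extend(encoder(inputs[name]))
        return self.generator(features)


model = MultiEncoderRestorationModel(
    encoders={"appearance": summary_encoder, "texture": summary_encoder},
    generator=lambda features: list(features),  # placeholder for a learned generator
)
restored = model.restore({"appearance": [0.0, 1.0], "texture": [0.5, 0.5]})
```

Adding a depth information encoder or an illumination information encoder in this sketch amounts to adding another entry to `encoders`, which mirrors the optional encoders of some embodiments.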

The face restoration generation model described in the embodiment of the application can be obtained through pre-training. The training data comprises a high-quality face image obtained in advance, rendering auxiliary information obtained in reverse through an existing algorithm, and a low-quality 3D face rendering image. The low-quality 3D face rendering image and the rendering auxiliary information are used as the input of the face restoration generation model, and the model is trained by comparing the high-quality face image with the restored image it outputs, yielding the trained face restoration generation model.
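The comparison between the restored output and the high-quality face image can be illustrated with a simple reconstruction loss. The sketch below assumes a mean absolute error (L1) comparison and flattened pixel lists; the application does not name a specific loss, and `TrainingSample`, `l1_loss`, and `mean_dataset_loss` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TrainingSample:
    low_quality_render: List[float]   # model input: low-quality 3D face rendering
    aux_info: List[float]             # model input: rendering auxiliary information
    high_quality_target: List[float]  # supervision: high-quality face image


def l1_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between restored and target pixels."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def mean_dataset_loss(
    model: Callable[[List[float], List[float]], List[float]],
    samples: Sequence[TrainingSample],
) -> float:
    """Average reconstruction loss of `model` over all samples."""
    total = 0.0
    for sample in samples:
        restored = model(sample.low_quality_render, sample.aux_info)
        total += l1_loss(restored, sample.high_quality_target)
    return total / len(samples)


samples = [TrainingSample([0.0, 0.0], [1.0, 1.0], [1.0, 1.0])]
# A model that ignores the auxiliary information does worse on this toy sample
# than one that uses it.
loss_ignoring_aux = mean_dataset_loss(lambda render, aux: render, samples)
loss_using_aux = mean_dataset_loss(lambda render, aux: aux, samples)
```

In practice such a reconstruction term would typically be combined with an adversarial term, but that combination is a design choice not detailed in the application.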

The trained face restoration generation model retains the structural features of the original face while adding realistic details. FIG. 3 shows an unrepaired original face video frame and a partial enlarged view of the eyes in an exemplary embodiment of the application. FIG. 4 shows the repaired face video frame, and a partial enlarged view of the eyes, obtained by repairing the face image in the original face video frame of FIG. 3 with the face restoration generation model according to an embodiment of the application. Comparing FIG. 4 with FIG. 3 clearly shows that the face restoration generation model improves the realism of the rendered pupil: the repaired pupil is greatly improved in clarity, sense of depth, and detail. Since the eyes are an important part of what makes a face look real, optimizing the eyes improves the realism of the face as a whole. In addition to repairing the eyes, the face restoration generation model can also optimize, for example, facial hair and improve its realism.

According to some embodiments of the application, step S2 may specifically include:

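S21, extracting an original face image from an original face video frame;

S22, inputting the original face image and rendering auxiliary information into a face restoration generation model to obtain a restored face image;

S23, replacing the original face image in the original face video frame with the restored face image to obtain the repaired face video frame.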

In step S21, an original face image is extracted from the original face video frame. Because the face video frame generally comprises the face image and the background image, the influence of the background image on the face image restoration can be avoided by extracting the face image in the video frame, and the execution efficiency of the face image restoration is improved.

In step S22, the original face image and the rendering auxiliary information are input into the face restoration generation model, and a restored face image is obtained. The encoders of the face restoration generation model encode the input original face image and the rendering auxiliary information respectively, and then pass the encoded feature information to the generator. The feature information of the rendering auxiliary information guides the generator in generating the restored face image.

In step S23, the restored face image is used to replace the original face image in the original face video frame, so as to obtain a restored face video frame. Specifically, the restored face image can be placed back into the original face video frame at the position given by the affine transformation corresponding to the face image, so that the restored face video frame is obtained.
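Steps S21 and S23 can be illustrated with a simplified crop-and-paste sketch. It assumes the face region is an axis-aligned bounding box, which corresponds to the special case where the affine transformation is a pure translation; a general affine transformation would also involve rotation and scaling. The function names and the `(top, left, height, width)` box layout are assumptions for this sketch.

```python
from typing import List, Tuple

Frame = List[List[int]]           # rows of pixel values (grayscale for brevity)
Box = Tuple[int, int, int, int]   # (top, left, height, width)


def crop_face(frame: Frame, box: Box) -> Frame:
    """S21: extract the face region from the frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def paste_face(frame: Frame, face: Frame, box: Box) -> Frame:
    """S23: return a copy of the frame with the face region replaced."""
    top, left, height, width = box
    if len(face) != height or any(len(row) != width for row in face):
        raise ValueError("face size does not match the box")
    result = [list(row) for row in frame]  # leave the original frame untouched
    for dy in range(height):
        result[top + dy][left:left + width] = face[dy]
    return result


original = [[0, 0, 0],
            [0, 1, 2],
            [0, 3, 4]]
box = (1, 1, 2, 2)
face = crop_face(original, box)
restored_face = [[value * 10 for value in row] for row in face]  # stand-in for S22
repaired = paste_face(original, restored_face, box)
```

Because the background pixels never pass through the restoration model in this arrangement, they remain identical in the repaired frame.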

In step S3, the original face video frames in the original video are replaced with the repaired face video frames, and a 3D video including a face is generated. Since the restored face video frames are obtained based on the original face video frames, the original face video frames are replaced with corresponding restored face video frames, and a 3D video with improved face rendering effect can be obtained.

According to the face rendering method based on the generation model, the face restoration generation model is added after the rendering pipeline of the 3D engine to carry out post-processing on the original face video frame, so that the rendering effect of the face video frame generated by the rendering pipeline is improved, and the technical problem that the rendering effect of the existing 3D face model is not real enough is solved.

Meanwhile, because the face restoration generation model can restore the features of a real face, a high-fidelity but time-consuming and labor-intensive algorithm in the original rendering pipeline can be replaced with a slightly lower-quality but more efficient one, improving the operating efficiency of the 3D engine without reducing rendering realism.

The face rendering method based on the generation model can be widely applied in scenarios such as 3D games, animation, and virtual humans to improve the rendering effect of face videos output by a 3D engine.

Referring to fig. 5, a face rendering apparatus based on a generative model according to an exemplary embodiment of the present application is shown. As shown in fig. 5, the apparatus 200 includes:

an acquisition unit 201 for acquiring an original face video frame including a face generated by a rendering pipeline of a 3D engine and rendering auxiliary information used by the rendering pipeline;

a repair unit 202 for repairing the original face video frame by using a face restoration generation model according to the rendering auxiliary information to obtain a repaired face video frame;

a replacing unit 203 for replacing the original face video frame in the original video with the repaired face video frame to generate a 3D video including a face,

wherein the rendering auxiliary information comprises a target effect face texture image.

According to some embodiments of the present application, the repair unit 202 may specifically include:

an extraction unit for extracting an original face image from the original face video frame;

an execution unit for inputting the original face image and the rendering auxiliary information into the face restoration generation model to obtain a restored face image;

a generating unit for replacing the original face image in the original face video frame with the restored face image to obtain the repaired face video frame.

It should be understood that the apparatus shown in fig. 5 corresponds to the method previously described in this specification. Thus, the operations, features and advantages described above with respect to the method are equally applicable to the apparatus and the unit modules comprised thereof; the operations, features and advantages described above for the apparatus and the unit modules comprised thereof are equally applicable to the method. For brevity, substantially identical/similar operations, features and advantages are not described in detail herein.
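The correspondence between the units of the apparatus and steps S1 to S3 can be pictured as three pluggable callables composed in order. The following Python sketch is purely illustrative; the class name, field names, and callable signatures are assumptions and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class FaceRenderingApparatus:
    # Acquisition unit: returns (original frames, rendering auxiliary information).
    acquisition_unit: Callable[[], Tuple[List[Any], Any]]
    # Repair unit: repairs one frame using the auxiliary information.
    repair_unit: Callable[[Any, Any], Any]
    # Replacing unit: builds the output video from original and repaired frames.
    replacing_unit: Callable[[List[Any], List[Any]], List[Any]]

    def run(self) -> List[Any]:
        original_frames, aux_info = self.acquisition_unit()          # S1
        repaired_frames = [self.repair_unit(frame, aux_info)          # S2
                           for frame in original_frames]
        return self.replacing_unit(original_frames, repaired_frames)  # S3


# Toy usage: frames are integers, "repair" adds the auxiliary value.
apparatus = FaceRenderingApparatus(
    acquisition_unit=lambda: ([1, 2, 3], 10),
    repair_unit=lambda frame, aux: frame + aux,
    replacing_unit=lambda originals, repaired: repaired,
)
video = apparatus.run()
```

Keeping the units as separate callables reflects the statement that their functions may be split across, or combined into, different unit modules.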

Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of each unit module in the technical solution of the present application may be implemented by dividing the functions into a plurality of unit modules, and/or at least some functions of the plurality of unit modules may be implemented by combining the functions into a single unit module. The manner in which a particular unit module performs an action in the present application includes that the particular unit module itself performs the action, or that the particular unit module invokes or otherwise accesses the performed action (or performs the action in conjunction with the particular unit module). Thus, a particular unit module that performs an action may include that particular unit module itself that performs the action and/or another unit module that the particular unit module invokes or otherwise accesses that performs the action.

In addition to the technical scheme, the application further provides electronic equipment, which comprises one or more processors and a memory for storing executable instructions. Wherein the one or more processors are configured to implement the above-described methods via executable instructions.

The application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the above method.

In the following part of the present description, illustrative examples of the aforementioned electronic device, non-transitory computer readable storage medium, and computer program product will be described in connection with fig. 6.

Fig. 6 shows a block diagram of an electronic device according to an exemplary embodiment of the present application. The system provided by the present application may also be implemented, in whole or in part, by electronic device 900 or a similar device or system.

The electronic device 900 may be a variety of different types of devices. Examples of electronic device 900 include, but are not limited to: desktop, server, notebook or netbook computers, mobile devices, wearable devices, entertainment devices, televisions or other display devices, automotive computers, and the like.

The electronic device 900 may include at least one processor 902, memory 904, communication interface(s) 909, display device 901, other input/output (I/O) devices 910, and one or more mass storage devices 903, which can communicate with each other, such as through a system bus 911 or other suitable connection.

The processor 902 may be a single processing unit or multiple processing units, all of which may include a single or multiple computing units or multiple cores. The processor 902 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. The processor 902 may be configured to, among other capabilities, obtain and execute computer-readable instructions stored in the memory 904, mass storage device 903, or other computer-readable medium, such as program code of the operating system 905, program code of the application programs 906, program code of other programs 907, and so forth.

Memory 904 and mass storage device 903 are examples of computer-readable storage media for storing instructions that are executed by processor 902 to implement the various functions as previously described. For example, the memory 904 may generally include volatile memory and non-volatile memory. In addition, mass storage devices 903 may generally include hard drives, solid state drives, removable media, and the like. The memory 904 and the mass storage device 903 may both be referred to collectively as memory or computer-readable storage media in the present application, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that may be executed by the processor 902 as a particular machine configured to implement the operations and functions described in the examples of this application.

A number of programs may be stored on mass storage device 903. These programs include an operating system 905, one or more application programs 906, other programs 907, and program data 908, and may be loaded into the memory 904 for execution. Examples of such application programs or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: the methods provided by the present application (including any suitable steps of the methods) and/or additional embodiments described herein.

Although illustrated in fig. 6 as being stored in memory 904 of electronic device 900, operating system 905, one or more application programs 906, other programs 907, and program data 908, or portions thereof, may be implemented using any form of computer readable media accessible by electronic device 900. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.

Communication media includes, for example, computer readable instructions, data structures, program modules, or other data in a communication signal that is transferred from one system to another system. The communication medium may include a conductive transmission medium, as well as a wireless medium capable of propagating energy waves. Computer readable instructions, data structures, program modules, or other data may be embodied as a modulated data signal, for example, in a wireless medium. The modulation may be analog, digital or hybrid modulation techniques.

By way of example, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory, such as random access memory; nonvolatile memory, such as flash memory, various read only memories, and magnetic and ferromagnetic/ferroelectric memory; magnetic and optical storage devices; or other media, now known or later developed, that can store computer-readable information/data for use by a computer system.

One or more communication interfaces 909 are used to exchange data with other devices, such as via a network, direct connection, or the like. Such communication interfaces may be one or more of the following: any type of network interface, wired or wireless interface, Wi-MAX interface, Ethernet interface, universal serial bus interface, cellular network interface, Bluetooth interface, NFC interface, etc. Communication interface 909 may facilitate communication within a variety of networks and protocol types, including wired and wireless networks, the Internet, and the like. Communication interface 909 may also provide for communication with external storage devices (not shown) such as in a storage array, network attached storage, storage area network, or the like.

In some examples, a display device 901, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 910 may be devices that receive various inputs from a user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and so on. The technical solutions described herein may be supported by these various configurations of the electronic device 900 and are not limited to the specific examples of the technical solutions described herein.

While the application has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and schematic and not restrictive; it will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The scope of the application is, therefore, indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims can also be implemented by means of one unit or means in software or hardware.

Claims

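1. A face rendering method based on a generation model, the method comprising:

S1, acquiring an original face video frame comprising a face and generated by a rendering pipeline of a 3D engine and rendering auxiliary information used by the rendering pipeline;

S2, repairing the original face video frame by using a face restoration generation model according to the rendering auxiliary information to obtain a repaired face video frame;

S3, replacing the original face video frame with the repaired face video frame to generate a 3D video comprising the face,

wherein the rendering auxiliary information comprises a target effect face texture image,

wherein step S2 comprises:

S21, extracting an original face image from the original face video frame;

S22, inputting the original face image and the rendering auxiliary information into the face restoration generation model to obtain a restored face image;

S23, replacing the original face image in the original face video frame with the restored face image to obtain the repaired face video frame,

and wherein the face restoration generation model comprises:

an appearance encoder for encoding the input original face image; and

a texture encoder for encoding the input target effect face texture image.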

2. The method of claim 1, wherein the rendering side information further comprises face depth information and/or illumination information.

3. The method of claim 2, wherein the face depth information comprises depth buffer data of the rendering pipeline, and the illumination information comprises position, intensity, and/or color information of a light source.

4. A method according to any one of claims 1 to 3, comprising performing step S2 in sequence for each original face video frame generated by the rendering pipeline.

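5. The method of claim 1, wherein the face restoration generation model further comprises:

a depth information encoder for encoding the input face depth information; and/or

an illumination information encoder for encoding the input illumination information.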

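6. A face rendering apparatus based on a generation model, the apparatus comprising:

an acquisition unit for acquiring an original face video frame including a face generated by a rendering pipeline of a 3D engine and rendering auxiliary information used by the rendering pipeline;

a repair unit for repairing the original face video frame by using a face restoration generation model according to the rendering auxiliary information to obtain a repaired face video frame;

a replacing unit for replacing the original face video frame with the repaired face video frame to obtain a 3D video including a face,

wherein the rendering auxiliary information comprises a target effect face texture image,

wherein the repair unit includes:

an extraction unit for extracting an original face image from the original face video frame;

an execution unit for inputting the original face image and the rendering auxiliary information into the face restoration generation model to obtain a restored face image;

a generating unit for replacing the original face image in the original face video frame with the restored face image to obtain a restored face video frame,

and wherein the face restoration generation model comprises:

an appearance encoder for encoding the input original face image; and

a texture encoder for encoding the input target effect face texture image.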

7. An electronic device, the electronic device comprising:

one or more processors;

a memory for storing executable instructions;

the one or more processors are configured to implement the method of any one of claims 1 to 5 via the executable instructions.

8. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the method of any of claims 1 to 5.