CN112562027A - Face model generation method and device, electronic equipment and storage medium - Google Patents

Face model generation method and device, electronic equipment and storage medium

Info

Publication number
CN112562027A
CN112562027A
Authority
CN
China
Prior art keywords
face
face model
model
information
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011400827.3A
Other languages
Chinese (zh)
Inventor
王迪 (Wang Di)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011400827.3A
Publication of CN112562027A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30196: Human being; Person
    • G06T 2207/30201: Face

Abstract

The application discloses a face model generation method and apparatus, an electronic device and a storage medium, relating to the field of computer technology and in particular to artificial intelligence fields such as computer vision, deep learning and augmented reality. The specific implementation scheme is as follows: acquiring a first face model and a plurality of candidate expression information, wherein the first face model comprises static feature information of the human face; determining texture features of the first face model according to the static feature information; determining a change feature map of each candidate expression information relative to standard expression information; and synthesizing according to the texture features, the change feature map and the first face model to obtain a second face model, so that the generated face model carries dynamic detail information of multiple expressions and the representation effect of the face model is effectively improved.

Description

Face model generation method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to the technical field of artificial intelligence such as computer vision, deep learning and augmented reality, and particularly relates to a method and a device for generating a face model, electronic equipment and a storage medium.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it involves technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
In the related art, a face model may be generated, for example, with a three-dimensional scanner, or two-dimensional images may be captured from a sparse set of camera positions so that a face model is synthesized from the two-dimensional images acquired at the multiple camera positions.
Disclosure of Invention
A method, an apparatus, an electronic device, a storage medium and a computer program product for generating a face model are provided.
According to a first aspect, a method for generating a face model is provided, which includes: acquiring a first face model and a plurality of candidate expression information, wherein the first face model comprises: static feature information of the human face;
determining texture features of the first face model according to the static feature information; determining a change feature map of each candidate expression information relative to standard expression information; and synthesizing according to the texture features, the change feature map and the first face model to obtain a second face model.
According to a second aspect, there is provided an apparatus for generating a face model, comprising: the obtaining module is used for obtaining a first face model and a plurality of candidate expression information, wherein the first face model comprises: static feature information of the human face; the first determining module is used for determining the texture features of the first face model according to the static feature information; the second determination module is used for determining a change feature map of each candidate expression information relative to standard expression information; and the synthesis module is used for synthesizing a second face model according to the texture features, the change feature map and the first face model.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method for generating the face model according to the embodiment of the application.
According to a fourth aspect, a non-transitory computer-readable storage medium is proposed, in which computer instructions are stored, the computer instructions being configured to cause the computer to perform the method for generating a face model disclosed in the embodiments of the present application.
According to a fifth aspect, a computer program product is proposed, comprising a computer program, which when executed by a processor, implements the method for generating a face model disclosed in embodiments of the present application.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application;
fig. 5 is a block diagram of an electronic device for implementing a method for generating a face model according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present application.
It should be noted that the execution subject of the face model generation method in this embodiment is a face model generation apparatus. The apparatus may be implemented in software and/or hardware and may be configured in an electronic device, which may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the application relates to the technical field of artificial intelligence such as computer vision, deep learning and augmented reality.
Artificial Intelligence, abbreviated AI, is a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
Deep learning learns the intrinsic laws and representation levels of sample data, and the information obtained during learning is of great help in interpreting data such as text, images and sound. Its ultimate goal is to enable machines to analyze and learn like humans and to recognize data such as text, images and sound.
Augmented Reality (AR) is a technology that fuses virtual information with the real world. It makes wide use of technical means such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction and sensing, and applies computer-generated virtual information such as text, images, three-dimensional models, music and video to the real world after simulation, so that the two kinds of information complement each other and the real world is thereby enhanced.
Computer vision uses cameras and computers in place of human eyes to perform machine vision tasks such as identification, tracking and measurement of targets, and carries out further image processing so that the result is an image better suited to human observation or to transmission to an instrument for detection.
As shown in fig. 1, the method for generating a face model includes:
s101: acquiring a first face model and a plurality of candidate expression information, wherein the first face model comprises: static feature information of a human face.
The face model may specifically be a three-dimensional model of a target face. In the process of generating a face model with dynamic detail information of multiple expressions, the input face model that contains only static feature information of the face may be referred to as the first face model; correspondingly, the subsequently generated face model carrying dynamic detail information of multiple expressions may be referred to as the second face model.
The static feature information of the face describes static features of the face that do not change with expression, for example static wrinkles, where "static" means that the wrinkles do not vary with the expression. The static feature information may be, for example, the size and position of such expression-invariant features, which is not limited herein.
To obtain the first face model carrying the static feature information of the face, for example, a plurality of two-dimensional face images and the weight corresponding to each face image may be input into an artificial intelligence model, the predicted parameters may be adjusted, the face blend shapes may be input, and an initial three-dimensional face model may be computed. This initial three-dimensional face model contains only the static feature information of the face and may be referred to as the first face model.
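The patent does not spell out this fitting step in more detail; purely as a hedged illustration, the Python sketch below (all function and parameter names are hypothetical) treats the first face model as a neutral mesh plus a weighted combination of face blend shapes, with the weights assumed to be the parameters predicted by the artificial intelligence model mentioned above.

```python
import numpy as np

def build_first_face_model(neutral_vertices, identity_blendshapes, predicted_weights):
    """Hypothetical sketch: combine a neutral 3D face mesh with blend shapes
    using weights predicted from the 2D face images.

    neutral_vertices:      (V, 3) array, the mean/neutral face mesh
    identity_blendshapes:  (K, V, 3) array, per-blendshape vertex offsets
    predicted_weights:     (K,) array, weights regressed by the fitting model
    """
    # Weighted sum of blendshape offsets, contracted over the K blend shapes.
    offsets = np.tensordot(predicted_weights, identity_blendshapes, axes=1)  # (V, 3)
    # The result contains only static (expression-independent) geometry.
    return neutral_vertices + offsets
```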
After the first face model with the static feature information of the face is obtained, the face model with the dynamic detail information of various expressions can be further synthesized by combining some candidate expression information according to the static feature information of the face.
The candidate expression information may be, for example, information related to candidate expressions obtained through modeling, where the candidate expressions are, for example, happy, angry, sad, and the like. The related information may be, for example, the size and position of facial parts when a candidate expression is mapped onto a two-dimensional face image. For instance, when the candidate expression is happy, the corners of the mouth may be raised, the eyes may take a crescent shape, and the distance between the eye corner and the mouth corner may be smaller than on a face without a happy expression; information describing the raised mouth corners and the crescent-shaped eyes may be called the related information, and such information related to a candidate expression may be referred to as candidate expression information.
After the first face model and the candidate facial expression information are obtained, the corresponding second face model can be synthesized according to the obtained first face model and the candidate facial expression information.
S102: and determining the texture features of the first face model according to the static feature information.
Texture features reflect the visual characteristics of homogeneous phenomena in an image and represent the slowly or periodically varying structural arrangement of an object's surface. Unlike image features such as grayscale and color, they can be characterized by the grayscale distribution of a pixel and its surrounding spatial neighborhood.
That is to say, in the embodiment of the present application, after the static feature information of the face is determined, the texture feature of the first face model may also be determined according to the static feature information.
In some embodiments, when determining the texture features of the first face model from the static feature information, the static feature information may be coordinate-converted to obtain texture map coordinates in a texture map coordinate space, and these coordinates are taken as the texture features of the first face model.
The texture map coordinate space is also called the UV coordinate space: U and V play the role of X and Y in a planar coordinate system, analogous to the X, Y and Z axes of the spatial model. Texture map UV coordinates define the position of each point on the texture image; these points are associated with the first face model and determine where the surface texture map is placed, so the UV coordinates establish an exact correspondence between each point of the image and the surface of the model object.
Of course, the determining of the texture feature of the first face model according to the static feature information may also be implemented in any other possible manner, such as a modeling manner, and the like, without limitation.
By coordinate-converting the static feature information into texture map coordinates in the texture map coordinate space and using them as the texture features of the first face model, the subsequent modeling of a face model carrying dynamic detail information of multiple expressions is made easier, the candidate expression information and the first face model can be fused efficiently and accurately, the synthesis process becomes more coherent, and the synthesis effect of the face model is improved. A minimal sketch of this conversion step is given below.
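The sketch assumes the static feature information is available per mesh vertex and that each vertex already carries UV coordinates; the nearest-texel scatter is a simplification of proper triangle rasterization, and all names are hypothetical rather than the patent's own.

```python
import numpy as np

def static_features_to_uv_map(vertex_uv, static_feature_values, map_size=256):
    """Hypothetical sketch: project per-vertex static feature information
    (e.g. static-wrinkle intensity) into texture map (UV) coordinate space.

    vertex_uv:             (V, 2) UV coordinates in [0, 1]
    static_feature_values: (V, C) static feature values per vertex
    Returns a (map_size, map_size, C) texture feature map in UV space.
    """
    channels = static_feature_values.shape[1]
    uv_map = np.zeros((map_size, map_size, channels), dtype=np.float32)
    # Nearest-texel scatter; a full pipeline would rasterize mesh triangles.
    cols = np.clip((vertex_uv[:, 0] * (map_size - 1)).astype(int), 0, map_size - 1)
    rows = np.clip(((1.0 - vertex_uv[:, 1]) * (map_size - 1)).astype(int), 0, map_size - 1)
    uv_map[rows, cols] = static_feature_values
    return uv_map
```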
S103: and determining a change characteristic diagram of each candidate expression information relative to the standard expression information.
After the texture features of the first face model are determined according to the static feature information, a change feature map of each candidate expression information relative to standard expression information may be determined. The standard expression is the expression of the face in its natural, neutral state, and information related to the standard expression may be referred to as standard expression information.
In some embodiments, determining the change feature map of each candidate expression information relative to the standard expression information may involve performing coordinate conversion on each candidate expression information to obtain a plurality of candidate expression coordinates in the texture map coordinate space, performing coordinate conversion on the standard expression information to obtain standard expression coordinates in the texture map coordinate space, determining the change feature of each candidate expression coordinate relative to the standard expression coordinates, and generating the change feature map from the change features.
The change feature map describes how each candidate expression information changes relative to the standard expression information, and can be represented in the form of an image.
That is, each candidate expression information and each standard expression information can be converted into a candidate expression coordinate and a standard expression coordinate in the texture map coordinate space, then, corresponding mathematical operation can be performed on the candidate expression coordinate and the standard expression coordinate to obtain the change feature of each candidate expression coordinate relative to the standard expression coordinate, and then, a corresponding change feature map is formed according to the change feature.
In this way, each candidate expression information and the standard expression information are converted into the texture map coordinate space, so that the change features can be presented in the texture map coordinate dimension; this improves how the change features of each candidate expression information relative to the standard expression information are presented, and makes it convenient to efficiently fuse these change features with the first face model in subsequent steps.
Of course, any other possible manner may be adopted to determine the variation feature map of each candidate expression information with respect to the standard expression information, such as a modeling manner, a mathematical operation manner, and the like, which is not limited in this respect.
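As a hedged illustration of the coordinate-difference approach described above, the sketch below assumes each candidate expression and the standard expression are given as vertex positions sharing a single UV layout; the per-vertex difference relative to the standard expression is scattered into UV space to form one change feature map per candidate expression. The representation is an assumption, not the patent's exact notation.

```python
import numpy as np

def change_feature_maps(candidate_expr_vertices, standard_expr_vertices,
                        vertex_uv, map_size=256):
    """Hypothetical sketch: one change feature map per candidate expression.

    candidate_expr_vertices: (E, V, 3) vertex positions per candidate expression
    standard_expr_vertices:  (V, 3) vertex positions of the standard expression
    vertex_uv:               (V, 2) shared UV coordinates in [0, 1]
    Returns an (E, map_size, map_size, 3) stack of change feature maps.
    """
    cols = np.clip((vertex_uv[:, 0] * (map_size - 1)).astype(int), 0, map_size - 1)
    rows = np.clip(((1.0 - vertex_uv[:, 1]) * (map_size - 1)).astype(int), 0, map_size - 1)
    maps = np.zeros((len(candidate_expr_vertices), map_size, map_size, 3), dtype=np.float32)
    for e, expr_vertices in enumerate(candidate_expr_vertices):
        delta = expr_vertices - standard_expr_vertices   # change feature per vertex
        maps[e, rows, cols] = delta                       # scatter into UV space
    return maps
```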
S104: and synthesizing according to the texture features, the change feature map and the first face model to obtain a second face model.
After determining the texture feature of the first face model according to the static feature information and determining the change feature map of each candidate expression information relative to the standard expression information, the second face model can be obtained by synthesizing the texture feature, the change feature map and the first face model.
For example, the texture features and the change feature maps may be fitted directly onto the first face model by a model processing method; since each change feature map is determined from how the corresponding candidate expression information changes relative to the standard expression information, the second face model can carry dynamic detail information of multiple expressions. This is not limited herein.
For another example, the texture features, the variation feature map and the first face model may be input into a pre-trained model, so as to obtain a second face model which is output by the model and can carry dynamic detail information of multiple expressions.
In the embodiment of the present application, the texture features and the change feature map may be input to a pre-trained face generation model to obtain a shift mapping output by the face generation model and corresponding to each candidate expression information, and the corresponding shift mapping and the first face model are subjected to synthesis processing to obtain a second face model.
In this embodiment, a first face model containing static feature information of a face and a plurality of candidate expression information are obtained, texture features of the first face model are determined according to the static feature information, a change feature map of each candidate expression information relative to standard expression information is determined, and a second face model is synthesized from the texture features, the change feature maps and the first face model, so that the generated face model carries dynamic detail information of multiple expressions and the representation effect of the face model is effectively improved.
Fig. 2 is a schematic diagram according to a second embodiment of the present application.
As shown in fig. 2, the method for generating a face model includes:
s201: acquiring a first face model and a plurality of candidate expression information, wherein the first face model comprises: static feature information of a human face.
S202: and determining the texture features of the first face model according to the static feature information.
S203: and determining a change characteristic diagram of each candidate expression information relative to the standard expression information.
S201 to S203 may refer to the above embodiments specifically, and are not described herein again.
S204: and inputting the texture features and the change feature map into a pre-trained face generation model to obtain a shift mapping corresponding to each candidate expression information output by the face generation model.
The face generation model is trained in advance on a large amount of training data, so a good processing effect can be achieved. The shift map (i.e., the displacement map, DP) conveys the stereoscopic effect of each candidate expression mapped onto the three-dimensional face model, so the synthesized face model expresses each candidate expression better and the second face model is more vivid and lifelike.
The face generation model may specifically be, for example, a neural network model in artificial intelligence, or a machine learning model, and the like, which is not limited thereto.
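The patent does not disclose the network architecture. Purely as an assumed illustration, the PyTorch sketch below shows one possible image-translation model that takes a texture feature map concatenated with a change feature map and outputs a displacement (shift) map together with a synthesis weight (anticipating S205 below). The layer layout and channel sizes are arbitrary choices, not the patent's design.

```python
import torch
import torch.nn as nn

class FaceGenerationModel(nn.Module):
    """Hypothetical image-translation sketch of the face generation model:
    texture features + change feature map -> shift (displacement) map + weight."""

    def __init__(self, in_channels=6, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden * 2, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, 1, 4, stride=2, padding=1),  # 1-channel shift map
        )
        self.weight_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(hidden * 2, 1), nn.Sigmoid(),  # synthesis weight in (0, 1)
        )

    def forward(self, texture_features, change_feature_map):
        # Both inputs are assumed to be (N, 3, H, W) maps in UV space.
        x = torch.cat([texture_features, change_feature_map], dim=1)
        z = self.encoder(x)
        return self.decoder(z), self.weight_head(z)
```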
Optionally, in some embodiments, the texture features and the change feature map are input into the pre-trained face generation model to obtain at least one predicted shift map, output by the face generation model, corresponding to each candidate expression information; a loss value between the predicted shift map and a calibrated shift map is determined; and if the loss value satisfies the reference loss threshold, the predicted shift map is taken as the corresponding shift map. In this way the shift map corresponding to each candidate expression information can be determined quickly, and the quality of the generated shift maps is improved.
The reference loss threshold is the loss value computed by the loss function of the face generation model when the model has been trained to convergence; the calibrated shift map is a reference shift map, obtained by calibration in advance, that corresponds to the candidate expression information.
That is to say, in the embodiment of the application, the pre-trained face generation model performs image translation on the texture features and the change feature maps to obtain several candidate predicted shift maps; the shift map corresponding to each candidate expression information can then be selected by comparing the loss value between each predicted shift map and the calibrated shift map against the reference loss threshold, as sketched below.
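A minimal sketch of this selection step, assuming an L1 loss (the patent does not name the loss function) and PyTorch tensors for the maps:

```python
import torch.nn.functional as F

def select_shift_map(predicted_shift_maps, calibrated_shift_map, reference_loss_threshold):
    """Hypothetical sketch: among the predicted shift maps for one candidate
    expression, return the first whose loss against the pre-calibrated shift
    map satisfies the reference loss threshold."""
    for predicted in predicted_shift_maps:
        loss = F.l1_loss(predicted, calibrated_shift_map)
        if loss.item() <= reference_loss_threshold:
            return predicted
    return None  # no prediction met the threshold
```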
S205: and acquiring the synthetic weight which is output by the face generation model and corresponds to the shift mapping.
In the above image translation of the texture feature and the change feature map by the pre-trained face generation model, the synthesis weight corresponding to the shift mapping can also be predicted by the face generation model, wherein the synthesis weight is used as a reference when synthesizing the second face model.
S206: and synthesizing the corresponding shift mapping and the first face model according to the synthesis weight to obtain a second face model.
That is to say, in the embodiment of the application, the pre-trained face generation model not only performs image translation on the texture features and the change feature map to obtain the shift map corresponding to each candidate expression information, but also determines the synthesis weight corresponding to each shift map. When the second face model is synthesized, the corresponding shift maps and the first face model are combined according to these synthesis weights, so that the expression detail information appears more visibly in the face model and the details of the various expressions are represented more naturally; a minimal sketch of this weighted synthesis follows.
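The sketch assumes the shift map displaces each vertex of the first face model along its normal and is sampled at per-vertex UV coordinates (implementation details the patent leaves open):

```python
import numpy as np

def synthesize_second_face_model(first_model_vertices, vertex_normals,
                                 shift_map, vertex_uv, synthesis_weight):
    """Hypothetical sketch: apply one shift (displacement) map to the first
    face model, scaled by the predicted synthesis weight, to obtain the
    second face model for one candidate expression.

    first_model_vertices: (V, 3) vertices of the first (static) face model
    vertex_normals:       (V, 3) unit normals of those vertices
    shift_map:            (H, W) scalar displacement map in UV space
    vertex_uv:            (V, 2) UV coordinates in [0, 1]
    synthesis_weight:     scalar weight output by the face generation model
    """
    h, w = shift_map.shape
    cols = np.clip((vertex_uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(((1.0 - vertex_uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    per_vertex_displacement = shift_map[rows, cols]                     # (V,)
    offsets = synthesis_weight * per_vertex_displacement[:, None] * vertex_normals
    return first_model_vertices + offsets
```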
In this embodiment, a first face model containing static feature information of a face and a plurality of candidate expression information are obtained, texture features of the first face model are determined according to the static feature information, a change feature map of each candidate expression information relative to standard expression information is determined, and a second face model is synthesized from the texture features, the change feature maps and the first face model, so that the generated face model carries dynamic detail information of multiple expressions and the representation effect of the face model is effectively improved. Because the face generation model is trained in advance on a large amount of training data, a good processing effect can be achieved; and because the displacement map conveys the stereoscopic effect of each candidate expression mapped onto the three-dimensional face model, the synthesized face model expresses each candidate expression well and the second face model is more vivid.
Fig. 3 is a schematic diagram according to a third embodiment of the present application.
As shown in fig. 3, the apparatus 30 for generating a face model includes:
an obtaining module 301, configured to obtain a first face model and multiple candidate expression information, where the first face model includes: static feature information of the human face;
a first determining module 302, configured to determine a texture feature of the first face model according to the static feature information;
a second determining module 303, configured to determine a variation feature map of each candidate expression information with respect to the standard expression information;
and a synthesizing module 304, configured to synthesize a second face model according to the texture features, the variation feature map, and the first face model.
In some embodiments of the present application, as shown in fig. 4, the generating device 40 of the face model includes: an obtaining module 401, a first determining module 402, a second determining module 403, and a synthesizing module 404, where the synthesizing module 404 includes:
the generating submodule 4041 is used for inputting the texture features and the change feature map into a pre-trained face generation model so as to obtain a shift mapping output by the face generation model and corresponding to each candidate expression information;
and a synthesizing submodule 4042, configured to perform synthesizing processing on the corresponding shift mapping and the first face model to obtain a second face model.
In some embodiments of the present application, as shown in fig. 4, wherein the synthesizing module 404 further comprises:
the obtaining sub-module 4043 is configured to obtain a synthesis weight output by the face generation model and corresponding to the shift mapping, where the synthesis sub-module is specifically configured to perform synthesis processing on the corresponding shift mapping and the first face model according to the synthesis weight to obtain a second face model.
In some embodiments of the present application, the first determining module 402 is specifically configured to:
and performing coordinate conversion on the static feature information to obtain texture mapping coordinates in a texture mapping coordinate space, and taking the texture mapping coordinates as the texture features of the first face model.
In some embodiments of the present application, the second determining module 403 is specifically configured to:
respectively carrying out coordinate conversion on each candidate expression information to obtain a plurality of candidate expression coordinates in the texture map coordinate space;
performing coordinate conversion on the standard expression information to obtain a standard expression coordinate corresponding to the texture mapping coordinate space;
determining the change characteristics of each candidate expression coordinate relative to the standard expression coordinate; and
and generating a change characteristic map according to the change characteristics.
In some embodiments of the present application, the generating submodule 4041 is specifically configured to:
inputting the texture features and the change feature map into a pre-trained face generation model to obtain at least one predicted displacement map which is output by the face generation model and corresponds to each candidate expression information;
determining a loss value between the predicted shift map and the calibrated shift map;
and if the loss value meets the reference loss threshold value, taking the predicted shift map as a corresponding shift map.
It is understood that the generating apparatus 40 of the face model in fig. 4 of the present embodiment and the generating apparatus 30 of the face model in the above-mentioned embodiment, the obtaining module 401 and the obtaining module 301 in the above-mentioned embodiment, the first determining module 402 and the first determining module 302 in the above-mentioned embodiment, the second determining module 403 and the second determining module 303 in the above-mentioned embodiment, and the synthesizing module 404 and the synthesizing module 304 in the above-mentioned embodiment may have the same functions and structures.
It should be noted that the explanation of the generation method of the face model described above is also applicable to the generation apparatus of the face model of the present embodiment, and details are not repeated here.
In this embodiment, a first face model containing static feature information of a face and a plurality of candidate expression information are obtained, texture features of the first face model are determined according to the static feature information, a change feature map of each candidate expression information relative to standard expression information is determined, and a second face model is synthesized from the texture features, the change feature maps and the first face model, so that the generated face model carries dynamic detail information of multiple expressions and the representation effect of the face model is effectively improved.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 5, fig. 5 is a block diagram of an electronic device for implementing a method for generating a face model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method for generating the face model provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method of generating a face model provided herein.
The memory 502, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules (e.g., the obtaining module 401, the first determining module 402, the second determining module 403, and the synthesizing module 404 shown in fig. 4) corresponding to the generation method of the face model in the embodiment of the present application. The processor 501 executes various functional applications and data processing of the server by running non-transitory software programs, instructions and modules stored in the memory 502, that is, the method for generating the face model in the above method embodiment is realized.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of an electronic device that performs the generation method of the face model, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory located remotely from processor 501, which may be connected to an electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (15)

1. A method for generating a face model comprises the following steps:
acquiring a first face model and a plurality of candidate expression information, wherein the first face model comprises: static feature information of the human face;
determining texture features of the first face model according to the static feature information;
determining a change feature map of each candidate expression information relative to standard expression information; and
and synthesizing according to the texture features, the change feature map and the first face model to obtain a second face model.
2. The method of claim 1, wherein the synthesizing from the texture features, the variation feature map, and the first face model to derive a second face model comprises:
inputting the texture features and the change feature map into a pre-trained face generation model to obtain a shift mapping output by the face generation model and corresponding to each candidate expression information; and
and synthesizing the corresponding shift mapping and the first face model to obtain the second face model.
3. The method of claim 2, further comprising, after the inputting the texture features and the variation feature map to a pre-trained face generation model:
and acquiring a synthesis weight output by the face generation model and corresponding to the shift mapping, wherein the corresponding shift mapping and the first face model are synthesized according to the synthesis weight to obtain the second face model.
4. The method of claim 1, wherein the determining the textural features of the first face model from the static feature information comprises:
and performing coordinate conversion on the static feature information to obtain texture mapping coordinates in a texture mapping coordinate space, and taking the texture mapping coordinates as the texture features of the first face model.
5. The method of claim 4, wherein the determining a variation feature map of each of the candidate expression information relative to standard expression information comprises:
performing coordinate conversion on each candidate expression information to obtain a plurality of candidate expression coordinates corresponding to the texture map coordinate space;
performing coordinate conversion on the standard expression information to obtain a standard expression coordinate corresponding to the texture mapping coordinate space;
determining a variation characteristic of each candidate expression coordinate relative to the standard expression coordinate; and
and generating the change characteristic graph according to the change characteristics.
6. The method of claim 2, wherein the inputting the texture feature and the variation feature map into a pre-trained face generation model to obtain a shift map corresponding to each candidate expression information output by the face generation model comprises:
inputting the texture features and the change feature map into a pre-trained face generation model to obtain at least one predicted shift mapping output by the face generation model and corresponding to each candidate expression information;
determining a loss value between the predicted shift map and the calibrated shift map;
and if the loss value meets a reference loss threshold, taking the predicted shift map as the corresponding shift map.
7. An apparatus for generating a face model, comprising:
the obtaining module is used for obtaining a first face model and a plurality of candidate expression information, wherein the first face model comprises: static feature information of the human face;
the first determining module is used for determining the texture features of the first face model according to the static feature information;
the second determination module is used for determining a change feature map of each candidate expression information relative to standard expression information; and
and the synthesis module is used for synthesizing a second face model according to the texture features, the change feature map and the first face model.
8. The apparatus of claim 7, wherein the synthesis module comprises:
the generating submodule is used for inputting the texture features and the change feature map into a pre-trained face generating model so as to obtain a shift mapping output by the face generating model and corresponding to each candidate expression information; and
and the synthesis submodule is used for carrying out synthesis processing on the corresponding shift mapping and the first face model so as to obtain the second face model.
9. The apparatus of claim 8, wherein the synthesis module further comprises:
and the obtaining sub-module is configured to obtain a synthesis weight corresponding to the shift mapping output by the face generation model, and the synthesizing sub-module is specifically configured to perform synthesis processing on the corresponding shift mapping and the first face model according to the synthesis weight to obtain the second face model.
10. The apparatus of claim 7, wherein the first determining module is specifically configured to:
and performing coordinate conversion on the static feature information to obtain texture mapping coordinates in a texture mapping coordinate space, and taking the texture mapping coordinates as the texture features of the first face model.
11. The apparatus of claim 10, wherein the second determining module is specifically configured to:
performing coordinate conversion on each candidate expression information to obtain a plurality of candidate expression coordinates corresponding to the texture map coordinate space;
performing coordinate conversion on the standard expression information to obtain a standard expression coordinate corresponding to the texture mapping coordinate space;
determining a variation characteristic of each candidate expression coordinate relative to the standard expression coordinate; and
and generating the change characteristic graph according to the change characteristics.
12. The apparatus according to claim 8, wherein the generation submodule is specifically configured to:
inputting the texture features and the change feature map into a pre-trained face generation model to obtain at least one predicted shift mapping output by the face generation model and corresponding to each candidate expression information;
determining a loss value between the predicted shift map and the calibrated shift map;
and if the loss value meets a reference loss threshold, taking the predicted shift map as the corresponding shift map.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-6.
CN202011400827.3A 2020-12-02 2020-12-02 Face model generation method and device, electronic equipment and storage medium Pending CN112562027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011400827.3A CN112562027A (en) 2020-12-02 2020-12-02 Face model generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011400827.3A CN112562027A (en) 2020-12-02 2020-12-02 Face model generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112562027A true CN112562027A (en) 2021-03-26

Family

ID=75047874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011400827.3A Pending CN112562027A (en) 2020-12-02 2020-12-02 Face model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112562027A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130215113A1 (en) * 2012-02-21 2013-08-22 Mixamo, Inc. Systems and methods for animating the faces of 3d characters using images of human faces
KR20130098824A (en) * 2012-02-28 2013-09-05 가톨릭대학교 산학협력단 3d facial pose and expression estimating method using aam and estimated depth information
CN110163054A (en) * 2018-08-03 2019-08-23 腾讯科技(深圳)有限公司 A kind of face three-dimensional image generating method and device
CN110163063A (en) * 2018-11-28 2019-08-23 腾讯数码(天津)有限公司 Expression processing method, device, computer readable storage medium and computer equipment
WO2020199693A1 (en) * 2019-03-29 2020-10-08 中国科学院深圳先进技术研究院 Large-pose face recognition method and apparatus, and device
CN111553284A (en) * 2020-04-29 2020-08-18 武汉大学 Face image processing method and device, computer equipment and storage medium
CN111652828A (en) * 2020-05-27 2020-09-11 北京百度网讯科技有限公司 Face image generation method, device, equipment and medium
CN111862277A (en) * 2020-07-22 2020-10-30 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating animation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHANG, Jian: "3D facial expression reconstruction from video streams fusing SFM and dynamic texture mapping", Journal of Computer-Aided Design & Computer Graphics, no. 06, 15 June 2010 (2010-06-15) *
YAN, Jie; GAO, Wen: "Specific face synthesis based on modification of a general face model", Journal of Computer-Aided Design & Computer Graphics, no. 05, 25 October 1999 (1999-10-25)
GAO, Xiang; HUANG, Faxiu; LIU, Chunping; CHEN, Hu: "Real-time facial expression transfer combining 3DMM and GAN", Computer Applications and Software, no. 04, 12 April 2020 (2020-04-12) *

Similar Documents

Publication Publication Date Title
US11587300B2 (en) Method and apparatus for generating three-dimensional virtual image, and storage medium
Li et al. Monocular real-time volumetric performance capture
CN113963087B (en) Image processing method, image processing model training method, device and storage medium
US11816915B2 (en) Human body three-dimensional key point detection method, model training method and related devices
CN111860167B (en) Face fusion model acquisition method, face fusion model acquisition device and storage medium
CN108701352A (en) Amending image using the identification based on three dimensional object model and enhancing
CN111832745B (en) Data augmentation method and device and electronic equipment
JP2016218999A (en) Method for training classifier to detect object represented in image of target environment
CN111739005B (en) Image detection method, device, electronic equipment and storage medium
CN112529073A (en) Model training method, attitude estimation method and apparatus, and electronic device
KR20210013150A (en) Lighting estimation
CN113643412A (en) Virtual image generation method and device, electronic equipment and storage medium
CN113658309B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN110458924B (en) Three-dimensional face model establishing method and device and electronic equipment
CN113177472A (en) Dynamic gesture recognition method, device, equipment and storage medium
CN112819971A (en) Method, device, equipment and medium for generating virtual image
CN114792359A (en) Rendering network training and virtual object rendering method, device, equipment and medium
WO2021263035A1 (en) Object recognition neural network for amodal center prediction
CN112581573A (en) Avatar driving method, apparatus, device, medium, and program product
CN111754431A (en) Image area replacement method, device, equipment and storage medium
CN114677572A (en) Object description parameter generation method and deep learning model training method
CN113379877A (en) Face video generation method and device, electronic equipment and storage medium
CN116092120B (en) Image-based action determining method and device, electronic equipment and storage medium
EP4086853A2 (en) Method and apparatus for generating object model, electronic device and storage medium
CN111833391A (en) Method and device for estimating image depth information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination