WO2022147736A1

WO2022147736A1 - Virtual image construction method and apparatus, device, and storage medium

Info

Publication number: WO2022147736A1
Application number: PCT/CN2021/070727
Authority: WO
Inventors: 谢新林
Original assignee: 广州视源电子科技股份有限公司
Priority date: 2021-01-07
Filing date: 2021-01-07
Publication date: 2022-07-14
Also published as: CN115335865A

Abstract

Embodiments of the present application relate to the technical field of image processing. Disclosed are a virtual image construction method and apparatus, a device, and a storage medium. The method comprises: obtaining current frame image data, the current frame image data comprising a face image of a target object; constructing a human face neutral expression base and a plurality of face personalized expression bases of the target object according to the current frame image data; constructing a three-dimensional face model of the target object according to the face neutral expression base and the plurality of face personalized expression bases; when the three-dimensional face model is mapped to the face image, determining pose parameters of the three-dimensional face model and weight coefficients of the face personalized expression bases; and sending the pose parameters and the weight coefficients to a remote device, so that the remote device generates, according to the pose parameters and the weight coefficients, a virtual image corresponding to the face image. By using the method, the technical problem in the related art of lagging caused by transmitting a real face image can be solved.

Description

Virtual image construction method, device, device and storage medium

technical field

The embodiments of the present application relate to the technical field of image processing, and in particular, to a method, apparatus, device, and storage medium for constructing a virtual image.

Background technique

With the development of network communication technology, users can enjoy network communication resources such as video calls, cloud classrooms, and cloud conferences without leaving home. At present, when using network communication technology for video communication, both parties to the call can see the current face image of the other party. In addition, in order to improve the video communication experience of both parties in the call, high-definition images will be transmitted, so that both parties in the call can clearly see the corresponding face images. For example, in a cloud classroom or cloud conference, the high-definition face image of the speaker on the presenter side will be sent to other devices, so that users of other devices can view the high-definition face image. This method has the following defects: the transmission of high-definition images has high requirements on the network bandwidth, and when the network bandwidth is limited, the phenomenon of freezing is likely to occur.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a virtual image construction method, apparatus, device, and storage medium, so as to solve the technical problem of the stuck phenomenon caused by the transmission of real face images in the related art.

In a first aspect, an embodiment of the present application provides a method for constructing a virtual image, including:

Acquiring current frame image data, the current frame image data comprising the face image of the target object;

Construct a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data;

Constructing a three-dimensional face model of the target object according to the neutral facial expression base and a plurality of individual facial expression bases;

determining the pose parameters of the three-dimensional human face model when the three-dimensional human face model is mapped to the human face image and the weight coefficients of each of the individualized expression bases of the human face;

Sending the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.

In a second aspect, an embodiment of the present application further provides a virtual image construction device, including:

an image acquisition module for acquiring current frame image data, where the current frame image data includes a face image of a target object;

an expression base building module for constructing a neutral facial expression base and a plurality of individualized facial expression bases of the target object according to the current frame image data;

a face model building module, used for constructing a three-dimensional face model of the target object according to the neutral facial expression base and a plurality of individual facial expression bases;

a parameter determination module, configured to determine the pose parameters of the three-dimensional human face model when the three-dimensional human face model is mapped to the human face image and the weight coefficients of the individualized expression bases of the human face;

A parameter sending module, configured to send the pose parameters and the weight coefficients to a remote device, so that the remote device generates images corresponding to the face according to the pose parameters and the weight coefficients virtual image.

In a third aspect, an embodiment of the present application further provides a virtual image construction device, including:

one or more processors;

memory for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the virtual image construction method as described in the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the virtual image construction method described in the first aspect.

The above-mentioned virtual image construction method, device, equipment and storage medium, by acquiring the current frame image data including the target object's face image, and constructing the target object's neutral facial expression base and a plurality of facial personalized expressions according to the current frame image data After that, a 3D face model is constructed according to the neutral face expression base and multiple face personalized expression bases. After that, the weight coefficient and pose parameters when the 3D face model is mapped to the face image are determined. The pose parameters and weight coefficients are sent to the remote device, so that the remote device can display the virtual image corresponding to the face image through the pose parameters and weight coefficients, which solves the problem caused by the transmission of real face images in the related art. Caton technical issues. Due to the transmission of the weight coefficients of the face personalized expression base and the pose parameters of the three-dimensional face model, the demand for network bandwidth is greatly reduced, and it is especially suitable for remote video communication scenarios. In addition, the transmitted weight coefficients and pose parameters can enable the remote device to display the corresponding virtual image, effectively protecting the privacy of the target object and preventing information leakage. At the same time, the virtual image accurately follows the expressions and poses in the face image. , to ensure the imaging quality of the remote device.

Description of drawings

1 is a flowchart of a method for constructing a virtual image provided by an embodiment of the present application;

2 is a schematic diagram of current frame image data provided by an embodiment of the present application;

3 is a schematic diagram of a virtual image provided by an embodiment of the present application;

4 is a flowchart of another virtual image construction method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a reference three-dimensional face model provided by an embodiment of the present application;

6 is a schematic diagram of a face image provided by an embodiment of the present application;

7 is a schematic diagram of a three-dimensional face model of a target object provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of key points of a human face provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of an expression refinement partition provided by an embodiment of the present application;

10 is a schematic diagram of expression transfer provided by an embodiment of the present application;

11 is a schematic diagram of selecting key points of a face provided by an embodiment of the present application;

12 is a flowchart of another virtual image construction method provided by an embodiment of the present application;

13 is a schematic diagram of a mutually exclusive expression base provided by an embodiment of the present application;

14 is a schematic diagram of another mutually exclusive expression base provided by an embodiment of the present application;

FIG. 15 is a schematic structural diagram of an apparatus for constructing a virtual image provided by an embodiment of the present application;

FIG. 16 is a schematic structural diagram of a virtual image construction device according to an embodiment of the present application.

Detailed ways

The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are used to explain the present application, but not to limit the present application. In addition, it should be noted that, for the convenience of description, the drawings only show some but not all the structures related to the present application.

The virtual image construction method provided by the embodiment of the present application may be executed by a virtual image construction device, and the virtual image construction device may be implemented by means of software and/or hardware, and the virtual image construction device may be composed of two or more physical entities, It can also be a physical entity. For example, the virtual image construction device may be a smart device such as a computer, a mobile phone, a tablet computer, or an interactive smart tablet.

In the embodiment, the virtual image construction device is applied in the scenario of video communication using network communication technology, such as online conferences and online classes. In this scenario, in addition to the virtual image construction device, it also includes other devices participating in video communication. The other devices can be one or more, and the other devices can also be smart devices such as computers, mobile phones, tablet computers, or interactive smart tablets. During video communication, the virtual image construction device executes the virtual image construction method provided in this embodiment, so as to process when collecting the face image of the local user, thereby enabling other devices to display the virtual image obtained based on the face image. At this time, the other devices are remote devices with respect to the virtual image construction device. It can be understood that in practical applications, the virtual image construction method provided in this embodiment can also be executed when the remote device collects the face image of the user. In this case, the remote device can also be considered as a virtual image construction device, while the local device is Used to display the corresponding virtual image. For example, in an online classroom scenario, the device used by the lecturer can be considered as a virtual image acquisition device, and the device used by the students can be considered as a remote device. For another example, in an online conference scenario, the device used by the current speaker may be considered as a virtual image construction device, and the devices used by other participants may be considered as remote devices.

In an embodiment, the virtual image construction device is installed with at least one type of operating system, wherein the operating system includes but is not limited to an Android system, an IOS system, and/or a Windows system. The virtual image construction device can install at least one application program based on the operating system, and the application program can be an application program that comes with the operating system, or it can be an application program downloaded from a third-party device or server. The embodiment of the application program is not limited. , it can be understood that the virtual image construction method provided by the embodiment of the present application may also be an application program itself. In the embodiment, the virtual image constructing device is installed with at least an application program for executing the virtual avatar constructing method provided by the embodiment of the present application, and the virtual avatar constructing method is executed when the application program runs.

FIG. 1 is a flowchart of a method for constructing a virtual image according to an embodiment of the present application. Referring to Figure 1, the virtual image construction method specifically includes:

Step 110: Acquire current frame image data, where the current frame image data includes a face image of the target object.

When performing video communication, the virtual image construction device may collect image data through an image collection device (eg, a camera) installed by itself. In the embodiment, the currently collected image data is recorded as the current frame image data. In the embodiment, the current frame image data includes the face image of the target object, wherein the target object refers to the object that needs to generate a virtual image, and any object that can be recognized as a face image can be considered as the target object, and does not need to be Specified in advance; for example, in an online classroom scenario, the target object can be a lecturer using a virtual image construction device, and the face image of the target object refers to the lecturer's face image. It can be understood that the number of target objects in the image data of the current frame is one or more. In the embodiment, the image data of the current frame includes one target object for exemplary description. In practical applications, when there are multiple target objects, the processing method of each target object is the same as the processing method of the current target object.

Optionally, after obtaining the image data of the current frame, first confirm whether the image data of the current frame contains the face image of the target object, if so, execute the subsequent steps, otherwise, stop executing the subsequent steps and obtain the image data of the next frame as the current frame image data to repeat this step. The embodiment of the technical means used to confirm whether the current frame image data contains a face image is not limited. For example, a face detection algorithm based on deep learning is used to detect the face image area in the current frame image data. face image area, it is determined that the image data of the current frame contains a face image; otherwise, it is determined that the image data of the current frame does not contain a face image.

Step 120 , construct a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data.

The expression base can be understood as a three-dimensional face model containing the position information of the key points of the face, and the expression of the person can be reflected by the position information of the key points of the face in the expression base. The key points of the face can be obtained from the key parts of the face. The key parts of the face include eyebrows, eyes, nose, mouth, and cheeks, etc. The key points of the face are located in the above key parts and are used to describe the key parts. The point of the current action. At this time, the action of each key part can be determined through the face key points, and then the face pose, face position and face expression can be determined, that is, each face key point is the semantic information of the face.

In one embodiment, facial expressions are divided into neutral expressions and personalized expressions. Neutral expression refers to the shape of the face without any expression, which can reflect the identity of the face. Among them, the identity of the face is a specific description of the shape of the face. For example, the identity of the face describes the key parts of the face. For example, In other words, the key parts described by face identity are big eyes, high nose bridge, and thin lips. At this time, since the faces of different objects are different, there will be differences between the key parts described by the face identities of different objects. Personalized expressions refer to expressions made by a human face, such as eyes closed, mouth open, frowning, etc. According to the facial expressions, the expression bases are divided into neutral expression bases and personalized facial expression bases. Among them, the neutral facial expression base can be understood as an expression base representing neutral expressions, and the shape of the human face in three-dimensional space can be confirmed through the neutral facial expression base. The face personalized expression base refers to an expression base containing personalized expressions, and each face personalized expression base corresponds to a personalized expression. Understandably, since the expressions of the human face are very rich, if you want to express all the expressions of the human face, you need to build a large number of personalized facial expression bases, which will greatly increase the amount of data processing. Therefore, in the embodiment, only the face personalized expression base of the basic expression is constructed, wherein the specific content of the basic expression can be set according to the actual situation, and various expressions of the human face can be obtained by combining the basic expression and the neutral expression . For example, the basic expressions for eyes include: left eye closed, left eye wide, right eye closed and right eye wide, at this time, various expressions of eyes can be obtained according to the above four basic expressions and neutral expressions, such as , the expression with slightly squinting eyes can be obtained by linear superposition of left eye closed, right eye closed and neutral expression.

In the embodiment, the neutral facial expression base and individual facial expression bases of the target object are constructed by using the facial image of the current frame image data. Among them, when constructing the neutral facial expression base of the target object, prior information can be introduced. The prior information is obtained by collecting a large amount of 3D face data, which can reflect the average coordinate data, face identity base vector and personalized expression base vector of a large amount of 3D face data, and a 3D face model can be constructed through the prior information , it can be understood that there are differences in the 3D face models obtained when different coefficients are set for the prior information. The 3D face model can be regarded as a reference 3D face model, that is, a 3D face model corresponding to the target object face image in the current frame image data can be obtained by adjusting the reference 3D face model. At this time, first obtain the coordinates of each face key point in the two-dimensional plane in the face image of the target object, and then refer to the three-dimensional key points of the three-dimensional face model (that is, refer to the face key points in the three-dimensional face model). ) into the two-dimensional plane to determine the coordinates of the three-dimensional key points in the two-dimensional plane, and then calculate the three-dimensional key points in the reference three-dimensional face model and the face key points in the face image in the two-dimensional plane. It is understandable that the three-dimensional key points that need to calculate the error have a corresponding relationship with the face key points, that is, the relative positions of each group of corresponding three-dimensional key points and face key points in the corresponding image are the same. For example, the three-dimensional key points and The key points of the face are the left boundary points of the eyes. After that, adjust the position of the 3D key points in the reference 3D face model according to the calculated error, so as to ensure that the 3D key points of the adjusted reference 3D face model are projected to the 2D plane with the face key points in the face image. The coordinates of the points should be as coincident as possible. The adjustment of the positions of the three-dimensional key points can be realized by adjusting the coefficients used by the prior information. When the coefficients are different, the positions of the three-dimensional key points in the reference three-dimensional face model constructed by the prior information are different. When the face key points overlap as much as possible, it can be considered that the adjusted reference 3D face model is close to the 3D face model of the target object. After that, obtain the final used coefficients, and remove the personalized expression base vector in the prior information, so as to construct a reference 3D person without personalized expressions by using the final used coefficients and the prior information after removing the personalized expression base vector Face model, at this time, the reference 3D face model can be considered as the neutral facial expression base of the target object. It should be noted that in practical applications, other methods can also be used to construct a neutral facial expression base. For example, by using a neural network, the corresponding facial image or key points in the facial image can be input into the neural network. Human face neutral expression base.

Afterwards, the neutral facial expression base is processed to obtain the personalized facial expression base of the target object. Among them, the prior information is also introduced when constructing the face personalized expression base. At this time, each basic expression corresponds to a prior information, the prior information represents the three-dimensional face model corresponding to the basic expression, and the neutral expression also corresponds to a prior information. The prior information represents a three-dimensional face model corresponding to a neutral expression, and the basic expression and the neutral expression in the a priori information above belong to the same face. Further, first calculate the transfer deformation variable between the prior information corresponding to the neutral expression and the prior information corresponding to the basic expression, that is, after transforming the prior information corresponding to the neutral expression by the transfer deformation variable, the basic expression can be obtained. The prior information corresponding to the expression. Afterwards, the neutral facial expression base is converted according to the transfer deformation variable, so as to obtain the personalized facial expression base of the target object under the basic expression. It can be understood that, according to the above method, the personalized facial expression base of the target object under each basic expression can be obtained.

Step 130 , constructing a three-dimensional face model of the target object according to the neutral facial expression base and multiple personalized facial expression bases.

Exemplarily, a weight coefficient is set for each individual face expression base, and then linear weighting is performed on each face individual face expression base and the face neutral expression base in combination with the weight coefficient, so that a target object with an expression can be obtained. 3D face model. At this point, the 3D face model is expressed as:

Among them, B represents a three-dimensional face model, B ₀ represents the neutral facial expression base, B _i represents the ith face personalized expression base, 1≤i≤n, n is the total number of face personalized expression bases, β _i represents the weight coefficient corresponding to B _i .

It can be understood that since the specific expression of the face image in the current frame of image data cannot be clearly defined, it is impossible to set an accurate weight coefficient for each face personalized expression base. Then, when building a three-dimensional face model, it can be used for each person. The face personalized expression base sets an initial weight coefficient. It is understandable that the expressions represented by the currently constructed three-dimensional face model are different from the expressions of the face images.

Step 140: Determine the pose parameters of the three-dimensional face model when the three-dimensional face model is mapped to the face image and the weight coefficients of the individualized expression bases of each face.

It can be understood that when the expression represented by the three-dimensional face model is closer to the real expression of the face image, the difference between the three-dimensional face model and the face image is smaller when the three-dimensional face model is mapped to the two-dimensional plane. Therefore, in the embodiment, the weight coefficients of the individualized expression bases of each face in the three-dimensional human face model can be continuously adjusted, so that the expressions represented by the three-dimensional human face model are close to the expressions of the human face image. At the same time, since the head of the target object will have certain movements, such as tilting the head, turning the head, etc., therefore, in the embodiment, it is also necessary to adjust the pose parameters of the three-dimensional face model, so that the adjusted three-dimensional face model moves Consistent with the head motion of the target object in the face image. Among them, the pose parameters can also be understood as rigid transformation parameters. In the embodiment, the rigid transformation refers to changing the position, orientation and size of the three-dimensional face model without changing the shape. The rigid transformation parameter refers to a parameter used when performing rigid transformation on the three-dimensional face model. In the embodiment, the rigid transformation parameter includes: a rigid rotation matrix, a translation vector, and a scaling factor. The rigid rotation matrix is used to change the orientation of the 3D face model, the translation vector is used to change the position of the 3D face model, and the scaling factor is used to change the size of the 3D face model.

In the embodiment, the difference between the two-dimensional image and the face image when the three-dimensional face model is mapped to the two-dimensional plane is determined by constructing an error parameter formula. The smaller the error parameter is, the smaller the difference is, the closer the expression represented by the 3D face model is to the real expression of the face image, and the more consistent the action of the 3D face model is with the head action of the target object in the face image. In an optional way, the error parameter formula is constructed by the coordinate difference between the two-dimensional image when the three-dimensional face model is mapped to the two-dimensional plane and the key points of the face in the face image. It is understandable that the coordinates of the key points of the face corresponding to the 3D face model can be determined by the weight coefficients and pose parameters. Therefore, when constructing the error parameter formula, the weight coefficients and pose parameters can be considered as unknown quantities. Constantly adjust the weight coefficients and pose parameters, so that the coordinates of the face key points corresponding to the 3D face model and the face key point coordinates in the face image are getting closer and closer, so that the error parameters are getting smaller and smaller, and the 3D face The expression represented by the model is more and more the same as the real expression of the face image, and the action of the 3D face model is more and more consistent with the pose parameters of the head action of the target object in the face image. Specifically, when the calculated error parameter has reached the desired value, the currently used weight coefficient and pose parameter can be used as the finally obtained weight coefficient and pose parameter. Among them, it can be determined whether the error parameter reaches the desired value by setting the adjustment times of the weight coefficient and the pose parameter, that is, when the adjustment times reaches a certain number of times, it is determined that the error parameter reaches the desired value. It is also possible to determine whether the error parameter reaches the expected value by setting the parameter threshold, that is, when the error parameter is lower than the parameter threshold, it is determined that the error parameter reaches the expected value. Generally speaking, if the error parameter has reached the expected value, it can be considered that the expression represented by the 3D face model is sufficiently the same as the real expression of the face image, and the action of the 3D face model is the same as the head of the target object in the face image. The pose parameters of the actions are sufficiently consistent.

Step 150: Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.

Exemplarily, the pose parameters and weight coefficients are sent to a remote device for video communication with the virtual image construction device. A virtual image is stored in the remote device, and the virtual image may be a cartoon image, which may be a two-dimensional virtual image or a three-dimensional virtual image. In the embodiment, a three-dimensional virtual image is used as an example. The storage of the three-dimensional virtual image in the remote device specifically includes storing a neutral expression base and a personalized expression base of the three-dimensional virtual image, wherein each personalized expression base of the three-dimensional virtual image has the same characteristics as the corresponding personalized facial expression base. expression. In one embodiment, the user of the remote device can install an application program in the remote device, and the application program can receive the pose parameters and weight coefficients sent by the virtual image construction device, and generate the pose parameters and weight coefficients according to the pose parameters and the weight coefficients. A virtual image corresponding to the face image, and the remote device stores the virtual image when the application is installed. When the application program is upgraded or updated, the virtual image stored in the remote device can be updated at the same time.

In one embodiment, when the remote device generates a virtual image corresponding to a face image, it can render and display a preset three-dimensional virtual image through a graphics rendering framework of an open source graphics library (Open Graphics Library, OpenGL). Wherein, during rendering, the individualized expression base and neutral expression base of the three-dimensional virtual image are linearly weighted according to the weight coefficient, so as to obtain a three-dimensional virtual image containing expressions, wherein the linear weighting method is the same as that of constructing the three-dimensional face model in step 130. The same way of linear weighting. After generating the three-dimensional virtual image containing the expression, the graphics rendering framework performs corresponding rigid transformation on the three-dimensional virtual image containing the expression according to the pose parameters, and displays it after the rigid transformation is completed. At this time, the displayed three-dimensional virtual image not only has the same expression as the human face image, but also has the same pose as the human face image. For example, FIG. 2 is a schematic diagram of current frame image data provided by an embodiment of the present application. FIG. 3 is a schematic diagram of a virtual image provided by an embodiment of the present application. After the current frame image data shown in FIG. 2 is processed by the above method, the remote device can display the virtual image schematic diagram shown in FIG. 3 .

It can be understood that after sending the weight coefficient and pose parameters to the remote device, the next frame of image data can be obtained, and the next frame of image data can be used as the current frame of image data, and the above process is repeated to make the remote device display continuous display. virtual image.

Optionally, the virtual image construction device can also be set with a function control for enabling the virtual image, and the function control can be realized by physical physical buttons or virtual buttons. When it is detected that the function control is triggered, the above method is executed to enable the remote control. The terminal device displays the virtual image. When it is detected that the function control is stopped being triggered, it only needs to send the current frame image data to the remote device, so that the remote device can display the current frame image. In this way, the user can determine whether to display the real image based on his own needs, thereby improving the user's experience.

Above, by acquiring the current frame image data containing the target object's face image, and constructing the target object's face neutral expression base and a plurality of face personalized expression bases according to the current frame image data, and then, according to the face neutral expression base. Build a 3D face model with multiple face personalized expression bases, and then determine the weight coefficients and pose parameters when the 3D face model is mapped to the face image, and then send the pose parameters and weight coefficients to the remote device. The technical means of enabling the remote device to display the virtual image corresponding to the face image through pose parameters and weight coefficients solves the technical problems of information leakage and jamming caused by the transmission of the real face image in the related art. Due to the transmission of the weight coefficients of the face personalized expression base and the pose parameters of the three-dimensional face model, the demand for network bandwidth is greatly reduced, and it is especially suitable for remote video communication scenarios. In addition, the transmitted weight coefficients and pose parameters can enable the remote device to display the corresponding virtual image, effectively protecting the privacy of the target object and preventing information leakage. At the same time, the virtual image accurately follows the expressions and poses in the face image. , to ensure the imaging quality of the remote device.

FIG. 4 is a flowchart of another virtual image construction method provided by an embodiment of the present application. This embodiment is embodied on the basis of the above-mentioned embodiment. Referring to Figure 4, the virtual image construction method specifically includes:

Step 210: Acquire current frame image data, where the current frame image data includes a face image of the target object.

Step 220: Construct a neutral facial expression base of the target object according to the current frame image data and the preset a priori information of the facial model.

In the embodiment, the prior information of the face model refers to the prior information used when constructing the neutral facial expression base of the target object, and the reference three-dimensional face model can be constructed through the prior information of the face model, and the reference three-dimensional face model In this embodiment, it is a neutral expression. In one embodiment, the face neutral expression base of the target object is obtained by fitting the face model parameters through the face image. For example, FIG. 5 is a schematic diagram of a reference three-dimensional face model provided by an embodiment of the present application, FIG. 6 is a schematic diagram of a face image provided by an embodiment of the present application, and FIG. 7 is a schematic diagram of a three-dimensional face model of a target object provided by an embodiment of the present application. , the reference 3D face model shown in Fig. 5 is constructed through the prior information of the face model. After that, the reference 3D face model shown in Fig. 5 is fitted with the face image shown in Fig. 6, and the target shown in Fig. 7 can be obtained. The three-dimensional face model of the object can be understood. When the three-dimensional face model shown in FIG. 7 has no expression, it can be used as a neutral expression base of the face. It should be noted that FIG. 7 shows a side view of the three-dimensional face model.

Exemplarily, the prior information of the face model can be constructed based on the three-dimensional face data in the published BFM (Basel Face Mode) database, and each three-dimensional face data can be considered as a three-dimensional face model. The embodiment of the expression of the model is not limited. In one embodiment, Principal Component Analysis (PCA) is used to extract 200 three-dimensional face data in the BFM database to obtain a bilinear model, wherein the bilinear model is constructed based on the 200 three-dimensional face data. Referring to the 3D face model, the specific model expression formula is:

M=MU+PC _id ·α _id +PC _exp ·α _exp

Among them, M is the reference 3D face model, MU is the average coordinate data of 200 3D face data, and MU has a total of 3h data, where h refers to the average number of point clouds of 200 3D face data. Each point cloud contains the coordinates of the three axes of x, y, and z. A three-dimensional face can be constructed through MU. PC _id is the face identity base vector obtained from 200 three-dimensional face data, which can be reflected as MU through PC _id The superimposed face identity, that is, the face identity of the reference 3D face model can be obtained by superimposing the PC _id for the MU (for example, the neutral expression face feature when there is no expression), and the PC _exp is obtained by 200 3D face data. Personalized expression base vector, which can be expressed as the personalized expression superimposed by MU through PC _exp , that is, the personalized expression of the reference 3D face model can be obtained by superimposing PC _exp for MU, α _id is the coefficient corresponding to the face identity base vector, α _exp is the coefficient corresponding to the personalized expression base vector. That is, the PC _id and PC _exp are linearly weighted by α _id and α _exp , and the weighted results are fused into the average coordinate data of the 3D face data, so that the reference 3D face model can be obtained. It can be understood that MU, PC _id , PC _exp is the prior information of the face model.

After the reference 3D face model is constructed, the reference 3D face model can be mapped to the 2D plane to determine the difference between the 2D image in the 2D plane and the face image, and then adjust the face model priors according to the difference. The information uses α _id and α _exp so that the adjusted reference 3D face model is mapped to the 2D plane and is highly similar to or the same as the face image.

In one embodiment, when determining the difference between the two-dimensional image corresponding to the reference three-dimensional face model and the face image, it is specifically determined by the key points of the face. At this time, step 220 includes steps 221-223:

Step 221: Detect the face image in the current frame data image.

In one embodiment, a face recognition algorithm is used to detect the location area where the human face is located in the current frame data image, and then the location area where the human face is located is extracted to obtain the human face image.

Step 222: Perform facial key point positioning on the face image to obtain a key point coordinate array.

Using the face key point detection technology, the face key points are detected in the face image, and the coordinates of the detected face key points are obtained, and the coordinates of each face key point are formed into a key point coordinate array.

In the embodiment, taking 68 face key points as an example, FIG. 8 is a schematic diagram of face key points provided by the embodiment of the present application. Referring to FIG. 8 , a total of 68 face key points are detected in the current face image, and each face key point has corresponding face semantic information. After that, the coordinates of the 68 face key points in the face image are arranged in a certain order to form a key point coordinate array. At this time, the key point coordinate array can be expressed as: Landmarks={x ₁ , y ₁ , x ₂ , y ₂ ,...,x ₆₈ ,y ₆₈ }, where (x ₁ , y ₁ ) are the coordinates of the first face key point, and so on. It is understandable that the arrangement order of the key points of the face can be set according to the actual situation, and the embodiment is not limited.

Exemplarily, in the process of video communication, there is a certain correlation between the image data of two adjacent frames, and the above key point detection process is obtained based on one frame of image data. The coordinates of the face key points with the same face semantic information have a large difference, which will affect the later calculation and cause the final generated virtual image to have a jitter problem. In order to prevent the above problem, in the embodiment, after this step, the method further includes: performing a filtering operation and a smoothing operation on the key point coordinate array.

The filtering operation refers to adjusting the key point coordinate array of the current frame in combination with the key point coordinate array in the image data of the previous frame, so as to ensure that the key point coordinate array of the previous frame and the key point coordinate array of the current frame are smoothly gradient. , so that the coordinate arrays of key points of each frame in the video communication process are all smoothly gradient. In one embodiment, the filtering operation is implemented by means of Kalman filtering. Among them, when Kalman filtering is performed on the key point coordinate data of the current frame, the key point coordinate array of the current frame and the key point coordinate array of the previous frame are weighted to update the weighted result to the key point coordinate data of the current frame. .

The smoothing operation is used to avoid the situation that some face key points are outliers, so that the coordinate curve between adjacent face key points is smooth. In one embodiment, the PCA algorithm is used to perform a smoothing operation on the filtered key point coordinate array to update the key point coordinate array.

It is understandable that the key point coordinate array used subsequently is the key point coordinate array after filtering and smoothing operations.

Step 223: Determine the neutral facial expression base of the target object according to the facial image, the coordinate array of key points and the preset prior information of the facial model.

Exemplarily, construct the energy constraint formula between the reference three-dimensional face model and the face image obtained according to the prior information of the face model, which is specifically:

Among them, E _lan (p) represents the energy constraint between the reference 3D face model and the face image, p represents the parameters used by the reference 3D face model, p includes the coefficient α _id corresponding to the face identity base vector, the personalized expression The coefficient α _exp corresponding to the basis vector, the weak perspective projection matrix ∏ and the rigid transformation matrix φ, where the weak perspective projection is mainly used to project the 3D space point information (such as the reference 3D face model) to the 2D imaging plane. The projection matrix refers to the matrix used when projecting the reference 3D face model to the 2D plane, and the rigid transformation matrix may include rigid rotation matrix, translation vector, and scale factor. ω _conf,j represents the confidence of the detection of the jth face key point in the face image, f _j represents the coordinates of the jth face key point in the face image, F represents the key point coordinate array, v _j represents the reference 3D face The coordinates of the jth 3D keypoint when the model is mapped to the 2D plane. It can be understood that the arrangement order of the face key points when the reference 3D face model is mapped to the 2D plane is the same as the arrangement order of the face key points in the face image. In one embodiment, when the reference three-dimensional face model is mapped to the two-dimensional plane, the more similar the coordinates of each three-dimensional key point and the corresponding face key point in the face image, the smaller the E _lan (p), the more the reference three-dimensional face model. A 3D face model close to the target object. Therefore, in the embodiment, by continuously adjusting the parameters of the reference three-dimensional face model, E _lan (p) becomes smaller and smaller (that is, the projection error between the three-dimensional key points and the face key points becomes smaller and smaller), when E _lan (p ) is stabilized, determine the specific value of p when E _lan (p) is the smallest, at this time the projection error of the three-dimensional key point and the face key point is the smallest. When E _lan (p) is the smallest, remove α _exp , ∏ and φ, take the specific value of α _id , and substitute it into M=MU+PC _id ·α _id to obtain the final reference three-dimensional face model, at this time, get The reference 3D face model is closest to the 3D face model when the target object has no expression. Therefore, the obtained reference 3D face model can be determined as the neutral facial expression base of the target object. Therefore, in this embodiment, the face identity information is reconstructed only from a single frame of face image, so that the neutral expression information corresponding to the face image is reconstructed each time to prepare for the subsequent construction of the personalized expression base.

Step 230, according to the neutral facial expression base of the human face and the preset reference neutral expression base and each reference individualized expression base, determine each face individualized expression base of the target object, and each reference individualized expression base corresponds to a human face. Personalized expression base.

The reference neutral expression base is a preset expression base representing neutral expressions. The reference personalized expression base is an expression base obtained by adding a preset basic expression on the basis of the reference neutral expression base. Each reference personalized expression base has a corresponding physical meaning. In the embodiment, the facial action coding system (Facial Action Coding System, FACS) is used to define each facial muscle action as a different action unit AU value or AD value, that is, to classify each basic expression by muscle action. For example, the AU value corresponding to "the inner eyebrow is raised upward" is recorded as AU1. In addition, each AU value also includes a refinement value, which is used to indicate the movement range of the muscle. For example, if the AU value including the refinement value is AU1 (0.2), it means that the current basic expression is that the inner eyebrow is pulled up. , and the pull degree is 0.2. For another example, the AU value corresponding to "eyes closed" is denoted as AU43, then AU43(0) indicates that the eyes are normally opened, and AU43(1) indicates that the eyes are completely closed. Schematic diagram of the partition. Referring to FIG. 9 , from left to right are the refinement values corresponding to the closing degrees of each eye during the process from fully opening to fully closing the eyes. In the embodiment, 26 basic expressions are defined according to muscle movements, and each basic expression corresponds to a reference personalized expression base. At this time, each reference personalized expression base, corresponding basic expression and AU value are shown in the following table:

BlendshapeBlendshape	自定义表情custom emoji	FACS定义Definition of FACS	BlendshapeBlendshape	自定义表情custom emoji	FACS定义Definition of FACS
00	左眼闭left eye closed	AU43AU43	1313	右嘴角上扬right corner of mouth up	AU12AU12
11	右眼闭right eye closed	AU43AU43	1414	左嘴角外展Left mouth corner abduction	AU20AU20
22	左眼瞪大left eye widened	AU5AU5	1515	右嘴角外展right mouth corner	AU20AU20
33	右眼瞪大right eye wide	AU5AU5	1616	上嘴唇内收upper lip adducted	AU28AU28
44	皱左眉frown	AU4AU4	1717	下嘴唇内收Adduction of lower lip	AU28AU28
55	皱右眉frown	AU4AU4	1818	下嘴唇向外lower lip outward	AD29AD29
66	挑眉头raised eyebrows	AU1AU1	1919	上嘴唇向上upper lip up	AU10AU10
77	挑左眉尾Pick left eyebrow	AU2AU2	2020	下嘴唇向下lower lip down	AU16AU16
88	挑右眉尾Pick right eyebrow	AU2AU2	21twenty one	左嘴角向下left corner of mouth down	AU17AU17
99	张嘴open mouth	AU26AU26	22twenty two	右嘴角向下right corner of mouth down	AU17AU17
1010	下巴左移Chin left	AD30AD30	23twenty three	嘟嘴pouting	AU18AU18
1111	下巴右移Chin right	AD30AD30	24twenty four	脸颊鼓起cheeks bulge	AD34AD34
1212	左嘴角上扬Left corner of mouth up	AU12AU12	2525	皱鼻wrinkled nose	AU9AU9

Table 1

Among them, Blendshape represents the personalized expression base, 0-25 is the number of 26 personalized expression bases, the custom expression is the basic expression corresponding to each expression base, and the FACS definition represents the AU value or AD corresponding to each personalized expression base value. It can be seen from the above-mentioned custom expressions that each personalized expression base mainly divides some left and right symmetrical expressions, so that when the target object is an asymmetric expression, the corresponding personalized expression base of the face can be constructed accurately. According to the above table, after adding 26 expressions to the reference neutral expression base, 26 reference personalized expression bases can be obtained.

Exemplarily, according to the reference neutral expression base and the reference individual expression base, the deformation information required when the reference neutral expression base is transformed to the reference individual expression base can be determined, and the deformation information can also be regarded as the reference neutral expression base. The pass-through variable passed to the reference personalized expression base. Afterwards, using the deformation information to process the neutral expression base of the human face, the personalized expression base of the human face can be obtained.

In one embodiment, the deformation information is obtained by means of three-dimensional mesh deformation. In this case, step 230 includes steps 231-232:

Step 231: Determine deformation information according to the reference neutral expression base and the reference personalized expression base.

In one embodiment, the reference neutral expression base can be divided into a plurality of triangular patches after triangulating the face key points in the reference neutral expression base according to the arrangement order by using the Delaunay triangulation algorithm. The three vertices of the patch are three face key points that form a triangle, and each triangular patch can form a three-dimensional mesh representing the reference neutral expression base. Similarly, using the Delaunay triangulation algorithm to triangulate the face key points in the reference personalized expression base in order of arrangement, the reference personalized expression base can be divided into multiple triangular patches, each triangular patch The three vertices of are the three face key points that form a triangle, and each triangular facet can form a three-dimensional mesh representing the reference personalized expression base.

Each triangular face in the reference personalized expression base is in one-to-one correspondence with each triangular face in the reference neutral expression base. According to the corresponding relationship, it can be determined that the deformation of the triangular face in the reference neutral expression base corresponds to the reference individual expression base. The deformation information of the triangular patch. Among them, the deformation information represents the transfer deformation variables (rotation matrix, translation vector, scaling factor, etc.) used in the deformation of the triangular face in the reference neutral expression base, so that the deformed triangular face corresponds to the reference personalized expression base. Triangular patches are the same. Each triangular patch corresponds to a deformation information. The deformation information of each triangular facet constitutes the deformation information from the reference neutral expression base to the current reference individual expression base. Understandably, each reference personalized expression base has corresponding deformation information.

Step 232 , determining the personalized facial expression base of the target object according to the deformation information and the neutral facial expression base.

Exemplarily, three-dimensional network registration is performed on the neutral expression base of the face and the reference neutral expression base. Three-dimensional space transformation (such as scaling, rotation, and translation) is performed on each triangular face in the neutral expression base, so that the transformed triangular facets are in one-to-one correspondence with each triangular face in the neutral face expression base. At this time, the reference neutral expression base after three-dimensional space transformation can be called the deformed reference neutral expression base. The three-dimensional coordinates in the three-dimensional space of each triangular patch in the deformed reference neutral expression base and the corresponding triangular face in the human face neutral expression base are highly similar or identical.

Optionally, in order to better match the deformed reference neutral expression base with the face neutral expression base, in the embodiment, smooth constraints and key point constraints are used for each face in the deformed reference neutral expression base. The three-dimensional coordinates of the key points are processed. Among them, 3D smoothing can be used for smoothing constraints, and PCA algorithm is used for key point constraints.

The deformed reference neutral expression base is the same as the human face neutral expression base. At this time, the triangular patches in the deformed reference neutral expression base and the face neutral expression base can be determined through the k-d tree. The corresponding relationship between the triangular facets in the reference neutral expression base and the triangular facets in the neutral face expression base is determined. In an embodiment, a k-d tree can be understood as a data structure that organizes points in a k-dimensional Euclidean space.

Then, according to the corresponding relationship between each triangular facet in the reference neutral expression base and each triangular facet in the human face neutral expression base, when the reference neutral expression base is used to transform to a certain reference personalized expression base, The deformation information processes the corresponding triangular faces in the neutral expression base of the face, that is, deforms the triangular faces, so that the deformed triangular faces are used as the triangular faces in the face personalized expression base. According to this method , after processing each triangular face in the neutral facial expression base, the individualized facial expression base corresponding to the reference individualized expression base can be obtained. After processing in the above manner, each reference personalized expression base has a corresponding face personalized expression base.

At this time, the above processing process can be calculated by the deformation formula, wherein the deformation formula of a triangular patch is expressed as:

in,

Represents the vertex-related information of the triangular facets in the face personalized expression base, V _T represents the vertex-related information of the corresponding triangular facets in the face neutral expression base,

is the inverse matrix of V _T ,

represents the vertex-related information of the corresponding triangular facets in the reference personalized expression base, V _S represents the vertex-related information of the corresponding triangular facets in the reference neutral expression base,

is the inverse matrix of V _S ,

and

is the three-dimensional coordinates of the three face key points of the triangular face in the face personalized expression base,

is the normal vector of the triangular face, similarly, V _T =[v _T2 -v _T1 v _T3 -v _T1 v _T4 -v _T1 ], v _T1 , v _T2 and v _T3 are the corresponding triangular faces in the neutral expression base of the face The three-dimensional coordinates of the three face key points of the patch, v _T4 is the normal vector of the triangular patch,

and

In order to refer to the three-dimensional coordinates of the three face key points corresponding to the triangular face in the personalized expression base,

is the normal vector of the triangular patch, V _S =[v _S2 -v _S1 v _S3 -v _S1 v _S4 -v _S1 ], v _S1 , v _S2 and v _S3 are the three corresponding triangular patches in the reference neutral expression base The three-dimensional coordinates of the face key points, v _S4 is the normal vector of the triangular patch.

contains the V _S deformation to

The deformation information of , can be obtained by deforming V _T through this deformation information

After the triangular facets in the neutral facial expression base are deformed according to the above formula, the corresponding personalized facial expression base can be obtained.

For example, FIG. 10 is a schematic diagram of expression transfer provided by an embodiment of the present application. Referring to Figure 10, the first column in the first row is a reference neutral expression base, and the second to fourth columns in the first row are three reference personalized expression bases, and the corresponding basic expressions are closed right eye, open mouth and pouting. The first column in the second row is the neutral facial expression base, and the deformation information is determined according to the reference neutral expression base and each reference personalized expression base. Face personalized expression base, the second row, second row to fourth column in Figure 10 are the face personalized expression base obtained according to the reference personalized expression base of the first row, second row to fourth column, at this time, The basic expressions corresponding to the three face personalized expression bases are closing the right eye, opening the mouth, and pouting, namely, the basic expressions are transferred from the reference personalized expression base to the face personalized expression base.

Step 240 , constructing a three-dimensional face model of the target object according to the neutral facial expression base and multiple personalized facial expression bases.

Step 250 , constructing an error parameter formula when the three-dimensional face model is mapped to the face image.

The error parameter formula can also be understood as an energy function. Exemplarily, the construction rule of the error parameter formula can be set according to the actual situation. In one embodiment, the error parameter formula is constructed by minimizing the residual error. In this case, the error parameter formula is:

Among them, E represents the error parameter, B represents the three-dimensional face model,

B ₀ represents the neutral facial expression base of the target object, B _i represents the ith face personalized expression base of the target object, 1≤i≤n, n is the total number of face personalized expression bases, β _i represents B _i Corresponding weight coefficient, B ^k represents the kth face key point in the three-dimensional face model, 1≤k≤M, M is the total number of face key points (M=68 in the above embodiment), f ^k represents the face image In the kth face key point, s represents the scaling factor when the 3D face model is mapped to the face image, R represents the rigid rotation matrix when the 3D face model is mapped to the face image, and t represents the 3D face model mapped to the face image. The translation vector of the face image, s, R, and t are the pose parameters, and ||*|| represents the norm of *. It can be understood that when the 3D face model is mapped to the 2D plane, the difference between the kth face key point in the 3D face model and the kth face key point in the face image can be reflected by sR·B ^k +tf ^k , and then obtain the error parameter according to the difference. In the above formula, β _i , s, R and t are unknowns. It should be noted that the face key points of the three-dimensional face model and the face key points of the face image have the same arrangement order.

In another embodiment, the error parameter formula is constructed by means of the linear least squares method, that is, the above-mentioned minimized residuals are converted into the form of solving by the linear least squares method. Among them, the least squares method (also known as the least squares method) is a mathematical optimization technique, and the unknown data can be easily obtained by using the least squares method (in the embodiment, β _i , s, R and t are unknown data), And make the sum of squares of errors between the obtained data and the actual data to be the smallest. At this time, when the minimization residual is converted into the form of linear least squares solution, the error parameter formula can be expressed as:

min E' _exp =min||Aβ-b|| ²

Among them, A=sR·ΔB, b=ft-sR·B ₀ .

Among them, E' _exp represents the error parameter, ΔB=[B ₁ -B ₀ B ₂ -B ₀ ... B _n -B ₀ ], B ₀ represents the neutral facial expression base, and B _i represents the ith person's personalized facial expression Base, 1≤i≤n, n represents the total number of face personalized expression bases, β represents the weight coefficient vector, β=(β ₁ β ₂ … β _n ), β _i represents the i-th face personalized expression base Weight coefficient, s represents the scaling factor when the 3D face model is mapped to the face image, R represents the rigid rotation matrix of the face when the 3D face model is mapped to the face image, t represents the 3D face model is mapped to the face image The translation vector when , f represents the face key points in the face image, f=(f ¹ ... f ^M ), and M is the total number of face key points. From the above formula, it can be seen that Aβ can reflect the difference between the personalized facial expression base and the neutral facial expression base in the two-dimensional plane, and b can reflect the difference between the neutral facial expression base and the face image in the two-dimensional plane. It can be understood that the closer the 3D face model is to the face image, the smaller the difference between Aβ and b. When using the above error parameter formula to solve β, the error parameter formula is a linear equation system, and at this time, the solved β is (A ^T A) ^-1 ·A ^T b.

In yet another embodiment, the error parameter formula when β is solved by the linear least squares method in the previous embodiment is a linear equation system, and the solved β is (A ^T A) ⁻¹ ·A ^T b. At this time, the value range of β includes both positive and negative numbers, and negative numbers are meaningless for the 3D face model, that is, the weight coefficient cannot be negative. At the same time, each time β is solved, it is solved by the difference value of the key points of the face when the 3D face model is mapped. If the detection process of the key points of the face in the face image is wrong, it will affect the accuracy of the calculation results. , for example, the mouth in the face image is closed, but due to the detection error of the key points of the face, there is a certain distance between the key points of the face in the upper lip and the lower lip (the two key points should be extremely coincident or completely coincident), so that the possibility of recognizing the mouth as open will appear in the subsequent calculation process. Therefore, in the embodiment, quadratic programming and dynamic constraints are performed on β to avoid the above problems. At this time, the dynamic equation of β can be expressed as Cβ≤d, that is, the error parameter formula is:

min E' _exp =min||Aβ-b|| ²

Among them, Cβ≤d.

Among them, C represents the constraint parameter of β, and d represents the value range of β. where C and d are constraints on β, where,

Among them, eye represents the unit matrix, and eye(n) represents the unit matrix corresponding to the personalized expression base of n faces. The specific value of d can be set according to the actual situation. For example, β should be in the range of 0.5-1, so d can be set between 0.5 and 1. In an alternative,

Among them, ones(n) represents the upper bound of n weight coefficients, which contains n values, each value corresponds to a weight coefficient, and zero(n) represents the lower bound of n weight coefficients, which contains n values , each value corresponds to a weight coefficient. Generally speaking, the weight coefficient should be between 0-1, therefore, ones(n) can be n 1s, and zero(n) can be n 0s. which is

Among them, both 1 and 0 are n. Through the above constraints, the value range of the weight coefficient can be fixed between 0-1 to prevent the occurrence of negative numbers. In another alternative,

Among them, ones(n) represents the upper bound of the n weight coefficients, zero(n) represents the lower bound of the n weight coefficients, p _n and q _n are the value constraint matrices, p _n and q _n are based on the face The relative distance of the face key points in the image is determined. The relative distance refers to the pixel distance of the face key points in the face image. The face key points used to calculate a relative distance are located in the same key part. Under different expressions, the pixel distance between the face key points in the key parts. Differences may exist, therefore, _pn and _qn are determined by relative distances in the examples. Wherein, all face key points or some face key points in key parts may be used for calculating the relative distance. It is understandable that each face personalized expression base corresponds to a p value and a q value, n p values form p _n , and n q values form q _n . Through the above formula, the weight coefficients corresponding to different facial personalized expression bases can have different value ranges. For example, FIG. 11 is a schematic diagram of face key point selection according to an embodiment of the present application. Referring to Figure 11, in the face image, there are 6 face key points corresponding to the left eye, among which the face key point P1 and the face key point P2 are a group of faces located in the upper eyelid and the lower eyelid in the left eye respectively. Key points, face key point P3 and face key point P4 are a group of face key points located in the upper eyelid and lower eyelid in the left eye respectively. When the left eye is closed, the face key point P1 and the face key point P2 are close to each other. or overlap, the face key point P3 and the face key point P4 are close to or overlap. Therefore, by calculating the distance between the face key point P1 and the face key point P2 and the distance between the face key point P3 and the face key point P4, it can be determined whether the left eye is closed. At this time, the distance used to determine whether the left eye is closed can be regarded as the relative distance of the face key points of the left eye, and the relative distance of the face key points of the left eye is

L represents the relative distance of the face key point, it can be understood that L is the pixel distance, p ₁ represents the two-dimensional coordinate of the face key point P1 in the face image, and p ₂ represents the second position of the face key point P2 in the face image. Dimensional coordinates, p ₃ represents the two-dimensional coordinates of the face key point P3 in the face image, and p ₄ represents the two-dimensional coordinates of the face key point P4 in the face image. When the left eye is closed, even if there is a problem with face key point detection, the value of L will be relatively small, such as L≤5, at this time, 5 can be understood as the allowable error distance, that is, when there is a problem with face key point detection , as long as the relative distance of the key points of the face does not exceed the error distance, the current action of the corresponding key part can be determined, and then a reasonable value range can be set for the weight coefficient of the face personalized expression base corresponding to the action. For example, after calculating the distance of L, if L≤5, it means that the probability of the left eye closed in the face image is high, then the weight coefficient corresponding to the face personalized expression base indicating that the left eye is closed should be larger, so , you can set a larger value range for the weight coefficient, such as setting a value range of 0.9-1, at this time, the p value corresponding to the left eye closed in p _n can be 1, and the left eye closed in q _n The corresponding q value can be set to 0.9, so that the value range of the weight coefficient of the face personalized expression base representing the left eye closed in β is between 0.9 and 1. In the same way, referring to Figure 11, by calculating the relative distance of the face key points of the three groups of face key points of the mouth (the face key points in the box), it can be determined whether the mouth is closed, and then the corresponding person whose mouth is open can be determined. Set a reasonable p value and q value for the weight coefficient of the face personalized expression base. For example, if the relative distance of the key points on the face does not exceed the error distance (for example, L≤3), the mouth is considered to be closed, and the p value is set to 0.1, and the q value is set. is 0, so that the value range of the weight coefficient corresponding to the face personalized expression base representing the open mouth is between 0 and 0.1. If the relative distance of the key points of the face exceeds the error distance, set the value of p to 1 and the value of q is 0, so that the value range of the weight coefficient corresponding to the face personalized expression base representing the open mouth is between 0 and 1. Whether the right eye is closed can be determined by calculating the relative distance of the face key points of the two groups of face key points (face key points in the box) of the right eye, and then the weight coefficient of the face personalized expression base corresponding to the right eye closed Set a reasonable value range. According to the above method, the calculation method of the relative distance of the face key points corresponding to the face personalized expression base, the error distance, and the p and q values when the relative distance of the face key points does not exceed the error distance, and the relative distance of the face key points are predetermined. p- and q-values when the error distance is exceeded. Then, when constructing the error parameter formula, the p value and q value are determined by calculating the relative distance of the key points of the face, and then the value range of the weight coefficient is determined. At this time, the relative distance of the key points of the face and the corresponding error distance can be regarded as the weight. prior information on the coefficients. In this way, errors caused by incorrect detection of face key points can be allowed during the detection of face key points, and the accuracy of the subsequent processing process is ensured.

In yet another embodiment, in order to facilitate the subsequent calculation of the error parameter formula min E' _exp =min||Aβ-b|| ² , it is converted into

in,

Converted from ||Aβ-b|| ² , at this time, the error parameter formula is:

Among them, Cβ≤d.

Among them, E _exp represents the error parameter, β represents the weight coefficient vector, β=(β ₁ β ₂ … β _n ), n represents the total number of face personalized expression bases, β _i represents the ith face personalized expression base Weight coefficient, 1≤i≤n, A=sR·ΔB, s is the scaling factor when the 3D face model is mapped to the face image, R is the rigid rotation matrix when the 3D face model is mapped to the face image, ΔB =[B ₁ -B ₀ B ₂ -B ₀ ... B _n -B ₀ ], B ₀ represents the neutral facial expression base, B _i represents the i-th face personalized expression base, b=ft-sR·B ₀ , f represents the key points of the face in the face image, t represents the translation vector when the three-dimensional face model is mapped to the face image; s, R and t are the pose parameters, C represents the constraint parameter of β, and d represents the Ranges. The manner of determining C and d may refer to the foregoing embodiment.

In yet another embodiment, the error parameter formula can also be constructed by the LI regular optimization method, and in order to ensure that the weight coefficient is within the correct value range, when constructing the error parameter formula, the L1 regularity can be combined with the gradient projection, that is, in each When calculating the weight coefficient using the L1 regularity, the gradient of the weight coefficient is projected into the value range of the weight coefficient to ensure that the final calculated weight coefficient is within the corresponding value range. At this point, the constructed error parameter formula is:

in,

is the relevant information on the x-axis between the i-th face key point and the i-th face key point in the face image after the personalized expression base of the j-th face is mapped to the two-dimensional space (the relevant information is used to reflect the two face key points whether the coordinates are the same),

is the correlation information between the ith face key point and the ith face key point in the face image on the x-axis after the kth face personalized expression base is mapped to the two-dimensional space, y ⁽ⁱ⁾ is the three-dimensional face model mapping After entering the two-dimensional space, the relevant information of the ith face key point and the ith face key point in the face image on the y-axis, λ is the L1 regular coefficient, and its value can be set according to the actual situation, θ _j is the th j is the weight coefficient of the individual face expression base, θ _k is the weight coefficient of the kth face individual expression base, n is the total number of face individual expression bases, and m is the total number of face key points. Further, the weight coefficient is set between 0 and 1. At this time, in order to facilitate the calculation, the above formula can be converted into:

in,

It is the weight coefficient of the personalized expression base of the jth face when processing the image data of the previous frame. According to the above formula, the weight coefficient of each face personalized expression base can be calculated, and then the pose parameter can be calculated according to the weight coefficient.

In this embodiment, the error parameter formula used in the subsequent calculation process is:

Among them, Cβ≤d.

The reason for choosing the above error parameter formula is that in the subsequent iterative calculation, the calculation process is simple, and the weight parameters are effectively carried out quadratic programming and dynamic supervision.

Step 260: Determine, according to the error parameter formula, the pose parameters of the three-dimensional face model when the error parameters are the smallest and the weight coefficients of the individualized expression bases of each face.

Exemplarily, after the error parameter formula is constructed, the unknowns in the error parameter formula include pose parameters and weight coefficients. Therefore, the pose parameters and weight parameters used in the error parameter formula when the error parameter is the smallest can be determined through the error parameter formula, and then the pose parameters and weight parameters are used as the final calculated pose parameters and weight coefficients.

In one embodiment, when calculating the pose parameters and the weight coefficients, the calculation may be performed in an alternate iterative manner. For example, first set the initialization parameters for the weight coefficient, then substitute the initialization parameters into the error parameter formula to fix the weight coefficient in the error parameter formula, and perform calculation to determine the value of the pose parameter when the error parameter is the smallest in the current calculation process. After that, the value of the calculated pose parameter is substituted into the error parameter formula again to fix the pose parameter, and the calculation is performed to determine the parameter of the weight coefficient when the error parameter is the smallest in the current calculation process. At this point, it is considered that one iteration calculation is over, after that, the parameters of the currently calculated weight coefficient are obtained, and the above process is repeated again for iterative calculation, until the number of iterative calculations reaches a preset number of times, or, until the error parameter less than the preset parameter threshold. After that, the pose parameters and weight coefficients after the iterative calculation is stopped are used as the finally calculated pose parameters and weight coefficients. In one embodiment, step 260 includes steps 261-267:

Step 261: Obtain the initialization weight coefficients of each face personalized expression base, and use the initialization weight coefficients as the current weight coefficients.

Exemplarily, the initialization weight coefficient refers to a preset weight coefficient, that is, a weight coefficient is preset for each individual face expression base. The specific value of the initialization weight coefficient can be set according to the actual situation. For example, according to the value range of the weight coefficient of the face personalized expression base, a value boundary is selected as the initialization weight coefficient of the face personalized expression base.

In the embodiment, in order to facilitate the description of the calculation process of the weight coefficients and the pose parameters, the currently used weight coefficients are recorded as the current weight coefficients. weight factor.

Step 262: Substitute the current weight coefficient into the error parameter formula, and calculate the candidate pose parameters of the three-dimensional face model when the error parameter is the smallest.

The current weight coefficient is substituted into the error parameter formula, so that the weight coefficient in the error parameter formula is a fixed value (the value of the current weight coefficient). At this time, for the error parameter formula, the unknowns are only the pose parameters. After that, the calculation is performed according to the error parameter formula to determine the specific value of the pose parameter when the error parameter is the smallest in the current calculation process. In the embodiment, the pose parameters obtained by this calculation are recorded as candidate pose parameters. The candidate pose parameters can be understood as intermediate values, and the purpose of calculating the candidate pose parameters is to obtain the final pose parameters.

Step 263: Substitute the candidate pose parameters into the error parameter formula, and calculate the candidate weight coefficients of the individualized expression bases for each face when the error parameters are the smallest.

Exemplarily, after calculating the candidate pose parameters, the currently calculated candidate pose parameters are substituted into the error parameter formula, so that the pose parameters in the error reference formula are fixed values. At this time, for the error parameter formula , whose unknowns only have weight coefficients. After that, the calculation is performed according to the error parameter formula to determine the specific value of the weight coefficient when the error parameter is the smallest in the current calculation process. In the embodiment, the weight coefficient obtained by this calculation is recorded as the candidate weight coefficient. The candidate weight coefficient can be understood as an intermediate value, and the purpose of calculating the candidate weight coefficient is to obtain the final weight coefficient.

Step 264, update the current number of iterations.

Exemplarily, an iterative calculation process refers to a process of obtaining candidate pose parameters and candidate weight coefficients after substituting the current weight coefficient into the error parameter formula. After the candidate pose parameters and the candidate weight coefficients are obtained, it is determined that one iteration calculation is completed, and the number of iterations is updated, that is, the current number of iterations is increased by 1. It can be understood that after each candidate weight coefficient is obtained, the number of iterations is incremented by 1, and the candidate weight coefficient and candidate pose parameter calculated by the latest iteration are used as the current and final candidate weight coefficient and candidate pose. parameter.

Step 265: Determine whether the number of iterations reaches the number threshold, and when the number of iterations does not reach the number threshold, perform step 266. When the number of iterations reaches the number threshold, step 267 is executed.

In an embodiment, the number of times threshold is used to confirm whether to stop the iterative calculation. The number of times threshold may be set in combination with the actual situation. For example, an appropriate number of times threshold may be determined in combination with historical experience data. In this embodiment, the number of times threshold is 5. Exemplarily, after updating the number of iterations, it is determined whether the current number of iterations reaches the number threshold, if so, stop the iterative calculation, and execute step 266 , if not, continue the iterative calculation and execute step 267 .

Step 266 , take the candidate weight coefficient as the current weight coefficient, and return to step 262 .

The candidate weight coefficient obtained by this iterative calculation is used as the current weight coefficient, and the process returns to step 262 to start a new iterative calculation.

Step 267: Use the finally obtained candidate pose parameters as the pose parameters of the three-dimensional face model, and use the finally obtained candidate weight coefficients as the weight coefficients of the face personalized expression base.

The candidate pose parameters and the candidate weight coefficients finally obtained refer to the candidate pose parameters and the candidate weight coefficients calculated by the latest iteration when the number of iterations reaches the number threshold. When the number of iterations reaches the threshold, the iterative calculation is stopped, and the finally obtained candidate pose parameters and candidate weight coefficients are used as the pose parameters of the final 3D face model and the weight coefficients of the face personalized expression base.

It should be noted that, in general, when the number of iterations satisfies the number threshold, after adjusting the three-dimensional face model according to the finally obtained candidate pose parameters and candidate weight coefficients, when the adjusted three-dimensional face model is mapped to the two-dimensional space, The resulting 2D image is highly similar or identical to the face image.

It can be understood that the above description is described by taking the weight coefficient fixed first as an example. In practical applications, the pose parameters can also be fixed first for calculation.

It should be noted that, in practical applications, other calculation methods may also be used, which are not limited in the embodiment, and it is only necessary to obtain the corresponding pose parameters and weight coefficients when the error parameters are the smallest.

Step 270: Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.

In the above, the neutral facial expression base of the target object is constructed according to the current frame image data and the preset a priori information of the facial model by acquiring the current frame image data including the facial image of the target object, and then according to the neutral facial expression Based on the base, the reference neutral expression base and the reference personalized expression base, the personalized expression base of the face is obtained, and the 3D face model is constructed according to the individualized expression base and the neutral expression base of each face, and the 3D face model and the human face are constructed. The error parameter formula of the face image, then, according to the error parameter formula, determine the weight coefficient of the face personalized expression base and the pose parameters of the 3D face model when the error parameter is the smallest, and send the weight coefficient and pose parameters to the remote device. , which solves the technical problems of information leakage and jamming caused by the transmission of real face images in the related art, and ensures the imaging quality of the remote device. When constructing the error parameter formula, the strategy of quadratic programming and dynamic constraint is adopted for the weight coefficient through the relative distance of the face key points, which can effectively eliminate the influence on the weight coefficient when the face key point detection is inaccurate, and improve the accuracy of the weight coefficient. sex. The basic expressions are defined by using FACS refinement, each basic expression corresponds to a face personalized expression base, which makes the expressions contained in the 3D face model more abundant, thereby ensuring that the obtained pose parameters and weight coefficients are close to the real face image. , especially the basic expression defined by the FACS refinement mainly divides the left and right symmetrical expressions. When the expression of the face image is asymmetrical, it can be effectively captured and driven, so that the obtained pose parameters and weight coefficients are close to the real ones. face image. When calculating the weight coefficients and pose parameters, by fixing the weight coefficients or pose parameters and performing iterative calculation, the error parameter formula can be converted into a linear solution formula, which simplifies the calculation process.

FIG. 12 is a flowchart of still another virtual image construction method provided by an embodiment of the present application. Referring to Figure 12, the virtual image construction method specifically includes:

Step 310: Acquire current frame image data, where the current frame image data includes a face image of the target object.

Step 320 , construct a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data.

Step 330 , constructing a three-dimensional face model of the target object according to the neutral facial expression base and multiple personalized facial expression bases.

Step 340 , constructing an error parameter formula when the three-dimensional face model is mapped to the face image.

In the embodiment, the error parameter formula adopted is:

Cβ≤d

Among them, E _exp represents the error parameter, β represents the weight coefficient vector, β=(β ₁ β ₂ … β _n ), n represents the total number of face personalized expression bases, β _i represents the ith face personalized expression base Weight coefficient, 1≤i≤n, A=sR·ΔB, s is the scaling factor when the 3D face model is mapped to the face image, R is the rigid rotation matrix when the 3D face model is mapped to the face image, ΔB =[B ₁ -B ₀ B ₂ -B ₀ ... B _n -B ₀ ], B ₀ represents the neutral facial expression base, B _i represents the i-th face personalized expression base, b=ft-sR·B ₀ , f represents the face key points in the face image, t represents the translation vector when the three-dimensional face model is mapped to the face image, s, R and t are the pose parameters, C represents the constraint parameter of β, and d represents the Ranges. The reason for choosing the above error parameter formula is that in the subsequent iterative calculation, the calculation process is simple, and the weight parameters are effectively subjected to quadratic programming and dynamic supervision.

In one embodiment,

Among them, ones(n) represents the upper bound of the value of n weight coefficients, and zero(n) represents the lower bound of the value of n weight coefficients.

In one embodiment,

Among them, ones(n) represents the upper bound of the n weight coefficients, zero(n) represents the lower bound of the n weight coefficients, p _n and q _n are the value constraint matrices, p _n and q _n are based on the face The relative distance of the face key points in the image is determined.

For example, referring to Fig. 11, after calculating the relative distance L of the face key points of the left eye in the face image, if L≤5, it means that the probability of closing the left eye in the face image is high, then it means that the person whose left eye is closed The weight coefficient corresponding to the face personalized expression base should be relatively large. Therefore, a larger value range can be set for the weight coefficient, such as a value range of 0.9-1. At this time, p _n and the left eye are closed. The corresponding p value can be set to 1, and the q value corresponding to the left eye closure in q _n can be set to 0.9, so that the weight coefficient of the face personalized expression base representing the left eye closure in β ranges from 0.9 to 1. . After calculating the relative distance of the face key points of the mouth in the face image, if the relative distance of the face key points does not exceed the error distance (for example, L≤3), the mouth is considered to be closed, and the p value is set to 0.1 and the q value is 0 , and then make the value range of the face personalized expression base weight coefficient representing the open mouth between 0 and 0.1. If the relative distance of the key points of the face exceeds the error distance (for example, L>3), then set the p value to 1, q The value is 0, so that the value range of the weight coefficient of the face personalized expression base representing the open mouth is between 0-1. According to the above method, the relative distance of the face key points and the corresponding error distance are used as the prior information of the weight coefficient. In this way, errors caused by incorrect detection of face key points can be allowed in the detection of face key points, and the accuracy of the subsequent processing process can be ensured.

Step 350 , searching for mutually exclusive expression bases in the personalized expression bases of each face.

Mutually exclusive expression bases refer to face-specific expression bases in which the corresponding expressions cannot appear in the human face at the same time. For example, left chin and right chin cannot appear in a human face at the same time, so the corresponding two personalized facial expression bases can be regarded as mutually exclusive expression bases. For example, FIG. 13 is a schematic diagram of a mutually exclusive expression base provided by an embodiment of the present application. Referring to FIG. 13 , based on the reader’s vision, the lips and chin in the left face personalized expression base are moved to the left, and the right The lips and chin in the face personalized expression base are shifted to the right. For a human face, it can only make one of the expressions, but cannot make two expressions at the same time. For another example, FIG. 14 is a schematic diagram of another mutually exclusive expression base provided by the embodiment of the present application. Referring to FIG. 14 , the expression of the human face personalized expression base on the left is an open mouth, and the expression of the human face personalized expression base on the right is Cheeks puffed out. For a human face, it cannot bulge its cheeks when opening its mouth, so it can be considered as a mutually exclusive expression base.

It is understandable that the expressions that cannot appear at the same time in the mutually exclusive expression base not only refer to the basic expressions corresponding to the face personalized expression base, but also include the superimposed expressions. The corresponding multiple face personalized expression bases are also mutually exclusive expression bases. For example, the superimposed expression is the expression of wrinkling the nose and frowning the left eyebrow, and the other superimposed expression is the raised eyebrow and the left eyebrow tail. At this time, Two superimposed expressions cannot appear in a human face at the same time. Therefore, the facial personalized expression base of wrinkling the nose and frowning the left eyebrow and the facial personalized expression base of raising the eyebrow and raising the left eyebrow tail are mutually exclusive expression bases.

In the embodiment, taking the self-defined basic expression and the correspondingly generated facial expression base in step 230 as an example, at this time, all mutually exclusive expression bases are found in the 26 human face personalized expression bases. It is understandable that the mutually exclusive expression base can be constructed manually, and the virtual image construction device directly obtains the mutually exclusive expression base. The virtual image construction device can also gradually add basic expressions in the same three-dimensional face model and superimpose the basic expressions to determine whether the three-dimensional face model can display all expressions at the same time, thereby determining mutually exclusive expression bases.

In one embodiment, the mutually exclusive expression bases in the 26 face personalized expression bases are shown in the following table:

互斥1Mutual exclusion 1	互斥2Mutex 2
B0B0	B2B2
B1B1	B3B3
B4,B25B4,B25	B6,B7B6,B7
B5,B25B5,B25	B6,B8B6,B8
B9B9	B24B24
B10B10	B11B11
B12B12	B21B21
B13B13	B22B22

Table 2

In the above table, the face personalized expression base included in mutual exclusion 1 and the face personalized expression base included in mutual exclusion 2 in the same row are mutually exclusive expression bases. It should be noted that "B" in Table 2 corresponds to "Blendshape" in Table 1, and the number after "B" in Table 2 is the number of "Blendshape".

Step 360: Group the individualized expression bases of faces according to mutually exclusive expression bases to obtain multiple expression base groups, and any two individualized facial expression bases in each expression base group are not mutually exclusive.

Since the mutually exclusive facial personalized expression bases cannot appear in the human face at the same time, in the embodiment, the facial personalized expression bases are grouped according to the mutually exclusive expression bases, and at this time, each grouping is recorded as an expression basis set. At this time, the face personalized expression bases in each expression base group are not mutually exclusive. For example, if an expression base group contains the personalized facial expression base corresponding to B1, then it will not contain the personalized facial expression base corresponding to B3. If an expression base group contains the personalized facial expression bases corresponding to B4 and B25, then it will not contain the personalized facial expression bases corresponding to B6 and B7.

It can be understood that the specific manner of grouping is not limited by the embodiment, and it is only required that each expression base group does not contain mutually exclusive facial personalized expression bases.

Step 370: Calculate the minimum error parameter corresponding to each expression base set, and the pose parameters of the three-dimensional face model and the weight coefficients of the individual face expression bases in the expression base set when the minimum error parameter is based on the error parameter formula.

After grouping, the calculation is performed in units of expression basis groups, and one group of expression basis groups is optimized each time. Since the calculation process of each expression basis set is the same, in the embodiment, the calculation of one expression basis set is taken as an example for description. Exemplarily, according to the error parameter formula, when calculating the minimum error parameter, the weight coefficients of the individualized expression bases of each face in the expression base group and the pose parameters of the three-dimensional face model. Since the expression basis group does not contain all the face personalized expression basis, in the calculation process, the weight coefficient of the face personalized expression basis not included in the expression basis group can always be set to 0, so as to reduce the weight of the solution. the number of coefficients. It can be understood that in the calculation process, the iterative calculation method can also be used. For details, refer to the process described in step 260. The only difference is that the final obtained weight coefficient is not included in the facial expression base group. The weight factor is 0.

It is understandable that since each expression base group contains different facial expression bases, the minimum error parameters, weight coefficients and pose parameters obtained may be different when different expression base groups are used for calculation. Therefore, after each expression basis group is calculated in the above manner, each expression basis group corresponds to a minimum error parameter, a weight coefficient and a pose parameter.

Step 380: From the minimum error parameters corresponding to each expression basis set, select the smallest minimum error parameter.

The smaller the minimum error parameter is, the closer the two-dimensional image obtained when the three-dimensional face model is mapped to the two-dimensional space is closer to the face image. Therefore, in the embodiment, among the minimum error parameters corresponding to each expression basis set, select The smallest minimum error parameter. In general, the smallest minimum error parameter is only one.

Step 390: Use the pose parameter and weight coefficient corresponding to the smallest minimum error parameter as the finally obtained pose parameter and weight coefficient.

Confirm the expression basis set corresponding to the smallest minimum error parameter, and obtain the pose parameters and weight coefficients of the expression basis set as the final pose parameters and weight coefficients. It can be understood that after adjusting the three-dimensional face model by using the pose parameters and weight coefficients corresponding to the smallest minimum error parameters, the three-dimensional face model is the closest to the face image.

Step 3100: Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.

Optionally, when the weight coefficient is sent, for the face personalized expression base not included in the expression base group, the weight coefficient is set to 0, and is sent to the remote device together with other weight coefficients. Optionally, when sending the weight coefficient, only the weight coefficient of the face personalized expression base in the expression base group corresponding to the smallest minimum error parameter is sent, and the remote device searches for the corresponding personalized expression base according to the received weight coefficient. Instead of using all the personalized expression bases, a corresponding virtual image is constructed according to the searched personalized expression bases and the corresponding weight coefficients.

Above, by acquiring the current frame image data containing the target object's face image, and constructing the target object's face neutral expression base and a plurality of face personalized expression bases according to the current frame image data, and then, according to the face neutral expression base. Build a 3D face model with multiple face personalized expression bases, and then construct the error parameter formula when the 3D face model is mapped to a face image, and then group the face personalized expression bases according to the mutually exclusive expression bases. And calculate the weight coefficient of the face personalized expression base and the pose parameters of the three-dimensional face model when the error parameter is the smallest in each expression base group, and then select the corresponding weight coefficient and pose parameter of the expression base set with the smallest error parameter, And send it to the remote device, so that the remote device can display the virtual image corresponding to the face image through the pose parameters and weight coefficients, which solves the information leakage caused by the transmission of the real face image in the related art. The technical problem of freezing reduces the demand for network bandwidth, effectively protects the privacy of the target object, and ensures the imaging quality of the remote device. Further, by finding mutually exclusive expression bases, and grouping them according to the mutually exclusive expression bases, the pose parameters and weight coefficients are calculated in units of expression base groups, which reduces the number of weight coefficients to be solved each time, thereby reducing the number of expressions. The base search space makes the expression coefficient solution more accurate and efficient, and at the same time, fewer facial expression bases are used to express the expression of the face image.

FIG. 15 is a schematic structural diagram of an apparatus for constructing a virtual image provided by an embodiment of the present application. Referring to FIG. 15 , the virtual image construction apparatus includes: an image acquisition module 401 , an expression base construction module 402 , a face model construction module 403 , a parameter determination module 404 and a parameter transmission module 405 .

Among them, the image acquisition module 401 is used to acquire the current frame image data, and the current frame image data includes the face image of the target object; the expression base construction module 402 is used to construct the neutral facial expression base of the target object according to the current frame image data and a plurality of face personalized expression bases; the face model building module 403 is used to construct a three-dimensional face model of the target object according to the neutral facial expression base and a plurality of face personalized expression bases; the parameter determination module 404 is used to determine When the three-dimensional face model is mapped to the face image, the pose parameters of the three-dimensional face model and the weight coefficients of the individualized expression bases of each face; the parameter sending module 405 is used for sending the pose parameters and the weight coefficients to the remote device, So that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficient.

On the basis of the above embodiment, the parameter determination module 404 includes: a formula construction unit for constructing an error parameter formula when the three-dimensional face model is mapped to a face image; a formula calculation unit for determining the error parameter according to the error parameter formula The minimum pose parameters of the 3D face model and the weight coefficients of the individualized expression bases of each face.

On the basis of the above-mentioned embodiment, the formula calculation unit includes: an initial parameter acquisition subunit, used to acquire the initialization weight coefficient of each face personalized expression base, and the initialization weight coefficient is used as the current weight coefficient; the first parameter is substituted into the subunit, It is used to substitute the current weight coefficient into the error parameter formula, and calculate the candidate pose parameters of the 3D face model when the error parameter is the smallest; the second parameter is substituted into the subunit, which is used to substitute the candidate pose parameters into the error parameter formula, and calculate When the error parameter is the smallest, the candidate weight coefficient of each face personalized expression base; the times update subunit is used to update the current iteration times; the times judgment subunit is used to judge whether the iteration times reach the times threshold; the return subunit is used to use When the number of iterations does not reach the number threshold, the candidate weight coefficient is used as the current weight coefficient, and the operation of substituting the current weight coefficient into the error parameter formula is returned to execute the operation until the number of iterations reaches the number threshold; the first parameter selects the subunit, which is used for When the number of iterations reaches the threshold, the finally obtained candidate pose parameters are used as the pose parameters of the 3D face model, and the finally obtained candidate weight coefficients are used as the weight coefficients of the face personalized expression base.

On the basis of the above embodiment, the error parameter formula is

Among them, Cβ≤d, where E _exp represents the error parameter, β represents the weight coefficient vector, β=(β ₁ β ₂ … β _n ), n represents the total number of personalized facial expression bases, β _i represents the ith person The weight coefficient of the face personalized expression base, 1≤i≤n, A=sR ΔB, s represents the scaling factor when the 3D face model is mapped to the face image, R represents the 3D face model is mapped to the face image The rigid rotation matrix of , ΔB=[B ₁ -B ₀ B ₂ -B ₀ ... B _n -B ₀ ], B ₀ represents the neutral facial expression base, B _i represents the i-th face personalized expression base, b= ft-sR·B ₀ , f represents the face key points in the face image, t represents the translation vector when the 3D face model is mapped to the face image, s, R and t are the pose parameters, and C represents the constraint of β parameter, d represents the value range of β.

On the basis of the above-mentioned embodiment,

Among them, ones(n) represents the upper bound of the value of n weight coefficients, and zero(n) represents the lower bound of the value of n weight coefficients; or,

Among them, ones(n) represents the upper bound of the value of n weight coefficients, zero(n) represents the lower bound of the value of n weight coefficients, p _n and q _n are the value constraint matrices, p _n and q _n according to the The relative distance of the face key points in the face image is determined.

On the basis of the above-mentioned embodiment, it also includes: an expression base search module, which is used to determine, according to the error parameter formula, the pose parameters of the three-dimensional face model when the error parameter is the smallest and the weight coefficients of the individualized expression bases of each face before , to find the mutually exclusive expression bases in the personalized expression bases of each face; the expression base grouping module is used to group the personalized expression bases of each face according to the mutually exclusive expression bases, and obtain multiple expression base groups, each of which is Any two face-personalized expression bases in the expression base group are not mutually exclusive. The formula calculation unit includes: a group calculation sub-unit, which is used to calculate the minimum error parameter corresponding to each expression basis group according to the error parameter formula, and the pose parameters of the three-dimensional face model and each person in the expression basis group when the minimum error parameter is used. The weight coefficient of the face personalized expression base; the minimum parameter selection subunit is used to select the smallest minimum error parameter among the minimum error parameters corresponding to each expression base group; the second parameter selection subunit is used to select the smallest minimum error parameter The pose parameters and weight coefficients corresponding to the error parameters are used as the finally obtained pose parameters and weight coefficients.

On the basis of the above embodiment, the expression base construction module 402 includes: a neutral expression base construction unit, configured to construct a neutral expression base of the target object according to the current frame image data and the preset prior information of the face model; The personalized expression base construction unit is used to determine the individual facial expression bases of the target object according to the neutral facial expression base, the preset reference neutral expression base and each reference personalized expression base, and each reference personalized expression base The expression base corresponds to a face personalized expression base.

On the basis of the above embodiment, the neutral expression base construction unit includes: a face image detection subunit for detecting the face image in the current frame data image; a key point location subunit for detecting the face image Perform facial key point positioning to obtain a key point coordinate array; neutral expression base determination subunit, used to determine the face of the target object according to the face image, the key point coordinate array and the preset a priori information of the face model Sexual expression base.

On the basis of the above embodiment, the personalized expression base construction unit includes: a deformation information determination subunit, used for determining deformation information according to the reference neutral expression base and the reference personalized expression base; the personalized expression base determination subunit, used for According to the deformation information and the neutral facial expression base, the facial personalized expression base of the target object is determined.

The virtual image construction device provided above can be used to execute the virtual image construction method provided by any of the above embodiments, and has corresponding functions and beneficial effects.

It is worth noting that, in the embodiment of the above virtual image construction apparatus, the units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, The specific names of the functional units are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of the present invention.

FIG. 16 is a schematic structural diagram of a virtual image construction device according to an embodiment of the present application. As shown in FIG. 16 , the virtual image construction device includes a processor 50, a memory 51, an input device 52, and an output device 53; the number of processors 50 in the virtual image construction device may be one or more, and in FIG. 16, one process Take device 50 as an example. The processor 50 , the memory 51 , the input device 52 , and the output device 53 in the virtual image construction device may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 16 .

As a computer-readable storage medium, the memory 51 can be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the virtual image construction method in the embodiment of the present invention (for example, the image acquisition module 401, expression base construction module 402, face model construction module 403, parameter determination module 404 and parameter transmission module 405). The processor 50 executes various functional applications and data processing of the virtual image construction device by running the software programs, instructions and modules stored in the memory 51 , that is, implements the above virtual image construction device method.

The memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the virtual image construction apparatus, and the like. In addition, the memory 51 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some instances, memory 51 may further include memory located remotely relative to processor 50, and these remote memories may be connected to the virtual image construction device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The input device 52 can be used to receive input digital or character information, and generate key signal input related to user settings and function control of the virtual image construction device, and also includes image capture devices, audio capture devices, and the like. The output device 53 may include a display device such as a display screen. The virtual image construction apparatus may further include communication means for data communication with other apparatuses.

The above virtual image construction device includes a virtual image construction device, which can be used to execute any virtual image construction method, and has corresponding functions and beneficial effects.

In addition, the embodiments of the present application also provide a storage medium containing computer-executable instructions, when executed by a computer processor, the computer-executable instructions are used to execute the relevant information in the virtual image construction method provided by any embodiment of the present application. operation, and has corresponding functions and beneficial effects.

As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product.

Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram. These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams. These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in the flow or blocks of the flowcharts and/or the block or blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory in the form of, for example, read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology. Information may be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprising", "comprising" or any other variation thereof are intended to encompass a non-exclusive inclusion such that a process, method, article or device comprising a series of elements includes not only those elements, but also Other elements not expressly listed, or which are inherent to such a process, method, article of manufacture, or apparatus are also included. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article of manufacture or apparatus that includes the element.

Note that the above are only preferred embodiments of the present application and applied technical principles. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present application. Therefore, although the present application has been described in detail through the above embodiments, the present application is not limited to the above embodiments, and can also include more other equivalent embodiments without departing from the concept of the present application. The scope is determined by the scope of the appended claims.

Claims

A virtual image construction method, comprising:

Acquiring current frame image data, the current frame image data comprising the face image of the target object;

Construct a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data;

Constructing a three-dimensional face model of the target object according to the neutral facial expression base and a plurality of individual facial expression bases;

determining the pose parameters of the three-dimensional human face model when the three-dimensional human face model is mapped to the human face image and the weight coefficients of each of the individualized expression bases of the human face;

Sending the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
The virtual image construction method according to claim 1, wherein, when the three-dimensional face model is determined to be mapped to the face image, the pose parameters of the three-dimensional face model and the individualized face The weight coefficients of the expression base include:

constructing the error parameter formula when the three-dimensional face model is mapped to the face image;

According to the error parameter formula, the pose parameters of the three-dimensional face model and the weight coefficients of the individualized expression bases of the face are determined when the error parameter is the smallest.
The virtual image construction method according to claim 2, wherein, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weights of the individualized expression bases of the faces are determined when the error parameters are the smallest. Factors include:

Obtain the initialization weight coefficients of each of the face personalized expression bases, and use the initialization weight coefficients as the current weight coefficients;

Substitute the current weight coefficient into the error parameter formula, and calculate the candidate pose parameter of the three-dimensional face model when the error parameter is the smallest;

Substitute the candidate pose parameters into the error parameter formula, and calculate the candidate weight coefficients of the individualized expression bases of each face when the error parameter is the smallest;

update the current iteration count;

Judging whether the number of iterations reaches the number threshold;

When the number of iterations does not reach the threshold of the number of times, take the candidate weight coefficient as the current weight coefficient, and return to perform the operation of substituting the current weight coefficient into the error parameter formula until the number of iterations reaches the up to the number of thresholds;

When the number of iterations reaches the threshold of the number of times, the finally obtained candidate pose parameters are used as the pose parameters of the three-dimensional face model, and the finally obtained candidate weight coefficients are used as the face personalized expression base. weight factor.
The virtual image construction method according to claim 2 or 3, wherein the error parameter formula is:
Among them, Cβ≤d,

Among them, E exp represents the error parameter, β represents the weight coefficient vector, β=(β 1 β 2 ... β n ), n represents the total number of the face personalized expression base, β i represents the ith face personalized expression The weight coefficient of the basis, 1≤i≤n, A=sR·ΔB, s represents the scaling factor when the 3D face model is mapped to the face image, R represents the 3D face model is mapped to the Rigid rotation matrix of the face image, ΔB=[B 1 -B 0 B 2 -B 0 ... B n -B 0 ], B 0 represents the neutral expression base of the face, and B i represents the i-th face personality Expression base, b=ft-sR·B 0 , f represents the face key point in the face image, t represents the translation vector when the three-dimensional face model is mapped to the face image, s, R And t is the pose parameter, C represents the constraint parameter of β, and d represents the value range of β.
The virtual image construction method according to claim 4, wherein,
Wherein, ones(n) represents the upper bound of the value of the n weight coefficients, and zero(n) represents the lower bound of the value of the n weight coefficients; or,

Among them, ones(n) represents the upper bound of the value of the n weight coefficients, zero(n) represents the lower bound of the value of the n weighted coefficients, p n and q n are the value constraint matrices, p n and q n is determined according to the relative distance of the face key points in the face image.
The virtual image construction method according to claim 2, wherein, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weights of the individualized expression bases of the faces are determined when the error parameters are the smallest. Before the coefficients, also include:

Find mutually exclusive expression bases in each of the face personalized expression bases;

According to the mutually exclusive expression bases, the individualized facial expression bases are grouped to obtain a plurality of expression base groups, and any two personalized facial expression bases in each of the expression base groups are not mutually exclusive. exclude;

According to the error parameter formula, it is determined that the pose parameters of the three-dimensional face model when the error parameter is the smallest and the weight coefficients of the individualized expression bases of the face include:

According to the error parameter formula, calculate the minimum error parameter corresponding to each expression basis set, as well as the pose parameters of the three-dimensional face model and the individualized expressions of each face in the expression basis set at the minimum error parameter The weight coefficient of the base;

Among the minimum error parameters corresponding to each expression basis set, select the minimum minimum error parameter;

The pose parameter and weight coefficient corresponding to the smallest minimum error parameter are used as the finally obtained pose parameter and weight coefficient.
The method for constructing a virtual image according to claim 1, wherein the constructing a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data comprises:

Constructing the neutral facial expression base of the target object according to the current frame image data and the preset prior information of the facial model;

According to the neutral facial expression base, the preset reference neutral expression base and each reference individualized expression base, each of the individualized facial expression bases of the target object is determined, and each of the reference individualized expression bases is determined. The base corresponds to a face personalized expression base.
The virtual image construction method according to claim 7, wherein the construction of the neutral facial expression base of the target object according to the current frame image data and preset human face model prior information comprises:

Detecting the face image in the current frame data image;

Perform face key point positioning on the face image to obtain a key point coordinate array;

The neutral facial expression base of the target object is determined according to the facial image, the key point coordinate array and the preset prior information of the facial model.
The virtual image construction method according to claim 7, wherein, according to the neutral facial expression base and preset reference neutral expression bases and each reference personalized expression base, determining each person of the target object The face-personalized expression base includes:

Determine deformation information according to the reference neutral expression base and the reference personalized expression base;

According to the deformation information and the neutral facial expression base, a personalized facial expression base of the target object is determined.
A virtual image construction device, comprising:

an image acquisition module for acquiring current frame image data, where the current frame image data includes a face image of a target object;

an expression base building module for constructing a neutral facial expression base and a plurality of individualized facial expression bases of the target object according to the current frame image data;

a face model building module, used for constructing a three-dimensional face model of the target object according to the neutral facial expression base and a plurality of individual facial expression bases;

a parameter determination module, configured to determine the pose parameters of the three-dimensional human face model when the three-dimensional human face model is mapped to the human face image and the weight coefficients of the individualized expression bases of the human face;

A parameter sending module, configured to send the pose parameters and the weight coefficients to a remote device, so that the remote device generates images corresponding to the face according to the pose parameters and the weight coefficients virtual image.
A virtual image construction device, comprising:

one or more processors;

memory for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the virtual image construction method according to any one of claims 1-9.
A computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the virtual image construction method according to any one of claims 1-9 is implemented.