WO2022147736A1 - Virtual image construction method and apparatus, device, and storage medium - Google Patents

Virtual image construction method and apparatus, device, and storage medium

Info

Publication number
WO2022147736A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
expression
expression base
personalized
image
Prior art date
Application number
PCT/CN2021/070727
Other languages
French (fr)
Chinese (zh)
Inventor
谢新林
Original Assignee
广州视源电子科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司 filed Critical 广州视源电子科技股份有限公司
Priority to PCT/CN2021/070727 priority Critical patent/WO2022147736A1/en
Priority to CN202180024686.6A priority patent/CN115335865A/en
Publication of WO2022147736A1 publication Critical patent/WO2022147736A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Definitions

  • the embodiments of the present application relate to the technical field of image processing, and in particular, to a method, apparatus, device, and storage medium for constructing a virtual image.
  • Embodiments of the present application provide a virtual image construction method, apparatus, device, and storage medium, so as to solve the technical problem of stuttering caused by the transmission of real face images in the related art.
  • In a first aspect, an embodiment of the present application provides a method for constructing a virtual image, including: acquiring current frame image data, the current frame image data comprising a face image of a target object.
  • an embodiment of the present application further provides a virtual image construction device, including:
  • an image acquisition module for acquiring current frame image data, where the current frame image data includes a face image of a target object
  • an expression base building module for constructing a neutral facial expression base and a plurality of individualized facial expression bases of the target object according to the current frame image data
  • a face model building module used for constructing a three-dimensional face model of the target object according to the neutral facial expression base and a plurality of individual facial expression bases;
  • a parameter determination module configured to determine the pose parameters of the three-dimensional face model when the three-dimensional face model is mapped to the face image and the weight coefficients of the personalized facial expression bases;
  • a parameter sending module configured to send the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
  • an embodiment of the present application further provides a virtual image construction device, including:
  • one or more processors;
  • a memory for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the virtual image construction method as described in the first aspect.
  • an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the virtual image construction method described in the first aspect.
  • The above virtual image construction method, apparatus, device and storage medium acquire current frame image data including the target object's face image, construct the target object's neutral facial expression base and a plurality of personalized facial expression bases according to the current frame image data, build a 3D face model from the neutral expression base and the personalized expression bases, determine the weight coefficients and pose parameters with which the 3D face model is mapped to the face image, and send the pose parameters and weight coefficients to the remote device, so that the remote device can display the virtual image corresponding to the face image through the pose parameters and weight coefficients. This solves the technical problem of stuttering caused by the transmission of real face images in the related art.
  • the transmitted weight coefficients and pose parameters can enable the remote device to display the corresponding virtual image, effectively protecting the privacy of the target object and preventing information leakage.
  • Moreover, the virtual image accurately follows the expressions and poses in the face image, ensuring the display quality on the remote device.
  • FIG. 1 is a flowchart of a method for constructing a virtual image provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of current frame image data provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a virtual image provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a reference three-dimensional face model provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a face image provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a three-dimensional face model of a target object provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of key points of a human face provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an expression refinement partition provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of expression transfer provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of selecting key points of a face provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a mutually exclusive expression base provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of another mutually exclusive expression base provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of an apparatus for constructing a virtual image provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a virtual image construction device according to an embodiment of the present application.
  • The virtual image construction method provided by the embodiments of the present application may be executed by a virtual image construction device. The virtual image construction device may be implemented by means of software and/or hardware, and may be composed of two or more physical entities or of a single physical entity.
  • the virtual image construction device may be a smart device such as a computer, a mobile phone, a tablet computer, or an interactive smart tablet.
  • the virtual image construction device is applied in the scenario of video communication using network communication technology, such as online conferences and online classes.
  • In such scenarios, in addition to the virtual image construction device, other devices participating in the video communication are also involved.
  • the other devices can be one or more, and the other devices can also be smart devices such as computers, mobile phones, tablet computers, or interactive smart tablets.
  • The virtual image construction device executes the virtual image construction method provided in this embodiment to process the face image collected from the local user, thereby enabling the other devices to display the virtual image obtained based on that face image.
  • the other devices are remote devices with respect to the virtual image construction device.
  • the virtual image construction method provided in this embodiment can also be executed when the remote device collects the face image of the user.
  • In this case, the remote device can also be considered a virtual image construction device, while the local device is used to display the corresponding virtual image.
  • the device used by the lecturer can be considered as a virtual image acquisition device, and the device used by the students can be considered as a remote device.
  • the device used by the current speaker may be considered as a virtual image construction device, and the devices used by other participants may be considered as remote devices.
  • The virtual image construction device is installed with at least one type of operating system, where the operating system includes but is not limited to an Android system, an iOS system, and/or a Windows system.
  • The virtual image construction device can install at least one application program based on the operating system; the application program may be an application program that comes with the operating system, or an application program downloaded from a third-party device or server, and the embodiment does not limit it. It can be understood that the virtual image construction method provided by the embodiment of the present application may also be implemented as an application program itself.
  • In the embodiment, the virtual image construction device is installed with at least an application program for executing the virtual image construction method provided by the embodiment of the present application, and the virtual image construction method is executed when the application program runs.
  • FIG. 1 is a flowchart of a method for constructing a virtual image according to an embodiment of the present application.
  • the virtual image construction method specifically includes:
  • Step 110: Acquire current frame image data, where the current frame image data includes a face image of the target object.
  • the virtual image construction device may collect image data through an image collection device (eg, a camera) installed by itself.
  • the currently collected image data is recorded as the current frame image data.
  • The current frame image data includes the face image of the target object, where the target object refers to the object for which a virtual image needs to be generated; any object whose face image can be recognized can be considered a target object and does not need to be specified in advance. For example, in an online classroom scenario, the target object can be the lecturer using the virtual image construction device, and the face image of the target object refers to the lecturer's face image.
  • the number of target objects in the image data of the current frame is one or more.
  • The embodiment takes the case where the current frame image data includes one target object as an example. In practical applications, when there are multiple target objects, each target object is processed in the same way as the current target object.
  • The embodiment does not limit the technical means used to confirm whether the current frame image data contains a face image.
  • For example, a face detection algorithm based on deep learning is used to detect a face image area in the current frame image data; if a face image area is detected, it is determined that the current frame image data contains a face image; otherwise, it is determined that it does not.
  • Step 120: Construct a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data.
  • the expression base can be understood as a three-dimensional face model containing the position information of the key points of the face, and the expression of the person can be reflected by the position information of the key points of the face in the expression base.
  • the key points of the face can be obtained from the key parts of the face.
  • the key parts of the face include eyebrows, eyes, nose, mouth, and cheeks, etc.
  • The face key points are located at the above key parts and describe the current actions of those parts. The action of each key part can thus be determined through the face key points, and then the face pose, face position and facial expression can be determined; that is, the face key points carry the semantic information of the face.
  • facial expressions are divided into neutral expressions and personalized expressions.
  • Neutral expression refers to the shape of the face without any expression, which can reflect the identity of the face.
  • The face identity is a specific description of the shape of the face, describing its key parts; for example, the key parts described by a face identity may be big eyes, a high nose bridge, and thin lips.
  • Personalized expressions refer to expressions made by a human face, such as eyes closed, mouth open, frowning, etc.
  • the expression bases are divided into neutral expression bases and personalized facial expression bases.
  • the neutral facial expression base can be understood as an expression base representing neutral expressions, and the shape of the human face in three-dimensional space can be confirmed through the neutral facial expression base.
  • The personalized facial expression base refers to an expression base containing a personalized expression, and each personalized facial expression base corresponds to one personalized expression. Understandably, since human facial expressions are very rich, expressing all of them would require building a large number of personalized facial expression bases, which would greatly increase the amount of data processing. Therefore, in the embodiment, only the personalized facial expression bases of basic expressions are constructed, where the specific content of the basic expressions can be set according to the actual situation, and the various expressions of the face can be obtained by combining the basic expressions and the neutral expression.
  • For example, the basic expressions for the eyes include: left eye closed, left eye widened, right eye closed and right eye widened. Various eye expressions can then be obtained from these four basic expressions and the neutral expression; for instance, a slightly squinting expression can be obtained by linear superposition of left eye closed, right eye closed and the neutral expression.
  • the neutral facial expression base and individual facial expression bases of the target object are constructed by using the facial image of the current frame image data.
  • prior information can be introduced.
  • The prior information is obtained by collecting a large amount of 3D face data, and can reflect the average coordinate data, the face identity basis vectors and the personalized expression basis vectors of that data; a 3D face model can be constructed through the prior information. It can be understood that different 3D face models are obtained when different coefficients are applied to the prior information.
  • The 3D face model so constructed can be regarded as a reference 3D face model; that is, a 3D face model corresponding to the target object's face image in the current frame image data can be obtained by adjusting the reference 3D face model.
  • Specifically, first obtain the two-dimensional coordinates of each face key point in the face image of the target object; then project the three-dimensional key points of the reference three-dimensional face model (that is, the face key points in the reference three-dimensional face model) into the two-dimensional plane to determine their coordinates there; and then calculate the error, in the two-dimensional plane, between the three-dimensional key points of the reference three-dimensional face model and the face key points in the face image.
  • The three-dimensional key points for which the error is calculated correspond to the face key points; that is, each group of corresponding three-dimensional key points and face key points occupies the same relative position in its respective image. For example, a corresponding three-dimensional key point and face key point may both be the left boundary point of an eye.
  • Then, adjust the positions of the 3D key points in the reference 3D face model according to the calculated error, so that, when projected to the 2D plane, the 3D key points of the adjusted reference 3D face model coincide as closely as possible with the face key points in the face image.
  • the adjustment of the positions of the three-dimensional key points can be realized by adjusting the coefficients used by the prior information.
  • At this time, the adjusted reference 3D face model can be considered the neutral facial expression base of the target object. It should be noted that in practical applications, other methods can also be used to construct the neutral facial expression base; for example, a neural network may be used, where the face image or the key points in the face image are input into the neural network and it outputs the corresponding neutral facial expression base.
  • the neutral facial expression base is processed to obtain the personalized facial expression base of the target object.
  • the prior information is also introduced when constructing the face personalized expression base.
  • In an embodiment, each basic expression corresponds to a piece of prior information, which represents the three-dimensional face model corresponding to that basic expression; the neutral expression likewise corresponds to a piece of prior information representing the three-dimensional face model of the neutral expression. The basic expressions and the neutral expression in the above prior information belong to the same face.
  • When constructing a personalized facial expression base, first calculate the transfer deformation variable between the prior information corresponding to the neutral expression and the prior information corresponding to a basic expression; that is, after transforming the prior information corresponding to the neutral expression by the transfer deformation variable, the basic expression can be obtained.
  • Then, the neutral facial expression base is converted according to the transfer deformation variable to obtain the personalized facial expression base of the target object under that basic expression. It can be understood that, according to the above method, the personalized facial expression base of the target object under each basic expression can be obtained.
  • Step 130: Construct a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of personalized facial expression bases.
  • In the embodiment, a weight coefficient is set for each personalized facial expression base, and linear weighting is then performed on the personalized facial expression bases and the neutral facial expression base in combination with the weight coefficients, so that a three-dimensional face model of the target object carrying an expression can be obtained.
  • The 3D face model is expressed as: B = B_0 + Σ_{i=1}^{n} β_i (B_i − B_0), where B represents the three-dimensional face model, B_0 represents the neutral facial expression base, B_i represents the i-th personalized facial expression base, 1 ≤ i ≤ n, n is the total number of personalized facial expression bases, and β_i represents the weight coefficient corresponding to B_i.
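  • As a minimal numpy sketch of this linear blend (array shapes and helper names are illustrative, not from the patent):

```python
import numpy as np

def blend_face_model(B0, bases, beta):
    """Linear blendshape combination: B = B0 + sum_i beta_i * (B_i - B0).

    B0:    (V, 3) neutral facial expression base vertices
    bases: (n, V, 3) personalized facial expression base vertices
    beta:  (n,) weight coefficients, each expected to lie in [0, 1]
    """
    delta = bases - B0[None, :, :]                 # per-base offsets B_i - B0
    return B0 + np.tensordot(beta, delta, axes=1)  # weighted sum of offsets
```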
  • Step 140: Determine the pose parameters of the three-dimensional face model when the three-dimensional face model is mapped to the face image, and the weight coefficients of each personalized facial expression base.
  • In the embodiment, the weight coefficients of each personalized facial expression base in the three-dimensional face model can be continuously adjusted so that the expression represented by the three-dimensional face model approaches the expression in the face image.
  • the pose parameters can also be understood as rigid transformation parameters.
  • the rigid transformation refers to changing the position, orientation and size of the three-dimensional face model without changing the shape.
  • the rigid transformation parameter refers to a parameter used when performing rigid transformation on the three-dimensional face model.
  • the rigid transformation parameter includes: a rigid rotation matrix, a translation vector, and a scaling factor. The rigid rotation matrix is used to change the orientation of the 3D face model, the translation vector is used to change the position of the 3D face model, and the scaling factor is used to change the size of the 3D face model.
  • the difference between the two-dimensional image and the face image when the three-dimensional face model is mapped to the two-dimensional plane is determined by constructing an error parameter formula.
  • In an embodiment, the error parameter formula is constructed from the coordinate differences between the two-dimensional image obtained when the three-dimensional face model is mapped to the two-dimensional plane and the face key points in the face image. It is understandable that the coordinates of the face key points corresponding to the 3D face model are determined by the weight coefficients and pose parameters.
  • Before solving, the weight coefficients and pose parameters can be considered unknown quantities. By continuously adjusting them, the coordinates of the face key points corresponding to the 3D face model come ever closer to the face key point coordinates in the face image, the error parameter becomes ever smaller, the expression represented by the 3D face model becomes ever more similar to the real expression in the face image, and the action of the 3D face model becomes ever more consistent with the head action of the target object in the face image. Specifically, when the calculated error parameter reaches the desired value, the currently used weight coefficients and pose parameters can be taken as the finally obtained weight coefficients and pose parameters.
  • Whether the error parameter reaches the desired value can be determined by setting a number of adjustments of the weight coefficients and pose parameters, that is, when the number of adjustments reaches a certain count, the error parameter is deemed to have reached the desired value. It is also possible to set a parameter threshold, that is, when the error parameter falls below the threshold, it is deemed to have reached the desired value. Generally speaking, if the error parameter has reached the desired value, the expression represented by the 3D face model can be considered sufficiently close to the real expression in the face image, and the action of the 3D face model sufficiently consistent with the head action of the target object in the face image.
  • Step 150: Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
  • the pose parameters and weight coefficients are sent to a remote device for video communication with the virtual image construction device.
  • a virtual image is stored in the remote device, and the virtual image may be a cartoon image, which may be a two-dimensional virtual image or a three-dimensional virtual image.
  • a three-dimensional virtual image is used as an example.
  • Storing the three-dimensional virtual image in the remote device specifically includes storing a neutral expression base and personalized expression bases of the three-dimensional virtual image, where each personalized expression base of the three-dimensional virtual image has the same expression as the corresponding personalized facial expression base.
  • In an embodiment, the user of the remote device can install an application program in the remote device; the application program can receive the pose parameters and weight coefficients sent by the virtual image construction device and generate a virtual image corresponding to the face image according to them. The remote device stores the virtual image when the application is installed.
  • the application program is upgraded or updated, the virtual image stored in the remote device can be updated at the same time.
  • In an embodiment, when the remote device generates the virtual image corresponding to the face image, it can render and display the preset three-dimensional virtual image through the graphics rendering framework of the Open Graphics Library (OpenGL).
  • Specifically, the personalized expression bases and neutral expression base of the three-dimensional virtual image are linearly weighted according to the weight coefficients to obtain a three-dimensional virtual image containing the expression, where the linear weighting method is the same as that used to construct the three-dimensional face model in step 130.
  • After generating the three-dimensional virtual image containing the expression, the graphics rendering framework performs the corresponding rigid transformation on it according to the pose parameters, and displays it once the rigid transformation is completed.
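  • For illustration, a minimal sketch of what the remote side computes before handing vertices to the rendering framework (names and shapes are assumptions):

```python
import numpy as np

def posed_avatar_vertices(A0, avatar_bases, beta, s, R, t):
    """Blend the stored avatar expression bases, then apply the rigid pose.

    A0:           (V, 3) avatar neutral expression base
    avatar_bases: (n, V, 3) avatar personalized expression bases
    beta:         (n,) weight coefficients received from the sender
    s, R, t:      scale (scalar), rotation (3, 3), translation (3,)
    """
    blended = A0 + np.tensordot(beta, avatar_bases - A0[None], axes=1)
    return s * blended @ R.T + t   # rigid transform applied after blending
```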
  • FIG. 2 is a schematic diagram of current frame image data provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a virtual image provided by an embodiment of the present application.
  • Afterwards, the next frame of image data can be obtained and used as the current frame image data, and the above process is repeated, so that the remote device displays a continuous virtual image.
  • In an embodiment, the virtual image construction device can also be provided with a function control for enabling the virtual image, and the function control can be realized by a physical button or a virtual button.
  • When the function control is triggered, the above method is executed so that the remote device displays the virtual image.
  • When the function control stops being triggered, the current frame image data only needs to be sent to the remote device so that the remote device displays the current frame image. In this way, the user can decide whether to display the real image based on his own needs, thereby improving the user experience.
  • In this embodiment, the technical means of acquiring the current frame image data containing the target object's face image, constructing the target object's neutral facial expression base and a plurality of personalized facial expression bases according to the current frame image data, building a 3D face model from the neutral expression base and the personalized expression bases, determining the weight coefficients and pose parameters when the 3D face model is mapped to the face image, and sending the pose parameters and weight coefficients to the remote device so that it displays the virtual image corresponding to the face image, solves the technical problems of information leakage and stuttering caused by the transmission of real face images in the related art.
  • The transmitted weight coefficients and pose parameters enable the remote device to display the corresponding virtual image, effectively protecting the privacy of the target object and preventing information leakage.
  • Moreover, the virtual image accurately follows the expressions and poses in the face image, ensuring the display quality on the remote device.
  • FIG. 4 is a flowchart of another virtual image construction method provided by an embodiment of the present application. This embodiment is refined on the basis of the above-mentioned embodiment. Referring to FIG. 4, the virtual image construction method specifically includes:
  • Step 210: Acquire current frame image data, where the current frame image data includes a face image of the target object.
  • Step 220: Construct a neutral facial expression base of the target object according to the current frame image data and preset prior information of the face model.
  • The prior information of the face model refers to the prior information used when constructing the neutral facial expression base of the target object. A reference three-dimensional face model can be constructed through the prior information of the face model; in this embodiment, the reference three-dimensional face model has a neutral expression.
  • The neutral facial expression base of the target object is obtained by fitting the face model parameters to the face image.
  • FIG. 5 is a schematic diagram of a reference three-dimensional face model provided by an embodiment of the present application
  • FIG. 6 is a schematic diagram of a face image provided by an embodiment of the present application
  • FIG. 7 is a schematic diagram of a three-dimensional face model of a target object provided by an embodiment of the present application.
  • The reference three-dimensional face model shown in FIG. 5 is constructed through the prior information of the face model. The reference 3D face model shown in FIG. 5 is then fitted to the face image shown in FIG. 6, and the three-dimensional face model of the target object shown in FIG. 7 can be obtained.
  • Since the three-dimensional face model shown in FIG. 7 has no expression, it can be used as the neutral facial expression base. It should be noted that FIG. 7 shows a side view of the three-dimensional face model.
  • In an embodiment, the prior information of the face model can be constructed based on the three-dimensional face data in the published BFM (Basel Face Model) database, and each piece of three-dimensional face data can be considered a three-dimensional face model.
  • The expression form of the model is not limited in the embodiment.
  • For example, Principal Component Analysis (PCA) is applied to 200 pieces of three-dimensional face data in the BFM database to obtain a bilinear model, where the bilinear model is constructed based on the 200 pieces of three-dimensional face data and is expressed as: M = MU + PC_id · α_id + PC_exp · α_exp.
  • M is the reference 3D face model. MU is the average coordinate data of the 200 pieces of 3D face data; MU contains 3h values in total, where h refers to the average number of point-cloud points of the 200 pieces of 3D face data, and each point contains the coordinates of the three axes x, y and z. A three-dimensional face can be constructed through MU.
  • PC_id is the face identity basis vector obtained from the 200 pieces of three-dimensional face data, reflected as the face identity superimposed onto MU; that is, the face identity of the reference 3D face model (for example, the neutral face shape when there is no expression) can be obtained by superimposing PC_id onto MU. PC_exp is the personalized expression basis vector obtained from the 200 pieces of 3D face data, reflected as the personalized expression superimposed onto MU; that is, the personalized expression of the reference 3D face model can be obtained by superimposing PC_exp onto MU. α_id is the coefficient corresponding to the face identity basis vector, and α_exp is the coefficient corresponding to the personalized expression basis vector. That is, PC_id and PC_exp are linearly weighted by α_id and α_exp, and the weighted results are fused into the average coordinate data of the 3D face data, so that the reference 3D face model is obtained. It can be understood that MU, PC_id and PC_exp constitute the prior information of the face model.
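  • A minimal sketch of assembling the reference model from this prior information, assuming the bases are stored as flat arrays (shapes and names are illustrative):

```python
import numpy as np

def reference_face_model(MU, PC_id, PC_exp, alpha_id, alpha_exp):
    """M = MU + PC_id @ alpha_id + PC_exp @ alpha_exp.

    MU:       (3h,) mean coordinates of the 3D face data
    PC_id:    (3h, k_id) face identity basis vectors
    PC_exp:   (3h, k_exp) personalized expression basis vectors
    alpha_id, alpha_exp: coefficient vectors for the two bases
    Returns the reference 3D face model as (h, 3) vertex coordinates.
    """
    M = MU + PC_id @ alpha_id + PC_exp @ alpha_exp
    return M.reshape(-1, 3)
```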
  • Afterwards, the reference 3D face model can be mapped to the 2D plane to determine the difference between the resulting 2D image and the face image, and the coefficients α_id and α_exp applied to the prior information of the face model are then adjusted according to the difference, so that the adjusted reference 3D face model, when mapped to the 2D plane, is highly similar to or the same as the face image.
  • In step 220, the difference between the two-dimensional image corresponding to the reference three-dimensional face model and the face image is determined specifically through the face key points. In an embodiment, step 220 includes steps 221-223:
  • Step 221: Detect the face image in the current frame image data.
  • In an embodiment, a face recognition algorithm is used to detect the location area where the face is located in the current frame image data, and the location area is then extracted to obtain the face image.
  • Step 222: Perform face key point positioning on the face image to obtain a key point coordinate array.
  • Specifically, face key points are detected in the face image, the coordinates of the detected face key points are obtained, and the coordinates of the face key points are formed into a key point coordinate array.
  • FIG. 8 is a schematic diagram of face key points provided by the embodiment of the present application.
  • a total of 68 face key points are detected in the current face image, and each face key point has corresponding face semantic information.
  • the coordinates of the 68 face key points in the face image are arranged in a certain order to form a key point coordinate array.
  • the method further includes: performing a filtering operation and a smoothing operation on the key point coordinate array.
  • The filtering operation refers to adjusting the key point coordinate array of the current frame in combination with the key point coordinate array of the previous frame of image data, ensuring a smooth transition between the two, so that the key point coordinate arrays of successive frames during video communication all vary smoothly.
  • In an embodiment, the filtering operation is implemented by means of Kalman filtering, where, when Kalman filtering is performed on the key point coordinate data of the current frame, the key point coordinate array of the current frame and that of the previous frame are weighted, and the weighted result is used to update the key point coordinate array of the current frame.
  • The smoothing operation is used to avoid outlier face key points, so that the coordinate curve through adjacent face key points is smooth. In an embodiment, the PCA algorithm is used to perform the smoothing operation on the filtered key point coordinate array to update it.
  • The key point coordinate array used subsequently is the array after the filtering and smoothing operations.
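  • A simplified stand-in for the filtering step described above, where a fixed blending gain takes the place of the full Kalman covariance update (the gain value is an assumption):

```python
import numpy as np

def filter_keypoints(curr, prev, gain=0.6):
    """Blend the current keypoint array with the previous frame's array.

    curr, prev: (68, 2) keypoint coordinate arrays (prev is None on the
    first frame); gain plays the role of the Kalman gain.
    """
    if prev is None:
        return curr
    return gain * curr + (1.0 - gain) * prev
```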
  • Step 223: Determine the neutral facial expression base of the target object according to the face image, the key point coordinate array and the preset prior information of the face model.
  • In an embodiment, the fitting is performed by minimizing an energy constraint between the reference 3D face model and the face image: E_lan(p) = Σ_j ω_conf,j ‖ f_j − Π Φ(v_j) ‖², where E_lan(p) represents the energy constraint between the reference 3D face model and the face image, and p represents the parameters used by the reference 3D face model, including the coefficient α_id corresponding to the face identity basis vector, the coefficient α_exp corresponding to the personalized expression basis vector, the weak perspective projection matrix Π and the rigid transformation matrix Φ. The weak perspective projection is mainly used to project 3D spatial point information (such as the reference 3D face model) onto the 2D imaging plane; the projection matrix refers to the matrix used when projecting the reference 3D face model to the 2D plane, and the rigid transformation matrix may include a rigid rotation matrix, a translation vector and a scaling factor. ω_conf,j represents the detection confidence of the j-th face key point in the face image; f_j represents the coordinates of the j-th face key point in the face image; F represents the key point coordinate array; and v_j represents the j-th three-dimensional key point of the reference 3D face model.
  • When the reference three-dimensional face model is mapped to the two-dimensional plane, the closer the coordinates of each three-dimensional key point are to the corresponding face key point in the face image, the smaller E_lan(p) is, and the closer the reference three-dimensional face model is to the target object's face.
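  • A minimal sketch of evaluating this landmark energy under a weak-perspective camera (the exact projection convention is an assumption consistent with the definitions above):

```python
import numpy as np

def landmark_energy(f, conf, v, s, R, t):
    """E_lan = sum_j conf_j * ||f_j - (s * (R @ v_j)_xy + t)||^2.

    f:    (m, 2) detected 2D face key points
    conf: (m,) per-keypoint detection confidences
    v:    (m, 3) corresponding 3D key points of the reference model
    s, R, t: weak-perspective scale, 3x3 rotation, 2D translation
    """
    proj = s * (v @ R.T)[:, :2] + t          # weak perspective: drop z
    resid = f - proj
    return float(np.sum(conf * np.sum(resid ** 2, axis=1)))
```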
  • Step 230: Determine each personalized facial expression base of the target object according to the neutral facial expression base, a preset reference neutral expression base and reference personalized expression bases, where each reference personalized expression base corresponds to one personalized facial expression base.
  • the reference neutral expression base is a preset expression base representing neutral expressions.
  • the reference personalized expression base is an expression base obtained by adding a preset basic expression on the basis of the reference neutral expression base.
  • Each reference personalized expression base has a corresponding physical meaning.
  • In an embodiment, the Facial Action Coding System (FACS) is used to define each facial muscle action as a different action unit (AU) value or action descriptor (AD) value, that is, to classify the basic expressions by muscle action. For example, the AU value corresponding to "inner eyebrow raised upward" is recorded as AU1.
  • In an embodiment, each AU value also includes a refinement value, which indicates the movement range of the muscle. For example, if the AU value including the refinement value is AU1(0.2), the current basic expression is the inner eyebrow pulled up with a degree of 0.2.
  • Similarly, the AU value corresponding to "eyes closed" is denoted AU43, where AU43(0) indicates that the eyes are normally open and AU43(1) indicates that the eyes are completely closed. Referring to the expression refinement partition shown in FIG. 9, from left to right are the refinement values corresponding to the degrees of eye closure during the process from fully open to fully closed.
  • In an embodiment, 26 basic expressions are defined according to muscle movements, and each basic expression corresponds to a reference personalized expression base. Each reference personalized expression base, its corresponding basic expression and its AU value are shown in the following table:
    Blendshape | Custom expression            | FACS definition
    0          | left eye closed              | AU43
    1          | right eye closed             | AU43
    2          | left eye widened             | AU5
    3          | right eye widened            | AU5
    4          | frown                        | AU4
    5          | frown                        | AU4
    6          | raised eyebrows              | AU1
    7          | raise left eyebrow           | AU2
    8          | raise right eyebrow          | AU2
    9          | open mouth                   | AU26
    10         | chin left                    | AD30
    11         | chin right                   | AD30
    12         | left corner of mouth up      | AU12
    13         | right corner of mouth up     | AU12
    14         | left mouth corner abduction  | AU20
    15         | right mouth corner abduction | AU20
    16         | upper lip adducted           | AU28
    17         | lower lip adducted           | AU28
    18         | lower lip outward            | AD29
    19         | upper lip up                 | AU10
    20         | lower lip down               | AU16
    21         | left corner of mouth down    | AU17
    22         | right corner of mouth down   | AU17
    23         | pouting                      | AU18
    24         | cheeks bulge                 | AD34
    25         | wrinkled nose                | AU9
  • Here, the Blendshape column numbers the 26 personalized expression bases (0-25), the custom expression is the basic expression corresponding to each expression base, and the FACS definition is the AU or AD value corresponding to each personalized expression base.
  • At this time, the deformation information required when the reference neutral expression base is transformed into a reference personalized expression base can be determined; this deformation information can be regarded as the transfer deformation variables from the reference neutral expression base to the reference personalized expression base. In an embodiment, the deformation information is obtained by means of three-dimensional mesh deformation.
  • step 230 includes steps 231-232:
  • Step 231: Determine deformation information according to the reference neutral expression base and the reference personalized expression base.
  • In an embodiment, after triangulating the face key points in the reference neutral expression base according to their arrangement order using the Delaunay triangulation algorithm, the reference neutral expression base can be divided into a plurality of triangular patches, where the three vertices of each patch are three face key points forming a triangle, and the triangular patches together form a three-dimensional mesh representing the reference neutral expression base.
  • In the same way, the reference personalized expression base can be divided into multiple triangular patches, where the three vertices of each patch are three face key points forming a triangle, and the triangular patches together form a three-dimensional mesh representing the reference personalized expression base.
  • Each triangular patch in the reference personalized expression base corresponds one-to-one with a triangular patch in the reference neutral expression base. According to this correspondence, the deformation by which a triangular patch in the reference neutral expression base becomes the corresponding triangular patch in the reference personalized expression base can be determined. The deformation information represents the transfer deformation variables (rotation matrix, translation vector, scaling factor, etc.) applied to a triangular patch in the reference neutral expression base so that the deformed patch is the same as the corresponding patch in the reference personalized expression base. Each triangular patch corresponds to one piece of deformation information.
  • The deformation information of all triangular patches constitutes the deformation information from the reference neutral expression base to the current reference personalized expression base. Understandably, each reference personalized expression base has corresponding deformation information.
  • Step 232: Determine the personalized facial expression base of the target object according to the deformation information and the neutral facial expression base.
  • In an embodiment, three-dimensional mesh registration is first performed between the face neutral expression base and the reference neutral expression base: a three-dimensional spatial transformation (such as scaling, rotation and translation) is applied to each triangular patch in the reference neutral expression base, so that the transformed triangular patches correspond one-to-one with the triangular patches in the face neutral expression base.
  • The reference neutral expression base after the three-dimensional spatial transformation can be called the deformed reference neutral expression base. The three-dimensional coordinates of each triangular patch in the deformed reference neutral expression base are highly similar or identical to those of the corresponding triangular patch in the face neutral expression base.
  • In an embodiment, smoothness constraints and key point constraints are applied to the three-dimensional coordinates of the key points of each patch in the deformed reference neutral expression base: 3D smoothing can be used for the smoothness constraints and the PCA algorithm for the key point constraints, after which the deformed reference neutral expression base matches the face neutral expression base.
  • Afterwards, the correspondence between the triangular patches in the deformed reference neutral expression base and those in the face neutral expression base can be determined through a k-d tree, where a k-d tree can be understood as a data structure that organizes points in a k-dimensional Euclidean space.
  • Then, for each triangular patch in the reference neutral expression base, the deformation information used to transform it to a certain reference personalized expression base is applied to the corresponding triangular patch in the face neutral expression base, that is, the triangular patch is deformed, and the deformed triangular patch serves as a triangular patch of the face personalized expression base. After every triangular patch in the face neutral expression base has been processed, the face personalized expression base corresponding to that reference personalized expression base is obtained. By processing each reference personalized expression base in the above manner, every reference personalized expression base has a corresponding face personalized expression base.
  • The above processing can be calculated by a deformation formula, where the deformation formula of one triangular patch is expressed as: V_T = S · V_S. Here S is the transfer deformation variable of the patch determined in step 231; V_S represents the vertex-related information of the corresponding triangular patch in the reference neutral expression base (which, after registration, coincides with the face neutral expression base); and V_T represents the vertex-related information of the corresponding triangular patch in the face personalized expression base, with V_T = [v_T2 − v_T1, v_T3 − v_T1, v_T4 − v_T1], where v_T1, v_T2 and v_T3 are the three vertices of the triangular patch and v_T4 is an auxiliary point offset from the patch along its normal vector (V_S is constructed in the same way).
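  • A per-triangle sketch of this transfer in the style of standard mesh deformation transfer, which the V_T/V_S notation above matches; a full implementation would additionally enforce consistency across neighboring patches (names are illustrative):

```python
import numpy as np

def span_matrix(v1, v2, v3):
    """[v2 - v1, v3 - v1, v4 - v1] for one triangle, where the fourth
    vertex v4 is offset from v1 along the scaled normal vector."""
    e1, e2 = v2 - v1, v3 - v1
    n = np.cross(e1, e2)
    v4 = v1 + n / np.sqrt(np.linalg.norm(n))
    return np.column_stack([e1, e2, v4 - v1])

def transfer_triangle(ref_neutral, ref_expr, face_neutral):
    """Compute the transfer deformation S from the reference pair and
    apply it to the corresponding face triangle: V_T = S @ V_S.

    Each argument is a (3, 3) array holding one triangle's vertices
    (one vertex per row). Returns the deformed span matrix V_T.
    """
    S = span_matrix(*ref_expr) @ np.linalg.inv(span_matrix(*ref_neutral))
    return S @ span_matrix(*face_neutral)
```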
  • FIG. 10 is a schematic diagram of expression transfer provided by an embodiment of the present application.
  • In FIG. 10, the first column of the first row is the reference neutral expression base, and the second to fourth columns of the first row are three reference personalized expression bases whose corresponding basic expressions are right eye closed, mouth open, and pouting.
  • The first column of the second row is the face neutral expression base. Deformation information is determined from the reference neutral expression base and each reference personalized expression base, and the second to fourth columns of the second row in FIG. 10 are the face personalized expression bases obtained from the reference personalized expression bases in the corresponding columns of the first row. The basic expressions corresponding to the three face personalized expression bases are likewise right eye closed, mouth open, and pouting; that is, the basic expressions are transferred from the reference personalized expression bases to the face personalized expression bases.
  • Step 240: Construct a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of personalized facial expression bases.
  • Step 250: Construct an error parameter formula for when the three-dimensional face model is mapped to the face image.
  • the error parameter formula can also be understood as an energy function.
  • the construction rule of the error parameter formula can be set according to the actual situation.
  • In an embodiment, the error parameter formula is constructed by minimizing the residual. In this case, the error parameter formula is: E = Σ_k ‖ s R B_k + t − f_k ‖², with B = B_0 + Σ_{i=1}^{n} β_i (B_i − B_0), where E represents the error parameter; B represents the three-dimensional face model; B_0 represents the neutral facial expression base of the target object; B_i represents the i-th personalized facial expression base of the target object, 1 ≤ i ≤ n, and n is the total number of personalized facial expression bases; β_i represents the weight coefficient corresponding to B_i; B_k represents the k-th face key point in the three-dimensional face model; f_k represents the k-th face key point in the face image; s represents the scaling factor when the 3D face model is mapped to the face image; R represents the rigid rotation matrix when the 3D face model is mapped to the face image; and t represents the translation vector when the 3D face model is mapped to the face image. s, R and t are the pose parameters.
  • In an embodiment, the error parameter formula is constructed by means of the linear least squares method, that is, the above minimized residual is converted into a form solvable by linear least squares. Using the least squares method, the unknown data (in the embodiment, β_i, s, R and t) can easily be obtained while minimizing the sum of squared errors between the obtained data and the actual data.
  • In this case, the error parameter formula can be expressed as: E'_exp = ‖ Aβ − b ‖², where E'_exp represents the error parameter; A = sRΔB, with ΔB = [B_1 − B_0, B_2 − B_0, …, B_n − B_0], B_0 representing the neutral facial expression base and B_i the i-th personalized facial expression base, 1 ≤ i ≤ n, n the total number of personalized facial expression bases; β = (β_1, β_2, …, β_n) represents the weight coefficient vector, β_i being the weight coefficient of the i-th personalized facial expression base; s represents the scaling factor and R the rigid rotation matrix when the 3D face model is mapped to the face image; and b = f − t − sRB_0, where f represents the face key points in the face image and t the translation vector when the 3D face model is mapped to the face image.
  • Aβ reflects the difference between the personalized facial expression bases and the neutral facial expression base in the two-dimensional plane, and b reflects the difference between the neutral facial expression base and the face image in the two-dimensional plane. It can be understood that the closer the 3D face model is to the face image, the smaller the difference between Aβ and b.
  • When β is solved by the linear least squares method, the error parameter formula is a linear equation system, and the solution is β = (AᵀA)⁻¹ Aᵀ b.
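  • A minimal sketch of this unconstrained solve; np.linalg.lstsq stands in for the explicit inverse for numerical stability:

```python
import numpy as np

def solve_weights_unconstrained(A, b):
    """Closed-form least squares: beta = (A^T A)^{-1} A^T b.

    A: stacked matrix s * R * dB over all face key points
    b: stacked residual vector f - t - s * R * B0
    """
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return beta  # entries may be negative, motivating the constraints below
```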
  • However, the value range of β solved in this way includes both positive and negative numbers, and negative numbers are meaningless for the 3D face model; that is, the weight coefficients cannot be negative.
  • Moreover, each time β is solved, it is solved from the coordinate differences of the face key points when the 3D face model is mapped; if the detection of the face key points in the face image is wrong, the accuracy of the calculation result is affected. For example, the mouth in the face image may be closed, but due to a detection error there is a certain distance between the face key points on the upper lip and the lower lip (two key points that should nearly or completely coincide), so that the mouth may be recognized as open in the subsequent calculation. Therefore, in the embodiment, quadratic programming and dynamic constraints are applied to β. The dynamic constraint on β can be expressed as Cβ ≤ d, that is, the error parameter formula becomes: E_exp = ‖ Aβ − b ‖², subject to Cβ ≤ d.
  • C represents the constraint parameter of β, and d represents the value range of β; C and d together are the constraints on β.
  • In an embodiment, C = [eye(n); −eye(n)], where eye denotes the identity matrix and eye(n) denotes the identity matrix corresponding to the n personalized facial expression bases. The specific value of d can be set according to the actual situation; for example, if β should be in the range 0.5-1, d can be set accordingly between 0.5 and 1.
  • With d = [ones(n); −zero(n)], ones(n) represents the upper bounds of the n weight coefficients, containing n values, each corresponding to one weight coefficient, and zero(n) represents the lower bounds of the n weight coefficients, likewise containing n values, each corresponding to one weight coefficient. In an embodiment, each weight coefficient should lie between 0 and 1; therefore, ones(n) can be n ones and zero(n) can be n zeros.
  • In this way, the value range of each weight coefficient is fixed between 0 and 1, preventing the occurrence of negative numbers.
  • Further, d can be set as d = [p_n; −q_n], where p_n represents the upper bounds of the n weight coefficients, q_n represents the lower bounds of the n weight coefficients, and p_n and q_n are the value constraint matrices.
  • p_n and q_n are determined according to the relative distances of the face key points in the face image, where a relative distance refers to the pixel distance between face key points in the face image.
  • Each personalized facial expression base corresponds to one p value and one q value; the n p values form p_n and the n q values form q_n. In this way, the weight coefficients corresponding to different personalized facial expression bases can have different value ranges.
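  • Because C and d encode per-coefficient box bounds q_n ≤ β ≤ p_n, the quadratic program can be solved with a bounded least-squares routine; the sketch below uses scipy's lsq_linear as one plausible choice (an assumption about tooling, not the patent's prescribed solver):

```python
import numpy as np
from scipy.optimize import lsq_linear

def solve_weights_bounded(A, b, q_n, p_n):
    """min ||A @ beta - b||^2 subject to q_n <= beta <= p_n.

    Equivalent to C @ beta <= d with C = [eye(n); -eye(n)] and
    d = [p_n; -q_n]. q_n, p_n: (n,) lower and upper bounds.
    """
    result = lsq_linear(A, b, bounds=(q_n, p_n))
    return result.x
```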
  • FIG. 11 is a schematic diagram of face key point selection according to an embodiment of the present application.
  • Referring to FIG. 11, there are six face key points corresponding to the left eye, among which face key points P1 and P2 are a group located on the upper eyelid and the lower eyelid of the left eye respectively, and face key points P3 and P4 are another group located on the upper eyelid and the lower eyelid respectively.
  • The distance used to determine whether the left eye is closed can be regarded as the relative distance of the left-eye face key points, computed as L = ‖p_1 − p_2‖ + ‖p_3 − p_4‖, where L represents the relative distance of the face key points (understandably, L is a pixel distance), p_1 and p_2 represent the two-dimensional coordinates of face key points P1 and P2 in the face image, and p_3 and p_4 represent the two-dimensional coordinates of face key points P3 and P4 in the face image.
  • If the left eye is closed, the weight coefficient corresponding to the personalized facial expression base representing "left eye closed" should be larger; therefore, a larger value range can be set for this weight coefficient, for example 0.9-1. In this case, the p value corresponding to "left eye closed" in p_n can be 1, and the q value in q_n can be set to 0.9, so that the value range of the corresponding weight coefficient in β lies between 0.9 and 1.
  • Similarly, whether the right eye is closed can be determined by calculating the relative distance of the two groups of right-eye face key points (the face key points in the box in FIG. 11), and a reasonable value range can then be set for the weight coefficient of the personalized facial expression base corresponding to "right eye closed".
  • In an embodiment, the calculation method of the relative distance of the face key points corresponding to each personalized facial expression base, the error distance, the p and q values used when the relative distance does not exceed the error distance, and the p and q values used when it does, are all predetermined. When constructing the error parameter formula, the p and q values are determined by calculating the relative distances of the face key points, which in turn determine the value ranges of the weight coefficients. The relative distances of the face key points and the corresponding error distances can thus be regarded as prior information on the weight coefficients. In this way, errors caused by incorrect detection of face key points can be tolerated during key point detection, ensuring the accuracy of subsequent processing.
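  • A sketch of deriving the (q, p) bounds for the "left eye closed" weight from the eyelid key points; the error distance and bound values follow the example above, while the threshold logic itself is an assumption:

```python
import numpy as np

def left_eye_closed_bounds(p1, p2, p3, p4, err_dist=2.0):
    """Return (q, p) bounds for the 'left eye closed' weight coefficient.

    p1..p4: 2D pixel coordinates of the two upper/lower eyelid key point
    pairs. If the relative distance L indicates the eye is closed (within
    the tolerated detection error), constrain the weight to [0.9, 1.0];
    otherwise leave the full default range [0.0, 1.0].
    """
    L = np.linalg.norm(p1 - p2) + np.linalg.norm(p3 - p4)
    return (0.9, 1.0) if L <= err_dist else (0.0, 1.0)
```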
  • In one embodiment, the constructed error parameter formula takes the form E_exp = ‖Aα − b‖², subject to Cα ≤ d, where E_exp represents the error parameter; α = (α_1, α_2, …, α_n) represents the weight coefficient vector, n represents the total number of face personalized expression bases, and α_i (1 ≤ i ≤ n) represents the weight coefficient of the i-th face personalized expression base; A = sRΔB, where s is the scaling factor when the 3D face model is mapped to the face image and R is the rigid rotation matrix when the 3D face model is mapped to the face image; ΔB = [B_1 − B_0, B_2 − B_0, …, B_n − B_0], where B_0 represents the neutral facial expression base and B_i represents the i-th face personalized expression base; b = f − t − sRB_0, where f represents the face key points in the face image and t represents the translation vector when the 3D face model is mapped to the face image; s, R and t are the pose parameters; C represents the constraint parameter of α, and d represents the value range of α.
  • the manner of determining C and d may refer to the foregoing embodiment.
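Under the reconstruction above, this is a bound-constrained linear least-squares problem. The following sketch solves it with SciPy's lsq_linear; the flattening of A and b over the tracked key points is an assumed data layout, not something fixed by this description.

```python
import numpy as np
from scipy.optimize import lsq_linear

def solve_weight_coefficients(A, b, q_n, p_n):
    """Minimize ||A @ alpha - b||^2 subject to q_n <= alpha <= p_n.

    A : (2k, n) matrix s*R*dB stacked over k face key points.
    b : (2k,) vector f - t - s*R*B0 stacked the same way.
    """
    result = lsq_linear(A, b, bounds=(np.asarray(q_n), np.asarray(p_n)))
    return result.x  # the weight coefficient vector alpha
```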
  • In another embodiment, the error parameter formula can also be constructed by the L1 regular optimization method. To ensure that the weight coefficients stay within the correct value ranges, the L1 regularity can be combined with gradient projection when constructing the error parameter formula; that is, each time the weight coefficients are calculated using the L1 regularity, the gradient step of the weight coefficients is projected into the value range of the weight coefficients, ensuring that the finally calculated weight coefficients are within the corresponding value ranges.
  • In this case, the constructed error parameter formula augments the data term ‖Aα − b‖² with an L1 regular term on the weight coefficients and a term involving the weight coefficients obtained for the previous frame, where the L1 regular coefficient can be set according to the actual situation, α_j is the weight coefficient of the j-th face personalized expression base, α_k is the weight coefficient of the k-th face personalized expression base, n is the total number of face personalized expression bases, m is the total number of …, and the weight coefficient of the j-th face personalized expression base obtained when processing the previous frame of image data also enters the formula. According to this formula, the weight coefficient of each face personalized expression base can be calculated, and the pose parameters can then be calculated according to the weight coefficients. In the subsequent calculation process, the error parameter formula described above is used.
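A minimal sketch of the L1-plus-gradient-projection idea follows, assuming a squared data term, a subgradient for the L1 term, and a fixed step size; the actual regular coefficient, step size, and iteration count are not specified in this description.

```python
import numpy as np

def solve_weights_l1(A, b, q_n, p_n, lam=0.1, step=1e-3, iters=500):
    """Projected subgradient descent for
        min ||A @ alpha - b||^2 + lam * sum(|alpha_j|),
    projecting every gradient step back into the box [q_n, p_n]."""
    alpha = np.clip(np.zeros(A.shape[1]), q_n, p_n)  # feasible start
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ alpha - b) + lam * np.sign(alpha)
        alpha = np.clip(alpha - step * grad, q_n, p_n)  # gradient projection
    return alpha
```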
  • Step 260 Determine, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weight coefficients of each face personalized expression base when the error parameter is the smallest.
  • The unknowns in the error parameter formula include the pose parameters and the weight coefficients. Therefore, the pose parameters and weight coefficients used in the error parameter formula when the error parameter is the smallest can be determined through the error parameter formula, and these are then taken as the finally calculated pose parameters and weight coefficients.
  • The calculation may be performed in an alternate iterative manner. For example, initialization parameters are first set for the weight coefficients and substituted into the error parameter formula to fix the weight coefficients, and the calculation is performed to determine the value of the pose parameters when the error parameter is the smallest in the current calculation process. After that, the calculated value of the pose parameters is substituted into the error parameter formula again to fix the pose parameters, and the calculation is performed to determine the value of the weight coefficients when the error parameter is the smallest in the current calculation process.
  • step 260 includes steps 261-267:
  • Step 261 Obtain the initialization weight coefficients of each face personalized expression base, and use the initialization weight coefficients as the current weight coefficients.
  • the initialization weight coefficient refers to a preset weight coefficient, that is, a weight coefficient is preset for each individual face expression base.
  • the specific value of the initialization weight coefficient can be set according to the actual situation. For example, according to the value range of the weight coefficient of the face personalized expression base, a value boundary is selected as the initialization weight coefficient of the face personalized expression base.
  • For ease of description, the currently used weight coefficients are recorded as the current weight coefficients.
  • Step 262 Substitute the current weight coefficient into the error parameter formula, and calculate the candidate pose parameters of the three-dimensional face model when the error parameter is the smallest.
  • the current weight coefficient is substituted into the error parameter formula, so that the weight coefficient in the error parameter formula is a fixed value (the value of the current weight coefficient).
  • the unknowns are only the pose parameters.
  • the calculation is performed according to the error parameter formula to determine the specific value of the pose parameter when the error parameter is the smallest in the current calculation process.
  • the pose parameters obtained by this calculation are recorded as candidate pose parameters.
  • the candidate pose parameters can be understood as intermediate values, and the purpose of calculating the candidate pose parameters is to obtain the final pose parameters.
  • Step 263 Substitute the candidate pose parameters into the error parameter formula, and calculate the candidate weight coefficients of the individualized expression bases for each face when the error parameters are the smallest.
  • The currently calculated candidate pose parameters are substituted into the error parameter formula, so that the pose parameters in the error parameter formula are fixed values.
  • At this time, the unknowns in the error parameter formula are only the weight coefficients.
  • the calculation is performed according to the error parameter formula to determine the specific value of the weight coefficient when the error parameter is the smallest in the current calculation process.
  • the weight coefficient obtained by this calculation is recorded as the candidate weight coefficient.
  • the candidate weight coefficient can be understood as an intermediate value, and the purpose of calculating the candidate weight coefficient is to obtain the final weight coefficient.
  • Step 264 update the current number of iterations.
  • An iterative calculation process refers to the process of obtaining candidate pose parameters and candidate weight coefficients after substituting the current weight coefficients into the error parameter formula. After the candidate pose parameters and the candidate weight coefficients are obtained, one iteration is determined to be complete and the number of iterations is updated, that is, the current number of iterations is increased by 1. It can be understood that after each candidate weight coefficient is obtained, the number of iterations is incremented by 1, and the candidate weight coefficients and candidate pose parameters calculated by the latest iteration are used as the current, and ultimately the final, candidate weight coefficients and candidate pose parameters.
  • Step 265 Determine whether the number of iterations reaches the number threshold, and when the number of iterations does not reach the number threshold, perform step 266. When the number of iterations reaches the number threshold, step 267 is executed.
  • the number of times threshold is used to confirm whether to stop the iterative calculation.
  • The number threshold may be set in combination with the actual situation; for example, an appropriate number threshold may be determined in combination with historical experience data. In this embodiment, the number threshold is 5. Exemplarily, after updating the number of iterations, it is determined whether the current number of iterations reaches the number threshold; if so, the iterative calculation is stopped and step 267 is executed, and if not, the iterative calculation continues and step 266 is executed.
  • Step 266 Take the candidate weight coefficient as the current weight coefficient, and return to step 262.
  • the candidate weight coefficient obtained by this iterative calculation is used as the current weight coefficient, and the process returns to step 262 to start a new iterative calculation.
  • Step 267 Use the finally obtained candidate pose parameters as the pose parameters of the three-dimensional face model, and use the finally obtained candidate weight coefficients as the weight coefficients of the face personalized expression base.
  • the candidate pose parameters and the candidate weight coefficients finally obtained refer to the candidate pose parameters and the candidate weight coefficients calculated by the latest iteration when the number of iterations reaches the number threshold.
  • the iterative calculation is stopped, and the finally obtained candidate pose parameters and candidate weight coefficients are used as the pose parameters of the final 3D face model and the weight coefficients of the face personalized expression base.
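Steps 261-267 can be summarized by the following sketch, in which solve_pose and solve_weights are placeholders for the two fixed-variable minimizations of the error parameter formula (for example, the bound-constrained least-squares solve shown earlier for the weights):

```python
def alternating_fit(solve_pose, solve_weights, alpha_init, times_threshold=5):
    """Alternating iteration of steps 261-267.

    solve_pose(alpha): candidate pose (s, R, t) minimizing the error
        parameter with the weight coefficients fixed (step 262).
    solve_weights(pose): candidate weight coefficients minimizing the
        error parameter with the pose parameters fixed (step 263).
    """
    alpha = alpha_init                # step 261: initialization weights
    pose = None
    for _ in range(times_threshold):  # steps 264-265: count the iterations
        pose = solve_pose(alpha)      # step 262: candidate pose parameters
        alpha = solve_weights(pose)   # step 263: candidate weight coefficients
    return pose, alpha                # step 267: final pose and weights
```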
  • Step 270 Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
  • In summary, the current frame image data including the face image of the target object is acquired; the neutral facial expression base of the target object is constructed according to the current frame image data and the preset prior information of the face model; the face personalized expression bases are then obtained according to the neutral facial expression base, the reference neutral expression base and the reference personalized expression bases; the 3D face model is constructed according to the face personalized expression bases and the neutral expression base, and the error parameter formula of the 3D face model mapped to the face image is constructed.
  • Then, according to the error parameter formula, the weight coefficients of the face personalized expression bases and the pose parameters of the 3D face model when the error parameter is the smallest are determined, and the weight coefficients and pose parameters are sent to the remote device.
  • each basic expression corresponds to a face personalized expression base, which makes the expressions contained in the 3D face model more abundant, thereby ensuring that the obtained pose parameters and weight coefficients are close to the real face image.
  • Moreover, the basic expressions defined by refining FACS mainly split expressions into left and right symmetrical parts, so that even when the expression in the face image is asymmetrical it can be effectively captured and driven, and the obtained pose parameters and weight coefficients remain close to the real face image.
  • the error parameter formula can be converted into a linear solution formula, which simplifies the calculation process.
  • FIG. 12 is a flowchart of still another virtual image construction method provided by an embodiment of the present application.
  • the virtual image construction method specifically includes:
  • Step 310 Acquire current frame image data, where the current frame image data includes a face image of the target object.
  • Step 320 Construct a neutral facial expression base and a plurality of face personalized expression bases of the target object according to the current frame image data.
  • Step 330 Construct a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of face personalized expression bases.
  • Step 340 Construct an error parameter formula when the three-dimensional face model is mapped to the face image.
  • The error parameter formula adopted is E_exp = ‖Aα − b‖², subject to Cα ≤ d, where E_exp represents the error parameter; α = (α_1, α_2, …, α_n) represents the weight coefficient vector, n represents the total number of face personalized expression bases, and α_i (1 ≤ i ≤ n) represents the weight coefficient of the i-th face personalized expression base; A = sRΔB, where s is the scaling factor when the 3D face model is mapped to the face image and R is the rigid rotation matrix when the 3D face model is mapped to the face image; ΔB = [B_1 − B_0, B_2 − B_0, …, B_n − B_0], where B_0 represents the neutral facial expression base and B_i represents the i-th face personalized expression base; b = f − t − sRB_0, where f represents the face key points in the face image and t represents the translation vector when the 3D face model is mapped to the face image; s, R and t are the pose parameters; C represents the constraint parameter of α, and d represents the value range of α.
  • In one case, ones(n) represents the upper bound of the n weight coefficients and zero(n) represents the lower bound of the n weight coefficients; in another case, p_n and q_n are the value constraint matrices, and p_n and q_n are determined according to the relative distances of the face key points in the face image.
  • For example, when the left eye in the face image is closed, the weight coefficient corresponding to the face personalized expression base representing the closed left eye should be relatively large; therefore, a larger value range such as 0.9-1 can be set for it: the p value corresponding to the closed left eye in p_n can be set to 1 and the corresponding q value in q_n to 0.9, so that the weight coefficient of the face personalized expression base representing the closed left eye in α ranges between 0.9 and 1.
  • Taking the mouth as an example, after the relative distance of the mouth face key points in the face image is calculated, if the relative distance does not exceed the error distance (for example, L ≤ 3), the mouth is considered closed, the p value is set to 0.1 and the q value to 0, so that the value range of the weight coefficient of the face personalized expression base representing an open mouth is between 0 and 0.1. If the relative distance of the face key points exceeds the error distance (for example, L > 3), the p value is set to 1 and the q value to 0, so that the value range of the weight coefficient of the face personalized expression base representing an open mouth is between 0 and 1. In the above manner, the relative distances of the face key points and the corresponding error distances are used as prior information on the weight coefficients; errors caused by incorrect detection of face key points can thus be tolerated during key point detection, and the accuracy of the subsequent processing is ensured.
  • Step 350 searching for mutually exclusive expression bases in the personalized expression bases of each face.
  • For example, FIG. 13 is a schematic diagram of a mutually exclusive expression base provided by an embodiment of the present application. Referring to FIG. 13, from the reader's viewpoint, the lips and chin in the left face personalized expression base move to the left, while the lips and chin in the right face personalized expression base move to the right; a human face can make only one of these expressions at a time, not both simultaneously. For another example, FIG. 14 is a schematic diagram of another mutually exclusive expression base provided by an embodiment of the present application. In FIG. 14, the expression of the face personalized expression base on the left is an open mouth, and the expression of the face personalized expression base on the right is puffed-out cheeks; a human face cannot puff out its cheeks while opening its mouth, so the two can be regarded as mutually exclusive expression bases.
  • The expressions that cannot appear at the same time in mutually exclusive expression bases are not limited to the basic expressions corresponding to individual face personalized expression bases; superimposed expressions are also included, and when two superimposed expressions cannot appear simultaneously, the corresponding sets of face personalized expression bases are likewise mutually exclusive expression bases. For example, one superimposed expression is wrinkling the nose while frowning the left eyebrow, and another superimposed expression is raising the eyebrows while lifting the left eyebrow tail; the two superimposed expressions cannot appear on a human face at the same time, so the face personalized expression bases for wrinkling the nose and frowning the left eyebrow and the face personalized expression bases for raising the eyebrows and lifting the left eyebrow tail are mutually exclusive expression bases.
  • the mutually exclusive expression base can be constructed manually, and the virtual image construction device directly obtains the mutually exclusive expression base.
  • the virtual image construction device can also gradually add basic expressions in the same three-dimensional face model and superimpose the basic expressions to determine whether the three-dimensional face model can display all expressions at the same time, thereby determining mutually exclusive expression bases.
  • the mutually exclusive expression bases in the 26 face personalized expression bases are shown in the following table:
  • the face personalized expression base included in mutual exclusion 1 and the face personalized expression base included in mutual exclusion 2 in the same row are mutually exclusive expression bases. It should be noted that “B” in Table 2 corresponds to “Blendshape” in Table 1, and the number after “B” in Table 2 is the number of "Blendshape”.
  • Step 360 Group the individualized expression bases of faces according to mutually exclusive expression bases to obtain multiple expression base groups, and any two individualized facial expression bases in each expression base group are not mutually exclusive.
  • The face personalized expression bases are grouped according to the mutually exclusive expression bases, and each resulting grouping is recorded as an expression base group.
  • the face personalized expression bases in each expression base group are not mutually exclusive. For example, if an expression base group contains the personalized facial expression base corresponding to B1, then it will not contain the personalized facial expression base corresponding to B3. If an expression base group contains the personalized facial expression bases corresponding to B4 and B25, then it will not contain the personalized facial expression bases corresponding to B6 and B7.
  • each expression base group does not contain mutually exclusive facial personalized expression bases.
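The description does not fix a grouping algorithm; one simple possibility is the greedy partition sketched below, which assigns each face personalized expression base to the first group containing nothing it is mutually exclusive with.

```python
def group_expression_bases(n_bases, mutex_pairs):
    """Partition base indices 0..n_bases-1 into groups such that no group
    contains a mutually exclusive pair (a sketch; other groupings work too)."""
    mutex = {frozenset(pair) for pair in mutex_pairs}
    groups = []
    for base in range(n_bases):
        for group in groups:
            if all(frozenset((base, other)) not in mutex for other in group):
                group.append(base)
                break
        else:  # mutually exclusive with something in every existing group
            groups.append([base])
    return groups
```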
  • Step 370 Calculate, according to the error parameter formula, the minimum error parameter corresponding to each expression base group, together with the pose parameters of the three-dimensional face model and the weight coefficients of the face personalized expression bases in the expression base group at that minimum error parameter.
  • The calculation is performed in units of expression base groups, and one expression base group is optimized at a time. Since the calculation process is the same for every expression base group, the calculation of one expression base group is taken as an example for description in this embodiment.
  • When the minimum error parameter is calculated according to the error parameter formula, the weight coefficients of the face personalized expression bases in the expression base group and the pose parameters of the three-dimensional face model are obtained. Since an expression base group does not contain all the face personalized expression bases, the weight coefficients of the face personalized expression bases not included in the group can always be set to 0 during the calculation, which reduces the number of weight coefficients to be solved. It can be understood that the iterative calculation method can also be used in this process; for details, refer to the process described in step 260, the only difference being that, in the finally obtained weight coefficients, the weight coefficient of any face personalized expression base not included in the expression base group is 0.
  • Since each expression base group contains different face personalized expression bases, the minimum error parameter, weight coefficients and pose parameters obtained may differ between groups. Therefore, after each expression base group is calculated in the above manner, each expression base group corresponds to one minimum error parameter, one set of weight coefficients and one set of pose parameters.
  • Step 380 From the minimum error parameters corresponding to each expression basis set, select the smallest minimum error parameter.
  • Step 390 Use the pose parameter and weight coefficient corresponding to the smallest minimum error parameter as the finally obtained pose parameter and weight coefficient.
  • Under the smallest minimum error parameter, the three-dimensional face model is the closest to the face image.
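Steps 370-390 amount to evaluating every expression base group and keeping the best fit, for example:

```python
def fit_with_groups(groups, fit_group):
    """fit_group(group) -> (min_error, pose, weights), e.g. via the
    alternating iteration sketched earlier with out-of-group weights
    fixed to 0. Returns the pose and weights of the best-fitting group."""
    best = min((fit_group(group) for group in groups), key=lambda r: r[0])
    _, pose, weights = best
    return pose, weights  # the smallest minimum error parameter wins
```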
  • Step 3100 Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
  • In one embodiment, when the weight coefficients are sent, the weight coefficient of each face personalized expression base not included in the expression base group is set to 0 and sent to the remote device together with the other weight coefficients.
  • In another embodiment, only the weight coefficients of the face personalized expression bases in the expression base group corresponding to the smallest minimum error parameter are sent; the remote device searches for the corresponding personalized expression bases according to the received weight coefficients and constructs the corresponding virtual image according to the found personalized expression bases and the corresponding weight coefficients, instead of using all the personalized expression bases.
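To make the bandwidth saving concrete, a hypothetical per-frame payload might be packed as follows; the JSON layout and rounding are illustrative assumptions, not a format defined by this description.

```python
import json
import numpy as np

def pack_frame_payload(s, R, t, weights, eps=1e-4):
    """Pack pose parameters and only the non-zero weight coefficients into
    a small message, instead of transmitting a real face image."""
    sparse = {i: round(float(w), 4) for i, w in enumerate(weights) if abs(w) > eps}
    return json.dumps({
        "s": float(s),
        "R": np.asarray(R).round(6).tolist(),  # 3x3 rigid rotation matrix
        "t": np.asarray(t).round(4).tolist(),  # translation vector
        "weights": sparse,                     # expression bases actually used
    })
```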
  • The above scheme solves the technical problem of freezing, reduces the demand for network bandwidth, effectively protects the privacy of the target object, and ensures the imaging quality of the remote device.
  • In addition, the pose parameters and weight coefficients are calculated in units of expression base groups, which reduces the number of weight coefficients to be solved each time and thus reduces the expression base search space, making the solution of the expression coefficients more accurate and efficient while using fewer face personalized expression bases to express the expression of the face image.
  • FIG. 15 is a schematic structural diagram of an apparatus for constructing a virtual image provided by an embodiment of the present application.
  • the virtual image construction apparatus includes: an image acquisition module 401 , an expression base construction module 402 , a face model construction module 403 , a parameter determination module 404 and a parameter transmission module 405 .
  • the image acquisition module 401 is used to acquire the current frame image data, and the current frame image data includes the face image of the target object;
  • the expression base construction module 402 is used to construct the neutral facial expression base and a plurality of face personalized expression bases of the target object according to the current frame image data;
  • the face model building module 403 is used to construct a three-dimensional face model of the target object according to the neutral facial expression base and a plurality of face personalized expression bases;
  • the parameter determination module 404 is used to determine the pose parameters of the three-dimensional face model and the weight coefficients of each face personalized expression base when the three-dimensional face model is mapped to the face image;
  • the parameter sending module 405 is used to send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
  • the parameter determination module 404 includes: a formula construction unit for constructing an error parameter formula when the three-dimensional face model is mapped to a face image; a formula calculation unit for determining the error parameter according to the error parameter formula The minimum pose parameters of the 3D face model and the weight coefficients of the individualized expression bases of each face.
  • In one embodiment, the error parameter formula is E_exp = ‖Aα − b‖², subject to Cα ≤ d, where α_i (1 ≤ i ≤ n) is the weight coefficient of the i-th face personalized expression base; A = sRΔB, s represents the scaling factor when the 3D face model is mapped to the face image, and R represents the rigid rotation matrix when the 3D face model is mapped to the face image; ΔB = [B_1 − B_0, B_2 − B_0, …, B_n − B_0], B_0 represents the neutral facial expression base, and B_i represents the i-th face personalized expression base; b = f − t − sRB_0, f represents the face key points in the face image, and t represents the translation vector when the 3D face model is mapped to the face image; s, R and t are the pose parameters; C represents the constraint parameter of α, and d represents the value range of α.
  • In one case, ones(n) represents the upper bound of the n weight coefficients and zero(n) represents the lower bound of the n weight coefficients; in another case, p_n and q_n are the value constraint matrices, determined according to the relative distances of the face key points in the face image.
  • In one embodiment, the apparatus further includes: an expression base search module, used to find the mutually exclusive expression bases in the face personalized expression bases before the pose parameters of the three-dimensional face model and the weight coefficients of each face personalized expression base when the error parameter is the smallest are determined according to the error parameter formula; and an expression base grouping module, used to group the face personalized expression bases according to the mutually exclusive expression bases to obtain multiple expression base groups, where any two face personalized expression bases in each expression base group are not mutually exclusive.
  • In one embodiment, the formula calculation unit includes: a group calculation subunit, used to calculate, according to the error parameter formula, the minimum error parameter corresponding to each expression base group, together with the pose parameters of the three-dimensional face model and the weight coefficients of the face personalized expression bases in the expression base group at that minimum error parameter; a first parameter selection subunit, used to select the smallest minimum error parameter among the minimum error parameters corresponding to the expression base groups; and a second parameter selection subunit, used to take the pose parameters and weight coefficients corresponding to the smallest minimum error parameter as the finally obtained pose parameters and weight coefficients.
  • the expression base construction module 402 includes: a neutral expression base construction unit, configured to construct a neutral expression base of the target object according to the current frame image data and the preset prior information of the face model;
  • the personalized expression base construction unit is used to determine the face personalized expression bases of the target object according to the neutral facial expression base, the preset reference neutral expression base and the reference personalized expression bases, where each reference personalized expression base corresponds to one face personalized expression base.
  • In one embodiment, the neutral expression base construction unit includes: a face image detection subunit, used to detect the face image in the current frame image data; a key point positioning subunit, used to perform face key point positioning on the face image to obtain a key point coordinate array; and a neutral expression base determination subunit, used to determine the neutral facial expression base of the target object according to the face image, the key point coordinate array and the preset prior information of the face model.
  • the personalized expression base construction unit includes: a deformation information determination subunit, used for determining deformation information according to the reference neutral expression base and the reference personalized expression base; the personalized expression base determination subunit, used for According to the deformation information and the neutral facial expression base, the facial personalized expression base of the target object is determined.
  • the virtual image construction device provided above can be used to execute the virtual image construction method provided by any of the above embodiments, and has corresponding functions and beneficial effects.
  • the units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, The specific names of the functional units are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of the present invention.
  • FIG. 16 is a schematic structural diagram of a virtual image construction device according to an embodiment of the present application.
  • Referring to FIG. 16, the virtual image construction device includes a processor 50, a memory 51, an input device 52, and an output device 53; the number of processors 50 in the virtual image construction device may be one or more, and one processor 50 is taken as an example in FIG. 16.
  • the processor 50 , the memory 51 , the input device 52 , and the output device 53 in the virtual image construction device may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 16 .
  • the memory 51 can be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the virtual image construction method in the embodiment of the present invention (for example, the image acquisition module 401, expression base construction module 402, face model construction module 403, parameter determination module 404 and parameter transmission module 405).
  • The processor 50 executes various functional applications and data processing of the virtual image construction device by running the software programs, instructions and modules stored in the memory 51, that is, implements the above virtual image construction method.
  • the memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the virtual image construction apparatus, and the like.
  • the memory 51 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • memory 51 may further include memory located remotely relative to processor 50, and these remote memories may be connected to the virtual image construction device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 52 can be used to receive input digital or character information, and generate key signal input related to user settings and function control of the virtual image construction device, and also includes image capture devices, audio capture devices, and the like.
  • the output device 53 may include a display device such as a display screen.
  • the virtual image construction apparatus may further include communication means for data communication with other apparatuses.
  • The above virtual image construction device includes the virtual image construction apparatus, can be used to execute the virtual image construction method provided by any of the above embodiments, and has corresponding functions and beneficial effects.
  • Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the relevant operations in the virtual image construction method provided by any embodiment of the present application, with corresponding functions and beneficial effects.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product.
  • the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
  • the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • the present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory in the form of, for example, read only memory (ROM) or flash memory (flash RAM).
  • RAM random access memory
  • ROM read only memory
  • flash RAM flash memory
  • Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application relate to the technical field of image processing. Disclosed are a virtual image construction method and apparatus, a device, and a storage medium. The method comprises: obtaining current frame image data, the current frame image data comprising a face image of a target object; constructing a human face neutral expression base and a plurality of face personalized expression bases of the target object according to the current frame image data; constructing a three-dimensional face model of the target object according to the face neutral expression base and the plurality of face personalized expression bases; when the three-dimensional face model is mapped to the face image, determining pose parameters of the three-dimensional face model and weight coefficients of the face personalized expression bases; and sending the pose parameters and the weight coefficients to a remote device, so that the remote device generates, according to the pose parameters and the weight coefficients, a virtual image corresponding to the face image. By using the method, the technical problem in the related art of lagging caused by transmitting a real face image can be solved.

Description

Virtual image construction method, apparatus, device, and storage medium

Technical Field

The embodiments of the present application relate to the technical field of image processing, and in particular, to a method, apparatus, device, and storage medium for constructing a virtual image.

Background Art

With the development of network communication technology, users can enjoy network communication resources such as video calls, cloud classrooms, and cloud conferences without leaving home. At present, when network communication technology is used for video communication, both parties to a call can see each other's current face image. Moreover, to improve the video communication experience, high-definition images are transmitted so that both parties can clearly see the corresponding face images. For example, in a cloud classroom or cloud conference, the high-definition face image of the speaker on the presenter side is sent to other devices, so that the users of those devices can view it. This approach has the following defect: transmitting high-definition images places high requirements on network bandwidth, and when network bandwidth is limited, freezing is likely to occur.

Summary of the Invention
Embodiments of the present application provide a virtual image construction method, apparatus, device, and storage medium, so as to solve the technical problem of the freezing phenomenon caused by the transmission of real face images in the related art.

In a first aspect, an embodiment of the present application provides a virtual image construction method, including:

acquiring current frame image data, the current frame image data including a face image of a target object;

constructing a neutral facial expression base and a plurality of face personalized expression bases of the target object according to the current frame image data;

constructing a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of face personalized expression bases;

determining the pose parameters of the three-dimensional face model when the three-dimensional face model is mapped to the face image and the weight coefficients of each face personalized expression base; and

sending the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.

In a second aspect, an embodiment of the present application further provides a virtual image construction apparatus, including:

an image acquisition module, used to acquire current frame image data, where the current frame image data includes a face image of a target object;

an expression base construction module, used to construct a neutral facial expression base and a plurality of face personalized expression bases of the target object according to the current frame image data;

a face model construction module, used to construct a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of face personalized expression bases;

a parameter determination module, used to determine the pose parameters of the three-dimensional face model when the three-dimensional face model is mapped to the face image and the weight coefficients of each face personalized expression base; and

a parameter sending module, used to send the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.

In a third aspect, an embodiment of the present application further provides a virtual image construction device, including:

one or more processors; and

a memory for storing one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors implement the virtual image construction method described in the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the virtual image construction method described in the first aspect is implemented.

In the above virtual image construction method, apparatus, device, and storage medium, current frame image data containing the face image of the target object is acquired; a neutral facial expression base and a plurality of face personalized expression bases of the target object are constructed according to the current frame image data; a three-dimensional face model is constructed according to the neutral facial expression base and the face personalized expression bases; the weight coefficients and pose parameters when the three-dimensional face model is mapped to the face image are determined; and the pose parameters and weight coefficients are sent to the remote device, so that the remote device displays the virtual image corresponding to the face image by means of the pose parameters and weight coefficients. This solves the technical problem of freezing caused by transmitting real face images in the related art. Since only the weight coefficients of the face personalized expression bases and the pose parameters of the three-dimensional face model are transmitted, the demand for network bandwidth is greatly reduced, which is especially suitable for remote video communication scenarios. Moreover, the transmitted weight coefficients and pose parameters enable the remote device to display the corresponding virtual image, which effectively protects the privacy of the target object and prevents information leakage; at the same time, the virtual image accurately follows the expression and pose in the face image, ensuring the imaging quality of the remote device.
Brief Description of the Drawings

FIG. 1 is a flowchart of a virtual image construction method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of current frame image data provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a virtual image provided by an embodiment of the present application;

FIG. 4 is a flowchart of another virtual image construction method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a reference three-dimensional face model provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a face image provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a three-dimensional face model of a target object provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of face key points provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of expression refinement partitioning provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of expression transfer provided by an embodiment of the present application;

FIG. 11 is a schematic diagram of face key point selection provided by an embodiment of the present application;

FIG. 12 is a flowchart of still another virtual image construction method provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of a mutually exclusive expression base provided by an embodiment of the present application;

FIG. 14 is a schematic diagram of another mutually exclusive expression base provided by an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a virtual image construction apparatus provided by an embodiment of the present application;

FIG. 16 is a schematic structural diagram of a virtual image construction device provided by an embodiment of the present application.
Detailed Description

The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended to explain the present application rather than to limit it. It should also be noted that, for convenience of description, the drawings show only some rather than all of the structures related to the present application.

The virtual image construction method provided by the embodiments of the present application may be executed by a virtual image construction device. The virtual image construction device may be implemented by means of software and/or hardware, and may be composed of one physical entity or of two or more physical entities. For example, the virtual image construction device may be a smart device such as a computer, a mobile phone, a tablet computer, or an interactive smart tablet.

In an embodiment, the virtual image construction device is applied in scenarios that use network communication technology for video communication, such as online conferences and online classes. In addition to the virtual image construction device, such a scenario includes one or more other devices participating in the video communication, which may likewise be smart devices such as computers, mobile phones, tablet computers, or interactive smart tablets. During video communication, the virtual image construction device executes the virtual image construction method provided by this embodiment so as to process the face image collected from the local user, thereby enabling the other devices to display the virtual image obtained based on the face image. At this time, the other devices are remote devices relative to the virtual image construction device. It can be understood that, in practical applications, a remote device may also execute the virtual image construction method provided by this embodiment when collecting its user's face image; in that case, the remote device can also be regarded as a virtual image construction device, while the local device is used to display the corresponding virtual image. For example, in an online classroom scenario, the device used by the lecturer can be regarded as the virtual image construction device, and the devices used by the students can be regarded as remote devices. For another example, in an online conference scenario, the device used by the current speaker can be regarded as the virtual image construction device, and the devices used by the other participants can be regarded as remote devices.

In an embodiment, the virtual image construction device is installed with at least one type of operating system, where the operating system includes but is not limited to an Android system, an iOS system, and/or a Windows system. The virtual image construction device can install at least one application program based on the operating system; the application program may be one that comes with the operating system or one downloaded from a third-party device or server, and the category of the application program is not limited in the embodiment. It can be understood that the virtual image construction method provided by the embodiments of the present application may also be an application program itself. In the embodiment, the virtual image construction device is installed with at least an application program for executing the virtual image construction method provided by the embodiments of the present application, and the method is executed when that application program runs.

FIG. 1 is a flowchart of a virtual image construction method provided by an embodiment of the present application. Referring to FIG. 1, the virtual image construction method specifically includes:
Step 110: Acquire current frame image data, where the current frame image data includes a face image of the target object.

When performing video communication, the virtual image construction device may collect image data through an image acquisition apparatus (such as a camera) installed on it; in the embodiment, the currently collected image data is recorded as the current frame image data. The current frame image data contains the face image of the target object, where the target object refers to the object for which a virtual image needs to be generated; any object whose face image can be recognized can be regarded as the target object, without needing to be specified in advance. For example, in an online classroom scenario, the target object may be the lecturer using the virtual image construction device, and the face image of the target object refers to the lecturer's facial image. It can be understood that the number of target objects in the current frame image data is one or more; in the embodiment, current frame image data containing one target object is taken as an example for description. In practical applications, when there are multiple target objects, each target object is processed in the same way as the single target object described here.

Optionally, after the current frame image data is acquired, it is first confirmed whether the current frame image data contains the face image of the target object; if so, the subsequent steps are executed, and otherwise, the subsequent steps are stopped and the next frame of image data is acquired as the current frame image data so that this step is repeated. The technical means used to confirm whether the current frame image data contains a face image is not limited in the embodiment; for example, a face detection algorithm based on deep learning is used to detect the face image region in the current frame image data, and if a face image region is detected, it is determined that the current frame image data contains a face image, and otherwise it is determined that it does not. A sketch of such a check is given below.
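A minimal sketch of this gating check follows; a Haar cascade is used here merely as a stand-in for the deep-learning face detector mentioned above, and the detector parameters are illustrative.

```python
import cv2

# Stand-in detector; the description leaves the concrete detector open.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_face(frame_bgr):
    """Return True if the current frame appears to contain a face image."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0  # if False, skip this frame and fetch the next one
```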
步骤120、根据当前帧图像数据构建目标对象的人脸中性表情基和多个人脸个性化表情基。 Step 120 , construct a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data.
表情基可以理解为包含人脸关键点位置信息的人脸三维模型,通过表情基中人脸关键点位置信息可以体现出人的表情。人脸关键点可以通过人脸中的关键部位得到,其中,人脸的关键部位包括眉毛、眼睛、鼻子、嘴巴以及脸颊等,人脸关键点是指位于上述关键部位中且用于描述关键部位当前动作的点。此时,通过人脸关键点可以确定各关键部位的动作,进而确定人脸姿态、人脸位置以及人脸表情等内容,即每个人脸关键点都为人脸的语义信息。The expression base can be understood as a three-dimensional face model containing the position information of the key points of the face, and the expression of the person can be reflected by the position information of the key points of the face in the expression base. The key points of the face can be obtained from the key parts of the face. The key parts of the face include eyebrows, eyes, nose, mouth, and cheeks, etc. The key points of the face are located in the above key parts and are used to describe the key parts. The point of the current action. At this time, the action of each key part can be determined through the face key points, and then the face pose, face position and face expression can be determined, that is, each face key point is the semantic information of the face.
In one embodiment, facial expressions are divided into neutral expressions and personalized expressions. A neutral expression is the shape of the face in the absence of any expression and reflects the face identity, where the face identity is a concrete description of the face shape; for example, a face identity may describe the key parts of a face as large eyes, a high nose bridge, and thin lips. Since different objects have different faces, the key parts described by their face identities will differ. A personalized expression is an expression made by the face, such as closing the eyes, opening the mouth, or frowning. Accordingly, expression bases are divided into neutral facial expression bases and personalized facial expression bases. A neutral facial expression base represents the neutral expression and defines the shape of the face in three-dimensional space when no expression is present. A personalized facial expression base contains a personalized expression, and each personalized facial expression base corresponds to one personalized expression. Understandably, since facial expressions are very rich, representing all possible expressions would require constructing a very large number of personalized expression bases, greatly increasing the amount of data processing. Therefore, in the embodiments, personalized facial expression bases are constructed only for basic expressions, whose specific content can be set according to the actual situation; the various expressions of the face are then obtained by combining the basic expressions with the neutral expression. For example, the basic expressions for the eyes may include: left eye closed, left eye widened, right eye closed, and right eye widened. Various eye expressions can then be obtained from these four basic expressions and the neutral expression; for instance, an expression with both eyes slightly narrowed can be obtained by linearly superimposing the left-eye-closed and right-eye-closed bases onto the neutral expression.
In the embodiments, the neutral facial expression base and the personalized facial expression bases of the target object are constructed from the face image in the current frame image data. When constructing the neutral facial expression base, prior information may be introduced. This prior information is obtained by collecting a large amount of three-dimensional face data and reflects the average coordinate data, the face identity basis vectors, and the personalized expression basis vectors of that data; a three-dimensional face model can be constructed from the prior information, and setting different coefficients for the prior information yields different three-dimensional face models. Such a model can be regarded as a reference three-dimensional face model: by adjusting it, a three-dimensional face model corresponding to the target object's face image in the current frame image data can be obtained. Specifically, the coordinates of each face key point of the target object's face image in the two-dimensional plane are obtained first. The three-dimensional key points of the reference model (i.e., the face key points in the reference three-dimensional face model) are then projected onto that plane to determine their two-dimensional coordinates, after which the error between the projected three-dimensional key points and the face key points of the face image is computed. The three-dimensional key points and face key points for which errors are computed correspond one to one, meaning that each corresponding pair occupies the same relative position in its respective image; for example, both may be the left boundary point of an eye. The positions of the three-dimensional key points in the reference model are then adjusted according to the computed errors, so that after adjustment the projections of the three-dimensional key points coincide as closely as possible with the coordinates of the face key points in the face image. Adjusting the three-dimensional key point positions is achieved by adjusting the coefficients applied to the prior information: different coefficients yield different key point positions in the constructed reference model. When the key points coincide as closely as possible, the adjusted reference model can be considered close to the three-dimensional face model of the target object. The finally used coefficients are then retained, the personalized expression basis vectors are removed from the prior information, and a reference three-dimensional face model containing no personalized expression is constructed from the retained coefficients and the remaining prior information; this model can be regarded as the neutral facial expression base of the target object. It should be noted that, in practical applications, the neutral facial expression base may also be constructed in other ways, for example by means of a neural network: the face image, or the face key points in it, are input into the neural network, which outputs the corresponding neutral facial expression base.
Afterwards, the neutral facial expression base is processed to obtain the personalized facial expression bases of the target object. Prior information is also introduced when constructing the personalized facial expression bases. Each basic expression corresponds to one item of prior information representing a three-dimensional face model with that basic expression, and the neutral expression likewise corresponds to prior information representing a three-dimensional face model with the neutral expression; the basic expressions and the neutral expression in this prior information belong to the same face. Further, the transfer deformation between the prior information of the neutral expression and the prior information of a basic expression is computed first; that is, transforming the neutral-expression prior information by the transfer deformation yields the prior information of that basic expression. The neutral facial expression base is then transformed according to the transfer deformation to obtain the personalized facial expression base of the target object under that basic expression. In this way, the personalized facial expression base of the target object under each basic expression can be obtained.
Step 130: Construct a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of personalized facial expression bases.
Exemplarily, a weight coefficient is set for each personalized facial expression base, and the personalized facial expression bases are linearly weighted and combined with the neutral facial expression base to obtain the three-dimensional face model of the target object carrying an expression. The three-dimensional face model is expressed as:

B = B_0 + Σ_{i=1}^{n} β_i (B_i - B_0)

where B denotes the three-dimensional face model, B_0 denotes the neutral facial expression base, B_i denotes the i-th personalized facial expression base, 1 ≤ i ≤ n, n is the total number of personalized facial expression bases, and β_i denotes the weight coefficient corresponding to B_i.
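A minimal numpy sketch of this linear combination; the array shapes and the two random bases are purely illustrative:

```python
import numpy as np

def blend_face_model(B0, bases, betas):
    """Combine the neutral base B0 (k x 3 key points) with n personalized
    bases according to B = B0 + sum_i beta_i * (B_i - B0)."""
    B = B0.copy()
    for Bi, beta in zip(bases, betas):
        B += beta * (Bi - B0)  # each base contributes its offset from neutral
    return B

# Toy usage with 68 key points and two bases.
B0 = np.zeros((68, 3))
bases = [np.random.rand(68, 3), np.random.rand(68, 3)]
model = blend_face_model(B0, bases, betas=[0.3, 0.7])
```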
It can be understood that, since the specific expression of the face image in the current frame image data is not yet known, an accurate weight coefficient cannot be set for each personalized facial expression base at this point. Therefore, when the three-dimensional face model is first constructed, an initial weight coefficient is set for each personalized facial expression base; the expression represented by this initially constructed model will differ from the expression of the face image.
Step 140: Determine the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases when the three-dimensional face model is mapped to the face image.
Understandably, the closer the expression represented by the three-dimensional face model is to the real expression of the face image, the smaller the difference between the face image and the model's projection onto the two-dimensional plane. Therefore, in the embodiments, the weight coefficients of the personalized facial expression bases in the model are adjusted continuously so that the expression represented by the model approaches the expression of the face image. Meanwhile, since the head of the target object may move (e.g., tilting or turning), the pose parameters of the model also need to be adjusted so that the adjusted model's motion is consistent with the head motion of the target object in the face image. The pose parameters can also be understood as rigid transformation parameters. A rigid transformation changes the position, orientation, and size of the three-dimensional face model without changing its shape, and the rigid transformation parameters are the parameters used in that transformation, including a rigid rotation matrix, a translation vector, and a scale factor: the rotation matrix changes the model's orientation, the translation vector changes its position, and the scale factor changes its size.
In the embodiments, the difference between the face image and the two-dimensional image obtained by mapping the three-dimensional face model onto the two-dimensional plane is determined by constructing an error parameter formula. The smaller the error parameter, the smaller the difference: the expression represented by the model is closer to the real expression of the face image, and the model's motion is more consistent with the head motion of the target object. In one option, the error parameter formula is constructed from the coordinate differences between the face key points of the projected model and those of the face image. Since the model's key point coordinates are determined by the weight coefficients and pose parameters, these are treated as unknowns when the formula is constructed. In subsequent processing, the weight coefficients and pose parameters are adjusted continuously so that the model's key point coordinates approach those of the face image, the error parameter becomes smaller and smaller, the expression represented by the model converges to the real expression, and the model's motion converges to the head motion of the target object. Specifically, when the computed error parameter reaches a desired value, the currently used weight coefficients and pose parameters are taken as the final ones. Whether the error parameter has reached the desired value may be judged by setting a number of adjustment iterations (when the number of adjustments reaches a certain count, the error parameter is deemed to have reached the desired value) or by setting a parameter threshold (when the error parameter falls below the threshold, it is deemed to have reached the desired value). Generally speaking, once the error parameter reaches the desired value, the expression represented by the model can be considered sufficiently identical to the real expression, and the model's motion sufficiently consistent with the head motion of the target object.
Step 150: Send the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
Exemplarily, the pose parameters and weight coefficients are sent to a remote device that is in video communication with the virtual image construction device. A virtual image is stored in the remote device; it may be a cartoon image, and it may be two-dimensional or three-dimensional. The embodiments take a three-dimensional virtual image as an example. Storing the three-dimensional virtual image in the remote device specifically includes storing a neutral expression base and personalized expression bases of the virtual image, where each personalized expression base of the virtual image has the same expression as the corresponding personalized facial expression base. In an embodiment, the user of the remote device may install an application on the remote device that receives the pose parameters and weight coefficients sent by the virtual image construction device and generates from them the virtual image corresponding to the face image; the remote device stores the virtual image when the application is installed, and the stored virtual image can be updated when the application is upgraded.
In one embodiment, when the remote device generates the virtual image corresponding to the face image, it may render and display the preset three-dimensional virtual image through the graphics rendering framework of the Open Graphics Library (OpenGL). During rendering, the personalized expression bases and the neutral expression base of the three-dimensional virtual image are linearly weighted according to the weight coefficients to obtain a three-dimensional virtual image carrying the expression, using the same linear weighting as when constructing the three-dimensional face model in step 130. After the expressive three-dimensional virtual image is generated, the rendering framework applies the corresponding rigid transformation to it according to the pose parameters and displays it once the transformation is complete. The displayed virtual image then matches the face image in both expression and pose. For example, FIG. 2 is a schematic diagram of current frame image data provided by an embodiment of the present application, and FIG. 3 is a schematic diagram of a virtual image provided by an embodiment of the present application; after the current frame image data shown in FIG. 2 is processed by the above method, the remote device displays the virtual image shown in FIG. 3.
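The remote-side reconstruction can be sketched as follows; in a real implementation the returned vertices would be handed to the OpenGL rendering framework, and the function and variable names here are assumptions, not the application's API:

```python
import numpy as np

def reconstruct_avatar(A0, avatar_bases, betas, s, R, t):
    """Blend the stored avatar bases with the received weight coefficients,
    then apply the received rigid transform.
    A0: neutral avatar base (k x 3); avatar_bases: personalized bases;
    betas: weight coefficients; s, R, t: scale, 3x3 rotation, translation."""
    A = A0.copy()
    for Ai, beta in zip(avatar_bases, betas):
        A += beta * (Ai - A0)      # same blending as on the sender side
    return s * (A @ R.T) + t       # rigid transform before rendering
```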
Understandably, after the weight coefficients and pose parameters are sent to the remote device, the next frame of image data is acquired and taken as the current frame image data, and the above process is repeated, so that the remote device continuously displays the virtual image.
Optionally, the virtual image construction device may further be provided with a function control for enabling the virtual image, implemented either as a physical button or as a virtual button. When it is detected that the function control is triggered, the above method is executed so that the remote device displays the virtual image; when it is detected that the function control is no longer triggered, the current frame image data is simply sent to the remote device so that the remote device displays the current frame image. In this way, users can decide for themselves whether the real image is displayed, which improves the user experience.
In summary: current frame image data containing the face image of a target object is acquired; a neutral facial expression base and a plurality of personalized facial expression bases of the target object are constructed from the current frame image data; a three-dimensional face model is constructed from these bases; the weight coefficients and pose parameters obtained when the model is mapped to the face image are determined; and the pose parameters and weight coefficients are sent to a remote device so that the remote device displays a virtual image corresponding to the face image. This technical means solves the technical problems of information leakage and stuttering caused by transmitting real face images in the related art. Because only the weight coefficients of the personalized facial expression bases and the pose parameters of the three-dimensional face model are transmitted, the network bandwidth requirement is greatly reduced, which is especially suitable for remote video communication scenarios. Moreover, the transmitted weight coefficients and pose parameters enable the remote device to display the corresponding virtual image, effectively protecting the privacy of the target object and preventing information leakage, while the virtual image accurately follows the expression and pose in the face image, ensuring the imaging quality at the remote device.
FIG. 4 is a flowchart of another virtual image construction method provided by an embodiment of the present application. This embodiment is made concrete on the basis of the above embodiment. Referring to FIG. 4, the virtual image construction method specifically includes:
Step 210: Acquire current frame image data, where the current frame image data contains a face image of the target object.
Step 220: Construct a neutral facial expression base of the target object according to the current frame image data and preset face model prior information.
In the embodiments, the face model prior information is the prior information used when constructing the neutral facial expression base of the target object; a reference three-dimensional face model, which in this embodiment carries a neutral expression, can be constructed from it. In one embodiment, the face model parameters are fitted from the face image to obtain the neutral facial expression base of the target object. For example, FIG. 5 is a schematic diagram of a reference three-dimensional face model provided by an embodiment of the present application, FIG. 6 is a schematic diagram of a face image provided by an embodiment of the present application, and FIG. 7 is a schematic diagram of a three-dimensional face model of a target object provided by an embodiment of the present application. The reference three-dimensional face model of FIG. 5 is constructed from the face model prior information, and fitting it to the face image of FIG. 6 yields the three-dimensional face model of the target object shown in FIG. 7. Understandably, when the three-dimensional face model shown in FIG. 7 carries no expression, it can serve as the neutral facial expression base. It should be noted that FIG. 7 shows a side view of the three-dimensional face model.
Exemplarily, the face model prior information may be constructed from the three-dimensional face data in the published BFM (Basel Face Model) database, where each item of three-dimensional face data can be regarded as a three-dimensional face model whose expression is not limited by the embodiments. In one embodiment, Principal Component Analysis (PCA) is applied to 200 items of three-dimensional face data in the BFM database to obtain a bilinear model, i.e., a reference three-dimensional face model constructed from the 200 items of data, whose model expression is:
M = MU + PC_id·α_id + PC_exp·α_exp
where M is the reference three-dimensional face model; MU is the average coordinate data of the 200 items of three-dimensional face data, containing 3h values in total, where h is the number of point-cloud points of the averaged data and each point carries x, y, and z coordinates, so that a three-dimensional face can be constructed from MU; PC_id is the face identity basis vector obtained from the 200 items of data, representing face identity superimposed on MU (superimposing PC_id on MU yields the face identity of the reference model, e.g., the neutral-expression face features); PC_exp is the personalized expression basis vector obtained from the 200 items of data, representing personalized expression superimposed on MU (superimposing PC_exp on MU yields the personalized expression of the reference model); α_id is the coefficient corresponding to the face identity basis vector; and α_exp is the coefficient corresponding to the personalized expression basis vector. That is, PC_id and PC_exp are linearly weighted by α_id and α_exp respectively, and the weighted results are fused into the average coordinate data to obtain the reference three-dimensional face model. It can be understood that MU, PC_id, and PC_exp constitute the face model prior information.
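A small numpy sketch of the bilinear model above; the dimensions are toy values, and a real BFM basis has tens of thousands of points:

```python
import numpy as np

def reference_face(MU, PC_id, PC_exp, alpha_id, alpha_exp):
    """M = MU + PC_id @ alpha_id + PC_exp @ alpha_exp, reshaped to h x 3.
    MU: (3h,) mean coordinates; PC_id: (3h, d_id); PC_exp: (3h, d_exp)."""
    M = MU + PC_id @ alpha_id + PC_exp @ alpha_exp
    return M.reshape(-1, 3)  # one (x, y, z) row per point-cloud point

# Toy dimensions for illustration only.
h, d_id, d_exp = 100, 80, 29
MU = np.zeros(3 * h)
PC_id, PC_exp = np.random.rand(3 * h, d_id), np.random.rand(3 * h, d_exp)
M = reference_face(MU, PC_id, PC_exp, np.zeros(d_id), np.zeros(d_exp))
```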
After the reference three-dimensional face model is constructed, it can be mapped onto the two-dimensional plane to determine the difference between the resulting two-dimensional image and the face image; α_id and α_exp applied to the prior information are then adjusted according to this difference, so that after adjustment the projection of the reference model onto the two-dimensional plane is highly similar or identical to the face image.
In one embodiment, the difference between the two-dimensional image corresponding to the reference three-dimensional face model and the face image is determined specifically through face key points. In this case, step 220 includes steps 221 to 223:
Step 221: Detect the face image in the current frame image data.
In one embodiment, a face recognition algorithm is used to detect the region where the face is located in the current frame image data, and that region is then cropped to obtain the face image.
Step 222: Perform face key point localization on the face image to obtain a key point coordinate array.
Using a face key point detection technique, face key points are detected in the face image, the coordinates of the detected key points are obtained, and these coordinates are assembled into a key point coordinate array.
In the embodiments, 68 face key points are taken as an example. FIG. 8 is a schematic diagram of face key points provided by an embodiment of the present application. Referring to FIG. 8, a total of 68 face key points are detected in the current face image, and each key point carries corresponding face semantic information. The coordinates of the 68 key points in the face image are then arranged in a certain order to form the key point coordinate array, which can be expressed as Landmarks = {x_1, y_1, x_2, y_2, …, x_68, y_68}, where (x_1, y_1) are the coordinates of the first face key point, and so on. Understandably, the arrangement order of the key points can be set according to the actual situation and is not limited by the embodiments.
Exemplarily, during video communication the image data of two adjacent frames are correlated to a certain extent, while the above key point detection is performed on a single frame. If the coordinates of key points carrying the same face semantic information differ greatly between adjacent frames, later computation is affected and the finally generated virtual image jitters. To prevent this, in the embodiments this step is followed by: performing a filtering operation and a smoothing operation on the key point coordinate array.
The filtering operation adjusts the key point coordinate array of the current frame by combining it with the key point coordinate array of the previous frame, ensuring a smooth transition from the previous frame's array to the current frame's array, and thus that the key point coordinate arrays of all frames change smoothly during video communication. In one embodiment, the filtering operation is implemented by Kalman filtering: when the current frame's key point coordinate data are Kalman-filtered, the current frame's array and the previous frame's array are weighted, and the weighted result is taken as the updated key point coordinate data of the current frame.
The smoothing operation is used to avoid situations in which some face key points are outliers, so that the coordinate curve between adjacent key points is smooth. In one embodiment, a PCA algorithm is applied to the filtered key point coordinate array to smooth it and update the array.
Understandably, the key point coordinate array used subsequently is the array obtained after the filtering and smoothing operations.
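For illustration only, a simple per-frame weighted blend standing in for the Kalman update might look like the following; a full Kalman filter additionally tracks velocities and noise covariances, which this sketch omits:

```python
import numpy as np

def temporal_filter(prev_landmarks, curr_landmarks, gain=0.6):
    """Blend the previous and current key point arrays so the sequence of
    arrays changes smoothly across frames. `gain` plays the role of the
    Kalman gain in this simplified stand-in."""
    return gain * curr_landmarks + (1.0 - gain) * prev_landmarks

prev = np.random.rand(68 * 2)   # Landmarks = {x1, y1, ..., x68, y68}
curr = np.random.rand(68 * 2)
smoothed = temporal_filter(prev, curr)
```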
Step 223: Determine the neutral facial expression base of the target object according to the face image, the key point coordinate array, and the preset face model prior information.
Exemplarily, an energy constraint formula between the reference three-dimensional face model obtained from the face model prior information and the face image is constructed; reconstructed from the definitions that follow, it takes the form:

E_lan(p) = Σ_j ω_conf,j · ||f_j - v_j||²

where E_lan(p) denotes the energy constraint between the reference three-dimensional face model and the face image, and p denotes the parameters used by the reference model, including the coefficient α_id of the face identity basis vector, the coefficient α_exp of the personalized expression basis vector, the weak perspective projection matrix Π, and the rigid transformation matrix φ. Weak perspective projection is mainly used to project three-dimensional point information (such as the reference model) onto the two-dimensional imaging plane, and the weak perspective projection matrix is the matrix used when projecting the reference model onto the two-dimensional plane; the rigid transformation matrix may include a rigid rotation matrix, a translation vector, and a scale factor. ω_conf,j denotes the detection confidence of the j-th face key point in the face image, f_j denotes the coordinates of the j-th face key point in the face image, F denotes the key point coordinate array, and v_j denotes the coordinates of the j-th three-dimensional key point when the reference model is mapped onto the two-dimensional plane. Understandably, the key points projected from the reference model are arranged in the same order as the face key points in the face image. In one embodiment, the more similar the coordinates of each projected three-dimensional key point are to those of the corresponding face key point, the smaller E_lan(p), and the closer the reference model is to the three-dimensional face model of the target object. Therefore, in the embodiments, the parameters of the reference model are adjusted continuously so that E_lan(p) decreases (i.e., the projection error between the three-dimensional key points and the face key points decreases); once E_lan(p) stabilizes, the specific value of p that minimizes E_lan(p) is determined, at which point the projection error is minimal. When E_lan(p) is minimal, α_exp, Π, and φ are discarded, and the specific value of α_id is substituted into M = MU + PC_id·α_id to obtain the final reference three-dimensional face model. This model is closest to the three-dimensional face model of the target object without any expression, so it can be determined as the neutral facial expression base of the target object. Thus, this embodiment reconstructs the face identity information from a single frame of face image, so that what is reconstructed each time is the neutral expression information corresponding to that face image, in preparation for the subsequent construction of the personalized expression bases.
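Under the simplifying assumptions that the projection (Π composed with φ in the text) is supplied as a single callable and that the prior bases are restricted to the model's 68 key points, the energy above can be sketched as:

```python
import numpy as np

def e_lan(alpha_id, alpha_exp, project, MU, PC_id, PC_exp, f, w):
    """Weighted reprojection energy: sum_j w_j * ||f_j - v_j||^2.
    `project` stands in for the weak perspective projection and rigid
    transform; MU, PC_id, PC_exp cover only the 68 key points so that the
    residuals align with the detected landmarks f (shape (68, 2))."""
    M = (MU + PC_id @ alpha_id + PC_exp @ alpha_exp).reshape(-1, 3)
    v = project(M)                 # projected key points, shape (68, 2)
    return np.sum(w * np.sum((f - v) ** 2, axis=1))
```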
Step 230: Determine each personalized facial expression base of the target object according to the neutral facial expression base, a preset reference neutral expression base, and reference personalized expression bases, where each reference personalized expression base corresponds to one personalized facial expression base.
The reference neutral expression base is a preset expression base representing the neutral expression. A reference personalized expression base is an expression base obtained by adding a preset basic expression on top of the reference neutral expression base, and each reference personalized expression base has a corresponding physical meaning. In the embodiments, the Facial Action Coding System (FACS) is used to define the individual facial muscle actions as different action unit (AU) or action descriptor (AD) values, i.e., the basic expressions are classified by muscle action. For example, the AU value corresponding to "inner brow raised" is denoted AU1. In addition, each AU value may carry a refinement value indicating the magnitude of the muscle movement; for example, AU1(0.2) indicates that the current basic expression is the inner brow being raised, with a magnitude of 0.2. As another example, the AU value corresponding to "eyes closed" is denoted AU43, so AU43(0) indicates normally open eyes and AU43(1) fully closed eyes. FIG. 9 is a schematic diagram of expression refinement partitioning provided by an embodiment of the present application; referring to FIG. 9, from left to right are the refinement values corresponding to the degree of eye closure as the eye goes from fully open to fully closed. In the embodiments, 26 basic expressions are defined according to muscle actions, each corresponding to one reference personalized expression base. The reference personalized expression bases, their corresponding basic expressions, and their AU values are shown in the following table:
Blendshape | Basic expression | FACS definition | Blendshape | Basic expression | FACS definition
0 | Left eye closed | AU43 | 13 | Right mouth corner raised | AU12
1 | Right eye closed | AU43 | 14 | Left mouth corner stretched outward | AU20
2 | Left eye widened | AU5 | 15 | Right mouth corner stretched outward | AU20
3 | Right eye widened | AU5 | 16 | Upper lip rolled inward | AU28
4 | Left brow furrowed | AU4 | 17 | Lower lip rolled inward | AU28
5 | Right brow furrowed | AU4 | 18 | Lower lip pushed outward | AD29
6 | Inner brows raised | AU1 | 19 | Upper lip raised | AU10
7 | Left outer brow raised | AU2 | 20 | Lower lip lowered | AU16
8 | Right outer brow raised | AU2 | 21 | Left mouth corner lowered | AU17
9 | Mouth open | AU26 | 22 | Right mouth corner lowered | AU17
10 | Jaw shifted left | AD30 | 23 | Lips puckered (pout) | AU18
11 | Jaw shifted right | AD30 | 24 | Cheeks puffed | AD34
12 | Left mouth corner raised | AU12 | 25 | Nose wrinkled | AU9

Table 1
Here, Blendshape denotes the personalized expression base, 0-25 are the numbers of the 26 personalized expression bases, the basic expression column gives the expression corresponding to each base, and the FACS definition gives the AU or AD value corresponding to each base. As the basic expressions above show, the personalized expression bases deliberately separate the left-right symmetric expressions, so that when the target object makes an asymmetric expression the corresponding personalized facial expression base can still be constructed accurately. According to the table above, adding the 26 expressions to the reference neutral expression base yields the 26 reference personalized expression bases.
Exemplarily, the deformation information required to deform the reference neutral expression base into a reference personalized expression base can be determined from the two bases; this deformation information can also be regarded as the transfer deformation from the reference neutral expression base to the reference personalized expression base. The neutral facial expression base is then processed using the deformation information to obtain the personalized facial expression base.
In one embodiment, the deformation information is obtained by three-dimensional mesh deformation. In this case, step 230 includes steps 231 and 232:
Step 231: Determine the deformation information according to the reference neutral expression base and the reference personalized expression base.
In one embodiment, the Delaunay triangulation algorithm is used to triangulate the face key points of the reference neutral expression base in their arrangement order, dividing the base into a plurality of triangular patches; the three vertices of each patch are the three face key points enclosing the triangle, and together the patches form a three-dimensional mesh representing the reference neutral expression base. Likewise, Delaunay triangulation of the face key points of a reference personalized expression base, in their arrangement order, divides it into triangular patches that form a three-dimensional mesh representing that base.
Each triangular patch in the reference personalized expression base corresponds one-to-one to a triangular patch in the reference neutral expression base, and from this correspondence the deformation information that deforms each patch of the reference neutral expression base into the corresponding patch of the reference personalized expression base can be determined. The deformation information represents the transfer deformation quantities (rotation matrix, translation vector, scale factor, etc.) applied to a patch of the reference neutral expression base so that the deformed patch coincides with the corresponding patch of the reference personalized expression base. Each patch corresponds to one item of deformation information, and together the patch-wise items constitute the deformation information from the reference neutral expression base to the current reference personalized expression base. Understandably, each reference personalized expression base has its own corresponding deformation information.
Step 232: Determine the personalized facial expression base of the target object according to the deformation information and the neutral facial expression base.
Exemplarily, three-dimensional mesh registration is performed between the neutral facial expression base and the reference neutral expression base. In one embodiment, the Iterative Closest Point (ICP) algorithm is used: a three-dimensional spatial transformation (e.g., scaling, rotation, and translation) is applied to each triangular patch of the reference neutral expression base so that the transformed patches correspond one-to-one to the patches of the neutral facial expression base. The reference neutral expression base after this transformation can be called the deformed reference neutral expression base, whose triangular patches are highly similar or identical in three-dimensional coordinates to the corresponding patches of the neutral facial expression base.
Optionally, to make the deformed reference neutral expression base match the neutral facial expression base even better, in the embodiments the three-dimensional coordinates of the face key points of the deformed reference neutral expression base are processed with smoothness constraints and key point constraints, where 3D smoothing may be used for the smoothness constraint and a PCA algorithm for the key point constraint.
Since the deformed reference neutral expression base is the same as the neutral facial expression base, the correspondence between their triangular patches can be determined through a k-d tree, and from it the correspondence between the patches of the reference neutral expression base and those of the neutral facial expression base. In the embodiments, a k-d tree can be understood as a data structure for organizing points in k-dimensional Euclidean space.
Afterwards, according to the correspondence between the triangular patches of the reference neutral expression base and those of the neutral facial expression base, the deformation information of each patch for transforming the reference neutral expression base into a given reference personalized expression base is applied to the corresponding patch of the neutral facial expression base; that is, the patch is deformed so that the deformed patch serves as a patch of the personalized facial expression base. After every patch of the neutral facial expression base is processed in this way, the personalized facial expression base corresponding to that reference personalized expression base is obtained; processing all reference personalized expression bases in this manner gives each of them a corresponding personalized facial expression base.
At this time, the above processing can be computed by a deformation formula. Write V_S and V'_S for the vertex-related information of a triangular patch in the reference neutral expression base and in the reference personalized expression base, respectively, and V_T and V'_T for the vertex-related information of the corresponding patch in the neutral facial expression base and in the personalized facial expression base. The deformation formula for one triangular patch is then expressed as:

V'_T · V_T^{-1} = V'_S · V_S^{-1}, i.e., V'_T = V'_S · V_S^{-1} · V_T

where V_T^{-1} is the inverse matrix of V_T and V_S^{-1} is the inverse matrix of V_S. Here V'_T = [v'_T2 - v'_T1, v'_T3 - v'_T1, v'_T4 - v'_T1], where v'_T1, v'_T2, and v'_T3 are the three-dimensional coordinates of the three face key points of the patch in the personalized facial expression base and v'_T4 is the normal vector of the patch. Similarly, V_T = [v_T2 - v_T1, v_T3 - v_T1, v_T4 - v_T1], where v_T1, v_T2, and v_T3 are the three-dimensional coordinates of the three face key points of the corresponding patch in the neutral facial expression base and v_T4 is its normal vector; V'_S = [v'_S2 - v'_S1, v'_S3 - v'_S1, v'_S4 - v'_S1], where v'_S1, v'_S2, and v'_S3 are the three-dimensional coordinates of the three face key points of the corresponding patch in the reference personalized expression base and v'_S4 is its normal vector; and V_S = [v_S2 - v_S1, v_S3 - v_S1, v_S4 - v_S1], where v_S1, v_S2, and v_S3 are the three-dimensional coordinates of the three face key points of the corresponding patch in the reference neutral expression base and v_S4 is its normal vector. The product V'_S · V_S^{-1} contains the deformation information that deforms V_S into V'_S; deforming V_T with this deformation information yields V'_T. After the triangular patches of the neutral facial expression base are deformed according to the above formula, the corresponding personalized facial expression base is obtained.
For example, FIG. 10 is a schematic diagram of expression transfer provided by an embodiment of the present application. Referring to FIG. 10, the first column of the first row is the reference neutral expression base, and the second to fourth columns of the first row are three reference personalized expression bases whose basic expressions are, respectively, right eye closed, mouth open, and lips puckered. The first column of the second row is the neutral facial expression base. The deformation information is determined from the reference neutral expression base and each reference personalized expression base, and the neutral facial expression base is then processed according to the deformation information to obtain the personalized facial expression bases: the second to fourth columns of the second row of FIG. 10 are the personalized facial expression bases obtained from the reference personalized expression bases in the second to fourth columns of the first row. The basic expressions of these three personalized facial expression bases are likewise right eye closed, mouth open, and lips puckered; that is, the basic expressions are transferred from the reference personalized expression bases to the personalized facial expression bases.
Step 240: Construct the three-dimensional face model of the target object according to the neutral facial expression base and the plurality of personalized facial expression bases.
Step 250: Construct the error parameter formula used when the three-dimensional face model is mapped to the face image.
The error parameter formula can also be understood as an energy function. Exemplarily, the construction rule of the error parameter formula can be set according to the actual situation. In one embodiment, the error parameter formula is constructed by minimizing the residual; in this case, the error parameter formula is:

E = Σ_{k=1}^{M} ||sR·B_k + t − f_k||²

where E represents the error parameter, B represents the three-dimensional face model, B = B_0 + Σ_{i=1}^{n} β_i(B_i − B_0), B_0 represents the neutral facial expression base of the target object, B_i represents the i-th personalized facial expression base of the target object, 1 ≤ i ≤ n, n is the total number of personalized facial expression bases, β_i represents the weight coefficient corresponding to B_i, B_k represents the k-th face key point in the three-dimensional face model, 1 ≤ k ≤ M, M is the total number of face key points (M = 68 in the above embodiment), f_k represents the k-th face key point in the face image, s represents the scale factor when the three-dimensional face model is mapped to the face image, R represents the rigid rotation matrix when the three-dimensional face model is mapped to the face image, and t represents the translation vector when the three-dimensional face model is mapped to the face image; s, R and t are the pose parameters, and ||*|| represents the norm of *. It can be understood that sR·B_k + t − f_k reflects the difference between the k-th face key point of the three-dimensional face model and the k-th face key point of the face image when the three-dimensional face model is mapped to the two-dimensional plane, and the error parameter is then obtained from this difference. In the above formula, β_i, s, R and t are the unknowns. It should be noted that the face key points of the three-dimensional face model and the face key points of the face image have the same arrangement order.
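A sketch of evaluating this residual form is given below, assuming numpy arrays, an orthographic projection that keeps the first two coordinates after the rigid transform, and the blendshape combination B = B_0 + Σ β_i(B_i − B_0) reconstructed above; all names are illustrative.

```python
import numpy as np

def error(B0, B, beta, s, R, t, f):
    """E = sum_k || s*R*B_k + t - f_k ||^2 over the M face key points.
    B0: (M, 3) neutral base key points; B: (n, M, 3) personalized bases;
    beta: (n,) weights; R: (3, 3); t: (3,); f: (M, 2) image key points."""
    model = B0 + np.tensordot(beta, B - B0, axes=1)  # B = B0 + sum_i beta_i (B_i - B0)
    proj = (s * (model @ R.T) + t)[:, :2]            # map 3D key points onto the image plane
    return np.sum((proj - f) ** 2)
```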
In another embodiment, the error parameter formula is constructed by means of the linear least squares method, that is, the above residual minimization is converted into a form solved by linear least squares. The least squares method (also known as the method of least squares) is a mathematical optimization technique with which unknown data (in the embodiment, β_i, s, R and t are the unknown data) can be obtained conveniently, such that the sum of squared errors between the obtained data and the actual data is minimized. In this case, when the residual minimization is converted into the linear least squares form, the error parameter formula can be expressed as:
min E'_exp = min ||Aβ − b||²
where A = sR·ΔB and b = f − t − sR·B_0.
Here E'_exp represents the error parameter, ΔB = [B_1 − B_0  B_2 − B_0  …  B_n − B_0], B_0 represents the neutral facial expression base, B_i represents the i-th personalized facial expression base, 1 ≤ i ≤ n, n represents the total number of personalized facial expression bases, β represents the weight coefficient vector, β = (β_1 β_2 … β_n), β_i represents the weight coefficient of the i-th personalized facial expression base, s represents the scale factor when the three-dimensional face model is mapped to the face image, R represents the rigid rotation matrix when the three-dimensional face model is mapped to the face image, t represents the translation vector when the three-dimensional face model is mapped to the face image, f represents the face key points in the face image, f = (f_1 … f_M), and M is the total number of face key points. From the above formula, Aβ reflects the difference between the personalized facial expression bases and the neutral facial expression base in the two-dimensional plane, and b reflects the difference between the neutral facial expression base and the face image in the two-dimensional plane. It can be understood that the closer the three-dimensional face model is to the face image, the smaller the difference between Aβ and b. When β is solved with the above error parameter formula, the formula is a system of linear equations, and the solution is β = (AᵀA)⁻¹·Aᵀb.
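With the pose fixed, the closed-form solution β = (AᵀA)⁻¹·Aᵀb can be sketched as follows; np.linalg.lstsq is used instead of forming (AᵀA)⁻¹ explicitly, which is a standard numerical choice rather than anything mandated by the embodiment.

```python
import numpy as np

def solve_beta(A, b):
    """Solve min ||A beta - b||^2, i.e. beta = (A^T A)^{-1} A^T b.
    A: (2M x n) projected blendshape offsets sR*dB; b: (2M,) residual f - t - sR*B0."""
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return beta
```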
In yet another embodiment, note that when β is solved by the linear least squares method of the previous embodiment, the error parameter formula is a system of linear equations and the solution is β = (AᵀA)⁻¹·Aᵀb. In that case the solved β may take both positive and negative values, but negative values are meaningless for the three-dimensional face model, that is, a weight coefficient cannot be negative. Moreover, each time β is solved, the solution is driven by the differences of the face key points under the mapping of the three-dimensional face model; if an error occurs in the detection of the face key points in the face image, the accuracy of the calculation result is affected. For example, the mouth in the face image may be closed, but due to a face key point detection error there is a certain distance between the face key points of the upper lip and the lower lip (without error, the two key points should nearly or completely coincide), so the mouth may be recognized as open in the subsequent calculation. Therefore, in the embodiment, quadratic programming with dynamic constraints is applied to β to avoid the above problems. In this case, the dynamic constraint on β can be expressed as Cβ ≤ d, that is, the error parameter formula is:
min E'_exp = min ||Aβ − b||²
where Cβ ≤ d.
Here C represents the constraint parameter of β and d represents the value range of β; that is, C and d are the constraints on β, where

C = [eye(n); −eye(n)]

and eye denotes the identity matrix, with eye(n) being the identity matrix corresponding to the n personalized facial expression bases. The specific value of d can be set according to the actual situation; for example, if β should lie in the range 0.5–1, d can be set according to the bounds 0.5 and 1. In one optional scheme,

d = [ones(n); −zero(n)]

where ones(n) denotes the upper bounds of the n weight coefficients, containing n values, each corresponding to one weight coefficient, and zero(n) denotes the lower bounds of the n weight coefficients, also containing n values, each corresponding to one weight coefficient. Generally speaking, the weight coefficients should be between 0 and 1; therefore, ones(n) can be n ones and zero(n) can be n zeros, that is,

d = [1 … 1  0 … 0]ᵀ

with n ones and n zeros. Through the above constraints, the value range of the weight coefficients can be fixed between 0 and 1, preventing negative values. In another optional scheme,

d = [p_n; −q_n]

where p_n and q_n are value constraint matrices taking the place of the upper bounds ones(n) and the lower bounds zero(n), and p_n and q_n are determined according to the relative distances of the face key points in the face image. The relative distance refers to the pixel distance between face key points in the face image; the face key points used to calculate one relative distance belong to the same key part, and under different expressions the pixel distances between the face key points of a key part may differ. Therefore, in the embodiment, p_n and q_n are determined through relative distances. All of the face key points of a key part, or only some of them, may be used to calculate the relative distance. It can be understood that each personalized facial expression base corresponds to one p value and one q value; the n p values form p_n, and the n q values form q_n. Through the above formula, the weight coefficients corresponding to different personalized facial expression bases can have different value ranges.

For example, Figure 11 is a schematic diagram of face key point selection provided by an embodiment of the present application. Referring to Figure 11, there are six face key points corresponding to the left eye in the face image, among which face key point P1 and face key point P2 are a pair located on the upper eyelid and the lower eyelid of the left eye, and face key point P3 and face key point P4 are another pair located on the upper eyelid and the lower eyelid of the left eye. When the left eye is closed, P1 and P2 are close to each other or coincide, and P3 and P4 are close to each other or coincide. Therefore, by calculating the distance between P1 and P2 and the distance between P3 and P4, it can be determined whether the left eye is closed. The distance used to determine whether the left eye is closed can then be regarded as the relative distance of the left-eye face key points, and the relative distance of the left-eye face key points is

L = ||p_1 − p_2|| + ||p_3 − p_4||

where L represents the relative distance of the face key points (specifically, a pixel distance), p_1 represents the two-dimensional coordinates of face key point P1 in the face image, p_2 those of P2, p_3 those of P3, and p_4 those of P4. When the left eye is closed, even if a problem occurs in the face key point detection, the value of L will be relatively small, for example L ≤ 5; here, 5 can be understood as the allowed error distance. That is, when a face key point detection problem occurs, as long as the relative distance of the face key points does not exceed the error distance, the current action of the corresponding key part can still be determined, and a reasonable value range can then be set for the weight coefficient of the personalized facial expression base corresponding to that action. For example, after calculating L, if L ≤ 5, the probability that the left eye in the face image is closed is high, so the weight coefficient corresponding to the personalized facial expression base representing a closed left eye should be large; a large value range, such as 0.9–1, can therefore be set for this weight coefficient. In this case, the p value in p_n corresponding to the closed left eye can be 1 and the q value in q_n corresponding to the closed left eye can be 0.9, so that in β the weight coefficient of the personalized facial expression base representing a closed left eye ranges between 0.9 and 1. Similarly, referring to Figure 11, whether the mouth is closed can be determined by calculating the relative distances of the three groups of mouth face key points (the face key points inside the boxes), and a reasonable p value and q value can then be set for the weight coefficient of the personalized facial expression base corresponding to an open mouth: if the relative distance of the face key points does not exceed the error distance (for example, L ≤ 3), the mouth is considered closed, and the p value is set to 0.1 and the q value to 0, so that the weight coefficient of the personalized facial expression base representing an open mouth ranges between 0 and 0.1; if the relative distance exceeds the error distance, the p value is set to 1 and the q value to 0, so that this weight coefficient ranges between 0 and 1. Whether the right eye is closed can be determined by calculating the relative distances of the two groups of right-eye face key points (the face key points inside the boxes), and a reasonable value range can then be set for the weight coefficient of the personalized facial expression base corresponding to a closed right eye. In the above manner, the calculation method of the relative distance corresponding to each personalized facial expression base, the error distance, and the p and q values for the cases where the relative distance does or does not exceed the error distance are determined in advance. Then, when the error parameter formula is constructed, the p and q values are determined by calculating the relative distances of the face key points, and the value ranges of the weight coefficients are determined accordingly; the relative distances and the corresponding error distances can thus be regarded as prior information on the weight coefficients. This tolerates errors caused by inaccurate face key point detection and ensures the accuracy of the subsequent processing.
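Since C = [eye(n); −eye(n)] makes Cβ ≤ d an elementwise box constraint q_n ≤ β ≤ p_n, the constrained solve can be sketched with a box-constrained least-squares routine; scipy.optimize.lsq_linear is one such solver, used here only as an illustration.

```python
import numpy as np
from scipy.optimize import lsq_linear

def solve_beta_constrained(A, b, q_n, p_n):
    """Solve min ||A beta - b||^2 subject to q_n <= beta <= p_n elementwise,
    which is equivalent to C beta <= d with C = [I; -I] and d = [p_n; -q_n]."""
    res = lsq_linear(A, b, bounds=(q_n, p_n))
    return res.x
```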
In yet another embodiment, in order to facilitate the subsequent calculation of the error parameter formula min E'_exp = min ||Aβ − b||², it is converted into quadratic-programming form. Expanding ||Aβ − b||² = βᵀAᵀAβ − 2bᵀAβ + bᵀb and dropping the constant term bᵀb gives the error parameter formula:

min E_exp = min ( (1/2)βᵀAᵀAβ − bᵀAβ )

where Cβ ≤ d.
Here E_exp represents the error parameter, β represents the weight coefficient vector, β = (β_1 β_2 … β_n), n represents the total number of personalized facial expression bases, β_i represents the weight coefficient of the i-th personalized facial expression base, 1 ≤ i ≤ n, A = sR·ΔB, s represents the scale factor when the three-dimensional face model is mapped to the face image, R represents the rigid rotation matrix when the three-dimensional face model is mapped to the face image, ΔB = [B_1 − B_0  B_2 − B_0  …  B_n − B_0], B_0 represents the neutral facial expression base, B_i represents the i-th personalized facial expression base, b = f − t − sR·B_0, f represents the face key points in the face image, and t represents the translation vector when the three-dimensional face model is mapped to the face image; s, R and t are the pose parameters, C represents the constraint parameter of β, and d represents the value range of β. C and d may be determined as in the above embodiments.
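A sketch of this quadratic-programming form, solved with a simple projected-gradient loop, might look as follows; the solver choice is an assumption, since the embodiment only specifies the objective and the constraint Cβ ≤ d.

```python
import numpy as np

def solve_qp_projected(A, b, lower, upper, steps=200):
    """Minimize (1/2) beta^T Q beta + c^T beta over the box [lower, upper]."""
    Q = A.T @ A                      # quadratic term from expanding ||A beta - b||^2
    c = -A.T @ b                     # linear term; the constant b^T b is dropped
    beta = np.clip(np.zeros(A.shape[1]), lower, upper)
    lr = 1.0 / max(np.linalg.norm(Q, 2), 1e-8)   # step size from the largest eigenvalue
    for _ in range(steps):
        grad = Q @ beta + c
        beta = np.clip(beta - lr * grad, lower, upper)  # project back into the box
    return beta
```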
In still another embodiment, the error parameter formula can also be constructed by L1-regularized optimization, and, to ensure that the weight coefficients stay within the correct value ranges, the L1 regularization can be combined with gradient projection when constructing the formula: each time the weight coefficients are computed with the L1-regularized objective, their gradient step is projected back into the value ranges of the weight coefficients, so that the finally computed weight coefficients lie within the corresponding ranges. In this case, the constructed error parameter formula is:

min_θ Σ_{i=1}^{m} ( Σ_{j=1}^{n} x_j^(i)·θ_j − y^(i) )² + λ·Σ_{j=1}^{n} |θ_j|

where x_j^(i) is the information relating, on the x axis, the i-th face key point of the j-th personalized facial expression base mapped into two-dimensional space to the i-th face key point in the face image (this information reflects whether the coordinates of the two face key points agree), x_k^(i) is the corresponding information for the k-th personalized facial expression base, y^(i) is the information relating, on the y axis, the i-th face key point of the three-dimensional face model mapped into two-dimensional space to the i-th face key point in the face image, λ is the L1 regularization coefficient, whose value can be set according to the actual situation, θ_j is the weight coefficient of the j-th personalized facial expression base, θ_k is the weight coefficient of the k-th personalized facial expression base, n is the total number of personalized facial expression bases, and m is the total number of face key points. Further, the weight coefficients are set between 0 and 1, so that |θ_j| = θ_j; for ease of calculation, the above formula can then be converted into a form that additionally references θ̂_j, the weight coefficient of the j-th personalized facial expression base obtained when processing the previous frame of image data. According to the above formulas, the weight coefficient of each personalized facial expression base can be calculated, after which the pose parameters can be calculated from the weight coefficients, for example by means of the error parameter approach used in the above embodiments.
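One way the L1-plus-gradient-projection idea could look in code is sketched below; since the exact converted formulas appear only as equation images in the original, using the previous frame's coefficients merely as a warm start is a simplifying assumption.

```python
import numpy as np

def solve_theta_l1(X, y, lam, theta_prev, steps=200, lr=1e-3):
    """min sum_i (x^(i) . theta - y^(i))^2 + lam * sum_j theta_j, 0 <= theta <= 1.
    X: (m, n) per-key-point base information; y: (m,) targets; lam: L1 coefficient."""
    theta = np.clip(theta_prev.copy(), 0.0, 1.0)      # warm start from the previous frame
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) + lam      # |theta_j| = theta_j on [0, 1]
        theta = np.clip(theta - lr * grad, 0.0, 1.0)  # project the gradient step into [0, 1]
    return theta
```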
In this embodiment, the error parameter formula used in the subsequent calculation process is:
min E_exp = min ( (1/2)βᵀAᵀAβ − bᵀAβ )

where Cβ ≤ d.
The reason for choosing the above error parameter formula is that the subsequent iterative calculation is simple, and quadratic programming and dynamic supervision of the weight parameters are carried out effectively.
Step 260: According to the error parameter formula, determine the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases when the error parameter is minimized.
Exemplarily, after the error parameter formula is constructed, its unknowns are the pose parameters and the weight coefficients. Therefore, the pose parameters and weight coefficients used when the error parameter is minimized can be determined through the error parameter formula, and they are taken as the finally calculated pose parameters and weight coefficients.
In one embodiment, the pose parameters and the weight coefficients are calculated by alternating iteration. For example, initialization parameters are first set for the weight coefficients; the initialization parameters are then substituted into the error parameter formula to fix its weight coefficients, and the calculation is performed to determine the values of the pose parameters that minimize the error parameter in the current calculation. The calculated pose parameter values are then substituted back into the error parameter formula to fix the pose parameters, and the calculation is performed to determine the weight coefficients that minimize the error parameter in the current calculation. At this point, one iteration is considered complete; the currently calculated weight coefficients are then taken and the above process is repeated, until the number of iterations reaches a preset number, or until the error parameter is smaller than a preset parameter threshold. The pose parameters and weight coefficients obtained when the iteration stops are taken as the finally calculated pose parameters and weight coefficients. In one embodiment, step 260 includes steps 261 to 267:
Step 261: Obtain the initialization weight coefficient of each personalized facial expression base, and take the initialization weight coefficients as the current weight coefficients.
Exemplarily, an initialization weight coefficient is a preset weight coefficient, that is, a weight coefficient preset for each personalized facial expression base. Its specific value can be set according to the actual situation; for example, a boundary of the value range of the weight coefficient of a personalized facial expression base can be selected as its initialization weight coefficient.
In the embodiment, to facilitate describing the calculation of the weight coefficients and the pose parameters, the weight coefficients currently in use are recorded as the current weight coefficients. It can be understood that, since the initialization weight coefficients are used when the calculation starts, the initialization weight coefficients are first taken as the current weight coefficients.
Step 262: Substitute the current weight coefficients into the error parameter formula, and calculate the candidate pose parameters of the three-dimensional face model that minimize the error parameter.
The current weight coefficients are substituted into the error parameter formula so that the weight coefficients in the formula become fixed values (the values of the current weight coefficients); at this point the only unknowns of the error parameter formula are the pose parameters. The calculation is then performed according to the error parameter formula to determine the specific values of the pose parameters that minimize the error parameter in the current calculation. In the embodiment, the pose parameters obtained in this calculation are recorded as the candidate pose parameters. The candidate pose parameters can be understood as intermediate values; the purpose of calculating them is to obtain the final pose parameters.
Step 263: Substitute the candidate pose parameters into the error parameter formula, and calculate the candidate weight coefficient of each personalized facial expression base that minimizes the error parameter.
Exemplarily, after the candidate pose parameters are calculated, they are substituted into the error parameter formula so that the pose parameters in the formula become fixed values; at this point the only unknowns of the error parameter formula are the weight coefficients. The calculation is then performed according to the error parameter formula to determine the specific values of the weight coefficients that minimize the error parameter in the current calculation. In the embodiment, the weight coefficients obtained in this calculation are recorded as the candidate weight coefficients. The candidate weight coefficients can be understood as intermediate values; the purpose of calculating them is to obtain the final weight coefficients.
Step 264: Update the current number of iterations.
Exemplarily, one iteration refers to the process of obtaining the candidate pose parameters and the candidate weight coefficients after substituting the current weight coefficients into the error parameter formula. After the candidate pose parameters and candidate weight coefficients are obtained, one iteration is considered complete and the number of iterations is updated, that is, the current number of iterations is increased by 1. It can be understood that each time candidate weight coefficients are obtained, the number of iterations is increased by 1, and the candidate weight coefficients and candidate pose parameters obtained in the most recent iteration are taken as the latest candidate weight coefficients and candidate pose parameters.
Step 265: Determine whether the number of iterations has reached the number threshold; if not, perform step 266; if so, perform step 267.
In the embodiment, the number threshold is used to decide whether to stop the iterative calculation. It can be set according to the actual situation, for example, an appropriate number threshold may be determined from historical empirical data; in this embodiment the number threshold is 5. Exemplarily, after the number of iterations is updated, it is determined whether the current number of iterations has reached the number threshold; if so, the iterative calculation is stopped and step 267 is performed; if not, the iterative calculation continues and step 266 is performed.
Step 266: Take the candidate weight coefficients as the current weight coefficients, and return to step 262.
The candidate weight coefficients obtained in this iteration are taken as the current weight coefficients, and the process returns to step 262 to start a new iteration.
Step 267: Take the finally obtained candidate pose parameters as the pose parameters of the three-dimensional face model, and take the finally obtained candidate weight coefficients as the weight coefficients of the personalized facial expression bases.
The finally obtained candidate pose parameters and candidate weight coefficients refer to those calculated in the most recent iteration when the number of iterations reaches the number threshold. When the number of iterations reaches the number threshold, the iterative calculation is stopped, and the finally obtained candidate pose parameters and candidate weight coefficients are taken as the final pose parameters of the three-dimensional face model and the final weight coefficients of the personalized facial expression bases.
It should be noted that, generally speaking, when the number of iterations satisfies the number threshold and the three-dimensional face model is adjusted according to the finally obtained candidate pose parameters and candidate weight coefficients, the two-dimensional image obtained by mapping the adjusted three-dimensional face model into two-dimensional space is highly similar or identical to the face image.
It can be understood that the above description takes fixing the weight coefficients first as an example; in practical applications, the pose parameters may also be fixed first for the calculation.
It should be noted that, in practical applications, other calculation methods may also be used, which are not limited by the embodiment; it is only necessary to obtain the corresponding pose parameters and weight coefficients when the error parameter is minimized.
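The alternating iteration of steps 261 to 267 can be sketched as follows; solve_pose and solve_weights are hypothetical stand-ins for the two fixed-variable minimizations described above, and the default threshold of 5 follows the embodiment.

```python
def fit(init_weights, solve_pose, solve_weights, max_iters=5):
    weights = init_weights                  # step 261: initialization weight coefficients
    for _ in range(max_iters):              # steps 264-265: count against the number threshold
        pose = solve_pose(weights)          # step 262: candidate pose, weights held fixed
        weights = solve_weights(pose)       # step 263: candidate weights, pose held fixed
    return pose, weights                    # step 267: the last candidates are the result
```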
Step 270: Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
In the above, the current frame image data containing the face image of the target object is acquired, and the neutral facial expression base of the target object is constructed according to the current frame image data and the preset prior information of the face model; the personalized facial expression bases are then obtained according to the neutral facial expression base, the reference neutral expression base and the reference personalized expression bases; a three-dimensional face model is constructed according to the personalized facial expression bases and the neutral facial expression base, and the error parameter formula between the three-dimensional face model and the face image is constructed; the weight coefficients of the personalized facial expression bases and the pose parameters of the three-dimensional face model that minimize the error parameter are then determined according to the error parameter formula, and the weight coefficients and pose parameters are sent to the remote device so that the remote device generates the corresponding virtual image. This technical means solves the technical problems of information leakage and stuttering caused by transmitting real face images in the related art, and ensures the imaging quality of the remote device. When the error parameter formula is constructed, the strategy of quadratic programming with dynamic constraints on the weight coefficients, based on the relative distances of the face key points, effectively eliminates the influence of inaccurate face key point detection on the weight coefficients and improves their accuracy. The basic expressions are defined in a refined manner using FACS, with each basic expression corresponding to one personalized facial expression base, so that the three-dimensional face model contains richer expressions, thereby ensuring that the obtained pose parameters and weight coefficients are close to the real face image. In particular, the FACS-refined basic expressions mainly separate left-right symmetric expressions, so that an asymmetric expression in the face image can be captured and driven effectively, making the obtained pose parameters and weight coefficients close to the real face image. When calculating the weight coefficients and pose parameters, fixing the weight coefficients or the pose parameters and iterating converts the error parameter formula into a linearly solvable formula, which simplifies the calculation process.
Figure 12 is a flowchart of still another virtual image construction method provided by an embodiment of the present application. Referring to Figure 12, the virtual image construction method specifically includes:
Step 310: Acquire current frame image data, where the current frame image data contains the face image of the target object.
Step 320: Construct the neutral facial expression base and multiple personalized facial expression bases of the target object according to the current frame image data.
Step 330: Construct a three-dimensional face model of the target object according to the neutral facial expression base and the multiple personalized facial expression bases.
Step 340: Construct an error parameter formula for when the three-dimensional face model is mapped to the face image.
In the embodiment, the error parameter formula adopted is:
min E_exp = min ( (1/2)βᵀAᵀAβ − bᵀAβ )

Cβ ≤ d
where E_exp represents the error parameter, β represents the weight coefficient vector, β = (β_1 β_2 … β_n), n represents the total number of personalized facial expression bases, β_i represents the weight coefficient of the i-th personalized facial expression base, 1 ≤ i ≤ n, A = sR·ΔB, s represents the scale factor when the three-dimensional face model is mapped to the face image, R represents the rigid rotation matrix when the three-dimensional face model is mapped to the face image, ΔB = [B_1 − B_0  B_2 − B_0  …  B_n − B_0], B_0 represents the neutral facial expression base, B_i represents the i-th personalized facial expression base, b = f − t − sR·B_0, f represents the face key points in the face image, and t represents the translation vector when the three-dimensional face model is mapped to the face image; s, R and t are the pose parameters, C represents the constraint parameter of β, and d represents the value range of β. The reason for choosing this error parameter formula is that the subsequent iterative calculation is simple, and quadratic programming and dynamic supervision of the weight parameters are carried out effectively.
In one embodiment,

d = [ones(n); −zero(n)]

where ones(n) denotes the upper bounds of the n weight coefficients, and zero(n) denotes the lower bounds of the n weight coefficients.
In another embodiment,

d = [p_n; −q_n]

where p_n and q_n are value constraint matrices taking the place of the upper bounds ones(n) and the lower bounds zero(n), and p_n and q_n are determined according to the relative distances of the face key points in the face image.
For example, referring to Figure 11, after the relative distance L of the left-eye face key points in the face image is calculated, if L ≤ 5 then the probability that the left eye in the face image is closed is high, so the weight coefficient corresponding to the personalized facial expression base representing a closed left eye should be large; a large value range, such as 0.9–1, can therefore be set for this weight coefficient, in which case the p value in p_n corresponding to the closed left eye can be 1 and the q value in q_n corresponding to the closed left eye can be 0.9, so that in β the weight coefficient of the personalized facial expression base representing a closed left eye ranges between 0.9 and 1. After the relative distance of the mouth face key points in the face image is calculated, if the relative distance does not exceed the error distance (for example, L ≤ 3), the mouth is considered closed, and the p value is set to 0.1 and the q value to 0, so that the weight coefficient of the personalized facial expression base representing an open mouth ranges between 0 and 0.1; if the relative distance exceeds the error distance (for example, L > 3), the p value is set to 1 and the q value to 0, so that this weight coefficient ranges between 0 and 1. In the above manner, the relative distances of the face key points and the corresponding error distances serve as prior information on the weight coefficients. This tolerates errors caused by inaccurate face key point detection and ensures the accuracy of the subsequent processing.
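A sketch of mapping a relative distance to a (q, p) pair for one expression base follows, using the example thresholds from the text (an error distance of 5 for the eye, bounds 0.9–1 for "left eye closed"); the function names and the summed-distance form of L mirror the reconstruction above and are illustrative.

```python
import numpy as np

def eye_relative_distance(p1, p2, p3, p4):
    """Pixel distance between the two upper/lower-eyelid key-point pairs."""
    return np.linalg.norm(p1 - p2) + np.linalg.norm(p3 - p4)

def eye_closed_bounds(L, err_dist=5.0):
    # within the allowed error distance: the eye is almost surely closed,
    # so the "eye closed" coefficient is confined to a high range
    return (0.9, 1.0) if L <= err_dist else (0.0, 1.0)
```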
Step 350: Find the mutually exclusive expression bases among the personalized facial expression bases.
Mutually exclusive expression bases are personalized facial expression bases whose corresponding expressions cannot appear on a face at the same time. For example, moving the chin to the left and moving the chin to the right cannot appear on a face simultaneously; therefore, the corresponding two personalized facial expression bases can be regarded as mutually exclusive expression bases. For example, Figure 13 is a schematic diagram of mutually exclusive expression bases provided by an embodiment of the present application. Referring to Figure 13, from the reader's point of view, in the left personalized facial expression base both the lips and the chin move to the left, and in the right one both move to the right. A face can only make one of these expressions, not both at the same time. As another example, Figure 14 is a schematic diagram of another pair of mutually exclusive expression bases provided by an embodiment of the present application. Referring to Figure 14, the expression of the left personalized facial expression base is an open mouth, and that of the right one is puffed cheeks. A face cannot puff its cheeks while the mouth is open; therefore, these can be regarded as mutually exclusive expression bases.
It can be understood that the expressions that cannot appear simultaneously in mutually exclusive expression bases are not only the basic expressions corresponding to individual personalized facial expression bases, but also superimposed expressions. When superimposed expressions cannot appear on a face at the same time, the corresponding sets of personalized facial expression bases are also mutually exclusive expression bases. For example, one superimposed expression is wrinkling the nose while frowning the left eyebrow, and another is raising the brow while raising the tail of the left eyebrow; these two superimposed expressions cannot appear on a face at the same time, so the personalized facial expression bases for wrinkling the nose and frowning the left eyebrow are mutually exclusive with those for raising the brow and raising the tail of the left eyebrow.
In the embodiment, taking the basic expressions customized in step 230 and the correspondingly generated personalized facial expression bases as an example, all mutually exclusive expression bases are found among the 26 personalized facial expression bases. It can be understood that the mutually exclusive expression bases may be constructed manually and obtained directly by the virtual image construction device; alternatively, the virtual image construction device may gradually add the basic expressions to the same three-dimensional face model and superimpose them, to determine whether the three-dimensional face model can display the expressions simultaneously, and thereby determine the mutually exclusive expression bases.
In one embodiment, the mutually exclusive expression bases among the 26 personalized facial expression bases are shown in the following table:
Mutual exclusion 1    Mutual exclusion 2
B0                    B2
B1                    B3
B4, B25               B6, B7
B5, B25               B6, B8
B9                    B24
B10                   B11
B12                   B21
B13                   B22
Table 2
In the above table, within the same row, the personalized facial expression bases listed under Mutual exclusion 1 and those listed under Mutual exclusion 2 are mutually exclusive expression bases. It should be noted that "B" in Table 2 corresponds to "Blendshape" in Table 1, and the number after "B" in Table 2 is the number of the "Blendshape".
Step 360: Group the personalized facial expression bases according to the mutually exclusive expression bases to obtain multiple expression base groups, where no two personalized facial expression bases in any expression base group are mutually exclusive.
Since mutually exclusive personalized facial expression bases cannot appear on a face at the same time, in the embodiment the personalized facial expression bases are grouped according to the mutually exclusive expression bases, and each group is recorded as an expression base group. The personalized facial expression bases within each expression base group are then not mutually exclusive. For example, if an expression base group contains the personalized facial expression base corresponding to B1, it will not contain the one corresponding to B3; if it contains the personalized facial expression bases corresponding to B4 and B25, it will not contain those corresponding to B6 and B7.
It can be understood that the specific grouping method is not limited by the embodiment; it is only required that no expression base group contains mutually exclusive personalized facial expression bases.
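Since the embodiment does not fix the grouping method, one possible greedy grouping is sketched below; representing exclusions as pairwise conflicts is a simplifying assumption (the table above lists set-versus-set exclusions such as {B4, B25} versus {B6, B7}).

```python
def group_bases(n_bases, exclusive):
    """Greedily assign each base to the first group it does not conflict with.
    exclusive: a set of frozensets, each holding two conflicting base indices."""
    groups = []
    for b in range(n_bases):
        for g in groups:
            if all(frozenset((b, other)) not in exclusive for other in g):
                g.append(b)          # b conflicts with nothing already in g
                break
        else:
            groups.append([b])       # no compatible group: start a new one
    return groups
```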
Step 370: According to the error parameter formula, calculate the minimum error parameter corresponding to each expression base group, as well as the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases within the expression base group at that minimum error parameter.
After grouping, the calculation is performed per expression base group, optimizing one group at a time. Since the calculation process is the same for each expression base group, the embodiment describes the calculation of one expression base group as an example. Exemplarily, according to the error parameter formula, the weight coefficients of the personalized facial expression bases in the group and the pose parameters of the three-dimensional face model are calculated at the minimum error parameter. Since an expression base group does not contain all the personalized facial expression bases, the weight coefficients of the personalized facial expression bases not contained in the group can be kept at 0 throughout the calculation, which reduces the number of weight coefficients to be solved. It can be understood that iterative calculation can likewise be used here; for details, refer to the process described in step 260, the only difference being that in the finally obtained weight coefficients, those of the personalized facial expression bases not included in the expression base group are 0.
It can be understood that, since each expression base group contains different personalized facial expression bases, the minimum error parameter, weight coefficients and pose parameters obtained may differ when different expression base groups are used for the calculation. Therefore, after each expression base group is calculated in the above manner, each expression base group corresponds to one minimum error parameter, one set of weight coefficients and one set of pose parameters.
Step 380: Among the minimum error parameters corresponding to the expression base groups, select the smallest minimum error parameter.
The smaller the minimum error parameter, the closer the two-dimensional image obtained by mapping the three-dimensional face model into two-dimensional space is to the face image. Therefore, in the embodiment, the smallest of the minimum error parameters corresponding to the expression base groups is selected. Generally speaking, there is only one smallest minimum error parameter.
Step 390: Take the pose parameters and weight coefficients corresponding to the smallest minimum error parameter as the finally obtained pose parameters and weight coefficients.
The expression base group corresponding to the smallest minimum error parameter is identified, and its pose parameters and weight coefficients are taken as the finally obtained pose parameters and weight coefficients. It can be understood that after the three-dimensional face model is adjusted with the pose parameters and weight coefficients corresponding to the smallest minimum error parameter, the three-dimensional face model is closest to the face image.
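Steps 370 to 390 can be sketched as a loop over the expression base groups; solve_group is a hypothetical per-group solver that pins the coefficients outside the group to 0 and returns the group's minimum error, pose parameters and weight coefficients.

```python
def fit_by_groups(groups, solve_group):
    best = None
    for g in groups:
        err, pose, weights = solve_group(g)   # weights outside g are held at 0
        if best is None or err < best[0]:
            best = (err, pose, weights)       # step 380: smallest minimum error so far
    return best[1], best[2]                   # step 390: its pose and weight coefficients
```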
Step 3100: Send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
Optionally, when the weight coefficients are sent, the weight coefficients of the personalized facial expression bases not included in the expression base group are set to 0 and sent to the remote device together with the other weight coefficients. Alternatively, only the weight coefficients of the personalized facial expression bases within the expression base group corresponding to the smallest minimum error parameter are sent; the remote device finds the corresponding personalized expression bases according to the received weight coefficients without using all of the personalized expression bases, and then constructs the corresponding virtual image according to the found personalized expression bases and the corresponding weight coefficients.
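The two sending options can be sketched as follows; the dict-based payload and its field names are illustrative assumptions, not a protocol defined by the embodiment.

```python
def make_payload(pose, weights, group, n_bases, send_full=True):
    if send_full:
        full = [0.0] * n_bases                # bases outside the group are zeroed
        for idx, w in zip(group, weights):
            full[idx] = w
        return {"pose": pose, "weights": full}
    # otherwise send only the group's coefficients plus their base indices
    return {"pose": pose, "indices": list(group), "weights": list(weights)}
```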
In the above, the current frame image data containing the face image of the target object is acquired; the neutral facial expression base and multiple personalized facial expression bases of the target object are constructed according to the current frame image data; a three-dimensional face model is then constructed according to the neutral facial expression base and the multiple personalized facial expression bases, and the error parameter formula for mapping the three-dimensional face model to the face image is constructed; the personalized facial expression bases are then grouped according to the mutually exclusive expression bases, and for each expression base group the weight coefficients of the personalized facial expression bases and the pose parameters of the three-dimensional face model that minimize the error parameter are calculated; the weight coefficients and pose parameters corresponding to the expression base group with the smallest error parameter are then selected and sent to the remote device, so that the remote device displays the virtual image corresponding to the face image through the pose parameters and weight coefficients. This technical means solves the technical problems of information leakage and stuttering caused by transmitting real face images in the related art, reduces the demand for network bandwidth, effectively protects the privacy of the target object, and ensures the imaging quality of the remote device. Further, by finding the mutually exclusive expression bases, grouping accordingly, and calculating the pose parameters and weight coefficients per expression base group, the number of weight coefficients solved each time is reduced, which shrinks the expression base search space, makes the solution of the expression coefficients more accurate and efficient, and at the same time expresses the expression of the face image with fewer personalized facial expression bases.
Figure 15 is a schematic structural diagram of a virtual image construction apparatus provided by an embodiment of the present application. Referring to Figure 15, the virtual image construction apparatus includes: an image acquisition module 401, an expression base construction module 402, a face model construction module 403, a parameter determination module 404, and a parameter sending module 405.
The image acquisition module 401 is configured to acquire current frame image data, where the current frame image data contains the face image of the target object; the expression base construction module 402 is configured to construct the neutral facial expression base and multiple personalized facial expression bases of the target object according to the current frame image data; the face model construction module 403 is configured to construct a three-dimensional face model of the target object according to the neutral facial expression base and the multiple personalized facial expression bases; the parameter determination module 404 is configured to determine the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases when the three-dimensional face model is mapped to the face image; and the parameter sending module 405 is configured to send the pose parameters and the weight coefficients to the remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
On the basis of the above embodiment, the parameter determination module 404 includes: a formula construction unit, configured to construct the error parameter formula for mapping the three-dimensional face model onto the face image; and a formula calculation unit, configured to determine, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases that minimize the error parameter.
On the basis of the above embodiment, the formula calculation unit includes: an initial parameter acquisition subunit, configured to acquire initial weight coefficients of the personalized facial expression bases and take the initial weight coefficients as the current weight coefficients; a first parameter substitution subunit, configured to substitute the current weight coefficients into the error parameter formula and compute the candidate pose parameters of the three-dimensional face model that minimize the error parameter; a second parameter substitution subunit, configured to substitute the candidate pose parameters into the error parameter formula and compute the candidate weight coefficients of the personalized facial expression bases that minimize the error parameter; a count update subunit, configured to update the current iteration count; a count judgment subunit, configured to judge whether the iteration count has reached a count threshold; a return subunit, configured to, when the iteration count has not reached the count threshold, take the candidate weight coefficients as the current weight coefficients and return to the operation of substituting the current weight coefficients into the error parameter formula until the iteration count reaches the count threshold; and a first parameter selection subunit, configured to, when the iteration count reaches the count threshold, take the finally obtained candidate pose parameters as the pose parameters of the three-dimensional face model and the finally obtained candidate weight coefficients as the weight coefficients of the personalized facial expression bases.
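A minimal sketch of this alternating scheme follows, assuming solve_pose fits the scaled rigid transform (for example by Procrustes analysis) and solve_weights is the bounded least-squares step sketched after the constraint discussion below; both helper names are illustrative assumptions:

```python
import numpy as np

def fit_alternating(landmarks, B0, deltas, n_iters=5):
    """Alternately fix weights to solve pose, then fix pose to solve weights."""
    n = deltas.shape[0]
    beta = np.zeros(n)                       # initialization weight coefficients
    for _ in range(n_iters):                 # iteration count threshold
        shape = B0 + np.tensordot(beta, deltas, axes=1)       # blended key point shape
        s, R, t = solve_pose(landmarks, shape)                # candidate pose parameters
        beta = solve_weights(landmarks, s, R, t, B0, deltas)  # candidate weight coefficients
    return (s, R, t), beta
```

Each half-step is a well-conditioned subproblem, which is why the alternation converges quickly in practice for a small fixed iteration budget.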
On the basis of the above embodiment, the error parameter formula is
E_exp = min_β ‖Aβ − b‖², subject to Cβ ≤ d,
where E_exp denotes the error parameter; β denotes the weight coefficient vector, β = (β_1 β_2 … β_n), n denotes the total number of personalized facial expression bases, and β_i denotes the weight coefficient of the i-th personalized facial expression base, 1 ≤ i ≤ n; A = sR·ΔB, where s denotes the scale factor when the three-dimensional face model is mapped onto the face image, R denotes the rigid rotation matrix when the three-dimensional face model is mapped onto the face image, and ΔB = [B_1−B_0 B_2−B_0 … B_n−B_0], with B_0 denoting the neutral facial expression base and B_i denoting the i-th personalized facial expression base; b = f − t − sR·B_0, where f denotes the face key points in the face image and t denotes the translation vector when the three-dimensional face model is mapped onto the face image; s, R, and t are the pose parameters; C denotes the constraint parameter of β; and d denotes the value range of β.
On the basis of the above embodiment, the constraint takes the form
C = [I_n; −I_n], d = [ones(n); zero(n)],
where I_n is the n×n identity matrix, ones(n) denotes the upper bound on the values of the n weight coefficients, and zero(n) denotes the lower bound on the values of the n weight coefficients, so that each β_i is confined between 0 and 1; or
C = [p_n; q_n], d = [ones(n); zero(n)],
where ones(n) denotes the upper bound on the values of the n weight coefficients, zero(n) denotes the lower bound on the values of the n weight coefficients, and p_n and q_n are value constraint matrices determined from the relative distances of the face key points in the face image.
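Under the first constraint form, the weight-solving step is a bounded linear least-squares problem; the sketch below uses SciPy's lsq_linear as one off-the-shelf solver, and assumes for simplicity that the key points f and the projected model live in the same coordinate space, as in the formula b = f − t − sR·B_0:

```python
import numpy as np
from scipy.optimize import lsq_linear

def solve_weights(f, s, R, t, B0, deltas):
    """Bounded least squares for min ||A beta - b||^2 with zero(n) <= beta <= ones(n).

    f: (m, k) observed key points; B0: (m, k) neutral-base key points;
    deltas: (n, m, k) per-base offsets Bi - B0; pose (s, R, t) held fixed.
    """
    n = deltas.shape[0]
    # Column i of A is the rotated, scaled offset of base i: A = sR . delta_B.
    A = np.stack([(s * (d @ R.T)).ravel() for d in deltas], axis=1)
    b = (f - t - s * (B0 @ R.T)).ravel()     # b = f - t - sR . B0
    res = lsq_linear(A, b, bounds=(np.zeros(n), np.ones(n)))
    return res.x                             # the weight coefficient vector beta
```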
On the basis of the above embodiment, the apparatus further includes: an expression base search module, configured to find mutually exclusive expression bases among the personalized facial expression bases before the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases that minimize the error parameter are determined according to the error parameter formula; and an expression base grouping module, configured to group the personalized facial expression bases according to the mutually exclusive expression bases into a plurality of expression base groups, such that no two personalized facial expression bases within any one expression base group are mutually exclusive. The formula calculation unit includes: a group calculation subunit, configured to calculate, according to the error parameter formula, the minimum error parameter corresponding to each expression base group, together with the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases within that group at the minimum error parameter; a minimum parameter selection subunit, configured to select the smallest among the minimum error parameters corresponding to the expression base groups; and a second parameter selection subunit, configured to take the pose parameters and weight coefficients corresponding to that smallest minimum error parameter as the finally obtained pose parameters and weight coefficients.
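As one illustrative realization of the search and grouping modules, mutual exclusion can be treated as a conflict graph over the expression bases and resolved greedily; fit_group below stands for any per-group solver that also returns its residual error (for example, the alternating routine sketched earlier), and is an assumed name:

```python
import numpy as np

def group_bases(n, exclusive_pairs):
    """Greedy grouping: no expression base group may contain a mutually exclusive pair."""
    conflicts = {i: set() for i in range(n)}
    for i, j in exclusive_pairs:
        conflicts[i].add(j)
        conflicts[j].add(i)
    groups = []
    for i in range(n):
        for g in groups:
            if conflicts[i].isdisjoint(g):    # base i conflicts with nobody already in g
                g.add(i)
                break
        else:
            groups.append({i})                # open a new expression base group
    return [sorted(g) for g in groups]

def fit_best_group(landmarks, B0, deltas, groups):
    """Solve every group independently and keep the fit with the smallest error."""
    best = None
    for g in groups:
        pose, beta, err = fit_group(landmarks, B0, deltas[g])
        if best is None or err < best[2]:
            full = np.zeros(deltas.shape[0])
            full[g] = beta                    # bases outside the group keep weight zero
            best = (pose, full, err)
    return best
```

Because each group contains only mutually compatible bases, every inner solve optimizes a shorter weight vector, which is the source of the reduced search space noted above.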
On the basis of the above embodiment, the expression base construction module 402 includes: a neutral expression base construction unit, configured to construct the neutral facial expression base of the target object from the current frame image data and preset face model prior information; and a personalized expression base construction unit, configured to determine the personalized facial expression bases of the target object from the neutral facial expression base, a preset reference neutral expression base, and reference personalized expression bases, each reference personalized expression base corresponding to one personalized facial expression base.
On the basis of the above embodiment, the neutral expression base construction unit includes: a face image detection subunit, configured to detect the face image in the current frame image data; a key point localization subunit, configured to locate face key points in the face image to obtain a key point coordinate array; and a neutral expression base determination subunit, configured to determine the neutral facial expression base of the target object from the face image, the key point coordinate array, and the preset face model prior information.
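Any off-the-shelf landmark detector can serve the detection and localization subunits; the sketch below uses dlib's 68-point predictor as one example, where the model file path is an external asset assumed here and not part of this application:

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed path

def detect_landmarks(frame_gray):
    """Return a (68, 2) key point coordinate array for the first detected face."""
    faces = detector(frame_gray)
    if not faces:
        return None                          # no face in the current frame
    shape = predictor(frame_gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)
```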
On the basis of the above embodiment, the personalized expression base construction unit includes: a deformation information determination subunit, configured to determine deformation information from the reference neutral expression base and the reference personalized expression bases; and a personalized expression base determination subunit, configured to determine the personalized facial expression bases of the target object from the deformation information and the neutral facial expression base.
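On the simplest reading, the deformation information is the per-vertex offset between each reference personalized base and the reference neutral base, re-applied to the target's neutral base; the sketch below illustrates that reading only (a full deformation transfer, e.g. via triangle deformation gradients as in Sumner and Popović, would be a heavier-weight alternative):

```python
import numpy as np

def build_personalized_bases(B0, ref_neutral, ref_personalized):
    """Carry each reference expression's deformation over to the target's neutral base.

    B0: (v, 3) target neutral base; ref_neutral: (v, 3) reference neutral base;
    ref_personalized: (n, v, 3) reference personalized bases.
    """
    deformation = ref_personalized - ref_neutral   # per-vertex deformation information
    return B0[None, :, :] + deformation            # (n, v, 3) target personalized bases
```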
The virtual image construction apparatus provided above can be used to execute the virtual image construction method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
It is worth noting that, in the above embodiment of the virtual image construction apparatus, the units and modules included are divided only according to functional logic, but the division is not limited thereto, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for the convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
FIG. 16 is a schematic structural diagram of a virtual image construction device provided by an embodiment of the present application. As shown in FIG. 16, the virtual image construction device includes a processor 50, a memory 51, an input apparatus 52, and an output apparatus 53. The number of processors 50 in the virtual image construction device may be one or more; FIG. 16 takes one processor 50 as an example. The processor 50, memory 51, input apparatus 52, and output apparatus 53 in the virtual image construction device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 16.
As a computer-readable storage medium, the memory 51 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the virtual image construction method in the embodiments of the present invention (for example, the image acquisition module 401, expression base construction module 402, face model construction module 403, parameter determination module 404, and parameter sending module 405 in the virtual image construction apparatus). The processor 50 executes the various functional applications and data processing of the virtual image construction device by running the software programs, instructions, and modules stored in the memory 51, that is, implements the virtual image construction method described above.
The memory 51 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the virtual image construction device, and the like. In addition, the memory 51 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 51 may further include memories remotely located relative to the processor 50, and these remote memories may be connected to the virtual image construction device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input apparatus 52 can be used to receive input digital or character information and to generate key signal inputs related to user settings and function control of the virtual image construction device, and may also include an image capture apparatus, an audio capture apparatus, and the like. The output apparatus 53 may include a display device such as a display screen. The virtual image construction device may further include a communication apparatus for data communication with other devices.
The above virtual image construction device contains the virtual image construction apparatus, can be used to execute any of the virtual image construction methods, and has the corresponding functions and beneficial effects.
In addition, the embodiments of the present application further provide a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the relevant operations of the virtual image construction method provided by any embodiment of the present application, with the corresponding functions and beneficial effects.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product.
Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, the present application is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present application; the scope of the present application is determined by the scope of the appended claims.

Claims (12)

  1. A virtual image construction method, comprising:
    acquiring current frame image data, the current frame image data containing a face image of a target object;
    constructing a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data;
    constructing a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of personalized facial expression bases;
    determining pose parameters of the three-dimensional face model and weight coefficients of each of the personalized facial expression bases when the three-dimensional face model is mapped onto the face image; and
    sending the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
  2. The virtual image construction method according to claim 1, wherein the determining pose parameters of the three-dimensional face model and weight coefficients of each of the personalized facial expression bases when the three-dimensional face model is mapped onto the face image comprises:
    constructing an error parameter formula for mapping the three-dimensional face model onto the face image; and
    determining, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases that minimize the error parameter.
  3. The virtual image construction method according to claim 2, wherein the determining, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases that minimize the error parameter comprises:
    acquiring initial weight coefficients of the personalized facial expression bases, and taking the initial weight coefficients as current weight coefficients;
    substituting the current weight coefficients into the error parameter formula, and calculating candidate pose parameters of the three-dimensional face model that minimize the error parameter;
    substituting the candidate pose parameters into the error parameter formula, and calculating candidate weight coefficients of the personalized facial expression bases that minimize the error parameter;
    updating a current iteration count;
    judging whether the iteration count reaches a count threshold;
    when the iteration count does not reach the count threshold, taking the candidate weight coefficients as the current weight coefficients, and returning to the operation of substituting the current weight coefficients into the error parameter formula until the iteration count reaches the count threshold; and
    when the iteration count reaches the count threshold, taking the finally obtained candidate pose parameters as the pose parameters of the three-dimensional face model, and taking the finally obtained candidate weight coefficients as the weight coefficients of the personalized facial expression bases.
  4. The virtual image construction method according to claim 2 or 3, wherein the error parameter formula is
    E_exp = min_β ‖Aβ − b‖², subject to Cβ ≤ d,
    where E_exp denotes the error parameter; β denotes the weight coefficient vector, β = (β_1 β_2 … β_n), n denotes the total number of the personalized facial expression bases, and β_i denotes the weight coefficient of the i-th personalized facial expression base, 1 ≤ i ≤ n; A = sR·ΔB, where s denotes the scale factor when the three-dimensional face model is mapped onto the face image, R denotes the rigid rotation matrix when the three-dimensional face model is mapped onto the face image, and ΔB = [B_1−B_0 B_2−B_0 … B_n−B_0], with B_0 denoting the neutral facial expression base and B_i denoting the i-th personalized facial expression base; b = f − t − sR·B_0, where f denotes the face key points in the face image and t denotes the translation vector when the three-dimensional face model is mapped onto the face image; s, R, and t are the pose parameters, C denotes the constraint parameter of β, and d denotes the value range of β.
  5. The virtual image construction method according to claim 4, wherein
    C = [I_n; −I_n] and d = [ones(n); zero(n)], where I_n is the n×n identity matrix, ones(n) denotes the upper bound on the values of the n weight coefficients, and zero(n) denotes the lower bound on the values of the n weight coefficients; or
    C = [p_n; q_n] and d = [ones(n); zero(n)], where ones(n) denotes the upper bound on the values of the n weight coefficients, zero(n) denotes the lower bound on the values of the n weight coefficients, and p_n and q_n are value constraint matrices determined according to the relative distances of the face key points in the face image.
  6. The virtual image construction method according to claim 2, wherein, before the determining, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases that minimize the error parameter, the method further comprises:
    finding mutually exclusive expression bases among the personalized facial expression bases; and
    grouping the personalized facial expression bases according to the mutually exclusive expression bases to obtain a plurality of expression base groups, wherein no two personalized facial expression bases within any one expression base group are mutually exclusive;
    and wherein the determining, according to the error parameter formula, the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases that minimize the error parameter comprises:
    calculating, according to the error parameter formula, a minimum error parameter corresponding to each expression base group, together with the pose parameters of the three-dimensional face model and the weight coefficients of the personalized facial expression bases within the expression base group at the minimum error parameter;
    selecting the smallest minimum error parameter among the minimum error parameters corresponding to the expression base groups; and
    taking the pose parameters and weight coefficients corresponding to the smallest minimum error parameter as the finally obtained pose parameters and weight coefficients.
  7. The virtual image construction method according to claim 1, wherein the constructing a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data comprises:
    constructing the neutral facial expression base of the target object according to the current frame image data and preset face model prior information; and
    determining the personalized facial expression bases of the target object according to the neutral facial expression base, a preset reference neutral expression base, and reference personalized expression bases, each reference personalized expression base corresponding to one personalized facial expression base.
  8. The virtual image construction method according to claim 7, wherein the constructing the neutral facial expression base of the target object according to the current frame image data and the preset face model prior information comprises:
    detecting the face image in the current frame image data;
    locating face key points in the face image to obtain a key point coordinate array; and
    determining the neutral facial expression base of the target object according to the face image, the key point coordinate array, and the preset face model prior information.
  9. The virtual image construction method according to claim 7, wherein the determining the personalized facial expression bases of the target object according to the neutral facial expression base, the preset reference neutral expression base, and the reference personalized expression bases comprises:
    determining deformation information according to the reference neutral expression base and the reference personalized expression bases; and
    determining the personalized facial expression bases of the target object according to the deformation information and the neutral facial expression base.
  10. A virtual image construction apparatus, comprising:
    an image acquisition module, configured to acquire current frame image data, the current frame image data containing a face image of a target object;
    an expression base construction module, configured to construct a neutral facial expression base and a plurality of personalized facial expression bases of the target object according to the current frame image data;
    a face model construction module, configured to construct a three-dimensional face model of the target object according to the neutral facial expression base and the plurality of personalized facial expression bases;
    a parameter determination module, configured to determine pose parameters of the three-dimensional face model and weight coefficients of each of the personalized facial expression bases when the three-dimensional face model is mapped onto the face image; and
    a parameter sending module, configured to send the pose parameters and the weight coefficients to a remote device, so that the remote device generates a virtual image corresponding to the face image according to the pose parameters and the weight coefficients.
  11. A virtual image construction device, comprising:
    one or more processors; and
    a memory for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the virtual image construction method according to any one of claims 1-9.
  12. A computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the virtual image construction method according to any one of claims 1-9 is implemented.
PCT/CN2021/070727 2021-01-07 2021-01-07 Virtual image construction method and apparatus, device, and storage medium WO2022147736A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/070727 WO2022147736A1 (en) 2021-01-07 2021-01-07 Virtual image construction method and apparatus, device, and storage medium
CN202180024686.6A CN115335865A (en) 2021-01-07 2021-01-07 Virtual image construction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/070727 WO2022147736A1 (en) 2021-01-07 2021-01-07 Virtual image construction method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022147736A1 true WO2022147736A1 (en) 2022-07-14

Family

ID=82357818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070727 WO2022147736A1 (en) 2021-01-07 2021-01-07 Virtual image construction method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN115335865A (en)
WO (1) WO2022147736A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024113290A1 (en) * 2022-12-01 2024-06-06 京东方科技集团股份有限公司 Image processing method and apparatus, interactive device, electronic device and storage medium
CN115953813B (en) * 2022-12-19 2024-01-30 北京字跳网络技术有限公司 Expression driving method, device, equipment and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1920886A (en) * 2006-09-14 2007-02-28 浙江大学 Video flow based three-dimensional dynamic human face expression model construction method
CN105528805A (en) * 2015-12-25 2016-04-27 苏州丽多数字科技有限公司 Virtual face animation synthesis method
WO2017137947A1 (en) * 2016-02-10 2017-08-17 Vats Nitin Producing realistic talking face with expression using images text and voice
CN111814652A (en) * 2020-07-03 2020-10-23 广州视源电子科技股份有限公司 Virtual portrait rendering method, device and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220230399A1 (en) * 2021-01-19 2022-07-21 Samsung Electronics Co., Ltd. Extended reality interaction in synchronous virtual spaces using heterogeneous devices
US11995776B2 (en) * 2021-01-19 2024-05-28 Samsung Electronics Co., Ltd. Extended reality interaction in synchronous virtual spaces using heterogeneous devices
CN114972661A (en) * 2022-08-01 2022-08-30 深圳元象信息科技有限公司 Face model construction method, face image generation device and storage medium
CN115222895A (en) * 2022-08-30 2022-10-21 北京百度网讯科技有限公司 Image generation method, device, equipment and storage medium
WO2024108552A1 (en) * 2022-11-25 2024-05-30 广州酷狗计算机科技有限公司 Face driving method and apparatus for virtual model, and device and storage medium
CN116453222A (en) * 2023-04-19 2023-07-18 北京百度网讯科技有限公司 Target object posture determining method, training device and storage medium
CN116453222B (en) * 2023-04-19 2024-06-11 北京百度网讯科技有限公司 Target object posture determining method, training device and storage medium
CN117746381A (en) * 2023-12-12 2024-03-22 北京迁移科技有限公司 Pose estimation model configuration method and pose estimation method

Also Published As

Publication number Publication date
CN115335865A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2022147736A1 (en) Virtual image construction method and apparatus, device, and storage medium
US10679046B1 (en) Machine learning systems and methods of estimating body shape from images
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
US11210804B2 (en) Methods, devices and computer program products for global bundle adjustment of 3D images
US11399141B2 (en) Processing holographic videos
WO2022156640A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
WO2023050992A1 (en) Network training method and apparatus for facial reconstruction, and device and storage medium
Li et al. Object detection in the context of mobile augmented reality
WO2022001236A1 (en) Three-dimensional model generation method and apparatus, and computer device and storage medium
US20150035825A1 (en) Method for real-time face animation based on single video camera
CN111723707B (en) Gaze point estimation method and device based on visual saliency
US11315313B2 (en) Methods, devices and computer program products for generating 3D models
CN111161395A (en) Method and device for tracking facial expression and electronic equipment
US20220198731A1 (en) Pixel-aligned volumetric avatars
CN111815768B (en) Three-dimensional face reconstruction method and device
Chang et al. Salgaze: Personalizing gaze estimation using visual saliency
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
US11158122B2 (en) Surface geometry object model training and inference
Wang et al. Handling occlusion and large displacement through improved RGB-D scene flow estimation
Canton-Ferrer et al. Head orientation estimation using particle filtering in multiview scenarios
CN115460372A (en) Virtual image construction method, device, equipment and storage medium
CN115937365A (en) Network training method, device and equipment for face reconstruction and storage medium
Lee et al. Real-time camera tracking using a particle filter and multiple feature trackers
Jian et al. Realistic face animation generation from videos
US20230290101A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21916793

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21916793

Country of ref document: EP

Kind code of ref document: A1