WO2024098685A1 - Virtual character facial driving method, apparatus, terminal device and readable storage medium - Google Patents

Virtual character facial driving method, apparatus, terminal device and readable storage medium

Info

Publication number
WO2024098685A1
WO2024098685A1 (PCT/CN2023/092007)
Authority
WO
WIPO (PCT)
Prior art keywords
facial
face
user
key points
matrix
Prior art date
Application number
PCT/CN2023/092007
Other languages
English (en)
French (fr)
Inventor
张顺四
卢增
赵寒枫
张强
Original Assignee
广州趣丸网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州趣丸网络科技有限公司
Publication of WO2024098685A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/34 - Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 - Indexing scheme for image data processing or generation, in general
    • G06T 2200/08 - Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Definitions

  • the present application relates to the field of face recognition and driving technology, and in particular to a virtual character face driving method, device, terminal equipment and readable storage medium.
  • Virtual characters can, at the perceptual level, gradually shorten the geographical distance between people and thereby enable real-time interaction, so the number of client-facing applications involving virtual characters keeps increasing.
  • At present, related products such as Kalidokit and ARkit can drive virtual characters.
  • Traditional virtual character driving generally relies on large-scale motion capture equipment, that is, multiple cameras or depth cameras (such as ARkit) are required to extract facial features, which makes it difficult to apply widely on platforms with low computing power (such as low-end PCs and mobile phones) while maintaining good results.
  • Moreover, most current virtual characters are driven through facial key points, that is, by calculating the offset of the key points of the person under test relative to the key points under a normal expression, and driving the virtual character based on the key point offsets and some preset rules.
  • However, the virtual character expressions obtained by this driving method are not refined enough, resulting in a poor display effect.
  • In view of this, the embodiments of the present application provide a virtual character face driving method, apparatus, terminal device and readable storage medium, to overcome the problem in the prior art that the driven virtual character expressions are not refined enough, resulting in a poor display effect.
  • In a first aspect, the present application provides a method for driving a virtual character face.
  • The method includes: obtaining a facial image of a user and performing facial key point detection on the facial image to obtain the user's facial key points; smoothing the user's facial key points; aligning the smoothed user facial key points with standard facial key points using the least squares method to obtain an aligned facial key point matrix; dividing the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part; dividing each expression base according to facial parts to obtain an expression base matrix for each part; inputting each part expression base matrix into the corresponding part key point matrix for sparse coding and calculating the coding coefficients; and using the coding coefficients to drive the face of the virtual character.
  • an embodiment of the present application provides a virtual character face driving device, the device comprising:
  • a facial image acquisition module configured to acquire a facial image of a user
  • a key point detection module is configured to detect key points of the face on the facial image to obtain key points of the user's face
  • a smoothing processing module configured to perform smoothing processing on the key points of the user's face
  • An alignment module is configured to align the smoothed user face key points with the standard face key points using a least squares method to obtain an aligned face key point matrix
  • a first division module is configured to divide the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part;
  • the second division module is configured to divide each expression base according to the facial part to obtain the expression base matrix of each part;
  • a coding coefficient calculation module is configured to input the expression base matrix of each part into the corresponding key point matrix of the part for sparse coding, and calculate each coding coefficient;
  • the driving module is configured to use the encoding coefficients to drive the face of the virtual character.
  • In a third aspect, an embodiment of the present application provides a terminal device, comprising: a memory; one or more processors coupled to the memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to execute the virtual character face driving method provided in the first aspect above.
  • an embodiment of the present application provides a computer-readable storage medium, in which a program code is stored.
  • the program code can be called by a processor to execute the virtual character face driving method provided in the first aspect above.
  • the virtual character face driving method, apparatus, terminal device and readable storage medium provided in the embodiments of the present application first obtain a user's facial image, perform facial key point detection on the facial image, and obtain the user's facial key points; then smooth the user's facial key points; align the smoothed user's facial key points with standard facial key points using the least squares method to obtain an aligned facial key point matrix; then divide the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part; divide each expression base according to facial parts to obtain an expression base matrix for each part; input the expression base matrix for each part into the corresponding part key point matrix for sparse coding, and calculate each coding coefficient; finally, use each coding coefficient to drive the virtual character face.
  • the virtual character facial driving method provided in the embodiment of the present application smoothes the key points of the user's face, which can make the effect of the virtual character's expression driving smoother and more delicate, and adopts a key point alignment and sparse coding method based on the least squares method, which can improve the calculation accuracy and speed, making the virtual character's facial driving effect more realistic and further improving the display effect.
  • FIG1 is a schematic diagram of an application scenario of a virtual character face driving method provided in an embodiment of the present application
  • FIG2 is a schematic diagram of a flow chart of a virtual character face driving method provided by an embodiment of the present application.
  • FIG3 is a schematic diagram of determining key points of a standard face provided by an embodiment of the present application.
  • FIG4 is a schematic diagram of the structure of a vector synthesis method provided by an embodiment of the present application.
  • FIG5 is a schematic diagram of a structure for selecting an anchor point according to an embodiment of the present application.
  • FIG6 is a schematic diagram of the structure of a virtual character face driving device provided in one embodiment of the present application.
  • FIG7 is a schematic diagram of the structure of a terminal device provided in one embodiment of the present application.
  • FIG8 is a schematic diagram of the structure of a computer-readable storage medium provided in one embodiment of the present application.
  • FIG. 1 shows a schematic diagram of an application scenario of a virtual character face driving method provided in an embodiment of the present application.
  • the application scenario includes a terminal device 100 provided in an embodiment of the present application.
  • The terminal device 100 may be any of various electronic devices with a display screen (such as those shown as 102, 104, 106 and 108), including but not limited to smartphones and computer devices, where the computer device may be at least one of a desktop computer, a portable computer, a laptop computer, a tablet computer, and the like.
  • The terminal device may be equipped with a camera or an image acquisition module, which is mainly used to collect the user's video images; the user's facial image is recognized from the video images, and the virtual character's face is then driven based on the user's facial image.
  • the terminal device 100 may generally refer to one of a plurality of terminal devices, and this embodiment is only illustrated by the terminal device 100. Those skilled in the art may know that the number of the above-mentioned terminal devices may be more or less. For example, the above-mentioned terminal devices may be only a few, or the above-mentioned terminal devices may be dozens or hundreds, or more. The embodiment of the present application does not limit the number and type of terminal devices.
  • the terminal device 100 may be used to execute a virtual character face driving method provided in an embodiment of the present application.
  • The terminal device 100 may include a video acquisition module, a face detection module, a face key point smoothing and alignment module, a sparse-coding-based BlenderShape reorganization module, a BlenderShape vertex parsing and storage module, and a Unity driver module. The video acquisition module is mainly used to collect the user's video images; the face detection module is mainly used to perform face detection on the user's video images to obtain the user's facial image; the face key point smoothing and alignment module is mainly used to smooth and align the user's facial key points; the sparse-coding-based BlenderShape reorganization module is mainly used to calculate the coding coefficients; the BlenderShape vertex parsing and storage module is mainly used to store the standard facial key points, and the coding coefficients can be saved in shared memory; the Unity driver module is mainly used to receive the coding coefficients (i.e., the sparse representation coefficients) and render the virtual character's facial animation to the engine in real time, so as to drive the virtual character's facial expressions in real time with a real person's expressions.
  • the application scenario includes, in addition to the terminal device 100 provided in the embodiment of the present application, a server, wherein a network is provided between the server and the terminal device.
  • the network is used to provide a medium for a communication link between the terminal device and the server.
  • the network may include various connection types, such as wired, wireless communication links or optical fiber cables, etc.
  • the server can be a server cluster composed of multiple servers, etc.
  • the terminal device interacts with the server through the network to receive or send messages, etc.
  • the server can be a server that provides various services.
  • the server can be used to execute the steps other than the visual display of the virtual character in a virtual character face driving method provided in an embodiment of the present application.
  • part of the steps can be executed on the terminal device and part of the steps can be executed on the server, which is not limited here.
  • FIG2 shows a schematic flow chart of a virtual character face driving method provided in an embodiment of the present application, which is explained by taking the method applied to the terminal device in FIG1 as an example, and includes the following steps:
  • Step S110 obtaining a facial image of the user, performing facial key point detection on the facial image, and obtaining facial key points of the user.
  • the user may be any person who wants to drive the virtual character to form the facial expression of the virtual character.
  • the user's facial image refers to a picture containing the user's facial information.
  • Facial key points refer to the points that can reflect a person's outline and expression, such as eyebrows, nose, mouth, eyes, and key points of facial contour.
  • obtaining the facial image of the user includes: obtaining a video image of the user; and detecting the video image using a face detection model to obtain the facial image of the user.
  • the camera of the terminal device or other external video acquisition device can be used to acquire a video stream in real time to obtain a video image of the user, and then the video image of the user is detected using a face detection model to obtain a facial image of the user.
  • When the video stream includes multiple frames, face detection needs to be performed on each frame; if no facial image is detected in the current frame, the facial image is detected from the previous frame's image.
  • the face detection model may be an S3FD model.
  • performing facial key point detection on the facial image includes: performing facial key point detection on the facial image using MobileFaceNet.
  • MobileFaceNet is an efficient convolutional neural network model with high accuracy and very fast running speed.
  • facial key point detection multiple facial key points may be obtained, such as 26, 68, 468, and 1000, etc.
  • the number of facial key points may not be fixed, and a corresponding number of facial key points may be selected according to the actual usage scenario and the requirements of recognition accuracy.
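  • As an illustration of this acquisition and detection stage, the following sketch shows a possible capture loop with the previous-frame fallback described above. It is a minimal example and not the patent's implementation: OpenCV is used only for video capture, and `detect_face` (an S3FD-style detector returning a cropped face or None) and `detect_landmarks` (a MobileFaceNet-style 68-point model) are assumed wrapper functions supplied by the caller.

```python
# Minimal sketch of the capture -> face detection -> key point detection loop.
# `detect_face` and `detect_landmarks` are assumed wrappers, not real library calls.
import cv2

def landmark_stream(camera_index, detect_face, detect_landmarks):
    cap = cv2.VideoCapture(camera_index)
    last_face = None
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            face = detect_face(frame)        # cropped face image, or None if no face found
            if face is None:
                face = last_face             # fall back to the previous frame's face image
                if face is None:
                    continue                 # nothing usable yet
            last_face = face
            yield detect_landmarks(face)     # e.g. a (68, 2) array of user face key points
    finally:
        cap.release()
```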
  • Step S120 smoothing the key points of the user's face.
  • Step S130 aligning the smoothed user's facial key points with the standard facial key points using the least square method to obtain an aligned facial key point matrix.
  • Specifically, when the user makes an expression, the facial key points often change, causing the key points to jitter; this jitter eventually affects the generation of the new BlenderShape (i.e., the expression base) and ultimately causes jitter during the process of driving the virtual character.
  • In addition, the orientation and expression of each user's face, and of the same user's face at different moments, are not uniform, or there may be some offset relative to the lens, so the facial image contained in the video image captured by the camera is not necessarily a frontal face.
  • In that case, the extracted facial key points are difficult to compare with the facial key points of the BlenderShapes in the expression base (i.e., the standard facial key points), which to a certain extent affects the virtual character facial driving effect.
  • To solve this problem, in this embodiment the user's facial key points are first smoothed, and the smoothed user facial key points are then aligned with the standard facial key points.
  • BlenderShape refers to 52 basic face models representing different expressions. These basic face models were decoupled as much as possible during production, and each BlenderShape only expresses the change of a single facial part; these 52 BlenderShapes are therefore also called expression bases, and a new expression is obtained by weighting some or all of the 52 expression bases with the corresponding calculated coefficients.
  • The standard facial key points are determined based on these 52 BlenderShapes. First, all 52 BlenderShapes are read into the running environment, and all points on the BlenderShapes and their serial numbers are visualized.
  • Then the points on the 52 BlenderShapes are matched one by one with the facial key points (that is, the key points used when the user drives the virtual character's face, usually 68), and the serial number of each key point is recorded; next, the vertices on the surfaces of the 52 BlenderShapes are parsed and converted into the form of facial key points (usually 68), as shown in Figure 3. Finally, the 52 groups of 68 key points converted from these 52 BlenderShapes are stored in groups in different matrices, recorded as expression base matrices.
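  • A possible way to realize this vertex parsing and storage step is sketched below. It assumes the 52 BlenderShape meshes have already been loaded as vertex arrays and that the 68 matched vertex serial numbers are known; both are inputs here because the patent does not specify a file format or toolchain.

```python
# Sketch of converting the 52 BlenderShape meshes into a standard key point matrix.
# The vertex arrays and the 68 matched vertex indices are assumed to be given.
import numpy as np

def build_expression_bases(blendshape_vertices, landmark_vertex_ids):
    """blendshape_vertices: list of 52 (V, 3) vertex arrays, one per BlenderShape.
    landmark_vertex_ids: the 68 vertex serial numbers matched to the face key points."""
    rows = []
    for verts in blendshape_vertices:
        pts = np.asarray(verts)[landmark_vertex_ids, :2]   # keep x, y of the matched vertices
        rows.append(pts.reshape(-1))                        # flatten to a (136,) vector (68 * 2)
    return np.stack(rows)                                   # (52, 136) standard key point matrix
```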
  • Step S140 dividing the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part.
  • the aligned facial key point matrix may be divided into key point matrices of various facial parts (eg, mouth, left and right eyes, left and right eyebrows, and facial contour).
  • Step S150 dividing each expression base according to the facial part to obtain the expression base matrix of each part.
  • Correspondingly, the key points of the BlenderShapes in each expression base are divided in the same way according to facial parts, and sparse coding is then computed separately for the different facial parts.
  • This decoupled approach not only makes the calculated sparse codes more accurate, but also greatly reduces the amount of computation and speeds up the system.
  • For ease of understanding, the mouth is taken as an example for detailed description; other facial parts are handled similarly to the mouth.
  • The mouth has 20 key points, so the key points related to the mouth in the aligned facial key point matrix are grouped together to form the mouth key point matrix.
  • Among the BlenderShape key points in the expression base, N BlenderShapes have key points related to the mouth and M BlenderShapes have key points related to the nose.
  • The key points of the N mouth-related BlenderShapes are therefore grouped together and stored in one matrix, recorded as the mouth expression base matrix, whose shape is (N, 68*2); similarly, the key points of the M nose-related BlenderShapes are grouped together and stored in one matrix, recorded as the nose expression base matrix, whose shape is (M, 68*2).
  • By analogy, the left and right eyebrows, facial contour, left and right eyes and other parts are stored in the same way; these matrices can then be read directly into a cache without repeated computation.
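  • The per-part division might look like the sketch below. The index groups follow the common 68-point landmark convention and are an assumption, since the patent does not list the exact indices; the patent also keeps only the BlenderShapes relevant to each part, which the optional row mask stands in for.

```python
# Sketch of dividing the aligned key point matrix and the expression bases by facial part.
# The index groups are illustrative (standard 68-point convention), not taken from the patent.
import numpy as np

PART_INDICES = {
    "contour":    list(range(0, 17)),
    "right_brow": list(range(17, 22)),
    "left_brow":  list(range(22, 27)),
    "nose":       list(range(27, 36)),
    "right_eye":  list(range(36, 42)),
    "left_eye":   list(range(42, 48)),
    "mouth":      list(range(48, 68)),   # 20 mouth key points, as in the description
}

def split_by_part(aligned_kp, base_matrix, relevant_rows=None):
    """aligned_kp: (68, 2) aligned face key points; base_matrix: (52, 68, 2) expression bases;
    relevant_rows: optional dict mapping a part name to the BlenderShape rows kept for it."""
    parts = {}
    for name, idx in PART_INDICES.items():
        part_kp = aligned_kp[idx].reshape(1, -1)                    # e.g. (1, 40) for the mouth
        rows = base_matrix if relevant_rows is None else base_matrix[relevant_rows[name]]
        part_base = rows[:, idx, :].reshape(len(rows), -1)          # part expression base matrix
        parts[name] = (part_kp, part_base)
    return parts
```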
  • Step S160 inputting the expression base matrix of each part into the corresponding part key point matrix for sparse coding, and calculating each coding coefficient.
  • Specifically, the expression base matrix of each part (e.g., the mouth expression base matrix) is used to sparsely represent the corresponding part key point matrix (e.g., the mouth key point matrix), thereby obtaining the coding coefficients.
  • Step S170 using each encoding coefficient to drive the face of the virtual character.
  • the coding coefficient is applied to the face of the virtual character to drive it to display the corresponding expression.
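  • The patent only states that the coefficients are placed in shared memory and picked up by the Unity driver module for rendering; a simple, purely illustrative hand-off is sketched below, with a JSON file swapped atomically standing in for the shared memory.

```python
# Illustrative hand-off of the per-frame coding coefficients to the rendering side.
# The transport (a JSON file) is an assumption; the patent only mentions shared memory.
import json, os, tempfile

def publish_coefficients(coeffs, path=None):
    """coeffs: dict mapping each BlenderShape (expression base) name to its weight."""
    path = path or os.path.join(tempfile.gettempdir(), "face_coefficients.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(coeffs, f)
    os.replace(tmp, path)   # atomic swap so the reader never sees a half-written file
```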
  • the virtual character face driving method provided in the embodiment of the present application first obtains the user's facial image, performs facial key point detection on the facial image, and obtains the user's facial key points; then smoothes the user's facial key points; aligns the smoothed user's facial key points with standard facial key points using the least squares method to obtain an aligned facial key point matrix; then divides the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part; divides each expression base according to facial parts to obtain an expression base matrix for each part; inputs the expression base matrix for each part into the corresponding part key point matrix for sparse coding, and calculates each coding coefficient; finally, uses each coding coefficient to drive the virtual character face.
  • the virtual character facial driving method provided in the embodiment of the present application smoothes the key points of the user's face, which can make the effect of the virtual character's expression driving smoother and more delicate, and adopts a key point alignment and sparse coding method based on the least squares method, which can improve the calculation accuracy and speed, making the virtual character's facial driving effect more realistic and further improving the display effect.
  • smoothing is performed on the key points of the user's face, including: using a vector synthesis method to smooth the key points of the user's face.
  • Specifically, a vector synthesis method may be used to smooth the user's facial key points. The key point smoothing by vector synthesis is shown in Figure 4, where t-1, t and t+1 denote the facial key points in three adjacent frames. The vector formed from frame t-1 to frame t is first calculated and multiplied by a coefficient λ to reduce its magnitude; the scaled vector is then added to the point position in frame t+1, so that the motion from frame t to frame t+1 is influenced by the inertia of the motion from frame t-1 to frame t, making the motion change from t-1 to t+1 smoother and more natural.
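  • A minimal sketch of this vector synthesis smoothing is given below, assuming 68x2 key point arrays for three adjacent frames; the value of the coefficient λ is an assumption, since the patent does not give one.

```python
# Vector-synthesis smoothing: the t-1 -> t displacement, shrunk by lambda, is added to
# the raw key points of frame t+1 so the motion keeps some inertia from the previous step.
import numpy as np

def smooth_keypoints(kp_prev2, kp_prev1, kp_current, lam=0.3):
    """kp_prev2, kp_prev1, kp_current: (68, 2) key points at frames t-1, t and t+1."""
    inertia = lam * (kp_prev1 - kp_prev2)   # scaled displacement vector from frame t-1 to t
    return kp_current + inertia             # smoothed key points for frame t+1
```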
  • the user's facial key points after smoothing are aligned with the standard facial key points using the least squares method, including: processing the user's facial key points after smoothing using an affine transformation matrix to obtain transformed user's facial key points; wherein the affine transformation matrix is calculated using the least squares method; selecting anchor points from the standard facial key points; and aligning the transformed user's facial key points with the standard facial key points based on the anchor points.
  • an affine transformation matrix can be used to transform the geometric positions of a group of points to a group of target points.
  • The specific formula (1) is:
  $$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}=sR\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}+T,\qquad R=\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},\quad T=\begin{bmatrix} t_x \\ t_y \end{bmatrix} \tag{1}$$
  • where (x_1, y_1) denotes a user facial key point before being processed by the affine transformation matrix, and (x_2, y_2) denotes the target point obtained by rotation, scaling and translation in sequence.
  • The least squares method can be used to calculate this affine transformation matrix, mainly by solving for its three variables θ, s and t: θ denotes the rotation angle, which together with the trigonometric functions forms the rotation matrix; s denotes the scale factor; and t denotes the offset applied after rotation and scaling.
  • p_i denotes the i-th key point to be transformed and q_i the i-th target key point; the calculated θ, s and t should make sRp_i + T as close as possible to q_i.
  • Formula (2) describes in detail how these three coefficients are calculated; s, R and T correspond to the scale factor, rotation matrix and post-rotation-and-scaling offset in formula (1), and θ, s and t have the same meanings as in formula (1).
  • The specific expression of formula (2) is:
  $$(s,\theta,T)=\arg\min_{s,\theta,T}\sum_{i}\bigl\| sRp_i + T - q_i \bigr\|_2^{2} \tag{2}$$
  • The affine transformation matrix can be recorded as M, and multiplying M by the user facial key point matrix p gives the aligned facial key point matrix p', i.e., p' = M × p; both p and p' have shape (68, 2), where 68 is the number of key points per face and the second dimension holds the x and y coordinates.
  • key point alignment is to better compare the key points of the user's face with the key points of the standard face (i.e., the key points on the BlenderShape), so while aligning as much as possible, the original facial features of the user should be maintained as much as possible.
  • Based on this, some points that do not change significantly with expression changes or facial offsets are used as anchor points; aligning the anchor points of the user face with those of the standard face keeps the user's original facial features while achieving alignment.
  • There may be multiple anchor points; they are shown as the points boxed by rectangles in Figure 5.
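  • One possible realization of this least-squares alignment is the classical closed-form (Procrustes-style) fit sketched below: the similarity parameters s, R and T are estimated on the anchor points only and then applied to all 68 key points. The anchor indices are an assumption; the patent only shows them boxed in Figure 5.

```python
# Least-squares estimate of scale s, rotation R and offset T minimising
# sum_i || s R p_i + T - q_i ||^2, fitted on anchor points and applied to all key points.
import numpy as np

def estimate_similarity(p, q):
    """p, q: (n, 2) source and target points."""
    mu_p, mu_q = p.mean(axis=0), q.mean(axis=0)
    pc, qc = p - mu_p, q - mu_q
    cov = qc.T @ pc / len(p)                  # cross-covariance between target and source
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(2)
    if np.linalg.det(U @ Vt) < 0:             # guard against a reflection solution
        S[1, 1] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / pc.var(axis=0).sum()
    T = mu_q - s * R @ mu_p
    return s, R, T

def align_to_standard(user_kp, standard_kp, anchor_idx):
    """Fit on the anchor points, then transform every user key point."""
    s, R, T = estimate_similarity(user_kp[anchor_idx], standard_kp[anchor_idx])
    return (s * (R @ user_kp.T)).T + T        # aligned face key point matrix p'
```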
  • In one embodiment, when executing step S160, inputting the expression base matrix of each part into the corresponding part key point matrix for sparse coding and calculating the coding coefficients includes: using the expression base matrix of each part to sparsely represent the corresponding part key point matrix to obtain the coding coefficients.
  • For ease of understanding, take the mouth as an example. Assume the facial key point matrix is p', and split out from p' the key point matrix of each facial part (mouth, left and right eyes, left and right eyebrows, facial contour), with mp' denoting the mouth key point matrix.
  • The mouth has 20 key points, i.e., only 40 features when converted into a vector, and there are 15 mouth expression base BlenderShapes, so the resulting mouth expression base matrix F_M has shape (15, 40); to sparsely represent mp' with F_M, mp' is first reshaped to (1, 40).
  • Since each expression base was already decoupled when the BlenderShape expression bases were produced, there is no need to use a large amount of data to learn a basis vector group. It is only necessary to calculate a set of coding coefficients c from the mouth expression base matrix F_M each time a test mp' is input, use c to weight the vector group in F_M to obtain the coefficient representation of mp', and generate a BlenderShape similar to the mouth shape of the face under test, so as to drive the face in real time.
  • The coding coefficients are calculated by the following formula:
  $$c=\arg\min_{c}\;\bigl\| p'-cF_M \bigr\|_2^{2}+\sum_{i=1}^{n} w_i\,\lvert c_i\rvert$$
  • where c denotes the coding coefficients, p' denotes the part key point matrix, F_M denotes the expression base matrix of the M-th part, n denotes the number of expression bases for that part, w denotes a set of weights corresponding to c, and minimize denotes taking the value of c at which the objective above reaches its minimum.
  • The purpose of the above coding coefficient formula is to solve for a set of coefficients c that allows F_M to combine a BlenderShape infinitely close to mp'.
  • The tail of the formula is a regularization term, and w is a set of weights corresponding to c.
  • Adding the regularization term makes the coefficients sparse, suppressing the BlenderShapes that have only a low correlation with mp' and relatively strengthening the expression bases most relevant to the expression under test.
  • In other words, the coding coefficient is the value of c at which the objective above takes its minimum value.
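  • A minimal solver for this per-part sparse coding step is sketched below using plain iterative soft-thresholding (ISTA); the patent does not name a solver, and the weights w, the step count and the exact form of the objective follow the reconstruction above, so they should be read as assumptions. For the mouth, p_part would be the reshaped mp' of length 40 and F_part the (15, 40) matrix F_M.

```python
# Sparse coding of one facial part: find coefficients c so that c @ F_part approximates
# the part key point vector, with a weighted L1 penalty keeping c sparse (solved by ISTA).
import numpy as np

def sparse_code(p_part, F_part, w, n_iter=500):
    """p_part: (d,) part key point vector; F_part: (n, d) part expression base matrix;
    w: (n,) non-negative weights on |c|. Returns the coding coefficients c of shape (n,)."""
    n, d = F_part.shape
    c = np.zeros(n)
    L = np.linalg.norm(F_part @ F_part.T, 2)        # Lipschitz constant of the smooth term
    step = 1.0 / L
    for _ in range(n_iter):
        grad = (c @ F_part - p_part) @ F_part.T     # gradient of 0.5 * ||c F - p||^2
        c = c - step * grad                         # gradient step on the data-fit term
        c = np.sign(c) * np.maximum(np.abs(c) - step * w, 0.0)   # weighted soft-thresholding
    return c
```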
  • the above-mentioned embodiment disclosed in the present application describes in detail a virtual character face driving method.
  • the above-mentioned method disclosed in the present application can be implemented by various forms of equipment. Therefore, the present application also discloses a virtual character face driving device corresponding to the above-mentioned method.
  • the specific embodiments are given below for detailed description.
  • FIG6 is a virtual character face driving device disclosed in an embodiment of the present application, mainly comprising:
  • the facial image acquisition module 610 is configured to acquire a facial image of a user.
  • the key point detection module 620 is configured to perform facial key point detection on the facial image to obtain the user's facial key points.
  • the smoothing processing module 630 is configured to perform smoothing processing on key points of the user's face.
  • the alignment module 640 is configured to align the smoothed user's facial key points with the standard facial key points using the least square method to obtain an aligned facial key point matrix.
  • the first division module 650 is configured to divide the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part.
  • the second division module 660 is configured to divide each expression base according to the facial parts to obtain the expression base matrix of each part.
  • the coding coefficient calculation module 670 is configured to input the expression base matrix of each part into the corresponding part key point matrix for sparse coding, and calculate each coding coefficient.
  • the driving module 680 is configured to use various coding coefficients to drive the face of the virtual character.
  • the facial image acquisition module 610 is configured to acquire a video image of a user; and detect the video image using a face detection model to obtain a facial image of the user.
  • the smoothing processing module 630 is configured to use a vector synthesis method to smooth the key points of the user's face.
  • the alignment module 640 is configured to process the smoothed user facial key points using an affine transformation matrix to obtain transformed user facial key points; wherein the affine transformation matrix is calculated using a least squares method; anchor points are selected from standard facial key points; and the transformed user facial key points are aligned with the standard facial key points based on the anchor points.
  • the coding coefficient calculation module 670 is configured to use the expression base matrix of each part to sparsely represent the corresponding part key point matrix to obtain each coding coefficient.
  • In one embodiment, the coding coefficients are calculated by the following formula:
  $$c=\arg\min_{c}\;\bigl\| p'-cF_M \bigr\|_2^{2}+\sum_{i=1}^{n} w_i\,\lvert c_i\rvert$$
  • where c denotes the coding coefficients, p' denotes the part key point matrix, F_M denotes the expression base matrix of the M-th part, n denotes the number of expression bases for that part, w denotes a set of weights corresponding to c, and minimize denotes taking the value of c at which the objective above reaches its minimum.
  • the key point detection module 620 is configured to perform facial key point detection on the facial image using MobileFaceNet.
  • Each module in the above device can be implemented in whole or in part by software, hardware and a combination thereof.
  • the above modules can be embedded in or independent of the processor in the terminal device in hardware form, or stored in the memory of the terminal device in software form, so that the processor can call and execute the operations corresponding to the above modules.
  • the terminal device 70 may be a computer device.
  • the terminal device 70 in the present application may include one or more of the following components: a processor 72, a memory 74, and one or more application programs, wherein the one or more application programs may be stored in the memory 74 and configured to be executed by the one or more processors 72, and the one or more application programs are configured to execute the method described in the above embodiment of the virtual character face driving method.
  • the processor 72 may include one or more processing cores.
  • the processor 72 uses various interfaces and lines to connect various parts of the entire terminal device 70, and executes various functions and processes data of the terminal device 70 by running or executing instructions, programs, code sets or instruction sets stored in the memory 74, and calling data stored in the memory 74.
  • the processor 72 can be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
  • the processor 72 can integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), and a modem.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is responsible for rendering and drawing display content; and the modem is used to process wireless communications. It is understandable that the above-mentioned modem may not be integrated into the processor 72, but may be implemented separately through a communication chip.
  • the memory 74 may include a random access memory (RAM) or a read-only memory (ROM).
  • the memory 74 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 74 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the following various method embodiments, etc.
  • the data storage area may also store data created by the terminal device 70 during use, etc.
  • FIG. 7 is merely a block diagram of a partial structure related to the scheme of the present application, and does not constitute a limitation on the terminal device to which the scheme of the present application is applied.
  • the specific terminal device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • the terminal device provided in the embodiment of the present application is used to implement the corresponding virtual character face driving method in the aforementioned method embodiment, and has the beneficial effects of the corresponding method embodiment, which will not be repeated here.
  • FIG 8 shows a block diagram of a computer-readable storage medium provided in an embodiment of the present application.
  • the computer-readable storage medium 80 stores program code, which can be called by a processor to execute the method described in the above embodiment of the virtual character face driving method.
  • the computer-readable storage medium 80 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM.
  • the computer-readable storage medium 80 comprises a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 80 has storage space for program code 82 for executing any method steps in the above method. These program codes may be read from or written to one or more computer program products.
  • the program code 82 may be compressed, for example, in a suitable form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A virtual character facial driving method, apparatus, terminal device and readable storage medium. The method includes: obtaining a facial image of a user and performing facial key point detection on the facial image to obtain the user's facial key points (S110); smoothing the user's facial key points (S120); aligning the smoothed user facial key points with standard facial key points using the least squares method to obtain an aligned facial key point matrix (S130); dividing the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part (S140); dividing each expression base according to facial parts to obtain an expression base matrix for each part (S150); inputting each part expression base matrix into the corresponding part key point matrix for sparse coding and calculating the coding coefficients (S160); and using the coding coefficients to drive the face of the virtual character (S170). The method makes the virtual character's expression driving smoother and more refined, and makes the facial driving effect more realistic.

Description

Virtual character facial driving method, apparatus, terminal device and readable storage medium
This application claims priority to the Chinese patent application No. 202211381723.1, entitled "Virtual character facial driving method, apparatus, terminal device and readable storage medium", filed with the Chinese Patent Office on November 7, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of face recognition and driving, and in particular to a virtual character facial driving method, apparatus, terminal device and readable storage medium.
Background
With the continuous development of communication methods, information is transmitted faster and faster; at the same time, virtual characters are seizing this opportunity to keep breaking through and innovating. Virtual characters can, at the perceptual level, gradually shorten the geographical distance between people and thereby enable real-time interaction, so the number of client-facing applications involving virtual characters keeps increasing. At present, related products such as Kalidokit and ARkit can drive virtual characters.
Traditional virtual character driving generally relies on large-scale motion capture equipment, that is, multiple cameras or depth cameras (such as ARkit) are required to extract features from the face, which makes it difficult to apply widely on platforms with low computing power (such as low-end PCs and mobile phones) while maintaining good results. Moreover, most current virtual characters are driven through facial key points, that is, by calculating the offset of the key points of the person under test relative to the key points under a normal expression, and driving the virtual character based on the key point offsets and some preset rules. However, the virtual character expressions obtained by this driving method are not refined enough, resulting in a poor display effect.
Summary
In view of this, the embodiments of the present application provide a virtual character facial driving method, apparatus, terminal device and readable storage medium, to overcome the problem in the prior art that the driven virtual character expressions are not refined enough, resulting in a poor display effect.
In a first aspect, an embodiment of the present application provides a virtual character facial driving method, the method including:
obtaining a facial image of a user, and performing facial key point detection on the facial image to obtain the user's facial key points;
smoothing the user's facial key points;
aligning the smoothed user facial key points with standard facial key points using the least squares method, to obtain an aligned facial key point matrix;
dividing the aligned facial key point matrix according to facial parts, to obtain a key point matrix for each part;
dividing each expression base according to facial parts, to obtain an expression base matrix for each part;
inputting each of the part expression base matrices into the corresponding part key point matrix for sparse coding, and calculating the coding coefficients;
using the coding coefficients to drive the face of the virtual character.
In a second aspect, an embodiment of the present application provides a virtual character facial driving apparatus, the apparatus including:
a facial image acquisition module, configured to acquire a facial image of a user;
a key point detection module, configured to perform facial key point detection on the facial image to obtain the user's facial key points;
a smoothing module, configured to smooth the user's facial key points;
an alignment module, configured to align the smoothed user facial key points with standard facial key points using the least squares method to obtain an aligned facial key point matrix;
a first division module, configured to divide the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part;
a second division module, configured to divide each expression base according to facial parts to obtain an expression base matrix for each part;
a coding coefficient calculation module, configured to input each of the part expression base matrices into the corresponding part key point matrix for sparse coding and calculate the coding coefficients;
a driving module, configured to use the coding coefficients to drive the face of the virtual character.
In a third aspect, an embodiment of the present application provides a terminal device, including: a memory; one or more processors coupled to the memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to execute the virtual character facial driving method provided in the first aspect above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing program code, where the program code can be called by a processor to execute the virtual character facial driving method provided in the first aspect above.
With the virtual character facial driving method, apparatus, terminal device and readable storage medium provided in the embodiments of the present application, a facial image of a user is first obtained and facial key point detection is performed on it to obtain the user's facial key points; the user's facial key points are then smoothed; the smoothed user facial key points are aligned with standard facial key points using the least squares method to obtain an aligned facial key point matrix; the aligned facial key point matrix is then divided according to facial parts to obtain a key point matrix for each part; each expression base is divided according to facial parts to obtain an expression base matrix for each part; each part expression base matrix is input into the corresponding part key point matrix for sparse coding and the coding coefficients are calculated; finally, the coding coefficients are used to drive the face of the virtual character.
The virtual character facial driving method provided in the embodiments of the present application smooths the user's facial key points, which makes the virtual character's expression driving smoother and more refined, and adopts a least-squares-based key point alignment and sparse coding method, which improves calculation accuracy and speed, makes the virtual character's facial driving effect more realistic, and further improves the display effect.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and other drawings can be obtained from the provided drawings by those of ordinary skill in the art without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the virtual character facial driving method provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a virtual character facial driving method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of determining standard facial key points provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a vector synthesis method provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of selecting anchor points provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a virtual character facial driving apparatus provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer-readable storage medium provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
To explain the present application in more detail, the virtual character facial driving method, apparatus, terminal device and computer-readable storage medium provided by the present application are specifically described below with reference to the accompanying drawings.
Please refer to FIG. 1, which shows a schematic diagram of an application scenario of the virtual character facial driving method provided in an embodiment of the present application. The application scenario includes a terminal device 100 provided in an embodiment of the present application. The terminal device 100 may be any of various electronic devices with a display screen (such as those shown as 102, 104, 106 and 108), including but not limited to smartphones and computer devices, where the computer device may be at least one of a desktop computer, a portable computer, a laptop computer, a tablet computer, and the like. The terminal device may be equipped with a camera or an image acquisition module, which is mainly used to collect the user's video images; the user's facial image is recognized from the video images, and the virtual character's face is then driven based on the user's facial image. The terminal device 100 may generally refer to one of a plurality of terminal devices, and this embodiment is illustrated only with the terminal device 100. Those skilled in the art will understand that the number of terminal devices may be larger or smaller; for example, there may be only a few terminal devices, or there may be dozens, hundreds, or more. The embodiments of the present application do not limit the number or type of terminal devices. The terminal device 100 may be used to execute the virtual character facial driving method provided in the embodiments of the present application.
Optionally, the terminal device 100 may include a video acquisition module, a face detection module, a face key point smoothing and alignment module, a sparse-coding-based BlenderShape reorganization module, a BlenderShape vertex parsing and storage module, and a Unity driver module. The video acquisition module is mainly used to collect the user's video images; the face detection module is mainly used to perform face detection on the user's video images to obtain the user's facial image; the face key point smoothing and alignment module is mainly used to smooth and align the user's facial key points; the sparse-coding-based BlenderShape reorganization module is mainly used to calculate the coding coefficients; the BlenderShape vertex parsing and storage module is mainly used to store the standard facial key points, and the coding coefficients can be saved in shared memory; the Unity driver module is mainly used to receive the coding coefficients (i.e., the sparse representation coefficients) and render the virtual character's facial animation to the engine in real time, so as to drive the virtual character's facial expressions in real time with a real person's expressions.
In an optional implementation, in addition to the terminal device 100 provided in the embodiment of the present application, the application scenario may also include a server, with a network provided between the server and the terminal device. The network is used to provide a medium for a communication link between the terminal device and the server. The network may include various connection types, such as wired or wireless communication links, or optical fiber cables.
It should be understood that the numbers of terminal devices, networks and servers are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation needs; for example, the server may be a server cluster composed of multiple servers. The terminal device interacts with the server through the network to receive or send messages and the like. The server may be a server providing various services. The server may be used to execute the steps of the virtual character facial driving method provided in the embodiments of the present application other than the visual display of the virtual character. In addition, when executing the virtual character facial driving method provided in the embodiments of the present application, some steps may be executed on the terminal device and some steps may be executed on the server, which is not limited here.
On this basis, an embodiment of the present application provides a virtual character facial driving method. Please refer to FIG. 2, which shows a schematic flow chart of a virtual character facial driving method provided in an embodiment of the present application; the method is described by taking its application to the terminal device in FIG. 1 as an example, and includes the following steps:
Step S110: obtaining a facial image of a user, performing facial key point detection on the facial image, and obtaining the user's facial key points.
The user may be any person who wants to drive a virtual character so as to form the virtual character's facial expression. The user's facial image refers to a picture containing the user's facial information.
Facial key points are points that can reflect a person's contour and expression, such as key points of the eyebrows, nose, mouth, eyes and facial contour.
In one embodiment, in step S110, obtaining the facial image of the user includes: obtaining a video image of the user; and detecting the video image using a face detection model to obtain the user's facial image.
Specifically, the camera of the terminal device or another external video acquisition device (such as a video camera) can be used to obtain a video stream in real time so as to obtain the user's video images, and the user's video images are then detected using a face detection model to obtain the user's facial image.
When the video stream includes multiple frames, face detection needs to be performed on each frame; if no facial image is detected in the current frame, the facial image is detected from the previous frame's image.
Optionally, the face detection model may be an S3FD model.
In one embodiment, in step S110, performing facial key point detection on the facial image includes: performing facial key point detection on the facial image using MobileFaceNet.
MobileFaceNet is an efficient convolutional neural network model with high accuracy and a very fast running speed.
In addition, multiple facial key points can be obtained after facial key point detection, for example 26, 68, 468 or 1000 points. The number of facial key points is not fixed; a corresponding number of facial key points is selected according to the actual usage scenario and the required recognition accuracy.
Step S120: smoothing the user's facial key points.
Step S130: aligning the smoothed user facial key points with standard facial key points using the least squares method, to obtain an aligned facial key point matrix.
Specifically, when the user makes an expression, the facial key points often change, which causes the key points to jitter; this jitter ultimately affects the generation of the new BlenderShape (i.e., the expression base) and finally causes jitter during the process of driving the virtual character. Moreover, the orientation and expression of each user's face, and of the same user's face at different moments, are not uniform, or there may be some offset relative to the lens, so the facial image contained in the video image captured by the camera is not necessarily a frontal face. In that case, the extracted facial key points are difficult to compare with the facial key points of the BlenderShapes in the expression base (i.e., the standard facial key points), which to a certain extent affects the virtual character facial driving effect. To solve this problem, in this embodiment the user's facial key points are first smoothed, and the smoothed user facial key points are then aligned with the standard facial key points.
The standard facial key points are determined based on the expression bases (i.e., the BlenderShapes). The specific process is as follows. BlenderShape refers to 52 basic face models representing different expressions; these basic face models were decoupled as much as possible during production, and each BlenderShape only expresses the change of a single facial part. These 52 BlenderShapes are therefore also called expression bases, and a new expression is obtained by weighting some or all of the 52 expression bases with the corresponding calculated coefficients. The standard facial key points are determined from these 52 BlenderShapes: first, all 52 BlenderShapes are read into the running environment, and all points on the BlenderShapes and their serial numbers are visualized; then the points on the 52 BlenderShapes are matched one by one with the facial key points (that is, the key points used when the user drives the virtual character's face, usually 68), and the serial number of each key point is recorded; after that, the vertices on the surfaces of the 52 BlenderShapes are parsed and converted into the form of facial key points (usually 68), as shown in FIG. 3. Finally, the 52 groups of 68 key points converted from these 52 BlenderShapes are stored in groups in different matrices, recorded as expression base matrices.
When aligning the smoothed user facial key points with the standard facial key points, an affine transformation matrix needs to be applied to the smoothed user facial key points, and the parameters of this affine transformation matrix are calculated by the least squares method.
Step S140: dividing the aligned facial key point matrix according to facial parts, to obtain a key point matrix for each part.
Specifically, the aligned facial key point matrix can be divided into key point matrices of the various facial parts (for example, the mouth, left and right eyes, left and right eyebrows, and facial contour).
Step S150: dividing each expression base according to facial parts, to obtain an expression base matrix for each part.
Correspondingly, the key points of the BlenderShapes in each expression base are divided in the same way according to facial parts, and sparse coding is then computed for the different facial parts. This decoupled approach not only makes the calculated sparse codes more accurate, but also greatly reduces the amount of computation and speeds up the system.
For ease of understanding, a detailed embodiment is given. In this embodiment the mouth is taken as an example for detailed description; other facial parts are similar to the mouth. The mouth has 20 key points, so the key points related to the mouth in the aligned facial key point matrix are grouped together to form the mouth key point matrix. Among the BlenderShape key points in the expression base, N BlenderShapes have key points related to the mouth and M have key points related to the nose; therefore, the key points of the N mouth-related BlenderShapes are grouped together and stored in the same matrix, recorded as the mouth expression base matrix, whose shape is (N, 68*2); similarly, the key points of the M nose-related BlenderShapes are grouped together and stored in the same matrix, recorded as the nose expression base matrix, whose shape is (M, 68*2). By analogy, the left and right eyebrows, facial contour, left and right eyes and other parts are stored in the same way. These matrices can then be read directly into a cache without repeated computation.
Step S160: inputting each part expression base matrix into the corresponding part key point matrix for sparse coding, and calculating the coding coefficients.
Specifically, each part expression base matrix (for example, the mouth expression base matrix) is used to sparsely represent the corresponding part key point matrix (the mouth key point matrix), thereby obtaining the coding coefficients.
Step S170: using the coding coefficients to drive the face of the virtual character.
After the coding coefficients are obtained, they are applied to the virtual character's face to drive it to display the corresponding expression.
With the virtual character facial driving method provided in the embodiments of the present application, a facial image of a user is first obtained and facial key point detection is performed on it to obtain the user's facial key points; the user's facial key points are then smoothed; the smoothed user facial key points are aligned with standard facial key points using the least squares method to obtain an aligned facial key point matrix; the aligned facial key point matrix is then divided according to facial parts to obtain a key point matrix for each part; each expression base is divided according to facial parts to obtain an expression base matrix for each part; each part expression base matrix is input into the corresponding part key point matrix for sparse coding and the coding coefficients are calculated; finally, the coding coefficients are used to drive the face of the virtual character.
The virtual character facial driving method provided in the embodiments of the present application smooths the user's facial key points, which makes the virtual character's expression driving smoother and more refined, and adopts a least-squares-based key point alignment and sparse coding method, which improves calculation accuracy and speed, makes the virtual character's facial driving effect more realistic, and further improves the display effect.
In one embodiment, in step S120, smoothing the user's facial key points includes: smoothing the user's facial key points using a vector synthesis method.
Specifically, a vector synthesis method can be used to smooth the user's facial key points. The key point smoothing by vector synthesis is shown in FIG. 4, where t-1, t and t+1 denote the facial key points in three adjacent frames. The vector formed from frame t-1 to frame t is first calculated and multiplied by a coefficient λ to reduce its magnitude; the scaled vector is then added to the point position in frame t+1, so that the motion from frame t to frame t+1 is influenced by the inertia of the motion from frame t-1 to frame t, making the motion change from t-1 to t+1 smoother and more natural.
In one embodiment, in step S130, aligning the smoothed user facial key points with the standard facial key points using the least squares method includes: processing the smoothed user facial key points with an affine transformation matrix to obtain transformed user facial key points, where the affine transformation matrix is calculated by the least squares method; selecting anchor points from the standard facial key points; and aligning the transformed user facial key points with the standard facial key points based on the anchor points.
Specifically, in order to align the smoothed user facial key points with the standard facial key points and prepare for the subsequent calculation of the coding coefficients, an affine transformation matrix can be used to transform the geometric positions of a group of points onto a target group of points. The specific formula is as follows:
$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}=sR\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}+T,\qquad R=\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},\quad T=\begin{bmatrix} t_x \\ t_y \end{bmatrix} \tag{1}$$
where (x_1, y_1) denotes a user facial key point before being processed by the affine transformation matrix, and (x_2, y_2) denotes the target point obtained by rotation, scaling and translation in sequence.
In this embodiment, the least squares method can be used to calculate this affine transformation matrix, mainly by computing its three variables θ, s and t. θ denotes the rotation angle, which together with the trigonometric functions forms a rotation matrix; s denotes the scale factor; t denotes the offset applied after rotation and scaling. p_i denotes the i-th key point to be transformed and q_i the i-th target key point; the computed θ, s and t should make sRp_i + T as close as possible to q_i. Formula (2) describes in detail how these three coefficients are calculated; s, R and T correspond to the scale factor, rotation matrix and post-rotation-and-scaling offset in formula (1), and θ, s and t have the same meanings as in formula (1). The specific expression of formula (2) is:
$$(s,\theta,T)=\arg\min_{s,\theta,T}\sum_{i}\bigl\| sRp_i + T - q_i \bigr\|_2^{2} \tag{2}$$
The affine transformation matrix can be recorded as M, and multiplying M by the user facial key point matrix p gives the aligned facial key point matrix p'. Both p and p' have shape (68, 2): 68 means each face's key point matrix has 68 points, and the second dimension of the matrix holds the coordinates on the x and y axes; that is, p' = M × p.
In addition, the purpose of key point alignment is to better compare the user's facial key points with the standard facial key points (i.e., the key points on the BlenderShapes); therefore, while aligning as well as possible, the user's original facial features should be preserved as much as possible. Based on this, in this embodiment some points that do not change significantly with expression changes or facial offsets are used as anchor points; as long as the anchor points of the user face and the standard face are aligned, alignment is ensured while the facial features of the user's face are preserved. There may be multiple anchor points; they are shown as the points boxed by rectangles in FIG. 5.
In one embodiment, in step S160, inputting each part expression base matrix into the corresponding part key point matrix for sparse coding and calculating the coding coefficients includes: using each part expression base matrix to sparsely represent the corresponding part key point matrix to obtain the coding coefficients.
For ease of understanding, the mouth is taken as an example. Assume the facial key point matrix is p', and split out from p' the key point matrix belonging to each facial part (mouth, left and right eyes, left and right eyebrows, facial contour), with mp' denoting the mouth key point matrix. Optionally, the mouth has 20 key points, i.e., only 40 features when converted into a vector, and there are 15 mouth expression base BlenderShapes, so the resulting mouth expression base matrix F_M has shape (15, 40). F_M can be used to sparsely represent mp'; to do so, the shape of mp' first needs to be converted to (1, 40). Since each expression base was already decoupled when the BlenderShape expression bases were produced, there is no need to use a large amount of data to learn a basis vector group; it is only necessary to calculate a set of coding coefficients c from the mouth expression base matrix F_M each time a test mp' is input, use c to perform a weighted calculation on the vector group in F_M to obtain the coefficient representation of mp', and generate a BlenderShape similar to the mouth shape of the face under test, thereby driving the face in real time.
In one embodiment, the coding coefficients are calculated by the following formula:
$$c=\arg\min_{c}\;\bigl\| p'-cF_M \bigr\|_2^{2}+\sum_{i=1}^{n} w_i\,\lvert c_i\rvert$$
where c denotes the coding coefficients, p' denotes the part key point matrix, F_M denotes the expression base matrix of the M-th part, n denotes the number of expression bases in the part expression base, w denotes a set of weights corresponding to c, and minimize denotes taking the value at which the objective above is minimal.
The purpose of the above coding coefficient formula is to solve for a set of coefficients c that allows F_M to combine a BlenderShape infinitely close to mp'. The tail of the formula is a regularization term, and w is a set of weights corresponding to c; adding the regularization term makes the coefficients sparse, suppressing the BlenderShapes that have only a low correlation with mp' and relatively strengthening the BlenderShapes of the expression bases most relevant to the expression under test.
In addition, the meaning of the above coding coefficient calculation formula is: the coding coefficient is the value of c at which the above objective takes its minimum value.
It should be understood that, although the steps in the flow chart of FIG. 2 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above-disclosed embodiments of the present application describe a virtual character facial driving method in detail. The above-disclosed method can be implemented by devices of various forms; therefore, the present application also discloses a virtual character facial driving apparatus corresponding to the above method, and specific embodiments are given below for detailed description.
Please refer to FIG. 6, which shows a virtual character facial driving apparatus disclosed in an embodiment of the present application, mainly including:
a facial image acquisition module 610, configured to acquire a facial image of a user;
a key point detection module 620, configured to perform facial key point detection on the facial image to obtain the user's facial key points;
a smoothing module 630, configured to smooth the user's facial key points;
an alignment module 640, configured to align the smoothed user facial key points with standard facial key points using the least squares method to obtain an aligned facial key point matrix;
a first division module 650, configured to divide the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part;
a second division module 660, configured to divide each expression base according to facial parts to obtain an expression base matrix for each part;
a coding coefficient calculation module 670, configured to input each part expression base matrix into the corresponding part key point matrix for sparse coding and calculate the coding coefficients;
a driving module 680, configured to use the coding coefficients to drive the face of the virtual character.
In one embodiment, the facial image acquisition module 610 is configured to acquire a video image of the user, and to detect the video image using a face detection model to obtain the user's facial image.
In one embodiment, the smoothing module 630 is configured to smooth the user's facial key points using a vector synthesis method.
In one embodiment, the alignment module 640 is configured to process the smoothed user facial key points with an affine transformation matrix to obtain transformed user facial key points, where the affine transformation matrix is calculated by the least squares method; to select anchor points from the standard facial key points; and to align the transformed user facial key points with the standard facial key points based on the anchor points.
In one embodiment, the coding coefficient calculation module 670 is configured to use each part expression base matrix to sparsely represent the corresponding part key point matrix, so as to obtain the coding coefficients.
In one embodiment, the coding coefficients are calculated by the following formula:
$$c=\arg\min_{c}\;\bigl\| p'-cF_M \bigr\|_2^{2}+\sum_{i=1}^{n} w_i\,\lvert c_i\rvert$$
where c denotes the coding coefficients, p' denotes the part key point matrix, F_M denotes the expression base matrix of the M-th part, n denotes the number of expression bases in the part expression base, w denotes a set of weights corresponding to c, and minimize denotes taking the value at which the objective above is minimal.
In one embodiment, the key point detection module 620 is configured to perform facial key point detection on the facial image using MobileFaceNet.
For the specific limitations of the virtual character facial driving apparatus, reference may be made to the limitations of the method above, which will not be repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded in or independent of the processor in the terminal device in hardware form, or stored in the memory of the terminal device in software form, so that the processor can call and execute the operations corresponding to each of the above modules.
Please refer to FIG. 7, which shows a structural block diagram of a terminal device provided in an embodiment of the present application. The terminal device 70 may be a computer device. The terminal device 70 in the present application may include one or more of the following components: a processor 72, a memory 74, and one or more application programs, where the one or more application programs may be stored in the memory 74 and configured to be executed by the one or more processors 72, and the one or more application programs are configured to execute the method described in the above embodiments of the virtual character facial driving method.
The processor 72 may include one or more processing cores. The processor 72 uses various interfaces and lines to connect the various parts of the entire terminal device 70, and executes various functions of the terminal device 70 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 74 and by calling data stored in the memory 74. Optionally, the processor 72 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA) and programmable logic array (PLA). The processor 72 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem and the like. The CPU mainly handles the operating system, user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It can be understood that the modem may also not be integrated into the processor 72 and may instead be implemented separately by a communication chip.
The memory 74 may include a random access memory (RAM) or a read-only memory (ROM). The memory 74 may be used to store instructions, programs, code, code sets or instruction sets. The memory 74 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the terminal device 70 during use, and the like.
Those skilled in the art will understand that the structure shown in FIG. 7 is merely a block diagram of a part of the structure related to the solution of the present application and does not constitute a limitation on the terminal device to which the solution of the present application is applied; a specific terminal device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
In summary, the terminal device provided in the embodiments of the present application is used to implement the corresponding virtual character facial driving method in the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
Please refer to FIG. 8, which shows a structural block diagram of a computer-readable storage medium provided in an embodiment of the present application. The computer-readable storage medium 80 stores program code, which can be called by a processor to execute the method described in the above embodiments of the virtual character facial driving method.
The computer-readable storage medium 80 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 80 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 80 has storage space for program code 82 that executes any of the method steps in the above method. The program code can be read from or written into one or more computer program products. The program code 82 may, for example, be compressed in an appropriate form.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples" and the like means that the specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict each other.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application will not be limited to the embodiments shown herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. A virtual character facial driving method, characterized in that the method comprises:
    obtaining a facial image of a user, and performing facial key point detection on the facial image to obtain the user's facial key points;
    smoothing the user's facial key points;
    aligning the smoothed user facial key points with standard facial key points using the least squares method, to obtain an aligned facial key point matrix;
    dividing the aligned facial key point matrix according to facial parts, to obtain a key point matrix for each part;
    dividing each expression base according to facial parts, to obtain an expression base matrix for each part;
    inputting each of the part expression base matrices into the corresponding part key point matrix for sparse coding, and calculating the coding coefficients;
    using the coding coefficients to drive the face of the virtual character.
  2. The method according to claim 1, characterized in that obtaining the facial image of the user comprises:
    obtaining a video image of the user;
    detecting the video image using a face detection model, to obtain the facial image of the user.
  3. The method according to claim 1, characterized in that smoothing the user's facial key points comprises:
    smoothing the user's facial key points using a vector synthesis method.
  4. The method according to any one of claims 1 to 3, characterized in that aligning the smoothed user facial key points with the standard facial key points using the least squares method comprises:
    processing the smoothed user facial key points with an affine transformation matrix to obtain transformed user facial key points, wherein the affine transformation matrix is calculated by the least squares method;
    selecting anchor points from the standard facial key points;
    aligning the transformed user facial key points with the standard facial key points based on the anchor points.
  5. The method according to any one of claims 1 to 3, characterized in that inputting each of the part expression base matrices into the corresponding part key point matrix for sparse coding and calculating the coding coefficients comprises:
    using each of the part expression base matrices to sparsely represent the corresponding part key point matrix, to obtain the coding coefficients.
  6. The method according to claim 5, characterized in that the coding coefficients are calculated by the following formula:
    $$c=\arg\min_{c}\;\bigl\| p'-cF_M \bigr\|_2^{2}+\sum_{i=1}^{n} w_i\,\lvert c_i\rvert$$
    where c denotes the coding coefficients, p' denotes the part key point matrix, F_M denotes the expression base matrix of the M-th part, n denotes the number of expression bases in the part expression base, w denotes a set of weights corresponding to c, and minimize denotes taking the value at which the objective above is minimal.
  7. The method according to any one of claims 1 to 3, characterized in that performing facial key point detection on the facial image comprises:
    performing facial key point detection on the facial image using MobileFaceNet.
  8. A virtual character facial driving apparatus, characterized in that the apparatus comprises:
    a facial image acquisition module, configured to acquire a facial image of a user;
    a key point detection module, configured to perform facial key point detection on the facial image to obtain the user's facial key points;
    a smoothing module, configured to smooth the user's facial key points;
    an alignment module, configured to align the smoothed user facial key points with standard facial key points using the least squares method to obtain an aligned facial key point matrix;
    a first division module, configured to divide the aligned facial key point matrix according to facial parts to obtain a key point matrix for each part;
    a second division module, configured to divide each expression base according to facial parts to obtain an expression base matrix for each part;
    a coding coefficient calculation module, configured to input each of the part expression base matrices into the corresponding part key point matrix for sparse coding and calculate the coding coefficients;
    a driving module, configured to use the coding coefficients to drive the face of the virtual character.
  9. A terminal device, characterized by comprising:
    a memory; one or more processors coupled to the memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to execute the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code, and the program code can be called by a processor to execute the method according to any one of claims 1 to 7.
PCT/CN2023/092007 2022-11-07 2023-05-04 虚拟人物面部驱动方法、装置、终端设备和可读存储介质 WO2024098685A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211381723.1A CN115601484B (zh) 2022-11-07 2022-11-07 虚拟人物面部驱动方法、装置、终端设备和可读存储介质
CN202211381723.1 2022-11-07

Publications (1)

Publication Number Publication Date
WO2024098685A1 true WO2024098685A1 (zh) 2024-05-16

Family

ID=84853610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092007 WO2024098685A1 (zh) 2022-11-07 2023-05-04 虚拟人物面部驱动方法、装置、终端设备和可读存储介质

Country Status (2)

Country Link
CN (1) CN115601484B (zh)
WO (1) WO2024098685A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601484B (zh) * 2022-11-07 2023-03-28 广州趣丸网络科技有限公司 虚拟人物面部驱动方法、装置、终端设备和可读存储介质
CN116612512A (zh) * 2023-02-02 2023-08-18 北京甲板智慧科技有限公司 基于单目rgb相机的人脸表情图像处理方法和装置
CN116977515B (zh) * 2023-08-08 2024-03-15 广东明星创意动画有限公司 一种虚拟人物表情驱动方法

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062783A (zh) * 2018-01-12 2018-05-22 北京蜜枝科技有限公司 面部动画映射系统及方法
CN108960201A (zh) * 2018-08-01 2018-12-07 西南石油大学 一种基于人脸关键点提取和稀疏表达分类的表情识别方法
CN109087379A (zh) * 2018-08-09 2018-12-25 北京华捷艾米科技有限公司 人脸表情的迁移方法和人脸表情的迁移装置
CN112733667A (zh) * 2020-12-30 2021-04-30 平安科技(深圳)有限公司 基于人脸识别的人脸对齐方法及装置
CN113192162A (zh) * 2021-04-22 2021-07-30 清华珠三角研究院 语音驱动图像的方法、系统、装置及存储介质
CN113192164A (zh) * 2021-05-12 2021-07-30 广州虎牙科技有限公司 虚拟形象随动控制方法、装置、电子设备和可读存储介质
CN113643412A (zh) * 2021-07-14 2021-11-12 北京百度网讯科技有限公司 虚拟形象的生成方法、装置、电子设备及存储介质
CN114821734A (zh) * 2022-05-13 2022-07-29 北京沃东天骏信息技术有限公司 一种驱动虚拟人物表情的方法和装置
CN115601484A (zh) * 2022-11-07 2023-01-13 广州趣丸网络科技有限公司(Cn) 虚拟人物面部驱动方法、装置、终端设备和可读存储介质

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520215B (zh) * 2018-03-28 2022-10-11 电子科技大学 基于多尺度联合特征编码器的单样本人脸识别方法
CN109727303B (zh) * 2018-12-29 2023-07-25 广州方硅信息技术有限公司 视频展示方法、系统、计算机设备、存储介质和终端
US11189020B2 (en) * 2019-02-06 2021-11-30 Thanh Phuoc Hong Systems and methods for keypoint detection
CN110141857A (zh) * 2019-04-26 2019-08-20 腾讯科技(深圳)有限公司 虚拟角色的面部显示方法、装置、设备及存储介质
CN110111247B (zh) * 2019-05-15 2022-06-24 浙江商汤科技开发有限公司 人脸变形处理方法、装置及设备
CN110826534B (zh) * 2019-11-30 2022-04-05 杭州小影创新科技股份有限公司 一种基于局部主成分分析的人脸关键点检测方法及系统
CN112308949A (zh) * 2020-06-29 2021-02-02 北京京东尚科信息技术有限公司 模型训练、人脸图像生成方法和装置以及存储介质
CN113869234B (zh) * 2021-09-29 2024-05-28 中国平安财产保险股份有限公司 人脸表情识别方法、装置、设备及存储介质

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062783A (zh) * 2018-01-12 2018-05-22 北京蜜枝科技有限公司 面部动画映射系统及方法
CN108960201A (zh) * 2018-08-01 2018-12-07 西南石油大学 一种基于人脸关键点提取和稀疏表达分类的表情识别方法
CN109087379A (zh) * 2018-08-09 2018-12-25 北京华捷艾米科技有限公司 人脸表情的迁移方法和人脸表情的迁移装置
CN112733667A (zh) * 2020-12-30 2021-04-30 平安科技(深圳)有限公司 基于人脸识别的人脸对齐方法及装置
CN113192162A (zh) * 2021-04-22 2021-07-30 清华珠三角研究院 语音驱动图像的方法、系统、装置及存储介质
CN113192164A (zh) * 2021-05-12 2021-07-30 广州虎牙科技有限公司 虚拟形象随动控制方法、装置、电子设备和可读存储介质
CN113643412A (zh) * 2021-07-14 2021-11-12 北京百度网讯科技有限公司 虚拟形象的生成方法、装置、电子设备及存储介质
CN114821734A (zh) * 2022-05-13 2022-07-29 北京沃东天骏信息技术有限公司 一种驱动虚拟人物表情的方法和装置
CN115601484A (zh) * 2022-11-07 2023-01-13 广州趣丸网络科技有限公司(Cn) 虚拟人物面部驱动方法、装置、终端设备和可读存储介质

Also Published As

Publication number Publication date
CN115601484B (zh) 2023-03-28
CN115601484A (zh) 2023-01-13

Similar Documents

Publication Publication Date Title
WO2024098685A1 (zh) 虚拟人物面部驱动方法、装置、终端设备和可读存储介质
EP3992919B1 (en) Three-dimensional facial model generation method and apparatus, device, and medium
CN111971713A (zh) 使用图像和时间跟踪神经网络进行的3d面部捕获和修改
WO2022012085A1 (zh) 人脸图像处理方法、装置、存储介质及电子设备
CN113327278B (zh) 三维人脸重建方法、装置、设备以及存储介质
JP7268071B2 (ja) バーチャルアバターの生成方法及び生成装置
CN112288665A (zh) 图像融合的方法、装置、存储介质及电子设备
EP3991140A1 (en) Portrait editing and synthesis
CN109754464B (zh) 用于生成信息的方法和装置
US20220358675A1 (en) Method for training model, method for processing video, device and storage medium
CN115147265B (zh) 虚拟形象生成方法、装置、电子设备和存储介质
CN111047509A (zh) 一种图像特效处理方法、装置及终端
CN111583381B (zh) 游戏资源图的渲染方法、装置及电子设备
EP4345770A1 (en) Information processing method and apparatus, computer device, and storage medium
CN112766027A (zh) 图像处理方法、装置、设备及存储介质
CN114202615A (zh) 人脸表情的重建方法、装置、设备和存储介质
CN116342782A (zh) 生成虚拟形象渲染模型的方法和装置
TWI711004B (zh) 圖片處理方法和裝置
US20240013464A1 (en) Multimodal disentanglement for generating virtual human avatars
CN115359166B (zh) 一种图像生成方法、装置、电子设备和介质
CN113313631B (zh) 图像渲染方法和装置
CN115330980A (zh) 表情迁移方法、装置、电子设备及存储介质
Tran et al. VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment
JP3910811B2 (ja) テクスチャマッピング方法、テクスチャマッピング処理プログラム及びそのプログラムを記録したコンピュータ読取り可能な記録媒体
CN114820908B (zh) 虚拟形象生成方法、装置、电子设备和存储介质