CN115937373A - Avatar driving method, apparatus, device, and storage medium - Google Patents

Avatar driving method, apparatus, device, and storage medium

Info

Publication number
CN115937373A
Authority
CN
China
Prior art keywords
pose
attitude
transformation coefficient
difference
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211670321.3A
Other languages
Chinese (zh)
Other versions
CN115937373B (en)
Inventor
张世昌
赵亚飞
郭紫垣
范锡睿
孙权
张伟伟
刘倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211670321.3A priority Critical patent/CN115937373B/en
Publication of CN115937373A publication Critical patent/CN115937373A/en
Application granted granted Critical
Publication of CN115937373B publication Critical patent/CN115937373B/en
Status: Active

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure provides a method, an apparatus, a device, and a storage medium for driving an avatar, and relates to the technical field of artificial intelligence, in particular to the technical fields of virtual humans, the metaverse, augmented reality, virtual reality, mixed reality, and the like. The specific implementation scheme is as follows: an input data stream is received, wherein the input data stream comprises a plurality of time-series frames, any time-series frame is associated with a facial pose, and the facial pose comprises a first part pose and a second part pose; for any target time-series frame among the plurality of time-series frames, a first pose transformation coefficient is determined according to an initial reference pose and the first part pose; the initial reference pose is updated according to the first pose transformation coefficient to obtain an updated reference pose; a second pose transformation coefficient is determined according to the updated reference pose and the second part pose; and the avatar is driven according to the second pose transformation coefficient.

Description

Avatar driving method, apparatus, device, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to the fields of virtual humans, the metaverse, augmented reality, virtual reality, mixed reality, and the like, and more particularly to a method, an apparatus, a device, and a storage medium for driving an avatar.
Background
With the development of computer technology and Internet technology, various functional services in areas such as daily life and entertainment can be provided through avatars. For example, some avatars provide visual presentation services, and making the facial expressions of such avatars appear realistic is a problem that needs to be solved.
Disclosure of Invention
The present disclosure provides an avatar driving method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided an avatar driving method including: receiving an input data stream, wherein the input data stream comprises a plurality of time-series frames, any one of the time-series frames is associated with a facial pose, the facial pose comprises a first part pose and a second part pose, the second part pose comprises a plurality of second sub-part poses, and the correlation between any plurality of first part poses is smaller than the correlation between the plurality of second sub-part poses; determining, for any one target time-series frame among the plurality of time-series frames, a first pose transformation coefficient according to an initial reference pose and the first part pose; updating the initial reference pose according to the first pose transformation coefficient to obtain an updated reference pose; determining a second pose transformation coefficient according to the updated reference pose and the second part pose; and driving the avatar according to the second pose transformation coefficient.
According to another aspect of the present disclosure, there is provided an avatar driving apparatus including: an input data stream receiving module for receiving an input data stream, wherein the input data stream comprises a plurality of time-series frames, any one of the time-series frames is associated with a facial pose, the facial pose comprises a first part pose and a second part pose, the second part pose comprises a plurality of second sub-part poses, and the correlation between any plurality of first part poses is smaller than the correlation between the plurality of second sub-part poses; a first pose transformation coefficient determining module for determining, for any one target time-series frame among the plurality of time-series frames, a first pose transformation coefficient according to an initial reference pose and the first part pose; an updated reference pose determining module for updating the initial reference pose according to the first pose transformation coefficient to obtain an updated reference pose; a second pose transformation coefficient determining module for determining a second pose transformation coefficient according to the updated reference pose and the second part pose; and an avatar driving module for driving the avatar according to the second pose transformation coefficient.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the disclosed embodiments.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the embodiments of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates a system architecture diagram of an avatar driving method and apparatus according to an embodiment of the present disclosure;
fig. 2A schematically illustrates a flowchart of an avatar driving method according to an embodiment of the present disclosure;
FIG. 2B schematically illustrates a view of the chin-lip area pose;
fig. 3 schematically illustrates a schematic diagram of an avatar driving method according to another embodiment of the present disclosure;
fig. 4 is a diagram schematically illustrating determination of a third pose transformation coefficient of an avatar driving method according to still another embodiment of the present disclosure;
fig. 5 schematically illustrates a block diagram of an avatar driving apparatus according to still another embodiment of the present disclosure; and
fig. 6 schematically shows a block diagram of an electronic device that can implement the avatar driving method of the embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Fig. 1 schematically illustrates a system architecture of an avatar driving method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include clients 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between clients 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use clients 101, 102, 103 to interact with server 105 over network 104 to receive or send messages, etc. Various messaging client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (examples only) may be installed on the clients 101, 102, 103.
Clients 101, 102, 103 may be a variety of electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablets, laptop and desktop computers, and the like. The clients 101, 102, 103 of the disclosed embodiments may run applications, for example.
The server 105 may be a server that provides various services, such as a back-office management server (for example only) that provides support for websites browsed by users using the clients 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the client. In addition, the server 105 may also be a cloud server, i.e., the server 105 has a cloud computing function.
It should be noted that the avatar driving method provided by the embodiment of the present disclosure may be executed by the server 105. Accordingly, the avatar driving apparatus provided by the embodiment of the present disclosure may be provided in the server 105. The avatar driving method provided by the embodiments of the present disclosure may also be performed by a server or server cluster that is different from the server 105 and is capable of communicating with the clients 101, 102, 103 and/or the server 105. Accordingly, the avatar driving apparatus provided in the embodiment of the present disclosure may also be provided in a server or a server cluster different from the server 105 and capable of communicating with the clients 101, 102, 103 and/or the server 105.
In one example, the server 105 may obtain input data streams from the clients 101, 102, 103 over the network 104 and perform avatar driving based on the input data streams.
It should be understood that the number of clients, networks, and servers in FIG. 1 is merely illustrative. There may be any number of clients, networks, and servers, as desired for an implementation.
It should be noted that, in the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information all comply with the relevant laws and regulations and do not violate public order and good morals.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
An avatar driving method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2A to 5 in conjunction with the system architecture of fig. 1. The avatar driving method of the embodiment of the present disclosure may be performed by the server 105 shown in fig. 1, for example.
Fig. 2A schematically illustrates a flowchart of an avatar driving method according to an embodiment of the present disclosure.
As shown in fig. 2A, the avatar driving method 200 of the embodiment of the present disclosure may include, for example, operations S210 to S250.
In operation S210, an input data stream is received.
The input data stream includes a plurality of time-series frames, any one of the time-series frames being associated with a facial pose, the facial pose including a first part pose and a second part pose, the second part pose including a plurality of second sub-part poses, a correlation between any of the plurality of first part poses being less than a correlation between the plurality of second sub-part poses.
The statement that the correlation between any plurality of first part poses is smaller than the correlation between a plurality of second sub-part poses can be understood to mean that the correlation between any plurality of first part poses is smaller than the correlation between the plurality of second sub-part poses associated with any one second part pose.
Illustratively, the input data stream may be obtained by scanning, with a camera array, the continuous three-dimensional facial expression of a certain real person along the time dimension, and regularizing the scanned three-dimensional facial regions. The input data stream thus embodies time-series frames along the time dimension and, for any one target time-series frame, also embodies the primitive vertex positions displayed by that frame. A primitive vertex position represents the position of a primitive vertex; primitive vertices can be understood as the vertices of primitives (graphics elements), which are composed of geometric vertices and may include, for example, points, line segments and polygons. The target object displayed by each image may be composed of multiple primitives, and each primitive vertex position is three-dimensional data including, for example, x, y and z coordinates.
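As a concrete, purely illustrative picture of such vertex stream data, the sketch below assumes a numpy representation in which every time-series frame stores the same set of primitive vertices as (x, y, z) positions; the array shapes and counts are assumptions, not values from this disclosure.

import numpy as np

num_frames = 100      # N time-series frames (hypothetical value)
num_vertices = 5023   # primitive vertices per frame (hypothetical value)

# Shape (N, V, 3): time-series frame index, primitive vertex index, xyz coordinates.
vertex_stream = np.zeros((num_frames, num_vertices, 3), dtype=np.float32)

def face_pose(stream: np.ndarray, i: int) -> np.ndarray:
    """Return the (V, 3) primitive vertex positions that characterize the facial pose of frame i."""
    return stream[i]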
For example, the first part pose and the second part pose may be obtained by dividing the face into parts in advance.
There may also be only one first part pose; in that case, the single first part pose can be considered more independent than the plurality of second sub-part poses of the second part pose.
Illustratively, the eye region, nose region, and chin region of the face may be taken as the first part region, and the first part pose displayed for any one of the target time-series frames corresponds to the pose of the first part region. The first part pose may include, for example, an eye region pose, a nose region pose, and a chin region pose.
Illustratively, the facial pose displayed for any one target time-series frame may also be divided with, for example, the jaw-teeth region, the chin-lip region, and the eye-eyebrow region as second part regions, and each second part pose corresponds to the pose of a second part region. The second part poses may include, for example, a jaw-teeth region pose, a chin-lip region pose, and an eye-eyebrow region pose.
For the jaw-teeth region pose, a second part pose, the corresponding second sub-part poses comprise a jaw sub-region pose and a teeth sub-region pose. For the chin-lip region pose, a second part pose, the corresponding second sub-part poses comprise a chin sub-region pose and a lip sub-region pose. For the eye-eyebrow region pose, a second part pose, the corresponding second sub-part poses comprise an eye sub-region pose and an eyebrow sub-region pose.
As shown in fig. 2B, a number of specific examples of chin-lip region poses are schematically illustrated.
It should be noted that the correlation between any plurality of first part poses being smaller than the correlation between the second sub-part poses associated with any one second part pose can be understood as follows: when a corresponding expression is made, the degree of correlation between the first part poses is low, whereas the degree of correlation between the second sub-part poses associated with any one second part pose is high. For example, in the above example, consider the two first part poses corresponding to the eye region and the chin region: since eye movement is not linked to synchronous chin movement, an expression change caused by eye movement should be reflected in a change of the eye region pose while the chin region pose remains unchanged. There should therefore be a low correlation, or low coupling, between the eye region pose and the chin region pose; that is, the respective first part poses can be decoupled.
For example, in the above example, the jaw-teeth region pose, a second part pose, relates to the jaw region and the teeth region. In actual human anatomy, the jaw and the teeth are integrally connected, so an expression change caused by jaw movement is in fact accompanied by synchronous movement of the teeth. Thus, for the jaw-teeth region pose, the associated jaw sub-region pose and teeth sub-region pose should be strongly correlated, or strongly coupled.
In operation S220, for any one target time-series frame among the plurality of time-series frames, a first pose transform coefficient is determined according to the initial reference pose and the first part pose.
Illustratively, the initial reference pose may be, for example, a facial pose of a certain time-series frame of the input data stream.
A pose transformation coefficient can be understood as a coefficient value from which the corresponding facial pose can be reconstructed. In the embodiments of the present disclosure, the terms first pose transformation coefficient, second pose transformation coefficient and so on are used only to distinguish different pose transformation coefficients.
Exemplarily, a pose transformation coefficient may include, for example, a blendshape coefficient, i.e., a hybrid-deformation coefficient.
Illustratively, the initial reference pose may be, for example, an expressionless ("neutral") facial pose.
In operation S230, the initial reference pose is updated according to the first pose transformation coefficient, resulting in an updated reference pose.
In operation S240, a second pose transformation coefficient is determined according to the updated reference pose and the second part pose.
In operation S250, the avatar is driven according to the second pose transformation coefficient.
According to the avatar driving method of the embodiments of the present disclosure, an input data stream is received, and for any one target time-series frame among the plurality of time-series frames, a first pose transformation coefficient is determined according to the initial reference pose and the first part pose, so that the pose corresponding to the first part region in the initial reference pose can be updated. By updating the initial reference pose according to the first pose transformation coefficient, an updated reference pose is obtained, so that the updated pose of the first part region serves as a new reference pose. The second pose transformation coefficient determined according to the updated reference pose and the second part pose then updates the second part pose on the basis of the already-updated first part pose, so that the facial expression of the driven avatar is more realistic.
According to the avatar driving method of the embodiments of the present disclosure, the facial pose can be subdivided based on correlations. The correlations between different parts affect how realistic the resulting facial expressions are: for example, a facial pose reconstructed from the corresponding pose transformation coefficients in some implementations may show the mouth opening because the chin moves while the teeth do not move, which conflicts with the expressions produced by a real human body structure. By driving the avatar based on first part poses with low correlation and second sub-part poses with high correlation, the avatar driving method according to the embodiments of the present disclosure can reduce such confusion of facial expressions, so that the expressions made by the driven avatar are more realistic: facial expressions made in the weakly correlated first part regions do not interfere with one another, while facial expressions in the strongly correlated second part regions are not made in isolation from one another.
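The two-stage flow of operations S210 to S250 can be summarized, purely as an illustrative sketch, in the following Python pseudocode; the solver and driver callables here are placeholders and are not APIs defined by this disclosure.

def drive_avatar_for_frame(face_pose, initial_reference, solve_coefficients,
                           apply_coefficients, drive_avatar):
    # face_pose: (first_part_pose, second_part_pose) of the target time-series frame.
    first_part_pose, second_part_pose = face_pose

    # S220: solve coefficients for the weakly correlated first part poses
    # against the initial (e.g. expressionless) reference pose.
    first_coeffs = solve_coefficients(initial_reference, first_part_pose)

    # S230: apply them to obtain an updated reference pose.
    updated_reference = apply_coefficients(initial_reference, first_coeffs)

    # S240: solve coefficients for the strongly coupled second part poses
    # against the updated reference, so they build on the first-stage result.
    second_coeffs = solve_coefficients(updated_reference, second_part_pose)

    # S250: drive the avatar with the second-stage coefficients.
    drive_avatar(second_coeffs)
    return second_coeffs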
Fig. 3 schematically shows a schematic diagram of an avatar driving method according to another embodiment of the present disclosure.
As shown in fig. 3, the avatar driving method 300 according to still another embodiment of the present disclosure may implement a specific example of driving an avatar according to a second posture transformation coefficient of operation S350, for example, using the following embodiment.
The input data stream includes vertex stream data, each time-series frame of the vertex stream data characterizing a facial pose by a plurality of primitive vertex positions. In the example of FIG. 3, the Input data stream Input is schematically shown as vertex stream data, e.g., a target temporal frame Pi of the vertex stream data characterizes a corresponding face pose fp-i by a plurality of primitive vertex positions 307.
In operation S351, for the target time-series frame Pi, second posture transformation difference data 308 is determined according to the second posture transformation coefficient 306 and the primitive vertex position 307 corresponding to the target time-series frame Pi.
For any one target time-series frame Pi, the corresponding primitive vertex positions 307 represent the true facial pose captured by the camera, while the second pose transformation coefficient is used to reconstruct the facial pose. The facial pose reconstructed from the second pose transformation coefficient has an error with respect to the true facial pose (which is characterized by the corresponding primitive vertex positions), and the degree of difference between the reconstructed facial pose and the true facial pose can be characterized by the second pose transformation difference data.
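As a hedged illustration of what such difference data could look like under the linear model later given in equation (1), the following sketch computes a per-vertex residual between the pose reconstructed from the coefficients and the captured primitive vertex positions; the array layout is an assumption made only for this example.

import numpy as np

def pose_difference(reference, blendshape_basis, coeffs, true_vertices):
    """reference, true_vertices: (V, 3) poses; blendshape_basis: (K, V, 3); coeffs: (K,)."""
    # Reconstruct the facial pose from the reference pose and the coefficients.
    reconstructed = reference + np.tensordot(coeffs, blendshape_basis, axes=1)
    # The (V, 3) residual plays the role of pose transformation difference data here.
    return true_vertices - reconstructed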
In operation S352, a third pose transform coefficient 309 is determined according to the second pose transform difference data 308 and the second pose transform coefficient 306.
In operation S353, the avatar 310 is driven according to the third posture transformation coefficient 309.
In the case where the second pose transformation difference data characterizes the degree of difference between the facial pose reconstructed from the second pose transformation coefficient and the true facial pose, a third pose transformation coefficient can be determined based on the second pose transformation difference data and the second pose transformation coefficient. The third pose transformation coefficient reconstructs the facial pose with a smaller error than the second pose transformation coefficient.
According to the avatar driving method of the embodiments of the present disclosure, the second pose transformation difference data determined from the second pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame can be used to quantify, for the target time-series frame, the error of the facial pose reconstructed by the second pose transformation coefficient. By using the second pose transformation difference data together with the second pose transformation coefficient, the second pose transformation coefficient can be adjusted based on this error; the resulting third pose transformation coefficient has a smaller error, and an avatar driven according to the third pose transformation coefficient can make more accurate facial expressions.
In the example of fig. 3, operations S310 to S340 are also schematically shown, and operations S310 to S340 are similar to operations S210 to S240 of the above-described embodiment, respectively.
For example, the example of fig. 3 schematically shows the receiving of an Input data stream Input in operation S310. The Input data stream Input comprises, for example, N time-series frames in total, from time-series frame P1 to time-series frame PN, and each time-series frame is associated with a corresponding facial pose. For example, time-series frame P1 is associated with facial pose fp-1. Taking the target time-series frame Pi as an example, the associated facial pose fp-i comprises a first part pose 301 and a second part pose 302.
In the example of fig. 3, operation S320 of determining the first pose transformation coefficient 304 from the initial reference pose 303 and the first part pose 301 is schematically illustrated. Also schematically illustrated are operation S330 of updating the initial reference pose 303 according to the first pose transformation coefficient 304 to obtain the updated reference pose 305, and operation S340 of determining the second pose transformation coefficient 306 from the updated reference pose 305 and the second part pose 302.
Fig. 4 schematically shows a diagram of determining a third pose transformation coefficient of an avatar driving method according to still another embodiment of the present disclosure.
As shown in fig. 4, according to an avatar driving method according to still another embodiment of the present disclosure, a specific example of determining a third posture transformation coefficient according to second posture transformation difference data and a second posture transformation coefficient may be implemented, for example, with the following embodiments.
In operation S461, a first difference pose 403 is determined from the second pose transformation difference data 401 and the initial reference pose 402; the first difference pose is characterized by the positions 404 of the corresponding primitive vertices.
The second pose transformation difference data is error-type data; the first difference pose, determined from the second pose transformation difference data and the initial reference pose, presents this error-type data in the form of a facial pose.
In operation S462, the first difference pose 403 is bone decomposed using the first bone node data 405, resulting in bone-vertex first association data 406.
The first bone node data 405 include the number 405-M of first bone nodes and the poses 405-Pos of the first bone nodes, and the bone-vertex first association data characterize the association weight between any one first bone node and a primitive vertex corresponding to the first difference pose.
A primitive vertex corresponding to the first difference pose can be understood as a primitive vertex whose position is used to characterize the first difference pose.
Each bone node can be associated with a plurality of primitive vertices corresponding to the first difference pose, and each associated primitive vertex has a corresponding association weight with that bone node; that is, a one-to-many association between a first bone node and the primitive vertices corresponding to the first difference pose can be realized. The pose of a first bone node has 3 translational degrees of freedom and 3 rotational degrees of freedom about the x-, y- and z-axes, whereas a primitive vertex position has only 3 translational degrees of freedom along the x-, y- and z-axes. In addition, some complex facial poses are difficult for a real human to make; with the avatar driving method of the embodiments of the present disclosure, the avatar can, for example, be driven to make facial poses that are difficult for a real human to make.
Illustratively, the number and pose of the first skeleton nodes may be manually set by the relevant personnel, or may be automatically inferred based on the skeleton decomposition principle according to the pose (deformation) corresponding to the first difference pose.
In the case where the first difference pose presents the error-type second pose transformation difference data in the form of a facial pose, the number and pose of the first bone nodes may be set manually, for example by the relevant person, on the basis of the reconstructed visualized first difference pose.
Illustratively, the first difference pose may be bone-decomposed based on, for example, a smooth skinning decomposition algorithm, abbreviated SSDR (Smooth Skinning Decomposition with Rigid Bones). The SSDR model can decompose linear skinning data from a series of motions, approximately fitting the deformation corresponding to the motion through a number of bone transforms and vertex weight maps.
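SSDR itself is an iterative fitting procedure that is not reproduced here; the following sketch shows only the linear blend skinning forward model that such a decomposition fits, in which vertices are reproduced from per-bone rigid transforms and bone-vertex association weights. The array layouts are assumptions made for illustration, not data structures specified by this disclosure.

import numpy as np

def linear_blend_skinning(rest_vertices, rotations, translations, weights):
    """rest_vertices: (V, 3); rotations: (M, 3, 3); translations: (M, 3);
    weights: (V, M) bone-vertex association weights, each row summing to 1.
    Returns the deformed (V, 3) vertex positions."""
    # Rigidly transform every rest vertex by every bone: result shape (M, V, 3).
    per_bone = np.einsum('mij,vj->mvi', rotations, rest_vertices) + translations[:, None, :]
    # Blend the per-bone results using the association weights.
    return np.einsum('vm,mvi->vi', weights, per_bone)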
In operation S463, a second difference pose 407 corresponding to the bone-node first association data is determined according to the bone-vertex first association data 406.
In operation S464, a third pose transformation coefficient 409 is determined according to the second pose transformation coefficient 408 and the second difference pose 407.
It should be noted that the second difference pose is identical to the first difference pose in terms of reconstruction result; it differs in that the first difference pose is characterized by the corresponding primitive vertex positions, whereas the second difference pose is characterized by the bone-node first association data.
When a third pose transformation coefficient for reconstructing the facial pose needs to be determined, the second difference pose can be used to introduce the bone-node first association data. Compared with primitive vertex positions, the bone-node first association data has stronger expressive capacity and a smaller data volume, and can adapt to finer and larger facial poses and deformations.
Illustratively, according to the avatar driving method of still another embodiment of the present disclosure, a specific example of driving the avatar according to the third pose transformation coefficient may also be implemented, for example, with the following embodiments: third pose transformation difference data is determined according to the third pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame. In the case where the third pose transformation difference data does not satisfy the pose transformation difference condition, a fourth pose transformation coefficient is determined according to the third pose transformation difference data and the third pose transformation coefficient, wherein the difference value between the facial pose corresponding to the fourth pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame satisfies the pose transformation difference condition. The avatar is then driven according to the fourth pose transformation coefficient.
According to the avatar driving method of the embodiments of the present disclosure, in order to obtain a more accurate pose transformation coefficient, the pose transformation coefficient can be constrained to a specified error range by the pose transformation difference condition. In the case where the third pose transformation coefficient determined in the above embodiment does not satisfy the pose transformation difference condition, the third pose transformation coefficient can be adjusted further, based on the third pose transformation difference data determined from it, until the resulting fourth pose transformation coefficient satisfies the pose transformation difference condition, so that driving the avatar according to the fourth pose transformation coefficient has higher accuracy.
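Purely as an illustrative sketch of this iterative refinement, the loop below keeps deriving new coefficients from the residual pose until a simple error threshold, an assumed stand-in for the pose transformation difference condition, is met; solve_from_residual is a placeholder for the bone-decomposition-based update of operations S461 to S464, not an API defined by this disclosure.

import numpy as np

def refine_coefficients(coeffs, reference, basis, true_vertices,
                        solve_from_residual, tol=1e-3, max_iters=10):
    for _ in range(max_iters):
        # Reconstruct the facial pose from the current coefficients.
        reconstructed = reference + np.tensordot(coeffs, basis, axes=1)
        residual = true_vertices - reconstructed
        if np.linalg.norm(residual) < tol:   # assumed pose transformation difference condition
            break
        # e.g. bone-decompose the residual pose and solve an updated coefficient.
        coeffs = solve_from_residual(coeffs, residual)
    return coeffs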
Exemplarily, according to the avatar driving method of still another embodiment of the present disclosure, a specific example of determining the fourth pose transformation coefficient according to the third pose transformation difference data and the third pose transformation coefficient may be implemented, for example, with the following embodiments: a third difference pose is determined according to the third pose transformation difference data and the initial reference pose. The third difference pose is bone-decomposed using second bone node data to obtain bone-vertex second association data. A fourth difference pose corresponding to the bone-node second association data is determined according to the bone-vertex second association data. A fourth pose transformation coefficient is determined according to the third pose transformation coefficient and the fourth difference pose.
The third difference pose is characterized by the positions of the corresponding primitive vertices.
The second bone node data include the number and poses of second bone nodes, and the bone-vertex second association data characterize the association weight between any one second bone node and a primitive vertex corresponding to the third difference pose.
It should be noted that, compared with the second pose transformation difference data determined first in the above embodiment, the third pose transformation difference data determined subsequently corresponds to finer and larger facial poses and deformations. According to the avatar driving method of the embodiments of the present disclosure, iterating through bone decomposition makes it possible to handle fine and large facial poses and deformations while improving accuracy; the specific principle is as described in the above embodiments and is not repeated here.
Illustratively, according to the avatar driving method of yet another embodiment of the present disclosure, the pose transformation coefficient is obtained from a pose transformation function, and the pose transformation function is related to the primitive vertex positions and a reference pose, where the reference pose includes at least one of the initial reference pose and the updated reference pose. The pose transformation coefficient includes at least one of the first pose transformation coefficient, the second pose transformation coefficient, the third pose transformation coefficient and the fourth pose transformation coefficient. The third pose transformation coefficient is obtained according to the second pose transformation coefficient, and the fourth pose transformation coefficient is obtained according to the third pose transformation coefficient.
It should be noted that the "first pose transformation coefficient", "second pose transformation coefficient", "third pose transformation coefficient" and "fourth pose transformation coefficient" here are respectively the same as the "first pose transformation coefficient", "second pose transformation coefficient", "third pose transformation coefficient" and "fourth pose transformation coefficient" in the above embodiments.
Illustratively, the pose transformation function can be characterized, for example, using the following equation (1).
C(x)=basic+B·x (1)
Here x represents the pose coefficient values corresponding to the primitive vertices, and each value of x may, for example, lie between 0 and 1. basic characterizes a reference pose, such as the initial reference pose or the updated reference pose of the above embodiments, and B characterizes the corresponding primitive vertex positions.
For example, taking the primitive vertex positions corresponding to the target time-series frame as B and the initial reference pose as basic, the first pose transformation coefficient x for the target time-series frame, and the facial pose C(x) corresponding to the first pose transformation coefficient, can be obtained by using equation (1).
For example, taking the primitive vertex positions of the facial pose C(x) corresponding to the first pose transformation coefficient as B and the updated reference pose as basic, the second pose transformation coefficient x for the target time-series frame, and the facial pose C(x) corresponding to the second pose transformation coefficient, can be obtained by using equation (1).
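Under the linear model of equation (1), the coefficients for a target frame can, for example, be fitted by clamped least squares, as in the hedged sketch below; this is an assumed solving strategy shown only for illustration, since the disclosure constrains x through the difference function of equation (2) rather than prescribing a particular solver.

import numpy as np

def fit_coefficients(basic, B, target):
    """basic, target: (V, 3) poses; B: (K, V, 3) blendshape basis.
    Returns x: (K,) coefficients clamped to [0, 1] as described above."""
    A = B.reshape(B.shape[0], -1).T          # stack basis as a (3V, K) matrix
    b = (target - basic).reshape(-1)         # (3V,) offset from the reference pose
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(x, 0.0, 1.0)              # each coefficient value lies between 0 and 1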
Illustratively, according to the avatar driving method of yet another embodiment of the present disclosure, the pose transformation difference data is obtained from a pose transformation difference function, and the pose transformation difference function is related to the pose transformation coefficient and the positions of key vertices. The key vertices are determined from the full set of primitive vertices according to the representation. The pose transformation difference data includes at least one of the second pose transformation difference data and the third pose transformation difference data, and the third pose transformation difference data is obtained according to the third pose transformation coefficient.
It should be noted that the "pose transformation coefficient" and the "third pose transformation difference data" here are respectively the same as the "pose transformation coefficient" and the "third pose transformation difference data" in the above embodiments.
Illustratively, the pose transformation difference function can be characterized, for example, using the following equation (2).
(Equation (2) is reproduced in the original publication only as an image, Figure BDA0004013754650000131, and is therefore not shown here.)
It should be noted that the value of x in equation (1) is variable: each specific value of x corresponds to a particular facial pose. The value of x can be constrained by an error condition using equation (2), so that an accurate specific value of x can be determined.
In equation (2), i denotes any one target time-series frame, b_i characterizes the key vertices, and x_pre characterizes the pose transformation coefficient computed from the frame preceding the current frame; equation (2) can be applied to the input data stream as a whole. The quantities α_i, β_1 and β_2 all characterize coefficients.
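Since equation (2) itself is only available as an image, the following LaTeX sketch shows one plausible regularized least-squares form consistent with the symbols just described; it is an assumption, not the verbatim formula of the disclosure.

% Hedged reconstruction of the general shape of equation (2); the exact form may differ.
E(x) \;=\; \sum_{i} \alpha_i \,\bigl\lVert C(x)_{b_i} - v_{b_i} \bigr\rVert^{2}
\;+\; \beta_1\,\lVert x \rVert^{2}
\;+\; \beta_2\,\lVert x - x_{\mathrm{pre}} \rVert^{2}

Here C(x)_{b_i} would denote the reconstructed position at the key vertices b_i of target frame i and v_{b_i} the corresponding captured positions, with the β_2 term penalizing deviation from the previous frame's coefficient x_pre so that the coefficients remain temporally smooth.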
In summary, according to the avatar driving method of the embodiments of the present disclosure, by performing multiple vertex-stream computations based on stages such as the first part pose, the second part pose, the second pose transformation difference data and the third pose transformation difference data, the first part pose, the second part pose and so on can be decoupled from one another. This decoupling can be understood as dividing what was originally a single facial-pose reconstruction into multiple stages and obtaining the pose transformation coefficients of each stage in turn, thereby improving the accuracy of avatar driving and facilitating migration between different portraits.
Fig. 5 schematically illustrates a block diagram of an avatar driving apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the avatar driving apparatus 500 of the embodiment of the present disclosure includes, for example, an input data stream receiving module 510, a first pose transformation coefficient determining module 520, an updated reference pose determining module 530, a second pose transformation coefficient determining module 540, and an avatar driving module 550.
An input data stream receiving module 510, configured to receive an input data stream, where the input data stream includes a plurality of time-series frames, any one of the time-series frames being associated with a facial pose, the facial pose including a first part pose and a second part pose, the second part pose including a plurality of second sub-part poses, and a correlation between any plurality of first part poses being smaller than a correlation between a plurality of second sub-part poses.
The first pose transformation coefficient determining module 520 is configured to determine a first pose transformation coefficient according to the initial reference pose and the first part pose for any target time-series frame in the plurality of time-series frames.
And an update reference posture determining module 530, configured to update the initial reference posture according to the first posture transformation coefficient, so as to obtain an update reference posture.
And a second pose transformation coefficient determining module 540, configured to determine a second pose transformation coefficient according to the updated reference pose and the second part pose.
And an avatar driving module 550 for driving the avatar according to the second pose transformation coefficient.
According to an embodiment of the present disclosure, an input data stream includes vertex stream data, each time-series frame of the vertex stream data characterizing a facial pose by a plurality of primitive vertex positions; the avatar driving module includes: the second attitude transformation difference data determination submodule is used for determining second attitude transformation difference data according to a second attitude transformation coefficient and the primitive vertex position corresponding to the target time sequence frame aiming at the target time sequence frame; the third attitude transformation coefficient determining submodule is used for determining a third attitude transformation coefficient according to the second attitude transformation difference data and the second attitude transformation coefficient; and the virtual image driving submodule is used for driving the virtual image according to the third posture transformation coefficient.
According to an embodiment of the present disclosure, the third posture transformation coefficient determining sub-module includes: a first difference attitude determination unit for determining a first difference attitude according to the second attitude transformation difference data and the initial reference attitude, the first difference attitude being characterized by the position of the corresponding primitive vertex; the bone-vertex first association data determining unit is used for carrying out bone decomposition on the first difference posture by utilizing first bone node data to obtain bone-vertex first association data, wherein the first bone node data comprise the number and the pose of first bone nodes, and the bone-vertex first association data represent association weights between any one first bone node and a primitive vertex corresponding to the first difference posture; a second difference posture determining unit, configured to determine, according to the bone-vertex first association data, a second difference posture corresponding to the bone-node first association data; and a third attitude transformation coefficient determining unit configured to determine a third attitude transformation coefficient based on the second attitude transformation coefficient and the second difference attitude.
According to an embodiment of the present disclosure, the avatar driving sub-module includes: the third attitude transformation difference data determination unit is used for determining third attitude transformation difference data according to a third attitude transformation coefficient and a primitive vertex position corresponding to the target time sequence frame aiming at the target time sequence frame; a fourth pose transformation coefficient determining unit, configured to determine a fourth pose transformation coefficient according to the third pose transformation difference data and the third pose transformation coefficient when the third pose transformation difference data does not satisfy the pose transformation difference condition, where a difference value between a face pose corresponding to the fourth pose transformation coefficient and a primitive vertex position corresponding to the target time-series frame satisfies the pose transformation difference condition; and an avatar driving unit for driving the avatar according to the fourth posture transformation coefficient.
According to an embodiment of the present disclosure, the fourth orientation transformation coefficient determination unit includes: a third difference posture determining subunit, configured to determine a third difference posture according to the third posture transformation difference data and the initial reference posture, where the third difference posture is represented by a position of a vertex of a corresponding primitive; the skeleton-vertex second association data determination subunit is used for performing skeleton decomposition on the third difference posture by using second skeleton node data to obtain skeleton-vertex second association data, wherein the second skeleton node data comprise the number and the pose of second skeleton nodes, and the skeleton-vertex second association data represent association weights between any one second skeleton node and a primitive vertex corresponding to the third difference posture; a fourth difference posture determining subunit, configured to determine, according to the bone-vertex second association data, a fourth difference posture corresponding to the bone-node second association data; and a fourth attitude transformation coefficient determining subunit for determining a fourth attitude transformation coefficient based on the third attitude transformation coefficient and the fourth differential attitude.
According to an embodiment of the disclosure, the first part pose comprises at least one of: eye region pose, nose region pose, and chin region pose; the second part pose includes at least one of: chin-to-teeth region poses, chin-to-lips region poses, eye-to-eyebrow region poses, the chin-to-teeth region poses including chin sub-region poses and teeth sub-region poses, the chin-to-lips region poses including chin sub-region poses and lips sub-region poses, and the eye-to-eyebrow region poses including eye sub-region poses and eyebrow sub-region poses.
According to the embodiment of the disclosure, the attitude transformation coefficient is obtained according to an attitude transformation function, the attitude transformation function is related to a reference attitude and a primitive vertex position, and the reference attitude comprises an initial reference attitude and an updated reference attitude; the attitude transform coefficient includes at least one of a first attitude transform coefficient, a second attitude transform coefficient, a third attitude transform coefficient, and a fourth attitude transform coefficient. The third attitude transformation coefficient is obtained according to the second attitude transformation coefficient, and the fourth attitude transformation coefficient is obtained according to the third attitude transformation coefficient.
According to an embodiment of the disclosure, the pose transformation difference data is obtained according to a pose transformation difference function, the pose transformation difference function being related to the pose transformation coefficient and the position of the key vertex; the key vertex is determined from the full amount of primitive vertices according to the representation; the pose transformation difference data includes at least one of second pose transformation difference data and third pose transformation difference data. The third attitude transformation difference data is obtained according to the third attitude transformation coefficient.
It should be understood that the embodiments of the apparatus part of the present disclosure are the same as or similar to the embodiments of the method part of the present disclosure, and the technical problems to be solved and the technical effects to be achieved are also the same as or similar to each other, and the detailed description of the present disclosure is omitted.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The calculation unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 performs the respective methods and processes described above, such as the avatar driving method. For example, in some embodiments, the avatar-driven method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the avatar driving method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the avatar driving method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. An avatar driving method comprising:
receiving an input data stream, wherein the input data stream comprises a plurality of time-series frames, any one of the time-series frames being associated with a facial pose, the facial pose comprising a first part pose and a second part pose, the second part pose comprising a plurality of second sub-part poses, a correlation between any plurality of the first part poses being less than a correlation between a plurality of the second sub-part poses;
determining, for any one target time-series frame of the plurality of time-series frames, a first pose transformation coefficient according to an initial reference pose and the first part pose;
updating the initial reference pose according to the first pose transformation coefficient to obtain an updated reference pose;
determining a second pose transformation coefficient according to the updated reference pose and the second part pose; and
driving the avatar according to the second pose transformation coefficient.
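A minimal illustrative sketch of the two-stage solve described in claim 1. The linear blendshape-style bases, the least-squares fit, and all function and variable names are assumptions introduced here for illustration, not details taken from the claim:

```python
import numpy as np


def solve_coefficients(basis, reference, target):
    """Least-squares fit of coefficients c such that reference + basis @ c approximates target."""
    residual = target - reference
    coeffs, *_ = np.linalg.lstsq(basis, residual, rcond=None)
    return coeffs


def drive_frame(initial_reference, first_basis, second_basis,
                first_part_pose, second_part_pose):
    # Stage 1: fit the weakly correlated first-part pose against the initial reference pose.
    first_coeffs = solve_coefficients(first_basis, initial_reference, first_part_pose)

    # Update the reference pose using the first pose transformation coefficient.
    updated_reference = initial_reference + first_basis @ first_coeffs

    # Stage 2: fit the strongly correlated second-part pose against the updated reference pose.
    second_coeffs = solve_coefficients(second_basis, updated_reference, second_part_pose)

    # The second-stage coefficients are what ultimately drive the avatar for this frame.
    return second_coeffs
```

Here each pose is treated as a flattened vector of primitive vertex positions and each basis as a matrix whose columns are pose offsets; how the coefficients are actually defined is left open by the claim.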
2. The method of claim 1, wherein the input data stream comprises vertex stream data, each of the time-series frames of the vertex stream data characterizing the facial pose by a plurality of primitive vertex positions; the driving of the avatar according to the second pose transformation coefficient includes:
determining, for the target time-series frame, second pose transformation difference data according to the second pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame;
determining a third pose transformation coefficient according to the second pose transformation difference data and the second pose transformation coefficient; and
driving the avatar according to the third pose transformation coefficient.
3. The method of claim 2, wherein the determining a third pose transformation coefficient according to the second pose transformation difference data and the second pose transformation coefficient comprises:
determining a first difference pose according to the second pose transformation difference data and the initial reference pose, the first difference pose being characterized by corresponding primitive vertex positions;
performing bone decomposition on the first difference pose by using first bone node data to obtain bone-vertex first association data, wherein the first bone node data comprises the number and poses of first bone nodes, and the bone-vertex first association data represents an association weight between any one first bone node and a primitive vertex corresponding to the first difference pose;
determining, according to the bone-vertex first association data, a second difference pose corresponding to the first bone node data; and
determining the third pose transformation coefficient according to the second pose transformation coefficient and the second difference pose.
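An illustrative sketch of the bone-decomposition step in claim 3. Real skinning decomposition jointly optimizes bone transforms and weights; here the association weights are a simple distance-based soft assignment, purely to show the data flow (first difference pose, to bone-vertex weights, to a bone-constrained second difference pose). All names and formulas below are assumptions:

```python
import numpy as np


def bone_vertex_weights(difference_pose, bone_positions):
    """Association weight of every primitive vertex to every bone node.

    difference_pose: (V, 3) vertex positions of the first difference pose.
    bone_positions:  (B, 3) positions of the first bone nodes.
    Returns a (V, B) row-stochastic weight matrix.
    """
    dists = np.linalg.norm(difference_pose[:, None, :] - bone_positions[None, :, :], axis=-1)
    weights = np.exp(-dists)
    return weights / weights.sum(axis=1, keepdims=True)


def second_difference_pose(weights, bone_positions):
    """Re-express the difference pose through the bones it was decomposed onto."""
    return weights @ bone_positions  # (V, 3)
```

The third pose transformation coefficient would then be obtained by combining the second pose transformation coefficient with this bone-constrained difference pose, for example by refitting the difference pose in the same coefficient basis.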
4. The method of claim 2, wherein the driving of the avatar according to the third pose transformation coefficient comprises:
determining, for the target time-series frame, third pose transformation difference data according to the third pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame;
determining, in a case where the third pose transformation difference data does not satisfy a pose transformation difference condition, a fourth pose transformation coefficient according to the third pose transformation difference data and the third pose transformation coefficient, wherein a difference value between a facial pose corresponding to the fourth pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame satisfies the pose transformation difference condition; and
driving the avatar according to the fourth pose transformation coefficient.
5. The method of claim 4, wherein the determining a fourth pose transformation coefficient according to the third pose transformation difference data and the third pose transformation coefficient comprises:
determining a third difference pose according to the third pose transformation difference data and the initial reference pose, the third difference pose being characterized by corresponding primitive vertex positions;
performing bone decomposition on the third difference pose by using second bone node data to obtain bone-vertex second association data, wherein the second bone node data comprises the number and poses of second bone nodes, and the bone-vertex second association data represents an association weight between any one second bone node and a primitive vertex corresponding to the third difference pose;
determining, according to the bone-vertex second association data, a fourth difference pose corresponding to the second bone node data; and
determining the fourth pose transformation coefficient according to the third pose transformation coefficient and the fourth difference pose.
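Claims 2, 4, and 5 together read as an iterative refinement: each new coefficient is checked against the observed primitive vertex positions and refined again while the pose transformation difference condition is not met. A hedged sketch of such a loop, with the tolerance, the additive correction, and all names being assumptions rather than the patent's formulas:

```python
import numpy as np


def pose_from_coefficients(reference, basis, coeffs):
    """Reconstruct a facial pose (flattened vertex positions) from coefficients."""
    return reference + basis @ coeffs


def refine_coefficients(coeffs, reference, basis, observed_vertices,
                        tolerance=1e-3, max_rounds=3):
    """Refine coefficients until the difference condition is satisfied or rounds run out."""
    for _ in range(max_rounds):
        difference = observed_vertices - pose_from_coefficients(reference, basis, coeffs)
        # Pose transformation difference condition: stop once the residual is small enough.
        if np.linalg.norm(difference) <= tolerance:
            break
        correction, *_ = np.linalg.lstsq(basis, difference, rcond=None)
        coeffs = coeffs + correction  # successive (third, fourth, ...) coefficients
    return coeffs
```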
6. The method of any of claims 1-5, wherein the first part pose comprises at least one of: an eye region pose, a nose region pose, and a chin region pose; the second part pose comprises at least one of: a chin-teeth area pose, a chin-lips area pose, and an eyes-eyebrows area pose, the chin-teeth area pose comprising a chin sub-area pose and a teeth sub-area pose, the chin-lips area pose comprising a chin sub-area pose and a lips sub-area pose, and the eyes-eyebrows area pose comprising an eyes sub-area pose and an eyebrows sub-area pose.
7. The method of any of claims 1-5, wherein pose transformation coefficients are derived from a pose transformation function, the pose transformation function being related to a reference pose and primitive vertex positions, the reference pose comprising the initial reference pose and the updated reference pose; the pose transformation coefficients comprise at least one of the first pose transformation coefficient, the second pose transformation coefficient, a third pose transformation coefficient, and a fourth pose transformation coefficient, the third pose transformation coefficient being derived from the second pose transformation coefficient, and the fourth pose transformation coefficient being derived from the third pose transformation coefficient.
8. The method of claim 2, wherein the pose transformation difference data is derived from a pose transformation difference function, the pose transformation difference function being related to pose transformation coefficients and positions of key vertices; the key vertices are determined from the full set of primitive vertices according to the representation; the pose transformation difference data comprises at least one of the second pose transformation difference data and third pose transformation difference data, the third pose transformation difference data being derived from the third pose transformation coefficient.
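A short sketch of a pose transformation difference function restricted to key vertices, as in claim 8. How the key vertices are selected from the full set of primitive vertices is not specified here; the index array, the mean per-vertex error, and all names are assumptions:

```python
import numpy as np


def pose_difference(coeffs, reference, basis, observed, key_idx):
    """Mean per-vertex error evaluated only on the key vertices.

    reference, observed: (V, 3) vertex positions; basis: (V, 3, K); coeffs: (K,);
    key_idx: indices of the key vertices within the full vertex set.
    """
    predicted = reference + np.tensordot(basis, coeffs, axes=([2], [0]))  # (V, 3)
    errors = np.linalg.norm(observed[key_idx] - predicted[key_idx], axis=1)
    return errors.mean()
```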
9. An avatar driving apparatus comprising:
an input data stream receiving module configured to receive an input data stream, wherein the input data stream comprises a plurality of time-series frames, any one of the time-series frames being associated with a facial pose, the facial pose comprising a first part pose and a second part pose, the second part pose comprising a plurality of second sub-part poses, a correlation between any plurality of the first part poses being less than a correlation between a plurality of the second sub-part poses;
a first pose transformation coefficient determination module configured to determine, for any one target time-series frame of the plurality of time-series frames, a first pose transformation coefficient according to an initial reference pose and the first part pose;
an updated reference pose determination module configured to update the initial reference pose according to the first pose transformation coefficient to obtain an updated reference pose;
a second pose transformation coefficient determination module configured to determine a second pose transformation coefficient according to the updated reference pose and the second part pose; and
an avatar driving module configured to drive the avatar according to the second pose transformation coefficient.
10. The apparatus of claim 9, wherein the input data stream comprises vertex stream data, each of the time-series frames of the vertex stream data characterizing the facial pose by a plurality of primitive vertex positions; and the avatar driving module comprises:
a second pose transformation difference data determination submodule configured to determine, for the target time-series frame, second pose transformation difference data according to the second pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame;
a third pose transformation coefficient determination submodule configured to determine a third pose transformation coefficient according to the second pose transformation difference data and the second pose transformation coefficient; and
an avatar driving submodule configured to drive the avatar according to the third pose transformation coefficient.
11. The apparatus of claim 10, wherein the third pose transformation coefficient determination submodule comprises:
a first difference pose determination unit configured to determine a first difference pose according to the second pose transformation difference data and the initial reference pose, the first difference pose being characterized by corresponding primitive vertex positions;
a bone-vertex first association data determination unit configured to perform bone decomposition on the first difference pose by using first bone node data to obtain bone-vertex first association data, wherein the first bone node data comprises the number and poses of first bone nodes, and the bone-vertex first association data represents an association weight between any one first bone node and a primitive vertex corresponding to the first difference pose;
a second difference pose determination unit configured to determine, according to the bone-vertex first association data, a second difference pose corresponding to the first bone node data; and
a third pose transformation coefficient determination unit configured to determine the third pose transformation coefficient according to the second pose transformation coefficient and the second difference pose.
12. The apparatus of claim 10, wherein the avatar driving submodule comprises:
a third pose transformation difference data determination unit configured to determine, for the target time-series frame, third pose transformation difference data according to the third pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame;
a fourth pose transformation coefficient determination unit configured to determine, in a case where the third pose transformation difference data does not satisfy a pose transformation difference condition, a fourth pose transformation coefficient according to the third pose transformation difference data and the third pose transformation coefficient, wherein a difference value between a facial pose corresponding to the fourth pose transformation coefficient and the primitive vertex positions corresponding to the target time-series frame satisfies the pose transformation difference condition; and
an avatar driving unit configured to drive the avatar according to the fourth pose transformation coefficient.
13. The apparatus of claim 12, wherein the fourth pose transformation coefficient determination unit comprises:
a third difference pose determination subunit configured to determine a third difference pose according to the third pose transformation difference data and the initial reference pose, the third difference pose being characterized by corresponding primitive vertex positions;
a bone-vertex second association data determination subunit configured to perform bone decomposition on the third difference pose by using second bone node data to obtain bone-vertex second association data, wherein the second bone node data comprises the number and poses of second bone nodes, and the bone-vertex second association data represents an association weight between any one second bone node and a primitive vertex corresponding to the third difference pose;
a fourth difference pose determination subunit configured to determine, according to the bone-vertex second association data, a fourth difference pose corresponding to the second bone node data; and
a fourth pose transformation coefficient determination subunit configured to determine the fourth pose transformation coefficient according to the third pose transformation coefficient and the fourth difference pose.
14. The apparatus of any of claims 9-13, wherein the first part pose comprises at least one of: an eye region pose, a nose region pose, and a chin region pose; the second part pose comprises at least one of: a chin-teeth area pose, a chin-lips area pose, and an eyes-eyebrows area pose, the chin-teeth area pose comprising a chin sub-area pose and a teeth sub-area pose, the chin-lips area pose comprising a chin sub-area pose and a lips sub-area pose, and the eyes-eyebrows area pose comprising an eyes sub-area pose and an eyebrows sub-area pose.
15. The apparatus of any of claims 9-13, wherein pose transformation coefficients are derived from a pose transformation function, the pose transformation function being related to a reference pose and primitive vertex positions, the reference pose comprising the initial reference pose and the updated reference pose; the pose transformation coefficients comprise at least one of the first pose transformation coefficient, the second pose transformation coefficient, a third pose transformation coefficient, and a fourth pose transformation coefficient, the third pose transformation coefficient being derived from the second pose transformation coefficient, and the fourth pose transformation coefficient being derived from the third pose transformation coefficient.
16. The apparatus of claim 10, wherein the pose transformation difference data is derived from a pose transformation difference function, the pose transformation difference function being related to pose transformation coefficients and positions of key vertices; the key vertices are determined from the full set of primitive vertices according to the representation; the pose transformation difference data comprises at least one of the second pose transformation difference data and third pose transformation difference data, the third pose transformation difference data being derived from the third pose transformation coefficient.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, the computer program, when executed by a processor, implementing the method according to any one of claims 1-8.
CN202211670321.3A 2022-12-23 2022-12-23 Avatar driving method, apparatus, device and storage medium Active CN115937373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211670321.3A CN115937373B (en) 2022-12-23 2022-12-23 Avatar driving method, apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211670321.3A CN115937373B (en) 2022-12-23 2022-12-23 Avatar driving method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN115937373A true CN115937373A (en) 2023-04-07
CN115937373B CN115937373B (en) 2023-10-03

Family

ID=86697564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211670321.3A Active CN115937373B (en) 2022-12-23 2022-12-23 Avatar driving method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN115937373B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991327A (en) * 2019-11-29 2020-04-10 深圳市商汤科技有限公司 Interaction method and device, electronic equipment and storage medium
US20210043000A1 (en) * 2019-05-15 2021-02-11 Zhejiang Sensetime Technology Development Co., Ltd. Method, apparatus and device for processing deformation of virtual object, and storage medium
US10949648B1 (en) * 2018-01-23 2021-03-16 Snap Inc. Region-based stabilized face tracking
CN112509099A (en) * 2020-11-30 2021-03-16 北京百度网讯科技有限公司 Avatar driving method, apparatus, device and storage medium
CN112581573A (en) * 2020-12-15 2021-03-30 北京百度网讯科技有限公司 Avatar driving method, apparatus, device, medium, and program product
US20210312685A1 (en) * 2020-09-14 2021-10-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for synthesizing figure of virtual object, electronic device, and storage medium
US20210383607A1 (en) * 2020-06-08 2021-12-09 Beijing Baidu Netcom Science Technology Co., Ltd. Virtual object driving method, apparatus, electronic device, and readable storage medium
WO2022021686A1 (en) * 2020-07-28 2022-02-03 完美世界(北京)软件科技发展有限公司 Method and apparatus for controlling virtual object, and storage medium and electronic apparatus
US20220215583A1 (en) * 2020-01-14 2022-07-07 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, electronic device, and storage medium
CN115147523A (en) * 2022-07-07 2022-10-04 北京百度网讯科技有限公司 Avatar driving method and apparatus, device, medium, and program product
WO2022222572A1 (en) * 2021-04-19 2022-10-27 上海商汤智能科技有限公司 Method and apparatus for driving interaction object, device, and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10949648B1 (en) * 2018-01-23 2021-03-16 Snap Inc. Region-based stabilized face tracking
US20210043000A1 (en) * 2019-05-15 2021-02-11 Zhejiang Sensetime Technology Development Co., Ltd. Method, apparatus and device for processing deformation of virtual object, and storage medium
CN110991327A (en) * 2019-11-29 2020-04-10 深圳市商汤科技有限公司 Interaction method and device, electronic equipment and storage medium
US20220215583A1 (en) * 2020-01-14 2022-07-07 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, electronic device, and storage medium
US20210383607A1 (en) * 2020-06-08 2021-12-09 Beijing Baidu Netcom Science Technology Co., Ltd. Virtual object driving method, apparatus, electronic device, and readable storage medium
WO2022021686A1 (en) * 2020-07-28 2022-02-03 完美世界(北京)软件科技发展有限公司 Method and apparatus for controlling virtual object, and storage medium and electronic apparatus
US20210312685A1 (en) * 2020-09-14 2021-10-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for synthesizing figure of virtual object, electronic device, and storage medium
CN112509099A (en) * 2020-11-30 2021-03-16 北京百度网讯科技有限公司 Avatar driving method, apparatus, device and storage medium
US20220058848A1 (en) * 2020-11-30 2022-02-24 Beijing Baidu Netcom Science Technology Co., Ltd. Virtual avatar driving method and apparatus, device, and storage medium
CN112581573A (en) * 2020-12-15 2021-03-30 北京百度网讯科技有限公司 Avatar driving method, apparatus, device, medium, and program product
WO2022222572A1 (en) * 2021-04-19 2022-10-27 上海商汤智能科技有限公司 Method and apparatus for driving interaction object, device, and storage medium
CN115147523A (en) * 2022-07-07 2022-10-04 北京百度网讯科技有限公司 Avatar driving method and apparatus, device, medium, and program product

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIAN LUO; JIN TANG; XIAOMING XIAO: "Robust arbitrary view gait recognition based on parametric 3D human body reconstruction and virtual posture synthesis", PATTERN RECOGNITION, pages 361 *
LIU KAI; CHAI YI; FENG WENWU: "Simulation of human body posture data based on OpenGL", COMPUTER SIMULATION, no. 04, pages 267-270 *
LI JIN; TONG LIJING; YING XIANG; YANG JINQIU: "Simulation of three-dimensional human walking posture based on skeleton data", DIGITAL TECHNOLOGY AND APPLICATION, no. 10, pages 58-61 *

Also Published As

Publication number Publication date
CN115937373B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
US10854007B2 (en) Space models for mixed reality
CN113643412B (en) Virtual image generation method and device, electronic equipment and storage medium
CN112862933B (en) Method, apparatus, device and storage medium for optimizing model
CN112819971B (en) Method, device, equipment and medium for generating virtual image
CN113658309B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN112785674A (en) Texture map generation method, rendering method, device, equipment and storage medium
CN115147265B (en) Avatar generation method, apparatus, electronic device, and storage medium
US20180276870A1 (en) System and method for mass-animating characters in animated sequences
CN113870439A (en) Method, apparatus, device and storage medium for processing image
CN111599002A (en) Method and apparatus for generating image
EP3855386B1 (en) Method, apparatus, device and storage medium for transforming hairstyle and computer program product
CN112562043B (en) Image processing method and device and electronic equipment
CN113380269A (en) Video image generation method, apparatus, device, medium, and computer program product
EP2260403B1 (en) Mesh transfer
CN113052962A (en) Model training method, information output method, device, equipment and storage medium
CN113240789A (en) Virtual object construction method and device
CN112862934A (en) Method, apparatus, device, medium, and product for processing animation
CN115937373B (en) Avatar driving method, apparatus, device and storage medium
Huang et al. A novel WebVR-based lightweight framework for virtual visualization of blood vasculum
CN116385643B (en) Virtual image generation method, virtual image model training method, virtual image generation device, virtual image model training device and electronic equipment
CN116030150B (en) Avatar generation method, device, electronic equipment and medium
CN113610992B (en) Bone driving coefficient determining method and device, electronic equipment and readable storage medium
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium
Stern et al. Application scenarios for scientific visualization and virtual reality using a CAVE infrastructure
CN115713582A (en) Virtual image generation method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant