CN109801349B - Sound-driven three-dimensional animation character real-time expression generation method and system - Google Patents


Info

Publication number
CN109801349B
Authority
CN
China
Prior art keywords
expression
voice
speech
animation
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811553860.2A
Other languages
Chinese (zh)
Other versions
CN109801349A (en)
Inventor
周湘君
杜庆焜
芦振华
甘靖山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Xishan Yichuang Culture Co ltd
Original Assignee
Wuhan Xishan Yichuang Culture Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Xishan Yichuang Culture Co ltd filed Critical Wuhan Xishan Yichuang Culture Co ltd
Priority to CN201811553860.2A priority Critical patent/CN109801349B/en
Publication of CN109801349A publication Critical patent/CN109801349A/en
Application granted granted Critical
Publication of CN109801349B publication Critical patent/CN109801349B/en

Abstract

The application provides a sound-driven method for generating real-time expressions of a three-dimensional animated character, suitable for the production of two-dimensional electronic games and animation, comprising the following steps: importing an animation character expression library; receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice; matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words; and acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified. The application also provides a corresponding sound-driven real-time expression generation system for three-dimensional animated characters. The beneficial effect of the application is that the expression type of the facial expression to be produced for the three-dimensional animated character is specified by voice, and the corresponding facial expression animation is matched directly from the input line speech, which improves both the efficiency and the accuracy of facial animation production.

Description

Sound-driven three-dimensional animation character real-time expression generation method and system
Technical Field
The invention relates to the field of computer graphics, and in particular to a sound-driven method and system for generating real-time expressions of three-dimensional animated characters.
Background
In the development of an electronic game or animation, the facial expression of an animated character often needs to be shown in close-up to match the lines in the script. In such shots, not only are the changes in the character's facial expression amplified to convey the character's state of mind to the audience, but the character's mouth shape must also match the spoken lines to avoid audio-visual mismatch. The character's facial expression and mouth-shape changes together form a series of subtle and complex combinations, and the sophistication and accuracy of these combinations determine the expressiveness of the character and the overall quality of the electronic game or animation.
However, character animation is still largely drawn by hand. Specifically, an artist first draws key frames of the specified facial expression from the character's original art; the character's mouth-shape changes are then fine-tuned according to the lines in the script, and facial expression transition frames containing those mouth-shape changes are inserted by hand. A software developer or outsourcing design company therefore spends considerable labor and time completing this work, and further labor and time checking whether the lines match the mouth-shape changes and whether the sound and picture are out of sync. Yet the mouth shape corresponding to a given sound is fixed, and mouth-shape changes are simpler than the combined changes of the other facial muscles, so hand-drawing the transition frames in fact involves a large amount of repetitive, mechanical work.
Disclosure of Invention
To overcome the deficiencies of the prior art, the application provides a sound-driven method and system for generating real-time expressions of three-dimensional animated characters, which can automatically generate the expression animation of a three-dimensional character in real time from the input line speech.
In order to achieve the above object, the present application adopts the following technical solutions.
Firstly, the application provides a sound-driven three-dimensional animation character real-time expression generation method. The method is suitable for the production of two-dimensional electronic games and animation, and comprises the following steps:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
Further, in the above method of the present application, the step S100 may further include the following sub-steps:
s101) classifying facial expression animations in the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the introduced animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
Further, in the above method of the present application, the step S200 may further include the following sub-steps:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
Still further, in the above method of the present application, the step S400 may further include the following sub-steps:
s401) extracting a second voice characteristic of the input speech in real time;
s402) determining the voice type of the input speech of the speech-lines according to the extracted second voice characteristics;
s403) searching for a corresponding facial expression animation based on the voice type of the inputted speech of the speech line and the index in the first expression sub-library to generate a series of facial expression animations of the character.
Still further, in the above method of the present application, the speech template of the expressive cue word is generated by training a speech collected in advance.
Alternatively, in the above method of the present application, the voice template of the expressive cue word is trained by dynamically updating the collected voices.
Alternatively, in the above method of the present application, the correspondence between the voice template and the expression type identified by the voice template is customized.
Further, in the above method of the present application, the correspondence relationship is stored locally.
Secondly, the application also provides a sound-driven three-dimensional animation character real-time expression generation system. The system is suitable for the production of two-dimensional electronic games and animation, and comprises the following modules: an importing module, which is used for importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type; a receiving module, which is used for receiving a preset voice file or real-time voice containing expression prompt words and extracting the expression prompt words from the preset voice file or real-time voice; a matching module, which is used for matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words; and a generating module, which is used for acquiring line speech in real time and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
Further, in the above system of the present application, the import module may further include the following sub-modules: the classification module is used for classifying the facial expression animation of the animation character expression library into a plurality of first expression sub-libraries according to the expression type of the facial expression animation in the imported animation character expression library; and the index module is used for compiling indexes of the facial expression animations in the first expression sub-libraries based on the voice types of the roles in the facial expression animations in each first expression sub-library.
Further, in the above system of the present application, the receiving module may further include the following sub-modules: a checking module, which is used for checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received; and a determining module, which is used for determining the expression type indicated by the expression prompt words contained in the preset voice file or real-time voice according to the expression type identified by the voice template.
Still further, in the above system of the present application, the generating module may further include the following sub-modules: an extraction module, which is used for extracting a second voice feature of the input line speech in real time; a judging module, which is used for determining the voice type of the input line speech according to the extracted second voice feature; and an execution module, which is used for searching for the corresponding facial expression animations according to the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
Still further, in the above system of the present application, the speech template of the expressive cue word is generated by training a speech collected in advance.
Alternatively, in the above system of the present application, the voice template of the expressive cue word is trained by dynamically updating the collected voices.
Alternatively, in the above system of the present application, the correspondence between the voice template and the expression type identified by the voice template is customized.
Further, in the above system of the present application, the correspondence relationship is stored locally.
Finally, the present application also provides a computer-readable storage medium storing computer instructions suitable for the production of two-dimensional electronic games and animation. When the instructions are executed by a processor, the following steps are performed:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
Further, when the processor executes the above instructions, the step S100 may further include the following sub-steps:
s101) classifying facial expression animations in the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the introduced animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
Further, when the processor executes the above instructions, the step S200 may further include the following sub-steps:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
Still further, when the processor executes the above instructions, the step S400 may further include the following sub-steps:
s401) extracting a second voice feature of the input line speech in real time;
s402) determining the voice type of the input line speech according to the extracted second voice feature;
s403) searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
Still further, when the processor executes the instructions, the speech template of the expressive cue word is generated by training pre-collected speech.
Alternatively, the voice template of the expressive cue words is trained by dynamically updating the collected voices when the processor executes the instructions.
Alternatively, when the processor executes the above instructions, the correspondence between the voice template and the expression type it identifies is user-defined.
Further, when the processor executes the above instructions, the correspondence relationship is stored locally.
The beneficial effect of the application is that the expression type of the facial expression to be produced for the three-dimensional animated character is specified by voice, and the corresponding facial expression animation is matched directly from the input line speech, which improves both the efficiency and the accuracy of facial expression animation production.
Drawings
FIG. 1 is a flow chart of a method for generating real-time expressions of a three-dimensional animated character driven by sound disclosed in the present application;
FIG. 2 is a flowchart illustrating the sub-steps of importing the animation character expression library in one embodiment of the present application;
FIG. 3 is a diagram illustrating the structure of the animation character expression library in one embodiment of the present application;
FIG. 4 is a flowchart illustrating the sub-steps of extracting expression prompt words from a preset voice file or real-time voice in one embodiment of the present application;
FIG. 5 is a flowchart illustrating the sub-steps of matching facial expression animations from the animation character expression library in one embodiment of the present application;
FIG. 6 is a diagram illustrating a customized correspondence between voice templates and expression types in one embodiment of the present application;
fig. 7 is a block diagram illustrating the structure of the system for generating real-time expressions of three-dimensional animated characters driven by sound disclosed in the present application.
Detailed Description
The conception, specific structure and technical effects of the present application will be described clearly and completely with reference to the following embodiments and the accompanying drawings, so that the purpose, scheme and effects of the present application can be fully understood. It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly fixed or connected to the other feature or indirectly fixed or connected to the other feature. Furthermore, the descriptions of up, down, left, right, etc. used in this application are only relative to the relative positional relationship of the various components of the application in the drawings. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any combination of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one type of element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon".
Referring to the method flowchart shown in fig. 1, the application discloses a sound-driven three-dimensional animation character real-time expression generation method, which is suitable for the production of two-dimensional electronic games and animation. The method comprises the following steps:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
The animation character expression library may consist of animation files of various expressions drawn for a specific animated character based on its original art. Specifically, in one or more embodiments of the present application, a model of the three-dimensional character may be created in three-dimensional software from the relevant character original art, and the related texture map files may be created in two-dimensional software; bones are then created and bound for the three-dimensional character, weight files are painted, and the related facial expression animation files are produced. These facial expression animation files classify the current states of the three-dimensional character's model according to expression type and voice type, where the expression type indicates the expression represented (e.g. "smile", "cry", "confusion") and the voice type indicates the mouth shape represented (e.g. mouth slightly open, lips pursed). To produce a coherent animated expression, particular model states can be arranged in a certain order and adjacent states can be made to transition naturally by interpolating the model parameters, as sketched below.
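As an illustration of the interpolation just described, the following Python sketch shows one simple way to blend adjacent model states. The names (ModelState, blend_states, make_sequence) and the use of plain linear interpolation are illustrative assumptions, not the patent's prescribed implementation:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ModelState:
    """A snapshot of the three-dimensional character's facial rig parameters."""
    state_no: int             # model state number referenced by the index table
    params: Dict[str, float]  # e.g. blend-shape weights or bone rotations

def blend_states(a: ModelState, b: ModelState, steps: int) -> List[Dict[str, float]]:
    """Linearly interpolate from state a to state b, producing `steps` transition frames."""
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # interpolation factor strictly between 0 and 1
        frames.append({k: (1 - t) * v + t * b.params.get(k, v) for k, v in a.params.items()})
    return frames

def make_sequence(states: List[ModelState], steps_between: int = 3) -> List[Dict[str, float]]:
    """Arrange the selected states in order and fill each gap with interpolated frames."""
    sequence: List[Dict[str, float]] = []
    if not states:
        return sequence
    for cur, nxt in zip(states, states[1:]):
        sequence.append(cur.params)
        sequence.extend(blend_states(cur, nxt, steps_between))
    sequence.append(states[-1].params)
    return sequence
```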
Referring to the flowchart of the sub-method shown in fig. 2, in one or more embodiments of the present application, in order to speed up the search for the specified emoji animation, the step S100 further includes the following sub-steps:
s101) classifying facial expression animations in the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the introduced animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
Specifically, referring to the schematic structural diagram of the animation character expression library shown in fig. 3, the expression animations are divided into a plurality of first expression sub-libraries according to the expression category to which they belong (such as the "smile", "angry", and "sad" expression sub-libraries shown in the figure). Each first expression sub-library also maintains an expression-state number index table, which records the model state number corresponding to each voice type, so that the corresponding model state number can be found quickly from the recognized voice type; a minimal data-structure sketch follows. Further, those skilled in the art will appreciate that the number and shape of the display elements in the drawings are merely exemplary references and are not intended to limit the present invention.
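The following sketch illustrates this two-level organization; the class and method names (ExpressionLibrary, add, lookup) are hypothetical and serve only to show the sub-library plus voice-type index table described above:

```python
from typing import Dict, Optional

class ExpressionLibrary:
    """First expression sub-libraries keyed by expression type, each holding an
    expression-state number index table mapping voice type -> model state number."""

    def __init__(self) -> None:
        self.sub_libraries: Dict[str, Dict[str, int]] = {}

    def add(self, expression_type: str, voice_type: str, state_no: int) -> None:
        """Register a facial expression animation's model state number under its types."""
        self.sub_libraries.setdefault(expression_type, {})[voice_type] = state_no

    def lookup(self, expression_type: str, voice_type: str) -> Optional[int]:
        """Find the model state number for a recognized voice type in one sub-library."""
        return self.sub_libraries.get(expression_type, {}).get(voice_type)

# Example: "smile" and "angry" sub-libraries indexed by mouth-shape (voice) types.
library = ExpressionLibrary()
library.add("smile", "a", 101)  # mouth open
library.add("smile", "u", 102)  # lips pursed
library.add("angry", "a", 201)
assert library.lookup("smile", "u") == 102
```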
Referring to the sub-method flow diagram shown in fig. 4, in one or more embodiments of the present application, the step S200 further includes the following sub-steps:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
The first voice feature and the second voice feature referred to in this application are feature parameters derived from the collected speech on the basis of a specific language, formed from that language's words, phonemes, tones, and the like. These feature parameters are matched against the feature parameters of the groups of voices labeled with specific expression meanings in the voice templates of the expression prompt words, so as to determine the expression prompt word contained in the received preset voice file or real-time voice. The voice features may be obtained by analyzing the received sound with algorithms conventional in the art, which this application does not limit. However, when the difference between the received preset voice file or real-time voice and the voice template of every expression prompt word is greater than a preset threshold, the input is considered to contain too much environmental noise, which would seriously affect the judgment of the expression type. Therefore, to ensure accuracy, the preset voice file or real-time voice must be input again.
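The sketch below shows one plausible form of this threshold-based matching step. The Euclidean distance over fixed-length feature vectors and the helper names (feature_distance, match_cue_word, and the hypothetical extract_features/record in the usage comment) are assumptions; the patent leaves the actual feature extraction and matching algorithm to conventional techniques:

```python
import math
from typing import Dict, List, Optional

def feature_distance(features: List[float], template: List[float]) -> float:
    """Euclidean distance between a feature vector and a template (assumed equal length)."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(features, template)))

def match_cue_word(first_voice_feature: List[float],
                   templates: Dict[str, List[float]],  # expression type -> template features
                   threshold: float) -> Optional[str]:
    """Return the expression type of the closest template, or None when every template
    differs by more than the preset threshold (treated as excessive environmental noise,
    so the caller should ask for the voice to be input again)."""
    best_type, best_dist = None, float("inf")
    for expression_type, template in templates.items():
        d = feature_distance(first_voice_feature, template)
        if d < best_dist:
            best_type, best_dist = expression_type, d
    return best_type if best_dist <= threshold else None

# Usage idea (hypothetical helpers): keep prompting until a template matches.
# while (expr_type := match_cue_word(extract_features(record()), templates, 0.8)) is None:
#     print("Input unclear, please speak the expression prompt word again.")
```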
Referring to the sub-method flow diagram shown in fig. 5, in one or more embodiments of the present application, the step S400 further includes the following sub-steps:
s401) extracting a second voice feature of the input line speech in real time;
s402) determining the voice type of the input line speech according to the extracted second voice feature;
s403) searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
The second voice feature of the line speech may be obtained by segmenting the input real-time speech into syllables or phonemes and determining the physical features of each speech segment (such as duration, pitch, and intensity). These physical features are then used to determine the voice type corresponding to each syllable or phoneme. Since the first expression sub-library is indexed by voice type, the model state number corresponding to the syllable or phoneme can be found quickly. By executing the above operations in a continuous loop, the input line speech can be converted automatically into a series of facial expression animations of the character, as illustrated in the sketch below.
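To make this loop concrete, here is a hedged sketch that maps already-extracted segment features (duration, pitch, intensity) to voice types by a simple nearest-reference rule and then looks up model state numbers in the ExpressionLibrary sketch above; the reference values and the classification rule are illustrative assumptions only:

```python
from typing import Dict, List, Tuple

# Physical features of one syllable/phoneme segment: (duration s, pitch Hz, intensity dB).
Segment = Tuple[float, float, float]

# Hypothetical reference features for each voice (mouth-shape) type.
VOICE_TYPE_REFERENCES: Dict[str, Segment] = {
    "a": (0.20, 220.0, 70.0),
    "u": (0.15, 180.0, 60.0),
}

def classify_voice_type(segment: Segment) -> str:
    """Assign the voice type whose reference features are nearest (simple nearest neighbor)."""
    def dist(ref: Segment) -> float:
        return sum((s - r) ** 2 for s, r in zip(segment, ref))
    return min(VOICE_TYPE_REFERENCES, key=lambda vt: dist(VOICE_TYPE_REFERENCES[vt]))

def generate_expression_frames(segments: List[Segment],
                               library: "ExpressionLibrary",
                               expression_type: str) -> List[int]:
    """Map each segment of the line speech to a model state number by looking up its
    voice type in the selected first expression sub-library's index table."""
    frames: List[int] = []
    for segment in segments:
        state_no = library.lookup(expression_type, classify_voice_type(segment))
        if state_no is not None:
            frames.append(state_no)
    return frames
```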
In one or more embodiments of the present application, the voice template of each expression prompt word is generated by training on pre-collected speech. The text of the training speech may be a syntactically complete sentence or one or more keywords. One user reads the text several times, or several users each read it, to form multiple input recordings of the text; a corresponding voice template is then trained for each expression prompt word from these recordings. After the voice templates have been trained, when a preset voice file or real-time voice containing an expression prompt word is received, it is matched against the voice recognition templates to determine which of the voice templates it corresponds to. The matching of the preset voice file or real-time voice with the voice templates can be realized with algorithms conventional in the art, which this application does not limit. One simple training scheme is sketched below.
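One assumed training scheme consistent with this description is to average fixed-length feature vectors from the collected recordings, and to fold newly collected recordings into the template for the dynamic-update variant mentioned elsewhere in the application. The function names and the averaging rule are illustrative, not mandated by the patent:

```python
from typing import Dict, List

def train_template(recordings: List[List[float]]) -> List[float]:
    """Average the feature vectors of several recordings of the same cue word
    (one user repeating it, or several users each reading it once)."""
    n = len(recordings)
    return [sum(rec[i] for rec in recordings) / n for i in range(len(recordings[0]))]

def train_all_templates(samples: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Train one voice template per expression prompt word from its collected recordings."""
    return {cue_word: train_template(recs) for cue_word, recs in samples.items()}

def update_template(template: List[float], new_recording: List[float],
                    weight: float = 0.1) -> List[float]:
    """Dynamic-update variant: fold a newly collected recording into the existing template."""
    return [(1 - weight) * t + weight * x for t, x in zip(template, new_recording)]
```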
Still further, referring to the schematic diagram of the correspondence between voice templates and expression types shown in fig. 6, in one or more embodiments of the present application the correspondence between a voice template and the expression type it identifies is user-defined. Therefore, after the character's model has been modified, the corresponding three-dimensional character expressions can be rebuilt immediately simply by adjusting this correspondence. The correspondence can furthermore be stored locally, which makes it easier to maintain the expression-state number index table kept in each first expression sub-library.
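Since the correspondence is user-defined and stored locally, it could, for example, be kept in a small JSON file; the file name, keys, and layout below are purely illustrative assumptions:

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical local file holding the user-defined template -> expression-type mapping.
MAPPING_FILE = Path("expression_mapping.json")

def save_mapping(mapping: Dict[str, str]) -> None:
    """Persist the correspondence locally so it can be edited after the model changes."""
    MAPPING_FILE.write_text(json.dumps(mapping, ensure_ascii=False, indent=2),
                            encoding="utf-8")

def load_mapping() -> Dict[str, str]:
    """Read the locally stored correspondence back into memory."""
    return json.loads(MAPPING_FILE.read_text(encoding="utf-8"))

# Example: after the character model is revised, only the correspondence is adjusted;
# no voice template needs to be retrained.
mapping = {"weixiao": "smile", "shengqi": "angry", "beishang": "sad"}
save_mapping(mapping)
mapping["weixiao"] = "smile_v2"  # point the cue word at the rebuilt expression sub-library
save_mapping(mapping)
```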
Referring to the module structure diagram shown in fig. 7, the application also discloses a sound-driven three-dimensional animation character real-time expression generation system, which is suitable for the production of two-dimensional electronic games and animation. The system comprises the following modules: an importing module, which is used for importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type; a receiving module, which is used for receiving a preset voice file or real-time voice containing expression prompt words and extracting the expression prompt words from the preset voice file or real-time voice; a matching module, which is used for matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words; and a generating module, which is used for acquiring line speech in real time and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified. The animation character expression library may consist of animation files of various expressions drawn for a specific animated character based on its original art. Specifically, in one or more embodiments of the present application, a model of the three-dimensional character may be created in three-dimensional software from the relevant character original art, and the related texture map files may be created in two-dimensional software; bones are then created and bound for the three-dimensional character, weight files are painted, and the related facial expression animation files are produced. These facial expression animation files classify the current states of the three-dimensional character's model according to expression type and voice type, where the expression type indicates the expression represented (e.g. "smile", "cry", "confusion") and the voice type indicates the mouth shape represented (e.g. mouth slightly open, lips pursed). To produce a coherent animated expression, particular model states can be arranged in a certain order and adjacent states can be made to transition naturally by interpolating the model parameters.
In one or more embodiments of the present application, in order to speed up the search for the specified expression animation, the import module may further include the following sub-modules: the classification module is used for classifying the facial expression animation of the animation character expression library into a plurality of first expression sub-libraries according to the expression type of the facial expression animation in the imported animation character expression library; and the index module is used for compiling indexes of the facial expression animations in the first expression sub-libraries based on the voice types of the roles in the facial expression animations in each first expression sub-library. Specifically, referring to the schematic structural diagram of the animation character expression library shown in fig. 3, the expression animation is divided into a plurality of first expression sub-libraries (such as the expression sub-databases of "smile", "angry", and "sad" shown in the figure) according to the expression category to which the expression animation belongs. Each first expression sub-library also maintains an expression-state number index table, and the index table records the model state numbers corresponding to the voice types, so that the corresponding model state numbers can be quickly found through the identified voice types. Furthermore, it will be understood by those skilled in the art that the number and shape of the display elements in the drawings are merely exemplary references and are not intended to limit the present invention.
In one or more embodiments of the present application, the receiving module may further include the following sub-modules: a checking module, which is used for checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received; and a determining module, which is used for determining the expression type indicated by the expression prompt words contained in the preset voice file or real-time voice according to the expression type identified by the voice template. The first voice feature and the second voice feature are feature parameters derived from the collected speech on the basis of a specific language, formed from that language's words, phonemes, tones, and the like. These feature parameters are matched against the feature parameters of the groups of voices labeled with specific expression meanings in the voice templates of the expression prompt words, so as to determine the expression prompt word contained in the received preset voice file or real-time voice. The voice features may be obtained by analyzing the received sound with algorithms conventional in the art, which this application does not limit. However, when the difference between the received preset voice file or real-time voice and the voice template of every expression prompt word is greater than a preset threshold, the input is considered to contain too much environmental noise, which would seriously affect the judgment of the expression type. Therefore, to ensure accuracy, the preset voice file or real-time voice must be input again.
In one or more embodiments of the present application, the generating module may further include the following sub-modules: an extraction module, which is used for extracting a second voice feature of the input line speech in real time; a judging module, which is used for determining the voice type of the input line speech according to the extracted second voice feature; and an execution module, which is used for searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character. The second voice feature of the line speech may be obtained by segmenting the input real-time speech into syllables or phonemes and determining the physical features of each speech segment (such as duration, pitch, and intensity). These physical features are then used to determine the voice type corresponding to each syllable or phoneme. Since the first expression sub-library is indexed by voice type, the model state number corresponding to the syllable or phoneme can be found quickly. By executing the above operations in a continuous loop, the input line speech can be converted automatically into a series of facial expression animations of the character.
In one or more embodiments of the present application, the voice template of each expression prompt word is generated by training on pre-collected speech. The text of the training speech may be a syntactically complete sentence or one or more keywords. One user reads the text several times, or several users each read it, to form multiple input recordings of the text; a corresponding voice template is then trained for each expression prompt word from these recordings. After the voice templates have been trained, when a preset voice file or real-time voice containing an expression prompt word is received, it is matched against the voice recognition templates to determine which of the voice templates it corresponds to. The matching of the preset voice file or real-time voice with the voice templates can be realized with algorithms conventional in the art, which this application does not limit.
Still further, referring to the schematic diagram of the correspondence between voice templates and expression types shown in fig. 6, in one or more embodiments of the present application the correspondence between a voice template and the expression type it identifies is user-defined. Therefore, after the character's model has been modified, the corresponding three-dimensional character expressions can be rebuilt immediately simply by adjusting this correspondence. The correspondence can furthermore be stored locally, which makes it easier to maintain the expression-state number index table kept in each first expression sub-library.
It should be recognized that the embodiments of the present application can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory computer readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the method and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the application may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it is readable by a programmable computer, which when read by the storage medium or device can be used to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The applications described herein include these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the above steps in conjunction with a microprocessor or other data processor. The present application also includes the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the present application, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
Other variations are within the spirit of the present application. Accordingly, while the disclosed technology is susceptible to various modifications and alternative constructions, certain embodiments thereof have been shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the application to the specific form or forms disclosed; on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the application, as defined in the appended claims.

Claims (10)

1. A sound-driven three-dimensional animation role real-time expression generation method is suitable for two-dimensional electronic games and animation production, and is characterized by comprising the following steps:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
2. The method according to claim 1, wherein said step S100 further comprises the sub-steps of:
s101) classifying facial expression animations of the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the imported animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
3. The method according to claim 2, wherein the step S200 further comprises the sub-steps of:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
4. The method according to claim 2, wherein the step S400 further comprises the sub-steps of:
s401) extracting a second voice feature of the input line speech in real time;
s402) determining the voice type of the input line speech according to the extracted second voice feature;
s403) searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
5. The method of any one of claims 2 to 4, wherein the speech template of the expressive cue word is generated by training a pre-collected speech.
6. The method of claim 5, wherein the speech template of the expressive cue is trained by dynamically updating the collected speech.
7. The method of claim 5, wherein the correspondence between a voice template and the expression type it identifies is user-defined.
8. The method of claim 7, wherein the correspondence is stored locally.
9. A sound-driven three-dimensional animation role real-time expression generation system is suitable for two-dimensional electronic games and animation production, and is characterized by comprising the following modules:
an importing module, which is used for importing an animation character expression library, wherein the animation character expression library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
a receiving module, which is used for receiving a preset voice file or real-time voice containing expression prompt words and extracting the expression prompt words from the preset voice file or real-time voice;
a matching module, which is used for matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
and a generating module, which is used for acquiring line speech in real time and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
10. A computer-readable storage medium having stored thereon computer instructions, characterized in that the instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 8.
CN201811553860.2A 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system Active CN109801349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811553860.2A CN109801349B (en) 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811553860.2A CN109801349B (en) 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system

Publications (2)

Publication Number Publication Date
CN109801349A CN109801349A (en) 2019-05-24
CN109801349B true CN109801349B (en) 2023-01-24

Family

ID=66557150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811553860.2A Active CN109801349B (en) 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system

Country Status (1)

Country Link
CN (1) CN109801349B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413841A (en) * 2019-06-13 2019-11-05 深圳追一科技有限公司 Polymorphic exchange method, device, system, electronic equipment and storage medium
CN110677598B (en) * 2019-09-18 2022-04-12 北京市商汤科技开发有限公司 Video generation method and device, electronic equipment and computer storage medium
CN111145322B (en) * 2019-12-26 2024-01-19 上海浦东发展银行股份有限公司 Method, apparatus, and computer-readable storage medium for driving avatar
CN113240781A (en) * 2021-05-20 2021-08-10 东营友帮建安有限公司 Movie animation production method and system based on voice driving and image recognition
CN114928755B (en) * 2022-05-10 2023-10-20 咪咕文化科技有限公司 Video production method, electronic equipment and computer readable storage medium
CN115617169B (en) * 2022-10-11 2023-05-30 深圳琪乐科技有限公司 Voice control robot and robot control method based on role relation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3674875B2 (en) * 1994-10-24 2005-07-27 株式会社イメージリンク Animation system
CN106875955A (en) * 2015-12-10 2017-06-20 掌赢信息科技(上海)有限公司 The preparation method and electronic equipment of a kind of sound animation
CN108763190B (en) * 2018-04-12 2019-04-02 平安科技(深圳)有限公司 Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN109801349A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN109801349B (en) Sound-driven three-dimensional animation character real-time expression generation method and system
CN110688911B (en) Video processing method, device, system, terminal equipment and storage medium
US11144597B2 (en) Computer generated emulation of a subject
CN108763190B (en) Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing
CN106653052A (en) Virtual human face animation generation method and device
CN112162628A (en) Multi-mode interaction method, device and system based on virtual role, storage medium and terminal
CN112650831A (en) Virtual image generation method and device, storage medium and electronic equipment
CN109461437B (en) Verification content generation method and related device for lip language identification
CN111653265B (en) Speech synthesis method, device, storage medium and electronic equipment
CN112329451B (en) Sign language action video generation method, device, equipment and storage medium
CN113538636B (en) Virtual object control method and device, electronic equipment and medium
Wang et al. Comic-guided speech synthesis
CN112750187A (en) Animation generation method, device and equipment and computer readable storage medium
CN116704085A (en) Avatar generation method, apparatus, electronic device, and storage medium
CN115511704A (en) Virtual customer service generation method and device, electronic equipment and storage medium
CN115497448A (en) Method and device for synthesizing voice animation, electronic equipment and storage medium
JP6222465B2 (en) Animation generating apparatus, animation generating method and program
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
JP6082657B2 (en) Pose assignment model selection device, pose assignment device, method and program thereof
CN112331184B (en) Voice mouth shape synchronization method and device, electronic equipment and storage medium
CN112233648A (en) Data processing method, device, equipment and storage medium combining RPA and AI
CN112733546A (en) Expression symbol generation method and device, electronic equipment and storage medium
CN116580721B (en) Expression animation generation method and device and digital human platform
CN114786059B (en) Video generation method, video generation device, electronic device, and storage medium
CN116721451A (en) Emotion cognition analysis method and device based on multi-modal fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant