CN109801349B - Sound-driven three-dimensional animation character real-time expression generation method and system - Google Patents


Info

Publication number
CN109801349B
Authority
CN
China
Prior art keywords
expression
voice
speech
animation
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811553860.2A
Other languages
Chinese (zh)
Other versions
CN109801349A (en)
Inventor
周湘君
杜庆焜
芦振华
甘靖山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Xishan Yichuang Culture Co ltd
Original Assignee
Wuhan Xishan Yichuang Culture Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Xishan Yichuang Culture Co ltd filed Critical Wuhan Xishan Yichuang Culture Co ltd
Priority to CN201811553860.2A priority Critical patent/CN109801349B/en
Publication of CN109801349A publication Critical patent/CN109801349A/en
Application granted granted Critical
Publication of CN109801349B publication Critical patent/CN109801349B/en

Abstract

The application provides a sound-driven method for generating real-time expressions of a three-dimensional animated character, suitable for the production of two-dimensional electronic games and animation, comprising the following steps: importing an animation character expression library; receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice; matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words; and acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified. The application also provides a corresponding sound-driven real-time expression generation system for three-dimensional animated characters. The beneficial effect of the application is that the expression type of the facial expression to be produced for the three-dimensional animated character is specified by voice, and the corresponding facial expression animation is matched directly from the input line speech, which improves both the efficiency and the accuracy of facial animation production.

Description

Sound-driven three-dimensional animation character real-time expression generation method and system
Technical Field
The invention relates to the field of computer graphics, and in particular to a sound-driven method and system for generating real-time expressions of three-dimensional animated characters.
Background
In the development of an electronic game or animation, the facial expression of an animated character often needs to be shown in close-up to match the lines in the script. In such shots, not only are the changes in the character's facial expression amplified to convey the character's state of mind to the audience, but the character's mouth shape must also match the spoken lines to avoid audio-visual mismatch. The character's facial expression and mouth-shape changes together form a series of subtle and complex combinations, and the sophistication and accuracy of these combinations determine the expressiveness of the character and the overall quality of the electronic game or animation.
However, character animation is still largely drawn by hand. Specifically, an artist first draws key frames of the specified facial expression from the character's original art; the character's mouth-shape changes are then fine-tuned according to the lines in the script, and facial expression transition frames containing those mouth-shape changes are inserted by hand. A software developer or outsourcing design company therefore spends considerable labor and time completing this work, and further labor and time checking whether the lines match the mouth-shape changes and whether the sound and picture are out of sync. Yet the mouth shape corresponding to a given sound is fixed, and mouth-shape changes are simpler than the combined changes of the other facial muscles, so hand-drawing the transition frames in fact involves a large amount of repetitive, mechanical work.
Disclosure of Invention
To overcome the deficiencies of the prior art, the application provides a sound-driven method and system for generating real-time expressions of three-dimensional animated characters, which can automatically generate the expression animation of a three-dimensional character in real time from the input line speech.
In order to achieve the above object, the present application adopts the following technical solutions.
Firstly, the application provides a sound-driven three-dimensional animation character real-time expression generation method. The method is suitable for the production of two-dimensional electronic games and animation, and comprises the following steps:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
Further, in the above method of the present application, the step S100 may further include the following sub-steps:
s101) classifying facial expression animations in the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the introduced animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
Further, in the above method of the present application, the step S200 may further include the following sub-steps:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
Still further, in the above method of the present application, the step S400 may further include the following sub-steps:
s401) extracting a second voice characteristic of the input speech in real time;
s402) determining the voice type of the input speech of the speech-lines according to the extracted second voice characteristics;
s403) searching for a corresponding facial expression animation based on the voice type of the inputted speech of the speech line and the index in the first expression sub-library to generate a series of facial expression animations of the character.
Still further, in the above method of the present application, the speech template of the expressive cue word is generated by training a speech collected in advance.
Alternatively, in the above method of the present application, the voice template of the expressive cue word is trained by dynamically updating the collected voices.
Alternatively, in the above method of the present application, the correspondence between the voice template and the expression type identified by the voice template is customized.
Further, in the above method of the present application, the correspondence relationship is stored locally.
Secondly, the application also provides a sound-driven three-dimensional animation character real-time expression generation system. The system is suitable for the production of two-dimensional electronic games and animation, and comprises the following modules: an importing module, which is used for importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type; a receiving module, which is used for receiving a preset voice file or real-time voice containing expression prompt words and extracting the expression prompt words from the preset voice file or real-time voice; a matching module, which is used for matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words; and a generating module, which is used for acquiring line speech in real time and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
Further, in the above system of the present application, the import module may further include the following sub-modules: the classification module is used for classifying the facial expression animation of the animation character expression library into a plurality of first expression sub-libraries according to the expression type of the facial expression animation in the imported animation character expression library; and the index module is used for compiling indexes of the facial expression animations in the first expression sub-libraries based on the voice types of the roles in the facial expression animations in each first expression sub-library.
Further, in the above system of the present application, the receiving module may further include the following sub-modules: a checking module, which is used for checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received; and a determining module, which is used for determining the expression type indicated by the expression prompt words contained in the preset voice file or real-time voice according to the expression type identified by the voice template.
Still further, in the above system of the present application, the generating module may further include the following sub-modules: an extraction module, which is used for extracting a second voice feature of the input line speech in real time; a judging module, which is used for determining the voice type of the input line speech according to the extracted second voice feature; and an execution module, which is used for searching for the corresponding facial expression animations according to the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
Still further, in the above system of the present application, the speech template of the expressive cue word is generated by training a speech collected in advance.
Alternatively, in the above system of the present application, the voice template of the expressive cue word is trained by dynamically updating the collected voices.
Alternatively, in the above system of the present application, the correspondence between the voice template and the expression type identified by the voice template is customized.
Further, in the above system of the present application, the correspondence relationship is stored locally.
Finally, the present application also provides a computer-readable storage medium storing computer instructions suitable for the production of two-dimensional electronic games and animation. When the instructions are executed by a processor, the following steps are performed:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
Further, when the processor executes the above instructions, the step S100 may further include the following sub-steps:
s101) classifying facial expression animations in the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the introduced animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
Further, when the processor executes the above instructions, the step S200 may further include the following sub-steps:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
Still further, when the processor executes the above instructions, the step S400 may further include the following sub-steps:
s401) extracting a second voice feature of the input line speech in real time;
s402) determining the voice type of the input line speech according to the extracted second voice feature;
s403) searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
Still further, when the processor executes the instructions, the speech template of the expressive cue word is generated by training pre-collected speech.
Alternatively, the voice template of the expressive cue words is trained by dynamically updating the collected voices when the processor executes the instructions.
Alternatively, when the processor executes the above instructions, the correspondence between the voice template and the expression type it identifies is user-defined.
Further, when the processor executes the above instructions, the correspondence relationship is stored locally.
The beneficial effect of the application is that the expression type of the facial expression to be produced for the three-dimensional animated character is specified by voice, and the corresponding facial expression animation is matched directly from the input line speech, which improves both the efficiency and the accuracy of facial expression animation production.
Drawings
FIG. 1 is a flow chart of a method for generating real-time expressions of a three-dimensional animated character driven by sound disclosed in the present application;
FIG. 2 is a flowchart illustrating the sub-steps of importing the animation character expression library in one embodiment of the present application;
FIG. 3 is a diagram illustrating the structure of the animation character expression library in one embodiment of the present application;
FIG. 4 is a flowchart illustrating the sub-steps of extracting expression prompt words from a preset voice file or real-time voice in one embodiment of the present application;
FIG. 5 is a flowchart illustrating the sub-steps of matching facial expression animations from the animation character expression library in one embodiment of the present application;
FIG. 6 is a diagram illustrating a customized correspondence between voice templates and expression types in one embodiment of the present application;
fig. 7 is a block diagram illustrating the structure of the system for generating real-time expressions of three-dimensional animated characters driven by sound disclosed in the present application.
Detailed Description
The conception, specific structure and technical effects of the present application will be described clearly and completely with reference to the following embodiments and the accompanying drawings, so that the purpose, scheme and effects of the present application can be fully understood. It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly fixed or connected to the other feature or indirectly fixed or connected to the other feature. Furthermore, the descriptions of up, down, left, right, etc. used in this application are only relative to the relative positional relationship of the various components of the application in the drawings. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any combination of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one type of element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon".
Referring to the method flowchart shown in fig. 1, the application discloses a sound-driven three-dimensional animation character real-time expression generation method, which is suitable for the production of two-dimensional electronic games and animation. The method comprises the following steps:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
The animation character expression library may consist of animation files of various expressions drawn for a specific animated character based on its original art. Specifically, in one or more embodiments of the present application, a model of the three-dimensional character may be created in three-dimensional software from the relevant character original art, and the related texture map files may be created in two-dimensional software; bones are then created and bound for the three-dimensional character, weight files are painted, and the related facial expression animation files are produced. These facial expression animation files classify the current states of the three-dimensional character's model according to expression type and voice type, where the expression type indicates the expression represented (e.g. "smile", "cry", "confusion") and the voice type indicates the mouth shape represented (e.g. mouth slightly open, lips pursed). To produce a coherent animated expression, particular model states can be arranged in a certain order and adjacent states can be made to transition naturally by interpolating the model parameters, as sketched below.
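As an illustration of the interpolation just described, the following Python sketch shows one simple way to blend adjacent model states. The names (ModelState, blend_states, make_sequence) and the use of plain linear interpolation are illustrative assumptions, not the patent's prescribed implementation:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ModelState:
    """A snapshot of the three-dimensional character's facial rig parameters."""
    state_no: int             # model state number referenced by the index table
    params: Dict[str, float]  # e.g. blend-shape weights or bone rotations

def blend_states(a: ModelState, b: ModelState, steps: int) -> List[Dict[str, float]]:
    """Linearly interpolate from state a to state b, producing `steps` transition frames."""
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # interpolation factor strictly between 0 and 1
        frames.append({k: (1 - t) * v + t * b.params.get(k, v) for k, v in a.params.items()})
    return frames

def make_sequence(states: List[ModelState], steps_between: int = 3) -> List[Dict[str, float]]:
    """Arrange the selected states in order and fill each gap with interpolated frames."""
    sequence: List[Dict[str, float]] = []
    if not states:
        return sequence
    for cur, nxt in zip(states, states[1:]):
        sequence.append(cur.params)
        sequence.extend(blend_states(cur, nxt, steps_between))
    sequence.append(states[-1].params)
    return sequence
```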
Referring to the flowchart of the sub-method shown in fig. 2, in one or more embodiments of the present application, in order to speed up the search for the specified emoji animation, the step S100 further includes the following sub-steps:
s101) classifying facial expression animations in the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the introduced animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
Specifically, referring to the schematic structural diagram of the animation character expression library shown in fig. 3, the expression animations are divided into a plurality of first expression sub-libraries according to the expression category to which they belong (such as the "smile", "angry", and "sad" expression sub-libraries shown in the figure). Each first expression sub-library also maintains an expression-state number index table, which records the model state number corresponding to each voice type, so that the corresponding model state number can be found quickly from the recognized voice type; a minimal data-structure sketch follows. Further, those skilled in the art will appreciate that the number and shape of the display elements in the drawings are merely exemplary references and are not intended to limit the present invention.
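The following sketch illustrates this two-level organization; the class and method names (ExpressionLibrary, add, lookup) are hypothetical and serve only to show the sub-library plus voice-type index table described above:

```python
from typing import Dict, Optional

class ExpressionLibrary:
    """First expression sub-libraries keyed by expression type, each holding an
    expression-state number index table mapping voice type -> model state number."""

    def __init__(self) -> None:
        self.sub_libraries: Dict[str, Dict[str, int]] = {}

    def add(self, expression_type: str, voice_type: str, state_no: int) -> None:
        """Register a facial expression animation's model state number under its types."""
        self.sub_libraries.setdefault(expression_type, {})[voice_type] = state_no

    def lookup(self, expression_type: str, voice_type: str) -> Optional[int]:
        """Find the model state number for a recognized voice type in one sub-library."""
        return self.sub_libraries.get(expression_type, {}).get(voice_type)

# Example: "smile" and "angry" sub-libraries indexed by mouth-shape (voice) types.
library = ExpressionLibrary()
library.add("smile", "a", 101)  # mouth open
library.add("smile", "u", 102)  # lips pursed
library.add("angry", "a", 201)
assert library.lookup("smile", "u") == 102
```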
Referring to the sub-method flow diagram shown in fig. 4, in one or more embodiments of the present application, the step S200 further includes the following sub-steps:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
The first voice feature and the second voice feature referred to in this application are feature parameters derived from the collected speech on the basis of a specific language, formed from that language's words, phonemes, tones, and the like. These feature parameters are matched against the feature parameters of the groups of voices labeled with specific expression meanings in the voice templates of the expression prompt words, so as to determine the expression prompt word contained in the received preset voice file or real-time voice. The voice features may be obtained by analyzing the received sound with algorithms conventional in the art, which this application does not limit. However, when the difference between the received preset voice file or real-time voice and the voice template of every expression prompt word is greater than a preset threshold, the input is considered to contain too much environmental noise, which would seriously affect the judgment of the expression type. Therefore, to ensure accuracy, the preset voice file or real-time voice must be input again.
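The sketch below shows one plausible form of this threshold-based matching step. The Euclidean distance over fixed-length feature vectors and the helper names (feature_distance, match_cue_word, and the hypothetical extract_features/record in the usage comment) are assumptions; the patent leaves the actual feature extraction and matching algorithm to conventional techniques:

```python
import math
from typing import Dict, List, Optional

def feature_distance(features: List[float], template: List[float]) -> float:
    """Euclidean distance between a feature vector and a template (assumed equal length)."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(features, template)))

def match_cue_word(first_voice_feature: List[float],
                   templates: Dict[str, List[float]],  # expression type -> template features
                   threshold: float) -> Optional[str]:
    """Return the expression type of the closest template, or None when every template
    differs by more than the preset threshold (treated as excessive environmental noise,
    so the caller should ask for the voice to be input again)."""
    best_type, best_dist = None, float("inf")
    for expression_type, template in templates.items():
        d = feature_distance(first_voice_feature, template)
        if d < best_dist:
            best_type, best_dist = expression_type, d
    return best_type if best_dist <= threshold else None

# Usage idea (hypothetical helpers): keep prompting until a template matches.
# while (expr_type := match_cue_word(extract_features(record()), templates, 0.8)) is None:
#     print("Input unclear, please speak the expression prompt word again.")
```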
Referring to the sub-method flow diagram shown in fig. 5, in one or more embodiments of the present application, the step S400 further includes the following sub-steps:
s401) extracting a second voice feature of the input line speech in real time;
s402) determining the voice type of the input line speech according to the extracted second voice feature;
s403) searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
The second voice feature of the line speech may be obtained by segmenting the input real-time speech into syllables or phonemes and determining the physical features of each speech segment (such as duration, pitch, and intensity). These physical features are then used to determine the voice type corresponding to each syllable or phoneme. Since the first expression sub-library is indexed by voice type, the model state number corresponding to the syllable or phoneme can be found quickly. By executing the above operations in a continuous loop, the input line speech can be converted automatically into a series of facial expression animations of the character, as illustrated in the sketch below.
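To make this loop concrete, here is a hedged sketch that maps already-extracted segment features (duration, pitch, intensity) to voice types by a simple nearest-reference rule and then looks up model state numbers in the ExpressionLibrary sketch above; the reference values and the classification rule are illustrative assumptions only:

```python
from typing import Dict, List, Tuple

# Physical features of one syllable/phoneme segment: (duration s, pitch Hz, intensity dB).
Segment = Tuple[float, float, float]

# Hypothetical reference features for each voice (mouth-shape) type.
VOICE_TYPE_REFERENCES: Dict[str, Segment] = {
    "a": (0.20, 220.0, 70.0),
    "u": (0.15, 180.0, 60.0),
}

def classify_voice_type(segment: Segment) -> str:
    """Assign the voice type whose reference features are nearest (simple nearest neighbor)."""
    def dist(ref: Segment) -> float:
        return sum((s - r) ** 2 for s, r in zip(segment, ref))
    return min(VOICE_TYPE_REFERENCES, key=lambda vt: dist(VOICE_TYPE_REFERENCES[vt]))

def generate_expression_frames(segments: List[Segment],
                               library: "ExpressionLibrary",
                               expression_type: str) -> List[int]:
    """Map each segment of the line speech to a model state number by looking up its
    voice type in the selected first expression sub-library's index table."""
    frames: List[int] = []
    for segment in segments:
        state_no = library.lookup(expression_type, classify_voice_type(segment))
        if state_no is not None:
            frames.append(state_no)
    return frames
```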
In one or more embodiments of the present application, the voice template of each expression prompt word is generated by training on pre-collected speech. The text of the training speech may be a syntactically complete sentence or one or more keywords. One user reads the text several times, or several users each read it, to form multiple input recordings of the text; a corresponding voice template is then trained for each expression prompt word from these recordings. After the voice templates have been trained, when a preset voice file or real-time voice containing an expression prompt word is received, it is matched against the voice recognition templates to determine which of the voice templates it corresponds to. The matching of the preset voice file or real-time voice with the voice templates can be realized with algorithms conventional in the art, which this application does not limit. One simple training scheme is sketched below.
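One assumed training scheme consistent with this description is to average fixed-length feature vectors from the collected recordings, and to fold newly collected recordings into the template for the dynamic-update variant mentioned elsewhere in the application. The function names and the averaging rule are illustrative, not mandated by the patent:

```python
from typing import Dict, List

def train_template(recordings: List[List[float]]) -> List[float]:
    """Average the feature vectors of several recordings of the same cue word
    (one user repeating it, or several users each reading it once)."""
    n = len(recordings)
    return [sum(rec[i] for rec in recordings) / n for i in range(len(recordings[0]))]

def train_all_templates(samples: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Train one voice template per expression prompt word from its collected recordings."""
    return {cue_word: train_template(recs) for cue_word, recs in samples.items()}

def update_template(template: List[float], new_recording: List[float],
                    weight: float = 0.1) -> List[float]:
    """Dynamic-update variant: fold a newly collected recording into the existing template."""
    return [(1 - weight) * t + weight * x for t, x in zip(template, new_recording)]
```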
Still further, referring to the schematic diagram of the correspondence between voice templates and expression types shown in fig. 6, in one or more embodiments of the present application the correspondence between a voice template and the expression type it identifies is user-defined. Therefore, after the character's model has been modified, the corresponding three-dimensional character expressions can be rebuilt immediately simply by adjusting this correspondence. The correspondence can furthermore be stored locally, which makes it easier to maintain the expression-state number index table kept in each first expression sub-library.
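Since the correspondence is user-defined and stored locally, it could, for example, be kept in a small JSON file; the file name, keys, and layout below are purely illustrative assumptions:

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical local file holding the user-defined template -> expression-type mapping.
MAPPING_FILE = Path("expression_mapping.json")

def save_mapping(mapping: Dict[str, str]) -> None:
    """Persist the correspondence locally so it can be edited after the model changes."""
    MAPPING_FILE.write_text(json.dumps(mapping, ensure_ascii=False, indent=2),
                            encoding="utf-8")

def load_mapping() -> Dict[str, str]:
    """Read the locally stored correspondence back into memory."""
    return json.loads(MAPPING_FILE.read_text(encoding="utf-8"))

# Example: after the character model is revised, only the correspondence is adjusted;
# no voice template needs to be retrained.
mapping = {"weixiao": "smile", "shengqi": "angry", "beishang": "sad"}
save_mapping(mapping)
mapping["weixiao"] = "smile_v2"  # point the cue word at the rebuilt expression sub-library
save_mapping(mapping)
```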
Referring to the module structure diagram shown in fig. 7, the application also discloses a sound-driven three-dimensional animation character real-time expression generation system, which is suitable for the production of two-dimensional electronic games and animation. The system comprises the following modules: an importing module, which is used for importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type; a receiving module, which is used for receiving a preset voice file or real-time voice containing expression prompt words and extracting the expression prompt words from the preset voice file or real-time voice; a matching module, which is used for matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words; and a generating module, which is used for acquiring line speech in real time and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified. The animation character expression library may consist of animation files of various expressions drawn for a specific animated character based on its original art. Specifically, in one or more embodiments of the present application, a model of the three-dimensional character may be created in three-dimensional software from the relevant character original art, and the related texture map files may be created in two-dimensional software; bones are then created and bound for the three-dimensional character, weight files are painted, and the related facial expression animation files are produced. These facial expression animation files classify the current states of the three-dimensional character's model according to expression type and voice type, where the expression type indicates the expression represented (e.g. "smile", "cry", "confusion") and the voice type indicates the mouth shape represented (e.g. mouth slightly open, lips pursed). To produce a coherent animated expression, particular model states can be arranged in a certain order and adjacent states can be made to transition naturally by interpolating the model parameters.
In one or more embodiments of the present application, in order to speed up the search for the specified expression animation, the import module may further include the following sub-modules: the classification module is used for classifying the facial expression animation of the animation character expression library into a plurality of first expression sub-libraries according to the expression type of the facial expression animation in the imported animation character expression library; and the index module is used for compiling indexes of the facial expression animations in the first expression sub-libraries based on the voice types of the roles in the facial expression animations in each first expression sub-library. Specifically, referring to the schematic structural diagram of the animation character expression library shown in fig. 3, the expression animation is divided into a plurality of first expression sub-libraries (such as the expression sub-databases of "smile", "angry", and "sad" shown in the figure) according to the expression category to which the expression animation belongs. Each first expression sub-library also maintains an expression-state number index table, and the index table records the model state numbers corresponding to the voice types, so that the corresponding model state numbers can be quickly found through the identified voice types. Furthermore, it will be understood by those skilled in the art that the number and shape of the display elements in the drawings are merely exemplary references and are not intended to limit the present invention.
In one or more embodiments of the present application, the receiving module may further include the following sub-modules: a checking module, which is used for checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received; and a determining module, which is used for determining the expression type indicated by the expression prompt words contained in the preset voice file or real-time voice according to the expression type identified by the voice template. The first voice feature and the second voice feature are feature parameters derived from the collected speech on the basis of a specific language, formed from that language's words, phonemes, tones, and the like. These feature parameters are matched against the feature parameters of the groups of voices labeled with specific expression meanings in the voice templates of the expression prompt words, so as to determine the expression prompt word contained in the received preset voice file or real-time voice. The voice features may be obtained by analyzing the received sound with algorithms conventional in the art, which this application does not limit. However, when the difference between the received preset voice file or real-time voice and the voice template of every expression prompt word is greater than a preset threshold, the input is considered to contain too much environmental noise, which would seriously affect the judgment of the expression type. Therefore, to ensure accuracy, the preset voice file or real-time voice must be input again.
In one or more embodiments of the present application, the generating module may further include the following sub-modules: an extraction module, which is used for extracting a second voice feature of the input line speech in real time; a judging module, which is used for determining the voice type of the input line speech according to the extracted second voice feature; and an execution module, which is used for searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character. The second voice feature of the line speech may be obtained by segmenting the input real-time speech into syllables or phonemes and determining the physical features of each speech segment (such as duration, pitch, and intensity). These physical features are then used to determine the voice type corresponding to each syllable or phoneme. Since the first expression sub-library is indexed by voice type, the model state number corresponding to the syllable or phoneme can be found quickly. By executing the above operations in a continuous loop, the input line speech can be converted automatically into a series of facial expression animations of the character.
In one or more embodiments of the present application, the voice template of each expression prompt word is generated by training on pre-collected speech. The text of the training speech may be a syntactically complete sentence or one or more keywords. One user reads the text several times, or several users each read it, to form multiple input recordings of the text; a corresponding voice template is then trained for each expression prompt word from these recordings. After the voice templates have been trained, when a preset voice file or real-time voice containing an expression prompt word is received, it is matched against the voice recognition templates to determine which of the voice templates it corresponds to. The matching of the preset voice file or real-time voice with the voice templates can be realized with algorithms conventional in the art, which this application does not limit.
Still further, referring to the schematic diagram of the correspondence between voice templates and expression types shown in fig. 6, in one or more embodiments of the present application the correspondence between a voice template and the expression type it identifies is user-defined. Therefore, after the character's model has been modified, the corresponding three-dimensional character expressions can be rebuilt immediately simply by adjusting this correspondence. The correspondence can furthermore be stored locally, which makes it easier to maintain the expression-state number index table kept in each first expression sub-library.
It should be recognized that the embodiments of the present application can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory computer readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the method and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the application may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it is readable by a programmable computer, which when read by the storage medium or device can be used to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The applications described herein include these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the above steps in conjunction with a microprocessor or other data processor. The present application also includes the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the present application, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
Other variations are within the spirit of the present application. Accordingly, while the disclosed technology is susceptible to various modifications and alternative constructions, certain embodiments thereof have been shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the application to the specific form or forms disclosed; on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the application, as defined in the appended claims.

Claims (10)

1. A sound-driven three-dimensional animation role real-time expression generation method is suitable for two-dimensional electronic games and animation production, and is characterized by comprising the following steps:
s100) importing an animation character expression library, wherein the library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
s200) receiving a preset voice file or real-time voice containing expression prompt words, and extracting the expression prompt words from the preset voice file or real-time voice;
s300) matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
s400) acquiring line speech in real time, and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
2. The method according to claim 1, wherein said step S100 further comprises the sub-steps of:
s101) classifying facial expression animations of the animation character expression library into a plurality of first expression sub-libraries according to the expression types of the facial expression animations in the imported animation character expression library;
s102) in each first expression sub-library, based on the voice type of the character in the facial expression animation, compiling indexes of the facial expression animations in the first expression sub-library.
3. The method according to claim 2, wherein the step S200 further comprises the sub-steps of:
s201) checking whether the first voice feature extracted from the preset voice file or real-time voice matches a voice template of an expression prompt word, and if not, prompting for the preset voice file or real-time voice to be input again until a matching input is received;
s202) determining the expression type indicated by the expression prompt words contained in the preset voice file or the real-time voice according to the expression type identified by the voice template.
4. The method according to claim 2, wherein the step S400 further comprises the sub-steps of:
s401) extracting a second voice feature of the input line speech in real time;
s402) determining the voice type of the input line speech according to the extracted second voice feature;
s403) searching for the corresponding facial expression animations based on the voice type of the input line speech and the index in the first expression sub-library, so as to generate a series of facial expression animations of the character.
5. The method of any one of claims 2 to 4, wherein the speech template of the expressive cue word is generated by training a pre-collected speech.
6. The method of claim 5, wherein the speech template of the expressive cue is trained by dynamically updating the collected speech.
7. The method of claim 5, wherein the correspondence between a voice template and the expression type it identifies is user-defined.
8. The method of claim 7, wherein the correspondence is stored locally.
9. A sound-driven three-dimensional animation role real-time expression generation system is suitable for two-dimensional electronic games and animation production, and is characterized by comprising the following modules:
an importing module, which is used for importing an animation character expression library, wherein the animation character expression library comprises a plurality of facial expression animations of the animation character classified according to expression type and voice type;
a receiving module, which is used for receiving a preset voice file or real-time voice containing expression prompt words and extracting the expression prompt words from the preset voice file or real-time voice;
a matching module, which is used for matching a plurality of corresponding facial expression animations in the animation character expression library according to the extracted expression prompt words;
and a generating module, which is used for acquiring line speech in real time and generating the facial expression animations corresponding to the line speech according to the voice types by which the facial expression animations are identified.
10. A computer-readable storage medium having stored thereon computer instructions, characterized in that the instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 8.
CN201811553860.2A 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system Active CN109801349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811553860.2A CN109801349B (en) 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811553860.2A CN109801349B (en) 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system

Publications (2)

Publication Number Publication Date
CN109801349A CN109801349A (en) 2019-05-24
CN109801349B true CN109801349B (en) 2023-01-24

Family

ID=66557150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811553860.2A Active CN109801349B (en) 2018-12-19 2018-12-19 Sound-driven three-dimensional animation character real-time expression generation method and system

Country Status (1)

Country Link
CN (1) CN109801349B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413841A (en) * 2019-06-13 2019-11-05 深圳追一科技有限公司 Polymorphic exchange method, device, system, electronic equipment and storage medium
CN110677598B (en) * 2019-09-18 2022-04-12 北京市商汤科技开发有限公司 Video generation method and device, electronic equipment and computer storage medium
CN111145322B (en) * 2019-12-26 2024-01-19 上海浦东发展银行股份有限公司 Method, apparatus, and computer-readable storage medium for driving avatar
CN113240781A (en) * 2021-05-20 2021-08-10 东营友帮建安有限公司 Movie animation production method and system based on voice driving and image recognition
CN114928755B (en) * 2022-05-10 2023-10-20 咪咕文化科技有限公司 Video production method, electronic equipment and computer readable storage medium
CN115617169B (en) * 2022-10-11 2023-05-30 深圳琪乐科技有限公司 Voice control robot and robot control method based on role relation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3674875B2 (en) * 1994-10-24 2005-07-27 株式会社イメージリンク Animation system
CN106875955A (en) * 2015-12-10 2017-06-20 掌赢信息科技(上海)有限公司 The preparation method and electronic equipment of a kind of sound animation
CN108763190B (en) * 2018-04-12 2019-04-02 平安科技(深圳)有限公司 Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN109801349A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN109801349B (en) Sound-driven three-dimensional animation character real-time expression generation method and system
CN110688911B (en) Video processing method, device, system, terminal equipment and storage medium
US11144597B2 (en) Computer generated emulation of a subject
CN108763190B (en) Voice-based mouth shape cartoon synthesizer, method and readable storage medium storing program for executing
CN106653052A (en) Virtual human face animation generation method and device
CN112162628A (en) Multi-mode interaction method, device and system based on virtual role, storage medium and terminal
CN112650831A (en) Virtual image generation method and device, storage medium and electronic equipment
CN109461437B (en) Verification content generation method and related device for lip language identification
CN111653265B (en) Speech synthesis method, device, storage medium and electronic equipment
CN112329451B (en) Sign language action video generation method, device, equipment and storage medium
CN113538636B (en) Virtual object control method and device, electronic equipment and medium
Wang et al. Comic-guided speech synthesis
CN112750187A (en) Animation generation method, device and equipment and computer readable storage medium
CN116704085A (en) Avatar generation method, apparatus, electronic device, and storage medium
CN115511704A (en) Virtual customer service generation method and device, electronic equipment and storage medium
CN115497448A (en) Method and device for synthesizing voice animation, electronic equipment and storage medium
JP6222465B2 (en) Animation generating apparatus, animation generating method and program
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
JP6082657B2 (en) Pose assignment model selection device, pose assignment device, method and program thereof
CN112331184B (en) Voice mouth shape synchronization method and device, electronic equipment and storage medium
CN112233648A (en) Data processing method, device, equipment and storage medium combining RPA and AI
CN112733546A (en) Expression symbol generation method and device, electronic equipment and storage medium
CN116580721B (en) Expression animation generation method and device and digital human platform
CN114786059B (en) Video generation method, video generation device, electronic device, and storage medium
CN116721451A (en) Emotion cognition analysis method and device based on multi-modal fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant