CN112348932A - Mouth shape animation recording method and device, electronic equipment and storage medium

Mouth shape animation recording method and device, electronic equipment and storage medium

Info

Publication number
CN112348932A
Authority
CN
China
Prior art keywords
mouth shape, animation, data, mouth, deformation
Prior art date
Legal status
Pending
Application number
CN202011268798.XA
Other languages
Chinese (zh)
Inventor
王毅
赵冰
郑宇辉
谢文政
Current Assignee
Guangzhou Boguan Information Technology Co Ltd
Original Assignee
Guangzhou Boguan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Boguan Information Technology Co Ltd filed Critical Guangzhou Boguan Information Technology Co Ltd
Priority to CN202011268798.XA
Publication of CN112348932A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 15/005: General purpose rendering architectures

Abstract

The disclosure provides a mouth shape animation recording method and device, an electronic device and a storage medium, and relates to the field of computer technology. The mouth shape animation recording method includes the following steps: in response to an operation instruction for starting recording of a mouth shape animation, acquiring speech viseme data for generating the mouth shape animation in real time; converting the speech viseme data to obtain mouth shape deformation data; and, before an animation thread in a real-time rendering engine refreshes the global pose of the mouth shape animation, assembling the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation and storing it, thereby recording the editable mouth shape animation. The technical solution of the embodiments of the disclosure enables mouth shape animation to be recorded inside a real-time rendering engine and then edited and adjusted with an editing tool, either in the engine or after export.

Description

Mouth shape animation recording method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a mouth shape animation recording method, a mouth shape animation recording apparatus, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of computer technology, the quality of video games, films, and the like draws increasing public attention, and synthesizing highly realistic three-dimensional anthropomorphic expression animation has become an important technical goal in the related fields.
At present, when a three-dimensional anthropomorphic expression animation is generated by a real-time rendering engine (such as Unreal Engine 4, UE4) in the related art, several problems arise. The latest expression animation production schemes construct expressions by placing points on expression change curves and pulling them against the original model, a mode that most current real-time engines do not support, so production and application are separated and the fusion deformation of the expression animation cannot be finely adjusted. At the same time, animation parameters inside the real-time rendering engine are relatively fixed, and a modeler cannot strictly build a model against those fixed parameters, so the expression effect of the mouth shape animation in the real-time rendering engine is poor, and the face may even appear broken or torn. Moreover, the real-time rendering engine does not support recording of the mouth shape animation, so the mouth shape animation cannot be edited, modified, or exported.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a mouth shape animation recording method, a mouth shape animation recording apparatus, an electronic device, and a computer-readable storage medium, so as to overcome, at least to some extent, the problem that a mouth shape animation cannot be edited, modified, or exported because real-time rendering engines in the related art do not support recording of the mouth shape animation.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, there is provided a mouth shape animation recording method, including: in response to an operation instruction for starting recording of a mouth shape animation, acquiring speech viseme data for generating the mouth shape animation in real time; converting the speech viseme data to obtain mouth shape deformation data; and, before an animation thread in a real-time rendering engine refreshes the global pose of the mouth shape animation, assembling the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation and storing it, thereby recording the editable mouth shape animation.
In some example embodiments of the present disclosure, based on the foregoing, the real-time rendering engine includes a character class that performs logical assembly and an animation space class that performs animation assembly; and assembling the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation includes: in response to a selection operation on a check option for transcribing the mouth shape deformation data, transcribing the mouth shape deformation data into a mouth shape deformation array through the character class, where the data structure of the mouth shape deformation array is the same as the data structure in the animation space class; and assembling the mouth shape deformation array passed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
In some example embodiments of the present disclosure, based on the foregoing, assembling the mouth shape deformation array passed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation includes: acquiring a pre-configured data configuration table; and, based on the data configuration table, assembling the mouth shape deformation array passed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
In some example embodiments of the present disclosure, based on the foregoing, converting the speech viseme data to obtain the mouth shape deformation data includes: converting the speech viseme data through a preset voice recognition toolkit to obtain the mouth shape deformation data; and associating the speech viseme data of different channels with the mouth shape deformation data through a preset transcription function, so that the fusion deformation data corresponding to the mouth shape animation can be processed uniformly.
In some example embodiments of the present disclosure, based on the foregoing, after the speech viseme data of different channels and the mouth shape deformation data are associated through the preset transcription function, the method further includes: detecting whether the mouth shape deformation data of different channels exceeds a deformation threshold; and, in response to the mouth shape deformation data of different channels exceeding the deformation threshold, weakening the mouth shape deformation data through a preset weakening coefficient.
In some example embodiments of the present disclosure, based on the foregoing, in response to a selection operation on a check option for transcribing the mouth shape deformation data, transcribing the mouth shape deformation data into a mouth shape deformation array through the character class includes: predefining an empty array in the character class; and, in response to the selection operation on the check option for transcribing the mouth shape deformation data, writing the mouth shape deformation data into the empty array through a preset transcription function in the character class to generate the mouth shape deformation array.
In some example embodiments of the present disclosure, based on the foregoing, before assembling the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation, the method further includes: acquiring the speech viseme data corresponding to the mouth shape animation; and fusing the speech viseme data with preset change curves corresponding to the mouth shape animation to obtain the fusion curve corresponding to the mouth shape animation.
According to a second aspect of the embodiments of the present disclosure, there is provided a mouth shape animation recording apparatus, including: a speech viseme data acquisition module, configured to acquire, in real time, speech viseme data corresponding to a mouth shape animation in response to an operation instruction for starting recording of the mouth shape animation; a speech viseme data conversion module, configured to convert the speech viseme data to obtain mouth shape deformation data; and an editable mouth shape animation recording module, configured to, before an animation thread in a real-time rendering engine refreshes the global pose of the mouth shape animation, assemble the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation and store it, thereby recording the editable mouth shape animation.
In an exemplary embodiment of the present disclosure, based on the foregoing, the editable mouth shape animation recording module further includes: a mouth shape deformation array generating unit, configured to transcribe the mouth shape deformation data into a mouth shape deformation array through the character class in response to a selection operation on a check option for transcribing the mouth shape deformation data, where the data structure of the mouth shape deformation array is the same as the data structure in the animation space class; and a mouth shape deformation data assembling unit, configured to assemble the mouth shape deformation array passed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
In an exemplary embodiment of the present disclosure, based on the foregoing, the mouth shape deformation data assembling unit is further configured to: acquire a pre-configured data configuration table; and, based on the data configuration table, assemble the mouth shape deformation array passed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the apparatus for recording lip-rounding animation further includes a data association unit configured to: converting the voice visual element data through a preset voice recognition toolkit to obtain mouth shape deformation data; and performing association processing on the voice visual element data and the mouth shape deformation data of different channels through a preset transcription function so as to realize unified processing on the fusion deformation data corresponding to the mouth shape animation.
In an exemplary embodiment of the present disclosure, based on the foregoing, the mouth shape animation recording apparatus further includes a mouth shape animation weakening unit configured to: detect whether the mouth shape deformation data of different channels exceeds a deformation threshold; and, in response to the mouth shape deformation data of different channels exceeding the deformation threshold, weaken the mouth shape deformation data through a preset weakening coefficient.
In an exemplary embodiment of the present disclosure, based on the foregoing, the mouth shape deformation array generating unit is further configured to: predefine an empty array in the character class; and, in response to the selection operation on the check option for transcribing the mouth shape deformation data, write the mouth shape deformation data into the empty array through a preset transcription function in the character class to generate the mouth shape deformation array.
In an exemplary embodiment of the present disclosure, based on the foregoing, the mouth shape animation recording apparatus further includes a fusion curve generating unit configured to: acquire the speech viseme data corresponding to the mouth shape animation; and fuse the speech viseme data with the preset change curves corresponding to the mouth shape animation to obtain the fusion curve corresponding to the mouth shape animation.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; and a memory, wherein the memory stores computer readable instructions, and the computer readable instructions when executed by the processor implement any one of the above-mentioned mouth shape animation recording methods.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the mouth shape animation recording method according to any one of the above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the mouth shape animation recording method in the exemplary embodiments of the disclosure, when an operation for starting recording of a mouth shape animation is detected, speech viseme data for generating the mouth shape animation is acquired in real time; the speech viseme data is converted to obtain mouth shape deformation data; and then, before the animation thread in the real-time rendering engine refreshes the global pose of the mouth shape animation, the mouth shape deformation data is assembled into the fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation, which is stored, thereby recording the editable mouth shape animation. On the one hand, the editable mouth shape animation containing the mouth shape deformation data is recorded before the animation thread in the real-time rendering engine refreshes the global pose of the mouth shape animation (that is, before the real-time rendering engine binds the mouth shape animation to the virtual model), so recording of the editable mouth shape animation is achieved, and the recorded animation can be edited, or corrected with other tools after export, improving the quality of the mouth shape animation. On the other hand, because the mouth shape deformation data is assembled into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation, the recorded animation remains editable after leaving the real-time rendering engine, and its expressive effect can be improved through editing and correction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates an animation assembly flow in the related art;
FIG. 2 schematically illustrates a flow diagram of a mouth shape animation recording method for a real-time rendering engine, according to some embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow diagram for assembling mouth shape deformation data to generate an editable mouth shape animation, according to some embodiments of the present disclosure;
FIG. 4 schematically illustrates a flow diagram for assembling mouth shape deformation data into a fusion curve, according to some embodiments of the present disclosure;
FIG. 5 schematically illustrates a diagram of recording an editable mouth shape animation, according to some embodiments of the present disclosure;
FIG. 6 schematically illustrates a diagram of associating speech viseme data of different channels with mouth shape deformation data, according to some embodiments of the present disclosure;
FIG. 7 schematically illustrates a flow diagram of weakening mouth shape deformation data, according to some embodiments of the present disclosure;
FIG. 8 schematically illustrates a diagram of a mouth shape animation recording apparatus, according to some embodiments of the present disclosure;
FIG. 9 schematically illustrates a structural diagram of a computer system of an electronic device, according to some embodiments of the present disclosure;
FIG. 10 schematically illustrates a diagram of a computer-readable storage medium, according to some embodiments of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the drawings are merely schematic illustrations and are not necessarily drawn to scale. The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The inventor has found that, in a real-time rendering engine, audio obtained in streaming form and played synchronously produces a series of mouth shape deformation data after speech-to-mouth-shape recognition, and this data drives mouth shape action change curves defined in advance by designers, finally forming a continuous mouth shape animation. To enrich the variety of mouth shapes, designers provide diversified preset mouth shape animations, which, adjusted singly or in combination, yield more detailed emotional performance.
However, the final expressiveness of real-time voice-driven mouth shape animation is limited by two factors. First, real-time operation itself imposes limits: to guarantee stable real-time performance and to avoid mixing multiple mouth shapes into broken faces (skin tearing) or overly exaggerated expressions, only a small number of change-curve intervals are enabled to be linked with the mouth shape deformation data. Second, the granularity of speech recognition is limited: the recognizer provides data changes on 15 channels, but only 5 of them, the mouth shapes corresponding to "a", "e", "i", "o", and "u", are common; recognition on the other channels is not very accurate and performs poorly when synthesizing and driving mouth shape animation, so necessary weakening measures must be taken in real time, which caps the emotional expressiveness of the mouth shape animation. In film and television post-production, where performance detail requirements are high, the motion needs to be recorded and finely adjusted to obtain a smooth and rich mouth shape animation. However, because the real-time rendering engine does not support recording the fusion deformation, the mouth shape animation would have to be adjusted frame by frame after the animation driving occurs, or corrected after being exported from the real-time rendering engine.
Fig. 1 schematically shows a schematic diagram of an animation assembly flow in the related art.
Referring to fig. 1, the inventor has also found that, in the real-time rendering engine, the synthesis of each frame of the mouth shape animation goes through multiple rounds of assembly in the animation thread and the dynamics thread as the main thread proceeds. Taking the global pose calculation node in the animation thread (the node at which the main thread determines the mouth shape animation) as the boundary, the whole mouth shape animation flow can be divided into two stages: an animation pose determination stage 101 before the global pose calculation node in the animation thread, and an animation generation and rendering stage 102 after it. The animation recording tool therefore performs its recording task before the animation thread carries out the global pose calculation. Consequently, to record the mouth shape animation, the fusion deformation of the face animation must be synthesized and recorded before the animation recording tool's recording task is executed.
In view of one or more of the above problems, the present exemplary embodiment first provides a mouth shape animation recording method, which may be applied to a terminal device or to a server; the following description takes execution on a terminal device as an example. Fig. 2 schematically illustrates a flow diagram of a mouth shape animation recording method according to some embodiments of the present disclosure. Referring to fig. 2, the mouth shape animation recording method may include the following steps:
step S210, responding to an operation instruction for starting recording the mouth shape animation, and acquiring voice visual element data for generating the mouth shape animation in real time;
step S220, converting the voice visual element data to obtain mouth shape deformation data;
step S230, before the animation thread in the real-time rendering engine refreshes the global posture of the mouth shape animation, assembling the mouth shape deformation data into a fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation, and storing the editable mouth shape animation, so as to record the editable mouth shape animation.
According to the mouth shape animation recording method in this embodiment, on the one hand, the editable mouth shape animation containing the mouth shape deformation data is recorded before the animation thread in the real-time rendering engine refreshes the global pose of the mouth shape animation (that is, before the real-time rendering engine binds the mouth shape animation to the virtual model), so recording of the editable mouth shape animation is achieved, and the recorded animation can be edited, or corrected with other tools after export, improving the quality of the mouth shape animation. On the other hand, because the mouth shape deformation data is assembled into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation, the recorded animation remains editable after leaving the real-time rendering engine, and its expressive effect can be improved through editing and correction.
Next, the mouth shape animation recording method in the present exemplary embodiment will be further described.
In step S210, in response to an operation instruction for starting recording of a mouth shape animation, speech viseme data for generating the mouth shape animation is acquired in real time.
In an example embodiment of the present disclosure, the mouth shape animation may be the animation corresponding to the lip portion of the face animation in an anthropomorphic expression animation. Because users are highly sensitive to the mouth movements of an animated model, and because the mouth shape can convey rich expressive information, a high-precision mouth shape animation effectively improves the overall effect of the anthropomorphic expression animation. Although the present disclosure only discusses processing the mouth shape animation, it is easily understood that the mouth shape animation is ultimately obtained as part of the complete anthropomorphic animation; that is, what is produced is essentially the whole anthropomorphic animation, and the present disclosure merely focuses its processing on the mouth shape portion.
The operation instruction for starting recording of the mouth shape animation may be any preset operation instruction that triggers recording of the mouth shape animation. For example, it may be a trigger operation on a preset recording control, or an operation that automatically records the mouth shape animation through a predefined recording plug-in; of course, other operation instructions capable of triggering recording of the mouth shape animation may also be used, which is not particularly limited in this example.
The speech viseme data may be data, extracted by voice recognition software (e.g., Annosoft LipSync or OVRLipSync), that relates to the visual presentation of the mouth shape animation and records the mouth shapes corresponding to different sounds. For example, the speech viseme data may be the relative position data of a number of key points placed on the lips under different pronunciations, or the peak data of the fusion deformation curves corresponding to the mouth shape animation; other data related to the visual presentation of the mouth shape animation extracted by the voice recognition software may also be included, which is not particularly limited in this exemplary embodiment.
In step S220, the speech viseme data is converted to obtain mouth shape deformation data.
In an example embodiment of the present disclosure, the mouth shape deformation data (morphing data) may be data, obtained by transcribing the speech viseme data through a transcription function, that represents the deformation of the mouth shape animation. By converting the speech viseme data into mouth shape deformation data, the data can subsequently be assembled into the mouth shape animation, making the mouth shape animation editable.
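As a purely illustrative sketch of this conversion (the type and function names below are assumptions, not part of any specific SDK or of this disclosure), the per-frame transcription of speech viseme channels into mouth shape deformation data might look as follows in UE4-style C++:

```cpp
#include "CoreMinimal.h"

// 15 viseme channels, loosely following the OVRLipSync naming convention;
// "aa", "E", "ih", "oh", "ou" correspond to the common "a/e/i/o/u" mouth shapes.
static const TArray<FName> GVisemeChannelNames = {
    "sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
    "nn", "RR", "aa", "E", "ih", "oh", "ou"};

struct FVisemeFrame            // raw speech viseme data for one frame
{
    TArray<float> Weights;     // one weight in [0, 1] per viseme channel
};

struct FMorphFrame             // mouth shape deformation (morphing) data
{
    TArray<float> MorphWeights;
};

// Hypothetical transcription function: copies each viseme channel into the
// morph weight that will later drive the corresponding fusion curve.
FMorphFrame TranscribeToMorphData(const FVisemeFrame& Viseme)
{
    FMorphFrame Morph;
    Morph.MorphWeights = Viseme.Weights; // a 1:1 channel mapping is assumed
    return Morph;
}
```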
In step S230, before the animation thread in the real-time rendering engine refreshes the global pose of the mouth shape animation, the mouth shape deformation data is assembled into the fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation, and the editable mouth shape animation is recorded.
In an example embodiment of the present disclosure, the real-time rendering engine is an animation engine capable of computing and outputting graphic data in real time; for example, it may include, but is not limited to, Unreal Engine 4 (UE4). Recording of the mouth shape animation is performed before the animation thread in the real-time rendering engine refreshes the global pose of the mouth shape animation, that is, before the main thread determines the final animation through the animation recording tool; once the animation thread has refreshed the global pose and the mouth shape animation has been generated, the animation can no longer be changed or edited. The fusion curve corresponding to the mouth shape animation is a data-driven mouth shape action change curve defined in advance by a designer, and it can be obtained through the voice recognition software.
The editable mouth shape animation is the mouth shape animation generated after the mouth shape deformation data has been assembled. It may be modified and edited directly in the real-time rendering engine through an editing tool, or it may be exported from the real-time rendering engine (for example, in FBX format) and then edited and modified in other animation editing tools (for example, 3ds Max or Maya, which is not specially limited in this example embodiment), so that the mouth shape animation can be refined to improve its display effect.
In an example embodiment of the present disclosure, the obtained mouth shape deformation data may be assembled into the fusion curve corresponding to the mouth shape animation through the steps in fig. 3 to obtain the editable mouth shape animation:
referring to fig. 3, in response to a selection operation of a check option for transcribing the mouth shape deformation data, transcribing the mouth shape deformation data into a mouth shape deformation array through the role class, in step S310;
and S320, assembling the mouth shape deformation array transferred to the animation space class into a fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
The selection operation may be an operation of selecting the parameters for transcribing the mouth shape deformation data when the mouth shape animation needs to be recorded. For example, it may be performed through a number of provided check options, or through a shortcut key combination for quickly setting the mouth shape deformation data parameters, which is not particularly limited in this embodiment.
A class (class type) is a user-defined reference data type; each class contains a description of data and a set of functions that manipulate that data or pass messages, and instances of a class are called objects. The real-time rendering engine may include a character class that performs logical assembly and an animation space class that performs animation assembly. The animation space class decouples animation assembly from logical assembly and is independent of the character class (in contrast to earlier game engine designs), which is the benefit of designing them separately. The animation thread in the real-time rendering engine is a flow concept operating on the character model; in essence, it is the execution order of the relevant classes at run time.
The mouth shape deformation array is the array generated after the mouth shape deformation data is transcribed. Its data structure is the same as that in the animation space class, so transcribing the mouth shape deformation data into the mouth shape deformation array inside the character class makes it convenient to pass the data to the animation space class for assembly.
Specifically, an empty array can be predefined in the character class; then, in response to the selection operation on the check option for transcribing the mouth shape deformation data, the mouth shape deformation data is written into the empty array through a preset transcription function in the character class to generate the mouth shape deformation array.
The check options may be options, provided through an interactive interface, for selecting the transcription parameters of the mouth shape deformation data when recording the mouth shape animation. Through the check options the transcription parameters can be selected quickly, which improves recording efficiency and the accuracy of the recorded editable animation.
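A minimal UE4-style sketch of this step is given below, with hypothetical class and property names: an empty array is predefined in the character class, and a preset transcription function fills it only while the check option for transcribing the mouth shape deformation data is selected.

```cpp
#include "CoreMinimal.h"
#include "GameFramework/Character.h"
#include "MyLipSyncCharacter.generated.h"

UCLASS()
class AMyLipSyncCharacter : public ACharacter
{
    GENERATED_BODY()

public:
    // Check option exposed in the editor: when selected, the mouth shape
    // deformation data is transcribed so that it can be recorded.
    UPROPERTY(EditAnywhere, Category = "LipSync")
    bool bTranscribeMorphData = false;

    // Predefined (initially empty) mouth shape deformation array; its layout
    // must match the corresponding array in the animation space class.
    UPROPERTY(BlueprintReadOnly, Category = "LipSync")
    TArray<float> MorphDataArray;

    // Hypothetical preset transcription function: writes the per-frame morph
    // weights into the predefined array when the check option is on.
    void TranscribeMorphData(const TArray<float>& FrameMorphWeights)
    {
        if (bTranscribeMorphData)
        {
            MorphDataArray = FrameMorphWeights;
        }
    }
};
```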
Further, the mouth shape deformation data passed into the animation space class may be assembled into the fusion curve corresponding to the mouth shape animation through the steps in fig. 4 to obtain the editable mouth shape animation:
referring to fig. 4, in step S410, a pre-configured data configuration table is obtained;
step S420, based on the data configuration table, assembling the mouth shape deformation array transferred to the animation space class into a fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation.
The data configuration table may be a pre-configured correspondence table used to convert the mouth shape deformation data into a data form supported by the animation space class and assemble it into the fusion curve corresponding to the mouth shape animation. After the mouth shape deformation array is passed into the animation space class, the mouth shape deformation data in it is cached into the corresponding array of the animation space class (this direct caching is possible because the two arrays share the same data structure). Then, before the animation thread in the real-time rendering engine refreshes the global pose of the mouth shape animation, the mouth shape deformation data is assembled, via the pre-configured data configuration table, into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
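The sketch below illustrates, under assumed names, how the animation space class might cache the array and use a pre-configured data configuration table to assemble it into fusion-curve keys before the global pose is refreshed. NativeUpdateAnimation is a standard UE4 hook that runs before pose evaluation; everything else here is hypothetical.

```cpp
#include "CoreMinimal.h"
#include "Animation/AnimInstance.h"
#include "MyLipSyncAnimInstance.generated.h"

UCLASS()
class UMyLipSyncAnimInstance : public UAnimInstance
{
    GENERATED_BODY()

public:
    // Cached copy of the character class's morph array (same layout).
    TArray<float> CachedMorphData;

    // Pre-configured data configuration table: maps a channel index to the
    // name of the fusion curve it drives. Loaded from an asset in practice.
    TMap<int32, FName> CurveConfigTable;

    // Per-frame fusion-curve keys assembled for the animation recorder.
    TMap<FName, float> AssembledCurveKeys;

    virtual void NativeUpdateAnimation(float DeltaSeconds) override
    {
        Super::NativeUpdateAnimation(DeltaSeconds);

        // Assemble the cached morph data into the fusion curves before the
        // animation thread refreshes the global pose.
        AssembledCurveKeys.Reset();
        for (const TPair<int32, FName>& Row : CurveConfigTable)
        {
            if (CachedMorphData.IsValidIndex(Row.Key))
            {
                AssembledCurveKeys.Add(Row.Value, CachedMorphData[Row.Key]);
            }
        }
    }
};
```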
Fig. 5 schematically illustrates recording an editable mouth shape animation according to some embodiments of the disclosure.
Referring to fig. 5, in step S510, voice recognition is performed on the audio stream data by the voice recognition module (such as Annosoft LipSync or OVRLipSync) in the real-time rendering engine;
in step S520, the recognized speech viseme data is passed to the character class; in the frame processing flow of the real-time rendering engine, once the speech viseme data is taken by the character class, the data assembly performed in the character class does not support recording;
in step S530, an empty array is predefined in the character class, and in response to the selection operation on the check options for transcribing the mouth shape deformation data, the mouth shape deformation data is written into the empty array through the preset transcription function to generate a mouth shape deformation array with the same data structure as that in the animation space class;
in step S540, the mouth shape deformation array is passed to the animation space class;
in step S550, through the mouth shape deformation array passed into the animation space class, the mouth shape deformation data is assembled into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation;
and in step S560, the editable mouth shape animation is recorded through the animation recording module.
In an example embodiment of the present disclosure, the speech viseme data may be converted into the mouth shape deformation data through a preset voice recognition toolkit, and the speech viseme data of different channels may be associated with the mouth shape deformation data through a preset transcription function, so that the fusion deformation data corresponding to the mouth shape animation can be processed uniformly.
The voice recognition toolkit may be a Software Development Kit (SDK) provided with the voice recognition software (e.g., Annosoft LipSync or OVRLipSync); the mouth shape deformation data can be obtained by converting the speech viseme data through the voice recognition SDK.
The transcription function may be a preset function capable of associating the speech viseme data of different channels with the mouth shape deformation data. For example, the transcription function may associate the speech viseme data of different channels with the names corresponding to the mouth shape deformation data, or with the numerical values corresponding to the mouth shape deformation data, which is not particularly limited in this exemplary embodiment.
The fusion deformation data may refer to all fusion deformations (blend shapes), including the mouth shape fusion deformation; for example, it may further include the eyebrow, eye, and ear fusion deformations in the anthropomorphic expression animation. When the speech viseme data of different channels is associated with the mouth shape deformation data through the preset transcription function, the fusion deformations other than the mouth shape are also associated with the speech viseme data of different channels, so all of the fusion deformation data is processed uniformly.
Fig. 6 schematically illustrates associating the speech viseme data of different channels with the mouth shape deformation data, according to some embodiments of the present disclosure.
Referring to fig. 6, for the speech viseme data 601 of different channels obtained through voice recognition and the other data 602 related to mouth shape animation deformation, a preset transcription function 603 associates the speech viseme data 601 of different channels with the names of the mouth shape deformation data, and also associates the names of the fusion deformations other than the mouth shape with the speech viseme data of different channels, to obtain a mouth shape deformation name array 604. The preset transcription function 603 likewise associates the speech viseme data 601 of different channels with the numerical values corresponding to the mouth shape deformation data, and associates the values of the other fusion deformations with the speech viseme data of different channels, to obtain a mouth shape deformation value array 605. Through the mouth shape deformation name array 604 and the mouth shape deformation value array 605, all the fusion deformation data corresponding to the mouth shape animation can then be processed uniformly.
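For illustration, a transcription function of the kind shown in fig. 6 might build the name array 604 and the value array 605 as two parallel arrays covering both the speech-driven mouth shape channels and the other fusion deformations; all identifiers below are assumed.

```cpp
#include "CoreMinimal.h"

// Hypothetical transcription function: associates each channel's viseme data
// with the name and value of its fusion deformation, producing the mouth
// shape deformation name array and value array described with fig. 6.
void TranscribeChannels(
    const TArray<FName>& VisemeChannelNames,    // e.g. the 15 speech channels
    const TArray<float>& VisemeChannelValues,
    const TMap<FName, float>& OtherBlendShapes, // eyebrow, eye, ear, ...
    TArray<FName>& OutMorphNames,               // name array (604)
    TArray<float>& OutMorphValues)              // numerical value array (605)
{
    OutMorphNames.Reset();
    OutMorphValues.Reset();

    // Mouth shape channels recognized from speech.
    for (int32 i = 0; i < VisemeChannelNames.Num(); ++i)
    {
        OutMorphNames.Add(VisemeChannelNames[i]);
        OutMorphValues.Add(VisemeChannelValues[i]);
    }

    // The remaining fusion deformations are appended so that all of the
    // fusion deformation data can be processed through the same two arrays.
    for (const TPair<FName, float>& Shape : OtherBlendShapes)
    {
        OutMorphNames.Add(Shape.Key);
        OutMorphValues.Add(Shape.Value);
    }
}
```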
Further, after the speech viseme data of different channels has been associated with the mouth shape deformation data, a preliminary adjustment may be applied to the mouth shape deformation data through the steps in fig. 7:
referring to fig. 7, step S710 detects whether the die deformation data of different channels exceeds a deformation threshold;
and S720, responding to the fact that the mouth shape deformation data of different channels exceed a deformation threshold, and weakening the mouth shape deformation data through a preset weakening coefficient.
The deformation threshold may be a threshold, obtained in advance through experimental monitoring, beyond which superimposing the mouth shape deformation data of different channels produces negative effects such as fracture deformation. For example, when the mouth shape deformation data for the combination of the "a", "u", and "o" mouth shapes exceeds 0.6, 0.5, and 0.8 respectively, the superimposed effect of the three curves produces fracture deformation or mouth shape distortion; in that case 0.6, 0.5, and 0.8 are the deformation thresholds. Of course, the deformation thresholds of a specific mouth shape animation depend on the animation model (face model) edited by the designer; the thresholds here are merely illustrative and should not impose any special limitation on this example.
The weakening coefficient may be a value used to weaken the mouth shape deformation data of the channels causing fracture deformation or mouth shape distortion when that data exceeds the deformation threshold; for example, the weakening coefficient may be 0.8 or 0.6, which is not particularly limited in this exemplary embodiment. When the mouth shape deformation data of different channels exceeds the deformation threshold, the data is scaled down by multiplying it by the preset weakening coefficient, achieving a preliminary adjustment of the mouth shape animation and reducing the difficulty of subsequent adjustment.
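A minimal sketch of this preliminary adjustment is shown below, reusing the illustrative thresholds from the text (0.6, 0.5, and 0.8 for the "a", "u", and "o" channels, written here under OVRLipSync-style names) and an assumed weakening coefficient of 0.8; real thresholds depend on the face model.

```cpp
#include "CoreMinimal.h"

// Deformation thresholds per channel (illustrative values only; the actual
// thresholds depend on the designer's face model).
static const TMap<FName, float> GDeformThresholds = {
    {"aa", 0.6f},   // "a" mouth shape
    {"ou", 0.5f},   // "u" mouth shape
    {"oh", 0.8f}};  // "o" mouth shape

static const float GWeakenCoefficient = 0.8f; // assumed preset value

// If a channel's morph weight exceeds its deformation threshold, multiply it
// by the weakening coefficient to avoid broken or torn faces.
void WeakenMorphData(const TArray<FName>& Names, TArray<float>& Values)
{
    for (int32 i = 0; i < Names.Num() && i < Values.Num(); ++i)
    {
        if (const float* Threshold = GDeformThresholds.Find(Names[i]))
        {
            if (Values[i] > *Threshold)
            {
                Values[i] *= GWeakenCoefficient;
            }
        }
    }
}
```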
In an example embodiment of the present disclosure, before the mouth shape deformation data is assembled into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation, the fusion curve corresponding to the mouth shape animation may be generated by:
acquiring the speech viseme data corresponding to the mouth shape animation; and
fusing the speech viseme data with the preset change curves corresponding to the mouth shape animation to obtain the fusion curve corresponding to the mouth shape animation.
Viseme data follows the face modeling description method defined by the MPEG-4 facial animation parameter standard; it is divided into 15 categories, each representing the facial action for a class of sounds a person makes (such as initials, finals, vowels, and consonants). The preset change curve may be a change curve predefined by the designer; driven by data acquired in streaming form (such as video or audio streams), it yields a continuous morphing animation.
The speech viseme data is used to drive the preset change curves and fuse them into the fusion curve corresponding to the mouth shape animation; for example, the speech viseme data is pre-fused to generate the eyebrow deformation fusion curve, the eye deformation fusion curve, and so on. The speech viseme data is further transcribed to obtain the mouth shape deformation data, which is assembled into the fusion curve to generate the editable mouth shape animation in which the mouth shape deformation animation can be edited freely.
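As one hedged illustration of this fusion, streamed viseme weights can drive a preset change curve defined by the designer and write per-frame keys into an editable fusion curve. FRichCurve is a standard UE4 curve container; the fusion rule itself is an assumption for illustration, not a formula given by this disclosure.

```cpp
#include "CoreMinimal.h"
#include "Curves/RichCurve.h"

// Fuse streamed speech viseme data with a designer's preset change curve:
// the viseme weight selects a position on the preset curve, and each frame
// a key is written into the fusion curve that will be recorded and edited.
void FuseVisemeIntoCurve(
    const FRichCurve& PresetChangeCurve, // predefined by the designer
    float VisemeWeight,                  // current channel weight in [0, 1]
    float TimeSeconds,                   // recording time of this frame
    FRichCurve& OutFusionCurve)          // editable result curve
{
    const float FusedValue = PresetChangeCurve.Eval(VisemeWeight);
    OutFusionCurve.AddKey(TimeSeconds, FusedValue);
}
```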
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In addition, a mouth shape animation recording apparatus is also provided in this exemplary embodiment. Referring to fig. 8, the mouth shape animation recording apparatus 800 includes: a speech viseme data acquisition module 810, a speech viseme data conversion module 820, and an editable mouth shape animation recording module 830. Wherein:
the voice visual element data acquisition module 810 is configured to respond to an operation instruction for starting recording of the mouth shape animation, and acquire, in real time, voice visual element data corresponding to the mouth shape animation;
the voice visual element data conversion model 820 is used for converting the voice visual element data to obtain mouth shape deformation data;
the editable mouth shape animation recording module 830 is configured to assemble the mouth shape deformation data into a fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation before the animation thread in the real-time rendering engine refreshes the global posture of the mouth shape animation, and store the editable mouth shape animation, so as to record the editable mouth shape animation.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the editable mouth shape animation recording module 830 further includes:
the mouth shape deformation array generating unit is used for responding to the selection operation of the check option for transferring the mouth shape deformation data, and transferring the mouth shape deformation data into a mouth shape deformation array through the role class; wherein the data structure of the mouth shape deformation array is the same as the data structure in the animation space class;
and the mouth shape deformation data assembling unit is used for assembling the mouth shape deformation array transferred to the animation space class into a fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
In an exemplary embodiment of the present disclosure, based on the foregoing, the mouth shape deformation data assembling unit is further configured to:
acquire a pre-configured data configuration table;
and, based on the data configuration table, assemble the mouth shape deformation array passed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the apparatus 800 further includes a data association unit configured to:
convert the speech viseme data through a preset voice recognition toolkit to obtain the mouth shape deformation data; and
associate the speech viseme data of different channels with the mouth shape deformation data through a preset transcription function, so that the fusion deformation data corresponding to the mouth shape animation can be processed uniformly.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the apparatus 800 further includes a mouth shape animation weakening unit configured to:
detect whether the mouth shape deformation data of different channels exceeds a deformation threshold;
and, in response to the mouth shape deformation data of different channels exceeding the deformation threshold, weaken the mouth shape deformation data through a preset weakening coefficient.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the mouth shape deformation array generating unit is further configured to:
predefine an empty array in the character class;
and, in response to the selection operation on the check option for transcribing the mouth shape deformation data, write the mouth shape deformation data into the empty array through a preset transcription function in the character class to generate the mouth shape deformation array.
In an exemplary embodiment of the present disclosure, based on the foregoing, the apparatus 800 further includes a fusion curve generating unit configured to:
acquire the speech viseme data corresponding to the mouth shape animation;
and fuse the speech viseme data with the preset change curves corresponding to the mouth shape animation to obtain the fusion curve corresponding to the mouth shape animation.
The specific details of each module of the aforementioned apparatus for recording mouth shape animation have been described in detail in the corresponding method for recording mouth shape animation, and therefore are not described herein again.
It should be noted that although several modules or units of the mouth shape animation recording apparatus are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above-mentioned mouth shape animation recording method is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," a "module," or a "system."
An electronic device 900 according to such an embodiment of the disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. The components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one storage unit 920, a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.
The storage unit stores program code executable by the processing unit 910, so that the processing unit 910 performs the steps according to various exemplary embodiments of the present disclosure described in the "exemplary method" section above of this specification. For example, the processing unit 910 may execute step S210 shown in fig. 2: in response to an operation instruction for starting recording of a mouth shape animation, acquiring, in real time, the speech viseme data corresponding to the mouth shape animation; step S220: converting the speech viseme data to obtain mouth shape deformation data; and step S230: before the animation thread in the real-time rendering engine refreshes the global pose of the mouth shape animation, assembling the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation, and storing the editable mouth shape animation, thereby recording the editable mouth shape animation.
The storage unit 920 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)921 and/or a cache memory unit 922, and may further include a read only memory unit (ROM) 923.
Storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 970 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 10, a program product 1000 for implementing the above-described mouth shape animation recording method according to an embodiment of the present disclosure is described; the program product may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A mouth shape animation recording method for a real-time rendering engine is characterized by comprising the following steps:
in response to an operation instruction for starting recording of the mouth shape animation, acquiring, in real time, voice visual element data for generating the mouth shape animation;
converting the voice visual element data to obtain mouth shape deformation data;
and before an animation thread in the real-time rendering engine refreshes the global posture of the mouth shape animation, assembling the mouth shape deformation data into a fusion curve corresponding to the mouth shape animation to obtain an editable mouth shape animation, and storing the editable mouth shape animation, thereby recording the editable mouth shape animation.
2. The method of claim 1, wherein the real-time rendering engine includes a character class for performing logical assembly and an animation space class for performing animation assembly;
and assembling the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation comprises:
in response to a selection operation on a check option for transcribing the mouth shape deformation data, transcribing the mouth shape deformation data into a mouth shape deformation array through the character class, wherein the data structure of the mouth shape deformation array is the same as the data structure in the animation space class; and
assembling the mouth shape deformation array transcribed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
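By way of illustration only and not as part of the claims, the transcription recited in claim 2 might be sketched in C++ as follows; the channel count, class names, and member functions are assumptions rather than an actual engine API:

```cpp
#include <array>
#include <cstddef>

// Assumed number of mouth deformation channels; illustrative only.
constexpr std::size_t kMouthChannels = 16;

// Data structure used by the hypothetical animation space class.
struct AnimSpaceClass {
    using Frame = std::array<float, kMouthChannels>;
};

// Hypothetical character class performing the logical assembly: it
// transcribes raw deformation data into an array whose layout matches
// the animation space class exactly, so the array can be handed over
// without any further conversion.
struct CharacterClass {
    AnimSpaceClass::Frame Transcribe(const float* deform) const {
        AnimSpaceClass::Frame out{};
        for (std::size_t i = 0; i < kMouthChannels; ++i) {
            out[i] = deform[i];
        }
        return out;
    }
};
```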
3. The mouth shape animation recording method according to claim 2, wherein assembling the mouth shape deformation array transcribed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation comprises:
acquiring a pre-configured data configuration table; and
assembling, based on the data configuration table, the mouth shape deformation array transcribed into the animation space class into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation.
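Again as a non-limiting sketch, the data configuration table of claim 3 could be modeled as a mapping from array slots to fusion curve names; the table contents and names below are assumptions:

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct CurveKey { double time; float value; };

// Hypothetical pre-configured data configuration table: maps each slot
// of the mouth shape deformation array to the fusion curve it feeds.
const std::map<std::size_t, std::string> kCurveConfig = {
    {0, "JawOpen"}, {1, "LipPucker"}, {2, "LipStretch"},
};

// Assemble one frame of the deformation array into the fusion curves
// according to the configuration table.
void AssembleWithConfig(const std::array<float, 16>& frame, double time,
                        std::map<std::string, std::vector<CurveKey>>& curves) {
    for (const auto& [slot, curveName] : kCurveConfig) {
        curves[curveName].push_back({time, frame[slot]});
    }
}
```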
4. The method for recording mouth shape animation as claimed in claim 1, wherein converting the voice visual element data into mouth shape deformation data comprises:
converting the voice visual element data through a preset voice recognition toolkit to obtain mouth shape deformation data; and
associating, through a preset transcription function, the voice visual element data and the mouth shape deformation data of different channels, so that the fusion deformation data corresponding to the mouth shape animation are processed uniformly.
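A non-limiting C++ sketch of the association step of claim 4 follows; the structure and function names are assumptions, and the preset transcription function here simply keys both kinds of data by channel name:

```cpp
#include <map>
#include <string>

// Per-channel record pairing a viseme weight with a deformation weight.
struct ChannelData { float visemeWeight; float deformWeight; };

// Hypothetical transcription function: keys the viseme data and the
// deformation data of each channel under one channel name, so the
// fusion deformation data can later be processed uniformly.
std::map<std::string, ChannelData> Associate(
        const std::map<std::string, float>& visemes,
        const std::map<std::string, float>& deforms) {
    std::map<std::string, ChannelData> out;
    for (const auto& [channel, v] : visemes) {
        const auto it = deforms.find(channel);
        out[channel] = {v, it != deforms.end() ? it->second : 0.0f};
    }
    return out;
}
```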
5. The mouth shape animation recording method according to claim 4, wherein after associating the voice visual element data and the mouth shape deformation data of different channels through the preset transcription function, the method further comprises:
detecting whether the mouth shape deformation data of the different channels exceed a deformation threshold; and
in response to the mouth shape deformation data of the different channels exceeding the deformation threshold, weakening the mouth shape deformation data through a preset weakening coefficient.
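The threshold check and weakening of claim 5 reduce to a simple scale-down of overshooting channels; in the following non-limiting sketch, the threshold and weakening coefficient values are assumptions:

```cpp
#include <vector>

// Assumed threshold and weakening coefficient; values are illustrative.
constexpr float kDeformThreshold = 0.85f;
constexpr float kWeakenCoefficient = 0.7f;

// If a channel's deformation exceeds the threshold, scale it down so
// exaggerated mouth shapes are damped before assembly.
void WeakenOvershoot(std::vector<float>& channels) {
    for (float& v : channels) {
        if (v > kDeformThreshold) {
            v *= kWeakenCoefficient;
        }
    }
}
```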
6. The mouth shape animation recording method according to claim 2, wherein transcribing the mouth shape deformation data into the mouth shape deformation array through the character class in response to the selection operation on the check option for transcribing the mouth shape deformation data comprises:
predefining an empty array in the character class; and
in response to the selection operation on the check option for transcribing the mouth shape deformation data, writing the mouth shape deformation data into the empty array through a preset transcription function in the character class to generate the mouth shape deformation array.
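A non-limiting C++ sketch of claim 6 follows; the class name, the check-option flag, and the transcription function are assumptions:

```cpp
#include <vector>

// Hypothetical character class fragment for claim 6.
class CharacterClass {
public:
    CharacterClass() = default;  // the deformation array starts empty

    // Bound to the check option for transcribing deformation data.
    void SetTranscribeChecked(bool checked) { checked_ = checked; }

    // Preset transcription function: writes each incoming deformation
    // frame into the predefined (initially empty) array while checked.
    void TranscriptionFunction(const std::vector<float>& deformFrame) {
        if (checked_) {
            mouthShapeArray_.push_back(deformFrame);
        }
    }

private:
    bool checked_ = false;
    std::vector<std::vector<float>> mouthShapeArray_;  // predefined empty
};
```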
7. The mouth shape animation recording method according to claim 1, wherein before assembling the mouth shape deformation data into the fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation, the method further comprises:
acquiring voice visual position data corresponding to the mouth shape animation;
and fusing the voice visual position data with a preset change curve corresponding to the mouth shape animation to obtain a fusion curve corresponding to the mouth shape animation.
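Finally, a non-limiting sketch of the fusion of claim 7; the equal blend weights and the frame-aligned pairing of the two curves are assumptions made only for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Key { double time; float value; };

// Blend recorded voice visual position data with a preset change curve
// to produce the fusion curve for the mouth shape animation.
std::vector<Key> FuseWithPresetCurve(const std::vector<Key>& positions,
                                     const std::vector<Key>& preset) {
    std::vector<Key> fused;
    const std::size_t n = std::min(positions.size(), preset.size());
    fused.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        fused.push_back({positions[i].time,
                         0.5f * (positions[i].value + preset[i].value)});
    }
    return fused;
}
```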
8. A mouth shape animation recording apparatus for a real-time rendering engine, comprising:
the voice visual element data acquisition module is used for acquiring, in real time and in response to an operation instruction for starting recording of the mouth shape animation, voice visual element data for generating the mouth shape animation;
the voice visual element data conversion module is used for converting the voice visual element data to obtain mouth shape deformation data;
and the editable mouth shape animation recording module is used for assembling the mouth shape deformation data into a fusion curve corresponding to the mouth shape animation to obtain the editable mouth shape animation before the animation thread in the real-time rendering engine refreshes the global posture of the mouth shape animation, and storing the editable mouth shape animation so as to record the editable mouth shape animation.
9. An electronic device, comprising:
a processor; and
a memory having stored thereon computer-readable instructions which, when executed by the processor, implement the mouth shape animation recording method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the mouth shape animation recording method according to any one of claims 1 to 7.
CN202011268798.XA 2020-11-13 2020-11-13 Mouth shape animation recording method and device, electronic equipment and storage medium Pending CN112348932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011268798.XA CN112348932A (en) 2020-11-13 2020-11-13 Mouth shape animation recording method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011268798.XA CN112348932A (en) 2020-11-13 2020-11-13 Mouth shape animation recording method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112348932A (en) 2021-02-09

Family

ID=74363599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011268798.XA Pending CN112348932A (en) 2020-11-13 2020-11-13 Mouth shape animation recording method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112348932A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070019295A (en) * 2005-08-12 2007-02-15 주식회사 인프라밸리 Method and system for providing lip-sync service for mobile communication subscriber
CN104732593A (en) * 2015-03-27 2015-06-24 厦门幻世网络科技有限公司 Three-dimensional animation editing method based on mobile terminal
US20190057533A1 (en) * 2017-08-16 2019-02-21 Td Ameritrade Ip Company, Inc. Real-Time Lip Synchronization Animation
CN108447474A (en) * 2018-03-12 2018-08-24 北京灵伴未来科技有限公司 A kind of modeling and the control method of virtual portrait voice and Hp-synchronization
CN110070594A (en) * 2019-04-25 2019-07-30 深圳市金毛创意科技产品有限公司 The three-dimensional animation manufacturing method that real-time rendering exports when a kind of deduction
CN110782511A (en) * 2019-09-09 2020-02-11 天脉聚源(杭州)传媒科技有限公司 Method, system, apparatus and storage medium for dynamically changing avatar
CN110782514A (en) * 2019-09-09 2020-02-11 天脉聚源(杭州)传媒科技有限公司 Mouth shape switching rendering system and method based on unreal engine
CN110796718A (en) * 2019-09-09 2020-02-14 天脉聚源(杭州)传媒科技有限公司 Mouth-type switching rendering method, system, device and storage medium
CN111161755A (en) * 2019-12-25 2020-05-15 新华智云科技有限公司 Chinese lip sound synchronization method based on 3D rendering engine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
尹宝才; 张思光; 王立春; 唐恒亮: "Three-dimensional mouth shape animation based on prosodic text" (基于韵律文本的三维口型动画), Journal of Beijing University of Technology (北京工业大学学报), no. 12, 15 December 2009 (2009-12-15) *
常姣姣; 张国龙: "Exploration of animation creation based on a three-dimensional graphics engine" (基于三维图形引擎的动画创作探究), China Media Technology (中国传媒科技), no. 04, 15 April 2019 (2019-04-15) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160360A (en) * 2021-05-07 2021-07-23 深圳市灼华互娱科技有限公司 Animation data production method, device, equipment and storage medium
CN116912376A (en) * 2023-09-14 2023-10-20 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for generating mouth-shape cartoon
CN116912376B (en) * 2023-09-14 2023-12-22 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for generating mouth-shape cartoon

Similar Documents

Publication Publication Date Title
JP7286684B2 (en) Face-based special effects generation method, apparatus and electronics
US6772122B2 (en) Character animation
US20210192824A1 (en) Automatically generating motions of an avatar
US20200126283A1 (en) Method and System for Implementing Three-Dimensional Facial Modeling and Visual Speech Synthesis
CN112162628A (en) Multi-mode interaction method, device and system based on virtual role, storage medium and terminal
US20220263934A1 (en) Call control method and related product
CN111145777A (en) Virtual image display method and device, electronic equipment and storage medium
CN112135160A (en) Virtual object control method and device in live broadcast, storage medium and electronic equipment
JP6711044B2 (en) Image processing device, display device, animation generation method, and program
CN113538641A (en) Animation generation method and device, storage medium and electronic equipment
CN113077537A (en) Video generation method, storage medium and equipment
CN112348932A (en) Mouth shape animation recording method and device, electronic equipment and storage medium
JP2023552854A (en) Human-computer interaction methods, devices, systems, electronic devices, computer-readable media and programs
JP2019015951A (en) Wake up method for electronic device, apparatus, device and computer readable storage medium
CN112750187A (en) Animation generation method, device and equipment and computer readable storage medium
CN113536007A (en) Virtual image generation method, device, equipment and storage medium
CN116309984A (en) Mouth shape animation generation method and system based on text driving
CN109460548B (en) Intelligent robot-oriented story data processing method and system
WO2024060873A1 (en) Dynamic image generation method and device
CN114255737B (en) Voice generation method and device and electronic equipment
CN115690277A (en) Video generation method, system, device, electronic equipment and computer storage medium
CN114424148B (en) Electronic device and method for providing manual thereof
CN117157650A (en) Method and system for improved efficiency of federal learning of machine learning models
Kolivand et al. Realistic lip syncing for virtual character using common viseme set
KR102265102B1 (en) Editing method for subtitle with kinetic typography and electronic apparatus thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination