WO2018049979A1

WO2018049979A1 - Animation synthesis method and device

Info

Publication number: WO2018049979A1
Application number: PCT/CN2017/099462
Authority: WO
Inventors: 吴松城; 方小致; 刘守达; 林明安; 陈军宏
Original assignee: 厦门幻世网络科技有限公司
Priority date: 2016-09-14
Filing date: 2017-08-29
Publication date: 2018-03-22
Also published as: CN106504304A; CN106504304B

Abstract

Disclosed are an animation synthesis method and device. According to the method, a terminal can receive text information inputted by a user and identify text keywords from the text information; then, the terminal can determine animations corresponding to the text keywords from a preset animation library, and synthesize the animations according to the sequence of the keywords in the text information, to obtain a fused animation. Compared with text information, an animation can more fully and vividly express the meaning of information. Therefore, compared with the approach in the prior art of only presenting information in the form of a text or voice, an animation converted from text information can more fully and vividly express the real meaning of the information, brining pleasure and convenience to a user during an information reading process.

Description

Method and device for synthesizing animation

The present application claims priority to Chinese Patent Application No. 20161082331, filed on No. in.

Technical field

The present application relates to the field of computer technology, and in particular, to a method and an apparatus for animation synthesis.

Background technique

With the continuous development of network technology and communication technology, wireless access (WIreless-Fidelity, WIFI), 3G, 4G and other Internet access methods have been popularized. Now, people can access the Internet through WIFI, 4G and other Internet access methods anytime, anywhere. Release information and enjoy the convenience brought by the information age.

At present, the user groups of instant messaging (IM) software or social software such as Weibo are increasing. On the one hand, due to their increasingly powerful functions, on the other hand, these software can continuously expand the social relationship of users. And to a certain extent, the information sharing is realized, thereby further realizing the information browsing needs of users in the information age.

When people use IM software, Weibo and other social software to publish information, the information is usually presented in the following two ways: First, the user enters the corresponding text information in the interface of the social software and publishes it. In this way, the information published by the user is presented in the form of text; secondly, the user issues his own voice as information through the voice transmission function in the social software (especially the IM software). Although these two forms of information release can effectively guarantee the normal presentation of information, however, both text information and voice information are too singular in the form of information expression, and text information or voice information is often insufficient. Expressing the full meaning of the information, this gives users the inconvenience of browsing this information.

Summary of the invention

The embodiment of the present invention provides a method for synthesizing an animation, which is used to solve the problem that the text information or the voice information in the prior art cannot fully express the meaning and cause inconvenience to the user in browsing the information.

The embodiment of the present application provides a method for animation synthesis, including:

Receiving input text information;

Identifying each text keyword in the text information;

Determining an animation corresponding to each text keyword from a preset animation library;

The determined animations are combined to obtain a fused animation.

An embodiment of the present application provides an apparatus for animation synthesis, including:

a receiving module, configured to receive input text information;

An identification module, configured to identify each text keyword in the text information;

a determining module, configured to respectively determine an animation corresponding to each text keyword from a preset animation library;

A synthesis module for synthesizing the determined animations to obtain a fusion animation.

The embodiment of the present application provides a method and an apparatus for synthesizing an animation. In this method, a terminal can receive text information input by a user, and identify each text keyword from the text information, and then the terminal can obtain a preset animation library. The animations corresponding to the respective text keywords are respectively determined, and each animation is synthesized according to the arrangement order of the keywords in the text information to obtain a fusion animation. Since the animation can express the meaning of the information more fully and vividly with respect to the text information, the text information is converted by the way of presenting the information in the form of text or voice. The obtained animation can more fully and vividly express the meaning of the information itself, thereby bringing the user the fun and convenience in the process of reading the information.

DRAWINGS

The drawings described herein are intended to provide a further understanding of the present application, and are intended to be a part of this application. In the drawing:

1 is a process of animation synthesis provided by an embodiment of the present application;

2 is a schematic diagram showing display of utterance information in a fusion animation according to an embodiment of the present application;

3 is a schematic diagram of a mouth animation provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of an apparatus for animation synthesis according to an embodiment of the present application.

detailed description

The technical solutions of the present application will be clearly and completely described in the following with reference to the specific embodiments of the present application and the corresponding drawings. It is apparent that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without departing from the inventive scope are the scope of the present application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

FIG. 1 is a process of animation synthesis provided by an embodiment of the present application, which specifically includes the following steps:

S101: Receive input text information.

In practical applications, users usually post some text information on social software such as Weibo, or send text chat information to other users through IM software. Because text information is too singular in expression, and can express The meaning of the information is limited. Therefore, in the embodiment of the present application, the terminal can convert the text information input by the user into an animation, so that the meaning of the information itself is more fully and vividly expressed by the animation. To this end, the terminal may first receive text information input by the user, wherein the terminal mentioned here may be a smart device such as a smart phone or a tablet computer, and of course, the user may also be in the terminal. Enter text information in the client.

It should be noted that, in the embodiment of the present application, the work of converting the text information into the corresponding animation may also be performed by an application such as a client or an application in the terminal, and the animation provided by the embodiment of the present application is illustrated for convenience and convenience. The method of synthesis is followed by a detailed description of the terminal.

S102: Identify each text keyword in the text information.

Since the text information usually contains multiple phrases, the animation corresponding to each phrase in the actual is also different. For example, if the text information is "Xiao Ming was raining when playing yesterday," from this text message. It can be seen that the animation that the text information may involve has an animation of raining and an animation of Xiaoming kicking the ball. Therefore, the animation that should be expressed by this text information should be the result of the synthesis of the two animations. Based on this, the terminal should identify each text keyword from the text information before converting the received text information into an animation, the purpose of which is to identify the text keyword to determine the text information may be involved in. The animation, and then in the subsequent process, the determined animations are combined to obtain a fusion animation corresponding to the text information.

Specifically, after receiving the text information input by the user, the terminal may segment the text information to obtain a plurality of phrases, and then pass the reverse text probability IDF value corresponding to each phrase saved in advance, and the word frequency TF of each phrase. The text keywords included in the text information are determined from each phrase. The specific implementation manner may be: inputting each phrase into a preset TF-IDF model, and the preset TF-IDF model may be for each The phrase determines the inverse text probability IDF value corresponding to the phrase and the word frequency TF, and obtains the important representation value of the phrase by calculating the product of the two, and then the preset TF-IDF model can respectively correspond to the calculated phrases. The important characterization values are output, and the terminal can sort the phrases according to the size of the important characterization values, and use the first few phrases as the text keywords of the text information.

In addition, the text keyword of the text information may also be determined from each phrase through a pre-trained recognition model, wherein the pre-trained recognition model may be a Hidden Markov Model (HMM) or the like. Machine learning model. The manner in which the text keywords are determined by the pre-trained recognition model is prior art, and therefore, no overstatement is made here.

S103: Determine an animation corresponding to each text keyword from a preset animation library.

The embodiment of the present application is intended to convert the text information input by the user into a corresponding animation. Therefore, after determining the text keywords included in the text information, the terminal may determine each text key from the preset animation library. Each animation corresponding to the word, and then in the subsequent process, the determined animations are combined to obtain an animation corresponding to the text information.

Specifically, after determining the text keywords included in the text information, the terminal may separately determine, for each text keyword, each animation keyword corresponding to each animation in the preset animation library and each of the text keywords. Similarity, wherein each animation keyword corresponding to each animation in the preset animation library can be calibrated in advance by an artificial method. For example, if the content displayed in an animation is played by a person, the manual manner can be manually The animation keyword corresponding to the animation is categorized as sports, and the animation and the animation keyword sports are correspondingly stored in a preset animation library.

In addition, in the embodiment of the present application, each animation keyword corresponding to each animation in the preset animation library may also be calibrated by a pre-trained first classification model. Specifically, the terminal may first convert each pre-saved animation into a corresponding feature vector, wherein converting the animation into a corresponding feature vector may be performed in the following manner: in actual application, the duration and severity of each animation are not In the same way, in each animation, the animation frames with the largest amount of change between animation frames are often the most distinguishable from other animations. Therefore, in the embodiment of the present application, the terminal converts each animation into a corresponding one. For the eigenvector, the amount of change T between each animation frame in the animation can be determined separately for each animation, and the change is selected. The z animation frames with the largest amount T are used as the animation frames representing the animation, and then the terminal can determine the sub-feature vectors corresponding to each animation frame for the selected z animation frames, wherein, for the three-dimensional animation The terminal can determine the sub-feature vector l corresponding to the animation frame according to the animated bone space coordinates in the animation frame, the bone acceleration between the frames, and the like, and further determine the sub-feature vectors of the z animation frames according to the respectively determined , convert the animation to the corresponding feature vector.

It should be noted that the feature vector conversion method described above is not unique, and each animation may be converted into a corresponding feature vector by other means, for example, for each animation, respectively, corresponding to each animation frame in the animation. The sub-feature vector, and then the terminal converts the animation into a corresponding feature vector according to each sub-feature vector corresponding to all the animation frames in the animation, and of course, other methods may be used. It is.

After the terminal respectively converts each animation into a corresponding feature vector, each feature vector may be separately input into a pre-trained first classification model, wherein for each feature vector, the first classification model implements the feature vector. After the calculation, several values can be obtained, wherein each value corresponds to a keyword, and when the terminal finds that a certain value is greater than other values among the values, the keyword corresponding to the value can be used as the keyword. The animation keyword of the animation, and the animation is associated with the animation keyword and saved in the preset animation library.

In the embodiment of the present application, the classification model described above may be a training model such as a neural network model, a hidden Markov model HMM, or a Support Vector Machine (SVM). In the training process of the classification model, a large number of sample animations can be collected first, and each sample animation is converted into a vector, a parameter, and the like, respectively, and input into the classification model, and then the classification model is trained.

It should be noted that in practical applications, each animation usually corresponds to multiple keywords. For example, if an animation shows that a person is playing football happily, the animation key corresponding to the animation The word can be sports, it can be playing football, or keywords such as happy and cheerful, so when the terminal determines the animation corresponding to a keyword, it may determine a plurality of animations and the key from the preset animation library. The word correspondingly, therefore, in order to be able to further accurately determine the animation corresponding to the keyword, in the embodiment of the present application, the terminal may further determine the feature information corresponding to the text information from the received text information, and according to The feature information and each keyword determine each animation corresponding to each keyword from a preset animation library.

Specifically, after the terminal determines the keywords included in the text information, the terminal may further extract the feature information in the text information, and the specific extraction manner may be: the terminal uses the preset feature analysis model to the text. The information is analyzed, and the feature information in the text information is extracted. For example, suppose a piece of text message is "We will play football happily tomorrow!", the terminal can convert this paragraph into a corresponding sequence of word vectors (since this passage is composed of multiple words, so this paragraph will be After each word in the word is converted into a word vector, the word vector is sorted according to the position of each word in the paragraph, and a sequence of word vectors capable of representing the phrase can be obtained, and the word vector sequence is input to the pre-predicate. The feature analysis model is set, and then the result of the feature analysis model is used to determine that the emotion expressed from the entire context of the passage should be a happy and happy emotion. Therefore, the terminal extracts the feature information from the passage. It should be happy or happy. Of course, the software developer can also pre-establish an emotional vocabulary library, and input the emotional vocabulary library into the terminal for storage. Correspondingly, after the terminal subsequently receives the text information sent by the user, the text information can be Each word is compared with each emotional word in the emotional vocabulary library to determine the emotional information corresponding to the text information.

Then, for the text keyword "playing football" in this paragraph, after the terminal recognizes the text keyword from the passage, the terminal can further "play soccer" and feature information according to the text keyword. "Happy", from the preset animation library, filter out the corresponding text keywords and feature information Animation. Since the text keyword "playing soccer" may correspond to a plurality of animations in the preset animation library, the terminal may further perform a plurality of animations corresponding to the text keyword "playing soccer" through the feature information "happy". The screening, and then the animation corresponding to the text keyword "playing football" and the feature information "happy" are determined.

The feature information described above may be emotional information such as "happy", "happy", "sad", and in order to enable the terminal to filter the corresponding animation from the preset animation library through the emotional information, it is necessary to calibrate each The emotional keyword corresponding to the animation further enables the terminal to determine the animation corresponding to the emotional information by matching the emotional information with the emotional keyword. Therefore, in the embodiment of the present application, the emotion information of each animation can be calibrated in advance by an artificial method. For example, if an animation shows that the content is a person sitting in a chair and crying, it can be manually The emotion information corresponding to the animation is determined to be "sadness." In addition, the emotional keyword corresponding to each animation may also be determined by the pre-trained second classification model. The specific method may be: after each animation is converted into a corresponding feature vector, each feature vector may be obtained. Inputting into the pre-trained second classification model respectively, and then determining the emotional keywords corresponding to the animations according to the output of the second classification model, and then matching the animations with the emotion information, wherein the The training method of the two-category model can be the same as the above-mentioned training of the first classification model, and will not be described in detail here.

It should be noted that the above-mentioned characteristic information is not limited to emotional information such as "happy" or "sad", but may also be weather information such as "cloudy", "sunny", "high wind", "raining" and the like. Or, such as "strong", "wilting", "safe" and other information, of course, other information, here is not an example. Correspondingly, each feature keyword corresponding to each feature information should also be stored in a preset animation library corresponding to each animation, and when determining each feature keyword corresponding to each animation, the same can be pre-trained. Classification model to determine, specific determination process and It is determined that the animation keywords corresponding to the respective animations are the same, and will not be described in detail here. The classification model mentioned here may also be a model such as a neural network model, a hidden Markov model HMM, a support vector machine SVM, or the like.

In an actual application, an animation may correspond to a plurality of feature keywords. Therefore, in order to further accurately determine an animation corresponding to the text keyword, in the embodiment of the present application, the terminal may also extract text information from different angles. The plurality of feature information may further filter a plurality of animations corresponding to the text keyword according to the extracted plurality of feature information, thereby more accurately displaying the animation corresponding to the text keyword with respect to the entire text information.

S104: Synthesize each determined animation to obtain a fusion animation.

After the terminal determines each animation related to the text information by using each text keyword, each animation may be combined to obtain a fusion animation capable of representing the text information, wherein the terminal may synthesize each animation by using Each animation is synthesized in the order in which the text keywords are arranged in the text information.

For example, suppose that in a text message of “Today's Clear Sky, I am going to fish”, the terminal can identify “clear sky”, “I”, “fishing” from the text information through a pre-trained recognition model. a text keyword, and then the terminal determines three animations H, X, and C corresponding to the three text keywords "clear sky", "me", and "fishing" from the preset animation library, and then according to The three text keywords are arranged in the text message of "Today's Clear Sky, I want to go fishing". The three animations H, X, and C are arranged to obtain the animation sequence to be synthesized as H, X, C, then, the terminal can synthesize the three animations according to the animation sequence H, X, and C to be fused, and finally obtain a fused animation representing the text information.

For the animation of the two animations, in the actual application, the two animations may be different. If you combine two different animations directly, the synthesized animation will look like a clear jump. Therefore, in order to make the synthesized animation look more natural, in the embodiment of the present application, a piece of animation for transition can be inserted in any two adjacent animations, and the animation segment and the two The adjacent animations are combined to obtain a fused animation.

Specifically, for two arbitrary adjacent animations, the transition animation segments to be inserted between the two animations are determined by the two animations, wherein the terminal can determine the transition animation by interpolation. Fragment.

For example, animation A and animation B are two adjacent animations, where animation A is the previous animation, and animation B is the latter animation, and animation A and animation B have significant differences, so in order to synthesize the two In the process of animation, these differences are eliminated. The terminal can analyze the motion of the characters in the animation A and the animation B, and determine the transition animations of the animations a1 and b1 to be inserted into the animation A and the animation B by interpolation. The characters from the two transitional animation segments a1 and b1 are in the order of a1 and b1, and the characters in the animation A are successively transitioned to the animation B, so that the animation A will be present due to the existence of the transitional animation segment. The animations obtained by synthesizing the transitional animation segments a1, b1 and animation B in order will be a coherent animation, and there will be no jumping feeling caused by the difference between the animations A and B.

In addition to the above-described synthetic manner, in the embodiment of the present application, the terminal may also add a certain effect between two adjacent animations to eliminate the difference between the two adjacent animations. Specifically, in general, animations are composed of animation frames, and each animation frame is arranged in a certain order and quickly projected to obtain a corresponding animation. For two animations with differences, if there are differences in the animation frames used for the two animations, then the two animations will often be the two animations that have differences. In other words, for the two animations, two The difference between animations is often used by these two animations The animation frame is determined, wherein, for the two animations, when the two animations are played in order, the last animation frame of the previous animation and the first animation frame of the latter animation can be used as the two Animated frames used for animation. Therefore, for two different animations, the way to eliminate or reduce the difference between the two animations may be to perform certain processing on the animation frames used for the two animations. The specific processing method may be: When the terminal determines each animation to be merged and arranges each animation according to the arrangement order of each text keyword in the text information, the terminal may specify the first specified animation of the previous animation for any two adjacent animations. The frame is set as the first effect, and each second specified animation frame of the latter animation is set as the second effect, wherein, if the last animation frame of the previous animation and the first few animation frames of the latter animation There is a clear difference between the previous animation and the latter animation. Therefore, in order to make the synthesized animation not have obvious jumping feeling, the terminal should try to eliminate or reduce the last animation of the previous animation. The difference between the frame and the first few animation frames of the latter animation is to ensure the integrity of the animation after the connection. When the terminal selects each of the first specified animation frames in the previous animation, the previous one can be selected as much as possible. Painting few animation frame specified as each of the first animation frame, and in the respective second selected frame specified animation, the animation may try to select the first few frames after a respective second specified as the animation frame animation. After the first and second specified animation frames are selected, the terminal may set the first specified animation frame to effect such as fade out, box-shaped contraction, etc., and the terminal may set each second specified animation according to the effect of the first specified animation frame. The effect of the frame is set to be opposite to the first specified animation frame. For example, when the terminal sets the effect of each first specified animation frame to fade out, the second specified animation frame of the subsequent animation may be correspondingly The effect is set to fade in effect.

After the terminal sets the effects for each of the first specified animation frames of the previous animation and the second designated animation frames of the subsequent animation, the two animations can be combined. In this way, when the synthesized animation is played to each of the first designated animation frames and each of the second specified animation frames, the terminal And the effect respectively set by each of the second specified animation frames will eliminate or reduce the difference between the animation frames, so that the synthesized animation does not have a significant jump feeling during the playing process.

In practical applications, there may be certain similarities between animation frames of different animations. Therefore, based on this, in the embodiment of the present application, for any two adjacent animations, the terminal may also be in the two animations. The animation frames similar to each other are respectively determined, and the animation frames similar to each other are synthesized into an animation frame in a certain manner, and then the two animations are synthesized according to the synthesized animation frames.

Specifically, for any two adjacent animations, the terminal may respectively determine the similarity between each animation frame of the previous animation and each animation frame of the latter animation, and respectively according to the determined similarities, respectively from the previous animation. Selecting the first animation frame and selecting the second animation frame from the latter animation, and merging the first animation frame and the second animation frame to obtain a fused frame, wherein the first animation frame and the first animation frame are selected The second animation frame has the highest similarity in the previous animation and the latter animation. Then, the terminal may further synthesize each animation frame, the fused frame, and each animation frame located after the second animation frame in the previous animation in the previous animation to obtain the fused animation.

For example, suppose that in the two adjacent animations C and D, the animation C includes a total of 5 animation frames from #1 to #5, and the animation D includes a total of 7 animation frames from *1 to *7, and the terminal is determined. The similarity between each animation frame in animation C and each animation frame in animation D is found. The #3 animation frame in animation C has the highest similarity to the *2 animation frame in animation D. Therefore, the terminal can be in the animation C. The #3 animation frame is fused with the *2 animation frame in the animation D to obtain the corresponding fused frame. When the terminal synthesizes the animation C and the animation D, the animation frames #1, #2 located before the #3 animation frame in the animation C, and the animation frames *3 to *7 located after the animation frame *2 in the animation D. Selecting and synthesizing the selected animation frames with the obtained fused frames. The specific fusion method may be that the animation frames #1, #2, the fused frame, and the animation frames *3 to *7 are combined into one in order. Animation, while animation frames #4, #5 and animation D in animation C The animation frame *1 can be removed accordingly.

When determining the similarity between each animation frame, the terminal can determine by calculating the Euclidean distance between each animation frame, wherein, for ordinary two-dimensional animation, the terminal can pass the three primary colors of the image (red, green, Blue) to construct the feature parameters of the picture, and determine the similarity between each animation frame by calculating the Euclidean distance between each feature parameter. Generally, the smaller the Euclidean distance value is between the two animation frames. The similarity is greater.

For 3D animation, the feature parameters corresponding to each animation frame cannot be simply constructed by the three primary colors of the image. Therefore, for 3D animation, the feature parameters corresponding to each animation frame in the 3D animation. It can be represented by the parameters of each animation frame in the skeletal animation. Specifically, in the embodiment of the present application, when determining the similarity between each animation frame in the previous animation and each animation frame in the subsequent animation, the terminal may separately determine each animation frame in the skeletal animation. The rotational angular velocity vector of the bone, the bone weight of each bone, the rotation vector of each bone, and the intensity factor of the animation, and then the terminal can adopt the formula

Determine the Euclidean distance between each animation frame in the previous animation and each animation frame in the latter animation, and then each animation frame and the next animation of the previous animation according to the determined Euclidean distance. The similarity of the animation frame is determined, wherein D(i, j) is the Euclidean distance of the i-th animation frame of the previous animation and the j-th animation frame of the latter animation, and the smaller the Euclidean distance, the previous animation The similarity between the i-th animation frame and the j-th animation frame of the latter animation is greater.

The rotation angular velocity vector of the nth bone of the ith animation frame of the previous animation,

The rotation angular velocity vector of the nth bone of the jth animation frame of the latter animation, wherein the criteria used for the skeletal animation in the actual application are the same, in other words, for two different skeletal animations, The bones of the hand or foot are usually the same, so the nth bone of the i-th animation frame mentioned here and the nth bone of the j-th animation frame represent the same part of the bone. In other words, the bone number of each animation frame in the previous animation is the same as the bone number of each animation frame in the latter animation.

The w _n in the above formula represents the bone weight of the nth bone, in the formula

Represents the rotation vector of the nth bone of the ith animation frame of the previous animation.

The rotation vector of the nth bone of the jth animation frame of the latter animation; and u in the formula is represented as the preset animation intensity coefficient. As can be seen from the above formula, for an animation frame in a three-dimensional animation, when calculating the Euclidean distance between two animation frames, the terminal starts from two aspects of the bone rotation vector and the bone rotation angular velocity vector. Each bone is compared in turn, and the calculated Euclidean distance is relatively accurate. Of course, the above formula is not unique, and other bone parameters can be introduced to further accurately determine the Euclidean distance between the animation frames, and then determine the animation frames by determining the Euclidean distance between the animation frames. The similarity between the two.

Of course, when determining the similarity between each animation frame of the previous animation and each animation frame of the latter animation, it can also be determined by means such as dot product, that is, after calculating the dot product of the two animation frames. The similarity of the two animation frames is determined by the dot product, and the specific process will not be described in detail.

The above description may result in the loss of multiple animation frames by determining the two animation frames with the highest similarity in the previous animation and the latter animation. For example, continue to use the above example, assuming #2 in the animation C When the *5 similarity in the animation D is the highest, the terminal will discard the animation frames #3 to #5 in the animation C and the animation frames in the animation D *1 to * in the process of synthesizing the animation C and the animation D. 4, that is to say, the terminal will lose 7 animation frames, and the animation C and animation D have a total of 12 frames. In this way, due to the excessive number of dropped frames, the final synthesized animation of the terminal will be affected by the effect. Impact.

In order to reduce the impact of frame dropping on animation synthesis as much as possible, in the embodiment of the present application, the terminal is When the similarity between each animation frame of the previous animation and each animation frame of the latter animation is determined, each third specified animation frame may be extracted in the previous animation and each fourth specified animation frame may be extracted from the latter animation. The third specified animation frame mentioned here refers to a continuous part of the animation frame in the previous animation. In order to reduce the disadvantages caused by the frame loss as much as possible, the latter animations in the previous animation may be selected. The frame is used as each of the third specified animation frames; similarly, each of the fourth designated animation frames mentioned herein refers to a continuous part of the animation frame in the latter animation, and the terminal can select the first several animation frames in the latter animation as the first Four specified animation frames, and then the terminal can further determine the similarity between each of the third specified animation frames and each of the fourth specified animation frames, and select two animation frames with the highest similarity according to the similarity for fusion. And then synthesize the animation through the fused frame.

For example, continue to use the above example, when the terminal determines the similarity between each animation frame in the animation C and each animation frame in the animation D, the animation frame of #3 to #5 in the animation C and the animation D in the animation D can be taken. ~*3 animation frame, and determining the similarity between the animation frames #3 to #5 and the animation frames *1 to *3, and then the terminal can select the two similarities from the determined similarities The animation frames are fused, and the animation C and the animation D are combined according to the obtained fused frame.

As can be seen from the above-mentioned synthesis method, since the terminal determines the similarity between each animation frame in the previous animation, only the similarity between a part of the animation frame in the previous animation and a part of the animation frame in the latter animation is determined. When the terminal synthesizes the animation according to the similarity between the animation frames, the number of dropped frames can be effectively controlled within a certain range, thereby reducing the adverse effect of frame dropping on animation synthesis to a certain extent.

Although the above-mentioned synthesis method can reduce the disadvantages caused by frame dropping to a certain extent, since the similarities determined by the terminal are only the similarities between the animation frames of the previous animation and the latter animation, Among the similarities of some animation frames, even the two animation frames with the highest similarity The actual difference may also be relatively large, which in turn leads to an animation that is synthesized based on the two animation frames.

Therefore, in order to further ensure the effect of the synthetic animation, in the embodiment of the present application, the terminal may proceed from two aspects of frame loss rate and similarity to determine two animation frames to be merged, wherein the frame loss mentioned here The rate refers to the ratio of the number of frames that have not been fused and not synthesized to the total number of frames of the animation in an animation. For example, suppose there are a total of 12 animation frames in two animations, and the terminal performs the two animations. When synthesizing, there are 4 animation frames that are discarded by the terminal during the synthesis process. That is, the 4 animation frames are not involved in the fusion process, nor participate in the synthesis process of the two animations. The frame loss rate for two animations is 1/3.

When determining the two animation frames to be merged, the terminal may first determine the similarity between each animation frame in the previous animation and each animation frame in the subsequent animation, and for each similarity The animation frame determines the frame loss rate corresponding to the synthesized animation when the two animation frames are used as the fused frame to synthesize the animation. When determining the similarity degree and the frame loss rate corresponding to each similarity, the terminal may determine the first animation frame from the previous animation, and determine the second animation frame from the latter animation, where the first animation frame and The second animation frame satisfies the formula

In this formula, x _IJ is the smallest x _ij that makes a*x _ij +b*y _ij , which is the Euclidean distance between the first animation frame and the second animation frame, and x _ij is the i-th of the previous animation. The Euclidean distance of the frame animation frame and the j-frame animation frame of the latter animation, the value range of i is 1 to the total number of frames of the previous animation, and the value range of j is 1 to the total number of frames of the latter animation; y _IJ is the minimum y _{ij of} a*x _ij +b*y _ij , that is, the integrated frame loss rate determined according to the first animation frame and/or according to the second animation frame, correspondingly, y _ij is according to the i The frame animation frame and/or the integrated frame loss rate determined according to the j-th frame animation frame, a and b are corresponding coefficients, and the coefficient can be determined manually, and only needs to be guaranteed to be not less than 0.

The above description of y _{ij does} not refer to the actual frame loss rate of the previous animation and the latter animation in the actual synthesis process, but a value that can represent the actual frame loss rate. Although this value cannot truly represent the animation synthesis process. The true frame loss rate, however, is positively related to the frame loss rate during animation synthesis. Therefore, when the value of y _ij is small, the previous animation and the latter animation are combined according to y _ij . The frame loss rate will also be relatively small.

For the way y _ij is determined, the terminal passes the formula

When the first animation frame and the second animation frame are determined, an expected frame loss rate of the previous animation may be determined according to the ith frame animation frame for the ith frame animation frame in the previous animation, and the determined The expected frame loss rate of the previous animation is taken as the integrated frame loss rate y _ij , or the terminal can be used for the j-th frame animation frame in the latter animation, and an expectation of the latter animation is determined according to the j-frame animation frame. The frame rate is lost, and the expected frame loss rate of the latter animation is determined as the integrated frame loss rate y _ij , wherein the expected frame loss rate of the previous animation mentioned here may be: the terminal according to the ith frame animation frame , determining the ratio of the number of animation frame frames that do not participate in the fusion and does not participate in the composition of the previous animation according to the animation frame of the i-th frame, and the previous animation is combined with the previous animation. In the process of compositing, the ratio of the animation frame discarded by the previous animation to the total number of frames of the previous animation; similarly, the expected frame loss rate of the latter animation may be: the terminal is animated according to the jth frame Frame, determining that the next animation is based on the jth frame Draw a frame, when compared with the previous animation, the ratio of the number of frames of the animation frame that does not participate in the fusion and does not participate in the composition and the total number of frames of the latter animation in the latter animation, that is, the previous animation and the latter animation are synthesized. In the process, the ratio of the animation frame discarded by the latter animation to the total number of frames of the latter animation.

The y _ij described above is expressed as the integrated frame loss rate determined by the terminal according to the i-th frame animation frame or the integrated frame loss rate determined by the terminal according to the j-th frame animation frame in the process of synthesizing two adjacent animations. In this way, because the terminal passes the formula

The determined first animation frame and the second animation frame are determined based on the frame loss rate and the similarity. Therefore, the animation synthesized by the terminal in the above manner can reduce the disadvantage of frame dropping to some extent. influences.

However, for two adjacent animations to be synthesized, if only the frame loss rate of one animation at the time of composition is considered, the frame loss rate may not be able to represent the overall animation when the two animations are synthesized. The frame loss rate, for example, assumes that for two adjacent animations, when the terminal synthesizes the two animations, the animation frames to be merged are respectively selected from the two animations, resulting in an animation corresponding to the lost The frame rate may be relatively low, and the frame loss rate of another animation may be very high. If the terminal only considers the two animations through the two fused animation frames, the frame loss rate of one of the animations may be lower. Regardless of the fact that this will cause the frame loss rate of another animation to be higher, after the terminal synthesizes the two animations in this way, the overall frame loss rate of the two animations may also be relatively high, which ultimately affects The effect of the fusion animation.

In order to avoid the above problem, in the embodiment of the present application, the manner in which the terminal determines y _ij may be based on the integrated frame loss rate of the adjacent two animation synthesis processes determined according to the i-th frame animation frame and the j-th frame animation frame, that is, The determination method of the y _ij considers the frame loss situation of the two animation synthesis processes, and the specific determination manner may be: the terminal is passing the formula

When the first animation frame and the second animation frame are determined, the i-th frame animation frame may be selected from the previous animation and the j-th frame animation frame may be selected from the latter animation, and then the terminal may determine the European state of the two animation frames. a distance x _ij , and an expected frame loss rate y _ij determined according to the ith frame animation frame and the j-frame animation frame, wherein y _ij herein may be that the terminal determines the previous animation according to the ith frame animation frame The sum of the expected frame loss rate and the expected frame loss rate of the latter animation determined according to the j-frame animation frame, the terminal according to the formula

Determine that a certain pair of animation frames are satisfied

Then, the first animation frame and the second animation frame can be determined by the pair of animation frames, and correspondingly, x _ij and y _ij corresponding to the pair of animation frames become x _IJ and y _IJ .

For example, suppose that for two adjacent animations of animation G and animation H, in which there are a total of 6 animation frames in the animation G, and there are a total of 4 animation frames in the animation H, the terminal is determining the first animation frame and The second animation frame, when passing the formula

It is found that the fourth frame animation frame of the animation G and the second frame animation frame of the animation H are merged, and the animation G and the animation H, a*x ₄₂ +b*y ₄₂ are synthesized in all combinations. The smallest one, wherein when determining the value of y ₄₂ , the terminal can determine that when the animation G is combined with the animation H according to the fourth frame animation frame, the fifth and sixth frame animation frames included in the animation G are discarded. Therefore, the terminal determines that the expected frame loss rate of the animation G is 1/3 according to the fourth frame animation frame of the animation G. Similarly, the terminal may further determine that when the animation H is combined with the animation G according to the second frame animation frame. , the first frame animation frame contained in the animation H will be discarded, so the terminal can determine the expected frame loss rate of the animation H according to the second frame animation frame of the animation H, and then the two expected frame loss rates. The sum value 7/12 is taken as the value of y ₄₂ . For the value of x ₄₂ , the terminal can pass the above formula

Determining the Euclidean distance G Animation Animation Frame 4 H animation frame in the second frame of the animation frame, and the determined value as the Euclidean distance to x ₄₂ in.

It should be noted that the above description of y _ij can be determined by using the sum of the expected frame loss rate of the previous animation in the two adjacent animations and the expected frame loss rate of the latter animation as y _ij . The average of the two expected frame loss rates is taken as the y _ij , and the weighted sum of the expected frame rate can be assigned as the y _ij , or The sum of the two expected frame loss rates is opened, and the value obtained by rooting is taken as the y _ij . Of course, the y _ij can also be the actual frame loss rate of the previous animation and the latter animation. In short, the meaning of y _ij is to be able to characterize the frame loss rate of two adjacent animations at the time of composition, that is, the y _ij should be positively correlated with the frame loss rate of the adjacent two animations, so no matter y _ij is determined what manner, the terminal can be determined with the y _ij frame loss rate after two animations can be positively related to the synthesis of adjacent, as for the determination is not the only way.

Since the Euclidean distance is inversely related to the similarity, the formula is passed.

The first animation frame to be merged and the second animation frame can ensure that the frame loss rate of the animation synthesized by the two animation frames is as low as possible, and the two animations can be guaranteed to a certain extent. Frames can be as similar as possible, further reducing the impact of dropped frames on animation synthesis. By changing the values of a and b, the first animation frame and the second animation frame ideal for the user can be obtained. For example, when a=1, b=0, that is, the above consideration only considers the correlation between two animations without considering the frame loss rate.

Thus, two animation frames with the smallest Euclidean distance (ie, the highest degree of similarity) are obtained, and when a=0, b=1, the above is only considering the frame loss rate between the two animations without considering the correlation. At this time

That is, two animation frames with the lowest frame loss rate are obtained.

It should be noted that the above formula is adopted.

The determined animation frame to be merged may be a plurality of pairs of animation frames, so when encountering such a situation, the terminal may further determine a pair of animation frames with the highest similarity from the plurality of pairs of animation frames for fusion. Or choose a pair of animation frames with the lowest frame loss rate from the multiple pairs of animation frames for fusion. Specifically, the terminal may determine a third animation frame in each first animation frame and a fourth animation frame in each second animation frame, wherein a similarity between the third animation frame and the fourth animation frame is the highest, or After the animation is synthesized according to the third animation frame and the fourth animation frame, the corresponding frame loss rate is the lowest. Due to the formula

The determined animation frames to be merged are used to reduce the adverse effects of the dropped frames as much as possible. Therefore, in the frames to be fused, whether the animations are synthesized with the highest similarity (ie, the European distance is the smallest), The lowest frame rate is used to synthesize the animation, and the resulting synthesized animation is as effective as possible to reduce the adverse effects of frame dropping.

In the embodiment of the present application, the terminal may also determine the number of the animation frames in the above-mentioned manner, in order to further reduce the adverse effects caused by the frame loss. The animation frames between the two animation frames to be fused are fused to each other in a certain way, so that no frame dropping occurs in the final synthesized animation.

Specifically, when the terminal passes the formula

After determining the first animation frame and the second animation frame, the terminal may select the first animation frame from the previous animation and locate the animation during the animation synthesis according to the first animation frame and the second animation frame. k animation frames after the first animation frame, and sorting the selected animation frames according to the order of the animation frames in the previous animation, thereby obtaining the first frame sequence; similarly, the terminal can be from the latter animation The k animation frames located before the second animation frame and the second animation frame are selected, and the selected animation frames are sorted according to the arrangement order of the animation frames in the subsequent animation to obtain the second frame. sequence. Then, the terminal may combine the first frame sequence and the second frame sequence with the same sequence number of animation frames to obtain k+1 fused frames, and then pass the animation frames located before the first animation frame in the previous animation. , k+1 fused frames, and each animation frame located after the second animation frame in the latter animation are synthesized.

For example, suppose the terminal performs the process of synthesizing the animation C and the animation D (the animation C includes a total of 5 animation frames of #1 to #5, and the animation D includes a total of 7 animation frames of *1 to *7), and determines Animation frame #3 in animation C and animation frame in animation D *3 Euclidean distance is the smallest (ie, the highest similarity) (this time in

In the formula, a=1, b=0, of course, in other embodiments, a and b can also select other values, so that other matching animation frames will be obtained, and the terminal can further select an animation frame from the previous animation. #3～#5 is used as the first frame sequence, and the animation frame *1~*3 is selected from the latter animation to form a second frame sequence (k=2), and then the terminal can sequence the first frame and the second frame. The two animation frames with the same sequence number and sequence number are merged, that is, the animation frame #3 and the animation frame *1 fusion, the animation frame #4 and the animation frame *2 fusion, the animation frame #5, and the animation frame *3 are merged. Get 3 fused frames. After determining the fused frame, the terminal can synthesize the animation frames #1, #2, and 3 fused frames in the previous animation and the animation frames *4 to *7 in the latter animation in order, and then obtain the synthesized Animation.

When the terminal fuses the animation frames to be fused, the formula can be used.

To perform the fusion, specifically, for the two animation frames in which the sequence number of the first frame sequence and the second frame sequence are both p, the terminal can pass the formula

Determining the fusion coefficient corresponding to the p-th animation frame in the first frame sequence, and determining the fusion coefficient corresponding to the p-th animation frame in the second frame sequence by using the formula β(p)=1-α(p), Then, the terminal may fuse the p-th animation frame in the first frame sequence with the p-th animation frame in the second frame sequence by using the determined fusion coefficients to obtain a corresponding fused frame.

Through the above fusion method, the terminal reduces the frame loss rate in the animation synthesis as much as possible, and in order to ensure that the synthesized animation does not have obvious jumping feeling in the effect, the terminal calculates the animations participating in the fusion process by calculating. The fusion coefficient of the frame is used to fuse each animation frame to ensure the display effect of each fusion frame in the synthesized animation, which reduces the disadvantages caused by the animation synthesis process.

The terminal synthesizes the respective animations corresponding to the text keywords according to the arrangement order of the text keywords in the text information, and then displays the obtained fused animation, and can publish the fused animation as information on the social platform, or It is sent as a chat message to other users. In order to further enhance the effect of the fused animation, in the embodiment of the present application, before the terminal displays or transmits the fused animation, the terminal may further determine the effect information corresponding to the text information, and pass the effect information. To adjust the fused animation, wherein the effect information mentioned herein may be the background music, the sound effect of the fused animation, or the voice information corresponding to the text information, etc. The specific method of determining the information and how to adjust the fusion animation through these kinds of effect information will be described in detail below.

For the background music of the fused animation, after obtaining the fused animation, the terminal can further determine each music corresponding to each text keyword from the preset music library according to the recognized text keywords, and the specific determination manner. The music keywords corresponding to the music in the music library are respectively matched, and the music corresponding to the music keyword matching the text keyword is used as the music corresponding to the text keyword, or For each text keyword, respectively calculating the similarity between the text keyword and each music keyword, and selecting music matching the text keyword according to the calculated similarities, wherein the terminal determines through calculation There may be multiple music corresponding to the text keyword. In order to filter music that is more in line with the context of the entire text information, the terminal may further implement a plurality of music corresponding to the text keyword according to the feature information of the text information. Screening to select music that is more in line with the text of the entire text, the specific screening method and the screening of the above description In the same way, this is not a detailed repeat.

For each music keyword corresponding to each music in the preset music library, the terminal may determine, for each music in the music library, a feature capable of representing the music, such as expressing the feature of the music by the Mel cepstrum coefficient MFCC. Then, the terminal may input the feature into the preset music model for the determined characteristics of each music, and determine the music keyword corresponding to the music according to the output result of the music model, specifically The process is the same as the above method of determining the animation keyword, and will not be described in further detail here. After determining the music keywords corresponding to each music, the terminal can associate each music with each music keyword to ensure that it is in the preset music library for later use. Certainly, in the embodiment of the present application, each music keyword corresponding to each music can also be determined by an artificial manner, that is, the music keywords corresponding to each music are manually calibrated and corresponding to each other and saved in the preset music. Library in.

After the terminal determines each music corresponding to each text keyword, each music can be synthesized according to the order of the text keywords in the text information in which they are located, and the corresponding fusion music is obtained, wherein the music is synthesized. The manner of synthesizing the animation is basically the same, for example, the terminal can realize the transition of each music in the fused music by setting the playing effect such as fade out or fade in the music, or by determining the fusion coefficient of each music. The integration of music implementation, the specific process will not be described in detail here.

After determining the fused music, the terminal may synthesize the fused music into the fused animation to further improve the playing effect of the fused animation, wherein the specific merging manner may be: the manner in which the terminal determines the fused animation playing speed To adjust the playing speed of the fused music, so that the fused music and the fused animation are synchronized in the playing speed, or the terminal can cycle the fused music in the fused animation at a certain playing speed, or the terminal can be adjusted. The fusion music playing speed can be based on the text keyword, and the music in the fused music and the animation in the fused animation are mutually correlated, thereby completing the compositing work of the fused music and the fused animation.

It should be noted that, when determining the music keywords corresponding to each music, the terminal may select different music models in different dimensions to determine, for example, when selecting a music model related to sports, the terminal finally passes the music model. The determined music keywords corresponding to the respective music should be related to sports, and when the music model related to the emotion is selected, the final determined music keywords corresponding to the respective music should be related to emotions. Therefore, for each music, there may be more than one music keyword corresponding to the music determined by the terminal through the music model of different dimensions, which lays a foundation for the subsequent terminal to filter the music through the feature information of the text information.

The music model mentioned above can be obtained after training a large amount of sample music collected, and the training method and The above methods for training other models are similar and will not be described in detail here. The background music of the fused animation can be determined by the above description, and the terminal can also determine an overall background music of the fused animation through the feature information of the text information, and then integrate the background music into the fused animation. .

For the sound effects of fused animations, the violentness of the animations is often different at different times. For example, some periods in the animation are relatively soothing on the screen, while some periods are more intense, and The movements of characters in an animation, the speed at which objects travel, and the like are often different at different times. Therefore, in order to further enhance the effect and the interest of the fused animation, in the embodiment of the present application, the terminal can adjust the sound effect of the fused music by monitoring the animation parameters in the fused animation, for example, when the terminal monitors a certain When the animation parameters of the time period change too fast, the fusion music corresponding to this time period can be adjusted more flexibly on the sound effect, or when the characters in the fusion animation perform actions such as clapping, stepping, and panting. At the same time, the terminal can fuse the sound effects corresponding to these actions into the fused music, and of course, other adjustment methods may be used, and the description will not be made here. After the terminal adjusts the sound effect of the fused music, the fused music after adjusting the sound effect can be synthesized into the fused animation, so that the existence of the sound effect further enhances the effect of the fused animation, thereby bringing the user More fun.

In practical applications, the text information input by the user usually contains some specified characters, such as a colon ":", a book name, etc., and the text information contained after the specified characters is usually a special piece of text information, such as colon double quotes. ":" is usually followed by a paragraph. In order to further enhance the effect and fun of the fusion animation, in the embodiment of the present application, the terminal can process a piece of text information after the specified character and process it. The obtained effect information is inserted into the fused animation. The specific manner may be that the terminal can determine the specified character included in the text information. Wherein, the specified character mentioned here may be a colon double quotation mark ":"", and then the terminal may extract a piece of sub-text information following the specified character from the text information according to the specified character, and adopt a voice recognition function The sub-text information is converted into a corresponding voice, and the terminal may insert the voice or the sub-text information corresponding to the voice as the effect information into the fused animation, wherein, for the determined voice, the terminal may The speech is synthesized in the fused animation to realize the dubbing of the fused animation. In the sub-text information, the terminal can insert the sub-text information into the fused animation in a preset display manner, as shown in FIG. 2 . Shown.

FIG. 2 is a schematic diagram showing display of utterance information in a fusion animation according to an embodiment of the present application.

In FIG. 2, when the terminal determines that the sub-text information following the colon double quotation mark “: “” in the text information is a utterance, the sub-text information can be used as the utterance of the character in the fused animation, and the utterance is placed. In the specified dialog box, it is displayed above the characters in the fused animation. Of course, this sub-text information can also be displayed in the fused animation through bubbles, clouds, etc., to enhance the display effect and fun of the fused animation.

It should be noted that the specified characters in the above description are not necessarily the colon double quotes ":", or may be a designated character such as "think:", and the terminal determines that the text information includes "think" and colon " : "When used in combination, it can be determined that the subsequent sub-text information should be described in the heart of the fused animation, and this sub-text information can be used as a heart activity in the fused animation, and displayed in a certain form in the fusion. In the animation, of course, the specified characters can also be used in combination with other characters or characters, such as the words "say", "question", etc., which will not be explained in detail here.

In practical applications, the process of dubbing an animation often involves a lip-shaped problem. Therefore, in the embodiment of the present application, the terminal may also use the entire piece of text information input by the user as a utterance, from the text information. Extracting corresponding voice feature information, and further determining each voice feature The mouth type corresponding to the information, wherein the mouth type mentioned here means that, in general, different syllables have corresponding mouth type categories, and each mouth type corresponds to a respective mouth type. Animation. The pronunciation of a word is usually formed by the pronunciation of several syllables. Correspondingly, the lip animation corresponding to a word should also be composed of animations corresponding to the vocal categories corresponding to several syllables. Therefore, when After the terminal determines each port type category, the mouth shape animation corresponding to each word in the text information is determined correspondingly, and then the mouth shape animation of each word is synthesized as effect information into the fusion animation, as shown in the figure. 3 is shown.

FIG. 3 is a schematic diagram of a lip animation provided by an embodiment of the present application.

In Figure 3, the pictures of "I" and "Line" are listed respectively. The pronunciation of "I" is "wo". Under normal circumstances, the terminal can split "wo" into "w" and "o", and determine the mouth type (mouth image) corresponding to "w" and "o" are respectively shown in Figure c1 and Figure c2, so that the terminal can further determine the corresponding "I" The mouth animation, the same reason, the pronunciation of "row" is composed of two syllables of "x" and "ing", and the terminal can determine according to the mouth type map d1 and the figure d2 corresponding to the two syllables. The mouth animation corresponding to the line.

After determining the mouth-shaped animation, the terminal can synthesize each mouth-shaped animation into the fusion animation according to the position of the single word based on the voice information in the text information, wherein the synthesis method may be: The size of the animation is adjusted according to the character shape of the fused animation, and then the character shape of the fused animation is replaced in turn, and then the fused animation with the matching of the voice and the mouth is obtained.

The above is the method for the animation synthesis provided by the embodiment of the present application. Based on the same idea, the embodiment of the present application further provides an animation synthesis device, as shown in FIG. 4 .

FIG. 4 is a schematic diagram of an apparatus for synthesizing an animation according to an embodiment of the present disclosure, specifically including:

a receiving module 401, configured to receive input text information;

The identification module 402 is configured to identify each text keyword in the text information;

a determining module 403, configured to respectively determine an animation corresponding to each text keyword from a preset animation library;

The compositing module 404 is configured to synthesize the determined animations to obtain a fused animation.

The determining module 403 is specifically configured to: extract feature information in the text information; and, for each text keyword, determine, according to the text keyword and the feature information, from the preset animation library, corresponding to the A text keyword and an animation corresponding to the feature information.

The synthesizing module 404 is specifically configured to synthesize the determined animations according to the order of the keywords in the text information.

The synthesizing module 404 is specifically configured to determine, for any two adjacent animations, a transition animation segment to be inserted between the previous animation and the latter animation, and the previous animation, the transition The animation segment and the subsequent animation are synthesized in sequence; or

For any two adjacent animations, the first specified animation frame of the previous animation is set as the first effect, and the second specified animation frame of the subsequent animation is set as the second effect, and the setting effect is Combining the previous animation with the latter animation, wherein the first effect includes at least a fade-out effect, and the second specified effect includes at least a fade-in effect; or

For any two adjacent animations, determine the similarity between each animation frame image of the previous animation and each animation frame image of the latter animation, and according to the determined similarities, the previous animation and the latter animation Perform the synthesis.

The synthesizing module 404 is specifically configured to: select, from the previous animation, a first animation frame and k animation frames located after the first animation frame, and press each of the selected animation frames in the previous one Sorting the order in the animation to obtain a first frame sequence; from the latter animation, the selection is located in the The k animation frames before the second animation frame and the second animation frame are sorted according to the arrangement order of the selected animation frames in the subsequent animation to obtain a second frame sequence; the first frame sequence and the first frame sequence An animation frame with the same sequence number in the two frame sequence is fused to obtain k+1 fused frames; for each animation frame, each fused frame, and the latter animation before the first animation frame in the previous animation Each animation frame located after the second animation frame is synthesized; wherein k is a positive integer.

The synthesis module 404 is specifically configured to adopt a formula

Determining a fusion coefficient corresponding to each animation frame in the first frame sequence; determining a fusion coefficient corresponding to each animation frame in the second frame sequence by using a formula β(p)=1-α(p); wherein: α(p a fusion coefficient corresponding to the p-th animation frame in the first frame sequence, β(p) is a fusion coefficient corresponding to the p-th animation frame in the second frame sequence; according to the determined fusion coefficients, The first frame sequence and the second frame sequence have the same sequence number of animation frames for fusion.

The device also includes:

The effect determining module 405 is configured to determine effect information corresponding to the text information, and adjust the fusion animation according to the effect information corresponding to the text information.

The effect determining module 405 is specifically configured to determine, according to the identified text keywords, music that matches the text keywords from the preset music library.

The effect determining module 405 is specifically configured to synthesize the determined music according to the order of the text keywords in the text information to obtain the fused music; and synthesize the fused music into the fused animation. .

The effect determining module 405 is specifically configured to: monitor each animation parameter corresponding to the fused animation; adjust the sound effect of the fused music according to each animation parameter; and synthesize the fused music after adjusting the sound effect into the fused animation.

The effect determining module 405 is specifically configured to: extract each voice feature information from the text information; and determine, according to the voice description feature information, each port type corresponding to each voice feature information; a category, determining each mouth-shaped animation corresponding to each of the lip-type categories, and using the respective mouth-shaped animations as the determined effect information.

The effect determining module 405 is specifically configured to synthesize each lip animation into the fused animation according to the position of the single word on which the respective voice feature information is extracted in the text information.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (system), and computer program products according to embodiments of the invention. It will be understood that each of the processes and/or blocks in the flowcharts and/or block diagrams, and the flows in the flowcharts and/or block diagrams can be implemented by computer program instructions and/or Or a combination of boxes. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine for the execution of instructions for execution by a processor of a computer or other programmable data processing device. Means for implementing the functions specified in one or more of the flow or in a block or blocks of the flow chart.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device. The apparatus implements the functions specified in one or more blocks of a flow or a flow and/or block diagram of the flowchart.

These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device. The instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium.

Computer readable media includes both permanent and non-persistent, removable and non-removable media. Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory. (ROM), electrically erasable programmable read only Memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape storage or other magnetic A storage device or any other non-transportable medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include temporary storage of computer readable media, such as modulated data signals and carrier waves.

It is also to be understood that the terms "comprises" or "comprising" or "comprising" or any other variations are intended to encompass a non-exclusive inclusion, such that a process, method, article, Other elements not explicitly listed, or elements that are inherent to such a process, method, commodity, or equipment. An element defined by the phrase "comprising a ..." does not exclude the presence of additional equivalent elements in the process, method, item, or device including the element.

Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, system, or computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The above description is only an embodiment of the present application and is not intended to limit the application. Various changes and modifications can be made to the present application by those skilled in the art. Any modifications, equivalents, improvements, etc. made within the spirit and scope of the present application are intended to be included within the scope of the appended claims.

Claims

An animation synthesis method, comprising:

Receiving input text information;

Identifying each text keyword in the text information;

Determining an animation corresponding to each text keyword from a preset animation library;

The determined animations are combined to obtain a fused animation.
The method of claim 1, wherein the method further comprises: before determining the animation corresponding to each text keyword from the preset animation library, the method further comprising:

Determining feature vectors corresponding to each animation saved in advance;

Determining an animation keyword corresponding to each animation by using a pre-trained first classification model according to the determined feature vector corresponding to each animation;

Save each animation and its corresponding animation keywords in a preset animation library.
The method of claim 1, wherein the animation corresponding to each text keyword is determined from a preset animation library, and specifically includes:

Determining, for each text keyword, a similarity between the text keyword and each animation keyword saved in the animation library;

The animation corresponding to the text keyword is determined according to the determined similarities and the correspondence between the animation keywords and the animation.
The method according to claim 1 or 3, wherein the animation corresponding to each text keyword is respectively determined from a preset animation library, and specifically includes:

Extracting feature information in the text information;

For each text keyword, according to the text keyword and the feature information, an animation corresponding to the text keyword and corresponding to the feature information is determined from a preset animation library.
The method of claim 4 wherein said characteristic information comprises at least: Emotional information;

Before determining the animation corresponding to each text keyword from the preset animation library, the method further includes:

Determining an emotional keyword corresponding to each animation by using a pre-trained second classification model;

The correspondence between each animation and the emotional keyword is saved in a preset animation library.
The method of claim 1, wherein synthesizing the determined animations comprises:

The determined animations are combined according to the order of the text keywords in the text information.
The method according to claim 6, wherein the synthesizing the determined animations comprises:

Determining, for any two adjacent animations, a transitional animation segment to be inserted between the previous animation and the subsequent animation, and synthesizing the previous animation, the transitional animation segment, and the subsequent animation in sequence; or

For any two adjacent animations, the first specified animation frame of the previous animation is set as the first effect, and the second specified animation frame of the subsequent animation is set as the second effect, and the setting effect is The previous animation and the latter animation are combined; or

For any two adjacent animations, determine the similarity between each animation frame of the previous animation and each animation frame of the latter animation, and synthesize the previous animation and the latter animation according to the determined similarities. .
The method of claim 7 wherein said animation comprises: a three-dimensional animation;

The determining the similarity between each animation frame of the previous animation and each animation frame of the subsequent animation includes:

Adopt formula
Determine the Euclidean distance of each animation frame of the previous animation and each animation frame of the latter animation, and determine the similarity of each animation frame of the previous animation to each animation frame of the latter animation according to the determined Euclidean distance Degree, where:

D(i,j) is the Euclidean distance of the i-th animation frame of the previous animation and the j-th animation frame of the latter animation, wherein the smaller the Euclidean distance, the i-th animation frame and the The greater the similarity of the jth animation frame;

The rotation angular velocity vector of the nth bone of the ith animation frame of the previous animation,
a rotation angular velocity vector of the nth skeleton of the jth animation frame of the subsequent animation, wherein the skeleton number of each animation frame in the previous animation is the same as the skeleton number of each animation frame in the subsequent animation;

w n is the bone weight of the nth bone;

The rotation vector of the nth bone of the ith animation frame of the previous animation,
The rotation vector of the nth bone of the jth animation frame of the latter animation;

u is the preset animation intensity factor.
The method of claim 7, wherein determining the similarity between each frame of the previous animation and each frame of the subsequent animation comprises:

Extracting each third specified animation frame in the previous animation, and extracting each fourth specified animation frame in the subsequent animation;

The similarity of each of the third specified animation frames to each of the fourth specified animation frames is determined.
The method according to any one of claims 7 to 9, wherein the synthesizing the previous animation and the subsequent animation according to the determined similarities, specifically comprising:

Determining, according to the determined similarities, a first animation frame from the previous animation, and determining a second animation frame from the latter animation, the first animation frame and the second animation frame satisfy:

Where x ij is the Euclidean distance of the i-th frame animation frame of the previous animation and the j-th frame animation frame of the subsequent animation; i has a value range of [1, the total frame of the previous animation Number]; j has a value range of [1, the total number of frames of the latter animation];

y ij is a comprehensive frame loss rate determined according to the ith frame animation frame and/or according to the jth frame animation frame;

x IJ For a * x ij + b * y ij smallest x ij;

y IJ is y ij which minimizes a*x ij +b*y ij ;

I is the frame number of the first animation frame, and J is the frame number of the second animation frame;

a, b are the corresponding coefficients, a ≥ 0, b ≥ 0;

The previous animation and the subsequent animation are combined according to the first animation frame and the second animation frame.
The method according to claim 10, wherein the integrated frame loss rate determined according to the ith frame animation frame and/or the frame j frame animation frame comprises:

Determining, according to the ith frame animation frame, the number of animation frame frames that do not participate in the merging in the previous animation and does not participate in the merging, and according to the determined animation that does not participate in the merging and does not participate in the merging in the previous animation. The number of frame frames, and the total number of frames of the previous animation, determining an expected frame loss rate of the previous animation;

Determining, according to the j-th frame animation frame, an animation frame number that does not participate in the fusion and does not participate in the synthesis in the subsequent animation, and according to the determined animation that does not participate in the fusion and does not participate in the synthesis in the latter animation Determining an expected frame loss rate of the latter animation by the number of frame frames and the total number of frames of the subsequent animation;

The integrated frame loss rate is determined according to an expected frame loss rate of the previous animation and/or an expected frame loss rate of the subsequent animation.
The method of claim 11, wherein synthesizing the previous animation and the subsequent animation according to the first animation frame and the second animation frame comprises:

From the previous animation, selecting a first animation frame and k animation frames located after the first animation frame, and sorting the selected animation frames in the previous animation sequence, a sequence of first frames; from the latter animation, the selection is before the second animation frame k animation frames and second animation frames, and sorting the selected animation frames in the order of the subsequent animations to obtain a second frame sequence; sorting the first frame sequence and the second frame sequence The animation frames having the same serial number are fused to obtain k+1 fused frames; and the animation frames, the fused frames, and the latter animation located before the first animation frame in the previous animation are located in the first animation Each animation frame after the second animation frame is synthesized; wherein k is a positive integer.
The method according to claim 12, wherein the merging of the animation frames having the same sequence number in the first frame sequence and the second frame sequence comprises:

Adopt formula
Determining a fusion coefficient corresponding to each animation frame in the first frame sequence;

Determining a fusion coefficient corresponding to each animation frame in the second frame sequence by using a formula β(p)=1-α(p);

among them:

α(p) is a fusion coefficient corresponding to the p-th animation frame in the first frame sequence, and β(p) is a fusion coefficient corresponding to the p-th animation frame in the second frame sequence;

The animation frames having the same sequence number in the first frame sequence and the second frame sequence are fused according to the determined fusion coefficients.
The method of claim 1 wherein the method further comprises:

Determining effect information corresponding to the text information;

The fusion animation is adjusted according to the effect information corresponding to the text information.
The method of claim 14, wherein the determining the effect information corresponding to the text information comprises:

Based on the recognized text keywords, the music matching the text keywords is respectively determined from the preset music library.
The method of claim 15 wherein said blending animation is adjusted Body includes:

Combining the determined music according to the order of the text keywords in the text information to obtain the fused music;

The fused music is synthesized into the fused animation.
The method according to claim 15, wherein before the music matching the text keywords is separately determined from the preset music library, the method further comprises:

Determining, respectively, features corresponding to each music saved in advance, the features including a Mel Cepstrum coefficient MFCC feature;

Determining a music keyword corresponding to each music through a pre-trained music model according to the determined characteristics corresponding to each music;

The music and its corresponding music keywords are saved in a preset music library.
The method of claim 16, wherein the merging the fused music into the fused animation comprises:

Monitoring each animation parameter corresponding to the fusion animation;

Adjusting the sound effect of the fused music according to each animation parameter;

The fused music after adjusting the sound is synthesized into the fused animation.
The method of claim 14, wherein the determining the effect information corresponding to the text information comprises:

Determining a specified character included in the text information;

Extracting sub-text information in the text information according to the specified character;

Converting the sub-text information into speech;

The sub-text information and/or the voice is used as effect information.
The method of claim 19, wherein the adjusting the fusion animation comprises:

Pressing the sub-text information according to the position of the sub-text information in the text information Inserting into the fused animation in a preset display manner, and/or synthesizing the speech into the fused animation.
The method of claim 14, wherein the determining the effect information corresponding to the text information comprises:

Extracting each voice feature information from the text information;

Determining, according to the voice description feature information, each port type corresponding to each voice feature information;

And determining, according to each of the lip-type categories, each of the lip-shaped animations corresponding to the respective lip-type categories, and using the respective lip-shaped animations as the determined effect information.
The method of claim 21, wherein the adjusting the fusion animation comprises:

Each lip animation is synthesized into the fused animation based on the position of the single word on which the respective voice feature information is extracted in the text information.
An apparatus for animating and synthesizing, comprising:

a receiving module, configured to receive input text information;

An identification module, configured to identify each text keyword in the text information;

a determining module, configured to respectively determine an animation corresponding to each text keyword from a preset animation library;

A synthesis module for synthesizing the determined animations to obtain a fusion animation.
The apparatus according to claim 23, wherein the synthesizing module is configured to synthesize the determined animations according to the order of the keywords in the text information.
The device according to claim 23, wherein the synthesizing module is specifically configured to determine, for any two adjacent animations, a transition animation to be inserted between the previous animation and the latter animation a segment that synthesizes the previous animation, the transitional animation segment, and the subsequent animation in sequence; or

For any two adjacent animations, the first specified animation frame of the previous animation is set as the first effect, and the second specified animation frame of the subsequent animation is set as the second effect, and the setting effect is Combining the previous animation with the latter animation, wherein the first effect includes at least a fade-out effect, and the second specified effect includes at least a fade-in effect; or

For any two adjacent animations, determine the similarity between each animation frame image of the previous animation and each animation frame image of the latter animation, and according to the determined similarities, the previous animation and the latter animation Perform the synthesis.
The apparatus according to claim 25, wherein the synthesizing module is configured to: select, from the previous animation, a first animation frame and k animation frames located after the first animation frame, and Sorting the selected animation frames in the previous animation sequence to obtain a first frame sequence; from the latter animation, selecting k animation frames before the second animation frame and The second animation frame is sorted according to the arrangement order of the selected animation frames in the subsequent animation to obtain a second frame sequence; the first frame sequence and the second frame sequence are merged with the same sequence number of the animation frame. Obtaining k+1 fused frames; for each animation frame before the first animation frame, each fused frame in the previous animation, and each animation after the second animation frame in the latter animation The frame is synthesized; where k is a positive integer.
The apparatus according to claim 26, wherein said synthesizing module is specifically configured to adopt a formula
Determining a fusion coefficient corresponding to each animation frame in the first frame sequence; determining a fusion coefficient corresponding to each animation frame in the second frame sequence by using a formula β(p)=1-α(p); wherein: α(p a fusion coefficient corresponding to the p-th animation frame in the first frame sequence, β(p) is a fusion coefficient corresponding to the p-th animation frame in the second frame sequence; according to the determined fusion coefficients, The first frame sequence and the second frame sequence have the same sequence number of animation frames for fusion.