WO2018049979A1 - Animation synthesis method and device - Google Patents

Animation synthesis method and device

Info

Publication number
WO2018049979A1
Authority
WO
WIPO (PCT)
Prior art keywords
animation
frame
previous
frames
text
Prior art date
Application number
PCT/CN2017/099462
Other languages
French (fr)
Chinese (zh)
Inventor
吴松城
方小致
刘守达
林明安
陈军宏
Original Assignee
厦门幻世网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 厦门幻世网络科技有限公司
Publication of WO2018049979A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00: Animation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40: Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method and an apparatus for animation synthesis.
  • With the popularization of Internet access methods such as WiFi wireless access, 3G, and 4G, instant messaging (IM) tools and social software such as Weibo allow users to continuously expand their social relationships and share information, thereby further meeting users' information browsing needs in the information age.
  • At present, information is usually presented in the following two ways: first, the user enters the corresponding text information in the interface of the social software and publishes it, so that the information published by the user is presented in the form of text; second, the user publishes his or her own voice as information through the voice transmission function in the social software (especially IM software).
  • These two forms of information release can effectively guarantee the normal presentation of information. However, both text information and voice information are too limited as forms of expression: text or voice alone is often insufficient to convey the full meaning of the information, which makes browsing such information inconvenient for users.
  • The embodiment of the present application provides an animation synthesis method to solve the prior-art problem that text information or voice information cannot fully express its meaning and therefore causes inconvenience to users browsing the information.
  • The embodiment of the present application provides an animation synthesis method, including: receiving input text information; identifying each text keyword in the text information; determining, from a preset animation library, an animation corresponding to each text keyword; and synthesizing the determined animations to obtain a fused animation.
  • An embodiment of the present application provides an apparatus for animation synthesis, including:
  • a receiving module configured to receive input text information
  • An identification module configured to identify each text keyword in the text information
  • a determining module configured to respectively determine an animation corresponding to each text keyword from a preset animation library
  • a synthesis module for synthesizing the determined animations to obtain a fusion animation.
  • The embodiment of the present application provides an animation synthesis method and apparatus. A terminal can receive text information input by a user, identify each text keyword from the text information, determine from a preset animation library the animation corresponding to each text keyword, and synthesize the animations according to the order in which the keywords appear in the text information to obtain a fused animation. Since an animation can express the meaning of information more fully and vividly than plain text, converting the text information into an animation, rather than presenting the information as text or voice, conveys the meaning of the information itself more fully and vividly, bringing the user both fun and convenience when reading the information.
  • FIG. 1 is a schematic flowchart of an animation synthesis process according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram showing the display of utterance information in a fused animation according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a mouth animation provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an apparatus for animation synthesis according to an embodiment of the present application.
  • FIG. 1 is a process of animation synthesis provided by an embodiment of the present application, which specifically includes the following steps:
  • S101 Receive input text information.
  • In the embodiment of the present application, the terminal converts the text information input by the user into an animation, so that the meaning of the information is expressed more fully and vividly.
  • The terminal may first receive the text information input by the user. The terminal mentioned here may be a smart device such as a smartphone or a tablet computer, and the user may also enter the text information in a client running on the terminal.
  • The work of converting the text information into the corresponding animation may likewise be performed by an application such as a client installed in the terminal. For convenience of description, the animation synthesis method provided by the embodiment of the present application is described below with the terminal as the executing body.
  • In practice, the animations corresponding to different phrases are different. For example, if the text information is "It was raining when Xiao Ming was playing football yesterday", it can be seen from this text that the animations it may involve include an animation of rain and an animation of Xiao Ming kicking a ball, so the animation expressed by this text information should be the result of synthesizing the two. Based on this, before converting the received text information into an animation, the terminal should identify each text keyword from the text information; the purpose of identifying the text keywords is to determine which animations the text information may involve, so that in the subsequent process the determined animations can be combined into a fused animation corresponding to the text information.
  • Specifically, the terminal may segment the text information into a number of phrases and then determine the text keywords contained in the text information from these phrases by means of the inverse document frequency (IDF) value saved in advance for each phrase and the term frequency (TF) of each phrase.
  • A specific implementation may be: each phrase is input into a preset TF-IDF model; for each phrase, the model determines the IDF value and the TF of the phrase and calculates their product as the importance score of the phrase; the model then outputs the importance scores calculated for the phrases, the terminal sorts the phrases by importance score, and the top-ranked phrases are taken as the text keywords of the text information (a minimal sketch is given below).
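The following is a minimal sketch of the TF-IDF ranking described above, assuming the IDF value of each phrase has already been computed offline and stored per phrase; the segmentation, the IDF table, and the cut-off `top_k` are illustrative assumptions rather than part of the embodiment.

```python
from collections import Counter

def extract_keywords(phrases, idf_table, top_k=3):
    """Rank segmented phrases by TF * IDF and return the top_k as text keywords.

    phrases   -- list of phrases obtained by segmenting the text information
    idf_table -- dict mapping phrase -> pre-saved inverse document frequency value
    """
    counts = Counter(phrases)
    total = len(phrases)
    scores = {}
    for phrase, count in counts.items():
        tf = count / total                # term frequency within this text
        idf = idf_table.get(phrase, 0.0)  # pre-saved IDF; unseen phrases score 0
        scores[phrase] = tf * idf         # importance score of the phrase
    # Sort the phrases by importance score and keep the highest-ranked ones.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

# Usage with a toy segmentation and a toy IDF table.
idf = {"playing football": 2.3, "raining": 1.9, "yesterday": 0.7, "Xiao Ming": 1.2}
print(extract_keywords(
    ["Xiao Ming", "yesterday", "playing football", "raining"], idf, top_k=2))
```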
  • In addition, the text keywords of the text information may also be determined from the phrases through a pre-trained recognition model, where the pre-trained recognition model may be a machine learning model such as a Hidden Markov Model (HMM). The manner of determining text keywords through a pre-trained recognition model is prior art and is not elaborated here.
  • The embodiment of the present application aims to convert the text information input by the user into a corresponding animation. Therefore, after determining the text keywords contained in the text information, the terminal may determine, from the preset animation library, the animation corresponding to each text keyword, and then, in the subsequent process, combine the determined animations to obtain the animation corresponding to the text information.
  • Specifically, for each text keyword, the terminal may determine the similarity between that text keyword and each animation keyword corresponding to each animation in the preset animation library. The animation keywords corresponding to the animations in the preset animation library may be calibrated manually in advance; for example, if the content shown in an animation is a person playing football, the animation keyword of that animation may be manually labeled as "sports", and the animation and the animation keyword "sports" are stored correspondingly in the preset animation library.
  • The animation keywords corresponding to the animations in the preset animation library may also be calibrated by a pre-trained first classification model.
  • Specifically, the terminal may first convert each pre-saved animation into a corresponding feature vector. In practice, the duration and intensity of each animation differ, and within an animation the frames with the largest amount of change between animation frames are usually the most distinguishable from other animations. Therefore, in the embodiment of the present application, when converting each animation into a corresponding feature vector, the terminal may, for each animation, determine the amount of change T between the animation frames, select the z animation frames with the largest change T as the frames representing the animation, and then determine a sub-feature vector for each of the selected z animation frames. For a three-dimensional animation, the terminal can determine the sub-feature vector of an animation frame from quantities such as the skeletal space coordinates in that frame and the bone acceleration between frames, and then convert the animation into the corresponding feature vector from the z sub-feature vectors thus determined.
  • Of course, each animation may also be converted into a corresponding feature vector in other ways, for example by determining, for each animation, a sub-feature vector for every animation frame in the animation and converting the animation into a feature vector from all of these sub-feature vectors; other methods may be used as well. A minimal sketch of the frame-selection step follows.
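A minimal sketch of the frame-selection idea, assuming each animation frame is already represented as a NumPy array of bone-space coordinates; the frame representation, the change measure, and the value of `z` are illustrative assumptions.

```python
import numpy as np

def animation_to_feature_vector(frames, z=4):
    """Pick the z frames with the largest inter-frame change and concatenate
    their sub-feature vectors into one feature vector for the animation.

    frames -- list of np.ndarray, one array of skeletal coordinates per frame
    """
    frames = [np.asarray(f, dtype=float).ravel() for f in frames]
    # Amount of change T between consecutive frames (the first frame gets 0).
    changes = [0.0] + [np.linalg.norm(b - a) for a, b in zip(frames, frames[1:])]
    # Indices of the z frames with the largest change, kept in time order.
    picked = sorted(sorted(range(len(frames)), key=lambda i: changes[i], reverse=True)[:z])
    # Sub-feature vector per picked frame: its coordinates plus the change magnitude,
    # standing in for quantities such as bone acceleration between frames.
    subs = [np.concatenate([frames[i], [changes[i]]]) for i in picked]
    return np.concatenate(subs)

# Usage with 6 toy frames of 3 "bones" in 3-D space.
rng = np.random.default_rng(0)
toy_frames = [rng.normal(size=(3, 3)) for _ in range(6)]
print(animation_to_feature_vector(toy_frames, z=2).shape)  # (2 * (9 + 1),) = (20,)
```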
  • After the feature vectors are obtained, each feature vector may be input into the pre-trained first classification model. For each feature vector, the first classification model classifies it and outputs several values, where each value corresponds to a keyword. When the terminal finds that one of the values is greater than the others, the keyword corresponding to that value can be taken as the animation keyword of the animation, and the animation is stored in the preset animation library in association with that animation keyword.
  • The classification model described above may be a model such as a neural network model, a Hidden Markov Model (HMM), or a Support Vector Machine (SVM). To train it, a large number of sample animations can be collected first; each sample animation is converted into a vector, parameters, and the like and input into the classification model, and the classification model is then trained.
  • In practice, each animation usually corresponds to multiple keywords. For example, for an animation whose content is a person playing football, the animation keywords may be "sports" or "playing football", or even keywords such as "happy" and "cheerful". Therefore, when the terminal determines the animation corresponding to a keyword, it may find that several animations in the preset animation library correspond to that keyword. In order to determine the animation corresponding to the keyword more accurately, in the embodiment of the present application the terminal may further determine the feature information corresponding to the text information from the received text information, and determine, from the preset animation library, the animation corresponding to each keyword according to both the feature information and the keyword.
  • Specifically, the terminal may further extract the feature information in the text information. A specific extraction manner may be: the terminal analyzes the text information with a preset feature analysis model and extracts the feature information in the text information. For example, suppose a piece of text is "We are really going to play football tomorrow!". The terminal can convert this passage into a corresponding word-vector sequence (since the passage consists of multiple words, after each word in the passage is converted into a word vector, the word vectors are sorted according to the positions of the words in the passage, and a word-vector sequence capable of representing the passage is obtained). The word-vector sequence is input into the preset feature analysis model, and from the output of the feature analysis model it is determined that the emotion expressed by the overall context of the passage is a happy, cheerful one; the feature information the terminal extracts from this passage is therefore "happy" or "cheerful".
  • In addition, the software developer may pre-establish an emotional vocabulary library and store it in the terminal; each word in the text information can then be compared with the emotional words in the emotional vocabulary library to determine the emotional information corresponding to the text information.
  • Continuing the above example, the terminal can then filter out, from the preset animation library, the animation corresponding to both the text keyword "playing football" and the feature information "happy". Since the text keyword "playing football" may correspond to several animations in the preset animation library, the terminal can further screen the animations corresponding to the text keyword "playing football" through the feature information "happy", and thereby determine the animation corresponding to both the text keyword "playing football" and the feature information "happy". A minimal sketch of this two-stage screening follows.
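A minimal sketch of the two-stage screening, assuming the preset animation library is a simple in-memory list whose entries carry both an animation keyword and an emotional keyword; the record layout and the exact-match criterion are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AnimationRecord:
    name: str
    animation_keyword: str   # e.g. "playing football"
    emotion_keyword: str     # e.g. "happy"

def select_animation(library, text_keyword, feature_info):
    """First keep animations whose animation keyword matches the text keyword,
    then narrow the candidates with the extracted feature (emotion) information."""
    by_keyword = [a for a in library if a.animation_keyword == text_keyword]
    by_emotion = [a for a in by_keyword if a.emotion_keyword == feature_info]
    # Fall back to the keyword-only candidates if no emotion-matching entry exists.
    return by_emotion[0] if by_emotion else (by_keyword[0] if by_keyword else None)

# Usage: "playing football" maps to several animations; "happy" picks one of them.
library = [
    AnimationRecord("football_sad",    "playing football", "sad"),
    AnimationRecord("football_happy",  "playing football", "happy"),
    AnimationRecord("fishing_calm",    "fishing",          "calm"),
]
print(select_animation(library, "playing football", "happy").name)  # football_happy
```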
  • The feature information described above may be emotional information such as "happy", "cheerful", or "sad". In order for the terminal to filter the corresponding animation from the preset animation library through the emotional information, the emotional keyword corresponding to each animation needs to be calibrated, so that the terminal can determine the animation corresponding to given emotional information by matching the emotional information against the emotional keywords. In the embodiment of the present application, the emotional information of each animation can be calibrated manually in advance; for example, if an animation shows a person sitting in a chair and crying, the emotional information corresponding to that animation may be manually determined to be "sad".
  • The emotional keyword corresponding to each animation may also be determined by a pre-trained second classification model. A specific method may be: after each animation is converted into a corresponding feature vector, the feature vectors are input into the pre-trained second classification model one by one; the emotional keyword corresponding to each animation is then determined from the output of the second classification model, and the animation is stored in association with that emotional information. The training of the second classification model can be the same as the training of the first classification model described above and is not repeated here.
  • The feature keywords corresponding to each kind of feature information should likewise be stored in the preset animation library in correspondence with each animation. When determining the feature keywords corresponding to each animation, a pre-trained classification model can similarly be used; the specific determination process is the same as that of determining the animation keywords corresponding to the animations and is not repeated here. The classification model mentioned here may also be a model such as a neural network model, a Hidden Markov Model (HMM), or a Support Vector Machine (SVM).
  • In practice, an animation may correspond to multiple feature keywords. Therefore, in order to determine the animation corresponding to a text keyword more accurately, in the embodiment of the present application the terminal may also extract multiple pieces of feature information from the text information from different angles, and then filter the animations corresponding to the text keyword according to the extracted pieces of feature information, so that the animation finally determined represents the text information as a whole more accurately.
  • After the animation corresponding to each text keyword is determined, the animations may be combined to obtain a fused animation capable of representing the text information. The terminal may synthesize the animations according to the order in which the corresponding text keywords are arranged in the text information.
  • For example, suppose the text information is "The sky is clear today, I want to go fishing". Through a pre-trained recognition model the terminal can identify the text keywords "clear sky", "I", and "fishing" from the text information, and then determine from the preset animation library the three animations H, X, and C corresponding to these three text keywords. According to the order in which the three text keywords are arranged in the text information, the sequence of animations to be synthesized is H, X, C; the terminal can therefore synthesize the three animations in the order H, X, C and finally obtain the fused animation representing the text information.
  • In practice, two adjacent animations may differ considerably; if two different animations are synthesized directly, the synthesized animation will show an obvious jump. Therefore, in order to make the synthesized animation look more natural, in the embodiment of the present application a transition animation segment can be inserted between any two adjacent animations, and the segment is synthesized together with the two adjacent animations to obtain the fused animation.
  • The transition animation segment to be inserted between two animations is determined from the two animations themselves, and the terminal can determine the transition animation segment by interpolation.
  • For example, suppose animation A and animation B are two adjacent animations, where animation A is the former animation and animation B is the latter animation, and there are obvious differences between them; these differences need to be eliminated in the process of synthesizing the two animations. The terminal can analyze the motion of the characters in animation A and animation B and determine, by interpolation, the transition animation segments a1 and b1 to be inserted between animation A and animation B. The characters in the two transition segments, played in the order a1 then b1, gradually transition from the pose in animation A to the pose in animation B. Because the transition segments exist, the animation obtained by synthesizing animation A, the transition segments a1 and b1, and animation B in order is coherent, without the jump that the difference between animation A and animation B would otherwise cause. A minimal interpolation sketch follows.
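A minimal sketch of building a transition segment by interpolating between the last pose of the former animation and the first pose of the latter animation, assuming a pose is a flat array of skeletal parameters and using plain linear interpolation; the pose format, segment length, and linear blend are illustrative assumptions, since the embodiment does not fix a particular interpolation scheme.

```python
import numpy as np

def transition_segment(last_pose_a, first_pose_b, num_frames=8):
    """Generate num_frames in-between poses so the character eases from the end
    of animation A to the start of animation B instead of jumping."""
    last_pose_a = np.asarray(last_pose_a, dtype=float)
    first_pose_b = np.asarray(first_pose_b, dtype=float)
    # t runs strictly between 0 and 1 so the endpoint poses are not duplicated.
    ts = np.linspace(0.0, 1.0, num_frames + 2)[1:-1]
    return [(1.0 - t) * last_pose_a + t * first_pose_b for t in ts]

def synthesize_with_transition(anim_a, anim_b, num_frames=8):
    """Splice animation A, the interpolated transition segment, and animation B."""
    return anim_a + transition_segment(anim_a[-1], anim_b[0], num_frames) + anim_b

# Usage with two toy animations whose poses are 4-dimensional parameter vectors.
anim_a = [np.zeros(4), np.ones(4) * 0.2]
anim_b = [np.ones(4) * 2.0, np.ones(4) * 2.1]
print(len(synthesize_with_transition(anim_a, anim_b, num_frames=4)))  # 2 + 4 + 2 = 8
```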
  • the terminal may also add a certain effect between two adjacent animations to eliminate the difference between the two adjacent animations.
  • An animation is composed of animation frames; the frames are arranged in a certain order and played in rapid succession to produce the animation. The difference between two animations is therefore ultimately determined by their animation frames: when two animations are played in order, the last animation frame of the former animation and the first animation frame of the latter animation are the frames at which the two animations join. Accordingly, for two different animations, the difference between them can be eliminated or reduced by applying certain processing to the frames at the junction.
  • A specific processing manner may be: after the terminal determines the animations to be fused and arranges them according to the order of the text keywords in the text information, for any two adjacent animations the terminal sets each first specified animation frame of the former animation to a first effect and each second specified animation frame of the latter animation to a second effect. If the last animation frames of the former animation differ obviously from the first animation frames of the latter animation, the synthesized animation will show a jump; to avoid this, the terminal should try to eliminate or reduce the difference between the last animation frames of the former animation and the first animation frames of the latter animation and ensure the continuity of the junction. Accordingly, the last several frames of the former animation may be selected as the first specified animation frames, and the first several frames of the latter animation as the second specified animation frames.
  • Specifically, the terminal may set the first specified animation frames to an effect such as fade-out or box-shaped contraction, and set the effect of the second specified animation frames to the opposite of the effect of the first specified animation frames. For example, when the terminal sets the effect of the first specified animation frames to fade-out, the effect of the second specified animation frames of the latter animation may correspondingly be set to fade-in.
  • After the terminal sets the effects for the first specified animation frames of the former animation and the second specified animation frames of the latter animation, the two animations can be synthesized. When the synthesized animation is played through the first specified animation frames and the second specified animation frames, the effects set on them eliminate or reduce the difference between the animation frames, so that the synthesized animation shows no obvious jump during playback. A minimal fade sketch follows.
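A minimal sketch of the opposite-effect idea, assuming frames are RGB images stored as NumPy arrays and the effect is a simple per-frame brightness ramp; the image representation and linear ramp are illustrative assumptions, and box-shaped contraction or other effects could be set instead.

```python
import numpy as np

def apply_fade(frames, fade_in):
    """Scale the brightness of the given frames so they fade in (or fade out) linearly."""
    n = len(frames)
    faded = []
    for idx, frame in enumerate(frames):
        alpha = (idx + 1) / n if fade_in else 1.0 - idx / n
        faded.append((np.asarray(frame, dtype=float) * alpha).astype(np.uint8))
    return faded

def join_with_fades(prev_anim, next_anim, span=3):
    """Set the last `span` frames of the former animation to a fade-out effect and
    the first `span` frames of the latter animation to a fade-in effect, then splice."""
    head = prev_anim[:-span] + apply_fade(prev_anim[-span:], fade_in=False)
    tail = apply_fade(next_anim[:span], fade_in=True) + next_anim[span:]
    return head + tail

# Usage with toy 2x2 RGB frames.
white = np.full((2, 2, 3), 255, dtype=np.uint8)
gray = np.full((2, 2, 3), 128, dtype=np.uint8)
merged = join_with_fades([white] * 5, [gray] * 5, span=3)
print(len(merged))  # 10 frames, with a fade-out / fade-in junction in the middle
```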
  • Besides the two ways described above, the terminal may also determine, in the two animations, animation frames that are similar to each other, fuse the similar frames into one frame in a certain way, and then synthesize the two animations around the fused frame.
  • Specifically, the terminal may determine the similarity between each animation frame of the former animation and each animation frame of the latter animation, select, according to the determined similarities, a first animation frame from the former animation and a second animation frame from the latter animation, and fuse the first animation frame and the second animation frame to obtain a fused frame, where the selected first animation frame and second animation frame are the pair with the highest similarity between the former animation and the latter animation. The terminal may then synthesize, in order, the animation frames of the former animation located before the first animation frame, the fused frame, and the animation frames of the latter animation located after the second animation frame, to obtain the fused animation.
  • For example, suppose animation C contains 5 animation frames #1 to #5 and animation D contains 7 animation frames *1 to *7. After determining the similarity between each animation frame in animation C and each animation frame in animation D, the terminal finds that frame #3 in animation C and frame *2 in animation D have the highest similarity. The terminal can therefore fuse frame #3 in animation C with frame *2 in animation D to obtain the corresponding fused frame, and then synthesize, in order, frames #1 and #2, the fused frame, and frames *3 to *7 into one animation, while frames #4 and #5 in animation C and frame *1 in animation D are removed accordingly.
  • When determining the similarity between animation frames, the terminal can compute the Euclidean distance between the animation frames. For an ordinary two-dimensional animation, the terminal can construct a feature parameter for each frame from the three primary colors of the image (red, green, blue) and determine the similarity between animation frames by computing the Euclidean distance between the feature parameters; in general, the smaller the Euclidean distance between two animation frames, the greater their similarity. A minimal sketch follows.
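A minimal sketch of the RGB-based similarity for ordinary two-dimensional animations, assuming each frame is an RGB image array and the feature parameter is simply the mean of each colour channel; the choice of per-channel means as the feature parameter is an illustrative assumption.

```python
import numpy as np

def rgb_feature(frame):
    """Feature parameter built from the three primary colours: mean R, G, B values."""
    frame = np.asarray(frame, dtype=float)
    return frame.reshape(-1, 3).mean(axis=0)

def frame_distance(frame_a, frame_b):
    """Euclidean distance between the RGB feature parameters of two frames;
    the smaller the distance, the greater the similarity."""
    return float(np.linalg.norm(rgb_feature(frame_a) - rgb_feature(frame_b)))

def most_similar_pair(prev_anim, next_anim):
    """Return (i, j, distance) of the most similar frame pair across two animations."""
    best = None
    for i, fa in enumerate(prev_anim):
        for j, fb in enumerate(next_anim):
            d = frame_distance(fa, fb)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best

# Usage with toy 4x4 frames.
rng = np.random.default_rng(1)
anim_c = [rng.integers(0, 256, size=(4, 4, 3)) for _ in range(5)]
anim_d = [rng.integers(0, 256, size=(4, 4, 3)) for _ in range(7)]
print(most_similar_pair(anim_c, anim_d))
```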
  • For a three-dimensional animation, the feature parameter of an animation frame cannot simply be constructed from the three primary colors of the image; instead it can be represented by the parameters of the animation frame in the skeletal animation. Specifically, in the embodiment of the present application, when determining the similarity between each animation frame in the former animation and each animation frame in the latter animation, the terminal may determine, for each animation frame in the skeletal animation, the rotational angular velocity vector of each bone, the bone weight of each bone, the rotation vector of each bone, and the intensity coefficient of the animation, and substitute these quantities into a preset distance formula to determine the Euclidean distance between each animation frame of the former animation and each animation frame of the latter animation; the similarity of the animation frames is then determined from the Euclidean distances thus obtained. Here D(i, j) denotes the Euclidean distance between the i-th animation frame of the former animation and the j-th animation frame of the latter animation, and the smaller the Euclidean distance, the greater the similarity between the i-th animation frame of the former animation and the j-th animation frame of the latter animation.
  • In the formula, the rotational angular velocity vector of the n-th bone of the i-th animation frame of the former animation is compared with the rotational angular velocity vector of the n-th bone of the j-th animation frame of the latter animation. The skeletal animations used in practice follow the same skeleton standard; in other words, for two different skeletal animations, the bones of, for example, the hand or the foot are numbered identically, so the n-th bone of the i-th animation frame and the n-th bone of the j-th animation frame mentioned here refer to the same bone, and the number of bones in each animation frame of the former animation is the same as the number of bones in each animation frame of the latter animation. In the formula, w_n denotes the bone weight of the n-th bone; the rotation vector of the n-th bone of the i-th animation frame of the former animation is likewise compared with the rotation vector of the n-th bone of the j-th animation frame of the latter animation; and u denotes the preset animation intensity coefficient.
  • It can be seen that, when calculating the Euclidean distance between two animation frames, the terminal compares each bone in turn from the two aspects of the bone rotation vector and the bone rotational angular velocity vector, so the calculated Euclidean distance is relatively accurate. Of course, the above formula is not unique; other bone parameters can be introduced to determine the Euclidean distance between the animation frames more accurately, and the similarity between the animation frames is then determined through the Euclidean distance. A hedged sketch of such a bone-based distance follows.
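The distance formula itself is not reproduced in this text, so the sketch below only illustrates the kind of computation described: a per-bone comparison of rotation vectors and rotational angular velocity vectors, weighted by the bone weights w_n and an intensity coefficient u. The exact combination of terms is an assumption, not the patent's formula.

```python
import numpy as np

def skeletal_frame_distance(frame_i, frame_j, bone_weights, u=1.0):
    """Distance between two skeletal animation frames, compared bone by bone.

    frame_i, frame_j -- dicts with 'rotation' and 'angular_velocity' arrays of
                        shape (num_bones, 3); bone n refers to the same bone in both
    bone_weights     -- array of per-bone weights w_n
    u                -- preset animation intensity coefficient
    """
    rot_diff = np.linalg.norm(frame_i["rotation"] - frame_j["rotation"], axis=1)
    ang_diff = np.linalg.norm(frame_i["angular_velocity"] - frame_j["angular_velocity"], axis=1)
    # Weighted per-bone combination of the two aspects; the exact form is illustrative.
    per_bone = bone_weights * (rot_diff ** 2 + u * ang_diff ** 2)
    return float(np.sqrt(per_bone.sum()))

# Usage with 3 bones; nearly identical frames give a small distance.
rng = np.random.default_rng(2)
fi = {"rotation": rng.normal(size=(3, 3)), "angular_velocity": rng.normal(size=(3, 3))}
fj = {"rotation": fi["rotation"] + 0.1, "angular_velocity": fi["angular_velocity"]}
print(skeletal_frame_distance(fi, fj, bone_weights=np.array([1.0, 0.5, 0.2]), u=2.0))
```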
  • The similarity between each animation frame of the former animation and each animation frame of the latter animation can also be determined by other means, such as the dot product: after the dot product of two animation frames is calculated, their similarity is determined from it. The specific process is not described in detail here.
  • Determining only the two most similar animation frames across the former animation and the latter animation, as described above, may cause many animation frames to be lost. Continuing the above example, suppose frame #2 in animation C has the highest similarity with frame *5 in animation D; then, in the process of synthesizing animation C and animation D, the terminal discards frames #3 to #5 in animation C and frames *1 to *4 in animation D, that is, 7 animation frames are lost out of the 12 frames that animations C and D contain in total. Because too many frames are dropped, the quality of the animation finally synthesized by the terminal is affected.
  • Therefore, in the embodiment of the present application, the terminal may extract third specified animation frames from the former animation and fourth specified animation frames from the latter animation. The third specified animation frames refer to a contiguous section of animation frames in the former animation; for example, the last several animation frames of the former animation may be selected as the third specified animation frames. Likewise, the fourth specified animation frames refer to a contiguous section of animation frames in the latter animation; the terminal can select the first several animation frames of the latter animation as the fourth specified animation frames. The terminal can then determine the similarity between each third specified animation frame and each fourth specified animation frame, select the two most similar animation frames for fusion according to the similarities, and synthesize the animations through the fused frame.
  • For example, continuing the above case, instead of determining the similarity between every animation frame in animation C and every animation frame in animation D, the terminal can take frames #3 to #5 in animation C and the first several frames in animation D (for example *1 to *3), determine the similarities among them, fuse the most similar pair, and synthesize animation C and animation D according to the obtained fused frame.
  • Since the terminal determines only the similarity between part of the animation frames in the former animation and part of the animation frames in the latter animation, when the terminal synthesizes the animations according to these similarities, the number of dropped frames can be effectively controlled within a certain range, thereby reducing, to a certain extent, the adverse effect of frame dropping on animation synthesis.
  • Although the above synthesis method can reduce the disadvantage caused by frame dropping to a certain extent, the similarities determined by the terminal are only those between part of the animation frames of the former animation and part of the animation frames of the latter animation; even the two animation frames with the highest similarity among them may actually still differ considerably, which in turn degrades the animation synthesized from these two animation frames.
  • Therefore, in the embodiment of the present application, the terminal may determine the two animation frames to be fused from two aspects: the frame loss rate and the similarity. The frame loss rate mentioned here refers to the ratio of the number of frames in an animation that participate in neither the fusion nor the synthesis to the total number of frames of the animation. For example, suppose two animations contain 12 animation frames in total, and when the terminal synthesizes the two animations, 4 animation frames are discarded during the synthesis, that is, these 4 animation frames participate in neither the fusion process nor the synthesis of the two animations; then the frame loss rate of the two animations is 1/3.
  • Specifically, the terminal may first determine the similarity between each animation frame in the former animation and each animation frame in the latter animation, and, for each pair of animation frames, determine the frame loss rate of the synthesized animation that results when these two animation frames are used as the fused frame for the synthesis.
  • The terminal may then determine the first animation frame from the former animation and the second animation frame from the latter animation, where the first animation frame and the second animation frame are the pair that minimizes a·x_ij + b·y_ij. Here, x_ij is the Euclidean distance between the i-th animation frame of the former animation and the j-th animation frame of the latter animation, so x_IJ, the value of x_ij at the minimum, is the Euclidean distance between the first animation frame and the second animation frame; i ranges from 1 to the total number of frames of the former animation, and j ranges from 1 to the total number of frames of the latter animation. y_ij is the integrated frame loss rate determined according to the i-th animation frame and/or the j-th animation frame, so y_IJ, the value of y_ij at the minimum, is the integrated frame loss rate determined according to the first animation frame and/or the second animation frame. a and b are the corresponding coefficients; they can be determined manually and only need to be no less than 0.
  • It should be noted that y_ij does not refer to the actual frame loss rate of the former animation and the latter animation in the actual synthesis process, but is a value capable of characterizing that frame loss rate. Although this value cannot exactly represent the true frame loss rate of the animation synthesis process, it is positively correlated with the frame loss rate during animation synthesis; therefore, when y_ij is small, the frame loss rate obtained by synthesizing the former animation and the latter animation according to the corresponding pair of frames will also be relatively small.
  • Specifically, for the i-th animation frame in the former animation, the terminal may determine an expected frame loss rate of the former animation according to the i-th animation frame and take it as the integrated frame loss rate y_ij; or, for the j-th animation frame in the latter animation, the terminal may determine an expected frame loss rate of the latter animation according to the j-th animation frame and take it as the integrated frame loss rate y_ij. The expected frame loss rate of the former animation mentioned here is the ratio of the number of animation frames in the former animation that, according to the i-th animation frame, participate in neither the fusion nor the synthesis to the total number of frames of the former animation, that is, the proportion of the former animation's frames that are discarded when the former animation and the latter animation are synthesized. Similarly, the expected frame loss rate of the latter animation is the ratio of the number of animation frames in the latter animation that, according to the j-th animation frame, participate in neither the fusion nor the synthesis to the total number of frames of the latter animation, that is, the proportion of the latter animation's frames that are discarded in the process of synthesizing the former animation and the latter animation.
  • In other words, the y_ij described above is the integrated frame loss rate determined by the terminal, in the process of synthesizing the two adjacent animations, according to the i-th animation frame or according to the j-th animation frame. Since the first animation frame and the second animation frame determined in this way are selected on the basis of both the frame loss rate and the similarity, the animation synthesized by the terminal in the above manner can reduce the adverse influence of frame dropping to some extent.
  • However, a frame loss rate determined from only one of the two animations may fail to characterize the overall frame loss when the two animations are synthesized. Suppose that, for two adjacent animations, the animation frames selected from the two animations for fusion give one animation a relatively low frame loss rate while the frame loss rate of the other animation is very high. If the terminal considers only the lower frame loss rate of one of the animations when synthesizing the two animations through the two fused frames, regardless of the fact that the frame loss rate of the other animation will be higher, then after the terminal synthesizes the two animations in this way, the overall frame loss rate of the two animations may still be relatively high, which ultimately affects the effect of the fused animation.
  • Therefore, the terminal may instead determine y_ij as the integrated frame loss rate of the synthesis of the two adjacent animations determined according to both the i-th animation frame and the j-th animation frame; that is, this way of determining y_ij considers the frame loss of both animations in the synthesis. A specific manner may be: when determining the first animation frame and the second animation frame by minimizing a·x_ij + b·y_ij, the terminal selects the i-th animation frame from the former animation and the j-th animation frame from the latter animation, determines the Euclidean distance of the two animation frames as x_ij, and determines y_ij from the expected frame loss rate of the former animation according to the i-th animation frame together with the expected frame loss rate of the latter animation according to the j-th animation frame.
  • When the terminal finds, according to the formula, that a certain pair of animation frames minimizes a·x_ij + b·y_ij, the first animation frame and the second animation frame are determined as that pair of animation frames, and, correspondingly, the x_ij and y_ij of that pair become x_IJ and y_IJ.
  • For example, suppose that, in determining the first animation frame and the second animation frame through the formula, the terminal finds that fusing the 4th animation frame of animation G with the 2nd animation frame of animation H makes a·x_42 + b·y_42 the smallest among all combinations for synthesizing animation G and animation H. When determining the value of y_42, the terminal determines that if animation G is synthesized with animation H according to its 4th animation frame, the 5th and 6th animation frames contained in animation G are discarded, so the expected frame loss rate of animation G determined according to its 4th animation frame is 1/3. Similarly, the terminal determines that if animation H is synthesized with animation G according to its 2nd animation frame, the 1st animation frame contained in animation H is discarded, so the expected frame loss rate of animation H is determined according to its 2nd animation frame (1/4 in this example); the sum of the two expected frame loss rates, 7/12, is taken as the value of y_42. The terminal can also calculate, through the distance formula above, the Euclidean distance between the 4th animation frame of animation G and the 2nd animation frame of animation H, and take the determined value as x_42.
  • Determining y_ij as the sum of the expected frame loss rate of the former animation and the expected frame loss rate of the latter animation of the two adjacent animations is only one option. The average of the two expected frame loss rates may also be taken as y_ij, a weighted sum of the two expected frame loss rates may be taken as y_ij, or the square root of the sum of the two expected frame loss rates may be taken as y_ij. The y_ij may even be the actual frame loss rate of the former animation and the latter animation. The point of y_ij is to characterize the frame loss rate of the two adjacent animations at the time of synthesis, that is, y_ij should be positively correlated with the frame loss rate of the two adjacent animations; therefore, no matter in which manner y_ij is determined, it suffices that the determined y_ij is positively correlated with the frame loss rate after the two adjacent animations are synthesized, and the manner of determination is not unique.
  • Determining the first animation frame and the second animation frame to be fused in this way ensures that the frame loss rate of the animation synthesized through the two animation frames is as low as possible while the two animation frames are as similar as possible, further reducing the impact of dropped frames on animation synthesis. A minimal sketch of this selection follows.
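A minimal sketch of selecting the fusion pair by minimizing a·x_ij + b·y_ij, where x_ij is the frame distance and y_ij is the integrated frame loss rate taken here as the sum of the two expected frame loss rates; the distance function is passed in, and the coefficients a, b and the toy numbers are illustrative assumptions.

```python
def integrated_frame_loss(i, j, len_prev, len_next):
    """Sum of the expected frame loss rates: frames after i in the former animation
    and frames before j in the latter animation would be dropped."""
    dropped_prev = len_prev - 1 - i      # frames following the i-th frame
    dropped_next = j                     # frames preceding the j-th frame
    return dropped_prev / len_prev + dropped_next / len_next

def pick_fusion_pair(prev_frames, next_frames, distance, a=1.0, b=1.0):
    """Return (i, j) minimizing a * x_ij + b * y_ij over all frame pairs."""
    best, best_cost = None, float("inf")
    for i, fp in enumerate(prev_frames):
        for j, fn in enumerate(next_frames):
            x_ij = distance(fp, fn)
            y_ij = integrated_frame_loss(i, j, len(prev_frames), len(next_frames))
            cost = a * x_ij + b * y_ij
            if cost < best_cost:
                best, best_cost = (i, j), cost
    return best

# Usage with 1-D toy "frames" and absolute difference as the distance.
prev_frames = [0.0, 0.2, 0.5, 0.9, 1.4, 2.0]   # 6 frames in the former animation
next_frames = [1.1, 0.95, 1.6, 2.2]            # 4 frames in the latter animation
print(pick_fusion_pair(prev_frames, next_frames, distance=lambda p, q: abs(p - q)))
```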
  • In practice, the pair of animation frames determined to be fused may not be unique; several pairs of animation frames may be determined. When encountering such a situation, the terminal may further select, from the pairs of animation frames, the pair with the highest similarity for fusion, or the pair with the lowest frame loss rate for fusion. Specifically, the terminal may determine a third animation frame among the candidate first animation frames and a fourth animation frame among the candidate second animation frames, where the similarity between the third animation frame and the fourth animation frame is the highest, or where the frame loss rate after the animations are synthesized according to the third animation frame and the fourth animation frame is the lowest. Since the animation frames to be fused are determined precisely in order to reduce the adverse effect of dropped frames as much as possible, whether the animations are synthesized according to the pair with the highest similarity (that is, the smallest Euclidean distance) or the pair with the lowest frame loss rate, the resulting synthesized animation reduces the adverse effect of frame dropping as effectively as possible.
  • Further, in order to reduce the adverse effect caused by frame loss, after determining the two animation frames to be fused in the above manner, the terminal may also fuse the animation frames lying between the two animation frames to be fused with each other in a certain way, so that no frame dropping occurs at all in the finally synthesized animation.
  • Specifically, during the animation synthesis, according to the first animation frame and the second animation frame, the terminal may select from the former animation the first animation frame and the k animation frames located after it, and sort the selected animation frames according to their order in the former animation to obtain a first frame sequence; similarly, the terminal may select from the latter animation the k animation frames located before the second animation frame together with the second animation frame, and sort the selected animation frames according to their order in the latter animation to obtain a second frame sequence. The terminal may then fuse the animation frames with the same sequence number in the first frame sequence and the second frame sequence to obtain k+1 fused frames, and synthesize, in order, the animation frames located before the first animation frame in the former animation, the k+1 fused frames, and the animation frames located after the second animation frame in the latter animation.
  • Continuing the above example, suppose the first animation frame determined in animation C is #3, the second animation frame determined in animation D is *3, and k = 2. The animation frames with the same sequence number in the two frame sequences are then fused pairwise, that is, frame #3 is fused with frame *1, frame #4 with frame *2, and frame #5 with frame *3, giving 3 fused frames. After determining the fused frames, the terminal can synthesize, in order, frames #1 and #2 in the former animation, the 3 fused frames, and frames *4 to *7 in the latter animation, thereby obtaining the synthesized animation.
  • When fusing the animation frames to be fused, the terminal can use a fusion formula. Having reduced the frame loss rate in the animation synthesis as much as possible, and in order to ensure that the synthesized animation shows no obvious jump in its effect, the terminal calculates a fusion coefficient for the animation frames participating in the fusion and fuses the animation frames accordingly, which guarantees the display effect of each fused frame in the synthesized animation and reduces the disadvantages introduced in the animation synthesis process. A hedged blending sketch follows.
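The fusion formula itself is not reproduced in this text, so the sketch below only illustrates the idea of fusing the k+1 corresponding frame pairs with a per-pair fusion coefficient, chosen here as a linear ramp so the result eases from the former animation's frames into the latter animation's frames; the linear coefficients are an assumption, not the patent's formula.

```python
import numpy as np

def fuse_adjacent_animations(prev_anim, next_anim, i, j, k):
    """Fuse two adjacent animations without dropping any frame.

    i -- index of the first animation frame in the former animation
    j -- index of the second animation frame in the latter animation
    k -- number of extra frames taken after i (former) and before j (latter)
    """
    prev_seq = prev_anim[i:i + k + 1]          # first frame and the k frames after it
    next_seq = next_anim[j - k:j + 1]          # k frames before the second frame and it
    fused = []
    for idx, (fp, fn) in enumerate(zip(prev_seq, next_seq)):
        # Per-pair fusion coefficient, ramping from the former toward the latter.
        w = (idx + 1) / (k + 2)
        fused.append((1.0 - w) * np.asarray(fp, float) + w * np.asarray(fn, float))
    # Frames before the first frame, the k+1 fused frames, frames after the second frame.
    return list(prev_anim[:i]) + fused + list(next_anim[j + 1:])

# Usage mirroring the #1..#5 / *1..*7 example: i = 2 (frame #3), j = 2 (frame *3), k = 2.
anim_c = [np.full(4, v) for v in range(1, 6)]        # frames #1..#5
anim_d = [np.full(4, 10 + v) for v in range(1, 8)]   # frames *1..*7
result = fuse_adjacent_animations(anim_c, anim_d, i=2, j=2, k=2)
print(len(result))  # 2 + 3 + 4 = 9 frames; every original frame participates
```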
  • After the terminal synthesizes the animations corresponding to the text keywords according to the order in which the text keywords are arranged in the text information, it can display the obtained fused animation, publish the fused animation as information on a social platform, or send it to other users as a chat message.
  • Further, in order to improve the playback effect of the fused animation, the terminal may also determine the effect information corresponding to the text information and adjust the fused animation through the effect information. The effect information mentioned here may be background music, sound effects of the fused animation, or voice information corresponding to the text information; the specific methods of determining each kind of effect information and how to adjust the fused animation through them are described in detail below.
  • For background music, the terminal can determine, from a preset music library, the music corresponding to each recognized text keyword. A specific manner may be: the text keyword is matched against the music keywords corresponding to the pieces of music in the music library, and the music whose music keyword matches the text keyword is taken as the music corresponding to the text keyword; alternatively, for each text keyword, the similarity between the text keyword and each music keyword is calculated, and the music matching the text keyword is selected according to the calculated similarities. Since the terminal may determine several pieces of music corresponding to one text keyword, the terminal may further screen the pieces of music corresponding to the text keyword according to the feature information of the text information so as to select the music that better fits the text as a whole; the specific screening manner is the same as the animation screening described above and is not repeated here.
  • When calibrating the music keywords of the music in the music library, the terminal may determine, for each piece of music, features capable of representing the music, for example by expressing the features of the music through Mel-frequency cepstral coefficients (MFCC). The terminal may then input the determined features of each piece of music into a preset music model and determine the music keyword corresponding to the music according to the output of the music model; the specific process is the same as the above method of determining animation keywords and is not described in further detail here. After that, the terminal can store each piece of music in association with its music keywords in the preset music library for later use. The music keywords corresponding to each piece of music can of course also be determined manually, that is, the music keywords of each piece of music are calibrated manually and stored correspondingly in the preset music library. A minimal keyword-matching sketch follows.
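A minimal sketch of the similarity-based path for picking background music for a text keyword, assuming a crude word-overlap similarity between keywords; the library layout and the Jaccard-style score are illustrative assumptions, and an exact match against the pre-calibrated music keywords or an MFCC-based model could be used instead.

```python
def keyword_similarity(a, b):
    """Very rough similarity between two keywords: Jaccard overlap of their words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def pick_music(music_library, text_keyword):
    """music_library maps music name -> list of pre-calibrated music keywords.
    Returns the music whose best-matching keyword is most similar to the text keyword."""
    best_name, best_score = None, 0.0
    for name, keywords in music_library.items():
        score = max((keyword_similarity(text_keyword, kw) for kw in keywords), default=0.0)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Usage with a toy music library.
library = {
    "calm_piano":   ["fishing", "calm"],
    "stadium_rock": ["playing football", "sports"],
}
print(pick_music(library, "football"))  # stadium_rock
```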
  • After the music corresponding to each text keyword is determined, the pieces of music can be synthesized according to the order of the text keywords in the text information to obtain the corresponding fused music. The manner of synthesizing music is basically the same as the manner of synthesizing animations; for example, the terminal can realize the transition between the pieces of music in the fused music by setting playback effects such as fade-out and fade-in, or by determining the fusion coefficients of the pieces of music.
  • After the fused music is obtained, the terminal may synthesize the fused music into the fused animation to further improve the playback effect of the fused animation. A specific manner may be: the terminal adjusts the playback speed of the fused music according to the playback speed of the fused animation, so that the fused music and the fused animation are synchronized in playback speed; or the terminal plays the fused music in a loop in the fused animation at a certain playback speed; or the terminal adjusts the playback speed of the fused music and, based on the text keywords, associates the music in the fused music with the animations in the fused animation, thereby completing the composition of the fused music and the fused animation.
  • It should be noted that the terminal may select music models of different dimensions for the determination. For example, when a sports-related music model is selected, the music keywords finally determined by the terminal through the music model for the pieces of music are sports-related; when an emotion-related music model is selected, the finally determined music keywords corresponding to the pieces of music are emotion-related. Therefore, for each piece of music, the music keywords determined by the terminal through music models of different dimensions may be more than one, which lays the foundation for the terminal to subsequently screen the music through the feature information of the text information. The music models mentioned above can be obtained by training on a large amount of collected sample music; the training manner is similar to the training of the other models described above and is not detailed here.
  • In addition to determining the background music of the fused animation in the manner described above, the terminal can also determine one overall piece of background music for the fused animation through the feature information of the text information, and then fuse that background music into the fused animation.
  • For sound effects, the terminal can adjust the sound effect of the fused music by monitoring the animation parameters in the fused animation. For example, when the terminal detects that the animation parameters of a certain time period change too fast, the fused music corresponding to that time period can be made more dynamic in its sound effect; or, when the characters in the fused animation perform actions such as clapping, stamping, or panting, the terminal can fuse the sound effects corresponding to these actions into the fused music. Of course, other adjustment manners may also be used, which are not enumerated here.
  • The fused music whose sound effect has been adjusted can then be synthesized into the fused animation, so that the sound effects further enhance the effect of the fused animation and bring more fun to the user.
  • For voice information, the text information input by the user usually contains certain specified characters, such as a colon followed by quotation marks, book title marks, and the like, and the text following the specified characters is usually a special piece of text; for example, a colon followed by double quotation marks is usually followed by an utterance. Therefore, the terminal can process the piece of text following the specified characters and insert the obtained effect information into the fused animation.
  • A specific manner may be: the terminal determines the specified characters contained in the text information; the specified characters mentioned here may be a colon followed by double quotation marks. According to the specified characters, the terminal may extract from the text information the piece of sub-text information following the specified characters, convert the sub-text information into corresponding speech through a speech synthesis function, and insert the speech, or the sub-text information corresponding to the speech, into the fused animation as effect information. For the determined speech, the terminal may synthesize the speech into the fused animation so as to dub the fused animation.
  • For the sub-text information, the terminal can insert it into the fused animation in a preset display manner, as shown in FIG. 2.
  • FIG. 2 is a schematic diagram showing the display of utterance information in a fused animation according to an embodiment of the present application.
  • When the terminal determines that the sub-text information following the colon and double quotation marks in the text information is an utterance, the sub-text information can be used as the utterance of the character in the fused animation and placed in a specified dialog box displayed above the character in the fused animation. Of course, the sub-text information can also be displayed in the fused animation through bubbles, clouds, and the like, to enhance the display effect and fun of the fused animation.
  • The specified characters in the above description are not limited to the colon and double quotation marks ":""; they may also be a designated combination such as "think:". When the terminal determines that the text information contains the word "think" used together with a colon ":", it can determine that the subsequent sub-text information describes the inner thoughts of a character in the fused animation, so this sub-text information can be presented as inner monologue in the fused animation and displayed in a suitable form, as illustrated in the sketch after this list.
  • The specified characters can likewise be combined with other words, such as "say", "ask", and so on, which are not described in detail here.
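A minimal sketch of the sub-text extraction described above is given below. The marker patterns only cover the colon-plus-quotation-marks and "think:" examples mentioned in the text; real marker sets and the downstream handling (dialog box, thought cloud, dubbing) are assumptions.

```python
import re

# Hypothetical marker patterns; further markers such as "say" or "ask" could be added.
MARKERS = [
    (r'[:：]\s*[“"]([^”"]+)[”"]', "utterance"),                 # colon followed by a quoted utterance
    (r'(?:think|想)\s*[:：]\s*(\S[^。！？!?]*)', "inner_thought"),  # "think:" style marker
]

def extract_subtexts(text: str):
    """Return (kind, sub_text) pairs found after the specified characters."""
    results = []
    for pattern, kind in MARKERS:
        for match in re.finditer(pattern, text):
            results.append((kind, match.group(1).strip()))
    return results

# Example: the quoted part becomes a dialog-box utterance,
# the "think:" part becomes an inner monologue shown in a thought cloud.
print(extract_subtexts('小明说："我们去踢球吧" 他想：明天会下雨'))
```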
  • The terminal may also treat the entire piece of text information input by the user as an utterance, extract the corresponding voice feature information from the text information, and then determine the mouth shape corresponding to each piece of voice feature information. The mouth shape mentioned here means that, in general, different syllables correspond to different mouth shape categories, and each mouth shape category corresponds to its own mouth shape animation. Since the pronunciation of a word is usually formed by the pronunciation of several syllables, the lip animation corresponding to a word should likewise be composed of the animations corresponding to the mouth shape categories of those syllables. After the terminal determines each mouth shape category, the mouth shape animation corresponding to each word in the text information is determined accordingly, and the mouth shape animation of each word is then synthesized into the fused animation as effect information, as shown in FIG. 3.
  • FIG. 3 is a schematic diagram of a lip animation provided by an embodiment of the present application.
  • The terminal can synthesize each mouth shape animation into the fused animation according to the position, in the text information, of the single word from which the corresponding voice feature information was extracted. One possible synthesis method is to adjust the size of each mouth shape animation to the character in the fused animation and then replace the character's mouth shapes in turn, thereby obtaining a fused animation in which the voice and the mouth shapes match.
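The syllable-to-mouth-shape mapping and scheduling described above could be sketched as follows. The syllable table, the clip names and the per-syllable timing are illustrative assumptions; a real system would cover the full syllable inventory and take timing from the synthesized speech.

```python
from typing import Dict, List, Tuple

# Hypothetical syllable-to-mouth-shape-category table.
SYLLABLE_TO_VISEME: Dict[str, str] = {"ni": "closed_wide", "hao": "open_round", "ma": "open_wide"}

# Hypothetical mouth shape animation clips, one per mouth shape category.
VISEME_CLIPS: Dict[str, str] = {
    "closed_wide": "mouth_closed_wide.anim",
    "open_round": "mouth_open_round.anim",
    "open_wide": "mouth_open_wide.anim",
}

def lip_sync_plan(words: List[Tuple[str, float, List[str]]]):
    """words: (word, start_time_in_seconds, syllables). Returns the mouth shape
    clips to overlay on the character, in playback order."""
    plan = []
    for word, start, syllables in words:
        for k, syllable in enumerate(syllables):
            category = SYLLABLE_TO_VISEME.get(syllable, "open_wide")  # fallback shape
            plan.append((start + 0.15 * k, word, VISEME_CLIPS[category]))
    return plan

print(lip_sync_plan([("你好", 0.0, ["ni", "hao"]), ("吗", 0.4, ["ma"])]))
```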
  • The embodiment of the present application further provides an animation synthesis device, as shown in FIG. 4.
  • FIG. 4 is a schematic diagram of an apparatus for animation synthesis according to an embodiment of the present application, which specifically includes:
  • a receiving module 401 configured to receive input text information
  • the identification module 402 is configured to identify each text keyword in the text information
  • a determining module 403 configured to respectively determine an animation corresponding to each text keyword from a preset animation library
  • the compositing module 404 is configured to synthesize the determined animations to obtain a fused animation.
  • The determining module 403 is specifically configured to: extract feature information in the text information; and, for each text keyword, determine, from the preset animation library and according to the text keyword and the feature information, an animation corresponding to both the text keyword and the feature information.
  • the synthesizing module 404 is specifically configured to synthesize the determined animations according to the order of the keywords in the text information.
  • The synthesizing module 404 is specifically configured to: for any two adjacent animations, determine a transition animation segment to be inserted between the previous animation and the latter animation, and synthesize the previous animation, the transition animation segment, and the latter animation in sequence; or set each first specified animation frame of the previous animation to a first effect, set each second specified animation frame of the latter animation to a second effect, and combine the previous animation and the latter animation after the effects are set, wherein the first effect includes at least a fade-out effect and the second effect includes at least a fade-in effect.
  • The synthesizing module 404 is specifically configured to: select, from the previous animation, a first animation frame and the k animation frames located after it, and sort the selected animation frames in their order within the previous animation to obtain a first frame sequence; select, from the latter animation, a second animation frame and the k animation frames located before it, and sort the selected animation frames in their order within the latter animation to obtain a second frame sequence; fuse the animation frames with the same sequence number in the first frame sequence and the second frame sequence to obtain k+1 fused frames; and synthesize the animation frames located before the first animation frame in the previous animation, the fused frames, and the animation frames located after the second animation frame in the latter animation; wherein k is a positive integer.
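A minimal sketch of the k+1-frame boundary fusion just described is given below, treating each frame as a plain parameter vector. The linear cross-fade weights are an assumption; the original text does not fix how paired frames are blended.

```python
from typing import List, Sequence

def fuse_boundary(prev_frames: List[Sequence[float]],
                  next_frames: List[Sequence[float]],
                  first_idx: int, second_idx: int, k: int) -> List[List[float]]:
    """Fuse the frame at first_idx plus the k frames after it (previous animation)
    with the k frames before second_idx plus the frame at second_idx (latter
    animation), pairing frames with the same sequence number, and return the
    previous frames before first_idx + the k+1 fused frames + the latter frames
    after second_idx."""
    seq_a = prev_frames[first_idx:first_idx + k + 1]        # first frame sequence
    seq_b = next_frames[second_idx - k:second_idx + 1]      # second frame sequence
    fused = []
    for m, (fa, fb) in enumerate(zip(seq_a, seq_b)):
        w = m / k if k else 1.0                             # weight shifts toward the latter animation
        fused.append([(1 - w) * a + w * b for a, b in zip(fa, fb)])
    return prev_frames[:first_idx] + fused + next_frames[second_idx + 1:]
```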
  • the device also includes:
  • the effect determining module 405 is configured to determine effect information corresponding to the text information, and adjust the fusion animation according to the effect information corresponding to the text information.
  • the effect determining module 405 is specifically configured to determine, according to the identified text keywords, music that matches the text keywords from the preset music library.
  • The effect determining module 405 is specifically configured to synthesize the determined music according to the order of the text keywords in the text information to obtain the fused music, and to synthesize the fused music into the fused animation.
  • the effect determining module 405 is specifically configured to: monitor each animation parameter corresponding to the fused animation; adjust the sound effect of the fused music according to each animation parameter; and synthesize the fused music after adjusting the sound effect into the fused animation.
  • The effect determining module 405 is specifically configured to: extract each piece of voice feature information from the text information; determine, according to each piece of voice feature information, the mouth shape category corresponding to it; determine the mouth shape animation corresponding to each mouth shape category; and use the respective mouth shape animations as the determined effect information.
  • The effect determining module 405 is specifically configured to synthesize each mouth shape animation into the fused animation according to the position, in the text information, of the single word from which the corresponding voice feature information was extracted.
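For orientation, the module decomposition above could be mirrored by a skeleton of the following kind. Every method body here is a placeholder standing in for the behavior described in the preceding paragraphs; the class and argument names are not part of the original disclosure.

```python
class AnimationSynthesisDevice:
    """Skeleton mirroring modules 401-405; every step below is a placeholder."""

    def __init__(self, animation_library, music_library):
        self.animation_library = animation_library   # preset animation library
        self.music_library = music_library           # preset music library

    def receive(self, text: str) -> str:             # receiving module 401
        return text

    def identify_keywords(self, text: str):          # identification module 402
        return [w for w in text.split() if w in self.animation_library]

    def determine_animations(self, keywords):        # determining module 403
        return [self.animation_library[k] for k in keywords]

    def synthesize(self, animations):                # compositing module 404
        return {"fused_animation": animations}

    def apply_effects(self, fused, text):            # effect determining module 405
        fused["music"] = [self.music_library[w] for w in text.split() if w in self.music_library]
        return fused
```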
  • the embodiment of the present application provides a method and an apparatus for synthesizing an animation.
  • In the method, a terminal can receive text information input by a user and identify each text keyword from the text information; the terminal can then determine, from a preset animation library, the animation corresponding to each text keyword, and synthesize the animations according to the order of the keywords in the text information to obtain a fused animation. Since an animation can express the meaning of information more fully and vividly than text, compared with presenting information only in the form of text or voice, the animation converted from the text information can express the meaning of the information itself more fully and vividly, thereby bringing the user enjoyment and convenience in the process of reading the information.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • These computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media include persistent and non-persistent, removable and non-removable media, in which information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed are an animation synthesis method and device. According to the method, a terminal can receive text information inputted by a user and identify text keywords from the text information; then, the terminal can determine animations corresponding to the text keywords from a preset animation library, and synthesize the animations according to the sequence of the keywords in the text information, to obtain a fused animation. Compared with text information, an animation can more fully and vividly express the meaning of information. Therefore, compared with the approach in the prior art of only presenting information in the form of a text or voice, an animation converted from text information can more fully and vividly express the real meaning of the information, bringing pleasure and convenience to a user during an information reading process.

Description

一种动画合成的方法及装置Method and device for synthesizing animation
本申请要求于2016年09月14号提交中国专利局、申请号为201610823313.6、发明名称为“一种动画合成的方法及装置”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims priority to Chinese Patent Application No. 201610823313.6, filed with the Chinese Patent Office on September 14, 2016 and entitled "Method and Device for Animation Synthesis", the entire contents of which are incorporated herein by reference.
技术领域Technical field
本申请涉及计算机技术领域,尤其涉及一种动画合成的方法及装置。The present application relates to the field of computer technology, and in particular, to a method and an apparatus for animation synthesis.
背景技术Background technique
随着网络技术以及通讯技术的不断发展,无线保真(WIreless-Fidelity,WIFI)、3G、4G等上网方式不断被普及,现在,人们可以随时随地的通过WIFI、4G等上网方式来进行上网、发布信息,时刻享受着信息时代所带来的便利。With the continuous development of network technology and communication technology, wireless access (WIreless-Fidelity, WIFI), 3G, 4G and other Internet access methods have been popularized. Now, people can access the Internet through WIFI, 4G and other Internet access methods anytime, anywhere. Release information and enjoy the convenience brought by the information age.
当前,即时通讯(Instant Messaging,IM)软件或是微博等社交软件的用户群体正不断的增加,一方面由于其功能愈发的强大,另一方面,这些软件可以不断的拓宽用户的社交关系,并在一定程度上实现了信息共享,从而进一步实现了用户在信息时代的信息浏览需求。At present, the user groups of instant messaging (IM) software or social software such as Weibo are increasing. On the one hand, due to their increasingly powerful functions, on the other hand, these software can continuously expand the social relationship of users. And to a certain extent, the information sharing is realized, thereby further realizing the information browsing needs of users in the information age.
人们在使用IM软件、微博等社交软件发布信息时,所发布的信息通常是以以下两种方式呈现的:第一种,用户在社交软件的界面中输入相应的文本信息并将其发布,这样,用户发布的信息以文字的形式进行呈现;第二种,用户通过社交软件(尤其是IM软件)中的语音发送功能,将自己的语音作为信息进行发布。这两种信息发布形式虽然都能有效的保证信息的正常呈现,然而,无论是文本信息还是语音信息,在信息的表达形式上都过于单一,并且、文本信息或是语音信息往往也不能充分的表达出信息的完整含义,这就给用户在浏览这些信息的过程中带来的不便。 When people use IM software, Weibo and other social software to publish information, the information is usually presented in the following two ways: First, the user enters the corresponding text information in the interface of the social software and publishes it. In this way, the information published by the user is presented in the form of text; secondly, the user issues his own voice as information through the voice transmission function in the social software (especially the IM software). Although these two forms of information release can effectively guarantee the normal presentation of information, however, both text information and voice information are too singular in the form of information expression, and text information or voice information is often insufficient. Expressing the full meaning of the information, this gives users the inconvenience of browsing this information.
发明内容Summary of the invention
本申请实施例提供一种动画合成的方法以装置,用于解决现有技术中文本信息或语音信息不能充分表达含义而给用户在浏览该信息的过程中带来不便的问题。The embodiment of the present invention provides a method for synthesizing an animation, which is used to solve the problem that the text information or the voice information in the prior art cannot fully express the meaning and cause inconvenience to the user in browsing the information.
本申请实施例提供一种动画合成的方法,包括:The embodiment of the present application provides a method for animation synthesis, including:
接收输入的文本信息;Receiving input text information;
识别所述文本信息中的各文本关键词;Identifying each text keyword in the text information;
从预设的动画库中分别确定出各文本关键词所对应的动画;Determining an animation corresponding to each text keyword from a preset animation library;
将确定出的各动画进行合成,得到融合动画。The determined animations are combined to obtain a fused animation.
本申请实施例提供一种动画合成的装置,包括:An embodiment of the present application provides an apparatus for animation synthesis, including:
接收模块,用于接收输入的文本信息;a receiving module, configured to receive input text information;
识别模块,用于识别所述文本信息中的各文本关键词;An identification module, configured to identify each text keyword in the text information;
确定模块,用于从预设的动画库中分别确定出各文本关键词所对应的动画;a determining module, configured to respectively determine an animation corresponding to each text keyword from a preset animation library;
合成模块,用于将确定出的各动画进行合成,得到融合动画。A synthesis module for synthesizing the determined animations to obtain a fusion animation.
本申请实施例提供了一种动画合成的方法及装置,该方法中终端可接收用户输入的文本信息,并从该文本信息中识别出各文本关键词,而后,终端可从预设的动画库中分别确定出各文本关键词所对应的动画,并将各动画按照各关键词在文本信息中的排列顺序进行合成,得到融合动画。由于动画相对于文本信息来说,能够更加充分、生动的表达出信息中的含义,因此,相对于现有技术中只是将信息以文本或语音的形式进行呈现的方式来说,通过转化文本信息而得到的动画能够更加充分、生动的表达出信息本身的含义,从而给用户在阅读信息的过程中带来了乐趣以及便利。The embodiment of the present application provides a method and an apparatus for synthesizing an animation. In this method, a terminal can receive text information input by a user, and identify each text keyword from the text information, and then the terminal can obtain a preset animation library. The animations corresponding to the respective text keywords are respectively determined, and each animation is synthesized according to the arrangement order of the keywords in the text information to obtain a fusion animation. Since the animation can express the meaning of the information more fully and vividly with respect to the text information, the text information is converted by the way of presenting the information in the form of text or voice. The obtained animation can more fully and vividly express the meaning of the information itself, thereby bringing the user the fun and convenience in the process of reading the information.
附图说明 DRAWINGS
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:The drawings described herein are intended to provide a further understanding of the present application, and are intended to be a part of this application. In the drawing:
图1为本申请实施例提供的动画合成的过程;1 is a process of animation synthesis provided by an embodiment of the present application;
图2为本申请实施例提供的融合动画中话语信息的显示示意图;2 is a schematic diagram showing display of utterance information in a fusion animation according to an embodiment of the present application;
图3为本申请实施例提供的口型动画的示意图;3 is a schematic diagram of a mouth animation provided by an embodiment of the present application;
图4为本申请实施例提供的一种动画合成的装置示意图。FIG. 4 is a schematic diagram of an apparatus for animation synthesis according to an embodiment of the present application.
具体实施方式detailed description
为使本申请的目的、技术方案和优点更加清楚,下面将结合本申请具体实施例及相应的附图对本申请技术方案进行清楚、完整地描述。显然,所描述的实施例仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions of the present application will be clearly and completely described in the following with reference to the specific embodiments of the present application and the corresponding drawings. It is apparent that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without departing from the inventive scope are the scope of the present application.
以下结合附图,详细说明本申请各实施例提供的技术方案。The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
图1为本申请实施例提供的动画合成的过程,具体包括以下步骤:FIG. 1 is a process of animation synthesis provided by an embodiment of the present application, which specifically includes the following steps:
S101:接收输入的文本信息。S101: Receive input text information.
在实际应用中,用户通常会在微博等社交软件上发布一些文本信息,或是通过IM软件,向其他用户发送文本形式的聊天信息,由于文本信息在表现形式上过于单一,且能够表达出的含义有限,因此,在本申请实施例中,终端可将用户输入的文本信息转换成动画,以通过该动画更加充分、生动的表达出信息本身的含义。为此,终端可先接收用户输入的文本信息,其中,这里提到的终端可以是诸如智能手机、平板电脑等智能设备,当然,用户也可在终端中的 客户端中输入文本信息。In practical applications, users usually post some text information on social software such as Weibo, or send text chat information to other users through IM software. Because text information is too singular in expression, and can express The meaning of the information is limited. Therefore, in the embodiment of the present application, the terminal can convert the text information input by the user into an animation, so that the meaning of the information itself is more fully and vividly expressed by the animation. To this end, the terminal may first receive text information input by the user, wherein the terminal mentioned here may be a smart device such as a smart phone or a tablet computer, and of course, the user may also be in the terminal. Enter text information in the client.
需要说明的是,在本申请实施例中,将文本信息转换为相应动画的工作也可由终端中的客户端、App等应用来完成,而为了方便清楚、方便说明本申请实施例所提供的动画合成的方法,后续则仅以终端为例进行详细阐述。It should be noted that, in the embodiment of the present application, the work of converting the text information into the corresponding animation may also be performed by an application such as a client or an application in the terminal, and the animation provided by the embodiment of the present application is illustrated for convenience and convenience. The method of synthesis is followed by a detailed description of the terminal.
S102:识别所述文本信息中的各文本关键词。S102: Identify each text keyword in the text information.
由于文本信息中通常都会包含有多个词组,每个词组在实际中所对应的动画也有所不同,例如,假设文本信息为“小明昨天踢球的时候下雨了”,从这段文本信息中可以看出,该文本信息可能涉及的动画有下雨的动画以及小明踢球的动画,因此,这段文本信息所应表达出的动画应为这两个动画合成的结果。基于此,终端在将接收到的文本信息转化为动画之前,应从该文本信息中识别出各文本关键词,其目的在于,以识别文本关键词的方式来确定出该文本信息可能会涉及到的动画,继而在后续过程中,将确定出的各动画进行合成,得到该文本信息对应的融合动画。Since the text information usually contains multiple phrases, the animation corresponding to each phrase in the actual is also different. For example, if the text information is "Xiao Ming was raining when playing yesterday," from this text message. It can be seen that the animation that the text information may involve has an animation of raining and an animation of Xiaoming kicking the ball. Therefore, the animation that should be expressed by this text information should be the result of the synthesis of the two animations. Based on this, the terminal should identify each text keyword from the text information before converting the received text information into an animation, the purpose of which is to identify the text keyword to determine the text information may be involved in. The animation, and then in the subsequent process, the determined animations are combined to obtain a fusion animation corresponding to the text information.
具体的,终端在接收到用户输入的文本信息后,可将该文本信息进行分词,得到若干个词组,然后,通过预先保存的各词组对应的逆向文本概率IDF值,以及各词组的词频TF,从各词组中确定出该文本信息中包含的文本关键词,具体的实现方式可以是,将各词组分别输入到预设的TF-IDF模型中,而预设的TF-IDF模型可针对每个词组,确定出该词组对应的逆向文本概率IDF值以及词频TF,并通过计算两者的乘积得到该词组的重要表征值,而后,预设的TF-IDF模型可将计算出的各词组分别对应的各重要表征值进行输出,而终端则可将各词组按照重要表征值的大小进行排序,并将靠前的几个词组作为该文本信息的文本关键词。 Specifically, after receiving the text information input by the user, the terminal may segment the text information to obtain a plurality of phrases, and then pass the reverse text probability IDF value corresponding to each phrase saved in advance, and the word frequency TF of each phrase. The text keywords included in the text information are determined from each phrase. The specific implementation manner may be: inputting each phrase into a preset TF-IDF model, and the preset TF-IDF model may be for each The phrase determines the inverse text probability IDF value corresponding to the phrase and the word frequency TF, and obtains the important representation value of the phrase by calculating the product of the two, and then the preset TF-IDF model can respectively correspond to the calculated phrases. The important characterization values are output, and the terminal can sort the phrases according to the size of the important characterization values, and use the first few phrases as the text keywords of the text information.
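By way of illustration, the TF-IDF ranking just described might be computed as in the following minimal sketch. The segmentation and the IDF values shown are made up for the example; in the method they come from a preset segmenter and pre-stored IDF tables.

```python
from collections import Counter

def extract_keywords(words, idf_table, top_n=3):
    """words: the segmented text; idf_table: pre-stored inverse document frequency
    per word.  Importance = TF * IDF, and the highest-scoring words are taken as
    the text keywords."""
    counts = Counter(words)
    total = len(words)
    scores = {w: (c / total) * idf_table.get(w, 0.0) for w, c in counts.items()}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]]

# Illustrative only: segmentation and IDF values would come from a real corpus.
idf = {"踢球": 3.2, "下雨": 2.8, "昨天": 1.1, "小明": 0.9, "的": 0.01, "时候": 0.5}
print(extract_keywords(["小明", "昨天", "踢球", "的", "时候", "下雨"], idf))
```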
除此之外,也可通过预先训练的识别模型,从各词组中确定出该文本信息的文本关键词,其中,预先训练的识别模型可以是诸如隐马尔克夫模型(Hidden Markov Model,HMM)等机器学习模型。通过预先训练的识别模型来确定文本关键词的方式为现有技术,因此,在这里就不做过多的阐述了。In addition, the text keyword of the text information may also be determined from each phrase through a pre-trained recognition model, wherein the pre-trained recognition model may be a Hidden Markov Model (HMM) or the like. Machine learning model. The manner in which the text keywords are determined by the pre-trained recognition model is prior art, and therefore, no overstatement is made here.
S103:从预设的动画库中分别确定出各文本关键词所对应的动画。S103: Determine an animation corresponding to each text keyword from a preset animation library.
由于本申请实施例意在将用户输入的文本信息转化为相应的动画,因此,终端在确定出该文本信息中包含的各文本关键词后,可从预设的动画库中确定出各文本关键词对应的各动画,进而在后续过程中,将确定出的各动画进行合成,得到该文本信息对应的动画。The embodiment of the present application is intended to convert the text information input by the user into a corresponding animation. Therefore, after determining the text keywords included in the text information, the terminal may determine each text key from the preset animation library. Each animation corresponding to the word, and then in the subsequent process, the determined animations are combined to obtain an animation corresponding to the text information.
具体的,终端在确定出该文本信息中包含的各文本关键词后,可针对每个文本关键词,分别确定出预设动画库中各动画对应的各动画关键词与该文本关键词的各相似度,其中,预设动画库中各动画对应的各动画关键词可以通过人为的方式事先进行标定,如,假设某一动画中显示的内容为一个人在打篮球,则可通过人工的方式将该动画对应的动画关键词标定为体育,并将该动画以及动画关键词体育对应起来存储在预设的动画库中。Specifically, after determining the text keywords included in the text information, the terminal may separately determine, for each text keyword, each animation keyword corresponding to each animation in the preset animation library and each of the text keywords. Similarity, wherein each animation keyword corresponding to each animation in the preset animation library can be calibrated in advance by an artificial method. For example, if the content displayed in an animation is played by a person, the manual manner can be manually The animation keyword corresponding to the animation is categorized as sports, and the animation and the animation keyword sports are correspondingly stored in a preset animation library.
除此之外,在本申请实施例中,预设动画库中各动画对应的各动画关键词也可通过预先训练的第一分类模型进行标定。具体的,终端可先将预先保存的各动画分别转换为相应的特征向量,其中,将动画转换为相应的特征向量可以通过以下方式:在实际应用中,每个动画的时长和剧烈程度都不尽相同,而在每个动画中,动画帧间变化量最大的几个动画帧往往是最能够显著区分于其他动画的,因此,在本申请实施例中,终端在将各动画转换为相应的特征向量时,可针对每个动画,分别确定出该动画中各动画帧之间的变化量T,并挑选出变 化量T最大的z个动画帧作为表示该动画的动画帧,而后,终端可针对选取出的z个动画帧,分别确定出每个动画帧所对应的子特征向量,其中,对于三维动画来说,终端可根据该动画帧中的动画骨骼空间坐标、帧间的骨骼加速度等数据来确定出该动画帧所对应的子特征向量l,进而根据分别确定出的z个动画帧的子特征向量,将该动画转换为相应的特征向量。In addition, in the embodiment of the present application, each animation keyword corresponding to each animation in the preset animation library may also be calibrated by a pre-trained first classification model. Specifically, the terminal may first convert each pre-saved animation into a corresponding feature vector, wherein converting the animation into a corresponding feature vector may be performed in the following manner: in actual application, the duration and severity of each animation are not In the same way, in each animation, the animation frames with the largest amount of change between animation frames are often the most distinguishable from other animations. Therefore, in the embodiment of the present application, the terminal converts each animation into a corresponding one. For the eigenvector, the amount of change T between each animation frame in the animation can be determined separately for each animation, and the change is selected. The z animation frames with the largest amount T are used as the animation frames representing the animation, and then the terminal can determine the sub-feature vectors corresponding to each animation frame for the selected z animation frames, wherein, for the three-dimensional animation The terminal can determine the sub-feature vector l corresponding to the animation frame according to the animated bone space coordinates in the animation frame, the bone acceleration between the frames, and the like, and further determine the sub-feature vectors of the z animation frames according to the respectively determined , convert the animation to the corresponding feature vector.
需要说明的是,上述说明的特征向量转换方式并不唯一,也可通过其他的方式将各动画转换为相应的特征向量,如,针对每个动画,分别确定出该动画中各动画帧所对应的子特征向量,而后,终端再根据该动画中所有动画帧对应的各子特征向量,将该动画转换为相应的特征向量,当然还可以是其他的方式,在此就不进行一一举例说明了。It should be noted that the feature vector conversion method described above is not unique, and each animation may be converted into a corresponding feature vector by other means, for example, for each animation, respectively, corresponding to each animation frame in the animation. The sub-feature vector, and then the terminal converts the animation into a corresponding feature vector according to each sub-feature vector corresponding to all the animation frames in the animation, and of course, other methods may be used. It is.
终端将各动画分别转换为相应的特征向量后,可将各特征向量分别输入到预先训练的第一分类模型中,其中,针对每个特征向量来说,该第一分类模型对该特征向量实施计算后,可得到若干个数值,其中,每个数值都对应一个关键词,而当终端发现在这些数值当中,某一数值均大于其他数值时,则可将该数值对应的关键词就作为该动画的动画关键词,并将该动画与动画关键词对应起来保存在预设的动画库中。After the terminal respectively converts each animation into a corresponding feature vector, each feature vector may be separately input into a pre-trained first classification model, wherein for each feature vector, the first classification model implements the feature vector. After the calculation, several values can be obtained, wherein each value corresponds to a keyword, and when the terminal finds that a certain value is greater than other values among the values, the keyword corresponding to the value can be used as the keyword. The animation keyword of the animation, and the animation is associated with the animation keyword and saved in the preset animation library.
在本申请实施例中,上述说明的分类模型可以是神经网络模型、隐马尔科夫模型HMM、支持向量机(Support Vector Machine,SVM)等训练模型。而在分类模型的训练过程中,可先采集大量的样本动画,并将各样本动画转换为向量、参数等形式分别输入到该分类模型中去,进而训练该分类模型。In the embodiment of the present application, the classification model described above may be a training model such as a neural network model, a hidden Markov model HMM, or a Support Vector Machine (SVM). In the training process of the classification model, a large number of sample animations can be collected first, and each sample animation is converted into a vector, a parameter, and the like, respectively, and input into the classification model, and then the classification model is trained.
需要说明的是,在实际应用中,每个动画通常都会对应多个关键词,例如,假设一个动画中显示的是一个人正欢快地踢足球,则这个动画对应的动画关键 词可以是体育,可以是踢足球,或是高兴、欢快等关键词,所以,终端在确定一个关键词所对应的动画时,可能会从预设的动画库中确定出多个动画与该关键词相对应,因此,为了能够进一步精确的确定出该关键词所对应的动画,在本申请实施例中,终端可进一步从接收到的文本信息中,确定该文本信息对应的特征信息,并根据该特征信息以及各关键词,从预设的动画库中分别确定出各关键词所对应的各动画。It should be noted that in practical applications, each animation usually corresponds to multiple keywords. For example, if an animation shows that a person is playing football happily, the animation key corresponding to the animation The word can be sports, it can be playing football, or keywords such as happy and cheerful, so when the terminal determines the animation corresponding to a keyword, it may determine a plurality of animations and the key from the preset animation library. The word correspondingly, therefore, in order to be able to further accurately determine the animation corresponding to the keyword, in the embodiment of the present application, the terminal may further determine the feature information corresponding to the text information from the received text information, and according to The feature information and each keyword determine each animation corresponding to each keyword from a preset animation library.
具体的,终端在确定出该文本信息中包含的各关键词后,可进一步的提取出该文本信息中的特征信息,具体的提取方式可以是:终端通过预设的特征分析模型来对该文本信息进行分析,进而提取出该文本信息中的特征信息。例如,假设一段文本信息为“我们明天欢快地去踢足球吧!”,终端可将这段话转换为相应的词向量序列(由于这段话是由多个词组成的,所以将这段话中的各个词转换为各词向量后,将各词向量按照各词在这段话的位置进行排序,即可得到能够表示这段话的词向量序列),并将该词向量序列输入到预设的特征分析模型,进而通过该特征分析模型输出的结果,确定出从这段话整个语境表达出的情感应为快乐、高兴的情感,因此,终端从这段话中提取出的特征信息应是快乐或高兴。当然,软件开发人员也可预先建立一个情绪词表库,并将该情绪词表库输入到终端中进行保存,相应的,终端后续接收到用户发送的文本信息后,可将该文本信息中的各个词与情绪词表库中的各情绪词进行比对,进而确定出该文本信息所对应的情绪信息。Specifically, after the terminal determines the keywords included in the text information, the terminal may further extract the feature information in the text information, and the specific extraction manner may be: the terminal uses the preset feature analysis model to the text. The information is analyzed, and the feature information in the text information is extracted. For example, suppose a piece of text message is "We will play football happily tomorrow!", the terminal can convert this paragraph into a corresponding sequence of word vectors (since this passage is composed of multiple words, so this paragraph will be After each word in the word is converted into a word vector, the word vector is sorted according to the position of each word in the paragraph, and a sequence of word vectors capable of representing the phrase can be obtained, and the word vector sequence is input to the pre-predicate. The feature analysis model is set, and then the result of the feature analysis model is used to determine that the emotion expressed from the entire context of the passage should be a happy and happy emotion. Therefore, the terminal extracts the feature information from the passage. It should be happy or happy. Of course, the software developer can also pre-establish an emotional vocabulary library, and input the emotional vocabulary library into the terminal for storage. Correspondingly, after the terminal subsequently receives the text information sent by the user, the text information can be Each word is compared with each emotional word in the emotional vocabulary library to determine the emotional information corresponding to the text information.
而后,对于这段话中“踢足球”这一文本关键词来说,终端在从这段话中识别出该文本关键词后,可进一步的根据该文本关键词“踢足球”以及特征信息“快乐”,从预设的动画库中,筛选出与该文本关键词以及特征信息对应的 动画。由于在预设的动画库中,文本关键词“踢足球”可能会对应多个动画,所以,终端可通过该特征信息“快乐”进一步的对文本关键词“踢足球”对应的多个动画进行筛选,继而确定出与文本关键词“踢足球”和特征信息“快乐”同时对应的动画。Then, for the text keyword "playing football" in this paragraph, after the terminal recognizes the text keyword from the passage, the terminal can further "play soccer" and feature information according to the text keyword. "Happy", from the preset animation library, filter out the corresponding text keywords and feature information Animation. Since the text keyword "playing soccer" may correspond to a plurality of animations in the preset animation library, the terminal may further perform a plurality of animations corresponding to the text keyword "playing soccer" through the feature information "happy". The screening, and then the animation corresponding to the text keyword "playing football" and the feature information "happy" are determined.
上述说明的特征信息可以是诸如“快乐”、“高兴”、“悲伤”等情绪信息,而为了使终端能够通过情绪信息从预设的动画库中筛选出相应的动画,则需要事先标定出各动画所对应的情绪关键词,进而使得终端后续可通过情绪信息与情绪关键词的匹配,确定出该情绪信息对应的动画。因此,在本申请实施例中,可通过人为的方式事先对各动画的情绪信息进行标定,如,假设一个动画所显示的内容为一个人坐在椅子上大哭,则可通过人工的方式将该动画对应的情绪信息确定为“悲伤”。除此之外,也可通过预先训练的第二分类模型,对各动画对应的情绪关键词进行确定,具体的方式可以是,将各动画分别转换为相应的特征向量后,可将各特征向量分别输入到预先训练的第二分类模型之中,而后,根据该第二分类模型输出的结果,确定出各动画所对应的情绪关键词,继而将各动画与情绪信息相匹配,其中,该第二分类模型的训练方式可以与上述训练第一分类模型的方式相同,在此就不进行详细赘述了。The feature information described above may be emotional information such as "happy", "happy", "sad", and in order to enable the terminal to filter the corresponding animation from the preset animation library through the emotional information, it is necessary to calibrate each The emotional keyword corresponding to the animation further enables the terminal to determine the animation corresponding to the emotional information by matching the emotional information with the emotional keyword. Therefore, in the embodiment of the present application, the emotion information of each animation can be calibrated in advance by an artificial method. For example, if an animation shows that the content is a person sitting in a chair and crying, it can be manually The emotion information corresponding to the animation is determined to be "sadness." In addition, the emotional keyword corresponding to each animation may also be determined by the pre-trained second classification model. The specific method may be: after each animation is converted into a corresponding feature vector, each feature vector may be obtained. Inputting into the pre-trained second classification model respectively, and then determining the emotional keywords corresponding to the animations according to the output of the second classification model, and then matching the animations with the emotion information, wherein the The training method of the two-category model can be the same as the above-mentioned training of the first classification model, and will not be described in detail here.
需要说明的是,上述提到的特征信息并不只限于“高兴”、“悲伤”这样的情绪信息,也可以是诸如“阴天”、“晴天”、“大风”、“下雨”等天气信息、或是诸如“强壮”、“萎靡”、“安详”等仪态信息,当然也可以其他的信息,在这就不进行一一举例说明了。相应的,与各特征信息相对应的各特征关键词也应与各动画对应起来保存在预设的动画库中,而在确定各动画所对应的各特征关键词时,则同样可通过预先训练的分类模型来进行确定,具体的确定过程与上 述确定各动画所对应的动画关键词相同,在此就不进行详细说明了。而这里提到的分类模型也可以是诸如神经网络模型、隐马尔科夫模型HMM、支持向量机SVM等模型。It should be noted that the above-mentioned characteristic information is not limited to emotional information such as "happy" or "sad", but may also be weather information such as "cloudy", "sunny", "high wind", "raining" and the like. Or, such as "strong", "wilting", "safe" and other information, of course, other information, here is not an example. Correspondingly, each feature keyword corresponding to each feature information should also be stored in a preset animation library corresponding to each animation, and when determining each feature keyword corresponding to each animation, the same can be pre-trained. Classification model to determine, specific determination process and It is determined that the animation keywords corresponding to the respective animations are the same, and will not be described in detail here. The classification model mentioned here may also be a model such as a neural network model, a hidden Markov model HMM, a support vector machine SVM, or the like.
在实际应用中,一个动画可能会对应多个特征关键词,因此,为了能够进一步精确的确定出文本关键词对应的动画,在本申请实施例中,终端也可从不同的角度提取出文本信息中的多个特征信息,进而可根据提取出的多个特征信息对文本关键词对应的多个动画进行进一步筛选,从而更加准确出该文本关键词相对于整个文本信息所对应的动画。In an actual application, an animation may correspond to a plurality of feature keywords. Therefore, in order to further accurately determine an animation corresponding to the text keyword, in the embodiment of the present application, the terminal may also extract text information from different angles. The plurality of feature information may further filter a plurality of animations corresponding to the text keyword according to the extracted plurality of feature information, thereby more accurately displaying the animation corresponding to the text keyword with respect to the entire text information.
S104:将确定出的各动画进行合成,得到融合动画。S104: Synthesize each determined animation to obtain a fusion animation.
终端通过各文本关键词确定出该文本信息所涉及的各动画后,可将各动画进行合成,以得到能够表示该文本信息的融合动画,其中,终端可各动画进行合成的方式可以是,将各动画按照各文本关键词在该文本信息中的排列顺序进行合成。After the terminal determines each animation related to the text information by using each text keyword, each animation may be combined to obtain a fusion animation capable of representing the text information, wherein the terminal may synthesize each animation by using Each animation is synthesized in the order in which the text keywords are arranged in the text information.
例如,假设在一段为“今天晴空万里,我要去钓鱼”的文本信息中,终端可通过预先训练的识别模型从该文本信息中识别出“晴空万里”、“我”、“钓鱼”这三个文本关键词,而后,终端从预设的动画库中分别确定出“晴空万里”、“我”、“钓鱼”这三个文本关键词所对应的三个动画H、X、C,继而根据这三个文本关键词在“今天晴空万里,我要去钓鱼”这段文本信息中的排列顺序,将这三个动画H、X、C进行排列,得到待合成的动画序列为H、X、C,而后,终端可按照该待融合的动画序列H、X、C将这三个动画进行合成,最终得到表示该文本信息的融合动画。For example, suppose that in a text message of “Today's Clear Sky, I am going to fish”, the terminal can identify “clear sky”, “I”, “fishing” from the text information through a pre-trained recognition model. a text keyword, and then the terminal determines three animations H, X, and C corresponding to the three text keywords "clear sky", "me", and "fishing" from the preset animation library, and then according to The three text keywords are arranged in the text message of "Today's Clear Sky, I want to go fishing". The three animations H, X, and C are arranged to obtain the animation sequence to be synthesized as H, X, C, then, the terminal can synthesize the three animations according to the animation sequence H, X, and C to be fused, and finally obtain a fused animation representing the text information.
对于两个动画的合成过程来说,在实际应用中,两个动画可能会有所差别, 若将两个有所差别的动画直接进行合成,则合成后的动画看上去将会有明显的跳跃感。所以,为了使合成后的动画看上去更加的自然,在本申请实施例中,可在任意两个相邻的动画中,插入一段用于过渡的动画片段,并将这段动画片段与这两个相邻的动画一并进行合成,得到融合动画。For the animation of the two animations, in the actual application, the two animations may be different. If you combine two different animations directly, the synthesized animation will look like a clear jump. Therefore, in order to make the synthesized animation look more natural, in the embodiment of the present application, a piece of animation for transition can be inserted in any two adjacent animations, and the animation segment and the two The adjacent animations are combined to obtain a fused animation.
具体的,对于两个任意相邻的动画来说,通过这两个动画来确定出待插入到这两个动画之间的过渡动画片段,其中,终端可通过插值的方式来确定出该过渡动画片段。Specifically, for two arbitrary adjacent animations, the transition animation segments to be inserted between the two animations are determined by the two animations, wherein the terminal can determine the transition animation by interpolation. Fragment.
例如,动画A和动画B是两个相邻的动画,其中,动画A为前一动画,而动画B为后一动画,动画A和动画B具有明显的差别,因此,为了在合成这两个动画的过程中消除这些差别,终端可通过动画A和动画B中人物动作的分析,并通过插值的方式确定出动画a1、b1这两个待插入到动画A和动画B的过渡动画片段,其中,从这两个过渡动画片段a1、b1中的人物动作是按照a1、b1的顺序,依次将动画A中的人物动作过渡到了动画B,这样一来,由于过渡动画片段的存在,将动画A、过渡动画片段a1、b1、动画B按照顺序进行合成后得到的动画将是一个连贯的动画,而并不会出现因动画A、B之间存在差别所引起的跳跃感。For example, animation A and animation B are two adjacent animations, where animation A is the previous animation, and animation B is the latter animation, and animation A and animation B have significant differences, so in order to synthesize the two In the process of animation, these differences are eliminated. The terminal can analyze the motion of the characters in the animation A and the animation B, and determine the transition animations of the animations a1 and b1 to be inserted into the animation A and the animation B by interpolation. The characters from the two transitional animation segments a1 and b1 are in the order of a1 and b1, and the characters in the animation A are successively transitioned to the animation B, so that the animation A will be present due to the existence of the transitional animation segment. The animations obtained by synthesizing the transitional animation segments a1, b1 and animation B in order will be a coherent animation, and there will be no jumping feeling caused by the difference between the animations A and B.
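The interpolation of transition poses between two adjacent animations, as in the example of animations A and B above, might look like the following sketch. Linear interpolation of flattened pose parameters is an assumption made for brevity; a production system would interpolate bone rotations properly, for example with quaternion slerp.

```python
from typing import List, Sequence

def transition_frames(last_pose: Sequence[float], first_pose: Sequence[float],
                      num_frames: int = 2) -> List[List[float]]:
    """Generate in-between poses (e.g. the segments a1, b1 in the example above)
    by interpolating from the last pose of the previous animation to the first
    pose of the latter animation."""
    frames = []
    for m in range(1, num_frames + 1):
        t = m / (num_frames + 1)
        frames.append([(1 - t) * a + t * b for a, b in zip(last_pose, first_pose)])
    return frames

# Example: two transition poses between the end of animation A and the start of B.
print(transition_frames([0.0, 10.0], [1.0, 30.0], num_frames=2))
```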
除了上述说明的合成方式外,在本申请实施例中,终端也可在两个相邻的动画之间加入一定的效果,以消除这两个相邻动画之间存在的差别。具体的,通常情况下,动画都是各动画帧组成的,各动画帧按照一定的顺序排列并快速的进行放映就得到了相应的动画。对于两个存在差别的动画,若两个动画中用于衔接的动画帧存在差别,则这两个动画往往也将是存在差别的两个动画,换句话说,对于两个动画来说,两个动画的差别往往都是由这两个动画用于衔接 的动画帧来决定的,其中,对于两个动画来说,这两个动画按顺序进行播放时,前一动画的最后一动画帧和后一动画的第一个动画帧就可作为这两个动画用于衔接的动画帧。因此,对于两个有差别的动画来说,消除或降低这两个动画之间差别的方式可以是对这两个动画中用于衔接的动画帧进行一定的处理,具体的处理方式可以是,当终端确定出待融合的各动画并将各动画按照各文本关键词在文本信息中的排列顺序进行排列后,终端可针对任意两个相邻的动画,将前一动画的各第一指定动画帧设定为第一效果,而将后一动画的各第二指定动画帧设为第二效果,其中,由于若前一动画的后几个动画帧与后一动画的前几个动画帧之间存在明显的差别,则前一动画和后一动画之间也必然存在差别,因此,为了使合成后的动画不会出现明显的跳跃感,终端应尽量消除或降低前一动画的后几动画帧和后一动画的前几动画帧所带来的差别,为保证衔接后动画的完整性,终端在前一动画中选取各第一指定动画帧时,可尽量选取该前一动画的后几个动画帧作为各第一指定动画帧,而在选取各第二指定动画帧时,可尽量选取后一动画的前几个动画帧作为各第二指定动画帧。在选取完第一、第二指定动画帧后,终端可将第一指定动画帧设为诸如淡出、盒状收缩等效果,而终端可根据第一指定动画帧的效果,将各第二指定动画帧的效果设定为与第一指定动画帧相反的效果,如,当终端将各第一指定动画帧的效果设定为淡出时,则可相应的将后一动画的各第二指定动画帧的效果设定为淡入效果。In addition to the above-described synthetic manner, in the embodiment of the present application, the terminal may also add a certain effect between two adjacent animations to eliminate the difference between the two adjacent animations. Specifically, in general, animations are composed of animation frames, and each animation frame is arranged in a certain order and quickly projected to obtain a corresponding animation. For two animations with differences, if there are differences in the animation frames used for the two animations, then the two animations will often be the two animations that have differences. In other words, for the two animations, two The difference between animations is often used by these two animations The animation frame is determined, wherein, for the two animations, when the two animations are played in order, the last animation frame of the previous animation and the first animation frame of the latter animation can be used as the two Animated frames used for animation. Therefore, for two different animations, the way to eliminate or reduce the difference between the two animations may be to perform certain processing on the animation frames used for the two animations. The specific processing method may be: When the terminal determines each animation to be merged and arranges each animation according to the arrangement order of each text keyword in the text information, the terminal may specify the first specified animation of the previous animation for any two adjacent animations. The frame is set as the first effect, and each second specified animation frame of the latter animation is set as the second effect, wherein, if the last animation frame of the previous animation and the first few animation frames of the latter animation There is a clear difference between the previous animation and the latter animation. Therefore, in order to make the synthesized animation not have obvious jumping feeling, the terminal should try to eliminate or reduce the last animation of the previous animation. The difference between the frame and the first few animation frames of the latter animation is to ensure the integrity of the animation after the connection. When the terminal selects each of the first specified animation frames in the previous animation, the previous one can be selected as much as possible. Painting few animation frame specified as each of the first animation frame, and in the respective second selected frame specified animation, the animation may try to select the first few frames after a respective second specified as the animation frame animation. 
After the first and second specified animation frames are selected, the terminal may set the first specified animation frame to effect such as fade out, box-shaped contraction, etc., and the terminal may set each second specified animation according to the effect of the first specified animation frame. The effect of the frame is set to be opposite to the first specified animation frame. For example, when the terminal sets the effect of each first specified animation frame to fade out, the second specified animation frame of the subsequent animation may be correspondingly The effect is set to fade in effect.
终端分别针对前一动画的各第一指定动画帧和后一动画的各第二指定动画帧设定为效果后,可将这两个动画进行合成。这样一来,当合成后的动画播放到各第一指定动画帧以及各第二指定动画帧时,终端对各第一指定动画帧以 及各第二指定动画帧分别设定的效果将会消除或降低这些动画帧之间的差别,从而使得合成后的动画在播放的过程中不会出现明显的跳跃感。After the terminal sets the effects for each of the first specified animation frames of the previous animation and the second designated animation frames of the subsequent animation, the two animations can be combined. In this way, when the synthesized animation is played to each of the first designated animation frames and each of the second specified animation frames, the terminal And the effect respectively set by each of the second specified animation frames will eliminate or reduce the difference between the animation frames, so that the synthesized animation does not have a significant jump feeling during the playing process.
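A minimal sketch of the fade-out / fade-in treatment of the specified animation frames follows. Using per-frame opacity as the faded quantity, and a linear ramp over n frames, are illustrative assumptions; any renderable per-frame effect parameter could be treated the same way.

```python
from typing import List

def apply_fade(prev_opacity: List[float], next_opacity: List[float], n: int = 5):
    """Set the last n frames of the previous animation to a fade-out effect and the
    first n frames of the latter animation to the opposite fade-in effect.
    Assumes each opacity list has at least n frames."""
    for m in range(1, n + 1):
        weight = m / (n + 1)                  # frames nearest the join are the most faded
        prev_opacity[-m] *= weight            # fade-out over the last n frames of the previous animation
        next_opacity[m - 1] *= weight         # fade-in over the first n frames of the latter animation
    return prev_opacity, next_opacity
```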
在实际应用中,不同动画的动画帧之间有时也会存在一定的相似性,因此,基于此,在本申请实施例中,对于任意两个相邻的动画,终端也可在这两个动画分别确定出彼此相似的动画帧,并将彼此相似的动画帧采用一定的方式合成为一个动画帧,而后再根据合成后的动画帧来对这两个动画进行合成。In practical applications, there may be certain similarities between animation frames of different animations. Therefore, based on this, in the embodiment of the present application, for any two adjacent animations, the terminal may also be in the two animations. The animation frames similar to each other are respectively determined, and the animation frames similar to each other are synthesized into an animation frame in a certain manner, and then the two animations are synthesized according to the synthesized animation frames.
具体的,对于任意两个相邻的动画,终端可分别确定出前一动画的每个动画帧与后一动画的每个动画帧的相似度,并根据确定出的各相似度,分别从前一动画中选取出第一动画帧以及从后一动画中选取出第二动画帧,并将该第一动画帧和第二动画帧进行融合,得到一个融合帧,其中,选取出第一动画帧和第二动画帧在前一动画和后一动画中相似度最高。而后,终端可进一步的将前一动画中位于第一动画帧之前的各动画帧、融合帧、以及位于后一动画中第二动画帧之后的各动画帧进行合成,得到融合后的动画。Specifically, for any two adjacent animations, the terminal may respectively determine the similarity between each animation frame of the previous animation and each animation frame of the latter animation, and respectively according to the determined similarities, respectively from the previous animation. Selecting the first animation frame and selecting the second animation frame from the latter animation, and merging the first animation frame and the second animation frame to obtain a fused frame, wherein the first animation frame and the first animation frame are selected The second animation frame has the highest similarity in the previous animation and the latter animation. Then, the terminal may further synthesize each animation frame, the fused frame, and each animation frame located after the second animation frame in the previous animation in the previous animation to obtain the fused animation.
例如,假设在相邻的两个动画C和D中,动画C中包含有#1~#5一共5个动画帧,动画D包含有*1~*7一共7个动画帧,终端在确定出动画C中每个动画帧与动画D中每个动画帧的相似度发现,动画C中的#3动画帧与动画D中的*2动画帧相似度最高,因此,终端可将动画C中的#3动画帧与动画D中的*2动画帧进行融合,得到相应的融合帧。终端在将动画C和动画D进行合成时,可将动画C中位于#3动画帧之前的动画帧#1、#2,以及动画D中位于动画帧*2之后的动画帧*3~*7选取出来,并将选取出来的各动画帧与得到的融合帧进行合成,具体的融合方式可以是,将动画帧#1、#2、融合帧、动画帧*3~*7按照顺序合成为一个动画,而动画C中的动画帧#4、#5以及动画D中 的动画帧*1可相应的去掉。For example, suppose that in the two adjacent animations C and D, the animation C includes a total of 5 animation frames from #1 to #5, and the animation D includes a total of 7 animation frames from *1 to *7, and the terminal is determined. The similarity between each animation frame in animation C and each animation frame in animation D is found. The #3 animation frame in animation C has the highest similarity to the *2 animation frame in animation D. Therefore, the terminal can be in the animation C. The #3 animation frame is fused with the *2 animation frame in the animation D to obtain the corresponding fused frame. When the terminal synthesizes the animation C and the animation D, the animation frames #1, #2 located before the #3 animation frame in the animation C, and the animation frames *3 to *7 located after the animation frame *2 in the animation D. Selecting and synthesizing the selected animation frames with the obtained fused frames. The specific fusion method may be that the animation frames #1, #2, the fused frame, and the animation frames *3 to *7 are combined into one in order. Animation, while animation frames #4, #5 and animation D in animation C The animation frame *1 can be removed accordingly.
终端在确定各动画帧之间的相似度时,可通过计算各动画帧之间的欧式距离来进行确定,其中,对于普通的二维动画来说,终端可通过图片的三原色(红、绿、蓝)来构建图片的特征参数,并通过计算各特征参数之间欧式距离的方式,来确定各动画帧之间的相似度,通常情况下,欧式距离数值越小,两个动画帧之间的相似度也就越大。When determining the similarity between each animation frame, the terminal can determine by calculating the Euclidean distance between each animation frame, wherein, for ordinary two-dimensional animation, the terminal can pass the three primary colors of the image (red, green, Blue) to construct the feature parameters of the picture, and determine the similarity between each animation frame by calculating the Euclidean distance between each feature parameter. Generally, the smaller the Euclidean distance value is between the two animation frames. The similarity is greater.
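For the two-dimensional case just described, the Euclidean distance over the three primary colors can be sketched as follows; the frame representation as equally sized lists of (r, g, b) tuples is an assumption.

```python
def frame_distance_2d(pixels_a, pixels_b):
    """Euclidean distance between two 2D animation frames, using the red, green
    and blue channels of each pixel as the frame's feature parameters.  The
    smaller the distance, the more similar the two frames."""
    return sum((ca - cb) ** 2
               for pa, pb in zip(pixels_a, pixels_b)
               for ca, cb in zip(pa, pb)) ** 0.5

# Toy example with two 2-pixel "frames".
print(frame_distance_2d([(255, 0, 0), (0, 0, 0)], [(250, 5, 0), (0, 0, 10)]))
```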
For a 3D animation, the feature parameters corresponding to each animation frame cannot simply be constructed from the three primary colors of an image; instead, the feature parameters of each frame in a 3D animation can be represented by that frame's parameters in the skeletal animation. Specifically, in the embodiment of the present application, when determining the similarity between each animation frame of the previous animation and each animation frame of the latter animation, the terminal may determine, for each frame, the rotation angular velocity vector of each bone in the skeletal animation, the bone weight of each bone, the rotation vector of each bone, and the intensity coefficient of the animation, and the terminal may then adopt the formula

[formula, shown as image PCTCN2017099462-appb-000001]

to determine the Euclidean distance between each animation frame of the previous animation and each animation frame of the latter animation, and then determine the similarity between the frames from the Euclidean distances so obtained, where D(i, j) is the Euclidean distance between the i-th animation frame of the previous animation and the j-th animation frame of the latter animation; the smaller the Euclidean distance, the greater the similarity between the two frames. In the formula, the quantity shown as image PCTCN2017099462-appb-000002 is the rotation angular velocity vector of the n-th bone of the i-th animation frame of the previous animation, and the quantity shown as image PCTCN2017099462-appb-000003 is the rotation angular velocity vector of the n-th bone of the j-th animation frame of the latter animation. The skeleton standard used by skeletal animations in practice is consistent; in other words, for two different skeletal animations, the bone indices representing, for example, the hands or the feet are usually the same, so the n-th bone of the i-th animation frame and the n-th bone of the j-th animation frame refer to the same body part; that is, the bone numbering of the frames in the previous animation is identical to that of the frames in the latter animation. The symbol wn in the formula is the bone weight of the n-th bone; the quantity shown as image PCTCN2017099462-appb-000004 is the rotation vector of the n-th bone of the i-th animation frame of the previous animation; the quantity shown as image PCTCN2017099462-appb-000005 is the rotation vector of the n-th bone of the j-th animation frame of the latter animation; and u is the preset animation intensity coefficient. As can be seen from the formula, for frames of a 3D animation, when calculating the Euclidean distance between two animation frames, the terminal compares every bone of the two frames in turn, in terms of both the bone rotation vectors and the bone rotation angular velocity vectors, so the calculated Euclidean distance is relatively accurate. Of course, this formula is not unique; other bone parameters may be introduced to determine the Euclidean distance between animation frames even more accurately, and the similarity between frames is then determined from the Euclidean distances so determined.
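A bone-wise distance of the kind defined above might be computed as in the following sketch. Because the exact formula is only reproduced as an image in the source, the particular way the rotation terms, the angular velocity terms, the bone weights and the intensity coefficient u are combined here is an assumption made from the surrounding definitions.

```python
import math
from typing import List, Sequence

def frame_distance_3d(rot_i: List[Sequence[float]], vel_i: List[Sequence[float]],
                      rot_j: List[Sequence[float]], vel_j: List[Sequence[float]],
                      bone_weights: List[float], u: float) -> float:
    """One plausible bone-wise Euclidean distance D(i, j) built from per-bone
    rotation vectors (rot_*), per-bone rotation angular velocity vectors (vel_*),
    per-bone weights and the preset animation intensity coefficient u."""
    total = 0.0
    for n, w in enumerate(bone_weights):
        rot_term = sum((a - b) ** 2 for a, b in zip(rot_i[n], rot_j[n]))
        vel_term = sum((a - b) ** 2 for a, b in zip(vel_i[n], vel_j[n]))
        total += w * (rot_term + u * vel_term)
    return math.sqrt(total)
```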
当然,在确定出前一动画的每个动画帧与后一动画的每个动画帧的相似度时,也可通过诸如点积等方式来进行确定,即,计算出两个动画帧的点积后,通过点积来确定这两个动画帧的相似度,具体过程就不进行详细说明了。Of course, when determining the similarity between each animation frame of the previous animation and each animation frame of the latter animation, it can also be determined by means such as dot product, that is, after calculating the dot product of the two animation frames. The similarity of the two animation frames is determined by the dot product, and the specific process will not be described in detail.
上述说明的通过确定出前一动画和后一动画中相似度最高的两个动画帧来合成动画的方式,可能会丢掉多个动画帧,例如,继续沿用上例,假设动画C中的#2与动画D中的*5相似度最高时,终端在对动画C和动画D进行合成的过程中,将会丢掉动画C中的动画帧#3~#5和动画D中的动画帧*1~*4,也就是说终端将会丢掉7个动画帧,而动画C和动画D一共才有12帧,这样一来,由于丢掉的帧数过多,终端最终合成的动画在效果上将会受到一定的影响。The above description may result in the loss of multiple animation frames by determining the two animation frames with the highest similarity in the previous animation and the latter animation. For example, continue to use the above example, assuming #2 in the animation C When the *5 similarity in the animation D is the highest, the terminal will discard the animation frames #3 to #5 in the animation C and the animation frames in the animation D *1 to * in the process of synthesizing the animation C and the animation D. 4, that is to say, the terminal will lose 7 animation frames, and the animation C and animation D have a total of 12 frames. In this way, due to the excessive number of dropped frames, the final synthesized animation of the terminal will be affected by the effect. Impact.
To reduce the impact of dropped frames on animation synthesis as far as possible, in the embodiment of the present application the terminal, when determining the similarity between each frame of the previous animation and each frame of the latter animation, may instead extract third specified animation frames from the previous animation and fourth specified animation frames from the latter animation. The third specified animation frames are a consecutive subset of the frames of the previous animation; to limit the disadvantage caused by dropped frames, the last few frames of the previous animation can be chosen. Likewise, the fourth specified animation frames are a consecutive subset of the frames of the latter animation, and the terminal can choose the first few frames of the latter animation. The terminal then determines the similarity between every third specified animation frame and every fourth specified animation frame, selects the two frames with the highest similarity for fusion, and synthesizes the animation by means of the resulting fused frame.

For example, continuing the same example, when determining the similarity between the frames of animation C and the frames of animation D, the terminal may take frames #3 to #5 of animation C and frames *1 to *3 of animation D, determine the similarities between frames #3 to #5 and frames *1 to *3, select from these similarities the two frames with the highest similarity, fuse them, and combine animation C and animation D on the basis of the resulting fused frame.

As this shows, because the terminal only determines the similarity between a subset of the frames of the previous animation and a subset of the frames of the latter animation, the number of frames dropped when the animation is subsequently synthesized from these similarities is effectively bounded, which reduces, to a certain extent, the adverse effect of dropped frames on animation synthesis. A small sketch of this windowed search is given below.
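The sketch reuses frame_distance from above; the window length m, and the choice of the last m frames of the previous animation versus the first m frames of the latter one, are assumptions made for the illustration.

    def best_pair_in_windows(prev_frames, next_frames, bone_weights, m=3, u=1.0):
        # Compare only the last m frames of the previous animation (the third
        # specified animation frames) with the first m frames of the latter
        # animation (the fourth specified animation frames), so that at most
        # the frames between the chosen pair can be dropped.
        tail = list(enumerate(prev_frames))[-m:]
        head = list(enumerate(next_frames))[:m]
        best = None
        for i, fa in tail:
            for j, fb in head:
                d = frame_distance(fa, fb, bone_weights, u)
                if best is None or d < best[0]:
                    best = (d, i, j)
        return best[1], best[2]  # indices of the pair of frames to fuse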
Although this method mitigates the drawbacks of dropped frames to some degree, the similarities the terminal determines cover only a subset of the frames of the previous animation and of the latter animation. Even the two most similar frames within this subset may still differ considerably, so an animation synthesized from these two frames may still show a visible jump.

Therefore, to further guarantee the quality of the synthesized animation, in the embodiment of the present application the terminal may determine the two animation frames to be fused by considering both the frame drop rate and the similarity. The frame drop rate referred to here is, for a piece of animation, the ratio of the number of frames that take part neither in the fusion nor in the synthesis to the total number of frames of the animation. For example, suppose the two animations contain 12 animation frames in total and 4 of them are discarded by the terminal during synthesis, i.e. these 4 frames take part neither in the fusion nor in the synthesis of the two animations; the frame drop rate of the synthesis is then 1/3.
When determining the two animation frames to be fused, the terminal may first determine the similarity between each frame of the previous animation and each frame of the latter animation and, for the pair of frames corresponding to each similarity, determine the frame drop rate that the synthesized animation would have if those two frames were used as the fused frame. Having determined the similarities and the frame drop rate corresponding to each similarity, the terminal may determine a first animation frame from the previous animation and a second animation frame from the latter animation, where the first animation frame and the second animation frame satisfy

    (I, J) = argmin over i, j of (a*x_ij + b*y_ij)

In this formula, x_IJ is the x_ij that minimizes a*x_ij + b*y_ij, namely the Euclidean distance between the first animation frame and the second animation frame; x_ij is the Euclidean distance between the ith frame of the previous animation and the jth frame of the latter animation, where i ranges from 1 to the total number of frames of the previous animation and j ranges from 1 to the total number of frames of the latter animation; y_IJ is the y_ij that minimizes a*x_ij + b*y_ij, namely the combined frame drop rate determined from the first animation frame and/or the second animation frame, and correspondingly y_ij is the combined frame drop rate determined from the ith frame and/or the jth frame; a and b are the corresponding coefficients, which can be set manually and only need to be non-negative.
The y_ij described above is not the true frame drop rate of the previous animation and the latter animation during the actual synthesis; it is a value that characterizes the actual frame drop rate. Although it does not equal the real frame drop rate of the synthesis exactly, it is positively correlated with it, so when the value of y_ij is small, the frame drop rate obtained by combining the previous animation and the latter animation according to y_ij is also relatively small.
As for how y_ij is determined: when determining the first animation frame and the second animation frame by the formula (I, J) = argmin_{i,j}(a*x_ij + b*y_ij), the terminal may, for the ith frame of the previous animation, determine an expected frame drop rate of the previous animation from that frame and use it as the combined frame drop rate y_ij; or, for the jth frame of the latter animation, determine an expected frame drop rate of the latter animation from that frame and use it as the combined frame drop rate y_ij. The expected frame drop rate of the previous animation mentioned here may be the ratio, determined from the ith frame, of the number of frames of the previous animation that take part neither in the fusion nor in the synthesis when the previous animation is combined with the latter animation on the basis of that ith frame, to the total number of frames of the previous animation; that is, the ratio of the frames discarded from the previous animation during the synthesis to its total number of frames. Similarly, the expected frame drop rate of the latter animation may be the ratio, determined from the jth frame, of the number of frames of the latter animation that take part neither in the fusion nor in the synthesis when the latter animation is combined with the previous animation on the basis of that jth frame, to the total number of frames of the latter animation; that is, the ratio of the frames discarded from the latter animation during the synthesis to its total number of frames.
The y_ij described above is thus the combined frame drop rate that the terminal determines, in the course of combining two adjacent animations, from the ith frame or from the jth frame. Because the first animation frame and the second animation frame determined by the formula (I, J) = argmin_{i,j}(a*x_ij + b*y_ij) are chosen with both the frame drop rate and the similarity taken into account, the animation synthesized by the terminal in this way can reduce, to a certain extent, the adverse effect of dropped frames.
However, for the two adjacent animations to be combined, considering only the frame drop rate of one of them may not reflect the overall frame drop rate of the synthesis. For example, when the terminal selects the frames to be fused from the two animations, the drop rate of one animation may end up relatively low while the drop rate of the other is very high. If the terminal only ensures, through the two fused frames, that one animation has a low drop rate, ignoring that this makes the drop rate of the other animation high, then the overall frame drop rate of the two animations after synthesis may still be relatively high, which ultimately degrades the display quality of the fused animation.
To avoid this problem, in the embodiment of the present application the terminal may determine y_ij as a combined frame drop rate of the two adjacent animations computed from both the ith frame and the jth frame, i.e. a determination that takes the frame losses of both animations during synthesis into account. Concretely, when determining the first animation frame and the second animation frame by the formula (I, J) = argmin_{i,j}(a*x_ij + b*y_ij), the terminal may take the ith frame from the previous animation and the jth frame from the latter animation, determine the Euclidean distance x_ij of the two frames, and determine the expected frame drop rate y_ij from the ith frame and the jth frame, where y_ij may be the sum of the expected drop rate of the previous animation determined from the ith frame and the expected drop rate of the latter animation determined from the jth frame. When the terminal finds a pair of frames for which a*x_ij + b*y_ij is minimal, it takes that pair as the first animation frame and the second animation frame, and the corresponding x_ij and y_ij become x_IJ and y_IJ.
For example, consider two adjacent animations G and H, where animation G contains 6 animation frames and animation H contains 4. When determining the first and second animation frames by the formula (I, J) = argmin_{i,j}(a*x_ij + b*y_ij), the terminal finds that fusing the 4th frame of animation G with the 2nd frame of animation H, and synthesizing G and H on that basis, gives the smallest value of a*x_42 + b*y_42 among all combinations. When determining the value of y_42, the terminal notes that if animation G is combined with animation H on the basis of its 4th frame, the 5th and 6th frames of G are discarded, so the expected frame drop rate of G determined from its 4th frame is 1/3; likewise, if animation H is combined with animation G on the basis of its 2nd frame, the 1st frame of H is discarded, so the expected frame drop rate of H determined from its 2nd frame is 1/4. The sum of the two expected drop rates, 7/12, is then used as the value of y_42. For the value of x_42, the terminal uses the Euclidean-distance formula given above to determine the Euclidean distance between the 4th frame of animation G and the 2nd frame of animation H, and takes that distance as x_42.
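A minimal Python sketch of this selection is given below, again reusing frame_distance from above. Taking y_ij as the sum of the two expected drop rates, and the specific drop-rate arithmetic, follow the animation G and animation H example; the coefficients a and b are left as parameters.

    def select_fusion_pair(prev_frames, next_frames, bone_weights, a=1.0, b=1.0, u=1.0):
        # Choose the pair (i, j) minimising a*x_ij + b*y_ij, where x_ij is the
        # frame distance and y_ij sums the expected drop rates of both clips:
        # frames after i in the previous animation and frames before j in the
        # latter animation would be discarded.
        n_prev, n_next = len(prev_frames), len(next_frames)
        best = None
        for i in range(n_prev):
            for j in range(n_next):
                x = frame_distance(prev_frames[i], next_frames[j], bone_weights, u)
                y = (n_prev - 1 - i) / n_prev + j / n_next
                cost = a * x + b * y
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        return best[1], best[2]

With 0-based indices, choosing i = 3 (the 4th frame of a 6-frame clip) and j = 1 (the 2nd frame of a 4-frame clip) gives y = 2/6 + 1/4 = 7/12, matching the example above.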
It should be noted that, besides taking the sum of the expected drop rate of the previous animation and that of the latter animation as y_ij, the terminal may also take the average of the two expected drop rates as y_ij, assign weights to the two expected drop rates and use their weighted sum, or take the square root of their sum; y_ij may even be the actual frame drop rate of the previous animation and the latter animation. In short, the purpose of y_ij is to characterize the frame drop rate of the two adjacent animations when they are combined, i.e. y_ij should be positively correlated with the frame drop rate of the synthesized result. As long as the y_ij determined by the terminal is positively correlated with that drop rate, the specific way of determining it is not unique.
Since the Euclidean distance is negatively correlated with the similarity, the first and second animation frames to be fused determined by the formula (I, J) = argmin_{i,j}(a*x_ij + b*y_ij) both keep the frame drop rate of the animation synthesized from them as low as possible and keep the two frames as similar as possible, which further reduces the impact of dropped frames on the synthesis. By changing the values of a and b, the first and second animation frames that best suit the user can be obtained. For example, when a = 1 and b = 0, only the correlation between the two animations is considered and the frame drop rate is ignored; the criterion then reduces to min_{i,j} x_ij, which yields the two frames with the smallest Euclidean distance (i.e. the highest similarity). When a = 0 and b = 1, only the frame drop rate between the two animations is considered and the correlation is ignored; the criterion reduces to min_{i,j} y_ij, which yields the two frames with the lowest frame drop rate.
It should also be noted that the formula (I, J) = argmin_{i,j}(a*x_ij + b*y_ij) may yield several pairs of candidate frames to be fused. In that case the terminal may further select, from these pairs, the pair with the highest similarity for fusion, or the pair with the lowest frame drop rate. Specifically, the terminal may determine a third animation frame among the first animation frames and a fourth animation frame among the second animation frames such that the similarity between the third and fourth animation frames is the highest, or such that the frame drop rate of the animation synthesized from the third and fourth animation frames is the lowest. Since the candidate pairs determined by the formula already minimize the adverse effect of dropped frames, whichever criterion is then applied among them, highest similarity (i.e. smallest Euclidean distance) or lowest frame drop rate, the finally synthesized animation reduces the adverse effect of dropped frames as far as possible.
The synthesis methods described above all discard some animation frames during synthesis to a greater or lesser degree. To further reduce the adverse effect of dropped frames, in the embodiment of the present application the animation frames lying between the two frames that the terminal has determined to fuse can themselves be fused with one another in a certain way, so that no frame at all is dropped in the finally synthesized animation.
Specifically, after the terminal has determined the first animation frame and the second animation frame by the formula (I, J) = argmin_{i,j}(a*x_ij + b*y_ij), it may, when synthesizing the animation from these two frames, select from the previous animation the first animation frame and the k animation frames that follow it and sort the selected frames in their order of appearance in the previous animation, obtaining a first frame sequence. Likewise, the terminal may select from the latter animation the k animation frames that precede the second animation frame together with the second animation frame itself and sort them in their order of appearance in the latter animation, obtaining a second frame sequence. The terminal then fuses, pair by pair, the frames of the first frame sequence and the second frame sequence that have the same sequence number, obtaining k+1 fused frames, and synthesizes the animation from the frames of the previous animation that precede the first animation frame, the k+1 fused frames, and the frames of the latter animation that follow the second animation frame.
For example, suppose the terminal combines animation C and animation D (animation C contains the 5 frames #1 to #5 and animation D contains the 7 frames *1 to *7) and determines that frame #3 of animation C and frame *3 of animation D have the smallest Euclidean distance (i.e. the highest similarity); this corresponds to a = 1 and b = 0 in the formula above, and in other embodiments a and b may of course take other values, in which case other matching frames would be obtained. The terminal can then select frames #3 to #5 of the previous animation as the first frame sequence and frames *1 to *3 of the latter animation as the second frame sequence (k = 2), and fuse the frames with the same sequence number in the two sequences, that is, frame #3 with frame *1, frame #4 with frame *2, and frame #5 with frame *3, obtaining 3 fused frames. Having determined the fused frames, the terminal synthesizes, in order, frames #1 and #2 of the previous animation, the 3 fused frames, and frames *4 to *7 of the latter animation, obtaining the synthesized animation. A sketch of this assembly step is given below.
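The following Python sketch assembles the merged clip once the fusion pair (i, j) is known. Choosing k as the largest value for which both sequences exist, and delegating the per-pair fusion to a blend_pair function (a coefficient-based sketch of it follows the next paragraph), are assumptions made for the illustration.

    def compose_with_fused_span(prev_frames, next_frames, i, j, blend_pair):
        # First frame sequence: frame i of the previous animation plus the k
        # frames after it. Second frame sequence: the k frames before frame j
        # of the latter animation plus frame j itself. Frames with the same
        # position p in the two sequences are fused, so nothing is dropped.
        k = min(len(prev_frames) - 1 - i, j)
        first_seq = prev_frames[i : i + k + 1]
        second_seq = next_frames[j - k : j + 1]
        fused = [blend_pair(first_seq[p], second_seq[p], p, k) for p in range(k + 1)]
        return prev_frames[:i] + fused + next_frames[j + 1:]

With the animation C and animation D example (0-based i = 2, j = 2) this yields k = 2, the fused span #3/*1, #4/*2, #5/*3, and the output order #1, #2, three fused frames, *4 to *7.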
When fusing each pair of frames to be fused, the terminal may use a preset fusion formula (given as formula image PCTCN2017099462-appb-000021). Specifically, for the two frames whose sequence number in the first frame sequence and in the second frame sequence is p, the terminal may determine the fusion coefficient α(p) of the pth frame of the first frame sequence by the formula given as image PCTCN2017099462-appb-000022, and determine the fusion coefficient of the pth frame of the second frame sequence by the formula β(p) = 1 - α(p). Using the fusion coefficients so determined, the terminal fuses the pth frame of the first frame sequence with the pth frame of the second frame sequence to obtain the corresponding fused frame.
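Since α(p) is given in the original only as a formula image, the sketch below substitutes a simple linear ramp for it; this substitution, and the reuse of the illustrative frame layout from the distance sketch above, are assumptions, not the patent's own coefficients.

    import numpy as np

    def blend_pair(frame_a, frame_b, p, k):
        # frame_a is the p-th frame of the first frame sequence, frame_b the
        # p-th frame of the second one. alpha(p) is assumed to fall linearly
        # from 1 to 0 across the span, and beta(p) = 1 - alpha(p) as stated.
        # (Mixing rotation vectors linearly is itself a simplification; a full
        # implementation might interpolate rotations with quaternion slerp.)
        alpha = 1.0 - p / k if k > 0 else 0.5
        beta = 1.0 - alpha
        return {
            "rot": alpha * np.asarray(frame_a["rot"]) + beta * np.asarray(frame_b["rot"]),
            "ang_vel": alpha * np.asarray(frame_a["ang_vel"]) + beta * np.asarray(frame_b["ang_vel"]),
        }

Passing this blend_pair to compose_with_fused_span above yields a merged clip in which no frame between the fusion pair is discarded.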
Through this fusion method, the terminal lowers the frame drop rate of the animation synthesis as far as possible; and, to ensure that the synthesized animation shows no obvious jump, the terminal computes a fusion coefficient for every frame that takes part in the fusion and fuses the frames accordingly, which guarantees the display quality of the fused frames in the synthesized animation and reduces the drawbacks introduced by the synthesis process.

After the terminal has combined the animations corresponding to the text keywords according to the order of the keywords in the text information, it can display the resulting fused animation, publish it as information on a social platform, or send it to other users as a chat message. To further improve the effect of the fused animation, in the embodiment of the present application the terminal may, before displaying or sending the fused animation, determine the effect information corresponding to the text information and adjust the fused animation according to it. The effect information mentioned here may be background music or sound effects for the fused animation, or speech information corresponding to the text information. How these kinds of effect information are determined, and how the fused animation is adjusted with them, is described in detail below.
As for the background music of the fused animation: after obtaining the fused animation, the terminal may further determine, from a preset music library and according to the recognized text keywords, the music corresponding to each text keyword. One way is to match each text keyword against the music keywords of the music in the library and take, as the music corresponding to the text keyword, the music whose keyword matches it; another is to compute, for each text keyword, its similarity to each music keyword and select the matching music according to the computed similarities. The terminal may find more than one piece of music for a given text keyword; to pick the music that best fits the context of the whole text information, it may further screen the candidate music according to the feature information of the text information, in the same way as the screening of animations described above, which is not repeated here.
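As a rough illustration of keyword-to-music matching, the sketch below scores each library entry against a text keyword; the use of difflib's SequenceMatcher as the similarity measure and the threshold value are stand-in assumptions, since the embodiment does not fix a particular measure.

    from difflib import SequenceMatcher

    def pick_music(keyword, music_library, min_score=0.5):
        # music_library: mapping from a music id to the list of music keywords
        # stored for that piece in the preset music library.
        scored = []
        for music_id, music_keywords in music_library.items():
            if not music_keywords:
                continue
            score = max(SequenceMatcher(None, keyword, mk).ratio() for mk in music_keywords)
            if score >= min_score:
                scored.append((score, music_id))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [music_id for _, music_id in scored]  # best matches first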
As for the music keywords of the music in the preset music library: for each piece of music in the library, the terminal may determine features that represent the piece, for example Mel-frequency cepstral coefficients (MFCC), feed the determined features of each piece into a preset music model, and determine the music keywords of that piece from the output of the music model; the process is the same as the determination of animation keywords described above and is not detailed further here. Having determined the music keywords of each piece, the terminal saves each piece of music together with its music keywords in the preset music library for later use. Of course, in the embodiment of the present application the music keywords of each piece may also be determined manually, i.e. the music keywords of each piece are labelled by hand and saved in the preset music library in correspondence with the music.

After determining the music corresponding to each text keyword, the terminal can combine the pieces of music according to the order of the text keywords in the text information to obtain fused music. The pieces are combined in essentially the same way as the animations above: for example, the terminal can realize the transitions between the pieces by applying playback effects such as fade-out and fade-in to them, or fuse the pieces by determining fusion coefficients for them; the specific process is not described in detail here.

Having determined the fused music, the terminal can mix it into the fused animation to further improve the playback effect of the fused animation. Concretely, the terminal may adjust the playback speed of the fused music according to the playback speed of the fused animation so that the fused music and the fused animation stay synchronized; or loop the fused music at a certain playback speed while the fused animation plays; or, while adjusting the playback speed of the fused music, align each piece of music in the fused music with the corresponding animation in the fused animation on the basis of the text keywords, thereby completing the combination of the fused music and the fused animation.
It should be noted that, when determining the music keywords of each piece of music, the terminal may choose music models of different dimensions. For instance, with a sports-related music model the music keywords finally determined for each piece are sports-related, whereas with an emotion-related music model the keywords finally determined are emotion-related. So, for any given piece of music, the terminal may determine several music keywords through music models of different dimensions, which lays the foundation for the subsequent screening of the music by the feature information of the text information.

The music model mentioned above can be obtained by training on a large number of collected sample pieces of music, in a manner similar to the training of the other models described above, which is not detailed here. Besides being determined in the manner described above, the background music of the fused animation may also be determined, from the feature information of the text information, as a single overall background music, which is then mixed into the fused animation.

As for the sound effects of the fused animation: an animation is usually not equally intense throughout; some passages are visually calm while others are intense, and the movements of characters and the speed of objects differ from one period to another. To further improve the effect and the appeal of the fused animation, in the embodiment of the present application the terminal may adjust the sound of the fused music by monitoring the animation parameters of the fused animation. For example, when the terminal detects that the animation parameters of a certain period change very quickly, it may make the fused music corresponding to that period more intense; or, when a character in the fused animation claps, stomps, pants, and so on, the terminal may mix the sound effects of these actions into the fused music; other adjustments are of course possible and are not enumerated here. After the terminal has adjusted the sound of the fused music, it mixes the adjusted fused music into the fused animation, so that the sound effects further enhance the fused animation and make it more entertaining for the user.
In practice, the text information entered by a user often contains certain specified characters, such as a colon ':' or title marks, and the text that follows such characters is usually a special piece of text; for example, a colon followed by double quotation marks typically introduces an utterance. To further improve the effect and the appeal of the fused animation, in the embodiment of the present application the terminal may process the piece of text that follows a specified character and insert the resulting effect information into the fused animation. Concretely, the terminal may determine the specified character contained in the text information, for example a colon followed by double quotation marks; extract, on the basis of the specified character, the piece of sub-text that follows it; convert the sub-text into the corresponding speech through a text-to-speech function; and then insert the speech, or the sub-text corresponding to the speech, into the fused animation as effect information. For the generated speech, the terminal can mix it into the fused animation to dub the fused animation; for the sub-text, the terminal can insert it into the fused animation in a preset display manner, as shown in Figure 2.

Figure 2 is a schematic diagram of the display of utterance information in a fused animation according to an embodiment of the present application.

In Figure 2, when the terminal determines that the piece of sub-text following the colon and double quotation marks in the text information is an utterance, it can treat this sub-text as the words spoken by the character in the fused animation and place the utterance in a designated dialog box displayed above the character in the fused animation. Of course, this sub-text can also be displayed in the fused animation in forms such as speech bubbles or clouds, to improve the display effect and the appeal of the fused animation.

It should be noted that the specified character described above is not necessarily a colon followed by double quotation marks; it may also be a specified string such as '想:' ('think:'). When the terminal determines that the text information contains '想' used together with a colon ':', it can conclude that the piece of sub-text that follows describes what the character in the fused animation is thinking, and display this sub-text, in a suitable form, as the character's inner monologue in the fused animation. Of course, the specified character may also be other characters or combinations of characters, such as the single words '说' ('say') or '问' ('ask'); these are not described one by one here.
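A small sketch of extracting such an utterance follows; the regular expression, which looks for a colon (half- or full-width) followed by quoted text, is an illustrative assumption, and in practice the trigger characters ('想:', '说', '问', and so on) would be configurable.

    import re

    # Matches a colon followed by text in double quotation marks, e.g.
    # 他说:"我们走吧", and captures the quoted utterance for dubbing and for
    # display in a dialog box or speech bubble.
    DIALOGUE_PATTERN = re.compile(r'[:：]\s*[“"]([^”"]+)[”"]')

    def extract_dialogue(text):
        return DIALOGUE_PATTERN.findall(text)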
In practice, dubbing an animation usually involves the question of mouth shapes. Therefore, in the embodiment of the present application the terminal may also treat the whole piece of text information entered by the user as an utterance, extract the corresponding speech feature information from this text information, and further determine the mouth-shape category corresponding to each piece of speech feature information. A mouth-shape category here means that, in general, different syllable sounds correspond to different mouth-shape categories, and each mouth-shape category corresponds to its own mouth-shape animation. The pronunciation of a character is usually formed by the pronunciation of several such sounds, so the mouth-shape animation of a character is correspondingly composed of the animations of the mouth-shape categories of those sounds. Once the terminal has determined the mouth-shape categories, it has correspondingly determined the mouth-shape animation of every character in the text information, and it can then mix the mouth-shape animation of each character into the fused animation as effect information, as shown in Figure 3.

Figure 3 is a schematic diagram of mouth-shape animations according to an embodiment of the present application.

Figure 3 lists the mouth-shape pictures corresponding to '我' ('I') and '行' ('OK'). The pronunciation of '我' is 'wo'; in general, the terminal can split 'wo' into 'w' and 'o' and determine that the mouth-shape categories (mouth-shape pictures) corresponding to 'w' and 'o' are pictures c1 and c2 respectively, from which it can further determine the mouth-shape animation corresponding to '我'. Similarly, the pronunciation of '行' consists of the two sounds 'x' and 'ing', so the terminal can determine the mouth-shape animation corresponding to '行' from the mouth-shape pictures d1 and d2 corresponding to these two sounds.

After determining the mouth-shape animations, the terminal can mix them into the fused animation according to the position, in the text information, of the character on which each piece of speech information is based. One way of doing this is to scale each mouth-shape animation to the mouth of the character in the fused animation and then replace the character's mouth shapes in the fused animation one by one, obtaining a fused animation in which the speech matches the mouth shapes.
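The lookup below illustrates the mapping from syllable sounds to mouth-shape pictures used in the '我' and '行' example; the table is a tiny, made-up subset for illustration and is not the embodiment's own data.

    # Each syllable sound maps to a mouth-shape (viseme) picture id; a word's
    # mouth animation is the sequence of pictures for its sounds, e.g.
    # "wo" -> ["w", "o"] -> ["c1", "c2"] as in Figure 3.
    MOUTH_SHAPE = {"w": "c1", "o": "c2", "x": "d1", "ing": "d2"}

    def mouth_animation(sounds):
        # sounds: e.g. ["w", "o"] for '我' or ["x", "ing"] for '行'.
        return [MOUTH_SHAPE[s] for s in sounds if s in MOUTH_SHAPE]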
The above is the animation synthesis method provided by the embodiments of the present application. Based on the same idea, an embodiment of the present application further provides an animation synthesis apparatus, as shown in Figure 4.

Figure 4 is a schematic diagram of an animation synthesis apparatus according to an embodiment of the present application, which specifically includes:
a receiving module 401, configured to receive input text information;
an identification module 402, configured to identify each text keyword in the text information;
a determining module 403, configured to determine, from a preset animation library, the animation corresponding to each text keyword; and
a synthesis module 404, configured to combine the determined animations to obtain a fused animation.
The determining module 403 is specifically configured to extract feature information from the text information and, for each text keyword, determine from the preset animation library, according to the text keyword and the feature information, an animation that corresponds to the text keyword and to the feature information.

The synthesis module 404 is specifically configured to combine the determined animations according to the order of the keywords in the text information.
The synthesis module 404 is specifically configured to: for any two adjacent animations, determine a transition animation segment to be inserted between the previous animation and the latter animation, and combine the previous animation, the transition animation segment, and the latter animation in order; or

for any two adjacent animations, set the first specified animation frames of the previous animation to a first effect and the second specified animation frames of the latter animation to a second effect, and combine the previous animation and the latter animation with the effects applied, where the first effect includes at least a fade-out effect and the second effect includes at least a fade-in effect; or

for any two adjacent animations, determine the similarity between each animation frame image of the previous animation and each animation frame image of the latter animation, and combine the previous animation and the latter animation according to the determined similarities.
The synthesis module 404 is specifically configured to: select, from the previous animation, a first animation frame and the k animation frames located after the first animation frame, and sort the selected frames in their order of appearance in the previous animation to obtain a first frame sequence; select, from the latter animation, the k animation frames located before a second animation frame together with the second animation frame, and sort the selected frames in their order of appearance in the latter animation to obtain a second frame sequence; fuse the animation frames with the same sequence number in the first frame sequence and the second frame sequence to obtain k+1 fused frames; and combine the animation frames of the previous animation located before the first animation frame, the fused frames, and the animation frames of the latter animation located after the second animation frame, where k is a positive integer.
The synthesis module 404 is specifically configured to: determine the fusion coefficient corresponding to each animation frame in the first frame sequence using the formula given as image PCTCN2017099462-appb-000023; determine the fusion coefficient corresponding to each animation frame in the second frame sequence using the formula β(p) = 1 - α(p), where α(p) is the fusion coefficient corresponding to the pth animation frame in the first frame sequence and β(p) is the fusion coefficient corresponding to the pth animation frame in the second frame sequence; and fuse, according to the determined fusion coefficients, the animation frames with the same sequence number in the first frame sequence and the second frame sequence.
The apparatus further includes:

an effect determining module 405, configured to determine the effect information corresponding to the text information and to adjust the fused animation according to the effect information corresponding to the text information.
The effect determining module 405 is specifically configured to determine, from a preset music library and according to the identified text keywords, the music that matches each of the text keywords.

The effect determining module 405 is specifically configured to combine the determined music according to the order of the text keywords in the text information to obtain fused music, and to mix the fused music into the fused animation.

The effect determining module 405 is specifically configured to monitor the animation parameters of the fused animation, adjust the sound of the fused music according to the animation parameters, and mix the fused music with the adjusted sound into the fused animation.

The effect determining module 405 is specifically configured to extract speech feature information from the text information, determine, according to the speech feature information, the mouth-shape category corresponding to each piece of speech feature information, determine the mouth-shape animation corresponding to each mouth-shape category, and use the mouth-shape animations as the determined effect information.

The effect determining module 405 is specifically configured to mix the mouth-shape animations into the fused animation according to the position, in the text information, of the characters on which the extracted speech feature information is based.
The embodiments of the present application provide an animation synthesis method and apparatus. In the method, a terminal receives text information input by a user, identifies the text keywords in the text information, determines from a preset animation library the animation corresponding to each text keyword, and combines the animations according to the order of the keywords in the text information to obtain a fused animation. Because an animation can express the meaning of information more fully and vividly than text, compared with the prior art, in which information is presented only as text or speech, an animation obtained by converting text information expresses the meaning of the information itself more fully and vividly, which brings the user both enjoyment and convenience when browsing the information.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

The memory may include volatile memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. Various changes and modifications may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (27)

  1. A method for animation synthesis, comprising:
    receiving input text information;
    identifying each text keyword in the text information;
    determining, from a preset animation library, an animation corresponding to each text keyword; and
    synthesizing the determined animations to obtain a fused animation.
  2. The method according to claim 1, wherein before determining, from the preset animation library, the animation corresponding to each text keyword, the method further comprises:
    determining a feature vector corresponding to each pre-stored animation;
    determining, according to the determined feature vector corresponding to each animation and by means of a pre-trained first classification model, an animation keyword corresponding to each animation; and
    storing each animation and its corresponding animation keyword in the preset animation library.
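Note: a rough illustrative sketch of the library-building step described in claim 2, assuming the "first classification model" is any pre-trained multi-class classifier exposing a scikit-learn-style predict method; the function names, the extract_features helper, and the model interface are assumptions, not part of the application:

    def build_animation_library(animations, extract_features, keyword_classifier):
        # animations: iterable of (animation_id, animation_data) pairs.
        # extract_features: maps animation data to its feature vector (assumed helper).
        # keyword_classifier: the pre-trained first classification model (assumed interface).
        library = {}
        for animation_id, data in animations:
            feature_vec = extract_features(data)
            keyword = keyword_classifier.predict([feature_vec])[0]
            library[animation_id] = keyword  # store each animation together with its keyword
        return library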
  3. The method according to claim 1, wherein determining, from the preset animation library, the animation corresponding to each text keyword specifically comprises:
    for each text keyword, determining the similarity between the text keyword and each animation keyword stored in the animation library; and
    determining the animation corresponding to the text keyword according to the determined similarities and the correspondence between animation keywords and animations.
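Note: a minimal sketch of the keyword-to-animation lookup in claim 3, assuming the similarity measure is cosine similarity over keyword embedding vectors and that the animation library is held in memory as (animation_id, keyword_vector) pairs; both choices are illustrative assumptions:

    import numpy as np

    def cosine_similarity(a, b):
        # Similarity between two keyword embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def match_animation(text_keyword_vec, animation_library):
        # animation_library: list of (animation_id, animation_keyword_vec) pairs,
        # i.e. the stored correspondence between animation keywords and animations.
        best_id, best_sim = None, -1.0
        for animation_id, kw_vec in animation_library:
            sim = cosine_similarity(text_keyword_vec, kw_vec)
            if sim > best_sim:
                best_id, best_sim = animation_id, sim
        return best_id, best_sim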
  4. The method according to claim 1 or 3, wherein determining, from the preset animation library, the animation corresponding to each text keyword specifically comprises:
    extracting feature information from the text information; and
    for each text keyword, determining, from the preset animation library according to the text keyword and the feature information, an animation corresponding to both the text keyword and the feature information.
  5. The method according to claim 4, wherein the feature information comprises at least emotion information;
    before determining, from the preset animation library, the animation corresponding to each text keyword, the method further comprises:
    determining, by means of a pre-trained second classification model, an emotion keyword corresponding to each animation; and
    storing the correspondence between each animation and its emotion keyword in the preset animation library.
  6. The method according to claim 1, wherein synthesizing the determined animations specifically comprises:
    synthesizing the determined animations in the order in which the corresponding text keywords appear in the text information.
  7. The method according to claim 6, wherein synthesizing the determined animations specifically comprises:
    for any two adjacent animations, determining a transition animation segment to be inserted between the previous animation and the subsequent animation, and synthesizing the previous animation, the transition animation segment, and the subsequent animation in sequence; or
    for any two adjacent animations, applying a first effect to each first specified animation frame of the previous animation, applying a second effect to each second specified animation frame of the subsequent animation, and synthesizing the previous animation and the subsequent animation after the effects have been applied; or
    for any two adjacent animations, determining the similarity between each animation frame of the previous animation and each animation frame of the subsequent animation, and synthesizing the previous animation and the subsequent animation according to the determined similarities.
  8. The method according to claim 7, wherein the animation comprises a three-dimensional animation;
    determining the similarity between each animation frame of the previous animation and each animation frame of the subsequent animation specifically comprises:
    using the formula
    [Formula PCTCN2017099462-appb-100001]
    to determine the Euclidean distance between each animation frame of the previous animation and each animation frame of the subsequent animation, and determining the similarity between each animation frame of the previous animation and each animation frame of the subsequent animation according to the determined Euclidean distance, wherein:
    D(i,j) is the Euclidean distance between the i-th animation frame of the previous animation and the j-th animation frame of the subsequent animation, and the smaller the Euclidean distance, the greater the similarity between the i-th animation frame and the j-th animation frame;
    [Symbol PCTCN2017099462-appb-100002] is the rotational angular velocity vector of the n-th bone in the i-th animation frame of the previous animation, and [Symbol PCTCN2017099462-appb-100003] is the rotational angular velocity vector of the n-th bone in the j-th animation frame of the subsequent animation, the bone numbering of the animation frames in the previous animation being the same as that of the animation frames in the subsequent animation;
    w_n is the bone weight of the n-th bone;
    [Symbol PCTCN2017099462-appb-100004] is the rotation vector of the n-th bone in the i-th animation frame of the previous animation, and [Symbol PCTCN2017099462-appb-100005] is the rotation vector of the n-th bone in the j-th animation frame of the subsequent animation; and
    u is a preset animation intensity coefficient.
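Note: the distance formula referenced in claim 8 appears only as an image in the source, so its exact form is not reproduced here. The sketch below assumes one plausible form consistent with the listed symbols, namely a bone-weighted sum of squared differences of rotation vectors plus u times the squared differences of rotational angular velocity vectors, square-rooted; the array layout and variable names are likewise assumptions:

    import numpy as np

    def frame_distance(rot_prev_i, ang_prev_i, rot_next_j, ang_next_j, weights, u):
        # rot_prev_i, rot_next_j: (N, 3) per-bone rotation vectors for one frame each.
        # ang_prev_i, ang_next_j: (N, 3) per-bone rotational angular velocity vectors.
        # weights: (N,) bone weights w_n; u: preset animation intensity coefficient.
        rot_term = np.sum((rot_prev_i - rot_next_j) ** 2, axis=1)
        vel_term = np.sum((ang_prev_i - ang_next_j) ** 2, axis=1)
        return float(np.sqrt(np.sum(weights * (rot_term + u * vel_term))))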
  9. The method according to claim 7, wherein determining the similarity between each frame of the previous animation and each frame of the subsequent animation specifically comprises:
    extracting each third specified animation frame from the previous animation, and extracting each fourth specified animation frame from the subsequent animation; and
    determining the similarity between each third specified animation frame and each fourth specified animation frame.
  10. The method according to any one of claims 7 to 9, wherein synthesizing the previous animation and the subsequent animation according to the determined similarities specifically comprises:
    determining, according to the determined similarities, a first animation frame from the previous animation and a second animation frame from the subsequent animation, the first animation frame and the second animation frame satisfying:
    [Formula PCTCN2017099462-appb-100006]
    wherein x_ij is the Euclidean distance between the i-th animation frame of the previous animation and the j-th animation frame of the subsequent animation, i ranging over [1, total number of frames of the previous animation] and j ranging over [1, total number of frames of the subsequent animation];
    y_ij is a combined frame-loss rate determined according to the i-th animation frame and/or the j-th animation frame;
    x_IJ is the x_ij that minimizes a*x_ij + b*y_ij;
    y_IJ is the y_ij that minimizes a*x_ij + b*y_ij;
    I is the frame number of the first animation frame and J is the frame number of the second animation frame;
    a and b are the corresponding coefficients, with a ≥ 0 and b ≥ 0; and
    synthesizing the previous animation and the subsequent animation according to the first animation frame and the second animation frame.
  11. The method according to claim 10, wherein the combined frame-loss rate determined according to the i-th animation frame and/or the j-th animation frame is obtained by:
    determining, according to the i-th animation frame, the number of animation frames of the previous animation that participate in neither fusion nor synthesis, and determining the expected frame-loss rate of the previous animation according to that number and the total number of frames of the previous animation;
    determining, according to the j-th animation frame, the number of animation frames of the subsequent animation that participate in neither fusion nor synthesis, and determining the expected frame-loss rate of the subsequent animation according to that number and the total number of frames of the subsequent animation; and
    determining the combined frame-loss rate according to the expected frame-loss rate of the previous animation and/or the expected frame-loss rate of the subsequent animation.
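Note: claims 10 and 11 select the splice frames I and J by minimizing a*x_ij + b*y_ij, where y_ij is a combined frame-loss rate. A small sketch, assuming the expected frame-loss rate of each animation is the fraction of its frames that take part in neither fusion nor synthesis and that the combined rate is the average of the two; the claims leave the exact combination open, so these are assumptions:

    def select_splice_frames(dist, total_prev, total_next, a=1.0, b=1.0):
        # dist[i-1][j-1]: Euclidean distance x_ij between frame i of the previous
        # animation and frame j of the next animation (frame numbers are 1-based).
        best = None
        for i in range(1, total_prev + 1):
            for j in range(1, total_next + 1):
                drop_prev = (total_prev - i) / total_prev   # frames after i would be dropped
                drop_next = (j - 1) / total_next            # frames before j would be dropped
                y = 0.5 * (drop_prev + drop_next)           # assumed combined frame-loss rate
                cost = a * dist[i - 1][j - 1] + b * y
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        return best[1], best[2]   # frame numbers I and J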
  12. The method according to claim 11, wherein synthesizing the previous animation and the subsequent animation according to the first animation frame and the second animation frame specifically comprises:
    selecting, from the previous animation, the first animation frame and the k animation frames following it, and sorting the selected frames in their order within the previous animation to obtain a first frame sequence; selecting, from the subsequent animation, the k animation frames preceding the second animation frame and the second animation frame, and sorting the selected frames in their order within the subsequent animation to obtain a second frame sequence; fusing the animation frames with the same position in the first frame sequence and the second frame sequence to obtain k+1 fused frames; and synthesizing the animation frames of the previous animation that precede the first animation frame, the fused frames, and the animation frames of the subsequent animation that follow the second animation frame, k being a positive integer.
  13. The method according to claim 12, wherein fusing the animation frames with the same position in the first frame sequence and the second frame sequence specifically comprises:
    using the formula
    [Formula PCTCN2017099462-appb-100007]
    to determine the fusion coefficient corresponding to each animation frame in the first frame sequence;
    using the formula β(p) = 1 - α(p) to determine the fusion coefficient corresponding to each animation frame in the second frame sequence,
    wherein α(p) is the fusion coefficient corresponding to the p-th animation frame in the first frame sequence, and β(p) is the fusion coefficient corresponding to the p-th animation frame in the second frame sequence; and
    fusing the animation frames with the same position in the first frame sequence and the second frame sequence according to the determined fusion coefficients.
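Note: claims 12 and 13 blend the k+1 frame pairs with coefficients α(p) for the first frame sequence and β(p) = 1 - α(p) for the second; the α(p) formula appears only as an image in the source. The sketch below assumes a linear ramp from the previous animation to the next, which satisfies β(p) = 1 - α(p) but is otherwise an assumption; each frame is assumed to be a numeric (e.g. NumPy) array of per-bone pose parameters:

    def fuse_sequences(prev_frames, next_frames):
        # prev_frames: frames I..I+k of the previous animation (first frame sequence).
        # next_frames: frames J-k..J of the next animation (second frame sequence).
        assert len(prev_frames) == len(next_frames)
        k = len(prev_frames) - 1
        fused = []
        for p in range(k + 1):
            alpha = 1.0 - p / k if k > 0 else 0.5   # assumed blending schedule alpha(p)
            beta = 1.0 - alpha                      # beta(p) = 1 - alpha(p), as claimed
            fused.append(alpha * prev_frames[p] + beta * next_frames[p])
        return fused   # the k+1 fused frames spliced between the two animations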
  14. The method according to claim 1, further comprising:
    determining effect information corresponding to the text information; and
    adjusting the fused animation according to the effect information corresponding to the text information.
  15. The method according to claim 14, wherein determining the effect information corresponding to the text information specifically comprises:
    determining, from a preset music library according to the identified text keywords, music matching each text keyword.
  16. The method according to claim 15, wherein adjusting the fused animation specifically comprises:
    synthesizing the determined pieces of music in the order in which the text keywords appear in the text information to obtain fused music; and
    synthesizing the fused music into the fused animation.
  17. The method according to claim 15, wherein before determining, from the preset music library, the music matching each text keyword, the method further comprises:
    determining features corresponding to each pre-stored piece of music, the features comprising Mel-frequency cepstral coefficient (MFCC) features;
    determining, according to the determined features corresponding to each piece of music and by means of a pre-trained music model, a music keyword corresponding to each piece of music; and
    storing each piece of music and its corresponding music keyword in the preset music library.
  18. The method according to claim 16, wherein synthesizing the fused music into the fused animation specifically comprises:
    monitoring each animation parameter corresponding to the fused animation;
    adjusting the sound effect of the fused music according to each animation parameter; and
    synthesizing the sound-adjusted fused music into the fused animation.
  19. The method according to claim 14, wherein determining the effect information corresponding to the text information specifically comprises:
    determining a specified character contained in the text information;
    extracting sub-text information from the text information according to the specified character;
    converting the sub-text information into speech; and
    using the sub-text information and/or the speech as the effect information.
  20. The method according to claim 19, wherein adjusting the fused animation specifically comprises:
    inserting the sub-text information into the fused animation in a preset display manner according to the position of the sub-text information in the text information, and/or synthesizing the speech into the fused animation.
  21. The method according to claim 14, wherein determining the effect information corresponding to the text information specifically comprises:
    extracting each piece of speech feature information from the text information;
    determining, according to each piece of speech feature information, the mouth-shape class corresponding to each piece of speech feature information; and
    determining, according to each mouth-shape class, the mouth-shape animation corresponding to each mouth-shape class, and using the mouth-shape animations as the determined effect information.
  22. The method according to claim 21, wherein adjusting the fused animation specifically comprises:
    synthesizing each mouth-shape animation into the fused animation according to the position, in the text information, of the character from which the corresponding speech feature information was extracted.
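Note: claims 21 and 22 map per-character speech feature information to mouth-shape (viseme) classes, look up a mouth-shape animation for each class, and splice it into the fused animation at the position of the corresponding character. A schematic sketch, assuming a trained viseme classifier, a characters-per-second pacing constant, and an overlay helper on the fused animation object; all three are assumptions made for illustration:

    def add_lip_sync(fused_animation, speech_features, viseme_classifier,
                     viseme_library, chars_per_second=4.0, fps=30):
        # speech_features[i]: feature information extracted for the i-th character of the text.
        # viseme_library: mapping from mouth-shape class to a mouth-shape animation clip.
        for i, feat in enumerate(speech_features):
            viseme_class = viseme_classifier.predict([feat])[0]       # assumed classifier interface
            mouth_clip = viseme_library[viseme_class]
            start_frame = int(i / chars_per_second * fps)             # position from character index
            fused_animation.overlay(mouth_clip, at_frame=start_frame)  # assumed helper method
        return fused_animation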
  23. An apparatus for animation synthesis, comprising:
    a receiving module, configured to receive input text information;
    an identification module, configured to identify each text keyword in the text information;
    a determining module, configured to determine, from a preset animation library, an animation corresponding to each text keyword; and
    a synthesis module, configured to synthesize the determined animations to obtain a fused animation.
  24. The apparatus according to claim 23, wherein the synthesis module is specifically configured to synthesize the determined animations in the order in which the keywords appear in the text information.
  25. The apparatus according to claim 23, wherein the synthesis module is specifically configured to:
    for any two adjacent animations, determine a transition animation segment to be inserted between the previous animation and the subsequent animation, and synthesize the previous animation, the transition animation segment, and the subsequent animation in sequence; or
    for any two adjacent animations, apply a first effect to each first specified animation frame of the previous animation, apply a second effect to each second specified animation frame of the subsequent animation, and synthesize the previous animation and the subsequent animation after the effects have been applied, the first effect comprising at least a fade-out effect and the second effect comprising at least a fade-in effect; or
    for any two adjacent animations, determine the similarity between each animation frame image of the previous animation and each animation frame image of the subsequent animation, and synthesize the previous animation and the subsequent animation according to the determined similarities.
  26. The apparatus according to claim 25, wherein the synthesis module is specifically configured to: select, from the previous animation, a first animation frame and the k animation frames following it, and sort the selected frames in their order within the previous animation to obtain a first frame sequence; select, from the subsequent animation, the k animation frames preceding a second animation frame and the second animation frame, and sort the selected frames in their order within the subsequent animation to obtain a second frame sequence; fuse the animation frames with the same position in the first frame sequence and the second frame sequence to obtain k+1 fused frames; and synthesize the animation frames of the previous animation that precede the first animation frame, the fused frames, and the animation frames of the subsequent animation that follow the second animation frame, k being a positive integer.
  27. The apparatus according to claim 26, wherein the synthesis module is specifically configured to: use the formula
    [Formula PCTCN2017099462-appb-100008]
    to determine the fusion coefficient corresponding to each animation frame in the first frame sequence; use the formula β(p) = 1 - α(p) to determine the fusion coefficient corresponding to each animation frame in the second frame sequence, wherein α(p) is the fusion coefficient corresponding to the p-th animation frame in the first frame sequence and β(p) is the fusion coefficient corresponding to the p-th animation frame in the second frame sequence; and fuse the animation frames with the same position in the first frame sequence and the second frame sequence according to the determined fusion coefficients.
PCT/CN2017/099462 2016-09-14 2017-08-29 Animation synthesis method and device WO2018049979A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610823313.6 2016-09-14
CN201610823313.6A CN106504304B (en) 2016-09-14 2016-09-14 A kind of method and device of animation compound

Publications (1)

Publication Number Publication Date
WO2018049979A1 true WO2018049979A1 (en) 2018-03-22

Family

ID=58291427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099462 WO2018049979A1 (en) 2016-09-14 2017-08-29 Animation synthesis method and device

Country Status (2)

Country Link
CN (1) CN106504304B (en)
WO (1) WO2018049979A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189985A (en) * 2018-08-17 2019-01-11 北京达佳互联信息技术有限公司 Text style processing method, device, electronic equipment and storage medium
CN110941990A (en) * 2019-10-22 2020-03-31 泰康保险集团股份有限公司 Method and device for evaluating human body actions based on skeleton key points
CN111028325A (en) * 2019-12-12 2020-04-17 广东智媒云图科技股份有限公司 Animal animation production method and device for limb characteristic point connecting line
CN112750184A (en) * 2019-10-30 2021-05-04 阿里巴巴集团控股有限公司 Data processing, action driving and man-machine interaction method and equipment

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504304B (en) * 2016-09-14 2019-09-24 厦门黑镜科技有限公司 A kind of method and device of animation compound
CN109598775B (en) * 2017-09-30 2023-03-31 腾讯科技(深圳)有限公司 Dynamic image synthesis method, device, terminal and storage medium
CN108447474B (en) * 2018-03-12 2020-10-16 北京灵伴未来科技有限公司 Modeling and control method for synchronizing virtual character voice and mouth shape
CN108961396A (en) * 2018-07-03 2018-12-07 百度在线网络技术(北京)有限公司 Generation method, device and the terminal device of three-dimensional scenic
CN108961431A (en) * 2018-07-03 2018-12-07 百度在线网络技术(北京)有限公司 Generation method, device and the terminal device of facial expression
CN109493402A (en) * 2018-11-09 2019-03-19 网易(杭州)网络有限公司 A kind of production method and device of plot animation
CN110446066B (en) * 2019-08-28 2021-11-19 北京百度网讯科技有限公司 Method and apparatus for generating video
CN112422999B (en) * 2020-10-27 2022-02-25 腾讯科技(深圳)有限公司 Live content processing method and computer equipment
CN113230657B (en) * 2021-05-21 2022-12-13 珠海金山数字网络科技有限公司 Role interaction method and device
CN113539240A (en) * 2021-07-19 2021-10-22 北京沃东天骏信息技术有限公司 Animation generation method and device, electronic equipment and storage medium
CN113744370B (en) * 2021-08-12 2022-07-01 北京百度网讯科技有限公司 Animation synthesis method, animation synthesis device, electronic device, and storage medium
CN113870396B (en) * 2021-10-11 2023-08-15 北京字跳网络技术有限公司 Mouth shape animation generation method and device, computer equipment and storage medium
CN114496173A (en) * 2021-12-31 2022-05-13 北京航天长峰股份有限公司 Short video operation report generation method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841919A (en) * 2012-06-30 2012-12-26 北京神州泰岳软件股份有限公司 Method and system for analyzing expressions in conversion text
CN103136780A (en) * 2013-03-18 2013-06-05 北京工业大学 Keyframe based sign language phonetic change animation synthesis method
CN104361620A (en) * 2014-11-27 2015-02-18 韩慧健 Mouth shape animation synthesis method based on comprehensive weighted algorithm
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
CN104732590A (en) * 2015-03-09 2015-06-24 北京工业大学 Sign language animation synthesis method
CN104835190A (en) * 2015-04-29 2015-08-12 华东师范大学 3D instant messaging system and messaging method
CN106504304A (en) * 2016-09-14 2017-03-15 厦门幻世网络科技有限公司 A kind of method and device of animation compound

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012939B (en) * 2010-12-13 2012-11-14 中国人民解放军国防科学技术大学 Method for automatically tagging animation scenes for matching through comprehensively utilizing overall color feature and local invariant features
CN102521843B (en) * 2011-11-28 2014-06-04 大连大学 Three-dimensional human body motion analysis and synthesis method based on manifold learning
CN103793446B (en) * 2012-10-29 2019-03-01 汤晓鸥 The generation method and system of music video
CN104731960B (en) * 2015-04-03 2018-03-09 北京威扬科技有限公司 Method, apparatus and system based on ecommerce webpage content generation video frequency abstract

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841919A (en) * 2012-06-30 2012-12-26 北京神州泰岳软件股份有限公司 Method and system for analyzing expressions in conversion text
CN103136780A (en) * 2013-03-18 2013-06-05 北京工业大学 Keyframe based sign language phonetic change animation synthesis method
CN104361620A (en) * 2014-11-27 2015-02-18 韩慧健 Mouth shape animation synthesis method based on comprehensive weighted algorithm
CN104732590A (en) * 2015-03-09 2015-06-24 北京工业大学 Sign language animation synthesis method
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
CN104835190A (en) * 2015-04-29 2015-08-12 华东师范大学 3D instant messaging system and messaging method
CN106504304A (en) * 2016-09-14 2017-03-15 厦门幻世网络科技有限公司 A kind of method and device of animation compound

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189985A (en) * 2018-08-17 2019-01-11 北京达佳互联信息技术有限公司 Text style processing method, device, electronic equipment and storage medium
CN109189985B (en) * 2018-08-17 2020-10-09 北京达佳互联信息技术有限公司 Text style processing method and device, electronic equipment and storage medium
CN110941990A (en) * 2019-10-22 2020-03-31 泰康保险集团股份有限公司 Method and device for evaluating human body actions based on skeleton key points
CN110941990B (en) * 2019-10-22 2023-06-16 泰康保险集团股份有限公司 Method and device for evaluating human body actions based on skeleton key points
CN112750184A (en) * 2019-10-30 2021-05-04 阿里巴巴集团控股有限公司 Data processing, action driving and man-machine interaction method and equipment
CN112750184B (en) * 2019-10-30 2023-11-10 阿里巴巴集团控股有限公司 Method and equipment for data processing, action driving and man-machine interaction
CN111028325A (en) * 2019-12-12 2020-04-17 广东智媒云图科技股份有限公司 Animal animation production method and device for limb characteristic point connecting line
CN111028325B (en) * 2019-12-12 2023-08-11 广东智媒云图科技股份有限公司 Animal animation production method and device for connecting limb characteristic points

Also Published As

Publication number Publication date
CN106504304A (en) 2017-03-15
CN106504304B (en) 2019-09-24

Similar Documents

Publication Publication Date Title
WO2018049979A1 (en) Animation synthesis method and device
US11670024B2 (en) Methods and systems for image and voice processing
US9361722B2 (en) Synthetic audiovisual storyteller
US10658005B1 (en) Methods and systems for image and voice processing
US10671838B1 (en) Methods and systems for image and voice processing
CN111415677B (en) Method, apparatus, device and medium for generating video
US10803646B1 (en) Methods and systems for image and voice processing
US9959657B2 (en) Computer generated head
CN112465935A (en) Virtual image synthesis method and device, electronic equipment and storage medium
US20140210831A1 (en) Computer generated head
KR20190070065A (en) Method and apparatus for generating adaptlve song lip sync animation based on text
CN113077537B (en) Video generation method, storage medium and device
CN110096966A (en) A kind of audio recognition method merging the multi-modal corpus of depth information Chinese
US20210390945A1 (en) Text-driven video synthesis with phonetic dictionary
Wang et al. Synthesizing photo-real talking head via trajectory-guided sample selection
WO2021034463A1 (en) Methods and systems for image and voice processing
CN113609255A (en) Method, system and storage medium for generating facial animation
Wang et al. HMM trajectory-guided sample selection for photo-realistic talking head
CN115953521A (en) Remote digital human rendering method, device and system
JP2015038725A (en) Utterance animation generation device, method, and program
CN116958343A (en) Facial animation generation method, device, equipment, medium and program product
Wang et al. Photo-real lips synthesis with trajectory-guided sample selection.
Luo et al. Synthesizing real-time speech-driven facial animation
KR102287325B1 (en) Method and apparatus for generating a voice suitable for the appearance
KR102138132B1 (en) System for providing animation dubbing service for learning language

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17850181

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17850181

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.09.2019)
