CN114820888A - Animation generation method and system and computer equipment - Google Patents

Animation generation method and system and computer equipment Download PDF

Info

Publication number
CN114820888A
Authority
CN
China
Prior art keywords
music
animation
segment
information
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210435987.4A
Other languages
Chinese (zh)
Inventor
周凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202210435987.4A priority Critical patent/CN114820888A/en
Publication of CN114820888A publication Critical patent/CN114820888A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/65 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7834 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 - Details of electrophonic musical instruments
    • G10H 1/0008 - Associated control or indicating means
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/031 - Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H 2210/071 - Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for rhythm pattern analysis or rhythm style recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application provide an animation generation method, an animation generation system, and a computer device. A large number of music-animation data sets are used, a deep learning model is employed to extract music features, and a feature-matching search algorithm is used to select animation segments, so that the entire animation matching and generation process is controllable, the extensibility of the animation generation method is improved, the generated animation is more natural and reasonable, and the user experience can be improved.

Description

Animation generation method and system and computer equipment
Technical Field
The present application relates to the technical fields of artificial intelligence and animation processing, and in particular to an animation generation method, an animation generation system, and a computer device.
Background
With the continuous development and application of computer technology, computer-based animation creation has gained wide attention and application in various scenarios. In some application scenarios, dance animations need to be generated automatically from music. For example, in a music dance game, a corresponding dance animation often needs to be generated according to the music.
The inventor has researched and analyzed some common schemes currently used to generate dance animations from music and found that most conventional animation generation methods suffer from poor extensibility and insufficiently natural and reasonable generated animations, which degrades the user experience.
Disclosure of Invention
Based on the above, in order to at least partially solve the technical problem, in a first aspect, an embodiment of the present application provides an animation generation method, including:
segmenting target music into at least two music segments;
sequentially inputting each music segment into a music identification model obtained by pre-training for music characteristic identification to obtain music characteristic information of each music segment, wherein the music characteristic information comprises style information and rhythm information of the music segment;
searching animation segments matched with the music segments from a pre-established animation library according to the music characteristic information of the music segments;
and generating the animation corresponding to the target music according to the animation segments matched with the music segments.
According to an embodiment of the first aspect, the method further includes a step of training a pre-selected deep learning model to obtain the music recognition model, where the step includes:
establishing a sample music library, wherein the sample music library can comprise sample music with different style types;
cutting each sample music in the sample music library into a plurality of sample music sections, and labeling corresponding music characteristic information for each sample music section to obtain a music characteristic label of each sample music section, wherein the music characteristic information corresponding to the sample music sections comprises style information and rhythm information;
and training the pre-selected deep learning model by using the sample music library to obtain a music identification model for identifying the music characteristic information of the music.
According to an embodiment of the first aspect, training a pre-selected deep learning model using the sample music library to obtain a music recognition model for recognizing music feature information of music includes:
sequentially inputting each sample music fragment into the deep learning model, and identifying and obtaining music characteristic information of the sample music fragment by the deep learning model;
calculating to obtain a loss function value of the deep learning model according to the music characteristic information of the sample music fragment obtained through identification and the music characteristic label of the sample music fragment;
and performing iterative optimization on the model parameters of the deep learning model according to the loss function value until a model training termination condition is met, and taking the deep learning model after training as the music recognition model.
According to an embodiment of the first aspect, the method further includes a step of pre-establishing the animation library, where the step includes:
acquiring animations respectively matched with each sample music in the sample music library;
segmenting each animation to obtain a plurality of animation segments, wherein the duration of each animation segment is the same as that of a sample music segment of sample music corresponding to the animation;
performing feature labeling on each animation segment to obtain animation feature information corresponding to each animation segment, wherein the animation feature information comprises style information and rhythm information of the animation segment;
and storing each animation segment marked with the animation characteristic information into a set database to obtain the animation library.
According to an implementation manner of the first aspect, searching for an animation segment matching each of the music segments from a pre-established animation library according to music characteristic information of each of the music segments includes:
aiming at any target music segment, searching a plurality of animation segments matched with the music segment from the animation library according to the style information of the target music segment;
searching at least one candidate animation segment matched with the target music segment from the plurality of animation segments according to the rhythm information of the target music segment;
and determining a target animation segment matched with the target music segment according to the motion information of the at least one candidate animation segment, wherein the motion information comprises position information, velocity information, and rotation and angular velocity information characterizing a target object in the candidate animation segment.
According to an embodiment of the first aspect, determining a target animation segment matching the target music segment according to the motion information of the at least one candidate animation segment includes:
acquiring motion information of the last preset number of frames of the animation segment matched with the previous music segment of the target music segment as an index;
and searching, from the at least one candidate animation segment according to the index, for the candidate whose motion information of the first preset number of frames is closest to the index, and taking that candidate as the target animation segment matched with the target music segment.
In an implementation manner of the first aspect, the method further includes:
and scaling the target animation segment by a B-spline interpolation method so that the duration of the target animation segment is consistent with that of the target music segment.
In a second aspect, an embodiment of the present application further provides an animation generation system, applied to a computer device, where the animation generation system includes:
the segmentation module is used for segmenting the target music into at least two music pieces;
the recognition module is used for sequentially inputting each music segment into a music recognition model obtained by pre-training for music characteristic recognition to obtain music characteristic information of each music segment, wherein the music characteristic information comprises style information and rhythm information of the music segment;
the searching module is used for searching the animation segments matched with the music segments from a pre-established animation library according to the music characteristic information of the music segments;
and the generating module is used for generating the animation corresponding to the target music according to the animation segments matched with the music segments.
In an implementation manner of the second aspect, the search module is specifically configured to:
aiming at any target music segment, searching a plurality of animation segments matched with the music segment from the animation library according to the style information of the target music segment;
searching at least one candidate animation segment matched with the target music segment from the plurality of animation segments according to the rhythm information of the target music segment;
and determining a target animation segment matched with the target music segment according to the motion information of the at least one candidate animation segment, wherein the motion information comprises position information, velocity information, and rotation and angular velocity information characterizing a target object in the candidate animation segment.
In a third aspect, an embodiment of the present application further provides a computer device, including a machine-readable storage medium and one or more processors, where the machine-readable storage medium stores machine-executable instructions, and the machine-executable instructions, when executed by the one or more processors, implement the animation generation method described above.
Based on the above content of the embodiments of the present application, and compared with the prior art, the animation generation method, system and computer device provided by the embodiments of the present application use a large number of music-animation data sets but do not use an end-to-end music-animation generation model. Instead, a deep learning model (the music recognition model) is used only to extract music features, and a feature-matching search algorithm is used to select the animation, so the animation can be corrected according to the user's intent during the search, the entire animation matching and generation process is controllable, and, because the animation is high-quality animation retrieved from an action library and corrected there, the output animation is guaranteed to be lossless. Therefore, the animation generation method in this embodiment has better extensibility, the generated animation is more natural and reasonable, and the user experience can be improved. In addition, as the scale of the action library grows, the animation generation effect becomes better.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of an animation generation method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of the implementation of step S200 in fig. 1.
Fig. 3 is a schematic flow chart of the implementation of step S300 in fig. 1.
Fig. 4 is a schematic diagram of the segmentation of the target music provided in the present embodiment.
FIG. 5 is a schematic diagram of scaling a matched animation segment according to an embodiment of the present application.
Fig. 6 is a schematic functional module diagram of an animation generation system provided in an embodiment of the present application.
Fig. 7 is a schematic diagram of a computer device for implementing the animation generation method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
In the description of the present application, it is further noted that, unless expressly stated or limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Based on the technical problems mentioned in the background, the inventor of the present application finds that there are currently two common ways of generating dance animations from music. The first uses a music library and an action library: the user selects a piece of music from the preset music library, and the system then selects the corresponding dance animation from the action library according to a predefined correspondence, ensuring consistency between the music and the dance animation in duration, style, rhythm, and the like.
The other way is to directly model the relationship between music and animation with a deep learning method. This scheme usually requires collecting a large amount of paired music and animation data and training a deep learning model, so that music input to the model yields animation as output. However, the interpretability and controllability of this scheme are poor, the generated animation is not natural and reasonable enough, the whole dance animation generation process is difficult to control manually, and because the model cannot fit the original music and dance animation perfectly, the output animation is lossy.
Embodiments of the present application will be described below by way of example with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an animation generation method provided in an embodiment of the present application, and in this embodiment, the animation generation method may be executed and implemented by a computer device. It should be understood that, in the animation generation method provided in this embodiment, the order of some steps included in the animation generation method may be interchanged according to actual needs in actual implementation, or some steps may be omitted or deleted, which is not specifically limited in this embodiment.
The following describes each step of the animation generation method of the present embodiment in detail by way of example with reference to fig. 1, and in detail, as shown in fig. 1, the method may include related contents described in step S100 to step S400 below.
Step S100, segmenting the target music into at least two music segments.
In this embodiment, the target music may be divided into music segments by beat or by bar, and each division position may be located midway between two adjacent beat points. The target music may be a complete song or a processed song segment. Preferably, in this embodiment, the target music has a fixed BPM (beats per minute), so that the divided music segments have the same duration. As an example, as shown in fig. 4, the target music may be segmented in units of beats, or in units of bars, where each bar may include, for example, four beats; the specific segmentation manner is not limited in this embodiment.
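As a minimal illustrative sketch only, beat-aligned segmentation of this kind could be prototyped with the open-source librosa library; the four-beats-per-bar setting and the midpoint split rule below follow this embodiment's description, while the function name and parameters are assumptions.

```python
import librosa

def split_music_by_bars(path, beats_per_bar=4):
    """Rough sketch of step S100: split a track into bar-length segments,
    cutting midway between two adjacent beat points."""
    y, sr = librosa.load(path, sr=None, mono=True)
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_samples = librosa.frames_to_samples(beat_frames)

    # Cut points lie midway between two adjacent beats, taken every
    # `beats_per_bar` beats (roughly one bar per segment).
    cut_points = [0]
    for i in range(beats_per_bar - 1, len(beat_samples) - 1, beats_per_bar):
        cut_points.append(int((beat_samples[i] + beat_samples[i + 1]) // 2))
    cut_points.append(len(y))

    segments = [y[s:e] for s, e in zip(cut_points[:-1], cut_points[1:]) if e > s]
    return segments, sr
```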
Step S200, sequentially inputting each music segment into a music recognition model obtained by pre-training for music feature recognition, so as to obtain the music feature information of each music segment. In this embodiment, the music feature information includes, but is not limited to, style information and rhythm information of the music segment.
In one possible implementation manner of the embodiment, as shown in fig. 2, the music recognition model may be implemented in advance by machine learning through steps S210 to S230 described below.
Step S210, a sample music library is established. The sample music library may include sample music of a plurality of different style types, for example, rock, pop, classical, and the like.
Step S220, segmenting each sample music in the sample music library into a plurality of sample music pieces, and labeling corresponding music feature information for each sample music piece to obtain a music feature tag of each sample music piece.
For example, in this embodiment, each piece of sample music may be cut into sample music segments by beat or by bar, and each cutting position may be located midway between two adjacent beat points. The music feature information may include, but is not limited to, style information and rhythm information of the corresponding sample music segment, where the style information may be described with natural words such as rock, pop, and classical, and the rhythm information may be represented by a feature vector (e.g., an 8-dimensional one-hot vector). Preferably, in this embodiment, each piece of sample music has a fixed BPM (beats per minute), so that all of its sample music segments have the same duration. Meanwhile, the style information of sample music segments belonging to the same sample music should be the same, but their rhythm information may differ.
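A minimal sketch of how such a music feature label might be represented in code; the 8-dimensional one-hot rhythm vector and the style vocabulary come from this embodiment, while the dictionary layout and class names are illustrative assumptions.

```python
import numpy as np

STYLES = ["rock", "pop", "classical"]  # illustrative style vocabulary

def make_music_feature_label(style, rhythm_class, num_rhythm_classes=8):
    """Build the (style, rhythm) label for one sample music segment.
    The rhythm label is an 8-dimensional one-hot vector as described above."""
    rhythm = np.zeros(num_rhythm_classes, dtype=np.float32)
    rhythm[rhythm_class] = 1.0
    return {"style": STYLES.index(style), "rhythm": rhythm}
```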
Step S230, training the pre-selected deep learning model by using the sample music library to obtain a music recognition model for recognizing music characteristic information of music.
For example, in one possible implementation manner, each sample music piece may be sequentially input into the deep learning model, and the deep learning model identifies and obtains music feature information of the sample music piece; then, calculating according to the music characteristic information of the sample music fragment obtained by identification and the music characteristic label of the sample music fragment to obtain a loss function value of the deep learning model; and finally, carrying out iterative optimization on the model parameters of the deep learning model according to the loss function value until a model training termination condition is met, and taking the deep learning model after training as the music recognition model. The model training termination condition may be that the loss function value is smaller than a set threshold, or that the number of times of iterative training reaches a set number of times, which is not limited in this embodiment.
In addition, in this embodiment, inputting each sample music segment into the deep learning model may mean inputting, into the deep learning model, the audio features (for example, MFCC features) obtained by processing the sample music segment with a predetermined audio processing method. The deep learning model used for training may be any current mainstream music model, without limitation. The music recognition model obtained after training can accurately recognize the music feature information of an externally input piece of music (segmented by beat or bar), for example, accurately recognizing its style and outputting its rhythm vector.
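The following is a hedged sketch of the training procedure in steps S210-S230. Since the embodiment leaves the concrete deep learning model open, the network layout, MFCC dimensions, and the use of class indices for the style and rhythm labels are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MusicRecognitionModel(nn.Module):
    """Toy stand-in for the music recognition model: MFCC features in,
    style logits and rhythm logits out."""
    def __init__(self, n_mfcc=20, n_frames=128, n_styles=3, n_rhythms=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_mfcc * n_frames, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.style_head = nn.Linear(128, n_styles)
        self.rhythm_head = nn.Linear(128, n_rhythms)

    def forward(self, mfcc):
        h = self.backbone(mfcc)
        return self.style_head(h), self.rhythm_head(h)

def train_step(model, optimizer, mfcc, style_label, rhythm_label):
    """One iterative-optimization step on a batch of sample music segments.
    style_label / rhythm_label are class indices (e.g. the argmax of the
    one-hot rhythm label described in the embodiment)."""
    criterion = nn.CrossEntropyLoss()
    style_logits, rhythm_logits = model(mfcc)
    # The loss combines the recognized feature information with both labels.
    loss = criterion(style_logits, style_label) + criterion(rhythm_logits, rhythm_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Iterating this step until the loss falls below a set threshold, or until a set number of iterations is reached, corresponds to the model training termination condition described above.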
Step S300, searching and obtaining the animation segments matched with the music segments from a pre-established animation library according to the music characteristic information of the music segments.
For example, after a music segment M is input to the music recognition model, the model outputs the style information and rhythm information corresponding to the music segment M. Because each animation segment also carries style information and rhythm information, and style information has a higher semantic level than rhythm information, a number of animation segments whose style information is consistent with that of the music segment M can first be found as candidates (for example, rock music usually corresponds to rock-style animation segments), and a better-matching animation segment is then further screened according to the rhythm information of the music segment M. The target animation segment that best matches the music segment M may be screened using the L2 distance between the feature vector corresponding to the rhythm information of the music segment M and the feature vector corresponding to each animation segment; for example, several animation segments whose L2 distance is smaller than a certain threshold may be taken as candidates and one of them selected as the target animation segment, or the animation segment with the smallest L2 distance may be used directly as the target animation segment.
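A small sketch of the style-then-rhythm search described above; the animation-library record layout (dictionaries with "style" and "rhythm" keys) and the threshold value are assumptions for illustration, not structures disclosed by the embodiment.

```python
import numpy as np

def find_matching_segments(style, rhythm_vec, animation_library, l2_threshold=0.5):
    """Filter the animation library by style, then rank the candidates by the
    L2 distance between rhythm feature vectors."""
    candidates = [a for a in animation_library if a["style"] == style]
    scored = sorted(
        ((np.linalg.norm(a["rhythm"] - rhythm_vec), a) for a in candidates),
        key=lambda x: x[0],
    )
    # Keep all candidates under the threshold, or fall back to the single
    # nearest animation segment if none is close enough.
    close = [a for d, a in scored if d < l2_threshold]
    return close if close else ([scored[0][1]] if scored else [])
```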
In a possible implementation manner of this embodiment, the animation library may be built together when the sample music library is built. For example, as shown in FIG. 4, the animation library may be created through steps S310-S340 described below, which will be described in an exemplary manner.
Step S310, acquiring animations respectively matched with each sample music in the sample music library.
Each animation may be a complete dance motion sequence. It may be captured with an inertial capture device or an optical capture device by having a professional dancer wear the recording device and dance along with the corresponding sample music; the entire captured dance motion is then processed and stored.
Step S320, segmenting each animation to obtain a plurality of animation segments, where each animation segment has the same duration as a sample music segment of sample music corresponding to the animation.
Step S330, performing feature labeling on each animation segment to obtain the animation feature information corresponding to each animation segment, where the animation feature information includes the style information and rhythm information of the animation segment. The style information and rhythm information of an animation segment are the same as those of the corresponding sample music segment. Meanwhile, the style information of the animation segments obtained by segmenting the animation corresponding to the same sample music is consistent, but their rhythm information may differ. The cut animation segments serve as the minimum units of animation generation, so that each music segment in the sample music library corresponds to an accurately matched animation segment. The lengths of the animation segments and the sample music segments may be less than 1 second (when segmented by beat) or 1-3 seconds (when segmented by bar).
Step S340, storing each animation segment labeled with the animation feature information into a set database to obtain the animation library.
In this way, the samples in the finally created sample music library are the sample music segments and their corresponding labels (music feature labels), and the animation library consists of the animation segments and their corresponding labels (animation feature information). The music library is used to train the deep learning model to obtain the music recognition model, and the animation library is used to search, based on the recognition result (music feature information) output by the music recognition model for each music segment, for the matched animation segments, from which the animation corresponding to the target music is then generated.
In one possible application scenario of step S300, the animation corresponding to the target music may be generated directly from the animation segments matched with the music segments. Thus, in an application scenario such as live webcasting, while the anchor plays music, the animation corresponding to the music being played can be obtained at the same time, realizing dance accompaniment for the music and improving the live interaction with users.
In another possible application scenario of step S300, the matching animation segment may be played while the target music is played. For example, the target music may be the music currently being played; before a music segment is played, the matching of its animation segment may be performed in advance, and the animation segment matching the music segment is searched for and then played synchronously with that music segment.
In an application scenario where music and corresponding animation are played synchronously, if the durations of music segments and animation segments matched with the music segments are inconsistent, the problem of animation jump may occur. In order to avoid this problem, in the embodiment of the present application, when music feature information of each music piece is obtained and a matching animation piece is searched in an animation library, corresponding scaling processing needs to be performed on the animation piece. Specifically, in an alternative implementation manner, the present application embodiment implements the search matching process of the animation segments of each music segment and the scaling process of the animation segments by the following method, and the process may be included in step S300.
Firstly, aiming at any target music segment, searching a plurality of animation segments matched with the music segment from the animation library according to the style information of the target music segment;
then, searching at least one candidate animation segment matched with the target music segment from the plurality of animation segments according to the rhythm information of the target music segment; generally, a plurality of animation segments which are matched with or similar to the target music segment can be searched as candidate animation segments according to the rhythm information;
Finally, a target animation segment matching the target music segment is determined according to the motion information of the at least one candidate animation segment, where the motion information includes, but is not limited to, position information, velocity information, and rotation and angular velocity information characterizing the target object (such as an anchor or a virtual character) in the animation segment. For example, the motion information of the last preset number of frames (for example, the last 3 frames) of the animation segment matched with the previous music segment of the target music segment may first be obtained as an index, and then the candidate whose motion information of the first preset number of frames (for example, the first 3 frames) is closest to the index may be searched for from the at least one candidate animation segment according to the index and used as the target animation segment matched with the target music segment, which can also increase the diversity of the generated animation.
For example, assume that the target music segment is music segment M and that the motion information I(C_{-3~-1}) of the last 3 frames of the previously matched animation segment is used as the index. Suppose the set of animation segments whose style information and rhythm information match music segment M is {D_1, D_2, ..., D_N}, containing N animation segments, and that the first-3-frame motion information I(D_i^{1~3}) of the i-th animation segment is closest to I(C_{-3~-1}); then the finally determined target animation segment is animation segment D_i. The degree of matching between animation segments may be measured with an L2 loss, and the target animation segment is determined according to the matching degree.
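A rough sketch of this frame-continuity selection in code; it assumes, for illustration only, that each candidate animation segment is a dictionary carrying its per-frame motion vectors under a "motion" key and that the candidate list is non-empty.

```python
import numpy as np

def pick_target_animation(prev_motion_last3, candidates):
    """Select, from the candidate animation segments, the one whose first
    3 frames of motion information are closest (L2) to the last 3 frames
    of the previously matched animation segment."""
    index = np.asarray(prev_motion_last3)  # motion information used as the index

    def l2(candidate):
        first3 = np.asarray(candidate["motion"][:3])
        return float(np.linalg.norm(first3 - index))

    # `candidates` is assumed non-empty here.
    return min(candidates, key=l2)
```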
Further, in order to avoid the problem of animation jump and ensure that each music piece and the animation piece matched with the music piece can be kept consistent during final playing, in this embodiment, a B-spline interpolation method may be adopted to scale the target animation piece, so that the target animation piece and the target music piece are played synchronously after the durations of the target animation piece and the target music piece are consistent.
As shown in FIG. 5, the duration of the music segment M is t, the duration of the matched animation segment D_i is t_d, and the duration of the sample music segment X_i corresponding to D_i is also t_d. Since the BPM of different music may differ, the duration between beats also differs. Therefore, when the target music is arbitrary and not contained in the music library, the durations of the music segment M and the animation segment D_i may be inconsistent, so the animation segment D_i needs to be scaled to match the duration of the music segment M. Based on this, in this embodiment the animation segment D_i may be scaled by B-spline interpolation and then played synchronously with the music segment M to be played, ensuring the continuity of the played animation. In addition, this embodiment can ensure first-order smoothness at the interpolation points, and the motions at the head and tail of the segment do not change at the beat points before and after scaling, so that the beat motions in the scaled action segment still correspond to the beat points of the music segment M to be played. The duration of the interpolated animation segment D_i' is consistent with the duration of the music segment M, and the animation segment is played while the music segment M is played.
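The following is a hedged sketch of this B-spline time scaling: the motion curves of an animation segment are resampled from its native duration t_d to the music segment's duration t with a cubic B-spline. The per-frame pose array layout and the frame rate are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def rescale_animation(frames, src_duration, dst_duration, fps=30):
    """Time-scale an animation segment with cubic B-spline interpolation so
    its playback duration matches the music segment to be played.

    `frames` is an (N, D) array: N frames, D motion channels per frame
    (N must be at least 4 for a cubic spline)."""
    n_src = frames.shape[0]
    t_src = np.linspace(0.0, src_duration, n_src)
    spline = make_interp_spline(t_src, frames, k=3, axis=0)

    # Resample the source timeline at as many frames as the target duration
    # needs; played back at `fps`, the segment then lasts dst_duration.
    n_dst = max(int(round(dst_duration * fps)), 2)
    t_dst = np.linspace(0.0, src_duration, n_dst)
    return spline(t_dst)
```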
Step S400, generating the animation corresponding to the target music according to the animation segments matched with the music segments.
In this embodiment, after the animation segments matched with the music segments are determined, the animation segments may be spliced or synthesized into a complete animation. The generated animation may be used as the dance-accompaniment animation of the target music, realizing automatic dance matching for music, and can be applied to fields such as dance animation production and live webcasting. Taking live webcasting as an example, when the anchor plays a piece of music, the dance animation corresponding to the music can be automatically generated and displayed in the live picture, enhancing the interaction between the audience and the anchor. For example, before the anchor provides the live service, a music library and a high-quality dance action library paired with it may be prepared in advance. The high-quality dance motions may be recorded with an inertial capture device or an optical capture device: a professional dancer (or the anchor) wears the recording device and dances along with the music, and the whole dance motion is processed and stored in the dance action library. When the music library is produced, the music used may contain various types, such as rock, pop, and classical; the music is processed and stored in the music library, and each piece of music is divided into music segments by beat or bar, with each division position located midway between two adjacent beat points. In a specific implementation, the music library may be used to train a deep learning model to obtain the music recognition model for recognizing music feature information. The produced dance action library may be used as the animation library; the animation segments matched with the music segments are searched according to the recognition results of the music feature information, and finally the dance-accompaniment animation matched with the music played during the live broadcast is generated from the retrieved animation segments, thereby realizing interactive live webcasting.
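Putting steps S100-S400 together, an end-to-end sketch might look as follows. It reuses the illustrative helper functions from the earlier sketches, and the `music_model.predict` wrapper and the "frames"/"duration" keys of the animation-library records are hypothetical, not APIs disclosed by this embodiment.

```python
import numpy as np

def generate_animation(target_music_path, music_model, animation_library):
    """Sketch of the pipeline: segment the music, recognize features,
    search matching animation segments, scale them, and splice the result."""
    segments, sr = split_music_by_bars(target_music_path)
    animation, prev_tail = [], None
    for seg in segments:
        style, rhythm_vec = music_model.predict(seg, sr)  # hypothetical wrapper
        candidates = find_matching_segments(style, rhythm_vec, animation_library)
        if prev_tail is not None and len(candidates) > 1:
            chosen = pick_target_animation(prev_tail, candidates)
        else:
            chosen = candidates[0]
        # Scale the chosen segment so its duration matches the music segment.
        frames = rescale_animation(chosen["frames"], chosen["duration"], len(seg) / sr)
        animation.append(frames)
        prev_tail = frames[-3:]  # index for the next segment's continuity check
    return np.concatenate(animation, axis=0)
```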
Fig. 6 is a schematic diagram of an animation generation system for implementing the animation generation method provided in the embodiment of the present application. In this embodiment, the animation generation system can be applied to the computer device 100 shown in fig. 7. In detail, the computer device 100 may include one or more processors 110, a machine-readable storage medium 120, and an animation generation system 130. The processor 110 and the machine-readable storage medium 120 may be communicatively connected via a system bus. The machine-readable storage medium 120 stores machine-executable instructions, and the processor 110 implements the animation generation method described above by reading and executing the machine-executable instructions in the machine-readable storage medium 120. In this embodiment, the computer device 100 may be a cloud server for executing each function module included in the front end of the animation generation system, may also be a user client for executing each function module included in the back end of the animation generation system, or may also be a cloud server for executing each function module included in the front end and the back end of the animation generation system at the same time, or may also be a combination of a cloud server and a user client for executing each function module included in the front end and the back end of the animation generation system respectively, which is not limited in this embodiment.
The machine-readable storage medium 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The machine-readable storage medium 120 is used for storing a program, and the processor 110 executes the program after receiving an execution instruction.
The processor 110 may be an integrated circuit chip having signal processing capabilities. The Processor may be, but is not limited to, a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), and the like.
In this embodiment, the animation generation system 130 may include a segmentation module 131, a recognition module 132, a search module 133, and a generation module 134.
The segmenting module 131 is configured to segment the target music into at least two music segments.
In this embodiment, the segmentation module 131 may be configured to perform the step S100, and for further details of the segmentation module 131, reference may be made to relevant contents of the step S100, which is not described herein again.
The recognition module 132 is configured to sequentially input each music segment into a music recognition model obtained through pre-training to perform music feature recognition, so as to obtain music feature information of each music segment, where the music feature information includes style information and rhythm information of the music segment.
In this embodiment, the identification module 132 may be configured to execute the step S200, and for further details of the identification module 132, reference may be made to relevant contents of the step S200, which is not described herein again.
The searching module 133 is configured to search for an animation segment matching each of the music segments from a pre-established animation library according to the music characteristic information of each of the music segments.
In this embodiment, the search module 133 may be configured to execute the step S300, and for more details of the search module 133, reference may be made to relevant contents of the step S300, which is not described herein again.
The generating module 134 is configured to generate an animation corresponding to the target music according to the animation segments matched with the music segments.
In this embodiment, the generating module 134 may be configured to execute the step S400, and for further details of the generating module 134, reference may be made to relevant contents of the step S400, which is not described herein again.
In summary, the animation generation method, system and computer device provided in the embodiments of the present application use a large number of music-animation data sets but do not use an end-to-end music-animation generation model; instead, a deep learning model (the music recognition model) is used only to extract music features, and a feature-matching search algorithm is used to select the animation, so the animation can be modified according to the user's intent during the search, the entire animation matching and generation process is controllable, and, because the animation is high-quality animation retrieved from the action library and modified there, the output animation is lossless. Therefore, the animation generation method in this embodiment has better extensibility, the generated animation is more natural and reasonable, and the user experience can be improved. In addition, as the scale of the action library grows, the animation generation effect becomes better.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of animation generation, the method comprising:
segmenting target music into at least two music segments;
sequentially inputting each music segment into a music identification model obtained by pre-training for music characteristic identification to obtain music characteristic information of each music segment, wherein the music characteristic information comprises style information and rhythm information of the music segment;
searching animation segments matched with the music segments from a pre-established animation library according to the music characteristic information of the music segments;
and generating the animation corresponding to the target music according to the animation segments matched with the music segments.
2. The animation generation method as claimed in claim 1, further comprising a step of training a pre-selected deep learning model to obtain the music recognition model, the step comprising:
establishing a sample music library, wherein the sample music library can comprise sample music with different style types;
cutting each sample music in the sample music library into a plurality of sample music sections, and labeling corresponding music characteristic information for each sample music section to obtain a music characteristic label of each sample music section, wherein the music characteristic information corresponding to the sample music sections comprises style information and rhythm information;
and training the pre-selected deep learning model by using the sample music library to obtain a music identification model for identifying the music characteristic information of the music.
3. The animation generation method according to claim 2, wherein training a pre-selected deep learning model using the sample music library to obtain a music recognition model for recognizing music characteristic information of music comprises:
sequentially inputting each sample music fragment into the deep learning model, and identifying and obtaining music characteristic information of the sample music fragment by the deep learning model;
calculating to obtain a loss function value of the deep learning model according to the music characteristic information of the sample music fragment obtained through identification and the music characteristic label of the sample music fragment;
and performing iterative optimization on the model parameters of the deep learning model according to the loss function value until a model training termination condition is met, and taking the deep learning model after training as the music recognition model.
4. The animation generation method as claimed in claim 2 or 3, further comprising a step of previously establishing the animation library, the step comprising:
acquiring animations respectively matched with each sample music in the sample music library;
segmenting each animation to obtain a plurality of animation segments, wherein the duration of each animation segment is the same as that of a sample music segment of sample music corresponding to the animation;
performing feature labeling on each animation segment to obtain animation feature information corresponding to each animation segment, wherein the animation feature information comprises style information and rhythm information of the animation segment;
and storing each animation segment marked with the animation characteristic information into a set database to obtain the animation library.
5. The animation generation method as claimed in claim 1, wherein searching for an animation piece matching each of the music pieces from a pre-established animation library based on the music characteristic information of each of the music pieces comprises:
aiming at any target music segment, searching a plurality of animation segments matched with the music segment from the animation library according to the style information of the target music segment;
searching at least one candidate animation segment matched with the target music segment from the plurality of animation segments according to the rhythm information of the target music segment;
and determining a target animation segment matched with the target music segment according to the motion information of the at least one candidate animation segment, wherein the motion information comprises position information, velocity information, and rotation and angular velocity information characterizing a target object in the candidate animation segment.
6. The animation generation method as claimed in claim 5, wherein determining the target animation segment matching the target music segment according to the motion information of the at least one candidate animation segment comprises:
acquiring motion information of the last preset number of frames of the animation segment matched with the previous music segment of the target music segment as an index;
and searching, from the at least one candidate animation segment according to the index, for the candidate whose motion information of the first preset number of frames is closest to the index, and taking that candidate as the target animation segment matched with the target music segment.
7. The animation generation method as claimed in claim 6, further comprising:
and scaling the target animation segment by a B-spline interpolation method so that the duration of the target animation segment is consistent with that of the target music segment.
8. An animation generation system applied to a computer device, the animation generation system comprising:
the segmentation module is used for segmenting the target music into at least two music pieces;
the recognition module is used for sequentially inputting each music segment into a music recognition model obtained by pre-training for music characteristic recognition to obtain music characteristic information of each music segment, wherein the music characteristic information comprises style information and rhythm information of the music segment;
the searching module is used for searching the animation segments matched with the music segments from a pre-established animation library according to the music characteristic information of the music segments;
and the generating module is used for generating the animation corresponding to the target music according to the animation segments matched with the music segments.
9. The animation generation system of claim 8, wherein the search module is specifically configured to:
for any target music segment, searching the animation library for a plurality of animation segments matching the target music segment according to the style information of the target music segment;
searching the plurality of animation segments for at least one candidate animation segment matching the target music segment according to the rhythm information of the target music segment;
and determining a target animation segment matching the target music segment according to the motion information of the at least one candidate animation segment, wherein the motion information comprises position information, velocity information, rotation information and angular velocity information of a target object in the candidate animation segment.
10. A computer device comprising a machine-readable storage medium and one or more processors, the machine-readable storage medium having stored thereon machine-executable instructions that, when executed by the one or more processors, implement the animation generation method of any one of claims 1-7.
CN202210435987.4A 2022-04-24 2022-04-24 Animation generation method and system and computer equipment Pending CN114820888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210435987.4A CN114820888A (en) 2022-04-24 2022-04-24 Animation generation method and system and computer equipment

Publications (1)

Publication Number Publication Date
CN114820888A true CN114820888A (en) 2022-07-29

Family

ID=82507845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210435987.4A Pending CN114820888A (en) 2022-04-24 2022-04-24 Animation generation method and system and computer equipment

Country Status (1)

Country Link
CN (1) CN114820888A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070008238A (en) * 2005-07-13 2007-01-17 엘지전자 주식회사 Apparatus and method of music synchronization based on dancing
CN109615682A (en) * 2018-12-07 2019-04-12 北京微播视界科技有限公司 Animation producing method, device, electronic equipment and computer readable storage medium
CN109977255A (en) * 2019-02-22 2019-07-05 北京奇艺世纪科技有限公司 Model generating method, audio-frequency processing method, device, terminal and storage medium
CN111179385A (en) * 2019-12-31 2020-05-19 网易(杭州)网络有限公司 Dance animation processing method and device, electronic equipment and storage medium
US20230162421A1 (en) * 2019-12-31 2023-05-25 Netease (Hangzhou) Network Co.,Ltd. Dance Animation Processing Method and Apparatus, Electronic Device, and Storage Medium
CN113160848A (en) * 2021-05-07 2021-07-23 网易(杭州)网络有限公司 Dance animation generation method, dance animation model training method, dance animation generation device, dance animation model training device, dance animation equipment and storage medium
CN113873254A (en) * 2021-10-26 2021-12-31 西安微电子技术研究所 Method, system and equipment for reducing photoelectric video and storage medium thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998491A (en) * 2022-08-01 2022-09-02 阿里巴巴(中国)有限公司 Digital human driving method, device, equipment and storage medium
CN114998491B (en) * 2022-08-01 2022-11-18 阿里巴巴(中国)有限公司 Digital human driving method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111683209B (en) Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium
Huang et al. Dance revolution: Long-term dance generation with music via curriculum learning
CN110955786B (en) Dance action data generation method and device
Lee et al. Music similarity-based approach to generating dance motion sequence
CN114242070B (en) Video generation method, device, equipment and storage medium
Nieto et al. Audio-based music structure analysis: Current trends, open challenges, and applications
CN110347872B (en) Video cover image extraction method and device, storage medium and electronic equipment
CN111179385B (en) Dance animation processing method and device, electronic equipment and storage medium
US20170154457A1 (en) Systems and methods for speech animation using visemes with phonetic boundary context
JP2002014691A (en) Identifying method of new point in source audio signal
US20220182722A1 (en) System and method for automatic detection of periods of heightened audience interest in broadcast electronic media
JPWO2012137493A1 (en) Image processing apparatus, image processing method, image processing program, and integrated circuit
WO2014096832A1 (en) Audio analysis system and method using audio segment characterisation
CN114820888A (en) Animation generation method and system and computer equipment
Au et al. Choreograph: Music-conditioned automatic dance choreography over a style and tempo consistent dynamic graph
CN114093021A (en) Dance video motion extraction method and device, computer equipment and storage medium
CN111147871A (en) Singing recognition method and device in live broadcast room, server and storage medium
CN114302174A (en) Video editing method and device, computing equipment and storage medium
KR20240128047A (en) Video production method and device, electronic device and readable storage medium
US11748406B2 (en) AI-assisted sound effect editorial
Lin et al. Audio musical dice game: A user-preference-aware medley generating system
CN113313065A (en) Video processing method and device, electronic equipment and readable storage medium
CN117059123A (en) Small-sample digital human voice-driven action replay method based on gesture action graph
CN116543077A (en) Animation control information construction method and device, equipment, medium and product thereof
CN113012723B (en) Multimedia file playing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination