CN113160848B - Dance animation generation method, model training method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113160848B
CN113160848B (application CN202110497533.5A)
Authority
CN
China
Prior art keywords
music
sample
piece
loss function
function value
Prior art date
Legal status
Active
Application number
CN202110497533.5A
Other languages
Chinese (zh)
Other versions
CN113160848A (en)
Inventor
段颖琳
石天阳
袁燚
范长杰
胡志鹏
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to application CN202110497533.5A
Publication of CN113160848A
Application granted
Publication of CN113160848B
Legal status: Active

Classifications

    • G10L 25/57: Speech or voice analysis techniques specially adapted for comparison or discrimination, for processing of video signals
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/08: Neural network learning methods
    • G06T 13/00: Animation
    • G10L 25/03: Speech or voice analysis characterised by the type of extracted parameters
    • G10L 25/18: Extracted parameters being spectral information of each sub-band
    • G10L 25/30: Analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a dance animation generation method, a model training method, a device, equipment and a storage medium, relating to the technical field of video production. The method comprises the following steps: segmenting input music to obtain a plurality of music pieces of the input music; performing feature analysis on each music piece to obtain the music features of the music piece; determining, as target music features, the sample music features of the sample music pieces in a sample data set whose similarity to the music features of the music pieces reaches a preset threshold; determining the sample dance segment corresponding to the sample music piece where each target music feature is located as the target dance segment corresponding to each music piece; and generating a target dance animation according to the target dance segments. Compared with the prior art, the method avoids the long production time and the heavy time and labor costs of manual choreography.

Description

Dance animation generation method, model training method, device, equipment and storage medium
Technical Field
The application relates to the technical field of video production, in particular to a dance animation generation method, a model training method, a device, equipment and a storage medium.
Background
Music dance games are a common game mode: they provide players with a rich set of dance segments, which the players can freely combine and pair with suitable music to obtain a music dance animation. After producing a music dance animation, a player can further upload it to social media for sharing and discussion, which makes this game mode popular.
Music dance animation is an important part of games, both in terms of gameplay and of the animation itself. The existing production flow mainly comprises three parts: first, a music analyst deconstructs the music; second, a professional dancer designs and arranges dance movements according to the music and the deconstruction results; and third, an animator refines the movements and generates the music dance animation from the refined movements and the music.
However, such a production method is complicated, so producing a music dance animation takes a long time and consumes considerable time and labor.
Disclosure of Invention
The application aims to overcome the above defects in the prior art and provide a dance animation generation method, a model training method, a device, equipment and a storage medium, so as to solve the prior-art problems that producing a music dance animation is time-consuming and costly in both time and labor.
In order to achieve the above purpose, the technical scheme adopted by the embodiment of the application is as follows:
In a first aspect, an embodiment of the present application provides a dance animation generation method, where the method includes:
Dividing input music to obtain a plurality of pieces of music of the input music;
performing feature analysis on the music piece to obtain the music feature of the music piece;
Determining, as target music features, the sample music features of the sample music pieces in a sample data set whose similarity to the music features of the music pieces reaches a preset threshold; wherein the sample data set comprises a plurality of pieces of sample music dance data, each comprising a sample music piece and a sample dance segment corresponding to the sample music piece, and the sample music features are obtained by performing feature analysis on the sample music pieces; determining the sample dance segment corresponding to the sample music piece where each target music feature is located as the target dance segment corresponding to each music piece;
and generating target dance animation according to the target dance segment.
Optionally, the dividing the input music to obtain a plurality of pieces of music of the input music includes:
And carrying out music theory analysis on the input music to obtain a plurality of music pieces.
Optionally, the performing feature analysis on the music piece to obtain music features of the music piece includes:
and adopting a preset music analysis model to perform feature analysis on the music piece to obtain the music feature of the music piece.
Optionally, the music analysis model includes: presetting an encoder and a conversion model; the step of analyzing the characteristics of the music piece to obtain the music characteristics of the music piece comprises the following steps:
converting each music piece into a music frequency spectrum signal;
adopting the preset encoder to encode the music frequency spectrum signal to obtain one-dimensional characteristics of each music piece;
converting the one-dimensional features of each music piece by adopting the preset conversion model to obtain the music features of each music piece; the music features of each music piece include: context information of the preceding and following music pieces.
Optionally, the music analysis model further includes: presetting a decoder, the method further comprising:
adopting the preset decoder to decode the music characteristics of each music piece to obtain a decoded reconstruction signal; wherein the reconstructed signal is a restored music spectrum signal;
Calculating a target loss function value from the reconstructed signal and the music spectrum signal of each of the pieces of music;
And optimizing the coding parameters of the preset coder according to the target loss function value until a preset stopping condition is met, so as to obtain the optimized coder.
Optionally, the restored music spectrum signal includes: the music characteristic of the restored music piece, each music spectrum signal comprises: musical characteristics of each of the pieces of music;
the calculating a target loss function value according to the restored music spectrum signal and each music spectrum signal includes:
and calculating the target loss function value according to the music characteristics of the restored music pieces and the music characteristics of each music piece.
Optionally, the restored music spectrum signal includes: the mel spectrum characteristics of the restored musical piece, each of the music spectrum signals including: mel spectrum characteristics of each of the pieces of music;
the calculating a target loss function value according to the restored music spectrum signal and each music spectrum signal includes:
Calculating a first loss function value according to the Mel spectrum characteristics of the restored music pieces and the Mel spectrum characteristics of each music piece;
the objective loss function value is calculated from the first loss function value.
Optionally, the restored music spectrum signal further includes: the melody characteristics of the restored musical piece, each of the music spectrum signals including: melody characteristics of each of the pieces of music;
The calculating a target loss function value according to the restored music spectrum signal and each music spectrum signal, further includes:
calculating a second loss function value according to the melody characteristics of the restored musical pieces and the melody characteristics of each musical piece;
the objective loss function value is calculated from the first loss function value and the second loss function value.
Optionally, the restored music spectrum signal further includes: the beat characteristics of the restored musical piece, and each music frequency spectrum signal comprises: beat characteristics of each of the pieces of music;
The calculating a target loss function value according to the restored music spectrum signal and each music spectrum signal, further includes:
calculating a third loss function value according to the beat characteristics of the restored music pieces and the beat characteristics of each music piece;
the objective loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
Optionally, before determining, as the target music feature, the sample music feature of the sample music piece in which the similarity between the sample data set and the music feature of the music piece reaches the preset threshold, the method further includes:
and carrying out feature analysis on the sample music piece to obtain sample music features corresponding to the sample music piece.
Optionally, before determining, as the target music feature, the sample music feature of the sample music piece in which the similarity between the sample data set and the music feature of the music piece reaches the preset threshold, the method further includes:
Acquiring sample music and sample dance in a single sample music dance animation;
splitting the sample music to obtain a plurality of sample music fragments;
Splitting the sample dance to obtain a plurality of sample dance fragments;
and determining the sample dance segments corresponding to each sample music segment according to the dividing points of each sample music segment and the rhythm points of each sample dance segment.
Optionally, the splitting the sample dance to obtain a plurality of sample dance segments includes:
Determining a plurality of preselected moments from the sample dance according to the motion parameters of the bone key points in the sample dance;
determining, from the plurality of preselected moments and according to the motion parameters, the preselected moments at which the change in the motion parameter exceeds a preset parameter change range as target rhythm points;
and splitting the sample dance according to the target rhythm point to obtain a plurality of sample dance fragments.
Optionally, the determining a plurality of preselected moments from the sample dance according to the motion parameters of the skeletal key points in the sample dance includes:
According to the speed and the acceleration of the skeleton key points in the sample dance, determining the sample dance moment with the acceleration of 0 as a preselected moment;
Correspondingly, the determining, according to the motion parameter, a preselected time point at which the motion parameter changes beyond a preset parameter change range from the plurality of preselected time points is a target rhythm point, including:
calculating the action speed change at each preselected time, and determining the preselected time when the action speed change exceeds the preset parameter change range as a target rhythm point.
Optionally, the determining, as the target music feature, the sample music feature of the sample music piece in which the similarity between the sample data set and the music feature of the music piece reaches the preset threshold includes:
and according to a cosine matching algorithm, determining the sample music feature with the smallest cosine distance from the music feature of the music piece in the sample data set as the target music feature.
Optionally, the generating the target dance animation according to the target dance segment includes:
calculating transition frames of adjacent target dance segments;
and generating the target music dancing animation according to the music fragments, the target dancing fragments corresponding to the music fragments and the transition frames.
In a second aspect, another embodiment of the present application provides a training method of a music analysis model, the music analysis model including: a preset encoder, a preset conversion model and a preset decoder, the method comprising:
Converting the sample music piece into a sample music frequency spectrum signal;
the encoder is adopted to encode the sample music frequency spectrum signals to obtain one-dimensional characteristics of each sample music piece;
Converting the one-dimensional features of each sample music piece by adopting the preset conversion model to obtain the music features of each sample music piece; the music features of each sample music piece include: context information of the preceding and following sample music pieces;
Adopting the preset decoder to decode the music characteristics of each sample music piece to obtain a decoded reconstruction signal; the reconstructed signal is a restored sample music frequency spectrum signal;
Calculating a target loss function value according to the reconstruction signal of each sample music piece and the sample music frequency spectrum signal;
And optimizing the coding parameters of the preset coder according to the target loss function value until a preset stopping condition is met, so as to obtain the optimized coder.
Optionally, the restored sample music spectrum signal includes: the music characteristics of the restored sample music piece, each sample music spectrum signal comprises: musical characteristics of each of the sample musical pieces;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
And calculating the target loss function value according to the music characteristics of the restored sample music pieces and the music characteristics of each sample music piece.
Optionally, the restored music spectrum signal includes: the mel-frequency spectrum characteristics of the restored sample music piece, and each music frequency spectrum signal comprises: mel spectrum characteristics of each of the sample musical pieces;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
Calculating a first loss function value according to the Mel spectrum characteristics of the restored sample music pieces and the Mel spectrum characteristics of each sample music piece;
the objective loss function value is calculated from the first loss function value.
Optionally, the restored music spectrum signal includes: the melody characteristics of the restored sample music piece, each music spectrum signal comprising: melody features of each of the sample pieces of music;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
calculating a second loss function value according to the melody characteristics of the restored sample music pieces and the melody characteristics of each sample music piece;
the objective loss function value is calculated from the first loss function value and the second loss function value.
Optionally, the restored music spectrum signal includes: the beat characteristics of the restored sample music piece, and each music spectrum signal comprises: beat characteristics of each of the sample pieces of music;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
Calculating a third loss function value according to the beat characteristics of the restored sample music pieces and the beat characteristics of each sample music piece;
the objective loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
In a third aspect, another embodiment of the present application provides a dance animation generation apparatus, the apparatus comprising: the device comprises a segmentation module, an analysis module, a determination module and a generation module, wherein:
The segmentation module is used for segmenting the input music to obtain a plurality of music pieces of the input music;
the analysis module is used for carrying out feature analysis on each music piece to obtain the music feature of each music piece;
The determining module is used for determining, as target music features, the sample music features of the sample music pieces in a sample data set whose similarity to the music features of the music pieces reaches a preset threshold; wherein the sample data set comprises a plurality of pieces of sample music dance data, each comprising a sample music piece and a sample dance segment corresponding to the sample music piece, and the sample music features are obtained by performing feature analysis on the sample music pieces; and for determining the sample dance segment corresponding to the sample music piece where each target music feature is located as the target dance segment corresponding to each music piece;
and the generating module is used for generating target dance animation according to the target dance segment.
Optionally, the analysis module is specifically configured to perform music theory analysis on the input music to obtain a plurality of music pieces.
Optionally, the analysis module is specifically configured to perform feature analysis on the music piece by using a preset music analysis model to obtain a music feature of the music piece.
Optionally, the music analysis model includes: presetting an encoder and a conversion model; the apparatus further comprises: a conversion module and a processing module, wherein:
The conversion module is used for converting each music piece into a music frequency spectrum signal;
The processing module is used for carrying out coding processing on the music frequency spectrum signals by adopting the preset coder to obtain one-dimensional characteristics of each music fragment;
the conversion module is specifically configured to convert the one-dimensional features of each music piece by using the preset conversion model to obtain the music features of each music piece; the music features of each music piece include: context information of the preceding and following music pieces.
Optionally, the apparatus further comprises: a calculation module and an optimization module, wherein:
The processing module is specifically configured to decode the music feature of each music segment by using the preset decoder to obtain a decoded reconstructed signal; wherein the reconstructed signal is a restored music spectrum signal;
The calculation module is used for calculating a target loss function value according to the reconstruction signals of the music fragments and the music spectrum signals;
And the optimizing module is used for optimizing the coding parameters of the preset coder according to the target loss function value until the preset stopping condition is met, so as to obtain the optimized coder.
Optionally, the restored music spectrum signal includes: the music characteristic of the restored music piece, each music spectrum signal comprises: musical characteristics of each of the pieces of music;
the calculation module is specifically configured to calculate the objective loss function value according to the music characteristics of the restored music piece and the music characteristics of each music piece.
Optionally, the restored music spectrum signal includes: the mel-frequency spectrum characteristics of the restored musical piece, and each music frequency spectrum signal comprises: mel spectrum characteristics of each of the pieces of music;
The calculation module is specifically configured to calculate a first loss function value according to the mel spectrum characteristics of the restored musical piece and the mel spectrum characteristics of each musical piece; the objective loss function value is calculated from the first loss function value.
Optionally, the restored music spectrum signal further includes: the melody characteristics of the restored musical piece, each of the music spectrum signals including: melody characteristics of each of the pieces of music;
The calculation module is specifically configured to calculate a second loss function value according to the melody characteristics of the restored musical piece and the melody characteristics of each musical piece; the objective loss function value is calculated from the first loss function value and the second loss function value.
Optionally, the restored music spectrum signal further includes: the beat characteristics of the restored musical piece, and each music frequency spectrum signal comprises: beat characteristics of each of the pieces of music;
the calculation module is specifically configured to calculate a third loss function value according to the beat characteristics of the restored musical piece and the beat characteristics of each musical piece; the objective loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
Optionally, the analysis module is specifically configured to perform feature analysis on the sample music piece to obtain a sample music feature corresponding to the sample music piece.
Optionally, the obtaining module is configured to obtain sample music and sample dance in the single sample music dance animation;
the splitting module is specifically configured to split the sample music to obtain a plurality of sample music pieces; splitting the sample dance to obtain a plurality of sample dance fragments;
The determining module is specifically configured to determine a sample dance segment corresponding to each sample music segment according to a division point of each sample music segment and a rhythm point of each sample dance segment.
Optionally, the determining module is specifically configured to determine a plurality of preselected moments from the sample dance according to the motion parameters of the skeletal key points in the sample dance, and to determine, from the plurality of preselected moments and according to the motion parameters, the preselected moments at which the change in the motion parameter exceeds a preset parameter change range as target rhythm points;
the dividing module is specifically configured to split the sample dance according to the target rhythm point to obtain the plurality of sample dance segments.
Optionally, the determining module is specifically configured to determine, according to the speed and the acceleration of the skeletal key point in the sample dance, a sample dance time with an acceleration of 0 as a preselected time; calculating the action speed change at each preselected time, and determining the preselected time when the action speed change exceeds the preset parameter change range as a target rhythm point.
Optionally, the determining module is specifically configured to determine, as the target music feature, a sample music feature with a minimum cosine distance from the music feature of the music piece in the sample data set according to a cosine matching algorithm.
Optionally, the calculating module is specifically configured to calculate a transition frame of the adjacent target dance segment;
the generation module is specifically configured to generate the target music dance animation according to each music segment, the target dance segment corresponding to each music segment, and the transition frame.
In a fourth aspect, another embodiment of the present application provides a training apparatus for a music analysis model, the music analysis model including: a preset encoder, a preset conversion model, and a preset decoder, the apparatus comprising: the device comprises a conversion module, a processing module, a calculation module and an optimization module, wherein:
The conversion module is used for converting the sample music piece into a sample music frequency spectrum signal;
the processing module is used for carrying out coding processing on the sample music frequency spectrum signals by adopting the coder to obtain one-dimensional characteristics of each sample music piece;
The conversion module is specifically configured to convert the one-dimensional features of each sample music piece by using the preset conversion model to obtain the music features of each sample music piece; the music features of each sample music piece include: context information of the preceding and following sample music pieces;
the processing module is specifically configured to decode the music feature of each sample music segment by using the preset decoder to obtain a decoded reconstructed signal; the reconstructed signal is a restored sample music frequency spectrum signal;
The calculation module is used for calculating a target loss function value according to the reconstruction signals of the sample music fragments and the sample music frequency spectrum signals;
And the optimizing module is used for optimizing the coding parameters of the preset coder according to the target loss function value until the preset stopping condition is met, so as to obtain the optimized coder.
Optionally, the restored sample music spectrum signal includes: the music characteristics of the restored sample music piece, each sample music spectrum signal comprises: musical characteristics of each of the sample musical pieces;
The calculation module is specifically configured to calculate the objective loss function value according to the music characteristics of the restored sample music piece and the music characteristics of each sample music piece.
Optionally, the restored music spectrum signal includes: the mel-frequency spectrum characteristics of the restored sample music piece, and each music frequency spectrum signal comprises: mel spectrum characteristics of each of the sample musical pieces;
the calculation module is specifically configured to calculate a first loss function value according to the mel spectrum characteristics of the restored sample music pieces and the mel spectrum characteristics of each sample music piece; the objective loss function value is calculated from the first loss function value.
Optionally, the restored music spectrum signal includes: the melody characteristics of the restored sample music piece, each music spectrum signal comprising: melody features of each of the sample pieces of music;
The calculation module is specifically configured to calculate a second loss function value according to the melody characteristics of the restored sample music piece and the melody characteristics of each sample music piece; the objective loss function value is calculated from the first loss function value and the second loss function value.
Optionally, the restored music spectrum signal includes: the beat characteristics of the restored sample music piece, and each music spectrum signal comprises: beat characteristics of each of the sample pieces of music;
the calculation module is specifically configured to calculate a third loss function value according to beat features of the restored sample music pieces and beat features of each sample music piece; the objective loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
In a fifth aspect, another embodiment of the present application provides an electronic device, including: a processor, a storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over a bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the method as described in any of the first or second aspects above.
In a sixth aspect, another embodiment of the application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of the first or second aspects described above.
The beneficial effects of the application are as follows: with the dance animation generation method provided by the application, input music can be segmented into a plurality of music pieces; feature analysis is performed on each music piece to obtain its music features; the sample music features in a sample data set whose similarity to the music features of each music piece reaches a preset threshold are determined as target music features; the sample dance segment corresponding to the sample music piece where each target music feature is located is determined as the target dance segment corresponding to each music piece; and finally the target dance animation, i.e. the dance animation corresponding to the input music, is generated from the target dance segments. This realizes fast, automatic generation of the target dance animation and avoids the heavy time and labor costs of generating dance animations manually.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a dance animation generation method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a dance animation generation method according to another embodiment of the present application;
FIG. 3 is a flowchart illustrating a dance animation generation method according to another embodiment of the present application;
FIG. 4 is a flow chart of encoder optimization according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a dance animation generation method according to another embodiment of the present application;
FIG. 6 is a flowchart illustrating a dance animation generation method according to another embodiment of the present application;
FIG. 7 is a flowchart illustrating a dance animation generation method according to another embodiment of the present application;
FIG. 8 is a flowchart illustrating a dance animation generation method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a dance animation generation apparatus according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a dance animation generation apparatus according to another embodiment of the present application;
FIG. 11 is a schematic diagram of a dance animation generation apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application.
The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
Additionally, flowcharts used in this disclosure illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
The dance animation generation method provided by the embodiment of the application is explained below by combining a plurality of specific application examples. Fig. 1 is a flow chart of a dance animation generation method according to an embodiment of the application, as shown in fig. 1, the method includes:
s101: the input music is divided to obtain a plurality of pieces of music of the input music.
In one embodiment of the present application, for example, music theory analysis may be performed on input music to obtain a plurality of pieces of music. For example, the input music may be simply analyzed by using an audio analysis algorithm to obtain the beat of the input music, then the obtained beats are integrated, a plurality of music rest points of the input music are found, and the input music is divided according to the plurality of music rest points to obtain a plurality of pieces of the input music.
In some possible embodiments, the boundaries of the music pieces may be determined in three ways. (1) By cadences: a musical phrase typically ends in some form of half cadence or full cadence, i.e. the phrase ends not with a long note but with a relatively long rest. (2) By a long final note: the phrase ends with a note of longer duration; while it is held, the melody stops rising and falling and the melodic line becomes horizontal, giving an audible sense of closure. (3) By melodic or rhythmic repetition: some music has neither long notes nor rests at phrase boundaries, but the melody or rhythm repeats, so individual phrases can still be clearly distinguished; this usually occurs in faster, more dynamic songs.
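As an illustration of the segmentation by beats and rest points described above, the following is a minimal sketch built on the librosa library (which this application later names as one possible music analysis library). The rest-energy ratio, beat-proximity window, and minimum segment length are assumed values, not taken from the application:

```python
import numpy as np
import librosa

def split_music(path, rest_ratio=0.1, min_len=2.0):
    """Split input music at low-energy rest points that fall near beats."""
    y, sr = librosa.load(path)                        # waveform, sample rate
    _, beats = librosa.beat.beat_track(y=y, sr=sr)    # beat frame indices
    beat_times = librosa.frames_to_time(beats, sr=sr)

    # Rest points: frames whose RMS energy drops below a fraction of the peak.
    rms = librosa.feature.rms(y=y)[0]
    frame_times = librosa.frames_to_time(np.arange(len(rms)), sr=sr)
    rest_times = frame_times[rms < rest_ratio * rms.max()]

    # Cut at rest points that lie near a beat and respect a minimum length.
    cuts = [0.0]
    for t in rest_times:
        near_beat = beat_times.size and np.abs(beat_times - t).min() < 0.1
        if near_beat and t - cuts[-1] > min_len:
            cuts.append(float(t))
    cuts.append(len(y) / sr)
    return [y[int(a * sr):int(b * sr)] for a, b in zip(cuts[:-1], cuts[1:])]
```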
S102: and carrying out feature analysis on each music piece to obtain the music feature of each music piece.
In some possible embodiments, for example, a preset music analysis model may be used to perform feature analysis on each music piece, so as to obtain the music feature of each music piece.
The input of the preset music analysis model is a music piece, and the output is the music characteristic of the music piece.
S103: and determining the sample music characteristics of the sample music pieces reaching a preset threshold value as target music characteristics according to the music characteristics of each music piece in the sample data set.
Before S103, sample music features corresponding to the sample music segments may be obtained by performing feature analysis on the sample music segments.
In one embodiment of the present application, the similarity between the sample music features in the sample data set and the music features of each music piece may be calculated with, for example, a cosine matching algorithm, a Euclidean distance algorithm, or a Pearson correlation coefficient algorithm. It should be understood that the foregoing is only illustrative; the specific similarity calculation method may be flexibly adjusted according to the needs of the user and is not limited to the foregoing embodiment.
Taking the cosine matching algorithm as an example of the similarity calculation method: the sample music feature with the smallest cosine distance to the music feature of a music piece in the sample data set is taken as the target music feature, so each music piece has a corresponding target music feature.
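A minimal sketch of this cosine matching step, assuming the music features are fixed-length vectors and the sample features are stacked in a NumPy matrix (all names and shapes here are illustrative):

```python
import numpy as np

def match_feature(query_feat, sample_feats):
    """Return the row index of the sample music feature with the smallest
    cosine distance (largest cosine similarity) to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    s = sample_feats / np.linalg.norm(sample_feats, axis=1, keepdims=True)
    return int(np.argmax(s @ q))   # max similarity == min cosine distance

# Usage: each music piece is then paired with the dance segment stored for
# the matched sample music piece, e.g.
# idx = match_feature(piece_feat, sample_feats)
# target_dance = sample_dances[idx]
```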
Wherein the sample data set comprises a plurality of pieces of sample music dance data, and each piece of sample music dance data comprises the music features of a sample music piece and the sample dance segment corresponding to that sample music piece; the music features of the sample music piece are obtained by performing feature analysis on the sample music piece, and the sample music piece and its sample dance segment correspond in timing and rhythm.
In one embodiment of the present application, the sample music segment and the sample dance segment may be obtained from a single sample music dance animation, for example, the sample music dance animation may be split into a plurality of sample music dance animation segments, and then the music and dance of the plurality of sample music dance animation segments are split, where a correspondence exists between the sample music segment and the sample dance segment obtained by splitting the same sample music dance animation segment.
In the above embodiment, since the music characteristics of the sample music piece and the sample dance piece corresponding to the sample music piece are obtained from the single sample music dance animation, the matching degree between the sample music and the sample dance is higher in such sample music dance data.
S104: and determining the sample dancing segments corresponding to the sample music segments with the target music features as target dancing segments corresponding to the music segments.
Because the sample dance segments corresponding to the sample music pieces are obtained from a single sample music dance animation, the obtained target dance segments match the music pieces more closely and preserve the fineness of the original movements in the sample dance segments, which improves the matching accuracy between the target dance segments and the music pieces.
S105: and generating target dance animation according to the target dance segment.
In one embodiment of the application, the target dance animation can be generated by splicing the music pieces and their corresponding target dance segments according to the correspondence between them. Specifically, for example, each music piece and its corresponding target dance segment form a music dance segment, and the music dance segments are then joined to generate the final complete target music dance animation.
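Elsewhere, the application also mentions calculating transition frames between adjacent target dance segments before joining them, so the pose does not jump. A minimal sketch, assuming poses are arrays of 3D joint positions and that simple linear blending is acceptable (a production system would more likely interpolate joint rotations, e.g. with slerp):

```python
import numpy as np

def transition_frames(last_pose, next_pose, n_frames=8):
    """Blend between the last frame of one target dance segment and the
    first frame of the next; returns n_frames intermediate poses."""
    ts = np.linspace(0.0, 1.0, n_frames + 2)[1:-1]   # interior blend weights
    return np.stack([(1.0 - t) * last_pose + t * next_pose for t in ts])
```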
In one embodiment of the present application, the dance animation generation method is, for example, built for a one-shot game scenario: the method requires no iteration, and even with only one training sample it can still complete a prediction with a certain accuracy, for example achieving accurate matching with only one piece of music-dance capture data; moreover, the dance movements generated by the method provided by the application can be directly applied in the game scene. However, the application scenario of the method is not limited to this: the method can be applied to any scenario in which a corresponding music dance animation needs to be generated from input music, and the application does not limit this.
By adopting the dance animation generation method provided by the application, input music can be segmented into a plurality of music pieces; feature analysis is performed on each music piece to obtain its music features; the sample music features in the sample data set whose similarity to the music features of each music piece reaches a preset threshold are determined as target music features; the sample dance segment corresponding to the sample music piece where each target music feature is located is determined as the target dance segment corresponding to each music piece; and finally the target dance animation, i.e. the dance animation corresponding to the input music, is generated from the target dance segments. This realizes fast, automatic generation of the target dance animation and reduces the heavy time and labor costs of generating dance animations manually.
Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further provide a dance animation generation method, where an implementation procedure of the method is described below by way of example with reference to the accompanying drawings. Fig. 2 is a schematic flow chart of a dance animation generation method according to another embodiment of the present application, where a music analysis model includes: presetting an encoder and a conversion model; as shown in fig. 2, S103 may include:
s106: each musical piece is converted into a music spectrum signal.
By way of example, in some possible embodiments, the audio signal of each music piece may be converted into a music spectrum signal by means of a Mel spectrogram, for example.
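For example, the conversion could be done with librosa as below; the number of Mel bands is an assumed hyper-parameter:

```python
import numpy as np
import librosa

def to_mel(segment, sr, n_mels=80):
    """Convert one music piece (a waveform array) into a log-scaled
    Mel spectrogram, used here as the music spectrum signal."""
    mel = librosa.feature.melspectrogram(y=segment, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)      # (n_mels, n_frames)
```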
S107: and adopting a preset encoder to encode the music frequency spectrum signal to obtain one-dimensional characteristics of each music piece.
In one embodiment of the application, the encoder may be, for example, an encoder based on a convolutional neural network, such as a residual network (ResNet); the ResNet generates a set of one-dimensional features corresponding to each music piece.
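A sketch of such an encoder in PyTorch. This is a plain convolutional stand-in for the ResNet the text names as one option; the channel sizes and feature dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SpectrumEncoder(nn.Module):
    """Map a (1, n_mels, T) music spectrum signal to one 1-D feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # pool away time/frequency
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, spec):                         # spec: (B, 1, n_mels, T)
        return self.fc(self.conv(spec).flatten(1))   # (B, feat_dim)
```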
S108: and converting the one-dimensional characteristics of each music piece by adopting a preset conversion model to obtain the music characteristics of each music piece.
The music features of each music piece include: context information of the preceding and following music pieces.
In some possible embodiments, the preset conversion model may be, for example, a model based on a self-attention mechanism (for example, a Transformer model). In order for the music features to carry context information of the preceding and following music pieces, in embodiments of the present application the encoder first processes the music spectrum signal corresponding to each music piece to obtain the one-dimensional features of each music piece, and the one-dimensional features of all music pieces belonging to the same input music are then input together into the preset conversion model to obtain, for each music piece, music features that carry context information of the preceding and following music pieces.
For example, in one embodiment of the present application, in order to further enhance the expressive power of the music features, the application also employs a masking mechanism that randomly masks the one-dimensional features input into the preset conversion model; the preset conversion model must then predict the one-dimensional features at the masked positions from the features at the other, unmasked positions. The masked regions must be learned by the preset conversion model, while the unmasked regions can be output directly from the input. Because the preset conversion model internally includes a self-attention module, the feature at the current position can be changed according to the context information; when the input is a mask, the model can recognize it and replace it according to the context features. Therefore, if the input music features contain noise or similar corruption, the trained conversion model can directly fill in the noisy parts from the context information instead of outputting noisy music features, so a conversion model trained with the masking mechanism is more robust.
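A sketch of the conversion model with the masking mechanism, using PyTorch's built-in Transformer encoder; the layer count, head count, and 15% masking ratio are assumed values:

```python
import torch
import torch.nn as nn

class ContextModel(nn.Module):
    """Contextualise the per-piece 1-D features of one input music; during
    training, random positions are replaced by a learned mask token that the
    model must reconstruct from the surrounding pieces."""
    def __init__(self, feat_dim=256, n_layers=4, n_heads=4, mask_p=0.15):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        self.mask_p = mask_p

    def forward(self, feats):                  # feats: (B, n_pieces, feat_dim)
        if self.training:
            masked = torch.rand(feats.shape[:2], device=feats.device) < self.mask_p
            feats = torch.where(masked.unsqueeze(-1), self.mask_token, feats)
        return self.encoder(feats)             # context-aware music features
```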
Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further provide a dance animation generation method, where an implementation procedure of the method is described below by way of example with reference to the accompanying drawings. Fig. 3 is a schematic flow chart of a dance animation generation method according to another embodiment of the present application, and fig. 4 is a schematic flow chart of encoder optimization according to an embodiment of the present application, where the music analysis model further includes: the decoder is preset, as shown in fig. 3, the method may further include:
S109: and adopting a preset decoder to decode the music characteristics of each music piece to obtain a decoded reconstructed signal.
Illustratively, in some possible embodiments, to obtain an essential representation of the music, the application introduces a first decoder that restores the features of each music piece, obtaining a reconstructed signal, where the reconstructed signal is a restored music spectrum signal. In one embodiment of the present application, the first decoder may have, for example, 8 transposed 2D convolution layers.
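A sketch of a first decoder with 8 transposed 2D convolution layers, as the text describes; the channel sizes and output resolution are illustrative assumptions:

```python
import torch.nn as nn

class SpectrumDecoder(nn.Module):
    """Upsample a feature vector back to a spectrogram-shaped reconstruction
    through 8 transposed 2D convolutions (each one doubles the resolution)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        chans = [feat_dim, 256, 128, 128, 64, 64, 32, 16, 1]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.ReLU()]
        self.net = nn.Sequential(*layers[:-1])       # no ReLU on the output

    def forward(self, feat):                         # feat: (B, feat_dim)
        return self.net(feat.view(feat.size(0), -1, 1, 1))  # (B, 1, 256, 256)
```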
S110: the objective loss function value is calculated from the reconstructed signal and the music spectrum signal of each music piece.
In one embodiment of the present application, the restored music spectrum signal includes: the mel-frequency spectrum characteristics of the restored music piece, each music frequency spectrum signal comprises: mel spectrum characteristics of each musical piece; the calculation mode of the target loss function at this time may be: calculating a first loss function value according to the Mel spectrum characteristics of the restored music pieces and the Mel spectrum characteristics of each music piece; the objective loss function value is calculated from the first loss function value.
The first loss function value may be calculated, for example, as: Lspe(E, G) = ||G(E(x)) - x||_1, where E is the preset encoder, G is the preset decoder, and x is the Mel spectrum feature; the first loss function value obtained in this case is the target loss function value.
In another embodiment of the present application, the restored music spectrum signal further includes: the melody characteristics of the restored musical piece, each music spectrum signal including: melody characteristics of each musical piece; the calculation mode of the target loss function at this time may be: calculating a second loss function value according to the melody characteristics of the restored music pieces and the melody characteristics of each music piece; the objective loss function value is calculated from the first loss function value and the second loss function value.
In some possible embodiments, the melody features of the restored music piece may be restored by a second decoder, for example a decoder with 5 transposed convolution layers; the second loss function value may then be calculated according to the following formula: Lmld(E, G2) = ||G2(E(x)) - Melody(x)||_1, where E is the preset encoder, G2 is the decoder with 5 transposed convolution layers, and Melody(x) is the melody feature of x.
In another embodiment of the present application, the restored music spectrum signal further includes: the beat characteristics of the restored music piece, each music spectrum signal comprises: beat characteristics of each musical piece; the calculation mode of the target loss function at this time may be: calculating a third loss function value according to the beat characteristics of the restored music pieces and the beat characteristics of each music piece; the target loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
In some possible embodiments, the beat features of the restored music piece may be restored by a third decoder; for example, the third decoder may be a rhythm decoder, similar in structure to the second decoder but producing a binary output. The third loss function value may then be calculated according to the following formula: Lrym(E, G3) = BCELoss(G3(E(x)), Rhythm(x)), where E is the preset encoder, G3 is the rhythm decoder, Rhythm(x) is the beat feature of x, and BCELoss is the binary cross-entropy loss function.
In one embodiment of the present application, the main melody and rhythm of each music piece can be obtained using a common music analysis library, for example the audio signal processing library librosa; the choice of the specific music analysis library can be flexibly adjusted according to the needs of the user and is not limited to the above embodiment.
It should be understood that the above embodiments are merely exemplary; whether the target loss function value is calculated from one, two, or three of the first, second, and third loss function values may be flexibly adjusted according to the needs of the user, and is not limited to the above embodiments.
When the target loss function value is calculated from at least two loss function values, the loss function values may, for example, be accumulated with preset weights, and the weighted sum is taken as the target loss function value; the specific calculation of the target loss function value may be flexibly adjusted according to the needs of the user and is not limited by the above embodiment.
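Combining the three loss function values with preset weights might look like the following; the weight values are assumed hyper-parameters, and the beat branch expects probabilities in [0, 1] as in the BCELoss formula above:

```python
import torch.nn.functional as F

def target_loss(spec_hat, spec, melody_hat, melody, beat_hat, beat,
                w_spe=1.0, w_mld=1.0, w_rym=1.0):
    """Weighted sum of the first (Mel spectrum), second (melody), and third
    (beat) loss function values."""
    l_spe = F.l1_loss(spec_hat, spec)                # Lspe = ||G(E(x)) - x||_1
    l_mld = F.l1_loss(melody_hat, melody)            # Lmld, melody L1 loss
    l_rym = F.binary_cross_entropy(beat_hat, beat)   # Lrym, binary beat loss
    return w_spe * l_spe + w_mld * l_mld + w_rym * l_rym
```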
S111: and optimizing the coding parameters of the preset coder according to the objective loss function value until the preset stopping condition is met, so as to obtain the optimized coder.
As shown in fig. 4, the optimization flow of the encoder may be as follows: the music spectrum signal of each sample music piece is encoded by the encoder to obtain the one-dimensional features of each sample music piece; the one-dimensional features are input into the preset conversion model, which converts them into the music features of each sample music piece; each music feature is then decoded by the decoder to obtain a corresponding reconstructed signal; a target loss function value is calculated from the reconstructed signals and the music spectrum signals, and the encoder is optimized according to the target loss function value until a preset stop condition is met.
Optimizing the encoding parameters of the preset encoder according to the target loss function value makes the one-dimensional features obtained by the encoder represent the corresponding music pieces more faithfully, and keeps the characteristics of a music piece restored from its one-dimensional features consistent with the original characteristics of that piece, thereby improving the accuracy of the music spectrum signals obtained subsequently.
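The optimization flow of fig. 4 can be condensed into a short training loop; in the sketch below, encoder, transformer, decoder, and dataloader are placeholders for the preset encoder, the preset conversion model, the preset decoder, and a source of sample music spectrum signals, and the fixed epoch budget is an assumed stand-in for the preset stop condition.

```python
import torch

def optimize_encoder(encoder, transformer, decoder, dataloader, epochs=10, lr=1e-4):
    params = (list(encoder.parameters()) + list(transformer.parameters())
              + list(decoder.parameters()))
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):                        # preset stop condition (assumed)
        for spectrum in dataloader:                # music spectrum signals of sample pieces
            features_1d = encoder(spectrum)        # one-dimensional features per piece
            music_feats = transformer(features_1d) # features with context information
            reconstructed = decoder(music_feats)   # restored music spectrum signal
            loss = torch.mean(torch.abs(reconstructed - spectrum))  # first loss (L1)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```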
Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further provide a dance animation generation method, where an implementation procedure of the method is described below by way of example with reference to the accompanying drawings. Fig. 5 is a flowchart of a dance animation generation method according to another embodiment of the present application, where, as shown in fig. 5, before S103, the method may further include:
S112: sample music and sample dance in a single sample music dance animation are acquired.
The sample music and the sample dance are separated from the single sample music dance animation, so that both correspond to the same single sample music dance animation.
S113: splitting the sample music to obtain a plurality of sample music fragments.
In an embodiment of the present application, the sample music may be split in the same manner as the input music, which is not described herein again.
S114: splitting the sample dance to obtain a plurality of sample dance fragments.
In one embodiment of the present application, the sample dance may be split according to the type of the sample music dance animation. Taking a waltz-style dance as an example: if the sample music dance animation is of the waltz style, each measure contains three beats, the first beat is a downbeat, and the three beats together form one rise-and-fall cycle, so the sample dance may be split by measure. The specific method for splitting the sample dance may be flexibly adjusted according to the needs of the user and is not limited to the above embodiment.
S115: and determining the sample dance segments corresponding to the sample music segments according to the dividing points of the sample music segments and the rhythm points of the sample dance segments.
Each sample dance segment has the same duration as the sample music piece to which it corresponds.
Because each sample music piece has a corresponding sample dance segment, and both are split and paired from a single sample music dance animation, the resulting sample music dance data exhibits a high degree of coordination and fit between the sample music pieces and the sample dance segments.
Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further provide a dance animation generation method, where an implementation procedure of the method is described below by way of example with reference to the accompanying drawings. Fig. 6 is a flowchart of a dance animation generation method according to another embodiment of the present application, where, as shown in fig. 6, S114 may include:
S116: and determining a plurality of preselected moments from the sample dance according to the motion parameters of the bone key points in the sample dance.
In one embodiment of the application, the motion parameters may include, for example, acceleration, speed, and the like; for instance, the moments at which the acceleration of the skeletal key points in the sample dance equals 0 may be determined as the preselected moments.
Correspondingly, S115 may include:
S117: And determining, from the plurality of preselected moments and according to the motion parameter, the preselected moments at which the change of the motion parameter exceeds a preset parameter change range as target rhythm points.
For example, a preselected moment at which the motion change speed exceeds the preset motion change speed range may be selected as a target rhythm point according to the motion change speed between the frames before and after that preselected moment; that is, a preselected moment at which the motion change speed is greater than or equal to a maximum motion change speed threshold, or less than or equal to a minimum motion change speed threshold, is selected as a target rhythm point.
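Steps S116 and S117 may be sketched as follows; the finite-difference scheme and the threshold handling are assumptions made for illustration.

```python
import numpy as np

def find_rhythm_points(positions, fps, change_min, change_max):
    """positions: (frames, joints, 3) array of skeletal key-point coordinates."""
    velocity = np.gradient(positions, axis=0) * fps
    speed = np.linalg.norm(velocity, axis=-1).mean(axis=-1)  # mean joint speed per frame
    accel = np.gradient(speed)
    # S116: preselected moments are frames where the acceleration crosses 0
    preselected = np.where(np.diff(np.sign(accel)) != 0)[0]
    # S117: keep preselected moments whose speed change between neighbouring
    # frames falls outside the preset parameter change range (change_min, change_max)
    def change(t):
        return abs(speed[min(t + 1, len(speed) - 1)] - speed[max(t - 1, 0)])
    return [t for t in preselected if not (change_min < change(t) < change_max)]
```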
In some possible embodiments, since the rhythm points of the sample dance are denser than the dividing points of the music pieces, the present application further splits the sample dance in combination with the dividing points of the corresponding music pieces. When a dividing point of a music piece coincides with a target rhythm point of the sample dance, the sample dance is split directly at that target rhythm point; when a dividing point of a music piece does not coincide with any target rhythm point, the target rhythm point closest to that dividing point is determined as a target dividing point, and the sample dance is cut according to the target dividing point.
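The alignment rule just described amounts to snapping each dividing point to its nearest target rhythm point; a minimal sketch, with illustrative names:

```python
def align_cut_points(dividing_points, rhythm_points):
    # A dividing point that coincides with a rhythm point is kept as-is;
    # otherwise the closest target rhythm point becomes the target dividing point.
    return [min(rhythm_points, key=lambda r: abs(r - d)) for d in dividing_points]
```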
Determining the cutting points of the sample dance with the help of the dividing points of the music pieces in this way further improves the accuracy of the sample dance cutting points, so that the cut sample music pieces and sample dance segments match each other better.
Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further provide a dance animation generation method, where an implementation procedure of the method is described below by way of example with reference to the accompanying drawings. Fig. 7 is a flowchart of a dance animation generation method according to another embodiment of the present application, where, as shown in fig. 7, S105 may include:
S118: and calculating transition frames of adjacent target dance segments.
In one embodiment of the present application, since the end of one target dance segment does not necessarily connect smoothly to the start of the next, the present application calculates the transition frames between adjacent target dance segments by using a blending algorithm, which may be, for example, a linear interpolation algorithm or a deep learning-based method; the present application is not limited in this regard.
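A minimal sketch of the linear variant of such a blending algorithm, assuming poses are given as numeric arrays (for joint rotations, a spherical interpolation would usually be substituted); the number of transition frames is an assumed parameter.

```python
import numpy as np

def linear_transition(end_pose, start_pose, num_frames=8):
    """Blend from the last pose of one dance segment to the first pose of the
    next, producing num_frames intermediate transition frames."""
    weights = np.linspace(0.0, 1.0, num_frames + 2)[1:-1]  # exclude both endpoints
    return [(1.0 - w) * end_pose + w * start_pose for w in weights]
```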
S109: And generating the target music dance animation according to each music piece, the target dance segment corresponding to each music piece, and the transition frames.
Transitioning between adjacent target dance segments through transition frames makes the obtained target music dance animation smoother, thereby further improving the user experience.
By adopting the dance animation generation method provided by the application, a high-quality music dance animation can be generated from the input music. The generated music dance animation not only preserves the fineness of the dance actions in the original sample music dance animations, but also reflects a contextual understanding of the input music, so that a more coherent music dance animation conforming to the style of the input music is generated, improving the user experience when generating music dance animations.
Fig. 8 is a flowchart of a training method of a music analysis model according to an embodiment of the present application, where the music analysis model includes: a preset encoder, a preset conversion model, and a preset decoder, as shown in fig. 8, the method includes:
S201: Converting the sample music piece into a sample music spectrum signal.
The sample music pieces may be split from sample music dance animations, or obtained by crawling from a network; the specific manner of obtaining the sample music pieces may be flexibly adjusted according to the needs of the user and is not limited to the above embodiments.
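Converting a sample music piece into a music spectrum signal can be illustrated with librosa's mel spectrogram; the parameters below are common defaults rather than values from the present application.

```python
import librosa
import numpy as np

def to_mel_spectrum(path, sr=22050, n_mels=128):
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    # Log-compress, as is common before feeding a spectrogram to an encoder
    return librosa.power_to_db(mel, ref=np.max)
```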
S202: And adopting the encoder to encode the sample music spectrum signals to obtain the one-dimensional features of each sample music piece.
In one embodiment of the application, the encoder may be, for example, a convolutional neural network-based encoder such as a residual network (ResNet), which generates a set of one-dimensional features corresponding to each music piece.
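One way to realize such a ResNet-based encoder is to let the backbone pool its final feature map into a single vector per piece; the sketch below adapts torchvision's resnet18 to one-channel spectrogram input as an assumed concrete choice, not the architecture of the present application.

```python
import torch.nn as nn
from torchvision.models import resnet18

class SpectrumEncoder(nn.Module):
    """Encodes a (batch, 1, n_mels, frames) spectrogram into one 1-D feature per piece."""
    def __init__(self, feature_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)
        # Adapt the stem to single-channel spectrogram input
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, feature_dim)
        self.backbone = backbone

    def forward(self, spec):
        return self.backbone(spec)  # (batch, feature_dim)
```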
S203: and converting the one-dimensional characteristics of each sample music piece by adopting a preset conversion model to obtain the music characteristics of each sample music piece.
In some possible embodiments, the preset conversion model may be, for example, a model based on a self-attention mechanism (for example, a Transformer model). In order to give the music features the context information of the front and rear music pieces, in embodiments of the present application the encoder first processes the music spectrum signal corresponding to each music piece to obtain the one-dimensional features of each piece, and the one-dimensional features of all pieces belonging to the same input music are then input together into the preset conversion model to obtain, for each music piece, a music feature that carries the context information of the front and rear music pieces.
The music characteristics of each sample music piece include: context information of the front and rear sample pieces of music.
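The preset conversion model can be sketched with PyTorch's built-in Transformer encoder: the input sequence is the one-dimensional features of all pieces of one sample music in order, so the self-attention output at each position carries context from the surrounding pieces. The layer count and dimensions below are assumptions for illustration.

```python
import torch.nn as nn

# Sketch of the preset conversion model: a Transformer encoder over the
# sequence of per-piece 1-D features, shaped (batch, num_pieces, 512).
conversion_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=4)

# music_features = conversion_model(features_1d)  # context-aware music features
```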
For example, in one embodiment of the present application, in order to further enhance the expressive power of the music features, the present application further employs a masking mechanism to randomly mask the one-dimensional features input to the preset conversion model, and the preset conversion model must predict the one-dimensional features at the masked positions from the features at the other, unmasked positions. The masked regions have to be learned by the preset conversion model, while the unmasked regions can be output directly from the input. Since the preset conversion model internally contains a self-attention module, the feature at the current position can be adjusted according to the context information; when the input is a mask, the preset conversion model can recognize the mask and replace it according to the context features. Therefore, if the input music features contain noise or similar defects, the trained preset conversion model can complete the noisy parts directly from the context information instead of outputting the noisy music features as they are, so that a preset conversion model trained with the mask mechanism is more robust.
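Such a masking mechanism can be sketched as BERT-style random masking of whole piece features; the mask ratio and the learned mask embedding are illustrative choices, not values from the present application.

```python
import torch

def mask_features(features, mask_token, mask_ratio=0.15):
    """Randomly replace a fraction of piece features with a learned mask token.
    features: (batch, num_pieces, dim); mask_token: learnable (dim,) parameter."""
    masked = features.clone()
    mask = torch.rand(features.shape[:2], device=features.device) < mask_ratio
    masked[mask] = mask_token  # the model must reconstruct these positions
    return masked, mask
```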
S204: and adopting a preset decoder to decode the music characteristics of each sample music piece to obtain a decoded reconstructed signal.
The reconstructed signal is a restored sample music frequency spectrum signal.
In one embodiment of the present application, the restored sample music spectrum signal includes: the music characteristics of the restored sample music piece, each sample music spectrum signal comprises: musical characteristics of each sample musical piece; the way to calculate the objective loss function value at this time may be: and calculating the objective loss function value according to the music characteristics of the restored sample music pieces and the music characteristics of each sample music piece.
Illustratively, in some possible embodiments, the restored music spectrum signal comprises: the mel-frequency spectrum characteristics of the restored sample music piece, and each music frequency spectrum signal comprises: mel spectrum characteristics of each sample piece of music; the objective loss function value may be calculated by: calculating a first loss function value according to the Mel spectrum characteristics of the restored sample music pieces and the Mel spectrum characteristics of each sample music piece; the objective loss function value is calculated from the first loss function value.
Illustratively, in other possible embodiments, the restored musical spectrum signal comprises: the melody characteristics of the restored sample music piece, each music spectrum signal includes: melody features of each sample musical piece; the objective loss function value may be calculated by: calculating a second loss function value according to the melody characteristics of the restored sample music pieces and the melody characteristics of each sample music piece; the objective loss function value is calculated from the first loss function value and the second loss function value.
Illustratively, in other possible embodiments, the restored musical spectrum signal comprises: the beat characteristics of the restored sample music piece, each music spectrum signal comprises: beat characteristics of each sample musical piece; the objective loss function value may be calculated by: calculating a third loss function value according to the beat characteristics of the restored sample music pieces and the beat characteristics of each sample music piece; the target loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
S205: and calculating a target loss function value according to the reconstructed signal of each sample music piece and the sample music frequency spectrum signal.
S206: And optimizing the encoding parameters of the preset encoder according to the target loss function value until the preset stopping condition is met, so as to obtain the optimized encoder.
Optimizing the encoding parameters of the preset encoder according to the target loss function value makes the one-dimensional features obtained by the encoder represent the corresponding music pieces more faithfully, and keeps the characteristics of a music piece restored from its one-dimensional features consistent with the original characteristics of that piece, thereby improving the accuracy of the music spectrum signals obtained subsequently.
The dance animation generation apparatus provided by the application is explained below with reference to the accompanying drawings. The apparatus can execute any one of the dance animation generation methods of fig. 1-7; for its specific implementation and beneficial effects, refer to the above description, which is not repeated below.
FIG. 9 is a schematic structural diagram of a dance animation generation apparatus according to an embodiment of the present application, as shown in FIG. 9, the apparatus includes: a segmentation module 301, an analysis module 302, a determination module 303, and a generation module 304, wherein:
The dividing module 301 is configured to divide the input music to obtain a plurality of pieces of music of the input music.
The analysis module 302 is configured to perform feature analysis on each music piece to obtain a music feature of each music piece.
A determining module 303, configured to determine, as a target music feature, a sample music feature in the sample data set whose similarity with the music feature of each music piece reaches a preset threshold; wherein the sample data set comprises: a plurality of sample music dance data, each sample music dance data comprising the music features of a sample music piece and the sample dance segment corresponding to that sample music piece, the sample music features being obtained by performing feature analysis on the sample music pieces; and to determine the sample dance segment corresponding to the sample music piece in which the target music feature is located as the target dance segment corresponding to each music piece.
The generating module 304 is configured to generate a target dance animation according to the target dance segment.
Optionally, the analysis module 302 is specifically configured to perform music theory analysis on the input music to obtain a plurality of music pieces.
Optionally, the analysis module 302 is specifically configured to perform feature analysis on the music piece by using a preset music analysis model to obtain a music feature of the music piece.
Fig. 10 is a schematic structural diagram of a dance animation generation apparatus according to an embodiment of the present application, where a music analysis model includes: presetting an encoder and a conversion model; as shown in fig. 10, the apparatus further includes: a conversion module 305 and a processing module 306, wherein:
The conversion module 305 is configured to convert each music piece into a music spectrum signal.
And the processing module 306 is used for encoding the music spectrum signal by adopting an encoder to obtain the one-dimensional characteristics of each music piece.
The conversion module 305 is specifically configured to convert one-dimensional features of each music piece by using a preset conversion model to obtain music features of each music piece, where the music features of each music piece include: context information of the front and rear pieces of music.
As shown in fig. 10, the apparatus further includes: a calculation module 307 and an optimization module 308, wherein:
The processing module 306 is specifically configured to decode the music feature of each music segment by using a preset decoder to obtain a decoded reconstructed signal; wherein the reconstructed signal is a restored music spectrum signal.
The calculation module 307 is specifically configured to calculate a target loss function value according to the reconstructed signal and the music spectrum signal of each music piece.
And an optimizing module 308, configured to optimize the encoding parameters of the preset encoder according to the objective loss function value until a preset stopping condition is satisfied, thereby obtaining an optimized encoder.
Optionally, the restored music spectrum signal includes: the music characteristics of the restored musical piece, each music spectrum signal comprising: musical characteristics of each musical piece;
The calculating module 307 is specifically configured to calculate the objective loss function value according to the music characteristic of the restored music piece and the music characteristic of each music piece.
Optionally, the restored music spectrum signal includes: the mel-frequency spectrum characteristics of the restored music piece, each music frequency spectrum signal comprises: mel-frequency spectrum characteristics of each musical piece.
A calculating module 307, specifically configured to calculate a first loss function value according to the mel spectrum feature of the restored music piece and the mel spectrum feature of each music piece; the objective loss function value is calculated from the first loss function value.
Optionally, the restored music spectrum signal further comprises: the melody characteristics of the restored musical piece, each music spectrum signal including: melody characteristics of each musical piece.
A calculating module 307, configured to calculate a second loss function value according to the melody characteristics of the restored musical piece and the melody characteristics of each musical piece; the objective loss function value is calculated from the first loss function value and the second loss function value.
Optionally, the restored music spectrum signal further comprises: the beat characteristics of the restored music piece, each music spectrum signal comprises: beat characteristics of each musical piece.
A calculating module 307, specifically configured to calculate a third loss function value according to the beat characteristics of the restored music pieces and the beat characteristics of each music piece; the target loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
As shown in fig. 10, the apparatus further includes: the obtaining module 309 is configured to obtain sample music and sample dance in the single sample music dance animation.
The splitting module 301 is specifically configured to split the sample music to obtain a plurality of sample music pieces; splitting the sample dance to obtain a plurality of sample dance fragments.
The determining module 303 is specifically configured to determine a sample dance segment corresponding to each sample music segment according to the division point of each sample music segment and the rhythm point of each sample dance segment.
Optionally, the determining module 303 is specifically configured to determine a plurality of preselected moments from the sample dance according to the motion parameters of the skeletal key points in the sample dance; and determining a preselected time point, at which the motion parameter changes beyond a preset parameter change range, from a plurality of preselected time points as a target rhythm point according to the motion parameter.
The segmentation module 301 is specifically configured to split the sample dance according to the target rhythm point to obtain a plurality of sample dance segments.
Optionally, the determining module 303 is specifically configured to determine, according to the speed and the acceleration of the bone key point in the sample dance, a sample dance time with an acceleration of 0 as a preselected time; and calculating the action speed change at each preselected time, and determining the preselected time when the action speed change exceeds the preset parameter change range as a target rhythm point.
Optionally, the determining module 303 is specifically configured to determine, as the target music feature, a sample music feature with a minimum cosine distance from the music feature of the music piece in the sample data set according to a cosine matching algorithm.
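The cosine matching can be sketched as a nearest-neighbour search over the sample music features; variable names are illustrative.

```python
import numpy as np

def match_target_feature(query, sample_features):
    """Index of the sample music feature closest to `query` in cosine distance."""
    q = query / np.linalg.norm(query)
    s = sample_features / np.linalg.norm(sample_features, axis=1, keepdims=True)
    return int(np.argmin(1.0 - s @ q))
```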
Optionally, the calculating module 307 is specifically configured to calculate a transition frame of the adjacent target dance segment;
the generating module 304 is specifically configured to generate a target music dance animation according to each music clip, a target dance clip corresponding to each music clip, and a transition frame.
The training apparatus for a music analysis model provided by the present application is explained below with reference to the accompanying drawings. The apparatus can execute the training method for a music analysis model of fig. 8; for its specific implementation and beneficial effects, refer to the above description, which is not repeated below.
FIG. 11 is a schematic structural diagram of a training apparatus for a music analysis model according to an embodiment of the present application, where the music analysis model includes: a preset encoder, a preset conversion model, and a preset decoder. As shown in fig. 11, the apparatus includes: a conversion module 401, a processing module 402, a calculation module 403, and an optimization module 404, wherein:
A conversion module 401, configured to convert a sample music piece into a sample music spectrum signal;
a processing module 402, configured to perform encoding processing on the sample music spectrum signal by using an encoder, so as to obtain one-dimensional features of each sample music piece;
the conversion module 401 is specifically configured to convert one-dimensional features of each sample musical piece by using a preset conversion model to obtain musical features of each sample musical piece; the music characteristics of each sample music piece include: context information of the front and rear sample music pieces;
the processing module 402 is specifically configured to decode the music feature of each sample music piece by using a preset decoder to obtain a decoded reconstructed signal; the reconstructed signal is a restored sample music frequency spectrum signal;
a calculation module 403, configured to calculate a target loss function value according to the reconstructed signal of each sample music piece and the sample music spectrum signal;
And an optimizing module 404, configured to optimize the encoding parameters of the preset encoder according to the objective loss function value until a preset stopping condition is satisfied, thereby obtaining an optimized encoder.
Optionally, the restored sample music spectrum signal includes: the music characteristics of the restored sample music piece, each sample music spectrum signal comprises: musical characteristics of each sample musical piece;
the calculating module 403 is specifically configured to calculate the objective loss function value according to the music feature of the restored sample music piece and the music feature of each sample music piece.
Optionally, the restored music spectrum signal includes: the mel-frequency spectrum characteristics of the restored sample music piece, and each music frequency spectrum signal comprises: mel spectrum characteristics of each sample piece of music;
a calculation module 403, configured to calculate a first loss function value according to the mel spectrum feature of the restored sample music piece and the mel spectrum feature of each sample music piece; the objective loss function value is calculated from the first loss function value.
Optionally, the restored music spectrum signal includes: the melody characteristics of the restored sample music piece, each music spectrum signal includes: melody features of each sample musical piece;
a calculating module 403, configured to calculate a second loss function value according to the melody feature of the restored sample music piece and the melody feature of each sample music piece; the objective loss function value is calculated from the first loss function value and the second loss function value.
Optionally, the restored music spectrum signal includes: the beat characteristics of the restored sample music piece, each music spectrum signal comprises: beat characteristics of each sample musical piece;
A calculating module 403, configured to calculate a third loss function value according to the beat characteristics of the restored sample music piece and the beat characteristics of each sample music piece; the target loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
The foregoing apparatus is used for executing the method provided in the foregoing embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA), etc. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device may be integrated in a terminal device or a chip of the terminal device.
The electronic device includes: a processor 501, a storage medium 502, and a bus 503.
The storage medium 502 is configured to store a program, and the processor 501 invokes the program stored in the storage medium 502 to execute the method embodiments corresponding to fig. 1 to 8. The specific implementation manner and technical effects are similar and are not repeated here.
Optionally, the present application also provides a program product, such as a storage medium, on which a computer program is stored; when the program is executed by a processor, it performs the corresponding embodiments of the above-mentioned method.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (19)

1. A method for generating dance animation, the method comprising:
Dividing input music to obtain a plurality of pieces of music of the input music;
performing feature analysis on the music piece to obtain the music feature of the music piece;
Determining a sample music feature of a sample music piece, whose similarity with the music feature of the music piece in the sample data set reaches a preset threshold, as a target music feature; the sample data set comprises a plurality of sample music dance data, each sample music dance data comprises a sample music piece and a sample dance piece corresponding to the sample music piece, and the sample music features are obtained by performing feature analysis on the sample music pieces;
determining a sample dancing segment corresponding to the sample music segment where the target music feature is located as a target dancing segment corresponding to each music segment;
Generating target dance animation according to the target dance segment;
before determining the sample music feature of the sample music piece, of which the similarity with the music feature of the music piece in the sample data set reaches the preset threshold, as the target music feature, the method further comprises:
Acquiring sample music and sample dance in a single sample music dance animation;
splitting the sample music to obtain a plurality of sample music fragments;
According to the speed and the acceleration of the skeleton key points in the sample dance, determining the sample dance moment with the acceleration of 0 as a preselected moment;
Calculating the action speed change of each preselected time, and determining the preselected time when the action speed change exceeds a preset parameter change range as a target rhythm point;
splitting the sample dance according to the target rhythm point to obtain a plurality of sample dance fragments;
determining sample dancing fragments corresponding to each sample music fragment according to the dividing points of each sample music fragment and the rhythm points of each sample dancing fragment;
The step of analyzing the characteristics of the music piece to obtain the music characteristics of the music piece comprises the following steps:
converting each music piece into a music frequency spectrum signal;
Adopting a preset encoder to encode the music frequency spectrum signal to obtain one-dimensional characteristics of each music piece;
Converting the one-dimensional characteristics of each music piece by adopting a preset conversion model to obtain the music characteristics of each music piece; the music characteristics of each music piece include: context information of the front and rear pieces of music.
2. The method of claim 1, wherein the dividing the input music to obtain a plurality of pieces of music of the input music comprises:
And carrying out music theory analysis on the input music to obtain a plurality of music pieces.
3. The method of claim 1, wherein the method further comprises:
adopting a preset decoder to decode the music characteristics of each music piece to obtain a decoded reconstruction signal; wherein the reconstructed signal is a restored music spectrum signal;
Calculating a target loss function value from the reconstructed signal and the music spectrum signal of each of the pieces of music;
And optimizing the coding parameters of the preset coder according to the target loss function value until a preset stopping condition is met, so as to obtain the optimized coder.
4. A method as claimed in claim 3, wherein the restored musical spectrum signal comprises: the music characteristic of the restored music piece, each music spectrum signal comprises: musical characteristics of each of the pieces of music;
Said calculating a target loss function value from said reconstructed signal and said music spectrum signal for each of said pieces of music, comprising:
and calculating the target loss function value according to the music characteristics of the restored music pieces and the music characteristics of each music piece.
5. A method as claimed in claim 3, wherein the restored musical spectrum signal comprises: the mel spectrum characteristics of the restored musical piece, each of the music spectrum signals including: mel spectrum characteristics of each of the pieces of music;
Said calculating a target loss function value from said reconstructed signal and said music spectrum signal for each of said pieces of music, comprising:
Calculating a first loss function value according to the Mel spectrum characteristics of the restored music pieces and the Mel spectrum characteristics of each music piece;
the objective loss function value is calculated from the first loss function value.
6. The method of claim 5, wherein the restored music spectrum signal further comprises: the melody characteristics of the restored musical piece, each of the music spectrum signals including: melody characteristics of each of the pieces of music;
The calculating a target loss function value according to the reconstructed signal of each music piece and the music spectrum signal, further comprises:
calculating a second loss function value according to the melody characteristics of the restored musical pieces and the melody characteristics of each musical piece;
the objective loss function value is calculated from the first loss function value and the second loss function value.
7. The method of claim 6, wherein the restored music spectrum signal further comprises: the beat characteristics of the restored musical piece, and each music frequency spectrum signal comprises: beat characteristics of each of the pieces of music;
The calculating a target loss function value according to the reconstructed signal of each music piece and the music spectrum signal, further comprises:
calculating a third loss function value according to the beat characteristics of the restored music pieces and the beat characteristics of each music piece;
the objective loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
8. The method of claim 1, wherein before determining as the target musical feature the sample musical feature of the sample musical piece for which the similarity to the musical feature of the musical piece in the sample data set reaches a preset threshold, the method further comprises:
and carrying out feature analysis on the sample music piece to obtain sample music features corresponding to the sample music piece.
9. The method of claim 1, wherein determining as the target music feature the sample music feature of the sample music piece for which the similarity to the music feature of the music piece in the sample data set reaches a preset threshold value, comprises:
and according to a cosine matching algorithm, determining the sample music feature with the smallest cosine distance from the music feature of the music piece in the sample data set as the target music feature.
10. The method of any one of claims 1-9, wherein generating a target dance animation from the target dance segment comprises:
calculating transition frames of adjacent target dance segments;
And generating the target dance animation according to the music fragments, the target dance fragments corresponding to the music fragments and the transition frames.
11. A method of training a music analysis model, the music analysis model comprising: a preset encoder, a preset conversion model and a preset decoder, the method comprising:
Converting the sample music piece into a sample music frequency spectrum signal;
the encoder is adopted to encode the sample music frequency spectrum signals to obtain one-dimensional characteristics of each sample music piece;
Converting the one-dimensional characteristics of each sample music piece by adopting the preset conversion model to obtain the music characteristics of each sample music piece; the music characteristics of each sample music piece comprise: context information of the front and rear sample music pieces;
Adopting the preset decoder to decode the music characteristics of each sample music piece to obtain a decoded reconstruction signal; the reconstructed signal is a restored sample music frequency spectrum signal;
Calculating a target loss function value according to the reconstruction signal of each sample music piece and the sample music frequency spectrum signal;
And optimizing the coding parameters of the preset coder according to the target loss function value until a preset stopping condition is met, so as to obtain the optimized coder.
12. The method of claim 11, wherein the restored sample music spectrum signal comprises: the music characteristics of the restored sample music piece, each sample music spectrum signal comprises: musical characteristics of each of the sample musical pieces;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
And calculating the target loss function value according to the music characteristics of the restored sample music pieces and the music characteristics of each sample music piece.
13. The method of claim 11, wherein the restored music spectrum signal comprises: the mel-frequency spectrum characteristics of the restored sample music piece, and each music frequency spectrum signal comprises: mel spectrum characteristics of each of the sample musical pieces;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
Calculating a first loss function value according to the Mel spectrum characteristics of the restored sample music pieces and the Mel spectrum characteristics of each sample music piece;
the objective loss function value is calculated from the first loss function value.
14. The method of claim 13, wherein the restored music spectrum signal comprises: the melody characteristics of the restored sample music piece, each music spectrum signal comprising: melody features of each of the sample pieces of music;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
calculating a second loss function value according to the melody characteristics of the restored sample music pieces and the melody characteristics of each sample music piece;
the objective loss function value is calculated from the first loss function value and the second loss function value.
15. The method of claim 14, wherein the restored music spectrum signal comprises: the beat characteristics of the restored sample music piece, and each music spectrum signal comprises: beat characteristics of each of the sample pieces of music;
Said calculating an objective loss function value from said reconstructed signal for each of said sample musical pieces and said sample music spectrum signal, comprising:
Calculating a third loss function value according to the beat characteristics of the restored sample music pieces and the beat characteristics of each sample music piece;
the objective loss function value is calculated from the first loss function value, the second loss function value, and the third loss function value.
16. A dance animation generation apparatus, the apparatus comprising: the device comprises a segmentation module, an analysis module, a determination module and a generation module, wherein:
The segmentation module is used for segmenting the input music to obtain a plurality of music pieces of the input music;
the analysis module is used for carrying out feature analysis on each music piece to obtain the music feature of each music piece;
The determining module is used for determining a sample music feature of a sample music piece, whose similarity with the music feature of the music piece in the sample data set reaches a preset threshold, as a target music feature; the sample data set comprises a plurality of sample music dance data, each sample music dance data comprises a sample music piece and a sample dance piece corresponding to the sample music piece, and the sample music features are obtained by performing feature analysis on the sample music pieces; and for determining the sample dance segment corresponding to the sample music piece where the target music feature is located as the target dance segment corresponding to each music piece;
the generation module is used for generating target dance animation according to the target dance segment;
the apparatus further comprises: the acquisition module is used for acquiring sample music and sample dance in the single sample music dance animation;
The splitting module is specifically configured to split the sample music to obtain a plurality of sample music pieces;
the determining module is specifically configured to determine, according to the speed and the acceleration of the skeletal key point in the sample dance, that a sample dance time with the acceleration of 0 is a preselected time; calculating the action speed change of each preselected time, and determining the preselected time when the action speed change exceeds a preset parameter change range as a target rhythm point;
the dividing module is specifically configured to split the sample dance according to the target rhythm point to obtain the plurality of sample dance segments;
The determining module is specifically configured to determine a sample dance segment corresponding to each sample music segment according to a division point of each sample music segment and a rhythm point of each sample dance segment;
The apparatus further comprises: a conversion module and a processing module, wherein:
The conversion module is used for converting each music piece into a music frequency spectrum signal;
the processing module is used for carrying out coding processing on the music frequency spectrum signals by adopting a preset coder to obtain one-dimensional characteristics of each music piece;
The conversion module is specifically configured to convert one-dimensional features of each music piece by using a preset conversion model to obtain music features of each music piece; the music characteristics of each music piece include: context information of the front and rear pieces of music.
17. A training device for a music analysis model, the music analysis model comprising: a preset encoder, a preset conversion model, and a preset decoder, the apparatus comprising: the device comprises a conversion module, a processing module, a calculation module and an optimization module, wherein:
The conversion module is used for converting the sample music piece into a sample music frequency spectrum signal;
the processing module is used for carrying out coding processing on the sample music frequency spectrum signals by adopting the coder to obtain one-dimensional characteristics of each sample music piece;
The conversion module is specifically configured to convert one-dimensional features of each sample music piece by using the preset conversion model to obtain music features of each sample music piece; the music characteristics of each sample music piece comprise: context information of the front and rear sample music pieces;
the processing module is specifically configured to decode the music feature of each sample music segment by using the preset decoder to obtain a decoded reconstructed signal; the reconstructed signal is a restored sample music frequency spectrum signal;
The calculation module is used for calculating a target loss function value according to the reconstruction signals of the sample music fragments and the sample music frequency spectrum signals;
And the optimizing module is used for optimizing the coding parameters of the preset coder according to the target loss function value until the preset stopping condition is met, so as to obtain the optimized coder.
18. An electronic device, the device comprising: a processor, a storage medium storing machine-readable instructions executable by the processor, the processor in communication with the storage medium via a bus, the processor executing the machine-readable instructions to perform the method of any of the preceding claims 1-15.
19. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of the preceding claims 1-15.
CN202110497533.5A 2021-05-07 2021-05-07 Dance animation generation method, model training method, device, equipment and storage medium Active CN113160848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110497533.5A CN113160848B (en) 2021-05-07 2021-05-07 Dance animation generation method, model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113160848A CN113160848A (en) 2021-07-23
CN113160848B true CN113160848B (en) 2024-06-04

Family

ID=76874249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110497533.5A Active CN113160848B (en) 2021-05-07 2021-05-07 Dance animation generation method, model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113160848B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7147384B2 (en) * 2018-09-03 2022-10-05 ヤマハ株式会社 Information processing method and information processing device
CN114154574A (en) * 2021-12-03 2022-03-08 北京达佳互联信息技术有限公司 Training and beat-to-beat joint detection method of beat-to-beat joint detection model
CN114419205B (en) * 2021-12-22 2024-01-02 北京百度网讯科技有限公司 Driving method of virtual digital person and training method of pose acquisition model
CN114820888A (en) * 2022-04-24 2022-07-29 广州虎牙科技有限公司 Animation generation method and system and computer equipment
CN115712739B (en) * 2022-11-17 2024-03-26 腾讯音乐娱乐科技(深圳)有限公司 Dance motion generation method, computer device and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070008238A (en) * 2005-07-13 2007-01-17 엘지전자 주식회사 Apparatus and method of music synchronization based on dancing
JP2008234453A (en) * 2007-03-22 2008-10-02 Sony Corp Content retrieval device, content retrieval method and content retrieval program
JP2011204113A (en) * 2010-03-26 2011-10-13 Kddi Corp Video content generation system, metadata construction device, video content generation device, portable terminal, video content distribution device, and computer program
KR20140071588A (en) * 2012-12-04 2014-06-12 장준영 A method that dance a robot by matching music and motion of robot
CN109032384A (en) * 2018-08-30 2018-12-18 Oppo广东移动通信有限公司 Music control method, device and storage medium and wearable device
CN110008373A (en) * 2019-03-14 2019-07-12 浙江大学 The information extraction of music graph structure and generation model, construction method and application based on messaging network
CN110992449A (en) * 2019-11-29 2020-04-10 网易(杭州)网络有限公司 Dance action synthesis method, device, equipment and storage medium
CN112330779A (en) * 2020-11-04 2021-02-05 北京慧夜科技有限公司 Method and system for generating dance animation of character model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Rhythm feature matching model between motion and music; Fan Rukun; Fu Jing; Cheng Silei; Zhang Xiang; Geng Weidong; Journal of Computer-Aided Design & Computer Graphics; 2010-06-15 (06); pp. 990-996 *

Also Published As

Publication number Publication date
CN113160848A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant