CN115379299A - Dance action generation method and device, electronic equipment and storage medium


Info

Publication number
CN115379299A
Authority
CN
China
Prior art keywords
dancer
dance
action
segment
vector
Prior art date
Legal status
Pending
Application number
CN202211010740.4A
Other languages
Chinese (zh)
Inventor
王子轩
贾珈
兴军亮
孟凡博
陈国文
王砚峰
Current Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, Tencent Technology Shenzhen Co Ltd
Priority to CN202211010740.4A
Publication of CN115379299A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81: Monomedia components thereof
    • H04N21/816: Monomedia components thereof involving special video data, e.g. 3D video
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention provides a dance action generation method and apparatus, an electronic device, and a storage medium. Audio feature information contained in a music file is extracted and analyzed so that the audio characteristics of the acquired music file can be described quantitatively. Music-driven dance generation is then performed on the basis of these audio features: a dancer activation vector carrying each dancer's personal performance style is introduced, and dance action prediction is carried out with the dancer activation vector and the extracted audio feature information using a convolutional neural network algorithm, yielding all dance actions a dancer may display in the performance, so that complete dance actions with a personal style can be generated for different dancers under the same background music file. Finally, the action units of multiple dancers are integrated through action splicing, producing a multi-person group dance that mixes multiple styles and has rich layers.

Description

Dance action generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of motion simulation technologies, and in particular, to a dance motion generation method and apparatus, an electronic device, and a storage medium.
Background
The combination of music and dance produces an audiovisual experience that neither art can offer alone, and with the rapid development of Internet and computer technology, the creation of music and dance is no longer limited to artists themselves; music-driven dance generation technology has therefore been applied in fields such as virtual reality and human-computer interaction.
In the related art, music-driven dance generation obtains audio feature information, encodes it to obtain a deep audio feature vector, combines that vector with dance motion vectors to generate new dance action units, and then splices the generated dance action units through an action splicing model to obtain a complete, natural dance.
However, existing music-driven dance generation generally focuses only on generating actions for a single dance performed by a single person, so the resulting dance effect is monotonous.
Disclosure of Invention
Accordingly, the invention provides a dance action generation method to solve the problem that existing music-driven dance generation techniques can only concentrate on generating single-dancer actions, resulting in a monotonous dance effect.
The invention further provides a dance action generation apparatus to ensure the method can be implemented and applied in practice.
The dance action generation method provided by the embodiment of the invention comprises the following steps:
acquiring a music file, and extracting mixed audio features of the music file, wherein the mixed audio features are used for representing the style of the music file; the music file is divided into a plurality of segments according to playing time;
for each current segment, predicting the dancer activation vector of each current segment dancer according to the mixed audio features and the dancer activation vector of the dancer of the previous segment; the dancer activation vector is used for representing whether each dancer in the segment performs dancing actions or not;
performing action prediction according to the mixed audio features, the dance action set of the dancer in the previous segment and the dancer activation vector of the dancer in the current segment to obtain the dance action set of each dancer in the current segment; the dance action set comprises all action gestures to be done by the dancer in the segment;
and splicing dance motion postures contained in the dance motion set of the dancer aiming at each segment to obtain a dance posture sequence of the dancer, and summarizing the dance posture sequences of the dancers in all the segments to obtain a group dance motion posture set.
An embodiment of the present invention further provides a dance motion generating apparatus, including:
the audio feature extraction module is used for acquiring a music file and extracting mixed audio features of the music file, wherein the mixed audio features are used for representing the style of the music file; the music file is divided into a plurality of segments according to playing time;
the first prediction module is used for predicting the dancer activation vector of each current segment dancer according to the mixed audio features and the dancer activation vector of the dancer in the previous segment; the dancer activation vector is used for representing whether each dancer in the segment performs dancing actions or not;
the second prediction module is used for performing motion prediction according to the mixed audio features, the dance motion set of the dancer in the previous section and the dancer activation vector of the dancer in the current section to obtain the dance motion set of each dancer in the current section; the dance action set comprises all action gestures to be done by the dancer in the segment;
and the sequence summarizing module is used for splicing dance action gestures contained in the dance action set of the dancer aiming at each segment to obtain a dance gesture sequence of the dancer, summarizing the dance gesture sequences of the dancers in all the segments to obtain a group dance action gesture set.
An embodiment of the present invention further provides an electronic device, including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method described above.
Embodiments of the present invention also provide a storage medium, and when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method described above.
An embodiment of the present invention further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method described above is implemented.
In the embodiment of the invention, the audio feature information contained in the music file is extracted and analyzed so that the audio characteristics of the acquired music file can be described quantitatively. Music-driven dance generation is performed on the basis of these audio features: a dancer activation vector carrying the dancer's personal performance style is introduced, and dance action prediction is carried out with the dancer activation vector and the extracted audio feature information using a convolutional neural network algorithm, yielding all dance actions the dancer may display in the performance, so that complete dance actions with a personal style can be generated for different dancers under the same background music file. Finally, the action units of multiple dancers are integrated through action splicing, producing a multi-person group dance that mixes multiple styles and has rich layers.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of simple steps of a dance action generating method according to an embodiment of the present invention;
FIG. 2 is a flow chart of relation between specific implementation steps of a dance action generation method according to an embodiment of the present invention;
fig. 3 is a diagram illustrating an extraction result of audio parameter information according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating relationships between steps of generating a complete music-driven dance according to an embodiment of the present invention;
FIG. 5 is a block diagram of a dance motion generating apparatus according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF THE INVENTION
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor appliances, distributed computing environments that include any of the above devices or equipment, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As shown in fig. 1, which is a flowchart of the basic steps of a dance motion generation method according to an embodiment of the present invention, the method may include:
step 101, acquiring a music file, and extracting mixed audio features of the music file, wherein the mixed audio features are used for representing the style of the music file; the music file is divided into a plurality of sections according to the playing time.
In the embodiment of the invention, a music file serves as the primary information basis for music-driven dance generation: by analyzing the content style of the music and extracting the corresponding feature information, dance motions matched with the style of the music content are generated; mapping to dance action sequences through audio modeling strengthens the matching relationship between audio features and dance actions.
Specifically, the system first acquires a music file and preprocesses the audio. The system model takes a piece of music as its target input and extracts several pieces of audio feature information from the whole target music file. The acquired audio feature information is laid out on a time axis whose extent is the complete playing time of the music file. For example, if the total playing time of a currently acquired music file is 3 minutes and 20 seconds, its audio is divided into a number of segments according to playing time, with each segment lasting 10 seconds. Each time segment has corresponding audio feature information, and the audio feature information is spliced end to end in the order of division, giving the mixed audio feature corresponding to the whole music file; within these time segments, the multiple pieces of complete audio feature information quantitatively express the music style of the entire piece.
Compared with traditional music-driven dance generation algorithms, which roughly treat the process as a regression problem of similarity matching, only compare the target music file with existing music information in a database to find the best-matching action sequence through a mapping relation, and do not consider the continuity of the generated actions, so that the generated dance actions are incoherent and lack artistry, the present application analyzes the target music by modeling the audio and then mixes in the individual dance styles of different dancers to predict dance actions, which allows dance actions matched with the audio features to be generated more flexibly while keeping the action sequences more coherent.
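By way of illustration only, the following Python sketch shows one way the segmentation described above could look in practice; it assumes the librosa library for audio loading, and the function name and 10-second segment length are merely the example values used above, not the patent's prescribed implementation.

```python
import librosa


def split_into_segments(path, segment_seconds=10.0):
    """Split a music file's waveform into fixed-length segments along
    the time axis (illustrative sketch only)."""
    waveform, sr = librosa.load(path, sr=None, mono=True)
    samples_per_segment = int(segment_seconds * sr)
    segments = [
        waveform[start:start + samples_per_segment]
        for start in range(0, len(waveform), samples_per_segment)
    ]
    return segments, sr


# Example: a file of 3 minutes 20 seconds yields 20 segments of 10 seconds each.
# segments, sr = split_into_segments("music.wav", segment_seconds=10.0)
```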
Step 102, for each current segment, predicting the dancer activation vector of each dancer in the current segment according to the mixed audio features and the dancer activation vector of the dancer in the previous segment; the dancer activation vector is used to characterize the initial dancing action of the dancer at the beginning of the segment.
In the embodiment of the invention, after the mixed audio features are obtained through modeling, analysis, and extraction, they are mixed with the dancer's personal dance action style, so that a complete dance that matches the music characteristics while retaining the personal dance style can be displayed over the background music in the performance; in this application, a Dancer Activation Vector (DAV) is set to represent the dancer's personal dance action style.
Specifically, after the preprocessing of the music file in step 101, the first stage of music-driven dance generation begins. This stage can be described as the "dancer collaboration stage": the acquired mixed audio features are converted into feature vectors by encoding, so that a correlation operation can be performed between these feature vectors and the dancer activation vectors preset by developers, yielding the dancer activation vectors of the dancers performing in the current time segment.
The role of the dancer activation vector is to describe whether each dancer in the segment performs dancing actions. For example, if a group dance requires only dancer 1 and dancer 3 to dance in the current time segment, the dancer activation vector of that segment activates dancers 1 and 3 and leaves the other dancers inactive, i.e. the other dancers do not perform dancing actions and remain in a resting or waiting state.
In the computation, the dancer activation vector of the current segment is determined by the mixed audio features of the current segment and the dancer activation vector of the previous segment, so the model has a cyclic output-feedback mechanism: the output of the previous segment is used as the input of the next segment. In the first cycle, the dancer activation vector of the first time segment is preset by developers according to the dancers for whom dance actions are to be generated, and this preset value can be changed at system start-up to match the personal styles of different dancers. Using the output of the previous cycle as the input of the next helps keep consecutive activation vectors consistent: because vectors can be added directionally, the end point of the previous cycle's dancer activation vector serves as the starting point of the next cycle's computation, which reflects the overall effect and effectively ensures the coherence of the dancer's actions in the performance.
By letting the mixed audio feature information obtained from the music file and the dancer activation vector DAV participate together in the music-driven dance action generation process, and thanks to the cyclic feedback mechanism used in the dancer activation vector computation, the consistency of successive action vectors can be well guaranteed while matching the music style and the dance style.
103, performing action prediction according to the mixed audio features, the dancing action set of the dancer in the previous segment and the dancer activation vector of the dancer in the current segment to obtain a dancing action set of each dancer in the current segment; the dance action set comprises all action gestures to be done by the dancer in the segment.
In the embodiment of the invention, by using the obtained dancer activation vector DAV in the motion prediction computation, all candidate dance motions that the dancer, with their personal dance style blended in, could perform under the target background music can be obtained.
Specifically, according to the content implemented in this step, this stage may be called the "multi-dancer action dancing stage"; it is intended to generate the corresponding specific dancing actions from the previously obtained dancer activation vector DAV, which carries the dancer's characteristic personal dancing style. As in step 102, the preprocessed mixed audio features are encoded into audio feature vectors; these vectors and the dance motion set of the previous segment are then used as input to a decoding operation, and the decoding result is passed through a perceptron mapping to output all candidate dance motion vectors of the dancer in the current time segment. Note that these candidate dance motion vectors take into account all possibilities of the dancer activation vector DAV obtained in step 102: for each possible value of the DAV, a specific dance motion vector is computed, and the corresponding dance action, mixing the styles of all dancers activated by that DAV, is generated through the system's encoding and decoding operations. All candidate dance motion vectors are obtained in this way, so that the actual final dance motion vector can later be selected according to the value of the specific dancer activation vector DAV.
Similarly, a cyclic feedback mechanism is also introduced into the decoding operation of this step: the dance motion set of the previous segment serves as the input for computing the dance motion set of the next segment. Action selection is then performed between the dancer motion vectors of the current time segment obtained by the decoding computation and the current-segment dancer activation vector DAV obtained in the previous stage, yielding the dancing action corresponding to each dancer in the current time segment; mapping these dancing actions onto the time segments in which they occur gives the dancer's dance action sequence for the complete performance.
In this step, the complete dance motion to be displayed by the dancer is obtained from the candidate dance motion vectors of that dancer. Note that each complete cycle of computation only produces results for an individual dancer and does not necessarily directly generate the corresponding multi-person group dance actions, because not every dancer in a group dance needs to perform continuously throughout the music: there are also time segments in which an individual dancer holds a fixed pose or rests.
The personal dance action sequence obtained through these steps retains the dancer's personal action style while providing matched, coordinated actions that fit the music style of the music file; the actions are smooth and pleasing to watch, the personal style is preserved, and the per-dancer generation sequence offers greater flexibility in later arrangement and adjustment.
And 104, splicing dance movement gestures contained in the dance movement set of the dancer aiming at each segment to obtain dance gesture sequences of the dancer, and summarizing the dance gesture sequences of the dancers in all the segments to obtain a group dance movement gesture set.
In the embodiment of the invention, the final stage of the music-driven dance generation process may be called the "motion transition stage": the complete dance action sequence generated for each individual dancer is spliced to obtain a coherent dance, and the dance is then performed in order, following the time segments into which the music playback is divided. For example, suppose that after all the above steps the complete, coherent dance motions of four dancers A, B, C, and D have been generated and the dance performance follows the time segments of the music, with the music file divided into segments of 20 seconds each. In the first 20-second segment, dancers B and C begin to show dancing actions according to their respective dance action sequences, while dancers A and D hold a single fixed pose in that first segment, remaining in a resting state as accompaniment to dancers B and C. As the music continues, each time segment in the four dancers' sequences contains a different combination: one dancer resting while three dance, three resting while one dances, or all four dancing at once. Finally, as the music ends, all four dancers return to the resting pose they held at the start of the music as the closing action.
Specifically, after the preceding steps have been processed, the action unit sequence of each dancer over the whole music playback is obtained; in line with its processing characteristics this step can be defined as the motion transition stage. In this final stage, all action units of each dancer are spliced in sequence order by the action splicing model, producing a multi-person group dance that mixes multiple personal styles and has rich layers.
In summary, by extracting and analyzing the audio feature information contained in a music file, the invention can quantitatively describe the music style of the acquired file; music-driven dance generation based on this music style yields candidate dance actions matched to it; a dancer activation vector carrying each dancer's personal performance style is added, and the obtained candidate dance actions are arranged using a convolutional neural network algorithm, so that complete dance actions with a personal style can be generated for different dancers under the same background music file; finally, the action units of multiple dancers are integrated through action splicing to obtain a multi-person group dance that mixes multiple styles and has rich layers.
As shown in fig. 2, fig. 2 is a relation flow diagram of specific implementation steps of a dance action generation method according to an embodiment of the present invention; as shown in fig. 2, it comprises the steps of:
step 201, extracting audio parameters from the music file; wherein the audio parameters include: a chrominance spectrum, a basic rhythm unit, and a starting note.
Target audio parameters are extracted from the obtained music file. As shown in fig. 3, which is a diagram of an audio parameter extraction result according to an embodiment of the present invention, the parameters specifically comprise 301 the basic units of the overall music rhythm (Beat), 302 the onsets (Onset), and 303 the chroma spectrum (Chroma Spectrum); an onset refers to the beginning of a note or other sound in the music, where the amplitude rises from zero to an initial peak. These audio parameters intuitively exhibit the audio characteristics of the music.
Specifically, the acquired music file is divided along the time axis into a number of consecutive clips according to duration, the division corresponding to a number of distinct interval paragraphs on the time axis; the audio parameter information is then extracted from the audio waveform generated as the music file plays.
The three audio parameters, the chroma spectrum, the basic rhythm unit, and the onset, concisely and intuitively reflect the audio characteristics of the target music file's waveform. Dividing the playing time of the music file into a number of time segments by duration allows the extraction to proceed completely and conveniently in chronological order, with the extraction for each time segment not interfering with the others.
Step 202, splicing the extracted audio parameters to obtain the mixed audio features.
As described in step 201, the extracted audio parameter information of the different time segments is mutually independent, but the audio of the source music file itself is continuous; the extracted audio parameter information therefore needs to be spliced afterwards so that the audio parameter information of all segments is connected and fully expresses the audio characteristics of the whole music file.
Specifically, the audio parameter extraction results are laid out on the time axis, and the three kinds of audio features are then spliced in the order of the time intervals into which the axis was divided, giving the mixed audio features that completely represent the whole music file.
Extracting and reading the audio parameters of a music file allows the music style of the target file to be analyzed quantitatively. For example, for music with a strong rhythm the peaks of the Beat parameter are denser on the time axis, i.e. the whole audio file has a more obvious rhythm to the listener; moreover, in the subsequent dance generation process these clear rhythm parameters help design dance actions that fit the rhythm more closely, and dance actions that land on the beat are more enjoyable to watch.
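As a hedged illustration of steps 201-202, the sketch below extracts the three kinds of audio parameters (chroma spectrum, basic rhythm unit/beat, onset) with the librosa library and splices the per-segment results end to end; the exact feature layout is an assumption, since the patent does not fix a concrete representation.

```python
import librosa
import numpy as np


def extract_audio_parameters(segment, sr):
    """Extract chroma spectrum, onset strength, and a beat indicator for
    one time segment and stack them frame by frame (illustrative layout)."""
    chroma = librosa.feature.chroma_stft(y=segment, sr=sr)        # (12, frames)
    onset_env = librosa.onset.onset_strength(y=segment, sr=sr)    # (frames,)
    _tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
    beat_indicator = np.zeros_like(onset_env)
    beat_indicator[beat_frames] = 1.0                             # basic rhythm units
    n = min(chroma.shape[1], onset_env.shape[0])                  # align frame counts
    return np.vstack([chroma[:, :n], onset_env[None, :n], beat_indicator[None, :n]])


def splice_mixed_audio_features(per_segment_features):
    """Join per-segment audio parameters end to end in playing order to
    form the mixed audio features of the whole music file."""
    return np.concatenate(per_segment_features, axis=1)
```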
Step 203, performing local music coding on the obtained mixed audio features to obtain a depth audio feature vector for each current segment; inputting the depth audio feature vector and the dancer activation vector of the dancer in the previous segment into a decoder for gate decoding to obtain a gate decoding result; wherein the decoder is a gate control loop unit based decoder.
A Gated Recurrent Unit (GRU) is a decoder based on a gate mechanism and one of the recurrent neural network training models; it alleviates the inability of conventional recurrent neural networks to retain long-term memory as well as the gradient problems in backpropagation, and is used to handle long-term dependencies that may arise in prediction problems. For example, in an optional implementation of the embodiment of the present application, the obtained deep audio feature vector and the dancer activation vector of the dancer in the previous segment are input into the decoder for gate decoding; the decoding process is also a training process for the model, so the memory property allows the decoding operation to be performed more accurately through multiple rounds of loop-feedback computation.
Performing the decoding operation cyclically on the deep audio feature vectors and dancer activation vectors obtained by the processing unit means that, thanks to the model's memory property, common features of the data can be found and accumulated over the repeated decoding cycles, overcoming the inability of conventional recurrent neural networks to retain long-term memory and the gradient problems in backpropagation.
And step 204, predicting according to the gate decoding result to obtain the dancer activation vector of the current segment dancer.
With reference to the brief description of step 102, steps 203-204 are the concrete implementation of the "dancer collaboration stage" described there. Prediction is performed on the decoding result of the gated recurrent unit GRU from step 203: the GRU's decoding result is still a vector of the same form as the previous segment's dancer activation vector that was input, but it now incorporates the feature information contained in the mixed audio features, so the dancer activation vector of the dancers in the current segment can be predicted from this output.
Through the gate decoding operation of step 203, the obtained mixed audio features are fused with the feature information of the previous segment's dancer activation vector; from the fused result, the dancer activation vector of the current time segment, computed from the previous segment's dancer activation vector information, is obtained, and this current-segment dancer activation vector in turn serves as the decoding input for the next time segment.
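The following PyTorch sketch illustrates the "dancer collaboration stage" of steps 203-204 under assumed layer sizes: a GRU cell takes the deep audio feature vector of the current segment together with the previous segment's dancer activation vector (DAV) and predicts the DAV of the current segment, feeding each output back as the next input. The class name and dimensions are illustrative assumptions, not the patent's model.

```python
import torch
import torch.nn as nn


class DancerCollaborationStage(nn.Module):
    """GRU-based gate decoding that predicts each segment's dancer
    activation vector (DAV) from the deep audio feature and the previous
    DAV (sketch; sizes are assumptions)."""

    def __init__(self, audio_dim=256, num_dancers=4, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRUCell(audio_dim + num_dancers, hidden_dim)
        self.head = nn.Linear(hidden_dim, num_dancers)

    def forward(self, deep_audio_feats, initial_dav):
        """deep_audio_feats: (num_segments, audio_dim); initial_dav:
        (num_dancers,), preset by the developer for the first segment."""
        hidden = deep_audio_feats.new_zeros(1, self.gru.hidden_size)
        dav = initial_dav
        davs = []
        for feat in deep_audio_feats:                 # loop feedback over segments
            x = torch.cat([feat, dav], dim=-1).unsqueeze(0)
            hidden = self.gru(x, hidden)
            dav = torch.sigmoid(self.head(hidden)).squeeze(0)  # ~1: dancer active
            davs.append(dav)
        return torch.stack(davs)                      # (num_segments, num_dancers)
```

In this sketch a sigmoid value near 1 marks a dancer as active; the patent leaves the exact numerical form of the DAV open.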
And step 205, predicting the alternative dance motion vector of each dancer in the current segment according to the mixed audio characteristics and the dance motion set of the dancer in the previous segment.
This step is the second stage of the system's processing: the initial step of the "multi-dancer action dancing stage" sits at the same level as, and in parallel with, step 203; the mixed audio features obtained after preprocessing and encoding the music file are likewise taken as the target input for the next processing step.
Specifically, the mixed audio features and a set of dancer actions are taken as system inputs. The mixed audio features are obtained by splicing the extracted audio parameter information in the music file preprocessing stage, while the dancer action set is the output of the motion selection module in this stage; that result is at the same time fed back through the feedback loop as a computation input, participating in the cyclic computation. As before, the mixed audio features are first passed through local music encoding to produce a deep audio feature vector; this vector and the previous segment's dancer action set are then taken as input and processed in turn by the gated recurrent unit GRU and a Multilayer Perceptron (MLP), yielding all candidate dance motion vectors output for the dancer at the current time.
A loop-feedback computation mechanism is again used: the computation output of the previous time segment serves as the computation input of the next, and the memory property of the gated recurrent unit GRU links the vector computation results of successive time segments more tightly, so coherent dance actions can be generated more accurately.
Optionally, step 205 may specifically include:
substep 2051 performs local music coding on the mixed audio features to obtain depth audio feature vectors.
The mixed audio features obtained in step 202 are input to a local music encoding module (LME) for encoding, which converts the numerical mixed audio feature information into a deep audio feature vector.
The mixed audio feature information consists of concrete numerical values, but participating in motion generation requires direction information in addition to the numbers produced by quantization, so the local music encoding module LME must convert it into a vector that can take part in motion generation.
Optionally, the sub-step 2051 may specifically include:
sub-step 20511, performing local music coding on the mixed audio features based on a coding mode of a convolutional neural network structure to extract deep information of the mixed audio features to obtain the deep audio feature vector.
The obtained mixed audio features are locally music-encoded by the local music encoding module LME; the LME is a variant model based on a Convolutional Neural Network (CNN) and is used to encode the feature information of the music segments and extract their deep information.
The deep audio feature vector obtained after convolutional neural network encoding accurately and clearly reflects the audio feature information of the target music file, and converting the scalar audio feature values into vector information makes it convenient to use them in synthesizing and computing dance motion vectors.
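For illustration, a minimal convolutional encoder in the spirit of the local music encoding module (LME) might look like the following PyTorch sketch; the channel counts (14 input channels match the 12 chroma bins plus onset and beat rows of the earlier sketch) and the depth are assumptions, not the patent's architecture.

```python
import torch.nn as nn


class LocalMusicEncoder(nn.Module):
    """CNN-based local music encoding: 1-D convolutions over the frame axis
    extract deep information from the mixed audio features, pooled into a
    deep audio feature vector (sketch under assumed sizes)."""

    def __init__(self, in_channels=14, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, feature_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)           # collapse the time axis

    def forward(self, mixed_audio_features):
        """mixed_audio_features: (batch, in_channels, frames) for one segment."""
        deep = self.conv(mixed_audio_features)
        return self.pool(deep).squeeze(-1)            # (batch, feature_dim)
```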
And a substep 2052 of combining the depth audio feature vector and the dance action information of the previous section of the dancer, performing gate decoding to obtain a gate decoding result.
A loop-feedback computation mechanism is introduced: the obtained deep audio feature vector and the dancer activation vector of the previous segment are decoded together by the gated recurrent unit GRU. In the overall processing of the system, the role of the dancer activation vector DAV is to describe the dancer's initial dancing action at the starting moment of a given performance segment. During computation, the DAV of the current moment is determined by the mixed audio features of the current segment and the DAV of the previous moment, i.e. the output value of the previous segment serves as the input value of the next. In the first cycle, the dancer activation vector of the first time segment is preset by developers according to the dancers for whom dance actions are to be generated, and this preset value can be changed at system start-up to match the personal styles of different dancers. Through this step, the deep audio feature vector representing the style information of the music file and the dancer activation vector representing the dancer's action style are decoded together for the first time, and the dancer activation vectors at every moment of the whole music playback are obtained preliminarily, so that the dancer's initial dancing action can be matched correctly to the content style of the music file at different moments.
Specifically, a Gated Recurrent Unit (GRU) is a decoder based on a gate mechanism and one of the recurrent neural network training models; it alleviates the inability of conventional recurrent neural networks to retain long-term memory and the gradient problems in backpropagation, and is used to handle long-term dependencies that may arise in prediction problems. For example, in an optional implementation of the embodiment of the present application, the obtained deep audio feature vector and the dance motion information of the previous segment's dancer are input to the decoder for gate decoding; the decoding process is also a training process for the model, so the memory property allows the decoding operation to be performed more accurately through multiple rounds of loop-feedback computation.
Performing the decoding operation cyclically on the deep audio feature vectors and dance motion information obtained by the processing unit means that, thanks to the model's memory property, common features of the data can be found and accumulated over the repeated decoding cycles, overcoming the inability of conventional recurrent neural networks to retain long-term memory and the gradient problems in backpropagation.
And a substep 2053 of performing multilayer perception data mapping on the gate decoding result to obtain an alternative dance motion vector of the dancer in the current segment.
Specifically, the deep audio feature vector obtained in sub-step 2052 and the dance motion information of the previous segment are input to the gate decoder GRU for decoding, and the resulting decoding output is passed to a Multilayer Perceptron (MLP) for data-set mapping; the MLP is a feed-forward artificial neural network model that maps several input data sets to a single output data set. The dance motion information is the final output of the second stage and represents the specific dance action of each dancer at every moment, as finally obtained in the "multi-dancer action dancing stage". The output of this step is the candidate dance motion vectors of the dancer, from which the dancer's dance motion information is obtained in the subsequent links.
The dancer motion vectors generated in this step are intermediate quantities and are not used as the final result of the dancer's motion; the gate decoding operation and multilayer perceptron mapping determine more accurately how the dancer's personal dance action style is mixed with the style of the acquired target music file.
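Sub-steps 2052-2053 can be pictured with the following PyTorch sketch: a GRU cell decodes the deep audio feature vector together with the previous segment's dance motion information, and a multilayer perceptron (MLP) maps the gate decoding result to candidate dance motion vectors, one per possible dancer combination. The motion dimensionality, the number of candidates, and the flat pose representation are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class CandidateMotionDecoder(nn.Module):
    """GRU gate decoding followed by MLP data mapping that outputs the
    candidate dance motion vectors of a dancer for the current segment
    (sketch; all sizes are assumptions)."""

    def __init__(self, audio_dim=256, motion_dim=72, num_candidates=8, hidden_dim=512):
        super().__init__()
        self.gru = nn.GRUCell(audio_dim + motion_dim, hidden_dim)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_candidates * motion_dim),
        )
        self.num_candidates, self.motion_dim = num_candidates, motion_dim

    def forward(self, deep_audio_feat, prev_motion, hidden):
        """One loop-feedback step: deep_audio_feat (audio_dim,), prev_motion
        (motion_dim,) from the previous segment, hidden (1, hidden_dim)."""
        x = torch.cat([deep_audio_feat, prev_motion], dim=-1).unsqueeze(0)
        hidden = self.gru(x, hidden)
        candidates = self.mlp(hidden).view(self.num_candidates, self.motion_dim)
        return candidates, hidden      # one candidate motion vector per DAV possibility
```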
And step 206, performing action prediction according to the alternative dance action vector of the dancer in the current segment and the dancer activation vector of the dancer to obtain a dance action set of the dancer in the current segment.
The candidate dance motion vectors obtained in the preceding steps serve as the motion material for dance generation, and the best display effect is achieved only after screening and arrangement; therefore, the dance motion the dancer will actually show during the whole performance is further determined from the candidate dance motion vectors of the dancer in the current segment and the dancer activation vector of that dancer.
Optionally, step 206 may specifically include:
substep 2061, establishing a correspondence between the dancing action vector of the dancer and the dancer activation vector of the dancer for the current segment.
Following sub-step 2053, a correspondence is established between the candidate dance motion vectors of the dancer and the dancer activation vector of the dancer.
Specifically, the candidate actions contained in the generated candidate dance motion vectors correspond to the preceding dancer activation vectors, the dancer activation vector serving as the starting motion vector of the candidate actions. The candidate dance motion vectors of the current moment and the dancer activation vector of the current moment are passed together as input to a motion selection module (Motion Select), which yields the dance motion of each dancer in the current time segment. The motion selection module matches and screens, from all the candidate dance motion vectors of the dancer at the current moment, the dance motion vector corresponding to the current time segment. For example, if the dancer activation vector of the current time segment requires only dancer 1 and dancer 3 to dance, then a dance motion vector that takes the motion and music history information into account and mixes only the styles of dancer 1 and dancer 3 must be selected from the candidates. The selected dance motions include both the dancing actions required of a dancer during the performance and the stopping actions for segments in which the dancer is in a resting state.
And a substep 2062 of performing action selection operation according to the correspondence to obtain a dance action set of the dancer in the current segment.
A motion selection operation is executed, based on the correspondence and the motion selection module of sub-step 2061, to obtain the dance action set of the dancer in the current segment.
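As a loose illustration of the motion selection of sub-steps 2061-2062, the sketch below assumes the candidate dance motion vectors are keyed by the set of dancers whose styles they mix, so the vector matching the current dancer activation vector (DAV) can be looked up directly; an inactive dancer keeps a rest pose. This keying scheme is an assumption made only for the example.

```python
def motion_select(candidates_by_active_set, dav, rest_pose, threshold=0.5):
    """Pick the candidate dance motion vector whose dancer combination
    matches the current DAV; dancers not activated stay in a rest pose.
    candidates_by_active_set: {frozenset(active dancer ids): motion vector}."""
    active = frozenset(i for i, a in enumerate(dav) if a > threshold)
    if not active:
        return rest_pose                      # nobody dances in this segment
    return candidates_by_active_set.get(active, rest_pose)


# Example: if only dancers 1 and 3 are activated in the current segment, the
# motion vector mixing exactly their styles is selected.
# motion = motion_select(candidates, dav=[0.1, 0.9, 0.2, 0.8], rest_pose=rest)
```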
And step 207, splicing dance motion postures contained in the dance motion set of each dancer through the dance motion splicing model for each section to obtain a dance posture sequence of the dancer.
The dance action set obtained in step 206 is a set of discrete actions, and a complete dance performance further requires adjusting the order in which the actions are shown so that they coordinate with one another and appear coherent; therefore, the dance action poses contained in each dancer's dance action set are spliced by the dance action splicing model, giving the dancer's dance pose sequence.
Specifically, the dance action splicing model arranges each dancer's dance action poses in the adjusted display order, splices the arranged results together head to tail, and outputs a result that can display, continuously, all the dance poses a single dancer shows in one complete performance; the start and end of the dance pose sequence are also aligned with the start and end of the music file's playing time.
Through the steps, all dance action sequences of each dancer in the whole music file playing process can be obtained, wherein the dance action sequences comprise the complete dance actions and the corresponding time information.
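The splicing of step 207 could, for illustration, be pictured as in the sketch below: one dancer's per-segment motion poses are joined head to tail in playing order, with a few linearly blended frames at each joint to keep the sequence coherent. The blending is purely an assumption; the patent only specifies head-to-tail splicing by a dance action splicing model.

```python
import numpy as np


def splice_dance_pose_sequence(segment_motion_sets, blend_frames=8):
    """Join a dancer's per-segment motion poses head to tail, inserting a
    short linear blend at each seam (illustrative splicing sketch)."""
    sequence = [np.asarray(pose) for pose in segment_motion_sets[0]]
    for motions in segment_motion_sets[1:]:
        prev, nxt = sequence[-1], np.asarray(motions[0])
        for t in range(1, blend_frames + 1):          # smooth transition frames
            alpha = t / (blend_frames + 1)
            sequence.append((1 - alpha) * prev + alpha * nxt)
        sequence.extend(np.asarray(pose) for pose in motions)
    return sequence
```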
And step 208, summarizing dance gesture sequences of dancers in all the segments to obtain a group dance motion gesture set corresponding to the music file, wherein the group dance motion gesture set is used for representing a series of dance motions performed by each dancer in each segment played by the music file.
With reference to step 207, the splicing result is the dance pose sequence of an individual dancer; since this application targets multi-person group dance generated by music-style-driven mixing, the complete action pose sequences generated for individual dancers must further be carried over into a group dance pose sequence for multiple dancers.
Specifically, the personal dance pose sequences of all dancers in one group dance performance are gathered together, giving a multi-person group dance performance of any number of dancers. During the actual performance, each individual dancer only needs to perform and display actions according to their own dance pose sequence in time with the music; because the pose sequences of all dancers in the embodiment of the invention are produced by the same generation steps and computation method, with the dance actions generated from the same target background music, the action display of the different dancers within the same time segment of the group performance is uniform and consistent in style.
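Step 208 amounts to gathering every dancer's sequence into one group dance action pose set aligned to the music's playing timeline. The sketch below assumes each dancer's dance pose sequence is stored per segment and that segments have a fixed length; both are assumptions for illustration only.

```python
def summarize_group_dance(per_dancer_sequences, segment_seconds=10.0):
    """Collect all dancers' dance pose sequences into one group dance
    action pose set indexed by time segment (illustrative structure)."""
    num_segments = max(len(seq) for seq in per_dancer_sequences.values())
    group_set = []
    for seg_idx in range(num_segments):
        start = seg_idx * segment_seconds
        group_set.append({
            "time": (start, start + segment_seconds),
            "poses": {dancer: seq[seg_idx] if seg_idx < len(seq) else None
                      for dancer, seq in per_dancer_sequences.items()},
        })
    return group_set


# During playback, the entry whose "time" window covers the current playing
# position tells each dancer which poses to display in that segment.
```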
And 209, correspondingly displaying the group dance motion gesture set according to the time playing precision of the music file while playing the music file.
Finally, all dance action poses are displayed according to the playing time of the music file, i.e. the group dance of any number of dancers is shown.
Compared with the prior art, the embodiment of the invention fully mixes the music style with the dancers' personal dance styles and, on the basis of the single-dancer dance action pose sequences, summarizes them into a group dance action pose set; thanks to this model design, a group dance of any number of dancers can be generated.
Referring to fig. 4, which shows the relation between the steps of generating a complete music-driven dance according to an embodiment of the present invention, the method comprises a 401 audio preprocessing stage, a 402 dancer collaboration stage, a 403 multi-dancer action dancing stage, and a 404 motion transition stage. Specifically, the music style of the obtained music file is described quantitatively by extracting and analyzing the audio feature information it contains; music-driven dance generation is performed on the basis of this music style to obtain candidate dance actions matched to it; a dancer activation vector carrying each dancer's personal performance style is added, and the obtained candidate dance actions are arranged using a convolutional neural network algorithm, so that complete dance actions with a personal style are generated for different dancers under the same background music file; finally, the action units of multiple dancers are integrated through action splicing, giving a multi-person group dance that mixes multiple styles and has rich layers.
In conclusion, by extracting and analyzing the audio feature information contained in the music file, the audio characteristics of the obtained music file can be described quantitatively; music-driven dance generation based on these audio features yields candidate dance actions matched to them; a dancer activation vector carrying each dancer's personal performance style is added, and the obtained candidate dance actions are arranged using a convolutional neural network algorithm, so that complete dance actions with a personal style can be generated for different dancers under the same background music file; finally, the action units of multiple dancers are integrated through action splicing to obtain a multi-person group dance that mixes multiple styles and has rich layers.
Fig. 5 is a block diagram of a dance motion generating apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus includes:
the audio feature extraction module 501 is configured to acquire a music file, and extract a mixed audio feature of the music file, where the mixed audio feature is used to represent a style of the music file; the music file is divided into a plurality of segments according to playing time;
a first prediction module 502, configured to predict, for each current segment, a dancer activation vector of each current segment dancer according to the mixed audio feature and the dancer activation vector of the dancer of the previous segment; the dancer activation vector is used for representing whether each dancer in the segment performs dancing actions or not;
a second prediction module 503, configured to perform motion prediction according to the mixed audio features, the dance motion set of the dancer in the previous segment, and the dancer activation vector of the dancer in the current segment, to obtain a dance motion set of each dancer in the current segment; the dance action set comprises all action gestures to be done by the dancer in the segment;
and the sequence summarizing module 504 is used for splicing dance motion gestures contained in the dance motion set of the dancer according to each segment to obtain a dance gesture sequence of the dancer, summarizing the dance gesture sequences of the dancers in all segments, and obtaining a group dance motion gesture set.
Optionally, the audio feature extraction module 501 includes:
the parameter extraction submodule is used for extracting audio parameters from the music file; wherein the audio parameters include: a chromaticity spectrum, a basic rhythm unit and a starting note;
and the parameter splicing submodule is used for splicing the extracted audio parameters to obtain the mixed audio features.
Optionally, the first prediction module 502 includes:
the gate decoding sub-module is used for inputting the obtained mixed audio features and the dancer activation vector of the dancer in the previous segment into a decoder for gate decoding to obtain a gate decoding result; wherein the decoder is a decoder based on a gate control loop unit;
and the first prediction sub-module is used for predicting according to the gate decoding result to obtain the dancer activation vector of the current segment dancer.
Optionally, the second prediction module 503 includes:
the action vector prediction sub-module is used for predicting the alternative dance action vector of each dancer in the current segment according to the mixed audio features and the dance action set of the dancer in the previous segment;
optionally, the motion vector predictor module includes:
the music coding unit is used for carrying out local music coding on the mixed audio features to obtain a depth audio feature vector;
the gate decoding unit is used for combining the depth audio characteristic vector and the dance action information of the dancer in the previous segment to perform gate decoding to obtain a gate decoding result;
and the perception data mapping unit is used for carrying out multilayer perception data mapping on the gate decoding result to obtain the alternative dance motion vector of the dancer in the current segment.
Optionally, the music encoding unit includes:
and the audio feature vector extraction subunit is used for carrying out local music coding on the mixed audio features based on a coding mode of a convolutional neural network structure so as to extract deep information of the mixed audio features to obtain the deep audio feature vector.
And the second prediction sub-module is used for performing motion prediction according to the alternative dance motion vector of the dancer in the current segment and the dancer activation vector of the dancer to obtain a dance motion set of the dancer in the current segment.
Optionally, the second prediction sub-module includes:
the corresponding relation establishing unit is used for establishing a corresponding relation between the dancing action vector of the dancer and the dancer activation vector of the dancer aiming at the current segment;
and the action selection unit is used for carrying out action selection operation according to the corresponding relation to obtain a dance action set of the dancer in the current segment.
Optionally, the sequence summarizing module includes:
the sequence splicing sub-module is used for splicing dance action postures contained in the dance action set of each dancer through a dance action splicing model aiming at each segment to obtain a dance posture sequence of the dancer;
and the sequence summarizing submodule is used for summarizing dance gesture sequences of dancers in all the segments to obtain a group dance action gesture set corresponding to the music file, and the group dance action gesture set is used for representing a series of dance actions made by each dancer in each segment played by the music file.
Optionally, the apparatus further comprises:
and the action posture display module is used for correspondingly displaying the group dance action posture set according to the time playing precision of the music file while playing the music file. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In summary, in the dance action generation apparatus provided by the embodiment of the present invention, the audio feature extraction module extracts and analyzes the audio feature information contained in a music file and quantitatively describes the audio characteristics of the obtained file; music-driven dance generation is performed on the basis of these audio features, and candidate dance actions matched to them are obtained through the first prediction module; a dancer activation vector carrying each dancer's personal dance style is then added, and the second prediction module arranges the obtained candidate dance actions using a convolutional neural network algorithm, so that complete dance actions with a personal style can be generated for different dancers under the same background music file; finally, the action units of multiple dancers are integrated through action splicing, and the sequence summarizing module gathers the complete action pose sequences of all dancers over a single performance, giving a multi-person group dance that mixes multiple styles and has rich layers.
Fig. 6 is a block diagram illustrating an electronic device 600 according to an example embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, electronic device 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.
The processing component 602 generally controls overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is used to store various types of data to support operations at the electronic device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, multimedia, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply component 606 provides power to the various components of electronic device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.
The multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 600 is in an operation mode, such as a photographing mode or a multimedia mode. Each front camera and rear camera may be a fixed optical lens system or have focus and optical zoom capabilities.
The audio component 610 is used to output and/or input audio signals. For example, the audio component 610 may include a Microphone (MIC) for receiving external audio signals when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the electronic device 600. For example, the sensor component 614 may detect an open/closed state of the electronic device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600; the sensor component 614 may also detect a change in the position of the electronic device 600 or a component of the electronic device 600, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in the temperature of the electronic device 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is operative to facilitate communication between the electronic device 600 and other devices in a wired or wireless manner. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for implementing the dance action generation method provided by the embodiments of the present disclosure.
In an exemplary embodiment, a non-transitory storage medium including instructions, such as the memory 604 including instructions, executable by the processor 620 of the electronic device 600 to perform the above-described method is also provided. For example, the non-transitory storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 7 is a block diagram illustrating an electronic device 700 in accordance with an example embodiment. For example, the electronic device 700 may be provided as a server. Referring to fig. 7, electronic device 700 includes a processing component 722 that further includes one or more processors, and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. Further, the processing component 722 is configured to execute instructions to perform methods provided by embodiments of the present disclosure.
The electronic device 700 may also include a power component 726 configured to perform power management of the electronic device 700, a wired or wireless network interface 750 configured to connect the electronic device 700 to a network, and an input/output (I/O) interface 758. The electronic device 700 may operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Embodiments of the present disclosure also provide a computer program product comprising a computer program that, when executed by a processor, implements the dance action generation method described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

1. A dance action generating method, comprising:
acquiring a music file, and extracting mixed audio features of the music file, wherein the mixed audio features are used for representing the style of the music file; the music file is divided into a plurality of segments according to playing time;
for each current segment, predicting a dancer activation vector of each dancer in the current segment according to the mixed audio features and the dancer activation vector of the dancer in the previous segment; the dancer activation vector is used for representing whether each dancer performs dance actions in the segment or not;
performing action prediction according to the mixed audio features, the dance action set of the dancer in the previous segment, and the dancer activation vector of the dancer in the current segment to obtain the dance action set of each dancer in the current segment; the dance action set comprises all the action postures to be performed by the dancer in the segment;
and for each segment, splicing the dance action postures contained in the dance action set of the dancer to obtain a dance posture sequence of the dancer, and summarizing the dance posture sequences of the dancers in all the segments to obtain a group dance action posture set.
2. The method of claim 1, wherein acquiring a music file and extracting mixed audio features of the music file comprises:
extracting audio parameters from the music file; wherein the audio parameters include: a chroma spectrum, a basic rhythm unit (beat), and an onset note;
and splicing the extracted audio parameters to obtain the mixed audio features.
3. The method of claim 1, wherein predicting, for each current segment, the dancer activation vector of each dancer in the current segment from the mixed audio features and the dancer activation vector of the dancer in the previous segment comprises:
inputting the obtained mixed audio features and the dancer activation vector of the dancer in the previous segment into a decoder for gate decoding to obtain a gate decoding result; wherein the decoder is a decoder based on a gated recurrent unit (GRU);
and predicting according to the gate decoding result to obtain the dancer activation vector of the current segment dancer.
4. The method of claim 1, wherein performing action prediction according to the mixed audio features, the dance action set of the dancer in the previous segment, and the dancer activation vector of the dancer in the current segment to obtain the dance action set of each dancer in the current segment comprises:
predicting a candidate dance action vector of each dancer in the current segment according to the mixed audio features and the dance action set of the dancer in the previous segment;
and performing action prediction according to the candidate dance action vector of the dancer in the current segment and the dancer activation vector of the dancer to obtain the dance action set of the dancer in the current segment.
5. The method of claim 4, wherein predicting the candidate dance action vector of each of the dancers in the current segment according to the mixed audio features and the dance action set of the dancers in the previous segment comprises:
performing local music coding on the mixed audio features to obtain a deep audio feature vector;
combining the deep audio feature vector with the dance action information of the dancer in the previous segment to perform gate decoding to obtain a gate decoding result;
and carrying out multilayer perceptron mapping on the gate decoding result to obtain the candidate dance action vector of the dancer in the current segment.
6. The method of claim 4, wherein performing action prediction according to the candidate dance action vector of the dancer in the current segment and the dancer activation vector of the dancer to obtain the dance action set of the dancer in the current segment comprises:
establishing, for the current segment, a corresponding relation between the candidate dance action vector of the dancer and the dancer activation vector of the dancer;
and performing an action selection operation according to the corresponding relation to obtain the dance action set of the dancer in the current segment.
7. The method of claim 5, wherein performing local music coding on the mixed audio features to obtain the deep audio feature vector comprises:
and carrying out local music coding on the mixed audio features based on a coding mode of a convolutional neural network structure so as to extract deep information of the mixed audio features to obtain the deep audio feature vector.
8. The method according to claim 1, wherein the splicing, for each of the segments, the dance action postures contained in the dance action set of the dancer to obtain a dance posture sequence of the dancer, and summarizing the dance posture sequences of the dancers in all the segments to obtain a group dance action posture set comprises:
for each segment, splicing the dance action postures contained in the dance action set of each dancer through a dance action splicing model to obtain a dance posture sequence of the dancer;
and summarizing the dance posture sequences of the dancers in all the segments to obtain a group dance action posture set corresponding to the music file, wherein the group dance action posture set is used for representing the series of dance actions performed by each dancer in each segment played by the music file.
9. The method according to claim 1, wherein, after the splicing, for each of the segments, the dance action postures contained in the dance action set of the dancer to obtain a dance posture sequence of the dancer, and summarizing the dance posture sequences of the dancers in all the segments to obtain a group dance action posture set, the method further comprises:
and displaying the group dance action posture set correspondingly, in accordance with the playback time precision of the music file, while the music file is being played.
10. A dance action generation apparatus, comprising:
the audio feature extraction module is used for acquiring music files and extracting mixed audio features of the music files, wherein the mixed audio features are used for representing the styles of the music files; the music file is divided into a plurality of segments according to playing time;
the first prediction module is used for predicting the dancer activation vector of each dancer in the current segment according to the mixed audio features and the dancer activation vector of the dancer in the previous segment; the dancer activation vector is used for representing whether each dancer performs dance actions in the segment or not;
the second prediction module is used for performing action prediction according to the mixed audio features, the dance action set of the dancer in the previous segment, and the dancer activation vector of the dancer in the current segment to obtain the dance action set of each dancer in the current segment; the dance action set comprises all the action postures to be performed by the dancer in the segment;
and the sequence summarizing module is used for splicing, for each segment, the dance action postures contained in the dance action set of the dancer to obtain a dance posture sequence of the dancer, and summarizing the dance posture sequences of the dancers in all the segments to obtain a group dance action posture set.
11. The apparatus of claim 10, wherein the audio feature extraction module comprises:
the parameter extraction submodule is used for extracting audio parameters from the music file; wherein the audio parameters include: a chroma spectrum, a basic rhythm unit (beat), and an onset note;
and the parameter splicing submodule is used for splicing the extracted audio parameters to obtain the mixed audio features.
12. The apparatus of claim 10, wherein the first prediction module comprises:
the gate decoding submodule is used for inputting the obtained mixed audio features and the dancer activation vector of the dancer in the previous segment into a decoder for gate decoding to obtain a gate decoding result; wherein the decoder is a decoder based on a gated recurrent unit (GRU);
and the first prediction sub-module is used for predicting according to the gate decoding result to obtain the dancer activation vector of the current segment dancer.
13. The apparatus of claim 10, wherein the second prediction module comprises:
the action vector prediction sub-module is used for predicting the candidate dance action vector of each dancer in the current segment according to the mixed audio features and the dance action set of the dancer in the previous segment;
and the second prediction sub-module is used for performing action prediction according to the candidate dance action vector of the dancer in the current segment and the dancer activation vector of the dancer to obtain a dance action set of the dancer in the current segment.
14. The apparatus of claim 13, wherein the action vector prediction sub-module comprises:
the music coding unit is used for carrying out local music coding on the mixed audio features to obtain a deep audio feature vector;
the gate decoding unit is used for combining the deep audio feature vector and the dance action information of the dancer in the previous segment to perform gate decoding to obtain a gate decoding result;
and the perceptron mapping unit is used for carrying out multilayer perceptron mapping on the gate decoding result to obtain the candidate dance action vector of the dancer in the current segment.
15. The apparatus of claim 13, wherein the second prediction sub-module comprises:
the corresponding relation establishing unit is used for establishing, for the current segment, a corresponding relation between the candidate dance action vector of the dancer and the dancer activation vector of the dancer;
and the action selection unit is used for carrying out an action selection operation according to the corresponding relation to obtain a dance action set of the dancer in the current segment.
16. The apparatus of claim 14, wherein the music encoding unit comprises:
and the audio feature vector extraction subunit is used for carrying out local music coding on the mixed audio features based on a coding mode of a convolutional neural network structure so as to extract deep information of the mixed audio features to obtain the deep audio feature vector.
17. The apparatus of claim 10, wherein the sequence summarizing module comprises:
the sequence splicing sub-module is used for splicing, for each segment, the dance action postures contained in the dance action set of each dancer through a dance action splicing model to obtain a dance posture sequence of the dancer;
and the sequence summarizing submodule is used for summarizing the dance posture sequences of the dancers in all the segments to obtain a group dance action posture set corresponding to the music file, and the group dance action posture set is used for representing the series of dance actions made by each dancer in each segment played by the music file.
18. The apparatus of claim 10, further comprising:
and the action posture display module is used for displaying the group dance action posture set in accordance with the playback time precision of the music file while the music file is being played.
19. An electronic device, comprising: a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 9.
20. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-9.
CN202211010740.4A 2022-08-23 2022-08-23 Dance action generation method and device, electronic equipment and storage medium Pending CN115379299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211010740.4A CN115379299A (en) 2022-08-23 2022-08-23 Dance action generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115379299A true CN115379299A (en) 2022-11-22

Family

ID=84067297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211010740.4A Pending CN115379299A (en) 2022-08-23 2022-08-23 Dance action generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115379299A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040264939A1 (en) * 2003-06-30 2004-12-30 Microsoft Corporation Content-based dynamic photo-to-video methods and apparatuses
CN101615302A (en) * 2009-07-30 2009-12-30 浙江大学 The dance movement generation method that music data drives based on machine learning
CN109176541A (en) * 2018-09-06 2019-01-11 南京阿凡达机器人科技有限公司 A kind of method, equipment and storage medium realizing robot and dancing
CN110853670A (en) * 2019-11-04 2020-02-28 南京理工大学 Music-driven dance generating method
CN110955786A (en) * 2019-11-29 2020-04-03 网易(杭州)网络有限公司 Dance action data generation method and device
CN111968202A (en) * 2020-08-21 2020-11-20 北京中科深智科技有限公司 Real-time dance action generation method and system based on music rhythm
CN114237271A (en) * 2021-12-15 2022-03-25 灵起科技(深圳)有限公司 Method for falling, rising and synchronous action of group dance robot

Similar Documents

Publication Publication Date Title
CN110662083B (en) Data processing method and device, electronic equipment and storage medium
TWI778477B (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
US11308671B2 (en) Method and apparatus for controlling mouth shape changes of three-dimensional virtual portrait
CN110019961A (en) Method for processing video frequency and device, for the device of video processing
CN112396679B (en) Virtual object display method and device, electronic equipment and medium
CN111954063B (en) Content display control method and device for video live broadcast room
CN113642394B (en) Method, device and medium for processing actions of virtual object
US11521653B2 (en) Video sequence layout method, electronic device and storage medium
CN113923462A (en) Video generation method, live broadcast processing method, video generation device, live broadcast processing device and readable medium
CN114998491B (en) Digital human driving method, device, equipment and storage medium
CN113704390A (en) Interaction method and device of virtual objects, computer readable medium and electronic equipment
CN113689879A (en) Method, device, electronic equipment and medium for driving virtual human in real time
CN114429611B (en) Video synthesis method and device, electronic equipment and storage medium
CN110349577B (en) Man-machine interaction method and device, storage medium and electronic equipment
CN113282791B (en) Video generation method and device
CN110929616A (en) Human hand recognition method and device, electronic equipment and storage medium
CN113689530B (en) Method and device for driving digital person and electronic equipment
CN115379299A (en) Dance action generation method and device, electronic equipment and storage medium
CN113689880A (en) Method, device, electronic equipment and medium for driving virtual human in real time
CN116564272A (en) Method for providing voice content and electronic equipment
CN114401439A (en) Dance video generation method, equipment and storage medium
CN113920229A (en) Virtual character processing method and device and storage medium
CN114356068A (en) Data processing method and device and electronic equipment
CN114130027A (en) Game audio playing method and device, electronic equipment, storage medium and product
CN114093341A (en) Data processing method, apparatus and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination