CN111104964B - Method, equipment and computer storage medium for matching music with action - Google Patents


Info

Publication number
CN111104964B
Authority
CN
China
Prior art keywords
action
music
distance
piece
pieces
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911158848.6A
Other languages
Chinese (zh)
Other versions
CN111104964A (en)
Inventor
林超 (Lin Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yonghang Technology Co Ltd
Original Assignee
Beijing Yonghang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yonghang Technology Co Ltd
Priority to CN201911158848.6A
Publication of CN111104964A
Application granted
Publication of CN111104964B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection

Abstract

Disclosed are a method, a device, and a computer storage medium for matching music with actions, belonging to the field of music dance. The method comprises: acquiring a plurality of music dance segments and their corresponding rhythm features; determining distances between music pieces and action pieces; determining the music pieces and action pieces with the largest and with the smallest distance; training a music piece and action piece matching distance model with the feature sequences of the action pieces, the feature sequences of the music pieces, and the distances between the music pieces and the action pieces as training samples, the output of the model being the matching distance between a music piece and an action piece; acquiring an action transition distance; taking the sum of the matching distance and the action transition distance as the distance between a music piece and an action piece; obtaining music to be matched; and determining a plurality of target action segments with the smallest total distance to the music to be matched and matching them with the music to be matched. The method and device solve the problem in the related art that action segments match the music to be matched poorly.

Description

Method, equipment and computer storage medium for matching music with action
Technical Field
The present disclosure relates to the field of music dance, and in particular, to a method, apparatus, and computer storage medium for matching music and actions.
Background
Dance actions that match a given piece of music are widely used in music dance games and other fields.
In one matching method in the related art, music to be matched and an action segment library are first obtained, wherein the action segment library may include a plurality of action segments; a number of action segments are then randomly selected from the library and matched with the music to be matched.
However, because the action segments are selected from the action segment library at random, the action segments match the music to be matched poorly.
Disclosure of Invention
The embodiments of the present disclosure provide a method for matching music with actions, which can solve the problem in the related art that action segments match the music to be matched poorly. The technical solution is as follows:
according to a first aspect of the present disclosure, there is provided a method for matching music with actions, the method comprising:
acquiring a plurality of manually choreographed music dance segments, wherein each music dance segment comprises a music piece and a corresponding action piece;
acquiring a plurality of rhythm features corresponding to the plurality of music dance segments;
determining the Euclidean distance between any two rhythm features of the plurality of rhythm features as the distance between the music piece corresponding to the first of the two rhythm features and the action piece corresponding to the second;
determining, among the music pieces and action pieces corresponding to the rhythm features, the n music pieces and action pieces with the largest distance and the m music pieces and action pieces with the smallest distance, wherein m and n are integers greater than 0;
acquiring the feature sequences of the n music pieces and action pieces and the feature sequences of the m music pieces and action pieces;
training a music piece and action piece matching distance model with the feature sequences of the n music pieces and action pieces, the feature sequences of the m music pieces and action pieces, the distances of the n music pieces and action pieces, and the distances of the m music pieces and action pieces as training samples, wherein the output of the music piece and action piece matching distance model is the matching distance between a music piece and an action piece;
acquiring an action transition distance formula, wherein the action transition distance formula is used for outputting an action transition distance;
taking the sum of the matching distance between the music piece and the action piece and the action transition distance as the distance between the music piece and the action piece;
obtaining music to be matched, wherein the music to be matched comprises a plurality of music pieces to be matched;
determining, in an action segment library, a plurality of target action segments with the smallest total distance to the plurality of music pieces to be matched, wherein the action segment library comprises a plurality of action segments;
and matching the plurality of target action segments with the music to be matched.
Optionally, the acquiring a plurality of rhythm features corresponding to the plurality of music dance segments includes:
determining the rhythm features according to a rhythm feature formula, the rhythm feature formula comprising:
$z(M) = h_z(f_{motion}(M)) = [z_1, z_2, \ldots, z_{zdim}]^T$
wherein $M$ is any action segment, $z(M)$ is the rhythm feature corresponding to $M$, $h_z$ is a feature mapping, $zdim$ is the dimension of $z(M)$, $f_{motion}(M) = [f_{anim}(M, t_1), f_{anim}(M, t_2), \ldots, f_{anim}(M, t_N)]$ is the feature sequence of $M$ in matrix form, $t$ is any moment of $M$, $N$ is the number of samples of $M$, $f_{anim}(M, t) = [p(M,t), q_1(M,t), q_2(M,t), \ldots, q_r(M,t)]^T$ is the feature of $M$ at moment $t$ in matrix form, $p(M,t)$ is the three-dimensional position of the root node, $q(M,t)$ is the rotation information of a joint of the character, and $r$ is the serial number of the joint.
Optionally, the determining the Euclidean distance between any two rhythm features of the plurality of rhythm features as the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature includes:
determining the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature of the two rhythm features according to a first distance formula, wherein the first distance formula comprises:
$D_{match}(A_i, M_j) = D_{motion}(M_i, M_j) = \|z(M_i) - z(M_j)\|$
wherein $A_i$ is the music piece corresponding to the first rhythm feature, $M_j$ is the action piece corresponding to the second rhythm feature, $D_{match}$ is the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature of the two rhythm features, $M_i$ is the action piece corresponding to the first rhythm feature, $D_{motion}$ is the distance between the action piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature of the two rhythm features, $z(M_i)$ is the first rhythm feature, and $z(M_j)$ is the second rhythm feature.
Optionally, the acquiring the feature sequences of the n music pieces and action pieces and the feature sequences of the m music pieces and action pieces includes:
determining the feature sequence of an action piece in matrix form according to $f_{motion}(M) = [f_{anim}(M, t_1), f_{anim}(M, t_2), \ldots, f_{anim}(M, t_N)]$;
determining the feature sequence of a music piece in matrix form according to $f_{audio}(A) = [f_{mfcc}(A, t_1), f_{mfcc}(A, t_2), \ldots, f_{mfcc}(A, t_N)]$;
wherein $A$ is any music piece, $t$ is any moment of $M$ and $A$, $N$ is the number of samples of $M$ and $A$, and $f_{mfcc}(A, t)$ is the Mel-frequency cepstral coefficient of $A$ at the moment $t$.
Optionally, the training to obtain the music piece and action piece matching distance model with the feature sequences of the n music pieces and action pieces, the feature sequences of the m music pieces and action pieces, the distances of the n music pieces and action pieces, and the distances of the m music pieces and action pieces as training samples includes:
acquiring training data, wherein the training data comprises the feature sequences of the n music pieces and action pieces, the feature sequences of the m music pieces and action pieces, the distances of the n music pieces and action pieces, and the distances of the m music pieces and action pieces;
training an initial neural network model according to the training data to obtain a feature mapping for music pieces and a feature mapping for action pieces;
obtaining the music piece and action piece matching distance model according to the feature mapping for music pieces and the feature mapping for action pieces, wherein the music piece and action piece matching distance model comprises:
$D_{match}(A_i, M_j) = h_{match}(f_{audio}(A_i), f_{motion}(M_j))$
$h_{match}(f_{audio}(A_i), f_{motion}(M_j)) = \|h_{audio}(f_{audio}(A_i)) - h_{motion}(f_{motion}(M_j))\|$
wherein $h_{audio}$ is the feature mapping for music pieces, $h_{motion}$ is the feature mapping for action pieces, and $h_{match}$ is the matching distance mapping between music pieces and action pieces.
Optionally, the action transition distance formula comprises:
$cost(M_i, M_j) = \max\{speed(f_{trans}(M_i, M_j))\}$
$f_{trans}(M_i, M_j) = blend(f_{from}(M_i), f_{to}(M_j))$
$f_{from}(M) = [f_{anim}(M, t_{N-s}), f_{anim}(M, t_{N-s+1}), \ldots, f_{anim}(M, t_{N+s-1}), f_{anim}(M, t_{N+s})]$
$f_{to}(M) = [f_{anim}(M, t_{-s}), f_{anim}(M, t_{-s+1}), \ldots, f_{anim}(M, t_{s-1}), f_{anim}(M, t_{s})]$
wherein $D_{trans}$ is the action transition distance, $cost(M_i, M_j)$ is the cost of transitioning from $M_i$ to $M_j$, taken as the maximum of all joint speeds, $\theta$ is a first threshold, $speed$ is the joint speed, $f_{trans}$ is the transition action, $blend$ is an action blending algorithm, $f_{from}$ is the half beat of action intercepted from $M_i$ with a window centered on the moment of $M_i$'s last frame, $f_{to}$ is the half beat of action intercepted from $M_j$ with a window centered on the moment of $M_j$'s first frame, and $s$ is the radius of the window.
Optionally, the taking the sum of the matching distance between the music piece and the action piece and the action transition distance as the distance between the music piece and the action piece includes:
determining the distance between the music piece and the action piece according to a music piece and action piece distance formula [formula not reproduced in the text];
wherein $D$ is the distance between the music piece and the action piece, and $\{M_{x1}, M_{x2}, \ldots, M_{xn}\}$ is a plurality of the action segments in the action segment library.
Optionally, the determining a plurality of target action segments with the smallest total distance to the plurality of music pieces to be matched in the action segment library includes:
determining, through a dynamic programming algorithm, the plurality of target action segments in the action segment library with the smallest total distance to the plurality of music pieces to be matched.
In another aspect, a music and action matching device is provided, the music and action matching device including a processor and a memory, the memory storing at least one instruction, at least one program, a code set or a set of instructions, the at least one instruction, the at least one program, the code set or the set of instructions being loaded and executed by the processor to implement the music and action matching method according to the first aspect.
In yet another aspect, a computer storage medium is provided, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the computer storage medium, where the at least one instruction, the at least one program, the set of codes, or the set of instructions are loaded and executed by a processor to implement the method for matching music and actions according to the first aspect.
The technical solutions provided by the embodiments of the present disclosure have at least the following beneficial effects:
A music piece and action piece matching distance model is trained with the feature sequences of music pieces, the feature sequences of action pieces, and the distances between music pieces and action pieces as training samples; the model outputs the matching distance between a music piece and an action piece. An action transition distance is obtained, and the sum of the matching distance and the action transition distance is taken as the distance between a music piece and an action piece. Music to be matched, comprising a plurality of music pieces to be matched, is obtained; a plurality of target action segments with the smallest total distance to the music pieces to be matched are determined and matched with the music to be matched. The resulting action segments match the music to be matched closely, which solves the problem in the related art that action segments match the music to be matched poorly.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of an implementation environment of a music and action matching method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for matching music to actions provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another method of matching music to actions provided by an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a matching device for music and actions according to an embodiment of the present disclosure.
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
For the purposes of clarity, technical solutions and advantages of the present disclosure, the following further details the embodiments of the present disclosure with reference to the accompanying drawings.
In a current method for matching music with actions, music to be matched and an action segment library are first obtained, wherein the action segment library may include a plurality of action segments; a number of action segments are then randomly selected from the library and matched with the music to be matched.
However, because this method selects action segments from the action segment library at random, the action segments match the music to be matched poorly.
The embodiment of the disclosure provides a music and action matching method, equipment and a computer storage medium.
Fig. 1 is a schematic diagram of an implementation environment of a music and action matching method according to an embodiment of the present disclosure, where the implementation environment may include a server 11 and a terminal 12.
The server 11 may be a server or a cluster of servers.
The terminal 12 may be a mobile phone, a tablet computer, a notebook computer, an intelligent wearable device, or other terminals. The terminal 12 may be connected to the server by wire or wirelessly (fig. 1 shows the case of a connection made wirelessly).
Fig. 2 is a flowchart of a method for matching music and actions according to an embodiment of the present disclosure. The matching method of the music and the actions can be applied to the server of the implementation environment. The matching method of the music and the action can comprise the following steps:
step 201, a plurality of manually-encoded music dance segments are obtained, wherein each music dance segment comprises a music segment and a corresponding action segment.
Step 202, obtaining a plurality of rhythm features corresponding to a plurality of music dance segments.
In step 203, the Euclidean distance between any two rhythm features of the plurality of rhythm features is determined as the distance between the music piece corresponding to the first of the two rhythm features and the action piece corresponding to the second.
Step 204, determining n pieces of music and motion segments with the largest distance and m pieces of music and motion segments with the smallest distance from the pieces of music and motion segments corresponding to the rhythmic features, where m and n are integers greater than 0.
Step 205, obtain the feature sequences of n pieces of music and action pieces, and the feature sequences of m pieces of music and action pieces.
In step 206, a music piece and action piece matching distance model is trained using the feature sequences of the n music pieces and action pieces, the feature sequences of the m music pieces and action pieces, the distances of the n music pieces and action pieces, and the distances of the m music pieces and action pieces as training samples; the output of the model is the matching distance between a music piece and an action piece.
Step 207, obtaining an action transition distance formula, wherein the action transition distance formula is used for outputting the action transition distance.
Step 208, taking the sum of the matching distance between the music piece and the action piece and the action transition distance as the distance between the music piece and the action piece.
In step 209, music to be matched is obtained, where the music to be matched includes a plurality of pieces of music to be matched.
Step 210, determining a plurality of target action segments with minimum total distance from the plurality of music segments to be matched in the action segment library, wherein the action segment library comprises a plurality of action segments.
Step 211, matching the plurality of target action segments with the music to be matched.
In summary, the embodiments of the disclosure provide a method for matching music with actions. A music piece and action piece matching distance model is trained with the feature sequences of music pieces, the feature sequences of action pieces, and the distances between music pieces and action pieces as training samples; the model outputs the matching distance between a music piece and an action piece. An action transition distance is obtained, and the sum of the matching distance and the action transition distance is taken as the distance between a music piece and an action piece. Music to be matched, comprising a plurality of music pieces to be matched, is obtained; a plurality of target action segments with the smallest total distance to the music pieces to be matched are determined and matched with the music to be matched, so that the action segments match the music to be matched closely. The method solves the problem in the related art that action segments match the music to be matched poorly.
Fig. 3 is a flowchart of another music and action matching method according to an embodiment of the present disclosure, where the music and action matching method may be applied to the server of the above-described implementation environment. The music and action matching method provided by the embodiment of the disclosure can be applied to music dance games, and the music dance games can be realized by means of skeleton animation. As can be seen with reference to fig. 3, the method for matching music with actions may include:
step 301, a plurality of manually-encoded music dance segments are obtained, wherein each music dance segment comprises a music segment and a corresponding action segment.
A dancer performs to given music to obtain manually choreographed music dance. The music dance can be split into a plurality of segments according to a certain step size and segment duration, yielding a plurality of music dance segments. Each music piece corresponds to one action piece.
For example, music may be danced by a person wearing a motion capture device; the motion captured by the device is acquired and processed to obtain the music dance. The music dance is then split, with a chosen step size and segment duration, into a plurality of music dance segments.
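As a toy illustration of this splitting step, the following sketch cuts a beat-aligned frame sequence into overlapping segments. The 8-samples-per-beat and 4-beats-per-bar rates follow the sampling convention described later in this disclosure, while the one-bar step and two-bar segment length are assumed values only:

```python
# Hypothetical splitter for a beat-aligned motion (or music) sequence.
# The one-bar step and two-bar segment length are illustrative assumptions,
# not values fixed by the disclosure.
SAMPLES_PER_BEAT = 8
BEATS_PER_BAR = 4
BAR = SAMPLES_PER_BEAT * BEATS_PER_BAR  # 32 samples per bar

def split_dance(frames, step_bars=1, length_bars=2):
    step, length = step_bars * BAR, length_bars * BAR
    # slide a window of `length` samples forward by `step` samples
    return [frames[i:i + length]
            for i in range(0, len(frames) - length + 1, step)]
```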
Step 302, determining the rhythm feature according to the rhythm feature formula.
The rhythm characteristic formula comprises:
$z(M) = h_z(f_{motion}(M)) = [z_1, z_2, \ldots, z_{zdim}]^T$
$f_{motion}(M) = [f_{anim}(M, t_1), f_{anim}(M, t_2), \ldots, f_{anim}(M, t_N)]$
$f_{anim}(M, t) = [p(M,t), q_1(M,t), q_2(M,t), \ldots, q_r(M,t)]^T$
wherein $M$ is any action segment, $z(M)$ is the rhythm feature corresponding to $M$, $h_z$ is a feature mapping, and $zdim$ is the dimension of $z(M)$; illustratively, $zdim$ may be 128. $f_{motion}$ is the feature sequence of $M$ in matrix form, $t$ is any moment of $M$, and $N$ is the number of samples of $M$; the sampling rate may be 8 samples per beat, and with 4 beats per bar the action of one bar has 32 samples. $f_{anim}$ is the feature of $M$ at moment $t$ in matrix form, $p(M,t)$ is the three-dimensional position of the root node, $q(M,t)$ is the rotation information of a joint of the character, and $r$ is the serial number of the joint.
$f_{motion}$ contains the pose information of the action segment over a period of time, that is, the motion information formed by the information of a plurality of joints at successive moments, and $z(M)$ can reflect the rhythm characteristics of the action segment $M$, so the action segment can be associated with a music piece. The feature mapping $h_z$ can be learned by an unsupervised training method, and the unsupervised training may use a self-encoder (autoencoder): the source features are encoded into feature spaces of different dimensions, a decoder restores the compressed features to the source features, and the difference between the restored features and the source features is minimized; the final intermediate encoding of the features is then a good feature mapping $h_z$.
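Illustratively, a minimal self-encoder sketch of this scheme is given below in Python; the flattened input, the 512-unit hidden layer, and the assumed per-frame feature dimension are illustrative choices not fixed by the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAutoencoder(nn.Module):
    """Unsupervised learning of h_z: encode f_motion, decode it back, and
    minimize the difference between the restored and source features."""
    def __init__(self, in_dim, zdim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, zdim))
        self.decoder = nn.Sequential(nn.Linear(zdim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):                # x: flattened f_motion(M)
        z = self.encoder(x)              # intermediate code ~ h_z(f_motion(M))
        return self.decoder(z), z

model = FeatureAutoencoder(in_dim=32 * 46)  # 32 samples x assumed 46-dim f_anim
x = torch.randn(8, 32 * 46)                 # a dummy batch
recon, z = model(x)
loss = F.mse_loss(recon, x)                 # reconstruction objective
```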
Illustratively, the joints of the character are shown in table 1.
TABLE 1
No. | Joint            | Name
1   | Bip01            | Root node
2   | Bip01 Neck       | Neck
3   | Bip01 Spine      | Spine
4   | Bip01 L Thigh    | Left thigh
5   | Bip01 R Thigh    | Right thigh
6   | Bip01 L Calf     | Left calf
7   | Bip01 R Calf     | Right calf
8   | Bip01 L UpperArm | Left upper arm
9   | Bip01 R UpperArm | Right upper arm
10  | Bip01 L Forearm  | Left forearm
11  | Bip01 R Forearm  | Right forearm
12  | Bip01 L Hand     | Left hand
13  | Bip01 R Hand     | Right hand
14  | Bip01 L Foot     | Left foot
15  | Bip01 R Foot     | Right foot
Illustratively, $f_{motion}$ may be a two-dimensional matrix. First, $f_{motion}$ is fed into a 2-layer 3×3 convolutional neural network to obtain a set of local features $f_{local}$; then $f_{local}$ is fed into a further 2-layer 3×3 convolutional neural network and a fully connected layer to obtain the global feature $f_{global}$. Finally, $f_{global}$ is fed into a fully connected layer to obtain the rhythm feature $z$. The concatenation of $f_{global}(M_i)$ and $f_{local}(M_i)$ can then be taken as a positive sample while, by negative sampling, the concatenation of $f_{global}(M_i)$ and a randomly selected $f_{local}(M_j)$ is taken as a negative sample; a discriminator is trained to distinguish the positive and negative samples, optimizing the rhythm feature $z$. The discriminator may comprise a 3-layer fully connected neural network. The rhythm feature $z$ can thus distinguish how different action segments vary in rhythm over time.
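The following sketch mirrors that pipeline in PyTorch; the channel counts, the assumed 46-dimensional per-frame feature, and the treatment of $f_{motion}$ as a one-channel two-dimensional input are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RhythmNet(nn.Module):
    """Maps f_motion (treated as a 1-channel 2-D matrix) to f_local,
    f_global, and the rhythm feature z."""
    def __init__(self, n_frames=32, feat_dim=46, zdim=128):
        super().__init__()
        self.local = nn.Sequential(               # 2 layers of 3x3 convolutions
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.glob = nn.Sequential(                # 2 more conv layers + one FC
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * n_frames * feat_dim, 256), nn.ReLU())
        self.z_head = nn.Linear(256, zdim)        # final FC layer -> z

    def forward(self, f_motion):                  # (B, n_frames, feat_dim)
        f_local = self.local(f_motion.unsqueeze(1))
        f_global = self.glob(f_local)
        return self.z_head(f_global), f_local, f_global

class Discriminator(nn.Module):
    """3-layer fully connected network that scores (f_global, f_local) pairs;
    positives take both features from the same segment M_i, negatives pair
    f_global(M_i) with a randomly chosen f_local(M_j)."""
    def __init__(self, n_frames=32, feat_dim=46):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(256 + 16 * n_frames * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, f_global, f_local):
        return self.net(torch.cat([f_global, f_local.flatten(1)], dim=1))
```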
Step 303, determining the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature in any two rhythm features according to the first distance formula.
The first distance formula includes:
$D_{match}(A_i, M_j) = D_{motion}(M_i, M_j) = \|z(M_i) - z(M_j)\|$
wherein $A_i$ is the music piece corresponding to the first rhythm feature, $M_j$ is the action piece corresponding to the second rhythm feature, $D_{match}$ is the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature of the two rhythm features, $M_i$ is the action piece corresponding to the first rhythm feature, $D_{motion}$ is the distance between the action piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature, $z(M_i)$ is the first rhythm feature, and $z(M_j)$ is the second rhythm feature.
According to the Euclidean distance between the first rhythm feature and the second rhythm feature, the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature can be determined. The Euclidean distance, i.e., the Euclidean metric, is the straight-line distance between two points, for example in two- or three-dimensional space.
For example, the Euclidean distance may be used to calculate the difference between the first rhythm feature and the second rhythm feature, i.e., the distance $D_{motion}$ between the action piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature, which in turn gives the distance $D_{match}$ between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature.
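For example, with rhythm features held as vectors, this distance label is a single norm computation (a sketch; the vectors themselves would come from $h_z$):

```python
import numpy as np

def rhythm_distance(z_mi, z_mj):
    # D_match(A_i, M_j) = D_motion(M_i, M_j) = ||z(M_i) - z(M_j)||
    return float(np.linalg.norm(z_mi - z_mj))
```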
Step 304, determining n pieces of music and action pieces with the largest distance and m pieces of music and action pieces with the smallest distance from the pieces of music and action pieces corresponding to the rhythm features. m and n are integers greater than 0.
For Y segments of manually choreographed music dance, $Y^2$ distances between music pieces and action pieces can be obtained. $Y^2$ distances may be too much data, so the n music pieces and action pieces with the largest distance and the m music pieces and action pieces with the smallest distance can be determined according to the first distance formula, reducing the data volume while keeping the data representative.
Step 305, obtaining the feature sequences of n pieces of music and action pieces, and the feature sequences of m pieces of music and action pieces.
According to f motion (M)=[f anim (M,t 1 ),f anim (M,t 2 ),…,f anim (M,t N )]Determining a characteristic sequence of the action segment in a matrix form;
according to f audio (A)=[f mfcc (A,t 1 ),f mfcc (A,t 2 ),…,f mfcc (A,t N )]Determining a characteristic sequence of the music piece in a matrix form;
wherein A is any music piece, t is any time of M and A, N is the sampling number of M and A, f mfcc (A, t) is the Mel frequency cepstrum coefficient of A at any time t.
Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC) are the coefficients that make up a Mel-frequency cepstrum, whose frequency bands more closely approximate the human auditory system.
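A sketch of extracting the feature sequence $f_{audio}(A)$ with the librosa library is given below; choosing the MFCC hop length so that frames land 8 per beat is an assumption made here to mirror the motion sampling rate, not a step specified by the disclosure:

```python
import librosa

def f_audio(path, samples_per_beat=8):
    y, sr = librosa.load(path)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)   # beats per minute
    # choose the hop so that MFCC frames land 8 per beat (assumption)
    hop = max(1, int(round(sr * 60.0 / (float(tempo) * samples_per_beat))))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop)  # (n_mfcc, N)
    return mfcc.T                                    # row k is f_mfcc(A, t_k)
```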
Step 306, training data is acquired.
The training data comprises characteristic sequences of n pieces of music and action pieces, characteristic sequences of m pieces of music and action pieces, distances of n pieces of music and action pieces and distances of m pieces of music and action pieces.
The distance between the music piece and the action piece can comprise n distances farthest from the music piece and m distances nearest to the music piece, so that training data can be more representative.
Step 307, training the initial neural network model according to the training data to obtain feature maps about the music pieces and feature maps about the action pieces.
The initial neural network model may include, among other things, a 3-layer long short-term memory (Long Short-Term Memory, LSTM) network.
Step 308, obtaining a matching distance model of the music piece and the action piece according to the feature mapping of the music piece and the feature mapping of the action piece.
The music piece and action piece matching distance model comprises:
$D_{match}(A_i, M_j) = h_{match}(f_{audio}(A_i), f_{motion}(M_j))$
$h_{match}(f_{audio}(A_i), f_{motion}(M_j)) = \|h_{audio}(f_{audio}(A_i)) - h_{motion}(f_{motion}(M_j))\|$
wherein $h_{audio}$ is the feature mapping for music pieces, $h_{motion}$ is the feature mapping for action pieces, and $h_{match}$ is the matching distance mapping between music pieces and action pieces.
In the embodiment of the disclosure, the Euclidean distance can be used for calculating the distance between the feature map about the music piece and the feature map about the action piece, so as to obtain the matching distance between the music piece and the action piece.
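A hedged sketch of such a model is given below: two 3-layer LSTM towers realize $h_{audio}$ and $h_{motion}$, and the matching distance is the Euclidean distance between their outputs. The hidden sizes, input dimensions, and the use of the final hidden state are assumptions:

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """One 3-layer LSTM tower; the final hidden state is the embedding."""
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=3, batch_first=True)

    def forward(self, seq):                       # (B, N, in_dim)
        _, (h, _) = self.lstm(seq)
        return h[-1]                              # (B, hidden)

class MatchingDistance(nn.Module):
    def __init__(self, audio_dim=20, motion_dim=46, hidden=128):
        super().__init__()
        self.h_audio = Tower(audio_dim, hidden)   # feature mapping for music
        self.h_motion = Tower(motion_dim, hidden) # feature mapping for actions

    def forward(self, f_audio, f_motion):
        # D_match = ||h_audio(f_audio(A_i)) - h_motion(f_motion(M_j))||
        return torch.norm(self.h_audio(f_audio) - self.h_motion(f_motion),
                          dim=-1)

# Training (sketch): regress the model output toward the rhythm-feature
# distance labels of the n farthest and m nearest pairs, e.g. with MSE loss.
```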
Step 309, determining the action transition distance according to the action transition distance formula.
The action transition distance formula comprises:
$cost(M_i, M_j) = \max\{speed(f_{trans}(M_i, M_j))\}$
$f_{trans}(M_i, M_j) = blend(f_{from}(M_i), f_{to}(M_j))$
$f_{from}(M) = [f_{anim}(M, t_{N-s}), f_{anim}(M, t_{N-s+1}), \ldots, f_{anim}(M, t_{N+s-1}), f_{anim}(M, t_{N+s})]$
$f_{to}(M) = [f_{anim}(M, t_{-s}), f_{anim}(M, t_{-s+1}), \ldots, f_{anim}(M, t_{s-1}), f_{anim}(M, t_{s})]$
wherein $D_{trans}$ is the action transition distance, $cost(M_i, M_j)$ is the cost of transitioning from $M_i$ to $M_j$, taken as the maximum of all joint speeds, $\theta$ is a first threshold, $speed$ is the joint speed, $f_{trans}$ is the transition action, $blend$ is an action blending algorithm, $f_{from}$ is the half beat of action intercepted from $M_i$ with a window centered on the moment of $M_i$'s last frame, $f_{to}$ is the half beat of action intercepted from $M_j$ with a window centered on the moment of $M_j$'s first frame, and $s$ is the radius of the window.
The action blending algorithm blend may synthesize two action segments into one action segment.
In the embodiment of the disclosure, the sampling rate may be 8 samples per beat, and the radius s of the window may be 4 (half a beat).
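A sketch of this transition cost under stated assumptions: frames are arrays of joint positions, the windows are taken inside each segment (the disclosure's windows extend $s$ frames past the segment boundary), the blend is a simple linear cross-fade standing in for the unspecified blending algorithm, and speed is a frame-to-frame finite difference:

```python
import numpy as np

S = 4  # window radius: half a beat at 8 samples per beat

def blend(f_from, f_to):
    # linear cross-fade between the two windows (a placeholder for the
    # disclosure's unspecified blend algorithm)
    w = np.linspace(0.0, 1.0, len(f_from))[:, None, None]
    return (1.0 - w) * f_from + w * f_to

def transition_cost(m_i, m_j):
    """m_i, m_j: (N, n_joints, 3) joint-position sequences of two segments.
    Returns cost(M_i, M_j) = max joint speed over the blended transition."""
    f_from = m_i[-2 * S:]                   # window around M_i's last frame
    f_to = m_j[:2 * S]                      # window around M_j's first frame
    f_trans = blend(f_from, f_to)
    speed = np.linalg.norm(np.diff(f_trans, axis=0), axis=-1)  # per joint
    return float(speed.max())               # cost = maximum joint speed
```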
Step 310, determining the distance between the music piece and the action piece according to the distance formula between the music piece and the action piece.
The music piece and action piece distance formula comprises: [formula not reproduced in the text]
wherein $D$ is the distance between the music piece and the action piece, and $\{M_{x1}, M_{x2}, \ldots, M_{xn}\}$ is a plurality of the action segments in the action segment library.
Adding the matching distance $D_{match}$ between the music piece and the action piece and the action transition distance $D_{trans}$ gives the distance $D$ between the music piece and the action piece.
Step 311, obtain the music to be matched.
The music to be matched is music that has not been manually choreographed, and it comprises a plurality of music pieces to be matched. The music to be matched may be selected by a user through a terminal, selected by an operator of the server, or selected by the server directly from a music library comprising a plurality of pieces of music to be matched.
In step 312, a plurality of target action segments with the smallest total distance from the plurality of music segments to be matched in the action segment library are determined by a dynamic programming algorithm.
The action segment library comprises a plurality of action segments. A dynamic programming algorithm is a method for solving optimization problems; it can be used to determine the plurality of target action segments in the action segment library with the smallest total distance to the plurality of music pieces to be matched.
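A minimal dynamic-programming sketch of this selection, assuming precomputed inputs: a hypothetical matrix d_match[t, k] holding the matching distance of library segment k to the t-th music piece to be matched, and d_trans[k, l] holding the transition distance from segment k to segment l:

```python
import numpy as np

def select_actions(d_match, d_trans):
    """d_match: (T, K) matching distance of each of the K library segments to
    each of the T music pieces to be matched; d_trans: (K, K) transition
    distance between library segments. Returns the indices of the target
    action segments whose total distance is minimal."""
    T, K = d_match.shape
    best = d_match[0].copy()            # best total cost ending in segment k
    back = np.zeros((T, K), dtype=int)  # backpointers
    for t in range(1, T):
        total = best[:, None] + d_trans + d_match[t][None, :]
        back[t] = total.argmin(axis=0)  # best predecessor for each segment
        best = total.min(axis=0)
    path = [int(best.argmin())]         # backtrack the minimal-cost path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Each step considers every (previous, next) segment pair, so the search runs in O(T·K²) time rather than the exponential cost of enumerating all K^T sequences.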
Step 313, matching the plurality of target action segments with the music to be matched.
The plurality of target action segments in the action segment library with the smallest total distance to the music pieces to be matched are matched with the music to be matched. After the matching is completed, the target action segments are connected to obtain a dance matched to the music to be matched.
With this music and action matching method, a server can choreograph dance for a large amount of music to be matched, shortening the time needed to match music with actions.
In summary, the present disclosure provides a method for matching music with actions. A music piece and action piece matching distance model is trained with the feature sequences of music pieces, the feature sequences of action pieces, and the distances between music pieces and action pieces as training samples; the model outputs the matching distance between a music piece and an action piece. An action transition distance is obtained, and the sum of the matching distance and the action transition distance is taken as the distance between a music piece and an action piece. Music to be matched, comprising a plurality of music pieces to be matched, is obtained; the plurality of target action segments with the smallest total distance to the music pieces to be matched are determined and matched with the music to be matched, so that the action segments match the music to be matched closely. The method solves the problem in the related art that action segments match the music to be matched poorly.
In one exemplary embodiment, music is danced by a person wearing a motion capture device, and the music dance is obtained by acquiring and processing the captured motion. The music dance is split, with a chosen step size and segment duration, into a plurality of music dance segments, which the server acquires. Rhythm features are determined according to the rhythm feature formula; the Euclidean distance between any two rhythm features is determined according to the first distance formula as the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second; the n music pieces and action pieces with the largest distance and the m music pieces and action pieces with the smallest distance are determined (m and n being integers greater than 0); and the feature sequences of the n music pieces and action pieces and the feature sequences of the m music pieces and action pieces are obtained.
The method comprises the steps of obtaining feature sequences of n pieces of music and action pieces, feature sequences of m pieces of music and action pieces, distances of n pieces of music and action pieces and distances of m pieces of music and action pieces as training data. Training the initial neural network model according to the training data to obtain feature mapping about the music piece and feature mapping about the action piece, and obtaining a matching distance model of the music piece and the action piece according to the feature mapping about the music piece and the feature mapping about the action piece, wherein the output of the model is the matching distance of the music piece and the action piece. And determining the action transition distance according to the action transition distance formula. And determining the distance between the music piece and the action piece according to the output of the matching distance model of the music piece and the action piece.
And obtaining music to be matched, wherein the music to be matched comprises a plurality of music pieces to be matched. And determining a plurality of target action fragments with the smallest total distance with the plurality of music fragments to be matched in the action fragment library through a dynamic programming algorithm. And matching the plurality of target action fragments with the music to be matched.
Referring to fig. 4, a schematic structural diagram of a music and action matching device 400 according to an embodiment of the disclosure is shown, where the music and action matching device 400 may be a server. By way of example, as shown in fig. 4, the apparatus 400 includes a Central Processing Unit (CPU) 401, a system memory 404 including a Random Access Memory (RAM) 402 and a Read Only Memory (ROM) 403, and a system bus 405 connecting the system memory 404 and the central processing unit 401. Apparatus 400 also includes a basic input/output system (I/O system) 406 to facilitate the transfer of information between various devices within the computer, and a mass storage device 407 for storing an operating system 413, application programs 414, and other program modules 415.
The basic input/output system 406 includes a display 408 for displaying information and an input device 409, such as a mouse, keyboard, etc., for user input of information. Wherein both the display 408 and the input device 409 are coupled to the central processing unit 401 via an input output controller 410 coupled to the system bus 405. The basic input/output system 406 may also include an input/output controller 410 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 410 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 407 is connected to the central processing unit 401 through a mass storage controller (not shown) connected to the system bus 405. The mass storage device 407 and its associated computer-readable medium provide non-volatile storage for the apparatus 400. That is, mass storage device 407 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Computer readable storage media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the ones described above. The system memory 404 and mass storage device 407 described above may be collectively referred to as memory.
The apparatus 400 may also operate via a network, such as the internet, connected to a remote computer on the network, in accordance with various embodiments of the present disclosure. I.e., the apparatus 400 may be connected to the network 412 through a network interface unit 411 coupled to the system bus 405, or alternatively, the network interface unit 411 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, one or more programs stored in the memory and configured to be executed by the CPU to implement the methods provided by the embodiments of the present disclosure.
The embodiment of the application also provides a computer storage medium, in which at least one instruction, at least one section of program, a code set or an instruction set is stored, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by a processor to implement the matching method of music and actions as provided in the above method embodiment.
The foregoing is merely an alternative embodiment of the present disclosure, and is not intended to limit the present disclosure, any modification, equivalent replacement, improvement, etc. that comes within the spirit and principles of the present disclosure are included in the scope of the present disclosure.

Claims (6)

1. A method of matching music to actions, the method comprising:
acquiring a plurality of manually choreographed music dance segments, wherein each music dance segment comprises a music piece and a corresponding action piece;
determining a plurality of rhythm features corresponding to the plurality of music dance segments according to a rhythm feature formula, wherein the rhythm feature formula comprises:
$z(M) = h_z(f_{motion}(M)) = [z_1, z_2, \ldots, z_{zdim}]^T$
$f_{motion}(M) = [f_{anim}(M, t_1), f_{anim}(M, t_2), \ldots, f_{anim}(M, t_N)]$
$f_{anim}(M, t) = [p(M,t), q_1(M,t), q_2(M,t), \ldots, q_r(M,t)]^T$
wherein $M$ is any action segment, $z(M)$ is the rhythm feature corresponding to $M$, $h_z$ is a feature mapping, $zdim$ is the dimension of $z(M)$, $f_{motion}$ is the feature sequence of $M$ in matrix form, $t$ is any moment of $M$, $N$ is the number of samples of $M$, $f_{anim}$ is the feature of $M$ at moment $t$ in matrix form, $p(M,t)$ is the three-dimensional position of the root node, $q(M,t)$ is the rotation information of a joint of the character, and $r$ is the serial number of the joint;
determining the Euclidean distance between any two rhythm features of the plurality of rhythm features as the distance between the music piece corresponding to the first of the two rhythm features and the action piece corresponding to the second;
determining n music pieces and action pieces with the largest distance and m music pieces and action pieces with the smallest distance in the music pieces and action pieces corresponding to the rhythm features, wherein m and n are integers larger than 0;
acquiring the characteristic sequences of the n pieces of music and the action pieces and the characteristic sequences of the m pieces of music and the action pieces;
acquiring training data, wherein the training data comprises characteristic sequences of the n music pieces and action pieces, characteristic sequences of the m music pieces and action pieces, distances between the n music pieces and the action pieces and distances between the m music pieces and the action pieces;
training the initial neural network model according to the training data to obtain feature mapping about music fragments and feature mapping about action fragments;
obtaining the music piece and action piece matching distance model according to the feature mapping for music pieces and the feature mapping for action pieces, wherein the music piece and action piece matching distance model comprises:
$D_{match}(A_i, M_j) = h_{match}(f_{audio}(A_i), f_{motion}(M_j))$
$h_{match}(f_{audio}(A_i), f_{motion}(M_j)) = \|h_{audio}(f_{audio}(A_i)) - h_{motion}(f_{motion}(M_j))\|$
wherein $A_i$ is the music piece corresponding to the first rhythm feature, $M_j$ is the action piece corresponding to the second rhythm feature, $D_{match}$ is the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature of the two rhythm features, $h_{audio}$ is the feature mapping for music pieces, $h_{motion}$ is the feature mapping for action pieces, and $h_{match}$ is the matching distance mapping between music pieces and action pieces, the output of the music piece and action piece matching distance model being the matching distance between a music piece and an action piece;
acquiring an action transition distance formula, wherein the action transition distance formula is used for outputting an action transition distance;
taking the sum of the matching distance between the music piece and the action piece and the action transition distance as the distance between the music piece and the action piece;
obtaining music to be matched, wherein the music to be matched comprises a plurality of pieces of music to be matched;
determining a plurality of target action fragments with minimum total distances to the plurality of music fragments to be matched in an action fragment library, wherein the action fragment library comprises a plurality of action fragments;
matching the target action fragments with the music to be matched;
the acquiring the feature sequences of the n music pieces and action pieces and the feature sequences of the m music pieces and action pieces comprises:
determining the feature sequence of an action piece in matrix form according to $f_{motion}(M) = [f_{anim}(M, t_1), f_{anim}(M, t_2), \ldots, f_{anim}(M, t_N)]$;
determining the feature sequence of a music piece in matrix form according to $f_{audio}(A) = [f_{mfcc}(A, t_1), f_{mfcc}(A, t_2), \ldots, f_{mfcc}(A, t_N)]$;
wherein $A$ is any music piece, $t$ is any moment of $M$ and $A$, $N$ is the number of samples of $M$ and $A$, and $f_{mfcc}(A, t)$ is the Mel-frequency cepstral coefficient of $A$ at the moment $t$;
the action transition distance formula comprises:
$cost(M_i, M_j) = \max\{speed(f_{trans}(M_i, M_j))\}$
$f_{trans}(M_i, M_j) = blend(f_{from}(M_i), f_{to}(M_j))$
$f_{from}(M) = [f_{anim}(M, t_{N-s}), f_{anim}(M, t_{N-s+1}), \ldots, f_{anim}(M, t_{N+s-1}), f_{anim}(M, t_{N+s})]$
$f_{to}(M) = [f_{anim}(M, t_{-s}), f_{anim}(M, t_{-s+1}), \ldots, f_{anim}(M, t_{s-1}), f_{anim}(M, t_{s})]$
wherein $M_i$ is the action piece corresponding to the first rhythm feature, $D_{trans}$ is the action transition distance, $cost(M_i, M_j)$ is the cost of transitioning from $M_i$ to $M_j$, taken as the maximum of all joint speeds, $\theta$ is a first threshold, $speed$ is the joint speed, $f_{trans}$ is the transition action, $blend$ is an action blending algorithm, $f_{from}$ is the half beat of action intercepted from $M_i$ with a window centered on the moment of $M_i$'s last frame, $f_{to}$ is the half beat of action intercepted from $M_j$ with a window centered on the moment of $M_j$'s first frame, and $s$ is the radius of the window.
2. The method according to claim 1, wherein the determining the Euclidean distance between any two rhythm features of the plurality of rhythm features as the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature comprises:
determining the distance between the music piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature of the two rhythm features according to a first distance formula, wherein the first distance formula comprises:
$D_{match}(A_i, M_j) = D_{motion}(M_i, M_j) = \|z(M_i) - z(M_j)\|$
wherein $D_{motion}$ is the distance between the action piece corresponding to the first rhythm feature and the action piece corresponding to the second rhythm feature of the two rhythm features, $z(M_i)$ is the first rhythm feature, and $z(M_j)$ is the second rhythm feature.
3. The method of claim 1, wherein the taking the sum of the matching distance between the music piece and the action piece and the action transition distance as the distance between the music piece and the action piece comprises:
determining the distance between the music piece and the action piece according to a music piece and action piece distance formula [formula not reproduced in the text];
wherein $D$ is the distance between the music piece and the action piece, and $\{M_{x1}, M_{x2}, \ldots, M_{xn}\}$ is a plurality of the action segments in the action segment library.
4. The method of claim 1, wherein determining a plurality of target action segments in the action segment library that have a smallest total distance from the plurality of music segments to be matched comprises:
and determining the target action fragments with the smallest total distance with the music fragments to be matched in the action fragment library through a dynamic programming algorithm.
5. A music and action matching device, characterized in that it comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the music and action matching method according to any one of claims 1 to 4.
6. A computer storage medium having a computer program stored therein, the computer program being loaded and executed by a processor to implement the method of matching music and actions of any of claims 1 to 4.
CN201911158848.6A 2019-11-22 2019-11-22 Method, equipment and computer storage medium for matching music with action Active CN111104964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911158848.6A CN111104964B (en) 2019-11-22 2019-11-22 Method, equipment and computer storage medium for matching music with action

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911158848.6A CN111104964B (en) 2019-11-22 2019-11-22 Method, equipment and computer storage medium for matching music with action

Publications (2)

Publication Number Publication Date
CN111104964A CN111104964A (en) 2020-05-05
CN111104964B true CN111104964B (en) 2023-10-17

Family

ID=70421341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911158848.6A Active CN111104964B (en) 2019-11-22 2019-11-22 Method, equipment and computer storage medium for matching music with action

Country Status (1)

Country Link
CN (1) CN111104964B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116830158A (en) * 2020-09-30 2023-09-29 斯纳普公司 Music reaction animation of human character
CN112989071B (en) * 2020-12-14 2022-11-04 北京航空航天大学 Music selection method based on human body dance emotion

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005231012A (en) * 2004-02-23 2005-09-02 Sony Corp Robot device and its control method
AU2006292461A1 (en) * 2005-09-16 2007-03-29 Flixor, Inc. Personalizing a video
JP2007292847A (en) * 2006-04-21 2007-11-08 Victor Co Of Japan Ltd Musical piece editing/reproducing device
TW200746040A (en) * 2005-12-19 2007-12-16 David John Lumsden Digital music composition device, composition software and method of use
CN101615302A (en) * 2009-07-30 2009-12-30 浙江大学 Machine-learning-based dance movement generation method driven by music data
WO2013086534A1 (en) * 2011-12-08 2013-06-13 Neurodar, Llc Apparatus, system, and method for therapy based speech enhancement and brain reconfiguration
CN106292424A (en) * 2016-08-09 2017-01-04 北京光年无限科技有限公司 Music data processing method and device for anthropomorphic robot
CN106598062A (en) * 2016-07-29 2017-04-26 深圳曼塔智能科技有限公司 Flight motion control method and device for an unmanned aerial vehicle
CN106875930A (en) * 2017-02-09 2017-06-20 深圳市韵阳科技有限公司 Lighting control method and system based on real-time detection of song accompaniment and microphone vocals
CN107193551A (en) * 2017-04-19 2017-09-22 北京永航科技有限公司 Method and apparatus for generating picture frames
CN108202334A (en) * 2018-03-22 2018-06-26 东华大学 Dancing robot capable of identifying music beat and style
CN108369799A (en) * 2015-09-29 2018-08-03 安泊音乐有限公司 Machines, systems and processes for automated music composition and generation employing linguistic and/or graphical-icon-based musical experience descriptors
CN108527376A (en) * 2018-02-27 2018-09-14 深圳狗尾草智能科技有限公司 Control method, apparatus, device and medium for robot dance movements
CN108733508A (en) * 2017-04-17 2018-11-02 伊姆西Ip控股有限责任公司 Method and system for controlling data backup
CN108744542A (en) * 2018-06-08 2018-11-06 武汉蛋玩科技有限公司 Robot dance movement design method and robot
CN108877838A (en) * 2018-07-17 2018-11-23 黑盒子科技(北京)有限公司 Music special effect matching method and device
CN109189979A (en) * 2018-08-13 2019-01-11 腾讯科技(深圳)有限公司 Music recommendation method, device, computing equipment and storage medium
CN109176541A (en) * 2018-09-06 2019-01-11 南京阿凡达机器人科技有限公司 Method, equipment and storage medium for implementing robot dancing
CN109522959A (en) * 2018-11-19 2019-03-26 哈尔滨理工大学 Music score recognition, classification and playback control method
CN109833608A (en) * 2018-12-29 2019-06-04 南京华捷艾米软件科技有限公司 Dance movement teaching assistance method and system based on a 3D motion-sensing camera
CN109948796A (en) * 2019-03-13 2019-06-28 腾讯科技(深圳)有限公司 Autoencoder learning method, device, computer equipment and storage medium
CN110324728A (en) * 2019-06-28 2019-10-11 浙江传媒学院 Method for generating full-match review short videos of sports events based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010165169A (en) * 2009-01-15 2010-07-29 Kddi Corp Rhythm-matching parallel processing apparatus in a music synchronization system for motion capture data, and computer program therefor
KR101729195B1 (en) * 2014-10-16 2017-04-21 한국전자통신연구원 System and Method for Searching Choreography Database based on Motion Inquiry
US9672800B2 (en) * 2015-09-30 2017-06-06 Apple Inc. Automatic composer
JP2017093803A (en) * 2015-11-24 2017-06-01 富士通株式会社 Evaluation program, evaluation method and evaluation device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fang Danfang; Li Xueming; Liu Yang; Li Rongfeng. Music-driven dance motion synthesis based on transition-frame interpolation. Journal of Fudan University (Natural Science). 2018, (03), full text. *
Fan Rukun; Fu Jing; Cheng Silei; Zhang Xiang; Geng Weidong. Rhythm feature matching model between motion and music. Journal of Computer-Aided Design & Computer Graphics. 2010, (06), full text. *

Also Published As

Publication number Publication date
CN111104964A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN108615055B (en) Similarity calculation method and device and computer readable storage medium
US20210029305A1 (en) Method and apparatus for adding a video special effect, terminal device and storage medium
CN109462776B (en) Video special effect adding method and device, terminal equipment and storage medium
CN112333179B (en) Live broadcast method, device and equipment of virtual video and readable storage medium
CA2843343C (en) Systems and methods of detecting body movements using globally generated multi-dimensional gesture data
Laraba et al. Dance performance evaluation using hidden Markov models
US20230260326A1 (en) Dance segment recognition method, dance segment recognition apparatus, and storage medium
CN111104964B (en) Method, equipment and computer storage medium for matching music with action
CN109308437B (en) Motion recognition error correction method, electronic device, and storage medium
WO2023273628A1 (en) Video loop recognition method and apparatus, computer device, and storage medium
CN110298220B (en) Action video live broadcast method, system, electronic equipment and storage medium
JP2021140780A (en) Computer-executed method and device for map creation, electronic apparatus, storage medium, and computer program
CN110516749A (en) Model training method, method for processing video frequency, device, medium and calculating equipment
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN111784776A (en) Visual positioning method and device, computer readable medium and electronic equipment
CN115691544A (en) Training of virtual image mouth shape driving model and driving method, device and equipment thereof
CN113569753A (en) Action comparison method and device in video, storage medium and electronic equipment
CN114708660A Tennis action scoring method, system and equipment based on average-sequence pattern discovery
CN108182227B (en) Accompanying audio recommendation method and device and computer-readable storage medium
CN111353347B (en) Action recognition error correction method, electronic device, and storage medium
WO2023061229A1 (en) Video generation method and device
CN114554111A (en) Video generation method and device, storage medium and electronic equipment
CN111782858B (en) Music matching method and device
CN110781820B (en) Game character action generating method, game character action generating device, computer device and storage medium
Ofli et al. Multi-modal analysis of dance performances for music-driven choreography synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant