CN109285560A - Music feature extraction method and apparatus, and electronic device - Google Patents

Music feature extraction method and apparatus, and electronic device

Info

Publication number
CN109285560A
Authority
CN
China
Prior art keywords
note
matrix
neural network
bar
recurrent neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811139448.6A
Other languages
Chinese (zh)
Other versions
CN109285560B (en)
Inventor
刘思阳
蒋紫东
冯巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201811139448.6A
Publication of CN109285560A
Application granted
Publication of CN109285560B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/036 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

Embodiments of the present invention provide a music feature extraction method and apparatus. The method comprises: obtaining music data, the music data being a time series composed of δ note matrices, where each row of each note matrix represents a note, each column of the note matrix represents a playing state of the note, and δ is a positive integer; and inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector. In this way, embodiments of the present invention obtain musical features of multiple dimensions of the music data through a pre-trained recurrent neural network, which effectively solves the problem in the prior art that manual extraction of musical features is inefficient.

Description

Music feature extraction method and apparatus, and electronic device
Technical field
The present invention relates to the field of music feature extraction, and in particular to a music feature extraction method and apparatus, and an electronic device.
Background art
With the continuous development of science and technology, more and more users enjoy music on terminals. Through a terminal, a user can enjoy music of various categories, for example, pop music, classical music, and so on.
To meet users' needs, more and more music is made available for users to enjoy; and to allow a user to conveniently select music by category, the music needs to be classified. The traditional music classification method is usually as follows: musical features are extracted manually, and the music is classified based on the manually extracted features. Obviously, this manner of manually extracting musical features is inefficient.
Summary of the invention
Embodiments of the present invention aim to provide a music feature extraction method, an apparatus and an electronic device, so as to improve the efficiency of music feature extraction. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a music feature extraction method, the method comprising:
obtaining music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, each column of the note matrix representing a playing state of the note, and δ being a positive integer;
inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks, and ε, ζ and η are all positive integers.
Optionally, the step of inputting the music data into the pre-trained recurrent neural network to obtain the features of the music data comprises:
when the music data is input into the pre-trained recurrent neural network, determining position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
converting the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
inputting the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into a one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
concatenating the position vector P with the note matrix C_t, and inputting the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain the BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
inputting the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain the BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
inputting the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain the track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
Optionally, the method further comprises:
inputting the track feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and outputting the category of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a second aspect, an embodiment of the present invention provides a music feature extraction apparatus, comprising:
an obtaining module, configured to obtain music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, and each column of the note matrix representing a playing state of the note;
a feature extraction module, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
Optionally, the feature extraction module comprises:
a position information obtaining submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into a one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
a first processing submodule, configured to concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
Optionally, the apparatus further comprises:
an input module, configured to input the track feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and output the category of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with each other via the communication bus;
the memory is configured to store a computer program;
the processor is configured to, when executing the program stored in the memory, implement the music feature extraction method described in the first aspect.
In another aspect of implementations of the present invention, a computer-readable storage medium is further provided, in which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the music feature extraction method described in the first aspect.
In yet another aspect of implementations of the present invention, a computer program product containing instructions is further provided; when it is run on a computer, the computer is caused to execute the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain features of multiple dimensions of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flow chart of a music feature extraction method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of music feature extraction in another embodiment provided by the present invention;
Fig. 3 is a schematic diagram of a music feature extraction apparatus provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In order to solve the problem in the prior art that manual extraction of musical features is inefficient, embodiments of the present invention provide a music feature extraction method, an apparatus and an electronic device.
In a first aspect, the music feature extraction method provided by an embodiment of the present invention is described in detail first.
As shown in Fig. 1, the music feature extraction method provided by an embodiment of the present invention may comprise the following steps:
Step S110: obtain music data, where the music data provided in the embodiment of the present invention is a time series composed of δ note matrices, each row of each note matrix represents a note, and each column of the note matrix represents a playing state of the note.
The music data in the embodiment of the present invention may be music data in MID format, and such music data may be a time series of notes. By converting the combination of notes at each moment in the note time series into a note matrix, a time series composed of δ note matrices can be obtained.
In the embodiment of the present invention, a note matrix can be denoted by M and represented as a matrix of a rows and 3 columns, where a denotes the number of notes. The first column of M indicates whether a note is playing, which can be represented by 0 and 1; for example, 1 indicates that the note is playing and 0 indicates that it is not played. The second column of M indicates whether the note is re-struck; for example, 1 indicates that the note is played again and 0 indicates that it is not played again. The third column of M indicates the playing intensity of the note, which can be obtained by mapping the intensity in the MID music file into the interval from 0 to β, where β denotes the maximum value of the playing intensity of a note. It can be understood that each note can correspond to a key; when the key is pressed, the corresponding note is played, and otherwise the note is not played.
Illustratively, the note matrix M can be expressed as M ∈ R^(a×3), where the intensity values x, y and z in its third column are positive numbers no greater than β.
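As a minimal sketch of this representation (the concrete rows and the value of β below are assumed for illustration and are not taken from the patent), such a note matrix could be built as follows:

```python
import numpy as np

# A hypothetical note matrix M in R^(a x 3) for a = 3 notes.
# Column 0: 1 if the note is playing, else 0.
# Column 1: 1 if the note is re-struck at this moment, else 0.
# Column 2: playing intensity, mapped into [0, beta].
beta = 127.0  # assumed maximum intensity (the MIDI velocity range)

M = np.array([
    [1.0, 0.0, 90.0],   # note 0: playing, not re-struck, intensity 90
    [1.0, 1.0, 64.0],   # note 1: playing and re-struck, intensity 64
    [0.0, 0.0, 0.0],    # note 2: silent
])
assert M.shape == (3, 3) and M[:, 2].max() <= beta
```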
In the embodiment provided by the present invention, ε note matrices can form one BEAT, ζ BEATs form one BAR, and η BARs form one music track, and the music data is composed of multiple music tracks. Of course, the BEATs, BARs and music tracks can all be matrices.
For example, taking a melody in 4/4 time as an example, four note matrices form one BEAT, four BEATs form one BAR, and 16 BARs form one melody track, where the melody track here can be a melody segment, and this melody track can be used as a training sample. A MID file is cut according to the above rule, so that one training sample is one matrix, with the matrix ∈ R^(a×3×δ), where δ = ε × ζ × η; this cutting is sketched below.
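A minimal sketch of the cutting rule under the 4/4 assumptions above; the function name and the (T, a, 3) tensor layout are illustrative assumptions, not fixed by the patent:

```python
import numpy as np

def cut_into_samples(note_matrices: np.ndarray,
                     eps: int = 4, zeta: int = 4, eta: int = 16) -> np.ndarray:
    """Cut a time series of note matrices, shaped (T, a, 3), into
    training samples of delta = eps * zeta * eta consecutive matrices."""
    delta = eps * zeta * eta  # 4 * 4 * 16 = 256 note matrices per sample
    n_samples = note_matrices.shape[0] // delta
    usable = note_matrices[:n_samples * delta]
    # One sample per row: (n_samples, delta, a, 3)
    return usable.reshape(n_samples, delta, *note_matrices.shape[1:])

# e.g. 1024 moments of an 88-note score yield 4 samples of 256 matrices each
samples = cut_into_samples(np.zeros((1024, 88, 3)))
assert samples.shape == (4, 256, 88, 3)
```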
Step S120: input the music data into a pre-trained recurrent neural network to obtain features of the music data, where the features of the music data include a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
As can be seen from the above description, in the embodiment of the present invention, the music data can be a matrix composed of multiple note matrices, i.e., ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track, and the music data can be composed of one or more music tracks.
Therefore, in the embodiment of the present invention, the music data can be input into a pre-trained recurrent neural network; the pre-trained recurrent neural network can be a convolutional neural network (CNN), or a bidirectional long short-term memory recurrent neural network (Bi-LSTM). The pre-trained recurrent neural network is used to extract the BEAT feature matrix, BAR feature matrix and track feature vector of the music data.
For clarity of description and completeness of the solution, a specific implementation of step S120 is described in detail in the embodiment below.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
In order to describe in detail how the music data is input into the pre-trained recurrent neural network to obtain the features of the music data, in combination with the above embodiment, in another embodiment provided by the present invention, as shown in Fig. 2, step S120 may comprise the following steps:
Step S1: when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, where the target BAR is the BAR in which the note matrix M_t is located.
Step S2: convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR.
Step S3: input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ).
Step S4: concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into the first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix.
Step S5: input the output BEAT feature matrix into the second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track.
Step S6: input the BAR feature matrix into the third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
For completeness of the solution and clarity of description, the technical solution provided by the embodiment of the present invention is described in detail below with reference to specific embodiments, taking the case where the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM) as an example.
As shown in Fig. 2, the input of the Bi-LSTM is the time series composed of the note matrices M. When the note matrix M_t is input, the first step is to determine the position of M_t within the BAR in which it is located and generate a position vector P, where P is a one-hot vector whose entry set to 1 marks the position of M_t within its BAR; therefore P ∈ R^γ, where γ is the number of note matrices contained in one BAR. Specifically, assuming that 5 BEAT feature matrices form one BAR feature matrix, and assuming that a BEAT is the second BEAT of the BAR feature matrix, the generated position vector P is [0 1 0 0 0].
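A minimal sketch of generating such a one-hot position vector (the function name is assumed for illustration):

```python
import numpy as np

def position_vector(position: int, gamma: int) -> np.ndarray:
    """One-hot vector in R^gamma marking the current position inside a BAR."""
    p = np.zeros(gamma)
    p[position] = 1.0
    return p

# The second position (index 1) out of gamma = 5 gives [0, 1, 0, 0, 0]
print(position_vector(1, 5))
```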
In the second step, the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t are input into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain the note matrix C_t ∈ R^(a×3×θ) that is contextually related to M_t, where the context-related note matrix C_t is a relation matrix fusing the information of the note matrices before and after M_t. Specifically, M_(t-N), M_(t-N+1), ..., M_(t+N-1), M_(t+N) are fed into the one-dimensional convolutional layer with θ convolution kernels, which outputs the note matrix contextually related to M_t.
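A sketch of this context convolution in PyTorch, under two assumptions the patent does not fix: the 2N+1 neighbouring note matrices are stacked along the convolved axis, and θ counts the output channels of the layer:

```python
import torch
import torch.nn as nn

N, theta = 2, 8          # context radius and number of kernels (assumed values)
a = 88                   # number of notes (assumed)

# Window of 2N+1 note matrices around M_t, each of shape (a, 3)
window = torch.randn(2 * N + 1, a, 3)

# Treat each (note, column) pair as one sequence of length 2N+1 and
# convolve over that context axis with theta one-dimensional kernels.
conv = nn.Conv1d(in_channels=1, out_channels=theta, kernel_size=2 * N + 1)

seq = window.permute(1, 2, 0).reshape(a * 3, 1, 2 * N + 1)  # (a*3, 1, 2N+1)
C_t = conv(seq).reshape(a, 3, theta)                        # C_t in R^(a x 3 x theta)
print(C_t.shape)  # torch.Size([88, 3, 8])
```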
Then, the position vector P is concatenated with C_t, and the concatenated matrix is input into the first-layer neural network of the Bi-LSTM. For clarity of description, the first-layer neural network of the Bi-LSTM is referred to as the first-layer Bi-LSTM network; the first-layer Bi-LSTM network is used to extract the BEAT features of the music data, i.e., the output of the first-layer Bi-LSTM network is a BEAT feature matrix composed of BEAT vector 1, BEAT vector 2, ..., BEAT vector m.
It should be noted that concatenating the position vector P with C_t can mean merging P into C_t. For example, assume that the position vector P is [0 0 0 1] and that the note matrix C_t has the rows [1 2 3 4] and [5 6 7 8]; after P is concatenated with C_t, the resulting matrix has the rows [1 2 3 4 0 0 0 1] and [5 6 7 8 0 0 0 1].
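The same splice expressed in code, as a sketch of the worked example above:

```python
import torch

P = torch.tensor([0., 0., 0., 1.])            # position vector
C_t = torch.tensor([[1., 2., 3., 4.],
                    [5., 6., 7., 8.]])        # two rows of the note matrix

# Append the position vector to every row of C_t
spliced = torch.cat([C_t, P.expand(C_t.shape[0], -1)], dim=1)
print(spliced)  # [[1, 2, 3, 4, 0, 0, 0, 1], [5, 6, 7, 8, 0, 0, 0, 1]]
```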
In the third step, the BEAT feature matrix is input into the second-layer neural network of the Bi-LSTM. For clarity of description, the second-layer neural network of the Bi-LSTM can be referred to as the second-layer Bi-LSTM network; the second-layer Bi-LSTM network is used to extract the BAR features of the music data, i.e., the output of the second-layer Bi-LSTM network is a BAR feature matrix composed of BAR vector 1, BAR vector 2, ..., BAR vector n.
In the fourth step, the BAR feature matrix is input into the third-layer neural network of the Bi-LSTM. For clarity of description, the third-layer neural network of the Bi-LSTM can be referred to as the third-layer Bi-LSTM network; the third-layer Bi-LSTM network is used to perform higher-dimensional extraction on the features of the entire melody, i.e., the output of the third-layer Bi-LSTM network is a track vector.
In the fifth step, the track vector is input into the fully connected layer and the softmax layer, which output the classification result of the music data. It can be understood that different track vectors correspond to different classification results, where a classification result can be, for example, the number of a music category.
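Putting the five steps together, a minimal PyTorch sketch of this hierarchy; the layer sizes, the grouping helper and the last-output pooling are all assumptions, since the patent only fixes the three Bi-LSTM levels plus the fully connected and softmax layers:

```python
import torch
import torch.nn as nn

class MusicFeatureNet(nn.Module):
    """Three stacked Bi-LSTM levels: notes -> BEAT, BEATs -> BAR, BARs -> track."""
    def __init__(self, in_dim, hidden, n_classes, eps=4, zeta=4, eta=16):
        super().__init__()
        self.eps, self.zeta, self.eta = eps, zeta, eta
        self.beat_lstm = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.bar_lstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.track_lstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def summarize(self, lstm, x, group):
        # Run a Bi-LSTM over groups of `group` steps and keep the last output
        b, t, d = x.shape
        out, _ = lstm(x.reshape(b * t // group, group, d))
        return out[:, -1, :].reshape(b, t // group, -1)

    def forward(self, x):                 # x: (batch, delta, in_dim) spliced inputs
        beats = self.summarize(self.beat_lstm, x, self.eps)     # BEAT feature matrix
        bars = self.summarize(self.bar_lstm, beats, self.zeta)  # BAR feature matrix
        track = self.summarize(self.track_lstm, bars, self.eta) # (batch, 1, 2*hidden)
        return torch.softmax(self.fc(track.squeeze(1)), dim=-1) # class probabilities

net = MusicFeatureNet(in_dim=88 * 3, hidden=64, n_classes=10)
probs = net(torch.randn(2, 256, 88 * 3))  # 2 samples of delta = 256 note steps
print(probs.shape)                        # torch.Size([2, 10])
```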
In this way, with the music feature extraction method provided by the embodiments of the present invention, the feature data in a music track can be extracted automatically and efficiently, features of different levels of the melody can be extracted, and the extracted feature data makes it convenient to classify music tracks automatically.
In a second aspect, an embodiment of the present invention further provides a music feature extraction apparatus. As shown in Fig. 3, the apparatus may comprise:
an obtaining module 310, configured to obtain music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, and each column of the note matrix representing a playing state of the note;
a feature extraction module 320, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
In the music feature extraction apparatus provided by the embodiment of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiment of the present invention obtains multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
Optionally, the feature extraction module comprises:
a position information obtaining submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t;
a first processing submodule, configured to concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into the first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into the second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into the third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
Optionally, the apparatus further comprises:
an input module, configured to input the track feature vector into the fully connected layer and softmax layer of the pre-trained recurrent neural network, and output the category of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a third aspect, an embodiment of the present invention further provides an electronic device. As shown in Fig. 4, the electronic device comprises a processor 401, a communication interface 402, a memory 403 and a communication bus 404, where the processor 401, the communication interface 402 and the memory 403 communicate with each other via the communication bus 404;
the memory 403 is configured to store a computer program;
the processor 401 is configured to, when executing the program stored in the memory 403, implement the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include a random access memory (RAM), and may also include a non-volatile memory (NVM), for example, at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, in which instructions are stored; when the instructions are run on a computer, the computer is caused to implement the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
In a fifth aspect, an embodiment of the present invention further provides a computer program product containing instructions; when it is run on a computer, the computer is caused to implement the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
Each embodiment in this specification is described in a related manner, and identical or similar parts of the embodiments may be referred to each other; each embodiment focuses on its differences from the other embodiments. In particular, for the apparatus, system, electronic device and storage medium embodiments, since they are substantially similar to the method embodiments, their description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (13)

1. A music feature extraction method, characterized by comprising:
obtaining music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, each column of the note matrix representing a playing state of the note, and δ being a positive integer;
inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
2. The method according to claim 1, characterized in that the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
3. The method according to claim 1 or 2, characterized in that ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks; and ε, ζ and η are all positive integers.
4. The method according to claim 3, characterized in that the step of inputting the music data into the pre-trained recurrent neural network to obtain the features of the music data comprises:
when the music data is input into the pre-trained recurrent neural network, determining position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
converting the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
inputting the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into a one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
concatenating the position vector P with the note matrix C_t, and inputting the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
inputting the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
inputting the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
5. The method according to claim 4, characterized in that the method further comprises:
inputting the track feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and outputting the category of the music data.
6. The method according to any one of claims 1 to 5, characterized in that the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
7. A music feature extraction apparatus, characterized by comprising:
an obtaining module, configured to obtain music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, and each column of the note matrix representing a playing state of the note;
a feature extraction module, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
8. The apparatus according to claim 7, characterized in that the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
9. The apparatus according to claim 7 or 8, characterized in that ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
10. The apparatus according to claim 9, characterized in that the feature extraction module comprises:
a position information obtaining submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
a first processing submodule, configured to concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into the first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into the second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into the third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
11. The apparatus according to claim 9, characterized by further comprising:
an input module, configured to input the track feature vector into the fully connected layer and softmax layer of the pre-trained recurrent neural network, and output the category of the music data.
12. The apparatus according to any one of claims 7 to 11, characterized in that the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
13. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with each other via the communication bus;
the memory is configured to store a computer program;
the processor is configured to, when executing the program stored in the memory, implement the method steps of any one of claims 1 to 6.
CN201811139448.6A 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment Active CN109285560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811139448.6A CN109285560B (en) 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811139448.6A CN109285560B (en) 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109285560A true CN109285560A (en) 2019-01-29
CN109285560B CN109285560B (en) 2021-09-03

Family

ID=65182408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811139448.6A Active CN109285560B (en) 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109285560B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136729A (en) * 2019-03-27 2019-08-16 北京奇艺世纪科技有限公司 Model generating method, audio-frequency processing method, device and computer readable storage medium
CN110264984A (en) * 2019-05-13 2019-09-20 北京奇艺世纪科技有限公司 Model training method, music generating method, device and electronic equipment
CN112885315A (en) * 2020-12-24 2021-06-01 携程旅游信息技术(上海)有限公司 Model generation method, music synthesis method, system, device and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186527A (en) * 2011-12-27 2013-07-03 北京百度网讯科技有限公司 System for building music classification model, system for recommending music and corresponding method
US20130233155A1 (en) * 2012-03-06 2013-09-12 Apple Inc. Systems and methods of note event adjustment
US20140180674A1 (en) * 2012-12-21 2014-06-26 Arbitron Inc. Audio matching with semantic audio recognition and report generation
CN107045867A (en) * 2017-03-22 2017-08-15 科大讯飞股份有限公司 Automatic composing method, device and terminal device
CN107123415A (en) * 2017-05-04 2017-09-01 吴振国 A kind of automatic music method and system
CN107146631A (en) * 2016-02-29 2017-09-08 北京搜狗科技发展有限公司 Music recognition methods, note identification model method for building up, device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186527A (en) * 2011-12-27 2013-07-03 北京百度网讯科技有限公司 System for building music classification model, system for recommending music and corresponding method
US20130233155A1 (en) * 2012-03-06 2013-09-12 Apple Inc. Systems and methods of note event adjustment
US20140180674A1 (en) * 2012-12-21 2014-06-26 Arbitron Inc. Audio matching with semantic audio recognition and report generation
CN107146631A (en) * 2016-02-29 2017-09-08 北京搜狗科技发展有限公司 Music recognition methods, note identification model method for building up, device and electronic equipment
CN107045867A (en) * 2017-03-22 2017-08-15 科大讯飞股份有限公司 Automatic composing method, device and terminal device
CN107123415A (en) * 2017-05-04 2017-09-01 吴振国 A kind of automatic music method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136729A (en) * 2019-03-27 2019-08-16 北京奇艺世纪科技有限公司 Model generating method, audio-frequency processing method, device and computer readable storage medium
CN110136729B (en) * 2019-03-27 2021-08-20 北京奇艺世纪科技有限公司 Model generation method, audio processing method, device and computer-readable storage medium
CN110264984A (en) * 2019-05-13 2019-09-20 北京奇艺世纪科技有限公司 Model training method, music generating method, device and electronic equipment
CN110264984B (en) * 2019-05-13 2021-07-06 北京奇艺世纪科技有限公司 Model training method, music generation method and device and electronic equipment
CN112885315A (en) * 2020-12-24 2021-06-01 携程旅游信息技术(上海)有限公司 Model generation method, music synthesis method, system, device and medium
CN112885315B (en) * 2020-12-24 2024-01-02 携程旅游信息技术(上海)有限公司 Model generation method, music synthesis method, system, equipment and medium

Also Published As

Publication number Publication date
CN109285560B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
Garcia et al. How to read paintings: semantic art understanding with multi-modal retrieval
Jiang et al. Exploiting feature and class relationships in video categorization with regularized deep neural networks
CN110909164A (en) Text enhancement semantic classification method and system based on convolutional neural network
Aguiar et al. Exploring data augmentation to improve music genre classification with convnets
CN109285560A (en) A kind of music features extraction method, apparatus and electronic equipment
CN107463605A (en) The recognition methods and device of low-quality News Resources, computer equipment and computer-readable recording medium
Yang et al. Music Genre Classification Using Duplicated Convolutional Layers in Neural Networks.
CN105022754A (en) Social network based object classification method and apparatus
Gowda et al. A new split for evaluating true zero-shot action recognition
Chen et al. Recognizing the style of visual arts via adaptive cross-layer correlation
CN105989067A (en) Method for generating text abstract from image, user equipment and training server
CN108920644A (en) Talk with judgment method, device, equipment and the computer-readable medium of continuity
Abdul-Rashid et al. Shrec’18 track: 2d image-based 3d scene retrieval
Chen et al. An Improved Deep Fusion CNN for Image Recognition.
Gao et al. Bottom-up and top-down: Bidirectional additive net for edge detection
CN114398485B (en) Expert portrait construction method and device based on multi-view fusion
Dhiraj et al. An effective analysis of deep learning based approaches for audio based feature extraction and its visualization
CN108133020A (en) Video classification methods, device, storage medium and electronic equipment
CN108154120A (en) video classification model training method, device, storage medium and electronic equipment
CN108304387A (en) The recognition methods of noise word, device, server group and storage medium in text
Wang et al. Deep feature fusion for high-resolution aerial scene classification
CN108229640A (en) The method, apparatus and robot of emotion expression service
CN110532570A (en) A kind of method and apparatus of method and apparatus and model training that naming Entity recognition
Heakl et al. A study on broadcast networks for music genre classification
CN110197213A (en) Image matching method, device and equipment neural network based

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant