CN109285560A - Music feature extraction method, apparatus, and electronic device - Google Patents
- Publication number
- CN109285560A (application CN201811139448.6A)
- Authority
- CN
- China
- Prior art keywords
- note
- matrix
- neural network
- bar
- recurrent neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/036—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification
Abstract
An embodiment of the invention provides a music feature extraction method and apparatus. The method comprises: obtaining music data, the music data being a time series composed of δ note matrices, where each row of each note matrix represents a note, each column of the note matrix represents a playing state of that note, and δ is a positive integer; and inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features including a beat (BEAT) feature matrix, a bar (BAR) feature matrix, and a song feature vector. In this way, the embodiment of the invention obtains musical features of multiple dimensions of the music data through the pre-trained recurrent neural network, which effectively addresses the low efficiency of manual musical feature extraction in the prior art.
Description
Technical field
The present invention relates to the field of music feature extraction, and in particular to a music feature extraction method, apparatus, and electronic device.
Background art
With the continuous development of science and technology, more and more users enjoy music through terminals. Through a terminal, a user can enjoy music of various categories, for example pop music, classical music, and so on.
To meet users' needs, more and more music is made available for users to enjoy; and to allow users to select music by category, the music must be classified. The traditional music classification method is usually as follows: musical features are extracted manually, and the music is classified based on the manually extracted features. Clearly, this manual extraction of musical features is inefficient.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a music feature extraction method, apparatus, and electronic device, so as to improve the efficiency of music feature extraction. The specific technical solutions are as follows:
In a first aspect, an embodiment of the invention provides a music feature extraction method, the method comprising:
obtaining music data, the music data being a time series composed of δ note matrices, where each row of each note matrix represents a note, each column of the note matrix represents a playing state of that note, and δ is a positive integer;
inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features including a beat (BEAT) feature matrix, a bar (BAR) feature matrix, and a song feature vector.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix and a denotes the number of rows of M; the first column of M indicates whether the note is being played, the second column of M indicates whether the note is re-triggered, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks, and ε, ζ, and η are positive integers.
Optionally, the step of inputting the music data into the pre-trained recurrent neural network to obtain the features of the music data comprises:
when the music data is input into the pre-trained recurrent neural network, determining position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
converting the position information of M_t within the target BAR into a position vector p, where p ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
inputting the note matrix M_t, the N note matrices before M_t, and the N note matrices after M_t into a one-dimensional convolutional layer of the pre-trained recurrent neural network whose convolution kernel width is θ, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, where N is a positive integer;
splicing the position vector p with the note matrix C_t, and inputting the spliced matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network performs feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
inputting the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network performs feature extraction on the BEAT feature matrices, and η BAR feature matrices form one music track;
inputting the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a song feature vector, where the third-layer network of the pre-trained recurrent neural network performs feature extraction on the BAR feature matrices.
Optionally, the method further comprises:
inputting the song feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and outputting the classification of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a second aspect, an embodiment of the invention provides a music feature extraction apparatus, comprising:
an obtaining module, configured to obtain music data, the music data being a time series composed of δ note matrices, where each row of each note matrix represents a note and each column of the note matrix represents a playing state of that note;
a feature extraction module, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features including a beat (BEAT) feature matrix, a bar (BAR) feature matrix, and a song feature vector.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix and a denotes the number of rows of M; the first column of M indicates whether the note is being played, the second column of M indicates whether the note is re-triggered, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
Optionally, the feature extraction module comprises:
a position information obtaining submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector p, where p ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t, and the N note matrices after M_t into a one-dimensional convolutional layer of the pre-trained recurrent neural network whose convolution kernel width is θ, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, where N is a positive integer;
a first processing submodule, configured to splice the position vector p with the note matrix C_t and input the spliced matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network performs feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network performs feature extraction on the BEAT feature matrices, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a song feature vector, where the third-layer network of the pre-trained recurrent neural network performs feature extraction on the BAR feature matrices.
Optionally, the apparatus further comprises:
an input module, configured to input the song feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and to output the classification of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a third aspect, an embodiment of the invention further provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program; and
the processor is configured to, when executing the program stored in the memory, implement the music feature extraction method described in the first aspect.
In another aspect of the invention, a computer-readable storage medium is further provided, the computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the music feature extraction method described in the first aspect.
In another aspect of the invention, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to execute the music feature extraction method described in the first aspect.
According to the music feature extraction method provided by the embodiment of the invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features including a BEAT feature matrix, a BAR feature matrix, and a song feature vector. In this way, the embodiment of the invention obtains features of multiple dimensions of the music data through the pre-trained recurrent neural network, which effectively addresses the inefficiency of manually selecting musical features in the prior art and can also improve the accuracy of the extracted musical features.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention or the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flow diagram of a music feature extraction method provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of music feature extraction in another embodiment of the invention;
Fig. 3 is a schematic diagram of a music feature extraction apparatus provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of the electronic device for music feature extraction provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention will be described below with reference to the drawings in the embodiments of the invention.
To solve the problem of the low efficiency of manual musical feature extraction in the prior art, embodiments of the invention provide a music feature extraction method, apparatus, and electronic device.
In a first aspect, the music feature extraction method provided by an embodiment of the invention is described in detail first.
As shown in Fig. 1, a music feature extraction method provided by an embodiment of the invention may comprise the following steps:
Step S110: obtain music data, where the music data provided in the embodiment of the invention is a time series composed of δ note matrices, each row of each note matrix represents a note, and each column of the note matrix represents a playing state of that note.
The music data in the embodiment of the invention may be music data in MID format, and such music data can be regarded as a time series of notes. By converting the combination of notes at each moment of the note time series into a note matrix, a time series composed of δ note matrices is obtained.
In the embodiment of the invention, the note matrix may be denoted M and represented by a matrix with a rows and 3 columns, where a denotes the number of notes. The first column of M indicates whether a note is being played and may be expressed with 0 and 1, for example 1 indicating that the note is playing and 0 that it is not; the second column of M indicates whether the note is re-triggered, for example 1 indicating that the note is played again and 0 that it is not; and the third column of M indicates the playing intensity of the note, which may be obtained by mapping the intensity in the MID music file onto the interval 0 to β, where β denotes the maximum playing intensity of a note. It can be understood that each note may correspond to a key: when the key is pressed, the corresponding note is played; otherwise the note is not played.
Illustratively, the note matrix M may be expressed as M ∈ R^(a×3), where the values x, y, and z in the example matrix are positive numbers not greater than β.
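The note-matrix layout described above can be sketched in a few lines of illustrative Python; the function name, the three-note example, and the choice β = 127 (the MIDI velocity maximum) are assumptions, not values fixed by the patent:

```python
import numpy as np

def note_matrix(active, retrigger, velocity, beta=127):
    """Build a note matrix M in R^(a x 3) for one time step.

    Row i describes note i; column 0 is whether the note is playing
    (0/1), column 1 whether it was re-triggered (0/1), and column 2
    the playing intensity mapped onto [0, beta].
    """
    M = np.stack([
        np.asarray(active, dtype=float),
        np.asarray(retrigger, dtype=float),
        np.clip(np.asarray(velocity, dtype=float), 0, beta),
    ], axis=1)
    return M

# Three-note example: note 0 held at intensity 64, note 2 re-struck at 90.
M = note_matrix(active=[1, 0, 1], retrigger=[0, 0, 1], velocity=[64, 0, 90])
# M has shape (3, 3): one row per note, one column per playing state.
```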
In the embodiment provided by the invention, ε note matrices may form one BEAT, ζ BEATs form one BAR, and η BARs form one music track, and the music data is composed of multiple music tracks. Of course, the BEATs, BARs, and music tracks may all be matrices.
For example, taking a 4/4 melody as an example, four note matrices M form one BEAT, four BEATs form one BAR, and 16 BARs form one melody song; the melody song here may be a melody segment and may be used as a training sample. The MID file is cut according to the above rule, so that one training sample is one matrix, matrix ∈ R^(a×3×δ), where δ = ε × ζ × η.
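Under the 4/4 example above (four note matrices per BEAT, four BEATs per BAR, 16 BARs per song), the cutting rule δ = ε × ζ × η can be checked with a short sketch; the zero array is a placeholder sample and a = 88 is an assumed note count, not a value from the patent:

```python
import numpy as np

a = 88                      # assumed number of notes (e.g. piano keys)
eps, zeta, eta = 4, 4, 16   # note matrices per BEAT, BEATs per BAR, BARs per song
delta = eps * zeta * eta    # note matrices per training sample

# One training sample is a matrix in R^(a x 3 x delta) ...
sample = np.zeros((a, 3, delta))

# ... whose time axis can be grouped hierarchically as BARs > BEATs > note matrices.
hierarchy = sample.reshape(a, 3, eta, zeta, eps)
```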
Step S120: input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features including a beat (BEAT) feature matrix, a bar (BAR) feature matrix, and a song feature vector.
As can be seen from the above description, in the embodiment of the invention the music data may be a matrix composed of multiple note matrices, i.e. ε note matrices form one BEAT, ζ BEATs form one BAR, η BARs form one music track, and the music data may be composed of one or more music tracks.
Therefore, the embodiment of the invention may input the music data into a pre-trained recurrent neural network. The pre-trained recurrent neural network may be a convolutional neural network (CNN) or a bidirectional long short-term memory recurrent neural network (Bi-LSTM). The pre-trained recurrent neural network is used to extract the BEAT feature matrix, BAR feature matrix, and song feature vector of the music data.
For clarity and completeness of description, a specific implementation of step S120 is described in detail in the embodiments below.
According to the music feature extraction method provided by the embodiment of the invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features including a BEAT feature matrix, a BAR feature matrix, and a song feature vector. In this way, the embodiment of the invention obtains multi-dimensional features of the music data through the pre-trained recurrent neural network, which effectively addresses the inefficiency of manually selecting musical features in the prior art and can also improve the accuracy of the extracted musical features.
To describe in detail how the music data is input into the pre-trained recurrent neural network to obtain the features of the music data, in combination with the above embodiment and as shown in Fig. 2, step S120 may include the following steps in another embodiment provided by the invention:
Step S1: when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which M_t is located.
Step S2: convert the position information of M_t within the target BAR into a position vector p, where p ∈ R^γ and γ denotes the number of note matrices contained in one BAR.
Step S3: input the note matrix M_t, the N note matrices before M_t, and the N note matrices after M_t into a one-dimensional convolutional layer of the pre-trained recurrent neural network whose convolution kernel width is θ, to obtain a note matrix C_t ∈ R^(a×3×θ).
Step S4: splice the position vector p with the note matrix C_t, and input the spliced matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network performs feature extraction on the BEAT matrices in the music data.
Here, ζ BEAT feature matrices form one BAR matrix.
Step S5: input the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network performs feature extraction on the BEAT feature matrices.
Here, η BAR feature matrices form one music track.
Step S6: input the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a song feature vector, where the third-layer network of the pre-trained recurrent neural network performs feature extraction on the BAR feature matrices.
For completeness and clarity of description, the technical solution provided by the embodiment of the invention is described in detail below with reference to a specific embodiment, taking as an example that the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
As shown in Fig. 2, the input of the Bi-LSTM is the time series composed of the note matrices M. When the note matrix M_t is input, the first step is to determine the position of M_t within its BAR and to generate a position vector p. This position vector is a one-hot vector: the element set to 1 marks the position of M_t within its BAR, so p ∈ R^γ, where γ is the number of note matrices contained in one BAR. Specifically, assume that 5 BEAT feature matrices form one BAR feature matrix, and that a given BEAT is the second BEAT of that BAR feature matrix; the generated position vector p is then [0 1 0 0 0].
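The one-hot position vector described in this step can be sketched as follows (a minimal illustration; the function name and the use of a global index t are assumptions):

```python
import numpy as np

def position_vector(t, gamma):
    """One-hot position vector for the note matrix at global index t.

    gamma is the number of positions per BAR; the element at
    t mod gamma is set to 1, all others to 0.
    """
    p = np.zeros(gamma)
    p[t % gamma] = 1.0
    return p

# Second position within a BAR of five positions -> [0, 1, 0, 0, 0].
p = position_vector(t=6, gamma=5)
```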
In the second step, the note matrix M_t, the N note matrices before M_t, and the N note matrices after M_t are input into the one-dimensional convolutional layer of the pre-trained recurrent neural network whose convolution kernel width is θ, to obtain a note matrix C_t ∈ R^(a×3×θ) that is contextually related to M_t; the contextually related note matrix C_t is a relational matrix that fuses the information of the note matrices before and after M_t. Specifically, M_{t−N}, M_{t−N+1}, …, M_{t+N−1}, M_{t+N} are fed into the one-dimensional convolutional layer with kernel width θ, which outputs the note matrix contextually related to M_t.
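A shape-level sketch of this context step, with random stand-in weights (in the patent the convolution weights would be learned; the einsum formulation and all sizes here are assumptions):

```python
import numpy as np

def context_features(seq, t, N, theta, rng=None):
    """Map the window M_{t-N} .. M_{t+N} to C_t in R^(a x 3 x theta).

    seq has shape (a, 3, T): a time series of note matrices.
    Each of the theta output channels applies one kernel of width
    2N + 1 across the time window (weights are random stand-ins here).
    """
    rng = rng or np.random.default_rng(0)
    window = seq[:, :, t - N:t + N + 1]           # (a, 3, 2N + 1)
    W = rng.standard_normal((theta, 2 * N + 1))   # one kernel per channel
    return np.einsum('ijk,ck->ijc', window, W)    # (a, 3, theta)

seq = np.zeros((5, 3, 11))                        # toy series: a = 5, T = 11
C_t = context_features(seq, t=5, N=2, theta=4)    # C_t has shape (5, 3, 4)
```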
Then the position vector p is spliced with C_t, and the spliced matrix is input into the first-layer neural network of the Bi-LSTM. For clarity of description, the first-layer neural network of the Bi-LSTM is called the first-layer Bi-LSTM network. The first-layer Bi-LSTM network is used to extract the BEAT features of the music data; that is, its output is a BEAT feature matrix composed of BEAT vector 1, BEAT vector 2, …, BEAT vector m.
It should be noted that splicing the position vector p with C_t may mean concatenating p with C_t. For example, assume the position vector p is [0 0 0 1] and the rows of the note matrix C_t are [1 2 3 4] and [5 6 7 8]; after p and C_t are spliced, the resulting matrix rows are [1 2 3 4 0 0 0 1] and [5 6 7 8 0 0 0 1].
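The worked splice above can be reproduced directly (a small NumPy check; `hstack`/`tile` are one of several equivalent ways to express the concatenation):

```python
import numpy as np

p = np.array([0, 0, 0, 1])          # position vector
C = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])         # two rows of the note matrix C_t

# Append the position vector to every row of C_t.
spliced = np.hstack([C, np.tile(p, (C.shape[0], 1))])
# spliced is [[1 2 3 4 0 0 0 1], [5 6 7 8 0 0 0 1]]
```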
In the third step, the BEAT feature matrix is input into the second-layer neural network of the Bi-LSTM. For clarity of description, the second-layer neural network of the Bi-LSTM may be called the second-layer Bi-LSTM network. The second-layer Bi-LSTM network is used to extract the BAR features of the music data; that is, its output is a BAR feature matrix composed of BAR vector 1, BAR vector 2, …, BAR vector n.
In the fourth step, the BAR feature matrix is input into the third-layer neural network of the Bi-LSTM. For clarity of description, the third-layer neural network of the Bi-LSTM may be called the third-layer Bi-LSTM network. The third-layer Bi-LSTM network performs a higher-dimensional extraction on the features of the entire melody; that is, its output is the song vector.
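The shape bookkeeping across the three layers can be illustrated with a stand-in sketch, in which mean-pooling takes the place of each Bi-LSTM layer purely to show how the time axes collapse (all sizes and the pooling itself are assumptions, not the patent's trained network):

```python
import numpy as np

zeta, eta, d = 4, 16, 32   # BEATs per BAR, BARs per song, feature width (assumed)
m = zeta * eta             # number of BEAT vectors in the song

beats = np.zeros((m, d))                         # layer 1 output: BEAT feature matrix
bars = beats.reshape(eta, zeta, d).mean(axis=1)  # layer 2 stand-in: (eta, d) BAR features
song = bars.mean(axis=0)                         # layer 3 stand-in: song vector (d,)
```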
In the fifth step, the song vector is input into the fully connected layer and the softmax layer, and the classification result of the music data is output. It can be understood that different song vectors correspond to different classification results, where a classification result may be, for example, the number of a music category.
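The final fully connected + softmax step can be sketched as follows (W and b are hypothetical learned parameters; the max-subtraction is a numerical-stability detail, not from the patent):

```python
import numpy as np

def classify(song_vec, W, b):
    """Fully connected layer followed by softmax over the song vector.

    W: (num_classes, d) weights, b: (num_classes,) biases; returns a
    probability for each music category.
    """
    logits = W @ song_vec + b
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
probs = classify(rng.standard_normal(32),
                 rng.standard_normal((6, 32)),   # 6 hypothetical categories
                 rng.standard_normal(6))
```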
In this way, the music feature extraction method provided by the embodiment of the invention can automatically and efficiently extract the feature data in a music track, and can extract features of the melody at different levels; the extracted feature data makes it convenient to classify music tracks automatically.
In a second aspect, an embodiment of the invention further provides a music feature extraction apparatus. As shown in Fig. 3, the apparatus may comprise:
an obtaining module 310, configured to obtain music data, the music data being a time series composed of δ note matrices, where each row of each note matrix represents a note and each column of the note matrix represents a playing state of that note;
a feature extraction module 320, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features including a beat (BEAT) feature matrix, a bar (BAR) feature matrix, and a song feature vector.
According to the music feature extraction method provided by the embodiment of the invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features including a BEAT feature matrix, a BAR feature matrix, and a song feature vector. In this way, the embodiment of the invention obtains multi-dimensional features of the music data through the pre-trained recurrent neural network, which effectively addresses the inefficiency of manually selecting musical features in the prior art and can also improve the accuracy of the extracted musical features.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix and a denotes the number of rows of M; the first column of M indicates whether the note is being played, the second column of M indicates whether the note is re-triggered, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
Optionally, the characteristic extracting module, comprising:
Location information acquisition submodule, in the music data input trained Recognition with Recurrent Neural Network in advance
When, determine the note matrix M for currently inputting the trained Recognition with Recurrent Neural Network in advancetLocation information in target BAR,
The target BAR is the note matrix MtThe BAR at place;
Transform subblock is used for the MtLocation information in target BAR is converted to position vectorWherein,γ indicates the quantity of the note matrix contained in a BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t, and the N note matrices after M_t into a one-dimensional convolutional layer with convolution kernel θ in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^{a×3×θ}, where C_t is a note matrix carrying the context of M_t;
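The context window and one-dimensional convolution can be sketched as follows, treating θ as the number of kernels (one reading of "convolution kernel is θ") and all sizes as assumptions:

```python
import numpy as np

# Assumed sizes: a note rows, context radius N, θ one-dimensional kernels.
a, N, theta = 128, 2, 8
rng = np.random.default_rng(0)
window = rng.standard_normal((2 * N + 1, a, 3))    # M_{t-N}, ..., M_t, ..., M_{t+N}
kernels = rng.standard_normal((theta, 2 * N + 1))  # θ kernels over the time axis

# Each kernel collapses the (2N+1)-step window into one a×3 map; stacking
# the θ maps gives the context-aware note matrix C_t ∈ R^{a×3×θ}.
C_t = np.einsum('kt,tij->ijk', kernels, window)
assert C_t.shape == (a, 3, theta)
```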
a first processing submodule, configured to concatenate the position vector with the note matrix C_t, and input the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a song feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
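The three-stage collapse performed by these submodules can be illustrated structurally. Here mean pooling stands in for the trained recurrent layers, so only the shapes, not the learned behavior, match the description; all sizes are assumptions:

```python
import numpy as np

# Assumed sizes; d is an arbitrary feature width.
epsilon, zeta, eta, d = 4, 4, 8, 16
rng = np.random.default_rng(1)
x = rng.standard_normal((eta, zeta, epsilon, d))  # note-level features of one track

# Mean pooling is a placeholder for each trained recurrent layer.
beat_feats = x.mean(axis=2)          # first layer:  -> (η, ζ, d) BEAT feature matrices
bar_feats = beat_feats.mean(axis=1)  # second layer: -> (η, d) BAR feature matrices
song_vec = bar_feats.mean(axis=0)    # third layer:  -> (d,) song feature vector
assert beat_feats.shape == (eta, zeta, d)
assert bar_feats.shape == (eta, d)
assert song_vec.shape == (d,)
```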
Optionally, the device further includes:
an input module, configured to input the song feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and output the classification of the music data.
Optionally, the pre-trained recurrent neural network is a pre-trained bidirectional long short-term memory recurrent neural network (Bi-LSTM).
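The bidirectional idea behind Bi-LSTM can be shown with a toy NumPy recurrence. A plain tanh cell stands in for the LSTM cell, so this only illustrates the forward/backward passes and their concatenation, not LSTM gating; all sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
T, d_in, d_h = 6, 3, 4  # assumed: 6 time steps, 3 input dims, 4 hidden units

def simple_rnn(seq, W, U):
    # Plain tanh recurrence as a stand-in for a full LSTM cell.
    h, outputs = np.zeros(d_h), []
    for x_t in seq:
        h = np.tanh(W @ x_t + U @ h)
        outputs.append(h)
    return np.stack(outputs)

x = rng.standard_normal((T, d_in))
Wf, Uf = rng.standard_normal((d_h, d_in)), 0.1 * rng.standard_normal((d_h, d_h))
Wb, Ub = rng.standard_normal((d_h, d_in)), 0.1 * rng.standard_normal((d_h, d_h))

fwd = simple_rnn(x, Wf, Uf)              # left-to-right pass
bwd = simple_rnn(x[::-1], Wb, Ub)[::-1]  # right-to-left pass, realigned
h_bi = np.concatenate([fwd, bwd], axis=1)
assert h_bi.shape == (T, 2 * d_h)
```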
In a third aspect, an embodiment of the present invention further provides an electronic device, as shown in FIG. 4, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404;
the memory 403 is configured to store a computer program; and
the processor 401 is configured to, when executing the program stored in the memory 403, implement the music feature extraction method described in the first aspect.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example, at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to implement the music feature extraction method described in the first aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to implement the music feature extraction method described in the first aspect.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in an interrelated manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the device, system, electronic device, and storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (13)
1. A music feature extraction method, comprising:
obtaining music data, the music data being a time series composed of δ note matrices, wherein each row of each note matrix represents a note, each column of the note matrix represents a playing state of the note, and δ is a positive integer; and
inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data comprising a beat (BEAT) feature matrix, a bar (BAR) feature matrix, and a song feature vector.
2. The method according to claim 1, wherein the note matrix M ∈ R^{a×3}, where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is played, the second column of M indicates whether the note is re-articulated, and the third column of M indicates the playing intensity of the note.
3. The method according to claim 1 or 2, wherein ε note matrices form one BEAT, ζ BEATs form one BAR, η BARs form one music track, the music data is composed of one or more music tracks, and ε, ζ, and η are positive integers.
4. The method according to claim 3, wherein the step of inputting the music data into the pre-trained recurrent neural network to obtain the features of the music data comprises:
when the music data is input into the pre-trained recurrent neural network, determining the position information, within a target BAR, of a note matrix M_t currently being input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
converting the position information of M_t within the target BAR into a position vector, where γ denotes the number of note matrices contained in one BAR;
inputting the note matrix M_t, the N note matrices before M_t, and the N note matrices after M_t into a one-dimensional convolutional layer with convolution kernel θ in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^{a×3×θ}, where C_t is a note matrix carrying the context of M_t, and N is a positive integer;
concatenating the position vector with the note matrix C_t, and inputting the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, wherein the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
inputting the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, wherein the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track; and
inputting the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a song feature vector, wherein the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
5. The method according to claim 4, further comprising:
inputting the song feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and outputting the classification of the music data.
6. The method according to any one of claims 1 to 5, wherein the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
7. A music feature extraction device, comprising:
an acquisition module, configured to obtain music data, the music data being a time series composed of δ note matrices, wherein each row of each note matrix represents a note and each column of the note matrix represents a playing state of the note; and
a feature extraction module, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data comprising a beat (BEAT) feature matrix, a bar (BAR) feature matrix, and a song feature vector.
8. The device according to claim 7, wherein the note matrix M ∈ R^{a×3}, where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is played, the second column of M indicates whether the note is re-articulated, and the third column of M indicates the playing intensity of the note.
9. The device according to claim 7 or 8, wherein ε note matrices form one BEAT, ζ BEATs form one BAR, η BARs form one music track, and the music data is composed of one or more music tracks.
10. The device according to claim 9, wherein the feature extraction module comprises:
a position information acquisition submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine the position information, within a target BAR, of a note matrix M_t currently being input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector, where γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t, and the N note matrices after M_t into a one-dimensional convolutional layer with convolution kernel θ in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^{a×3×θ}, where C_t is a note matrix carrying the context of M_t, and N is a positive integer;
a first processing submodule, configured to concatenate the position vector with the note matrix C_t, and input the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, wherein the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, wherein the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track; and
a third processing submodule, configured to input the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a song feature vector, wherein the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
11. The device according to claim 9, further comprising:
an input module, configured to input the song feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and output the classification of the music data.
12. The device according to any one of claims 7 to 11, wherein the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
13. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program; and
the processor is configured to, when executing the program stored in the memory, implement the method steps of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811139448.6A CN109285560B (en) | 2018-09-28 | 2018-09-28 | Music feature extraction method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109285560A true CN109285560A (en) | 2019-01-29 |
CN109285560B CN109285560B (en) | 2021-09-03 |
Family
ID=65182408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811139448.6A Active CN109285560B (en) | 2018-09-28 | 2018-09-28 | Music feature extraction method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109285560B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103186527A (en) * | 2011-12-27 | 2013-07-03 | 北京百度网讯科技有限公司 | System for building music classification model, system for recommending music and corresponding method |
US20130233155A1 (en) * | 2012-03-06 | 2013-09-12 | Apple Inc. | Systems and methods of note event adjustment |
US20140180674A1 (en) * | 2012-12-21 | 2014-06-26 | Arbitron Inc. | Audio matching with semantic audio recognition and report generation |
CN107045867A (en) * | 2017-03-22 | 2017-08-15 | 科大讯飞股份有限公司 | Automatic composing method, device and terminal device |
CN107123415A (en) * | 2017-05-04 | 2017-09-01 | 吴振国 | A kind of automatic music method and system |
CN107146631A (en) * | 2016-02-29 | 2017-09-08 | 北京搜狗科技发展有限公司 | Music recognition methods, note identification model method for building up, device and electronic equipment |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110136729A (en) * | 2019-03-27 | 2019-08-16 | 北京奇艺世纪科技有限公司 | Model generating method, audio-frequency processing method, device and computer readable storage medium |
CN110136729B (en) * | 2019-03-27 | 2021-08-20 | 北京奇艺世纪科技有限公司 | Model generation method, audio processing method, device and computer-readable storage medium |
CN110264984A (en) * | 2019-05-13 | 2019-09-20 | 北京奇艺世纪科技有限公司 | Model training method, music generating method, device and electronic equipment |
CN110264984B (en) * | 2019-05-13 | 2021-07-06 | 北京奇艺世纪科技有限公司 | Model training method, music generation method and device and electronic equipment |
CN112885315A (en) * | 2020-12-24 | 2021-06-01 | 携程旅游信息技术(上海)有限公司 | Model generation method, music synthesis method, system, device and medium |
CN112885315B (en) * | 2020-12-24 | 2024-01-02 | 携程旅游信息技术(上海)有限公司 | Model generation method, music synthesis method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||