CN109285560A - Music feature extraction method and apparatus, and electronic device - Google Patents

Music feature extraction method and apparatus, and electronic device

Info

Publication number
CN109285560A
Authority
CN
China
Prior art keywords
note
matrix
neural network
bar
recurrent neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811139448.6A
Other languages
Chinese (zh)
Other versions
CN109285560B (en)
Inventor
刘思阳
蒋紫东
冯巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201811139448.6A
Publication of CN109285560A
Application granted
Publication of CN109285560B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/036 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

Embodiments of the present invention provide a music feature extraction method and apparatus. The method comprises: obtaining music data, the music data being a time series composed of δ note matrices, where each row of each note matrix represents a note, each column of the note matrix represents a playing state of the note, and δ is a positive integer; and inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector. In this way, embodiments of the present invention obtain musical features of multiple dimensions of the music data through a pre-trained recurrent neural network, which effectively solves the problem in the prior art that manual extraction of musical features is inefficient.

Description

Music feature extraction method and apparatus, and electronic device
Technical field
The present invention relates to the field of music feature extraction, and in particular to a music feature extraction method and apparatus, and an electronic device.
Background art
With the continuous development of science and technology, more and more users enjoy music on terminals. Through a terminal, a user can enjoy music of various categories, for example, pop music, classical music, and so on.
To meet users' needs, more and more music is made available for users to enjoy; and to allow a user to conveniently select music by category, the music needs to be classified. The traditional music classification method is usually as follows: musical features are extracted manually, and the music is classified based on the manually extracted features. Obviously, this manner of manually extracting musical features is inefficient.
Summary of the invention
Embodiments of the present invention aim to provide a music feature extraction method, an apparatus and an electronic device, so as to improve the efficiency of music feature extraction. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a music feature extraction method, the method comprising:
obtaining music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, each column of the note matrix representing a playing state of the note, and δ being a positive integer;
inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks, and ε, ζ and η are all positive integers.
Optionally, the step of inputting the music data into the pre-trained recurrent neural network to obtain the features of the music data comprises:
when the music data is input into the pre-trained recurrent neural network, determining position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
converting the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
inputting the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into a one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
concatenating the position vector P with the note matrix C_t, and inputting the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain the BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
inputting the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain the BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
inputting the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain the track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
Optionally, the method further comprises:
inputting the track feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and outputting the category of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a second aspect, an embodiment of the present invention provides a music feature extraction apparatus, comprising:
an obtaining module, configured to obtain music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, and each column of the note matrix representing a playing state of the note;
a feature extraction module, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
Optionally, the feature extraction module comprises:
a position information obtaining submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into a one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
a first processing submodule, configured to concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
Optionally, the apparatus further comprises:
an input module, configured to input the track feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and output the category of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with each other via the communication bus;
the memory is configured to store a computer program;
the processor is configured to, when executing the program stored in the memory, implement the music feature extraction method described in the first aspect.
In another aspect of implementations of the present invention, a computer-readable storage medium is further provided, in which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the music feature extraction method described in the first aspect.
In yet another aspect of implementations of the present invention, a computer program product containing instructions is further provided; when it is run on a computer, the computer is caused to execute the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain features of multiple dimensions of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flow chart of a music feature extraction method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of music feature extraction in another embodiment provided by the present invention;
Fig. 3 is a schematic diagram of a music feature extraction apparatus provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In order to solve the problem in the prior art that manual extraction of musical features is inefficient, embodiments of the present invention provide a music feature extraction method, an apparatus and an electronic device.
In a first aspect, the music feature extraction method provided by an embodiment of the present invention is described in detail first.
As shown in Fig. 1, the music feature extraction method provided by an embodiment of the present invention may comprise the following steps:
Step S110: obtain music data, where the music data provided in the embodiment of the present invention is a time series composed of δ note matrices, each row of each note matrix represents a note, and each column of the note matrix represents a playing state of the note.
The music data in the embodiment of the present invention may be music data in MID format, and such music data may be a time series of notes. By converting the combination of notes at each moment in the note time series into a note matrix, a time series composed of δ note matrices can be obtained.
In the embodiment of the present invention, a note matrix can be denoted by M and represented as a matrix of a rows and 3 columns, where a denotes the number of notes. The first column of M indicates whether a note is playing, which can be represented by 0 and 1; for example, 1 indicates that the note is playing and 0 indicates that it is not played. The second column of M indicates whether the note is re-struck; for example, 1 indicates that the note is played again and 0 indicates that it is not played again. The third column of M indicates the playing intensity of the note, which can be obtained by mapping the intensity in the MID music file into the interval from 0 to β, where β denotes the maximum value of the playing intensity of a note. It can be understood that each note can correspond to a key; when the key is pressed, the corresponding note is played, and otherwise the note is not played.
Illustratively, the note matrix M can be expressed as M ∈ R^(a×3), where the intensity values x, y and z in its third column are positive numbers no greater than β.
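As a minimal sketch of this representation (the concrete rows and the value of β below are assumed for illustration and are not taken from the patent), such a note matrix could be built as follows:

```python
import numpy as np

# A hypothetical note matrix M in R^(a x 3) for a = 3 notes.
# Column 0: 1 if the note is playing, else 0.
# Column 1: 1 if the note is re-struck at this moment, else 0.
# Column 2: playing intensity, mapped into [0, beta].
beta = 127.0  # assumed maximum intensity (the MIDI velocity range)

M = np.array([
    [1.0, 0.0, 90.0],   # note 0: playing, not re-struck, intensity 90
    [1.0, 1.0, 64.0],   # note 1: playing and re-struck, intensity 64
    [0.0, 0.0, 0.0],    # note 2: silent
])
assert M.shape == (3, 3) and M[:, 2].max() <= beta
```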
In the embodiment provided by the present invention, ε note matrices can form one BEAT, ζ BEATs form one BAR, and η BARs form one music track, and the music data is composed of multiple music tracks. Of course, the BEATs, BARs and music tracks can all be matrices.
For example, taking a melody in 4/4 time as an example, four note matrices form one BEAT, four BEATs form one BAR, and 16 BARs form one melody track, where the melody track here can be a melody segment, and this melody track can be used as a training sample. A MID file is cut according to the above rule, so that one training sample is one matrix, with the matrix ∈ R^(a×3×δ), where δ = ε × ζ × η; this cutting is sketched below.
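A minimal sketch of the cutting rule under the 4/4 assumptions above; the function name and the (T, a, 3) tensor layout are illustrative assumptions, not fixed by the patent:

```python
import numpy as np

def cut_into_samples(note_matrices: np.ndarray,
                     eps: int = 4, zeta: int = 4, eta: int = 16) -> np.ndarray:
    """Cut a time series of note matrices, shaped (T, a, 3), into
    training samples of delta = eps * zeta * eta consecutive matrices."""
    delta = eps * zeta * eta  # 4 * 4 * 16 = 256 note matrices per sample
    n_samples = note_matrices.shape[0] // delta
    usable = note_matrices[:n_samples * delta]
    # One sample per row: (n_samples, delta, a, 3)
    return usable.reshape(n_samples, delta, *note_matrices.shape[1:])

# e.g. 1024 moments of an 88-note score yield 4 samples of 256 matrices each
samples = cut_into_samples(np.zeros((1024, 88, 3)))
assert samples.shape == (4, 256, 88, 3)
```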
Step S120: input the music data into a pre-trained recurrent neural network to obtain features of the music data, where the features of the music data include a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
As can be seen from the above description, in the embodiment of the present invention, the music data can be a matrix composed of multiple note matrices, i.e., ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track, and the music data can be composed of one or more music tracks.
Therefore, in the embodiment of the present invention, the music data can be input into a pre-trained recurrent neural network; the pre-trained recurrent neural network can be a convolutional neural network (CNN), or a bidirectional long short-term memory recurrent neural network (Bi-LSTM). The pre-trained recurrent neural network is used to extract the BEAT feature matrix, BAR feature matrix and track feature vector of the music data.
For clarity of description and completeness of the solution, a specific implementation of step S120 is described in detail in the embodiment below.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
In order to describe in detail how the music data is input into the pre-trained recurrent neural network to obtain the features of the music data, in combination with the above embodiment, in another embodiment provided by the present invention, as shown in Fig. 2, step S120 may comprise the following steps:
Step S1: when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, where the target BAR is the BAR in which the note matrix M_t is located.
Step S2: convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR.
Step S3: input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ).
Step S4: concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into the first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix.
Step S5: input the output BEAT feature matrix into the second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track.
Step S6: input the BAR feature matrix into the third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
For completeness of the solution and clarity of description, the technical solution provided by the embodiment of the present invention is described in detail below with reference to specific embodiments, taking the case where the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM) as an example.
As shown in Fig. 2, the input of the Bi-LSTM is the time series composed of the note matrices M. When the note matrix M_t is input, the first step is to determine the position of M_t within the BAR in which it is located and generate a position vector P, where P is a one-hot vector whose entry set to 1 marks the position of M_t within its BAR; therefore P ∈ R^γ, where γ is the number of note matrices contained in one BAR. Specifically, assuming that 5 BEAT feature matrices form one BAR feature matrix, and assuming that a BEAT is the second BEAT of the BAR feature matrix, the generated position vector P is [0 1 0 0 0].
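A minimal sketch of generating such a one-hot position vector (the function name is assumed for illustration):

```python
import numpy as np

def position_vector(position: int, gamma: int) -> np.ndarray:
    """One-hot vector in R^gamma marking the current position inside a BAR."""
    p = np.zeros(gamma)
    p[position] = 1.0
    return p

# The second position (index 1) out of gamma = 5 gives [0, 1, 0, 0, 0]
print(position_vector(1, 5))
```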
In the second step, the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t are input into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain the note matrix C_t ∈ R^(a×3×θ) that is contextually related to M_t, where the context-related note matrix C_t is a relation matrix fusing the information of the note matrices before and after M_t. Specifically, M_(t-N), M_(t-N+1), ..., M_(t+N-1), M_(t+N) are fed into the one-dimensional convolutional layer with θ convolution kernels, which outputs the note matrix contextually related to M_t.
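A sketch of this context convolution in PyTorch, under two assumptions the patent does not fix: the 2N+1 neighbouring note matrices are stacked along the convolved axis, and θ counts the output channels of the layer:

```python
import torch
import torch.nn as nn

N, theta = 2, 8          # context radius and number of kernels (assumed values)
a = 88                   # number of notes (assumed)

# Window of 2N+1 note matrices around M_t, each of shape (a, 3)
window = torch.randn(2 * N + 1, a, 3)

# Treat each (note, column) pair as one sequence of length 2N+1 and
# convolve over that context axis with theta one-dimensional kernels.
conv = nn.Conv1d(in_channels=1, out_channels=theta, kernel_size=2 * N + 1)

seq = window.permute(1, 2, 0).reshape(a * 3, 1, 2 * N + 1)  # (a*3, 1, 2N+1)
C_t = conv(seq).reshape(a, 3, theta)                        # C_t in R^(a x 3 x theta)
print(C_t.shape)  # torch.Size([88, 3, 8])
```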
Then, the position vector P is concatenated with C_t, and the concatenated matrix is input into the first-layer neural network of the Bi-LSTM. For clarity of description, the first-layer neural network of the Bi-LSTM is referred to as the first-layer Bi-LSTM network; the first-layer Bi-LSTM network is used to extract the BEAT features of the music data, i.e., the output of the first-layer Bi-LSTM network is a BEAT feature matrix composed of BEAT vector 1, BEAT vector 2, ..., BEAT vector m.
It should be noted that concatenating the position vector P with C_t can mean merging P into C_t. For example, assume that the position vector P is [0 0 0 1] and that the note matrix C_t has the rows [1 2 3 4] and [5 6 7 8]; after P is concatenated with C_t, the resulting matrix has the rows [1 2 3 4 0 0 0 1] and [5 6 7 8 0 0 0 1].
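The same splice expressed in code, as a sketch of the worked example above:

```python
import torch

P = torch.tensor([0., 0., 0., 1.])            # position vector
C_t = torch.tensor([[1., 2., 3., 4.],
                    [5., 6., 7., 8.]])        # two rows of the note matrix

# Append the position vector to every row of C_t
spliced = torch.cat([C_t, P.expand(C_t.shape[0], -1)], dim=1)
print(spliced)  # [[1, 2, 3, 4, 0, 0, 0, 1], [5, 6, 7, 8, 0, 0, 0, 1]]
```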
In the third step, the BEAT feature matrix is input into the second-layer neural network of the Bi-LSTM. For clarity of description, the second-layer neural network of the Bi-LSTM can be referred to as the second-layer Bi-LSTM network; the second-layer Bi-LSTM network is used to extract the BAR features of the music data, i.e., the output of the second-layer Bi-LSTM network is a BAR feature matrix composed of BAR vector 1, BAR vector 2, ..., BAR vector n.
In the fourth step, the BAR feature matrix is input into the third-layer neural network of the Bi-LSTM. For clarity of description, the third-layer neural network of the Bi-LSTM can be referred to as the third-layer Bi-LSTM network; the third-layer Bi-LSTM network is used to perform higher-dimensional extraction on the features of the entire melody, i.e., the output of the third-layer Bi-LSTM network is a track vector.
In the fifth step, the track vector is input into the fully connected layer and the softmax layer, which output the classification result of the music data. It can be understood that different track vectors correspond to different classification results, where a classification result can be, for example, the number of a music category.
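Putting the five steps together, a minimal PyTorch sketch of this hierarchy; the layer sizes, the grouping helper and the last-output pooling are all assumptions, since the patent only fixes the three Bi-LSTM levels plus the fully connected and softmax layers:

```python
import torch
import torch.nn as nn

class MusicFeatureNet(nn.Module):
    """Three stacked Bi-LSTM levels: notes -> BEAT, BEATs -> BAR, BARs -> track."""
    def __init__(self, in_dim, hidden, n_classes, eps=4, zeta=4, eta=16):
        super().__init__()
        self.eps, self.zeta, self.eta = eps, zeta, eta
        self.beat_lstm = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.bar_lstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.track_lstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def summarize(self, lstm, x, group):
        # Run a Bi-LSTM over groups of `group` steps and keep the last output
        b, t, d = x.shape
        out, _ = lstm(x.reshape(b * t // group, group, d))
        return out[:, -1, :].reshape(b, t // group, -1)

    def forward(self, x):                 # x: (batch, delta, in_dim) spliced inputs
        beats = self.summarize(self.beat_lstm, x, self.eps)     # BEAT feature matrix
        bars = self.summarize(self.bar_lstm, beats, self.zeta)  # BAR feature matrix
        track = self.summarize(self.track_lstm, bars, self.eta) # (batch, 1, 2*hidden)
        return torch.softmax(self.fc(track.squeeze(1)), dim=-1) # class probabilities

net = MusicFeatureNet(in_dim=88 * 3, hidden=64, n_classes=10)
probs = net(torch.randn(2, 256, 88 * 3))  # 2 samples of delta = 256 note steps
print(probs.shape)                        # torch.Size([2, 10])
```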
In this way, with the music feature extraction method provided by the embodiments of the present invention, the feature data in a music track can be extracted automatically and efficiently, features of different levels of the melody can be extracted, and the extracted feature data makes it convenient to classify music tracks automatically.
In a second aspect, an embodiment of the present invention further provides a music feature extraction apparatus. As shown in Fig. 3, the apparatus may comprise:
an obtaining module 310, configured to obtain music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, and each column of the note matrix representing a playing state of the note;
a feature extraction module 320, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
In the music feature extraction apparatus provided by the embodiment of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiment of the present invention obtains multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
Optionally, the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
Optionally, ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
Optionally, the feature extraction module comprises:
a position information obtaining submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t;
a first processing submodule, configured to concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into the first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into the second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into the third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
Optionally, the apparatus further comprises:
an input module, configured to input the track feature vector into the fully connected layer and softmax layer of the pre-trained recurrent neural network, and output the category of the music data.
Optionally, the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
In a third aspect, an embodiment of the present invention further provides an electronic device. As shown in Fig. 4, the electronic device comprises a processor 401, a communication interface 402, a memory 403 and a communication bus 404, where the processor 401, the communication interface 402 and the memory 403 communicate with each other via the communication bus 404;
the memory 403 is configured to store a computer program;
the processor 401 is configured to, when executing the program stored in the memory 403, implement the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include a random access memory (RAM), and may also include a non-volatile memory (NVM), for example, at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, in which instructions are stored; when the instructions are run on a computer, the computer is caused to implement the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
In a fifth aspect, an embodiment of the present invention further provides a computer program product containing instructions; when it is run on a computer, the computer is caused to implement the music feature extraction method described in the first aspect.
In the music feature extraction method provided by the embodiments of the present invention, music data is obtained, the music data being a time series composed of δ note matrices, and the music data is input into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a BEAT feature matrix, a BAR feature matrix and a track feature vector. In this way, the embodiments of the present invention obtain multi-dimensional features of the music data through a pre-trained recurrent neural network, which effectively solves the problem of low efficiency of manually selecting musical features in the prior art, and can improve the accuracy of the extracted musical features.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
Each embodiment in this specification is described in a related manner, and identical or similar parts of the embodiments may be referred to each other; each embodiment focuses on its differences from the other embodiments. In particular, for the apparatus, system, electronic device and storage medium embodiments, since they are substantially similar to the method embodiments, their description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (13)

1. A music feature extraction method, characterized by comprising:
obtaining music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, each column of the note matrix representing a playing state of the note, and δ being a positive integer;
inputting the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
2. The method according to claim 1, characterized in that the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
3. The method according to claim 1 or 2, characterized in that ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks; and ε, ζ and η are all positive integers.
4. The method according to claim 3, characterized in that the step of inputting the music data into the pre-trained recurrent neural network to obtain the features of the music data comprises:
when the music data is input into the pre-trained recurrent neural network, determining position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
converting the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
inputting the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into a one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
concatenating the position vector P with the note matrix C_t, and inputting the concatenated matrix into a first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
inputting the output BEAT feature matrix into a second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
inputting the BAR feature matrix into a third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
5. The method according to claim 4, characterized in that the method further comprises:
inputting the track feature vector into a fully connected layer and a softmax layer of the pre-trained recurrent neural network, and outputting the category of the music data.
6. The method according to any one of claims 1 to 5, characterized in that the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
7. A music feature extraction apparatus, characterized by comprising:
an obtaining module, configured to obtain music data, the music data being a time series composed of δ note matrices, each row of each note matrix representing a note, and each column of the note matrix representing a playing state of the note;
a feature extraction module, configured to input the music data into a pre-trained recurrent neural network to obtain features of the music data, the features of the music data including a beat (BEAT) feature matrix, a bar (BAR) feature matrix and a track feature vector.
8. The apparatus according to claim 7, characterized in that the note matrix M ∈ R^(a×3), where M denotes the note matrix, a denotes the number of rows of M, the first column of M indicates whether the note is playing, the second column of M indicates whether the note is re-struck, and the third column of M indicates the playing intensity of the note.
9. The apparatus according to claim 7 or 8, characterized in that ε note matrices form one BEAT, ζ BEATs form one BAR, and η BARs form one music track; the music data is composed of one or more music tracks.
10. The apparatus according to claim 9, characterized in that the feature extraction module comprises:
a position information obtaining submodule, configured to, when the music data is input into the pre-trained recurrent neural network, determine position information, within a target BAR, of the note matrix M_t currently input into the pre-trained recurrent neural network, the target BAR being the BAR in which the note matrix M_t is located;
a conversion submodule, configured to convert the position information of M_t within the target BAR into a position vector P, where P ∈ R^γ and γ denotes the number of note matrices contained in one BAR;
an input submodule, configured to input the note matrix M_t, the N note matrices before M_t and the N note matrices after M_t into the one-dimensional convolutional layer with θ convolution kernels in the pre-trained recurrent neural network, to obtain a note matrix C_t ∈ R^(a×3×θ), the note matrix C_t being a note matrix contextually related to M_t, and N being a positive integer;
a first processing submodule, configured to concatenate the position vector P with the note matrix C_t, and input the concatenated matrix into the first-layer neural network of the pre-trained recurrent neural network to obtain a BEAT feature matrix, where the first-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT matrices in the music data, and ζ BEAT feature matrices form one BAR matrix;
a second processing submodule, configured to input the output BEAT feature matrix into the second-layer neural network of the pre-trained recurrent neural network to obtain a BAR feature matrix, where the second-layer neural network of the pre-trained recurrent neural network is used to perform feature extraction on the BEAT feature matrix, and η BAR feature matrices form one music track;
a third processing submodule, configured to input the BAR feature matrix into the third-layer network of the pre-trained recurrent neural network to obtain a track feature vector, where the third-layer network of the pre-trained recurrent neural network is used to perform feature extraction on the BAR feature matrix.
11. The apparatus according to claim 9, characterized by further comprising:
an input module, configured to input the track feature vector into the fully connected layer and softmax layer of the pre-trained recurrent neural network, and output the category of the music data.
12. The apparatus according to any one of claims 7 to 11, characterized in that the pre-trained recurrent neural network is a bidirectional long short-term memory recurrent neural network (Bi-LSTM).
13. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with each other via the communication bus;
the memory is configured to store a computer program;
the processor is configured to, when executing the program stored in the memory, implement the method steps of any one of claims 1 to 6.
CN201811139448.6A 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment Active CN109285560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811139448.6A CN109285560B (en) 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811139448.6A CN109285560B (en) 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109285560A true CN109285560A (en) 2019-01-29
CN109285560B CN109285560B (en) 2021-09-03

Family

ID=65182408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811139448.6A Active CN109285560B (en) 2018-09-28 2018-09-28 Music feature extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109285560B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136729A (en) * 2019-03-27 2019-08-16 北京奇艺世纪科技有限公司 Model generating method, audio-frequency processing method, device and computer readable storage medium
CN110264984A (en) * 2019-05-13 2019-09-20 北京奇艺世纪科技有限公司 Model training method, music generating method, device and electronic equipment
CN112885315A (en) * 2020-12-24 2021-06-01 携程旅游信息技术(上海)有限公司 Model generation method, music synthesis method, system, device and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186527A (en) * 2011-12-27 2013-07-03 北京百度网讯科技有限公司 System for building music classification model, system for recommending music and corresponding method
US20130233155A1 (en) * 2012-03-06 2013-09-12 Apple Inc. Systems and methods of note event adjustment
US20140180674A1 (en) * 2012-12-21 2014-06-26 Arbitron Inc. Audio matching with semantic audio recognition and report generation
CN107045867A (en) * 2017-03-22 2017-08-15 科大讯飞股份有限公司 Automatic composing method, device and terminal device
CN107123415A (en) * 2017-05-04 2017-09-01 吴振国 A kind of automatic music method and system
CN107146631A (en) * 2016-02-29 2017-09-08 北京搜狗科技发展有限公司 Music recognition methods, note identification model method for building up, device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186527A (en) * 2011-12-27 2013-07-03 北京百度网讯科技有限公司 System for building music classification model, system for recommending music and corresponding method
US20130233155A1 (en) * 2012-03-06 2013-09-12 Apple Inc. Systems and methods of note event adjustment
US20140180674A1 (en) * 2012-12-21 2014-06-26 Arbitron Inc. Audio matching with semantic audio recognition and report generation
CN107146631A (en) * 2016-02-29 2017-09-08 北京搜狗科技发展有限公司 Music recognition methods, note identification model method for building up, device and electronic equipment
CN107045867A (en) * 2017-03-22 2017-08-15 科大讯飞股份有限公司 Automatic composing method, device and terminal device
CN107123415A (en) * 2017-05-04 2017-09-01 吴振国 A kind of automatic music method and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136729A (en) * 2019-03-27 2019-08-16 北京奇艺世纪科技有限公司 Model generating method, audio-frequency processing method, device and computer readable storage medium
CN110136729B (en) * 2019-03-27 2021-08-20 北京奇艺世纪科技有限公司 Model generation method, audio processing method, device and computer-readable storage medium
CN110264984A (en) * 2019-05-13 2019-09-20 北京奇艺世纪科技有限公司 Model training method, music generating method, device and electronic equipment
CN110264984B (en) * 2019-05-13 2021-07-06 北京奇艺世纪科技有限公司 Model training method, music generation method and device and electronic equipment
CN112885315A (en) * 2020-12-24 2021-06-01 携程旅游信息技术(上海)有限公司 Model generation method, music synthesis method, system, device and medium
CN112885315B (en) * 2020-12-24 2024-01-02 携程旅游信息技术(上海)有限公司 Model generation method, music synthesis method, system, equipment and medium

Also Published As

Publication number Publication date
CN109285560B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
Garcia et al. How to read paintings: semantic art understanding with multi-modal retrieval
Jiang et al. Exploiting feature and class relationships in video categorization with regularized deep neural networks
CN110909164A (en) Text enhancement semantic classification method and system based on convolutional neural network
Aguiar et al. Exploring data augmentation to improve music genre classification with convnets
CN109285560A (en) A kind of music features extraction method, apparatus and electronic equipment
CN107463605A (en) The recognition methods and device of low-quality News Resources, computer equipment and computer-readable recording medium
Yang et al. Music Genre Classification Using Duplicated Convolutional Layers in Neural Networks.
CN105022754A (en) Social network based object classification method and apparatus
Gowda et al. A new split for evaluating true zero-shot action recognition
Chen et al. Recognizing the style of visual arts via adaptive cross-layer correlation
CN105989067A (en) Method for generating text abstract from image, user equipment and training server
CN108920644A (en) Talk with judgment method, device, equipment and the computer-readable medium of continuity
Abdul-Rashid et al. Shrec’18 track: 2d image-based 3d scene retrieval
Chen et al. An Improved Deep Fusion CNN for Image Recognition.
Gao et al. Bottom-up and top-down: Bidirectional additive net for edge detection
CN114398485B (en) Expert portrait construction method and device based on multi-view fusion
Dhiraj et al. An effective analysis of deep learning based approaches for audio based feature extraction and its visualization
CN108133020A (en) Video classification methods, device, storage medium and electronic equipment
CN108154120A (en) video classification model training method, device, storage medium and electronic equipment
CN108304387A (en) The recognition methods of noise word, device, server group and storage medium in text
Wang et al. Deep feature fusion for high-resolution aerial scene classification
CN108229640A (en) The method, apparatus and robot of emotion expression service
CN110532570A (en) A kind of method and apparatus of method and apparatus and model training that naming Entity recognition
Heakl et al. A study on broadcast networks for music genre classification
CN110197213A (en) Image matching method, device and equipment neural network based

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant