CN106919662A - Music recognition method and system - Google Patents
- Publication number
- CN106919662A CN106919662A CN201710077359.2A CN201710077359A CN106919662A CN 106919662 A CN106919662 A CN 106919662A CN 201710077359 A CN201710077359 A CN 201710077359A CN 106919662 A CN106919662 A CN 106919662A
- Authority
- CN
- China
- Prior art keywords
- music
- feature matrix
- identified
- fragment
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Abstract
The invention discloses a music recognition method and system. The method includes: obtaining a music fragment to be identified; extracting, for each frame of audio in the fragment, the mel cepstrum coefficients, the first-order difference of the mel cepstrum coefficients, the linear prediction cepstrum coefficients and the perceptual linear prediction coefficients; forming the feature vector of each frame of audio from these coefficients; combining the feature vectors of all frames to obtain the feature matrix of the music fragment to be identified; comparing this feature matrix with the sample music feature matrices in a music library to obtain the maximum-similarity feature matrix, the maximum-similarity feature matrix being the sample music feature matrix with the greatest similarity to the music fragment to be identified; obtaining the music information of that sample music feature matrix; and outputting the music information. The music recognition method and system provided by the present invention have better noise immunity and a higher recognition rate, and thus a better recognition effect.
Description
Technical field
The present invention relates to the field of music recognition, and in particular to a music recognition method and system.
Background technology
With the rapid development of science and technology, and of computer technology in particular, Sound and Music Computing (SMC) has become an emerging interdisciplinary field that uses computational methods to understand, simulate and generate sound and music.
As these technologies have matured, people can obtain music through many channels, of which the network is the most convenient and fastest. This has led directly to explosive growth of music on the network, and people have become accustomed to retrieving and downloading music online. However, much of the music circulating on the network carries no complete music information: this complicates management on the one hand and inconveniences listeners on the other. In everyday life, moreover, music is often simply played back, and the related information such as the song title and singer does not always accompany it. For a music lover who has heard a piece, is interested in it, and wants to learn about and obtain it, this is a very thorny problem. The retrieval of music has therefore become a highly important topic.
Music retrieval takes two main forms: text-based music retrieval and content-based music retrieval. In a text-based music retrieval system, the user submits keyword information such as a song title, singer name or lyrics fragment, and the system uses the keywords to retrieve and match against the music information in a database. The technique is very mature and widely used, but its limitations are also fairly obvious: an unknown piece of audio for which no clear keyword information can be supplied cannot be retrieved or recognized, and the keyword information provided by the user is itself error-prone, causing retrieval errors and failures.
Music recognition, that is, content-based music retrieval, differs from text-based music retrieval: it does not rely on the textual metadata of the music but identifies the music directly from the content of a sample fragment itself, thereby achieving the goal of retrieval. Although music has complex physical characteristics, each song has relatively stable features, and a piece of music can be characterized by these stable features; music recognition is precisely identification based on such features. Many features can be extracted from music, such as beats per minute or onset and offset time points, and different theories and methods yield different audio features with different characteristics, so an appropriate feature can be chosen for each situation. Music recognition overcomes the limitations of text-based music retrieval and focuses on the music itself; it has better adaptability and practicality and is becoming the main trend in music retrieval.
Music recognition is a comparatively fundamental and practical problem in sound and music computing, and it has attracted attention both at home and abroad. The music recognition service of Shazam extracts a music fingerprint (Music-Fingerprinting, MFP) and then matches it to realize music recognition; it proposed a landmark-based fingerprint extraction method, which finds feature points in the spectrum and forms peak pairs (Peak-Pairs) from them, the sequence of peak pairs constituting the fingerprint of the fragment. Another line of work is a classical-music automatic recognition system based on hidden Markov models, which clusters the extracted chroma features (Chroma Features) and then performs recognition with a hidden Markov model.
A large-scale music retrieval system that optimizes Shazam's approach adds three links to the fingerprint extraction process, spectrum optimization, peak filtering and setting a feature-point confidence, to optimize the fingerprint. A linear-alignment matching method has also been used to realize approximate melody matching and to build a query-by-humming system on that basis; this method exploits the geometric similarity of the pitch contours of similar melodies and combines pitch and rhythm features into a new approach.
However, the above music recognition methods still have a low recognition rate for music, and their recognition effect is unsatisfactory and needs further improvement.
Summary of the invention
The object of the present invention is to provide a music recognition method and system with better noise immunity, a higher recognition rate and a better recognition effect.
To achieve the above object, the present invention provides the following scheme:
A music recognition method, the method including:
obtaining a music fragment to be identified;
extracting the mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the music fragment to be identified;
forming the feature vector of each frame of audio from its mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients;
combining the feature vectors of all frames of audio to obtain the feature matrix of the music fragment to be identified;
comparing the feature matrix of the music fragment to be identified with the sample music feature matrices in a music library to obtain the maximum-similarity feature matrix, the maximum-similarity feature matrix being the sample music feature matrix whose similarity with the music fragment to be identified is greatest;
obtaining the music information of the sample music feature matrix with the greatest similarity;
outputting the music information.
Optionally, after obtaining the music fragment to be identified and before extracting the mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio, the method further includes:
preprocessing the music fragment to be identified, the preprocessing including pre-emphasis, framing and windowing.
Optionally, comparing the feature matrix of the music fragment to be identified with the sample music feature matrices in the music library to obtain the maximum-similarity sample music feature matrix specifically includes:
intercepting, from each sample music feature matrix B = [tz(1), tz(2), ..., tz(M)]T, the matrices Bk = [tz(k), tz(k+1), ..., tz(k+N-1)]T that have the same number of rows as the feature matrix of the music fragment to be identified; there are multiple sample music feature matrices; here k = 1, 2, ..., M-N+1 with step Δk = 1, tz(1), tz(2), ..., tz(M) are the feature vectors of the frames of audio in the sample music feature matrix B, M is the number of feature vectors in B, and N is the number of rows of the feature matrix of the music fragment to be identified; each matrix Bk is marked as a feature matrix to be compared;
calculating the similarity between each feature matrix to be compared and the feature matrix of the music fragment to be identified, and obtaining the feature matrix to be compared whose similarity with the feature matrix of the music fragment to be identified is greatest; the sample music feature matrix to which this feature matrix to be compared belongs is the maximum-similarity music feature matrix.
Optionally, before intercepting from the sample music feature matrix B = [tz(1), tz(2), ..., tz(M)]T the matrices Bk = [tz(k), tz(k+1), ..., tz(k+N-1)]T that have the same number of rows as the feature matrix of the music fragment to be identified, where Δk = 1, k = 1, 2, ..., M-N+1, tz(1), tz(2), ..., tz(M) are the feature vectors of the frames of audio in B, M is the number of feature vectors in B, and N is the number of rows of the feature matrix of the music fragment to be identified, the method further includes:
judging whether the calculation needs to be completed within a preset time and the accuracy requirement is below a preset threshold;
if so, setting Δk to an integer greater than 1.
Optionally, calculating the similarity between a feature matrix to be compared and the feature matrix of the music fragment to be identified specifically includes:
calculating the similarity between the feature matrix to be compared and the feature matrix of the music fragment to be identified using the matrix absolute-value distance method, the cosine method of the vector space model, or a method combining the cosine measure of the vector space model with the Euclidean distance.
The present invention also provides a music recognition system, the system including:
a music fragment obtaining module, configured to obtain the music fragment to be identified;
a parameter extraction module, configured to extract the mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the music fragment to be identified;
a feature vector determining module, configured to form the feature vector of each frame of audio from its mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients;
a feature matrix determining module, configured to combine the feature vectors of all frames of audio to obtain the feature matrix of the music fragment to be identified;
a matching module, configured to compare the feature matrix of the music fragment to be identified with the sample music feature matrices in the music library to obtain the maximum-similarity feature matrix, the maximum-similarity feature matrix being the sample music feature matrix whose similarity with the music fragment to be identified is greatest;
a music information obtaining module, configured to obtain the music information of the sample music feature matrix with the greatest similarity;
a music information output module, configured to output the music information.
Optionally, the system further includes:
a preprocessing module, configured to preprocess the music fragment to be identified, the preprocessing including pre-emphasis, framing and windowing.
Optionally, the matching module specifically includes:
a matrix interception unit, configured to intercept, from each sample music feature matrix B = [tz(1), tz(2), ..., tz(M)]T, the matrices Bk = [tz(k), tz(k+1), ..., tz(k+N-1)]T that have the same number of rows as the feature matrix of the music fragment to be identified, where k = 1, 2, ..., M-N+1 with step Δk = 1, tz(1), tz(2), ..., tz(M) are the feature vectors of the frames of audio in B, M is the number of feature vectors in B, and N is the number of rows of the feature matrix of the music fragment to be identified, and to mark each matrix Bk as a feature matrix to be compared;
a similarity calculating unit, configured to calculate the similarity between each feature matrix to be compared and the feature matrix of the music fragment to be identified, and to obtain the feature matrix to be compared whose similarity with the feature matrix of the music fragment to be identified is greatest; the sample music feature matrix to which this feature matrix to be compared belongs is the maximum-similarity music feature matrix.
Optionally, the matching module also includes:
a judging unit, configured to judge whether the calculation needs to be completed within a preset time and the accuracy requirement is below a preset threshold;
a setting unit, configured to set Δk to an integer greater than 1 when the calculation needs to be completed within the preset time and the accuracy requirement is below the preset threshold.
Optionally, the similarity calculating unit specifically includes:
a similarity calculating subunit, configured to calculate the similarity between the feature matrix to be compared and the feature matrix of the music fragment to be identified using the matrix absolute-value distance method, the cosine method of the vector space model, or a method combining the cosine measure of the vector space model with the Euclidean distance.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects: the present invention combines three basic features, the mel cepstrum coefficients, the linear prediction cepstrum coefficients and perceptual linear prediction, to realize comprehensive recognition of music; the three classes of audio features have different characteristics and advantages, and combining them yields better recognition results. Because perceptual linear prediction models the masking effect of the human ear, the method has better noise immunity and a higher recognition rate, and the recognition effect obtained is better.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
Fig. 1 is a schematic flow chart of the music recognition method of an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the music recognition system of an embodiment of the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative labor fall within the protection scope of the present invention.
The object of the present invention is to provide a music recognition method and system whose accuracy and noise immunity are markedly improved.
To make the above objects, features and advantages of the present invention more apparent and understandable, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic flow chart of the steps of the music recognition method of an embodiment of the present invention. As shown in Fig. 1, the steps of the music recognition method are as follows:
Step 101: obtain the music fragment to be identified;
Step 102: extract the mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the music fragment to be identified;
Step 103: form the feature vector of each frame of audio from its mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients;
Step 104: combine the feature vectors of all frames of audio to obtain the feature matrix of the music fragment to be identified;
Step 105: compare the feature matrix of the music fragment to be identified with the sample music feature matrices in the music library to obtain the maximum-similarity feature matrix, the maximum-similarity feature matrix being the sample music feature matrix whose similarity with the music fragment to be identified is greatest;
Step 106: obtain the music information of the sample music feature matrix with the greatest similarity;
Step 107: output the music information.
Wherein, after step 101 and before step 102, the method further includes:
preprocessing the music fragment to be identified, the preprocessing including pre-emphasis, framing and windowing.
Step 105 specifically includes:
judging whether the calculation needs to be completed within a preset time and the accuracy requirement is below a preset threshold;
if so, setting Δk to an integer greater than 1;
intercepting, from each sample music feature matrix B = [tz(1), tz(2), ..., tz(M)]T, the matrices Bk = [tz(k), tz(k+1), ..., tz(k+N-1)]T that have the same number of rows as the feature matrix of the music fragment to be identified; there are multiple sample music feature matrices; k advances in steps of Δk over 1, 2, ..., M-N+1 (Δk = 1 by default), tz(1), tz(2), ..., tz(M) are the feature vectors of the frames of audio in the sample music feature matrix B, M is the number of feature vectors in B, and N is the number of rows of the feature matrix of the music fragment to be identified; each matrix Bk is marked as a feature matrix to be compared;
calculating the similarity between each feature matrix to be compared and the feature matrix of the music fragment to be identified, and obtaining the feature matrix to be compared whose similarity with the feature matrix of the music fragment to be identified is greatest; the sample music feature matrix to which it belongs is the maximum-similarity music feature matrix. When calculating the similarity between a feature matrix to be compared and the feature matrix of the music fragment to be identified, the matrix absolute-value distance method, the cosine method of the vector space model, or a method combining the cosine measure of the vector space model with the Euclidean distance is used.
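As a minimal sketch of the sliding-window matching in step 105: mean row-wise cosine similarity is used here as the measure (one of the options named above; the combined cosine/Euclidean variant is omitted for brevity), and the function and variable names are illustrative rather than taken from the patent:

```python
import numpy as np

def best_match_offset(query, sample, step=1):
    """Slide the N-row query feature matrix over the M-row sample feature
    matrix (step plays the role of delta-k) and return the best similarity
    and the row offset at which it occurs."""
    n = query.shape[0]
    best_sim, best_k = -1.0, -1
    for k in range(0, sample.shape[0] - n + 1, step):
        window = sample[k:k + n]                       # B_k in the text
        num = np.sum(query * window, axis=1)
        den = np.linalg.norm(query, axis=1) * np.linalg.norm(window, axis=1)
        sim = float(np.mean(num / np.maximum(den, 1e-12)))
        if sim > best_sim:
            best_sim, best_k = sim, k
    return best_sim, best_k

rng = np.random.default_rng(0)
sample = rng.standard_normal((50, 24))   # M = 50 frames of 24-dim features
query = sample[17:27].copy()             # N = 10 frames cut from the sample
sim, k = best_match_offset(query, sample)
print(k)                                 # the cut position is recovered
```

Setting `step` greater than 1 trades accuracy for speed, exactly as the Δk > 1 option above describes.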
As a preferred embodiment, the preprocessing consists of three tasks: pre-emphasis, framing and windowing.
a) Pre-emphasis
Pre-emphasis passes the audio signal through a high-pass filter whose transfer function is as follows:
H(z) = 1 - μz^(-1)
where the parameter μ lies between 0.9 and 1.0 and usually takes the value 0.97. The purpose of pre-emphasis is to boost the high-frequency components of the audio signal, flatten its spectrum and enhance the vocal-tract characteristics, in preparation for spectrum analysis or vocal-tract parameter analysis.
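The pre-emphasis filter above, y(n) = x(n) - μ·x(n-1) with μ = 0.97, can be sketched as:

```python
import numpy as np

def preemphasis(x, mu=0.97):
    """High-pass pre-emphasis y[n] = x[n] - mu * x[n-1] (mu in 0.9..1.0),
    boosting high frequencies to flatten the spectrum before analysis."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - mu * x[:-1]
    return y

x = np.ones(5)           # a constant (DC-only) signal...
y = preemphasis(x)
print(y[1:])             # ...is almost entirely suppressed after the first sample
```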
b) Framing
Taken as a whole, an audio signal is an unstable, stochastic process, but over short intervals (usually 10 ms-30 ms) it exhibits a certain stability. Past research has therefore focused on the invariant characteristics audio shows in the short term, analyzing it in short segments: the continuous audio signal is divided into short periods containing an equal number of sampling points, i.e. frames. The three audio features to be extracted are all analyzed on a frame basis. Framing preserves the short-time stability of the audio signal and thus lays the foundation for short-time analysis. At the same time, to preserve the continuity and dynamics of the audio signal, adjacent frames are required to overlap, with an overlap ratio typically around 50%-80%. The system uses 44100 Hz audio with 512 sampling points per frame, corresponding to a time span of (512 ÷ 44100) × 1000 = 11.61 ms, and an inter-frame overlap of 50%, i.e. 256 sampling points.
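A minimal framing sketch under the stated parameters (512-sample frames, 50% overlap at 44100 Hz):

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split a signal into overlapping frames: 512 samples per frame with a
    256-sample hop gives the text's 50% overlap (~11.61 ms per frame at
    44100 Hz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

x = np.arange(44100, dtype=float)        # one second of samples at 44100 Hz
frames = frame_signal(x)
print(frames.shape, round(512 / 44100 * 1000, 2))   # frame count, frame length in ms
```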
c) Windowing
Framing truncates the continuous audio signal, and truncating an infinite signal causes spectral energy leakage. To reduce this leakage and avoid the discontinuities that may arise at the two ends of each frame, each frame must be windowed. Three window functions are commonly used:
(1) the rectangular window (Rectangular Window)
(2) the Hamming window (Hamming Window)
(3) the Hanning window (Hann Window)
The Hamming and Hanning windows are both generalized raised-cosine windows. In terms of reducing spectral energy leakage the Hanning window is better than the rectangular window, but its frequency resolution is low. The Hamming window is better than the rectangular window at reducing spectral energy leakage while offering better frequency resolution than the Hanning window; the system therefore windows with the Hamming window. The currently standard flow for extracting mel cepstrum coefficients also uses the Hamming window.
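Applying a Hamming window w(n) = 0.54 - 0.46·cos(2πn/(N-1)) to each frame can be sketched as:

```python
import numpy as np

def apply_hamming(frames):
    """Multiply each frame by a Hamming window
    w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1)), tapering the frame edges to
    reduce spectral leakage before the FFT."""
    n = frames.shape[-1]
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / (n - 1))
    return frames * w

frames = np.ones((3, 512))
windowed = apply_hamming(frames)
print(round(float(windowed[0, 0]), 2))   # edge samples scaled to 0.08
```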
As a preferred embodiment of the present invention, the mel cepstrum coefficients, first-order difference of the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the music fragment to be identified are extracted as follows.
1. Mel cepstrum coefficients (MFCC) and their first-order difference
Mel cepstrum coefficients are also known as mel-frequency cepstrum coefficients. The frequency bands of the mel-frequency cepstrum are equally spaced on the mel scale, which approximates the human auditory system more closely than the linearly spaced bands used by the ordinary cepstrum and can therefore represent sound better from the perspective of human hearing. A commonly used formula for converting a frequency f to a mel frequency m is m = 2595·lg(1 + f/700). The flow for extracting mel cepstrum coefficients is as follows:
a) The Fast Fourier Transform (Fast Fourier Transform, FFT) is a fast algorithm for the discrete Fourier transform (Discrete Fourier Transform, DFT). For an N-point sequence {x[n]}, 0 ≤ n < N, the discrete Fourier transform of the audio signal is
X(k) = Σ(n=0..N-1) x(n)·e^(-j2πnk/N), k = 0, 1, ..., N-1
where x(n) is the input audio signal and N is the number of points of the discrete Fourier transform.
Because acoustic characteristics are hard to observe in the time domain, the signal is usually converted to the frequency domain by the discrete Fourier transform to obtain its energy distribution over the spectrum, from which the characteristics of the audio can be further derived. In practice, each preprocessed frame of audio is put through the Fast Fourier Transform to obtain the spectrum of each frame; the modulus of the spectrum is then squared, finally giving the power spectrum of each frame of the signal.
b) The mel filter bank is a set of M triangular band-pass filters uniformly distributed on the mel scale; M usually takes the value 24. The center frequencies are f(m) (m = 1, 2, ..., M), and the interval between successive f(m) widens as m increases. The frequency response of the m-th triangular band-pass filter, Hm(k), rises linearly from zero at f(m-1) to its peak at f(m) and falls linearly back to zero at f(m+1), so that adjacent filters overlap.
Because the range covered by each triangular band-pass filter in the filter bank approximates one critical bandwidth of the human ear, the filter bank is applied to the power spectrum of each frame of the signal in order to simulate the masking effect of the human ear.
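A sketch of constructing such a mel filter bank (M = 24 triangles over a 512-point FFT at 44100 Hz). The warping m = 2595·log10(1 + f/700) is the common choice and is assumed here, since the patent's own formula image is not reproduced; all names are illustrative:

```python
import numpy as np

def mel_filterbank(n_filters=24, n_fft=512, sr=44100):
    """Build M triangular band-pass filters equally spaced on the mel scale.
    Each row is one filter over the n_fft//2 + 1 spectrum bins; the
    triangles widen as the center frequency f(m) increases."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)   # rising edge to f(m)
        for k in range(c, hi):
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)   # falling edge to f(m+1)
    return fb

fb = mel_filterbank()
print(fb.shape)   # (24, 257)
```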
c) The following operation is applied to the outputs of the mel filter bank to obtain the logarithmic energies: s(m) = ln(Σk |X(k)|²·Hm(k)), m = 1, 2, ..., M.
d) A discrete cosine transform is applied to the logarithmic energies obtained above, as follows:
C(l) = Σ(m=0..M-1) s(m)·cos(πl(m + 0.5)/M), l = 1, 2, ..., L
where L is the order of the mel cepstrum coefficients, generally 12-16. Because the 0th-order mel cepstrum coefficient reflects the spectral energy, it is usually discarded, and the following 12-16 coefficients are used as the mel cepstrum coefficients. The mel cepstrum coefficients extracted by the system are of order 12.
e) Because the mel cepstrum coefficients only reflect the static characteristics of the audio, the first-order difference of the mel cepstrum coefficients is calculated in order to describe its dynamic characteristics. The first-order difference formula is:
dt = (Σ(k=1..K) k·(C(t+k) - C(t-k))) / (2·Σ(k=1..K) k²)
where dt is the t-th first-order difference, Ct is the t-th mel cepstrum coefficient, Q is the order of the mel cepstrum coefficients, and K is the time difference of the first derivative, which can take the value 1 or 2.
f) The mel cepstrum coefficients and their first-order difference are merged into a 24-dimensional vector, which serves as the first audio feature.
2. Linear prediction cepstrum coefficients (LPCC)
Linear prediction cepstrum coefficients are obtained from the linear prediction coefficients (LPC). The basic idea of linear prediction is to exploit the correlation between audio sampling points and predict the current or a future sample value from several past sample values. The flow for extracting linear prediction cepstrum coefficients is as follows:
a) Solving the LPC parameters
The vocal-tract characteristics are simulated with an all-pole model whose transfer function is as follows:
H(z) = G / (1 + Σ(k=1..p) ak·z^(-k))
where p is the order of the linear prediction, ak (k = 1, 2, ..., p) are the linear prediction coefficients, and G is the gain of the vocal-tract filter.
Let a frame of the audio signal be x(n); the linear prediction of this frame is:
x̂(n) = -Σ(k=1..p) ak·x(n-k)
The error between the frame of the audio signal and the linear prediction result is:
e(n) = x(n) - x̂(n) = x(n) + Σ(k=1..p) ak·x(n-k)
The linear prediction coefficients are obtained by minimizing the error between the actual sample values of the audio signal and the linear prediction result under the mean-square criterion, i.e. the following value is made to reach a minimum:
E = Σn e²(n)
To minimize this value, the partial derivative with respect to each ak is taken and set to zero; after simplification, the following normal equations must be solved:
Σ(k=1..p) ak·R(|i-k|) = -R(i), i = 1, 2, ..., p
where R(k) is the autocorrelation coefficient of the sequence x(n). The equations can be rewritten in the following matrix form:
Rp·ap = -rp
where Rp is the p×p matrix whose entry in row i, column k is R(|i-k|), and
ap = [ap1, ap2, ..., app]T
rp = [R(1), R(2), ..., R(p)]T
The Levinson-Durbin algorithm, a recursive solution of the equations based on the autocorrelations, is then used to solve for the linear prediction coefficients.
Observation shows that Rp is a Toeplitz matrix, characterized by all elements on each diagonal being equal. Levinson-Durbin exploits this property to compute iteratively: the low-order result is computed first and then used to compute the higher-order result. For the m-th order linear prediction, the coefficient recursion is as follows:
amk = a(m-1)k + ρm·a(m-1)(m-k) = a(m-1)k + amm·a(m-1)(m-k), k = 1, ..., m-1
where am-1 is the (m-1)th-order predictor coefficient vector and the reflection coefficient ρm = amm is
ρm = -(R(m) + (am-1)T·(rm-1)^b) / Em-1
where the superscript b denotes the reversed-order arrangement of the elements of rm-1 = [R(1), R(2), ..., R(m-1)]T, and Em-1 is the prediction error energy of order m-1, updated by Em = (1 - ρm²)·Em-1 from E0 = R(0).
According to this algorithm, the linear prediction coefficients can be obtained.
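A sketch of the Levinson-Durbin recursion described above, in the sign convention of the normal equations Rp·ap = -rp (so A(z) = 1 + Σ ak·z^(-k)); the AR(1) check at the end is an illustrative test, not from the patent:

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations R a = -r recursively, building
    the order-m solution from the order-(m-1) one. `r` holds the
    autocorrelations R(0)..R(order); returns (a_1..a_p, residual energy)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        # reflection coefficient rho_m
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        a[1:m] += k * a[m - 1:0:-1]   # RHS is a fresh temp, so no aliasing
        a[m] = k
        err *= (1.0 - k * k)
    return a[1:], err

# AR(1) check: a process with R(k) proportional to 0.9**k should yield an
# order-1 predictor coefficient of -0.9 and a zero second coefficient.
r = np.array([0.9 ** k for k in range(3)])
a, err = levinson_durbin(r, 2)
print(np.round(a, 3))
```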
b) Solving the LPCC parameters
The cepstrum generally refers to the inverse Fourier transform of the logarithm of the power spectrum; the mathematical expression of the cepstrum function is:
c(q) = IDFT(log(|s(f)|²))
where s(f) is the Fourier transform of the signal s(t), log(·) denotes the logarithm, and IDFT(·) is the inverse Fourier transform.
From the definitions of the linear prediction coefficients and the cepstrum, the linear prediction cepstrum coefficients can be obtained from the linear prediction coefficients by a recursive solution. The specific recurrence relations are as follows:
c(1) = -a1
c(n) = -an - Σ(k=1..n-1) (k/n)·c(k)·a(n-k), 1 < n ≤ p
c(n) = -Σ(k=n-p..n-1) (k/n)·c(k)·a(n-k), n > p
where an is the n-th order linear prediction coefficient. The linear prediction cepstrum coefficients c are obtained from this recurrence relation, i.e.
c = [c(1), c(2), ..., c(q)], 10 ≤ q ≤ 16
where q is the order of the linear prediction cepstrum coefficients; in the present invention q is taken as 12.
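The cepstral recursion can be sketched as follows, keeping the same sign convention as the normal equations above (conventions vary between texts); the AR(1) test values are illustrative:

```python
import numpy as np

def lpc_to_lpcc(a, q=12):
    """LPCC from LPC coefficients, for A(z) = 1 + sum_k a_k z^{-k}:
      c(n) = -a(n) - sum_{k=1}^{n-1} (k/n) c(k) a(n-k),  n <= p
      c(n) = -sum_{k=n-p}^{n-1} (k/n) c(k) a(n-k),        n > p
    matching the cepstrum of 1/A(z)."""
    p = len(a)
    c = np.zeros(q + 1)
    for n in range(1, q + 1):
        acc = -(a[n - 1] if n <= p else 0.0)
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

a = np.array([-0.9])          # A(z) = 1 - 0.9 z^{-1}
c = lpc_to_lpcc(a, q=4)
print(np.round(c, 4))         # follows 0.9**n / n for this single-pole model
```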
3. Perceptual linear prediction (PLP)
Perceptual linear prediction is a feature parameter based on an auditory model. It uses the all-pole model of linear prediction for analysis but, unlike linear prediction, applies the relevant conclusions about the human auditory model to the spectrum analysis: through approximate computation, the audible spectrum is made to conform to the perceptual mechanism of human hearing, which improves the robustness of the audio feature.
Perceptual linear prediction imitates the auditory perception mechanism of the human ear on three levels: critical-band spectral analysis, equal-loudness-curve pre-emphasis, and intensity-loudness conversion. The flow for extracting the perceptual linear prediction feature is as follows:
a) Discrete Fourier transform (DFT)
As with the extraction of the mel cepstrum coefficients MFCC, the Fast Fourier Transform is used to obtain the spectrum of each frame of the audio signal, and the modulus of the spectrum is squared to obtain the power spectrum of each frame. After this process, the power spectrum P(f) of the audio signal is obtained.
b) Critical-band analysis
The masking effect is an important characteristic of human hearing: when two sounds of unequal loudness reach the ear, the louder sound makes the quieter one harder to perceive. Critical-band analysis is the embodiment of the masking effect in this work. The critical band is defined as follows: when a pure tone is masked by continuous noise centered on its frequency with a certain bandwidth, and the pure tone is just audible when its power equals the power of the in-band noise, that bandwidth is called the critical bandwidth. The unit of the critical band is the Bark.
So that the PLP features approximately imitate the perceptual mechanism of human hearing, critical-band analysis is required: the frequency axis f of the power spectrum is mapped to the Bark domain by the following formula:
The audio sampling rate used by the system is 44100 Hz, and 30 frequency bands are obtained after substitution. The mapped power spectrum is then convolved with the simulated critical-band curve to obtain the critical-band power spectrum θ(k):
where the simulated critical-band curve is given as follows:
In these formulas, z0(k) denotes the center frequency of the k-th critical band of the power spectrum.
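The Hz-to-Bark mapping formula itself did not survive extraction; the warping commonly used in PLP is assumed in this sketch:

```python
import numpy as np

def hz_to_bark(f):
    # Common PLP Bark warping z = 6*asinh(f/600); an assumption here,
    # since the patent's own mapping formula was an image.
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

# Bark values over a 44100 Hz system (Nyquist = 22050 Hz)
edges = hz_to_bark([0.0, 1000.0, 22050.0])
```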
c) Equal-loudness pre-emphasis
Loudness describes the ear's subjective sensation of sound; the human outer and middle ear boost sounds in the 1–5 kHz range by 10–20 dB. To imitate this characteristic of the ear, equal-loudness pre-emphasis is required. A simulated 40 dB equal-loudness contour is used for the pre-emphasis because, regardless of the noise intensity, it reflects the ear's sensation of noise loudness well and is widely used as an evaluation criterion for noise.
The simulated equal-loudness contour is:
Pre-emphasis is applied to the critical-band power spectrum using the simulated equal-loudness contour:
τ(k) = E[f0(k)]·θ(k), (k = 1, 2, ..., 30)
where f0(k) denotes the frequency corresponding to the center of the k-th critical band of the power spectrum.
d) Intensity-to-loudness conversion
The relationship between sound intensity and perceived loudness is nonlinear. To simulate this relationship, a conversion between intensity and perceived loudness is required:
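Both the equal-loudness curve and the intensity-to-loudness conversion were images in the original; the sketch below assumes the published PLP forms — the 40 dB equal-loudness approximation and the cube-root (Stevens' power law) compression:

```python
import numpy as np

def equal_loudness(f):
    # Simulated 40 dB equal-loudness curve E(f) in the usual PLP form
    # (assumed; the patent's own expression did not survive extraction).
    f2 = np.asarray(f, dtype=float) ** 2
    return (f2 / (f2 + 1.6e5)) ** 2 * (f2 + 1.44e6) / (f2 + 9.61e6)

def intensity_to_loudness(power):
    # Nonlinear intensity -> perceived loudness conversion: cube-root
    # compression of the pre-emphasized spectrum (Stevens' power law).
    return np.asarray(power, dtype=float) ** (1.0 / 3.0)
```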
e) Inverse discrete Fourier transform (IDFT)
The inverse discrete Fourier transform is the inverse of the discrete Fourier transform. Here, applying the inverse discrete Fourier transform to δ(k) yields the short-time autocorrelation function of the audio signal, in preparation for the subsequent linear prediction analysis.
For an N-point sequence {x[n]}, 0 ≤ n < N, the inverse discrete Fourier transform is
f) All-pole model
Using the all-pole model, linear prediction analysis is performed on the result of the inverse discrete Fourier transform of δ(k); the method is the same as that for solving the linear predictor coefficients. Twelfth-order linear predictor coefficients are solved with the Levinson-Durbin algorithm, and then, by the same method used to solve the linear prediction cepstral coefficients, 16th-order cepstral coefficients are obtained; the result is the perceptual linear prediction (PLP) coefficients.
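The Levinson-Durbin step can be sketched as follows (a minimal textbook implementation, not the patent's own code):

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for predictor coefficients a_1..a_p
    from the autocorrelation sequence r[0..p] (Levinson-Durbin recursion)."""
    a = np.zeros(order)
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a.copy()
        new_a[i - 1] = k
        for j in range(1, i):
            new_a[j - 1] = a[j - 1] - k * a[i - j - 1]
        a = new_a
        err *= (1.0 - k * k)               # updated prediction error
    return a, err
```

For an AR(1) signal with coefficient 0.5 the autocorrelation is proportional to 0.5^m, and the recursion recovers a_1 = 0.5, a_2 = 0.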
In a preferred embodiment of the invention, the Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients of each audio frame form that frame's feature vector; the feature vectors of all frames are combined to obtain the feature matrix of the music segment to be identified.
For each audio frame, 12 Mel cepstral coefficients m, 12 Mel cepstral coefficient first-order differences Δm, 12 linear prediction cepstral coefficients l and 16 perceptual linear prediction coefficients p can be extracted:
m = [m1 m2 ... m12]
Δm = [Δm1 Δm2 ... Δm12]
l = [l1 l2 ... l12]
p = [p1 p2 ... p16]
The four features above are merged into one 52-dimensional overall audio feature vector tz:
tz = [m Δm l p]
For a section of audio, an N × 52 feature matrix A can then be extracted, where N is the total number of frames, determined by the audio duration, sampling rate, frame length and frame shift:
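The assembly of the per-frame vector tz and the N × 52 matrix A can be sketched as below; since only the dimensions are specified at this point, the four per-frame extractors are stubbed with placeholder values:

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_feature_vector():
    """52-dim tz = [m, dm, l, p]; the four stubs stand in for the real
    MFCC / delta-MFCC / LPCC / PLP computations described above."""
    m = rng.standard_normal(12)    # 12 Mel cepstral coefficients
    dm = rng.standard_normal(12)   # 12 first-order differences
    l = rng.standard_normal(12)    # 12 LPCCs
    p = rng.standard_normal(16)    # 16 PLP coefficients
    return np.concatenate([m, dm, l, p])

# N frames -> N x 52 feature matrix A; N follows from duration,
# sampling rate, frame length and frame shift
N = 100
A = np.vstack([frame_feature_vector() for _ in range(N)])
```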
The combined audio feature based on MFCC, LPCC and PLP is proposed as the music recognition criterion because of the distinct characteristics of these features. The Mel cepstral coefficients are based on an auditory model: following the mechanism of the human ear, they are derived from the Mel frequency scale, which is nonlinearly related to Hertz, and offer good noise robustness. The linear prediction cepstral coefficients are based on a vocal-tract model: following the principles of sound production, they are obtained by linear approximation with an all-pole model and reflect the vocal-tract and formant characteristics of the audio signal. Perceptual linear prediction is based on an auditory model: following the mechanism of the human ear, it combines simulation of the ear's characteristics with linear prediction, reflects the masking characteristics of the ear, and offers good noise robustness.
The four chosen features each have their advantages and complement one another, so using the combined audio feature formed from the Mel cepstral coefficients, the linear prediction cepstral coefficients and the perceptual linear prediction coefficients as the recognition criterion allows more comprehensive and more accurate retrieval matching, improving both the music recognition rate and the noise robustness.
In a preferred embodiment, the feature matrix of the music segment to be identified is compared with the sample music feature matrices in the music library to obtain the maximum-similarity feature matrix, which is the sample music feature matrix with the greatest similarity to the music segment to be identified.
Matching consists of comparing the feature matrix of the music segment against the feature matrix of every sample in the audio library, finding the closest feature matrix, and then obtaining the information of the sample music to which that matrix belongs, namely:
- the feature matrix A of the unknown audio sample segment;
- a series of feature matrices B_k(n) provided by the audio library for matching and comparison;
- finding the n corresponding to the matrix most similar to the feature matrix A of the unknown audio sample segment.
For example, let the feature matrix of the music segment be A, an N × 52 matrix, and let the feature matrix of any piece of music in the audio library be B, an M × 52 matrix with M ≥ N.
Submatrices B_k with the same number of rows as A are intercepted from B in turn, where k denotes the position of the first row of submatrix B_k within the feature matrix B. The step Δk of k defaults to 1, so matrices B_k and B_{k+1} share N − 1 identical rows. A complete identification requires comparing M − N + 1 matrices B_k with matrix A in turn and computing the similarity ψ(k).
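The sliding comparison of the submatrices B_k against A can be sketched generically, with the similarity function left pluggable in anticipation of the three methods below:

```python
import numpy as np

def best_match(A, B, similarity, dk=1):
    """Slide an N-row window B_k over B with step dk and return the offset k
    whose submatrix scores highest under `similarity` (higher = more alike)."""
    N, M = A.shape[0], B.shape[0]
    best_k, best_score = 0, -np.inf
    for k in range(0, M - N + 1, dk):
        score = similarity(A, B[k:k + N])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# demo: embed a known 3-row window at offset 3 and recover it
B = np.arange(20.0).reshape(10, 2)
k_best, _ = best_match(B[3:6].copy(), B, lambda X, Y: -np.abs(X - Y).sum())
```

With dk = 1 this performs the full M − N + 1 comparisons; a larger step trades accuracy for speed.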
There are various methods of calculating the similarity of two matrices; three are outlined below.
a) Matrix absolute-value distance
The simplest way to measure the similarity of two matrices is their absolute-value distance. The N × 52 matrices A and B_k are subtracted, and the element-wise absolute value of the difference gives matrix C:
C = |A − B_k|
Summing all elements of matrix C then gives the difference measure ψ(k) of the two matrices:
ψ(k) = C(1,1) + C(1,2) + C(1,3) + ... + C(i,j) + ... + C(N,52),
1 ≤ i ≤ N, 1 ≤ j ≤ 52
Although this method is easy to implement, intuitive and easy to understand, it is susceptible to zero values: once a long pause occurs in the music, the matching performance suffers.
b) Cosine law of the vector space model
The similarity of two matrices is, in essence, a comparison of the differences between the values at corresponding positions; as long as the correspondence between values is preserved, the shape of the matrices may be changed. The two matrices are therefore reshaped into two high-dimensional vectors, and the audio similarity is computed by the cosine law of the vector space model (VSM), following the method used for text similarity in natural language processing.
According to the cosine law of the vector space model, the similarity of two audio sections can be expressed by the relative position of their audio features in a high-dimensional space, and that relative position is expressed by the cosine of the angle between the two vectors: the smaller the angle, the greater the similarity.
The N × 52 matrices A and B_k are converted into two 1 × 52N matrices, i.e. two 52N-dimensional vectors a1 and a2:
a1((i−1)·52 + j) = A(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 52
a2((i−1)·52 + j) = B_k(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 52
The similarity ψ of matrices A and B_k is then computed by the cosine law of the vector space:
where vector a2 changes as matrix B_k changes.
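A sketch of the cosine method: both matrices are flattened row by row into 52N-dimensional vectors, exactly as in the index formulas above, and the cosine of their angle is the similarity:

```python
import numpy as np

def cosine_similarity(A, Bk):
    # row-major flattening matches a((i-1)*52 + j) = A(i, j)
    a1, a2 = A.ravel(), Bk.ravel()
    return float(np.dot(a1, a2) / (np.linalg.norm(a1) * np.linalg.norm(a2)))

# demo: uniform scaling leaves the angle, and hence the similarity, unchanged
M = np.arange(1.0, 13.0).reshape(3, 4)
sim_scaled = cosine_similarity(M, 2.0 * M)
```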
c) Cosine law of the vector space model with Euclidean distance
As in the previous method, the matrix similarity is obtained with the cosine law of the vector space model; the difference is that each feature matrix is first divided into three submatrices according to the dimensions of the three classes of audio features. The similarities between the audio-sample submatrices and the audio-library submatrices are computed, and the similarity of the feature matrices is then obtained via Euclidean distance.
The N × 52 matrices A and B_k are each divided into three submatrices according to the feature dimensions: A1 and B_k1 of size N × 24, A2 and B_k2 of size N × 12, and A3 and B_k3 of size N × 16. The six submatrices are converted into six vectors of different sizes:
a1((i−1)·24 + j) = A1(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 24
a2((i−1)·12 + j) = A2(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 12
a3((i−1)·16 + j) = A3(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 16
b1((i−1)·24 + j) = B_k1(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 24
b2((i−1)·12 + j) = B_k2(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 12
b3((i−1)·16 + j) = B_k3(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ 16
The cosine law of the vector space gives the similarities ψ1(k), ψ2(k), ψ3(k) of submatrix A1 with B_k1, A2 with B_k2, and A3 with B_k3:
The similarity ψ(k) of matrices A and B_k is then computed from these via Euclidean distance:
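A sketch of the third method. The column split (24 + 12 + 16 = 52) follows the three feature families; the final Euclidean folding formula was an image that did not survive extraction, so combining ψ1, ψ2, ψ3 via their distance from the ideal point (1, 1, 1), rescaled into a similarity, is an assumption:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_similarity(A, Bk):
    # Per-family cosine similarities psi1, psi2, psi3 over the MFCC+delta,
    # LPCC and PLP column blocks of the N x 52 matrices.
    blocks = (slice(0, 24), slice(24, 36), slice(36, 52))
    psis = np.array([cosine(A[:, s].ravel(), Bk[:, s].ravel()) for s in blocks])
    # Fold via Euclidean distance from the ideal similarity (1, 1, 1);
    # this folding step is an assumption (the patent's formula was an image).
    return 1.0 - float(np.linalg.norm(1.0 - psis)) / np.sqrt(3.0)

# demo: a matrix compared with itself has psi = 1 in every block
M = np.arange(1.0, 105.0).reshape(2, 52)
psi_same = combined_similarity(M, M)
```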
The third method was selected for the similarity computation in the implemented system. An (M − N + 1)-dimensional similarity vector is obtained; by comparison, the value ψ_max closest to 1 among the ψ(k) and its corresponding k are found, which then give the corresponding time position t in the audio:
where L is the frame length and sr is the sampling rate; here L is 512, sr is 44100, and t is in seconds.
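The time-position formula itself did not survive extraction; with frames advancing by one frame length L, the natural reading t = k·L/sr is assumed in this sketch:

```python
def frame_to_seconds(k, frame_len=512, sr=44100):
    # Best-match offset k -> time position in seconds, assuming t = k * L / sr
    # (an assumption; the patent's own formula was an image).
    return k * frame_len / sr
```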
The feature matrix A of the music segment is compared for similarity with the feature matrices B_k(n) of all audio in the library, where n is the number of audio items in the library, yielding ψ_max and t for the sample against every piece of audio. The ψ_max values of all audio in the library form an n-dimensional vector ψ_max(n), and the t values form an n-dimensional vector t(n).
By comparison, the entry of ψ_max(n) closest to 1 is found; the corresponding n then gives the audio title and the time position, completing the identification.
3. Fast matching
The matching process described above is relatively comprehensive and accurate, but correspondingly slower. When real-time performance matters more and the accuracy requirement is lower, the value of Δk can be set greater than 1 to speed up the movement of submatrix B_k. Because Δk increases to an integer greater than 1, the number of identical rows shared by consecutive submatrices B_k and B_{k+Δk} falls to N − Δk, and the number of matrix similarity computations needed for a complete identification falls from M − N + 1 to ⌊(M − N)/Δk⌋ + 1. The larger Δk is, the greater the reduction in computation and the more significant the speed-up.
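The comparison count under a step Δk can be checked with a one-liner; ⌊(M − N)/Δk⌋ + 1 (the natural reading of the reduction described above, since the original count formula was an image) reduces to M − N + 1 when Δk = 1:

```python
def num_comparisons(M, N, dk):
    # Number of submatrices B_k compared when k advances in steps of dk.
    return (M - N) // dk + 1
```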
The music recognition method provided by the present invention uses the three basic features of Mel cepstral coefficients, linear prediction cepstral coefficients and perceptual linear prediction in combination to realize music identification. The three classes of audio features have different characteristics and advantages, and combining them yields better recognition results. Because perceptual linear prediction simulates the masking effect of the human ear, it provides better noise robustness and a better recognition rate, and the resulting recognition performance is better.
The present invention also provides a music recognition system. Fig. 2 is a structural schematic of the music recognition system of an embodiment of the present invention. As shown in Fig. 2, the system includes:
a to-be-identified music segment acquisition module 201, configured to obtain a music segment to be identified;
a parameter extraction module 202, configured to extract the Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients of each audio frame in the music segment to be identified;
a feature vector determination module 203, configured to form each frame's feature vector from the frame's Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients;
a feature matrix determination module 204, configured to combine the feature vectors of the frames to obtain the feature matrix of the music segment to be identified;
a matching module 205, configured to compare the feature matrix of the music segment to be identified with the sample music feature matrices in the music library to obtain a maximum-similarity feature matrix, the maximum-similarity feature matrix being the sample music feature matrix with the greatest similarity to the music segment to be identified;
a music information acquisition module 206, configured to obtain the music information of the sample music feature matrix with the greatest similarity; and
a music information output module 207, configured to output the music information.
The system further includes:
a preprocessing module, configured to preprocess the music segment to be identified, the preprocessing including pre-emphasis, framing and windowing.
The matching module 205 specifically includes:
a matrix interception unit, configured to intercept from the sample music feature matrix B a matrix B_k with the same number of rows as the feature matrix of the music segment to be identified, there being a plurality of sample music feature matrices, where k = 1, 2, ..., M − N + 1, Δk = 1, tz(1), tz(2), ..., tz(M) are the feature vectors of the audio frames in the sample music feature matrix B, M is the number of feature vectors in the music feature matrix B, and N is the number of rows of the feature matrix of the music segment to be identified, matrix B_k being marked as a feature matrix to be compared;
a similarity calculation unit, configured to calculate the similarity between each feature matrix to be compared and the feature matrix of the music segment to be identified, and to obtain the feature matrix to be compared with the greatest similarity to the feature matrix of the music segment to be identified, the music feature matrix to which that feature matrix to be compared belongs being the maximum-similarity music feature matrix;
a judging unit, configured to judge whether the calculation needs to be completed within a preset time with the accuracy requirement below a preset threshold; and
a setting unit, configured to set Δk to an integer greater than 1 when the calculation needs to be completed within the preset time and the accuracy requirement is below the preset threshold.
The similarity calculation unit specifically includes:
a similarity calculation subunit, configured to calculate the similarity between the feature matrix to be compared and the feature matrix of the music segment to be identified using the matrix absolute-value distance method, the cosine law of the vector space model, or the cosine law of the vector space model combined with Euclidean distance.
The music recognition system provided by the present invention uses the three basic features of Mel cepstral coefficients, linear prediction cepstral coefficients and perceptual linear prediction in combination to realize music identification. The three classes of audio features have different characteristics and advantages, and combining them yields better recognition results. Because perceptual linear prediction simulates the masking effect of the human ear, it provides better noise robustness and a better recognition rate, and the resulting recognition performance is better.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be referred to one another. Since the system disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively simple, and the relevant points may be found in the description of the method.
Specific examples are used herein to set forth the principle and implementation of the present invention; the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. For those of ordinary skill in the art, changes may be made to the specific implementation and application scope in accordance with the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (10)
1. A music recognition method, characterized in that the method comprises:
obtaining a music segment to be identified;
extracting the Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients of each audio frame in the music segment to be identified;
forming the feature vector of each audio frame from the frame's Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients;
combining the feature vectors of the audio frames to obtain the feature matrix of the music segment to be identified;
comparing the feature matrix of the music segment to be identified with the sample music feature matrices in a music library to obtain a maximum-similarity feature matrix, the maximum-similarity feature matrix being the sample music feature matrix with the greatest similarity to the music segment to be identified;
obtaining the music information of the sample music feature matrix with the greatest similarity; and
outputting the music information.
2. The recognition method according to claim 1, characterized in that, after obtaining the music segment to be identified and before extracting the Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients of each audio frame in the music segment to be identified, the method further comprises:
preprocessing the music segment to be identified, the preprocessing comprising pre-emphasis, framing and windowing.
3. The recognition method according to claim 1, characterized in that comparing the feature matrix of the music segment to be identified with the sample music feature matrices in the music library to obtain the maximum-similarity sample music feature matrix specifically comprises:
intercepting from the sample music feature matrix B a matrix B_k with the same number of rows as the feature matrix of the music segment to be identified, there being a plurality of sample music feature matrices, where k = 1, 2, ..., M − N + 1, Δk = 1, tz(1), tz(2), ..., tz(M) are the feature vectors of the audio frames in the sample music feature matrix B, M is the number of feature vectors in the music feature matrix B, and N is the number of rows of the feature matrix of the music segment to be identified, matrix B_k being marked as a feature matrix to be compared; and
calculating the similarity between each feature matrix to be compared and the feature matrix of the music segment to be identified, and obtaining the feature matrix to be compared with the greatest similarity to the feature matrix of the music segment to be identified, the music feature matrix to which that feature matrix to be compared belongs being the maximum-similarity music feature matrix.
4. The recognition method according to claim 3, characterized in that, before intercepting from the music feature matrix B the matrix B_k with the same number of rows as the feature matrix of the music segment to be identified, where Δk = 1, k = 1, 2, ..., M − N + 1, tz(1), tz(2), ..., tz(M) are the feature vectors of the audio frames in the music feature matrix B, M is the number of feature vectors in the music feature matrix B, and N is the number of rows of the feature matrix of the music segment to be identified, the method further comprises:
judging whether the calculation needs to be completed within a preset time with the accuracy requirement below a preset threshold; and
if so, setting Δk to an integer greater than 1.
5. The recognition method according to claim 3, characterized in that calculating the similarity between the feature matrix to be compared and the feature matrix of the music segment to be identified specifically comprises:
calculating the similarity between the feature matrix to be compared and the feature matrix of the music segment to be identified using the matrix absolute-value distance method, the cosine law of the vector space model, or the cosine law of the vector space model combined with Euclidean distance.
6. A music recognition system, characterized in that the system comprises:
a to-be-identified music segment acquisition module, configured to obtain a music segment to be identified;
a parameter extraction module, configured to extract the Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients of each audio frame in the music segment to be identified;
a feature vector determination module, configured to form the feature vector of each audio frame from the frame's Mel cepstral coefficients, Mel cepstral coefficient first-order differences, linear prediction cepstral coefficients and perceptual linear prediction coefficients;
a feature matrix determination module, configured to combine the feature vectors of the audio frames to obtain the feature matrix of the music segment to be identified;
a matching module, configured to compare the feature matrix of the music segment to be identified with the sample music feature matrices in a music library to obtain a maximum-similarity feature matrix, the maximum-similarity feature matrix being the sample music feature matrix with the greatest similarity to the music segment to be identified;
a music information acquisition module, configured to obtain the music information of the sample music feature matrix with the greatest similarity; and
a music information output module, configured to output the music information.
7. The identification system according to claim 6, characterized in that the system further comprises:
a preprocessing module, configured to preprocess the music segment to be identified, the preprocessing comprising pre-emphasis, framing and windowing.
8. The identification system according to claim 6, characterized in that the matching module specifically comprises:
a matrix interception unit, configured to intercept from the sample music feature matrix B a matrix B_k with the same number of rows as the feature matrix of the music segment to be identified, there being a plurality of sample music feature matrices, where k = 1, 2, ..., M − N + 1, Δk = 1, tz(1), tz(2), ..., tz(M) are the feature vectors of the audio frames in the sample music feature matrix B, M is the number of feature vectors in the music feature matrix B, and N is the number of rows of the feature matrix of the music segment to be identified, matrix B_k being marked as a feature matrix to be compared; and
a similarity calculation unit, configured to calculate the similarity between each feature matrix to be compared and the feature matrix of the music segment to be identified, and to obtain the feature matrix to be compared with the greatest similarity to the feature matrix of the music segment to be identified, the music feature matrix to which that feature matrix to be compared belongs being the maximum-similarity music feature matrix.
9. The identification system according to claim 8, characterized in that the matching unit further comprises:
a judging unit, configured to judge whether the calculation needs to be completed within a preset time with the accuracy requirement below a preset threshold; and
a setting unit, configured to set Δk to an integer greater than 1 when the calculation needs to be completed within the preset time and the accuracy requirement is below the preset threshold.
10. The identification system according to claim 8, characterized in that the similarity calculation unit specifically comprises:
a similarity calculation subunit, configured to calculate the similarity between the feature matrix to be compared and the feature matrix of the music segment to be identified using the matrix absolute-value distance method, the cosine law of the vector space model, or the cosine law of the vector space model combined with Euclidean distance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710077359.2A CN106919662B (en) | 2017-02-14 | 2017-02-14 | Music identification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106919662A true CN106919662A (en) | 2017-07-04 |
CN106919662B CN106919662B (en) | 2021-08-31 |
Family
ID=59454524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710077359.2A Active CN106919662B (en) | 2017-02-14 | 2017-02-14 | Music identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106919662B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107731233A (en) * | 2017-11-03 | 2018-02-23 | 王华锋 | A kind of method for recognizing sound-groove based on RNN |
CN108665903A (en) * | 2018-05-11 | 2018-10-16 | 复旦大学 | A kind of automatic testing method and its system of audio signal similarity degree |
CN108735230A (en) * | 2018-05-10 | 2018-11-02 | 佛山市博知盾识科技有限公司 | Background music recognition methods, device and equipment based on mixed audio |
CN109308913A (en) * | 2018-08-02 | 2019-02-05 | 平安科技(深圳)有限公司 | Sound quality evaluation method, device, computer equipment and storage medium |
CN109802987A (en) * | 2018-09-11 | 2019-05-24 | 北京京东方技术开发有限公司 | For the content delivery method of display device, driving means and display equipment |
WO2019184523A1 (en) * | 2018-03-29 | 2019-10-03 | 北京字节跳动网络技术有限公司 | Media feature comparison method and device |
CN110717062A (en) * | 2018-07-11 | 2020-01-21 | 阿里巴巴集团控股有限公司 | Music searching and vehicle-mounted music playing method, device, equipment and storage medium |
CN111429891A (en) * | 2020-03-30 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Audio data processing method, device and equipment and readable storage medium |
CN112102846A (en) * | 2020-09-04 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Audio processing method and device, electronic equipment and storage medium |
WO2021051681A1 (en) * | 2019-09-19 | 2021-03-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Song recognition method and apparatus, storage medium and electronic device |
CN113345443A (en) * | 2021-04-22 | 2021-09-03 | 西北工业大学 | Marine mammal vocalization detection and identification method based on mel-frequency cepstrum coefficient |
CN113432856A (en) * | 2021-06-28 | 2021-09-24 | 西门子电机(中国)有限公司 | Motor testing method, device, electronic equipment and storage medium |
CN114036341A (en) * | 2022-01-10 | 2022-02-11 | 腾讯科技(深圳)有限公司 | Music tag prediction method and related equipment |
WO2022148163A1 (en) * | 2021-01-05 | 2022-07-14 | 北京字跳网络技术有限公司 | Method and apparatus for positioning music clip, and device and storage medium |
CN114783152A (en) * | 2022-03-30 | 2022-07-22 | 郑州熙禾智能科技有限公司 | Energy storage power station fire alarm method and system based on gas-sound information fusion |
CN116546264A (en) * | 2023-04-10 | 2023-08-04 | 北京度友信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005049859A (en) * | 2003-07-28 | 2005-02-24 | Sony Corp | Method and device for automatically recognizing audio data |
CN104462537A (en) * | 2014-12-24 | 2015-03-25 | 北京奇艺世纪科技有限公司 | Method and device for classifying voice data |
CN104882146A (en) * | 2015-05-12 | 2015-09-02 | 百度在线网络技术(北京)有限公司 | Method and device for processing audio popularization information |
CN105810213A (en) * | 2014-12-30 | 2016-07-27 | 浙江大华技术股份有限公司 | Typical abnormal sound detection method and device |
CN105893389A (en) * | 2015-01-26 | 2016-08-24 | 阿里巴巴集团控股有限公司 | Voice message search method, device and server |
CN105976812A (en) * | 2016-04-28 | 2016-09-28 | 腾讯科技(深圳)有限公司 | Voice identification method and equipment thereof |
WO2016156554A1 (en) * | 2015-04-01 | 2016-10-06 | Spotify Ab | System and method for generating dynamic playlists utilising device co-presence proximity |
Non-Patent Citations (2)
Title |
---|
Hua Bin, Yin Wenhui, Zhang Yilin: "A query-by-humming music retrieval application system", Computer Engineering and Applications * |
Hu Zhengquan: "Research on speech parameter extraction methods for speaker recognition", China Masters' Theses Full-text Database, Information Science and Technology Series * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107731233A (en) * | 2017-11-03 | 2018-02-23 | 王华锋 | Voiceprint recognition method based on RNN |
WO2019184523A1 (en) * | 2018-03-29 | 2019-10-03 | 北京字节跳动网络技术有限公司 | Media feature comparison method and device |
CN110569373A (en) * | 2018-03-29 | 2019-12-13 | 北京字节跳动网络技术有限公司 | Media feature comparison method and device |
US11593582B2 (en) | 2018-03-29 | 2023-02-28 | Beijing Bytedance Network Technology Co., Ltd. | Method and device for comparing media features |
CN108735230B (en) * | 2018-05-10 | 2020-12-04 | 上海麦克风文化传媒有限公司 | Background music identification method, device and equipment based on mixed audio |
CN108735230A (en) * | 2018-05-10 | 2018-11-02 | 佛山市博知盾识科技有限公司 | Background music recognition methods, device and equipment based on mixed audio |
CN108665903A (en) * | 2018-05-11 | 2018-10-16 | 复旦大学 | Automatic detection method and system for audio signal similarity |
CN108665903B (en) * | 2018-05-11 | 2021-04-30 | 复旦大学 | Automatic detection method and system for audio signal similarity |
CN110717062B (en) * | 2018-07-11 | 2024-03-22 | 斑马智行网络(香港)有限公司 | Music search and vehicle-mounted music playing method, device, equipment and storage medium |
CN110717062A (en) * | 2018-07-11 | 2020-01-21 | 阿里巴巴集团控股有限公司 | Music searching and vehicle-mounted music playing method, device, equipment and storage medium |
CN109308913A (en) * | 2018-08-02 | 2019-02-05 | 平安科技(深圳)有限公司 | Sound quality evaluation method, device, computer equipment and storage medium |
US11410706B2 (en) | 2018-09-11 | 2022-08-09 | Beijing Boe Technology Development Co., Ltd. | Content pushing method for display device, pushing device and display device |
WO2020052324A1 (en) * | 2018-09-11 | 2020-03-19 | 京东方科技集团股份有限公司 | Content pushing method used for display apparatus, pushing apparatus, and display device |
CN109802987A (en) * | 2018-09-11 | 2019-05-24 | 北京京东方技术开发有限公司 | Content pushing method for display device, pushing device and display device |
WO2021051681A1 (en) * | 2019-09-19 | 2021-03-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Song recognition method and apparatus, storage medium and electronic device |
CN111429891B (en) * | 2020-03-30 | 2022-03-04 | 腾讯科技(深圳)有限公司 | Audio data processing method, device and equipment and readable storage medium |
CN111429891A (en) * | 2020-03-30 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Audio data processing method, device and equipment and readable storage medium |
CN112102846A (en) * | 2020-09-04 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Audio processing method and device, electronic equipment and storage medium |
WO2022148163A1 (en) * | 2021-01-05 | 2022-07-14 | 北京字跳网络技术有限公司 | Method and apparatus for positioning music clip, and device and storage medium |
CN113345443A (en) * | 2021-04-22 | 2021-09-03 | 西北工业大学 | Marine mammal vocalization detection and identification method based on mel-frequency cepstrum coefficient |
CN113432856A (en) * | 2021-06-28 | 2021-09-24 | 西门子电机(中国)有限公司 | Motor testing method, device, electronic equipment and storage medium |
CN114036341A (en) * | 2022-01-10 | 2022-02-11 | 腾讯科技(深圳)有限公司 | Music tag prediction method and related equipment |
CN114783152A (en) * | 2022-03-30 | 2022-07-22 | 郑州熙禾智能科技有限公司 | Energy storage power station fire alarm method and system based on gas-sound information fusion |
CN116546264A (en) * | 2023-04-10 | 2023-08-04 | 北京度友信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106919662B (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106919662A (en) | Music recognition method and system | |
EP2659482B1 (en) | Ranking representative segments in media data | |
Singh et al. | Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement | |
Mitrović et al. | Features for content-based audio retrieval | |
CN108962279A (en) | Musical instrument recognition method and device for audio data, electronic equipment, and storage medium |
CN101599271B (en) | Recognition method of digital music emotion | |
JP5295433B2 (en) | Perceptual tempo estimation with scalable complexity | |
Cartwright et al. | Social-EQ: Crowdsourcing an Equalization Descriptor Map. | |
CN103440873B (en) | A similarity-based music recommendation method |
Dressler | Pitch estimation by the pair-wise evaluation of spectral peaks | |
CN107851444A | Method and system for decomposing an acoustic signal into target sounds, target sound and use thereof |
CN106997765B (en) | Quantitative characterization method for human voice timbre | |
Mehrabi et al. | Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders | |
Yu et al. | Sparse cepstral codes and power scale for instrument identification | |
CN110534091A (en) | A human-vehicle interaction method based on micro-server and intelligent voice recognition |
Zhang | Application of audio visual tuning detection software in piano tuning teaching | |
Meng | Research on timbre classification based on BP neural network and MFCC | |
Kreković et al. | An algorithm for controlling arbitrary sound synthesizers using adjectives | |
Hinrichs et al. | Convolutional neural networks for the classification of guitar effects and extraction of the parameter settings of single and multi-guitar effects from instrument mixes | |
Dorochowicz et al. | Classification of Music Genres by Means of Listening Tests and Decision Algorithms | |
Orio | A model for human-computer interaction based on the recognition of musical gestures | |
Rajan et al. | Multi-channel CNN-Based Rāga Recognition in Carnatic Music Using Sequential Aggregation Strategy | |
Mo | Music timbre extracted from audio signal features | |
Ezers et al. | Musical Instruments Recognition App | |
Lekshmi et al. | Predominant Instrument Recognition in Polyphonic Music Using Convolutional Recurrent Neural Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||