CN106919662A - Music recognition method and system - Google Patents

Music recognition method and system

Info

Publication number
CN106919662A
CN106919662A (application CN201710077359.2A)
Authority
CN
China
Prior art keywords
music
eigenmatrix
identified
snatch
similarity
Prior art date
Legal status
Granted
Application number
CN201710077359.2A
Other languages
Chinese (zh)
Other versions
CN106919662B (en)
Inventor
李伟
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201710077359.2A priority Critical patent/CN106919662B/en
Publication of CN106919662A publication Critical patent/CN106919662A/en
Application granted granted Critical
Publication of CN106919662B publication Critical patent/CN106919662B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Abstract

The invention discloses a music recognition method and system. The method includes: obtaining a snatch of music to be identified; extracting the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified; forming the feature vector of each frame of audio from the above coefficients; combining the feature vectors of all frames to obtain the eigenmatrix of the snatch of music to be identified; comparing the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in a music library to obtain the maximum-similarity eigenmatrix, the maximum-similarity eigenmatrix being the sample musical feature matrix with the greatest similarity to the snatch of music to be identified; obtaining the music information of the sample musical feature matrix with the greatest similarity; and outputting the music information. The music recognition method and system provided by the present invention have better noise immunity and a higher recognition rate, and the recognition effect is better.

Description

Music recognition method and system
Technical field
The present invention relates to the field of music recognition, and more particularly to a music recognition method and system.
Background technology
With the flourishing of science and technology, and especially the emergence and rapid development of computer technology, Sound and Music Computing (SMC) has become an emerging interdisciplinary field, which mainly uses computational methods to understand, simulate and produce sound and music.
With the maturation and development of various technologies, people can obtain music through multiple channels, of which the network is the most convenient and fastest. This has led to explosive growth of the music available on the network, and people have become accustomed to retrieving and downloading music through the network. However, music spread through the Internet is often not marked with complete music information, which on the one hand makes management difficult and on the other hand causes confusion for those who obtain it. Meanwhile, in daily life, music is often simply played back, and the relevant information such as the song title and the singer does not always appear together with the music. For a music lover who has heard a piece of music, is interested in it, and wants to learn about and obtain it, this is a very troublesome problem. Accordingly, the retrieval of music has become a very important problem.
There are mainly two ways of retrieving music: one is text-based music retrieval, and the other is content-based music retrieval. In a text-based music retrieval system, the user submits keyword information such as the song title, the singer's name or a fragment of the lyrics, and the keywords are then used to retrieve and match the music information in a database. This technique is very mature and widely used, but its limitations are also fairly obvious: an unknown piece of audio for which no clear keyword information can be provided cannot be retrieved or recognized, and the keyword information provided by the user is also very prone to error, causing retrieval errors and failures.
Music recognition, namely content-based music retrieval, differs from text-based music retrieval in that it does not rely on the text information of the music but identifies the music directly from the content of the music sample fragment itself, so as to achieve the purpose of retrieval. Although music itself has complicated physical characteristics, every song has relatively stable features, and a piece of music can be characterized by these stable features; music recognition completes the identification according to these music features. Many features can be extracted from music, such as the number of beats per minute and the start and end time points, and different theories and methods yield different audio features. Different audio features have different characteristics, and appropriate features can be chosen for identification under different conditions. Music recognition overcomes the limitations of text-based music retrieval and focuses on the music itself; it has better adaptability and practicality and will increasingly become the main trend in music retrieval.
Music recognition, as a comparatively basic and practical problem in sound and music computing, has received attention both at home and abroad. The music recognition service of the Shazam company realizes music recognition by extracting a music fingerprint (Music Fingerprinting, MFP) and then matching it; it proposes a fingerprint extraction method based on feature points, i.e. feature points are found in the spectrum, and the sequence formed by combining the feature points into peak pairs (Peak-Pairs) is the fingerprint of the fragment. There is also a classical automatic music recognition system based on hidden Markov models, which clusters the extracted chroma features (Chroma Features) and then performs recognition using a hidden Markov model.
There is also a mass music retrieval system optimized on the basis of the Shazam work, which optimizes the fingerprint by adding three links to the fingerprint extraction process: spectrum optimization, peak filtering, and setting feature-point confidence. A linear alignment matching method has also been used to realize approximate melody matching and to build a query-by-singing retrieval system on this basis; this method is mainly based on the geometric similarity of the pitch contours of similar melodies, and is a new method formed by considering pitch and rhythm features together.
However, the above music recognition methods have a low recognition rate for music, and their recognition effect is unsatisfactory and needs further improvement.
Summary of the invention
It is an object of the present invention to provide a music recognition method and system that have better noise immunity, a higher recognition rate and a better recognition effect.
To achieve the above object, the present invention provides the following scheme:
A music recognition method, the method including:
Obtaining a snatch of music to be identified;
Extracting the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified;
Using the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of the audio to form the feature vector of the audio;
Combining the feature vectors of each frame of the audio to obtain the eigenmatrix of the snatch of music to be identified;
Comparing the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in a music library to obtain the maximum-similarity eigenmatrix, the maximum-similarity eigenmatrix being the sample musical feature matrix with the greatest similarity to the snatch of music to be identified;
Obtaining the music information of the sample musical feature matrix with the greatest similarity;
Outputting the music information.
Optionally, after the obtaining of the snatch of music to be identified and before the extracting of the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the music to be identified, the method further includes:
Pretreating the snatch of music to be identified, the pretreatment including preemphasis, framing and windowing.
Optionally, the comparing of the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in the music library to obtain the sample musical feature matrix with the greatest similarity specifically includes:
Intercepting, from the sample musical feature matrix B = [tz(1); tz(2); …; tz(M)], matrices B_k = [tz(k); tz(k+1); …; tz(k+N-1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, there being a plurality of such matrices, wherein k = 1, 2, …, M-N+1, Δk = 1, tz(1), tz(2), …, tz(M) are the feature vectors of each frame of audio in the sample musical feature matrix B, M is the number of feature vectors in the musical feature matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, and labelling each matrix B_k as an eigenmatrix to be compared;
Calculating the similarity between each eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified, and obtaining the eigenmatrix to be compared with the greatest similarity to the eigenmatrix of the snatch of music to be identified, the musical feature matrix to which that eigenmatrix to be compared belongs being the maximum-similarity musical feature matrix.
Optionally, before the intercepting, from the musical feature matrix B = [tz(1); tz(2); …; tz(M)], of the matrices B_k = [tz(k); tz(k+1); …; tz(k+N-1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, wherein Δk = 1, k = 1, 2, …, M-N+1, tz(1), tz(2), …, tz(M) are the feature vectors of each frame of audio in the musical feature matrix B, M is the number of feature vectors in the musical feature matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, the method further includes:
Judging whether the calculation needs to be completed within a preset time and whether the accuracy requirement is below a preset threshold;
If so, setting Δk to an integer greater than 1.
Optionally, the calculating of the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified specifically includes:
Calculating the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified using the matrix absolute value distance method, the cosine law method of the vector space model, or a method combining the cosine law of the vector space model with the Euclidean distance.
The present invention also provides a music recognition system, the system including:
a snatch-of-music acquisition module, configured to obtain a snatch of music to be identified;
a parameter extraction module, configured to extract the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified;
a feature vector determining module, configured to use the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of the audio to form the feature vector of the audio;
an eigenmatrix determining module, configured to combine the feature vectors of each frame of the audio to obtain the eigenmatrix of the snatch of music to be identified;
a matching module, configured to compare the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in a music library to obtain the maximum-similarity eigenmatrix, the maximum-similarity eigenmatrix being the sample musical feature matrix with the greatest similarity to the snatch of music to be identified;
a music information acquisition module, configured to obtain the music information of the sample musical feature matrix with the greatest similarity;
a music information output module, configured to output the music information.
Optionally, the system also includes:
a pretreatment module, configured to pretreat the snatch of music to be identified, the pretreatment including preemphasis, framing and windowing.
Optionally, the matching module specifically includes:
a matrix interception unit, configured to intercept, from the sample musical feature matrix B = [tz(1); tz(2); …; tz(M)], matrices B_k = [tz(k); tz(k+1); …; tz(k+N-1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, there being a plurality of such matrices, wherein k = 1, 2, …, M-N+1, Δk = 1, tz(1), tz(2), …, tz(M) are the feature vectors of each frame of audio in the sample musical feature matrix B, M is the number of feature vectors in the musical feature matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, and to label each matrix B_k as an eigenmatrix to be compared;
a similarity calculation unit, configured to calculate the similarity between each eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified, and to obtain the eigenmatrix to be compared with the greatest similarity to the eigenmatrix of the snatch of music to be identified, the musical feature matrix to which that eigenmatrix to be compared belongs being the maximum-similarity musical feature matrix.
Optionally, the matching module further includes:
a judging unit, configured to judge whether the calculation needs to be completed within a preset time and whether the accuracy requirement is below a preset threshold;
a setting unit, configured to set Δk to an integer greater than 1 when the calculation needs to be completed within the preset time and the accuracy requirement is below the preset threshold.
Optionally, the similarity calculation unit specifically includes:
a similarity calculation subunit, configured to calculate the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified using the matrix absolute value distance method, the cosine law method of the vector space model, or a method combining the cosine law of the vector space model with the Euclidean distance.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects: the present invention uses the three basic features of mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction to carry out a comprehensive identification of music; the three classes of audio features have different characteristics and advantages, and a better recognition result can be obtained after combining them. Because perceptual linear prediction simulates the masking effect of the human ear, it has better noise immunity and a higher recognition rate, and the recognition effect obtained is better.
Brief description of the drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the accompanying drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of the music recognition method of an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the music recognition system of an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
It is an object of the present invention to provide a music recognition method and system whose accuracy and noise immunity are significantly improved.
To make the above objects, features and advantages of the present invention more apparent and easier to understand, the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of the steps of the music recognition method of an embodiment of the present invention. As shown in Fig. 1, the steps of the music recognition method are as follows:
Step 101: obtain a snatch of music to be identified;
Step 102: extract the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified;
Step 103: use the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of the audio to form the feature vector of the audio;
Step 104: combine the feature vectors of each frame of the audio to obtain the eigenmatrix of the snatch of music to be identified;
Step 105: compare the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in the music library to obtain the maximum-similarity eigenmatrix, the maximum-similarity eigenmatrix being the sample musical feature matrix with the greatest similarity to the snatch of music to be identified;
Step 106: obtain the music information of the sample musical feature matrix with the greatest similarity;
Step 107: output the music information.
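The overall flow of steps 101–107 can be summarised by the following Python-style sketch. It is only a structural outline under the assumption that per-frame feature extraction and matrix similarity are supplied separately; the helper names extract_frame_features and similarity are illustrative and not part of the patent.

```python
import numpy as np

def build_eigenmatrix(frames, extract_frame_features):
    """Steps 102-104: one 52-dim feature vector per frame, stacked into an N x 52 matrix."""
    return np.vstack([extract_frame_features(f) for f in frames])

def identify(frames, library, extract_frame_features, similarity):
    """Steps 101-107: compare the clip's eigenmatrix with every sample matrix in the library."""
    A = build_eigenmatrix(frames, extract_frame_features)   # eigenmatrix of the clip
    best_info, best_score = None, -np.inf
    for info, B in library:                                  # (music info, M x 52 matrix)
        score = similarity(A, B)                             # step 105
        if score > best_score:
            best_info, best_score = info, score
    return best_info                                         # steps 106-107
```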
Wherein, after step 101 and before step 102, the method further includes:
pretreating the snatch of music to be identified, the pretreatment including preemphasis, framing and windowing.
Step 105 specifically includes:
judging whether the calculation needs to be completed within a preset time and whether the accuracy requirement is below a preset threshold;
if so, setting Δk to an integer greater than 1;
intercepting, from the sample musical feature matrix B = [tz(1); tz(2); …; tz(M)], matrices B_k = [tz(k); tz(k+1); …; tz(k+N-1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, there being a plurality of such matrices, wherein k = 1, 2, …, M-N+1, Δk = 1 by default, tz(1), tz(2), …, tz(M) are the feature vectors of each frame of audio in the sample musical feature matrix B, M is the number of feature vectors in the musical feature matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, and labelling each matrix B_k as an eigenmatrix to be compared;
calculating the similarity between each eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified, and obtaining the eigenmatrix to be compared with the greatest similarity to the eigenmatrix of the snatch of music to be identified, the musical feature matrix to which that eigenmatrix to be compared belongs being the maximum-similarity musical feature matrix. When calculating the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified, the matrix absolute value distance method, the cosine law method of the vector space model, or a method combining the cosine law of the vector space model with the Euclidean distance is used.
As a preferred embodiment, the pretreatment includes three operations: preemphasis, framing and windowing.
A) preemphasis treatment
Preemphasis passes the audio signal through a high-pass filter whose transfer function is
H(z) = 1 - μz^(-1),
where the parameter μ usually lies between 0.9 and 1.0 and is generally taken as 0.97. The purpose of preemphasis is to boost the high-frequency components of the audio signal, make its spectrum flatter and strengthen the vocal tract characteristics, in preparation for spectrum analysis or vocal tract parameter analysis.
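A minimal numpy sketch of the preemphasis filter y(n) = x(n) − μ·x(n−1) described above, assuming the signal is already a 1-D float array (μ = 0.97 as in the text):

```python
import numpy as np

def preemphasis(x, mu=0.97):
    """Apply the high-pass filter H(z) = 1 - mu*z^-1 to a 1-D audio signal."""
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - mu * x[:-1]
    return y
```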
B) framing
As a whole, an audio signal is a non-stationary, stochastic process, but over short intervals (usually 10 ms–30 ms) the audio signal shows a certain stability. Previous research has therefore paid more attention to short-time analysis of the quasi-stationary characteristics of audio. The continuous audio signal is divided into short segments containing an equal number of sampling points, i.e. frames, and the three audio features to be extracted are analysed on a frame basis. Framing preserves the short-time stationarity of the audio signal and lays the foundation for short-time analysis. Meanwhile, to keep the continuity and dynamics of the audio signal, adjacent frames are required to overlap, and the overlap ratio between frames is typically about 50%–80%. The system uses 44100 Hz audio, takes 512 sampling points as one frame, corresponding to a frame length of (512 ÷ 44100) × 1000 = 11.61 ms, and uses a frame overlap of 50%, i.e. 256 sampling points.
C) Windowing
Framing a continuous audio signal truncates what is ideally an infinite signal, which causes spectral energy leakage. To reduce the spectral energy leakage and avoid possible signal discontinuities at the two ends of each frame, each frame must be windowed. Three window functions are commonly used:
(1) Rectangular window (Rectangular Window)
(2) Hamming window (Hamming Window)
(3) Hanning window (Hann Window)
In terms of window functions, the Hamming window and the Hanning window are both generalized raised-cosine windows. From the point of view of reducing spectral energy leakage, the Hanning window is better than the rectangular window but has lower frequency resolution. Compared with the rectangular window, the Hamming window is better at reducing spectral energy leakage, and its frequency resolution is better than that of the Hanning window. Therefore, the system uses the Hamming window for windowing. The current standard procedure for extracting mel cepstrum coefficients also uses the Hamming window.
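A sketch of the framing and Hamming windowing described above, using the parameters given in the text (512-sample frames, 50% overlap); numpy is assumed:

```python
import numpy as np

def frame_and_window(x, frame_len=512, overlap=0.5):
    """Split a 1-D signal into overlapping frames and apply a Hamming window to each."""
    hop = int(frame_len * (1.0 - overlap))           # 256 samples at 50% overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hamming(frame_len)
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return frames                                     # shape: (n_frames, frame_len)
```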
As a preferred embodiment of the present invention, the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified are extracted.
1. Mel cepstrum coefficients (MFCC) and their first-order difference
Mel cepstrum coefficients are also known as mel-frequency cepstral coefficients. The frequency bands of the mel-frequency cepstrum are equally spaced on the Mel scale; compared with the linearly spaced bands used in the ordinary cepstrum, they are closer to the human auditory system and can represent sound better from the point of view of human hearing. A commonly used formula for converting a frequency f to a mel frequency m is
m = 2595 · log10(1 + f / 700).
The procedure for extracting the mel cepstrum coefficients is as follows:
a) The Fast Fourier Transform (FFT) is a fast algorithm for the Discrete Fourier Transform (DFT). For an N-point sequence {x[n]}, 0 ≤ n < N, its discrete Fourier transform is
X[k] = Σ_{n=0}^{N-1} x[n] e^(-j2πnk/N), 0 ≤ k < N.
Accordingly, the discrete Fourier transform of the audio signal is written as
X(k) = Σ_{n=0}^{N-1} x(n) e^(-j2πnk/N),
where x(n) is the input audio signal and N is the number of points of the discrete Fourier transform.
Because acoustic characteristics are difficult to observe in the time domain, the signal is usually converted to the frequency domain by the discrete Fourier transform in order to obtain its energy distribution over the spectrum, which can then be processed further to obtain the characteristics of the audio. In practice, the spectrum of each pretreated frame is obtained by the Fast Fourier Transform, the squared magnitude of the spectrum is then taken, and finally the power spectrum of each frame signal is obtained.
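The per-frame power spectrum computation described above (FFT, then squared magnitude) can be sketched as follows; a real-input FFT is used since the windowed frames are real-valued:

```python
import numpy as np

def power_spectrum(frames, n_fft=512):
    """FFT each windowed frame and return |X(k)|^2 (one row per frame)."""
    spectrum = np.fft.rfft(frames, n=n_fft, axis=-1)   # shape: (n_frames, n_fft//2 + 1)
    return np.abs(spectrum) ** 2
```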
b) The Mel filter bank is a group of M triangular band-pass filters distributed uniformly on the Mel scale, where M usually takes 24; the centre frequencies are f(m) (m = 1, 2, …, M), and the spacing between the f(m) widens as m increases. The frequency response of the m-th triangular band-pass filter is
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)) for f(m-1) ≤ k ≤ f(m),
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)) for f(m) ≤ k ≤ f(m+1),
and H_m(k) = 0 elsewhere, where f(m-1) and f(m+1) are the centre frequencies of the adjacent filters.
Because the range covered by each triangular band-pass filter in the filter bank approximates a critical bandwidth of the human ear, the filter bank is used to filter the power spectrum of each frame signal in order to simulate the masking effect of the human ear.
c) The following operation is applied to the output of the Mel filter bank to obtain the logarithmic energy of each filter:
s(m) = ln( Σ_k |X(k)|² H_m(k) ), m = 1, 2, …, M.
d) A discrete cosine transform is then applied to the logarithmic energies obtained above:
C(n) = Σ_{m=1}^{M} s(m) cos( πn(m - 0.5) / M ), n = 1, 2, …, L,
where L is the order of the mel cepstrum coefficients, usually 12–16. Because the 0th-order mel cepstrum coefficient reflects the spectral energy, it is usually not used, and the following 12–16 coefficients are taken as the mel cepstrum coefficients. The mel cepstrum coefficients extracted by the system are of order 12.
e) Because the mel cepstrum coefficients only reflect the static characteristics of the audio, the first-order difference of the mel cepstrum coefficients is calculated in order to describe its dynamic characteristics. The first-order difference is computed as
d_t = ( Σ_{k=1}^{K} k (c_{t+k} - c_{t-k}) ) / ( 2 Σ_{k=1}^{K} k² ),
where d_t is the t-th first-order difference, c_t is the t-th mel cepstrum coefficient, Q is the order of the mel cepstrum coefficients, and K is the time span of the first derivative, which can take the value 1 or 2.
f) The mel cepstrum coefficients and their first-order differences are merged into a 24-dimensional vector, which is taken as the first audio feature.
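As an illustration, the 12 mel cepstrum coefficients and their first-order differences described in steps a)–f) can be obtained with an off-the-shelf library such as librosa. This is one possible implementation and not the patent's own code; the parameter choices follow the text (44100 Hz, 512-sample frames, 256-sample hop, 0th coefficient dropped):

```python
import numpy as np
import librosa

def mfcc_and_delta(y, sr=44100, n_mfcc=13, n_fft=512, hop_length=256):
    """Return an (n_frames x 24) matrix: 12 MFCCs (0th order dropped) plus their deltas."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop_length)  # (13, n_frames)
    mfcc = mfcc[1:, :]                       # drop the 0th coefficient (spectral energy)
    delta = librosa.feature.delta(mfcc)      # first-order difference
    return np.vstack([mfcc, delta]).T        # (n_frames, 24)
```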
2. Linear prediction cepstrum coefficients (LPCC)
The linear prediction cepstrum coefficients are obtained from the linear prediction coefficients (LPC). The basic idea of linear prediction is to use the correlation between audio samples, so that several past sample values predict the present or a future sample value. The procedure for extracting the linear prediction cepstrum coefficients is as follows:
A) LPC parameters are solved
The vocal tract characteristics are modelled with an all-pole model whose transfer function is
H(z) = G / (1 + Σ_{k=1}^{p} a_k z^(-k)),
where p is the order of the linear prediction, a_k (k = 1, 2, …, p) are the linear prediction coefficients, and G is the gain of the vocal tract filter.
If a frame of the audio signal is x(n), the linear prediction of this audio is
x̂(n) = -Σ_{k=1}^{p} a_k x(n - k).
The error between the frame of the audio signal and the linear prediction result is
e(n) = x(n) - x̂(n).
The linear prediction coefficients are obtained by minimizing the mean square error between the actual audio samples and the linear prediction result, i.e. the following expression is minimized:
E = Σ_n e²(n).
To minimize the above expression, the partial derivative with respect to each a_k is taken and set to zero; after simplification, the following normal equations need to be solved:
Σ_{k=1}^{p} a_k R(|i - k|) = -R(i), i = 1, 2, …, p,
where R(k) is the autocorrelation coefficient of the sequence x(n). The above equations can be rewritten in the following matrix form:
R_p · a_p = -r_p
where R_p is the p × p Toeplitz matrix whose (i, j) element is R(|i - j|),
a_p = [a_p1, a_p2, …, a_pp]^T,
r_p = [R(1), R(2), …, R(p)]^T.
The Levinson-Durbin algorithm, i.e. the recursive solution of the equations based on the autocorrelation, is then used to solve for the linear prediction coefficients.
It can be observed that R_p is a Toeplitz matrix, in which all the elements on each diagonal are equal. Levinson-Durbin exploits this property to compute iteratively: the low-order result is computed first and is then used to compute the higher-order result. For an m-th order linear prediction, the linear prediction coefficients can be obtained by the following recursion:
a_mk = a_(m-1)k + k_m · a_(m-1)(m-k) = a_(m-1)k + a_mm · a_(m-1)(m-k), k = 1, 2, …, m-1,
where a_(m-1) denotes the (m-1)-th order prediction coefficients, k_m = a_mm is the m-th reflection coefficient, r_(m-1) = [R(1), R(2), …, R(m-1)]^T, and the superscript b denotes the arrangement of the elements of r_(m-1) in reverse order.
According to this algorithm, the linear prediction coefficients can be obtained.
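A sketch of solving the LPC coefficients of one frame from its autocorrelation, using scipy's Toeplitz solver (which implements the Levinson-Durbin recursion) rather than an explicit recursion; the sign convention matches the normal equations R_p · a_p = -r_p above, and numpy/scipy are assumed to be available:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    """Solve the normal equations R_p a_p = -r_p for one windowed audio frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # autocorrelation R(0), R(1), ...
    a = solve_toeplitz(r[:order], -r[1:order + 1])                # Levinson-Durbin-based solver
    return a                                                      # a_1 ... a_p
```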
B) LPCC parameters are solved
The cepstrum generally refers to the inverse Fourier transform of the logarithm of the power spectrum; the mathematical expression of the cepstrum is
c(q) = IDFT( log( |S(f)|² ) ),
where S(f) is the Fourier transform of the signal s(t), log(·) denotes taking the logarithm, and IDFT(·) is the inverse Fourier transform.
Given the definitions of the linear prediction coefficients and of the cepstrum, the linear prediction cepstrum coefficients can be computed from the linear prediction coefficients by a recursion. The specific recursion is of the form
c(1) = a_1,
c(n) = a_n + Σ_{k=1}^{n-1} (k/n) c(k) a_(n-k), 1 < n ≤ p,
c(n) = Σ_{k=n-p}^{n-1} (k/n) c(k) a_(n-k), n > p,
where a_n is the n-th order linear prediction coefficient.
The linear prediction cepstrum coefficient vector c is obtained from this recursion, i.e.
c = [c(1), c(2), …, c(q)], 10 ≤ q ≤ 16,
where q is the order of the linear prediction cepstrum coefficients and is taken as 12 in the present invention.
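The LPC-to-LPCC recursion sketched above can be written as follows. The sign convention of the a-coefficients is an assumption and may need to be flipped depending on how the LPC step defines them:

```python
import numpy as np

def lpc_to_lpcc(a, q=12):
    """Convert LPC coefficients a[0..p-1] (i.e. a_1..a_p) into q cepstral coefficients."""
    p = len(a)
    c = np.zeros(q)
    for n in range(1, q + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```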
3. Perceptual linear prediction (PLP)
Perceptual linear prediction is a characteristic parameter based on an auditory model. It also uses the all-pole model analysis of linear prediction, but, unlike linear prediction, it applies conclusions about the human auditory model to the spectrum analysis: through approximate calculations, the audible spectrum is made to match the perception mechanism of the human auditory system, which improves the robustness of the audio feature.
Perceptual linear prediction imitates the auditory perception mechanism of the human ear at three levels: critical-band analysis, equal-loudness preemphasis, and intensity-to-loudness conversion. The procedure for extracting the perceptual linear prediction feature is as follows:
A) discrete Fourier transform DFT
Similar to the extraction of the mel cepstrum coefficients (MFCC), the Fast Fourier Transform is used to obtain the spectrum of each frame of the audio signal, and the squared magnitude of the spectrum gives the power spectrum of each frame. After this process, the power spectrum P(f) of the audio signal is obtained.
B) critical band analysis
The masking effect of the human auditory system is an important characteristic: when two sounds of unequal loudness reach the ear, the louder sound makes the quieter sound hard to perceive. In research, critical-band analysis is an embodiment of the masking effect. The critical band refers to the following: when a pure tone is masked by continuous noise centred on its frequency and having a certain bandwidth, if the power at which the pure tone can just be heard equals the noise power within that band, then that bandwidth is called the critical bandwidth. The unit of the critical band is the Bark.
In order for the PLP feature to approximately imitate the perception mechanism of the human auditory system, critical-band analysis is required: the frequency axis f of the power spectrum is mapped to the Bark domain, a commonly used mapping being
z(f) = 6 ln( f/600 + sqrt( (f/600)² + 1 ) ).
The audio sampling rate used by the system is 44100 Hz, and 30 frequency bands are obtained after substitution. The mapped power spectrum is then convolved with the simulated critical-band curve Ψ(z) to obtain the critical-band power spectrum θ(k):
θ(k) = Σ_z P(z) Ψ(z - z_0(k)),
where the simulated critical-band curve Ψ(z) is an asymmetric piecewise-exponential curve that approximates the auditory masking curve, and z_0(k) denotes the centre frequency of the k-th band of the critical-band power spectrum.
c) Equal-loudness preemphasis
Loudness describes the subjective perception of sound by the human ear; the outer ear and middle ear boost sounds in the 1–5 kHz range by about 10–20 dB. In order to imitate this characteristic of the human ear, equal-loudness preemphasis must be performed. A simulated 40 dB equal-loudness contour is used for the preemphasis, because it reflects the human perception of noise loudness well regardless of the noise intensity and is widely used as an evaluation criterion for noise.
The simulated equal-loudness contour E[f] is used to preemphasize the critical-band power spectrum:
τ(k) = E[f_0(k)] · θ(k), k = 1, 2, …, 30,
where f_0(k) denotes the frequency corresponding to the centre frequency of the k-th band of the critical-band power spectrum.
D) Intensity-to-loudness conversion
Because the relation between the intensity of a sound and the loudness perceived by the ear is nonlinear, a conversion between intensity and loudness is needed in order to simulate this relation; a cube-root compression is applied:
δ(k) = τ(k)^(1/3).
E) Inverse discrete Fourier transform (IDFT)
The inverse discrete Fourier transform is the inverse of the discrete Fourier transform. Here the inverse discrete Fourier transform of δ(k) is taken to obtain the short-time autocorrelation function of the audio signal, in preparation for the subsequent linear prediction analysis.
For an N-point sequence {x[n]}, 0 ≤ n < N, the inverse discrete Fourier transform is
x[n] = (1/N) Σ_{k=0}^{N-1} X[k] e^(j2πnk/N).
F) all-pole modeling
Using the all-pole model, linear prediction analysis is performed on the result of the inverse discrete Fourier transform of δ(k), in the same way as when solving the linear prediction coefficients. The 12th-order linear prediction coefficients are solved with the Levinson-Durbin algorithm, and then, using the same method as for solving the linear prediction cepstrum coefficients, 16th-order cepstrum coefficients are solved; the result is the perceptual linear prediction coefficients.
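The PLP chain in steps a)–f) can be sketched, in a greatly simplified form, as follows. The Bark-band grouping here uses rectangular bands instead of the smoothed critical-band curve and the equal-loudness weighting is omitted, so this is only an assumption-laden illustration of the order of operations, not the patent's exact computation; it reuses the lpc_to_lpcc recursion from the LPCC sketch above.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def hz_to_bark(f):
    return 6.0 * np.arcsinh(f / 600.0)               # Bark mapping commonly used in PLP analysis

def plp_cepstrum(power_spec, sr=44100, n_bands=30, lpc_order=12, n_ceps=16):
    """Simplified PLP: Bark grouping -> cube-root loudness -> IDFT -> all-pole model -> cepstrum."""
    n_bins = len(power_spec)
    freqs = np.linspace(0, sr / 2, n_bins)
    edges = np.linspace(0, hz_to_bark(sr / 2), n_bands + 1)
    band_idx = np.digitize(hz_to_bark(freqs), edges) - 1
    theta = np.array([power_spec[band_idx == b].sum() for b in range(n_bands)])
    loud = np.cbrt(theta)                             # intensity-to-loudness (cube root)
    # short-time autocorrelation via inverse FFT of the auditory spectrum
    r = np.fft.irfft(loud, n=2 * (n_bands - 1))[:lpc_order + 1]
    a = solve_toeplitz(r[:lpc_order], -r[1:lpc_order + 1])
    return lpc_to_lpcc(a, q=n_ceps)                   # cepstral conversion, as for LPCC
```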
As a preferred embodiment of the present invention, the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of the audio are used to form the feature vector of the audio, and the feature vectors of each frame of the audio are combined to obtain the eigenmatrix of the snatch of music to be identified.
For each frame of audio, the 12-dimensional mel cepstrum coefficients m, the 12-dimensional mel cepstrum coefficient first-order differences Δm, the 12-dimensional linear prediction cepstrum coefficients l and the 16-dimensional perceptual linear prediction coefficients p can be extracted:
m = [m_1 m_2 … m_12]
Δm = [Δm_1 Δm_2 … Δm_12]
l = [l_1 l_2 … l_12]
p = [p_1 p_2 … p_16]
The above four features are merged into a 52-dimensional overall audio feature vector tz:
tz = [m Δm l p].
For a segment of audio, an eigenmatrix A of size N × 52 can therefore be extracted, A = [tz(1); tz(2); …; tz(N)], where N denotes the total number of frames and is determined by the audio duration, sampling rate, frame length and frame shift.
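Assembling the four per-frame features into the 52-dimensional vector tz and the N × 52 eigenmatrix A can be sketched as follows; the four extractor outputs are placeholders for the MFCC/ΔMFCC, LPCC and PLP computations described above:

```python
import numpy as np

def frame_feature_vector(mfcc12, dmfcc12, lpcc12, plp16):
    """tz = [m, delta m, l, p]: 12 + 12 + 12 + 16 = 52 dimensions."""
    return np.concatenate([mfcc12, dmfcc12, lpcc12, plp16])

def clip_eigenmatrix(per_frame_features):
    """Stack the per-frame 52-dim vectors into the N x 52 eigenmatrix A."""
    return np.vstack([frame_feature_vector(*f) for f in per_frame_features])
```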
The combined audio feature based on the mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction is proposed as the music recognition criterion because of the different characteristics these features have. The mel cepstrum coefficients are based on an auditory model: according to the mechanism of the human ear, the result is obtained through the mel frequency, which has a nonlinear relation with the frequency in hertz, and they therefore have good noise immunity. The linear prediction cepstrum coefficients are based on a vocal tract model: according to the principle of sound production, the result is obtained through the linear approximation of an all-pole model, and they reflect the vocal tract characteristics and formant character of the audio signal. Perceptual linear prediction is based on an auditory model: according to the mechanism of the human ear, the result is obtained by simulating the characteristics of the human ear in combination with linear prediction; it reflects the masking characteristics of the human ear and has good noise immunity.
The four chosen features each have their own advantages and are complementary, so the overall audio feature formed by combining the mel cepstrum coefficients and their first-order differences, the linear prediction cepstrum coefficients and the perceptual linear prediction coefficients is used as the recognition criterion; a more comprehensive and more accurate retrieval and matching can be carried out, and the music recognition rate and noise immunity are improved.
As a preferred embodiment, the eigenmatrix of the snatch of music to be identified is compared with the sample musical feature matrices in the music library to obtain the maximum-similarity eigenmatrix, the maximum-similarity eigenmatrix being the sample musical feature matrix with the greatest similarity to the snatch of music to be identified.
Matching means comparing the eigenmatrix of the snatch of music with the eigenmatrix of every sample music item in the audio library, finding the closest eigenmatrix and then obtaining the information of the sample music to which that matrix belongs, namely:
- the eigenmatrix A of the unknown audio sample fragment;
- the audio library provides a series of eigenmatrices B_k(n) used for matching and comparison;
- find the n corresponding to the matrix most similar to the eigenmatrix A of the unknown audio sample fragment.
For example, the eigenmatrix of the snatch of music is A, a matrix of size N × 52, and the eigenmatrix of any piece of music in the audio library is B, a matrix of size M × 52, with M ≥ N:
Submatrices B_k having the same number of rows as the matrix A are intercepted from the matrix B in turn, B_k = [tz(k); tz(k+1); …; tz(k+N-1)],
where k denotes the position of the first row of the submatrix B_k in the eigenmatrix B, and the step Δk of the values of k defaults to 1, i.e. the matrices B_k and B_(k+1) have N-1 rows in common. Completing the identification requires comparing M-N+1 matrices B_k with the matrix A in turn and calculating the similarity ψ(k).
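The interception of the submatrices B_k described above amounts to sliding an N-row window down the library matrix B; a numpy sketch, with the step Δk exposed as a parameter and defaulting to 1 as in the text:

```python
import numpy as np

def sliding_submatrices(B, N, delta_k=1):
    """Yield (k, B_k) where B_k = B[k-1 : k-1+N] for k = 1, 1+delta_k, ..., up to M-N+1."""
    M = B.shape[0]
    for k in range(1, M - N + 2, delta_k):
        yield k, B[k - 1 : k - 1 + N]
```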
There are various methods for calculating the similarity of two matrices; three methods are outlined below:
A) matrix absolute value distance
There are various methods for calculating the degree of similarity of two matrices; the simplest is to compute the absolute value distance between the two matrices.
The matrices A and B_k, both of size N × 52, are subtracted to obtain the matrix difference, and the absolute value of the difference gives the matrix C:
C=| A-Bk|
Then all the elements of the matrix C are summed to obtain the degree of difference ψ(k) of the two matrices:
ψ(k) = C(1,1) + C(1,2) + C(1,3) + … + C(i,j) + … + C(N,52),
1 ≤ i ≤ N, 1 ≤ j ≤ 52.
Although this method is relatively easy to implement and intuitive to understand, it is easily affected by zero values: once a long pause occurs in the music, the matching effect will be affected.
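Method a) as a numpy sketch (note that here ψ(k) is a difference, so smaller values mean more similar matrices):

```python
import numpy as np

def abs_distance(A, Bk):
    """Sum of element-wise absolute differences between two N x 52 matrices."""
    return float(np.abs(A - Bk).sum())
```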
B) cosine law of vector space model
It can be considered that the similarity of two matrices is, in essence, a comparison of the degree of difference between the values at corresponding positions of the two matrices; as long as the correspondence of the values does not change, the form of the matrices may be changed. The two matrices are therefore reshaped into two high-dimensional vectors, and then, by analogy with the way text similarity is computed in natural language processing, the audio similarity is calculated using the cosine law of the Vector Space Model.
According to the cosine law of the vector space model, the similarity of two audio segments can be expressed by the relative position of their audio features in an N-dimensional space, and the relative position of the audio features is expressed by the cosine of the angle between the two vectors: the smaller the angle between them, the greater the similarity.
The matrices A and B_k, both of size N × 52, are converted into two matrices of size 1 × 52N, i.e. two 52N-dimensional vectors a1 and a2:
a1((i-1) * 52+j)=A (i, j), 1≤i≤N, 1≤j≤52
a2((i-1) * 52+j)=Bk(i, j), 1≤i≤N, 1≤j≤52
According to the cosine law of space vectors, the similarity ψ(k) of the matrix A and the matrix B_k is calculated as
ψ(k) = (a1 · a2) / (‖a1‖ · ‖a2‖),
where the vector a2 changes as the matrix B_k changes.
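Method b) as a numpy sketch: both matrices are flattened into 52N-dimensional vectors and compared by the cosine of the angle between them.

```python
import numpy as np

def cosine_similarity(A, Bk):
    """Cosine of the angle between the flattened feature matrices."""
    a1, a2 = A.ravel(), Bk.ravel()
    return float(np.dot(a1, a2) / (np.linalg.norm(a1) * np.linalg.norm(a2)))
```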
C) cosine law & Euclidean distances of vector space model
Similar to the previous method, the cosine law of the vector space model is used to obtain the similarity of the matrices; the difference is that the eigenmatrix is first divided into three submatrices according to the dimensions of the three classes of audio features, the similarities between the audio-sample submatrices and the audio-library submatrices are computed, and the similarity of the eigenmatrices is then obtained through the Euclidean distance.
The matrices A and B_k, both of size N × 52, are first divided into three submatrices each according to the dimensions of the three classes of features: A_1 and B_k1 of size N × 24, A_2 and B_k2 of size N × 12, and A_3 and B_k3 of size N × 16. The six submatrices obtained are converted into six vectors of different sizes:
a1((i-1) * 24+j)=A1(i, j), 1≤i≤N, 1≤j≤24
a2((i-1) * 12+j)=A2(i, j), 1≤i≤N, 1≤j≤12
a3((i-1) * 16+j)=A3(i, j), 1≤i≤N, 1≤j≤16
b1((i-1) * 24+j)=Bk1(i, j), 1≤i≤N, 1≤j≤24
b2((i-1) * 12+j)=Bk2(i, j), 1≤i≤N, 1≤j≤12
b3((i-1) * 16+j)=Bk3(i, j), 1≤i≤N, 1≤j≤16
According to the cosine law of space vectors, the similarities ψ_1(k), ψ_2(k) and ψ_3(k) of submatrix A_1 with submatrix B_k1, submatrix A_2 with submatrix B_k2, and submatrix A_3 with submatrix B_k3 are calculated as
ψ_i(k) = (a_i · b_i) / (‖a_i‖ · ‖b_i‖), i = 1, 2, 3.
The three similarities are then combined through the Euclidean distance to obtain the similarity ψ(k) of the matrix A and the matrix B_k.
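Method c) splits the 52 columns into the three feature groups (24 MFCC+Δ, 12 LPCC, 16 PLP) and computes one cosine similarity per group. The patent's exact Euclidean-distance combination of ψ1, ψ2, ψ3 is not reproduced above, so the final line of the sketch below, which combines them by their normalised Euclidean norm, is only an assumption:

```python
import numpy as np

def _cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def blockwise_similarity(A, Bk):
    """Cosine similarity per feature block, then a Euclidean-style combination (assumed form)."""
    blocks = [(0, 24), (24, 36), (36, 52)]          # MFCC+delta, LPCC, PLP columns
    psis = [_cos(A[:, s:e].ravel(), Bk[:, s:e].ravel()) for s, e in blocks]
    return float(np.linalg.norm(psis) / np.sqrt(len(psis)))   # assumed combination
```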
The third method is selected for the similarity calculation when the system is implemented. An (M-N+1)-dimensional similarity vector is obtained; then, by comparison, the similarity ψ_max closest to 1 among the ψ(k) and its corresponding k are found, and the corresponding time position t in the audio is then computed from k, the frame length and the sampling rate,
where L denotes the frame length and sr denotes the sampling rate; L is 512, sr is 44100, and the unit of t is the second.
The eigenmatrix A of the snatch of music is compared with the eigenmatrices B_k(n) of all the audio items in the audio library (n being the number of audio items in the library), and the similarity calculation is carried out to obtain ψ_max and t of the audio sample against every piece of audio; the ψ_max values of all the audio items in the library form an n-dimensional vector ψ_max(n), and the t values of all the audio items form an n-dimensional vector t(n).
By comparison, the element of the n-dimensional vector ψ_max(n) closest to 1 is found; using the corresponding n, the title of the audio and the time position are obtained, and the identification is thereby completed.
3. Rapid matching
The matching process described above is the more comprehensive and accurate method, and correspondingly its matching speed is slower. For situations with higher real-time requirements and lower accuracy requirements, the value of Δk can be set to be greater than 1, so as to speed up the movement of the submatrix B_k. Because Δk is increased to an integer greater than 1, the number of identical rows between the successively compared matrices B_k and B_(k+Δk) is reduced to N-Δk, and the number of matrix similarity calculations needed to complete the identification is reduced from M-N+1 to approximately (M-N+1)/Δk; the larger Δk is, the greater the reduction in the number of calculations and the more significant the increase in matching speed.
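Using the sliding-window sketch from earlier with Δk > 1 gives the speed-up described here directly: the number of compared submatrices drops from M−N+1 to roughly (M−N+1)/Δk. The helpers sliding_submatrices and blockwise_similarity are the ones sketched above.

```python
def best_offset(A, B, delta_k=4):
    """Return (psi_max, k) over submatrices taken every delta_k rows of B."""
    best_psi, best_k = -1.0, None
    for k, Bk in sliding_submatrices(B, A.shape[0], delta_k=delta_k):
        psi = blockwise_similarity(A, Bk)
        if psi > best_psi:
            best_psi, best_k = psi, k
    return best_psi, best_k
```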
The music recognition method provided by the present invention uses the three basic features of mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction to carry out a comprehensive identification of music. The three classes of audio features have different characteristics and advantages, and a better recognition result can be obtained after combining them; because perceptual linear prediction simulates the masking effect of the human ear, it has better noise immunity and a higher recognition rate, and the recognition effect obtained is better.
The present invention also provides a music recognition system. Fig. 2 is a structural schematic diagram of the music recognition system of an embodiment of the present invention. As shown in Fig. 2, the system includes:
a snatch-of-music acquisition module 201, configured to obtain a snatch of music to be identified;
a parameter extraction module 202, configured to extract the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified;
a feature vector determining module 203, configured to use the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of the audio to form the feature vector of the audio;
an eigenmatrix determining module 204, configured to combine the feature vectors of each frame of the audio to obtain the eigenmatrix of the snatch of music to be identified;
a matching module 205, configured to compare the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in the music library to obtain the maximum-similarity eigenmatrix, the maximum-similarity eigenmatrix being the sample musical feature matrix with the greatest similarity to the snatch of music to be identified;
a music information acquisition module 206, configured to obtain the music information of the sample musical feature matrix with the greatest similarity;
a music information output module 207, configured to output the music information.
The system also includes:
a pretreatment module, configured to pretreat the snatch of music to be identified, the pretreatment including preemphasis, framing and windowing.
The matching module 205 specifically includes:
a matrix interception unit, configured to intercept, from the sample musical feature matrix B = [tz(1); tz(2); …; tz(M)], matrices B_k = [tz(k); tz(k+1); …; tz(k+N-1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, there being a plurality of such matrices, wherein k = 1, 2, …, M-N+1, Δk = 1, tz(1), tz(2), …, tz(M) are the feature vectors of each frame of audio in the sample musical feature matrix B, M is the number of feature vectors in the musical feature matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, and to label each matrix B_k as an eigenmatrix to be compared;
a similarity calculation unit, configured to calculate the similarity between each eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified, and to obtain the eigenmatrix to be compared with the greatest similarity to the eigenmatrix of the snatch of music to be identified, the musical feature matrix to which that eigenmatrix to be compared belongs being the maximum-similarity musical feature matrix;
a judging unit, configured to judge whether the calculation needs to be completed within a preset time and whether the accuracy requirement is below a preset threshold;
a setting unit, configured to set Δk to an integer greater than 1 when the calculation needs to be completed within the preset time and the accuracy requirement is below the preset threshold.
The similarity calculation unit specifically includes:
a similarity calculation subunit, configured to calculate the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified using the matrix absolute value distance method, the cosine law method of the vector space model, or a method combining the cosine law of the vector space model with the Euclidean distance.
The music recognition system provided by the present invention uses the three basic features of mel cepstrum coefficients, linear prediction cepstrum coefficients and perceptual linear prediction to carry out a comprehensive identification of music. The three classes of audio features have different characteristics and advantages, and a better recognition result can be obtained after combining them; because perceptual linear prediction simulates the masking effect of the human ear, it has better noise immunity and a higher recognition rate, and the recognition effect obtained is better.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments can be referred to one another. For the system disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively simple, and the relevant parts can be found in the description of the method.
Specific examples are used herein to explain the principle and implementation of the present invention; the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as a limitation of the present invention.

Claims (10)

1. A music recognition method, characterized in that the method includes:
obtaining a snatch of music to be identified;
extracting the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified;
using the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of the audio to form the feature vector of the audio;
combining the feature vectors of each frame of the audio to obtain the eigenmatrix of the snatch of music to be identified;
comparing the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in a music library to obtain the maximum-similarity eigenmatrix, the maximum-similarity eigenmatrix being the sample musical feature matrix with the greatest similarity to the snatch of music to be identified;
obtaining the music information of the sample musical feature matrix with the greatest similarity;
outputting the music information.
2. The recognition method according to claim 1, characterized in that, after the obtaining of the snatch of music to be identified and before the extracting of the mel cepstrum coefficients, mel cepstrum coefficient first-order differences, linear prediction cepstrum coefficients and perceptual linear prediction coefficients of each frame of audio in the music to be identified, the method further includes:
pretreating the snatch of music to be identified, the pretreatment including preemphasis, framing and windowing.
3. The recognition method according to claim 1, characterized in that the comparing of the eigenmatrix of the snatch of music to be identified with the sample musical feature matrices in the music library to obtain the sample musical feature matrix with the greatest similarity specifically includes:
intercepting, from the sample musical feature matrix B = [tz(1); tz(2); …; tz(M)], matrices B_k = [tz(k); tz(k+1); …; tz(k+N-1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, there being a plurality of such matrices, wherein k = 1, 2, …, M-N+1, Δk = 1, tz(1), tz(2), …, tz(M) are the feature vectors of each frame of audio in the sample musical feature matrix B, M is the number of feature vectors in the musical feature matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, and labelling each matrix B_k as an eigenmatrix to be compared;
calculating the similarity between each eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified, and obtaining the eigenmatrix to be compared with the greatest similarity to the eigenmatrix of the snatch of music to be identified, the musical feature matrix to which that eigenmatrix to be compared belongs being the maximum-similarity musical feature matrix.
4. The recognition method according to claim 3, characterized in that, before the intercepting, from the musical feature matrix B = [tz(1); tz(2); …; tz(M)], of the matrices B_k = [tz(k); tz(k+1); …; tz(k+N-1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, wherein Δk = 1, k = 1, 2, …, M-N+1, tz(1), tz(2), …, tz(M) are the feature vectors of each frame of audio in the musical feature matrix B, M is the number of feature vectors in the musical feature matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, the method further includes:
judging whether the calculation needs to be completed within a preset time and whether the accuracy requirement is below a preset threshold;
if so, setting Δk to an integer greater than 1.
5. The recognition method according to claim 3, characterized in that the calculating of the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified specifically includes:
calculating the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified using the matrix absolute value distance method, the cosine law method of the vector space model, or a method combining the cosine law of the vector space model with the Euclidean distance.
6. A music recognition system, characterised in that the system comprises:
a to-be-identified snatch of music acquisition module, configured to obtain the snatch of music to be identified;
a parameter extraction module, configured to extract the mel cepstrum coefficients, the first-order difference of the mel cepstrum coefficients, the linear prediction cepstrum coefficients and the perceptual linear prediction coefficients of each frame of audio in the snatch of music to be identified;
a characteristic vector determining module, configured to form the characteristic vector of the audio from the mel cepstrum coefficients, the first-order difference of the mel cepstrum coefficients, the linear prediction cepstrum coefficients and the perceptual linear prediction coefficients of the audio;
an eigenmatrix determining module, configured to combine the characteristic vectors of each frame of the audio to obtain the eigenmatrix of the snatch of music to be identified;
a matching module, configured to compare the eigenmatrix of the snatch of music to be identified with the sample musical features matrices in the music library to obtain the maximum similarity eigenmatrix, the maximum similarity eigenmatrix being the sample musical features matrix having the greatest similarity to the snatch of music to be identified;
a music information acquisition module, configured to obtain the music information of the sample musical features matrix having the greatest similarity;
a music information output module, configured to output the music information.
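As an informal illustration of the parameter extraction, characteristic vector determining and eigenmatrix determining modules of claim 6, the sketch below uses librosa to compute the mel cepstrum coefficients and their first-order difference and stacks them into one feature vector per frame. The linear prediction cepstrum coefficients and perceptual linear prediction coefficients named in the claim would be concatenated as further columns in the same way; their computation is omitted to keep the sketch short, and n_mfcc = 13 is an assumed value.

```python
# Partial feature-matrix construction (MFCC + delta-MFCC only), assuming librosa.
import librosa
import numpy as np

def feature_matrix(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)                     # snatch of music to be identified
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    d_mfcc = librosa.feature.delta(mfcc, order=1)           # first-order difference of the MFCCs
    # One characteristic vector per frame (row) -> eigenmatrix of the snatch of music.
    return np.concatenate([mfcc, d_mfcc], axis=0).T         # shape (n_frames, 2 * n_mfcc)
```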
7. The identification system according to claim 6, characterised in that the system further comprises:
a pre-processing module, configured to pre-process the snatch of music to be identified, the pre-processing comprising pre-emphasis, framing and windowing.
8. The identification system according to claim 6, characterised in that the matching module specifically comprises:
a matrix interception unit, configured to intercept, from the sample musical features matrix B whose rows are the characteristic vectors tz(1), tz(2), …, tz(M) of each frame of audio, the sub-matrices Bk = [tz(k); tz(k+1); …; tz(k+N−1)] having the same number of rows as the eigenmatrix of the snatch of music to be identified, there being a plurality of sample musical features matrices, wherein k = 1, 2, …, M−N+1, Δk = 1 is the step by which k is incremented, M is the number of characteristic vectors in the sample musical features matrix B, and N is the number of rows of the eigenmatrix of the snatch of music to be identified, and to label each matrix Bk as an eigenmatrix to be compared;
a similarity calculation unit, configured to calculate the similarity between each eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified, and to obtain the eigenmatrix to be compared having the greatest similarity to the eigenmatrix of the snatch of music to be identified, the musical features matrix to which that eigenmatrix to be compared belongs being the maximum similarity musical features matrix.
9. The identification system according to claim 8, characterised in that the matching module further comprises:
a judging unit, configured to judge whether the calculation needs to be completed within a preset time and the accuracy requirement is below a preset threshold;
a setting unit, configured to set Δk to an integer greater than 1 when the calculation needs to be completed within the preset time and the accuracy requirement is below the preset threshold.
10. The identification system according to claim 8, characterised in that the similarity calculation unit specifically comprises:
a similarity calculation subunit, configured to calculate the similarity between the eigenmatrix to be compared and the eigenmatrix of the snatch of music to be identified using the matrix absolute-value distance method, the cosine-law method of the vector space model, or a method combining the cosine law of the vector space model with the Euclidean distance.
CN201710077359.2A 2017-02-14 2017-02-14 Music identification method and system Active CN106919662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710077359.2A CN106919662B (en) 2017-02-14 2017-02-14 Music identification method and system

Publications (2)

Publication Number Publication Date
CN106919662A true CN106919662A (en) 2017-07-04
CN106919662B CN106919662B (en) 2021-08-31

Family

ID=59454524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710077359.2A Active CN106919662B (en) 2017-02-14 2017-02-14 Music identification method and system

Country Status (1)

Country Link
CN (1) CN106919662B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005049859A (en) * 2003-07-28 2005-02-24 Sony Corp Method and device for automatically recognizing audio data
CN104462537A (en) * 2014-12-24 2015-03-25 北京奇艺世纪科技有限公司 Method and device for classifying voice data
CN105810213A (en) * 2014-12-30 2016-07-27 浙江大华技术股份有限公司 Typical abnormal sound detection method and device
CN105893389A (en) * 2015-01-26 2016-08-24 阿里巴巴集团控股有限公司 Voice message search method, device and server
WO2016156554A1 (en) * 2015-04-01 2016-10-06 Spotify Ab System and method for generating dynamic playlists utilising device co-presence proximity
CN104882146A (en) * 2015-05-12 2015-09-02 百度在线网络技术(北京)有限公司 Method and device for processing audio popularization information
CN105976812A (en) * 2016-04-28 2016-09-28 腾讯科技(深圳)有限公司 Voice identification method and equipment thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
华斌, 尹文慧, 张奕林: "A humming-based music retrieval application system", Computer Engineering and Applications *
胡政权: "Research on speech parameter extraction methods in speaker recognition", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107731233A (en) * 2017-11-03 2018-02-23 王华锋 A kind of method for recognizing sound-groove based on RNN
WO2019184523A1 (en) * 2018-03-29 2019-10-03 北京字节跳动网络技术有限公司 Media feature comparison method and device
CN110569373A (en) * 2018-03-29 2019-12-13 北京字节跳动网络技术有限公司 Media feature comparison method and device
US11593582B2 (en) 2018-03-29 2023-02-28 Beijing Bytedance Network Technology Co., Ltd. Method and device for comparing media features
CN108735230B (en) * 2018-05-10 2020-12-04 上海麦克风文化传媒有限公司 Background music identification method, device and equipment based on mixed audio
CN108735230A (en) * 2018-05-10 2018-11-02 佛山市博知盾识科技有限公司 Background music recognition methods, device and equipment based on mixed audio
CN108665903A (en) * 2018-05-11 2018-10-16 复旦大学 A kind of automatic testing method and its system of audio signal similarity degree
CN108665903B (en) * 2018-05-11 2021-04-30 复旦大学 Automatic detection method and system for audio signal similarity
CN110717062B (en) * 2018-07-11 2024-03-22 斑马智行网络(香港)有限公司 Music search and vehicle-mounted music playing method, device, equipment and storage medium
CN110717062A (en) * 2018-07-11 2020-01-21 阿里巴巴集团控股有限公司 Music searching and vehicle-mounted music playing method, device, equipment and storage medium
CN109308913A (en) * 2018-08-02 2019-02-05 平安科技(深圳)有限公司 Sound quality evaluation method, device, computer equipment and storage medium
US11410706B2 (en) 2018-09-11 2022-08-09 Beijing Boe Technology Development Co., Ltd. Content pushing method for display device, pushing device and display device
WO2020052324A1 (en) * 2018-09-11 2020-03-19 京东方科技集团股份有限公司 Content pushing method used for display apparatus, pushing apparatus, and display device
CN109802987A (en) * 2018-09-11 2019-05-24 北京京东方技术开发有限公司 For the content delivery method of display device, driving means and display equipment
WO2021051681A1 (en) * 2019-09-19 2021-03-25 腾讯音乐娱乐科技(深圳)有限公司 Song recognition method and apparatus, storage medium and electronic device
CN111429891B (en) * 2020-03-30 2022-03-04 腾讯科技(深圳)有限公司 Audio data processing method, device and equipment and readable storage medium
CN111429891A (en) * 2020-03-30 2020-07-17 腾讯科技(深圳)有限公司 Audio data processing method, device and equipment and readable storage medium
CN112102846A (en) * 2020-09-04 2020-12-18 腾讯科技(深圳)有限公司 Audio processing method and device, electronic equipment and storage medium
WO2022148163A1 (en) * 2021-01-05 2022-07-14 北京字跳网络技术有限公司 Method and apparatus for positioning music clip, and device and storage medium
CN113345443A (en) * 2021-04-22 2021-09-03 西北工业大学 Marine mammal vocalization detection and identification method based on mel-frequency cepstrum coefficient
CN113432856A (en) * 2021-06-28 2021-09-24 西门子电机(中国)有限公司 Motor testing method, device, electronic equipment and storage medium
CN114036341A (en) * 2022-01-10 2022-02-11 腾讯科技(深圳)有限公司 Music tag prediction method and related equipment
CN114783152A (en) * 2022-03-30 2022-07-22 郑州熙禾智能科技有限公司 Energy storage power station fire alarm method and system based on gas-sound information fusion
CN116546264A (en) * 2023-04-10 2023-08-04 北京度友信息技术有限公司 Video processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN106919662B (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN106919662A (en) A kind of music recognition methods and system
EP2659482B1 (en) Ranking representative segments in media data
Singh et al. Multimedia utilization of non-computerized disguised voice and acoustic similarity measurement
Mitrović et al. Features for content-based audio retrieval
CN108962279A (en) New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data
CN101599271B (en) Recognition method of digital music emotion
JP5295433B2 (en) Perceptual tempo estimation with scalable complexity
Cartwright et al. Social-EQ: Crowdsourcing an Equalization Descriptor Map.
CN103440873B (en) A kind of music recommend method based on similarity
Dressler Pitch estimation by the pair-wise evaluation of spectral peaks
CN107851444A (en) For acoustic signal to be decomposed into the method and system, target voice and its use of target voice
CN106997765B (en) Quantitative characterization method for human voice timbre
Mehrabi et al. Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders
Yu et al. Sparse cepstral codes and power scale for instrument identification
CN110534091A (en) A kind of people-car interaction method identified based on microserver and intelligent sound
Zhang Application of audio visual tuning detection software in piano tuning teaching
Meng Research on timbre classification based on BP neural network and MFCC
Kreković et al. An algorithm for controlling arbitrary sound synthesizers using adjectives
Hinrichs et al. Convolutional neural networks for the classification of guitar effects and extraction of the parameter settings of single and multi-guitar effects from instrument mixes
Dorochowicz et al. Classification of Music Genres by Means of Listening Tests and Decision Algorithms
Orio A model for human-computer interaction based on the recognition of musical gestures
Rajan et al. Multi-channel CNN-Based Rāga Recognition in Carnatic Music Using Sequential Aggregation Strategy
Mo Music timbre extracted from audio signal features
Ezers et al. Musical Instruments Recognition App
Lekshmi et al. Predominant Instrument Recognition in Polyphonic Music Using Convolutional Recurrent Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant