CN103077706B

CN103077706B - Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm

Info

Publication number: CN103077706B
Application number: CN201310027662.3A
Authority: CN
Inventors: 林晓勇; 蒋玲慧; 张跃; 赵静; 穆祥女
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University; Nanjing University of Posts and Telecommunications
Priority date: 2013-01-24
Filing date: 2013-01-24
Publication date: 2015-03-25
Anticipated expiration: 2033-01-24
Also published as: CN103077706A

Abstract

The invention discloses a method for extracting and representing the music fingerprint characteristic of music with regular drumbeat rhythm. Aimed at the music with regular drumbeat rhythm, music measure parameter extraction and estimation are carried out and a measure position offset matrix is generated, requisite parameters which accord with the perception of the human body are extracted from the content of the music, non-linear Bark separation is carried out, so that an energy parameter matrix for each subband is obtained, interleaving is carried out in the matrixes in the form of blocks, a two-dimensional music fingerprint image is finally determined and outputted, and an independent representable specific 'music fingerprint' file is generated with the measure position offset matrix and the two-dimensional music fingerprint image. The method is mainly aimed at classical music (genuine) with clear drumbeats to extract the specific 'music fingerprints' of classical music, which serve as the separate 'fingerprints' of music, and meanwhile, the method also can extract the music fingerprint information of copied (pirated, illegally recorded and duplicated) classical music to carry out comparison and finally judge whether the copied classical music is genuine music according to errors.

Description

Method for extracting and representing music fingerprint features of music with a regular drumbeat rhythm

Technical field

The present invention relates to a method for extracting and representing music fingerprint features of music content (especially classical music) that has a regular drumbeat rhythm, and belongs to the technical field of music and speech signal feature extraction and processing.

Background technology

At present, CBMR (content-based music retrieval), also referred to as CBID (content-based audio identification) or AFP (audio fingerprinting), is a specialised application within the search technology field that takes the music signal as its principal feature. CBMR comprises two main parts: music fingerprint extraction, and the matching algorithm used in music fingerprint retrieval.

Many "voiceprint" extraction algorithms have been published in China and abroad. The widely adopted approach selects features from the spectrogram produced by the short-time Fourier transform, models the resulting feature sequences, and uses the parameters extracted from the model as the music fingerprint of the fragment.

Early work mainly used LPC (linear prediction coefficients) from the speech signal field, and MFCC (Mel-frequency cepstral coefficients), to characterise music signals. Both transform the acoustic signal into the cepstral domain, and MFCC performs better than LPC.

Because the "voiceprint" retrieval techniques studied so far target general classes of sound, such as speech passages, songs and musical pieces, the means they adopt are generic and broad, and their robustness is poor. They do not generalise to classical music, for which intellectual property protection is increasingly demanded worldwide. Classical music is melodious and its drumbeat is comparatively regular (for example percussive or plucked music such as piano and zheng), yet no solution has so far been reported for "voiceprint" retrieval of music with this kind of regular drumbeat rhythm.

Summary of the invention

The technical problem to be solved by the present invention is the rapid extraction and visual representation of music fingerprint parameters for music with a regular drumbeat rhythm (classical music). The frequencies to which the human ear is sensitive are retained and processed; measure and beat offset matrices are extracted from the drumbeat features of the classical music; the subband energies of the classical music data are interleaved and a difference decision is made; and a "music fingerprint" feature file is finally generated, giving a unique music fingerprint parameter representation of the legitimate music.

To solve the above technical problem, the present invention adopts the following technical solution:

A method for extracting and representing music fingerprint features of music with a regular drumbeat rhythm comprises a preprocessing process for the original music, a two-dimensional music fingerprint image generation process, a music rhythm start position extraction process, and a music fingerprint feature file generation process. The specific steps are as follows:

A. The preprocessing process is as follows:

Step A1: a sliding window with an overlap coefficient of 31/32 is used to frame the sample sequence of the original music file, giving a number of time-series data frames;

Step A2: pre-emphasis is applied to the data frames obtained in A1 to filter out background noise and channel white noise;

Step A3: a filter removes the white noise introduced by the recording equipment and part of the short-duration high-frequency interference noise, giving continuous data frames;

Step A4: a Hanning window is applied to the continuous data frames, giving the windowed time-domain signal;

Step A5: the time-domain signal obtained in step A4 is transformed by FFT into a discrete frequency-domain signal, i.e. the frequency-domain matrix {H(i, j)}, which is converted into the corresponding frequency energy matrix {E(i, j)} in dB form using E(k) = 10 log₁₀(|H(i, j)|²). Here H(i, j) is the short-time frame signal amplitude at frame i and frequency bin j, E(i, j) is the frequency energy at coordinate (i, j), k is the consecutive frame number, and i, j, k are natural numbers;

B. The two-dimensional music fingerprint image generation process is as follows:

Step B1: nonlinear Bark subband separation is applied to the frequency energy matrix {E(i, j)} produced in step A5, using a Bark curve table;

Step B2: auditory perception threshold filtering is applied to each subband, retaining the energy points that human hearing responds to quickly and sensitively;

Step B3: following the nonlinear values of the Bark curve, the frequency indices of consecutive subbands are used as the division boundaries for subband separation, and the subband energies are summed to give a continuous matrix {J(m, n)}, where m ∈ (2, 32), n ∈ (1, ∞). Interleaved block processing is then carried out between adjacent blocks, and a three-value method outputs the decision result, giving a matrix composed of the three values {-1, 0, 1}, i.e. the music fingerprint feature values;

Step B4: the output music fingerprint feature values are displayed as a visual image, i.e. the three values {-1, 0, 1} are each drawn in a different RGB colour;

C. The music rhythm start position extraction process specifically comprises:

Step C1: using the energy matrix obtained in step A, the energy of consecutive frames is estimated; silence and background noise are identified by thresholding the zero-crossing rate and the average frame energy, giving the set {T(k)} of frame position offsets of the starting points, where k ranges from 1 to the total number of starting points obtained;

Step C2: the frequency index range is restricted, frequency differences are calculated, and local power minima in the starting-point sequence are filtered out; for the filtered starting-point sequence, the distance between adjacent T(k) is calculated and recorded as the sequence {D(k)};

Step C3: K-Means clustering is applied to the sequence {D(k)} to obtain its largest subset {Dm(p)}, where p ranges from 1 to the total number of elements in this subset and Dm denotes the largest subset of the sequence D(k);

Step C4: the time positions corresponding to {Dm(p)} are extracted as the offset data of the final valid rhythm starting points;

D. The music fingerprint feature file generation process is specifically:

The final results of step B and step C are combined into one file, with the result of step C as the file header and the result of step B as the file's data body, finally generating a visual music fingerprint data file that uniquely identifies the piece.

Compared with the prior art, the above technical solution has the following technical effects:

1. Nonlinear Bark subband separation is adopted, avoiding the simplification of traditional uniform subband division and taking full account of how the human auditory curve responds differently to classical music content. By filtering against the auditory sensitivity threshold, the parts of the music content that do not affect what is heard are removed, while the perceptually relevant content is preserved;

2. The "three-value method" is used to describe the visual music fingerprint file. It is more expressive than the traditional black-and-white (two-value) method, and it avoids the changes in the fingerprint file that small fluctuations under noise interference cause with the two-value method, so it is more robust.

3. A clustering algorithm is used to obtain the largest subset of rhythm starting points. This works better in practice than many theoretical algorithms: it effectively filters out the subset of pseudo-points, and although some valid points are removed as well, the presence of valid rhythm starting points is ensured in a probabilistic sense.

4. The final music fingerprint feature file has a colour graphical representation, and the position information of the rhythm starting points is stored in its header. A retrieval start position can therefore be established quickly during music fingerprint retrieval, and comparison of whole music feature files is reduced to comparison of only the fragments at the rhythm start positions.

Brief description of the drawings

Fig. 1 is a functional block diagram of the implementation of the present invention.

Fig. 2 is a flowchart of the preprocessing of original CD music.

Fig. 3 is a diagram of the two-dimensional music fingerprint image generation process.

Fig. 4 is the Bark curve of the absolute threshold of hearing.

Fig. 5 is a schematic diagram of the interleaved block processing between adjacent blocks, with the decision result output by the "three-value method".

Fig. 6 is a schematic diagram of the 32-row matrix composed of the three values {-1, 0, 1}.

Fig. 7 is a schematic visual image of the music fingerprint of a piece of music.

Fig. 8 is a flowchart of the music start position extraction method.

Fig. 9 is a schematic diagram of the peak power points before processing and after filtering.

Fig. 10 is a diagram of the visual music fingerprint data file format.

Detailed description of the invention

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

As shown in Fig. 1, the present invention proposes a method for extracting and representing music fingerprint features based on music content with a regular drumbeat rhythm. It mainly comprises preprocessing of the original CD music, a two-dimensional music fingerprint image generation method, a music start position extraction method, and a music fingerprint feature file representation method. The two-dimensional music fingerprint image generation method comprises nonlinear Bark subband separation of the sample sequence, perception threshold filtering of the subbands, subband energy summation, matrix interleaved block processing, music fingerprint feature representation, and display of the two-dimensional colour music fingerprint image. The music start position extraction method comprises extracting the rhythmic stresses in the music, eliminating pseudo-stress samples, obtaining the valid rhythm starting-point data through a clustering algorithm, and recording the starting-point offset positions. These are finally combined with the two-dimensional music fingerprint image file into one music fingerprint feature file that uniquely identifies the given piece.

Taking classical music as an example, the main implementation steps of the method are as follows:

A. Preprocessing: the original classical music file is read and prepared for music fingerprint extraction. As shown in Fig. 2, this comprises the following steps:

A1. Framing: every 16384 samples form one frame, with an overlap coefficient of 31/32, i.e. adjacent frames differ by only 512 samples (16384/32 = 512).
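The framing in A1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `frame_signal` and its parameters are hypothetical, and trailing samples that do not fill a whole frame are simply dropped, which the source does not specify.

```python
def frame_signal(samples, frame_len=16384, overlap=(31, 32)):
    """Split a sample list into overlapping frames.

    With frame_len=16384 and overlap 31/32 the hop between frames is
    16384 - 15872 = 512 samples, matching step A1.
    """
    num, den = overlap
    hop = frame_len - frame_len * num // den
    if hop <= 0:
        raise ValueError("overlap coefficient must be below 1")
    frames = []
    # Assumption: an incomplete final frame is discarded.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return hop, frames
```

For example, `frame_signal(list(range(64)), frame_len=32)` uses a hop of one sample, so each frame differs from its neighbour by a single new sample.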

A2. Pre-emphasis is introduced: the channel H(z) = 1 - α z^{-1} filters out background noise and channel white noise, where the empirical coefficient α ∈ (0.9375, 1) and z is the z-transform variable.
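In the time domain, H(z) = 1 - α z^{-1} corresponds to y[n] = x[n] - α x[n-1]. A minimal sketch follows; the function name and the default α = 0.95 are illustrative choices inside the stated range, not values given by the source.

```python
def pre_emphasis(samples, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out
```

A constant input is strongly attenuated after the first sample, which is the high-pass behaviour the step relies on.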

A3. This method adopts a RASTA filter, a time-series IIR band-pass filter that removes part of the short-duration high-frequency interference noise. Its channel characteristic is:

H (z) = 0.1 \times \frac{2 + z^{- 1} - z^{- 3} - 2 z^{- 4}}{z^{- 4} \times (1 - 0.94 z^{- 1})} .
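The transfer function above can be read as a difference equation. The sketch below is an assumption-laden illustration: the factor z^{-4} in the denominator amounts to a four-sample advance, which is dropped here so that the filter stays causal, and the function name `rasta_filter` is hypothetical.

```python
def rasta_filter(samples, pole=0.94):
    """Causal form of the RASTA characteristic:
    y[n] = pole * y[n-1] + 0.1 * (2x[n] + x[n-1] - x[n-3] - 2x[n-4]).
    """
    taps = [0.2, 0.1, 0.0, -0.1, -0.2]  # 0.1 * (2, 1, 0, -1, -2)
    out = []
    prev = 0.0
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        prev = acc + pole * prev
        out.append(prev)
    return out
```

Because the numerator coefficients sum to zero, a constant input decays towards zero at the output, which is the band-pass property the text describes.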

A4. A Hanning window (window length 16384) is applied to the filtered time-series frames, and an empirical coefficient is introduced:

β = \sqrt{8 / 3} .
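The source does not say how β is applied. One plausible reading, taken here purely as an assumption, is that β scales the window: a periodic Hann window has a mean-square value of 3/8, so multiplying by √(8/3) restores unit mean-square. The function name `scaled_hann` is hypothetical.

```python
import math


def scaled_hann(length):
    """Periodic Hann window multiplied by beta = sqrt(8/3).

    Assumption: beta is used as a gain on the window so that the mean of
    the squared window values equals 1.
    """
    beta = math.sqrt(8.0 / 3.0)
    return [beta * (0.5 - 0.5 * math.cos(2.0 * math.pi * n / length))
            for n in range(length)]
```

A frame would then be windowed by multiplying each sample by the matching window value.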

A5. The FFT is used to obtain the discrete frequency-domain signal. In the FFT transform formula, N is the total number of samples in a group of N consecutive time-domain samples (N is a natural number), r is the frequency index, x(t) is the sampled value at time t, and w_n = e^{(-2πt/N)}; this gives the frequency-domain matrix {H(i, j)}.

A6. The result is converted into the corresponding energy matrix {E(i, j)}, expressed in dB. The conversion formula is E(k) = 10 log₁₀(|H(i, j)|²) (in dB).
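Steps A5 and A6 can be sketched with a textbook radix-2 FFT followed by the dB conversion. This is an illustrative sketch only: the recursive FFT is the standard algorithm rather than anything specific to the patent, and the `floor` parameter, which avoids taking the logarithm of zero, is an added assumption.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for r in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * r / n) * odd[r]
        out[r] = even[r] + twiddle
        out[r + n // 2] = even[r] - twiddle
    return out


def energy_db(frame, floor=1e-12):
    """Return 10*log10(|H|^2) for every frequency bin of one frame."""
    return [10.0 * math.log10(max(abs(h) ** 2, floor)) for h in fft(frame)]
```

Applying `energy_db` to each windowed frame gives one row of the energy matrix {E(i, j)}.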

B. The two-dimensional music fingerprint image generation process, shown in Fig. 3, comprises nonlinear Bark subband filtering adapted to auditory perception, perception threshold filtering, and a decision process on the continuous envelope variation of the subbands.

Step B1: nonlinear Bark subband separation is applied to the {E(i, j)} produced in step A6. The Bark curve table used here is shown in Fig. 4 and contains the hearing thresholds that describe the ear's sensitivity at different frequencies. The table Freq_Bark_AbsThresh_Table is drawn up from the curve and contains the frequency index, the critical frequency value, the absolute subband rate and the absolute threshold of hearing.

B2. Auditory perception threshold filtering is applied to each subband, retaining the energy points that human hearing responds to quickly. This can be regarded as preprocessing before the nonlinear subband separation. The steps are as follows:

B2.1. In the 300-2000 Hz band, critical bands 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 (10 critical bands in total) are selected; let the centre frequency of each critical band be f_s;

B2.2. Ten hearing thresholds are obtained, one for each of the 10 critical bands:

T(f_s) = 3.64 (f_s/1000)^{-0.8} - 6.5 \exp\{-0.6 (f_s/1000 - 3.3)^2\} + (f_s/1000)^4 / 1000;

(f_s takes the centre frequencies of the 10 different critical bands).

B2.3. The minimum point Min of each band is taken;

B2.4. Within each band of the matrix {E(i, j)}, amplitudes greater than Min × 10^(T(f_s)/20) are retained and amplitudes below this value are set to zero, giving the improved matrix {E_new(i, j)};
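Steps B2.2 to B2.4 can be sketched as below. The threshold function follows the formula in B2.2 term by term; the gating function is a minimal reading of B2.4 in which the band values are assumed to be positive linear amplitudes. Both function names are hypothetical.

```python
import math


def hearing_threshold_db(freq_hz):
    """Hearing threshold T(f_s) from step B2.2, with f_s in Hz."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + khz ** 4 / 1000.0)


def perceptual_gate(band_values, threshold_db):
    """Keep values above Min * 10^(T/20); set the rest to zero (step B2.4).

    Assumption: band_values are positive amplitudes of one critical band.
    """
    cutoff = min(band_values) * 10.0 ** (threshold_db / 20.0)
    return [v if v > cutoff else 0.0 for v in band_values]
```

In use, `hearing_threshold_db` would be evaluated once at each band's centre frequency and the result passed to `perceptual_gate` for that band.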

Step B3: following the nonlinear values of the Bark curve, 34 critical frequencies are taken, giving 33 frequency-domain subbands. Specifically, the frequency indices

[111,118,125,132,140,148,157,166,176,187,198,209,222,235,249,264,279,296,314,332,352,373,395,418,443,470,497,527,558,591,626,663,703,743] are used as the division boundaries for subband separation;

Step B4: using the frequency indices given in B3, subband energy summation is applied to the matrix obtained in B2, giving the continuous matrix {J(m, n)} with 33 rows, as shown in Fig. 5.
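The subband summation can be sketched for a single frame as follows. The 34 boundary indices are those listed in step B3; treating each subband as the half-open bin range from one boundary up to, but not including, the next is an assumption, as is the function name.

```python
BARK_EDGES = [111, 118, 125, 132, 140, 148, 157, 166, 176, 187, 198, 209,
              222, 235, 249, 264, 279, 296, 314, 332, 352, 373, 395, 418,
              443, 470, 497, 527, 558, 591, 626, 663, 703, 743]


def subband_energies(frame_energy, edges=BARK_EDGES):
    """Sum the energy of one frame over each of the 33 subbands.

    Assumption: subband b covers bins edges[b] <= j < edges[b + 1].
    """
    return [sum(frame_energy[lo:hi]) for lo, hi in zip(edges, edges[1:])]
```

Stacking the 33 sums of every frame column by column gives a matrix with the shape of {J(m, n)}.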

B5. Interleaved block processing is carried out between adjacent blocks of {J(m, n)}, as shown in Fig. 5, and the "three-value method" is used to output the decision result. The decision method is as follows:

Δ(m-1, n) = |J(m, n) - J(m-1, n)|, where m ∈ (2, 32), n ∈ (1, ∞)

K = Δ(i, j+1) - Δ(i, j), where i ∈ (1, 31), j ∈ (1, ∞)

When time, F=0; Herein

Otherwise, when K > 0, F = +1;

when K < 0, F = -1.

B6. This gives a 32-row matrix composed of the three values {-1, 0, 1}, as shown in Fig. 6, in which each element (i, j) takes exactly one of the values {-1, 0, 1}. A three-colour image is drawn from this matrix, using the RGB colours (255, 0, 0), (0, 255, 0) and (0, 0, 255) for the three values respectively, which gives the visual image of the music fingerprint of the piece shown in Fig. 7.
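The three-value decision and the colour mapping can be sketched as below. Two points are assumptions rather than facts from the source: the exact condition for F = 0 is not legible in this text, so the sketch uses a hypothetical dead-zone parameter `eps` (default 0, meaning only K = 0 maps to 0), and the assignment of the three colours to the values -1, 0 and +1 is an arbitrary choice.

```python
def ternary_fingerprint(J, eps=0.0):
    """Compute F from J, where J[m][n] is subband m at frame n.

    delta = |J[m][n] - J[m-1][n]| between adjacent subbands, then
    K = delta[i][j+1] - delta[i][j] between adjacent frames.
    eps is a hypothetical dead zone for the F = 0 case.
    """
    delta = [[abs(a - b) for a, b in zip(J[m], J[m - 1])]
             for m in range(1, len(J))]
    result = []
    for row in delta:
        codes = []
        for j in range(len(row) - 1):
            k = row[j + 1] - row[j]
            if abs(k) <= eps:
                codes.append(0)
            elif k > 0:
                codes.append(1)
            else:
                codes.append(-1)
        result.append(codes)
    return result


# Assumed colour assignment; the source lists the colours but not the order.
PALETTE = {-1: (255, 0, 0), 0: (0, 255, 0), 1: (0, 0, 255)}


def to_rgb(codes):
    """Map a matrix of -1/0/+1 values to RGB triples."""
    return [[PALETTE[c] for c in row] for row in codes]
```

With 33 subband rows in J, the result has 32 rows, which matches the row count stated in B6.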

C. This step mainly extracts the rhythm starting points of the classical music, eliminates pseudo starting points, and finally obtains the largest subset through a clustering algorithm, giving the set of valid rhythm starting points; the offset positions of this largest subset are recorded, as shown in Fig. 8.

Step C1: using the energy matrix obtained in step A, the energy of consecutive frames is estimated, and silence and background noise are identified by thresholding the zero-crossing rate and the average frame energy. When T = 0 the frame is judged to be silence; the background noise threshold is a trained threshold and is used only to handle background noise at the head and tail. Specifically:

Z_{v} = \sum_{u = -\infty}^{\infty} \{ | \mathrm{sgn}[x_{(v)} - T] - \mathrm{sgn}[x_{(v-1)} - T] | + | \mathrm{sgn}[x_{(v)} + T] - \mathrm{sgn}[x_{(v-1)} + T] | \} w_{(v-u)},

where x_{(v-1)} and x_{(v)} are consecutive sampled values in time and T is the decision threshold, here taken as 20% of the average signal amplitude. u is the summation variable; usually the (2N+1) samples before and after time x_{(u)} are taken, i.e. u ranges over (-N, N). Because an infinite sum cannot be computed in practice, N is taken to be a sufficiently large natural number, usually 1024. w_{(v-u)} is the window function, whose value is 1 at time (v-u) and 0 at all other times.
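The thresholded zero-crossing count can be sketched for one finite block of samples. This is a simplified illustration: the rectangular window is realised by summing only over the samples passed in, and both function names are hypothetical.

```python
def sgn(value):
    """Sign function returning -1, 0 or 1."""
    return (value > 0) - (value < 0)


def default_threshold(samples):
    """Decision threshold T: 20% of the average absolute amplitude."""
    return 0.2 * sum(abs(s) for s in samples) / len(samples)


def thresholded_zero_crossings(samples, threshold):
    """Count crossings of the +T and -T levels between adjacent samples."""
    total = 0
    for v in range(1, len(samples)):
        cur, prev = samples[v], samples[v - 1]
        total += abs(sgn(cur - threshold) - sgn(prev - threshold))
        total += abs(sgn(cur + threshold) - sgn(prev + threshold))
    return total
```

Low-level fluctuations that stay inside the band between -T and +T contribute nothing, so quiet passages produce a count near zero.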

Step C2: the starting-point sequence {T(k)} obtained in step C1 contains many pseudo-points, i.e. points mistaken for rhythm starting points, so it must be filtered again in the frequency domain. The frequency index range is restricted to 111-743, frequency differences are calculated over the starting-point sequence, and local power minima are filtered out. The peak power points before processing and after filtering are shown in Fig. 9.

Step C3: for the starting-point sequence with pseudo-points removed, the distance between adjacent T(k) is calculated and recorded as the sequence {D(k)}, where k ranges from 1 to the total number of starting points obtained. Some true starting points will have been deleted by mistake, in which case the interval between the two adjacent remaining starting points is much larger than the usual starting-point interval.

Step C4: K-Means clustering is applied to the sequence {D(k)}, and the largest subset for an aggregation factor of 0.80 to 0.90 is taken and defined as {Dm(p)}, where p ranges from 1 to the total number of elements in this subset and Dm denotes the largest subset of the sequence D(k).

Step C5: the time positions corresponding to {Dm(p)} are extracted as the offset data T(p) of the final valid rhythm starting points.
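Steps C3 to C5 can be sketched with a small one-dimensional K-Means. Several choices here are assumptions: the number of clusters (k = 2), the deterministic initial centres, and the rule that each retained interval is represented by the starting point at its left end. The aggregation factor mentioned in C4 is not modelled, and the function name is hypothetical.

```python
def largest_interval_cluster(onsets, k=2, iterations=20):
    """Cluster the gaps between adjacent onsets and keep the largest cluster.

    Returns the onset that starts each gap belonging to the largest cluster.
    Requires k >= 2 and iterations >= 1.
    """
    if len(onsets) < 2:
        return list(onsets)
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    lo, hi = min(gaps), max(gaps)
    # Assumption: centres start evenly spread between the extreme gaps.
    centres = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for idx, gap in enumerate(gaps):
            nearest = min(range(len(centres)),
                          key=lambda c: abs(gap - centres[c]))
            groups[nearest].append(idx)
        centres = [sum(gaps[i] for i in g) / len(g) if g else centres[c]
                   for c, g in enumerate(groups)]
    biggest = max(groups, key=len)
    return [onsets[i] for i in biggest]
```

In the example `[0, 10, 21, 31, 40, 70, 80, 91]` the single long gap of 30 falls into its own cluster and is excluded, leaving the regularly spaced starting points.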

D. Music fingerprint feature file generation: the final results of step B and step C are combined into one file, with the result of step C as the file header and the result of step B as the file's data body, finally generating a visual music fingerprint data file that uniquely identifies the piece. The file extension is "som", and the file format is shown in Fig. 10.

The characters "somusic" and "fmt" indicate that the file is a music fingerprint file. "Head length" gives the length of the data from the "somusic" field up to the "data" field, and "total length" is the size in bytes of the whole music fingerprint file. "Aggregation factor" refers to how tightly the starting points are clustered when they are extracted; its default is 85, meaning 0.85. "Sample frequency" is the sampling frequency used when reading the source file, and "transfer rate" is the waveform data transfer rate, i.e. the average number of bytes per second. The "index" characters indicate that what follows are starting-point offset positions of four bytes each, in increasing order, continuing until the "data" field is encountered. After the "data" field come the three-value combinations {-1, 0, +1} from step B.
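The field list above can be illustrated with a hypothetical byte layout. Fig. 10 is not reproduced in this text, so everything about the layout below is an assumption except the field names and the four-byte offsets: the field order, the four-byte little-endian integers for the other numeric fields, and the storage of each -1/0/+1 value as one byte (0/1/2) are illustrative only.

```python
import struct


def build_som(offsets, codes, sample_rate=44100, byte_rate=176400,
              aggregation=85):
    """Pack onset offsets and ternary codes into a hypothetical "som" blob.

    Assumed layout: "somusic", head length, total length, "fmt" with three
    numeric fields, "index" with 4-byte offsets, then "data" and the codes.
    """
    fmt = b"fmt" + struct.pack("<III", aggregation, sample_rate, byte_rate)
    index = b"index" + b"".join(struct.pack("<I", o) for o in offsets)
    body = bytes(c + 1 for c in codes)  # store -1/0/+1 as 0/1/2
    head_len = len(b"somusic") + 8 + len(fmt) + len(index) + len(b"data")
    total_len = head_len + len(body)
    return (b"somusic" + struct.pack("<II", head_len, total_len)
            + fmt + index + b"data" + body)
```

A reader of such a file could jump straight to the payload using the head length and recover the starting-point offsets from the bytes between "index" and "data".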

When music fingerprint display software reads a "som" file, it checks the validity of the file step by step and finally displays the music fingerprint image of the music in RGB colours.

Thus, the present invention proposes a music fingerprint extraction and representation method aimed mainly at music with a regular drumbeat rhythm. It simplifies earlier starting-point detection methods that relied on probabilistic models as the decision method, and it takes the nonlinear effects of the human auditory model into account: the masking effect of the human ear better preserves the features of the music content, and nonlinear Bark subband separation suits the music fingerprint better than uniform subband division. Interleaved block processing is applied across consecutive adjacent subbands, i.e. in both the time and frequency domains, which greatly improves the stability of the music fingerprint features under noise interference. The "SOM"-format music fingerprint feature file finally proposed by the invention provides a better technical foundation for technologies such as content-based music retrieval.

Claims

1. A method for extracting and representing music fingerprint features of music with a regular drumbeat rhythm, characterized by comprising a preprocessing process for the original music, a two-dimensional music fingerprint image generation process, a music rhythm start position extraction process, and a music fingerprint feature file generation process; the specific steps are as follows:

A. The preprocessing process is as follows:

B. The two-dimensional music fingerprint image generation process is as follows:

C. The music rhythm start position extraction process specifically comprises:

Step C4: the time positions corresponding to {Dm(p)} are extracted as the offset data of the final valid rhythm starting points;

D. The music fingerprint feature file generation process is specifically: