Summary of the invention
The technical problem to be solved by the invention is the rapid extraction and visual representation of music fingerprint ("music print") parameters for music with a regular drumbeat rhythm, such as classical music. To ensure that the frequencies to which the human ear is sensitive are retained and processed, bars and a beat offset matrix are extracted from the drumbeat features of the classical music, the sub-band energies of the music data are interleaved and subjected to a difference decision, and a "music fingerprint" feature file is finally generated, giving a unique representation of the fingerprint feature parameters of the given piece of music.
To solve the above technical problem, the invention adopts the following technical solution:
A method for extracting and representing music fingerprint features of music with a regular drumbeat rhythm, comprising a preprocessing process for the original music, a two-dimensional music fingerprint image generation process, a music rhythm onset position extraction process, and a music fingerprint feature file generation process. The specific steps are as follows:
A. The preprocessing process is as follows:
Step A1: frame the sample sequence of the original music file using a sliding window with an overlap coefficient of 31/32, obtaining a number of time-ordered data frames;
Step A2: apply pre-emphasis to the data frames obtained in step A1 to filter out background noise and channel white noise;
Step A3: use a filter to remove the white noise introduced by the recording equipment and part of the short-duration high-frequency interference noise, obtaining continuous data frames;
Step A4: apply a Hanning window to the continuous data frames, obtaining a time-domain signal;
Step A5: transform the time-domain signal obtained in step A4 into a discrete frequency-domain signal by FFT, i.e. the frequency-domain matrix {H(i, j)}, and convert this matrix into the corresponding frequency energy matrix {E(i, j)} in dB form using $E(k) = 10\log_{10}(|H(i, j)|^2)$; where H(i, j) is the amplitude of the short-time frame signal at time-continuous frame i and frequency bin j, E(i, j) is the frequency energy at coordinate (i, j), k is the time-continuous frame number, and i, j, k are natural numbers;
B. The two-dimensional music fingerprint image generation process is as follows:
Step B1: apply nonlinear Bark sub-band separation, using the Bark curve table, to the frequency energy matrix {E(i, j)} produced in step A5;
Step B2: apply auditory perceptual threshold filtering to each sub-band, retaining the energy points to which the human auditory system responds quickly;
Step B3: following the nonlinear values of the Bark curve, use the frequency indices of the consecutive sub-bands as the sub-band division boundaries and sum the sub-band energies, obtaining a continuous matrix {J(m, n)}, where m ∈ (2, 32), n ∈ (1, ∞); then perform interleaved block processing between adjacent blocks and output the decision result using the three-value method, obtaining a matrix composed of the three values {-1, 0, 1}, i.e. the music fingerprint feature values;
Step B4: display the output fingerprint feature values as a visual image, i.e. draw the three values {-1, 0, 1} in RGB colours respectively;
C. The music rhythm onset position extraction process comprises:
Step C1: from the energy matrix obtained in step A, estimate the energy of successive frames and identify silence and background noise by thresholding the zero-crossing rate and the average frame energy, obtaining the set {T(k)} of position offsets of the framing points, where k ranges from 1 to the total number of onsets obtained;
Step C2: limit the frequency index range, compute the frequency differences, and filter out the local power minima in the onset sequence; for the filtered onset sequence, compute the distance between adjacent T(k), recorded as the sequence {D(k)};
Step C3: perform K-Means clustering on the sequence {D(k)} to obtain its maximal subset {Dm(p)}, where p ranges from 1 to the total number of elements in this subset and Dm denotes the maximal subset of the D(k) sequence;
Step C4: extract the time positions corresponding to {Dm(p)} as the offset data of the final effective rhythm onsets;
D. The music fingerprint feature file generation process is as follows:
The final results of step B and step C are combined into one file, with the result of step C as the file header and the result of step B as the data body, finally producing a visual music fingerprint data file that uniquely identifies the song.
Compared with the prior art, the invention, by adopting the above technical solution, has the following technical effects:
1. The nonlinear Bark sub-band separation method avoids the simplification of traditional uniform sub-band division and fully accounts for the differing response of the human auditory curve to classical music content. Filtering by the auditory sensitivity threshold removes the parts of the music content that do not affect the auditory result while preserving the perceptually relevant content;
2. The "three-value method" is used to describe the visual music fingerprint file. It is more descriptive than the traditional black-and-white method and avoids the changes in the file fingerprint that small fluctuations under noise interference cause with the black-and-white two-value method, so the method is more robust.
3. A clustering algorithm is used to obtain the maximal subset of rhythm onsets. In practice this performs better than many theoretical algorithms: it effectively filters out the subset of pseudo-points and, although some valid points are filtered out as well, it guarantees in a probabilistic sense that effective rhythm onsets remain.
4. The final music fingerprint feature file has a coloured graphical representation, and its header carries the position information of the rhythm onsets, so that the starting position for retrieval can be established quickly during fingerprint retrieval, and the comparison of whole music feature files is reduced to a comparison of only the segments at the rhythm onset positions.
Detailed description of the invention
The technical solution of the invention is described in further detail below with reference to the accompanying drawings:
As shown in Figure 1, the invention proposes a method for extracting and representing music fingerprint features of music content with a regular drumbeat rhythm, which mainly comprises: preprocessing of the original CD music, a two-dimensional music fingerprint image generation method, a music onset position extraction method, and a music fingerprint feature file representation method. The two-dimensional music fingerprint image generation method comprises: nonlinear Bark sub-band separation of the sample sequence, perceptual threshold filtering of the sub-bands, sub-band energy summation, interleaved block processing of the matrix, fingerprint feature representation, and display of the two-dimensional colour fingerprint image. The music onset position extraction method comprises: extracting the rhythmic accents in the music, eliminating pseudo-accent samples, obtaining the effective rhythm onset data through a clustering algorithm, and recording the onset offset positions, which are finally combined with the two-dimensional fingerprint image file into a music fingerprint feature file that uniquely identifies the given piece of music.
Taking classical music as an example, the main steps of the method are implemented as follows:
A. Preprocessing reads the original classical music file and carries out the preparatory steps before fingerprint extraction. As shown in Figure 2, it comprises the following steps:
A1. Framing: every 16384 samples form one frame, with an overlap coefficient of 31/32, i.e. adjacent frames differ in only 512 samples (16384/32 = 512).
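As an illustrative sketch only, the framing of step A1 might be written in Python as follows. The function name `frame_signal` and its parameter names are assumptions introduced here for illustration and do not appear in the original method.

```python
def frame_signal(samples, frame_len=16384, overlap=31 / 32):
    """Split a sample sequence into overlapping frames (sliding window)."""
    # Number of new samples per frame: 16384 * (1 - 31/32) = 512.
    hop = int(round(frame_len * (1 - overlap)))
    if hop < 1:
        raise ValueError("overlap too large for this frame length")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames, hop


# Small-scale usage: 64-sample frames with the same 31/32 overlap advance by 2 samples.
frames, hop = frame_signal(list(range(100)), frame_len=64)
```

Samples at the tail that do not fill a whole frame are dropped in this sketch; whether the method pads them instead is not stated in the text.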
A2. Pre-emphasis is introduced: background noise and channel white noise are filtered with the channel $H(z) = 1 - \alpha z^{-1}$, where the empirical coefficient α ∈ (0.9375, 1) and z is the z-transform variable.
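In the time domain, the channel H(z) = 1 - αz^(-1) corresponds to the difference equation y[n] = x[n] - α·x[n-1]. A minimal sketch follows; the name `pre_emphasis`, the default α = 0.95, and the treatment of the first sample (x[-1] assumed to be 0) are assumptions made here for illustration.

```python
def pre_emphasis(frame, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1], the time-domain form of H(z) = 1 - alpha*z^-1."""
    if not 0.9375 < alpha < 1:
        raise ValueError("alpha is expected to lie in (0.9375, 1)")
    out = [frame[0]] if frame else []   # assumption: x[-1] = 0 for the first sample
    for n in range(1, len(frame)):
        out.append(frame[n] - alpha * frame[n - 1])
    return out


# A constant (DC) input is strongly attenuated after the first sample.
emphasized = pre_emphasis([1.0, 1.0, 1.0, 1.0], alpha=0.95)
```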
A3. The method uses a RASTA filter, which is a time-series IIR band-pass filter, to remove part of the short-duration high-frequency interference noise; the channel characteristic is:
A4. A Hanning window (window length 16384) is applied to the filtered time-series frames, and an empirical coefficient is introduced at the same time
A5. The signal is transformed into a discrete frequency-domain signal by FFT. The FFT transformation formula is,
where N is the total number of N consecutive time-domain samples in a group (N is a natural number), r is the index of the consecutive frequency bins, x(t) is the sampled value at time t, and $w_n = e^{(-2\pi t/N)}$; this yields the frequency-domain matrix {H(i, j)}.
A6. The frequency-domain matrix is then converted into the corresponding energy matrix {E(i, j)} in dB form. The conversion formula is $E(k) = 10\log_{10}(|H(i, j)|^2)$ (in dB).
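Steps A4 to A6 can be sketched on a toy frame as follows. This is a hedged illustration, not the method's implementation: a naive O(N²) DFT stands in for the FFT so that the example needs only the standard library, the empirical coefficient of step A4 is omitted because its value is not given, and the names `hann_window`, `dft`, `energy_db`, and the `floor` guard against log10(0) are assumptions.

```python
import cmath
import math


def hann_window(length):
    """Symmetric Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / (length - 1)))."""
    if length < 2:
        return [1.0] * length
    return [0.5 * (1 - math.cos(2 * math.pi * n / (length - 1)))
            for n in range(length)]


def dft(frame):
    """Naive O(N^2) DFT; used instead of an FFT only to keep the sketch short."""
    n_pts = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * r * t / n_pts)
                for t in range(n_pts))
            for r in range(n_pts)]


def energy_db(spectrum, floor=1e-12):
    """E = 10 * log10(|H|^2); `floor` (an assumption) avoids log10(0)."""
    return [10 * math.log10(max(abs(h) ** 2, floor)) for h in spectrum]


# Toy frame: a sine with exactly 4 cycles in 32 samples.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
windowed = [x * w for x, w in zip(frame, hann_window(32))]
e_db = energy_db(dft(windowed))
peak_bin = max(range(16), key=lambda r: e_db[r])   # strongest bin in the lower half
```

In practice a frame of 16384 samples would be transformed with a real FFT routine; only the windowing and the dB conversion carry over unchanged.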
B. The two-dimensional music fingerprint image generation process, shown in Figure 3, comprises nonlinear Bark sub-band filtering adapted to auditory perception, perceptual threshold filtering, and a decision process on the continuous envelope variation of the sub-bands.
Step B1: nonlinear Bark sub-band separation is applied to the {E(i, j)} produced in step A6. The Bark curve table used here is shown in Figure 4 and contains the hearing thresholds that reflect the sensitivity of the human ear at different frequency points. From this curve the table Freq_Bark_AbsThresh_Table is drawn up, containing the frequency index, the critical frequency value, the absolute sub-band rate, and the absolute threshold of hearing:
B2. Auditory perceptual threshold filtering is applied to each sub-band, retaining the energy points to which the human auditory system responds quickly; this can be regarded as preprocessing before the nonlinear sub-band separation. The steps are as follows:
B2.1. In the 300-2000 Hz band, set critical bands 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12, ten critical bands in total, and let the centre frequency of each critical band be $f_s$;
B2.2. Obtain ten hearing thresholds, one for each of the ten critical bands:
$T(f_s) = 3.64\,(f_s/1000)^{-0.8} - 6.5\exp\{-0.6\,(f_s/1000 - 3.3)^2\} + (f_s/1000)^4/1000$
($f_s$ takes the ten different critical-band centre frequencies).
B2.3. Take the minimum point Min of each band;
B2.4. Within each band of the matrix {E(i, j)}, amplitudes greater than $Min \cdot 10^{T(f_s)/20}$ are retained and amplitudes smaller than this value are set to zero, giving the improved matrix $\{E_{new}(i, j)\}$;
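Steps B2.2 to B2.4 can be sketched as below. The function names `hearing_threshold_db` and `filter_band` are assumptions, as is the choice to zero an amplitude that exactly equals the limit (the text only specifies "greater than" and "less than").

```python
import math


def hearing_threshold_db(f_hz):
    """T(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + (f/1000)^4 / 1000."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + k ** 4 / 1000.0


def filter_band(amplitudes, center_hz):
    """Keep amplitudes above Min * 10^(T(f_s)/20); set the others to zero."""
    limit = min(amplitudes) * 10 ** (hearing_threshold_db(center_hz) / 20.0)
    # Assumption: a value exactly equal to the limit is zeroed as well.
    return [a if a > limit else 0.0 for a in amplitudes]


t_1k = hearing_threshold_db(1000.0)            # threshold at a 1 kHz centre frequency
kept = filter_band([1.0, 1.2, 5.0, 9.0], center_hz=1000.0)
```

For the toy band above the limit is about 1.47 times the band minimum, so the two smallest amplitudes are zeroed and the two largest are kept.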
Step B3: following the nonlinear values of the Bark curve, 34 critical frequencies are taken, which define 33 frequency-domain sub-bands. Specifically, the frequency indices
[111,118,125,132,140,148,157,166,176,187,198,209,222,235,249,264,279,296,314,332,352,373,395,418,443,470,497,527,558,591,626,663,703,743] serve as the division boundaries for sub-band separation;
Step B4: using the frequency indices given in B3, sub-band energy summation is performed on the matrix obtained in B2, giving a continuous matrix {J(m, n)} with 33 rows, as shown in Figure 5.
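The sub-band energy summation can be sketched for a single frame as follows. The names `BARK_EDGES` and `subband_energies` are hypothetical, and the convention that each sub-band runs from its lower boundary index (inclusive) to the next boundary index (exclusive) is an assumption, since the text does not say how boundary bins are assigned.

```python
BARK_EDGES = [111, 118, 125, 132, 140, 148, 157, 166, 176, 187, 198, 209,
              222, 235, 249, 264, 279, 296, 314, 332, 352, 373, 395, 418,
              443, 470, 497, 527, 558, 591, 626, 663, 703, 743]


def subband_energies(frame_energy, edges=BARK_EDGES):
    """Sum one frame's per-bin energies between consecutive boundary indices."""
    # Assumption: sub-band b covers bins edges[b] (inclusive) to edges[b+1] (exclusive).
    return [sum(frame_energy[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


# One column of {J(m, n)}: 34 boundaries give 33 sub-band sums.
frame_energy = [1.0] * 8192
column = subband_energies(frame_energy)
```

Applying the function to every frame and stacking the results column by column would give the 33-row matrix {J(m, n)}.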
B5. Interleaved block processing is performed between adjacent blocks of {J(m, n)} (Figure 5), and the decision result is output using the "three-value method". The decision rule is as follows:
$\Delta(m-1, n) = |J(m, n) - J(m-1, n)|$, where m ∈ (2, 32), n ∈ (1, ∞)
$K = \Delta(i, j+1) - \Delta(i, j)$, where i ∈ (1, 31), j ∈ (1, ∞)
When
holds, F = 0; here
Otherwise: when K > 0, F = +1;
when K < 0, F = -1;
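The three-value decision can be sketched as below. The condition under which F = 0 is not reproduced in the text, so the sketch substitutes a hypothetical dead zone `|K| <= eps`; this, along with the function name `ternary_fingerprint`, is an assumption and not the rule of the method. The row and column ranges are derived from the shape of the input rather than fixed at the values quoted above.

```python
def ternary_fingerprint(J, eps=0.0):
    """J[m][n]: energy of sub-band m in frame n. Returns rows of -1 / 0 / +1 values."""
    # Delta: absolute difference between adjacent sub-bands within the same frame.
    delta = [[abs(J[m][n] - J[m - 1][n]) for n in range(len(J[m]))]
             for m in range(1, len(J))]
    F = []
    for row in delta:
        out = []
        for j in range(len(row) - 1):
            k = row[j + 1] - row[j]      # K: change of Delta between adjacent frames
            if abs(k) <= eps:            # hypothetical dead zone (source condition missing)
                out.append(0)
            else:
                out.append(1 if k > 0 else -1)
        F.append(out)
    return F


J = [[1.0, 1.0, 1.0],
     [2.0, 4.0, 3.0],
     [2.0, 4.0, 3.0]]
F = ternary_fingerprint(J)
```

A nonzero `eps` would make small fluctuations of K map to 0, which is the kind of noise tolerance the three-value method is said to provide.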
B6. This yields a 32-row matrix composed of the three values {-1, 0, 1}, as shown in Figure 6, in which every element (i, j) takes one of the values {-1, 0, 1}. The matrix is rendered as a three-colour image, the three values being drawn with the RGB colours (255, 0, 0), (0, 255, 0), and (0, 0, 255) respectively; the result, shown in Figure 7, is the visual image of the music fingerprint of the piece.
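A minimal rendering sketch follows. The pairing of values with colours (-1 to red, 0 to green, +1 to blue) simply follows the order in which they are listed and is an assumption, as are the names `COLOR_OF`, `to_rgb`, and `to_ppm`; the plain-text PPM format is used only because it needs no imaging library.

```python
# Assumed pairing, taken from the order in which values and colours are listed.
COLOR_OF = {-1: (255, 0, 0), 0: (0, 255, 0), 1: (0, 0, 255)}


def to_rgb(F):
    """Map every three-value element to an (R, G, B) tuple."""
    return [[COLOR_OF[v] for v in row] for row in F]


def to_ppm(pixels):
    """Serialize the pixel grid as a plain-text PPM (P3) image."""
    height, width = len(pixels), len(pixels[0])
    body = "\n".join(" ".join(f"{r} {g} {b}" for r, g, b in row) for row in pixels)
    return f"P3\n{width} {height}\n255\n{body}\n"


pixels = to_rgb([[1, -1], [0, 0]])
ppm = to_ppm(pixels)
```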
C. This step mainly extracts the rhythm onsets of the classical music, eliminates pseudo-onsets, and finally obtains the maximal subset through a clustering algorithm, giving the set of effective rhythm onsets; the offset positions of this maximal subset are recorded, as shown in Figure 8.
Step C1: from the energy matrix obtained in step A, the energy of successive frames is estimated, and silence and background noise are identified by thresholding the zero-crossing rate and the average frame energy. When T = 0 the frame is judged to be silence; the background-noise threshold is a trained value and is used only for handling background noise at the head and tail. Specifically:
where $x_{(v-1)}$ and $x_{(v)}$ are time-consecutive sample values; T is the decision threshold, here taken as 20% of the average signal amplitude; u is a variable, usually covering the (2N+1) sample values before and after the time of $x_{(u)}$, i.e. u ranges over (-N, N). Since an infinite range cannot be computed in practice, N is chosen as a sufficiently large natural number, usually 1024. $w_{(v-u)}$ is the window function, whose value is 1 at time (v-u) and 0 at all other times.
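Because the formula of step C1 is not reproduced in the text, the following sketch uses the common textbook definitions of zero-crossing rate and mean frame energy rather than the exact expressions of the method. The function names, the two threshold parameters, and the rule that combines them are all hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def mean_energy(frame):
    """Average of the squared sample values in the frame."""
    return sum(x * x for x in frame) / len(frame)


def is_silence(frame, energy_thresh, zcr_thresh):
    """Hypothetical rule: low energy together with a low zero-crossing rate."""
    return (mean_energy(frame) < energy_thresh
            and zero_crossing_rate(frame) < zcr_thresh)


quiet = [0.001, 0.002, 0.001, 0.002]
loud = [0.5, -0.5, 0.5, -0.5]
quiet_is_silent = is_silence(quiet, energy_thresh=0.01, zcr_thresh=0.5)
loud_is_silent = is_silence(loud, energy_thresh=0.01, zcr_thresh=0.5)
```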
Step C2: the onset sequence {T(k)} obtained in step C1 contains many pseudo-points, i.e. points mistaken for rhythm onsets, so it must be filtered again in the frequency domain. The frequency index range is limited to 111-743, the frequency differences are computed over the onset sequence, and the local power minima are filtered out. The power peaks before and after filtering are illustrated in Figure 9.
Step C3: for the onset sequence with pseudo-points removed, the distance between adjacent T(k) is computed and recorded as the sequence {D(k)}, where k ranges from 1 to the total number of onsets obtained. This sequence is also affected by true onset positions that were deleted by mistake, in which case the interval between two adjacent onsets is much larger than the usual onset interval.
Step C4: K-Means clustering is performed on the sequence {D(k)}, and the maximal subset for an aggregation factor of 0.80 to 0.90 is taken and defined as {Dm(p)}, where p ranges from 1 to the total number of elements in this subset and Dm denotes the maximal subset of the D(k) sequence.
Step C5: the time positions corresponding to {Dm(p)} are extracted as the offset data T(p) of the final effective rhythm onsets.
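Steps C3 to C5 can be sketched as follows, assuming a plain one-dimensional K-Means (Lloyd's algorithm) with two clusters and evenly spaced initial centres. The cluster count, the initialisation, and the names `kmeans_1d` and `effective_onsets` are assumptions; the aggregation factor of 0.80 to 0.90 is not modelled, because the text does not define how it enters the clustering.

```python
def kmeans_1d(values, k=2, iters=50):
    """Plain Lloyd's algorithm on scalars; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / max(k - 1, 1) for c in range(k)]  # evenly spaced start
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def effective_onsets(T, k=2):
    """Keep onsets whose interval to the next onset falls in the largest cluster."""
    D = [b - a for a, b in zip(T, T[1:])]          # D(k): distance between adjacent T(k)
    labels = kmeans_1d(D, k)
    biggest = max(set(labels), key=labels.count)   # label of the maximal subset
    return [T[i] for i, lab in enumerate(labels) if lab == biggest]


# The 500-sample gap (a missed onset) ends up alone in the smaller cluster.
T = [0, 100, 200, 305, 400, 900, 1000]
onsets = effective_onsets(T)
```

Attributing each interval to its left-hand onset is a choice made for this sketch; the text does not state which endpoint of an interval is recorded.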
D. Music fingerprint feature file generation: the final results of step B and step C are combined into one file, with the result of step C as the file header and the result of step B as the data body, finally producing a visual music fingerprint data file that uniquely identifies the song. The file suffix is "som", and the file format is shown in Figure 10.
The characters "somusic" and "fmt" indicate that the file is a music fingerprint file; "head length" gives the length of the data from the field "somusic" up to "data"; "total length" is the size in bytes of the whole music fingerprint file; "aggregation factor" is the tightness of aggregation used when extracting onsets, with a default of 85, i.e. 0.85; "sample frequency" is the sampling frequency used when reading the source file; "transfer rate" is the waveform data transfer rate, i.e. the average number of bytes per second; the characters "index" indicate that what follows are onset offset positions, each four bytes long, in increasing order, continuing until the "data" field is encountered; after the "data" field come the three-value {-1, 0, +1} combinations of step B.
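The following sketch shows how a file with the fields named above might be packed. It is hypothetical throughout: the exact layout is defined by Figure 10, which is not reproduced here, so the field order, the field widths (little-endian, four-byte lengths and rates, a one-byte aggregation factor, one signed byte per three-value element), the default rates, and the function name `build_som` are all assumptions.

```python
import struct


def build_som(onset_offsets, ternary_values, agg_factor=85,
              sample_rate=44100, byte_rate=176400):
    """Pack a header (onset index) and the three-value data into one bytes object."""
    # "index" marker followed by four-byte onset offsets in increasing order.
    index = b"index" + b"".join(struct.pack("<I", off) for off in sorted(onset_offsets))
    # "data" marker followed by one signed byte per -1 / 0 / +1 value (assumed encoding).
    data = b"data" + struct.pack(f"<{len(ternary_values)}b", *ternary_values)
    fixed = struct.calcsize("<IIBII")              # assumed widths of the numeric fields
    head_len = len(b"somusic") + len(b"fmt") + fixed + len(index) + len(b"data")
    total_len = head_len + len(ternary_values)
    header = (b"somusic" + b"fmt"
              + struct.pack("<IIBII", head_len, total_len,
                            agg_factor, sample_rate, byte_rate))
    return header + index + data


blob = build_som([0, 512, 1024], [-1, 0, 1, 1])
```

A reader of such a file would locate the "index" marker, read four-byte offsets until it meets "data", and treat the remaining bytes as the three-value matrix.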
When music fingerprint display software reads a "som" file, it checks the validity of the file step by step and finally displays the music fingerprint image of the piece in RGB colours.
The invention thus proposes a music fingerprint extraction and representation method aimed mainly at music with a regular drumbeat rhythm. It simplifies earlier onset detection methods that rely on a probabilistic model for the decision, and it takes into account the nonlinear behaviour of the human auditory model: the masking effect of the human ear better preserves the features of the music content, and nonlinear Bark sub-band separation suits music fingerprints better than uniform sub-band division. Interleaved block processing of consecutive adjacent sub-bands, i.e. in both the time domain and the frequency domain, greatly enhances the stability of the fingerprint features under noise interference. The "SOM" format music fingerprint feature file finally proposed by the invention will provide a better technical basis for technologies such as content-based music retrieval.