CN102270210A

CN102270210A - MP3 audio attribute discretization method based on heterogeneity rule

Info

Publication number: CN102270210A
Application number: CN2010106122593A
Authority: CN
Inventors: 余小清; 刘军伟; 万旺根; 张静; 杨薇
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2010-12-30
Filing date: 2010-12-30
Publication date: 2011-12-07

Abstract

The invention relates to an MP3 audio attribute discretization method based on a heterogeneity rule. The method discretizes MP3 audio directly in the compressed domain. First, the MP3 audio is preprocessed and the MDCT (Modified Discrete Cosine Transform) spectral coefficients of each audio frame are obtained. Next, the main audio features are extracted in the MDCT domain, namely the band energy ratio (BER), the root mean square (RMS), the spectral centroid (SC), and the 12-dimensional Mel-frequency cepstral coefficients (MFCC), and are taken as the attribute set of the training samples, giving a 15-dimensional feature attribute input set. Finally, the discretization result is obtained with the discretization method based on the heterogeneity rule. Experimental results show that the method facilitates the subsequent optimization of compressed-domain audio attribute features and lays a foundation for a practical, fast audio multi-classification and retrieval system.

Description

An MP3 audio attribute discretization method based on a heterogeneity criterion

Technical field

The present invention relates to an MP3 audio attribute discretization method based on a heterogeneity criterion. It applies heterogeneity-criterion-based discretization to MP3 audio attribute features and aims to simplify the final set of discrete points while preserving accuracy.

Background technology

Attribute discretization first divides the continuous attribute values of a data set into several equivalence classes. Then, provided that the data within each equivalence class remain consistent, it represents each equivalence class by a distinct symbol or integer value and treats each class as a single discrete datum, thereby achieving discretization. In short, discretizing a continuous attribute means partitioning the attribute space with a number of specific symbols or integer values.

With the rapid growth of massive data, extracting useful knowledge from huge, disordered, and noisy databases has become a challenge to the human capacity for intelligent information processing. Many data mining methods, such as decision trees, rough sets, and association rules, are designed for discrete data sets. Discretization has in particular become one of the main problems of rough set theory and one of the bottlenecks limiting its application. In practice, however, attributes are more often continuous or mixed than purely discrete. To obtain concise and effective rules and to mine more useful information from data samples drawn from databases that contain continuous attributes, the continuous attributes must be discretized as a data preprocessing step.

The discretization method proposed by the invention solves the problem of discretizing continuous attributes in the MP3 compressed domain. The discrete points selected for each attribute dimension may differ, and they are determined by the sample attribute itself and by the sample classes. This is more reasonable than the one-size-fits-all breakpoint selection of traditional discretization methods and preserves more of the characteristics of each attribute. The method can further be applied to speech recognition and to classification and retrieval systems for MP3 audio.

Summary of the invention

The object of the invention is to address the deficiencies of the prior art by providing an MP3 audio attribute discretization method based on a heterogeneity criterion, which extracts the main audio features in the MDCT domain and selects candidate breakpoints based on inflection points, so as to discretize MP3 audio attributes.

To achieve the above object, the concept of the invention is as follows: first extract the MDCT coefficients from the MP3 audio data; then extract the main audio features in the MDCT domain as the attribute set of the training samples, giving a 15-dimensional feature attribute input set; obtain the breakpoint set of each continuous attribute from the properties of inflection points; and finally obtain the discretization result with the discretization method based on the heterogeneity criterion.

According to the above inventive concept, the technical solution adopted by the invention is as follows. First, the MDCT coefficients are extracted from the MP3 audio data and their characteristics are analyzed. The main audio features are then extracted from the MDCT coefficients (root mean square RMS, spectral centroid SC, band energy ratio BER, and 12-dimensional Mel-frequency cepstral coefficients MFCC) and used as the attribute set of the training samples, giving a 15-dimensional feature attribute input set. The breakpoint set of each continuous attribute is then obtained from the properties of inflection points, and finally the discretization result is obtained with the discretization method based on the heterogeneity criterion. The method comprises the following steps:

1) Preprocessing of the MP3 audio features: this comprises four parts, namely decoding the MP3 frame header, obtaining the side information, obtaining the main data and scale factors, and Huffman decoding and dequantization;

2) Audio feature extraction based on MDCT coefficients: from the dequantized MP3 frames, find the MDCT coefficients of the two granules of each frame, average the MDCT coefficients of the two granules at each frequency point to build the MDCT spectral coefficients of each audio frame, and then extract the root mean square RMS, spectral centroid SC, band energy ratio BER, and Mel-frequency cepstral coefficients MFCC (12 dimensions);

3) Selection of candidate breakpoints: starting from the envelope of the continuous attribute, take the inflection points of the envelope as the initial candidate breakpoints for attribute discretization; this retains the important information about how the attribute varies between breakpoints and improves the adaptability of the discretization method;

4) Design of the heterogeneity measure: compute the class-conditional probability vector of each discrete interval; the distance between the probability vector P and the centroid probability vector P₀ is called the heterogeneity measure of P, and the distance between each class-conditional probability vector and the centroid probability vector is used as the heterogeneity measure for evaluating the quality of a discretization;

5) Discretization algorithm under the heterogeneity criterion: process each attribute dimension of the attribute set using the candidate breakpoints obtained in step 3), and discretize the processed attribute set according to the heterogeneity measure computed in step 4).

Compared with the prior art, the invention has the following evident substantive features and notable advantages. It discretizes MP3 audio directly in the MP3 compressed domain; compared with the traditional approach of decoding the MP3 compressed audio into uncompressed audio and then discretizing it, the proposed method is simpler and saves computation time. With the new algorithm, the discrete points selected for each attribute dimension may differ, and they are determined by the sample attribute itself and by the sample classes. This is more reasonable than the one-size-fits-all discretization results of traditional methods and preserves more of the characteristics of each attribute. It not only facilitates the subsequent optimization of compressed-domain audio attribute features but also lays a foundation for a practical, fast audio multi-classification and retrieval system.

Description of drawings

Fig. 1 is a flow chart of the MP3 audio attribute discretization method based on the heterogeneity criterion according to the invention;

Fig. 2 is a schematic diagram of the original feature attribute of the compressed-domain audio feature SC;

Fig. 3 is a schematic diagram of the feature attribute of the compressed-domain audio feature SC after linear discretization;

Fig. 4 is a schematic diagram of the feature attribute of the compressed-domain audio feature SC after the entropy-based discretization algorithm;

Fig. 5 is a schematic diagram of the feature attribute of the compressed-domain audio feature SC after the discretization algorithm based on the heterogeneity criterion.

Embodiment

A preferred embodiment of the invention is described below with reference to the drawings. Referring to Fig. 1, the MP3 audio attribute discretization method based on the heterogeneity criterion is divided into five steps:

Step one: preprocessing of the MP3 compressed-domain audio features

Preprocessing of the MP3 compressed audio comprises four parts: decoding the MP3 frame header, obtaining the side information, reading the main data and scale factors, and Huffman decoding and dequantization.

1. Obtaining the synchronized data stream and the frame header information:

A) search the MP3 data stream for the synchronization information according to the MP3 coding format;

B) from the synchronization information, find the start position of each data frame in the MP3 data stream;

C) once the start position of a data frame is determined, obtain the frame header information Head.

2. Obtaining the side information:

A) determine the start position of the side information in the MP3 frame header according to the coding format of the MP3 frame header;

B) obtain the side information Side from the MP3 frame header information Head.

3. Reading the MP3 main data and scale factors:

A) compute the length L of the main data from the side information Side;

B) determine the start position of the MP3 main data from the main data offset in the frame header information Head;

C) read the main data D of total length L from the current frame;

D) extract the scale factors Scale from the main data D.

4. Huffman decoding and dequantization:

A) determine the start and end positions of the Huffman-coded data from the side information Side;

B) Huffman-decode the MP3 main data D to obtain the 32×18 Huffman decoding result F[32,18];

C) dequantize the data in the Huffman decoding result F[32,18].
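The synchronization search of part 1 can be illustrated with a minimal sketch. It assumes the commonly documented convention that an MPEG audio frame header starts with 11 sync bits that are all ones; the function name `find_frame_syncs` is hypothetical and not taken from the patent. A real decoder would additionally validate the remaining header fields and the frame length to reject false matches inside the audio payload.

```python
def find_frame_syncs(data: bytes) -> list[int]:
    """Return byte offsets where an 11-bit all-ones sync pattern starts.

    Assumption: a frame header begins with 0xFF followed by a byte whose
    top three bits are set. No further header validation is done here.
    """
    offsets = []
    for i in range(len(data) - 1):
        # First byte must be 0xFF, next byte must have its top 3 bits set.
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            offsets.append(i)
    return offsets
```

Each returned offset is only a candidate start position for a frame; the frame header information Head would be read from the four bytes beginning there.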

Step two: MDCT coefficient extraction and MP3 audio feature extraction

1. Build the Modified Discrete Cosine Transform (MDCT) coefficients of each audio frame:

A) allocate two storage arrays of size n×576, MDCT₀[n, 576] and MDCT₁[n, 576], to hold the MDCT coefficients of the two granules of each MP3 audio frame, where n is the number of frames of the MP3 audio;

B) from the array F, find the MDCT coefficients of the two granules of the same audio frame, rearrange them in order of increasing frequency, and store them in MDCT₀[i, j] and MDCT₁[i, j];

C) compute the mean of the MDCT coefficients of the two granules at the same frequency point of the same audio frame, and take it as the MDCT coefficient value M[i, j] of that frame:

$$M[i,j] = \frac{MDCT_0[i,j] + MDCT_1[i,j]}{2}$$

where MDCT₀[i, j] and MDCT₁[i, j] are the j-th MDCT spectral values of granule 0 and granule 1 of the i-th audio frame, respectively, and M[i, j] is the j-th averaged MDCT spectral value of the i-th audio frame.
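As a small illustration of the averaging in part C, the sketch below takes the two granule arrays as nested Python lists (one row per frame) and returns the per-frame averaged coefficients. The function name `average_granules` and the list-of-lists layout are assumptions made for the example.

```python
def average_granules(mdct0, mdct1):
    """Average two granules' MDCT coefficients per frame and frequency point.

    mdct0[i][j] and mdct1[i][j] hold the j-th coefficient of granule 0 and
    granule 1 of frame i; the result plays the role of M[i][j].
    """
    if len(mdct0) != len(mdct1):
        raise ValueError("both granule arrays must have the same number of frames")
    return [
        [(a + b) / 2.0 for a, b in zip(row0, row1)]
        for row0, row1 in zip(mdct0, mdct1)
    ]
```

For an MP3 frame each row would hold 576 values; the example works for any row length.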

2. Extract the root mean square RMS, spectral centroid SC, band energy ratio BER, and Mel-frequency cepstral coefficients MFCC (12 dimensions):

A) Root mean square RMS

This parameter is the envelope of the audio signal and reflects the energy variation of the signal. For one granule, the RMS is computed as:

$$RMS = \sqrt{\frac{1}{N}\sum_{j=1}^{N} M_j^{2}}$$

where N is the number of coefficients in a granule and M_j is the MDCT coefficient value.
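A minimal sketch of the RMS feature for one granule follows, using the standard definition of a root mean square over the granule's MDCT coefficients. The function name `granule_rms` is hypothetical.

```python
import math


def granule_rms(coeffs):
    """Root mean square of one granule's MDCT coefficients."""
    if not coeffs:
        raise ValueError("at least one coefficient is required")
    # Mean of the squared coefficients, then the square root.
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))
```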

B) Spectral centroid SC

This parameter is the balance point of the MDCT coefficient energy distribution and indicates the spectral region in which most of the signal energy is concentrated. It is computed by the following formula:

[formula image]

in which the variable is the MDCT coefficient value.
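Because the patent's own SC formula is not legible here, the following sketch uses one common definition of a spectral centroid as an assumption: the energy-weighted mean of the coefficient index, with squared MDCT coefficients as the energy. The function name `spectral_centroid` and the 1-based indexing are choices made for the example, and the result is not normalized.

```python
def spectral_centroid(coeffs):
    """Energy-weighted mean index (1-based) of the MDCT coefficients.

    Assumption: energy is the squared coefficient; the patent's exact
    weighting and normalization may differ.
    """
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # silent granule: no energy, centroid reported as 0
    return sum(j * e for j, e in enumerate(energies, start=1)) / total
```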

C) Band energy ratio BER

The band energy ratio is obtained by choosing a reference frequency within the signal's frequency range and taking the ratio of the energy of the MDCT coefficients at frequencies below the reference frequency to the energy of those at frequencies above it. Suppose the reference frequency is closest to the frequency corresponding to a given MDCT coefficient of a given sub-band. The band energy ratio BER is then given by:

[formula image]

where M denotes the window type, with one value for long windows and another for short windows, and the remaining symbol is the MDCT coefficient value.
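The idea of the band energy ratio can be sketched as below, assuming that the reference frequency has already been mapped to a coefficient index `split_index` (a hypothetical parameter) and that energy is the squared MDCT coefficient. The window-type handling described in the text is left out of this simplified example.

```python
import math


def band_energy_ratio(coeffs, split_index):
    """Ratio of energy below the split index to energy at or above it."""
    low = sum(c * c for c in coeffs[:split_index])
    high = sum(c * c for c in coeffs[split_index:])
    if high == 0.0:
        # No energy above the reference frequency: the ratio is unbounded
        # unless the lower band is silent as well.
        return math.inf if low > 0.0 else 0.0
    return low / high
```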

D) Mel-frequency cepstral coefficients MFCC

MFCC are motivated by the characteristics of human hearing; the Mel frequency is related nonlinearly to the frequency in Hz. The MFCC are computed as follows:

(1): Define a bank of 12 triangular filters whose actual center frequencies, corresponding to the Mel frequencies, are indexed by m = 1, 2, ..., 12. The center frequencies are determined by the following formula:

[formula image]

where N = 576, and the formula uses the mapping from actual frequency to Mel frequency, its inverse function, and the lowest and highest frequencies expressed as Mel frequencies. The formula converts center frequencies that are equally spaced in the Mel domain into actual center frequencies. Triangular filtering in the Mel domain amounts to computing the frequency components that fall within the range of each triangular filter, so each MDCT amplitude can be multiplied by the corresponding factor. The frequency response of the triangular filters to the different frequency components is given by:

[formula image]

Here m is the index of the filter and is an integer, and k is the index of the frequency line.

(2): Compute the output energy of each filter by the following formula:

[formula image]

where m is the filter index and k is the coefficient index.

(3): Apply a cosine transform by the following formula to obtain the MFCC coefficients:

[formula image]
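The three MFCC steps can be sketched as follows. Several details are assumptions rather than the patent's formulas: the widely used Mel mapping 2595·log10(1 + f/700), a filter bank spanning 0 Hz to the Nyquist frequency, squared MDCT coefficients as energy, a natural logarithm of the filter outputs, and a DCT-II that returns coefficient orders 0 to 11. All function names are hypothetical.

```python
import math


def hz_to_mel(f):
    # Common Mel mapping (assumed, not taken from the patent).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_filters, f_low, f_high):
    """Frequencies (Hz) equally spaced on the Mel scale, including both ends."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]


def mfcc_from_mdct(coeffs, sample_rate=44100.0, n_filters=12):
    """Step (1)-(3): triangular Mel filtering, log energies, cosine transform."""
    n = len(coeffs)
    nyquist = sample_rate / 2.0
    edges = mel_band_edges(n_filters, 0.0, nyquist)
    log_energies = []
    for m in range(1, n_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        energy = 0.0
        for k, c in enumerate(coeffs):
            f = (k + 0.5) * nyquist / n  # center frequency of line k
            if left < f <= center:
                weight = (f - left) / (center - left)
            elif center < f < right:
                weight = (right - f) / (right - center)
            else:
                continue
            energy += weight * c * c
        # Small offset avoids log(0) for empty or silent bands.
        log_energies.append(math.log(energy + 1e-12))
    # DCT-II over the log filter energies, orders 0 .. n_filters - 1.
    return [
        sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for q in range(n_filters)
    ]
```

Calling `mfcc_from_mdct` on the 576 averaged MDCT values of one frame yields a 12-element list, which together with RMS, SC, and BER would make up the 15-dimensional attribute vector described in the text.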

Step three: selection of candidate breakpoints

1. Analyze the audio feature attribute set; on the first attribute dimension, successively take four consecutive attribute points (A, B, C, D);

2. From the three vectors formed by the four ordered points (AB, BC, CD), compute the curvatures of the two pairs of intersecting vectors by the following formula:

[formula image]

3. Compute the product of the two curvatures by the following formula:

[formula image]

4. If the resulting condition holds, the necessary and sufficient condition for the existence of an inflection point is satisfied, and the inflection point lies somewhere on the vector BC. The midpoint of BC is usually taken as this inflection point, that is, as the candidate breakpoint;

5. Loop. For the other condition attributes, repeat steps 1 to 4 to obtain the candidate breakpoint set of each attribute dimension.
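The sliding four-point test can be sketched as follows. Since the curvature formulas are not legible in the source, this example substitutes a standard technique: the sign of the 2-D cross product of adjacent vectors gives the turning direction, and a sign change between (AB, BC) and (BC, CD) is treated as an inflection on BC. The point layout (sample index, attribute value) and the function names are assumptions.

```python
def cross(u, v):
    """z-component of the 2-D cross product; its sign gives the turn direction."""
    return u[0] * v[1] - u[1] * v[0]


def candidate_breakpoints(values):
    """Return the BC midpoint values for windows A, B, C, D whose turn flips."""
    points = list(enumerate(values))  # (sample index, attribute value)
    found = []
    for a, b, c, d in zip(points, points[1:], points[2:], points[3:]):
        ab = (b[0] - a[0], b[1] - a[1])
        bc = (c[0] - b[0], c[1] - b[1])
        cd = (d[0] - c[0], d[1] - c[1])
        # Opposite signs: the curve bends one way at B and the other way at C.
        if cross(ab, bc) * cross(bc, cd) < 0:
            found.append((b[1] + c[1]) / 2.0)
    return sorted(set(found))
```

Running the function once per attribute dimension gives one candidate breakpoint set per dimension, which matches the loop in item 5.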

Step four: design of the heterogeneity measure

1. Computation of the heterogeneity measure:

Suppose that, in an information system, M samples are quantized on a continuous attribute, that the samples fall into several classes, and that the attribute is divided into several discrete intervals. For each class and each discrete interval, count the samples that belong to that class and whose attribute value lies in the i-th discrete interval. Each row sum is the total number of samples of the corresponding class, and the i-th column sum is the total number of samples whose attribute value falls in the i-th discrete interval. For the i-th discrete interval this gives a class-conditional probability distribution whose components sum to 1.

Let P be the combined probability vector that contains the class probability vectors of all discrete intervals:

[formula image]

The distance between the vector P and the centroid probability vector P₀ is called the heterogeneity measure of P, as given by the following formula:

[formula image]

For the i-th discrete interval, when the conditional class probability vector equals P₀, the heterogeneity of the interval is at its minimum, which indicates poor classification performance. Therefore, for any class probability vector, its distance from the centroid probability vector is used as the heterogeneity measure.
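A minimal sketch of the per-interval heterogeneity follows. It assumes that the centroid probability vector P₀ is the uniform distribution over the classes (every component equal to one over the number of classes) and uses the Euclidean distance named in claim 5; the function name `interval_heterogeneity` is hypothetical.

```python
import math


def interval_heterogeneity(class_counts):
    """Euclidean distance between an interval's class distribution and P0.

    class_counts[r] is the number of samples of class r whose attribute
    value falls in the interval. P0 is assumed to be the uniform vector.
    """
    total = sum(class_counts)
    if total == 0:
        raise ValueError("the interval contains no samples")
    n_classes = len(class_counts)
    return math.sqrt(
        sum((count / total - 1.0 / n_classes) ** 2 for count in class_counts)
    )
```

An interval whose samples are spread evenly over the classes scores 0, the minimum, while an interval containing a single class scores highest, which agrees with the statement that a vector equal to P₀ indicates poor classification performance.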

Table 2. Comparison of classification accuracy under the three discretization modes

Audio sample types	No discretization	Linear discretization	EBD	Heterogeneity-based discretization
Classical/speech/rock	93.53%	83.83%	85.14%	89.71%
Male voice/female voice/rock	88.78%	71.46%	81.52%	86.87%

As Table 2 shows, classification accuracy is highest without discretization, and every discretization method lowers it to some degree, because discretization inevitably discards part of the information in the original sample attributes. The proposed discretization method shares this drawback, but its error is relatively small, so a high classification accuracy is maintained. When the audio sample data are large and the feature dimension grows, discretization greatly reduces the complexity of subsequent algorithms, so sacrificing some accuracy within an acceptable range is worthwhile. In addition, discretizing continuous attributes is a necessary step for the subsequent feature optimization based on rough set theory.

The heterogeneity of a discretization scheme is computed by the following formula:

[formula image]

Consider the boundary sets of two discretization schemes D and D'. When every boundary point of D is also a boundary point of D', the scheme D' can be generated from D; that is, adding some boundary points to the boundaries of D yields D'. Therefore, for any two discretization schemes D and D', the following rule is obtained:

[formula image]

As for the number of discrete intervals, provided that good classification performance is maintained, the fewer the intervals, the more the data complexity is reduced and the easier the subsequent sample classification becomes. This gives the following equivalent criterion for evaluating a discretization scheme:

Suppose two discretization schemes D and D' have their respective numbers of discrete intervals, and call the set of schemes that satisfy the discretization criterion the candidate set, denoted CD. If the following inequality holds:

[formula image]

then scheme D is the better of the two, where CF is called the criterion function.

Step five: discretization under the heterogeneity criterion

(1): Initialize the breakpoint set. Process the first attribute in the attribute set with the candidate breakpoint algorithm of step three to initialize the boundary point set.

(2): Initialize the discretization scheme. Initialize the discretization scheme from the breakpoint set, together with the accompanying initial values.

(3): Add a candidate breakpoint. Add a breakpoint to the current discretization scheme to produce a new scheme.

(4): Update the discretization scheme. For a discretization scheme D in the current GD, judge whether CF(D) is greater than the current Globalopt. If so, update CD = D and Globalopt = CF(D); otherwise, move on to the next discretization scheme and continue comparing CF(D) with Globalopt, until all possible, non-repeated schemes have been checked, which yields the best scheme CD with the maximum Globalopt value;

(5): Repeat steps (3) and (4) until all breakpoints of the initial breakpoint set have been verified, then end the loop;

(6): Obtain the discrete points of the current attribute. Then, for the other condition attributes, repeat steps (1) to (5) to obtain the discrete points of each attribute dimension.
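The search loop of steps (3) to (5) can be illustrated with a simplified sketch. It is a single-pass greedy search rather than the exhaustive check of all non-repeated schemes described above, and the criterion function is passed in as a callable because the patent's CF formula is not legible here. The names `discretize`, `greedy_breakpoint_search`, and `criterion` are hypothetical.

```python
from bisect import bisect_right


def discretize(values, breakpoints):
    """Map each value to the index of the interval it falls in."""
    cuts = sorted(breakpoints)
    return [bisect_right(cuts, v) for v in values]


def greedy_breakpoint_search(candidates, criterion):
    """Add candidates one at a time, keeping those that raise the criterion.

    criterion(cuts) stands in for CF(D); the best score so far plays the
    role of Globalopt and the kept cut list the role of CD.
    """
    best = []
    best_score = criterion(best)
    for bp in sorted(candidates):
        trial = best + [bp]
        score = criterion(trial)
        if score > best_score:
            best, best_score = trial, score
    return best, best_score
```

In use, `criterion` would combine the heterogeneity of the intervals produced by `discretize` with a preference for fewer intervals, and the search would be run once per attribute dimension as in step (6).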

Experimental results. The feature parameters of this experiment, namely root mean square RMS, spectral centroid SC, band energy ratio BER, and Mel-frequency cepstral coefficients MFCC (12 dimensions), were extracted on the VC++ platform and used as the attribute set of the training samples, giving a 15-dimensional feature input set. Using the algorithms described above, this 15-dimensional feature attribute set was discretized on the Matlab platform with the linear discretization method, the entropy-based discretization, and the discretization based on the heterogeneity criterion, and the three methods were analyzed and compared. The audio used is a 60-second MP3 clip, sampled at 44.1 kHz, mono, 16-bit, made up of 20 seconds each of pure music, speech, and rap music.

The attribute SC of the attribute vector set is chosen for analysis. Figs. 2 to 5 show, in order, the compressed-domain audio feature SC in its original undiscretized state and the results of processing this feature parameter with the linear discretization method, the entropy-based discretization algorithm, and the discretization based on the heterogeneity criterion.

Comparing Fig. 5 with Fig. 2 shows that the discretization method based on the heterogeneity criterion is supervised: the choice of discrete points varies with the sample classes. The linear discretization algorithm, by contrast, relies on a manually specified parameter K and obtains the whole breakpoint set at once, without considering the class information contained in the attribute, so the quality of its discretization result is not guaranteed.

Comparing Fig. 4 with Fig. 5 shows that the entropy-based discretization algorithm severely alters the original data distribution. This happens because the preset threshold is too large, so the discretization result fails to reflect the true differences in the raw data, and choosing the threshold is itself a difficult procedure. Discretization under the heterogeneity criterion does not suffer from this problem.

Table 1 lists the discrete points selected for the first two dimensions, the RMS and SC attributes, of the normalized 15-dimensional attribute vector of a music sample.

Table 1. Attribute discrete points based on the heterogeneity criterion

Attribute / discrete point	1	2	3	4	5	6	7
RMS	0.0002	0.0003	0.0004	0.0052	0.0074	0.0075	0.0102
SC	0.0206	0.0308	0.0524	0.0669	0.0691	0.0915	0.1815
Attribute / discrete point	8	9	10	11	12	13
RMS	0.0104	0.0119	0.0122	0.0123	0.0216	0.0255
SC	0.1823	0.2077	0.3042	0.5240	0.5570	0.9572

As Table 1 shows, with the discretization method based on the heterogeneity criterion the discrete points differ from one attribute to the other. In the linear discretization method above, the discrete points are the same for every attribute of the 15-dimensional sample attribute set, whereas with the proposed algorithm the discrete points selected for each attribute dimension may differ and are determined by the sample attribute itself and by the sample classes. This is more reasonable than the one-size-fits-all approach of linear discretization and preserves more of the characteristics of each attribute.

To verify the discretization results, a classification test was carried out on the discretized data. Table 2 lists the classification accuracy, comparing linear discretization, entropy-based discretization (EBD), and heterogeneity-criterion discretization on four kinds of audio samples.


Claims

1. An MP3 audio attribute discretization method based on a heterogeneity criterion, characterized in that the specific operation steps are as follows:

1) Preprocessing of the MP3 audio features: this comprises decoding the MP3 frame header, obtaining the side information, obtaining the main data and scale factors, and Huffman decoding and dequantization;

2) Audio feature extraction based on MDCT coefficients: from the dequantized MP3 frames, find the MDCT coefficients of the two granules of each frame, average the MDCT coefficients of the two granules at each frequency point to build the MDCT spectral coefficients of each audio frame, and then extract the root mean square RMS, spectral centroid SC, band energy ratio BER, and Mel-frequency cepstral coefficients MFCC;

3) Selection of candidate breakpoints: starting from the envelope of the continuous attribute, take the inflection points of the envelope as the initial candidate breakpoints for attribute discretization;

4) Design of the heterogeneity measure: compute the class-conditional probability vector of each discrete interval; the distance between the probability vector P and the centroid probability vector P₀ is called the heterogeneity measure of P, and the distance between each class-conditional probability vector and the centroid probability vector is used as the heterogeneity measure for evaluating the quality of a discretization;

5) Discretization algorithm under the heterogeneity criterion: process each attribute dimension of the attribute set using the candidate breakpoints obtained in step 3), and discretize the processed attribute set according to the heterogeneity measure computed in step 4).

2. The MP3 audio attribute discretization method based on a heterogeneity criterion according to claim 1, characterized in that the preprocessing of the MP3 audio features in step 1) comprises the following specific steps:

(1) obtain the synchronized data stream and the frame header information;

(2) obtain the side information from the decoded frame header information;

(3) extract the MP3 main data and scale factors;

(4) perform Huffman decoding and dequantization on the MP3 main data stream.

3. The MP3 audio attribute discretization method based on a heterogeneity criterion according to claim 1, characterized in that the audio feature extraction based on MDCT coefficients in step 2) comprises the following specific steps:

(1) build the MDCT coefficients of each audio frame;

(2) extract the root mean square RMS, spectral centroid SC, band energy ratio BER, and Mel-frequency cepstral coefficients MFCC based on the MDCT coefficients.

4. The MP3 audio attribute discretization method based on a heterogeneity criterion according to claim 1, characterized in that the selection of candidate breakpoints in step 3) comprises the following specific steps:

(1) initialize the audio feature attribute set;

(2) successively take the three vectors (AB, BC, CD) formed by four ordered points of the audio feature attribute set, and compute the curvatures of the two pairs of intersecting vectors;

(3) judge from the change in curvature direction whether an inflection point exists;

(4) loop: for the other condition attributes, repeat steps (1) to (3) to obtain the candidate breakpoint set of each attribute dimension.

5. The MP3 audio attribute discretization method based on a heterogeneity criterion according to claim 1, characterized in that the design of the heterogeneity measure in step 4) comprises the following steps:

(1) compute the heterogeneity measure between different types of audio from the Euclidean distance;

(2) compute the heterogeneity between different types of audio from the selected heterogeneity measure.

6. The MP3 audio attribute discretization method based on a heterogeneity criterion according to claim 1, characterized in that the discretization algorithm under the heterogeneity criterion in step 5) comprises the following specific steps:

(1) initialize the breakpoint set for each attribute dimension;

(2) initialize the discretization scheme from the initialized breakpoint set;

(3) add a candidate breakpoint to the discretization scheme;

(4) update the discretization scheme according to whether all candidate breakpoints have been verified;

(5) repeat steps (3) and (4) until all breakpoints of the initial breakpoint set have been verified, then end the loop;

(6) obtain the discrete points of the current attribute; then, for the other condition attributes, repeat steps (1) to (5) to obtain the discrete points of each attribute dimension.