CN102789780B - Method for identifying environment sound events based on time spectrum amplitude scaling vectors

Info

Publication number: CN102789780B
Application number: CN201210242825.5A
Authority: CN (China)
Other versions: CN102789780A (Chinese)
Inventor: 李应
Assignee (original and current): Fuzhou University
Application filed by Fuzhou University; priority to CN201210242825.5A
Legal status: Expired - Fee Related
Prior art keywords: spectrum, sound event, signal, TSASV, sub

Abstract

The invention relates to a method for identifying environment sound events based on time spectrum amplitude scaling vectors. First, a time-spectral amplitude scale vector (TSASV) is computed for each related sound event as a recognition prototype and stored in a database as a template for discriminating the sound event under test. Then the TSASV of the sound event under test is computed and compared with each recognition prototype; the prototype sound event closest to it is the sound event to be identified. The TSASV is constructed as follows: a fast Fourier transform is applied to the acquired environment sound event to generate a spectrogram; the generated spectrogram is sampled at different frequency resolutions to build sub-spectrograms of different frequency resolutions; signal enhancement is applied to the sub-spectrograms to generate signal-enhanced sub-spectrograms; amplitude scaling is applied to the signal-enhanced sub-spectrograms; and the amplitude-scaled sub-spectrograms are encoded to generate the TSASV. The method is conducive to improving the effectiveness of environment sound event recognition.

Description

Method for identifying environment sound events based on time-spectral amplitude scale vectors
Technical field
The present invention relates to the technical field of sound event recognition, and in particular to a method for identifying environment sound events based on time-spectral amplitude scale vectors (time-spectral amplitude scale vector, TSASV).
Background art
Environment sound recognition attempts to pick out the real events hidden in sound signals. It is used in many fields, such as environmental monitoring, auditory scene analysis, and multimedia data retrieval. Conventional sound event recognition methods extract discernible features from audio data and feed them into a pattern classifier. The more effective the features extracted from the sound signal, the better the recognition performance.
Traditionally, sound signals are characterized by Mel-frequency cepstral coefficients (MFCCs), or by MFCCs combined with MPEG-7 descriptors and a hidden Markov model (HMM) recognizer. More recent work uses other time-frequency representations, such as the short-time Fourier transform and the wavelet transform, or combines high-dimensional features of the sound signal with MFCCs and discrete Gabor wavelets. Other approaches are auditory-inspired: the waveform is passed through an auditory excitation filter bank to obtain a time-frequency representation, subband temporal envelopes (STE) are drawn from it and used as features to characterize the sound signal, and the STE distribution is modeled by a generalized gamma model as the sound feature; the probabilistic distance between the STE distributions of sound samples is then used by an SVM classifier for classification.
However, the physical characteristics of environment sound events are very complicated; assumptions such as linear prediction, periodicity, and specific models are not necessarily valid for many sound events. A common problem of the above recognition methods is that performance drops sharply in the presence of noise. In practical auditory applications, because the sound sources are uncertain, designing a suitable detector is difficult. To address these problems, newer research includes time-encoded signal processing and recognition (TESPAR) matrices, energy detection, spectrogram representations, and visual information of the spectrogram. M. V. Ghiurcau et al. use TESPAR matrices to monitor field regions, detecting intrusions by detecting and recognizing three classes of sound: human voices, bird sounds, and car sounds. J. Moragues et al. use a time-frequency multi-energy detector feature for the detection and classification of unknown, continuous natural sound events. L. Neal et al. propose a segmentation method: the input signal is first converted to a spectrogram representation, and a supervised classifier then creates a binary mask label for each time-frequency unit as bird sound or background sound. J. Dennis et al. use visual information of the spectrogram to produce features for sound classification. These approaches attempt to extract features unique to environment sound events and to recognize environment sound events even when they overlap.
Besides isolated sound events, multiple sources emitting environment sounds may be present while sound is being acquired. For example, wind, the rush of a brook, and other background noise weaken the acoustic information we care about, and sounds from other animals and the surroundings interfere with the environment sound events of interest. In more complicated situations, the sound events of interest may come from two or more independent sources at the same time. Recognizing sound events under noisy, multi-source conditions is therefore challenging.
Summary of the invention
The object of the present invention is to provide a method for identifying environment sound events based on time-spectral amplitude scale vectors, which helps improve the effectiveness of environment sound event recognition.
To achieve the above object, the technical solution adopted by the present invention is: a method for identifying environment sound events based on time-spectral amplitude scale vectors. First, the TSASV of each related sound event is computed as a recognition prototype, and each recognition prototype is stored in a database as a template for discriminating the sound event under test. Then the TSASV of the sound event under test is computed and compared with each recognition prototype stored in the database; the prototype sound event corresponding to the recognition prototype closest to the TSASV of the sound event under test is the sound event to be identified.
The construction method of the time-spectral amplitude scale vector TSASV comprises the following steps:
Step 1: apply a fast Fourier transform to the acquired environment sound event to generate a spectrogram;
Step 2: sample the generated spectrogram at different frequency resolutions, building sub-spectrograms of different frequency resolutions from the spectrogram;
Step 3: perform signal enhancement on the sub-spectrograms to generate signal-enhanced sub-spectrograms;
Step 4: perform amplitude grading on the signal-enhanced sub-spectrograms;
Step 5: encode the amplitude-graded sub-spectrograms to generate the TSASV.
The beneficial effect of the invention is that, taking spectral energy as a basis, the time-spectral amplitude scale vector is used to characterize the sound signal for environment sound event recognition. With this approach, the detector can not only detect sound events in background noise, but can also classify environment sound events effectively; its performance is better than that of an MFCC-based SVM classification model.
Brief description of the drawings
Fig. 1 is a schematic diagram of the construction process of the time-spectral amplitude scale vector in an embodiment of the present invention.
Fig. 2 is the waveform of an original sound containing various environment sounds, recorded on campus, in an embodiment of the present invention.
Fig. 3 shows the amplitude-grading coding for G = 3 and d = 1 in an embodiment of the present invention.
Fig. 4 shows the coding process of signal-enhanced sub-spectrogram 1 in an embodiment of the present invention.
Embodiment
The method of the present invention for identifying environment sound events based on time-spectral amplitude scale vectors first computes the time-spectral amplitude scale vector TSASV of each related sound event as a recognition prototype, and stores each recognition prototype in a database as a template for discriminating the sound event under test; then the TSASV of the sound event under test is computed and compared with each recognition prototype stored in the database, and the prototype sound event corresponding to the recognition prototype closest to the TSASV of the sound event under test is the sound event to be identified.
The construction method of the time-spectral amplitude scale vector, as shown in Fig. 1, comprises the following steps:
Step 1: apply a fast Fourier transform to the acquired environment sound event to generate a spectrogram;
Step 2: sample the generated spectrogram at different frequency resolutions, building sub-spectrograms of different frequency resolutions from the spectrogram;
Step 3: perform signal enhancement on the sub-spectrograms to generate signal-enhanced sub-spectrograms;
Step 4: perform amplitude grading on the signal-enhanced sub-spectrograms;
Step 5: encode the amplitude-graded sub-spectrograms to generate the TSASV; the TSASV is then used to characterize the sound signal and to identify the sound event.
In step 1, the sampled environment sound event signal with noise, y(i), is the sum of the pure sound event signal s(i) and the interfering noise n(i), i.e. y(i) = s(i) + n(i), where i is the sample index. N consecutive samples of y(i) are windowed by a window h(i), and the samples in the window are transformed by a fast Fourier transform, converting the noisy time-domain signal y(i) into a frequency-domain signal; the window is then advanced by M samples and the next fast Fourier transform is computed. The spectrum of the environment sound event signal y(i) is:
Y(k, l) = Σ_{i=0}^{N-1} h(i) · y(lM + i) · e^{-jΩ_k·i}
where l is the window-shift index, i.e. the index of the time-domain signal frame, l ∈ {0, 1, …, L−1}, and L is the total number of frames of signal y(i); k is the frequency-bin index, k ∈ {0, 1, …, N−1}, related to the normalized center frequency Ω_k by Ω_k = 2πk/N; N is the number of frequency bins; and j is the imaginary unit.
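A minimal Python/NumPy sketch of this step follows; the Hann window and the default values of N and M are assumptions for illustration, since the text fixes only the window length N and the hop M.

    import numpy as np

    def stft_spectrogram(y, N=64, M=32):
        """Return Y[k, l] = sum_i h(i) * y(l*M + i) * exp(-j * 2*pi*k*i / N)."""
        h = np.hanning(N)                      # window choice is an assumption
        L = (len(y) - N) // M + 1              # number of frames l = 0..L-1
        Y = np.empty((N, L), dtype=complex)
        for l in range(L):
            Y[:, l] = np.fft.fft(y[l * M : l * M + N] * h, n=N)
        return Y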
In step 2, the spectrum Y(k, l) of the environment sound event signal y(i) is sampled at a certain frequency resolution to obtain sub-sampled spectra Y_d. The spectrum Y(k, l) and the sub-sampled spectra Y_d are related as follows:
Y ~ [Y_1, Y_2, …, Y_d, …, Y_D]
where D is the number of sub-spectra Y_d into which the spectrogram Y is decomposed, taking one of every D frequency bins of Y as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and Y_d is an (N/D) × L matrix, which can be expressed as:
Y_d(b, l) = Y(k_d, l)
where b is the frequency-sampling index of sub-spectrum Y_d within Y, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of Y_d; and k_d = b·D + d − 1 is the sampled frequency bin.
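The interleaved sub-sampling k_d = b·D + d − 1 (sub-spectrogram d takes rows d − 1, d − 1 + D, d − 1 + 2D, … of Y) can be sketched as follows; the list-of-arrays representation is an assumption.

    def subsample_spectrogram(Y, D=4):
        """Split Y into D interleaved sub-spectrograms, Y_d(b, l) = Y(b*D + d - 1, l)."""
        N = Y.shape[0]
        assert N % D == 0, "N/D must be a positive integer"
        # row b of sub-spectrogram d (d = 1..D) is frequency bin k_d = b*D + d - 1 of Y
        return [Y[d - 1 :: D, :] for d in range(1, D + 1)]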
Sound acquired in a real environment may contain noise of various kinds, so the spectrum of the sound must undergo signal enhancement. In step 3, the sub-sampled spectra Y_d are signal-enhanced, converting each sub-sampled spectrum Y_d into a signal-enhanced sub-sampled spectrum X_d. The enhanced spectrum X(k, l) and the sub-sampled spectra X_d are related as follows:
X ~ [X_1, X_2, …, X_d, …, X_D]
where D is the number of sub-spectra into which the spectrogram is decomposed, taking one of every D frequency bins as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and X_d is an (N/D) × L matrix, which can be expressed as:
X_d(b, l) = X(k_d, l)
where b is the frequency-sampling index of sub-spectrum X_d within X, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of X_d; and k_d = b·D + d − 1 is the sampled frequency bin.
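The text does not detail the enhancement operation at this point, so the sketch below substitutes a simple magnitude spectral subtraction against a per-row noise floor; both the method and the quantile parameter are assumptions, not the patent's enhancement.

    import numpy as np

    def enhance(Yd, floor_quantile=0.1):
        """Stand-in enhancement: magnitude spectral subtraction against a noise floor."""
        mag = np.abs(Yd)
        noise = np.quantile(mag, floor_quantile, axis=1, keepdims=True)  # per-row floor
        return np.maximum(mag - noise, 0.0)    # signal-enhanced magnitudes X_d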
In step 4, amplitude grading of the signal-enhanced sub-spectrograms comprises the following steps:
For the d-th signal-enhanced sub-spectrum X_d, the amplitude grading threshold i(d) is expressed as:
i(d) = f_d / 2^G
where f_d is the maximum of the d-th signal-enhanced sub-spectrum X_d, i.e. f_d = max(|X_d|).
In practical applications, a threshold θ can be set according to the acoustic environment: f_d is taken as the grading reference for X_d only when f_d > θ. That is, X_d is graded and encoded only when f_d > θ; otherwise it is judged as silence. In this way, sound signals of various strengths can be analyzed by setting different thresholds θ.
For the d-th signal-enhanced sub-spectrum X_d, G grading ranges are obtained from the amplitude grading threshold i(d):
d-0: |X_d| ∈ [0, 2^0 × i(d)]
d-1: |X_d| ∈ (2^0 × i(d), 2^1 × i(d)]
d-g: |X_d| ∈ (2^{g−1} × i(d), 2^g × i(d)]
d-G: |X_d| ∈ (2^{G−1} × i(d), 2^G × i(d)]
where g is the grade of the amplitude grading, g ∈ {1, 2, …, G}, and G is the maximum number of grades, G ∈ {1, 2, …, 8}.
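A minimal sketch of this grading rule, including the optional silence gate θ described above; NumPy, the default parameter values, and the small numerical guard are assumptions.

    import numpy as np

    def grade_amplitudes(Xd, G=3, theta=0.0):
        """Grade each |X_d| value into 0..G using i(d) = f_d / 2**G."""
        mag = np.abs(Xd)
        f_d = mag.max()
        if f_d <= theta:
            return None                     # judged as silence: not graded or coded
        i_d = f_d / 2 ** G
        # grade g satisfies 2**(g-1)*i(d) < |X_d| <= 2**g*i(d); values <= i(d) get grade 0
        grades = np.ceil(np.log2(np.maximum(mag, 1e-12) / i_d))
        return np.clip(grades, 0, G).astype(int)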
According to the value of G, grading X_d yields G + 1 grade components. The grade-0 component represents the part of the signal with the lowest amplitude grade; its sub-components are 0 and are not coded. The graded sub-spectrum can therefore be expressed as the collection of its remaining grade components, where the g-th grade component of the amplitude grading is an L-frame character matrix in which:
T is the maximum number of values that may be taken in one frame at this grade satisfying 2^{g−1} × i(d) < X_d(b, l) ≤ 2^g × i(d); here T = 4. t is the index of the amplitude at this grade, 0 ≤ t ≤ T − 1; l is the frame index within the sub-spectrum, 0 ≤ l ≤ L − 1; and b is the frequency index of sub-spectrum X_d, b ∈ {0, 1, …, N/D − 1}. '.' denotes no value; it is ignored during coding and serves only to visually distinguish positions that carry values in the character representation. c_b denotes the available coded character set, whose size is determined by N/D. Taking N/D = 2^6 as an example, c_b can be expressed as:
c_b = b + c(b) ∈ {43; 45; 48, …, 57; 65, …, 90; 97, …, 122}
represented in the ASCII character set as
c_b ∈ {'+'; '-'; '0', …, '9'; 'A', …, 'Z'; 'a', …, 'z'}
When b = 0 and c(0) = 43, c_b = b + c(b) = 43 = ascii('+'); that is, frequency index 0 is coded as '+'. When b = 1 and c(1) = 44, c_b = b + c(b) = 45 = ascii('-'); that is, frequency index 1 is coded as '-'. Likewise, for 2 ≤ b ≤ 63, the coding of each frequency index is represented by the corresponding character.
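As a small check of this 64-character mapping, the set can be written out directly; the sketch below is illustrative only.

    # The 64-character set for N/D = 64: ASCII codes 43, 45, 48-57, 65-90, 97-122,
    # i.e. '+', '-', '0'-'9', 'A'-'Z', 'a'-'z', indexed by frequency index b.
    CODE64 = ("+-0123456789"
              + "".join(chr(c) for c in range(65, 91))     # 'A'-'Z'
              + "".join(chr(c) for c in range(97, 123)))   # 'a'-'z'
    assert len(CODE64) == 64 and CODE64[0] == "+" and CODE64[1] == "-"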
In step 5, the amplitude-graded sub-spectrograms are encoded to generate the TSASV as follows.
An environment sound event may be represented as a TSASV(1) formed from one signal-enhanced sub-sampled spectrum (sub-spectrum 1); or as a TSASV(1)(2) formed from two signal-enhanced sub-sampled spectra, sub-spectrum 1 and sub-spectrum 2; or as a TSASV(1)(2)…(d) formed from d signal-enhanced sub-sampled spectra, sub-spectrum 1, sub-spectrum 2, …, sub-spectrum d, where d ∈ {1, 2, …, D} is a positive integer. In practical applications, the TSASV may combine the sub-spectra in any way according to the situation, i.e. v = {v_1, v_2, …}.
Each component v_d consists of segment components v_d^w, one per sound event segment. Each entry of v_d^w records, for b ∈ {0, 1, …, N/D − 1}, the number of occurrences, in the w-th sound event segment of sub-spectrum X_d, of amplitude grade g at frequency index b, encoded as character c_b; w is the segment index of the sound event, w ∈ {1, 2, …, W}, and W is the number of sound event segments.
For a sound segment or sound event to be identified, the same algorithm yields its TSASV(1), TSASV(1)(2), …, TSASV(1)(2)…(d), or any combination v′ of its sub-spectrum components. Each component v′_d consists of segment components v′_d^{w′}, and each entry of v′_d^{w′} records the number of occurrences, in the w′-th sound event segment of the spectrum X_d of the sound event under test, of amplitude grade g at frequency index b, encoded as character c_b; w′ is the segment index of the sound event, w′ ∈ {1, 2, …, W′}, and W′ is the number of segments of the sound event under test.
For a sound event under test with W′ segments and a prototype sound event in the database with W segments, the shorter segment length is taken:
W_M = min(W, W′)
and the difference in segment count between the sound event under test and the prototype sound event is computed:
W_T = abs(W − W′)
If W − W′ < 0, i.e. the prototype sound event has fewer segments than the sound event under test, the prototype's W_M segments are slid along the segments of the sound event under test, computing a similarity r(w_0) at each alignment; otherwise, the W_M segments of the sound event under test are slid along the segments of the prototype in the same way. Here w_0 is the ordinal number of the comparison between the prototype sound event and the sound event under test, w_0 ∈ {1, 2, …, W_T}.
Then the TSASV distance s between the prototype sound event and the sound event to be identified is computed:
s = max(r(1), r(2), …, r(W_T))
The TSASV of the sound event under test is further compared with the TSASVs of all prototype sound events in the database; the prototype sound event with the maximum s, s_id, is the sound event to be identified, i.e. the identified prototype is the one whose s_id is maximum (id = argmax_id s_id).
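As an illustration of this matching scheme, the sketch below slides the shorter event along the longer one and keeps the best alignment score, mirroring W_M, W_T and s = max(r(·)) above. The patent's formula for r itself is not reproduced in this text, so a simple histogram overlap between segment code counts stands in for it; the function names and the dict-of-counts segment representation are assumptions.

    def segment_overlap(a, b):
        # Placeholder similarity between two segment components, each a dict
        # mapping (grade g, character c_b) -> occurrence count.
        keys = set(a) | set(b)
        return sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)

    def tsasv_distance(v_test, v_proto):
        # Slide the shorter event's W_M segments along the longer one; the patent
        # indexes the alignment scores as r(1)..r(W_T).
        W_M = min(len(v_test), len(v_proto))
        short, long_ = sorted((v_test, v_proto), key=len)
        scores = [sum(segment_overlap(short[w], long_[w0 + w]) for w in range(W_M))
                  for w0 in range(len(long_) - W_M + 1)]
        return max(scores)

Identification then takes the prototype with the largest s, e.g. max(prototypes, key=lambda p: tsasv_distance(v_test, p)).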
The invention is further described below with reference to embodiments and the accompanying drawings.
1. Frequency-resolution sampling and time-spectral amplitude coding
Fig. 2 shows a segment of the original waveform of a public thrush call recorded beside the small lake at the campus center, containing various environment sounds. Following step 1 of the TSASV construction method, we generate its spectrogram. Following step 2, we sample the generated spectrogram at different frequency resolutions. In this example we take D = 4 and N = 64, generating 4 sub-spectrograms from the spectrogram by sampling one of every 4 frequencies. Specifically: frequencies 1, 5, …, n−3 (n ≤ N) of the spectrogram form sub-spectrogram 1, which is then signal-enhanced to give signal-enhanced sub-spectrogram 1. Likewise, frequencies 2, 6, …, n−2, frequencies 3, 7, …, n−1, and frequencies 4, 8, …, n form sub-spectrograms 2, 3 and 4 respectively, which after signal enhancement give signal-enhanced sub-spectrograms 2, 3 and 4. Thus, for D = 4 and N = 64, the signal-enhanced sub-spectrograms of step 3 are X ~ [X_1, X_2, X_3, X_4].
Take signal-enhanced sub-spectrogram 1 as an example. After the processing of step 4 it yields Fig. 3. In step 4, for d = 1, the peak amplitude of this sound event is:
f_d = f_1 = max(|X_1|).
Taking G = 3 divides the amplitude values into three grades, with
i(d) = i(1) = f_1/2^G = f_1/8.
The grading of X_1 is:
d-1: |X_1| ∈ (f_1/8, f_1/4];
d-2: |X_1| ∈ (f_1/4, f_1/2];
d-3: |X_1| ∈ (f_1/2, f_1].
For G = 3, the codings of the low, middle and high amplitude grades are obtained respectively: Fig. 3(a) is the coding of d-1: (f_1/8, f_1/4]; Fig. 3(b) is the coding of d-2: (f_1/4, f_1/2]; Fig. 3(c) is the coding of d-3: (f_1/2, f_1].
Here T = 4 means that when several |X_d| values fall within one interval, such as (f_1/8, f_1/4], in the same frame, at most 4 of them are taken. For example, one frame in Fig. 3(a) contains an 'E' in its second row, indicating that the second value of that frame falling in (f_1/8, f_1/4] has frequency index 'E', while the '7' in the first row indicates that the first value of the frame in this interval has frequency index '7'; the third and fourth rows are empty, indicating that this frame has only two values in this interval.
For the TSASV coding in this example, take N/D = 2^4, b ∈ {0, 1, …, 15}, and let
c_b = b + c(b) ∈ {48, 49, 50, 51, 52, 53, 54, 55, 56, 57; 65, 66, 67, 68, 69, 70},
represented in the ASCII character set as
c_b ∈ {'0', …, '9', 'A', …, 'F'}.
The characters '0' through 'F' thus represent the codings of sub-spectrogram frequency indices 0 through 15, respectively.
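An illustrative sketch of this per-frame coding, where frame_grades would be one column of the graded sub-spectrogram from the step-4 sketch and at most T = 4 indices per grade are kept; the function name and list representation are assumptions.

    CODE16 = "0123456789ABCDEF"   # c_b for b = 0..15 in this example (N/D = 16)

    def code_frame(frame_grades, g, T=4):
        """Characters of the first T frequency indices b whose grade equals g in one frame."""
        hits = [b for b, gb in enumerate(frame_grades) if gb == g]
        return [CODE16[b] for b in hits[:T]]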
Likewise, according to step 4, Fig. 3(a), Fig. 3(b) and Fig. 3(c) can be merged into Fig. 4(a). Based on experiment, whenever the signal coding in Fig. 4(a) is empty across all three amplitude grades for more than 10 consecutive frames, this is taken as a sound segment boundary, which yields Fig. 4(b). Within the resulting segments, any segment whose number of non-empty codes is less than 5 is rejected; the first segment in Fig. 4(b) contains only one value, so it is rejected. This yields Fig. 4(c), with 4 segments in total, corresponding to W = 4 in step 5. According to step 5, Fig. 4(c) is further encoded to obtain the recognition vector of Fig. 4(d). Here d ∈ {1, 2, 3, 4}, w ∈ {1, 2, 3, 4}, g ∈ {1, 2, 3}. For d = 1, w = 1, combining the codes of grades g = 1, 2 and 3 for G = 3, the segment component can be expressed as
v_1^1 = {2.9 A 4.A}.
Likewise, for w = 2, 3, 4 we obtain
v_1^2 = {A 4.A 6.A},
v_1^3 = {4.7-2.8-4.9-A-3.E 9.7-4.8-A 0},
v_1^4 = {A A-A-2 7}.
Further, v_1^1, v_1^2, v_1^3 and v_1^4 can be combined into v_1:
v_1 = {{2.9 A 4.A},
{A 4.A 6.A},
{4.7-2.8-4.9-A-3.E 9.7-4.8-A 0},
{A A-A-2 7}}.
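A sketch of this segmentation rule under the stated thresholds (a boundary after more than 10 consecutive empty frames; segments with fewer than 5 non-empty codes rejected); representing each frame by its count of coded characters across all grades is an assumption.

    def segment_frames(nonempty_counts, gap=10, min_codes=5):
        """nonempty_counts: list, nonempty_counts[l] = coded characters in frame l."""
        segments, start, silent = [], None, 0
        for l, n in enumerate(nonempty_counts + [0] * (gap + 1)):  # padding flushes the tail
            if n > 0:
                if start is None:
                    start = l
                silent = 0
            elif start is not None:
                silent += 1
                if silent > gap:                       # more than `gap` empty frames
                    first, last = start, l - silent    # last frame with a code
                    if sum(nonempty_counts[first : last + 1]) >= min_codes:
                        segments.append((first, last))
                    start, silent = None, 0
        return segments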
For d = 2, 3 and 4, the signal-enhanced sub-spectrograms 2, 3 and 4 likewise give the recognition vectors v_2, v_3 and v_4:
v_2 = {{2.9-A 2.A 2.A},
{2.A 4.A 8.A},
{2.8 5.8 0},
{4.A A 0}}.
v_3 = {{8-2.A 2.9-A 9},
{9-2.A 2.A 7.A},
{5.9 3.9 0}}.
v_4 = {{2.8-5.9 2.8-9 8-5.9},
{5-4.D 6 0},
{3.6-7.7-4.8-3.9-5.D 5.7-5.9-D 9},
{3.9 2.9 6.9}}.
2. Environment sound events and the TSASV
According to the above steps, a TSASV coding can be computed for any sound event, giving its TSASV recognition vector. Taking the sound clip of Fig. 2 as an example, following steps 1, 2, 3, 4 and 5 we obtain the TSASV coding v_1 of its signal-enhanced sub-spectrogram 1. According to step 5, when v = {v_1}, v_1 identifies this sound clip; the v_1 shown in Fig. 4(d) can be used as the TSASV coding labeling the sound clip of Fig. 2.
According to step 5, the sound clip of Fig. 2 can also be identified, according to the actual situation, by any combination of the TSASV codings of signal-enhanced sub-spectrograms 1, 2, 3 and 4, e.g. v = {v_1, v_2, v_3, v_4}.
The above are preferred embodiments of the present invention; all changes made according to the technical solution of the present invention, provided the resulting functions do not exceed the scope of the technical solution of the present invention, belong to the protection scope of the present invention.

Claims (7)

1. A method for identifying environment sound events based on time-spectral amplitude scale vectors, characterized in that: first, the time-spectral amplitude scale vector TSASV of each related sound event is computed as a recognition prototype, and each recognition prototype is stored in a database as a template for discriminating the sound event under test; then the TSASV of the sound event under test is computed and compared with each recognition prototype, and the prototype sound event corresponding to the recognition prototype closest to the TSASV of the sound event under test is the sound event to be identified;
the construction method of the time-spectral amplitude scale vector comprises the following steps:
Step 1: applying a fast Fourier transform to the acquired environment sound event to generate a spectrogram;
Step 2: sampling the generated spectrogram at different frequency resolutions, building sub-spectrograms of different frequency resolutions from the spectrogram;
Step 3: performing signal enhancement on the sub-spectrograms to generate signal-enhanced sub-spectrograms;
Step 4: performing amplitude grading on the signal-enhanced sub-spectrograms;
Step 5: encoding the amplitude-graded sub-spectrograms to generate the TSASV.
2. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 1, characterized in that: in step 1, for the sampled environment sound event signal with noise y(i), where i is the sample index, N consecutive samples of y(i) are windowed by a window h(i), and the samples in the window are transformed by a fast Fourier transform, converting the noisy time-domain signal y(i) into a frequency-domain signal; the window is then advanced by M samples and the next fast Fourier transform is computed, giving the spectrum of the environment sound event signal y(i) as:
Y(k, l) = Σ_{i=0}^{N-1} h(i) · y(lM + i) · e^{-j2πki/N}
where l is the window-shift index, i.e. the index of the time-domain signal frame, l ∈ {0, 1, …, L−1}, and L is the total number of frames of signal y(i); k is the frequency-bin index, k ∈ {0, 1, …, N−1}; and N is the number of frequency bins.
3. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 2, characterized in that: in step 2, the spectrum Y(k, l) of the environment sound event signal y(i) is sampled at a certain frequency resolution to obtain sub-sampled spectra Y_d; the spectrum Y(k, l) and the sub-sampled spectra Y_d are related as follows:
Y ~ [Y_1, Y_2, …, Y_d, …, Y_D]
where D is the number of sub-spectra Y_d into which the spectrogram Y is decomposed, taking one of every D frequency bins of Y as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and Y_d is an (N/D) × L matrix, which can be expressed as:
Y_d(b, l) = Y(k_d, l)
where b is the frequency-sampling index of sub-spectrum Y_d within Y, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of Y_d; and k_d = b·D + d − 1 is the sampled frequency bin.
4. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 3, characterized in that: in step 3, the sub-sampled spectra Y_d undergo signal enhancement, converting each sub-sampled spectrum Y_d into a signal-enhanced sub-sampled spectrum X_d; the enhanced spectrum X(k, l) and the sub-sampled spectra X_d are related as follows:
X ~ [X_1, X_2, …, X_d, …, X_D]
where D is the number of sub-spectra into which the spectrogram is decomposed, taking one of every D frequency bins as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and X_d is an (N/D) × L matrix, which can be expressed as:
X_d(b, l) = X(k_d, l)
where b is the frequency-sampling index of sub-spectrum X_d within X, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of X_d; and k_d = b·D + d − 1 is the sampled frequency bin.
5. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 4, characterized in that: in step 4, amplitude grading of the signal-enhanced sub-spectrograms comprises the following steps:
For the d-th signal-enhanced sub-spectrum X_d, the amplitude grading threshold i(d) is expressed as:
i(d) = f_d / 2^G
where f_d is the maximum of the d-th signal-enhanced sub-spectrum X_d, i.e. f_d = max(|X_d|);
for the d-th signal-enhanced sub-spectrum X_d, G grading ranges are obtained from the amplitude grading threshold i(d):
d-1: |X_d| ∈ (2^0 × i(d), 2^1 × i(d)]
d-g: |X_d| ∈ (2^{g−1} × i(d), 2^g × i(d)]
d-G: |X_d| ∈ (2^{G−1} × i(d), 2^G × i(d)]
where g is the grade of the amplitude grading, g ∈ {1, 2, …, G}, and G is the maximum number of grades;
grading X_d according to the value of G yields grade components, where the g-th grade component of the amplitude grading is an L-frame character matrix in which: T is the maximum number of values that may be taken in one frame at this grade satisfying 2^{g−1} × i(d) < X_d(b, l) ≤ 2^g × i(d); t is the index of the amplitude at this grade, 0 ≤ t ≤ T−1; l is the frame index within the sub-spectrum, 0 ≤ l ≤ L−1; b is the frequency index of sub-spectrum X_d, b ∈ {0, 1, …, N/D − 1}; '.' denotes no value and is ignored during coding; and c_b denotes the available coded character set.
6. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 5, characterized in that: in step 5, the amplitude-graded sub-spectrograms are encoded to generate the TSASV as follows:
an environment sound event is represented as a TSASV(1) formed from signal-enhanced sub-spectrum 1, or as a TSASV(1)(2)…(d) formed from d signal-enhanced sub-spectra 1, 2, …, d, where d ∈ {1, 2, …, D}; in practical applications, the TSASV may combine the sub-spectra in any way according to the situation;
each component v_d consists of segment components v_d^w, one per sound event segment, and each entry of v_d^w records the number of occurrences, in the w-th sound event segment of sub-spectrum X_d, of amplitude grade g at frequency index b, encoded as character c_b; w is the segment index of the sound event, w ∈ {1, 2, …, W}, and W is the number of sound event segments.
7. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 6, characterized in that:
for the sound event under test, the TSASV(d) is expressed in the same way, where each entry records the number of occurrences, in the w′-th sound event segment of the spectrum X_d of the sound event under test, of amplitude grade g at frequency index b, encoded as character c_b; w′ is the segment index of the sound event, w′ ∈ {1, 2, …, W′}, and W′ is the number of segments of the sound event under test;
for a sound event under test with W′ segments and a prototype sound event with W segments, the shorter segment length is taken:
W_M = min(W, W′)
and the difference in segment count between the sound event under test and the prototype sound event is computed:
W_T = abs(W − W′)
If W − W′ < 0, i.e. the prototype sound event has fewer segments than the sound event under test, the prototype's W_M segments are slid along the segments of the sound event under test, computing a similarity r(w_0) at each alignment; otherwise, the W_M segments of the sound event under test are slid along the segments of the prototype in the same way, where w_0 is the ordinal number of the comparison between the prototype sound event and the sound event under test, w_0 ∈ {1, 2, …, W_T};
then the TSASV distance s between the prototype sound event and the sound event to be identified is computed:
s = max(r(1), r(2), …, r(W_T))
The TSASV of the sound event under test is further compared with the TSASVs of all prototype sound events in the database, and the prototype sound event with the maximum s, s_id, is the sound event to be identified, i.e. the identified prototype is the one whose s_id is maximum.
CN201210242825.5A 2012-07-14 2012-07-14 Method for identifying environment sound events based on time spectrum amplitude scaling vectors Expired - Fee Related CN102789780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210242825.5A CN102789780B (en) 2012-07-14 2012-07-14 Method for identifying environment sound events based on time spectrum amplitude scaling vectors


Publications (2)

Publication Number Publication Date
CN102789780A CN102789780A (en) 2012-11-21
CN102789780B (en) 2014-10-01

Family

ID=47155167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210242825.5A Expired - Fee Related CN102789780B (en) 2012-07-14 2012-07-14 Method for identifying environment sound events based on time spectrum amplitude scaling vectors

Country Status (1)

Country Link
CN (1) CN102789780B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575496A (en) * 2013-10-14 2015-04-29 中兴通讯股份有限公司 Method and device for automatically sending multimedia documents and mobile terminal
CN103531202B (en) * 2013-10-14 2015-10-28 无锡儒安科技有限公司 Distributed Detection sound event also chooses the method for similar events point
JP7266390B2 (en) * 2018-11-20 2023-04-28 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Behavior identification method, behavior identification device, behavior identification program, machine learning method, machine learning device, and machine learning program
CN111292767B (en) * 2020-02-10 2023-02-14 厦门快商通科技股份有限公司 Audio event detection method and device and equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102782750B (en) * 2011-01-05 2015-04-01 松下电器(美国)知识产权公司 Region of interest extraction device, region of interest extraction method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587710A (en) * 2009-07-02 2009-11-25 北京理工大学 A kind of many code books coding parameter quantification method based on the audio emergent event classification
CN102555082A (en) * 2010-11-30 2012-07-11 三星钻石工业股份有限公司 Breaking method of fragile material substrate

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jonathan Dennis et al., "Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions," IEEE Signal Processing Letters, Feb. 2011; entire document. *

Also Published As

Publication number Publication date
CN102789780A (en) 2012-11-21


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20141001

Termination date: 20170714