CN102789780B - Method for identifying environment sound events based on time spectrum amplitude scaling vectors

Info

Publication number: CN102789780B
Application number: CN201210242825.5A
Authority: CN (China)
Other versions: CN102789780A (Chinese)
Inventor: 李应
Assignee (original and current): Fuzhou University
Application filed by Fuzhou University; priority to CN201210242825.5A
Legal status: Expired - Fee Related
Prior art keywords: spectrum, sound event, signal, TSASV, sub

Abstract

The invention relates to a method for identifying environment sound events based on time spectrum amplitude scaling vectors. First, a time-spectral amplitude scale vector (TSASV) is computed for each related sound event as a recognition prototype and stored in a database as a template for discriminating the sound event under test. Then the TSASV of the sound event under test is computed and compared with each recognition prototype; the prototype sound event closest to it is the sound event to be identified. The TSASV is constructed as follows: a fast Fourier transform is applied to the acquired environment sound event to generate a spectrogram; the generated spectrogram is sampled at different frequency resolutions to build sub-spectrograms of different frequency resolutions; signal enhancement is applied to the sub-spectrograms to generate signal-enhanced sub-spectrograms; amplitude scaling is applied to the signal-enhanced sub-spectrograms; and the amplitude-scaled sub-spectrograms are encoded to generate the TSASV. The method is conducive to improving the effectiveness of environment sound event recognition.

Description

Method for identifying environment sound events based on time-spectral amplitude scale vectors
Technical field
The present invention relates to the technical field of sound event recognition, and in particular to a method for identifying environment sound events based on time-spectral amplitude scale vectors (time-spectral amplitude scale vector, TSASV).
Background art
Environment sound recognition attempts to pick out the real events hidden in sound signals. It is used in many fields, such as environmental monitoring, auditory scene analysis, and multimedia data retrieval. Conventional sound event recognition methods extract discernible features from audio data and feed them into a pattern classifier. The more effective the features extracted from the sound signal, the better the recognition performance.
Traditionally, sound signals are characterized by Mel-frequency cepstral coefficients (MFCCs), or by MFCCs combined with MPEG-7 descriptors and a hidden Markov model (HMM) recognizer. More recent work uses other time-frequency representations, such as the short-time Fourier transform and the wavelet transform, or combines high-dimensional features of the sound signal with MFCCs and discrete Gabor wavelets. Other approaches are auditory-inspired: the waveform is passed through an auditory excitation filter bank to obtain a time-frequency representation, subband temporal envelopes (STE) are drawn from it and used as features to characterize the sound signal, and the STE distribution is modeled by a generalized gamma model as the sound feature; the probabilistic distance between the STE distributions of sound samples is then used by an SVM classifier for classification.
However, the physical characteristics of environment sound events are very complicated; assumptions such as linear prediction, periodicity, and specific models are not necessarily valid for many sound events. A common problem of the above recognition methods is that performance drops sharply in the presence of noise. In practical auditory applications, because the sound sources are uncertain, designing a suitable detector is difficult. To address these problems, newer research includes time-encoded signal processing and recognition (TESPAR) matrices, energy detection, spectrogram representations, and visual information of the spectrogram. M. V. Ghiurcau et al. use TESPAR matrices to monitor field regions, detecting intrusions by detecting and recognizing three classes of sound: human voices, bird sounds, and car sounds. J. Moragues et al. use a time-frequency multi-energy detector feature for the detection and classification of unknown, continuous natural sound events. L. Neal et al. propose a segmentation method: the input signal is first converted to a spectrogram representation, and a supervised classifier then creates a binary mask label for each time-frequency unit as bird sound or background sound. J. Dennis et al. use visual information of the spectrogram to produce features for sound classification. These approaches attempt to extract features unique to environment sound events and to recognize environment sound events even when they overlap.
Besides isolated sound events, multiple sources emitting environment sounds may be present while sound is being acquired. For example, wind, the rush of a brook, and other background noise weaken the acoustic information we care about, and sounds from other animals and the surroundings interfere with the environment sound events of interest. In more complicated situations, the sound events of interest may come from two or more independent sources at the same time. Recognizing sound events under noisy, multi-source conditions is therefore challenging.
Summary of the invention
The object of the present invention is to provide a method for identifying environment sound events based on time-spectral amplitude scale vectors, which helps improve the effectiveness of environment sound event recognition.
To achieve the above object, the technical solution adopted by the present invention is: a method for identifying environment sound events based on time-spectral amplitude scale vectors. First, the TSASV of each related sound event is computed as a recognition prototype, and each recognition prototype is stored in a database as a template for discriminating the sound event under test. Then the TSASV of the sound event under test is computed and compared with each recognition prototype stored in the database; the prototype sound event corresponding to the recognition prototype closest to the TSASV of the sound event under test is the sound event to be identified.
The construction method of the time-spectral amplitude scale vector TSASV comprises the following steps:
Step 1: apply a fast Fourier transform to the acquired environment sound event to generate a spectrogram;
Step 2: sample the generated spectrogram at different frequency resolutions, building sub-spectrograms of different frequency resolutions from the spectrogram;
Step 3: perform signal enhancement on the sub-spectrograms to generate signal-enhanced sub-spectrograms;
Step 4: perform amplitude grading on the signal-enhanced sub-spectrograms;
Step 5: encode the amplitude-graded sub-spectrograms to generate the TSASV.
The beneficial effect of the invention is that, taking spectral energy as a basis, the time-spectral amplitude scale vector is used to characterize the sound signal for environment sound event recognition. With this approach, the detector can not only detect sound events in background noise, but can also classify environment sound events effectively; its performance is better than that of an MFCC-based SVM classification model.
Brief description of the drawings
Fig. 1 is a schematic diagram of the construction process of the time-spectral amplitude scale vector in an embodiment of the present invention.
Fig. 2 is the waveform of an original sound containing various environment sounds, recorded on campus, in an embodiment of the present invention.
Fig. 3 shows the amplitude-grading coding for G = 3 and d = 1 in an embodiment of the present invention.
Fig. 4 shows the coding process of signal-enhanced sub-spectrogram 1 in an embodiment of the present invention.
Embodiment
The method of the present invention for identifying environment sound events based on time-spectral amplitude scale vectors first computes the time-spectral amplitude scale vector TSASV of each related sound event as a recognition prototype, and stores each recognition prototype in a database as a template for discriminating the sound event under test; then the TSASV of the sound event under test is computed and compared with each recognition prototype stored in the database, and the prototype sound event corresponding to the recognition prototype closest to the TSASV of the sound event under test is the sound event to be identified.
The construction method of the time-spectral amplitude scale vector, as shown in Fig. 1, comprises the following steps:
Step 1: apply a fast Fourier transform to the acquired environment sound event to generate a spectrogram;
Step 2: sample the generated spectrogram at different frequency resolutions, building sub-spectrograms of different frequency resolutions from the spectrogram;
Step 3: perform signal enhancement on the sub-spectrograms to generate signal-enhanced sub-spectrograms;
Step 4: perform amplitude grading on the signal-enhanced sub-spectrograms;
Step 5: encode the amplitude-graded sub-spectrograms to generate the TSASV; the TSASV is then used to characterize the sound signal and to identify the sound event.
In step 1, the sampled environment sound event signal with noise, y(i), is the sum of the pure sound event signal s(i) and the interfering noise n(i), i.e. y(i) = s(i) + n(i), where i is the sample index. N consecutive samples of y(i) are windowed by a window h(i), and the samples in the window are transformed by a fast Fourier transform, converting the noisy time-domain signal y(i) into a frequency-domain signal; the window is then advanced by M samples and the next fast Fourier transform is computed. The spectrum of the environment sound event signal y(i) is:
Y(k, l) = Σ_{i=0}^{N-1} h(i) · y(lM + i) · e^{-jΩ_k·i}
where l is the window-shift index, i.e. the index of the time-domain signal frame, l ∈ {0, 1, …, L−1}, and L is the total number of frames of signal y(i); k is the frequency-bin index, k ∈ {0, 1, …, N−1}, related to the normalized center frequency Ω_k by Ω_k = 2πk/N; N is the number of frequency bins; and j is the imaginary unit.
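A minimal Python/NumPy sketch of this step follows; the Hann window and the default values of N and M are assumptions for illustration, since the text fixes only the window length N and the hop M.

    import numpy as np

    def stft_spectrogram(y, N=64, M=32):
        """Return Y[k, l] = sum_i h(i) * y(l*M + i) * exp(-j * 2*pi*k*i / N)."""
        h = np.hanning(N)                      # window choice is an assumption
        L = (len(y) - N) // M + 1              # number of frames l = 0..L-1
        Y = np.empty((N, L), dtype=complex)
        for l in range(L):
            Y[:, l] = np.fft.fft(y[l * M : l * M + N] * h, n=N)
        return Y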
In step 2, the spectrum Y(k, l) of the environment sound event signal y(i) is sampled at a certain frequency resolution to obtain sub-sampled spectra Y_d. The spectrum Y(k, l) and the sub-sampled spectra Y_d are related as follows:
Y ~ [Y_1, Y_2, …, Y_d, …, Y_D]
where D is the number of sub-spectra Y_d into which the spectrogram Y is decomposed, taking one of every D frequency bins of Y as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and Y_d is an (N/D) × L matrix, which can be expressed as:
Y_d(b, l) = Y(k_d, l)
where b is the frequency-sampling index of sub-spectrum Y_d within Y, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of Y_d; and k_d = b·D + d − 1 is the sampled frequency bin.
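The interleaved sub-sampling k_d = b·D + d − 1 (sub-spectrogram d takes rows d − 1, d − 1 + D, d − 1 + 2D, … of Y) can be sketched as follows; the list-of-arrays representation is an assumption.

    def subsample_spectrogram(Y, D=4):
        """Split Y into D interleaved sub-spectrograms, Y_d(b, l) = Y(b*D + d - 1, l)."""
        N = Y.shape[0]
        assert N % D == 0, "N/D must be a positive integer"
        # row b of sub-spectrogram d (d = 1..D) is frequency bin k_d = b*D + d - 1 of Y
        return [Y[d - 1 :: D, :] for d in range(1, D + 1)]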
Sound acquired in a real environment may contain noise of various kinds, so the spectrum of the sound must undergo signal enhancement. In step 3, the sub-sampled spectra Y_d are signal-enhanced, converting each sub-sampled spectrum Y_d into a signal-enhanced sub-sampled spectrum X_d. The enhanced spectrum X(k, l) and the sub-sampled spectra X_d are related as follows:
X ~ [X_1, X_2, …, X_d, …, X_D]
where D is the number of sub-spectra into which the spectrogram is decomposed, taking one of every D frequency bins as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and X_d is an (N/D) × L matrix, which can be expressed as:
X_d(b, l) = X(k_d, l)
where b is the frequency-sampling index of sub-spectrum X_d within X, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of X_d; and k_d = b·D + d − 1 is the sampled frequency bin.
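The text does not detail the enhancement operation at this point, so the sketch below substitutes a simple magnitude spectral subtraction against a per-row noise floor; both the method and the quantile parameter are assumptions, not the patent's enhancement.

    import numpy as np

    def enhance(Yd, floor_quantile=0.1):
        """Stand-in enhancement: magnitude spectral subtraction against a noise floor."""
        mag = np.abs(Yd)
        noise = np.quantile(mag, floor_quantile, axis=1, keepdims=True)  # per-row floor
        return np.maximum(mag - noise, 0.0)    # signal-enhanced magnitudes X_d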
In step 4, amplitude grading of the signal-enhanced sub-spectrograms comprises the following steps:
For the d-th signal-enhanced sub-spectrum X_d, the amplitude grading threshold i(d) is expressed as:
i(d) = f_d / 2^G
where f_d is the maximum of the d-th signal-enhanced sub-spectrum X_d, i.e. f_d = max(|X_d|).
In practical applications, a threshold θ can be set according to the acoustic environment: f_d is taken as the grading reference for X_d only when f_d > θ. That is, X_d is graded and encoded only when f_d > θ; otherwise it is judged as silence. In this way, sound signals of various strengths can be analyzed by setting different thresholds θ.
For the d-th signal-enhanced sub-spectrum X_d, G grading ranges are obtained from the amplitude grading threshold i(d):
d-0: |X_d| ∈ [0, 2^0 × i(d)]
d-1: |X_d| ∈ (2^0 × i(d), 2^1 × i(d)]
d-g: |X_d| ∈ (2^{g−1} × i(d), 2^g × i(d)]
d-G: |X_d| ∈ (2^{G−1} × i(d), 2^G × i(d)]
where g is the grade of the amplitude grading, g ∈ {1, 2, …, G}, and G is the maximum number of grades, G ∈ {1, 2, …, 8}.
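A minimal sketch of this grading rule, including the optional silence gate θ described above; NumPy, the default parameter values, and the small numerical guard are assumptions.

    import numpy as np

    def grade_amplitudes(Xd, G=3, theta=0.0):
        """Grade each |X_d| value into 0..G using i(d) = f_d / 2**G."""
        mag = np.abs(Xd)
        f_d = mag.max()
        if f_d <= theta:
            return None                     # judged as silence: not graded or coded
        i_d = f_d / 2 ** G
        # grade g satisfies 2**(g-1)*i(d) < |X_d| <= 2**g*i(d); values <= i(d) get grade 0
        grades = np.ceil(np.log2(np.maximum(mag, 1e-12) / i_d))
        return np.clip(grades, 0, G).astype(int)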
According to the value of G, grading X_d yields G + 1 grade components. The grade-0 component represents the part of the signal with the lowest amplitude grade; its sub-components are 0 and are not coded. The graded sub-spectrum can therefore be expressed as the collection of its remaining grade components, where the g-th grade component of the amplitude grading is an L-frame character matrix in which:
T is the maximum number of values that may be taken in one frame at this grade satisfying 2^{g−1} × i(d) < X_d(b, l) ≤ 2^g × i(d); here T = 4. t is the index of the amplitude at this grade, 0 ≤ t ≤ T − 1; l is the frame index within the sub-spectrum, 0 ≤ l ≤ L − 1; and b is the frequency index of sub-spectrum X_d, b ∈ {0, 1, …, N/D − 1}. '.' denotes no value; it is ignored during coding and serves only to visually distinguish positions that carry values in the character representation. c_b denotes the available coded character set, whose size is determined by N/D. Taking N/D = 2^6 as an example, c_b can be expressed as:
c_b = b + c(b) ∈ {43; 45; 48, …, 57; 65, …, 90; 97, …, 122}
represented in the ASCII character set as
c_b ∈ {'+'; '-'; '0', …, '9'; 'A', …, 'Z'; 'a', …, 'z'}
When b = 0 and c(0) = 43, c_b = b + c(b) = 43 = ascii('+'); that is, frequency index 0 is coded as '+'. When b = 1 and c(1) = 44, c_b = b + c(b) = 45 = ascii('-'); that is, frequency index 1 is coded as '-'. Likewise, for 2 ≤ b ≤ 63, the coding of each frequency index is represented by the corresponding character.
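As a small check of this 64-character mapping, the set can be written out directly; the sketch below is illustrative only.

    # The 64-character set for N/D = 64: ASCII codes 43, 45, 48-57, 65-90, 97-122,
    # i.e. '+', '-', '0'-'9', 'A'-'Z', 'a'-'z', indexed by frequency index b.
    CODE64 = ("+-0123456789"
              + "".join(chr(c) for c in range(65, 91))     # 'A'-'Z'
              + "".join(chr(c) for c in range(97, 123)))   # 'a'-'z'
    assert len(CODE64) == 64 and CODE64[0] == "+" and CODE64[1] == "-"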
In step 5, the amplitude-graded sub-spectrograms are encoded to generate the TSASV as follows.
An environment sound event may be represented as a TSASV(1) formed from one signal-enhanced sub-sampled spectrum (sub-spectrum 1); or as a TSASV(1)(2) formed from two signal-enhanced sub-sampled spectra, sub-spectrum 1 and sub-spectrum 2; or as a TSASV(1)(2)…(d) formed from d signal-enhanced sub-sampled spectra, sub-spectrum 1, sub-spectrum 2, …, sub-spectrum d, where d ∈ {1, 2, …, D} is a positive integer. In practical applications, the TSASV may combine the sub-spectra in any way according to the situation, i.e. v = {v_1, v_2, …}.
Each component v_d consists of segment components v_d^w, one per sound event segment. Each entry of v_d^w records, for b ∈ {0, 1, …, N/D − 1}, the number of occurrences, in the w-th sound event segment of sub-spectrum X_d, of amplitude grade g at frequency index b, encoded as character c_b; w is the segment index of the sound event, w ∈ {1, 2, …, W}, and W is the number of sound event segments.
For a sound segment or sound event to be identified, the same algorithm yields its TSASV(1), TSASV(1)(2), …, TSASV(1)(2)…(d), or any combination v′ of its sub-spectrum components. Each component v′_d consists of segment components v′_d^{w′}, and each entry of v′_d^{w′} records the number of occurrences, in the w′-th sound event segment of the spectrum X_d of the sound event under test, of amplitude grade g at frequency index b, encoded as character c_b; w′ is the segment index of the sound event, w′ ∈ {1, 2, …, W′}, and W′ is the number of segments of the sound event under test.
For a sound event under test with W′ segments and a prototype sound event in the database with W segments, the shorter segment length is taken:
W_M = min(W, W′)
and the difference in segment count between the sound event under test and the prototype sound event is computed:
W_T = abs(W − W′)
If W − W′ < 0, i.e. the prototype sound event has fewer segments than the sound event under test, the prototype's W_M segments are slid along the segments of the sound event under test, computing a similarity r(w_0) at each alignment; otherwise, the W_M segments of the sound event under test are slid along the segments of the prototype in the same way. Here w_0 is the ordinal number of the comparison between the prototype sound event and the sound event under test, w_0 ∈ {1, 2, …, W_T}.
Then the TSASV distance s between the prototype sound event and the sound event to be identified is computed:
s = max(r(1), r(2), …, r(W_T))
The TSASV of the sound event under test is further compared with the TSASVs of all prototype sound events in the database; the prototype sound event with the maximum s, s_id, is the sound event to be identified, i.e. the identified prototype is the one whose s_id is maximum (id = argmax_id s_id).
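As an illustration of this matching scheme, the sketch below slides the shorter event along the longer one and keeps the best alignment score, mirroring W_M, W_T and s = max(r(·)) above. The patent's formula for r itself is not reproduced in this text, so a simple histogram overlap between segment code counts stands in for it; the function names and the dict-of-counts segment representation are assumptions.

    def segment_overlap(a, b):
        # Placeholder similarity between two segment components, each a dict
        # mapping (grade g, character c_b) -> occurrence count.
        keys = set(a) | set(b)
        return sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)

    def tsasv_distance(v_test, v_proto):
        # Slide the shorter event's W_M segments along the longer one; the patent
        # indexes the alignment scores as r(1)..r(W_T).
        W_M = min(len(v_test), len(v_proto))
        short, long_ = sorted((v_test, v_proto), key=len)
        scores = [sum(segment_overlap(short[w], long_[w0 + w]) for w in range(W_M))
                  for w0 in range(len(long_) - W_M + 1)]
        return max(scores)

Identification then takes the prototype with the largest s, e.g. max(prototypes, key=lambda p: tsasv_distance(v_test, p)).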
The invention is further described below with reference to embodiments and the accompanying drawings.
1. Frequency-resolution sampling and time-spectral amplitude coding
Fig. 2 shows a segment of the original waveform of a public thrush call recorded beside the small lake at the campus center, containing various environment sounds. Following step 1 of the TSASV construction method, we generate its spectrogram. Following step 2, we sample the generated spectrogram at different frequency resolutions. In this example we take D = 4 and N = 64, generating 4 sub-spectrograms from the spectrogram by sampling one of every 4 frequencies. Specifically: frequencies 1, 5, …, n−3 (n ≤ N) of the spectrogram form sub-spectrogram 1, which is then signal-enhanced to give signal-enhanced sub-spectrogram 1. Likewise, frequencies 2, 6, …, n−2, frequencies 3, 7, …, n−1, and frequencies 4, 8, …, n form sub-spectrograms 2, 3 and 4 respectively, which after signal enhancement give signal-enhanced sub-spectrograms 2, 3 and 4. Thus, for D = 4 and N = 64, the signal-enhanced sub-spectrograms of step 3 are X ~ [X_1, X_2, X_3, X_4].
Take signal-enhanced sub-spectrogram 1 as an example. After the processing of step 4 it yields Fig. 3. In step 4, for d = 1, the peak amplitude of this sound event is:
f_d = f_1 = max(|X_1|).
Taking G = 3 divides the amplitude values into three grades, with
i(d) = i(1) = f_1/2^G = f_1/8.
The grading of X_1 is:
d-1: |X_1| ∈ (f_1/8, f_1/4];
d-2: |X_1| ∈ (f_1/4, f_1/2];
d-3: |X_1| ∈ (f_1/2, f_1].
For G = 3, the codings of the low, middle and high amplitude grades are obtained respectively: Fig. 3(a) is the coding of d-1: (f_1/8, f_1/4]; Fig. 3(b) is the coding of d-2: (f_1/4, f_1/2]; Fig. 3(c) is the coding of d-3: (f_1/2, f_1].
Here T = 4 means that when several |X_d| values fall within one interval, such as (f_1/8, f_1/4], in the same frame, at most 4 of them are taken. For example, one frame in Fig. 3(a) contains an 'E' in its second row, indicating that the second value of that frame falling in (f_1/8, f_1/4] has frequency index 'E', while the '7' in the first row indicates that the first value of the frame in this interval has frequency index '7'; the third and fourth rows are empty, indicating that this frame has only two values in this interval.
For the TSASV coding in this example, take N/D = 2^4, b ∈ {0, 1, …, 15}, and let
c_b = b + c(b) ∈ {48, 49, 50, 51, 52, 53, 54, 55, 56, 57; 65, 66, 67, 68, 69, 70},
represented in the ASCII character set as
c_b ∈ {'0', …, '9', 'A', …, 'F'}.
The characters '0' through 'F' thus represent the codings of sub-spectrogram frequency indices 0 through 15, respectively.
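An illustrative sketch of this per-frame coding, where frame_grades would be one column of the graded sub-spectrogram from the step-4 sketch and at most T = 4 indices per grade are kept; the function name and list representation are assumptions.

    CODE16 = "0123456789ABCDEF"   # c_b for b = 0..15 in this example (N/D = 16)

    def code_frame(frame_grades, g, T=4):
        """Characters of the first T frequency indices b whose grade equals g in one frame."""
        hits = [b for b, gb in enumerate(frame_grades) if gb == g]
        return [CODE16[b] for b in hits[:T]]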
Likewise, according to step 4, Fig. 3(a), Fig. 3(b) and Fig. 3(c) can be merged into Fig. 4(a). Based on experiment, whenever the signal coding in Fig. 4(a) is empty across all three amplitude grades for more than 10 consecutive frames, this is taken as a sound segment boundary, which yields Fig. 4(b). Within the resulting segments, any segment whose number of non-empty codes is less than 5 is rejected; the first segment in Fig. 4(b) contains only one value, so it is rejected. This yields Fig. 4(c), with 4 segments in total, corresponding to W = 4 in step 5. According to step 5, Fig. 4(c) is further encoded to obtain the recognition vector of Fig. 4(d). Here d ∈ {1, 2, 3, 4}, w ∈ {1, 2, 3, 4}, g ∈ {1, 2, 3}. For d = 1, w = 1, combining the codes of grades g = 1, 2 and 3 for G = 3, the segment component can be expressed as
v_1^1 = {2.9 A 4.A}.
Likewise, for w = 2, 3, 4 we obtain
v_1^2 = {A 4.A 6.A},
v_1^3 = {4.7-2.8-4.9-A-3.E 9.7-4.8-A 0},
v_1^4 = {A A-A-2 7}.
Further, v_1^1, v_1^2, v_1^3 and v_1^4 can be combined into v_1:
v_1 = {{2.9 A 4.A},
{A 4.A 6.A},
{4.7-2.8-4.9-A-3.E 9.7-4.8-A 0},
{A A-A-2 7}}.
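A sketch of this segmentation rule under the stated thresholds (a boundary after more than 10 consecutive empty frames; segments with fewer than 5 non-empty codes rejected); representing each frame by its count of coded characters across all grades is an assumption.

    def segment_frames(nonempty_counts, gap=10, min_codes=5):
        """nonempty_counts: list, nonempty_counts[l] = coded characters in frame l."""
        segments, start, silent = [], None, 0
        for l, n in enumerate(nonempty_counts + [0] * (gap + 1)):  # padding flushes the tail
            if n > 0:
                if start is None:
                    start = l
                silent = 0
            elif start is not None:
                silent += 1
                if silent > gap:                       # more than `gap` empty frames
                    first, last = start, l - silent    # last frame with a code
                    if sum(nonempty_counts[first : last + 1]) >= min_codes:
                        segments.append((first, last))
                    start, silent = None, 0
        return segments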
For d = 2, 3 and 4, the signal-enhanced sub-spectrograms 2, 3 and 4 likewise give the recognition vectors v_2, v_3 and v_4:
v_2 = {{2.9-A 2.A 2.A},
{2.A 4.A 8.A},
{2.8 5.8 0},
{4.A A 0}}.
v_3 = {{8-2.A 2.9-A 9},
{9-2.A 2.A 7.A},
{5.9 3.9 0}}.
v_4 = {{2.8-5.9 2.8-9 8-5.9},
{5-4.D 6 0},
{3.6-7.7-4.8-3.9-5.D 5.7-5.9-D 9},
{3.9 2.9 6.9}}.
2. Environment sound events and the TSASV
According to the above steps, a TSASV coding can be computed for any sound event, giving its TSASV recognition vector. Taking the sound clip of Fig. 2 as an example, following steps 1, 2, 3, 4 and 5 we obtain the TSASV coding v_1 of its signal-enhanced sub-spectrogram 1. According to step 5, when v = {v_1}, v_1 identifies this sound clip; the v_1 shown in Fig. 4(d) can be used as the TSASV coding labeling the sound clip of Fig. 2.
According to step 5, the sound clip of Fig. 2 can also be identified, according to the actual situation, by any combination of the TSASV codings of signal-enhanced sub-spectrograms 1, 2, 3 and 4, e.g. v = {v_1, v_2, v_3, v_4}.
The above are preferred embodiments of the present invention; all changes made according to the technical solution of the present invention, provided the resulting functions do not exceed the scope of the technical solution of the present invention, belong to the protection scope of the present invention.

Claims (7)

1. A method for identifying environment sound events based on time-spectral amplitude scale vectors, characterized in that: first, the time-spectral amplitude scale vector TSASV of each related sound event is computed as a recognition prototype, and each recognition prototype is stored in a database as a template for discriminating the sound event under test; then the TSASV of the sound event under test is computed and compared with each recognition prototype, and the prototype sound event corresponding to the recognition prototype closest to the TSASV of the sound event under test is the sound event to be identified;
the construction method of the time-spectral amplitude scale vector comprises the following steps:
Step 1: applying a fast Fourier transform to the acquired environment sound event to generate a spectrogram;
Step 2: sampling the generated spectrogram at different frequency resolutions, building sub-spectrograms of different frequency resolutions from the spectrogram;
Step 3: performing signal enhancement on the sub-spectrograms to generate signal-enhanced sub-spectrograms;
Step 4: performing amplitude grading on the signal-enhanced sub-spectrograms;
Step 5: encoding the amplitude-graded sub-spectrograms to generate the TSASV.
2. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 1, characterized in that: in step 1, for the sampled environment sound event signal with noise y(i), where i is the sample index, N consecutive samples of y(i) are windowed by a window h(i), and the samples in the window are transformed by a fast Fourier transform, converting the noisy time-domain signal y(i) into a frequency-domain signal; the window is then advanced by M samples and the next fast Fourier transform is computed, giving the spectrum of the environment sound event signal y(i) as:
Y(k, l) = Σ_{i=0}^{N-1} h(i) · y(lM + i) · e^{-j2πki/N}
where l is the window-shift index, i.e. the index of the time-domain signal frame, l ∈ {0, 1, …, L−1}, and L is the total number of frames of signal y(i); k is the frequency-bin index, k ∈ {0, 1, …, N−1}; and N is the number of frequency bins.
3. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 2, characterized in that: in step 2, the spectrum Y(k, l) of the environment sound event signal y(i) is sampled at a certain frequency resolution to obtain sub-sampled spectra Y_d; the spectrum Y(k, l) and the sub-sampled spectra Y_d are related as follows:
Y ~ [Y_1, Y_2, …, Y_d, …, Y_D]
where D is the number of sub-spectra Y_d into which the spectrogram Y is decomposed, taking one of every D frequency bins of Y as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and Y_d is an (N/D) × L matrix, which can be expressed as:
Y_d(b, l) = Y(k_d, l)
where b is the frequency-sampling index of sub-spectrum Y_d within Y, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of Y_d; and k_d = b·D + d − 1 is the sampled frequency bin.
4. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 3, characterized in that: in step 3, the sub-sampled spectra Y_d undergo signal enhancement, converting each sub-sampled spectrum Y_d into a signal-enhanced sub-sampled spectrum X_d; the enhanced spectrum X(k, l) and the sub-sampled spectra X_d are related as follows:
X ~ [X_1, X_2, …, X_d, …, X_D]
where D is the number of sub-spectra into which the spectrogram is decomposed, taking one of every D frequency bins as a sampling point; d is the frequency-resolution sample index of the sub-spectrogram, d ∈ {1, 2, …, D}; and X_d is an (N/D) × L matrix, which can be expressed as:
X_d(b, l) = X(k_d, l)
where b is the frequency-sampling index of sub-spectrum X_d within X, b ∈ {0, 1, …, N/D − 1}; N/D, a positive integer, is the number of rows of X_d; and k_d = b·D + d − 1 is the sampled frequency bin.
5. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 4, characterized in that: in step 4, amplitude grading of the signal-enhanced sub-spectrograms comprises the following steps:
For the d-th signal-enhanced sub-spectrum X_d, the amplitude grading threshold i(d) is expressed as:
i(d) = f_d / 2^G
where f_d is the maximum of the d-th signal-enhanced sub-spectrum X_d, i.e. f_d = max(|X_d|);
for the d-th signal-enhanced sub-spectrum X_d, G grading ranges are obtained from the amplitude grading threshold i(d):
d-1: |X_d| ∈ (2^0 × i(d), 2^1 × i(d)]
d-g: |X_d| ∈ (2^{g−1} × i(d), 2^g × i(d)]
d-G: |X_d| ∈ (2^{G−1} × i(d), 2^G × i(d)]
where g is the grade of the amplitude grading, g ∈ {1, 2, …, G}, and G is the maximum number of grades;
grading X_d according to the value of G yields grade components, where the g-th grade component of the amplitude grading is an L-frame character matrix in which: T is the maximum number of values that may be taken in one frame at this grade satisfying 2^{g−1} × i(d) < X_d(b, l) ≤ 2^g × i(d); t is the index of the amplitude at this grade, 0 ≤ t ≤ T−1; l is the frame index within the sub-spectrum, 0 ≤ l ≤ L−1; b is the frequency index of sub-spectrum X_d, b ∈ {0, 1, …, N/D − 1}; '.' denotes no value and is ignored during coding; and c_b denotes the available coded character set.
6. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 5, characterized in that: in step 5, the amplitude-graded sub-spectrograms are encoded to generate the TSASV as follows:
an environment sound event is represented as a TSASV(1) formed from signal-enhanced sub-spectrum 1, or as a TSASV(1)(2)…(d) formed from d signal-enhanced sub-spectra 1, 2, …, d, where d ∈ {1, 2, …, D}; in practical applications, the TSASV may combine the sub-spectra in any way according to the situation;
each component v_d consists of segment components v_d^w, one per sound event segment, and each entry of v_d^w records the number of occurrences, in the w-th sound event segment of sub-spectrum X_d, of amplitude grade g at frequency index b, encoded as character c_b; w is the segment index of the sound event, w ∈ {1, 2, …, W}, and W is the number of sound event segments.
7. The method for identifying environment sound events based on time-spectral amplitude scale vectors according to claim 6, characterized in that:
for the sound event under test, the TSASV(d) is expressed in the same way, where each entry records the number of occurrences, in the w′-th sound event segment of the spectrum X_d of the sound event under test, of amplitude grade g at frequency index b, encoded as character c_b; w′ is the segment index of the sound event, w′ ∈ {1, 2, …, W′}, and W′ is the number of segments of the sound event under test;
for a sound event under test with W′ segments and a prototype sound event with W segments, the shorter segment length is taken:
W_M = min(W, W′)
and the difference in segment count between the sound event under test and the prototype sound event is computed:
W_T = abs(W − W′)
If W − W′ < 0, i.e. the prototype sound event has fewer segments than the sound event under test, the prototype's W_M segments are slid along the segments of the sound event under test, computing a similarity r(w_0) at each alignment; otherwise, the W_M segments of the sound event under test are slid along the segments of the prototype in the same way, where w_0 is the ordinal number of the comparison between the prototype sound event and the sound event under test, w_0 ∈ {1, 2, …, W_T};
then the TSASV distance s between the prototype sound event and the sound event to be identified is computed:
s = max(r(1), r(2), …, r(W_T))
The TSASV of the sound event under test is further compared with the TSASVs of all prototype sound events in the database, and the prototype sound event with the maximum s, s_id, is the sound event to be identified, i.e. the identified prototype is the one whose s_id is maximum.
CN201210242825.5A 2012-07-14 2012-07-14 Method for identifying environment sound events based on time spectrum amplitude scaling vectors Expired - Fee Related CN102789780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210242825.5A CN102789780B (en) 2012-07-14 2012-07-14 Method for identifying environment sound events based on time spectrum amplitude scaling vectors


Publications (2)

Publication Number Publication Date
CN102789780A CN102789780A (en) 2012-11-21
CN102789780B (en) 2014-10-01

Family

ID=47155167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210242825.5A Expired - Fee Related CN102789780B (en) 2012-07-14 2012-07-14 Method for identifying environment sound events based on time spectrum amplitude scaling vectors

Country Status (1)

Country Link
CN (1) CN102789780B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575496A (en) * 2013-10-14 2015-04-29 中兴通讯股份有限公司 Method and device for automatically sending multimedia documents and mobile terminal
CN103531202B (en) * 2013-10-14 2015-10-28 无锡儒安科技有限公司 Distributed Detection sound event also chooses the method for similar events point
JP7266390B2 (en) * 2018-11-20 2023-04-28 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Behavior identification method, behavior identification device, behavior identification program, machine learning method, machine learning device, and machine learning program
CN111292767B (en) * 2020-02-10 2023-02-14 厦门快商通科技股份有限公司 Audio event detection method and device and equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102782750B (en) * 2011-01-05 2015-04-01 松下电器(美国)知识产权公司 Region of interest extraction device, region of interest extraction method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587710A (en) * 2009-07-02 2009-11-25 北京理工大学 A kind of many code books coding parameter quantification method based on the audio emergent event classification
CN102555082A (en) * 2010-11-30 2012-07-11 三星钻石工业股份有限公司 Breaking method of fragile material substrate

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jonathan Dennis et al., "Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions," IEEE Signal Processing Letters, Feb. 2011; entire document. *

Also Published As

Publication number Publication date
CN102789780A (en) 2012-11-21


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20141001

Termination date: 20170714