KR101732399B1 - Sound Detection Method Using Stereo Channel - Google Patents

Info

Publication number
KR101732399B1
Authority
KR
South Korea
Prior art keywords
event
sound
signal
value
channel
Prior art date
Application number
KR1020150155057A
Other languages
Korean (ko)
Inventor
김홍국
이동윤
유승우
전광명
Original Assignee
광주과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 광주과학기술원
Priority to KR1020150155057A
Application granted
Publication of KR101732399B1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Abstract

A method for detecting an abnormal sound event from a stereo input signal in an environment containing background noise includes: receiving a stereo input acoustic signal; converting the input acoustic signal into the time-frequency domain; decomposing the converted input acoustic signal using a non-negative tensor factorization (NTF) algorithm; making a first determination of whether an event sound has occurred by referring to the channel gain values; extracting feature values from the event sound signals so identified; performing HMM classification on the extracted event sound signals; and making a final determination of whether an abnormal sound event has occurred according to the HMM classification.

Description

[0001] The present invention relates to a sound detection method using a stereo channel, and more particularly, to a method for classifying and detecting abnormal sounds with a stereo channel in a real environment in which various background noises are mixed.

An acoustic event detection method detects the occurrence of an acoustic event in a recorded sound source and classifies the event according to the characteristics of the sound. Conventional methods detect acoustic events in a recorded signal using, for example, Gaussian mixture models (GMM) or non-negative matrix factorization (NMF).

However, conventional acoustic event detection techniques treat each channel of a stereo signal as an independent single-channel input, which lowers the detection accuracy of acoustic events.

One conventional acoustic detection method derives MFCC feature values and then applies Gaussian mixture models (GMM). This method detects a single, uncomplicated acoustic signal with high accuracy, but its accuracy is low when detecting an abnormal sound in a signal in which several acoustic signals are combined.

To solve this problem, an NMF-based sound detection method has been proposed that separates an input signal containing several mixed sound signals into a plurality of sound sources. In this method the sound sources are labeled so that each abnormal event can be detected independently. However, because it cannot exploit the channel information of a stereo sound signal, its detection accuracy for abnormal sound events is still somewhat degraded.

It is an object of the present invention to provide a sound detection method capable of increasing the detection accuracy of an abnormal sound event by using stereo channel signal information in detecting an abnormal sound event.

An embodiment of the present invention is a method of detecting an abnormal acoustic event using a stereo input signal, comprising: receiving a stereo input acoustic signal; converting the input acoustic signal into the time-frequency domain; decomposing the converted input acoustic signal using a non-negative tensor factorization (NTF) algorithm; making a first determination of whether an event sound has occurred by referring to the channel gain values; extracting feature values for the identified event sound signals; performing HMM classification on the extracted event sound signals; and making a final determination of whether an abnormal acoustic event has occurred according to the HMM classification.

According to the present invention, the sound detection method of the embodiment uses the non-negative tensor factorization (NTF) algorithm to separate a mixture of acoustic signals into individual event sounds, and by using the channel gains of the NTF for acoustic event detection, it can detect an abnormal sound with high accuracy even in an environment where several noises are mixed.

The present invention first classifies the event signal using non-negative tensor factorization and then discriminates between background noise and the event signal again with a hidden Markov model (HMM), so that a specific sound targeted by the user can be detected in the input signal more accurately and reliably.

FIG. 1 is a flowchart showing a sound detection method according to an embodiment of the present invention.
FIG. 2 is a diagram specifically showing each step of the sound detection method according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating the step of first determining whether an event sound has occurred based on the channel gain values in the sound detection method according to the embodiment of the present invention.
FIG. 4 is a diagram illustrating the step of extracting feature values for the discriminated signals in the sound detection method according to the embodiment of the present invention.
FIG. 5 is a diagram illustrating the step of performing HMM classification on the extracted feature values in the sound detection method according to the embodiment of the present invention.
FIG. 6 is a graph comparing the sound detection performance of an embodiment of the present invention with the conventional art.
FIG. 7 is a graph showing the sound detection performance of an embodiment of the present invention as a function of SNR.

The embodiments of the present invention will be described in detail with reference to the accompanying drawings, but the present invention is not limited to these embodiments. In describing the present invention, a detailed description of well-known functions or constructions may be omitted for the sake of clarity of the present invention.

The present invention relates to a method for detecting a specific event sound among inputted sound signals, and in particular, a method for detecting an event sound in an input sound signal obtained with a stereo channel.

FIG. 1 is a flowchart illustrating a sound detection method according to an embodiment of the present invention.

Referring to FIG. 1, a method for detecting an acoustic event according to an embodiment includes: receiving a stereo input acoustic signal (S1); converting the received input signal to a mel amplitude spectrum signal (S2); decomposing the converted signal by non-negative tensor factorization (S3); discriminating whether an event sound has occurred based on the channel gain values (S4); extracting feature values for the discriminated signals (S5); verifying HMM likelihoods (S6); and detecting the occurrence of an abnormal event according to the verification result (S7).

FIG. 2 is a diagram illustrating each step of the sound detection method according to the embodiment of the present invention in detail.

Referring to FIGS. 1 and 2 together, the acoustic event detection method of the present invention assumes, for sound source separation, that the input signal is obtained from a stereo channel. The stereo input signal obtained in step S1 may be expressed by the following equation.

y_i^c(n) = \sum_{e=1}^{E} s_i^{c,e}(n) + d_i^c(n)    (1)

Here i is the frame number, c is the channel number, e is the event class index, E is the number of event classes, s_i^{c,e}(n) is the acoustic event signal of class e in channel c, and d_i^c(n) is the background noise signal. The stereo input signal y_i^c(n) is thus assumed to contain a mixture of E acoustic events.

In step S2, the input signal spectrum is obtained from the received input signal y_i^c(n). A short-time Fourier transform (STFT) is applied to the input signal to obtain the amplitude spectrum |Y_i^c(k)|, which is then converted to the mel amplitude spectrum signal Y_i^c(m).

The mel-amplitude spectrum signal can be expressed by the following equation.

Figure 112015107758514-pat00002

Here m is the mel band index and M is the order (number of bands) of the mel amplitude spectrum.
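As a rough illustration of step S2, the following sketch computes a mel amplitude spectrum for each channel of a stereo signal with NumPy. The frame length, hop size, number of mel bands, Hann window, and triangular filter shape are assumptions chosen for the example, since the text does not specify them, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1). Assumed filter shape."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_amplitude_spectrum(y, sr, n_fft=512, hop=256, n_mels=20):
    """y: (channels, samples) -> mel amplitude spectrum (channels, n_mels, frames)."""
    window = np.hanning(n_fft)
    fb = mel_filterbank(n_mels, n_fft, sr)
    n_frames = 1 + (y.shape[1] - n_fft) // hop
    out = np.empty((y.shape[0], n_mels, n_frames))
    for c in range(y.shape[0]):
        for i in range(n_frames):
            frame = y[c, i * hop:i * hop + n_fft] * window
            magnitude = np.abs(np.fft.rfft(frame))  # |Y_i^c(k)|
            out[c, :, i] = fb @ magnitude           # Y_i^c(m)
    return out

# Synthetic stereo noise stands in for a recorded signal.
rng = np.random.default_rng(0)
stereo = rng.standard_normal((2, 4096))
Y = mel_amplitude_spectrum(stereo, sr=16000)
```

The result is non-negative by construction, which is what a non-negative factorization of it requires.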

Then, in step S3, the mel amplitude spectrum is decomposed by non-negative tensor factorization. Under this decomposition, the mel amplitude spectrum is represented as a tensor composed of a channel gain matrix, a frequency basis matrix, and a time activation matrix. This can be expressed by the following equation.

Figure 112015107758514-pat00003

Here, the operator shown in Figure 112015107758514-pat00004 is the tensor product, and J is the rank (number of bases) of the NTF decomposition. The channel gain (C), frequency basis (B), and time activation (A) matrices are as follows.

C = [C_i^{1,S}, ..., C_i^{E,S}, C_i^D]: channel gain matrix (2 × J)

C_i^{e,S} and C_i^D represent the channel gain matrices of the acoustic events and the background noise, respectively.

B = [B_i^{1,S}, ..., B_i^{E,S}, B_i^D]: frequency gain matrix (2 × J')

J' represents the basis rank of each acoustic event and of the background noise.

B_i^{e,S} and B_i^D represent the frequency gain matrices of the acoustic events and the background noise, respectively.

A = [A_i^{1,S}, ..., A_i^{E,S}, A_i^D]: time gain matrix (1 × J)

A_i^{e,S} and A_i^D represent the time gain matrices of the acoustic events and the background noise, respectively.

The channel gain and the time gain can be updated by repeatedly applying the following update rules.

Figure 112015107758514-pat00005

Figure 112015107758514-pat00006

Here h is the iteration index and ° denotes element-wise multiplication. P^h_{c,k,m} is the quantity shown in Figure 112015107758514-pat00007. As described above, the channel gain update rule uses the ratio of Y_i to the current combination of channel gain, frequency gain, and time gain (Figure 112015107758514-pat00008), and the time gain update rule likewise uses the ratio of Y_i to that combination (Figure 112015107758514-pat00009). The updates of Equation (4) are repeated until the relative decrease of the KL divergence becomes smaller than a predetermined threshold value.
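The iteration described here can be sketched with the standard multiplicative updates for the generalized KL divergence. This is an illustrative stand-in and may differ in detail from the update rules in the equation images. It assumes that the frequency bases B are trained beforehand and held fixed, that several frames are processed at once, and that random positive values initialize the gains; the function names are hypothetical.

```python
import numpy as np

def kl_divergence(Y, V, eps=1e-12):
    """Generalized KL divergence between the data Y and the model V."""
    return float(np.sum(Y * np.log((Y + eps) / (V + eps)) - Y + V))

def ntf_update_gains(Y, B, n_iter=200, tol=1e-5, seed=0, eps=1e-12):
    """Y: (channels, mel bands, frames); B: (mel bands, J) fixed bases.

    Returns channel gains C (channels, J), time gains A (J, frames),
    and the KL divergence after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_ch, _, n_frames = Y.shape
    J = B.shape[1]
    C = rng.random((n_ch, J)) + 0.1
    A = rng.random((J, n_frames)) + 0.1
    b_sum = B.sum(axis=0)
    history = []
    for _ in range(n_iter):
        # Channel gain update: weighted by the ratio of Y to the current model.
        V = np.einsum('cj,mj,jt->cmt', C, B, A) + eps
        C *= np.einsum('cmt,mj,jt->cj', Y / V, B, A) / (b_sum * A.sum(axis=1) + eps)
        # Time gain update, using the refreshed model.
        V = np.einsum('cj,mj,jt->cmt', C, B, A) + eps
        A *= np.einsum('cmt,cj,mj->jt', Y / V, C, B) / ((C.sum(axis=0) * b_sum)[:, None] + eps)
        V = np.einsum('cj,mj,jt->cmt', C, B, A)
        history.append(kl_divergence(Y, V))
        # Stop once the relative decrease of the KL divergence is below tol.
        if len(history) > 1:
            prev, cur = history[-2], history[-1]
            if (prev - cur) / max(prev, eps) < tol:
                break
    return C, A, history

# Synthetic rank-4 tensor built from known non-negative factors.
rng = np.random.default_rng(1)
B_true = rng.random((20, 4))
C_true = rng.random((2, 4))
A_true = rng.random((4, 30))
Y = np.einsum('cj,mj,jt->cmt', C_true, B_true, A_true)
C_hat, A_hat, kl_history = ntf_update_gains(Y, B_true)
```

Because the updates only multiply by non-negative ratios, the gains stay non-negative without any explicit projection.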

Y_i may be expressed as the sum of each event sound and the background sound as follows.

Figure 112015107758514-pat00010

Then, by applying the tensor product of the respective factors, Y_i can be expressed in terms of the abnormal sound event signals and the background noise as follows (Equation (6)).

Figure 112015107758514-pat00011

Figure 112015107758514-pat00012

As shown in Equation (6), the abnormal acoustic event signal and background noise can be decomposed into a combination of channel gain, frequency gain, and time gain tensor, respectively.
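A small numerical sketch of this decomposition: if the J bases are partitioned into groups belonging to each event and to the background, the partial reconstructions add up to the full model. The grouping of basis indices, the matrix sizes, and the helper name `reconstruct` are assumptions made for illustration.

```python
import numpy as np

def reconstruct(C, B, A, idx):
    """Sum of rank-1 terms C[:, j] x B[:, j] x A[j, :] over the bases in idx."""
    return np.einsum('cj,mj,jt->cmt', C[:, idx], B[:, idx], A[idx, :])

rng = np.random.default_rng(0)
J = 6
C = rng.random((2, J))    # channel gains
B = rng.random((20, J))   # frequency bases
A = rng.random((J, 10))   # time gains

# Hypothetical assignment of bases to two events and the background noise.
groups = {"event_1": [0, 1], "event_2": [2, 3], "background": [4, 5]}
parts = {name: reconstruct(C, B, A, idx) for name, idx in groups.items()}
full = reconstruct(C, B, A, list(range(J)))
```

Each entry of `parts` is the model's estimate of one source in the stereo mel amplitude domain.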

FIG. 3 is a diagram illustrating step S40 of determining whether an event sound has occurred based on the channel gain values in the sound detection method according to the embodiment of the present invention. The sound detection method of the present invention decomposes the input signal, transformed into the time-frequency domain, into a tensor combining channel gain, time gain, and frequency gain; of these, the channel gain is used first to detect an abnormal sound event.

In step S40, whether an event sound has occurred is determined from the channel gain values. To do so, the average channel gain of each event sound is computed. The average channel gain of the e-th event can be calculated as follows.

Figure 112015107758514-pat00013

Here, the quantity shown in Figure 112015107758514-pat00014 represents the number of bases corresponding to each event class, and the average (C_i) of all channel gains can be expressed by the following equation.

Figure 112015107758514-pat00015

Here J represents the total number of bases.

The embodiment thus first derives the average channel gain of each event sound and the average of all channel gains, and then applies a mean-to-max threshold test to the average channel gain of the e-th (arbitrary) event. That is, it calculates the ratio of the e-th average channel gain to the average of all channel gains and compares this ratio with a predetermined threshold value.

If the ratio value is larger than the predetermined threshold (thr_c), the e-th acoustic signal is passed on to the HMM classification step. If the ratio value is smaller than the threshold, the corresponding signal is determined to be background noise and is excluded from the event occurrence decision. This decision process can be expressed by the following equation.

Figure 112015107758514-pat00016

If the channel gain ratio of the e-th signal is greater than the predetermined value as shown in Equation (9), the signal is classified in this first stage as a sound containing event information, and the result is denoted Flag_c^e.
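The first-stage test can be sketched in a few lines of plain Python. The example assumes that each average is taken over both channels and over the bases assigned to the event; the gain values, the basis assignment, and the threshold are made up for illustration, and `first_stage_flags` is a hypothetical name.

```python
def first_stage_flags(channel_gain, event_bases, thr_c):
    """channel_gain: 2 x J nested list; event_bases: event id -> basis indices.

    Returns event id -> event id if the gain ratio exceeds thr_c, else 0.
    """
    n_ch = len(channel_gain)
    n_bases = len(channel_gain[0])
    # Average of all channel gains (over channels and all J bases).
    overall = sum(sum(row) for row in channel_gain) / (n_ch * n_bases)
    flags = {}
    for e, idx in event_bases.items():
        # Average channel gain of the e-th event.
        mean_e = sum(channel_gain[c][j] for c in range(n_ch) for j in idx) / (n_ch * len(idx))
        ratio = mean_e / overall
        flags[e] = e if ratio > thr_c else 0
    return flags

# Bases 0-1 belong to event 1, bases 2-3 to event 2, bases 4-5 to the background.
C = [[0.9, 0.8, 0.1, 0.1, 0.2, 0.2],
     [0.7, 0.8, 0.1, 0.2, 0.2, 0.1]]
flags = first_stage_flags(C, {1: [0, 1], 2: [2, 3]}, thr_c=1.5)
```

With these numbers only event 1 has a gain ratio above the threshold, so only its signal would go on to feature extraction.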

FIG. 4 is a diagram illustrating step S50 of extracting feature values for the discriminated signals in the sound detection method according to the embodiment of the present invention.

Referring to FIG. 4, the signals filtered by channel gain through the NTF decomposition are converted into feature vectors composed of MFCCs so that event sounds can be detected by HMM classification.

That is, MFCC extraction is performed on the signals indicated by Flag_c^e, i.e., those that passed the NTF channel gain test, and the extracted feature values can be expressed as follows.

Figure 112015107758514-pat00017

Then the average of the feature values (Figure 112015107758514-pat00018) and the change value of the feature values are obtained. The change value may be the difference between the previous feature value and the feature value preceding it (Figure 112015107758514-pat00019).

That is, when the feature value of the i-th sound is given by Figure 112015107758514-pat00020, the main feature value can be expressed as Figure 112015107758514-pat00021.
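Since the exact form of the final feature vector is only available in the equation images, the following is a loose sketch of one common way to combine a mean and a frame-to-frame change. It assumes that the final vector concatenates mean-subtracted coefficients with first-order differences, which may not match the patent's definition. The function name and the toy values are hypothetical.

```python
def mean_and_delta_features(mfcc_frames):
    """mfcc_frames: list of frames, each a list of MFCC values.

    Returns the per-coefficient mean and, for each frame, the
    mean-subtracted coefficients followed by the frame-to-frame difference.
    """
    n_frames = len(mfcc_frames)
    dim = len(mfcc_frames[0])
    mean = [sum(f[d] for f in mfcc_frames) / n_frames for d in range(dim)]
    features = []
    for i, frame in enumerate(mfcc_frames):
        # The first frame has no predecessor, so its delta is zero.
        prev = mfcc_frames[i - 1] if i > 0 else frame
        centered = [frame[d] - mean[d] for d in range(dim)]
        delta = [frame[d] - prev[d] for d in range(dim)]
        features.append(centered + delta)
    return mean, features

mean, feats = mean_and_delta_features([[1.0, 2.0], [3.0, 6.0]])
```

Each output frame has twice the dimension of the input frame.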

Referring to FIG. 5, previously trained acoustic event HMMs (Φ^{1,S}, ..., Φ^{E,S}) and a background noise HMM (Φ^D) can be used for HMM classification. Using these HMMs, a likelihood can be derived for the detected event sound.

Each acoustic event HMM (Φ^{e,S}) and the background noise HMM (Φ^D) take as parameters an initial state distribution (π), a state transition probability matrix (T), and a Gaussian mixture observation density (θ), so that Φ^{e,S} = [π^{e,S}, T^{e,S}, θ^{e,S}] and Φ^D = [π^D, T^D, θ^D].

A likelihood (L_i^{Flag}) for the event sound and a likelihood (L_i^D) for the background noise can be derived by evaluating the probability of the feature values under each HMM. The likelihoods for the event sound and for the background noise are as follows.
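One way to evaluate such a likelihood is the forward algorithm in the log domain. The sketch below simplifies the Gaussian mixture observation density to a single diagonal Gaussian per state, and all model parameters are invented for the example rather than trained. It also assumes every entry of π and T is strictly positive, so that the logarithms are defined.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian at the feature vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def hmm_log_likelihood(obs, pi, T, means, variances):
    """Forward algorithm: log P(obs | HMM) for a list of feature vectors."""
    n_states = len(pi)
    alpha = [math.log(pi[s]) + log_gauss(obs[0], means[s], variances[s])
             for s in range(n_states)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[r] + math.log(T[r][s]) for r in range(n_states)])
                 + log_gauss(x, means[s], variances[s])
                 for s in range(n_states)]
    return logsumexp(alpha)

# Toy one-dimensional models: an "event" HMM near 3 and a "noise" HMM near 0.
event_hmm = dict(pi=[0.6, 0.4], T=[[0.7, 0.3], [0.4, 0.6]],
                 means=[[3.0], [3.5]], variances=[[0.5], [0.5]])
noise_hmm = dict(pi=[0.5, 0.5], T=[[0.5, 0.5], [0.5, 0.5]],
                 means=[[0.0], [0.5]], variances=[[0.5], [0.5]])
observations = [[3.1], [2.9], [3.0]]
ll_event = hmm_log_likelihood(observations, **event_hmm)
ll_noise = hmm_log_likelihood(observations, **noise_hmm)
```

For observations near 3, the event model gives the larger log-likelihood, which is the behavior the classification step relies on.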

Figure 112015107758514-pat00022

Then the average of the channel gains of the event sounds derived in step S50 (Figure 112015107758514-pat00023) and a normalized version of this value (Figure 112015107758514-pat00024) are obtained. Through this process, the likelihood weighted by the channel gain is given by Figure 112015107758514-pat00025, and this value can be used to decide whether an event sound is present.

Step S60, in which HMM classification is performed on the feature values extracted in step S50 to verify the likelihoods, comprises step S61 of performing maximum likelihood classification and step S62 of determining whether an event sound is present from the ratio of the likelihood for the event sound to the likelihood for the background noise.

In step S61, the likelihood for the final event sound and the likelihood for the background noise are obtained from the feature values obtained in step S50 (Figure 112015107758514-pat00026), the normalized average channel gain obtained in step S40 (Figure 112015107758514-pat00027), the pre-trained event sound HMMs (Φ^{1,S}, ..., Φ^{E,S}), and the background noise HMM (Φ^D).

In step S62, the ratio of the likelihood of the event sound to the likelihood (L_i^D) of the background noise HMM is calculated. If this ratio is greater than a predetermined value, the sound signal can be determined to contain an abnormal event. This is expressed as follows.

Figure 112015107758514-pat00028

The final result value (Flag_i^e) is either e or 0, and indicates whether there is an abnormal situation in the current frame.

Here, the quantity shown in Figure 112015107758514-pat00029 is a preset threshold value. As in Equation (11), when the detected result Flag_i^e is e, it can be determined that the i-th frame includes the abnormal sound of class e; when the result value is 0, it can be determined that only background noise exists in the i-th frame.

If it is determined that the i-th frame includes the abnormal sound through the comparison of the likelihood with the reference value as described above, it can be determined that an abnormal sound exists in the currently input signal.
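The two-part decision (maximum likelihood classification followed by a likelihood-ratio test) can be sketched as below. The example assumes that the mean channel gains are normalized by their sum across events and that likelihoods are plain probabilities rather than log values. The numbers and the function name `detect_event` are hypothetical.

```python
def detect_event(event_likelihoods, noise_likelihood, mean_gains, threshold):
    """Return (flag, ratio): flag is the winning event id, or 0 for background only."""
    total_gain = sum(mean_gains.values())
    # Weight each event likelihood by its normalized mean channel gain.
    weighted = {e: (mean_gains[e] / total_gain) * lik
                for e, lik in event_likelihoods.items()}
    # Maximum likelihood classification over the event classes.
    best = max(weighted, key=weighted.get)
    # Likelihood-ratio test against the background noise model.
    ratio = weighted[best] / noise_likelihood
    return (best if ratio > threshold else 0), ratio

likelihoods = {1: 0.02, 2: 0.005}
gains = {1: 0.8, 2: 0.2}
flag_event, ratio_event = detect_event(likelihoods, 0.004, gains, threshold=2.0)
flag_noise, ratio_noise = detect_event(likelihoods, 0.02, gains, threshold=2.0)
```

In the first call the weighted likelihood of event 1 is about four times the noise likelihood, so the frame is flagged as event 1. In the second call the noise model explains the frame better and the flag is 0.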

FIG. 6 is a graph comparing the sound detection performance of an embodiment of the present invention with the conventional art, using the F-measure as the index of detection accuracy. The F-measure combines precision and recall, which trade off against each other, into a single accuracy index, and is also referred to as their weighted harmonic mean.
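For reference, the F-measure can be computed from detection counts as follows. The counts are invented for the example, and the `beta` weight (1.0 gives the usual balanced F1 score) is an assumption, since the text does not state which weighting was used.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    tp: correctly detected events, fp: false alarms, fn: missed events.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

# 6 correct detections, 2 false alarms, 4 missed events:
# precision = 0.75, recall = 0.6.
score = f_measure(tp=6, fp=2, fn=4)
```

With `beta=1.0` the result equals 2·tp / (2·tp + fp + fn), which is 12/18 for these counts.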

FIG. 6(a) compares three cases: detecting the likelihood of the event sound with the GMM algorithm, detecting it through NMF decomposition, and detecting it with the channel gain taken into account as in the embodiment. As the graph shows, the F-measure is highest, at 0.5042, when the sound detection method of the embodiment is applied, while the remaining cases give values close to 0.4.

FIG. 6(b) is a graph showing the relative improvement of the F-measure. The embodiment improved the F-measure by about 23.64% over the conventional method using the GMM algorithm and by about 31.30% over the conventional method using the NMF algorithm, so the detection accuracy for abnormal event sounds is markedly improved.
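The relation between these percentages and the F-measures can be checked with a short calculation. The baseline F-measures are not stated in the text; the values below are only what the reported figures imply, assuming the relative improvement is defined as (new - old) / old.

```python
def relative_improvement(f_new, f_old):
    """Relative F-measure improvement in percent."""
    return 100.0 * (f_new - f_old) / f_old

def implied_baseline(f_new, improvement_percent):
    """Baseline F-measure implied by a reported relative improvement."""
    return f_new / (1.0 + improvement_percent / 100.0)

f_embodiment = 0.5042
f_gmm = implied_baseline(f_embodiment, 23.64)  # implied GMM baseline
f_nmf = implied_baseline(f_embodiment, 31.30)  # implied NMF baseline
```

Both implied baselines come out near 0.4 (roughly 0.41 and 0.38), which agrees with the statement that the remaining cases give values close to 0.4.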

FIG. 7 is a graph comparing sound detection performance according to an embodiment of the present invention as a function of SNR. Referring to FIG. 7, the F-measures of the same methods as in FIG. 6 are compared at SNRs of 6 dB, 0 dB, and -6 dB.

The proposed method has a higher F-measure than the methods using GMM and NMF at all three SNRs. In particular, in the -6 dB environment, where the SNR is relatively poor and detection accuracy is otherwise hardly obtained, the embodiment shows an F-measure of 0.2588 to 0.3448, which can be judged a notable result even in an environment dominated by background noise.

The sound detection method of the embodiment takes a stereo input signal and uses the NTF algorithm to decompose it into a combination of several tensor factors. Using the channel gain factor among them for acoustic event detection improves the detection accuracy for abnormal sounds even in an environment where several noises are mixed.

The present invention considers the channel gain through the NTF algorithm, and at the same time, compares the likelihoods of the background noise and the event sound through the hidden Markov model, thereby making it possible to more accurately and reliably detect a specific sound intended by the user from among the input signals.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, and that various modifications and applications other than those described above are possible. For example, each component specifically shown in the embodiments of the present invention can be modified and implemented. All changes and modifications that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (12)

1. A sound detection method using a stereo channel, for detecting an abnormal acoustic event using a stereo input signal, the method comprising:
receiving a stereo input acoustic signal;
converting the input acoustic signal into the time-frequency domain;
decomposing the converted input acoustic signal using a non-negative tensor factorization (NTF) algorithm;
making a first determination of whether an event sound has occurred by referring to the channel gain values;
extracting feature values for the identified event sound signals;
performing HMM classification on the extracted event sound signals; and
making a final determination of whether an abnormal acoustic event has occurred according to the HMM classification.
2. The method according to claim 1,
wherein the step of converting the input acoustic signal into the time-frequency domain comprises performing a short-time Fourier transform (STFT) on the stereo input sound signal and then converting the result into a mel amplitude spectrum signal.
3. The method of claim 2,
wherein the mel amplitude spectrum signal is decomposed into a tensor composed of a combination of a channel gain, a frequency gain, and a time gain.
4. The method of claim 3,
wherein the channel gain and the time gain are continuously updated by repeated execution of an update rule.
5. The method according to claim 1,
wherein the step of determining whether an event sound has occurred based on the channel gain values comprises calculating an average of all the channel gains, calculating the ratio of an arbitrary channel gain to the average of all the channel gains, and comparing the ratio with a preset threshold value.
6. The method of claim 5,
wherein, when the ratio value is greater than the preset threshold value, only the corresponding event sound signal is passed to the subsequent classification algorithm, and when the ratio value is smaller than the threshold value, the corresponding event sound signal is determined to be background noise.
7. The method according to claim 1,
wherein the step of extracting feature values for the determined event sound signals comprises converting the signals filtered by the channel gain obtained by the NTF into a feature vector composed of MFCCs.
8. The method according to claim 1,
wherein the step of performing HMM classification on the extracted event sound signals comprises deriving a likelihood for the detected event sound using a pre-trained acoustic event HMM and a background noise HMM.
9. The method of claim 8,
wherein probability values are calculated for the feature values under the acoustic event HMM and the background noise HMM to derive the likelihood for the event sound and the likelihood for the background noise.
10. The method of claim 9,
wherein the channel gains of the detected event sounds are averaged, a normalized value of the average is obtained, and the normalized value is multiplied by the likelihood for the event sound to obtain a likelihood for the event sound weighted by the channel gain.
11. The method of claim 9,
wherein the likelihood for a final event sound and the likelihood for the background noise are derived by referring to data from the pre-trained acoustic event HMM and data from the background noise HMM.
12. The method of claim 10,
wherein the ratio of the likelihood for the final event sound to the likelihood for the background noise is calculated, and the sound signal is determined to include a specific event when the ratio is greater than a preset value.
KR1020150155057A 2015-11-05 2015-11-05 Sound Detection Method Using Stereo Channel KR101732399B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150155057A KR101732399B1 (en) 2015-11-05 2015-11-05 Sound Detection Method Using Stereo Channel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150155057A KR101732399B1 (en) 2015-11-05 2015-11-05 Sound Detection Method Using Stereo Channel

Publications (1)

Publication Number Publication Date
KR101732399B1 true KR101732399B1 (en) 2017-05-08

Family

ID=60164302

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150155057A KR101732399B1 (en) 2015-11-05 2015-11-05 Sound Detection Method Using Stereo Channel

Country Status (1)

Country Link
KR (1) KR101732399B1 (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Emmanouil Benetos et al., ‘Non-negative tensor factorization applied to music genre classification’, IEEE Trans. on Audio, Speech, and Language Processing, Vol.18, No.8, November 2010.*
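이동윤 and 3 others, ‘Stereo-channel acoustic event detection method based on non-negative tensor factorization and hidden Markov model’, Proceedings of the 2014 Institute of Electronics and Information Engineers (대한전자공학회) Fall Conference, pp.445~446, November 2014.*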

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174130A (en) * 2022-03-10 2022-10-11 中国科学院沈阳自动化研究所 HMM-based AGV semantic attack detection method
CN115174130B (en) * 2022-03-10 2023-06-20 中国科学院沈阳自动化研究所 AGV semantic attack detection method based on HMM

Similar Documents

Publication Publication Date Title
Xu et al. An experimental study on speech enhancement based on deep neural networks
US9489965B2 (en) Method and apparatus for acoustic signal characterization
JP4746533B2 (en) Multi-sound source section determination method, method, program and recording medium thereof
JP2005091732A (en) Method for restoring target speech based on shape of amplitude distribution of divided spectrum found by blind signal separation
KR102206546B1 (en) Hearing Aid Having Noise Environment Classification and Reduction Function and Method thereof
US20160027438A1 (en) Concurrent Segmentation of Multiple Similar Vocalizations
JP6721365B2 (en) Voice dictionary generation method, voice dictionary generation device, and voice dictionary generation program
KR102066718B1 (en) Acoustic Tunnel Accident Detection System
May et al. Computational speech segregation based on an auditory-inspired modulation analysis
JP5994639B2 (en) Sound section detection device, sound section detection method, and sound section detection program
JP5974901B2 (en) Sound segment classification device, sound segment classification method, and sound segment classification program
Poorjam et al. A parametric approach for classification of distortions in pathological voices
CN112992190B (en) Audio signal processing method and device, electronic equipment and storage medium
KR101732399B1 (en) Sound Detection Method Using Stereo Channel
Jeon et al. Acoustic surveillance of hazardous situations using nonnegative matrix factorization and hidden Markov model
JPWO2016152132A1 (en) Audio processing apparatus, audio processing system, audio processing method, and program
KR101343768B1 (en) Method for speech and audio signal classification using Spectral flux pattern
JP2017021267A (en) Wiener filter design device, sound enhancement device, acoustic feature amount selection device, and method and program therefor
Seong et al. WADA-W: A modified WADA SNR estimator for audio-visual speech recognition
Cournapeau et al. Evaluation of real-time voice activity detection based on high order statistics.
JP5936378B2 (en) Voice segment detection device
JP5147012B2 (en) Target signal section estimation device, target signal section estimation method, target signal section estimation program, and recording medium
CN115223584B (en) Audio data processing method, device, equipment and storage medium
JP2013160937A (en) Voice section detection device
Gergen et al. Reduction of reverberation effects in the MFCC modulation spectrum for improved classification of acoustic signals.

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant