KR101741418B1 - A method for recognizing sound based on acoustic feature extraction and probability model - Google Patents

A method for recognizing sound based on acoustic feature extraction and probability model

Info

Publication number
KR101741418B1
Authority
KR
South Korea
Prior art keywords
sound
impact sound
model
impact
acoustic
Prior art date
Application number
KR1020150056272A
Other languages
Korean (ko)
Other versions
KR20160125628A (en)
Inventor
홍승기
Original Assignee
(주)사운드렉
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)사운드렉 filed Critical (주)사운드렉
Priority to KR1020150056272A priority Critical patent/KR101741418B1/en
Publication of KR20160125628A publication Critical patent/KR20160125628A/en
Application granted granted Critical
Publication of KR101741418B1 publication Critical patent/KR101741418B1/en

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00: Burglar, theft or intruder alarms
    • G08B13/16: Actuation by interference with mechanical vibrations in air or other fluid
    • G08B13/1654: Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
    • G08B13/1672: Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
    • G08B21/00: Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18: Status alarms
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The present invention relates to a sound recognition method based on acoustic feature extraction and a probability model for recognizing continuous impact sounds. More particularly, it relates to a method for recognizing a specific impact sound, judging whether an emergency situation has occurred, and controlling a digital device or appliance accordingly.

Description

Technical Field [0001] The present invention relates to a sound recognition method based on acoustic feature extraction and a probability model for recognizing continuous impact sounds.

The present invention relates to a sound recognition method based on acoustic feature extraction and a probability model for recognizing continuous impact sounds. More particularly, it relates to a method for recognizing a specific impact sound, judging whether an emergency situation has occurred, and controlling a digital device or appliance accordingly.

Conventional acoustic feature extraction methods and acoustic recognition models are intended for speech recognition: they extract acoustic features from human speech and build probability models from those features. Such methods are therefore of limited use for recognizing a specific sound generated by an object and inferring the surrounding situation from it.

Korean Patent Publication No. 2014-0136332
Korean Patent Publication No. 2009-0035222

The present invention solves the above-described problems of the prior art by providing a method for judging a dangerous situation by recognizing an unexpected impact sound, and a method for controlling an apparatus or device by means of intentionally generated impact sounds. That is, if an unexpected impact sound occurs in a confined space, it is judged to be an emergency and recognized remotely; alternatively, intentionally generated consecutive clapping sounds are used to control a specific object. For example, a light can be turned on or off by producing a series of impact sounds. The method can also be applied to a remote monitoring and control system that reports emergencies, such as a window being broken in an ordinary home.

The means for achieving the object of the present invention comprises a training module that extracts the features to be used from a collected impact-sound database, trains a Gaussian Mixture Model (GMM) on the extracted feature vectors, and trains a GMM in the same way on a database of various general sounds; and a recognizer module that detects a sudden change in the input signal, extracts acoustic features from a signal of fixed length referenced to the detected change, and performs a Likelihood Ratio Test (LRT) on the signal to verify whether it is an impact sound.

By providing a method for judging a dangerous situation by recognizing an unexpected impact sound, and a method for controlling an apparatus or device by intentionally generating impact sounds, the present invention makes it possible to remotely recognize an emergency when an unexpected impact sound occurs, to control a specific object by intentionally generating consecutive impact sounds, and to report emergencies to a remote monitoring and control system.

FIG. 1 is a configuration diagram of the training unit module according to the present invention;
FIG. 2 is a configuration diagram of the recognition unit module according to the present invention;
FIG. 3 is an example of filter banks on a linear scale and on a mel scale;
FIG. 4 is a flow chart of the LFCC and MFCC feature extraction process;
FIG. 5 is an example of detection of the impact-sound starting point; and
FIG. 6 is a state diagram for counting impact sounds.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The present invention will now be described in detail with reference to the accompanying drawings.

As shown in FIGS. 1 and 2, the impact-sound recognition system is divided broadly into a training unit module and a recognition unit module. The training unit extracts the acoustic features to be used from the collected impact-sound database and trains a Gaussian Mixture Model (GMM) on the extracted feature vectors; it also trains a GMM in the same way on a database of various general sounds, so that sounds other than impact sounds can be discriminated. The recognition unit detects a sudden signal change in the input signal, extracts features from a signal of predetermined length referenced to the detected change, performs a Likelihood Ratio Test (LRT) against the two GMMs, and finally determines whether the sequence of impact sounds constitutes a command, outputting the final recognition result.

1. Training Unit

As shown in FIG. 1, the training unit extracts acoustic features from a database of impact sounds and from a database of general sounds frequently encountered in home environments, such as conversation and music, and then trains a GMM for each using a machine learning method.

1.1 Database for statistical model learning

To train the statistical models, an impact-sound database and a general-sound database were constructed and used. The impact-sound database consists of 110 seconds of training data recorded in a relatively quiet environment. The general-sound database comprises a total of about 150 minutes of data chosen to reflect a realistic home environment: 50 minutes of TV, 8 minutes of music, 61 minutes of conversation, and 30 minutes of various room sounds. Since the purpose of the general-sound database is to reflect as wide a variety of sounds as possible in the probability model, preparing it poses no particular difficulty beyond the number of sounds to be considered. The impact sound, on the other hand, has a very short duration, so if the recorded files were used as they are, the silence and background noise surrounding each impact sound would influence the probability model more than the impact sound itself. To avoid this problem, only the impact-sound regions are cut out of each recording and used for statistical model training.
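As an illustration of this trimming step, the following minimal energy-based sketch in Python keeps only the loud regions of a recording; the frame size, the 30 dB threshold, and the sample rate are illustrative assumptions, not values stated in the patent.

```python
import numpy as np

def extract_impact_regions(signal, sr=16000, frame_ms=25.0, keep_db=-30.0):
    """Keep only the frames whose energy is within keep_db of the loudest frame.

    The retained samples are what would be passed on to feature extraction and
    GMM training, so that silence and background noise in each recording do not
    dominate the impact-sound model.
    """
    n = int(frame_ms / 1000.0 * sr)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    energy_db = 10.0 * np.log10(np.array([np.mean(f ** 2) for f in frames]) + 1e-12)
    keep = energy_db >= energy_db.max() + keep_db
    return np.concatenate([f for f, k in zip(frames, keep) if k]) if keep.any() else signal[:0]

# Usage: trim one impact-sound recording before statistical model training.
trimmed = extract_impact_regions(np.random.randn(32000))
```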

1.2 Acoustic Feature Extraction

To construct the acoustic models of the impact sound and the general sound, two feature extraction methods expressing the characteristics of the acoustic data are used: Mel-Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC). Both MFCC and LFCC extract features using a filter bank in the frequency domain; the difference is that MFCC applies a non-linearly spaced filter bank on the mel scale to the spectrum, whereas LFCC applies a linearly spaced filter bank.

MFCC and LFCC are extracted as follows. In this sound recognition system, the frame size is 25 ms and features are extracted every 12.5 ms. First, the pre-emphasis of equation (1) and the Hamming window of equation (2) are applied to each frame of the input signal, and a Fourier transform is performed.

$x_p(n) = x(n) - a\,x(n-1)$  (1)

$x_w(n) = x_p(n)\left[0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right)\right],\quad 0 \le n \le N-1$  (2)

Here N is the total number of samples in the frame, $x_w(n)$ is the signal after the Hamming window is applied, and the pre-emphasis coefficient a is generally set between 0.95 and 0.98. The filter bank of the corresponding scale in FIG. 3 and a logarithm are then applied to obtain the MFCC or LFCC filter-bank outputs, as shown in equation (3).

$S(m) = \log\!\left(\sum_{k} |X(k)|\, H_m(k)\right),\quad m = 1,\dots,B$  (3)

Here $X(k)$ is the frequency-domain signal after the FFT, $H_m(k)$ denotes the m-th triangular filter in the filter bank, k denotes a frequency bin, and B is the number of filter-bank channels; 26 filter banks are used for both MFCC and LFCC. The filter bank is applied, rather than using the magnitude-spectrum frequency bins directly, because frequency-band energies are more robust than individual frequency bins and retain their character even when noise is mixed into the signal.

Finally, 15th-order coefficients are extracted by applying the DCT as shown in equation (4). The MFCC and LFCC extraction procedures are summarized in FIG. 4. Applying the DCT to the log filter-bank outputs compacts the spectral envelope information into the low-order coefficients, so the frequency-domain characteristics can be represented efficiently by only a small number of cepstral coefficients.

$c(i) = \sum_{m=1}^{B} S(m)\cos\!\left(\frac{\pi i\,(m - 0.5)}{B}\right),\quad i = 1,\dots,15$  (4)
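The following Python sketch restates equations (1) through (4) in code. The 25 ms frame, 26 filters, and 15 coefficients come from the description above; the sample rate, FFT size, mel-scale formula, and function names are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping (assumed; the patent only refers to "mel scale").
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(num_filters, n_fft, sr, scale="mel"):
    """Build B triangular filters on a mel (MFCC) or linear (LFCC) frequency scale."""
    f_min, f_max = 0.0, sr / 2.0
    if scale == "mel":
        pts = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), num_filters + 2))
    else:
        pts = np.linspace(f_min, f_max, num_filters + 2)
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fbank[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return fbank

def cepstral_features(frame, sr, num_filters=26, num_ceps=15, pre_emph=0.97, scale="mel"):
    """Eq. (1)-(4): pre-emphasis, Hamming window, FFT, filter bank, log, DCT."""
    x = np.append(frame[0], frame[1:] - pre_emph * frame[:-1])        # eq. (1)
    n = np.arange(len(x))
    x_w = x * (0.54 - 0.46 * np.cos(2 * np.pi * n / (len(x) - 1)))    # eq. (2)
    n_fft = 512
    mag = np.abs(np.fft.rfft(x_w, n_fft))
    fbank = triangular_filterbank(num_filters, n_fft, sr, scale)
    log_energy = np.log(fbank @ mag + 1e-10)                          # eq. (3)
    m = np.arange(1, num_filters + 1)
    return np.array([np.sum(log_energy * np.cos(np.pi * i * (m - 0.5) / num_filters))
                     for i in range(1, num_ceps + 1)])                # eq. (4)

# Usage on a 25 ms frame at an assumed 16 kHz rate (frame advance 12.5 ms per the text).
sr = 16000
frame = np.random.randn(int(0.025 * sr))
mfcc = cepstral_features(frame, sr, scale="mel")
lfcc = cepstral_features(frame, sr, scale="linear")
```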

For the feature vectors obtained through the above procedure, a Delta feature is finally used to capture their time-varying characteristics. The Delta feature is constructed from the feature vectors of the preceding and following frames and is obtained by the following equation.

$\Delta f_i = \frac{f_{i+1} - f_{i-1}}{2}$  (5)

Here $\Delta f_i$ and $f_i$ denote the Delta feature of the i-th frame and the original MFCC or LFCC feature vector, respectively. As the equation shows, the Delta feature takes the adjacent frames into account through the frames immediately before and after the current frame. The Delta feature is then appended to the 15th-order MFCC or LFCC vector obtained first, so that a 30th-order feature vector is finally constructed.
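A minimal sketch of the delta computation and the resulting 30th-order vector, following equation (5) as reconstructed above; the edge-frame padding is an illustrative assumption.

```python
import numpy as np

def add_delta(features):
    """features: (num_frames, 15) static MFCC/LFCC; returns (num_frames, 30).

    The delta of frame i is (f[i+1] - f[i-1]) / 2; the first and last frames are
    replicated so every frame has both neighbours.
    """
    padded = np.vstack([features[:1], features, features[-1:]])
    delta = (padded[2:] - padded[:-2]) / 2.0
    return np.hstack([features, delta])

static = np.random.randn(100, 15)    # e.g. 100 frames of 15th-order features
combined = add_delta(static)         # shape (100, 30)
```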

1.3 Gaussian Mixed Model Training

To construct the statistical models of the impact sound and the general sound used by the impact-sound recognizer, MFCC and LFCC features are extracted from the impact-sound and general-sound databases and a Gaussian mixture model is trained on these features. Because a Gaussian mixture model estimates the probability distribution of the data with several Gaussian probability density functions, it can effectively represent the distribution of the general sound, which contains various sound classes such as speech and music, as well as the distribution of the impact sound. The Gaussian mixture model was therefore adopted as the statistical model for both the impact sound and the general sound.

The Gaussian mixture model is a representative statistical-model-based learning technique that estimates the probability density function of the data from the training data. The Gaussian mixture model is defined as follows.

$p(\mathbf{x}\,|\,\lambda) = \sum_{k=1}^{M} w_k\, g_k(\mathbf{x})$  (6)

Here $\mathbf{x}$ is a D-dimensional random vector, namely the audio feature vector; $g_k(\mathbf{x})$ is the density of each component making up the mixture density; and $w_k$ denotes the weight of the k-th component density, which must satisfy $\sum_{k=1}^{M} w_k = 1$.

The D-dimensional probability density function of the k-th Gaussian component is calculated as follows.

$g_k(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\mathsf T}\Sigma_k^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)\right)$  (7)

Here $\boldsymbol{\mu}_k$ and $\Sigma_k$ denote the mean vector and the covariance matrix, respectively.

The Gaussian mixture density is completely specified by the mean vectors, covariance matrices, and mixture weights of the M component densities. These parameters are denoted collectively as follows.

$\lambda = \{\, w_k,\ \boldsymbol{\mu}_k,\ \Sigma_k \,\},\quad k = 1,\dots,M$  (8)

The Gaussian mixture model can take various forms depending on the choice of covariance matrix; here, diagonal covariance matrices are used for their computational advantages.

Maximum likelihood (ML) estimation is used to learn the parameters of the Gaussian mixture model. The goal of ML estimation is to find the model parameters $\lambda$ that maximize the likelihood of the GMM for the given training data. The training procedure for estimating the GMM parameters by the ML method is as follows.

For a sequence of T training vectors $X = \{\mathbf{x}_1, \dots, \mathbf{x}_T\}$, the likelihood of the GMM can be expressed as follows.

$p(X\,|\,\lambda) = \prod_{t=1}^{T} p(\mathbf{x}_t\,|\,\lambda)$  (9)

The likelihood in equation (9) is a non-linear function of the parameters $\lambda$ and cannot be maximized directly. The parameters are therefore estimated with the Expectation-Maximization (EM) algorithm. The re-estimation of each parameter using the EM algorithm is as follows.

1. Mixture weights

$\hat{w}_k = \frac{1}{T}\sum_{t=1}^{T} \Pr(k\,|\,\mathbf{x}_t, \lambda)$  (10)

2. Mean vector

$\hat{\boldsymbol{\mu}}_k = \frac{\sum_{t=1}^{T} \Pr(k\,|\,\mathbf{x}_t, \lambda)\,\mathbf{x}_t}{\sum_{t=1}^{T} \Pr(k\,|\,\mathbf{x}_t, \lambda)}$  (11)

3. The covariance matrix

$\hat{\boldsymbol{\sigma}}_k^2 = \frac{\sum_{t=1}^{T} \Pr(k\,|\,\mathbf{x}_t, \lambda)\,\mathbf{x}_t^2}{\sum_{t=1}^{T} \Pr(k\,|\,\mathbf{x}_t, \lambda)} - \hat{\boldsymbol{\mu}}_k^2$  (12)

Here, the posterior probability for class k is obtained as follows.

$\Pr(k\,|\,\mathbf{x}_t, \lambda) = \frac{w_k\, g_k(\mathbf{x}_t)}{\sum_{j=1}^{M} w_j\, g_j(\mathbf{x}_t)}$  (13)
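To make the re-estimation formulas (10) through (13) concrete, here is a minimal diagonal-covariance EM sketch in numpy; the mixture order, iteration count, initialization, and variance floor are illustrative assumptions, not values specified in the patent.

```python
import numpy as np

def train_diag_gmm(X, M=16, n_iter=30, seed=0):
    """EM training of a diagonal-covariance GMM; X is a (T, D) array of feature vectors."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    w = np.full(M, 1.0 / M)                          # mixture weights
    mu = X[rng.choice(T, M, replace=False)]          # mean vectors
    var = np.tile(X.var(axis=0), (M, 1)) + 1e-6      # diagonal covariances

    for _ in range(n_iter):
        # E-step: log N(x_t | mu_k, var_k) for every frame t and component k.
        log_g = -0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                        + (((X[:, None, :] - mu) ** 2) / var).sum(axis=2))
        log_num = np.log(w) + log_g                              # (T, M)
        log_den = np.logaddexp.reduce(log_num, axis=1, keepdims=True)
        post = np.exp(log_num - log_den)                         # eq. (13)

        # M-step: eqs. (10)-(12).
        Nk = post.sum(axis=0) + 1e-10
        w = Nk / T                                               # eq. (10)
        mu = (post.T @ X) / Nk[:, None]                          # eq. (11)
        var = (post.T @ (X ** 2)) / Nk[:, None] - mu ** 2        # eq. (12)
        var = np.maximum(var, 1e-6)                              # variance floor
    return w, mu, var

# Usage: one GMM for impact-sound features, one for general-sound features
# (random placeholders stand in for the 30th-order feature vectors).
impact_model = train_diag_gmm(np.random.randn(2000, 30))
general_model = train_diag_gmm(np.random.randn(5000, 30))
```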

2. Recognition Unit

As shown in FIG. 2, the recognition unit first performs Abrupt Sound Detection (ASD) on the acoustic signal input through the microphone: a frame in which the signal of a specific frequency band changes abruptly is detected as an impact-sound candidate, and features are extracted from a signal segment of predetermined length referenced to the detected frame. The likelihoods of the impact-sound Gaussian mixture model and the general-sound Gaussian mixture model are then calculated, and a Likelihood Ratio Test (LRT) is performed to determine whether the input signal corresponds to a single impact sound. Finally, using the verification results, the second and third impact sounds belonging to the command are detected by considering the time intervals and number of the impact sounds.

2.1 Abrupt Sound Detection (ASD)

The ASD module aims to detect the starting point of an impact sound. An impact sound is characterized by acoustic features that appear suddenly and last only a short time. Sudden signals are therefore treated as potential impact sounds; after the starting point is detected, a subsequent process verifies whether the signal detected by the ASD really is an impact sound.

The ASD considers the signal change in the region of the spectrum where the power of an impact sound is highest. To this end, the ASD performs an FFT on each frame and sums the magnitude values from 1.5 kHz to 5 kHz. This summed value is compared with the values of the previous frames to detect a sudden change in the input signal, as expressed by the following equations.

$E_i = \sum_{k=w_1}^{w_2} |X_i(k)|$  (14)

$R_i = \max\!\left(\frac{E_i}{E_{i-1}},\ \frac{E_i}{E_{i-2}}\right)$  (15)

Here $E_i$ represents the intensity of the signal in the impact-sound frequency region in the i-th frame, and $w_1$ and $w_2$ denote the frequency bins corresponding to the 1.5 kHz and 5 kHz limits mentioned above. $R_i$ is obtained by taking the ratio of $E_i$ to the $E$ values of the previous two frames and selecting the larger of the two ratios; this accounts for the case where the value of $E$ rises sharply within two frames. $R_i$ is compared with a threshold value to detect the starting point of the impact sound, and in this system the threshold is set to 7. The important point is that raising the threshold reduces the cases in which a normal sound is selected as an impact-sound candidate, but it also reduces the chance that an actual impact sound is selected in a noisy environment. Conversely, if the threshold is lowered too much, the probability that a normal sound is selected as an impact-sound candidate increases, and with it the error rate of the recognizer. It is therefore advisable to set this threshold slightly high, so that the error rate of the recognizer is reduced and the system responds only to sufficiently distinct impact sounds. The advantage is that the reliability of the system can be increased merely by asking the user to produce a slightly louder impact sound. In a purely probability-based recognition system, raising the threshold increases the error rate, because obtaining a high probability value requires producing a sound as similar as possible to the training model, which is difficult for the user. In this impact-sound recognizer, however, the preprocessing step of filtering impact-sound candidates by power gives the user the opportunity to build a more stable system simply by paying a little attention to how the sound is produced.

An additional function of the ASD module is as follows. If every frame in which $R_i$ exceeds the threshold were selected as an impact-sound starting point, the result would be too noisy. Therefore, when $R_i$ exceeds the threshold again within 200 ms of a detected starting point, this is judged to be caused by noise or by some other cause rather than by a new impact sound; no new starting point is registered at that frame, and detection moves on to the next frame exceeding the threshold. These cases are summarized in FIG. 5.

With this function, only one impact-sound starting point is found within the 200 ms following a detected starting point. Because the noisy fluctuation of $E$ within an impact sound is suppressed, the starting point is detected only for events that can definitely be judged to be impact sounds.
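The following sketch puts the ASD procedure together. The 1.5 to 5 kHz band, the threshold of 7, and the 200 ms single-onset rule come from the description above; the sample rate, frame sizes, FFT length, and windowing are illustrative assumptions.

```python
import numpy as np

def detect_abrupt_starts(signal, sr=16000, frame_len=0.025, frame_shift=0.0125,
                         f_lo=1500.0, f_hi=5000.0, threshold=7.0, refractory_ms=200.0):
    """Abrupt Sound Detection: return frame indices taken as impact-sound onsets.

    E_i is the summed FFT magnitude between 1.5 kHz and 5 kHz (eq. (14));
    R_i is the larger ratio of E_i to E of the two previous frames (eq. (15));
    only one onset is accepted per 200 ms window.
    """
    n = int(frame_len * sr)
    hop = int(frame_shift * sr)
    n_fft = 512
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    band = (freqs >= f_lo) & (freqs <= f_hi)

    E = []
    for start in range(0, len(signal) - n + 1, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + n] * np.hamming(n), n_fft))
        E.append(mag[band].sum())
    E = np.array(E) + 1e-10

    onsets, last_onset = [], None
    refractory_frames = refractory_ms / 1000.0 / frame_shift
    for i in range(2, len(E)):
        R = max(E[i] / E[i - 1], E[i] / E[i - 2])
        if R > threshold and (last_onset is None or i - last_onset > refractory_frames):
            onsets.append(i)
            last_onset = i
    return onsets

# Usage on one second of (here synthetic) audio.
onsets = detect_abrupt_starts(np.random.randn(16000))
```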

2.2 Acoustic Feature Extraction

The MFCC or LFCC features are extracted from a 150 ms long signal referenced to the impact-sound starting point detected by the ASD module. Because an impact sound is short-lived, 150 ms after the onset is generally sufficient to cover its duration.

2.3 Impact Sound Verification based on Likelihood Ratio Test

Using the feature vector sequence extracted from the 150 ms signal detected by the ASD module, a log-likelihood ratio test against the Gaussian mixture model of each class constructed in the training unit determines whether the segment is a single impact sound. The log-likelihood ratio of the two classes for the feature vector sequence is as follows.

$\Lambda(X) = \frac{1}{T}\sum_{t=1}^{T}\bigl[\log p(\mathbf{x}_t\,|\,\lambda_{\text{impact}}) - \log p(\mathbf{x}_t\,|\,\lambda_{\text{general}})\bigr]$  (16)

Here $p(\mathbf{x}_t\,|\,\lambda_{\text{impact}})$ is the likelihood of the t-th frame under the impact-sound model, $p(\mathbf{x}_t\,|\,\lambda_{\text{general}})$ is the likelihood of the t-th frame under the general-sound model, and T is the total number of frames in the 150 ms segment. Finally, the test compares the log-likelihood ratio with a threshold value, as shown in equation (17), to determine whether the segment is an impact sound.

$\Lambda(X) \ge \theta_{th} \Rightarrow \text{impact sound}, \qquad \Lambda(X) < \theta_{th} \Rightarrow \text{general sound}$  (17)

Here $\theta_{th}$ denotes the threshold value for determining the impact sound. Adjusting the threshold controls the trade-off between false alarms and false rejections, allowing the system to be tailored to the user's needs. For example, in a situation where impact sounds must not be missed, the threshold can be lowered so that impact sounds are reliably detected, at the cost of occasionally misdetecting a general sound as an impact sound. Conversely, in a situation where a normal sound must not be mistaken for an impact sound, the threshold can be raised so that only clearly reliable impact sounds are detected.
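A minimal sketch of the decision in equations (16) and (17), assuming two GMMs have already been trained. scikit-learn's GaussianMixture and its score_samples method are used here only for brevity and are an implementation choice, not part of the patent; the placeholder data and threshold value are likewise illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def is_impact_sound(features, impact_gmm, general_gmm, theta_th=2.0):
    """features: (T, 30) vectors from the 150 ms window after the detected onset.

    Returns True when the average per-frame log-likelihood ratio (eq. (16))
    reaches the threshold theta_th (eq. (17)).
    """
    llr = impact_gmm.score_samples(features) - general_gmm.score_samples(features)
    return llr.mean() >= theta_th

# Illustrative usage: fit small diagonal-covariance GMMs on placeholder data.
impact_gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(np.random.randn(500, 30))
general_gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(np.random.randn(500, 30))
decision = is_impact_sound(np.random.randn(12, 30), impact_gmm, general_gmm)
```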

2.4 Impact sound counting

The impact-sound counting module analyzes the verification results of the LRT module and counts the number of impact sounds. It measures the time interval between frames whose verification result is 1 (impact sound) and determines the number of impact sounds produced as a command. The most important rule for counting is that, after one impact sound is detected, the count is increased only if the next impact sound follows within 200 ms to 500 ms; if no impact sound is input before 500 ms have elapsed, the count accumulated up to that point is returned. The series of operations corresponding to impact-sound counting can be expressed as the state diagram shown in FIG. 6.

1) WAIT state: the state that waits for the first impact sound. The moment the verification result becomes 1, the impact-sound counter is set to 1 and the state transitions to CLAP1.

2) CLAP1 state: the state after one impact sound has occurred; it watches for the next impact sound between 200 ms and 500 ms later. If the verification result becomes 1 within that interval, the counter is set to 2 and the state transitions to CLAP2. If 500 ms elapse without a next impact sound, the counter is set to 0 and the state transitions to OVER, because only two or three impact sounds are recognized as a command.

3) CLAP2 state: the state after two impact sounds have occurred; it watches for the next impact sound between 200 ms and 500 ms later. If the verification result becomes 1 within that interval, the counter is set to 3 and the state transitions to CLAP3. Conversely, if 500 ms elapse without a next impact sound, the counter is set to 2 and the state transitions to OVER.

4) CLAP3 state: the state after three impact sounds have occurred; it watches for the next impact sound between 200 ms and 500 ms later. If the verification result becomes 1 within that interval, the counter is set to 0 and the state transitions to OVER. If 500 ms elapse without a next impact sound, the counter is set to 3 and the state transitions to OVER. This is because four or more impact sounds are not significant.

5) OVER state: in the OVER state, the counter value accumulated so far is returned and the state transitions back to WAIT. The impact-sound counter value returned from the OVER state is 0, 2, or 3.
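The state diagram above can be sketched in a few lines of Python. The 200 to 500 ms window and the allowed counts of 2 or 3 come from the description; representing the verified impact sounds as a list of times in milliseconds, and ending a group on a gap shorter than 200 ms (which the ASD refractory interval should already prevent), are illustrative assumptions.

```python
def count_impact_sounds(event_times_ms, min_gap=200.0, max_gap=500.0):
    """Count command impact sounds from the times (ms) of verified impact sounds.

    Follows the WAIT -> CLAP1 -> CLAP2 -> CLAP3 -> OVER state diagram: each
    following impact sound must arrive 200-500 ms after the previous one, and
    only counts of 2 or 3 are returned as valid commands (otherwise 0).
    """
    results = []
    state, count, last_t = "WAIT", 0, None
    for t in list(event_times_ms) + [float("inf")]:       # sentinel flushes the last group
        if state == "WAIT":
            if t != float("inf"):
                state, count, last_t = "CLAP1", 1, t
            continue
        gap = t - last_t
        in_window = min_gap <= gap <= max_gap
        if state in ("CLAP1", "CLAP2") and in_window:
            count += 1                                     # CLAP1 -> CLAP2, CLAP2 -> CLAP3
            state = "CLAP2" if count == 2 else "CLAP3"
            last_t = t
        elif state == "CLAP3" and in_window:
            results.append(0)                              # a fourth impact sound: not a command
            state, count, last_t = "WAIT", 0, None
        else:
            # OVER: no follow-up within the window; only 2 or 3 count as a command.
            results.append(count if count in (2, 3) else 0)
            if t != float("inf"):
                state, count, last_t = "CLAP1", 1, t       # current event starts a new group
            else:
                state, count, last_t = "WAIT", 0, None
    return results

# Usage: two claps 300 ms apart form a valid command; a lone clap returns 0.
print(count_impact_sounds([1000.0, 1300.0]))   # [2]
print(count_impact_sounds([5000.0]))           # [0]
```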

3. Threshold Considerations

The impact-sound system is divided broadly into the ASD module, the LRT module, and the impact-sound counting module, and each module has several adjustable parameters. Of particular importance is the relationship between the thresholds used in the ASD and LRT modules. As mentioned above, the higher the threshold used in the ASD module, the harder it becomes to detect the impact-sound starting point in a noisy situation. When the ASD threshold is raised, the LRT module therefore contributes little to the recognition of the impact sound, because many sounds have already been filtered out by the ASD module. Conversely, if the ASD threshold is low, the contribution of the LRT module increases: a low ASD threshold makes it possible to find impact-sound starting points even in noisy environments, but signals other than impact sounds also pass through the ASD module, so many non-impact sounds must be filtered out by the LRT. When setting these two thresholds, the user should be encouraged to produce impact sounds of a certain loudness, the ASD threshold should be set to a relatively high value (between 8 and 10), and the LRT threshold should be adjusted to match the sounds that pass the ASD filter, which should be set to MFCC-2 ± 0.5 and LFCC-5 ± 1. As mentioned above, it is much easier for the user to produce a louder impact sound that raises the value required by the ASD module than to produce an impact sound that raises the likelihood, so the system can recognize impact sounds stably.

Claims (6)

1. (Deleted)

2. A sound recognition method for continuous impact sounds based on acoustic feature extraction and a probability model, comprising: a training module that extracts features expressing the characteristics of sound data from a collected impact-sound database, trains a Gaussian Mixture Model (GMM) on the extracted feature vectors, and trains a GMM in the same manner on a general-sound database in order to discriminate sounds other than impact sounds; and a recognition module that detects a sudden signal change in an input signal, extracts acoustic features from a signal of predetermined length referenced to the detected change, calculates the likelihoods of the impact-sound Gaussian mixture model and the general-sound Gaussian mixture model, performs a likelihood ratio test to verify whether the signal is an impact sound, and counts the impact sounds,
wherein the method of extracting the features expressing the characteristics of the sound data from the impact-sound database uses Mel-Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC).

3. The method of claim 2, wherein the Gaussian mixture model (GMM) uses the diagonal covariance method.

4. (Deleted)

5. The method of claim 2, wherein the likelihood ratio test determines whether the segment is an impact sound by means of the log-likelihood ratio of the Gaussian mixture models of each class constructed in the training unit, using the feature vector sequence extracted from the signal of predetermined length detected by the ASD module.

6. The method of claim 2, wherein the impact-sound counting increases the count only when the next impact sound is detected within a predetermined interval after the previous impact sound, and returns the number of impact sounds counted so far if no impact sound is input before the predetermined interval is exceeded.




KR1020150056272A 2015-04-22 2015-04-22 A method for recognizing sound based on acoustic feature extraction and probabillty model KR101741418B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150056272A KR101741418B1 (en) 2015-04-22 2015-04-22 A method for recognizing sound based on acoustic feature extraction and probabillty model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150056272A KR101741418B1 (en) 2015-04-22 2015-04-22 A method for recognizing sound based on acoustic feature extraction and probabillty model

Publications (2)

Publication Number Publication Date
KR20160125628A KR20160125628A (en) 2016-11-01
KR101741418B1 true KR101741418B1 (en) 2017-06-02

Family

ID=57484882

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150056272A KR101741418B1 (en) 2015-04-22 2015-04-22 A method for recognizing sound based on acoustic feature extraction and probabillty model

Country Status (1)

Country Link
KR (1) KR101741418B1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504510B (en) * 2016-11-11 2021-07-06 青岛海尔智能家电科技有限公司 Remote infrared control method and device
KR102066718B1 (en) * 2017-10-26 2020-01-15 광주과학기술원 Acoustic Tunnel Accident Detection System
CN111564163B (en) * 2020-05-08 2023-12-15 宁波大学 RNN-based multiple fake operation voice detection method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101251373B1 (en) * 2011-10-27 2013-04-05 한국과학기술연구원 Sound classification apparatus and method thereof

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101251373B1 (en) * 2011-10-27 2013-04-05 한국과학기술연구원 Sound classification apparatus and method thereof

Also Published As

Publication number Publication date
KR20160125628A (en) 2016-11-01

Similar Documents

Publication Publication Date Title
US10074384B2 (en) State estimating apparatus, state estimating method, and state estimating computer program
JPH0990974A (en) Signal processor
Droghini et al. A combined one-class SVM and template-matching approach for user-aided human fall detection by means of floor acoustic features
KR101250668B1 (en) Method for recogning emergency speech using gmm
KR101741418B1 (en) A method for recognizing sound based on acoustic feature extraction and probabillty model
Khoa Noise robust voice activity detection
CN110189746A (en) A kind of method for recognizing speech applied to earth-space communication
Kiktova et al. Comparison of different feature types for acoustic event detection system
Choi et al. Selective background adaptation based abnormal acoustic event recognition for audio surveillance
Alonso-Martin et al. Multidomain voice activity detection during human-robot interaction
US20200066268A1 (en) Noise cancellation
Sadjadi et al. Robust front-end processing for speaker identification over extremely degraded communication channels
Cetin et al. Classification of closed and open shell pistachio nuts using principal component analysis of impact acoustics
JP2020524300A (en) Method and device for obtaining event designations based on audio data
JP6616182B2 (en) Speaker recognition device, discriminant value generation method, and program
Varela et al. Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector
Suthokumar et al. Use of claimed speaker models for replay detection
Shokri et al. A robust keyword spotting system for Persian conversational telephone speech using feature and score normalization and ARMA filter
Karakos et al. Individual ship detection using underwater acoustics
Ishizuka et al. A feature for voice activity detection derived from speech analysis with the exponential autoregressive model
JP5439221B2 (en) Voice detection device
Besacier et al. Automatic sound recognition relying on statistical methods, with application to telesurveillance
KR20000056849A (en) method for recognizing speech in sound apparatus
Dennis et al. Enhanced local feature approach for overlapping sound event recognition
US11437019B1 (en) System and method for source authentication in voice-controlled automation

Legal Events

Date Code Title Description
A201 Request for examination
E701 Decision to grant or registration of patent right
GRNT Written decision to grant