CN109785857B - Abnormal sound event identification method based on MFCC + MP fusion characteristics - Google Patents

Abnormal sound event identification method based on MFCC + MP fusion characteristics

Info

Publication number
CN109785857B
Authority
CN
China
Prior art keywords
sound
abnormal
time
frame
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910153124.6A
Other languages
Chinese (zh)
Other versions
CN109785857A (en)
Inventor
罗丽燕
李芳足
王玫
仇洪冰
宋浠瑜
周陬
覃泓铭
韦金泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN201910153124.6A priority Critical patent/CN109785857B/en
Publication of CN109785857A publication Critical patent/CN109785857A/en
Application granted granted Critical
Publication of CN109785857B publication Critical patent/CN109785857B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses an abnormal sound event identification method based on MFCC + MP fusion features, comprising the following steps: 1) first sound preprocessing; 2) first sound feature extraction; 3) classifier training; 4) measured sound input; 5) second sound preprocessing; 6) second feature extraction; 7) classifier application; 8) detection result output. The method has good noise robustness, can effectively detect abnormal sounds in sound signals in low signal-to-noise-ratio environments, addresses the blind-spot problem of video surveillance, and provides useful support for security work.

Description

Abnormal sound event identification method based on MFCC + MP fusion characteristics
Technical Field
The invention relates to the technical field of sound signal identification, in particular to the detection and identification of abnormal sounds such as gunshots, screams, and glass breaking for monitoring abnormal events in public places, and specifically to an abnormal sound event identification method based on fused Mel-frequency cepstral coefficient (MFCC) and Matching Pursuit (MP) features.
Background
Human exploration of speech signal recognition began in the 1950s. In 1952, researchers at AT&T Bell Laboratories built a speaker-dependent recognition system for isolated English digits; implemented with analog electronics, it extracted the formant information of the vowels in digit pronunciations and recognized isolated digits from a specific speaker by simple template matching. After the 1960s, speech recognition technology advanced considerably: Vintsyuk in the Soviet Union proposed using dynamic programming to align two asynchronous speech sequences, and in the 1970s the Japanese scholar Sakoe proposed the dynamic time warping (DTW) algorithm, which effectively solved the problem of speech signals of unequal length. By the 1980s, the introduction of the statistics-based Hidden Markov Model (HMM) and of MFCC features brought speech signal recognition into a period of rapid development, and technologies applied to speech recognition were subsequently studied widely. In the 1990s, with the development of computer technology and the pursuit of more convenient living, human perception of environmental sounds gave rise to a series of studies: the hope was that sound signals collected by a sound sensor on a robot body, processed by a chain of environmental sound perception techniques, would let the robot distinguish the current environment and the events occurring around it, giving it a degree of environmental awareness. Early work along these lines was done by Sawhney and Maes of the Massachusetts Institute of Technology, who achieved 68% classification accuracy by extracting features from sounds and classifying sound scenes with recurrent neural networks and the K-nearest-neighbor algorithm. Researchers in the same laboratory later tackled sound scene classification for continuous sound streams, classifying the extracted sound features with an HMM to obtain preliminary recognition results. Since the beginning of the 21st century, researchers have turned to psychoacoustics and proposed a series of local and global features: Eronen et al. extracted MFCC features from sound signals, used a GMM to describe the feature distribution, and introduced a Hidden Markov Model (HMM) to capture the temporal variation of the GMM, providing a better solution for environmental sound perception. In 2008, the National Natural Science Foundation of China launched the major research plan "Cognitive Computing of Visual and Auditory Information", which starts from the study of human auditory cognition mechanisms, establishes mathematical models, addresses problems such as machine learning and understanding of perceptual data, and aims to build an intelligent unmanned-vehicle platform. In recent years, international competitions on sound event detection, such as the DCASE Challenge, have been held with the aim of finding effective solutions, on a global scale, to sound scene classification and other important problems in the field of sound event detection.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an abnormal sound event identification method based on MFCC + MP fusion features. The method has good noise robustness, can effectively detect abnormal sounds in sound signals in low signal-to-noise-ratio environments, addresses the blind-spot problem of video surveillance, and provides useful support for security work.
The technical scheme for realizing the purpose of the invention is as follows:
an abnormal sound event identification method based on MFCC + MP fusion characteristics is different from the prior art and comprises the following steps:
1) first sound preprocessing: a series of digital processing steps are applied to the sound signals in the sound library to make the signal distribution more stable and to facilitate subsequent sound feature extraction. The first sound preprocessing comprises normalization, framing, and windowing. Normalization scales the collected sound signal to between -1 and 1, which facilitates subsequent processing of the sound signal and training of the neural network. Framing divides a sound signal into a group of short, equal-length time frames; at a sampling frequency of 44.1 kHz, 1024 points are taken as one frame, and two adjacent frames overlap, the overlap being called the frame shift, whose value is given by a formula shown only as an image in the original document. Because ambient sound signals are generally non-stationary but exhibit short-time stationarity, i.e., they appear stationary within 10-30 ms, framing strengthens the stationary character of the sound signal. Windowing makes two adjacent frames smoother and more continuous and reduces spectral leakage; a Hamming window is used;
2) first sound feature extraction: first, the 12-order MFCCs of each frame are extracted; then each frame of the sound signal is decomposed with the MP algorithm over a dictionary of Gabor atoms of the form

$$g_{(s,u,\omega,\theta)}(t) = \frac{1}{\sqrt{s}}\, g\!\left(\frac{t-u}{s}\right)\cos(\omega t + \theta), \qquad g(t) = e^{-\pi t^{2}},$$

where s, u, ω and θ denote the scale (size), time shift, frequency and phase of the atom, respectively, and the parameter indices are positive integers, taking s = 2^p (1 ≤ p ≤ 8); u ∈ {0, 64, 128}; ω = K·i^{2.6} (1 ≤ i ≤ 35) with K = 0.5 × 35^{-2.6}; and θ = 0. The s and ω parameters of the first five atoms, together with the mean and variance of the atoms, are concatenated with the MFCCs as the feature vector of the frame. The feature vector of each frame of the sound segment is then computed, and its first- and second-order difference parameters are taken as dynamic supplementary features; finally, 60 frames are taken as the feature representation of the segment. The classic MFCC features serve as the main sound features, while the MP algorithm supplies a time-frequency representation that is more robust to noise as supplementary features; the flexibility, robustness and physical interpretability of the fused features improve the detection and classification of abnormal sound events in low signal-to-noise-ratio environments;
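For clarity, the MP decomposition referred to above follows the standard greedy iteration of Mallat and Zhang over the Gabor dictionary D: starting from the frame x as the initial residual, each step selects the atom most correlated with the residual and removes its projection, stopping here after five atoms whose (s, ω) parameters enter the feature vector. This is a reference formulation of the generic algorithm, not text from the patent:

```latex
% Standard matching-pursuit iteration over the Gabor dictionary D.
\begin{aligned}
R^{0}x &= x,\\
g_{\gamma_n} &= \arg\max_{g_\gamma \in D} \left|\langle R^{n}x,\, g_\gamma\rangle\right|,\\
R^{n+1}x &= R^{n}x - \langle R^{n}x,\, g_{\gamma_n}\rangle\, g_{\gamma_n}, \qquad n = 0,\dots,4.
\end{aligned}
```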
3) classifier training: the classifier is a convolutional neural network comprising, connected in sequence, convolutional layer c1, pooling layer s1, convolutional layer c2, pooling layer s2, fully-connected layer f1, and an output layer (out layer). In the output layer, a noise category is added alongside the abnormal sound categories to be identified; this category is trained with ambient noise and other sounds so that the classifier assigns input ambient noise and other sounds to it, which reduces the interference of sounds other than the abnormal sounds on the classification result and lowers the false detection rate. When the classifier is trained, a sound library containing a suitable amount of noise is used to train the neural network, which strengthens its generalization ability. The linear convolution kernels of the convolutional neural network are well suited to sound features in time-frequency representation and can extract higher-level, more discriminative features, helping to improve the recognition accuracy for abnormal sound events;
4) measured sound input: collect the measured sound;
5) second sound preprocessing: the measured sound is first normalized, framed and windowed, and then denoised. Denoising uses magnitude spectral subtraction: the short-time magnitude spectrum of pre-collected noise is subtracted from the short-time magnitude spectrum of each frame. The algorithm is simple and easy to implement and helps improve the method's recognition accuracy for abnormal sounds;
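In formula form, writing |Y_k(ω)| for the short-time magnitude spectrum of frame k and |N̂(ω)| for the pre-collected noise magnitude spectrum, the subtraction can be stated as below; the half-wave rectification (flooring at zero) and reuse of the noisy phase are common conventions that the patent does not spell out:

```latex
% Magnitude spectral subtraction; the noisy phase \angle Y_k(\omega)
% is reused when resynthesizing the denoised frame.
|\hat S_k(\omega)| = \max\bigl(|Y_k(\omega)| - |\hat N(\omega)|,\; 0\bigr),
\qquad
\hat S_k(\omega) = |\hat S_k(\omega)|\, e^{\,j\angle Y_k(\omega)}.
```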
6) second feature extraction: the feature extraction for the measured sound is identical to the first feature extraction method in step 2);
7) classifier application: with the 60-frame sound features as the identification features, the classifier trained in step 3) identifies the first 60 frames of the sound segment; if no abnormal sound is identified, the window is moved back 60 frames and identification continues. When an abnormal sound is identified, that moment is marked as the start time of the abnormal sound; detection continues backwards until no abnormal sound is detected, and the previous moment is marked as the end time of the abnormal sound. This detection method can not only detect whether an abnormal sound exists in a sound signal but also accurately locate the start and end times of the abnormal sound;
8) detection result output: output whether the measured sound contains an abnormal sound, together with its start and end times.
The technical scheme has the following advantages:
1) the invention adopts the classic MFCC features as the main sound features and the noise-robust MP features as supplementary features, so that the flexibility, robustness and physical interpretability of the fused features improve the detection and classification of abnormal sound events in low signal-to-noise-ratio environments;
2) the technical scheme uses a convolutional neural network as the classifier; the linear convolution kernels of the network are well suited to sound features in time-frequency representation and extract higher-level, more discriminative features, improving the recognition accuracy for abnormal sound events;
3) the technical scheme trains the neural network model with sound data at different signal-to-noise ratios, so the trained network has stronger generalization ability and the recognition accuracy is effectively improved;
4) when training the classifier, the technical scheme adds a noise category alongside the abnormal sound categories to be recognized and trains it with ambient noise and other sounds, so that the classifier assigns input ambient noise and other sounds to this category, reducing the interference of sounds other than the abnormal sounds on the classification result and lowering the false detection rate;
5) the technical scheme borrows the machine-vision method of locating foreground image positions with a sliding window to localize abnormal sound segments in time within the collected sound signal, so the start and end times of an abnormal sound event can be accurately located.
The method has good noise robustness, can effectively detect abnormal sounds in sound signals in low signal-to-noise-ratio environments, addresses the blind-spot problem of video surveillance, and provides useful support for security work.
Description of the drawings:
FIG. 1 is a schematic flow chart of the method in the example;
FIG. 2 is a schematic flow chart of a first sound preprocessing in the embodiment;
FIG. 3 is a schematic diagram illustrating a first sound feature extraction process in the embodiment;
FIG. 4 is a schematic structural diagram of a convolutional neural network in an embodiment;
FIG. 5 is a schematic view of a second sound preprocessing flow in the embodiment;
fig. 6 is a functional block diagram of an abnormal sound segment localization in the embodiment.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example:
referring to fig. 1, an abnormal acoustic event identification method based on MFCC + MP fusion features includes the following steps:
1) first sound preprocessing: a series of digital processing steps are applied to the sound signals in the sound library to make the signal distribution more stable and to facilitate subsequent sound feature extraction. As shown in fig. 2, the first sound preprocessing comprises normalization, framing, and windowing. Normalization scales the collected sound signal to between -1 and 1, which facilitates subsequent processing of the sound signal and training of the neural network. Framing divides a sound signal into a group of short, equal-length time frames; at a sampling frequency of 44.1 kHz, 1024 points are taken as one frame, and two adjacent frames overlap, the overlap being called the frame shift, whose value is given by a formula shown only as an image in the original document. Because ambient sound signals are generally non-stationary but exhibit short-time stationarity, i.e., they appear stationary within 10-30 ms, framing strengthens the stationary character of the sound signal. Windowing makes two adjacent frames smoother and more continuous and reduces spectral leakage; a Hamming window is used;
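As a concrete illustration of this preprocessing stage, the following minimal Python sketch implements normalization, framing and Hamming windowing with the parameters stated above (44.1 kHz sampling, 1024-point frames). The half-frame shift of 512 points is an assumption for illustration only, since the patent gives the frame-shift formula only as an image:

```python
import numpy as np

def preprocess(signal, frame_len=1024, frame_shift=512):
    # Normalize the signal to [-1, 1] (epsilon guards against a silent input).
    signal = signal / (np.max(np.abs(signal)) + 1e-12)
    # Split into overlapping, equal-length frames.
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([signal[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to smooth frame edges and reduce spectral leakage.
    return frames * np.hamming(frame_len)
```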
2) first sound feature extraction: first, the 12-order MFCCs of each frame are extracted; then each frame of the sound signal is decomposed with the MP algorithm over a dictionary of Gabor atoms of the form

$$g_{(s,u,\omega,\theta)}(t) = \frac{1}{\sqrt{s}}\, g\!\left(\frac{t-u}{s}\right)\cos(\omega t + \theta), \qquad g(t) = e^{-\pi t^{2}},$$

where s, u, ω and θ denote the scale (size), time shift, frequency and phase of the atom, respectively, and the parameter indices are positive integers, taking s = 2^p (1 ≤ p ≤ 8); u ∈ {0, 64, 128}; ω = K·i^{2.6} (1 ≤ i ≤ 35) with K = 0.5 × 35^{-2.6}; and θ = 0. The s and ω parameters of the first five atoms, together with the mean and variance of the atoms, are concatenated with the MFCCs as the feature vector of the frame. The feature vector of each frame of the sound segment is then computed, and its first- and second-order difference parameters are taken as dynamic supplementary features; finally, 60 frames are taken as the feature representation of the segment, as shown in fig. 3. The classic MFCC features serve as the main sound features, while the MP algorithm supplies a time-frequency representation that is more robust to noise as supplementary features; the flexibility, robustness and physical interpretability of the fused features improve the detection and classification of abnormal sound events in low signal-to-noise-ratio environments;
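A sketch of this fused feature extraction is given below, assuming the parameter grid stated above and the standard greedy matching-pursuit iteration. The 12-order MFCCs are computed with librosa; interpreting "the mean and variance of the atoms" as the mean and variance of the five MP coefficients is our assumption, and names such as mp_features are illustrative, not from the patent:

```python
import numpy as np
import librosa

FRAME_LEN = 1024

def gabor_atom(s, u, omega, theta=0.0, n=FRAME_LEN):
    # Gabor atom: scaled Gaussian envelope times a cosine, normalized to unit energy.
    t = np.arange(n)
    g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(omega * t + theta) / np.sqrt(s)
    return g / np.linalg.norm(g)

# Dictionary over the patent's parameter grid: s = 2^p (1 <= p <= 8),
# u in {0, 64, 128}, omega = K * i^2.6 with K = 0.5 * 35^-2.6, theta = 0.
K = 0.5 * 35 ** (-2.6)
PARAMS = [(2 ** p, u, K * i ** 2.6)
          for p in range(1, 9) for u in (0, 64, 128) for i in range(1, 36)]
ATOMS = np.stack([gabor_atom(s, u, w) for s, u, w in PARAMS])

def mp_features(frame, n_atoms=5):
    # Greedy matching pursuit: repeatedly pick the atom with the largest
    # correlation and subtract its projection from the residual.
    residual, s_w, coeffs = frame.astype(float), [], []
    for _ in range(n_atoms):
        c = ATOMS @ residual
        k = int(np.argmax(np.abs(c)))
        residual = residual - c[k] * ATOMS[k]
        s_w += [PARAMS[k][0], PARAMS[k][2]]   # keep s and omega of each atom
        coeffs.append(c[k])
    coeffs = np.array(coeffs)
    # Assumed reading: mean/variance taken over the five MP coefficients.
    return np.array(s_w + [coeffs.mean(), coeffs.var()])

def frame_feature(frame, sr=44100):
    # 12-order MFCCs of the frame, concatenated with the MP features.
    mfcc = librosa.feature.mfcc(y=frame, sr=sr, n_mfcc=12, n_fft=FRAME_LEN).mean(axis=1)
    return np.concatenate([mfcc, mp_features(frame)])
```

First- and second-order deltas across frames (e.g., librosa.feature.delta) would then be appended as the dynamic supplementary features, and 60 consecutive frame vectors stacked as the segment representation.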
3) classifier training: the classifier is a convolutional neural network (see fig. 4) comprising, connected in sequence, convolutional layer c1, pooling layer s1, convolutional layer c2, pooling layer s2, fully-connected layer f1, and an output layer (out layer). In the output layer, a noise category is added alongside the abnormal sound categories to be identified; this category is trained with ambient noise and other sounds so that the classifier assigns input ambient noise and other sounds to it, which reduces the interference of sounds other than the abnormal sounds on the classification result and lowers the false detection rate. When the classifier is trained, a sound library containing a suitable amount of noise is used to train the neural network, which strengthens its generalization ability. The linear convolution kernels of the convolutional neural network are well suited to sound features in time-frequency representation and can extract higher-level, more discriminative features, helping to improve the recognition accuracy for abnormal sound events;
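A minimal Keras sketch of this classifier topology (c1 → s1 → c2 → s2 → f1 → out) follows. Filter counts, kernel sizes and the input shape (60 frames by the per-frame feature dimension) are illustrative assumptions, as the patent does not specify them; the one extra output unit is the noise category described above:

```python
from tensorflow.keras import layers, models

def build_classifier(n_abnormal_classes, feat_dim, n_frames=60):
    model = models.Sequential([
        layers.Input(shape=(n_frames, feat_dim, 1)),
        layers.Conv2D(16, (3, 3), activation="relu", padding="same"),  # c1
        layers.MaxPooling2D((2, 2)),                                   # s1
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),  # c2
        layers.MaxPooling2D((2, 2)),                                   # s2
        layers.Flatten(),
        layers.Dense(128, activation="relu"),                          # f1
        # One extra "noise" class absorbs ambient noise and other sounds.
        layers.Dense(n_abnormal_classes + 1, activation="softmax"),    # out
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```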
4) measured sound input: collect the measured sound;
5) second sound preprocessing: as shown in fig. 5, the measured sound is first normalized, framed and windowed, and then denoised. Denoising uses magnitude spectral subtraction: the short-time magnitude spectrum of pre-collected noise is subtracted from the short-time magnitude spectrum of each frame. The algorithm is simple and easy to implement and helps improve the method's recognition accuracy for abnormal sounds;
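A sketch of this denoising step is shown below, assuming the noise spectrum is estimated by averaging the magnitude spectra of pre-collected noise-only frames and that the noisy phase is reused for reconstruction (a common choice the patent does not spell out):

```python
import numpy as np

def spectral_subtract(frames, noise_frames):
    # Average noise magnitude spectrum estimated from noise-only frames.
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)
    spec = np.fft.rfft(frames, axis=1)
    # Subtract the noise magnitude, flooring at zero (half-wave rectification).
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    # Resynthesize each frame with the original (noisy) phase.
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)),
                        n=frames.shape[1], axis=1)
```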
6) second feature extraction: referring to fig. 3, the feature extraction for the measured sound is identical to the first feature extraction method in step 2);
7) classifier application: referring to fig. 6, the method of locating foreground image positions with a sliding window in machine vision is borrowed to detect and identify abnormal sound segments in the collected sound segment. With the 60-frame sound features as the identification features, the classifier trained in step 3) identifies the first 60 frames of the sound segment; if no abnormal sound is identified, the window is moved back 60 frames and identification continues. When an abnormal sound is identified, that moment is marked as the start time of the abnormal sound; detection continues backwards until no abnormal sound is detected, and the previous moment is marked as the end time of the abnormal sound. This detection method can not only detect whether an abnormal sound exists in a sound signal but also accurately locate the start and end times of the abnormal sound;
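The sliding-window localization loop can be sketched as below; classify stands for the trained classifier of step 3) applied to one 60-frame feature block, and noise_id is the index of the extra noise category; both names are illustrative:

```python
def locate_abnormal(features, classify, noise_id, win=60):
    # Scan consecutive 60-frame windows: the first abnormal window marks
    # the start time, and the first non-abnormal window after it marks the end.
    start = end = None
    for i in range(0, len(features) - win + 1, win):
        label = classify(features[i : i + win])
        if label != noise_id and start is None:
            start = i
        elif label == noise_id and start is not None:
            end = i
            break
    return start, end   # frame indices; None if no abnormal sound was found
```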
8) detection result output: output whether the measured sound contains an abnormal sound, together with its start and end times.

Claims (1)

1. An abnormal sound event identification method based on MFCC + MP fusion features is characterized by comprising the following steps:
1) first sound preprocessing: perform normalization, framing and windowing on sound signals selected from a sound database, wherein normalization scales the collected sound signal to between -1 and 1; framing divides a sound signal into a group of short, equal-length time frames, where at a sampling frequency of 44.1 kHz 1024 points are taken as one frame and two adjacent frames overlap, the overlap being called the frame shift, whose value is given by a formula shown only as an image in the original document; windowing uses a Hamming window;
2) first sound feature extraction: first extract the 12-order MFCCs of each frame, then decompose each frame of the sound signal with the MP algorithm over a dictionary of Gabor atoms of the form

$$g_{(s,u,\omega,\theta)}(t) = \frac{1}{\sqrt{s}}\, g\!\left(\frac{t-u}{s}\right)\cos(\omega t + \theta), \qquad g(t) = e^{-\pi t^{2}}, \quad \theta \in [0, 2\pi],$$

where s, u, ω and θ respectively represent the scale (size), time shift, frequency and phase of the atom and the parameter indices are positive integers, taking s = 2^p (1 ≤ p ≤ 8); u ∈ {0, 64, 128}; ω = K·i^{2.6} (1 ≤ i ≤ 35) with K = 0.5 × 35^{-2.6}; and θ = 0; concatenate the s and ω parameters of the first five atoms and the mean and variance of the atoms with the MFCCs as the feature vector of the frame; then compute the feature vector of each frame of the sound segment, take its first- and second-order difference parameters as dynamic supplementary features, and finally take 60 frames of the segment as the feature representation of the segment;
3) classifier training: the classifier is a convolutional neural network comprising, connected in sequence, convolutional layer c1, pooling layer s1, convolutional layer c2, pooling layer s2, fully-connected layer f1 and an output layer (out layer); when the classifier is trained, a sound library mixed with noise is used to train the neural network;
4) measured sound input: collect the measured sound;
5) second sound preprocessing: first perform normalization, framing and windowing on the measured sound, then denoise, wherein denoising uses magnitude spectral subtraction to subtract the short-time magnitude spectrum of noise acquired in advance from the short-time magnitude spectrum of each frame of the sound signal;
6) second feature extraction: the feature extraction for the measured sound is identical to the first feature extraction method in step 2);
7) classifier application: borrowing the machine-vision method of locating foreground image positions with a sliding window, detect and identify abnormal sound segments in the collected sound segment; since 60-frame sound features are taken as the identification features, identify from the first 60 frames of the segment with the classifier trained in step 3); if no abnormal sound is identified, move back 60 frames and continue identification; when an abnormal sound is identified, mark that moment as the start time of the abnormal sound, continue detecting until no abnormal sound is detected, and mark the previous moment as the end time of the abnormal sound;
8) detection result output: output whether the measured sound contains an abnormal sound, together with its start and end times.
CN201910153124.6A 2019-02-28 2019-02-28 Abnormal sound event identification method based on MFCC + MP fusion characteristics Active CN109785857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910153124.6A CN109785857B (en) 2019-02-28 2019-02-28 Abnormal sound event identification method based on MFCC + MP fusion characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910153124.6A CN109785857B (en) 2019-02-28 2019-02-28 Abnormal sound event identification method based on MFCC + MP fusion characteristics

Publications (2)

Publication Number Publication Date
CN109785857A CN109785857A (en) 2019-05-21
CN109785857B true CN109785857B (en) 2020-08-14

Family

ID=66486550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910153124.6A Active CN109785857B (en) 2019-02-28 2019-02-28 Abnormal sound event identification method based on MFCC + MP fusion characteristics

Country Status (1)

Country Link
CN (1) CN109785857B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112349298A (en) * 2019-08-09 2021-02-09 阿里巴巴集团控股有限公司 Sound event recognition method, device, equipment and storage medium
CN110598599A (en) * 2019-08-30 2019-12-20 北京工商大学 Method and device for detecting abnormal gait of human body based on Gabor atomic decomposition
CN111144482B (en) * 2019-12-26 2023-10-27 惠州市锦好医疗科技股份有限公司 Scene matching method and device for digital hearing aid and computer equipment
CN112418181B (en) * 2020-12-13 2023-05-02 西北工业大学 Personnel falling water detection method based on convolutional neural network
CN112687294A (en) * 2020-12-21 2021-04-20 重庆科技学院 Vehicle-mounted noise identification method
CN112669879B (en) * 2020-12-24 2022-06-03 山东大学 Air conditioner indoor unit noise anomaly detection method based on time-frequency domain deep learning algorithm
CN112397055B (en) * 2021-01-19 2021-07-27 北京家人智能科技有限公司 Abnormal sound detection method and device and electronic equipment
CN113470654A (en) * 2021-06-02 2021-10-01 国网浙江省电力有限公司绍兴供电公司 Voiceprint automatic identification system and method
CN114202892B (en) * 2021-11-16 2023-04-25 北京航天试验技术研究所 Hydrogen leakage monitoring method
CN115206302B (en) * 2022-06-30 2023-08-18 中国石油大学(华东) Oil tank boiling-over fire early warning method based on micro-explosion noise identification model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108899049A (en) * 2018-05-31 2018-11-27 中国地质大学(武汉) A kind of speech-emotion recognition method and system based on convolutional neural networks
CN109065030A (en) * 2018-08-01 2018-12-21 上海大学 Ambient sound recognition methods and system based on convolutional neural networks
CN109087655A (en) * 2018-07-30 2018-12-25 桂林电子科技大学 A kind of monitoring of traffic route sound and exceptional sound recognition system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10014003B2 (en) * 2015-10-12 2018-07-03 Gwangju Institute Of Science And Technology Sound detection method for recognizing hazard situation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108899049A (en) * 2018-05-31 2018-11-27 中国地质大学(武汉) A kind of speech-emotion recognition method and system based on convolutional neural networks
CN109087655A (en) * 2018-07-30 2018-12-25 桂林电子科技大学 A kind of monitoring of traffic route sound and exceptional sound recognition system
CN109065030A (en) * 2018-08-01 2018-12-21 上海大学 Ambient sound recognition methods and system based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Audio Classification Based on Sound Source Feature Analysis; Yang Song; China Master's Theses Full-text Database; 2012-07-31; full text *

Also Published As

Publication number Publication date
CN109785857A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
CN109785857B (en) Abnormal sound event identification method based on MFCC + MP fusion characteristics
Dennis et al. Overlapping sound event recognition using local spectrogram features and the generalised hough transform
Van Segbroeck et al. A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice.
Maxime et al. Sound representation and classification benchmark for domestic robots
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
Venter et al. Automatic detection of African elephant (Loxodonta africana) infrasonic vocalisations from recordings
CN108520753A (en) Voice lie detection method based on the two-way length of convolution memory network in short-term
CN114863937B (en) Mixed bird song recognition method based on deep migration learning and XGBoost
CN108648760A (en) Real-time sound-groove identification System and method for
CN112820279A (en) Parkinson disease detection method based on voice context dynamic characteristics
Wang et al. Audio event detection and classification using extended R-FCN approach
Chee et al. Automatic detection of prolongations and repetitions using LPCC
Illa et al. A comparative study of acoustic-to-articulatory inversion for neutral and whispered speech
Jena et al. Gender recognition of speech signal using knn and svm
Kuang et al. Simplified inverse filter tracked affective acoustic signals classification incorporating deep convolutional neural networks
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
CN117457031A (en) Emotion recognition method based on global acoustic features and local spectrum features of voice
Kamble et al. Emotion recognition for instantaneous Marathi spoken words
CN116434759A (en) Speaker identification method based on SRS-CL network
Ruinskiy et al. Spectral and textural feature-based system for automatic detection of fricatives and affricates
Sharma et al. Classification of children with specific language impairment using pitch-based parameters
Ardiana et al. Gender Classification Based Speaker’s Voice using YIN Algorithm and MFCC
Neti et al. Joint processing of audio and visual information for multimedia indexing and human-computer interaction.
Wang et al. Environmental sound recognition based on double-input convolutional neural network model
CN113628639A (en) Voice emotion recognition method based on multi-head attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant