CN109785857B - Abnormal sound event identification method based on MFCC + MP fusion characteristics - Google Patents
- Publication number: CN109785857B (application CN201910153124.6A)
- Authority: CN (China)
- Legal status: Active
Abstract
The invention discloses an abnormal sound event identification method based on MFCC + MP fusion features, comprising the following steps: 1) first sound preprocessing; 2) first sound feature extraction; 3) classifier training; 4) measured sound input; 5) second sound preprocessing; 6) second feature extraction; 7) classifier application; 8) detection result output. The method has good noise robustness, can effectively detect abnormal sounds in sound signals in low signal-to-noise-ratio environments, compensates for the blind areas of video monitoring, and provides useful support for security work.
Description
Technical Field
The invention relates to the technical field of sound signal identification, in particular to the detection and identification of abnormal sounds such as gunshots, screams and breaking glass for monitoring abnormal events in public places, and more particularly to an abnormal sound event identification method based on the fusion of Mel-frequency cepstral coefficient (MFCC) and Matching Pursuit (MP) features.
Background
Human exploration of speech signal recognition began in the 1950s. In 1952, researchers at AT&T Bell Laboratories built a speaker-dependent English isolated-digit recognition system using analog electronic devices: it extracted the formant information of the vowels in digit pronunciations and recognized isolated digits for a specific speaker by simple template matching. After the 1960s, speech recognition technology developed considerably: Vintsyuk of the Soviet Union proposed aligning two speech utterances of unequal length by dynamic programming, and the dynamic time warping (DTW) algorithm proposed by the Japanese scholar Sakoe in the 1970s effectively solved the unequal-length problem of speech signals. By the 1980s, the introduction of the statistics-based hidden Markov model (HMM) and of MFCC features brought speech signal recognition into a period of rapid development, and technologies applied to speech recognition were thereafter widely studied. In the 1990s, with the development of computer technology and the pursuit of more convenient living, a series of studies on machine perception of environmental sounds emerged: the expectation was that sound signals collected by a sound sensor on a robot body, processed by environmental sound perception techniques, would let the robot distinguish the current environment and the events occurring around it, giving the robot a degree of environmental awareness.
Early work in this direction was conducted by Sawhney and Maes of the Massachusetts Institute of Technology, who obtained a 68% classification accuracy by extracting features from sounds and classifying sound scenes with recurrent neural networks and the K-nearest-neighbor algorithm. Researchers in the same laboratory then addressed sound scene classification of continuous sound streams, classifying the extracted sound features with an HMM to obtain preliminary recognition results. Researchers subsequently turned to psychoacoustics and proposed a series of local and global features: Eronen et al. extracted MFCC features from sound signals, used a GMM to describe the feature distribution, and introduced a hidden Markov model (HMM) to capture the temporal variation of the GMM, providing a better solution for environmental sound perception. In 2008, the National Natural Science Foundation of China launched the major research plan "Cognitive Computing of Visual and Auditory Information", which starts from the study of human cognitive mechanisms of audition, establishes mathematical models, and addresses problems such as machine learning and understanding of perceptual data, with the goal of building an intelligent driverless vehicle platform. In recent years, international competitions on sound event detection, such as the DCASE Challenge, have been held to seek effective solutions, on a global scale, to sound scene classification and other important problems in the field of sound event detection.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an abnormal sound event identification method based on MFCC + MP fusion characteristics. The method has good noise robustness, can effectively detect abnormal sounds in sound signals in a low signal-to-noise ratio environment, solves the problem of blind areas in video monitoring, and provides favorable help for security work.
The technical scheme for realizing the purpose of the invention is as follows:
an abnormal sound event identification method based on MFCC + MP fusion features, which differs from the prior art in comprising the following steps:
1) first sound preprocessing: a series of digital processing operations are performed on the sound signals in the sound library to make the signal distribution more stable and facilitate subsequent feature extraction. The first sound preprocessing comprises normalization, framing and windowing. Normalization scales the collected sound signal to the range [-1, 1], which facilitates subsequent processing and neural network training. Framing divides a sound signal into a group of short, equal-length time frames; at a sampling frequency of 44.1 kHz, 1024 points are taken as one frame, and two adjacent frames overlap, the overlap being called the frame shift. Because the ambient sound signal is generally non-stationary but short-time stationary, i.e., it appears stationary within 10-30 ms, framing increases the stationarity of the signal. Windowing makes two adjacent frames smoother and more continuous and reduces spectral leakage; a Hamming window is used;
2) first sound feature extraction: first, the 12-order MFCC of each frame is extracted; then each frame of the sound signal is decomposed with the MP algorithm, whose dictionary uses Gabor atoms of the form g_γ(t) = (1/√s)·g((t−u)/s)·cos(ωt+θ), where g(t) = e^(−πt²) is a Gaussian window and γ = (s, u, ω, θ); s, u, ω, θ represent the scale, time shift, frequency and phase of the atom. The parameters are taken as s = 2^p, 1 ≤ p ≤ 8; u ∈ {0, 64, 128}; ω = K·i^2.6, 1 ≤ i ≤ 35, K = 0.5×35^(−2.6); θ = 0. The s and ω parameters of the first five atoms, together with the mean and variance of their coefficients, are concatenated with the MFCC as the feature vector of the frame; the feature vector of each frame of the sound segment is then computed, and its first-order and second-order difference parameters are taken as dynamic supplementary features; finally, 60 frames are taken as the feature representation of the sound segment. The classic MFCC features serve as the main sound features, while the MP algorithm extracts a time-frequency representation that is more robust to noise as supplementary features; the flexibility, robustness and physical interpretability of the fused features improve the detection and classification of abnormal sound events in low signal-to-noise-ratio environments;
3) classifier training: the classifier is a convolutional neural network comprising, connected in sequence, a convolutional layer c1, a pooling layer s1, a convolutional layer c2, a pooling layer s2, a fully-connected layer f1 and an output layer (out layer). In the output layer, a noise category is added besides the abnormal sound categories to be identified; this category is trained with environmental noise and other sounds, so that the classifier assigns input environmental noise and other sounds to it, reducing the interference of sounds other than abnormal sounds on the classification result and lowering the false detection rate. When training the classifier, a sound library containing a suitable amount of noise is used to train the neural network, which enhances its generalization ability. The linear convolution kernels of the convolutional neural network are well suited to time-frequency sound features and extract higher-level, more discriminative features, which helps improve the identification accuracy of abnormal sound events;
4) actually measuring sound input: collecting actual measurement sound;
5) second sound preprocessing: normalization, framing and windowing are first performed on the measured sound, followed by noise reduction. Noise reduction uses magnitude spectral subtraction: the short-time magnitude spectrum of pre-collected noise is subtracted from the short-time magnitude spectrum of each frame of the sound signal. Spectral subtraction achieves noise reduction with a simple, easily implemented algorithm and helps improve the method's identification accuracy for abnormal sounds;
6) and (3) second-time feature extraction: the second characteristic extraction mode of the actually measured sound is consistent with the first characteristic extraction method in the step 2);
7) classifier application: taking the sound features of 60 frames as the identification features, the classifier trained in step 3) identifies the first 60 frames of the sound segment; if no abnormal sound is identified, the window is moved back 60 frames and identification continues. When abnormal sound is identified, that moment is marked as the start of the abnormal sound; detection then continues until abnormal sound is no longer detected, and the preceding moment is marked as its end. This detection method not only detects whether abnormal sound exists in a sound signal but also accurately locates the start and end times of the abnormal sound;
8) detection result output: output whether abnormal sound exists in the measured sound, together with the start time and end time of the abnormal sound.
The technical scheme has the following advantages:
1) the invention adopts the classic MFCC features as the main sound features and the noise-robust MP features as supplementary features, so that the flexibility, robustness and physical interpretability of the fused features improve the detection and classification of abnormal sound events in low signal-to-noise-ratio environments;
2) the technical scheme uses a convolutional neural network as the classifier; its linear convolution kernels are well suited to time-frequency sound features and extract higher-level, more discriminative features, improving the identification accuracy of abnormal sound events;
3) the technical scheme trains the neural network model with sound data of different signal-to-noise ratios, so that the trained network generalizes better and the identification accuracy is effectively improved;
4) when training the classifier, the technical scheme adds a noise class besides the abnormal sound classes to be recognized and trains it with environmental noise and other sounds; the classifier assigns input environmental noise and other sounds to this class, reducing the interference of sounds other than abnormal sounds on the classification result and lowering the false detection rate;
5) the technical scheme borrows the sliding-window method used in machine vision to locate foreground objects in images, applying it to locate abnormal sound segments in time within the collected sound signal, so the start time and end time of an abnormal sound event can be accurately located.
The method has good noise robustness, can effectively detect abnormal sounds in sound signals in a low signal-to-noise ratio environment, solves the problem of blind areas in video monitoring, and provides favorable help for security work.
Description of the drawings:
FIG. 1 is a schematic flow chart of the method in the example;
FIG. 2 is a schematic flow chart of a first sound preprocessing in the embodiment;
FIG. 3 is a schematic diagram illustrating a first sound feature extraction process in the embodiment;
FIG. 4 is a schematic structural diagram of a convolutional neural network in an embodiment;
FIG. 5 is a schematic view of a second sound preprocessing flow in the embodiment;
FIG. 6 is a functional block diagram of abnormal sound segment localization in the embodiment.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example (b):
referring to fig. 1, an abnormal acoustic event identification method based on MFCC + MP fusion features includes the following steps:
1) first sound preprocessing: a series of digital processing operations are performed on the sound signals in the sound library to make the signal distribution more stable and facilitate subsequent feature extraction. As shown in fig. 2, the first sound preprocessing comprises normalization, framing and windowing. Normalization scales the collected sound signal to the range [-1, 1], which facilitates subsequent processing and neural network training. Framing divides a sound signal into a group of short, equal-length time frames; at a sampling frequency of 44.1 kHz, 1024 points are taken as one frame, and two adjacent frames overlap, the overlap being called the frame shift. Because the ambient sound signal is generally non-stationary but short-time stationary, i.e., it appears stationary within 10-30 ms, framing increases the stationarity of the signal. Windowing makes two adjacent frames smoother and more continuous and reduces spectral leakage; a Hamming window is used;
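A minimal sketch of the preprocessing in step 1) above (normalization to [-1, 1], 1024-point framing, Hamming windowing); the 512-point frame shift is an assumption, since the shift value is not fixed in the text:

```python
import numpy as np

def preprocess(signal, frame_len=1024, hop=512):
    """Normalize to [-1, 1], split into overlapping frames, apply a Hamming window.
    hop=512 (50% overlap) is an assumption; the patent does not state the shift."""
    x = signal.astype(np.float64)
    x = x / (np.max(np.abs(x)) + 1e-12)            # normalization to [-1, 1]
    n_frames = 1 + (len(x) - frame_len) // hop      # drop the trailing partial frame
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)           # windowing smooths frame edges

# 1 s of audio at 44.1 kHz -> 85 overlapping 1024-point frames
fs = 44100
frames = preprocess(np.random.randn(fs))
```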
2) first sound feature extraction: first, the 12-order MFCC of each frame is extracted; then each frame of the sound signal is decomposed with the MP algorithm, whose dictionary uses Gabor atoms of the form g_γ(t) = (1/√s)·g((t−u)/s)·cos(ωt+θ), where g(t) = e^(−πt²) is a Gaussian window and γ = (s, u, ω, θ); s, u, ω, θ represent the scale, time shift, frequency and phase of the atom. The parameters are taken as s = 2^p, 1 ≤ p ≤ 8; u ∈ {0, 64, 128}; ω = K·i^2.6, 1 ≤ i ≤ 35, K = 0.5×35^(−2.6); θ = 0. The s and ω parameters of the first five atoms, together with the mean and variance of their coefficients, are concatenated with the MFCC as the feature vector of the frame; the feature vector of each frame of the sound segment is then computed, and its first-order and second-order difference parameters are taken as dynamic supplementary features; finally, 60 frames are taken as the feature representation of the sound segment, as shown in FIG. 3. The classic MFCC features serve as the main sound features, while the MP algorithm extracts a time-frequency representation that is more robust to noise as supplementary features; the flexibility, robustness and physical interpretability of the fused features improve the detection and classification of abnormal sound events in low signal-to-noise-ratio environments;
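The MP part of step 2) can be sketched as follows using the stated Gabor parameter grid. The unit-energy atom normalization, the greedy pursuit loop, and the all-zero `mfcc` stand-in (a real implementation would compute the actual 12-order MFCC) are assumptions for the sketch:

```python
import numpy as np

def gabor_atom(n, s, u, w, theta=0.0):
    """Discrete Gabor atom (1/sqrt(s))*exp(-pi*((t-u)/s)^2)*cos(w*t+theta),
    renormalized to unit energy (a common MP convention; an assumption here)."""
    t = np.arange(n)
    g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(w * t + theta) / np.sqrt(s)
    return g / (np.linalg.norm(g) + 1e-12)

def build_dictionary(n=1024):
    """Parameter grid from the patent: s = 2^p (1<=p<=8), u in {0, 64, 128},
    w = K*i^2.6 with K = 0.5*35^(-2.6), 1<=i<=35, theta = 0."""
    K = 0.5 * 35 ** (-2.6)
    atoms, params = [], []
    for p in range(1, 9):
        for u in (0, 64, 128):
            for i in range(1, 36):
                s, w = 2.0 ** p, K * i ** 2.6
                atoms.append(gabor_atom(n, s, u, w))
                params.append((s, w))
    return np.stack(atoms), params

def mp_features(frame, atoms, params, n_atoms=5):
    """Greedy matching pursuit: keep (s, w) of the first five atoms plus the
    mean and variance of their coefficients as the MP part of the feature."""
    residual = frame.copy()
    coefs, sw = [], []
    for _ in range(n_atoms):
        proj = atoms @ residual
        k = int(np.argmax(np.abs(proj)))
        coefs.append(proj[k])
        sw.extend(params[k])
        residual = residual - proj[k] * atoms[k]
    coefs = np.array(coefs)
    return np.concatenate([sw, [coefs.mean(), coefs.var()]])

atoms, params = build_dictionary()                 # 8*3*35 = 840 atoms
frame = np.hamming(1024) * np.sin(0.05 * np.arange(1024))
mfcc = np.zeros(12)                                # stand-in for the 12-order MFCC
feature = np.concatenate([mp_features(frame, atoms, params), mfcc])
```

The per-frame vector is then 5 pairs of (s, ω), the coefficient mean and variance, plus 12 MFCC values; stacking 60 such frames (with their first- and second-order differences) gives the segment representation.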
3) classifier training: the classifier is a convolutional neural network comprising, connected in sequence, a convolutional layer c1, a pooling layer s1, a convolutional layer c2, a pooling layer s2, a fully-connected layer f1 and an output layer (out layer). In the output layer, a noise category is added besides the abnormal sound categories to be identified; this category is trained with environmental noise and other sounds, so that the classifier assigns input environmental noise and other sounds to it, reducing the interference of sounds other than abnormal sounds on the classification result and lowering the false detection rate. When training the classifier, a sound library containing a suitable amount of noise is used to train the neural network, which enhances its generalization ability. The linear convolution kernels of the convolutional neural network are well suited to time-frequency sound features and extract higher-level, more discriminative features, which helps improve the identification accuracy of abnormal sound events;
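A forward-pass sketch of the c1-s1-c2-s2-f1-out structure in step 3), including the extra noise class in the output layer. Kernel counts, kernel sizes, ReLU activations and layer widths are assumptions; the patent fixes only the layer order and the added noise category:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    """Valid 2-D convolution with ReLU: x is (C, H, W), kernels is (K, C, kh, kw)."""
    K, C, kh, kw = kernels.shape
    _, H, W = x.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * kernels[k])
    return np.maximum(out, 0.0)                    # ReLU (an assumption)

def max_pool(x, size=2):
    C, H, W = x.shape
    H2, W2 = H // size, W // size
    return x[:, :H2 * size, :W2 * size].reshape(C, H2, size, W2, size).max(axis=(2, 4))

def forward(feat_map, n_abnormal=3):
    """c1 -> s1 -> c2 -> s2 -> f1 -> out; the output has n_abnormal + 1 classes,
    the extra class absorbing ambient noise to reduce false detections."""
    x = feat_map[None, :, :]                                             # (1, 60, 24)
    x = max_pool(conv2d(x, rng.standard_normal((8, 1, 3, 3)) * 0.1))     # c1, s1
    x = max_pool(conv2d(x, rng.standard_normal((16, 8, 3, 3)) * 0.1))    # c2, s2
    x = x.ravel()
    f1 = np.maximum(rng.standard_normal((64, x.size)) * 0.05 @ x, 0.0)   # f1
    logits = rng.standard_normal((n_abnormal + 1, 64)) * 0.05 @ f1       # out layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                                                   # softmax

probs = forward(rng.standard_normal((60, 24)))     # 60-frame feature map in, 4 classes out
```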
4) actually measuring sound input: collecting actual measurement sound;
5) second sound preprocessing: as shown in fig. 5, normalization, framing and windowing are first performed on the measured sound, followed by noise reduction. Noise reduction uses magnitude spectral subtraction: the short-time magnitude spectrum of pre-collected noise is subtracted from the short-time magnitude spectrum of each frame of the sound signal. Spectral subtraction achieves noise reduction with a simple, easily implemented algorithm and helps improve the method's identification accuracy for abnormal sounds;
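The per-frame magnitude spectral subtraction of step 5) can be sketched as follows; reusing the noisy phase for reconstruction and clamping at a small spectral floor are assumptions (the text only specifies subtracting the pre-collected noise magnitude spectrum):

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.01):
    """Magnitude spectral subtraction: subtract the pre-recorded noise magnitude
    spectrum from the frame's short-time magnitude spectrum, keep the noisy
    phase, and clamp at a small floor to avoid negative magnitudes
    (the floor value is an assumption)."""
    spec = np.fft.rfft(frame)
    mag = np.abs(spec) - noise_mag                     # subtract noise estimate
    mag = np.maximum(mag, floor * noise_mag)           # half-wave rectify with floor
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(frame))

rng = np.random.default_rng(1)
n = 1024
clean = np.sin(2 * np.pi * 0.01 * np.arange(n))
noisy = clean + 0.3 * rng.standard_normal(n)
noise_mag = np.abs(np.fft.rfft(0.3 * rng.standard_normal(n)))  # pre-collected noise
denoised = spectral_subtract(noisy, noise_mag)
```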
6) and (3) second-time feature extraction: referring to fig. 3, the second time of feature extraction of the measured sound is consistent with the first time of feature extraction method in step 2);
7) classifier application: referring to fig. 6, the sliding-window method used in machine vision to locate foreground objects in images is borrowed to detect and identify abnormal sound segments in the collected sound segment. Since the sound features of 60 frames serve as the identification features, the classifier trained in step 3) starts identification from the first 60 frames of the sound segment; if no abnormal sound is identified, the window is moved back 60 frames and identification continues. When abnormal sound is identified, that moment is marked as the start of the abnormal sound; detection then continues until abnormal sound is no longer detected, and the preceding moment is marked as its end. This detection method not only detects whether abnormal sound exists in a sound signal but also accurately locates the start and end times of the abnormal sound;
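The sliding-window localization of step 7) reduces to the following scan over per-window classifier decisions; representing each 60-frame window's decision as a boolean is a simplifying assumption for the sketch:

```python
def locate_abnormal(labels, win=60):
    """Slide over per-window classifier decisions: advance 60 frames at a time
    until an abnormal label appears (start of the abnormal sound), then keep
    going until it disappears (end). Returns (start_frame, end_frame),
    or (None, None) if no abnormal sound is found."""
    start = end = None
    for w, is_abnormal in enumerate(labels):
        if is_abnormal and start is None:
            start = w * win              # first abnormal window -> start frame
        if start is not None and not is_abnormal:
            end = w * win                # first clean window after -> end frame
            break
    if start is not None and end is None:
        end = len(labels) * win          # abnormal sound runs to the segment end
    return start, end

# windows 0-1 clean, 2-4 abnormal, 5-6 clean -> frames 120..300
start, end = locate_abnormal([False, False, True, True, True, False, False])
```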
8) detection result output: output whether abnormal sound exists in the measured sound, together with the start time and end time of the abnormal sound.
Claims (1)
1. An abnormal sound event identification method based on MFCC + MP fusion features is characterized by comprising the following steps:
1) first sound preprocessing: normalization, framing and windowing are performed on sound signals selected from a sound database; normalization scales the collected sound signal to the range [-1, 1]; framing divides a sound signal into a group of short, equal-length time frames, where at a sampling frequency of 44.1 kHz, 1024 points are taken as one frame and two adjacent frames overlap, the overlap being called the frame shift; windowing uses a Hamming window;
2) first sound feature extraction: first, the 12-order MFCC of each frame is extracted; then each frame of the sound signal is decomposed with the MP algorithm, whose dictionary uses Gabor atoms of the form g_γ(t) = (1/√s)·g((t−u)/s)·cos(ωt+θ), where g(t) = e^(−πt²) and θ ∈ [0, 2π]; s, u, ω, θ represent the scale, time shift, frequency and phase of the atom, respectively. The parameters are taken as s = 2^p, 1 ≤ p ≤ 8; u ∈ {0, 64, 128}; ω = K·i^2.6, 1 ≤ i ≤ 35, K = 0.5×35^(−2.6); θ = 0. The s and ω parameters of the first five atoms, together with the mean and variance of their coefficients, are concatenated with the MFCC as the feature vector of the frame; the feature vector of each frame of the sound segment is then computed, and its first-order and second-order difference parameters are taken as dynamic supplementary features; finally, 60 frames of the sound segment are taken as its feature representation;
3) training a classifier: the classifier adopts a convolutional neural network, the convolutional neural network comprises a convolutional layer c1, a pooling layer s1, a convolutional layer c2, a pooling layer s2, a fully-connected layer f1 and an output layer (out layer) which are sequentially connected, and when the classifier is trained, a sound library mixed with noise is used for training the neural network;
4) actually measuring sound input: collecting actual measurement sound;
5) and (3) second sound preprocessing: firstly, carrying out normalization processing, framing processing and windowing processing on actual measurement sound, and then carrying out noise reduction, wherein the noise reduction adopts amplitude spectrum subtraction to subtract a short-time amplitude spectrum of noise acquired in advance from a short-time amplitude spectrum of each frame of sound signal;
6) and (3) second-time feature extraction: the second characteristic extraction mode of the actually measured sound is consistent with the first characteristic extraction method in the step 2);
7) classifier application: the sliding-window method used in machine vision to locate foreground objects in images is borrowed to detect and identify abnormal sound segments in the collected sound segment; since the sound features of 60 frames serve as the identification features, the classifier trained in step 3) starts identification from the first 60 frames of the sound segment; if no abnormal sound is identified, the window is moved back 60 frames and identification continues; when abnormal sound is identified, that moment is marked as the start of the abnormal sound; detection continues until abnormal sound is no longer detected, and the preceding moment is marked as its end;
8) detection result output: output whether abnormal sound exists in the measured sound, together with the start time and end time of the abnormal sound.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910153124.6A CN109785857B (en) | 2019-02-28 | 2019-02-28 | Abnormal sound event identification method based on MFCC + MP fusion characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109785857A CN109785857A (en) | 2019-05-21 |
CN109785857B true CN109785857B (en) | 2020-08-14 |
Family
ID=66486550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910153124.6A Active CN109785857B (en) | 2019-02-28 | 2019-02-28 | Abnormal sound event identification method based on MFCC + MP fusion characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109785857B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112349298A (en) * | 2019-08-09 | 2021-02-09 | 阿里巴巴集团控股有限公司 | Sound event recognition method, device, equipment and storage medium |
CN110598599A (en) * | 2019-08-30 | 2019-12-20 | 北京工商大学 | Method and device for detecting abnormal gait of human body based on Gabor atomic decomposition |
CN111144482B (en) * | 2019-12-26 | 2023-10-27 | 惠州市锦好医疗科技股份有限公司 | Scene matching method and device for digital hearing aid and computer equipment |
CN112418181B (en) * | 2020-12-13 | 2023-05-02 | 西北工业大学 | Personnel falling water detection method based on convolutional neural network |
CN112687294A (en) * | 2020-12-21 | 2021-04-20 | 重庆科技学院 | Vehicle-mounted noise identification method |
CN112669879B (en) * | 2020-12-24 | 2022-06-03 | 山东大学 | Air conditioner indoor unit noise anomaly detection method based on time-frequency domain deep learning algorithm |
CN112397055B (en) * | 2021-01-19 | 2021-07-27 | 北京家人智能科技有限公司 | Abnormal sound detection method and device and electronic equipment |
CN113470654A (en) * | 2021-06-02 | 2021-10-01 | 国网浙江省电力有限公司绍兴供电公司 | Voiceprint automatic identification system and method |
CN114202892B (en) * | 2021-11-16 | 2023-04-25 | 北京航天试验技术研究所 | Hydrogen leakage monitoring method |
CN115206302B (en) * | 2022-06-30 | 2023-08-18 | 中国石油大学(华东) | Oil tank boiling-over fire early warning method based on micro-explosion noise identification model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108899049A (en) * | 2018-05-31 | 2018-11-27 | 中国地质大学(武汉) | A kind of speech-emotion recognition method and system based on convolutional neural networks |
CN109065030A (en) * | 2018-08-01 | 2018-12-21 | 上海大学 | Ambient sound recognition methods and system based on convolutional neural networks |
CN109087655A (en) * | 2018-07-30 | 2018-12-25 | 桂林电子科技大学 | A kind of monitoring of traffic route sound and exceptional sound recognition system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10014003B2 (en) * | 2015-10-12 | 2018-07-03 | Gwangju Institute Of Science And Technology | Sound detection method for recognizing hazard situation |
- 2019-02-28: Chinese application CN201910153124.6A filed (patent CN109785857B, status: active)
Non-Patent Citations (1)
Title |
---|
Research on Audio Classification Based on Sound Source Feature Analysis; Yang Song; China Master's Theses Full-text Database; 2012-07-31; full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||