
CN112216287A - Environmental sound identification method based on ensemble learning and convolutional neural network

Info

Publication number: CN112216287A
Authority: CN (China)
Prior art keywords: neural network, data, convolutional neural, sound, training
Prior art date: 2020-09-25
Legal status: Pending
Application number: CN202011020706.6A
Other languages: Chinese (zh)
Inventors: 陈俊 (Chen Jun), 谢维 (Xie Wei), 王震宇 (Wang Zhenyu), 郭宏成 (Guo Hongcheng)
Current assignee: Jiangsu Lishi Technology Co., Ltd.
Original assignee: Jiangsu Lishi Technology Co., Ltd.
Priority date: 2020-09-25
Filing date: 2020-09-25
Publication date: 2021-01-12
Application filed by Jiangsu Lishi Technology Co., Ltd.


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building
    • G10L17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/18: Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques using neural networks
    • G10L25/45: Speech or voice analysis techniques characterised by the type of analysis window


Abstract

The invention discloses an environmental sound identification method based on ensemble learning and a convolutional neural network, comprising the following steps: S1, feature extraction: the original audio is framed and windowed, the Mel energy spectrum of the sound is obtained with a Mel filter bank, and the final Mel energy spectrum features are taken as the data set; S2, model training: k-fold cross validation combined with mixup data augmentation is used to train k convolutional neural network models on the data set; S3, sound testing: the sound sample under test is identified by the trained convolutional neural network models. By training k models with k-fold cross validation and combining them for sound recognition, the method greatly enhances the generalization ability of the models and effectively alleviates overfitting; for the case of a small data volume, mixup data augmentation further improves generalization by mixing the original samples.

Description

Environmental sound identification method based on ensemble learning and convolutional neural network
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an environmental sound identification method based on ensemble learning and a convolutional neural network.
Background
In audio research, environmental sound identification is an important field with great application potential in security monitoring, medical monitoring, smart homes, scene analysis, and other areas. Compared with speech, environmental sounds are noise-like and have a wide frequency spectrum, which makes their recognition more challenging.
Existing environmental sound recognition methods based on a convolutional neural network generally divide the available data into a training set and a test set, train a model on the training set until it converges, test it on the test set during training, save the model that performs best on the test set, and finally use the saved convolutional neural network for environmental sound recognition.
Existing identification methods based on a convolutional neural network, on a combination of convolutional and recurrent neural networks, or on a Gaussian mixture model all train a single model on the available environmental audio data and use it to identify unknown environmental audio; models trained in this way generalize poorly and are prone to overfitting.
Disclosure of Invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is to provide an environmental sound recognition method based on ensemble learning and a convolutional neural network that trains k models with k-fold cross validation and combines them for sound recognition, thereby greatly enhancing the generalization ability of the models and effectively alleviating overfitting; for the case of a small data volume, mixup data augmentation further enhances generalization by mixing the original samples.
To achieve the above object, the present invention provides an environmental sound identification method based on ensemble learning and a convolutional neural network, comprising the following steps:
S1, feature extraction: the original audio is framed and windowed; for each short-time analysis window, the corresponding amplitude spectrum is obtained by FFT (fast Fourier transform) and squared to give the energy spectrum of the sound; the Mel energy spectrum is then obtained with a Mel filter bank and subjected to a log nonlinear transformation, yielding the final Mel energy spectrum features, which serve as the data set;
S2, model training: the data set is divided into k equal parts using k-fold cross validation, one part being taken as test data and the remaining k-1 parts as training data; the training data are then mixed by mixup data augmentation and used for model training, and the model performing best on the test data is saved; this operation is repeated k times in total, yielding k convolutional neural network models;
S3, sound testing: the same feature extraction as in step S1 is applied to the sound sample under test to obtain its Mel energy spectrum features as the test sample; the test sample is input into the k trained convolutional neural network models, whose outputs are fed into a combination module; the combination module takes the mode of these outputs as the final output of the ensemble model, which is compared with the classes of the test set samples to compute the environmental sound recognition rate.
Further, framing and windowing the original audio in step S1 specifically comprises: grouping every N sampling points of the audio data into an observation unit called a frame, keeping an overlapping region between adjacent frames, and multiplying each frame by a window function to eliminate the signal discontinuities that would otherwise arise at the frame boundaries, as illustrated by the sketch below.
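For illustration only (the patent discloses no code), a minimal NumPy sketch of this framing-and-windowing step; the frame length of 1764 samples, hop of 882 samples, and Hann window are taken from the embodiment described later, and all function and variable names are our own:

```python
# Illustrative framing-and-windowing sketch (not from the patent).
# Assumes a 1-D signal at least one frame long.
import numpy as np

def frame_and_window(signal, frame_len=1764, hop_len=882):
    """Split a signal into overlapping frames and taper each with a Hann window."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hanning(frame_len)  # suppresses discontinuities at frame edges
    frames = np.stack([signal[i * hop_len: i * hop_len + frame_len]
                       for i in range(n_frames)])
    return frames * window  # shape: (n_frames, frame_len)
```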
Further, in each round of step S2, a different one of the k parts is selected as the test data, so that each of the k parts serves as test data exactly once, the remaining k-1 parts serving as training data.
Further, the mixup data augmentation in step S2 is specifically: two feature samples are selected at random and mixed in proportion, and a new training sample and a new label are constructed by linear interpolation; the sample and label are finally processed by the following formulas:

x̃ = λ·x_i + (1 − λ)·x_j

ỹ = λ·y_i + (1 − λ)·y_j

where (x_i, y_i) and (x_j, y_j) are training sample pairs from the original data set, i.e. training samples and their corresponding labels, and λ is a parameter drawn from a Beta distribution, λ ~ Beta(α, α).
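For illustration (the patent discloses no code), a minimal NumPy sketch of this mixing step, assuming one-hot label vectors and a hypothetical hyperparameter `alpha`:

```python
# Illustrative mixup sketch (not from the patent): linear interpolation of two
# feature/label pairs with lambda drawn from Beta(alpha, alpha).
import numpy as np

def mixup(x_i, y_i, x_j, y_j, alpha=0.2):
    """Return a mixed training sample and label; y_i and y_j are one-hot vectors."""
    lam = np.random.beta(alpha, alpha)
    return lam * x_i + (1.0 - lam) * x_j, lam * y_i + (1.0 - lam) * y_j
```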
Further, during the model training of step S2, the convolution kernels and weights are initialized with the Glorot uniform initializer, and the biases are initialized to all zeros.
Further, during the model training of step S2, the network parameters are updated with the Adam algorithm; when the number of network iterations reaches a preset limit or the recognition accuracy on the validation set no longer improves, training is stopped and the trained convolutional neural network model is saved.
The beneficial effects of the invention are as follows: the method trains k models with k-fold cross validation and combines them for sound recognition, greatly enhancing the generalization ability of the models and effectively alleviating overfitting; for the case of a small data volume, mixup data augmentation further improves generalization by mixing the original samples.
The conception, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features, and effects can be fully understood.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the ensemble model prediction of the present invention.
Detailed Description
As shown in FIG. 1, the environmental sound identification method based on ensemble learning and a convolutional neural network comprises the following steps:
s1, extracting characteristics, wherein for the convenience of speech analysis, N sampling points are firstly collected into an observation unit called a frame, so as to avoid overlarge change of two adjacent frames, and therefore, an overlapping area is formed between the two adjacent frames. Each frame is substituted into a window function to eliminate signal discontinuities that may be caused across the frames. For each short-time analysis window, obtaining a corresponding amplitude spectrum through FFT, taking a square to obtain an energy spectrum of sound, then obtaining a Mel energy spectrum of the sound by utilizing a Mel filter bank, and then obtaining log nonlinear transformation of the Mel energy spectrum to obtain the final Mel energy spectrum characteristic;
S2, model training. The data set is divided into k equal parts using k-fold cross validation, one part being taken as test data and the remaining k-1 parts as training data. Because the data set is small, this embodiment additionally mixes the feature data with mixup data augmentation before model training to improve the generalization ability of the model. The training set is fed into the convolutional neural network for supervised training, the model performing best on the test data is saved, and the whole procedure is repeated k times in total, yielding k convolutional neural network models. During training, the convolution kernels and weights are initialized with the Glorot uniform initializer and the biases to all zeros; network parameters are updated with the Adam algorithm, and training stops and the trained convolutional neural network model is saved when the number of iterations reaches the preset limit or the recognition accuracy on the validation set has not improved for a long time.
Mixup data augmentation selects two feature samples at random, mixes them in proportion, and constructs a new training sample and a new label by linear interpolation; the sample and label are finally processed as:

x̃ = λ·x_i + (1 − λ)·x_j

ỹ = λ·y_i + (1 − λ)·y_j

where (x_i, y_i) and (x_j, y_j) are training sample pairs (training samples and their corresponding labels) from the original data set, and λ is a parameter drawn from a Beta distribution, λ ~ Beta(α, α). An illustrative sketch of the resulting training loop follows.
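As an illustration of how steps S1 and S2 might fit together (none of this code is from the patent), the following sketch uses scikit-learn's KFold and a hypothetical `build_model()` returning a compiled Keras classifier; for brevity, mixup is applied once to the whole training fold and the held-out fold doubles as the validation set:

```python
# Illustrative k-fold ensemble training loop. Assumptions (ours, not the
# patent's): build_model(), one-hot labels, batch size, patience, file names.
import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras

def train_ensemble(features, labels, k=5, alpha=0.2, epochs=100):
    """Train one CNN per fold; return the k best-on-fold models."""
    models = []
    for fold, (tr, te) in enumerate(
            KFold(n_splits=k, shuffle=True, random_state=0).split(features)):
        x_tr, y_tr = features[tr], labels[tr]
        x_te, y_te = features[te], labels[te]

        # mixup: pair each training sample with a random partner
        perm = np.random.permutation(len(x_tr))
        lam = np.random.beta(alpha, alpha, size=(len(x_tr), 1, 1, 1))
        x_mix = lam * x_tr + (1 - lam) * x_tr[perm]
        lam_y = lam.reshape(-1, 1)
        y_mix = lam_y * y_tr + (1 - lam_y) * y_tr[perm]

        model = build_model()  # hypothetical: compiled Keras CNN (sketched below)
        best = keras.callbacks.ModelCheckpoint(
            f"fold_{fold}.keras", monitor="val_accuracy", save_best_only=True)
        stop = keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=20)
        model.fit(x_mix, y_mix, validation_data=(x_te, y_te),
                  epochs=epochs, batch_size=32, callbacks=[best, stop])
        models.append(keras.models.load_model(f"fold_{fold}.keras"))
    return models
```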
S3, testing. The same feature extraction as in the training stage is applied to the sound sample under test to obtain its Mel energy spectrum features as the test sample; the test sample is input into the k trained convolutional neural network models, whose outputs are fed into a combination module; the combination module takes the mode of these outputs as the final output of the ensemble model, which is compared with the classes of the test set samples to compute the environmental sound recognition rate. FIG. 2 is a schematic diagram of the ensemble model prediction.
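A minimal sketch of the combination module's vote (illustrative; assumes the model list from the training sketch above, test features of shape (n, 40, 251, 1), and integer class labels):

```python
# Illustrative majority-vote combination (not from the patent).
import numpy as np

def ensemble_predict(models, x_test):
    """Each model votes with its argmax class; the per-sample mode is the output."""
    votes = np.stack([np.argmax(m.predict(x_test), axis=1) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def recognition_rate(models, x_test, y_true):
    """Fraction of test samples whose ensemble vote matches the true class."""
    return float(np.mean(ensemble_predict(models, x_test) == y_true))
```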
Specifically, this embodiment evaluates the convolutional neural network with the mixup method on the ESC-50 data set. ESC-50 contains 2000 natural environmental sound clips, each 5 seconds long with a sampling rate of 44.1 kHz. The data set covers 5 major categories (animal sounds, natural soundscapes, human non-speech sounds, indoor sounds, and urban outdoor sounds), each comprising 10 sound classes with 40 samples per class. The data set details are shown in Table 1.
TABLE 1 ambient Sound data set
(Table 1 is reproduced only as an image in the original publication and is not included here.)
The sound signal is framed with a Hann window, each frame containing 1764 sampling points; to preserve continuity between adjacent frames, a frame shift of 882 sampling points is used. The amplitude spectrum of the sound is obtained by FFT and squared to give the energy spectrum, which is converted to a Mel energy spectrum with a Mel filter bank. Finally, to strengthen the low-frequency representation of the sound and bring out feature information hidden in the low-frequency part, this embodiment applies a log nonlinear transformation to the Mel energy spectrum, yielding 2000 Mel energy spectrum features of size 40 × 251, of which 1600 form the training set and the remaining 400 the test set. The 1600 training samples are further divided into a training set and a validation set at a ratio of 4:1, the training set being used to train the models and the validation set to select the best model to save. An illustrative extraction sketch follows.
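For illustration (the library choice is ours, not the patent's), the same extraction can be written compactly with librosa, using the embodiment's parameters (44.1 kHz audio, 1764-sample Hann-windowed frames, 882-sample frame shift, 40 Mel bands):

```python
# Illustrative log-Mel extraction with librosa; the epsilon in the log is our
# own guard against log(0).
import librosa
import numpy as np

def extract_log_mel(path, sr=44100, n_fft=1764, hop=882, n_mels=40):
    y, _ = librosa.load(path, sr=sr)  # a 5 s ESC-50 clip -> 220500 samples
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop,
        window="hann", power=2.0, n_mels=n_mels)  # |FFT|^2 -> Mel filter bank
    return np.log(mel + 1e-10)  # log nonlinearity; shape approx. (40, 251)
```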
The convolutional neural network comprises six convolutional layers, four max-pooling layers, one global average pooling layer, and three fully connected layers, where: a max-pooling layer follows each of the first two convolutional layers, and a max-pooling layer follows every two of the last four convolutional layers; the global average pooling layer sits between the convolution/pooling stack and the fully connected layers. The six convolutional layers have 64, 128, 256, 512, and 512 convolution kernels, respectively; each kernel is 3 × 3 with a stride of 3, zero padding, and ReLU activation. The four max-pooling layers use 2 × 2 windows with zero padding. The first two fully connected layers each have 256 nodes with ReLU activation; the number of nodes in the last fully connected layer equals the number of sound classes, and since ESC-50 has 50 classes this layer has 50 nodes with softmax activation. Table 2 lists the specific model parameter settings, and an illustrative code sketch follows Table 2.
TABLE 2 model parameter settings
(Table 2 is reproduced only as an image in the original publication and is not included here.)
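A Keras sketch consistent with this description (illustrative, with two flagged assumptions: we use stride 1 rather than the stated stride of 3, since a stride of 3 combined with four poolings would collapse the 40 × 251 input, and we guess the filter sequence 64, 128, 256, 256, 512, 512 because the published text lists five values for six layers). This plays the role of the `build_model()` assumed in the training sketch above:

```python
# Illustrative Keras model for the described architecture; the stride and the
# sixth filter count are our assumptions (see the lead-in above).
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(40, 251, 1), n_classes=50):
    conv = dict(kernel_size=3, padding="same", activation="relu",
                kernel_initializer="glorot_uniform",  # Glorot uniform init
                bias_initializer="zeros")             # all-zero biases
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(64, **conv),  layers.MaxPooling2D(2, padding="same"),
        layers.Conv2D(128, **conv), layers.MaxPooling2D(2, padding="same"),
        layers.Conv2D(256, **conv), layers.Conv2D(256, **conv),
        layers.MaxPooling2D(2, padding="same"),
        layers.Conv2D(512, **conv), layers.Conv2D(512, **conv),
        layers.MaxPooling2D(2, padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",  # Adam parameter updates
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```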
The k of the k-fold cross validation used in training is set to 5, and after training the 5 models are combined for sound recognition. Table 3 compares the performance on ESC-50 of the ensemble-learning-based CNN proposed here with other methods. The invention achieves the best performance to date on the public ESC-50 environmental sound data set: compared with a single CNN model that also uses Mel spectrum feature extraction and mixup data augmentation, the recognition accuracy of the proposed ensemble CNN is 6.25% higher than Single CNN, and it is 13.1% higher than EnvNet-v2 with data augmentation.
TABLE 3 comparison of Performance of different ambient Sound identification methods
(Table 3 is reproduced only as an image in the original publication and is not included here.)
In conclusion, the method trains k models with k-fold cross validation and combines them for sound recognition, greatly enhancing the generalization ability of the models and effectively alleviating overfitting; for the case of a small data volume, mixup data augmentation further improves generalization by mixing the original samples.
The preferred embodiments of the invention have been described in detail above. It should be understood that those of ordinary skill in the art could devise numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning, or limited experimentation based on the prior art and the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims (6)

1. An environmental sound identification method based on ensemble learning and a convolutional neural network, characterized by comprising the following steps:
S1, feature extraction: the original audio is framed and windowed; for each short-time analysis window, the corresponding amplitude spectrum is obtained by FFT (fast Fourier transform) and squared to give the energy spectrum of the sound; the Mel energy spectrum is then obtained with a Mel filter bank and subjected to a log nonlinear transformation, yielding the final Mel energy spectrum features, which serve as the data set;
S2, model training: the data set is divided into k equal parts using k-fold cross validation, one part being taken as test data and the remaining k-1 parts as training data; the training data are then mixed by mixup data augmentation and used for model training, and the model performing best on the test data is saved; this operation is repeated k times in total, yielding k convolutional neural network models;
S3, sound testing: the same feature extraction as in step S1 is applied to the sound sample under test to obtain its Mel energy spectrum features as the test sample; the test sample is input into the k trained convolutional neural network models, whose outputs are fed into a combination module; the combination module takes the mode of these outputs as the final output of the ensemble model, which is compared with the classes of the test set samples to compute the environmental sound recognition rate.
2. The ensemble learning and convolutional neural network-based environmental sound recognition method of claim 1, wherein framing and windowing the original audio in step S1 specifically comprises: grouping every N sampling points of the audio data into an observation unit called a frame, keeping an overlapping region between adjacent frames, and multiplying each frame by a window function to eliminate the signal discontinuities that would otherwise arise at the frame boundaries.
3. The ensemble learning and convolutional neural network-based environmental sound recognition method of claim 1, wherein in each round of step S2 a different one of the k parts is selected as the test data, so that each of the k parts serves as test data exactly once, the remaining k-1 parts serving as training data.
4. The ensemble learning and convolutional neural network-based environmental sound recognition method of claim 1, wherein the mixup data augmentation of step S2 is specifically: two feature samples are selected at random and mixed in proportion, and a new training sample and a new label are constructed by linear interpolation; the sample and label are finally processed by the following formulas:

x̃ = λ·x_i + (1 − λ)·x_j

ỹ = λ·y_i + (1 − λ)·y_j

where (x_i, y_i) and (x_j, y_j) are training sample pairs from the original data set, i.e. training samples and their corresponding labels, and λ is a parameter drawn from a Beta distribution, λ ~ Beta(α, α).
5. The ensemble learning and convolutional neural network-based environmental sound recognition method of claim 1, wherein during the model training of step S2 the convolution kernels and weights are initialized with the Glorot uniform initializer and the biases are initialized to all zeros.
6. The ensemble learning and convolutional neural network-based environmental sound recognition method of claim 1, wherein during the model training of step S2 the network parameters are updated with the Adam algorithm, and when the number of network iterations reaches a preset limit or the recognition accuracy on the validation set no longer improves, training is stopped and the trained convolutional neural network model is saved.
CN202011020706.6A (filed 2020-09-25, priority date 2020-09-25, published as CN112216287A, status pending): Environmental sound identification method based on ensemble learning and convolutional neural network

Priority Applications (1)

Application number: CN202011020706.6A (published as CN112216287A); priority date: 2020-09-25; filing date: 2020-09-25; title: Environmental sound identification method based on ensemble learning and convolutional neural network


Publications (1)

Publication number: CN112216287A; publication date: 2021-01-12

Family

ID=74051077

Family Applications (1)

Application number: CN202011020706.6A (published as CN112216287A, status pending); title: Environmental sound identification method based on ensemble learning and convolutional neural network

Country Status (1)

Country: CN; publication: CN112216287A



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8831942B1 (en) * 2010-03-19 2014-09-09 Narus, Inc. System and method for pitch based gender identification with suspicious speaker detection
CN109215637A (en) * 2017-06-30 2019-01-15 三星Sds株式会社 Audio recognition method
CN109065030A (en) * 2018-08-01 2018-12-21 上海大学 Ambient sound recognition methods and system based on convolutional neural networks
CN110189769A (en) * 2019-05-23 2019-08-30 复钧智能科技(苏州)有限公司 Abnormal sound detection method based on multiple convolutional neural networks models couplings

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴佳; 陈森朋; 陈修云; 周瑞: "Model selection and hyperparameter optimization based on reinforcement learning" (in Chinese), Journal of University of Electronic Science and Technology of China, no. 02, 30 March 2020 (2020-03-30) *
苍岩; 罗顺元; 乔玉龙: "Classification of pig sounds based on deep neural networks" (in Chinese), Transactions of the Chinese Society of Agricultural Engineering, no. 09, 8 May 2020 (2020-05-08) *
陈维高; 朱卫纲; 唐晓婧; 贾鑫: "Application of stacked denoising autoencoders in waveform unit recognition" (in Chinese), Journal of Harbin Institute of Technology, no. 11, 4 May 2018 (2018-05-04) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560822A (en) * 2021-02-23 2021-03-26 江苏聆世科技有限公司 Road sound signal classification method based on convolutional neural network
CN112560822B (en) * 2021-02-23 2021-05-14 江苏聆世科技有限公司 Road sound signal classification method based on convolutional neural network
CN113628641A (en) * 2021-06-08 2021-11-09 广东工业大学 Method for checking mouth and nose breathing based on deep learning
CN113591733A (en) * 2021-08-04 2021-11-02 中国人民解放军国防科技大学 Underwater acoustic communication modulation mode classification identification method based on integrated neural network model
CN114912539A (en) * 2022-05-30 2022-08-16 吉林大学 Environmental sound classification method and system based on reinforcement learning
CN114912539B (en) * 2022-05-30 2024-07-09 吉林大学 Environmental sound classification method and system based on reinforcement learning


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination