CN112216287A - Environmental sound identification method based on ensemble learning and convolution neural network
Environmental sound identification method based on ensemble learning and convolution neural network
- Publication number
- CN112216287A (application CN202011020706.6A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- data
- convolutional neural
- sound
- training
- Prior art date
- 2020-09-25
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Abstract
The invention discloses an environmental sound identification method based on ensemble learning and a convolutional neural network, which comprises the following steps: S1, feature extraction: framing and windowing the original audio, obtaining the Mel energy spectrum of the sound with a Mel filter bank, and taking the resulting Mel energy spectrum features as the data set; S2, model training: training models on the data set with k-fold cross validation and mixup data enhancement to obtain k convolutional neural network models; and S3, sound testing: identifying the sound sample to be tested with the trained convolutional neural network models. The method trains k models by k-fold cross validation and combines them for sound recognition, which greatly enhances the generalization capability of the models and effectively alleviates overfitting; for the case of small data volume, mixup data enhancement mixes the original samples to further improve generalization.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an environmental sound identification method based on ensemble learning and a convolutional neural network.
Background
In the study of audio information, environmental sound identification is an important research field with great application potential in safety monitoring, medical monitoring, smart home, scene analysis, and similar areas. Compared with speech, environmental sound is noise-like and has a wide frequency spectrum, which makes its recognition more challenging.
Existing environmental sound recognition methods based on convolutional neural networks generally divide the available data into a training set and a test set, train a model on the training set until it converges, test the model on the test set during training, save the model that performs best on the test set, and finally perform environmental sound recognition with the saved convolutional neural network.
Existing identification methods, whether based on a convolutional neural network, on a combination of convolutional and recurrent neural networks, or on a Gaussian mixture model, all train a single model on existing environmental audio data to identify unknown environmental audio. Models trained in this way suffer from weak generalization capability and are prone to overfitting.
Disclosure of Invention
In view of the foregoing defects in the prior art, the technical problem to be solved by the present invention is to provide an environmental sound recognition method based on ensemble learning and a convolutional neural network, which trains k models by k-fold cross validation and combines the k models for sound recognition, greatly enhancing the generalization capability of the models and effectively alleviating overfitting; in view of the small data volume, mixup data enhancement mixes the original samples to further enhance generalization.
In order to achieve the above object, the present invention provides an environmental sound identification method based on ensemble learning and convolutional neural network, comprising the following steps:
S1, feature extraction: framing and windowing the original audio; for each short-time analysis window, obtaining the corresponding amplitude spectrum through FFT (fast Fourier transform) and squaring it to obtain the energy spectrum of the sound; then obtaining the Mel energy spectrum of the sound with a Mel filter bank; and applying a log nonlinear transformation to the Mel energy spectrum to obtain the final Mel energy spectrum features, which serve as the data set;
S2, model training: dividing the data set into k equal parts using k-fold cross validation, taking one part as test data and the remaining k-1 parts as training data; mixing the training data with mixup data enhancement and using it for model training; saving the model that performs best on the test data; and repeating this operation k times in total to obtain k convolutional neural network models;
and S3, sound testing: applying the same feature extraction steps as in step S1 to the sound sample to be tested to obtain its Mel energy spectrum features as a test sample; inputting the test sample into the k trained convolutional neural network models; feeding the outputs of the k models into a combination module, which takes the mode (majority vote) of the outputs as the final output of the ensemble model; and comparing the final output with the class corresponding to the test set sample to calculate the recognition rate of the environmental sound.
Further, the step S1 of framing and windowing the original audio specifically comprises: grouping N sampling points of the audio data into an observation unit called a frame, with an overlapping area between two adjacent frames, and multiplying each frame by a window function to eliminate the signal discontinuity that would otherwise arise at the two ends of each frame.
Further, in each run of step S2, a different part is selected from the k parts as test data, ensuring that each of the k parts serves as test data exactly once, with the remaining k-1 parts used as training data.
Further, the mixup data enhancement in step S2 is specifically: randomly selecting two feature samples, mixing them in proportion, and constructing a new training sample and a new label by linear interpolation according to the following formulas:

x̃ = λx_i + (1 − λ)x_j
ỹ = λy_i + (1 − λ)y_j

where (x_i, y_i) and (x_j, y_j) are training sample pairs in the original data set, i.e. training samples and their corresponding labels, and λ is a parameter drawn from a Beta distribution, λ ~ Beta(α, α).
Further, when the model training is performed in step S2, the convolution kernels and weights are initialized with Glorot uniform initialization, and the biases are initialized to all zeros.
Further, when the model training is performed in step S2, the Adam algorithm is used to update the network parameters; training stops and the trained convolutional neural network model is saved when the number of iterations reaches a preset limit or the recognition accuracy on the validation set no longer improves.
The invention has the beneficial effects that:
the method can train k models by utilizing k-fold cross validation and combine the k models to perform voice recognition, greatly enhances the generalization capability of the models, effectively relieves the phenomenon of overfitting, and uses mixup data to enhance the generalization capability of the models by mixing the original samples aiming at the condition of small data volume.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the integrated model prediction of the present invention.
Detailed Description
As shown in FIG. 1, the environmental sound identification method based on ensemble learning and a convolutional neural network includes the following steps:
s1, extracting characteristics, wherein for the convenience of speech analysis, N sampling points are firstly collected into an observation unit called a frame, so as to avoid overlarge change of two adjacent frames, and therefore, an overlapping area is formed between the two adjacent frames. Each frame is substituted into a window function to eliminate signal discontinuities that may be caused across the frames. For each short-time analysis window, obtaining a corresponding amplitude spectrum through FFT, taking a square to obtain an energy spectrum of sound, then obtaining a Mel energy spectrum of the sound by utilizing a Mel filter bank, and then obtaining log nonlinear transformation of the Mel energy spectrum to obtain the final Mel energy spectrum characteristic;
S2, model training. The data set is divided into k equal parts using k-fold cross validation, with one part as test data and the remaining k-1 parts as training data. Because the data volume of the data set is small, this embodiment mixes the feature data with mixup data enhancement before model training to improve the generalization capability of the model. The training set is fed into the convolutional neural network model for supervised training, the model that performs best on the test data is saved, and this operation is repeated k times in total to obtain k convolutional neural network models. During training, the convolution kernels and weights are initialized with Glorot uniform initialization and the biases are initialized to all zeros. Network parameters are updated with the Adam algorithm; training stops and the trained convolutional neural network model is saved when the number of iterations reaches the preset limit or the recognition accuracy on the validation set has not improved for a long time.
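The following is a minimal sketch of this training loop in Python, assuming scikit-learn for the fold split and Keras for the network; build_model stands in for the CNN described later in this embodiment, and the mixup α, epoch count, batch size, and patience are illustrative assumptions rather than values fixed by the text.

```python
# Hedged sketch of the k-fold ensemble training loop described above.
# Assumptions: scikit-learn KFold, Keras models, one-hot labels, and
# illustrative hyperparameters (alpha, epochs, batch size, patience).
import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras

def train_ensemble(features, labels, build_model, k=5, alpha=0.2):
    """Train k CNNs; each of the k folds serves once as the held-out data."""
    models = []
    kfold = KFold(n_splits=k, shuffle=True, random_state=0)
    for fold, (tr_idx, te_idx) in enumerate(kfold.split(features)):
        x_tr, y_tr = features[tr_idx], labels[tr_idx]
        x_te, y_te = features[te_idx], labels[te_idx]

        # mixup: blend each training sample with a randomly chosen partner
        lam = np.random.beta(alpha, alpha, size=len(x_tr))
        perm = np.random.permutation(len(x_tr))
        x_mix = (lam[:, None, None, None] * x_tr
                 + (1 - lam)[:, None, None, None] * x_tr[perm])
        y_mix = lam[:, None] * y_tr + (1 - lam)[:, None] * y_tr[perm]

        model = build_model()
        # Adam updates; Glorot uniform kernels and zero biases are Keras defaults
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])
        callbacks = [
            # save the model that performs best on the held-out fold
            keras.callbacks.ModelCheckpoint(f"cnn_fold{fold}.keras",
                                            monitor="val_accuracy",
                                            save_best_only=True),
            # stop when held-out accuracy no longer improves
            keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                          patience=15,
                                          restore_best_weights=True),
        ]
        model.fit(x_mix, y_mix, validation_data=(x_te, y_te),
                  epochs=100, batch_size=32, callbacks=callbacks)
        models.append(model)
    return models
```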
Mixup data enhancement refers to randomly selecting two feature samples, mixing them in proportion, and constructing a new training sample and a new label by linear interpolation according to the following formulas:

x̃ = λx_i + (1 − λ)x_j
ỹ = λy_i + (1 − λ)y_j

where (x_i, y_i) and (x_j, y_j) are training sample pairs (training samples and their corresponding labels) in the original data set, and λ is a parameter drawn from a Beta distribution, λ ~ Beta(α, α).
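A minimal sketch of this operation, assuming NumPy arrays and one-hot labels; the α value is an illustrative assumption:

```python
# Minimal mixup sketch (NumPy); one-hot labels and alpha=0.2 are assumptions.
import numpy as np

def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2):
    """Linearly interpolate two (sample, label) pairs per the formulas above."""
    lam = np.random.beta(alpha, alpha)        # lambda ~ Beta(alpha, alpha)
    x_new = lam * x_i + (1.0 - lam) * x_j     # mixed feature sample
    y_new = lam * y_i + (1.0 - lam) * y_j     # mixed (soft) label
    return x_new, y_new
```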
S3, testing: the same feature extraction steps as in the training stage are applied to the sound sample to be tested to obtain its Mel energy spectrum features as a test sample. The test sample is input into the k trained convolutional neural network models, the outputs of the k models are fed into a combination module, and the mode (majority vote) of the outputs is taken as the final output of the ensemble model. The final output is compared with the class corresponding to the test set sample, and the recognition rate of the environmental sound is calculated. FIG. 2 is a schematic diagram of the ensemble model prediction.
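A minimal sketch of the combination module follows, assuming Keras-style models whose predict method returns class probabilities; taking the most frequent vote implements the mode described above:

```python
# Hedged sketch of the combination module: each model votes for a class,
# and the mode (most frequent vote) is the ensemble's final output.
import numpy as np

def ensemble_predict(models, x_test):
    """Return the majority-vote class for each test sample."""
    votes = np.stack([np.argmax(m.predict(x_test), axis=1) for m in models])
    # mode along the model axis: the most frequent class per sample
    return np.array([np.bincount(col).argmax() for col in votes.T])
```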
Specifically, this embodiment uses the convolutional neural network and the mixup method for a performance test on the ESC-50 data set. The ESC-50 data set contains 2000 natural environment sound clips, each 5 seconds long with a sampling rate of 44.1 kHz. The data set covers 5 major categories, namely animal sounds, natural environment sounds, human non-speech sounds, indoor sounds, and urban outdoor sounds; each major category comprises 10 sound classes, and each class has 40 samples. The data set details are shown in Table 1.
TABLE 1 Environmental sound data set
The sound signal is framed with a Hann window; each frame contains 1764 sampling points, and to preserve continuity between adjacent frames the frame shift is 882 sampling points. The amplitude spectrum of the sound is obtained by FFT and squared to obtain the energy spectrum, which is then converted into a Mel energy spectrum with a Mel filter bank. Finally, to strengthen the low-frequency representation of the sound and bring out the feature information hidden in the low-frequency part, this embodiment applies a log nonlinear transformation to the Mel energy spectrum, yielding 2000 Mel energy spectrum features of size 40 × 251, of which 1600 form the training set and the remaining 400 the test set. The 1600 training samples are further divided into a training set and a validation set at a ratio of 4:1, where the training set is used to train the models and the validation set is used to save the best models.
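This pipeline can be sketched with librosa (a library choice that is an assumption here, not named by the embodiment); the Hann window, 1764-sample frames, 882-sample frame shift, and log-Mel output follow the text, while the 40 Mel bands are inferred from the 40 × 251 feature size:

```python
# Hedged sketch of the feature extraction; librosa is an assumed library choice.
import librosa

def log_mel_features(path, sr=44100):
    """Return a 40 x 251 log-Mel energy spectrum for a 5 s clip at 44.1 kHz."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr,
        n_fft=1764,        # 1764 sampling points per frame
        hop_length=882,    # 882-sample frame shift (50% overlap)
        window="hann",     # Hann window
        n_mels=40,         # 40 Mel bands -> 40 x 251 features for 5 s audio
        power=2.0)         # energy (squared-amplitude) spectrum
    return librosa.power_to_db(mel)  # log nonlinear transformation
```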
The convolutional neural network comprises six convolutional layers, four max pooling layers, one global average pooling layer, and three fully connected layers. A max pooling layer follows each of the first two convolutional layers, and a max pooling layer follows every two of the last four convolutional layers; the global average pooling layer lies between the convolution/pooling stack and the fully connected layers. The numbers of convolution kernels of the six convolutional layers are 64, 128, 256, 512 and 512, respectively; the convolution kernel size is 3×3, the stride is 3, the padding mode is zero padding, and the activation function is ReLU. The pooling window size of the four max pooling layers is 2×2, with zero padding. The first two fully connected layers each have 256 nodes with ReLU activation; the number of nodes of the last fully connected layer equals the number of sound classes, and since ESC-50 has 50 sound classes this layer has 50 nodes with softmax activation. Table 2 lists the specific model parameter settings.
TABLE 2 Model parameter settings
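The architecture can be sketched in Keras as follows (the framework choice is an assumption). Two details are read with caution: the text lists five kernel counts for six convolutional layers, so the repeated 64 below is a guess, and the stated stride of 3 would shrink the 40 × 251 input below a valid size under four pooling stages, so a stride of 1 is assumed here:

```python
# Hedged Keras sketch of the described CNN; the repeated 64 and the stride of 1
# are assumptions (see the note above), not values fixed by the text.
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(40, 251, 1), n_classes=50):
    def conv(t, filters):
        # 3x3 kernels, zero ('same') padding, ReLU activation per the text;
        # Glorot uniform kernels and zero biases are the Keras defaults
        return layers.Conv2D(filters, 3, padding="same", activation="relu")(t)

    inp = keras.Input(shape=input_shape)
    x = conv(inp, 64)
    x = layers.MaxPooling2D(2)(x)   # pooling after each of the first two convs
    x = conv(x, 64)                 # sixth kernel count missing in source (assumed)
    x = layers.MaxPooling2D(2)(x)
    x = conv(x, 128)
    x = conv(x, 256)
    x = layers.MaxPooling2D(2)(x)   # pooling after every two of the last four convs
    x = conv(x, 512)
    x = conv(x, 512)
    x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)  # 50 ESC-50 classes
    return keras.Model(inp, out)
```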
The k of the k-fold cross validation used in training is set to 5, and after training is completed the 5 models are combined for sound recognition. Table 3 compares the performance on ESC-50 of the ensemble-learning CNN provided by this method with other methods. The invention achieves the best performance to date on the public ESC-50 environmental sound data set: compared with a single CNN that also uses Mel spectrum feature extraction and mixup data enhancement, the recognition accuracy of the proposed ensemble CNN model is 6.25% higher than that of the single CNN, and 13.1% higher than that of EnvNet-v2 with data enhancement.
TABLE 3 Performance comparison of different environmental sound identification methods
In conclusion, the method trains k models by k-fold cross validation and combines them for sound recognition, which greatly enhances the generalization capability of the models and effectively alleviates overfitting; for the case of small data volume, mixup data enhancement mixes the original samples to further improve generalization.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (6)
1. The environmental sound identification method based on the ensemble learning and the convolutional neural network is characterized by comprising the following steps of:
S1, feature extraction: framing and windowing the original audio; for each short-time analysis window, obtaining the corresponding amplitude spectrum through FFT (fast Fourier transform) and squaring it to obtain the energy spectrum of the sound; then obtaining the Mel energy spectrum of the sound with a Mel filter bank; and applying a log nonlinear transformation to the Mel energy spectrum to obtain the final Mel energy spectrum features, which serve as the data set;
S2, model training: dividing the data set into k equal parts using k-fold cross validation, taking one part as test data and the remaining k-1 parts as training data; mixing the training data with mixup data enhancement and using it for model training; saving the model that performs best on the test data; and repeating this operation k times in total to obtain k convolutional neural network models;
and S3, sound testing: applying the same feature extraction steps as in step S1 to the sound sample to be tested to obtain its Mel energy spectrum features as a test sample; inputting the test sample into the k trained convolutional neural network models; feeding the outputs of the k models into a combination module, which takes the mode (majority vote) of the outputs as the final output of the ensemble model; and comparing the final output with the class corresponding to the test set sample to calculate the recognition rate of the environmental sound.
2. The ensemble learning and convolutional neural network-based ambient sound recognition method of claim 1, wherein: the step S1 of framing and windowing the original audio specifically comprises: grouping N sampling points of the audio data into an observation unit called a frame, with an overlapping area between two adjacent frames, and multiplying each frame by a window function to eliminate the signal discontinuity that would otherwise arise at the two ends of each frame.
3. The ensemble learning and convolutional neural network-based ambient sound recognition method of claim 1, wherein: in each run of step S2, a different part is selected from the k parts as test data, ensuring that each of the k parts serves as test data exactly once, with the remaining k-1 parts used as training data.
4. The ensemble learning and convolutional neural network-based ambient sound recognition method of claim 1, wherein: the mixup data enhancement in step S2 specifically comprises: randomly selecting two feature samples, mixing them in proportion, and constructing a new training sample and a new label by linear interpolation according to the following formulas:

x̃ = λx_i + (1 − λ)x_j
ỹ = λy_i + (1 − λ)y_j

where (x_i, y_i) and (x_j, y_j) are training sample pairs in the original data set, i.e. training samples and their corresponding labels, and λ is a parameter drawn from a Beta distribution, λ ~ Beta(α, α).
5. The ensemble learning and convolutional neural network-based ambient sound recognition method of claim 1, wherein: when the model training is performed in step S2, the convolution kernels and weights are initialized with Glorot uniform initialization, and the biases are initialized to all zeros.
6. The ensemble learning and convolutional neural network-based ambient sound recognition method of claim 1, wherein: when model training is performed in step S2, the Adam algorithm is used to update the network parameters; training stops and the trained convolutional neural network model is saved when the number of iterations reaches a preset limit or the recognition accuracy on the validation set no longer improves.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011020706.6A CN112216287A (en) | 2020-09-25 | 2020-09-25 | Environmental sound identification method based on ensemble learning and convolution neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011020706.6A CN112216287A (en) | 2020-09-25 | 2020-09-25 | Environmental sound identification method based on ensemble learning and convolution neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112216287A true CN112216287A (en) | 2021-01-12 |
Family
ID=74051077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011020706.6A Pending CN112216287A (en) | 2020-09-25 | 2020-09-25 | Environmental sound identification method based on ensemble learning and convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112216287A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8831942B1 (en) * | 2010-03-19 | 2014-09-09 | Narus, Inc. | System and method for pitch based gender identification with suspicious speaker detection |
CN109215637A (en) * | 2017-06-30 | 2019-01-15 | 三星Sds株式会社 | Audio recognition method |
CN109065030A (en) * | 2018-08-01 | 2018-12-21 | 上海大学 | Ambient sound recognition methods and system based on convolutional neural networks |
CN110189769A (en) * | 2019-05-23 | 2019-08-30 | 复钧智能科技(苏州)有限公司 | Abnormal sound detection method based on multiple convolutional neural networks models couplings |
Non-Patent Citations (3)
Title |
---|
Wu Jia; Chen Senpeng; Chen Xiuyun; Zhou Rui: "Model selection and hyperparameter optimization based on reinforcement learning", Journal of University of Electronic Science and Technology of China, no. 02, 30 March 2020 (2020-03-30) *
Cang Yan; Luo Shunyuan; Qiao Yulong: "Pig sound classification based on deep neural networks", Transactions of the Chinese Society of Agricultural Engineering, no. 09, 8 May 2020 (2020-05-08) *
Chen Weigao; Zhu Weigang; Tang Xiaojing; Jia Xin: "Application of stacked denoising autoencoders in waveform unit recognition", Journal of Harbin Institute of Technology, no. 11, 4 May 2018 (2018-05-04) *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560822A (en) * | 2021-02-23 | 2021-03-26 | 江苏聆世科技有限公司 | Road sound signal classification method based on convolutional neural network |
CN112560822B (en) * | 2021-02-23 | 2021-05-14 | 江苏聆世科技有限公司 | Road sound signal classification method based on convolutional neural network |
CN113628641A (en) * | 2021-06-08 | 2021-11-09 | 广东工业大学 | Method for checking mouth and nose breathing based on deep learning |
CN113591733A (en) * | 2021-08-04 | 2021-11-02 | 中国人民解放军国防科技大学 | Underwater acoustic communication modulation mode classification identification method based on integrated neural network model |
CN114912539A (en) * | 2022-05-30 | 2022-08-16 | 吉林大学 | Environmental sound classification method and system based on reinforcement learning |
CN114912539B (en) * | 2022-05-30 | 2024-07-09 | 吉林大学 | Environmental sound classification method and system based on reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||