CN113851115A - Complex sound identification method based on one-dimensional convolutional neural network - Google Patents
- Publication number: CN113851115A
- Application number: CN202111044338.3A
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- dimensional
- layer
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
The invention discloses a complex sound identification method based on a one-dimensional convolutional neural network. A random completion algorithm is adopted to process complex sounds, padding the original data to a uniform length for input to the one-dimensional convolutional neural network. A pre-emphasis module and a simplified attention mechanism module are embedded in the basic framework of the one-dimensional convolutional neural network: the pre-emphasis module is placed at the input of the network, pre-emphasizes the input data, and participates in network model tuning; the simplified attention mechanism module is placed in the deep layers of the network and obtains attention-weighted global features using a global average pooling function and a sigmoid function. The method optimizes the network model and achieves a good identification effect.
Description
Technical Field
The invention belongs to the technical field of audio processing, relates to a complex sound identification technology, and particularly relates to a complex sound identification method based on a one-dimensional convolutional neural network.
Background
Complex sound refers to non-speech sound in the environment. Its sources are complex and varied, the signal itself is non-stationary, and it is often accompanied by strongly interfering background noise, so the acoustic features of different sound scenes are either insufficiently distinct or highly similar. Complex sound identification automatically recognizes the specific types of complex sounds in the environment, such as children playing, car horns, or street music. Fields such as speech classification and music classification have achieved very high accuracy, but because of the non-stationarity of the signal, speech and music classification schemes are clearly unsuitable for complex sound recognition, so an effective recognition model for complex sounds is needed.
At present, methods that combine neural networks to solve the complex sound classification problem fall into three main categories according to the input data: raw signals, hand-crafted features, and multiple input types. The first trains the network directly on the raw signal; its advantages are that no manual feature extraction is needed, the processing pipeline is greatly simplified, and the model is simple and easy to deploy. The second processes the raw data and manually extracts features of the sound signal, such as spectrograms and Mel-frequency cepstral coefficients. The third is a multi-input composite network that takes both the raw sound signal and the manually extracted features as input; its advantage is that it combines the original (time-domain) features with frequency-domain features, compensating for the insufficiency of a single data representation, but the model is complex, places high demands on the hardware platform, and is inconvenient to apply.
Deep learning models based on raw audio signals have been used by many researchers for complex sound recognition; for example, Dai et al. proposed a complex sound recognition model based on a one-dimensional convolutional neural network that achieves good recognition accuracy. However, deep learning models have difficulty extracting effective features from the raw signal, and existing models are complex and need further optimization, so solving the complex sound problem from the raw audio signal remains a major challenge. To achieve a good recognition effect, the following problems remain in existing schemes:
(1) The problem of inconsistent raw data lengths
In actual data processing, the audio durations within a data set (such as the UrbanSound8K data set, or data collected in a real environment) are often inconsistent, while a one-dimensional convolutional neural network requires a fixed input length, so data padding is needed. Common padding methods include cubic spline interpolation and zero padding. Many audio clips in a data set differ greatly from the target length; for example, the actual duration may be 1 second while the target length is 4 seconds. Cubic spline interpolation is clearly unsuitable in this case, while zero padding is too simple: much information is lost, and the padded zeros may mask valid information. The invention therefore proposes a random completion algorithm that pads the data from the original data itself, enriching the data features while filling.
(2) Attention mechanism
The attention mechanism lets the model focus on useful information and can further improve model performance. The invention proposes a simplified attention mechanism for one-dimensional convolutional neural networks, which obtains an attention feature vector by weighting the global features and multiplying the weights by the original feature vector.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a complex sound recognition method based on a one-dimensional convolutional neural network. First, a random completion algorithm is proposed that pads raw audio data of uneven length to the same length for input to the network model; then the network model is optimized by introducing a pre-emphasis technique and a simplified attention mechanism into the neural network for training; finally, a complex sound recognition model is constructed.
In order to solve the technical problems, the invention adopts the technical scheme that:
A complex sound identification method based on a one-dimensional convolutional neural network adopts a random completion algorithm to process complex sounds, padding the original data to a uniform length for input to the one-dimensional convolutional neural network. A pre-emphasis module and a simplified attention mechanism module are embedded in the basic framework of the one-dimensional convolutional neural network: the pre-emphasis module is placed at the input of the network, pre-emphasizes the input data, and participates in network model tuning; the simplified attention mechanism module is placed in the deep layers of the network and obtains attention-weighted global features using a global average pooling function and a sigmoid function.
Further, the detailed steps of the complex sound recognition method based on the one-dimensional convolutional neural network are as follows:
firstly, raw data processing: the raw data are padded with the random completion algorithm to obtain clipped, randomly completed raw audio of uniform length, which serves as input data for the one-dimensional convolutional neural network;
secondly, pre-emphasis: the input data are pre-emphasized by a pre-emphasis module and then processed through a convolutional layer;
and thirdly, one-dimensional convolutional neural network: a feature vector is obtained through one-dimensional convolutional neural network processing; the network structure uses two convolutional layers with the same number of channels followed by a pooling layer, stacked three times, for a total of six convolutional layers;
fourthly, attention mechanism: the feature vector is input into the simplified attention mechanism module to obtain the features with attention;
fifthly, output classification: finally, the recognition result is output through a two-layer fully connected structure and a softmax classification function.
Further, the random completion algorithm specifically comprises the following steps:
(1) dividing all samples into two categories of more than or equal to N/2 seconds and less than N/2 seconds, wherein the target length of the samples is N seconds;
for samples of at least N/2 seconds, randomly select a starting point from which a single excerpt can complete the sample to N seconds, excerpt the required length from that point, and append the excerpted audio segment to the end of the original audio to complete the padding;
(2) for samples shorter than N/2 seconds, copy the whole sample repeatedly until its length reaches at least N seconds, then clip it to an N-second sample.
Furthermore, the pre-emphasis module has a two-layer convolution structure: the convolution kernel of the first layer is initialized to [-0.97, 1], and the convolution kernel of the second layer is initialized to 1 so that the pre-emphasis coefficient can be further adjusted during training.
Further, the number of convolution kernels of each layer of the pre-emphasis module is set to be 1.
Further, in the simplified attention mechanism, global average pooling first compresses the features into a one-dimensional feature whose length equals the number of channels, yielding the global features of the model; these features are then input to a sigmoid function to obtain a weight for each channel; finally, the weights are multiplied by the one-dimensional features from the original global average pooling to obtain new global features, which are the features with attention;
the expression for the attention mechanism is as follows:
F_O = W ⊗ GAP(F), W = sigmoid(GAP(F))    formula (1)
wherein F is the deep output feature of the one-dimensional convolutional neural network, GAP(·) denotes global average pooling, W is the weight vector, and F_O is the global feature with attention.
Compared with the prior art, the invention has the advantages that:
(1) The random completion method designed by the invention pads raw data of uneven length to the same length, which facilitates network input. It makes up for the simplicity of zero padding by completing the original data from the original data itself, preserving characteristics such as the time sequence of the original data to the greatest extent, providing more useful features, and contributing significantly to improved classification performance.
(2) The pre-emphasis module designed by the invention integrates the pre-emphasis technique into the convolutional neural network via the convolution operation of the convolutional layer. Adding a convolutional layer with kernel initial value 1 and kernel length 1 provides a buffer for the preceding pre-emphasis layer, allows the network to be further fine-tuned, lightens the tuning burden of the subsequent one-dimensional convolutional neural network, and improves performance.
(3) The simplified attention mechanism designed by the invention obtains attention-weighted global features using global average pooling and sigmoid functions, which benefits model classification.
(4) Combining the above key points, an end-to-end complex sound identification model based on the one-dimensional convolutional neural network is constructed; the model captures the features of the original complex sound well and obtains a good identification effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of complex sound recognition according to the present invention;
FIG. 2 is a comparison of the original data after the random completion method and the zero padding method of the present invention;
FIG. 3 is a diagram of a pre-emphasis module of the present invention;
FIG. 4 is a simplified attention mechanism model architecture diagram of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
The embodiment provides a complex sound identification method based on a one-dimensional convolutional neural network, which comprises the following two aspects. On one hand, a random completion algorithm is adopted to process complex sound, padding the original data to a uniform length for input to the one-dimensional convolutional neural network. On the other hand, the network model structure is optimized: a pre-emphasis module and a simplified attention mechanism module are embedded in the basic framework of the one-dimensional convolutional neural network. The pre-emphasis module is placed at the input of the network, pre-emphasizes the input data, and participates in network model tuning; the simplified attention mechanism module is placed in the deep layers of the network and obtains attention-weighted global features using a global average pooling function and a sigmoid function.
In conjunction with the complex sound recognition flowchart shown in fig. 1, the detailed steps are as follows:
firstly, raw data processing: the raw data are padded with the random completion algorithm to obtain clipped, randomly completed raw audio of uniform length, which serves as the input data of the one-dimensional convolutional neural network.
The random completion algorithm comprises the following specific steps:
(1) assuming a target sample length of 4 seconds, divide all samples into two categories: at least 2 seconds, and less than 2 seconds;
for samples of at least 2 seconds, randomly select a starting point from which a single excerpt can complete the sample to 4 seconds, excerpt the required length from that point, and append the excerpted audio segment to the end of the original audio to complete the padding;
(2) for samples shorter than 2 seconds, copy the whole sample repeatedly until its length reaches at least 4 seconds, then clip it to a 4-second sample.
A comparison of the raw data after the random completion method and the zero-padding method is shown in fig. 2. The pseudo code for the random completion algorithm is as follows.
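The pseudocode referenced above is not reproduced in this text. As a hedged sketch, the random completion algorithm described in the steps above could be implemented as follows (the function and variable names are illustrative, not from the patent):

```python
import random

def random_completion(samples, target_len):
    """Pad a 1-D audio sequence (list of floats) to exactly target_len points.

    Samples at least half the target length are completed by appending a
    single, randomly positioned excerpt of the original audio; shorter
    samples are tiled by whole copies and then clipped.
    """
    n = len(samples)
    if n >= target_len:
        # Already long enough: clip to the target length.
        return samples[:target_len]
    if n >= target_len // 2:
        # One excerpt suffices: choose a random start so the excerpt fits.
        need = target_len - n
        start = random.randint(0, n - need)
        return samples + samples[start:start + need]
    # Too short for a single excerpt: copy the whole sample, then clip.
    out = list(samples)
    while len(out) < target_len:
        out += samples
    return out[:target_len]
```

For example, with 16 kHz audio and a 4-second target, target_len would be 64000 points.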
Secondly, pre-emphasis: the input data is pre-emphasized through a pre-emphasis module, and then is processed through a convolution layer with a large convolution kernel.
The pre-emphasis module has a two-layer convolution structure: the convolution kernel of the first layer is initialized to [-0.97, 1], and the kernel of the second layer is initialized to 1 so that the pre-emphasis coefficient can be adjusted further. Fig. 3 is a diagram of the pre-emphasis module. Its aim is to pre-emphasize the input data, not to extract features, so the number of convolution kernels in each layer is set to 1. During model learning, the pre-emphasis module also participates in network tuning.
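As an illustration, a length-2 convolution with kernel [-0.97, 1] computes the classic pre-emphasis filter y[n] = x[n] - 0.97·x[n-1]. A minimal NumPy sketch follows; the function name and the length-preserving treatment of the first sample are assumptions for illustration, not specified by the patent:

```python
import numpy as np

def pre_emphasis(x, kernel=(-0.97, 1.0)):
    """Pre-emphasis as a 1-D convolution: y[n] = x[n] - 0.97*x[n-1].

    In the patented model this kernel is only the *initial* value of a
    trainable convolutional layer; a second 1x1 convolution initialized
    to 1 then lets training adjust the effective coefficient.
    """
    x = np.asarray(x, dtype=np.float64)
    # 'valid' part of the convolution over pairs (x[n-1], x[n]):
    y = kernel[1] * x[1:] + kernel[0] * x[:-1]
    # Pass the first sample through unchanged to preserve the length.
    return np.concatenate(([x[0]], y))
```

On a constant signal the filter leaves only a small residue (1 - 0.97 = 0.03 per sample), which is the expected high-pass behavior of pre-emphasis.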
And thirdly, one-dimensional convolutional neural network: a feature vector is obtained through one-dimensional convolutional neural network processing; the network structure uses two convolutional layers with the same number of channels followed by a pooling layer, stacked three times, for a total of six convolutional layers.
Fourthly, attention mechanism: the feature vectors are input into a simplified attention mechanism module to obtain attention-bearing features.
As shown in fig. 4, the attention mechanism is placed in a deep layer of the one-dimensional convolutional neural network. First, Global Average Pooling (GAP) compresses the features into a one-dimensional feature whose length equals the number of channels; this step yields the global features of the model. The features are then input to a sigmoid function to obtain a weight for each channel. Finally, the weights are multiplied by the one-dimensional features from the original GAP to obtain new global features, which are the features with attention.
The expression for the attention mechanism is as follows:
F_O = W ⊗ GAP(F), W = sigmoid(GAP(F))    formula (1)
wherein F is the deep output feature of the one-dimensional convolutional neural network, GAP(·) denotes global average pooling, W is the weight vector, and F_O is the global feature with attention.
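Assuming the computation reads F_O = sigmoid(GAP(F)) multiplied element-wise by GAP(F), as the description suggests, the simplified attention can be sketched in NumPy (names are illustrative, not from the patent):

```python
import numpy as np

def simplified_attention(F):
    """Simplified channel attention for a 1-D CNN.

    F: feature map of shape (channels, time), the deep output of the network.
    Returns an attention-weighted global feature vector of length `channels`.
    """
    gap = F.mean(axis=1)             # global average pooling -> (channels,)
    w = 1.0 / (1.0 + np.exp(-gap))   # sigmoid -> per-channel weights
    return w * gap                   # element-wise product: features with attention
```

Note that, unlike squeeze-and-excitation attention, the weights here rescale the pooled global features directly rather than the full feature map, which is what makes the mechanism "simplified".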
Fifthly, output classification: and finally, outputting a final recognition result through a two-layer full connection structure and a softmax classification function.
With reference to fig. 1, the invention integrates the random completion algorithm, the pre-emphasis module, and the simplified attention mechanism into a complex sound recognition model. The input audio data are randomly completed to obtain clipped, completed raw audio used as network input; the pre-emphasis module then pre-emphasizes the input data, which next passes through a convolutional layer with a large kernel; the conventional one-dimensional convolutional structure then uses two convolutional layers with the same number of channels plus a pooling layer, stacked three times for a total of six convolutional layers. In addition, the first three layers further enlarge the model's receptive field using dilated convolutions with dilation coefficients of 2, 3, and 4, respectively. The feature vector is then input to the simplified attention mechanism module to obtain the features with attention, and finally the recognition result is output through a two-layer fully connected structure and a softmax classification function.
The model structure and parameters are shown in Table 1, taking a sample rate of 16 kHz and a sample length of 4 seconds as an example.
TABLE 1 model Structure and parameter Table
Experimental configuration and results:
1. loss function
The model uses a classical cross-entropy loss function, the formula is as follows:
H(p, q) = -∑_x p(x) log q(x)    equation (2)
Wherein p represents the distribution of the real samples, and q is the sample prediction distribution of the trained model.
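For a one-hot true distribution, equation (2) reduces to the negative log-probability that the model assigns to the true class. A small NumPy sketch (the epsilon for numerical safety is an assumption for illustration, not from the patent):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log(q(x)).

    p: true sample distribution (e.g. a one-hot label vector).
    q: predicted distribution from the trained model.
    """
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(-np.sum(p * np.log(q + eps)))
```

For example, cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]) equals -log(0.8) ≈ 0.223; the loss grows as the predicted probability of the true class shrinks.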
2. Optimization algorithm
The optimizer uses stochastic gradient descent with a momentum of 0.9; the velocity is updated as follows:
v_t = γ·v_{t-1} + lr·grad    formula (3)
where v is the velocity, γ is the momentum coefficient (typically set to 0.9), lr is the learning rate, and grad is the gradient.
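One full parameter update under this scheme can be sketched as follows; the parameter step param - v_t is standard SGD-with-momentum practice and is assumed here, since the patent only states the velocity update of formula (3):

```python
def sgd_momentum_step(param, grad, v, lr=0.01, gamma=0.9):
    """One SGD-with-momentum step.

    Velocity: v_t = gamma * v_{t-1} + lr * grad  (formula (3)).
    The parameter is then moved against the velocity: param - v_t.
    Returns the updated (param, v) pair.
    """
    v = gamma * v + lr * grad
    return param - v, v
```

Starting from v = 0 with lr = 0.1 and gamma = 0.9, a gradient of 2.0 yields v = 0.2, and subsequent steps accumulate velocity, which is what accelerates descent along consistent gradient directions.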
3. Learning rate
The learning rate decays in discrete steps; the specific parameters are as follows:
The batch size is set to 64 during model training, with 200 training epochs.
4. Results of the experiment
The recognition accuracy of the complex sound recognition model based on the one-dimensional convolutional neural network reaches 84.4%, 73.8%, and 88.6% on the ESC10, ESC50, and UrbanSound8K data sets, respectively, demonstrating that the method is effective.
It is understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.
Claims (6)
1. A complex sound identification method based on a one-dimensional convolutional neural network, characterized in that a random completion algorithm is adopted to process complex sounds, padding the original data to a uniform length for input to the one-dimensional convolutional neural network; a pre-emphasis module and a simplified attention mechanism module are embedded in the basic framework of the one-dimensional convolutional neural network, wherein the pre-emphasis module is placed at the input of the network, pre-emphasizes the input data, and participates in network model tuning, and the simplified attention mechanism module is placed in the deep layers of the network and obtains attention-weighted global features using a global average pooling function and a sigmoid function.
2. The complex sound recognition method based on the one-dimensional convolutional neural network of claim 1, wherein the detailed steps are as follows:
firstly, raw data processing: the raw data are padded with the random completion algorithm to obtain clipped, randomly completed raw audio of uniform length, which serves as input data for the one-dimensional convolutional neural network;
secondly, pre-emphasis: the input data are pre-emphasized by a pre-emphasis module and then processed through a convolutional layer;
and thirdly, one-dimensional convolutional neural network: a feature vector is obtained through one-dimensional convolutional neural network processing; the network structure uses two convolutional layers with the same number of channels followed by a pooling layer, stacked three times, for a total of six convolutional layers;
fourthly, attention mechanism: the feature vector is input into the simplified attention mechanism module to obtain the features with attention;
fifthly, output classification: finally, the recognition result is output through a two-layer fully connected structure and a softmax classification function.
3. The complex sound recognition method based on the one-dimensional convolutional neural network of claim 2, wherein the specific steps of the random completion algorithm are as follows:
(1) dividing all samples into two categories of more than or equal to N/2 seconds and less than N/2 seconds, wherein the target length of the samples is N seconds;
for samples of at least N/2 seconds, randomly select a starting point from which a single excerpt can complete the sample to N seconds, excerpt the required length from that point, and append the excerpted audio segment to the end of the original audio to complete the padding;
(2) for samples shorter than N/2 seconds, copy the whole sample repeatedly until its length reaches at least N seconds, then clip it to an N-second sample.
4. The method of claim 2, wherein the pre-emphasis module has a two-layer convolution structure, the convolution kernel of the first layer is initialized to [-0.97, 1], and the convolution kernel of the second layer is initialized to 1 so that the pre-emphasis coefficient can be further adjusted.
5. The method of claim 4, wherein the number of convolution kernels of each layer of the pre-emphasis module is set to 1.
6. The complex sound identification method based on the one-dimensional convolutional neural network as claimed in claim 2, wherein in the simplified attention mechanism, global average pooling first compresses the features into a one-dimensional feature whose length equals the number of channels, yielding the global features of the model; the features are then input to a sigmoid function to obtain a weight for each channel; finally, the weights are multiplied by the one-dimensional features from the original global average pooling to obtain new global features, which are the features with attention;
the expression for the attention mechanism is as follows:
F_O = W ⊗ GAP(F), W = sigmoid(GAP(F))    formula (1)
wherein F is the deep output feature of the one-dimensional convolutional neural network, GAP(·) denotes global average pooling, W is the weight vector, and F_O is the global feature with attention.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111044338.3A CN113851115A (en) | 2021-09-07 | 2021-09-07 | Complex sound identification method based on one-dimensional convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111044338.3A CN113851115A (en) | 2021-09-07 | 2021-09-07 | Complex sound identification method based on one-dimensional convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113851115A (en) | 2021-12-28
Family
ID=78973314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111044338.3A Pending CN113851115A (en) | 2021-09-07 | 2021-09-07 | Complex sound identification method based on one-dimensional convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113851115A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160093293A1 (en) * | 2014-09-26 | 2016-03-31 | Samsung Electronics Co., Ltd. | Method and device for preprocessing speech signal |
CN110047506A (en) * | 2019-04-19 | 2019-07-23 | 杭州电子科技大学 | A kind of crucial audio-frequency detection based on convolutional neural networks and Multiple Kernel Learning SVM |
CN110070888A (en) * | 2019-05-07 | 2019-07-30 | 颐保医疗科技(上海)有限公司 | A kind of Parkinson's audio recognition method based on convolutional neural networks |
CN111160438A (en) * | 2019-12-24 | 2020-05-15 | 浙江大学 | Acoustic garbage classification method adopting one-dimensional convolutional neural network |
CN112199548A (en) * | 2020-09-28 | 2021-01-08 | 华南理工大学 | Music audio classification method based on convolution cyclic neural network |
CN112863550A (en) * | 2021-03-01 | 2021-05-28 | 德鲁动力科技(成都)有限公司 | Crying detection method and system based on attention residual learning |
US20210256386A1 (en) * | 2020-02-13 | 2021-08-19 | Soundhound, Inc. | Neural acoustic model |
2021-09-07: CN application CN202111044338.3A filed; published as CN113851115A (en); status: Pending
Non-Patent Citations (3)
Title |
---|
POOI SHIANG TAN: "Acoustic Event Detection with MobileNet and 1D Convolutional Neural Network", 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology, 31 December 2020 (2020-12-31), pages 1-6 * |
XIFENG DONG: "Environment Sound Event Classification With a Two-Stream Convolutional Neural Network", IEEE Access, 31 July 2020 (2020-07-31), pages 125714-125721, XP011799543, DOI: 10.1109/ACCESS.2020.3007906 * |
LIU Hang; WANG Xili: "Attention Mechanism-based Remote Sensing Image Segmentation Model" (基于注意力机制的遥感图像分割模型), Laser & Optoelectronics Progress, no. 04, 31 December 2020 (2020-12-31) * |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110867181B (en) | Multi-target speech enhancement method based on SCNN and TCNN joint estimation | |
CN111564160B (en) | Voice noise reduction method based on AEWGAN | |
US7848924B2 (en) | Method, apparatus and computer program product for providing voice conversion using temporal dynamic features | |
KR101420557B1 (en) | Parametric speech synthesis method and system | |
CN106782511A (en) | Amendment linear depth autoencoder network audio recognition method | |
KR101807961B1 (en) | Method and apparatus for processing speech signal based on lstm and dnn | |
CN108133702A (en) | A kind of deep neural network speech enhan-cement model based on MEE Optimality Criterias | |
CN111899757A (en) | Single-channel voice separation method and system for target speaker extraction | |
CN111814448B (en) | Pre-training language model quantization method and device | |
CN112259119B (en) | Music source separation method based on stacked hourglass network | |
JP2023546099A (en) | Audio generator, audio signal generation method, and audio generator learning method | |
CN111798875A (en) | VAD implementation method based on three-value quantization compression | |
CN114267372A (en) | Voice noise reduction method, system, electronic device and storage medium | |
JPH08123484A (en) | Method and device for signal synthesis | |
WO2020141108A1 (en) | Method, apparatus and system for hybrid speech synthesis | |
JP2022539867A (en) | Audio separation method and device, electronic equipment | |
CN114141237A (en) | Speech recognition method, speech recognition device, computer equipment and storage medium | |
CN113539293A (en) | Single-channel voice separation method based on convolutional neural network and joint optimization | |
CN111724809A (en) | Vocoder implementation method and device based on variational self-encoder | |
CN113436607B (en) | Quick voice cloning method | |
CN111354367A (en) | Voice processing method and device and computer storage medium | |
CN110120228A (en) | Audio general steganalysis method and system based on sonograph and depth residual error network | |
WO2024072700A1 (en) | Switchable noise reduction profiles | |
CN113851115A (en) | Complex sound identification method based on one-dimensional convolutional neural network | |
CN113094544B (en) | Music recommendation method based on DCNN joint feature representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||