CN116959477A - Convolutional neural network-based noise source classification method and device - Google Patents
- Publication number: CN116959477A
- Application number: CN202311208076.9A
- Authority
- CN
- China
- Prior art keywords
- audio
- noise
- neural network
- convolutional neural
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L21/0232 — Noise filtering characterised by the method used for estimating noise; processing in the frequency domain
- G10L25/18 — Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
- G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique, using neural networks
- G10L25/51 — Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
- Y02T90/00 — Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Abstract
The application relates to the field of environmental noise identification and addresses a common limitation of prior neural-network-based methods: too few training samples, leading to poor model accuracy. It discloses a convolutional neural network-based noise source classification method and device. The method comprises: acquiring noise sample audio; expanding it by splicing and re-cutting, audio tone changing, audio speed changing, noise adding and random cutting; and constructing a convolutional neural network model. Data enhancement is applied to samples that are difficult to collect, expanding the training set severalfold to improve the accuracy and generalization of the trained model. In the network, padding of the time-domain dimension is removed and zero padding is applied only in the frequency-domain dimension, which reduces the amount of computation during both training and actual use.
Description
Technical Field
The application relates to the technical field of environmental noise identification, in particular to a method and a device for classifying noise sources based on a convolutional neural network.
Background
In recent years, with the rapid development of industrial technology and rising living standards, the variety of noise sources in daily life keeps growing, including domestic noise, traffic noise, industrial noise and so on. Disputes caused by noise pollution are increasingly common, and as quality of life improves, people pay more and more attention to environmental noise. Against the background of newly promulgated noise legislation, identifying the category of a noise source has therefore become an important task for many regulatory authorities.
Noise source classification means classifying the sound source of the noise; at present there are two kinds of implementation, based on traditional algorithms and on neural network algorithms. Traditional noise source classification algorithms extract audio features manually and classify according to the differences between those features, so their classification accuracy is hard to improve and the range of noise categories they can distinguish is narrow. Current neural-network-based methods are generally limited by the small number of training samples, which leads to poor model accuracy, and their models have excessive parameter counts and computational cost in actual use.
In summary, in the prior art, because there are many noise types, some noises are difficult to collect — thunder and rain, for example, can only be recorded in specific weather conditions — so neural network models have few training samples, and the trained models show low classification accuracy and poor generalization. In addition, most convolutional neural networks currently used for classification are resnet51-type networks, whose parameter count and computational cost are huge; they demand high power consumption and computing power in use, making real-time classification difficult.
Disclosure of Invention
The application aims to solve the problem that prior neural-network-based methods are generally limited by few training samples, leading to poor model accuracy, and provides a convolutional neural network-based noise source classification method and device.
In a first aspect, a method for classifying noise sources based on a convolutional neural network is provided, including:
acquiring noise sample audio, and expanding it by splicing and re-cutting, audio tone changing, audio speed changing, noise adding and random cutting;
constructing a convolutional neural network model;
inputting the noise sample audio and the expanded audio into the convolutional neural network model for training, to obtain a noise classification model;
performing spectrum conversion on collected noise audio to obtain its log_mel spectral feature vector;
inputting the log_mel spectral feature vector of the noise audio into the noise classification model to output each noise category and its corresponding probability;
accumulating noise-category statistics over a period of time;
and taking the noise category with the maximum probability in the category statistics as the classification result of the noise source.
Further, the splicing and re-cutting includes: splicing all noise sample audio into one long audio, and cutting the long audio, with overlap, into a plurality of clips of a first preset duration.
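As a rough illustration (not code from the patent), the splice-and-recut expansion can be sketched in plain Python; the function name and the toy clip sizes are invented for the example:

```python
def splice_and_recut(samples, clip_len, hop):
    """Splice all sample clips into one long audio, then cut it back into
    overlapping clips of length clip_len, advancing by hop samples
    (hop < clip_len gives overlap, so more clips come out than went in)."""
    long_audio = [x for clip in samples for x in clip]
    return [long_audio[s:s + clip_len]
            for s in range(0, len(long_audio) - clip_len + 1, hop)]

# Three 6-sample "clips" stand in for three 10 s recordings.
originals = [[i] * 6 for i in range(3)]
expanded = splice_and_recut(originals, clip_len=6, hop=3)  # 5 clips from 3
```

With a hop of half the clip length, three originals become five clips, matching the claim that the expansion factor depends on the amount of overlap.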
Further, the audio tone changing includes: applying a random pitch-shift to each noise sample audio individually, obtaining an equal number of audio clips.
Further, the audio speed changing includes: applying a random speed change to each noise sample audio individually, and comparing the changed audio duration with the first preset duration; if the changed audio falls short of the first preset duration, it is self-spliced until its duration equals the first preset duration; if it exceeds the first preset duration, it is trimmed to the first preset duration.
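A minimal sketch of this length-normalisation logic, with a crude index-resampling stand-in for a real time-stretch (the helper names are invented, not from the patent):

```python
def change_speed(clip, rate):
    """Crude speed change by index resampling, a stand-in for a proper
    time-stretch: rate > 1 shortens the clip, rate < 1 lengthens it."""
    n = int(len(clip) / rate)
    return [clip[min(int(i * rate), len(clip) - 1)] for i in range(n)]

def fit_to_length(clip, target_len):
    """Self-splice a too-short clip up to target_len, or trim a long one,
    so every expanded sample keeps the first preset duration."""
    out = list(clip)
    while len(out) < target_len:
        out.extend(clip)
    return out[:target_len]

clip = list(range(100))                    # 100 samples stand in for 10 s
sped = fit_to_length(change_speed(clip, 1.25), 100)   # shortened, respliced
slowed = fit_to_length(change_speed(clip, 0.8), 100)  # lengthened, trimmed
```

Either branch of the claim (too short → self-splice, too long → trim) ends with a clip of exactly the first preset duration.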
Further, the noise adding includes: adding ambient noise at a random signal-to-noise ratio (SNR) to each noise sample audio, obtaining an equal number of audio clips.
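The SNR-controlled mixing can be sketched as follows: the noise is scaled so that the signal-to-noise ratio in dB hits the requested value before mixing. The helper names are assumptions, and a fixed 10 dB is used here where the patent would draw the SNR at random:

```python
import math
import random

def rms(x):
    """Root-mean-square amplitude of a sample list."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def add_noise_at_snr(clip, noise, snr_db):
    """Scale the noise so that 20*log10(rms(clip)/rms(scaled_noise)) equals
    snr_db, then mix it into the clip sample by sample."""
    gain = rms(clip) / (rms(noise) * 10 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(clip, noise)]

random.seed(0)
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(16000)]
noisy = add_noise_at_snr(signal, noise, snr_db=10.0)
```

Subtracting the clean signal from the mixture recovers the scaled noise, so the achieved SNR can be checked directly.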
Further, the random cutting includes: randomly cutting a segment of a second preset duration out of each noise sample audio according to audio feature points, and self-splicing the remaining audio back to the first preset duration.
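A possible reading of this step in code (hypothetical helper; a uniformly random cut position stands in for the "audio feature points" heuristic, which the patent does not specify):

```python
import random

def random_cut_and_resplice(clip, cut_len):
    """Cut a random cut_len-sample segment out of the clip, then self-splice
    the remainder back to the original length."""
    start = random.randrange(0, len(clip) - cut_len + 1)
    kept = clip[:start] + clip[start + cut_len:]
    out = list(kept)
    while len(out) < len(clip):
        out.extend(kept)
    return out[:len(clip)]

random.seed(1)
out = random_cut_and_resplice(list(range(100)), cut_len=10)
```

Each distinct cut position yields a different clip of the original length, which is why one sample can be expanded n-fold by this scheme.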
Further, the convolutional neural network model sequentially includes: a two-dimensional conv layer, a feature extraction module, a two-dimensional DepthwiseConv layer, a mean pooling layer, a two-dimensional conv layer, a pooling layer, a Reshape layer, a two-dimensional conv layer and a Softmax layer, wherein the feature extraction module comprises 4 transition_block modules and 12 normal_block modules.
Furthermore, the convolutional neural network model adopts a calculation method that multiplexes convolution results: padding of the time-domain dimension is removed from the network and zero padding is applied only in the frequency-domain dimension, while a pooling layer reduces the feature map dimension correspondingly so that the residual mappings can still be added.
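The shape arithmetic behind frequency-only padding can be checked with a small sketch: "valid" along time (the length shrinks), zero-padded along frequency (the size is preserved). The 3x3 kernel here is illustrative, not taken from the patent:

```python
def conv2d_out(time, freq, k_t, k_f, pad_f):
    """Output size of one conv: 'valid' along time (no padding, the length
    shrinks by k_t - 1) and zero padding only along frequency (the size is
    preserved when pad_f == (k_f - 1) // 2)."""
    return time - k_t + 1, freq + 2 * pad_f - k_f + 1

# Stacking three 3x3 convs: each trims 2 frames from the time axis while
# the frequency axis stays at 64 bins.
t, f = 400, 64
for _ in range(3):
    t, f = conv2d_out(t, f, k_t=3, k_f=3, pad_f=1)
```

This shrinking time axis is visible in table 1, where the first dimension of the output falls steadily from 400 while the frequency dimension changes only at pooling steps.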
In a second aspect, an apparatus for noise source classification based on convolutional neural network is provided, comprising:
the industrial personal computer comprises a processor, a memory and a program or an instruction stored on the memory and capable of running on the processor, wherein the program or the instruction realizes the method according to any implementation manner of the first aspect when being executed by the processor;
the microphone is electrically connected with the processor;
and the display screen is electrically connected with the processor.
In a third aspect, a computer readable storage medium is provided, the computer readable medium storing program code for execution by a device, the program code comprising steps for performing the method as in any one of the implementations of the first aspect.
The application has the following beneficial effects:
according to the application, data enhancement is applied to samples that are difficult to collect, expanding the training set severalfold to improve the accuracy and generalization of the trained model;
the application greatly reduces the parameter count of the model through separable convolutions and network layer pruning, and multiplexes convolution results during recognition, which greatly reduces redundant convolution computation in frame-by-frame calculation, lowering the computational cost and improving classification efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of convolutional neural network-based noise source classification of embodiment 1 of the present application;
FIG. 2 is a flow chart of sample expansion and model training in the method of convolutional neural network-based noise source classification of embodiment 1 of the present application;
FIG. 3 is a block diagram of a convolutional neural network model in the convolutional neural network-based noise source classification method of embodiment 1 of the present application;
FIG. 4 is a construction diagram of a transition_block module in the method for classifying noise sources based on convolutional neural network according to embodiment 1 of the present application;
FIG. 5 is a diagram of the normal_block module in the method for classifying noise sources based on convolutional neural network according to embodiment 1 of the present application;
FIG. 6 is a schematic diagram of a normal convolution calculation;
FIG. 7 is a schematic diagram of convolutional multiplexing calculation in the method of convolutional neural network-based noise source classification of embodiment 1 of the present application;
fig. 8 is a block diagram showing the structure of an apparatus for noise source classification based on convolutional neural network according to embodiment 2 of the present application.
Reference numerals:
100. an industrial personal computer; 101. a processor; 102. a memory; 200. a microphone; 300. and a display screen.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Example 1
The method for classifying noise sources based on a convolutional neural network according to embodiment 1 of the application comprises the following steps: acquiring noise sample audio and expanding it by splicing and re-cutting, audio tone changing, audio speed changing, noise adding and random cutting; constructing a convolutional neural network model; inputting the noise sample audio and the expanded audio into the convolutional neural network model for training, to obtain a noise classification model; performing spectrum conversion on collected noise audio to obtain its log_mel spectral feature vector; inputting this feature vector into the noise classification model to output each noise category and its corresponding probability; accumulating noise-category statistics over a period of time; and taking the noise category with the maximum probability in the statistics as the classification result of the noise source. In this way, data enhancement can be applied to samples that are difficult to collect, expanding the training set severalfold to improve the accuracy and generalization of the trained model.
Specifically, fig. 1 shows a flowchart of the method for classifying noise sources based on a convolutional neural network in embodiment 1, mainly divided into a training phase and an inference phase, including:
s100, obtaining noise sample audio, and expanding the noise sample audio by adopting methods of splicing, re-cutting, audio tone changing, audio speed changing, noise adding and random cutting;
It is well known that a neural-network-based noise source classification model needs rich training samples to achieve a high recognition rate. Collecting all training samples manually would require enormous manpower and material resources, and some samples can only be collected in special weather (lightning strike, heavy rain, hail and the like). The application therefore provides a sample amplification scheme for the problem of small numbers of manually collected samples.
In this embodiment, audio with a duration of 10 s is used as the noise sample audio. The sample amplification scheme is shown in fig. 2: the manually collected 10 s noise samples are subjected to splicing and re-cutting, audio tone changing, audio speed changing, noise adding and random cutting, expanding the sample set fivefold or more. The expanded samples together with the manually collected samples are used as the training samples of the network, and a noise classification model is obtained by training, giving the model high accuracy and generalization.
The data expansion scheme comprises the following specific steps:
Splicing and re-cutting: all manually collected 10 s samples are spliced into one long audio, which is then cut with overlap into 10 s clips added to the training samples. This scheme yields more samples than the original set, an expansion of more than one time (the exact amount depends on the overlap used when cutting);
Audio tone changing: a random pitch-shift is applied to each manually collected 10 s sample individually; this scheme yields a number of samples equal to the original sample count;
Audio speed changing: a random speed change is applied to each manually collected 10 s sample individually. Audio shorter than 10 s after the speed change (the first preset duration is 10 s, ensuring the expanded audio has the same duration as the original noise sample audio) is self-spliced up to 10 s, and audio longer than 10 s is randomly trimmed to 10 s; this scheme yields a number of samples equal to the original sample count;
Noise adding: ambient noise at a random SNR is added to each manually collected 10 s sample; this scheme yields a number of samples equal to the original sample count;
Random cutting: for each manually collected 10 s sample, a segment is randomly cut out according to audio feature points, leaving 9 s of audio (the second preset duration is 1 s, i.e. the original 10 s sample becomes 9 s after 1 s is randomly removed), which is then self-spliced back to 10 s. Each sample can be cut in n different ways, so this scheme yields n times the original sample count.
S200, constructing a convolutional neural network model;
as shown in fig. 3, the convolutional neural network model sequentially includes: a two-dimensional conv layer, a feature extraction module, a two-dimensional DepthwiseConv layer, a mean pooling layer, a two-dimensional conv layer, a pooling layer, a Reshape layer, a two-dimensional conv layer and a Softmax layer, where the feature extraction module consists of 4 transition_block modules and 12 normal_block modules. The data flow is as follows: log_mel features (M x N) are extracted from the original audio (P x 1) as the input features of the network; they pass through a two-dimensional conv layer into the feature extraction module, which extracts a feature map; the feature map passes through a two-dimensional DepthwiseConv layer, a mean layer and a pooling layer into a two-dimensional conv layer; its dimensions are then adjusted by a Reshape layer and it passes through another two-dimensional conv layer; finally a Softmax layer produces the corresponding category scores. The specific parameter settings are shown in table 1:
table 1: convolutional neural network model parameter setting table
Network layer | Structure | Output dimension | Parameter count |
1 | padding | [400, 132, 1] | 0 |
2 | conv2d | [396, 64, 16] | 416 |
3 | transition_block | [394, 64, 16] | 800 |
4 | normal_block | [392, 64, 16] | 480 |
5 | normal_block | [390, 64, 16] | 480 |
6 | padding | [390, 64, 16] | 0 |
7 | transition_block | [386, 32, 32] | 2112 |
8 | normal_block | [382, 32, 32] | 1472 |
9 | normal_block | [378, 32, 32] | 1472 |
10 | padding | [378, 32, 32] | 0 |
11 | transition_block | [370, 16, 64] | 7296 |
12 | normal_block | [362, 16, 64] | 4992 |
13 | normal_block | [354, 16, 64] | 4992 |
14 | normal_block | [346, 16, 64] | 4992 |
15 | normal_block | [338, 16, 64] | 4992 |
16 | padding | [338, 16, 64] | 0 |
17 | transition_block | [322, 16, 128] | 26880 |
18 | normal_block | [306, 16, 128] | 18176 |
19 | normal_block | [290, 16, 128] | 18176 |
20 | normal_block | [274, 16, 128] | 18176 |
21 | normal_block | [258, 16, 128] | 18176 |
22 | padding | [258, 16, 128] | 0 |
23 | padding | [258, 20, 128] | 0 |
24 | conv2d | [254, 16, 128] | 3328 |
25 | mean | [254, 1, 128] | 0 |
26 | conv2d | [254, 1, 32] | 4096 |
27 | padding | [1, 1, 32] | 0 |
28 | conv2d | [1, 1, 9] | 288 |
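Several rows of table 1 are consistent with the standard convolution parameter-count formula, under assumed kernel sizes (the 5x5 and 1x1 kernels below are inferred from the counts, not stated in the table):

```python
def conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Standard parameter count of a 2-D convolution layer:
    kernel weights (k_h * k_w * c_in * c_out) plus optional per-filter bias."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

# Layer 2: a 5x5 conv from 1 input channel to 16 filters, with bias.
layer2 = conv2d_params(5, 5, 1, 16)
# Layer 26: a 1x1 conv from 128 channels to 32, without bias.
layer26 = conv2d_params(1, 1, 128, 32, bias=False)
# Layer 28: a 1x1 conv from 32 channels to the 9 output classes, no bias.
layer28 = conv2d_params(1, 1, 32, 9, bias=False)
```

The 1x1 convolutions carry no spatial footprint at all, which is why replacing larger kernels with 1x1 and 1x3 convolutions, as the embodiment describes, shrinks the model so sharply.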
The specific construction of the transition_block module is shown in fig. 4, and its specific parameter settings are shown in table 2:
table 2: specific parameter setting table of transition_block module
Network layer | Structure |
1 | conv2d |
2 | batch_normalization |
3 | padding |
4 | depthwise_conv2d |
5 | batch_normalization |
6 | pooling |
7 | depthwise_conv2d |
8 | batch_normalization |
9 | conv2d |
The specific construction of the normal_block module is shown in fig. 5, and the specific parameter settings of the normal_block module are shown in table 3:
table 3: specific parameter setting table of normal_block module
Network layer | Structure |
1 | pooling |
2 | padding |
3 | depthwise_conv2d |
4 | batch_normalization |
5 | pooling |
6 | depthwise_conv2d |
7 | batch_normalization |
8 | conv2d |
To address the large computational cost of the convolutional neural network model, this embodiment uses a calculation method that multiplexes convolution results: padding of the time-domain dimension is removed from the network and zero padding is applied only in the frequency-domain dimension. This preserves the feature map dimensions while reducing the amount of computation during training and actual use, without losing model accuracy.
Suppose the input to the convolutional neural network model is 10 seconds of audio, which yields 400 frames after framing as the time-domain length of the input features. At each forward inference the full 10 s of audio is convolved; when, one frame later, the next 10 s window is convolved again, the two windows differ by only one frame, so a great deal of convolution computation is repeated. In this embodiment the convolution result of each frame is cached; when the network infers on the next 10 s window, only the convolution of the newest frame needs to be computed, which greatly reduces the convolution workload, speeds up inference and makes real-time inference possible.
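The per-frame caching idea can be sketched with a 1-D toy convolution (the class and variable names are invented, and the real network caches 2-D feature maps rather than scalars):

```python
from collections import deque

def conv1d_valid(x, kernel):
    """Plain 'valid' 1-D convolution over a whole window, for comparison."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

class StreamingConv:
    """Cache per-frame convolution outputs: when the analysis window slides
    by one frame, only the convolution covering the newest frame is computed
    and the older results are reused from the cache."""
    def __init__(self, kernel, window_outputs):
        self.kernel = kernel
        self.cache = deque(maxlen=window_outputs)  # rolling conv outputs
        self.recent = deque(maxlen=len(kernel))    # last k input frames

    def push(self, frame):
        self.recent.append(frame)
        if len(self.recent) == len(self.kernel):
            self.cache.append(
                sum(f * w for f, w in zip(self.recent, self.kernel)))
        return list(self.cache)

kernel = [0.25, 0.5, 0.25]
window = 6                                  # frames per inference window
sc = StreamingConv(kernel, window_outputs=window - len(kernel) + 1)
for frame in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]:
    cached = sc.push(frame)

# The cached outputs for the latest window match a full recomputation.
full = conv1d_valid([3.0, 4.0, 5.0, 6.0, 7.0, 8.0], kernel)
```

Each `push` performs exactly one new convolution, whereas recomputing the whole window performs four; the results are identical.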
It should be noted that, the corresponding relationship between Chinese and English with respect to terms in the specification and the drawings is as follows:
conv: a convolution layer;
depthwise Conv: a deep convolution layer;
pooling: pooling layers;
Transition Block: a transition block;
normal Block: a regular block;
mean: an average layer;
reshape: dimensional remodeling;
softmax: a Softmax layer;
cls_prob: category scores;
batch normalization: a batch normalization layer;
ReLU: a ReLU activation function;
pad: a zero-fill layer;
swish: swish activates the function.
As shown in fig. 6 and fig. 7, suppose the convolutional neural network model takes 6 frames of input at inference time to obtain a final result, with a convolution kernel of size 3. Under normal convolution calculation, each 6-frame window is convolved separately, and the convolutions shared by two adjacent 6-frame windows are recomputed, leading to 4 repeated convolution calculations. In this embodiment the convolution results are multiplexed: each convolution result is stored, and the final result of forward inference is obtained by averaging the corresponding stored convolution results, which eliminates the repeated convolutions and greatly reduces the amount of computation.
Meanwhile, because residual mappings are used in the network, if zero padding were applied in the time-domain convolution, the result of each convolution would differ and the convolution results could not be multiplexed. Zero padding of the time-domain dimension is therefore removed, and a pooling layer performs the corresponding reduction of the feature map dimension so that the residual mappings can still be added; see the structures of the transition_block and normal_block modules for details.
S300, inputting noise sample audio and audio obtained by expansion into a convolutional neural network model for model training so as to obtain a noise classification model;
It should be noted that the model training stage is a preparation stage, not the practical stage. In this embodiment, the manually collected noise audio samples are expanded by the sample expansion method to obtain the training samples of the model, and the convolutional neural network model is trained with these samples to obtain the optimal inference weights, which are supplied to the noise classification model used in the inference stage.
S400, carrying out frequency spectrum conversion on the collected noise audio to obtain a log_mel spectrum feature vector of the noise audio;
s500, inputting log_mel spectral feature vectors of noise audio into the noise classification model to output each noise category and corresponding probability;
s600, carrying out noise category statistics in a period of time;
and S700, calculating a noise category corresponding to the maximum probability in the category statistics as a classification result of the noise source.
It should be noted that steps S400-S700 constitute the inference stage, the actual practical stage of the noise classification model. The microphone receives environmental noise; through spectrum conversion, the log_mel spectral feature vector of the audio is obtained and input into the noise classification model (loaded with the weights obtained in the training stage) to produce the probability of each noise category. Category statistics are then accumulated over a period of time, and finally the category with the maximum probability is output.
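Steps S500-S700 (per-inference classification, statistics over time, and picking the winning category) might look like this in outline; the label set and the use of an average probability are assumptions made for the example, since the patent does not fix either:

```python
from collections import Counter

CLASSES = ["living", "traffic", "industrial"]   # hypothetical label set

def classify_over_window(per_inference_probs):
    """Take the per-inference softmax outputs collected over a period of
    time, vote for the most frequent category, and report it together with
    its average probability."""
    votes = Counter()
    prob_sum = Counter()
    for probs in per_inference_probs:
        c = max(range(len(probs)), key=lambda i: probs[i])  # argmax class
        votes[c] += 1
        prob_sum[c] += probs[c]
    best, n = votes.most_common(1)[0]
    return CLASSES[best], prob_sum[best] / n

stream = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
          [0.2, 0.7, 0.1], [0.8, 0.1, 0.1]]
label, avg_p = classify_over_window(stream)
```

Voting over a window of inferences suppresses single-frame misclassifications, such as the lone "traffic" decision in the stream above.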
The use of 1x1 and 1x3 two-dimensional convolutions in this embodiment reduces the number of model parameters. The method uses a frame-convolution calculation scheme: padding of the time-domain dimension is removed from the network and zero padding is applied only in the frequency-domain dimension, which preserves the feature map size in the frequency dimension and reduces the amount of computation in training and actual use without loss of model accuracy.
To address the large computational cost of convolutional neural networks, this scheme uses a convolution-result-multiplexing calculation method: padding of the time-domain dimension is removed from the network, zero padding is applied only in the frequency-domain dimension, the feature map size is preserved, and the amount of computation in training and practical use is reduced without loss of model accuracy.
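The frequency-only zero padding can be illustrated with simple shape arithmetic (the feature map and kernel sizes are illustrative, not taken from the embodiment):

```python
import numpy as np

# Feature maps are laid out as (time, frequency). With 1x3 kernels
# (1 along time, 3 along frequency), zero padding is applied only on
# the frequency axis, so the frequency size is preserved and no
# time-domain padding is ever computed.
feat = np.zeros((64, 40))                     # 64 frames x 40 frequency bins
padded = np.pad(feat, ((0, 0), (1, 1)))       # pad the frequency axis only

kernel_t, kernel_f = 1, 3                     # 1x3 two-dimensional kernel
out_t = padded.shape[0] - kernel_t + 1        # time axis: 64, unchanged
out_f = padded.shape[1] - kernel_f + 1        # frequency axis: back to 40
```

Because the 1x3 kernel spans a single frame in time, each output frame depends only on its own input frame, which is what makes per-frame results reusable in streaming operation.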
Embodiment 2
As shown in fig. 8, an apparatus for classifying noise sources based on a convolutional neural network according to embodiment 2 of the present application includes:
an industrial personal computer 100, the industrial personal computer 100 comprising a processor 101, a memory 102, and a program or instruction stored on the memory 102 and executable on the processor 101, the program or instruction implementing the method according to any one of the implementations of embodiment 1 when executed by the processor 101;
a microphone 200, wherein the microphone 200 is electrically connected with the processor 101;
the display screen 300, the display screen 300 is electrically connected with the processor 101.
It should be noted that, to avoid redundancy, other specific implementation details of the device for classifying noise sources based on a convolutional neural network may refer to embodiment 1 and are not repeated here. In use, the microphone 200 collects audio information and transmits it to the industrial personal computer 100. The industrial personal computer 100 carries a program or instruction executable on the processor 101 that, when executed by the processor 101, implements the method described in any one of the implementations of embodiment 1; the industrial personal computer 100 processes the audio information to obtain the category to which the audio belongs and transmits the category information to the display screen 300 for display.
Embodiment 3
A computer-readable storage medium according to embodiment 3 of the present application stores program code for execution by a device, the program code including steps for performing the method according to any one of the implementations of embodiment 1 of the present application;
wherein the computer-readable storage medium may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM); the computer-readable storage medium may store program code which, when executed by a processor, performs the steps of the method according to any one of the implementations of embodiment 1 of the present application.
The above is only a preferred embodiment of the present application; the scope of the application is not limited thereto. Any modification or substitution that any person skilled in the art could readily make within the technical scope disclosed in the present application shall be covered by the protection scope of the present application.
Claims (10)
1. A method for noise source classification based on a convolutional neural network, comprising:
acquiring noise sample audio, and expanding the noise sample audio by the methods of splicing and re-cutting, audio pitch change, audio speed change, noise addition, and random cropping;
constructing a convolutional neural network model;
inputting the noise sample audio and the audio obtained by expansion into the convolutional neural network model for model training to obtain a noise classification model;
performing spectrum conversion on the collected noise audio to obtain a log_mel spectrum feature vector of the noise audio;
inputting the log_mel spectrum feature vector of the noise audio into the noise classification model to output each noise category and its corresponding probability;
performing noise category statistics over a period of time;
and taking the noise category corresponding to the maximum probability in the category statistics as the classification result of the noise source.
2. The convolutional neural network-based noise source classification method of claim 1, wherein the splicing and re-cutting comprises: splicing all noise sample audio into one long audio, and cutting the long audio in an overlapping manner into a plurality of audio clips of a first preset duration.
3. The convolutional neural network-based noise source classification method of claim 1, wherein the audio pitch change comprises: performing random pitch-change processing on each noise sample audio individually to obtain an equal number of audio clips.
4. The convolutional neural network-based noise source classification method of claim 1, wherein the audio speed change comprises: performing random speed-change processing on each noise sample audio individually and comparing the duration of the speed-changed audio with the first preset duration; if the speed-changed audio does not reach the first preset duration, self-splicing the speed-changed audio so that its duration equals the first preset duration; if the speed-changed audio exceeds the first preset duration, cutting the speed-changed audio so that its duration equals the first preset duration.
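The length normalization after the speed change can be sketched as follows (the function name and the 100-sample target length are illustrative; the claim's "first preset duration" is not specified numerically):

```python
import numpy as np

def fit_to_length(audio, target_len):
    """Self-splice a too-short clip by repetition, or cut a too-long
    clip, so every speed-changed clip has the first preset duration."""
    if len(audio) < target_len:
        reps = -(-target_len // len(audio))   # ceiling division
        audio = np.tile(audio, reps)          # self-splice by repetition
    return audio[:target_len]                 # cut down to target length

fast = np.arange(70.0)    # sped up: now shorter than the 100-sample target
slow = np.arange(130.0)   # slowed down: now longer than the target
```

Both clips come out at exactly the target length, so all expanded samples share one duration for batching.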
5. The convolutional neural network-based noise source classification method of claim 1, wherein the noise addition comprises: adding ambient noise at a random signal-to-noise ratio to each noise sample audio to obtain an equal number of audio clips.
6. The convolutional neural network-based noise source classification method of claim 1, wherein the random cropping comprises: randomly cropping each noise sample audio to a second preset duration according to audio feature points, and self-splicing the cropped audio up to the first preset duration.
7. The convolutional neural network-based noise source classification method of claim 1, wherein the convolutional neural network model comprises, in order: a two-dimensional Conv layer, a feature extraction module, a two-dimensional DepthwiseConv layer, a mean pooling layer, a two-dimensional Conv layer, a pooling layer, a Reshape layer, a two-dimensional Conv layer, and a Softmax layer, wherein the feature extraction module comprises 4 TransmitBlock blocks and 12 NormalBlock blocks.
8. The convolutional neural network-based noise source classification method of claim 4, wherein the convolutional neural network model adopts a convolution-result-multiplexing calculation method: padding of the time-domain dimension is removed from the network, zero padding is applied only in the frequency-domain dimension, and the feature map dimensions are correspondingly reduced through a pooling layer to achieve the effect of residual mapping addition.
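The shape bookkeeping behind the residual addition in claim 8 can be illustrated as follows (the time kernel length, the pooling window, and the feature map sizes are illustrative assumptions, not the embodiment's actual layers):

```python
import numpy as np

def valid_time_conv(x, k):
    """'Valid' convolution along time (kernel k x 1, no time padding):
    each output frame is a weighted sum of k input frames."""
    w = np.ones(k) / k                         # toy kernel weights
    return np.stack([x[i:i + k].T @ w for i in range(x.shape[0] - k + 1)])

def time_pool(x, k):
    """Mean pooling over the same k-frame windows, applied on the
    residual branch so its time size matches the convolution output."""
    return np.stack([x[i:i + k].mean(axis=0)
                     for i in range(x.shape[0] - k + 1)])

x = np.ones((62, 40))              # (time, frequency) feature map
main = valid_time_conv(x, 3)       # time shrinks 62 -> 60 without padding
residual = time_pool(x, 3)         # pooled to the same 60 frames
out = main + residual              # shapes now match for residual addition
```

Because the unpadded convolution shortens the time axis, the residual branch must be reduced by the same amount before the two branches can be added, which is the role the claim assigns to the pooling layer.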
9. An apparatus for noise source classification based on convolutional neural network, comprising:
an industrial personal computer comprising a processor, a memory, and a program or instruction stored on the memory and executable on the processor, the program or instruction, when executed by the processor, implementing the method of any one of claims 1-8;
the microphone is electrically connected with the processor;
and the display screen is electrically connected with the processor.
10. A computer-readable storage medium storing program code for execution by a device, the program code comprising steps for performing the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311208076.9A CN116959477B (en) | 2023-09-19 | 2023-09-19 | Convolutional neural network-based noise source classification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116959477A true CN116959477A (en) | 2023-10-27 |
CN116959477B CN116959477B (en) | 2023-12-19 |
Family
ID=88454923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311208076.9A Active CN116959477B (en) | 2023-09-19 | 2023-09-19 | Convolutional neural network-based noise source classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116959477B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117690451A (en) * | 2024-01-29 | 2024-03-12 | 杭州爱华仪器有限公司 | Neural network noise source classification method and device based on ensemble learning |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105868785A (en) * | 2016-03-30 | 2016-08-17 | 乐视控股(北京)有限公司 | Image identification method based on convolutional neural network and image identification system thereof |
WO2020037960A1 (en) * | 2018-08-21 | 2020-02-27 | 深圳大学 | Sar target recognition method and apparatus, computer device, and storage medium |
CN111915506A (en) * | 2020-06-19 | 2020-11-10 | 西安电子科技大学 | Method for eliminating stripe noise of sequence image |
CN111951823A (en) * | 2020-08-07 | 2020-11-17 | 腾讯科技(深圳)有限公司 | Audio processing method, device, equipment and medium |
KR20210034429A (en) * | 2019-09-20 | 2021-03-30 | 아주대학교산학협력단 | Apparatus and method for classificating point cloud using neighbor connectivity convolutional neural network |
CN113963719A (en) * | 2020-07-20 | 2022-01-21 | 东声(苏州)智能科技有限公司 | Deep learning-based sound classification method and apparatus, storage medium, and computer |
CN115019760A (en) * | 2022-05-19 | 2022-09-06 | 上海理工大学 | Data amplification method for audio and real-time sound event detection system and method |
CN115050356A (en) * | 2022-06-07 | 2022-09-13 | 中山大学 | Noise identification method and device and computer readable storage medium |
CN115358718A (en) * | 2022-08-24 | 2022-11-18 | 广东旭诚科技有限公司 | Noise pollution classification and real-time supervision method based on intelligent monitoring front end |
CN115456088A (en) * | 2022-09-19 | 2022-12-09 | 浙江科技学院 | Motor fault classification method based on self-encoder feature lifting dimension and multi-sensing fusion |
Non-Patent Citations (1)
Title |
---|
GEORGE SAON ET AL: "Diagonal State Space Augmented Transformers for Speech Recognition", ICASSP 2023 - 2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117690451A (en) * | 2024-01-29 | 2024-03-12 | 杭州爱华仪器有限公司 | Neural network noise source classification method and device based on ensemble learning |
CN117690451B (en) * | 2024-01-29 | 2024-04-16 | 杭州爱华仪器有限公司 | Neural network noise source classification method and device based on ensemble learning |
Also Published As
Publication number | Publication date |
---|---|
CN116959477B (en) | 2023-12-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764471B (en) | Neural network cross-layer pruning method based on feature redundancy analysis | |
CN110097129B (en) | Remote sensing target detection method based on profile wave grouping characteristic pyramid convolution | |
CN111259898B (en) | Crop segmentation method based on unmanned aerial vehicle aerial image | |
CN107393542B (en) | Bird species identification method based on two-channel neural network | |
CN116959477B (en) | Convolutional neural network-based noise source classification method and device | |
CN108231086A (en) | A kind of deep learning voice enhancer and method based on FPGA | |
CN110533022B (en) | Target detection method, system, device and storage medium | |
CN114492831B (en) | Method and device for generating federal learning model | |
CN113658200B (en) | Edge perception image semantic segmentation method based on self-adaptive feature fusion | |
CN111401523A (en) | Deep learning network model compression method based on network layer pruning | |
CN111667068A (en) | Mask-based depth map convolutional neural network model pruning method and system | |
CN112508961A (en) | CT image segmentation method based on improved ResNet-Unet | |
Yang et al. | A new hybrid prediction model of PM2. 5 concentration based on secondary decomposition and optimized extreme learning machine | |
CN112800851B (en) | Water body contour automatic extraction method and system based on full convolution neuron network | |
CN112101487B (en) | Compression method and device for fine-grained recognition model | |
CN113706471A (en) | Steel product surface defect detection method based on model compression | |
CN111932690B (en) | Pruning method and device based on 3D point cloud neural network model | |
CN114169232A (en) | Full-time-period three-dimensional atmospheric pollutant reconstruction method and device, computer equipment and storage medium | |
CN111586151B (en) | Intelligent city data sharing system and method based on block chain | |
Li et al. | Towards communication-efficient digital twin via ai-powered transmission and reconstruction | |
CN111461324A (en) | Hierarchical pruning method based on layer recovery sensitivity | |
CN115331690B (en) | Method for eliminating noise of call voice in real time | |
CN110766219A (en) | Haze prediction method based on BP neural network | |
CN110164418B (en) | Automatic speech recognition acceleration method based on convolution grid long-time memory recurrent neural network | |
CN112561817A (en) | Remote sensing image cloud removing method, device and equipment based on AM-GAN and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||