CN116959477B - Convolutional neural network-based noise source classification method and device - Google Patents

Convolutional neural network-based noise source classification method and device Download PDF

Info

Publication number
CN116959477B
CN116959477B CN202311208076.9A CN202311208076A CN116959477B CN 116959477 B CN116959477 B CN 116959477B CN 202311208076 A CN202311208076 A CN 202311208076A CN 116959477 B CN116959477 B CN 116959477B
Authority
CN
China
Prior art keywords
audio
noise
neural network
convolutional neural
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311208076.9A
Other languages
Chinese (zh)
Other versions
CN116959477A (en
Inventor
纪盟盟
张静
毛志德
李兆行
张凯帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Aihua Instruments Co ltd
Original Assignee
Hangzhou Aihua Instruments Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Aihua Instruments Co ltd filed Critical Hangzhou Aihua Instruments Co ltd
Priority to CN202311208076.9A priority Critical patent/CN116959477B/en
Publication of CN116959477A publication Critical patent/CN116959477A/en
Application granted granted Critical
Publication of CN116959477B publication Critical patent/CN116959477B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The application relates to the technical field of environmental noise identification, solves the problem that a neural network algorithm-based method in the prior art is generally limited by less training samples, so that model accuracy is poor, and discloses a convolutional neural network-based noise source classification method and device, wherein the method comprises the following steps: acquiring noise sample audio, expanding the noise sample audio by adopting methods of splicing, re-cutting, audio tone changing, audio speed changing, noise adding and random cutting, and constructing a convolutional neural network model; the method comprises the steps of performing data enhancement processing on samples which are difficult to collect, performing multiple expansion on training samples after data enhancement to improve accuracy and generalization of the training model, canceling filling of time domain dimensions in the network, performing zero filling only in the frequency domain dimensions, and reducing calculated amount during training and actual use.

Description

Convolutional neural network-based noise source classification method and device
Technical Field
The application relates to the technical field of environmental noise identification, in particular to a method and a device for classifying noise sources based on a convolutional neural network.
Background
In recent years, with rapid development of industrial technology and increasing promotion of living standard of people, noise source types in life are increasing, including living noise, traffic noise, industrial noise, and the like. The contradiction and dispute caused by noise pollution are more and more, and along with the improvement of life quality of people, the influence of people on environmental noise is more and more important. Therefore, in the context of new noise law promulgation, the resolution of noise source categories is also an important issue faced by many regulatory authorities.
The noise source classification refers to the classification of the noise sound source, and two implementation modes based on a traditional algorithm and a neural network algorithm exist at present. The traditional noise source classification algorithm is used for manually extracting the audio characteristics and classifying according to the differences among the characteristics, so that the problems that the classification accuracy is difficult to improve and the classification category of the noise source is single are solved. The method based on the neural network algorithm at the present stage is generally limited by few training samples, so that the model precision is poor, and the number of parameters and the calculated amount of the model in actual use are too large.
In general, in the prior art, because of many noise types, some noises have the problem of difficult acquisition, such as thunder and rain, and specific weather environments are needed for acquisition, so that training samples of a neural network model are few, the classification precision of the trained model is low, and the generalization is poor; in addition, most convolutional neural networks used for classification at the present stage are a resnet51 network, the network is huge in parameter quantity and calculation quantity, the power consumption and calculation power requirements are high in use, and the real-time classification requirements are difficult to finish.
Disclosure of Invention
The method and the device aim to solve the problem that in the prior art, a method based on a neural network algorithm is generally limited by less training samples and the model precision is poor.
In a first aspect, a method for classifying noise sources based on a convolutional neural network is provided, including:
acquiring noise sample audio, and expanding the noise sample audio by adopting methods of splicing, re-cutting, audio tone changing, audio speed changing, noise adding and random cutting;
constructing a convolutional neural network model;
inputting the noise sample audio and the audio obtained by expansion into a convolutional neural network model for model training so as to obtain a noise classification model;
the collected noise audio is subjected to frequency spectrum conversion to obtain a log_mel spectrum characteristic vector of the noise audio;
inputting log_mel spectral feature vectors of noise audio into the noise classification model to output respective noise categories and corresponding probabilities;
carrying out noise category statistics in a period of time;
and calculating a noise class corresponding to the maximum probability in the class statistics as a classification result of the noise source.
Further, the splicing and re-cutting includes: all noise sample audios are spliced into a long audio, and the long audio is sheared into a plurality of audios with first preset duration in an overlapping mode.
Further, the audio tone variation includes: each noise sample audio is individually subjected to a random pitch variation process to obtain an equal number of audio.
Further, the audio shifting includes: performing random speed change processing on each piece of noise sample audio independently, comparing the audio time after speed change with a first preset time, and if the audio time after speed change does not reach the first preset time, performing self-splicing by using the audio after speed change to enable the audio time to be equal to the first preset time; if the audio time after the speed change exceeds the first preset time, cutting the audio after the speed change to enable the audio time to be equal to the first preset time.
Further, the adding noise includes: random snr ambient noise is added to each noise sample audio to get an equal amount of audio.
Further, the random clipping includes: and randomly cutting each noise sample audio according to the audio feature points for a second preset time period, and splicing the randomly cut audio to the first preset time period by using the audio.
Further, the convolutional neural network model sequentially includes: the device comprises a two-dimensional conv layer, a feature extraction module, a two-dimensional DepthwiseConv layer, a mean pooling layer, a two-dimensional conv layer, a pooling layer, a Reshape layer, a two-dimensional conv layer and a Softmax layer, wherein the feature extraction module comprises 4 Transmit Block blocks and 12 normalBlock blocks.
Furthermore, the convolutional neural network model adopts a calculation method of multiplexing convolutional results, the filling of time domain dimension is canceled in the network, zero filling is only carried out on the frequency domain dimension, and the corresponding reduction of feature map dimension is carried out through a pooling layer, so that the effect of residual mapping addition is achieved.
In a second aspect, an apparatus for noise source classification based on convolutional neural network is provided, comprising:
the industrial personal computer comprises a processor, a memory and a program or an instruction stored on the memory and capable of running on the processor, wherein the program or the instruction realizes the method according to any implementation manner of the first aspect when being executed by the processor;
the microphone is electrically connected with the processor;
and the display screen is electrically connected with the processor.
In a third aspect, a computer readable storage medium is provided, the computer readable medium storing program code for execution by a device, the program code comprising steps for performing the method as in any one of the implementations of the first aspect.
The application has the following beneficial effects:
according to the method, data enhancement processing is performed on the samples which are difficult to collect, and after data enhancement, multiple expansion is performed on the training samples so as to improve the accuracy and generalization of the training model;
according to the method, the parameters of the model are greatly reduced through the separation convolution and the network layer pruning technology, the convolution result multiplexing technology is carried out during identification, the convolution redundancy calculation during frame-by-frame calculation is greatly reduced, the calculated amount is reduced, and the classification efficiency is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application.
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of convolutional neural network based noise source classification of embodiment 1 of the present application;
FIG. 2 is a flow chart of sample expansion and model training in the method of convolutional neural network-based noise source classification of embodiment 1 of the present application;
FIG. 3 is a block diagram of a convolutional neural network model in the convolutional neural network-based noise source classification method of embodiment 1 of the present application;
FIG. 4 is a construction diagram of a transition_block module in the method for classifying noise sources based on convolutional neural network according to embodiment 1 of the present application;
FIG. 5 is a diagram of the normal_block module in the method for convolutional neural network based noise source classification of embodiment 1 of the present application;
FIG. 6 is a schematic diagram of a normal convolution calculation;
FIG. 7 is a schematic diagram of convolutional multiplexing computation in the method of convolutional neural network-based noise source classification of embodiment 1 of the present application;
fig. 8 is a block diagram of the structure of the device for noise source classification based on convolutional neural network according to embodiment 2 of the present application.
Reference numerals:
100. an industrial personal computer; 101. a processor; 102. a memory; 200. a microphone; 300. and a display screen.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The method for classifying noise sources based on convolutional neural network according to embodiment 1 of the present application includes: acquiring noise sample audio, and expanding the noise sample audio by adopting methods of splicing, re-cutting, audio tone changing, audio speed changing, noise adding and random cutting; constructing a convolutional neural network model; inputting the noise sample audio and the audio obtained by expansion into a convolutional neural network model for model training so as to obtain a noise classification model; the collected noise audio is subjected to frequency spectrum conversion to obtain a log_mel spectrum characteristic vector of the noise audio; inputting log_mel spectral feature vectors of noise audio into the noise classification model to output respective noise categories and corresponding probabilities; carrying out noise category statistics in a period of time; the noise class corresponding to the maximum probability in the class statistics is calculated to serve as a classification result of the noise source, data enhancement processing can be carried out on samples which are difficult to collect by the method, and after data enhancement, multiple expansion is carried out on training samples so as to improve the accuracy and generalization of the training model.
Specifically, fig. 1 shows a flowchart (mainly divided into a training phase and an reasoning phase) of a method for classifying noise sources based on a convolutional neural network in application example 1, including:
s100, obtaining noise sample audio, and expanding the noise sample audio by adopting methods of splicing, re-cutting, audio tone changing, audio speed changing, noise adding and random cutting;
it is known that neural network-based noise source classification models want to achieve high recognition rates, requiring rich training samples. If all the training samples are manually collected, huge manpower and material resources are needed, and certain samples are required to be subjected to special weather (lightning strike, heavy rain, hail and the like), the invention provides a sample amplification scheme aiming at the problem of small number of manually collected samples.
In this embodiment, the audio with the duration of 10s is used as noise sample audio, and the sample amplification scheme is shown in fig. 2, and the process of the sample amplification scheme is that the manually collected 10s noise sample audio sample is spliced, sheared, audio tonal variation, audio speed variation, noise addition and random clipping are performed on the manually collected 10s noise sample audio sample, the sample is subjected to five times and more expansion, the sample after expansion and the manually collected sample are jointly used as training samples of a network, and a noise classification model is obtained through training, as shown in fig. 2, so that the model has very high accuracy and generalization.
The data expansion scheme comprises the following specific steps:
splicing and then cutting: all manually collected 10s samples are spliced into a whole long audio, then overlapped sheared is added into training samples, the number of the samples obtained by the expansion scheme is more than that of the original samples, and more than one time of expansion effect is achieved (the specific data amount depends on the overlap amount in shearing);
audio tone variation: carrying out random tone modification treatment on each manually collected 10s sample independently, wherein the sample quantity obtained by the expansion scheme is equal to the sample quantity of a sample;
audio frequency speed change: each manually collected 10s sample is independently subjected to random speed change treatment, the audio after speed change is less than 10s (namely, the first preset duration is 10s to ensure that the duration of the audio after expansion is equal to the duration of the original noise sample audio), the audio after speed change is spliced to 10s by the audio after speed change, the audio with the length exceeding 10s is randomly shortened to 10s, and the sample quantity obtained by the expansion scheme is equal to the sample quantity of the original sample;
adding noise: adding random snr environmental noise to each manually collected 10s sample, wherein the sample quantity obtained by the expansion scheme is equal to the sample quantity of the original sample;
randomly cutting: for each manually collected 10s sample, according to the audio feature points, randomly cutting the audio with the audio feature points to be 9s (namely, the second preset time length is 1s, the original 10s sample is randomly cut for 1s and then becomes the audio with the audio frequency of 9 s), then splicing the sample to the audio with the audio feature points by itself to be 10s (namely, the audio feature points are lower than the preset time length by 10 s), each sample can be cut into n pieces of audio, and the sample size obtained by the expansion scheme is n times of the sample size of the original sample.
S200, constructing a convolutional neural network model;
as shown in fig. 3, the convolutional neural network model sequentially includes: the two-dimensional conv layer, the feature extraction module, the two-dimensional DepthwiseConv layer, the mean pooling layer, the two-dimensional conv layer, the pooling layer, the Reshape layer, the two-dimensional conv layer and the Softmax layer, wherein the feature extraction module comprises 4 Transmit Block blocks and 12 normalBlock blocks, and the flow of the convolutional neural network model is as follows: the method comprises the steps of extracting log_mel characteristics (MxN) of original audio (P x 1) as original characteristics of a network, inputting the log_mel characteristics through a two-dimensional Conv layer, inputting the log_mel characteristics into a characteristic extraction module, wherein the characteristic extraction module consists of 4 TransmitionBlock blocks and 12 normalBlock blocks, extracting a characteristic diagram, inputting the extracted characteristic diagram through a two-dimensional DepthwiseConv layer after passing through a Mean layer, inputting the extracted characteristic diagram into the two-dimensional Conv layer after passing through a pooling layer, carrying out dimension adjustment through a Reshape layer after passing through the two-dimensional Conv layer, and finally obtaining corresponding category scores through a Softmax layer. The specific parameter settings are shown in table 1:
table 1: convolutional neural network model parameter setting table
Network layer Structure of the Output dimension Quantity of parameters
1 padding [400, 132, 1] 0
2 conv2d [396, 64, 16] 416
3 transition_block [394, 64, 16] 800
4 normal_block [392, 64, 16] 480
5 normal_block [390, 64, 16] 480
6 padding [390, 64, 16] 0
7 transition_block [386, 32, 32] 2112
8 normal_block [382, 32, 32] 1472
9 normal_block [378, 32, 32] 1472
10 padding [378, 32, 32] 0
11 transition_block [370, 16, 64] 7296
12 normal_block [362, 16, 64] 4992
13 normal_block [354, 16, 64] 4992
14 normal_block [346, 16, 64] 4992
15 normal_block [338, 16, 64] 4992
16 padding [338, 16, 64] 0
17 transition_block [322, 16, 128] 26880
18 normal_block [306, 16, 128] 18176
19 normal_block [290, 16, 128] 18176
20 normal_block [274, 16, 128] 18176
21 normal_block [258, 16, 128] 18176
22 padding [258, 16, 128] 0
23 padding [258, 20, 128] 0
24 conv2d [254, 16, 128] 3328
25 mean [254, 1, 128] 0
26 conv2d [254, 1, 32] 4096
27 padding [1, 1, 32] 0
28 conv2d [1, 1, 9] 288
The specific construction of the transform_block module is shown in fig. 4, and the specific parameter settings of the transform_block module are shown in table 2:
table 2: specific parameter setting table of transition_block module
Network layer Structure of the
1 conv2d
2 batch_normalization
3 padding
4 depthwise_conv2d
5 batch_normalization
6 pooling
7 depthwise_conv2d
8 batch_normalization
9 conv2d
The specific construction of the normal_block module is shown in fig. 5, and the specific parameter settings of the normal_block module are shown in table 3:
table 3: specific parameter setting table of normal_block module
Network layer Structure of the
1 pooling
2 padding
3 depthwise_conv2d
4 batch_normalization
5 pooling
6 depthwise_conv2d
7 batch_normalization
8 conv2d
Aiming at the problem of large calculation amount of the convolutional neural network model, the calculation method of convolutional result multiplexing is used in the embodiment, the filling of time domain dimension is canceled in the network, and zero filling is only carried out on the frequency domain dimension, so that the dimension of the feature map is ensured, and the calculation amount in training and practical use is reduced on the premise of not losing the model precision.
If the input of the convolutional neural network model is the audio length of 10 seconds, 400 frames are obtained after framing as the time domain length of the input features, the audio is completely convolved every 10 seconds during each forward reasoning, and when the next 10 seconds of audio is convolved again after the time of one frame, the two sections of audio which are only 10 seconds of one frame apart have great convolution repetition calculation. In the embodiment, the convolution result of each frame is stored, when the network infers the audio frequency of the next frame for 10 seconds, only the convolution result of the tail frame is needed to be calculated, the convolution calculation can be greatly reduced, the inference speed is increased, and real-time inference is realized.
It should be noted that, the corresponding relationship between Chinese and English with respect to terms in the specification and the drawings is as follows:
conv: a convolution layer;
depthwise Conv: a deep convolution layer;
pooling: pooling layers;
transmission Block: a transition block;
normal Block: a regular block;
mean: an average layer;
reshape: dimensional remodeling;
softmax: a Softmax layer;
cls_prob: category scores;
batch normalization: a batch normalization layer;
ReLU: a ReLU activation function;
pad: a zero-fill layer;
swish: swish activates the function.
As shown in fig. 6 and 7, if the convolutional neural network model is input into 6 frames of length at the time of reasoning to obtain a final result, the convolution kernel is 3, and in the case of normal convolution calculation, separate convolution calculation is performed for every 6 frames of data, the results of 4 convolutions of two adjacent 6 frames of length are the same, which results in 4 repeated convolution calculations. In the embodiment, the convolution calculation results are multiplexed, each convolution result is stored, and the average of the sum of the corresponding convolution results is obtained when the final result of forward reasoning is obtained, so that the repeated calculation of the convolution is reduced, and the calculation amount is greatly reduced.
Meanwhile, in the embodiment, because the residual mapping is applied to the network, when the time domain convolution is performed, if zero filling is performed, the result of each convolution is different, and convolution result administration cannot be performed, so that the zero filling of the time domain dimension is cancelled, and a pooling layer is used for performing corresponding reduction of the feature map dimension, so that the effect of adding the residual mapping is achieved, and the structures of a Transmit Block module and a normalBlock module are specifically seen.
S300, inputting noise sample audio and audio obtained by expansion into a convolutional neural network model for model training so as to obtain a noise classification model;
it should be noted that, the model training stage belongs to the preparation stage of the model, and does not belong to the practical stage, in this embodiment, the manually collected noise audio sample is subjected to a sample expansion method to obtain a training sample of the model, and the convolutional neural network model is trained by using the training sample to obtain an optimal inference weight, so as to supply the noise classification model of the inference stage for use.
S400, carrying out frequency spectrum conversion on the collected noise audio to obtain a log_mel spectrum feature vector of the noise audio;
s500, inputting log_mel spectral feature vectors of noise audio into the noise classification model to output each noise category and corresponding probability;
s600, carrying out noise category statistics in a period of time;
and S700, calculating a noise category corresponding to the maximum probability in the category statistics as a classification result of the noise source.
It should be noted that, steps S400-S700 belong to an inference stage, the inference stage is an actual practical stage of the noise classification model, the microphone receives the environmental noise, and through spectrum conversion, a log_mel spectrum feature vector of the audio is obtained, and is used for inputting the noise classification model (the weight obtained in the loaded training stage) to obtain the probability corresponding to each noise class, then the class statistics is performed for a period of time, and finally the class corresponding to the maximum probability is output.
The use of a 1x1 two-dimensional convolution and a 1x3 two-dimensional convolution in this embodiment reduces the number of parameters of the model. The method uses a calculation method of frame convolution, the filling of time domain dimension is canceled in a network, and zero filling is only carried out on the frequency domain dimension, so that the size of the dimension of the feature map is ensured, and the calculated amount in training and actual use is reduced on the premise of not losing the model precision.
Aiming at the problem of large calculation amount of a convolutional neural network, the calculation method of convolutional result multiplexing is used in the scheme, filling of time domain dimension is canceled in the network, zero filling is only carried out on the frequency domain dimension, the size of the dimension of the feature map is guaranteed, and the calculation amount in training and practical use is reduced on the premise that model accuracy is not lost.
Example 2
As shown in fig. 8, an apparatus for classifying noise sources based on convolutional neural network according to embodiment 2 of the present application includes:
100 industrial personal computers, the industrial personal computer 100 comprising a processor 101, a memory 102 and a program or instruction stored on the memory 102 and executable on the processor 101, the program or instruction implementing the method according to any one of the embodiments 1 when executed by the processor 101;
a microphone 200, wherein the microphone 200 is electrically connected with the processor 101;
the display screen 300, the display screen 300 is electrically connected with the processor 101.
It should be noted that, for avoiding redundancy, reference may be made to other specific embodiments of the device for classifying noise sources based on convolutional neural networks in the embodiment of the present invention, and in order to avoid redundancy, details are not described here, when in use, the microphone 200 collects audio information and transmits the audio information to the industrial personal computer 100, the industrial personal computer 100 carries a program or an instruction that can run on the processor 101, and when the program or the instruction is executed by the processor 101, the method described in any one of embodiments 1 is implemented, the audio information is processed by the industrial personal computer 100 to obtain the category to which the audio belongs, and the category information is transmitted to the display screen 300 for display.
Embodiment 3
A computer-readable storage medium according to embodiment 3 of the present application stores program code for execution by a device, the program code comprising steps for performing the method in any one of the implementations of embodiment 1 of the present application;
wherein the computer-readable storage medium may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM); the computer-readable storage medium may store program code which, when executed by a processor, performs the steps of the method in any one of the implementations of embodiment 1 of the present application.
The foregoing is merely a preferred embodiment of the present application and does not limit the scope of protection of the present application. Any equivalent substitution or modification of the technical solution and its inventive concept made by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of protection of the present application.

Claims (8)

1. A method for noise source classification based on convolutional neural network, comprising:
acquiring noise sample audio, and augmenting the noise sample audio by splicing and re-cutting, audio pitch variation, audio speed variation, noise addition, and random cropping;
constructing a convolutional neural network model, wherein the convolutional neural network model sequentially comprises: a two-dimensional Conv layer, a feature extraction module, a two-dimensional DepthwiseConv layer, a mean pooling layer, a two-dimensional Conv layer, a pooling layer, a Reshape layer, a two-dimensional Conv layer, and a Softmax layer; the feature extraction module comprises 4 Transmit Block blocks and 12 normalBlock blocks; the convolutional neural network model adopts a computation method of convolution-result multiplexing, in which padding of the time-domain dimension is removed from the network and zero padding is applied only in the frequency-domain dimension, and the feature-map dimensions are correspondingly reduced through the pooling layer so that the residual mapping can still be added;
inputting the noise sample audio and the audio obtained by expansion into a convolutional neural network model for model training so as to obtain a noise classification model;
performing spectrum conversion on collected noise audio to obtain a log_mel spectral feature vector of the noise audio;
inputting the log_mel spectral feature vector of the noise audio into the noise classification model to output each noise category and its corresponding probability;
carrying out noise category statistics over a period of time;
and taking the noise category with the maximum probability in the category statistics as the classification result of the noise source.
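The final two steps of claim 1 — accumulating per-clip predictions over a period and reporting the class whose share of the statistics is largest — amount to a majority vote. A minimal sketch follows; the function name and the (label, probability) input format are assumptions, since the patent does not fix the exact data structure.

```python
from collections import Counter

def classify_period(clip_predictions):
    """clip_predictions: list of (label, probability) pairs, one per clip
    classified during the observation period. Returns the label that
    occurs most often, together with its share of the period."""
    counts = Counter(label for label, _ in clip_predictions)
    label, count = counts.most_common(1)[0]
    return label, count / len(clip_predictions)
```

For example, three clips classified as ("traffic", 0.9), ("traffic", 0.8), ("construction", 0.7) yield "traffic" with a share of 2/3.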
2. The convolutional neural network-based noise source classification method of claim 1, wherein the splicing and re-cutting comprises: splicing all noise sample audios into one long audio, and cutting the long audio in an overlapping manner into a plurality of audio clips of a first preset duration.
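The splicing and re-cutting of claim 2 can be sketched with NumPy: concatenate every sample into one long signal, then slide a fixed-length window with overlap. The function name and the overlap parameter (in samples) are assumptions for illustration.

```python
import numpy as np

def splice_and_recut(clips, clip_len, overlap):
    """Concatenate all sample clips into one long signal, then cut it
    back into fixed-length clips of clip_len samples, with consecutive
    clips overlapping by `overlap` samples."""
    long_audio = np.concatenate(clips)
    hop = clip_len - overlap  # step between successive windows
    starts = range(0, len(long_audio) - clip_len + 1, hop)
    return [long_audio[s:s + clip_len] for s in starts]
```

Two 5-sample clips re-cut with clip_len=4 and overlap=2 give a 10-sample long audio and four overlapping 4-sample clips.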
3. The convolutional neural network-based noise source classification method of claim 1, wherein the audio pitch variation comprises: individually applying random pitch-shift processing to each noise sample audio to obtain an equal number of audios.
4. The convolutional neural network-based noise source classification method of claim 1, wherein the audio speed variation comprises: individually applying random speed-change processing to each noise sample audio, and comparing the duration of the speed-changed audio with the first preset duration; if the duration does not reach the first preset duration, self-splicing the speed-changed audio so that its duration equals the first preset duration; if the duration exceeds the first preset duration, cutting the speed-changed audio so that its duration equals the first preset duration.
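A sketch of claim 4's duration handling, assuming the speed change itself is done by crude linear-interpolation resampling (a stand-in for a proper pitch-preserving time stretch, which the patent does not specify): results that come out shorter than the target are self-spliced up to it, longer ones are truncated. All names are hypothetical.

```python
import numpy as np

def speed_change(audio, rate):
    """Crude speed change by resampling: rate > 1 speeds up (shorter output)."""
    n_out = int(round(len(audio) / rate))
    idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(idx, np.arange(len(audio)), audio)

def fit_to_duration(audio, target_len):
    """Self-splice if too short, truncate if too long (claim 4)."""
    if len(audio) < target_len:
        reps = -(-target_len // len(audio))  # ceiling division
        audio = np.tile(audio, reps)
    return audio[:target_len]
```

A 100-sample clip sped up at rate 2.0 becomes 50 samples and is self-spliced back to 100; slowed at rate 0.5 it becomes 200 samples and is cut back to 100.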
5. The convolutional neural network-based noise source classification method of claim 1, wherein the noise addition comprises: adding ambient noise at a random signal-to-noise ratio to each noise sample audio to obtain an equal number of audios.
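The noise addition of claim 5 can be sketched as mixing ambient noise scaled to a target signal-to-noise ratio; how the random SNR is drawn is left open by the claim, so the function below takes it as a parameter and its name is an assumption.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db,
    then mix it into the clean sample."""
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Measuring the SNR of the returned mixture against the clean input recovers the requested value exactly, since the scaling is derived from the power ratio.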
6. The convolutional neural network-based noise source classification method of claim 1, wherein the random cropping comprises: randomly cropping each noise sample audio to a second preset duration according to audio feature points, and splicing the randomly cropped audio with itself up to the first preset duration.
7. An apparatus for noise source classification based on convolutional neural network, comprising:
an industrial personal computer comprising a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the method of any one of claims 1-6;
the microphone is electrically connected with the processor;
and the display screen is electrically connected with the processor.
8. A computer readable storage medium storing program code for execution by a device, the program code comprising steps for performing the method of any one of claims 1-6.
CN202311208076.9A 2023-09-19 2023-09-19 Convolutional neural network-based noise source classification method and device Active CN116959477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311208076.9A CN116959477B (en) 2023-09-19 2023-09-19 Convolutional neural network-based noise source classification method and device


Publications (2)

Publication Number Publication Date
CN116959477A CN116959477A (en) 2023-10-27
CN116959477B true CN116959477B (en) 2023-12-19

Family

ID=88454923


Country Status (1)

Country Link
CN (1) CN116959477B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117690451B (en) * 2024-01-29 2024-04-16 杭州爱华仪器有限公司 Neural network noise source classification method and device based on ensemble learning

Citations (10)

Publication number Priority date Publication date Assignee Title
CN105868785A (en) * 2016-03-30 2016-08-17 乐视控股(北京)有限公司 Image identification method based on convolutional neural network and image identification system thereof
WO2020037960A1 (en) * 2018-08-21 2020-02-27 深圳大学 Sar target recognition method and apparatus, computer device, and storage medium
CN111915506A (en) * 2020-06-19 2020-11-10 西安电子科技大学 Method for eliminating stripe noise of sequence image
CN111951823A (en) * 2020-08-07 2020-11-17 腾讯科技(深圳)有限公司 Audio processing method, device, equipment and medium
KR20210034429A (en) * 2019-09-20 2021-03-30 아주대학교산학협력단 Apparatus and method for classificating point cloud using neighbor connectivity convolutional neural network
CN113963719A (en) * 2020-07-20 2022-01-21 东声(苏州)智能科技有限公司 Deep learning-based sound classification method and apparatus, storage medium, and computer
CN115019760A (en) * 2022-05-19 2022-09-06 上海理工大学 Data amplification method for audio and real-time sound event detection system and method
CN115050356A (en) * 2022-06-07 2022-09-13 中山大学 Noise identification method and device and computer readable storage medium
CN115358718A (en) * 2022-08-24 2022-11-18 广东旭诚科技有限公司 Noise pollution classification and real-time supervision method based on intelligent monitoring front end
CN115456088A (en) * 2022-09-19 2022-12-09 浙江科技学院 Motor fault classification method based on self-encoder feature lifting dimension and multi-sensing fusion


Non-Patent Citations (1)

Title
Diagonal State Space Augmented Transformers for Speech Recognition; George Saon et al.; ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant