CN110751044B - Urban noise identification method based on deep network migration characteristics and augmented self-coding


Info

Publication number
CN110751044B
Authority
CN
China
Prior art keywords
output
self
spectrogram
augmented
training
Prior art date
Legal status
Active
Application number
CN201910886926.8A
Other languages
Chinese (zh)
Other versions
CN110751044A (en)
Inventor
曹九稳 (Jiuwen Cao)
崔小南 (Xiaonan Cui)
王天磊 (Tianlei Wang)
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910886926.8A
Publication of CN110751044A
Application granted
Publication of CN110751044B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The invention discloses an urban noise identification method based on deep network migration characteristics and augmented self-coding. The invention comprises the following steps: 1. preprocessing each type of collected urban noise signal, including denoising, framing and windowing; 2. converting the processed noise signals into spectrograms; 3. performing feature extraction on the spectrograms obtained in step 2 with several pre-trained deep convolutional neural networks; 4. fusing the obtained feature vectors x with an augmented auto-encoder; 5. constructing a multi-layer one-class classification model on the basis of the fused features of step 4; 6. calculating the output weight and decision threshold of the ML-OCRLS; 7. carrying out classification prediction on unknown signals. The hidden-layer neurons of the proposed augmented auto-encoder can optimize all the features; the ML-OCRLS built on the augmented auto-encoder extracts the main information, reduces feature redundancy, and at the same time effectively fuses the various transfer-learning features, improving the classification accuracy of the classifier.

Description

Urban noise identification method based on deep network migration characteristics and augmented self-coding
Technical Field
The invention belongs to the field of sound signal identification and relates to an urban noise identification method based on deep network migration characteristics and augmented self-coding.
Background
With the advance of urbanization, noise pollution has become increasingly serious and greatly affects people's quality of life and health. Identifying the typical kinds of urban environmental noise and treating them accordingly is vital for monitoring and controlling urban noise pollution. Most existing methods perform urban noise recognition with traditional speech features combined with classifier algorithms. These methods have the following problems: 1) for the many and relatively complex categories of urban noise signals, traditional speech features cannot represent the signals effectively; 2) multi-feature urban noise identification methods relieve the problem that a single feature cannot represent all types of urban noise, but their fusion still stays at simple splicing, addition or multiplication; 3) traditional shallow multi-class classifier algorithms have limited generalization capability and complex model-updating procedures, and are ill-suited to modeling complex and changeable urban environmental noise.
Disclosure of Invention
To overcome these problems in urban noise identification, the invention provides an urban noise identification method based on deep network migration characteristics and augmented self-encoding. Each of the existing problems is addressed in turn. 1) Because traditional speech features cannot effectively express urban noise, several deep convolution networks pre-trained on ImageNet are adopted as feature extractors to obtain multiple convolution features from the urban-noise spectrogram; through convolution, pooling and similar operations, the deep convolution networks progressively extract deep image features and can learn the rich nonlinear information in the spectrogram, so the extracted convolution features have stronger clustering characteristics and generalization capability, and using several convolution features compensates for the inability of a single convolution feature to represent all noise signals effectively. 2) To fuse the multiple convolution features effectively, an augmented auto-encoder (AAE) is proposed, yielding a higher-level representation of the urban noise signals. 3) A multi-layer one-class classification model is proposed: a one-class model learns a data description of the target class from data containing only that class, and judges whether an unknown sample belongs to the target class against a threshold on a similarity measure instead of assigning it to a predefined class; one-class classification can therefore accurately identify new types of noise signal, and when a new class of noise is added only the corresponding one-class classifier needs to be trained, avoiding repeated training on known classes and shortening the training time of the classification model.
The technical scheme of the invention mainly comprises the following steps:
Step 1, preprocessing each type of collected urban noise signal, including denoising, framing and windowing, wherein the frame length is L and the frame shift is L/2.
Step 2, converting the processed noise signals into spectrograms.
2-1, performing a fast Fourier transform on the preprocessed sound-signal frames, taking the frequency bands after the transform as the vertical coordinate and the successive frames as the horizontal coordinate, so as to construct a two-dimensional image matrix in which each pixel is the energy of the current frame in the corresponding band;
2-2, calculating the spectral energy density of each pixel and expressing it by color intensity, obtaining an image carrying three dimensions of information, namely the spectrogram.
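By way of illustration, the following minimal Python/numpy sketch follows steps 2-1 and 2-2, assuming `frames` is the matrix of preprocessed, windowed frames produced by step 1 (one frame per row); the function name and the log-scaling of the energy density are our own choices, not taken from the patent.

```python
import numpy as np

def spectrogram(frames):
    """frames: (n_frames, L) matrix of windowed frames from step 1."""
    L = frames.shape[1]
    spec = np.fft.rfft(frames, axis=1)         # fast Fourier transform per frame
    power = np.abs(spec) ** 2 / L              # spectral energy density per band
    return 10.0 * np.log10(power + 1e-10).T    # rows: frequency bands, cols: frames

frames = np.random.randn(61, 512)              # stand-in for step-1 output
img = spectrogram(frames)
print(img.shape)                               # (257, 61): frequency x time image
```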
Step 3, performing feature extraction on the spectrogram obtained in step 2 with several pre-trained deep convolutional neural networks. Suppose M deep convolution networks are used: first the spectrogram is cropped and scaled according to each adopted network, then the adjusted spectrogram is input into the networks and the output of each network's last fully-connected layer is taken as its extracted features, giving D_1-, ..., D_M-dimensional feature vectors for the M networks respectively; the M features are spliced to obtain the spliced feature vector x.
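As a hedged illustration of step 3, the sketch below uses two ImageNet-pretrained torchvision networks (inception_v3 and resnet152) as fixed feature extractors; inception_resnet_v2 is omitted because it is not bundled with torchvision. Replacing each network's final fully-connected layer with Identity, so the forward pass returns the 2048-dimensional penultimate activations, is one common reading of "extract the output of the last fully-connected layer"; the input normalization expected by the pretrained weights is left out for brevity, and the weights download on first use.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

def make_extractor(arch):
    net = arch(weights="DEFAULT")   # ImageNet-pretrained weights
    net.fc = nn.Identity()          # return penultimate activations instead of logits
    return net.eval()

# input sizes follow each network's expected resolution
extractors = [(299, make_extractor(models.inception_v3)),
              (224, make_extractor(models.resnet152))]

def extract(img: Image.Image) -> torch.Tensor:
    feats = []
    with torch.no_grad():
        for size, net in extractors:           # crop/scale per network, then forward
            t = transforms.Compose([transforms.Resize((size, size)),
                                    transforms.ToTensor()])(img).unsqueeze(0)
            feats.append(net(t).squeeze(0))
    return torch.cat(feats)                    # spliced feature vector x

x = extract(Image.new("RGB", (300, 300)))      # stand-in spectrogram image
print(x.shape)                                 # torch.Size([4096]) for these two nets
```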
Step 4, fusing the obtained feature vectors x with the augmented auto-encoder.
4-1, step 3 yields the training data set X = [x_1, ..., x_N], where x_n (n = 1, ..., N) is the feature vector formed by splicing the M features of the nth spectrogram and N is the number of samples. For each sample in X, set every convolution feature except the mth (m = 1, ..., M) to 0, obtaining M new data sets X_m. Then keep two different convolution features of X, set the others to 0 and take the feature pairs in turn, obtaining M(M-1)/2 new data sets; continuing in this way over all feature subsets gives a total of 2^M - 1 new data sets, which are merged to obtain X_x as the input of the augmented auto-encoder, while X_y = {X, ..., X} (the full spliced data set repeated 2^M - 1 times) is constructed as the output of the auto-encoder.
For example, when two deep convolution networks are used to extract features from the spectrogram, the feature vectors obtained for the nth sample are x_n^(1) and x_n^(2), which are spliced into the feature vector x_n = [x_n^(1) x_n^(2)]^T, and the training data set is X = [X^(1) X^(2)]^T. Construct X_1 = [X^(1) 0]^T, X_2 = [0 X^(2)]^T and X_12 = [X^(1) X^(2)]^T; merge the three new data sets into X_x = {X_1, X_2, X_12} as the input of the augmented auto-encoder, and construct X_y = {X, X, X} as its output.
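Read this way, X_x enumerates every non-empty subset of the M feature blocks, with the excluded blocks zeroed, and X_y repeats the full spliced data set once per subset. A small numpy sketch under that reading (function and variable names are our own):

```python
import numpy as np
from itertools import combinations

def augment(blocks):
    """blocks: list of M arrays, each (D_m, N). Returns AAE input X_x and target X_y."""
    M = len(blocks)
    X = np.concatenate(blocks, axis=0)               # full spliced data set, (sum D_m, N)
    starts = np.cumsum([0] + [b.shape[0] for b in blocks])
    inputs, targets = [], []
    for r in range(1, M + 1):
        for keep in combinations(range(M), r):       # every non-empty feature subset
            Xm = np.zeros_like(X)
            for m in keep:                           # copy kept blocks, leave rest at 0
                Xm[starts[m]:starts[m + 1]] = blocks[m]
            inputs.append(Xm)
            targets.append(X)                        # the target is always the full X
    return np.hstack(inputs), np.hstack(targets)

X_x, X_y = augment([np.random.randn(4, 5), np.random.randn(3, 5)])
print(X_x.shape, X_y.shape)   # (7, 15) (7, 15): 2^2 - 1 = 3 masked copies of 5 samples
```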
4-2, randomly initialize the coding-layer weight and bias W_01 and b_01 and the decoding-layer weight and bias W_02 and b_02. The hidden-layer output of the AAE is H = g(W_01 X_x + b_01) and the output of the output layer is Φ = σ(W_02 H + b_02). Construct the following loss function:

J = (1/2)||Φ - X_y||^2 + (λ/2)(||W_01||^2 + ||W_02||^2) + β Σ_i KL(ρ || ρ̂_i)

where λ is the weight-decay parameter; ρ is the sparsity parameter, typically ρ > 0 and close to 0; ρ̂_i is the average activation value of the ith hidden-layer neuron of the self-encoder; β is a penalty factor; and the penalty term is

KL(ρ || ρ̂_i) = ρ log(ρ/ρ̂_i) + (1 - ρ) log((1 - ρ)/(1 - ρ̂_i)).
4-3, train the augmented self-encoder by stochastic gradient descent, and extract the trained coding-layer weight W_01 and bias b_01.
4-4, finally, encode the training data set X with the trained AAE; the coding output H_0 = g(W_01 X + b_01) is the fused feature.
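A hedged PyTorch sketch of steps 4-2 to 4-4: a single-hidden-layer auto-encoder is trained on (X_x, X_y) by stochastic gradient descent, with the optimizer's weight decay standing in for the λ term and a KL divergence implementing the sparsity penalty. The sigmoid activations, layer sizes, learning rate and epoch count are illustrative assumptions.

```python
import torch
import torch.nn as nn

d, h = 7, 16                                        # input and hidden sizes (assumed)
enc, dec = nn.Linear(d, h), nn.Linear(h, d)
opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()),
                      lr=0.1, weight_decay=1e-4)    # weight decay plays the role of lambda
rho, beta = 0.05, 3.0                               # sparsity target and penalty factor

Xx = torch.rand(100, d)                             # stand-in for the augmented input X_x
Xy = torch.rand(100, d)                             # stand-in for the target X_y

for epoch in range(200):
    H = torch.sigmoid(enc(Xx))                      # hidden output H = g(W01 X_x + b01)
    out = torch.sigmoid(dec(H))                     # reconstruction Phi
    rho_hat = H.mean(dim=0).clamp(1e-6, 1 - 1e-6)   # mean activation of each hidden neuron
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
    loss = 0.5 * ((out - Xy) ** 2).sum(dim=1).mean() + beta * kl
    opt.zero_grad(); loss.backward(); opt.step()

H0 = torch.sigmoid(enc(Xy))     # step 4-4: encode the original training set (Xy stands in)
```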
Step 5, constructing a multi-layer one-class classification model (ML-OCRLS) on the basis of the fused features of step 4.
5-1, extract the weight W_01 and bias b_01 of the AAE coding layer as the parameters between the input layer and the first hidden layer of the ML-OCRLS;
5-2, so that structural patterns in the input data can be discovered, a conventional sparse self-encoder is trained with H_0 as both input and output; the sparse self-encoder adds a sparsity constraint on hidden-neuron activations, with the following loss function:

J_1 = (1/2)||Φ_1 - H_0||^2 + (λ/2)(||W_11||^2 + ||W_12||^2) + β Σ_i KL(ρ || ρ̂_i)

where Φ_1 is the actual output of the sparse self-encoder.
5-3, train the self-encoder by stochastic gradient descent, then extract the trained coding-layer weight W_11 and bias b_11 as the parameters between the first and second hidden layers of the ML-OCRLS; the hidden-layer output H_1 = g(W_11 H_0 + b_11) serves as input and output for training the next self-encoder.
5-4, train several self-encoders in sequence; after training, the ML-OCRLS parameters W_01, ..., W_k1 and b_01, ..., b_k1 are obtained, where k is the number of self-encoders. Compute the hidden-layer output of the kth self-encoder, H_k = g(W_k1 H_{k-1} + b_k1).
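Steps 5-1 to 5-4 amount to greedy layer-wise training: each sparse self-encoder is trained with the previous hidden output as both input and target, and only its encoder half is kept as an ML-OCRLS layer. A self-contained sketch under that reading (hyperparameter values are assumptions):

```python
import torch
import torch.nn as nn

def train_sparse_ae(H, h, rho=0.05, beta=3.0, epochs=200, lr=0.1, lam=1e-4):
    """Minimal sparse auto-encoder trained with H as both input and target (step 5-2)."""
    enc, dec = nn.Linear(H.shape[1], h), nn.Linear(h, H.shape[1])
    opt = torch.optim.SGD([*enc.parameters(), *dec.parameters()],
                          lr=lr, weight_decay=lam)
    for _ in range(epochs):
        Hh = torch.sigmoid(enc(H))
        rho_hat = Hh.mean(0).clamp(1e-6, 1 - 1e-6)
        kl = (rho * torch.log(rho / rho_hat)
              + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
        loss = 0.5 * ((torch.sigmoid(dec(Hh)) - H) ** 2).sum(1).mean() + beta * kl
        opt.zero_grad(); loss.backward(); opt.step()
    return enc.weight.detach(), enc.bias.detach()   # keep only the encoder half

def stack_layers(H0, hidden_sizes):
    layers, H = [], H0
    for h in hidden_sizes:                          # one sparse AE per hidden layer
        W, b = train_sparse_ae(H, h)
        layers.append((W, b))
        H = torch.sigmoid(H @ W.T + b)              # H_k = g(W_k1 H_{k-1} + b_k1)
    return layers, H                                # H is the top hidden output H_k

layers, Hk = stack_layers(torch.rand(100, 16), [12, 8])
print(Hk.shape)                                     # torch.Size([100, 8])
```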
Step 6, calculating the output weight and the decision threshold of the ML-OCRLS.
6-1, set the expected output of the training samples T = [t_1, ..., t_N]^T = [1, ..., 1]^T and solve the minimization problem:

min_β (1/2)||β||^2 + (C/2)||T - H_k β||^2

where C is the trade-off coefficient between the two terms and β is the output-weight matrix of the ML-OCRLS. Solving the above problem gives:

β = (H_k^T H_k + I/C)^{-1} H_k^T T

obtaining the actual output of the training samples, O = H_k β;
6-2, calculate the distance of each sample from the target class by the formula:

d(x_i) = |o_i - t_i| = |ε_i|

Sort the distances in descending order to obtain d = [d_1, ..., d_N], and set the classification decision threshold θ = d_floor(μ·N), where μ is a threshold parameter.
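Because the solution of step 6-1 is in closed form, the output weight and the threshold can be computed directly. A numpy sketch, assuming H_k is the N x h top hidden-layer output and θ is taken as the floor(μ·N)-th largest distance (1-based):

```python
import numpy as np

def fit_output(Hk, C=1.0, mu=0.1):
    """Closed-form output weight and decision threshold of step 6."""
    N, h = Hk.shape
    T = np.ones((N, 1))                                   # expected output t_i = 1
    beta = np.linalg.solve(Hk.T @ Hk + np.eye(h) / C,     # (Hk'Hk + I/C)^-1 Hk'T
                           Hk.T @ T)
    d = np.sort(np.abs(Hk @ beta - T).ravel())[::-1]      # distances, descending
    theta = d[max(int(np.floor(mu * N)), 1) - 1]          # the floor(mu*N)-th largest
    return beta, theta

beta, theta = fit_output(np.random.rand(100, 8))
print(beta.shape, float(theta))
```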
Step 7, carrying out classification prediction on unknown signals.
For an unknown sound signal z: after preprocessing, convert z into a spectrogram; extract features with the same deep convolution networks to obtain the M features; splice them and input the result to the ML-OCRLS for fusion and classification, whose output is o(z) = h_k(z) β, where h_k(z) is the kth hidden-layer output for z. Then compute the distance of signal z from the target class:

d(z) = |o(z) - 1|

and judge the class of the unknown sample according to the decision function:

f(z) = target class if d(z) ≤ θ; otherwise non-target class.
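A small numpy sketch of the prediction path of step 7, assuming the fused feature vector of z has already been computed and that `layers`, `beta` and `theta` are numpy counterparts of the quantities fitted in the sketches above:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(x, layers, beta, theta):
    """x: fused feature vector of z; layers: [(W, b), ...] from the stacked encoders."""
    H = x[None, :]
    for W, b in layers:                  # propagate: H_k = g(W_k1 H_{k-1} + b_k1)
        H = sigmoid(H @ W.T + b)
    d = abs(float(H @ beta) - 1.0)       # distance of z from the target class
    return d <= theta                    # True: target class, False: non-target/new class
```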
the invention has the following beneficial effects:
the invention trains corresponding ML-OCRLS aiming at different types of urban noise signals, can accurately identify the new type of urban noise, does not need to retrain the known category when the urban noise increases the new category, and can obviously reduce the model training time. Compared with the traditional voice feature extraction method and the method using a single deep convolution network as a feature extractor, the method uses a plurality of deep convolution networks to extract features of the spectrogram, the extracted features can effectively represent all noise signals, and richer and more detailed information of the spectrogram can be obtained. Compared with a common feature fusion method, the hidden layer neuron of the augmented self-encoder provided by the invention can optimize all features, main information can be extracted based on the ML-OCRLS of the augmented self-encoder, feature redundancy is reduced, and meanwhile, various transfer learning features are effectively fused, so that the classification precision of a classifier is improved.
Drawings
FIG. 1 is a flow chart of the proposed urban noise identification method based on deep network migration features and augmented self-coding;
FIG. 2(a) shows the structure of the inception_v3 model;
FIG. 2(b) shows the structure of the resnet152 model;
FIG. 2(c) shows the structure of the inception_resnet_v2 model;
FIG. 3 is a diagram of the ML-OCRLS network architecture.
Detailed Description
The invention is further illustrated by the following figures and examples.
Taking 11 kinds of urban noise signal as an example, the invention is further explained using three deep convolutional neural networks pre-trained on ImageNet, namely inception_v3, resnet152 and inception_resnet_v2, as feature extractors. The following description is exemplary and explanatory only and does not restrict the invention in any way.
As shown in FIG. 1, the urban noise identification method based on deep network migration features and augmented self-coding is specifically implemented as follows:
Step 1, preprocessing each type of collected urban noise signal, including denoising, framing and windowing, wherein the frame length is L and the frame shift is L/2.
1-1 Normalization and pre-emphasis
First, normalize the amplitude of the collected urban noise signal to [-1, 1] to reduce the influence of amplitude differences on the recognition result; then pre-emphasize the signal with a first-order high-pass filter, whose transfer function is H(z) = 1 - u·z^(-1) with u in the range [0.9, 1];
1-2 Framing and windowing
Frame the preprocessed signal to obtain quasi-stationary short-time signals, and apply a window function to each frame to reduce the spectral leakage of the framed sound signal: each frame of the sound signal x(n) is multiplied by a window function w(n) of the same length, giving the windowed frame x_i(n) = w(n)·x(n). A Hanning window is used here:

w(n) = 0.5 (1 - cos(2πn/(L - 1))), 0 ≤ n ≤ L - 1
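A numpy sketch combining steps 1-1 and 1-2: amplitude normalization, first-order pre-emphasis H(z) = 1 - u·z^(-1), framing and Hanning windowing. The values u = 0.97 (inside the stated [0.9, 1] range), L = 512 and the 50% frame shift are illustrative assumptions; denoising is omitted.

```python
import numpy as np

def preprocess(x, L=512, u=0.97):
    x = x / (np.abs(x).max() + 1e-12)                 # normalize amplitude to [-1, 1]
    x = np.append(x[0], x[1:] - u * x[:-1])           # pre-emphasis: H(z) = 1 - u z^-1
    shift = L // 2                                    # assumed 50% frame shift
    n = 1 + (len(x) - L) // shift
    w = 0.5 * (1 - np.cos(2 * np.pi * np.arange(L) / (L - 1)))   # Hanning window
    return np.stack([x[i*shift : i*shift + L] * w for i in range(n)])

frames = preprocess(np.random.randn(16000))           # stand-in noise recording
print(frames.shape)                                   # (61, 512) windowed frames
```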
and 2, converting the processed sound signal into a spectrogram. Firstly, performing fast Fourier transform on preprocessed sound signal data, taking each frequency band after the transform as a vertical coordinate, and simultaneously using multi-frame signals as a horizontal coordinate to construct a two-dimensional image matrix, wherein each pixel point is the energy of the frame signal in the corresponding frequency band, then calculating the spectral energy density of each point, and expressing the energy density by the shade of the tone to obtain a three-dimensional sound spectrogram.
Step 3, performing feature extraction on the spectrogram obtained in step 2 with three pre-trained deep convolutional neural networks, namely inception_v3, resnet152 and inception_resnet_v2.
First crop and scale the spectrogram for each network: the input image sizes of inception_v3, resnet152 and inception_resnet_v2 are 299 x 299 x 3, 224 x 224 x 3 and 299 x 299 x 3, respectively. The cropped and scaled spectrograms are input into the networks to extract deep features;
The inception_v3 network introduces the idea of factorizing into small convolutions: a larger two-dimensional convolution is decomposed into two smaller one-dimensional convolutions, which saves a large number of parameters, speeds up computation and alleviates overfitting, while the added layer of nonlinearity expands the model's expressive power, allowing more and richer spatial features to be processed and increasing feature diversity. As shown in the inception_v3 structure diagram of FIG. 2(a), the 2048-dimensional features at the network's last fully-connected linear (logits) layer are extracted as the inception_v3 features;
The resnet network introduces the residual learning unit to alleviate the degradation of deep neural networks; resnet152 has a very deep structure, so its features are more abstract and semantically richer. As shown in the resnet152 structure diagram of FIG. 2(b), the 2048-dimensional features of conv5_x, the network's last stage, are extracted as the resnet152 features;
The inception_resnet_v2 network adds resnet-style shortcut connections inside the inception modules, which avoids the degradation problem of very deep structures and also reduces training time. As shown in the inception_resnet_v2 structure diagram of FIG. 2(c), the 1536-dimensional features of the network's last fully-connected layer are extracted as the inception_resnet_v2 features;
The three features are then spliced into the 5632-dimensional feature vector x = [x^(1) x^(2) x^(3)]^T.
Step 4, fusing the obtained feature vectors with the augmented auto-encoder. Step 3 yields the training data set X = [x_1, ..., x_N], where x_n is the feature vector formed by splicing the three features of the nth spectrogram and N is the number of samples. For each sample in X, set every convolution feature except the mth (m = 1, 2, 3) to 0, obtaining three new data sets X_1 = [X^(1) 0 0]^T, X_2 = [0 X^(2) 0]^T and X_3 = [0 0 X^(3)]^T. Then keep two different convolution features of X, set the other to 0, and take the feature pairs in turn, obtaining three new data sets X_12 = [X^(1) X^(2) 0]^T, X_13 = [X^(1) 0 X^(3)]^T and X_23 = [0 X^(2) X^(3)]^T. Finally keep all three features in X, obtaining the new data set X_123 = [X^(1) X^(2) X^(3)]^T. Merge them to obtain X_x = {X_1, X_2, X_3, X_12, X_13, X_23, X_123} as the input of the augmented auto-encoder, and construct X_y = {X, X, X, X, X, X, X} as its output.
Then randomly initialize the weights and biases of the coding and decoding layers, W_01, b_01, W_02 and b_02; the hidden-layer output of the AAE is H = g(W_01 X_x + b_01) and the actual output is Φ = σ(W_02 H + b_02). The loss function is constructed as in step 4-2:

J = (1/2)||Φ - X_y||^2 + (λ/2)(||W_01||^2 + ||W_02||^2) + β Σ_i KL(ρ || ρ̂_i)

where the weight-decay parameter λ and the penalty factor β are determined by grid search, and the sparsity parameter ρ is empirically set to 0.05.
Train the augmented self-encoder by stochastic gradient descent and extract the trained coding-layer weight W_01 and bias b_01.
Finally, encode the training data set X with the trained AAE; the coding output H_0 = g(W_01 X + b_01) is the fused feature.
Step 5, constructing the multi-layer one-class classification model (ML-OCRLS) on the basis of the fused features of step 4. First, the extracted AAE coding-layer weight W_01 and bias b_01 serve as the parameters between the input layer and the first hidden layer of the ML-OCRLS. Then a conventional sparse self-encoder is trained with H_0 as both input and output; so that structural patterns in the input data can be discovered, the sparse self-encoder adds a sparsity constraint on hidden-neuron activations, with the loss function

J_1 = (1/2)||Φ_1 - H_0||^2 + (λ/2)(||W_11||^2 + ||W_12||^2) + β Σ_i KL(ρ || ρ̂_i)

where Φ_1 is the actual output of the self-encoder. Train the self-encoder by stochastic gradient descent, then extract the trained coding-layer weight W_11 and bias b_11 as the parameters between the first and second hidden layers of the ML-OCRLS; the hidden-layer output H_1 = g(W_11 H_0 + b_11) serves as input and output for training the next self-encoder.
Train several self-encoders in sequence; after training, the ML-OCRLS parameters W_01, ..., W_k1 and b_01, ..., b_k1 are obtained, where k is the number of self-encoders. As shown in the ML-OCRLS network structure diagram of FIG. 3, compute the hidden-layer output of the kth self-encoder, H_k = g(W_k1 H_{k-1} + b_k1).
Step 6, calculating the output weight and the decision threshold of the ML-OCRLS. Set the expected output of the training samples T = [t_1, ..., t_N]^T = [1, ..., 1]^T and solve the minimization problem

min_β (1/2)||β||^2 + (C/2)||T - H_k β||^2

where C is the trade-off coefficient between the two terms and β is the output-weight matrix of the ML-OCRLS. Solving the above problem gives

β = (H_k^T H_k + I/C)^{-1} H_k^T T

and the actual output of the training samples, O = H_k β. Then calculate the distance of each sample from the target class:

d(x_i) = |o_i - t_i| = |ε_i|

Sort the distances in descending order, d = [d_1, ..., d_N] with d_1 ≥ ... ≥ d_N, and set the classification decision threshold θ = d_floor(μ·N), where μ is a threshold parameter, empirically set here to 0.1.
Step 7, making a classification decision on unknown signals. For an unknown sound signal z: after preprocessing, convert the signal into a spectrogram; extract features with the same deep convolution networks to obtain the three features; splice them and input the result to the ML-OCRLS for fusion and classification, whose output is o(z) = h_k(z) β. Then compute the distance of z from the target class,

d(z) = |o(z) - 1|

and judge the class of the unknown sample according to the decision function:

f(z) = target class if d(z) ≤ θ; otherwise non-target class.

Claims (1)

1. The urban noise identification method based on the deep network migration characteristic and the augmented self-coding is characterized by comprising the following steps of:
step 1, preprocessing each type of collected urban noise signal, including denoising, framing and windowing, wherein the frame length is L and the frame shift is L/2;
step 2, converting the processed noise signals into spectrograms;
step 3, extracting features from the spectrogram obtained in step 2 by using a plurality of pre-trained deep convolutional neural networks: supposing that M deep convolution networks are used, firstly cropping and scaling the spectrogram according to the adopted networks, then inputting the adjusted spectrogram into the networks and extracting the output of each network's last fully-connected layer as its features, obtaining D_1-, ..., D_M-dimensional feature vectors for the M networks respectively, and splicing the M features to obtain the spliced feature vector x;
step 4, fusing the obtained feature vector x by using the augmented self-encoder;
step 5, constructing the multi-layer one-class classification model ML-OCRLS on the basis of the fused features of step 4;
step 6, calculating the output weight and the decision threshold of the ML-OCRLS;
step 7, carrying out classification prediction on unknown signals;
the step 2 is realized as follows:
2-1, performing a fast Fourier transform on the preprocessed sound-signal frames, taking the frequency bands after the transform as the vertical coordinate and the successive frames as the horizontal coordinate, so as to construct a two-dimensional image matrix in which each pixel is the energy of the current frame in the corresponding band;
2-2, calculating the spectral energy density of each pixel and expressing it by color intensity, obtaining an image carrying three dimensions of information, namely the spectrogram;
and 4, fusing the obtained feature vector x by using the augmented self-encoder, and specifically realizing the following steps:
4-1, obtaining through step 3 the training data set X = [x_1, ..., x_N], where x_n (n = 1, ..., N) is the feature vector formed by splicing the M features of the nth spectrogram and N is the number of samples; setting every convolution feature of each sample in X except the mth (m = 1, ..., M) to 0, obtaining M new data sets X_m; then keeping two different convolution features of X, setting the others to 0 and taking the feature pairs in turn, obtaining M(M-1)/2 new data sets; and so on, for a total of 2^M - 1 new data sets, which are merged to obtain X_x' as the input of the augmented self-encoder, and constructing X_y = {X, ..., X} (the full spliced data set repeated 2^M - 1 times) as the output of the augmented self-encoder;
4-2, randomly initializing the coding-layer weight and bias W_01 and b_01 and the decoding-layer weight and bias W_02 and b_02; the hidden-layer output of the AAE is H = g(W_01 X_x' + b_01) and the output of the output layer is Φ = σ(W_02 H + b_02); constructing the following loss function:

J = (1/2)||Φ - X_y||^2 + (λ/2)(||W_01||^2 + ||W_02||^2) + β Σ_i KL(ρ || ρ̂_i)

where λ is the weight-decay parameter; ρ is the sparsity parameter, usually ρ > 0 and close to 0; ρ̂_i is the average activation value of the ith hidden-layer neuron of the self-encoder; β is a penalty factor; and the penalty term is

KL(ρ || ρ̂_i) = ρ log(ρ/ρ̂_i) + (1 - ρ) log((1 - ρ)/(1 - ρ̂_i));
4-3, training the augmented self-encoder by stochastic gradient descent and extracting the trained coding-layer weight W_01 and bias b_01;
4-4, finally, encoding the training data set X with the trained AAE, the coding output being H_0 = g(W_01 X + b_01), where H_0 is the fused feature;
the step 4-1 is specifically realized as follows:
using two deep convolution networks to extract features from the spectrogram, the feature vectors obtained for the nth sample are x_n^(1) and x_n^(2), which are spliced into the feature vector x_n = [x_n^(1) x_n^(2)]^T; the training data set is X = [X^(1) X^(2)]^T; respectively constructing X_1 = [X^(1) 0]^T, X_2 = [0 X^(2)]^T and X_12 = [X^(1) X^(2)]^T, merging the three new data sets to obtain X_x' = {X_1, X_2, X_12} as the input of the augmented self-encoder, and constructing X_y = {X, X, X} as the output of the augmented self-encoder;
step 5, on the basis of the fusion characteristics in step 4, constructing a multi-layer one-class classification model, which is specifically realized as follows:
5-1, extracting the weight W_01 and bias b_01 of the AAE coding layer as the parameters between the input layer and the first hidden layer of the ML-OCRLS;
5-2, using a sparse self-encoder trained with H_0 as both input and output; the sparse self-encoder adds a sparsity constraint on hidden-neuron activations, with the following loss function:

J_1 = (1/2)||Φ_1 - H_0||^2 + (λ/2)(||W_11||^2 + ||W_12||^2) + β Σ_i KL(ρ || ρ̂_i)

where Φ_1 is the actual output of the sparse self-encoder;
5-3, training the self-encoder by stochastic gradient descent, then extracting the trained coding-layer weight W_11 and bias b_11 as the parameters between the first and second hidden layers of the ML-OCRLS, and obtaining the hidden-layer output H_1 = g(W_11 H_0 + b_11), which serves as input and output for training the next self-encoder;
5-4, training several self-encoders in sequence, the ML-OCRLS parameters W_01, ..., W_k1 and b_01, ..., b_k1 being obtained after training, where k is the number of self-encoders; computing the hidden-layer output of the kth self-encoder, H_k = g(W_k1 H_{k-1} + b_k1);
The calculation of the output weight and the decision threshold of the ML-OCRLS in the step 6 is specifically realized as follows:
6-1, setting the expected output of the training samples T = [t_1, ..., t_N]^T = [1, ..., 1]^T and solving the minimization problem:

min_β' (1/2)||β'||^2 + (C/2)||T - H_k β'||^2

where C is the trade-off coefficient between the two terms and β' is the output-weight matrix of the ML-OCRLS; solving the above problem gives:

β' = (H_k^T H_k + I/C)^{-1} H_k^T T

obtaining the actual output of the training samples, O = H_k β';
6-2, calculating the distance of each sample from the target class by the formula:

d(x_i) = |o_i - t_i| = |ε_i|

sorting the distances in descending order to obtain d = [d_1, ..., d_N], and setting the classification decision threshold θ = d_floor(μ·N), where μ is a threshold parameter;
the classification prediction of the unknown signal in the step 7 is specifically realized as follows:
for an unknown sound signal z, converting the preprocessed z into a spectrogram, extracting features with the same deep convolution networks to obtain the M features, splicing them and inputting the result into the ML-OCRLS for fusion and classification, the output being o(z) = h_k(z) β'; then computing the distance of signal z from the target class,

d(z) = |o(z) - 1|

and judging the class of the unknown sample according to the decision function:

f(z) = target class if d(z) ≤ θ; otherwise non-target class.
CN201910886926.8A 2019-09-19 2019-09-19 Urban noise identification method based on deep network migration characteristics and augmented self-coding Active CN110751044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910886926.8A CN110751044B (en) 2019-09-19 2019-09-19 Urban noise identification method based on deep network migration characteristics and augmented self-coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910886926.8A CN110751044B (en) 2019-09-19 2019-09-19 Urban noise identification method based on deep network migration characteristics and augmented self-coding

Publications (2)

Publication Number Publication Date
CN110751044A CN110751044A (en) 2020-02-04
CN110751044B 2022-07-29

Family

ID=69276686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910886926.8A Active CN110751044B (en) 2019-09-19 2019-09-19 Urban noise identification method based on deep network migration characteristics and augmented self-coding

Country Status (1)

Country Link
CN (1) CN110751044B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401236A (en) * 2020-03-16 2020-07-10 西北工业大学 Underwater sound signal denoising method based on self-coding neural network
CN111653290B (en) * 2020-05-29 2023-05-02 北京百度网讯科技有限公司 Audio scene classification model generation method, device, equipment and storage medium
CN111833653A (en) * 2020-07-13 2020-10-27 江苏理工学院 Driving assistance system, method, device, and storage medium using ambient noise
CN111985533B (en) * 2020-07-14 2023-02-03 中国电子科技集团公司第三十六研究所 Incremental underwater sound signal identification method based on multi-scale information fusion
CN112086100B (en) * 2020-08-17 2022-12-02 杭州电子科技大学 Quantization error entropy based urban noise identification method of multilayer random neural network
CN111912521B (en) * 2020-08-17 2021-08-06 湖南五凌电力科技有限公司 Frequency detection method of non-stationary signal and storage medium
CN112614298A (en) * 2020-12-09 2021-04-06 杭州拓深科技有限公司 Composite smoke sensation monitoring method based on intra-class interaction constraint layering single classification
CN113065454B (en) * 2021-03-30 2023-01-17 青岛海信智慧生活科技股份有限公司 High-altitude parabolic target identification and comparison method and device
CN113112003A (en) * 2021-04-15 2021-07-13 东南大学 Data amplification and deep learning channel estimation performance improvement method based on self-encoder
CN113837154B (en) * 2021-11-25 2022-03-25 之江实验室 Open set filtering system and method based on multitask assistance
CN114724549B (en) * 2022-06-09 2022-09-06 广州声博士声学技术有限公司 Intelligent identification method, device, equipment and storage medium for environmental noise


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11163269B2 (en) * 2017-09-11 2021-11-02 International Business Machines Corporation Adaptive control of negative learning for limited reconstruction capability auto encoder

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293301A (en) * 2017-05-27 2017-10-24 深圳大学 Recognition methods and system based on dental articulation sound
CN107610692A (en) * 2017-09-22 2018-01-19 杭州电子科技大学 The sound identification method of self-encoding encoder multiple features fusion is stacked based on neutral net
CN108922560A (en) * 2018-05-02 2018-11-30 杭州电子科技大学 A kind of city noise recognition methods based on interacting depth neural network model
CN109829352A (en) * 2018-11-20 2019-05-31 中国人民解放军陆军工程大学 Communication fingerprint identification method integrating multilayer sparse learning and multi-view learning
CN109902393A (en) * 2019-03-01 2019-06-18 哈尔滨理工大学 Fault Diagnosis of Roller Bearings under a kind of variable working condition based on further feature and transfer learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on sound recognition algorithms for excavation equipment based on feature fusion; 曹九稳 (Jiuwen Cao); China Excellent Master's and Doctoral Theses Full-text Database (Master), Engineering Science and Technology II; 2019-01-15; full text *

Also Published As

Publication number Publication date
CN110751044A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN110751044B (en) Urban noise identification method based on deep network migration characteristics and augmented self-coding
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN110245608B (en) Underwater target identification method based on half tensor product neural network
CN110491416B (en) Telephone voice emotion analysis and identification method based on LSTM and SAE
CN112216271B (en) Audio-visual dual-mode speech recognition method based on convolution block attention mechanism
CN111627419B (en) Sound generation method based on underwater target and environmental information characteristics
CN106782511A (en) Amendment linear depth autoencoder network audio recognition method
CN106847309A (en) A kind of speech-emotion recognition method
CN111429947B (en) Speech emotion recognition method based on multi-stage residual convolutional neural network
CN113191178B (en) Underwater sound target identification method based on auditory perception feature deep learning
CN113488060B (en) Voiceprint recognition method and system based on variation information bottleneck
CN112183582A (en) Multi-feature fusion underwater target identification method
CN113111786B (en) Underwater target identification method based on small sample training diagram convolutional network
CN111276187A (en) Gene expression profile feature learning method based on self-encoder
CN113611293A (en) Mongolian data set expansion method
CN114694255B (en) Sentence-level lip language recognition method based on channel attention and time convolution network
Mu et al. Voice activity detection optimized by adaptive attention span transformer
CN113673323A (en) Underwater target identification method based on multi-depth learning model joint decision system
CN113851148A (en) Cross-library speech emotion recognition method based on transfer learning and multi-loss dynamic adjustment
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
CN117310668A (en) Underwater sound target identification method integrating attention mechanism and depth residual error shrinkage network
CN109741733B (en) Voice phoneme recognition method based on consistency routing network
CN113643722B (en) Urban noise identification method based on multilayer matrix random neural network
CN116417011A (en) Underwater sound target identification method based on feature fusion and residual CNN
CN114818789A (en) Ship radiation noise identification method based on data enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant