CN114282572A - Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics

Info

Publication number
CN114282572A
Authority
CN
China
Prior art keywords
mel
shufflenet
frequency
network
underwater sound
Prior art date
2021-12-14
Legal status
Granted
Application number
CN202111529853.0A
Other languages
Chinese (zh)
Other versions
CN114282572B (en)
Inventor
Zeng Xiangyang (曾向阳)
Yang Shuang (杨爽)
Wang Haitao (王海涛)
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
2021-12-14
Filing date
2021-12-14
Publication date
2022-04-05
Application filed by Northwestern Polytechnical University
Priority to CN202111529853.0A
Publication of CN114282572A
Application granted
Publication of CN114282572B
Active legal status
Anticipated expiration

Landscapes

  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features, in which the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x networks are used and modified to match the Mel spectrum features. The modifications comprise changing the shape of the input tensor of each network layer, changing the output channels, and changing the convolution kernel size of the GlobalPool layer. In addition, a batch normalization layer is added at the bottom of each of the two networks to normalize the mean and variance of each batch of input data. The identification method also uses data enhancement, which increases the sample data volume and improves the generalization ability of the model, and feature enhancement, which standardizes the range of the Mel spectrum feature samples. Classification experiments on several types of measured underwater acoustic targets show that this method, which fuses deep learning with artificial Mel spectrum features, currently achieves the best recognition effect.

Description

Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics
Technical Field
The invention belongs to the field of underwater acoustic target identification, and particularly relates to an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features.
Background
Underwater acoustic target recognition is an important component of underwater acoustic signal processing and an important technical support for acquiring and countering underwater acoustic information. Recently, deep CNN networks such as ResNet and DenseNet have been applied to underwater acoustic target recognition to improve the correct recognition rate. However, ResNet and DenseNet have high computational complexity, which slows network training and makes them difficult to deploy on small mobile devices. ShuffleNet is a CNN algorithm with extremely high computational efficiency; it adopts two new operations, pointwise group convolution and channel shuffle, which greatly reduce the computational cost while preserving accuracy. ShuffleNet V2 further improves classification accuracy compared with ShuffleNet. Therefore, the lightweight convolutional network ShuffleNet V2 is a better choice for deploying this kind of method on mobile devices and for the limited number of underwater acoustic target samples.
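For illustration, a minimal PyTorch sketch of the channel shuffle operation is given below; it mirrors the standard torchvision implementation and is not part of the claimed method.

```python
# Minimal sketch of ShuffleNet's channel shuffle operation (PyTorch),
# mirroring the standard torchvision implementation.
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so that information can flow
    between the pointwise group convolutions of successive blocks."""
    n, c, h, w = x.size()
    # reshape (N, C, H, W) -> (N, groups, C // groups, H, W)
    x = x.view(n, groups, c // groups, h, w)
    # swap the group and per-group channel axes, then flatten back
    x = torch.transpose(x, 1, 2).contiguous()
    return x.view(n, c, h, w)
```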
Underwater acoustic target recognition differs from image recognition: applying deep networks such as ResNet, DenseNet and ShuffleNet to it requires processing the raw underwater acoustic signal first. In a traditional passive underwater acoustic target recognition system, feature extraction and the classifier are usually two relatively independent links, and such a stepwise processing method must consider how well the two match. Matching the features to the classification method is the key to effectively improving the correct recognition rate.
Disclosure of Invention
The technical problem solved by the invention is as follows: to solve the problem that existing target identification systems are difficult to deploy on mobile devices, the invention provides an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features. The invention introduces ShuffleNet V2 into underwater acoustic target recognition and combines it with Mel spectrum features to provide a deep underwater acoustic target recognition method. Experimental results show that this recognition method, which fuses deep learning with artificial Mel spectrum features, achieves the best recognition effect to date.
The technical scheme of the invention is as follows: an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features, comprising the following steps:
Step 1: read the labels of the underwater acoustic targets to be recognized, preprocess the several types of labeled underwater acoustic targets, and divide them into training set samples and verification set samples, where the number of training set samples is larger than the number of verification set samples;
Step 2: perform feature extraction and feature enhancement on the training set samples and verification set samples, and design a Mel frequency scale filter bank to obtain the feature-enhanced Mel spectrum features of the multi-class target training set and verification set;
Step 3: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, train and verify the network model, and finally complete the underwater acoustic target recognition.
The further technical scheme of the invention is as follows: in step 1, the label reading specifically comprises reading the specific path name of each underwater acoustic target file and generating the label information corresponding to the several types of underwater acoustic targets from the target category information in the path name.
The further technical scheme of the invention is as follows: in step 1, each section of target data of the several types of underwater acoustic targets is framed with a frame overlap of 25%-75%, which increases the number of samples.
The further technical scheme of the invention is as follows: the number ratio of training set samples to verification set samples is 7:3.
The further technical scheme of the invention is as follows: step 2 comprises the following substeps:
Step 2.1: frame each sample in the training set and the verification set, taking N sampling points as one observation unit; adjacent frames overlap by M points, with M = 1/4 × N by default, and each frame signal is then multiplied by a window function, by default a Hanning window;
Step 2.2: let the framed signal be S(n), n = 0, 1, ..., N-1, where N is the frame size; the Hanning window expression is:

W(n) = 0.5 × [1 - cos(2πn/(N-1))], 0 ≤ n ≤ N-1

Step 2.3: after windowing, perform a Fourier transform on each frame signal to obtain the transformed signal X_a(k):

X_a(k) = Σ_{n=0}^{N-1} S'(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1

where S'(n) is the windowed frame signal and N is the number of Fourier transform points;
Step 2.4: design a Mel frequency scale filter bank, where the relationship between the Mel frequency scale and the actual frequency is:

Mel(f) = 2595 × log10(1 + f/700)

where f is the actual frequency in Hz and Mel(f) is the perceived frequency in mel;
Step 2.5: define a filter bank with M triangular filters, in which the center frequency of the m-th filter is both the lower limit frequency of the (m+1)-th filter and the upper limit frequency of the (m-1)-th filter, i.e. f_0(m) = f_l(m+1) = f_h(m-1), with m = 1, 2, ..., M, where f_0 is the center frequency, f_l the lower limit frequency (0 by default) and f_h the upper limit frequency (half the sampling frequency by default); the frequency response H_m(k) of the triangular band-pass filter is:

H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, k > f(m+1)

where Σ_{m=0}^{M-1} H_m(k) = 1.

Compute the dot product of each filter response with the squared magnitude |X_a(k)|² of each frame to obtain the Mel spectrum features; then subtract the mean and divide by the variance of the Mel spectrum features of each frame to obtain the feature-enhanced Mel spectrum features.
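A minimal sketch of steps 2.1-2.5 follows, assuming numpy and librosa as library choices (the patent names no library); the filter bank uses the HTK-style Mel formula given above.

```python
# Minimal sketch of steps 2.1-2.5: framing, Hanning window, FFT power
# spectrum, Mel filtering, and per-frame feature enhancement.
# numpy and librosa are assumed choices, not named by the patent.
import numpy as np
import librosa

def mel_features(signal: np.ndarray, sr: int, n_fft: int = 2048,
                 n_mels: int = 128) -> np.ndarray:
    hop = n_fft - n_fft // 4                    # frame overlap M = 1/4 * N
    window = np.hanning(n_fft)                  # Hanning window W(n)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # |X_a(k)|^2
    # Triangular Mel filter bank H_m(k); htk=True matches Mel(f) = 2595*log10(1+f/700)
    fbank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels, htk=True)
    feats = power @ fbank.T                     # Mel spectrum features
    mean = feats.mean(axis=1, keepdims=True)    # feature enhancement:
    std = feats.std(axis=1, keepdims=True)      # standardize each frame
    return (feats - mean) / (std + 1e-8)
```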
The further technical scheme of the invention is as follows: the step 3 comprises the following steps:
Step 3.1: modify the ShuffleNet V2 network to obtain the modified ShuffleNet V2 classifier: add a Batch Normalization (BN) layer; change the number of output channels of the Mel spectrum feature layer; change the output size of each feature map in the Conv1, Stage2, Stage3, Stage4 and Conv5 layers, which are all composed of convolution operations, the three Stage layers being built by splicing spatial down-sampling blocks with basic unit blocks; change the convolution kernel size of the GlobalPool layer, i.e. the global average pooling layer; and change the number of output channels of the final Fully Connected (FC) layer;
Step 3.2: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, and train and verify the network model.
Effects of the invention
The technical effects of the invention are as follows: to match the lightweight ShuffleNet V2 network, the Mel spectrum, used as an artificial feature, is input in a manner similar to the RGB input of an image, except that image RGB has 3 channels while the Mel spectrum feature has a single channel. The invention uses the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x networks and modifies them to match the Mel spectrum features. The modifications comprise changing the shape of the input tensor of each network layer, changing the output channels, and changing the convolution kernel size of the GlobalPool layer. In addition, a batch normalization layer is added at the bottom of each of the two networks to normalize the mean and variance of each batch of input data. The identification method also uses data enhancement, which increases the sample data volume and improves the generalization ability of the model, and feature enhancement, which standardizes the range of the Mel spectrum feature samples. The invention can complete the task of identifying several types of underwater acoustic targets on mobile devices, realizing the feature extraction, classification and identification tasks of underwater acoustic target recognition with Mel spectrum features and the modified lightweight convolutional network ShuffleNet V2. On a small mobile device, classification experiments on several types of measured underwater acoustic targets show that this method, which fuses deep learning with artificial Mel spectrum features, achieves a correct recognition rate of 99%.
Drawings
FIG. 1 is a flow chart of the underwater acoustic target recognition method fusing Mel spectrum features and the modified ShuffleNet V2 classifier.
Detailed Description
Referring to FIG. 1, the underwater acoustic target recognition method of the invention is now described in detail with reference to an example. The mobile device used in this example is configured as follows: graphics card: Nvidia GeForce MX350, with 1 available GPU and 8 CPUs. The implementation is programmed in Python in a PyTorch 1.9 environment.
The depth recognition model method comprises the following steps:
step 1: data pre-processing (data enhancement).
Considering that the underwater acoustic signal is stationary over short times, it is framed with overlapping frames; this increases the number of samples and can be regarded as a form of data enhancement. Underwater acoustic target samples with known labels are read, and the underwater acoustic target data are strictly divided into a training set and a verification set.
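As an illustration, the framing-based enhancement and the strict split can be sketched as follows (numpy is an assumed choice; frame length and split ratio follow the embodiment below).

```python
# Minimal sketch of the framing-based data enhancement and the strict
# training/verification split; numpy is an assumed choice.
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int,
                 overlap: float = 0.5) -> np.ndarray:
    """Cut one recording into overlapping frames; each frame is a sample."""
    hop = int(frame_len * (1 - overlap))
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len]
                     for i in range(n_frames)])

# Example usage: 50% overlap roughly doubles the sample count, and the
# frames are then split 7:3 into training and verification sets.
# frames = frame_signal(x, frame_len=2048, overlap=0.5)
# split = int(0.7 * len(frames))
# train_set, val_set = frames[:split], frames[split:]
```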
Step 2: feature extraction and feature enhancement.
Mel spectrum feature extraction: perform a short-time Fourier transform on the time-domain signal to obtain the power spectrum, then apply Mel filtering to the power spectrum to obtain the Mel spectrum features.
Feature enhancement: apply feature scaling to the obtained Mel spectrum features to standardize the range of the data features.
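The same extraction and scaling can be sketched with torchaudio as one possible library choice (the patent names no library); the sample rate below is an assumed placeholder.

```python
# Sketch of Mel feature extraction and feature scaling with torchaudio;
# library choice and sample rate are assumptions, not from the patent.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,             # assumed; not stated in the patent
    n_fft=2048, hop_length=1536,   # hop = 3/4 * n_fft, i.e. 1/4 frame overlap
    n_mels=128, window_fn=torch.hann_window)

def extract(waveform: torch.Tensor) -> torch.Tensor:
    feats = mel(waveform)                    # (n_mels, n_frames) power Mel spectrum
    mean = feats.mean(dim=0, keepdim=True)   # feature scaling: standardize
    std = feats.std(dim=0, keepdim=True)     # each frame across Mel bins
    return (feats - mean) / (std + 1e-8)
```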
Step 3: ShuffleNet V2 network modification.
The ShuffleNet V2 0.5x and ShuffleNet V2 1.0x versions of the network are modified respectively. The modifications comprise changing the shape of the input tensor of each network layer (to match the size of the Mel spectrum features), changing the output channels, and changing the convolution kernel size of the GlobalPool layer. In addition, a batch normalization layer is added at the bottom of the network to normalize the mean and variance of each batch of input data.
Step 4: apply the feature extraction and feature enhancement of step 2 to the underwater acoustic target data obtained by the data enhancement of step 1 to obtain feature sample data, then input the feature sample data into the ShuffleNet V2 network modified in step 3 for training and verification.
The training process is explained as follows: the loss function in the network model represents the difference (error) between the actual output (probability) and the expected output (probability) of the network model. During training, forward propagation first computes the actual output (probability); the network parameters are then updated by the reverse gradient propagation algorithm, reducing the loss value of the loss function so that the error keeps shrinking and the actual output (probability) of the model moves closer to the expected output (probability). Therefore, the smaller the loss value computed during network training, the better the recognition performance of the network model.
During verification, the model no longer performs reverse gradient training and the model parameters are no longer updated; the feature samples are input into the network to directly compute the actual output (probability), which is compared with the expected output (probability) to obtain the correct recognition rate on the verification set. The correct recognition rate is the percentage of correctly classified underwater acoustic targets among all underwater acoustic targets. The loss value obtained on the training set samples and the correct recognition rate obtained on the verification set samples serve as the evaluation indicators of the recognition method.
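A minimal PyTorch sketch of this training and verification procedure is given below, assuming DataLoaders `train_loader` and `val_loader` that yield (feature, label) batches; the hyperparameter values are taken from the embodiment described later.

```python
# Sketch of the training/verification procedure; DataLoaders are assumed,
# hyperparameters follow the embodiment below (SGD, momentum 0.9,
# learning rate 0.001, 30 epochs).
import torch
import torch.nn as nn

def train_and_verify(model, train_loader, val_loader, epochs=30):
    criterion = nn.CrossEntropyLoss()  # error between actual and expected output
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for feats, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(feats), labels)  # forward propagation
            loss.backward()                         # reverse gradient propagation
            optimizer.step()                        # update network parameters
            running_loss += loss.item()
        model.eval()
        correct = total = 0
        with torch.no_grad():                       # verification: no updates
            for feats, labels in val_loader:
                pred = model(feats).argmax(dim=1)
                correct += (pred == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch}: loss {running_loss / len(train_loader):.4f}, "
              f"correct recognition rate {100.0 * correct / total:.2f}%")
```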
Step 1: read the several types of labeled underwater acoustic targets and first preprocess the underwater acoustic target data. Each section of target data is framed with 50% frame overlap to increase the number of samples. The underwater acoustic target data are strictly divided into a training set and a verification set: in the experiment, 7/10 of the total samples were used as training set samples and 3/10 as verification set samples.
Step 2: and carrying out feature extraction and feature enhancement on the data sample.
Each sample is framed, taking N sampling points as one observation unit, with N = 2048; adjacent frames overlap by M points, with M = 1/4 × N by default. Each frame signal is then multiplied by a window function, by default a Hanning window. Let the framed signal be S(n), n = 0, 1, ..., N-1, where N is the frame size; formula (1) gives the signal multiplied by the Hanning window, and formula (2) is the Hanning window expression:

S'(n) = S(n) × W(n)    (1)

W(n) = 0.5 × [1 - cos(2πn/(N-1))], 0 ≤ n ≤ N-1    (2)
After windowing, a Fourier transform is performed on each frame signal to obtain the transformed signal X_a(k):

X_a(k) = Σ_{n=0}^{N-1} S'(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1    (3)

where S'(n) is the windowed frame signal and N is the number of Fourier transform points.
The design of the Mel frequency scale filter bank is also key to the design of the Mel spectrum features. The pitch perceived by the human ear is not linearly proportional to frequency, and the Mel frequency scale better matches the auditory characteristics of the human ear. The Mel frequency scale and the actual frequency follow an approximately logarithmic relationship, which can be expressed by formula (4):

Mel(f) = 2595 × log10(1 + f/700)    (4)

where f is the actual frequency in Hz and Mel(f) is the perceived frequency in mel.
Each filter in the Mel filter bank is a triangular filter. Define a filter bank with M filters (M = 128 by default), in which the center frequency of the m-th filter is both the lower limit frequency of the (m+1)-th filter and the upper limit frequency of the (m-1)-th filter, i.e. f_0(m) = f_l(m+1) = f_h(m-1), with m = 1, 2, ..., M, where f_0 is the center frequency, f_l the lower limit frequency (0 by default) and f_h the upper limit frequency (half the sampling frequency by default).
The frequency response H_m(k) of the triangular band-pass filter is given by formula (5):

H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)    (5)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, k > f(m+1)

where Σ_{m=0}^{M-1} H_m(k) = 1.    (6)
Compute the dot product of each filter response with the squared magnitude |X_a(k)|² of each frame to obtain the Mel spectrum features; then subtract the mean and divide by the variance of the Mel spectrum features of each frame to obtain the feature-enhanced Mel spectrum features.
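The filter bank design of formulas (4)-(6) can be sketched directly in numpy as follows; this is the manual counterpart of the library-based filter bank shown earlier, and is an illustrative assumption rather than the patented implementation.

```python
# Minimal sketch of the Mel filter bank of formulas (4)-(6), assuming
# numpy; returns the M triangular responses H_m(k) over the positive
# FFT bins.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)          # formula (4)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)        # inverse of (4)

def mel_filter_bank(sr: int, n_fft: int, n_mels: int = 128) -> np.ndarray:
    f_l, f_h = 0.0, sr / 2.0              # default lower/upper limit frequencies
    # M + 2 points equally spaced on the Mel scale give the lower limit,
    # center, and upper limit frequency of each triangular filter.
    mel_pts = np.linspace(hz_to_mel(f_l), hz_to_mel(f_h), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                  # rising edge of H_m(k)
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                 # falling edge of H_m(k)
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank
```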
Step 3: modify the ShuffleNet V2 networks of the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x versions to obtain the modified ShuffleNet V2 classifier, as shown in attached Table 1. The italicized bold parts are the modifications to ShuffleNet V2, comprising adding a batch normalization (BN) layer, and changing the number of output channels of the Mel spectrum feature layer, the output size of each feature map in the Conv1, Stage2, Stage3, Stage4 and Conv5 layers, the convolution kernel size of the GlobalPool layer, and the number of output channels of the final fully connected (FC) layer.
Step 4: model evaluation.
Take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, and train and verify the network model. The network model parameters are set as follows: the network is randomly initialized, the loss is calculated with a cross-entropy loss function, the gradient is optimized with the stochastic gradient descent (SGD) algorithm, the momentum is set to 0.9, the learning rate to 0.001, and the number of training epochs to 30. After the network model is trained, the loss during training and the correct recognition rate during verification are recorded to evaluate the performance of the recognition method.
The invention combines artificial Mel spectrum features with the modified lightweight ShuffleNet V2 network to complete the underwater acoustic target recognition task. The identification method adds data enhancement and feature enhancement steps. The network is modified to match the Mel spectrum features; in particular, a batch normalization layer is added at the bottom of the network to normalize the mean and variance of each batch of input data. Experimental results show that, on a small mobile device, the model achieves a very good recognition effect in the recognition and classification tasks of several types of underwater acoustic targets, which proves the effectiveness of the invention.
Attached Table 1: the modified ShuffleNet V2 classifier.
Attached Table 2: training effects of the recognition method under the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x versions.

Claims (6)

1. An underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features, characterized by comprising the following steps:
Step 1: read the labels of the underwater acoustic targets to be recognized, preprocess the several types of labeled underwater acoustic targets, and divide them into training set samples and verification set samples, where the number of training set samples is larger than the number of verification set samples;
Step 2: perform feature extraction and feature enhancement on the training set samples and verification set samples, and design a Mel frequency scale filter bank to obtain the feature-enhanced Mel spectrum features of the multi-class target training set and verification set;
Step 3: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, train and verify the network model, and finally complete the underwater acoustic target recognition.
2. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein in step 1 the label reading specifically comprises reading the specific path name of each underwater acoustic target file and generating the label information corresponding to the several types of underwater acoustic targets from the target category information in the path name.
3. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein in step 1 each section of target data of the several types of underwater acoustic targets is framed with a frame overlap of 25%-75%, which is used to increase the number of samples.
4. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein the number ratio of training set samples to verification set samples is 7:3.
5. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein step 2 comprises the following substeps:
Step 2.1: frame each sample in the training set and the verification set, taking N sampling points as one observation unit; adjacent frames overlap by M points, with M = 1/4 × N by default, and each frame signal is then multiplied by a window function, by default a Hanning window;
Step 2.2: let the framed signal be S(n), n = 0, 1, ..., N-1, where N is the frame size; the Hanning window expression is:

W(n) = 0.5 × [1 - cos(2πn/(N-1))], 0 ≤ n ≤ N-1

Step 2.3: after windowing, perform a Fourier transform on each frame signal to obtain the transformed signal X_a(k):

X_a(k) = Σ_{n=0}^{N-1} S'(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1

where S'(n) is the windowed frame signal and N is the number of Fourier transform points;
Step 2.4: design a Mel frequency scale filter bank, where the relationship between the Mel frequency scale and the actual frequency is:

Mel(f) = 2595 × log10(1 + f/700)

where f is the actual frequency in Hz and Mel(f) is the perceived frequency in mel;
Step 2.5: define a filter bank with M triangular filters, in which the center frequency of the m-th filter is both the lower limit frequency of the (m+1)-th filter and the upper limit frequency of the (m-1)-th filter, i.e. f_0(m) = f_l(m+1) = f_h(m-1), with m = 1, 2, ..., M, where f_0 is the center frequency, f_l the lower limit frequency (0 by default) and f_h the upper limit frequency (half the sampling frequency by default); the frequency response H_m(k) of the triangular band-pass filter is:

H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, k > f(m+1)

where Σ_{m=0}^{M-1} H_m(k) = 1.

Compute the dot product of each filter response with the squared magnitude |X_a(k)|² of each frame to obtain the Mel spectrum features; then subtract the mean and divide by the variance of the Mel spectrum features of each frame to obtain the feature-enhanced Mel spectrum features.
6. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein step 3 comprises the following steps:
Step 3.1: modify the ShuffleNet V2 network to obtain the modified ShuffleNet V2 classifier: add a Batch Normalization (BN) layer; change the number of output channels of the Mel spectrum feature layer; change the output size of each feature map in the Conv1, Stage2, Stage3, Stage4 and Conv5 layers, which are all composed of convolution operations, the three Stage layers being built by splicing spatial down-sampling blocks with basic unit blocks; change the convolution kernel size of the GlobalPool layer, i.e. the global average pooling layer; and change the number of output channels of the final Fully Connected (FC) layer;
Step 3.2: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, and train and verify the network model.
CN202111529853.0A 2021-12-14 2021-12-14 Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics Active CN114282572B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111529853.0A | 2021-12-14 | 2021-12-14 | Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics

Publications (2)

Publication Number | Publication Date
CN114282572A (en) | 2022-04-05
CN114282572B (en) | 2024-08-09

Family

ID=80872152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111529853.0A Active CN114282572B (en) 2021-12-14 2021-12-14 Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics

Country Status (1)

Country Link
CN (1) CN114282572B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190110939A (en) * 2018-03-21 2019-10-01 한국과학기술원 Environment sound recognition method based on convolutional neural networks, and system thereof
CN112329819A (en) * 2020-10-20 2021-02-05 中国海洋大学 Underwater target identification method based on multi-network fusion
CN112364779A (en) * 2020-11-12 2021-02-12 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
GB202107666D0 (en) * 2021-05-28 2021-07-14 Bae Systems Plc Apparatus and method of classification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Shaokang; Tian Deyan: "Intelligent classification method of Mel-frequency cepstral coefficients for underwater acoustic targets" (水下声目标的梅尔倒谱系数智能分类方法), Applied Acoustics (应用声学), no. 02, 12 March 2019 (2019-03-12) *
Cheng Jinsheng; Du Xuanmin; Zhou Shengzeng; Zeng Sai: "Application research of supervised learning methods based on target MFCC features in passive sonar target recognition" (基于目标MFCC特征的监督学习方法在被动声呐目标识别中的应用研究), Ship Science and Technology (舰船科学技术), no. 17, 8 September 2018 (2018-09-08) *

Also Published As

Publication number | Publication date
CN114282572B (en) | 2024-08-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant