CN117912458A

CN117912458A - Underwater sound abnormal signal detection method

Info

Publication number: CN117912458A
Application number: CN202410147800.XA
Authority: CN
Inventors: 夏飞; 窦钰涛; 赵祥; 赵飞; 张洋
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2024-02-02
Filing date: 2024-02-02
Publication date: 2024-04-19

Abstract

The application relates to the technical field of underwater sound signal processing, and particularly discloses a method for detecting an underwater sound abnormal signal. The method comprises the following steps: extracting normal underwater sound data samples of the same sample size and establishing a sample data set; performing short-time Fourier transform to obtain the corresponding time-frequency diagram data set; constructing a self-supervision training model; training a neural network model by using the sample data set and the self-supervision training model, which includes making mask input data, making a reconstruction target, and training the neural network model; and applying the trained neural network model. Compared with traditional classification-based anomaly detection methods, the method effectively reduces the need for a large amount of original data. Compared with traditional self-supervised training methods, the method effectively reduces the complexity of the neural network, improves the effectiveness and efficiency of the training process, and can also ensure that training continues successfully by gradually increasing the detail or diversity of the reconstruction.

Description

Underwater sound abnormal signal detection method

Technical Field

The invention relates to the technical field of underwater sound signal processing, and particularly discloses a method for detecting an underwater sound abnormal signal.

Background

With the progress of human civilization, and especially with the growing shortage of land resources in recent years, countries are working to develop marine resources, and ocean exploration is becoming increasingly active. Scientific research such as marine environment perception has become an important national strategy for many countries. In the information age, information technology based on electromagnetic signals has achieved remarkable results and gives people a fast way to transmit and process information. However, the ocean environment is very complex, and seawater strongly attenuates electromagnetic signals, so electromagnetic signals cannot be used for long-distance information exchange underwater, which seriously hinders human activities in the ocean. By comparison, only sound waves propagate well in seawater, so sound waves have become the main means of underwater information sensing and information exchange.

When human activities such as ship navigation or drilling exploration take place, abnormal sound signals appear in the seawater. However, the ocean always contains a great deal of background noise from waves, tides, ocean currents and the like, which makes underwater sound abnormal signals hard to detect. How to detect these abnormal underwater sounds quickly and accurately has become an important problem for marine environment perception and exploration. At present there is some research on anomaly detection for sound in air, but little research on the detection of underwater sound abnormal signals. Most existing methods for detecting sound anomalies in air rely on traditional manual feature extraction, whose ability to extract sound signal features is insufficient. Some methods extract sound features with neural networks, but they do not consider the complexity of the underwater acoustic environment and are not designed for the training difficulties caused by the large amount of noise in underwater sound, so they have difficulty detecting complex underwater sound abnormal signals accurately.

Disclosure of Invention

In view of the above problems, the invention provides a self-supervised method for training a neural network based on mask reconstruction. The method alleviates the difficulty of acquiring and labeling underwater sound abnormal samples, since training can be carried out without any abnormal samples. In addition, a progressive training method gradually increases the reconstruction difficulty and the reconstruction fineness, which improves the efficiency of self-supervised training and the feature extraction capability of the neural network, so that training of the neural network model is completed more quickly and more accurate detection results are obtained.

In order to achieve the above purpose, the present invention adopts the following technical scheme.

The invention discloses a method for detecting an underwater sound abnormal signal, which comprises the following steps:

Step one, extracting normal underwater sound data samples with the same sample size to establish a sample data set D; performing short-time Fourier transform on all underwater sound data samples in the sample data set D to obtain a corresponding time-frequency diagram data set S_{m×n}, where m×n is the pixel size of each time-frequency diagram;
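Step one can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the sampling rate, sample length, FFT size, hop size and Hann window are all hypothetical choices, the recording is synthetic noise, and the helper names `frame_audio` and `stft_magnitude` are introduced here for illustration only.

```python
import numpy as np

def frame_audio(signal, sample_len):
    """Split a 1-D signal into non-overlapping samples of equal length."""
    n_samples = len(signal) // sample_len
    return signal[: n_samples * sample_len].reshape(n_samples, sample_len)

def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude STFT of a 1-D signal using a Hann window.

    Returns an array of shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Hypothetical data: 4 s of synthetic noise at an assumed 8 kHz sampling rate.
rng = np.random.default_rng(0)
recording = rng.standard_normal(4 * 8000)

D = frame_audio(recording, sample_len=8000)    # sample data set D
S = np.stack([stft_magnitude(x) for x in D])   # time-frequency data set S
print(D.shape, S.shape)
```

With these assumed settings each time-frequency diagram has m = 129 frequency bins and n = 61 time frames; other window and hop sizes would give a different m×n.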

Step two, constructing a self-supervision training model A comprising an input module, an encoder module, a small block reorganization module, a decoder module, a reconstruction target generation module and a loss function module; wherein:

the input module divides an original time-frequency diagram I into a plurality of small blocks with the same size, then masks most of the small blocks according to a preset proportion to obtain a mask small block set M, and the rest is a visible small block set N;

The encoder module takes N as input and extracts the depth features in N through a neural network to obtain a feature map set F; the size of each feature map in F is i×j×k, wherein i is the length of the feature map, j is the width of the feature map, and k is the channel number of the feature map;

The small block reorganization module takes F as input, constructs a mask token set T, and then adds a position code to each feature map in F and each mask token in T to obtain a new set C; wherein the number of mask tokens in T is equal to q, each mask token being a learnable vector that acts as a placeholder for a small block to be recovered by the decoder;

the decoder module takes C as input, restores the mask tokens and the feature maps into small blocks of size a×b, then arranges the small blocks according to the position code, and reconstructs a new time-frequency map Î;

The reconstruction target generation module takes the original time-frequency diagram I as input, performs singular value decomposition (SVD) on I, then takes the first e components for image reconstruction, and generates a reconstruction target Te; the reconstruction target Te is a time-frequency diagram with the same image size as I;

The loss function module comprises a loss function, and the loss function comprises two parts:

The first part is the reconstruction loss Lr, which is the mean square error (MSE) between the reconstructed image Î and the reconstruction target Te:

$$L_r = \mathrm{MSE}(\hat{I}, T_e) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\bigl(\hat{I}(u,v) - T_e(u,v)\bigr)^2$$

The second part is the diversity loss Ld, in which the nuclear norm of the reconstructed image Î serves as the measure of its diversity. The formula is as follows: [formula not reproduced in the source text]

Wherein ‖Te‖_* is the nuclear norm of the reconstruction target Te;

The total loss function is: [formula not reproduced in the source text]

Wherein α is a variable loss function weight factor, e is the current training round, E is the total number of rounds to be trained, and β is a diversity adjustment factor;
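The quantities used by the reconstruction target generation module and the loss function module can be sketched with NumPy as below. This is an illustrative sketch, not the patent's code: the function names are hypothetical, the image is random data, and the MSE uses the standard per-pixel average. How the two loss terms are weighted into the total loss is left out, because that formula is not available in the text.

```python
import numpy as np

def svd_target(image, e):
    """Rank-e reconstruction target T_e built from the first e SVD components."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return (U[:, :e] * s[:e]) @ Vt[:e, :]

def reconstruction_loss(recon, target):
    """Mean square error between the reconstruction and the target."""
    return float(np.mean((recon - target) ** 2))

def nuclear_norm(matrix):
    """Nuclear norm: the sum of the singular values of the matrix."""
    return float(np.linalg.svd(matrix, compute_uv=False).sum())

# Hypothetical 32 x 16 "time-frequency diagram" filled with random values.
rng = np.random.default_rng(0)
I = rng.random((32, 16))

T3 = svd_target(I, e=3)       # coarse target: first 3 components only
I_hat = T3 + 0.01             # stand-in for a decoder output
print(reconstruction_loss(I_hat, T3))
print(nuclear_norm(T3), nuclear_norm(I))
```

Because a rank-e target keeps only the e largest singular values, its nuclear norm grows as e grows, which is consistent with the text's use of the nuclear norm as a measure of how much detail an image contains.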

Step three, training the neural network model by using the sample data set D and the self-supervision training model A, wherein E is a preset hyperparameter and E ≤ min(m, n); the training process of the e-th round (1 ≤ e ≤ E) is shown in Steps 3.1 to 3.4:

Step 3.1: making mask input data: randomly extracting an image I from the time-frequency diagram data set S, dividing the image I into a plurality of small blocks with the same size, then masking p small blocks to obtain a mask small block set M, the remaining q small blocks forming the visible small block set N; wherein the size of each small block is a×b;

Step 3.2: making the reconstruction target Te: performing SVD on the image I and reconstructing the image from the first e principal components to obtain the reconstruction target Te;

Step 3.3: using M, N and Te as inputs of the self-supervision training model to train the neural network model;

Step 3.4: repeating Steps 3.1 to 3.3 K times, wherein K is a preset hyperparameter;
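The round structure of Step three can be sketched as a plain loop. The sketch below only shows the schedule (round e uses a rank-e target, and each round repeats K times); the masking and the actual network update are hidden behind a hypothetical `train_step` callback, and the data set is random.

```python
import numpy as np

def rank_e_target(image, e):
    """Reconstruction target T_e from the first e SVD components."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return (U[:, :e] * s[:e]) @ Vt[:e, :]

def progressive_training(S, E, K, train_step, seed=0):
    """Run E rounds; round e repeats Steps 3.1 to 3.3 K times."""
    rng = np.random.default_rng(seed)
    m, n = S.shape[1:]
    assert E <= min(m, n), "E must not exceed min(m, n)"
    for e in range(1, E + 1):
        for _ in range(K):
            I = S[rng.integers(len(S))]   # Step 3.1: draw a random image
            Te = rank_e_target(I, e)      # Step 3.2: rank-e target
            train_step(I, Te, e)          # Step 3.3: masking + model update

# Hypothetical stand-in for the model update: record the target's rank.
log = []
def record_step(I, Te, e):
    log.append((e, int(np.linalg.matrix_rank(Te))))

S = np.random.default_rng(1).random((5, 8, 6))   # 5 images of size 8 x 6
progressive_training(S, E=3, K=2, train_step=record_step)
print(log)
```

The recorded ranks rise with the round number, which is the sense in which the reconstruction target gains detail as training proceeds.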

Step four, applying the trained neural network model, wherein the specific steps comprise:

Step 4.1: framing the audio data to be detected so that each frame has the same size as an audio sample in the data set D;

Step 4.2: performing short-time Fourier transform (STFT) on the framed audio data to obtain a time-frequency spectrogram I_t;

Step 4.3: feeding the spectrogram I_t into the trained neural network model, which reconstructs a new spectrogram Î_t;

Step 4.4: calculating the anomaly score, i.e. the mean square error between the input spectrogram I_t and the reconstructed spectrogram Î_t:

$$\mathrm{Score} = \mathrm{MSE}(I_t, \hat{I}_t) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\bigl(I_t(u,v) - \hat{I}_t(u,v)\bigr)^2$$

Step 4.5: judging whether the audio data to be detected is abnormal according to the value of Score; if Score is greater than Y, the audio data is judged to be abnormal, otherwise it is judged to be normal; wherein Y is a preset threshold.
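Steps 4.4 and 4.5 amount to a mean square error followed by a threshold test. The sketch below assumes a hypothetical `reconstruct` callable standing in for the trained model; here it simply returns a fixed "normal" pattern, and the threshold value 0.5 is an arbitrary illustration of Y.

```python
import numpy as np

def anomaly_score(spec, recon):
    """Mean square error between input and reconstructed spectrogram."""
    return float(np.mean((spec - recon) ** 2))

def is_abnormal(spec, reconstruct, threshold):
    """Return (flag, score); flag is True when score exceeds the threshold Y."""
    score = anomaly_score(spec, reconstruct(spec))
    return score > threshold, score

# Hypothetical stand-in for the trained model: it can only reproduce
# the "normal" pattern it was trained on.
normal = np.ones((4, 4))
reconstruct = lambda spec: normal

odd = normal.copy()
odd[0, 0] = 5.0               # one pixel the model cannot reproduce

print(is_abnormal(normal, reconstruct, threshold=0.5))
print(is_abnormal(odd, reconstruct, threshold=0.5))
```

A model trained only on normal sound reconstructs normal input well and abnormal input poorly, so the reconstruction error itself serves as the anomaly score.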

As a further refinement or specific implementation of the foregoing underwater sound abnormal signal detection method, the input module divides the original time-frequency diagram I into a plurality of small blocks with the same size, then masks 80% of the small blocks to obtain the mask small block set M, and the remaining 20% form the visible small block set N.

As a further refinement or specific implementation of the foregoing underwater sound abnormal signal detection method, the encoder module and the decoder module adopt ViT neural network structures.

As a further refinement or specific implementation of the foregoing underwater sound abnormal signal detection method, the masking in Step 3.1 refers to setting all pixel values in the small block to 0.
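The block division and 80% zero-masking described above can be sketched as follows. This is an assumption-laden illustration: the image size 128×64 and block size 16×16 are hypothetical values chosen so that the image tiles exactly, the random selection uses a fixed seed, and `split_and_mask` is a name introduced here.

```python
import numpy as np

def split_and_mask(image, a, b, mask_ratio=0.8, seed=0):
    """Split `image` into a x b blocks, then zero out a random fraction."""
    m, n = image.shape
    assert m % a == 0 and n % b == 0, "image must tile exactly into a x b blocks"
    rows, cols = m // a, n // b
    # blocks[r * cols + c] is the block at grid position (r, c)
    blocks = image.reshape(rows, a, cols, b).swapaxes(1, 2).reshape(-1, a, b)
    p = int(round(mask_ratio * rows * cols))      # number of masked blocks
    order = np.random.default_rng(seed).permutation(rows * cols)
    masked_idx, visible_idx = np.sort(order[:p]), np.sort(order[p:])
    masked_input = blocks.copy()
    masked_input[masked_idx] = 0.0                # masking: all pixels set to 0
    return masked_input, masked_idx, visible_idx

# Hypothetical 128 x 64 image with strictly positive pixel values.
I = np.arange(1, 128 * 64 + 1, dtype=float).reshape(128, 64)
masked_input, M_idx, N_idx = split_and_mask(I, a=16, b=16)
print(len(M_idx), len(N_idx))
```

With 32 blocks in total, an 80% ratio rounds to p = 26 masked blocks and q = 6 visible blocks; the returned index arrays identify the sets M and N by block position.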

The beneficial effects are that:

Compared with the traditional classification anomaly detection method, the method can effectively reduce the requirement for a large amount of original data. Compared with the traditional self-supervision training method, the method can effectively reduce the complexity of the neural network, improve the effectiveness and efficiency of the training process, and can also ensure the success rate of continuous training by gradually increasing the details or diversity of the reconstruction.

Drawings

Fig. 1 is a schematic diagram of the module flow of the underwater sound abnormal signal detection method.

Detailed Description

The present invention will be described in detail with reference to specific examples.

The invention provides a method for detecting an underwater sound abnormal signal, which addresses the weak feature extraction capability of existing underwater sound abnormal signal detection methods and improves the training efficiency of the neural network through a progressive self-supervised method.

The method specifically comprises the following steps:

the decoder module takes C as input, restores the mask tokens and the feature maps into small blocks of size a×b, then arranges the small blocks according to the position code, and reconstructs a new time-frequency map Î. The encoder module and the decoder module adopt ViT neural network structures.

Step 3.1: making mask input data: randomly extracting an image I from the time-frequency diagram data set S, dividing the image I into a plurality of small blocks with the same size, then masking p small blocks to obtain a mask small block set M, the remaining q small blocks forming the visible small block set N; wherein the size of each small block is a×b; masking refers to setting all pixel values in a small block to 0.

Examples:

A marine biologist needs to design an intelligent sonar system that automatically detects whether a rare fish X is present in a sea area. Fish X produces an abnormal sound of some sort, but because fish X is very rare, no sound sample of the fish has been collected before.

If anomaly detection were performed with a classification-based method, a large number of sounds of fish X and of normal background sounds without fish X would have to be collected in advance; the two kinds of sound samples would be labeled and made into a data set, a classification model would be trained on that data, and fish X would then be detected with the classification model. Because fish X is very rare, this approach is not applicable. With the present method, it is enough to collect normal background sound without fish X and train a self-supervised model, with no data labeling; the sound of fish X is treated as an anomaly, and fish X is found by performing anomaly detection with the self-supervised model.

On the other hand, seawater sound contains a great deal of noise, so the corresponding time-frequency diagram is also noisy. With an ordinary self-supervised training method, the reconstruction target is the original time-frequency diagram, noise included; such a target is difficult for the neural network to reconstruct, and the training process is likely to fail. The present invention addresses this problem by optimizing the training process in two ways. First, using SVD-based reconstruction, the reconstruction target is initially set to the principal components of the original sample, and detail information is then gradually added to the target. Second, a loss function that can adjust the diversity of the reconstruction target is constructed, and the weight of the diversity loss in the total loss function is increased progressively; the higher the diversity, the more detail information the image contains, so the ability of the neural network to reconstruct image details improves gradually. Because training is progressive, the difficulty at the start of training is low and the training does not collapse. After a certain number of rounds, the neural network model understands the images better; when the training difficulty is then gradually increased (by adding detail information or diversity to the reconstruction target), learning the details is no longer so difficult for the neural network, and the training does not fail.
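The way the SVD target moves from coarse to detailed can be written out explicitly. The identities below are standard properties of the singular value decomposition rather than formulas taken from the patent; the symbols σ_r, u_r and v_r (singular values and singular vectors of I) are notation introduced here.

```latex
% SVD of the original time-frequency diagram I (size m x n)
I = \sum_{r=1}^{\min(m,n)} \sigma_r \, u_r v_r^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0

% Reconstruction target in round e: keep the first e components
T_e = \sum_{r=1}^{e} \sigma_r \, u_r v_r^{\top}

% Its nuclear norm grows with e
\lVert T_e \rVert_* = \sum_{r=1}^{e} \sigma_r

% The detail still missing from the target shrinks with e
\lVert I - T_e \rVert_F^2 = \sum_{r=e+1}^{\min(m,n)} \sigma_r^2
```

Each additional round therefore adds one more component to the target, raising its nuclear norm and bringing it closer to the original diagram, with T_e equal to I once e reaches min(m, n).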

Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. The method for detecting the underwater sound abnormal signal is characterized by comprising the following steps of:

The input module divides an original time-frequency diagram I into a plurality of small blocks with the same size, masks most of the small blocks according to a preset proportion to obtain a mask small block set M, and the rest is a visible small block set N;

The encoder module takes N as input, depth features in N are extracted through a neural network, and the size of each feature map in the feature map set F is i multiplied by j multiplied by k, wherein i is the length of the feature map, j is the width of the feature map, and k is the channel number of the feature map;

the decoder module takes C as input, restores the mask tokens and the feature maps into small blocks of size a×b, then arranges the small blocks according to the position code, and reconstructs a new time-frequency map Î;

the loss function module comprises a loss function, wherein the loss function comprises two parts:

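The first part is the reconstruction loss Lr, which is the mean square error (MSE) between the reconstructed image Î and the reconstruction target Te:

$$L_r = \mathrm{MSE}(\hat{I}, T_e) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\bigl(\hat{I}(u,v) - T_e(u,v)\bigr)^2$$

The second part is the diversity loss Ld, in which the nuclear norm of the reconstructed image Î serves as the measure of its diversity. The formula is as follows: [formula not reproduced in the source text]

wherein ‖Te‖_* is the nuclear norm of the reconstruction target Te;

The total loss function is: [formula not reproduced in the source text]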

2. The method for detecting an underwater sound abnormality signal according to claim 1, wherein said input module divides the original time-frequency diagram I into a plurality of small blocks of the same size, then masks 80% of the small blocks to obtain a masked small block set M, and the remaining 20% is a visible small block set N.

3. The method of claim 1, wherein the encoder module and decoder module employ ViT neural network structures.

4. The method of detecting an underwater sound abnormality signal according to claim 1, wherein masking in the step 3.1 means setting all pixel values in a small block to 0.