CN117912458A - Underwater sound abnormal signal detection method - Google Patents


Info

Publication number
CN117912458A
Authority
CN
China
Prior art keywords
neural network
training
size
time
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410147800.XA
Other languages
Chinese (zh)
Inventor
夏飞
窦钰涛
赵祥
赵飞
张洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202410147800.XA
Publication of CN117912458A
Legal status: Pending

Abstract

The application relates to the technical field of underwater sound signal processing, and particularly discloses a method for detecting an underwater sound abnormal signal. The method comprises the following steps: extracting normal underwater sound data samples with the same sample size to establish a sample data set; performing short-time Fourier transform to obtain a corresponding time-frequency diagram data set; constructing a self-supervised training model; training a neural network model with the sample data set and the self-supervised training model, which includes making mask input data, making a reconstruction target, and training the neural network model; and applying the neural network model. Compared with the traditional classification-based anomaly detection method, the method can effectively reduce the requirement for a large amount of original data. Compared with the traditional self-supervised training method, the method can effectively reduce the complexity of the neural network, improve the effectiveness and efficiency of the training process, and also ensure the success rate of continuous training by gradually increasing the detail or diversity of the reconstruction.

Description

Underwater sound abnormal signal detection method
Technical Field
The invention relates to the technical field of underwater sound signal processing, and particularly discloses a method for detecting an underwater sound abnormal signal.
Background
With the progress of human civilization, and especially with the growing shortage of land resources in recent years, countries are making great efforts to develop marine resources, and ocean exploration is becoming increasingly active. Scientific research such as marine environment perception has become an important national strategy for many countries. In the information age, information technology based on electromagnetic signals has achieved remarkable results and gives people a fast way to transfer and process information. However, the marine environment is very complex, and seawater strongly attenuates electromagnetic signals, so electromagnetic signals cannot be used for long-distance information exchange underwater, which seriously hinders human activity in the ocean. By comparison, only sound waves propagate well in seawater, so sound waves have become the main means of underwater information sensing and information exchange.
When human activities such as ship navigation or drilling exploration take place, abnormal sound signals appear in the seawater. However, the ocean always contains a large amount of background noise caused by waves, tides, ocean currents and the like, which makes underwater sound abnormal signals difficult to detect. How to detect these abnormal underwater sounds quickly and accurately has become an important problem for marine environment perception and exploration. At present, there is some research on anomaly detection for sound in air, but little research on the detection of underwater sound abnormal signals. Most existing anomaly detection methods for sound in air rely on traditional hand-crafted feature extraction, whose ability to extract features from sound signals is insufficient. Some methods extract sound features with neural networks, but they do not consider the complexity of the underwater acoustic environment and are not designed to handle the training difficulties caused by the large amount of noise in underwater sound, so they have difficulty accurately detecting complex underwater sound abnormal signals.
Disclosure of Invention
In view of the existing problems, the invention provides a self-supervised method for training a neural network based on mask reconstruction. The method alleviates the difficulty of acquiring and labeling underwater sound abnormal samples and allows training without any abnormal samples. At the same time, a progressive training method gradually raises the reconstruction difficulty and the reconstruction fineness, which improves the efficiency of self-supervised training and the feature extraction capability of the neural network, so that training of the neural network model is completed more quickly and more accurate detection results are obtained.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
The invention discloses a method for detecting an underwater sound abnormal signal, which comprises the following steps:
Step one, extracting normal underwater sound data samples with the same sample size to establish a sample data set D; performing short-time Fourier transform (STFT) on all underwater sound data samples in the sample data set D to obtain a corresponding time-frequency diagram data set S, wherein each time-frequency diagram has a pixel size of m×n;
Step two, constructing a self-supervision training model A comprising an input module, an encoder module, a small block reorganization module, a decoder module, a reconstruction target generation module and a loss function module; wherein:
the input module divides an original time-frequency diagram I into a plurality of small blocks with the same size, then masks most of the small blocks according to a preset proportion to obtain a mask small block set M, and the rest is a visible small block set N;
The encoder module takes N as input and extracts the depth features in N through a neural network to obtain a feature map set F; the size of each feature map in F is i×j×k, wherein i is the length of the feature map, j is the width of the feature map, and k is the channel number of the feature map;
The small block reorganization module takes F as input, constructs a mask token set T, and then adds a position code to each feature map in F and each mask token in T to obtain a new set C; wherein the number of mask tokens in T is equal to q, each mask token being a learnable vector that serves as a placeholder for a small block to be recovered by the decoder;
the decoder module takes C as input, restores the mask tokens and the feature maps into small blocks with a size of a×b, then arranges the small blocks according to the position codes, and reconstructs a new time-frequency diagram $\hat{I}$;
The reconstruction target generation module takes the original time-frequency diagram I as input, performs singular value decomposition (SVD) on I, then extracts the first e components for image reconstruction, and generates a reconstruction target Te; the reconstruction target Te is a time-frequency diagram with the same image size as I;
The loss function module comprises a loss function, and the loss function comprises two parts:
The first part is the reconstruction loss Lr, which is the mean square error (MSE) between the reconstructed image $\hat{I}$ and the reconstruction target Te: $L_r = \mathrm{MSE}(\hat{I}, T_e) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\left(\hat{I}(u,v) - T_e(u,v)\right)^2$;
The second part is the diversity loss Ld, which adopts the nuclear norm $\|\hat{I}\|_*$ of the reconstructed image $\hat{I}$ as the measure of its diversity; the formula is as follows: [equation not recoverable]
Wherein $\|T_e\|_*$ is the nuclear norm of the reconstruction target Te;
The total loss function is: [equation not recoverable]
Wherein α is a variable loss function weight factor, e is the current training round, E is the total number of rounds to be trained, and β is a diversity adjustment factor;
Step three, training the neural network model by using the sample data set D and the self-supervised training model A, wherein E is a preset hyperparameter and E ≤ min(m, n); the training process of the e-th round (1 ≤ e ≤ E) is shown in steps 3.1 to 3.4:
Step 3.1: making mask input data: randomly extracting an image I from the time-frequency diagram data set S, dividing the image I into a plurality of small blocks with the same size, then masking p small blocks to obtain the mask small block set M, the remaining q small blocks forming the visible small block set N; wherein the size of each small block is a×b;
Step 3.2: making the reconstruction target: performing SVD on the image I and reconstructing the image from the first e principal components to obtain the reconstruction target Te;
Step 3.3: M, N and Te are used as inputs of the self-supervised training model A to train the neural network model;
Step 3.4: repeating steps 3.1 to 3.3 K times, wherein K is a preset hyperparameter;
step four, applying a neural network model, wherein the specific steps comprise:
step 4.1: framing the audio data to be detected, so that each frame has the same size as an audio sample in the data set D;
Step 4.2: performing short-time Fourier transform (STFT) on the framed audio data to obtain a time-frequency spectrogram It;
Step 4.3: sending the spectrogram It into the trained neural network model, and reconstructing a new spectrogram $\hat{I}_t$ through the neural network model;
Step 4.4: calculating the anomaly score, i.e. the mean square error between the input spectrogram It and the reconstructed spectrogram $\hat{I}_t$: $\mathrm{Score} = \mathrm{MSE}(I_t, \hat{I}_t) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\left(I_t(u,v) - \hat{I}_t(u,v)\right)^2$;
Step 4.5: judging whether the audio data to be detected is abnormal according to the Score; if the Score is greater than Y, the audio data is judged to be abnormal, otherwise it is judged to be normal; wherein Y is a preset threshold.
As a further refinement or specific implementation of the underwater sound abnormal signal detection method, the input module divides the original time-frequency diagram I into a plurality of small blocks with the same size, then masks 80% of the small blocks to obtain the mask small block set M, and the remaining 20% form the visible small block set N.
As a further refinement or specific implementation of the foregoing method, the encoder module and the decoder module adopt the ViT (Vision Transformer) neural network structure.
As a further refinement or specific implementation of the foregoing method, the masking in step 3.1 refers to setting all pixel values in a small block to 0.
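As a non-limiting illustration, the block division and masking described above can be sketched in Python with NumPy. The function name `split_and_mask`, the `seed` parameter, and the requirement that a and b divide m and n exactly are assumptions of this sketch and are not stated in the disclosure.

```python
import numpy as np

def split_and_mask(spec, a, b, mask_ratio=0.8, seed=0):
    """Split an m x n time-frequency diagram into a x b small blocks
    and set a fraction of the blocks to zero (hypothetical helper)."""
    m, n = spec.shape
    if m % a or n % b:
        raise ValueError("a and b must divide the diagram size (assumption)")
    rows, cols = m // a, n // b
    # patches[r, c] is the a x b small block at grid position (r, c)
    patches = spec.reshape(rows, a, cols, b).swapaxes(1, 2).copy()
    total = rows * cols
    p = int(round(mask_ratio * total))            # number of masked blocks
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(total, size=p, replace=False)
    is_masked = np.zeros(total, dtype=bool)
    is_masked[masked_idx] = True
    is_masked = is_masked.reshape(rows, cols)
    patches[is_masked] = 0.0                      # masking: all pixel values set to 0
    masked_spec = patches.swapaxes(1, 2).reshape(m, n)
    return masked_spec, is_masked
```

With an 8×10 diagram and 2×2 small blocks there are 20 blocks, so a ratio of 80% masks p = 16 of them and leaves q = 4 visible. The boolean grid `is_masked` records which positions belong to M and which to N.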
The beneficial effects are that:
Compared with the traditional classification-based anomaly detection method, the method can effectively reduce the requirement for a large amount of original data. Compared with the traditional self-supervised training method, the method can effectively reduce the complexity of the neural network, improve the effectiveness and efficiency of the training process, and also ensure the success rate of continuous training by gradually increasing the detail or diversity of the reconstruction.
Drawings
Fig. 1 is a schematic flow diagram of a module of a method for detecting an underwater sound abnormality signal.
Detailed Description
The present invention will be described in detail with reference to specific examples.
The invention provides a method for detecting an underwater sound abnormal signal, which addresses the weak feature extraction capability of existing underwater sound abnormal signal detection methods and improves the training efficiency of the neural network through a progressive self-supervised method.
The method specifically comprises the following steps:
Step one, extracting normal underwater sound data samples with the same sample size to establish a sample data set D; performing short-time Fourier transform (STFT) on all underwater sound data samples in the sample data set D to obtain a corresponding time-frequency diagram data set S, wherein each time-frequency diagram has a pixel size of m×n;
Step two, constructing a self-supervision training model A comprising an input module, an encoder module, a small block reorganization module, a decoder module, a reconstruction target generation module and a loss function module; wherein:
the input module divides an original time-frequency diagram I into a plurality of small blocks with the same size, then masks most of the small blocks according to a preset proportion to obtain a mask small block set M, and the rest is a visible small block set N;
The encoder module takes N as input and extracts the depth features in N through a neural network to obtain a feature map set F; the size of each feature map in F is i×j×k, wherein i is the length of the feature map, j is the width of the feature map, and k is the channel number of the feature map;
The small block reorganization module takes F as input, constructs a mask token set T, and then adds a position code to each feature map in F and each mask token in T to obtain a new set C; wherein the number of mask tokens in T is equal to q, each mask token being a learnable vector that serves as a placeholder for a small block to be recovered by the decoder;
the decoder module takes C as input, restores the mask tokens and the feature maps into small blocks with a size of a×b, then arranges the small blocks according to the position codes, and reconstructs a new time-frequency diagram $\hat{I}$; the encoder module and the decoder module adopt the ViT (Vision Transformer) neural network structure;
The reconstruction target generation module takes the original time-frequency diagram I as input, performs singular value decomposition (SVD) on I, then extracts the first e components for image reconstruction, and generates a reconstruction target Te; the reconstruction target Te is a time-frequency diagram with the same image size as I;
The loss function module comprises a loss function, and the loss function comprises two parts:
The first part is the reconstruction loss Lr, which is the mean square error (MSE) between the reconstructed image $\hat{I}$ and the reconstruction target Te: $L_r = \mathrm{MSE}(\hat{I}, T_e) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\left(\hat{I}(u,v) - T_e(u,v)\right)^2$;
The second part is the diversity loss Ld, which adopts the nuclear norm $\|\hat{I}\|_*$ of the reconstructed image $\hat{I}$ as the measure of its diversity; the formula is as follows: [equation not recoverable]
Wherein $\|T_e\|_*$ is the nuclear norm of the reconstruction target Te;
The total loss function is: [equation not recoverable]
Wherein α is a variable loss function weight factor, e is the current training round, E is the total number of rounds to be trained, and β is a diversity adjustment factor;
Step three, training the neural network model by using the sample data set D and the self-supervised training model A, wherein E is a preset hyperparameter and E ≤ min(m, n); the training process of the e-th round (1 ≤ e ≤ E) is shown in steps 3.1 to 3.4:
Step 3.1: making mask input data: randomly extracting an image I from the time-frequency diagram data set S, dividing the image I into a plurality of small blocks with the same size, then masking p small blocks to obtain the mask small block set M, the remaining q small blocks forming the visible small block set N; wherein the size of each small block is a×b; masking refers to setting all pixel values in a small block to 0;
Step 3.2: making the reconstruction target: performing SVD on the image I and reconstructing the image from the first e principal components to obtain the reconstruction target Te;
Step 3.3: M, N and Te are used as inputs of the self-supervised training model A to train the neural network model;
Step 3.4: repeating steps 3.1 to 3.3 K times, wherein K is a preset hyperparameter;
step four, applying a neural network model, wherein the specific steps comprise:
step 4.1: framing the audio data to be detected, so that each frame has the same size as an audio sample in the data set D;
Step 4.2: performing short-time Fourier transform (STFT) on the framed audio data to obtain a time-frequency spectrogram It;
Step 4.3: sending the spectrogram It into the trained neural network model, and reconstructing a new spectrogram $\hat{I}_t$ through the neural network model;
Step 4.4: calculating the anomaly score, i.e. the mean square error between the input spectrogram It and the reconstructed spectrogram $\hat{I}_t$: $\mathrm{Score} = \mathrm{MSE}(I_t, \hat{I}_t) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\left(I_t(u,v) - \hat{I}_t(u,v)\right)^2$;
Step 4.5: judging whether the audio data to be detected is abnormal according to the Score; if the Score is greater than Y, the audio data is judged to be abnormal, otherwise it is judged to be normal; wherein Y is a preset threshold.
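Step one and steps 4.1 to 4.2 both turn fixed-length audio into a time-frequency diagram. A minimal sketch of that conversion is given below, using NumPy only. The function names, the Hann window, the window length of 256, the hop of 128, the non-overlapping framing and the use of the magnitude spectrum are all assumptions of this sketch; the disclosure does not specify STFT parameters.

```python
import numpy as np

def frame_audio(signal, sample_len):
    """Cut a 1-D signal into non-overlapping segments of sample_len samples.
    Trailing samples that do not fill a whole segment are dropped (assumption)."""
    count = len(signal) // sample_len
    return signal[: count * sample_len].reshape(count, sample_len)

def stft_magnitude(segment, win_len=256, hop=128):
    """Magnitude STFT with a Hann window.
    Rows are frequency bins, columns are time frames."""
    window = np.hanning(win_len)
    starts = range(0, len(segment) - win_len + 1, hop)
    frames = np.stack([segment[s:s + win_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Usage: a synthetic 1 kHz tone sampled at 8 kHz stands in for recorded audio.
fs = 8000
t = np.arange(2 * 4096 + 100) / fs
segments = frame_audio(np.sin(2 * np.pi * 1000 * t), 4096)
diagram = stft_magnitude(segments[0])   # shape (129, 31): m = 129, n = 31
```

Because every segment has the same length, every diagram has the same m×n size, which is what the data set S and the detection input It both require.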
As a further refinement or specific implementation of the underwater sound abnormal signal detection method, the input module divides the original time-frequency diagram I into a plurality of small blocks with the same size, then masks 80% of the small blocks to obtain the mask small block set M, and the remaining 20% form the visible small block set N.
Examples:
A marine biologist needs to design an intelligent sonar system to automatically detect whether a rare fish X is present in a sea area. Fish X produces a certain abnormal sound, but since fish X is very rare, no sound sample of the fish has been collected before.
If anomaly monitoring were performed with a classification-based method, a large number of sounds of fish X and normal background sounds without fish X would have to be collected in advance, the two kinds of sound samples labeled and made into a data set, a classification model trained on this data, and fish X then detected with the classification model. Since fish X is very rare, this approach is not applicable. With the method of the invention, it is only necessary to collect normal background sound without fish X and train a self-supervised model without any data labeling; the sound of fish X is then treated as an anomaly and detected by the self-supervised model, so that fish X can be found.
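The decision rule of steps 4.4 and 4.5 can be sketched as follows. The trained neural network is represented here by a hypothetical callable `reconstruct`, and the stand-in used in the usage lines simply returns a stored background diagram; it is not the ViT model of the invention and serves only to show how the score separates normal from abnormal input.

```python
import numpy as np

def anomaly_score(spec, reconstruct):
    """Mean square error between a spectrogram and its reconstruction."""
    recon = reconstruct(spec)
    return float(np.mean((spec - recon) ** 2))

def is_abnormal(spec, reconstruct, threshold):
    """Abnormal when the score exceeds the preset threshold Y."""
    return anomaly_score(spec, reconstruct) > threshold

# Usage with a stand-in model (assumption): it always returns the background.
background = np.ones((4, 4))
reconstruct = lambda spec: background

normal = background.copy()
abnormal = background.copy()
abnormal[1, 2] += 4.0                   # one strong unexpected component

Y = 0.5                                  # illustrative threshold, not from the source
normal_flag = is_abnormal(normal, reconstruct, Y)
abnormal_flag = is_abnormal(abnormal, reconstruct, Y)
```

A model trained only on background sound reconstructs background well and unfamiliar sound poorly, so the reconstruction error itself acts as the detector and no sample of fish X is needed.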
On the other hand, since the sound in seawater contains a lot of noise, the corresponding time-frequency diagram also contains a lot of noise. If an ordinary self-supervised training method is adopted, the reconstruction target is the original time-frequency diagram with all of its noise; such a target is difficult for the neural network to reconstruct, and the training process is likely to fail. The present invention solves this problem by optimizing the training process in two ways. First, an SVD-based reconstruction technique is adopted: the reconstruction target is initially set to the principal components of the original sample, and detail information is then gradually added to the reconstruction target. Second, a loss function capable of adjusting the diversity of the reconstruction target is constructed, and the weight of the diversity loss in the total loss function is gradually increased in a progressive manner; the higher the diversity, the more detail information the image contains, so the ability of the neural network to reconstruct image details is gradually improved. Because a progressive training mode is adopted, the training difficulty at the beginning is reduced and the training does not collapse. After a certain number of training rounds, the neural network model has improved its understanding of the images; when the training difficulty is then gradually increased (by increasing the detail information or diversity in the reconstruction target), the details are no longer so difficult for the neural network to learn, and the training does not fail.
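The progressive target can be illustrated numerically. Writing the SVD of I as $I = \sum_{r} \sigma_r u_r v_r^\top$, the target for round e is the truncated sum $T_e = \sum_{r=1}^{e} \sigma_r u_r v_r^\top$, and the nuclear norm is the sum of the singular values. The sketch below, with hypothetical function names and a random matrix standing in for a real time-frequency diagram, shows that as e grows the target approaches I and its nuclear norm does not decrease.

```python
import numpy as np

def svd_target(image, e):
    """Rebuild the image from its first e singular components (target T_e)."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    return (u[:, :e] * s[:e]) @ vt[:e, :]

def nuclear_norm(image):
    """Nuclear norm: the sum of the singular values."""
    return float(np.linalg.svd(image, compute_uv=False).sum())

# Usage: a random 6 x 5 matrix stands in for a time-frequency diagram (assumption).
rng = np.random.default_rng(0)
img = rng.random((6, 5))
E = min(img.shape)                       # E <= min(m, n)
errors = [float(np.mean((img - svd_target(img, e)) ** 2)) for e in range(1, E + 1)]
norms = [nuclear_norm(svd_target(img, e)) for e in range(1, E + 1)]
```

Early rounds therefore ask the network for a smooth, low-rank picture of the diagram, and later rounds ask for progressively more detail, which matches the constraint E ≤ min(m, n): a diagram has at most min(m, n) singular components to add.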
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (4)

1. The method for detecting the underwater sound abnormal signal is characterized by comprising the following steps of:
Step one, extracting normal underwater sound data samples with the same sample size to establish a sample data set D; performing short-time Fourier transform (STFT) on all underwater sound data samples in the sample data set D to obtain a corresponding time-frequency diagram data set S, wherein each time-frequency diagram has a pixel size of m×n;
Step two, constructing a self-supervision training model A comprising an input module, an encoder module, a small block reorganization module, a decoder module, a reconstruction target generation module and a loss function module; wherein:
The input module divides an original time-frequency diagram I into a plurality of small blocks with the same size, masks most of the small blocks according to a preset proportion to obtain a mask small block set M, and the rest is a visible small block set N;
The encoder module takes N as input and extracts the depth features in N through a neural network to obtain a feature map set F; the size of each feature map in F is i×j×k, wherein i is the length of the feature map, j is the width of the feature map, and k is the channel number of the feature map;
The small block reorganization module takes F as input, constructs a mask token set T, and then adds a position code to each feature map in F and each mask token in T to obtain a new set C; wherein the number of mask tokens in T is equal to q, each mask token being a learnable vector that serves as a placeholder for a small block to be recovered by the decoder;
the decoder module takes C as input, restores the mask tokens and the feature maps into small blocks with a size of a×b, then arranges the small blocks according to the position codes, and reconstructs a new time-frequency diagram $\hat{I}$;
The reconstruction target generation module takes the original time-frequency diagram I as input, performs singular value decomposition (SVD) on I, then extracts the first e components for image reconstruction, and generates a reconstruction target Te; the reconstruction target Te is a time-frequency diagram with the same image size as I;
the loss function module comprises a loss function, wherein the loss function comprises two parts:
The first part is the reconstruction loss Lr, which is the mean square error (MSE) between the reconstructed image $\hat{I}$ and the reconstruction target Te: $L_r = \mathrm{MSE}(\hat{I}, T_e) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\left(\hat{I}(u,v) - T_e(u,v)\right)^2$;
The second part is the diversity loss Ld, which adopts the nuclear norm $\|\hat{I}\|_*$ of the reconstructed image $\hat{I}$ as the measure of its diversity; the formula is as follows: [equation not recoverable]
wherein $\|T_e\|_*$ is the nuclear norm of the reconstruction target Te;
The total loss function is: [equation not recoverable]
Wherein α is a variable loss function weight factor, e is the current training round, E is the total number of rounds to be trained, and β is a diversity adjustment factor;
Step three, training the neural network model by using the sample data set D and the self-supervised training model A, wherein E is a preset hyperparameter and E ≤ min(m, n); the training process of the e-th round (1 ≤ e ≤ E) is shown in steps 3.1 to 3.4:
Step 3.1: making mask input data: randomly extracting an image I from the time-frequency diagram data set S, dividing the image I into a plurality of small blocks with the same size, then masking p small blocks to obtain the mask small block set M, the remaining q small blocks forming the visible small block set N; wherein the size of each small block is a×b;
Step 3.2: making the reconstruction target: performing SVD on the image I and reconstructing the image from the first e principal components to obtain the reconstruction target Te;
Step 3.3: M, N and Te are used as inputs of the self-supervised training model A to train the neural network model;
Step 3.4: repeating steps 3.1 to 3.3 K times, wherein K is a preset hyperparameter;
step four, applying a neural network model, wherein the specific steps comprise:
step 4.1: framing the audio data to be detected, so that each frame has the same size as an audio sample in the data set D;
Step 4.2: performing short-time Fourier transform (STFT) on the framed audio data to obtain a time-frequency spectrogram It;
Step 4.3: sending the spectrogram It into the trained neural network model, and reconstructing a new spectrogram $\hat{I}_t$ through the neural network model;
Step 4.4: calculating the anomaly score, i.e. the mean square error between the input spectrogram It and the reconstructed spectrogram $\hat{I}_t$: $\mathrm{Score} = \mathrm{MSE}(I_t, \hat{I}_t) = \frac{1}{mn}\sum_{u=1}^{m}\sum_{v=1}^{n}\left(I_t(u,v) - \hat{I}_t(u,v)\right)^2$;
Step 4.5: judging whether the audio data to be detected is abnormal according to the Score; if the Score is greater than Y, the audio data is judged to be abnormal, otherwise it is judged to be normal; wherein Y is a preset threshold.
2. The method for detecting an underwater sound abnormality signal according to claim 1, wherein said input module divides the original time-frequency diagram I into a plurality of small blocks of the same size, then masks 80% of the small blocks to obtain a masked small block set M, and the remaining 20% is a visible small block set N.
3. The method of claim 1, wherein the encoder module and decoder module employ ViT neural network structures.
4. The method of detecting an underwater sound abnormality signal according to claim 1, wherein masking in the step 3.1 means setting all pixel values in a small block to 0.
CN202410147800.XA 2024-02-02 2024-02-02 Underwater sound abnormal signal detection method Pending CN117912458A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410147800.XA CN117912458A (en) 2024-02-02 2024-02-02 Underwater sound abnormal signal detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410147800.XA CN117912458A (en) 2024-02-02 2024-02-02 Underwater sound abnormal signal detection method

Publications (1)

Publication Number Publication Date
CN117912458A true CN117912458A (en) 2024-04-19

Family

ID=90688973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410147800.XA Pending CN117912458A (en) 2024-02-02 2024-02-02 Underwater sound abnormal signal detection method

Country Status (1)

Country Link
CN (1) CN117912458A (en)

Similar Documents

Publication Publication Date Title
CN107785029A (en) Target voice detection method and device
CN108922513A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN109376589A (en) ROV deformation target and Small object recognition methods based on convolution kernel screening SSD network
CN106094046A (en) Time domain aviation electromagnetic data de-noising method based on singular value decomposition and wavelet analysis
CN106899357B (en) Disguised and concealed underwater communication device simulating dolphin whistle
CN113724149B (en) Weak-supervision visible light remote sensing image thin cloud removing method
CN105741844A (en) DWT-SVD-ICA-based digital audio watermarking algorithm
CN112183582A (en) Multi-feature fusion underwater target identification method
Yin et al. Underwater acoustic target classification based on LOFAR spectrum and convolutional neural network
CN107731235A (en) Sperm whale and the cry pulse characteristicses extraction of long fin navigator whale and sorting technique and device
CN105589073A (en) Walsh-conversion-based fish active acoustic identification method
CN117349657A (en) Distributed data acquisition module and monitoring system for hydraulic engineering environment monitoring
CN113435276A (en) Underwater sound target identification method based on antagonistic residual error network
CN117912458A (en) Underwater sound abnormal signal detection method
CN110580915B (en) Sound source target identification system based on wearable equipment
CN112881986A (en) Radar slice storage forwarding type interference suppression method based on optimized depth model
CN114613384B (en) Deep learning-based multi-input voice signal beam forming information complementation method
CN117173022A (en) Remote sensing image super-resolution reconstruction method based on multipath fusion and attention
CN106024006A (en) Wavelet-transform-based cetacean sound signal denoising method and device
CN115510898A (en) Ship acoustic wake flow detection method based on convolutional neural network
CN115293214A (en) Underwater sound target recognition model optimization method based on sample expansion network
CN109712639A (en) A kind of audio collecting system and method based on wavelet filter
CN109347569A (en) A kind of hidden underwater acoustic communication method of camouflage based on discrete cosine transform
CN111624585A (en) Underwater target passive detection method based on convolutional neural network
Wang et al. High visual quality image steganography based on encoder-decoder model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination