CN116106880A - Underwater sound source ranging method and device based on attention mechanism and multi-scale fusion

Publication number
CN116106880A
Authority
CN
China
Prior art keywords
feature, layer, network, module, sound source
Legal status
Granted
Application number
CN202310390544.2A
Other languages
Chinese (zh)
Other versions
CN116106880B (en)
Inventor
徐立军
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202310390544.2A priority Critical patent/CN116106880B/en
Publication of CN116106880A publication Critical patent/CN116106880A/en
Application granted granted Critical
Publication of CN116106880B publication Critical patent/CN116106880B/en
Current legal status: Active

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S11/00: Systems for determining distance or velocity not using reflection or reradiation
    • G01S11/14: Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30: Assessment of water resources

Abstract

The application provides an underwater sound source ranging method and device based on an attention mechanism and multi-scale fusion, relating to the technical field of underwater communication. The method comprises the following steps: preprocessing a received underwater signal using signal processing techniques to obtain a sample covariance matrix corresponding to the received signal; and inputting the sample covariance matrix into an underwater sound source ranging network for feature extraction, taking the output result as the predicted distance, wherein the underwater sound source ranging network uses a residual network as its backbone network and comprises an adaptive feature fusion module and at least one feature subspace channel attention module. This scheme achieves accurate ranging of underwater sound sources.

Description

Underwater sound source ranging method and device based on attention mechanism and multi-scale fusion
Technical Field
The application relates to the technical field of underwater communication, in particular to an underwater sound source ranging method and device based on an attention mechanism and multi-scale fusion.
Background
The ocean is an important space and resource guarantee for sustainable development. Underwater sound source ranging based on acoustic waves is an important means in marine application fields such as environment perception, ocean monitoring, and information collection, and is one of the basic technologies for improving the ability to respond to marine emergencies and for strengthening strategy and tactics. Underwater sound source ranging is essentially a feature engineering problem comprising two parts: feature extraction and position prediction. Designing an efficient feature extraction module is therefore key to accurate ranging of underwater sound sources.
Underwater sound source ranging methods fall into two categories. The first is model-driven methods, in which the target position is predicted from manually designed features that are closely tied to the physics of acoustic wave propagation. A typical representative is matched field processing (MFP), which uses marine environment parameters and an acoustic propagation model to simulate the sound field within a limited range, matches the simulated field against the real field, and thereby estimates the source distance. Model-driven methods have the following problem: manually designed features cannot truly and comprehensively reflect actual deep-sea conditions, which limits practical application, and incorrectly designed features directly degrade ranging performance. Data-driven underwater sound source ranging, i.e. a deep neural network (DNN) that learns feature patterns through data analysis and interpretation, is therefore an effective alternative.
In recent years, DNNs have been widely applied in ocean engineering, for example in underwater target detection, direction-of-arrival estimation, and seabed classification. Given input acoustic data, a DNN learns features related to the sound source position through multiple nonlinear layers. Compared with model-driven methods, DNNs have stronger feature representation capability and have achieved state-of-the-art performance in underwater sound source ranging. As a data-driven approach, however, the performance of a DNN depends largely on the amount and quality of training data, and for ocean engineering the acquisition of real data is quite difficult, involving budget constraints, time-consuming experiments, and regulatory and confidentiality issues. Sparse training data causes models to overfit, generalize poorly, and predict with low accuracy.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, the first object of the application is to provide an underwater sound source ranging method based on an attention mechanism and multi-scale fusion, which solves the technical problems of low prediction precision and difficult application of the existing method and realizes accurate ranging of the underwater sound source.
A second object of the present application is to propose an underwater sound source ranging device based on an attention mechanism and multi-scale fusion.
To achieve the above objective, an embodiment of the first aspect of the present application provides an underwater sound source ranging method based on an attention mechanism and multi-scale fusion, including: preprocessing a received underwater signal using signal processing techniques to obtain a sample covariance matrix corresponding to the received signal; and inputting the sample covariance matrix into an underwater sound source ranging network for feature extraction, taking the output result as the predicted distance, wherein the underwater sound source ranging network uses a residual network as its backbone network and comprises an adaptive feature fusion module and at least one feature subspace channel attention module.
According to the underwater sound source ranging method based on the attention mechanism and multi-scale fusion of the embodiments of the present application, the received signal is preprocessed using signal processing techniques, and the resulting sample covariance matrix effectively represents the relationship between signal frequency and the receiving array. The conventional attention mechanism and multi-scale fusion module are then improved and added to a conventional DNN to obtain the underwater sound source ranging network based on the attention mechanism and multi-scale fusion; finally, the obtained sample covariance matrix is input into the network to output the predicted distance.
Optionally, in an embodiment of the present application, preprocessing the received underwater signal by using a signal processing technology to obtain a sample covariance matrix corresponding to the received signal, including:
normalizing the received underwater signal, and calculating to obtain an initial sample covariance matrix according to the normalized signal;
separating the real part and the imaginary part of the initial sample covariance matrix, and stacking the separated sample covariance matrices of different frequencies along a first dimension to obtain a sample covariance matrix;
wherein the initial sample covariance matrix is expressed as:

    C = \tilde{p} \, \tilde{p}^{H}

where C denotes the sample covariance matrix of shape L × L, L denotes the number of array elements of the receiving array, \tilde{p} denotes the normalized signal, and (\cdot)^{H} denotes the complex conjugate transpose.
Optionally, in an embodiment of the present application, the underwater sound source ranging network further includes at least one pooling layer and at least one fully connected layer; the residual network is a multi-layer network in which each layer is composed of at least one residual block; the layers other than the last layer of the residual network are intermediate layers; each intermediate layer corresponds to a feature subspace channel attention module; and each layer of the residual network corresponds to one pooling layer and one fully connected layer. Inputting the sample covariance matrix into the underwater sound source ranging network for feature extraction includes:
inputting the sample covariance matrix into the residual network and passing it sequentially through each layer to obtain the output data of each intermediate layer, the output of the last layer being taken as the final output of the residual network;
passing the final output through its corresponding pooling layer and fully connected layer to obtain an initial prediction result;
passing the output data of each intermediate layer through its corresponding feature subspace channel attention module to obtain a feature map for each intermediate layer, inputting all the feature maps into the adaptive feature fusion module to obtain integrated features for each intermediate layer, and passing the integrated features of each intermediate layer through their corresponding pooling layers and fully connected layers to obtain the prediction results of all intermediate layers;
and adding and averaging the initial prediction result and the prediction results of all intermediate layers to obtain the final prediction result.
Optionally, in one embodiment of the present application, the feature subspace channel attention module includes a feature subspace module and at least one compressed excitation attention module, the compressed excitation attention module including a compression module and an excitation module, inputting output data of the intermediate layer to the feature subspace channel attention module, comprising:
dividing output data along a channel dimension by using a feature subspace module to obtain at least one group of feature graphs, wherein each group of feature graphs corresponds to one compressed excitation attention module;
inputting each group of feature graphs into a corresponding compression excitation attention module, coding the whole spatial features on the channels of the feature subgroups into global features by using global average pooling through the corresponding compression modules, obtaining the weight of each channel according to the global features through the corresponding excitation modules, and multiplying the weight of each channel by the feature graphs of the corresponding groups to obtain updated feature graphs of each group;
and splicing each updated group of feature images along the channel dimension to obtain corresponding feature images.
Alternatively, in one embodiment of the present application, the global feature is expressed as:

    z = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x(i, j)

where z denotes the global feature and x denotes the group of feature maps currently being processed, with spatial dimensions H × W;

the weight of each channel is expressed as:

    w = \sigma( W_{2} \, \delta( W_{1} z ) )

where w denotes the weight of each channel, z denotes the global feature, \delta and \sigma denote the ReLU activation function and the sigmoid activation function respectively, and W_{1} and W_{2} denote the convolution layers of the excitation module, W_{1} compressing the channel features and W_{2} restoring the channel dimension;

each updated group of feature maps is expressed as:

    \tilde{x} = w \cdot x

where x denotes the group of feature maps currently being processed and w denotes the weight of each channel;

the feature map is expressed as:

    X' = \mathrm{Concat}(\tilde{x}_{1}, \tilde{x}_{2}, \ldots, \tilde{x}_{G})

where \tilde{x}_{g} denotes the g-th updated group of feature maps and Concat denotes the concatenation operation along the channel dimension.
Optionally, in an embodiment of the present application, inputting all feature maps into the adaptive feature fusion module to obtain the integrated features corresponding to each intermediate layer includes:
selecting the intermediate value of the spatial sizes of all feature maps as the standard size, adjusting all feature maps to the standard size by interpolation and max pooling, and fusing the adjusted feature maps to obtain an initial fusion feature;
passing the initial fusion feature through a convolution and a softmax function to obtain spatially adaptive weights, and splitting the weights along the channel dimension so that they correspond in sequence to the adjusted feature maps, obtaining the split weights;
multiplying each adjusted feature map by its corresponding split weight and summing to obtain an updated fusion feature;
and, following the FPN structure, scaling or expanding the updated fusion feature to the spatial size of each feature map, and adding each feature map to the correspondingly resized updated fusion feature through a skip connection to obtain the integrated features of the feature maps corresponding to each intermediate layer.
Optionally, in an embodiment of the present application, the adjusted feature maps are expressed as:

    X_{i}^{r} = \mathrm{Resize}(X_{i}), \quad i = 1, 2, 3

where X_{1}, X_{2}, X_{3} denote all the feature maps;

the initial fusion feature is expressed as:

    \bar{X} = (X_{1}^{r} + X_{2}^{r} + X_{3}^{r}) / 3

where \bar{X} denotes the initial fusion feature;

the split weights are expressed as:

    [w_{1}, w_{2}, w_{3}] = \mathrm{Softmax}(\mathrm{Conv}_{1 \times 1}(\bar{X}))

where w_{i} denotes the weight corresponding to each adjusted feature map;

the updated fusion feature is expressed as:

    \hat{X} = w_{1} \odot X_{1}^{r} + w_{2} \odot X_{2}^{r} + w_{3} \odot X_{3}^{r}

where \hat{X} denotes the updated fusion feature;

the integrated features of the feature maps corresponding to each intermediate layer are expressed as:

    Y_{1} = X_{1} + \mathrm{Resize}_{1}(\hat{X})
    Y_{2} = X_{2} + \mathrm{Resize}_{2}(\hat{X})
    Y_{3} = X_{3} + \mathrm{Resize}_{3}(\hat{X})

where Y_{1}, Y_{2}, Y_{3} denote the integrated features of the feature maps corresponding to all intermediate layers.
In order to achieve the above object, an embodiment of the second aspect of the present application provides an underwater sound source ranging device based on an attention mechanism and multi-scale fusion, which includes a preprocessing module and a ranging module, wherein,
the preprocessing module is used for preprocessing the received underwater signal by utilizing a signal processing technology to obtain a sample covariance matrix corresponding to the received signal;
the ranging module is used for inputting the sample covariance matrix into the underwater sound source ranging network for feature extraction and taking the output result as the predicted distance, wherein the underwater sound source ranging network uses a residual network as its backbone network and comprises an adaptive feature fusion module and at least one feature subspace channel attention module.
Optionally, in an embodiment of the present application, the preprocessing module is specifically configured to:
normalizing the received underwater signal, and calculating to obtain an initial sample covariance matrix according to the normalized signal;
and separating the real part and the imaginary part of the initial sample covariance matrix, and stacking the separated sample covariance matrices of different frequencies along the first dimension to obtain the sample covariance matrix.
Optionally, in an embodiment of the present application, the underwater sound source ranging network further includes at least one pooling layer and at least one fully connected layer; the residual network is a multi-layer network in which each layer is composed of at least one residual block; the layers other than the last layer of the residual network are intermediate layers; each intermediate layer corresponds to a feature subspace channel attention module; and each layer of the residual network corresponds to one pooling layer and one fully connected layer. The ranging module is specifically configured to:
input the sample covariance matrix into the residual network and pass it sequentially through each layer to obtain the output data of each intermediate layer, the output of the last layer being taken as the final output of the residual network;
pass the final output through its corresponding pooling layer and fully connected layer to obtain an initial prediction result;
pass the output data of each intermediate layer through its corresponding feature subspace channel attention module to obtain a feature map for each intermediate layer, input all the feature maps into the adaptive feature fusion module to obtain integrated features for each intermediate layer, and pass the integrated features through their corresponding pooling layers and fully connected layers to obtain the prediction results of all intermediate layers;
and add and average the initial prediction result and the prediction results of all intermediate layers to obtain the final prediction result.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a schematic flow chart of an underwater sound source ranging method based on an attention mechanism and multi-scale fusion according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an exemplary structure of a multi-scale fusion underwater sound source ranging network based on an attention mechanism according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a feature subspace channel attention architecture of an embodiment of the present application;
fig. 4 is a schematic structural diagram of an adaptive feature fusion module according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an underwater sound source ranging device based on an attention mechanism and multi-scale fusion according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The following describes an underwater sound source ranging method and device based on an attention mechanism and multi-scale fusion according to the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an underwater sound source ranging method based on an attention mechanism and multi-scale fusion according to an embodiment of the present application.
As shown in fig. 1, the underwater sound source ranging method based on the attention mechanism and the multi-scale fusion comprises the following steps:
step 101, preprocessing a received underwater signal by using a signal processing technology to obtain a sample covariance matrix corresponding to the received signal;
step 102, inputting the sample covariance matrix into an underwater sound source ranging network for feature extraction and taking the output result as the predicted distance, wherein the underwater sound source ranging network uses a residual network as its backbone network and comprises an adaptive feature fusion module and at least one feature subspace channel attention module.
According to the underwater sound source ranging method based on the attention mechanism and multi-scale fusion of the embodiments of the present application, the received signal is preprocessed using signal processing techniques, and the resulting sample covariance matrix effectively represents the relationship between signal frequency and the receiving array. The conventional attention mechanism and multi-scale fusion module are then improved and added to a conventional DNN to obtain the underwater sound source ranging network based on the attention mechanism and multi-scale fusion; finally, the obtained sample covariance matrix is input into the network to output the predicted distance.
Optionally, in an embodiment of the present application, preprocessing the received underwater signal by using a signal processing technology to obtain a sample covariance matrix corresponding to the received signal, including:
for the pretreatment of the received signal, namely by a signal processing technology, the received signal is normalized firstly, and then a sample covariance matrix is calculated, wherein the formula is as follows:
Figure SMS_40
wherein ,
Figure SMS_41
representing the normalized signal, ++>
Figure SMS_42
Representing the complex conjugate transpose>
Figure SMS_43
Representing a sample covariance matrix in the shape of l×l, L representing the number of receive array elements. Due to->
Figure SMS_44
Each data is in complex form and cannot be directly input into the neural network. Thus, the real part and the imaginary part thereof are separated, resulting in input data having a shape of 2×l×l.
Finally, at different frequencies
Figure SMS_45
Stacking along a first dimension to obtain final input data in the shape of 2FXLXL, wherein +.>
Figure SMS_46
Indicating the number of frequencies.
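A minimal sketch of this preprocessing in NumPy follows; the single-snapshot covariance estimate, the unit-norm normalization, and the array shapes are illustrative assumptions, not values fixed by this description:

    import numpy as np

    def preprocess(received: np.ndarray) -> np.ndarray:
        """Build the stacked sample-covariance input described above.

        received: complex array of shape (F, L) -- one snapshot per frequency,
                  for L receiving array elements (illustrative shapes).
        returns:  real array of shape (2F, L, L).
        """
        stacked = []
        for p in received:                  # p: (L,) snapshot at one frequency
            p = p / np.linalg.norm(p)       # normalize (one simple choice of normalization)
            C = np.outer(p, p.conj())       # sample covariance matrix C, shape (L, L)
            stacked.append(C.real)          # separate the real part ...
            stacked.append(C.imag)          # ... and the imaginary part
        return np.stack(stacked, axis=0)    # stack along the first dimension

    # Example: F = 10 frequencies, L = 16 array elements -> input of shape (20, 16, 16)
    rng = np.random.default_rng(0)
    received = rng.standard_normal((10, 16)) + 1j * rng.standard_normal((10, 16))
    print(preprocess(received).shape)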
Optionally, in an embodiment of the present application, the underwater sound source ranging network further includes at least one pooling layer and at least one fully connected layer; the residual network is a multi-layer network in which each layer is composed of at least one residual block; the layers other than the last layer of the residual network are intermediate layers; each intermediate layer corresponds to a feature subspace channel attention module; and each layer of the residual network corresponds to one pooling layer and one fully connected layer. Inputting the sample covariance matrix into the underwater sound source ranging network for feature extraction includes:
inputting the sample covariance matrix into the residual network and passing it sequentially through each layer to obtain the output data of each intermediate layer, the output of the last layer being taken as the final output of the residual network;
passing the final output through its corresponding pooling layer and fully connected layer to obtain an initial prediction result;
passing the output data of each intermediate layer through its corresponding feature subspace channel attention module to obtain a feature map for each intermediate layer, inputting all the feature maps into the adaptive feature fusion module to obtain integrated features for each intermediate layer, and passing the integrated features through their corresponding pooling layers and fully connected layers to obtain the prediction results of all intermediate layers;
and adding and averaging the initial prediction result and the prediction results of all intermediate layers to obtain the final prediction result.
Optionally, in one embodiment of the present application, the feature subspace channel attention module includes a feature subspace module and at least one compressed excitation attention module, the compressed excitation attention module including a compression module and an excitation module, inputting output data of the intermediate layer to the feature subspace channel attention module, comprising:
dividing output data along a channel dimension by using a feature subspace module to obtain at least one group of feature graphs, wherein each group of feature graphs corresponds to one compressed excitation attention module;
inputting each group of feature graphs into a corresponding compression excitation attention module, coding the whole spatial features on the channels of the feature subgroups into global features by using global average pooling through the corresponding compression modules, obtaining the weight of each channel according to the global features through the corresponding excitation modules, and multiplying the weight of each channel by the feature graphs of the corresponding groups to obtain updated feature graphs of each group;
and splicing each updated group of feature images along the channel dimension to obtain corresponding feature images.
Alternatively, in one embodiment of the present application, the global feature is expressed as:

    z = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x(i, j)

where z denotes the global feature and x denotes the group of feature maps currently being processed, with spatial dimensions H × W;

the weight of each channel is expressed as:

    w = \sigma( W_{2} \, \delta( W_{1} z ) )

where w denotes the weight of each channel, z denotes the global feature, \delta and \sigma denote the ReLU activation function and the sigmoid activation function respectively, and W_{1} and W_{2} denote the convolution layers of the excitation module, W_{1} compressing the channel features and W_{2} restoring the channel dimension;

each updated group of feature maps is expressed as:

    \tilde{x} = w \cdot x

where x denotes the group of feature maps currently being processed and w denotes the weight of each channel;

the feature map is expressed as:

    X' = \mathrm{Concat}(\tilde{x}_{1}, \tilde{x}_{2}, \ldots, \tilde{x}_{G})

where \tilde{x}_{g} denotes the g-th updated group of feature maps and Concat denotes the concatenation operation along the channel dimension.
Optionally, in an embodiment of the present application, inputting all feature maps into the adaptive feature fusion module to obtain the integrated features corresponding to each intermediate layer includes:
selecting the intermediate value of the spatial sizes of all feature maps as the standard size, adjusting all feature maps to the standard size by interpolation and max pooling, and fusing the adjusted feature maps to obtain an initial fusion feature;
passing the initial fusion feature through a convolution and a softmax function to obtain spatially adaptive weights, and splitting the weights along the channel dimension so that they correspond in sequence to the adjusted feature maps, obtaining the split weights;
multiplying each adjusted feature map by its corresponding split weight and summing to obtain an updated fusion feature;
and, following the FPN structure, scaling or expanding the updated fusion feature to the spatial size of each feature map, and adding each feature map to the correspondingly resized updated fusion feature through a skip connection to obtain the integrated features of the feature maps corresponding to each intermediate layer.
Optionally, in an embodiment of the present application, the adjusted feature maps are expressed as:

    X_{i}^{r} = \mathrm{Resize}(X_{i}), \quad i = 1, 2, 3

where X_{1}, X_{2}, X_{3} denote all the feature maps;

the initial fusion feature is expressed as:

    \bar{X} = (X_{1}^{r} + X_{2}^{r} + X_{3}^{r}) / 3

where \bar{X} denotes the initial fusion feature;

the split weights are expressed as:

    [w_{1}, w_{2}, w_{3}] = \mathrm{Softmax}(\mathrm{Conv}_{1 \times 1}(\bar{X}))

where w_{i} denotes the weight corresponding to each adjusted feature map;

the updated fusion feature is expressed as:

    \hat{X} = w_{1} \odot X_{1}^{r} + w_{2} \odot X_{2}^{r} + w_{3} \odot X_{3}^{r}

where \hat{X} denotes the updated fusion feature;

the integrated features of the feature maps corresponding to each intermediate layer are expressed as:

    Y_{1} = X_{1} + \mathrm{Resize}_{1}(\hat{X})
    Y_{2} = X_{2} + \mathrm{Resize}_{2}(\hat{X})
    Y_{3} = X_{3} + \mathrm{Resize}_{3}(\hat{X})

where Y_{1}, Y_{2}, Y_{3} denote the integrated features of the feature maps corresponding to all intermediate layers.
FIG. 2 shows an example structure of the underwater sound source ranging network based on the attention mechanism and multi-scale fusion of the present application. As shown in FIG. 2, this embodiment first selects ResNet-50 as the backbone network. It contains four layers in total, each composed of several residual blocks (ResBlock), producing the outputs X_{1}, X_{2}, X_{3}, X_{4}. On the one hand, these outputs are transmitted sequentially from bottom to top, and the final output yields a prediction d_{4} through a pooling layer and a fully connected layer. On the other hand, the outputs of the first three layers first pass through three feature subspace channel attention modules to extract channel features and multi-frequency features, then pass through the adaptive feature fusion module to fuse semantic features and detail features, and finally yield three predictions d_{1}, d_{2}, d_{3} through three pooling layers and fully connected layers. Finally, the four predictions are added and averaged to obtain the final prediction, defined by the following formula:

    \hat{d} = (d_{1} + d_{2} + d_{3} + d_{4}) / 4
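A minimal PyTorch sketch of this overall structure follows; the 2F = 20 input channel count, the torchvision ResNet-50 backbone, and the Identity stand-ins for the attention and fusion modules (sketched in the following subsections) are assumptions for the example:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class RangingNet(nn.Module):
        """Sketch: ResNet-50 backbone, one prediction head per layer, averaged."""

        def __init__(self, in_channels: int = 20):  # 2F channels, e.g. F = 10
            super().__init__()
            backbone = resnet50(weights=None)
            # Accept the 2F x L x L covariance input instead of 3-channel images.
            backbone.conv1 = nn.Conv2d(in_channels, 64, 7, stride=2, padding=3, bias=False)
            self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
            self.layers = nn.ModuleList([backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4])
            # Stand-ins for the feature subspace channel attention and adaptive
            # feature fusion modules sketched later in this description.
            self.attention = nn.ModuleList(nn.Identity() for _ in range(3))
            self.fuse = nn.Identity()  # would return the integrated features Y_1..Y_3
            dims = (256, 512, 1024, 2048)  # output channels of ResNet-50 layers 1-4
            self.heads = nn.ModuleList(
                nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(d, 1)) for d in dims
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.stem(x)
            feats = []
            for layer in self.layers:                # bottom-up pass producing X_1..X_4
                x = layer(x)
                feats.append(x)
            maps = [att(f) for att, f in zip(self.attention, feats[:3])]
            fused = self.fuse(maps)                  # integrated features for layers 1-3
            preds = [head(f) for head, f in zip(self.heads[:3], fused)]
            preds.append(self.heads[3](feats[3]))    # d_4 from the last layer
            return torch.stack(preds).mean(dim=0)    # (d_1 + d_2 + d_3 + d_4) / 4

    # Example: batch of 2 inputs with 2F = 20 channels on a 64 x 64 covariance grid.
    print(RangingNet()(torch.randn(2, 20, 64, 64)).shape)  # torch.Size([2, 1])

In the full model, the two Identity stand-ins would be replaced by the modules sketched in the following subsections.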
feature subspace channel attention
Because the channel dimension of the sample covariance matrix represents frequency, the number of channels is larger and the channel features are richer than in a conventional image. A channel attention mechanism is therefore needed to efficiently represent and learn the channel features and assign them varying importance. Based on this observation, this embodiment designs a feature subspace channel attention. FIG. 3 is a schematic diagram of the feature subspace channel attention of this embodiment.
Let the input feature map be X ∈ R^{M × H × W}, where M is the number of feature channels and H, W are the spatial dimensions. The input feature map is first passed through a feature subspace module, which divides it along the channel dimension into G groups, each group containing M/G feature channels. This subspace division can effectively learn multi-domain features, and extracting such features helps address the large intra-class variation in the sample covariance matrix.
Next, each group of feature maps is passed through a compressed excitation attention module, i.e. a squeeze-and-excitation (SE) module. By weighting the channel features, SE emphasizes effective information and suppresses ineffective information, so the complex channel features in the sample covariance matrix are better extracted. SE mainly includes two operations, squeeze and excitation. Taking one group of feature maps x as an example, the group is first input into the squeeze module. The squeeze encodes the entire spatial feature on each channel as a global feature using global average pooling, defined by the following formula:

    z = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x(i, j)

where z denotes the global feature. Then, in the excitation module, the excitation captures the dependency among channels from the global feature extracted by the squeeze, defined by the following formula:

    w = \sigma( W_{2} \, \delta( W_{1} z ) )

where w denotes the weight of each channel, \delta and \sigma denote the ReLU activation function and the sigmoid activation function respectively, and W_{1} and W_{2} denote convolution layers: the first compresses the channel features to fully capture the relationships between channels, and the second restores the channel dimension. Finally, the weight of each channel is multiplied onto the input features to recalibrate them in the channel dimension. The process is defined by the following formula:

    \tilde{x} = w \cdot x

where x denotes the group of feature maps currently being processed and w denotes the weight of each channel.

Finally, the groups of feature maps output by the SE modules are concatenated along the channel dimension. The process is defined by the following formula:

    X' = \mathrm{Concat}(\tilde{x}_{1}, \tilde{x}_{2}, \ldots, \tilde{x}_{G})

where X' denotes the output of the feature subspace channel attention, \tilde{x}_{g} denotes each group of feature maps after SE, and Concat denotes the concatenation operation along the channel dimension.
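A minimal PyTorch sketch of this feature subspace channel attention follows; the group count G = 4 and the SE reduction ratio of 16 are illustrative assumptions, not values fixed by this description:

    import torch
    import torch.nn as nn

    class FeatureSubspaceChannelAttention(nn.Module):
        """Split channels into G groups and apply squeeze-and-excitation per group."""

        def __init__(self, channels: int, groups: int = 4, reduction: int = 16):
            super().__init__()
            assert channels % groups == 0
            self.groups = groups
            g = channels // groups
            self.se = nn.ModuleList(
                nn.Sequential(
                    nn.AdaptiveAvgPool2d(1),          # squeeze: global average pooling
                    nn.Conv2d(g, g // reduction, 1),  # W_1: compress channel features
                    nn.ReLU(inplace=True),            # delta
                    nn.Conv2d(g // reduction, g, 1),  # W_2: restore channel dimension
                    nn.Sigmoid(),                     # sigma -> per-channel weights w
                )
                for _ in range(groups)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            chunks = torch.chunk(x, self.groups, dim=1)             # feature subspace split
            out = [xg * se(xg) for xg, se in zip(chunks, self.se)]  # recalibrate each group
            return torch.cat(out, dim=1)                            # concat along channels

    # Example: a 256-channel map keeps its shape while channels are re-weighted.
    fsca = FeatureSubspaceChannelAttention(256)
    print(fsca(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])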
Adaptive feature fusion module
Because of the high intra-class variation in the sample covariance matrix, a single semantic feature cannot accurately predict data at similar distances; low-level detail features must be combined. The feature pyramid network (FPN) integrates features by fusing deep semantic information with shallow detail features and is an effective solution. However, the bottom-up and top-down paths in FPN are sequential: each layer's features attend mainly to themselves and to adjacent layers, while cross-layer features receive little attention. Based on this observation, the invention designs an adaptive feature fusion module. Fig. 4 is a schematic structural diagram of the adaptive feature fusion module of this embodiment.
The three input feature maps of different scales are X_{1}, X_{2}, X_{3}, with successively decreasing scales. In the first step, the three feature maps are adjusted to the intermediate size, i.e. the size of X_{2}; after scaling, the same-size feature maps are averaged to obtain the initial fusion feature \bar{X}. The process is defined by the following formulas:

    X_{i}^{r} = \mathrm{Resize}(X_{i}), \quad i = 1, 2, 3

    \bar{X} = (X_{1}^{r} + X_{2}^{r} + X_{3}^{r}) / 3

In the second step, a spatially adaptive weight at the scale of X_{2} is obtained through a 1 × 1 convolution and a softmax function, and is then split along the channel dimension so that the weights correspond in sequence to the three adjusted features. The process is defined by the following formula:

    [w_{1}, w_{2}, w_{3}] = \mathrm{Softmax}(\mathrm{Conv}_{1 \times 1}(\bar{X}))

where w_{i} denotes the weight corresponding to each adjusted feature map.

In the third step, each X_{i}^{r} is multiplied by its weight w_{i}, and the context information is aggregated by computing the weighted sum. The process is defined by the following formula:

    \hat{X} = w_{1} \odot X_{1}^{r} + w_{2} \odot X_{2}^{r} + w_{3} \odot X_{3}^{r}

In the fourth step, following the structure in FPN, the inverse operations are used to scale or expand \hat{X} back to the size of each feature map, which is then added to the corresponding input feature map through a skip connection, outputting the integrated features Y_{1}, Y_{2}, Y_{3}. The process is defined by the following formulas:

    Y_{1} = X_{1} + \mathrm{Resize}_{1}(\hat{X})
    Y_{2} = X_{2} + \mathrm{Resize}_{2}(\hat{X})
    Y_{3} = X_{3} + \mathrm{Resize}_{3}(\hat{X})
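A minimal PyTorch sketch of this adaptive feature fusion follows, assuming the three inputs share one channel count and halve in spatial size from one to the next, with bilinear interpolation and adaptive max pooling standing in for the resize operations (all illustrative assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveFeatureFusion(nn.Module):
        """Fuse three feature maps of decreasing scale with spatially adaptive weights."""

        def __init__(self, channels: int):
            super().__init__()
            # 1x1 conv producing one weight map per input; softmax makes them compete.
            self.weight_conv = nn.Conv2d(channels, 3, kernel_size=1)

        def forward(self, x1, x2, x3):
            # Step 1: resize everything to the intermediate size (that of x2).
            mid = x2.shape[-2:]
            r1 = F.adaptive_max_pool2d(x1, mid)     # downscale the large map
            r2 = x2
            r3 = F.interpolate(x3, size=mid, mode="bilinear", align_corners=False)
            fused = (r1 + r2 + r3) / 3              # initial fusion feature

            # Step 2: spatially adaptive weights, split along the channel dimension.
            w = torch.softmax(self.weight_conv(fused), dim=1)  # (N, 3, H, W)
            w1, w2, w3 = w[:, 0:1], w[:, 1:2], w[:, 2:3]

            # Step 3: weighted sum aggregates the context information.
            agg = w1 * r1 + w2 * r2 + w3 * r3

            # Step 4: scale back to each original size and add via skip connections.
            y1 = x1 + F.interpolate(agg, size=x1.shape[-2:], mode="bilinear", align_corners=False)
            y2 = x2 + agg
            y3 = x3 + F.adaptive_max_pool2d(agg, x3.shape[-2:])
            return y1, y2, y3

    # Example: three maps of decreasing scale keep their shapes after fusion.
    aff = AdaptiveFeatureFusion(channels=64)
    x1, x2, x3 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16), torch.randn(1, 64, 8, 8)
    y1, y2, y3 = aff(x1, x2, x3)
    print(y1.shape, y2.shape, y3.shape)

Because the softmax is taken across the three weight channels at every spatial position, the three scales compete pointwise, which is what lets the module attend across layers rather than only to adjacent ones.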
in order to achieve the above embodiment, the present application further provides an underwater sound source ranging device based on an attention mechanism and multi-scale fusion.
Fig. 5 is a schematic structural diagram of an underwater sound source ranging device based on an attention mechanism and multi-scale fusion according to an embodiment of the present application.
As shown in fig. 5, the underwater sound source ranging device based on the attention mechanism and the multi-scale fusion comprises a preprocessing module and a ranging module, wherein,
the preprocessing module is used for preprocessing the received underwater signal by utilizing a signal processing technology to obtain a sample covariance matrix corresponding to the received signal;
the ranging module is used for inputting the sample covariance matrix into the underwater sound source ranging network for feature extraction and taking the output result as the predicted distance, wherein the underwater sound source ranging network uses a residual network as its backbone network and comprises an adaptive feature fusion module and at least one feature subspace channel attention module.
Optionally, in an embodiment of the present application, the preprocessing module is specifically configured to:
normalizing the received underwater signal, and calculating to obtain an initial sample covariance matrix according to the normalized signal;
and separating the real part and the imaginary part of the initial sample covariance matrix, and stacking the separated sample covariance matrices of different frequencies along the first dimension to obtain the sample covariance matrix.
Optionally, in an embodiment of the present application, the underwater sound source ranging network further includes at least one pooling layer and at least one fully connected layer; the residual network is a multi-layer network in which each layer is composed of at least one residual block; the layers other than the last layer of the residual network are intermediate layers; each intermediate layer corresponds to a feature subspace channel attention module; and each layer of the residual network corresponds to one pooling layer and one fully connected layer. The ranging module is specifically configured to:
input the sample covariance matrix into the residual network and pass it sequentially through each layer to obtain the output data of each intermediate layer, the output of the last layer being taken as the final output of the residual network;
pass the final output through its corresponding pooling layer and fully connected layer to obtain an initial prediction result;
pass the output data of each intermediate layer through its corresponding feature subspace channel attention module to obtain a feature map for each intermediate layer, input all the feature maps into the adaptive feature fusion module to obtain integrated features for each intermediate layer, and pass the integrated features through their corresponding pooling layers and fully connected layers to obtain the prediction results of all intermediate layers;
and add and average the initial prediction result and the prediction results of all intermediate layers to obtain the final prediction result.
It should be noted that the foregoing explanation of the embodiment of the underwater sound source ranging method based on the attention mechanism and the multi-scale fusion is also applicable to the underwater sound source ranging device based on the attention mechanism and the multi-scale fusion of the embodiment, and will not be repeated herein.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "particular examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. Alternatively, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, substitutions, and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (10)

1. An underwater sound source ranging method based on an attention mechanism and multi-scale fusion is characterized by comprising the following steps:
preprocessing a received underwater signal by using a signal processing technology to obtain a sample covariance matrix corresponding to the received signal;
and inputting the sample covariance matrix into an underwater sound source ranging network for feature extraction, and taking an output result as a predicted distance, wherein the underwater sound source ranging network uses a residual network as its backbone network, and the underwater sound source ranging network comprises an adaptive feature fusion module and at least one feature subspace channel attention module.
2. The method for ranging underwater sound sources based on an attention mechanism and multi-scale fusion according to claim 1, wherein the preprocessing of the received underwater signals by using a signal processing technology to obtain a sample covariance matrix corresponding to the received signals comprises:
normalizing the received underwater signal, and calculating to obtain an initial sample covariance matrix according to the normalized signal;
separating the real part and the imaginary part of the initial sample covariance matrix, and stacking the separated sample covariance matrices of different frequencies along a first dimension to obtain the sample covariance matrix;
wherein the initial sample covariance matrix is expressed as:

    C = \tilde{p} \, \tilde{p}^{H}

where C denotes the sample covariance matrix of shape L × L, L denotes the number of array elements of the receiving array, \tilde{p} denotes the normalized signal, and (\cdot)^{H} denotes the complex conjugate transpose.
3. The underwater sound source ranging method based on the attention mechanism and multi-scale fusion according to claim 1, wherein the underwater sound source ranging network further comprises at least one pooling layer and at least one fully connected layer, the residual network is a multi-layer network, each layer is composed of at least one residual block, the layers other than the last layer of the residual network are intermediate layers, each intermediate layer corresponds to a feature subspace channel attention module, and each layer of the residual network corresponds to one pooling layer and one fully connected layer, and wherein inputting the sample covariance matrix into the underwater sound source ranging network for feature extraction comprises:
inputting the sample covariance matrix into the residual network and passing it sequentially through each layer to obtain output data of each intermediate layer, an output result of the last layer being taken as a final output result of the residual network;
passing the final output result through a corresponding pooling layer and fully connected layer to obtain an initial prediction result;
passing the output data of each intermediate layer through a corresponding feature subspace channel attention module to obtain feature maps corresponding to each intermediate layer, inputting all the feature maps into the adaptive feature fusion module to obtain integrated features corresponding to each intermediate layer, and passing the integrated features corresponding to each intermediate layer through corresponding pooling layers and fully connected layers to obtain prediction results of all the intermediate layers;
and adding and averaging the initial prediction result and the prediction results of all the intermediate layers to obtain a final prediction result.
4. An attention mechanism and multiscale fusion based underwater sound source ranging method according to claim 3 wherein the feature subspace channel attention module comprises a feature subspace module and at least one compressed excitation attention module comprising a compression module and an excitation module, the input of the intermediate layer output data to the feature subspace channel attention module comprising:
dividing the output data along a channel dimension by using the feature subspace module to obtain at least one group of feature graphs, wherein each group of feature graphs corresponds to one compression excitation attention module;
inputting each group of feature graphs into a corresponding compression excitation attention module, coding the whole spatial feature on the channels of the feature subgroups into global features by using global average pooling through the corresponding compression modules, obtaining the weight of each channel according to the global features through the corresponding excitation modules, and multiplying the weight of each channel by the feature graphs of the corresponding groups to obtain updated feature graphs of each group;
and splicing the updated feature graphs of each group along the channel dimension to obtain corresponding feature graphs.
5. The underwater sound source ranging method based on the attention mechanism and multi-scale fusion according to claim 4, wherein the global feature is expressed as:

    z = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x(i, j)

where z denotes the global feature and x denotes the group of feature maps currently being processed, with spatial dimensions H × W;

the weight of each channel is expressed as:

    w = \sigma( W_{2} \, \delta( W_{1} z ) )

where w denotes the weight of each channel, z denotes the global feature, \delta and \sigma denote the ReLU activation function and the sigmoid activation function respectively, and W_{1} and W_{2} denote the convolution layers of the excitation module, W_{1} compressing the channel features and W_{2} restoring the channel dimension;

each updated group of feature maps is expressed as:

    \tilde{x} = w \cdot x

where x denotes the group of feature maps currently being processed and w denotes the weight of each channel;

the feature map is expressed as:

    X' = \mathrm{Concat}(\tilde{x}_{1}, \tilde{x}_{2}, \ldots, \tilde{x}_{G})

where \tilde{x}_{g} denotes the g-th updated group of feature maps and Concat denotes the concatenation operation along the channel dimension.
6. The underwater sound source ranging method based on the attention mechanism and multi-scale fusion according to claim 3, wherein said inputting all feature maps into the adaptive feature fusion module to obtain the integrated features corresponding to each intermediate layer comprises:
selecting the intermediate value of the spatial sizes of all feature maps as a standard size, adjusting all feature maps to the standard size by interpolation and max pooling, and fusing the adjusted feature maps to obtain an initial fusion feature;
passing the initial fusion feature through a convolution and a softmax function to obtain spatially adaptive weights, and splitting the weights along the channel dimension so that they correspond in sequence to the adjusted feature maps, obtaining split weights;
multiplying each adjusted feature map by its corresponding split weight and summing to obtain an updated fusion feature;
and, following the FPN structure, scaling or expanding the updated fusion feature to the spatial size of each feature map, and adding each feature map to the correspondingly resized updated fusion feature through a skip connection to obtain the integrated features of the feature maps corresponding to each intermediate layer.
7. The underwater sound source ranging method based on the attention mechanism and multi-scale fusion according to claim 6, wherein the adjusted feature maps are expressed as:

    X_{i}^{r} = \mathrm{Resize}(X_{i}), \quad i = 1, 2, 3

where X_{1}, X_{2}, X_{3} denote all the feature maps;

the initial fusion feature is expressed as:

    \bar{X} = (X_{1}^{r} + X_{2}^{r} + X_{3}^{r}) / 3

where \bar{X} denotes the initial fusion feature;

the split weights are expressed as:

    [w_{1}, w_{2}, w_{3}] = \mathrm{Softmax}(\mathrm{Conv}_{1 \times 1}(\bar{X}))

where w_{i} denotes the weight corresponding to each adjusted feature map;

the updated fusion feature is expressed as:

    \hat{X} = w_{1} \odot X_{1}^{r} + w_{2} \odot X_{2}^{r} + w_{3} \odot X_{3}^{r}

where \hat{X} denotes the updated fusion feature;

the integrated features of the feature maps corresponding to each intermediate layer are expressed as:

    Y_{1} = X_{1} + \mathrm{Resize}_{1}(\hat{X})
    Y_{2} = X_{2} + \mathrm{Resize}_{2}(\hat{X})
    Y_{3} = X_{3} + \mathrm{Resize}_{3}(\hat{X})

where Y_{1}, Y_{2}, Y_{3} denote the integrated features of the feature maps corresponding to all intermediate layers.
8. An underwater sound source ranging device based on an attention mechanism and multi-scale fusion is characterized by comprising a preprocessing module and a ranging module, wherein,
the preprocessing module is used for preprocessing the received underwater signal by utilizing a signal processing technology to obtain a sample covariance matrix corresponding to the received signal;
the ranging module is used for inputting the sample covariance matrix into an underwater sound source ranging network for feature extraction and taking an output result as a predicted distance, wherein the underwater sound source ranging network uses a residual network as its backbone network, and the underwater sound source ranging network comprises an adaptive feature fusion module and at least one feature subspace channel attention module.
9. The underwater sound source ranging device based on the attention mechanism and the multi-scale fusion as claimed in claim 8, wherein the preprocessing module is specifically configured to:
normalizing the received underwater signal, and computing an initial sample covariance matrix from the normalized signal;

and separating the real part and the imaginary part of the initial sample covariance matrix, and stacking the separated matrices of the different frequencies along a first dimension to obtain the sample covariance matrix.
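A minimal NumPy sketch of this preprocessing, assuming the received data is organized as complex snapshots per frequency bin of shape (frequencies, sensors, snapshots); the exact normalization (per-snapshot L2 here) and the real/imaginary stacking order are assumptions, since the claim fixes neither.

```python
import numpy as np

def build_input_scm(signals: np.ndarray) -> np.ndarray:
    """signals: complex array of shape (n_freqs, n_sensors, n_snapshots).
    Returns a real array of shape (2 * n_freqs, n_sensors, n_sensors)."""
    planes = []
    for s in signals:  # s: (n_sensors, n_snapshots) at one frequency
        # Normalize each snapshot vector to unit L2 norm (assumed normalization).
        s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-12)
        scm = s @ s.conj().T / s.shape[1]  # initial sample covariance matrix
        planes.append(scm.real)            # separated real part
        planes.append(scm.imag)            # separated imaginary part
    # Stack the per-frequency real/imaginary planes along the first dimension.
    return np.stack(planes, axis=0)
```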
10. The underwater sound source ranging device based on the attention mechanism and the multi-scale fusion according to claim 8, wherein the underwater sound source ranging network further comprises at least one pooling layer and at least one fully-connected layer; the residual network is a multi-layer network in which each layer consists of at least one residual block; the layers other than the last layer of the residual network are intermediate layers; each intermediate layer corresponds to a feature subspace channel attention module, and each layer of the residual network corresponds to one pooling layer and one fully-connected layer; and the ranging module is specifically configured to:

inputting the sample covariance matrix into the residual network and passing it through each layer in sequence to obtain the output data of each intermediate layer, the output of the last layer being taken as the final output of the residual network;

passing the final output through the corresponding pooling layer and fully-connected layer to obtain an initial prediction result;

passing the output data of each intermediate layer through the corresponding feature subspace channel attention module to obtain the feature map corresponding to each intermediate layer, inputting all the feature maps into the adaptive feature fusion module to obtain the integrated feature corresponding to each intermediate layer, and passing each integrated feature through the corresponding pooling layer and fully-connected layer to obtain the prediction results of all the intermediate layers;

and averaging the initial prediction result and the prediction results of all the intermediate layers to obtain the final prediction result.
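The claimed data flow can be summarized in a short forward-pass sketch; `net.layers`, `net.attn`, `net.fusion`, `net.pool`, and `net.fc` are hypothetical attribute names standing in for the residual stages, the per-layer feature subspace channel attention modules, the adaptive feature fusion module, and the per-branch pooling and fully-connected heads.

```python
import torch

def predict_range(net, scm: torch.Tensor) -> torch.Tensor:
    """scm: (batch, channels, H, W) input built from the sample covariance matrix."""
    feats, x = [], scm
    for layer in net.layers:            # residual stages, applied in sequence
        x = layer(x)
        feats.append(x)
    *intermediate, final = feats        # intermediate outputs + final output

    # Initial prediction from the backbone's final output.
    preds = [net.fc[-1](net.pool[-1](final).flatten(1))]

    # Channel attention per intermediate layer, then adaptive fusion across scales.
    maps = [attn(f) for attn, f in zip(net.attn, intermediate)]
    for i, feat in enumerate(net.fusion(maps)):
        preds.append(net.fc[i](net.pool[i](feat).flatten(1)))

    # Final prediction: the average of all branch predictions.
    return torch.stack(preds, dim=0).mean(dim=0)
```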
CN202310390544.2A 2023-04-13 2023-04-13 Underwater sound source ranging method and device based on attention mechanism and multi-scale fusion Active CN116106880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310390544.2A CN116106880B (en) 2023-04-13 2023-04-13 Underwater sound source ranging method and device based on attention mechanism and multi-scale fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310390544.2A CN116106880B (en) 2023-04-13 2023-04-13 Underwater sound source ranging method and device based on attention mechanism and multi-scale fusion

Publications (2)

Publication Number Publication Date
CN116106880A true CN116106880A (en) 2023-05-12
CN116106880B (en) 2023-06-30

Family

ID=86267671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310390544.2A Active CN116106880B (en) 2023-04-13 2023-04-13 Underwater sound source ranging method and device based on attention mechanism and multi-scale fusion

Country Status (1)

Country Link
CN (1) CN116106880B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116559778A (en) * 2023-07-11 2023-08-08 海纳科德(湖北)科技有限公司 Vehicle whistle positioning method and system based on deep learning


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014035328A (en) * 2012-08-10 2014-02-24 Tokyo Univ Of Marine Science & Technology Underwater positional relation information acquisition system and underwater positional relation information acquisition method
CN109975762A (en) * 2017-12-28 2019-07-05 中国科学院声学研究所 A kind of underwater sound source localization method
CN109993280A (en) * 2019-03-27 2019-07-09 东南大学 A kind of underwater sound source localization method based on deep learning
CN112329658A (en) * 2020-11-10 2021-02-05 江苏科技大学 Method for improving detection algorithm of YOLOV3 network
CN113269077A (en) * 2021-05-19 2021-08-17 青岛科技大学 Underwater acoustic communication signal modulation mode identification method based on improved gating network and residual error network
CN114332592A (en) * 2022-03-11 2022-04-12 中国海洋大学 Ocean environment data fusion method and system based on attention mechanism
CN115239974A (en) * 2022-06-27 2022-10-25 重庆邮电大学 Vision synchronous positioning and map construction closed-loop detection method integrating attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUO Qifan; LIU Lei; ZHANG Cheng; XU Wenjuan; JING Wenfeng: "Multi-scale feature fusion network based on feature pyramid", Chinese Journal of Engineering Mathematics, no. 05 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116559778A (en) * 2023-07-11 2023-08-08 海纳科德(湖北)科技有限公司 Vehicle whistle positioning method and system based on deep learning
CN116559778B (en) * 2023-07-11 2023-09-29 海纳科德(湖北)科技有限公司 Vehicle whistle positioning method and system based on deep learning

Also Published As

Publication number Publication date
CN116106880B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN110532932B (en) Method for identifying multi-component radar signal intra-pulse modulation mode
CN111736125B (en) Radar target identification method based on attention mechanism and bidirectional stacking cyclic neural network
CN110751044A (en) Urban noise identification method based on deep network migration characteristics and augmented self-coding
CN110555841B (en) SAR image change detection method based on self-attention image fusion and DEC
CN116106880B (en) Underwater sound source ranging method and device based on attention mechanism and multi-scale fusion
CN111062450B (en) Image classification device and method based on FPGA and SCNN architecture
Wei et al. A method of underwater acoustic signal classification based on deep neural network
Rasmussen et al. Automatic detection and classification of baleen whale social calls using convolutional neural networks
CN116206185A (en) Lightweight small target detection method based on improved YOLOv7
Li et al. Learning deep models from synthetic data for extracting dolphin whistle contours
CN110782458A (en) Object image 3D semantic prediction segmentation method of asymmetric coding network
CN114742985A (en) Hyperspectral feature extraction method and device and storage medium
CN106251375A (en) A kind of degree of depth study stacking-type automatic coding of general steganalysis
CN116502174A (en) Multi-mode deep learning-based environment recognition method and device
CN113222824B (en) Infrared image super-resolution and small target detection method
CN109508639B (en) Road scene semantic segmentation method based on multi-scale porous convolutional neural network
White et al. More than a whistle: Automated detection of marine sound sources with a convolutional neural network
CN117310668A (en) Underwater sound target identification method integrating attention mechanism and depth residual error shrinkage network
CN112966815A (en) Target detection method, system and equipment based on impulse neural network
CN116884435A (en) Voice event detection method and device based on audio prompt learning
Islam et al. Convolutional neural network based marine cetaceans detection around the Swatch of No Ground in the Bay of Bengal
CN113284150B (en) Industrial quality inspection method and industrial quality inspection device based on unpaired industrial data
CN115147727A (en) Method and system for extracting impervious surface of remote sensing image
CN114065822A (en) Electromagnetic identification method and system for ocean tide fluctuation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant