CN112183582A - Multi-feature fusion underwater target identification method - Google Patents

Multi-feature fusion underwater target identification method

Info

Publication number
CN112183582A
Authority
CN
China
Prior art keywords
feature
sub
fusion
classifier
gfcc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010930201.7A
Other languages
Chinese (zh)
Inventor
殷波 (Yin Bo)
魏志强 (Wei Zhiqiang)
贾东宁 (Jia Dongning)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202010930201.7A priority Critical patent/CN112183582A/en
Publication of CN112183582A publication Critical patent/CN112183582A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/523 Details of pulse systems
    • G01S7/526 Receivers
    • G01S7/527 Extracting wanted echo signals
    • G01S7/5273 Extracting wanted echo signals using digital techniques
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/534 Details of non-pulse systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a multi-feature fusion underwater target identification method, which comprises the following steps: (1) data preprocessing; (2) feature extraction: extracting short-time energy features from the processed data in the time domain, decomposing the processed data in the frequency domain with the empirical mode decomposition (EMD) method to obtain a plurality of intrinsic mode function (IMF) components, and extracting Gammatone frequency cepstral coefficient (GFCC) features from each obtained IMF component; (3) feature fusion: concatenating the two audio signal feature vectors end to end to form a fused feature vector; (4) model building: introducing a weighted voting mechanism to build a CNN-LSTM integrated time sequence network model; (5) target identification: taking the fused feature vector as the input vector of the CNN-LSTM integrated time sequence network and selecting the category corresponding to the maximum value as the final target identification classification result. The disclosed method improves the precision and accuracy of underwater target identification.

Description

Multi-feature fusion underwater target identification method
Technical Field
The invention relates to an underwater target identification method, in particular to an underwater target identification method with multi-feature fusion.
Background
With the development of ocean sonar technology, underwater target recognition has become one of the most important technologies in the field of underwater acoustic detection and is increasingly applied in scientific activities such as marine organism surveys and the detection and identification of mines and submersibles. In recent years, various highly sensitive sonar system schemes have been developed, greatly improving the range and precision of detection and localization. A passive sonar system detects and identifies a target using the target's radiated noise and can remain concealed while doing so; target identification based on the radiated noise of underwater acoustic targets is therefore a key link in passive sonar research. With the development of deep learning, big-data-driven deep learning frameworks have achieved remarkable results in many fields such as speech recognition and text translation, clearly outperforming traditional machine learning methods. Therefore, mining and extracting deep features with deep learning methods and building a more stable, more intelligent and more automated target identification model has become a new direction in the development of underwater target identification technology.
In order to improve the accuracy and efficiency of underwater target detection, the sonar target feature extraction methods adopted in various countries are continuously being updated. The existing approach is to extract a single feature after preprocessing the underwater acoustic signal, for example linear predictive coding or Mel-frequency cepstral coefficients. The classification and recognition models are mainly widely applied classical machine learning methods, such as shallow models based on network classifiers, clustering, K-nearest neighbors, support vector machines and Markov chain models.
Because the underwater environment is complex and variable and radiation source structures differ, the radiated noise of different targets differs markedly; a single feature vector can hardly represent the original signal characteristics accurately and comprehensively, so some important features are lost and identification becomes inaccurate. How to quickly and accurately extract feature parameters that effectively characterize the acoustic signal and thereby realize classification and recognition is the key to the identification precision of underwater acoustic targets. The trend toward low-noise submarines places higher demands on underwater target identification capability. Under this trend, the performance of some classical target identification algorithms can hardly meet the requirements of modern naval warfare, and improving the accuracy of underwater target identification has become an urgent problem.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a multi-feature fusion underwater target identification method. Acoustic signal features are obtained from two different angles, the signal time domain and the signal frequency domain, so that the underwater audio characteristics can be represented comprehensively; a weighted voting method is adopted to integrate a convolutional neural network (CNN) and a long short-term memory (LSTM) network suited to time sequence data modeling, and the extracted feature sequences are input into the network for identification and classification, thereby improving the identification precision of underwater targets.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a multi-feature fusion underwater target identification method comprises the following steps:
(1) data preprocessing: carrying out min-max standardization processing on the acquired underwater acoustic signal data, and mapping the result values to [0, 1];
(2) feature extraction: extracting short-time energy features from the processed data in the time domain, decomposing the processed data in the frequency domain with the empirical mode decomposition (EMD) method to obtain a plurality of intrinsic mode function (IMF) components, and extracting Gammatone frequency cepstral coefficient (GFCC) features from each obtained IMF component;
(3) feature fusion: firstly extracting the short-time energy features and the GFCC features with the same framing, and then concatenating the two audio signal feature vectors end to end to form a fused feature vector;
(4) building a model: introducing a weighted voting mechanism to build a CNN-LSTM integrated time sequence network model, wherein the model comprises a CNN sub-classifier and an LSTM sub-classifier;
(5) target identification: taking the fused feature vector as the input vector of the CNN-LSTM integrated time sequence network model, inputting it into the CNN sub-classifier and the LSTM sub-classifier respectively, assigning a higher weight to the sub-classifier with higher classification accuracy, multiplying the prediction probability of each category from the two sub-classifiers by the corresponding weight to obtain a classification vector, and selecting the category corresponding to the maximum value in the classification vector as the final target identification classification result.
In the above scheme, in step (2), the short-time energy feature is extracted according to the following formula:

E_n = \sum_{m=0}^{N-1} y_n^2(m)    (1)

where E_n is the short-time energy feature of the n-th filtered and windowed frame, N is the length of one frame of the signal, and y_n(m) is the filtered and windowed signal.
In the above scheme, in step (2), the GFCC features are extracted as follows: after preprocessing, the audio signal is divided into n frames, and EMD is performed on each frame to obtain a plurality of IMF components; fast Fourier transform is then performed on each IMF component, the IMF components of each frequency band are added, the result is filtered by a Gammatone filter bank, the output value of each filter is logarithmically compressed to obtain a group of logarithmic energy spectra, and finally discrete cosine transform is performed to obtain the GFCC features.
In the above scheme, the audio signal is first divided into n frames, and each frame of the signal is decomposed into I IMF components by EMD; fast Fourier transform is then performed on each IMF component to obtain the discrete power spectrum s_i(k) of the signal:

s_i(k) = \left| \sum_{t=0}^{d-1} \mathrm{imf}_i(t)\, e^{-j 2\pi k t / d} \right|^2    (2)

where i = 1, 2, …, I, t is time, k is the frequency index, s(t) is the original audio signal, imf_i(t) is the i-th IMF component, and d is the number of discrete Fourier transform sampling points;
frequency synthesis is then carried out to obtain the power spectrum x(t):

x(t) = \sum_{i=1}^{I} s_i(t)    (3)
the time-domain impulse response of the Gammatone filter can be regarded as the product of a Gamma distribution function and a sinusoidal tone, expressed as follows:

g(t) = a\, t^{k-1} e^{-2\pi b t} \cos(2\pi f t + \phi), \quad t \geq 0    (4)

where g(t) is the time-domain impulse response of the Gammatone filter, t is time, f is the center frequency, a controls the filter gain, φ is the phase, k is the order of the filter, and b is the attenuation factor, which determines the bandwidth of the filter and whose value is determined by the center frequency f:

b = 1.019 \times 24.7 \times \left( \frac{4.37 f}{1000} + 1 \right)    (5)
after frequency synthesis, the power spectrum x(t) is squared to obtain an energy spectrum, and the output value of each filter is logarithmically compressed to obtain the logarithmic energy spectrum:

e(j) = \log\!\left( \sum_{t} x^2(t)\, g_j(t) \right)    (6)

where j = 1, 2, …, H and H is the number of Gammatone filters;
finally, discrete cosine transform is performed on the logarithmic energy spectrum e(j) to obtain the GFCC features, calculated as follows:

G_{mn} = \sqrt{\frac{2}{H}} \sum_{j=1}^{H} e(j) \cos\!\left( \frac{\pi m (2j - 1)}{2H} \right)    (7)

where m = 1, 2, …, M, M is the dimension of the GFCC feature parameters, and G_mn is the m-th dimension GFCC feature of the n-th frame of the audio signal.
In the above scheme, in step (3), Min-Max standardization is performed on the feature data before feature fusion:

x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}    (8)

where x_max is the maximum value in the feature data, x_min is the minimum value in the feature data, x is the feature data, and x^{*} is the Min-Max standardized feature data.
In the above scheme, in step (3), the two audio signal feature vectors are concatenated end to end to form the fused feature matrix T:

T = [G_{1n}, G_{2n}, …, G_{Mn}, E_n]    (9)

where G_mn is the m-th dimension GFCC feature of the n-th frame and E_n is the filtered and windowed short-time energy feature of the n-th frame, extracted with the same framing.
In the above scheme, in step (4), the first layer of the CNN sub-classifier is a one-dimensional convolution layer; a residual structure is then constructed, with a one-dimensional convolution layer on the left side and two dilated convolution layers on the right side, and the last layer is a fully connected layer.
In the above scheme, in step (4), the LSTM sub-classifier adopts an LSTM network and finally performs classification through two fully connected layers and a softmax function.
In the above scheme, in step (5), the weight is defined as:

w_l = \frac{p_l}{\sum_{l=1}^{L} p_l}    (10)

where p_l is the classification accuracy of the l-th sub-classifier, L is the total number of sub-classifiers, and L = 2;
the classification result obtained by each sub-classifier is then multiplied by the corresponding weight to obtain the classification vector, as shown in formula (11):

P = \sum_{l=1}^{L} w_l P_l    (11)

where P_l is the prediction probability vector of the l-th sub-classifier;
and finally, the category corresponding to the maximum value in the classification vector P is the final classification result of the integration algorithm.
Through the technical scheme, the underwater target identification method with multi-feature fusion provided by the invention has the following beneficial effects:
1. The invention adopts an improved GFCC method based on empirical mode decomposition, which better suits the non-stationary nature of underwater acoustic signals and, by simulating the auditory perception characteristics of the human ear, effectively improves the accuracy of underwater acoustic target identification.
2. The method extracts features from two different angles, the time domain and the frequency domain, representing the original ocean signal more comprehensively and accurately and overcoming the problem that a single feature extraction method cannot accurately capture the audio characteristics.
3. The method models the data with a CNN-LSTM integrated time sequence network suited to time-series data and uses the fused feature data as the network input vector for classification and recognition, improving the accuracy of underwater target recognition.
4. A weighted voting algorithm is introduced into the integrated network; the complementarity between the sub-classification models reduces the error of any single classifier and improves the accuracy of classification and identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic flow chart of a multi-feature fusion underwater target identification method disclosed by an embodiment of the invention;
FIG. 2 is a flowchart of GFCC feature extraction according to an embodiment of the present invention;
FIG. 3 is a structural diagram of the CNN-LSTM integrated time sequence network disclosed in the embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a multi-feature fusion underwater target identification method, as shown in fig. 1, the specific embodiment is as follows:
(1) Data preprocessing: in order to eliminate the dimensional influence between different dimensions of the extracted features and to avoid classification results being biased by features with excessively large or small values, min-max standardization is performed on the data before feature extraction, and the result values are mapped to [0, 1].
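As a concrete illustration of this step, the following Python sketch (NumPy and the function name are our own choices for illustration, not taken from the patent) maps a one-dimensional hydroacoustic recording onto [0, 1]:

```python
import numpy as np

def min_max_normalize(signal: np.ndarray) -> np.ndarray:
    """Map a 1-D hydroacoustic signal linearly onto [0, 1] (min-max standardization)."""
    s_min, s_max = signal.min(), signal.max()
    if s_max == s_min:                      # guard against a constant signal
        return np.zeros_like(signal, dtype=float)
    return (signal - s_min) / (s_max - s_min)

# Example on a synthetic stand-in for one second of audio sampled at 16 kHz
x = np.random.randn(16000)
x_norm = min_max_normalize(x)
assert x_norm.min() >= 0.0 and x_norm.max() <= 1.0
```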
(2) Feature extraction: short-time energy features are extracted from the processed data in the time domain; the energy of an underwater acoustic signal changes markedly over time, and the short-time energy expresses the audio characteristics well. The short-time energy feature is extracted according to the following formula:

E_n = \sum_{m=0}^{N-1} y_n^2(m)    (1)

where E_n is the short-time energy feature of the n-th filtered and windowed frame, N is the length of one frame of the signal, and y_n(m) is the filtered and windowed signal.
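A minimal sketch of the per-frame short-time energy of formula (1); the frame length, hop size and Hamming window are illustrative assumptions, since the patent only states that the signal is filtered and windowed:

```python
import numpy as np

def short_time_energy(y: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Short-time energy E_n of each frame of the filtered signal y, per formula (1)."""
    window = np.hamming(frame_len)                   # windowing assumed to be a Hamming window
    n_frames = 1 + (len(y) - frame_len) // hop
    energy = np.empty(n_frames)
    for n in range(n_frames):
        frame = y[n * hop : n * hop + frame_len] * window
        energy[n] = np.sum(frame ** 2)               # E_n = sum of squared samples in the frame
    return energy

e = short_time_energy(np.random.randn(16000))        # one energy value per frame
```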
In the frequency domain, as shown in FIG. 2, the audio signal is preprocessed and divided into n frames, EMD decomposition is performed on each frame to obtain a series of intrinsic mode function (IMF) components containing different frequency components from high to low, the frequency components changing as the signal changes; fast Fourier transform is then performed on each IMF component, the IMF components of each frequency band are added, the result is filtered by a Gammatone filter bank, the output value of each filter is logarithmically compressed to obtain a set of logarithmic energy spectra, and finally the GFCC features are obtained by discrete cosine transform.
The method comprises the following specific steps:
firstly, the audio signal is divided into n frames, and each frame of the signal is decomposed into I IMF components by EMD; fast Fourier transform is then performed on each IMF component to obtain the discrete power spectrum s_i(k) of the signal:

s_i(k) = \left| \sum_{t=0}^{d-1} \mathrm{imf}_i(t)\, e^{-j 2\pi k t / d} \right|^2    (2)

where i = 1, 2, …, I, t is time, k is the frequency index, s(t) is the original audio signal, imf_i(t) is the i-th IMF component, and d is the number of discrete Fourier transform sampling points;
frequency synthesis is then carried out to obtain the power spectrum x(t):

x(t) = \sum_{i=1}^{I} s_i(t)    (3)
The Gammatone filter needs only a few parameters to closely simulate the frequency decomposition function and sharp filtering characteristics of the cochlear basilar membrane; its time-domain impulse response is simple and easy to realize physically. The time-domain expression is as follows:

g(t) = a\, t^{k-1} e^{-2\pi b t} \cos(2\pi f t + \phi), \quad t \geq 0    (4)

where g(t) is the time-domain impulse response of the Gammatone filter, t is time, f is the center frequency, a controls the filter gain, φ is the phase, k is the order of the filter, and b is the attenuation factor, which determines the bandwidth of the filter and whose value is determined by the center frequency f:

b = 1.019 \times 24.7 \times \left( \frac{4.37 f}{1000} + 1 \right)    (5)
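The impulse response of formulas (4)-(5) can be sampled directly; the sketch below assumes the usual 4th-order filter and the 1.019 ERB scaling from the standard Gammatone literature, details the patent does not state explicitly:

```python
import numpy as np

def gammatone_ir(f_c: float, fs: float, duration: float = 0.064,
                 order: int = 4, gain: float = 1.0, phase: float = 0.0) -> np.ndarray:
    """Sampled time-domain impulse response g(t) of one Gammatone filter, per formula (4)."""
    t = np.arange(int(duration * fs)) / fs
    # formula (5): attenuation factor b from the centre frequency (ERB rule, assumed 1.019 scaling)
    b = 1.019 * 24.7 * (4.37 * f_c / 1000.0 + 1.0)
    return gain * t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f_c * t + phase)

ir = gammatone_ir(f_c=1000.0, fs=16000.0)             # one channel of the filter bank
```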
Non-linearity is a very important property of the human auditory system, largely accounting for its robustness to interference; logarithmic compression is used to model this non-linear behavior.
After frequency synthesis, the power spectrum x(t) is squared to obtain an energy spectrum, and the output value of each filter is logarithmically compressed to obtain the logarithmic energy spectrum:

e(j) = \log\!\left( \sum_{t} x^2(t)\, g_j(t) \right)    (6)

where j = 1, 2, …, H and H is the number of Gammatone filters;
finally, discrete cosine transform is performed on the logarithmic energy spectrum e(j) to obtain the GFCC features, calculated as follows:

G_{mn} = \sqrt{\frac{2}{H}} \sum_{j=1}^{H} e(j) \cos\!\left( \frac{\pi m (2j - 1)}{2H} \right)    (7)

where m = 1, 2, …, M and M is the dimension of the GFCC feature parameters; since the signal was divided into n frames before EMD decomposition, G_mn denotes the m-th dimension GFCC feature of the n-th frame of the audio signal.
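Putting formulas (2)-(7) together, a per-frame GFCC computation might look like the sketch below. The IMF matrix is assumed to come from any EMD implementation, and the Gammatone filterbank is assumed to be given as magnitude responses on the FFT grid (for instance built from gammatone_ir above); these assumptions, the helper names and the 13-coefficient truncation are ours, not the patent's:

```python
import numpy as np
from scipy.fftpack import dct

def gfcc_from_imfs(imfs: np.ndarray, filterbank: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """GFCC vector of one frame from its IMF components.

    imfs:       array of shape (I, frame_len), the EMD components of the frame.
    filterbank: array of shape (H, frame_len // 2 + 1), Gammatone magnitude responses
                sampled on the same rFFT grid as the frame.
    """
    # formula (2): discrete power spectrum of each IMF component
    spectra = np.abs(np.fft.rfft(imfs, axis=1)) ** 2
    # formula (3): frequency synthesis -- sum the per-IMF spectra into one spectrum
    power = spectra.sum(axis=0)
    # Gammatone filtering and formula (6): log-compressed filter outputs
    log_energy = np.log(filterbank @ power + 1e-12)   # small offset avoids log(0)
    # formula (7): discrete cosine transform yields the GFCC coefficients G_mn
    return dct(log_energy, type=2, norm='ortho')[:n_coeffs]
```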
(3) Feature fusion: the short-time energy features and the GFCC features are extracted with the same framing, and the two audio signal feature vectors are then concatenated end to end to form a fused feature vector.
In order to ensure that features in different dimensions are numerically comparable and to avoid features with excessively large or small values biasing the classification result, Min-Max standardization is performed on the feature data before feature fusion:
x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}    (8)

where x_max is the maximum value in the feature data, x_min is the minimum value in the feature data, x is the feature data, and x^{*} is the Min-Max standardized feature data.
Then, combining the two audio signal feature vectors in an end-to-end connection mode to form a fusion matrix, wherein the T formula of the fusion feature matrix is as follows:
T=[G1n,G2n,…,Gmn,Gn] (9)
wherein G ismnFor the nth frame, the m-th GFCC characteristic, EnThe windowed short-time energy features are filtered for the nth frame extracted in the same frame-wise manner.
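A sketch of the fusion step of formulas (8)-(9): both feature sets are Min-Max normalized per dimension and then concatenated frame by frame (array shapes and the helper name are illustrative assumptions):

```python
import numpy as np

def fuse_features(gfcc: np.ndarray, energy: np.ndarray) -> np.ndarray:
    """Fused feature matrix T = [G_1n, ..., G_Mn, E_n] per frame (formula (9)).

    gfcc:   (n_frames, M) GFCC features.
    energy: (n_frames,) short-time energy extracted with the same framing.
    """
    def min_max(a: np.ndarray) -> np.ndarray:          # formula (8), per feature dimension
        a_min, a_max = a.min(axis=0), a.max(axis=0)
        span = np.where(a_max > a_min, a_max - a_min, 1.0)
        return (a - a_min) / span

    return np.hstack([min_max(gfcc), min_max(energy)[:, None]])   # shape (n_frames, M + 1)
```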
(4) Building a model: a weighted voting mechanism is introduced to build a CNN-LSTM integrated time sequence network model; as shown in FIG. 3, the model comprises a CNN sub-classifier and an LSTM sub-classifier.
In the CNN sub-classifier, the CNN is adapted to time sequence data by combining one-dimensional full convolutions and dilated convolutions. The first layer is a one-dimensional convolution layer; a residual structure is then constructed, with a one-dimensional convolution layer on the left side and two dilated convolution layers on the right side, and the last layer is a fully connected layer. The dilation factor of each dilated convolution layer in the residual structure grows exponentially, which ensures that the convolution kernels cover all inputs within the effective history and that a deep network can produce an extremely long effective history.
The LSTM sub-classifier adopts the LSTM network, which is well suited to sequence data modeling, and finally performs classification through two fully connected layers and a softmax function.
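The patent gives only the layout of the two sub-classifiers, not layer sizes or a framework; the PyTorch sketch below therefore fixes kernel sizes, channel widths and dilation factors arbitrarily and leaves the softmax to the voting stage, so both branches return class logits:

```python
import torch
import torch.nn as nn

class CNNSubClassifier(nn.Module):
    """1-D CNN: entry convolution, a residual block (1-D conv shortcut on the left, two dilated
    convolutions with exponentially growing dilation on the right), then a fully connected layer."""
    def __init__(self, in_channels: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.entry = nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1)
        self.shortcut = nn.Conv1d(hidden, hidden, kernel_size=1)
        self.dilated = nn.Sequential(
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, features, frames)
        h = torch.relu(self.entry(x))
        h = torch.relu(self.shortcut(h) + self.dilated(h))   # residual merge
        return self.head(h.mean(dim=-1))           # pool over time, then class logits

class LSTMSubClassifier(nn.Module):
    """LSTM followed by two fully connected layers; softmax is applied at voting time."""
    def __init__(self, in_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(in_features, hidden, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                nn.Linear(hidden, n_classes))

    def forward(self, x):                          # x: (batch, frames, features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])              # logits from the last time step
```

Note that the fused matrix T would be fed to the CNN branch as (batch, features, frames) and to the LSTM branch as (batch, frames, features); that transposition is part of this sketch, not of the patent text.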
(5) Target identification: the fused feature vector is taken as the input vector of the CNN-LSTM integrated time sequence network model and input into the CNN sub-classifier and the LSTM sub-classifier respectively; a higher weight is assigned to the sub-classifier with higher classification accuracy, the prediction probability of each category from the two sub-classifiers is multiplied by the corresponding weight to obtain a classification vector, and the category corresponding to the maximum value in the classification vector is selected as the final target identification classification result.
The weight is defined as:
w_l = \frac{p_l}{\sum_{l=1}^{L} p_l}    (10)

where p_l is the classification accuracy of the l-th sub-classifier, L is the total number of sub-classifiers, and L = 2;
the classification result obtained by each sub-classifier is then multiplied by the corresponding weight to obtain the classification vector, as shown in formula (11):

P = \sum_{l=1}^{L} w_l P_l    (11)

where P_l is the prediction probability vector of the l-th sub-classifier;
and finally, the category corresponding to the maximum value in the classification vector P is the final classification result of the integration algorithm.
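A sketch of the weighted voting of formulas (10)-(11), assuming the two accuracies p_l were measured on a validation set (function and variable names are illustrative):

```python
import numpy as np

def weighted_vote(prob_cnn: np.ndarray, prob_lstm: np.ndarray,
                  acc_cnn: float, acc_lstm: float) -> int:
    """Fuse the two sub-classifiers' class probabilities and return the predicted class."""
    w_cnn = acc_cnn / (acc_cnn + acc_lstm)          # formula (10): accuracy-proportional weights
    w_lstm = acc_lstm / (acc_cnn + acc_lstm)
    fused = w_cnn * prob_cnn + w_lstm * prob_lstm   # formula (11): weighted classification vector P
    return int(np.argmax(fused))                    # class with the largest fused probability

# Example with illustrative numbers
pred = weighted_vote(np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.7, 0.1]),
                     acc_cnn=0.88, acc_lstm=0.92)
```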
The weighted voting algorithm strengthens the influence of the more accurate sub-classifiers on the final result by assigning them higher weights. The complementarity between the individual classification models reduces the error of any single classifier and can further improve the classification precision.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A multi-feature fusion underwater target identification method is characterized by comprising the following steps:
(1) data preprocessing: carrying out min-max standardization processing on the acquired underwater acoustic signal data, and mapping the result values to [0, 1];
(2) feature extraction: extracting short-time energy features from the processed data in the time domain, decomposing the processed data in the frequency domain with the empirical mode decomposition (EMD) method to obtain a plurality of intrinsic mode function (IMF) components, and extracting Gammatone frequency cepstral coefficient (GFCC) features from each obtained IMF component;
(3) feature fusion: firstly extracting the short-time energy features and the GFCC features with the same framing, and then concatenating the two audio signal feature vectors end to end to form a fused feature vector;
(4) building a model: introducing a weighted voting mechanism to build a CNN-LSTM integrated time sequence network model, wherein the model comprises a CNN sub-classifier and an LSTM sub-classifier;
(5) target identification: taking the fused feature vector as the input vector of the CNN-LSTM integrated time sequence network model, inputting it into the CNN sub-classifier and the LSTM sub-classifier respectively, assigning a higher weight to the sub-classifier with higher classification accuracy, multiplying the prediction probability of each category from the two sub-classifiers by the corresponding weight to obtain a classification vector, and selecting the category corresponding to the maximum value in the classification vector as the final target identification classification result.
2. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (2), the short-time energy feature is extracted according to the following formula:

E_n = \sum_{m=0}^{N-1} y_n^2(m)    (1)

where E_n is the short-time energy feature of the n-th filtered and windowed frame, N is the length of one frame of the signal, and y_n(m) is the filtered and windowed signal.
3. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (2), the GFCC features are extracted as follows: after preprocessing, the audio signal is divided into n frames, and EMD is performed on each frame to obtain a plurality of IMF components; fast Fourier transform is then performed on each IMF component, the IMF components of each frequency band are added, the result is filtered by a Gammatone filter bank, the output value of each filter is logarithmically compressed to obtain a group of logarithmic energy spectra, and finally discrete cosine transform is performed to obtain the GFCC features.
4. The method for identifying the underwater target with multi-feature fusion as claimed in claim 3, characterized in that the audio signal is first divided into n frames, and each frame of the signal is decomposed into I IMF components by EMD; fast Fourier transform is then performed on each IMF component to obtain the discrete power spectrum s_i(k) of the signal:

s_i(k) = \left| \sum_{t=0}^{d-1} \mathrm{imf}_i(t)\, e^{-j 2\pi k t / d} \right|^2    (2)

where i = 1, 2, …, I, t is time, k is the frequency index, s(t) is the original audio signal, imf_i(t) is the i-th IMF component, and d is the number of discrete Fourier transform sampling points;
frequency synthesis is then carried out to obtain the power spectrum x(t):

x(t) = \sum_{i=1}^{I} s_i(t)    (3)
the time-domain impulse response of the Gammatone filter can be regarded as the product of a Gamma distribution function and a sinusoidal tone, expressed as follows:

g(t) = a\, t^{k-1} e^{-2\pi b t} \cos(2\pi f t + \phi), \quad t \geq 0    (4)

where g(t) is the time-domain impulse response of the Gammatone filter, t is time, f is the center frequency, a controls the filter gain, φ is the phase, k is the order of the filter, and b is the attenuation factor, which determines the bandwidth of the filter and whose value is determined by the center frequency f:

b = 1.019 \times 24.7 \times \left( \frac{4.37 f}{1000} + 1 \right)    (5)
after frequency synthesis, the power spectrum x(t) is squared to obtain an energy spectrum, and the output value of each filter is logarithmically compressed to obtain the logarithmic energy spectrum:

e(j) = \log\!\left( \sum_{t} x^2(t)\, g_j(t) \right)    (6)

where j = 1, 2, …, H and H is the number of Gammatone filters;
finally, discrete cosine transform is performed on the logarithmic energy spectrum e(j) to obtain the GFCC features, calculated as follows:

G_{mn} = \sqrt{\frac{2}{H}} \sum_{j=1}^{H} e(j) \cos\!\left( \frac{\pi m (2j - 1)}{2H} \right)    (7)

where m = 1, 2, …, M, M is the dimension of the GFCC feature parameters, and G_mn is the m-th dimension GFCC feature of the n-th frame of the audio signal.
5. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (3), Min-Max standardization is performed on the feature data before feature fusion:

x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}    (8)

where x_max is the maximum value in the feature data, x_min is the minimum value in the feature data, x is the feature data, and x^{*} is the Min-Max standardized feature data.
6. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (3), the two audio signal feature vectors are concatenated end to end to form the fused feature matrix T:

T = [G_{1n}, G_{2n}, …, G_{Mn}, E_n]    (9)

where G_mn is the m-th dimension GFCC feature of the n-th frame and E_n is the filtered and windowed short-time energy feature of the n-th frame, extracted with the same framing.
7. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (4), the first layer of the CNN sub-classifier is a one-dimensional convolution layer; a residual structure is then constructed, with a one-dimensional convolution layer on the left side and two dilated convolution layers on the right side, and the last layer is a fully connected layer.
8. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (4), the LSTM sub-classifier adopts an LSTM network and finally performs classification through two fully connected layers and a softmax function.
9. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (5), the weight is defined as:

w_l = \frac{p_l}{\sum_{l=1}^{L} p_l}    (10)

where p_l is the classification accuracy of the l-th sub-classifier, L is the total number of sub-classifiers, and L = 2;
the classification result obtained by each sub-classifier is then multiplied by the corresponding weight to obtain the classification vector, as shown in formula (11):

P = \sum_{l=1}^{L} w_l P_l    (11)

where P_l is the prediction probability vector of the l-th sub-classifier;
and finally, the category corresponding to the maximum value in the classification vector P is the final classification result of the integration algorithm.
CN202010930201.7A 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method Pending CN112183582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010930201.7A CN112183582A (en) 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010930201.7A CN112183582A (en) 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method

Publications (1)

Publication Number Publication Date
CN112183582A true CN112183582A (en) 2021-01-05

Family

ID=73925637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010930201.7A Pending CN112183582A (en) 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method

Country Status (1)

Country Link
CN (1) CN112183582A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925822A (en) * 2021-02-08 2021-06-08 山东大学 Time series classification method, system, medium and device based on multi-representation learning
CN114220458A (en) * 2021-11-16 2022-03-22 武汉普惠海洋光电技术有限公司 Sound identification method and device based on array hydrophone
CN114863951A (en) * 2022-07-11 2022-08-05 中国科学院合肥物质科学研究院 Rapid dysarthria detection method based on modal decomposition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (en) * 2012-08-17 2012-12-12 南京大学 Voiceprint identification method
CN108416364A (en) * 2018-01-31 2018-08-17 重庆大学 Integrated study data classification method is merged in subpackage
CN110200626A (en) * 2019-06-14 2019-09-06 重庆大学 A kind of vision induction motion sickness detection method based on ballot classifier
CN110599336A (en) * 2018-06-13 2019-12-20 北京九章云极科技有限公司 Financial product purchase prediction method and system
CN111414754A (en) * 2020-03-19 2020-07-14 中国建设银行股份有限公司 Emotion analysis method and device of event, server and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (en) * 2012-08-17 2012-12-12 南京大学 Voiceprint identification method
CN108416364A (en) * 2018-01-31 2018-08-17 重庆大学 Integrated study data classification method is merged in subpackage
CN110599336A (en) * 2018-06-13 2019-12-20 北京九章云极科技有限公司 Financial product purchase prediction method and system
CN110200626A (en) * 2019-06-14 2019-09-06 重庆大学 A kind of vision induction motion sickness detection method based on ballot classifier
CN111414754A (en) * 2020-03-19 2020-07-14 中国建设银行股份有限公司 Emotion analysis method and device of event, server and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XINGMEI WANG et al.: "Underwater Acoustic Target Recognition: A Combination of Multi-Dimensional Fusion Features and Modified Deep Neural Network", MDPI *
ZENG Sai et al. (曾赛等): "水下目标多模态深度学习分类识别研究" [Research on multi-modal deep learning classification and recognition of underwater targets], Applied Acoustics (应用声学) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925822A (en) * 2021-02-08 2021-06-08 山东大学 Time series classification method, system, medium and device based on multi-representation learning
CN114220458A (en) * 2021-11-16 2022-03-22 武汉普惠海洋光电技术有限公司 Sound identification method and device based on array hydrophone
CN114220458B (en) * 2021-11-16 2024-04-05 武汉普惠海洋光电技术有限公司 Voice recognition method and device based on array hydrophone
CN114863951A (en) * 2022-07-11 2022-08-05 中国科学院合肥物质科学研究院 Rapid dysarthria detection method based on modal decomposition
CN114863951B (en) * 2022-07-11 2022-09-23 中国科学院合肥物质科学研究院 Rapid dysarthria detection method based on modal decomposition

Similar Documents

Publication Publication Date Title
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
Hu et al. Deep learning methods for underwater target feature extraction and recognition
CN109410917B (en) Voice data classification method based on improved capsule network
CN110751044B (en) Urban noise identification method based on deep network migration characteristics and augmented self-coding
CN112257521B (en) CNN underwater acoustic signal target identification method based on data enhancement and time-frequency separation
CN112183582A (en) Multi-feature fusion underwater target identification method
CN111161715B (en) Specific sound event retrieval and positioning method based on sequence classification
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN113205820B (en) Method for generating voice coder for voice event detection
CN112183107A (en) Audio processing method and device
CN111899757A (en) Single-channel voice separation method and system for target speaker extraction
CN105448302A (en) Environment adaptive type voice reverberation elimination method and system
CN111341319A (en) Audio scene recognition method and system based on local texture features
CN113646833A (en) Voice confrontation sample detection method, device, equipment and computer readable storage medium
CN113191178A (en) Underwater sound target identification method based on auditory perception feature deep learning
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN110580915B (en) Sound source target identification system based on wearable equipment
CN110444225B (en) Sound source target identification method based on feature fusion network
CN112329819A (en) Underwater target identification method based on multi-network fusion
Espi et al. Spectrogram patch based acoustic event detection and classification in speech overlapping conditions
Vani et al. Improving speech recognition using bionic wavelet features
CN116310770A (en) Underwater sound target identification method and system based on mel cepstrum and attention residual error network
Sunny et al. Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam
Song et al. Research on Scattering Transform of Urban Sound Events Detection Based on Self-Attention Mechanism
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210105)