CN112183582A - Multi-feature fusion underwater target identification method - Google Patents

Multi-feature fusion underwater target identification method

Info

Publication number
CN112183582A
Authority
CN
China
Prior art keywords
feature
sub
fusion
classifier
gfcc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010930201.7A
Other languages
Chinese (zh)
Inventor
殷波 (Yin Bo)
魏志强 (Wei Zhiqiang)
贾东宁 (Jia Dongning)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202010930201.7A priority Critical patent/CN112183582A/en
Publication of CN112183582A publication Critical patent/CN112183582A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/523 Details of pulse systems
    • G01S7/526 Receivers
    • G01S7/527 Extracting wanted echo signals
    • G01S7/5273 Extracting wanted echo signals using digital techniques
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/534 Details of non-pulse systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a multi-feature fusion underwater target identification method, which comprises the following steps: (1) data preprocessing; (2) feature extraction: extracting short-time energy features from the processed data in the time domain, decomposing the processed data in the frequency domain with the empirical mode decomposition (EMD) method to obtain a plurality of intrinsic mode function (IMF) components, and extracting Gammatone frequency cepstral coefficient (GFCC) features from each obtained IMF component; (3) feature fusion: concatenating the two audio signal feature vectors end to end to form a fused feature vector; (4) model building: introducing a weighted voting mechanism to build a CNN-LSTM integrated time sequence network model; (5) target identification: taking the fused feature vector as the input vector of the CNN-LSTM integrated time sequence network and selecting the category corresponding to the maximum value as the final target identification classification result. The disclosed method improves the precision and accuracy of underwater target identification.

Description

Multi-feature fusion underwater target identification method
Technical Field
The invention relates to an underwater target identification method, in particular to an underwater target identification method with multi-feature fusion.
Background
With the development of ocean sonar technology, underwater target recognition has become one of the most important technologies in the field of underwater acoustic detection and is increasingly applied in scientific activities such as marine organism surveys and the detection and identification of mines and submersibles. In recent years, various highly sensitive sonar system schemes have been developed, greatly improving the range and precision of detection and localization. A passive sonar system detects and identifies a target using the target's radiated noise and can remain concealed while doing so; target identification based on the radiated noise of underwater acoustic targets is therefore a key link in passive sonar research. With the development of deep learning, big-data-driven deep learning frameworks have achieved remarkable results in many fields such as speech recognition and text translation, clearly outperforming traditional machine learning methods. Therefore, mining and extracting deep features with deep learning methods and building a more stable, more intelligent and more automated target identification model has become a new direction in the development of underwater target identification technology.
In order to improve the accuracy and efficiency of underwater target detection, the sonar target feature extraction methods adopted in various countries are continuously being updated. The existing approach is to extract a single feature after preprocessing the underwater acoustic signal, for example linear predictive coding or Mel-frequency cepstral coefficients. The classification and recognition models are mainly widely applied classical machine learning methods, such as shallow models based on network classifiers, clustering, K-nearest neighbors, support vector machines and Markov chain models.
Because the underwater environment is complex and variable and radiation source structures differ, the radiated noise of different targets differs markedly; a single feature vector can hardly represent the original signal characteristics accurately and comprehensively, so some important features are lost and identification becomes inaccurate. How to quickly and accurately extract feature parameters that effectively characterize the acoustic signal and thereby realize classification and recognition is the key to the identification precision of underwater acoustic targets. The trend toward low-noise submarines places higher demands on underwater target identification capability. Under this trend, the performance of some classical target identification algorithms can hardly meet the requirements of modern naval warfare, and improving the accuracy of underwater target identification has become an urgent problem.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a multi-feature fusion underwater target identification method. Acoustic signal features are obtained from two different angles, the signal time domain and the signal frequency domain, so that the underwater audio characteristics can be represented comprehensively; a weighted voting method is adopted to integrate a convolutional neural network (CNN) and a long short-term memory (LSTM) network suited to time sequence data modeling, and the extracted feature sequences are input into the network for identification and classification, thereby improving the identification precision of underwater targets.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a multi-feature fusion underwater target identification method comprises the following steps:
(1) data preprocessing: carrying out min-max standardization processing on the acquired underwater acoustic signal data, and mapping the result values to [0, 1];
(2) feature extraction: extracting short-time energy features from the processed data in the time domain, decomposing the processed data in the frequency domain with the empirical mode decomposition (EMD) method to obtain a plurality of intrinsic mode function (IMF) components, and extracting Gammatone frequency cepstral coefficient (GFCC) features from each obtained IMF component;
(3) feature fusion: firstly extracting the short-time energy features and the GFCC features with the same framing, and then concatenating the two audio signal feature vectors end to end to form a fused feature vector;
(4) building a model: introducing a weighted voting mechanism to build a CNN-LSTM integrated time sequence network model, wherein the model comprises a CNN sub-classifier and an LSTM sub-classifier;
(5) target identification: taking the fused feature vector as the input vector of the CNN-LSTM integrated time sequence network model, inputting it into the CNN sub-classifier and the LSTM sub-classifier respectively, assigning a higher weight to the sub-classifier with higher classification accuracy, multiplying the prediction probability of each category from the two sub-classifiers by the corresponding weight to obtain a classification vector, and selecting the category corresponding to the maximum value in the classification vector as the final target identification classification result.
In the above scheme, in step (2), the short-time energy feature is extracted according to the following formula:

E_n = \sum_{m=0}^{N-1} y_n^2(m)    (1)

where E_n is the short-time energy feature of the n-th filtered and windowed frame, N is the length of one frame of the signal, and y_n(m) is the filtered and windowed signal.
In the above scheme, in step (2), the GFCC features are extracted as follows: after preprocessing, the audio signal is divided into n frames, and EMD is performed on each frame to obtain a plurality of IMF components; fast Fourier transform is then performed on each IMF component, the IMF components of each frequency band are added, the result is filtered by a Gammatone filter bank, the output value of each filter is logarithmically compressed to obtain a group of logarithmic energy spectra, and finally discrete cosine transform is performed to obtain the GFCC features.
In the above scheme, the audio signal is first divided into n frames, and each frame of the signal is decomposed into I IMF components by EMD; fast Fourier transform is then performed on each IMF component to obtain the discrete power spectrum s_i(k) of the signal:

s_i(k) = \left| \sum_{t=0}^{d-1} \mathrm{imf}_i(t)\, e^{-j 2\pi k t / d} \right|^2    (2)

where i = 1, 2, …, I, t is time, k is the frequency index, s(t) is the original audio signal, imf_i(t) is the i-th IMF component, and d is the number of discrete Fourier transform sampling points;
frequency synthesis is then carried out to obtain the power spectrum x(t):

x(t) = \sum_{i=1}^{I} s_i(t)    (3)
the time-domain impulse response of the Gammatone filter can be regarded as the product of a Gamma distribution function and a sinusoidal tone, expressed as follows:

g(t) = a\, t^{k-1} e^{-2\pi b t} \cos(2\pi f t + \phi), \quad t \geq 0    (4)

where g(t) is the time-domain impulse response of the Gammatone filter, t is time, f is the center frequency, a controls the filter gain, φ is the phase, k is the order of the filter, and b is the attenuation factor, which determines the bandwidth of the filter and whose value is determined by the center frequency f:

b = 1.019 \times 24.7 \times \left( \frac{4.37 f}{1000} + 1 \right)    (5)
after frequency synthesis, the power spectrum x(t) is squared to obtain an energy spectrum, and the output value of each filter is logarithmically compressed to obtain the logarithmic energy spectrum:

e(j) = \log\!\left( \sum_{t} x^2(t)\, g_j(t) \right)    (6)

where j = 1, 2, …, H and H is the number of Gammatone filters;
finally, discrete cosine transform is performed on the logarithmic energy spectrum e(j) to obtain the GFCC features, calculated as follows:

G_{mn} = \sqrt{\frac{2}{H}} \sum_{j=1}^{H} e(j) \cos\!\left( \frac{\pi m (2j - 1)}{2H} \right)    (7)

where m = 1, 2, …, M, M is the dimension of the GFCC feature parameters, and G_mn is the m-th dimension GFCC feature of the n-th frame of the audio signal.
In the above scheme, in step (3), Min-Max standardization is performed on the feature data before feature fusion:

x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}    (8)

where x_max is the maximum value in the feature data, x_min is the minimum value in the feature data, x is the feature data, and x^{*} is the Min-Max standardized feature data.
In the above scheme, in step (3), the two audio signal feature vectors are concatenated end to end to form the fused feature matrix T:

T = [G_{1n}, G_{2n}, …, G_{Mn}, E_n]    (9)

where G_mn is the m-th dimension GFCC feature of the n-th frame and E_n is the filtered and windowed short-time energy feature of the n-th frame, extracted with the same framing.
In the above scheme, in step (4), the first layer of the CNN sub-classifier is a one-dimensional convolution layer; a residual structure is then constructed, with a one-dimensional convolution layer on the left side and two dilated convolution layers on the right side, and the last layer is a fully connected layer.
In the above scheme, in step (4), the LSTM sub-classifier adopts an LSTM network and finally performs classification through two fully connected layers and a softmax function.
In the above scheme, in step (5), the weight is defined as:

w_l = \frac{p_l}{\sum_{l=1}^{L} p_l}    (10)

where p_l is the classification accuracy of the l-th sub-classifier, L is the total number of sub-classifiers, and L = 2;
the classification result obtained by each sub-classifier is then multiplied by the corresponding weight to obtain the classification vector, as shown in formula (11):

P = \sum_{l=1}^{L} w_l P_l    (11)

where P_l is the prediction probability vector of the l-th sub-classifier;
and finally, the category corresponding to the maximum value in the classification vector P is the final classification result of the integration algorithm.
Through the technical scheme, the underwater target identification method with multi-feature fusion provided by the invention has the following beneficial effects:
1. The invention adopts an improved GFCC method based on empirical mode decomposition, which better suits the non-stationary nature of underwater acoustic signals and, by simulating the auditory perception characteristics of the human ear, effectively improves the accuracy of underwater acoustic target identification.
2. The method extracts features from two different angles, the time domain and the frequency domain, representing the original ocean signal more comprehensively and accurately and overcoming the problem that a single feature extraction method cannot accurately capture the audio characteristics.
3. The method models the data with a CNN-LSTM integrated time sequence network suited to time-series data and uses the fused feature data as the network input vector for classification and recognition, improving the accuracy of underwater target recognition.
4. A weighted voting algorithm is introduced into the integrated network; the complementarity between the sub-classification models reduces the error of any single classifier and improves the accuracy of classification and identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic flow chart of a multi-feature fusion underwater target identification method disclosed by an embodiment of the invention;
FIG. 2 is a flowchart of GFCC feature extraction according to an embodiment of the present invention;
FIG. 3 is a structural diagram of the CNN-LSTM integrated time sequence network disclosed in the embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a multi-feature fusion underwater target identification method, as shown in fig. 1, the specific embodiment is as follows:
(1) Data preprocessing: in order to eliminate the dimensional influence between different dimensions of the extracted features and to avoid classification results being biased by features with excessively large or small values, min-max standardization is performed on the data before feature extraction, and the result values are mapped to [0, 1].
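As a concrete illustration of this step, the following Python sketch (NumPy and the function name are our own choices for illustration, not taken from the patent) maps a one-dimensional hydroacoustic recording onto [0, 1]:

```python
import numpy as np

def min_max_normalize(signal: np.ndarray) -> np.ndarray:
    """Map a 1-D hydroacoustic signal linearly onto [0, 1] (min-max standardization)."""
    s_min, s_max = signal.min(), signal.max()
    if s_max == s_min:                      # guard against a constant signal
        return np.zeros_like(signal, dtype=float)
    return (signal - s_min) / (s_max - s_min)

# Example on a synthetic stand-in for one second of audio sampled at 16 kHz
x = np.random.randn(16000)
x_norm = min_max_normalize(x)
assert x_norm.min() >= 0.0 and x_norm.max() <= 1.0
```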
(2) Feature extraction: short-time energy features are extracted from the processed data in the time domain; the energy of an underwater acoustic signal changes markedly over time, and the short-time energy expresses the audio characteristics well. The short-time energy feature is extracted according to the following formula:

E_n = \sum_{m=0}^{N-1} y_n^2(m)    (1)

where E_n is the short-time energy feature of the n-th filtered and windowed frame, N is the length of one frame of the signal, and y_n(m) is the filtered and windowed signal.
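A minimal sketch of the per-frame short-time energy of formula (1); the frame length, hop size and Hamming window are illustrative assumptions, since the patent only states that the signal is filtered and windowed:

```python
import numpy as np

def short_time_energy(y: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Short-time energy E_n of each frame of the filtered signal y, per formula (1)."""
    window = np.hamming(frame_len)                   # windowing assumed to be a Hamming window
    n_frames = 1 + (len(y) - frame_len) // hop
    energy = np.empty(n_frames)
    for n in range(n_frames):
        frame = y[n * hop : n * hop + frame_len] * window
        energy[n] = np.sum(frame ** 2)               # E_n = sum of squared samples in the frame
    return energy

e = short_time_energy(np.random.randn(16000))        # one energy value per frame
```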
In the frequency domain, as shown in FIG. 2, the audio signal is preprocessed and divided into n frames, EMD decomposition is performed on each frame to obtain a series of intrinsic mode function (IMF) components containing different frequency components from high to low, the frequency components changing as the signal changes; fast Fourier transform is then performed on each IMF component, the IMF components of each frequency band are added, the result is filtered by a Gammatone filter bank, the output value of each filter is logarithmically compressed to obtain a set of logarithmic energy spectra, and finally the GFCC features are obtained by discrete cosine transform.
The method comprises the following specific steps:
firstly, the audio signal is divided into n frames, and each frame of the signal is decomposed into I IMF components by EMD; fast Fourier transform is then performed on each IMF component to obtain the discrete power spectrum s_i(k) of the signal:

s_i(k) = \left| \sum_{t=0}^{d-1} \mathrm{imf}_i(t)\, e^{-j 2\pi k t / d} \right|^2    (2)

where i = 1, 2, …, I, t is time, k is the frequency index, s(t) is the original audio signal, imf_i(t) is the i-th IMF component, and d is the number of discrete Fourier transform sampling points;
frequency synthesis is then carried out to obtain the power spectrum x(t):

x(t) = \sum_{i=1}^{I} s_i(t)    (3)
The Gammatone filter needs only a few parameters to closely simulate the frequency decomposition function and sharp filtering characteristics of the cochlear basilar membrane; its time-domain impulse response is simple and easy to realize physically. The time-domain expression is as follows:

g(t) = a\, t^{k-1} e^{-2\pi b t} \cos(2\pi f t + \phi), \quad t \geq 0    (4)

where g(t) is the time-domain impulse response of the Gammatone filter, t is time, f is the center frequency, a controls the filter gain, φ is the phase, k is the order of the filter, and b is the attenuation factor, which determines the bandwidth of the filter and whose value is determined by the center frequency f:

b = 1.019 \times 24.7 \times \left( \frac{4.37 f}{1000} + 1 \right)    (5)
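The impulse response of formulas (4)-(5) can be sampled directly; the sketch below assumes the usual 4th-order filter and the 1.019 ERB scaling from the standard Gammatone literature, details the patent does not state explicitly:

```python
import numpy as np

def gammatone_ir(f_c: float, fs: float, duration: float = 0.064,
                 order: int = 4, gain: float = 1.0, phase: float = 0.0) -> np.ndarray:
    """Sampled time-domain impulse response g(t) of one Gammatone filter, per formula (4)."""
    t = np.arange(int(duration * fs)) / fs
    # formula (5): attenuation factor b from the centre frequency (ERB rule, assumed 1.019 scaling)
    b = 1.019 * 24.7 * (4.37 * f_c / 1000.0 + 1.0)
    return gain * t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f_c * t + phase)

ir = gammatone_ir(f_c=1000.0, fs=16000.0)             # one channel of the filter bank
```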
Non-linearity is a very important property of the human auditory system, largely accounting for its robustness to interference; logarithmic compression is used to model this non-linear behavior.
After frequency synthesis, the power spectrum x(t) is squared to obtain an energy spectrum, and the output value of each filter is logarithmically compressed to obtain the logarithmic energy spectrum:

e(j) = \log\!\left( \sum_{t} x^2(t)\, g_j(t) \right)    (6)

where j = 1, 2, …, H and H is the number of Gammatone filters;
finally, discrete cosine transform is performed on the logarithmic energy spectrum e(j) to obtain the GFCC features, calculated as follows:

G_{mn} = \sqrt{\frac{2}{H}} \sum_{j=1}^{H} e(j) \cos\!\left( \frac{\pi m (2j - 1)}{2H} \right)    (7)

where m = 1, 2, …, M and M is the dimension of the GFCC feature parameters; since the signal was divided into n frames before EMD decomposition, G_mn denotes the m-th dimension GFCC feature of the n-th frame of the audio signal.
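Putting formulas (2)-(7) together, a per-frame GFCC computation might look like the sketch below. The IMF matrix is assumed to come from any EMD implementation, and the Gammatone filterbank is assumed to be given as magnitude responses on the FFT grid (for instance built from gammatone_ir above); these assumptions, the helper names and the 13-coefficient truncation are ours, not the patent's:

```python
import numpy as np
from scipy.fftpack import dct

def gfcc_from_imfs(imfs: np.ndarray, filterbank: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """GFCC vector of one frame from its IMF components.

    imfs:       array of shape (I, frame_len), the EMD components of the frame.
    filterbank: array of shape (H, frame_len // 2 + 1), Gammatone magnitude responses
                sampled on the same rFFT grid as the frame.
    """
    # formula (2): discrete power spectrum of each IMF component
    spectra = np.abs(np.fft.rfft(imfs, axis=1)) ** 2
    # formula (3): frequency synthesis -- sum the per-IMF spectra into one spectrum
    power = spectra.sum(axis=0)
    # Gammatone filtering and formula (6): log-compressed filter outputs
    log_energy = np.log(filterbank @ power + 1e-12)   # small offset avoids log(0)
    # formula (7): discrete cosine transform yields the GFCC coefficients G_mn
    return dct(log_energy, type=2, norm='ortho')[:n_coeffs]
```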
(3) Feature fusion: the short-time energy features and the GFCC features are extracted with the same framing, and the two audio signal feature vectors are then concatenated end to end to form a fused feature vector.
In order to ensure that features in different dimensions are numerically comparable and to avoid features with excessively large or small values biasing the classification result, Min-Max standardization is performed on the feature data before feature fusion:
x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}    (8)

where x_max is the maximum value in the feature data, x_min is the minimum value in the feature data, x is the feature data, and x^{*} is the Min-Max standardized feature data.
Then, combining the two audio signal feature vectors in an end-to-end connection mode to form a fusion matrix, wherein the T formula of the fusion feature matrix is as follows:
T=[G1n,G2n,…,Gmn,Gn] (9)
wherein G ismnFor the nth frame, the m-th GFCC characteristic, EnThe windowed short-time energy features are filtered for the nth frame extracted in the same frame-wise manner.
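A sketch of the fusion step of formulas (8)-(9): both feature sets are Min-Max normalized per dimension and then concatenated frame by frame (array shapes and the helper name are illustrative assumptions):

```python
import numpy as np

def fuse_features(gfcc: np.ndarray, energy: np.ndarray) -> np.ndarray:
    """Fused feature matrix T = [G_1n, ..., G_Mn, E_n] per frame (formula (9)).

    gfcc:   (n_frames, M) GFCC features.
    energy: (n_frames,) short-time energy extracted with the same framing.
    """
    def min_max(a: np.ndarray) -> np.ndarray:          # formula (8), per feature dimension
        a_min, a_max = a.min(axis=0), a.max(axis=0)
        span = np.where(a_max > a_min, a_max - a_min, 1.0)
        return (a - a_min) / span

    return np.hstack([min_max(gfcc), min_max(energy)[:, None]])   # shape (n_frames, M + 1)
```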
(4) Building a model: a weighted voting mechanism is introduced to build a CNN-LSTM integrated time sequence network model; as shown in FIG. 3, the model comprises a CNN sub-classifier and an LSTM sub-classifier.
In the CNN sub-classifier, the CNN is adapted to time sequence data by combining one-dimensional full convolutions and dilated convolutions. The first layer is a one-dimensional convolution layer; a residual structure is then constructed, with a one-dimensional convolution layer on the left side and two dilated convolution layers on the right side, and the last layer is a fully connected layer. The dilation factor of each dilated convolution layer in the residual structure grows exponentially, which ensures that the convolution kernels cover all inputs within the effective history and that a deep network can produce an extremely long effective history.
The LSTM sub-classifier adopts the LSTM network, which is well suited to sequence data modeling, and finally performs classification through two fully connected layers and a softmax function.
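The patent gives only the layout of the two sub-classifiers, not layer sizes or a framework; the PyTorch sketch below therefore fixes kernel sizes, channel widths and dilation factors arbitrarily and leaves the softmax to the voting stage, so both branches return class logits:

```python
import torch
import torch.nn as nn

class CNNSubClassifier(nn.Module):
    """1-D CNN: entry convolution, a residual block (1-D conv shortcut on the left, two dilated
    convolutions with exponentially growing dilation on the right), then a fully connected layer."""
    def __init__(self, in_channels: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.entry = nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1)
        self.shortcut = nn.Conv1d(hidden, hidden, kernel_size=1)
        self.dilated = nn.Sequential(
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, features, frames)
        h = torch.relu(self.entry(x))
        h = torch.relu(self.shortcut(h) + self.dilated(h))   # residual merge
        return self.head(h.mean(dim=-1))           # pool over time, then class logits

class LSTMSubClassifier(nn.Module):
    """LSTM followed by two fully connected layers; softmax is applied at voting time."""
    def __init__(self, in_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(in_features, hidden, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                nn.Linear(hidden, n_classes))

    def forward(self, x):                          # x: (batch, frames, features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])              # logits from the last time step
```

Note that the fused matrix T would be fed to the CNN branch as (batch, features, frames) and to the LSTM branch as (batch, frames, features); that transposition is part of this sketch, not of the patent text.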
(5) Target identification: the fused feature vector is taken as the input vector of the CNN-LSTM integrated time sequence network model and input into the CNN sub-classifier and the LSTM sub-classifier respectively; a higher weight is assigned to the sub-classifier with higher classification accuracy, the prediction probability of each category from the two sub-classifiers is multiplied by the corresponding weight to obtain a classification vector, and the category corresponding to the maximum value in the classification vector is selected as the final target identification classification result.
The weight is defined as:
w_l = \frac{p_l}{\sum_{l=1}^{L} p_l}    (10)

where p_l is the classification accuracy of the l-th sub-classifier, L is the total number of sub-classifiers, and L = 2;
the classification result obtained by each sub-classifier is then multiplied by the corresponding weight to obtain the classification vector, as shown in formula (11):

P = \sum_{l=1}^{L} w_l P_l    (11)

where P_l is the prediction probability vector of the l-th sub-classifier;
and finally, the category corresponding to the maximum value in the classification vector P is the final classification result of the integration algorithm.
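A sketch of the weighted voting of formulas (10)-(11), assuming the two accuracies p_l were measured on a validation set (function and variable names are illustrative):

```python
import numpy as np

def weighted_vote(prob_cnn: np.ndarray, prob_lstm: np.ndarray,
                  acc_cnn: float, acc_lstm: float) -> int:
    """Fuse the two sub-classifiers' class probabilities and return the predicted class."""
    w_cnn = acc_cnn / (acc_cnn + acc_lstm)          # formula (10): accuracy-proportional weights
    w_lstm = acc_lstm / (acc_cnn + acc_lstm)
    fused = w_cnn * prob_cnn + w_lstm * prob_lstm   # formula (11): weighted classification vector P
    return int(np.argmax(fused))                    # class with the largest fused probability

# Example with illustrative numbers
pred = weighted_vote(np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.7, 0.1]),
                     acc_cnn=0.88, acc_lstm=0.92)
```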
The weighted voting algorithm strengthens the influence of the more accurate sub-classifiers on the final result by assigning them higher weights. The complementarity between the individual classification models reduces the error of any single classifier and can further improve the classification precision.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A multi-feature fusion underwater target identification method is characterized by comprising the following steps:
(1) data preprocessing: carrying out min-max standardization processing on the acquired underwater acoustic signal data, and mapping the result values to [0, 1];
(2) feature extraction: extracting short-time energy features from the processed data in the time domain, decomposing the processed data in the frequency domain with the empirical mode decomposition (EMD) method to obtain a plurality of intrinsic mode function (IMF) components, and extracting Gammatone frequency cepstral coefficient (GFCC) features from each obtained IMF component;
(3) feature fusion: firstly extracting the short-time energy features and the GFCC features with the same framing, and then concatenating the two audio signal feature vectors end to end to form a fused feature vector;
(4) building a model: introducing a weighted voting mechanism to build a CNN-LSTM integrated time sequence network model, wherein the model comprises a CNN sub-classifier and an LSTM sub-classifier;
(5) target identification: taking the fused feature vector as the input vector of the CNN-LSTM integrated time sequence network model, inputting it into the CNN sub-classifier and the LSTM sub-classifier respectively, assigning a higher weight to the sub-classifier with higher classification accuracy, multiplying the prediction probability of each category from the two sub-classifiers by the corresponding weight to obtain a classification vector, and selecting the category corresponding to the maximum value in the classification vector as the final target identification classification result.
2. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (2), the short-time energy feature is extracted according to the following formula:

E_n = \sum_{m=0}^{N-1} y_n^2(m)    (1)

where E_n is the short-time energy feature of the n-th filtered and windowed frame, N is the length of one frame of the signal, and y_n(m) is the filtered and windowed signal.
3. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (2), the GFCC features are extracted as follows: after preprocessing, the audio signal is divided into n frames, and EMD is performed on each frame to obtain a plurality of IMF components; fast Fourier transform is then performed on each IMF component, the IMF components of each frequency band are added, the result is filtered by a Gammatone filter bank, the output value of each filter is logarithmically compressed to obtain a group of logarithmic energy spectra, and finally discrete cosine transform is performed to obtain the GFCC features.
4. The method for identifying the underwater target with multi-feature fusion as claimed in claim 3, characterized in that the audio signal is first divided into n frames, and each frame of the signal is decomposed into I IMF components by EMD; fast Fourier transform is then performed on each IMF component to obtain the discrete power spectrum s_i(k) of the signal:

s_i(k) = \left| \sum_{t=0}^{d-1} \mathrm{imf}_i(t)\, e^{-j 2\pi k t / d} \right|^2    (2)

where i = 1, 2, …, I, t is time, k is the frequency index, s(t) is the original audio signal, imf_i(t) is the i-th IMF component, and d is the number of discrete Fourier transform sampling points;
frequency synthesis is then carried out to obtain the power spectrum x(t):

x(t) = \sum_{i=1}^{I} s_i(t)    (3)
the time-domain impulse response of the Gammatone filter can be regarded as the product of a Gamma distribution function and a sinusoidal tone, expressed as follows:

g(t) = a\, t^{k-1} e^{-2\pi b t} \cos(2\pi f t + \phi), \quad t \geq 0    (4)

where g(t) is the time-domain impulse response of the Gammatone filter, t is time, f is the center frequency, a controls the filter gain, φ is the phase, k is the order of the filter, and b is the attenuation factor, which determines the bandwidth of the filter and whose value is determined by the center frequency f:

b = 1.019 \times 24.7 \times \left( \frac{4.37 f}{1000} + 1 \right)    (5)
after frequency synthesis, the power spectrum x(t) is squared to obtain an energy spectrum, and the output value of each filter is logarithmically compressed to obtain the logarithmic energy spectrum:

e(j) = \log\!\left( \sum_{t} x^2(t)\, g_j(t) \right)    (6)

where j = 1, 2, …, H and H is the number of Gammatone filters;
finally, discrete cosine transform is performed on the logarithmic energy spectrum e(j) to obtain the GFCC features, calculated as follows:

G_{mn} = \sqrt{\frac{2}{H}} \sum_{j=1}^{H} e(j) \cos\!\left( \frac{\pi m (2j - 1)}{2H} \right)    (7)

where m = 1, 2, …, M, M is the dimension of the GFCC feature parameters, and G_mn is the m-th dimension GFCC feature of the n-th frame of the audio signal.
5. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (3), Min-Max standardization is performed on the feature data before feature fusion:

x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}    (8)

where x_max is the maximum value in the feature data, x_min is the minimum value in the feature data, x is the feature data, and x^{*} is the Min-Max standardized feature data.
6. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (3), the two audio signal feature vectors are concatenated end to end to form the fused feature matrix T:

T = [G_{1n}, G_{2n}, …, G_{Mn}, E_n]    (9)

where G_mn is the m-th dimension GFCC feature of the n-th frame and E_n is the filtered and windowed short-time energy feature of the n-th frame, extracted with the same framing.
7. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (4), the first layer of the CNN sub-classifier is a one-dimensional convolution layer; a residual structure is then constructed, with a one-dimensional convolution layer on the left side and two dilated convolution layers on the right side, and the last layer is a fully connected layer.
8. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (4), the LSTM sub-classifier adopts an LSTM network and finally performs classification through two fully connected layers and a softmax function.
9. The method for identifying the underwater target with multi-feature fusion as claimed in claim 1, wherein in step (5), the weight is defined as:

w_l = \frac{p_l}{\sum_{l=1}^{L} p_l}    (10)

where p_l is the classification accuracy of the l-th sub-classifier, L is the total number of sub-classifiers, and L = 2;
the classification result obtained by each sub-classifier is then multiplied by the corresponding weight to obtain the classification vector, as shown in formula (11):

P = \sum_{l=1}^{L} w_l P_l    (11)

where P_l is the prediction probability vector of the l-th sub-classifier;
and finally, the category corresponding to the maximum value in the classification vector P is the final classification result of the integration algorithm.
CN202010930201.7A 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method Pending CN112183582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010930201.7A CN112183582A (en) 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010930201.7A CN112183582A (en) 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method

Publications (1)

Publication Number Publication Date
CN112183582A true CN112183582A (en) 2021-01-05

Family

ID=73925637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010930201.7A Pending CN112183582A (en) 2020-09-07 2020-09-07 Multi-feature fusion underwater target identification method

Country Status (1)

Country Link
CN (1) CN112183582A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925822A (en) * 2021-02-08 2021-06-08 山东大学 Time series classification method, system, medium and device based on multi-representation learning
CN114220458A (en) * 2021-11-16 2022-03-22 武汉普惠海洋光电技术有限公司 Sound identification method and device based on array hydrophone
CN114863951A (en) * 2022-07-11 2022-08-05 中国科学院合肥物质科学研究院 Rapid dysarthria detection method based on modal decomposition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (en) * 2012-08-17 2012-12-12 南京大学 Voiceprint identification method
CN108416364A (en) * 2018-01-31 2018-08-17 重庆大学 Integrated study data classification method is merged in subpackage
CN110200626A (en) * 2019-06-14 2019-09-06 重庆大学 A kind of vision induction motion sickness detection method based on ballot classifier
CN110599336A (en) * 2018-06-13 2019-12-20 北京九章云极科技有限公司 Financial product purchase prediction method and system
CN111414754A (en) * 2020-03-19 2020-07-14 中国建设银行股份有限公司 Emotion analysis method and device of event, server and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (en) * 2012-08-17 2012-12-12 南京大学 Voiceprint identification method
CN108416364A (en) * 2018-01-31 2018-08-17 重庆大学 Integrated study data classification method is merged in subpackage
CN110599336A (en) * 2018-06-13 2019-12-20 北京九章云极科技有限公司 Financial product purchase prediction method and system
CN110200626A (en) * 2019-06-14 2019-09-06 重庆大学 A kind of vision induction motion sickness detection method based on ballot classifier
CN111414754A (en) * 2020-03-19 2020-07-14 中国建设银行股份有限公司 Emotion analysis method and device of event, server and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XINGMEI WANG et al.: "Underwater Acoustic Target Recognition: A Combination of Multi-Dimensional Fusion Features and Modified Deep Neural Network", MDPI *
ZENG Sai et al. (曾赛等): "水下目标多模态深度学习分类识别研究" [Research on multi-modal deep learning classification and recognition of underwater targets], Applied Acoustics (应用声学) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925822A (en) * 2021-02-08 2021-06-08 山东大学 Time series classification method, system, medium and device based on multi-representation learning
CN114220458A (en) * 2021-11-16 2022-03-22 武汉普惠海洋光电技术有限公司 Sound identification method and device based on array hydrophone
CN114220458B (en) * 2021-11-16 2024-04-05 武汉普惠海洋光电技术有限公司 Voice recognition method and device based on array hydrophone
CN114863951A (en) * 2022-07-11 2022-08-05 中国科学院合肥物质科学研究院 Rapid dysarthria detection method based on modal decomposition
CN114863951B (en) * 2022-07-11 2022-09-23 中国科学院合肥物质科学研究院 Rapid dysarthria detection method based on modal decomposition

Similar Documents

Publication Publication Date Title
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
Hu et al. Deep learning methods for underwater target feature extraction and recognition
CN109410917B (en) Voice data classification method based on improved capsule network
CN110751044B (en) Urban noise identification method based on deep network migration characteristics and augmented self-coding
CN112257521B (en) CNN underwater acoustic signal target identification method based on data enhancement and time-frequency separation
CN112183582A (en) Multi-feature fusion underwater target identification method
CN111161715B (en) Specific sound event retrieval and positioning method based on sequence classification
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN113205820B (en) Method for generating voice coder for voice event detection
CN112183107A (en) Audio processing method and device
CN111899757A (en) Single-channel voice separation method and system for target speaker extraction
CN105448302A (en) Environment adaptive type voice reverberation elimination method and system
CN111341319A (en) Audio scene recognition method and system based on local texture features
CN113646833A (en) Voice confrontation sample detection method, device, equipment and computer readable storage medium
CN113191178A (en) Underwater sound target identification method based on auditory perception feature deep learning
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN110580915B (en) Sound source target identification system based on wearable equipment
CN110444225B (en) Sound source target identification method based on feature fusion network
CN112329819A (en) Underwater target identification method based on multi-network fusion
Espi et al. Spectrogram patch based acoustic event detection and classification in speech overlapping conditions
Vani et al. Improving speech recognition using bionic wavelet features
CN116310770A (en) Underwater sound target identification method and system based on mel cepstrum and attention residual error network
Sunny et al. Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam
Song et al. Research on Scattering Transform of Urban Sound Events Detection Based on Self-Attention Mechanism
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210105)