CN114282572A - Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics

Info

Publication number
CN114282572A
Authority
CN
China
Prior art keywords
mel
shufflenet
frequency
network
underwater sound
Prior art date
2021-12-14
Legal status
Granted
Application number
CN202111529853.0A
Other languages
Chinese (zh)
Other versions
CN114282572B (en)
Inventor
Zeng Xiangyang (曾向阳)
Yang Shuang (杨爽)
Wang Haitao (王海涛)
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
2021-12-14
Filing date
2021-12-14
Publication date
2022-04-05
Application filed by Northwestern Polytechnical University
Priority to CN202111529853.0A
Publication of CN114282572A
Application granted
Publication of CN114282572B
Active legal status
Anticipated expiration

Landscapes

  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features, in which the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x networks are used and modified to match the Mel spectrum features. The modifications comprise changing the shape of the input tensor of each network layer, changing the output channels, and changing the convolution kernel size of the GlobalPool layer. In addition, a batch normalization layer is added at the bottom of each of the two networks to normalize the mean and variance of each batch of input data. The identification method also uses data enhancement, which increases the sample data volume and improves the generalization ability of the model, and feature enhancement, which standardizes the range of the Mel spectrum feature samples. Classification experiments on several types of measured underwater acoustic targets show that this method, which fuses deep learning with artificial Mel spectrum features, currently achieves the best recognition effect.

Description

Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics
Technical Field
The invention belongs to the field of underwater acoustic target identification, and particularly relates to an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features.
Background
Underwater acoustic target recognition is an important component of underwater acoustic signal processing and an important technical support for acquiring and countering underwater acoustic information. Recently, deep CNN networks such as ResNet and DenseNet have been applied to underwater acoustic target recognition to improve the correct recognition rate. However, ResNet and DenseNet have high computational complexity, which slows network training and makes them difficult to deploy on small mobile devices. ShuffleNet is a CNN algorithm with extremely high computational efficiency; it adopts two new operations, pointwise group convolution and channel shuffle, which greatly reduce the computational cost while preserving accuracy. ShuffleNet V2 further improves classification accuracy compared with ShuffleNet. Therefore, the lightweight convolutional network ShuffleNet V2 is a better choice for deploying this kind of method on mobile devices and for the limited number of underwater acoustic target samples.
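For illustration, a minimal PyTorch sketch of the channel shuffle operation is given below; it mirrors the standard torchvision implementation and is not part of the claimed method.

```python
# Minimal sketch of ShuffleNet's channel shuffle operation (PyTorch),
# mirroring the standard torchvision implementation.
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so that information can flow
    between the pointwise group convolutions of successive blocks."""
    n, c, h, w = x.size()
    # reshape (N, C, H, W) -> (N, groups, C // groups, H, W)
    x = x.view(n, groups, c // groups, h, w)
    # swap the group and per-group channel axes, then flatten back
    x = torch.transpose(x, 1, 2).contiguous()
    return x.view(n, c, h, w)
```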
Underwater acoustic target recognition differs from image recognition: applying deep networks such as ResNet, DenseNet and ShuffleNet to it requires processing the raw underwater acoustic signal first. In a traditional passive underwater acoustic target recognition system, feature extraction and the classifier are usually two relatively independent links, and such a stepwise processing method must consider how well the two match. Matching the features to the classification method is the key to effectively improving the correct recognition rate.
Disclosure of Invention
The technical problem solved by the invention is as follows: to solve the problem that existing target identification systems are difficult to deploy on mobile devices, the invention provides an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features. The invention introduces ShuffleNet V2 into underwater acoustic target recognition and combines it with Mel spectrum features to provide a deep underwater acoustic target recognition method. Experimental results show that this recognition method, which fuses deep learning with artificial Mel spectrum features, achieves the best recognition effect to date.
The technical scheme of the invention is as follows: an underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features, comprising the following steps:
Step 1: read the labels of the underwater acoustic targets to be recognized, preprocess the several types of labeled underwater acoustic targets, and divide them into training set samples and verification set samples, where the number of training set samples is larger than the number of verification set samples;
Step 2: perform feature extraction and feature enhancement on the training set samples and verification set samples, and design a Mel frequency scale filter bank to obtain the feature-enhanced Mel spectrum features of the multi-class target training set and verification set;
Step 3: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, train and verify the network model, and finally complete the underwater acoustic target recognition.
The further technical scheme of the invention is as follows: in step 1, the label reading specifically comprises reading the specific path name of each underwater acoustic target file and generating the label information corresponding to the several types of underwater acoustic targets from the target category information in the path name.
The further technical scheme of the invention is as follows: in step 1, each section of target data of the several types of underwater acoustic targets is framed with a frame overlap of 25%-75%, which increases the number of samples.
The further technical scheme of the invention is as follows: the number ratio of training set samples to verification set samples is 7:3.
The further technical scheme of the invention is as follows: step 2 comprises the following substeps:
Step 2.1: frame each sample in the training set and the verification set, taking N sampling points as one observation unit; adjacent frames overlap by M points, with M = 1/4 × N by default, and each frame signal is then multiplied by a window function, by default a Hanning window;
Step 2.2: let the framed signal be S(n), n = 0, 1, ..., N-1, where N is the frame size; the Hanning window expression is:

W(n) = 0.5 × [1 - cos(2πn/(N-1))], 0 ≤ n ≤ N-1

Step 2.3: after windowing, perform a Fourier transform on each frame signal to obtain the transformed signal X_a(k):

X_a(k) = Σ_{n=0}^{N-1} S'(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1

where S'(n) is the windowed frame signal and N is the number of Fourier transform points;
Step 2.4: design a Mel frequency scale filter bank, where the relationship between the Mel frequency scale and the actual frequency is:

Mel(f) = 2595 × log10(1 + f/700)

where f is the actual frequency in Hz and Mel(f) is the perceived frequency in mel;
Step 2.5: define a filter bank with M triangular filters, in which the center frequency of the m-th filter is both the lower limit frequency of the (m+1)-th filter and the upper limit frequency of the (m-1)-th filter, i.e. f_0(m) = f_l(m+1) = f_h(m-1), with m = 1, 2, ..., M, where f_0 is the center frequency, f_l the lower limit frequency (0 by default) and f_h the upper limit frequency (half the sampling frequency by default); the frequency response H_m(k) of the triangular band-pass filter is:

H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, k > f(m+1)

where Σ_{m=0}^{M-1} H_m(k) = 1.

Compute the dot product of each filter response with the squared magnitude |X_a(k)|² of each frame to obtain the Mel spectrum features; then subtract the mean and divide by the variance of the Mel spectrum features of each frame to obtain the feature-enhanced Mel spectrum features.
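A minimal sketch of steps 2.1-2.5 follows, assuming numpy and librosa as library choices (the patent names no library); the filter bank uses the HTK-style Mel formula given above.

```python
# Minimal sketch of steps 2.1-2.5: framing, Hanning window, FFT power
# spectrum, Mel filtering, and per-frame feature enhancement.
# numpy and librosa are assumed choices, not named by the patent.
import numpy as np
import librosa

def mel_features(signal: np.ndarray, sr: int, n_fft: int = 2048,
                 n_mels: int = 128) -> np.ndarray:
    hop = n_fft - n_fft // 4                    # frame overlap M = 1/4 * N
    window = np.hanning(n_fft)                  # Hanning window W(n)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # |X_a(k)|^2
    # Triangular Mel filter bank H_m(k); htk=True matches Mel(f) = 2595*log10(1+f/700)
    fbank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels, htk=True)
    feats = power @ fbank.T                     # Mel spectrum features
    mean = feats.mean(axis=1, keepdims=True)    # feature enhancement:
    std = feats.std(axis=1, keepdims=True)      # standardize each frame
    return (feats - mean) / (std + 1e-8)
```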
The further technical scheme of the invention is as follows: the step 3 comprises the following steps:
Step 3.1: modify the ShuffleNet V2 network to obtain the modified ShuffleNet V2 classifier: add a Batch Normalization (BN) layer; change the number of output channels of the Mel spectrum feature layer; change the output size of each feature map in the Conv1, Stage2, Stage3, Stage4 and Conv5 layers, which are all composed of convolution operations, the three Stage layers being built by splicing spatial down-sampling blocks with basic unit blocks; change the convolution kernel size of the GlobalPool layer, i.e. the global average pooling layer; and change the number of output channels of the final Fully Connected (FC) layer;
Step 3.2: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, and train and verify the network model.
Effects of the invention
The technical effects of the invention are as follows: to match the lightweight ShuffleNet V2 network, the Mel spectrum, used as an artificial feature, is input in a manner similar to the RGB input of an image, except that image RGB has 3 channels while the Mel spectrum feature has a single channel. The invention uses the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x networks and modifies them to match the Mel spectrum features. The modifications comprise changing the shape of the input tensor of each network layer, changing the output channels, and changing the convolution kernel size of the GlobalPool layer. In addition, a batch normalization layer is added at the bottom of each of the two networks to normalize the mean and variance of each batch of input data. The identification method also uses data enhancement, which increases the sample data volume and improves the generalization ability of the model, and feature enhancement, which standardizes the range of the Mel spectrum feature samples. The invention can complete the task of identifying several types of underwater acoustic targets on mobile devices, realizing the feature extraction, classification and identification tasks of underwater acoustic target recognition with Mel spectrum features and the modified lightweight convolutional network ShuffleNet V2. On a small mobile device, classification experiments on several types of measured underwater acoustic targets show that this method, which fuses deep learning with artificial Mel spectrum features, achieves a correct recognition rate of 99%.
Drawings
FIG. 1 is a flow chart of the underwater acoustic target recognition method fusing Mel spectrum features and the modified ShuffleNet V2 classifier.
Detailed Description
Referring to FIG. 1, the underwater acoustic target recognition method of the invention is now described in detail with reference to an example. The mobile device used in this example is configured as follows: graphics card: Nvidia GeForce MX350, with 1 available GPU and 8 CPUs. The implementation is programmed in Python in a PyTorch 1.9 environment.
The depth recognition model method comprises the following steps:
step 1: data pre-processing (data enhancement).
Considering that the underwater acoustic signal is stationary over short times, it is framed with overlapping frames; this increases the number of samples and can be regarded as a form of data enhancement. Underwater acoustic target samples with known labels are read, and the underwater acoustic target data are strictly divided into a training set and a verification set.
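As an illustration, the framing-based enhancement and the strict split can be sketched as follows (numpy is an assumed choice; frame length and split ratio follow the embodiment below).

```python
# Minimal sketch of the framing-based data enhancement and the strict
# training/verification split; numpy is an assumed choice.
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int,
                 overlap: float = 0.5) -> np.ndarray:
    """Cut one recording into overlapping frames; each frame is a sample."""
    hop = int(frame_len * (1 - overlap))
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len]
                     for i in range(n_frames)])

# Example usage: 50% overlap roughly doubles the sample count, and the
# frames are then split 7:3 into training and verification sets.
# frames = frame_signal(x, frame_len=2048, overlap=0.5)
# split = int(0.7 * len(frames))
# train_set, val_set = frames[:split], frames[split:]
```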
Step 2: feature extraction and feature enhancement.
Mel spectrum feature extraction: perform a short-time Fourier transform on the time-domain signal to obtain the power spectrum, then apply Mel filtering to the power spectrum to obtain the Mel spectrum features.
Feature enhancement: apply feature scaling to the obtained Mel spectrum features to standardize the range of the data features.
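The same extraction and scaling can be sketched with torchaudio as one possible library choice (the patent names no library); the sample rate below is an assumed placeholder.

```python
# Sketch of Mel feature extraction and feature scaling with torchaudio;
# library choice and sample rate are assumptions, not from the patent.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,             # assumed; not stated in the patent
    n_fft=2048, hop_length=1536,   # hop = 3/4 * n_fft, i.e. 1/4 frame overlap
    n_mels=128, window_fn=torch.hann_window)

def extract(waveform: torch.Tensor) -> torch.Tensor:
    feats = mel(waveform)                    # (n_mels, n_frames) power Mel spectrum
    mean = feats.mean(dim=0, keepdim=True)   # feature scaling: standardize
    std = feats.std(dim=0, keepdim=True)     # each frame across Mel bins
    return (feats - mean) / (std + 1e-8)
```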
Step 3: ShuffleNet V2 network modification.
The ShuffleNet V2 0.5x and ShuffleNet V2 1.0x versions of the network are modified respectively. The modifications comprise changing the shape of the input tensor of each network layer (to match the size of the Mel spectrum features), changing the output channels, and changing the convolution kernel size of the GlobalPool layer. In addition, a batch normalization layer is added at the bottom of the network to normalize the mean and variance of each batch of input data.
Step 4: apply the feature extraction and feature enhancement of step 2 to the underwater acoustic target data obtained by the data enhancement of step 1 to obtain feature sample data, then input the feature sample data into the ShuffleNet V2 network modified in step 3 for training and verification.
The training process is explained as follows: the loss function in the network model represents the difference (error) between the actual output (probability) and the expected output (probability) of the network model. During training, forward propagation first computes the actual output (probability); the network parameters are then updated by the reverse gradient propagation algorithm, reducing the loss value of the loss function so that the error keeps shrinking and the actual output (probability) of the model moves closer to the expected output (probability). Therefore, the smaller the loss value computed during network training, the better the recognition performance of the network model.
During verification, the model no longer performs reverse gradient training and the model parameters are no longer updated; the feature samples are input into the network to directly compute the actual output (probability), which is compared with the expected output (probability) to obtain the correct recognition rate on the verification set. The correct recognition rate is the percentage of correctly classified underwater acoustic targets among all underwater acoustic targets. The loss value obtained on the training set samples and the correct recognition rate obtained on the verification set samples serve as the evaluation indicators of the recognition method.
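A minimal PyTorch sketch of this training and verification procedure is given below, assuming DataLoaders `train_loader` and `val_loader` that yield (feature, label) batches; the hyperparameter values are taken from the embodiment described later.

```python
# Sketch of the training/verification procedure; DataLoaders are assumed,
# hyperparameters follow the embodiment below (SGD, momentum 0.9,
# learning rate 0.001, 30 epochs).
import torch
import torch.nn as nn

def train_and_verify(model, train_loader, val_loader, epochs=30):
    criterion = nn.CrossEntropyLoss()  # error between actual and expected output
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for feats, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(feats), labels)  # forward propagation
            loss.backward()                         # reverse gradient propagation
            optimizer.step()                        # update network parameters
            running_loss += loss.item()
        model.eval()
        correct = total = 0
        with torch.no_grad():                       # verification: no updates
            for feats, labels in val_loader:
                pred = model(feats).argmax(dim=1)
                correct += (pred == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch}: loss {running_loss / len(train_loader):.4f}, "
              f"correct recognition rate {100.0 * correct / total:.2f}%")
```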
Step 1: read the several types of labeled underwater acoustic targets and first preprocess the underwater acoustic target data. Each section of target data is framed with 50% frame overlap to increase the number of samples. The underwater acoustic target data are strictly divided into a training set and a verification set: in the experiment, 7/10 of the total samples were used as training set samples and 3/10 as verification set samples.
Step 2: and carrying out feature extraction and feature enhancement on the data sample.
Each sample is framed, taking N sampling points as one observation unit, with N = 2048; adjacent frames overlap by M points, with M = 1/4 × N by default. Each frame signal is then multiplied by a window function, by default a Hanning window. Let the framed signal be S(n), n = 0, 1, ..., N-1, where N is the frame size; formula (1) gives the signal multiplied by the Hanning window, and formula (2) is the Hanning window expression:

S'(n) = S(n) × W(n)    (1)

W(n) = 0.5 × [1 - cos(2πn/(N-1))], 0 ≤ n ≤ N-1    (2)
After windowing, a Fourier transform is performed on each frame signal to obtain the transformed signal X_a(k):

X_a(k) = Σ_{n=0}^{N-1} S'(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1    (3)

where S'(n) is the windowed frame signal and N is the number of Fourier transform points.
The design of the Mel frequency scale filter bank is also key to the design of the Mel spectrum features. The pitch perceived by the human ear is not linearly proportional to frequency, and the Mel frequency scale better matches the auditory characteristics of the human ear. The Mel frequency scale and the actual frequency follow an approximately logarithmic relationship, which can be expressed by formula (4):

Mel(f) = 2595 × log10(1 + f/700)    (4)

where f is the actual frequency in Hz and Mel(f) is the perceived frequency in mel.
Each filter in the Mel filter bank is a triangular filter. Define a filter bank with M filters (M = 128 by default), in which the center frequency of the m-th filter is both the lower limit frequency of the (m+1)-th filter and the upper limit frequency of the (m-1)-th filter, i.e. f_0(m) = f_l(m+1) = f_h(m-1), with m = 1, 2, ..., M, where f_0 is the center frequency, f_l the lower limit frequency (0 by default) and f_h the upper limit frequency (half the sampling frequency by default).
The frequency response H_m(k) of the triangular band-pass filter is given by formula (5):

H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)    (5)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, k > f(m+1)

where Σ_{m=0}^{M-1} H_m(k) = 1.    (6)
Compute the dot product of each filter response with the squared magnitude |X_a(k)|² of each frame to obtain the Mel spectrum features; then subtract the mean and divide by the variance of the Mel spectrum features of each frame to obtain the feature-enhanced Mel spectrum features.
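The filter bank design of formulas (4)-(6) can be sketched directly in numpy as follows; this is the manual counterpart of the library-based filter bank shown earlier, and is an illustrative assumption rather than the patented implementation.

```python
# Minimal sketch of the Mel filter bank of formulas (4)-(6), assuming
# numpy; returns the M triangular responses H_m(k) over the positive
# FFT bins.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)          # formula (4)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)        # inverse of (4)

def mel_filter_bank(sr: int, n_fft: int, n_mels: int = 128) -> np.ndarray:
    f_l, f_h = 0.0, sr / 2.0              # default lower/upper limit frequencies
    # M + 2 points equally spaced on the Mel scale give the lower limit,
    # center, and upper limit frequency of each triangular filter.
    mel_pts = np.linspace(hz_to_mel(f_l), hz_to_mel(f_h), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                  # rising edge of H_m(k)
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                 # falling edge of H_m(k)
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank
```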
Step 3: modify the ShuffleNet V2 networks of the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x versions to obtain the modified ShuffleNet V2 classifier, as shown in attached Table 1. The italicized bold parts are the modifications to ShuffleNet V2, comprising adding a batch normalization (BN) layer, and changing the number of output channels of the Mel spectrum feature layer, the output size of each feature map in the Conv1, Stage2, Stage3, Stage4 and Conv5 layers, the convolution kernel size of the GlobalPool layer, and the number of output channels of the final fully connected (FC) layer.
Step 4: model evaluation.
Take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, and train and verify the network model. The network model parameters are set as follows: the network is randomly initialized, the loss is calculated with a cross-entropy loss function, the gradient is optimized with the stochastic gradient descent (SGD) algorithm, the momentum is set to 0.9, the learning rate to 0.001, and the number of training epochs to 30. After the network model is trained, the loss during training and the correct recognition rate during verification are recorded to evaluate the performance of the recognition method.
The invention combines artificial Mel spectrum features with the modified lightweight ShuffleNet V2 network to complete the underwater acoustic target recognition task. The identification method adds data enhancement and feature enhancement steps. The network is modified to match the Mel spectrum features; in particular, a batch normalization layer is added at the bottom of the network to normalize the mean and variance of each batch of input data. Experimental results show that, on a small mobile device, the model achieves a very good recognition effect in the recognition and classification tasks of several types of underwater acoustic targets, which proves the effectiveness of the invention.
Attached Table 1: the modified ShuffleNet V2 classifier.
Attached Table 2: training effects of the recognition method under the ShuffleNet V2 0.5x and ShuffleNet V2 1.0x versions.

Claims (6)

1. An underwater acoustic target identification method based on a ShuffleNet V2 classification network and Mel spectrum features, characterized by comprising the following steps:
Step 1: read the labels of the underwater acoustic targets to be recognized, preprocess the several types of labeled underwater acoustic targets, and divide them into training set samples and verification set samples, where the number of training set samples is larger than the number of verification set samples;
Step 2: perform feature extraction and feature enhancement on the training set samples and verification set samples, and design a Mel frequency scale filter bank to obtain the feature-enhanced Mel spectrum features of the multi-class target training set and verification set;
Step 3: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, train and verify the network model, and finally complete the underwater acoustic target recognition.
2. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein in step 1 the label reading specifically comprises reading the specific path name of each underwater acoustic target file and generating the label information corresponding to the several types of underwater acoustic targets from the target category information in the path name.
3. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein in step 1 each section of target data of the several types of underwater acoustic targets is framed with a frame overlap of 25%-75%, which is used to increase the number of samples.
4. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein the number ratio of training set samples to verification set samples is 7:3.
5. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein step 2 comprises the following substeps:
Step 2.1: frame each sample in the training set and the verification set, taking N sampling points as one observation unit; adjacent frames overlap by M points, with M = 1/4 × N by default, and each frame signal is then multiplied by a window function, by default a Hanning window;
Step 2.2: let the framed signal be S(n), n = 0, 1, ..., N-1, where N is the frame size; the Hanning window expression is:

W(n) = 0.5 × [1 - cos(2πn/(N-1))], 0 ≤ n ≤ N-1

Step 2.3: after windowing, perform a Fourier transform on each frame signal to obtain the transformed signal X_a(k):

X_a(k) = Σ_{n=0}^{N-1} S'(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1

where S'(n) is the windowed frame signal and N is the number of Fourier transform points;
Step 2.4: design a Mel frequency scale filter bank, where the relationship between the Mel frequency scale and the actual frequency is:

Mel(f) = 2595 × log10(1 + f/700)

where f is the actual frequency in Hz and Mel(f) is the perceived frequency in mel;
Step 2.5: define a filter bank with M triangular filters, in which the center frequency of the m-th filter is both the lower limit frequency of the (m+1)-th filter and the upper limit frequency of the (m-1)-th filter, i.e. f_0(m) = f_l(m+1) = f_h(m-1), with m = 1, 2, ..., M, where f_0 is the center frequency, f_l the lower limit frequency (0 by default) and f_h the upper limit frequency (half the sampling frequency by default); the frequency response H_m(k) of the triangular band-pass filter is:

H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, k > f(m+1)

where Σ_{m=0}^{M-1} H_m(k) = 1.

Compute the dot product of each filter response with the squared magnitude |X_a(k)|² of each frame to obtain the Mel spectrum features; then subtract the mean and divide by the variance of the Mel spectrum features of each frame to obtain the feature-enhanced Mel spectrum features.
6. The underwater acoustic target identification method based on the ShuffleNet V2 classification network and Mel spectrum features as claimed in claim 1, wherein step 3 comprises the following steps:
Step 3.1: modify the ShuffleNet V2 network to obtain the modified ShuffleNet V2 classifier: add a Batch Normalization (BN) layer; change the number of output channels of the Mel spectrum feature layer; change the output size of each feature map in the Conv1, Stage2, Stage3, Stage4 and Conv5 layers, which are all composed of convolution operations, the three Stage layers being built by splicing spatial down-sampling blocks with basic unit blocks; change the convolution kernel size of the GlobalPool layer, i.e. the global average pooling layer; and change the number of output channels of the final Fully Connected (FC) layer;
Step 3.2: take the Mel spectrum features of the multi-class target training set and verification set as the input of the modified ShuffleNet V2 network model, and train and verify the network model.
CN202111529853.0A 2021-12-14 2021-12-14 Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics Active CN114282572B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111529853.0A | 2021-12-14 | 2021-12-14 | Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics

Publications (2)

Publication Number | Publication Date
CN114282572A (en) | 2022-04-05
CN114282572B (en) | 2024-08-09

Family

ID=80872152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111529853.0A Active CN114282572B (en) 2021-12-14 2021-12-14 Underwater sound target identification method based on ShuffleNet V2 classification network and Mel spectrum characteristics

Country Status (1)

Country Link
CN (1) CN114282572B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190110939A (en) * 2018-03-21 2019-10-01 한국과학기술원 Environment sound recognition method based on convolutional neural networks, and system thereof
CN112329819A (en) * 2020-10-20 2021-02-05 中国海洋大学 Underwater target identification method based on multi-network fusion
CN112364779A (en) * 2020-11-12 2021-02-12 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
GB202107666D0 (en) * 2021-05-28 2021-07-14 Bae Systems Plc Apparatus and method of classification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Shaokang; Tian Deyan: "Intelligent classification method of Mel-frequency cepstral coefficients for underwater acoustic targets" (水下声目标的梅尔倒谱系数智能分类方法), Applied Acoustics (应用声学), no. 02, 12 March 2019 (2019-03-12) *
Cheng Jinsheng; Du Xuanmin; Zhou Shengzeng; Zeng Sai: "Application research of supervised learning methods based on target MFCC features in passive sonar target recognition" (基于目标MFCC特征的监督学习方法在被动声呐目标识别中的应用研究), Ship Science and Technology (舰船科学技术), no. 17, 8 September 2018 (2018-09-08) *

Also Published As

Publication number | Publication date
CN114282572B (en) | 2024-08-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant