CN114863938A - Bird language identification method and system based on attention residual error and feature fusion - Google Patents

Bird language identification method and system based on attention residual error and feature fusion

Info

Publication number
CN114863938A
CN114863938A (Application CN202210570511.1A)
Authority
CN
China
Prior art keywords
bird
sound
attention
layer
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210570511.1A
Other languages
Chinese (zh)
Inventor
程吉祥
潘齐炜
李志丹
何虹斌
曾蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Petroleum University filed Critical Southwest Petroleum University
Priority to CN202210570511.1A priority Critical patent/CN114863938A/en
Publication of CN114863938A publication Critical patent/CN114863938A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 - Training, enrolment or model building
    • G10L 17/18 - Artificial neural networks; Connectionist approaches
    • G10L 17/26 - Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a bird sound recognition method and system based on attention residual error and feature fusion, comprising the following steps: first, framing and windowing preprocessing operations are performed on a bird sound training set; the preprocessed training set is then processed by two feature extraction methods, and the resulting feature information is converted into an energy spectrogram; in the training stage, a residual network with a horizontal-vertical attention module and a cross-entropy loss function are used to compute the probability of each bird species, yielding the final classification prediction layer and realizing classification prediction of bird sounds; a bird sound recognition system is also designed and implemented that performs bird sound recognition and classification using the proposed method. The method improves the recognition accuracy for bird sounds with a high degree of confusion, and the test results demonstrate its effectiveness.

Description

Bird language identification method and system based on attention residual error and feature fusion
Technical Field
The invention relates to the technical field of bird voiceprint recognition based on deep learning, and in particular to a bird language identification method and system based on attention residual error and feature fusion.
Background
Birds are an important indicator for evaluating the health of an ecosystem; as an important component of the ecosystem, their presence and migration patterns often serve as early warning signals of the environmental health of a given area. In recent decades, the protection of bird biodiversity has received increasing attention, and bird voiceprint recognition technology has become correspondingly more significant. The vocal structures and organs of each bird species differ to some degree; these biological characteristics cannot be duplicated and can therefore be used to identify species, and bird voiceprint technology applies voiceprint recognition to these species-specific characteristics to identify bird species. At present, bird voiceprint recognition techniques can be divided, by model type, into traditional methods and deep-learning-based methods: traditional methods mainly use Gaussian mixture models and maximum likelihood estimation to select the highest-scoring sound, while deep-learning-based methods mainly train, recognize, and detect with neural network models. Compared with traditional and classical machine learning methods, deep-learning-based methods perform markedly better on bird sound recognition tasks. With the rapid development of artificial intelligence and deep learning, bird voiceprint recognition technology has broad application prospects in the field of environmental protection.
Document 1 (Lee C H, Han C C, Chuang C C. Automatic Classification of Bird Species From Their Sounds Using Two-Dimensional Cepstral Coefficients [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(8):1541-1550.) uses a single audio feature and improves the expression of the feature information by extracting dynamic and static components and fusing them, thereby improving recognition accuracy. Document 2 (Efremova D B, Sankupellay M, Konovalov D A. Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning [C]. In: Digital Image Computing: Techniques and Applications, 2019, 294-301.) uses a ResNet50 deep convolutional neural network as the model to increase the speed of bird identification. Document 3 (Yang et al. Research on energy spectrograms fusing voiceprint information for bird identification [J]. Applied Acoustics, 2020, 39(3):453.) combines LBP and HOG features with a classifier algorithm and additionally uses spectral information from a generative adversarial network for data augmentation, further improving the recognition rate.
Most models used in deep-learning-based recognition tasks are large convolutional neural networks; although they improve the recognition rate, the increased number of parameters inevitably makes training difficult and detection slow. For feature extraction, a single extraction method is generally used, but a single feature parameter cannot fully express all the characteristics of bird sounds during recognition and detection, which imposes certain limitations.
Disclosure of Invention
To address the limited expressiveness of single audio features and the large number of network parameters, the invention provides a bird language identification method based on attention residual error and feature fusion. Two feature extraction methods are used, their outputs are fused into feature information, and the feature information is converted into an energy spectrogram. The energy spectrogram is input into a bird language identification and classification convolutional neural network, where corresponding feature images are generated by downsampling; a residual network with a horizontal-vertical attention module then effectively attends to the channel relations among the feature images, reducing computational cost while improving recognition accuracy.
A bird language identification method based on attention residual error and feature fusion is characterized by comprising the following steps:
S1, collecting the sounds of various birds in the natural environment to form a sound training set; labeling each sound with the bird species to which it belongs, and restricting the duration of each sound to between 2 s and 30 s, where each sound contains the song of a single bird;
S2, sampling the sound training set from step S1 at the same sampling frequency, and then unifying the audio duration of the training set through the framing and windowing preprocessing operations;
S3, obtaining feature information through two feature extraction methods, and finally converting the feature information into an energy spectrogram;
S31, processing the preprocessed sound training set sequentially with a Mel triangular filtering algorithm and cepstral mean and variance normalization to obtain a vector score F; processing the preprocessed sound training set sequentially with a Gammatone filtering algorithm incorporating noise suppression and cepstral mean and variance normalization to obtain a vector score G; and fusing the two vector scores to obtain the feature information f:
f = ωF + (1 - ω)G
where ω represents the mixing weight coefficient.
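A minimal sketch of the score-level fusion in S31 is given below. It assumes two precomputed per-frame feature matrices of the same shape (a Mel-filter-bank cepstral feature F and a Gammatone-based feature G); the normalization details and the illustrative weight ω = 0.5 are assumptions, and the filter-bank implementations themselves are not reproduced here.

    import numpy as np

    def cmvn(feat):
        """Cepstral mean and variance normalization over the time axis (rows = frames)."""
        return (feat - feat.mean(axis=0)) / (feat.std(axis=0) + 1e-8)

    def fuse_features(F, G, omega=0.5):
        """Weighted fusion f = omega * F + (1 - omega) * G of two normalized feature matrices."""
        F, G = cmvn(F), cmvn(G)
        return omega * F + (1.0 - omega) * G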
S32, converting the feature information f obtained in S31 into an energy spectrogram.
S33, performing image enhancement on the obtained energy spectrogram, where the image enhancement operations include random color-to-grayscale conversion and image rotation.
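The image enhancement in S33 can be sketched with standard torchvision transforms; the probability and rotation range below are illustrative assumptions rather than values given in this description.

    from torchvision import transforms

    # Random color-to-grayscale conversion plus random rotation of the spectrogram image.
    augment = transforms.Compose([
        transforms.RandomGrayscale(p=0.2),      # assumed probability
        transforms.RandomRotation(degrees=10),  # assumed rotation range
        transforms.ToTensor(),
    ])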
The constructed bird language identification and classification convolutional neural network is specifically as follows:
The network structure consists, in order, of 3 convolutional layers with 3 × 3 kernels and stride 2, a max pooling layer, an activation function layer, and 48 residual structure layers with horizontal-vertical attention modules; each residual structure layer with a horizontal-vertical attention module comprises convolutional layers, an activation function layer, an average pooling layer, and a batch normalization layer; the network uses a global average pooling operation in its last layer and an activation function layer after every convolution operation.
The energy spectrogram is input into the bird recognition and classification convolutional neural network, and the 3 convolutional layers with 3 × 3 kernels and stride 2 perform a downsampling operation to obtain the feature image Q; the process is expressed as:
Q = F_3×3(F_3×3(F_3×3(f)))
The feature image Q is then processed by the residual structure layer with the horizontal-vertical attention module, which comprises the following parts: a 1 × 1 convolutional layer, a 3 × 3 convolutional layer, a 1 × 1 convolutional layer, a batch normalization layer, an activation function layer, the horizontal-vertical attention module, and a residual connection; the process can be expressed as:
F_out = F_HW(F_1×1(F_3×3(F_1×1(x)))) + x
F_HW is the horizontal-vertical attention module, which is composed of two attention submodules in the vertical and horizontal directions respectively, and is expressed as:
F_HW = F_H + F_W
wherein:
F_H = δ(conv(Avgpool_H(x)))
F_W = δ(conv(Avgpool_W(x)))
where δ denotes the sigmoid function, conv(x) denotes a convolution with a 1 × 1 kernel, and Avgpool denotes an average pooling operation.
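To make the structure concrete, the following sketch implements a residual bottleneck block with a coordinate-style horizontal-vertical attention module in PyTorch. It is a non-authoritative reading of the formulas above (1 × 1 and 3 × 3 convolutions, batch normalization, directional average pooling, a 1 × 1 convolution with a sigmoid per direction, and a residual connection); the channel counts and the way the two directional attention maps are applied to the feature map are assumptions.

    import torch
    import torch.nn as nn

    class HWAttention(nn.Module):
        """Horizontal-vertical attention: F_H = sigmoid(conv(AvgPool_H(x))), F_W likewise."""
        def __init__(self, channels):
            super().__init__()
            self.conv_h = nn.Conv2d(channels, channels, kernel_size=1)
            self.conv_w = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, x):
            # Average pooling along the width gives one value per row (horizontal direction),
            # and along the height gives one value per column (vertical direction).
            f_h = torch.sigmoid(self.conv_h(x.mean(dim=3, keepdim=True)))  # (N, C, H, 1)
            f_w = torch.sigmoid(self.conv_w(x.mean(dim=2, keepdim=True)))  # (N, C, 1, W)
            return x * (f_h + f_w)  # assumed way of applying F_HW = F_H + F_W to the features

    class HWResidualBlock(nn.Module):
        """Bottleneck residual block: 1x1, 3x3, 1x1 convolutions with BN/ReLU, HW attention, skip."""
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels),
            )
            self.attn = HWAttention(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.attn(self.body(x)) + x)  # F_out = F_HW(...) + x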
S4, constructing the bird language identification and classification convolutional neural network; inputting the energy spectrogram obtained in step S3 into the constructed network for training; the loss function is the categorical cross-entropy loss; an optimization strategy and hyper-parameters are set for the bird language identification and classification network, and the loss is continuously reduced through iterative training until the set number of iterations is completed and the trained weight parameters are saved;
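A minimal training-loop sketch for S4 follows; the model is assumed to be the network described above with a classification head, the data loader is assumed to yield (spectrogram, label) batches, and the optimizer choice, learning rate, epoch count, and output path are illustrative assumptions.

    import torch
    import torch.nn as nn

    def train(model, loader, num_epochs=50, lr=1e-3, device="cuda"):
        """Iteratively minimize the categorical cross-entropy loss, then save the weights."""
        model = model.to(device)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for epoch in range(num_epochs):
            for spectrograms, labels in loader:
                spectrograms, labels = spectrograms.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(spectrograms), labels)
                loss.backward()
                optimizer.step()
        torch.save(model.state_dict(), "bird_recognizer.pt")  # hypothetical output path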
S5, constructing a bird language recognition system based on attention residual error and feature fusion from the bird language identification and classification convolutional neural network constructed in step S3 and the obtained trained weight parameters; the detection system recognizes and classifies the energy spectrograms to be detected, and the bird language recognition system counts and classifies all input bird spectrograms.
The invention also provides a bird language recognition system based on attention residual error and feature fusion, which comprises the following modules:
a bird sound acquisition module, configured to acquire the bird sound data set to be processed;
a bird sound recognition model acquisition module, configured to build a bird sound recognizer from the bird sound recognition model and parameter file obtained by the bird language identification method based on attention residual error and feature fusion described above, and used for bird sound category recognition and classification;
and a bird counting module, configured to count the number of individuals of each recognized bird species.
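The interaction of the modules can be illustrated with a short sketch: spectrogram images produced from the acquired recordings are classified by the trained recognizer, and the predicted species are tallied. The helper name, class-name list, and batching below are hypothetical.

    from collections import Counter
    import torch

    def count_species(model, spectrogram_batch, class_names):
        """Classify a batch of spectrogram tensors and count the predicted bird species."""
        model.eval()
        with torch.no_grad():
            preds = model(spectrogram_batch).argmax(dim=1)
        return Counter(class_names[i] for i in preds.tolist())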
Advantageous effects:
1. The invention provides a bird language identification method based on attention residual error and feature fusion. Because the bird sound data set contains a large number of short original bird song signals of many species, the invention first preprocesses the audio signals, then converts them into energy spectrograms through audio feature extraction, and then uses an attention residual network to extract image features, which accelerates the training of the recognition and classification network and reduces the number of network parameters.
2. When extracting audio signal features, different feature extraction methods are adopted; after the different features are obtained, cepstral mean and variance normalization and feature warping are applied, reducing possible channel mismatch and channel effects in the audio.
3. The invention proposes and designs a bird sound recognition system that performs bird sound recognition using the bird language identification method based on attention residual error and feature fusion.
Drawings
FIG. 1 is a general diagram of the model structure of the bird language identification method used in an embodiment of the present invention;
FIG. 2 is a diagram of the attention structure of the bird language identification method according to an embodiment of the present invention, where sub-graph a is the overall structure of the attention residual module and sub-graph b is the vertical and horizontal attention substructure within the attention module;
FIG. 3 is a schematic flow chart of the bird sound recognition system according to an embodiment of the present invention;
FIG. 4 is a block diagram of the bird sound recognition system according to an embodiment of the present invention;
FIG. 5 is a comparison of feature extraction with and without the method of the present invention, where sub-graph a is the energy spectrogram produced using only the Mel triangular filtering feature extraction, sub-graph b is the energy spectrogram produced using only the noise-suppressed Gammatone filtering feature extraction, and sub-graph c is the energy spectrogram obtained by the feature fusion method of the present invention;
FIG. 6 is a comparison of confusion matrices between the method of the present invention and other methods, where sub-graph a is the confusion matrix using only the Mel triangular filtering feature extraction, sub-graph b is the confusion matrix using only the noise-suppressed Gammatone filtering feature extraction, sub-graph c is the confusion matrix using the feature fusion method, and sub-graph d is the confusion matrix using the method of the present invention.
Detailed Description
In order to make the technical features, objects and advantages of the present invention more comprehensible, one embodiment of the present invention is further described with reference to the accompanying drawings. The examples are given solely for the purpose of illustration and are not to be construed as limitations of the present invention, as numerous insubstantial modifications and adaptations of the invention may be made by those skilled in the art based on the teachings herein.
A bird language identification method based on attention residual error and feature fusion is characterized by comprising the following steps:
s1, collecting the sound of various birds in the natural environment to form a sound training set; marking the sound of the bird species to which the bird belongs, and controlling the time range of each sound to be between 2s and 30s, wherein the sound contains the singing of a single bird; the number of bird sounds screened here is greater than or equal to 200 per bird.
S2, sampling the sound training set in the step S1 by using the same sampling frequency, and then unifying the audio time of the sound training set through the preprocessing operations of framing and windowing;
s3, obtaining feature information through two feature extraction methods, and finally converting the feature information into an energy spectrogram;
s4, constructing a bird language identification classification convolution neural network; inputting the energy spectrogram obtained in the step S3 into the constructed bird language identification classification convolutional neural network for training; the loss function uses a classified cross entropy loss function, an optimization strategy and a hyper-parameter are set for constructing a bird language recognition classification network, the loss function is continuously reduced by carrying out cyclic iterative training on the network until the set iteration times are finished and the training weight parameters are stored; when the training iteration times are reached, the loss function value is not obviously reduced after the training and fitting are performed;
s5, constructing a bird language recognition system based on attention residual error and feature fusion by using the bird language recognition and classification convolutional neural network constructed in the step S3 and the obtained network training weight parameters, performing bird language recognition and classification on the energy spectrogram to be detected by using the detection system, and performing quantity marking and classification on all input bird spectrograms by using the bird language recognition system.
As a specific embodiment of the present invention, step S3 specifically includes the following steps:
s31, processing the preprocessed sound training set by sequentially using a Mel trigonometric filtering algorithm and a cepstrum mean variance normalization method to obtain a vector fraction F; processing the preprocessed sound training set by using a gamma pass filtering algorithm added with noise suppression processing and cepstrum mean variance normalization in sequence to obtain a vector score G; and fusing the two vector fractions to obtain characteristic information f:
f=ωF+(1-ω)G
where ω represents the mixing weight coefficient.
And S32, converting the characteristic information f obtained in the S31 into an energy spectrogram.
S33, performing image enhancement on the obtained energy spectrogram; wherein the image enhancement operation comprises image color random gray scale transformation and image rotation.
As a specific embodiment of the present invention, the bird language identification and classification convolutional neural network constructed in step S3 specifically includes:
The network structure consists, in order, of 3 convolutional layers with 3 × 3 kernels and stride 2, a max pooling layer, an activation function layer, and 48 residual structure layers with horizontal-vertical attention modules; each residual structure layer with a horizontal-vertical attention module comprises convolutional layers, an activation function layer, an average pooling layer, and a batch normalization layer; the network uses a global average pooling operation in its last layer and an activation function layer after every convolution operation.
The energy spectrogram is input into the bird recognition and classification convolutional neural network, and the 3 convolutional layers with 3 × 3 kernels and stride 2 perform a downsampling operation to obtain the feature image Q; the process is expressed as:
Q = F_3×3(F_3×3(F_3×3(f)))
The feature image Q is then processed by the residual structure layer with the horizontal-vertical attention module, which comprises the following parts: a 1 × 1 convolutional layer, a 3 × 3 convolutional layer, a 1 × 1 convolutional layer, a batch normalization layer, an activation function layer, the horizontal-vertical attention module, and a residual connection; the process can be expressed as:
F_out = F_HW(F_1×1(F_3×3(F_1×1(x)))) + x
F_HW is the horizontal-vertical attention module, which is composed of two attention submodules in the vertical and horizontal directions respectively, and is expressed as:
F_HW = F_H + F_W
wherein:
F_H = δ(conv(Avgpool_H(x)))
F_W = δ(conv(Avgpool_W(x)))
where δ denotes the sigmoid function, conv(x) denotes a convolution with a 1 × 1 kernel, and Avgpool denotes an average pooling operation.
Simulation experiment
FIG. 5 shows that the frequency content in sub-graph c (the energy spectrogram obtained with the proposed method) is represented more distinctly than in sub-graphs a and b, indicating that the feature fusion method of the invention enhances the frequency representation to a certain extent and is beneficial to improving the recognition rate.
The recognition rates of the proposed method and the comparison methods are shown in Table 1, where the Feature 1 method uses only the Mel triangular filtering algorithm with a residual network, the Feature 2 method uses only the noise-suppressed Gammatone filtering algorithm with a residual network, and the feature fusion method uses only the feature fusion method with a residual network.
Table 1 Simulation experiment bird sound recognition and classification evaluation indices
Method | Average precision (%) | Average recall (%) | Average F1 (%)
Feature 1 method | 88.96 | 86.06 | 88.06
Feature 2 method | 90.17 | 89.14 | 89.14
Feature fusion method | 93.43 | 90.91 | 92.15
Method of the invention | 93.62 | 90.59 | 92.17
As can be seen from Table 1 and FIG. 6, the feature fusion and attention residual network method improves accuracy and the F1 value to a certain extent, and its classification performance is superior to that of a single feature extraction method with a residual network. The simulation results show that the method of the invention improves the expression of the extracted bird sound features and improves recognition performance without adding extra computational cost to the network.
The content of the method of the present invention has been described above, and those skilled in the art can implement the method based on this description. Other embodiments obtained by a person skilled in the art without inventive effort, based on the above description, shall fall within the scope of protection of the invention.

Claims (4)

1. A bird language identification method and system based on attention residual error and feature fusion, characterized by comprising the following steps:
S1, collecting the sounds of various birds in the natural environment to form a sound training set; labeling each sound with the bird species to which it belongs, and restricting the duration of each sound to between 2 s and 30 s, where each sound contains the song of a single bird;
S2, sampling the sound training set from step S1 at the same sampling frequency, and then unifying the audio duration of the training set through the framing and windowing preprocessing operations;
S3, obtaining feature information through two feature extraction methods, and finally converting the feature information into an energy spectrogram;
S4, constructing the bird language identification and classification convolutional neural network; inputting the energy spectrogram obtained in step S3 into the constructed network for training; the loss function is the categorical cross-entropy loss; an optimization strategy and hyper-parameters are set for the network, and the loss is continuously reduced through iterative training until the set number of iterations is completed and the trained weight parameters are saved;
S5, constructing a bird language recognition system based on attention residual error and feature fusion from the bird language identification and classification convolutional neural network constructed in step S3 and the obtained trained weight parameters; the detection system recognizes and classifies bird sounds, and the bird language recognition system counts and classifies all input bird sounds.
2. The bird language identification method based on attention residual error and feature fusion according to claim 1, characterized in that step S3 comprises the following steps:
S31, processing the preprocessed sound training set sequentially with a Mel triangular filtering algorithm and cepstral mean and variance normalization to obtain a vector score F; processing the preprocessed sound training set sequentially with a Gammatone filtering algorithm incorporating noise suppression and cepstral mean and variance normalization to obtain a vector score G; and fusing the two vector scores to obtain the feature information f:
f = ωF + (1 - ω)G
where ω represents the mixing weight coefficient;
S32, converting the feature information f obtained in S31 into an energy spectrogram;
S33, performing image enhancement on the obtained energy spectrogram, where the image enhancement operations include random color-to-grayscale conversion and image rotation.
3. The bird language identification method based on attention residual error and feature fusion according to claim 1, characterized in that the bird language identification and classification convolutional neural network constructed in step S3 specifically comprises:
a network structure consisting, in order, of 3 convolutional layers with 3 × 3 kernels and stride 2, a max pooling layer, an activation function layer, and 48 residual structure layers with horizontal-vertical attention modules; each residual structure layer with a horizontal-vertical attention module comprises convolutional layers, an activation function layer, an average pooling layer, and a batch normalization layer; the network uses a global average pooling operation in its last layer and an activation function layer after every convolution operation;
the energy spectrogram is input into the bird recognition and classification convolutional neural network, and the 3 convolutional layers with 3 × 3 kernels and stride 2 perform a downsampling operation to obtain the feature image Q; the process is expressed as:
Q = F_3×3(F_3×3(F_3×3(f)))
the feature image Q is then processed by the residual structure layer with the horizontal-vertical attention module, which comprises the following parts: a 1 × 1 convolutional layer, a 3 × 3 convolutional layer, a 1 × 1 convolutional layer, a batch normalization layer, an activation function layer, the horizontal-vertical attention module, and a residual connection; the process can be expressed as:
F_out = F_HW(F_1×1(F_3×3(F_1×1(x)))) + x
F_HW is the horizontal-vertical attention module, which is composed of two attention submodules in the vertical and horizontal directions respectively, and is expressed as:
F_HW = F_H + F_W
wherein:
F_H = δ(conv(Avgpool_H(x)))
F_W = δ(conv(Avgpool_W(x)))
where δ denotes the sigmoid function, conv(x) denotes a convolution with a 1 × 1 kernel, and Avgpool denotes an average pooling operation.
4. A bird language recognition system based on attention residual and feature fusion is characterized by comprising the following modules:
the bird sound acquisition module is configured to acquire a bird sound data set to be processed;
a bird voice recognition model acquisition module configured to form a bird voice recognizer with the bird voice recognition model and the parameter file obtained in the bird language recognition method based on attention residual and feature fusion in claim 1, and to be used for bird voice category recognition and classification;
and the bird counting module is used for counting the number of the obtained bird species.
CN202210570511.1A 2022-05-24 2022-05-24 Bird language identification method and system based on attention residual error and feature fusion Pending CN114863938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210570511.1A CN114863938A (en) 2022-05-24 2022-05-24 Bird language identification method and system based on attention residual error and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210570511.1A CN114863938A (en) 2022-05-24 2022-05-24 Bird language identification method and system based on attention residual error and feature fusion

Publications (1)

Publication Number Publication Date
CN114863938A true CN114863938A (en) 2022-08-05

Family

ID=82638609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210570511.1A Pending CN114863938A (en) 2022-05-24 2022-05-24 Bird language identification method and system based on attention residual error and feature fusion

Country Status (1)

Country Link
CN (1) CN114863938A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206612A (en) * 2023-03-02 2023-06-02 中国科学院半导体研究所 Bird voice recognition method, model training method, device and electronic equipment
CN117275491A (en) * 2023-11-17 2023-12-22 青岛科技大学 Sound classification method based on audio conversion and time diagram neural network
CN117292693A (en) * 2023-11-27 2023-12-26 安徽大学 CRNN rare animal identification and positioning method integrated with self-attention mechanism

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1661123A1 (en) * 2003-08-28 2006-05-31 Wildlife Acoustics, Inc. Method and apparatus for automatically identifying animal species from their vocalizations
US20110112839A1 (en) * 2009-09-03 2011-05-12 Honda Motor Co., Ltd. Command recognition device, command recognition method, and command recognition robot
CN106780911A (en) * 2016-12-30 2017-05-31 西南石油大学 A kind of gate inhibition's voice coding, decoding system and method
CN107393542A (en) * 2017-06-28 2017-11-24 北京林业大学 A kind of birds species identification method based on binary channels neutral net
CN109979441A (en) * 2019-04-03 2019-07-05 中国计量大学 A kind of birds recognition methods based on deep learning
CN110246504A (en) * 2019-05-20 2019-09-17 平安科技(深圳)有限公司 Birds sound identification method, device, computer equipment and storage medium
CN111754988A (en) * 2020-06-23 2020-10-09 南京工程学院 Sound scene classification method based on attention mechanism and double-path depth residual error network
CN113724712A (en) * 2021-08-10 2021-11-30 南京信息工程大学 Bird sound identification method based on multi-feature fusion and combination model
CN113936667A (en) * 2021-09-14 2022-01-14 广州大学 Bird song recognition model training method, recognition method and storage medium
CN114373476A (en) * 2022-01-11 2022-04-19 江西师范大学 Sound scene classification method based on multi-scale residual attention network

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1661123A1 (en) * 2003-08-28 2006-05-31 Wildlife Acoustics, Inc. Method and apparatus for automatically identifying animal species from their vocalizations
US20110112839A1 (en) * 2009-09-03 2011-05-12 Honda Motor Co., Ltd. Command recognition device, command recognition method, and command recognition robot
CN106780911A (en) * 2016-12-30 2017-05-31 西南石油大学 A kind of gate inhibition's voice coding, decoding system and method
CN107393542A (en) * 2017-06-28 2017-11-24 北京林业大学 A kind of birds species identification method based on binary channels neutral net
CN109979441A (en) * 2019-04-03 2019-07-05 中国计量大学 A kind of birds recognition methods based on deep learning
CN110246504A (en) * 2019-05-20 2019-09-17 平安科技(深圳)有限公司 Birds sound identification method, device, computer equipment and storage medium
CN111754988A (en) * 2020-06-23 2020-10-09 南京工程学院 Sound scene classification method based on attention mechanism and double-path depth residual error network
CN113724712A (en) * 2021-08-10 2021-11-30 南京信息工程大学 Bird sound identification method based on multi-feature fusion and combination model
CN113936667A (en) * 2021-09-14 2022-01-14 广州大学 Bird song recognition model training method, recognition method and storage medium
CN114373476A (en) * 2022-01-11 2022-04-19 江西师范大学 Sound scene classification method based on multi-scale residual attention network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KONG TINGLI, ET AL.: "Birdcall Identification and Prediction Based on ResNeSt Model", 2021 IEEE 21st International Conference on Communication Technology (ICCT), 4 January 2022 (2022-01-04) *
谢云澄 (XIE YUNCHENG): "Research and Application of Bird Sound Recognition Based on Deep Learning" (基于深度学习的鸟类声音识别的研究与应用), China Master's Theses Full-text Database (Engineering Science and Technology II), no. 1, 15 January 2022 (2022-01-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206612A (en) * 2023-03-02 2023-06-02 中国科学院半导体研究所 Bird voice recognition method, model training method, device and electronic equipment
CN117275491A (en) * 2023-11-17 2023-12-22 青岛科技大学 Sound classification method based on audio conversion and time diagram neural network
CN117275491B (en) * 2023-11-17 2024-01-30 青岛科技大学 Sound classification method based on audio conversion and time attention seeking neural network
CN117292693A (en) * 2023-11-27 2023-12-26 安徽大学 CRNN rare animal identification and positioning method integrated with self-attention mechanism
CN117292693B (en) * 2023-11-27 2024-02-09 安徽大学 CRNN rare animal identification and positioning method integrated with self-attention mechanism

Similar Documents

Publication Publication Date Title
US20200335086A1 (en) Speech data augmentation
CN107610707B (en) A kind of method for recognizing sound-groove and device
CN109559736B (en) Automatic dubbing method for movie actors based on confrontation network
CN109410917B (en) Voice data classification method based on improved capsule network
CN114863938A (en) Bird language identification method and system based on attention residual error and feature fusion
CN108172238A (en) A kind of voice enhancement algorithm based on multiple convolutional neural networks in speech recognition system
CN105047194B (en) A kind of self study sound spectrograph feature extracting method for speech emotion recognition
CN107221320A (en) Train method, device, equipment and the computer-readable storage medium of acoustic feature extraction model
CN109887484A (en) A kind of speech recognition based on paired-associate learning and phoneme synthesizing method and device
CN111243602A (en) Voiceprint recognition method based on gender, nationality and emotional information
CN111899757B (en) Single-channel voice separation method and system for target speaker extraction
CN110853630B (en) Lightweight speech recognition method facing edge calculation
CN111402928B (en) Attention-based speech emotion state evaluation method, device, medium and equipment
CN110634476B (en) Method and system for rapidly building robust acoustic model
CN113488060B (en) Voiceprint recognition method and system based on variation information bottleneck
CN114678030A (en) Voiceprint identification method and device based on depth residual error network and attention mechanism
CN112232395A (en) Semi-supervised image classification method for generating confrontation network based on joint training
CN112562725A (en) Mixed voice emotion classification method based on spectrogram and capsule network
CN113611293A (en) Mongolian data set expansion method
Jiang et al. Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit.
CN112331232B (en) Voice emotion recognition method combining CGAN spectrogram denoising and bilateral filtering spectrogram enhancement
CN114420151B (en) Speech emotion recognition method based on parallel tensor decomposition convolutional neural network
CN112233668B (en) Voice instruction and identity recognition method based on neural network
CN114818789A (en) Ship radiation noise identification method based on data enhancement
CN113870896A (en) Motion sound false judgment method and device based on time-frequency graph and convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination