CN114863938A - Bird language identification method and system based on attention residual error and feature fusion - Google Patents
- Publication number
- CN114863938A (application number CN202210570511.1A)
- Authority
- CN
- China
- Prior art keywords: bird, sound, attention, layer, residual error
- Prior art date: 2022-05-24
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L17/00: Speaker identification or verification techniques
- G10L17/02: Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
- G10L17/04: Training, enrolment or model building
- G10L17/18: Artificial neural networks; connectionist approaches
- G10L17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
Abstract
The invention discloses a bird sound identification method and system based on attention residual learning and feature fusion, comprising the following steps: first, the bird sound training set is preprocessed by framing and windowing; next, the preprocessed training set is processed by two feature extraction methods, and the resulting feature information is converted into an energy spectrogram image; in the training stage, a residual network equipped with a horizontal-vertical attention module computes the probability of each bird species under a cross-entropy loss function, yielding a final classification and prediction layer that classifies bird sounds. A bird sound recognition system is also designed and implemented to perform recognition and classification with the proposed method. The method improves recognition accuracy for highly confusable bird sounds, and the test results demonstrate its effectiveness.
Description
Technical Field
The invention relates to the technical field of deep-learning-based bird voiceprint recognition, and in particular to a bird language identification method and system based on attention residual learning and feature fusion.
Background
Birds are an important indicator for evaluating the health of an ecosystem: as a key component of the ecosystem, their presence and migration patterns often serve as early-warning signals for the environmental health of a given area. In recent decades, the protection of avian biodiversity has received growing attention, making bird voiceprint recognition technology increasingly significant. The vocal structures and organs of each bird species differ to some degree, and these biological characteristics cannot be copied; they can therefore be used to identify species, and bird voiceprint technology applies voiceprint recognition to the characteristics specific to each bird species. Current bird voiceprint recognition can be divided, by model type, into traditional methods and deep-learning-based methods. Traditional methods mainly use Gaussian mixture models with maximum likelihood estimation to select the highest-scoring sound class, while deep-learning-based methods train, recognize, and detect with neural network models. Compared with traditional and classical machine learning methods, deep-learning-based methods perform considerably better on bird sound recognition tasks. With the rapid development of artificial intelligence and deep learning, bird voiceprint recognition has broad application prospects in the field of environmental protection.
Document 1 (Lee C H, Han C C, Chuang C C. Automatic Classification of Bird Species From Their Sounds Using Two-Dimensional Cepstral Coefficients [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(8):1541-1550.) uses a single audio feature and improves its expressiveness by extracting dynamic and static components and fusing them, thereby raising identification accuracy. Document 2 (Efremova D B, Sankupellay M, Konovalov D A. Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning [C]. In: Digital Image Computing: Techniques and Applications, 2019, 294-301.) uses a ResNet50 deep convolutional neural network as the model to increase the speed of bird identification. Document 3 (Yang Chunyong, Qi Hongda, Peng Yanqiu, et al. Research on energy spectrograms fused with voiceprint information for bird recognition [J]. Applied Acoustics, 2020, 39(3):453-.) combines LBP and HOG features with a classifier algorithm and additionally uses spectral information from a generative adversarial network for data augmentation, further improving the recognition rate.
Most deep-learning-based recognition models use large convolutional neural networks; although this improves the recognition rate, the growth in parameter count inevitably makes training difficult and detection slow. For feature extraction, a single method is generally used, but a single set of feature parameters cannot fully express all the characteristics of bird sounds during recognition and detection, which imposes certain limitations.
Disclosure of Invention
To address the limitations of single audio features and the large number of network parameters, the invention provides a bird language identification method based on attention residual learning and feature fusion. Two feature extraction methods are used and their outputs are fused into one set of feature information, which is converted into an energy spectrogram. The spectrogram is fed into a bird language identification and classification convolutional neural network, which generates the corresponding feature images by down-sampling; a residual network with a horizontal-vertical attention module then effectively attends to the channel relationships among the feature images, reducing computational cost while improving recognition accuracy.
A bird language identification method based on attention residual and feature fusion, characterized by comprising the following steps:
S1, collecting the sounds of various birds in the natural environment to form a sound training set; marking each sound with the bird species to which it belongs, and controlling the duration of each sound to between 2s and 30s, wherein each sound contains the song of a single bird;
S2, sampling the sound training set of step S1 at the same sampling frequency, and then unifying the audio durations through the preprocessing operations of framing and windowing;
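The framing-and-windowing preprocessing of step S2 can be sketched as follows. The 400-sample frame and 160-sample hop (25 ms and 10 ms at a 16 kHz sampling rate) are illustrative assumptions, since the patent does not fix these values:

```python
import numpy as np

def frame_and_window(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    Assumes len(signal) >= frame_len; frame_len/hop are tuning assumptions.
    """
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Hamming window tapers frame edges to reduce spectral leakage
    return frames * np.hamming(frame_len)
```

For a 0.1 s clip at 16 kHz (1600 samples) this yields 8 windowed frames of 400 samples each.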
s3, obtaining feature information through two feature extraction methods, and finally converting the feature information into an energy spectrogram;
S31, processing the preprocessed sound training set with a Mel triangular filter-bank algorithm followed by cepstral mean-variance normalization to obtain a vector score F; processing the preprocessed sound training set with a Gammatone filtering algorithm augmented with noise suppression, followed by cepstral mean-variance normalization, to obtain a vector score G; and fusing the two vector scores to obtain the feature information f:
f=ωF+(1-ω)G
where ω represents the mixing weight coefficient.
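A minimal sketch of step S31's normalization and fusion: `cmvn` stands in for the cepstral mean-variance normalization applied to each feature matrix, and ω = 0.5 is an illustrative assumption, since the patent leaves the mixing weight unspecified:

```python
import numpy as np

def cmvn(feat):
    # Cepstral mean-variance normalization: zero mean, unit variance
    # per coefficient across all frames (rows are frames).
    return (feat - feat.mean(axis=0)) / (feat.std(axis=0) + 1e-8)

def fuse(F, G, omega=0.5):
    # f = omega * F + (1 - omega) * G, the fusion rule from the patent;
    # omega = 0.5 is only a placeholder value.
    return omega * np.asarray(F) + (1.0 - omega) * np.asarray(G)
```

With ω = 1 the fused result reduces to the Mel-based score F alone; with ω = 0 it reduces to the Gammatone-based score G.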
S32, converting the feature information f obtained in S31 into an energy spectrogram.
S33, performing image enhancement on the energy spectrogram, where the enhancement operations comprise random color-to-grayscale transformation and image rotation.
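Step S32 above converts the fused feature information into an energy spectrogram image. A minimal sketch of one such conversion; the dB scaling, the 80 dB dynamic-range clip, and the 8-bit normalization are illustrative assumptions not fixed by the patent:

```python
import numpy as np

def to_energy_db(f, top_db=80.0):
    """Convert a (frames x coefficients) feature matrix to an 8-bit
    dB-scaled energy image, clipped to a dynamic range of top_db."""
    power = np.abs(f) ** 2
    db = 10.0 * np.log10(np.maximum(power, 1e-10))
    db = np.maximum(db, db.max() - top_db)        # clip dynamic range
    # normalize to [0, 255] so the result can be saved as an image
    img = 255.0 * (db - db.min()) / max(db.max() - db.min(), 1e-10)
    return img.astype(np.uint8)
```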
The constructed bird language identification and classification convolutional neural network is specifically as follows:
the network comprises, in order, 3 convolutional layers with 3×3 kernels and stride 2, a max pooling layer, an activation function layer, and 48 residual structure layers with horizontal-vertical attention modules; each residual structure layer with a horizontal-vertical attention module contains convolutional layers, an activation function layer, an average pooling layer, and a batch normalization layer; the network applies global average pooling at the last layer and an activation function layer after every convolution operation.
Inputting the energy spectrogram into the bird language identification and classification convolutional neural network, the 3 convolutional layers with 3×3 kernels and stride 2 perform down-sampling to obtain the feature image Q; the process is expressed as:
Q = F_{3×3}(F_{3×3}(F_{3×3}(f)))
processing the feature image Q by the residual structure layers with the horizontal-vertical attention module, where each such layer comprises: a 1×1 convolutional layer, a 3×3 convolutional layer, a 1×1 convolutional layer, a batch normalization layer, an activation function layer, the horizontal-vertical attention module, and a residual connection; the process can be expressed as:
F_out = F_HW(F_{1×1}(F_{3×3}(F_{1×1}(x)))) + x
F_HW is the horizontal-vertical attention module, composed of two attention sub-modules acting in the vertical and horizontal directions respectively, and is expressed as:
F_HW = F_H + F_W
wherein δ denotes the sigmoid function, Conv(x) denotes a convolution with a 1×1 kernel, and AvgPool denotes an average pooling operation.
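The patent names only the operations δ (sigmoid), a 1×1 convolution, and average pooling for F_H and F_W without giving their explicit formulas, so the following NumPy sketch is a plausible reconstruction, not the patented implementation: each branch average-pools along one spatial axis, mixes channels with a 1×1-convolution weight matrix, applies a sigmoid, and reweights the feature map; the two branches are summed (F_HW = F_H + F_W) and a residual connection is added:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hw_attention(x, w_h, w_w):
    """Horizontal-vertical attention over a feature map x of shape (C, H, W).

    w_h, w_w: (C, C) weight matrices standing in for 1x1 convolutions.
    """
    pool_h = x.mean(axis=2, keepdims=True)             # AvgPool over W -> (C, H, 1)
    att_h = sigmoid(np.einsum('oc,chw->ohw', w_h, pool_h))
    pool_w = x.mean(axis=1, keepdims=True)             # AvgPool over H -> (C, 1, W)
    att_w = sigmoid(np.einsum('oc,chw->ohw', w_w, pool_w))
    # F_H and F_W each reweight x; their outputs are summed (F_HW = F_H + F_W)
    return x * att_h + x * att_w

def residual_block(x, w_h, w_w):
    # Residual connection around the attention module only; the 1x1/3x3
    # convolutions and batch normalization of the full layer are omitted.
    return hw_attention(x, w_h, w_w) + x
```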
S4, constructing the bird language identification and classification convolutional neural network; inputting the energy spectrograms obtained in step S3 into the constructed network for training. The loss function is the categorical cross-entropy loss; an optimization strategy and hyper-parameters are set, and cyclic iterative training continuously reduces the loss until the set number of iterations is completed, after which the trained weight parameters are saved;
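The categorical cross-entropy loss used in step S4 can be computed as below; this is a standalone sketch of the loss alone, with the network that produces the logits omitted:

```python
import numpy as np

def categorical_cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (N, K), labels are integer ids (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # negative log-probability of the true class, averaged over the batch
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Uniform logits over K classes give a loss of ln K, and the loss approaches 0 as the network becomes confident in the correct species.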
S5, building a bird language recognition system based on attention residual and feature fusion from the convolutional neural network constructed in step S4 and the trained weight parameters; the system performs bird language recognition and classification on the energy spectrograms to be detected, and marks and counts all input bird spectrograms.
The invention also provides a bird language recognition system based on attention residual and feature fusion, comprising the following modules:
a bird sound acquisition module, configured to acquire the bird sound data set to be processed;
a bird sound recognition model acquisition module, configured to build a bird sound recognizer from the recognition model and parameter file obtained by the above bird language identification method based on attention residual and feature fusion, for recognizing and classifying bird sound categories;
and a bird counting module, configured to count the number of each obtained bird species.
Advantageous effects:
1. The invention provides a bird language identification method based on attention residual learning and feature fusion. Because the bird sound data set contains a large number of short original bird song recordings of many species, the invention first preprocesses the audio signals, then converts them into energy spectrograms through audio feature extraction, and finally extracts image features with an attention residual network, which accelerates the training of the recognition and classification network and reduces the number of network parameters.
2. When extracting audio features, different feature extraction methods are adopted; after the features are obtained, cepstral mean-variance normalization and feature warping are applied, reducing channel mismatch and channel effects that may be present in the audio.
3. The invention designs and implements a bird sound recognition system that performs recognition using the bird language identification method based on attention residual and feature fusion.
Drawings
FIG. 1 is an overall diagram of the model structure of the bird language identification method used in an embodiment of the present invention;
FIG. 2 illustrates the attention structure of the bird language identification method according to an embodiment of the present invention, where sub-graph (a) is the overall structure of the attention residual module and sub-graph (b) shows the vertical and horizontal attention sub-structures within the attention module;
FIG. 3 is a schematic flow chart of the bird sound recognition system according to an embodiment of the present invention;
FIG. 4 is a block diagram of the bird sound recognition system according to an embodiment of the present invention;
FIG. 5 compares feature extraction with and without the method of the invention: sub-graph (a) is the energy spectrogram obtained using only the Mel triangular filtering algorithm, sub-graph (b) is the energy spectrogram obtained using only the noise-suppressed Gammatone filtering algorithm, and sub-graph (c) is the energy spectrogram obtained by the feature fusion method of the invention;
FIG. 6 compares confusion matrices: sub-graph (a) uses only the Mel triangular filtering feature extraction method, sub-graph (b) uses only the noise-suppressed Gammatone filtering method, sub-graph (c) uses the feature fusion method, and sub-graph (d) uses the method of the invention.
Detailed Description
In order to make the technical features, objects and advantages of the present invention more comprehensible, one embodiment of the present invention is further described with reference to the accompanying drawings. The examples are given solely for the purpose of illustration and are not to be construed as limitations of the present invention, as numerous insubstantial modifications and adaptations of the invention may be made by those skilled in the art based on the teachings herein.
A bird language identification method based on attention residual error and feature fusion is characterized by comprising the following steps:
S1, collecting the sounds of various birds in the natural environment to form a sound training set; marking each sound with the bird species to which it belongs, and controlling the duration of each sound to between 2s and 30s, wherein each sound contains the song of a single bird; at least 200 recordings are screened for each bird species.
S2, sampling the sound training set in the step S1 by using the same sampling frequency, and then unifying the audio time of the sound training set through the preprocessing operations of framing and windowing;
s3, obtaining feature information through two feature extraction methods, and finally converting the feature information into an energy spectrogram;
S4, constructing the bird language identification and classification convolutional neural network; inputting the energy spectrograms obtained in step S3 into the constructed network for training. The loss function is the categorical cross-entropy loss; an optimization strategy and hyper-parameters are set, and cyclic iterative training continuously reduces the loss until the set number of iterations is completed, after which the trained weight parameters are saved. The number of iterations is chosen such that, by the time it is reached, the loss value no longer decreases appreciably and training has converged;
S5, building a bird language recognition system based on attention residual and feature fusion from the convolutional neural network constructed in step S4 and the trained weight parameters; the system performs bird language recognition and classification on the energy spectrograms to be detected, and marks and counts all input bird spectrograms.
As a specific embodiment of the present invention, step S3 specifically comprises the following steps:
S31, processing the preprocessed sound training set with a Mel triangular filter-bank algorithm followed by cepstral mean-variance normalization to obtain a vector score F; processing the preprocessed sound training set with a Gammatone filtering algorithm augmented with noise suppression, followed by cepstral mean-variance normalization, to obtain a vector score G; and fusing the two vector scores to obtain the feature information f:
f=ωF+(1-ω)G
where ω represents the mixing weight coefficient.
S32, converting the feature information f obtained in S31 into an energy spectrogram.
S33, performing image enhancement on the energy spectrogram, where the enhancement operations comprise random color-to-grayscale transformation and image rotation.
As a specific embodiment of the present invention, the bird language identification and classification convolutional neural network constructed in step S4 is specifically as follows:
the network comprises, in order, 3 convolutional layers with 3×3 kernels and stride 2, a max pooling layer, an activation function layer, and 48 residual structure layers with horizontal-vertical attention modules; each residual structure layer with a horizontal-vertical attention module contains convolutional layers, an activation function layer, an average pooling layer, and a batch normalization layer; the network applies global average pooling at the last layer and an activation function layer after every convolution operation.
Inputting the energy spectrogram into the bird language identification and classification convolutional neural network, the 3 convolutional layers with 3×3 kernels and stride 2 perform down-sampling to obtain the feature image Q; the process is expressed as:
Q = F_{3×3}(F_{3×3}(F_{3×3}(f)))
processing the feature image Q by the residual structure layers with the horizontal-vertical attention module, where each such layer comprises: a 1×1 convolutional layer, a 3×3 convolutional layer, a 1×1 convolutional layer, a batch normalization layer, an activation function layer, the horizontal-vertical attention module, and a residual connection; the process can be expressed as:
F_out = F_HW(F_{1×1}(F_{3×3}(F_{1×1}(x)))) + x
F_HW is the horizontal-vertical attention module, composed of two attention sub-modules acting in the vertical and horizontal directions respectively, and is expressed as:
F_HW = F_H + F_W
wherein δ denotes the sigmoid function, Conv(x) denotes a convolution with a 1×1 kernel, and AvgPool denotes an average pooling operation.
Simulation experiment
Fig. 5 shows that the frequency content in energy spectrogram (c), produced by the proposed method, is more distinct than in sub-graphs (a) and (b), indicating that the feature fusion method of the invention enhances the frequency representation and benefits the recognition rate.
The recognition rates of the proposed method and the comparison methods are shown in Table 1, where the Feature 1 method uses only the Mel triangular filtering algorithm with the residual network, the Feature 2 method uses only the noise-suppressed Gammatone filtering algorithm with the residual network, and the feature fusion method uses the fused features with the residual network (without the attention module).
Table 1. Bird sound recognition and classification evaluation indices in the simulation experiment

Method | Average precision (%) | Average recall (%) | Average F1 (%)
---|---|---|---
Feature 1 method | 88.96 | 86.06 | 88.06
Feature 2 method | 90.17 | 89.14 | 89.14
Feature fusion method | 93.43 | 90.91 | 92.15
Method of the invention | 93.62 | 90.59 | 92.17
As can be seen from Table 1 and Fig. 6, the feature fusion and attention residual network method improves accuracy and the F1 score, and its classification performance is superior to any single feature extraction method combined with a residual network. The simulation results show that the method of the invention improves the expressiveness of the extracted bird sound features and markedly improves recognition performance without adding extra computational cost to the network.
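The average precision, recall, and F1 columns of Table 1 can be recovered from confusion matrices like those in Fig. 6 by macro-averaging over classes; a minimal sketch:

```python
import numpy as np

def macro_metrics(conf):
    """Macro-averaged precision, recall, F1 from a confusion matrix
    where conf[i, j] counts samples of true class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    precision = tp / np.maximum(conf.sum(axis=0), 1e-12)  # per predicted class
    recall = tp / np.maximum(conf.sum(axis=1), 1e-12)     # per true class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision.mean(), recall.mean(), f1.mean()
```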
The method of the invention has been described above, and those skilled in the art can implement it on the basis of this description. Other embodiments obtained by a person skilled in the art without inventive effort on the basis of the above disclosure shall fall within the scope of protection of the invention.
Claims (4)
1. A bird language identification method based on attention residual and feature fusion, characterized by comprising the following steps:
S1, collecting the sounds of various birds in the natural environment to form a sound training set; marking each sound with the bird species to which it belongs, and controlling the duration of each sound to between 2s and 30s, wherein each sound contains the song of a single bird;
S2, sampling the sound training set of step S1 at the same sampling frequency, and then unifying the audio durations through the preprocessing operations of framing and windowing;
S3, obtaining feature information through two feature extraction methods, and converting the feature information into an energy spectrogram;
S4, constructing a bird language identification and classification convolutional neural network; inputting the energy spectrogram obtained in step S3 into the constructed network for training, wherein the loss function is the categorical cross-entropy loss; setting an optimization strategy and hyper-parameters, and performing cyclic iterative training so that the loss decreases continuously until the set number of iterations is completed, whereupon the trained weight parameters are saved;
S5, building a bird language recognition system based on attention residual and feature fusion from the convolutional neural network constructed in step S4 and the trained weight parameters, using the system to recognize and classify bird sounds, and marking and counting all input bird sounds.
2. The method for identifying bird language based on attention residual error and feature fusion according to claim 1, wherein the step S3 comprises the following steps:
S31, processing the preprocessed sound training set with a Mel triangular filter-bank algorithm followed by cepstral mean-variance normalization to obtain a vector score F; processing the preprocessed sound training set with a Gammatone filtering algorithm augmented with noise suppression, followed by cepstral mean-variance normalization, to obtain a vector score G; and fusing the two vector scores to obtain the feature information f:
f=ωF+(1-ω)G
where ω represents the mixing weight coefficient.
S32, converting the feature information f obtained in S31 into an energy spectrogram.
S33, performing image enhancement on the energy spectrogram, where the enhancement operations comprise random color-to-grayscale transformation and image rotation.
3. The bird language identification method based on attention residual and feature fusion according to claim 1, wherein the bird language identification and classification convolutional neural network constructed in step S4 is specifically as follows:
the network comprises, in order, 3 convolutional layers with 3×3 kernels and stride 2, a max pooling layer, an activation function layer, and 48 residual structure layers with horizontal-vertical attention modules; each residual structure layer with a horizontal-vertical attention module contains convolutional layers, an activation function layer, an average pooling layer, and a batch normalization layer; the network applies global average pooling at the last layer and an activation function layer after every convolution operation.
Inputting the energy spectrogram into the bird recognition classification convolutional neural network, and performing down-sampling operation on 3 convolutional layers with 3 × 3 step lengths of 2 to obtain a characteristic image Q, wherein the process is represented as follows:
Q = F_3×3(F_3×3(F_3×3(f)))
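The three stacked stride-2 convolutions above each halve the spatial resolution. A single-channel sketch of this stem, with a placeholder kernel standing in for the learned weights (the real network uses multi-channel learned filters), could look like:

```python
import numpy as np

def conv3x3_s2(x, k):
    """3x3 convolution, stride 2, zero-padding 1, single channel (illustrative only)."""
    xp = np.pad(x, 1)                       # zero-pad by 1 on every side
    H = (x.shape[0] + 1) // 2               # output height after stride-2 conv
    W = (x.shape[1] + 1) // 2               # output width after stride-2 conv
    y = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            y[i, j] = np.sum(xp[2*i:2*i+3, 2*j:2*j+3] * k)
    return y

spec = np.random.rand(64, 64)               # stand-in for the energy spectrogram f
k = np.ones((3, 3)) / 9.0                   # placeholder kernel (learned in practice)
Q = conv3x3_s2(conv3x3_s2(conv3x3_s2(spec, k), k), k)
print(Q.shape)  # (8, 8): 64 -> 32 -> 16 -> 8
```

Each application of F_3×3 with stride 2 halves both spatial dimensions, so a 64×64 spectrogram yields an 8×8 feature image Q after the three layers.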
The feature image Q is then processed by the residual-structure layers with horizontal-vertical attention modules. Each such layer comprises a 1×1 convolutional layer, a 3×3 convolutional layer, a 1×1 convolutional layer, a batch-normalization layer, an activation-function layer, a horizontal-vertical attention module and a residual connection; the process can be expressed as:
F_out = F_HW(F_1×1(F_3×3(F_1×1(x)))) + x
F_HW is the horizontal-vertical attention module, composed of two attention submodules acting in the vertical and horizontal directions respectively, and is expressed as:
F_HW = F_H + F_W
where δ denotes the sigmoid function, Conv(x) denotes a convolution with a 1×1 kernel, and AvgPool denotes an average-pooling operation.
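The residual block and its attention branch can be sketched as below. The patent's exact F_H and F_W formulas are not reproduced in this excerpt, so the gating here (sigmoid of per-row and per-column average pooling) is a plausible coordinate-attention-style reading of the stated components (δ, 1×1 Conv, AvgPool), not the claimed formulation; the convolutions and batch normalization are replaced by identity stand-ins.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hw_attention(x):
    """Horizontal-vertical attention sketch: gate x with sigmoid maps from
    per-row and per-column average pooling (assumed form, not the patent's)."""
    h_pool = x.mean(axis=1, keepdims=True)   # AvgPool along width  -> shape (H, 1)
    w_pool = x.mean(axis=0, keepdims=True)   # AvgPool along height -> shape (1, W)
    F_H = sigmoid(h_pool) * x                # vertical-direction submodule
    F_W = sigmoid(w_pool) * x                # horizontal-direction submodule
    return F_H + F_W                         # F_HW = F_H + F_W

def residual_block(x):
    """F_out = F_HW(F_1x1(F_3x3(F_1x1(x)))) + x, with the 1x1/3x3 convolutions
    and batch normalization omitted as identity stand-ins."""
    branch = hw_attention(x)
    return branch + x                        # residual connection

Q = np.random.rand(8, 8)
out = residual_block(Q)
print(out.shape)  # (8, 8)
```

The additive residual connection keeps the block's output shape identical to its input, which is what lets 48 such layers be stacked without further down-sampling.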
4. A bird language recognition system based on attention residual error and feature fusion, characterized by comprising the following modules:
the bird sound acquisition module is configured to acquire a bird sound data set to be processed;
a bird sound recognition model acquisition module, configured to form a bird sound recognizer from the bird sound recognition model and the parameter file obtained by the bird language identification method based on attention residual error and feature fusion of claim 1, for bird sound category recognition and classification;
and a bird counting module, configured to count the number of identified bird species.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210570511.1A CN114863938A (en) | 2022-05-24 | 2022-05-24 | Bird language identification method and system based on attention residual error and feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114863938A true CN114863938A (en) | 2022-08-05 |
Family
ID=82638609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210570511.1A Pending CN114863938A (en) | 2022-05-24 | 2022-05-24 | Bird language identification method and system based on attention residual error and feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114863938A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1661123A1 (en) * | 2003-08-28 | 2006-05-31 | Wildlife Acoustics, Inc. | Method and apparatus for automatically identifying animal species from their vocalizations |
US20110112839A1 (en) * | 2009-09-03 | 2011-05-12 | Honda Motor Co., Ltd. | Command recognition device, command recognition method, and command recognition robot |
CN106780911A (en) * | 2016-12-30 | 2017-05-31 | 西南石油大学 | A kind of gate inhibition's voice coding, decoding system and method |
CN107393542A (en) * | 2017-06-28 | 2017-11-24 | 北京林业大学 | A kind of birds species identification method based on binary channels neutral net |
CN109979441A (en) * | 2019-04-03 | 2019-07-05 | 中国计量大学 | A kind of birds recognition methods based on deep learning |
CN110246504A (en) * | 2019-05-20 | 2019-09-17 | 平安科技(深圳)有限公司 | Birds sound identification method, device, computer equipment and storage medium |
CN111754988A (en) * | 2020-06-23 | 2020-10-09 | 南京工程学院 | Sound scene classification method based on attention mechanism and double-path depth residual error network |
CN113724712A (en) * | 2021-08-10 | 2021-11-30 | 南京信息工程大学 | Bird sound identification method based on multi-feature fusion and combination model |
CN113936667A (en) * | 2021-09-14 | 2022-01-14 | 广州大学 | Bird song recognition model training method, recognition method and storage medium |
CN114373476A (en) * | 2022-01-11 | 2022-04-19 | 江西师范大学 | Sound scene classification method based on multi-scale residual attention network |
Non-Patent Citations (2)
Title |
---|
KONG TINGLI, ET AL.: "Birdcall Identification and Prediction Based on ResNeSt Model", 2021 IEEE 21st International Conference on Communication Technology (ICCT), 4 January 2022 (2022-01-04) * |
XIE YUNCHENG: "Research and Application of Bird Sound Recognition Based on Deep Learning", China Master's Theses Full-text Database (Engineering Science and Technology II), no. 1, 15 January 2022 (2022-01-15) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116206612A (en) * | 2023-03-02 | 2023-06-02 | 中国科学院半导体研究所 | Bird voice recognition method, model training method, device and electronic equipment |
CN117275491A (en) * | 2023-11-17 | 2023-12-22 | 青岛科技大学 | Sound classification method based on audio conversion and time diagram neural network |
CN117275491B (en) * | 2023-11-17 | 2024-01-30 | 青岛科技大学 | Sound classification method based on audio conversion and time attention seeking neural network |
CN117292693A (en) * | 2023-11-27 | 2023-12-26 | 安徽大学 | CRNN rare animal identification and positioning method integrated with self-attention mechanism |
CN117292693B (en) * | 2023-11-27 | 2024-02-09 | 安徽大学 | CRNN rare animal identification and positioning method integrated with self-attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||