CN110085218A - Audio scene recognition method based on feature pyramid network - Google Patents
- Publication number: CN110085218A (application number CN201910233193.8A)
- Authority: CN (China)
- Prior art keywords
- audio
- feature pyramid
- pyramid network
- audio scene
- network model
- Legal status: Pending (status as listed by Google Patents; not a legal conclusion)
Classifications
All under G (Physics) > G10 (Musical instruments; acoustics) > G10L (Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding):
- G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063: Training
- G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
- G10L25/45: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Image Analysis (AREA)
Abstract
An audio scene recognition method based on a feature pyramid network. A feature pyramid network model for audio scene recognition is established and trained on a training set containing audio files of different scene classes together with their scene labels. An audio file to be recognized is read and truncated into segments. Mel features are extracted to obtain a two-dimensional Mel spectrogram for each audio frame; each spectrogram is normalized and forward-propagated through the trained model to obtain prediction probabilities for the different audio scene classes, and the class with the highest probability is output as the prediction for the audio frame corresponding to that spectrogram. Finally, a prediction is made for the entire audio file. The invention makes full use of low-level feature information and improves model performance. It can exploit the growing volume of data available under the current big-data trend, and prediction is fast.
Description
Technical field
The present invention relates to audio scene recognition methods, and more particularly to an audio scene recognition method based on a feature pyramid network.
Background art
Audio scene recognition lets a machine process the data stream of a recorded or uploaded audio file so that, imitating a human listener, it can identify the specific background behind the audio (for example: a park, a street, or a restaurant).
In the field of machine learning, many different models and audio feature representations have been proposed for the scene recognition problem. Research applying neural networks to scene audio appeared as early as 1997. In 1998, Liu et al. used recurrent neural networks (RNNs) and a nearest-neighbor classifier to distinguish five classes of environmental sounds. However, because too many parameters were introduced during training, both of these neural network models were highly complex and performed poorly after training. In the 2013 challenge organized by the IEEE AASP, many participating teams tried traditional machine learning methods, such as Gaussian mixture models (GMMs), support vector machines (SVMs), tree-based methods, and bag-based methods, to distinguish 10 classes of sound scenes. Although these methods have low computational complexity, their relatively simple model structures cannot fully exploit the growing volume of data available under the current big-data trend, so conventional machine learning methods fail to achieve satisfactory audio scene recognition.
In recent years, convolutional neural networks (CNNs) have driven the application of neural networks and deep learning in fields such as pattern recognition. Their ideas of local receptive fields and weight sharing reduce the number of model parameters while still capturing more features, which improves network performance. In 2017, Valenti et al. applied CNNs to audio scene recognition and achieved good results. However, the bottom-up feature extraction process of conventional CNNs cannot effectively use the detailed information in low-level features.
Recently, the computer vision field has proposed a method that uses CNNs to build a feature pyramid, which makes full use of low-level feature information while retaining high-level abstract semantic information.
Summary of the invention
The technical problem to be solved by the invention is to provide an audio scene recognition method based on a feature pyramid network with higher accuracy.
The technical solution adopted by the invention is an audio scene recognition method based on a feature pyramid network, comprising the following steps:
1) Establish a feature pyramid network model for audio scene recognition;
2) Input a training set containing audio files of different scene classes and their corresponding scene labels into the feature pyramid network model for audio scene recognition, and train the model;
3) Read the audio file to be recognized and truncate it;
4) Extract Mel features to obtain a two-dimensional Mel spectrogram for each audio frame, and normalize it;
5) Input each normalized two-dimensional Mel spectrogram into the trained feature pyramid network model and perform forward propagation; the final softmax layer yields, frame by frame, the prediction probabilities for the different audio scene classes, and the scene class with the highest probability is output as the prediction for the audio frame corresponding to that spectrogram;
6) Predict the class of the entire audio file: among the predictions of all audio frames, the audio scene class that occurs most frequently is output as the prediction result for the whole file.
The feature pyramid network model for audio scene recognition in step 1) uses Xception as the backbone of the feature pyramid network. The predictor in the model consists, in order from input to output, of a convolutional layer with a 3 × 3 kernel, a global pooling layer, a fully connected layer, and a softmax layer.
The truncation in step 3) cuts the audio file to be recognized into several signal segments of fixed 10 s duration.
Extracting Mel features in step 4) comprises:
(4.1) framing and windowing each signal segment;
(4.2) passing each resulting audio frame through a Mel filter bank and calculating the energy passing through each Mel filter within each time step of the audio frame; the energies of all Mel filters within one time step form an energy vector, and the energy vectors of all time steps are concatenated to obtain the two-dimensional Mel spectrogram of the corresponding audio frame.
In step (4.2), the energy passing through each Mel filter within each time step of an audio frame is calculated from the following quantities: M is the number of Mel filters, H(k) is the transfer function of a Mel filter, and X(k) is the corresponding FFT amplitude.
Because the method uses neural networks from deep learning, it can make full use of the information carried by the growing volume of data available under the current big-data trend. Since actual use involves only forward propagation, prediction is fast. Compared with conventional CNN methods, the invention makes full use of low-level feature information and can improve model performance without increasing model complexity.
Brief description of the drawings
Fig. 1 is a flowchart of the audio scene recognition method based on a feature pyramid network of the present invention.
Detailed description of the embodiments
The audio scene recognition method based on a feature pyramid network of the invention is described in detail below with reference to an embodiment and the accompanying drawing.
The audio scene recognition method based on a feature pyramid network of the invention comprises the following steps:
1) Establish a feature pyramid network model for audio scene recognition.
The model uses Xception as the backbone of the feature pyramid network. The predictor in the model consists, in order from input to output, of a convolutional layer with a 3 × 3 kernel, a global pooling layer, a fully connected layer, and a softmax layer.
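The predictor described above can be sketched in NumPy as follows. This is an illustrative reading, not the patent's implementation: the text does not say whether "global pooling" is average or max pooling (average is assumed), names no activation function (none is applied), and gives no channel or class counts, so the 64 input channels, 32 conv filters, 10 classes and all function names are hypothetical.

```python
import numpy as np

def conv3x3_same(x, w, b):
    """3x3 convolution (cross-correlation), stride 1, zero 'same' padding.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,) -> (C_out, H, W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # add the contribution of kernel tap (i, j) at every output pixel
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def softmax(z):
    e = np.exp(z - z.max())          # shift by the max for numerical stability
    return e / e.sum()

def predictor_head(feat, p):
    """3x3 conv -> global average pooling -> fully connected -> softmax.
    No activation is named in the text, so none is applied here."""
    h = conv3x3_same(feat, p["w_conv"], p["b_conv"])
    pooled = h.mean(axis=(1, 2))     # global (average) pooling over H and W
    return softmax(p["w_fc"] @ pooled + p["b_fc"])

rng = np.random.default_rng(0)
n_classes = 10                       # hypothetical number of scene classes
p = {
    "w_conv": rng.standard_normal((32, 64, 3, 3)) * 0.05,
    "b_conv": np.zeros(32),
    "w_fc": rng.standard_normal((n_classes, 32)) * 0.1,
    "b_fc": np.zeros(n_classes),
}
probs = predictor_head(rng.standard_normal((64, 16, 16)), p)
print(probs.shape, round(float(probs.sum()), 6))   # (10,) 1.0
```

Global pooling makes the head independent of the spatial size of the pyramid level it is attached to, so the same predictor structure can sit on maps of different resolutions.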
2) Input a training set containing audio files of different scene classes and their corresponding scene labels into the feature pyramid network model for audio scene recognition, and train the model. Once the network has been trained on the training set, prediction involves only forward propagation.
To converge quickly and obtain the best performance, training uses the Adam optimizer with adaptive learning-rate decay.
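The text names Adam with adaptive learning-rate decay but gives no hyperparameters or schedule. A minimal NumPy sketch of the standard Adam update follows, paired with a reduce-on-plateau rule as one possible reading of "adaptive decay". The class names, the decay factor, the patience and the toy objective are all assumptions for illustration.

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer (Kingma and Ba) for a single parameter array."""
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad        # first moment
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2   # second moment
        m_hat = self.m / (1 - self.b1 ** self.t)                 # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

class ReduceOnPlateau:
    """Hypothetical adaptive decay: multiply lr by `factor` once the
    validation loss has failed to improve for more than `patience` epochs."""
    def __init__(self, opt, factor=0.5, patience=2):
        self.opt, self.factor, self.patience = opt, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def update(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.opt.lr *= self.factor
                self.bad_epochs = 0

# Toy usage: minimise f(w) = ||w||^2, whose gradient is 2w.
opt = Adam(lr=0.1)
sched = ReduceOnPlateau(opt)
w = np.array([3.0, -2.0])
for epoch in range(200):
    w = opt.step(w, 2 * w)
    sched.update(float(w @ w))
print(np.round(w, 3), opt.lr)
```

In a real training loop the scheduler would be fed the loss on a held-out validation set after each epoch rather than the training objective used in this toy example.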
3) Read the audio file to be recognized and truncate it: the file is cut into several signal segments of fixed 10 s duration.
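The truncation step can be sketched as below. The text does not say how a final piece shorter than 10 s is handled, so the sketch either zero-pads it or drops it; that choice, the 16 kHz sample rate and the function name are assumptions.

```python
import numpy as np

def truncate_segments(signal, sr, seg_seconds=10.0, pad_last=True):
    """Cut a 1-D signal into consecutive segments of seg_seconds each.
    A shorter final piece is zero-padded (pad_last=True) or dropped
    (pad_last=False); the patent text does not specify this."""
    seg_len = int(round(seg_seconds * sr))
    n_full, rem = divmod(len(signal), seg_len)
    segments = [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]
    if rem and pad_last:
        tail = np.zeros(seg_len, dtype=signal.dtype)
        tail[:rem] = signal[n_full * seg_len:]
        segments.append(tail)
    return segments

sr = 16000                                                 # hypothetical sample rate
audio = np.random.default_rng(0).standard_normal(25 * sr)  # 25 s of noise
segs = truncate_segments(audio, sr)
print(len(segs), segs[0].shape)                            # 3 (160000,)
```

Fixed-length segments give every spectrogram the same time extent, which is what lets a single network with fixed-size layers process them in a batch.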
4) Extract Mel features to obtain a two-dimensional Mel spectrogram for each audio frame, and normalize it.
Extracting Mel features comprises:
(4.1) framing and windowing each signal segment;
(4.2) passing each resulting audio frame through a Mel filter bank and calculating the energy passing through each Mel filter within each time step of the audio frame; the energies of all Mel filters within one time step form an energy vector, and the energy vectors of all time steps are concatenated to obtain the two-dimensional Mel spectrogram of the corresponding audio frame.
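Sub-step (4.1) and the FFT that supplies X(k) can be sketched as follows. The 2048-sample frame and Hamming window follow the specific example later in the text; the 1024-sample hop (50 % overlap), the 16 kHz sample rate, the test tone and the function names are assumptions.

```python
import numpy as np

def frame_and_window(x, frame_len=2048, hop=1024):
    """Split x into overlapping frames and apply a Hamming window.
    frame_len follows the text's example; hop=1024 is an assumed value."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)[None, :]

def magnitude_spectrum(frames, n_fft=2048):
    """FFT amplitude |X(k)| of each frame (n_fft // 2 + 1 non-negative bins)."""
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

sr = 16000                              # hypothetical sample rate
t = np.arange(sr) / sr                  # 1 s of a 1 kHz test tone
x = np.sin(2 * np.pi * 1000 * t)
mag = magnitude_spectrum(frame_and_window(x))
print(mag.shape)                        # (14, 1025)
peak_bin = int(mag[0].argmax())
print(peak_bin * sr / 2048)             # 1000.0
```

The Hamming taper reduces spectral leakage at the frame edges, so the energy of the test tone stays concentrated near its own FFT bin.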
The energy passing through each Mel filter within each time step of an audio frame is calculated from the following quantities: M is the number of Mel filters, H(k) is the transfer function of a Mel filter, and X(k) is the corresponding FFT amplitude.
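The energy formula itself did not survive in this text. A common form consistent with the listed quantities, assumed here rather than taken from the patent, is E(m) = sum over k of |X(k)|^2 · H_m(k) for m = 1, …, M; the original may instead use |X(k)| without squaring. The sketch below builds triangular filters on the HTK mel scale (an assumed design), computes those energies, takes the logarithm as in the specific example, and applies per-spectrogram mean/variance normalization, which is one possible reading of "normalized". Only the 134-filter count comes from the text; the sample rate and function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # HTK-style mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters H_m(k), shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    H = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, centre, right = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for k in range(left, centre):            # rising edge
            H[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):           # falling edge
            H[m - 1, k] = (right - k) / max(right - centre, 1)
    return H

def log_mel_spectrogram(mag, H, eps=1e-10):
    """E[t, m] = sum_k |X_t(k)|^2 H_m(k) (assumed form), then log and
    per-spectrogram mean/variance normalization (also assumed)."""
    energies = (mag ** 2) @ H.T                  # (n_frames, n_mels)
    logmel = np.log(energies + eps)
    return (logmel - logmel.mean()) / (logmel.std() + eps)

sr, n_fft = 16000, 2048                          # hypothetical sample rate
H = mel_filterbank(134, n_fft, sr)               # 134 filters, as in the example
mag = np.abs(np.fft.rfft(np.random.default_rng(0).standard_normal((5, n_fft)), axis=1))
spec = log_mel_spectrogram(mag, H)
print(H.shape, spec.shape)                       # (134, 1025) (5, 134)
```

Each row of the result is the energy vector for one time step, and stacking the rows gives the two-dimensional Mel spectrogram that step (4.2) describes.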
5) Input each normalized two-dimensional Mel spectrogram into the trained feature pyramid network model for audio scene recognition and perform forward propagation. The final softmax layer yields, frame by frame, the prediction probabilities for the different audio scene classes, and the scene class with the highest probability is output as the prediction for the audio frame corresponding to that spectrogram.
6) Predict the class of the entire audio file: among the predictions of all audio frames, the audio scene class that occurs most frequently is output as the prediction result for the whole file.
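Steps 5) and 6) reduce to a per-frame argmax followed by a majority vote, sketched below with the standard library's Counter. The class names and probabilities are made up, and the patent does not specify how ties are broken; here a tie goes to the label seen first, because Counter.most_common orders equal counts by first occurrence.

```python
import numpy as np
from collections import Counter

def frame_predictions(frame_probs):
    """Step 5: index of the highest softmax probability for each frame."""
    return np.asarray(frame_probs).argmax(axis=1).tolist()

def vote(frame_labels):
    """Step 6: most frequent frame-level label for the whole file.
    Ties go to the label seen first (an assumption; the text is silent)."""
    return Counter(frame_labels).most_common(1)[0][0]

scenes = ["park", "street", "restaurant"]       # hypothetical class names
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.6, 0.3],
         [0.5, 0.3, 0.2],
         [0.2, 0.2, 0.6]]
labels = frame_predictions(probs)
print(labels, scenes[vote(labels)])             # [0, 1, 0, 2] park
```

Voting over frames makes the file-level decision robust to a few misclassified frames, at the cost of ignoring how confident each frame's prediction was.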
A specific example is given below:
1. Read the audio signal and truncate it, cutting it into audio segments of fixed 10 s duration;
2. Frame and window the fixed-length signal, with 2048 sampling points per frame and a 2048-point Hamming window;
3. Pass the framed signal through a Mel filter bank for feature extraction and take the logarithm; there are 134 filters, the filter window length is 1704 points, and adjacent frames overlap by 852 points;
4. Normalize the resulting Mel spectrogram;
5. Input the normalized Mel spectrogram into the ASCFPN network and perform forward propagation;
6. Using a voting method, count the predictions of all frames and output the most frequently predicted scene class as the prediction result for the whole audio segment.
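The window and overlap figures fix the time dimension of each spectrogram. The text states both a 2048-sample frame and a 1704-point filter window, and it is not clear which governs the hop, so this sketch uses only the 1704/852 figures (hop = 1704 − 852 = 852 samples) and assumes no padding. The sample rate is not given in the text; 44.1 kHz and 48 kHz are hypothetical values.

```python
def n_frames(sr, seconds=10.0, win=1704, overlap=852):
    """Number of full analysis windows in one segment (no padding assumed)."""
    n_samples = int(round(seconds * sr))
    hop = win - overlap                    # 1704 - 852 = 852 samples
    return 1 + (n_samples - win) // hop

for sr in (44100, 48000):                  # hypothetical sample rates
    print(sr, n_frames(sr))                # prints "44100 516", then "48000 562"
```

Under these assumptions each 10 s segment would yield a spectrogram of roughly 516 to 562 time steps by 134 Mel bands.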
Table 1 Comparison of audio scene recognition algorithms
As shown in Table 1, where ASCFPN is the algorithm proposed by the invention, the accuracy of the ASCFPN-based audio scene recognition method on the same dataset is clearly higher than that of the other two baseline methods, so the proposed method performs better.
Claims (5)
1. An audio scene recognition method based on a feature pyramid network, characterized by comprising the following steps:
1) establishing a feature pyramid network model for audio scene recognition;
2) inputting a training set containing audio files of different scene classes and their corresponding scene labels into the feature pyramid network model for audio scene recognition, and training the model;
3) reading the audio file to be recognized and truncating the audio file;
4) extracting Mel features to obtain a two-dimensional Mel spectrogram for each audio frame, and normalizing it;
5) inputting each normalized two-dimensional Mel spectrogram into the trained feature pyramid network model for audio scene recognition and performing forward propagation, obtaining frame by frame, through the final softmax layer, the prediction probabilities for the different audio scene classes, and outputting the scene class with the highest prediction probability as the prediction for the audio frame corresponding to that two-dimensional Mel spectrogram;
6) predicting the class of the entire audio file to be recognized, namely, among the prediction outputs of all audio frames, outputting the audio scene class with the highest frequency of occurrence as the prediction result for the whole audio file.
2. The audio scene recognition method based on a feature pyramid network according to claim 1, characterized in that the feature pyramid network model for audio scene recognition in step 1) uses Xception as the backbone of the feature pyramid network model, and the predictor in the model consists, in order from input to output, of a convolutional layer with a 3 × 3 kernel, a global pooling layer, a fully connected layer, and a softmax layer.
3. The audio scene recognition method based on a feature pyramid network according to claim 1, characterized in that the truncation in step 3) cuts the audio file to be recognized into several signal segments of fixed 10 s duration.
4. The audio scene recognition method based on a feature pyramid network according to claim 1, characterized in that extracting Mel features in step 4) comprises:
(4.1) framing and windowing each signal segment;
(4.2) passing each resulting audio frame through a Mel filter bank, calculating the energy passing through each Mel filter within each time step of the audio frame, forming an energy vector from the energies of all Mel filters obtained within each time step, and concatenating the energy vectors of all time steps to obtain the two-dimensional Mel spectrogram of the corresponding audio frame.
5. The audio scene recognition method based on a feature pyramid network according to claim 4, characterized in that, in step (4.2), the energy passing through each Mel filter within each time step of an audio frame is calculated from the following quantities: M is the number of Mel filters, H(k) is the transfer function of a Mel filter, and X(k) is the corresponding FFT amplitude.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233193.8A CN110085218A (en) | 2019-03-26 | 2019-03-26 | A kind of audio scene recognition method based on feature pyramid network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110085218A | 2019-08-02 |
Family ID: 67413630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910233193.8A Pending CN110085218A (en) | 2019-03-26 | 2019-03-26 | A kind of audio scene recognition method based on feature pyramid network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110085218A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107392901A (en) * | 2017-07-24 | 2017-11-24 | State Grid Shandong Electric Power Company Information & Telecommunication Company | Method for intelligent automatic identification of transmission line components |
CN108231067A (en) * | 2018-01-13 | 2018-06-29 | Fuzhou University | Sound scene recognition method based on convolutional neural network and random forest classification |
CN109065030A (en) * | 2018-08-01 | 2018-12-21 | Shanghai University | Ambient sound recognition method and system based on convolutional neural network |
CN109448703A (en) * | 2018-11-14 | 2019-03-08 | Shandong Normal University | Audio scene recognition method and system combining deep neural network and topic model |
Non-Patent Citations (4)
Title |
---|
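Basbug A. M.: "Acoustic Scene Classification Using Spatial Pyramid Pooling With Convolutional Neural Networks", 2019 ICSC * |
Tsung-Yi Lin: "Feature Pyramid Networks for Object Detection", 2017 IEEE Conference on Computer Vision and Pattern Recognition * |
Xia Ziqi: "Research on a Deep Classification Model for Sound Scenes Based on Attention Mechanism", China Master's Theses Full-text Database, Information Science and Technology * |
Li Qi: "Research on Audio Scene Recognition Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology * |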
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110718234A (en) * | 2019-09-02 | 2020-01-21 | Jiangsu Normal University | Acoustic scene classification method based on semantic segmentation encoder-decoder network |
CN110782878B (en) * | 2019-10-10 | 2022-04-05 | Tianjin University | Attention mechanism-based multi-scale audio scene recognition method |
CN110782878A (en) * | 2019-10-10 | 2020-02-11 | Tianjin University | Attention mechanism-based multi-scale audio scene recognition method |
CN110796027A (en) * | 2019-10-10 | 2020-02-14 | Tianjin University | Sound scene recognition method based on compact convolutional neural network model |
CN110796027B (en) * | 2019-10-10 | 2023-10-17 | Tianjin University | Sound scene recognition method based on compact convolutional neural network model |
CN111081266A (en) * | 2019-12-18 | 2020-04-28 | Dark Matter AI Technology (Guangzhou) Co., Ltd. | Training of generative adversarial network, and speech enhancement method and system |
CN112721933A (en) * | 2020-07-28 | 2021-04-30 | Yancheng Polytechnic College | Agricultural tractor control terminal based on speech recognition |
CN112721933B (en) * | 2020-07-28 | 2022-01-04 | Yancheng Polytechnic College | Agricultural tractor control terminal based on speech recognition |
CN112201226A (en) * | 2020-09-28 | 2021-01-08 | Fudan University | Sound production mode judging method and system |
CN112201226B (en) * | 2020-09-28 | 2022-09-16 | Fudan University | Sound production mode judging method and system |
CN114117096A (en) * | 2021-11-23 | 2022-03-01 | Tencent Technology (Shenzhen) Co., Ltd. | Multimedia data processing method and related equipment |
CN115602165A (en) * | 2022-09-07 | 2023-01-13 | Hangzhou Youhang Information Technology Co., Ltd. | Digital employee intelligent system based on financial system |
CN115602165B (en) * | 2022-09-07 | 2023-05-05 | Hangzhou Youhang Information Technology Co., Ltd. | Digital employee intelligent system based on financial system |
CN116030800A (en) * | 2023-03-30 | 2023-04-28 | Nanchang Hangtian Guangxin Technology Co., Ltd. | Audio classification recognition method, system, computer and readable storage medium |
CN116186524A (en) * | 2023-05-04 | 2023-05-30 | Tianjin University | Self-supervised machine abnormal sound detection method |
CN116186524B (en) * | 2023-05-04 | 2023-07-18 | Tianjin University | Self-supervised machine abnormal sound detection method |
CN116543795A (en) * | 2023-06-29 | 2023-08-04 | Tianjin University | Sound scene classification method based on multi-modal feature fusion |
CN116543795B (en) * | 2023-06-29 | 2023-08-29 | Tianjin University | Sound scene classification method based on multi-modal feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190802 |