CN108122562A - An audio classification method based on a convolutional neural network and a random forest - Google Patents
An audio classification method based on a convolutional neural network and a random forest
- Publication number
- CN108122562A CN108122562A CN201810037337.8A CN201810037337A CN108122562A CN 108122562 A CN108122562 A CN 108122562A CN 201810037337 A CN201810037337 A CN 201810037337A CN 108122562 A CN108122562 A CN 108122562A
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural networks
- audio
- spectrogram
- random forest
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013527 convolutional neural network Methods 0.000 title claims abstract description 33
- 238000000034 method Methods 0.000 title claims abstract description 28
- 238000007637 random forest analysis Methods 0.000 title claims abstract description 23
- 238000000605 extraction Methods 0.000 claims abstract description 17
- 238000010183 spectrum analysis Methods 0.000 claims abstract description 9
- 239000000284 extract Substances 0.000 claims abstract description 5
- 238000009432 framing Methods 0.000 claims abstract description 4
- 230000009466 transformation Effects 0.000 claims abstract description 3
- 238000013480 data collection Methods 0.000 claims abstract 2
- 238000012549 training Methods 0.000 claims description 16
- 238000003066 decision tree Methods 0.000 claims description 6
- 239000012535 impurity Substances 0.000 claims description 2
- 238000001228 spectrum Methods 0.000 claims description 2
- 230000007935 neutral effect Effects 0.000 claims 1
- 238000012360 testing method Methods 0.000 abstract description 9
- 238000010276 construction Methods 0.000 abstract description 3
- 230000011218 segmentation Effects 0.000 abstract description 2
- 238000005070 sampling Methods 0.000 description 9
- 238000013135 deep learning Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000010606 normalization Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 2
- 210000004205 output neuron Anatomy 0.000 description 2
- 238000012706 support-vector machine Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000003475 lamination Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an audio classification method based on a convolutional neural network and a random forest. The method includes: S1: performing spectrum analysis on the original audio data set, including segmentation, framing, windowing and Fourier transform, to obtain the spectrograms corresponding to the original audio files; S2: training a convolutional neural network feature extractor with the obtained spectrograms as input; S3: removing the softmax layer of the convolutional neural network and extracting the high-level features of the spectrograms; S4: training a random forest classifier with the extracted high-level spectrogram features; S5: performing audio classification with the trained random forest, based on the high-level features extracted by the convolutional neural network. The invention performs feature extraction with a convolutional neural network, avoiding the complicated process of manually constructing and extracting features. At the same time, to address the insufficient generalization ability caused by using softmax as the classifier of the convolutional neural network, the softmax layer is replaced by a random forest, which serves as the final classifier. Higher accuracy and recall are achieved in testing.
Description
Technical field
The invention belongs to the field of machine learning and relates to an audio classification method based on a convolutional neural network and a random forest.
Background technology
The development of the Internet and multimedia technology has flooded our lives with audio, especially on music websites, which hold enormous numbers of audio files in many styles. Faced with such massive amounts of audio, audio retrieval helps us find the required audio files quickly and accurately. Audio classification is the prerequisite of audio retrieval, but manually classifying large numbers of audio files is time-consuming and laborious, and as listeners tire, the accuracy of manual classification drops. For large collections of audio files, fast and accurate automatic classification is therefore necessary. There has been considerable research on audio classification. One example is a two-stage method combining hidden Markov models and support vector machines: a hidden Markov model first performs a preliminary classification to determine the two most likely classes, and the corresponding support vector machine classifier then makes the final decision. Other methods classify audio according to the similarity between audio contents, representing each audio file by its pitch set and classifying with an LDA topic model. Still others use Gaussian mixture models, decision trees and similar models as classifiers. However, these methods mostly construct features manually in the traditional way, which is cumbersome, and the extracted features are insufficient. In addition, using a single classifier leads to weak model generalization.
In recent years, deep learning has become increasingly popular. Its architectures contain many hidden layers and combine low-level features to form more abstract high-level representations of attributes or features, discovering distributed feature representations of the data; this works better than traditional manual feature construction. Given this situation and the problems above, it is necessary to design an audio classification method based on deep learning.
Summary of the invention
The technical problem to be solved by the invention is to provide an audio classification method based on a convolutional neural network and a random forest. The method automatically extracts high-level features with a convolutional neural network and uses a random forest to overcome the weak generalization of a single classifier, achieving higher accuracy and recall.
The technical solution of the invention is as follows:
An audio classification method based on a convolutional neural network and a random forest comprises the following steps.
Step 1: Perform spectrum analysis on the original audio files to obtain the corresponding spectrograms. Because audio files are often long, a spectrogram computed directly over a whole file is too large and makes later model training consume too many system resources. Each original audio file is therefore split into segments of appropriate length, and spectrum analysis, including framing, short-time windowing and Fourier transform, is applied to each segment.
The processes such as conversion.Assuming thatIt is a long sequence,It is the window function that length is N, usesTo addingAdding window obtains N
Point sequence, i.e.,
Have on frequency domain
The formula of Short Time Fourier Transform is as follows:
WhereinFor original signal,For window function.By spectrum analysis, the corresponding spectrogram of audio has been obtained.
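The following is a minimal Python sketch of this step, splitting an audio file into equal-length segments and saving a spectrogram image for each. The use of librosa and matplotlib, and the frame length, hop size and segment count, are illustrative assumptions and are not prescribed by the patent.

```python
# Sketch of step 1: segment an audio file and save a spectrogram per segment.
# librosa/matplotlib and all parameter values here are illustrative assumptions.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

def audio_to_spectrograms(path, n_segments=6, n_fft=2048, hop_length=512):
    y, sr = librosa.load(path, sr=None)          # load audio at its native rate
    seg_len = len(y) // n_segments               # equal-length segments
    for i in range(n_segments):
        seg = y[i * seg_len:(i + 1) * seg_len]
        # framing + windowing + FFT are performed internally by the STFT
        S = np.abs(librosa.stft(seg, n_fft=n_fft, hop_length=hop_length))
        S_db = librosa.amplitude_to_db(S, ref=np.max)
        plt.figure(figsize=(3, 3))
        librosa.display.specshow(S_db, sr=sr, hop_length=hop_length)
        plt.axis('off')
        plt.savefig(f'{path}_seg{i}.png', bbox_inches='tight', pad_inches=0)
        plt.close()
```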
Step 2: Use the spectrograms obtained in step 1 as the training set to train an improved convolutional neural network. The network has 14 layers, including convolutional layers, down-sampling layers, a Dropout layer, a Flatten layer, fully connected layers, a Batch Normalization layer and a softmax layer, and uses cross entropy as the loss function. The layers are described as follows (a code sketch of this architecture follows the layer list):
Input: a spectrogram of size 248*248;
Layer 1: convolutional layer, 64 kernels of size (5,5), strides=1, output feature map size (244,244);
Layer 2: down-sampling layer, kernel size (2,2), output feature map size (122,122);
Layer 3: convolutional layer, 128 kernels of size (3,3), strides=2, output feature map size (60,60);
Layer 4: down-sampling layer, kernel size (2,2), output feature map size (30,30);
Layer 5: convolutional layer, 256 kernels of size (3,3), strides=2, output feature map size (14,14);
Layer 6: down-sampling layer, kernel size (2,2), output feature map size (7,7);
Layer 7: convolutional layer, 512 kernels of size (2,2), strides=1, output feature map size (6,6);
Layer 8: down-sampling layer, kernel size (2,2), output feature map size (3,3);
Layer 9: Dropout layer, dropout=0.5, which disables neurons with a fixed probability during training to prevent overfitting;
Layer 10: Flatten layer, which flattens the multidimensional data into one dimension as a transition to the fully connected layers;
Layer 11: fully connected layer with 128 output neurons;
Layer 12: Batch Normalization layer, which normalizes its input while preserving the expressive power of the model;
Layer 13: fully connected layer with 9 output neurons, because the data set used has 9 classes;
Layer 14: softmax layer, the final probability-distribution output of the classifier, where each value represents the probability of one class.
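Below is a minimal Keras sketch of this architecture, given only as an illustration: the patent does not name a framework, and the ReLU activations, 'valid' padding and Adam optimizer used here are assumptions. With 'valid' padding the feature-map sizes match the layer list above.

```python
# Minimal sketch of the 14-layer network described above (assumptions: grayscale
# input, ReLU activations, 'valid' padding, Adam optimizer, one-hot labels).
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(num_classes=9):
    model = keras.Sequential([
        keras.Input(shape=(248, 248, 1)),                          # 248*248 spectrogram
        layers.Conv2D(64, (5, 5), strides=1, activation='relu'),   # -> (244, 244, 64)
        layers.MaxPooling2D((2, 2)),                               # -> (122, 122, 64)
        layers.Conv2D(128, (3, 3), strides=2, activation='relu'),  # -> (60, 60, 128)
        layers.MaxPooling2D((2, 2)),                               # -> (30, 30, 128)
        layers.Conv2D(256, (3, 3), strides=2, activation='relu'),  # -> (14, 14, 256)
        layers.MaxPooling2D((2, 2)),                               # -> (7, 7, 256)
        layers.Conv2D(512, (2, 2), strides=1, activation='relu'),  # -> (6, 6, 512)
        layers.MaxPooling2D((2, 2)),                               # -> (3, 3, 512)
        layers.Dropout(0.5),                                       # Layer 9
        layers.Flatten(),                                          # Layer 10
        layers.Dense(128),                                         # Layer 11
        layers.BatchNormalization(),                               # Layer 12
        layers.Dense(num_classes),                                 # Layer 13: 9 classes
        layers.Softmax(),                                          # Layer 14
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',                 # cross-entropy loss
                  metrics=['accuracy'])
    return model
```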
Step 3: Remove the softmax layer of the convolutional neural network trained in step 2 and take the output of the last fully connected layer as the high-level feature of each spectrogram.
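Continuing the Keras sketch above (the framework itself is an assumption), dropping the softmax layer amounts to building a second model that stops at the last fully connected layer:

```python
# Sketch of step 3: reuse the trained CNN without its softmax layer, so that the
# output of the last fully connected layer (9 values per spectrogram) becomes the
# high-level feature vector. X_train is assumed to hold the spectrograms with
# shape (num_samples, 248, 248, 1).
from tensorflow import keras

cnn = build_cnn(num_classes=9)
# ... train cnn on the spectrogram training set here ...

feature_extractor = keras.Model(inputs=cnn.inputs,
                                outputs=cnn.layers[-2].output)  # drop the Softmax layer

# features = feature_extractor.predict(X_train)   # shape (num_samples, 9)
```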
Step 4: Train a random forest classifier with the high-level features extracted in step 3, using Gini impurity as the criterion for feature selection in the decision trees. The algorithm is described as follows:
Input: sample set D = {(x1, y1), (x2, y2), ..., (xm, ym)}, number of weak-classifier iterations T;
Output: the final strong classifier f(x);
For t = 1, 2, ..., T:
a) Perform the t-th round of random sampling on the original data set, drawing m samples in total to obtain the sample set Dt;
b) Build the t-th decision tree Gt(x) from Dt: at each split, randomly select a subset of all sample features and then choose the best feature among them to divide the node into left and right subtrees.
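As an illustration (scikit-learn is an assumption; the patent does not name a library), this training step might look like the following sketch. scikit-learn's RandomForestClassifier already performs the per-tree random sampling and the random feature-subset selection at each split described in a) and b).

```python
# Sketch of step 4: train a random forest on the high-level features extracted
# by the CNN. Gini impurity is the split criterion; n_estimators=100 is an
# assumed value (the embodiment below searches over it).
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, criterion='gini', random_state=0)
rf.fit(features_train, labels_train)   # features_train: array of shape (n_samples, 9)
```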
Step 5: Apply the spectrum analysis of step 1 to the audio to be classified to obtain its spectrograms, extract their high-level features with the convolutional neural network whose softmax layer was removed in step 3, and feed the extracted features into the random forest classifier trained in step 4 for audio classification; the class receiving the most votes from the T weak learners is taken as the final class.
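Tying the sketches above together, classifying a new audio file might look as follows. load_spectrogram_array is a hypothetical helper that reads a saved spectrogram image back as a (1, 248, 248, 1) grayscale array, and the majority vote over segments is an assumption, since the patent only specifies voting over the trees (which RandomForestClassifier.predict already performs).

```python
# Sketch of step 5: spectrogram -> CNN high-level features -> random forest vote.
def classify_audio(path):
    audio_to_spectrograms(path, n_segments=6)             # step 1: segments + spectrograms
    segment_labels = []
    for i in range(6):
        x = load_spectrogram_array(f'{path}_seg{i}.png')  # hypothetical helper
        feat = feature_extractor.predict(x)               # step 3: shape (1, 9)
        segment_labels.append(rf.predict(feat)[0])        # step 4: tree majority vote
    # combine the per-segment predictions into one file-level label (assumption)
    return max(set(segment_labels), key=segment_labels.count)
```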
Based on deep learning, the invention proposes an audio classification method that uses a hybrid model combining a convolutional neural network and a random forest. To address the insufficient feature extraction of conventional models, the invention converts audio into spectrograms and then extracts their high-level features with a convolutional neural network, fully exploiting the powerful image feature extraction ability of convolutional neural networks and simplifying the complicated process of feature extraction. To address the weak generalization ability of a single classifier, a random forest model is adopted, fully exploiting the advantage of random forest ensemble learning: multiple decision trees are built for classification, compensating for the deficiency of a single classifier. The classification results show that the invention achieves higher accuracy and recall.
Description of the drawings
Fig. 1 is a flow chart of the audio classification method based on a convolutional neural network and a random forest according to the invention.
Fig. 2 shows a spectrogram obtained after spectrum analysis.
Fig. 3 is a flow chart of high-level feature extraction with the improved convolutional neural network.
Specific embodiment
The specific implementation of the invention is further described below with reference to the accompanying drawings and an embodiment. The following embodiment is only used to illustrate the invention and does not limit its scope.
Embodiment 1 is an example of the invention. The "GTZAN Genre Collection" is used as the data set, and the audio files of nine different genres are used as the training and test sets. The nine classes are: Blues, Classical, Country, Disco, Jazz, Metal, Pop, Reggae and Rock.
1. Each audio file is divided into 6 segments of equal length, and every segment carries the same label as the file. Framing, windowing and Fourier transform are applied to each segment to obtain its spectrogram; Fig. 2 shows one of the spectrograms obtained. Each spectrogram is read in, converted to a grayscale image and resized to 248*248, and the pixel values of the adjusted image are saved in an array as one sample of the convolutional neural network data set. These operations yield the data set D (5400, 248, 248), meaning 5400 spectrograms, each 248 wide and 248 high. The data set is divided into a training set and a test set, with 80% used for training and 20% for testing, giving the training set T (4320, 248, 248) and the test set V (1080, 248, 248). A code sketch of this preprocessing follows.
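The sketch below assumes the spectrogram images have already been written to a spectrograms/<genre>/ directory layout (an assumption); OpenCV and scikit-learn's train_test_split are likewise illustrative choices, and scaling the pixel values to [0, 1] is an assumption.

```python
# Sketch of embodiment step 1: read spectrogram images, convert to grayscale,
# resize to 248*248, and split 80/20 into training and test sets.
import glob
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

images, labels = [], []
for genre_id, genre in enumerate(['blues', 'classical', 'country', 'disco',
                                  'jazz', 'metal', 'pop', 'reggae', 'rock']):
    for f in glob.glob(f'spectrograms/{genre}/*.png'):   # assumed directory layout
        img = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        images.append(cv2.resize(img, (248, 248)))
        labels.append(genre_id)

D = np.array(images, dtype=np.float32) / 255.0           # (5400, 248, 248)
y = np.array(labels)
T, V, y_train, y_test = train_test_split(D, y, test_size=0.2, random_state=0)
# T: (4320, 248, 248), V: (1080, 248, 248)
```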
2. The convolutional neural network model is trained with the training set T (4320, 248, 248). The network has 14 layers in total, including convolutional layers, down-sampling layers, fully connected layers, a Dropout layer, a Batch Normalization layer, etc.
3. After the convolutional neural network has been trained, its last softmax layer is removed. The trained network is then used to extract deeper features from the spectrograms: the original training set T (4320, 248, 248) composed of spectrograms is reconstructed into the new training set T' (4320, 9), and the original test set V (1080, 248, 248) composed of spectrograms is reconstructed into the new test set V' (1080, 9).
4. The random forest is trained with the new training set T' and tested with the new test set V', serving as the final classifier. Different parameter combinations are tried over the following grid (a sketch of the search is given after the table):
Parameter | Values |
---|---|
n_estimators | [10, 50, 100] |
min_samples_split | [2, 3, 4] |
min_samples_leaf | [1, 2, 3] |
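The search over this grid might be carried out with scikit-learn's GridSearchCV as sketched below; the library, the 5-fold cross-validation and the variable names are assumptions.

```python
# Sketch of the parameter search in embodiment step 4 over the grid above.
# T_prime is the (4320, 9) feature matrix from step 3, y_train its labels.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [10, 50, 100],
    'min_samples_split': [2, 3, 4],
    'min_samples_leaf': [1, 2, 3],
}
search = GridSearchCV(RandomForestClassifier(criterion='gini', random_state=0),
                      param_grid, cv=5)       # cv=5 is an assumed choice
search.fit(T_prime, y_train)
print(search.best_params_)                    # best combination on the grid
```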
The search selects the optimal combination n_estimators: 100, min_samples_split: 3, min_samples_leaf: 1. After the random forest has been trained with these parameters, it is tested on the test set, with the following results:
Classes | Precision | Recall | F1-score | support |
0 | 0.80 | 0.74 | 0.77 | 118 |
1 | 0.89 | 0.92 | 0.90 | 133 |
2 | 0.75 | 0.80 | 0.78 | 117 |
3 | 0.75 | 0.83 | 0.79 | 118 |
4 | 0.93 | 0.88 | 0.90 | 134 |
5 | 0.94 | 0.90 | 0.92 | 108 |
6 | 0.88 | 0.85 | 0.87 | 103 |
7 | 0.86 | 0.78 | 0.82 | 124 |
8 | 0.64 | 0.68 | 0.66 | 125 |
Avg/total | 0.83 | 0.82 | 0.82 | 1080 |
The table above shows that the method can classify audio automatically and accurately, with an average precision of 83% and an average recall of 82%.
Claims (3)
1. An audio classification method based on a convolutional neural network and a random forest, characterized by comprising the following steps:
Step 1: Perform spectrum analysis on the original audio data set: first divide each long audio file into several segments of equal length, every segment carrying the same label; then apply framing, windowing and Fourier transform to each segment to obtain its spectrogram, which serves as one sample of the new training set;
Step 2: Train an improved convolutional neural network with all the spectrograms obtained in step 1 and their corresponding labels; the network has 14 layers;
Step 3: Remove the softmax layer of the convolutional neural network trained in step 2, and then extract the high-level features of all spectrograms with the convolutional neural network;
Step 4: Train a random forest classifier with the high-level spectrogram features extracted in step 3, using Gini impurity as the criterion for feature selection in the decision trees;
Step 5: Apply the spectrum analysis of step 1 to the audio to be classified to obtain its spectrograms, extract their high-level features with the convolutional neural network whose softmax layer was removed in step 3, and feed the extracted features into the random forest classifier trained in step 4 for audio classification; the final classification result is obtained by voting.
2. The audio classification method based on a convolutional neural network and a random forest according to claim 1, characterized in that the method performs two-stage feature extraction on the audio: the first stage obtains the spectrogram corresponding to the audio through spectrum analysis, preliminarily extracting its low-level time-frequency features; the second stage uses the improved convolutional neural network to further extract high-level features from the spectrogram.
3. The audio classification method based on a convolutional neural network and a random forest according to claim 1, characterized in that, to overcome the weak generalization ability caused by using softmax as the classifier of the convolutional neural network, the method replaces the last layer of the convolutional neural network with a random forest, which serves as the final audio classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810037337.8A CN108122562A (en) | 2018-01-16 | 2018-01-16 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810037337.8A CN108122562A (en) | 2018-01-16 | 2018-01-16 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108122562A true CN108122562A (en) | 2018-06-05 |
Family
ID=62232892
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810037337.8A Pending CN108122562A (en) | 2018-01-16 | 2018-01-16 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108122562A (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108766461A (en) * | 2018-07-17 | 2018-11-06 | 厦门美图之家科技有限公司 | Audio feature extraction methods and device |
CN109002529A (en) * | 2018-07-17 | 2018-12-14 | 厦门美图之家科技有限公司 | Audio search method and device |
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | audio data processing method, device and storage medium |
CN109493881A (en) * | 2018-11-22 | 2019-03-19 | 北京奇虎科技有限公司 | A kind of labeling processing method of audio, device and calculate equipment |
CN109684506A (en) * | 2018-11-22 | 2019-04-26 | 北京奇虎科技有限公司 | A kind of labeling processing method of video, device and calculate equipment |
CN109739112A (en) * | 2018-12-29 | 2019-05-10 | 张卫校 | A kind of wobble objects control method and wobble objects |
CN109949825A (en) * | 2019-03-06 | 2019-06-28 | 河北工业大学 | Noise classification method based on the FPGA PCNN algorithm accelerated |
CN110010128A (en) * | 2019-04-09 | 2019-07-12 | 天津松下汽车电子开发有限公司 | A kind of sound control method and system of high discrimination |
CN110324657A (en) * | 2019-05-29 | 2019-10-11 | 北京奇艺世纪科技有限公司 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
CN110414483A (en) * | 2019-08-13 | 2019-11-05 | 山东浪潮人工智能研究院有限公司 | A kind of face identification method and system based on deep neural network and random forest |
CN110600038A (en) * | 2019-08-23 | 2019-12-20 | 北京工业大学 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
CN110675893A (en) * | 2019-09-19 | 2020-01-10 | 腾讯音乐娱乐科技(深圳)有限公司 | Song identification method and device, storage medium and electronic equipment |
CN110808033A (en) * | 2019-09-25 | 2020-02-18 | 武汉科技大学 | Audio classification method based on dual data enhancement strategy |
CN110931046A (en) * | 2019-11-29 | 2020-03-27 | 福州大学 | Audio high-level semantic feature extraction method and system for overlapped sound event detection |
CN110933236A (en) * | 2019-10-25 | 2020-03-27 | 杭州哲信信息技术有限公司 | Machine learning-based null number identification method |
CN110931045A (en) * | 2019-12-20 | 2020-03-27 | 重庆大学 | Audio feature generation method based on convolutional neural network |
CN111159464A (en) * | 2019-12-26 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Audio clip detection method and related equipment |
CN111179971A (en) * | 2019-12-03 | 2020-05-19 | 杭州网易云音乐科技有限公司 | Nondestructive audio detection method and device, electronic equipment and storage medium |
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111583890A (en) * | 2019-02-15 | 2020-08-25 | 阿里巴巴集团控股有限公司 | Audio classification method and device |
CN112735386A (en) * | 2021-01-18 | 2021-04-30 | 苏州大学 | Voice recognition method based on glottal wave information |
CN113313197A (en) * | 2021-06-17 | 2021-08-27 | 哈尔滨工业大学 | Full-connection neural network training method |
CN113729715A (en) * | 2021-10-11 | 2021-12-03 | 山东大学 | Parkinson's disease intelligent diagnosis system based on finger pressure |
CN113901977A (en) * | 2020-06-22 | 2022-01-07 | 中国电力科学研究院有限公司 | Deep learning-based power consumer electricity stealing identification method and system |
CN115064184A (en) * | 2022-06-28 | 2022-09-16 | 镁佳(北京)科技有限公司 | Audio file musical instrument content identification vector representation method and device |
US11905926B2 (en) * | 2019-12-31 | 2024-02-20 | Envision Digital International Pte. Ltd. | Method and apparatus for inspecting wind turbine blade, and device and storage medium thereof |
CN118098270A (en) * | 2024-04-24 | 2024-05-28 | 安徽大学 | Noise tracing method based on feature extraction and feature fusion |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106408015A (en) * | 2016-09-13 | 2017-02-15 | 电子科技大学成都研究院 | Road fork identification and depth estimation method based on convolutional neural network |
CN106952274A (en) * | 2017-03-14 | 2017-07-14 | 西安电子科技大学 | Pedestrian detection and distance-finding method based on stereoscopic vision |
CN107066553A (en) * | 2017-03-24 | 2017-08-18 | 北京工业大学 | A kind of short text classification method based on convolutional neural networks and random forest |
CN107492383A (en) * | 2017-08-07 | 2017-12-19 | 上海六界信息技术有限公司 | Screening technique, device, equipment and the storage medium of live content |
CN107491606A (en) * | 2017-08-17 | 2017-12-19 | 安徽工业大学 | Variable working condition epicyclic gearbox sun gear method for diagnosing faults based on more attribute convolutional neural networks |
-
2018
- 2018-01-16 CN CN201810037337.8A patent/CN108122562A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106408015A (en) * | 2016-09-13 | 2017-02-15 | 电子科技大学成都研究院 | Road fork identification and depth estimation method based on convolutional neural network |
CN106952274A (en) * | 2017-03-14 | 2017-07-14 | 西安电子科技大学 | Pedestrian detection and distance-finding method based on stereoscopic vision |
CN107066553A (en) * | 2017-03-24 | 2017-08-18 | 北京工业大学 | A kind of short text classification method based on convolutional neural networks and random forest |
CN107492383A (en) * | 2017-08-07 | 2017-12-19 | 上海六界信息技术有限公司 | Screening technique, device, equipment and the storage medium of live content |
CN107491606A (en) * | 2017-08-17 | 2017-12-19 | 安徽工业大学 | Variable working condition epicyclic gearbox sun gear method for diagnosing faults based on more attribute convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
- Cao Linlin, "Application of convolutional neural networks in high-resolution remote sensing image classification", Science of Surveying and Mapping (《测绘科学》) *
- Luo Jianhua, "Hyperspectral remote sensing image classification based on deep convolutional neural networks", Journal of Xihua University (《西华大学学报》) *
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108766461A (en) * | 2018-07-17 | 2018-11-06 | 厦门美图之家科技有限公司 | Audio feature extraction methods and device |
CN109002529A (en) * | 2018-07-17 | 2018-12-14 | 厦门美图之家科技有限公司 | Audio search method and device |
CN108766461B (en) * | 2018-07-17 | 2021-01-26 | 厦门美图之家科技有限公司 | Audio feature extraction method and device |
CN109002529B (en) * | 2018-07-17 | 2021-02-02 | 厦门美图之家科技有限公司 | Audio retrieval method and device |
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | audio data processing method, device and storage medium |
CN109684506A (en) * | 2018-11-22 | 2019-04-26 | 北京奇虎科技有限公司 | A kind of labeling processing method of video, device and calculate equipment |
CN109493881B (en) * | 2018-11-22 | 2023-12-05 | 北京奇虎科技有限公司 | Method and device for labeling audio and computing equipment |
CN109684506B (en) * | 2018-11-22 | 2023-10-20 | 三六零科技集团有限公司 | Video tagging processing method and device and computing equipment |
CN109493881A (en) * | 2018-11-22 | 2019-03-19 | 北京奇虎科技有限公司 | A kind of labeling processing method of audio, device and calculate equipment |
CN109739112A (en) * | 2018-12-29 | 2019-05-10 | 张卫校 | A kind of wobble objects control method and wobble objects |
CN109739112B (en) * | 2018-12-29 | 2022-03-04 | 张卫校 | Swinging object control method and swinging object |
CN111583890A (en) * | 2019-02-15 | 2020-08-25 | 阿里巴巴集团控股有限公司 | Audio classification method and device |
CN109949825A (en) * | 2019-03-06 | 2019-06-28 | 河北工业大学 | Noise classification method based on the FPGA PCNN algorithm accelerated |
CN110010128A (en) * | 2019-04-09 | 2019-07-12 | 天津松下汽车电子开发有限公司 | A kind of sound control method and system of high discrimination |
CN110324657A (en) * | 2019-05-29 | 2019-10-11 | 北京奇艺世纪科技有限公司 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
CN110414483A (en) * | 2019-08-13 | 2019-11-05 | 山东浪潮人工智能研究院有限公司 | A kind of face identification method and system based on deep neural network and random forest |
CN110600038B (en) * | 2019-08-23 | 2022-04-05 | 北京工业大学 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
CN110600038A (en) * | 2019-08-23 | 2019-12-20 | 北京工业大学 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
CN110675893A (en) * | 2019-09-19 | 2020-01-10 | 腾讯音乐娱乐科技(深圳)有限公司 | Song identification method and device, storage medium and electronic equipment |
CN110808033A (en) * | 2019-09-25 | 2020-02-18 | 武汉科技大学 | Audio classification method based on dual data enhancement strategy |
CN110808033B (en) * | 2019-09-25 | 2022-04-15 | 武汉科技大学 | Audio classification method based on dual data enhancement strategy |
CN110933236A (en) * | 2019-10-25 | 2020-03-27 | 杭州哲信信息技术有限公司 | Machine learning-based null number identification method |
CN110931046A (en) * | 2019-11-29 | 2020-03-27 | 福州大学 | Audio high-level semantic feature extraction method and system for overlapped sound event detection |
CN111179971A (en) * | 2019-12-03 | 2020-05-19 | 杭州网易云音乐科技有限公司 | Nondestructive audio detection method and device, electronic equipment and storage medium |
CN110931045A (en) * | 2019-12-20 | 2020-03-27 | 重庆大学 | Audio feature generation method based on convolutional neural network |
CN111159464A (en) * | 2019-12-26 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Audio clip detection method and related equipment |
CN111159464B (en) * | 2019-12-26 | 2023-12-15 | 腾讯科技(深圳)有限公司 | Audio clip detection method and related equipment |
US11905926B2 (en) * | 2019-12-31 | 2024-02-20 | Envision Digital International Pte. Ltd. | Method and apparatus for inspecting wind turbine blade, and device and storage medium thereof |
CN111508526B (en) * | 2020-04-10 | 2022-07-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN113901977A (en) * | 2020-06-22 | 2022-01-07 | 中国电力科学研究院有限公司 | Deep learning-based power consumer electricity stealing identification method and system |
CN112735386B (en) * | 2021-01-18 | 2023-03-24 | 苏州大学 | Voice recognition method based on glottal wave information |
CN112735386A (en) * | 2021-01-18 | 2021-04-30 | 苏州大学 | Voice recognition method based on glottal wave information |
CN113313197A (en) * | 2021-06-17 | 2021-08-27 | 哈尔滨工业大学 | Full-connection neural network training method |
CN113729715A (en) * | 2021-10-11 | 2021-12-03 | 山东大学 | Parkinson's disease intelligent diagnosis system based on finger pressure |
CN115064184A (en) * | 2022-06-28 | 2022-09-16 | 镁佳(北京)科技有限公司 | Audio file musical instrument content identification vector representation method and device |
CN118098270A (en) * | 2024-04-24 | 2024-05-28 | 安徽大学 | Noise tracing method based on feature extraction and feature fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108122562A (en) | A kind of audio frequency classification method based on convolutional neural networks and random forest | |
Chang et al. | Learning representations of emotional speech with deep convolutional generative adversarial networks | |
CN106503805B (en) | A kind of bimodal based on machine learning everybody talk with sentiment analysis method | |
CN106328121B (en) | Chinese Traditional Instruments sorting technique based on depth confidence network | |
CN109147804A (en) | A kind of acoustic feature processing method and system based on deep learning | |
CN111723874B (en) | Sound field scene classification method based on width and depth neural network | |
CN111000553B (en) | Intelligent classification method for electrocardiogram data based on voting ensemble learning | |
CN107644057B (en) | Absolute imbalance text classification method based on transfer learning | |
CN106815369A (en) | A kind of file classification method based on Xgboost sorting algorithms | |
CN109271550B (en) | Music personalized recommendation method based on deep learning | |
CN107993663A (en) | A kind of method for recognizing sound-groove based on Android | |
CN107392241A (en) | A kind of image object sorting technique that sampling XGBoost is arranged based on weighting | |
CN106295717A (en) | A kind of western musical instrument sorting technique based on rarefaction representation and machine learning | |
CN103000172A (en) | Signal classification method and device | |
CN109767789A (en) | A kind of new feature extracting method for speech emotion recognition | |
Shen et al. | Learning how to listen: A temporal-frequential attention model for sound event detection | |
Shakil et al. | Feature based classification of voice based biometric data through Machine learning algorithm | |
CN112861984A (en) | Speech emotion classification method based on feature fusion and ensemble learning | |
CN110910175A (en) | Tourist ticket product portrait generation method | |
CN104077598A (en) | Emotion recognition method based on speech fuzzy clustering | |
CN111583957B (en) | Drama classification method based on five-tone music rhythm spectrogram and cascade neural network | |
CN102521402B (en) | Text filtering system and method | |
CN110084126A (en) | A kind of satellite communication jamming signal type recognition methods based on Xgboost | |
CN109460872A (en) | One kind being lost unbalanced data prediction technique towards mobile communication subscriber | |
CN111785236A (en) | Automatic composition method based on motivational extraction model and neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20180605 |