CN108122562A - An audio classification method based on a convolutional neural network and a random forest
- Publication number
- CN108122562A (application number CN201810037337.8A)
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural network
- audio
- random forest
- spectrogram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an audio classification method based on a convolutional neural network and a random forest. The method comprises: S1: performing spectrum analysis on the original audio data set, including segmenting, framing, windowing, and Fourier transform, to obtain the spectrogram corresponding to each original audio file; S2: training a convolutional neural network feature extractor with the obtained spectrograms as input; S3: removing the softmax layer of the convolutional neural network and extracting high-level features of the spectrograms; S4: training a random forest classifier on the extracted high-level features; S5: classifying audio with the trained random forest, based on the high-level features extracted by the convolutional neural network. Because feature extraction is done by the convolutional neural network, the method avoids the tedious process of constructing features by hand. To address the insufficient generalization ability that results from using softmax as the classifier of a convolutional neural network, a random forest replaces the softmax layer as the final classifier. The method achieved high accuracy and recall in testing.
Description
Technical field
The invention belongs to the field of machine learning and relates to an audio classification method based on a convolutional neural network and a random forest.
Background
With the development of the Internet and multimedia technology, our lives are flooded with audio. Music websites in particular host huge numbers of audio files in widely differing styles. Faced with this volume, audio retrieval helps us find the files we need quickly and accurately. Audio classification is a prerequisite for audio retrieval, but classifying a large number of audio files by hand is time-consuming and tedious, and the accuracy of manual classification drops as listening fatigue sets in. Fast, accurate automatic classification is therefore needed for large audio collections. Audio classification has been studied extensively. One example is a two-stage method that combines hidden Markov models and support vector machines: a hidden Markov model first makes a preliminary classification and determines the two most likely classes, and the corresponding support vector machine classifier then makes the final decision. Another approach classifies audio by the similarity of its content, representing each audio file by its set of pitches and classifying with an LDA topic model. Gaussian mixture models, decision trees, and other models have also been used as classifiers. Most of these methods, however, construct features by hand in the traditional way, which is tedious and yields features that are not rich enough. They also rely on a single classifier, which limits the model's generalization ability.
In recent years deep learning has become increasingly popular. Its models contain multiple hidden layers and combine low-level features into more abstract high-level attributes or features, which lets them discover distributed feature representations of the data and outperform hand-constructed features. Given this situation and the problems above, an audio classification method based on deep learning is needed.
Summary of the invention
The technical problem to be solved by the invention is to provide an audio classification method based on a convolutional neural network and a random forest. The method uses a convolutional neural network to extract high-level features automatically and a random forest to overcome the weak generalization ability of a single classifier, and it achieves high accuracy and recall.
The technical solution of the invention is as follows:
An audio classification method based on a convolutional neural network and a random forest, comprising the following steps.
Step 1: Perform spectrum analysis on the original audio file to obtain its spectrogram. Audio files are often long, and a spectrogram computed directly from the whole file is too large, so later model training would consume a lot of system resources. The original audio is therefore split into suitable segments, and spectrum analysis is performed on each segment, including framing, windowing, and the short-time Fourier transform. Let $x(n)$ be a long sequence and $w(n)$ a window function of length $N$. Windowing $x(n)$ with $w(n)$ gives the $N$-point sequence $x_N(n)$, that is,
$$x_N(n) = x(n)\,w(n).$$
In the frequency domain,
$$X_N(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\theta})\,W(e^{j(\omega-\theta)})\,d\theta.$$
The formula for the short-time Fourier transform is as follows:
$$\mathrm{STFT}(m,\omega) = \sum_{n=-\infty}^{\infty} x(n)\,w(n-m)\,e^{-j\omega n},$$
where $x(n)$ is the original signal and $w(n)$ is the window function. Spectrum analysis yields the spectrogram corresponding to the audio.
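The framing, windowing, and Fourier transform of Step 1 can be sketched as follows. This is a minimal illustration only: the Hann window, the frame length, the hop size, and the function names `hann` and `stft_magnitude` are assumptions, since the text does not specify them, and a direct DFT is used for clarity rather than speed.

```python
import cmath
import math

def hann(N):
    """Hann window of length N (an assumed choice; the source does not name the window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft_magnitude(x, frame_len, hop):
    """Return one magnitude spectrum per frame (rows = time, columns = frequency bins)."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        # Windowing: x_N(n) = x(n) * w(n) on the current frame.
        seg = [x[start + n] * w[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame; keep the non-negative frequency bins.
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Toy signal: a sinusoid that falls exactly on bin 4 of a 32-point frame.
frame_len, hop = 32, 16
signal = [math.sin(2 * math.pi * 4 * n / frame_len) for n in range(128)]
spec = stft_magnitude(signal, frame_len, hop)
# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Stacking the rows of `spec` over time gives the two-dimensional array that is rendered as a spectrogram image. A practical pipeline would use an FFT routine from a numerical library instead of the quadratic-time loop shown here.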
Step 2: Use the spectrograms obtained in Step 1 as the training set to train an improved convolutional neural network. The network has 14 layers, including convolutional, downsampling, Dropout, Flatten, fully connected, Batch Normalization, and softmax layers, and uses cross-entropy as the loss function. The layers are as follows:
Input: spectrogram of size 248*248;
Layer1: convolutional layer, 64 kernels of size (5,5), strides=1, output feature map size (244,244);
Layer2: downsampling layer, kernel size (2,2), output feature map size (122,122);
Layer3: convolutional layer, 128 kernels of size (3,3), strides=2, output feature map size (60,60);
Layer4: downsampling layer, kernel size (2,2), output feature map size (30,30);
Layer5: convolutional layer, 256 kernels of size (3,3), strides=2, output feature map size (14,14);
Layer6: downsampling layer, kernel size (2,2), output feature map size (7,7);
Layer7: convolutional layer, 512 kernels of size (2,2), strides=1, output feature map size (6,6);
Layer8: downsampling layer, kernel size (2,2), output feature map size (3,3);
Layer9: Dropout layer, dropout=0.5; disables neurons with the given probability during training to prevent overfitting;
Layer10: Flatten layer; flattens the multi-dimensional data into one dimension as a transition to the fully connected layers;
Layer11: fully connected layer with 128 output neurons;
Layer12: Batch Normalization; normalizes its input while preserving the expressive power of the model;
Layer13: fully connected layer with 9 output neurons, because the data set has 9 classes;
Layer14: softmax layer, the classifier; outputs the final probability distribution, in which each value is the probability of one class.
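The feature map sizes in the layer list above can be checked with a short calculation. The sketch assumes no padding in any layer and a pooling stride equal to the pooling kernel size; neither is stated in the text, but both reproduce the listed sizes. The helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1

# (name, kernel side, stride), taken from the layer list.
# Assumptions: no padding anywhere; pooling stride equals the pooling kernel size.
layers = [
    ("Layer1 conv", 5, 1),
    ("Layer2 pool", 2, 2),
    ("Layer3 conv", 3, 2),
    ("Layer4 pool", 2, 2),
    ("Layer5 conv", 3, 2),
    ("Layer6 pool", 2, 2),
    ("Layer7 conv", 2, 1),
    ("Layer8 pool", 2, 2),
]

size = 248  # side length of the input spectrogram
sizes = []
for name, kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    sizes.append(size)
    print(f"{name}: ({size}, {size})")

# Layer8 outputs 512 channels of 3x3 maps, so Flatten yields this many values.
flatten_units = size * size * 512
```

Under these assumptions the Flatten layer hands 3 * 3 * 512 = 4608 values to the 128-unit fully connected layer.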
Step 3: Remove the softmax layer from the convolutional neural network trained in Step 2, and use the output of the last fully connected layer as the high-level feature of the spectrogram.
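To make the role of the removed layer concrete, the sketch below shows what softmax and the cross-entropy loss compute from the 9 outputs of the last fully connected layer. The numeric values are hypothetical and the function names are illustrative; only the formulas are standard.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample with a one-hot target."""
    return -math.log(probs[true_class])

# Hypothetical 9-dimensional output of the last fully connected layer for one spectrogram.
features = [0.2, 2.5, -1.0, 0.3, 0.0, 1.1, -0.4, 0.7, 0.1]
probs = softmax(features)                   # what the softmax layer would output
loss = cross_entropy(probs, true_class=1)   # loss used while training the network

# Softmax preserves the ordering of the scores, so the top class is unchanged.
top_feature = max(range(9), key=features.__getitem__)
top_prob = max(range(9), key=probs.__getitem__)
```

Softmax is needed only for training with cross-entropy. Once it is removed, the 9 raw scores in `features` are what the random forest receives as its input vector.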
Step 4: Train a random forest classifier on the high-level features extracted in Step 3. Gini impurity is used as the criterion for feature selection in the decision trees. The algorithm is as follows:
Input: sample set D = {(x1,y1), (x2,y2), …, (xm,ym)}, number of weak-classifier iterations T;
Output: the final strong classifier f(x);
For t = 1, 2, …, T:
a) Draw the t-th random sample from the original data set, sampling m times in total, to obtain the sampling set Dt;
b) Build the t-th decision tree Gt(x) from the sampling set Dt. Randomly select a subset of all the sample features, then choose the best feature in that subset to split the decision tree into left and right subtrees.
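The two ingredients of Step 4, random sampling of the data and Gini-based selection among a random subset of features, can be sketched as follows. Sampling with replacement is assumed (the usual bootstrap, which the text does not state explicitly), and the function names and toy data are hypothetical. Only a single split is shown, not a full tree.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(X, y, feature, threshold):
    """Weighted Gini impurity of splitting on X[i][feature] <= threshold."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    if not left or not right:
        return float("inf")  # degenerate split, never chosen
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(X, y, n_candidate_features, rng):
    """Pick a random subset of features, then the (feature, threshold) with lowest Gini."""
    features = rng.sample(range(len(X[0])), n_candidate_features)
    candidates = [(split_gini(X, y, f, xi[f]), f, xi[f]) for f in features for xi in X]
    score, feature, threshold = min(candidates)
    return feature, threshold, score

def bootstrap(X, y, rng):
    """Draw m samples with replacement from a data set of size m (assumed scheme)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

rng = random.Random(0)
# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.8, 2.0], [0.9, 5.5], [1.0, 0.5]]
y = [0, 0, 0, 1, 1, 1]
Xb, yb = bootstrap(X, y, rng)
feature, threshold, score = best_split(X, y, n_candidate_features=2, rng=rng)
```

A full tree would apply `best_split` recursively to the left and right subsets, and the forest would repeat the procedure on T bootstrap samples.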
Step 5: Apply the spectrum analysis of Step 1 to the audio to be classified to obtain its spectrogram, extract the high-level features of the spectrogram with the convolutional neural network from Step 3 (softmax layer removed), and feed those features to the random forest classifier trained in Step 4. The class that receives the most votes from the T weak learners is the final class.
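The final voting rule of Step 5 is a plain majority vote over the T trees. A minimal sketch, with hypothetical votes and a hypothetical function name:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical votes from T = 7 trees for one audio segment.
votes = ["Jazz", "Blues", "Jazz", "Rock", "Jazz", "Blues", "Jazz"]
label = majority_vote(votes)
```

With `Counter.most_common`, a tie goes to the class that appeared first in the vote list. The text does not say how ties are broken, so that behavior is an artifact of this sketch.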
The invention proposes an audio classification method based on deep learning that uses a hybrid model combining a convolutional neural network and a random forest. To address the insufficient feature extraction of traditional models, the invention converts audio into spectrograms and uses a convolutional neural network to extract their high-level features, which exploits the strong image feature extraction ability of convolutional neural networks and simplifies the feature extraction process. To address the weak generalization ability of a single classifier, a random forest model is used: it draws on the strengths of ensemble learning by building multiple decision trees for classification, which compensates for the shortcomings of a single classifier. The classification results show that the invention achieves high accuracy and recall.
Description of drawings
Fig. 1 is a flowchart of the audio classification method based on a convolutional neural network and a random forest according to the invention.
Fig. 2 is a spectrogram obtained by spectrum analysis.
Fig. 3 is a flowchart of high-level feature extraction with the improved convolutional neural network.
Detailed description
The implementation of the invention is described further below with reference to the drawings and an embodiment. The following embodiment only illustrates the invention and is not intended to limit its scope.
Embodiment 1 is an example of the invention. It uses the "GTZAN Genre Collection" as the data set, with audio files from nine genres serving as the training and test sets. The nine classes are Blues, Classical, Country, Disco, Jazz, Metal, Pop, Reggae, and Rock.
1. Split each audio file into 6 segments of equal length; every segment carries the label of its file. Apply framing, windowing, and the Fourier transform to each segment to obtain its spectrogram. Fig. 2 shows one such spectrogram. Read in each spectrogram, convert it to grayscale, and resize it to 248*248. Finally, store the pixel values of the resized image in an array as one sample of the convolutional neural network data set. These operations give the data set D(5400,248,248): 5400 spectrograms, each 248 wide and 248 high. The data set is divided into a training set (80%) and a test set (20%), giving the training set T(4320,248,248) and the test set V(1080,248,248).
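The sample counts in this step follow from simple bookkeeping, sketched below. The sketch assumes 100 clips per genre, which is the layout of the GTZAN collection and matches 5400 / 6 / 9, and it assumes a random split at segment level, which the text does not specify. The function names are hypothetical.

```python
import random

def split_into_segments(samples, n_segments=6):
    """Cut a sequence into n_segments equal-length pieces, dropping any remainder."""
    seg_len = len(samples) // n_segments
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a list and split it into train and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]

# 9 genres x 100 clips per genre (assumed) x 6 segments per clip.
n_clips = 9 * 100
dataset = [(clip, seg) for clip in range(n_clips) for seg in range(6)]
train, test = train_test_split(dataset)

# Example: a 60-sample toy signal cut into 6 segments of 10 samples each.
segments = split_into_segments(list(range(60)), n_segments=6)
```

A segment-level split like this one can place segments of the same clip in both the training and the test set. Splitting by clip before segmenting would avoid that overlap; the text does not say which variant was used.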
2. Train the convolutional neural network model on the training set T(4320,248,248). The network has 14 layers in total, including convolutional, downsampling, fully connected, Dropout, and Batch Normalization layers.
3. After the convolutional neural network has been trained, remove the final softmax layer. Use the trained network to extract deeper features from the spectrograms: the original training set T(4320,248,248) of spectrograms becomes a new training set T'(4320,9), and the original test set V(1080,248,248) becomes a new test set V'(1080,9).
4. Use the new training set T' and test set V' to train the random forest as the final classifier. Different combinations of parameter settings were tried.
[Table of candidate parameter values not reproduced.]
After selection, the best parameter combination is n_estimators: 100, min_samples_split: 3, min_samples_leaf: 1. After training, the random forest was evaluated on the test set.
[Table of test results not reproduced.]
The results show that the method classifies audio automatically with good accuracy: the average accuracy reached 83% and the average recall reached 82%.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810037337.8A CN108122562A (en) | 2018-01-16 | 2018-01-16 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810037337.8A CN108122562A (en) | 2018-01-16 | 2018-01-16 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108122562A (en) | 2018-06-05 |
Family
ID=62232892
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810037337.8A Pending CN108122562A (en) | 2018-01-16 | 2018-01-16 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108122562A (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108766461A (en) * | 2018-07-17 | 2018-11-06 | 厦门美图之家科技有限公司 | Audio feature extraction methods and device |
CN109002529A (en) * | 2018-07-17 | 2018-12-14 | 厦门美图之家科技有限公司 | Audio search method and device |
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | audio data processing method, device and storage medium |
CN109493881A (en) * | 2018-11-22 | 2019-03-19 | 北京奇虎科技有限公司 | A kind of labeling processing method of audio, device and calculate equipment |
CN109684506A (en) * | 2018-11-22 | 2019-04-26 | 北京奇虎科技有限公司 | A kind of labeling processing method of video, device and calculate equipment |
CN109739112A (en) * | 2018-12-29 | 2019-05-10 | 张卫校 | A kind of wobble objects control method and wobble objects |
CN109949825A (en) * | 2019-03-06 | 2019-06-28 | 河北工业大学 | Noise classification method based on FPGA-accelerated PCNN algorithm |
CN110010128A (en) * | 2019-04-09 | 2019-07-12 | 天津松下汽车电子开发有限公司 | A kind of sound control method and system of high discrimination |
CN110324657A (en) * | 2019-05-29 | 2019-10-11 | 北京奇艺世纪科技有限公司 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
CN110414483A (en) * | 2019-08-13 | 2019-11-05 | 山东浪潮人工智能研究院有限公司 | A face recognition method and system based on deep neural network and random forest |
CN110600038A (en) * | 2019-08-23 | 2019-12-20 | 北京工业大学 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
CN110675893A (en) * | 2019-09-19 | 2020-01-10 | 腾讯音乐娱乐科技(深圳)有限公司 | Song identification method and device, storage medium and electronic equipment |
CN110808033A (en) * | 2019-09-25 | 2020-02-18 | 武汉科技大学 | An Audio Classification Method Based on Double Data Augmentation Strategy |
CN110931045A (en) * | 2019-12-20 | 2020-03-27 | 重庆大学 | Audio feature generation method based on convolutional neural network |
CN110931046A (en) * | 2019-11-29 | 2020-03-27 | 福州大学 | Audio high-level semantic feature extraction method and system for overlapped sound event detection |
CN110933236A (en) * | 2019-10-25 | 2020-03-27 | 杭州哲信信息技术有限公司 | Machine learning-based null number identification method |
CN111159464A (en) * | 2019-12-26 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Audio clip detection method and related equipment |
CN111179971A (en) * | 2019-12-03 | 2020-05-19 | 杭州网易云音乐科技有限公司 | Nondestructive audio detection method and device, electronic equipment and storage medium |
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111583890A (en) * | 2019-02-15 | 2020-08-25 | 阿里巴巴集团控股有限公司 | Audio classification method and device |
CN112735386A (en) * | 2021-01-18 | 2021-04-30 | 苏州大学 | Voice recognition method based on glottal wave information |
CN113313197A (en) * | 2021-06-17 | 2021-08-27 | 哈尔滨工业大学 | Full-connection neural network training method |
CN113729715A (en) * | 2021-10-11 | 2021-12-03 | 山东大学 | Parkinson's disease intelligent diagnosis system based on finger pressure |
CN113901977A (en) * | 2020-06-22 | 2022-01-07 | 中国电力科学研究院有限公司 | A deep learning-based method and system for identifying electricity theft by power users |
CN115064184A (en) * | 2022-06-28 | 2022-09-16 | 镁佳(北京)科技有限公司 | Audio file musical instrument content identification vector representation method and device |
US11905926B2 (en) * | 2019-12-31 | 2024-02-20 | Envision Digital International Pte. Ltd. | Method and apparatus for inspecting wind turbine blade, and device and storage medium thereof |
CN118098270A (en) * | 2024-04-24 | 2024-05-28 | 安徽大学 | A noise source tracing method based on feature extraction and feature fusion |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106408015A (en) * | 2016-09-13 | 2017-02-15 | 电子科技大学成都研究院 | Road fork identification and depth estimation method based on convolutional neural network |
CN106952274A (en) * | 2017-03-14 | 2017-07-14 | 西安电子科技大学 | Pedestrian detection and ranging method based on stereo vision |
CN107066553A (en) * | 2017-03-24 | 2017-08-18 | 北京工业大学 | A kind of short text classification method based on convolutional neural networks and random forest |
CN107492383A (en) * | 2017-08-07 | 2017-12-19 | 上海六界信息技术有限公司 | Screening technique, device, equipment and the storage medium of live content |
CN107491606A (en) * | 2017-08-17 | 2017-12-19 | 安徽工业大学 | Variable working condition epicyclic gearbox sun gear method for diagnosing faults based on more attribute convolutional neural networks |
-
2018
- 2018-01-16 CN CN201810037337.8A patent/CN108122562A/en active Pending
Non-Patent Citations (2)
Title |
---|
曹林林: ""卷积神经网络在高分遥感影像分类中的应用"", 《测绘科学》 * |
罗建华: ""基于深度卷积神经网络的高光谱遥感图像分类"", 《西华大学学报》 * |
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108766461A (en) * | 2018-07-17 | 2018-11-06 | 厦门美图之家科技有限公司 | Audio feature extraction methods and device |
CN109002529A (en) * | 2018-07-17 | 2018-12-14 | 厦门美图之家科技有限公司 | Audio search method and device |
CN108766461B (en) * | 2018-07-17 | 2021-01-26 | 厦门美图之家科技有限公司 | Audio feature extraction method and device |
CN109002529B (en) * | 2018-07-17 | 2021-02-02 | 厦门美图之家科技有限公司 | Audio retrieval method and device |
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | audio data processing method, device and storage medium |
CN109684506A (en) * | 2018-11-22 | 2019-04-26 | 北京奇虎科技有限公司 | A kind of labeling processing method of video, device and calculate equipment |
CN109493881B (en) * | 2018-11-22 | 2023-12-05 | 北京奇虎科技有限公司 | Method and device for labeling audio and computing equipment |
CN109684506B (en) * | 2018-11-22 | 2023-10-20 | 三六零科技集团有限公司 | Video tagging processing method and device and computing equipment |
CN109493881A (en) * | 2018-11-22 | 2019-03-19 | 北京奇虎科技有限公司 | A kind of labeling processing method of audio, device and calculate equipment |
CN109739112A (en) * | 2018-12-29 | 2019-05-10 | 张卫校 | A kind of wobble objects control method and wobble objects |
CN109739112B (en) * | 2018-12-29 | 2022-03-04 | 张卫校 | Swinging object control method and swinging object |
CN111583890A (en) * | 2019-02-15 | 2020-08-25 | 阿里巴巴集团控股有限公司 | Audio classification method and device |
CN109949825A (en) * | 2019-03-06 | 2019-06-28 | 河北工业大学 | Noise classification method based on FPGA-accelerated PCNN algorithm |
CN110010128A (en) * | 2019-04-09 | 2019-07-12 | 天津松下汽车电子开发有限公司 | A kind of sound control method and system of high discrimination |
CN110324657A (en) * | 2019-05-29 | 2019-10-11 | 北京奇艺世纪科技有限公司 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
CN110414483A (en) * | 2019-08-13 | 2019-11-05 | 山东浪潮人工智能研究院有限公司 | A face recognition method and system based on deep neural network and random forest |
CN110600038B (en) * | 2019-08-23 | 2022-04-05 | 北京工业大学 | A Dimensionality Reduction Method of Audio Fingerprint Based on Discrete Gini Coefficient |
CN110600038A (en) * | 2019-08-23 | 2019-12-20 | 北京工业大学 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
CN110675893A (en) * | 2019-09-19 | 2020-01-10 | 腾讯音乐娱乐科技(深圳)有限公司 | Song identification method and device, storage medium and electronic equipment |
CN110808033A (en) * | 2019-09-25 | 2020-02-18 | 武汉科技大学 | An Audio Classification Method Based on Double Data Augmentation Strategy |
CN110808033B (en) * | 2019-09-25 | 2022-04-15 | 武汉科技大学 | Audio classification method based on dual data enhancement strategy |
CN110933236A (en) * | 2019-10-25 | 2020-03-27 | 杭州哲信信息技术有限公司 | Machine learning-based null number identification method |
CN110931046A (en) * | 2019-11-29 | 2020-03-27 | 福州大学 | Audio high-level semantic feature extraction method and system for overlapped sound event detection |
CN111179971A (en) * | 2019-12-03 | 2020-05-19 | 杭州网易云音乐科技有限公司 | Nondestructive audio detection method and device, electronic equipment and storage medium |
CN110931045A (en) * | 2019-12-20 | 2020-03-27 | 重庆大学 | Audio feature generation method based on convolutional neural network |
CN111159464A (en) * | 2019-12-26 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Audio clip detection method and related equipment |
CN111159464B (en) * | 2019-12-26 | 2023-12-15 | 腾讯科技(深圳)有限公司 | Audio clip detection method and related equipment |
US11905926B2 (en) * | 2019-12-31 | 2024-02-20 | Envision Digital International Pte. Ltd. | Method and apparatus for inspecting wind turbine blade, and device and storage medium thereof |
CN111508526B (en) * | 2020-04-10 | 2022-07-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN111508526A (en) * | 2020-04-10 | 2020-08-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting audio beat information and storage medium |
CN113901977A (en) * | 2020-06-22 | 2022-01-07 | 中国电力科学研究院有限公司 | A deep learning-based method and system for identifying electricity theft by power users |
CN112735386B (en) * | 2021-01-18 | 2023-03-24 | 苏州大学 | Voice recognition method based on glottal wave information |
CN112735386A (en) * | 2021-01-18 | 2021-04-30 | 苏州大学 | Voice recognition method based on glottal wave information |
CN113313197A (en) * | 2021-06-17 | 2021-08-27 | 哈尔滨工业大学 | Fully connected neural network training method |
CN113729715A (en) * | 2021-10-11 | 2021-12-03 | 山东大学 | Parkinson's disease intelligent diagnosis system based on finger pressure |
CN115064184A (en) * | 2022-06-28 | 2022-09-16 | 镁佳(北京)科技有限公司 | Audio file musical instrument content identification vector representation method and device |
CN118098270A (en) * | 2024-04-24 | 2024-05-28 | 安徽大学 | A noise source tracing method based on feature extraction and feature fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108122562A (en) | A kind of audio frequency classification method based on convolutional neural networks and random forest | |
CN101247470B (en) | Method realized by computer for detecting scene boundaries in videos | |
US10515292B2 (en) | Joint acoustic and visual processing | |
CN107301171A (en) | A text sentiment analysis method and system based on sentiment dictionary learning | |
CN111986699B (en) | Sound event detection method based on full convolution network | |
CN109308912A (en) | Music style recognition method, device, computer equipment and storage medium | |
CN109508379A (en) | A short text clustering method based on weighted word vector representation and combined similarity | |
CN103268339A (en) | Method and system for named entity recognition in microblog messages | |
CN110120218A (en) | Expressway oversize vehicle recognition method based on GMM-HMM | |
CN110399478A (en) | Event discovery method and device | |
CN104166684A (en) | Cross-media retrieval method based on uniform sparse representation | |
CN108846047A (en) | A kind of picture retrieval method and system based on convolution feature | |
CN110990563A (en) | A method and system for constructing traditional cultural material library based on artificial intelligence | |
CN112489689B (en) | Cross-database voice emotion recognition method and device based on multi-scale difference adversarial learning | |
CN109086794B (en) | Driving behavior pattern recognition method based on T-LDA topic model | |
CN110910175A (en) | Tourist ticket product portrait generation method | |
CN107357785A (en) | Topic feature word extraction method and system, sentiment polarity determination method and system | |
Flamary et al. | Spoken WordCloud: Clustering recurrent patterns in speech | |
CN111091809B (en) | Regional accent recognition method and device based on depth feature fusion | |
CN109492105A (en) | A kind of text sentiment classification method based on multiple features integrated study | |
Ferragne et al. | Towards phonetic interpretability in deep learning applied to voice comparison | |
Blanchard et al. | Getting the subtext without the text: Scalable multimodal sentiment classification from visual and acoustic modalities | |
CN107292348A (en) | A Bagging_BSJ short text classification method | |
CN108985369A (en) | An identically distributed ensemble prediction method and system for imbalanced dataset classification | |
CN108920451A (en) | Text sentiment analysis method based on dynamic threshold and multiple classifiers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180605 |