CN101546556A - Classification system for identifying audio content - Google Patents

Classification system for identifying audio content

Info

Publication number
CN101546556A
Authority
CN
China
Prior art keywords
module
audio
transient state
frame
audio content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810035351A
Other languages
Chinese (zh)
Other versions
CN101546556B (en)
Inventor
黄鹤云
林福辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN2008100353510A priority Critical patent/CN101546556B/en
Publication of CN101546556A publication Critical patent/CN101546556A/en
Application granted granted Critical
Publication of CN101546556B publication Critical patent/CN101546556B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides an audio content classification system comprising a training end and a test end. At the training end, an audio feature extraction module extracts features from audio test samples and a classifier training module trains the classifier parameters. The test end comprises the audio feature extraction module shared with the training end, a classifier decision module, a transient feature extraction module, a transient feature smoothing module and an incremental learning module. The audio feature extraction module extracts the audio features of the input signal. The classifier decision module takes the audio features output by the audio feature extraction module as input and classifies the first frame using the classifier parameters obtained at the training end. At the same time, the transient feature extraction module extracts the transient features of the input signal and outputs them to the transient feature smoothing module, which corrects the output of the classifier decision module and outputs the corrected result. Meanwhile, the incremental learning module uses the class information and feature information of the classified audio frames as a set of incremental learning samples to update the classifier parameters.

Description

Classification system for audio content identification
Technical field
The present invention relates to pattern recognition and signal processing technology, and in particular to a classification system for audio content identification.
Background art
Audio is an important medium in multimedia, and audio information retrieval is an important part of multimedia information retrieval technology. For the corresponding prior art, see Chinese Patents Nos. 1391211, 1223739 and 1270361 and United States Patents Nos. 5,613,037, 6,292,776 and 5,440,662. In audio retrieval applications, audio data must be classified in order to determine which class the input audio signal belongs to. Common audio classes include speech, background noise, pop music and classical music. Audio content classification is widely used: it plays a decisive role in audio retrieval in particular, and it also plays an important role as an auxiliary means of video content retrieval when multimedia summaries are extracted. More broadly, many speech and audio standards, for example 3GPP AMR-WB and AMR-WB+, use speech/noise classifiers and speech/music classifiers to tell the encoder what kind of audio signal the input is, so that a different encoder can be applied to each kind of signal. Designing a good audio content classification method is therefore important. A typical classification method uses two indispensable modules. One is the audio feature extraction module, which extracts information reflecting the type of audio content from the input audio samples. The other is the classifier, which uses this information to decide the class. Many features of audio content have proved effective for classification, including time-domain features (for example zero-crossing rate, curvature and linear prediction coefficients), frequency-domain features (Mel cepstral coefficients, Fourier transform coefficients, wavelet transform coefficients and so on) and other nonlinear features (fractal parameters, chaos parameters and so on). Many classifiers are widely used in the field of audio content classification. The decision tree (Decision Tree) and the k-nearest-neighbor method (K Nearest Neighbor) are two classifiers that are relatively easy to implement and understand, and they have achieved good results in classifying audio content into speech, environmental noise and music. In addition, the speech/music classifier in the AMR-WB+ standard also uses a decision tree. The support vector machine classifier (Support Vector Machine Classifier), which has been adopted in many areas of machine learning and pattern recognition in recent years, has also proved to be a very effective method. Several other classical classifiers, for example the back-propagation neural network (Back-Propagation Neural Network), the artificial neural network (Artificial Neural Network) and clustering (Clustering) methods, have also proved effective for audio content classification.
In existing classification systems, however, the classifier parameters are fixed and cannot be updated in time, and the acoustic characteristics of sudden events cannot be handled effectively, so these systems cannot meet the requirements of specific environments such as security monitoring.
Summary of the invention
The technical problem to be solved by the present invention is to provide an audio content classification system that overcomes two defects of existing classifiers: their parameters cannot be updated, and the acoustic characteristics of sudden events cannot be handled effectively.
To solve the above problem, an audio content classification system according to the present invention comprises a training end and a test end. The training end comprises an audio feature extraction module and a classifier training module: the audio feature extraction module extracts the features of audio test samples, and the classifier training module trains the classifier parameters from the audio features collected by the audio feature extraction module and the class information of the audio signal. The test end comprises the audio feature extraction module shared with the training end, a classifier decision module, a transient feature extraction module, a transient feature smoothing module and an incremental learning module. The audio feature extraction module extracts the audio features of the input signal. The classifier decision module takes the audio features output by the audio feature extraction module as input and classifies the first frame using the classifier parameters obtained at the training end. At the same time, the transient feature extraction module extracts the transient features of the input signal and outputs them to the transient feature smoothing module, which corrects the output of the classifier decision module and outputs the result. Meanwhile, the incremental learning module uses the class information and feature information of the classified audio frames as a set of incremental learning samples to update the classifier parameters.
According to the above main features, the transient feature extraction module extracts the transient feature of the current frame and makes a decision on it, and the transient feature smoothing module applies a different smoothing method depending on the transient feature: when the current frame is judged to be a transient frame, the second smoothing method is used; otherwise the first smoothing method is used. The first smoothing method is independent of the transient feature, whereas the second smoothing method depends on the transient feature.
According to the above main features, transient feature extraction divides the input audio frame into M segments $B_l$, $l = 1, 2, \ldots, 32$, where:
$$B_l = \{x_{N_l+1}, x_{N_l+2}, \ldots, x_{N_l+32}\}, \quad N_l = \frac{lN}{64}, \quad l = 1, 2, \ldots, 64;$$
The amplitude sum of each segment, that is, the sum of the absolute values of its sample values, is then computed:
$$M_i = \frac{1}{32} \sum_{n \in B_i} |x_n|, \quad i = 1, 2, \ldots, 64;$$
Next, the energy ratio and the amplitude-energy ratio of each segment relative to the preceding segments are computed:
$$r_l^1 = \frac{E_l}{\min(E_{l-1}, E_{l-2})}, \quad r_l^2 = \frac{\max_{x_i \in B_l} x_i^2}{E_{l-1}}, \quad l \in S, \quad \text{where } E_l = \sum_{n \in B_l} x_n^2$$
The maximum energy ratio and the maximum amplitude-energy ratio are then computed:
$$F_i = \max_l \left( \log r_l^i \right), \quad i = 1, 2,$$
The transient feature can therefore be computed as:
$$F = 0.45 F_1 + 0.55 F_2$$
Once the transient feature has been obtained, F is compared with a first threshold: if F is greater than the first threshold, the frame is a transient frame and the second smoothing method is used; otherwise the first smoothing method is used.
According to the above main features, the first smoothing method first analyzes the previous three frames: if their classification results are "non-sudden-event frame, sudden-event frame, non-sudden-event frame", all three frames are smoothed to non-sudden-event frames. In one embodiment of the second smoothing method, when the feature F is greater than a second threshold, the three frames before this frame and the three frames after it are all set to sudden-event frames.
According to the above main features, the second threshold is greater than the first threshold.
According to the above main features, the classifier parameters are updated by combining the training data saved in advance with the incremental learning samples into a larger training set and retraining the classifier.
According to the above main features, the classifier further comprises a feature fusion module or a feature dimensionality reduction module.
According to the above main features, principal component analysis (PCA) is used to reduce the feature dimensionality after the features have been extracted and before the classification decision.
According to the above main features, the transient feature extraction method is perceptual entropy.
According to the above main features, the classifier uses a decision tree method.
According to the above main features, the classifier uses a neural network method.
According to the above main features, the classifier uses a support vector machine method.
According to the above main features, the classifier uses a clustering method.
According to the above main features, the classifier uses a Bayesian method.
Compared with the prior art, the present invention uses incremental learning and transient feature smoothing, which improves classification accuracy.
Description of the drawings
Fig. 1 is an architecture diagram of the training end according to an embodiment of the invention.
Fig. 2 is an architecture diagram of the test end according to an embodiment of the invention.
Embodiments
Specific embodiments of the invention are described below with reference to the accompanying drawings.
Audio is an important medium in multimedia, and audio information retrieval is an important part of multimedia information retrieval technology. In audio retrieval applications, audio data must be classified in order to determine which class the input audio signal belongs to. Common audio classes include speech, background noise, pop music and classical music. Audio content classification is widely used: it plays a decisive role in audio retrieval in particular, and it also plays an important role as an auxiliary means of video content retrieval when multimedia summaries are extracted. More broadly, many speech and audio standards, for example 3GPP AMR-WB and AMR-WB+, use speech/noise classifiers and speech/music classifiers to tell the encoder what kind of audio signal the input is, so that a different encoder can be applied to each kind of signal. Designing a good audio content classification method is therefore important. A typical classification method uses two indispensable modules. One is the audio feature extraction module, which extracts information reflecting the type of audio content from the input audio samples. The other is the classifier, which uses this information to decide the class. Many features of audio content have proved effective for classification, including time-domain features (for example zero-crossing rate, curvature and linear prediction coefficients), frequency-domain features (Mel cepstral coefficients, Fourier transform coefficients, wavelet transform coefficients and so on) and other nonlinear features (fractal parameters, chaos parameters and so on). Many classifiers are widely used in the field of audio content classification. The decision tree (Decision Tree) and the k-nearest-neighbor method (K Nearest Neighbor) are two classifiers that are relatively easy to implement and understand, and they have achieved good results in classifying audio content into speech, environmental noise and music. In addition, the speech/music classifier in the AMR-WB+ standard also uses a decision tree. The support vector machine classifier (Support Vector Machine Classifier), which has been adopted in many areas of machine learning and pattern recognition in recent years, has also proved to be a very effective method. Several other classical classifiers, for example the back-propagation neural network (Back-Propagation Neural Network), the artificial neural network (Artificial Neural Network) and clustering (Clustering) methods, have also proved effective for audio content classification.
In existing classification systems, however, the classifier parameters are fixed and cannot be updated in time, and the acoustic characteristics of sudden events cannot be handled effectively, so these systems cannot meet the requirements of specific environments such as security monitoring. The present invention therefore provides an audio content classification system that overcomes these two defects of existing classifiers: parameters that cannot be updated, and acoustic characteristics of sudden events that cannot be handled effectively.
Fig. 1 shows the architecture of the training end according to an embodiment of the invention. The training end comprises two modules: an audio feature extraction module and a classifier training module. In the present invention, all audio signal processing is performed frame by frame. Suppose each frame of the audio signal read in is denoted $x_1, x_2, \ldots, x_N$. After processing by the feature extraction module, an M-dimensional feature vector $(F_1, F_2, \ldots, F_M)$ is obtained, that is:
$$x_1, x_2, \ldots, x_N \xrightarrow{\text{Feature Extraction}} F_1, F_2, \ldots, F_M$$
In this embodiment the zero-crossing rate (Zero-Crossing Rate) of the signal is used as a feature, computed as follows:
$$F_1 = \mathrm{ZCR} = \sum_{i=1}^{N-1} \operatorname{sgn}(x_i x_{i+1})$$
where $\operatorname{sgn}(x)$ is the sign function, which equals 1 if x is greater than zero, -1 if x is less than zero, and 0 if x equals zero.
Of course, the total energy of the signal can also be used as a feature, computed according to the following formula:
$$F_2 = \mathrm{TE} = \sum_{i=1}^{N} x_i^2$$
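The two features above can be sketched in a few lines of Python. This is only an illustration of the formulas as written, not the patented implementation; the function names `sgn`, `zcr_feature`, `total_energy` and `extract_features` are hypothetical, and a frame is assumed to be a plain list of numeric samples.

```python
def sgn(v):
    """Sign function: 1 for v > 0, -1 for v < 0, 0 for v == 0."""
    return (v > 0) - (v < 0)


def zcr_feature(frame):
    """F1: sum of sgn(x_i * x_{i+1}) over adjacent sample pairs."""
    return sum(sgn(a * b) for a, b in zip(frame, frame[1:]))


def total_energy(frame):
    """F2: sum of squared sample values in the frame."""
    return sum(x * x for x in frame)


def extract_features(frame):
    """Map one frame x_1..x_N to the feature vector (F1, F2)."""
    return (zcr_feature(frame), total_energy(frame))
```

Under the formula as written, each sign change between adjacent samples contributes -1 and each same-sign pair contributes +1, so a frame with many zero crossings yields a strongly negative F1.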
Once the features have been obtained, audio feature extraction is complete. Classification is then performed on the features, that is, processing enters the classifier training module. The classifier training module trains the classifier parameters from the features $(F_1, F_2, \ldots, F_M)$ and the class information of the frame, for use by the test end. Common classifier implementations include decision tree methods, neural network methods, support vector machine methods, clustering methods and Bayesian methods.
Fig. 2 shows the architecture of the test end according to an embodiment of the invention. The test end comprises the audio feature extraction module shared with the training end, a classifier decision module, a transient feature extraction module, a transient feature smoothing module and an incremental learning module. The classifier decision module takes the audio features output by the audio feature extraction module as input. It classifies the first frame using the classifier obtained at the training end, and classifies every frame from the second frame onward using the classifier updated by incremental learning (described in detail below). Its implementations may include decision tree methods, neural network methods, support vector machine methods, clustering methods and Bayesian methods. While the audio feature extraction module extracts audio features from the input audio frame, the transient feature extraction module extracts the transient feature of the same frame and outputs it to the transient feature smoothing module, which corrects the output of the classifier decision module. The transient feature indicates whether the energy of the time-domain samples rises significantly. Different smoothing methods are applied depending on the transient feature: when the current frame is judged to be a transient frame, the second smoothing method is used; otherwise the first smoothing method is used. The first smoothing method is independent of the transient feature, whereas the second smoothing method depends on the transient feature.
In one embodiment, transient feature extraction divides the input audio frame into M segments $B_l$, $l = 1, 2, \ldots, 32$, where:
$$B_l = \{x_{N_l+1}, x_{N_l+2}, \ldots, x_{N_l+32}\}, \quad N_l = \frac{lN}{64}, \quad l = 1, 2, \ldots, 64;$$
Adjacent segments therefore overlap by half. The amplitude sum of each segment, that is, the sum of the absolute values of its sample values, is then computed:
$$M_i = \frac{1}{32} \sum_{n \in B_i} |x_n|, \quad i = 1, 2, \ldots, 64;$$
Next, the energy ratio and the amplitude-energy ratio of each segment relative to the preceding segments are computed:
$$r_l^1 = \frac{E_l}{\min(E_{l-1}, E_{l-2})}, \quad r_l^2 = \frac{\max_{x_i \in B_l} x_i^2}{E_{l-1}}, \quad l \in S, \quad \text{where } E_l = \sum_{n \in B_l} x_n^2$$
The maximum energy ratio and the maximum amplitude-energy ratio are then computed:
$$F_i = \max_l \left( \log r_l^i \right), \quad i = 1, 2,$$
The transient feature can therefore be computed as:
$$F = 0.45 F_1 + 0.55 F_2$$
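The transient feature computation above might be sketched as follows. Several details are assumptions made for illustration because the text does not pin them down: segments are 32 samples long with a hop of half a segment and only segments lying fully inside the frame are used, the index set S is taken to be every segment with two predecessors, the logarithm is natural, and a small constant `eps` guards against division by zero and log of zero. The name `transient_feature` is hypothetical, and the mean amplitude M_i is omitted because the later formulas do not use it.

```python
import math


def transient_feature(frame, seg_len=32, eps=1e-12):
    """Return F = 0.45*F1 + 0.55*F2 for one frame (a list of samples)."""
    hop = seg_len // 2  # assumption: adjacent segments overlap by half
    starts = range(0, len(frame) - seg_len + 1, hop)
    segments = [frame[s:s + seg_len] for s in starts]
    if len(segments) < 3:
        raise ValueError("frame too short: need at least three segments")

    energies = [sum(x * x for x in seg) for seg in segments]  # E_l
    peaks = [max(x * x for x in seg) for seg in segments]     # max x_i^2 in B_l

    log_r1, log_r2 = [], []
    # Assumption: S is every segment that has two preceding segments.
    for l in range(2, len(segments)):
        r1 = energies[l] / (min(energies[l - 1], energies[l - 2]) + eps)
        r2 = peaks[l] / (energies[l - 1] + eps)
        log_r1.append(math.log(r1 + eps))
        log_r2.append(math.log(r2 + eps))

    f1 = max(log_r1)  # maximum log energy ratio
    f2 = max(log_r2)  # maximum log amplitude-energy ratio
    return 0.45 * f1 + 0.55 * f2
```

With these assumptions, a frame whose level jumps partway through produces a clearly larger F than a frame of constant level, which is the behavior the threshold test relies on.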
Once the transient feature has been obtained, it determines which smoothing method is started. The transient feature may be one-dimensional or higher-dimensional, and the decision output takes at least two values, indicating whether the frame is a transient frame or a non-transient frame. In one embodiment, F is compared with a first threshold: if F is greater than the first threshold, the frame is a transient frame and the second smoothing method is applied to the classification results; otherwise the first smoothing method is applied. In one embodiment of the first smoothing method (that is, when the current frame is a non-transient frame), the previous three frames are analyzed first: if their classification results are "non-sudden-event frame, sudden-event frame, non-sudden-event frame", all three frames are smoothed to non-sudden-event frames. In one embodiment of the second smoothing method, when the feature F is greater than a second threshold (usually greater than the first threshold), the three frames before this frame and the three frames after it are all set to sudden-event frames.
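The threshold test and the two smoothing embodiments described above can be sketched like this. The sketch assumes the per-frame decisions are buffered in a list of booleans (True meaning a sudden-event frame), that `t` is the 0-based index of the current frame, and that the second method also marks the current frame itself; the function names and the threshold values used in any call are hypothetical.

```python
def is_transient(F, first_threshold):
    """A frame is a transient frame when F exceeds the first threshold."""
    return F > first_threshold


def first_smoothing(labels, t):
    """Transient-independent smoothing: if the three frames before t read
    non-event, event, non-event, relabel all three as non-event."""
    if t >= 3 and labels[t - 3:t] == [False, True, False]:
        labels[t - 3:t] = [False, False, False]
    return labels


def second_smoothing(labels, t, F, second_threshold):
    """Transient-dependent smoothing: when F exceeds the second threshold,
    mark the three frames before and after t (and t itself, by assumption)
    as event frames."""
    if F > second_threshold:
        for k in range(max(0, t - 3), min(len(labels), t + 4)):
            labels[k] = True
    return labels


def smooth(labels, t, F, first_threshold, second_threshold):
    """Pick the smoothing method from the transient decision for frame t."""
    if is_transient(F, first_threshold):
        return second_smoothing(labels, t, F, second_threshold)
    return first_smoothing(labels, t)
```

Because the second method relabels frames after the current one, a streaming system would need a short look-ahead buffer or would have to apply the relabeling as those frames arrive; the list-based form here sidesteps that choice.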
The incremental learning module uses the class information and feature information of the classified audio frames as a set of incremental learning samples to update the classifier parameters. In one embodiment, the training data saved in advance and the incremental learning samples are combined into a larger training set and the classifier is retrained, thereby updating the classifier parameters.
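The retraining embodiment above can be illustrated with a deliberately simple stand-in classifier. The nearest-centroid classifier below is not one of the classifiers named in the text; it is swapped in only because it keeps the sketch short and dependency-free. The names `train_centroids`, `classify` and `IncrementalClassifier` are hypothetical.

```python
def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, v in enumerate(features):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


class IncrementalClassifier:
    """Keeps the stored training data and retrains after every frame."""

    def __init__(self, training_data):
        self.data = list(training_data)            # training data saved in advance
        self.centroids = train_centroids(self.data)

    def classify_and_learn(self, features):
        label = classify(self.centroids, features)
        # The classified frame becomes an incremental learning sample.
        self.data.append((features, label))
        # Retrain on the enlarged training set to update the parameters.
        self.centroids = train_centroids(self.data)
        return label
```

Retraining from scratch on every frame is the most literal reading of the embodiment; a practical system would likely batch the updates or use a classifier that supports cheaper incremental updates.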
It should be noted that the above describes only some preferred embodiments. In fact, any one or several feature extraction algorithms may be used in all the classifiers described above, and a feature fusion module or a feature dimensionality reduction module may be added to any of them. One preferred approach is to use principal component analysis (PCA) to reduce the feature dimensionality after the features have been extracted and before the classification decision. Any classification method may be used in the classifier; variants include the support vector machine classifier and the neural network classifier. In addition, in the classifiers described above, the transient feature extraction method may be any method, one variant being perceptual entropy. The transient feature extraction method may extract a one-dimensional feature or a high-dimensional feature. The output of the transient frame decision may be a two-valued result or a result with more values, and the transient frame decision may use any method, one variant being the support vector machine method. The classification result smoothing algorithm may likewise be any method.
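As an illustration of the dimensionality reduction step mentioned above, the following sketch projects feature vectors onto their first principal component. It is a minimal pure-Python version that finds the component by power iteration on the covariance matrix; reducing to a single dimension, the iteration count and the function names are all assumptions made for brevity, and a real system would more likely use a numerical library.

```python
def mean_center(rows):
    """Return the column means and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    return mean, [[r[k] - mean[k] for k in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of cov, found by power iteration."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d  # unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # zero covariance: keep the current vector
        v = [x / norm for x in w]
    return v


def project(rows, mean, component):
    """Reduce each feature vector to its coordinate along the component."""
    return [sum((r[k] - mean[k]) * component[k] for k in range(len(mean)))
            for r in rows]
```

A typical use would be to call `mean_center` and `covariance` on the training features, compute the component once, and then apply `project` to every new feature vector before it reaches the classifier decision module.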
In addition, in all the classifiers described above, the incremental learning module may use any incremental learning method.
It will be understood that those of ordinary skill in the art may make equivalent substitutions or changes based on the technical solution and inventive concept of the present invention, and all such changes or substitutions shall fall within the scope of protection of the appended claims.

Claims (14)

1. An audio content classification system comprising a training end and a test end, characterized in that the training end comprises:
an audio feature extraction module for extracting the features of audio test samples;
a classifier training module, which trains the classifier parameters from the audio features collected by the audio feature extraction module and the class information of the audio signal;
and the test end comprises:
the audio feature extraction module shared with the training end;
a classifier decision module, which takes the audio features output by the audio feature extraction module as input and classifies the first frame using the classifier parameters obtained at the training end;
a transient feature extraction module, which extracts the transient features of the input signal and outputs them to a transient feature smoothing module;
the transient feature smoothing module, which corrects the output of the classifier decision module and outputs the result;
an incremental learning module, which uses the class information and feature information of the classified audio frames as a set of incremental learning samples to update the classifier parameters.
2. The audio content classification system as claimed in claim 1, characterized in that: the transient feature extraction module extracts the transient feature of the current frame and makes a decision on it, and the transient feature smoothing module applies a different smoothing method depending on the transient feature; when the current frame is judged to be a transient frame, the second smoothing method is used, and otherwise the first smoothing method is used, wherein the first smoothing method is independent of the transient feature and the second smoothing method depends on the transient feature.
3. The audio content classification system as claimed in claim 2, characterized in that: transient feature extraction divides the input audio frame into M segments $B_l$, $l = 1, 2, \ldots, 32$, where:
$$B_l = \{x_{N_l+1}, x_{N_l+2}, \ldots, x_{N_l+32}\}, \quad N_l = \frac{lN}{64}, \quad l = 1, 2, \ldots, 64;$$
the amplitude sum of each segment, that is, the sum of the absolute values of its sample values, is then computed:
$$M_i = \frac{1}{32} \sum_{n \in B_i} |x_n|, \quad i = 1, 2, \ldots, 64;$$
next, the energy ratio and the amplitude-energy ratio of each segment relative to the preceding segments are computed:
$$r_l^1 = \frac{E_l}{\min(E_{l-1}, E_{l-2})}, \quad r_l^2 = \frac{\max_{x_i \in B_l} x_i^2}{E_{l-1}}, \quad l \in S, \quad \text{where } E_l = \sum_{n \in B_l} x_n^2$$
the maximum energy ratio and the maximum amplitude-energy ratio are then computed:
$$F_i = \max_l \left( \log r_l^i \right), \quad i = 1, 2,$$
the transient feature can therefore be computed as:
$$F = 0.45 F_1 + 0.55 F_2$$
once the transient feature has been obtained, F is compared with a first threshold: if F is greater than the first threshold, the frame is a transient frame and the second smoothing method is used; otherwise the first smoothing method is used.
4. The audio content classification system as claimed in claim 3, characterized in that: the first smoothing method first analyzes the previous three frames, and if their classification results are "non-sudden-event frame, sudden-event frame, non-sudden-event frame", all three frames are smoothed to non-sudden-event frames; and in one embodiment of the second smoothing method, when the feature F is greater than a second threshold, the three frames before this frame and the three frames after it are all set to sudden-event frames.
5. The audio content classification system as claimed in claim 4, characterized in that: the second threshold is greater than the first threshold.
6. The audio content classification system as claimed in claim 1, characterized in that: the classifier parameters are updated by combining the training data saved in advance with the incremental learning samples into a larger training set and retraining the classifier.
7. The audio content classification system as claimed in claim 1, characterized in that: the classifier further comprises a feature fusion module or a feature dimensionality reduction module.
8. The audio content classification system as claimed in claim 7, characterized in that: principal component analysis (PCA) is used to reduce the feature dimensionality after the features have been extracted and before the classification decision.
9. The audio content classification system as claimed in claim 1, characterized in that: the transient feature extraction method is perceptual entropy.
10. The audio content classification system as claimed in any one of claims 1 to 9, characterized in that: the classifier uses a decision tree method.
11. The audio content classification system as claimed in any one of claims 1 to 9, characterized in that: the classifier uses a neural network method.
12. The audio content classification system as claimed in any one of claims 1 to 9, characterized in that: the classifier uses a support vector machine method.
13. The audio content classification system as claimed in any one of claims 1 to 9, characterized in that: the classifier uses a clustering method.
14. The audio content classification system as claimed in any one of claims 1 to 9, characterized in that: the classifier uses a Bayesian method.
CN2008100353510A 2008-03-28 2008-03-28 Classification system for identifying audio content Active CN101546556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100353510A CN101546556B (en) 2008-03-28 2008-03-28 Classification system for identifying audio content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100353510A CN101546556B (en) 2008-03-28 2008-03-28 Classification system for identifying audio content

Publications (2)

Publication Number Publication Date
CN101546556A true CN101546556A (en) 2009-09-30
CN101546556B CN101546556B (en) 2011-03-23

Family

ID=41193649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100353510A Active CN101546556B (en) 2008-03-28 2008-03-28 Classification system for identifying audio content

Country Status (1)

Country Link
CN (1) CN101546556B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103000172A (en) * 2011-09-09 2013-03-27 中兴通讯股份有限公司 Signal classification method and device
CN103251405A (en) * 2013-04-18 2013-08-21 深圳市科曼医疗设备有限公司 Method and system for analyzing arrhythmia
CN103337248A (en) * 2013-05-17 2013-10-02 南京航空航天大学 Airport noise event recognition method based on time series kernel clustering
CN103824557A (en) * 2014-02-19 2014-05-28 清华大学 Audio detecting and classifying method with customization function
WO2015018121A1 (en) * 2013-08-06 2015-02-12 华为技术有限公司 Audio signal classification method and device
CN104731979A (en) * 2015-04-16 2015-06-24 广东欧珀移动通信有限公司 Method and device for storing all exclusive information resources of specific user
CN105788592A (en) * 2016-04-28 2016-07-20 乐视控股(北京)有限公司 Audio classification method and apparatus thereof
CN107154866A (en) * 2017-04-19 2017-09-12 腾讯科技(深圳)有限公司 Realize the method and system of service dynamic configuration
CN107943865A (en) * 2017-11-10 2018-04-20 阿基米德(上海)传媒有限公司 It is a kind of to be suitable for more scenes, the audio classification labels method and system of polymorphic type
CN108388942A (en) * 2018-02-27 2018-08-10 四川云淞源科技有限公司 Information intelligent processing method based on big data
CN109147771A (en) * 2017-06-28 2019-01-04 广州视源电子科技股份有限公司 Audio frequency splitting method and system
CN109166593A (en) * 2018-08-17 2019-01-08 腾讯音乐娱乐科技(深圳)有限公司 audio data processing method, device and storage medium
CN109389989A (en) * 2017-08-07 2019-02-26 上海谦问万答吧云计算科技有限公司 Sound mixing method, device, equipment and storage medium
CN109685045A (en) * 2018-06-25 2019-04-26 鲁东大学 A kind of Moving Targets Based on Video Streams tracking and system
CN110132598A (en) * 2019-05-13 2019-08-16 中国矿业大学 Slewing rolling bearing fault noise diagnostics algorithm
CN110910906A (en) * 2019-11-12 2020-03-24 国网山东省电力公司临沂供电公司 Audio endpoint detection and noise reduction method based on power intranet
CN110959159A (en) * 2017-07-25 2020-04-03 谷歌有限责任公司 Speech classifier
TWI690862B (en) * 2017-10-12 2020-04-11 英屬開曼群島商意騰科技股份有限公司 Local learning system in artificial intelligence device
CN111385688A (en) * 2018-12-29 2020-07-07 安克创新科技股份有限公司 Active noise reduction method, device and system based on deep learning
CN111433843A (en) * 2017-10-27 2020-07-17 谷歌有限责任公司 Unsupervised learning of semantic audio representations
CN111681674A (en) * 2020-06-01 2020-09-18 中国人民大学 Method and system for identifying musical instrument types based on naive Bayes model
CN113920473A (en) * 2021-10-15 2022-01-11 宿迁硅基智能科技有限公司 Complete event determination method, storage medium and electronic device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
SE501305C2 (en) * 1993-05-26 1995-01-09 Ericsson Telefon Ab L M Method and apparatus for discriminating between stationary and non-stationary signals
US6292776B1 (en) * 1999-03-12 2001-09-18 Lucent Technologies Inc. Hierarchial subband linear predictive cepstral features for HMM-based speech recognition
DE10119284A1 (en) * 2001-04-20 2002-10-24 Philips Corp Intellectual Pty Method and system for training parameters of a pattern recognition system assigned to exactly one implementation variant of an inventory pattern
KR100467617B1 (en) * 2002-10-30 2005-01-24 삼성전자주식회사 Method for encoding digital audio using advanced psychoacoustic model and apparatus thereof
CN100422999C (en) * 2006-09-14 2008-10-01 浙江大学 Transmedia searching method based on content correlation

Cited By (33)

Publication number Priority date Publication date Assignee Title
CN103000172A (en) * 2011-09-09 2013-03-27 中兴通讯股份有限公司 Signal classification method and device
CN103251405A (en) * 2013-04-18 2013-08-21 深圳市科曼医疗设备有限公司 Method and system for analyzing arrhythmia
CN103251405B (en) * 2013-04-18 2015-04-08 深圳市科曼医疗设备有限公司 System for analyzing arrhythmia
CN103337248A (en) * 2013-05-17 2013-10-02 南京航空航天大学 Airport noise event recognition method based on time series kernel clustering
CN103337248B (en) * 2013-05-17 2015-07-29 南京航空航天大学 A kind of airport noise event recognition based on time series kernel clustering
US11289113B2 (en) 2013-08-06 2022-03-29 Huawei Technologies Co., Ltd. Linear prediction residual energy tilt-based audio signal classification method and apparatus
WO2015018121A1 (en) * 2013-08-06 2015-02-12 华为技术有限公司 Audio signal classification method and device
US11756576B2 (en) 2013-08-06 2023-09-12 Huawei Technologies Co., Ltd. Classification of audio signal as speech or music based on energy fluctuation of frequency spectrum
US10529361B2 (en) 2013-08-06 2020-01-07 Huawei Technologies Co., Ltd. Audio signal classification method and apparatus
US10090003B2 (en) 2013-08-06 2018-10-02 Huawei Technologies Co., Ltd. Method and apparatus for classifying an audio signal based on frequency spectrum fluctuation
CN103824557B (en) * 2014-02-19 2016-06-15 清华大学 A kind of audio detection sorting technique with custom feature
WO2015124006A1 (en) * 2014-02-19 2015-08-27 清华大学 Audio detection and classification method with customized function
CN103824557A (en) * 2014-02-19 2014-05-28 清华大学 Audio detecting and classifying method with customization function
CN104731979A (en) * 2015-04-16 2015-06-24 广东欧珀移动通信有限公司 Method and device for storing all exclusive information resources of specific user
CN105788592A (en) * 2016-04-28 2016-07-20 乐视控股(北京)有限公司 Audio classification method and apparatus thereof
CN107154866A (en) * 2017-04-19 2017-09-12 腾讯科技(深圳)有限公司 Realize the method and system of service dynamic configuration
CN109147771A (en) * 2017-06-28 2019-01-04 广州视源电子科技股份有限公司 Audio frequency splitting method and system
CN110959159A (en) * 2017-07-25 2020-04-03 谷歌有限责任公司 Speech classifier
CN109389989B (en) * 2017-08-07 2021-11-30 苏州谦问万答吧教育科技有限公司 Sound mixing method, device, equipment and storage medium
CN109389989A (en) * 2017-08-07 2019-02-26 上海谦问万答吧云计算科技有限公司 Sound mixing method, device, equipment and storage medium
TWI690862B (en) * 2017-10-12 2020-04-11 英屬開曼群島商意騰科技股份有限公司 Local learning system in artificial intelligence device
CN111433843A (en) * 2017-10-27 2020-07-17 谷歌有限责任公司 Unsupervised learning of semantic audio representations
CN107943865A (en) * 2017-11-10 2018-04-20 阿基米德(上海)传媒有限公司 It is a kind of to be suitable for more scenes, the audio classification labels method and system of polymorphic type
CN108388942A (en) * 2018-02-27 2018-08-10 四川云淞源科技有限公司 Information intelligent processing method based on big data
CN109685045A (en) * 2018-06-25 2019-04-26 鲁东大学 A kind of Moving Targets Based on Video Streams tracking and system
CN109685045B (en) * 2018-06-25 2020-09-29 鲁东大学 Moving target video tracking method and system
CN109166593A (en) * 2018-08-17 2019-01-08 腾讯音乐娱乐科技(深圳)有限公司 audio data processing method, device and storage medium
CN111385688A (en) * 2018-12-29 2020-07-07 安克创新科技股份有限公司 Active noise reduction method, device and system based on deep learning
CN110132598A (en) * 2019-05-13 2019-08-16 中国矿业大学 Slewing rolling bearing fault noise diagnostics algorithm
CN110910906A (en) * 2019-11-12 2020-03-24 国网山东省电力公司临沂供电公司 Audio endpoint detection and noise reduction method based on power intranet
CN111681674A (en) * 2020-06-01 2020-09-18 中国人民大学 Method and system for identifying musical instrument types based on naive Bayes model
CN111681674B (en) * 2020-06-01 2024-03-08 中国人民大学 Musical instrument type identification method and system based on naive Bayesian model
CN113920473A (en) * 2021-10-15 2022-01-11 宿迁硅基智能科技有限公司 Complete event determination method, storage medium and electronic device

Also Published As

Publication number Publication date
CN101546556B (en) 2011-03-23

Similar Documents

Publication Publication Date Title
CN101546556B (en) Classification system for identifying audio content
Demir et al. A new pyramidal concatenated CNN approach for environmental sound classification
US7457749B2 (en) Noise-robust feature extraction using multi-layer principal component analysis
CN101685634B (en) Children speech emotion recognition method
CN103646649B (en) A kind of speech detection method efficiently
CN109767785A (en) Ambient noise method for identifying and classifying based on convolutional neural networks
EP3701528B1 (en) Segmentation-based feature extraction for acoustic scene classification
CN102436810A (en) Record replay attack detection method and system based on channel mode noise
CN104167208A (en) Speaker recognition method and device
CN101546557B (en) Method for updating classifier parameters for identifying audio content
CN103761965B (en) A kind of sorting technique of instrument signal
CN112528920A (en) Pet image emotion recognition method based on depth residual error network
Koerich et al. Cross-representation transferability of adversarial attacks: From spectrograms to audio waveforms
CN102567512B (en) Method and device for webpage video control by classification
CN103474072A (en) Rapid anti-noise twitter identification method by utilizing textural features and random forest (RF)
KR20210043833A (en) Apparatus and Method for Classifying Animal Species Noise Robust
CN105702251A (en) Speech emotion identifying method based on Top-k enhanced audio bag-of-word model
CN108806725A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN101620852A (en) Speech-emotion recognition method based on improved quadratic discriminant
CN116884435A (en) Voice event detection method and device based on audio prompt learning
Islam et al. DCNN-LSTM based audio classification combining multiple feature engineering and data augmentation techniques
Swaminathan et al. Multi-label classification for acoustic bird species detection using transfer learning approach
CN114626412A (en) Multi-class target identification method and system for unattended sensor system
Shim et al. Attentive max feature map for acoustic scene classification with joint learning considering the abstraction of classes
Xie et al. Image processing and classification procedure for the analysis of australian frog vocalisations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180410

Address after: 300456 1-1-1802-7, north area of financial and Trade Center, No. 6865, Asia Road, Tianjin pilot free trade zone (Dongjiang Bonded Port Area)

Patentee after: Xinji Lease (Tianjin) Co.,Ltd.

Address before: 201203 Shanghai city Zuchongzhi road Pudong New Area Zhangjiang hi tech park, Spreadtrum Center Building 1, Lane 2288

Patentee before: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20090930

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xinji Lease (Tianjin) Co.,Ltd.

Contract record no.: 2018990000196

Denomination of invention: Classification system for identifying audio content

Granted publication date: 20110323

License type: Exclusive License

Record date: 20180801

TR01 Transfer of patent right

Effective date of registration: 20221018

Address after: 201203 Shanghai city Zuchongzhi road Pudong New Area Zhangjiang hi tech park, Spreadtrum Center Building 1, Lane 2288

Patentee after: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Address before: 300456 1-1-1802-7, north area of financial and Trade Center, No. 6865, Asia Road, Tianjin pilot free trade zone (Dongjiang Bonded Port Area)

Patentee before: Xinji Lease (Tianjin) Co.,Ltd.
