CN109036390B - Broadcast keyword recognition method based on ensemble gradient boosting machine - Google Patents

Broadcast keyword recognition method based on ensemble gradient boosting machine

Info

Publication number
CN109036390B
CN109036390B (application CN201810929482.7A)
Authority
CN
China
Prior art keywords
training
keyword
broadcast
data
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810929482.7A
Other languages
Chinese (zh)
Other versions
CN109036390A (en)
Inventor
雒瑞森
龚晓峰
王琛
费绍敏
余勤
王建
冯谦
杨晓梅
任小梅
曾晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201810929482.7A priority Critical patent/CN109036390B/en
Publication of CN109036390A publication Critical patent/CN109036390A/en
Application granted granted Critical
Publication of CN109036390B publication Critical patent/CN109036390B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a broadcast keyword recognition method based on an ensemble gradient boosting machine (GBM). For a single keyword, the method raises the keyword recall rate (one of the most important evaluation indices for keyword recognition, and usually the one most in need of improvement) to above 80% while maintaining an overall accuracy of about 70%. On the F1 score, the standard index for recognition on imbalanced samples, the method improves reliability on the test samples from about 0.04 for a single gradient boosting machine to about 0.31.

Description

Broadcast keyword recognition method based on ensemble gradient boosting machine
Technical Field
The invention relates to the technical field of information acquisition, and in particular to a broadcast keyword recognition method based on an ensemble gradient boosting machine.
Background
Broadcast keyword recognition is mainly applied to broadcast content analysis and has wide applications in information acquisition, efficient data mining, and radio spectrum regulation. Its working principle is to automatically find the segments containing a specific keyword in a broadcast recording and to analyze the broadcast content from those segments. Traditionally, broadcast content analysis has mostly been done manually, which is costly, time-consuming, and error-prone. Automatic broadcast keyword recognition, realized by a reliable algorithm on a computer or an embedded system, reduces cost, improves efficiency, and avoids the errors that manual work may introduce.
The core of broadcast keyword recognition is the algorithm that finds keywords in broadcast segments. Intuitively, one could design a rule-based algorithm that determines keywords from the characteristics of the broadcast passages. However, a broadcast signal is a speech signal carrying a large amount of information with a complex data structure, so simple rule-based methods rarely achieve the expected effect. Besides rule-based methods, and since broadcast is a kind of speech, some conventional designs process it with a general speech recognition system. But broadcast differs from ordinary speech: it contains interference such as special noise and background music, and in radio spectrum management the recognition system must often run offline for confidentiality, so general speech recognition systems struggle to reach an ideal result on broadcast keywords. In addition, broadcast keyword recognition faces heavily imbalanced samples (keywords account for only a small fraction of the whole broadcast), so generic algorithms tend to miss keywords or to misclassify non-keywords as keywords, causing recognition errors.
Disclosure of Invention
Aiming at the above deficiencies in the prior art, the broadcast keyword recognition method based on an ensemble gradient boosting machine solves the problem that broadcast keyword recognition is error-prone.
To achieve the purpose of the invention, the following technical scheme is adopted: a broadcast keyword recognition method based on an ensemble gradient boosting machine, comprising the following steps:
S1, dividing the training broadcast into training broadcast segments of 3-5 s, and performing feature transformation on the segments to obtain the training-data MFCC features;
S2, extracting training samples from the training data according to the MFCC features, and undersampling several groups of random non-keyword samples by random sampling with replacement to obtain several balanced training subsets;
S3, performing Tomek Link denoising on each balanced training subset to obtain denoised balanced training subsets;
S4, training an independent gradient boosting machine model on the denoised balanced training subset of a single keyword via the GBM algorithm to obtain a gradient boosting classifier;
S5, combining the gradient boosting classifiers via the bagging algorithm into an ensemble gradient boosting classifier, and tuning the probability threshold of the ensemble gradient boosting classifier on the training data;
S6, dividing the broadcast under test into 3-5 s segments, and performing feature transformation on the segments to obtain the MFCC (Mel-frequency cepstral coefficient) features of the test data;
S7, feeding the test-data MFCC features into the ensemble gradient boosting classifier for keyword recognition to obtain the recognition result.
Further: the Tomek Link denoising in step S3 is specifically as follows. For every data point x_k in the balanced training subset X other than x_i and x_j, i.e. x_k ∈ X \ {x_i, x_j}, if the distance between x_i and x_j is smaller than both the distance from x_i to x_k and the distance from x_j to x_k, i.e. dist(x_i, x_j) < dist(x_i, x_k) and dist(x_i, x_j) < dist(x_j, x_k), then (x_i, x_j) is a Tomek link; if x_i and x_j in the Tomek link belong to different classes, x_i or x_j is deleted.
Further: the GBM algorithm in step S4 comprises the following specific steps:
S41, let the model F_K(x) be:
F_K(x) = Σ_{k=1}^{K} α_k f_k(x; θ_k)
where f_k(x; θ_k) is the sub-model of step k, α_k is the weight of f_k(x; θ_k), K is the current step number, i.e. the current total number of steps, x is a recording sample, i.e. a training broadcast segment, and θ_k is the parameter set of f_k(x; θ_k);
S42, compute the residual r_{K+1} between the model prediction and the true value at step K+1:
r_{K+1} = -∂L(y, F_K(x)) / ∂F_K(x)
where L(y, F_K(x)) is the loss function, F_K(x) is the model prediction, and y is the true value;
S43, fit the model parameters θ_{K+1} of step K+1:
θ_{K+1} = argmin_θ Σ_i [r_{K+1,i} - f_{K+1}(x_i; θ)]^2
where θ denotes the model parameters and f_{K+1}(x; θ) is the sub-model of step K+1;
S44, compute the weight α_{K+1} of step K+1:
α_{K+1} = argmin_α Σ_i L(y_i, F_K(x_i) + α f_{K+1}(x_i; θ_{K+1}))
where α is the weight coefficient;
S45, iterate the model F_K(x) to obtain the updated model F_{K+1}(x), i.e. the gradient boosting classifier:
F_{K+1}(x) = F_K(x) + α_{K+1} f_{K+1}(x; θ_{K+1}).
further: the specific adjusting method of the integrated gradient boost classifier probability threshold in step S5 is as follows:
S51, compute the probability that a predicted instance belongs to the keyword class:
H(x_i) = Σ_{t=1}^{T} α_t H_t(x_i)
where T is the number of balanced training subsets, H_t(x_i) is the prediction of a single classifier, and α_t is the weight, taken as α_t = 1/T;
S52, judge whether the sample contains a keyword:
label(x_i) = keyword if H(x_i) ≥ δ, otherwise non-keyword
where δ is the adjustable probability threshold;
S53, output the keyword/non-keyword decision by adjusting the probability threshold on the keyword-class probabilities of the instances.
The beneficial effects of the invention are as follows: the method raises the keyword recall rate (one of the most important evaluation indices for keyword recognition, and usually the one most in need of improvement) to above 80% while maintaining an overall accuracy of about 70%. On the F1 score, the standard index for recognition on imbalanced samples, the method improves reliability on the test samples from about 0.04 for a single gradient boosting machine to about 0.31.
Drawings
FIG. 1 is a flow chart of the invention;
FIG. 2 shows the performance of a single reference gradient boosting machine trained on the complete data set;
FIG. 3 shows the performance of a single gradient boosting machine with the undersampling scheme;
FIG. 4 shows the performance of an ensemble of 5 gradient boosting machines;
FIG. 5 shows the performance of an ensemble of 10 gradient boosting machines.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the invention, but the invention is not limited to the scope of these embodiments. Various changes apparent to those skilled in the art that do not depart from the spirit and scope of the invention as defined in the appended claims, and everything produced using the inventive concept, are protected.
As shown in fig. 1, a broadcast keyword recognition method based on an ensemble gradient boosting machine comprises the following steps.
S1, divide the training broadcast into training broadcast segments of 3-5 s, and perform feature transformation on the segments to obtain the training-data MFCC features.
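The 3-5 s windowing of step S1 can be sketched as follows. This is an illustrative sketch with hypothetical names, not the patent's implementation; the MFCC transformation itself is omitted (it would typically be computed with a library routine such as librosa.feature.mfcc).

```python
import numpy as np

def split_into_segments(audio, sr, seg_seconds=5):
    """Split a 1-D audio signal into fixed-length segments; a trailing
    partial segment is dropped."""
    seg_len = int(sr * seg_seconds)
    n_segs = len(audio) // seg_len
    return [audio[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]

sr = 16000                          # assumed sample rate
audio = np.zeros(sr * 12)           # 12 s of (silent) audio as a stand-in
segments = split_into_segments(audio, sr, seg_seconds=5)
print(len(segments))                # 12 s of audio in 5 s windows -> 2 segments
```

Each returned segment would then be passed through the MFCC feature transformation before training.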
S2, extract training samples from the training data according to the MFCC features, and undersample several groups of random non-keyword samples by random sampling with replacement to obtain several balanced training subsets. Because keyword data typically accounts for a small percentage of the data set, its amount is limited; undersampling the keyword data further would leave too few keywords. Conversely, since non-keyword data is plentiful, randomly sampling the non-keywords with replacement keeps the retained non-keyword sample size comparable to the keyword sample size, which reduces the imbalance problem without damaging the manifold of the keyword data distribution. Ideally, if the number of keyword instances is m_k, then to obtain balanced samples the number p of sampled non-keyword instances would be set to m_k. However, because the subsequent Tomek Link denoising removes some non-keyword instances overlapping the keyword class, somewhat more non-keyword instances are sampled at the start, i.e. p is set slightly above m_k.
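The balanced-subset construction of step S2 can be sketched as follows. The subset count and the oversampling margin left for the later Tomek Link cleaning are illustrative assumptions; the patent specifies the exact factor in an equation not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_subsets(X, y, n_subsets=5, oversample_factor=1.2):
    """Build several balanced subsets: all keyword samples (y == 1) plus
    non-keyword samples (y == 0) drawn at random with replacement.
    oversample_factor > 1 is a hypothetical margin so that Tomek Link
    cleaning can later remove overlapping non-keyword points."""
    kw_idx = np.flatnonzero(y == 1)
    non_idx = np.flatnonzero(y == 0)
    p = int(len(kw_idx) * oversample_factor)    # slightly more than m_k
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.choice(non_idx, size=p, replace=True)
        idx = np.concatenate([kw_idx, drawn])
        subsets.append((X[idx], y[idx]))
    return subsets

# toy data: 10 keyword samples among 100
X = np.arange(100, dtype=float).reshape(100, 1)
y = np.array([1] * 10 + [0] * 90)
subs = balanced_subsets(X, y, n_subsets=3)
print(len(subs), int(subs[0][1].sum()))   # 3 subsets, each keeping all 10 keywords
```

Each subset then goes through Tomek Link cleaning (step S3) before a classifier is trained on it.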
S3, perform Tomek Link denoising on each balanced training subset to obtain denoised balanced training subsets. The Tomek Link denoising is specifically as follows: for every data point x_k in the balanced training subset X other than x_i and x_j, i.e. x_k ∈ X \ {x_i, x_j}, if the distance between x_i and x_j is smaller than both the distance from x_i to x_k and the distance from x_j to x_k, i.e. dist(x_i, x_j) < dist(x_i, x_k) and dist(x_i, x_j) < dist(x_j, x_k), then (x_i, x_j) is a Tomek link; if x_i and x_j in the Tomek link belong to different classes, x_i or x_j is deleted.
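A minimal sketch of Tomek Link cleaning, assuming Euclidean distance and removal of the majority-class (non-keyword) member of each link; the function names are hypothetical, not from the patent.

```python
import numpy as np

def tomek_link_clean(X, y, majority_label=0):
    """Remove the majority-class member of every Tomek link.
    A pair (i, j) is a Tomek link when each point is the other's
    nearest neighbour and their labels differ."""
    # pairwise Euclidean distances, self-distance masked out
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:          # mutual NN, different classes
            drop.add(i if y[i] == majority_label else j)
    keep = np.array([k for k in range(len(y)) if k not in drop])
    return X[keep], y[keep]

X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
y = np.array([1,     0,     1,     1,     0])
Xc, yc = tomek_link_clean(X, y)
print(len(yc))   # the (0.0, 0.1) pair is a Tomek link; the non-keyword 0.1 is removed
```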
S4, train an independent gradient boosting machine model on the denoised balanced training subset of a single keyword via the GBM algorithm to obtain a gradient boosting classifier. The GBM algorithm comprises the following specific steps:
S41, let the model F_K(x) be:
F_K(x) = Σ_{k=1}^{K} α_k f_k(x; θ_k)
where f_k(x; θ_k) is the sub-model of step k, α_k is the weight of f_k(x; θ_k), K is the current step number, i.e. the current total number of steps, x is a recording sample, i.e. a training broadcast segment, and θ_k is the parameter set of f_k(x; θ_k);
S42, compute the residual r_{K+1} between the model prediction and the true value at step K+1:
r_{K+1} = -∂L(y, F_K(x)) / ∂F_K(x)
where L(y, F_K(x)) is the loss function, F_K(x) is the model prediction, and y is the true value;
S43, fit the model parameters θ_{K+1} of step K+1:
θ_{K+1} = argmin_θ Σ_i [r_{K+1,i} - f_{K+1}(x_i; θ)]^2
where θ denotes the model parameters and f_{K+1}(x; θ) is the sub-model of step K+1;
S44, compute the weight α_{K+1} of step K+1:
α_{K+1} = argmin_α Σ_i L(y_i, F_K(x_i) + α f_{K+1}(x_i; θ_{K+1}))
where α is the weight coefficient;
S45, iterate the model F_K(x) to obtain the updated model F_{K+1}(x), i.e. the gradient boosting classifier:
F_{K+1}(x) = F_K(x) + α_{K+1} f_{K+1}(x; θ_{K+1}).
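The S41-S45 loop can be illustrated with a toy gradient boosting machine using squared loss and one-split decision stumps; under squared loss the negative gradient r_{K+1} is simply the residual y - F_K(x). This is a didactic sketch with a fixed learning rate standing in for α, not the xgboost implementation used in the experiments.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split stump minimising squared error against residuals r."""
    best = (np.inf, None, r.mean(), r.mean())
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, s, left.mean(), right.mean())
    return best[1:]                      # (split, left_value, right_value)

def stump_predict(stump, x):
    s, lv, rv = stump
    return np.where(x <= s, lv, rv)

def gbm_fit(x, y, n_steps=20, lr=0.5):
    """Each step fits a stump to the negative gradient of the squared loss,
    i.e. the residual y - F_K(x), then updates F (steps S42-S45)."""
    F = np.zeros_like(y, dtype=float)
    stumps = []
    for _ in range(n_steps):
        r = y - F                        # negative gradient of 1/2 (y - F)^2
        stump = fit_stump(x, r)
        F += lr * stump_predict(stump, x)
        stumps.append(stump)
    return stumps

def gbm_predict(stumps, x, lr=0.5):
    return sum(lr * stump_predict(st, x) for st in stumps)

x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([0., 0., 0., 1., 1., 1.])   # toy 0/1 targets
model = gbm_fit(x, y)
print(np.round(gbm_predict(model, x), 2))   # converges to [0. 0. 0. 1. 1. 1.]
```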
S5, combine the gradient boosting classifiers via the bagging method into an ensemble gradient boosting classifier, and tune its probability threshold on the training data. The specific tuning method is as follows:
S51, compute the probability that a predicted instance belongs to the keyword class:
H(x_i) = Σ_{t=1}^{T} α_t H_t(x_i)
where T is the number of balanced training subsets, H_t(x_i) is the prediction of a single classifier, and α_t is the weight, taken as α_t = 1/T;
S52, judge whether the sample contains a keyword:
label(x_i) = keyword if H(x_i) ≥ δ, otherwise non-keyword
where δ is the adjustable probability threshold;
S53, output the keyword/non-keyword decision by adjusting the probability threshold on the keyword-class probabilities of the instances; δ is generally taken as 0.5.
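The equal-weight ensemble vote of S51 and the threshold rule of S52 can be sketched as follows; the classifier outputs below are hypothetical numbers for illustration.

```python
import numpy as np

def ensemble_keyword_prob(probs_per_clf):
    """S51 with alpha_t = 1/T: equal-weight average of the T classifiers'
    keyword probabilities."""
    return np.mean(probs_per_clf, axis=0)

def predict_keyword(probs_per_clf, delta=0.5):
    """S52: label a segment as keyword (1) when the averaged probability
    reaches the threshold delta."""
    return (ensemble_keyword_prob(probs_per_clf) >= delta).astype(int)

# hypothetical outputs of T = 3 classifiers on 4 segments
probs = np.array([[0.9, 0.2, 0.6, 0.1],
                  [0.8, 0.3, 0.4, 0.2],
                  [0.7, 0.1, 0.5, 0.3]])
print(predict_keyword(probs, delta=0.5))   # [1 0 1 0]
```

Sweeping delta over a grid (the experiments use the open interval (0, 1) in steps of 0.05) and scoring each value on a validation set is how the threshold of S53 is chosen.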
S6, divide the broadcast under test into 3-5 s segments, and perform feature transformation on the segments to obtain the MFCC features of the test data.
S7, feed the test-data MFCC features into the ensemble gradient boosting classifier for keyword recognition to obtain the recognition result.
In one embodiment of the invention, to demonstrate its effectiveness, we assembled a dataset of 133 radio broadcast recordings. A small fraction of these contain the keyword "Beijing time" (Mandarin), and the goal is to identify this keyword in the broadcast segments. All broadcast audio was split into 5-second segments, yielding 6906 recordings in total, of which 197 contain the keyword.
Since the labels in broadcast keyword recognition are highly skewed (the keyword/non-keyword sample sizes are unbalanced), a classifier can reach high accuracy simply by predicting every instance as non-keyword, so plain accuracy cannot adequately represent the quality of an algorithm. In imbalanced-label classification, precision and recall are usually adopted instead. Denoting by TP, FP, TN, FN the counts of true-positive, false-positive, true-negative, and false-negative classifications, the precision and recall of the positive class are:
precision = TP / (TP + FP),    recall = TP / (TP + FN)
the same method can be applied to the negative category. For a specific positive or negative class, we can calculate the F1 score for this algorithm:
Figure BDA0001766194550000081
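The precision, recall, and F1 definitions above can be checked numerically; the TP/FP/FN counts below are arbitrary illustrative values.

```python
def precision_recall_f1(tp, fp, fn):
    """precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))   # 0.75 0.6 0.667
```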
in the experiment, we focused on four evaluation indices: overall classification accuracy, recall of non-keywords, recall of keywords, and F1 score for keyword classes. We take the F1 score of the keyword class as the overall evaluation index because our task is to identify keywords. The output of our model is the probability of the example being a few classes (key, label 1), whose delta value we can adjust to obtain the best prediction result. In our experiments, the tested delta values ranged from 0 to 1 (open interval) with a precision length of 0.05. As shown in fig. 2, the variation of classification accuracy, majority (non-keyword) recall, and minority (keyword) recall is demonstrated. 4 different models were tested in the experiment based on a gradient elevator classifier and the parameters were optimally adjusted by validation. The reference model is a single gradient elevator (xgboost implementation) classifier. In the figure, the x-axis represents the delta value and the y-axis represents the precision/recall. As can be seen from fig. 2, the keyword recall rate and the precision rate are the highest in the second graph (training set), and the keyword recall rate is greatly reduced in both the verification set and the test set, indicating that the model has the problem of overfitting.
The second model tested is a single gradient boosting classifier with undersampling. It can be interpreted as a "single-model ensemble": it benefits from undersampling and Tomek Link denoising, but not from ensemble-based classification. As fig. 3 shows, the overfitting problem is alleviated and the classifier no longer predicts most instances as non-keywords. Although the recall of non-keyword data decreases, the overall performance improves.
Finally, ensembles of 5 and 10 gradient boosting classifiers based on the bagging algorithm, following the technique proposed above, were tested on the same dataset; the results are shown in figs. 4 and 5. Two improvements are visible in these figures. First, the combined keyword/non-keyword recall increases greatly. Second, as more classifiers are added to the ensemble, the impact of different δ values on performance becomes significant. δ can be chosen by validation, giving the best output and a prediction confidence for each instance.
Table 1 shows the best F1 score on the minority-class (keyword) test data and the precision/recall at that score. An additional "balanced F1 score" index is included, meaning the F1 score computed assuming equal numbers of keyword and non-keyword instances; it further emphasizes successful identification of keyword-class data and its retrieval recall.
TABLE 1
(Table 1 is rendered as an image in the original document.)

Claims (4)

1. A broadcast keyword recognition method based on an ensemble gradient boosting machine, characterized by comprising the following steps:
S1, dividing the training broadcast into training broadcast segments of 3-5 s, and performing feature transformation on the segments to obtain the training-data MFCC features;
S2, extracting training samples from the training data according to the MFCC features, and undersampling several groups of random non-keyword samples by random sampling with replacement to obtain several balanced training subsets;
S3, performing Tomek Link denoising on each balanced training subset to obtain denoised balanced training subsets;
S4, training an independent gradient boosting machine model on the denoised balanced training subset of a single keyword via the GBM algorithm to obtain a gradient boosting classifier;
S5, combining the gradient boosting classifiers via the bagging algorithm into an ensemble gradient boosting classifier, and tuning the probability threshold of the ensemble gradient boosting classifier on the training data;
S6, dividing the broadcast under test into 3-5 s segments, and performing feature transformation on the segments to obtain the MFCC (Mel-frequency cepstral coefficient) features of the test data;
S7, feeding the test-data MFCC features into the ensemble gradient boosting classifier for keyword recognition to obtain the recognition result.
2. The ensemble gradient boosting machine-based broadcast keyword recognition method of claim 1, wherein the Tomek Link denoising in step S3 is specifically: for every data point x_k in the balanced training subset X other than x_i and x_j, i.e. x_k ∈ X \ {x_i, x_j}, if the distance between x_i and x_j is smaller than both the distance from x_i to x_k and the distance from x_j to x_k, i.e. dist(x_i, x_j) < dist(x_i, x_k) and dist(x_i, x_j) < dist(x_j, x_k), then (x_i, x_j) is a Tomek link; if x_i and x_j in the Tomek link belong to different classes, x_i or x_j is deleted.
3. The ensemble gradient boosting machine-based broadcast keyword recognition method of claim 1, wherein the GBM algorithm in step S4 comprises the following specific steps:
S41, let the model F_K(x) be:
F_K(x) = Σ_{k=1}^{K} α_k f_k(x; θ_k)
where f_k(x; θ_k) is the sub-model of step k, α_k is the weight of f_k(x; θ_k), K is the current step number, i.e. the current total number of steps, x is a recording sample, i.e. a training broadcast segment, and θ_k is the parameter set of f_k(x; θ_k);
S42, compute the residual r_{K+1} between the model prediction and the true value at step K+1:
r_{K+1} = -∂L(y, F_K(x)) / ∂F_K(x)
where L(y, F_K(x)) is the loss function, F_K(x) is the model prediction, and y is the true value;
S43, fit the model parameters θ_{K+1} of step K+1:
θ_{K+1} = argmin_θ Σ_i [r_{K+1,i} - f_{K+1}(x_i; θ)]^2
where θ denotes the model parameters and f_{K+1}(x; θ) is the sub-model of step K+1;
S44, compute the weight α_{K+1} of step K+1:
α_{K+1} = argmin_α Σ_i L(y_i, F_K(x_i) + α f_{K+1}(x_i; θ_{K+1}))
where α is the weight coefficient;
S45, iterate the model F_K(x) to obtain the updated model F_{K+1}(x), i.e. the gradient boosting classifier:
F_{K+1}(x) = F_K(x) + α_{K+1} f_{K+1}(x; θ_{K+1}).
4. the integrated gradient elevator-based broadcast keyword recognition method as claimed in claim 1, wherein the specific adjustment method of the integrated gradient elevator classifier probability threshold in step S5 is as follows:
s51, calculating the probability of the prediction examples in the keyword class, wherein the calculation formula is as follows:
Figure FDA0001766194540000025
in the above formula, T is the number of the balance training subsets, Ht(xi) As a result of prediction of a single classifier, alphatTaking alpha as the weight employedt=1/T;
S52, judging whether the sample contains keywords or not, wherein the judgment formula is as follows:
Figure FDA0001766194540000031
in the above formula, δ is an adjustable probability threshold;
and S53, outputting the probability threshold value of the determined keywords/non-keywords through probability adjustment of the example in the keyword class.
CN201810929482.7A 2018-08-15 2018-08-15 Broadcast keyword recognition method based on ensemble gradient boosting machine Active CN109036390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810929482.7A CN109036390B (en) 2018-08-15 2018-08-15 Broadcast keyword recognition method based on ensemble gradient boosting machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810929482.7A CN109036390B (en) 2018-08-15 2018-08-15 Broadcast keyword recognition method based on ensemble gradient boosting machine

Publications (2)

Publication Number Publication Date
CN109036390A CN109036390A (en) 2018-12-18
CN109036390B (en) 2022-07-08

Family

ID=64631548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810929482.7A Active CN109036390B (en) 2018-08-15 2018-08-15 Broadcast keyword identification method based on integrated gradient elevator

Country Status (1)

Country Link
CN (1) CN109036390B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110299133B (en) * 2019-07-03 2021-05-28 四川大学 Method for judging illegal broadcast based on keyword

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682642A (en) * 2017-01-06 2017-05-17 竹间智能科技(上海)有限公司 Multi-language-oriented behavior identification method and multi-language-oriented behavior identification system
CN108010527A (en) * 2017-12-19 2018-05-08 深圳市欧瑞博科技有限公司 Audio recognition method, device, computer equipment and storage medium
CN108257593A (en) * 2017-12-29 2018-07-06 深圳和而泰数据资源与云技术有限公司 A kind of audio recognition method, device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9031829B2 (en) * 2013-02-08 2015-05-12 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682642A (en) * 2017-01-06 2017-05-17 竹间智能科技(上海)有限公司 Multi-language-oriented behavior identification method and multi-language-oriented behavior identification system
CN108010527A (en) * 2017-12-19 2018-05-08 深圳市欧瑞博科技有限公司 Audio recognition method, device, computer equipment and storage medium
CN108257593A (en) * 2017-12-29 2018-07-06 深圳和而泰数据资源与云技术有限公司 A kind of audio recognition method, device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Data Mining on Imbalanced Data Sets";Qiong Gu等;《2008 International Conference on Advanced Computer Theory and Engineering》;20081231;全文 *
"Acoustic scene classification by ensembling gradient boosting machine and convolutional neural networks";E Fonseca等;《Detection and Classification of Acoustic Scenes and Events 2017》;20171016;全文 *
"集成学习中有关算法的研究";张春霞;《中国博士学位论文全文数据库》;20101231;全文 *

Also Published As

Publication number Publication date
CN109036390A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN102799899B (en) Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
CN111243602B (en) Voiceprint recognition method based on gender, nationality and emotion information
CN110188047B (en) Double-channel convolutional neural network-based repeated defect report detection method
CN102982804A (en) Method and system of voice frequency classification
CN106376002B (en) Management method and device and spam monitoring system
Muscariello et al. Audio keyword extraction by unsupervised word discovery
JPH08512148A (en) Topic discriminator
CN102915729B (en) Speech keyword spotting system and system and method of creating dictionary for the speech keyword spotting system
CN102915728B (en) Sound segmentation device and method and speaker recognition system
US11481707B2 (en) Risk prediction system and operation method thereof
CN110428845A (en) Composite tone detection method, system, mobile terminal and storage medium
CN106910495A (en) A kind of audio classification system and method for being applied to abnormal sound detection
CN112417132B (en) New meaning identification method for screening negative samples by using guest information
CN103077720A (en) Speaker identification method and system
CN108335699A (en) A kind of method for recognizing sound-groove based on dynamic time warping and voice activity detection
CN115457966B (en) Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion
CN104240719A (en) Feature extraction method and classification method for audios and related devices
CN106650446A (en) Identification method and system of malicious program behavior, based on system call
CN112256849A (en) Model training method, text detection method, device, equipment and storage medium
CN109036390B (en) Broadcast keyword recognition method based on ensemble gradient boosting machine
US9697825B2 (en) Audio recording triage system
CN104239372B (en) A kind of audio data classification method and device
CN109979482B (en) Audio evaluation method and device
CN113792141A (en) Feature selection method based on covariance measurement factor
CN112967712A (en) Synthetic speech detection method based on autoregressive model coefficient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant