CN109036390A - Broadcast keyword recognition method based on an ensemble gradient boosting machine

Broadcast keyword recognition method based on an ensemble gradient boosting machine

Info

Publication number
CN109036390A
Authority
CN
China
Prior art keywords
broadcast
keyword
data
training
classifier
Prior art date
Legal status
Granted
Application number
CN201810929482.7A
Other languages
Chinese (zh)
Other versions
CN109036390B (en)
Inventor
雒瑞森
龚晓峰
王琛
费绍敏
余勤
王建
冯谦
杨晓梅
任小梅
曾晓东
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201810929482.7A
Publication of CN109036390A
Application granted
Publication of CN109036390B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a broadcast keyword recognition method based on an ensemble gradient boosting machine. For a single keyword, the method raises keyword recall (one of the most important evaluation metrics for keyword recognition, and usually the one most in need of improvement) to above 80% while maintaining an overall accuracy of about 70%. On the F1 score, the standard metric for recognition on imbalanced samples, the method improves the test-set result from about 0.04 for a single gradient boosting machine to about 0.31, greatly improving the reliability of recognition.

Description

Broadcast keyword recognition method based on an ensemble gradient boosting machine
Technical field
The present invention relates to the field of information acquisition technology, and in particular to a broadcast keyword recognition method based on an ensemble gradient boosting machine.
Background technique
Broadcast keyword recognition is mainly used in broadcast content analysis, and is widely applied in acquiring information related to broadcast content, efficient data mining, and radio spectrum monitoring. It works by automatically locating the segments that contain specific keywords within a recorded broadcast program and then analysing the broadcast content based on those keyword segments. Traditionally, broadcast content analysis has mostly been done manually, which is costly, time-consuming and error-prone. Automatic broadcast keyword recognition, by contrast, relies on reliable algorithms running on computers or integrated systems, thereby reducing cost, improving efficiency and avoiding the mistakes that manual work is prone to.
The core of broadcast keyword recognition is the algorithm that finds keywords in broadcast segments. Intuitively, one could design a rule-based algorithm that identifies keywords from features of a broadcast passage. However, broadcast signals are a type of speech signal with a large amount of information and a complex data structure, so simple rule-based methods usually fail to reach the expected performance. Besides rule-based methods, previous work has also used speech recognition systems, since broadcast signals are a kind of speech signal. But broadcast signals often differ considerably from ordinary speech: they contain special noise and background music, and in radio spectrum monitoring the recognition system often has to run offline for confidentiality reasons, so general speech recognition systems also struggle to achieve good results on broadcast keyword recognition. In addition, broadcast keyword recognition usually faces highly imbalanced samples (keywords account for only a very small fraction of the whole broadcast), so general algorithms easily miss keywords or misclassify non-keywords as keywords, leading to recognition errors.
Summary of the invention
To address the above deficiencies of the prior art, the present invention provides a broadcast keyword recognition method based on an ensemble gradient boosting machine, which solves the problem that broadcast keyword recognition is error-prone.
To achieve the above object, the technical solution adopted by the present invention is a broadcast keyword recognition method based on an ensemble gradient boosting machine, comprising the following steps:
S1. Split the training broadcast into training broadcast segments of 3-5 s, and apply a feature transform to each training broadcast segment to obtain training-data MFCC features;
S2. Extract training samples from the training data according to the training-data MFCC features, and randomly undersample the non-keyword samples by random sampling with replacement to obtain multiple balanced training subsets;
S3. Apply Tomek Link noise reduction to each balanced training subset to obtain denoised balanced training subsets;
S4. Train an independent gradient boosting machine model for each single keyword on the denoised balanced training subsets with the GBM algorithm to obtain gradient boosting classifiers;
S5. Combine the gradient boosting classifiers with a bagging algorithm to obtain an ensemble gradient boosting classifier, and adjust the probability threshold of the ensemble gradient boosting classifier on the training data;
S6. Split the broadcast to be tested into test broadcast segments of 3-5 s, and apply the feature transform to each test broadcast segment to obtain test-data MFCC features;
S7. Feed the test-data MFCC features into the ensemble gradient boosting classifier for keyword recognition to obtain the recognition result.
Further, the Tomek Link noise reduction in step S3 is as follows: for a data point x_k in the balanced training subset X other than data points x_i and x_j, i.e. x_k ∈ X \ {x_i, x_j}, if its distance to x_i and its distance to x_j are both greater than the distance between x_i and x_j, i.e. dist(x_i, x_j) < dist(x_i, x_k) and dist(x_i, x_j) < dist(x_j, x_k), then x_i and x_j form a Tomek Link pair; if the data points x_i and x_j of this Tomek Link belong to different classes, delete x_i or x_j.
Further, the specific steps of the GBM algorithm in step S4 are as follows:
S41. Let the model F_K(x) be:
F_K(x) = Σ_{k=1}^{K} α_k · f_k(x; θ_k)
where f_k(x; θ_k) is the sub-model of step k, α_k is the weight of f_k(x; θ_k), K is the current step index, i.e. the total number of steps run so far, x is a speech sample, i.e. a training broadcast segment, and θ_k is the parameter set of f_k(x; θ_k);
S42. Compute the residual r_{K+1} between the model prediction of step K+1 and the true value, i.e. the negative gradient of the loss evaluated at each training example:
r_{K+1} = -∂L(y, F_K(x)) / ∂F_K(x)
where L(y, F_K(x)) is the loss function, F_K(x) is the model prediction and y is the true value;
S43. Fit the model parameters θ_{K+1} of step K+1:
θ_{K+1} = argmin_θ Σ_i [ r_{K+1,i} - f_{K+1}(x_i; θ) ]^2
where θ denotes the model parameters and f_{K+1}(x; θ) is the sub-model of step K+1;
S44. Compute the weight α_{K+1} of step K+1:
α_{K+1} = argmin_α Σ_i L( y_i, F_K(x_i) + α · f_{K+1}(x_i; θ_{K+1}) )
where α is the weight coefficient;
S45. Update the model F_K(x) iteratively to obtain the optimized model F_{K+1}(x), i.e. the gradient boosting classifier:
F_{K+1}(x) = F_K(x) + α_{K+1} · f_{K+1}(x; θ_{K+1}).
Further, the probability threshold of the ensemble gradient boosting classifier in step S5 is adjusted as follows:
S51. Compute the predicted probability that an example belongs to the keyword class:
P(x_i) = Σ_{t=1}^{T} α_t · H_t(x_i)
where T is the number of balanced training subsets, H_t(x_i) is the prediction of a single classifier and α_t is its weight, taken as α_t = 1/T;
S52. Decide whether the sample contains a keyword:
ŷ_i = 1 if P(x_i) ≥ δ, otherwise ŷ_i = 0
where δ is an adjustable probability threshold;
S53. Adjust the probability threshold at which the keyword-class probability of an example is output as keyword or non-keyword.
The beneficial effects of the invention are as follows: the invention raises keyword recall (one of the most important evaluation metrics for keyword recognition, and usually the one most in need of improvement) to above 80% while maintaining an overall accuracy of about 70%. On the F1 score, the standard metric for recognition on imbalanced samples, the invention improves the test-set result from about 0.04 for a single gradient boosting machine to about 0.31, greatly improving the reliability of recognition.
Detailed description of the invention
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a performance diagram of a single baseline gradient boosting machine trained on the complete data set;
Fig. 3 is a performance diagram of a single gradient boosting machine using the undersampling scheme;
Fig. 4 is a performance diagram of an ensemble of 5 gradient boosting machines;
Fig. 5 is a performance diagram of an ensemble of 10 gradient boosting machines.
Specific embodiment
A specific embodiment of the invention is described below to help those skilled in the art understand the invention. It should be clear that the invention is not limited to the scope of this specific embodiment: to those skilled in the art, various changes within the spirit and scope of the invention as defined and determined by the appended claims are obvious, and all innovations that make use of the inventive concept fall under its protection.
As shown in Fig. 1, a broadcast keyword recognition method based on an ensemble gradient boosting machine comprises the following steps:
S1. Split the training broadcast into training broadcast segments of 3-5 s, and apply a feature transform to each training broadcast segment to obtain training-data MFCC features.
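As an illustration of step S1, the sketch below cuts a recording into fixed-length clips and computes MFCC features with librosa. The 4 s segment length (within the 3-5 s range above), the MFCC order and the mean/std pooling of frame-level coefficients into one vector per segment are assumptions for illustration, not prescribed by the patent.

```python
# Minimal sketch of step S1: segment a broadcast recording and extract MFCC features.
# Segment length, MFCC order and the pooling scheme are illustrative assumptions.
import numpy as np
import librosa

def broadcast_to_mfcc(wav_path, segment_s=4.0, sr=16000, n_mfcc=13):
    """Cut one broadcast recording into fixed-length segments and return one MFCC vector per segment."""
    audio, sr = librosa.load(wav_path, sr=sr, mono=True)
    seg_len = int(segment_s * sr)
    features = []
    for start in range(0, len(audio) - seg_len + 1, seg_len):
        segment = audio[start:start + seg_len]
        mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
        # Pool frame-level coefficients into a fixed-length vector per segment.
        features.append(np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]))
    return np.vstack(features) if features else np.empty((0, 2 * n_mfcc))
```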
S2. Extract training samples from the training data according to the training-data MFCC features, and randomly undersample the non-keyword samples by random sampling with replacement to obtain multiple balanced training subsets. Keyword data generally accounts for only a small fraction of the data set and is limited in quantity, so undersampling the keyword data would leave too few keyword samples. Instead, since the non-keyword data is plentiful, we sample it at random with replacement so that the number of retained non-keyword samples is comparable to the number of keyword samples; this reduces the imbalance without destroying the manifold of the keyword data distribution. Ideally, if the number of keyword samples is m_k, then to obtain a balanced sample we can set the number of sampled non-keyword samples to p = m_k. However, since some non-keyword examples overlapping the keyword class will later be deleted by the Tomek Link noise reduction, we need to sample somewhat more non-keyword examples at the beginning.
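A possible sketch of step S2: keep all keyword samples in every subset and draw non-keyword samples at random with replacement. The number of subsets and the oversampling factor used to leave headroom for the Tomek Link deletions in step S3 are illustrative assumptions.

```python
# Sketch of step S2: build several balanced training subsets by randomly
# undersampling the non-keyword class via sampling with replacement.
import numpy as np

def make_balanced_subsets(X, y, n_subsets=5, extra=1.2, seed=0):
    """y == 1 marks keyword segments; 'extra' slightly over-draws non-keywords
    to leave headroom for the Tomek Link deletions of step S3 (assumed factor)."""
    rng = np.random.default_rng(seed)
    kw_idx = np.flatnonzero(y == 1)
    non_idx = np.flatnonzero(y == 0)
    p = int(len(kw_idx) * extra)          # number of non-keyword samples to draw
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.choice(non_idx, size=p, replace=True)
        idx = np.concatenate([kw_idx, drawn])
        subsets.append((X[idx], y[idx]))
    return subsets
```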
S3. Apply Tomek Link noise reduction to each balanced training subset to obtain denoised balanced training subsets. The Tomek Link noise reduction is as follows: for a data point x_k in the balanced training subset X other than data points x_i and x_j, i.e. x_k ∈ X \ {x_i, x_j}, if its distance to x_i and its distance to x_j are both greater than the distance between x_i and x_j, i.e. dist(x_i, x_j) < dist(x_i, x_k) and dist(x_i, x_j) < dist(x_j, x_k), then x_i and x_j form a Tomek Link pair; if the data points x_i and x_j of this Tomek Link belong to different classes, delete x_i or x_j.
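Step S3 can be written directly from the definition above; the brute-force sketch below finds pairs of mutual nearest neighbours from different classes and removes the non-keyword member (dropping the majority-class member is an assumption consistent with the undersampling rationale of step S2; a library alternative is the TomekLinks cleaner in imbalanced-learn, if available).

```python
# Sketch of step S3: remove the non-keyword member of every Tomek Link pair.
import numpy as np
from scipy.spatial.distance import cdist

def remove_tomek_links(X, y):
    d = cdist(X, X)                      # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                # nearest neighbour of every sample
    drop = set()
    for i, j in enumerate(nn):
        # (i, j) is a Tomek Link if they are mutual nearest neighbours
        # and belong to different classes; drop the non-keyword member.
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == 0 else j)
    keep = np.array([k for k in range(len(y)) if k not in drop])
    return X[keep], y[keep]
```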
S4. Train an independent gradient boosting machine model for each single keyword on the denoised balanced training subsets with the GBM algorithm to obtain gradient boosting classifiers. The specific steps of the GBM algorithm are as follows:
S41. Let the model F_K(x) be:
F_K(x) = Σ_{k=1}^{K} α_k · f_k(x; θ_k)
where f_k(x; θ_k) is the sub-model of step k, α_k is the weight of f_k(x; θ_k), K is the current step index, i.e. the total number of steps run so far, x is a speech sample, i.e. a training broadcast segment, and θ_k is the parameter set of f_k(x; θ_k);
S42. Compute the residual r_{K+1} between the model prediction of step K+1 and the true value, i.e. the negative gradient of the loss evaluated at each training example:
r_{K+1} = -∂L(y, F_K(x)) / ∂F_K(x)
where L(y, F_K(x)) is the loss function, F_K(x) is the model prediction and y is the true value;
S43. Fit the model parameters θ_{K+1} of step K+1:
θ_{K+1} = argmin_θ Σ_i [ r_{K+1,i} - f_{K+1}(x_i; θ) ]^2
where θ denotes the model parameters and f_{K+1}(x; θ) is the sub-model of step K+1;
S44. Compute the weight α_{K+1} of step K+1:
α_{K+1} = argmin_α Σ_i L( y_i, F_K(x_i) + α · f_{K+1}(x_i; θ_{K+1}) )
where α is the weight coefficient;
S45. Update the model F_K(x) iteratively to obtain the optimized model F_{K+1}(x), i.e. the gradient boosting classifier:
F_{K+1}(x) = F_K(x) + α_{K+1} · f_{K+1}(x; θ_{K+1}).
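The boosting loop of steps S41-S45 is available off the shelf; since the experiments below mention an xgboost implementation, step S4 can be approximated by fitting one classifier per denoised balanced subset as sketched here. The hyperparameters are assumptions, not values taken from the patent.

```python
# Sketch of step S4: train one gradient boosting classifier per denoised balanced subset.
# xgboost performs the S41-S45 loop internally; hyperparameters are illustrative.
import xgboost as xgb

def train_gbm_classifiers(denoised_subsets):
    classifiers = []
    for X_bal, y_bal in denoised_subsets:
        clf = xgb.XGBClassifier(
            n_estimators=200,      # number of boosting steps K
            learning_rate=0.1,     # shrinkage applied to each added sub-model
            max_depth=4,
            objective="binary:logistic",
        )
        clf.fit(X_bal, y_bal)
        classifiers.append(clf)
    return classifiers
```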
S5. Combine the gradient boosting classifiers by bagging to obtain an ensemble gradient boosting classifier, and adjust the probability threshold of the ensemble gradient boosting classifier on the training data. The probability threshold of the ensemble gradient boosting classifier is adjusted as follows:
S51. Compute the predicted probability that an example belongs to the keyword class:
P(x_i) = Σ_{t=1}^{T} α_t · H_t(x_i)
where T is the number of balanced training subsets, H_t(x_i) is the prediction of a single classifier and α_t is its weight, taken as α_t = 1/T;
S52. Decide whether the sample contains a keyword:
ŷ_i = 1 if P(x_i) ≥ δ, otherwise ŷ_i = 0
where δ is an adjustable probability threshold;
S53. Adjust the probability threshold at which the keyword-class probability of an example is output as keyword or non-keyword; it is generally 0.5.
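A minimal sketch of step S5, assuming equal weights α_t = 1/T as stated above: average the keyword probabilities of the T classifiers and compare the mean with the adjustable threshold δ.

```python
# Sketch of step S5: bagging-style aggregation of the T gradient boosting
# classifiers with equal weights (alpha_t = 1/T) and an adjustable threshold delta.
import numpy as np

def ensemble_keyword_probability(classifiers, X):
    """Mean keyword probability over the ensemble, i.e. sum_t (1/T) * H_t(x)."""
    probs = [clf.predict_proba(X)[:, 1] for clf in classifiers]
    return np.mean(probs, axis=0)

def predict_keywords(classifiers, X, delta=0.5):
    """Label a segment as keyword when the aggregated probability reaches delta."""
    return (ensemble_keyword_probability(classifiers, X) >= delta).astype(int)
```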
S6. Split the broadcast to be tested into test broadcast segments of 3-5 s, and apply the feature transform to each test broadcast segment to obtain test-data MFCC features.
S7. Feed the test-data MFCC features into the ensemble gradient boosting classifier for keyword recognition to obtain the recognition result.
In one embodiment of the invention, to demonstrate its effectiveness, we built a data set of broadcast recordings collected from 133 radio stations. A small fraction of the recordings contain the keyword "Beijing time" (Mandarin), and our goal is to identify this keyword in the broadcast segments. All broadcast audio was cut into 5-second segments; after this processing we obtained 6906 records in total, of which 197 contain the keyword.
Since the labels in broadcast keyword recognition are highly skewed (the numbers of keyword and non-keyword samples are unbalanced), a classifier can achieve high accuracy simply by predicting every example as non-keyword, so the usual accuracy metric cannot adequately represent the quality of an algorithm. For classification with imbalanced labels, precision and recall are normally used to measure algorithm quality. Denoting the classification outcomes by TP, FP, TN and FN (true positives, false positives, true negatives and false negatives), the precision and recall of the positive class are computed as:
precision = TP / (TP + FP), recall = TP / (TP + FN)
The same method can also be applied to the negative class. For a given positive or negative class, the F1 score of the algorithm is:
F1 = 2 · precision · recall / (precision + recall)
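These quantities can be computed with scikit-learn; the sketch below, with hypothetical label arrays, reports the four metrics tracked in the experiments (overall accuracy, non-keyword recall, keyword recall, keyword F1).

```python
# Sketch: compute the evaluation metrics used in the experiments with scikit-learn.
# y_true / y_pred are hypothetical label arrays (1 = keyword, 0 = non-keyword).
from sklearn.metrics import accuracy_score, recall_score, f1_score

def report_metrics(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "non_keyword_recall": recall_score(y_true, y_pred, pos_label=0),
        "keyword_recall": recall_score(y_true, y_pred, pos_label=1),
        "keyword_f1": f1_score(y_true, y_pred, pos_label=1),
    }
```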
In the experiments we focus on four evaluation metrics: overall classification accuracy, the recall of the non-keyword class, the recall of the keyword class, and the F1 score of the keyword class. We use the F1 score of the keyword class as the overall evaluation metric because our task is to identify keywords. The models output the probability that an example belongs to the minority class (keyword, label 1), and we can adjust the threshold δ to obtain the best prediction result. In our experiments we test δ values from 0 to 1 (open interval) with a step of 0.05. Fig. 2 shows how classification accuracy, majority-class (non-keyword) recall and minority-class (keyword) recall vary. Four different models based on gradient boosting classifiers are tested, and their parameters are tuned by validation. The baseline model is a single gradient boosting classifier (an xgboost implementation). In the figure the x-axis is the δ value and the y-axis is precision/recall. As can be seen from Fig. 2, keyword recall and precision are highest in the second panel (training set), while keyword recall drops sharply on the validation and test sets, indicating that the model overfits.
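The δ sweep described above can be reproduced with a simple grid search on a validation set, keeping the value that maximizes the keyword-class F1 score; the 0.05 step follows the text, while the harness itself (and its reuse of the aggregation helper from the step S5 sketch) is an assumption.

```python
# Sketch: choose delta on a validation set by sweeping (0, 1) in steps of 0.05
# and keeping the value with the best keyword-class F1 score.
import numpy as np
from sklearn.metrics import f1_score

def select_delta(classifiers, X_val, y_val):
    probs = ensemble_keyword_probability(classifiers, X_val)  # helper from the step S5 sketch
    best_delta, best_f1 = 0.5, -1.0
    for delta in np.arange(0.05, 1.0, 0.05):
        f1 = f1_score(y_val, (probs >= delta).astype(int), pos_label=1)
        if f1 > best_f1:
            best_delta, best_f1 = delta, f1
    return best_delta, best_f1
```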
The second model tested is a single gradient boosting classifier using the undersampling scheme. It can be understood as an "ensemble of one model": it benefits from undersampling and Tomek Links, but not from the ensemble itself. As can be seen from Fig. 3, the overfitting problem is alleviated and the classifier no longer predicts most examples as non-keyword. Although the recall of the non-keyword class drops, the overall performance improves.
Finally, ensembles of 5 and 10 gradient boosting classifiers combined by bagging are tested on the same data set, following the technique set out above; the results are shown in Fig. 4 and Fig. 5. Two improvements can be seen from these figures. First, the combined recall of the keyword and non-keyword classes improves greatly. Second, as more classifiers are added to the overall model, the influence of different δ values on performance becomes more pronounced. The choice of δ can be made by validation, and we can obtain the best output together with the prediction confidence of each example.
Table 1 shows the best F1 score for the minority (keyword) class and the precision/recall at that score. There is one additional "balanced F1 score" metric: the F1 score computed under the assumption that the numbers of keyword and non-keyword examples are equal. This metric further emphasizes successful recognition of the keyword class and its recall.
Table 1

Claims (4)

1. A broadcast keyword recognition method based on an ensemble gradient boosting machine, characterized by comprising the following steps:
S1. splitting the training broadcast into training broadcast segments of 3-5 s, and applying a feature transform to each training broadcast segment to obtain training-data MFCC features;
S2. extracting training samples from the training data according to the training-data MFCC features, and randomly undersampling the non-keyword samples by random sampling with replacement to obtain multiple balanced training subsets;
S3. applying Tomek Link noise reduction to each balanced training subset to obtain denoised balanced training subsets;
S4. training an independent gradient boosting machine model for each single keyword on the denoised balanced training subsets with the GBM algorithm to obtain gradient boosting classifiers;
S5. combining the gradient boosting classifiers with a bagging algorithm to obtain an ensemble gradient boosting classifier, and adjusting the probability threshold of the ensemble gradient boosting classifier on the training data;
S6. splitting the broadcast to be tested into test broadcast segments of 3-5 s, and applying the feature transform to each test broadcast segment to obtain test-data MFCC features;
S7. feeding the test-data MFCC features into the ensemble gradient boosting classifier for keyword recognition to obtain the recognition result.
2. The broadcast keyword recognition method based on an ensemble gradient boosting machine according to claim 1, characterized in that the Tomek Link noise reduction in step S3 is as follows: for a data point x_k in the balanced training subset X other than data points x_i and x_j, i.e. x_k ∈ X \ {x_i, x_j}, if its distance to x_i and its distance to x_j are both greater than the distance between x_i and x_j, i.e. dist(x_i, x_j) < dist(x_i, x_k) and dist(x_i, x_j) < dist(x_j, x_k), then x_i and x_j form a Tomek Link pair; if the data points x_i and x_j of this Tomek Link belong to different classes, x_i or x_j is deleted.
3. The broadcast keyword recognition method based on an ensemble gradient boosting machine according to claim 1, characterized in that the specific steps of the GBM algorithm in step S4 are as follows:
S41. letting the model F_K(x) be:
F_K(x) = Σ_{k=1}^{K} α_k · f_k(x; θ_k)
where f_k(x; θ_k) is the sub-model of step k, α_k is the weight of f_k(x; θ_k), K is the current step index, i.e. the total number of steps run so far, x is a speech sample, i.e. a training broadcast segment, and θ_k is the parameter set of f_k(x; θ_k);
S42. computing the residual r_{K+1} between the model prediction of step K+1 and the true value, i.e. the negative gradient of the loss evaluated at each training example:
r_{K+1} = -∂L(y, F_K(x)) / ∂F_K(x)
where L(y, F_K(x)) is the loss function, F_K(x) is the model prediction and y is the true value;
S43. fitting the model parameters θ_{K+1} of step K+1:
θ_{K+1} = argmin_θ Σ_i [ r_{K+1,i} - f_{K+1}(x_i; θ) ]^2
where θ denotes the model parameters and f_{K+1}(x; θ) is the sub-model of step K+1;
S44. computing the weight α_{K+1} of step K+1:
α_{K+1} = argmin_α Σ_i L( y_i, F_K(x_i) + α · f_{K+1}(x_i; θ_{K+1}) )
where α is the weight coefficient;
S45. updating the model F_K(x) iteratively to obtain the optimized model F_{K+1}(x), i.e. the gradient boosting classifier:
F_{K+1}(x) = F_K(x) + α_{K+1} · f_{K+1}(x; θ_{K+1}).
4. The broadcast keyword recognition method based on an ensemble gradient boosting machine according to claim 1, characterized in that the probability threshold of the ensemble gradient boosting classifier in step S5 is adjusted as follows:
S51. computing the predicted probability that an example belongs to the keyword class:
P(x_i) = Σ_{t=1}^{T} α_t · H_t(x_i)
where T is the number of balanced training subsets, H_t(x_i) is the prediction of a single classifier and α_t is its weight, taken as α_t = 1/T;
S52. deciding whether the sample contains a keyword:
ŷ_i = 1 if P(x_i) ≥ δ, otherwise ŷ_i = 0
where δ is an adjustable probability threshold;
S53. adjusting the probability threshold at which the keyword-class probability of an example is output as keyword or non-keyword.
CN201810929482.7A 2018-08-15 2018-08-15 Broadcast keyword recognition method based on an ensemble gradient boosting machine Active CN109036390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810929482.7A CN109036390B (en) 2018-08-15 2018-08-15 Broadcast keyword recognition method based on an ensemble gradient boosting machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810929482.7A CN109036390B (en) 2018-08-15 2018-08-15 Broadcast keyword recognition method based on an ensemble gradient boosting machine

Publications (2)

Publication Number Publication Date
CN109036390A (en) 2018-12-18
CN109036390B CN109036390B (en) 2022-07-08

Family

ID=64631548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810929482.7A Active CN109036390B (en) 2018-08-15 2018-08-15 Broadcast keyword recognition method based on an ensemble gradient boosting machine

Country Status (1)

Country Link
CN (1) CN109036390B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110299133A (en) * 2019-07-03 2019-10-01 四川大学 The method for determining illegally to broadcast based on keyword

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140303961A1 (en) * 2013-02-08 2014-10-09 Machine Zone, Inc. Systems and Methods for Multi-User Multi-Lingual Communications
CN106682642A (en) * 2017-01-06 2017-05-17 竹间智能科技(上海)有限公司 Multi-language-oriented behavior identification method and multi-language-oriented behavior identification system
CN108010527A (en) * 2017-12-19 2018-05-08 深圳市欧瑞博科技有限公司 Audio recognition method, device, computer equipment and storage medium
CN108257593A (en) * 2017-12-29 2018-07-06 深圳和而泰数据资源与云技术有限公司 A kind of audio recognition method, device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140303961A1 (en) * 2013-02-08 2014-10-09 Machine Zone, Inc. Systems and Methods for Multi-User Multi-Lingual Communications
CN106682642A (en) * 2017-01-06 2017-05-17 竹间智能科技(上海)有限公司 Multi-language-oriented behavior identification method and multi-language-oriented behavior identification system
CN108010527A (en) * 2017-12-19 2018-05-08 深圳市欧瑞博科技有限公司 Audio recognition method, device, computer equipment and storage medium
CN108257593A (en) * 2017-12-29 2018-07-06 深圳和而泰数据资源与云技术有限公司 A kind of audio recognition method, device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
E FONSECA等: ""Acoustic scene classification by ensembling gradient boosting machine and convolutional neural networks"", 《DETECTION AND CLASSIFICATION OF ACOUSTIC SCENES AND EVENTS 2017》 *
QIONG GU等: ""Data Mining on Imbalanced Data Sets"", 《2008 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING》 *
ZHANG Chunxia: "Research on Algorithms in Ensemble Learning", China Doctoral Dissertations Full-text Database *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110299133A (en) * 2019-07-03 2019-10-01 四川大学 The method for determining illegally to broadcast based on keyword
CN110299133B (en) * 2019-07-03 2021-05-28 四川大学 Method for judging illegal broadcast based on keyword

Also Published As

Publication number Publication date
CN109036390B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN108537544B (en) Real-time monitoring method and monitoring system for transaction system
CN102799899B (en) Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
Ullrich et al. Boundary Detection in Music Structure Analysis using Convolutional Neural Networks.
CN107766929B (en) Model analysis method and device
CN102982804A (en) Method and system of voice frequency classification
CN109784388A (en) Stealing user identification method and device
Cartwright et al. SONYC-UST-V2: An urban sound tagging dataset with spatiotemporal context
CN105373894A (en) Inspection data-based power marketing service diagnosis model establishing method and system
CN111539585B (en) Random forest-based power customer appeal sensitivity supervision and early warning method
CN107293308B (en) A kind of audio-frequency processing method and device
CN107507038A (en) A kind of electricity charge sensitive users analysis method based on stacking and bagging algorithms
CN112287980B (en) Power battery screening method based on typical feature vector
CN112417132B (en) New meaning identification method for screening negative samples by using guest information
CN109257383A (en) A kind of BGP method for detecting abnormality and system
CN106909642A (en) Database index method and system
CN109255029A (en) A method of automatic Bug report distribution is enhanced using weighted optimization training set
CN110515836B (en) Weighted naive Bayes method for software defect prediction
Xue et al. Multi long-short term memory models for short term traffic flow prediction
CN115358481A (en) Early warning and identification method, system and device for enterprise ex-situ migration
CN113543117A (en) Prediction method and device for number portability user and computing equipment
CN109036390A Broadcast keyword recognition method based on an ensemble gradient boosting machine
CN105162643B (en) The method, apparatus and computing device that flow is estimated
CN109526027A (en) A kind of cell capacity optimization method, device, equipment and computer storage medium
CN116561569A (en) Industrial power load identification method based on EO feature selection and AdaBoost algorithm
CN109583763A (en) Branch trade custom power load growth feature mining algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant