CN110298391A - Iterative incremental dialogue intention classification recognition method based on small samples - Google Patents
Iterative incremental dialogue intention classification recognition method based on small samples
- Publication number
- CN110298391A, CN201910505469.3A
- Authority
- CN
- China
- Prior art keywords
- model
- classification
- indicate
- preliminary
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an iterative incremental dialogue intention classification recognition method based on small samples. Starting from a single preliminary classification model trained on a small-sample data set, the number of preliminary models grows as the overall model is used, and accuracy improves steadily, abandoning the conventional deep-learning requirement for large training sets. During iterative training, only a small number of samples are needed to train each new preliminary classification model, while the weights of the existing preliminary models remain unchanged; the outputs of all preliminary models are then fed into a re-classification model for further training, so computation speed does not degrade as the sample size grows. Meanwhile, a similarity screening model screens and removes redundant preliminary models, maintaining performance while guaranteeing accuracy. Compared with the prior art, the invention requires few training samples, offers stable computational performance, and yields a model that is easy to update and extend.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to an iterative incremental dialogue intention classification recognition method based on small samples.
Background art
To improve the quality of their products and services, many companies have launched customer service systems in which human agents answer users' questions, giving users better service and improving service quality and efficiency. However, as the number of product users grows, traditional human customer service cannot meet demand; agents require special business training, which adds cost, and hotlines are chronically busy, hurting the user experience. Major companies have therefore launched intelligent customer service products that learn from historical chat records between agents and users and extract the intent information contained in the dialogues, helping users resolve their business faster.
Dialogue intention recognition means understanding chat corpora between people and retrieving, filtering and classifying the intent features in the text, finally identifying the purpose, or even the emotion, behind a user's utterances; the core of intention recognition is semantic understanding. Machine-learning-based dialogue intention recognition methods include rule-based and statistical approaches, approaches based on machine-learning classifiers, and approaches based on generative models. In the telecommunications industry, intelligent customer service robots can quickly understand the business a user needs to handle through dialogue with the user, offering options and reducing the user's search burden.
In recent years, with the development of deep learning, more and more technology companies have launched chat products, such as Apple's Siri, Microsoft's Cortana and iFlytek's voice assistant. The ultimate purpose of these interactions is to understand the user's intention and give feedback that helps the user enjoy better service, all of which depends on user intention recognition algorithms. However, some chatbots have poor learning ability and struggle to understand users' deeper questions, leading to irrelevant answers or even circular replies, so current intention recognition algorithms still leave room for improvement.
Relatively mature algorithms for dialogue intention recognition include rule-based matching algorithms, document classification algorithms based on probabilistic statistical models, and text classification algorithms using SVM (support vector machine), KNN (k-nearest neighbor) and decision-tree models. Rule-based algorithms usually infer a user's intention roughly from keyword statistics in the text; this kind of classification queries slowly on large data volumes and requires manual annotation, which is laborious. Document classification algorithms based on probabilistic statistical models place high demands on text quality and coverage and classify inaccurately on small samples. Text classification algorithms based on machine-learning classifiers work well on short texts but struggle to capture dialogue context in long texts, are prone to irrelevant answers, and require retraining on new corpora; as the number of samples grows, training complexity becomes considerable.
Summary of the invention
The object of the present invention is to overcome the above-mentioned drawbacks of the prior art and provide an iterative incremental dialogue intention classification recognition method based on small samples.
The purpose of the present invention can be achieved through the following technical solutions:
An iterative incremental dialogue intention classification recognition method based on small samples, comprising the following steps:
Step 1: segment the text sentences in the dialogue intentions and train word vectors;
Step 2: for one part of the word vectors, obtain sentence vectors through successive LSTM and CNN feature extraction and train a preliminary classification model through a classification unit; input the other part of the word vectors, as sentence vectors, into the trained preliminary classification model to obtain preliminary classification results;
Step 3: screen the preliminary classification results according to the return of a reinforcement learning model, perform secondary classification with an XGBoost model, and train the re-classification model by gradient descent;
Step 4: re-input the misclassified samples from the whole training process to train one new preliminary classification model, then return to Step 3 for iterative incremental learning;
Step 5: when the number of preliminary classification models rises to a preset threshold during iterative incremental learning, compute similarities with a trained discriminator model and remove one of the two most similar models to maintain the computational stability of the overall model;
Step 6: repeat Steps 3 to 5, gradually refining the classification results to obtain the final recognition result. (A high-level sketch of this loop follows below.)
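As a concrete illustration of Steps 1-6, the following Python sketch mirrors only the control flow of the loop. Everything in it is a stand-in: logistic regressions replace the LSTM/CNN preliminary models, a gradient-boosted tree replaces the XGBoost re-classification model, prediction agreement replaces the discriminator-based similarity, and the reinforcement-learning screening is omitted entirely.

```python
# Toy sketch of the iterative-increment loop (Steps 1-6); all models and
# data are synthetic stand-ins for the components described in the patent.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
batches = [(rng.normal(size=(100, 20)), rng.integers(0, 5, 100))
           for _ in range(8)]                       # small-sample batches

threshold = 5
X0, y0 = batches[0]
models = [LogisticRegression(max_iter=200).fit(X0, y0)]   # Steps 1-2

for X, y in batches[1:]:
    # Steps 2-3: stack every preliminary model's class probabilities and
    # retrain the re-classifier on them (existing models stay frozen).
    feats = np.hstack([m.predict_proba(X) for m in models])
    reclf = GradientBoostingClassifier(n_estimators=20).fit(feats, y)
    # Step 4: misclassified samples train one new preliminary model.
    wrong = reclf.predict(feats) != y
    if wrong.sum() >= 10 and len(np.unique(y[wrong])) > 1:
        models.append(LogisticRegression(max_iter=200).fit(X[wrong], y[wrong]))
    # Step 5: past the threshold, drop one of the two most similar models
    # (similarity = agreement of their predictions on the current batch).
    if len(models) > threshold:
        preds = [m.predict(X) for m in models]
        sims = [(np.mean(preds[i] == preds[j]), i)
                for i in range(len(preds)) for j in range(i + 1, len(preds))]
        models.pop(max(sims)[1])
```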
Further, the output probability of each class in the preliminary classification results of Step 2 is denoted P(Xi|Dk), where Xi denotes the actual classification result, Xi' denotes the predicted classification result, and X denotes the set of all classification results.
Further, the loss function for the preliminary classification results in Step 2 is y1, where y1 denotes the loss on the preliminary classification model output, η1 is an adjustment parameter, P denotes the true intention class label probability, and P' denotes the predicted class label probability.
Further, the return in Step 3 is computed as R, where R denotes the total return, ri denotes the return at step i, n and i are natural numbers, and the discount factor ξ is set to 0.95.
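The formula itself appears only as an image in the original publication. Read together with the per-step rewards and discount factor of the detailed embodiment, the symbols are consistent with a standard discounted cumulative return; a plausible reconstruction, offered as an assumption rather than the patent's exact expression, is:

```latex
R = \sum_{i=1}^{n} \xi^{\,i}\, r_i , \qquad \xi = 0.95
```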
Further, the classification unit in Step 2 is a fully connected neural network; the preliminary classification model in Step 2 comprises a sentence-vector extraction layer, a dialogue-vector extraction layer and a classification layer connected in sequence; the re-classification model in Step 3 comprises an interconnected reinforcement-learning model-discrimination module and a decision-tree classification module; and the discriminator model in Step 5 is a fully connected neural network.
Further, for the preliminary classification results in Step 3, the loss function corresponding to the screening-by-return process is:
y3 = p(Ai|F4i) * log(p'(Ai|F4i))
where y3 denotes the loss function of the screening process applied to the preliminary classification model outputs, p(Ai|F4i) denotes the probability of the next action in that process, and p'(Ai|F4i) denotes the probability of the actual action.
Further, the loss function corresponding to the secondary classification by the XGBoost model in Step 3 is y4, where Oi' denotes the screening result of the first stage and Oi'' denotes the result after further classification.
Further, the loss function corresponding to the Step-5 process in which the discriminator model computes similarities and removes one of the two most similar models is y5, where K denotes the number of data randomly selected from the existing training samples, pk denotes the true result, and pk' denotes the predicted result.
Steps 1 and 2 above can be summarized as follows: first, the segmented words are trained without supervision using a language model, yielding word vectors of fixed dimension; the word vectors of each sentence are then input into an LSTM network for feature extraction, and the resulting sentence vectors are reduced in dimension by a CNN; the same process is then applied to the sentences in a dialogue to extract features, yielding a vector representation of the whole dialogue. This two-level feature extraction better captures the contextual information between dialogue turns.
The reinforcement learning method in the re-classification model of Step 3 works as follows: the vector output by the previous stage serves as the state, and the next action is selected according to the current state; if the next small-sample model predicts well, its result is added to the prediction result set, otherwise it is discarded. The return of the reinforcement learning model is judged from the model's prediction results, and by this method the set of small-sample models with the maximum overall benefit is selected, improving the prediction accuracy of the model. The prediction results of the small-sample model set chosen by the reinforcement learning model are input into the re-classification model, whose predicted label is the final intention label of the dialogue; it is trained by gradient descent.
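A minimal runnable sketch of this screening walk, under the keep/skip mechanics of the detailed embodiment (reward 1 and action +1 when the next model predicts correctly, reward 0 and action +2 otherwise); the `correct` flags are synthetic stand-ins for comparing each model's prediction against the true label, and the discounting is one plausible reading of the return formula:

```python
# Walk over the preliminary models: keep a model's result when it predicts
# correctly (action +1, reward 1), skip it otherwise (action +2, reward 0),
# accumulating a discounted return with xi = 0.95.
import numpy as np

rng = np.random.default_rng(1)
n_models = 10
correct = rng.random(n_models) > 0.3   # did model 1-(i+1) predict correctly?
xi, kept, R, i = 0.95, [], 0.0, 0

while i < n_models - 1:
    if correct[i + 1]:                 # reward r_i = 1 -> action A_i = +1
        kept.append(i + 1)
        R += xi ** (i + 1)             # discounted per-step return
        i += 1
    else:                              # reward r_i = 0 -> action A_i = +2
        i += 2
print("kept model indices:", kept, "return R =", round(R, 3))
```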
The iterative incremental algorithm of Step 4 proceeds as follows: a preliminary classification model is first trained on the existing small-sample data, and its results serve as the input for training the re-classification model. When a new model is added, the existing preliminary classification models need not be retrained; instead, the output of the new preliminary model is fed, together with the outputs of the existing models, into the re-classification model, and only the re-classification model is retrained. Since the training time of a single preliminary model plus the re-classification model is short, this shortens the training cycle of the entire model, reduces time complexity, and improves the usability of the model.
The similarity screening of Step 5 works as follows: as new models accumulate, the training results produced by the preliminary classification models keep growing. To control the runtime performance of the entire model while guaranteeing its accuracy, the existing models must be screened. By computing the similarity of the outputs of different models, one of the two most similar models can be removed, enhancing the generalization ability of the model and reducing redundancy (a sketch follows below).
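A small runnable sketch of this screening idea; cosine similarity over the models' output matrices is used here as a simple stand-in for the discriminator-based similarity of Model 3 described in the detailed embodiment, and the outputs are synthetic:

```python
# Remove one of the two preliminary models whose outputs are most similar.
import numpy as np

rng = np.random.default_rng(2)
outputs = [rng.normal(size=(500, 100)) for _ in range(20)]  # one per model

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

pairs = [(cosine(outputs[i], outputs[j]), i, j)
         for i in range(len(outputs)) for j in range(i + 1, len(outputs))]
_, i, j = max(pairs)       # indices of the two most similar models
outputs.pop(i)             # drop one of them; the pool shrinks by one
print(f"removed model {i} (most similar to model {j})")
```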
Compared with the prior art, the present invention has the following advantages:
(1) In the first two steps of the method, Step 1 segments the text sentences in the dialogue intentions and trains word vectors; Step 2 obtains sentence vectors for one part of the word vectors through successive LSTM and CNN feature extraction, trains a preliminary classification model through a classification unit, and feeds the other part into the trained preliminary model to obtain its output. Training on small samples can thus start from one preliminary naive model whose accuracy gradually improves with use, combined with the existing models, abandoning the conventional deep-learning requirement for large sample sets.
(2) In Step 3, the reinforcement learning strategy screens the preliminary classification results, first ensuring a global optimum and preventing any single model's weight from unduly influencing the re-classification results; the re-classification model likewise adjusts the preliminary models' classification results, selecting among them adaptively and guaranteeing the robustness of the model.
(3) The iterative incremental learning model of Step 4 needs only a small number of samples to train the latest preliminary classification model; the other models keep their weights and continue to produce outputs, after which only the weights of the re-classification model are updated. Computation speed therefore does not degrade as the sample size grows, while the similarity screening model screens and removes existing models, maintaining performance with stable accuracy.
(4) The overall framework of Step 4 adopts an iterative incremental learning strategy in which misclassified samples keep training and improving the entire model, so that even rare scenes in the intention recognition setting can be classified reliably. The framework combines multiple modules with low coupling, permitting distributed training; single models are easy to replace and update, making the system easy to extend.
Description of the drawings
Fig. 1 is a schematic diagram of the overall framework of the present invention;
Fig. 2 shows the network structure of a specific implementation of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
The object of the invention is to solve the defects of the above existing methods by providing a training strategy based on a small number of samples, in which accuracy improves continuously, the coupling between modules is low, and the system is easy to extend; introducing a reinforcement learning strategy further improves the preliminary models, and screening and filtering the existing models maintains the stability of the model. The technical framework is shown in Fig. 1.
The invention mainly comprises two modules: first, the training module of the small-sample iterative incremental dialogue intention recognition model; second, the usage module of that model. The training module contains three models: Model 1 performs preliminary classification on the small-sample data set; Model 2 further classifies the preliminary results; Model 3 screens and filters the models. The usage module applies the three trained models and performs iterative incremental learning, steadily improving model performance.
The specific module architecture is as follows:
The first part is model training. Model 1 uses a hierarchical feature extraction algorithm based on deep learning. First, the text corpus is input into a language model for unsupervised training to obtain word vectors. The M sentences in a dialogue are then segmented, converting the N words of each sentence into word vectors <W1, W2, ..., WN>. After the word vectors are obtained, the small-sample data set is divided into two parts Da and Db, where Da is used to train Model 1. The training process is as follows: the word vectors of a single sentence in the multi-turn dialogues of Da are input into a bidirectional LSTM (long short-term memory network) combined with an attention mechanism, producing the output vectors of the forward and backward LSTMs, which are concatenated into the feature vector of each word. A CNN (convolutional neural network) then reduces the dimensionality of the sentence vectors, yielding a feature vector Ck (1 ≤ k ≤ M) for each sentence. Applying this method to all M sentences of the dialogue gives the sentence vectors <C1, C2, ..., CM>, which are then processed by a fully connected neural network with four fully connected layers whose outputs are F1~F4. The output F4 of the last feature-vector layer represents the vector of the entire current input; if the number of final intention classes is X, F4 is fed into an X-dimensional fully connected layer and classified into X intention classes. The training process compares the actual class Xi in the sample with the predicted class Xi' in a supervised manner; the network function is denoted f1, and the model weights are adjusted backwards by gradient descent. The individually trained models are denoted Model 1-1 ~ Model 1-n, and their outputs are O = <O1, O2, ..., ON>.
Model 2 screens the classification results of Model 1 using reinforcement learning. Suppose n models have been trained in Model 1; the Db data set is input into models 1~n to obtain the fourth-layer output vector F4 and the final output vector O of each. Take model 1-i as the i-th model of Model 1, with 1 ≤ i ≤ n-2. The output vector F4 of the last feature layer of model 1-i serves as the current state Si, and the output vector F4 of model 1-(i+1) serves as the next state. Suppose the output result of model 1-(i+1) is Xi+1' and the true classification result is Xi+1. If Xi+1' = Xi+1, the classification is correct, the current return is ri = 1, the action is Ai = +1, and the next state becomes i+1; if Xi+1' ≠ Xi+1, then the return is ri = 0, the action is Ai = +2, and the next state becomes i+2. This process is repeated and learned until the maximum return R is obtained. Training of Model 2 is based on the states S and actions A saved in history; suppose the network weights are θ, the network function is denoted f2, and the benefit is denoted Q; f2 is trained by backpropagation so that the Q network reaches a stable state. Model 2 then further classifies the screened results O' = <O'1, O'2, ..., O'N>, where O' is drawn from O: the XGBoost model makes a decision on these results to obtain the decision result O'', which is compared against the intention class corresponding to the real dialogue, and the weights of the Model 2 XGBoost classifier are likewise adjusted by gradient descent.
Model 3 eliminates and screens the single small models within Model 1. A portion of samples Dc is drawn from the existing historical sample data D and input into model 1-1 and model 1-n; each produces an output result together with the fourth-layer output vector F4 of its classification unit. F4 is then input into the discriminator model f3, which consists of a four-layer fully connected neural network. The input samples are the output vectors F4 of two Model-1 models, say F4i and F4j with 1 ≤ i ≠ j ≤ n, labeled as (F4i, i) and (F4j, j); a two-class softmax output layer judges which model the current output comes from. Through supervised training over models 1-1 ~ 1-n, a well-trained discriminator model is obtained.
The second part is the use of the model. A practical sample D' is first input into the n models of Model 1 to obtain the classification results O and the fourth-layer fully connected output vectors F4, which then serve as the input of the Model-2 reinforcement learning model. K outputs are obtained and fed into the Model-2 decision-tree classifier to produce the final classification. In the dialogue intention scene, data whose intention is misjudged for the user must be transferred to manual handling; the intention class finally determined manually serves as the intention label of the misjudged sample. The misclassified results are collected into a new data set E, likewise divided into Ea and Eb. The labeled samples Ea train one new Model-1 model, denoted model 1-(n+1); the new training samples Eb are then input into the n+1 models, and the results of the n+1 models serve as training samples to further train Model 2. In this step only Model 2 and model 1-(n+1) are trained; the original models 1-1 ~ 1-n are not updated, which greatly saves training time. The number of Model-1 models gradually increases as samples accumulate; once it rises to a certain threshold δ, the trained Model 3 screens the existing n models, selecting from the n-1 older models the model 1-i' most similar to model 1-n and removing it. The number of models is thus kept at n, so that computational performance does not degrade because of too many models.
A specific embodiment of the small-sample iterative incremental dialogue intention recognition method is as follows:
First, the telecom customer service intent data is partitioned. According to actual industry needs, customer service intentions are divided into 50 classes (X = 50). The customer service data comes from real customer-service calls converted to text, with superfluous greeting sentences removed as intended. An important dialogue averages 10 turns, each turn containing two sentences, so one customer service sample consists of roughly 20 sentences plus one intention label, and each sentence averages about 5-10 words after segmentation. Here a data set of 1000 samples is chosen first and divided into 10 groups of 100, and each group is again divided into a Da data set and a Db data set. Each group uses 70% as the training set of a single model and 30% as the validation set for assessing the model's performance; training is considered finished once the model's accuracy on the validation set converges.
The structure of Model 1 is shown in Fig. 2, and its detailed training process is as follows. The dialogues of the existing training samples are first split into individual sentences, which are input into a language model to train the word vector corresponding to each word, denoted W, each of dimension 100. This yields the word vectors <W1, W2, ..., WN> of an entire sentence, with N = 10 here; sentences with fewer than N words are padded with 0, and anything after the N-th word is simply cut off. The sentence's word vectors <W1, W2, ..., WN> are then input into an LSTM model, a bidirectional LSTM based on an attention mechanism, so that the output vector corresponding to each word is 200-dimensional; since the sentence length is 10, the vector the whole sentence obtains from the model is 2000-dimensional. This 2000-dimensional LSTM output is input into a three-layer convolutional neural network with the following parameters: the first layer takes the 2000-dimensional vector as input with a convolution kernel of size 2*1 and sliding stride 2, producing a 1000-dimensional output vector; a second pooling layer of size 2*1 with sliding stride 2 yields a 500-dimensional output vector; the 500-dimensional vector then passes through one more convolution layer with kernel size 5*1 and stride 5, giving a 100-dimensional output vector. This completes the sentence feature extraction, yielding the vectors <C1, C2, ..., CM>.
For the whole dialogue, each sentence obtains a corresponding 100-dimensional vector in the above way. The sentence vectors <C1, C2, ..., CM> of the 20 sentences are then input into another attention-based bidirectional LSTM; the forward LSTM yields a 100-dimensional output vector and the backward network another 100-dimensional output vector, and the two are directly concatenated into a 200-dimensional vector. With 20 sentences, concatenating all the vectors gives a 4000-dimensional vector, which is then fed into the fully connected classification module. The fully connected network is designed as follows: first a layer of 2048 neurons, followed by a dropout layer with ratio 0.2; then a 1024-dimensional fully connected layer; then a 512-dimensional fully connected layer normalized with a Batch Normalization layer; then a 100-dimensional fully connected layer, whose output is denoted F4. F4 is finally connected to a 50-dimensional softmax for the classification output, giving for a sample Dk the output probability P(Xi|Dk) of each class. (An illustrative implementation follows below.)
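The layer dimensions above can be assembled into a runnable PyTorch sketch. It illustrates only the stated sizes: the attention mechanism is omitted (plain bidirectional LSTMs are used instead), and the single-channel convolution/pooling chain follows the 2000 → 1000 → 500 → 100 reduction described in the text.

```python
# Illustrative Model 1: 100-dim word vectors, 10 words per sentence,
# 20 sentences per dialogue, 50 intention classes (attention omitted).
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(100, 100, bidirectional=True, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size=2, stride=2),  # 2000 -> 1000
            nn.MaxPool1d(kernel_size=2, stride=2),     # 1000 -> 500
            nn.Conv1d(1, 1, kernel_size=5, stride=5),  # 500  -> 100
        )

    def forward(self, words):                # (batch, 10, 100)
        h, _ = self.lstm(words)              # (batch, 10, 200)
        flat = h.reshape(h.size(0), 1, -1)   # (batch, 1, 2000)
        return self.conv(flat).squeeze(1)    # (batch, 100) sentence vector

class Model1(nn.Module):
    def __init__(self, n_classes=50):
        super().__init__()
        self.sent = SentenceEncoder()
        self.dial = nn.LSTM(100, 100, bidirectional=True, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(4000, 2048), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 100),             # F4: the 100-dim dialogue vector
        )
        self.out = nn.Linear(100, n_classes) # 50-way softmax layer

    def forward(self, dialogue):             # (batch, 20, 10, 100)
        b = dialogue.size(0)
        s = self.sent(dialogue.reshape(b * 20, 10, 100)).reshape(b, 20, 100)
        h, _ = self.dial(s)                  # (batch, 20, 200)
        f4 = self.fc(h.reshape(b, -1))       # (batch, 4000) -> (batch, 100)
        return f4, torch.log_softmax(self.out(f4), dim=-1)

f4, logp = Model1()(torch.randn(4, 20, 10, 100))   # shape smoke test
```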
The loss function is set here as y1, where y1 denotes the loss on the preliminary classification model output and η1 is an adjustment parameter with value 0.6 used to regulate the loss function; P denotes the true intention class label probability and P' the predicted class label probability. The model is trained by gradient descent until its prediction accuracy reaches 75% or more, or the loss function stabilizes over 10 iterations, at which point training can stop.
Next comes the first module of Model 2, the reinforcement learning result screening module, implemented as shown in Fig. 2. The sample data of the training set Db is input into the trained models 1-1 to 1-n. Through some model 1-i (1 ≤ i ≤ n-2) the 100-dimensional vector F4i is obtained as the state of the current model, with F4i+1 as the next state of model 1-i; the model's action A is to select the next state as F4i or F4i+1. The prediction result Oi of model 1-(i+1) is compared with the true result: if the prediction is correct, the current return is ri = 1, otherwise ri = 0, which at the same time determines the model's action Ai. If ri = 1, the next prediction result is beneficial and Ai = +1, meaning the next step is selected; if the prediction is wrong, Ai = +2, meaning the next step is skipped directly. To prevent the selection of Ai from being biased and the model from overfitting, Ai is here given a 0.01 probability of choosing the next action at random. The return of the model is finally computed with ξ = 0.95, where R denotes the total return, ri the return at step i, and n and i are natural numbers.
Suppose the network weights are θ, the network function is denoted f2, and the benefit is denoted Q. The loss function is set as y2 = (f2 - Q(Si, A'; θ))², and training proceeds by backward gradient descent until the loss function converges. The network parameters of this reinforcement learning screening model are set as follows: the invention uses a 4-layer deep neural network to predict the return of the next state. The classification results are converted into an array of saved history records, each consisting of the current state F4i, the next state F4i+1, the action taken Ai = +1 or Ai = +2, and the return ri of the current action, recorded as the four-tuple (F4i, F4i+1, ri, Ai). The overall input dimension is 200, with input vector (F4i, F4i+1, ri), fed into a four-layer fully connected network whose node counts are set to [128, 64, 32, 16], with a dropout layer of ratio 0.25 added between the second and third layers. The output of the model is Ai; since the true output action is available from the existing returns, the probability of the next action p(Ai|F4i) and the probability of the actual action p'(Ai|F4i) can be computed. Gradient descent is used here with the loss function:
y3 = p(Ai|F4i) * log(p'(Ai|F4i))
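A PyTorch sketch of this screening network with the stated layer widths and dropout placement. The text gives a 200-dimensional input built from (F4i, F4i+1, ri) but does not say how ri is folded into those 200 dimensions, so the sketch concatenates only the two 100-dimensional F4 vectors — an assumption — and uses a standard negative log-likelihood in place of the y3 expression:

```python
# Action-prediction network [128, 64, 32, 16] with dropout 0.25 between
# the 2nd and 3rd layers, choosing between actions +1 and +2.
import torch
import torch.nn as nn

class ScreeningNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(200, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Dropout(0.25),                # between the 2nd and 3rd layers
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 2),                # two actions: +1 (keep), +2 (skip)
        )

    def forward(self, f4_i, f4_next):        # two 100-dim state vectors
        x = torch.cat([f4_i, f4_next], dim=-1)
        return torch.log_softmax(self.net(x), dim=-1)

net = ScreeningNet()
logp = net(torch.randn(8, 100), torch.randn(8, 100))
actions = torch.randint(0, 2, (8,))          # actual actions from the returns
loss = -logp[torch.arange(8), actions].mean()  # NLL stand-in for y3
loss.backward()
```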
Next, the second module of Model 2, the feature screening module, is trained. First, the output result Oi+1 of the next state under the 1-i model is obtained from the first module of Model 2. If the action in the 1-i model is Ai = +2, the computation result of model 1-(i+1) is wrong, and the output result of the next state is filled with 0, i.e. Oi+1 = 0; this keeps the input dimension of Model 2 consistent. An XGBoost decision tree then further classifies the first-stage screening result O' = <O'1, O'2, ..., O'N> of Model 2; the training parameters of the model are adjusted backwards against the true prediction result O'', updating the classification weights of the decision tree. The loss function here is denoted y4. (A toy sketch of this secondary classification follows below.)
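A toy runnable sketch of this step, assuming the xgboost Python package: synthetic class outputs stand in for the screened results O', eliminated entries are zero-filled so the input dimension stays fixed, and an XGBClassifier produces the decision result O''. The hyperparameters are illustrative, not from the patent.

```python
# Secondary classification of the screened preliminary outputs with XGBoost.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(3)
n_samples, n_models, n_classes = 400, 20, 50
O = rng.integers(0, n_classes, size=(n_samples, n_models)).astype(float)
keep = rng.random((n_samples, n_models)) > 0.3
O_screened = np.where(keep, O, 0.0)     # eliminated results filled with 0
# labels constructed so every one of the 50 classes is present
y = rng.permutation(np.repeat(np.arange(n_classes), n_samples // n_classes))

clf = XGBClassifier(n_estimators=50, max_depth=4)
clf.fit(O_screened, y)
O_final = clf.predict(O_screened)       # the decision result O''
```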
The training process of Model 3 is as follows, with its structure shown in Fig. 2. A portion of the data, Dc, is selected at random from the existing training samples; its size is K, here K = 500. A trained model 1-i is taken, and these data are input into it to obtain the last-layer output vector F4i together with the index i; they are likewise fed into model 1-n to obtain the output vector F4n and the index n. The 500 data items therefore yield 1000 training samples with corresponding labels, which are input into the fully connected neural network Model 3, with layer sizes set to [512, 128, 64, 32], dropout of ratio 0.35 on the second and third fully connected layers, and a final softmax producing a 2-dimensional output vector for classification. The network predicts whether an input F4 vector produces i or n, i.e. judges whether the current training sample comes from model 1-i or model 1-n, and the classification accuracy serves as the similarity measure of the models: a higher classification accuracy indicates a smaller similarity between the models (they are easy to tell apart), while an accuracy around 0.5 shows the models are very similar and hard to distinguish. If the predicted result is pk and the true result is pk', the loss function y5 is computed accordingly. (An illustrative discriminator sketch follows below.)
After the three modules above are trained, the model is put to use. A portion of actual input data is first fed to the 10 models 1-1 ~ 1-10 of Model 1, producing the preliminary classification results O1-O10 and the fourth-layer fully connected output vectors F41-F410. These are input into the trained Model 2, whose reinforcement learning layer first screens the classification results of the above 10 models: if a Model-1 result survives the screening, the current classification result is kept; if it is eliminated, it is filled with 0. This guarantees that the input dimension of Model 2 stays consistent, finally yielding an N-dimensional output vector, with N = 20 here. The 20 output results are then input into the XGBoost classification decision tree of Model 2 to obtain the secondary classification result as the final intention recognition result. If the intention recognition model misjudges the user's dialogue information, the misclassified corpus is labeled according to the business the user actually handled and placed in a new training corpus set E. Once the size count(E) of E reaches a certain threshold δ, the data set E is divided into two training data sets a and b: the Ea data set retrains one model 1-11, and the Eb data set is then input into models 1-1 ~ 1-11 to generate 11 output results. With the total number of models set to N (N = 20), Model 2 converts the above output results into the four-tuples of the reinforcement learning model and updates the reinforcement learning model in combination with the experience replay mechanism; after a part of the classification results is screened out, the results are input into the XGBoost decision tree of Model 2 to obtain its final classification result, which is compared with the actual expected result to adjust the network weights. At this point the training process of Model 2 ends. The above process is iterated until the number of models reaches 20; then, while model 1-21 is trained, the previously trained model 1-20 and the other models 1-1 ~ 1-19 are input into Model 3 to obtain the similarity of each model. The similarities are sorted in descending order, the model 1-k1 with the highest similarity is removed from the original models, the serial numbers of the models after 1-k1 are decreased by 1, the newly trained model becomes 1-20, and Model 2 is updated again to obtain new classification results. Subsequent iterative updates of the model repeat this process, keeping the maximum number of models at 20. (A small bookkeeping sketch follows below.)
In the above iterative usage of the model, the text-information mining in Model 1 integrates an understanding of single-sentence features with the corpus information between sentences, better capturing the semantics and temporal information contained in the dialogue. As the model is used, the data set it relies on keeps growing and its fault tolerance keeps strengthening; Model 2 screens the existing models with its reinforcement learning structure, guaranteeing the robustness of the model, and its decision-tree model performs a secondary classification on the results of Model 1, further improving classification quality. For data misclassified in practice, the misclassified data set retrains a new model, purposefully addressing the classification weaknesses of the existing models and forming a positive-feedback closed-loop training strategy that incrementally learns the characteristics of the samples. The model uses a distributed training method: the coupling among Model 1, Model 2 and Model 3 is low, so they can be trained in parallel; once Model 1 finishes training, Model 2 and Model 3 can be updated and computed simultaneously, raising efficiency. During iteration, only the new model 1-n and Model 2 update their weights for new data sets while the existing models keep their original state, minimizing computation time and ensuring the stability of the model. After the number of models reaches a certain level, the Model-3 discriminator screens them, eliminating inferior or similar models, keeping the number of models stable and preventing excessive duplicate results in Model 1 from interfering with the classification of Model 2.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and such modifications or replacements shall be covered by the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.
Claims (8)
1. An iterative incremental dialogue intention classification recognition method based on small samples, characterized in that the recognition method comprises the following steps:
Step 1: segmenting the text sentences in the dialogue intentions and training word vectors;
Step 2: obtaining sentence vectors for one part of the word vectors through successive LSTM and CNN feature extraction and training a preliminary classification model through a classification unit, and inputting the other part of the word vectors, as sentence vectors, into the trained preliminary classification model to obtain preliminary classification results;
Step 3: screening the preliminary classification results according to the return of a reinforcement learning model, performing secondary classification with an XGBoost model, and training the re-classification model using gradient descent;
Step 4: re-inputting the misclassified samples from the whole training process to train one new preliminary classification model, and returning to step 3 for iterative incremental learning;
Step 5: when the number of preliminary classification models rises to a preset threshold during iterative incremental learning, computing similarities with the trained discriminator model and removing one of the two most similar models to maintain the computational stability of the overall model;
Step 6: repeating steps 3 to 5, gradually refining the classification results and obtaining the final recognition result.
2. The iterative incremental dialogue intention classification recognition method based on small samples according to claim 1, characterized in that the output probability of each class in the preliminary classification results of step 2 is P(Xi|Dk), where Xi denotes the actual classification result, Xi' denotes the predicted classification result, and X denotes all classification results.
3. The iterative incremental dialogue intention classification recognition method based on small samples according to claim 1, characterized in that the loss function for the preliminary classification results in step 2 is y1, where y1 denotes the loss on the preliminary classification model output, η1 is an adjustment parameter, P denotes the true intention class label probability, and P' denotes the predicted class label probability.
4. The iterative incremental dialogue intention classification recognition method based on small samples according to claim 1, characterized in that the return in step 3 is computed as R, where R denotes the total return, ri denotes the return at step i, n and i are natural numbers, and ξ takes the value 0.95.
5. The iterative incremental dialogue intention classification recognition method based on small samples according to claim 1, characterized in that the classification unit in step 2 is a fully connected neural network; the preliminary classification model in step 2 comprises a sentence-vector extraction layer, a dialogue-vector extraction layer and a classification layer connected in sequence; the re-classification model in step 3 comprises an interconnected reinforcement-learning model-discrimination module and a decision-tree classification module; and the discriminator model in step 5 is a fully connected neural network.
6. The iterative incremental dialogue intention classification recognition method based on small samples according to claim 1, characterized in that, for the preliminary classification results in step 3, the loss function corresponding to the screening-by-return process is:
y3 = p(Ai|F4i) * log(p'(Ai|F4i))
where y3 denotes the loss function of the screening process applied to the preliminary classification model outputs, p(Ai|F4i) denotes the probability of the next action in that process, and p'(Ai|F4i) denotes the probability of the actual action.
7. The iterative incremental dialogue intention classification recognition method based on small samples according to claim 1, characterized in that the loss function corresponding to the secondary classification by the XGBoost model in step 3 is y4, where Oi' denotes the screening result of the first stage and Oi'' denotes the result after further classification.
8. The iterative incremental dialogue intention classification recognition method based on small samples according to claim 1, characterized in that the loss function corresponding to the step-5 process in which the discriminator model computes similarities and removes one of the two most similar models is y5, where K denotes the number of data randomly selected from the existing training samples, pk denotes the true result, and pk' denotes the predicted result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910505469.3A CN110298391B (en) | 2019-06-12 | 2019-06-12 | Iterative incremental dialogue intention type recognition method based on small sample |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910505469.3A CN110298391B (en) | 2019-06-12 | 2019-06-12 | Iterative incremental dialogue intention type recognition method based on small sample |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110298391A true CN110298391A (en) | 2019-10-01 |
CN110298391B CN110298391B (en) | 2023-05-02 |
Family
ID=68027822
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910505469.3A Active CN110298391B (en) | 2019-06-12 | 2019-06-12 | Iterative incremental dialogue intention type recognition method based on small sample |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110298391B (en) |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689878A (en) * | 2019-10-11 | 2020-01-14 | 浙江百应科技有限公司 | XLNET-based intelligent voice conversation intention recognition method |
CN110704641A (en) * | 2019-10-11 | 2020-01-17 | 零犀(北京)科技有限公司 | Ten-thousand-level intention classification method and device, storage medium and electronic equipment |
CN110728313A (en) * | 2019-09-29 | 2020-01-24 | 北京声智科技有限公司 | Classification model training method and device for intention classification recognition |
CN110766086A (en) * | 2019-10-28 | 2020-02-07 | 支付宝(杭州)信息技术有限公司 | Method and device for fusing multiple classification models based on reinforcement learning model |
CN110782008A (en) * | 2019-10-16 | 2020-02-11 | 北京百分点信息科技有限公司 | Training method, prediction method and device of deep learning model |
CN110969006A (en) * | 2019-12-02 | 2020-04-07 | 支付宝(杭州)信息技术有限公司 | Training method and system of text sequencing model |
CN111028244A (en) * | 2019-12-04 | 2020-04-17 | 电子科技大学 | Remote sensing image semantic segmentation method based on super-pixel under condition of known sample imbalance |
CN111324727A (en) * | 2020-02-19 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | User intention recognition method, device, equipment and readable storage medium |
CN111339767A (en) * | 2020-02-21 | 2020-06-26 | 百度在线网络技术(北京)有限公司 | Conversation source data processing method and device, electronic equipment and computer readable medium |
CN111414936A (en) * | 2020-02-24 | 2020-07-14 | 北京迈格威科技有限公司 | Determination method of classification network, image detection method, device, equipment and medium |
CN111460097A (en) * | 2020-03-26 | 2020-07-28 | 华泰证券股份有限公司 | Small sample text classification method based on TPN |
CN111580411A (en) * | 2020-04-27 | 2020-08-25 | 珠海格力电器股份有限公司 | Control parameter optimization method, device and system |
CN111611347A (en) * | 2020-05-22 | 2020-09-01 | 上海乐言信息科技有限公司 | Dialog state tracking and training method and system of task-based dialog system |
CN111859903A (en) * | 2020-07-30 | 2020-10-30 | 苏州思必驰信息科技有限公司 | Event coreference model training method and event coreference resolution method |
CN112069302A (en) * | 2020-09-15 | 2020-12-11 | 腾讯科技(深圳)有限公司 | Training method of conversation intention recognition model, conversation intention recognition method and device |
CN112182213A (en) * | 2020-09-27 | 2021-01-05 | 中润普达(十堰)大数据中心有限公司 | Modeling method based on abnormal lacrimation feature cognition |
CN112329475A (en) * | 2020-11-03 | 2021-02-05 | 海信视像科技股份有限公司 | Statement processing method and device |
CN112487811A (en) * | 2020-10-21 | 2021-03-12 | 上海旻浦科技有限公司 | Cascading information extraction system and method based on reinforcement learning |
CN112527969A (en) * | 2020-12-22 | 2021-03-19 | 上海浦东发展银行股份有限公司 | Incremental intention clustering method, device, equipment and storage medium |
CN112734030A (en) * | 2020-12-31 | 2021-04-30 | 中国科学技术大学 | Unmanned platform decision learning method for empirical playback sampling by using state similarity |
CN112989049A (en) * | 2021-03-30 | 2021-06-18 | 广东工业大学 | Small sample text classification method and device, computer equipment and storage medium |
CN113077057A (en) * | 2021-04-20 | 2021-07-06 | 中国科学技术大学 | Unbiased machine learning method |
CN113326689A (en) * | 2020-02-28 | 2021-08-31 | 中国科学院声学研究所 | Data cleaning method and device based on deep reinforcement learning model |
CN113468326A (en) * | 2021-06-16 | 2021-10-01 | 北京明略软件系统有限公司 | Method and device for determining document classification |
CN113569918A (en) * | 2021-07-05 | 2021-10-29 | 北京淇瑀信息科技有限公司 | Classification temperature adjusting method, classification temperature adjusting device, electronic equipment and medium |
CN113569986A (en) * | 2021-08-18 | 2021-10-29 | 网易(杭州)网络有限公司 | Computer vision data classification method and device, electronic equipment and storage medium |
CN113743455A (en) * | 2021-07-23 | 2021-12-03 | 北京迈格威科技有限公司 | Target retrieval method, device, electronic equipment and storage medium |
CN113887643A (en) * | 2021-10-12 | 2022-01-04 | 西安交通大学 | New dialogue intention recognition method based on pseudo label self-training and source domain retraining |
CN114494239A (en) * | 2022-02-17 | 2022-05-13 | 平安科技(深圳)有限公司 | Focus identification method and device, electronic equipment and computer storage medium |
CN114722208A (en) * | 2022-06-08 | 2022-07-08 | 成都健康医联信息产业有限公司 | Automatic classification and safety level grading method for health medical texts |
CN114785890A (en) * | 2021-12-31 | 2022-07-22 | 北京泰迪熊移动科技有限公司 | Crank call identification method and device |
CN115329723A (en) * | 2022-10-17 | 2022-11-11 | 广州数说故事信息科技有限公司 | User circle layer mining method, device, medium and equipment based on small sample learning |
CN115329776A (en) * | 2022-10-18 | 2022-11-11 | 南京众智维信息科技有限公司 | Semantic analysis method for network security co-processing based on less-sample learning |
CN115457781A (en) * | 2022-09-13 | 2022-12-09 | 内蒙古工业大学 | Intelligent traffic signal lamp control method based on multi-agent deep reinforcement learning |
CN115481221A (en) * | 2021-05-31 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Method, device and equipment for enhancing dialogue data and computer storage medium |
2019-06-12 CN CN201910505469.3A patent/CN110298391B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885853A (en) * | 2017-11-14 | 2018-04-06 | 同济大学 | A kind of combined type file classification method based on deep learning |
CN108108351A (en) * | 2017-12-05 | 2018-06-01 | 华南理工大学 | A kind of text sentiment classification method based on deep learning built-up pattern |
CN108364028A (en) * | 2018-03-06 | 2018-08-03 | 中国科学院信息工程研究所 | A kind of internet site automatic classification method based on deep learning |
CN108734276A (en) * | 2018-04-28 | 2018-11-02 | 同济大学 | A kind of learning by imitation dialogue generation method generating network based on confrontation |
CN109213851A (en) * | 2018-07-04 | 2019-01-15 | 中国科学院自动化研究所 | Across the language transfer method of speech understanding in conversational system |
CN109189925A (en) * | 2018-08-16 | 2019-01-11 | 华南师范大学 | Term vector model based on mutual information and based on the file classification method of CNN |
CN109508655A (en) * | 2018-10-28 | 2019-03-22 | 北京化工大学 | The SAR target identification method of incomplete training set based on twin network |
CN109829541A (en) * | 2019-01-18 | 2019-05-31 | 上海交通大学 | Deep neural network incremental training method and system based on learning automaton |
Non-Patent Citations (5)
Title |
---|
孙旭明: "Research on Key Technologies of Text Classification Based on Semi-supervised Learning", China Master's Theses Full-text Database, Information Science and Technology *
杨志明: "Research on the Application of Deep Learning Algorithms in Question Intention Classification", Computer Engineering and Applications *
王广敏: "Application of Improved Multi-model Fusion Technology in Customer Service Question-Answering Systems", Telecommunications Science *
薛浩: "Research on Recommendation Algorithms for Online Question-Answering Communities", China Master's Theses Full-text Database, Information Science and Technology *
陆尧: "Research on Technologies for Question-Answering Systems Based on Entity Relations", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728313A (en) * | 2019-09-29 | 2020-01-24 | 北京声智科技有限公司 | Classification model training method and device for intention classification recognition |
CN110689878B (en) * | 2019-10-11 | 2020-07-28 | 浙江百应科技有限公司 | Intelligent voice conversation intention recognition method based on XLNet |
CN110704641A (en) * | 2019-10-11 | 2020-01-17 | 零犀(北京)科技有限公司 | Ten-thousand-level intention classification method and device, storage medium and electronic equipment |
CN110689878A (en) * | 2019-10-11 | 2020-01-14 | 浙江百应科技有限公司 | XLNET-based intelligent voice conversation intention recognition method |
CN110782008A (en) * | 2019-10-16 | 2020-02-11 | 北京百分点信息科技有限公司 | Training method, prediction method and device of deep learning model |
CN110766086A (en) * | 2019-10-28 | 2020-02-07 | 支付宝(杭州)信息技术有限公司 | Method and device for fusing multiple classification models based on reinforcement learning model |
CN110766086B (en) * | 2019-10-28 | 2022-07-22 | 支付宝(杭州)信息技术有限公司 | Method and device for fusing multiple classification models based on reinforcement learning model |
CN110969006B (en) * | 2019-12-02 | 2023-03-21 | 支付宝(杭州)信息技术有限公司 | Training method and system of text sequencing model |
CN110969006A (en) * | 2019-12-02 | 2020-04-07 | 支付宝(杭州)信息技术有限公司 | Training method and system of text sequencing model |
CN111028244A (en) * | 2019-12-04 | 2020-04-17 | 电子科技大学 | Remote sensing image semantic segmentation method based on super-pixel under condition of known sample imbalance |
CN111324727A (en) * | 2020-02-19 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | User intention recognition method, device, equipment and readable storage medium |
US11646016B2 (en) | 2020-02-19 | 2023-05-09 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for recognizing user intention, device, and readable storage medium |
CN111324727B (en) * | 2020-02-19 | 2023-08-01 | 百度在线网络技术(北京)有限公司 | User intention recognition method, device, equipment and readable storage medium |
CN111339767A (en) * | 2020-02-21 | 2020-06-26 | 百度在线网络技术(北京)有限公司 | Conversation source data processing method and device, electronic equipment and computer readable medium |
CN111339767B (en) * | 2020-02-21 | 2023-07-21 | 百度在线网络技术(北京)有限公司 | Dialogue source data processing method and device, electronic equipment and computer readable medium |
CN111414936A (en) * | 2020-02-24 | 2020-07-14 | 北京迈格威科技有限公司 | Determination method of classification network, image detection method, device, equipment and medium |
CN111414936B (en) * | 2020-02-24 | 2023-08-18 | 北京迈格威科技有限公司 | Determination method, image detection method, device, equipment and medium of classification network |
CN113326689B (en) * | 2020-02-28 | 2023-08-18 | 中国科学院声学研究所 | Data cleaning method and device based on deep reinforcement learning model |
CN113326689A (en) * | 2020-02-28 | 2021-08-31 | 中国科学院声学研究所 | Data cleaning method and device based on deep reinforcement learning model |
CN111460097A (en) * | 2020-03-26 | 2020-07-28 | 华泰证券股份有限公司 | Small sample text classification method based on TPN |
CN111460097B (en) * | 2020-03-26 | 2024-06-07 | 华泰证券股份有限公司 | TPN-based small sample text classification method |
CN111580411A (en) * | 2020-04-27 | 2020-08-25 | 珠海格力电器股份有限公司 | Control parameter optimization method, device and system |
CN111611347A (en) * | 2020-05-22 | 2020-09-01 | 上海乐言信息科技有限公司 | Dialog state tracking and training method and system of task-based dialog system |
CN111859903A (en) * | 2020-07-30 | 2020-10-30 | 苏州思必驰信息科技有限公司 | Event co-fingering model training method and event co-fingering resolution method |
CN111859903B (en) * | 2020-07-30 | 2024-01-12 | 思必驰科技股份有限公司 | Event same-index model training method and event same-index resolution method |
CN112069302B (en) * | 2020-09-15 | 2024-03-08 | 腾讯科技(深圳)有限公司 | Training method of conversation intention recognition model, conversation intention recognition method and device |
CN112069302A (en) * | 2020-09-15 | 2020-12-11 | 腾讯科技(深圳)有限公司 | Training method of conversation intention recognition model, conversation intention recognition method and device |
CN112182213B (en) * | 2020-09-27 | 2022-07-05 | 中润普达(十堰)大数据中心有限公司 | Modeling method based on abnormal lacrimation feature cognition |
CN112182213A (en) * | 2020-09-27 | 2021-01-05 | 中润普达(十堰)大数据中心有限公司 | Modeling method based on abnormal lacrimation feature cognition |
CN112487811A (en) * | 2020-10-21 | 2021-03-12 | 上海旻浦科技有限公司 | Cascading information extraction system and method based on reinforcement learning |
CN112329475B (en) * | 2020-11-03 | 2022-05-20 | 海信视像科技股份有限公司 | Statement processing method and device |
CN112329475A (en) * | 2020-11-03 | 2021-02-05 | 海信视像科技股份有限公司 | Statement processing method and device |
CN112527969B (en) * | 2020-12-22 | 2022-11-15 | 上海浦东发展银行股份有限公司 | Incremental intention clustering method, device, equipment and storage medium |
CN112527969A (en) * | 2020-12-22 | 2021-03-19 | 上海浦东发展银行股份有限公司 | Incremental intention clustering method, device, equipment and storage medium |
CN112734030B (en) * | 2020-12-31 | 2022-09-02 | 中国科学技术大学 | Unmanned platform decision learning method for empirical playback sampling by using state similarity |
CN112734030A (en) * | 2020-12-31 | 2021-04-30 | 中国科学技术大学 | Unmanned platform decision learning method for empirical playback sampling by using state similarity |
CN112989049A (en) * | 2021-03-30 | 2021-06-18 | 广东工业大学 | Small sample text classification method and device, computer equipment and storage medium |
CN113077057A (en) * | 2021-04-20 | 2021-07-06 | 中国科学技术大学 | Unbiased machine learning method |
CN113077057B (en) * | 2021-04-20 | 2022-09-06 | 中国科学技术大学 | Unbiased machine learning method |
CN115481221A (en) * | 2021-05-31 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Method, device and equipment for enhancing dialogue data and computer storage medium |
CN115481221B (en) * | 2021-05-31 | 2024-06-07 | 腾讯科技(深圳)有限公司 | Method, device, equipment and computer storage medium for enhancing dialogue data |
CN113468326A (en) * | 2021-06-16 | 2021-10-01 | 北京明略软件系统有限公司 | Method and device for determining document classification |
CN113569918A (en) * | 2021-07-05 | 2021-10-29 | 北京淇瑀信息科技有限公司 | Classification temperature adjusting method, classification temperature adjusting device, electronic equipment and medium |
CN113743455A (en) * | 2021-07-23 | 2021-12-03 | 北京迈格威科技有限公司 | Target retrieval method, device, electronic equipment and storage medium |
CN113569986A (en) * | 2021-08-18 | 2021-10-29 | 网易(杭州)网络有限公司 | Computer vision data classification method and device, electronic equipment and storage medium |
CN113569986B (en) * | 2021-08-18 | 2023-06-30 | 网易(杭州)网络有限公司 | Computer vision data classification method, device, electronic equipment and storage medium |
CN113887643A (en) * | 2021-10-12 | 2022-01-04 | 西安交通大学 | New dialogue intention recognition method based on pseudo label self-training and source domain retraining |
CN114785890A (en) * | 2021-12-31 | 2022-07-22 | 北京泰迪熊移动科技有限公司 | Crank call identification method and device |
CN114494239A (en) * | 2022-02-17 | 2022-05-13 | 平安科技(深圳)有限公司 | Focus identification method and device, electronic equipment and computer storage medium |
CN114494239B (en) * | 2022-02-17 | 2024-07-19 | 平安科技(深圳)有限公司 | Focus recognition method, focus recognition device, electronic equipment and computer storage medium |
CN114722208A (en) * | 2022-06-08 | 2022-07-08 | 成都健康医联信息产业有限公司 | Automatic classification and safety level grading method for health medical texts |
CN115457781B (en) * | 2022-09-13 | 2023-07-11 | 内蒙古工业大学 | Intelligent traffic signal lamp control method based on multi-agent deep reinforcement learning |
CN115457781A (en) * | 2022-09-13 | 2022-12-09 | 内蒙古工业大学 | Intelligent traffic signal lamp control method based on multi-agent deep reinforcement learning |
CN115329723A (en) * | 2022-10-17 | 2022-11-11 | 广州数说故事信息科技有限公司 | User circle layer mining method, device, medium and equipment based on small sample learning |
CN115329776B (en) * | 2022-10-18 | 2023-02-07 | 南京众智维信息科技有限公司 | Semantic analysis method for network security co-processing based on less-sample learning |
CN115329776A (en) * | 2022-10-18 | 2022-11-11 | 南京众智维信息科技有限公司 | Semantic analysis method for network security co-processing based on less-sample learning |
Also Published As
Publication number | Publication date |
---|---|
CN110298391B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110298391A (en) | A kind of iterative increment dialogue intention classification recognition methods based on small sample | |
CN107766929B (en) | Model analysis method and device | |
CN112395393B (en) | Remote supervision relation extraction method based on multitask and multiple examples | |
CN112685504B (en) | Production process-oriented distributed migration chart learning method | |
CN113378913B (en) | Semi-supervised node classification method based on self-supervised learning | |
CN110059191A (en) | A kind of text sentiment classification method and device | |
CN108549658A (en) | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree | |
CN111506728B (en) | Hierarchical structure text automatic classification method based on HD-MSCNN | |
CN108985617B (en) | Product production flow scheduling method and system based on intelligent manufacturing | |
CN114092742B (en) | Multi-angle-based small sample image classification device and method | |
CN116503676B (en) | Picture classification method and system based on knowledge distillation small sample increment learning | |
CN111353313A (en) | Emotion analysis model construction method based on evolutionary neural network architecture search | |
CN105930792A (en) | Human action classification method based on video local feature dictionary | |
CN117150026B (en) | Text content multi-label classification method and device | |
CN110009025A (en) | A kind of semi-supervised additive noise self-encoding encoder for voice lie detection | |
CN111460097A (en) | Small sample text classification method based on TPN | |
CN111310918A (en) | Data processing method and device, computer equipment and storage medium | |
CN112036179A (en) | Electric power plan information extraction method based on text classification and semantic framework | |
CN108831486B (en) | Speaker recognition method based on DNN and GMM models | |
CN111783688B (en) | Remote sensing image scene classification method based on convolutional neural network | |
CN117350287B (en) | Text emotion analysis method based on public opinion big data | |
CN113505120A (en) | Double-stage noise cleaning method for large-scale face data set | |
CN112163069A (en) | Text classification method based on graph neural network node feature propagation optimization | |
CN114861629B (en) | Automatic judgment method for text style | |
CN110163256A (en) | Paper image based on joint probability matrix divides method from kinetonucleus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||