CN108829662A - Dialogue act recognition method and system based on a conditional random field structured attention network - Google Patents

Dialogue act recognition method and system based on a conditional random field structured attention network

Info

Publication number
CN108829662A
CN108829662A (application CN201810443182.8A)
Authority
CN
China
Prior art keywords
word
random field
conversation
vector
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810443182.8A
Other languages
Chinese (zh)
Inventor
陈哲乾
蔡登
杨荣钦
赵洲
何晓飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201810443182.8A priority Critical patent/CN108829662A/en
Publication of CN108829662A publication Critical patent/CN108829662A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue act recognition method and system based on a conditional random field structured attention network. The recognition method comprises the following steps: (1) combining a memory network, the dialogue semantics are modeled hierarchically by reasoning at the word level, the sentence level, and the dialogue level; (2) a structured attention network divides the dialogue content into structural subsections according to the correlation between dialogue contents; (3) the resulting structured information is applied to a linear-chain conditional random field algorithm, which predicts the current dialogue act from the conversational context. The invention deeply captures the contextual information of an interactive dialogue and, by combining the structured attention network with the conditional random field algorithm, dynamically segments the dialogue content, further improving dialogue act recognition accuracy.

Description

Dialogue act recognition method and system based on a conditional random field structured attention network
Technical field
The present invention relates to the field of dialogue systems in natural language processing, and in particular to a dialogue act recognition method and system based on a conditional random field structured attention network.
Background art
In recent years, as human-computer interaction technology has gradually matured, a large number of products carrying human-computer dialogue systems have entered ordinary households. The appearance of products such as the smartphone assistants Siri and Cortana and the smart speakers Xiao Ai and Tmall Genie has let people deeply experience the convenience and enjoyment that technology brings. At the same time, human-computer dialogue systems have attracted wide attention from researchers in both industry and academia. The main research field of the present invention is one of the indispensable technologies in dialogue systems: dialogue act recognition. Its purpose is, for a segment of dialogue and given the preceding dialogue content as context, to predict the act of the current utterance and thereby identify the speaker's intention. An efficient and accurate dialogue act recognition model must clearly capture the contextual information and track the state of the dialogue content, so as to determine the current speaker's intention and recognize the act. For a machine, accurately identifying the speaker's intended act makes it possible to generate an accurate reply; this is an important technical difficulty in human-computer dialogue systems.
At present, mainstream dialogue act recognition techniques are studied from two main directions. The first defines dialogue act recognition as a multi-class text classification problem. For example, the LSTM-Softmax algorithm proposed by Khanpour et al. in the COLING 2016 paper "Dialogue Act Classification in Domain-Independent Conversations Using a Deep Recurrent Neural Network" combines deep learning with a soft classification algorithm and defines dialogue act recognition as simple multi-class text classification. The RCNN algorithm proposed by Blunsom et al. in the 2013 Computer Science paper "Recurrent Convolutional Neural Networks for Discourse Compositionality" uses hierarchical convolutional neural networks to model sentences and dialogues in layers, extracting deep semantic information before classifying the text. The DRLM-Conditional model proposed by Ji Yangfeng et al. in the NAACL-HLT 2016 paper "A Latent Variable Recurrent Neural Network for Discourse Relation Language Models" likewise combines a neural network structure with a probabilistic graphical model, labeling the sentences of different dialogue acts according to a maximization principle to achieve dialogue act recognition. The second direction defines dialogue act recognition as a structured sequence labeling problem, typically using hidden Markov models or conditional random field models, which can take the contextual relations of the entire dialogue into account. This definition is entirely different from the multi-class text classification setting, in which each dialogue act exists independently: structured sequence labeling considers how the current utterance is influenced by the preceding context, each state being affected by the states before it. For example, the Bi-LSTM-CRF model proposed by Kumar et al. in the paper "Dialogue Act Sequence Labeling using Hierarchical encoder with CRF", published on the Cornell preprint website, places a simple conditional random field on the final output layer of a hierarchical deep neural network to recognize dialogue acts. The paper "Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech" by Stolcke et al., published in the journal Dialogue in 2006, directly proposes combining plain-text feature extraction with a hidden Markov model for structured sequence prediction.
Obviously, defining dialogue act recognition as independent multi-class text classification loses the rich contextual information of the dialogue scene. In a dialogue, the speaker's intention is influenced by the preceding content: when the previous dialogue act is a greeting, the next act is very likely also a greeting; when the previous act is a question, the next act is very likely an answer. The other solution, using conventional conditional random fields or hidden Markov models, can capture the relatedness of the surrounding dialogue content, but it is severely confined to the influence of the immediately preceding state and does not consider the effect of topic division over the whole dialogue on dialogue act recognition. Usually these models flatten the whole dialogue into one long, equally weighted text and do not separate the different topics by dividing the dialogue into subsections. Yet in multi-turn dialogues humans often switch topics, chatting about one topic for a while and then another. Two topics may in fact have no great relevance to each other and should rather be treated independently.
Summary of the invention
The present invention provides a dialogue act recognition method based on a conditional random field structured attention network, which well solves the problem of low recognition accuracy caused by topic shifts within the dialogue and improves the robustness of dialogue act recognition to contextual influence.
A dialogue act recognition method based on a conditional random field structured attention network comprises the following steps:
(1) combining a memory network, performing hierarchical reasoning and semantic modeling of the dialogue semantics at the word level, the sentence level, and the dialogue level;
(2) applying a structured attention network to divide the dialogue content into structural subsections according to the correlation between dialogue contents;
(3) applying the resulting structured information to a linear-chain conditional random field algorithm and predicting the current dialogue act from the conversational context.
The present invention can be understood as the machine understanding the whole dialogue through deep semantic understanding and dividing it into subsections, so that strongly related utterances are put together while weakly related ones are kept apart as far as possible. The structured attention network then dynamically recognizes the relatedness between dialogue acts and, combined with the conditional random field algorithm, achieves structured prediction of dialogue acts.
The present invention performs semantic understanding of the dialogue content and captures its deep semantic information, which is an essential precondition of dialogue act recognition. Since dialogue content naturally has a hierarchical structure, this step models the semantics in layers: words form sentences, sentences form the whole dialogue, and the act of each utterance is determined from the dialogue content.
In step (1), the word-level reasoning formula of the dialogue semantics is as follows:
E = f_concat(E_w, E_a, E_pos, E_ner)
where E is the final complete vector representation of the word, spliced from four word representations of different dimensions; f_concat is the splicing function; E_w is the Word2vec vector of the word, obtained from Google's pretrained English word vector model; E_a is the word vector learned by a recurrent neural network from the word's letter-combination information, the inputs being the individual letters composing the word; E_pos is the part-of-speech information of the word produced by the nltk toolkit; E_ner is the named-entity class information of the word produced by the nltk toolkit.
To achieve semantic understanding, the model must have sufficient comprehension of words. The present invention uses the rich part-of-speech and morphological information of each word to enhance its expressive power in the semantic space.
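As a minimal sketch of the word-level splicing formula E = f_concat(E_w, E_a, E_pos, E_ner), the snippet below concatenates four stand-in vectors. The random placeholders and the dimensions (300/50/16/16) are assumptions for illustration, not values fixed by the patent:

```python
import random

def concat_word_representation(e_w, e_a, e_pos, e_ner):
    """Splice the four word-level views into one vector E (word level of step (1)).

    e_w   : pretrained word2vec-style vector          (random stand-in here)
    e_a   : character-level vector from a char RNN    (stand-in)
    e_pos : POS-tag embedding, e.g. from nltk tags    (stand-in)
    e_ner : named-entity-class embedding              (stand-in)
    """
    return e_w + e_a + e_pos + e_ner  # list concatenation = vector splicing

random.seed(0)
e_w = [random.random() for _ in range(300)]   # word2vec dimension (assumed)
e_a = [random.random() for _ in range(50)]
e_pos = [random.random() for _ in range(16)]
e_ner = [random.random() for _ in range(16)]

E = concat_word_representation(e_w, e_a, e_pos, e_ner)
print(len(E))  # 382 = 300 + 50 + 16 + 16
```

In a real system the four inputs would come from a pretrained embedding table, a character RNN, and nltk's POS/NER tags respectively; only the splicing itself is shown here.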
In step (1), the dialogue-level reasoning of the dialogue semantics proceeds as follows:
(1-1) A bidirectional gated recurrent unit splices the forward and backward hidden representations of each word to obtain the spatial semantic vector of the entire sentence:
U = f_biGRU(E_1, ..., E_n)
where U is the spatial semantic vector of the entire sentence and E_i is the i-th word of the sentence;
(1-2) the semantic representation of the current sentence in its context is obtained as:
C_t = tanh(W_{m-1} C_{t-1} + W_{m+1} C_{t+1} + b_m)
where C_t is the semantic representation of the t-th sentence in the context, C_{t-1} and C_{t+1} are the hidden representations of the preceding and following sentences, W_{m-1}, W_{m+1}, and b_m are parameters obtained by training, and tanh is the activation function; that is, in the context, the t-th sentence is jointly affected by the sentences before and after it;
(1-3) a memory neural network combined with an attention mechanism integrates the two dialogue representations to obtain the final fused dialogue semantics.
Here U only learns an independent hidden vector representation of each utterance and ignores how a sentence is influenced by the context before and after it. To further learn the hidden representation of each utterance in context, the present invention introduces a variable C denoting the hidden representation of the current sentence in the context.
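A small sketch of the context formula C_t = tanh(W_{m-1} C_{t-1} + W_{m+1} C_{t+1} + b_m) follows. Using the raw sentence vectors U as the neighbouring states and zero padding at the dialogue boundaries are assumptions made for illustration, not details fixed by the patent:

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def contextual_reps(U, W_prev, W_next, b):
    """C_t = tanh(W_{m-1} C_{t-1} + W_{m+1} C_{t+1} + b_m) for every sentence t,
    taking the raw sentence vectors U as the neighbour states (assumption) and
    zero vectors at the two ends of the dialogue (assumption)."""
    n, d = len(U), len(U[0])
    zero = [0.0] * d
    C = []
    for t in range(n):
        prev_v = U[t - 1] if t > 0 else zero
        next_v = U[t + 1] if t < n - 1 else zero
        pre = [p + q + r for p, q, r in
               zip(matvec(W_prev, prev_v), matvec(W_next, next_v), b)]
        C.append([math.tanh(x) for x in pre])
    return C

d = 3
I = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # identity weights
U = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
C = contextual_reps(U, I, I, [0.0] * d)
print([round(x, 3) for x in C[1]])  # → [0.462, 0.0, 0.462]: middle sentence mixes both neighbours
```

With identity weights the middle sentence's representation is tanh of the sum of its two neighbours, making the joint influence of the preceding and following sentences explicit.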
Step (1-3) proceeds as follows:
(1-3-1) The correlation between the original sentence representation U_t and the contextual semantic representation C is obtained by softmax normalization:
p_{t,i} = softmax(U_t^T C_i)
where U_t^T is the transpose of the original sentence vector and p_{t,i} is the correlation between the original sentence representation U_t and the contextual representation C_i, which can be understood as the attention weight between the two.
(1-3-2) A memory network is introduced to generate the final memory output O_t:
O_t = Σ_i p_{t,i} C_i
The final output O_t can be regarded as the semantic understanding after each hidden state of the contextual representation C has been influenced by the original sentence. Since memory network layers can be stacked arbitrarily, each additional layer can be seen as one layer deeper of understanding of the original sentence.
(1-3-3) After k layers of memory network, the present invention applies a stacking operation: the output O_t^k of the previous layer is added to U_t^k to obtain the next layer's sentence representation:
U_t^{k+1} = U_t^k + O_t^k
where U_t^{k+1} is the final semantic understanding of the dialogue sentence after the influence of the last memory network layer. It means the model's understanding of the current dialogue has passed through multiple layers of complex interaction, fusing the original semantics of the dialogue content with the semantics influenced by context. The above steps ensure that the model has a sufficient understanding of the dialogue semantics.
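The three sub-steps above (softmax attention, memory output, residual stacking) can be sketched as a single memory hop repeated k times. The dot-product score and the exact stacking order are assumptions consistent with standard memory networks, not the patent's exact parameterization:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def memory_hops(u, memory, k=2):
    """k hops of the memory network of step (1-3):
    attention p_i = softmax(u . C_i), output O = sum_i p_i C_i,
    residual stacking u <- u + O (one hop per layer)."""
    for _ in range(k):
        scores = [sum(a * b for a, b in zip(u, c)) for c in memory]
        p = softmax(scores)
        o = [sum(p_i * c[j] for p_i, c in zip(p, memory)) for j in range(len(u))]
        u = [a + b for a, b in zip(u, o)]
    return u

memory = [[1.0, 0.0], [0.0, 1.0]]  # two contextual states C_1, C_2
u = memory_hops([2.0, 0.0], memory, k=1)
print([round(x, 3) for x in u])  # → [2.881, 0.119]
```

The sentence vector attends mostly to the contextual state it already resembles, and the hop adds that weighted memory back onto the original representation.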
In step (2), the structural subsections generally include a greeting subsection, a chat subsection, a question-and-answer subsection, a farewell subsection, and so on. The contextual relations within a subsection are close, while the associations between different subsections are comparatively distant.
The division into subsections is considered to largely determine the accuracy of dialogue act sequence labeling. Let U = {U_1, U_2, ..., U_n} denote the dialogue contents, y = {y_1, y_2, ..., y_n} the act category of each utterance, and z = {z_1, z_2, ..., z_n} discrete latent variables with each z_i ∈ {0, 1}, where 0 represents context-irrelevant and 1 represents context-relevant. The purpose of introducing the structured attention mechanism is that, when predicting a dialogue act, the model can infer the relatedness between the current utterance and its context from the dialogue content and the act types of the context utterances, and thus predict the current act with reference to the acts of the context.
In the dialogue act recognition task, greedily predicting the act category of each utterance in isolation may not yield the optimal solution. Instead, the best label sequence should be determined jointly from the dialogue context and the act labels of the surrounding utterances. This is where the linear-chain conditional random field comes into play.
In step (3), the linear-chain conditional random field algorithm is as follows. For the whole dialogue, the dialogue contents and their corresponding dialogue acts form random variable sequences represented as a linear chain. Given the random variable sequence X, the conditional probability distribution P(Y | X) of the random variable sequence Y constitutes the conditional random field, whose probability distribution can be expressed as:
p(z | U, y; θ) = softmax(Σ_C θ_C(z_C))
where θ_i(z_i) is the potential of the conditional random field at each latent dialogue node, set with a uniform conditional random field configuration.
For each utterance, the correlation in relatedness between this content and the preceding content is summarized. Using the structured marginal probability function p(z_1, ..., z_n | U, y) together with the forward-backward algorithm, the dialogue act distribution p(y_1, ..., y_n, U_1, ..., U_n; θ) of the whole dialogue based on the conditional random field is computed.
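The structured marginals p(z_i = 1 | U, y) mentioned above can be sketched with the forward-backward algorithm over a binary linear chain. The potentials below are hypothetical numbers, not the patent's learned θ, and the shared transition matrix is an assumption:

```python
import math

def chain_marginals(unary, pairwise):
    """Marginals p(z_i = 1) for a linear-chain CRF over binary segmentation
    gates z_i (1 = same subsection as the context), computed by the
    forward-backward algorithm in log space.

    unary[i][s]     : score theta_i(z_i = s)
    pairwise[s][t]  : transition score between consecutive gates (shared)
    """
    n = len(unary)

    def logsumexp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # forward pass
    alpha = [unary[0][:]]
    for i in range(1, n):
        alpha.append([unary[i][s] +
                      logsumexp([alpha[-1][t] + pairwise[t][s] for t in range(2)])
                      for s in range(2)])
    # backward pass
    beta = [[0.0, 0.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [logsumexp([pairwise[s][t] + unary[i + 1][t] + beta[i + 1][t]
                              for t in range(2)])
                   for s in range(2)]
    logZ = logsumexp(alpha[-1])
    return [math.exp(alpha[i][1] + beta[i][1] - logZ) for i in range(n)]

unary = [[0.0, 2.0], [0.0, 0.0], [2.0, 0.0]]   # hypothetical potentials
marg = chain_marginals(unary, [[0.0, 0.0], [0.0, 0.0]])
print([round(m, 3) for m in marg])  # → [0.881, 0.5, 0.119]
```

With zero transition scores the marginals factorize into per-node sigmoids, which makes the toy output easy to check; nonzero transitions would couple neighbouring gates, which is the point of the chain.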
The present invention designs end-to-end training and testing algorithms, learning the parameters of the conditional random field structured attention network by maximum likelihood estimation. Given a training set (U, Y), the log-likelihood function can be expressed as:
L(Θ) = Σ log p(Y | U; Θ)
where Θ denotes the parameters learned by the neural network and L denotes the loss function defined for training.
The objective function of the invention is defined as:
min_Θ −L(Θ) + λ ||Θ||_2^2
where ||Θ||_2^2 denotes the L2 regularization and λ is the tradeoff parameter between the loss function L(Θ) and the regularization term.
In the test phase, the present invention obtains the optimal sequence prediction using the Viterbi algorithm. By dynamic programming, the dialogue act prediction can be obtained as:
y' = argmax_{y ∈ Y} p(y | U, Θ)
where y' is the dialogue act label sequence predicted by the model, and the argmax function takes the maximum item of the conditional random field probability distribution as the prediction result.
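The Viterbi decoding y' = argmax_y p(y | U, Θ) can be sketched as follows. The two-label toy scores are invented for illustration; they also show how joint decoding can differ from greedy per-utterance prediction, the point made above:

```python
def viterbi(unary, trans):
    """y' = argmax_y p(y | U, Theta) for a linear-chain model by dynamic programming.

    unary[i][s] : per-utterance score of dialogue-act label s
    trans[s][t] : transition score from label s to label t
    """
    n, k = len(unary), len(unary[0])
    score = [unary[0][:]]   # best score of any path ending in each label
    back = []               # backpointers
    for i in range(1, n):
        row, ptr = [], []
        for t in range(k):
            best_s = max(range(k), key=lambda s: score[-1][s] + trans[s][t])
            row.append(score[-1][best_s] + trans[best_s][t] + unary[i][t])
            ptr.append(best_s)
        score.append(row)
        back.append(ptr)
    last = max(range(k), key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back):      # follow backpointers
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# toy scores: label 0 = "question", label 1 = "answer"; questions like to be followed by answers
unary = [[2.0, 0.0], [0.5, 0.4], [0.0, 2.0]]
trans = [[0.0, 1.5], [0.5, 0.3]]
print(viterbi(unary, trans))  # → [0, 1, 1]
```

With these scores a greedy per-utterance argmax would output [0, 0, 1], while the jointly decoded sequence is [0, 1, 1]: the transition scores pull the ambiguous middle utterance toward "answer".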
The present invention also provides a dialogue act recognition system based on a conditional random field structured attention network, whose modules comprise:
Word-level representation module: obtains the word2vec pretrained vector, the character-level vector, the part-of-speech vector, and the entity-class vector of each word, and splices these four vectors into the word's final representation vector;
Dialogue-level representation module: uses a deep recurrent neural network to obtain the original semantic representation vector of each sentence and, combining a memory network with the context and the structured attention mechanism, obtains the semantic representation of the whole dialogue;
Act-level representation module: predicts the act category of each utterance from the dialogue content;
Context semantic understanding module: captures the contextual information of the dialogue with a deep recurrent neural network;
Dialogue-state initialization module: initializes the hyperparameters of the dialogue model for the training and test processes;
Conditional random field probability distribution module: computes, when predicting the current dialogue act, the degree of influence of the context dialogue acts on the current utterance;
Test module: outputs the dialogue act prediction results after the model has been trained.
The invention has the following advantages:
1. Unlike previous research, the present invention performs dialogue act recognition from the angle of extending the structured dependencies of the conditional random field. The proposed structured attention network provides a new solution that attends both to the dialogue semantics and to the subsection structure of the dialogue.
2. The proposed hierarchical deep recurrent neural network is combined with a memory-enhancement mechanism to fully model the semantic representation of the dialogue content. The proposed framework can be trained end to end, and the model can easily be extended to different dialogue tasks.
3. On the two popular datasets SwDA and MRDA, experimental results demonstrate that the model outperforms the other baseline algorithms, proving its superiority.
Brief description of the drawings
Fig. 1 is the overall structural diagram of the present invention for contextual semantic understanding;
Fig. 2 is a schematic diagram of the deep recurrent neural network over the dialogue content;
Fig. 3 is a schematic diagram of dialogue subsection division in a real dialogue scene according to the present invention;
Fig. 4 is a schematic diagram of the structured latent semantic division based on the linear-chain conditional random field according to the present invention;
Fig. 5 is a module flowchart of the dialogue act recognition system based on the conditional random field structured attention network according to the present invention;
Fig. 6 is a heat map of the ten major dialogue act labels obtained by the present invention on the SwDA dataset;
Fig. 7 shows the influence of the number of dialogue act categories on dialogue generation after the recognition system of the present invention is combined with a CVAE dialogue generation system.
Specific embodiment
The present invention is further elaborated and illustrated below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the framework of the present invention adopts hierarchical semantic understanding and is divided into three levels:
(a) Word level: for each word, the word2vec pretrained vector, the character-level vector, the part-of-speech vector, and the entity-class vector are obtained and spliced into the word's final representation vector. First, the word2vec vector E_w of each word is obtained from Google's pretrained English word vector model. Second, each word is made up of letters, and different combinations of letters can express the word's root and etymology well; through a deep recurrent neural network the present invention obtains another, letter-level word vector E_a. In addition, each word has part-of-speech information, e.g. adjective, noun, or verb; the corresponding part-of-speech information E_pos is obtained with the nltk toolkit. There is also the entity information of the word, e.g. a place name, a person name, or a specific entity such as a time or event, likewise obtained with the nltk toolkit as E_ner. In this way the semantic information of a word is represented in four different dimensions, and the four vectors are finally connected into a single vector as the word's complete spatial representation.
(b) Dialogue level: a deep recurrent neural network obtains the original semantic representation vector of the sentence, implemented as U = f_biGRU(E_1, ..., E_n). Combined with the context and the structured attention mechanism, the semantic representation of the whole dialogue is obtained, implemented as C_t = tanh(W_{m-1} C_{t-1} + W_{m+1} C_{t+1} + b_m). After the original utterance vectors U and the contextual semantic representation C are obtained, the memory network continually updates the interaction between U and C, further deepening the model's understanding of the dialogue semantics: the model's understanding of the current dialogue has passed through multiple layers of complex interaction, fusing the original semantics of the dialogue content with the semantics influenced by context. These steps ensure that the model has a sufficient understanding of the dialogue semantics.
(c) Act level: unlike previous work that uses the conditional random field algorithm directly, the present invention introduces a structured attention mechanism. The whole dialogue is no longer regarded as one flat article but as structured information composed of different subsections. A dialogue will usually have a greeting subsection, a chat subsection, a question-and-answer subsection, a farewell subsection, and so on. The contextual relations within a subsection are close, while those between subsections are comparatively distant. The division into subsections is considered to largely determine the accuracy of dialogue act sequence labeling. Let U = {U_1, U_2, ..., U_n} denote the dialogue contents, y = {y_1, y_2, ..., y_n} the act category of each utterance, and z = {z_1, z_2, ..., z_n} discrete latent variables with each z_i ∈ {0, 1}, where 0 represents context-irrelevant and 1 represents context-relevant. The purpose of introducing the structured attention mechanism is that, when predicting a dialogue act, the model can infer the relatedness between the current utterance and its context from the dialogue content and the act types of the context utterances, and thus predict the current act with reference to the context acts.
Suppose a dialogue is regarded as an undirected graph with n nodes, each utterance being one node of the graph. The conditional random field obtains the division by learning the latent variable parameters θ_C(z_C) ∈ R. Under this definition, the structured attention probability can be defined as:
p(z | U, y; θ) = softmax(Σ_C θ_C(z_C))
where p(z | U, y; θ), with parameters θ, is the attention probability influenced by the dialogue contents U and act categories y; z_C are the discrete latent variables characterizing whether the context is associated; and θ_C(z_C) is the latent parameter function for dividing subsections.
Correspondingly, the representation C of the whole dialogue content can be expressed through the dialogue contents U and the corresponding latent dialogue act states z:
C = E_{z~p} [f(U, y, z)]
where the labeling function f is defined as f(U, y, z) = Σ_C f_C(U, y, z_C), the latent-state representation of the subsection selection. The dialogue representation C can be understood as being very sensitive to subsection content: it attends to the dialogue content of one dialogue act segment, and all candidate subsection selections are weighted and averaged according to the latent variables z ~ p. Here p is defined as a mapping function of the dialogue contents U and the dialogue acts y.
In practical applications, the present invention defines the labeling function as f_i(U, y, z_i) = 1{z_i = 1} f(U_i, y_i): for context relevant to the dialogue content the labeling function takes the value 1, while for context irrelevant to the dialogue content it is set to 0. Under this definition, the expectation of the whole function expresses, for a dialogue, how much attention weight each utterance should place on the act labels of its context:
C = Σ_i p(z_i = 1 | U, y) f(U_i, y_i)
where C represents the expected subsection attention of the whole dialogue, U_i and y_i are the i-th utterance and its corresponding act type, and p(z_i = 1 | U, y) is the probability that the current utterance and the previous utterance belong to the same subsection. For the whole dialogue, the conditional random field probability distribution can then be expressed through the potentials θ_i(z_i) of each latent dialogue node, using a uniform conditional random field setting.
For each utterance, the correlation in relatedness between this content and the preceding content is summarized. Using the structured marginal probability function p(z_1, ..., z_n | U, y) together with the forward-backward algorithm, the dialogue act distribution p(y_1, ..., y_n, U_1, ..., U_n; θ) of the whole dialogue based on the conditional random field can be computed.
As shown in Fig. 2, in the deep recurrent neural network over the dialogue content, each word carries four kinds of information: the original word, the character level, the part of speech, and the named-entity class. This makes the original semantic representation of the dialogue content more accurate. The specific embodiment is E = f_concat(E_w, E_a, E_pos, E_ner), where E_w is extracted directly from Google's pretrained Word2vec vectors; E_a is the word vector learned by a recurrent neural network from the word's letter-combination information, the letters composing the word being its inputs; E_pos is the part-of-speech information produced by the nltk toolkit; E_ner is the entity-class information produced by the nltk toolkit. After the complete semantic representation of the words is obtained, a deep recurrent neural network performs semantic understanding over the whole sentence of dialogue content.
As shown in Fig. 3, this dialogue segment can be divided into three subsections: the first subsection is a greeting, the second is question answering, and the third is a farewell. These three subsections have little relevance to each other, so conversation activity identification can clearly be performed on them separately.
As shown in Fig. 4, a schematic diagram of structured latent semantic segmentation based on a linear conditional random field, Z_i denotes the structured latent representation of the i-th dialogue, used for semantic segmentation of the dialogue subsections.
As shown in Fig. 5, a conversation activity identification system based on a conditional random field structured attention network is divided into seven modules in total, specifically:
Word layer representation module: for a word, obtains its word2vec pre-trained vector, character-level vector, part-of-speech vector, and entity-class vector, and concatenates these four vectors into the final representation vector of the word.
Dialogue layer representation module: uses a deep recurrent neural network to obtain the original semantic representation vector of each sentence, and combines a memory network with the context and the structured attention mechanism to obtain the semantic representation of the whole dialogue segment.
Behavior layer representation module: predicts the corresponding behavior category of an utterance according to the conversation content.
Context semantic understanding module: uses a deep recurrent neural network to capture contextual information during the dialogue.
Dialogue state initialization module: initializes the hyperparameters of the dialogue model for the training and testing processes.
Conditional random field probability distribution module: uses the properties of the conditional random field to consider, when predicting the current conversation activity, the degree of influence of the context conversation activities on the current dialogue.
Test module: after model training is finished, externally outputs the conversation activity prediction results; through this module, the system can demonstrate the final effect of the algorithm in product form.
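As a rough illustration of how the dialogue layer representation module can combine a sentence vector with its context through attention, the sketch below performs one softmax-weighted memory hop in plain numpy. The shapes, the single hop, and the residual update are illustrative assumptions, not the patent's trained system:

```python
import numpy as np

def memory_hop(u_t, context):
    """One memory-network hop over contextual sentence vectors.

    u_t:     (d,) original sentence representation.
    context: (m, d) contextual sentence representations.
    Returns the next-layer sentence representation u_t + o_t.
    """
    scores = context @ u_t                     # correlation of u_t with each C_j
    scores -= scores.max()                     # stabilize the softmax
    p = np.exp(scores) / np.exp(scores).sum()  # normalized attention weights
    o_t = p @ context                          # attention-weighted memory output
    return u_t + o_t                           # stacked residual update
```

Stacking several such hops (feeding the output back in as the next u_t) mirrors the k-layer memory network described in the dialogue layer representation module.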
The present invention is compared with other current state-of-the-art approaches on two mainstream conversation activity identification datasets, SwDA and MRDA. The two large datasets are introduced as follows:
SwDA corpus: SwDA is a large manually-labeled dataset obtained from 1155 telephone conversation scenarios. In each recording, two strangers were selected at random and exchanged on a randomly chosen topic.
MRDA corpus: MRDA is a dataset recorded from 75 meetings, in which the intention category of every utterance is manually labeled to serve the purpose of conversation activity identification.
The details of the two corpora are given in Table 1:
Table 1
Dataset Classes Vocabulary Training set Validation set Test set
SwDA 42 19k 1003(173k) 112(22k) 19(4k)
MRDA 5 10k 51(76k) 11(15k) 11(15k)
The present invention mainly uses conversation activity accuracy as the evaluation metric. Seven current mainstream conversation activity identification algorithms are compared in total: Bi-LSTM-CRF, DRLM-Conditional, LSTM-Softmax, RCNN, CRF, HMM, and SVM. Table 2 reports the recognition accuracy of the major algorithm models on the SwDA corpus, and Table 3 reports their recognition accuracy on the MRDA corpus.
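The judging metric mentioned above, conversation activity accuracy, is simply the fraction of utterances whose predicted label matches the gold label; a minimal sketch:

```python
def act_accuracy(gold, pred):
    """Fraction of utterances whose predicted conversation activity label
    matches the gold label."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```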
Table 2
Model Accuracy (%)
Human annotation 84.0
Algorithm of the present invention 80.8
Bi-LSTM-CRF 79.2
DRLM-Conditional 77.0
LSTM-Softmax 75.8
RCNN 73.9
CRF 71.7
HMM 71.0
SVM 70.6
Table 3
Model Accuracy (%)
Algorithm of the present invention 91.4
Bi-LSTM-CRF 90.9
LSTM-Softmax 86.8
CRF 83.9
SVM 81.8
As can be seen from Tables 2 and 3, the conditional random field structured attention network framework proposed by the present invention achieves the best results on both large datasets compared with the other algorithms, which fully illustrates the superiority of the algorithm of the present invention.
In addition, the present invention visualizes the matching of the ten major conversation activity labels on the SwDA dataset. As shown in Fig. 6, the abscissa represents the true conversation activity label and the ordinate represents the conversation activity label predicted by the model of the present invention. The darker the color of a node in the figure, the closer the abscissa value and the ordinate value.
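The visualization described above is built from a label-matching (confusion) matrix: rows are true labels, columns are predicted labels, and the cell count is what is rendered as color depth in Fig. 6. The sketch below counts label co-occurrences with toy integer labels standing in for the ten SwDA tags:

```python
import numpy as np

def label_match_matrix(y_true, y_pred, n_labels):
    """Count matrix m[t, p]: how often true label t was predicted as p."""
    m = np.zeros((n_labels, n_labels), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m
```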
Finally, the conversation activity identification system of the present invention is combined with a CVAE dialogue generation system: through conversation activity identification, it assists the dialogue generation system in producing meaningful replies related to the contextual meaning. As shown in Fig. 7, compared with other identification systems, the conversation activity identification system of the present invention is clearly more accurate in the effect of assisting dialogue generation, which also indirectly demonstrates the superiority of the proposed model over other state-of-the-art algorithms and fully proves that the proposed algorithm is more stable than other models in assisting dialogue replies.

Claims (7)

1. A conversation activity identification method based on a conditional random field structured attention network, characterized by comprising the following steps:
(1) combining a memory network, performing hierarchical reasoning over the dialogue semantic information at the word layer, sentence layer, and dialogue layer to build a semantic model;
(2) applying a structured attention network to divide the conversation content into structural subsections according to the correlation between conversation contents;
(3) applying the obtained structured information to a linear conditional random field algorithm, and predicting the current conversation activity based on the context.
2. The conversation activity identification method based on a conditional random field structured attention network according to claim 1, characterized in that in step (1), the word-layer reasoning formula of the dialogue semantic information is as follows:
E = f_concat(E_w, E_a, E_pos, E_ner)
wherein E is the final complete vector representation of the word, concatenated from four kinds of word information of different dimensions; f_concat denotes the concatenation function; E_w denotes the Word2vec vector of the word obtained from the pre-trained Google English word vector model; E_a denotes the word representation vector learned by a recurrent neural network from the individual letters that compose the word; E_pos denotes the word part-of-speech information produced by the NLTK toolkit; and E_ner denotes the word entity-class information produced by the NLTK toolkit.
3. The conversation activity identification method based on a conditional random field structured attention network according to claim 1, characterized in that in step (1), the dialogue-layer reasoning of the dialogue semantic information comprises the following specific steps:
(1-1) using a bidirectional gated recurrent unit, concatenating the forward latent representation and the backward latent representation of each word to obtain the spatial semantic vector representation of the entire sentence, with the formula:
U = f_biGRU(E_1, …, E_n)
wherein U denotes the spatial semantic vector representation of the entire sentence, and E_i denotes the i-th word in the sentence;
(1-2) obtaining the semantic representation of the current sentence in its context, with the formula:
C_t = tanh(W_{m-1} C_{t-1} + W_{m+1} C_{t+1} + b_m)
wherein C_t denotes the semantic representation of the t-th sentence in its context, C_{t-1} and C_{t+1} are the latent representations of the preceding and following sentences, W_{m-1}, W_{m+1}, and b_m are parameters obtained by training, and tanh is the activation function;
(1-3) using a memory network in combination with the attention mechanism to integrate the two kinds of dialogue representations, obtaining the final fused dialogue semantic information.
4. The conversation activity identification method based on a conditional random field structured attention network according to claim 3, characterized in that step (1-3) comprises the following specific steps:
(1-3-1) normalizing via softmax to obtain the correlation between the original sentence representation U_t and the contextual semantic representation C_t:
wherein the transpose of the original sentence vector representation is used, and p_{j,i} denotes the correlation between the original sentence representation U_t and the contextual semantic representation C_t;
(1-3-2) introducing the memory network to generate the final memory output O_t;
(1-3-3) after passing through k layers of the memory network, using a stacking operation, adding the output of the previous memory layer to the previous layer's sentence representation to obtain the final sentence representation of the next layer, with the formula:
wherein the result denotes the final semantic understanding representation of the dialogue sentence obtained under the influence of the last memory network layer.
5. The conversation activity identification method based on a conditional random field structured attention network according to claim 1, characterized in that in step (2), the structural subsections include a greeting subsection, a chat subsection, a question-answering subsection, and a farewell subsection.
6. The conversation activity identification method based on a conditional random field structured attention network according to claim 1, characterized in that in step (3), the linear conditional random field algorithm is specifically:
for the whole dialogue segment, the conditional random field probability distribution is expressed as:
p(z_1, …, z_n | U, y) = softmax( Σ_{i=1}^{n} θ_i(z_i) )
wherein θ_i(z_i) denotes the potential of the conditional random field at each latent dialogue node, with the standard conditional random field setting:
for every piece of conversation content, the correlation between this content and the preceding content is summarized; by using the structured marginal probability function p(z_1, …, z_n | U, y) in conjunction with the forward-backward algorithm, the conversation activity distribution probability of the whole dialogue segment is computed based on the conditional random field:
wherein p(y_1, …, y_n, U_1, …, U_n; θ) represents the conversation activity distribution probability, U_i(y_j) represents the probability that the i-th utterance is predicted as behavior label y_j, Σ represents the distribution probability of a single utterance over the different conversation activities, and Π combines the conversation activity distributions of every utterance in the dialogue into the overall conditional probability.
7. A conversation activity identification system based on a conditional random field structured attention network, characterized by comprising:
a word layer representation module: for obtaining the word2vec pre-trained vector, character-level vector, part-of-speech vector, and entity-class vector of a word, and concatenating these four vectors into the final representation vector of the word;
a dialogue layer representation module: for using a deep recurrent neural network to obtain the original semantic representation vector of a sentence, and combining a memory network with the context and the structured attention mechanism to obtain the semantic representation of the whole dialogue segment;
a behavior layer representation module: for predicting the corresponding behavior category of an utterance according to the conversation content;
a context semantic understanding module: for capturing contextual information during the dialogue using a deep recurrent neural network;
a dialogue state initialization module: for initializing the hyperparameters of the dialogue model for the training and testing processes;
a conditional random field probability distribution module: for computing, when predicting the current conversation activity, the degree of influence of the context conversation activities on the current dialogue;
a test module: for externally outputting the conversation activity prediction results after model training is finished.
CN201810443182.8A 2018-05-10 2018-05-10 A kind of conversation activity recognition methods and system based on condition random field structuring attention network Pending CN108829662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810443182.8A CN108829662A (en) 2018-05-10 2018-05-10 A kind of conversation activity recognition methods and system based on condition random field structuring attention network


Publications (1)

Publication Number Publication Date
CN108829662A true CN108829662A (en) 2018-11-16

Family

ID=64147812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810443182.8A Pending CN108829662A (en) 2018-05-10 2018-05-10 A kind of conversation activity recognition methods and system based on condition random field structuring attention network

Country Status (1)

Country Link
CN (1) CN108829662A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614485A (en) * 2018-11-19 2019-04-12 中山大学 A kind of sentence matching method and device of the layering Attention based on syntactic structure
CN109710930A (en) * 2018-12-20 2019-05-03 重庆邮电大学 A kind of Chinese Resume analytic method based on deep neural network
CN110210037A (en) * 2019-06-12 2019-09-06 四川大学 Category detection method towards evidence-based medicine EBM field
CN110245353A (en) * 2019-06-20 2019-09-17 腾讯科技(深圳)有限公司 Natural language representation method, device, equipment and storage medium
CN110569331A (en) * 2019-09-04 2019-12-13 出门问问信息科技有限公司 Context-based relevance prediction method and device and storage equipment
CN110705340A (en) * 2019-08-12 2020-01-17 广东石油化工学院 Crowd counting method based on attention neural network field
CN110727768A (en) * 2019-10-24 2020-01-24 中国科学院计算技术研究所 Candidate answer sentence generation and natural language selection method and system
CN110853626A (en) * 2019-10-21 2020-02-28 成都信息工程大学 Bidirectional attention neural network-based dialogue understanding method, device and equipment
CN111191415A (en) * 2019-12-16 2020-05-22 山东众阳健康科技集团有限公司 Operation classification coding method based on original operation data
CN111324704A (en) * 2018-12-14 2020-06-23 阿里巴巴集团控股有限公司 Method and device for constructing dialect knowledge base and customer service robot
US20200202887A1 (en) * 2018-12-19 2020-06-25 Disney Enterprises, Inc. Affect-driven dialog generation
CN112115247A (en) * 2020-09-07 2020-12-22 中国人民大学 Personalized dialogue generation method and system based on long-time and short-time memory information
CN112632961A (en) * 2021-03-04 2021-04-09 支付宝(杭州)信息技术有限公司 Natural language understanding processing method, device and equipment based on context reasoning
CN113157919A (en) * 2021-04-07 2021-07-23 山东师范大学 Sentence text aspect level emotion classification method and system
CN113240098A (en) * 2021-06-16 2021-08-10 湖北工业大学 Fault prediction method and device based on hybrid gated neural network and storage medium
CN115017286A (en) * 2022-06-09 2022-09-06 北京邮电大学 Search-based multi-turn dialog system and method
CN110377713B (en) * 2019-07-16 2023-09-15 广州探域科技有限公司 Method for improving context of question-answering system based on probability transition
WO2023231513A1 (en) * 2022-05-31 2023-12-07 华院计算技术(上海)股份有限公司 Conversation content generation method and apparatus, and storage medium and terminal
CN117591662A (en) * 2024-01-19 2024-02-23 川投信息产业集团有限公司 Digital enterprise service data mining method and system based on artificial intelligence

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
CN107092596A (en) * 2017-04-24 2017-08-25 重庆邮电大学 Text emotion analysis method based on attention CNNs and CCR
US20180060301A1 (en) * 2016-08-31 2018-03-01 Microsoft Technology Licensing, Llc End-to-end learning of dialogue agents for information access


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHEQIAN CHEN, RONGQIN YANG, ZHOU ZHAO, DENG CAI, XIAOFEI HE: "Dialogue Act Recognition via CRF-Attentive Structured Network", arXiv (HTTPS://ARXIV.ORG/SEARCH/?QUERY=DIALOGUE+ACT+RECOGNITION+VIA+CRF-ATTENTIVE+STRUCTURED+NETWORK&SEARCHTYPE=ALL&SOURCE=HEADER) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20181116