CN107766506A - A multi-turn dialogue model construction method based on a hierarchical attention mechanism - Google Patents

A multi-turn dialogue model construction method based on a hierarchical attention mechanism

Info

Publication number
CN107766506A
Authority
CN
China
Prior art keywords
sentence
word
hidden state
hierarchical
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710986813.6A
Other languages
Chinese (zh)
Inventor
张伟男
汪意发
朱庆福
刘挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201710986813.6A priority Critical patent/CN107766506A/en
Publication of CN107766506A publication Critical patent/CN107766506A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Abstract

The present invention relates to a multi-turn dialogue model construction method based on a hierarchical attention mechanism. It addresses the shortcomings of existing dialogue systems: they depend on large-scale corpora, their training speed is constrained by corpus size, and, because the replies in dialogue generation are not unique, Seq2Seq models tend to generate generic, meaningless replies. The method comprises: receiving sentence inputs; for each sentence, computing the encoder hidden states starting from the first word; computing the attention weight of each sentence; computing a topic context representation vector; and finally computing the decoder hidden states and outputting the result. The present invention is applicable to open-domain chatbot systems.

Description

A multi-turn dialogue model construction method based on a hierarchical attention mechanism
Technical field
The present invention relates to dialogue systems, and in particular to a multi-turn dialogue model construction method based on a hierarchical attention mechanism.
Background technology
1. Current state of foreign technology
(1) Dialogue systems based on hand-crafted templates
Template-based techniques manually define conversation scenarios and write targeted dialogue templates for each scenario; a template describes the questions a user may ask and the corresponding answer templates.
Weizenbaum et al. (1966) developed the earliest chatbot, ELIZA. ELIZA pre-designs language templates for the linguistic situations that may arise in a dialogue, and its text generator embeds the important information in the user's input into a template to produce the reply.
Such systems restrict chat to special scenarios or specific topics and generate responses using a set of pattern rules.
(2) Dialogue systems based on retrieval
Retrieval-based chatbots work like a search engine: a dialogue repository is stored and indexed in advance, and given a user's question, fuzzy matching against the dialogues in the repository finds the most suitable response.
Shaikh et al. (2010) built an online chat agent (VCA) that can carry out preliminary social interaction with people in a chatroom. Using a novel method, it performs web searches on the ongoing conversation topic and finds related topics that can be inserted into the dialogue to change its flow; it can be regarded as a fusion of the retrieval-based and template-based methods.
(3) Dialogue generation models based on deep learning
Deep learning for dialogue generation mainly targets open-domain chatbots, since large-scale open-domain corpora are relatively easy to obtain. The most common approach borrows the Sequence-to-Sequence (Seq2Seq) model from machine translation and treats the whole process of dialogue generation, from question to reply, as the translation process from a source language to a target language.
Ritter et al. (2011) used dialogue corpora obtained from Twitter; the resulting Seq2Seq model outperformed retrieval-based dialogue models.
Sordoni et al. (2015) proposed a dialogue generation system that takes the contextual information of the dialogue into account, thereby improving reply consistency.
Serban et al. (2016) proposed the Hierarchical Neural Network model, which aims to model the semantics and interaction of the dialogue context so as to build a multi-turn dialogue system.
Jiwei Li et al. (2016) addressed the problem of traditional Seq2Seq models generating generic replies by introducing mutual information as the objective function, improving the diversity of the generated replies. Jiwei Li (2016) also used an improved Seq2Seq model to model user style, introducing user embeddings as prior knowledge at the decoder, thereby improving the consistency and relevance of the dialogue system.
Louis Shao et al. (2017) improved the training method and decoder of Seq2Seq models and added beam search, improving the length, consistency, and relevance of the replies the model generates.
2. Current state of domestic technology
Domestic research started later, and work on dialogue systems is likewise mainly based on deep learning. Li Hang et al. (2015) proposed the Neural Responding Machine, which uses an improved Seq2Seq model, adds an attention mechanism, and fuses multiple models, achieving good results on short-text conversation.
Mou Lili (2016) likewise focused on the generic-reply problem of traditional Seq2Seq models and first proposed the Seq2BF model, which first predicts a keyword using mutual information and then generates the reply sentence based on that keyword.
Zongcheng Ji (2014) instead used a retrieval-based method, creating a relatively intelligent dialogue system from a huge dialogue corpus using state-of-the-art information retrieval techniques.
3. Brief analysis of the domestic and foreign literature
Current research on open-domain dialogue generation systems, both at home and abroad, mainly comprises template-based methods, retrieval-based methods, and deep-learning-based methods. The early template-based methods perform no real language processing; the language they generate is stiff and formulaic and often has problems with semantics and fluency. Such methods are better suited to task-oriented chatbots than to open-domain dialogue systems.
Retrieval-based methods use learning-to-rank and deep matching techniques to find, in an existing corpus of human dialogues, the reply best suited to the current input. Their limitation is that they can only reply with fixed wording and cannot realize diverse combinations of words.
The most popular approach at present is deep learning: the Seq2Seq model from machine translation, usually with an Encoder-Decoder structure, is trained end to end on a fairly large dialogue corpus to obtain a dialogue system. This approach breaks through the earlier methods' restriction to fixed wording: it models the question the user inputs and then generates a reply word by word from the intermediate result, so it can generate replies creatively. The vast majority of current research is an extension or improvement of this model.
However, deep-learning-based methods depend on large-scale corpora; the training speed of Seq2Seq models is constrained by corpus size, and because the replies in dialogue generation are not unique, Seq2Seq models tend to generate generic, meaningless replies such as "hello", "I don't know either", and "haha".
Furthermore, most current dialogue systems are devoted to optimizing the quality of single-turn dialogue, i.e. the question-response process, whereas chat is a continuous interactive process with a specific background, and the meaning of a word can sometimes only be determined in combination with the dialogue context or related background. Context modeling thus remains a problem to be studied.
Summary of the invention
The purpose of the present invention is to solve the shortcomings of existing dialogue systems, namely that they depend on large-scale corpora, their training speed is constrained by corpus size, and, because the replies in dialogue generation are not unique, Seq2Seq models tend to generate generic, meaningless replies. A multi-turn dialogue model construction method based on a hierarchical attention mechanism is proposed, comprising:
Step 1: receive $n+1$ sentence inputs $c_0, c_1, \dots, c_n$;
Step 2: for each sentence $c_i$, compute the encoder hidden states starting from the first word: $h_{i,t} = f(x_{i,t}, h_{i,t-1})$, where $x_{i,t}$ denotes the $t$-th word of $c_i$ and $h_{i,0}$ is a preset parameter; the $h_{i,t}$ obtained at the last step is taken as the encoder hidden state $h_i$ of sentence $c_i$;
Step 3: compute the attention weight of the $i$-th sentence, $\alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}$, where $e_i = v^{T}\tanh(Wh_i + Uh_n)$; $v$, $W$, $U$ are preset parameters of the attention mechanism and $\tanh$ is the activation function;
Step 4: compute the topic context representation vector $T = \sum_i \alpha_i h_i$;
Step 5: compute the decoder hidden states $s_t = f(y_{t-1}, s_{t-1}, T)$, where $y_{t-1}$ denotes the iteration input at time $t-1$, $y_0$ is a preset value, and $s_0 = h_n$;
Step 6: output the values $s_1, s_2, \dots, s_t, \dots, s_n$ as the result.
The present invention also provides another multi-turn dialogue model construction method based on a hierarchical attention mechanism, comprising:
Step 1: receive $n+1$ sentence inputs $c_0, c_1, \dots, c_n$;
Step 2: for each sentence $c_i$, compute the encoder hidden states starting from the first word: $h_{i,t} = f(x_{i,t}, h_{i,t-1})$, where $x_{i,t}$ denotes the $t$-th word of $c_i$ and $h_{i,0}$ is a preset parameter; the $h_{i,t}$ obtained at the last step is taken as the encoder hidden state $h_i$ of sentence $c_i$;
Step 3: at the $t$-th decoding step, compute the attention weight of the $i$-th sentence, $\alpha_{i,t} = \frac{\exp(e_{i,t})}{\sum_j \exp(e_{j,t})}$, where $e_{i,t} = v^{T}\tanh(Wh_i + Us_{t-1})$; $v$, $W$, $U$ are preset parameters of the attention mechanism and $s_{t-1}$ is the decoder hidden state at time $t-1$;
Step 4: compute the dynamic representation vector $D_t = \sum_i \alpha_{i,t} h_i$;
Step 5: compute the decoder hidden states $s_t = f(y_{t-1}, s_{t-1}, D_t)$, where $y_{t-1}$ denotes the iteration input at time $t-1$, $y_0$ is a preset value, and $s_0 = h_n$;
Step 6: output the values $s_1, s_2, \dots, s_t, \dots, s_n$ as the result.
The beneficial effects of the present invention are:
1. It does not depend on a large-scale corpus, its training speed is not affected by corpus size, and it does not tend to generate generic, meaningless replies;
2. The present invention has been tested on the Opensubtitles and Ubuntu data sets.
On the Opensubtitles data set, the Embedding Average of the present invention reaches 0.565647, clearly higher than the prior art's 0.557139; its Greedy Matching reaches 0.523235, clearly higher than the prior art's 0.503273; and its Extrema reaches 0.393724, higher than the prior art's 0.393189.
On the Ubuntu data set, the Embedding Average of the present invention reaches 0.612089, clearly higher than the prior art's 0.577022; its Greedy Matching reaches 0.429328, clearly higher than the prior art's 0.416948; and its Extrema reaches 0.397543, higher than the prior art's 0.391392.
Brief description of the drawings
Fig. 1 is a flowchart of the multi-turn dialogue model construction method based on a hierarchical attention mechanism of the present invention;
Fig. 2 is a schematic diagram of the computation of encoder hidden states in Step 2 of Embodiment One; Context refers to the context formed by the sentence inputs $c_0, c_1, \dots, c_n$;
Fig. 3 is a schematic diagram of Steps 3 to 5 of Embodiment One; Topic Net denotes the module that computes the topic context representation vector $T$, topic vector denotes the topic context representation vector, and the Decoder computes the decoder hidden states; the arrow from $h_n$ to the decoder indicates that computing the decoder hidden states requires the value of $h_n$.
Detailed description of the embodiments
Embodiment One:
The present invention provides a multi-turn dialogue model construction method based on a hierarchical attention mechanism, as shown in Fig. 1, comprising:
Step 1: receive $n+1$ sentence inputs $c_0, c_1, \dots, c_n$;
Step 2: for each sentence $c_i$, compute the encoder hidden states starting from the first word: $h_{i,t} = f(x_{i,t}, h_{i,t-1})$, where $x_{i,t}$ denotes the $t$-th word of $c_i$ and $h_{i,0}$ is a preset parameter; the $h_{i,t}$ obtained at the last step is taken as the encoder hidden state $h_i$ of sentence $c_i$;
Step 3: compute the attention weight of the $i$-th sentence, $\alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}$, where $e_i = v^{T}\tanh(Wh_i + Uh_n)$; $v$, $W$, $U$ are preset parameters of the attention mechanism;
Step 4: compute the topic context representation vector $T = \sum_i \alpha_i h_i$;
Step 5: compute the decoder hidden states $s_t = f(y_{t-1}, s_{t-1}, T)$, where $y_{t-1}$ denotes the iteration input at time $t-1$, $y_0$ is a preset value, and $s_0 = h_n$; when the sentence inputs received in Step 1 are training data, $y_{t-1}$ is the preset reference-answer word, and when the sentence inputs received in Step 1 are actual test data, the value of $y_{t-1}$ equals $s_{t-1}$;
Step 6: output the value of $s_t$ as the result.
Specifically, the approach of the present invention is based on the Seq2Seq model from machine translation and adopts an Encoder-Decoder structure. As is characteristic of this model and structure, for a sentence $c_i$ each word $x_{i,t}$ it contains is input to the model in order, and for each $x_{i,t}$ the model computes the corresponding encoder hidden state according to the formula (as shown in Fig. 2); each result is used to compute the next hidden state, and so on, with the value output at the last hidden state serving as the encoding $h_i$ of the whole sentence. Since $n+1$ sentences are input in total, $n+1$ sentence encodings are obtained. Attention weights are then computed from these encodings. The attention mechanism itself is conventional; the present invention modifies the form of its formula, the main improvement being that the weight of each sentence considers both the encoding of the current sentence and the encoding of the last sentence (in the formula $e_i = v^{T}\tanh(Wh_i + Uh_n)$, $h_i$ denotes the current sentence's encoding and $h_n$ the last sentence's encoding). The topic context representation vector $T$ is then computed from the weights; this vector contains the necessary information from the encoder side, and the decoder decodes it (using the decoding function $s_t = f(y_{t-1}, s_{t-1}, T)$) to obtain the result, as shown in Fig. 3. When processing actual data, $y_{t-1}$ is the output result and $s_t$ is $y_t$, i.e. the word output at time $t$; during training, to guarantee the training effect, $s_t$ is the actual output while $y_{t-1}$ is the preset word, i.e. the word in the reference answer.
It can be seen that the method of the present invention generates a reply based not only on the current question but also on the current context and contextual information. This process can be viewed as a memory process: a person reads information from memory and combines it with the current question to give a reply. This work uses an end-to-end memory network (Memory Network) to process the whole context; memory networks can be used for QA tasks or reading comprehension, obtaining a representation of a document by modeling it and completing different tasks with that representation.
This work treats the last sentence in the context as the key (i.e. $h_n$ in the formula $e_i = v^{T}\tanh(Wh_i + Uh_n)$) and the remaining sentences as the memory, and computes a representation of the whole context from them; this part serves as the encoder of the Seq2Seq model, and the result is fed to the decoder to decode the reply. The temporal order of the sentences in the memory is considered at the same time, and each sentence is assigned a different weight, yielding a better context representation. A minimal code sketch of this embodiment is given below.
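The following PyTorch sketch illustrates Embodiment One under stated assumptions: the class name, the choice of GRUs for the recurrence $f$, the dimensions, and the zero-vector start input for $y_0$ are all illustrative, since the patent specifies the recurrences only abstractly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicAttentionSeq2Seq(nn.Module):
    """Minimal sketch of Embodiment One (sentence-level attention)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)  # h_{i,t} = f(x_{i,t}, h_{i,t-1})
        self.decoder = nn.GRUCell(emb_dim + hid_dim, hid_dim)      # s_t = f(y_{t-1}, s_{t-1}, T)
        # Attention parameters v, W, U from e_i = v^T tanh(W h_i + U h_n)
        self.W = nn.Linear(hid_dim, hid_dim, bias=False)
        self.U = nn.Linear(hid_dim, hid_dim, bias=False)
        self.v = nn.Linear(hid_dim, 1, bias=False)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, sentences, targets):
        # sentences: list of n+1 LongTensors of word ids (c_0 .. c_n);
        # targets: LongTensor of reference-reply word ids (training mode).
        # Step 2: encode each sentence; keep its last hidden state as h_i.
        H = torch.stack([self.encoder(self.embed(c).unsqueeze(0))[1][0, 0]
                         for c in sentences])              # (n+1, hid_dim)
        h_n = H[-1]                                         # last sentence acts as the key
        # Step 3: e_i = v^T tanh(W h_i + U h_n), alpha = softmax(e).
        e = self.v(torch.tanh(self.W(H) + self.U(h_n))).squeeze(-1)
        alpha = F.softmax(e, dim=0)
        # Step 4: topic context representation T = sum_i alpha_i h_i.
        T = (alpha @ H).unsqueeze(0)                        # (1, hid_dim)
        # Step 5: decode with s_0 = h_n; teacher forcing feeds the gold
        # previous word as y_{t-1}, per the patent's training setting.
        s = h_n.unsqueeze(0)
        y_prev = torch.zeros(1, self.embed.embedding_dim)   # y_0: preset start input
        logits = []
        for t in range(targets.size(0)):
            s = self.decoder(torch.cat([y_prev, T], dim=1), s)
            logits.append(self.out(s))
            y_prev = self.embed(targets[t]).unsqueeze(0)
        return torch.cat(logits)                            # Step 6: per-step vocabulary scores
```

At test time, per Step 5, the reference word is unavailable and the previous decoder state is fed back in place of $y_{t-1}$.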
Embodiment Two:
The present invention also provides another multi-turn dialogue model construction method based on a hierarchical attention mechanism, comprising:
Step 1: receive $n+1$ sentence inputs $c_0, c_1, \dots, c_n$;
Step 2: for each sentence $c_i$, compute the encoder hidden states starting from the first word: $h_{i,t} = f(x_{i,t}, h_{i,t-1})$, where $x_{i,t}$ denotes the $t$-th word of $c_i$ and $h_{i,0}$ is a preset parameter; the $h_{i,t}$ obtained at the last step is taken as the encoder hidden state $h_i$ of sentence $c_i$;
Step 3: at the $t$-th decoding step, compute the attention weight of the $i$-th sentence, $\alpha_{i,t} = \frac{\exp(e_{i,t})}{\sum_j \exp(e_{j,t})}$, where $e_{i,t} = v^{T}\tanh(Wh_i + Us_{t-1})$; $v$, $W$, $U$ are preset parameters of the attention mechanism and $s_{t-1}$ is the decoder hidden state at time $t-1$;
Step 4: compute the dynamic representation vector $D_t = \sum_i \alpha_{i,t} h_i$;
Step 5: compute the decoder hidden states $s_t = f(y_{t-1}, s_{t-1}, D_t)$, where $y_{t-1}$ denotes the iteration input at time $t-1$, $y_0$ is a preset value, and $s_0 = h_n$;
Step 6: output the value of $s_t$ as the result.
This embodiment differs from Embodiment One in how the attention weights are computed in Step 3: here a weight is computed for every sentence at every generated word, whereas Embodiment One computes only a single weight per sentence. A further difference is that the weight computation does not use the last encoder hidden state $h_n$ but the decoder hidden state $s_{t-1}$ of the previous time step. Both embodiments follow the same inventive idea of "considering the whole context and computing weights against key values"; they differ only in the chosen key, which is the last encoder hidden state in one case and the decoder hidden state in the other. A sketch of this per-step attention follows.
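The following is a minimal sketch of the dynamic attention step that distinguishes Embodiment Two, reusing the layer names from the sketch above; it is an illustration under the same assumptions, not the patented implementation.

```python
import torch
import torch.nn.functional as F

def dynamic_attention_step(H, s_prev, W, U, v):
    """One decoding step of Embodiment Two's dynamic attention.

    H: (n+1, hid_dim) stack of sentence encodings h_i;
    s_prev: (hid_dim,) decoder hidden state s_{t-1};
    W, U, v: the nn.Linear attention layers from the sketch above.
    """
    # Step 3: e_{i,t} = v^T tanh(W h_i + U s_{t-1}); the decoder state
    # replaces the fixed key h_n, so the weights change at every step t.
    e = v(torch.tanh(W(H) + U(s_prev))).squeeze(-1)
    alpha_t = F.softmax(e, dim=0)
    # Step 4: dynamic representation D_t = sum_i alpha_{i,t} h_i,
    # which Step 5 then consumes as s_t = f(y_{t-1}, s_{t-1}, D_t).
    return alpha_t @ H
```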
In the present invention, deep learning is used to model contextual information so as to optimize dialogue quality and improve the relevance and consistency of the generated replies. The main model used in this patent is the Seq2Seq model with an Encoder-Decoder structure. Since the final goal is to decode a reply that is semantically fluent, consistent, and relevant, the Decoder must be a good language model, so the Decoder is implemented based on an RNN.
Traditional Seq2Seq models consider only a single-turn question and reply, but when people converse they consider not only the current utterance but also the current context and contextual information. In multi-turn dialogue, the current sentence, i.e. the sentence temporally closest to the current reply, is considered the most important, since the generated reply is a direct response to it. To obtain the information of the whole context and thereby model the topic of the conversation, this work uses an RNN to model all sentences of the context, obtaining multiple representations; then, drawing on the attention mechanism in machine translation, hierarchical attention is computed over the context to obtain an attended representation of the context. This serves as the representation of the topic and is added at the decoder to assist decoding, so that replies with better consistency and relevance are generated.
The beneficial effects of the present invention are described in detail below:
There are currently two mainstream approaches to evaluating open-domain chatbots: objective metrics and subjective human scoring. The objective metrics used in this work are mainly the word-vector-based evaluation metrics represented by Embedding Average, Greedy Matching, and Vector Extrema.
The basic principle of these objective metrics is to compute the similarity between the generated candidate reply and the target reply of the reference answer and to use it as the score of reply quality; the underlying method judges the relevance of a reply by understanding the meaning of each word, and word vectors are the foundation that makes this evaluation possible. According to the semantic distribution, each word is assigned a vector that represents it; sentence vectors of the candidate reply and the target reply can then be obtained by various methods and compared by cosine distance to obtain their similarity.
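As an illustration of the principle just described, here is a minimal sketch of the Embedding Average metric: each sentence vector is the mean of its word vectors, and candidate and reference are compared by cosine similarity. The word_vectors lookup table (e.g. vectors pre-trained with word2vec) and the function names are assumptions for the example.

```python
import numpy as np

def sentence_vector(sentence, word_vectors, dim=100):
    # Embedding Average: mean of the vectors of the in-vocabulary words.
    vecs = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def embedding_average_score(candidate, reference, word_vectors):
    a = sentence_vector(candidate, word_vectors)
    b = sentence_vector(reference, word_vectors)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0  # cosine similarity
```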
Opensubtitles is a well-known international subtitles website that offers subtitle downloads in many languages, from which an English dialogue data set composed of film dialogue can be obtained; the Ubuntu data set is an English dialogue data set composed of chat logs posted on the Ubuntu forums. This patent work obtained 100,000 Opensubtitles dialogue scenarios and processed them into more than 800,000 multi-turn dialogue sessions, on which word vectors were pre-trained with word2vec, yielding a vocabulary of 30,000+ words; on the Ubuntu data set, about 450,000 multi-turn dialogue sessions were obtained, and word2vec pre-training likewise yielded a vocabulary of 30,000+ words.
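A sketch of the word2vec pre-training step, using gensim as one possible toolkit (the patent names only word2vec itself); the file name and training parameters are illustrative assumptions.

```python
from gensim.models import Word2Vec

# One whitespace-tokenized dialogue sentence per line; the file name is illustrative.
sessions = [line.split() for line in open("dialogue_sessions.txt", encoding="utf-8")]
model = Word2Vec(sessions, vector_size=100, window=5, min_count=5, workers=4)
model.save("dialogue_w2v.model")
print("vocabulary size:", len(model.wv))  # the patent reports 30000+ words
```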
The topic model (the method of Embodiment One) and the dynamic model (the method of Embodiment Two) have so far been tested on the Opensubtitles and Ubuntu data sets: the original dialogue corpora were processed into dialogue sessions with contexts, on which the experimental tests were carried out.
This patent work is compared with three baseline models, LSTM, HRED, and VHRED, on the Embedding Average, Greedy Matching, and Vector Extrema metrics. The experimental results show that the topic model of this patent work outperforms the baselines on all three metrics on both the Opensubtitles and Ubuntu data sets, and the dynamic model also exceeds the baseline performance on the first two metrics. The results are shown in Tables 1 and 2.
Table 1: Results on the Opensubtitles data set; lstm, hred, and vhred are baseline results, and topic model and dynamic model are the methods of this patent application.
Table 2: Results on the Ubuntu data set; lstm, hred, and vhred are baseline results, and topic model and dynamic model are the methods of this patent application.
The effect of the present invention can be seen more intuitively from the following four groups of test cases.
Here topic model denotes the method of Embodiment One, dyna model denotes the method of Embodiment Two, and hred model, vhred model, and LSTM model are existing models.
Example 1:
Context:
I'll try the development version from the PPA now and see whether it still crashes.
Have you checked the computer's CPU temperature?
No, I haven't run into any temperature problems yet... where can I see its temperature?
topic model: You can try deleting your configuration file and then try again
dyna model: Try running lspci from the terminal; there is a list there
hred model: System -> Preferences -> Power Management
vhred model: I don't understand what you mean
LSTM model: I don't understand
Example 2:
Context:
Come in.
Thanks.
Don't be so polite, just make yourself at home
Miss Dila will be able to come out in a while.
All right
Have fun
topic model: Have fun, I believe I can do it myself
dyna model: Thank you
hred model: Come on
vhred model: All right, let's go
lstm model: I will be careful
Example 3:
Context:
Following the method in the link, add the nomodeset parameter to the kernel boot entry
Also, thanks.
topic model: You can also edit the configuration file under /etc/default/grub with the nomodeset option
dyna model: Same instruction as the last entry of the previous update
hred model: That's unfriendly
vhred model: That's unfriendly
LSTM model: I don't understand
Example 4:
Context:
Does anyone know about another problem: is there a way to make Pidgin show the messages menu?
Please check whether you are really using the NVIDIA driver, or reinstall the driver to make sure there is no problem
topic model: I have an nvidia video card, and a Radeon Mobility 7000 video card.
dyna model: I have the best 75-foot display.
hred model: How do I do this
vhred model: I know
lstm model: I don't know
From the above four examples it can be seen that vhred model and lstm model are more inclined to generate generic replies and have difficulty generating replies that fit the context when the context is complex. hred model sometimes generates generic replies and sometimes replies that fit the context, but on the whole the cases that do not fit the context predominate. The topic model and dyna model of the present invention, by contrast, can generate reply content that fits the context.
Embodiments One and Two of the present invention can be applied directly in open-domain chatbot systems. The application vehicle of this patented technology is "Benben", the chatbot developed by the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology.
The multi-turn dialogue function provided by the present invention can serve as one module of a whole chatbot system: the module receives a start signal from the central control module, independently takes charge of one multi-turn dialogue, and hands control flow back to the central control module when the multi-turn dialogue ends.
In terms of deployment, this technology can be used independently as a compute node, deployed on a cloud computing platform such as Aliyun or Meituan Cloud; communication with other modules can be carried out by binding an IP address and port number, as sketched below.
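A minimal sketch of exposing the multi-turn dialogue module as a network service bound to an address and port, using Python's standard library; the HTTP framing, the JSON schema, and the generate_reply stub are assumptions for illustration, not the deployed interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_reply(context):
    # Placeholder for the multi-turn model of Embodiments One/Two.
    return "..."

class DialogueHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        context = json.loads(self.rfile.read(length))["context"]  # prior sentences
        body = json.dumps({"reply": generate_reply(context)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Bind the module to an IP address and port; other modules call it over HTTP.
    HTTPServer(("0.0.0.0", 8080), DialogueHandler).serve_forever()
```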
In the concrete implementation of this technology, since deep learning techniques are used, a corresponding deep learning framework is required: both Tensorflow and Pytorch are candidate frameworks. Whichever deep learning framework is used, the external interface of this technology module is unaffected.
The present invention can also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and variations according to the present invention, but these corresponding changes and variations shall all fall within the protection scope of the claims appended to the present invention.

Claims (4)

1. A multi-turn dialogue model construction method based on a hierarchical attention mechanism, comprising:
Step 1: receive $n+1$ sentence inputs $c_0, c_1, \dots, c_n$;
Step 2: for each sentence $c_i$, compute the encoder hidden states starting from the first word: $h_{i,t} = f(x_{i,t}, h_{i,t-1})$, where $x_{i,t}$ denotes the $t$-th word of $c_i$ and $h_{i,0}$ is a preset parameter; the $h_{i,t}$ obtained at the last step is taken as the encoder hidden state $h_i$ of sentence $c_i$;
Step 3: compute the attention weight of the $i$-th sentence, $\alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}$, where $e_i = v^{T}\tanh(Wh_i + Uh_n)$; $v$, $W$, $U$ are preset parameters of the attention mechanism and $\tanh$ is the activation function;
Step 4: compute the topic context representation vector $T = \sum_i \alpha_i h_i$;
Step 5: compute the decoder hidden states $s_t = f(y_{t-1}, s_{t-1}, T)$, where $y_{t-1}$ denotes the iteration input at time $t-1$, $y_0$ is a preset value, and $s_0 = h_n$;
Step 6: output the values $s_1, s_2, \dots, s_n$ as the result.
2. The multi-turn dialogue model construction method based on a hierarchical attention mechanism according to claim 1, characterized in that, when the sentence inputs received in Step 1 are training data, $y_{t-1}$ in Step 5 is the preset reference-answer word, and when the sentence inputs received in Step 1 are actual test data, the value of $y_{t-1}$ in Step 5 equals $s_{t-1}$.
3. A multi-turn dialogue model construction method based on a hierarchical attention mechanism, comprising:
Step 1: receive $n+1$ sentence inputs $c_0, c_1, \dots, c_n$;
Step 2: for each sentence $c_i$, compute the encoder hidden states starting from the first word: $h_{i,t} = f(x_{i,t}, h_{i,t-1})$, where $x_{i,t}$ denotes the $t$-th word of $c_i$ and $h_{i,0}$ is a preset parameter; the $h_{i,t}$ obtained at the last step is taken as the encoder hidden state $h_i$ of sentence $c_i$;
Step 3: at the $t$-th decoding step, compute the attention weight of the $i$-th sentence, $\alpha_{i,t} = \frac{\exp(e_{i,t})}{\sum_j \exp(e_{j,t})}$, where $e_{i,t} = v^{T}\tanh(Wh_i + Us_{t-1})$; $v$, $W$, $U$ are preset parameters of the attention mechanism, $s_{t-1}$ is the decoder hidden state at time $t-1$, and $\tanh$ is the activation function;
Step 4: compute the dynamic representation vector $D_t = \sum_i \alpha_{i,t} h_i$;
Step 5: compute the decoder hidden states $s_t = f(y_{t-1}, s_{t-1}, D_t)$, where $y_{t-1}$ denotes the iteration input at time $t-1$, $y_0$ is a preset value, and $s_0 = h_n$;
Step 6: output the values $s_1, s_2, \dots, s_n$ as the result.
4. The multi-turn dialogue model construction method based on a hierarchical attention mechanism according to claim 3, characterized in that, when the sentence inputs received in Step 1 are training data, $y_{t-1}$ in Step 5 is the preset reference-answer word, and when the sentence inputs received in Step 1 are actual test data, the value of $y_{t-1}$ in Step 5 equals $s_{t-1}$.
CN201710986813.6A 2017-10-20 2017-10-20 A multi-turn dialogue model construction method based on a hierarchical attention mechanism Pending CN107766506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710986813.6A CN107766506A (en) 2017-10-20 2017-10-20 A multi-turn dialogue model construction method based on a hierarchical attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710986813.6A CN107766506A (en) 2017-10-20 2017-10-20 A multi-turn dialogue model construction method based on a hierarchical attention mechanism

Publications (1)

Publication Number Publication Date
CN107766506A true CN107766506A (en) 2018-03-06

Family

ID=61268560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710986813.6A Pending CN107766506A (en) A multi-turn dialogue model construction method based on a hierarchical attention mechanism

Country Status (1)

Country Link
CN (1) CN107766506A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844368A (en) * 2015-12-03 2017-06-13 华为技术有限公司 Method for human-machine dialogue, neural network system and user equipment
CN106126596A (en) * 2016-06-20 2016-11-16 中国科学院自动化研究所 Question answering method based on a hierarchical memory network
CN106354710A (en) * 2016-08-18 2017-01-25 清华大学 Neural network relation extraction method
CN106383815A (en) * 2016-09-20 2017-02-08 清华大学 Neural network sentiment analysis method combining user and product information
CN106847271A (en) * 2016-12-12 2017-06-13 北京光年无限科技有限公司 Data processing method and device for a dialogue interaction system
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Method and device for improving the dialogue performance of a dialogue system
CN106933785A (en) * 2017-02-23 2017-07-07 中山大学 Summary generation method based on a recurrent neural network
CN107133211A (en) * 2017-04-26 2017-09-05 中国人民大学 Essay scoring method based on an attention mechanism
CN107256228A (en) * 2017-05-02 2017-10-17 清华大学 Answer selection system and method based on a structured attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DZMITRY BAHDANAU: "Neural Machine Translation by Jointly Learning to Align and Translate", Computer Science *
HUAYU LI et al.: "A Context-aware Attention Network for Interactive Question Answering", KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining *
LIFENG SHANG: "Neural Responding Machine for Short-Text Conversation", ACL *
ZICHAO YANG et al.: "Hierarchical Attention Networks for Document Classification", Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388944A (en) * 2017-11-30 2018-08-10 中国科学院计算技术研究所 LSTM neural network chip and its application method
CN108563628A (en) * 2018-03-07 2018-09-21 中山大学 Emotional dialogue generation method based on HRED and internal and external memory network units
CN108491514A (en) * 2018-03-26 2018-09-04 清华大学 Method and device for asking questions in a dialogue system, electronic equipment, computer-readable medium
CN108920510A (en) * 2018-05-30 2018-11-30 出门问问信息科技有限公司 Automatic chatting method, device and electronic equipment
CN108959246A (en) * 2018-06-12 2018-12-07 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on an improved attention mechanism
CN108959246B (en) * 2018-06-12 2022-07-12 北京慧闻科技(集团)有限公司 Answer selection method and device based on an improved attention mechanism, and electronic equipment
CN108805088A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Physiological signal analysis subsystem based on a multi-modal emotion recognition system
CN108805088B (en) * 2018-06-14 2021-05-28 南京云思创智信息科技有限公司 Physiological signal analysis subsystem based on a multi-modal emotion recognition system
CN109684912A (en) * 2018-11-09 2019-04-26 中国科学院计算技术研究所 Video description method and system based on an information loss function
CN109658270A (en) * 2018-12-19 2019-04-19 前海企保科技(深圳)有限公司 Claim verification system and method based on reading comprehension of insurance products
CN109948089A (en) * 2019-02-21 2019-06-28 中国海洋大学 Method and device for extracting web page text
CN111914983A (en) * 2019-05-07 2020-11-10 阿里巴巴集团控股有限公司 Interaction method and device, smart speaker, electronic equipment and storage medium
CN111914983B (en) * 2019-05-07 2023-10-24 阿里巴巴集团控股有限公司 Interaction method and device, smart speaker, electronic equipment and storage medium
CN110297894A (en) * 2019-05-22 2019-10-01 同济大学 Intelligent dialogue generation method based on an auxiliary network
CN110297894B (en) * 2019-05-22 2021-03-26 同济大学 Intelligent dialogue generation method based on an auxiliary network
CN110413729B (en) * 2019-06-25 2023-04-07 江南大学 Multi-turn dialogue generation method based on a clause-context dual attention model
CN110413729A (en) * 2019-06-25 2019-11-05 江南大学 Multi-turn dialogue generation method based on a clause-context dual attention model
CN110347833A (en) * 2019-07-09 2019-10-18 浙江工业大学 Classification method for multi-turn dialogues
CN110334190A (en) * 2019-07-12 2019-10-15 电子科技大学 Automatic reply generation method for open-domain dialogue systems
CN112417089A (en) * 2019-08-21 2021-02-26 东北大学秦皇岛分校 Highly parallel reading comprehension method based on deep learning
CN112417089B (en) * 2019-08-21 2022-12-09 东北大学秦皇岛分校 Highly parallel reading comprehension method based on deep learning
CN110516213A (en) * 2019-09-03 2019-11-29 哈尔滨工业大学 Structured data text generation method based on hierarchical table modeling
CN110516213B (en) * 2019-09-03 2022-04-15 哈尔滨工业大学 Structured data text generation method based on hierarchical table modeling
CN113488036A (en) * 2020-06-10 2021-10-08 海信集团有限公司 Multi-turn voice interaction method, terminal and server
CN111797218A (en) * 2020-07-07 2020-10-20 海南中智信信息技术有限公司 Open domain dialogue generation method based on Cycle-Seq2Seq
CN111797218B (en) * 2020-07-07 2022-03-29 海南中智信信息技术有限公司 Open domain dialogue generation method based on Cycle-Seq2Seq
US11245648B1 (en) 2020-07-31 2022-02-08 International Business Machines Corporation Cognitive management of context switching for multiple-round dialogues
CN114357129A (en) * 2021-12-07 2022-04-15 华南理工大学 High-concurrency multi-turn chatbot system and data processing method thereof
CN114357129B (en) * 2021-12-07 2023-02-14 华南理工大学 High-concurrency multi-turn chatbot system and data processing method thereof

Similar Documents

Publication Publication Date Title
CN107766506A (en) A multi-turn dialogue model construction method based on a hierarchical attention mechanism
Nan et al. Improving factual consistency of abstractive summarization via question answering
CN109460463B (en) Model training method, device, terminal and storage medium based on data processing
CN109271493A (en) Language text processing method, device and storage medium
CN109547331A (en) Multi-turn voice chat model construction method
CN107423274A (en) Commentary content generating method, device and storage medium based on artificial intelligence
CN109791549A (en) Dialogue-oriented machine-customer interaction
CN109964223A (en) Session information processing method and its device, storage medium
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN109284502B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN110427461A (en) Intelligent answer information processing method, electronic equipment and computer readable storage medium
CN107679225B (en) Reply generation method based on keywords
CN111672098A (en) Virtual object marking method and device, electronic equipment and storage medium
CN108960574A (en) Question-and-answer quality determination method, device, server and storage medium
CN108345612A (en) Question processing method and device, and device for question processing
CN110457661A (en) Natural language generation method, apparatus, equipment and storage medium
CN113761156A (en) Data processing method, device and medium for man-machine interaction conversation and electronic equipment
CN110427454A (en) Text sentiment analysis method and device, electronic equipment and non-transient storage medium
CN116956116A (en) Text processing method and device, storage medium and electronic equipment
Jhan et al. CheerBots: Chatbots toward empathy and emotion using reinforcement learning
CN107729983A (en) Method, apparatus and electronic equipment for realizing human-machine chess using robot vision
CN116258147A (en) Multimodal comment sentiment analysis method and system based on heterogeneous graph convolution
CN110491372A (en) Feedback information generation method, device, storage medium and smart device
CN116975214A (en) Text generation method, device, storage medium and computer equipment
CN113392640B (en) Title determination method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination