CN106126596B - Question answering method based on a hierarchical memory network
Info
- Publication number
- CN106126596B CN106126596B CN201610447676.4A CN201610447676A CN106126596B CN 106126596 B CN106126596 B CN 106126596B CN 201610447676 A CN201610447676 A CN 201610447676A CN 106126596 B CN106126596 B CN 106126596B
- Authority
- CN
- China
- Prior art keywords
- word
- sentence
- granularity
- coding
- memory unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 59
- 238000013517 stratification Methods 0.000 title claims abstract description 25
- 230000007246 mechanism Effects 0.000 claims abstract description 30
- 238000005070 sampling Methods 0.000 claims abstract description 12
- 230000000638 stimulation Effects 0.000 claims abstract description 10
- 239000013598 vector Substances 0.000 claims description 39
- 230000004913 activation Effects 0.000 claims description 18
- 239000011159 matrix material Substances 0.000 claims description 16
- 238000012549 training Methods 0.000 claims description 16
- 238000003062 neural network model Methods 0.000 claims description 4
- 230000002457 bidirectional effect Effects 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000002441 reversible effect Effects 0.000 claims description 2
- 230000017105 transposition Effects 0.000 claims description 2
- 238000001994 activation Methods 0.000 description 16
- 238000012360 testing method Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 230000009286 beneficial effect Effects 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000007613 environmental effect Effects 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 230000000306 recurrent effect Effects 0.000 description 3
- 238000012216 screening Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000011478 gradient descent method Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000003252 repetitive effect Effects 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The present invention provides a question answering method based on a hierarchical memory network. Sentence-granularity memory encoding is performed first and, under the stimulus of the semantic encoding of the question, the information inference of the sentence-granularity memory units is completed through a multi-round iterative attention mechanism. Sentences are screened by k-max sampling, and word-granularity memory encoding is also performed on top of the sentence-granularity memory encoding, so that memory encoding is carried out at two levels, forming a hierarchical memory encoding. The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which improves the accuracy of automatic question answering and effectively solves the answer selection problem of low-frequency and unregistered (out-of-vocabulary) words.
Description
Technical field
The present invention relates to the technical field of automatic question answering system construction, and more specifically to an end-to-end question answering method based on a hierarchical memory network.
Background art
For a long time, automatic question answering has been one of the most challenging tasks in natural language processing: it requires a deep understanding of the text and the screening of candidate answers as the system response. Existing conventional approaches include training the individual modules of the text-processing pipeline separately and then merging their outputs, and building a large-scale structured knowledge base on which information inference and answer prediction are performed. In recent years, end-to-end systems based on deep learning have been widely applied to a variety of tasks; these methods require no manual feature construction and no independent tuning of each module.
Question answering is broadly divided into two steps: first the relevant semantic information is located (the "activation stage"), and then a response is generated on the basis of that information (the "generation stage"). Recently, neural memory network models have achieved good results on question answering tasks. Their biggest drawback, however, is that they use memory units of a single level of sentence granularity and therefore handle low-frequency or unregistered words poorly. Moreover, the dictionary is usually reduced in size to lower the time complexity of the model. Existing end-to-end neural network models then cannot reliably select a low-frequency or unregistered word as the answer output: when the target answer word lies outside the training dictionary, existing methods cannot output an accurate answer at the online testing stage. Take the following dialog text as an example:
1. Hello sir, may I have your name?
2. Oh, I am Williamson.
3. Could you tell me your passport number?
4. OK, it is 577838771.
5. And your telephone number?
6. The number is 0016178290851.
Suppose that "Williamson", "577838771" and "0016178290851" are low-frequency or unregistered words. If a conventional method discards these words or uniformly replaces them with the "unk" symbol, it cannot pick the exact user information out of the dialog text. In practical applications, however, most answer information comes from low-frequency or long-tail words, so designing an answer selection method that can effectively handle unregistered words is an urgent task in the field of automatic question answering.
Summary of the invention
(1) Technical problem to be solved

To solve the problems of the prior art, the present invention provides a question answering method based on a hierarchical memory network.
(2) Technical solution

The present invention provides a question answering method based on a hierarchical memory network, comprising: step S101: fusing the positions of words and the temporal sequence information of sentences, performing sentence-granularity memory encoding on the sentences of a sentence set to obtain the dual-channel memory encoding of the sentence-granularity memory units; step S102: under the stimulus of the question semantic encoding, completing the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension; step S103: performing k-max sampling on the information inference result of the sentence-granularity memory units, screening a k-max sampled important-sentence set out of the sentence set; step S104: performing word-granularity memory encoding on the sentence set with a bidirectional recurrent neural network model, obtaining the memory encoding of the word-granularity memory units; step S105: obtaining the word-granularity output word probability distribution through an attention mechanism, based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set; and step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with the cross entropy.
(3) Beneficial effects

It can be seen from the above technical solution that the question answering method based on a hierarchical memory network of the present invention has the following beneficial effects:
(1) The present invention first performs sentence-granularity memory encoding and, under the stimulus of the question semantic encoding, completes the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words;

(2) Sentences are screened by k-max sampling, which improves the efficiency of automatic question answering and reduces the computational complexity;

(3) Word-granularity memory encoding is also performed on top of the sentence-granularity memory encoding, i.e., memory encoding is carried out at two levels, forming a hierarchical memory encoding, which further improves the accuracy of automatic question answering;

(4) When the word-granularity memory encoding is performed with the recurrent neural network, it operates on the full sentence set X, so the contextual semantic information of each word within the full sentence set is introduced into the word-granularity memory encoding, which improves the accuracy and timeliness of automatic question answering;

(5) The attention mechanism of the word-granularity memory units operates on the k-max sampled subset of the word-granularity memory units, which avoids interference information in the memory encoding and reduces the computation of the word-granularity attention mechanism;

(6) The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and effectively solves the answer selection problem of low-frequency and unregistered words.
Description of the drawings
Fig. 1 is a flowchart of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;

Fig. 2 is a schematic frame diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the sentence-granularity memory encoding of an embodiment of the present invention and of the information inference based on the sentence-granularity memory encoding;

Fig. 4 is a schematic diagram of the word-granularity memory encoding of an embodiment of the present invention and of the attention activation based on the word-granularity memory encoding;

Fig. 5 is a first performance diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;

Fig. 6 is another performance diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
The invention discloses a question answering method based on a hierarchical memory network. Based on a fully neural end-to-end model structure, it implements information inference over the sentence set, screening, and word-granularity selection, effectively solving the answer selection problem of question answering systems for low-frequency or unregistered words under big data. The method of the invention performs two levels of hierarchical memory encoding of the sentence set carrying temporal sequence information: sentence-granularity memory encoding and word-granularity memory encoding. Information inference, screening and activation are then performed on the hierarchical memory network, and the candidate answer word probability distribution is jointly predicted.

The method of the invention first performs sentence-level vectorized memory encoding of the sentence set through the hierarchical memory network, taking into account the positions of words within sentences and the temporal sequence information of sentences within the sentence set; the information inference of the sentence-granularity memory units is then completed through the multi-round iterative attention mechanism, and k-max sampling is performed on the inference result to screen out the important sentence information. Word-granularity sequence encoding of the sentence set is then performed with a bidirectional recurrent network model, and the information activation of the word-granularity memory units over the screened information is performed through the attention mechanism. Finally, the output word probability distributions predicted from the sentence-granularity and word-granularity memory units are jointly trained under supervision through Softmax with the cross entropy, learning an end-to-end automatic question answering model.
The question answering method based on a hierarchical memory network according to the embodiments of the present invention is described in detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of the question answering method based on a hierarchical memory network according to an embodiment of the present invention. Referring to Fig. 1, the question answering method includes:

Step S101: fusing the positions of words and the temporal sequence information of sentences, performing sentence-granularity memory encoding on the sentences of the sentence set, obtaining the dual-channel memory encoding of the sentence-granularity memory units.
Referring to Fig. 3, step S101 includes:

Sub-step S101a: performing dual-channel word vector mapping on the sentences of the sentence set carrying temporal sequence information, obtaining the dual-channel word-vectorized encoding of the sentences.

Sub-step S101a includes: given the sentence set X = {x_i}, i = 1, 2, ..., n, carrying temporal sequence information, where i is the current temporal index of a sentence and n is the maximum temporal sequence length of the sentence set, randomly initializing two word vector matrices A ∈ R^(|V|×d) and C ∈ R^(|V|×d), where |V| is the dictionary dimension and d is the dimension of the word vectors; A and C each adopt a normal distribution with standard deviation 0.1 and mean 0 as the random initialization parameters. Dual-channel word vector mapping is performed on each sentence x_i of the sentence set X; the dual-channel vectorized encodings of word x_ij of sentence x_i are A·x_ij and C·x_ij, where j is the position of the word within sentence x_i.

Sub-step S101b: updating the dual-channel word-vectorized encoding according to the positions of the words within the sentence.

Sub-step S101b includes: generating an update matrix l from the position j of each word within the sentence and the dimension d of the word vectors; the updated dual-channel word vector encodings are l_gj·(A x_ij) and l_gj·(C x_ij), in which:

l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)   (1)

where J_i is the number of words of sentence x_i, g is the current dimension index of the d-dimensional word vector, and 1 ≤ j ≤ J_i and 1 ≤ g ≤ d.

Sub-step S101c: fusing the temporal sequence information of the sentences, performing sentence-granularity memory encoding on the sentences, obtaining the dual-channel memory encoding of the sentence-granularity memory units.

Sub-step S101c includes: randomly initializing two sentence-temporal-sequence vectorization matrices T_A ∈ R^(n×d) and T_C ∈ R^(n×d), where n is the maximum temporal sequence length of the sentence set and d is the dimension of the temporal vectors, identical to that of the word vectors; T_A and T_C each adopt a normal distribution with standard deviation 0.1 and mean 0 as the random initialization parameters. The dual-channel memory encoding of the sentence-granularity memory units is then M^(S) = {{a_i}, {c_i}}, in which:

a_i = Σ_j l_j·(A x_ij) + T_A(i)   (2)

c_i = Σ_j l_j·(C x_ij) + T_C(i)   (3)

where l_j is the update vector of the j-th word of sentence x_i in the update matrix l, and the operator · denotes element-wise multiplication between vectors; for example, l_j·(A x_ij) in formula (2) denotes the element-wise multiplication of the vector l_j with the vector (A x_ij).
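As an illustration of sub-steps S101a to S101c, the following minimal numpy sketch renders formulas (1) to (3); the toy vocabulary size, dimensions and sentence set are assumptions made only for the example.

```python
import numpy as np

def position_update_matrix(J, d):
    # l_gj = (1 - j/J) - (g/d) * (1 - 2j/J), formula (1); j, g are 1-indexed
    j = np.arange(1, J + 1)[:, None]          # word positions, shape (J, 1)
    g = np.arange(1, d + 1)[None, :]          # vector dimensions, shape (1, d)
    return (1 - j / J) - (g / d) * (1 - 2 * j / J)

def encode_sentences(sentences, A, C, T_A, T_C):
    """Dual-channel sentence-granularity memory encoding, formulas (2)-(3).
    sentences: list of lists of word ids; A, C: |V| x d word vector matrices;
    T_A, T_C: n x d temporal vector matrices."""
    d = A.shape[1]
    a, c = [], []
    for i, sent in enumerate(sentences):
        L = position_update_matrix(len(sent), d)       # (J_i, d) update matrix
        a.append((L * A[sent]).sum(axis=0) + T_A[i])   # a_i, formula (2)
        c.append((L * C[sent]).sum(axis=0) + T_C[i])   # c_i, formula (3)
    return np.stack(a), np.stack(c)                    # each of shape (n, d)

# toy example (all sizes are illustrative)
rng = np.random.default_rng(0)
V, d, n = 50, 8, 3                                     # vocab, dim, sentences
A, C = rng.normal(0, 0.1, (V, d)), rng.normal(0, 0.1, (V, d))
T_A, T_C = rng.normal(0, 0.1, (n, d)), rng.normal(0, 0.1, (n, d))
sentences = [[1, 4, 7], [2, 3], [9, 10, 11, 12]]
a, c = encode_sentences(sentences, A, C, T_A, T_C)
print(a.shape, c.shape)                                # (3, 8) (3, 8)
```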
Step S102: under the stimulus of the question semantic encoding, completing the information inference of the sentence-granularity memory units through the multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension.
Step S102 includes:
Sub-step S102a: vectorizing the question text, obtaining the semantic encoding of the question.

Sub-step S102a includes: using the word vector matrix A to give the j-th word q_j of the question text q the vectorized representation A·q_j, and updating the vectorized representation according to the position j of the word within the question text, obtaining the question semantic encoding:

u = Σ_j l_j·(A q_j)   (4)

where, as in formulas (2) and (3), l_j is the update vector of the j-th word in the update matrix l.
Sub-step S102b: under the stimulus of the question semantic encoding, performing information activation on the dual-channel memory encoding of the sentence-granularity memory units with the attention mechanism.

Sub-step S102b includes: computing the attention weights of the question semantic encoding over the sentence-granularity memory units by dot product:

α_i^(S) = Softmax(u^T a_i)   (5)

Then, under the stimulus of the question semantic encoding, the activated information of the dual-channel memory encoding of the sentence-granularity memory units is:

o = Σ_i α_i^(S) c_i   (6)
Sub-step S102c: completing the information inference over the sentence-granularity memory units through the multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension.

Sub-step S102c includes: performing R rounds of information activation over the sentence-granularity memory units to find the candidate sentence set, obtaining the activated information o^R of round R, where in round r+1 of information activation,

u^(r+1) = u^r + o^r   (7)

where 1 ≤ r ≤ (R-1); round r+1 of information activation uses its own word vector matrices A^(r+1) and C^(r+1) and temporal vector matrices T_A^(r+1) and T_C^(r+1) to vectorize the sentence set, with A^(r+1) = C^r and T_A^(r+1) = T_C^r; C^r and T_C^r each adopt a normal distribution with standard deviation 0.1 and mean 0 as the random initialization parameters.

The information inference over the sentence-granularity memory units is completed by the attention mechanism of R rounds of iteration, and the output word probability distribution of the sentence-granularity memory units over the dictionary dimension is obtained as:

p̂^(S) = Softmax((u^R + o^R)^T C^R w)   (8)

where w is the word set of the dictionary dimension, C^R is the word vector matrix of round R of information activation, and T is the transposition operator.
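Continuing the sketch above, the multi-round inference of step S102 can be rendered as follows; the helper functions come from the previous sketch, and laying the per-round matrices out as Python lists with the adjacent tying A^(r+1) = C^r is an assumption of the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def question_encoding(q_ids, A):
    # u = sum_j l_j * (A q_j), formula (4)
    d = A.shape[1]
    L = position_update_matrix(len(q_ids), d)
    return (L * A[q_ids]).sum(axis=0)

def multi_hop_inference(sentences, q_ids, embeds, temporals):
    """R rounds of attention over the sentence memories, formulas (5)-(8).
    embeds[r] = (A^(r+1), C^(r+1)); temporals[r] = (T_A^(r+1), T_C^(r+1))."""
    u = question_encoding(q_ids, embeds[0][0])
    for (A_r, C_r), (TA_r, TC_r) in zip(embeds, temporals):
        a, c = encode_sentences(sentences, A_r, C_r, TA_r, TC_r)
        alpha = softmax(a @ u)        # attention weights alpha^(S), formula (5)
        o = alpha @ c                 # activated information o, formula (6)
        u = u + o                     # u^(r+1) = u^r + o^r, formula (7)
    C_R = embeds[-1][1]
    p_sent = softmax(C_R @ u)         # distribution over the dictionary, formula (8)
    return p_sent, alpha              # alpha is the round-R weight vector
```

The round-R attention weights alpha are kept because step S103 samples the important sentences from them.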
The present invention first performs sentence-granularity memory encoding and, under the stimulus of the question semantic encoding, completes the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words.
Step S103: performing k-max sampling on the information inference result of the sentence-granularity memory units, screening the k-max sampled important-sentence set out of the sentence set.
Step S103 includes:
Sub-step S103a: from the attention weight vector α^(S) = {α_i^(S)}, i = 1, 2, ..., n, of round R of information activation over the sentence-granularity memory units, choosing the subset of its k largest attention weights; and

Sub-step S103b: taking the sentence set corresponding to the k largest attention weights as the k-max sampled important-sentence set X̃ = {x̃_i}, i = 1, 2, ..., k; the sentences x̃_i of the important-sentence set are the important sentences.
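A short sketch of the k-max sampling of step S103, applied to the round-R attention weights returned by the inference sketch above; keeping the selected sentences in their original temporal order is an assumption of the example.

```python
import numpy as np

def k_max_sampling(alpha, sentences, k):
    """Select the k sentences carrying the largest attention weights."""
    top = np.sort(np.argsort(alpha)[::-1][:k])   # k largest, original order kept
    return top, [sentences[i] for i in top]

# usage: idx, important = k_max_sampling(alpha, sentences, k=1)
```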
The k-max sampling of the present invention screens the sentences, which improves the efficiency of automatic question answering, reduces the computational complexity, and further benefits the answer selection of low-frequency and unregistered words.
Step S104: performing word-granularity memory encoding on the sentence set with the bidirectional recurrent neural network model, obtaining the memory encoding of the word-granularity memory units.
Referring to Fig. 4, step S104 includes:

Sub-step S104a: encoding the words of the sentence set in temporal order with a bidirectional recurrent network model, obtaining the hidden states of the bidirectional recurrent network model. Many bidirectional recurrent network models exist; this embodiment uses one of them, the gated recurrent unit network (GRU).

Sub-step S104a includes: using GRU networks to encode all the words {w_t}, t = 1, 2, ..., |t|, of the sentence set X in temporal order, forward and backward respectively. For the word feature of moment t, the hidden state of the forward GRU encoding is

h_t^f = GRU(C^R w_t, h_(t-1)^f)   (9)

and the hidden state of the backward GRU encoding is

h_t^b = GRU(C^R w_t, h_(t+1)^b)   (10)

where |t| is the maximum word sequence length after all the words of the sentence set X are arranged in temporal order, the dimensions of h_t^f and h_t^b are identical to the dimension d of the word vectors, and C^R is the word vector matrix of round R of the information activation process of the sentence-granularity memory units.

Sub-step S104b: fusing the hidden states of the bidirectional recurrent network model, obtaining the memory encoding of the word-granularity memory units.

Sub-step S104b includes: directly adding the hidden states of the bidirectional recurrent network model, obtaining the memory encoding of the word-granularity memory units M^(W) = {m_t}, t = 1, 2, ..., |t|, where

m_t = h_t^f + h_t^b   (11)
When the present invention performs the word-granularity memory encoding with the recurrent neural network, it operates on the full sentence set X, so the contextual semantic information of each word within the full sentence set is introduced into the word-granularity memory encoding, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words.
Step S105: obtaining the word-granularity output word probability distribution through the attention mechanism, based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set.
Step S105 includes:
Sub-step S105a: computing the attention weights of the word-granularity memory units from the question semantic encoding and the memory encoding of the word-granularity memory units.

Sub-step S105a includes: based on the question semantic encoding u^R of round R of the information activation process of the sentence-granularity memory units, the memory encoding M^(W) = {m_t}, t = 1, 2, ..., |t|, of the word-granularity memory units and the k-max sampled important-sentence set X̃, obtaining the normalized attention weight vector of the word-granularity memory units:

α_t^(W) = Softmax(v^T tanh(W m̃_t + U u^R))   (12)

where {m̃_t} is the subset of the word-granularity memory encoding M^(W) = {m_t}, t = 1, 2, ..., |t|, corresponding to the word set w̃ of the k-max sampled important-sentence set X̃; the dimension of the attention weight vector α^(W) equals the maximum word sequence length |t̃| obtained after all the words w̃_t of the important-sentence set X̃ are arranged in temporal order, i.e., α^(W) ∈ R^(|t̃|); v, W and U are learning parameters, each initialized randomly from a normal distribution with standard deviation 0.1 and mean 0 and updated during the training stage.
Sub-step S105b: obtaining the word-granularity output word probability distribution from the attention weights of the word-granularity memory units. In the embodiment of the present invention, the normalized attention weight vector α^(W) of the word-granularity memory units is directly adopted as the word-granularity output word probability distribution:

p̂^(W) = α^(W)   (13)

The word-granularity output word probability distribution is thus consistent in dimension with the attention weights, i.e., p̂^(W) ∈ R^(|t̃|), where w̃ is the set of all the words of the important-sentence set X̃.
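Formulas (12) and (13) can be sketched directly; treating v as a d-vector and W, U as d x d matrices is an assumption about the parameter shapes.

```python
import numpy as np

def word_attention_distribution(M_sub, u_R, v, W, U):
    """alpha_t^(W) = Softmax(v^T tanh(W m_t + U u^R)) over the words of the
    k-max sampled sentences, formula (12); the word-granularity output word
    probability distribution is this weight vector itself, formula (13)."""
    scores = np.array([v @ np.tanh(W @ m_t + U @ u_R) for m_t in M_sub])
    e = np.exp(scores - scores.max())
    return e / e.sum()                 # p_hat^(W) over the sub-word-set w~
```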
The present invention also performs word-granularity memory encoding on top of the sentence-granularity memory encoding, i.e., memory encoding at two levels, forming a hierarchical memory encoding, which further improves the accuracy of automatic question answering and is more beneficial to the answer selection of low-frequency and unregistered words. Meanwhile, the attention mechanism of the word-granularity memory units operates on the k-max sampled subset of the word-granularity memory units, which avoids interference information in the memory encoding and reduces the computation of the word-granularity attention mechanism.
Step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with the cross entropy.
Step S106 includes:
Sub-step S106a: performing joint output-word prediction based on the output word probability distribution of the sentence-granularity memory units over the dictionary dimension and the word-granularity output word probability distribution, obtaining the jointly predicted output word distribution p(w), in which trans(·) denotes mapping the word-granularity output word probability distribution p̂^(W) of the subset w̃ to a word-granularity output word probability distribution over the dictionary dimension. This mapping specifically means that each probability value of the output word probability distribution p̂^(W) is mapped according to the position of the corresponding word of the subset w̃ within the dictionary-dimension word set w; if a word of the full set does not occur in the subset, its output probability is set to 0, yielding the mapped output word probability distribution trans(p̂^(W)).
Sub-step S106b: performing cross-entropy supervised training of the jointly predicted output word distribution with the target answer word distribution. Given the target answer word distribution y of the training set, joint optimization is carried out over the cross-entropy function between the target answer word distribution y and the jointly predicted output word distribution p(w).
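A sketch of sub-steps S106a and S106b; the trans() mapping follows the description above, but combining the two distributions by addition and renormalisation, and accumulating the mass of repeated words, are assumptions of the example (the patent states only that the two distributions are jointly predicted).

```python
import numpy as np

def trans(p_word, sub_word_ids, V):
    """Map the word-granularity distribution over the sub-word-set onto the
    full dictionary dimension; words absent from the subset get probability 0."""
    full = np.zeros(V)
    for prob, wid in zip(p_word, sub_word_ids):
        full[wid] += prob              # repeated words accumulate (assumption)
    return full

def joint_prediction(p_sent, p_word, sub_word_ids):
    # additive combination and renormalisation: an illustrative assumption
    p = p_sent + trans(p_word, sub_word_ids, len(p_sent))
    return p / p.sum()

def cross_entropy(p, target_id, eps=1e-12):
    # supervised training signal of sub-step S106b
    return -np.log(p[target_id] + eps)
```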
In an exemplary embodiment of the present invention, the objective function of the joint optimization is optimized by error backpropagation with stochastic gradient descent. The optimized parameters include the word vector matrices {A^r}, r = 1, 2, ..., R, and {C^r}, r = 1, 2, ..., R, and the temporal vector matrices {T_A^r} and {T_C^r} of the sentence-granularity memory units, all the parameter sets {θ_GRU} of the bidirectional GRU model employed in the word-granularity memory encoding process, and the parameters v, W and U used to compute the attention weights of the word-granularity memory units (formula (12)).
The present invention jointly predicts the output word probability distribution from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and is more beneficial to the answer selection of low-frequency and unregistered words.
Fig. 2 is a schematic frame diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention.

Referring to Fig. 2, the question answering method based on a hierarchical memory network has two levels of memory network units:

Memory unit one: the sentence set with its temporal sequence undergoes sentence-granularity memory encoding;

Memory unit two: all the words of the sentence set undergo word-granularity memory encoding according to their temporal sequence.

Between the two memory unit levels, k-max sampling is used for important-information screening and filtering.

The model has two information activation mechanisms in the information processing stage:

Activation mechanism one: information activation with the inference mechanism in the sentence-granularity memory units;

Activation mechanism two: word selection with the attention mechanism in the word-granularity memory units.

The whole model training stage is guided by two supervision signals:

Supervision signal one: the fitting information to the target word after the output vector of the information inference of the sentence-granularity memory units is decoded with Softmax;

Supervision signal two: the fitting information to the target word after the attention-mechanism activation and Softmax output of the word-granularity memory units.

To evaluate the automatic question answering response performance of the method of the present invention accurately, the performance of the method is compared by counting the error samples, i.e., the samples for which the answer word selected by a model is inconsistent with the true answer word of the data.
Table 1
Domain | Train/test QA pairs | Dictionary size (all/train/test) | Unregistered target words (percentage)
---|---|---|---
Airplane ticket booking | 7,000/7,000 | 10,682/5,612/5,618 | 5,070 (72.43%)
The experiments of the present invention use a Chinese text data set from the airplane ticket booking domain, which contains 2,000 full dialog histories and 14,000 question-answer pairs in total and is split 5:5 into a training set and a test set. The present invention applies no preprocessing to these text data (such as stop-word removal or stemming). The specific statistics of the data set are shown in Table 1; it can be seen that unregistered target words account for 72.43% of the test set, which has a considerable impact on the training of conventional models.
The following baseline methods are used in the experiments of the present invention:

Baseline method one: a pointer network model based on the attention mechanism; this method treats all the words of the sentence set, arranged in temporal order, as one long sentence to be encoded, and generates the answer directly with the attention mechanism between the question and the word encodings;

Baseline method two: a neural memory network model; this method performs sentence-granularity encoding of the sentence set, performs semantic activation with the question encoding vector, and matches the answer directly over the full dictionary space.
The parameter settings used in the experiments of the present invention are shown in Table 2:
Table 2
n | d | R | k | lr | bs
---|---|---|---|---|---
16 | 100 | 3 | 1 | 0.01 | 10
In Table 2, the parameter n is the maximum sentence temporal sequence length of the sentence set of the experimental data, d is the word vector dimension and the hidden-layer encoding dimension, R is the number of iterations of the inference mechanism in the sentence-granularity memory units, k is the maximum number of samples taken between the memory levels, lr is the learning rate of the stochastic-gradient-descent model parameter optimization, and bs is the number of samples per batch during model training.
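The hyperparameters of Table 2 can be collected into a configuration as below; the plain SGD step is a skeleton consistent with the stochastic gradient descent stated above, and the parameter and gradient dictionaries are assumptions of the example.

```python
# Hyperparameters from Table 2
config = dict(n=16, d=100, R=3, k=1, lr=0.01, bs=10, epochs=15)

def sgd_update(params, grads, lr=config["lr"]):
    """One plain stochastic-gradient-descent step over all learnable
    parameters ({A^r}, {C^r}, {T_A^r}, {T_C^r}, theta_GRU, v, W, U)."""
    for name in params:
        params[name] -= lr * grads[name]
```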
In the experiments of the present invention, 15 rounds of iterative training are carried out and all methods converge, as shown in Fig. 5. The experimental results after final convergence are shown in Table 3:
Table 3
Method | Error sample count
---|---
Baseline method one | 109
Baseline method two | 56
Method of the present invention | 0
Fig. 5 and Table 3 give the error-sample-count evaluation results of the method of the present invention, baseline method one and baseline method two on the data set. The experimental results show that the convergence speed of the method of the present invention is clearly superior to that of the other methods. According to the final convergence results of Table 3, the method of the present invention is also substantially better than the other methods: it fully solves the answer selection problem over the unregistered word set, reaching 100% accuracy.
Meanwhile the maximum hits k of experimental verification of the present invention information sifting between level memory unit asks answer selection
The performance of error sample number influences in topic, and experimental result is as shown in Fig. 6 and table 4.It can be seen that when maximum hits is 1, this
The convergence rate of inventive method performance and final convergence result can reach optimal, further illustrate and carry out letter between level memory unit
Cease the importance of selection.
Table 4
Maximum number of samples | Error sample count
---|---
3 | 5
2 | 4
1 | 0
So far, the embodiments of the present invention have been described in detail with reference to the drawings. From the above description, those skilled in the art should have a clear understanding of the question answering method based on a hierarchical memory network of the present invention.

The question answering method based on a hierarchical memory network of the present invention first performs sentence-granularity memory encoding and, under the stimulus of the question semantic encoding, completes the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words. Sentences are screened by k-max sampling, which improves the efficiency of automatic question answering and reduces the computational complexity. Word-granularity memory encoding is also performed on top of the sentence-granularity memory encoding, i.e., memory encoding is carried out at two levels, forming a hierarchical memory encoding, which further improves the accuracy of automatic question answering. When the word-granularity memory encoding is performed with the recurrent neural network, it operates on the full sentence set X, introducing the contextual semantic information of each word within the full sentence set into the word-granularity memory encoding and improving the accuracy and timeliness of automatic question answering. The attention mechanism of the word-granularity memory units operates on the k-max sampled subset of the word-granularity memory units, avoiding interference information in the memory encoding and reducing the computation of the word-granularity attention mechanism. The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and effectively solves the answer selection problem of low-frequency and unregistered words.
It should be noted that implementations not shown or described in the drawings or in the text of the specification are forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements are not limited to the specific ways mentioned in the embodiments, and those of ordinary skill in the art may simply change or replace them; for example:

(1) the direction terms mentioned in the embodiments, such as "upper", "lower", "front", "rear", "left" and "right", only refer to the directions of the drawings and are not intended to limit the protection scope of the present invention;

(2) the above embodiments may be mixed and matched with one another or with other embodiments based on considerations of design and reliability, i.e., the technical features of different embodiments may be freely combined into further embodiments.

The purposes, technical solutions and beneficial effects of the present invention have been described in detail in the specific embodiments above. It should be understood that the above are only specific embodiments of the present invention and are not intended to restrict it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A question answering method based on a hierarchical memory network, characterized by comprising:

Step S101: fusing the positions of words and the temporal sequence information of sentences, performing sentence-granularity memory encoding on the sentences of a sentence set to obtain the dual-channel memory encoding of sentence-granularity memory units;

Step S102: under the stimulus of a question semantic encoding, completing the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension;

Step S103: performing k-max sampling on the information inference result of the sentence-granularity memory units, screening a k-max sampled important-sentence set out of the sentence set;

Step S104: performing word-granularity memory encoding on the sentence set with a bidirectional recurrent neural network model, obtaining the memory encoding of word-granularity memory units;

Step S105: obtaining the word-granularity output word probability distribution through an attention mechanism, based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set; and

Step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with the cross entropy.
2. The question answering method according to claim 1, characterized in that step S101 includes:

Sub-step S101a: given a sentence set X = {x_i}, i = 1, 2, ..., n, carrying temporal sequence information, randomly initializing word vector matrices A ∈ R^(|V|×d) and C ∈ R^(|V|×d); the dual-channel vectorized encodings of word x_ij of sentence x_i are A·x_ij and C·x_ij, where i is the current temporal index of the sentence, n is the maximum temporal sequence length of the sentence set, |V| is the dictionary dimension, d is the dimension of the word vectors, and j is the position of the word within sentence x_i;

Sub-step S101b: updating the dual-channel word-vectorized encoding according to the positions of the words within the sentence; and

Sub-step S101c: fusing the temporal sequence information of the sentences, performing sentence-granularity memory encoding on the sentences to obtain the dual-channel memory encoding of the sentence-granularity memory units.
3. The question answering method according to claim 2, characterized in that sub-step S101b includes:

the updated dual-channel word vector encodings are l_gj·(A x_ij) and l_gj·(C x_ij), where

l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)   (1)

and J_i is the number of words of sentence x_i, g is the current dimension index of the d-dimensional word vector, and 1 ≤ j ≤ J_i and 1 ≤ g ≤ d.
4. The question answering method according to claim 3, characterized in that sub-step S101c includes:

randomly initializing the sentence temporal vector matrices T_A ∈ R^(n×d) and T_C ∈ R^(n×d); the dual-channel memory encoding of the sentence-granularity memory units is then M^(S) = {{a_i}, {c_i}}, where

a_i = Σ_j l_j·(A x_ij) + T_A(i)   (2)

c_i = Σ_j l_j·(C x_ij) + T_C(i)   (3)

and l_j is the update vector of the j-th word of sentence x_i in the update matrix l, the operator · denotes element-wise multiplication between vectors, n is the maximum temporal sequence length of the sentence set, and d is the dimension of the temporal vectors, identical to that of the word vectors.
5. The question answering method according to claim 4, characterized in that step S102 includes:

Sub-step S102a: using the word vector matrix A to give the j-th word q_j of the question text q the vectorized representation A·q_j, obtaining the question semantic encoding u = Σ_j l_j·(A q_j), where l_j is the update vector of the j-th word in the update matrix l;

Sub-step S102b: computing the attention weights α_i^(S) = Softmax(u^T a_i) of the question semantic encoding over the sentence-granularity memory units; under the stimulus of the question semantic encoding, the activated information of the dual-channel memory encoding of the sentence-granularity memory units is o = Σ_i α_i^(S) c_i; and

Sub-step S102c: completing the information inference over the sentence-granularity memory units through the multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension.
6. The question answering method according to claim 5, characterized in that sub-step S102c includes:

performing R rounds of information activation over the sentence-granularity memory units, obtaining the activated information o^R of round R, where in round r+1 of information activation,

u^(r+1) = u^r + o^r

where 1 ≤ r ≤ (R-1), 2 ≤ R, A^(r+1) = C^r and T_A^(r+1) = T_C^r;

the output word probability distribution of the sentence-granularity memory units over the dictionary dimension is:

p̂^(S) = Softmax((u^R + o^R)^T C^R w)

where w = {w_e}, e = 1, 2, ..., |V|, is the word set of the dictionary of dimension |V|, w_e is the e-th word of the dictionary V, C^R is the word vector matrix of round R of information activation, and T is the transposition operator.
7. The question answering method according to claim 6, characterized in that step S103 includes:

Sub-step S103a: from the attention weight vector α^(S) = {α_i^(S)}, i = 1, 2, ..., n, of round R of information activation over the sentence-granularity memory units, choosing the subset of its k largest attention weights, 1 ≤ k ≤ n; and

Sub-step S103b: taking the sentence set corresponding to the k largest attention weights as the k-max sampled important-sentence set X̃ = {x̃_i}, i = 1, 2, ..., k, where α̃_i^(S) is the i-th largest attention weight chosen from α^(S) and x̃_i is the corresponding i-th important sentence of the sentence set X.
8. The question answering method according to claim 7, characterized in that step S104 includes:

Sub-step S104a: using gated recurrent unit (GRU) networks to encode all the words {w_t}, t = 1, 2, ..., num, of the sentence set X in temporal order, forward and backward respectively; for the t-th word, the hidden state of the forward GRU encoding is h_t^f = GRU(C^R w_t, h_(t-1)^f) and the hidden state of the backward GRU encoding is h_t^b = GRU(C^R w_t, h_(t+1)^b), where num is the maximum word sequence length after all the words of the sentence set X are arranged in temporal order, the dimensions of h_t^f and h_t^b are identical to the dimension d of the word vectors, and w_t is the t-th word of the sentence set X; and

Sub-step S104b: obtaining the memory encoding of the word-granularity memory units M^(W) = {m_t}, t = 1, 2, ..., num, where m_t = h_t^f + h_t^b.
9. The question answering method according to claim 8, characterized in that step S105 includes:

Sub-step S105a: computing the normalized attention weight vector of the word-granularity memory units:

α_t^(W) = Softmax(v^T tanh(W m̃_t + U u^R))

where {m̃_t} is the subset of the word-granularity memory encoding corresponding to the word set w̃ of the k-max sampled important-sentence set X̃, the dimension of the attention weight vector α^(W) is |t̃|, v, W and U are learning parameters, and m̃_t is the word-granularity memory encoding corresponding to the t-th word w̃_t of the sentence set X̃; and

Sub-step S105b: the word-granularity output word probability distribution p̂^(W) is:

p̂^(W) = α^(W)

where p̂^(W) ∈ R^(|t̃|) and w̃ is the set of all the words of the important-sentence set X̃.
10. The question answering method according to claim 9, characterized in that step S106 includes:

Sub-step S106a: performing joint output-word prediction based on the output word probability distribution of the sentence-granularity memory units over the dictionary dimension and the word-granularity output word probability distribution, obtaining the jointly predicted output word distribution p(w), in which trans(·) denotes mapping the word-granularity output word probability distribution p̂^(W) of the subset w̃ to a word-granularity output word probability distribution over the dictionary dimension; and

Sub-step S106b: performing cross-entropy supervised training of the jointly predicted output word distribution with the target answer word distribution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610447676.4A CN106126596B (en) | 2016-06-20 | 2016-06-20 | Question answering method based on a hierarchical memory network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610447676.4A CN106126596B (en) | 2016-06-20 | 2016-06-20 | Question answering method based on a hierarchical memory network
Publications (2)
Publication Number | Publication Date |
---|---|
CN106126596A CN106126596A (en) | 2016-11-16 |
CN106126596B true CN106126596B (en) | 2019-08-23 |
Family
ID=57470348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610447676.4A Active CN106126596B (en) | 2016-06-20 | 2016-06-20 | Question answering method based on a hierarchical memory network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106126596B (en) |
Families Citing this family (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778014B (en) * | 2016-12-29 | 2020-06-16 | 浙江大学 | Disease risk prediction modeling method based on recurrent neural network |
CN106776578B (en) * | 2017-01-03 | 2020-03-17 | 竹间智能科技(上海)有限公司 | Method and device for improving conversation performance of conversation system |
CN108388561B (en) | 2017-02-03 | 2022-02-25 | 百度在线网络技术(北京)有限公司 | Neural network machine translation method and device |
CN107273487A (en) | 2017-06-13 | 2017-10-20 | 北京百度网讯科技有限公司 | Generation method, device and the computer equipment of chat data based on artificial intelligence |
CN109388706A (en) * | 2017-08-10 | 2019-02-26 | 华东师范大学 | A kind of problem fine grit classification method, system and device |
CN107491541B (en) * | 2017-08-24 | 2021-03-02 | 北京丁牛科技有限公司 | Text classification method and device |
CN107844533A (en) * | 2017-10-19 | 2018-03-27 | 云南大学 | A kind of intelligent Answer System and analysis method |
CN107766506A (en) * | 2017-10-20 | 2018-03-06 | 哈尔滨工业大学 | A kind of more wheel dialog model construction methods based on stratification notice mechanism |
CN107818306B (en) * | 2017-10-31 | 2020-08-07 | 天津大学 | Video question-answering method based on attention model |
CN110019719B (en) * | 2017-12-15 | 2023-04-25 | 微软技术许可有限责任公司 | Assertion-based question and answer |
CN108108428B (en) * | 2017-12-18 | 2020-05-12 | 苏州思必驰信息科技有限公司 | Method, input method and system for constructing language model |
CN108417210B (en) * | 2018-01-10 | 2020-06-26 | 苏州思必驰信息科技有限公司 | Word embedding language model training method, word recognition method and system |
CN108628935B (en) * | 2018-03-19 | 2021-10-15 | 中国科学院大学 | Question-answering method based on end-to-end memory network |
CN108549850B (en) * | 2018-03-27 | 2021-07-16 | 联想(北京)有限公司 | Image identification method and electronic equipment |
US10431210B1 (en) * | 2018-04-16 | 2019-10-01 | International Business Machines Corporation | Implementing a whole sentence recurrent neural network language model for natural language processing |
US10770066B2 (en) * | 2018-05-31 | 2020-09-08 | Robert Bosch Gmbh | Slot filling in spoken language understanding with joint pointer and attention |
CN108763567A (en) * | 2018-06-05 | 2018-11-06 | 北京玄科技有限公司 | Method of Knowledge Reasoning and device applied to intelligent robot interaction |
CN108959246B (en) * | 2018-06-12 | 2022-07-12 | 北京慧闻科技(集团)有限公司 | Answer selection method and device based on improved attention mechanism and electronic equipment |
CN110866403B (en) * | 2018-08-13 | 2021-06-08 | 中国科学院声学研究所 | End-to-end conversation state tracking method and system based on convolution cycle entity network |
CN109033463B (en) * | 2018-08-28 | 2021-11-26 | 广东工业大学 | Community question-answer content recommendation method based on end-to-end memory network |
CN109558487A (en) * | 2018-11-06 | 2019-04-02 | 华南师范大学 | Document Classification Method based on the more attention networks of hierarchy |
CN109840322B (en) * | 2018-11-08 | 2023-06-20 | 中山大学 | Complete shape filling type reading understanding analysis model and method based on reinforcement learning |
CN109658270A (en) * | 2018-12-19 | 2019-04-19 | 前海企保科技(深圳)有限公司 | It is a kind of to read the core compensation system and method understood based on insurance products |
CN109597884B (en) * | 2018-12-28 | 2021-07-20 | 北京百度网讯科技有限公司 | Dialog generation method, device, storage medium and terminal equipment |
CN109829631B (en) * | 2019-01-14 | 2020-10-09 | 北京中兴通网络科技股份有限公司 | Enterprise risk early warning analysis method and system based on memory network |
CN110147532B (en) * | 2019-01-24 | 2023-08-25 | 腾讯科技(深圳)有限公司 | Encoding method, apparatus, device and storage medium |
CN109977428B (en) * | 2019-03-29 | 2024-04-02 | 北京金山数字娱乐科技有限公司 | Answer obtaining method and device |
CN109992657B (en) * | 2019-04-03 | 2021-03-30 | 浙江大学 | Dialogue type problem generation method based on enhanced dynamic reasoning |
CN110134771B (en) * | 2019-04-09 | 2022-03-04 | 广东工业大学 | Implementation method of multi-attention-machine-based fusion network question-answering system |
CN110046244B (en) * | 2019-04-24 | 2021-06-08 | 中国人民解放军国防科技大学 | Answer selection method for question-answering system |
CN110334195A (en) * | 2019-06-26 | 2019-10-15 | 北京科技大学 | A kind of answering method and system based on local attention mechanism memory network |
CN110348462B (en) * | 2019-07-09 | 2022-03-04 | 北京金山数字娱乐科技有限公司 | Image feature determination and visual question and answer method, device, equipment and medium |
CN111047482B (en) * | 2019-11-14 | 2023-07-04 | 华中师范大学 | Knowledge tracking system and method based on hierarchical memory network |
CN111291803B (en) * | 2020-01-21 | 2022-07-29 | 中国科学技术大学 | Image grading granularity migration method, system, equipment and medium |
CN111310848B (en) * | 2020-02-28 | 2022-06-28 | 支付宝(杭州)信息技术有限公司 | Training method and device for multi-task model |
CN112732879B (en) * | 2020-12-23 | 2022-05-10 | 重庆理工大学 | Downstream task processing method and model of question-answering task |
CN113704437B (en) * | 2021-09-03 | 2023-08-11 | 重庆邮电大学 | Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
CN105159890A (en) * | 2014-06-06 | 2015-12-16 | 谷歌公司 | Generating representations of input sequences using neural networks |
2016
- 2016-06-20 CN CN201610447676.4A patent/CN106126596B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159890A (en) * | 2014-06-06 | 2015-12-16 | 谷歌公司 | Generating representations of input sequences using neural networks |
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
Non-Patent Citations (2)
Title |
---|
End-To-End Memory Networks; Sainbayar Sukhbaatar et al.; arXiv:1503.08895v5; 2015-11-24; full text *
Hierarchical Memory Networks; Sarath Chandar et al.; arXiv:1605.07427v1; 2016-05-24; full text *
Also Published As
Publication number | Publication date |
---|---|
CN106126596A (en) | 2016-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106126596B (en) | Question answering method based on a hierarchical memory network | |
CN113544703B (en) | Efficient off-policy credit allocation | |
CN107220296B (en) | Method for generating question-answer knowledge base, method and equipment for training neural network | |
CN111897941B (en) | Dialogue generation method, network training method, device, storage medium and equipment | |
CN105574098B (en) | The generation method and device of knowledge mapping, entity control methods and device | |
CN110473592B (en) | Multi-view human synthetic lethal gene prediction method | |
CN108549658A (en) | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree | |
US10824946B2 (en) | Training neural networks using posterior sharpening | |
CN112699960A (en) | Semi-supervised classification method and equipment based on deep learning and storage medium | |
CN111581966A (en) | Context feature fusion aspect level emotion classification method and device | |
Iqbal et al. | Extracting and using building blocks of knowledge in learning classifier systems | |
Zhou et al. | ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge | |
Li et al. | Educational data mining for students' performance based on fuzzy C‐means clustering | |
CN110858805A (en) | Method and device for predicting network traffic of cell | |
LU501881B1 (en) | A METHOD AND SYSTEM FOR PREDICTING MIRNA DISEASE ASSOCIATIONS BASED ON HETEROGENOUS GRAPHS | |
Tam Cho et al. | An optimization approach for making causal inferences | |
KR20220098698A (en) | Learning content recommendation system that predicts the user's correct answer probability using collaborative filtering based on latent factors and operation method thereof | |
Klüver | A mathematical theory of communication: Meaning, information, and topology | |
CN107092593B (en) | Sentence semantic role recognition method and system for elementary mathematics hierarchical sampling application questions | |
CN110705279A (en) | Vocabulary selection method and device and computer readable storage medium | |
CN115604131B (en) | Link flow prediction method, system, electronic device and medium | |
Zhang et al. | Design of online learning early warning model based on artificial intelligence | |
CN112417267A (en) | User behavior analysis method and device, computer equipment and storage medium | |
Olabiyi et al. | Adversarial bootstrapping for dialogue model training | |
Ziegler et al. | Modelling word recognition and reading aloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |