CN106126596B - Question answering method based on a hierarchical memory network
Info
- Publication number
- CN106126596B CN106126596B CN201610447676.4A CN201610447676A CN106126596B CN 106126596 B CN106126596 B CN 106126596B CN 201610447676 A CN201610447676 A CN 201610447676A CN 106126596 B CN106126596 B CN 106126596B
- Authority
- CN
- China
- Prior art keywords
- word
- sentence
- granularity
- coding
- memory unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 59
- 238000013517 stratification Methods 0.000 title claims abstract description 25
- 230000007246 mechanism Effects 0.000 claims abstract description 30
- 238000005070 sampling Methods 0.000 claims abstract description 12
- 230000000638 stimulation Effects 0.000 claims abstract description 10
- 239000013598 vector Substances 0.000 claims description 39
- 230000004913 activation Effects 0.000 claims description 18
- 239000011159 matrix material Substances 0.000 claims description 16
- 238000012549 training Methods 0.000 claims description 16
- 238000003062 neural network model Methods 0.000 claims description 4
- 230000002457 bidirectional effect Effects 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000002441 reversible effect Effects 0.000 claims description 2
- 230000017105 transposition Effects 0.000 claims description 2
- 238000001994 activation Methods 0.000 description 16
- 238000012360 testing method Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 230000009286 beneficial effect Effects 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000007613 environmental effect Effects 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 230000000306 recurrent effect Effects 0.000 description 3
- 238000012216 screening Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000011478 gradient descent method Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000003252 repetitive effect Effects 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The present invention provides a question answering method based on a hierarchical memory network. Sentence-granularity memory encoding is performed first and, under the stimulus of the semantic encoding of the question, the information inference of the sentence-granularity memory units is completed through a multi-round iterative attention mechanism. Sentences are screened by k-max sampling, and word-granularity memory encoding is also performed on top of the sentence-granularity memory encoding, so that memory encoding is carried out at two levels, forming a hierarchical memory encoding. The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which improves the accuracy of automatic question answering and effectively solves the answer selection problem of low-frequency and unregistered (out-of-vocabulary) words.
Description
Technical field
The present invention relates to the technical field of automatic question answering system construction, and more specifically to an end-to-end question answering method based on a hierarchical memory network.
Background art
For a long time, automatic question answering has been one of the most challenging tasks in natural language processing: it requires a deep understanding of the text and the screening of candidate answers as the system response. Existing conventional approaches include training the individual modules of the text-processing pipeline separately and then merging their outputs, and building a large-scale structured knowledge base on which information inference and answer prediction are performed. In recent years, end-to-end systems based on deep learning have been widely applied to a variety of tasks; these methods require no manual feature construction and no independent tuning of each module.
Question answering is broadly divided into two steps: first the relevant semantic information is located (the "activation stage"), and then a response is generated on the basis of that information (the "generation stage"). Recently, neural memory network models have achieved good results on question answering tasks. Their biggest drawback, however, is that they use memory units of a single level of sentence granularity and therefore handle low-frequency or unregistered words poorly. Moreover, the dictionary is usually reduced in size to lower the time complexity of the model. Existing end-to-end neural network models then cannot reliably select a low-frequency or unregistered word as the answer output: when the target answer word lies outside the training dictionary, existing methods cannot output an accurate answer at the online testing stage. Take the following dialog text as an example:
1. Hello sir, may I have your name?
2. Oh, I am Williamson.
3. Could you tell me your passport number?
4. OK, it is 577838771.
5. And your telephone number?
6. The number is 0016178290851.
Suppose that "Williamson", "577838771" and "0016178290851" are low-frequency or unregistered words. If a conventional method discards these words or uniformly replaces them with the "unk" symbol, it cannot pick the exact user information out of the dialog text. In practical applications, however, most answer information comes from low-frequency or long-tail words, so designing an answer selection method that can effectively handle unregistered words is an urgent task in the field of automatic question answering.
Summary of the invention
(1) Technical problem to be solved

To solve the problems of the prior art, the present invention provides a question answering method based on a hierarchical memory network.
(2) Technical solution

The present invention provides a question answering method based on a hierarchical memory network, comprising: step S101: fusing the positions of words and the temporal sequence information of sentences, performing sentence-granularity memory encoding on the sentences of a sentence set to obtain the dual-channel memory encoding of the sentence-granularity memory units; step S102: under the stimulus of the question semantic encoding, completing the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension; step S103: performing k-max sampling on the information inference result of the sentence-granularity memory units, screening a k-max sampled important-sentence set out of the sentence set; step S104: performing word-granularity memory encoding on the sentence set with a bidirectional recurrent neural network model, obtaining the memory encoding of the word-granularity memory units; step S105: obtaining the word-granularity output word probability distribution through an attention mechanism, based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set; and step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with the cross entropy.
(3) Beneficial effects

It can be seen from the above technical solution that the question answering method based on a hierarchical memory network of the present invention has the following beneficial effects:
(1) The present invention first performs sentence-granularity memory encoding and, under the stimulus of the question semantic encoding, completes the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words;

(2) Sentences are screened by k-max sampling, which improves the efficiency of automatic question answering and reduces the computational complexity;

(3) Word-granularity memory encoding is also performed on top of the sentence-granularity memory encoding, i.e., memory encoding is carried out at two levels, forming a hierarchical memory encoding, which further improves the accuracy of automatic question answering;

(4) When the word-granularity memory encoding is performed with the recurrent neural network, it operates on the full sentence set X, so the contextual semantic information of each word within the full sentence set is introduced into the word-granularity memory encoding, which improves the accuracy and timeliness of automatic question answering;

(5) The attention mechanism of the word-granularity memory units operates on the k-max sampled subset of the word-granularity memory units, which avoids interference information in the memory encoding and reduces the computation of the word-granularity attention mechanism;

(6) The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and effectively solves the answer selection problem of low-frequency and unregistered words.
Description of the drawings
Fig. 1 is a flowchart of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;

Fig. 2 is a schematic frame diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the sentence-granularity memory encoding of an embodiment of the present invention and of the information inference based on the sentence-granularity memory encoding;

Fig. 4 is a schematic diagram of the word-granularity memory encoding of an embodiment of the present invention and of the attention activation based on the word-granularity memory encoding;

Fig. 5 is a first performance diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;

Fig. 6 is another performance diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
The invention discloses a question answering method based on a hierarchical memory network. Based on a fully neural end-to-end model structure, it implements information inference over the sentence set, screening, and word-granularity selection, effectively solving the answer selection problem of question answering systems for low-frequency or unregistered words under big data. The method of the invention performs two levels of hierarchical memory encoding of the sentence set carrying temporal sequence information: sentence-granularity memory encoding and word-granularity memory encoding. Information inference, screening and activation are then performed on the hierarchical memory network, and the candidate answer word probability distribution is jointly predicted.

The method of the invention first performs sentence-level vectorized memory encoding of the sentence set through the hierarchical memory network, taking into account the positions of words within sentences and the temporal sequence information of sentences within the sentence set; the information inference of the sentence-granularity memory units is then completed through the multi-round iterative attention mechanism, and k-max sampling is performed on the inference result to screen out the important sentence information. Word-granularity sequence encoding of the sentence set is then performed with a bidirectional recurrent network model, and the information activation of the word-granularity memory units over the screened information is performed through the attention mechanism. Finally, the output word probability distributions predicted from the sentence-granularity and word-granularity memory units are jointly trained under supervision through Softmax with the cross entropy, learning an end-to-end automatic question answering model.
The question answering method based on a hierarchical memory network according to the embodiments of the present invention is described in detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of the question answering method based on a hierarchical memory network according to an embodiment of the present invention. Referring to Fig. 1, the question answering method includes:

Step S101: fusing the positions of words and the temporal sequence information of sentences, performing sentence-granularity memory encoding on the sentences of the sentence set, obtaining the dual-channel memory encoding of the sentence-granularity memory units.
Referring to Fig. 3, step S101 includes:

Sub-step S101a: performing dual-channel word vector mapping on the sentences of the sentence set carrying temporal sequence information, obtaining the dual-channel word-vectorized encoding of the sentences.

Sub-step S101a includes: given the sentence set X = {x_i}, i = 1, 2, ..., n, carrying temporal sequence information, where i is the current temporal index of a sentence and n is the maximum temporal sequence length of the sentence set, randomly initializing two word vector matrices A ∈ R^(|V|×d) and C ∈ R^(|V|×d), where |V| is the dictionary dimension and d is the dimension of the word vectors; A and C each adopt a normal distribution with standard deviation 0.1 and mean 0 as the random initialization parameters. Dual-channel word vector mapping is performed on each sentence x_i of the sentence set X; the dual-channel vectorized encodings of word x_ij of sentence x_i are A·x_ij and C·x_ij, where j is the position of the word within sentence x_i.

Sub-step S101b: updating the dual-channel word-vectorized encoding according to the positions of the words within the sentence.

Sub-step S101b includes: generating an update matrix l from the position j of each word within the sentence and the dimension d of the word vectors; the updated dual-channel word vector encodings are l_gj·(A x_ij) and l_gj·(C x_ij), in which:

l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)   (1)

where J_i is the number of words of sentence x_i, g is the current dimension index of the d-dimensional word vector, and 1 ≤ j ≤ J_i and 1 ≤ g ≤ d.

Sub-step S101c: fusing the temporal sequence information of the sentences, performing sentence-granularity memory encoding on the sentences, obtaining the dual-channel memory encoding of the sentence-granularity memory units.

Sub-step S101c includes: randomly initializing two sentence-temporal-sequence vectorization matrices T_A ∈ R^(n×d) and T_C ∈ R^(n×d), where n is the maximum temporal sequence length of the sentence set and d is the dimension of the temporal vectors, identical to that of the word vectors; T_A and T_C each adopt a normal distribution with standard deviation 0.1 and mean 0 as the random initialization parameters. The dual-channel memory encoding of the sentence-granularity memory units is then M^(S) = {{a_i}, {c_i}}, in which:

a_i = Σ_j l_j·(A x_ij) + T_A(i)   (2)

c_i = Σ_j l_j·(C x_ij) + T_C(i)   (3)

where l_j is the update vector of the j-th word of sentence x_i in the update matrix l, and the operator · denotes element-wise multiplication between vectors; for example, l_j·(A x_ij) in formula (2) denotes the element-wise multiplication of the vector l_j with the vector (A x_ij).
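As an illustration of sub-steps S101a to S101c, the following minimal numpy sketch renders formulas (1) to (3); the toy vocabulary size, dimensions and sentence set are assumptions made only for the example.

```python
import numpy as np

def position_update_matrix(J, d):
    # l_gj = (1 - j/J) - (g/d) * (1 - 2j/J), formula (1); j, g are 1-indexed
    j = np.arange(1, J + 1)[:, None]          # word positions, shape (J, 1)
    g = np.arange(1, d + 1)[None, :]          # vector dimensions, shape (1, d)
    return (1 - j / J) - (g / d) * (1 - 2 * j / J)

def encode_sentences(sentences, A, C, T_A, T_C):
    """Dual-channel sentence-granularity memory encoding, formulas (2)-(3).
    sentences: list of lists of word ids; A, C: |V| x d word vector matrices;
    T_A, T_C: n x d temporal vector matrices."""
    d = A.shape[1]
    a, c = [], []
    for i, sent in enumerate(sentences):
        L = position_update_matrix(len(sent), d)       # (J_i, d) update matrix
        a.append((L * A[sent]).sum(axis=0) + T_A[i])   # a_i, formula (2)
        c.append((L * C[sent]).sum(axis=0) + T_C[i])   # c_i, formula (3)
    return np.stack(a), np.stack(c)                    # each of shape (n, d)

# toy example (all sizes are illustrative)
rng = np.random.default_rng(0)
V, d, n = 50, 8, 3                                     # vocab, dim, sentences
A, C = rng.normal(0, 0.1, (V, d)), rng.normal(0, 0.1, (V, d))
T_A, T_C = rng.normal(0, 0.1, (n, d)), rng.normal(0, 0.1, (n, d))
sentences = [[1, 4, 7], [2, 3], [9, 10, 11, 12]]
a, c = encode_sentences(sentences, A, C, T_A, T_C)
print(a.shape, c.shape)                                # (3, 8) (3, 8)
```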
Step S102: under the stimulus of the question semantic encoding, completing the information inference of the sentence-granularity memory units through the multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension.
Step S102 includes:
Sub-step S102a: vectorizing the question text, obtaining the semantic encoding of the question.

Sub-step S102a includes: using the word vector matrix A to give the j-th word q_j of the question text q the vectorized representation A·q_j, and updating the vectorized representation according to the position j of the word within the question text, obtaining the question semantic encoding:

u = Σ_j l_j·(A q_j)   (4)

where, as in formulas (2) and (3), l_j is the update vector of the j-th word in the update matrix l.
Sub-step S102b: under the stimulus of the question semantic encoding, performing information activation on the dual-channel memory encoding of the sentence-granularity memory units with the attention mechanism.

Sub-step S102b includes: computing the attention weights of the question semantic encoding over the sentence-granularity memory units by dot product:

α_i^(S) = Softmax(u^T a_i)   (5)

Then, under the stimulus of the question semantic encoding, the activated information of the dual-channel memory encoding of the sentence-granularity memory units is:

o = Σ_i α_i^(S) c_i   (6)
Sub-step S102c: completing the information inference over the sentence-granularity memory units through the multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension.

Sub-step S102c includes: performing R rounds of information activation over the sentence-granularity memory units to find the candidate sentence set, obtaining the activated information o^R of round R, where in round r+1 of information activation,

u^(r+1) = u^r + o^r   (7)

where 1 ≤ r ≤ (R-1); round r+1 of information activation uses its own word vector matrices A^(r+1) and C^(r+1) and temporal vector matrices T_A^(r+1) and T_C^(r+1) to vectorize the sentence set, with A^(r+1) = C^r and T_A^(r+1) = T_C^r; C^r and T_C^r each adopt a normal distribution with standard deviation 0.1 and mean 0 as the random initialization parameters.

The information inference over the sentence-granularity memory units is completed by the attention mechanism of R rounds of iteration, and the output word probability distribution of the sentence-granularity memory units over the dictionary dimension is obtained as:

p̂^(S) = Softmax((u^R + o^R)^T C^R w)   (8)

where w is the word set of the dictionary dimension, C^R is the word vector matrix of round R of information activation, and T is the transposition operator.
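Continuing the sketch above, the multi-round inference of step S102 can be rendered as follows; the helper functions come from the previous sketch, and laying the per-round matrices out as Python lists with the adjacent tying A^(r+1) = C^r is an assumption of the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def question_encoding(q_ids, A):
    # u = sum_j l_j * (A q_j), formula (4)
    d = A.shape[1]
    L = position_update_matrix(len(q_ids), d)
    return (L * A[q_ids]).sum(axis=0)

def multi_hop_inference(sentences, q_ids, embeds, temporals):
    """R rounds of attention over the sentence memories, formulas (5)-(8).
    embeds[r] = (A^(r+1), C^(r+1)); temporals[r] = (T_A^(r+1), T_C^(r+1))."""
    u = question_encoding(q_ids, embeds[0][0])
    for (A_r, C_r), (TA_r, TC_r) in zip(embeds, temporals):
        a, c = encode_sentences(sentences, A_r, C_r, TA_r, TC_r)
        alpha = softmax(a @ u)        # attention weights alpha^(S), formula (5)
        o = alpha @ c                 # activated information o, formula (6)
        u = u + o                     # u^(r+1) = u^r + o^r, formula (7)
    C_R = embeds[-1][1]
    p_sent = softmax(C_R @ u)         # distribution over the dictionary, formula (8)
    return p_sent, alpha              # alpha is the round-R weight vector
```

The round-R attention weights alpha are kept because step S103 samples the important sentences from them.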
The present invention first performs sentence-granularity memory encoding and, under the stimulus of the question semantic encoding, completes the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words.
Step S103: performing k-max sampling on the information inference result of the sentence-granularity memory units, screening the k-max sampled important-sentence set out of the sentence set.
Step S103 includes:
Sub-step S103a: from the attention weight vector α^(S) = {α_i^(S)}, i = 1, 2, ..., n, of round R of information activation over the sentence-granularity memory units, choosing the subset of its k largest attention weights; and

Sub-step S103b: taking the sentence set corresponding to the k largest attention weights as the k-max sampled important-sentence set X̃ = {x̃_i}, i = 1, 2, ..., k; the sentences x̃_i of the important-sentence set are the important sentences.
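A short sketch of the k-max sampling of step S103, applied to the round-R attention weights returned by the inference sketch above; keeping the selected sentences in their original temporal order is an assumption of the example.

```python
import numpy as np

def k_max_sampling(alpha, sentences, k):
    """Select the k sentences carrying the largest attention weights."""
    top = np.sort(np.argsort(alpha)[::-1][:k])   # k largest, original order kept
    return top, [sentences[i] for i in top]

# usage: idx, important = k_max_sampling(alpha, sentences, k=1)
```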
The k-max sampling of the present invention screens the sentences, which improves the efficiency of automatic question answering, reduces the computational complexity, and further benefits the answer selection of low-frequency and unregistered words.
Step S104: performing word-granularity memory encoding on the sentence set with the bidirectional recurrent neural network model, obtaining the memory encoding of the word-granularity memory units.
Referring to Fig. 4, step S104 includes:

Sub-step S104a: encoding the words of the sentence set in temporal order with a bidirectional recurrent network model, obtaining the hidden states of the bidirectional recurrent network model. Many bidirectional recurrent network models exist; this embodiment uses one of them, the gated recurrent unit network (GRU).

Sub-step S104a includes: using GRU networks to encode all the words {w_t}, t = 1, 2, ..., |t|, of the sentence set X in temporal order, forward and backward respectively. For the word feature of moment t, the hidden state of the forward GRU encoding is

h_t^f = GRU(C^R w_t, h_(t-1)^f)   (9)

and the hidden state of the backward GRU encoding is

h_t^b = GRU(C^R w_t, h_(t+1)^b)   (10)

where |t| is the maximum word sequence length after all the words of the sentence set X are arranged in temporal order, the dimensions of h_t^f and h_t^b are identical to the dimension d of the word vectors, and C^R is the word vector matrix of round R of the information activation process of the sentence-granularity memory units.

Sub-step S104b: fusing the hidden states of the bidirectional recurrent network model, obtaining the memory encoding of the word-granularity memory units.

Sub-step S104b includes: directly adding the hidden states of the bidirectional recurrent network model, obtaining the memory encoding of the word-granularity memory units M^(W) = {m_t}, t = 1, 2, ..., |t|, where

m_t = h_t^f + h_t^b   (11)
When the present invention performs the word-granularity memory encoding with the recurrent neural network, it operates on the full sentence set X, so the contextual semantic information of each word within the full sentence set is introduced into the word-granularity memory encoding, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words.
Step S105: obtaining the word-granularity output word probability distribution through the attention mechanism, based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set.
Step S105 includes:
Sub-step S105a: computing the attention weights of the word-granularity memory units from the question semantic encoding and the memory encoding of the word-granularity memory units.

Sub-step S105a includes: based on the question semantic encoding u^R of round R of the information activation process of the sentence-granularity memory units, the memory encoding M^(W) = {m_t}, t = 1, 2, ..., |t|, of the word-granularity memory units and the k-max sampled important-sentence set X̃, obtaining the normalized attention weight vector of the word-granularity memory units:

α_t^(W) = Softmax(v^T tanh(W m̃_t + U u^R))   (12)

where {m̃_t} is the subset of the word-granularity memory encoding M^(W) = {m_t}, t = 1, 2, ..., |t|, corresponding to the word set w̃ of the k-max sampled important-sentence set X̃; the dimension of the attention weight vector α^(W) equals the maximum word sequence length |t̃| obtained after all the words w̃_t of the important-sentence set X̃ are arranged in temporal order, i.e., α^(W) ∈ R^(|t̃|); v, W and U are learning parameters, each initialized randomly from a normal distribution with standard deviation 0.1 and mean 0 and updated during the training stage.
Sub-step S105b: obtaining the word-granularity output word probability distribution from the attention weights of the word-granularity memory units. In the embodiment of the present invention, the normalized attention weight vector α^(W) of the word-granularity memory units is directly adopted as the word-granularity output word probability distribution:

p̂^(W) = α^(W)   (13)

The word-granularity output word probability distribution is thus consistent in dimension with the attention weights, i.e., p̂^(W) ∈ R^(|t̃|), where w̃ is the set of all the words of the important-sentence set X̃.
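Formulas (12) and (13) can be sketched directly; treating v as a d-vector and W, U as d x d matrices is an assumption about the parameter shapes.

```python
import numpy as np

def word_attention_distribution(M_sub, u_R, v, W, U):
    """alpha_t^(W) = Softmax(v^T tanh(W m_t + U u^R)) over the words of the
    k-max sampled sentences, formula (12); the word-granularity output word
    probability distribution is this weight vector itself, formula (13)."""
    scores = np.array([v @ np.tanh(W @ m_t + U @ u_R) for m_t in M_sub])
    e = np.exp(scores - scores.max())
    return e / e.sum()                 # p_hat^(W) over the sub-word-set w~
```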
The present invention also performs word-granularity memory encoding on top of the sentence-granularity memory encoding, i.e., memory encoding at two levels, forming a hierarchical memory encoding, which further improves the accuracy of automatic question answering and is more beneficial to the answer selection of low-frequency and unregistered words. Meanwhile, the attention mechanism of the word-granularity memory units operates on the k-max sampled subset of the word-granularity memory units, which avoids interference information in the memory encoding and reduces the computation of the word-granularity attention mechanism.
Step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with the cross entropy.
Step S106 includes:
Sub-step S106a: performing joint output-word prediction based on the output word probability distribution of the sentence-granularity memory units over the dictionary dimension and the word-granularity output word probability distribution, obtaining the jointly predicted output word distribution p(w), in which trans(·) denotes mapping the word-granularity output word probability distribution p̂^(W) of the subset w̃ to a word-granularity output word probability distribution over the dictionary dimension. This mapping specifically means that each probability value of the output word probability distribution p̂^(W) is mapped according to the position of the corresponding word of the subset w̃ within the dictionary-dimension word set w; if a word of the full set does not occur in the subset, its output probability is set to 0, yielding the mapped output word probability distribution trans(p̂^(W)).
Sub-step S106b: performing cross-entropy supervised training of the jointly predicted output word distribution with the target answer word distribution. Given the target answer word distribution y of the training set, joint optimization is carried out over the cross-entropy function between the target answer word distribution y and the jointly predicted output word distribution p(w).
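A sketch of sub-steps S106a and S106b; the trans() mapping follows the description above, but combining the two distributions by addition and renormalisation, and accumulating the mass of repeated words, are assumptions of the example (the patent states only that the two distributions are jointly predicted).

```python
import numpy as np

def trans(p_word, sub_word_ids, V):
    """Map the word-granularity distribution over the sub-word-set onto the
    full dictionary dimension; words absent from the subset get probability 0."""
    full = np.zeros(V)
    for prob, wid in zip(p_word, sub_word_ids):
        full[wid] += prob              # repeated words accumulate (assumption)
    return full

def joint_prediction(p_sent, p_word, sub_word_ids):
    # additive combination and renormalisation: an illustrative assumption
    p = p_sent + trans(p_word, sub_word_ids, len(p_sent))
    return p / p.sum()

def cross_entropy(p, target_id, eps=1e-12):
    # supervised training signal of sub-step S106b
    return -np.log(p[target_id] + eps)
```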
In an exemplary embodiment of the present invention, the objective function of the joint optimization is optimized by error backpropagation with stochastic gradient descent. The optimized parameters include the word vector matrices {A^r}, r = 1, 2, ..., R, and {C^r}, r = 1, 2, ..., R, and the temporal vector matrices {T_A^r} and {T_C^r} of the sentence-granularity memory units, all the parameter sets {θ_GRU} of the bidirectional GRU model employed in the word-granularity memory encoding process, and the parameters v, W and U used to compute the attention weights of the word-granularity memory units (formula (12)).
The present invention jointly predicts the output word probability distribution from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and is more beneficial to the answer selection of low-frequency and unregistered words.
Fig. 2 is a schematic frame diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention.

Referring to Fig. 2, the question answering method based on a hierarchical memory network has two levels of memory network units:

Memory unit one: the sentence set with its temporal sequence undergoes sentence-granularity memory encoding;

Memory unit two: all the words of the sentence set undergo word-granularity memory encoding according to their temporal sequence.

Between the two memory unit levels, k-max sampling is used for important-information screening and filtering.

The model has two information activation mechanisms in the information processing stage:

Activation mechanism one: information activation with the inference mechanism in the sentence-granularity memory units;

Activation mechanism two: word selection with the attention mechanism in the word-granularity memory units.

The whole model training stage is guided by two supervision signals:

Supervision signal one: the fitting information to the target word after the output vector of the information inference of the sentence-granularity memory units is decoded with Softmax;

Supervision signal two: the fitting information to the target word after the attention-mechanism activation and Softmax output of the word-granularity memory units.

To evaluate the automatic question answering response performance of the method of the present invention accurately, the performance of the method is compared by counting the error samples, i.e., the samples for which the answer word selected by a model is inconsistent with the true answer word of the data.
Table 1
Domain | Train/test QA pairs | Dictionary size (all/train/test) | Unregistered target words (percentage)
---|---|---|---
Airplane ticket booking | 7,000/7,000 | 10,682/5,612/5,618 | 5,070 (72.43%)
The experiments of the present invention use a Chinese text data set from the airplane ticket booking domain, which contains 2,000 full dialog histories and 14,000 question-answer pairs in total and is split 5:5 into a training set and a test set. The present invention applies no preprocessing to these text data (such as stop-word removal or stemming). The specific statistics of the data set are shown in Table 1; it can be seen that unregistered target words account for 72.43% of the test set, which has a considerable impact on the training of conventional models.
The following baseline methods are used in the experiments of the present invention:

Baseline method one: a pointer network model based on the attention mechanism; this method treats all the words of the sentence set, arranged in temporal order, as one long sentence to be encoded, and generates the answer directly with the attention mechanism between the question and the word encodings;

Baseline method two: a neural memory network model; this method performs sentence-granularity encoding of the sentence set, performs semantic activation with the question encoding vector, and matches the answer directly over the full dictionary space.
The parameter settings used in the experiments of the present invention are shown in Table 2:
Table 2
n | d | R | k | lr | bs
---|---|---|---|---|---
16 | 100 | 3 | 1 | 0.01 | 10
In Table 2, the parameter n is the maximum sentence temporal sequence length of the sentence set of the experimental data, d is the word vector dimension and the hidden-layer encoding dimension, R is the number of iterations of the inference mechanism in the sentence-granularity memory units, k is the maximum number of samples taken between the memory levels, lr is the learning rate of the stochastic-gradient-descent model parameter optimization, and bs is the number of samples per batch during model training.
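The hyperparameters of Table 2 can be collected into a configuration as below; the plain SGD step is a skeleton consistent with the stochastic gradient descent stated above, and the parameter and gradient dictionaries are assumptions of the example.

```python
# Hyperparameters from Table 2
config = dict(n=16, d=100, R=3, k=1, lr=0.01, bs=10, epochs=15)

def sgd_update(params, grads, lr=config["lr"]):
    """One plain stochastic-gradient-descent step over all learnable
    parameters ({A^r}, {C^r}, {T_A^r}, {T_C^r}, theta_GRU, v, W, U)."""
    for name in params:
        params[name] -= lr * grads[name]
```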
In the experiments of the present invention, 15 rounds of iterative training are carried out and all methods converge, as shown in Fig. 5. The experimental results after final convergence are shown in Table 3:
Table 3
Method | Error sample count
---|---
Baseline method one | 109
Baseline method two | 56
Method of the present invention | 0
Fig. 5 and Table 3 give the error-sample-count evaluation results of the method of the present invention, baseline method one and baseline method two on the data set. The experimental results show that the convergence speed of the method of the present invention is clearly superior to that of the other methods. According to the final convergence results of Table 3, the method of the present invention is also substantially better than the other methods: it fully solves the answer selection problem over the unregistered word set, reaching 100% accuracy.
Meanwhile the maximum hits k of experimental verification of the present invention information sifting between level memory unit asks answer selection
The performance of error sample number influences in topic, and experimental result is as shown in Fig. 6 and table 4.It can be seen that when maximum hits is 1, this
The convergence rate of inventive method performance and final convergence result can reach optimal, further illustrate and carry out letter between level memory unit
Cease the importance of selection.
Table 4
Maximum number of samples | Error sample count
---|---
3 | 5
2 | 4
1 | 0
So far, the embodiments of the present invention have been described in detail with reference to the drawings. From the above description, those skilled in the art should have a clear understanding of the question answering method based on a hierarchical memory network of the present invention.

The question answering method based on a hierarchical memory network of the present invention first performs sentence-granularity memory encoding and, under the stimulus of the question semantic encoding, completes the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits the answer selection of low-frequency and unregistered words. Sentences are screened by k-max sampling, which improves the efficiency of automatic question answering and reduces the computational complexity. Word-granularity memory encoding is also performed on top of the sentence-granularity memory encoding, i.e., memory encoding is carried out at two levels, forming a hierarchical memory encoding, which further improves the accuracy of automatic question answering. When the word-granularity memory encoding is performed with the recurrent neural network, it operates on the full sentence set X, introducing the contextual semantic information of each word within the full sentence set into the word-granularity memory encoding and improving the accuracy and timeliness of automatic question answering. The attention mechanism of the word-granularity memory units operates on the k-max sampled subset of the word-granularity memory units, avoiding interference information in the memory encoding and reducing the computation of the word-granularity attention mechanism. The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and effectively solves the answer selection problem of low-frequency and unregistered words.
It should be noted that implementations not shown or described in the drawings or in the text of the specification are forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements are not limited to the specific ways mentioned in the embodiments, and those of ordinary skill in the art may simply change or replace them; for example:

(1) the direction terms mentioned in the embodiments, such as "upper", "lower", "front", "rear", "left" and "right", only refer to the directions of the drawings and are not intended to limit the protection scope of the present invention;

(2) the above embodiments may be mixed and matched with one another or with other embodiments based on considerations of design and reliability, i.e., the technical features of different embodiments may be freely combined into further embodiments.

The purposes, technical solutions and beneficial effects of the present invention have been described in detail in the specific embodiments above. It should be understood that the above are only specific embodiments of the present invention and are not intended to restrict it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A question answering method based on a hierarchical memory network, characterized by comprising:

Step S101: fusing the positions of words and the temporal sequence information of sentences, performing sentence-granularity memory encoding on the sentences of a sentence set to obtain the dual-channel memory encoding of sentence-granularity memory units;

Step S102: under the stimulus of a question semantic encoding, completing the information inference of the sentence-granularity memory units through a multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension;

Step S103: performing k-max sampling on the information inference result of the sentence-granularity memory units, screening a k-max sampled important-sentence set out of the sentence set;

Step S104: performing word-granularity memory encoding on the sentence set with a bidirectional recurrent neural network model, obtaining the memory encoding of word-granularity memory units;

Step S105: obtaining the word-granularity output word probability distribution through an attention mechanism, based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set; and

Step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with the cross entropy.
2. The question answering method according to claim 1, characterized in that step S101 includes:

Sub-step S101a: given a sentence set X = {x_i}, i = 1, 2, ..., n, carrying temporal sequence information, randomly initializing word vector matrices A ∈ R^(|V|×d) and C ∈ R^(|V|×d); the dual-channel vectorized encodings of word x_ij of sentence x_i are A·x_ij and C·x_ij, where i is the current temporal index of the sentence, n is the maximum temporal sequence length of the sentence set, |V| is the dictionary dimension, d is the dimension of the word vectors, and j is the position of the word within sentence x_i;

Sub-step S101b: updating the dual-channel word-vectorized encoding according to the positions of the words within the sentence; and

Sub-step S101c: fusing the temporal sequence information of the sentences, performing sentence-granularity memory encoding on the sentences to obtain the dual-channel memory encoding of the sentence-granularity memory units.
3. The question answering method according to claim 2, characterized in that sub-step S101b includes:

the updated dual-channel word vector encodings are l_gj·(A x_ij) and l_gj·(C x_ij), where

l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)   (1)

and J_i is the number of words of sentence x_i, g is the current dimension index of the d-dimensional word vector, and 1 ≤ j ≤ J_i and 1 ≤ g ≤ d.
4. The question answering method according to claim 3, characterized in that sub-step S101c includes:

randomly initializing the sentence temporal vector matrices T_A ∈ R^(n×d) and T_C ∈ R^(n×d); the dual-channel memory encoding of the sentence-granularity memory units is then M^(S) = {{a_i}, {c_i}}, where

a_i = Σ_j l_j·(A x_ij) + T_A(i)   (2)

c_i = Σ_j l_j·(C x_ij) + T_C(i)   (3)

and l_j is the update vector of the j-th word of sentence x_i in the update matrix l, the operator · denotes element-wise multiplication between vectors, n is the maximum temporal sequence length of the sentence set, and d is the dimension of the temporal vectors, identical to that of the word vectors.
5. The question answering method according to claim 4, characterized in that step S102 includes:

Sub-step S102a: using the word vector matrix A to give the j-th word q_j of the question text q the vectorized representation A·q_j, obtaining the question semantic encoding u = Σ_j l_j·(A q_j), where l_j is the update vector of the j-th word in the update matrix l;

Sub-step S102b: computing the attention weights α_i^(S) = Softmax(u^T a_i) of the question semantic encoding over the sentence-granularity memory units; under the stimulus of the question semantic encoding, the activated information of the dual-channel memory encoding of the sentence-granularity memory units is o = Σ_i α_i^(S) c_i; and

Sub-step S102c: completing the information inference over the sentence-granularity memory units through the multi-round iterative attention mechanism, obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension.
6. The question answering method according to claim 5, characterized in that sub-step S102c includes:

performing R rounds of information activation over the sentence-granularity memory units, obtaining the activated information o^R of round R, where in round r+1 of information activation,

u^(r+1) = u^r + o^r

where 1 ≤ r ≤ (R-1), 2 ≤ R, A^(r+1) = C^r and T_A^(r+1) = T_C^r;

the output word probability distribution of the sentence-granularity memory units over the dictionary dimension is:

p̂^(S) = Softmax((u^R + o^R)^T C^R w)

where w = {w_e}, e = 1, 2, ..., |V|, is the word set of the dictionary of dimension |V|, w_e is the e-th word of the dictionary V, C^R is the word vector matrix of round R of information activation, and T is the transposition operator.
7. The question answering method according to claim 6, characterized in that step S103 includes:

Sub-step S103a: from the attention weight vector α^(S) = {α_i^(S)}, i = 1, 2, ..., n, of round R of information activation over the sentence-granularity memory units, choosing the subset of its k largest attention weights, 1 ≤ k ≤ n; and

Sub-step S103b: taking the sentence set corresponding to the k largest attention weights as the k-max sampled important-sentence set X̃ = {x̃_i}, i = 1, 2, ..., k, where α̃_i^(S) is the i-th largest attention weight chosen from α^(S) and x̃_i is the corresponding i-th important sentence of the sentence set X.
8. The question answering method according to claim 7, characterized in that step S104 includes:

Sub-step S104a: using gated recurrent unit (GRU) networks to encode all the words {w_t}, t = 1, 2, ..., num, of the sentence set X in temporal order, forward and backward respectively; for the t-th word, the hidden state of the forward GRU encoding is h_t^f = GRU(C^R w_t, h_(t-1)^f) and the hidden state of the backward GRU encoding is h_t^b = GRU(C^R w_t, h_(t+1)^b), where num is the maximum word sequence length after all the words of the sentence set X are arranged in temporal order, the dimensions of h_t^f and h_t^b are identical to the dimension d of the word vectors, and w_t is the t-th word of the sentence set X; and

Sub-step S104b: obtaining the memory encoding of the word-granularity memory units M^(W) = {m_t}, t = 1, 2, ..., num, where m_t = h_t^f + h_t^b.
9. The question answering method according to claim 8, characterized in that step S105 includes:

Sub-step S105a: computing the normalized attention weight vector of the word-granularity memory units:

α_t^(W) = Softmax(v^T tanh(W m̃_t + U u^R))

where {m̃_t} is the subset of the word-granularity memory encoding corresponding to the word set w̃ of the k-max sampled important-sentence set X̃, the dimension of the attention weight vector α^(W) is |t̃|, v, W and U are learning parameters, and m̃_t is the word-granularity memory encoding corresponding to the t-th word w̃_t of the sentence set X̃; and

Sub-step S105b: the word-granularity output word probability distribution p̂^(W) is:

p̂^(W) = α^(W)

where p̂^(W) ∈ R^(|t̃|) and w̃ is the set of all the words of the important-sentence set X̃.
10. The question answering method according to claim 9, characterized in that step S106 includes:

Sub-step S106a: performing joint output-word prediction based on the output word probability distribution of the sentence-granularity memory units over the dictionary dimension and the word-granularity output word probability distribution, obtaining the jointly predicted output word distribution p(w), in which trans(·) denotes mapping the word-granularity output word probability distribution p̂^(W) of the subset w̃ to a word-granularity output word probability distribution over the dictionary dimension; and

Sub-step S106b: performing cross-entropy supervised training of the jointly predicted output word distribution with the target answer word distribution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610447676.4A CN106126596B (en) | 2016-06-20 | 2016-06-20 | Question answering method based on a hierarchical memory network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610447676.4A CN106126596B (en) | 2016-06-20 | 2016-06-20 | Question answering method based on a hierarchical memory network
Publications (2)
Publication Number | Publication Date |
---|---|
CN106126596A CN106126596A (en) | 2016-11-16 |
CN106126596B true CN106126596B (en) | 2019-08-23 |
Family
ID=57470348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610447676.4A Active CN106126596B (en) | 2016-06-20 | 2016-06-20 | Question answering method based on a hierarchical memory network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106126596B (en) |
Families Citing this family (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778014B (en) * | 2016-12-29 | 2020-06-16 | 浙江大学 | Disease risk prediction modeling method based on recurrent neural network |
CN106776578B (en) * | 2017-01-03 | 2020-03-17 | 竹间智能科技(上海)有限公司 | Method and device for improving conversation performance of conversation system |
CN108388561B (en) | 2017-02-03 | 2022-02-25 | 百度在线网络技术(北京)有限公司 | Neural network machine translation method and device |
CN107273487A (en) | 2017-06-13 | 2017-10-20 | 北京百度网讯科技有限公司 | Generation method, device and the computer equipment of chat data based on artificial intelligence |
CN109388706A (en) * | 2017-08-10 | 2019-02-26 | 华东师范大学 | A kind of problem fine grit classification method, system and device |
CN107491541B (en) * | 2017-08-24 | 2021-03-02 | 北京丁牛科技有限公司 | Text classification method and device |
CN107844533A (en) * | 2017-10-19 | 2018-03-27 | 云南大学 | A kind of intelligent Answer System and analysis method |
CN107766506A (en) * | 2017-10-20 | 2018-03-06 | 哈尔滨工业大学 | A kind of more wheel dialog model construction methods based on stratification notice mechanism |
CN107818306B (en) * | 2017-10-31 | 2020-08-07 | 天津大学 | Video question-answering method based on attention model |
CN110019719B (en) * | 2017-12-15 | 2023-04-25 | 微软技术许可有限责任公司 | Assertion-based question and answer |
CN108108428B (en) * | 2017-12-18 | 2020-05-12 | 苏州思必驰信息科技有限公司 | Method, input method and system for constructing language model |
CN108417210B (en) * | 2018-01-10 | 2020-06-26 | 苏州思必驰信息科技有限公司 | Word embedding language model training method, word recognition method and system |
CN108628935B (en) * | 2018-03-19 | 2021-10-15 | 中国科学院大学 | Question-answering method based on end-to-end memory network |
CN108549850B (en) * | 2018-03-27 | 2021-07-16 | 联想(北京)有限公司 | Image identification method and electronic equipment |
US10431210B1 (en) * | 2018-04-16 | 2019-10-01 | International Business Machines Corporation | Implementing a whole sentence recurrent neural network language model for natural language processing |
US10770066B2 (en) * | 2018-05-31 | 2020-09-08 | Robert Bosch Gmbh | Slot filling in spoken language understanding with joint pointer and attention |
CN108763567A (en) * | 2018-06-05 | 2018-11-06 | 北京玄科技有限公司 | Method of Knowledge Reasoning and device applied to intelligent robot interaction |
CN108959246B (en) * | 2018-06-12 | 2022-07-12 | 北京慧闻科技(集团)有限公司 | Answer selection method and device based on improved attention mechanism and electronic equipment |
CN110866403B (en) * | 2018-08-13 | 2021-06-08 | 中国科学院声学研究所 | End-to-end conversation state tracking method and system based on convolution cycle entity network |
CN109033463B (en) * | 2018-08-28 | 2021-11-26 | 广东工业大学 | Community question-answer content recommendation method based on end-to-end memory network |
CN109558487A (en) * | 2018-11-06 | 2019-04-02 | 华南师范大学 | Document Classification Method based on the more attention networks of hierarchy |
CN109840322B (en) * | 2018-11-08 | 2023-06-20 | 中山大学 | Complete shape filling type reading understanding analysis model and method based on reinforcement learning |
CN109658270A (en) * | 2018-12-19 | 2019-04-19 | 前海企保科技(深圳)有限公司 | It is a kind of to read the core compensation system and method understood based on insurance products |
CN109597884B (en) * | 2018-12-28 | 2021-07-20 | 北京百度网讯科技有限公司 | Dialog generation method, device, storage medium and terminal equipment |
CN109829631B (en) * | 2019-01-14 | 2020-10-09 | 北京中兴通网络科技股份有限公司 | Enterprise risk early warning analysis method and system based on memory network |
CN110147532B (en) * | 2019-01-24 | 2023-08-25 | 腾讯科技(深圳)有限公司 | Encoding method, apparatus, device and storage medium |
CN109977428B (en) * | 2019-03-29 | 2024-04-02 | 北京金山数字娱乐科技有限公司 | Answer obtaining method and device |
CN109992657B (en) * | 2019-04-03 | 2021-03-30 | 浙江大学 | Dialogue type problem generation method based on enhanced dynamic reasoning |
CN110134771B (en) * | 2019-04-09 | 2022-03-04 | 广东工业大学 | Implementation method of multi-attention-machine-based fusion network question-answering system |
CN110046244B (en) * | 2019-04-24 | 2021-06-08 | 中国人民解放军国防科技大学 | Answer selection method for question-answering system |
CN110334195A (en) * | 2019-06-26 | 2019-10-15 | 北京科技大学 | A kind of answering method and system based on local attention mechanism memory network |
CN110348462B (en) * | 2019-07-09 | 2022-03-04 | 北京金山数字娱乐科技有限公司 | Image feature determination and visual question and answer method, device, equipment and medium |
CN111047482B (en) * | 2019-11-14 | 2023-07-04 | 华中师范大学 | Knowledge tracking system and method based on hierarchical memory network |
CN111291803B (en) * | 2020-01-21 | 2022-07-29 | 中国科学技术大学 | Image grading granularity migration method, system, equipment and medium |
CN111310848B (en) * | 2020-02-28 | 2022-06-28 | 支付宝(杭州)信息技术有限公司 | Training method and device for multi-task model |
CN112732879B (en) * | 2020-12-23 | 2022-05-10 | 重庆理工大学 | Downstream task processing method and model of question-answering task |
CN113704437B (en) * | 2021-09-03 | 2023-08-11 | 重庆邮电大学 | Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
CN105159890A (en) * | 2014-06-06 | 2015-12-16 | 谷歌公司 | Generating representations of input sequences using neural networks |
2016
- 2016-06-20 CN CN201610447676.4A patent/CN106126596B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159890A (en) * | 2014-06-06 | 2015-12-16 | 谷歌公司 | Generating representations of input sequences using neural networks |
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
Non-Patent Citations (2)
Title |
---|
End-To-End Memory Networks; Sainbayar Sukhbaatar et al.; arXiv:1503.08895v5; 2015-11-24; full text *
Hierarchical Memory Networks; Sarath Chandar et al.; arXiv:1605.07427v1; 2016-05-24; full text *
Also Published As
Publication number | Publication date |
---|---|
CN106126596A (en) | 2016-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106126596B (en) | Question answering method based on a hierarchical memory network | |
CN113544703B (en) | Efficient off-policy credit allocation | |
CN107220296B (en) | Method for generating question-answer knowledge base, method and equipment for training neural network | |
CN111897941B (en) | Dialogue generation method, network training method, device, storage medium and equipment | |
CN105574098B (en) | The generation method and device of knowledge mapping, entity control methods and device | |
CN110473592B (en) | Multi-view human synthetic lethal gene prediction method | |
CN108549658A (en) | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree | |
US10824946B2 (en) | Training neural networks using posterior sharpening | |
CN112699960A (en) | Semi-supervised classification method and equipment based on deep learning and storage medium | |
CN111581966A (en) | Context feature fusion aspect level emotion classification method and device | |
Iqbal et al. | Extracting and using building blocks of knowledge in learning classifier systems | |
Zhou et al. | ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge | |
Li et al. | Educational data mining for students' performance based on fuzzy C‐means clustering | |
CN110858805A (en) | Method and device for predicting network traffic of cell | |
LU501881B1 (en) | A METHOD AND SYSTEM FOR PREDICTING MIRNA DISEASE ASSOCIATIONS BASED ON HETEROGENOUS GRAPHS | |
Tam Cho et al. | An optimization approach for making causal inferences | |
KR20220098698A (en) | Learning content recommendation system that predicts the user's correct answer probability using collaborative filtering based on latent factors and operation method thereof | |
Klüver | A mathematical theory of communication: Meaning, information, and topology | |
CN107092593B (en) | Sentence semantic role recognition method and system for elementary mathematics hierarchical sampling application questions | |
CN110705279A (en) | Vocabulary selection method and device and computer readable storage medium | |
CN115604131B (en) | Link flow prediction method, system, electronic device and medium | |
Zhang et al. | Design of online learning early warning model based on artificial intelligence | |
CN112417267A (en) | User behavior analysis method and device, computer equipment and storage medium | |
Olabiyi et al. | Adversarial bootstrapping for dialogue model training | |
Ziegler et al. | Modelling word recognition and reading aloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |