CN106126596B - A question answering method based on a hierarchical memory network - Google Patents

A question answering method based on a hierarchical memory network

Info

Publication number
CN106126596B
CN106126596B (application CN201610447676.4A)
Authority
CN
China
Prior art keywords
word
sentence
granularity
coding
memory unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610447676.4A
Other languages
Chinese (zh)
Other versions
CN106126596A (en)
Inventor
许家铭
石晶
姚轶群
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201610447676.4A priority Critical patent/CN106126596B/en
Publication of CN106126596A publication Critical patent/CN106126596A/en
Application granted granted Critical
Publication of CN106126596B publication Critical patent/CN106126596B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Abstract

The present invention provides a question answering method based on a hierarchical memory network. Sentence-granularity memory encoding is performed first and, under the stimulus of the question's semantic encoding, the information reasoning of the sentence-granularity memory units is completed through a multi-round iterative attention mechanism. Sentences are then screened by k-max sampling, and word-granularity memory encoding is performed on top of the sentence-granularity memory encoding, so that memory encoding is carried out on two levels and a hierarchical memory is formed. The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units. The method improves the accuracy of automatic question answering and effectively solves the answer-selection problem for low-frequency and out-of-vocabulary words.

Description

A question answering method based on a hierarchical memory network
Technical field
The present invention relates to the field of automatic question answering system construction, and more particularly to an end-to-end question answering method based on a hierarchical memory network.
Background technique
Automatic question answering has long been one of the most challenging tasks in natural language processing: the system must understand text at a deep level and screen out candidate answers as responses. Existing conventional approaches include (a) training the individual modules of a text-processing pipeline independently and then merging their outputs, and (b) building a large-scale structured knowledge base and performing information reasoning and answer prediction on top of it. In recent years, end-to-end systems based on deep learning have been widely applied to a variety of tasks; these methods require no hand-crafted features and no independent tuning of each module.
A question answering system generally works in two steps: first the relevant semantic information is located (the "activation stage"), and then a response is generated based on that information (the "generation stage"). Recently, neural memory network models have achieved good results on question answering tasks. Their biggest drawback, however, is that they use a single level of sentence-granularity memory units and therefore cannot handle low-frequency or out-of-vocabulary words well. Moreover, the dictionary is usually reduced in size to lower the time complexity of the model. In that case, existing end-to-end neural network models cannot reliably select a low-frequency or out-of-vocabulary word as the answer output; that is, when the target answer word lies outside the training dictionary, existing methods cannot output an accurate answer at online test time. Take the following dialogue as an example:
1. Hello, sir. May I have your name?
2. Oh, I am Williamson.
3. Could you tell me your passport number?
4. Sure, it is 577838771.
5. And your telephone number?
6. The number is 0016178290851.
Suppose "Williamson", "577838771" and "0016178290851" are low-frequency or out-of-vocabulary words. If a conventional method discards these words or uniformly replaces them with an "unk" symbol, it cannot extract the accurate user information from the dialogue text. In practical applications, however, most answer information comes precisely from low-frequency or long-tail words, so designing an answer-selection method that can effectively handle out-of-vocabulary words is an urgent task in the field of automatic question answering.
Summary of the invention
(1) technical problems to be solved
To address the problems of the prior art, the present invention provides a question answering method based on a hierarchical memory network.
(2) technical solution
The present invention provides a question answering method based on a hierarchical memory network, comprising: step S101: fusing the positions of words and the temporal order of sentences, performing sentence-granularity memory encoding on the sentences in a sentence set to obtain the dual-channel memory encoding of the sentence-granularity memory units; step S102: under the stimulus of the question's semantic encoding, completing the information reasoning of the sentence-granularity memory units through a multi-round iterative attention mechanism, and obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension; step S103: performing k-max sampling on the reasoning result of the sentence-granularity memory units and screening a k-max sampled important-sentence set out of the sentence set; step S104: performing word-granularity memory encoding on the sentence set with a bidirectional recurrent neural network model to obtain the memory encoding of the word-granularity memory units; step S105: based on the question's semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set, obtaining the word-granularity output word probability distribution through an attention mechanism; and step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with a cross-entropy loss.
(3) Beneficial effects
It can be seen from the above technical solution that the question answering method based on a hierarchical memory network of the present invention has the following beneficial effects:
(1) The present invention first performs sentence-granularity memory encoding and, under the stimulus of the question's semantic encoding, completes the information reasoning of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits answer selection for low-frequency and out-of-vocabulary words;
(2) Sentences are screened by k-max sampling, which improves the efficiency of automatic question answering and reduces the computational complexity;
(3) Word-granularity memory encoding is performed on top of the sentence-granularity memory encoding, i.e. memory encoding is carried out on two levels, forming a hierarchical memory, which further improves the accuracy of automatic question answering;
(4) When the word-granularity memory encoding is produced by the recurrent neural network, the operation runs over the full sentence set X; in this way the contextual semantics of each word within the full sentence set are introduced into the word-granularity memory encoding, improving the accuracy and timeliness of automatic question answering;
(5) The attention mechanism of the word-granularity memory units operates only on the subset of word-granularity memory units selected by k-max sampling, which avoids interference from irrelevant memories and reduces the computation of the word-granularity attention mechanism;
(6) The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and effectively solves the answer-selection problem for low-frequency and out-of-vocabulary words.
Detailed description of the invention
Fig. 1 is a flowchart of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;
Fig. 2 is a framework schematic of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;
Fig. 3 is a schematic of the sentence-granularity memory encoding and of the information reasoning based on the sentence-granularity memory encoding according to an embodiment of the present invention;
Fig. 4 is a schematic of the word-granularity memory encoding and of the attention activation based on the word-granularity memory encoding according to an embodiment of the present invention;
Fig. 5 is a first performance diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention;
Fig. 6 is a further performance diagram of the question answering method based on a hierarchical memory network according to an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The invention discloses a question answering method based on a hierarchical memory network. Built on a fully neural, end-to-end model, it realizes information reasoning over the sentence set, sentence screening and word-granularity selection, and effectively solves the answer-selection problem of a question answering system for low-frequency or out-of-vocabulary words under large-scale data. The method performs two levels of hierarchical memory encoding on a sentence set carrying temporal-order information: sentence-granularity memory encoding and word-granularity memory encoding. Information reasoning, screening and activation are then carried out on the hierarchical memory network, and the candidate answer word probability distribution is jointly predicted.
Concretely, the method first performs sentence-vectorized memory encoding on the sentence set through the hierarchical memory network, taking into account both the position of each word within its sentence and the temporal order of the sentences within the sentence set. It then completes the information reasoning of the sentence-granularity memory units through a multi-round iterative attention mechanism and performs k-max sampling on the reasoning result to screen out the important sentence information. Next, a bidirectional recurrent network model performs sequential word-granularity encoding over the sentence set, and an attention mechanism activates the word-granularity memory units from the screened information. Finally, output word probability distributions are predicted from the sentence-granularity and the word-granularity memory units respectively, and joint supervised training is performed through Softmax, learning an end-to-end automatic question answering model.
The question answering method based on a hierarchical memory network according to an embodiment of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the question answering method based on a hierarchical memory network according to an embodiment of the present invention. Referring to Fig. 1, the method comprises:
Step S101: fusing the positions of words and the temporal order of sentences, sentence-granularity memory encoding is performed on the sentences in the sentence set, and the dual-channel memory encoding of the sentence-granularity memory units is obtained.
Referring to Fig. 3, step S101 includes:
Sub-step S101a: dual-channel word-vector mapping is applied to the sentences of the sentence set carrying temporal-order information, yielding the dual-channel word-vectorized encoding of each sentence.
Sub-step S101a includes: given the sentence set with temporal-order information X = {x_i}, i = 1, 2, ..., n, where i is the time index of a sentence and n is the maximum number of sentences in the set, two word-vector matrices A ∈ R^{|V|×d} and C ∈ R^{|V|×d} are randomly initialized, where |V| is the dictionary dimension and d is the word-vector dimension; A and C are both initialized from a normal distribution with standard deviation 0.1 and mean 0. Dual-channel word-vector mapping is applied to each sentence x_i of the sentence set X, so that the word x_ij of sentence x_i is encoded on the two channels as Ax_ij and Cx_ij, where j is the position of the word within sentence x_i.
Sub-step S101b: the dual-channel word-vectorized encoding is updated according to the position of each word within its sentence.
Sub-step S101b includes: an update matrix l is generated from the position j of the word within its sentence and the word-vector dimension d; the updated dual-channel word-vector encodings are l_gj · (Ax_ij) and l_gj · (Cx_ij), where:
l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)    (1)
where J_i is the number of words in sentence x_i, g indexes the dimensions of the d-dimensional word vector, 1 ≤ j ≤ J_i and 1 ≤ g ≤ d.
Sub-step S101c: the temporal-order information of the sentences is fused in, sentence-granularity memory encoding is applied to each sentence, and the dual-channel memory encoding of the sentence-granularity memory units is obtained.
Sub-step S101c includes: two sentence-time-series vectorization matrices T_A ∈ R^{n×d} and T_C ∈ R^{n×d} are randomly initialized, where n is the maximum number of sentences in the sentence set and d is the time-vector dimension, equal to the word-vector dimension; T_A and T_C are both initialized from a normal distribution with standard deviation 0.1 and mean 0. The dual-channel memory encoding of the sentence-granularity memory units is then M^(S) = {{a_i}, {c_i}}, where:
a_i = Σ_j l_j · (Ax_ij) + T_A(i)    (2)
c_i = Σ_j l_j · (Cx_ij) + T_C(i)    (3)
where l_j is the update vector of the j-th word of sentence x_i taken from the update matrix l, and the operator · denotes element-wise multiplication between vectors; for example, l_j · (Ax_ij) in formula (2) denotes the element-wise product of the vector l_j and the vector Ax_ij.
Step S102: under the stimulus of the question's semantic encoding, the information reasoning of the sentence-granularity memory units is completed through a multi-round iterative attention mechanism, and the output word probability distribution of the sentence-granularity memory units over the dictionary dimension is obtained.
Step S102 includes:
Sub-step S102a: the question text is vectorized to obtain the semantic encoding of the question.
Sub-step S102a includes: using the word-vector matrix A ∈ R^{|V|×d}, the j-th word q_j of the question text q is vectorized as Aq_j, and the vectorization is updated according to the word position j within the question text, yielding the question semantic encoding:
u = Σ_j l_j · (Aq_j)    (4)
where, as in formulas (2) and (3), l_j is the update vector of the j-th word taken from the update matrix l.
Sub-step S102b: under the stimulus of the question semantic encoding, information activation is performed on the dual-channel memory encoding of the sentence-granularity memory units with an attention mechanism.
Sub-step S102b includes: the attention weight of the question semantic encoding over the sentence-granularity memory units is computed by dot product:
p_i = softmax_i(u^T a_i)    (5)
Then, under the stimulus of the question semantic encoding, the activated information of the dual-channel memory encoding of the sentence-granularity memory units is:
o = Σ_i p_i c_i    (6)
Sub-step S102c: the information reasoning within the sentence-granularity memory units is completed through a multi-round iterative attention mechanism, and the output word probability distribution of the sentence-granularity memory units over the dictionary dimension is obtained.
Sub-step S102c includes: R rounds of information activation are carried out in the sentence-granularity memory units to locate the candidate sentence set, yielding the activated information o^R of the R-th round, where in the (r+1)-th round of information activation
u^{r+1} = u^r + o^r    (7)
with 1 ≤ r ≤ (R-1). In the (r+1)-th round of information activation, independent word-vector matrices A^{r+1} and C^{r+1} and time-vector matrices T_A^{r+1} and T_C^{r+1} are used to vectorize the sentence set, with A^{r+1} = C^r and T_A^{r+1} = T_C^r, and C^r and T_C^r are initialized from a normal distribution with standard deviation 0.1 and mean 0.
The information reasoning within the sentence-granularity memory units is completed by the R-round iterative attention mechanism, and the output word probability distribution of the sentence-granularity memory units over the dictionary dimension is:
p^(S)(w_e) = softmax_e((u^R + o^R)^T C^R(w_e))    (8)
where w = {w_e}, e = 1, 2, ..., |V|, is the word set of the dictionary, C^R ∈ R^{|V|×d} is the word-vector matrix of the R-th round of information activation, C^R(w_e) is the word vector of w_e, and T denotes transposition.
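A minimal sketch of the multi-hop reasoning of step S102, in the spirit of the end-to-end memory network recipe listed in the non-patent citations; the adjacent tying A^{r+1} = C^r follows claim 6, while the per-hop memories are assumed to be precomputed with the encoding sketch above, and all names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def multi_hop_reasoning(u, memories):
    """memories: list of R (a, c) pairs, one dual-channel encoding per hop.
    Returns the final state u^R + o^R and the last-hop attention weights."""
    for a, c in memories:                        # each a, c has shape (n, d)
        p = softmax(a @ u)                       # attention over sentences (formula (5))
        o = p @ c                                # activated information o (formula (6))
        u = u + o                                # u^{r+1} = u^r + o^r (formula (7))
    return u, p                                  # p of the last hop drives k-max sampling

def dictionary_distribution(u_final, C_R):
    """Output word probabilities over the dictionary (formula (8));
    C_R is the (|V|, d) word-vector matrix of the final hop."""
    return softmax(C_R @ u_final)                # shape (|V|,)
```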
The present invention first performs sentence-granularity memory encoding and, under the stimulus of the question's semantic encoding, completes the information reasoning of the sentence-granularity memory units through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits answer selection for low-frequency and out-of-vocabulary words.
Step S103: k-max sampling is performed on the information reasoning result of the sentence-granularity memory units, and a k-max sampled important-sentence set is screened out of the sentence set.
Step S103 includes:
Sub-step S103a: from the attention weight vector α^(S) = {p_i}, i = 1, 2, ..., n, of the R-th round of information activation in the sentence-granularity memory units, the subset α̃^(S) of its k largest attention weights is selected.
Sub-step S103b: the sentences corresponding to the k largest attention weights α̃^(S) are taken as the k-max sampled important-sentence set X̃ = {x̃_i}; the sentences x̃_i of this important-sentence set are the important sentences.
The k-max sampling of sentences in the present invention improves the efficiency of automatic question answering, reduces the computational complexity, and further benefits answer selection for low-frequency and out-of-vocabulary words.
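A short sketch of the k-max sampling of step S103: keep the k sentences with the largest final-hop attention weights. Variable names are illustrative; preserving the original time order of the selected sentences is an assumption made for the later word-granularity pass.

```python
import numpy as np

def k_max_sampling(attention, sentences, k):
    """attention: final-hop weights over the n sentences; returns the indices and
    the k sentences with the largest weights, in their original time order."""
    idx = np.argsort(attention)[-k:]             # indices of the k largest weights
    idx = np.sort(idx)                           # keep original sentence order
    return idx, [sentences[i] for i in idx]
```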
Step S104: word-granularity memory encoding is performed on the sentence set with a bidirectional recurrent neural network model, and the memory encoding of the word-granularity memory units is obtained.
Referring to Fig. 4, step S104 includes:
Sub-step S104a: the words of the sentence set are encoded in temporal order with a bidirectional recurrent network model, yielding the hidden states of the bidirectional recurrent network model. Many bidirectional recurrent network models exist; the present embodiment uses one of them, the gated recurrent unit (GRU) network.
Sub-step S104a includes: using gated recurrent unit (GRU) networks, forward and backward encodings are computed in temporal order over all words {x̂_t}, t = 1, 2, ..., |t|, of the sentence set X. For the word feature at time t, the hidden state of the forward GRU encoding is
h_t^f = GRU(C^R x̂_t, h_{t-1}^f)    (9)
and the hidden state of the backward GRU encoding is
h_t^b = GRU(C^R x̂_t, h_{t+1}^b)    (10)
where |t| is the maximum word-sequence length obtained by arranging all words of the sentence set X in temporal order, the dimensions of h_t^f and h_t^b equal the word-vector dimension d, and C^R is the word-vector matrix of the R-th round of information activation in the sentence-granularity memory units.
Sub-step S104b: the hidden states of the bidirectional recurrent network model are fused to obtain the memory encoding of the word-granularity memory units.
Sub-step S104b includes: the hidden states of the bidirectional recurrent network model are added directly, giving the memory encoding of the word-granularity memory units M^(W) = {m_t}, t = 1, 2, ..., |t|, where
m_t = h_t^f + h_t^b    (11)
When the present invention performs word-granularity memory encoding with the recurrent neural network, the operation runs over the full sentence set X; in this way the contextual semantics of each word within the full sentence set are introduced into the word-granularity memory encoding, which improves the accuracy and timeliness of automatic question answering and benefits answer selection for low-frequency and out-of-vocabulary words.
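A compact numpy sketch of step S104 under simplifying assumptions (gate biases omitted, random weights for illustration): a bidirectional GRU is run over the word vectors of all words in time order, and the forward and backward states are summed into m_t (formula (11)). This is not the patented implementation, only an illustration of the mechanism.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    def __init__(self, d, rng):
        # one weight matrix per gate, acting on the concatenation [h_{t-1}; x_t]
        self.Wz = rng.normal(0, 0.1, (d, 2 * d))
        self.Wr = rng.normal(0, 0.1, (d, 2 * d))
        self.Wh = rng.normal(0, 0.1, (d, 2 * d))

    def step(self, x, h):
        z = sigmoid(self.Wz @ np.concatenate([h, x]))            # update gate
        r = sigmoid(self.Wr @ np.concatenate([h, x]))            # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([r * h, x]))  # candidate state
        return (1 - z) * h + z * h_tilde

def word_granularity_memory(word_vecs, fwd, bwd):
    """word_vecs: (T, d) embeddings C^R x_t of all words of the sentence set.
    Returns m_t = forward_state_t + backward_state_t for every word."""
    d = word_vecs.shape[1]
    h_f, forward = np.zeros(d), []
    for x in word_vecs:                                          # left-to-right pass
        h_f = fwd.step(x, h_f)
        forward.append(h_f)
    h_b, backward = np.zeros(d), []
    for x in word_vecs[::-1]:                                    # right-to-left pass
        h_b = bwd.step(x, h_b)
        backward.append(h_b)
    backward = backward[::-1]
    return np.stack([f + b for f, b in zip(forward, backward)])  # (T, d)
```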
Step S105: based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set, the word-granularity output word probability distribution is obtained through an attention mechanism.
Step S105 includes:
Sub-step S105a: the attention weights of the word-granularity memory units are computed from the question semantic encoding and the memory encoding of the word-granularity memory units.
Sub-step S105a includes: based on the question semantic encoding u^R of the R-th round of information activation in the sentence-granularity memory units, the memory encoding of the word-granularity memory units M^(W) = {m_t}, t = 1, 2, ..., |t|, and the k-max sampled important-sentence set X̃, the normalized attention weight vector α^(W) of the word-granularity memory units is obtained as
α_t^(W) = softmax_t(v^T tanh(W u^R + U m̃_t))    (12)
where {m̃_t} is the subset of the word-granularity memory encoding M^(W) corresponding to the word set {x̃_t} of the k-max sampled important-sentence set X̃. The dimension of the attention weight vector α^(W) equals the maximum word-sequence length obtained by arranging all words of the important-sentence set X̃ in temporal order. v ∈ R^d, W ∈ R^{d×d} and U ∈ R^{d×d} are learning parameters; v, W and U are initialized from a normal distribution with standard deviation 0.1 and mean 0 and are updated during training.
Sub-step S105b: the word-granularity output word probability distribution is obtained from the attention weights of the word-granularity memory units. In the embodiment of the present invention, the normalized attention weights α^(W) of the word-granularity memory units are used directly as the word-granularity output word probability distribution:
p^(W)(w̃) = α^(W)    (13)
In this case the word-granularity output word probability distribution has the same dimension as the attention weight vector, i.e. it is a distribution over w̃, the set of all words of the important-sentence set X̃.
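A minimal sketch of the word-granularity attention of step S105, assuming the additive (tanh) scoring form implied by the parameters v, W and U of formula (12); the normalized weights double as the word-level output distribution of formula (13). Names are illustrative.

```python
import numpy as np

def word_attention(u_R, m_sub, v, W, U):
    """u_R: final query state (d,); m_sub: (T_k, d) word memories restricted to
    the words of the k-max sampled important sentences; v (d,), W (d, d), U (d, d).
    Returns the normalized weights alpha, used directly as p^(W)."""
    scores = np.array([v @ np.tanh(W @ u_R + U @ m_t) for m_t in m_sub])
    scores = scores - scores.max()               # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha                                 # distribution over the sub-vocabulary
```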
The present invention also performs word-granularity memory encoding on top of the sentence-granularity memory encoding, i.e. memory encoding is carried out on two levels, forming a hierarchical memory, which further improves the accuracy of automatic question answering and benefits answer selection for low-frequency and out-of-vocabulary words. Meanwhile, the attention mechanism of the word-granularity memory units operates only on the subset of word-granularity memory units selected by k-max sampling, which avoids interference from irrelevant memories and reduces the computation of the word-granularity attention mechanism.
Step S106: the output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, and supervised training is performed with a cross-entropy loss.
Step S106 includes:
Sub-step S106a: output-word joint prediction is performed from the output word probability distribution of the sentence-granularity memory units over the dictionary dimension and the word-granularity output word probability distribution, yielding the jointly predicted output word distribution p(w), where trans(·) maps the word-granularity output word probability distribution p^(W)(w̃) over the sub-vocabulary w̃ into a word-granularity output word probability distribution over the full dictionary. Specifically, each probability value of p^(W)(w̃) is mapped to the position that the corresponding word of the sub-vocabulary w̃ occupies in the dictionary-dimension word set w; words of the full dictionary that do not appear in the sub-vocabulary are given output probability 0, yielding the mapped word output probability distribution trans(p^(W)(w̃)).
Sub-step S106b: cross-entropy supervised training is applied to the jointly predicted output word distribution using the target answer word distribution. Given the target answer word distribution y of the training set, joint optimization is performed on the cross-entropy function between the target answer word distribution y and the jointly predicted output word distribution p(w).
In an exemplary embodiment of the present invention, the objective function of the joint optimization is optimized by error back-propagation with stochastic gradient descent. The optimized parameters include the word-vector matrices {A^r}, r = 1, 2, ..., R, and {C^r}, r = 1, 2, ..., R, and the time-vector matrices {T_A^r} and {T_C^r} of the sentence-granularity memory units, all parameter sets {θ_GRU} of the bidirectional GRU model used in the word-granularity memory encoding, and the parameters v, W and U of the word-granularity attention weights (formula (12)).
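A sketch of the joint prediction and supervision of step S106: the word-level distribution over the sub-vocabulary is mapped back to dictionary coordinates and combined with the sentence-level distribution, which is then penalized with cross entropy against the target word. Summing the two distributions and renormalizing is an assumption; the patent states only that the two distributions are jointly predicted.

```python
import numpy as np

def trans(p_word_sub, sub_word_ids, V):
    """Scatter sub-vocabulary probabilities into a |V|-dimensional vector;
    dictionary words absent from the sub-vocabulary get probability 0."""
    p_full = np.zeros(V)
    for prob, wid in zip(p_word_sub, sub_word_ids):
        p_full[wid] += prob                      # accumulate repeated words
    return p_full

def joint_cross_entropy(p_sentence, p_word_sub, sub_word_ids, target_id, eps=1e-12):
    """p_sentence: (|V|,) distribution from the sentence-granularity memory;
    p_word_sub: distribution over the sampled sub-vocabulary; target_id: answer word."""
    p = p_sentence + trans(p_word_sub, sub_word_ids, len(p_sentence))
    p = p / p.sum()                              # renormalize the combined distribution
    return -np.log(p[target_id] + eps)           # cross entropy against the target word
```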
The present invention jointly predicts the output word probability distribution from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and benefits answer selection for low-frequency and out-of-vocabulary words.
Fig. 2 is a framework schematic of the question answering method based on a hierarchical memory network according to an embodiment of the present invention. Referring to Fig. 2, the method has two levels of memory network units:
Memory unit one: the sentence set, carrying its temporal order, undergoes sentence-granularity memory encoding;
Memory unit two: all words of the sentence set, in temporal order, undergo word-granularity memory encoding.
Between the two memory-unit levels, k-max sampling is used to screen and filter the important information.
There are two information-activation mechanisms in the information-processing stage of the model:
Activation mechanism one: information activation in the sentence-granularity memory units through the reasoning mechanism;
Activation mechanism two: word selection in the word-granularity memory units through the attention mechanism.
The model training stage is guided by supervision signals at two places:
Supervision signal one: the fit to the target word after the output vector of the sentence-granularity information reasoning is decoded by Softmax;
Supervision signal two: the fit to the target word after the attention activation of the word-granularity memory units and the Softmax output.
To accurately evaluate the automatic question answering response performance of the method of the present invention, the performance is compared by counting, for each model, the number of error samples in which the selected output answer word differs from the true answer word of the data.
Table 1
Domain | Train/test QA pairs | Dictionary size (whole/train/test) | Out-of-vocabulary target words (percentage)
Flight booking | 7,000/7,000 | 10,682/5,612/5,618 | 5,070 (72.43%)
The experiments of the present invention use a Chinese flight-booking text dataset that contains 2,000 complete dialogue histories and 14,000 question-answer pairs, split 5:5 into a training set and a test set. No preprocessing (such as stop-word removal or stemming) is applied to these text data. The statistics of the dataset are listed in Table 1; the out-of-vocabulary target words account for 72.43% of the test set, which strongly affects the training of conventional models.
The following baseline methods are used in the experiments of the present invention:
Baseline one: a pointer network model based on an attention mechanism; this method treats all words of the sentence set, arranged in temporal order, as one long sentence to encode, and generates the answer directly from the attention between the question and the word encodings;
Baseline two: a neural memory network model; this method performs sentence-granularity encoding on the sentence set, performs semantic activation with the question encoding vector, and matches the answer directly over the full dictionary space.
The parameter settings used in the experiments of the present invention are listed in Table 2:
Table 2
n d R k lr bs
16 100 3 1 0.01 10
In Table 2, n is the maximum number of sentences (time steps) of the sentence sets in the experimental data, d is the word-vector and hidden-layer encoding dimension, R is the number of iterations of the reasoning mechanism in the sentence-granularity memory units, k is the k-max sampling size between the memory levels, lr is the learning rate of the stochastic gradient descent used to optimize the model parameters, and bs is the number of samples per batch during model training.
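For reference, the settings of Table 2 collected as a configuration dict (names are illustrative):

```python
config = {
    "n": 16,     # maximum number of sentences (time steps) per dialogue
    "d": 100,    # word-vector / hidden-layer dimension
    "R": 3,      # attention-reasoning rounds in the sentence-granularity memory
    "k": 1,      # k-max sampling size between memory levels
    "lr": 0.01,  # SGD learning rate
    "bs": 10,    # mini-batch size
}
```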
In the experiments of the present invention, 15 training epochs are run and all methods converge, as shown in Fig. 5; the results after final convergence are listed in Table 3:
Table 3
Method | Number of error samples
Baseline one | 109
Baseline two | 56
Method of the present invention | 0
Fig. 5 and Table 3 give the error-sample evaluation of the method of the present invention and of baselines one and two on the dataset. The experimental results show that the method of the present invention converges significantly faster than the other methods. According to the final convergence results in Table 3, the method of the present invention is clearly superior to the other methods: it completely solves the answer-selection problem on the out-of-vocabulary word set and reaches 100% accuracy.
Meanwhile the maximum hits k of experimental verification of the present invention information sifting between level memory unit asks answer selection The performance of error sample number influences in topic, and experimental result is as shown in Fig. 6 and table 4.It can be seen that when maximum hits is 1, this The convergence rate of inventive method performance and final convergence result can reach optimal, further illustrate and carry out letter between level memory unit Cease the importance of selection.
Table 4
k-max sampling size | Number of error samples
3 | 5
2 | 4
1 | 0
The embodiments of the present invention have now been described in detail with reference to the accompanying drawings. From the above description, those skilled in the art should have a clear understanding of the question answering method based on a hierarchical memory network of the present invention.
In the question answering method based on a hierarchical memory network of the present invention, sentence-granularity memory encoding is performed first and, under the stimulus of the question's semantic encoding, the information reasoning of the sentence-granularity memory units is completed through a multi-round iterative attention mechanism, which improves the accuracy and timeliness of automatic question answering and benefits answer selection for low-frequency and out-of-vocabulary words. Sentences are screened by k-max sampling, which improves the efficiency of automatic question answering and reduces computational complexity. Word-granularity memory encoding is performed on top of the sentence-granularity memory encoding, i.e. memory encoding is carried out on two levels, forming a hierarchical memory that further improves the accuracy of automatic question answering. When the word-granularity memory encoding is produced by the recurrent neural network, the operation runs over the full sentence set X, so the contextual semantics of each word within the full sentence set are introduced into the word-granularity memory encoding, improving accuracy and timeliness. The attention mechanism of the word-granularity memory units operates only on the subset of word-granularity memory units selected by k-max sampling, which avoids interference from irrelevant memories and reduces the computation of the word-granularity attention mechanism. Finally, the output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which further improves the accuracy of automatic question answering and effectively solves the answer-selection problem for low-frequency and out-of-vocabulary words.
It should be noted that implementations not shown or described in the drawings or in the text of the specification are forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements are not limited to the specific ways mentioned in the embodiments, and those of ordinary skill in the art may simply change or replace them, for example:
(1) Directional terms mentioned in the embodiments, such as "upper", "lower", "front", "rear", "left" and "right", only refer to the directions of the drawings and are not intended to limit the protection scope of the present invention;
(2) The above embodiments may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e. the technical features of different embodiments may be freely combined to form further embodiments.
The specific embodiments described above further explain the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A question answering method based on a hierarchical memory network, characterized by comprising:
Step S101: fusing the positions of words and the temporal order of sentences, and performing sentence-granularity memory encoding on the sentences in a sentence set to obtain the dual-channel memory encoding of sentence-granularity memory units;
Step S102: under the stimulus of a question semantic encoding, completing the information reasoning of the sentence-granularity memory units through a multi-round iterative attention mechanism, and obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension;
Step S103: performing k-max sampling on the information reasoning result of the sentence-granularity memory units, and screening a k-max sampled important-sentence set out of the sentence set;
Step S104: performing word-granularity memory encoding on the sentence set with a bidirectional recurrent neural network model to obtain the memory encoding of word-granularity memory units;
Step S105: based on the question semantic encoding, the memory encoding of the word-granularity memory units and the k-max sampled important-sentence set, obtaining the word-granularity output word probability distribution through an attention mechanism; and
Step S106: jointly predicting the output word probability distribution from the sentence-granularity and word-granularity memory units, and performing supervised training with a cross-entropy loss.
2. The question answering method according to claim 1, characterized in that step S101 comprises:
Sub-step S101a: given the sentence set with temporal-order information X = {x_i}, i = 1, 2, ..., n, randomly initializing word-vector matrices A ∈ R^{|V|×d} and C ∈ R^{|V|×d}, so that the word x_ij of sentence x_i is dual-channel encoded as Ax_ij and Cx_ij;
wherein i is the time index of a sentence, n is the maximum number of sentences in the sentence set, |V| is the dictionary dimension, d is the word-vector dimension, and j is the position of the word within sentence x_i;
Sub-step S101b: updating the dual-channel word-vectorized encoding according to the position of each word within its sentence; and
Sub-step S101c: fusing the temporal-order information of the sentences, performing sentence-granularity memory encoding on the sentences, and obtaining the dual-channel memory encoding of the sentence-granularity memory units.
3. The question answering method according to claim 2, characterized in that sub-step S101b comprises:
encoding the updated dual-channel word vectors as l_gj · (Ax_ij) and l_gj · (Cx_ij), wherein
l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)    (1)
wherein J_i is the number of words in sentence x_i, g indexes the dimensions of the d-dimensional word vector, 1 ≤ j ≤ J_i and 1 ≤ g ≤ d.
4. The question answering method according to claim 3, characterized in that sub-step S101c comprises:
randomly initializing sentence time-vector matrices T_A ∈ R^{n×d} and T_C ∈ R^{n×d}; the dual-channel memory encoding of the sentence-granularity memory units is then M^(S) = {{a_i}, {c_i}}, wherein
a_i = Σ_j l_j · (Ax_ij) + T_A(i)    (2)
c_i = Σ_j l_j · (Cx_ij) + T_C(i)    (3)
wherein l_j is the update vector of the j-th word of sentence x_i taken from the update matrix l, the operator · denotes element-wise multiplication between vectors, n is the maximum number of sentences in the sentence set, and d is the time-vector dimension, equal to the word-vector dimension.
5. The question answering method according to claim 4, characterized in that step S102 comprises:
Sub-step S102a: using the word-vector matrix A ∈ R^{|V|×d}, vectorizing the j-th word q_j of the question text q as Aq_j, and obtaining the question semantic encoding
u = Σ_j l_j · (Aq_j)    (4)
wherein l_j is the update vector of the j-th word taken from the update matrix l;
Sub-step S102b: computing the attention weight of the question semantic encoding over the sentence-granularity memory units
p_i = softmax_i(u^T a_i)    (5)
so that, under the stimulus of the question semantic encoding, the activated information of the dual-channel memory encoding of the sentence-granularity memory units is
o = Σ_i p_i c_i    (6)
and
Sub-step S102c: completing the information reasoning within the sentence-granularity memory units through the multi-round iterative attention mechanism, and obtaining the output word probability distribution of the sentence-granularity memory units over the dictionary dimension.
6. The question answering method according to claim 5, characterized in that sub-step S102c comprises:
carrying out R rounds of information activation in the sentence-granularity memory units and obtaining the activated information o^R of the R-th round, wherein in the (r+1)-th round of information activation
u^{r+1} = u^r + o^r    (7)
wherein 1 ≤ r ≤ (R-1), 2 ≤ R, and A^{r+1} = C^r, T_A^{r+1} = T_C^r;
the output word probability distribution over the dictionary dimension in the sentence-granularity memory units is
p^(S)(w_e) = softmax_e((u^R + o^R)^T C^R(w_e))    (8)
wherein w = {w_e}, e = 1, 2, ..., |V|, is the word set of the dictionary of size |V|, w_e is the e-th word in the dictionary V, C^R ∈ R^{|V|×d} is the word-vector matrix of the R-th round of information activation, and T is the transposition operator.
7. The question answering method according to claim 6, characterized in that step S103 comprises:
Sub-step S103a: from the attention weight vector α^(S) = {p_i}, i = 1, 2, ..., n, of the R-th round of information activation in the sentence-granularity memory units, selecting the subset α̃^(S) of its k largest attention weights, 1 ≤ k ≤ n; and
Sub-step S103b: taking the sentences corresponding to the k largest attention weights α̃^(S) as the k-max sampled important-sentence set X̃ = {x̃_i}, wherein α̃_i^(S) is the i-th largest attention weight selected from α^(S) and x̃_i is the corresponding i-th important sentence in the sentence set X.
8. The question answering method according to claim 7, characterized in that step S104 comprises:
Sub-step S104a: using gated recurrent unit networks, performing forward and backward encoding in temporal order over all words {x̂_t}, t = 1, 2, ..., num, of the sentence set X; for the t-th word, the hidden state of the forward GRU encoding is h_t^f = GRU(C^R x̂_t, h_{t-1}^f) and the hidden state of the backward GRU encoding is h_t^b = GRU(C^R x̂_t, h_{t+1}^b);
wherein num is the maximum word-sequence length obtained by arranging all words of the sentence set X in temporal order, the dimensions of h_t^f and h_t^b equal the word-vector dimension d, and x̂_t is the t-th word of the sentence set X; and
Sub-step S104b: obtaining the memory encoding of the word-granularity memory units M^(W) = {m_t}, t = 1, 2, ..., num, wherein m_t = h_t^f + h_t^b.
9. The question answering method according to claim 8, characterized in that step S105 comprises:
Sub-step S105a: computing the normalized attention weight vector α^(W) of the word-granularity memory units as
α_t^(W) = softmax_t(v^T tanh(W u^R + U m̃_t))    (12)
wherein {m̃_t} is the subset of the word-granularity memory encoding M^(W) corresponding to the word set {x̃_t} of the k-max sampled important-sentence set X̃, the dimension of α^(W) equals the number of words of X̃, v ∈ R^d, W ∈ R^{d×d} and U ∈ R^{d×d} are learning parameters, and m̃_t is the word-granularity memory encoding corresponding to the t-th word x̃_t of the sentence set X̃; and
Sub-step S105b: taking the word-granularity output word probability distribution p^(W)(w̃) as
p^(W)(w̃) = α^(W)    (13)
wherein w̃ = {x̃_t} is the set of all words of the important-sentence set X̃.
10. The question answering method according to claim 9, characterized in that step S106 comprises:
Sub-step S106a: performing output-word joint prediction from the output word probability distribution of the sentence-granularity memory units over the dictionary dimension and the word-granularity output word probability distribution, yielding the jointly predicted output word distribution p(w), wherein trans(·) maps the word-granularity output word probability distribution p^(W)(w̃) over the sub-vocabulary w̃ to a word-granularity output word probability distribution over the full dictionary; and
Sub-step S106b: performing cross-entropy supervised training on the jointly predicted output word distribution with the target answer word distribution.
CN201610447676.4A 2016-06-20 2016-06-20 A question answering method based on a hierarchical memory network Active CN106126596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610447676.4A CN106126596B (en) 2016-06-20 2016-06-20 A question answering method based on a hierarchical memory network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610447676.4A CN106126596B (en) 2016-06-20 2016-06-20 A question answering method based on a hierarchical memory network

Publications (2)

Publication Number Publication Date
CN106126596A CN106126596A (en) 2016-11-16
CN106126596B true CN106126596B (en) 2019-08-23

Family

ID=57470348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610447676.4A Active CN106126596B (en) 2016-06-20 2016-06-20 A question answering method based on a hierarchical memory network

Country Status (1)

Country Link
CN (1) CN106126596B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778014B (en) * 2016-12-29 2020-06-16 浙江大学 Disease risk prediction modeling method based on recurrent neural network
CN106776578B (en) * 2017-01-03 2020-03-17 竹间智能科技(上海)有限公司 Method and device for improving conversation performance of conversation system
CN108388561B (en) 2017-02-03 2022-02-25 百度在线网络技术(北京)有限公司 Neural network machine translation method and device
CN107273487A (en) 2017-06-13 2017-10-20 北京百度网讯科技有限公司 Generation method, device and the computer equipment of chat data based on artificial intelligence
CN109388706A (en) * 2017-08-10 2019-02-26 华东师范大学 A kind of problem fine grit classification method, system and device
CN107491541B (en) * 2017-08-24 2021-03-02 北京丁牛科技有限公司 Text classification method and device
CN107844533A (en) * 2017-10-19 2018-03-27 云南大学 A kind of intelligent Answer System and analysis method
CN107766506A (en) * 2017-10-20 2018-03-06 哈尔滨工业大学 A kind of more wheel dialog model construction methods based on stratification notice mechanism
CN107818306B (en) * 2017-10-31 2020-08-07 天津大学 Video question-answering method based on attention model
CN110019719B (en) * 2017-12-15 2023-04-25 微软技术许可有限责任公司 Assertion-based question and answer
CN108108428B (en) * 2017-12-18 2020-05-12 苏州思必驰信息科技有限公司 Method, input method and system for constructing language model
CN108417210B (en) * 2018-01-10 2020-06-26 苏州思必驰信息科技有限公司 Word embedding language model training method, word recognition method and system
CN108628935B (en) * 2018-03-19 2021-10-15 中国科学院大学 Question-answering method based on end-to-end memory network
CN108549850B (en) * 2018-03-27 2021-07-16 联想(北京)有限公司 Image identification method and electronic equipment
US10431210B1 (en) * 2018-04-16 2019-10-01 International Business Machines Corporation Implementing a whole sentence recurrent neural network language model for natural language processing
US10770066B2 (en) * 2018-05-31 2020-09-08 Robert Bosch Gmbh Slot filling in spoken language understanding with joint pointer and attention
CN108763567A (en) * 2018-06-05 2018-11-06 北京玄科技有限公司 Method of Knowledge Reasoning and device applied to intelligent robot interaction
CN108959246B (en) * 2018-06-12 2022-07-12 北京慧闻科技(集团)有限公司 Answer selection method and device based on improved attention mechanism and electronic equipment
CN110866403B (en) * 2018-08-13 2021-06-08 中国科学院声学研究所 End-to-end conversation state tracking method and system based on convolution cycle entity network
CN109033463B (en) * 2018-08-28 2021-11-26 广东工业大学 Community question-answer content recommendation method based on end-to-end memory network
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN109840322B (en) * 2018-11-08 2023-06-20 中山大学 Complete shape filling type reading understanding analysis model and method based on reinforcement learning
CN109658270A (en) * 2018-12-19 2019-04-19 前海企保科技(深圳)有限公司 It is a kind of to read the core compensation system and method understood based on insurance products
CN109597884B (en) * 2018-12-28 2021-07-20 北京百度网讯科技有限公司 Dialog generation method, device, storage medium and terminal equipment
CN109829631B (en) * 2019-01-14 2020-10-09 北京中兴通网络科技股份有限公司 Enterprise risk early warning analysis method and system based on memory network
CN110147532B (en) * 2019-01-24 2023-08-25 腾讯科技(深圳)有限公司 Encoding method, apparatus, device and storage medium
CN109977428B (en) * 2019-03-29 2024-04-02 北京金山数字娱乐科技有限公司 Answer obtaining method and device
CN109992657B (en) * 2019-04-03 2021-03-30 浙江大学 Dialogue type problem generation method based on enhanced dynamic reasoning
CN110134771B (en) * 2019-04-09 2022-03-04 广东工业大学 Implementation method of multi-attention-machine-based fusion network question-answering system
CN110046244B (en) * 2019-04-24 2021-06-08 中国人民解放军国防科技大学 Answer selection method for question-answering system
CN110334195A (en) * 2019-06-26 2019-10-15 北京科技大学 A kind of answering method and system based on local attention mechanism memory network
CN110348462B (en) * 2019-07-09 2022-03-04 北京金山数字娱乐科技有限公司 Image feature determination and visual question and answer method, device, equipment and medium
CN111047482B (en) * 2019-11-14 2023-07-04 华中师范大学 Knowledge tracking system and method based on hierarchical memory network
CN111291803B (en) * 2020-01-21 2022-07-29 中国科学技术大学 Image grading granularity migration method, system, equipment and medium
CN111310848B (en) * 2020-02-28 2022-06-28 支付宝(杭州)信息技术有限公司 Training method and device for multi-task model
CN112732879B (en) * 2020-12-23 2022-05-10 重庆理工大学 Downstream task processing method and model of question-answering task
CN113704437B (en) * 2021-09-03 2023-08-11 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
CN105159890A (en) * 2014-06-06 2015-12-16 谷歌公司 Generating representations of input sequences using neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159890A (en) * 2014-06-06 2015-12-16 谷歌公司 Generating representations of input sequences using neural networks
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
End-To-End Memory Networks; Sainbayar Sukhbaatar et al.; arXiv:1503.08895v5; 2015-11-24; full text *
Hierarchical Memory Networks; Sarath Chandar et al.; arXiv:1605.07427v1; 2016-05-24; full text *

Also Published As

Publication number Publication date
CN106126596A (en) 2016-11-16

Similar Documents

Publication Publication Date Title
CN106126596B (en) A question answering method based on a hierarchical memory network
CN113544703B (en) Efficient off-policy credit allocation
CN107220296B (en) Method for generating question-answer knowledge base, method and equipment for training neural network
CN111897941B (en) Dialogue generation method, network training method, device, storage medium and equipment
CN105574098B (en) The generation method and device of knowledge mapping, entity control methods and device
CN110473592B (en) Multi-view human synthetic lethal gene prediction method
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
US10824946B2 (en) Training neural networks using posterior sharpening
CN112699960A (en) Semi-supervised classification method and equipment based on deep learning and storage medium
CN111581966A (en) Context feature fusion aspect level emotion classification method and device
Iqbal et al. Extracting and using building blocks of knowledge in learning classifier systems
Zhou et al. ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge
Li et al. Educational data mining for students' performance based on fuzzy C‐means clustering
CN110858805A (en) Method and device for predicting network traffic of cell
LU501881B1 (en) A METHOD AND SYSTEM FOR PREDICTING MIRNA DISEASE ASSOCIATIONS BASED ON HETEROGENOUS GRAPHS
Tam Cho et al. An optimization approach for making causal inferences
KR20220098698A (en) Learning content recommendation system that predicts the user's correct answer probability using collaborative filtering based on latent factors and operation method thereof
Klüver A mathematical theory of communication: Meaning, information, and topology
CN107092593B (en) Sentence semantic role recognition method and system for elementary mathematics hierarchical sampling application questions
CN110705279A (en) Vocabulary selection method and device and computer readable storage medium
CN115604131B (en) Link flow prediction method, system, electronic device and medium
Zhang et al. Design of online learning early warning model based on artificial intelligence
CN112417267A (en) User behavior analysis method and device, computer equipment and storage medium
Olabiyi et al. Adversarial bootstrapping for dialogue model training
Ziegler et al. Modelling word recognition and reading aloud

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant