Disclosure of Invention
In order to solve the above problems, it is an object of the present invention to provide an answer selection algorithm based on an enhanced question importance representation.
In order to achieve this object, the invention provides the following technical solution: an answer selection algorithm based on an enhanced question importance representation, comprising the steps of:
S1, coding the question and the answer through a BiLSTM coding layer;
S2, re-weighting the encoded question with a self-attention mechanism to regenerate the question vector and obtain a new question vector;
S3, aligning the question and the answer at the word level by using a word-level similarity matrix;
S4, capturing semantic information at multiple granularities, and comparing vectors of different granularities;
and S5, extracting the fused features through a multi-layer CNN to obtain the best answer.
Preferably, in step S1, the question is denoted Q and the answer is denoted A, and $H_q = \{h_{q1}, \ldots, h_{qm}\}$ and $H_a = \{h_{a1}, \ldots, h_{an}\}$ represent the question sentence vector and the answer sentence vector, where $h_{qi}$ is the embedding of the i-th word of sentence $H_q$, and m and n represent the lengths of the question and the answer, respectively.
The question and the answer capture sentence-context information through the BiLSTM coding layer. The hidden-layer dimension of the LSTM is u; the word embedding at time t is $x_t$, the hidden state and the memory cell at the previous time step are $h_{t-1}$ and $c_{t-1}$, and the hidden state $h_t$ and the memory cell $c_t$ at the current time step are calculated as follows:

$g_t = \phi(W_g x_t + V_g h_{t-1} + b_g)$,
$i_t = \sigma(W_i x_t + V_i h_{t-1} + b_i)$,
$f_t = \sigma(W_f x_t + V_f h_{t-1} + b_f)$,
$o_t = \sigma(W_o x_t + V_o h_{t-1} + b_o)$,
$c_t = g_t \odot i_t + c_{t-1} \odot f_t$,
$h_t = c_t \odot o_t$,

where the W, V, and b terms are trainable weight matrices and bias vectors, $\sigma$ and $\phi$ are the sigmoid function and the tanh function respectively, and $\odot$ denotes element-wise multiplication of two vectors. The input gate i, the forget gate f, and the output gate o automatically control the flow of information, the memory cell $c_t$ can remember long-distance information, and $h_t$ is the vector representation at time t.
Preferably, in step S2, the question sentence $T_q = \{t_{q1}, \ldots, t_{qm}\}$ and the answer sentence $T_a = \{t_{a1}, \ldots, t_{an}\}$ obtained in step S1 are processed: the weight of each word in the question is calculated as $\alpha_q = \mathrm{sigmoid}(v)$, and the question representation is updated with these weights to generate a new question vector representation.
Preferably, the word-level matrix is calculated as

$M(i,j) = U_q(i)\, T_a(j)^{\top}$,

where each row of the word-level matrix gives the influence of a word in the question on each word in the answer. The rows and the columns of the word-level matrix are normalized by a softmax function to obtain the mutual influence factors $\lambda_q(i,j)$ and $\lambda_a(i,j)$, whose values all lie in $[0,1]$; the question vector and the answer vector are multiplied by the corresponding influence factors to obtain two new vectors $E_q$ and $E_a$.
Preferably, in step S4, the original question vector is denoted Q and its representation after the attention alignment layer is denoted $Q'$; the original answer vector is denoted A and its representation after the attention alignment layer is denoted $A'$. Vector subtraction reflects the Euclidean distance between two vectors, while vector multiplication approximates their cosine distance; the original vectors are therefore compared with the aligned vectors by element-wise subtraction and multiplication, and the comparison results are fused into the contents $K_q$ and $K_a$.
Preferably, the calculation in step S5 is

$u = \mathrm{CNN}(\mathrm{Fuse})$,

where Fuse represents the fusion content $K_q$ or the fusion content $K_a$. The output u of the CNN is passed through maximum pooling and average pooling to obtain $S_{q,max}$, $S_{a,max}$, $S_{q,mean}$ and $S_{a,mean}$, which are then spliced into a vector S; a final prediction vector is derived from S by a multi-layer perceptron (MLP) and converted into a score vector, and training reduces the difference between the probability distribution of the predicted values and the probability distribution of the label values.
Compared with the prior art, the invention has the following beneficial effects: aiming at noise words in sentences, the answer selection algorithm based on an enhanced question importance representation uses a self-attention mechanism to reassign different weights to different words, thereby generating a "clean" question sentence vector, and uses a word-level interaction matrix to capture fine-grained semantic information between the question sentence and the answer sentence, thereby mitigating the influence of noise words in the answer sentence.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-2, the present invention provides a technical solution: an answer selection algorithm based on an enhanced question importance representation, comprising the steps of:
S1, coding the question and the answer through a BiLSTM coding layer. In step S1, the question is denoted Q and the answer is denoted A, and $H_q = \{h_{q1}, \ldots, h_{qm}\}$ and $H_a = \{h_{a1}, \ldots, h_{an}\}$ represent the question sentence vector and the answer sentence vector, where $h_{qi}$ is the embedding of the i-th word of sentence $H_q$, and m and n represent the lengths of the question and the answer, respectively.
The question and the answer capture sentence-context information through the BiLSTM coding layer. The hidden-layer dimension of the LSTM is u; the word embedding at time t is $x_t$, the hidden state and the memory cell at the previous time step are $h_{t-1}$ and $c_{t-1}$, and the hidden state $h_t$ and the memory cell $c_t$ at the current time step are calculated as follows:

$g_t = \phi(W_g x_t + V_g h_{t-1} + b_g)$,
$i_t = \sigma(W_i x_t + V_i h_{t-1} + b_i)$,
$f_t = \sigma(W_f x_t + V_f h_{t-1} + b_f)$,
$o_t = \sigma(W_o x_t + V_o h_{t-1} + b_o)$,
$c_t = g_t \odot i_t + c_{t-1} \odot f_t$,
$h_t = c_t \odot o_t$,

where the W, V, and b terms are trainable weight matrices and bias vectors, $\sigma$ and $\phi$ are the sigmoid function and the tanh function respectively, and $\odot$ denotes element-wise multiplication of two vectors. The input gate i, the forget gate f, and the output gate o automatically control the flow of information, the memory cell $c_t$ can remember long-distance information, and $h_t$ is the vector representation at time t.
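For illustration only, the following is a minimal PyTorch sketch of the step S1 BiLSTM encoding; the embedding dimension d, the hidden size u, and the random tensors standing in for real word embeddings are assumptions, not values fixed by the embodiment:

```python
# Minimal sketch of the S1 BiLSTM coding layer (PyTorch).
# d and u are assumed values; random tensors stand in for word embeddings.
import torch
import torch.nn as nn

d, u = 300, 150  # assumed embedding dimension and LSTM hidden size
bilstm = nn.LSTM(input_size=d, hidden_size=u,
                 batch_first=True, bidirectional=True)

H_q = torch.randn(2, 20, d)  # (batch, m, d) question word embeddings
H_a = torch.randn(2, 40, d)  # (batch, n, d) answer word embeddings

T_q, _ = bilstm(H_q)  # (batch, m, 2u) context-aware question vectors
T_a, _ = bilstm(H_a)  # (batch, n, 2u) context-aware answer vectors
```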
S2, the encoded question is re-weighted through a self-attention mechanism to regenerate the question vector and obtain a new question vector. In step S2, the question sentence $T_q = \{t_{q1}, \ldots, t_{qm}\}$ and the answer sentence $T_a = \{t_{a1}, \ldots, t_{an}\}$ obtained in step S1 are processed: the weight of each word in the question is calculated as $\alpha_q = \mathrm{sigmoid}(v)$, and the question representation is updated with these weights to generate a new question vector representation.
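Since the embodiment only states that $\alpha_q = \mathrm{sigmoid}(v)$, the derivation of the score v below through a learned linear layer over each word vector is an assumption made for illustration:

```python
# Sketch of the S2 self-attention re-weighting. The linear scorer that
# produces v is a hypothetical choice; the patent does not define v.
import torch
import torch.nn as nn

u = 150
score = nn.Linear(2 * u, 1)  # assumed scorer producing v per word

def reweight_question(T_q):
    # T_q: (batch, m, 2u) encoded question from step S1
    v = score(T_q).squeeze(-1)          # (batch, m) one score per word
    alpha_q = torch.sigmoid(v)          # per-word weight in [0, 1]
    U_q = T_q * alpha_q.unsqueeze(-1)   # "clean" re-weighted question
    return U_q
```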
S3, establishing a word-level similarity matrix for the question and the answer and aligning them at the word level. The word-level matrix is calculated as

$M(i,j) = U_q(i)\, T_a(j)^{\top}$,

where each row of the word-level matrix gives the influence of a word in the question on each word in the answer. The rows and the columns of the word-level matrix are normalized by a softmax function to obtain the mutual influence factors $\lambda_q(i,j)$ and $\lambda_a(i,j)$, whose values all lie in $[0,1]$; the question vector and the answer vector are multiplied by the corresponding influence factors to obtain two new vectors $E_q$ and $E_a$.
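A sketch of the step S3 alignment follows; reading "multiplying with the influence factors" as a weighted sum over the other sentence's word vectors, and the pairing of $\lambda_q$ and $\lambda_a$ with the two normalization directions, are assumptions:

```python
# Sketch of the S3 word-level similarity matrix and alignment. The
# interpretation of lambda_q / lambda_a as soft-alignment weights over
# the opposite sentence is an assumption.
import torch
import torch.nn.functional as F

def word_level_align(U_q, T_a):
    # U_q: (batch, m, 2u) re-weighted question; T_a: (batch, n, 2u) answer
    M = torch.bmm(U_q, T_a.transpose(1, 2))  # M(i, j) = U_q(i) . T_a(j)
    lam_q = F.softmax(M, dim=2)  # rows: each question word over answer words
    lam_a = F.softmax(M, dim=1)  # cols: each answer word over question words
    E_q = torch.bmm(lam_q, T_a)                  # (batch, m, 2u)
    E_a = torch.bmm(lam_a.transpose(1, 2), U_q)  # (batch, n, 2u)
    return E_q, E_a
```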
S4, capturing semantic information at multiple granularities, and fusing and comparing vectors of different granularities. The original question vector is denoted Q and its representation after the attention alignment layer is denoted $Q'$; the original answer vector is denoted A and its representation after the attention alignment layer is denoted $A'$. Vector subtraction reflects the Euclidean distance between two vectors, while vector multiplication approximates their cosine distance; the original vectors are therefore compared with the aligned vectors by element-wise subtraction and multiplication, and the comparison results are fused into the contents $K_q$ and $K_a$.
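The step S4 comparison and fusion can be sketched as below, under the assumption that the fusion content concatenates the original vector, the aligned vector, and their element-wise difference and product:

```python
# Sketch of the S4 multi-granularity comparison and fusion. The
# concatenation layout of K is an assumed fusion scheme.
import torch

def fuse(orig, aligned):
    # orig, aligned: (batch, len, 2u) original and attention-aligned vectors
    diff = orig - aligned  # subtraction: Euclidean-distance cue
    prod = orig * aligned  # multiplication: cosine-distance cue
    return torch.cat([orig, aligned, diff, prod], dim=-1)  # (batch, len, 8u)

# K_q = fuse(Q, Q_aligned); K_a = fuse(A, A_aligned)
```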
S5, extracting the fused features through a multi-layer CNN to obtain the best answer. The calculation is

$u = \mathrm{CNN}(\mathrm{Fuse})$,

where Fuse represents the fusion content $K_q$ or the fusion content $K_a$. The output u of the CNN is passed through maximum pooling and average pooling to obtain $S_{q,max}$, $S_{a,max}$, $S_{q,mean}$ and $S_{a,mean}$, which are then spliced into a vector S; a final prediction vector is derived from S by a multi-layer perceptron (MLP) and converted into a score vector, and training reduces the difference between the probability distribution of the predicted values and the probability distribution of the label values.
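Finally, a sketch of the step S5 feature extraction and scoring; the kernel sizes, channel counts, number of convolutional layers, and the two-class score vector are assumptions, as the embodiment does not fix them:

```python
# Sketch of the S5 multi-layer CNN, max/mean pooling, and MLP scorer.
# All layer hyperparameters are assumed values.
import torch
import torch.nn as nn

class Scorer(nn.Module):
    def __init__(self, in_dim, hid=128):
        super().__init__()
        self.cnn = nn.Sequential(  # "multiple layers of CNNs"
            nn.Conv1d(in_dim, hid, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hid, hid, kernel_size=3, padding=1), nn.ReLU())
        self.mlp = nn.Sequential(nn.Linear(4 * hid, hid), nn.ReLU(),
                                 nn.Linear(hid, 2))  # assumed 2-way score

    def forward(self, K_q, K_a):
        # K_q, K_a: (batch, len, in_dim); Conv1d wants (batch, channels, len)
        u_q = self.cnn(K_q.transpose(1, 2))
        u_a = self.cnn(K_a.transpose(1, 2))
        S = torch.cat([u_q.max(dim=2).values, u_a.max(dim=2).values,
                       u_q.mean(dim=2), u_a.mean(dim=2)], dim=-1)
        return self.mlp(S)

# Training would then minimize a divergence (e.g. cross-entropy) between
# the predicted distribution and the label distribution, per the text above.
```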
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made to the embodiments, and that equivalents may be substituted for some of their features, without departing from the spirit and scope of the invention.