Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a question-answering method and system fusing a convolutional neural network and a recurrent neural network, so as to overcome the following problems in the prior art: entity positioning relies on unavoidable character string comparison, which makes query efficiency low; and attribute inference requires enumerating the question patterns of every attribute of an entity and a large amount of manual sorting, which makes querying very cumbersome and easily leaves the enumeration incomplete.
In one aspect, a question-answering method fusing a convolutional neural network and a recurrent neural network is provided, the method including the following steps:
S1: acquiring an entity of a question and a query ID of the entity according to the question input by a user;
S2: simplifying the question into a question pattern according to the entity;
S3: collecting the related attributes of the entity according to the query ID, and generating a candidate attribute list;
S4: inputting the question pattern and the candidate attribute list into a pre-trained attribute discriminator to obtain the score of each candidate attribute against the question;
S5: judging the attribute relationship most similar to the question according to the scores, and taking out the attribute value of the most similar attribute relationship to complete the answer.
Further, the step S1 specifically includes:
according to a question input by a user, utilizing a pre-constructed dictionary tree to perform entity positioning on the question, and matching out an entity of the question and a query ID corresponding to the entity.
Further, the step S3 specifically includes:
querying and taking out, according to the query ID, all attribute relations connected with the entity in a pre-constructed knowledge graph, and generating a candidate attribute list.
Further, the step S4 specifically includes:
S4.1: performing semantic modeling on the question by using a recurrent neural network, and adding an attention mechanism according to each candidate attribute to obtain a semantic feature vector;
s4.2: performing word similarity modeling on the question by using a convolutional neural network to obtain word feature vectors;
s4.3: and combining the semantic feature vector and the word feature vector, and acquiring the score of each candidate attribute and question through a linear neural network.
Further, the step S4.1 specifically includes:
s4.1.1: segmenting the question and the candidate attribute, acquiring a word vector corresponding to the question and a relation vector corresponding to the candidate attribute according to a pre-constructed word vector library, and splicing the word vectors into a sentence vector;
S4.1.2: inputting the sentence vector into a recurrent neural network, and splicing the outputs into a new sentence containing forward and backward semantic context;
s4.1.3: multiplying the new sentence by an attention weight vector to obtain a new sentence representation for the candidate attribute;
s4.1.4: and acquiring the semantic feature vector of the candidate attribute according to the new sentence expression and the relation vector.
Further, the step S4.2 specifically includes:
S4.2.1: performing similarity calculation according to the word vectors corresponding to the question and the relation vectors corresponding to the candidate attribute to obtain a similarity matrix consisting of the pairwise similarity scores;
s4.2.2: and inputting the similarity matrix into a convolutional neural network, and obtaining word characteristic vectors through calculation.
In another aspect, a question-answering system fusing a convolutional neural network and a recurrent neural network is provided, the system including:
the entity acquisition module is used for acquiring an entity of a question and a query ID of the entity according to the question input by a user;
the question simplifying module is used for simplifying the question into a question mode according to the entity;
the attribute acquisition module is used for collecting the related attributes of the entity according to the query ID of the entity and generating a candidate attribute list;
the attribute inference module is used for inputting the question pattern and the candidate attribute list into a pre-trained attribute discriminator to obtain the score of each candidate attribute and the question;
and the answer feedback module is used for judging the attribute which is most similar to the question according to the score and completing an answer according to the most similar attribute.
Further, the entity obtaining module is specifically configured to:
according to a question input by a user, utilizing a pre-constructed dictionary tree to perform entity positioning on the question, and matching out an entity of the question and a query ID corresponding to the entity.
Further, the attribute obtaining module is specifically configured to:
querying and taking out, according to the query ID, all attribute relations connected with the entity in a pre-constructed knowledge graph, and generating a candidate attribute list.
Further, the attribute inference module comprises:
the semantic understanding unit is used for carrying out semantic modeling on the question by utilizing a recurrent neural network and adding an attention mechanism according to each candidate attribute to obtain a semantic feature vector;
the lexical similarity unit is used for performing word similarity modeling on the question by utilizing a convolutional neural network to obtain word feature vectors;
and the score output unit is used for combining the semantic feature vector and the word feature vector and acquiring the score of each candidate attribute and question through a linear neural network.
Further, the semantic understanding unit includes:
the vectorization subunit is used for segmenting the question and the candidate attributes, acquiring word vectors corresponding to the question and relation vectors corresponding to the candidate attributes according to a pre-constructed word vector library, and splicing the word vectors into sentence vectors;
the splicing subunit is used for inputting the sentence vector into a recurrent neural network and splicing the outputs into a new sentence containing forward and backward semantic context;
a fusion subunit, configured to multiply the new sentence by an attention weight vector to obtain a new sentence representation for the candidate attribute;
and the point multiplication subunit is used for acquiring the semantic feature vector of the candidate attribute according to the new sentence expression and the relation vector.
Further, the lexical similarity unit includes:
the similarity matrix subunit is used for performing similarity calculation according to the word vectors corresponding to the question and the relation vectors corresponding to the candidate attributes to obtain a similarity matrix consisting of the pairwise similarity scores;
and the calculating subunit is used for inputting the similarity matrix into the convolutional neural network and acquiring word characteristic vectors through calculation.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. According to the question-answering method and system fusing the convolutional neural network and the recurrent neural network provided by the embodiments of the present invention, rapid entity positioning is performed on the question by using the pre-constructed dictionary tree, and the entity of the question and the query ID corresponding to the entity are matched out, so that a high-concurrency commercial system can be served, meaningless character string comparison is reduced, and the query efficiency is improved;
2. According to the question-answering method and system fusing the convolutional neural network and the recurrent neural network, attribute inference is carried out by a neural-network-based method; the vectorized numerical calculation requires no manual sorting of templates, which effectively avoids the cumbersome operation and incompleteness of manual sorting.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that a preset precondition of the embodiments of the present invention is that the knowledge graph has been constructed. The knowledge graph used for the description in the embodiments of the present invention is the open-source Freebase knowledge base, but the knowledge graph used in the present invention is not limited to Freebase; other knowledge bases such as YAGO, NELL, DBpedia and Zhishi.me may also be used.
Fig. 1 is a flowchart illustrating a question-answering method fusing a convolutional neural network and a recurrent neural network according to an exemplary embodiment, and referring to fig. 1, the method includes the following steps:
s1: and acquiring an entity of the question and the query ID of the entity according to the question input by the user.
Further, in the embodiment of the present invention, a specific implementation manner of entity positioning is: according to a question input by a user, utilizing a pre-constructed dictionary tree to perform entity positioning on the question, and matching out an entity of the question and a query ID corresponding to the entity.
Specifically, in embodiments of the present invention, a general-purpose knowledge graph (e.g., the Freebase knowledge base) is assumed to have been established. A dictionary tree is first constructed from all entity names in the knowledge graph. For example, the entity set may be [Shanghai City, Shanghai City People's Court, Shanghai Jiao Tong University, Shanghai Tower, Shanghai World Financial Center, World Financial Center, Jinmao Tower], where "World Financial Center" is a nickname of the Shanghai World Financial Center; common nicknames can be added manually. The steps of constructing the dictionary tree are as follows:
1. constructing a root node Root;
2. traversing the entity name list, matching the prefix of the current entity name already present in the tree character by character, and adding suffix nodes for the remaining characters;
3. adding the query ID corresponding to the entity as a leaf node.
fig. 2 is a schematic diagram of a dictionary tree constructed from the entity set according to an exemplary embodiment. Referring to fig. 2, online entity positioning means querying the constructed dictionary tree, through which several leaf nodes can be reached. For example, for the question "Where is the Shanghai City People's Court?", two leaf nodes, "Shanghai City" and "Shanghai City People's Court", can be found; according to the rule of the embodiment of the present invention, entity positioning returns only the longest match, "Shanghai City People's Court".
It should be noted that a dictionary tree, i.e., a trie, also called a prefix tree or key tree, is a variant of the hash tree. A typical application is counting and sorting large numbers of character strings (though not only strings), so it is often used by search-engine systems for text word-frequency statistics. The dictionary tree has three basic properties: 1. the root node contains no character; 2. the characters on the path from the root node to a given node, concatenated in order, form the character string corresponding to that node; 3. the child nodes of each node all contain different characters. The core idea of the dictionary tree is to trade space for time: the common prefixes of character strings are used to reduce query time and thereby improve efficiency. By exploiting these characteristics, the embodiment of the present invention can perform rapid entity positioning on the question, serve high-concurrency commercial systems, reduce meaningless character string comparison, and improve query efficiency.
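As an illustration, the trie construction steps and the longest-match rule described above can be sketched in Python as follows. The entity names and query IDs ("Q1", "Q2") are illustrative placeholders, not actual knowledge-graph contents:

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.query_id = None  # set where an entity name ends (the leaf "query ID" node)

class EntityTrie:
    def __init__(self):
        self.root = TrieNode()  # step 1: root node, contains no character

    def insert(self, name, query_id):
        # step 2: walk the existing prefix, adding suffix nodes as needed
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        # step 3: attach the query ID where the entity name ends
        node.query_id = query_id

    def longest_match(self, question):
        """Scan the question; return (entity, query_id) of the longest entity found."""
        best = (None, None)
        for start in range(len(question)):
            node, match = self.root, None
            for i in range(start, len(question)):
                node = node.children.get(question[i])
                if node is None:
                    break
                if node.query_id is not None:
                    match = (question[start:i + 1], node.query_id)
            if match and (best[0] is None or len(match[0]) > len(best[0])):
                best = match
        return best

trie = EntityTrie()
trie.insert("Shanghai City", "Q1")
trie.insert("Shanghai City People's Court", "Q2")
entity, qid = trie.longest_match("Where is the Shanghai City People's Court?")
# both entities are reachable, but only the longest match is returned
```

Only the shared prefix "Shanghai City" is stored once, which is where the trie's space-for-time trade-off comes from.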
S2: and simplifying the question into a question mode according to the entity.
Specifically, in the embodiment of the present invention, before attribute inference is performed on a question, the question needs to be simplified into a question pattern, that is, the name of the involved entity is replaced with a specific character. For example, the question "Where is the Shanghai City People's Court?" can be simplified to "Where is <s>?", in which "Shanghai City People's Court" is replaced by "<s>".
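A minimal sketch of this simplification step, assuming entity positioning has already returned the entity name (plain string replacement is used here for brevity; a real system would replace only the matched span):

```python
def to_question_pattern(question, entity, placeholder="<s>"):
    # Replace the located entity name with a placeholder token
    return question.replace(entity, placeholder)

pattern = to_question_pattern("Where is the Shanghai City People's Court?",
                              "Shanghai City People's Court")
# pattern == "Where is the <s>?"
```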
S3: and collecting the related attributes of the entity according to the query ID to generate a candidate attribute list.
Further, according to the query ID, querying and taking out all attribute relations connected with the entity from a pre-constructed knowledge graph, and generating a candidate attribute list.
Specifically, after entity positioning is completed, the query ID corresponding to the entity is obtained at the same time, and the system queries the knowledge graph according to the query ID for all possible candidate attribute relations, such as [organization.location, organization.telephone, organization.brief_introduction, …], and takes out all candidate attribute relations to generate a candidate attribute list.
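The following sketch illustrates this step with a toy in-memory stand-in for the knowledge graph; the query ID, relation names, and values are hypothetical, and a real embodiment would query a knowledge base such as Freebase instead:

```python
# Toy stand-in for the knowledge graph: query ID -> {attribute relation: attribute value}.
# All identifiers and values here are illustrative placeholders.
knowledge_graph = {
    "Q2": {
        "organization.location": "(an address)",
        "organization.telephone": "(a phone number)",
        "organization.brief_introduction": "(a short description)",
    }
}

def candidate_attributes(query_id, kg=knowledge_graph):
    """Query and take out all attribute relations connected to the entity."""
    return sorted(kg.get(query_id, {}))

attrs = candidate_attributes("Q2")
```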
S4: and inputting the question pattern and the candidate attribute list into a pre-trained attribute discriminator to obtain the score of each candidate attribute and question.
Specifically, the question pattern and the candidate attribute list obtained in the above steps are input into a pre-trained attribute inference discriminator to obtain the score of each candidate attribute against the question. It should be noted that training the attribute inference discriminator requires a labeled dataset of questions and attribute relationships; the data format of a label can be expressed as (subject, relationship, object, query), which in knowledge question answering corresponds to (entity, attribute relationship, attribute value, question). The embodiment of the present invention uses the open-source SimpleQuestions dataset for illustration, but the present invention is not limited to this dataset; any labeled dataset meeting the format requirement can be used to train the model. Since the labeled dataset contains only positive examples and no negative examples, the data also needs to be augmented when training the attribute inference discriminator.
For each item in the labeled dataset, all attribute relationships of the entity are extracted from the knowledge graph as a candidate attribute list. The attribute relationship in the embodiment of the present invention is composed of two parts: the front part is a descriptor of the entity type, and the rear part is a descriptor of the attribute relationship. The words of the positive-sample attribute relationship are processed accordingly, and corresponding data are taken out of the knowledge graph and assembled into negative samples of the same format. One assembled data item can be represented as $(q, r^{+}, r^{-}_{1}, \ldots, r^{-}_{p})$, in which $r^{+}$ denotes the labeled positive sample in the dataset and $r^{-}_{i}$ denotes a negative sample complemented from the data.
In addition, in order to ensure the accuracy of the attribute inference discriminator, the embodiment of the present invention trains the parameters of the whole network by minimizing a max-margin loss function, in which the empirical constant of the loss function is set to 1.0. Briefly, the max-margin method subtracts the score of the positive sample $S(q, r^{+})$ from the score of the negative sample $S(q, r^{-}_{i})$ and adds an empirical constant $\gamma$; if the result is greater than 0 it is kept, otherwise 0 is taken. The labeled dataset used in this description has one positive sample and several negative samples per question, so positive and negative samples are paired one by one, and the resulting terms are summed and averaged. Formulated as:
$$L = \frac{1}{p} \sum_{i=1}^{p} \max\left(0,\; \gamma + S(q, r^{-}_{i}) - S(q, r^{+})\right)$$
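This max-margin loss can be sketched as follows; `s_pos` and `s_negs` stand for the discriminator scores of the positive and negative attribute relations, and `gamma` is the empirical constant (1.0 in the embodiment). The score values below are illustrative:

```python
def max_margin_loss(s_pos, s_negs, gamma=1.0):
    # For each negative sample: max(0, gamma + S(q, r_neg) - S(q, r_pos)); then average.
    terms = [max(0.0, gamma + s_neg - s_pos) for s_neg in s_negs]
    return sum(terms) / len(terms)

loss = max_margin_loss(s_pos=0.9, s_negs=[0.2, 0.8, 1.1])
# per-pair terms 0.3, 0.9, 1.2; averaged loss ≈ 0.8
```

Minimizing this loss pushes the positive attribute's score above every negative attribute's score by at least the margin gamma.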
S5: judging the attribute relationship most similar to the question according to the scores, and taking out the attribute value of the most similar attribute relationship to complete the answer.
Specifically, the attribute relationship most similar to the question is judged according to the score to serve as the most accurate attribute relationship, and the attribute value of the most similar attribute relationship is taken out to serve as an answer to be fed back to the user, so that the answer is completed.
Fig. 3 is a flowchart illustrating inputting the question pattern and the candidate attribute list into a pre-trained attribute discriminator to obtain a score of each candidate attribute and question according to an exemplary embodiment, and referring to fig. 3, the flowchart includes the following steps:
s4.1: and performing semantic modeling on the question by using a recurrent neural network, and adding an attention mechanism according to each candidate attribute to obtain a semantic feature vector.
The specific process is as follows:
s4.1.1: and segmenting the question and the candidate attribute, acquiring a word vector corresponding to the question and a relation vector corresponding to the candidate attribute according to a pre-constructed word vector library, and splicing the word vectors into a sentence vector.
Specifically, words or sentences usually need to be converted into computable vectors so that the computational advantages of a computer can be used to perform numerical calculation on sentence vectors or word vectors. In the embodiment of the present invention, the sentence vector is simply formed by splicing word vectors. For example, a word vector matrix is initialized by random assignment, and the one-hot vectors of the input sentence are passed through a lookup table to obtain word vectors, which are then spliced into a sentence vector. It should be noted that, in the embodiment of the present invention, all word vectors in the vectorization layer are initialized with pre-trained 300-dimensional GloVe vectors, while attribute relation words use randomly initialized 150-dimensional vectors.
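A minimal sketch of this vectorization (lookup-table) step, using a toy vocabulary and a small random embedding table in place of the pre-trained 300-dimensional GloVe vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy vocabulary and dimension for illustration only; the embodiment uses
# 300-d GloVe vectors for question words and 150-d random vectors for relation words.
vocab = {"where": 0, "is": 1, "<s>": 2}
dim = 4
embedding_table = rng.standard_normal((len(vocab), dim))

def sentence_vector(tokens):
    # Look up each token's word vector and splice them into a sentence matrix
    return np.stack([embedding_table[vocab[t]] for t in tokens])

sent = sentence_vector(["where", "is", "<s>"])
```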
S4.1.2: and inputting the sentence vectors into a cyclic neural network, and splicing the output into a new sentence containing the semantic relation before and after.
Specifically, in the embodiment of the invention, the sentence vector is input into a bidirectional recurrent neural network, and the hidden-layer output $h_t$ at each time step (including the forward and backward directions, i.e., $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$) is spliced to form a new sentence representation $H_{1:L} = [h_1; \ldots; h_L]$ containing forward and backward semantic context. In the embodiment of the present invention, the hidden-layer size of the bidirectional recurrent neural network is set to 200.
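The bidirectional pass can be illustrated with a vanilla recurrent cell in NumPy. This is a simplified stand-in (the embodiment does not specify the cell type), and the dimensions below are toy values rather than the hidden size of 200:

```python
import numpy as np

def rnn_pass(X, Wx, Wh, reverse=False):
    """One direction of a vanilla RNN over a sentence matrix X (L x d_in)."""
    L = X.shape[0]
    h = np.zeros(Wh.shape[0])
    outs = []
    steps = range(L - 1, -1, -1) if reverse else range(L)
    for t in steps:
        h = np.tanh(X[t] @ Wx + h @ Wh)
        outs.append(h)
    if reverse:
        outs.reverse()  # realign backward-direction outputs with time order
    return np.stack(outs)

rng = np.random.default_rng(1)
L, d_in, d_h = 5, 8, 6  # toy sentence length, input dim, hidden size
X = rng.standard_normal((L, d_in))
Wx_f, Wh_f = rng.standard_normal((d_in, d_h)), rng.standard_normal((d_h, d_h))
Wx_b, Wh_b = rng.standard_normal((d_in, d_h)), rng.standard_normal((d_h, d_h))

# Splice forward and backward hidden states at each time step:
# H[t] = [h_t_forward ; h_t_backward], giving the new sentence H_{1:L}
H = np.concatenate([rnn_pass(X, Wx_f, Wh_f),
                    rnn_pass(X, Wx_b, Wh_b, reverse=True)], axis=1)
```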
S4.1.3: multiplying the new sentence by an attention weight vector to obtain a new sentence representation for the candidate attribute.
Specifically, the new sentence is multiplied by an attention weight vector to obtain a new sentence representation $p_i = \sum_{j} a_{ij} h_j$ for the candidate attribute $r_i$, where $a_{ij}$ denotes the attention weight between the candidate attribute $r_i$ and each word $w_j$ in the sentence. It should be noted that the weight matrix in the embodiment of the present invention is trained in advance. The attention weight is the correlation coefficient between the word at each position of the input question and the attribute relation word; the correlation is measured by the inner product of the word vectors and then normalized by softmax. The attention weight vector is the vector of weights at the respective positions in the sentence, $a_i = [a_{i1}, a_{i2}, \ldots, a_{iL}]$.
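The attention computation described above (inner product with the relation vector, normalized by softmax) can be sketched as follows, with random toy vectors standing in for the actual word and relation embeddings:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def attention_weights(word_vecs, relation_vec):
    # Correlation of each question word with the relation word, measured by
    # inner product and normalized by softmax
    return softmax(word_vecs @ relation_vec)

rng = np.random.default_rng(2)
L, d = 5, 6
word_vecs = rng.standard_normal((L, d))  # one row per question word
relation_vec = rng.standard_normal(d)    # candidate attribute r_i
a_i = attention_weights(word_vecs, relation_vec)

# Weighted sentence representation p_i = sum_j a_ij * h_j for this candidate
H = rng.standard_normal((L, d))          # stand-in for the BiRNN output
p_i = a_i @ H
```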
S4.1.4: and acquiring the semantic feature vector of the candidate attribute according to the new sentence expression and the relation vector.
Specifically, the new sentence representation $p_i$ is finally dot-multiplied with the relation vector to obtain the semantic feature vector associated with the candidate attribute.
S4.2: and performing word similarity modeling on the question by using a convolutional neural network to obtain word feature vectors.
The specific process is as follows:
s4.2.1: and performing similarity calculation according to the word vector corresponding to the question and the relation vector corresponding to the candidate attribute to obtain a similarity matrix formed by two similarity scores.
Specifically, the question and the candidate attribute are segmented at word granularity, the word vectors corresponding to the question and the relation vectors corresponding to the candidate attribute are obtained from the pre-constructed word vector library, and cosine similarity scores are then computed between them to form a similarity matrix.
S4.2.2: and inputting the similarity matrix into a convolutional neural network, and obtaining word characteristic vectors through calculation.
Specifically, the similarity matrix $M_{ij}$ is input into the convolutional neural network and multiplied element-wise with a convolution kernel of a specific size to obtain a feature matrix; maximum pooling is then performed along the horizontal and vertical directions to obtain $c_1$ and $c_2$, and lexical feature vectors $z_1$ and $z_2$ are obtained respectively through a linear network layer.
It should be noted here that, in the embodiment of the present invention, the convolution kernel in the convolutional neural network is set to 3 × 3, and the number of channels is set to 4.
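A sketch of this lexical branch, with a single-channel "valid" convolution standing in for the 4-channel 3 × 3 convolution of the embodiment and random toy vectors in place of real word vectors:

```python
import numpy as np

def cosine_sim_matrix(Q, R):
    # M[i, j] = cosine similarity between question word i and relation word j
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Rn = R / np.linalg.norm(R, axis=1, keepdims=True)
    return Qn @ Rn.T

def conv2d_valid(M, K):
    """Minimal single-channel 'valid' 2-D convolution over the similarity matrix."""
    m, n = M.shape
    k = K.shape[0]
    out = np.empty((m - k + 1, n - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(M[i:i + k, j:j + k] * K)
    return out

rng = np.random.default_rng(3)
Q = rng.standard_normal((6, 8))  # toy question word vectors
R = rng.standard_normal((4, 8))  # toy relation word vectors
M = cosine_sim_matrix(Q, R)

F = conv2d_valid(M, rng.standard_normal((3, 3)))  # feature matrix
c1 = F.max(axis=0)  # max pooling along one direction
c2 = F.max(axis=1)  # max pooling along the other direction
```

A linear layer applied to `c1` and `c2` would then yield the lexical feature vectors z1 and z2.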
S4.3: and combining the semantic feature vector and the word feature vector, and acquiring the score of each candidate attribute and question through a linear neural network.
Specifically, the semantic feature vectors and the word feature vectors obtained in the previous steps are spliced to obtain the input data of the output layer, $Z = [z_1; \ldots; z_n]$. Then, according to the formula $S(p, r) = \mathrm{sigmoid}(W^{T} Z + b)$, the score of the question against each candidate attribute is obtained.
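The output layer can be sketched directly from this formula; the feature dimensions and weights below are random toy values, not trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(feature_vecs, W, b):
    # Splice the semantic and lexical feature vectors into Z = [z1; ...; zn],
    # then apply the linear output layer: S(p, r) = sigmoid(W^T Z + b)
    Z = np.concatenate(feature_vecs)
    return sigmoid(W @ Z + b)

rng = np.random.default_rng(4)
z_sem = rng.standard_normal(6)                           # semantic feature vector
z1, z2 = rng.standard_normal(4), rng.standard_normal(4)  # lexical feature vectors
W = rng.standard_normal(14)
s = score([z_sem, z1, z2], W, b=0.1)
```

The candidate attribute with the highest score S is then taken as the most similar attribute relationship in step S5.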
Fig. 4 is a schematic structural diagram illustrating a question-answering system fusing a convolutional neural network and a recurrent neural network according to an exemplary embodiment, and referring to fig. 4, the system includes:
the entity acquisition module is used for acquiring an entity of a question and a query ID of the entity according to the question input by a user;
in the embodiment of the present invention, the specific process of the entity obtaining module for entity positioning of the question sentence is as follows:
according to a question input by a user, utilizing a pre-constructed dictionary tree to perform entity positioning on the question, and matching out an entity of the question and a query ID corresponding to the entity.
The question simplifying module is used for simplifying the question into a question mode according to the entity;
the attribute acquisition module is used for collecting the related attributes of the entity according to the query ID of the entity and generating a candidate attribute list;
in the embodiment of the present invention, the specific process of the attribute obtaining module obtaining the related attribute relationship of the question is as follows:
and inquiring and taking out all attribute relations connected with the entity in a pre-constructed knowledge graph according to the inquiry ID to generate a candidate attribute list.
The attribute inference module is used for inputting the question pattern and the candidate attribute list into a pre-trained attribute discriminator to obtain the score of each candidate attribute and the question;
and the answer feedback module is used for judging the attribute which is most similar to the question according to the score and completing an answer according to the most similar attribute.
Further, the attribute inference module comprises:
the semantic understanding unit is used for carrying out semantic modeling on the question by utilizing a recurrent neural network and adding an attention mechanism according to each candidate attribute to obtain a semantic feature vector;
specifically, in the embodiment of the present invention, the semantic understanding unit is embodied as a recurrent neural network structure with attention mechanism.
And the lexical similarity unit is used for performing word similarity modeling on the question by utilizing a convolutional neural network to obtain word feature vectors.
And the score output unit is used for combining the semantic feature vector and the word feature vector and acquiring the score of each candidate attribute and question through a linear neural network.
Specifically, in the embodiment of the present invention, the score output unit is preferably a BP (back-propagation) neural network, that is, its neurons are perceptrons with a sigmoid activation function.
Further, the semantic understanding unit includes:
the vectorization subunit is used for segmenting the question and the candidate attributes, acquiring word vectors corresponding to the question and relation vectors corresponding to the candidate attributes according to a pre-constructed word vector library, and splicing the word vectors into sentence vectors;
the splicing subunit is used for inputting the sentence vector into a recurrent neural network and splicing the outputs into a new sentence containing forward and backward semantic context;
a fusion subunit, configured to multiply the new sentence by an attention weight vector to obtain a new sentence representation for the candidate attribute;
and the point multiplication subunit is used for acquiring the semantic feature vector of the candidate attribute according to the new sentence expression and the relation vector.
Further, the lexical similarity unit includes:
the similarity matrix subunit is used for performing similarity calculation according to the word vectors corresponding to the question and the relation vectors corresponding to the candidate attributes to obtain a similarity matrix consisting of the pairwise similarity scores;
and the calculating subunit is used for inputting the similarity matrix into the convolutional neural network and acquiring word characteristic vectors through calculation.
Fig. 5 is a schematic structural diagram of attribute inference according to an exemplary embodiment, and referring to fig. 5, the specific process is as follows:
Suppose the question entered by the user is "Who is the director of xx?". First, the entity acquisition module performs entity positioning on the question and locates the entity "xx", then obtains the query ID of "xx"; the attribute acquisition module collects the related attributes of the question according to the query ID of "xx" and generates a candidate attribute list. Then, the question simplification module replaces "xx" with a specific character, e.g., <s>, simplifying "Who is the director of xx?" into the question pattern "Who is the director of <s>?".
On the one hand, the question pattern "Who is the director of <s>?" and the candidate attribute list are input into the attribute inference module, and the score of each candidate attribute against the question is obtained. The specific process is as follows: first, the question pattern "Who is the director of <s>?" and the candidate attribute list are input into the vectorization subunit, which vectorizes them respectively, obtains the word vectors corresponding to the question and the relation vectors corresponding to the candidate attributes, and splices the word vectors into a sentence vector. Then, the sentence vector is input into the recurrent neural network of the splicing subunit, and the outputs are spliced into a new sentence containing forward and backward semantic context. The fusion subunit then multiplies the new sentence by the attention weight vector to obtain a new sentence representation for the candidate attribute. Finally, the dot-multiplication subunit dot-multiplies the new sentence representation with the relation vector to acquire the semantic feature vector of the candidate attribute.
On the other hand, the similarity matrix subunit performs similarity calculation according to the word vectors corresponding to the question and the relation vectors corresponding to the candidate attributes to obtain a similarity matrix consisting of the pairwise similarity scores. The calculation subunit then inputs the similarity matrix into the convolutional neural network, obtains a feature matrix by element-wise multiplication with a convolution kernel of a specific size, performs maximum pooling along the horizontal and vertical directions, and then obtains word feature vectors respectively through a linear network layer.
And finally, inputting the semantic feature vectors and the word feature vectors into a score output unit, and acquiring the score of each candidate attribute and question through a linear neural network.
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.
In summary, the technical solution provided by the embodiment of the present invention has the following beneficial effects:
1. According to the question-answering method and system fusing the convolutional neural network and the recurrent neural network provided by the embodiments of the present invention, rapid entity positioning is performed on the question by using the pre-constructed dictionary tree, and the entity of the question and the query ID corresponding to the entity are matched out, so that a high-concurrency commercial system can be served, meaningless character string comparison is reduced, and the query efficiency is improved;
2. According to the question-answering method and system fusing the convolutional neural network and the recurrent neural network, attribute inference is carried out by a neural-network-based method; the vectorized numerical calculation requires no manual sorting of templates, which effectively avoids the cumbersome operation and incompleteness of manual sorting.
It should be noted that, when the question-answering system fusing the convolutional neural network and the recurrent neural network triggers the question-answering service in the above embodiment, the division into the above functional modules is merely illustrative; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above. In addition, the question-answering system and the question-answering method fusing the convolutional neural network and the recurrent neural network provided by the above embodiments belong to the same concept; the specific implementation process of the system is detailed in the method embodiments and is not repeated here. The question-answering system can be implemented on the basis of the question-answering method.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.