CN114896407B - Question-answering method based on combination of semantic analysis and vector modeling - Google Patents
Question-answering method based on combination of semantic analysis and vector modeling
- Publication number
- CN114896407B (application CN202210275679.XA)
- Authority
- CN
- China
- Prior art keywords
- question
- entity
- vector
- intention
- answer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a question-answering method based on the combination of semantic analysis and vector modeling. The method completes named entity recognition of a question, completes slot filling, completes intention recognition of the question, and confirms the recognized intention; according to the entity and intention of the question, answer querying is completed through a knowledge-graph query language. If the recognized intention is not within the preset intention range, a plurality of triples associated with the entity recognized in step 2 are queried to complete subgraph recall for the entity; the question and the paths of the subgraphs are respectively encoded and ranked, the ranked scores are compared, and the highest-scoring path is returned as the answer to complete the answer query. By understanding the input question, performing recall and ranking in combination with a knowledge graph, and returning a high-accuracy answer, the method can answer questions posed by users in natural language with accurate and concise natural language.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a question-answering method based on combination of semantic analysis and vector modeling.
Background
With the rapid development of information technology, research in artificial intelligence has advanced steadily; researchers have begun studying intelligent question-answering systems, and various data sets have appeared. The rapid development of knowledge graphs has provided a new knowledge source for intelligent question answering. A knowledge graph can be regarded as a semantic network: its nodes represent entities or concepts of related knowledge, and its directed edges represent the relations among entities. A knowledge graph can visualize the relations among entities and represent the relations among data in a more intuitive way that accords better with people's cognitive habits, so knowledge-graph-based intelligent question answering has begun to attract the attention of many researchers.
A knowledge-graph-based question-answering system obtains answers by semantically understanding and parsing questions, then querying and reasoning over the knowledge base; the answer must be found from the data of the knowledge graph, so the accuracy is high, and the evaluation standard consists of three values: recall rate, accuracy rate, and F1 value. Related technical approaches to knowledge-graph question answering include semantic parsing, information retrieval, and vector modeling. Question-answering algorithms based on semantic parsing convert natural language into a series of formal logic forms, parse these logic forms bottom-up to obtain a logic form expressing the semantics of the whole question, and obtain answers by querying the knowledge graph with the corresponding query statements; however, the accuracy of the returned answers is not high when this method is used alone. The vector-modeling approach treats knowledge-base question answering as a semantic matching process: through representation learning, the knowledge graph and the user question are mapped to numerical vectors in a low-dimensional space, and the answer most similar in semantics to the user question is then matched directly by numerical calculation. That is, the question-answering task can be regarded as computing the similarity between the semantic vector of the question and the semantic vectors of the entities and edges in the knowledge base.
Most existing question-answering methods focus on simple questions, i.e., questions involving only one entity and one relation; the common solution is to map the question to a triple query over the knowledge graph to obtain the answer. However, common KBQA methods do not work well for complex questions involving multiple entities and relations.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a question-answering method based on the combination of semantic analysis and vector modeling, which understands the input question, performs recall and ranking in combination with a knowledge graph, and finally returns a high-accuracy answer, so that questions posed by users in natural language can be answered with accurate and concise natural language.
In order to solve the technical problems, the invention adopts the following technical scheme:
a question-answering method based on combination of semantic analysis and vector modeling comprises the following steps:
Step 1, preprocess the question input by the user to obtain high-quality word vectors, obtain the optimal prediction sequence, and complete named entity recognition of the question to obtain an entity;
Step 2, if the entity identified in step 1 is not a standard name, link the entity to the unique entity in the knowledge graph and re-acquire the entity; if the entity identified in step 1 is a standard name, use that entity;
Step 3, take the unique entity obtained in step 2 as a slot; if the slot of this turn's question is identified as empty, load the context of the user dialogue and inherit the slot saved from the previous turn of dialogue to complete slot filling;
Step 4, complete intention recognition of the question;
Step 5, if the intention recognition is ambiguous, use a pre-designed semantic slot template to reply with a clarification and confirm the intention; jump to step 6 if the intention is correct, and to step 7 if it is not;
Step 6, according to the entity and intention of the question, complete the answer query through the knowledge-graph query language;
Step 7, if the recognized intention is not within the preset intention range, query a plurality of triples associated with the entity identified in step 2 to complete subgraph recall for the entity;
Step 8, encode and rank the question and the paths of the subgraphs from step 7 respectively, compare the ranked scores, and return the highest-scoring path as the answer to complete the answer query.
Further, step 1 includes: a BERT-BiLSTM-CRF model is built; the BERT module preprocesses the question: the question is input into the BERT model and word vectors are obtained through a bidirectional Transformer structure; the BiLSTM module computes hidden information of the input through a bidirectional LSTM; and the CRF module decodes the output of the BiLSTM module, solves the optimal path, and obtains the optimal prediction sequence.
Further, the step of the BERT-BiLSTM-CRF model for completing the named entity recognition of the question specifically comprises the following steps:
Step 101, a question is input to the BERT module, and a plurality of word vectors are acquired through the BERT module, expressed respectively as $w_1, w_2, w_3, \ldots, w_n$, each word vector having a corresponding label;
Step 102, in the BiLSTM module, the embedding of each word vector is input, and the bidirectional LSTM model extracts the semantic expression vector of each word vector over its context;
Step 103, decoding semantic vectors of the word vectors by using softmax;
Step 104, the output of the BiLSTM module is decoded through the CRF module. Given a labeling sequence L and a question W, where the question W is given by the user and the labeling sequence L is the output for the given W, the given question W in the CRF module contains n word vectors; the labeling sequence is scored according to a set of feature functions to obtain the transition score, and the optimal prediction sequence is obtained as the labeling sequence with the maximum probability value. The transition score is calculated as:

$$\mathrm{score}(L\mid W)=\sum_{i=1}^{n}\sum_{k=1}^{K}\lambda_k f_k(l_{i-1},l_i,W,i)$$

where score(L|W) is the transition score, $f_k$ denotes a feature function, each feature function is given a weight $\lambda_k$, n is the length of the labeling sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the labeling sequence or question;
The probability value of the labeling sequence is specifically calculated as:

$$p(L\mid W)=\frac{\exp\bigl(\mathrm{score}(L\mid W)\bigr)}{\sum_{L'}\exp\bigl(\mathrm{score}(L'\mid W)\bigr)}$$

where p(L|W) is the probability value of the labeling sequence, W is the question, L is the labeling sequence, and score(L|W) is the transition score.
Further, step 2 includes: to link the entity identified in step 1 to the related entity in the knowledge graph, vector representations of the question and of the corresponding candidate relations or attributes are obtained through a bidirectional LSTM model, and the entity linked into the knowledge graph is determined from the semantic similarity. Let the entity candidate word extracted from the question be $W=(w_1,w_2,w_3,\ldots,w_N)$ and the entity in the knowledge graph be $Z=(z_1,z_2,z_3,\ldots,z_N)$, where N is the dimension of the vectors corresponding to entity W and entity Z; the semantic similarity is specifically calculated as:

$$\cos(W,Z)=\frac{\sum_{j=1}^{N} w_j z_j}{\sqrt{\sum_{j=1}^{N} w_j^{2}}\,\sqrt{\sum_{j=1}^{N} z_j^{2}}}$$

where cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
Further, step 4 includes: the question intention is identified using a BERT-TextCNN model. The original input is given token embedding, segment embedding and position embedding representations and input to the BERT module to generate a word-vector matrix; a convolution operation is then performed by the convolution layer of the TextCNN module to generate feature maps; a max-pooling operation is performed by the pooling layer of TextCNN; and the final fully connected layer outputs the intention classification result using a softmax activation function, with values between 0 and 1, the maximum being taken as the final intention;
The convolution operation on the TextCNN convolution layer generates the feature map; the obtained feature is:

$$c_p = f\bigl(w \cdot x_{p:p+h-1} + b\bigr)$$

where c denotes the feature extraction vector whose p-th component $c_p$ is produced at the p-th word in the question, $x_{p:p+h-1}$ denotes the window of word vectors from the p-th to the (p+h-1)-th word, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, b is a bias term, and f is the activation function;
The output of the fully connected layer is:

$$y=\operatorname{softmax}\bigl(w_{dense}\cdot(z\circ r)+b_{dense}\bigr)$$

where y is the classification result, and $w_{dense}$ and $b_{dense}$ are the weight and bias of the fully connected layer, respectively.
Further, the final result value obtained in step 4 may be small. When performing the reply processing, a comparison is made on the final result: if the result is between 0.4 and 0.8, the intention is ambiguous, clarification processing is required, and the method jumps to step 6; if the result is less than 0.4, the recognized result is too low, and the method jumps to step 7 to ensure accuracy.
Further, according to the final result value of step 4, if the final result is above 0.8, the semantic understanding is already complete and the accuracy is high, and the knowledge-graph query language Cypher can be used to directly return the answer.
Further, the question input by the user is reduced from high dimensionality to low dimensionality, and the question and the answers are mapped into a low-dimensional space to obtain distributed representations; the distributed representations are trained with a data set, and the similarity between question and answer is computed from the Manhattan distance so that the similarity between the question and its answer is as high as possible; finally, the answer with the highest score is returned according to the vector representations in the candidate answer set and the representation of the question input by the user;
Let the question be $W=(w_1,w_2,\ldots,w_n)$ and the answer be $B=(b_1,b_2,\ldots,b_n)$; the similarity between question and answer is calculated from the Manhattan distance as:

$$\mathrm{dist}_{man}(W,B)=\sum_{q=1}^{n}\lvert w_q-b_q\rvert$$

where $\mathrm{dist}_{man}(W,B)$ is the Manhattan distance between vector W and vector B, $w_q$ denotes the q-th component of vector W, $b_q$ denotes the q-th component of vector B, and n is the number of components in vector W or vector B.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The question-answering method based on the combination of semantic analysis and vector modeling understands the input question, performs recall and ranking in combination with the knowledge graph, and finally returns a high-accuracy answer, so that questions submitted by users in natural language can be answered with accurate and concise natural language; the method can be widely applied to various business scenarios in industry.
2. The method realizes multi-turn question answering using slot inheritance and intention inheritance.
3. The method improves the accuracy of answers by means of intention summarization and semantic-slot design, and on this basis improves answer coverage by combining subgraph recall.
Drawings
FIG. 1 is a flow chart of a question-answering method based on the combination of semantic parsing and vector modeling;
FIG. 2 is a BERT-BiLSTM-CRF model diagram;
FIG. 3 is a BERT-TextCNN intent recognition model;
FIG. 4 is a frame diagram based on vector modeling.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, a question-answering method based on combination of semantic analysis and vector modeling includes:
Step 1, preprocess the question input by the user to obtain high-quality word vectors, obtain the optimal prediction sequence, and complete named entity recognition of the question to obtain an entity;
Step 2, if the entity identified in step 1 is not a standard name, link the entity to the unique entity in the knowledge graph and re-acquire the entity; if the entity identified in step 1 is a standard name, use that entity;
Step 3, take the unique entity obtained in step 2 as a slot; if the slot of this turn's question is identified as empty, load the context of the user dialogue and inherit the slot saved from the previous turn of dialogue to complete slot filling;
Step 4, complete intention recognition of the question;
Step 5, if the intention recognition is ambiguous, use a pre-designed semantic slot template to reply with a clarification and confirm the intention; jump to step 6 if the intention is correct, and to step 7 if it is not;
Step 6, according to the entity and intention of the question, complete the answer query through the knowledge-graph query language;
Step 7, if the recognized intention is not within the preset intention range, query a plurality of triples associated with the entity identified in step 2 to complete subgraph recall for the entity;
Step 8, encode and rank the question and the paths of the subgraphs from step 7 respectively, compare the ranked scores, return the highest-scoring path as the answer, and then jump to step 6 to return the answer.
According to the invention, by understanding the input question and performing recall and ranking in combination with the knowledge graph, a high-accuracy answer is finally returned, and questions posed by the user in natural language can be answered with accurate and concise natural language.
The traditional form of question answering gives, for a question posed by the user, an answer that contains the handling under all conditions. In this mode the user must read a large amount of content and find the desired answer within it. Multi-turn question answering, by contrast, can ask the user follow-up questions and give the desired answer once the user's intention is clear. The invention realizes multi-turn question answering using slot inheritance and intention inheritance, improves the accuracy of answers through intention summarization and semantic-slot design, and on this basis improves answer coverage by combining recall and ranking.
The invention generates a core inference chain from the entity and the queried intention, then adds constraints to generate the final query graph, thereby converting semantic parsing into query-graph generation, and finally converts the query graph into a knowledge-graph query, improving accuracy at the same time.
In one embodiment of the invention, an example of medical question answering is given.
User: What is myocarditis? {intention: definition; entity: myocarditis}
Robot: returns the answer defining myocarditis;
User: How is it treated? {intention: treatment method; entity: none}
Robot: according to the user's dialogue history, recognizes that the slot of this turn's question is empty, loads the dialogue context, inherits the slot saved from the previous turn, and completes slot filling {entity: myocarditis};
User: How long until recovery? {intention: treatment time (weak); entity: none}
Robot: Are you asking about the treatment time of "myocarditis"? {clarification reply policy}
User: Yes. {user affirmative answer}
Robot: answers as requested.
User: What are the common onset periods? {intention: onset time (very weak, or no such intention set); entity: none}
Robot: inherits the entity of the previous dialogue according to the user's dialogue history {entity: myocarditis};
Robot: Are you asking about the onset period of "myocarditis"? {clarification reply policy}
User: Yes. {user affirmative answer}
Robot: answers as requested.
In step 1, named entity recognition of the question is completed through a BERT-BiLSTM-CRF model. Specifically, the BERT module preprocesses the question input by the user to obtain high-quality word vectors; the word vectors are input to the BiLSTM module for further processing; the output of the BiLSTM module is decoded by the CRF module, which captures the relations between adjacent labels through probability calculation; and the optimal prediction sequence, obtained as the score with maximum probability, completes the named entity recognition of the question.
The invention combines the BERT module, the BiLSTM module and the CRF module to build the BERT-BiLSTM-CRF model, and uses this model to perform named entity recognition on the question input by the user, identifying the entities in the question. Its greatest advantage is that the BERT module is used to preprocess the question, so there is no need to train feature vectors in advance: the question is simply input into the BERT model and word vectors are obtained through a bidirectional Transformer structure; the BiLSTM module computes hidden information of the input through a bidirectional LSTM; and the CRF module decodes the BiLSTM output, solves the optimal path, and obtains the text labels.
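Purely for illustration, a minimal sketch of such an assembly is given below in PyTorch with the Hugging Face transformers library; the checkpoint name, tag count and hidden size are assumptions, not values fixed by the invention:

```python
# Sketch only: BERT encoder -> BiLSTM -> emission scores for a CRF layer.
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiLstmCrf(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", num_tags=5, lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)      # word vectors via bidirectional Transformer
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)  # hidden information in both directions
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)  # per-token tag scores
        self.transitions = nn.Parameter(torch.randn(num_tags, num_tags))  # CRF transition matrix

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(h)
        return self.emission(h)  # emissions; the CRF decodes them into the optimal path
```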
The BERT module is a bidirectional encoder based on a multi-layer Transformer model. The Transformer consists of 6 encoders and 6 decoders, essentially an encoder-decoder structure; each encoder module consists of an FFNN (feed-forward neural network) and a multi-head attention mechanism, so that the representation of each word can integrate the information on both its left and right sides.
In the present invention, the pretraining of BERT consists of two tasks. One is MLM (Masked Language Model), which can be understood as a cloze task: 15% of the words in each question are randomly masked and predicted from the context. The other is the Next Sentence Prediction task: two sentences A and B are randomly taken from the text, and the model judges the before-and-after relation between them.
Features, i.e., embedding vectors of words and questions, can be extracted from text data using BERT. Previously such feature embeddings were generated by Word2Vec, but under Word2Vec each word has a fixed representation, independent of the context in which it appears. The word representation generated by BERT is dynamically informed by the surrounding words: besides capturing obvious differences such as word-sense ambiguity, these context-dependent word embeddings capture further forms of information, yielding more accurate feature representations, improving model performance, and producing high-quality word vectors.
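As a sketch of this contrast, contextual word vectors can be extracted with a pretrained BERT as follows; the checkpoint name and example sentence are illustrative assumptions:

```python
# Sketch only: one vector per token, shaped by the surrounding words
# (unlike a fixed Word2Vec lookup).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

with torch.no_grad():
    enc = tokenizer("今天是个好日子", return_tensors="pt")
    vectors = model(**enc).last_hidden_state  # shape [1, seq_len, 768]
```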
As for the BiLSTM model, LSTM is a special recurrent neural network (RNN); BiLSTM acquires the context information of the input sequence through a forward LSTM and a backward LSTM, solving the problem that a single LSTM model cannot encode information from back to front.
The BiLSTM module extracts the semantic expression vector of each word over its context, and the semantic vector of each word is then decoded using softmax, i.e., each word is multi-classified. However, the BiLSTM model cannot mine the potential relations between the current information and the context, so the invention adds an attention mechanism after the BiLSTM model to extract the latent semantic relevance in the text.
The CRF module is a special Markov random field, where a Markov random field means that the assignment of one position in the random field is related only to the assignments of adjacent positions and unrelated to those of non-adjacent positions. On this basis, the CRF module sets up the whole random field with only two variables: the labeling sequence L and the question W; the question W is given by the user, and the labeling sequence L is the output given W.
The step of completing the named entity recognition of the question through the BERT-BiLSTM-CRF model specifically comprises the following steps:
Step 101, a question is input to the BERT module, and a plurality of word vectors are acquired through the BERT module, expressed respectively as $w_1, w_2, w_3, \ldots, w_n$; each word vector has a corresponding label, and the model trains toward an optimization objective;
For example, for the question "today is a good day", $w_1$ represents "today", $w_2$ represents "is", $w_3$ represents "a", $w_4$ represents "good", and $w_5$ represents "day".
Step 102, in the BiLSTM module, the embedding of each word vector is input, and the bidirectional LSTM model extracts the semantic expression vector of each word vector over its context;
Step 103, decoding semantic vectors of the word vectors by using softmax;
Step 104, the result output by the BiLSTM module is decoded through the CRF module. Given a labeling sequence L and a question W, where the question W is given by the user and the labeling sequence L is the output for the given question W, the given question W in the CRF module contains n word vectors; the labeling sequence is scored according to a set of feature functions to obtain the transition score, and the optimal prediction sequence is obtained as the labeling sequence with the maximum probability value. The transition score is calculated as:

$$\mathrm{score}(L\mid W)=\sum_{i=1}^{n}\sum_{k=1}^{K}\lambda_k f_k(l_{i-1},l_i,W,i)$$

where score(L|W) is the transition score, $f_k$ denotes a feature function, each feature function is given a weight $\lambda_k$, n is the length of the labeling sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the labeling sequence or question;
The probability value of the labeling sequence is specifically calculated as:

$$p(L\mid W)=\frac{\exp\bigl(\mathrm{score}(L\mid W)\bigr)}{\sum_{L'}\exp\bigl(\mathrm{score}(L'\mid W)\bigr)}$$

where p(L|W) is the probability value of the labeling sequence, W is the question, L is the labeling sequence, and score(L|W) is the transition score.
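A small numerical sketch of the two formulas above is given below, under the common simplification that the feature-function sum reduces to emission plus transition scores; the brute-force normalization is only feasible for toy sequence lengths:

```python
# Sketch only: score(L|W) and p(L|W) for a linear-chain CRF.
from itertools import product
import numpy as np

def path_score(emissions, transitions, labels):
    """score(L|W): per-token emission scores plus adjacent-label transition scores."""
    s = emissions[0, labels[0]]
    for i in range(1, len(labels)):
        s += transitions[labels[i - 1], labels[i]] + emissions[i, labels[i]]
    return s

def path_probability(emissions, transitions, labels):
    """p(L|W): exp(score) normalized over all possible label sequences."""
    n, num_tags = emissions.shape
    all_scores = [path_score(emissions, transitions, list(seq))
                  for seq in product(range(num_tags), repeat=n)]
    return np.exp(path_score(emissions, transitions, labels)) / np.sum(np.exp(all_scores))
```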
Each word vector is multi-classified. As shown in FIG. 2, for the question "today is a good day" there are 5 possible categories per word vector, but decoding directly with softmax easily produces predictions inconsistent with the sequence order. In a correct entity sequence, B must precede I, and O cannot occur between B and I. Because softmax ignores the sequence relations of the current token's context when decoding, the final result may place I before B, or yield entity fragments containing only I with no B. Therefore, when decoding the semantic vectors of the word vectors with softmax, the correct entity order must also be taken into account.
In the invention, a CRF module is connected after the bidirectional LSTM model to improve the accuracy of entity-sequence recognition. During decoding, the CRF maintains a probability transition matrix and judges from it which label the current token should take, thereby avoiding entity fragments that violate the sequence ordering and completely resolving the problem in step 103 that decoding directly with softmax easily yields predictions out of sequence order.
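A minimal Viterbi decoder over such a transition matrix might look as follows; the emission and transition arrays are assumed to come from the model sketched earlier:

```python
# Sketch only: Viterbi decoding, which keeps tag order consistent
# (e.g. never places I before B when the transition matrix penalizes it).
import numpy as np

def viterbi_decode(emissions, transitions):
    n, num_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, num_tags), dtype=int)
    for i in range(1, n):
        total = score[:, None] + transitions + emissions[i][None, :]
        back[i] = total.argmax(axis=0)   # best previous tag for each current tag
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        best.append(int(back[i][best[-1]]))
    return best[::-1]                    # the optimal prediction sequence
```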
In step 2, to link the entity identified in step 1 to the related entity in the knowledge graph, vector representations of the question and of the corresponding candidate relations or attributes are obtained through a bidirectional LSTM model, and the entity of the question linked into the knowledge graph is determined from the semantic similarity. Let the entity candidate word extracted from the question be $W=(w_1,w_2,w_3,\ldots,w_N)$ and the entity in the knowledge graph be $Z=(z_1,z_2,z_3,\ldots,z_N)$, where N is the dimension of the vectors corresponding to entity W and entity Z; the semantic similarity is specifically calculated as:

$$\cos(W,Z)=\frac{\sum_{j=1}^{N} w_j z_j}{\sqrt{\sum_{j=1}^{N} w_j^{2}}\,\sqrt{\sum_{j=1}^{N} z_j^{2}}}$$

where cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
Using the bidirectional LSTM to acquire the context information of the input sequence solves the problem that a single LSTM cannot encode information from back to front.
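As a sketch of the similarity computation above, with the candidate-word and entity vectors assumed to come from the bidirectional LSTM encoder:

```python
# Sketch only: link the candidate word to the most similar knowledge-graph entity.
import numpy as np

def cosine_similarity(w: np.ndarray, z: np.ndarray) -> float:
    return float(np.dot(w, z) / (np.linalg.norm(w) * np.linalg.norm(z)))

def link_entity(w, graph_entities):
    """graph_entities: dict mapping entity name -> N-dimensional vector."""
    return max(graph_entities, key=lambda name: cosine_similarity(w, graph_entities[name]))
```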
In step 3, the unique entity determined in step 2 is taken as the slot value. If the user's question in the current turn carries over the entity of the previous turn without restating it, the slot value saved in the previous turn of dialogue is inherited by loading the user's dialogue context, realizing the multi-turn dialogue effect of slot inheritance.
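A sketch of this slot-inheritance behaviour, with the dialogue context kept in a plain dict purely for illustration:

```python
# Sketch only: inherit the previous turn's slot when this turn's slot is empty.
def fill_slots(turn_slots: dict, dialogue_context: dict) -> dict:
    filled = dict(turn_slots)
    if not filled.get("entity"):                           # this turn's slot is empty
        filled["entity"] = dialogue_context.get("entity")  # inherit the saved slot
    dialogue_context.update({k: v for k, v in filled.items() if v})
    return filled

context = {}
fill_slots({"entity": "myocarditis"}, context)  # turn 1 stores the entity
fill_slots({"entity": None}, context)           # turn 2 inherits {"entity": "myocarditis"}
```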
In step 4, as shown in FIG. 3, the question intention is identified using a BERT-TextCNN model, which is composed of a word embedding layer, a convolution layer, a pooling layer and a fully connected layer. First, the original input is given token embedding, segment embedding and position embedding representations and input to the BERT module to generate a word-vector matrix; a convolution operation is performed by the convolution layer of the TextCNN module to generate feature maps; a max-pooling operation is performed by the pooling layer of TextCNN, i.e., the maximum of each feature obtained by the convolution operation is taken, compressing the features while retaining the most important feature information; and the final fully connected layer outputs the intention classification result using a softmax activation function, with the result between 0 and 1, the maximum being taken as the final intention.
The feature map is generated by the convolution operation on the TextCNN convolution layer; the obtained feature is:

$$c_p = f\bigl(w \cdot x_{p:p+h-1} + b\bigr)$$

where c denotes the feature extraction vector whose p-th component $c_p$ is produced at the p-th word in the question, $x_{p:p+h-1}$ denotes the window of word vectors from the p-th to the (p+h-1)-th word, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, b is a bias term, and f is the activation function;
The output of the fully connected layer is:

$$y=\operatorname{softmax}\bigl(w_{dense}\cdot(z\circ r)+b_{dense}\bigr)$$

where y is the classification result, and $w_{dense}$ and $b_{dense}$ are the weight and bias of the fully connected layer, respectively.
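A minimal sketch of this BERT-TextCNN pipeline follows; the kernel heights, channel count and intent count are assumptions for illustration:

```python
# Sketch only: BERT word-vector matrix -> parallel convolutions -> max pooling
# -> fully connected softmax over intentions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel

class BertTextCnn(nn.Module):
    def __init__(self, num_intents=10, kernel_heights=(2, 3, 4), channels=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        d = self.bert.config.hidden_size  # kernel width = word-vector dimension
        self.convs = nn.ModuleList(nn.Conv2d(1, channels, (h, d)) for h in kernel_heights)
        self.dense = nn.Linear(channels * len(kernel_heights), num_intents)

    def forward(self, input_ids, attention_mask):
        x = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = x.unsqueeze(1)                                           # word-vector matrix
        feats = [F.relu(conv(x)).squeeze(3) for conv in self.convs]  # feature maps
        pooled = [f.max(dim=2).values for f in feats]                # keep the strongest feature
        z = torch.cat(pooled, dim=1)
        return F.softmax(self.dense(z), dim=-1)                      # scores in (0, 1)
```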
In step 5, the final result value obtained in step 4 may be small. When performing the reply processing, a comparison is made on the final result: if the result is between 0.4 and 0.8, the intention is ambiguous and clarification processing is required; if the result is less than 0.4, the recognized result is too low, and the method jumps to step 7 to ensure accuracy.
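The resulting routing logic can be sketched as follows, with the thresholds taken directly from the description above:

```python
# Sketch only: route on the intention-recognition confidence.
def route(confidence: float) -> str:
    if confidence > 0.8:
        return "query"           # step 6: direct knowledge-graph query
    if confidence >= 0.4:
        return "clarify"         # step 5: semantic-slot clarification template
    return "subgraph_recall"     # step 7: recall triples around the entity
```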
In step 6, according to the final result value of step 4, if the final result is above 0.8, the semantic understanding is already complete and the accuracy is high; as shown in FIG. 4, the knowledge-graph query language Cypher can be used to directly return the answer.
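Purely for illustration, such a direct query might be issued through the official neo4j Python driver as below; the graph schema (Disease label, relationship type) and credentials are assumptions, not part of the invention:

```python
# Sketch only: answer an (entity, intention) pair with a Cypher query.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def query_answer(entity: str, relation: str) -> list:
    cypher = (
        "MATCH (d:Disease {name: $name})-[r]->(a) "
        "WHERE type(r) = $rel RETURN a.name AS answer"
    )
    with driver.session() as session:
        return [rec["answer"] for rec in session.run(cypher, name=entity, rel=relation)]

# e.g. query_answer("myocarditis", "treatment_method")
```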
Step 7, if the intention identified in step 5 is not within the preset intention range, a plurality of triples associated with the entity identified in step 2 are queried to complete subgraph recall for the entity;
Step 8, the question and the paths of the subgraphs from step 7 are respectively encoded and ranked; the ranked scores are compared, the highest-scoring path is returned as the answer, and step 6 is then executed to return the answer. The specific method is to reduce the question input by the user from high dimensionality to low dimensionality and map the question and the answers into a low-dimensional space to obtain distributed representations. The distributed representations are trained with a data set, and the similarity between question and answer is computed from the Manhattan distance so that the similarity between the question and its answer is as high as possible. Finally, the answer with the highest score is returned according to the vector representations in the candidate answer set and the representation of the question input by the user.
In the invention, let the question be $W=(w_1,w_2,\ldots,w_n)$ and the answer be $B=(b_1,b_2,\ldots,b_n)$; the similarity between question and answer is calculated from the Manhattan distance as:

$$\mathrm{dist}_{man}(W,B)=\sum_{q=1}^{n}\lvert w_q-b_q\rvert$$

where $\mathrm{dist}_{man}(W,B)$ is the Manhattan distance between vector W and vector B, $w_q$ denotes the q-th component of vector W, $b_q$ denotes the q-th component of vector B, and n is the number of components in vector W or vector B.
In steps 6-8, when the recognized intention is of low confidence, the invention finds a set of candidate answers in the knowledge graph based on the found entity, reduces the question input by the user from high dimensionality to low dimensionality, maps the question and the answers into the low-dimensional space as vectors to obtain distributed representations, trains the distributed representations on the data set, computes the similarity between the question and the answers, and finally returns the answer with the highest score according to the vector representations in the candidate answer set and the representation of the question input by the user.
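A sketch of this fallback ranking, with the low-dimensional vectors assumed to come from the trained mapping:

```python
# Sketch only: rank candidate answers by Manhattan distance to the question vector.
import numpy as np

def manhattan(w: np.ndarray, b: np.ndarray) -> float:
    return float(np.sum(np.abs(w - b)))  # sum over components of |w_q - b_q|

def best_answer(question_vec, candidates):
    """candidates: list of (answer_text, answer_vec); smaller distance = higher score."""
    return min(candidates, key=lambda c: manhattan(question_vec, c[1]))[0]
```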
The invention also provides a question-answering device based on the combination of semantic analysis and vector modeling, which comprises:
the receiving module is used for receiving a question sentence given by a user;
the named entity recognition module is used for preprocessing a question input by a user, obtaining a word vector with high quality, obtaining an optimal prediction sequence and obtaining an entity;
the name confirmation module is used for linking non-standard names to standard names: when the entity identified by the named entity recognition module bears a non-standard name, the entity is linked to its standard name and the entity is re-acquired;
the slot filling module is used for completing slot filling: the entity acquired by the name confirmation module is taken as a slot, and if the slot of this turn's question is identified as empty, the context of the user dialogue is loaded and the slot saved from the previous turn of dialogue is inherited to complete slot filling;
the intention recognition module is used for recognizing the intention of the question;
the intention confirmation module is used for confirming the intention of the question: if the intention is ambiguous, a pre-designed semantic slot template is used to reply with a clarification to confirm the intention; if the intention is correct, control jumps to the query module, and if not, to the recall module;
the query module is used for completing answer query through a knowledge graph query language according to the entity and the intention of the question;
and the recall module is used for querying a plurality of triples associated with the entity identified by the name confirmation module to complete subgraph recall for the entity, encoding and ranking the question and the paths of the subgraphs respectively, comparing the ranked scores, and returning the highest-scoring path as the answer.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the question-answering method based on the combination of semantic analysis and vector modeling when executing the program.
In summary, the invention adopts the BERT-BiLSTM-CRF model to identify candidate entity words in the question and uses cosine similarity to match the candidate words against knowledge-graph entities, realizing entity extraction; an entity-name dictionary is constructed from the entity names and entity alias information of the knowledge graph, and the entity is pointed to the unique entity in the knowledge graph through a BERT pretrained model and dictionary matching, realizing entity linking; the question's intention category and the question's entity then form a question triple. If the recognized intention is not covered, all subgraphs under the entity's data are recalled by vector modeling, the question and the subgraph paths are respectively encoded and ranked, the ranked scores are compared, the highest-scoring path is returned, and the knowledge graph is queried through a Cypher graph-database query statement to obtain the answer.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (8)
1. A question-answering method based on the combination of semantic analysis and vector modeling, characterized by comprising the following steps:
Step 1, preprocess the question input by the user to obtain high-quality word vectors, obtain the optimal prediction sequence, and complete named entity recognition of the question to obtain an entity;
Step 2, if the entity identified in step 1 is not a standard name, link the entity to the unique entity in the knowledge graph and re-acquire the entity; if the entity identified in step 1 is a standard name, use that entity;
Step 3, take the unique entity obtained in step 2 as a slot; if the slot of this turn's question is identified as empty, load the context of the user dialogue and inherit the slot saved from the previous turn of dialogue to complete slot filling;
Step 4, complete intention recognition of the question;
Step 5, if the intention recognition is ambiguous, use a pre-designed semantic slot template to reply with a clarification and confirm the intention; jump to step 6 if the intention is correct, and to step 7 if it is not;
Step 6, according to the entity and intention of the question, complete the answer query through the knowledge-graph query language;
Step 7, if the recognized intention is not within the preset intention range, query a plurality of triples associated with the entity identified in step 2 to complete subgraph recall for the entity;
Step 8, encode and rank the question and the paths of the subgraphs from step 7 respectively, compare the ranked scores, and return the highest-scoring path as the answer to complete the answer query.
2. The question-answering method based on the combination of semantic analysis and vector modeling according to claim 1, wherein step 1 comprises: building a BERT-BiLSTM-CRF model; the BERT module preprocesses the question: the question is input into the BERT model and word vectors are obtained through a bidirectional Transformer structure; the BiLSTM module computes hidden information of the input through a bidirectional LSTM; and the CRF module decodes the output of the BiLSTM module, solves the optimal path, and obtains the optimal prediction sequence.
3. The question-answering method based on combination of semantic analysis and vector modeling according to claim 2, wherein the step of the BERT-BiLSTM-CRF model for completing named entity recognition of a question specifically comprises:
Step 101, a question is input to the BERT module, and a plurality of word vectors are acquired through the BERT module, expressed respectively as $w_1, w_2, w_3, \ldots, w_n$, each word vector having a corresponding label;
Step 102, in the BiLSTM module, the embedding of each word vector is input, and the bidirectional LSTM model extracts the semantic expression vector of each word vector over its context;
Step 103, decoding semantic vectors of the word vectors by using softmax;
Step 104, the result output by the BiLSTM module is decoded through the CRF module. Given a labeling sequence L and a question W, where the question W is given by the user and the labeling sequence L is the output for the given question W, the given question W in the CRF module comprises n word vectors, $W=(w_1,w_2,w_3,\ldots,w_n)$ and $L=(l_1,l_2,l_3,\ldots,l_n)$; the labeling sequence is scored according to a set of feature functions to obtain the transition score, and the optimal prediction sequence is obtained as the labeling sequence with the maximum probability value; the transition score is calculated as:

$$\mathrm{score}(L\mid W)=\sum_{i=1}^{n}\sum_{k=1}^{K}\lambda_k f_k(l_{i-1},l_i,W,i)$$

where score(L|W) is the transition score, $f_k$ denotes a feature function, each feature function is given a weight $\lambda_k$, n is the length of the labeling sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the labeling sequence or question;
The probability value of the labeling sequence is specifically calculated as:

$$p(L\mid W)=\frac{\exp\bigl(\mathrm{score}(L\mid W)\bigr)}{\sum_{L'}\exp\bigl(\mathrm{score}(L'\mid W)\bigr)}$$

where p(L|W) is the probability value of the labeling sequence, W is the question, L is the labeling sequence, and score(L|W) is the transition score.
4. The question-answering method based on the combination of semantic analysis and vector modeling according to claim 1, wherein step 2 comprises: to link the entity identified in step 1 to the related entity in the knowledge graph, vector representations of the question and of the corresponding candidate relations or attributes are obtained through a bidirectional LSTM model, and the entity linked into the knowledge graph is determined from the semantic similarity; let the entity candidate word extracted from the question be $W=(w_1,w_2,w_3,\ldots,w_N)$ and the entity in the knowledge graph be $Z=(z_1,z_2,z_3,\ldots,z_N)$, where N is the dimension of the vectors corresponding to entity W and entity Z; the semantic similarity is specifically calculated as:

$$\cos(W,Z)=\frac{\sum_{j=1}^{N} w_j z_j}{\sqrt{\sum_{j=1}^{N} w_j^{2}}\,\sqrt{\sum_{j=1}^{N} z_j^{2}}}$$

where cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
5. The question-answering method based on the combination of semantic analysis and vector modeling according to claim 1, wherein step 4 comprises: the question intention is identified using a BERT-TextCNN model; the original input is given token embedding, segment embedding and position embedding representations and input to the BERT module to generate a word-vector matrix; a convolution operation is then performed by the convolution layer of the TextCNN module to generate feature maps; a max-pooling operation is performed by the pooling layer of TextCNN; and the final fully connected layer outputs the intention classification result using a softmax activation function, with values between 0 and 1, the maximum being taken as the final intention;
The convolution operation on the TextCNN convolution layer generates the feature map; the obtained feature is:

$$c_p = f\bigl(w \cdot x_{p:p+h-1} + b\bigr)$$

where c denotes the feature extraction vector whose p-th component $c_p$ is produced at the p-th word in the question, $x_{p:p+h-1}$ denotes the window of word vectors from the p-th to the (p+h-1)-th word, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, b is a bias term, and f is the activation function;
The output of the fully connected layer is:

$$y=\operatorname{softmax}\bigl(w_{dense}\cdot(z\circ r)+b_{dense}\bigr)$$

where y is the classification result, and $w_{dense}$ and $b_{dense}$ are the weight and bias of the fully connected layer, respectively.
6. The question-answering method based on the combination of semantic analysis and vector modeling according to claim 1, wherein: the final result value obtained in step 4 may be small; when performing the reply processing, a comparison is made on the final result: if the result is between 0.4 and 0.8, the intention is ambiguous, clarification processing is required, and the method jumps to step 6; if the result is less than 0.4, the recognized result is too low, and the method jumps to step 7 to ensure accuracy.
7. The question-answering method based on the combination of semantic analysis and vector modeling according to claim 6, wherein: according to the final result value of step 4, if the final result is above 0.8, the semantic understanding is already complete and the accuracy is high, and the knowledge-graph query language Cypher can be used to directly return the answer.
8. The question-answering method based on the combination of semantic analysis and vector modeling according to claim 1, wherein: the question input by the user is reduced from high dimensionality to low dimensionality, and the question and the answers are mapped into a low-dimensional space to obtain distributed representations of the question; the distributed representations are trained with a data set, and the similarity between question and answer is computed from the Manhattan distance so that the similarity between the question and its answer is as high as possible; finally, the answer with the highest score is returned according to the vector representations in the candidate answer set and the representation of the question input by the user;
let the question be $W=(w_1,w_2,\ldots,w_n)$ and the answer be $B=(b_1,b_2,\ldots,b_n)$; the similarity between question and answer is calculated from the Manhattan distance as:

$$\mathrm{dist}_{man}(W,B)=\sum_{q=1}^{n}\lvert w_q-b_q\rvert$$

where $\mathrm{dist}_{man}(W,B)$ is the Manhattan distance between vector W and vector B, $w_q$ denotes the q-th component of vector W, $b_q$ denotes the q-th component of vector B, and n is the number of components in vector W or vector B.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210275679.XA CN114896407B (en) | 2022-03-21 | 2022-03-21 | Question-answering method based on combination of semantic analysis and vector modeling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210275679.XA CN114896407B (en) | 2022-03-21 | 2022-03-21 | Question-answering method based on combination of semantic analysis and vector modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114896407A CN114896407A (en) | 2022-08-12 |
CN114896407B true CN114896407B (en) | 2024-07-26 |
Family
ID=82715878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210275679.XA Active CN114896407B (en) | 2022-03-21 | 2022-03-21 | Question-answering method based on combination of semantic analysis and vector modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896407B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116244344B (en) * | 2022-11-25 | 2023-09-05 | 中国农业科学院农业信息研究所 | Retrieval method and device based on user requirements and electronic equipment |
CN115982338B (en) * | 2023-02-24 | 2023-06-06 | 中国测绘科学研究院 | Domain knowledge graph question-answering method and system based on query path sorting |
CN117149966A (en) * | 2023-08-17 | 2023-12-01 | 内蒙古大学 | Question-answering method and system based on Roberta-DPCNN model |
CN118070812B (en) * | 2024-04-19 | 2024-07-05 | 深圳市中壬银兴信息技术有限公司 | Industry data analysis method based on NLP |
CN118113855B (en) * | 2024-04-30 | 2024-08-09 | 浙江建木智能系统有限公司 | Ship test training scene question answering method, system, equipment and medium |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018000277A1 (en) * | 2016-06-29 | 2018-01-04 | 深圳狗尾草智能科技有限公司 | Question and answer method and system, and robot |
CN109271506A (en) * | 2018-11-29 | 2019-01-25 | 武汉大学 | A kind of construction method of the field of power communication knowledge mapping question answering system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN114896407A (en) | 2022-08-12 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |