CN114896407A - Question-answering method based on combination of semantic analysis and vector modeling - Google Patents
- Publication number: CN114896407A (application CN202210275679.XA)
- Authority: CN (China)
- Prior art keywords: question, vector, entity, intention, answer
- Legal status: Granted (status as listed; an assumption, not a legal conclusion)
Classifications
- G06F16/367 — Creation of semantic tools; ontology
- G06F16/3344 — Query execution using natural language analysis
- G06F16/35 — Clustering; classification
- G06F40/126 — Character encoding
- G06F40/194 — Calculation of difference between files
- G06F40/205 — Natural language analysis; parsing
- G06F40/295 — Named entity recognition
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
Abstract
The invention discloses a question-answering method based on the combination of semantic analysis and vector modeling. The method performs named entity recognition on the question, fills slots, recognizes the question's intention, and confirms the recognized intention; given the entity and the intention of the question, the answer query is completed through a knowledge-graph query language. If the recognized intention is not within the scope of the pre-designed intentions, the method instead queries the triples associated with the entity recognized in step 2, completing a subgraph recall for the entity; it then encodes and ranks the question and the subgraph paths respectively, compares the ranking scores, and returns the highest-scoring path as the answer, completing the answer query. By understanding the input question and performing recall and ranking in combination with the knowledge graph, the method returns answers with high precision and can answer questions posed by the user in natural language with accurate and concise natural language.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a question-answering method based on combination of semantic analysis and vector modeling.
Background
With the rapid development of information technology, research on artificial intelligence has deepened, researchers have begun studying intelligent question-answering systems, and various data sets have appeared. The rapid development of the knowledge graph provides a new knowledge source for intelligent question answering. A knowledge graph can serve as a semantic network in which nodes represent entities or concepts of related knowledge and directed edges represent the relations among entities. It visualizes the relations among entities and expresses the relations among data in a more intuitive way that better matches human cognitive habits, so knowledge-graph-based intelligent question answering has attracted the attention of many researchers.
A knowledge-graph question-answering system obtains answers by semantically understanding and parsing the question and then querying and reasoning over the knowledge base; the answers must be retrieved from the knowledge-graph data, and the accuracy is high. The evaluation standard consists of three values: recall, precision, and the F1 value. Related approaches to knowledge-graph question answering include semantic parsing, information retrieval, and vector modeling. A semantic-parsing question-answering algorithm converts natural language into a series of formal logic forms, parses them bottom-up into a logic form that expresses the semantics of the whole question, and obtains answers by issuing the corresponding query against the knowledge graph; however, the precision of the answers returned by this method is not high. Vector-modeling methods treat knowledge-base question answering as a semantic matching process: representation learning maps the knowledge graph and the user's question into numerical vectors in a low-dimensional space, and the answer most semantically similar to the question is then matched directly by numerical calculation. That is, the question-answering task can be regarded as computing the similarity between the semantic vector of the question and the semantic vectors of the entities and edges in the knowledge base.
Most existing question-answering methods focus on simple questions, i.e., questions involving only one entity and one relation; the common solution maps the question to a triple query in the knowledge graph to obtain the answer. For complex questions involving multiple entities and relations, however, common KBQA methods do not work well.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a question-answering method based on the combination of semantic analysis and vector modeling, which understands the input question, performs recall and ranking in combination with a knowledge graph, finally returns an answer with high precision, and can answer a question posed by the user in natural language accurately and concisely.
In order to solve the technical problem, the technical scheme adopted by the invention is as follows:
a question-answering method based on semantic parsing and vector modeling combination comprises the following steps:
step 1, preprocessing a question input by a user to obtain a high-quality word vector, obtaining an optimal prediction sequence, completing named entity identification of the question and obtaining an entity;
step 2, if the entity identified in step 1 is not a standard name, link the entity to the unique entity in the knowledge graph and obtain the entity anew; if it is a standard name, use the entity directly;
step 3, take the unique entity obtained in step 2 as a slot; if the slot of this turn's question is identified as empty, load the context of the user's conversation and inherit the slot saved in the previous turn, completing slot filling;
step 4, completing the intention identification of the question;
step 5, if the intention recognition is unclear, use the pre-designed semantic slot template and a clarification reply strategy to confirm the intention; if the intention is correct, jump to step 6; if it is incorrect, jump to step 7;
step 6, according to the entity and the intention of the question sentence, completing answer query through a knowledge graph query language;
step 7, if the identified intention is not in the range of the pre-designed intention, inquiring a plurality of triples associated with the entity according to the entity identified in the step 2, and completing the sub-graph recall of the entity;
and step 8, encode and rank the question and the subgraph paths from step 7 respectively, compare the ranking scores, and return the highest-scoring path as the answer, completing the answer query.
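The eight-step control flow above can be sketched as a small dispatcher. This is a hedged illustration only: every helper callable (ner, link, intent_of, kg_query, recall_and_rank) is a hypothetical placeholder supplied by the caller, not an API named by the patent.

```python
# Minimal sketch of the eight-step question-answering flow; all helpers
# are hypothetical placeholders passed in by the caller.

def answer_question(question, session, ner, link, intent_of,
                    kg_query, recall_and_rank):
    # Steps 1-2: named entity recognition, then linking to a unique KG entity
    entity = ner(question)
    if entity is not None:
        entity = link(entity)
    # Step 3: slot inheritance -- reuse the previous turn's entity if empty
    if entity is None:
        entity = session.get("entity")
    session["entity"] = entity
    # Steps 4-6: if an intention is recognized, query the knowledge graph
    intent = intent_of(question)
    if intent is not None:
        return kg_query(entity, intent)
    # Steps 7-8: otherwise recall the entity's subgraph and rank its paths
    return recall_and_rank(question, entity)
```

The session dictionary carries the dialogue state between turns, which is what makes the multi-turn slot inheritance of step 3 possible.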
Further, step 1 comprises: establish a BERT-BiLSTM-CRF model; preprocess the question with the BERT module by inputting it into the BERT model and obtaining word vectors through the bidirectional Transformer structure; compute the hidden information of the input with the BiLSTM module through a bidirectional LSTM; and decode the output of the BiLSTM module with the CRF module, solving for the optimal path and obtaining the optimal prediction sequence.
Further, completing named entity recognition of the question with the BERT-BiLSTM-CRF model specifically comprises the following steps:
step 101, input the question into the BERT module and obtain word vectors, denoted w_1, w_2, w_3, …, w_n, each with a corresponding label;
step 102, input the embedding of each word vector into the BiLSTM module and extract, with the bidirectional LSTM model, the semantic expression vector of each word vector in its context;
step 103, decoding a semantic vector of the word vector by using softmax;
step 104, decode the output of the BiLSTM module through the CRF module. There are two variables: the labeling sequence L and the question W, where W is given by the user and L is the output for the given W. In the CRF module, the given question W comprises n word vectors; the labeling sequence is scored against the feature function set to obtain a transfer score, and the optimal prediction sequence is obtained from the labeling sequence with the maximum probability value. Let W = (w_1, w_2, w_3, …, w_n) and L = (l_1, l_2, l_3, …, l_n). The transfer score is computed as:

score(L|W) = Σ_{i=1}^{n} Σ_{k=1}^{K} λ_k · f_k(l_{i−1}, l_i, W, i)

where score(L|W) is the transfer score, f_k are the feature functions, each given a weight λ_k, n is the length of the labeling sequence, K is the number of feature functions, k is the index of the feature function, and i is the index of the word vector in the labeling sequence or question.

The probability value of the labeling sequence is computed as:

p(L|W) = exp(score(L|W)) / Σ_{L'} exp(score(L'|W))

where p(L|W) is the probability value of the labeling sequence, W is the question, L is the labeling sequence, and score(L|W) is the transfer score.
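The CRF scoring and probability calculation of step 104 can be checked numerically with a toy example. The two feature functions and their weights below are invented for illustration (a real CRF learns the λ_k from data), and for simplicity the score skips a START transition for the first label.

```python
import math

# Toy linear-chain CRF scoring: score(L|W) = sum_i sum_k lambda_k * f_k(...)

def crf_score(labels, words, feature_funcs, weights):
    # Sum transition features over positions; a real CRF would also score
    # a START transition for the first label.
    total = 0.0
    for i in range(1, len(labels)):
        for f, lam in zip(feature_funcs, weights):
            total += lam * f(labels[i - 1], labels[i], words, i)
    return total

def crf_prob(labels, words, all_labelings, feature_funcs, weights):
    # p(L|W) = exp(score(L|W)) / sum over candidate labelings L'
    num = math.exp(crf_score(labels, words, feature_funcs, weights))
    den = sum(math.exp(crf_score(l, words, feature_funcs, weights))
              for l in all_labelings)
    return num / den

# Invented features: reward the label bigram (B, I), penalize (O, I).
f1 = lambda prev, cur, w, i: 1.0 if (prev, cur) == ("B", "I") else 0.0
f2 = lambda prev, cur, w, i: -1.0 if (prev, cur) == ("O", "I") else 0.0
```

With these features, a BIO-consistent labeling scores higher and therefore receives a larger probability, which is exactly how the CRF selects the optimal prediction sequence.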
Further, step 2 comprises: link the entity identified in step 1 to the attributes of related entities in the knowledge graph, obtain vector representations of the question and the corresponding candidate relations or attributes through a bidirectional LSTM model, and determine the knowledge-graph entity that the question links to by computing semantic similarity. Let the entity candidate word extracted from the question be W = (w_1, w_2, w_3, …, w_N) and the entity in the knowledge graph be Z = (z_1, z_2, z_3, …, z_N), where N is the dimension of the vectors corresponding to entities W and Z. The semantic similarity is computed as:

cos(W, Z) = Σ_{j=1}^{N} w_j · z_j / ( √(Σ_{j=1}^{N} w_j²) · √(Σ_{j=1}^{N} z_j²) )

where cos(W, Z) is the semantic similarity and j indexes the jth component of the N-dimensional vector.
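The cosine-similarity formula used for entity linking is a direct computation; the toy vectors below simply stand in for the candidate-entity embedding W and a knowledge-graph entity embedding Z.

```python
import math

# cos(W, Z) = sum_j w_j * z_j / (||W|| * ||Z||)
def cosine_similarity(w, z):
    dot = sum(wj * zj for wj, zj in zip(w, z))
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    norm_z = math.sqrt(sum(zj * zj for zj in z))
    return dot / (norm_w * norm_z)
```

The candidate entity with the highest similarity to the question's mention would be chosen as the linked knowledge-graph entity.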
Further, step 4 comprises: recognize the question's intention with a BERT-TextCNN model. The original input is represented by token embedding, segment embedding, and position embedding and is input into the BERT module to generate a word-vector matrix; the convolution layer of the TextCNN module performs a convolution operation to generate feature maps, the pooling layer of the TextCNN performs max pooling, and the final fully connected layer outputs the intention classification result with a softmax activation function; the results lie between 0 and 1, and the maximum value is taken as the final intention;
The convolution layer of the TextCNN performs the convolution operation to generate a feature map; the feature obtained is:

c_p = f(w · x_{p:p+h−1} + b)

where c is the feature extraction vector (c_p its pth component), p is the pth word in the question, x_{p:p+h−1} is the window of word vectors starting at the pth word, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, b is a bias term, and f is the activation function.
The output result of the fully connected layer is:

y = softmax(w_dense · (z ∘ r) + b_dense)

where y is the classification result, z ∘ r is the element-wise product of the pooled feature vector z and the dropout mask r, and w_dense and b_dense are respectively the weights and bias of the fully connected layer.
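The convolution and max-pooling steps above can be written out in plain Python for a single kernel. The kernel values and dimensions here are invented for the example; a real TextCNN uses many learned kernels of several heights.

```python
# Slide an h x d convolution kernel over the n x d word-vector matrix,
# apply ReLU, then max-pool the feature map to one value per kernel.

def conv_feature_map(x, kernel, bias=0.0):
    h = len(kernel)       # kernel height (words covered per step)
    d = len(kernel[0])    # kernel width (= word-vector dimension)
    fmap = []
    for p in range(len(x) - h + 1):
        s = sum(kernel[a][b] * x[p + a][b]
                for a in range(h) for b in range(d))
        fmap.append(max(0.0, s + bias))   # ReLU activation
    return fmap

def max_pool(fmap):
    # max-over-time pooling: keep the kernel's strongest response
    return max(fmap)
```

The pooled values from all kernels would be concatenated into the vector z that the fully connected softmax layer classifies.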
Further, the final result value obtained in step 4 may be relatively small. When generating the reply, the final result is compared: if the result is between 0.4 and 0.8, the intention is unclear, clarification is needed, and the method jumps to step 6; if the result is less than 0.4, the recognition confidence is too low, and the method jumps to step 7 to ensure accuracy.
Further, according to the final result value in step 4, if the final result is greater than 0.8, semantic understanding is complete and the accuracy is high, and the knowledge-graph query language Cypher can be used to return the answer directly.
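The confidence thresholds described above amount to a three-way routing rule. The 0.4 and 0.8 cut-offs come from the text; the function and branch names below are illustrative.

```python
# Route the intent-classification confidence score to one of three branches.
def route_by_confidence(score):
    if score > 0.8:
        return "direct_answer"    # high confidence: Cypher query (step 6)
    if score >= 0.4:
        return "clarify"          # ambiguous: ask a clarification question
    return "subgraph_recall"      # too low: fall back to vector modeling (step 7)
```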
Further, the question input by the user is reduced from high dimension to low dimension; the question and the answers are mapped into a low-dimensional space to obtain a distributed representation of the question, which is trained on a data set. The similarity between question and answer is computed according to the Manhattan distance so that the similarity between question and answer is as high as possible, and the final answer returned is the highest-scoring candidate according to the vector representations of the candidate answer group and the question input by the user.

Let the question be W = (w_1, w_2, …, w_n) and the answer be B = (b_1, b_2, …, b_n). The similarity between question and answer according to the Manhattan distance is computed as:

dist_man(W, B) = Σ_{q=1}^{n} |w_q − b_q|

where dist_man(W, B) is the Manhattan distance between vector W and vector B, w_q is the qth component of vector W, b_q is the qth component of vector B, and n is the number of components in vector W or vector B.
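The Manhattan-distance formula is a direct sum of absolute differences. Converting the distance into a similarity score (e.g., 1 / (1 + dist)) is our addition for illustration and is not specified in the text.

```python
# dist_man(W, B) = sum_q |w_q - b_q|
def manhattan_distance(w, b):
    return sum(abs(wq - bq) for wq, bq in zip(w, b))

# Illustrative similarity: smaller distance -> score closer to 1.
def manhattan_similarity(w, b):
    return 1.0 / (1.0 + manhattan_distance(w, b))
```

Candidate answers would then be ranked by this similarity and the highest-scoring one returned.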
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The question-answering method based on the combination of semantic analysis and vector modeling understands the input question, performs recall and ranking in combination with a knowledge graph, and finally returns an answer with high precision.
2. The invention realizes multiple rounds of question answering by using slot inheritance and intention inheritance.
3. The method improves the accuracy of answers by using intention induction and semantic slot design, and on that basis improves answer coverage by combining subgraph recall.
Drawings
FIG. 1 is a flow chart of a question-answering method based on the combination of semantic analysis and vector modeling;
FIG. 2 is a model diagram of BERT-BiLSTM-CRF;
FIG. 3 is a BERT-TextCNN intent recognition model;
FIG. 4 is a block diagram based on vector modeling.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, a question-answering method based on semantic parsing and vector modeling includes:
step 1, preprocessing a question input by a user to obtain a high-quality word vector, obtaining an optimal prediction sequence, completing named entity identification of the question and obtaining an entity;
step 2, if the entity identified in step 1 is not a standard name, link the entity to the unique entity in the knowledge graph and obtain the entity anew; if it is a standard name, use the entity directly;
step 3, take the unique entity obtained in step 2 as a slot; if the slot of this turn's question is identified as empty, load the context of the user's conversation and inherit the slot saved in the previous turn, completing slot filling;
step 4, completing the intention identification of the question;
step 5, if the intention recognition is unclear, use the pre-designed semantic slot template and a clarification reply strategy to confirm the intention; if the intention is correct, jump to step 6; if it is incorrect, jump to step 7;
step 6, according to the entity and the intention of the question sentence, completing answer query through a knowledge graph query language;
step 7, if the identified intention is not in the range of the pre-designed intention, inquiring a plurality of triples associated with the entity according to the entity identified in the step 2, and completing the sub-graph recall of the entity;
and step 8, encode and rank the question and the subgraph paths from step 7 respectively, compare the ranking scores, return the highest-scoring path as the answer, and jump to step 6 to return the answer.
By understanding the input question and performing recall and ranking in combination with the knowledge graph, the invention finally returns an answer with high precision and can answer a question posed by the user in natural language with accurate and concise natural language.
In the traditional question-answer form, the answer given to a user's question contains the processing for all cases, so the user must read a large amount of content to find the answer they want. Multi-turn question answering, by contrast, can ask the user follow-up questions and give the desired answer once the user's intention is clear. The invention realizes multi-turn question answering through slot inheritance and intention inheritance, improves the accuracy of answers through intention induction and semantic slot design, and on that basis improves answer coverage by combining recall and ranking.
The invention generates a core reasoning chain from the entity and the queried intention, then generates the final query graph by adding constraints, thereby converting semantic parsing into query-graph generation, and finally converts the query graph into a knowledge-graph query.
in one embodiment of the invention, an example of a medical answer is given.
The user: what is the myocarditis? { intent: defining; entity: myocarditis };
the robot comprises: returning a definition of a myocarditis answer;
the user: how to treat that? { intent: a method of treatment; entity: none };
the robot comprises: according to the user conversation history, identifying the slot position of the question sentence in the turn as empty, loading the context of the user conversation, inheriting the slot position saved in the previous turn of conversation, and completing slot position filling { entity: myocarditis };
the user: how long it takes for rehabilitation? { intent: treatment time (weaker), entity: none };
the robot comprises: asking you to ask about the treatment time of "myocarditis"? { clarification recovery strategy };
the user: is; { user positive answer }
The robot comprises: the answer is made as required.
The user: what is the common onset time? { intent: the period of onset (very weak or no intention set); entity: none };
the robot comprises: according to the user conversation history, inheriting entities of the previous conversation { entity: myocarditis };
the robot comprises: asking you to ask about the onset period of "myocarditis"? { clarification reply strategy };
the user: is; { user-affirmative answer }
The robot comprises: the answer is made as required.
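The slot-inheritance behaviour shown in the dialogue above can be sketched as a small state-merging function. The function and key names are illustrative, not taken from the patent.

```python
# Fill this turn's empty slots from the slots saved in the previous turn.
def fill_slots(turn_slots, history_slots):
    filled = dict(turn_slots)
    for slot, value in history_slots.items():
        if filled.get(slot) is None:
            filled[slot] = value    # inherit from the previous turn
    return filled
```

A slot explicitly filled in the current turn always wins; only empty slots are inherited, which matches the "How is it treated?" turn in the example.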
In step 1, named entity recognition of the question is completed through the BERT-BiLSTM-CRF model. Specifically, the BERT module preprocesses the question input by the user to obtain high-quality word vectors; the word vectors are input to the BiLSTM module for further processing; and the output of the BiLSTM module is decoded by the CRF module, which uses probability calculation to capture the relations between adjacent labels and obtains the optimal prediction sequence from the highest-probability score, completing named entity recognition of the question.
The method combines the BERT module, the BiLSTM module, and the CRF module into a BERT-BiLSTM-CRF model and uses it to recognize the named entities in the question input by the user. Its greatest advantage is that the BERT module preprocesses the question, so feature vectors do not need to be trained in advance: the question is simply input into the BERT model and word vectors are obtained through the bidirectional Transformer structure; the BiLSTM module computes the hidden information of the input through a bidirectional LSTM; and the CRF module decodes the output of the BiLSTM module, solving for the optimal path and obtaining the text labels.
The BERT module is a bidirectional encoder based on a multi-layer Transformer model. The Transformer is essentially an encoder-decoder structure composed of 6 encoders and 6 decoders; each encoder block consists of a feed-forward neural network (FFNN) and a multi-head attention mechanism, so the representation of each word can integrate information on both its left and its right.
In the invention, the pre-training of BERT includes two tasks. One is the MLM (Masked Language Model), which can be understood as cloze filling: 15% of the words in each question are randomly masked, and those words are predicted from context. The other is the Next Sentence Prediction task: two sentences A and B from the text are given at random, and the model judges their sequential relation.
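The MLM masking step can be illustrated with a few lines of Python. This is only a sketch of the masking itself: BERT's actual procedure additionally keeps some chosen tokens unchanged or replaces them with random tokens, which is omitted here.

```python
import random

# Mask roughly 15% of the tokens (at least one), deterministically seeded.
def mask_tokens(tokens, ratio=0.15, seed=0):
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    for p in positions:
        masked[p] = "[MASK]"
    return masked, sorted(positions)
```

During pre-training the model would be asked to predict the original tokens at the returned positions.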
Using BERT, features, i.e., embedding vectors of words and sentences, can be extracted from text data. Previously such feature embeddings were generated by Word2Vec, but under Word2Vec each word has a fixed representation regardless of the context in which it appears. The word representations generated by BERT are dynamically informed by the surrounding words; besides capturing obvious differences such as polysemy, these context-dependent word embeddings capture other forms of information, produce more accurate feature representations, improve model performance, and yield high-quality word vectors.
For the BiLSTM module, the LSTM is a special recurrent neural network (RNN); the BiLSTM model obtains the context information of the input sequence through a forward LSTM and a backward LSTM, solving the problem that a single LSTM cannot encode information from back to front.
The BiLSTM module extracts the semantic expression vector of each word in its context and then uses softmax to decode the semantic vector of the word, i.e., to multi-classify each word. However, the BiLSTM model cannot mine the latent relation between the current information and the context, so the method adds an attention mechanism after the BiLSTM model to extract the latent semantic relevance in the text.
The CRF module is a special Markov random field. In a Markov random field, the assignment at a position is related only to the assignments of adjacent positions, not to those of non-adjacent positions. On this basis, the CRF module assumes the whole random field has only two variables: the labeling sequence L and the question W, where W is generally given by the user and L is the output for the given W.
Completing named entity recognition of the question through the BERT-BiLSTM-CRF model specifically comprises the following steps:
step 101, input the question into the BERT module and obtain word vectors, denoted w_1, w_2, w_3, …, w_n, each with a corresponding label; the model trains toward an optimization target;

For example, for the question "today is a good day", w_1 represents "today", w_2 represents "is", w_3 represents "a", w_4 represents "good", and w_5 represents "day".
step 102, input the embedding of each word vector into the BiLSTM module and extract, with the bidirectional LSTM model, the semantic expression vector of each word vector in its context;
step 103, decoding a semantic vector of the word vector by using softmax;
Step 104: decode the output of the BiLSTM module through the CRF module. For a label sequence L and a question W, the question W is given by the user and the label sequence L is the output for the given question W. In the CRF module, the given question W comprises n word vectors; each candidate label sequence is scored against a set of feature functions to obtain a transfer score, and the optimal prediction sequence is the label sequence with the maximum probability value. Let W = (w_1, w_2, w_3, …, w_n) and L = (l_1, l_2, l_3, …, l_n). The transfer score is calculated as:

score(L|W) = Σ_{k=1}^{K} λ_k Σ_{i=1}^{n} f_k(l_{i-1}, l_i, W, i)

wherein score(L|W) is the transfer score, f_k denotes a feature function, each feature function is given a weight λ_k, n is the length of the label sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the label sequence or the question;
The probability value of a label sequence is calculated as:

p(L|W) = exp(score(L|W)) / Σ_{L'} exp(score(L'|W))

wherein p(L|W) is the probability value of the label sequence, W is the question, L is the label sequence, L' ranges over all candidate label sequences, and score(L|W) is the transfer score.
Each word vector undergoes multi-class classification. As shown in FIG. 2, for the question "today is a good day", each word vector has 5 possible categories; however, decoding directly with softmax easily yields predictions inconsistent with the ordering of the sequence. In a correct entity sequence, B must precede I, and O must not occur between B and I. Because softmax ignores the ordering relation of the current token's context when decoding, and thus does not take a correct entity sequence into account, the decoded result may contain entity fragments in which I precedes B, or which contain only I without B. Therefore, when softmax is used to decode the semantic vector of a word vector, the correctness of the entity sequence must be considered at the same time.
In the invention, a CRF module is connected after the bidirectional LSTM model to improve the accuracy of entity-sequence recognition. During decoding, the CRF maintains a probability transition matrix and judges the label of the current token according to this matrix, avoiding the generation of entity fragments that do not satisfy the ordering requirement and completely solving the problem in step 103 that direct softmax decoding produces predictions inconsistent with the ordering of the sequence.
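The constrained decoding described above can be sketched as a small Viterbi search over the transition matrix, where a forbidden transition such as O to I receives a score of negative infinity and therefore can never appear in the decoded path; the emission and transition scores below are illustrative values, not trained parameters:

```python
def viterbi_decode(emissions, transitions, labels):
    # emissions[i][lab]: per-token score for label `lab` at position i
    # transitions[(prev, curr)]: transition score; disallowed pairs such
    # as ("O", "I") are set to -inf, so the decoder never selects them
    NEG_INF = float("-inf")
    best = {lab: (emissions[0].get(lab, NEG_INF), [lab]) for lab in labels}
    for i in range(1, len(emissions)):
        nxt = {}
        for curr in labels:
            score, path = max(
                (best[prev][0]
                 + transitions.get((prev, curr), NEG_INF)
                 + emissions[i].get(curr, NEG_INF),
                 best[prev][1])
                for prev in labels
            )
            nxt[curr] = (score, path + [curr])
        best = nxt
    score, path = max(best.values())
    return path
```

Because every candidate path is scored with the transition term included, label sequences such as "O I" are excluded by construction, which is exactly what plain softmax decoding cannot guarantee.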
In step 2, the entity identified in step 1 is linked to the attributes of related entities in the knowledge graph. The vector representations of the question and of the corresponding candidate relations or attributes are obtained through a bidirectional LSTM algorithm model, and the knowledge-graph entity to which the question links is determined by computing semantic similarity. Let the entity candidate word extracted from the question be W and the entity in the knowledge graph be Z, with W = (w_1, w_2, w_3, …, w_n), Z = (z_1, z_2, z_3, …, z_n), and N the dimension of the vectors corresponding to the entity W and the entity Z. The semantic similarity is calculated as:

cos(W, Z) = Σ_{j=1}^{N} w_j z_j / (sqrt(Σ_{j=1}^{N} w_j^2) · sqrt(Σ_{j=1}^{N} z_j^2))

wherein cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
Using a bidirectional LSTM to obtain the context information of the input sequence solves the problem that a unidirectional LSTM model cannot encode information from back to front.
In step 3, the unique entity determined in step 2 is used as the slot value. If, in the current round of questioning, the user continues with the entity of the previous round without restating it in the question, the slot value saved in the previous round of dialogue is inherited by loading the user's dialogue context, achieving a multi-round dialogue effect through slot inheritance.
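The slot-inheritance behaviour of step 3 can be sketched as follows, assuming the dialogue context is a simple per-session dictionary (the key name `entity_slot` is an assumption for illustration):

```python
def fill_slot(entity_in_question, dialogue_context):
    # If the current round's question contains no entity, inherit the
    # slot value saved from the previous round; otherwise store the new
    # entity so later rounds can inherit it.
    if entity_in_question is None:
        return dialogue_context.get("entity_slot")
    dialogue_context["entity_slot"] = entity_in_question
    return entity_in_question
```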
In step 4, as shown in FIG. 3, question intentions are recognized with a BERT-TextCNN model, which consists of a word embedding layer, a convolution layer, a pooling layer and a fully connected layer. The original input is first represented through token embedding, segment embedding and position embedding and fed into the BERT module to generate a word vector matrix. A convolution operation is performed in the convolution layer of the TextCNN module to generate feature maps; a max-pooling operation is then performed in the pooling layer of the TextCNN, i.e., only the maximum value of each feature obtained by convolution is kept, compressing the features while retaining the most important information. In the final fully connected layer, the intention classification result is output through a softmax activation function; the results lie between 0 and 1, and the maximum value is taken as the final intention.
The convolution layer of the TextCNN performs a convolution operation to generate a feature map; the features obtained are:

c_p = f(w · x_{p:p+h-1} + b)

wherein c = (c_1, c_2, …) is the feature extraction vector, p indexes the p-th word in the question, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, x_{p:p+h-1} denotes the stacked embeddings of words p through p+h-1, f is a nonlinear activation function, and b is a bias term.
The output of the fully connected layer is:

y = softmax(w_dense · (z ∘ r) + b_dense)

wherein y is the classification result, z is the pooled feature vector, r is a dropout mask applied element-wise (∘ denotes the Hadamard product), and w_dense and b_dense are respectively the weight parameters and the bias of the fully connected layer.
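A sketch of the last two stages of the BERT-TextCNN pipeline, max pooling followed by the fully connected softmax layer, is given below with hand-made feature maps and weights; the dropout mask is omitted for clarity and the intent names are illustrative:

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_intent(feature_maps, dense_w, dense_b, intents):
    # max pooling: keep only the largest activation of each feature map
    pooled = [max(fm) for fm in feature_maps]
    # fully connected layer followed by softmax over the intent classes
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    probs = softmax(logits)
    best = max(range(len(intents)), key=lambda i: probs[i])
    return intents[best], probs[best]
```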
In step 5, the final result value obtained in step 4 may be relatively small. During reply processing, the final result is compared against thresholds: if it lies between 0.4 and 0.8, the intention is unclear and clarification processing is required; if it is less than 0.4, the recognition confidence is too low, and the method jumps to step 7 to ensure accuracy.
In step 6, according to the final result value from step 4, if the final result is greater than 0.8, semantic understanding is complete and the accuracy is high; as shown in FIG. 4, the knowledge-graph query language Cypher can be used to return the answer directly.
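The routing implied by steps 5 and 6 can be summarized as a simple threshold function; how the exact boundary values 0.4 and 0.8 are treated is an assumption here, since the text does not specify the open or closed ends of the intervals:

```python
def route_by_confidence(score):
    # > 0.8: semantic understanding succeeded, answer directly with Cypher;
    # 0.4..0.8: intention unclear, ask a clarifying question (step 5);
    # < 0.4: confidence too low, fall back to subgraph recall (step 7)
    if score > 0.8:
        return "cypher_query"
    if score >= 0.4:
        return "clarify_intention"
    return "subgraph_recall"
```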
Step 7, if the intention identified in the step 5 is not in the range of the pre-designed intention, inquiring a plurality of triples associated with the entity according to the entity identified in the step 2, and completing the sub-graph recall of the entity;
Step 8: encode and rank the question and the subgraph paths from step 7 respectively; compare the ranked scores, return the path with the highest score as the answer, and perform step 6 to return the answer. Distributed representations are trained with the data sets, and the similarity between the question and the answer is calculated according to the Manhattan distance so that this similarity is as high as possible. Finally, the returned final answer with the highest score is obtained from the vector representations in the candidate answer group and the representation of the question input by the user.
In the present invention, the question is W = (w_1, w_2, …, w_n) and the answer is B = (b_1, b_2, …, b_n). The similarity between the question and the answer is calculated according to the Manhattan distance:

dist(W, B) = Σ_{q=1}^{n} |w_q − b_q|

wherein dist(W, B) is the Manhattan distance between vector W and vector B, w_q is the q-th component of vector W, b_q is the q-th component of vector B, and n is the number of components in vector W or vector B.
In steps 6 to 8, when the recognized intention data is poor, a group of candidate answers is found in the knowledge graph from the identified entities. The question input by the user is reduced from high dimension to low dimension: the question and the answers are mapped as vectors into a low-dimensional space to obtain distributed representations; these representations are then trained with a data set and the similarity between question and answer is calculated; finally, the returned final answer with the highest score is obtained from the vector representations in the candidate answer group and the representation of the question input by the user.
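The answer-ranking step can be sketched as computing the Manhattan distance between the question vector and each candidate-answer vector and returning the candidates best-first, where a smaller distance means higher similarity; the vectors are illustrative:

```python
def manhattan_distance(w, b):
    # dist(W, B) = sum_q |w_q - b_q|
    return sum(abs(x - y) for x, y in zip(w, b))

def rank_candidate_answers(question_vec, candidates):
    # candidates: list of (answer_text, answer_vec) pairs; sort ascending
    # by distance so the first element is the returned final answer
    return sorted(candidates,
                  key=lambda c: manhattan_distance(question_vec, c[1]))
```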
The invention also provides a question answering device based on the combination of semantic analysis and vector modeling, which comprises the following components:
the receiving module is used for receiving a question given by a user;
the named entity recognition module is used for preprocessing a question input by a user to obtain a high-quality word vector, obtain an optimal prediction sequence and obtain an entity;
the title confirmation module is used for linking a non-standard title to a standard title: when the entity identified by the named entity recognition module is a non-standard title, the identified entity is linked to the standard title and the entity is re-acquired;
the slot filling module is used for completing slot filling, taking the entity acquired by the title confirmation module as a slot, if the slot of the question sentence is identified as empty, loading the context of user conversation, inheriting the slot saved in the previous round of conversation, and completing slot filling;
the intention identification module is used for identifying the intention of the question;
the intention confirmation module is used for confirming the intention of the question: if the identified intention is unclear, a pre-designed semantic slot template and a clarification reply script are used to confirm the intention; if the intention is correct, the method jumps to the query module, and if the intention is incorrect, it jumps to the recall module;
the query module is used for completing answer query through a knowledge graph query language according to the entity and the intention of the question sentence;
and the recall module is used for inquiring the multiple triples related to the entity according to the entity identified by the title confirmation module for the unconfirmed intention, completing the sub-graph recall of the entity, respectively coding and sequencing the paths of the question and the sub-graph, comparing the sequenced scores, and returning the path with the highest score as an answer.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the question answering method based on the combination of semantic analysis and vector modeling when executing the program.
In conclusion, the invention adopts a BERT-BiLSTM-CRF model to identify question candidate words and uses cosine similarity to match the candidate words against knowledge-graph entities, realizing entity extraction. An entity-name dictionary is constructed from the entity names of the knowledge graph and the alias information of the entities, and each entity is linked to the unique entity in the knowledge graph through a BERT pre-trained model combined with dictionary matching, realizing entity linking; the question intention category and the question entity then form a question triple. If the recognized intention is not covered, all subgraphs under the entity data are recalled by vector modeling, the question and the subgraph paths are encoded and ranked respectively, the ranked scores are compared, the path with the highest score is returned, and the knowledge graph is queried through a Cypher graph-database query statement to obtain the answer.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. A question-answering method based on semantic parsing and vector modeling is characterized by comprising the following steps:
step 1, preprocessing a question input by a user to obtain a high-quality word vector, obtaining an optimal prediction sequence, completing named entity identification of the question and obtaining an entity;
step 2, if the entity identified in step 1 is not a standard title, linking the entity to the unique entity in the knowledge graph and re-acquiring the entity; if the entity identified in step 1 is a standard title, using that entity;
step 3, taking the only entity obtained in the step 2 as a slot position, if the slot position of the question turn is identified as empty, loading the context of the user conversation, inheriting the slot position saved in the previous turn of the conversation, and completing slot position filling;
step 4, completing the intention identification of the question;
step 5, if the intention recognition is unclear, using a pre-designed semantic slot template and a clarification reply script to confirm the intention; if the intention is correct, jumping to step 6, and if the intention is incorrect, jumping to step 7;
step 6, according to the entity and the intention of the question sentence, completing answer query through a knowledge graph query language;
step 7, if the identified intention is not in the range of the pre-designed intention, inquiring a plurality of triples associated with the entity according to the entity identified in the step 2, and completing the sub-graph recall of the entity;
and 8, coding and sequencing the paths of the question sentence and the subgraph in the step 7 respectively, comparing the sequenced scores, returning the path with the highest score as an answer, and finishing answer query.
2. The question-answering method based on semantic parsing combined with vector modeling according to claim 1, wherein the step 1 comprises: establishing a BERT-BiLSTM-CRF model; preprocessing the question with the BERT module, which takes the question as input and obtains word vectors through a bidirectional Transformer structure; calculating hidden information of the input with the BiLSTM module through a bidirectional LSTM; and decoding the output of the BiLSTM module with the CRF module to solve the optimal path and obtain the optimal prediction sequence.
3. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 2, wherein the step of completing named entity recognition of a question by the BERT-BiLSTM-CRF model specifically comprises:
step 101, inputting a question into the BERT module and obtaining a plurality of word vectors through the BERT module, denoted w_1, w_2, w_3, …, w_n, each word vector having a corresponding label;
step 102, inputting the embedding of each word vector into the BiLSTM module, and extracting a semantic representation vector of each word vector in its context using the bidirectional LSTM model;
step 103, decoding the semantic vector of each word vector using softmax;
step 104, decoding the output of the BiLSTM module through the CRF module, wherein for a label sequence L and a question W, the question W is given by the user and the label sequence L is the output for the given question W; in the CRF module, the given question W comprises n word vectors, each candidate label sequence is scored against a set of feature functions to obtain a transfer score, and the optimal prediction sequence is obtained as the label sequence with the maximum probability value; letting W = (w_1, w_2, w_3, …, w_n) and L = (l_1, l_2, l_3, …, l_n), the transfer score is calculated as:

score(L|W) = Σ_{k=1}^{K} λ_k Σ_{i=1}^{n} f_k(l_{i-1}, l_i, W, i)

wherein score(L|W) is the transfer score, f_k denotes a feature function, each feature function is given a weight λ_k, n is the length of the label sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the label sequence or the question;
the probability value of a label sequence is calculated as:

p(L|W) = exp(score(L|W)) / Σ_{L'} exp(score(L'|W))

wherein p(L|W) is the probability value of the label sequence, W is the question, L is the label sequence, L' ranges over all candidate label sequences, and score(L|W) is the transfer score.
4. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein the step 2 comprises: linking the entity identified in step 1 to the attributes of related entities in the knowledge graph; obtaining vector representations of the question and of the corresponding candidate relations or attributes through a bidirectional LSTM algorithm model; and determining the knowledge-graph entity to which the question links by computing semantic similarity, wherein the entity candidate word extracted from the question is W, the entity in the knowledge graph is Z, W = (w_1, w_2, w_3, …, w_n), Z = (z_1, z_2, z_3, …, z_n), and N is the dimension of the vectors corresponding to the entity W and the entity Z; the semantic similarity is calculated as:

cos(W, Z) = Σ_{j=1}^{N} w_j z_j / (sqrt(Σ_{j=1}^{N} w_j^2) · sqrt(Σ_{j=1}^{N} z_j^2))

wherein cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
5. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein the step 4 comprises: recognizing question intentions with a BERT-TextCNN model; representing the original input through token embedding, segment embedding and position embedding and inputting it into the BERT module to generate a word vector matrix; performing a convolution operation in the convolution layer of the TextCNN module to generate feature maps; performing a max-pooling operation in the pooling layer of the TextCNN; and, in the final fully connected layer, outputting the intention classification result through a softmax activation function, the results lying between 0 and 1, with the maximum value taken as the final intention;
the convolution layer of the TextCNN performs a convolution operation to generate a feature map, and the features obtained are:

c_p = f(w · x_{p:p+h-1} + b)

wherein c = (c_1, c_2, …) is the feature extraction vector, p indexes the p-th word in the question, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, x_{p:p+h-1} denotes the stacked embeddings of words p through p+h-1, f is a nonlinear activation function, and b is a bias term;
the output of the fully connected layer is:

y = softmax(w_dense · (z ∘ r) + b_dense)

wherein y is the classification result, z is the pooled feature vector, r is a dropout mask applied element-wise (∘ denotes the Hadamard product), and w_dense and b_dense are respectively the weight parameters and the bias of the fully connected layer.
6. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein: the final result value obtained in step 4 may be relatively small; during reply processing, the final result is compared against thresholds; if the result lies between 0.4 and 0.8, the intention is unclear, clarification processing is required, and the method jumps to step 6; if the result is less than 0.4, the recognition confidence is too low, and the method jumps to step 7 to ensure accuracy.
7. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 6, wherein: according to the final result value in step 4, if the final result is greater than 0.8, semantic understanding is complete and the accuracy is high, and the knowledge-graph query language Cypher can be used to return the answer directly.
8. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein: the question input by the user is reduced from high dimension to low dimension; the question and the answers are mapped into a low-dimensional space as vectors to obtain distributed representations of the question; the distributed representations are trained with a data set, and the similarity between the question and the answer is calculated according to the Manhattan distance so that this similarity is as high as possible; finally, the returned final answer with the highest score is obtained from the vector representations in the candidate answer group and the representation of the question input by the user;

the question is W = (w_1, w_2, …, w_n) and the answer is B = (b_1, b_2, …, b_n); the similarity between the question and the answer is calculated according to the Manhattan distance:

dist(W, B) = Σ_{q=1}^{n} |w_q − b_q|

wherein dist(W, B) is the Manhattan distance between vector W and vector B, w_q is the q-th component of vector W, b_q is the q-th component of vector B, and n is the number of components in vector W or vector B.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210275679.XA CN114896407B (en) | 2022-03-21 | 2022-03-21 | Question-answering method based on combination of semantic analysis and vector modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114896407A true CN114896407A (en) | 2022-08-12 |
CN114896407B CN114896407B (en) | 2024-07-26 |
Family
ID=82715878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210275679.XA Active CN114896407B (en) | 2022-03-21 | 2022-03-21 | Question-answering method based on combination of semantic analysis and vector modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896407B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018000277A1 (en) * | 2016-06-29 | 2018-01-04 | 深圳狗尾草智能科技有限公司 | Question and answer method and system, and robot |
CN109271506A (en) * | 2018-11-29 | 2019-01-25 | 武汉大学 | A kind of construction method of the field of power communication knowledge mapping question answering system based on deep learning |
Non-Patent Citations (1)
Title |
---|
程树东;胡鹰;: "基于BI-LSTM-CRF模型的限定领域知识库问答系统", 计算机与现代化, no. 07, 15 July 2018 (2018-07-15) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116244344A (en) * | 2022-11-25 | 2023-06-09 | 中国农业科学院农业信息研究所 | Retrieval method and device based on user requirements and electronic equipment |
CN116244344B (en) * | 2022-11-25 | 2023-09-05 | 中国农业科学院农业信息研究所 | Retrieval method and device based on user requirements and electronic equipment |
CN115982338A (en) * | 2023-02-24 | 2023-04-18 | 中国测绘科学研究院 | Query path ordering-based domain knowledge graph question-answering method and system |
CN117149966A (en) * | 2023-08-17 | 2023-12-01 | 内蒙古大学 | Question-answering method and system based on Roberta-DPCNN model |
CN118070812A (en) * | 2024-04-19 | 2024-05-24 | 深圳市中壬银兴信息技术有限公司 | Industry data analysis method and system based on NLP |
CN118070812B (en) * | 2024-04-19 | 2024-07-05 | 深圳市中壬银兴信息技术有限公司 | Industry data analysis method based on NLP |
CN118113855A (en) * | 2024-04-30 | 2024-05-31 | 浙江建木智能系统有限公司 | Ship test training scene question answering method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||