CN114896407A - Question-answering method based on combination of semantic analysis and vector modeling - Google Patents

Question-answering method based on combination of semantic analysis and vector modeling

Info

Publication number
CN114896407A
CN114896407A
Authority
CN
China
Prior art keywords
question
vector
entity
intention
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210275679.XA
Other languages
Chinese (zh)
Other versions
CN114896407B (en)
Inventor
马小林
周至春
旷海兰
刘新华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210275679.XA priority Critical patent/CN114896407B/en
Publication of CN114896407A publication Critical patent/CN114896407A/en
Application granted granted Critical
Publication of CN114896407B publication Critical patent/CN114896407B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/367 Ontology (creation of semantic tools, e.g. ontology or thesauri)
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/126 Character encoding
    • G06F40/194 Calculation of difference between files
    • G06F40/205 Parsing
    • G06F40/295 Named entity recognition
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a question-answering method based on the combination of semantic analysis and vector modeling. The method performs named entity recognition on a question, completes slot filling, recognizes the intention of the question, and confirms the recognized intention. If the intention lies within the pre-designed intention set, the answer is retrieved through a knowledge graph query language according to the entity and intention of the question. If the recognized intention is outside the pre-designed intention set, the triples associated with the entity recognized in step 2 are queried to recall the entity's subgraph; the question and the subgraph paths are then separately encoded and ranked, the ranked scores are compared, and the highest-scoring path is returned as the answer, completing the answer query. By understanding the input question and performing recall and ranking in combination with the knowledge graph, the method returns answers with high precision and can answer questions posed by the user in natural language with accurate and concise natural language.

Description

Question-answering method based on combination of semantic analysis and vector modeling
Technical Field
The invention relates to the technical field of natural language processing, in particular to a question-answering method based on combination of semantic analysis and vector modeling.
Background
With the rapid development of information technology, research in artificial intelligence has deepened, researchers have begun studying intelligent question-answering systems, and various data sets have appeared. The rapid development of the knowledge graph has provided a new knowledge source for intelligent question answering. A knowledge graph is a semantic network: its nodes represent entities or concepts of related knowledge, and its directed edges represent the relations between entities. It visualizes the relations among entities and expresses the relations among data intuitively, which matches human cognitive habits, so knowledge-graph-based intelligent question answering has attracted the attention of many researchers.
A knowledge-graph question-answering system obtains answers by semantically understanding and parsing the question and then querying and reasoning over the knowledge base; the answers are retrieved from the knowledge graph's data with high accuracy, and performance is evaluated with three metrics: recall, precision, and F1. Related techniques for knowledge-graph question answering include semantic parsing, information retrieval, and vector modeling. Question answering based on semantic parsing converts natural language into a series of formal logic forms and analyzes them bottom-up to obtain a logic form expressing the semantics of the whole question, which is turned into a corresponding query over the knowledge graph; however, the precision of answers returned by this method is not high. Methods based on vector modeling treat knowledge base question answering as a semantic matching process: representation learning maps the knowledge graph and the user question into numerical vectors in a low-dimensional space, and the answer most semantically similar to the question is matched directly by numerical computation. In other words, the question-answering task becomes computing the similarity between the semantic vector of the question and the semantic vectors of entities and edges in the knowledge base.
Most existing question-answering methods focus on simple questions involving only one entity and one relation; the common solution maps the question to a triple query over the knowledge graph to obtain the answer. For complex questions involving multiple entities and relations, however, common KBQA methods do not work well.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a question-answering method based on the combination of semantic parsing and vector modeling, which understands the input question, performs recall and ranking in combination with a knowledge graph, finally returns an answer with high precision, and can answer questions posed by the user in natural language accurately and concisely.
In order to solve the technical problem, the technical scheme adopted by the invention is as follows:
A question-answering method based on the combination of semantic parsing and vector modeling comprises the following steps:
step 1, preprocessing the question input by the user to obtain high-quality word vectors, obtaining the optimal predicted sequence, completing named entity recognition of the question, and obtaining an entity;
step 2, if the entity recognized in step 1 is not a standard title, linking the entity to the unique entity in the knowledge graph and re-acquiring the entity; if the entity recognized in step 1 is a standard title, using that entity;
step 3, using the unique entity obtained in step 2 as the slot value; if the slot of the current-turn question is recognized as empty, loading the context of the user dialogue and inheriting the slot saved in the previous turn of dialogue, completing slot filling;
step 4, completing intention recognition of the question;
step 5, if the recognized intention is unclear, using a pre-designed semantic slot template and a clarifying reply to confirm the intention; if the intention is correct, jumping to step 6, and if incorrect, jumping to step 7;
step 6, according to the entity and intention of the question, completing the answer query through the knowledge graph query language;
step 7, if the recognized intention is not within the range of pre-designed intentions, querying the triples associated with the entity recognized in step 2, completing the subgraph recall of the entity;
step 8, encoding and ranking the question and the subgraph paths from step 7 respectively, comparing the ranked scores, returning the highest-scoring path as the answer, and completing the answer query.
Further, step 1 comprises: establishing a BERT-BiLSTM-CRF model; preprocessing the question with the BERT module by inputting the question into the BERT model and obtaining word vectors through the bidirectional Transformer structure; computing hidden information of the input with the BiLSTM module through a bidirectional LSTM; and decoding the output of the BiLSTM module with the CRF module, solving the optimal path and obtaining the optimal predicted sequence.
Further, the step of completing named entity recognition of the question with the BERT-BiLSTM-CRF model specifically comprises the following steps:
step 101, inputting the question into the BERT module and obtaining word vectors, denoted w_1, w_2, w_3, ..., w_n, each word vector having a corresponding label;
step 102, inputting the embedding of each word vector into the BiLSTM module, and extracting the semantic representation vector of each word vector in its context with the bidirectional LSTM model;
step 103, decoding the semantic vector of each word vector with softmax;
step 104, decoding the output of the BiLSTM module through the CRF module. A labeling sequence L and a question W are given, where the question W is given by the user and the labeling sequence L is the output for the given W; in the CRF module, the given question W comprises n word vectors, the labeling sequence is scored according to a set of feature functions to obtain a transfer score, and the optimal predicted sequence is the labeling sequence with the maximum probability value. Let W = (w_1, w_2, w_3, ..., w_n) and L = (l_1, l_2, l_3, ..., l_n); the transfer score is calculated as:

score(L|W) = Σ_{i=1}^{n} Σ_{k=1}^{K} λ_k · f_k(l_{i-1}, l_i, W, i)

wherein score(L|W) is the transfer score, f_k denotes a feature function, each feature function is given a weight λ_k, n is the length of the labeling sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the labeling sequence or question;
the probability value of the labeling sequence is calculated as:

p(L|W) = exp(score(L|W)) / Σ_{L'} exp(score(L'|W))

wherein p(L|W) is the probability value of the labeling sequence, W is the question, L is the labeling sequence, score(L|W) is the transfer score, and the sum in the denominator runs over all candidate labeling sequences L'.
Further, step 2 comprises: linking the entity recognized in step 1 to the attributes of the related entity in the knowledge graph; obtaining vector representations of the question and the corresponding candidate relations or attributes through a bidirectional LSTM model; and determining the entity of the knowledge graph to which the question links by computing semantic similarity. Let the entity candidate word extracted from the question be W and the entity in the knowledge graph be Z, with Z = (z_1, z_2, z_3, ..., z_N), where N is the dimension of the vectors corresponding to entity W and entity Z; the semantic similarity is calculated as:

cos(W, Z) = ( Σ_{j=1}^{N} w_j · z_j ) / ( sqrt( Σ_{j=1}^{N} w_j^2 ) · sqrt( Σ_{j=1}^{N} z_j^2 ) )

wherein cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
Further, step 4 comprises: recognizing the question intention with a BERT-TextCNN model. The original input is represented by token embedding, segment embedding and position embedding and input into the BERT module to generate a word vector matrix; the convolution layer of the TextCNN module performs a convolution operation to generate feature maps; the pooling layer of the TextCNN performs max pooling; and the final fully connected layer outputs the intention classification result through a softmax activation function, whose outputs lie between 0 and 1, the maximum value being taken as the final intention;
the convolution layer of the TextCNN performs the convolution operation to generate the feature map, and the extracted feature is:

c_p = f( w · x_{p:p+h-1} + b )

wherein c_p is the p-th component of the feature extraction vector c, p denotes the p-th word in the question, x_{p:p+h-1} is the matrix of word vectors for words p through p+h-1, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, b is a bias term, and f is a nonlinear activation;
the output of the fully connected layer is:

y = softmax( w_dense · (z ∘ r) + b_dense )

wherein y is the classification result, z is the pooled feature vector, r is a dropout mask (∘ denotes the element-wise product), and w_dense and b_dense are the weight and bias of the fully connected layer, respectively.
Further, the final result value obtained in step 4 may be relatively small. During reply processing the final result is compared against thresholds: if the result is between 0.4 and 0.8, the intention is unclear, clarification is required, and after confirmation the method jumps to step 6; if the result is less than 0.4, the recognition confidence is too low, and the method jumps to step 7 to ensure accuracy.
Further, according to the final result value of step 4, if the final result is greater than 0.8, semantic understanding is complete and the accuracy is high, and the knowledge graph query language Cypher can be used to return the answer directly.
Further, the question input by the user is reduced from high dimension to low dimension: the question and the answers are mapped into a low-dimensional space to obtain distributed representations of the question, which are trained on a data set; the similarity between the question and the answer is computed from the Manhattan distance so that the similarity between a question and its answer is as high as possible; and the final answer returned is the candidate with the highest score, given the vector representations of the candidate answer set and of the question input by the user;

let the question be W = (w_1, w_2, ..., w_n) and the answer be B = (b_1, b_2, ..., b_n); the similarity between the question and the answer is computed from the Manhattan distance as:

dist_man(W, B) = Σ_{q=1}^{n} |w_q − b_q|

wherein dist_man(W, B) is the Manhattan distance between vector W and vector B, w_q is the q-th component of vector W, b_q is the q-th component of vector B, and n is the number of components in vector W or vector B.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The question-answering method based on the combination of semantic parsing and vector modeling understands the input question, recalls and ranks in combination with the knowledge graph, and finally returns an answer with high precision.
2. The method realizes multi-turn question answering by using slot inheritance and intention inheritance.
3. The method improves the accuracy of answers by using intention induction and semantic slot design, and on this basis improves the coverage of answers by combining subgraph recall.
Drawings
FIG. 1 is a flow chart of a question-answering method based on the combination of semantic analysis and vector modeling;
FIG. 2 is a model diagram of BERT-BiLSTM-CRF;
FIG. 3 is a BERT-TextCNN intent recognition model;
FIG. 4 is a block diagram based on vector modeling.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in FIG. 1, a question-answering method based on the combination of semantic parsing and vector modeling comprises:
step 1, preprocessing the question input by the user to obtain high-quality word vectors, obtaining the optimal predicted sequence, completing named entity recognition of the question, and obtaining an entity;
step 2, if the entity recognized in step 1 is not a standard title, linking the entity to the unique entity in the knowledge graph and re-acquiring the entity; if the entity recognized in step 1 is a standard title, using that entity;
step 3, using the unique entity obtained in step 2 as the slot value; if the slot of the current-turn question is recognized as empty, loading the context of the user dialogue and inheriting the slot saved in the previous turn of dialogue, completing slot filling;
step 4, completing intention recognition of the question;
step 5, if the recognized intention is unclear, using a pre-designed semantic slot template and a clarifying reply to confirm the intention; if the intention is correct, jumping to step 6, and if incorrect, jumping to step 7;
step 6, according to the entity and intention of the question, completing the answer query through the knowledge graph query language;
step 7, if the recognized intention is not within the range of pre-designed intentions, querying the triples associated with the entity recognized in step 2, completing the subgraph recall of the entity;
step 8, encoding and ranking the question and the subgraph paths from step 7 respectively, comparing the ranked scores, returning the highest-scoring path as the answer, and jumping to step 6 to return the answer.
The invention finally returns an answer with high precision by understanding the input question and performing recall and ranking in combination with the knowledge graph, and can answer questions posed by the user in natural language with accurate and concise natural language.
In the traditional question-answer form, the answer given to a user's question contains the handling for every possible case, so the user must read a large amount of content to find the desired answer within it. Multi-turn question answering, by contrast, can ask the user follow-up questions and give the desired answer once the user's intention is clear. The invention realizes multi-turn question answering by using slot inheritance and intention inheritance, improves the accuracy of answers by using intention induction and semantic slot design, and on this basis improves the coverage of answers by combining recall and ranking.
The invention generates a core reasoning chain from the entity and the queried intention, then generates the final query graph by adding constraints, thereby converting semantic parsing into query graph generation, and finally converts the query graph into a knowledge graph query. At the same time, to improve accuracy, clarification replies are used to confirm weak intentions, as the following embodiment shows.
in one embodiment of the invention, an example of a medical answer is given.
The user: what is the myocarditis? { intent: defining; entity: myocarditis };
the robot comprises: returning a definition of a myocarditis answer;
the user: how to treat that? { intent: a method of treatment; entity: none };
the robot comprises: according to the user conversation history, identifying the slot position of the question sentence in the turn as empty, loading the context of the user conversation, inheriting the slot position saved in the previous turn of conversation, and completing slot position filling { entity: myocarditis };
the user: how long it takes for rehabilitation? { intent: treatment time (weaker), entity: none };
the robot comprises: asking you to ask about the treatment time of "myocarditis"? { clarification recovery strategy };
the user: is; { user positive answer }
The robot comprises: the answer is made as required.
The user: what is the common onset time? { intent: the period of onset (very weak or no intention set); entity: none };
the robot comprises: according to the user conversation history, inheriting entities of the previous conversation { entity: myocarditis };
the robot comprises: asking you to ask about the onset period of "myocarditis"? { clarification reply strategy };
the user: is; { user-affirmative answer }
The robot comprises: the answer is made as required.
In step 1, named entity recognition of the question is completed through the BERT-BiLSTM-CRF model. Specifically, the BERT module preprocesses the question input by the user to obtain high-quality word vectors; the word vectors are input into the BiLSTM module for further processing; the output of the BiLSTM module is decoded by the CRF module, which computes the relations between adjacent labels probabilistically; and the optimal predicted sequence is obtained from the highest-probability score, completing named entity recognition of the question.
The method combines the BERT, BiLSTM and CRF modules into a BERT-BiLSTM-CRF model and uses it to perform named entity recognition on the question input by the user, identifying the entities in the question. Its greatest advantage is that the BERT module preprocesses the question, so feature vectors need not be trained in advance: the question is simply input into the BERT model and word vectors are obtained through the bidirectional Transformer structure; the BiLSTM module computes hidden information of the input through a bidirectional LSTM; and the CRF module decodes the output of the BiLSTM module, solving the optimal path and obtaining the text labels.
The BERT module is a bidirectional encoder based on the multi-layer Transformer model. The Transformer is essentially an encoder-decoder structure composed of 6 encoders and 6 decoders, and each encoder block consists of a feed-forward neural network (FFNN) and a multi-head attention mechanism, so that the representation of each word integrates information from both its left and right context.
In the invention, BERT pre-training includes two tasks. One is the Masked Language Model (MLM), which can be understood as a cloze task: 15% of the words in each question are randomly masked, and the meanings of those words are predicted from context. The other is Next Sentence Prediction (NSP): two sentences A and B are randomly taken from a document, and the model judges whether B follows A.
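By way of illustration, the MLM masking step can be sketched as follows; this is a minimal sketch rather than the patent's implementation, and the pre-split token list and helper name are assumptions:

```python
import random

def mask_for_mlm(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly mask a fraction of the tokens and return the masked
    sequence plus (position, original token) prediction targets."""
    masked, targets = list(tokens), []
    # Mask 15% of positions, as described for BERT's MLM pre-training task.
    n_mask = max(1, round(len(tokens) * mask_rate))
    for pos in random.sample(range(len(tokens)), n_mask):
        targets.append((pos, tokens[pos]))
        masked[pos] = mask_token
    return masked, targets

tokens = ["what", "is", "myocarditis", "?"]
masked, targets = mask_for_mlm(tokens)
print(masked, targets)  # e.g. ['what', 'is', '[MASK]', '?'] and [(2, 'myocarditis')]
```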
With BERT, features, i.e. embedding vectors of words and sentences, can be extracted from text data. Feature embeddings were previously generated by Word2Vec, but under Word2Vec each word has a fixed representation regardless of the context in which it appears. The word representations generated by BERT are instead dynamically informed by the surrounding words: besides capturing obvious differences such as polysemy, these context-dependent word embeddings capture other forms of information, producing more accurate feature representations, improving model performance, and yielding high-quality word vectors.
As for the BiLSTM model: LSTM is a special recurrent neural network (RNN), and the BiLSTM model acquires the context information of the input sequence through a forward LSTM and a backward LSTM, solving the problem that a single LSTM cannot encode information from back to front.
The BiLSTM module extracts the semantic representation vector of each word in its context, and softmax is then used to decode the semantic vector of each word, i.e. to multi-classify each word. However, the BiLSTM model cannot mine the latent relation between the current information and its context, so the method adds an attention mechanism after the BiLSTM model to extract latent semantic relevance in the text.
The CRF module is a special Markov random field: in a Markov random field, the assignment at a position depends only on the assignments of adjacent positions, not on non-adjacent ones. On this basis, the CRF module assumes the whole random field has only two variables: the labeling sequence L and the question W, where W is generally given by the user and L is the output conditioned on the given W.
The step of completing named entity recognition of the question through the BERT-BiLSTM-CRF model specifically comprises the following steps:
step 101, inputting the question into the BERT module and obtaining word vectors, denoted w_1, w_2, w_3, ..., w_n, each word vector having a corresponding label that serves as the model's training target;
for example, for the question "today is a good day", w_1 represents "today", w_2 represents "is", w_3 represents "a", w_4 represents "good", and w_5 represents "day".
step 102, inputting the embedding of each word vector into the BiLSTM module, and extracting the semantic representation vector of each word vector in its context with the bidirectional LSTM model;
step 103, decoding the semantic vector of each word vector with softmax;
step 104, decoding the output of the BiLSTM module through the CRF module. A labeling sequence L and a question W are given, where the question W is given by the user and the labeling sequence L is the output for the given W; in the CRF module, the given question W comprises n word vectors, the labeling sequence is scored according to a set of feature functions to obtain a transfer score, and the optimal predicted sequence is the labeling sequence with the maximum probability value. Let W = (w_1, w_2, w_3, ..., w_n) and L = (l_1, l_2, l_3, ..., l_n); the transfer score is calculated as:

score(L|W) = Σ_{i=1}^{n} Σ_{k=1}^{K} λ_k · f_k(l_{i-1}, l_i, W, i)

wherein score(L|W) is the transfer score, f_k denotes a feature function, each feature function is given a weight λ_k, n is the length of the labeling sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the labeling sequence or question;
the probability value of the labeling sequence is calculated as:

p(L|W) = exp(score(L|W)) / Σ_{L'} exp(score(L'|W))

wherein p(L|W) is the probability value of the labeling sequence, W is the question, L is the labeling sequence, score(L|W) is the transfer score, and the sum in the denominator runs over all candidate labeling sequences L'.
Each word vector is multi-classified. As shown in FIG. 2, for the question "today is a good day", each word vector has 5 possible categories; but decoding directly with softmax easily produces predictions that violate the sequence order. In a correct entity sequence, a B label must precede an I label, and an O label cannot occur between B and I. Because softmax ignores the sequential relation of the current token's context when decoding, the decoded result may contain entity fragments in which I appears before B, or in which only I appears without B. Therefore, when decoding the semantic vectors of the word vectors with softmax, the correct entity ordering must be considered at the same time.
In the invention, a CRF module is connected after the bidirectional LSTM model to improve the accuracy of entity sequence recognition. During decoding, the CRF maintains a probability transition matrix and judges the label of the current token according to it, avoiding entity fragments that violate the ordering requirement and fully solving the problem that decoding directly with softmax in step 103 yields predictions inconsistent with the sequence order.
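A minimal sketch of such transition-constrained decoding follows; the emission scores and the specific transition weights are illustrative assumptions, and only the rule that O cannot be followed by I comes from the description above:

```python
import numpy as np

LABELS = ["B", "I", "O"]
NEG = -1e9  # effectively forbids a transition

# Transition matrix T[i][j]: score of moving from label i to label j.
# The O -> I entry is forbidden, matching the BIO ordering rule above;
# the remaining weights are illustrative.
T = np.array([[0.5, 1.0, 0.2],   # from B
              [0.3, 0.8, 0.4],   # from I
              [0.6, NEG, 0.7]])  # from O (O -> I disallowed)

def viterbi(emissions):
    """emissions: (seq_len, n_labels) scores, e.g. from the BiLSTM.
    Returns the highest-scoring label sequence under T."""
    n, k = emissions.shape
    dp = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = dp[:, None] + T + emissions[t][None, :]  # cand[i, j]: come from i, emit j
        back[t] = cand.argmax(axis=0)
        dp = cand.max(axis=0)
    path = [int(dp.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [LABELS[i] for i in reversed(path)]

emissions = np.array([[0.1, 0.2, 2.0],   # token looks like O
                      [0.2, 1.5, 0.3],   # token looks like I, but O -> I is blocked
                      [1.2, 0.1, 0.4]])
print(viterbi(emissions))  # the decoded path never places I immediately after O
```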
In step 2, the entity recognized in step 1 is linked to the attributes of the related entity in the knowledge graph. Vector representations of the question and of the corresponding candidate relations or attributes can be obtained through a bidirectional LSTM model, and the entity of the knowledge graph to which the question links is determined by computing semantic similarity. Let the entity candidate word extracted from the question be W and the entity in the knowledge graph be Z, with Z = (z_1, z_2, z_3, ..., z_N), where N is the dimension of the vectors corresponding to entity W and entity Z; the semantic similarity is calculated as:

cos(W, Z) = ( Σ_{j=1}^{N} w_j · z_j ) / ( sqrt( Σ_{j=1}^{N} w_j^2 ) · sqrt( Σ_{j=1}^{N} z_j^2 ) )

wherein cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
Using a bidirectional LSTM to obtain the context information of the input sequence solves the problem that the LSTM model cannot encode information from back to front.
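A minimal sketch of this linking step follows; the mention and entity vectors are made-up stand-ins for the outputs of the bidirectional LSTM encoder described above:

```python
import numpy as np

def cosine(w, z):
    """cos(W, Z) = sum_j w_j * z_j / (|W| * |Z|)."""
    return float(np.dot(w, z) / (np.linalg.norm(w) * np.linalg.norm(z)))

def link_entity(mention_vec, kg_entities):
    """Return the knowledge graph entity most similar to the mention."""
    return max(kg_entities, key=lambda name: cosine(mention_vec, kg_entities[name]))

# Made-up encoder outputs for one mention and two knowledge graph entities.
mention = np.array([0.9, 0.1, 0.3])
kg = {"myocarditis":           np.array([0.8, 0.2, 0.4]),
      "myocardial infarction": np.array([0.1, 0.9, 0.2])}
print(link_entity(mention, kg))  # myocarditis
```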
In step 3, the unique entity determined in step 2 is used as the slot value. If, in the current turn, the user asks a follow-up question about the entity of the previous turn without mentioning the entity in the question, the slot value saved in the previous turn of dialogue is inherited by loading the user dialogue context, realizing a multi-turn dialogue effect through slot inheritance.
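A minimal sketch of slot inheritance over a dialogue state follows; the state layout and field names are assumptions made for illustration:

```python
def fill_slots(recognized_entity, session_state):
    """Use the recognized entity as the slot value; if the current turn's
    slot is empty, inherit the slot saved from the previous turn."""
    if recognized_entity is not None:
        session_state["entity"] = recognized_entity  # overwrite with the new slot value
    # otherwise keep session_state["entity"] from the previous turn (slot inheritance)
    return session_state.get("entity")

state = {}
print(fill_slots("myocarditis", state))  # turn 1 -> 'myocarditis'
print(fill_slots(None, state))           # turn 2, "how is it treated?" -> inherits 'myocarditis'
```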
In step 4, as shown in FIG. 3, the question intention is recognized with a BERT-TextCNN model consisting of a word embedding layer, a convolution layer, a pooling layer, and a fully connected layer. The original input is first represented by token embedding, segment embedding and position embedding and input into the BERT module to generate a word vector matrix; the convolution layer of the TextCNN module performs a convolution operation to generate feature maps; the pooling layer of the TextCNN then performs max pooling, i.e. only the maximum value of each feature obtained by the convolution is kept, compressing the features while retaining the most important information; and the final fully connected layer outputs the intention classification result through a softmax activation function, whose outputs lie between 0 and 1, the maximum value being taken as the final intention.
The convolution layer of the TextCNN performs the convolution operation to generate the feature map, and the extracted feature is:

c_p = f( w · x_{p:p+h-1} + b )

wherein c_p is the p-th component of the feature extraction vector c, p denotes the p-th word in the question, x_{p:p+h-1} is the matrix of word vectors for words p through p+h-1, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, b is a bias term, and f is a nonlinear activation;
the output of the fully connected layer is:

y = softmax( w_dense · (z ∘ r) + b_dense )

wherein y is the classification result, z is the pooled feature vector, r is a dropout mask (∘ denotes the element-wise product), and w_dense and b_dense are the weight and bias of the fully connected layer, respectively.
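A minimal PyTorch sketch of the TextCNN head described above follows; hyperparameters such as the kernel heights, filter count, and intention count are illustrative assumptions, and the random input stands in for the BERT word vector matrix:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNHead(nn.Module):
    def __init__(self, emb_dim=768, n_filters=64, heights=(2, 3, 4), n_intents=10):
        super().__init__()
        # One 2-D kernel per height h; each kernel spans the full embedding width d.
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, n_filters, (h, emb_dim)) for h in heights])
        self.dropout = nn.Dropout(0.5)  # supplies the mask r in y = softmax(w_dense . (z o r) + b_dense)
        self.fc = nn.Linear(n_filters * len(heights), n_intents)

    def forward(self, word_vectors):              # (batch, seq_len, emb_dim) from BERT
        x = word_vectors.unsqueeze(1)              # (batch, 1, seq_len, emb_dim)
        # c_p = f(w . x_{p:p+h-1} + b), then max pooling over the positions p
        feats = [F.relu(conv(x)).squeeze(3).max(dim=2).values for conv in self.convs]
        z = torch.cat(feats, dim=1)                # pooled feature vector z
        return F.softmax(self.fc(self.dropout(z)), dim=1)

model = TextCNNHead().eval()                       # eval() disables dropout for inference
scores = model(torch.randn(1, 16, 768))            # a dummy 16-token question
print(scores.argmax(dim=1).item(), scores.max().item())  # predicted intention and its confidence
```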
In step 5, the final result value obtained in step 4 may be relatively small. During reply processing the final result is compared against thresholds: if the result is between 0.4 and 0.8, the intention is unclear and clarification is required; if the result is less than 0.4, the recognition confidence is too low, and the method jumps to step 7 to ensure accuracy.
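A minimal sketch of this threshold dispatch follows; the 0.4 and 0.8 thresholds come from the description, while the function names and stub bodies are illustrative:

```python
def query_knowledge_graph(entity, intent):   # step 6 (stub for illustration)
    return f"answer({entity}, {intent})"

def clarify_intent(entity, intent):          # step 5 clarification reply (stub)
    return f"Do you want to ask about the {intent} of '{entity}'?"

def subgraph_recall_and_rank(entity):        # step 7 vector-modeling fallback (stub)
    return f"best-ranked subgraph path for {entity}"

def dispatch(intent, confidence, entity):
    """Route the turn by the intent classifier's confidence,
    using the 0.4 / 0.8 thresholds described above."""
    if confidence > 0.8:
        return query_knowledge_graph(entity, intent)
    if confidence >= 0.4:
        return clarify_intent(entity, intent)
    return subgraph_recall_and_rank(entity)

print(dispatch("treatment time", 0.65, "myocarditis"))  # falls in (0.4, 0.8) -> clarification
```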
In step 6, according to the final result value of step 4, if the final result is greater than 0.8, semantic understanding is complete and the accuracy is high; as shown in FIG. 4, the knowledge graph query language Cypher can be used to return the answer directly.
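A minimal sketch of such a direct query through the Neo4j Python driver follows; the graph schema (the Disease and Info labels) and the connection details are assumptions for illustration, and only the use of Cypher is given by the description:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def answer(entity, intent):
    # Hypothetical schema: (:Disease {name}) --> (:Info {type, text})
    cypher = (
        "MATCH (d:Disease {name: $name})-->(a:Info {type: $intent}) "
        "RETURN a.text AS answer LIMIT 1"
    )
    with driver.session() as session:
        record = session.run(cypher, name=entity, intent=intent).single()
        return record["answer"] if record else None

print(answer("myocarditis", "definition"))
```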
In step 7, if the intention recognized in step 5 is not within the range of pre-designed intentions, the triples associated with the entity recognized in step 2 are queried, completing the subgraph recall of the entity.
step 8, coding and sequencing the question and the path of the subgraph in the step 7 respectively; and (3) comparing the sorted scores, returning the path with the highest score as an answer, repeatedly executing the step (6), and returning the answer. The distributed expressions are trained by using the data sets, and the similarity between the question and the answer is calculated according to the Manhattan distance, so that the similarity between the question and the answer is as high as possible. And finally, obtaining a returned final answer with the highest score according to vector representation in the candidate answer group and the expression of the question input by the user.
In the invention, let the question be W = (w_1, w_2, ..., w_n) and the answer be B = (b_1, b_2, ..., b_n); the similarity between the question and the answer is computed from the Manhattan distance as:

dist_man(W, B) = Σ_{q=1}^{n} |w_q − b_q|

wherein dist_man(W, B) is the Manhattan distance between vector W and vector B, w_q is the q-th component of vector W, b_q is the q-th component of vector B, and n is the number of components in vector W or vector B.
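A minimal sketch of ranking candidate answer paths by Manhattan distance follows; the vectors are made up, and since a smaller distance means higher similarity, the closest candidate is returned:

```python
import numpy as np

def manhattan(w, b):
    """dist_man(W, B) = sum_q |w_q - b_q|."""
    return float(np.abs(np.asarray(w) - np.asarray(b)).sum())

def best_answer(question_vec, candidate_paths):
    """Return the candidate subgraph path whose encoding is closest to the question."""
    return min(candidate_paths, key=lambda p: manhattan(question_vec, candidate_paths[p]))

question = [0.2, 0.7, 0.1]
candidate_paths = {"myocarditis - treatment - rest and medication": [0.3, 0.6, 0.2],
                   "myocarditis - symptom - chest pain":            [0.9, 0.1, 0.8]}
print(best_answer(question, candidate_paths))  # the treatment path wins
```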
In steps 6 to 8, when the recognized intention is weak, a group of candidate answers is found in the knowledge graph from the recognized entities; the user's question is reduced from high dimension to low dimension, the question and the answers are mapped into a low-dimensional space as vectors to obtain distributed representations, the representations are trained on a data set, the similarity between the question and the answers is computed, and the final answer returned is the candidate with the highest score, given the vector representations of the candidate answer set and of the question input by the user.
The invention also provides a question-answering device based on the combination of semantic parsing and vector modeling, comprising:
a receiving module for receiving the question posed by the user;
a named entity recognition module for preprocessing the question input by the user to obtain high-quality word vectors, obtain the optimal predicted sequence, and obtain an entity;
a title confirmation module for linking a non-standard title to a standard title: when the entity recognized by the named entity recognition module is a non-standard title, the entity is linked to the standard title and re-acquired;
a slot filling module for completing slot filling: the entity acquired by the title confirmation module is used as the slot value, and if the slot of the current question is recognized as empty, the context of the user dialogue is loaded and the slot saved in the previous turn of dialogue is inherited;
an intention recognition module for recognizing the intention of the question;
an intention confirmation module for confirming the intention of the question: if the intention is not clearly recognized, a pre-designed semantic slot template and a clarifying reply are used to confirm it; if the intention is correct the query module is invoked, and if incorrect the recall module is invoked;
a query module for completing the answer query through the knowledge graph query language according to the entity and intention of the question;
and a recall module which, for unconfirmed intentions, queries the triples associated with the entity recognized by the title confirmation module, completes the subgraph recall of the entity, encodes and ranks the question and subgraph paths respectively, compares the ranked scores, and returns the highest-scoring path as the answer.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of the question-answering method based on the combination of semantic parsing and vector modeling.
In conclusion, the invention adopts the BERT-BiLSTM-CRF model to recognize question candidate words and uses cosine similarity to match the question candidate words with knowledge graph entities, realizing entity extraction. An entity name dictionary is built from the knowledge graph's entity names and alias information, and the entity is linked to the unique entity in the knowledge graph through the BERT pre-trained model and dictionary matching, realizing entity linking; the question intention categories and question entities then form question triples. If the recognized intention is not covered, all subgraphs under the entity's data are recalled by vector modeling, the question and subgraph paths are encoded and ranked respectively, the ranked scores are compared, the highest-scoring path is returned, and the answer is obtained by querying the knowledge graph with a Cypher graph-database query statement.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A question-answering method based on the combination of semantic parsing and vector modeling, characterized by comprising the following steps:
step 1, preprocessing the question input by the user to obtain high-quality word vectors, obtaining the optimal predicted sequence, completing named entity recognition of the question, and obtaining an entity;
step 2, if the entity recognized in step 1 is not a standard title, linking the entity to the unique entity in the knowledge graph and re-acquiring the entity; if the entity recognized in step 1 is a standard title, using that entity;
step 3, using the unique entity obtained in step 2 as the slot value; if the slot of the current-turn question is recognized as empty, loading the context of the user dialogue and inheriting the slot saved in the previous turn of dialogue, completing slot filling;
step 4, completing intention recognition of the question;
step 5, if the recognized intention is unclear, using a pre-designed semantic slot template and a clarifying reply to confirm the intention; if the intention is correct, jumping to step 6, and if incorrect, jumping to step 7;
step 6, according to the entity and intention of the question, completing the answer query through the knowledge graph query language;
step 7, if the recognized intention is not within the range of pre-designed intentions, querying the triples associated with the entity recognized in step 2, completing the subgraph recall of the entity;
step 8, encoding and ranking the question and the subgraph paths from step 7 respectively, comparing the ranked scores, returning the highest-scoring path as the answer, and completing the answer query.
2. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein step 1 comprises: establishing a BERT-BiLSTM-CRF model; preprocessing the question with the BERT module by inputting the question into the BERT model and obtaining word vectors through the bidirectional Transformer structure; computing hidden information of the input with the BiLSTM module through a bidirectional LSTM; and decoding the output of the BiLSTM module with the CRF module, solving the optimal path and obtaining the optimal predicted sequence.
3. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 2, wherein the step of completing named entity recognition of the question with the BERT-BiLSTM-CRF model specifically comprises:
step 101, inputting the question into the BERT module and obtaining word vectors, denoted w_1, w_2, w_3, ..., w_n, each word vector having a corresponding label;
step 102, inputting the embedding of each word vector into the BiLSTM module, and extracting the semantic representation vector of each word vector in its context with the bidirectional LSTM model;
step 103, decoding the semantic vector of each word vector with softmax;
step 104, decoding the output of the BiLSTM module through the CRF module. A labeling sequence L and a question W are given, where the question W is given by the user and the labeling sequence L is the output for the given W; in the CRF module, the given question W comprises n word vectors, the labeling sequence is scored according to a set of feature functions to obtain a transfer score, and the optimal predicted sequence is the labeling sequence with the maximum probability value. Let W = (w_1, w_2, w_3, ..., w_n) and L = (l_1, l_2, l_3, ..., l_n); the transfer score is calculated as:

score(L|W) = Σ_{i=1}^{n} Σ_{k=1}^{K} λ_k · f_k(l_{i-1}, l_i, W, i)

wherein score(L|W) is the transfer score, f_k denotes a feature function, each feature function is given a weight λ_k, n is the length of the labeling sequence, K is the number of feature functions, k indexes the feature functions, and i indexes the word vectors in the labeling sequence or question;
the probability value of the labeling sequence is calculated as:

p(L|W) = exp(score(L|W)) / Σ_{L'} exp(score(L'|W))

wherein p(L|W) is the probability value of the labeling sequence, W is the question, L is the labeling sequence, and score(L|W) is the transfer score.
4. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein step 2 comprises: linking the entity recognized in step 1 to the attributes of the related entity in the knowledge graph; obtaining vector representations of the question and the corresponding candidate relations or attributes through a bidirectional LSTM model; and determining the entity of the knowledge graph to which the question links by computing semantic similarity. Let the entity candidate word extracted from the question be W and the entity in the knowledge graph be Z, with Z = (z_1, z_2, z_3, ..., z_N), where N is the dimension of the vectors corresponding to entity W and entity Z; the semantic similarity is calculated as:

cos(W, Z) = ( Σ_{j=1}^{N} w_j · z_j ) / ( sqrt( Σ_{j=1}^{N} w_j^2 ) · sqrt( Σ_{j=1}^{N} z_j^2 ) )

wherein cos(W, Z) is the semantic similarity and j indexes the j-th component of the N-dimensional vectors.
5. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein step 4 comprises: recognizing the question intention with a BERT-TextCNN model. The original input is represented by token embedding, segment embedding and position embedding and input into the BERT module to generate a word vector matrix; the convolution layer of the TextCNN module performs a convolution operation to generate feature maps; the pooling layer of the TextCNN performs max pooling; and the final fully connected layer outputs the intention classification result through a softmax activation function, whose outputs lie between 0 and 1, the maximum value being taken as the final intention;
the convolution layer of the TextCNN performs the convolution operation to generate the feature map, and the extracted feature is:

c_p = f( w · x_{p:p+h-1} + b )

wherein c_p is the p-th component of the feature extraction vector c, p denotes the p-th word in the question, x_{p:p+h-1} is the matrix of word vectors for words p through p+h-1, w is the convolution kernel, d is the width of the convolution kernel, h is the height of the convolution kernel, b is a bias term, and f is a nonlinear activation;
the output of the fully connected layer is:

y = softmax( w_dense · (z ∘ r) + b_dense )

wherein y is the classification result, z is the pooled feature vector, r is a dropout mask (∘ denotes the element-wise product), and w_dense and b_dense are the weight and bias of the fully connected layer, respectively.
6. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein: the final result value obtained in step 4 may be relatively small; during reply processing the final result is compared against thresholds; if the result is between 0.4 and 0.8, the intention is unclear, clarification is required, and the method then jumps to step 6; if the result is less than 0.4, the recognition confidence is too low, and the method jumps to step 7 to ensure accuracy.
7. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 6, wherein: according to the final result value of step 4, if the final result is greater than 0.8, semantic understanding is complete and the accuracy is high, and the knowledge graph query language Cypher can be used to return the answer directly.
8. The question-answering method based on the combination of semantic parsing and vector modeling according to claim 1, wherein: the question input by the user is reduced from high dimension to low dimension; the question and the answers are mapped into a low-dimensional space to obtain distributed representations of the question, which are trained on a data set; the similarity between the question and the answer is computed from the Manhattan distance so that the similarity between a question and its answer is as high as possible; and the final answer returned is the candidate with the highest score, given the vector representations of the candidate answer set and of the question input by the user;

let the question be W = (w_1, w_2, ..., w_n) and the answer be B = (b_1, b_2, ..., b_n); the similarity between the question and the answer is computed from the Manhattan distance as:

dist_man(W, B) = Σ_{q=1}^{n} |w_q − b_q|

wherein dist_man(W, B) is the Manhattan distance between vector W and vector B, w_q is the q-th component of vector W, b_q is the q-th component of vector B, and n is the number of components in vector W or vector B.
CN202210275679.XA 2022-03-21 2022-03-21 Question-answering method based on combination of semantic analysis and vector modeling Active CN114896407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210275679.XA CN114896407B (en) 2022-03-21 2022-03-21 Question-answering method based on combination of semantic analysis and vector modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210275679.XA CN114896407B (en) 2022-03-21 2022-03-21 Question-answering method based on combination of semantic analysis and vector modeling

Publications (2)

Publication Number Publication Date
CN114896407A true CN114896407A (en) 2022-08-12
CN114896407B CN114896407B (en) 2024-07-26

Family

ID=82715878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210275679.XA Active CN114896407B (en) 2022-03-21 2022-03-21 Question-answering method based on combination of semantic analysis and vector modeling

Country Status (1)

Country Link
CN (1) CN114896407B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982338A (en) * 2023-02-24 2023-04-18 中国测绘科学研究院 Query path ordering-based domain knowledge graph question-answering method and system
CN116244344A (en) * 2022-11-25 2023-06-09 中国农业科学院农业信息研究所 Retrieval method and device based on user requirements and electronic equipment
CN117149966A (en) * 2023-08-17 2023-12-01 内蒙古大学 Question-answering method and system based on Roberta-DPCNN model
CN118070812A (en) * 2024-04-19 2024-05-24 深圳市中壬银兴信息技术有限公司 Industry data analysis method and system based on NLP
CN118113855A (en) * 2024-04-30 2024-05-31 浙江建木智能系统有限公司 Ship test training scene question answering method, system, equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018000277A1 (en) * 2016-06-29 2018-01-04 深圳狗尾草智能科技有限公司 Question and answer method and system, and robot
CN109271506A (en) * 2018-11-29 2019-01-25 武汉大学 A kind of construction method of the field of power communication knowledge mapping question answering system based on deep learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018000277A1 (en) * 2016-06-29 2018-01-04 深圳狗尾草智能科技有限公司 Question and answer method and system, and robot
CN109271506A (en) * 2018-11-29 2019-01-25 武汉大学 A kind of construction method of the field of power communication knowledge mapping question answering system based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程树东; 胡鹰: "Restricted-domain knowledge base question-answering system based on the BI-LSTM-CRF model", 计算机与现代化 (Computer and Modernization), no. 07, 15 July 2018 (2018-07-15) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244344A (en) * 2022-11-25 2023-06-09 中国农业科学院农业信息研究所 Retrieval method and device based on user requirements and electronic equipment
CN116244344B (en) * 2022-11-25 2023-09-05 中国农业科学院农业信息研究所 Retrieval method and device based on user requirements and electronic equipment
CN115982338A (en) * 2023-02-24 2023-04-18 中国测绘科学研究院 Query path ordering-based domain knowledge graph question-answering method and system
CN117149966A (en) * 2023-08-17 2023-12-01 内蒙古大学 Question-answering method and system based on Roberta-DPCNN model
CN118070812A (en) * 2024-04-19 2024-05-24 深圳市中壬银兴信息技术有限公司 Industry data analysis method and system based on NLP
CN118070812B (en) * 2024-04-19 2024-07-05 深圳市中壬银兴信息技术有限公司 Industry data analysis method based on NLP
CN118113855A (en) * 2024-04-30 2024-05-31 浙江建木智能系统有限公司 Ship test training scene question answering method, system, equipment and medium

Also Published As

Publication number Publication date
CN114896407B (en) 2024-07-26

Similar Documents

Publication Publication Date Title
CN114896407A (en) Question-answering method based on combination of semantic analysis and vector modeling
CN110413785A (en) A kind of Automatic document classification method based on BERT and Fusion Features
CN109033068B (en) Method and device for reading and understanding based on attention mechanism and electronic equipment
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN115640410B (en) Knowledge map multi-hop question-answering method based on reinforcement learning path reasoning
CN110502627A (en) A kind of answer generation method based on multilayer Transformer polymerization encoder
CN111949787A (en) Automatic question-answering method, device, equipment and storage medium based on knowledge graph
CN112417894B (en) Conversation intention identification method and system based on multi-task learning
CN111581519A (en) Item recommendation method and system based on user intention in session
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN112100348A (en) Knowledge base question-answer relation detection method and system of multi-granularity attention mechanism
CN113297364A (en) Natural language understanding method and device for dialog system
CN116150335A (en) Text semantic retrieval method under military scene
CN115329766B (en) Named entity identification method based on dynamic word information fusion
CN111462749A (en) End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval
CN116842126B (en) Method, medium and system for realizing accurate output of knowledge base by using LLM
CN115982338A (en) Query path ordering-based domain knowledge graph question-answering method and system
CN115688784A (en) Chinese named entity recognition method fusing character and word characteristics
Szűcs et al. Seq2seq deep learning method for summary generation by lstm with two-way encoder and beam search decoder
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN118313382A (en) Small sample named entity recognition method and system based on feature pyramid
CN117932066A (en) Pre-training-based 'extraction-generation' answer generation model and method
Yang et al. Learning binary hash codes based on adaptable label representations
Chaudhuri et al. Cross-modal fusion distillation for fine-grained sketch-based image retrieval
CN116822513A (en) Named entity identification method integrating entity types and keyword features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant