CN113722452B - Semantic-based rapid knowledge hit method and device in question-answering system - Google Patents
- Publication number
- CN113722452B (application CN202110807421.5A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- knowledge
- vector
- vectors
- question
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The application discloses a semantic-based rapid knowledge hit method and device in a question-answering system. The method comprises the following steps: preparing a corpus for model training, comprising user questions and the corresponding knowledge in a knowledge base, and labeling whether each user question matches the knowledge; training a model on the labeled corpus as a binary classification task based on the Bert model, setting the model output to the output of the pooled_output layer of the Bert model after training is finished, and saving it as a semantic model; representing the knowledge base as vectors, where the set of semantic vectors forms a semantic vector space; semantically partitioning the semantic vector space with a random forest, generating N binary trees from the same semantic vector space; and converting the user question into a semantic vector and performing the knowledge hit calculation. The method introduces a deep learning model to improve knowledge hit quality and optimizes the matching algorithm to improve knowledge hit speed, so that intelligent customer service can support a very large knowledge base.
Description
Technical Field
The invention relates to the technical field of data identification processing, in particular to a semantic-based rapid knowledge hit method and device in a question-answering system.
Background
In recent years, intelligent customer service has been successfully applied to a wide range of business consultation services, providing a quick and convenient resolution path for enterprises and users. Intelligent customer service automatically identifies a user's question by machine and returns a corresponding solution; in a specific implementation, replying to user questions through intelligent customer service improves response speed and saves labor cost.
As applications in this field have developed, the business scenarios handled by intelligent question-answering systems have become numerous and complex, and the corresponding knowledge bases have grown ever larger. Traditional search-and-match algorithms can no longer meet the requirements in terms of either performance or quality: the knowledge hit rate is poor, and so is the user experience.
Disclosure of Invention
The invention aims to provide a semantic-based rapid knowledge hit method and device in a question-answering system, so as to solve the problems described in the background section above.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the first aspect of the present application provides a semantic-based rapid knowledge hit method in a question-answering system, including:
s1, preparing corpus used for model training, wherein the corpus comprises user questions and knowledge in a corresponding knowledge base, and labeling whether the user questions are matched with the knowledge;
s2, training a model based on the Bert model by using labeled corpus according to the two classification tasks, setting the model output as the output of a porous_output layer of the Bert model after training is completed, and storing the model output as a semantic model;
s3, converting the knowledge base of the text representation into a knowledge base of the semantic vector representation, namely a vector knowledge base, wherein a set of semantic vectors contained in the knowledge base is a semantic vector space;
s4, carrying out semantic segmentation on the semantic vector space by adopting a random forest, generating N binary trees by the same semantic vector space, wherein N is a natural number which is more than or equal to 1; wherein, each binary tree corresponds to a knowledge base represented by semantic vectors which are segmented randomly, leaf nodes of each binary tree represent semantic vectors of which the number is not more than K, K is a natural number and satisfies that K is not less than 1 and not more than total vector number/N;
s5, converting the user question into a corresponding semantic vector, namely a user question semantic vector, traversing the N binary trees by using the user question semantic vector, searching N nearest leaf nodes, collecting and de-duplicating the semantic vectors contained in the N nearest leaf nodes, and obtaining M semantic vectors;
s6, calculating the similarity between M semantic vectors and the semantic vectors of the user question, and selecting the semantic vector with the highest similarity as hit knowledge.
Specifically, in the step S4, N balances performance against accuracy and needs to be tuned incrementally according to the observed effect.
Preferably, the step S1 includes the steps of:
s11: collecting user questions and knowledge in a corresponding knowledge base, wherein the user questions comprise positive and negative questions, positive expressions of which are matched with the knowledge, and negative expressions of which are not matched with the knowledge, and the questions which are similar in word and not matched with the knowledge comprise questions with unmatched semanteme;
s12: labeling whether the user question is matched with the knowledge, wherein the labeling format is as follows: user question + knowledge + tag, wherein the tag is matched or not.
Preferably, the step S3 includes the steps of:
s31: converting each piece of knowledge in the knowledge base into digital information by using a vocab dictionary of the Bert model;
s32: inputting the digital information into the semantic model for reasoning, and outputting semantic expression vectors of knowledge;
s33: after all knowledge reasoning is completed, the knowledge base of the text representation is converted into the knowledge base of the semantic vector representation.
Preferably, the step S4 includes the steps of:
s41: randomly selecting one semantic vector V in the vector knowledge base, and calculating cosine similarity between the semantic vectors in all the vector knowledge bases and the randomly selected semantic vector V;
s42: dividing semantic vectors with cosine similarity in the range of (0, 1) into a first subspace, and dividing semantic vectors with cosine similarity in the range of [ -1,0] into a second subspace;
s43: the semantic vector V is taken as a root node, the first subspace is taken as a left subtree, the second subspace is taken as a right subtree, and the semantic vector V, the first subspace and the second subspace form a binary tree;
s44: repeating steps S41-S43 for subspaces on all nodes of the binary tree until the number of semantic vectors in all subspaces is less than or equal to K;
s45: repeating the steps for N times, and projecting the semantic vector space of the vector knowledge base into N binary trees.
Preferably, in the step S5, the number of semantic vectors contained in the N nearest leaf nodes is at most N × K.
Preferably, the step S5 includes the steps of:
s51: converting the text information into digital information by using a vorcab dictionary of the Bert model through a user question;
s52: inputting the digital information of the user question into the semantic model for reasoning, and outputting a semantic vector corresponding to the user question, namely, a semantic vector of the user question;
s53: selecting any one of N binary trees;
s54: calculating cosine similarity between the semantic vector of the question of the user and the binary tree node, wherein the cosine similarity is within the range of (0, 1), taking the left subtree node, or else, taking the right subtree node;
s55: repeating the step S54, searching the binary tree until a leaf node of the binary tree, namely the nearest leaf node, is found;
s56: repeating the steps S53-S55, and finding N nearest leaf nodes in all binary trees;
s57: and performing de-duplication processing on all the semantic vectors of the N nearest leaf nodes to obtain M semantic vectors, wherein the number of all the semantic vectors of the N leaf nodes is less than or equal to N x K.
Preferably, the step S6 includes the steps of:
s61: calculating cosine similarity between M semantic vectors and user question vectors;
s62: sequencing the M semantic vectors according to the descending order of cosine similarity, and returning the semantic vector D with highest similarity;
s63: and comparing the similarity value of the semantic vector D with a preset distance threshold T, and when D > T, indicating hit knowledge.
The second aspect of the present application provides a semantic-based rapid knowledge hit device in a question-answering system, including:
the corpus labeling preparation module is used for preparing corpus trained by the model, comprising user question sentences and knowledge in a corresponding knowledge base, and labeling whether the user question sentences are matched with the knowledge;
the semantic model fine-tuning module is used for training a model on the labeled corpus as a binary classification task based on the Bert model, setting the model output to the output of the pooled_output layer of the Bert model after training is completed, and saving it as the semantic model;
the knowledge base vector representation module is used for converting the knowledge base of the text representation into the knowledge base of the semantic vector representation, namely a vector knowledge base, wherein the set of the semantic vectors contained in the knowledge base vector representation is a semantic vector space;
the binary tree generation module is used for carrying out semantic segmentation on the semantic vector space by adopting a random forest, generating N binary trees from the same semantic vector space, wherein N is a natural number greater than or equal to 1; each binary tree corresponds to one randomly partitioned knowledge base represented by semantic vectors, and each leaf node of a binary tree holds no more than K semantic vectors, where K is a natural number satisfying 1 ≤ K ≤ (total vector count)/N;
the user question searching module is used for converting a user question into a corresponding semantic vector, namely a user question semantic vector, traversing the N binary trees by using the user question semantic vector, searching N nearest leaf nodes, collecting and de-duplicating the semantic vectors contained in the N nearest leaf nodes, and obtaining M semantic vectors;
the knowledge hit calculation module is used for calculating the similarity between M semantic vectors and the semantic vectors of the user question, and selecting the semantic vector with the highest similarity as hit knowledge.
Specifically, N above balances performance against accuracy and needs to be tuned incrementally according to the observed effect.
Preferably, the corpus annotation preparation module includes:
the collecting sub-module is used for collecting user questions and the corresponding knowledge in the knowledge base, wherein the user questions comprise positive and negative examples: positive examples match the knowledge and negative examples do not, including questions that are similar in wording but unmatched in semantics;
the labeling sub-module is used for labeling whether the user question matches the knowledge, in the format: user question + knowledge + tag, where the tag is either matched or not matched.
Preferably, the user question searching module includes:
the user question semantic vector generation sub-module is used for outputting semantic vectors corresponding to the user questions after reasoning the user questions through the semantic model, namely the user question semantic vectors;
the traversing sub-module is used for traversing N binary trees and searching N nearest neighbor leaf nodes matched with the user question semantic vector in all binary trees;
and the deduplication processing sub-module is used for deduplicating all the semantic vectors of the N nearest leaf nodes to obtain M semantic vectors, wherein the total number of semantic vectors across the N leaf nodes is at most N × K.
Preferably, the knowledge hit calculation module includes:
the similarity calculation sub-module is used for calculating cosine similarity between the M semantic vectors and the user question semantic vectors;
the similarity sorting sub-module is used for sorting the M semantic vectors according to the descending order of cosine similarity and returning the semantic vector D with the highest similarity;
the judging submodule is used for judging whether the similarity of the returned semantic vector D exceeds a preset distance threshold T;
the determining submodule is used for determining that the semantic vector is hit knowledge when the similarity of the semantic vector D exceeds a preset distance threshold T.
A third aspect of the present application provides a computer device comprising a memory and a processor, and computer readable instructions stored in the memory and executable on the processor, the processor implementing the steps of the semantic-based fast knowledge hit method in a question-answering system described above when executing the computer readable instructions.
A fourth aspect of the present application provides a computer-readable storage medium storing computer-readable instructions that, when executed by a processor, implement the steps of the semantic-based rapid knowledge hit method in the question-answering system described above.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the application discloses a semantic-based rapid knowledge hit method and device in a question-answering system, wherein the method comprises the following steps: preparing and labeling corpus; fine-tuning a semantic model; a knowledge base vector representation; creating a knowledge base vector projection index; and (5) calculating the knowledge hit. According to the technical scheme, the deep learning model is introduced to improve the knowledge hit effect, and the matching algorithm is optimized to improve the knowledge hit speed, so that the intelligent customer service can support a larger and larger knowledge base. According to the method provided by the application, under the condition of a small amount of even no marked data, in the intelligent question-answering system facing a huge knowledge base, quicker and more accurate knowledge hit can be realized, and the user experience effect is good.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow diagram of a semantic-based rapid knowledge hit method in a question-answering system of the present invention;
FIG. 2 is a schematic diagram of a processing procedure of forming a binary tree by a semantic vector V, a subspace A and a subspace B according to the embodiment of the invention;
FIG. 3 is a schematic diagram of a process for spatially projecting semantic vectors of a vector knowledge base into N binary trees in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a process of traversing N binary trees to find all N leaf nodes in an embodiment of the present invention;
FIG. 5 is a logic diagram of a semantic-based fast knowledge hit method in a question-answering system according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a semantic-based rapid knowledge hit device in a question-answering system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a corpus labeling preparation module of a semantic-based rapid knowledge hit device in a question-answering system according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a user question searching module of a semantic-based rapid knowledge hit device in a question and answer system according to an embodiment of the present invention;
fig. 9 is a schematic diagram of a knowledge hit calculation module of a semantic-based fast knowledge hit device in a question-answering system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more obvious, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It is noted that the terms "first," "second," and the like in the description and claims of the present invention and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order, and it is to be understood that the data so used may be interchanged where appropriate. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
A method and apparatus for semantic-based fast knowledge hit in a question-answering system of the present application are described below with reference to the accompanying drawings.
Embodiment one:
FIG. 1 is a flow chart of a semantic-based fast knowledge hit method according to the present application, as shown in FIG. 1, comprising the steps of:
step S1, preparing corpus for model training, including user questions and knowledge in a corresponding knowledge base, and marking whether the user questions are matched with the knowledge;
step S2, training a model based on the Bert model by using labeled corpus according to the two classification tasks, setting the model output as the output of a porous_output layer of the Bert model after training is completed, and storing the model output as a semantic model;
step S3, converting the knowledge base of the text representation into a knowledge base of the semantic vector representation, namely a vector knowledge base, wherein the set of the semantic vectors contained in the knowledge base is a semantic vector space;
s4, carrying out semantic segmentation on the semantic vector space by adopting a random forest, generating N binary trees by adopting the same semantic vector space, wherein N is a natural number which is more than or equal to 1, N is a balance value of performance and precision, and the N is required to be adjusted step by step according to an actual effect; wherein, each binary tree corresponds to a knowledge base of semantic representation of random segmentation, leaf nodes of each binary tree represent semantic vectors of not more than K pieces of knowledge, K is a natural number and satisfies 1.ltoreq.K.ltoreq.total vector number/N;
step S5, converting the user question into a corresponding semantic vector, namely a user question semantic vector, traversing the N binary trees by using the user question semantic vector, searching N nearest leaf nodes, collecting and de-duplicating the semantic vectors contained in the N nearest leaf nodes, and obtaining M semantic vectors;
and S6, calculating the similarity between M semantic vectors and the semantic vectors of the user question, and selecting the semantic vector with the highest similarity as hit knowledge.
Specifically, in connection with fig. 2-5, the method comprises the steps of:
the first step: corpus preparation and labeling.
Step 101: preparing a corpus for model training by collecting user questions and the corresponding knowledge in the knowledge base, wherein the user questions comprise positive and negative examples: positive examples match the knowledge and negative examples do not, in particular questions that are similar in wording but unmatched in semantics.
Step 102: the training corpus is labeled in the format: user question + knowledge + tag, where the tag is either matched or not matched.
And a second step of: semantic model fine-tuning.
Step 201: training the model on the corpus labeled in the previous step as a binary classification task, based on the deep pre-trained Bert model.
Step 202: after training is completed, the model output is set to the output of the pooled_output layer of the Bert model and saved as a semantic model.
And a third step of: knowledge base vector representation.
Step 301: each piece of knowledge in the knowledge base is converted to digital information using the vocab dictionary of the Bert model.
Step 302: the digital information is input into the fine-tuned semantic model for inference, and the semantic representation vector of the knowledge is output.
Step 303: after all knowledge reasoning is completed, the knowledge base of the text representation is converted into a knowledge base of semantic vector representation, namely a vector knowledge base, and the set of semantic vectors contained in the knowledge base is a semantic vector space.
Fourth step: vector knowledge base projection index creation.
In many business scenarios, a random forest model is used as a classifier to perform classification and similar processing on large volumes of business data. A random forest is an ensemble model built from decision trees; in practical applications it classifies by the vote of multiple decision trees. Each internal node of a decision tree represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a category; every node except the leaf nodes records relevant information about the current split.
In this embodiment, a random forest is used to semantically partition the semantic vector space: N binary trees are generated from the same semantic vector space, where N is the number of binary trees in the forest and is chosen as a balance between performance and accuracy.
The method specifically comprises the following steps:
step 401: randomly selecting one semantic vector V in a vector knowledge base, and calculating cosine similarity between all semantic vectors in the vector knowledge base and the randomly selected semantic vector V;
step 402: dividing semantic vectors whose cosine similarity falls in (0, 1] into subspace A, and semantic vectors whose cosine similarity falls in [-1, 0] into subspace B;
step 403: referring to fig. 2, a semantic vector V is taken as a root node, a subspace a is taken as a left subtree, a subspace B is taken as a right subtree, and the semantic vector V, the subspace a and the subspace B form a binary tree;
step 404: repeating steps 401-403 for subspaces on all nodes until the number of semantic vectors in all subspaces is less than or equal to K;
step 405: repeating the above steps N times projects the semantic vector space of the vector knowledge base into N binary trees, as shown in FIG. 3.
Fifth step: and (5) calculating the knowledge hit.
Step 501: the user question is converted from text information into digital information using the vocab dictionary of the Bert model.
Step 502: and inputting the digital information of the user question into the semantic model for reasoning, and outputting a semantic vector corresponding to the user question, namely, the semantic vector of the user question.
Step 503: any one of the N binary trees is selected.
Step 504: the cosine similarity between the user question semantic vector and the current binary tree node is calculated; if the cosine similarity falls in (0, 1], the left subtree node is taken, otherwise the right subtree node is taken.
Step 505: step 504 is repeated to search down the binary tree until a leaf node of the binary tree, i.e. the nearest leaf node, is found.
Step 506: steps 503 to 505 are repeated to find the N nearest leaf nodes across all binary trees.
Step 507: the at most N × K semantic vectors contained in the N nearest leaf nodes are deduplicated to obtain M semantic vectors.
Step 508: and calculating cosine similarity between the M semantic vectors and the semantic vectors of the user question.
Step 509: the M semantic vectors are sorted in descending order of cosine similarity, and the semantic vector D with the highest similarity is returned.
Step 510: the similarity of the semantic vector D is compared with a preset distance threshold T, which is determined by repeated testing in the actual service; when the similarity exceeds T, the knowledge is hit.
Embodiment two:
FIG. 6 is a schematic structural diagram of a semantic-based fast knowledge hit apparatus according to the present application, as shown in FIG. 6, the apparatus 100 includes:
the corpus labeling preparation module 110 is configured to prepare a corpus trained by a model, including user question sentences and knowledge in a corresponding knowledge base, and label whether the user question sentences are matched with the knowledge;
the semantic model fine-tuning module 120 is configured to train a model on the labeled corpus as a binary classification task based on the Bert model, set the model output to the output of the pooled_output layer of the Bert model after training is completed, and save it as a semantic model;
a knowledge base vector representation module 130, configured to convert a knowledge base of text representations into a knowledge base of semantic vector representations, i.e. a vector knowledge base, which includes a set of semantic vectors as a semantic vector space;
the binary tree generating module 140 is configured to perform semantic segmentation on the semantic vector space using a random forest, generating N binary trees from the same semantic vector space, where N is a natural number greater than or equal to 1; N balances performance against accuracy and needs to be tuned incrementally according to the observed effect; each binary tree corresponds to one randomly partitioned knowledge base represented by semantic vectors, and each leaf node of a binary tree holds no more than K semantic vectors, where K is a natural number satisfying 1 ≤ K ≤ (total vector count)/N;
the user question searching module 150 is configured to convert a user question into a corresponding semantic vector, that is, a user question semantic vector, traverse the N binary trees using the user question semantic vector, find N nearest leaf nodes, and aggregate and deduplicate semantic vectors contained in the N nearest leaf nodes to obtain M semantic vectors;
the knowledge hit calculation module 160 is configured to calculate the similarity between the M semantic vectors and the semantic vectors of the question of the user, and select the semantic vector with the highest similarity to determine the hit knowledge.
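The random-forest partitioning performed by the binary tree generation module (detailed later as steps S41-S45 / M1-M5) resembles a random-projection tree index. A minimal sketch under that reading; the pivot-selection rule, the dict-based tree representation, and the degenerate-split fallback are assumptions of this illustration, not prescribed by the text:

```python
import math
import random

def cosine(a, b):
    # Cosine similarity between two equal-length, nonzero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def build_tree(vectors, k, rng):
    """One randomly split binary tree: pick a random pivot vector V, send
    vectors whose cosine similarity with V is positive to the left subtree
    and the rest to the right, then recurse until a node holds at most k
    vectors (a leaf)."""
    if len(vectors) <= k:
        return {"leaf": vectors}
    pivot = rng.choice(vectors)
    left = [v for v in vectors if cosine(pivot, v) > 0]
    right = [v for v in vectors if cosine(pivot, v) <= 0]
    if not left or not right:  # degenerate split: stop early
        return {"leaf": vectors}
    return {"pivot": pivot,
            "left": build_tree(left, k, rng),
            "right": build_tree(right, k, rng)}

def build_forest(vectors, n_trees, k, seed=0):
    # N independent trees built over the same semantic vector space.
    rng = random.Random(seed)
    return [build_tree(vectors, k, rng) for _ in range(n_trees)]

vecs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, -1.0], [0.0, 1.0]]
forest = build_forest(vecs, n_trees=3, k=2)
```

Because every tree partitions the same vector set with different random pivots, a query that lands in an unlucky leaf of one tree can still find its true neighbors in another, which is the reason N is a performance/precision trade-off.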
Specifically, referring to fig. 7, the corpus labeling preparation module 110 includes:
a collecting sub-module 111, configured to collect user questions and the corresponding knowledge in the knowledge base, where the user questions include both positive and negative examples: positive examples whose expression matches the knowledge, negative examples whose expression does not match the knowledge, and in particular questions that are similar in wording but different in meaning;
the labeling sub-module 112, configured to label whether a user question matches the knowledge, in the format: user question + knowledge + label, where the label is matched or unmatched.
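A hypothetical pair of labelled records in the "user question + knowledge + label" format; the wording is invented for illustration, only the triple structure comes from the text:

```python
# Hypothetical labelled corpus records; 1 = matched, 0 = unmatched.
# The second record is the kind of near-miss negative the text highlights:
# similar wording to the knowledge, but a different meaning.
corpus = [
    ("How do I reset my password?", "Steps for resetting an account password", 1),
    ("How do I reset my router?", "Steps for resetting an account password", 0),
]
```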
Specifically, referring to fig. 8, the user question searching module 150 includes:
the user question semantic vector generation sub-module 151, configured to run the user question through the semantic model and output the corresponding semantic vector, i.e. the user question semantic vector;
the traversing sub-module 152, configured to traverse the N binary trees and find, across all of them, the N nearest leaf nodes matching the user question semantic vector;
the de-duplication processing sub-module 153, configured to de-duplicate all the semantic vectors of the found N nearest leaf nodes to obtain M semantic vectors, where the total number of semantic vectors in the N leaf nodes is at most N×K.
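The traversal and de-duplication performed by sub-modules 152 and 153 can be sketched as follows. The toy trees are hand-built for illustration (a real index would come from the binary tree generation module), and the dict-based tree layout is an assumption of this sketch:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length, nonzero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search_tree(tree, query):
    # Descend one tree: positive cosine with the pivot -> left subtree,
    # otherwise right, until a leaf (the nearest leaf node) is reached.
    while "leaf" not in tree:
        tree = tree["left"] if cosine(query, tree["pivot"]) > 0 else tree["right"]
    return tree["leaf"]

def candidate_set(forest, query):
    # Union the N nearest leaves and de-duplicate -> M <= N*K candidates.
    seen, merged = set(), []
    for tree in forest:
        for v in search_tree(tree, query):
            if tuple(v) not in seen:
                seen.add(tuple(v))
                merged.append(v)
    return merged

# Two hand-built toy trees over overlapping vectors.
t1 = {"pivot": [1.0, 0.0],
      "left": {"leaf": [[1.0, 0.0], [0.8, 0.2]]},
      "right": {"leaf": [[-1.0, 0.0], [0.0, -1.0]]}}
t2 = {"leaf": [[0.8, 0.2], [0.0, -1.0]]}  # a tree may be a single leaf
cands = candidate_set([t1, t2], [0.9, 0.1])
```

The vector [0.8, 0.2] appears in both leaves but survives de-duplication only once, so M (here 3) is strictly less than the raw leaf total of 4.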
Specifically, referring to fig. 9, the knowledge hit calculation module 160 includes:
the similarity calculation sub-module 161, configured to calculate the cosine similarity between the M semantic vectors and the user question semantic vector;
the similarity sorting sub-module 162, configured to sort the M semantic vectors in descending order of cosine similarity and return the semantic vector D with the highest similarity;
the judging sub-module 163, configured to judge whether the similarity of the returned semantic vector D exceeds the preset distance threshold T;
the determining sub-module 164, configured to determine that the semantic vector D is the hit knowledge when its similarity exceeds the preset distance threshold T.
In another aspect, the present application further provides a computer device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor executes the computer readable instructions to implement the steps of the semantic-based fast knowledge hit method in the question-answering system.
In another aspect, the present application further provides a computer readable storage medium storing computer readable instructions that, when executed by a processor, implement the steps of the semantic-based fast knowledge hit method in the question-answering system described above.
In summary, the application discloses a semantic-based rapid knowledge hit method and device in a question-answering system, where the method includes: corpus preparation and labeling; semantic model fine-tuning; knowledge base vector representation; creation of a knowledge base vector projection index; and knowledge hit calculation. The technical scheme introduces a deep learning model to improve knowledge hit accuracy and optimizes the matching algorithm to improve knowledge hit speed, so that intelligent customer service can support an ever-larger knowledge base. With the method provided by the application, faster and more accurate knowledge hits can be achieved in an intelligent question-answering system facing a huge knowledge base, even with little or no labeled data, and the user experience is good.
The specific embodiments of the present invention are described above by way of example only, and the present invention is not limited to them. Equivalent modifications and substitutions that occur to those skilled in the art also fall within the scope of the present invention; accordingly, equivalent changes and modifications made without departing from the spirit and scope of the invention are intended to be covered.
Claims (7)
1. A semantic-based rapid knowledge hit method in a question-answering system, comprising:
s1, preparing corpus used for model training, wherein the corpus comprises user questions and knowledge in a corresponding knowledge base, and labeling whether the user questions are matched with the knowledge;
s2, training a model based on the Bert model by using the labeled corpus according to the binary classification task, setting the model output as the output of the pooled_output layer of the Bert model after training is completed, and storing the model as a semantic model;
s3, converting the knowledge base of the text representation into a knowledge base of the semantic vector representation, namely a vector knowledge base, wherein a set of semantic vectors contained in the knowledge base is a semantic vector space;
s4, carrying out semantic segmentation on the semantic vector space by adopting a random forest, generating N binary trees by the same semantic vector space, wherein N is a natural number which is more than or equal to 1; wherein, each binary tree corresponds to a knowledge base represented by semantic vectors which are segmented randomly, leaf nodes of each binary tree represent semantic vectors of which the number is not more than K, K is a natural number and satisfies that K is not less than 1 and not more than total vector number/N;
s5, converting the user question into a corresponding semantic vector, namely a user question semantic vector, traversing the N binary trees by using the user question semantic vector, searching N nearest leaf nodes, collecting and de-duplicating the semantic vectors contained in the N nearest leaf nodes, and obtaining M semantic vectors;
s6, calculating the similarity between M semantic vectors and the semantic vectors of the user question, and selecting the semantic vector with the highest similarity to determine the hit knowledge;
the step S4 includes the following steps S41 to S45:
s41: randomly selecting one semantic vector V in the vector knowledge base, and calculating cosine similarity between the semantic vectors in all the vector knowledge bases and the randomly selected semantic vector V;
s42: dividing semantic vectors with cosine similarity in the range of (0, 1) into a first subspace, and dividing semantic vectors with cosine similarity in the range of [ -1,0] into a second subspace;
s43: the semantic vector V is taken as a root node, the first subspace is taken as a left subtree, the second subspace is taken as a right subtree, and the semantic vector V, the first subspace and the second subspace form a binary tree;
s44: repeating steps S41-S43 for subspaces on all nodes of the binary tree until the number of semantic vectors in all subspaces is less than or equal to K;
s45: repeating the steps for N times, and projecting the semantic vector space of the vector knowledge base into N binary trees;
wherein, the step S5 comprises the following steps S51 to S57:
s51: converting the user question from text information into digital information by using the vocab dictionary of the Bert model;
s52: inputting the digital information of the user question into the semantic model for reasoning, and outputting a semantic vector corresponding to the user question, namely, a semantic vector of the user question;
s53: selecting any one of N binary trees;
s54: calculating the cosine similarity between the user question semantic vector and the binary tree node; if the cosine similarity is within the range of (0, 1), taking the left subtree node, otherwise taking the right subtree node;
s55: repeating the step S54, searching the binary tree until a leaf node of the binary tree, namely the nearest leaf node, is found;
s56: repeating the steps S53-S55, and finding N nearest leaf nodes in all binary trees;
s57: and performing de-duplication processing on all the semantic vectors of the N nearest leaf nodes to obtain M semantic vectors, wherein the number of all the semantic vectors of the N leaf nodes is less than or equal to N x K.
2. The method for semantic-based rapid knowledge hit in a question-answering system according to claim 1, wherein step S1 includes the steps of:
s11: collecting user questions and the corresponding knowledge in the knowledge base, wherein the user questions comprise positive and negative examples: positive examples whose expression matches the knowledge, negative examples whose expression does not match the knowledge, and in particular questions that are similar in wording but not matched in semantics;
s12: labeling whether the user question is matched with the knowledge, wherein the labeling format is as follows: user question + knowledge + tag, wherein the tag is matched or not.
3. The method for semantic-based rapid knowledge hit in a question-answering system according to claim 1, wherein step S3 includes the steps of:
s31: converting each piece of knowledge in the knowledge base into digital information by using a vocab dictionary of the Bert model;
s32: inputting the digital information into the semantic model for reasoning, and outputting semantic expression vectors of knowledge;
s33: after all knowledge reasoning is completed, the knowledge base of the text representation is converted into the knowledge base of the semantic vector representation.
4. The method for semantic-based rapid knowledge hit in a question-answering system according to claim 1, wherein step S6 includes the steps of:
s61: calculating cosine similarity between M semantic vectors and user question semantic vectors;
s62: sorting the M semantic vectors in descending order of cosine similarity, and returning the semantic vector D with the highest similarity;
s63: comparing the similarity of the semantic vector D with a preset distance threshold T, and when the similarity of D is greater than T, indicating hit knowledge.
5. A semantic-based rapid knowledge hit device in a question-answering system, comprising:
the corpus labeling preparation module is used for preparing corpus trained by the model, comprising user question sentences and knowledge in a corresponding knowledge base, and labeling whether the user question sentences are matched with the knowledge;
the semantic model fine-tuning module, configured to carry out model training based on the Bert model by using the labeled corpus according to the binary classification task, to set the model output as the output of the pooled_output layer of the Bert model after training is finished, and to store the model as the semantic model;
the knowledge base vector representation module is used for converting the knowledge base of the text representation into the knowledge base of the semantic vector representation, namely a vector knowledge base, wherein the set of the semantic vectors contained in the knowledge base vector representation is a semantic vector space;
the binary tree generation module is used for carrying out semantic segmentation on the semantic vector space by adopting a random forest, and N binary trees are generated by the same semantic vector space, wherein N is a natural number which is more than or equal to 1; wherein, each binary tree corresponds to a knowledge base represented by semantic vectors which are segmented randomly, leaf nodes of each binary tree represent semantic vectors of which the number is not more than K, K is a natural number and satisfies that K is not less than 1 and not more than total vector number/N;
the user question searching module is used for converting a user question into a corresponding semantic vector, namely a user question semantic vector, traversing the N binary trees by using the user question semantic vector, searching N nearest leaf nodes, collecting and de-duplicating the semantic vectors contained in the N nearest leaf nodes, and obtaining M semantic vectors;
the knowledge hit calculation module is used for calculating the similarity between M semantic vectors and the semantic vectors of the user question, and selecting the semantic vector with the highest similarity as hit knowledge;
wherein the binary tree generation module is configured to implement the following steps M1 to M5:
m1: randomly selecting one semantic vector V in the vector knowledge base, and calculating cosine similarity between the semantic vectors in all the vector knowledge bases and the randomly selected semantic vector V;
m2: dividing semantic vectors with cosine similarity in the range of (0, 1) into a first subspace, and dividing semantic vectors with cosine similarity in the range of [ -1,0] into a second subspace;
m3: the semantic vector V is taken as a root node, the first subspace is taken as a left subtree, the second subspace is taken as a right subtree, and the semantic vector V, the first subspace and the second subspace form a binary tree;
m4: repeating the steps M1-M3 for subspaces on all nodes of the binary tree until the number of semantic vectors in all subspaces is less than or equal to K;
m5: repeating the steps for N times, and projecting the semantic vector space of the vector knowledge base into N binary trees;
wherein, the user question searching module comprises:
the user question semantic vector generation sub-module is used for outputting semantic vectors corresponding to the user questions after reasoning the user questions through the semantic model, namely the user question semantic vectors;
the traversing sub-module is used for traversing N binary trees and searching N nearest neighbor leaf nodes matched with the user question semantic vector in all binary trees;
and the de-duplication processing sub-module is used for performing de-duplication processing on all the semantic vectors of the N nearest leaf nodes to obtain M semantic vectors, wherein the number of all the semantic vectors of the N leaf nodes is less than or equal to N×K.
6. The semantic-based rapid knowledge hit apparatus in a question-answering system according to claim 5, wherein the corpus annotation preparation module comprises:
the collecting sub-module is used for collecting user questions and the corresponding knowledge in the knowledge base, wherein the user questions comprise positive and negative examples: positive examples whose expression matches the knowledge, negative examples whose expression does not match the knowledge, and in particular questions that are similar in wording but not matched in semantics;
the labeling sub-module is used for labeling whether the question sentence of the user is matched with the knowledge, and the labeling format is as follows: user question + knowledge + tag, wherein the tag is matched or not.
7. The semantic-based rapid knowledge hit apparatus in a question-answering system according to claim 5, wherein the knowledge hit calculation module includes:
the similarity calculation sub-module is used for calculating cosine similarity between the M semantic vectors and the user question semantic vectors;
the similarity sorting sub-module is used for sorting the M semantic vectors according to the descending order of cosine similarity and returning the semantic vector D with the highest similarity;
the judging submodule is used for judging whether the similarity of the returned semantic vector D exceeds a preset distance threshold T;
the determining submodule is used for determining that the semantic vector is hit knowledge when the similarity of the semantic vector D exceeds a preset distance threshold T.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110807421.5A CN113722452B (en) | 2021-07-16 | 2021-07-16 | Semantic-based rapid knowledge hit method and device in question-answering system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113722452A CN113722452A (en) | 2021-11-30 |
CN113722452B true CN113722452B (en) | 2024-01-19 |
Family
ID=78673527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110807421.5A Active CN113722452B (en) | 2021-07-16 | 2021-07-16 | Semantic-based rapid knowledge hit method and device in question-answering system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113722452B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114490975B (en) * | 2021-12-31 | 2023-02-07 | 马上消费金融股份有限公司 | User question labeling method and device |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10437833B1 (en) * | 2016-10-05 | 2019-10-08 | Ontocord, LLC | Scalable natural language processing for large and dynamic text environments |
CN111090735A (en) * | 2019-12-25 | 2020-05-01 | 成都航天科工大数据研究院有限公司 | Intelligent question-answering method based on knowledge graph and performance evaluation method thereof |
CN111444320A (en) * | 2020-06-16 | 2020-07-24 | 太平金融科技服务(上海)有限公司 | Text retrieval method and device, computer equipment and storage medium |
CN111460798A (en) * | 2020-03-02 | 2020-07-28 | 平安科技(深圳)有限公司 | Method and device for pushing similar meaning words, electronic equipment and medium |
CN111581354A (en) * | 2020-05-12 | 2020-08-25 | 金蝶软件(中国)有限公司 | FAQ question similarity calculation method and system |
CN111639165A (en) * | 2020-04-30 | 2020-09-08 | 南京理工大学 | Intelligent question-answer optimization method based on natural language processing and deep learning |
CN111813916A (en) * | 2020-07-21 | 2020-10-23 | 润联软件系统(深圳)有限公司 | Intelligent question and answer method, device, computer equipment and medium |
CN112015915A (en) * | 2020-09-01 | 2020-12-01 | 哈尔滨工业大学 | Question-answering system and device based on knowledge base generated by questions |
CN112084299A (en) * | 2020-08-05 | 2020-12-15 | 山西大学 | Reading comprehension automatic question-answering method based on BERT semantic representation |
CN112100360A (en) * | 2020-10-30 | 2020-12-18 | 北京淇瑀信息科技有限公司 | Dialog response method, device and system based on vector retrieval |
CN112395396A (en) * | 2019-08-12 | 2021-02-23 | 科沃斯商用机器人有限公司 | Question-answer matching and searching method, device, system and storage medium |
CN112417884A (en) * | 2020-11-05 | 2021-02-26 | 广州平云信息科技有限公司 | Sentence semantic relevance judging method based on knowledge enhancement and knowledge migration |
CN112613320A (en) * | 2019-09-19 | 2021-04-06 | 北京国双科技有限公司 | Method and device for acquiring similar sentences, storage medium and electronic equipment |
CN112989005A (en) * | 2021-04-16 | 2021-06-18 | 重庆中国三峡博物馆 | Knowledge graph common sense question-answering method and system based on staged query |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140358720A1 (en) * | 2013-05-31 | 2014-12-04 | Yahoo! Inc. | Method and apparatus to build flowcharts for e-shopping recommendations |
CN104573028B (en) * | 2015-01-14 | 2019-01-25 | 百度在线网络技术(北京)有限公司 | Realize the method and system of intelligent answer |
US20210397926A1 (en) * | 2018-09-29 | 2021-12-23 | VII Philip Alvelda | Data representations and architectures, systems, and methods for multi-sensory fusion, computing, and cross-domain generalization |
Non-Patent Citations (1)
Title |
---|
"一网通办"事项要素库智能化探究;徐智蕴;严洁;贝文馨;;现代信息科技(第09期);全文 * |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| CB02 | Change of applicant information | Address after: 200435 11th Floor, Building 27, Lane 99, Shouyang Road, Jing'an District, Shanghai; Applicant after: Shanghai Tongban Information Service Co.,Ltd. Address before: 200433 No. 11, Lane 100, Zhengtong Road, Yangpu District, Shanghai; Applicant before: Shanghai Tongban Information Service Co.,Ltd.
| GR01 | Patent grant |