CN112231450A

CN112231450A - Question-answer retrieval method, question-answer retrieval device, and medium

Info

Publication number: CN112231450A
Application number: CN201910579670.6A
Authority: CN
Inventors: 张振中
Original assignee: BOE Technology Group Co Ltd
Current assignee: BOE Technology Group Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2021-01-15

Abstract

Disclosed are a question-answer retrieval method, a question-answer retrieval device, a question-answer retrieval apparatus and a medium. The question-answer retrieval method comprises: performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; processing the input question to obtain a syntactic content vector of the input question; and comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result. Taking both the syntactic structure and the syntactic content of the input question into account improves the accuracy of the retrieval result.

Description

Question-answer retrieval method, question-answer retrieval device, and medium

Technical Field

The present disclosure relates to the field of artificial intelligence, and more particularly, to a question and answer retrieval method, a question and answer retrieval device, a question and answer retrieval apparatus, and a medium.

Background

With the wide application of artificial intelligence in civil and commercial fields, the demand for natural language processing technology keeps increasing; in particular, retrieving answers to input questions in professional fields (e.g., the medical field) has become more demanding.

At present, when a user enters a question on the internet, the user must spend a lot of time acquiring and browsing information among the massive data resources of the network, and must further screen that information to obtain the required retrieval result. In professional fields (e.g., the medical field) in particular, users have great difficulty finding, acquiring and understanding information, so answer retrieval for an input question takes a long time and the answers obtained are often inaccurate.

Therefore, a question-answer retrieval method is needed that achieves higher retrieval accuracy while still retrieving answers to input questions.

Disclosure of Invention

In view of the above problems, the present disclosure provides a question-answer retrieval method, device, apparatus and medium. The question-answer retrieval method provided by the disclosure retrieves answers to input questions while effectively improving the retrieval speed and the accuracy of the retrieval result, enabling real-time, high-precision retrieval with good robustness.

According to an aspect of the present disclosure, a question-answer retrieval method is provided, including: performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; processing the input question to obtain a syntactic content vector of the input question; and comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.

In some embodiments, comparing the input question with preset questions in the preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result comprises: calculating a syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library; calculating a syntactic content similarity between the syntactic content vector of the input question and the syntactic content vector of each preset question in the preset question-answer library; determining a question similarity between the input question and each preset question according to the syntactic structure similarity and the syntactic content similarity; and outputting a retrieval result according to the question similarities.
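The disclosure states that the question similarity is determined from both the syntactic structure similarity and the syntactic content similarity, but this section does not give the combination rule. The minimal sketch below assumes a convex weighted sum with a hypothetical weight `alpha`; the function name and the weighting are illustrative assumptions, not the disclosed method.

```python
from typing import List, Sequence


def question_similarities(
    structure_sims: Sequence[float],
    content_sims: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Combine per-preset-question structure and content similarities.

    ASSUMPTION: question similarity = alpha * structure + (1 - alpha) * content,
    with alpha a hypothetical tuning weight; the disclosure does not fix this rule.
    """
    if len(structure_sims) != len(content_sims):
        raise ValueError("need one structure and one content similarity per preset question")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * s + (1.0 - alpha) * c for s, c in zip(structure_sims, content_sims)]
```

For example, `question_similarities([1.0, 0.0], [0.0, 1.0], alpha=0.25)` returns `[0.25, 0.75]`. Because the syntactic structure similarity described below is a sum of sub-element similarities and is not bounded to [0, 1], a practical implementation would probably normalize it before mixing; that normalization is likewise an assumption.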

In some embodiments, outputting the retrieval result according to the question similarities comprises: determining the maximum question similarity among the question similarities determined for the preset questions in the preset question-answer library, and acquiring the preset question-answer pair corresponding to the maximum question similarity; comparing the maximum question similarity with a preset threshold; and outputting the corresponding answer in that preset question-answer pair when the maximum question similarity is greater than or equal to the preset threshold, or outputting a null value when the maximum question similarity is less than the preset threshold.
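The output step reduces to an argmax followed by a threshold test. A minimal sketch, assuming the question similarities and the preset question-answer pairs are held in parallel lists and that Python's `None` plays the role of the null value; the names `select_answer` and `qa_pairs` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def select_answer(
    similarities: Sequence[float],
    qa_pairs: Sequence[Tuple[str, str]],
    threshold: float,
) -> Optional[str]:
    """Return the answer of the most similar preset question, or None (the null value)."""
    if len(similarities) != len(qa_pairs):
        raise ValueError("need exactly one similarity per preset question-answer pair")
    if not similarities:
        return None  # empty library: nothing can match
    # Index of the maximum question similarity (the first one wins on ties).
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    if similarities[best] >= threshold:
        return qa_pairs[best][1]  # the answer half of the (question, answer) pair
    return None
```

With similarities `[0.2, 0.9, 0.5]` and a threshold of 0.8 the answer of the second pair is returned; raising the threshold to 0.95 yields `None`.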

In some embodiments, the input question is a medical question, and the preset question-answer library includes a plurality of preset medical question-answer pairs, wherein each preset medical question-answer pair includes: a class of medical questions and their corresponding answers.

In some embodiments, calculating the syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library comprises: for each sub-element in the syntactic structure vector of the input question, obtaining the corresponding syntax subtree; comparing that syntax subtree with the syntax subtree corresponding to each sub-element in the syntactic structure vector of the preset question, and obtaining a sub-element similarity for the sub-element of the input question based on the comparison results; and summing the sub-element similarities of all sub-elements in the syntactic structure vector of the input question to obtain the syntactic structure similarity between the input question and the preset question.
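The summation above is a double loop: each sub-element of the input question's vector is compared with every sub-element of the preset question's vector. The sketch below keeps the pairwise comparison pluggable. It assumes both vectors are indexed by the same preset syntax subtree library and that a sub-element's similarity is the sum of its comparison results; that aggregation, and all names here, are assumptions.

```python
from typing import Callable, Hashable, Sequence

# compare(value_in, value_pre, subtree_in, subtree_pre) -> similarity of one pair of sub-elements
Compare = Callable[[float, float, Hashable, Hashable], float]


def structure_similarity(
    vec_in: Sequence[float],
    vec_pre: Sequence[float],
    subtrees: Sequence[Hashable],
    compare: Compare,
) -> float:
    """Syntactic structure similarity between an input question and one preset question.

    subtrees[k] is the preset syntax subtree behind dimension k of both vectors.
    ASSUMPTION: the sub-element similarity of element i is the sum of its
    comparisons against every sub-element of the preset question's vector.
    """
    if not len(vec_in) == len(vec_pre) == len(subtrees):
        raise ValueError("both vectors must match the subtree library size")
    total = 0.0
    for i, subtree_i in enumerate(subtrees):
        sub_element_similarity = sum(
            compare(vec_in[i], vec_pre[j], subtree_i, subtree_j)
            for j, subtree_j in enumerate(subtrees)
        )
        total += sub_element_similarity  # add up over all sub-elements of the input vector
    return total
```

With `vec_in = [1, 1, 0]`, `vec_pre = [1, 0, 1]` and a toy comparison that scores 1.0 only when both values are non-zero and the subtrees are identical, the result is 1.0 (only the first dimension is shared).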

In some embodiments, comparing the syntax subtree with the syntax subtree corresponding to each sub-element in the syntactic structure vector of the preset question comprises: determining whether the value of the current sub-element in the syntactic structure vector of the input question and the value of the current sub-element in the syntactic structure vector of the preset question are both non-zero; if either value (or both) is zero, outputting a preset first comparison result; and if both values are non-zero, comparing the syntax subtree corresponding to the current sub-element of the input question with the syntax subtree corresponding to the current sub-element of the preset question.

In some embodiments, comparing the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the input question with the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the preset question includes: taking the former as a first syntax subtree and the latter as a second syntax subtree; and comparing the first syntax subtree with the second syntax subtree based on a preset rule to obtain the similarity between them.

In some embodiments, comparing the first syntax subtree and the second syntax subtree based on the preset rule comprises: determining whether the production at the root node of the first syntax subtree is the same as the production at the root node of the second syntax subtree; if the two productions differ, outputting the preset first comparison result; if the two productions are the same, determining whether the descendants of the root node of the first syntax subtree and the descendants of the root node of the second syntax subtree are only leaf nodes; if so, outputting a preset second comparison result; and if the descendants of the root node of either syntax subtree include non-leaf nodes, calculating the similarity of the first syntax subtree and the second syntax subtree with a preset algorithm.
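The zero check is a cheap guard in front of the (potentially recursive) subtree comparison. A minimal sketch; `compare_sub_elements`, `compare_subtrees` and the default `first_result = 0.0` are hypothetical names and values, since this section only calls it a "preset first comparison result".

```python
from typing import Callable, TypeVar

T = TypeVar("T")  # whatever representation the syntax subtrees use


def compare_sub_elements(
    value_in: float,
    value_pre: float,
    subtree_in: T,
    subtree_pre: T,
    compare_subtrees: Callable[[T, T], float],
    first_result: float = 0.0,
) -> float:
    """Gate the subtree comparison on both sub-element values being non-zero."""
    if value_in == 0 or value_pre == 0:
        # With an initial value of 0, a zero means that subtree does not occur in the
        # question, so the preset first comparison result is returned without comparing.
        return first_result
    return compare_subtrees(subtree_in, subtree_pre)
```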
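The three-way rule above has the same shape as the recursive node-pair function of the Collins-Duffy convolution tree kernel (subset-tree kernel): a fixed result for different productions, another for matching pre-terminal nodes, and a product over children otherwise. The sketch below uses that recursion as the "preset algorithm"; this is an assumption, since this section does not name the algorithm. Trees are hypothetical nested tuples `(label, child, ...)` with leaves as plain strings, and `first_result`, `second_result` and the decay factor `decay` (λ) are illustrative parameters.

```python
from typing import Tuple, Union

Tree = Union[str, Tuple]  # a leaf is a str; an internal node is (label, child, child, ...)


def production(node: Tuple) -> Tuple[str, Tuple[str, ...]]:
    """The grammar production applied at an internal node, e.g. ('VP', ('V', 'NP'))."""
    label, *children = node
    return label, tuple(c if isinstance(c, str) else c[0] for c in children)


def subtree_similarity(
    n1: Tree,
    n2: Tree,
    first_result: float = 0.0,
    second_result: float = 1.0,
    decay: float = 1.0,
) -> float:
    """Collins-Duffy style similarity of two syntax subtrees rooted at n1 and n2."""
    if isinstance(n1, str) or isinstance(n2, str):
        return first_result  # bare leaves are not subtrees
    if production(n1) != production(n2):
        return first_result  # different productions at the root nodes
    children1, children2 = n1[1:], n2[1:]
    if all(isinstance(c, str) for c in children1 + children2):
        return second_result  # both roots have only leaf children (pre-terminals)
    score = decay
    # Equal productions guarantee equal child labels and counts, so zip pairs them up.
    for c1, c2 in zip(children1, children2):
        if isinstance(c1, str):
            continue  # matching leaves contribute a factor of 1
        score *= 1.0 + subtree_similarity(c1, c2, first_result, second_result, decay)
    return score
```

For the hypothetical tree `("S", ("NP", ("N", "cold")), ("VP", ("V", "causes"), ("NP", ("N", "fever"))))` compared with itself, the default parameters give (1 + 2) × (1 + 6) = 21.0; replacing "cold" with "flu" in one copy lowers it to 14.0.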

According to another aspect of the present disclosure, there is provided a question-answer retrieval device, including: a syntactic structure analysis module configured to perform syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; a syntactic content analysis module configured to process the input question to obtain a syntactic content vector of the input question; and a retrieval result generation module configured to compare the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.

In some embodiments, the retrieval result generation module comprises: a structure similarity generating module configured to calculate the syntactic structure similarity between the syntactic structure vector and the syntactic structure vector of each preset question in the preset question-answer library; a content similarity generating module configured to calculate the syntactic content similarity between the syntactic content vector and the syntactic content vector of each preset question in the preset question-answer library; a question similarity generating module configured to determine the question similarity between the input question and each preset question according to the syntactic structure similarity and the syntactic content similarity; and a result module configured to output a retrieval result according to the question similarities.

In some embodiments, the result module comprises: a maximum question similarity determining module configured to determine the maximum question similarity among the question similarities determined for the preset questions in the preset question-answer library, and to acquire the preset question-answer pair corresponding to the maximum question similarity; a comparison module configured to compare the maximum question similarity with a preset threshold; and an output module configured to output the corresponding answer in that preset question-answer pair when the maximum question similarity is greater than or equal to the preset threshold, and to output a null value when the maximum question similarity is less than the preset threshold.

According to another aspect of the present disclosure, there is provided a question-answer retrieval apparatus comprising a processor and a memory, the memory containing a set of instructions that, when executed by the processor, cause the question-answer retrieval apparatus to perform operations comprising: performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; processing the input question to obtain a syntactic content vector of the input question; and comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.

In some embodiments, comparing the input question with a preset question in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result comprises: calculating the syntactic structure similarity of the syntactic structure vector and the syntactic structure vector of each preset question in a preset question-answering library; calculating the syntactic content similarity of the syntactic content vector and the syntactic content vector of each preset question in a preset question-answering library; determining question similarity between the input question and each preset question in a preset question-answering library according to the syntactic structure similarity and the syntactic content similarity; and outputting a retrieval result according to the question similarity.

According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a computer, perform the method as described above.

The retrieval method provided by the disclosure performs answer retrieval for input questions well; in particular, it offers higher retrieval accuracy, higher retrieval speed and good robustness.

Drawings

To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort. The drawings are not drawn to scale; the emphasis is on illustrating the principles of the disclosure.

FIG. 1 illustrates an exemplary flow chart of a question and answer retrieval method according to an embodiment of the present disclosure;

FIG. 2A illustrates an exemplary flow diagram of a process 200 for deriving a syntactic analysis vector by syntactic analysis according to an embodiment of the present disclosure;

FIG. 2B shows a schematic diagram of a preset initial vector according to an embodiment of the present disclosure;

FIG. 2C illustrates a schematic diagram of parsing an input question according to an embodiment of the present disclosure;

FIG. 3 illustrates an exemplary flow diagram of a process 300 for comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result according to an embodiment of the present disclosure;

FIG. 4A illustrates an exemplary flow diagram of a process 400 of calculating the syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in a preset question-answer library, according to an embodiment of the present disclosure;

FIG. 4B illustrates an exemplary flow diagram of a process 410 for checking the values of the sub-elements to be compared according to an embodiment of the present disclosure;

FIG. 4C illustrates an exemplary flow diagram of a process of comparing a first syntax subtree and a second syntax subtree in accordance with an embodiment of the present disclosure;

FIG. 4D is a schematic diagram illustrating the calculation of the similarity between sub-elements of an input question and sub-elements of a preset question according to an embodiment of the present disclosure;

FIG. 5 illustrates an exemplary flowchart of a process of calculating the syntactic content similarity between the syntactic content vector of the input question and the syntactic content vector of each preset question in a preset question-answer library, according to an embodiment of the present disclosure;

FIG. 6 illustrates an exemplary flow chart of a process of outputting a retrieval result based on the question similarity according to an embodiment of the present disclosure;

FIG. 7 illustrates an exemplary block diagram of a question answering retrieval device in accordance with an embodiment of the present disclosure;

FIG. 8 shows an exemplary block diagram of a question-answer retrieval apparatus according to an embodiment of the present disclosure.

Detailed Description

Technical solutions in embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only some embodiments, but not all embodiments, of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

As used in this application and the appended claims, the words "a," "an," and "the" do not refer specifically to the singular and may also include the plural, unless the context clearly indicates otherwise. In general, the terms "comprise" and "include" merely indicate the inclusion of explicitly identified steps and elements; these steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.

Although various references are made herein to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative and different aspects of the systems and methods may use different modules.

Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in the order shown. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Meanwhile, other operations may be added to these processes, or one or more steps may be removed from them.

Fig. 1 illustrates an exemplary flow diagram of a question and answer retrieval method 100 according to an embodiment of the present disclosure.

First, in step S101, a syntactic structure analysis is performed on an input question to obtain a syntactic structure vector of the input question.

Syntactic structure analysis, also called syntactic analysis (parsing), analyzes the grammatical functions of the words in a sentence; it reveals the syntactic structure of the sentence and the dependency relationships among its components. For example, syntactic structure analysis of the sentence "I came late" may identify "I" as the subject, "came" as the predicate and "late" as the complement.

The syntactic structure analysis may be, for example, parsing based on a probabilistic context-free grammar (PCFG) or head-driven (lexicalized) parsing; embodiments of the present disclosure are not limited by the specific algorithm employed. The analysis may be implemented, for example, with existing parsing tools such as the Stanford Parser or the Berkeley Parser. Embodiments of the present disclosure are not limited by the particular tool employed in the syntactic structure analysis.

The input question may be, for example, a question entered directly by the user, or a question generated by the computer system in response to the user's input information or control information. Embodiments of the present disclosure are not limited by the source or input manner of the input question. For example, the question may be entered by the user in a web search bar, or generated by the computer after preprocessing the user's input information.
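To make the PCFG-based option concrete, here is a minimal Viterbi CKY parser over a toy grammar in Chomsky normal form. The grammar, its probabilities and the English stand-in sentence "cold causes fever" are hypothetical; a real system would use a trained parser such as those named above rather than this sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy PCFG in Chomsky normal form (not taken from the disclosure).
BINARY: Dict[Tuple[str, str], List[Tuple[str, float]]] = {
    ("NP", "VP"): [("S", 1.0)],  # S  -> NP VP
    ("V", "NP"): [("VP", 1.0)],  # VP -> V NP
}
LEXICAL: Dict[str, List[Tuple[str, float]]] = {
    "cold": [("NP", 1.0)],
    "causes": [("V", 0.7), ("NP", 0.3)],  # deliberately ambiguous word
    "fever": [("NP", 1.0)],
}


def cky_parse(
    words: Sequence[str],
    lexical: Dict[str, List[Tuple[str, float]]],
    binary: Dict[Tuple[str, str], List[Tuple[str, float]]],
    start: str = "S",
) -> Optional[Tuple[float, tuple]]:
    """Return (probability, tree) of the most probable parse, or None if there is none."""
    n = len(words)
    # chart[i][j] maps a nonterminal to (best probability, best tree) for words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label, p in lexical.get(word, []):
            chart[i][i + 1][label] = (p, (label, word))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):  # split point between the two children
                for left, (p_left, t_left) in chart[i][k].items():
                    for right, (p_right, t_right) in chart[k][j].items():
                        for parent, p_rule in binary.get((left, right), []):
                            p = p_rule * p_left * p_right
                            if p > cell.get(parent, (0.0, None))[0]:
                                cell[parent] = (p, (parent, t_left, t_right))
    return chart[0][n].get(start)
```

`cky_parse(["cold", "causes", "fever"], LEXICAL, BINARY)` returns probability 0.7 with the tree `("S", ("NP", "cold"), ("VP", ("V", "causes"), ("NP", "fever")))`; the alternative reading of "causes" as a noun phrase finds no applicable rule.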

Next, in step S102, the input question is processed to obtain a syntactic content vector of the input question.

Processing the input question to obtain the syntactic content vector may be implemented, for example, by a neural network. The neural network may be, for example, a convolutional neural network, a fully-connected neural network, or a long short-term memory (LSTM) network, chosen to meet different practical application requirements; the disclosure does not limit the type of neural network selected.

The syntactic content vector characterizes the content information of the input question. When the input question is processed by a neural network, the resulting syntactic content vector may be, for example, 1024-dimensional or 2048-dimensional, depending on the parameter settings inside the neural network. Embodiments of the present disclosure are not limited by the specific dimension of the syntactic content vector.

After the syntactic structure vector and the syntactic content vector are obtained, in step S103, the input question is compared with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector, so as to obtain a retrieval result.

The preset question-answer library comprises a plurality of preset question and answer pairs. Each preset question and answer pair comprises a type of question and a corresponding answer.

The preset question-answer library may be, for example, a question-answer library of general knowledge, such as a common sense question-answer library. Or it may be a question-and-answer library of knowledge in a certain professional field, such as a medical question-and-answer library, or a financial knowledge question-and-answer library, and the embodiments of the present disclosure are not limited by the types of question and answer pairs in the preset question-and-answer library.

For example, each preset question-answer pair may include only one question, such as the question "What should I do about a fever?" and its answer; or a preset question-answer pair may include a plurality of medical questions of the same type. For instance, a question-answer pair named "cardiovascular and cerebrovascular diseases" may include the question "What should I do about cerebral thrombosis?" and its answer, the question "What should I do about cerebral apoplexy?" and its answer, and the question "What should I do about heart disease?" and its answer. Embodiments of the present disclosure are not limited by the number of questions included in each preset question-answer pair.

The comparison may, for example, compare the syntactic content vector of the input question with the syntactic content vector of each preset question in the preset question-answer library, and compare the syntactic structure vector of the input question with the syntactic structure vector of each preset question, thereby obtaining the syntactic content similarity and the syntactic structure similarity of the input question relative to each preset question; the retrieval result is then obtained based on these two similarities. Other comparison methods may alternatively be employed, and embodiments of the present disclosure are not limited by the particular comparison method chosen.

In this way, the syntactic content vector and the syntactic structure vector of the input question are obtained, and the input question is compared with the preset questions in the preset question-answer library based on both vectors to obtain a retrieval result. Because answer retrieval takes both the syntactic structure features and the syntactic content features of the input question into account, the retrieved answer is more accurate. Compared with a user browsing information on the network and screening answers manually, the method significantly reduces the time cost of retrieval and achieves higher retrieval efficiency and speed.
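This section does not fix the similarity measure for syntactic content vectors. Cosine similarity is a common choice for dense neural-network output vectors, so the sketch below assumes it; the function name is hypothetical and the disclosure may use a different measure.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two content vectors (ASSUMED measure)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no content; treat as dissimilar
    return dot / (norm_u * norm_v)
```

Parallel vectors score 1.0 regardless of length and orthogonal ones 0.0, which keeps the content score independent of vector magnitude.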

Fig. 2A illustrates an exemplary flow diagram of a process 200 for deriving a syntactic analysis vector by syntactic analysis according to an embodiment of the present disclosure.

Referring to FIG. 2A, in some embodiments the process 200 of deriving a syntactic analysis vector by syntactic analysis proceeds as follows. First, in step S201, a syntax tree of the input question and all syntax subtrees it contains are obtained by syntactic analysis. Next, in step S202, for each syntax subtree of the input question, an identical preset syntax subtree is looked up in a preset syntax subtree library. Then, in step S203, for each preset syntax subtree found, the sub-element of a preset initial vector corresponding to that preset syntax subtree is assigned a corresponding accumulated value, so as to obtain the syntactic analysis vector of the input question.

The preset initial vector has a first preset dimension equal to the number of preset syntax subtrees in the preset syntax subtree library. Each dimension of the preset initial vector corresponds to one preset syntax subtree in the library, and all sub-elements of the preset initial vector have the same initial value.

The preset syntax subtree library and the preset initial vector can be obtained as follows. First, the sentences of a specialized corpus in the professional field are parsed, and the parsing results are deduplicated to obtain a syntax subtree library for that field, which is used as the preset syntax subtree library. The preset syntax subtree library comprises a plurality of mutually distinct preset syntax subtrees that cover most of the syntax subtrees likely to appear in the professional field. Second, the first preset dimension of the preset initial vector is set equal to the number of preset syntax subtrees in the library, and each sub-element of the preset initial vector is made to correspond to one preset syntax subtree in the library. It should be appreciated, however, that embodiments of the present disclosure are not limited to this specific manner of determining the first preset dimension of the preset initial vector and the preset syntax subtrees corresponding to it.

For example, when a syntax subtree of the input question is not among the preset syntax subtrees in the current preset syntax subtree library, it can be regarded as not belonging to the professional field; that syntax subtree is then treated as error data and discarded.

The first preset dimension of the preset initial vector equals the number of corresponding preset syntax subtrees. It may be, for example, 512 or 1028. Embodiments of the present disclosure are not limited by the value of the first preset dimension.
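The two-step construction above amounts to collecting every syntax subtree from a parsed domain corpus, deduplicating, and fixing an order. A minimal sketch, assuming each subtree has already been serialized to a hashable key (for example a production string such as "VP -> V NP"); the function name and the ordering by first occurrence are assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def build_subtree_library(
    parsed_corpus: Iterable[Iterable[Hashable]],
) -> Tuple[List[Hashable], Dict[Hashable, int]]:
    """Deduplicate the syntax subtrees of a parsed domain corpus.

    parsed_corpus yields, per sentence, that sentence's syntax subtrees as hashable keys.
    Returns the library (its length is the first preset dimension) and a
    subtree -> vector-position index.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order, so positions are stable.
    library = list(dict.fromkeys(s for sentence in parsed_corpus for s in sentence))
    index = {subtree: position for position, subtree in enumerate(library)}
    return library, index
```

For two parsed sentences `["S -> NP VP", "NP -> N"]` and `["NP -> N", "VP -> V NP"]`, the library holds three distinct subtrees, so the first preset dimension would be 3.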

The preset syntax subtree corresponding to each dimension in the preset initial vector may be, for example, a general syntax subtree; or it may be a specific syntactic subtree, e.g. comprising a specific syntactic subtree in a certain professional domain, e.g. medical domain, legal domain, etc. The embodiments of the present disclosure are not limited by the type of the preset syntax subtree corresponding to each dimension in the preset initial vector.

All sub-elements of the preset initial vector share the same initial value so that the subsequent accumulation starts from a common baseline. The initial value may be, for example, 0, 10, or any other value according to actual requirements. Embodiments of the present disclosure are not limited by the initial value chosen.

Assigning an accumulated value to a sub-element of the preset initial vector indicates that, among all syntax subtrees of the input question, there is a syntax subtree identical to the preset syntax subtree corresponding to that sub-element. The accumulated value may be, for example, 1: for each syntax subtree of the input question, if an identical preset syntax subtree exists in the preset syntax subtree library, that preset syntax subtree is retrieved and the value of the corresponding sub-element of the preset initial vector is increased by 1. Embodiments of the present disclosure are not limited by the accumulated value.
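Steps S202 and S203, together with the discard rule for unknown subtrees, can be sketched as a lookup-and-increment loop. This is a minimal illustration assuming subtrees are hashable keys and the library is given as a subtree-to-position index; the function and parameter names are hypothetical, and the defaults (initial value 0, accumulated value 1) are the example values from the text.

```python
from typing import Dict, Hashable, Iterable, List


def syntactic_structure_vector(
    input_subtrees: Iterable[Hashable],
    library_index: Dict[Hashable, int],
    initial_value: float = 0.0,
    accumulated_value: float = 1.0,
) -> List[float]:
    """Map an input question's syntax subtrees onto the preset initial vector.

    library_index maps each preset syntax subtree to its position 0..n-1,
    so its size is the first preset dimension.
    """
    vector = [initial_value] * len(library_index)  # the preset initial vector
    for subtree in input_subtrees:
        position = library_index.get(subtree)
        if position is None:
            continue  # not in the library: treated as out-of-domain error data and discarded
        vector[position] += accumulated_value
    return vector
```

With a 10-entry library whose keys are c1 to c10 (positions 0 to 9) and an input question matching c2, c3, c4, c5 and c8, the function returns `[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]`, the vector I of the FIG. 2B and FIG. 2C example. A subtree that occurs twice is accumulated twice.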

Fig. 2B shows a schematic diagram of a preset initial vector according to an embodiment of the present disclosure.

Referring to FIG. 2B, the process 200 shown in FIG. 2A can be illustrated in more detail. Suppose the preset syntax subtree library comprises 10 preset syntax subtrees c₁-c₁₀. The preset initial vector M₀ then correspondingly comprises 10 sub-elements m₁-m₁₀. The correspondence between the sub-elements of M₀ and the preset syntax subtrees is shown by the arrows in FIG. 2B: sub-element m₁ corresponds to preset syntax subtree c₁, sub-element m₂ corresponds to preset syntax subtree c₂, ..., and sub-element m₁₀ corresponds to preset syntax subtree c₁₀. The initial value of every sub-element of M₀ is 0.

Fig. 2C illustrates a schematic diagram of parsing an input question according to an embodiment of the present disclosure.

Referring to fig. 2C and 2B, for the input question "fever due to cold", the syntax structure analysis is first performed to obtain a syntax tree J₀. Where S represents a sentence, NP represents a noun phrase, VP represents a verb phrase, N represents a noun, and V represents a verb. Based on the resulting syntax tree, it may, for example, decompose the syntax tree into 5 syntax subtrees j as in fig. 2C₁-j₅And obtaining the syntax tree and syntax subtree of the input question.

Further, based on the syntax subtree j₁-j₅Obtaining the same preset subtree c in the preset sentence law subtree library₂,c₃,c₄,c₅,c₈Correspondingly, for the preset initial vector M₀Sub-element m in₂,m₃,m₄,m₅,m₈An accumulated value is given, which is, for example, 1. A syntactic analysis vector I (0,1,1,1,1,0,0,1,0,0) for the input question is obtained from this.
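FIG. 2C is not reproduced here, so the exact subtree decomposition it uses cannot be confirmed. One plausible reading, assumed in the sketch below, takes the one-level subtree (the grammar production) at every internal node and encodes it as a string key usable in a subtree library. The English stand-in tree for "fever due to cold" is hypothetical, and this decomposition may not match the 5 subtrees j₁-j₅ of the figure.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple]  # a leaf is a str; an internal node is (label, child, child, ...)


def production_subtrees(tree: Tree) -> List[str]:
    """List the production at every internal node in pre-order, e.g. 'VP -> V NP'."""
    if isinstance(tree, str):
        return []  # leaves carry no production
    label, *children = tree
    rhs = " ".join(c if isinstance(c, str) else c[0] for c in children)
    subtrees = [f"{label} -> {rhs}"]
    for child in children:
        subtrees.extend(production_subtrees(child))
    return subtrees


# Hypothetical English stand-in for the parse of "fever due to cold".
EXAMPLE_TREE = ("S", ("NP", ("N", "cold")), ("VP", ("V", "causes"), ("NP", ("N", "fever"))))
```

`production_subtrees(EXAMPLE_TREE)` returns `["S -> NP VP", "NP -> N", "N -> cold", "VP -> V NP", "V -> causes", "NP -> N", "N -> fever"]`; the repeated "NP -> N" would be accumulated twice when the keys are mapped onto the preset initial vector.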

Based on the above, one method of obtaining a syntax analysis vector through syntax analysis and an example thereof are shown, however, it should be understood that embodiments of the present disclosure are not limited to the above method, and other methods may be selected to obtain the syntax analysis vector.

Based on the above, the input question is parsed to obtain a syntactic analysis vector, and the vector represents the syntactic structure and composition of the input question, so that the syntactic structure and composition can be conveniently compared with the question and answer pairs in the preset question library.

In some embodiments, the process of obtaining the syntactic content vector from the input question may be described in more detail. Suppose the input question is processed by a neural network and the selected network is a long short-term memory (LSTM) network. The input question is first fed to the input of the LSTM network. It is then processed by the forward layer and the backward layer of the bidirectional LSTM network. The result may be processed further, for example by a conditional random field (CRF) layer. Finally, the network's output is a vector of a second predetermined dimension, and this vector is the syntactic content vector.

When the input question is processed by a neural network to obtain the syntactic content vector, the second predetermined dimension depends on the selected neural network and its parameter settings. The second predetermined dimension may be the same as the first predetermined dimension or different from it. Embodiments of the present disclosure are not limited by the relationship between the two dimensions or by the specific value of the second predetermined dimension.

Processing the input question to obtain its syntactic content vector captures the question's syntactic content features, and answer retrieval can then make use of those features.

Fig. 3 illustrates an exemplary flow chart of a process 300 for comparing the input question with preset questions in a preset question-and-answer library to obtain a search result based on the syntactic structure vector and the syntactic content vector according to an embodiment of the present disclosure.

The process of obtaining the retrieval result can be described in more detail with reference to FIG. 3. In some embodiments, step S301 first calculates the syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library.

The preset question-answer library contains at least one preset question, and each preset question is processed in the same way as the input question. Syntactic structure analysis is performed on it to obtain its syntactic structure vector, and it is processed to obtain its syntactic content vector. Each preset question therefore has its own syntactic structure vector and syntactic content vector. These have the same dimensions as the syntactic structure vector and the syntactic content vector of the input question, respectively.

The syntactic structure similarity measures how similar the input question and the preset question are in their syntactic structure features. It can be obtained, for example, by comparing the syntactic structure vectors of the two questions. Embodiments of the present disclosure are not limited by the specific manner of comparison.

Next, step S302 calculates the syntactic content similarity between the syntactic content vector of the input question and the syntactic content vector of each preset question in the preset question-answer library.

The syntactic content similarity measures how similar the input question and the preset question are in their syntactic content features. It can be obtained, for example, by computing the cosine similarity between the syntactic content vector of the input question and that of the preset question, or by other means. Embodiments of the present disclosure are not limited by the particular method used to obtain the syntactic content similarity.

Thereafter, in step S303, a question similarity between the input question and each preset question in a preset question-and-answer library is determined according to the syntactic structure similarity and the syntactic content similarity.

The question similarity may be obtained by adding the syntactic structure similarity and the syntactic content similarity directly. Alternatively, the two similarities may be given different weights and the weighted values added. For example, the weights of the syntactic structure similarity and the syntactic content similarity may be 0.4 and 0.6, or 0.3 and 0.7, respectively. Embodiments of the present disclosure are not limited by the specific manner of obtaining the question similarity or by the specific weight settings.
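The combination step can be sketched as a weighted sum. With both weights set to 1 it reduces to plain addition, and the default weights below are the 0.4/0.6 example from the text. The function name is hypothetical.

```python
def question_similarity(sim_t: float, sim_v: float,
                        w_t: float = 0.4, w_v: float = 0.6) -> float:
    """Combine structure similarity sim_T and content similarity sim_V.

    w_t = w_v = 1 reproduces plain addition; 0.4/0.6 and 0.3/0.7 are the
    example weightings mentioned in the text.
    """
    return w_t * sim_t + w_v * sim_v


print(question_similarity(18.0, 0.9, 1.0, 1.0))  # plain sum
print(question_similarity(18.0, 0.9))            # 0.4 / 0.6 weighting
```

As defined in this document, sim_T is an unnormalized sum of subtree comparison values, while a cosine similarity lies in [-1, 1]. The weights therefore also set the relative scale of the two terms. The disclosure does not specify any normalization.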

After the question similarity is obtained, in step S304, a search result is output according to the question similarity.

For example, the maximum question similarity in the preset question-answer library may be determined from the question similarities computed for the individual preset questions. The preset question-answer pair corresponding to that maximum is then obtained, and its answer is output directly. Alternatively, a preset threshold may be set and each question similarity compared with it. The one or more preset question-answer pairs whose question similarity exceeds the threshold are obtained, and their answers are output in sequence. Embodiments of the present disclosure are not limited by the particular method of determining the retrieval result from the question similarity.
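The threshold variant above can be sketched as follows. The text says only that the answers are output "in sequence". Ordering them by descending similarity is an assumption made here, and the function name is hypothetical.

```python
from typing import List, Sequence


def answers_above_threshold(similarities: Sequence[float],
                            answers: Sequence[str],
                            threshold: float) -> List[str]:
    """Return answers whose question similarity strictly exceeds threshold.

    Ordering by descending similarity is an assumption of this sketch.
    """
    hits = [(s, a) for s, a in zip(similarities, answers) if s > threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [answer for _, answer in hits]


sims = [87, 90, 101, 20, 32, 12, 91, 82, 9, 10]
answers = [f"answer to q{k}" for k in range(1, 11)]
print(answers_above_threshold(sims, answers, 85))
# ['answer to q3', 'answer to q7', 'answer to q2', 'answer to q1']
```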

In summary, the syntactic structure similarity and the syntactic content similarity between the input question and each preset question in the preset question-answer library are obtained. The question similarity between the input question and each preset question is derived from these two similarities, and the retrieval result is obtained from the question similarity. The computation of the retrieval result therefore takes both the structural features and the content features of the input question into account, which improves the accuracy and reliability of the retrieval result.

Fig. 4A illustrates an exemplary flow diagram of a process 400 for calculating a syntactic structure similarity of the syntactic structure vector to the syntactic structure vector of each preset question in a preset question-and-answer library, according to an embodiment of the present disclosure.

Referring to fig. 4A, first, in step S401, for each sub-element in the syntax structure vector of the input question, a syntax sub-tree corresponding to the sub-element is obtained. Then, in step S402, the syntax subtree is compared with the syntax subtree corresponding to each sub-element in the syntax structure vector of the preset question, and based on the comparison result, the sub-element similarity of the sub-element in the syntax structure vector of the input question is obtained. In step S403, the sub-element similarities of all sub-elements in the syntax structure vector of the input question are added to obtain the syntax structure similarity between the input question and the preset question.

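Specifically, the above process can be expressed by the following formula:

$$\mathrm{sim\_T}\big(H(q_0), H(q_L)\big) = \sum_{i=1}^{n} \sum_{j=1}^{n} C\big(H(q_0)_i,\, H(q_L)_j\big)$$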

where q₀ is the input question and q_L is the L-th preset question in the preset question bank, with L a positive integer no greater than the total number of questions in the bank. H(q₀) is the syntactic structure vector of the input question q₀, and H(q_L) is the syntactic structure vector of the preset question q_L. sim_T is the resulting similarity between the two syntactic structure vectors, and n is the first predetermined dimension. H(q₀)_i denotes the i-th sub-element of the input question's syntactic structure vector, and H(q_L)_j denotes the j-th sub-element of the preset question's syntactic structure vector, where i and j are integers with 1 ≤ i ≤ n and 1 ≤ j ≤ n. C(H(q₀)_i, H(q_L)_j) denotes the similarity between the i-th sub-element of the input question and the j-th sub-element of the preset question.

The sub-element similarity of a sub-element in the input question's syntactic structure vector may be obtained from the comparison results, for example, as follows. The syntax subtree of that sub-element is compared with the syntax subtree corresponding to each sub-element in the preset question's syntactic structure vector, and all of the resulting comparison results are added. Each comparison result represents the similarity between the syntax subtree of the input question's sub-element and the syntax subtree of one sub-element of the preset question.
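The double sum for sim_T can be sketched as below. The callable `compare(i, j)` stands in for C. The toy comparator in the usage example is a hypothetical placeholder and not the disclosure's subtree comparison: it returns 1 only when the same library subtree is present in both questions.

```python
from typing import Callable, Sequence


def syntactic_structure_similarity(h_input: Sequence[int],
                                   h_preset: Sequence[int],
                                   compare: Callable[[int, int], float]) -> float:
    """sim_T = sum over i of (sum over j of C(H(q0)_i, H(qL)_j)).

    compare(i, j) plays the role of C for 0-based sub-element indices.
    """
    n = len(h_input)
    if len(h_preset) != n:
        raise ValueError("both vectors must have the first predetermined dimension n")
    total = 0.0
    for i in range(n):
        # Sub-element similarity of input sub-element i: the sum of its
        # comparison results against every preset sub-element j.
        total += sum(compare(i, j) for j in range(n))
    return total


h0 = (0, 1, 1, 1, 1, 0, 0, 1, 0, 0)  # "fever due to cold"
h1 = (1, 0, 0, 1, 0, 1, 1, 1, 0, 0)  # "pneumonia causes fever"


def toy_compare(i: int, j: int) -> float:
    # Hypothetical placeholder for C: same position, non-zero in both vectors.
    return 1.0 if i == j and h0[i] and h1[j] else 0.0


print(syntactic_structure_similarity(h0, h1, toy_compare))  # 2.0
```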

A method for comparing the syntax subtree of the sub-element in the syntax structure vector of the input question with the syntax subtree corresponding to the sub-element in the syntax structure vector of the preset question to obtain a comparison result is provided below. It should be understood that the embodiments of the disclosure are not limited by the particular manner of comparison employed.

In some embodiments, the values of the sub-elements to be compared are checked first. This check happens before the syntax subtree of a sub-element in the input question's syntactic structure vector is compared with the syntax subtree corresponding to a sub-element in the preset question's syntactic structure vector.

FIG. 4B illustrates an exemplary flow diagram of a process 410 for checking the values of the sub-elements to be compared according to an embodiment of the present disclosure.

In some embodiments, referring to FIG. 4B, all sub-elements of the preset initial vector have the initial value 0, and the accumulated value is 1. The values of the sub-elements to be compared are then checked as follows. First, step S411 determines whether the current sub-element of the input question's syntactic structure vector and the current sub-element of the preset question's syntactic structure vector both have non-zero values.

The current sub-element of the input question's syntactic structure vector is the sub-element of that vector whose similarity is being calculated at this step. Likewise, the current sub-element of the preset question's syntactic structure vector is the sub-element of that vector whose similarity is being calculated at this step.

When the current sub-element of the input question's syntactic structure vector and/or the current sub-element of the preset question's syntactic structure vector has the value zero, step S412 outputs a preset first comparison result.

The preset first comparison result represents a similarity of 0% between the current sub-element of the input question's syntactic structure vector and the current sub-element of the preset question's syntactic structure vector. Accordingly, it may be, for example, 0 or a null value.

The above can be broken down into three cases. In the first case, the current sub-element of the syntactic structure vector H(q₀) of the input question q₀ has the value 0, while the current sub-element of the syntactic structure vector H(q_L) of the preset question q_L is non-zero. The similarity between the current sub-element of H(q₀) and the current sub-element of H(q_L) is then 0.

In the second case, the current sub-element of H(q₀) is non-zero, while the current sub-element of H(q_L) is 0. The similarity between the two current sub-elements is again 0.

In the third case, the current sub-elements of H(q₀) and H(q_L) are both 0. The similarity between the two current sub-elements is likewise 0.

When the current sub-element of the syntactic structure vector H(q₀) of the input question and the current sub-element of the syntactic structure vector H(q_L) of the preset question q_L are both non-zero, step S413 compares two syntax subtrees. The first is the subtree corresponding to the current sub-element of the input question's syntactic structure vector. The second is the subtree corresponding to the current sub-element of the preset question's syntactic structure vector.
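Steps S411 to S413 can be sketched as a guard in front of the subtree comparison. This assumes, as in the example of FIG. 2B, that both vectors index the same preset subtree library, so position k stands for subtree c(k+1). The `exact_match` comparison is a hypothetical placeholder for the FIG. 4C procedure.

```python
from typing import Callable, Hashable, Sequence

FIRST_COMPARISON_RESULT = 0  # "preset first comparison result" (0 % similarity)


def gated_compare(h_input: Sequence[int], h_preset: Sequence[int],
                  i: int, j: int, library: Sequence[Hashable],
                  subtree_similarity: Callable[[Hashable, Hashable], float]) -> float:
    """Steps S411 to S413 for current sub-elements i (input) and j (preset)."""
    # S411/S412: a zero on either side (all three cases) gives the first result.
    if h_input[i] == 0 or h_preset[j] == 0:
        return FIRST_COMPARISON_RESULT
    # S413: both non-zero, so compare the subtrees these positions stand for.
    return subtree_similarity(library[i], library[j])


def exact_match(a: Hashable, b: Hashable) -> float:
    # Hypothetical placeholder; the FIG. 4C subtree comparison would go here.
    return 1 if a == b else 0


library = [f"c{k}" for k in range(1, 11)]
h0 = (0, 1, 1, 1, 1, 0, 0, 1, 0, 0)
h1 = (1, 0, 0, 1, 0, 1, 1, 1, 0, 0)
print(gated_compare(h0, h1, 0, 0, library, exact_match))  # 0: input value is 0
print(gated_compare(h0, h1, 3, 3, library, exact_match))  # 1: both non-zero, same subtree
```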

In some embodiments, the comparison process in step S413 described above may be described in more detail. First, in step S4131, a syntax sub-tree corresponding to a current sub-element in a syntax structure vector of an input question is set as a first syntax sub-tree, and a syntax sub-tree corresponding to a current sub-element in a syntax structure vector of the preset question is set as a second syntax sub-tree. In step S4132, the first syntax sub-tree and the second syntax sub-tree are compared to obtain a similarity between the first syntax sub-tree and the second syntax sub-tree.

The first syntax subtree and the second syntax subtree are only used for distinguishing the syntax subtree corresponding to the current sub-element in the syntax structure vector of the input question and the syntax subtree corresponding to the current sub-element in the syntax structure vector of the preset question, but not used for limiting the type or the content of the syntax subtree. It should be appreciated that the first and second syntax subtrees may be syntax subtrees of the same type.

The first syntax subtree and the second syntax subtree may be compared, for example, node by node, comparing each node they contain together with the production at that node, or in other ways. Embodiments of the present disclosure are not limited by the specific method used to compare the two subtrees and obtain their similarity.

Fig. 4C illustrates an exemplary flow diagram of a process 420 of comparing a first syntax subtree and a second syntax subtree according to an embodiment of the disclosure.

In some embodiments, the first syntax subtree and the second syntax subtree may be compared, for example, by the method shown in FIG. 4C. First, step S421 determines whether the production at the initial node of the first syntax subtree is the same as the production at the initial node of the second syntax subtree.

A production describes how a non-leaf node expands into the one or more child nodes directly connected to it. It may be viewed as the one or more branches that lead from that non-leaf node in the syntax subtree. A production may have any number of branches, and each node has exactly one production, to which all of its branches belong. For example, the initial node of syntax subtree c₆ in FIG. 2B has a production with two branches, and expanding that production yields its directly connected child nodes NP and VP.

When the production at the initial node of the first syntax subtree differs from the production at the initial node of the second syntax subtree, step S422 outputs the preset first comparison result.

As mentioned above, the preset first comparison result is intended to represent that the similarity between the current sub-element of the syntactic structure vector of the input question and the current sub-element of the syntactic structure vector of the preset question is 0%. The preset first comparison result may be, for example, 0, or may also be a null value, and embodiments of the present disclosure are not limited by a specific value of the preset first comparison result.

The initial node is the node at the top level of a syntax subtree, i.e., its root node.

When the productions at the initial nodes of the first and second syntax subtrees are identical, step S423 further determines whether the descendants of the initial nodes of both subtrees are all leaf nodes.

The descendants of an initial node are all nodes derived from it. A leaf node is a node at the lowest level of a syntax subtree, i.e., a node that does not branch any further.

If only leaf nodes exist in the descendants of the initial node of the first syntax subtree and the descendants of the initial node of the second syntax subtree, in step S424, a preset second comparison result is output.

The preset second comparison result is intended to represent that the similarity between the current sub-element of the syntactic structure vector of the input question and the current sub-element of the syntactic structure vector of the preset question is 100%. The value may be, for example, 1, or may be other values. Embodiments of the present disclosure are not limited by the specific value of the preset second comparison result.

The terms "preset first comparison result" and "preset second comparison result" in the present application only distinguish different comparison result values and their meanings. They do not restrict the specific values of these results.

When a non-leaf node is included in the descendant of the initial node of the first syntax subtree and/or the descendant of the initial node of the second syntax subtree, a preset algorithm is employed to calculate the similarity of the first syntax subtree and the second syntax subtree in step S425.

The preset algorithm may be chosen according to accuracy and computational requirements. For example, a recursive algorithm, or a composite algorithm that includes a recursive operation, may be used. Embodiments of the present disclosure are not limited by the specific content and type of the preset algorithm.

In some embodiments, the preset algorithm may be, for example, a recursive algorithm expressed by the following formula:

$$C(J1, J2) = \prod_{s=1}^{\min[S(J1),\, S(J2)]} \Big(1 + C\big(J1(s),\, J2(s)\big)\Big)$$

This formula yields the sub-element similarity recursively. J1 denotes the first syntax subtree and J2 the second syntax subtree. S(J1) is the number of child nodes generated under the initial node of the first syntax subtree, and S(J2) is the corresponding number for the second syntax subtree. min[S(J1), S(J2)] is the smaller of these two numbers. J1(s) denotes the s-th child node generated under the initial node of the first syntax subtree, and J2(s) the s-th child node generated under the initial node of the second syntax subtree. Here s is a positive integer from 1 to min[S(J1), S(J2)].
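Steps S421 to S425 can be sketched as a recursive function. This sketch assumes that the recursion has the product form C(J1, J2) = product over s of (1 + C(J1(s), J2(s))), which is consistent with the description of multiplying child-node similarities in the worked example of FIG. 4D. The `Node` class and its `production()` method are hypothetical representations of a syntax subtree.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A syntax-tree node; a node with no children is a leaf."""
    label: str
    children: List[Node] = field(default_factory=list)

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        # A node's production: its label plus the labels of its direct children.
        return self.label, tuple(child.label for child in self.children)

    def is_leaf(self) -> bool:
        return not self.children


def subtree_similarity(j1: Node, j2: Node) -> int:
    """Compare two syntax subtrees following steps S421 to S425 (sketch)."""
    # S421/S422: different productions at the initial nodes -> first result (0).
    if j1.production() != j2.production():
        return 0
    # S423/S424: only leaves below both initial nodes -> second result (1).
    if all(c.is_leaf() for c in j1.children) and all(c.is_leaf() for c in j2.children):
        return 1
    # S425 (assumed product form): zip stops after min[S(J1), S(J2)] children.
    result = 1
    for child1, child2 in zip(j1.children, j2.children):
        result *= 1 + subtree_similarity(child1, child2)
    return result


# Hypothetical subtree S -> NP VP, NP -> N, VP -> V N (words omitted).
tree = Node("S", [Node("NP", [Node("N")]), Node("VP", [Node("V"), Node("N")])])
print(subtree_similarity(tree, tree))  # (1 + 1) * (1 + 1) = 4
```

The exact subtrees in FIG. 4D are not available here, so this toy tree does not attempt to reproduce the value 8 in the worked example. It only illustrates the control flow.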

FIG. 4D illustrates a schematic diagram of calculating the sub-element similarity of sub-elements of an input question according to an embodiment of the present disclosure.

Referring to FIG. 4D and FIG. 2B, the above process may be described in more detail using an example. Suppose the sub-element similarities between the input question's syntactic structure vector and a preset question's syntactic structure vector are being calculated, and the input question is "fever due to cold". As described above, syntactic structure analysis yields the syntax tree J₀ and the syntax subtrees it contains. From the correspondence between these subtrees and the sub-elements of the preset initial vector, the syntactic structure vector H(q₀) = (0,1,1,1,1,0,0,1,0,0) is obtained.

Next, the comparison between the input question and the preset questions in the preset question-answer library is illustrated with one preset question. The example computes the similarity between the syntactic structure vector H(q₀) of the input question and the syntactic structure vector of the first preset question in the library, "pneumonia causes fever". This first preset question has the syntax tree J₁ and the syntactic structure vector H(q₁) = (1,0,0,1,0,1,1,1,0,0).

Following the above process, the similarity to the preset question is computed for each sub-element of the input question's syntactic structure vector H(q₀). By the method above, the sub-elements H(q₀)_1, H(q₀)_6, H(q₀)_7, H(q₀)_9, and H(q₀)_10 each have the value 0. Their sub-element similarities with respect to the syntactic structure vector H(q₁) of the first preset question are therefore 0. The sub-element similarities of the remaining sub-elements are then computed. Adding the sub-element similarities of all sub-elements of H(q₀) gives the syntactic structure similarity between H(q₀) and the syntactic structure vector of the first preset question. Taking H(q₀)_2 as an example, its sub-element similarity is:

$$\sum_{j=1}^{10} C\big(H(q_0)_2,\, H(q_1)_j\big)$$

The sub-element similarity of H(q₀)_2 is thus 18. Consider the term C(H(q₀)_2, H(q₁)_1) of this sum. The initial nodes of the subtrees corresponding to H(q₀)_2 and H(q₁)_1 are both S, and both produce NP and VP, so the productions at their initial nodes are the same. The descendants of the initial node include non-leaf nodes (N, V, N), however, so the sub-element similarity is computed with the recursive formula described above, as in step S425 of FIG. 4C. During the recursion, the productions of corresponding child nodes are compared to obtain child-node similarities, and these are multiplied together to obtain the similarity of the two sub-elements. The resulting value of C(H(q₀)_2, H(q₁)_1) is (1+1)×(1+1+1+1), i.e., 8.

This process yields the sub-element similarity of every sub-element in the input question's syntactic structure vector H(q₀). Adding these sub-element similarities gives the syntactic structure similarity between the input question and the preset question.

The syntactic structure vector of the input question is thus compared with the syntactic structure vector of each preset question in the preset question-answer library. The result is the syntactic structure similarity of the input question with respect to each preset question, on which further calculation can be based.

Fig. 5 illustrates an exemplary flowchart of a process of calculating a syntactic content similarity of a syntactic content vector of the input question to a syntactic content vector of each preset question in a preset question-and-answer library according to an embodiment of the present disclosure.

Referring to fig. 5, in some embodiments, first, in step S501, a cosine similarity of the syntactic content vector of the input question and the syntactic content vector of the preset question is calculated. Specifically, it can be calculated, for example, by the following formula:

$$\mathrm{sim\_V}\big(\mathrm{LSTM}(q_0), \mathrm{LSTM}(q_L)\big) = \cos\big(\mathrm{LSTM}(q_0), \mathrm{LSTM}(q_L)\big) \qquad (4)$$

where q₀ is the input question and q_L is the L-th preset question in the preset question bank, with L a positive integer no greater than the total number of questions in the bank. LSTM(q₀) is the syntactic content vector of the input question q₀ output by the neural network, and LSTM(q_L) is the syntactic content vector of the preset question q_L output by the neural network. sim_V is the cosine similarity between the syntactic content vector of the input question and that of the preset question.

After the cosine similarity is obtained, in step S502, the cosine similarity is used as the syntax content similarity.
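Formula (4) can be sketched in plain Python. The LSTM outputs are stood in for by ordinary lists of floats, and the vector values in the usage line are made up for illustration. Returning 0.0 when either vector is all zeros is a convention chosen for this sketch, not something the text specifies.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|), used here as sim_V."""
    if len(u) != len(v):
        raise ValueError("content vectors must share the second predetermined dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed for an all-zero vector
    return dot / (norm_u * norm_v)


# Made-up content vectors for q0 and qL.
print(cosine_similarity([0.2, 0.4, 0.1], [0.1, 0.5, 0.0]))
```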

Computing the cosine similarity between the syntactic content vector of the input question and that of the preset question gives their syntactic content similarity simply and quickly. Subsequent retrieval can then be based on this similarity.

Fig. 6 illustrates an exemplary flowchart of a process of outputting a retrieval result according to the question similarity according to an embodiment of the present disclosure.

Referring to fig. 6, in some embodiments, when obtaining a search result based on the obtained question similarity, first, in step S601, based on the question similarity determined for each preset question, the maximum question similarity in the preset question-answer library is determined, and a preset question and answer pair corresponding to the maximum question similarity is obtained.

For example, suppose the current preset question bank contains 10 preset questions q₁ to q₁₀, whose question similarities with the input question are 87, 90, 101, 20, 32, 12, 91, 82, 9, and 10, respectively. The maximum question similarity is then 101, and the corresponding question-answer pair, namely preset question q₃ and its answer, is obtained.

Step S602 then compares the maximum question similarity with a preset threshold.

The preset threshold is the minimum similarity that a preset question must have with the input question for its answer to be output as the retrieval result. It may be set, for example, according to the retrieval accuracy actually required, such as 50 or 100. Embodiments of the present disclosure are not limited by the specific value of the preset threshold.

Further, in step S603, when the maximum question similarity is greater than or equal to a preset threshold, the corresponding answer in the preset question and answer pair is output, and when the maximum question similarity is less than the preset threshold, a null value is output.

For example, suppose the preset threshold is set to 100. The answer to a preset question is then output as the retrieval result only if that question has a question similarity of at least 100 with the input question. If the maximum question similarity in the current preset question bank is 90, it is below the threshold, so the answer to the corresponding preset question is not output and a null value is output instead. If the maximum question similarity is 103, it exceeds the threshold, so the answer to the corresponding preset question is output.
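Steps S601 to S603 can be sketched as follows, using the example similarities given above. Python's `None` stands in for the null value, and on ties the first maximal preset question is chosen. Both choices are assumptions of this sketch, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def retrieve_best_answer(similarities: Sequence[float],
                         answers: Sequence[str],
                         threshold: float) -> Optional[str]:
    """Steps S601 to S603: best-matching answer, or None as the null value."""
    if not similarities:
        return None
    # S601: index of the maximum question similarity (first one on ties).
    best = max(range(len(similarities)), key=lambda k: similarities[k])
    # S602/S603: output the answer only if it reaches the preset threshold.
    if similarities[best] >= threshold:
        return answers[best]
    return None


sims = [87, 90, 101, 20, 32, 12, 91, 82, 9, 10]  # example values from the text
answers = [f"answer to q{k}" for k in range(1, 11)]
print(retrieve_best_answer(sims, answers, 100))  # answer to q3
print(retrieve_best_answer(sims, answers, 102))  # None
```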

In this way, the maximum question similarity and the corresponding preset question-answer pair are obtained from the question similarities, and the maximum is then checked against the preset threshold. The answer to the corresponding preset question is output only when the maximum similarity is greater than or equal to the threshold. This ensures that any output answer belongs to a preset question that is sufficiently similar to the input question, which improves the accuracy of the retrieval result.

The medical question-answer pairs may be obtained, for example, by an algorithm that crawls patients' questions and doctors' answers from medical communities, official hospital websites, and electronic medical records. The resulting pairs may then be reviewed, supplemented, and corrected, e.g., by medical professionals, to further improve the accuracy of the answers. Embodiments of the present disclosure are not limited by the specific method of obtaining the medical question-answer pairs or by their subsequent processing.

A question-answer library for the medical field allows a medical input question to be retrieved against the preset question library and its answer obtained. This can significantly speed up the retrieval of medical questions. It also helps provide users with more professional answers to medical questions than the mass of consultation content available online.

Fig. 7 shows an exemplary block diagram of a question answering retrieval device according to an embodiment of the present disclosure.

The question answering retrieval apparatus 900 shown in fig. 7 includes a syntactic structure analysis module 910, a syntactic content analysis module 920, and a retrieval result generation module 930.

Wherein the syntactic structure analyzing module 910 is configured to perform syntactic structure analysis on the input question to obtain a syntactic structure vector of the input question. The syntactic content analysis module 920 is configured to process an input question, resulting in a syntactic content vector for the input question. The search result generation module 930 is configured to compare the input question with a preset question in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a search result.
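The module arrangement of FIG. 7 can be sketched as a small class that wires three callables together. The class name, field names, and the toy stand-in functions below are hypothetical. Each stand-in would be replaced by a real implementation of module 910, 920, or 930.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QuestionAnswerRetriever:
    """Wires together stand-ins for modules 910, 920 and 930 (hypothetical)."""
    structure_module: Callable[[str], List[int]]    # role of module 910
    content_module: Callable[[str], List[float]]    # role of module 920
    result_module: Callable[[List[int], List[float]], Optional[str]]  # role of 930

    def retrieve(self, question: str) -> Optional[str]:
        h = self.structure_module(question)  # syntactic structure vector
        v = self.content_module(question)    # syntactic content vector
        return self.result_module(h, v)      # compare with the preset library


# Toy stand-ins, purely for illustration.
retriever = QuestionAnswerRetriever(
    structure_module=lambda q: [0, 1, 1, 1, 1, 0, 0, 1, 0, 0],
    content_module=lambda q: [0.1, 0.2],
    result_module=lambda h, v: "answer" if sum(h) > 0 else None,
)
print(retriever.retrieve("fever due to cold"))  # answer
```

Passing the modules in as callables keeps the device structure independent of any particular parser or neural network.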

With this question-answer retrieval apparatus, the syntactic content vector and the syntactic structure vector of the input question are obtained from the input question. The input question is then compared with the preset questions in the preset question-answer library based on these two vectors to obtain a retrieval result. Answer retrieval for the input question therefore takes both its syntactic structure features and its syntactic content features into account, which makes the retrieved answer more accurate. Compared with a user browsing information on the network and screening answers manually, this also significantly reduces the time cost of retrieval and improves retrieval efficiency and speed.

In some embodiments, the retrieval result generation module 930 may further include a structure similarity generating module 931, a content similarity generating module 932, a question similarity generating module 933, and a result module 934. These modules may execute the process shown in fig. 3. They compare the syntactic structure vector and the syntactic content vector of the input question with those of the preset questions in the preset question-answer library to obtain the retrieval result.
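The four-step flow of fig. 3 (S301 to S304) can be sketched as a loop over the preset question-answer library. In this hypothetical Python sketch, each library entry is assumed to be a (structure vector, content vector, answer) tuple. The two similarity functions, the combination function and the decision function are passed in, since the disclosure leaves their concrete form open.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical library entry: (structure vector, content vector, answer).
Entry = Tuple[Vector, Vector, str]


def generate_result(
    query_struct: Vector,
    query_content: Vector,
    library: List[Entry],
    structure_sim: Callable[[Vector, Vector], float],  # step S301 (module 931)
    content_sim: Callable[[Vector, Vector], float],  # step S302 (module 932)
    combine: Callable[[float, float], float],  # step S303 (module 933)
    decide: Callable[[List[float], List[str]], Optional[str]],  # step S304 (module 934)
) -> Optional[str]:
    """Compute one question similarity per preset question, then decide."""
    similarities: List[float] = []
    for struct_vec, content_vec, _answer in library:
        s = structure_sim(query_struct, struct_vec)
        c = content_sim(query_content, content_vec)
        similarities.append(combine(s, c))
    answers = [answer for _, _, answer in library]
    return decide(similarities, answers)
```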

The structure similarity generating module 931 is configured to perform step S301 in fig. 3. It calculates the syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library.
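Claims 5 to 8 describe the structure comparison in more detail. The sketch below is one possible reading and rests on several assumptions:

- Each sub-element of a syntactic structure vector is paired with a parse subtree, or `None` when absent.
- The "first comparison result" is taken as 0 and the "second comparison result" as 1.
- The unspecified "preset algorithm" for non-pre-terminal nodes is replaced here by the recursive rule of the Collins-Duffy convolution tree kernel.

All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

FIRST_RESULT = 0.0  # assumed "first comparison result": no match
SECOND_RESULT = 1.0  # assumed "second comparison result": matching pre-terminals


@dataclass
class Node:
    label: str  # non-terminal tag, or a word for leaf nodes
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        # e.g. NP -> DT NN is represented as ("NP", ("DT", "NN"))
        return self.label, tuple(child.label for child in self.children)


def subtree_similarity(a: Node, b: Node) -> float:
    """Claim-8-style comparison; the recursion is a Collins-Duffy-style kernel."""
    # Different productions at the initial node: first comparison result.
    if a.is_leaf() or b.is_leaf() or a.production() != b.production():
        return FIRST_RESULT
    # Same production and only leaf descendants: second comparison result.
    if all(c.is_leaf() for c in a.children) and all(c.is_leaf() for c in b.children):
        return SECOND_RESULT
    # Otherwise recurse over aligned children (assumed "preset algorithm").
    score = 1.0
    for child_a, child_b in zip(a.children, b.children):
        score *= 1.0 + subtree_similarity(child_a, child_b)
    return score


def structure_similarity(
    vec_q: Sequence[float], trees_q: Sequence[Optional[Node]],
    vec_p: Sequence[float], trees_p: Sequence[Optional[Node]],
) -> float:
    """Sum of sub-element similarities (claims 5 and 6).

    trees_x[i] is the subtree assumed to correspond to sub-element vec_x[i].
    """
    total = 0.0
    for value_q, tree_q in zip(vec_q, trees_q):
        for value_p, tree_p in zip(vec_p, trees_p):
            if value_q == 0 or value_p == 0 or tree_q is None or tree_p is None:
                total += FIRST_RESULT  # a zero element short-circuits the comparison
            else:
                total += subtree_similarity(tree_q, tree_p)
    return total
```

For a toy tree (S (NP (N dog)) (VP (V runs))), comparing the tree with itself gives 9.0. Replacing "dog" with "cat" lowers the score to 6.0, because the N -> dog production no longer matches.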

The preset question-answer library contains at least one preset question, and each preset question is processed in the same way as the input question. Syntactic structure analysis is performed on the preset question to obtain its syntactic structure vector, and the preset question is processed to obtain its syntactic content vector. Accordingly, each preset question has its own syntactic structure vector and syntactic content vector. These have the same dimensions as the syntactic structure vector and the syntactic content vector of the input question, respectively.
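Because the preset questions go through the same two analyses as the input question, their vectors can be computed once, ahead of retrieval. A minimal sketch, assuming the analyzers are plain callables. `PresetEntry` and `build_library` are hypothetical names. The check enforces consistent dimensions within the library, as a stand-in for the requirement that preset and input vectors match.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass(frozen=True)
class PresetEntry:
    question: str
    answer: str
    structure_vec: Tuple[float, ...]
    content_vec: Tuple[float, ...]


def build_library(
    qa_pairs: List[Tuple[str, str]],
    structure_analyzer: Callable[[str], Vector],  # same analyzer as for input questions
    content_analyzer: Callable[[str], Vector],  # same analyzer as for input questions
) -> List[PresetEntry]:
    """Precompute both vectors for every preset question (assumed offline step)."""
    library = [
        PresetEntry(q, a, tuple(structure_analyzer(q)), tuple(content_analyzer(q)))
        for q, a in qa_pairs
    ]
    # Every entry must share the same vector lengths so that it can be
    # compared with the input question's vectors of matching dimension.
    if library:
        s_dim, c_dim = len(library[0].structure_vec), len(library[0].content_vec)
        for entry in library:
            if len(entry.structure_vec) != s_dim or len(entry.content_vec) != c_dim:
                raise ValueError(f"dimension mismatch for question: {entry.question!r}")
    return library
```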

The content similarity generating module 932 is configured to perform step S302 in fig. 3. It calculates the syntactic content similarity between the syntactic content vector of the input question and the syntactic content vector of each preset question in the preset question-answer library.

The syntactic content similarity characterizes how similar the input question and the preset question are in their syntactic content features. Embodiments of the present disclosure are not limited by the particular method used to compute the syntactic content similarity.
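Claim 9 names cosine similarity as one way to compute the syntactic content similarity. A minimal sketch, assuming content vectors are plain sequences of floats. Cosine similarity is undefined for an all-zero vector, so returning 0.0 in that case is an assumed convention.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two syntactic content vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)
```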

The question similarity generating module 933 is configured to perform step S303 in fig. 3. It determines the question similarity between the input question and each preset question in the preset question-answer library according to the syntactic structure similarity and the syntactic content similarity.

The question similarity may be obtained by directly adding the syntactic structure similarity and the syntactic content similarity, or as a weighted sum of the two with different weights. Embodiments of the present disclosure are not limited by the specific manner of obtaining the question similarity or by the specific weights.
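The combination in step S303 can be written as a formula. Here q is the input question and p_k is the k-th of K preset questions. The weights α and β are illustrative symbols not named in the original, and direct addition corresponds to α = β = 1.

$$
\mathrm{Sim}(q, p_k) = \alpha \, \mathrm{Sim}_{\mathrm{struct}}(q, p_k) + \beta \, \mathrm{Sim}_{\mathrm{content}}(q, p_k), \qquad k = 1, \dots, K
$$

The two similarities may lie on different scales, for example an unnormalized sum of sub-element similarities versus a cosine similarity in [-1, 1]. In that case, choosing unequal weights or normalizing the terms first is one plausible way to balance them.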

The result module 934 is configured to perform step S304 in fig. 3, outputting the retrieval result according to the question similarity.

For example, the maximum question similarity in the preset question-answer library may be determined from the question similarities computed for the preset questions. The preset question-answer pair corresponding to the maximum question similarity may then be obtained and the answer in that pair output directly. Alternatively, the retrieval result may be determined in another manner. Embodiments of the present disclosure are not limited by the particular method of determining the retrieval result from the question similarity.

In summary, the question-answer retrieval apparatus obtains the syntactic structure similarity and the syntactic content similarity between the input question and each preset question in the preset question-answer library. It combines them into a question similarity for each preset question, from which the retrieval result is obtained. Because both the structural features and the content features of the input question are considered when computing the retrieval result, the accuracy and reliability of the retrieval result are improved.

In some embodiments, the result module 934 includes a maximum question similarity determining module 9341, a comparison module 9342, and an output module 9343. These modules may execute the flow shown in fig. 6 to output the retrieval result according to the question similarity.

The maximum question similarity determining module 9341 is configured to perform step S601 in fig. 6. It determines the maximum question similarity in the preset question-answer library from the question similarities computed for the preset questions. It then obtains the preset question-answer pair corresponding to the maximum question similarity.

The comparison module 9342 is configured to perform step S602 in fig. 6, comparing the maximum question similarity with a preset threshold.

The output module 9343 is configured to perform step S603 in fig. 6. When the maximum question similarity is greater than or equal to the preset threshold, it outputs the corresponding answer in the preset question-answer pair. When the maximum question similarity is less than the preset threshold, it outputs a null value.

The preset threshold is the minimum similarity that a preset question must have with the input question for its answer to be output as the retrieval result. It may be set, for example, according to the retrieval accuracy actually required. Embodiments of the present disclosure are not limited by its specific value.
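Steps S601 to S603 can be sketched as a small selection function. This is a hypothetical Python sketch. The function name, the use of `None` for the "null value" and the tie-breaking rule (the first maximum wins) are assumptions, not details fixed by the disclosure.

```python
from typing import List, Optional, Tuple


def select_answer(
    similarities: List[float],
    qa_pairs: List[Tuple[str, str]],
    threshold: float,
) -> Optional[str]:
    """Steps S601-S603: pick the most similar preset question, then gate on a threshold."""
    if not similarities:
        return None
    # S601: index of the maximum question similarity (max() keeps the first on ties).
    best = max(range(len(similarities)), key=similarities.__getitem__)
    # S602/S603: output the paired answer only if the maximum reaches the threshold.
    if similarities[best] >= threshold:
        return qa_pairs[best][1]
    return None  # stands in for the "null value" of step S603
```

Raising the threshold trades recall for precision: fewer questions receive an answer, but each returned answer comes from a closer preset question.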

In summary, the question-answer retrieval apparatus obtains the maximum question similarity from the question similarities and the preset question-answer pair having that similarity. It then checks the maximum question similarity against the preset threshold. The answer of the corresponding preset question is output only when the maximum question similarity is greater than or equal to the preset threshold. The output answer therefore comes from a preset question that is highly similar to the input question, which improves the accuracy of the retrieval result.

In some embodiments, the question-answer retrieval device is capable of performing the method described above and has the corresponding functions.

The question-answer retrieval device 950 shown in fig. 8 may be implemented as one or more special-purpose or general-purpose computer system modules or components. Examples include a personal computer, a laptop computer, a tablet computer, a mobile phone, a personal digital assistant (PDA), or any other smart portable device. The question-answer retrieval device 950 may include, among other things, at least one processor 960 and a memory 970.

The at least one processor 960 is configured to execute program instructions. The memory 970 may exist in the question-answer retrieval device 950 in various forms of program storage units and data storage units, such as a hard disk, read-only memory (ROM) or random access memory (RAM). It can store the data files used by the processor during processing and/or retrieval, as well as the program instructions executed by the processor. Although not shown in the drawings, the question-answer retrieval device 950 may also include an input/output component that supports the flow of input/output data between the question-answer retrieval device 950 and other components. The question-answer retrieval device 950 may also send and receive information and data to and from a network through a communication port.

In some embodiments, a set of instructions stored in the memory 970, when executed by the processor 960, causes the question-answer retrieval device 950 to perform operations comprising: performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; processing the input question to obtain a syntactic content vector of the input question; and comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.

In some embodiments, the question-answer retrieval device 950 may receive user-input questions collected by an input device external to the question-answer retrieval device 950. It then performs the above-described question-answer retrieval method on the received input questions, thereby implementing the functions of the question-answer retrieval apparatus described above.

Although in fig. 8, processor 960 and memory 970 are shown as separate modules, those skilled in the art will appreciate that the device modules described above may be implemented as separate hardware devices or integrated into one or more hardware devices. The specific implementation of different hardware devices should not be considered as a factor limiting the scope of the present disclosure, as long as the principles described in the present disclosure can be implemented.

In the embodiments of the present disclosure, the processor may be a central processing unit (CPU), a field programmable gate array (FPGA), a microcontroller unit (MCU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), or another logic operation device having data processing capability and/or program execution capability. The memory includes, for example but not limited to, volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like.

According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a computer, may perform the method as described above.

Portions of the technology may be regarded as "products" or "articles of manufacture" in the form of executable code and/or associated data, embodied in or carried by a computer readable medium. Tangible, non-transitory storage media may include the memory or storage used by any computer, processor or similar device, or an associated module. Examples include various semiconductor memories, tape drives, disk drives, or any similar device capable of providing storage for software.

All or part of the software may at times communicate over a network, such as the internet or another communication network. Such communication can load the software from one computer device or processor to another. For example, the software may be loaded from a server or host computer of the question-answer retrieval device to the hardware platform of a computer environment. It may also be loaded to another computer environment implementing the system, or a similar function, related to providing the information needed for retrieval. Accordingly, another type of medium capable of carrying software elements may also serve as a physical connection between local devices. Examples are optical, electrical or electromagnetic waves propagating through cables, optical cables or the air. The physical media that carry such waves, such as electrical cables, wireless links or optical cables, may likewise be regarded as media carrying the software. As used herein, unless restricted to tangible "storage" media, terms referring to a computer or machine "readable medium" mean any medium that participates in the execution of instructions by a processor.

This application uses specific words to describe embodiments of the application. Reference to "a first/second embodiment," "an embodiment," and/or "some embodiments" means a feature, structure, or characteristic described in connection with at least one embodiment of the application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.

Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable categories or contexts. These include any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, various aspects of the present application may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". Furthermore, aspects of the present application may take the form of a computer product that includes computer readable program code and is embodied in one or more computer readable media.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

The foregoing is illustrative of the present invention and is not to be construed as limiting thereof. Although a few exemplary embodiments of this invention have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention as defined in the claims. It is to be understood that the foregoing is illustrative of the present invention and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The invention is defined by the claims and their equivalents.

Claims

1. A question-answer retrieval method comprises the following steps:

performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question;

processing the input question to obtain a syntactic content vector of the input question;

and comparing the input question with a preset question in a preset question-answering library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.

2. The question-answer retrieving method according to claim 1, wherein comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result comprises:

calculating the syntactic structure similarity between the syntactic structure vector and the syntactic structure vector of each preset question in the preset question-answer library;

calculating the syntactic content similarity between the syntactic content vector and the syntactic content vector of each preset question in the preset question-answer library;

determining the question similarity between the input question and each preset question in the preset question-answer library according to the syntactic structure similarity and the syntactic content similarity; and

outputting a retrieval result according to the question similarity.

3. The question-answering retrieval method according to claim 2, wherein outputting the retrieval result according to the question similarity includes:

determining the maximum question similarity in the preset question-answer library based on the question similarity determined for each preset question, and acquiring the preset question-answer pair corresponding to the maximum question similarity;

comparing the maximum question similarity with a preset threshold;

and outputting the corresponding answer in the preset question-answer pair when the maximum question similarity is greater than or equal to the preset threshold, and outputting a null value when the maximum question similarity is less than the preset threshold.

4. The question-answer retrieval method according to claim 1, wherein the input question is a medical question, the preset question-answer library comprises a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair comprises: a class of medical questions and their corresponding answers.

5. The question-answer retrieving method according to claim 2, wherein calculating the syntactic structure similarity of the syntactic structure vector and the syntactic structure vector of each preset question in a preset question-answer library comprises:

for each sub-element in the syntactic structure vector of the input question, obtaining a corresponding syntactic subtree;

comparing the syntactic subtree with the syntactic subtree corresponding to each sub-element in the syntactic structure vector of the preset question, and obtaining the sub-element similarity of the sub-element in the syntactic structure vector of the input question based on the comparison result;

and adding the sub-element similarities of all sub-elements in the syntactic structure vector of the input question to obtain the syntactic structure similarity between the input question and the preset question.

6. The question-answering retrieval method according to claim 5, wherein comparing the syntactic sub-tree with the syntactic sub-tree corresponding to each sub-element in the syntactic structure vector of the preset question comprises:

determining whether the element value of the current sub-element in the syntactic structure vector of the input question and the element value of the current sub-element in the syntactic structure vector of the preset question are non-zero;

if the element value of the current sub-element in the syntactic structure vector of the input question and/or of the current sub-element in the syntactic structure vector of the preset question is zero, outputting a preset first comparison result;

and if the element values of the current sub-element in the syntactic structure vector of the input question and of the current sub-element in the syntactic structure vector of the preset question are both non-zero, comparing the syntactic subtree corresponding to the current sub-element in the syntactic structure vector of the input question with the syntactic subtree corresponding to the current sub-element in the syntactic structure vector of the preset question.

7. The question-answering retrieval method according to claim 6, wherein comparing the syntactic subtree corresponding to the current sub-element in the syntactic structure vector of the input question with the syntactic subtree corresponding to the current sub-element in the syntactic structure vector of the preset question comprises:

taking the syntactic subtree corresponding to the current sub-element in the syntactic structure vector of the input question as a first syntactic subtree, and taking the syntactic subtree corresponding to the current sub-element in the syntactic structure vector of the preset question as a second syntactic subtree;

and comparing the first syntactic subtree and the second syntactic subtree based on a preset rule to obtain the similarity between the first syntactic subtree and the second syntactic subtree.

8. The question-answering retrieval method of claim 7, wherein comparing the first and second syntactic subtrees based on a preset rule comprises:

determining whether the production at the initial node of the first syntactic subtree is the same as the production at the initial node of the second syntactic subtree;

when the production at the initial node of the first syntactic subtree differs from the production at the initial node of the second syntactic subtree, outputting a preset first comparison result;

and wherein, when the production at the initial node of the first syntactic subtree and the production at the initial node of the second syntactic subtree are the same, determining whether the descendants of the initial node of the first syntactic subtree and the descendants of the initial node of the second syntactic subtree consist only of leaf nodes;

if the descendants of the initial node of the first syntactic subtree and the descendants of the initial node of the second syntactic subtree consist only of leaf nodes, outputting a preset second comparison result;

and if the descendants of the initial node of the first syntactic subtree and/or the descendants of the initial node of the second syntactic subtree include non-leaf nodes, calculating the similarity of the first syntactic subtree and the second syntactic subtree using a preset algorithm.

9. The question-answer retrieving method according to claim 2, wherein calculating the syntactic content similarity between the syntactic content vector and the syntactic content vector of each preset question in the preset question-answer library comprises:

calculating the cosine similarity between the syntactic content vector of the input question and the syntactic content vector of the preset question;

and taking the cosine similarity as the syntactic content similarity.

10. A question-answer retrieval apparatus comprising:

the syntactic structure analysis module is configured to perform syntactic structure analysis on the input question to obtain a syntactic structure vector of the input question;

the syntactic content analysis module is configured to process the input question to obtain a syntactic content vector of the input question;

and the retrieval result generation module is configured to compare the input question with a preset question in a preset question-answering library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.

11. A question-answering retrieval device, wherein the device comprises a processor and a memory, the memory containing a set of instructions that, when executed by the processor, cause the question-answering retrieval device to perform operations comprising:

12. The question-answering retrieval device according to claim 11, wherein comparing the input question with preset questions in a preset question-answering library based on the syntactic structure vector and syntactic content vector to obtain a retrieval result comprises:

and outputting a retrieval result according to the question similarity.

13. The question-answering retrieval device according to claim 11, wherein outputting retrieval results according to the question similarity includes:

comparing the maximum question similarity with a preset threshold;

14. The question-answer retrieving device according to claim 11, wherein the input question is a medical question, and the preset question-answer library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.

15. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a computer, perform the method of any of claims 1-9.