CN110688478B - Answer sorting method, device and storage medium


Info

Publication number
CN110688478B
Authority
CN
China
Prior art keywords
target
information
candidate
answer
sorting
Prior art date
Legal status
Active
Application number
CN201910939362.XA
Other languages
Chinese (zh)
Other versions
CN110688478A (en)
Inventor
张映雪
孟凡东
李鹏
周杰
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910939362.XA
Publication of CN110688478A
Application granted
Publication of CN110688478B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G06F16/332 Query formulation

Abstract

The embodiment of the application discloses an answer sorting method, an answer sorting device and a storage medium, wherein the method comprises the following steps: obtaining a plurality of candidate answers corresponding to the question; calculating the correctness score of each candidate answer as a correct answer according to the matching information between the question and each candidate answer; pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue; determining target candidate answers needing to participate in reordering currently from a pre-ordering queue according to the ordering sequence of each candidate answer in the pre-ordering queue; extracting target evidence information from background information of the historical candidate answers, and acquiring target correctness scores of the target candidate answers according to the target evidence information and the questions; inserting the target candidate answers into corresponding positions in a target sorting queue according to the target correctness scores; and outputting the target sorting queue when all the candidate answers in the pre-sorting queue are inserted into the target sorting queue. The scheme can improve the accuracy of judging the candidate answers.

Description

Answer sorting method, device and storage medium
Technical Field
The embodiment of the application relates to the technical field of internet, in particular to an answer ranking method, an answer ranking device and a storage medium.
Background
In a search system, a plurality of answers are provided for a question input by a user, for the user to choose from; this is referred to as answer selection. Answer selection is a ranking process that ranks correct candidate answers before incorrect ones. Currently, answer selection is mainly realized with two types of models. One type is based on similarity matching between the question and the candidate answer, where the question is the only basis for judging the candidate answer and no additional information is considered. The other type tries to extract additional information from an external corpus or knowledge base, beyond the question itself, as a basis for selecting the correct answer. For example, the question and the candidate answer are encoded with a deep network structure such as a recurrent neural network or a convolutional neural network and then matched for similarity; the higher the degree of matching between a candidate answer and the question, the more likely that candidate answer is correct. Such models can be roughly classified into attention-mechanism-based models and compare-aggregate-mechanism-based models.
In the course of research and practice on the prior art, the inventors of the embodiments of the present application found that, because the matching information between the question and the candidate answer is limited, the effect of the answer selection system is limited to some extent. Some information related to the question can be extracted from an external corpus or knowledge base as additional information to assist in determining whether the candidate answer is correct. The additional information used includes text related to the question retrieved with a search system and triples extracted from the knowledge base through entity linking, but extracting additional information in this way requires third-party tools and is time-consuming and expensive. In both approaches, the accuracy of judging the candidate answers is not high.
Disclosure of Invention
The embodiment of the application provides an answer sorting method, an answer sorting device and a storage medium, which can improve the accuracy of judging candidate answers without using any third-party tool, and can improve the efficiency of the overall answer selection.
In a first aspect, an embodiment of the present application provides an answer ranking method, where the method includes:
obtaining a plurality of candidate answers corresponding to the question;
calculating the correctness score of each candidate answer as a correct answer according to the matching information between the question and each candidate answer;
pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue;
determining target candidate answers needing to participate in reordering currently from a pre-ordering queue according to the ordering sequence of each candidate answer in the pre-ordering queue;
extracting target evidence information from background information of historical candidate answers, and acquiring target correctness scores of the target candidate answers according to the target evidence information and the questions, wherein the historical candidate answers are candidate answers which are historically participated in reordering;
inserting the target candidate answer into a corresponding position in a target sorting queue according to the target correctness score;
and outputting the target sorting queue when all the candidate answers in the pre-sorting queue are inserted into the target sorting queue.
In one possible design, after extracting the target evidence information from the background information of the historical candidate answer and before obtaining the target correctness score of the target candidate answer according to the target evidence information and the question, the method further includes:
acquiring matching information between the target evidence information and the target candidate answers;
synthesizing matching information between the question and the target candidate answer and matching information between the target evidence information and the target candidate answer to obtain target matching information;
the obtaining a target correctness score of the target candidate answer according to the target evidence information and the question includes:
and acquiring the target correctness score according to the target matching information, the target evidence information and the question.
In one possible design, the extracting target evidence information from the background information of the historical candidate answers includes:
acquiring historical evidence information and historical correctness scores of the historical candidate answers;
determining an evidence component required by current reordering from the historical evidence information according to the historical correctness score;
and obtaining the target evidence information according to the background information of the historical candidate answers, the historical evidence information and the evidence component.
In one possible design, the obtaining the target correctness score according to the target matching information, the target evidence information and the question includes:
predicting and marking all possible assertion results when the target candidate answer answers the question according to the target evidence information;
calculating probability distribution of all possible assertion results when the target candidate answer answers the question according to the target matching information;
and obtaining the target correctness score according to the probability distribution and the multi-layer perceptron parameters.
In one possible design, before extracting the target evidence information from the background information of the historical candidate answers, the method further includes:
extracting background information of a first candidate answer participating in reordering in the pre-ordering queue;
initializing the background information of the first candidate answer into an information vector, and taking the information vector as the evidence information of the candidate answer participating in next reordering in the pre-ordering queue.
In one possible design, the method is implemented by a neural network model; after the inserting the target candidate answer into the corresponding position in the target sorting queue, before outputting the target sorting queue when all candidate answers in the pre-sorting queue are inserted into the target sorting queue, the method further includes:
obtaining the current sorting precision of the target sorting queue after a candidate answer is inserted in the current reordering and the historical sorting precision of the target sorting queue after a historical candidate answer was inserted in a previous reordering, wherein the sorting precision is an evaluation parameter of the target sorting queue after an answer is inserted each time;
calculating a difference between the current ranking precision and the historical ranking precision;
and training the neural network model according to the difference value.
In one possible design, the training the neural network model according to the difference value includes:
calculating a reward value for the difference;
and training the neural network model according to the reward value maximization rule of the difference value.
In one possible design, the method further includes:
and storing the target sorting queue to a blockchain.
In a second aspect, an embodiment of the present application provides an answer ranking device having the function of implementing the answer ranking method of the first aspect. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, which may be software and/or hardware.
In one possible design, the apparatus includes:
the input and output module is used for acquiring a plurality of candidate answers corresponding to the question;
the processing module is used for calculating the correctness score of each candidate answer as a correct answer according to the matching information between the question and each candidate answer; pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue;
the processing module is further used for determining a target candidate answer which needs to participate in reordering currently from the pre-sorting queue according to the sorting sequence of each candidate answer in the pre-sorting queue; extracting target evidence information from background information of historical candidate answers, and acquiring target correctness scores of the target candidate answers according to the target evidence information and the questions, wherein the historical candidate answers are candidate answers which are historically participated in reordering;
the processing module is further used for inserting the target candidate answer into a corresponding position in a target sorting queue according to the target correctness score;
the input and output module is further used for outputting the target sorting queue when all the candidate answers in the pre-sorting queue are inserted into the target sorting queue.
In one possible design, after extracting the target evidence information from the background information of the historical candidate answer, the processing module is further configured to, before obtaining the target correctness score of the target candidate answer according to the target evidence information and the question:
acquiring matching information between the target evidence information and the target candidate answers;
synthesizing matching information between the question and the target candidate answer and matching information between the target evidence information and the target candidate answer to obtain target matching information;
the obtaining a target correctness score of the target candidate answer according to the target evidence information and the question includes:
and acquiring the target correctness score according to the target matching information, the target evidence information and the question.
In one possible design, the processing module is specifically configured to:
acquiring historical evidence information and historical correctness scores of the historical candidate answers;
determining an evidence component required by current reordering from the historical evidence information according to the historical correctness score;
and obtaining the target evidence information according to the background information of the historical candidate answers, the historical evidence information and the evidence component.
In one possible design, the processing module is specifically configured to:
predicting and marking all possible assertion results when the target candidate answer answers the question according to the target evidence information;
calculating probability distribution of all possible assertion results when the target candidate answer answers the question according to the target matching information;
and obtaining the target correctness score according to the probability distribution and the multi-layer perceptron parameters.
In one possible design, the processing module is further configured to, before extracting the target evidence information from the background information of the historical candidate answers:
extracting background information of a first candidate answer participating in reordering in the pre-ordering queue;
initializing the background information of the first candidate answer into an information vector, and taking the information vector as the evidence information of the candidate answer participating in next reordering in the pre-ordering queue.
In one possible design, the method is implemented by a neural network model; the processing module is further configured to, after the target candidate answer is inserted into the corresponding position in the target sorting queue, before outputting the target sorting queue when all candidate answers in the pre-sorting queue are inserted into the target sorting queue, further:
obtaining the current sorting precision of the target sorting queue after a candidate answer is inserted in the current reordering and the historical sorting precision of the target sorting queue after a historical candidate answer was inserted in a previous reordering, wherein the sorting precision is an evaluation parameter of the target sorting queue after an answer is inserted each time;
calculating a difference between the current ranking precision and the historical ranking precision;
and training the neural network model according to the difference value.
In one possible design, the processing module is specifically configured to:
calculating a reward value for the difference;
and training the neural network model according to the reward value maximization rule of the difference value.
In one possible design, the apparatus further includes a storage module configured to store the target sorting queue to a blockchain.
In yet another aspect, an embodiment of the present application provides a computer apparatus, which includes at least one connected processor, a memory and an input/output unit, where the memory is used for storing a computer program, and the processor is used for calling the computer program in the memory to execute the method according to the first aspect.
Yet another aspect of the embodiments of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method of the first aspect.
Compared with the prior art, in the scheme provided by the embodiment of the present application, the correctness score of each candidate answer being a correct answer is calculated according to the matching information between the question and each candidate answer, and the candidate answers are pre-sorted according to their correctness scores to obtain a pre-sorting queue. Pre-sorting the candidate answers thus performs a rough ranking of the candidate answers with higher correctness scores and reduces unnecessary calculation when the target correctness scores are later computed and the target sorting queue is output during the subsequent re-sorting of the target candidate answers, which improves the sorting efficiency as well as the accuracy and reasonableness of the sorting result. In addition, target evidence information for judging the correctness of the candidate answers is extracted, and the target correctness score is determined according to the target evidence information and the question; introducing the target evidence information to determine the correctness score of the target candidate answer further improves the accuracy and validity of that score. Meanwhile, the target evidence information is an inherent characteristic of the historical candidate answers, so no third-party tool is needed to obtain triples that assist in judging the correctness of the candidate answers, which reduces the time consumed by sorting and saves cost.
Drawings
FIG. 1a is a schematic diagram of a neural network model for reordering in an embodiment of the present application;
FIG. 1b is a schematic flowchart of ranking candidate answers in an embodiment of the present application;
FIG. 2 is a flow chart illustrating answer ranking according to an embodiment of the present application;
FIG. 3a is a diagram illustrating background information extracted from historical candidate answers according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of a blockchain system according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an apparatus for ranking answers according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an embodiment of an apparatus for performing an answer ranking method;
fig. 6 is a schematic structural diagram of a server for executing the answer sorting method in the embodiment of the present application.
Detailed Description
The terms "comprises" and "comprising," and any variations thereof, in the description and claims of embodiments of the present application and the above-described drawings, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not explicitly listed or inherent to such process, method, article, or apparatus, and the division of modules presented in the embodiments of the present application is merely a logical division and may be implemented in other ways that may be practiced in practice, such that multiple modules may be combined or integrated into another system or that certain features may be omitted, or not implemented, and such that couplings or direct couplings or communicative coupling between each other as shown or discussed may be through interfaces, the indirect coupling or communication connection between the modules may be electrical or in other similar forms, and is not limited in the embodiments of the present application. Moreover, the modules or sub-modules described as separate components may or may not be physically separated, may or may not be physical modules, or may be distributed in a plurality of circuit modules, and some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments of the present application.
The embodiment of the application provides an answer ranking method, an answer ranking device and a storage medium, which can be used on a server side, where the server side can serve application scenarios such as retrieval, search or query. The server may be a retrieval system, a search engine, or a query system, among others. The embodiments of the present application relate to Artificial Intelligence (AI), which is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning and the like.
The embodiment of the present application can implement the answer ranking method based on AI, which is specifically described by the following embodiments:
In some embodiments, the server side may deploy an answer ranking device, which may be implemented by a neural network model; after pre-ranking the candidate answers, the device calculates correctness scores and reorders the candidate answers based on the neural network model, and outputs a ranking result. Fig. 1a shows a schematic structural diagram of the neural network model deployed on the server side; based on this structure, the functions mentioned in the answer ranking method of the embodiment of the present application can be implemented. The neural network model can be divided into three modules: a basic module, an evidence module and an inference module.
A basic module: aims to obtain the matching information between the question and each candidate answer, and comprises a coding layer and an attention layer. Coding layer: the question and the current candidate answer are each encoded into a vector by a bidirectional gated recurrent unit network (Bi-GRU). Attention layer: the attention mechanism of bidirectional attention flow is employed to match the question with the candidate answer.
An evidence module: aims to aggregate evidence information from the background information of historical candidate answers and to obtain the matching information between the evidence information and the candidate answer currently to be reordered. The evidence module includes an evidence aggregator. The evidence module may also be referred to as an evidence information module, and the evidence aggregator may also be referred to as an evidence information aggregator; the embodiments of the present application are not limited thereto.
An inference module: comprises a state space and an action space. After the matching information between the question and the candidate answer and the matching information between the target evidence information and the target candidate answer are obtained at time step t (for convenience of expression, simply referred to as time step t; other similar cases are treated the same and not repeated), the neural network model concatenates the two kinds of matching information as the state S_t of time step t. The neural network model can then map S_t to the action space and judge, according to the state S_t, whether the candidate answer can answer the current question.
In the embodiment of the present application, the time step is a hyper-parameter of the neural network model and is defined relative to the input of the neural network model, i.e. the number of passes required for a complete input. For example, if a plurality of candidate answers to be input into the neural network model are regarded as a time series, then the input to the neural network model is a set of sequences (for example, the plurality of candidate answers in the embodiment of the present application), and whenever the sequences are input into the neural network model in parallel or in order, the number of time steps required for the sequences to be input can be determined. The time step in the embodiment of the present application may also be referred to as a time-step size, and is not limited thereto. For example, if the number of candidate answers to be input into the neural network model is 100 and the time step is set to 20, then the 1st to 20th candidate answers are used as the first training sample, the 21st to 40th candidate answers are used as the second training sample, and so on, for a total of 5 training samples. The 5 training samples are input into the neural network model 5 times in sequence so as to train the neural network model.
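As a minimal illustration of the time-step notion above, the following Python sketch splits a list of candidate answers into fixed-size training samples, mirroring the 100-answer, time-step-20 example; the function and variable names are illustrative assumptions, not terms from this application.

    # Minimal sketch: split candidate answers into training samples of `time_step` size.
    def split_into_time_steps(candidates, time_step=20):
        samples = []
        for start in range(0, len(candidates), time_step):
            samples.append(candidates[start:start + time_step])
        return samples

    candidates = ["answer_%d" % i for i in range(1, 101)]   # 100 candidate answers
    samples = split_into_time_steps(candidates, time_step=20)
    print(len(samples))                                     # 5 training samples, fed to the model in turn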
In the embodiment of the present application, the sorting of the candidate answers includes a pre-sorting stage and a re-sorting stage, where the re-sorting stage may include at least one sorting, and each sorting is based on a sorting result of a historical re-sorting. For example, the current reordering may be based on the result of the last reordering, or on the result of any earlier reordering. Taking the case where the current reordering is based on the last reordering as an example: the first sorting of the reordering stage is based on the sorting result of the pre-sorting stage, the second sorting of the reordering stage is based on the first sorting result of the reordering stage, and so on. A flowchart of ranking the candidate answers is shown in fig. 1b.
In the pre-sorting stage, the basic module is called to obtain the matching information between the question and the candidate answers; this matching information is input into a multi-layer perceptron of the neural network model, which outputs the corresponding correctness scores; the correctness scores of the candidate answers are sorted to obtain a pre-sorting result, and the pre-sorting result determines the re-sorting order of the candidate answers. In the re-sorting stage, the three modules are called together to reorder each candidate answer in turn through reinforcement learning (as shown in fig. 1a).
In the re-ordering stage, for each candidate answer, the basic module is first called to obtain the matching information between the question and the current candidate answer C_t (the solid-dot box in fig. 1a) as one basis; then the evidence module is called to aggregate the candidate answers obtained in the re-ordering stage over the previous t-1 time steps into the evidence information E_t (corresponding to the Evidence Extractor in fig. 1a), and further to obtain the matching information between E_t and C_t (e.g. the slashed box in fig. 1a). Finally, the inference module is called to concatenate the two kinds of matching information into the target matching information S_t, and S_t is used to calculate the correctness score of the current candidate answer C_t. The sorting queue is updated according to the correctness score, and the difference value is then calculated to guide the continued training of the whole model.
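The re-ordering loop just described can be summarized in the following hedged Python sketch. The module interfaces (base_match, evidence_match, aggregate, score_fn), the feature representation as plain Python lists, and the use of bisect for insertion are assumptions made for illustration; they do not reproduce the actual model.

    import bisect

    # Hedged sketch of the re-ordering stage; each callable stands in for a module of fig. 1a.
    def rerank(question, presorted_answers, base_match, evidence_match, aggregate, score_fn):
        target_queue = []        # list of (negative score, answer), kept sorted so the best answers come first
        evidence = None          # E_t: evidence aggregated from the candidates already re-ranked
        for candidate in presorted_answers:
            qa = base_match(question, candidate)                  # matching info between the question and C_t
            if evidence is None:
                state = qa                                        # first candidate: no evidence yet
            else:
                state = qa + evidence_match(evidence, candidate)  # S_t: concatenated matching information
            score = score_fn(state)                               # target correctness score of C_t
            bisect.insort(target_queue, (-score, candidate))      # insert C_t at the position its score warrants
            evidence = aggregate(evidence, candidate, score)      # update the evidence for the next time step
        return [answer for _, answer in target_queue]             # the target sorting queue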
Since the matching information between the question and the candidate answer is limited, the effect of the answer selection system is limited to a certain extent, and some information related to the question needs to be extracted from an external corpus or knowledge base as additional information to assist in judging whether the candidate answer is correct. The additional information used includes text related to the question retrieved with a search system and triples extracted from the knowledge base through entity linking, but extracting additional information in this way requires external tools and is time-consuming and expensive. In both of these ways, the accuracy of judging the candidate answers is not high. In order to solve the above technical problems, the embodiments of the present application mainly provide the following technical solutions:
the method comprises the steps of dividing an answer selection process into a pre-sorting stage and a re-sorting stage, taking background information of a plurality of candidate answers as evidence for judging whether the candidate answers are correct or incorrect, then aggregating the background information of the candidate answers by using a pre-sorting result and a door mechanism to obtain evidence information, and re-sorting the candidate answers by using the evidence information to obtain a final sorting result. Since the background information of the candidate answers exists naturally, any third-party tool, corpus and knowledge base are not needed, and the method is convenient and effective.
It should be noted that the answer ranking method in the embodiment of the present application may be used in scenarios such as ranking answers, selecting answers, querying information, or selecting a policy; for example, a candidate answer may be replaced by a candidate policy or a candidate scheme, which is not limited in the embodiments of the present application. The embodiments of the present application only take the ranking of answers in a search system as an example to illustrate the answer ranking method; an embodiment that selects a policy based on the answer ranking method may refer to the embodiment of selecting answers, and neither the selected object nor its application scenario is limited.
Referring to fig. 2, an answer ranking method provided in the embodiment of the present application is described below. The method may be implemented by an answer selection device, which may be a search engine, and includes:
201. and obtaining a plurality of candidate answers corresponding to the question, and calculating the correctness score of each candidate answer as a correct answer according to the matching information between the question and each candidate answer.
The matching information between the question and each candidate answer may be a probability or a correctness score; for example, it may be the probability that the candidate answer is the correct answer to the question, or a score given when the candidate answer answers the question.
202. And pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue.
The pre-sorting queue comprises a plurality of candidate answers, and each candidate answer corresponds to a sorting order.
When the plurality of candidate answers are pre-ranked, they may be pre-ranked according to the probability that each candidate answer is a correct answer to the question, or according to the correctness score given when each candidate answer answers the question, which is not limited in the embodiment of the present application.
For example, the input question: What does personal insurance contain?
The question "what personal insurance contains? "there are 20 candidate answers in total (i.e., including C)1、C2、C3…C20):
C1: life insurance is a contract between an insured life (the policy holder) and an insurer in which the insurer promises to pay the insured life an amount to a given beneficiary after the insured life.
C2: based on the contract, other events may also trigger the reimbursement of the contractual agreement, such as late stage disease or crisis.
C3: person-based contracts tend to be of two major categories.
C4: the terms of protection are intended to provide some funds in the event of a particular event, particularly a tumor claim.
C5: the personal insurance is the insurance taking the life and body of a person as the insurance target. When people suffer from unfortunate accidents or are lost due to diseases and old ageWhen the working capacity, the disabled, the dead or the old retires, the insurer pays insurance money to the insured life or the beneficiary according to the agreement of the insurance contract so as to solve the economic difficulties caused by diseases, the disabled, the old and the dead.
C6: in personal insurance, the insurer is responsible for paying without any question about the loss. For this reason, life insurance is usually rated insurance.
C7: the personal insurance includes life insurance, injury insurance and health insurance.
C8: one of the characteristics of life insurance is that the life insurance is long. The personal insurance risk period of individual people is short, and is several days or even several minutes, and if passengers accidentally injure the insurance and the high-altitude pulley insurance, the personal insurance period is different. One reason why insurers are reluctant to set the insurance time too short is that people have a long-term need for insurance assurance; another reason is that the amount of insurance required for life insurance is high, and is generally obtained by paying the premium by stages for a long period of time.
……
C19: the personal insurance includes life insurance, injury insurance and health insurance.
C20: one of the characteristics of life insurance is that the life insurance is long. The personal insurance risk period of individual people is short, and is several days or even several minutes, and if passengers accidentally injure the insurance and the high-altitude pulley insurance, the personal insurance period is different. One reason why insurers are reluctant to set the insurance time too short is that people have a long-term need for insurance assurance; another reason is that the amount of insurance required for life insurance is high, and is generally obtained by paying the premium by stages for a long period of time.
The probabilities that C1, C2, C3, ..., C19, C20 correctly answer the question "What does personal insurance contain?" are calculated respectively; the probabilities are, in turn: 0.99, 0.97, 0.92, ..., 0.69, 0.80. Then, according to the magnitude of the probabilities, C1, C2, C3, ..., C19, C20 are pre-sorted to obtain a pre-sorting queue RL1.
RL1 is: {C1, C2, C3, C5, C6, C9, C10, C12, C14, C20, C4, C8, C7, C11, C18, C17, C16, C13, C19, C15}.
In some embodiments, since the number of candidate answers may be large, in order to reduce the number of reordering operations, reduce the amount of calculation, and improve the efficiency of answer ranking, candidate answers whose probability values are lower than a preset threshold may be removed from the pre-sorting queue, or a certain number of candidate answers may be selected from the pre-sorting queue for reordering. For example, the top-10 candidate answers may be selected from RL1 according to the magnitude of the probability, obtaining RL1', where RL1' includes C1, C2, C3, C5, C6, C9, C10, C12, C14, C20. In this way, candidate answers with higher correctness scores can be effectively screened out, unnecessary calculation is filtered out of the subsequent computation of correctness scores and of the target sorting queue, reordering operations on candidate answers of low reference value are reduced, and both the reordering efficiency and the accuracy and reasonableness of the sorting result are improved.
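A small Python sketch of this pre-sorting and screening step is given below; the probability values repeat the illustrative numbers above, and the helper name and parameters are assumptions.

    # Hedged sketch: pre-sort candidates by correctness probability, then keep only the top-k.
    def presort(candidates_with_scores, top_k=10, threshold=None):
        ranked = sorted(candidates_with_scores, key=lambda item: item[1], reverse=True)
        if threshold is not None:
            ranked = [(c, p) for c, p in ranked if p >= threshold]   # drop low-value candidates
        return ranked[:top_k]                                        # e.g. RL1' in the example above

    scores = {"C1": 0.99, "C2": 0.97, "C3": 0.92, "C19": 0.69, "C20": 0.80}
    print(presort(scores.items(), top_k=3))
    # [('C1', 0.99), ('C2', 0.97), ('C3', 0.92)]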
203. And determining a target candidate answer that currently needs to participate in reordering from the pre-sorting queue according to the ranking order of each candidate answer in the pre-sorting queue.
In the embodiment of the present application, each reordering of a candidate answer can be considered to be completed within one time step, and the reordering of all candidate answers participating in reordering is completed within a plurality of time steps. For example, the reordering stage of k candidate answers needs to be completed within k time steps, so the target candidate answer that currently needs to participate in reordering can be regarded as the candidate answer C_t to be reordered at time step t. Before the next reordering, the next candidate answer is selected from the pre-sorting queue. The reordering stage in the embodiment of the present application is an iterative loop process in which each iteration reorders one candidate answer; if this iterative loop is realized with the neural network model of the embodiment of the present application, the candidate answers in the pre-sorting queue are reordered in the order of time steps, i.e. each reordering is finished within one time step, the target candidate answer is input into the neural network model at each time step, and the loop then enters the next time-step iteration.
204. Extracting target evidence information from background information of the historical candidate answers, and obtaining target correctness scores of the target candidate answers according to the target evidence information and the questions.
Wherein the historical candidate answer is a candidate answer that has historically participated in reordering. The historical candidate answer may be the candidate answer re-ranked last time, or a re-ranked candidate answer from any reordering before the current one, which is not limited in the embodiments of the present application. For example, if the target candidate answer is C_t, then the historical candidate answer may be C_{t-1}; or the historical candidate answers may be C_i, i = 1, 2, 3, ..., t-1, i.e. the historical candidate answers may be C_1, C_2, ..., C_{t-1}, where t is less than or equal to the total number of candidate answers in the pre-sorting queue. C_{t-1} is the candidate answer that participated in reordering at time step t-1; other similar cases are not repeated.
The target evidence information may also be referred to as auxiliary judgment information, auxiliary evidence, or the like, and the embodiments of the present application are not limited thereto.
In some embodiments, the following operations may be used to extract the target evidence information from the background information of the historical candidate answers:
1. target evidence information is extracted from the background information of the historical candidate answers.
The background information of the candidate answers refers to information obtained by mining and deriving based on the content of the candidate answers.
For example, as shown in fig. 3a, the input question: What does personal insurance contain?
The 4 candidate answers obtained by reordering over 4 time steps are C1, C2, C3 and C4:
C1: life insurance is a contract between an insured life (the policy holder) and an insurer in which the insurer promises to pay the insured life an amount to a given beneficiary after the insured life. Wherein the underlined parts are all C1Background information of (1).
C2: based on the contract, other events may also trigger the reimbursement of the contractual agreement, such as late stage disease or crisis. Wherein the underlined parts are all C2Background information of (1).
C3: person-based contracts tend to be of two major categories.
C4: the terms of protection are intended to provide some funds in the event of a particular event, particularly a tumor claim. Wherein the underlined parts are all C4Background information of (1).
Wherein, C1、C2、C3、C4The labels of (a) are in turn: true, false, true.
Separately extracting C1、C2、C4And C is1、C2、C4The background information is used as evidence information to be aggregated to obtain the target evidence information, and the target evidence information is used as a basis for judging candidate answers at the 5 th time step.
2. And aggregating the background information of the historical candidate answers to obtain the target evidence information.
In some embodiments, the following operations are used to extract the target evidence information from the background information of the historical candidate answers:
acquiring historical evidence information and historical correctness scores of the historical candidate answers;
determining an evidence component (namely information to be added into target evidence information) required by current reordering from the historical evidence information according to the historical correctness score;
and obtaining the target evidence information according to the background information of the historical candidate answers, the historical evidence information and the evidence component.
Alternatively, the evidence component may be determined based on the flow of information from time step t-1 to time step t controlled by the gate mechanism.
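The exact gating formula is not spelled out in this text, so the following PyTorch sketch shows only one plausible form of such a gate, in which the correctness score of the previous candidate controls how much of its encoding flows into the running evidence vector; the formula, tensor sizes and names are assumptions, not the patented design.

    import torch

    # Hedged sketch of a gate mechanism for evidence aggregation (assumed form, for illustration only):
    # the previous correctness score p_{t-1} gates how much of the previous candidate's encoding O_{t-1}
    # is merged into the running evidence vector E_{t-1}.
    def aggregate_evidence(prev_evidence, prev_encoding, prev_score):
        gate = torch.as_tensor(prev_score)                            # p_{t-1}, a scalar in (0, 1)
        return gate * prev_encoding + (1.0 - gate) * prev_evidence    # E_t

    E_prev = torch.zeros(8)                                           # evidence aggregated so far
    O_prev = torch.randn(8)                                           # encoding of the last re-ranked candidate
    E_t = aggregate_evidence(E_prev, O_prev, prev_score=0.9)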
Wherein the target correctness score is the probability that the target candidate answer correctly answers the question in the current reordering.
In some embodiments, the obtaining a target correctness score of the target candidate answer according to the target matching information, the target evidence information and the question includes:
a. and predicting and marking all possible assertion results when the target candidate answer answers the question according to the target evidence information.
Optionally, the target evidence information may be used as the state of the neural network model at the current reordering (e.g., the state of the neural network model at t time step).
b. And calculating the probability distribution of all possible assertion results when the target candidate answer answers the question according to the target matching information.
Optionally, if the target evidence information is used as a current reordered state of the neural network model, determining probability distribution of all possible assertion results when the target candidate answer answers the question according to the state.
c. And obtaining the target correctness score according to the probability distribution and the multi-layer perceptron parameters.
For example, at time step t, after the matching information between the question and the candidate answer and the matching information between the target evidence information and the target candidate answer are obtained, the neural network model concatenates the two kinds of matching information, and the concatenated information is used as the state S_t of time step t. The neural network model executes an action a_t according to the state S_t, i.e. the neural network model predicts, according to S_t, whether the candidate answer can answer the question. "Executing an action a_t according to the state S_t" can be seen as mapping S_t to the action space to obtain a_t; since there are a plurality of candidate answers, mapping them to the action space respectively yields a plurality of a_t, and these a_t form a probability distribution, so "executing an action a_t according to the state S_t" can also be regarded as sampling a_t.
In the embodiment of the present application, the action space is defined as {0, 1}, where 1 indicates that the current candidate answer can answer the question (i.e. the current candidate answer is correct), and 0 indicates that it cannot answer the question (i.e. the current candidate answer is wrong). Specifically, the neural network model maps S_t to the vector space through a two-layer multi-layer perceptron (MLP) to obtain the probability distribution over all possible a_t, as follows:

f(s_t) = tanh(W1 · s_t + b1)

p(a_t | s_t) = softmax(W2 · f(s_t) + b2)

where W1, W2, b1 and b2 are the multi-layer perceptron parameters of the MLP, obtained by training the neural network model, and s_t is the state of the neural network model at time step t. p(a_t | s_t) is the correctness score of the t-th candidate answer.
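A minimal PyTorch sketch of this two-layer MLP is given below; the layer sizes, the batch shape and the class name are assumptions made for illustration.

    import torch
    import torch.nn as nn

    # Hedged sketch: a two-layer MLP that maps the state s_t to a distribution over the action space {0, 1}.
    class AnswerScorer(nn.Module):
        def __init__(self, state_dim=256, hidden_dim=128):
            super().__init__()
            self.w1 = nn.Linear(state_dim, hidden_dim)   # W1, b1
            self.w2 = nn.Linear(hidden_dim, 2)           # W2, b2; actions: 0 = wrong, 1 = correct

        def forward(self, s_t):
            f = torch.tanh(self.w1(s_t))                 # f(s_t) = tanh(W1 s_t + b1)
            return torch.softmax(self.w2(f), dim=-1)     # p(a_t | s_t) = softmax(W2 f(s_t) + b2)

    scorer = AnswerScorer()
    s_t = torch.randn(1, 256)                            # concatenated matching information S_t
    correctness_score = scorer(s_t)[0, 1].item()         # probability that C_t answers the question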
The neural network model adds the current candidate answer to the target ranking queue RL2 according to the magnitude of p(a_t | s_t), thereby realizing the reordering of the current candidate answer.
When the current candidate answer is added to the target ranking queue RL2, the difference between its p(a_t | s_t) value and those of the other candidate answers in RL2 may cause RL2 to change. In the embodiment of the present application, an evaluation index can be set for the target sorting queue RL2 so as to track how RL2 changes. For example, average precision (abbreviated AveP) can be used to evaluate the change of the target sorting queue RL2 before and after the update; for example, the AveP at time step t is denoted AP_t.
In some embodiments, in order to further improve the accuracy of the reordering stage, the degree to which the evidence information supports the candidate answer as an answer to the question (i.e. the matching information between the evidence information and the candidate answer) may also be considered. Specifically, after extracting the target evidence information from the background information of the historical candidate answers and before obtaining the target correctness score of the target candidate answer according to the target evidence information and the question, the method further includes:
acquiring matching information between the target evidence information and the target candidate answers;
synthesizing matching information between the question and the target candidate answer and matching information between the target evidence information and the target candidate answer to obtain target matching information;
correspondingly, the target correctness score is obtained according to the target matching information, the target evidence information and the question.
205. And inserting the target candidate answer into a corresponding position in a target sorting queue according to the target correctness score.
The target sorting queue may be a queue newly created in addition to the pre-sorting queue, or it may be the pre-sorting queue itself changed by reordering the candidate answers in it; for example, the target candidate answer may be updated into the sorting queue according to the target correctness score, which is not limited in the embodiments of the present application.
The target candidate answers are inserted into corresponding positions in the target sorting queue, or the process of updating the pre-sorting queue can be regarded as the process of reordering each candidate answer, so that the recommendation sequence of the candidate answers is updated more reasonably, the final output sorting result is improved, and more accurate candidate answers are provided for the user.
It is understood that the insertion of the target candidate answer into the corresponding position in the target sorting queue is an iterative loop process, and except for the first candidate answer participating in the re-ranking, the same iterative loop is performed for each candidate answer after the first candidate answer participating in the re-ranking (i.e., steps 203 to 205).
Updating the sorting queue is an iterative loop process that, except for the first time step, performs the same iterative loop for the candidate answer at the second time step and at each time step thereafter.
Optionally, in some of the embodiments of the present application, the method is implemented by a neural network model; after the target candidate answer is inserted into the corresponding position in the target sorting queue, and when all candidate answers in the pre-sorting queue are inserted into the target sorting queue, before the target sorting queue is output, the method further includes:
obtaining the current sorting precision of the target sorting queue after the candidate answer is inserted in the current reordering, and the historical sorting precision of the target sorting queue after the historical candidate answer was inserted in the last reordering; wherein the sorting precision is an evaluation parameter that evaluates the target sorting queue after each sorting. The sorting precision may also be referred to as average precision;
calculating a difference between the current ranking precision and the historical ranking precision;
and training the neural network model according to the difference value.
The difference value in the embodiment of the present application is used to indicate a ranking index of the sorting queue; for example, the ranking index may be normalized discounted cumulative gain (NDCG). The difference value may also be referred to as a reward.
In some embodiments, in order to continuously improve the accuracy and reasonableness of the answers predicted by the neural network model, reinforcement learning (or optimization) may be performed on the neural network model; for example, at each time step of the reinforcement learning, the expected value of the difference may be maximized so as to train the model parameters of the neural network model. Specifically, the training the neural network model according to the difference value includes:
calculating a reward value for the difference;
and training the neural network model according to the reward value maximization rule of the difference value.
For example, according to the reward-maximization rule for the difference value, when the neural network model is trained, the reward value of the difference may be maximized using the objective function of a policy-gradient algorithm, and the neural network model is then trained with the maximized reward value, thereby performing reinforcement learning on the neural network model. In the embodiment of the present application, the value obtained by maximizing the reward of the difference is the expected value of the difference.
The embodiment of the present application does not require that the reward value of the difference be maximized at every time step; it only requires that reinforcement learning be performed on the neural network model.
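One common way to realize such an update is a REINFORCE-style policy-gradient step, sketched below in PyTorch; the optimizer, learning rate, network sizes and the single-step formulation are assumptions, not the training procedure claimed here.

    import torch

    # Hedged sketch of one policy-gradient update: maximize the expected reward by
    # minimizing -log p(a_t | s_t) * R(a_t) for the action that was taken.
    def policy_gradient_step(scorer, optimizer, s_t, action, reward):
        probs = scorer(s_t)                              # p(a_t | s_t), shape (1, 2)
        log_prob = torch.log(probs[0, action])           # log-probability of the chosen action
        loss = -log_prob * reward                        # negated so gradient descent maximizes the reward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    scorer = torch.nn.Sequential(torch.nn.Linear(256, 128), torch.nn.Tanh(),
                                 torch.nn.Linear(128, 2), torch.nn.Softmax(dim=-1))
    optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)
    policy_gradient_step(scorer, optimizer, torch.randn(1, 256), action=1, reward=0.09)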
The more accurate the actions a_t made by the neural network model, i.e. the more accurate the neural network model's judgment of the correctness of the candidate answers, the higher the reward it receives (i.e. the larger the difference value). Therefore, the model parameters of the neural network model can be trained by continuously maximizing the reward value of the difference, so that the accuracy of the neural network model is continuously improved and the accuracy and reasonableness of the output ranking result are further improved.
In some embodiments, the difference value (e.g. the reward) may be designed as the change in AveP before and after the sorting queue is updated. One way of calculating the difference R(a_t) is as follows:

R(a_t) = AP_t - AP_{t-1}

where AP_t is the average precision of the target sorting queue at time step t, AP_{t-1} is the average precision of the target sorting queue at time step t-1, and t is a positive integer.
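The following Python sketch illustrates this reward as a change in average precision; the binary relevance labels and queue contents are illustrative assumptions, and the simplified average-precision routine is not taken from this application.

    # Hedged sketch: average precision (AveP) of a ranked queue with binary correctness labels,
    # and the reward R(a_t) = AP_t - AP_{t-1} as the change caused by inserting C_t.
    def average_precision(labels):
        hits, precisions = 0, []
        for rank, label in enumerate(labels, start=1):
            if label:
                hits += 1
                precisions.append(hits / rank)
        return sum(precisions) / max(hits, 1)

    labels_before = [1, 0, 1, 0]       # target queue before inserting C_t (1 = correct answer)
    labels_after = [1, 1, 0, 1, 0]     # target queue after inserting C_t at its scored position
    reward = average_precision(labels_after) - average_precision(labels_before)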
206. And outputting the target sorting queue when all the candidate answers in the pre-sorting queue are inserted into the target sorting queue.
In some embodiments, the output target sorting queue is used to present, on a front-end page, the display order of the answers corresponding to the question. For example, C1, C2, C3, ..., C19, C20 are inserted into the target sorting queue in turn according to the target correctness score of each target candidate answer in the pre-sorting queue, obtaining the target sorting queue RL2.
RL2 is: {C2, C1, C3, C9, C6, C10, C5, C12, C14, C20, C4, C8, C7, C11, C18, C17, C16, C13, C19, C15}.
On the front-end page, the user can click the presented link of a candidate answer, i.e. the user can view the detail page in which the candidate answer corresponding to the link answers the question.
As another example, as shown in fig. 1b, after the reordering is completed, a vector representing the ranking result is output; as can be seen, C3, C4, C1 and C2 are arranged from left to right in order from high to low, with C3 as the best answer to the question.
Compared with the existing mechanisms, in the embodiment of the present application, on the one hand, the correctness score of each candidate answer being a correct answer is calculated according to the matching information between the question and each candidate answer, and the candidate answers are pre-sorted according to their correctness scores to obtain a pre-sorting queue. Pre-sorting the candidate answers thus performs a rough ranking of the candidate answers with higher correctness scores and reduces the calculation needed when the target correctness scores are later computed and the target sorting queue is output during re-sorting; by selecting candidate answers, those with higher correctness scores can be effectively screened out, and unnecessary calculation is filtered out of the subsequent computation of correctness scores and generation of the ranking result, which improves the sorting efficiency as well as the accuracy and reasonableness of the sorting result. On the other hand, target evidence information for judging the correctness of the candidate answers is extracted, and the target correctness score of the target candidate answer is determined according to the target evidence information and the question; introducing the target evidence information to determine the correctness score of the target candidate answer further improves the accuracy and validity of that score. Meanwhile, the target evidence information is an inherent characteristic of the historical candidate answers, so no third-party tool is needed to obtain triples that assist in judging the correctness of the candidate answers, which reduces the time consumed by sorting and saves cost.
Optionally, in some embodiments of the present application, the target evidence information may be obtained by aggregation in a first manner, and the matching information between the target evidence information and the target candidate answer may be obtained in a second manner. Since there is no earlier candidate answer to refer to when computing the evidence information at the first time step, a separate processing mode needs to be set for the first candidate answer participating in the re-ranking stage, so that the candidate answers at the time steps after the first time step can iteratively compute the evidence information of each time step in the re-ranking stage.
The first method is as follows:
Aggregating the background information of the historical candidate answers based on a gate mechanism (or operations such as synthesis and splicing, but not limited thereto).
Specifically, before extracting the target evidence information from the background information of the historical candidate answers, the embodiment of the present application further includes:
extracting background information of a first candidate answer participating in reordering in the pre-ordering queue;
initializing the background information of the first candidate answer into an information vector, and taking the information vector as the evidence information of the candidate answer participating in next reordering in the pre-ordering queue.
Specifically, at the first time step t1 of the reordering stage (for convenience, this may be abbreviated as time step t1; similar cases below are not described again), the background information of the first candidate answer C1 is extracted, the background information of C1 is initialized into an information vector E1, and the information vector E1 is taken as the evidence information of the candidate answer participating in reordering at time step t2 in the pre-sorting queue. At time step t3 of the reordering stage, the evidence information corresponding to time step t3 is computed according to the candidate answer that participated in reordering at time step t2 and the evidence information corresponding to time step t2, and so on; the evidence information at time step t2 and at each time step after t2 in the reordering stage can thus be obtained through recursive computation.
In some embodiments, the background information may be encoded into the information vector E1 through a bidirectional Gated Recurrent Unit network (Bi-GRU), or through a bidirectional Long Short-Term Memory network (Bi-LSTM) or a Convolutional Neural Network (CNN).
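Purely as an illustrative sketch of this encoding step, assuming the background information has already been converted into word embeddings, a Bi-GRU encoder could look as follows; the dimensions and the use of PyTorch are choices made for the example, not details of this application.

import torch
import torch.nn as nn

class BackgroundEncoder(nn.Module):
    """Encode background information (a sequence of word embeddings) into one information vector."""
    def __init__(self, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.bigru = nn.GRU(embed_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, background_embeddings):
        # background_embeddings: (batch, seq_len, embed_dim)
        _, last_hidden = self.bigru(background_embeddings)  # last_hidden: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states to form the information vector E1.
        return torch.cat([last_hidden[0], last_hidden[1]], dim=-1)  # (batch, 2 * hidden_dim)

# Example: one background passage of 20 tokens -> an information vector of size 128.
e1 = BackgroundEncoder()(torch.randn(1, 20, 128))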
Suppose the correctness of the candidate answer C_t is to be predicted. The target evidence information E_t corresponding to C_t is calculated according to C_{t-1} and the evidence information E_{t-1} corresponding to C_{t-1}, where C_{t-1} is the candidate answer that participated in reordering at time step t-1, the time step before time step t. First, C_{t-1} is encoded into the vector space to obtain:
O_{t-1} = BiGRU(C_{t-1}, |C_{t-1}|)
where |C_{t-1}| represents the sequence length of C_{t-1}, and O_{t-1} is the concatenation of the last hidden states of the Bi-GRU. p_{t-1} is the accuracy score of C_{t-1} predicted by the neural network model at time step t-1; it is the reinforcement-learning action taken according to the current state at step t-1, namely the predicted accuracy of C_{t-1}, and is a probability value greater than 0 and less than 1. Then, according to p_{t-1}, the part of E_{t-1} of C_{t-1} that is to be added to C_t can be obtained and used as the evidence information E_t of the candidate answer C_t participating in reordering at time step t in the pre-sorting queue. After E_t is added to C_t, information from a plurality of candidate answers can be fused, thereby improving the accuracy of judging the candidate answers.
In some embodiments, the information to be added to the target evidence information may be determined based on a gate mechanism. Specifically, E_t is calculated with a gate G, where σ is a sigmoid function, W_r and U_r are parameters, and G is the gate that controls the flow of information.
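The exact gate formula is given as an image in the original filing and is not reproduced in this text, so the following sketch only illustrates one plausible form of such a gate; the way G combines O_{t−1}, E_{t−1} and p_{t−1} is an assumption of this illustration, not the formula of this application.

import torch
import torch.nn as nn

class EvidenceGate(nn.Module):
    """Illustrative gate deciding how much of the previous answer's information flows into E_t."""
    def __init__(self, dim=128):
        super().__init__()
        self.w_r = nn.Linear(dim, dim, bias=False)  # plays the role of W_r
        self.u_r = nn.Linear(dim, dim, bias=False)  # plays the role of U_r

    def forward(self, o_prev, e_prev, p_prev):
        # o_prev: encoding O_{t-1} of the previous candidate answer
        # e_prev: previous evidence information E_{t-1}
        # p_prev: predicted correctness probability p_{t-1} of the previous answer, in (0, 1)
        g = torch.sigmoid(self.w_r(o_prev) + self.u_r(e_prev))  # gate G (assumed form)
        return g * e_prev + (1.0 - g) * (p_prev * o_prev)       # assumed aggregation into E_t

e_t = EvidenceGate()(torch.randn(1, 128), torch.randn(1, 128), torch.tensor(0.8))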
In some embodiments, the matching information between the question and each candidate answer and the matching information between the target evidence information and the target candidate answer may be spliced based on a Capsule Network to obtain the target matching information.
The second method is as follows:
After the target evidence information is obtained in the first manner, the target evidence information and the current candidate answer may be matched by using the attention mechanism in bidirectional attention flow or by using a multi-head attention mechanism. For example, the attention mechanism is used to match E_t and C_t, so as to obtain the matching information between the target evidence information and the target candidate answer. Since E_t is expressed as a vector, E_t is matched with each word of the candidate answer and then pooled to obtain this matching information.
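A rough sketch of matching the evidence vector against the words of the current candidate answer is given below, using a single-head dot-product attention followed by max pooling; the application mentions bidirectional attention flow or multi-head attention, so this simplified single-head form is an assumption made only for illustration.

import torch
import torch.nn.functional as F

def match_evidence_to_answer(e_t, answer_word_vecs):
    # e_t: evidence vector of shape (dim,); answer_word_vecs: (num_words, dim) word vectors of C_t.
    scores = answer_word_vecs @ e_t                       # one attention score per word
    weights = F.softmax(scores, dim=0)                    # normalized attention weights
    attended = weights.unsqueeze(-1) * answer_word_vecs   # evidence-weighted word vectors
    # Pool over the words to get a fixed-size matching vector between E_t and C_t.
    return attended.max(dim=0).values

matching = match_evidence_to_answer(torch.randn(128), torch.randn(30, 128))  # shape: (128,)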
In this embodiment of the present application, after all candidate answers in the pre-sorting queue are inserted into the target sorting queue, the target sorting queue may further be stored in a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with each other by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may include processing modules such as user management, basic services, smart contracts and operation monitoring. The user management module is responsible for the identity information management of all blockchain participants, including maintenance of public and private key generation (account management), key management, and maintenance of the correspondence between users' real identities and their blockchain addresses (authority management); when authorized, it can supervise and audit the transactions of certain real identities and provide rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and, after consensus on a valid request is reached, record it in storage; for a new service request, the basic service first performs interface adaptation, parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; developers can define contract logic through a programming language, publish it to the blockchain (contract registration), and, according to the logic of the contract terms, call keys or trigger execution through other events to complete the contract logic, while also providing functions for upgrading and canceling contracts. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract setting and cloud adaptation, as well as the visualized output of real-time states during product operation, such as alarms, monitoring of network conditions, and monitoring of the health status of node devices.
The server that performs the answer sorting method in the embodiment of the present application (which may also be referred to as an answer selection device) may be a node in a blockchain system, for example a node in the blockchain system shown in fig. 3 b.
Any technical feature mentioned in the embodiment corresponding to any one of fig. 1a to 3b is also applicable to the embodiments corresponding to fig. 4 and 5 in the embodiment of the present application, and the subsequent similarities are not repeated.
The answer ranking method in the embodiment of the present application is described above; the apparatus 40 that executes the answer ranking method is described below.
Referring to fig. 4, which is a schematic structural diagram of an answer ranking apparatus 40, the apparatus can be applied to a search system, a query system, a retrieval system, or the like; the application scenario is not limited in the embodiment of the present application. The apparatus 40 in the embodiment of the present application can implement the steps of the answer ranking method performed in the embodiment corresponding to fig. 1a. The functions implemented by the apparatus 40 may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware. The apparatus 40 may include a processing module 401 and an input/output module 402 (which may also be referred to as a separate input module and output module, or as a transceiver module). The processing module 401 may further be divided into a selection module, a generation module, an extraction module, an update module, and the like; the processing module in this embodiment of the present application can implement all of the same or similar functions that the selection module, the generation module, the extraction module, and the update module can implement. For the functions of the processing module 401 and the input/output module 402, reference may be made to the operations performed in the embodiment corresponding to fig. 1a, such as selecting candidate answers, calculating the correctness scores, and extracting the background information and the target evidence information, which are not described herein again. For example, the processing module 401 may be configured to control the input/output, acquisition and other operations of the input/output module 402.
In some embodiments, the input/output module 402 is configured to obtain a plurality of candidate answers corresponding to the question;
the processing module 401 may be configured to calculate a correctness score for each candidate answer as a correct answer according to matching information between the question and each candidate answer; pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue;
the processing module 401 is further configured to calculate a correctness score for each candidate answer as a correct answer according to matching information between the question and each candidate answer; pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue;
the processing module 401 is further configured to insert the target candidate answer into a corresponding position in a target sorting queue according to the target correctness score;
the input-output module 402 is configured to output the target sorting queue when all candidate answers in the pre-sorting queue are inserted into the target sorting queue.
Compared with the existing mechanism, in the embodiment of the present application, on the one hand, the processing module 401 calculates a correctness score of each candidate answer being a correct answer according to the matching information between the question and each candidate answer, and pre-sorts the candidate answers according to their correctness scores to obtain a pre-sorting queue. By pre-sorting the candidate answers, the candidate answers with higher correctness scores can be roughly sorted first, which reduces unnecessary calculation when the target correctness scores are subsequently calculated and the target sorting queue is output during re-sorting, thereby improving the sorting efficiency as well as the accuracy and reasonability of the sorting result. On the other hand, the processing module 401 extracts the target evidence information used for judging the correctness of a candidate answer and determines the target correctness score according to the target evidence information and the question. By introducing the target evidence information to determine the correctness score of the target candidate answer, the accuracy and validity of that score can be further improved. Meanwhile, since the target evidence information is an inherent feature of the historical candidate answers, no third-party tool is needed to obtain triples that assist in judging the correctness of the candidate answers, which reduces the time consumed by sorting and saves cost.
In some embodiments, the processing module 401 is further configured to, after extracting the target evidence information from the background information of the historical candidate answers and before obtaining the target correctness score of the target candidate answer according to the target evidence information and the question:
acquiring matching information between the target evidence information and the target candidate answers;
synthesizing matching information between the question and the target candidate answer and matching information between the target evidence information and the target candidate answer to obtain target matching information;
and acquiring the target correctness score according to the target matching information, the target evidence information and the question.
In some embodiments, the processing module 401 is specifically configured to:
acquiring historical evidence information and historical correctness scores of the historical candidate answers;
determining an evidence component required by current reordering from the historical evidence information according to the historical correctness score;
and obtaining the target evidence information according to the background information of the historical candidate answers, the historical evidence information and the evidence component.
In some embodiments, the processing module 401 is specifically configured to:
predicting and marking all possible assertion results when the target candidate answer answers the question according to the target evidence information;
calculating probability distribution of all possible assertion results when the target candidate answer answers the question according to the target matching information;
and obtaining the target correctness score according to the probability distribution and multi-layer perceptron parameters.
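Purely as a sketch of what obtaining the target correctness score from a probability distribution and multi-layer perceptron parameters might look like: the two assertion results (answer correct / answer incorrect), the layer sizes and the activation below are assumptions of this illustration, not details taken from this application.

import torch
import torch.nn as nn

class CorrectnessScorer(nn.Module):
    """Illustrative scorer: map the target matching information to a probability distribution
    over the possible assertion results and read the correctness score from it."""
    def __init__(self, match_dim=128, hidden_dim=64, num_assertions=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(match_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_assertions),
        )

    def forward(self, target_matching_info):
        distribution = torch.softmax(self.mlp(target_matching_info), dim=-1)
        return distribution[..., 0]  # probability of the "answer is correct" assertion as the score

score = CorrectnessScorer()(torch.randn(1, 128))  # a value in (0, 1)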
In some embodiments, before the processing module 401 extracts the target evidence information from the background information of the historical candidate answers, it is further configured to:
extracting background information of a first candidate answer participating in reordering in the pre-ordering queue;
initializing the background information of the first candidate answer into an information vector, and taking the information vector as the evidence information of the candidate answer participating in next reordering in the pre-ordering queue.
In some embodiments, the method is implemented by a neural network model. The processing module 401 is further configured to, after the target candidate answer is inserted into the corresponding position in the target sorting queue and before the target sorting queue is output once all candidate answers in the pre-sorting queue have been inserted into it:
obtain the current sorting precision of the target sorting queue after the candidate answer is inserted in the current reordering and the historical sorting precision of the target sorting queue after the historical candidate answer was inserted in a previous reordering, wherein the sorting precision represents an evaluation parameter of the target sorting queue after an answer is inserted each time;
calculating a difference between the current ranking precision and the historical ranking precision;
and training the neural network model according to the difference value.
In some embodiments, the processing module 401 is specifically configured to:
calculating a reward value for the difference;
and training the neural network model according to the reward value maximization rule of the difference value.
In some embodiments, the apparatus 40 further includes a storage module (not shown in fig. 4) configured to store the target sorting queue to a blockchain.
The answer ranking device in the embodiment of the present application is described above from the perspective of modular functional entities; the network authentication server and the terminal device in the embodiment of the present application are described below from the perspective of hardware processing. It should be noted that, in the embodiment shown in fig. 6 of the present application, the entity device corresponding to the transceiver module may be an input/output unit, the entity device corresponding to the processing module may be a processor, and the entity device corresponding to the display module may be a display unit such as a display screen. The apparatus shown in fig. 4 may have the structure shown in fig. 5; in that case, the processor and the input/output unit in fig. 5 can implement the same or similar functions as the processing module 401 and the input/output module 402 provided in the corresponding apparatus embodiment, and the memory in fig. 5 stores the computer program that the processor needs to call when executing the answer sorting method. In the embodiment shown in fig. 4, the entity device corresponding to the input/output module 402 may be an input/output interface, an input/output unit, an input/output device, or a transceiver, and the entity device corresponding to the processing module 401 may be a processor.
Fig. 6 is a schematic structural diagram of a server 620 according to an embodiment of the present disclosure. The server 620 may vary considerably in configuration or performance and may include one or more Central Processing Units (CPUs) 622 (e.g., one or more processors), a memory 632, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 642 or data 644. The memory 632 and the storage medium 630 may be transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 622 may be configured to communicate with the storage medium 630 and execute, on the server 620, the series of instruction operations in the storage medium 630.
The server 620 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641 such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
The steps performed by the answer ranking device 40 in the above embodiment, for example the steps performed by the server 620, may be based on the server structure shown in fig. 6. For example, the processor 622, by calling instructions in the memory 632, performs the following operations:
obtaining a plurality of candidate answers corresponding to the question through the input/output interface 658;
calculating the correctness score of each candidate answer as a correct answer according to the matching information between the question and each candidate answer; pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue;
determining target candidate answers needing to participate in reordering currently from a pre-ordering queue according to the ordering sequence of each candidate answer in the pre-ordering queue; extracting target evidence information from background information of historical candidate answers, and acquiring target correctness scores of the target candidate answers according to the target evidence information and the questions, wherein the historical candidate answers are candidate answers which are historically participated in reordering;
inserting the target candidate answer into a corresponding position in a target sorting queue according to the target correctness score;
when all candidate answers in the pre-sort queue are inserted into the target sort queue, the target sort queue is output through the input-output interface 658.
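Putting these operations together, a high-level sketch of the overall flow might read as follows; pre_score and rescore_with_evidence stand in for the matching-based pre-scoring and the evidence-based re-scoring described above and are hypothetical helpers, not interfaces defined by this application.

def rank_answers(question, candidates, pre_score, rescore_with_evidence):
    # 1. Pre-sort: score each candidate against the question and sort by that score.
    pre_queue = sorted(candidates, key=lambda c: pre_score(question, c), reverse=True)

    # 2. Re-rank: walk through the pre-sorted queue, re-score each target candidate with the
    #    evidence accumulated from the answers processed so far, and insert it into the target
    #    queue at the position given by its target correctness score.
    target_queue, evidence = [], None
    for candidate in pre_queue:
        score, evidence = rescore_with_evidence(question, candidate, evidence)
        position = next((i for i, (_, s) in enumerate(target_queue) if score > s), len(target_queue))
        target_queue.insert(position, (candidate, score))

    # 3. Output the target sorting queue once every pre-sorted candidate has been inserted.
    return [candidate for candidate, _ in target_queue]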
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the embodiments of the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program is loaded and executed on a computer, the procedures or functions described in accordance with the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, integrated with one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The technical solutions provided by the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the embodiments of the present application, and the descriptions of the above embodiments are only intended to help understand the method and core ideas of the embodiments of the present application. Meanwhile, a person skilled in the art may, according to the ideas of the embodiments of the present application, make changes to the specific implementation and the application scope. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.

Claims (11)

1. A method for ranking answers, the method comprising:
obtaining a plurality of candidate answers corresponding to the question;
calculating the correctness score of each candidate answer as a correct answer according to the matching information between the question and each candidate answer;
pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue;
determining target candidate answers needing to participate in reordering currently from a pre-ordering queue according to the ordering sequence of each candidate answer in the pre-ordering queue;
extracting target evidence information from background information of historical candidate answers, and obtaining a target correctness score of the target candidate answer according to the target evidence information and the question, wherein the historical candidate answers are candidate answers that have historically participated in reordering, and historically participating in reordering refers to participating in any reordering before the current reordering;
inserting the target candidate answer into a corresponding position in a target sorting queue according to the target correctness score;
outputting the target sorting queue when all candidate answers in the pre-sorting queue are inserted into the target sorting queue;
the extracting target evidence information from the background information of the historical candidate answers comprises: extracting background information of a first candidate answer at a first time step in a reordering stage, and initializing the background information of the first candidate answer into an information vector, wherein the information vector is evidence information of the candidate answer participating in reordering at a second time step in a pre-ordering queue; when the second time step of the reordering stage is carried out, calculating evidence information corresponding to a third time step according to the candidate answers participating in reordering at the second time step and the evidence information corresponding to the second time step; and so on, to obtain the evidence information obtained at the second time step and each time step after the second time step.
2. The method according to claim 1, wherein after extracting the target evidence information from the background information of the historical candidate answer and before obtaining the target correctness score of the target candidate answer according to the target evidence information and the question, the method further comprises:
acquiring matching information between the target evidence information and the target candidate answers;
synthesizing matching information between the question and the target candidate answer and matching information between the target evidence information and the target candidate answer to obtain target matching information;
the obtaining a target correctness score of the target candidate answer according to the target evidence information and the question includes:
and acquiring the target correctness score according to the target matching information, the target evidence information and the problem.
3. The method of claim 2, wherein extracting target evidence information from context information of the historical candidate answers comprises:
acquiring historical evidence information and historical correctness scores of the historical candidate answers;
determining an evidence component required by current reordering from the historical evidence information according to the historical correctness score;
and obtaining the target evidence information according to the background information of the historical candidate answers, the historical evidence information and the evidence component.
4. The method of claim 3, wherein obtaining the target correctness score based on the target matching information, the target evidence information, and the question comprises:
predicting and marking all possible assertion results when the target candidate answer answers the question according to the target evidence information;
calculating probability distribution of all possible assertion results when the target candidate answer answers the question according to the target matching information;
and obtaining the target correctness score according to the probability distribution and multi-layer perceptron parameters.
5. The method of claim 1, wherein before extracting the target evidence information from the context information of the historical candidate answers, the method further comprises:
extracting background information of a first candidate answer participating in reordering in the pre-ordering queue;
initializing the background information of the first candidate answer into an information vector, and taking the information vector as the evidence information of the candidate answer participating in next reordering in the pre-ordering queue.
6. The method of claim 2 or 3, wherein the method is implemented by a neural network model;
after the inserting the target candidate answer into the corresponding position in the target sorting queue, before outputting the target sorting queue when all candidate answers in the pre-sorting queue are inserted into the target sorting queue, the method further includes:
obtaining the current sorting precision of the target sorting queue after candidate answers are inserted in the reordering and the historical sorting precision of the target sorting queue after historical candidate answers are inserted in the reordering, wherein the sorting precision represents evaluation parameters of the target sorting queue after the answers are inserted in each time;
calculating a difference between the current ranking precision and the historical ranking precision;
and training the neural network model according to the difference value.
7. The method of claim 6, wherein training a neural network model based on the difference comprises:
calculating a reward value for the difference;
and training the neural network model according to the reward value maximization rule of the difference value.
8. The method of claim 1, wherein the target ordering queue is stored to a blockchain.
9. An answer ranking device, wherein the answer ranking device comprises:
the input and output module is used for acquiring a plurality of candidate answers corresponding to the question;
the processing module is used for calculating the correctness score of each candidate answer as a correct answer according to the matching information between the question and each candidate answer; pre-sorting the candidate answers according to the correctness score of each candidate answer to obtain a pre-sorting queue;
the processing module is further used for determining a target candidate answer which currently needs to participate in reordering from the pre-sorting queue according to the sorting sequence of each candidate answer in the pre-sorting queue; extracting target evidence information from background information of historical candidate answers, and obtaining a target correctness score of the target candidate answer according to the target evidence information and the question, wherein the historical candidate answers are candidate answers that have historically participated in reordering, and historically participating in reordering refers to participating in any reordering before the current reordering;
the extracting target evidence information from the background information of the historical candidate answers comprises: extracting background information of a first candidate answer at a first time step in a reordering stage, and initializing the background information of the first candidate answer into an information vector, wherein the information vector is evidence information of the candidate answer participating in reordering at a second time step in a pre-ordering queue; when the second time step of the reordering stage is carried out, calculating evidence information corresponding to a third time step according to the candidate answers participating in reordering at the second time step and the evidence information corresponding to the second time step; and so on to obtain the evidence information obtained at the second time step and each time step after the second time step;
the processing module is further used for inserting the target candidate answer into a corresponding position in a target sorting queue according to the target correctness score;
the input and output module is further used for outputting the target sorting queue when all the candidate answers in the pre-sorting queue are inserted into the target sorting queue.
10. A computer device, the computer device comprising:
at least one processor, a memory, and an input-output unit;
wherein the memory is for storing a computer program and the processor is for calling the computer program stored in the memory to perform the method of any one of claims 1-8.
11. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-8.
CN201910939362.XA 2019-09-29 2019-09-29 Answer sorting method, device and storage medium Active CN110688478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910939362.XA CN110688478B (en) 2019-09-29 2019-09-29 Answer sorting method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910939362.XA CN110688478B (en) 2019-09-29 2019-09-29 Answer sorting method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110688478A CN110688478A (en) 2020-01-14
CN110688478B true CN110688478B (en) 2021-11-30

Family

ID=69111174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910939362.XA Active CN110688478B (en) 2019-09-29 2019-09-29 Answer sorting method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110688478B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339281B (en) * 2020-03-24 2022-04-12 苏州大学 Answer selection method for reading comprehension choice questions with multi-view fusion
CN112231456B (en) * 2020-10-15 2024-02-23 泰康保险集团股份有限公司 Question generation method, device, storage medium and electronic equipment
CN112749268A (en) * 2021-01-30 2021-05-04 云知声智能科技股份有限公司 FAQ system sequencing method, device and system based on hybrid strategy
CN113421020A (en) * 2021-07-13 2021-09-21 神策网络科技(北京)有限公司 Multi-index abnormal point contact ratio analysis method
CN115017200B (en) * 2022-06-02 2023-08-25 北京百度网讯科技有限公司 Method and device for sorting search results, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107257970A (en) * 2014-12-18 2017-10-17 纽昂斯通讯公司 The problem of being carried out from structuring and unstructured data sources answer
CN107944560A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of natural language semantic reasoning method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6421655B1 (en) * 1999-06-04 2002-07-16 Microsoft Corporation Computer-based representations and reasoning methods for engaging users in goal-oriented conversations
CN102411604A (en) * 2011-08-16 2012-04-11 清华大学 Large-scale collaborative knowledge processing method and system
US9613317B2 (en) * 2013-03-29 2017-04-04 International Business Machines Corporation Justifying passage machine learning for question and answer systems
US10339168B2 (en) * 2016-09-09 2019-07-02 International Business Machines Corporation System and method for generating full questions from natural language queries

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107257970A (en) * 2014-12-18 2017-10-17 纽昂斯通讯公司 The problem of being carried out from structuring and unstructured data sources answer
CN107944560A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of natural language semantic reasoning method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Internet-based Automatic Question Answering Answer Extraction; Sun Hong; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2016-08-15; pp. I138-148 *

Also Published As

Publication number Publication date
CN110688478A (en) 2020-01-14

Similar Documents

Publication Publication Date Title
CN110688478B (en) Answer sorting method, device and storage medium
CN109902222B (en) Recommendation method and device
CN108428132B (en) Fraud transaction identification method, device, server and storage medium
CN110033022A (en) Processing method, device and the storage medium of text
CN110163242B (en) Risk identification method and device and server
CN109857846B (en) Method and device for matching user question and knowledge point
CN112017789B (en) Triage data processing method, triage data processing device, triage data processing equipment and triage data processing medium
CN111368926B (en) Image screening method, device and computer readable storage medium
CN111144658B (en) Medical risk prediction method, device, system, storage medium and electronic equipment
CN110534185A (en) Labeled data acquisition methods divide and examine method, apparatus, storage medium and equipment
CN113592605B (en) Product recommendation method, device, equipment and storage medium based on similar products
CN112634889B (en) Electronic case input method, device, terminal and medium based on artificial intelligence
WO2021139432A1 (en) Artificial intelligence-based user rating prediction method and apparatus, terminal, and medium
CN111696661A (en) Patient clustering model construction method, patient clustering method and related equipment
CN113889262A (en) Model-based data prediction method and device, computer equipment and storage medium
US11514815B1 (en) System, method, and device for generating flight training scheme oriented to individual difference
CN112017742A (en) Triage data processing method and device, computer equipment and storage medium
CN110704668B (en) Grid-based collaborative attention VQA method and device
CN113486166B (en) Construction method, device and equipment of intelligent customer service robot and storage medium
CN115204886A (en) Account identification method and device, electronic equipment and storage medium
CN113821587A (en) Text relevance determination method, model training method, device and storage medium
CN112364136B (en) Keyword generation method, device, equipment and storage medium
CN113269179B (en) Data processing method, device, equipment and storage medium
CN113469237B (en) User intention recognition method, device, electronic equipment and storage medium
CN111651652B (en) Emotion tendency identification method, device, equipment and medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40020846

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant