CN109766418B - Method and apparatus for outputting information - Google Patents


Publication number
CN109766418B
Authority
CN
China
Prior art keywords
answer, text, probability, vector, output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811524304.2A
Other languages
Chinese (zh)
Other versions
CN109766418A (en)
Inventor
戴松泰
杨仁凯
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811524304.2A priority Critical patent/CN109766418B/en
Publication of CN109766418A publication Critical patent/CN109766418A/en
Application granted granted Critical
Publication of CN109766418B publication Critical patent/CN109766418B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present application disclose a method and an apparatus for outputting information. One embodiment of the method comprises: acquiring a query text and a target text; inputting the query text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the query text and a no-answer probability, wherein the answer extraction model represents the correspondence from the query text and the target text to the answer text and the no-answer probability, and the no-answer probability characterizes the probability that an answer matching the query text cannot be extracted from the target text; and outputting the answer text and the no-answer probability. This embodiment outputs both an answer text corresponding to the query text and a probability characterizing that the target text contains no answer matching the query text.

Description

Method and apparatus for outputting information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for outputting information.
Background
With the rapid development of artificial intelligence technology, general answer extraction is increasingly important in the field of machine reading comprehension. For a given question, mining the answer to that question from a given text is one of the key components of a question-answering system.
Related approaches usually perform fuzzy matching between the input question and the text, attempting to ensure that the input question can be answered from the input text. In addition, an extra confidence check on the match between the output answer and the original question can be performed against web or authoritative data, so as to judge whether the answer corresponds to the original question.
Disclosure of Invention
The embodiment of the application provides a method and a device for outputting information.
In a first aspect, an embodiment of the present application provides a method for outputting information, the method comprising: acquiring a query text and a target text; inputting the query text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the query text and a no-answer probability, wherein the answer extraction model represents the correspondence from the query text and the target text to the answer text and the no-answer probability, and the no-answer probability characterizes the probability that an answer matching the query text cannot be extracted from the target text; and outputting the answer text and the no-answer probability.
In some embodiments, the answer extraction model includes a first coding layer, a first interaction layer based on an attention mechanism, a recurrent neural network, a neural network output layer, a first hidden layer, and a first hidden-layer output layer; and inputting the query text and the target text into the pre-trained answer extraction model to obtain the answer text corresponding to the query text and the no-answer probability comprises: inputting the query text and the target text into the first coding layer to obtain a first query text vector and a first target text vector; inputting the first query text vector and the first target text vector into the first interaction layer based on the attention mechanism to obtain a first output matrix; inputting the first output matrix into the recurrent neural network to obtain an output vector; inputting the output vector into the neural network output layer to obtain the position of the answer in the target text; inputting the first output matrix and the output vector into the first hidden layer to obtain a first probability vector; inputting the first probability vector into the first hidden-layer output layer to obtain the no-answer probability; and generating the answer text according to the position of the answer in the target text.
In some embodiments, the method further comprises: inputting the query text, the target text and the answer text into a pre-trained answer credibility model to obtain an answer credibility probability, wherein the answer credibility model represents the correspondence among the query text, the target text, the answer text and the answer credibility probability, and the answer credibility probability characterizes the degree of matching among the answer text, the query text and the target text; and outputting the answer credibility probability.
In some embodiments, the answer credibility model comprises a second coding layer, a second interaction layer based on an attention mechanism, a second hidden layer and a second hidden layer output layer; inputting the query text, the target text and the answer text into a pre-trained answer credibility model to obtain answer credibility probability, wherein the answer credibility probability comprises the following steps: inputting the query text, the target text and the answer text into a second coding layer to obtain a second query text vector, a second target text vector and an answer text vector; inputting the second query text vector, the second target text vector and the answer text vector to a second interaction layer based on an attention mechanism to obtain a second output matrix; inputting the second output matrix into a second hidden layer to obtain a second probability vector; and inputting the second probability vector to a second hidden layer output layer to obtain the answer credibility probability.
In some embodiments, the method further comprises: determining an answer-correct probability based on the no-answer probability and the answer credibility probability, wherein the answer-correct probability characterizes how accurately the answer text answers the query text; and outputting the answer-correct probability.
In a second aspect, an embodiment of the present application provides an apparatus for outputting information, comprising: an acquisition unit configured to acquire a query text and a target text; a first determination unit configured to input the query text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the query text and a no-answer probability, wherein the answer extraction model represents the correspondence from the query text and the target text to the answer text and the no-answer probability, and the no-answer probability characterizes the probability that an answer matching the query text cannot be extracted from the target text; and a first output unit configured to output the answer text and the no-answer probability.
In some embodiments, the answer extraction model includes a first encoding layer, a first interaction layer based on attention mechanism, a recurrent neural network, a neural network output layer, a first hidden layer, and a first hidden layer output layer; and the first determination unit is further configured to: inputting the query text and the target text into a first coding layer to obtain a first query text vector and a first target text vector; inputting the first query text vector and the first target text vector to a first interaction layer based on an attention mechanism to obtain a first output matrix; inputting the first output matrix into a recurrent neural network to obtain an output vector; inputting the output vector to a neural network output layer to obtain the position of an answer in a target text; inputting the first output matrix and the output vector into a first hidden layer to obtain a first probability vector; inputting the first probability vector to a first hidden layer output layer to obtain the probability of no answer; and generating an answer text according to the position of the answer in the target text.
In some embodiments, the apparatus further comprises: the second determination unit is configured to input the question text, the target text and the answer text into a pre-trained answer credibility model to obtain answer credibility probability, wherein the answer credibility model is used for representing the corresponding relation among the question text, the target text, the answer text and the answer credibility probability, and the answer credibility probability is used for representing the matching degree among the answer text, the question text and the target text; a second output unit configured to output the answer credibility probability.
In some embodiments, the answer credibility model comprises a second coding layer, a second interaction layer based on an attention mechanism, a second hidden layer and a second hidden layer output layer; and the second determination unit is further configured to: inputting the query text, the target text and the answer text into a second coding layer to obtain a second query text vector, a second target text vector and an answer text vector; inputting the second query text vector, the second target text vector and the answer text vector to a second interaction layer based on an attention mechanism to obtain a second output matrix; inputting the second output matrix into a second hidden layer to obtain a second probability vector; and inputting the second probability vector to a second hidden layer output layer to obtain the answer credibility probability.
In some embodiments, the apparatus further comprises: a third determining unit configured to determine an answer correctness probability based on the no-answer probability and the answer credibility probability, wherein the answer correctness probability is used for representing the accuracy degree of the answer text as the answer of the question text; a third output unit configured to output the answer correct probability.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method described in any implementation manner of the first aspect.
According to the method and apparatus for outputting information provided by the embodiments of the present application, a query text and a target text are first acquired; the query text and the target text are then input into a pre-trained answer extraction model to obtain an answer text corresponding to the query text and a no-answer probability, wherein the answer extraction model represents the correspondence from the query text and the target text to the answer text and the no-answer probability, and the no-answer probability characterizes the probability that an answer matching the query text cannot be extracted from the target text; finally, the answer text and the no-answer probability are output. In this way, given the query text and the target text, both the answer text corresponding to the query text and a probability characterizing that the target text contains no matching answer are output.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for outputting information, in accordance with the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for outputting information according to an embodiment of the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for outputting information according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for outputting information according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary architecture 100 to which the method for outputting information or the apparatus for outputting information of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a search-type application, an instant messaging tool, a mailbox client, social platform software, a text editing-type application, a reading-type application, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and support text processing, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server that provides various services, such as a background server that provides support for outputting answer text corresponding to the query text on the terminal devices 101, 102, 103. The background server can analyze the acquired query text and target text, extract answers and the like, and feed back a processing result (such as answer text corresponding to the query text and probability of no answer) to the terminal device.
It should be noted that the query text and the target text may also be stored locally on the server 105, in which case the server 105 may directly retrieve and process them, and the terminal devices 101, 102, 103 and the network 104 may be omitted.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
It should be noted that the method for outputting information provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103. Accordingly, the means for outputting information may be provided in the server 105, or may be provided in the terminal apparatuses 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting information in accordance with the present application is shown. The method for outputting information includes the steps of:
step 201, obtaining a query text and a target text.
In this embodiment, the query text may be text of an interrogative nature, for example, text containing preset characters. The preset characters may include, but are not limited to, at least one of the following: "who", "what", "where", "how", "?", "how much", "may I ask", and "seeking help". As an example, the query text may be "Who is the richest person in the world?", "What should I do when it gets cold?", or "Urgently seeking the download address of the XX e-book". The target text may be any text pre-designated in a preset corpus according to actual application requirements, or text selected according to rules, such as text related to the content of the query text. As an example, the target text may be a text whose semantic relevance to the query text exceeds a preset threshold. As yet another example, if a text contains a keyword determined from the query text, that text may be determined as the target text. The keywords may be determined according to a TF-IDF (term frequency-inverse document frequency) algorithm.
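The keyword-determination idea above can be sketched as follows. This is a minimal TF-IDF scorer over a query's tokens; the tokenization, the toy corpus, and the smoothing scheme are illustrative assumptions, not details from the patent:

```python
import math
from collections import Counter

def tf_idf_keywords(query_tokens, corpus, top_k=2):
    """Score each token of a query by TF-IDF against a corpus of token lists
    and return the top_k highest-scoring tokens."""
    tf = Counter(query_tokens)
    n_docs = len(corpus)
    scores = {}
    for tok, freq in tf.items():
        df = sum(1 for doc in corpus if tok in doc)   # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed inverse document frequency
        scores[tok] = (freq / len(query_tokens)) * idf
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

# Common words that appear throughout the corpus get a low IDF and are
# filtered out; distinctive query words survive as keywords.
corpus = [["a", "of", "b"], ["c", "of"], ["of", "d"]]
keywords = tf_idf_keywords(["download", "address", "of", "of", "book"], corpus)
```

In practice the corpus would be the preset corpus mentioned above, and the surviving keywords would then be matched against candidate target texts.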
In the present embodiment, the execution subject of the method for outputting information (such as the server 105 shown in fig. 1) may acquire the query text and the target text by a wired connection manner or a wireless connection manner. As an example, the execution body may acquire the query text and the target text from a data server of the communication connection. As still another example, the execution subject may further acquire voice information of the user from the voice input device, and convert the voice information into the query text through voice recognition. Then, the execution subject may also grab the target text from the internet.
Step 202, inputting the query text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the query text and a probability of no answer.
In this embodiment, the answer extraction model may represent the correspondence from the query text and the target text to the answer text and the no-answer probability. The no-answer probability may characterize the probability that an answer matching the query text cannot be extracted from the target text. In practice, the answer extraction model may be any of various answer extraction methods applied in question-answering systems. As an example, the execution body may first split the target text into a set of sentence fragments at preset characters (e.g., punctuation marks such as commas, periods, and vertical bars). A sentence fragment may be a single character, a word, a phrase, or a short sentence. Then, for each sentence fragment in the set, the execution body may calculate the similarity between the sentence fragment and the query text. Next, the execution body may determine the sentence fragment with the maximum similarity as a candidate answer text, and extract the answer text from the candidate answer text using a pre-trained semantic analysis model. Finally, the execution body may determine the difference between 1 and the maximum similarity as the no-answer probability. To calculate the similarity, the texts may first be converted into vectors using algorithms such as word2vec or GloVe, after which the cosine similarity between the vectors is computed; alternatively, the semantic similarity may be computed with a pre-trained deep learning model. It should be noted that these text-similarity methods and semantic analysis models are well-known technologies that are widely researched and applied, and are not described in detail here.
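The fragment-matching baseline just described can be sketched as follows. Bag-of-words cosine similarity stands in for the word2vec/GloVe vectors mentioned above, and the splitting characters and example texts are illustrative only:

```python
import re
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def extract_answer(query, target):
    """Split the target text into fragments at preset characters, pick the
    fragment most similar to the query, and report 1 - max similarity as
    the no-answer probability."""
    fragments = [f.strip() for f in re.split(r"[,.;!?|]", target) if f.strip()]
    q = Counter(query.lower().split())
    best, best_sim = None, 0.0
    for frag in fragments:
        sim = cosine(q, Counter(frag.lower().split()))
        if sim > best_sim:
            best, best_sim = frag, sim
    return best, 1.0 - best_sim

answer, p_no_answer = extract_answer(
    "where is the ceremony held",
    "The ceremony is held in Beijing. Tickets sold out.")
```

A query with no lexical overlap with any fragment yields a maximum similarity of 0 and hence a no-answer probability of 1, matching the intent of the baseline.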
In some optional implementations of the embodiment, the answer extraction model may include a first coding layer, a first interaction layer based on an attention mechanism, a recurrent neural network, a neural network output layer, a first hidden layer, and a first hidden layer output layer. Therefore, the executing body can input the query text and the target text into a pre-trained answer extraction model according to the following steps to obtain an answer text and a probability of no answer corresponding to the query text:
firstly, inputting a query text and a target text into a first coding layer to obtain a first query text vector and a first target text vector.
In these implementations, the first encoding layer described above may be used to characterize correspondence between text and text vectors. The first encoding layer described above may be various methods for generating word vectors. As an example, the first coding layer may be an LSA (Latent semantic analysis) matrix decomposition model. As yet another example, the first encoding layer may also be a Word2Vector model. The executing entity may input the query text and the target text obtained in step 201 into the first encoding layer, so as to obtain a first query text vector corresponding to the query text and a first target text vector corresponding to the target text.
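A toy version of such a coding layer can be pictured as an embedding-table lookup; the vocabulary, dimensionality, and random initialization below are illustrative stand-ins for a trained Word2Vector or LSA model:

```python
import random

random.seed(0)
EMB_DIM = 4
vocab = {"who": 0, "is": 1, "beijing": 2, "<unk>": 3}
# In a trained model these vectors would come from Word2Vector or LSA;
# here they are randomly initialized purely for illustration.
embeddings = [[random.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in vocab]

def encode(tokens):
    """First coding layer: map a token sequence to a sequence of vectors,
    falling back to the <unk> vector for out-of-vocabulary tokens."""
    return [embeddings[vocab.get(t, vocab["<unk>"])] for t in tokens]

query_vectors = encode(["who", "is", "beijing"])
```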
And secondly, inputting the first query text vector and the first target text vector into a first interaction layer based on an attention mechanism to obtain a first output matrix.
In these implementations, the first interaction layer based on attention mechanism described above may be used to characterize the correspondence between the text vectors and the output matrix. The first interaction layer based on attention mechanism may be various types of ANN (Artificial Neural Network), such as CNN (Convolutional Neural Network), RNN (Recurrent Neural Network). After introducing the attention mechanism, the first interaction layer may determine weights of different elements in the input first query text vector and the first target text vector.
Optionally, the first interaction layer based on attention mechanism may include a first text interaction layer and a self-attention layer. The first text interaction layer may be an alignment model of the first query text vector, the first target text vector, and the intermediate output matrix. The above-mentioned self-attention layer may be used to characterize the correspondence between the intermediate output matrix and the first output matrix. With the self-attention layer, elements in the intermediate output matrix can be directly related through a calculation step, so that the long-distance interdependent features in the text can be captured more easily.
The execution body may input the first query text vector and the first target text vector obtained in the first step to the first interaction layer based on the attention mechanism, and obtain a first output matrix corresponding to the first query text vector and the first target text vector.
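The self-attention idea above, relating every position of the intermediate output matrix to every other position in a single computation step, can be sketched as plain dot-product attention (no learned projection matrices, unlike a real layer; the toy matrix is illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(H):
    """Each row of H attends to every row of H via dot-product scores, so
    distant positions interact in one step; each output row is a convex
    combination of the input rows."""
    out = []
    for q in H:
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) for k in H])
        out.append([sum(w * k[j] for w, k in zip(weights, H))
                    for j in range(len(q))])
    return out

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy intermediate output matrix
attended = self_attention(H)
```

Because the first and third rows interact directly regardless of their distance, long-range dependencies need not propagate step by step as in a plain recurrent layer.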
And thirdly, inputting the first output matrix into a recurrent neural network to obtain an output vector.
In these implementations, the RNN described above may be used to represent the correspondence between the output matrix and the output vector. Alternatively, the RNN may be an LSTM (Long Short-Term Memory) network. The execution body may input the first output matrix obtained in the second step into the RNN to obtain an output vector corresponding to the first output matrix.
And fourthly, inputting the output vector to a neural network output layer to obtain the position of the answer in the target text.
In these implementations, the neural network output layer described above may be used to represent the correspondence between the output vector and the position of the answer in the target text. The neural network output layer may be any of various activation functions for multi-class neural network output, such as the Softmax function. The execution body may input the output vector obtained in the third step into the neural network output layer to obtain the position of the answer in the target text. The position of the answer in the target text may include a start position and an end position of the answer in the target text, and these positions may be represented in various ways, for example, as a word index within a sentence or a paragraph of the target text.
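As a sketch of this output layer, a Softmax over per-position scores selects the most probable start and end positions; the logits below are made-up numbers standing in for the network's output vector:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def predict_span(start_logits, end_logits):
    """Apply Softmax over token positions and return the most probable
    (start, end) position of the answer in the target text."""
    start_probs, end_probs = softmax(start_logits), softmax(end_logits)
    start = max(range(len(start_probs)), key=start_probs.__getitem__)
    end = max(range(len(end_probs)), key=end_probs.__getitem__)
    return start, end

# Hypothetical logits for a 4-token target text.
start_pos, end_pos = predict_span([0.1, 2.3, 0.4, 0.2], [0.0, 0.1, 1.9, 0.3])
```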
And fifthly, inputting the first output matrix and the output vector into the first hidden layer to obtain a first probability vector.
In these implementations, the first hidden layer may be used to characterize the correspondence between the output matrix, the output vector, and the probability vector. The execution body may input the first output matrix obtained in the second step and the output vector obtained in the third step to the first hidden layer, and may obtain a first probability vector corresponding to the first output matrix and the output vector.
And sixthly, inputting the first probability vector to the first hidden layer output layer to obtain the probability of no answer.
In these implementations, the first hidden layer output layer may be used to characterize the correspondence between the probability vector and the probability of no answer. The first hidden layer output layer may be various activation functions for hidden layer neuron output, such as Sigmoid function. The executing body may input the first probability vector obtained in the fifth step to the first hidden layer output layer to obtain a probability of no answer.
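The Sigmoid output can be sketched as follows; the weights and the probability vector below are illustrative numbers, whereas in the model they would be learned parameters and the output of the first hidden layer, respectively:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def no_answer_probability(prob_vector, weights, bias=0.0):
    """First hidden-layer output layer: squash a linear combination of the
    first probability vector through a Sigmoid to obtain the no-answer
    probability in (0, 1)."""
    z = sum(w * v for w, v in zip(weights, prob_vector)) + bias
    return sigmoid(z)

p_no_answer = no_answer_probability([0.2, 0.7, 0.1], weights=[1.0, -2.0, 0.5])
```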
And seventhly, generating an answer text according to the position of the answer in the target text.
In these implementations, according to the position of the answer in the target text, the execution body may extract the text at the position determined in the fourth step and generate the answer text. As an example, suppose the query text is "Where will XX be held in 2008?" and the target text is "Beijing 2008 XX opening ceremony". According to the foregoing steps, the execution body may determine that both the start position and the end position of the answer in the target text are the first word, and may therefore extract the first word of the target text, "Beijing", as the answer text.
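The seventh step amounts to slicing the target text at the predicted positions. A minimal word-level sketch (the whitespace tokenization is an assumption; character-level positions would work the same way):

```python
def extract_span(tokens, start, end):
    """Generate the answer text from the inclusive start/end word positions
    predicted for the target text; return None for invalid positions."""
    if not (0 <= start <= end < len(tokens)):
        return None
    return " ".join(tokens[start:end + 1])

target_tokens = ["Beijing", "2008", "XX", "opening", "ceremony"]
answer_text = extract_span(target_tokens, 0, 0)   # start == end == first word
```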
It should be noted that the fourth step and the fifth and sixth steps may be executed substantially in parallel, or the fifth and sixth steps may be executed before the fourth step; no limitation is imposed here.
In these implementations, the answer extraction model may be trained by:
and S1, obtaining an initial answer extraction model.
In these implementations, the initial answer extraction model may be various ANNs. By way of example, the initial answer extraction model may include, but is not limited to, RNN, CNN, and combinations thereof.
And S2, acquiring a training sample set.
In these implementations, each training sample in the training sample set may include a sample query text, a sample target text, and a sample answer text and a sample no-answer probability corresponding to the sample query text and the sample target text. The sample no-answer probability may characterize the probability that an answer matching the sample query text cannot be extracted from the sample target text.
In practice, training samples can be obtained in a variety of ways. As an example, a question entered by a user into a search engine may be taken as a sample query text, and the text portions contained in the web pages returned by the search engine for that question may be used as the sample target text. An answer may then be extracted from the sample target text as the sample answer text. Next, verification is performed against a preset knowledge base: for a sample answer text that can serve as a correct answer to the query text, the corresponding sample no-answer probability may be set to a small value in [0, 1], such as 0; for a sample answer text that cannot serve as a correct answer, the corresponding sample no-answer probability may be set to a large value in [0, 1], such as 1. Finally, the sample query text, the sample target text, the sample answer text and the corresponding sample no-answer probability are stored in association to obtain a training sample. A large number of training samples constructed from a large amount of data form the training sample set.
S3, using a machine learning method, the sample query texts and sample target texts of the training samples in the training sample set are taken as inputs of the initial answer extraction model, the sample answer texts and sample no-answer probabilities corresponding to the input sample query texts and sample target texts are taken as desired outputs, and the answer extraction model is obtained by training.
Specifically, the execution subject of the training step may input the sample query text and the sample target text of a training sample in the training sample set into the initial answer extraction model to obtain the answer text and the no-answer probability for the training sample. Then, the degree of difference between the obtained answer text and the sample answer text of the training sample can be calculated using a preset loss function, and likewise the degree of difference between the obtained no-answer probability and the sample no-answer probability of the training sample. Next, the complexity of the model can be computed using a regularization term. Then, based on the calculated degrees of difference and the model complexity, the structural parameters of the initial answer extraction model are adjusted, and the training ends when a preset training end condition is met. Finally, the initial answer extraction model obtained by training is determined as the answer extraction model.
It should be noted that the loss function may be a logarithmic loss function, and the regularization term may be an L2 norm or a Dropout technique. The preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds the preset time; the training times exceed the preset times; the calculated difference degree is smaller than a preset difference threshold value; the accuracy on the test set reaches a preset accuracy threshold; and the coverage rate on the test set reaches a preset coverage rate threshold value.
It should be further noted that, based on the difference degree between the obtained answer text and no-answer probability of the training sample and the sample answer text and no-answer probability of the training sample, the structural parameters of the initial answer extraction model may be adjusted in various ways. For example, a BP (Back Propagation) algorithm or an SGD (Stochastic Gradient Descent) algorithm may be used to adjust network parameters of the initial answer extraction model.
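As a toy illustration of the parameter adjustment just described, the sketch below applies repeated SGD steps to a one-parameter logistic no-answer predictor, minimizing a logarithmic loss plus an L2 regularization term; the single-weight model, learning rate, and regularization weight are assumptions for illustration only, not the patent's network.

```python
import math

def sgd_step(w, x, y, lr=0.1, l2=0.01):
    """One stochastic gradient descent step on a one-parameter
    logistic no-answer predictor: minimizes logloss(p, y) + l2 * w^2."""
    p = 1.0 / (1.0 + math.exp(-w * x))   # predicted no-answer probability
    grad = (p - y) * x + 2.0 * l2 * w    # gradient of logloss + L2 penalty
    return w - lr * grad

# repeated updates drive the predicted probability toward the sample
# no-answer probability y (here a sample labeled "no answer", y = 1)
w = 0.0
for _ in range(500):
    w = sgd_step(w, x=1.0, y=1.0)
```

The L2 term keeps the weight from growing without bound, so the prediction converges near, but not exactly at, the label.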
It is noted that the execution subject of the training step described above may be the same as or different from the execution subject of the method for outputting information. If the two are the same, the execution subject of the training step can store the structure information and parameter values of the trained answer extraction model locally after the model is obtained through training. If the two are different, the execution subject of the training step can send the structure information and parameter values of the trained answer extraction model to the execution subject of the method for outputting information after the model is obtained through training.
It should be noted that the first coding layer, the first interaction layer based on the attention mechanism, the recurrent neural network, and the first hidden layer in the answer extraction model may be trained separately or simultaneously as a whole, which is not limited in this embodiment.
Step 203, outputting the answer text and the probability of no answer.
In this embodiment, after obtaining the answer text and the no-answer probability from step 202, the execution subject may output them. The output may take a variety of forms. As an example, the execution subject may output the answer text and the no-answer probability to a communicatively coupled display device, such as a display, so that the answer text and no-answer probability obtained in the above steps can be presented. As yet another example, the execution subject may output the answer text and the no-answer probability to a communicatively coupled storage medium, such as a hard disk, so that they can be stored for subsequent use.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of a method for outputting information according to an embodiment of the present application. In the application scenario of fig. 3, a user enters the text "Where was the 2008 XX Games held?" 304 into the terminal device 301. The server 302 acquires the above text 304 from the terminal device as the query text. Then, the server 302 acquires, as the target text, a text 305 containing "The 2008 Beijing XX Games, opened at 8 o'clock on the evening of 8 August 2008, was held in Beijing, the capital of the People's Republic of China" from the database server 303 communicatively connected thereto. The server 302 then inputs the query text 304 and the target text 305 into a pre-trained answer extraction model 306. The server 302 may split the target text 305 into the sentence fragment set "The 2008 Beijing XX Games", "opened at 8 o'clock on the evening of 8 August 2008", and "held in Beijing, the capital of the People's Republic of China". The server 302 may determine the similarity between the query text 304 and each of the above sentence fragments. Then, the server 302 may determine the sentence fragment "The 2008 Beijing XX Games" with the highest similarity (e.g., 0.9) as the candidate answer text. Next, a semantic analysis model is used to extract "Beijing" from the candidate answer text as the answer text 307. The server 302 may then determine 0.1 as the no-answer probability 308. Finally, the server 302 may output and display the determined answer text 307 and the no-answer probability 308. Alternatively, the server 302 may combine the determined answer text 307 and the no-answer probability 308 into the information 309 "Beijing, 0.1" and transmit the information 309 to the terminal device 301.
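The fragment-splitting and similarity-matching steps of this scenario can be sketched as follows, using token-overlap (Jaccard) similarity as a hypothetical stand-in for whatever similarity measure the server actually applies:

```python
def jaccard(a, b):
    """Token-overlap (Jaccard) similarity between two texts; an
    illustrative stand-in for the scenario's similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def best_fragment(query, target, sep="."):
    """Split the target text into sentence fragments and return the
    (score, fragment) pair most similar to the query."""
    fragments = [f.strip() for f in target.split(sep) if f.strip()]
    return max((jaccard(query, f), f) for f in fragments)

score, frag = best_fragment(
    "Where was the 2008 XX Games held",
    "The 2008 Beijing XX Games. Opened on the evening of 8 August 2008. "
    "Held in Beijing the capital of China")
```

Here the first fragment shares the most tokens with the query and becomes the candidate answer text, mirroring the selection in the figure.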
In the method provided by the above embodiment of the present application, first, a query text and a target text are obtained; then, the query text and the target text are input into a pre-trained answer extraction model to obtain an answer text corresponding to the query text and a no-answer probability, where the answer extraction model is used to represent the correspondence between the query text and target text on one hand and the answer text and no-answer probability on the other, and the no-answer probability is used to represent the probability that an answer matching the query text cannot be extracted from the target text; finally, the answer text and the no-answer probability are output. In this way, the answer text corresponding to the query text is output together with a probability characterizing that no answer matching the query text exists in the target text, so that the confidence of the answer text can be known at the same time the answer text is obtained.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for outputting information is shown. The process 400 of the method for outputting information includes the steps of:
step 401, obtaining a query text and a target text.
Step 402, inputting the query text and the target text into a pre-trained answer extraction model to obtain an answer text and a no-answer probability corresponding to the query text.
Step 403, outputting the answer text and the probability of no answer.
Step 401, step 402, and step 403 are respectively the same as step 201, step 202, and step 203 in the foregoing embodiment, and the above description for step 201, step 202, and step 203 also applies to step 401, step 402, and step 403, which is not described herein again.
Step 404, inputting the query text, the target text and the answer text into a pre-trained answer credibility model to obtain answer credibility probability.
In this embodiment, the answer credibility model may be used to represent the correspondence between the query text, the target text, and the answer text on one hand and the answer credibility probability on the other. The answer credibility probability may be used to characterize the degree of matching between the answer text, the query text, and the target text. In practice, the answer credibility model may be implemented by various methods for determining this degree of matching. As an example, the execution subject may obtain the answer credibility probability according to the following steps:
In the first step, the answer text and the query text are combined to generate an answer verification text. The answer verification text may be generated in various ways. As an example, the answer text may be substituted for the question word in the query text. For example, if the query text is "Who is Zhang San's wife?" and the answer text is "Li Si", the generated answer verification text may be "Zhang San's wife is Li Si".
And secondly, inputting the answer verification text into a pre-trained language recognition model, and obtaining sentence probability according to maximum likelihood estimation. As an example, the language identification model described above may be an N-gram (N-gram) language model. It should be noted that the above-mentioned language identification model is a well-known technology which is widely researched and applied at present. And will not be described in detail herein.
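A minimal sketch of such an N-gram language model (here a bigram model with maximum likelihood estimation) might look as follows; production systems would add smoothing and compute in log space, which this toy omits:

```python
from collections import Counter

class BigramLM:
    """A minimal bigram (2-gram) language model with maximum
    likelihood estimation; an illustrative stand-in for the language
    recognition model mentioned above."""

    def __init__(self, corpus):
        tokens = []
        for sentence in corpus:
            # pad each training sentence with start/end markers
            tokens += ["<s>"] + sentence.lower().split() + ["</s>"]
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))

    def sentence_probability(self, sentence):
        """Product of MLE bigram probabilities count(w1 w2)/count(w1)."""
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        prob = 1.0
        for prev, cur in zip(words, words[1:]):
            if self.unigrams[prev] == 0:
                return 0.0
            prob *= self.bigrams[(prev, cur)] / self.unigrams[prev]
        return prob
```

A fluent answer verification text ("Zhang San's wife is Li Si") would score higher under such a model than an ungrammatical shuffle of the same words.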
And thirdly, calculating the similarity between the answer verification text and each of the sentences formed by splitting the target text. For each sentence in the sentence set formed after the target text is split, the execution subject can calculate the similarity between the answer verification text and that sentence, obtaining a similarity set.
And fourthly, calculating the average of the maximum value in the obtained similarity set and the obtained sentence probability, and determining this average as the answer credibility probability.
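The fourth step reduces to a one-line combination, sketched here with illustrative inputs; the function name is an assumption:

```python
def answer_credibility(similarity_set, sentence_probability):
    """Average the maximum similarity between the answer verification
    text and the target sentences with the language-model sentence
    probability, yielding the answer credibility probability."""
    return (max(similarity_set) + sentence_probability) / 2.0
```

For example, with a best sentence similarity of 0.9 and a sentence probability of 0.7, the answer credibility probability would be 0.8.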
In some optional implementations of this embodiment, the answer confidence model may include a second encoding layer, a second interaction layer based on an attention mechanism, a second hidden layer, and a second hidden layer output layer. Therefore, the executing agent may input the query text, the target text and the answer text into a pre-trained answer credibility model according to the following steps to obtain an answer credibility probability:
firstly, inputting the query text, the target text and the answer text into a second coding layer to obtain a second query text vector, a second target text vector and an answer text vector.
In these implementations, the second encoding layer described above may be used to characterize the correspondence between texts and text vectors. The second encoding layer may be any of various methods for generating word vectors. As an example, the second encoding layer may be an LSA matrix decomposition model. As yet another example, the second encoding layer may also be a Word2Vec model. The execution subject may input the query text obtained in step 401, the target text, and the answer text obtained in step 402 into the second encoding layer, so as to obtain a second query text vector corresponding to the query text, a second target text vector corresponding to the target text, and an answer text vector corresponding to the answer text.
And secondly, inputting the second query text vector, the second target text vector and the answer text vector to a second interaction layer based on the attention mechanism to obtain a second output matrix.
In these implementations, the second interaction layer based on the attention mechanism described above may be used to characterize the correspondence between the text vectors and the output matrix. The second interaction layer based on the attention mechanism may be any of various artificial neural networks (ANNs), such as a CNN or an RNN. After the attention mechanism is introduced, the second interaction layer may determine the weights of different elements in the input second query text vector, second target text vector, and answer text vector.
The execution body may input the second query text vector, the second target text vector, and the answer text vector obtained in the first step to the second interaction layer based on the attention mechanism, and obtain a second output matrix corresponding to the second query text vector, the second target text vector, and the answer text vector.
And thirdly, inputting the second output matrix into a second hidden layer to obtain a second probability vector.
In these implementations, the second hidden layer may be used to characterize the correspondence between the output matrix and the probability vector. The execution subject may input the second output matrix obtained in the second step to the second hidden layer to obtain a second probability vector corresponding to the second output matrix.
And fourthly, inputting the second probability vector to a second hidden layer output layer to obtain the answer credibility probability.
In these implementations, the second hidden layer output layer can be used to characterize the correspondence between the probability vector and the answer confidence probability. The second hidden layer output layer may be various activation functions for hidden layer neuron output, such as Sigmoid function. The executing body may input the second probability vector obtained in the third step to the second hidden layer output layer to obtain the answer confidence probability.
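The four layers described in these implementations can be sketched end to end as follows; the toy per-token embeddings, untrained random weights, and scaled dot-product attention are assumptions standing in for the trained second coding, interaction, hidden, and output layers:

```python
import numpy as np

def toy_embed(text, dim=8):
    """Deterministic per-token pseudo-embeddings; a stand-in for the
    second coding layer (a trained model would use LSA or Word2Vec)."""
    def vec(word):
        rng = np.random.default_rng(sum(ord(c) for c in word))
        return rng.standard_normal(dim)
    return np.stack([vec(w) for w in text.split()])

def attention_interact(query_mat, context_mat):
    """Scaled dot-product attention of query tokens over context
    tokens; a stand-in for the second interaction layer."""
    scores = query_mat @ context_mat.T / np.sqrt(query_mat.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ context_mat          # second output matrix

def credibility_forward(query, target, answer, w_hidden, w_out):
    """Chain the layers: coding -> attention interaction -> hidden
    layer -> Sigmoid output, yielding one probability in (0, 1)."""
    q = toy_embed(query)
    context = np.vstack([toy_embed(target), toy_embed(answer)])
    output_matrix = attention_interact(q, context)
    probability_vector = np.tanh(output_matrix @ w_hidden).mean(axis=0)
    logit = float(probability_vector @ w_out)
    return 1.0 / (1.0 + np.exp(-logit))   # Sigmoid activation
```

With untrained weights the output is not meaningful; the sketch only shows how the three text inputs flow through the layer chain to a single credibility probability.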
In these implementations, the answer confidence model is trained by:
and S1, obtaining an initial answer credibility model.
In these implementations, the initial answer credibility model may be various ANNs. By way of example, the initial answer credibility model may include, but is not limited to, RNN, CNN, and combinations thereof.
And S2, acquiring a training sample set.
In these implementations, each training sample in the set of training samples may include a sample query text, a sample target text, a sample answer text, and a sample answer credibility probability corresponding to the sample query text, the sample target text, and the sample answer text. The sample answer credibility probability can be used to characterize the degree of matching between the sample answer text, the sample query text, and the sample target text.
In practice, training samples can be obtained in a variety of ways. As an example, a question entered by a user into a search engine may be taken as a sample query text. The text portions contained in the web pages returned by the search engine for the entered question may be used as the sample target text. An answer may then be extracted from the sample target text as a sample answer text. Next, the matching degree between the sample answer text, the sample query text, and the sample target text is labeled according to a preset matching rule. As an example, when the matching degree is greater than a preset threshold, the corresponding sample answer credibility probability may be set to a large value between 0 and 1, such as 1; when the matching degree is less than or equal to the preset threshold, the corresponding sample answer credibility probability may be set to a small value between 0 and 1, such as 0. Finally, the sample query text, the sample target text, the sample answer text, and the corresponding sample answer credibility probability are stored in association to obtain a training sample. A large number of training samples produced from a large amount of data form the training sample set.
And S3, using a machine learning method, the sample query texts, sample target texts, and sample answer texts of the training samples in the training sample set are taken as the input of the initial answer credibility model, the sample answer credibility probabilities corresponding to the input sample query texts, sample target texts, and sample answer texts are taken as the expected output, and the answer credibility model is obtained through training.
Specifically, the execution subject of the training step may input the sample query text, the sample target text, and the sample answer text of a training sample in the training sample set into the initial answer credibility model to obtain the answer credibility probability for the training sample. Then, the degree of difference between the obtained answer credibility probability and the sample answer credibility probability may be calculated using a preset loss function. Next, the complexity of the model can be computed using a regularization term. Then, based on the calculated degree of difference and the model complexity, the structural parameters of the initial answer credibility model are adjusted, and the training ends when a preset training end condition is met. Finally, the initial answer credibility model obtained by training is determined as the answer credibility model.
It should be noted that the loss function may be a logarithmic loss function, and the regularization term may be an L2 norm or a Dropout technique. The preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds the preset time; the training times exceed the preset times; the calculated difference degree is smaller than a preset difference threshold value; the accuracy on the test set reaches a preset accuracy threshold; and the coverage rate on the test set reaches a preset coverage rate threshold value.
It should be further noted that, based on the difference degree between the obtained answer confidence probability of the training sample and the sample answer confidence probability of the training sample, the structural parameters of the initial answer confidence model may be adjusted in various ways. For example, a BP (Back Propagation) algorithm or an SGD (Stochastic Gradient Descent) algorithm may be used to adjust the network parameters of the initial answer confidence model.
It is noted that the execution subject of the training step described above may be the same as or different from the execution subject of the method for outputting information. If the two are the same, the execution subject of the training step can store the structure information and parameter values of the trained answer credibility model locally after the model is obtained through training. If the two are different, the execution subject of the training step can send the structure information and parameter values of the trained answer credibility model to the execution subject of the method for outputting information after the model is obtained through training. It is understood that the execution subject of this training step may be the same as or different from that of the training step of step 202 in the previous embodiment, which is not limited herein.
It should be noted that the second coding layer, the second interaction layer based on the attention mechanism, and the second hidden layer in the answer confidence model may be trained separately, or may be trained simultaneously as a whole, which is not limited in this embodiment.
Step 405, determining the answer correct probability based on the no answer probability and the answer credibility probability.
In this embodiment, the answer correctness probability can be used to represent the probability that the answer text is used as the correct answer for the query text.
The execution subject may determine the answer correct probability based on the no-answer probability obtained in step 402 and the answer credibility probability obtained in step 404, using any of various evaluation methods that combine the two. For example, the execution subject may first calculate the difference between the number 1 and the no-answer probability, and then calculate the average of this difference and the answer credibility probability, so as to obtain the answer correct probability. As another example, the execution subject may first determine whether the no-answer probability is less than a preset threshold (e.g., 0.1); in response to determining that it is less than the preset threshold, the execution subject may directly determine the answer credibility probability as the answer correct probability. Typically, the threshold is set to a small value. It can be understood that when the no-answer probability is below such a small threshold, there is high confidence that an answer matching the query text can be extracted from the target text. Therefore, the answer credibility probability, which characterizes the degree of matching between the query text, the target text, and the answer text, may be determined as the answer correct probability.
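The two evaluation strategies described above can be combined into one hedged sketch; the 0.1 threshold and the averaging rule follow the examples in the text, while the function name is illustrative:

```python
def answer_correct_probability(no_answer_prob, credibility_prob,
                               threshold=0.1):
    """Combine the no-answer probability and answer credibility
    probability: below the no-answer threshold, trust the credibility
    probability directly; otherwise average it with (1 - no_answer)."""
    if no_answer_prob < threshold:
        return credibility_prob
    return ((1.0 - no_answer_prob) + credibility_prob) / 2.0
```

So an answer with no-answer probability 0.05 keeps its credibility of 0.9 unchanged, while one with no-answer probability 0.4 and credibility 0.8 is discounted to 0.7.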
And step 406, outputting the answer correct probability.
In this embodiment, after obtaining the answer correct probability from step 405, the executing entity may output the answer correct probability. Where the output may take a variety of forms. As an example, the execution subject may output the answer correct probability to a display device, such as a display, of the communication connection. Therefore, the answer correct probability obtained according to the steps can be presented. As yet another example, the execution main body may further output the answer correct probability to a storage medium of the communication connection, such as a hard disk. Therefore, the answer correct probability obtained according to the steps can be stored for subsequent use.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the process 400 of the method for outputting information in this embodiment embodies a step of inputting the query text, the target text, and the answer text into a pre-trained answer credibility model to obtain an answer credibility probability, and a step of determining and outputting an answer correct probability based on the no-answer probability and the answer credibility probability. Therefore, the scheme described in the embodiment can comprehensively consider the matching degree between the obtained answer text and the query text and the target text, thereby realizing the objective evaluation on the credibility of the answer text.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for outputting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 5, the apparatus 500 for outputting information provided by the present embodiment includes an acquisition unit 501, a first determination unit 502, and a first output unit 503. The acquiring unit 501 is configured to acquire a query text and a target text; a first determining unit 502 configured to input the question text and the target text into a pre-trained answer extraction model, and obtain an answer text corresponding to the question text and a probability of no answer, where the answer extraction model is used to represent a correspondence between the question text and the target text and the answer text and the probability of no answer, and the probability of no answer is used to represent a probability that an answer matching the question text cannot be extracted from the target text; a first output unit 503 configured to output the answer text and the no answer probability.
In the present embodiment, in the apparatus 500 for outputting information: the specific processing of the obtaining unit 501, the first determining unit 502, and the first output unit 503 and the technical effects thereof can refer to the related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the answer extraction model may include a first coding layer, a first interaction layer based on an attention mechanism, a recurrent neural network, a neural network output layer, a first hidden layer, and a first hidden layer output layer; and the first determining unit 502 may be further configured to: inputting the query text and the target text into a first coding layer to obtain a first query text vector and a first target text vector; inputting the first query text vector and the first target text vector to a first interaction layer based on an attention mechanism to obtain a first output matrix; inputting the first output matrix into a recurrent neural network to obtain an output vector; inputting the output vector to a neural network output layer to obtain the position of an answer in a target text; inputting the first output matrix and the output vector into a first hidden layer to obtain a first probability vector; inputting the first probability vector to a first hidden layer output layer to obtain the probability of no answer; and generating an answer text according to the position of the answer in the target text.
In some optional implementations of the present embodiment, the apparatus 500 for outputting information may further include a second determining unit (not shown in the figure) and a second outputting unit (not shown in the figure). The second determining unit may be configured to input the question text, the target text, and the answer text into a pre-trained answer confidence model to obtain an answer confidence probability, where the answer confidence model may be used to represent a correspondence between the question text, the target text, the answer text, and the answer confidence probability may be used to represent a matching degree between the answer text, the question text, and the target text; the second output unit may be configured to output the answer confidence probability.
In some optional implementations of this embodiment, the answer confidence model may include a second encoding layer, a second interaction layer based on an attention mechanism, a second hidden layer, and a second hidden layer output layer; and the second determination unit may be further configured to: inputting the query text, the target text and the answer text into a second coding layer to obtain a second query text vector, a second target text vector and an answer text vector; inputting the second query text vector, the second target text vector and the answer text vector to a second interaction layer based on an attention mechanism to obtain a second output matrix; inputting the second output matrix into a second hidden layer to obtain a second probability vector; and inputting the second probability vector to a second hidden layer output layer to obtain the answer credibility probability.
In some optional implementations of the present embodiment, the apparatus 500 for outputting information may further include a third determining unit (not shown in the figure) and a third outputting unit (not shown in the figure). The third determining unit may be configured to determine a correct answer probability based on the no-answer probability and the answer confidence probability, where the correct answer probability may be used to represent an accuracy degree of an answer text as an answer to the question text; the third output unit may be configured to output the answer correct probability.
In the apparatus provided in the above embodiment of the present application, first, the query text and the target text are acquired by the acquiring unit 501; then, the first determining unit 502 inputs the query text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the query text and a no-answer probability, where the answer extraction model is used to represent the correspondence between the query text and target text on one hand and the answer text and no-answer probability on the other, and the no-answer probability is used to represent the probability that an answer matching the query text cannot be extracted from the target text; finally, the first output unit 503 outputs the answer text and the no-answer probability. In this way, the answer text corresponding to the query text, together with its reliability, is output according to the query text and the target text.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Liquid Crystal Display (LCD); a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquisition unit, a first determination unit, and a first output unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a question text and a target text".
As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a question text and a target text; input the question text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the question text and a no-answer probability, wherein the answer extraction model characterizes the correspondence between the question text and the target text on the one hand and the answer text and the no-answer probability on the other, and the no-answer probability characterizes the probability that an answer matching the question text cannot be extracted from the target text; and output the answer text and the no-answer probability.
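The flow the carried program performs — acquire a question text and a target text, run answer extraction, and output the answer text together with a no-answer probability — can be sketched as follows. The keyword-overlap scorer below is a deliberately simple stand-in for the pre-trained neural answer extraction model; only the interface (inputs and outputs) mirrors the description, and every name is illustrative.

```python
# Toy stand-in for the pre-trained answer extraction model described above.
# It picks the target sentence with the most question-token overlap and
# derives a no-answer probability from the overlap count.

def extract_answer(question: str, target: str) -> tuple[str, float]:
    """Return (answer_text, no_answer_probability) for a question/target pair."""
    q_tokens = set(question.lower().split())
    best_sentence, best_overlap = "", 0
    for sentence in target.split("."):
        overlap = len(q_tokens & set(sentence.lower().split()))
        if overlap > best_overlap:
            best_sentence, best_overlap = sentence.strip(), overlap
    # No overlapping tokens at all -> high probability that no answer exists.
    no_answer_prob = 1.0 if best_overlap == 0 else 1.0 / (1.0 + best_overlap)
    return best_sentence, no_answer_prob

question = "Where was the device tested"
target = "The prototype was built in 2018. The device was tested in Beijing."
answer, p_no_answer = extract_answer(question, target)
print(answer, p_no_answer)
```

A real implementation would replace `extract_answer` with the trained model of claim 2; the two returned values are exactly what the first output unit emits.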
The above description presents only preferred embodiments of the application and is illustrative of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention; for example, the above features may be replaced by (but are not limited to) technical features with similar functions disclosed in the present application.

Claims (12)

1. A method for outputting information, comprising:
acquiring a question text and a target text;
inputting the question text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the question text and a no-answer probability, wherein the answer extraction model characterizes the correspondence between the question text and the target text on the one hand and the answer text and the no-answer probability on the other, and the no-answer probability characterizes the probability that an answer matching the question text cannot be extracted from the target text;
and outputting the answer text and the no-answer probability.
2. The method of claim 1, wherein the answer extraction model comprises a first coding layer, a first interaction layer based on an attention mechanism, a recurrent neural network, a neural network output layer, a first hidden layer, and a first hidden layer output layer; and
the inputting the question text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the question text and a no-answer probability comprises:
inputting the question text and the target text into the first coding layer to obtain a first question text vector and a first target text vector;
inputting the first question text vector and the first target text vector into the first interaction layer based on the attention mechanism to obtain a first output matrix;
inputting the first output matrix into the recurrent neural network to obtain an output vector;
inputting the output vector into the neural network output layer to obtain the position of an answer in the target text;
inputting the first output matrix and the output vector into the first hidden layer to obtain a first probability vector;
inputting the first probability vector into the first hidden layer output layer to obtain the no-answer probability; and
generating the answer text according to the position of the answer in the target text.
3. The method of claim 1, wherein the method further comprises:
inputting the question text, the target text and the answer text into a pre-trained answer credibility model to obtain an answer credibility probability, wherein the answer credibility model characterizes the correspondence among the question text, the target text, the answer text and the answer credibility probability, and the answer credibility probability characterizes the degree of matching among the answer text, the question text and the target text;
and outputting the answer credibility probability.
4. The method of claim 3, wherein the answer credibility model comprises a second coding layer, a second interaction layer based on an attention mechanism, a second hidden layer, and a second hidden layer output layer; and
the inputting the question text, the target text and the answer text into a pre-trained answer credibility model to obtain the answer credibility probability comprises:
inputting the question text, the target text and the answer text into the second coding layer to obtain a second question text vector, a second target text vector and an answer text vector;
inputting the second question text vector, the second target text vector and the answer text vector into the second interaction layer based on the attention mechanism to obtain a second output matrix;
inputting the second output matrix to the second hidden layer to obtain a second probability vector;
and inputting the second probability vector to the second hidden layer output layer to obtain the answer credibility probability.
5. The method of claim 3 or 4, wherein the method further comprises:
determining an answer correct probability based on the no-answer probability and the answer credibility probability, wherein the answer correct probability characterizes the degree to which the answer text is an accurate answer to the question text;
and outputting the answer correct probability.
6. An apparatus for outputting information, comprising:
an acquisition unit configured to acquire a question text and a target text;
a first determining unit configured to input the question text and the target text into a pre-trained answer extraction model to obtain an answer text corresponding to the question text and a no-answer probability, wherein the answer extraction model characterizes the correspondence between the question text and the target text on the one hand and the answer text and the no-answer probability on the other, and the no-answer probability characterizes the probability that an answer matching the question text cannot be extracted from the target text;
a first output unit configured to output the answer text and the no-answer probability.
7. The apparatus of claim 6, wherein the answer extraction model comprises a first coding layer, a first interaction layer based on an attention mechanism, a recurrent neural network, a neural network output layer, a first hidden layer, and a first hidden layer output layer;
the first determination unit is further configured to:
inputting the question text and the target text into the first coding layer to obtain a first question text vector and a first target text vector;
inputting the first question text vector and the first target text vector into the first interaction layer based on the attention mechanism to obtain a first output matrix;
inputting the first output matrix into the recurrent neural network to obtain an output vector;
inputting the output vector into the neural network output layer to obtain the position of an answer in the target text;
inputting the first output matrix and the output vector into the first hidden layer to obtain a first probability vector;
inputting the first probability vector into the first hidden layer output layer to obtain the no-answer probability; and
generating the answer text according to the position of the answer in the target text.
8. The apparatus of claim 6, wherein the apparatus further comprises:
a second determining unit configured to input the question text, the target text and the answer text into a pre-trained answer credibility model to obtain an answer credibility probability, wherein the answer credibility model characterizes the correspondence among the question text, the target text, the answer text and the answer credibility probability, and the answer credibility probability characterizes the degree of matching among the answer text, the question text and the target text;
a second output unit configured to output the answer credibility probability.
9. The apparatus of claim 8, wherein the answer credibility model comprises a second coding layer, a second interaction layer based on an attention mechanism, a second hidden layer, and a second hidden layer output layer;
the second determination unit is further configured to:
inputting the question text, the target text and the answer text into the second coding layer to obtain a second question text vector, a second target text vector and an answer text vector;
inputting the second question text vector, the second target text vector and the answer text vector into the second interaction layer based on the attention mechanism to obtain a second output matrix;
inputting the second output matrix to the second hidden layer to obtain a second probability vector;
and inputting the second probability vector to the second hidden layer output layer to obtain the answer credibility probability.
10. The apparatus of claim 8 or 9, wherein the apparatus further comprises:
a third determining unit configured to determine an answer correct probability based on the no-answer probability and the answer credibility probability, wherein the answer correct probability characterizes the degree to which the answer text is an accurate answer to the question text;
a third output unit configured to output the answer correct probability.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
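For illustration only, the layer structure recited in claims 2 and 5 can be sketched in numpy with random, untrained weights. The hidden size, the pooling into the first hidden layer, the single-token answer span, and the product combining the two probabilities are all assumptions not fixed by the claims; a real implementation would use trained parameters and typically predict start and end positions.

```python
import zlib

import numpy as np

d = 8  # hidden size; purely illustrative

def encode(tokens):
    """First coding layer (stand-in): deterministic pseudo-random embeddings."""
    return np.stack([
        np.random.default_rng(zlib.crc32(t.encode())).standard_normal(d)
        for t in tokens
    ])

def interact(Q, T):
    """Attention-based interaction layer: each target token attends over the question."""
    scores = T @ Q.T                                   # (len_t, len_q) similarities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # row-wise softmax
    return np.concatenate([T, w @ Q], axis=1)          # "first output matrix", (len_t, 2d)

def rnn(M, W):
    """Recurrent neural network over the interaction matrix; one state per target token."""
    h, states = np.zeros(d), []
    for row in M:
        h = np.tanh(W["in"] @ row + W["rec"] @ h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(42)                        # untrained, random weights
W = {"in": 0.1 * rng.standard_normal((d, 2 * d)),
     "rec": 0.1 * rng.standard_normal((d, d)),
     "pos": 0.1 * rng.standard_normal(d),              # neural network output layer
     "hid": 0.1 * rng.standard_normal((d, 3 * d)),     # first hidden layer
     "out": 0.1 * rng.standard_normal(d)}              # first hidden layer output layer

question = "where was it tested".split()
target = "the device was tested in beijing".split()

M = interact(encode(question), encode(target))         # first output matrix
S = rnn(M, W)                                          # output vectors
pos = int(np.argmax(S @ W["pos"]))                     # position of the answer in the target
hidden = np.tanh(W["hid"] @ np.concatenate([M.mean(axis=0), S.mean(axis=0)]))
no_answer_prob = 1.0 / (1.0 + np.exp(-(W["out"] @ hidden)))  # sigmoid
answer_text = target[pos]                              # single-token answer, for brevity

# Claim 5 only says the answer correct probability is determined "based on" the
# two probabilities; the product below is one plausible, assumed combination.
credibility_prob = 0.9                                 # placeholder for the credibility model
answer_correct_prob = (1.0 - no_answer_prob) * credibility_prob
print(answer_text, float(no_answer_prob), float(answer_correct_prob))
```

The shapes trace the claim language directly: the interaction layer yields the first output matrix, the recurrent network yields one output vector per target token, one head scores answer positions, and a separate hidden layer plus sigmoid yields the no-answer probability.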
CN201811524304.2A 2018-12-13 2018-12-13 Method and apparatus for outputting information Active CN109766418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811524304.2A CN109766418B (en) 2018-12-13 2018-12-13 Method and apparatus for outputting information


Publications (2)

Publication Number Publication Date
CN109766418A CN109766418A (en) 2019-05-17
CN109766418B (en) 2021-08-24





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant