CN112434149B - Information extraction method, information extraction device, information extraction equipment and storage medium - Google Patents


Info

Publication number
CN112434149B
CN112434149B (application CN202011419456.3A)
Authority
CN
China
Prior art keywords
vector
paragraph
unselected
paragraph vector
matching score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011419456.3A
Other languages
Chinese (zh)
Other versions
CN112434149A (en)
Inventor
李长亮
白金国
唐剑波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd filed Critical Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN202011419456.3A priority Critical patent/CN112434149B/en
Publication of CN112434149A publication Critical patent/CN112434149A/en
Application granted granted Critical
Publication of CN112434149B publication Critical patent/CN112434149B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an information extraction method and apparatus, a computing device, and a computer-readable storage medium. The information extraction method comprises the following steps: inputting the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into an extraction network to obtain a matching score for each unselected paragraph vector; inputting the unselected paragraph vector with the highest matching score and the selected paragraph vector into a fusion network to obtain an updated selected paragraph vector; and, when the updated selected paragraph vector satisfies the extraction condition, generating evidence chain information according to the question vector and the updated selected paragraph vector. The information extraction method can improve the accuracy of evidence chain information extraction and its applicability to varied questions, thereby improving the accuracy of answer prediction.

Description

Information extraction method, information extraction device, information extraction equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an information extraction method and apparatus, a computing device, and a computer readable storage medium.
Background
In a multi-hop question-answering task, where machines perform reading comprehension, a question often requires an inference chain built from information in multiple paragraphs, and no single relevant paragraph is sufficient to answer it. Therefore, to answer a question correctly, it is generally necessary to filter a given set of relevant and distractor paragraphs to obtain the information related to the question, and to concatenate the obtained information to produce the answer corresponding to the question.
In the prior art, information extraction generally proceeds by first matching the question against the paragraphs to obtain directly related paragraphs; then matching the obtained paragraphs against the remainder to obtain paragraphs related to them; and finally constructing evidence chain information from the acquired paragraphs and inputting it into a model for answer prediction.
When screening for evidence chain information, the prior art first coarsely screens paragraphs to obtain several paragraphs possibly related to the question, and then predicts the answer from those paragraphs. However, this approach suffers from the following drawbacks and deficiencies:
First, the coarse screening uses a keyword-matching mechanism; although it can select paragraphs whose keywords match the question, it may miss paragraphs that contain an answer only implicitly.
Second, each screening round considers only the immediately preceding paragraph, not all selected paragraphs together with the question, so the screened paragraphs may be irrelevant to the answer.
Third, the number of screening rounds and the number of paragraphs selected per round are fixed, so the evidence chain information has a fixed length.
Therefore, a new information extraction method is needed to improve the accuracy of evidence chain information extraction and applicability to questions, thereby improving the accuracy of answer prediction.
Disclosure of Invention
In view of the above, embodiments of the present application provide an information extraction method and apparatus, a computing device, and a computer readable storage medium, so as to solve the technical drawbacks in the prior art.
The embodiment of the application discloses an information extraction method, which comprises the following steps: inputting the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into an extraction network to obtain a matching score for each unselected paragraph vector; inputting the corresponding unselected paragraph vector and the selected paragraph vector into a fusion network according to the matching score to obtain an updated selected paragraph vector; and, when the updated selected paragraph vector satisfies the extraction condition, generating evidence chain information according to the question vector and the updated selected paragraph vector.
Optionally, the method further comprises: when the updated selected paragraph vector does not satisfy the extraction condition, inputting the question vector, the updated selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network.
Optionally, the unselected paragraph vector includes a terminator;
determining that the updated selected paragraph vector satisfies the extraction condition includes:
determining that the updated selected paragraph vector includes the terminator.
Optionally, inputting the unselected paragraph vector with the highest matching score and the selected paragraph vector into a fusion network to obtain an updated selected paragraph vector includes:
concatenating the unselected paragraph vector with the highest matching score with the selected paragraph vector to obtain a concatenated vector;
and inputting the concatenated vector into the fusion network to obtain the updated selected paragraph vector output by the fusion network.
Optionally, the method further comprises: and inputting the evidence chain information into an answer prediction model, and determining an answer vector corresponding to the question vector according to the updated selected paragraph vector.
Optionally, the training process of extracting the network includes:
inputting a question vector sample, a selected paragraph vector sample, and at least one unselected paragraph vector sample in a paragraph vector sample set into the extraction network to obtain the matching score of each unselected paragraph vector sample output by the extraction network;
and adjusting parameters of the extraction network until the pre-designated unselected paragraph vector sample has the highest matching score.
Optionally, the test procedure of the extraction network includes:
Inputting the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network to obtain the matching score of each unselected paragraph vector;
and under the condition that the unselected paragraph vector with the highest matching score does not meet a preset condition, respectively combining the unselected paragraph vector with the highest matching score with each original paragraph vector in the paragraph vector set to update the paragraph vector set.
Optionally, the method further comprises: when the unselected paragraph vector with the highest matching score satisfies the preset condition, taking the unselected paragraph vector with the highest matching score as the final test result.
Optionally, the preset condition includes: the length of the unselected paragraph vector with the highest matching score is smaller than a preset length; or,
the paragraph vector set includes a terminator in its initial state, and the preset condition includes: the unselected paragraph vector with the highest matching score includes the terminator.
The embodiment of the application also discloses an information extraction device, which comprises: an input module configured to input the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network to obtain a matching score for each unselected paragraph vector; a fusion module configured to input the corresponding unselected paragraph vector and the selected paragraph vector into a fusion network according to the matching score to obtain an updated selected paragraph vector; and a generation module configured to generate evidence chain information according to the question vector and the updated selected paragraph vector when the updated selected paragraph vector satisfies the extraction condition.
The embodiment of the application also discloses a computing device which comprises a memory, a processor and computer instructions stored on the memory and capable of running on the processor, wherein the processor executes the instructions to realize the steps of the information extraction method.
The embodiment of the application also discloses a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the information extraction method as described above.
In the information extraction method and apparatus, computing device, and computer-readable storage medium provided by the application, the selected paragraph vector is continuously updated during the paragraph screening process performed by the extraction network, so each screening round in fact references all previous screening results; this effectively solves the prior-art problem of missing paragraphs with implicit answers. Furthermore, the paragraph screening process performed by the extraction network can adjust its number of iterations according to whether the question can actually be answered, and can generate evidence chain information of different lengths for different questions, effectively ensuring that the evidence chain information required for the answer is extracted and thereby improving the accuracy of answer prediction.
Drawings
FIG. 1 is a block diagram of a computing device of an embodiment of the application.
Fig. 2 is a flow chart of an information extraction method according to an embodiment of the present application.
Fig. 3 is a schematic structural view of the multi-layered perceptron.
Fig. 4 is a schematic diagram of the structure of a recurrent neural network.
Fig. 5 is a flow chart of an information extraction method according to an embodiment of the present application.
Fig. 6 is a flow chart of an information extraction method according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an information extraction device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present application may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present application is not limited to the specific embodiments disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "responsive to a determination" depending on the context.
First, terms related to one or more embodiments of the present application will be explained.
Question vector: generally, a vector representation obtained by feature extraction from a question posed by a user. Evidence chain information is generated based on the question vector and serves as reference data for obtaining the predicted answer corresponding to the question.
Paragraph vector: a vector obtained by feature extraction from a text paragraph; it serves as reference data for extracting evidence chain information for a question.
Extraction network: a neural network, established through a training process, that scores the matching degree of the unselected paragraph vectors in the paragraph vector set based on the question vector and the selected paragraph vector.
Fusion network: a neural network that fuses the unselected paragraph vector with the highest matching score with the selected paragraph vector to update the selected paragraph vector.
Evidence chain information: formatted data generated from the content of the question and the selected paragraph vector, used as reference data for answer prediction.
Terminator: a character in the paragraph vector set that identifies that the paragraph extraction process can end.
Multilayer perceptron (Multilayer Perceptron, MLP): a neural network architecture that, in addition to its input and output layers, may have multiple hidden layers in between.
Recurrent neural network (Recurrent Neural Network, RNN): a class of recursive neural networks (Recursive Neural Network) that takes sequence (Sequence) data as input, performs recursion along the evolution direction of the sequence, and connects all nodes (recurrent units) in a chain.
Answer prediction model: a neural network model that obtains the predicted answer to the user's question based on the evidence chain information.
In the present application, an information extraction method is provided. The present specification relates to an information extraction apparatus, a computing device, and a computer-readable storage medium, and is described in detail in the following embodiments.
Fig. 1 is a block diagram illustrating a configuration of a computing device 100 according to an embodiment of the present description. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. Processor 120 is coupled to memory 110 via bus 130 and database 150 is used to store data.
Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 140 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 100, as well as other components not shown in FIG. 1, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 1 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
Wherein the processor 120 may perform the steps of the method shown in fig. 2. Fig. 2 is a schematic flow chart illustrating an information extraction method according to an embodiment of the present specification, including steps 202 to 206.
Step 202: and inputting the problem vector, the selected paragraph vector and at least one unselected paragraph vector in the paragraph vector set into an extraction network to obtain the matching score of each unselected paragraph vector.
In particular, the question vector and the paragraph vectors in the paragraph vector set may be obtained by a text encoder. For example, the text content of the question and the paragraphs may be input into a text encoder, which vectorizes them to obtain the question vector and the paragraph vectors. It should be noted that the embodiment of the present disclosure does not limit the type of text encoder.
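As an illustration only, the encoding step can be sketched with a trivial stand-in encoder. The function `encode`, its hashed bag-of-words scheme, and all dimensions here are hypothetical; a real system would use a trained neural text encoder.

```python
import numpy as np

DIM = 8  # hypothetical embedding dimension

def encode(text, dim=DIM):
    """Toy stand-in for a text encoder: hashed bag-of-words, L2-normalized.
    A real system would use a trained neural encoder instead."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0   # bucket each token into one dimension
    n = np.linalg.norm(v)
    return v / n if n else v

# Vectorize a question and two candidate paragraphs.
q_vec = encode("who wrote the novel referenced in the film")
p_vecs = [encode("The film adapts a 1937 novel."),
          encode("The novel was written by an unnamed author.")]
```

The resulting `q_vec` and `p_vecs` play the roles of the question vector and the paragraph vector set in the steps that follow.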
The extraction network may be established through a training process based on a pre-existing sample set and may be implemented using a multi-layer perceptron. A multi-layer perceptron is a feed-forward artificial neural network model that maps sets of input data onto a set of outputs. Fig. 3 is a schematic structural diagram of the multi-layer perceptron. As shown in Fig. 3, the multi-layer perceptron introduces one or more hidden layers on top of a single-layer neural network; the hidden layer shown in Fig. 3 has a total of 5 hidden units. Further, since the input layer involves no computation, the multi-layer perceptron shown in Fig. 3 has 2 layers. It should be noted that the hidden layer and the output layer in the multi-layer perceptron of Fig. 3 are both fully connected layers.
For a multi-layer perceptron with a single hidden layer of h hidden units, denote the hidden layer's output by H. Since the hidden layer and the output layer in the multi-layer perceptron of Fig. 3 are both fully connected layers, let the weight and bias parameters of the hidden layer be W_h and b_h, and those of the output layer be W_o and b_o, giving the relationships among the input X, the hidden-layer output H, and the output O of this single-hidden-layer network:

H = X W_h + b_h, and
O = H W_o + b_o.

Combining the two equations gives the relationship between input and output:

O = (X W_h + b_h) W_o + b_o = X W_h W_o + (b_h W_o + b_o).
As can readily be seen, such a network is equivalent to a single-layer neural network even though it incorporates hidden layers. The root cause is the fully connected layer, which applies only an affine transformation to the data, and a composition of affine transformations is still an affine transformation. To solve this problem, a nonlinear transformation, i.e. an activation function, is introduced.
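The collapse of stacked affine layers described above can be checked numerically. This is a minimal sketch with arbitrary toy dimensions (input 4, hidden 5 as in Fig. 3, output 1); it verifies that the two-layer affine network equals a single merged affine layer, and shows where an activation such as ReLU would break that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: batch of 3 inputs of size 4, hidden layer of 5 units, output 1.
X = rng.normal(size=(3, 4))
W_h, b_h = rng.normal(size=(4, 5)), rng.normal(size=5)
W_o, b_o = rng.normal(size=(5, 1)), rng.normal(size=1)

# Two stacked affine layers with no activation...
O_stacked = (X @ W_h + b_h) @ W_o + b_o
# ...equal a single affine layer with merged weight W_h W_o and bias b_h W_o + b_o.
O_merged = X @ (W_h @ W_o) + (b_h @ W_o + b_o)
assert np.allclose(O_stacked, O_merged)

# Inserting a nonlinear activation (here ReLU) between the layers removes
# the equivalence and gives the network genuine extra expressive power.
relu = lambda z: np.maximum(z, 0.0)
O_mlp = relu(X @ W_h + b_h) @ W_o + b_o
```

The assertion holds for any choice of X, W_h, b_h, W_o, b_o, which is exactly why the activation function is needed.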
Further, the output of the extraction network is used to evaluate the degree of matching among the question vector, the selected paragraph vector, and each unselected paragraph vector, giving a matching score for each unselected paragraph vector; the matching score is used to screen the unselected paragraph vectors that match the question vector and the selected paragraph vector. It should be noted that the embodiment of the present disclosure does not limit the type or structure of the extraction network.
In one embodiment of the present disclosure, the matching score of an unselected paragraph vector can be obtained by a learnable function f(q, p, p_i). For example, the question vector q, the selected paragraph vector p, and an unselected paragraph vector p_i in the paragraph vector set may be input into f(q, p, p_i) to obtain the matching score of that unselected paragraph vector. Here, the paragraph vector set is [p1, p2, p3, p4, EOE], where EOE (End Of Evidence) is the terminator in the paragraph vector set.
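The scoring step can be sketched as follows. The concrete form of `f` below (a small MLP over the concatenated question, selected, and candidate vectors) and all parameter values are illustrative assumptions, not the patent's actual learned function; in practice the parameters come from training.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8  # hypothetical vector dimension

def f(q, p, p_i, W1, b1, w2):
    """Hypothetical stand-in for the learnable scoring function f(q, p, p_i):
    a small MLP over the concatenated question, selected, and candidate vectors."""
    x = np.concatenate([q, p, p_i])
    h = np.tanh(x @ W1 + b1)          # hidden layer with nonlinear activation
    return float(h @ w2)              # scalar matching score

# Toy parameters and vectors (in a real system these are trained/encoded).
W1, b1 = rng.normal(size=(3 * DIM, 16)), rng.normal(size=16)
w2 = rng.normal(size=16)
q = rng.normal(size=DIM)              # question vector
p = rng.normal(size=DIM)              # selected paragraph vector
candidates = {name: rng.normal(size=DIM)
              for name in ["p1", "p2", "p3", "p4", "EOE"]}

# Score every unselected candidate (including the EOE terminator) and
# pick the one with the highest matching score.
scores = {name: f(q, p, v, W1, b1, w2) for name, v in candidates.items()}
best = max(scores, key=scores.get)
```

Here `best` names the unselected paragraph vector that the extraction network would pass on to the fusion network in step 204.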
Step 204: and inputting the corresponding unselected section vector and the selected section vector into a fusion network according to the matching score to obtain the updated selected section vector.
Specifically, taking the function f(q, p, p_i) above as an example, assuming the score of p3 is highest, the paragraph vector screened out by the extraction network is p3; the selected paragraph vector p and p3 are then fused to generate the updated selected paragraph vector.
Here, the fusion network may be implemented using a recurrent neural network, i.e. a network that takes sequence data as input, recurses along the evolution direction of the sequence, and connects all nodes in a chain. Fig. 4 is a schematic diagram of the RNN structure. As shown in Fig. 4, it includes inputs and outputs at three times t-1, t, and t+1: the input X_t at time t, weighted by U, combines with the state memory at time t-1, weighted by W, to form the state S_t at time t; S_t is passed to the next time step through the weight W as part of the memory, and is also output to the user after the action of the weight V. It should be noted that the embodiment of the present disclosure does not limit the type or structure of the fusion network.
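The recurrence in Fig. 4 can be sketched as follows; the weight names U, W, V match the roles described above, while the dimensions, the tanh activation, and the toy values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
IN, STATE, OUT = 4, 6, 3  # hypothetical input, state, and output sizes

# Toy weights in the roles of Fig. 4: U acts on the input, W on the
# state memory, and V produces the output.
U = rng.normal(size=(STATE, IN))
W = rng.normal(size=(STATE, STATE))
V = rng.normal(size=(OUT, STATE))

def rnn_step(x_t, s_prev):
    """One recurrence step: the input at time t and the state at time t-1
    jointly form the state at time t, which also yields the output at time t."""
    s_t = np.tanh(U @ x_t + W @ s_prev)
    o_t = V @ s_t
    return s_t, o_t

# Run a length-3 input sequence through the chain of recurrent units.
s = np.zeros(STATE)
outputs = []
for x_t in rng.normal(size=(3, IN)):
    s, o = rnn_step(x_t, s)
    outputs.append(o)
```

Because the same weights U, W, V are reused at every step, the state `s` carries information from all earlier inputs, which is what lets the fusion network accumulate previously selected paragraphs.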
Step 206: and under the condition that the updated selected paragraph vector meets the extraction condition, generating evidence chain information according to the problem vector and the updated selected paragraph vector.
Specifically, the updated selected paragraph vector is obtained by updating the selected paragraph vector, so the selected paragraph vector p input into f(q, p, p_i) in the next iteration includes not only the content of the newly screened paragraph vector but also the content of all previously selected paragraph vectors.
In one embodiment of the present disclosure, the unselected paragraph vectors include a terminator; determining that the updated selected paragraph vector satisfies the extraction condition includes: determining that the updated selected paragraph vector includes the terminator.
In an embodiment of the present disclosure, inputting the unselected paragraph vector with the highest matching score and the selected paragraph vector into a fusion network to obtain an updated selected paragraph vector includes: concatenating the unselected paragraph vector with the highest matching score with the selected paragraph vector to obtain a concatenated vector; and inputting the concatenated vector into the fusion network to obtain the updated selected paragraph vector output by the fusion network.
Specifically, the fusion network is used for fusing the paragraph vectors screened by the extraction network and the selected paragraph vectors, so that the updated selected paragraph vectors not only comprise the content of the screened paragraph vectors, but also comprise the content of the selected paragraph vectors.
Here, the fusion network may be implemented using a recurrent neural network, which takes sequence data as input, recurses along the evolution direction of the sequence, and connects all nodes in a chain. It should be noted that the embodiment of the present disclosure does not limit the type or structure of the fusion network.
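The concatenate-then-fuse step can be sketched as follows. The single affine-plus-tanh fusion layer here is a deliberately simplified stand-in for the fusion RNN, and all dimensions and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 8  # hypothetical paragraph-vector dimension

p_selected = rng.normal(size=DIM)   # current selected paragraph vector
p_best = rng.normal(size=DIM)       # unselected vector with highest score

# Step 1: concatenate the highest-scoring unselected vector with the
# selected vector to obtain the concatenated vector.
cascade = np.concatenate([p_selected, p_best])       # shape (2*DIM,)

# Step 2: a hypothetical fusion step (stand-in for the fusion RNN) maps the
# concatenated vector back to the paragraph-vector dimension, so the result
# can serve as the updated selected paragraph vector in the next iteration.
W_fuse = rng.normal(size=(DIM, 2 * DIM))
b_fuse = rng.normal(size=DIM)
p_updated = np.tanh(W_fuse @ cascade + b_fuse)
```

Keeping the updated vector at the same dimension as a paragraph vector is what allows the loop to feed it straight back into f(q, p, p_i).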
Fig. 5 is a schematic flow chart illustrating an information extraction method according to an embodiment of the present specification, including steps 502 to 508.
Step 502: and inputting the problem vector, the selected paragraph vector and at least one unselected paragraph vector in the paragraph vector set into an extraction network to obtain the matching score of each unselected paragraph vector.
Step 504: and inputting the unselected section vector with the highest matching score and the selected section vector into a fusion network to obtain the updated selected section vector.
Step 506: and under the condition that the updated selected paragraph vector meets the extraction condition, generating evidence chain information according to the problem vector and the updated selected paragraph vector.
Step 508: and under the condition that the updated selected paragraph vector does not meet the extraction condition, inputting the problem vector, the updated selected paragraph vector and at least one unselected paragraph vector in the paragraph vector set into an extraction network.
Specifically, when the updated selected paragraph vector does not satisfy the extraction condition, the paragraph extraction process has not yet ended, and the paragraph vector set may still contain unselected paragraph vectors holding valuable information. As described above, whether the extraction condition is satisfied can be determined by checking whether the selected paragraph vector includes the terminator; if it does, the extraction condition is regarded as satisfied. If it does not, the question vector, the selected paragraph vector, and the remaining unselected paragraph vectors in the paragraph vector set may be input into the extraction network, and the paragraph screening process re-executed by returning to step 502. Since the selected paragraph vector was updated in step 504, in the next iteration a new unselected paragraph vector may be screened from the remaining candidates and fused with the selected paragraph vector to obtain an updated selected paragraph vector, and new evidence chain information is generated from the updated selected paragraph vector and the question vector.
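The whole loop of steps 502 to 508 can be sketched end to end. The `score` and `fuse` functions below are deliberately trivial stand-ins for the trained extraction and fusion networks, and all vector values are toy assumptions; only the control flow (score candidates, fuse the best one, stop when the EOE terminator is selected) mirrors the method described above.

```python
import numpy as np

rng = np.random.default_rng(4)
DIM = 8
EOE = "EOE"  # terminator treated as one more candidate paragraph

def score(q, p, cand):
    """Hypothetical stand-in for the extraction network's matching score."""
    return float(q @ cand + p @ cand)

def fuse(p, cand):
    """Hypothetical stand-in for the fusion network: blend the two vectors."""
    return np.tanh(p + cand)

q = rng.normal(size=DIM)                                 # question vector
pool = {f"p{i}": rng.normal(size=DIM) for i in range(1, 5)}
pool[EOE] = rng.normal(size=DIM)

p_sel = np.zeros(DIM)   # selected paragraph vector, initially empty
chain = []              # names of the paragraphs selected so far
while pool:
    best = max(pool, key=lambda k: score(q, p_sel, pool[k]))
    if best == EOE:     # extraction condition met: terminator was chosen
        break
    p_sel = fuse(p_sel, pool.pop(best))
    chain.append(best)

# Evidence chain information is generated from q and the final p_sel.
evidence_chain = (q, p_sel, chain)
```

Note that the loop has no fixed iteration count: it runs until the terminator scores highest, so the evidence chain length adapts to the question, which is exactly the flexibility claimed over the prior art.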
Fig. 6 is a schematic flow chart illustrating an information extraction method according to an embodiment of the present specification, including steps 602 to 612.
Step 602: and inputting the problem vector, the selected paragraph vector and at least one unselected paragraph vector in the paragraph vector set into an extraction network to obtain the matching score of each unselected paragraph vector.
Step 604: and inputting the unselected section vector with the highest matching score and the selected section vector into a fusion network to obtain the updated selected section vector.
Step 606: and under the condition that the updated selected paragraph vector meets the extraction condition, generating evidence chain information according to the problem vector and the updated selected paragraph vector.
Step 608: and inputting the evidence chain information into an answer prediction model, and determining an answer vector corresponding to the question vector according to the updated selected paragraph vector.
Specifically, the answer prediction model outputs the answer corresponding to the question based on the evidence chain information. When the paragraph content in the evidence chain information is insufficient, the answer prediction model may fail to output a predicted answer; when the answer prediction model can output a predicted answer, the terminator in the paragraph vector set may be selected and fused with the selected paragraph vector.
Step 610: judging whether the answer prediction model can output a prediction answer, if so, continuing to execute step 612; otherwise, return to step 602.
Specifically, when the answer prediction model cannot output a predicted answer, the extraction condition for evidence chain information is regarded as unsatisfied, indicating that the content of the currently selected paragraph vector is insufficient and more paragraph vectors must be selected from the paragraph vector set to expand the evidence chain information. At this point, the question vector, the selected paragraph vector, and the remaining unselected paragraph vectors in the paragraph vector set may be input into the extraction network, and the paragraph screening process re-executed by returning to step 602. Since the selected paragraph vector was updated in step 604, in the next iteration a new unselected paragraph vector may be screened from the remaining candidates and fused with the selected paragraph vector to obtain an updated selected paragraph vector, and new evidence chain information is generated from the updated selected paragraph vector and the question vector. This loop repeats until the answer prediction model can output a predicted answer based on the generated evidence chain information.
Step 612: determine whether the selected paragraph vector includes a terminator; if so, end the flow; otherwise, return to step 602.
Specifically, because it cannot be determined precisely when to stop screening during the loop in which the extraction network screens paragraphs, a terminator EOE may be added to the paragraph vector set as an additional pseudo-paragraph, making the candidate paragraph vector set [p1, p2, p3, p4, EOE]. When the updated selected paragraph vector does not include the terminator, the extraction condition for evidence chain information is still considered unmet, and the paragraph screening process of step 602 is executed again by the extraction network. However, the remaining paragraph vectors may be unrelated to the question and could interfere with the answer prediction model outputting the predicted answer; therefore, once the answer prediction model can output a predicted answer, the terminator in the paragraph vector set may be selected directly and fused with the selected paragraph vector, without screening further paragraph vectors. Conversely, when the extraction network selects the terminator, the evidence chain information is sufficient to predict the answer and the screened paragraph vectors are adequate, so the paragraph extraction process ends, i.e., the loop terminates.
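The loop described in steps 602 through 612 can be sketched as follows. This is a minimal, self-contained illustration only: `extraction_score` and `fuse` are toy stand-ins (hypothetical names) for the trained extraction network and fusion network, and the word-overlap scoring heuristic is invented purely so the example runs; the actual method operates on learned vector representations rather than strings.

```python
# Toy stand-ins for the trained networks (hypothetical names):
EOE = "EOE"  # terminator added to the candidate set as a pseudo-paragraph

def extraction_score(question, selected, paragraph):
    # invented heuristic: word overlap with the question; the terminator
    # is preferred once "enough" evidence has already been selected
    if paragraph == EOE:
        return float(len(selected) >= 2)
    return float(len(set(question.split()) & set(paragraph.split())))

def fuse(selected, paragraph):
    # stand-in for the fusion network: just append the chosen paragraph
    return selected + [paragraph]

def extract_evidence_chain(question, paragraphs):
    candidates = list(paragraphs) + [EOE]  # candidate paragraph set with EOE
    selected = []
    while True:
        unselected = [p for p in candidates if p not in selected]
        # step 602: score every unselected candidate
        best = max(unselected, key=lambda p: extraction_score(question, selected, p))
        # steps 604/606: fuse the highest-scoring candidate into the selection
        selected = fuse(selected, best)
        # step 612: selecting the terminator ends the loop
        if EOE in selected:
            return [p for p in selected if p != EOE]

chain = extract_evidence_chain(
    "where was ada lovelace born",
    ["ada lovelace was born in london",
     "london is the capital of england",
     "the eiffel tower is in paris"],
)
```

Because the already-selected paragraphs are passed back into the scorer each round, every screening decision can refer to all previous screening results, which is the property the paragraphs above emphasize.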
In an embodiment of the present application, because the selected paragraph vector is continuously updated during the paragraph screening process performed by the extraction network, each round of screening actually refers to all previous screening results. This effectively solves the prior-art problem of missing paragraphs containing implicit answers, which arises when the next evidence paragraph is acquired based only on the immediately preceding evidence paragraph. Furthermore, the paragraph screening process performed by the extraction network can adjust the number of loop iterations according to whether the question can actually be answered, and can generate evidence chain information of different lengths for different questions, effectively ensuring that the evidence chain information required for the answer is extracted and improving the accuracy of answer prediction.
In an embodiment of the present application, the training process of the extraction network may include: inputting a question vector sample, a selected paragraph vector sample, and at least one unselected paragraph vector sample in a paragraph vector sample set into the extraction network to obtain the matching score of each unselected paragraph vector sample output by the extraction network; and adjusting the parameters of the extraction network until the matching score of a pre-specified unselected paragraph vector sample is the highest. The pre-specified unselected paragraph vector sample should be the paragraph vector sample that best matches the question vector sample. Once the parameters of the extraction network have been adjusted so that the matching score of the pre-specified unselected paragraph vector sample is the highest, the output accuracy of the extraction network is sufficiently high and the training process may be stopped.
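The training criterion above, driving the pre-specified unselected paragraph vector sample toward the highest matching score, can be illustrated with a softmax cross-entropy objective. This is a sketch under the assumption that such a loss is used; the description does not specify the exact loss function, and `extraction_loss` and `gold_index` are hypothetical names.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def extraction_loss(scores, gold_index):
    # cross-entropy over matching scores: minimized exactly when the
    # pre-specified (gold) unselected paragraph sample scores highest
    return -math.log(softmax(scores)[gold_index])

# matching scores for three unselected paragraph samples; index 1 is the
# pre-specified sample that should end up with the highest score
loss_before = extraction_loss([2.0, 0.5, 1.0], gold_index=1)
loss_after = extraction_loss([0.5, 3.0, 1.0], gold_index=1)
assert loss_after < loss_before  # parameter updates reduce this loss
```

Driving this loss down raises the gold sample's score relative to the others, which is precisely the stopping criterion stated above.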
In an embodiment of the present application, the test process of the extraction network includes: inputting the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network to obtain the matching score of each unselected paragraph vector; and, when the unselected paragraph vector with the highest matching score does not meet a preset condition, combining the unselected paragraph vector with the highest matching score with each original paragraph vector in the paragraph vector set respectively, so as to update the paragraph vector set. When the unselected paragraph vector with the highest matching score meets the preset condition, it is taken as the final test result. The preset condition may include: the length of the unselected paragraph vector with the highest matching score is smaller than a preset length; or, the paragraph vector set includes a terminator in its initial state, in which case the preset condition includes: the unselected paragraph vector with the highest matching score includes the terminator. It should be understood that the test process of the extraction network is similar to the steps of the information extraction method performed in the embodiments of the present application and is not repeated here.
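The set-update step of the test process, combining the highest-scoring unselected paragraph vector with every original paragraph vector, might be sketched as below. Strings and string concatenation stand in for vectors and vector combination; `update_candidate_set` and `combine` are hypothetical names.

```python
def update_candidate_set(best, original_paragraphs, combine):
    # combine the highest-scoring unselected "vector" with each original
    # paragraph "vector" to form the next round's candidate set
    return [combine(best, p) for p in original_paragraphs]

originals = ["p1", "p2", "p3"]
combine = lambda a, b: a + "+" + b  # toy combination of two vectors
updated_set = update_candidate_set("p2", originals, combine)
```

In the next test round the extraction network would score these combined candidates again, so multi-paragraph chains such as `p2+p3` can emerge until the preset condition is met.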
Fig. 7 is a schematic structural diagram of an information extraction device according to an embodiment of the present application. As shown in fig. 7, the information extraction device 70 includes:
An input module 71 configured to input the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network to obtain a matching score for each unselected paragraph vector;
a fusion module 72 configured to input the corresponding unselected paragraph vector and the selected paragraph vector into a fusion network according to the matching score, to obtain an updated selected paragraph vector;
a generating module 73 configured to generate evidence chain information according to the question vector and the updated selected paragraph vector, when the updated selected paragraph vector satisfies the extraction condition.
According to the information extraction device 70 provided in this embodiment of the application, the selected paragraph vector is continuously updated during the paragraph screening process performed by the extraction network, so that each round of screening actually refers to all previous screening results. This effectively solves the prior-art problem of missing paragraphs containing implicit answers, which arises when the next evidence paragraph is acquired based only on the immediately preceding evidence paragraph. Furthermore, the paragraph screening process performed by the extraction network can adjust the number of loop iterations according to whether the question can actually be answered, and can generate evidence chain information of different lengths for different questions, effectively ensuring that the evidence chain information required for the answer is extracted and improving the accuracy of answer prediction.
In an embodiment of the present application, the input module 71 is further configured to: when the updated selected paragraph vector does not satisfy the extraction condition, input the question vector, the updated selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network.
In an embodiment of the application, the unselected paragraph vectors include a terminator; and determining that the updated selected paragraph vector satisfies the extraction condition includes: determining that the updated selected paragraph vector includes the terminator.
In one embodiment of the present application, the fusion module 72 includes:
a concatenation unit 721 configured to concatenate the unselected paragraph vector with the highest matching score with the selected paragraph vector, to obtain a cascade vector;
a fusion unit 722 configured to input the cascade vector into the fusion network, to obtain the updated selected paragraph vector output by the fusion network.
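One plausible realization of the cascade-then-fuse operation of units 721 and 722 is to concatenate the two D-dimensional vectors and project the resulting 2*D cascade vector back to D dimensions. The sketch below uses a random, untrained weight matrix as a stand-in for the fusion network; the actual fusion network architecture is not specified in this description, so dimensions and the `tanh` nonlinearity are illustrative assumptions.

```python
import math
import random

random.seed(0)
D = 4  # hypothetical paragraph-vector dimension
# untrained stand-in for the fusion network: a (2*D) x D weight matrix
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(2 * D)]

def fuse(selected_vec, unselected_vec):
    # unit 721: cascade (concatenate) the two paragraph vectors
    cascade = selected_vec + unselected_vec
    # unit 722: the fusion network maps the 2*D cascade back to D dims
    return [math.tanh(sum(c * W[i][j] for i, c in enumerate(cascade)))
            for j in range(D)]

selected = [0.1, -0.2, 0.3, 0.0]
best_unselected = [0.5, 0.4, -0.1, 0.2]
updated = fuse(selected, best_unselected)
assert len(updated) == D  # updated selected paragraph vector keeps dimension D
```

Keeping the output at dimension D lets the updated selected paragraph vector be fed back into the extraction network in the next screening round without any shape change.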
In an embodiment of the present application, the information extraction device 70 further includes: a prediction module 74 configured to input the evidence chain information into an answer prediction model, and determine an answer vector corresponding to the question vector according to the updated selected paragraph vector.
In an embodiment of the present application, the information extraction device 70 further includes:
A training module 75 configured to train the extraction network; wherein the training module 75 comprises:
an input unit 751 configured to input a question vector sample, a selected paragraph vector sample, and at least one unselected paragraph vector sample in a paragraph vector sample set into the extraction network, to obtain a matching score for each unselected paragraph vector sample output by the extraction network;
an adjustment unit 752 configured to adjust the parameters of the extraction network until the matching score of the pre-specified unselected paragraph vector sample is the highest.
In an embodiment of the present application, the information extraction device 70 further includes:
a test module 76 configured to test the extraction network; wherein the test module 76 comprises:
an obtaining unit 761 configured to input the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network, to obtain a matching score for each unselected paragraph vector;
an updating unit 762 configured to, when the unselected paragraph vector with the highest matching score does not meet the preset condition, combine the unselected paragraph vector with the highest matching score with each original paragraph vector in the paragraph vector set respectively, so as to update the paragraph vector set.
In one embodiment of the present application, the test module 76 further includes:
a confirmation unit 763 configured to take the unselected paragraph vector with the highest matching score as the final test result when it meets the preset condition.
In an embodiment of the present application, the preset condition includes: the length of the unselected paragraph vector with the highest matching score is smaller than a preset length; or, the paragraph vector set includes a terminator in its initial state, in which case the preset condition includes: the unselected paragraph vector with the highest matching score includes the terminator.
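The two alternative preset conditions can be captured in a single predicate. The sketch below represents a paragraph vector as a list of token ids and uses a hypothetical `EOE_ID` for the terminator; the preset length of 128 is an arbitrary illustrative value, as the description does not fix one.

```python
EOE_ID = -1  # hypothetical token id standing in for the terminator EOE

def meets_preset_condition(vector_ids, preset_length=128):
    # condition 1: the vector is shorter than the preset length; or
    # condition 2: the vector includes the terminator that the candidate
    # set carried from its initial state
    return len(vector_ids) < preset_length or EOE_ID in vector_ids

assert meets_preset_condition([3, 7, EOE_ID], preset_length=2)  # via terminator
assert meets_preset_condition([3, 7])                           # via length
assert not meets_preset_condition(list(range(200)))
```

When the predicate holds for the highest-scoring unselected paragraph vector, that vector is taken as the final test result; otherwise the candidate set is updated and another round runs.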
An embodiment of the present application also provides a computing device including a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the instructions:
inputting a question vector, a selected paragraph vector, and at least one unselected paragraph vector in a paragraph vector set into an extraction network to obtain a matching score for each unselected paragraph vector;
inputting the unselected paragraph vector with the highest matching score and the selected paragraph vector into a fusion network to obtain an updated selected paragraph vector;
and, when the updated selected paragraph vector satisfies the extraction condition, generating evidence chain information according to the question vector and the updated selected paragraph vector.
In an embodiment of the present application, when the updated selected paragraph vector does not satisfy the extraction condition, the question vector, the updated selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set may be input into the extraction network.
In an embodiment of the application, the unselected paragraph vectors include a terminator; and determining that the updated selected paragraph vector satisfies the extraction condition includes: determining that the updated selected paragraph vector includes the terminator.
In an embodiment of the present application, inputting the unselected paragraph vector with the highest matching score and the selected paragraph vector into a fusion network to obtain an updated selected paragraph vector includes: concatenating the unselected paragraph vector with the highest matching score with the selected paragraph vector to obtain a cascade vector; and inputting the cascade vector into the fusion network to obtain the updated selected paragraph vector output by the fusion network.
In an embodiment of the present application, the evidence chain information may further be input into an answer prediction model, and the answer vector corresponding to the question vector may be determined according to the updated selected paragraph vector.
In an embodiment of the present application, the training process of the extraction network includes: inputting a question vector sample, a selected paragraph vector sample, and at least one unselected paragraph vector sample in a paragraph vector sample set into the extraction network to obtain the matching score of each unselected paragraph vector sample output by the extraction network; and adjusting the parameters of the extraction network until the matching score of a pre-specified unselected paragraph vector sample is the highest.
In an embodiment of the present application, the test process of the extraction network includes: inputting the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network to obtain the matching score of each unselected paragraph vector; and, when the unselected paragraph vector with the highest matching score does not meet a preset condition, respectively combining the unselected paragraph vector with the highest matching score with each original paragraph vector in the paragraph vector set to update the paragraph vector set.
In an embodiment of the present application, when the unselected paragraph vector with the highest matching score meets a preset condition, it may be used as the final test result.
In an embodiment of the present application, the preset condition includes: the length of the unselected paragraph vector with the highest matching score is smaller than a preset length; or, the paragraph vector set includes a terminator in its initial state, in which case the preset condition includes: the unselected paragraph vector with the highest matching score includes the terminator.
An embodiment of the application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the information extraction method as described above.
The above is an exemplary solution of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the information extraction method belong to the same concept; for details of the storage medium solution not described here, reference may be made to the description of the information extraction method.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be adjusted as required by legislation and patent practice in each jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of action combinations; however, those skilled in the art should understand that the present application is not limited by the described order of actions, since some steps may be performed in another order or simultaneously. Further, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily all required by the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to assist in the explanation of the application. Alternative embodiments are not intended to be exhaustive or to limit the application to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (14)

1. An information extraction method, characterized by comprising:
inputting a question vector, a selected paragraph vector, and at least one unselected paragraph vector in a paragraph vector set into an extraction network to obtain a matching score of each unselected paragraph vector;
inputting the corresponding unselected paragraph vector and the selected paragraph vector into a fusion network according to the matching score to obtain an updated selected paragraph vector;
generating evidence chain information according to the question vector and the updated selected paragraph vector when the updated selected paragraph vector satisfies an extraction condition;
wherein a test process of the extraction network comprises the following steps:
inputting the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network to obtain the matching score of each unselected paragraph vector;
when the unselected paragraph vector with the highest matching score does not meet a preset condition, respectively combining the unselected paragraph vector with the highest matching score with each original paragraph vector in the paragraph vector set to update the paragraph vector set;
and, when the updated selected paragraph vector does not satisfy the extraction condition, inputting the question vector, the updated selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network.
2. The method of claim 1, wherein the unselected paragraph vectors include a terminator;
determining that the updated selected paragraph vector satisfies an extraction condition includes:
determining that the updated selected paragraph vector includes the terminator.
3. The method of claim 1, wherein inputting the corresponding unselected paragraph vector and the selected paragraph vector into a fusion network according to the matching score, to obtain an updated selected paragraph vector, comprises:
concatenating the unselected paragraph vector with the highest matching score with the selected paragraph vector to obtain a cascade vector;
and inputting the cascade vector into the fusion network to obtain the updated selected paragraph vector output by the fusion network.
4. The method according to claim 1, wherein the method further comprises:
and inputting the evidence chain information into an answer prediction model, and determining an answer vector corresponding to the question vector according to the updated selected paragraph vector.
5. The method of claim 1, wherein the training process of extracting the network comprises:
inputting a question vector sample, a selected paragraph vector sample, and at least one unselected paragraph vector sample in a paragraph vector sample set into the extraction network to obtain a matching score of each unselected paragraph vector sample output by the extraction network;
and adjusting parameters of the extraction network until the matching score of a pre-specified unselected paragraph vector sample is the highest.
6. The method according to claim 1, wherein the method further comprises:
and when the unselected paragraph vector with the highest matching score meets a preset condition, taking the unselected paragraph vector with the highest matching score as a final test result.
7. The method according to claim 1 or 6, wherein the preset condition comprises: the length of the unselected paragraph vector with the highest matching score is smaller than a preset length; or,
the paragraph vector set includes a terminator in an initial state, wherein the preset condition comprises: the unselected paragraph vector with the highest matching score includes the terminator.
8. The method of claim 1, wherein the question vector and the paragraph vectors in the paragraph vector set are obtained by a text encoder.
9. The method of claim 1, wherein obtaining the question vector and the paragraph vectors in the paragraph vector set by a text encoder comprises:
inputting the text content of the question and the paragraphs into the text encoder, and vectorizing the text content of the question and the paragraphs through the text encoder to obtain the question vector and the paragraph vectors.
10. The method of claim 4, wherein determining an answer vector corresponding to the question vector from the updated selected paragraph vector further comprises:
when the answer prediction model outputs a predicted answer, determining whether the selected paragraph vector comprises a terminator; if so, ending the flow;
if not, continuing to input the question vector, the selected paragraph vector and at least one unselected paragraph vector in the paragraph vector set into the extraction network.
11. The method of claim 4, wherein determining an answer vector corresponding to the question vector from the updated selected paragraph vector further comprises:
and if the answer prediction model does not output a predicted answer, continuing to input the question vector, the selected paragraph vector in the paragraph vector set and at least one unselected paragraph vector into the extraction network.
12. An information extraction apparatus, characterized by comprising:
an input module configured to input a question vector, a selected paragraph vector, and at least one unselected paragraph vector in a paragraph vector set into an extraction network to obtain a matching score of each unselected paragraph vector;
a fusion module configured to input the corresponding unselected paragraph vector and the selected paragraph vector into a fusion network according to the matching score to obtain an updated selected paragraph vector;
a generation module configured to generate evidence chain information according to the question vector and the updated selected paragraph vector when the updated selected paragraph vector satisfies an extraction condition;
a test module configured to test the extraction network; wherein, the test module includes:
an acquisition unit configured to input the question vector, the selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network to obtain a matching score of each unselected paragraph vector;
an updating unit configured to, when the unselected paragraph vector with the highest matching score does not meet a preset condition, respectively combine the unselected paragraph vector with the highest matching score with each original paragraph vector in the paragraph vector set to update the paragraph vector set;
wherein the input module is further configured to input the question vector, the updated selected paragraph vector, and at least one unselected paragraph vector in the paragraph vector set into the extraction network when the updated selected paragraph vector does not satisfy the extraction condition.
13. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any of claims 1-11.
14. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 11.