CN114077655A - Method and device for training answer extraction model - Google Patents

Method and device for training answer extraction model

Info

Publication number
CN114077655A
Authority
CN
China
Prior art keywords
answer
question
label
answer extraction
extraction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010825792.1A
Other languages
Chinese (zh)
Inventor
孙雪
李长亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd filed Critical Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN202010825792.1A priority Critical patent/CN114077655A/en
Publication of CN114077655A publication Critical patent/CN114077655A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Abstract

The application provides a method and a device for training an answer extraction model. The method comprises the following steps: determining a sample text from an original corpus, and screening, from a pre-constructed question set, at least one question to be queried that is associated with the sample text, together with a corresponding answer label; inputting any one of the questions to be queried and the sample text into a pre-trained answer extraction model, and determining an answer extraction result for the question to be queried; and generating a target loss value of the answer extraction model based on the answer extraction result and the answer label, and optimizing the answer extraction model based on the target loss value to obtain a target answer extraction model.

Description

Method and device for training answer extraction model
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training an answer extraction model, a computing device, and a computer-readable storage medium.
Background
With the rapid development of the internet, more and more information is presented to users in the form of electronic text. To help users quickly find the information they need among massive amounts of information, the concept of information extraction was proposed. Information extraction refers to extracting factual information from natural language text and describing it in a structured form. Machine reading comprehension, in turn, is research aimed at teaching machines to read human language and understand its meaning; a machine reading comprehension task focuses on understanding the text of a passage, and the machine must learn the relevant information from the passage itself rather than answer questions using preset world knowledge and common sense.
At present, an important way to train a machine to read and understand human language is to build a machine reading comprehension model and then train it to obtain the desired model, so that answers to questions can be found in text passages on the basis of the trained model. However, in current training of machine reading comprehension models for the Chinese answer extraction task, query questions matching certain argument types cannot be generated; in addition, the losses considered during training are insufficient and cannot fully reflect the loss of the predicted answer, so the trained model generalizes poorly and the predicted answers it generates are inaccurate.
Disclosure of Invention
In view of the above, embodiments of the present application provide a method and an apparatus for training an answer extraction model, a computing device, and a computer-readable storage medium, so as to solve technical defects in the prior art.
According to a first aspect of embodiments of the present application, there is provided a method for training an answer extraction model, including:
determining a sample text from an original corpus, and screening, from a pre-constructed question set, at least one question to be queried that is associated with the sample text and a corresponding answer label;
inputting any one of the questions to be queried and the sample text into a pre-trained answer extraction model, and determining an answer extraction result for the question to be queried;
and generating a target loss value of the answer extraction model based on the answer extraction result and the answer label, and optimizing the answer extraction model based on the target loss value to obtain a target answer extraction model.
Optionally, the question set is constructed in the following manner:
extracting an event type label and an answer type label of a text from the original corpus;
integrating the event type label and the answer type label to generate a question label;
and generating a query question matched with the question label according to the category to which the answer type label contained in the question label belongs, and constructing a question set based on the query question.
Optionally, the generating a query question matched with the question label according to the category to which the answer type label included in the question label belongs includes:
if the answer type label contained in the question label is of a first category, acquiring a predefined question template, and constructing a query question matched with the question label based on the question label and the question template;
and if the answer type label contained in the question label is of a second category, performing statistical analysis on event sentences related to answer type labels of the second category in the original corpus, and constructing a query question matched with the question label according to the analysis result.
Optionally, the inputting any one of the questions to be queried and the sample text into a pre-trained answer extraction model and determining an answer extraction result for the question to be queried includes:
inputting any one of the questions to be queried and the sample text into the answer extraction model as an input set, and summing, by a vector coding module of the answer extraction model, the word vector, text vector and position vector corresponding to each word unit in the input set to generate a coding vector corresponding to each word unit;
calculating, based on the coding vectors, the probability distribution over word units of the start position and the end position of the predicted answer corresponding to the question to be queried;
and determining, according to the probability distributions of the start position and the end position, an answer extraction result corresponding to the question to be queried.
Optionally, the determining, according to the probability distributions of the start position and the end position, an answer extraction result corresponding to the question to be queried includes:
taking the position, in the sample text, of the word unit with the highest probability in the probability distribution of the start position as the start position of the answer;
taking the position, in the sample text, of the word unit with the highest probability in the probability distribution of the end position as the end position of the answer; and
taking the word units between the start position and the end position as the answer extraction result.
Optionally, the generating a target loss value of the answer extraction model based on the answer extraction result and the answer label includes:
determining a start position loss of the start position of the answer extraction result in the sample text based on the probability distribution of the start position and the probability of the target start position in the answer label;
determining an end position loss of the end position of the answer extraction result in the sample text based on the probability distribution of the end position and the probability of the target end position in the answer label;
determining a length loss of the answer extraction result based on the starting location and the ending location;
calculating the target loss value based on the start position loss, the end position loss, and the length loss.
Optionally, the calculating the target loss value based on the starting position loss, the ending position loss and the length loss includes:
and calculating the weighted sum of the starting position loss, the ending position loss and the length loss as the target loss value.
Optionally, the vector encoding module includes an embedding layer and n stack layers;
correspondingly, the generating the coding vector corresponding to each word unit includes:
S11, inputting the question to be queried and the sample text into the embedding layer as an input set to obtain a corresponding input vector;
S12, inputting the input vector into the 1st stack layer to obtain the output vector of the 1st stack layer;
S13, inputting the output vector of the i-th stack layer into the (i+1)-th stack layer to obtain the output vector of the (i+1)-th stack layer, where i ∈ [1, n] and i starts from 1;
S14, judging whether i is equal to n-1; if yes, executing step S15, and if not, incrementing i by 1 and executing step S13;
and S15, outputting the output vector of the n-th stack layer as the coding vector of each word unit in the input set.
According to a second aspect of embodiments of the present application, there is provided an answer extraction model training apparatus, including:
a screening module, configured to determine a sample text from an original corpus and to screen, from a pre-constructed question set, at least one question to be queried that is associated with the sample text and a corresponding answer label;
a determining module, configured to input any one of the questions to be queried and the sample text into a pre-trained answer extraction model and determine an answer extraction result for the question to be queried;
and a calculating module, configured to generate a target loss value of the answer extraction model based on the answer extraction result and the answer label, and to optimize the answer extraction model based on the target loss value to obtain a target answer extraction model.
According to a third aspect of embodiments herein, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the method of training the answer extraction model when executing the instructions.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for training the answer extraction model.
In the embodiments of the application, after the sample text is determined, the questions to be queried that are associated with the sample text and the corresponding answer labels are screened from a pre-constructed question set, and the questions to be queried and the sample text are input into the answer extraction model for model training, which improves both the accuracy of the training results of the answer extraction model and the efficiency of model training; in addition, a target loss value between the answer extraction result output by the model and the answer label is calculated, and the answer extraction model is optimized based on the target loss value, which improves the generalization performance of the answer extraction model.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for training an answer extraction model according to an embodiment of the present application;
fig. 3 is a schematic architecture diagram of a BERT model provided in an embodiment of the present application;
FIG. 4 is a flow chart of a process for generating a code vector according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating the generation of an input vector for an embedding layer according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a method for training an answer extraction model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an answer extraction model training device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application.
First, the terms involved in one or more embodiments of the present application are explained.
Event extraction: extracting, according to the event type, the trigger word that causes the event, the argument roles participating in the event, the event type to which each argument role belongs, and so on.
Hard-Loss: more difficult to lose. The triple loss is characterized in that during training, a given anchor, a positive sample and a negative sample form a triple, and a boundary parameter is set, so that the model strives to draw the distance between the positive sample pairs and push the distance between the negative sample pairs away.
MRC: Machine Reading Comprehension. The goal of this task is to extract an answer span from an article given a question.
Word unit (token): before any actual processing, the input text needs to be segmented into language units such as words, punctuation marks, numbers or letters; these units are called word units. For an English text, a word unit may be a word, a punctuation mark, a number, etc.; for a Chinese text, the smallest word unit may be a single character, a punctuation mark, a number, etc.
BERT model: a bidirectional attention neural network model. The BERT model can predict the current word from its left and right context, and predict the next sentence from the current sentence. BERT aims to use large-scale unlabeled corpus training to obtain text representations that contain rich semantic information, then fine-tune the text representations on a specific NLP task, and finally apply them to that NLP task.
Sequence labeling: in short, given a sequence, a model is used to assign a label to each element of the sequence, for example an entity label or a part-of-speech label.
CE: cross entropy loss function. Commonly used in classification problems, it is computed from the predicted probability vector weighted by the true label vector; the loss is reduced through back propagation, making the prediction tend toward the true label.
In the present application, a training method and apparatus for an answer extraction model, a computing device and a computer readable storage medium are provided, which are described in detail in the following embodiments one by one.
FIG. 1 shows a block diagram of a computing device 100 according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes an access device 140 that enables the computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.
In one embodiment of the present application, the above-mentioned components of the computing device 100 and other components not shown in fig. 1 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 may perform the steps of the method for training the answer extraction model shown in fig. 2. Fig. 2 is a flowchart illustrating a method for training an answer extraction model according to an embodiment of the present application, including steps 202 to 206.
Step 202, determining a sample text from the original corpus, and screening at least one to-be-queried question associated with the sample text and a corresponding answer label in a pre-constructed question set.
At present, one way of processing an event extraction task is to divide it into two subtasks, trigger word extraction and argument extraction; trigger word extraction mostly applies sequence labeling to obtain the entity labels of trigger words, but this approach is not applicable when argument entities overlap.
Based on this, in the method for training an answer extraction model provided in the embodiments of this specification, the answer extraction task is treated as an MRC task: a sample text is determined from an original corpus; at least one question to be queried that is associated with the sample text and a corresponding answer label are screened from a pre-constructed question set; any one of the questions to be queried and the sample text are input into a pre-trained answer extraction model to determine an answer extraction result for the question to be queried; a target loss value of the answer extraction model is generated based on the answer extraction result and the answer label; and the answer extraction model is optimized based on the target loss value to obtain a target answer extraction model.
Specifically, the answer extraction model in the embodiments of this specification consists of a vector coding module (a BERT model), a start index prediction model and an end index prediction model. The architecture of the BERT model is shown in fig. 3 and includes 12 stack layers connected in sequence. Each stack layer further comprises a self-attention layer, a first normalization layer, a feedforward layer and a second normalization layer. The text composed of the article and the question is input into the embedding layer as an input set to obtain a text vector; the text vector is input into the 1st stack layer, the output vector of the 1st stack layer is input into the 2nd stack layer, and so on, until the output vector of the last stack layer is obtained. The output vector of the last stack layer is taken as the representation vector of each word unit and input into a feedforward layer for processing to obtain the coding vectors of the input set.
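For concreteness, a minimal sketch of this composition is given below. It is not the patented implementation: the use of the Hugging Face transformers BertModel, the checkpoint name "bert-base-chinese" and the two linear prediction heads are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel  # assumed encoder; any 12-layer BERT works

class AnswerExtractionModel(nn.Module):
    """Vector coding module (BERT) plus start and end index prediction models."""

    def __init__(self, bert_name="bert-base-chinese"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(bert_name)  # the 12 stack layers
        hidden = self.encoder.config.hidden_size
        # One 2-way classifier per word unit for "is start" / "is end";
        # these play the role of T_start and T_end in formulas (1) and (2) below.
        self.start_head = nn.Linear(hidden, 2)
        self.end_head = nn.Linear(hidden, 2)

    def forward(self, input_ids, attention_mask, token_type_ids):
        # E: coding vector of every word unit in the input set, shape (B, n, hidden)
        E = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask,
                         token_type_ids=token_type_ids).last_hidden_state
        p_start = torch.softmax(self.start_head(E), dim=-1)  # (B, n, 2)
        p_end = torch.softmax(self.end_head(E), dim=-1)      # (B, n, 2)
        return E, p_start, p_end
```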
In practical applications, the obtained original corpus contains many texts. These texts are written texts containing a certain amount of information, and may be text of various granularities, such as a sentence, a paragraph, several paragraphs, an article or several articles, which is not limited by this application.
In a specific implementation, the question set can be constructed in the following manner:
extracting an event type label and an answer type label of a text from the original corpus;
integrating the event type label and the answer type label to generate a question label;
and generating a query question matched with the question label according to the category to which the answer type label contained in the question label belongs, and constructing a question set based on the query question.
Specifically, an event type label and an argument type label are extracted from the texts of the original corpus; the two kinds of labels are integrated to generate question labels (in fact, a process of traversal and combination); then, for each question label, a question matching it is constructed so as to generate the question set.
For example, the event type labels extracted from the texts of the original corpus include: marriage, promotion and judgment; the argument type labels extracted include: time and place. Integrating the two kinds of labels, the generated question labels include: promotion-time, marriage-time, judgment-time, promotion-place, marriage-place and judgment-place.
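As an illustration, the traversal and combination can be sketched as follows; the label strings are taken from the example above and the function name is hypothetical:

```python
from itertools import product

def build_question_labels(event_types, argument_types):
    """Integrate event type labels and argument (answer) type labels into
    question labels by traversing all combinations."""
    return [f"{event}-{arg}" for event, arg in product(event_types, argument_types)]

labels = build_question_labels(["promotion", "marriage", "judgment"], ["time", "place"])
# ['promotion-time', 'promotion-place', 'marriage-time', 'marriage-place', ...]
```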
After the question labels are generated, the MRC questions to be used are constructed for them. Constructing an MRC question is in fact equivalent to describing the question label, and a targeted description amounts to providing more semantic information. For example, for the question label "promotion-time", the MRC question constructed for it may be: "find the time at which the promotion event occurs".
However, because some question labels cannot be given an appropriate question directly, the question labels need to be classified, and an MRC question matching each question label is constructed according to the classification result.
In a specific implementation, the query question matched with the question label is generated according to the category to which the answer type label included in the question label belongs, which can be realized in the following ways:
if the answer type label contained in the question label is of the first category, acquiring a predefined question template, and constructing a query question matched with the question label based on the question label and the question template;
and if the answer type label contained in the question label is of the second category, performing statistical analysis on the event sentences related to answer type labels of the second category in the original corpus, and constructing a query question matched with the question label according to the analysis result.
Specifically, the arguments of the first category, i.e. answer type labels such as time, person, number of people and organization, have universality: they express essentially the same meaning in any event sentence. Therefore, for question labels containing answer type labels of the first category, it is only necessary to prepend the character string of the event type to each question to distinguish them. For example, "promotion-time" and "marriage-time" are such general labels; for them, a question template ("find the time at which the XX event occurs") can be used, and the MRC question is synthesized from the question template and the question label by code. The MRC question corresponding to "promotion-time" is thus: "find the time at which the promotion event occurs"; the MRC question corresponding to "marriage-time" is: "find the time at which the marriage event occurs".
The arguments of the second category, i.e. answer type labels such as the rise/fall range of a stock (limit-up, limit-down), have no such universality, and no appropriate question can be generated directly for question labels containing them. For such labels, a more general and detailed question description can be established using statistical analysis of the event sentences. For example, for "stock rise/fall-range", events of the "stock rise/fall" type are found in the texts of the original corpus, the common linguistic descriptions of such events are determined and analyzed, and a general and detailed MRC question matching them is determined; the MRC question corresponding to "stock rise/fall-range" may be: "find the range of the fluctuation in the stock rise/fall event, including percentage fluctuation, limit-up, limit-down, etc."
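A sketch of this two-way question construction follows; the template string, the category set and the second-category fallback table are illustrative assumptions rather than the patent's actual wording:

```python
GENERAL_ARGS = {"time", "place", "person", "organization"}  # first category (assumed)

# Hand-curated descriptions produced by statistical analysis of event
# sentences for second-category labels (assumed example).
SPECIAL_QUESTIONS = {
    "stock rise/fall-range":
        "find the range of the fluctuation in the stock rise/fall event, "
        "including percentage fluctuation, limit-up, limit-down, etc.",
}

def build_mrc_question(question_label: str) -> str:
    event_type, answer_type = question_label.split("-", 1)
    if answer_type in GENERAL_ARGS:
        # First category: fill a predefined question template.
        return f"find the {answer_type} at which the {event_type} event occurs"
    # Second category: use the description derived from corpus analysis.
    return SPECIAL_QUESTIONS[question_label]

question_set = [build_mrc_question(lbl) for lbl in ["promotion-time", "marriage-time"]]
```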
The question labels are thus classified, MRC questions matching the question labels are generated according to the classification results, and the question set is constructed. Later, after a sample text is determined, the questions to be queried that are associated with the sample text can be screened from the question set, and the questions to be queried and the sample text can be input into the answer extraction model for model training, which improves both the accuracy of the training results of the answer extraction model and the efficiency of model training.
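How a question is associated with a sample text is not spelled out at this point; the following sketch therefore assumes, purely for illustration, that each sample text carries the event type labels found in it and that each query question is stored under its question label:

```python
def screen_questions(sample_event_types, question_bank):
    """Screen the questions to be queried for a sample text.

    question_bank maps a question label such as 'promotion-time' to its
    MRC question; a question is associated with the sample text when its
    event type appears in the sample's event type labels (assumption).
    """
    selected = {}
    for label, question in question_bank.items():
        event_type = label.split("-", 1)[0]
        if event_type in sample_event_types:
            selected[label] = question
    return selected

bank = {"promotion-time": "find the time at which the promotion event occurs",
        "marriage-time": "find the time at which the marriage event occurs"}
print(screen_questions({"promotion"}, bank))  # only the promotion question survives
```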
Step 204, inputting any one of the questions to be queried and the sample text into a pre-trained answer extraction model, and determining an answer extraction result of the questions to be queried.
Specifically, after a question set is constructed, a sample text is determined from an original corpus, at least one question to be queried and a corresponding answer label associated with the sample text are screened from the constructed question set, and the question to be queried and the sample text are input into a pre-trained answer extraction model to obtain an answer extraction result corresponding to the question to be queried.
In practical application, a sentence or a sample text is selected from an original corpus as a given event sequence, and then a question to be queried is determined according to an entity in the event sequence.
For example, if the given event sequence is "Zhang San went to Beijing" (sequence length n = 5 word units), and the entities in the event sequence are "Zhang San" and "Beijing", it can be determined from these two entities that the questions to be queried may be "Where to go?" and "Who is the person in the event?", and the corresponding answers are "Beijing" and "Zhang San", respectively.
after the problem to be queried is determined, the problem to be queried and the event sequence can be used as an input set, and are input into a BERT model (vector coding module) in a character string mode, so that a coding vector output by the model is obtained.
In specific implementation, any one of the questions to be queried and the sample text are input into a pre-trained answer extraction model, and an answer extraction result of the question to be queried is determined, which can be specifically realized by the following method:
inputting any one of the questions to be queried and the sample text into the answer extraction model as an input set, and summing, by the vector coding module of the answer extraction model, the word vector, text vector and position vector corresponding to each word unit in the input set to generate the coding vector corresponding to each word unit;
calculating, based on the coding vectors, the probability distribution over word units of the start position and the end position of the predicted answer corresponding to the question to be queried;
and determining, according to the probability distributions of the start position and the end position, the answer extraction result corresponding to the question to be queried.
Further, the vector coding module comprises an embedding layer and n stack layers. Fig. 4 is a flowchart of the process of generating coding vectors according to an embodiment of the present disclosure; referring to fig. 4, generating the coding vector corresponding to each word unit may specifically be implemented through steps 402 to 410:
step 402, inputting the question to be queried and the sample text into the embedding layer as an input set to obtain a corresponding input vector.
Referring to fig. 5, fig. 5 is a schematic diagram of the generation of the input vector. The input set includes the two sentences "Where to go?" and "Zhang San went to Beijing", where "Zhang San went to Beijing" serves as the target text and "Where to go?" serves as the question.
The input vector generated by the embedding layer is the sum of the following 3 vectors:
word unit vector - the vector corresponding to each word unit;
sentence vector - the vector of the sentence to which each word unit belongs;
position vector - the vector generated from the position corresponding to each word unit.
Step 404, inputting the input vector to the 1 st stack layer to obtain an output vector of the 1 st stack layer;
step 406, inputting the output vector of the ith stack layer to the (i + 1) th stack layer to obtain the output vector of the (i + 1) th stack layer, wherein i belongs to [1, n ], and i takes a value from 1;
step 408, judging whether i is equal to n-1, if so, executing step 405, and if not, executing step 403;
and step 410, outputting the output vector of the nth stack layer as the coding vector of each word unit in the input set.
Specifically, the input set may take the following format: [[CLS], question, [SEP], sample text, [SEP]].
If it is determined that the questions to be queried are "Where to go?" and "Who is the person in the event?", and the event sequence is "Zhang San went to Beijing", then the input set may consist of the two sentences "Where to go?" and "Zhang San went to Beijing", with "Zhang San went to Beijing" as the sample text and "Where to go?" as the question. The input format is: [[CLS], go, where, ?, [SEP], Zhang, San, go, Bei, Jing, [SEP]]; a specific schematic diagram is shown in fig. 4.
Alternatively, the input set may consist of the two sentences "Who is the person in the event?" and "Zhang San went to Beijing", with "Zhang San went to Beijing" as the sample text and "Who is the person in the event?" as the question. The input format is: [[CLS], who, is, the, person, in, the, event, ?, [SEP], Zhang, San, go, Bei, Jing, [SEP]].
For example, the sample text is "Zhang San went to Beijing" and the query question is "Where to go?". Segmenting the sample text and the query question into word units gives the word unit set [[CLS], go, where, ?, [SEP], Zhang, San, go, Bei, Jing, [SEP]], where [CLS] is the sentence-start symbol and [SEP] is the sentence-separation symbol. The word unit set is embedded and input into the BERT model; the output vector of the last stack layer of the model is taken as the representation vector of each word unit and input into a feedforward layer for processing, and the coding vectors of the input set are [A1, A2, ..., A10, A11].
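For illustration, the following sketch builds such an input set and runs the embedding layer plus n stack layers iteratively, as in steps 402 to 410. The simplified StackLayer (self-attention, first norm, feedforward, second norm) and all sizes are assumptions, not the patent's exact BERT internals:

```python
import torch
import torch.nn as nn

class StackLayer(nn.Module):
    """One stack layer: self-attention layer, first normalization layer,
    feedforward layer, second normalization layer."""
    def __init__(self, hidden=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.ff = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                                nn.Linear(4 * hidden, hidden))
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        x = self.norm1(x + self.attn(x, x, x)[0])
        return self.norm2(x + self.ff(x))

def encode(token_ids, segment_ids, word_emb, seg_emb, pos_emb, layers):
    # Step 402: input vector = word unit vector + sentence vector + position vector
    positions = torch.arange(token_ids.size(1))
    x = word_emb(token_ids) + seg_emb(segment_ids) + pos_emb(positions)
    # Steps 404 to 408: the output of the i-th stack layer feeds the (i+1)-th
    for layer in layers:
        x = layer(x)
    return x  # Step 410: coding vector of each word unit in the input set

# Toy usage: 11 word units, as in [[CLS], go, where, ?, [SEP], Zhang, San, go, Bei, Jing, [SEP]]
word_emb, seg_emb, pos_emb = nn.Embedding(21128, 768), nn.Embedding(2, 768), nn.Embedding(512, 768)
layers = nn.ModuleList([StackLayer() for _ in range(12)])
ids = torch.randint(0, 21128, (1, 11))
segs = torch.tensor([[0] * 5 + [1] * 6])  # question units vs sample text units
E = encode(ids, segs, word_emb, seg_emb, pos_emb, layers)  # shape (1, 11, 768)
```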
In the embodiments of this specification, a question set is constructed in advance; after the sample text is determined, the question to be queried that is associated with the sample text and the corresponding answer label are screened from the question set, and the question to be queried and the sample text are input into the answer extraction model for model training. The question to be queried acts as prior information for the model, and the model outputs the answer extraction result according to this prior information, which improves the accuracy of the output result.
After the BERT model outputs the coding vectors, two binary classification strategies can be used to predict, for each word unit in the coding vectors, whether it is the start position index or the end position index of the entity (answer); this is obtained specifically by the classification functions shown in formulas (1) and (2):
$P_{start} = \mathrm{softmax}_{\mathrm{each\,row}}(E \cdot T_{start}) \in \mathbb{R}^{n \times 2}$   formula (1)

$P_{end} = \mathrm{softmax}_{\mathrm{each\,row}}(E \cdot T_{end}) \in \mathbb{R}^{n \times 2}$   formula (2)
Formula (1) is used to predict the probability that each word unit in the coding vectors is the start position index of the entity; formula (2) is used to predict the probability that each word unit in the coding vectors is the end position index of the entity. $T_{start}$ in formula (1) and $T_{end}$ in formula (2) are preset model parameters, equivalent to initialized weights for the word units.
After the probability of each word unit being the start position index or the end position index of the entity has been calculated, the probability results can be screened, and index sets can be constructed based on the start or end position indices with the maximum probability in the screening results.
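As a concrete illustration, a minimal sketch of formulas (1) and (2) follows; the coding matrix E and the randomly initialized T_start and T_end stand in for the actual trained values and are assumptions for demonstration only:

```python
import torch

n, hidden = 11, 768
E = torch.randn(n, hidden)          # coding vectors of the n word units
T_start = torch.randn(hidden, 2)    # preset parameter (randomly initialized here)
T_end = torch.randn(hidden, 2)      # preset parameter (randomly initialized here)

P_start = torch.softmax(E @ T_start, dim=-1)  # formula (1): row-wise softmax, shape (n, 2)
P_end = torch.softmax(E @ T_end, dim=-1)      # formula (2): row-wise softmax, shape (n, 2)
```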
In a specific implementation, determining the answer extraction result corresponding to the question to be queried according to the probability distributions of the start position and the end position can be realized as follows (see the sketch after this list):
taking the position, in the sample text, of the word unit with the highest probability in the probability distribution of the start position as the start position of the answer;
taking the position, in the sample text, of the word unit with the highest probability in the probability distribution of the end position as the end position of the answer; and
taking the word units between the start position and the end position as the answer extraction result.
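A sketch of this span selection follows, reusing the running "Zhang San went to Beijing" example; the probability tables are made-up numbers chosen so that the 4th and 5th word units win:

```python
import torch

def extract_answer(text_units, p_start, p_end):
    """Pick the highest-probability start and end positions and return the span."""
    start = int(torch.argmax(p_start[:, 1]))  # column 1: probability of "is start"
    end = int(torch.argmax(p_end[:, 1]))      # column 1: probability of "is end"
    return text_units[start:end + 1]

units = ["Zhang", "San", "went", "Bei", "jing"]
p_start = torch.tensor([[.9, .1], [.8, .2], [.7, .3], [.2, .8], [.6, .4]])
p_end   = torch.tensor([[.9, .1], [.8, .2], [.7, .3], [.6, .4], [.1, .9]])
print(extract_answer(units, p_start, p_end))  # ['Bei', 'jing'] -> "Beijing"
```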
Specifically, after the probability that each word unit is an entity initial or end position index is calculated, probability results can be screened, and an index set is constructed based on the initial or end position index with the highest probability in the screening results;
In practical applications, the probabilities of each word unit being the start position index of the entity are screened, and the start position index set constructed from the start position indices with the maximum probability in the screening results is given by formula (3); the probabilities of each word unit being the end position index of the entity are screened, and the end position index set constructed from the end position indices with the maximum probability in the screening results is given by formula (4):

$\hat{I}_{start} = \{\, i \mid \arg\max(P_{start}^{(i)}) = 1,\ i = 1, \ldots, n \,\}$   formula (3)

$\hat{I}_{end} = \{\, j \mid \arg\max(P_{end}^{(j)}) = 1,\ j = 1, \ldots, n \,\}$   formula (4)

where i in formula (3) and j in formula (4) denote the i-th and the j-th word unit in the sample text respectively, $P_{start}^{(i)}$ denotes the probability that the i-th word unit is the start position index of the entity, and $P_{end}^{(j)}$ denotes the probability that the j-th word unit is the end position index of the entity.
For example, suppose the sample text is "Zhang San went to Beijing", with "Zhang" the first word unit, "San" the second, and so on up to "jing" as the fifth word unit. Suppose the probabilities of the word units being the start position of the answer are [x1, x2, x3, x4, x5], and the probabilities of the word units being the end position of the answer are [y1, y2, y3, y4, y5]. If, among the start position probabilities, x4 is the maximum, then $\hat{I}_{start} = \{4\}$; if, among the end position probabilities, y5 is the maximum, then $\hat{I}_{end} = \{5\}$. Therefore the entity (answer) corresponding to the query question consists of the 4th and 5th word units in the sample text, and the answer corresponding to the query question can be determined to be "Beijing".
In the embodiments of this specification, the probability of each word unit being the answer start position index or the answer end position index is calculated, the results are screened, the word units with the highest probabilities are taken as the start position and the end position of the answer in the answer extraction result, and the word units between the start position and the end position are taken as the answer extraction result, which ensures the accuracy of the answer extraction result.
and step 206, generating a target loss value of the answer extraction model based on the answer extraction result and the answer label, and optimizing the answer extraction model based on the target loss value to obtain a target answer extraction model.
Specifically, after an answer extraction result of a to-be-queried question output by a model is obtained, calculation can be performed based on the answer extraction result and the answer label to generate a target loss value of the answer extraction model, and the answer extraction model is optimized based on the target loss value to obtain a target answer extraction model.
In a specific implementation, generating the target loss value of the answer extraction model based on the answer extraction result and the answer label may specifically be implemented in the following manner:
determining a start position loss of the start position of the answer extraction result in the sample text based on the probability distribution of the start position and the probability of the target start position in the answer label;
determining an end position loss of the end position of the answer extraction result in the sample text based on the probability distribution of the end position and the probability of the target end position in the answer label;
determining a length loss of the answer extraction result based on the starting location and the ending location;
calculating the target loss value based on the start position loss, the end position loss, and the length loss.
Further, the target loss value is calculated based on the start position loss, the end position loss and the length loss, i.e. a weighted sum of the start position loss, the end position loss and the length loss is calculated as the target loss value.
In the embodiments of the present disclosure, the target loss value is calculated using a triplet loss function (the Hard-Loss function); the specific calculation formulas are shown in formulas (5), (6) and (7):
$L_{hard} = \frac{1}{k} \sum_{i=1}^{k} \max\!\left( d\big(E_{i}^{start}, E_{j^{+}}^{end}\big) - d\big(E_{i}^{start}, E_{j^{-}}^{end}\big) + \alpha,\ 0 \right)$   formula (5)

$L_{start} = \mathrm{CE}(P_{start}, Y_{start})$   formula (6)

$L_{end} = \mathrm{CE}(P_{end}, Y_{end})$   formula (7)

where k in formula (5) denotes the number of elements in the set $\hat{I}_{start}$, and $E_{i}^{start}$ is an element of that set; $E_{j^{+}}^{end}$ and $E_{j^{-}}^{end}$ are elements of the set $\hat{I}_{end}$: $E_{j^{+}}^{end}$ denotes the positive-sample end index matching the i-th start index, and $E_{j^{-}}^{end}$ denotes the end index of a negative sample; $d(\cdot,\cdot)$ denotes the distance between two index representations; and $\alpha$ is a boundary parameter.
$Y_{start}$ in formula (6) is the probability of the start position index of the true answer label corresponding to the query question; $Y_{end}$ in formula (7) is the probability of the end position index of the true answer label corresponding to the query question.
In use, the Hard-Loss function forms a triplet from a given anchor (i.e., a target sample), a positive sample and a negative sample, and sets a boundary parameter so that the model pulls the target sample closer to the positive sample while pushing it away from the negative sample.
The final target loss value is the weighted sum of the three losses, i.e., the target loss value $L = \omega_1 \cdot L_{hard} + \omega_2 \cdot L_{start} + \omega_3 \cdot L_{end}$,

where $\omega_1$, $\omega_2$ and $\omega_3$ are the weights corresponding to the three loss values. In practical applications, the weights corresponding to the three loss values, the boundary parameter, and $T_{start}$ and $T_{end}$ can all be set according to actual requirements, which is not limited here.
If, according to the probability calculation results of each word unit in the sample text being the start position of the answer, the 1st word unit and the 2nd word unit have the maximum probabilities of being start position indices, then $\hat{I}_{start} = \{1, 2\}$; if, according to the probability calculation results of each word unit being the end position of the answer, the 3rd word unit and the 4th word unit have the maximum probabilities of being end position indices, then $\hat{I}_{end} = \{3, 4\}$.

If the positive-sample end index matching the 1st start index ($E_{1}^{start}$) is $E_{3}^{end}$, then the negative-sample end index matching the 1st start index is $E_{4}^{end}$; similarly, if the positive-sample end index matching the 2nd start index ($E_{2}^{start}$) is $E_{4}^{end}$, then the negative-sample end index matching the 2nd start index is $E_{3}^{end}$.

Thus,

$L_{hard} = \frac{1}{2}\left[ \max\!\left( d\big(E_{1}^{start}, E_{3}^{end}\big) - d\big(E_{1}^{start}, E_{4}^{end}\big) + \alpha,\ 0 \right) + \max\!\left( d\big(E_{2}^{start}, E_{4}^{end}\big) - d\big(E_{2}^{start}, E_{3}^{end}\big) + \alpha,\ 0 \right) \right]$
$L_{start}$ and $L_{end}$ are each obtained from the cross entropy loss function, i.e., by a weighted calculation over the probabilities corresponding to the predicted start or end position indices of the answer and the probabilities corresponding to the start or end position indices of the true answer.
After $L_{hard}$, $L_{start}$ and $L_{end}$ have all been calculated, the three losses are weighted by their respective weights and summed to obtain the target loss value.
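A minimal sketch of this target loss follows, under stated assumptions: Euclidean distance for d, cross entropy computed over each word unit's two-way distribution, equal illustrative weights, and positive/negative end-index pairings taken from the example above rather than from any matching rule the patent may define:

```python
import torch
import torch.nn.functional as F

def hard_loss(E, starts, pos_ends, neg_ends, alpha=0.2):
    """Triplet Hard-Loss of formula (5): anchor = representation at a start index,
    positive/negative = representations at the matching/non-matching end indices."""
    terms = []
    for i, jp, jn in zip(starts, pos_ends, neg_ends):
        d_pos = torch.dist(E[i], E[jp])  # distance to the positive end index
        d_neg = torch.dist(E[i], E[jn])  # distance to the negative end index
        terms.append(torch.clamp(d_pos - d_neg + alpha, min=0.0))
    return torch.stack(terms).mean()

def target_loss(E, P_start, P_end, Y_start, Y_end,
                starts, pos_ends, neg_ends, w=(1.0, 1.0, 1.0)):
    l_hard = hard_loss(E, starts, pos_ends, neg_ends)
    # Formulas (6) and (7): P_* are already softmaxed, so apply NLL to log-probs.
    l_start = F.nll_loss(P_start.log(), Y_start)
    l_end = F.nll_loss(P_end.log(), Y_end)
    # Target loss value L = w1*L_hard + w2*L_start + w3*L_end
    return w[0] * l_hard + w[1] * l_start + w[2] * l_end

# Toy run mirroring the example above (0-based: starts {0, 1}, ends {2, 3}):
E = torch.randn(5, 768)
P_start = torch.softmax(torch.randn(5, 2), dim=-1)
P_end = torch.softmax(torch.randn(5, 2), dim=-1)
Y_start = torch.tensor([1, 1, 0, 0, 0])  # true start labels per word unit
Y_end = torch.tensor([0, 0, 1, 1, 0])    # true end labels per word unit
loss = target_loss(E, P_start, P_end, Y_start, Y_end,
                   starts=[0, 1], pos_ends=[2, 3], neg_ends=[3, 2])
```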
After the target loss value is calculated, the parameters of the model can be adjusted according to the target loss value so as to realize model optimization.
In addition, after the answer extraction result is obtained, it can be compared with the answer label; if the accuracy of the answer extraction result does not satisfy a preset condition, the questions to be queried in the question set can be optimized, and the specific optimization method can be determined according to actual requirements, which is not limited here.
In the embodiments of this specification, the answer extraction result is compared with the answer label, and the questions to be queried corresponding to answer extraction results whose accuracy does not satisfy the preset condition are optimized, which reduces the problem of overlapping argument entities to a certain extent. In addition, after the answer extraction result is obtained, the target loss value is calculated using the triplet loss function and the model parameters are adjusted based on the target loss value; this makes the training process harder, so the model learns to better distinguish the positive and negative samples matching an argument entity, which enhances the generalization performance of the model.
The present embodiment will be further described with reference to specific examples.
A schematic diagram of the method for training the answer extraction model provided in the embodiments of this specification is shown in fig. 6. First, the MRC questions are generated: an event type label and an argument type label are extracted from the texts of the original corpus, the two kinds of labels are integrated to generate question labels (in fact, a process of traversal and combination), and then an MRC question matching each kind of question label is constructed.
After the MRC questions are constructed, the question to be queried q_y and the event sequence X are combined into a character string and input into BERT, which outputs the context representation matrix E.
if the sample text corresponding to the event sequence X is "zhang san wen beijing", the question to be queried may be "which is gone? ", the string is [ [ cls ], go, where, are? The method comprises the steps of (1), (sep), (post), (triplet), (go, north, Beijing), (sep) ], inputting a character string into a BERT model, obtaining a coding vector output by the model, inputting the coding vector into a starting index prediction model, obtaining the probability that each word unit in the coding vector is an answer starting position index, inputting the coding vector into an ending index prediction model, obtaining the probability that each word unit in the coding vector is an answer ending position index, screening probability results after obtaining the probability that each word unit is an answer starting or ending position index, and determining an answer extraction result based on a starting or ending position index with the highest probability in the screening results;
Furthermore, after the answer extraction result is obtained, the target loss value is calculated using the Hard-Loss function, and the parameters of the model are adjusted according to the target loss value to optimize the model.
According to the method for training an answer extraction model described above, after the sample text is determined, the questions to be queried that are associated with the sample text and the corresponding answer labels are screened from the pre-constructed question set, and the questions to be queried and the sample text are input into the answer extraction model for model training, which improves both the accuracy of the training results of the answer extraction model and the efficiency of model training; in addition, a target loss value between the answer extraction result output by the model and the answer label is calculated, and the answer extraction model is optimized based on the target loss value, which improves the generalization performance of the answer extraction model.
Corresponding to the above method embodiment, the present application further provides an embodiment of a training apparatus for an answer extraction model, and fig. 7 shows a schematic structural diagram of the training apparatus for an answer extraction model according to an embodiment of the present application. As shown in fig. 7, the apparatus 700 includes:
a screening module 702 configured to determine a sample text from an original corpus, and screen at least one to-be-queried question associated with the sample text and a corresponding answer label in a pre-constructed question set;
a determining module 704, configured to input any one of the questions to be queried and the sample text into a pre-trained answer extraction model, and determine an answer extraction result of the question to be queried;
a calculating module 706 configured to generate a target loss value of the answer extraction model based on the answer extraction result and the answer label, and optimize the answer extraction model based on the target loss value to obtain a target answer extraction model.
Optionally, the training device for the answer extraction model further includes:
the label extraction module is configured to extract an event type label and an answer type label of a text from the original corpus;
the label generation module is configured to integrate the event type label and the answer type label to generate a question label;
and the question set building module is configured to generate a query question matched with the question label according to the category to which the answer type label contained in the question label belongs, and build a question set based on the query question.
Optionally, the problem set constructing module includes:
a first question generation module configured to obtain a predefined question template if an answer type tag included in the question tag is of a first category, and generate a query question matched with the question tag based on the question tag and the question template;
and the second question generation module is configured to perform statistical analysis on event sentences related to answer type labels of the second type in the original corpus if the answer type labels included in the question labels are of the second type, and generate query questions matched with the question labels according to analysis results.
Optionally, the determining module 704 includes:
the coding vector generation sub-module is configured to input any one of the questions to be queried and the sample text into the answer extraction model as an input set, and the vector coding module of the answer extraction model sums a word vector, a text vector and a position vector corresponding to each word unit in the input set to generate a coding vector corresponding to each word unit;
a calculation sub-module configured to calculate, based on the encoding vector, a probability distribution of a starting position and an ending position of each word unit as a predictive answer corresponding to the question to be queried;
and the answer extraction result determining submodule is configured to determine an answer extraction result corresponding to the question to be queried according to the probability distribution of the starting position and the ending position.
Optionally, the answer extraction result determining sub-module includes:
a starting position determining unit configured to take the position of the word unit with the highest probability in the probability distribution of the starting position in the sample text as the starting position of the answer;
an end position determination unit configured to take a position of a word unit with a highest probability in the probability distribution of the end position in the sample text as an end position of the answer;
an answer extraction result determination unit configured to take word units between the starting position and the ending position as the answer extraction result.
Optionally, the calculating module 706 includes:
a first loss calculation submodule configured to determine a starting position loss of a starting position of the answer extraction result in the sample text based on the probability distribution of the starting position and the probability of a target starting position in the answer label;
a second loss calculation submodule configured to determine an end position loss of the end position of the answer extraction result in the sample text based on the probability distribution of the end position and the probability of the target end position in the answer label;
a third loss calculation submodule configured to determine a length loss of the answer extraction result based on the start position and the end position;
a target loss value operator module configured to calculate the target loss value based on the start position loss, the end position loss, and the length loss.
Optionally, the target loss value operator module is further configured to:
and calculating the weighted sum of the starting position loss, the ending position loss and the length loss as the target loss value.
Optionally, the vector encoding module includes an embedding layer and n stack layers;
the encoding vector generation submodule includes:
the first input subunit is configured to input the question to be queried and the sample text into the embedding layer as an input set to obtain a corresponding input vector;
the second input subunit is configured to input the input vector to the 1 st stack layer, so as to obtain an output vector of the 1 st stack layer;
a third input subunit, configured to input the output vector of the ith stack layer to the (i + 1) th stack layer, so as to obtain the output vector of the (i + 1) th stack layer, where i belongs to [1, n ], and i starts to take a value from 1;
a judging subunit, configured to judge whether i is equal to n-1; if yes, run the output subunit, and if not, increment i by 1 and run the third input subunit;
the output subunit is configured to output the output vector of the nth stack layer as the encoding vector of each word unit in the input set.
It should be noted that the components in the device claims should be understood as functional blocks which are necessary to implement the steps of the program flow or the steps of the method, and each functional block is not actually defined by functional division or separation. The device claims defined by such a set of functional modules are to be understood as a functional module framework for implementing the solution mainly by means of a computer program as described in the specification, and not as a physical device for implementing the solution mainly by means of hardware.
An embodiment of the present application further provides a computing device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method for training an answer extraction model.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the method for training an answer extraction model as described above.
The above is an illustrative scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above training method for the answer extraction model belong to the same concept; for details not described in the storage medium solution, refer to the description of the training method above.
Specific embodiments of the present application have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, the form of an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the above method embodiments are presented as a series of combinations of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
Each of the above embodiments is described with its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in explaining the application. The alternative embodiments are not exhaustive and do not limit the application to the precise forms described; obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, thereby enabling others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (11)

1. A method for training an answer extraction model, comprising:
determining a sample text from an original corpus, and screening, from a pre-constructed question set, at least one question to be queried associated with the sample text and a corresponding answer label;
inputting any one of the questions to be queried and the sample text into a pre-trained answer extraction model, and determining an answer extraction result of the question to be queried; and
generating a target loss value of the answer extraction model based on the answer extraction result and the answer label, and optimizing the answer extraction model based on the target loss value to obtain a target answer extraction model.
2. The method of claim 1, wherein the question set is constructed by:
extracting an event type label and an answer type label of a text from the original corpus;
integrating the event type label and the answer type label to generate a question label;
generating a query question matching the question label according to the category to which the answer type label contained in the question label belongs, and constructing the question set based on the query question.
3. The method for training an answer extraction model according to claim 2, wherein the generating a query question matching the question label according to the category to which the answer type label contained in the question label belongs comprises:
if the answer type label contained in the question label is of a first type, acquiring a predefined question template, and constructing a query question matching the question label based on the question label and the question template; and
if the answer type label contained in the question label is of a second type, performing statistical analysis on event sentences related to the answer type label of the second type in the original corpus, and constructing a query question matching the question label according to a result of the analysis.
4. The method for training an answer extraction model according to claim 1, wherein the inputting any one of the questions to be queried and the sample text into the pre-trained answer extraction model to determine the answer extraction result of the question to be queried comprises:
inputting any one of the questions to be queried and the sample text into the answer extraction model as an input set, and summing, by a vector encoding module of the answer extraction model, a word vector, a text vector, and a position vector corresponding to each word unit in the input set to generate an encoding vector corresponding to each word unit;
calculating, based on the encoding vectors, a probability distribution of each word unit serving as the start position and a probability distribution of each word unit serving as the end position of a predicted answer corresponding to the question to be queried; and
determining the answer extraction result corresponding to the question to be queried according to the probability distributions of the start position and the end position.
5. The method for training an answer extraction model according to claim 4, wherein the determining the answer extraction result corresponding to the question to be queried according to the probability distributions of the start position and the end position comprises:
taking the position, in the sample text, of the word unit with the highest probability in the probability distribution of the start position as the start position of the answer;
taking the position, in the sample text, of the word unit with the highest probability in the probability distribution of the end position as the end position of the answer; and
taking the word units between the start position and the end position as the answer extraction result.
6. The method for training an answer extraction model according to claim 5, wherein the generating a target loss value of the answer extraction model based on the answer extraction result and the answer label comprises:
determining a start position loss of the start position of the answer extraction result in the sample text, based on the probability distribution of the start position and the probability of the target start position in the answer label;
determining an end position loss of the end position of the answer extraction result in the sample text, based on the probability distribution of the end position and the probability of the target end position in the answer label;
determining a length loss of the answer extraction result based on the start position and the end position; and
calculating the target loss value based on the start position loss, the end position loss, and the length loss.
7. The method for training an answer extraction model according to claim 6, wherein the calculating the target loss value based on the start position loss, the end position loss, and the length loss comprises:
calculating a weighted sum of the start position loss, the end position loss, and the length loss as the target loss value.
8. The method for training an answer extraction model according to claim 4, wherein the vector encoding module comprises an embedding layer and n stack layers;
correspondingly, the generating the encoding vector corresponding to each word unit comprises:
S11, inputting the question to be queried and the sample text, as an input set, into the embedding layer to obtain a corresponding input vector;
S12, inputting the input vector into the 1st stack layer to obtain an output vector of the 1st stack layer;
S13, inputting the output vector of the i-th stack layer into the (i+1)-th stack layer to obtain an output vector of the (i+1)-th stack layer, where i ∈ [1, n-1] and i starts from 1;
S14, judging whether i is equal to n-1; if yes, executing step S15, and if not, incrementing i by 1 and executing step S13 again;
S15, outputting the output vector of the n-th stack layer as the encoding vector of each word unit in the input set.
9. A device for training an answer extraction model, comprising:
a screening module configured to determine a sample text from an original corpus and to screen, from a pre-constructed question set, at least one question to be queried associated with the sample text and a corresponding answer label;
a determining module configured to input any one of the questions to be queried and the sample text into a pre-trained answer extraction model and to determine an answer extraction result of the question to be queried; and
a calculating module configured to generate a target loss value of the answer extraction model based on the answer extraction result and the answer label, and to optimize the answer extraction model based on the target loss value to obtain a target answer extraction model.
10. A computing device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any one of claims 1 to 8.
11. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 8.
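By way of illustration only and not as part of the claims, the question construction of claims 2 and 3 might be sketched as follows in plain Python. The label names, the template strings, and the two-way split between template-based and statistics-based questions are assumptions made for the example; the statistical analysis itself is reduced to a precomputed lookup table.

    def build_query_question(event_type: str, answer_type: str,
                             first_type_labels: set, templates: dict,
                             analyzed_questions: dict) -> str:
        # The question label integrates the event type label and the
        # answer type label extracted from the original corpus.
        question_label = (event_type, answer_type)
        if answer_type in first_type_labels:
            # First type: fill a predefined question template with the label,
            # e.g. templates["time"] == "When did the {event} occur?" (hypothetical).
            return templates[answer_type].format(event=event_type)
        # Second type: fall back to a question derived beforehand from a
        # statistical analysis of event sentences related to this label.
        return analyzed_questions[question_label]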
CN202010825792.1A 2020-08-17 2020-08-17 Method and device for training answer extraction model Pending CN114077655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010825792.1A CN114077655A (en) 2020-08-17 2020-08-17 Method and device for training answer extraction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010825792.1A CN114077655A (en) 2020-08-17 2020-08-17 Method and device for training answer extraction model

Publications (1)

Publication Number Publication Date
CN114077655A true CN114077655A (en) 2022-02-22

Family

ID=80280706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010825792.1A Pending CN114077655A (en) 2020-08-17 2020-08-17 Method and device for training answer extraction model

Country Status (1)

Country Link
CN (1) CN114077655A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996424A (en) * 2022-06-01 2022-09-02 吴艳 Weak supervision cross-domain question-answer pair generation method based on deep learning
CN114996424B (en) * 2022-06-01 2023-05-09 吴艳 Weak supervision cross-domain question-answer pair generation method based on deep learning
CN116450796A (en) * 2023-05-17 2023-07-18 中国兵器工业计算机应用技术研究所 Intelligent question-answering model construction method and device
CN116450796B (en) * 2023-05-17 2023-10-17 中国兵器工业计算机应用技术研究所 Intelligent question-answering model construction method and device

Similar Documents

Publication Publication Date Title
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
CN110781663B (en) Training method and device of text analysis model, text analysis method and device
CN113127624B (en) Question-answer model training method and device
CN110209802B (en) Method and device for extracting abstract text
CN110347802B (en) Text analysis method and device
CN110489523A (en) A kind of fine granularity sentiment analysis method based on online shopping evaluation
CN110609886A (en) Text analysis method and device
CN114090776A (en) Document analysis method, system and device
CN114077655A (en) Method and device for training answer extraction model
CN111859967A (en) Entity identification method and device and electronic equipment
CN114691864A (en) Text classification model training method and device and text classification method and device
CN111091002B (en) Chinese named entity recognition method
CN114138969A (en) Text processing method and device
CN114120342A (en) Resume document identification method and device, computing device and storage medium
CN113159187A (en) Classification model training method and device, and target text determining method and device
CN116932736A (en) Patent recommendation method based on combination of user requirements and inverted list
CN115757723A (en) Text processing method and device
CN114417863A (en) Word weight generation model training method and device and word weight generation method and device
CN113961686A (en) Question-answer model training method and device, question-answer method and device
CN112328777B (en) Answer detection method and device
CN114138947A (en) Text processing method and device
CN114492410A (en) Contract information extraction method and device
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN113377965B (en) Method and related device for sensing text keywords
CN113283240B (en) Co-reference digestion method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination