CN115544224A - Machine reading understanding method, model training method, device and medium - Google Patents

Machine reading understanding method, model training method, device and medium

Info

Publication number
CN115544224A
CN115544224A
Authority
CN
China
Prior art keywords
question
answer
training
text
reading understanding
Prior art date
Legal status
Pending
Application number
CN202210990226.5A
Other languages
Chinese (zh)
Inventor
计云杰
陈亮宇
窦辰晓
马宝昌
李先刚
Current Assignee
Seashell Housing Beijing Technology Co Ltd
Original Assignee
Seashell Housing Beijing Technology Co Ltd
Application filed by Seashell Housing Beijing Technology Co Ltd filed Critical Seashell Housing Beijing Technology Co Ltd
Priority to CN202210990226.5A
Publication of CN115544224A

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

Embodiments of the invention disclose a machine reading understanding method, a model training method, an apparatus, an electronic device, and a storage medium. The training method comprises the following steps: determining an original question, a positive question, a negative question, and a text containing the answer to the original question, wherein the positive question has the same answer as the original question, and the negative question cannot be answered based on the text; determining a training sample, wherein the training sample comprises a first concatenation result of the original question and the text, a second concatenation result of the positive question and the text, a third concatenation result of the negative question and the text, and the position in the text of the answer to the original question; inputting the training sample into a pre-training language model to execute a training process of the pre-training language model; and determining the pre-training language model that has completed the training process as the machine reading understanding model. Embodiments of the invention improve machine reading understanding in unanswerable-question scenarios.

Description

Machine reading understanding method, model training method, device and medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a machine reading comprehension (MRC) method, a model training method and apparatus, an electronic device, and a storage medium.
Background
Machine reading understanding is a technique that uses algorithms to enable a computer to understand the semantics of a text and answer related questions. Machine reading understanding, in which a machine reads a passage of text and then answers a given question, is a core task of natural language understanding. Beyond being a basic natural language understanding task, machine reading understanding also plays an important role in many applications, such as automated question answering and human-machine dialogue. In recent years, with the development of machine learning, and of deep learning in particular, machine reading understanding research has advanced greatly and begun to show its strength in practical applications.
Most current machine reading understanding techniques are based on the assumption that a posed question can always be answered. In many application scenarios, however, this assumption does not hold: a user may well ask an unanswerable question, i.e., one whose answer cannot be found in the given text. Because current techniques either do not consider such scenarios or merely apply a simple threshold cutoff, they will give a seemingly plausible but in fact wrong answer even when the question is unanswerable.
Disclosure of Invention
The embodiment of the invention provides a machine reading understanding method, a model training method, a device, electronic equipment and a storage medium.
A training method of a machine reading understanding model comprises the following steps:
determining an original question, a positive question, a negative question and a text containing an answer to the original question, wherein the positive question and the original question have the same answer, and the negative question is a question which cannot be answered based on the text;
determining a training sample, wherein the training sample comprises a first concatenation result of the original question and the text, a second concatenation result of the positive question and the text, a third concatenation result of the negative question and the text, and the position in the text of the answer to the original question;
inputting the training sample into a pre-training language model to execute a training process of the pre-training language model, the training process comprising: inputting the position, the first concatenation result, the second concatenation result, and the third concatenation result into the pre-training language model, respectively, so as to determine, by the pre-training language model, a first vector representation of the answer to the original question from the vector representation of the first concatenation result based on the position, a second vector representation of the answer to the positive question from the vector representation of the second concatenation result based on the position, and a third vector representation of the answer to the negative question from the vector representation of the third concatenation result based on the position; determining a first similarity between the second vector representation and the first vector representation, and a second similarity between the third vector representation and the first vector representation; determining a loss function value of the pre-training language model based on the first similarity and the second similarity; and configuring model parameters of the pre-training language model so that the loss function value is lower than a preset threshold;
and determining the pre-training language model completing the training process as a machine reading understanding model.
In an exemplary embodiment, determining the positive question comprises at least one of:
determining a back-translation result of the original question as the positive question;
determining a back-translation result of the original question that has passed a translation-accuracy check as the positive question.
Determining the negative question comprises at least one of:
replacing an entity contained in the original question with a different entity, and determining the resulting question as the negative question;
determining the result of introducing a negation word into the original question as the negative question.
In an exemplary embodiment, determining the loss function value of the pre-training language model based on the first similarity and the second similarity comprises:
the loss function value decreases as the first similarity increases, and increases as the second similarity increases.
In an exemplary embodiment, further comprising:
determining the loss function value $L_{spanCL}$, wherein

$$L_{spanCL} = -\log \frac{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right)}{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right) + \exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{n}_{i,j}\right)/\tau\right)}$$

wherein: log is the logarithm with the natural constant e as its base; exp is the exponential function with the natural constant e as its base; $h^{o}_{i,j}$ is the concatenation of the vector $h^{o}_{i}$ at the starting position i of the answer to the original question and the vector $h^{o}_{j}$ at the ending position j of the answer to the original question; $h^{p}_{i,j}$ is the concatenation of the vector $h^{p}_{i}$ at the starting position i of the answer to the positive question and the vector $h^{p}_{j}$ at the ending position j of the answer to the positive question; $h^{n}_{i,j}$ is the concatenation of the vector $h^{n}_{i}$ at the starting position i of the answer to the negative question and the vector $h^{n}_{j}$ at the ending position j of the answer to the negative question; $\tau$ is a predetermined scaling factor; and $\mathrm{sim}(\cdot,\cdot)$ is a similarity function.
A machine reading understanding method, comprising:
receiving a test question and a test text;
inputting the test question and the test text into the machine reading understanding model, wherein the machine reading understanding model is obtained by training according to the training method of the machine reading understanding model;
receiving an answer to the test question from the machine-reading understanding model, wherein the answer to the test question is determined based on the test text.
A training apparatus for machine reading understanding models, comprising:
a first determination module, configured to determine an original question, a positive question, a negative question, and a text containing an answer to the original question, where the positive question and the original question have the same answer, and the negative question is a question that cannot be answered based on the text;
a second determining module, configured to determine a training sample, where the training sample includes a first concatenation result of the original question and the text, a second concatenation result of the positive question and the text, a third concatenation result of the negative question and the text, and a position of an answer to the original question in the text;
a training module, configured to input the training sample into a pre-training language model to execute a training process of the pre-training language model, where the training process includes: inputting the location, the first concatenation result, the second concatenation result, and the third concatenation result into the pre-trained language model, respectively, to determine, by the pre-trained language model, a first vector representation of an answer to the original question from a vector representation of the first concatenation result based on the location, a second vector representation of an answer to the positive question from a vector representation of the second concatenation result based on the location, and a third vector representation of an answer to the negative question from a vector representation of the third concatenation result based on the location; determining a first similarity of the second vector representation to the first vector representation and a second similarity of the third vector representation to the first vector representation; determining a loss function value of the pre-training language model based on the first similarity and the second similarity; configuring model parameters of the pre-training language model so that the loss function value is lower than a preset threshold value;
and the third determining module is used for determining the pre-training language model completing the training process as a machine reading understanding model.
In an exemplary embodiment, the first determining module is configured to perform at least one of:
determining a back-translation result of the original question as the positive question;
determining a back-translation result of the original question that has passed a translation-accuracy check as the positive question.
In an exemplary embodiment, the first determining module is configured to perform at least one of:
replacing an entity contained in the original question with a different entity, and determining the resulting question as the negative question;
determining the result of introducing a negation word into the original question as the negative question.
In an exemplary embodiment, the training module is configured such that the loss function value decreases as the first similarity increases and increases as the second similarity increases.
In an exemplary embodiment, the training module is configured to determine the loss function value $L_{spanCL}$, wherein

$$L_{spanCL} = -\log \frac{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right)}{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right) + \exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{n}_{i,j}\right)/\tau\right)}$$

wherein: log is the logarithm with the natural constant e as its base; exp is the exponential function with the natural constant e as its base; $h^{o}_{i,j}$ is the concatenation of the vector $h^{o}_{i}$ at the starting position i of the answer to the original question and the vector $h^{o}_{j}$ at the ending position j of the answer to the original question; $h^{p}_{i,j}$ is the concatenation of the vector $h^{p}_{i}$ at the starting position i of the answer to the positive question and the vector $h^{p}_{j}$ at the ending position j of the answer to the positive question; $h^{n}_{i,j}$ is the concatenation of the vector $h^{n}_{i}$ at the starting position i of the answer to the negative question and the vector $h^{n}_{j}$ at the ending position j of the answer to the negative question; $\tau$ is a predetermined scaling factor; and $\mathrm{sim}(\cdot,\cdot)$ is a similarity function.
A machine reading understanding apparatus, comprising:
the first receiving module is used for receiving the test questions and the test texts;
an input module, configured to input the test question and the test text into the machine reading understanding model, where the machine reading understanding model is trained according to a training method of the machine reading understanding model as described in any one of the above;
a second receiving module for receiving an answer to the test question from the machine-reading understanding model, wherein the answer to the test question is determined based on the test text.
An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the machine reading understanding model training method or the machine reading understanding method according to any one of the above items.
A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, may implement the method of training a machine reading understanding model or the method of machine reading understanding as described in any one of the above.
A computer program product comprising computer instructions which, when executed by a processor, implement a method of training a machine-reading understanding model or a method of machine-reading understanding as described in any one of the preceding claims.
As can be seen from the above technical solutions, in embodiments of the present invention the training method comprises: determining an original question, a positive question, a negative question, and a text containing the answer to the original question, wherein the positive question has the same answer as the original question, and the negative question cannot be answered based on the text; determining a training sample, wherein the training sample comprises a first concatenation result of the original question and the text, a second concatenation result of the positive question and the text, a third concatenation result of the negative question and the text, and the position in the text of the answer to the original question; inputting the training sample into a pre-training language model to execute a training process of the pre-training language model; and determining the pre-training language model that has completed the training process as the machine reading understanding model. Embodiments of the present invention do not answer a question simply according to the degree of word overlap between the question and the text; the model can better understand the semantic information of the question and the given text, thereby improving machine reading understanding in unanswerable-question scenarios.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is an exemplary flow chart of a method of training a machine-readable understanding model according to an embodiment of the present invention.
Fig. 2 is an exemplary diagram of a baseline model of a machine-readable understanding model of an embodiment of the present invention.
FIG. 3 is a machine-readable understanding model training diagram according to an embodiment of the present invention.
Fig. 4 is an exemplary flowchart of a machine reading understanding method based on a machine reading understanding model according to an embodiment of the present invention.
FIG. 5 is a graph comparing the performance of machine-read understanding models of embodiments of the present invention.
Fig. 6 is an exemplary block diagram of a machine-readable understanding model training apparatus according to an embodiment of the present invention.
Fig. 7 is an exemplary block diagram of a machine-readable understanding model apparatus according to an embodiment of the present invention.
Fig. 8 is an exemplary block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
For simplicity and clarity of description, the invention is described below through several representative embodiments. Numerous details of the embodiments are set forth to provide an understanding of the principles of the invention; it will be apparent, however, that the invention may be practiced without these specific details. Some embodiments are not described in detail but only sketched as frameworks, in order to avoid unnecessarily obscuring aspects of the invention. Hereinafter, "including" means "including but not limited to", and "according to ..." means "according to at least ..., but not necessarily according to only ...". In view of the language conventions of Chinese, when the following description does not specifically state the number of a component, the component may be one or more, i.e., at least one. The terms "first", "second", "third", "fourth", and the like in the description, the claims, and the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can, for example, be implemented in orders other than those illustrated or described herein.
Current machine reading understanding techniques are based on the basic assumption that a question can always be answered. In most application scenarios, however, this assumption does not hold, since a user may well ask an unanswerable question, i.e., one whose answer cannot be found in the given text. Because current machine reading understanding techniques do not take such scenarios into account, they still give seemingly plausible but in fact wrong answers even when the question is unanswerable.
For example, given the text: "The Legend of Zelda: Twilight Princess is an action-adventure game developed and published by Nintendo for the GameCube and Wii home video game consoles. It is the thirteenth installment in The Legend of Zelda series. Twilight Princess was originally planned for release on the GameCube in November 2005, but was delayed by Nintendo to allow its developers to refine the game, add more content, and port it to the Wii. The Wii version was released in November 2006 in North America, and in Japan, Europe, and Australia together with the console. The GameCube version was released worldwide in December 2006."
For the text given above, consider the question: "In which year was The Legend of Zelda: Twilight Princess originally planned to be released?" Existing machine reading understanding models can usually give the correct answer: "2005".
However, consider another question: "In which year was The Legend of Zelda: Australian Princess originally planned to be released?" Based on the given text, this question is clearly unanswerable. Yet existing machine reading understanding models typically still give a plausible-looking answer: "2005". Such an answer is obviously wrong.
Embodiments of the present invention improve the performance of machine reading understanding models in unanswerable reading comprehension scenarios. In particular, embodiments of the present invention enable the model to better understand the semantic information of a question and a given text, rather than answering simply according to the degree of word overlap between the question and the text.
The applicant found that slight changes to a question often change its answer, which makes identifying unanswerable questions a significant challenge for machines. In view of this characteristic, embodiments of the present invention propose contrastive learning based on answer spans. For an answerable question (called the original question), two types of new questions can be constructed by changing its textual expression: (1) questions with the same answer (called positive questions); and (2) questions that cannot be answered (called negative questions). These two new questions are similar in expression to the original one, but some key components have been changed. Through contrastive learning, the machine reading understanding model learns the differences between the original question and these two new questions, and thereby acquires the ability to distinguish key semantic changes.
Fig. 1 is an exemplary flow chart of a method of training a machine-readable understanding model according to an embodiment of the present invention.
As shown in fig. 1, the method includes:
step 101: an original question (original question), a positive question (positive question) having the same answer as the original question, a negative question (negative question) being a text-based unanswerable question, and text containing the answer to the original question are determined.
Here, the original question is a text-based answerable question, that is, an answer to the original question is contained in the text. The forward question is also a text-based answerable question and has the same answer as the original question. Negative questions are based on questions that the text cannot answer.
By changing the textual representation of the original question, positive and negative questions can be constructed.
For example, the positive question can be obtained from the original question by back-translation. The back-translation process is as follows: first, the original question in a first language is translated into a second language; then the second-language translation is translated back into the first language, and the resulting first-language question is the positive question. The positive question has the same answer as the original question, but its expression varies slightly, and this variation does not affect its answerability.
For example, the manner of determining the positive question may include the following (an illustrative sketch follows the list):
(1) determining a back-translation result of the original question as the positive question;
(2) determining a back-translation result of the original question that has passed a translation-accuracy check as the positive question.
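The following is a minimal Python sketch of constructing a positive question by back-translation. The patent does not prescribe a translation tool; the MarianMT checkpoints and the Chinese/English language pair used here are illustrative assumptions, and any machine translation system can be substituted.

```python
from transformers import MarianMTModel, MarianTokenizer

def _translate(text: str, model_name: str) -> str:
    # One translation hop with a MarianMT checkpoint (illustrative choice).
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    output_ids = model.generate(**tokenizer(text, return_tensors="pt"))
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

def build_positive_question(original_question: str) -> str:
    # Round-trip zh -> en -> zh: the back-translated question keeps the same
    # answer while varying the surface wording slightly.
    intermediate = _translate(original_question, "Helsinki-NLP/opus-mt-zh-en")
    return _translate(intermediate, "Helsinki-NLP/opus-mt-en-zh")
```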
The negative question can be obtained from the original question by transformations such as entity replacement or the introduction of negation words. The negative question differs only slightly in expression from the original question, yet it is unanswerable.
For example, the manner of determining the negative question may include the following (an illustrative sketch follows the list):
(1) Replace an entity contained in the original question with a different entity, and determine the resulting question as the negative question.
For example, the original question is: "How do I get to Wudaokou in Beijing's Haidian District?" Replacing the entity "Wudaokou" with the different entity "Zhichunlu" yields the unanswerable negative question: "How do I get to Zhichunlu in Beijing's Haidian District?"
(2) Determine the result of introducing a negation word into the original question as the negative question.
For example, the original question is: "Did Xiaoming's plan to go to Shanghai for fun today succeed?" Introducing the negation word "not" yields the unanswerable negative question: "Did Xiaoming's plan not to go to Shanghai for fun today succeed?"
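The following is a minimal Python sketch of the two negative-question transformations above. The entity lexicon and the negation rule are illustrative assumptions; in practice a named-entity recognizer would typically locate the entities to replace.

```python
import random

# Hypothetical lexicon mapping an entity to different-meaning alternatives.
ENTITY_ALTERNATIVES = {
    "Wudaokou": ["Zhichunlu"],
}

def replace_entity(question: str) -> str:
    # (1) Entity replacement: the substituted entity makes the question
    # unanswerable with respect to the original text.
    for entity, alternatives in ENTITY_ALTERNATIVES.items():
        if entity in question:
            return question.replace(entity, random.choice(alternatives))
    return question

def insert_negation(question: str, verb: str) -> str:
    # (2) Negation insertion: negate a verb phrase, e.g. "go" -> "not go".
    return question.replace(verb, "not " + verb, 1)
```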
The above description gives typical examples of determining the positive question and the negative question based on the original question. Those skilled in the art will appreciate that this description is merely exemplary and is not intended to limit the scope of the embodiments of the present invention.
Step 102: determine a training sample, wherein the training sample comprises a first concatenation result of the original question and the text, a second concatenation result of the positive question and the text, a third concatenation result of the negative question and the text, and the position in the text of the answer to the original question.
Here, the original question is concatenated with the text to obtain the first concatenation result; the positive question is concatenated with the same text to obtain the second concatenation result; and the negative question is concatenated with the same text to obtain the third concatenation result. A training sample is then constructed, comprising: (1) the first concatenation result; (2) the second concatenation result; (3) the third concatenation result; and (4) the position in the same text of the answer to the original question (e.g., the starting position of the answer in the text and the ending position of the answer in the text).
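A minimal sketch of assembling one training sample follows, assuming a HuggingFace-style tokenizer; the checkpoint name and max_length value are illustrative assumptions, and the sketch further assumes the three questions tokenize to the same length so that the answer span offsets coincide across the three concatenation results.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # illustrative

def build_sample(original_q, positive_q, negative_q, text, answer_start, answer_end):
    def encode(question):
        # [CLS] question [SEP] text [SEP], padded/truncated to one shared length.
        return tokenizer(question, text, return_tensors="pt",
                         padding="max_length", truncation=True, max_length=384)
    first, second, third = encode(original_q), encode(positive_q), encode(negative_q)
    # The span (answer_start, answer_end) of the original answer is stored once
    # and reused for all three concatenation results.
    return first, second, third, (answer_start, answer_end)
```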
Step 103: input the training sample into a pre-trained language model (PLM) to execute the training process of the pre-training language model. The training process comprises: inputting the position in the text of the answer to the original question, the first concatenation result, the second concatenation result, and the third concatenation result into the pre-training language model, respectively, so that the pre-training language model determines, based on the position, a first vector representation of the answer to the original question from the vector representation of the first concatenation result, a second vector representation of the answer to the positive question from the vector representation of the second concatenation result, and a third vector representation of the answer to the negative question from the vector representation of the third concatenation result; determining a first similarity between the second vector representation and the first vector representation, and a second similarity between the third vector representation and the first vector representation; determining a loss function value of the pre-training language model based on the first similarity and the second similarity; and configuring model parameters of the pre-training language model so that the loss function value is lower than a preset threshold. Here, the vector representations of the first, second, and third concatenation results have the same length. Moreover, the position of the answer to the original question in the vector representation of the first concatenation result, the position of the answer to the positive question in the vector representation of the second concatenation result, and the position of the answer to the negative question in the vector representation of the third concatenation result are identical across the three vector representations and are determined by the position in the text of the answer to the original question specified in the training sample.
Here, large-scale PLMs have significantly improved the performance of a wide range of NLP tasks. Starting from BERT and GPT-2, the paradigm of self-supervised pre-training followed by supervised fine-tuning has achieved tremendous success and set new state-of-the-art results in many areas of natural language processing, such as semantic similarity, machine reading understanding, commonsense reasoning, and text summarization. The PLM serves as the encoder of the machine reading understanding baseline model: the question and the text are concatenated together and input into the PLM, which obtains the answer to the question by predicting the starting and ending positions of the answer in the text.
For example, the PLM may be implemented as a BERT-base model, a BERT-large model, a RoBERTa-base model, a RoBERTa-large model, an ALBERT-base model, or an ALBERT-large model, among others.
While the above exemplary description describes a typical example of a PLM, those skilled in the art will appreciate that this description is exemplary only and is not intended to limit the scope of embodiments of the present invention.
In an exemplary embodiment, determining the loss function value of the pre-training language model based on the first similarity and the second similarity comprises: the loss function value decreases as the first similarity increases, and increases as the second similarity increases. Thus, by pulling the vector representations of the answers to the original question and the positive question closer together, and pushing the vector representations of the answers to the original question and the negative question further apart, a loss function suited to improving machine reading understanding in unanswerable-question scenarios can be constructed.
Therefore, a machine reading understanding model obtained through the above training process can better understand the semantic information of the question and the given text, rather than answering simply according to the degree of word overlap between the question and the text. Embodiments of the present invention thereby improve machine reading understanding in unanswerable-question scenarios.
FIG. 2 is an exemplary diagram of a baseline model of a machine reading understanding model according to an embodiment of the present invention. The model shown in FIG. 2 is a common baseline model for current machine reading understanding tasks, with a PLM as its encoder. The question and the text are concatenated together and input into the PLM, which obtains the answer to the question by predicting the starting and ending positions of the answer in the text.
In FIG. 2, the question is tokenized into a natural language sequence $q_1, q_2, \ldots, q_m$, and the text is tokenized into a natural language sequence $p_1, p_2, \ldots, p_n$. The two sequences are concatenated, with [CLS] added as the start token and [SEP] as the separator between the question and the text, and the concatenation result is input into the PLM. The PLM maps the concatenation result to hidden layers, and the mapping may be performed multiple times. For example: first, [CLS] is mapped to the vector E[CLS], $q_1, \ldots, q_m$ are mapped to the vectors $E_{q_1}, \ldots, E_{q_m}$, $p_1, \ldots, p_n$ are mapped to the vectors $E_{p_1}, \ldots, E_{p_n}$, and [SEP] is mapped to the vector E[SEP]; then the PLM maps E[CLS] to the vector T[CLS], $E_{q_1}, \ldots, E_{q_m}$ to the vectors $T_{q_1}, \ldots, T_{q_m}$, $E_{p_1}, \ldots, E_{p_n}$ to the vectors $T_{p_1}, \ldots, T_{p_n}$, and E[SEP] to the vector T[SEP]. From T[CLS], $T_{q_1}, \ldots, T_{p_n}$, the PLM predicts the starting position and the ending position of the answer; the vector representation between the starting position and the ending position is the answer.
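A minimal PyTorch sketch of this baseline follows: a PLM encodes the concatenation and a linear head predicts start/end logits over the token positions, mirroring standard extractive question answering. The class and attribute names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BaselineMRC(nn.Module):
    def __init__(self, plm_name: str = "bert-base-chinese"):  # illustrative PLM
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)
        self.span_head = nn.Linear(self.encoder.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        # The T-vectors of Fig. 2: one hidden vector per input token.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        start_logits, end_logits = self.span_head(hidden).split(1, dim=-1)
        # The answer is the span between the predicted start and end positions.
        return start_logits.squeeze(-1), end_logits.squeeze(-1), hidden
```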
FIG. 3 is a training diagram of the machine reading understanding model according to an embodiment of the present invention, depicting the contrastive learning process based on answer spans. In an embodiment of the present invention, the PLM shown in FIG. 2 is trained through the process shown in FIG. 3 to obtain the machine reading understanding model. The core idea of the contrastive learning process is to pull the vector representations of texts of the same type closer together and push the vector representations of texts of different types further apart. In embodiments of the present invention, the vector representations of the answers to the original question and the positive question are pulled closer, while the vector representations of the answers to the original question and the negative question are pushed apart.
During the training of the PLM, the original question, the positive question, and the negative question are each concatenated with the same text and then separately input into the PLM to obtain the vector representations of the corresponding answers. The position in the text of the answer to the original question is also input into the PLM. For example, assume the answer to the original question starts at position i and ends at position j in the text.
First, the original question is concatenated with the text to generate the first concatenation result, which is input into the PLM. The first vector representation of the answer to the original question output by the PLM is the concatenation $h^{o}_{i,j} = [h^{o}_{i}; h^{o}_{j}]$ of the vector $h^{o}_{i}$ at the starting position i and the vector $h^{o}_{j}$ at the ending position j in the vector representation of the first concatenation result.

The positive question is then concatenated with the text to generate the second concatenation result, which is input into the PLM. The second vector representation of the answer to the positive question output by the PLM is the concatenation $h^{p}_{i,j} = [h^{p}_{i}; h^{p}_{j}]$ of the vector $h^{p}_{i}$ at the starting position i and the vector $h^{p}_{j}$ at the ending position j in the vector representation of the second concatenation result.

Then, the negative question is concatenated with the text to generate the third concatenation result, which is input into the PLM. The third vector representation of the answer to the negative question output by the PLM is the concatenation $h^{n}_{i,j} = [h^{n}_{i}; h^{n}_{j}]$ of the vector $h^{n}_{i}$ at the starting position i and the vector $h^{n}_{j}$ at the ending position j in the vector representation of the third concatenation result.
In the above description, the order in which the first, second, and third concatenation results are input into the PLM may be changed; the embodiment of the present invention is not limited in this respect.
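A minimal sketch of extracting the three answer-span representations follows, reusing the BaselineMRC sketch above; i and j denote the starting and ending token positions of the answer to the original question.

```python
import torch

def span_vector(hidden, i, j):
    # h_{i,j} = [h_i ; h_j]: concatenation of the vectors at the starting
    # position i and the ending position j.
    return torch.cat([hidden[:, i, :], hidden[:, j, :]], dim=-1)

def encode_spans(model, first, second, third, i, j):
    _, _, h_first = model(first["input_ids"], first["attention_mask"])     # original + text
    _, _, h_second = model(second["input_ids"], second["attention_mask"])  # positive + text
    _, _, h_third = model(third["input_ids"], third["attention_mask"])     # negative + text
    return (span_vector(h_first, i, j),    # first vector representation
            span_vector(h_second, i, j),   # second vector representation
            span_vector(h_third, i, j))    # third vector representation
```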
Next, the loss function value $L_{spanCL}$ of the contrastive learning process can be determined, wherein:

$$L_{spanCL} = -\log \frac{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right)}{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right) + \exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{n}_{i,j}\right)/\tau\right)}$$

wherein: log is the logarithm with the natural constant e as its base; exp is the exponential function with the natural constant e as its base; $h^{o}_{i,j}$ is the concatenation of the vector $h^{o}_{i}$ at the starting position i of the answer to the original question and the vector $h^{o}_{j}$ at the ending position j of the answer to the original question; $h^{p}_{i,j}$ is the concatenation of the vector $h^{p}_{i}$ at the starting position i of the answer to the positive question and the vector $h^{p}_{j}$ at the ending position j of the answer to the positive question; $h^{n}_{i,j}$ is the concatenation of the vector $h^{n}_{i}$ at the starting position i of the answer to the negative question and the vector $h^{n}_{j}$ at the ending position j of the answer to the negative question; $\tau$ is a predetermined scaling factor, typically greater than 0; and $\mathrm{sim}(\cdot,\cdot)$ is a similarity function. For example, $\mathrm{sim}(h^{o}_{i,j}, h^{p}_{i,j})$ may be the dot product of $h^{o}_{i,j}$ and $h^{p}_{i,j}$, which characterizes the similarity between them.
$L_{spanCL}$ can be regarded as the contrastive loss function value of the PLM; the model parameters of the PLM are then configured through back-propagation and a stochastic gradient descent algorithm so that the contrastive loss function value $L_{spanCL}$ is below a preset threshold. Alternatively, $L_{spanCL}$ may be combined with the span-prediction loss function value $L_{span}$ of the PLM in a weighted manner to determine the overall loss function value $L$ of the PLM, e.g., $L = \lambda_1 L_{span} + \lambda_2 L_{spanCL}$, where $\lambda_1$ and $\lambda_2$ are predetermined weight coefficients; the model parameters of the PLM are then configured through back-propagation and a stochastic gradient descent algorithm so that the overall loss function value is below a preset threshold.
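A minimal PyTorch sketch of $L_{spanCL}$ and the weighted overall loss follows. Cosine similarity is used here as an illustrative choice for sim(·,·) (the description also mentions the dot product), and the τ, λ1, λ2 values are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def span_cl_loss(h_orig, h_pos, h_neg, tau=0.1):
    sim_pos = F.cosine_similarity(h_orig, h_pos, dim=-1) / tau  # first similarity
    sim_neg = F.cosine_similarity(h_orig, h_neg, dim=-1) / tau  # second similarity
    # -log( exp(sim_pos) / (exp(sim_pos) + exp(sim_neg)) ): the loss falls as the
    # positive pair grows more similar and rises as the negative pair does.
    return (torch.logsumexp(torch.stack([sim_pos, sim_neg]), dim=0) - sim_pos).mean()

def total_loss(l_span, l_span_cl, lam1=1.0, lam2=0.1):
    # L = lam1 * L_span + lam2 * L_spanCL with predetermined weight coefficients.
    return lam1 * l_span + lam2 * l_span_cl
```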
Step 104: and determining the pre-training language model completing the training process as a machine reading understanding model.
Based on the machine reading understanding model obtained by the training process, various machine reading understanding tasks can be executed.
Fig. 4 is an exemplary flowchart of a machine reading understanding method based on a machine reading understanding model according to an embodiment of the present invention.
As shown in fig. 4, the machine reading understanding method based on the machine reading understanding model includes:
step 401: receiving a test question and a test text;
step 402: and inputting the test question and the test text into a machine reading understanding model, wherein the machine reading understanding model is trained according to the training method of the machine reading understanding model. For example, question and test text are stitched together and entered into the machine-read understanding model.
Step 403: receive the answer to the test question from the machine reading understanding model, where the answer to the test question is determined based on the test text. The machine reading understanding model obtains the answer to the test question (referred to as the test answer) by predicting the starting and ending positions of the answer in the test text.
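A minimal inference sketch for steps 401 to 403 follows, reusing the BaselineMRC and tokenizer sketches above; greedy argmax decoding of the span is an implementation assumption.

```python
import torch

@torch.no_grad()
def answer_test_question(model, tokenizer, test_question, test_text):
    enc = tokenizer(test_question, test_text, return_tensors="pt", truncation=True)
    start_logits, end_logits, _ = model(enc["input_ids"], enc["attention_mask"])
    i = int(start_logits.argmax(dim=-1))  # predicted starting position
    j = int(end_logits.argmax(dim=-1))    # predicted ending position
    # Decode the tokens between the predicted start and end as the test answer.
    return tokenizer.decode(enc["input_ids"][0, i : j + 1], skip_special_tokens=True)
```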
FIG. 5 compares the performance of machine reading understanding models according to embodiments of the present invention. Six models (BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, ALBERT-base, and ALBERT-large) are selected as baseline models. With the answer-span contrastive learning (spanCL) provided by the embodiment of the present invention, model performance generally improves by 0.86 to 2.14 points on the exact match (EM) metric and by 0.76 to 2.0 points on the fuzzy match (F1) metric.
Fig. 6 is an exemplary block diagram of a machine-readable understanding model training apparatus according to an embodiment of the present invention. The training apparatus 500 for machine-reading understanding models includes:
a first determining module 501, configured to determine an original question, a positive question, a negative question, and a text containing an answer to the original question, where the positive question and the original question have the same answer, and the negative question is a question that cannot be answered based on the text;
a second determining module 502, configured to determine a training sample, where the training sample includes a first concatenation result of an original question and a text, a second concatenation result of a positive question and the text, a third concatenation result of a negative question and the text, and a position of an answer to the original question in the text;
a training module 503, configured to input the training sample into the pre-training language model to execute the training process of the pre-training language model, where the training process includes: inputting the position, the first concatenation result, the second concatenation result, and the third concatenation result into the pre-training language model, respectively, so as to determine, by the pre-training language model, a first vector representation of the answer to the original question from the vector representation of the first concatenation result based on the position, a second vector representation of the answer to the positive question from the vector representation of the second concatenation result based on the position, and a third vector representation of the answer to the negative question from the vector representation of the third concatenation result based on the position; determining a first similarity between the second vector representation and the first vector representation, and a second similarity between the third vector representation and the first vector representation; determining a loss function value of the pre-training language model based on the first similarity and the second similarity; and configuring model parameters of the pre-training language model so that the loss function value is lower than a preset threshold;
and a third determining module 504, configured to determine the pre-training language model completing the training process as the machine reading understanding model.
In an exemplary embodiment, the first determining module 501 is configured to perform at least one of the following: determining a back-translation result of the original question as the positive question; and determining a back-translation result of the original question that has passed a translation-accuracy check as the positive question.
In an exemplary embodiment, the first determining module 501 is configured to perform at least one of the following: replacing an entity contained in the original question with a different entity and determining the resulting question as the negative question; and determining the result of introducing a negation word into the original question as the negative question.
In an exemplary embodiment, the training module 503 is configured such that the loss function value decreases as the first similarity increases and increases as the second similarity increases.
In an exemplary embodiment, the training module 503 is configured to determine the loss function value $L_{spanCL}$, wherein

$$L_{spanCL} = -\log \frac{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right)}{\exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{p}_{i,j}\right)/\tau\right) + \exp\left(\mathrm{sim}\left(h^{o}_{i,j},\, h^{n}_{i,j}\right)/\tau\right)}$$

wherein: log is the logarithm with the natural constant e as its base; exp is the exponential function with the natural constant e as its base; $h^{o}_{i,j}$ is the concatenation of the vector $h^{o}_{i}$ at the starting position i of the answer to the original question and the vector $h^{o}_{j}$ at the ending position j of the answer to the original question; $h^{p}_{i,j}$ is the concatenation of the vector $h^{p}_{i}$ at the starting position i of the answer to the positive question and the vector $h^{p}_{j}$ at the ending position j of the answer to the positive question; $h^{n}_{i,j}$ is the concatenation of the vector $h^{n}_{i}$ at the starting position i of the answer to the negative question and the vector $h^{n}_{j}$ at the ending position j of the answer to the negative question; $\tau$ is a predetermined scaling factor; and $\mathrm{sim}(\cdot,\cdot)$ is a similarity function.
Fig. 7 is an exemplary block diagram of a machine-readable understanding model apparatus according to an embodiment of the present invention. The machine reading understanding model apparatus 600 includes:
a first receiving module 601, configured to receive a test question and a test text;
an input module 602, configured to input the test question and the test text into a machine reading understanding model, where the machine reading understanding model is trained according to the training method of the machine reading understanding model as described in any one of the above;
a second receiving module 603 configured to receive an answer to the test question from the machine-reading understanding model, wherein the answer to the test question is determined based on the test text.
Embodiments of the present invention also provide a computer-readable medium storing instructions that, when executed by a processor, perform the steps of the above training method of the machine reading understanding model or of the machine reading understanding method. In practical applications, the computer-readable medium may be included in the device/apparatus/system described in the above embodiments, or may exist alone without being assembled into that device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the training method of the machine reading understanding model or the machine reading understanding method described in the embodiments. According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing; this list does not limit the scope of the invention. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
As shown in fig. 8, an embodiment of the present invention further provides an electronic device in which an apparatus implementing the machine reading understanding method or the training method of the machine reading understanding model may be integrated. Fig. 8 is a block diagram showing an exemplary configuration of an electronic device according to an embodiment of the present invention. Specifically, the electronic device may include a processor 701 with one or more processing cores, a memory 702 comprising one or more computer-readable storage media, and a computer program stored in the memory and executable on the processor. The above training method of the machine reading understanding model or the machine reading understanding method may be implemented when the program in the memory 702 is executed.
In practical applications, the electronic device may further include a power supply 703, an input unit 704, an output unit 705, and the like. Those skilled in the art will appreciate that the configuration of the electronic device shown in fig. 8 is not intended to be limiting of the electronic device and may include more or fewer components than shown, or some components in combination, or a different arrangement of components. Wherein: the processor 701 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the server and processes data by operating or executing software programs and/or modules stored in the memory 702 and calling data stored in the memory 702, thereby integrally monitoring the electronic device. The memory 702 may be used to store software programs and modules, i.e., the computer-readable storage media described above. The processor 701 executes various functional applications and data processing by executing software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 702 may also include a memory controller to provide the processor 701 with access to the memory 702.
The electronic device further includes a power source 703 for supplying power to each component, and the power source 703 can be logically connected to the processor 701 through a power management system, so that functions of managing charging, discharging, power consumption management, and the like can be realized through the power management system. The power supply 703 may also include any component including one or more of a dc or ac power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device may also include an input unit 704, and the input unit 704 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. The electronic device may further include an output unit 705, which output unit 705 may be used to display information input by or provided to a user, as well as various graphical user interfaces, which may be made up of graphics, text, icons, video, and any combination thereof.
Embodiments of the present invention also provide a computer program product, which includes computer instructions, when executed by a processor, for implementing the training method of the machine reading understanding model or the machine reading understanding method according to any one of the above embodiments. The flowchart and block diagrams in the figures of the present invention illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The principles and embodiments of the present invention have been described herein using specific examples, which are presented only to aid in understanding the method and its core ideas, not to limit the present invention. It will be appreciated by those skilled in the art that changes may be made to these embodiments and their applications without departing from the principles, spirit, and scope of the invention, and any such modifications, equivalents, and improvements are intended to fall within the scope of the invention.

Claims (10)

1. A training method of a machine reading understanding model is characterized by comprising the following steps:
determining an original question, a positive question, a negative question and a text containing an answer to the original question, wherein the positive question and the original question have the same answer, and the negative question is a question which cannot be answered based on the text;
determining a training sample, wherein the training sample comprises a first concatenation result of the original question and the text, a second concatenation result of the positive question and the text, a third concatenation result of the negative question and the text, and the position in the text of the answer to the original question;
inputting the training samples into a pre-training language model to perform a training process of the pre-training language model, the training process comprising: inputting the location, the first concatenation result, the second concatenation result, and the third concatenation result into the pre-trained language model, respectively, to determine, by the pre-trained language model, a first vector representation of an answer to the original question from a vector representation of the first concatenation result based on the location, a second vector representation of an answer to the positive question from a vector representation of the second concatenation result based on the location, and a third vector representation of an answer to the negative question from a vector representation of the third concatenation result based on the location; determining a first similarity of the second vector representation to the first vector representation and a second similarity of the third vector representation to the first vector representation; determining a loss function value of the pre-training language model based on the first similarity and the second similarity; configuring model parameters of the pre-training language model so that the loss function value is lower than a preset threshold value;
and determining the pre-training language model completing the training process as a machine reading understanding model.
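Illustrative sketch (not part of the claims): the training step recited in claim 1 can be approximated with a BERT-style encoder. The checkpoint name, the example question/text/positions, and the use of cosine similarity are assumptions for the sketch, not features taken from the patent.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Assumed encoder checkpoint; the patent does not name a specific pre-training language model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def span_vector(question: str, text: str, start: int, end: int) -> torch.Tensor:
    """Encode the question/text concatenation and return the [start; end]
    concatenation of the answer-span token vectors, as recited in claim 1."""
    enc = tokenizer(question, text, return_tensors="pt", truncation=True)
    hidden = encoder(**enc).last_hidden_state[0]             # (seq_len, H)
    return torch.cat([hidden[start], hidden[end]], dim=-1)   # (2H,)

# Hypothetical training example; (i, j) is the annotated position of the
# answer span in the concatenated sequence, reused for all three inputs.
text = "该小区建于2010年，绿化率为30%。"
original_q, positive_q, negative_q = "小区绿化率是多少", "小区绿化率为多少", "小区容积率是多少"
i, j = 22, 24  # assumed token indices of the answer span "30%" in the concatenation

h_orig = span_vector(original_q, text, i, j)  # first vector representation
h_pos = span_vector(positive_q, text, i, j)   # second vector representation
h_neg = span_vector(negative_q, text, i, j)   # third vector representation

sim_pos = F.cosine_similarity(h_orig, h_pos, dim=0)  # first similarity
sim_neg = F.cosine_similarity(h_orig, h_neg, dim=0)  # second similarity
# These similarities feed the contrastive loss; see the sketch after claim 4.
```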
2. The method of claim 1, wherein determining the positive question comprises at least one of:
determining a back-translation result of terms of the original question as the positive question; and
determining a back-translation (round-trip translation) result of the whole original question as the positive question;
and wherein determining the negative question comprises at least one of:
determining a result of replacing an entity contained in the original question with a different entity as the negative question; and
determining a result of introducing a negation word into the original question as the negative question.
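For illustration only: the question-construction strategies of claim 2 could look like the following sketch. The back_translate stub is hypothetical (the patent does not name a translation service), and the negation rule is a deliberately naive assumption.

```python
def back_translate(question: str, pivot: str = "en") -> str:
    """Hypothetical stub for round-trip translation; a real system would
    call a machine-translation service and translate back again."""
    return question  # placeholder: returns the input unchanged

def make_positive(original_q: str) -> str:
    # Claim 2: a back-translation result keeps the original answer,
    # so it can serve as the positive question.
    return back_translate(original_q)

def make_negatives(original_q: str, entity: str, other_entity: str) -> list[str]:
    # Claim 2: entity replacement or negation makes the question
    # unanswerable from the original text.
    return [
        original_q.replace(entity, other_entity),  # replace the entity with a different one
        original_q.replace("是", "不是", 1),        # naive negation insertion (assumed rule)
    ]

print(make_positive("小区绿化率是多少"))
print(make_negatives("小区绿化率是多少", "绿化率", "容积率"))
```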
3. The method of claim 1, wherein determining the loss function value for the pre-trained language model based on the first similarity and the second similarity comprises:
determining the loss function value such that the loss function value decreases as the first similarity increases and increases as the second similarity increases.
4. The method of claim 3, further comprising:
determining the loss function value L_spanCL as:

L_spanCL = -log( exp(sim(h^o, h^+) / τ) / ( exp(sim(h^o, h^+) / τ) + exp(sim(h^o, h^-) / τ) ) )

wherein: log is the logarithmic function with the natural constant e as its base; exp is the exponential function with the natural constant e as its base; h^o = [h_i^o ; h_j^o] is the concatenation of the vector h_i^o at the starting position i of the answer to the original question and the vector h_j^o at the ending position j of the answer to the original question; h^+ = [h_i^+ ; h_j^+] is the concatenation of the vector h_i^+ at the starting position i of the answer to the positive question and the vector h_j^+ at the ending position j of the answer to the positive question; h^- = [h_i^- ; h_j^-] is the concatenation of the vector h_i^- at the starting position i of the answer to the negative question and the vector h_j^- at the ending position j of the answer to the negative question; τ is a predetermined scaling factor; and sim(·, ·) is a similarity calculation function.
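A minimal PyTorch sketch of the loss in claim 4, assuming cosine similarity for sim(·, ·) and a single negative per sample; both choices are assumptions, since the claim leaves the similarity function and the number of negatives open.

```python
import torch
import torch.nn.functional as F

def span_contrastive_loss(h_orig: torch.Tensor,
                          h_pos: torch.Tensor,
                          h_neg: torch.Tensor,
                          tau: float = 0.1) -> torch.Tensor:
    """L_spanCL over a batch of (B, 2H) concatenated [start; end] span vectors."""
    sim_pos = F.cosine_similarity(h_orig, h_pos, dim=-1) / tau  # first similarity
    sim_neg = F.cosine_similarity(h_orig, h_neg, dim=-1) / tau  # second similarity
    # -log( exp(sim_pos) / (exp(sim_pos) + exp(sim_neg)) ), averaged over the batch:
    logits = torch.stack([sim_pos, sim_neg], dim=-1)            # (B, 2)
    target = torch.zeros(h_orig.size(0), dtype=torch.long)      # positive is class 0
    return F.cross_entropy(logits, target)
```

As claim 3 requires, this value falls as the first similarity grows and rises as the second similarity grows.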
5. A machine reading understanding method, comprising:
receiving a test question and a test text;
inputting the test question and the test text into a machine reading understanding model, wherein the machine reading understanding model is trained according to the training method of the machine reading understanding model of any one of claims 1-4;
receiving an answer to the test question from the machine reading understanding model, wherein the answer to the test question is determined based on the test text.
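Hedged illustration of the reading method in claim 5, using an off-the-shelf extractive-QA pipeline as a stand-in for the trained machine reading understanding model; the checkpoint name is an assumption, not the patent's model.

```python
from transformers import pipeline

# Stand-in for the model trained per claims 1-4; any extractive-QA
# checkpoint demonstrates the receive-question/text -> return-answer flow.
qa = pipeline("question-answering", model="uer/roberta-base-chinese-extractive-qa")

result = qa(question="该小区的绿化率是多少？",           # test question
            context="该小区建于2010年，绿化率为30%。")  # test text
print(result["answer"], result["score"])  # answer span extracted from the test text
```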
6. A training device for machine reading understanding models, comprising:
a first determination module, configured to determine an original question, a positive question, a negative question, and a text containing an answer to the original question, where the positive question and the original question have the same answer, and the negative question is a question that cannot be answered based on the text;
a second determining module, configured to determine a training sample, where the training sample includes a first concatenation result of the original question and the text, a second concatenation result of the positive question and the text, a third concatenation result of the negative question and the text, and a position of an answer to the original question in the text;
a training module, configured to input the training sample into a pre-training language model to execute a training process of the pre-training language model, where the training process comprises: inputting the position, the first concatenation result, the second concatenation result, and the third concatenation result into the pre-training language model, respectively, to determine, by the pre-training language model, a first vector representation of the answer to the original question from a vector representation of the first concatenation result based on the position, a second vector representation of an answer to the positive question from a vector representation of the second concatenation result based on the position, and a third vector representation of an answer to the negative question from a vector representation of the third concatenation result based on the position; determining a first similarity of the second vector representation to the first vector representation and a second similarity of the third vector representation to the first vector representation; determining a loss function value of the pre-training language model based on the first similarity and the second similarity; and configuring model parameters of the pre-training language model so that the loss function value is lower than a preset threshold;
and the third determining module is used for determining the pre-training language model completing the training process as a machine reading understanding model.
7. A machine reading understanding apparatus, comprising:
the first receiving module is used for receiving the test questions and the test texts;
an input module, configured to input the test question and the test text into the machine reading understanding model, wherein the machine reading understanding model is trained according to the training method of the machine reading understanding model of any one of claims 1 to 4;
a second receiving module, configured to receive an answer to the test question from the machine reading understanding model, wherein the answer to the test question is determined based on the test text.
8. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing instructions executable by the processor;
the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the training method of the machine reading understanding model according to any one of claims 1 to 4 or the machine reading understanding method according to claim 5.
9. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the method of training a machine reading understanding model of any one of claims 1-4 or the method of machine reading understanding of claim 5.
10. A computer program product comprising computer instructions which, when executed by a processor, implement the method of training of a machine reading understanding model of any of claims 1 to 4 or the method of machine reading understanding of claim 5.
CN202210990226.5A 2022-08-18 2022-08-18 Machine reading understanding method, model training method, device and medium Pending CN115544224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210990226.5A CN115544224A (en) 2022-08-18 2022-08-18 Machine reading understanding method, model training method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210990226.5A CN115544224A (en) 2022-08-18 2022-08-18 Machine reading understanding method, model training method, device and medium

Publications (1)

Publication Number Publication Date
CN115544224A true CN115544224A (en) 2022-12-30

Family

ID=84726754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210990226.5A Pending CN115544224A (en) 2022-08-18 2022-08-18 Machine reading understanding method, model training method, device and medium

Country Status (1)

Country Link
CN (1) CN115544224A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117149989A (en) * 2023-11-01 2023-12-01 腾讯科技(深圳)有限公司 Training method for large language model, text processing method and device
CN117149989B (en) * 2023-11-01 2024-02-09 腾讯科技(深圳)有限公司 Training method for large language model, text processing method and device

Similar Documents

Publication Publication Date Title
Uc-Cetina et al. Survey on reinforcement learning for language processing
CN110442718B (en) Statement processing method and device, server and storage medium
JP6894058B2 (en) Hazardous address identification methods, computer-readable storage media, and electronic devices
CN111309889A (en) Method and device for text processing
CN107622709A (en) Knowledge point Grasping level evaluation method, medium and electronic equipment
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN110781682B (en) Named entity recognition model training method, recognition method, device and electronic equipment
CN110276023A (en) POI changes event discovery method, apparatus, calculates equipment and medium
CN107657560A (en) Knowledge point intensive training method, medium and electronic equipment
US8682677B2 (en) System and method for automatically generating a dialog manager
EP3563302A1 (en) Processing sequential data using recurrent neural networks
CN116737908A (en) Knowledge question-answering method, device, equipment and storage medium
CN115544224A (en) Machine reading understanding method, model training method, device and medium
CN111667728A (en) Voice post-processing module training method and device
CN110471835A (en) A kind of similarity detection method and system based on power information system code file
CN112307048A (en) Semantic matching model training method, matching device, equipment and storage medium
CN115827838A (en) Dialog generation method and system based on story continuous writing and dynamic knowledge base
Singla et al. From {Solution Synthesis} to {Student Attempt Synthesis} for block-based visual programming tasks
CN112699046B (en) Application program testing method and device, electronic equipment and storage medium
CN111158472A (en) Simulation situation construction method and system of virtual experiment
CN111931503B (en) Information extraction method and device, equipment and computer readable storage medium
CN113238952B (en) Intelligent auxiliary guide test method and device based on application program state transition diagram
CN113836005A (en) Virtual user generation method and device, electronic equipment and storage medium
CN111553173B (en) Natural language generation training method and device
CN111240787A (en) Interactive help method and system based on real scene semantic understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination