CN117391198A - Method, device, equipment and storage medium for generating reading understanding - Google Patents


Info

Publication number: CN117391198A
Application number: CN202311268546.0A
Authority: CN (China)
Legal status: Pending
Prior art keywords: information, reading understanding, understanding model, generated, model
Other languages: Chinese (zh)
Inventors: 关玉洋, 邢启洲, 李健, 陈明, 武卫东
Current Assignee: Beijing Sinovoice Technology Co Ltd
Original Assignee: Beijing Sinovoice Technology Co Ltd

Classifications

    • G06N 5/041 Abduction — under G (Physics) › G06 (Computing; calculating or counting) › G06N (Computing arrangements based on specific computational models) › G06N 5/00 (Computing arrangements using knowledge-based models) › G06N 5/04 (Inference or reasoning models)
    • G06F 18/21326 Rendering the within-class scatter matrix non-singular involving optimisations, e.g. using regularisation techniques — under G06F (Electric digital data processing) › G06F 18/00 (Pattern recognition) › G06F 18/20 (Analysing) › G06F 18/21 (Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation) › G06F 18/213 (Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods) › G06F 18/2132 (Feature extraction based on discrimination criteria, e.g. discriminant analysis) › G06F 18/21322 (Rendering the within-class scatter matrix non-singular)
    • G06N 20/00 Machine learning — under G06N (Computing arrangements based on specific computational models)
    • G06N 5/025 Extracting rules from data — under G06N 5/02 (Knowledge representation; symbolic representation) › G06N 5/022 (Knowledge engineering; knowledge acquisition)


Abstract

The embodiment of the invention provides a method, a device, equipment and a storage medium for generative reading understanding, wherein the method comprises the following steps: acquiring reference content and a target question, and acquiring a generated reading understanding model, the generated reading understanding model being obtained by performing reinforcement learning on an initial reading understanding model with an objective function containing a reward for correct information extraction and a regular term; and obtaining a target answer according to the reference content, the target question and the generated reading understanding model. Because the initial reading understanding model is reinforcement-trained with this objective function as the training target, the ability of the reading understanding model to extract complex information and to extract information correctly is improved; the reward for correct information extraction reduces the complexity and cost of data annotation while improving the model in this respect, and the regular term ensures the effect of the model on data outside the optimized data, thereby realizing further reinforcement learning of the initial reading understanding model.

Description

Method, device, equipment and storage medium for generating reading understanding
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a generative reading and understanding method, a generative reading and understanding device, a corresponding electronic device, and a corresponding computer readable storage medium.
Background
Generative reading understanding refers to the task of, given an article and a question, extracting relevant content from the article and generating an answer. It can be realized mainly on the basis of reinforcement learning, where reinforcement learning refers to a training method for machine learning models.
In the related art that applies reinforcement learning to generative reading understanding, the goal of reinforcement learning training is generally to make the text similarity between the generated answer and a reference answer as high as possible, so as to achieve the task purpose. However, when similarity is used as the target, a complete reference answer must be labeled for each question in order to compute the similarity, which is unfavorable for extracting complex information.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention have been developed to provide a generative reading and understanding method, a generative reading and understanding device, a corresponding electronic device, and a corresponding computer-readable storage medium that overcome or at least partially solve the foregoing problems.
The embodiment of the invention discloses a generating type reading and understanding method, which comprises the following steps:
acquiring reference content and a target question, and acquiring a generated reading understanding model; the generated reading understanding model is obtained by performing reinforcement learning on the initial reading understanding model by adopting an objective function containing a reward for correct information extraction and a regular term;
and obtaining a target answer according to the reference content, the target question and the generated reading understanding model.
Optionally, the obtaining the target answer according to the reference content, the target question and the generated reading understanding model includes:
inputting the reference content and the target question into the generated reading understanding model as input items of the generated reading understanding model;
and outputting a target answer based on the input item through the generated reading understanding model.
Optionally, before the obtaining the generated reading understanding model, the method further includes:
acquiring an initial reading understanding model; the initial reading understanding model is a model for training based on the reference content sample data and the question sample data to obtain generated answer sample data;
And obtaining an objective function, and performing reinforcement learning training on the initial reading understanding model based on the objective function to obtain the generated reading understanding model.
Optionally, the acquiring the objective function includes:
acquiring a loss function, acquiring a regular term for the initial reading understanding model, and acquiring a reward for correct information extraction;
adding the regular term for the initial reading understanding model into the loss function;
and constructing the objective function by adopting the loss function with the added regular term and the reward for correct information extraction.
Optionally, obtaining the correct rewards for information extraction includes:
acquiring generated answer sample data of the initial reading understanding model aiming at reference question sample data and reference answer sample data of the reference question sample data;
performing numerical labeling on the reference answer sample data, and determining the information extraction accuracy of the generated answer sample data based on the numerical labeled reference answer sample data;
based on the degree of correctness of the information extraction, a correct reward for information extraction is determined.
Optionally, the generated answer sample data includes at least one piece of information; the determining the information extraction accuracy degree of the generated answer sample data based on the reference answer sample data after the numerical value labeling comprises the following steps:
If the marked numerical value exists in the at least one piece of information, determining that the generated information corresponding to the marked numerical value in the at least one piece of information is correct information;
and/or if the marked numerical value does not exist in the at least one piece of information, determining that the generated information which does not correspond to the marked numerical value in the at least one piece of information is error information;
and obtaining information extraction accuracy degree based on the correct information and the number of the error information.
Optionally, the degree of correctness is obtained based on the numbers of the correct information and the error information; the determining, based on the degree of correctness of information extraction, a reward for correct information extraction includes:
obtaining a target reward score for the correct information by adopting a preset reward score and the number of the correct information;
obtaining a target penalty score for the error information by adopting a preset penalty score and the number of the error information;
and superposing the target reward score and the target penalty score to obtain the reward for correct information extraction.
Optionally, the performing reinforcement learning training on the initial reading understanding model based on the objective function to obtain the generated reading understanding model includes:
And training the initial reading understanding model to maximize the objective function and obtain the generated reading understanding model.
The embodiment of the invention also discloses a generating type reading and understanding device, which comprises:
the reading understanding model acquisition module is used for acquiring the reference content and the target question and acquiring a generated reading understanding model; the generated reading understanding model is obtained by performing reinforcement learning on the initial reading understanding model by adopting an objective function containing a reward for correct information extraction and a regular term;
and the answer generation module is used for obtaining a target answer according to the reference content, the target question and the generated reading and understanding model.
Optionally, the answer generation module includes:
the answer generation sub-module is used for taking the reference content and the target questions as input items of the generated reading understanding model and inputting the reference content and the target questions into the generated reading understanding model; and outputting a target answer based on the input item through the generated reading understanding model.
Optionally, before the obtaining the generated reading understanding model, the apparatus further includes:
and the reading understanding model generation module is used for performing reinforcement learning on the initial reading understanding model by adopting an objective function containing a reward for correct information extraction and a regular term, to obtain the generated reading understanding model.
Optionally, the reading understanding model generating module includes:
the reading understanding model generation sub-module is used for acquiring an initial reading understanding model; the initial reading understanding model is a model for training based on the reference content sample data and the question sample data to obtain generated answer sample data; and obtaining an objective function, and performing reinforcement learning training on the initial reading understanding model based on the objective function to obtain the generated reading understanding model.
Optionally, the reading understanding model generating submodule includes:
the objective function acquisition unit is used for acquiring a loss function, acquiring a regular term aiming at the initial reading understanding model and extracting correct rewards aiming at information; adding a regular term of the initial reading understanding model into a loss function; and constructing an objective function by adopting a loss function after adding the regular term and extracting the correct rewards aiming at the information.
Optionally, the objective function obtaining unit includes:
a reward acquiring unit configured to acquire generated answer sample data of the initial reading understanding model for reference question sample data, and reference answer sample data of the reference question sample data; performing numerical labeling on the reference answer sample data, and determining the information extraction accuracy of the generated answer sample data based on the numerical labeled reference answer sample data; based on the degree of correctness of the information extraction, a correct reward for information extraction is determined.
Optionally, the generated answer sample data includes at least one piece of information; the bonus acquisition unit includes:
a reward obtaining subunit, configured to: if a marked numerical value exists in the at least one piece of information, determine that the generated information corresponding to the marked numerical value in the at least one piece of information is correct information; and/or, if a marked numerical value does not exist in the at least one piece of information, determine that the generated information which does not correspond to a marked numerical value in the at least one piece of information is error information; and obtain the degree of correctness of information extraction based on the numbers of the correct information and the error information. Where the degree of correctness is obtained based on the numbers of the correct information and the error information, the subunit is further configured to obtain a target reward score for the correct information by adopting a preset reward score and the number of the correct information; obtain a target penalty score for the error information by adopting a preset penalty score and the number of the error information; and superpose the target reward score and the target penalty score to obtain the reward for correct information extraction.
Optionally, the reading understanding model generating submodule includes:
And the reading understanding model strengthening unit is used for training the initial reading understanding model to maximize the objective function and obtaining the generated reading understanding model.
The embodiment of the invention also discloses an electronic device, which comprises: a processor, a memory, and a computer program stored on the memory and capable of running on the processor, which when executed by the processor implements any of the generative reading and understanding methods.
The embodiment of the invention also discloses a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the generating type reading and understanding method when being executed by a processor.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, a generated reading understanding model is acquired so that a target answer can be obtained, based on the reference content and the target question, by using the acquired generated reading understanding model. The generated reading understanding model is a model obtained by performing reinforcement learning on an initial reading understanding model with an objective function containing a reward for correct information extraction and a regular term; that is, the initial reading understanding model can be reinforcement-trained with this objective function as the reinforcement learning training target, so that the ability of the reading understanding model to extract complex information and to extract information correctly is improved. The reward for correct information extraction reduces the complexity and cost of data annotation and improves the model in this respect, and the regular term ensures the effect of the model on data outside the optimized data.
Drawings
FIG. 1 is a flow chart of the steps of an embodiment of a generative reading understanding method of the present invention;
FIG. 2 is a flow chart of steps of another embodiment of a generative reading understanding method of the present invention;
fig. 3 is a block diagram of an embodiment of a generative reading understanding device of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Generative reading understanding refers to the task of, given an article and a question, extracting relevant content from the article and generating an answer. It can be realized mainly on the basis of reinforcement learning, where reinforcement learning refers to a training method for machine learning models.
In the related art of generative reading understanding based on reinforcement learning, one example is a generative machine reading comprehension method based on a deep neural network and reinforcement learning. It mainly uses a generative language model together with a reinforcement learning method; the reinforcement learning reward model may use a BLEU value (Bilingual Evaluation Understudy, an index for evaluating the quality of a machine translation result) or a ROUGE value (Recall-Oriented Understudy for Gisting Evaluation, an index for evaluating the quality of a text abstract or other natural language processing task). The value is obtained mainly by comparing the generated result with the reference answer, and a higher value indicates that the generated result and the reference answer are more similar in their textual features. This value is then used as the objective function, and the model is trained so that the BLEU or ROUGE value between the model's generated result and the reference answer becomes higher, thereby completing model training. Another example is a keyword question-answering method based on a language model and reinforcement learning. Keywords are selected by the model, and new keywords are added to the search sentence to rewrite it so that the question-answering effect is better; a reward function is then established by comparing the similarity between the answer result and the reference answer, and the rewriting model is trained with reinforcement learning so that the answers to the rewritten questions become more similar to the reference answers, thereby completing model training.
From the above, the related technologies of generating reading understanding based on reinforcement learning all achieve model training by making the text similarity between the generated answer and the reference answer higher as the reinforcement learning training target to achieve the task purpose.
However, the effect of a model trained with similarity as the target is generally lower than that of a model trained with probability or entropy as the target; that is, training a generative model with similarity as the target is not an optimal method. Moreover, using similarity as the training target depends on training data that is difficult to label: a complete reference answer must be labeled for each question so that the similarity between the generated answer and the reference answer can be computed, which means labeling is difficult and costly and the correctness and sufficiency of the data cannot be guaranteed. As a result, generative reading understanding models of different scales perform poorly on question-answering tasks when complex information is encountered. That is, when the reference content provided to the model is complex, the effect of the model is affected. For example, the reference content may contain several similar pieces of information, such as the same statistic for different years; when the question asks for the statistic of a specific year, the model may associate the wrong year, because the reference content contains several similar pieces of information, because the combinations of statistic types and years are too rich for labeled reference answers to cover all possible cases, and because the model is more sensitive to the name of the statistic and relatively insensitive to the year information. For another example, with the same reference content, when the question asks for the statistics of all years, the model may miss the data of some years.
That is, although reinforcement learning is also used as a training method in the related art, its use of data is no different from that of a model trained with entropy: a complete reference answer must be prepared for each question, and the difficulty of data preparation is high.
In order to optimize such specific problems of a model on question-answering or reading understanding tasks, the embodiment of the present invention focuses on a reinforcement learning method for a reading understanding model obtained through training in the related art (referred to as the initial reading understanding model), that is, a further model training process performed on the basis of the initial reading understanding model. Specifically, the initial reading understanding model is reinforcement-trained with an objective function containing a reward for correct information extraction and a regular term as the reinforcement learning training target, so that the ability of the reading understanding model to extract complex information and to extract information correctly is improved. The reward for correct information extraction reduces the complexity and cost of data annotation and improves the model in this respect, and the regular term ensures the effect of the model on data outside the optimized data, thereby realizing further reinforcement learning of the initial reading understanding model. In the process of reinforcement learning based on the reward for correct information extraction, the training data only needs to be labeled with short numerical answers rather than complete reference answers, so a complex data preparation process is avoided and the complexity and cost of data labeling are reduced.
Referring to Fig. 1, a flowchart of the steps of an embodiment of a generative reading understanding method of the present invention is shown. This embodiment focuses on the use/application process of the generated reading understanding model, and may specifically include the following steps:
Step 101, acquiring reference content and a target question, and acquiring a generated reading understanding model;
the generated reading understanding model may be a model obtained by further reinforcement learning of the initial reading understanding model. The initial reading understanding model may refer to a preliminary reading understanding model obtained, before reinforcement learning training, by training on labeled reference content, generated answers and reference answers with an autoregressive method, i.e. a model trained by adopting the prior art.
Specifically, the generated reading understanding model can mainly be obtained by performing reinforcement learning on the initial reading understanding model with an objective function containing a reward for correct information extraction and a regular term, which can specifically be expressed as performing reinforcement training on the initial reading understanding model with the objective function as the training target, so that the ability of the reading understanding model to extract complex information and to extract information correctly is improved.
In the process of performing reinforcement learning based on information extraction of correct rewards, training data only need to be marked with short numerical answers, and complete reference answers do not need to be marked, so that a complex data preparation process can be avoided, and complexity and cost of data marking are reduced; in the reinforcement learning process based on the regular term, the parameter difference between the model after training and the model before training is not excessively large based on the regular term, and the effect of the model beyond the optimized data is guaranteed.
The generated reading understanding model obtained by reinforcement learning of the initial reading understanding model can mainly be used to obtain the correct answer to a target question, namely the target answer, based on the reference content provided for the model's reference and the target question; that is, the model combines the reference content and the target question.
In one embodiment of the invention, the reference content and the target question, and the generated reading understanding model may be obtained, so as to obtain a target answer for the reference content and the target question based on the obtained generated reading understanding model.
Step 102, obtaining a target answer according to the reference content, the target question and the generated reading understanding model.
Specifically, the reference content and the target question may be input to the generative reading understanding model as input items of the generative reading understanding model, and then the output target answer is generated based on the input items, i.e., by combining the input reference content and the target question, through the generative reading understanding model.
In a specific implementation, the generative reading understanding model used is a generative language model, such as GPT (Generative Pre-trained Transformer, a large-scale pre-trained model for generating natural language text), which can predict the following text given the preceding content. In the embodiment of the invention, the reading understanding task can be designed so that the reference content and the target question are spliced according to a preset template, the spliced content is then input into the generated reading understanding model, and the model generates the answer in combination with this input item.
Illustratively, the preset template may be as follows:
"answer questions according to the following references: [ reference content ]
The problems are: [ problem of object ] "
It should be noted that, for other preset templates, embodiments of the present invention are not limited.
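As an illustration only, the splicing step described above could be sketched in Python as follows; the function name build_prompt, the English template wording and the sample values are assumptions made for this sketch, not part of the embodiment.

    def build_prompt(reference_content: str, target_question: str) -> str:
        # Splice the reference content and the target question into one input
        # item according to a preset template, as described above.
        return (
            "Answer the question according to the following reference: "
            f"{reference_content}\n"
            f"The question is: {target_question}"
        )

    # The spliced text is the input item fed to the generated reading
    # understanding model, which then generates the target answer from it.
    prompt = build_prompt(
        "In 2021 the revenue was 3.2 million; in 2022 the revenue was 4.0 million.",
        "What was the revenue in 2022?",
    )
    print(prompt)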
In practical application, the generated reading understanding model obtained after reinforcement learning may be deployed on a client. When a user has a reading understanding requirement, a corresponding instruction is generated based on the generative reading understanding operation executed by the user; after the client responds to the instruction by importing the reference content and the target question that the user wants to read and understand, the correct answer to the target question, namely the target answer, is automatically fed back to the user by using the generated reading understanding model.
In the embodiment of the invention, a generated reading understanding model is acquired so that a target answer can be obtained, based on the reference content and the target question, by using the acquired generated reading understanding model. The generated reading understanding model is a model obtained by performing reinforcement learning on an initial reading understanding model with an objective function containing a reward for correct information extraction and a regular term; that is, the initial reading understanding model can be reinforcement-trained with this objective function as the reinforcement learning training target, so that the ability of the reading understanding model to extract complex information and to extract information correctly is improved. The reward for correct information extraction reduces the complexity and cost of data annotation and improves the model in this respect, and the regular term ensures the effect of the model on data outside the optimized data.
Referring to Fig. 2, a flowchart of the steps of another embodiment of a generative reading understanding method of the present invention is shown. This embodiment focuses on the process of generating/training the generated reading understanding model, and may specifically include the following steps:
step 201, obtaining an initial reading understanding model;
the process of generating/training the generated reading understanding model can be mainly represented as a reinforcement learning process of the initial reading understanding model.
The initial reading understanding model is the reading understanding model before reinforcement learning training, and may mainly refer to a preliminary reading understanding model obtained, before reinforcement learning training, by labeling reference content, generated answers and reference answers and training with an autoregressive method, i.e. a model trained by adopting the prior art.
Specifically, the initial reading understanding model obtained based on training in the related art can be trained based on the reference content sample data and the question sample data to obtain generated answer sample data. It should be noted that the generated answer sample data may be used to determine that the information is extracted correctly for the reinforcement learning.
Step 202, obtaining an objective function;
the objective function can be mainly used as an optimization target/training target of model training, and in the embodiment of the invention, the objective function can be used as an optimization target of the initial reading and understanding model so as to complete the reinforcement learning process of the initial reading and understanding model.
In one embodiment of the invention, the obtained objective function can contain information extraction correct rewards and regular terms, wherein reinforcement learning performed by the information extraction correct rewards can enable training data to be only required to be marked with short numerical answers, complete reference answers are not required to be marked, complex data preparation process is avoided, and therefore complexity and cost of data marking are reduced; the regular term is a regular term aiming at the optimized data effect of the initial reading understanding model, reinforcement learning is carried out based on the regular term, the parameter difference between the model after training and the model before training can be prevented from being too large, and the effect of the model beyond the optimized data is guaranteed.
Specifically, a loss function can be obtained, a regular term for an initial reading understanding model and correct rewards for information are obtained, then the regular term for the initial reading understanding model is added into the loss function, and finally the loss function after the regular term is added and the correct rewards for the information are extracted are adopted, so that an objective function is constructed.
The reward for correct information extraction can be determined based on the degree of correctness of information extraction. The degree of correctness can mainly be determined by matching, in the generated answer, the content to be extracted from the reference answer: if the corresponding content is matched in the generated answer, the extraction can be determined to be correct; if the corresponding content is not matched in the generated answer, the extraction can be determined to be wrong.
In one embodiment of the invention, in the process of reinforcement learning based on information extraction of correct rewards, training data only need to be marked with short numerical answers for matching whether generated answers contain relevant information or not, and no complete reference answers need to be marked. That is, in the foregoing determination of the correctness/mistakes of the extraction, whether the generated answer matches the reference answer or not can be determined based on whether the marked numerical value exists in the generated answer.
In a preferred embodiment of the invention, when the model effect is further optimized with a reinforcement learning method on the basis of the preliminary reading understanding model, a certain proportion of autoregressive tasks can be added to the training so as to avoid losing the model's text generation capability. The autoregressive task can be expressed as training on the data originally used to train the model; such data training is complementary to the regular term added to the loss function, and based on this complementary relationship it is ensured that the parameter difference between the model after training and the model before training does not become excessively large.
In the reinforcement learning process, when the reward for correct information extraction is acquired, the generated answer sample data that the initial reading understanding model produces for the reference question sample data, and the reference answer sample data of the reference question sample data, can mainly be acquired. The reference answer sample data is then labeled with numerical values, the degree of correctness of information extraction of the generated answer sample data is determined based on the numerically labeled reference answer sample data, and the reward for correct information extraction is determined based on that degree of correctness.
In practical application, the generated answer sample data may include at least one piece of information. In one case, if a labeled numerical value exists in the at least one piece of information, the generated information corresponding to the labeled value can be determined to be correct information; in another case, if a labeled numerical value does not exist in the at least one piece of information, the generated information that does not correspond to a labeled value can be determined to be error information. The degree of correctness of information extraction can then be obtained from the numbers of correct information and error information. The whole reference answer does not need to be written out; it is only necessary to state whether certain numerical values are correct, and the degree of correctness is determined by judging whether those values are contained in the answer.
In a specific implementation, a target reward score for the correct information can be obtained by adopting a preset reward score and the number of the correct information, a target penalty score for the error information can be obtained by adopting a preset penalty score and the number of the error information, and the target reward score and the target penalty score are then superposed to obtain the reward for correct information extraction.
Illustratively, the embodiment of the invention designs the reward function for reinforcement learning using the degree of correctness of information extraction: each correct piece of information extracted in a generated answer is assigned a preset reward score q, with q > 0, and each erroneous piece of information extracted in a generated answer is assigned a preset penalty score w, with w < 0. If the correct answer serving as reference involves multiple pieces of information, the reward for a generated answer is obtained by superposing the positive reward or penalty of each piece of information, and the final reward can be denoted R. The preset reward score q and the preset penalty score w may be opposite numbers, which is not limited in the embodiment of the present invention.
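As a minimal sketch only, the reward described above could be computed as follows, assuming that each labeled numerical value is checked for presence in the generated answer by simple substring matching (a simplification of the correct/error-information judgement above); the function name, the matching rule and the default scores q = 1 and w = -1 are assumptions for this sketch rather than the embodiment's implementation.

    def extraction_reward(generated_answer: str, labeled_values, q: float = 1.0, w: float = -1.0) -> float:
        # q: preset reward score added for every labeled value found in the answer (q > 0)
        # w: preset penalty score added for every labeled value missing from it (w < 0)
        # The final reward R superposes the reward or penalty of every piece of information.
        found = sum(1 for v in labeled_values if str(v) in generated_answer)
        missing = len(labeled_values) - found
        return q * found + w * missing

    # Example: two annotated statistics, one of which the generated answer reproduces,
    # so the reward is 1*q + 1*w = 0.0 with the default scores.
    answer = "The revenue was 3.2 million in 2021 and 4.0 million in 2022."
    print(extraction_reward(answer, labeled_values=["3.2", "4.1"]))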
Besides the reward for correct information extraction, a regular term for the optimized-data effect of the initial reading understanding model can be added to the objective function, so that the effect of the model on other problems is not reduced. The objective function constructed from the reward for correct information extraction and the regular term may be expressed as follows:
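One plausible form of this objective (an assumption based on the quantities described in the next paragraph, since the original expression is not reproduced in this text) is a reward-weighted log-likelihood plus a regular term tying the trained model to the initial model:

    C = \alpha \, R \, \log P_{\mathrm{LLM}}(y \mid x) \; - \; \beta \, \mathrm{KL}\!\left( P_{\mathrm{LLM}}(\cdot \mid x) \,\|\, P_{\mathrm{LLM}_0}(\cdot \mid x) \right)

where P_LLM_0 stands for the initial reading understanding model before reinforcement learning.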
wherein R refers to the reward for correct information extraction; y refers to the decoding result of the model under reinforcement learning; x refers to the reference content and the target question; α and β are manually set hyperparameters; the probability term refers to the probability with which the language model (LLM, Large Language Model) predicts y given x; and the final term is the regular term representing the optimized-data effect for the initial reading understanding model.
Step 203, performing reinforcement learning training on the initial reading understanding model based on the objective function to obtain a generated reading understanding model.
Reinforcement learning of the initial reading understanding model based on the objective function mainly means training the initial reading understanding model with the goal of maximizing the objective function, so as to obtain the generated reading understanding model. That is, the training target can be set to maximize the above objective function C based on the reward, and training can use stochastic gradient ascent with optimizers such as Adam or AdamW (optimizers commonly used in model training), so as to encourage the model to generate as much correct information and as little erroneous information as possible.
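As a further illustration only, the following PyTorch sketch shows such a gradient-ascent reinforcement step on a toy one-step policy; the toy reward, the network shapes, the hyperparameter values and the KL form of the regular term are assumptions made for the sketch, not the patent's implementation.

    import torch

    vocab_size, hidden = 10, 4
    policy = torch.nn.Linear(hidden, vocab_size)        # model being reinforced
    ref_policy = torch.nn.Linear(hidden, vocab_size)    # frozen copy of the initial model
    ref_policy.load_state_dict(policy.state_dict())
    for p in ref_policy.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-2)
    alpha, beta = 1.0, 0.1                               # hand-set hyperparameters

    x = torch.randn(1, hidden)                           # stand-in for the encoded (reference, question)
    for _ in range(200):
        logits = policy(x)
        dist = torch.distributions.Categorical(logits=logits)
        y = dist.sample()                                # sampled "answer"
        reward = 1.0 if y.item() == 3 else -1.0          # stand-in for the extraction reward R

        log_p = torch.log_softmax(logits, dim=-1)
        log_p_ref = torch.log_softmax(ref_policy(x), dim=-1)
        kl = (log_p.exp() * (log_p - log_p_ref)).sum()   # regular term: stay close to the initial model

        objective = alpha * reward * dist.log_prob(y).sum() - beta * kl
        (-objective).backward()                          # gradient ascent on the objective C
        optimizer.step()
        optimizer.zero_grad()

Maximizing such an objective rewards outputs that extract the labeled information correctly, while the regular term keeps the trained model from drifting too far from the initial reading understanding model.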
In a preferred embodiment of the present invention, the reinforcement learning derived generative reading understanding model may be further used, specifically, the reference content and the target question may be acquired, and then the reference content and the target question are input as input items of the generative reading understanding model to the generative reading understanding model, and the target answer is output based on the input items through the generative reading understanding model.
In practical application, the generated reading understanding model obtained after reinforcement learning may be deployed on a client. When a user has a reading understanding requirement, a corresponding instruction is generated based on the generative reading understanding operation executed by the user; after the client responds to the instruction by importing the reference content and the target question that the user wants to read and understand, the correct answer to the target question, namely the target answer, is automatically fed back to the user by using the generated reading understanding model.
In the embodiment of the invention, the initial reading understanding model is reinforcement-trained with an objective function containing a reward for correct information extraction and a regular term as the reinforcement learning training target, so that the ability of the reading understanding model to extract complex information and to extract information correctly is improved. The reward for correct information extraction reduces the complexity and cost of data labeling and improves the model in this respect, and the regular term ensures the effect of the model on data outside the optimized data, thereby realizing further reinforcement learning of the initial reading understanding model. In the process of reinforcement learning based on the reward for correct information extraction, the training data only needs to be labeled with short numerical answers rather than complete reference answers, so a complex data preparation process is avoided and the complexity and cost of data labeling are reduced.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 3, a block diagram of an embodiment of a generating reading and understanding device of the present invention is shown, and may specifically include the following modules:
the reading understanding model acquisition module 301 is configured to acquire reference content and a target question, and acquire a generated reading understanding model; the generated reading understanding model is obtained by performing reinforcement learning on the initial reading understanding model by adopting an objective function containing a reward for correct information extraction and a regular term;
and the answer generation module 302 is configured to obtain a target answer according to the reference content and the target question, and the generated reading understanding model.
In one embodiment of the invention, answer generation module 302 may comprise the following sub-modules:
the answer generation sub-module is used for taking the reference content and the target questions as input items of the generated reading understanding model and inputting the reference content and the target questions into the generated reading understanding model; and outputting a target answer based on the input item through the generated reading understanding model.
In an embodiment of the present invention, before the obtaining the generated reading understanding model, the apparatus provided by the embodiment of the present invention may further include the following modules:
and the reading understanding model generation module is used for performing reinforcement learning on the initial reading understanding model by adopting an objective function containing a reward for correct information extraction and a regular term, to obtain the generated reading understanding model.
In one embodiment of the invention, the reading understanding model generation module may include the following sub-modules:
the reading understanding model generation sub-module is used for acquiring an initial reading understanding model; the initial reading understanding model is a model for training based on the reference content sample data and the question sample data to obtain generated answer sample data; and obtaining an objective function, and performing reinforcement learning training on the initial reading understanding model based on the objective function to obtain the generated reading understanding model.
In one embodiment of the invention, the reading understanding model generation sub-module may include the following elements:
the objective function acquisition unit is used for acquiring a loss function, acquiring a regular term aiming at the initial reading understanding model and extracting correct rewards aiming at information; adding a regular term of the initial reading understanding model into a loss function; and constructing an objective function by adopting a loss function after adding the regular term and extracting the correct rewards aiming at the information.
In one embodiment of the present invention, the objective function acquisition unit may include the following sub-units:
a reward acquiring unit configured to acquire generated answer sample data of the initial reading understanding model for reference question sample data, and reference answer sample data of the reference question sample data; performing numerical labeling on the reference answer sample data, and determining the information extraction accuracy of the generated answer sample data based on the numerical labeled reference answer sample data; based on the degree of correctness of the information extraction, a correct reward for information extraction is determined.
In one embodiment of the invention, the generated answer sample data includes at least one piece of information; the bonus acquisition unit may comprise the following sub-units:
a reward obtaining subunit, configured to: if a marked numerical value exists in the at least one piece of information, determine that the generated information corresponding to the marked numerical value in the at least one piece of information is correct information; and/or, if a marked numerical value does not exist in the at least one piece of information, determine that the generated information which does not correspond to a marked numerical value in the at least one piece of information is error information; and obtain the degree of correctness of information extraction based on the numbers of the correct information and the error information. Where the degree of correctness is obtained based on the numbers of the correct information and the error information, the subunit is further configured to obtain a target reward score for the correct information by adopting a preset reward score and the number of the correct information; obtain a target penalty score for the error information by adopting a preset penalty score and the number of the error information; and superpose the target reward score and the target penalty score to obtain the reward for correct information extraction.
In one embodiment of the invention, the reading understanding model generation sub-module may include the following elements:
and the reading understanding model strengthening unit is used for training the initial reading understanding model to maximize the objective function and obtaining the generated reading understanding model.
In the embodiment of the invention, the generative reading understanding device obtains the target answer based on the reference content and the target question by acquiring the generated reading understanding model. The generated reading understanding model is a model obtained by performing reinforcement learning on the initial reading understanding model with an objective function containing a reward for correct information extraction and a regular term; that is, the initial reading understanding model can be reinforcement-trained with the objective function as the reinforcement learning training target, so that the ability of the reading understanding model to extract complex information and to extract information correctly is improved. The reward for correct information extraction reduces the complexity and cost of data annotation and improves the model in this respect, and the regular term ensures the effect of the model on data outside the optimized data.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The embodiment of the invention also provides electronic equipment, which comprises:
The system comprises a processor, a memory and a computer program stored in the memory and capable of running on the processor, wherein the computer program realizes the processes of the generating reading and understanding method embodiment when being executed by the processor, and can achieve the same technical effects, and the repetition is avoided, so that the description is omitted.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, realizes the processes of the above-mentioned generating reading and understanding method embodiment, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing has outlined a detailed description of a generative reading understanding method, a generative reading understanding device, a corresponding electronic device, and a corresponding computer readable storage medium, wherein specific examples are provided herein to illustrate the principles and embodiments of the present invention, and the above examples are only for the purpose of aiding in the understanding of the method and core idea of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.

Claims (10)

1. A method of generating a reading understanding, the method comprising:
acquiring reference content and a target question, and acquiring a generated reading understanding model; the generated reading understanding model is obtained by performing reinforcement learning on the initial reading understanding model by adopting an objective function containing a reward for correct information extraction and a regular term;
and obtaining a target answer according to the reference content, the target question and the generated reading understanding model.
2. The method of claim 1, further comprising, prior to the acquiring of the generative reading understanding model:
acquiring an initial reading understanding model; the initial reading understanding model is a model trained on reference content sample data and question sample data to produce generated answer sample data;
and obtaining an objective function, and performing reinforcement learning training on the initial reading understanding model based on the objective function to obtain the generative reading understanding model.
3. The method of claim 2, wherein the obtaining of the objective function comprises:
acquiring a loss function, a regularization term for the initial reading understanding model, and an information-extraction correctness reward;
adding the regularization term of the initial reading understanding model to the loss function;
and constructing the objective function from the loss function with the regularization term added and the information-extraction correctness reward.
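By way of illustration, a minimal sketch of how such an objective might be assembled is given below; the function and variable names, the KL-style form of the regularization term, and the way the reward and the regularized loss are combined are assumptions, since the claim does not fix them:

    import torch

    def build_objective(lm_loss, policy_logprobs, init_logprobs, extraction_reward, beta=0.1):
        # Sketch only: policy_logprobs and init_logprobs are assumed to be torch tensors
        # of per-token log-probabilities from the trained and the initial model.
        reg_term = torch.mean(policy_logprobs - init_logprobs)   # divergence of the trained model from the initial model
        regularized_loss = lm_loss + beta * reg_term              # loss function with the regularization term added
        return extraction_reward - regularized_loss               # objective to be maximized during reinforcement learning

Regularizing toward the initial model is a common way of keeping reinforcement learning from drifting too far from the supervised starting point.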
4. The method according to claim 3, wherein obtaining the information-extraction correctness reward comprises:
acquiring generated answer sample data produced by the initial reading understanding model for reference question sample data, and reference answer sample data of the reference question sample data;
performing numerical labeling on the reference answer sample data, and determining the degree of correctness of information extraction of the generated answer sample data based on the numerically labeled reference answer sample data;
and determining the information-extraction correctness reward based on the degree of correctness of information extraction.
5. The method of claim 4, wherein the generated answer sample data includes at least one piece of information, and the determining of the degree of correctness of information extraction of the generated answer sample data based on the numerically labeled reference answer sample data comprises:
if a labeled numerical value is present in the at least one piece of information, determining that the generated information corresponding to the labeled numerical value among the at least one piece of information is correct information;
and/or, if a labeled numerical value is not present in the at least one piece of information, determining that the generated information not corresponding to any labeled numerical value among the at least one piece of information is incorrect information;
and obtaining the degree of correctness of information extraction based on the numbers of pieces of correct information and incorrect information.
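By way of illustration, a minimal sketch of this check is given below; the names and the substring-matching rule are assumptions, since the claim does not state how a labeled numerical value is detected within a piece of generated information:

    def split_generated_information(generated_infos, labeled_values):
        # Each piece of generated information is treated as correct if it contains
        # one of the numerical values labeled in the reference answer, and as
        # incorrect otherwise.
        correct, incorrect = [], []
        for info in generated_infos:
            if any(str(value) in info for value in labeled_values):
                correct.append(info)
            else:
                incorrect.append(info)
        return correct, incorrect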
6. The method according to claim 4 or 5, characterized in that the degree of correctness is obtained based on the numbers of pieces of correct information and incorrect information, and the determining of the information-extraction correctness reward based on the degree of correctness of information extraction comprises:
obtaining a target reward score for the correct information from a preset reward score and the number of pieces of correct information;
obtaining a target penalty score for the incorrect information from a preset penalty score and the number of pieces of incorrect information;
and superposing the target reward score and the target penalty score to obtain the information-extraction correctness reward.
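By way of illustration, a minimal sketch of this superposition is given below; the concrete score values and the per-piece scaling are assumptions:

    def information_extraction_reward(num_correct, num_incorrect,
                                      reward_score=1.0, penalty_score=-0.5):
        target_reward = reward_score * num_correct        # preset reward score applied per correct piece
        target_penalty = penalty_score * num_incorrect    # preset penalty score applied per incorrect piece
        return target_reward + target_penalty             # superposition of the two scores

With these illustrative defaults, three correct pieces and two incorrect pieces would yield 3 * 1.0 + 2 * (-0.5) = 2.0.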
7. The method of claim 2, wherein the reinforcement learning training of the initial reading understanding model based on the objective function to obtain the generative reading understanding model comprises:
training the initial reading understanding model to maximize the objective function, thereby obtaining the generative reading understanding model;
and the obtaining of the target answer according to the reference content, the target question and the generative reading understanding model comprises:
inputting the reference content and the target question into the generative reading understanding model as its input items;
and outputting the target answer based on the input items through the generative reading understanding model.
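By way of illustration, a minimal sketch of this inference step is given below, assuming a Hugging Face-style causal language model and an arbitrary prompt layout; neither the library nor the prompt format is specified by the patent:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    def generate_target_answer(model, tokenizer, reference_content, target_question):
        # The reference content and the target question are combined into a single
        # input, and only the newly generated tokens are returned as the answer.
        prompt = f"Reference content: {reference_content}\nQuestion: {target_question}\nAnswer:"
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=128)
        answer_ids = output_ids[0][inputs["input_ids"].shape[1]:]   # keep only newly generated tokens
        return tokenizer.decode(answer_ids, skip_special_tokens=True)

    # Usage with any causal LM checkpoint (placeholder name):
    # model = AutoModelForCausalLM.from_pretrained("some-causal-lm")
    # tokenizer = AutoTokenizer.from_pretrained("some-causal-lm")
    # answer = generate_target_answer(model, tokenizer, reference_content, target_question)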
8. A generative reading understanding device, the device comprising:
a reading understanding model acquisition module, configured to acquire reference content and a target question and to acquire a generative reading understanding model; the generative reading understanding model is obtained by performing reinforcement learning on an initial reading understanding model using an objective function that contains an information-extraction correctness reward and a regularization term;
and an answer generation module, configured to obtain a target answer according to the reference content, the target question and the generative reading understanding model.
9. An electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the generative reading understanding method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the generative reading understanding method according to any of claims 1 to 7.
CN202311268546.0A 2023-09-27 2023-09-27 Method, device, equipment and storage medium for generating reading understanding Pending CN117391198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311268546.0A CN117391198A (en) 2023-09-27 2023-09-27 Method, device, equipment and storage medium for generating reading understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311268546.0A CN117391198A (en) 2023-09-27 2023-09-27 Method, device, equipment and storage medium for generating reading understanding

Publications (1)

Publication Number Publication Date
CN117391198A true CN117391198A (en) 2024-01-12

Family

ID=89469289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311268546.0A Pending CN117391198A (en) 2023-09-27 2023-09-27 Method, device, equipment and storage medium for generating reading understanding

Country Status (1)

Country Link
CN (1) CN117391198A (en)

Similar Documents

Publication Publication Date Title
CN110991645A (en) Self-adaptive learning method, system and storage medium based on knowledge model
CN111353037A (en) Topic generation method and device and computer readable storage medium
US11861308B2 (en) Mapping natural language utterances to operations over a knowledge graph
CN112328800A (en) System and method for automatically generating programming specification question answers
CN106407316B (en) Software question and answer recommendation method and device based on topic model
CN117076688A (en) Knowledge question-answering method and device based on domain knowledge graph and electronic equipment
CN117077792A (en) Knowledge graph-based method and device for generating prompt data
CN117093699B (en) Intelligent question-answering method, device, equipment and medium
CN114693011A (en) Policy matching method, device, equipment and medium
CN116541711A (en) Model training method, course recommendation method, device, equipment and medium
CN117391198A (en) Method, device, equipment and storage medium for generating reading understanding
CN113722421B (en) Contract auditing method and system and computer readable storage medium
CN111949783A (en) Question and answer result generation method and device in knowledge base
CN115238903B (en) Model compression method, system, electronic device and storage medium
Butcher Contract Information Extraction Using Machine Learning
CN117573985B (en) Information pushing method and system applied to intelligent online education system
CN117421415A (en) Data processing method, device, electronic equipment and storage medium
CN117633161A (en) Training method of question-answering system, question-answering method and control device
CN113724817A (en) Knowledge recommendation method and device based on artificial intelligence, computer equipment and medium
WO2021146003A1 (en) Providing qa training data and training a qa model based on implicit relevance feedbacks
CN117611398A (en) Large model training method and device in intellectual property field, electronic equipment and medium
CN117909454A (en) Text completion method, device and equipment based on multi-round dialogue
CN117744802A (en) Method and system for solving illusion problem of legal big language model
CN117520477A (en) Query search method, query information processing method, device and storage medium
CN117216226A (en) Knowledge positioning method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination