CN117910462A - Training method and device for abstract extraction model - Google Patents

Training method and device for abstract extraction model

Info

Publication number
CN117910462A
CN117910462A
Authority
CN
China
Prior art keywords
abstract
value
candidate
text
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311220061.4A
Other languages
Chinese (zh)
Inventor
刘微
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd filed Critical Beijing Topsec Technology Co Ltd
Priority to CN202311220061.4A priority Critical patent/CN117910462A/en
Publication of CN117910462A publication Critical patent/CN117910462A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a training method and device for an abstract extraction model. A text is input into the abstract extraction model, which outputs a candidate abstract; a reward value for the candidate abstract is then obtained according to the degree of correlation between the candidate abstract and a golden abstract, and the ranking of sentences in the text is learned by training the network structure of the abstract extraction model with reinforcement learning. That is, if the model obtains a higher reward, the scores of the extracted sentences are raised so that those sentences move forward in the ranking; conversely, if the model obtains a lower reward, the corresponding sentence scores are reduced, lowering the likelihood that those sentences are extracted again. The training method of this embodiment enables the abstract extraction model to adjust in real time according to the reward value of the candidate abstract, so as to obtain a higher-quality extracted abstract and improve the readability of the generated abstract.

Description

Training method and device for abstract extraction model
Technical Field
The application relates to the technical field of automatic abstract generation, and in particular to a training method and device for an abstract extraction model.
Background
In the present age, the development of the Internet, the maturation of cloud computing technology and the arrival of the 5G era have made information acquisition channels increasingly diverse and mobile. The accompanying problem of "information explosion" now troubles everyone in modern society. Limited by a person's energy and reading ability, sifting through the inconsequential information in a text takes a significant amount of time, making it difficult to grasp the content of the text promptly and easy to miss the useful information in it. Therefore, how to use text information resources more effectively, improve reading quality, and help people process information rapidly and acquire the main content of a text accurately and in time by computer has become one of the key points of current intelligent information processing research. Accordingly, research and development of automatic text summarization technology has become a major research branch in the field of natural language processing.
The technical method closest to the present application inputs the target document text into a heterogeneous hypergraph construction layer to construct an abstract-document-text heterogeneous hypergraph, inputs this hypergraph into a heterogeneous hypergraph updating layer to obtain an updated heterogeneous hypergraph, and finally inputs the updated heterogeneous hypergraph into a sentence classification layer to obtain at least one abstract sentence extracted from the target document text. The method comprises the following steps: inputting the target document text into the heterogeneous hypergraph construction layer to obtain the abstract-document-text heterogeneous hypergraph, in which the abstract label represented by the abstract label node is a fictitious (virtual) label; inputting the obtained hypergraph into the heterogeneous hypergraph updating layer, so as to update the word hyperedges, the sentence nodes and the abstract label nodes and obtain an updated heterogeneous hypergraph; and inputting the updated heterogeneous hypergraph into the sentence classification layer to obtain at least one abstract sentence extracted from the target document text, the abstract of the target document text being obtained based on the at least one abstract sentence.
However, the existing scheme encodes the input document text by constructing and updating an abstract-document-text heterogeneous hypergraph, so the contextual information in the extracted features is very weak. Moreover, the existing scheme obtains a target loss that enhances abstract-label perception by adding a contrastive learning loss to the cross-entropy loss and trains the model on this target loss, so the task objective does not match the model optimization objective, and the readability of the generated abstract is therefore low.
Disclosure of Invention
The embodiments of the application aim to provide a training method and device for an abstract extraction model, which are used for solving the problem of low readability of the text abstracts generated by the existing scheme.
The training method of the abstract extraction model provided by the embodiment of the application comprises the following steps:
inputting a text into an abstract extraction model, obtaining sentence scores of the sentences in the text, and selecting a plurality of sentences with the highest sentence scores to form a candidate abstract;
acquiring a golden abstract, and obtaining a reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract;
according to the reward value, adjusting model parameters of the abstract extraction model by using a reinforcement learning algorithm with the goal of maximizing the mathematical expectation of the reward value;
and repeating the process from obtaining the sentence scores of the sentences in the text to adjusting the model parameters of the abstract extraction model until a training exit condition is reached, so as to obtain the trained abstract extraction model.
In the above technical solution, a text is input into the abstract extraction model, the abstract extraction model outputs a candidate abstract, a reward value of the candidate abstract is then obtained according to the degree of correlation between the candidate abstract and the golden abstract, and the ranking of sentences in the text is learned by training the network structure of the abstract extraction model with reinforcement learning. That is, if the model obtains a higher reward, the scores of the extracted sentences are raised so that those sentences move forward in the ranking; conversely, if the model obtains a lower reward, the corresponding sentence scores are reduced, lowering the likelihood that those sentences are extracted again. The training method of this embodiment enables the abstract extraction model to adjust in real time according to the reward value of the candidate abstract, so as to obtain a higher-quality extracted abstract and improve the readability of the generated abstract.
In some alternative embodiments, the abstract extraction model includes: a BERT model and a logistic regression classifier; the BERT model is a model formed by stacking bidirectional Transformer encoders and is used for extracting feature vectors using sentence context information, and the logistic regression classifier is used for determining, from the extracted feature vectors, the probability of each sentence appearing in the candidate abstract.
In this technical scheme, the abstract extraction model is composed of the several stacked Transformer encoder substructures of BERT, with a final logistic regression classifier layer added to obtain the sentence scores in the text; the sentences with the highest sentence scores are selected to form the candidate abstract, and the degree of correlation between the candidate abstract and the golden abstract is used as the reward, where the reward is used by the reinforcement learning algorithm to update the model parameters.
BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model based on the Transformer architecture. It is a pre-trained model that can be used for various NLP tasks, such as text classification, named entity recognition and sentence relation judgment. The BERT model was introduced by Google in 2018; it captures semantic information by learning context information in both directions. Unlike previous pre-trained models, BERT is pre-trained bidirectionally, which allows it to represent the contextual information and semantics of words, so that text can be analyzed from different perspectives.
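As an illustrative sketch only (the patent does not provide source code), the BERT-plus-logistic-regression scorer described above might be assembled as follows in Python with PyTorch and the transformers library; the checkpoint name "bert-base-chinese" and the use of the [CLS] vector as the sentence feature are assumptions made for the example:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class AbstractExtractor(nn.Module):
    """Stacked bidirectional Transformer encoders (BERT) + a logistic regression scoring layer."""

    def __init__(self, bert_name: str = "bert-base-chinese"):  # checkpoint name is an assumption
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, 1)  # logistic regression layer

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # input_ids / attention_mask: (num_sentences, seq_len), one row per sentence x_i
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vectors = out.last_hidden_state[:, 0]          # [CLS] vector as the sentence feature
        logits = self.classifier(cls_vectors).squeeze(-1)
        return torch.sigmoid(logits)                       # p(y_i = 1 | x_i, theta), sentence scores

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
```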
In some alternative embodiments, inputting the text into the abstract extraction model to obtain the sentence scores of the sentences in the text includes:
calculating the probability of each sentence in the text appearing in the candidate abstract, and using the probability as the sentence score.
In the above technical solution, the abstract extraction model calculates the probability p(y_i | x_i, θ) that each sentence x_i in the text appears in the candidate abstract, i.e., the sentence score s_i, i = 1, 2, …, m. Then, the sentences are ranked by score, and the k highest-ranked sentences (k < m) are selected to form the candidate abstract.
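A minimal sketch of this ranking-and-selection step, assuming the sentence scores have already been produced by the model; restoring document order within the candidate abstract is an assumption the patent does not state:

```python
def select_candidate_abstract(sentences: list[str], scores: list[float], k: int) -> list[str]:
    """Rank sentences by score and keep the k highest-ranked ones (k < m)."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])                 # assumed: present sentences in document order
    return [sentences[i] for i in chosen]
```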
In some alternative embodiments, obtaining the reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract includes:
calculating an evaluation index value from the golden abstract and the candidate abstract by using an internal evaluation method for automatic text summarization;
and obtaining the reward value according to the evaluation index value.
In the above technical solution, automatic text summarization evaluation methods are divided into internal evaluation methods and external evaluation methods. In an internal evaluation method, a text abstract is obtained by manual summarization and used as the golden abstract, and the candidate abstract generated by the system is compared with it; the higher the degree of overlap between the two, the higher the quality of the candidate abstract. In an external evaluation method, in the absence of a golden abstract, the generated candidate abstract directly replaces the original text in text-related tasks such as text retrieval, classification and clustering; if the task performance improves, the quality of the candidate abstract is high. External evaluation is relatively limited, as it can only make a corresponding evaluation for a specific task. Internal evaluation is purer by comparison, and there are two common internal evaluation methods, Edmundson and ROUGE, so this embodiment calculates the evaluation index value using an internal evaluation method.
In some alternative embodiments, the internal evaluation method comprises a ROUGE method; the evaluation index value includes an F1 value of ROUGE-N and/or an F1 value of ROUGE-L.
In the above technical solution, ROUGE compares the automatically generated candidate abstract with the manually obtained golden abstract and computes a corresponding score to evaluate the quality of the candidate abstract. The recall R for ROUGE-N and ROUGE-L is calculated as follows:

$$R_{\mathrm{ROUGE\text{-}N}} = \frac{\sum_{S \in \mathrm{RefSummaries}} \sum_{\text{n-gram} \in S} \mathrm{Count}_{\mathrm{match}}(\text{n-gram})}{\sum_{S \in \mathrm{RefSummaries}} \sum_{\text{n-gram} \in S} \mathrm{Count}(\text{n-gram})}, \qquad R_{\mathrm{ROUGE\text{-}L}} = \frac{\mathrm{LCS}(X, Y)}{l}$$

where n-gram denotes an n-gram word, n ∈ {1, 2}, RefSummaries denotes the golden abstract, Count_match(n-gram) is the number of n-grams appearing in both the candidate abstract and the golden abstract, Count(n-gram) is the number of n-grams in the golden abstract, LCS(X, Y) denotes the length of the longest common subsequence of the candidate abstract X and the golden abstract Y, and l is the length of the golden abstract. ROUGE-1 and ROUGE-2 compute the coverage of unigrams and bigrams respectively, while ROUGE-L compares the similarity of the longest common subsequences.
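A pure-Python sketch of the two recall computations, under simplifying assumptions (whitespace tokenization and a single golden abstract; a production ROUGE implementation also handles stemming and multiple references):

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate: str, golden: str, n: int) -> float:
    """Count_match(n-gram) divided by Count(n-gram) in the golden abstract."""
    cand, gold = ngrams(candidate.split(), n), ngrams(golden.split(), n)
    match = sum((cand & gold).values())          # n-grams appearing in both abstracts
    total = sum(gold.values())
    return match / total if total else 0.0

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (standard dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_recall(candidate: str, golden: str) -> float:
    """LCS length divided by l, the length of the golden abstract."""
    gold_tokens = golden.split()
    return lcs_length(candidate.split(), gold_tokens) / len(gold_tokens)
```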
In the extractive abstract task, the precision P is the ratio of the length of the sequence shared by the candidate abstract and the golden abstract to the length of the candidate abstract, and the recall R is the ratio of the length of that shared sequence to the length of the golden abstract. The values of P and R are both between 0 and 1, and ideally both should be high, but in general these two evaluation criteria constrain each other, i.e., an increase in P tends to cause a decrease in R, and vice versa. For example, if only one sentence is extracted as the candidate abstract and that sentence appears in the golden abstract, then P is 1 but R is very low; if the entire text is extracted as the candidate abstract, then R is 1 but P will be very low.
In order to evaluate the quality of the candidate abstracts more comprehensively, P and R should be considered simultaneously, i.e., the F value, which is a weighted harmonic mean of R and P. The F value is calculated as follows:

$$F = \frac{(1+\beta^{2}) \, P \, R}{\beta^{2} P + R}$$

where β is a preset value that determines whether P or R is weighted more heavily; when β = 1, F is the F1 value. For example, if the F1 value of ROUGE-N is used as the evaluation criterion for the abstract, a higher F1 value of ROUGE-N indicates that the quality of the extracted abstract is closer to that of the golden abstract. Likewise, if the F1 value of ROUGE-L is used as the evaluation criterion, a higher F1 value of ROUGE-L also indicates that the quality of the extracted abstract is closer to that of the golden abstract.
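A one-function sketch of the F value under the definition above; the zero-division guard is an implementation assumption:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of P and R; beta = 1 gives the F1 value."""
    if precision == 0.0 and recall == 0.0:
        return 0.0   # guard against division by zero (assumption)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
```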
In some alternative embodiments, the evaluation index value includes an F1 value of ROUGE-1, an F1 value of ROUGE-2, and an F1 value of ROUGE-L;
Obtaining a reward value according to the evaluation index value, including:
and carrying out weighted summation of the F1 value of ROUGE-1, the F1 value of ROUGE-2 and the F1 value of ROUGE-L to obtain the reward value.
In this technical scheme, the F1 value of ROUGE-1, the F1 value of ROUGE-2 and the F1 value of ROUGE-L are combined through a policy gradient method in reinforcement learning, and the reward value synthesized from these three indexes is integrated into the optimization objective function for automatic text abstract extraction, which effectively reduces the information redundancy in the resulting abstract and improves the extraction performance of the abstract extraction model. In one embodiment, the average of the three index values, i.e., the F1 value of ROUGE-1, the F1 value of ROUGE-2 and the F1 value of ROUGE-L, may be selected as the reward value.
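A sketch of this reward computation, with equal weights reproducing the plain-average variant mentioned above; the weight values themselves are tunable assumptions:

```python
def reward_value(f1_rouge1: float, f1_rouge2: float, f1_rougeL: float,
                 weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of the three F1 values; equal weights give their average."""
    w1, w2, w3 = weights
    return w1 * f1_rouge1 + w2 * f1_rouge2 + w3 * f1_rougeL
```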
In some alternative embodiments, adjusting the model parameters of the abstract extraction model according to the reward value by using a reinforcement learning algorithm with the goal of maximizing the mathematical expectation of the reward value includes:
calculating the gradient of the model parameters according to the mathematical expectation of the reward value;
obtaining an update formula of the model parameters according to the gradient of the model parameters;
and adjusting the model parameters by using the update formula.
In the above technical solution, the reinforcement-learning-based abstract extraction problem can be described as an optimization problem whose goal is to maximize the mathematical expectation of the reward, as follows:

$$\theta^{*} = \arg\max_{\theta} \; \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \right] \tag{1.1}$$

where p_θ = p(y | D, θ) denotes the probability distribution of the sentence label vector y = [y_1, y_2, …, y_m]^T given the text D to be summarized and the abstract extraction model parameters θ, θ* is the optimal solution of the model parameters of the abstract extraction model, E_{y∼p_θ}[r(y)] denotes the expected evaluation reward obtained by the extracted abstract when the sentence classification label vector of the text obeys the distribution p_θ, and r(·) is the abstract evaluation reward function.
When the evaluation index ROUGE is used as the reward value, the mathematical expectation of the reward is non-differentiable, but ROUGE can still serve as the reward within a reinforcement learning framework, so the REINFORCE algorithm can solve this problem well. The parameter θ is updated using the REINFORCE algorithm; the goal of equation (1.1) is to learn to distinguish sentences according to their frequency of occurrence in high-scoring abstracts. Because the reward function (the ROUGE score) is not differentiable, the gradient of E_{y∼p_θ}[r(y)] with respect to the parameter θ is as follows:

$$\nabla_{\theta} \, \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \right] = \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \, \nabla_{\theta} \log p(y \mid D, \theta) \right] \tag{1.2}$$
Because the number of possible extractions is very large, exact calculation of the expectation term in equation (1.2) is impractical. The expected gradient in equation (1.2) can be approximated with a single sample ŷ ∼ p_θ, expressed as follows:

$$\nabla_{\theta} \, \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \right] \approx r(\hat{y}) \, \nabla_{\theta} \log p(\hat{y} \mid D, \theta) \tag{1.3}$$
Therefore, the update formula of the model parameter θ in the REINFORCE algorithm is as follows:

$$\theta \leftarrow \theta + \alpha \, r(\hat{y}) \, \nabla_{\theta} \log p(\hat{y} \mid D, \theta) \tag{1.4}$$

where α is the step size.
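In an automatic-differentiation framework, the update (1.4) is usually realized by descending the surrogate loss −r(ŷ)·log p(ŷ | D, θ), whose gradient matches equation (1.4). A sketch assuming the PyTorch model from the earlier example and an independent-Bernoulli factorization of p(y | D, θ), both of which are assumptions for illustration:

```python
import torch

def reinforce_step(model, optimizer, enc, y_hat, r):
    """One REINFORCE update: theta <- theta + alpha * r(y_hat) * grad log p(y_hat | D, theta)."""
    probs = model(enc["input_ids"], enc["attention_mask"])       # p(y_i = 1 | x_i, theta)
    # log p(y_hat | D, theta) under the assumed independent-Bernoulli factorization
    log_p = torch.where(y_hat.bool(), probs, 1 - probs).log().sum()
    loss = -r * log_p                        # gradient ascent on the objective via loss descent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # alpha corresponds to the optimizer's learning rate
```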
The REINFORCE algorithm starts learning with a random policy, which can make model training challenging for a complex task like this one, where a single text can produce a very large number of candidate abstracts. Therefore, the sampling of ŷ in equation (1.4) is limited to a set Ŷ of highest-probability samples. First, the top g sentences are selected from the text; these sentences themselves have high ROUGE scores, so candidate abstracts can be combined from them efficiently. Then k sentences are selected from the g sentences, all possible combinations are generated, and each combination is evaluated against the golden abstract. Ŷ contains the J candidate abstracts with the highest ROUGE scores among all the generated combinations. Using Ŷ instead of the true probability distribution reduces the exploration space and improves search efficiency.
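A sketch of constructing the restricted set Ŷ: take the top g sentences by score, enumerate their k-combinations, score each combination against the golden abstract, and keep the J best. The ROUGE scoring callback and the hyperparameter defaults g, k, J are assumptions made for the example:

```python
import random
from itertools import combinations

def build_candidate_set(sentences, scores, golden, rouge_fn, g=10, k=3, J=5):
    """Return the J highest-ROUGE k-sentence combinations drawn from the top-g sentences."""
    top_g = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:g]
    scored = []
    for combo in combinations(sorted(top_g), k):     # keep document order inside each combination
        candidate = " ".join(sentences[i] for i in combo)
        scored.append((rouge_fn(candidate, golden), combo))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [combo for _, combo in scored[:J]]

def sample_y_hat(candidate_set):
    """Sample y_hat from the restricted set instead of the full distribution (uniform here)."""
    return random.choice(candidate_set)
```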
The training device of the abstract extraction model provided by the embodiment of the application comprises the following components:
the input module is used for inputting the text into the abstract extraction model, obtaining the sentence scores of the sentences in the text, and selecting a plurality of sentences with the highest sentence scores to form a candidate abstract;
the reward module is used for acquiring the golden abstract and obtaining the reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract;
the updating module is used for adjusting the model parameters of the abstract extraction model according to the reward value by using a reinforcement learning algorithm with the goal of maximizing the mathematical expectation of the reward value;
and the judging module is used for judging whether the training exit condition is reached and, if so, ending the training to obtain the trained abstract extraction model.
According to the above technical scheme, the input module inputs the text into the abstract extraction model and the abstract extraction model outputs a candidate abstract; the reward module obtains the reward value of the candidate abstract according to the degree of correlation between the candidate abstract and the golden abstract; the updating module learns the ranking of sentences in the text by training the network structure of the abstract extraction model with reinforcement learning, so that the abstract extraction model can adjust in real time according to the reward value of the candidate abstract and obtain a higher-quality extracted abstract; and the judging module stops training when the training exit condition is reached. The abstract generated by the trained abstract extraction model has higher readability.
In some alternative embodiments, the input module is configured to calculate a probability of each sentence in the text appearing in the candidate abstract, and take the probability as the sentence score.
In some alternative embodiments, the reward module is configured to calculate an evaluation index value from the golden abstract and the candidate abstract by using an internal evaluation method for automatic text summarization, and to obtain the reward value according to the evaluation index value.
In some alternative embodiments, the update module is configured to calculate the gradient of the model parameters based on the mathematical expectation of the reward value, obtain an update formula of the model parameters according to the gradient, and adjust the model parameters by using the update formula.
An electronic device provided by an embodiment of the present application includes: a processor and a memory storing machine-readable instructions executable by the processor, which when executed by the processor, perform a method as any one of the above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method as described in any of the above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a training method of an abstract extraction model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a model structure provided by the present application;
FIG. 3 is a functional block diagram of a training device of an abstract extraction model according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a possible structure of an electronic device according to an embodiment of the present application.
Icon: 1-input module, 2-rewarding module, 3-updating module, 4-judging module, 110-communication unit, 120-memory, 130-input unit, 131-touch-sensitive surface, 132-other input device, 140-display unit, 141-display panel, 150-sensor, 160-audio circuit, 170-wireless communication unit, 180-processor, 190-power supply.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a training method of an abstract extraction model according to an embodiment of the present application, the method including:
Step 100, inputting the text into an abstract extraction model, obtaining sentence scores of the sentences in the text, and selecting a plurality of sentences with the highest sentence scores to form a candidate abstract;
The abstract extraction model may include an RNN model, a BERT model, and the like. The abstract extraction model is used for making extraction decisions and interacts with an interaction environment consisting of the text. Before training begins, the abstract extraction model is initialized, a text D consisting of m sentences {x_1, x_2, …, x_m} is read, and the relevance score of each sentence x_i in the text is predicted using the policy p(y_i | x_i, θ), where y_i ∈ {0, 1} is the classification label indicating whether the i-th sentence x_i of the text D appears in the candidate abstract, and θ denotes the model parameters. After the text is input into the abstract extraction model, the abstract with label y is extracted from the ranked sentences.
Step 200, acquiring a golden abstract, and obtaining a reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract;
The golden abstract is the ground truth for abstract generation; an abstract of the text can be obtained by manual summarization and then used as the golden abstract. The degree of correlation between the golden abstract and the candidate abstract can be characterized by the result of an automatic text summarization evaluation method, from which the corresponding reward is obtained; automatic text summarization evaluation methods include internal evaluation methods and external evaluation methods.
Step 300, according to the reward value, adjusting model parameters of the abstract extraction model by using a reinforcement learning algorithm with the goal of maximizing the mathematical expectation of the reward value;
Reinforcement learning (RL) is an important method of machine learning. It is chiefly embodied in the reinforcement signal: the action produced in reinforcement learning is evaluated by a reinforcement signal provided by the environment, rather than the system being told how to produce the correct action. Because the external environment provides little information, a reinforcement learning system must learn from its own experience. In this way, the reinforcement learning system acquires knowledge in an action-evaluation environment while improving its action plan to adapt to the environment. The technique lets an agent interact with the environment and seek an optimal policy that obtains the maximum return; it has natural advantages for decision-making over discrete spaces and is suitable for the task of automatic text summarization.
Reinforcement learning differs from supervised learning, which learns from a labeled training set. Supervised learning is an important learning approach, but on its own it is not adequate for learning from interaction. Reinforcement learning also differs from unsupervised learning, which typically looks for hidden structure in unlabeled data sets. Reinforcement learning does not rely on samples of correct behavior, and instead of finding hidden structure, it aims to maximize the reward signal. Exploring the structure in an agent's experience is of course useful for reinforcement learning, but it does not in itself address the problem of maximizing the reward signal. Thus, reinforcement learning is considered a third kind of machine learning besides supervised and unsupervised learning.
Reinforcement learning is learning by an agent in a "trial and error" manner: behavior is guided by the rewards obtained from constantly interacting with the environment, with the aim of maximizing the reward the agent obtains.
Step 400, judging whether a training exit condition is met; if yes, entering step 500, and if not, returning to step 100;
Specifically, the training exit condition may be whether the number of training iterations reaches a set value. It should be clear that the training exit condition may also be another condition under which the model converges, for example, that the reward values of the candidate abstracts obtained in several consecutive training iterations are all greater than a threshold value.
Step 500, obtaining the trained abstract extraction model.
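Putting steps 100 through 500 together, a training loop skeleton might look as follows. It reuses the illustrative helpers sketched in this description (reinforce_step and a reward_fn built from the ROUGE/F1 sketches), and the fixed iteration budget shown is just one of the exit conditions mentioned above:

```python
import torch

def train(model, optimizer, tokenizer, texts, golden_abstracts, reward_fn, max_steps=1000):
    """Steps 100-500: score sentences, extract a candidate abstract, reward it, update, repeat."""
    for _ in range(max_steps):                       # step 400: exit after a fixed iteration budget
        for sentences, golden in zip(texts, golden_abstracts):
            enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
            probs = model(enc["input_ids"], enc["attention_mask"])      # step 100: sentence scores
            y_hat = torch.bernoulli(probs).detach()                     # sample an extraction y_hat
            candidate = " ".join(s for s, y in zip(sentences, y_hat.tolist()) if y > 0.5)
            r = reward_fn(candidate, golden)                            # step 200: reward value
            reinforce_step(model, optimizer, enc, y_hat, r)             # step 300: REINFORCE update
    return model                                                        # step 500: trained model
```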
Referring to fig. 2, fig. 2 is a schematic diagram of the model structure provided by the present application. A text D is input into the abstract extraction model, the abstract extraction model outputs a candidate abstract, a reward value of the candidate abstract is then obtained according to the degree of correlation between the candidate abstract and the golden abstract, and the network structure of the abstract extraction model is trained with reinforcement learning to learn the ranking of sentences in the text, so that the abstract extraction model can adjust in real time according to the reward value of the candidate abstract, thereby obtaining a higher-quality extracted abstract and improving the readability of the generated abstract.
In some alternative embodiments, the abstract extraction model includes: a BERT model and a logistic regression classifier, wherein the BERT model is a model formed by stacking bidirectional Transformer encoders. In the embodiment of the application, the abstract extraction model is composed of the several stacked Transformer encoder substructures of BERT, with a final logistic regression classifier layer added to obtain the sentence scores in the text; the sentences with the highest sentence scores are selected to form the candidate abstract, and the degree of correlation between the candidate abstract and the golden abstract is used as the reward, where the reward is used by the reinforcement learning algorithm to update the model parameters. BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model based on the Transformer architecture. It is a pre-trained model that can be used for various NLP tasks, such as text classification, named entity recognition and sentence relation judgment. The BERT model was introduced by Google in 2018; it captures semantic information by learning context information in both directions. Unlike previous pre-trained models, BERT is pre-trained bidirectionally, which allows it to represent the contextual information and semantics of words, so that text can be analyzed from different perspectives.
In some alternative embodiments, inputting the text into the abstract extraction model to obtain the sentence scores of the sentences in the text includes: calculating the probability of each sentence in the text appearing in the candidate abstract, and using the probability as the sentence score.
In the embodiment of the application, the abstract extraction model calculates the probability p(y_i | x_i, θ) that each sentence x_i in the text appears in the candidate abstract, i.e., the sentence score s_i, i = 1, 2, …, m. Then, the sentences are ranked by score, and the k highest-ranked sentences (k < m) are selected to form the candidate abstract.
In some alternative embodiments, obtaining the reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract includes: calculating an evaluation index value from the golden abstract and the candidate abstract by using an internal evaluation method for automatic text summarization;
and obtaining the reward value according to the evaluation index value.
In the embodiment of the application, automatic text summarization evaluation methods are divided into internal evaluation methods and external evaluation methods. In an internal evaluation method, a text abstract is obtained by manual summarization and used as the golden abstract, and the candidate abstract generated by the system is compared with it; the higher the degree of overlap between the two, the higher the quality of the candidate abstract. In an external evaluation method, in the absence of a golden abstract, the generated candidate abstract directly replaces the original text in text-related tasks such as text retrieval, classification and clustering; if the task performance improves, the quality of the candidate abstract is high. There are two common internal evaluation methods, Edmundson and ROUGE, and this embodiment calculates the evaluation index value using an internal evaluation method.
In some alternative embodiments, the internal evaluation method comprises the ROUGE method; the evaluation index value includes the F1 value of ROUGE-N and/or the F1 value of ROUGE-L. In the embodiment of the application, the ROUGE method evaluates the quality of the candidate abstract by comparing the automatically generated candidate abstract with the manually obtained golden abstract and computing a corresponding score. The recall R for ROUGE-N and ROUGE-L is calculated as follows:

$$R_{\mathrm{ROUGE\text{-}N}} = \frac{\sum_{S \in \mathrm{RefSummaries}} \sum_{\text{n-gram} \in S} \mathrm{Count}_{\mathrm{match}}(\text{n-gram})}{\sum_{S \in \mathrm{RefSummaries}} \sum_{\text{n-gram} \in S} \mathrm{Count}(\text{n-gram})}, \qquad R_{\mathrm{ROUGE\text{-}L}} = \frac{\mathrm{LCS}(X, Y)}{l}$$

where n-gram denotes an n-gram word, n ∈ {1, 2}, RefSummaries denotes the golden abstract, Count_match(n-gram) is the number of n-grams appearing in both the candidate abstract and the golden abstract, Count(n-gram) is the number of n-grams in the golden abstract, LCS(X, Y) denotes the length of the longest common subsequence of the candidate abstract X and the golden abstract Y, and l is the length of the golden abstract. ROUGE-1 and ROUGE-2 compute the coverage of unigrams and bigrams respectively, while ROUGE-L compares the similarity of the longest common subsequences.
In the extractive abstract task, the precision P is the ratio of the length of the sequence shared by the candidate abstract and the golden abstract to the length of the candidate abstract, and the recall R is the ratio of the length of that shared sequence to the length of the golden abstract. The values of P and R are both between 0 and 1, and ideally both should be high, but in general these two evaluation criteria constrain each other, i.e., an increase in P tends to cause a decrease in R, and vice versa. For example, if only one sentence is extracted as the candidate abstract and that sentence appears in the golden abstract, then P is 1 but R is very low; if the entire text is extracted as the candidate abstract, then R is 1 but P will be very low.
In order to evaluate the quality of the candidate abstracts more comprehensively, P and R should be considered simultaneously, i.e., the F value, which is a weighted harmonic mean of R and P. The F value is calculated as follows:

$$F = \frac{(1+\beta^{2}) \, P \, R}{\beta^{2} P + R}$$

where β is a preset value that determines whether P or R is weighted more heavily; when β = 1, F is the F1 value. For example, if the F1 value of ROUGE-N is used as the evaluation criterion for the abstract, a higher F1 value of ROUGE-N indicates that the quality of the extracted abstract is closer to that of the golden abstract. Likewise, if the F1 value of ROUGE-L is used as the evaluation criterion, a higher F1 value of ROUGE-L also indicates that the quality of the extracted abstract is closer to that of the golden abstract.
In some alternative embodiments, the evaluation index value includes an F1 value of ROUGE-1, an F1 value of ROUGE-2, and an F1 value of ROUGE-L;
Obtaining the reward value according to the evaluation index value includes: carrying out weighted summation of the F1 value of ROUGE-1, the F1 value of ROUGE-2 and the F1 value of ROUGE-L to obtain the reward value.
In the embodiment of the application, the F1 value of ROUGE-1, the F1 value of ROUGE-2 and the F1 value of ROUGE-L are combined through a policy gradient method in reinforcement learning, and the reward value synthesized from these three indexes is integrated into the optimization objective function for automatic text abstract extraction, which effectively reduces the information redundancy in the resulting abstract and improves the extraction performance of the abstract extraction model.
In some alternative embodiments, adjusting the model parameters of the abstract extraction model according to the reward value by using a reinforcement learning algorithm with the goal of maximizing the mathematical expectation of the reward value includes: calculating the gradient of the model parameters according to the mathematical expectation of the reward value; obtaining an update formula of the model parameters according to the gradient of the model parameters;
and adjusting the model parameters by using the update formula.
In the embodiment of the present application, the reinforcement-learning-based abstract extraction problem can be described as an optimization problem whose goal is to maximize the mathematical expectation of the reward, as follows:

$$\theta^{*} = \arg\max_{\theta} \; \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \right] \tag{1.1}$$

where p_θ = p(y | D, θ) denotes the probability distribution of the sentence label vector y = [y_1, y_2, …, y_m]^T given the text D to be summarized and the abstract extraction model parameters θ, θ* is the optimal solution of the model parameters of the abstract extraction model, E_{y∼p_θ}[r(y)] denotes the expected evaluation reward obtained by the extracted abstract when the sentence classification label vector of the text obeys the distribution p_θ, and r(·) is the abstract evaluation reward function. When the evaluation index ROUGE is used as the reward value, the mathematical expectation of the reward is non-differentiable, but ROUGE can still serve as the reward within a reinforcement learning framework, so the REINFORCE algorithm can solve this problem well. The parameter θ is updated using the REINFORCE algorithm; the goal of equation (1.1) is to learn to distinguish sentences according to their frequency of occurrence in high-scoring abstracts. Because the reward function (the ROUGE score) is not differentiable, the gradient of E_{y∼p_θ}[r(y)] with respect to the parameter θ is as follows:

$$\nabla_{\theta} \, \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \right] = \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \, \nabla_{\theta} \log p(y \mid D, \theta) \right] \tag{1.2}$$
Because the number of possible extractions is very large, exact calculation of the expectation term in equation (1.2) is impractical. The expected gradient in equation (1.2) can be approximated with a single sample ŷ ∼ p_θ, expressed as follows:

$$\nabla_{\theta} \, \mathbb{E}_{y \sim p_{\theta}}\left[ r(y) \right] \approx r(\hat{y}) \, \nabla_{\theta} \log p(\hat{y} \mid D, \theta) \tag{1.3}$$
Therefore, the update formula of the model parameter θ in the REINFORCE algorithm is as follows:

$$\theta \leftarrow \theta + \alpha \, r(\hat{y}) \, \nabla_{\theta} \log p(\hat{y} \mid D, \theta) \tag{1.4}$$

where α is the step size.
The REINFORCE algorithm starts learning with a random policy, which can make model training challenging for a complex task like this one, where a single text can produce a very large number of candidate abstracts. Therefore, the sampling of ŷ in equation (1.4) is limited to a set Ŷ of highest-probability samples. First, the top g sentences are selected from the text; these sentences themselves have high ROUGE scores, so candidate abstracts can be combined from them efficiently. Then k sentences are selected from the g sentences, all possible combinations are generated, and each combination is evaluated against the golden abstract. Ŷ contains the J candidate abstracts with the highest ROUGE scores among all the generated combinations. Using Ŷ instead of the true probability distribution reduces the exploration space and improves search efficiency.
All the above alternative solutions of this embodiment may be combined in any manner to form further alternative solutions of the present application, which are not described one by one here.
Referring to fig. 3, fig. 3 shows a training device for an abstract extraction model according to an embodiment of the present application, which includes an input module 1, a reward module 2, an update module 3, and a judgment module 4.
The input module 1 is used for inputting the text into the abstract extraction model, obtaining the sentence scores of the sentences in the text, and selecting a plurality of sentences with the highest sentence scores to form the candidate abstract. The reward module 2 is used for acquiring the golden abstract and obtaining the reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract. The update module 3 is used for adjusting the model parameters of the abstract extraction model according to the reward value by using a reinforcement learning algorithm with the goal of maximizing the mathematical expectation of the reward value. The judgment module 4 is used for judging whether the training exit condition is reached and, if so, ending the training to obtain the trained abstract extraction model.
In the embodiment of the application, the input module 1 inputs the text into the abstract extraction model and the abstract extraction model outputs a candidate abstract; the reward module 2 obtains the reward value of the candidate abstract according to the degree of correlation between the candidate abstract and the golden abstract; the update module 3 learns the ranking of sentences in the text by training the network structure of the abstract extraction model with reinforcement learning, so that the abstract extraction model can adjust in real time according to the reward value of the candidate abstract and obtain a higher-quality extracted abstract; and the judgment module 4 stops training when the training exit condition is reached. The abstract generated by the trained abstract extraction model has higher readability.
In the training device of this embodiment, the training method of the abstract extraction model implemented by the above modules has the same implementation mechanism as the related method embodiments; for details, reference may be made to the description of the above embodiments, which is not repeated here.
The training device of this embodiment may be disposed on the browser client side as an engine device of the browser, or may be provided in an electronic device to perform its functions independently.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. Referring to fig. 4, the electronic device may be used to implement the training method of the abstract extraction model provided in the above-described embodiments.
Specifically:
The electronic device may include a communication unit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 including one or more processing cores, a power supply 190, and other components. Those skilled in the art will appreciate that the electronic device structure shown in fig. 4 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The communication unit 110 may be used for receiving and transmitting information or signals during a call, and the communication unit 110 may be a network communication device such as an RF (Radio Frequency) circuit, a router or a modem. In particular, when the communication unit 110 is an RF circuit, it receives downlink information from the base station and hands it to one or more processors 180 for processing; in addition, it transmits uplink data to the base station. Typically, the RF circuit serving as the communication unit includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the communication unit 110 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), etc. The memory 120 may be used to store software programs and modules, and the processor 180 performs various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the electronic device (such as audio data, phonebooks, etc.), and the like. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 130 may comprise a touch sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also referred to as a touch display screen or a touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch-sensitive surface 131 or thereabout by using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection means according to a predetermined program. Alternatively, the touch sensitive surface 131 may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 180, and can receive commands from the processor 180 and execute them. In addition, the touch-sensitive surface 131 may be implemented in various types of resistive, capacitive, infrared, surface acoustic wave, and the like. In addition to the touch-sensitive surface 131, the input unit 130 may also comprise other input devices 132. In particular, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 140 may be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the electronic device; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141; optionally, the display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 131 may overlay the display panel 141; when the touch-sensitive surface 131 detects a touch operation on or near it, the operation is transmitted to the processor 180 to determine the type of touch event, and the processor 180 then provides a corresponding visual output on the display panel 141 based on the type of touch event. Although in fig. 4 the touch-sensitive surface 131 and the display panel 141 are implemented as two separate components for input and output functions, in some embodiments the touch-sensitive surface 131 may be integrated with the display panel 141 to implement the input and output functions.
The electronic device may also include at least one sensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 141 and/or the backlight when the electronic device moves to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the mobile phone is stationary, and can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the electronic device are not described in detail herein.
The audio circuit 160, a speaker, and a microphone may provide an audio interface between the user and the electronic device. The audio circuit 160 may transmit the electrical signal converted from the received audio data to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received by the audio circuit 160 and converted into audio data; the audio data is processed by the processor 180 and then transmitted, for example, to another electronic device via the RF circuit 110, or output to the memory 120 for further processing. The audio circuit 160 may also include an earbud jack to provide communication between a peripheral headset and the electronic device.
To enable wireless communication, the electronic device may be configured with a wireless communication unit 170, and the wireless communication unit 170 may be a WiFi module. WiFi belongs to a short-distance wireless transmission technology, and the electronic equipment can help a user to send and receive e-mails, browse web pages, access streaming media and the like through the wireless communication unit 170, so that wireless broadband Internet access is provided for the user. Although fig. 4 shows the wireless communication unit 170, it is understood that it does not belong to the necessary constitution of the electronic device, and can be omitted entirely as necessary within a range not changing the essence of the invention.
The processor 180 is a control center of the electronic device, connects various parts of the entire cellular phone using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 120, and calling data stored in the memory 120, thereby performing overall monitoring of the cellular phone. Optionally, the processor 180 may include one or more processing cores; preferably, the processor 180 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 180.
The electronic device also includes a power supply 190 (e.g., a battery) for powering the various components, which may be logically connected to the processor 180 via a power management system, such as to provide for managing charge, discharge, and power consumption by the power management system. The power supply 190 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the electronic device may further include a camera, a Bluetooth module, etc., which will not be described here. Specifically, in this embodiment, the display unit of the electronic device is a touch screen display, and the electronic device further includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: inputting a text into an abstract extraction model, obtaining sentence scores of the sentences in the text, and selecting a plurality of sentences with the highest sentence scores to form a candidate abstract; acquiring a golden abstract, and obtaining a reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract; according to the reward value, adjusting model parameters of the abstract extraction model by using a reinforcement learning algorithm with the goal of maximizing the mathematical expectation of the reward value; and repeating the process from obtaining the sentence scores of the sentences in the text to adjusting the model parameters of the abstract extraction model until a training exit condition is reached, so as to obtain the trained abstract extraction model.
Optionally, the memory is further configured to store the following instructions: the probability of each sentence in the text appearing in the candidate abstract is calculated and used as a sentence score.
Optionally, the memory is further configured to store the following instructions: calculating an evaluation index value from the golden abstract and the candidate abstract by using an internal evaluation method for automatic text summarization; and obtaining the reward value according to the evaluation index value.
Optionally, the memory is further configured to store the following instructions: carrying out weighted summation of the F1 value of ROUGE-1, the F1 value of ROUGE-2 and the F1 value of ROUGE-L to obtain the reward value.
Optionally, the memory is further configured to store the following instructions: calculating the gradient of the model parameters according to the mathematical expectation of the reward value; obtaining an update formula of the model parameters according to the gradient of the model parameters; and adjusting the model parameters by using the update formula.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices or units, and may be in electrical, mechanical or other forms.
Further, the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
In this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions.
The above description is merely an example of the present application and is not intended to limit its protection scope; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for training an abstract extraction model, comprising:
inputting a text into an abstract extraction model, obtaining sentence scores of sentences in the text, and selecting a plurality of sentences with the highest sentence scores to form a candidate abstract;
acquiring a golden abstract, and obtaining a reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract;
according to the reward value, using a reinforcement learning algorithm to adjust model parameters of the abstract extraction model with the aim of maximizing the mathematical expectation of the reward value; and
repeating the process from obtaining the sentence scores of the sentences in the text to adjusting the model parameters of the abstract extraction model until a training exit condition is reached, so as to obtain a trained abstract extraction model.
2. The method of claim 1, wherein the abstract extraction model comprises a BERT model and a logistic regression classifier; the BERT model is formed by stacking bidirectional Transformer encoders and is configured to extract feature vectors by utilizing the context information of sentences, and the logistic regression classifier is configured to determine, according to the extracted feature vectors, the probability of a sentence appearing in the candidate abstract.
3. The method of claim 1, wherein inputting the text into the abstract extraction model to obtain the sentence scores of the sentences in the text comprises:
calculating the probability of each sentence in the text appearing in the candidate abstract, and taking the probability as the sentence score.
4. The method of claim 1, wherein obtaining the reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract comprises:
calculating an evaluation index value from the golden abstract and the candidate abstract by using an internal evaluation method for automatic text summarization; and
obtaining the reward value according to the evaluation index value.
5. The method of claim 4, wherein the internal evaluation method comprises a ROUGE method; the evaluation index value includes an F1 value of ROUGE-N and/or an F1 value of ROUGE-L.
6. The method of claim 5, wherein the evaluation index value includes an F1 value of ROUGE-1, an F1 value of ROUGE-2, and an F1 value of ROUGE-L;
wherein obtaining the reward value according to the evaluation index value comprises:
carrying out weighted summation of the F1 value of ROUGE-1, the F1 value of ROUGE-2, and the F1 value of ROUGE-L to obtain the reward value.
7. The method of claim 1, wherein adjusting the model parameters of the abstract extraction model by using a reinforcement learning algorithm according to the reward value, with the aim of maximizing the mathematical expectation of the reward value, comprises:
calculating the gradient of the model parameters according to the mathematical expectation of the reward value;
obtaining an update formula for the model parameters according to the gradient of the model parameters; and
adjusting the model parameters by using the update formula.
8. A training device for an abstract extraction model, comprising:
an input module, configured to input a text into the abstract extraction model, obtain sentence scores of sentences in the text, and select a plurality of sentences with the highest sentence scores to form a candidate abstract;
a reward module, configured to acquire a golden abstract and obtain a reward value of the candidate abstract according to the degree of correlation between the golden abstract and the candidate abstract;
an updating module, configured to adjust model parameters of the abstract extraction model by using a reinforcement learning algorithm according to the reward value, with the aim of maximizing the mathematical expectation of the reward value; and
a judging module, configured to judge whether a training exit condition is reached, and if so, to end the training to obtain a trained abstract extraction model.
9. An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, wherein the machine-readable instructions, when executed by the processor, perform the method of any one of claims 1-7.
10. A computer-readable storage medium, wherein a computer program is stored on the storage medium, and the computer program, when executed by a processor, performs the method of any one of claims 1-7.
CN202311220061.4A 2023-09-20 2023-09-20 Training method and device for abstract extraction model Pending CN117910462A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311220061.4A CN117910462A (en) 2023-09-20 2023-09-20 Training method and device for abstract extraction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311220061.4A CN117910462A (en) 2023-09-20 2023-09-20 Training method and device for abstract extraction model

Publications (1)

Publication Number Publication Date
CN117910462A true CN117910462A (en) 2024-04-19

Family

ID=90686696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311220061.4A Pending CN117910462A (en) 2023-09-20 2023-09-20 Training method and device for abstract extraction model

Country Status (1)

Country Link
CN (1) CN117910462A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination