CN115129819A - Text abstract model production method and device, equipment and medium thereof - Google Patents
- Publication number
- CN115129819A (application number CN202210833616.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- training
- core
- sentence
- paragraph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The present application relates to a text abstract model production method, together with a corresponding apparatus, device, and medium, in the field of computer technology. The method comprises the following steps: acquiring a training set, wherein the training set comprises a plurality of sample data and each sample data comprises a paragraph text and its corresponding abstract text; taking the keyword sequence of a non-core sentence in the paragraph text of the sample data as a training sample and the non-core sentence as a supervision label, performing self-supervised training on a preset generator; replacing the corresponding non-core sentences in the paragraph text of the sample data with equivalent sentences and masking part of the core sentences in the paragraph text to form training samples, and pre-training a text abstract model to convergence with the masked core sentences as supervision labels; and taking the paragraph texts as training samples and the abstract texts corresponding to the paragraph texts as supervision labels, performing fine-tuning training on the text abstract model until convergence. With this method, high-quality text abstracts can be generated.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a text abstract model production method, and a corresponding apparatus, computer device, and computer-readable storage medium.
Background
To cope with information overload, information-processing technologies that can quickly filter out and exploit useful information have been developed along two directions: extractive summarization and abstractive (generative) summarization. Extractive summarization ranks the words and/or sentences of the original text by importance and extracts the top-ranked ones as the abstract; abstractive summarization has a machine understand the semantic information of the original text and output refined words and/or sentences as the abstract.
Neural network models used for abstractive summarization generally apply a noise function during pre-training to corrupt the original text fed to the model, so that the model learns to repair the corrupted input. After being pre-trained to convergence, the model extracts the semantic information of the original text more accurately, which helps it subsequently generate abstracts with high readability and accuracy. In the conventional technology, the noise function usually adopts token masking, which forces the neural network model, in the pre-training stage, to restore the masked tokens by understanding the context semantics surrounding them in the original text, thereby improving the model's ability to extract the semantic information of the original text to a certain extent.
In practice, however, neural network models implemented under the conventional technology are found to generate abstracts of poor quality, so a new method is expected to be developed in search of a better solution.
Disclosure of Invention
A primary object of the present application is to solve at least one of the above problems and provide a text abstract model production method, and a corresponding apparatus, computer device, and computer readable storage medium.
In order to meet various purposes of the application, the following technical scheme is adopted in the application:
The text abstract model production method adapted to one of the purposes of the present application comprises the following steps:
acquiring a training set, wherein the training set comprises a plurality of sample data, each sample data comprises a paragraph text and its corresponding abstract text, the paragraph text comprises core sentences and non-core sentences, and the core sentences represent the key semantics of the paragraph text;
taking the keyword sequence of a non-core sentence in the paragraph text of the sample data as a training sample and the non-core sentence as a supervision label, performing self-supervised training on a preset generator, so that the generator, trained to convergence, is suitable for generating an equivalent sentence of the non-core sentence from the keyword sequence;
replacing the corresponding non-core sentences in the paragraph text of the sample data with the equivalent sentences and masking part of the core sentences in the paragraph text to form training samples, and pre-training a text abstract model to convergence with the masked core sentences as supervision labels;
and taking the paragraph text in the sample data as a training sample and the abstract text corresponding to the paragraph text as a supervision label, performing fine-tuning training on the text abstract model until convergence.
In a preferred embodiment, the step of masking part of the core sentences in the paragraph text to form training samples comprises the following steps:
encoding the sentences in the paragraph text with a text encoding model trained to convergence in advance, to obtain an encoding vector corresponding to each sentence;
calculating the average of the encoding vectors corresponding to the sentences to obtain a center vector;
determining the core sentences in the paragraph text according to the vector distance between each sentence's encoding vector and the center vector;
and selecting part of the core sentences in the paragraph text according to a preset proportion and replacing them with a specific masking identifier.
In an extended embodiment, the step of replacing the corresponding non-core sentence in the paragraph text of the sample data with an equivalent sentence comprises the following steps:
selecting a non-core sentence in the context of a core sentence in the paragraph text, extracting the target keyword texts in the non-core sentence, obtaining the position information of each target keyword text within the sentence, and constructing a keyword sequence, wherein the target keyword texts comprise noun texts, verb texts, subject-clause keyword texts, and attributive-clause keyword texts;
and generating, with the generator of an adversarial neural network trained to convergence in advance, an equivalent sentence of the non-core sentence according to the keyword sequence, and replacing the corresponding non-core sentence in the paragraph text of the sample data with the equivalent sentence.
In a further embodiment, the training process of the generator of the adversarial neural network comprises the following steps:
connecting a preset generator to a discriminator trained to convergence in advance to construct the adversarial neural network, and freezing the weights of the discriminator;
invoking a single training sample and inputting it into the generator to generate an equivalent sentence corresponding to the training sample, wherein the training sample is the keyword sequence of a non-core sentence in the context of a core sentence in the paragraph text of the sample data;
invoking the discriminator to perform classification mapping on the equivalent sentence, mapping it into a binary space and obtaining the classification probability of it being mapped to the positive-class space or the negative-class space;
and calculating a loss value by assuming that the supervision label of the invoked training sample is the positive class; when the loss value reaches a preset threshold, judging that the generator has converged and terminating the training, otherwise invoking another training sample to continue iterative training of the generator.
In a further embodiment, before the step of connecting the preset generator to the discriminator trained to convergence in advance to construct the adversarial neural network and freezing the weights of the discriminator, the method comprises the following steps:
taking the keyword sequence of a non-core sentence in the paragraph text of the sample data as a training sample and the non-core sentence as a supervision label, training the preset generator under supervision until convergence;
invoking the generator to generate, from the keyword sequence of the training sample, equivalent sentences of the non-core sentences as training samples for training the discriminator, with the corresponding non-core sentences as supervision labels;
invoking the discriminator to perform classification mapping on a training sample for training the discriminator, mapping it into a binary space and obtaining the classification probability of it being mapped to the positive-class space or the negative-class space;
and calculating a loss value for the classification probability according to the supervision label corresponding to the discriminator's training sample; when the loss value reaches a preset threshold, judging that the discriminator has converged and terminating the training, otherwise invoking the next training sample to continue iterative training.
In a further embodiment, pre-training the text abstract model to convergence comprises the following steps:
invoking the encoder of the text abstract model to extract the deep semantic features of a single training sample whose paragraph text has part of its core sentences masked, and encode them as a first text feature vector;
invoking the decoder of the text abstract model to decode the first text feature vector, obtaining the probability distribution of each token in the first decoded text being mapped to a preset dictionary;
and calculating the loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample; when the loss value reaches a preset threshold, judging that the model has converged and terminating the pre-training, otherwise invoking another training sample to continue iterative training of the model.
In a further embodiment, performing fine-tuning training on the text abstract model until convergence comprises the following steps:
invoking the encoder of the text abstract model to extract the deep semantic features corresponding to the paragraph text of a single training sample and encode them as a second text feature vector;
invoking the decoder of the text abstract model to decode the second text feature vector, obtaining the probability distribution of each token in the second decoded text being mapped to the preset dictionary;
and calculating the loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample; when the loss value reaches a preset threshold, judging that the model has converged and terminating the fine-tuning training, otherwise invoking another training sample to continue iterative training of the model.
On the other hand, a text abstract model production device adapted to one of the purposes of the present application comprises a data acquisition module, a generator training module, a pre-training module, and a fine-tuning training module, wherein: the data acquisition module is configured to acquire a training set, wherein the training set comprises a plurality of sample data, each sample data comprises a paragraph text and its corresponding abstract text, the paragraph text comprises core sentences and non-core sentences, and the core sentences represent the key semantics of the paragraph text; the generator training module is configured to take the keyword sequence of a non-core sentence in the paragraph text of the sample data as a training sample and the non-core sentence as a supervision label, and perform self-supervised training on a preset generator so that the generator, trained to convergence, is suitable for generating an equivalent sentence of the non-core sentence from the keyword sequence; the pre-training module is configured to replace the corresponding non-core sentences in the paragraph text of the sample data with equivalent sentences and mask part of the core sentences in the paragraph text to form training samples, and pre-train a text abstract model to convergence with the masked core sentences as supervision labels; and the fine-tuning training module is configured to take the paragraph texts in the sample data as training samples and the abstract texts corresponding to the paragraph texts as supervision labels, and perform fine-tuning training on the text abstract model until convergence.
In a preferred embodiment, the pre-training module comprises: a sentence encoding submodule, configured to encode the sentences in the paragraph text with a text encoding model trained to convergence in advance and obtain an encoding vector corresponding to each sentence; a center vector submodule, configured to calculate the average of the encoding vectors corresponding to the sentences to obtain a center vector; a core determining submodule, configured to determine the core sentences in the paragraph text according to the vector distance between each sentence's encoding vector and the center vector; and a text replacement submodule, configured to select part of the core sentences in the paragraph text according to a preset proportion and replace them with a specific masking identifier.
In an extended embodiment, the pre-training module comprises: a text extraction submodule, configured to select a non-core sentence in the context of a core sentence in the paragraph text, extract the target keyword texts in the non-core sentence, obtain the position information of each target keyword text within the sentence, and construct a keyword sequence, wherein the target keyword texts comprise noun texts, verb texts, subject-clause keyword texts, and attributive-clause keyword texts; and a sentence generation submodule, configured to generate, with the generator of the adversarial neural network trained to convergence in advance, an equivalent sentence of the non-core sentence according to the keyword sequence, so as to replace the corresponding non-core sentence in the paragraph text of the sample data.
In a further embodiment, the training process of the generator of the adversarial neural network involves: a network construction module, configured to connect the preset generator to the discriminator trained to convergence in advance to construct the adversarial neural network and freeze the weights of the discriminator; a first sentence generation module, configured to invoke a single training sample and input it into the generator to generate an equivalent sentence corresponding to the training sample, wherein the training sample is the keyword sequence of a non-core sentence in the context of a core sentence in the paragraph text of the sample data; a first classification mapping module, configured to invoke the discriminator to perform classification mapping on the equivalent sentence, mapping it into a binary space to obtain the classification probability of it being mapped to the positive-class space or the negative-class space; and a first iterative training module, configured to calculate a loss value by assuming that the supervision label of the invoked training sample is the positive class, judge that the generator has converged and terminate the training when the loss value reaches a preset threshold, and otherwise invoke another training sample to continue iterative training of the generator.
In a further embodiment, before the network construction module, the device further comprises: a preset generator training module, configured to take the keyword sequence of a non-core sentence in the paragraph text of the sample data as a training sample and the non-core sentence as a supervision label, and train the preset generator under supervision until convergence; a second sentence generation module, configured to invoke the generator to generate, from the keyword sequence of the training sample, equivalent sentences of the non-core sentences as training samples for training the discriminator, with the corresponding non-core sentences as supervision labels; a second classification mapping submodule, configured to invoke the discriminator to perform classification mapping on a training sample for training the discriminator, mapping it into a binary space to obtain the classification probability of it being mapped to the positive-class space or the negative-class space; and a second iterative training module, configured to calculate a loss value for the classification probability according to the supervision label corresponding to the discriminator's training sample, judge that the discriminator has converged and terminate the training when the loss value reaches a preset threshold, and otherwise invoke the next training sample to continue iterative training.
In a further embodiment, the pre-training module comprises: a first encoding submodule, configured to invoke the encoder of the text abstract model to extract the deep semantic features of a single training sample whose paragraph text has part of its core sentences masked, and encode them as a first text feature vector; a first decoding submodule, configured to invoke the decoder of the text abstract model to decode the first text feature vector and obtain the probability distribution of each token in the first decoded text being mapped to a preset dictionary; and a first iterative training submodule, configured to calculate the loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample, judge that the model has converged and terminate the pre-training when the loss value reaches a preset threshold, and otherwise invoke another training sample to continue iterative training of the model.
In a further embodiment, the fine-tuning training module comprises: a second encoding submodule, configured to invoke the encoder of the text abstract model to extract the deep semantic features corresponding to the paragraph text of a single training sample and encode them as a second text feature vector; a second decoding submodule, configured to invoke the decoder of the text abstract model to decode the second text feature vector and obtain the probability distribution of each token in the second decoded text being mapped to the preset dictionary; and a second iterative training submodule, configured to calculate the loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample, judge that the model has converged and terminate the fine-tuning training when the loss value reaches a preset threshold, and otherwise invoke another training sample to continue iterative training of the model.
In yet another aspect, a computer apparatus adapted for one of the purposes of the present application includes a central processing unit and a memory, the central processing unit being configured to invoke execution of a computer program stored in the memory to perform the steps of the text abstract model production method described herein.
In accordance with another aspect of the present invention, there is provided a computer-readable storage medium that stores, in the form of computer-readable instructions, a computer program implementing the text abstract model production method described above; when invoked by a computer, the computer program executes the steps included in the method.
The technical solution of the present application has various advantages, including but not limited to the following aspects:
In this application, a keyword sequence is first extracted from the non-core sentences of a paragraph text and used to train a generator suitable for producing equivalent sentences of those non-core sentences from their keyword sequences. The generator is then used to produce the equivalent sentences that replace the non-core sentences in the paragraph texts serving as training samples of the text abstract model, while part of the core sentences in the paragraph text are masked, and the text abstract model is trained on these samples. The equivalent sentences produced by the generator constitute mild noise for the training samples of the text abstract model, whereas the masked core sentences constitute severe noise. Constructing the training samples by combining mild and severe noise reasonably generalizes the characteristics of the training samples; when these samples are used to pre-train the text abstract model, the model is forced to understand the paragraph text more deeply and to accurately extract the corresponding deep semantic features, so that when generating the abstract text of a paragraph text it can produce a high-quality abstract from those accurate deep semantic features. Moreover, because the training samples are preprocessed by combining different types of noise, the training of the text abstract model is accelerated, the model reaches the convergence state quickly, and training cost is saved.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of an exemplary embodiment of a text summarization model production method of the present application;
FIG. 2 is a flow chart illustrating core sentence hiding in an embodiment of the present application;
FIG. 3 is a flow chart illustrating the replacement of non-core statements in an embodiment of the present application;
FIG. 4 is a schematic flow chart of the training of the generator of the adversarial neural network in an embodiment of the present application;
FIG. 5 is a schematic flow chart of the training of the discriminator of the adversarial neural network in an embodiment of the present application;
FIG. 6 is a schematic flow chart illustrating a pre-training text summarization model according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating fine tuning of a training text summarization model according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of a text summarization model production device of the present application;
fig. 9 is a schematic structural diagram of a computer device used in the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, "client," "terminal," and "terminal device" include both wireless signal receiver devices, which are only capable of wireless signal receiver devices without transmit capability, and receiving and transmitting hardware devices, which have receiving and transmitting hardware capable of two-way communication over a two-way communication link, as will be understood by those skilled in the art. Such a device may include: cellular or other communication devices such as personal computers, tablets, etc. having a single line display or a multi-line display or cellular or other communication devices without a multi-line display; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other appliance having and/or including a radio frequency receiver. As used herein, a "client," "terminal device" can be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "client", "terminal Device" used herein may also be a communication terminal, a web terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, and may also be a smart tv, a set-top box, and the like.
The hardware referred to by the names "server", "client", "service node", etc. is essentially an electronic device with the performance of a personal computer, and is a hardware device having necessary components disclosed by the von neumann principle such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device, an output device, etc., a computer program is stored in the memory, and the central processing unit calls a program stored in an external memory into the internal memory to run, executes instructions in the program, and interacts with the input and output devices, thereby completing a specific function.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art will appreciate this variation and should not be so limited as to restrict the implementation of the network deployment of the present application.
One or more technical features of the present application, unless expressly specified otherwise, may be deployed to a server for implementation by a client remotely invoking an online service interface provided by a capture server for access, or may be deployed directly and run on the client for access.
Unless specified in clear text, the neural network model referred to or possibly referred to in the application can be deployed in a remote server and performs remote invocation at a client, and can also be deployed in a client with sufficient equipment capability to perform direct invocation.
Various data referred to in the present application may be stored in a server remotely or in a local terminal device unless specified in the clear text, as long as the data is suitable for being called by the technical solution of the present application.
The person skilled in the art will know this: although the various methods of the present application are described based on the same concept so as to be common to each other, they may be independently performed unless otherwise specified. In the same way, for each embodiment disclosed in the present application, it is proposed based on the same inventive concept, and therefore, concepts of the same expression and concepts of which expressions are different but are appropriately changed only for convenience should be equally understood.
The embodiments to be disclosed herein can be flexibly constructed by cross-linking related technical features of the embodiments unless the mutual exclusion relationship between the related technical features is stated in the clear text, as long as the combination does not depart from the inventive spirit of the present application and can meet the needs of the prior art or solve the deficiencies of the prior art. Those skilled in the art will appreciate variations therefrom.
The text abstract model production method can be programmed into a computer program product and is deployed in a client or a server to run, for example, in an exemplary application scenario of the application, the text abstract model production method can be deployed in a server of an e-commerce platform, so that the text abstract model production method can be executed by accessing an open interface after the computer program product runs and performing human-computer interaction with a process of the computer program product through a graphical user interface.
Referring to fig. 1, the text summarization model production method of the present application, in an exemplary embodiment, comprises the following steps:
step S1100, a training set is obtained, wherein the training set comprises a plurality of sample data, the sample data comprises paragraph texts and corresponding abstract texts thereof, the paragraph texts comprise core sentences and non-core sentences, and the core sentences represent key semantics of the paragraph texts;
in one embodiment, an open-source data set may be used as the training set, that is, paragraph text and corresponding abstract text of each data in the data set are used as the sample data in the training set, and for a language of chinese, the data set may use any one or more of lcts, CNewSum, and QBSUM, and for a language of english, the data set may use any one or more of DUC, TAC, TREC, CNN/DailyMail, XSum, Wikiasp, FacetSum, and WikiHow.
In another embodiment, text content published by users in multiple fields, including but not limited to entertainment, news, sports, education, finance, science and technology, and the Internet, can be collected from a third-party forum client. The text content includes but is not limited to titles and body text. The collected text content is then formatted by any one or more of the following operations: filtering out invalid characters such as non-printing characters, emoticons, and redundant punctuation marks; cleaning template text; and extracting the effective text information. The correlation between each body text and its title can then be scored by manual evaluation, the correlation score lying in the interval [1, 5], where a higher score indicates a stronger correlation. The formatted text contents are ranked from high score to low score according to their correlation scores, and a number of the top-ranked text contents are selected. The body text of each selected text content is used as the paragraph text and its title as the corresponding abstract text, thereby constructing the sample data and obtaining the training set.
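For concreteness, a single sample in the training set can be thought of as a (paragraph text, abstract text) pair together with its relevance score. The sketch below, whose field and function names are illustrative assumptions rather than part of the claimed method, shows one way such records could be assembled from collected title/body pairs.

```python
# A minimal sketch of the training-set layout described above.
# Field names ("paragraph", "abstract", "score") are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    paragraph: str   # body text of a collected article
    abstract: str    # its title, used as the reference summary
    score: float     # manually assigned title-body relevance score in [1, 5]

def build_training_set(records: List[Sample], top_k: int) -> List[Sample]:
    """Keep the top_k records with the highest relevance scores."""
    return sorted(records, key=lambda r: r.score, reverse=True)[:top_k]
```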
It will be readily appreciated that the abstract text is generally a brief summary of the central idea of its corresponding paragraph text.
Step S1200, taking the keyword sequence of a non-core sentence in the paragraph text of the sample data as a training sample and the non-core sentence as a supervision label, performing self-supervised training on a preset generator so that the generator, trained to convergence, is suitable for generating an equivalent sentence of the non-core sentence from the keyword sequence;
In one embodiment, the sentences in the paragraph text that are most similar to the remaining text can be found according to the ROUGE metric, and each such sentence is determined to be a core sentence of the paragraph text. Specifically, ROUGE represents the similarity between two texts by an n-gram overlap score from 0 to 100 (ROUGE-1, ROUGE-2, and ROUGE-L are three common variants). Accordingly, ROUGE-2 can be selected to compute the n-gram overlap between each sentence and all other sentences in the paragraph text as their similarity; the sentences can then be ranked from high to low similarity, and a number of the top-ranked sentences determined to be the core sentences of the paragraph text.
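As a rough illustration of this ROUGE-2-based selection, the following sketch scores each sentence by its bigram overlap with the rest of the paragraph and keeps the top-ranked sentences as core sentences. It uses a plain bigram F-measure as a stand-in for a full ROUGE-2 implementation and assumes the sentences are already tokenized; it is only a sketch of the idea, not the exact computation of the embodiment.

```python
# Sketch: pick core sentences by bigram (ROUGE-2-like) overlap with the rest of the paragraph.
from collections import Counter
from typing import List

def bigrams(tokens: List[str]) -> Counter:
    return Counter(zip(tokens, tokens[1:]))

def overlap_f1(sent: List[str], rest: List[str]) -> float:
    a, b = bigrams(sent), bigrams(rest)
    common = sum((a & b).values())        # shared bigram count
    if common == 0:
        return 0.0
    precision = common / max(sum(a.values()), 1)
    recall = common / max(sum(b.values()), 1)
    return 2 * precision * recall / (precision + recall)

def core_sentences(sentences: List[List[str]], top_k: int) -> List[int]:
    """Return indices of the top_k sentences most similar to the rest of the text."""
    scores = []
    for i, sent in enumerate(sentences):
        rest = [tok for j, s in enumerate(sentences) if j != i for tok in s]
        scores.append((overlap_f1(sent, rest), i))
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]
```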
In another embodiment, each sentence in the paragraph text may be encoded with a text encoding model trained to convergence in advance, giving a vectorized representation of each sentence as its encoding vector. A vector similarity function, such as cosine similarity, may then be used to compute the vector distance between the encoding vectors of any two sentences as their similarity. For each sentence, the sum of its similarities to all other sentences is computed, the sentences are ranked from high to low by this sum, and a number of the top-ranked sentences are determined to be the core sentences of the paragraph text. The text encoding model can be any one of Word2Vec, CBOW, and BERT.
Several non-core sentences in the paragraph text of the sample data are selected at random; they may be selected according to a certain proportion, for example 20% of the total number of non-core sentences in the paragraph text. From these sentences the target keyword texts are extracted, where the target keyword texts include any one or more of noun texts, verb texts, subject-clause keyword texts, and attributive-clause keyword texts. The subject-clause keyword texts may be subordinating conjunctions such as "that" and "whether", connective pronouns such as "what", "which", "whose", and "who", or connective adverbs such as "why", "when", "where", and "how"; the attributive-clause keywords may be relative words such as "that", "which", "who", "whose", and "as". In addition, the position of each target keyword text within its non-core sentence can be encoded to obtain the corresponding position information; the encoding can be implemented with any one of sine-cosine encoding, learnable encoding, and one-hot encoding, which a person skilled in the art can flexibly select and implement as needed.
Further, each target keyword text is mapped to and associated with its corresponding position information to construct a keyword sequence used as a training sample, and the corresponding non-core sentence is used as the supervision label for the self-supervised training of the preset generator. Specifically, the encoder of the generator encodes each target keyword text in the keyword sequence of the training sample into a keyword text vector and concatenates it with the corresponding position information in the keyword sequence to obtain a concatenated encoding vector. The keyword sequence is treated as a hard constraint: the sentence generated by the generator must contain the target keyword texts in the corresponding positional order. Accordingly, the decoder of the generator decodes the concatenated encoding vector autoregressively with unidirectional feature representation to generate the equivalent sentence corresponding to the training sample. A preset cross-entropy loss function is then applied with the supervision label of the training sample, i.e. the non-core sentence, to compute the cross-entropy loss of the equivalent sentence generated by the generator; when the loss value reaches a preset threshold, the generator is judged to have converged and the training is terminated, otherwise another training sample is invoked to continue iterative training of the generator. The generator can be a GPT-series model, and the cross-entropy loss function and the preset threshold can be set flexibly by a person skilled in the art according to prior knowledge.
It can be understood that, under self-supervised training, the generator continuously fits the difference between the equivalent sentence it generates and the real label, i.e. the non-core sentence, so that after being trained to convergence the generator can generate, from a keyword sequence, an equivalent sentence whose semantics are similar to those of the non-core sentence.
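A condensed sketch of this self-supervised generator training is given below. It assumes a GPT-style causal language model from the Hugging Face transformers library and trains it to reproduce the non-core sentence when conditioned on its keyword sequence; the checkpoint name, prompt format, and hyperparameters are illustrative assumptions, not the exact configuration of the embodiment.

```python
# Sketch: self-supervised training of the generator (keyword sequence -> non-core sentence).
# Checkpoint, prompt format and hyperparameters are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")        # stand-in for any GPT-series model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def train_step(keyword_sequence: str, non_core_sentence: str) -> float:
    # Condition on the keyword sequence; supervise with the original non-core sentence.
    text = keyword_sequence + tokenizer.eos_token + non_core_sentence
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    outputs = model(**batch, labels=batch["input_ids"])  # cross-entropy over shifted tokens
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```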
After the generator is trained to convergence, it may be used to replace each selected non-core sentence in the paragraph text of the sample data with the equivalent sentence generated from that non-core sentence's keyword sequence. On this basis, part of the core sentences in the paragraph text are randomly selected and replaced with a specific masking identifier, which may be the [mask] character, and the paragraph text preprocessed in this way is used as a training sample.
Further, the training sample, with the masked part of the core sentences as supervision labels, can be used to pre-train the text abstract model until convergence. Specifically, the encoder of the text abstract model performs bidirectional feature-representation encoding, computation, and feature extraction on the paragraph text; the decoder of the text abstract model then performs attention aggregation via cross-attention with the hidden states of the last encoder layer and generates, autoregressively with unidirectional feature representation, the core sentences of the training sample's paragraph text that were replaced by the specific masking identifier. A preset cross-entropy loss function is applied with the supervision labels of the training sample, i.e. the core sentences, to compute the cross-entropy loss of the core sentences generated by the text abstract model; when the loss value reaches a preset threshold, the model is judged to have converged and the training is terminated, otherwise another training sample is invoked to continue iterative pre-training of the model. The text abstract model can be any one of the Transformer, Longformer, T5, and BART models, as a person skilled in the art may select and implement as needed, and the cross-entropy loss function and the preset threshold can be set flexibly by a person skilled in the art according to prior knowledge and/or experimental data.
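The pre-training loop sketched below illustrates this objective with a BART-style encoder-decoder from the transformers library: the noised paragraph (equivalent-sentence replacements plus [mask]-covered core sentences) is the encoder input, and the masked core sentences are the decoder target. The checkpoint name and hyperparameters are assumptions; any of the encoder-decoder models named above could be substituted.

```python
# Sketch: pre-training the text abstract model on noised paragraphs.
# Input  = paragraph with non-core sentences replaced by equivalent sentences
#          and part of the core sentences replaced by a mask identifier.
# Label  = the masked core sentences.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("fnlp/bart-base-chinese")  # illustrative checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("fnlp/bart-base-chinese")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def pretrain_step(noised_paragraph: str, masked_core_sentences: str) -> float:
    inputs = tokenizer(noised_paragraph, return_tensors="pt", truncation=True, max_length=1024)
    labels = tokenizer(masked_core_sentences, return_tensors="pt", truncation=True,
                       max_length=256)["input_ids"]
    loss = model(**inputs, labels=labels).loss   # cross-entropy over the decoded core sentences
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```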
Step S1400, taking the paragraph texts in the sample data as training samples and the abstract texts corresponding to the paragraph texts as supervision labels, performing fine-tuning training on the text abstract model until convergence.
Through the self-supervised pre-training to convergence in step S1300, the text abstract model has learned to generate the masked core sentences of a paragraph text from the extracted core semantics of that paragraph text; it will be appreciated that such output is already quite close to the abstract text of the paragraph text. Therefore, the paragraph texts in the sample data can be used as training samples and the abstract texts corresponding to the paragraph texts as supervision labels to fine-tune the text abstract model until convergence. Specifically, the encoder of the text abstract model performs bidirectional feature-representation encoding, computation, and feature extraction on the paragraph text; the decoder performs attention aggregation via cross-attention with the hidden states of the last encoder layer and generates, autoregressively with unidirectional feature representation, the abstract text corresponding to the training sample's paragraph text. A preset cross-entropy loss function is applied with the supervision label of the training sample, i.e. the abstract text, to compute the cross-entropy loss of the abstract text generated by the text abstract model; when the loss value reaches a preset threshold, the model is judged to have converged and the training is terminated, otherwise another training sample is invoked to continue iterative fine-tuning of the model. The cross-entropy loss function and the preset threshold can be set flexibly by a person skilled in the art according to prior knowledge and/or experimental data.
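Fine-tuning reuses essentially the same loop, only with the clean paragraph text as the encoder input and the reference abstract text as the label; after convergence a summary can be produced autoregressively. A minimal sketch, reusing the tokenizer, model, and optimizer from the pre-training sketch above and keeping the same illustrative assumptions:

```python
# Sketch: fine-tuning on (paragraph text, abstract text) pairs and generating a summary.
def finetune_step(paragraph_text: str, abstract_text: str) -> float:
    inputs = tokenizer(paragraph_text, return_tensors="pt", truncation=True, max_length=1024)
    labels = tokenizer(abstract_text, return_tensors="pt", truncation=True,
                       max_length=128)["input_ids"]
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

def summarize(paragraph_text: str) -> str:
    inputs = tokenizer(paragraph_text, return_tensors="pt", truncation=True, max_length=1024)
    ids = model.generate(**inputs, max_length=128, num_beams=4)  # unidirectional, autoregressive decoding
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```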
As can be appreciated from the exemplary embodiments of the present application, the technical solution of the present application has various advantages, including but not limited to the following aspects:
In this application, a keyword sequence is first extracted from the non-core sentences of a paragraph text and used to train a generator suitable for producing equivalent sentences of those non-core sentences from their keyword sequences. The generator is then used to produce the equivalent sentences that replace the non-core sentences in the paragraph texts serving as training samples of the text abstract model, while part of the core sentences in the paragraph text are masked, and the text abstract model is trained on these samples. The equivalent sentences produced by the generator constitute mild noise for the training samples of the text abstract model, whereas the masked core sentences constitute severe noise. Constructing the training samples by combining mild and severe noise reasonably generalizes the characteristics of the training samples; when these samples are used to pre-train the text abstract model, the model is forced to understand the paragraph text more deeply and to accurately extract the corresponding deep semantic features, so that when generating the abstract text of a paragraph text it can produce a high-quality abstract from those accurate deep semantic features. Moreover, preprocessing the training samples with different types of noise in combination accelerates the training of the text abstract model, allowing it to reach the convergence state quickly and saving training cost.
Referring to fig. 2, in the preferred embodiment, the step S1300 of masking part of the core sentences in the paragraph text as training samples includes the following steps:
step S1301, coding sentences in the paragraph text by adopting a text coding model which is trained to be convergent in advance to obtain coding vectors corresponding to the sentences;
the pre-trained to converge text coding model can be an open-source Bert model, or a model suitable for a text coding task, which is pre-trained to converge and can be realized by a person skilled in the art.
And coding the sentences in the paragraph text by adopting the Bert model which is trained to be convergent in advance, extracting deep semantic features corresponding to each sentence, and obtaining a coding vector which expresses each sentence in a vectorization manner.
Step S1302, calculating an average value of the coding vectors corresponding to each statement to obtain a central vector;
and adding the coding vectors corresponding to the sentences by dividing the total number of the sentences in the paragraph text, calculating a corresponding average value, and taking the average value as a central vector.
Step S1303, determining a core sentence in the paragraph text according to a vector distance between the coding vector corresponding to each sentence and the center vector;
A vector similarity function is used to compute the vector distance between each sentence's encoding vector and the center vector; the vector similarity function may be cosine similarity, the Euclidean distance, the Pearson correlation coefficient, the Minkowski distance, the Mahalanobis distance, the Jaccard coefficient, or the like, any of which a person of ordinary skill in the art may adopt to compute the vector distance. In one embodiment, cosine similarity is used: the closer the value is to 1, the closer the semantics of the corresponding sentence are to the key semantics of the paragraph text; the closer the value is to 0, the greater the difference between the semantics of the corresponding sentence and the key semantics of the paragraph text. Accordingly, the sentences whose vector distance meets a preset threshold can be determined to be the core sentences of the paragraph text; the preset threshold can be set flexibly by a person skilled in the art according to prior knowledge and/or experimental data.
Step S1304, selecting a part of core sentences in the paragraph text to replace with specific covering marks according to a preset ratio.
It can be understood that neither too many nor too few core sentences should be selected; they should be selected according to a certain proportion, so as to avoid masking so many core sentences that the model cannot fit them in subsequent pre-training, or masking so few that the model's performance is affected. The recommended preset proportion is 20%, although a person skilled in the art may also set the proportion according to experimental data. In one embodiment, 20% of the core sentences in the paragraph text are randomly selected and replaced with the specific masking identifier, which may be the [mask] character, thereby masking those core sentences in the paragraph text.
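Steps S1301 through S1304 can be summarized by the short sketch below, which encodes each sentence with a sentence-embedding model, takes the mean as the center vector, treats the sentences whose cosine similarity to the center exceeds a threshold as core sentences, and masks a fixed proportion of them. The embedding checkpoint, the threshold value, and the [mask] token are illustrative assumptions standing in for the BERT encoder and thresholds of the embodiment.

```python
# Sketch: center-vector-based core-sentence detection and masking (steps S1301-S1304).
import random
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # stand-in for the BERT encoder

def mask_core_sentences(sentences, threshold=0.8, mask_ratio=0.2, mask_token="[mask]"):
    vectors = encoder.encode(sentences)                      # S1301: one vector per sentence
    center = vectors.mean(axis=0)                            # S1302: center vector
    sims = vectors @ center / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(center))
    core_idx = [i for i, s in enumerate(sims) if s >= threshold]        # S1303: core sentences
    if not core_idx:
        return list(sentences), []
    chosen = random.sample(core_idx, max(1, int(len(core_idx) * mask_ratio)))  # S1304: ~20% of them
    masked = [mask_token if i in chosen else sent for i, sent in enumerate(sentences)]
    labels = [sentences[i] for i in sorted(chosen)]
    return masked, labels
```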
In this embodiment, the center vector representing the key semantics of the paragraph text is obtained in a simple and principled manner; further, the core sentences in the paragraph text can be determined effectively according to the vector distance between each sentence's encoding vector and the center vector.
Referring to fig. 3, in the extended embodiment, the step of replacing the corresponding non-core sentence in the paragraph text in the sample data with the equivalent sentence includes the following steps:
Step S1310', selecting a non-core sentence in the context of a core sentence in the paragraph text, extracting the target keyword texts in the non-core sentence, and obtaining the position information of each target keyword text within the sentence to construct a keyword sequence, wherein the target keyword texts comprise noun texts, verb texts, subject-clause keyword texts, and attributive-clause keyword texts;
A non-core sentence in the context of a core sentence in the paragraph text, i.e. the non-core sentence adjacent and closest to the core sentence, is selected and segmented with jieba to obtain the corresponding segmented word texts. According to a preset dictionary, the segmented word texts are matched exactly against the word texts in the dictionary to obtain the part of speech of each matched word text, the parts of speech including but not limited to nouns, quantifiers, pronouns, relative words, adverbs, subordinating conjunctions, and other conjunctions. The segmented word texts whose parts of speech correspond to the target keyword texts are then determined, so that the target keyword texts are extracted from the non-core sentence. The position of each target keyword text within the non-core sentence is encoded to obtain the corresponding position information; the encoding can be implemented with any one of sine-cosine encoding, learnable encoding, and one-hot encoding, which a person skilled in the art can flexibly select and implement as needed. Further, the keyword sequence can be constructed by mapping and associating each target keyword text with its corresponding position information.
The subject-clause keyword texts may be any one or more of subordinating conjunctions such as "that" and "whether", connective pronouns such as "which", "what", "whose", and "who", and connective adverbs such as "why", "when", "where", and "how"; the attributive-clause keywords may be relative words such as "that", "which", "who", "whose", and "as".
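For the Chinese-language case, this keyword-sequence construction could be sketched with jieba's part-of-speech tagging as follows. The part-of-speech tags retained (noun tags "n*", verb tags "v*"), the use of a token's character offset as its position information, and the small clause-keyword set are simplifying assumptions standing in for the dictionary matching and position encoding described above.

```python
# Sketch: building a keyword sequence (target keyword text + position) from a non-core sentence.
import jieba.posseg as pseg

CLAUSE_KEYWORDS = {"that", "whether", "which", "what", "whose", "who",
                   "why", "when", "where", "how", "as"}  # illustrative subset

def keyword_sequence(non_core_sentence: str):
    sequence, offset = [], 0
    for word, flag in pseg.cut(non_core_sentence):
        # keep nouns ("n..."), verbs ("v...") and clause keywords matched against the dictionary
        if flag.startswith("n") or flag.startswith("v") or word.lower() in CLAUSE_KEYWORDS:
            sequence.append((word, offset))   # (target keyword text, position information)
        offset += len(word)
    return sequence
```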
Step S1320', according to the keyword sequence, generating an equivalent sentence of a non-core sentence by using a generator of an anti-neural network trained to converge in advance, so as to replace a corresponding non-core sentence in a paragraph text in the sample data.
The generator of the adversarial neural network may be a GPT-series model, and the corresponding discriminator can be implemented flexibly by those skilled in the art or by reference to the subsequent embodiments. The training process of the generator of the adversarial neural network is further disclosed in a later part of this description and is not elaborated here.
In one embodiment, the encoder of the generator encodes a keyword text vector for each target keyword text in the keyword sequence and splices it with the corresponding position information in the keyword sequence to obtain a spliced encoded vector. The keyword sequence is then used as a hard constraint, that is, the generated sentence must contain the target keyword texts in the corresponding positional order. Accordingly, the decoder of the generator decodes the spliced encoded vector in an autoregressive manner with unidirectional feature representation, generating the equivalent sentence of the non-core sentence.
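The sketch below approximates this generation step with an off-the-shelf GPT-style causal language model from the transformers library: the keyword sequence is flattened into a prompt in positional order and the hard constraint is checked on the output afterwards, rather than splicing position encodings inside the generator's encoder as described above. The checkpoint name, decoding parameters and constraint handling are assumptions of this sketch.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

CKPT = "uer/gpt2-chinese-cluecorpussmall"   # assumed GPT-series checkpoint; any causal LM could stand in
tokenizer = AutoTokenizer.from_pretrained(CKPT)
generator = AutoModelForCausalLM.from_pretrained(CKPT)

def generate_equivalent_sentence(keyword_sequence, max_new_tokens=40):
    """Decode autoregressively from the ordered keywords and verify the hard constraint."""
    ordered = sorted(keyword_sequence, key=lambda kw: kw[1])    # respect the position information
    prompt = "".join(word for word, _ in ordered)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        output_ids = generator.generate(input_ids, max_new_tokens=max_new_tokens,
                                        do_sample=True, top_p=0.9)
    sentence = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Hard constraint: the generated sentence must contain every target keyword.
    if all(word in sentence for word, _ in keyword_sequence):
        return sentence
    return None   # the caller may retry or keep the original non-core sentence
```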
In this embodiment, a rule-matching approach is first adopted so that the target keyword text in a non-core sentence can be determined quickly and accurately. Secondly, the equivalent sentence of the non-core sentence is generated by the generator of the adversarial neural network, which effectively improves the quality of the generated equivalent sentence. In addition, replacing the non-core sentence in the context of a core sentence with the equivalent sentence produced by the generator effectively increases the difficulty of understanding the paragraph text; when this is subsequently applied to the pre-training of the text abstract model, it improves the generalization ability of the model, enables the semantic features of the covered core sentences to be extracted accurately, deepens the model's understanding of the paragraph text, and helps improve the quality of the abstract text generated for the paragraph text.
Referring to fig. 4, in a further embodiment, the training process of the generator of the adversarial neural network used in step S1320' includes the following steps:
Step S1325', connecting a preset generator to a discriminator trained in advance to convergence to construct an adversarial neural network, and freezing the weights of the discriminator;
When the adversarial neural network is constructed, the preset generator is connected to the discriminator that has been trained in advance to convergence, and the weights of the discriminator are frozen by a setting operation so that the discriminator does not participate in weight updates during the subsequent training process. The preset generator may be a GPT-series model. The training of the discriminator is further disclosed in a later part of this description and is not elaborated here.
Step S1326', a single training sample is called and input to the generator to generate an equivalent sentence corresponding to the training sample, wherein the training sample is a keyword sequence of a non-core sentence corresponding to a core sentence context in a paragraph text in the sample data;
Regarding the core sentences and keyword sequences in the paragraph text, reference may be made to the foregoing embodiments of the present application, which are not repeated herein.
In one embodiment, the encoder of the generator encodes a keyword text vector for each target keyword text in the keyword sequence and splices it with the corresponding position information in the keyword sequence to obtain a spliced encoded vector. The keyword sequence is then used as a hard constraint, that is, the generated sentence must contain the target keyword texts in the corresponding positional order. Accordingly, the decoder of the generator decodes the spliced encoded vector in an autoregressive manner with unidirectional feature representation, generating the equivalent sentence corresponding to the training sample.
Step S1327', calling the discriminator to carry out classification mapping on the equivalent sentences, and mapping the equivalent sentences to a binary space to obtain classification probabilities corresponding to the mapping to a positive class space or a negative class space;
The encoder of the discriminator encodes the equivalent sentence to obtain the corresponding equivalent coding vector, performs classification mapping on the equivalent coding vector through a fully connected layer, maps it to the binary space, and predicts the classification probability of it belonging to the positive-class space or the negative-class space.
Step S1328', calculating a loss value on the assumption that the supervision label of the invoked training sample is the positive class; when the loss value reaches a preset threshold, the generator is judged to have converged and the training is terminated; otherwise, another training sample is invoked to continue iterative training of the generator.
The preset threshold value can be flexibly set by a person skilled in the art according to the prior knowledge and/or experimental data.
When the generator has not converged, back propagation is performed on the generator according to the loss value to update its gradients, and the next training sample is then invoked to continue iterative training. Once the generator is judged to have converged based on the loss value, the training may be terminated.
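In PyTorch terms, one such iteration might look like the following sketch, where generator and discriminator are assumed to be modules whose outputs can be chained and whose architectures follow the embodiments above; the function name and the use of binary cross-entropy are assumptions.

```python
import torch
import torch.nn.functional as F

def generator_train_step(generator, discriminator, keyword_batch, optimizer):
    """One adversarial update of the generator against the frozen discriminator."""
    for p in discriminator.parameters():
        p.requires_grad = False                        # the discriminator receives no weight updates

    equivalent = generator(keyword_batch)              # generated equivalent sentences (model-space tensors)
    logits = discriminator(equivalent)                  # classification mapping to the binary space
    positive = torch.ones_like(logits)                  # the supervision label is assumed to be the positive class
    loss = F.binary_cross_entropy_with_logits(logits, positive)

    optimizer.zero_grad()
    loss.backward()                                     # back-propagate to the generator only
    optimizer.step()                                    # gradient update of the generator weights
    return loss.item()
```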
Through the above process, the training of the generator within the generative adversarial network is completed. In this process, the discriminator acts as the judge: the generator is assumed to produce a qualified positive-class result, and if the quality of the equivalent sentence produced by the generator does not meet expectations, the weight parameters of the generator are corrected by a corresponding amount under the effect of the assumed supervision label, so that the actual ability of the generator is continuously and effectively improved until it finally reaches a state of contending with the discriminator.
Furthermore, as will be understood by those skilled in the art, the training of the generator and the discriminator of the adversarial neural network is carried out one side at a time: while one side is being trained, the weights of the other side are frozen, and the other side guides the training of the trained side according to its input until convergence. Training alternates between the generator and the discriminator in this way until the probability with which the discriminator judges the generator's output to be a genuine non-core sentence repeatedly approaches 0.5, that is, until Nash equilibrium is reached; at this point the generator can serve as the generator of the adversarial neural network of the present application.
In this embodiment, on the one hand, the training process of the generator of the adversarial neural network is disclosed: the generator is trained while the weights of the discriminator, which has been trained in advance to convergence, are frozen, so that after the generator is trained to convergence it has learned the ability to generate an equivalent sentence of a non-core sentence from the keyword sequence of that sentence, and the generator can subsequently be invoked to generate equivalent sentences quickly and accurately. On the other hand, it can be understood that the equivalent sentences generated by the generator are judged by the discriminator acting as a judge, which effectively improves the quality of the equivalent sentences generated by the generator and guarantees their semantic accuracy.
Referring to fig. 5, in a further embodiment, step S1320' includes the following steps, performed before the preset generator is connected to the discriminator trained in advance to convergence to construct the adversarial neural network and the weights of the discriminator are frozen:
step S1321', taking a keyword sequence of a non-core sentence in a paragraph text in the sample data as a training sample, taking the non-core sentence as a supervision label, and supervising and training a preset generator until convergence;
this step can be implemented specifically, referring to step S1200, which is not repeated herein.
Step S1322', calling the generator to generate, according to the keyword sequence of the training sample, the equivalent sentence corresponding to the non-core sentence as a training sample for training the discriminator, with the corresponding non-core sentence as the supervision label;
for the specific implementation that the call generator generates the equivalent statement corresponding to the non-core statement according to the keyword sequence of the training sample, step S1200 may be referred to, and this step is not repeated.
And taking equivalent sentences corresponding to the non-core sentences generated by the generator as training samples of the training discriminator, and taking the corresponding non-core sentences as supervision labels.
Step S1323', calling a discriminator to perform classification mapping on the training sample for training, mapping the training sample to a binary space, and obtaining the classification probability corresponding to the mapping to the positive class space or the negative class space;
The encoder of the discriminator encodes the equivalent sentence serving as the discriminator's training sample to obtain the corresponding equivalent coding vector, performs classification mapping on the equivalent coding vector through a fully connected layer, maps it to the binary space, and predicts the classification probability of it belonging to the positive-class space or the negative-class space.
And step S1324', calculating a loss value of the corresponding classification probability according to the supervision label corresponding to the training sample of the discriminator, judging that the discriminator is converged to terminate the training when the loss value reaches a preset threshold value, and otherwise calling the next training sample to continue to carry out iterative training.
The preset threshold value can be flexibly set by a person skilled in the art according to the prior knowledge and/or experimental data.
When the discriminator has not converged, back propagation is performed on the discriminator according to the loss value to update its gradients, and the next training sample is then invoked to continue iterative training. Once the discriminator is judged to have converged based on the loss value, the training is terminated.
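A corresponding sketch of a single discriminator update is shown below. It treats the generator's equivalent sentences as negative examples and the original non-core sentences as positive examples, which is one plausible reading of the supervision described above rather than the only possible one; names and the loss formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def discriminator_train_step(discriminator, generated_batch, real_batch, optimizer):
    """One supervised update of the discriminator on generated vs. real non-core sentences."""
    fake_logits = discriminator(generated_batch)         # equivalent sentences produced by the generator
    real_logits = discriminator(real_batch)               # the corresponding original non-core sentences
    loss = (F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)) +
            F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```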
Through the above process, the training of the discriminator is completed; the discriminator is then connected with the generator to construct the adversarial neural network, in which it acts as the judge of the generator and discriminates the generator's output.
In this embodiment, the training process of the discriminator is disclosed. After the discriminator is trained to convergence, it has learned the ability to judge whether an equivalent sentence generated by the generator is a genuine non-core sentence, so that once it is connected with the generator it can quickly discriminate the generator's output.
Referring to fig. 6, in a further embodiment, the step S1300 of performing pre-training on the text summarization model until convergence includes the following steps:
step S1310, an encoder of a text abstract model is called to extract deep semantic features corresponding to covered partial core sentences in a single training sample, and a first text feature vector is encoded;
The encoder of the text abstract model is called to perform bidirectional feature-representation encoding on the paragraph text of the training sample and the corresponding attention calculation, extracting the deep semantic features corresponding to the covered partial core sentences and obtaining the first text feature vector.
Step S1320, a decoder of the text abstract model is called to decode the first text feature vector, and probability distribution corresponding to mapping of each word element in the first decoded text to a preset dictionary is obtained;
Furthermore, the decoder of the text abstract model is called to perform cross-attention aggregation calculation between its hidden states and the first text feature vector output by the last layer of the encoder, and then to obtain, in an autoregressive manner with unidirectional feature representation, the probability distribution of each word element in the first decoded text mapped onto a preset dictionary. The preset dictionary can be set flexibly by those skilled in the art according to prior knowledge or experimental data.
Step S1330, calculating a loss value corresponding to the probability distribution according to the supervised label corresponding to the training sample, determining that the model converges to terminate the pre-training when the loss value reaches a preset threshold, and otherwise, calling another training sample to continue to perform iterative training on the model.
According to the supervision label corresponding to the training sample, namely the covered partial core sentences in the training sample, a preset cross-entropy loss function is applied to calculate the cross-entropy loss value corresponding to the probability distribution. When the loss value reaches a preset threshold, the model is judged to have converged and the pre-training is terminated; otherwise, another training sample is invoked to continue iterative pre-training of the model. The cross-entropy loss function and the preset threshold can be set flexibly by those skilled in the art according to prior knowledge and/or experimental data.
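Assuming the text abstract model is an encoder-decoder (seq2seq) network in the transformers style, one pre-training step could be sketched as follows: the noised paragraph is the input, the covered core sentences are the supervision label, and the cross-entropy loss over the preset dictionary is computed internally when labels are passed. Function and argument names are assumptions.

```python
def pretrain_step(model, tokenizer, noised_paragraph, covered_core_sentences, optimizer):
    """One pre-training step: reconstruct the covered core sentences from the noised paragraph."""
    inputs = tokenizer(noised_paragraph, return_tensors="pt", truncation=True)
    labels = tokenizer(covered_core_sentences, return_tensors="pt", truncation=True).input_ids
    outputs = model(**inputs, labels=labels)   # encoder plus decoder with cross-attention
    loss = outputs.loss                        # cross-entropy over the preset-dictionary distribution
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```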
In this embodiment, the process of pre-training the text abstract model to convergence is disclosed. Because the paragraph texts of the pre-training samples have undergone core-sentence masking and non-core-sentence replacement, the model deepens its understanding of the paragraph text during pre-training and can accurately extract the deep semantic features of the covered core sentences, which helps improve the quality of the abstract text generated for the paragraph text.
Referring to fig. 7, in a further embodiment, the step S1400 of performing fine tuning training on the text summarization model to converge includes the following steps:
step S1410, an encoder of a text abstract model is called to extract deep semantic features corresponding to paragraph texts of a single training sample, and a second text feature vector is encoded;
The encoder of the text abstract model is called to perform bidirectional feature-representation encoding on the paragraph text of the training sample and the corresponding attention calculation, extracting the deep semantic features corresponding to the paragraph text and obtaining the second text feature vector.
Step S1420, a decoder of the text abstract model is called to decode the second text feature vector, and probability distribution corresponding to mapping of each word element in the second decoded text to a preset dictionary is obtained;
Furthermore, the decoder of the text abstract model is called to perform cross-attention aggregation calculation between its hidden states and the second text feature vector output by the last layer of the encoder, and then to obtain, in an autoregressive manner with unidirectional feature representation, the probability distribution of each word element in the second decoded text mapped onto a preset dictionary. The preset dictionary can be set flexibly by those skilled in the art according to prior knowledge or experimental data.
Step S1430, calculating a loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample; when the loss value reaches a preset threshold, the model is judged to have converged and the fine-tuning training is terminated; otherwise, another training sample is invoked to continue iterative training of the model.
According to the supervision label corresponding to the training sample, namely the abstract text corresponding to the paragraph text, a preset cross-entropy loss function is applied to calculate the cross-entropy loss value corresponding to the probability distribution. When the loss value reaches a preset threshold, the model is judged to have converged and the fine-tuning training is terminated; otherwise, another training sample is invoked to continue iterative training of the model. The cross-entropy loss function and the preset threshold can be set flexibly by those skilled in the art according to prior knowledge and/or experimental data.
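Fine-tuning is structurally the same update with clean inputs and the abstract text as the label; a minimal sketch under the same seq2seq assumption as above, with assumed names:

```python
def finetune_step(model, tokenizer, paragraph_text, abstract_text, optimizer):
    """One fine-tuning step: generate the abstract text from the original paragraph text."""
    inputs = tokenizer(paragraph_text, return_tensors="pt", truncation=True)
    labels = tokenizer(abstract_text, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss   # cross-entropy against the abstract's word elements
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```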
In this embodiment, the process of fine-tuning the text abstract model to convergence is disclosed, so that the text abstract model learns the ability to accurately extract the deep semantic features of a paragraph text and generate the corresponding abstract text, and can subsequently be applied in actual business scenarios to generate high-quality abstract texts for long texts.
Referring to fig. 8, a text abstract model production apparatus adapted to one of the objectives of the present application is a functional embodiment of the text abstract model production method of the present application, and the apparatus includes a data acquisition module, a generator training module, a pre-training module, and a fine-tuning training module, wherein: the data acquisition module is used for acquiring a training set, wherein the training set comprises a plurality of sample data, the sample data comprises a paragraph text and a corresponding abstract text thereof, the paragraph text comprises a core sentence and a non-core sentence, and the core sentence represents key semantics of the paragraph text; the generator training module is used for taking a keyword sequence of a non-core sentence in a paragraph text in the sample data as a training sample, taking the non-core sentence as a supervision label, and performing self-supervision training on a preset generator to ensure that the generator which is trained to be converged is suitable for generating an equivalent sentence of the non-core sentence according to the keyword sequence; the pre-training module is used for replacing a corresponding non-core sentence in a paragraph text in the sample data by an equivalent sentence, covering a part of core sentences in the paragraph text as a training sample, and pre-training a text abstract model to be convergent by taking the core sentences as a supervision label; and the fine tuning training module is used for taking the paragraph texts in the sample data as training samples and taking the abstract texts corresponding to the paragraph texts as supervision labels to carry out fine tuning training on the text abstract model until convergence.
In a preferred embodiment, the pre-training module includes: the sentence coding submodule is used for coding the sentences in the paragraph text by adopting a text coding model which is trained to be convergent in advance to obtain a coding vector corresponding to each sentence; the central vector submodule is used for calculating the average value of the coding vectors corresponding to the statements to obtain a central vector; a core determining submodule, configured to determine a core sentence in the paragraph text according to a vector distance between the coding vector corresponding to each sentence and the center vector; and the text replacement sub-module is used for selecting part of core sentences in the paragraph text according to a preset proportion and replacing the part of the core sentences with the specific covering marks.
In an extended embodiment, the pre-training module includes: the text extraction submodule, used for selecting a non-core sentence corresponding to the context of a core sentence in the paragraph text, extracting the target keyword text in the non-core sentence, and obtaining the position information of the target keyword text within the sentence to construct a keyword sequence, wherein the target keyword text comprises noun text, verb text, subject-clause keyword text and attributive-clause keyword text; and the sentence generation submodule, used for generating, according to the keyword sequence, an equivalent sentence of the non-core sentence by using the generator of the adversarial neural network trained in advance to convergence, so as to replace the corresponding non-core sentence in the paragraph text of the sample data.
In a further embodiment, for training the generator of the adversarial neural network, the apparatus includes: the network construction module, used for connecting a preset generator to a discriminator trained in advance to convergence to construct the adversarial neural network, and freezing the weights of the discriminator; the first sentence compiling module, used for invoking a single training sample and inputting it to the generator to generate the equivalent sentence corresponding to the training sample, wherein the training sample is the keyword sequence of a non-core sentence corresponding to the context of a core sentence in a paragraph text of the sample data; the first classification mapping module, used for invoking the discriminator to perform classification mapping on the equivalent sentence, mapping it to a binary space to obtain the classification probability of it being mapped to the positive-class space or the negative-class space; and the first iterative training module, used for calculating a loss value on the assumption that the supervision label of the invoked training sample is the positive class, judging that the generator has converged and terminating the training when the loss value reaches a preset threshold, and otherwise invoking another training sample to continue iterative training of the generator.
In a further embodiment, before the network constructing module, the method further includes: the preset generator training module is used for taking a keyword sequence of a non-core sentence in a paragraph text in the sample data as a training sample, taking the non-core sentence as a supervision label and supervising and training a preset generator until convergence; the second sentence compiling module is used for calling the generator to generate equivalent sentences corresponding to the non-core sentences as training samples of the training discriminator according to the keyword sequence of the training samples, and the corresponding non-core sentences are used as supervision labels; the second classification mapping submodule is used for calling the discriminator to perform classification mapping on the training sample for training the discriminator and mapping the training sample to a binary space to obtain the corresponding classification probability of mapping to a positive class space or a negative class space; and the second iterative training module is used for calculating a loss value of corresponding classification probability according to the supervision label corresponding to the training sample of the discriminator, judging the discriminator to be converged when the loss value reaches a preset threshold value, and stopping training, otherwise calling the next training sample to continue to carry out iterative training.
In a further embodiment, the pre-training module includes: the first encoding submodule, used for calling the encoder of the text abstract model to extract the deep semantic features corresponding to the covered partial core sentences in a single training sample and encode the first text feature vector; the first decoding submodule, used for calling the decoder of the text abstract model to decode the first text feature vector to obtain the probability distribution of each word element in the first decoded text mapped onto a preset dictionary; and the first iterative training submodule, used for calculating the loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample, judging that the model has converged and terminating the pre-training when the loss value reaches a preset threshold, and otherwise calling another training sample to continue iterative training of the model.
In a further embodiment, the fine tuning training module includes: the second coding submodule is used for calling a coder of the text abstract model to extract deep semantic features corresponding to paragraph texts of a single training sample and coding a second text feature vector; the second decoding submodule is used for calling a decoder of the text abstract model to decode the second text feature vector to obtain probability distribution corresponding to mapping of each word element in the second decoded text to a preset dictionary; and the second iterative training submodule is used for calculating a loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample, judging that the model is converged when the loss value reaches a preset threshold value and terminating the fine tuning training, and otherwise calling another training sample to continue to carry out iterative training on the model.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Fig. 9 is a schematic diagram of the internal structure of the computer device. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer-readable storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database can store control information sequences, and the computer-readable instructions, when executed by the processor, can cause the processor to implement a text abstract model production method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform the text abstract model production method of the present application. The network interface of the computer device is used for connecting and communicating with a terminal. It will be appreciated by those skilled in the art that the configuration shown in fig. 9 is a block diagram of only a portion of the configuration associated with the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute specific functions of each module and its sub-module in fig. 8, and the memory stores program codes and various data required for executing the modules or sub-modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores program codes and data required for executing all modules/sub-modules in the text abstract model production apparatus of the present application, and the server can call the program codes and data of the server to execute the functions of all sub-modules.
The present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text abstract model production method of any of the embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed, the processes of the embodiments of the methods can be included. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
In summary, by combining sentence-level noising of the core sentences with replacement noising of the non-core sentences in the context of the core sentences, and by pre-training the text abstract model with the corrupted paragraph texts as training samples, the present application can effectively guarantee the performance of the text abstract model and enable it to generate high-quality abstract texts.
Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, various operations, methods, steps, measures, schemes in the various processes, methods, procedures that have been discussed in this application may be alternated, modified, rearranged, decomposed, combined, or eliminated. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
The foregoing is only a partial embodiment of the present application. It should be noted that, for those skilled in the art, several modifications and refinements can be made without departing from the principle of the present application, and these modifications and refinements should also be regarded as falling within the protection scope of the present application.
Claims (10)
1. A text abstract model production method is characterized by comprising the following steps:
acquiring a training set, wherein the training set comprises a plurality of sample data, the sample data comprises a paragraph text and a corresponding abstract text thereof, the paragraph text comprises a core sentence and a non-core sentence, and the core sentence represents the key semantics of the paragraph text;
taking a keyword sequence of a non-core sentence in a paragraph text in the sample data as a training sample, taking the non-core sentence as a supervision label, and performing self-supervision training on a preset generator to enable the generator from training to convergence to be suitable for generating an equivalent sentence of the non-core sentence according to the keyword sequence;
replacing a corresponding non-core sentence in a paragraph text in the sample data by an equivalent sentence, covering partial core sentences in the paragraph text as training samples, and pre-training a text abstract model to be convergent by using the core sentences as supervision labels;
and taking the paragraph text in the sample data as a training sample, taking the abstract text corresponding to the paragraph text as a supervision label, and performing fine tuning training on the text abstract model until convergence.
2. The method for producing the text abstract model of claim 1, wherein the step of masking part of the core sentences in the paragraph text as training samples comprises the following steps:
coding the sentences in the paragraph text by adopting a text coding model trained to be convergent in advance to obtain a coding vector corresponding to each sentence;
calculating the average value of the coding vectors corresponding to each statement to obtain a central vector;
determining a core statement in the paragraph text according to a vector distance between the coding vector corresponding to each statement and the center vector;
and selecting part of core sentences in the paragraph text according to a preset proportion to replace the core sentences with specific covering marks.
3. The method for generating a text abstract model according to claim 1, wherein the step of replacing the corresponding non-core sentence in the paragraph text in the sample data with the equivalent sentence comprises the steps of:
selecting a non-core sentence corresponding to the core sentence context in the paragraph text, extracting a target keyword text in the non-core sentence, obtaining position information corresponding to the target keyword text in the sentence, and constructing a keyword sequence, wherein the target keyword text comprises a noun text, a verb text, a subject clause keyword text and an attributive clause keyword text;
and generating an equivalent sentence of the non-core sentence by adopting a generator of the anti-neural network which is trained to be converged in advance according to the keyword sequence, and replacing the corresponding non-core sentence in the paragraph text in the sample data by the equivalent sentence.
4. The method for producing the text abstract model of claim 3, wherein the training process of the generator of the countermeasure neural network comprises the following steps:
accessing a preset generator to a pre-trained to converged discriminator to construct an antagonistic neural network, and freezing the weight of the discriminator;
calling a single training sample to be input into the generator to generate an equivalent sentence corresponding to the training sample, wherein the training sample is a keyword sequence of a non-core sentence corresponding to a core sentence context in a paragraph text in the sample data;
calling the discriminator to perform classification mapping on the equivalent sentences, mapping the equivalent sentences to a binary space, and obtaining classification probabilities corresponding to the mapping to a positive class space or a negative class space;
and assuming that the supervision label of the called training sample calculates a loss value for the positive type sample, judging that the generator converges when the loss value reaches a preset threshold value to terminate training, and otherwise calling another training sample to continue to carry out iterative training on the generator.
5. The method for generating text abstract model according to claim 4, wherein the method comprises the following steps before accessing a preset generator to a pre-trained to converged discriminator to construct an anti-neural network and freezing the weight of the discriminator:
taking a keyword sequence of a non-core sentence in a paragraph text in the sample data as a training sample, taking the non-core sentence as a supervision label, and supervising and training a preset generator until convergence;
calling a generator to generate equivalent sentences corresponding to the non-core sentences as training samples of a training discriminator according to the keyword sequence of the training samples, and taking the corresponding non-core sentences as supervision labels;
calling a discriminator to perform classification mapping on a training sample for training the discriminator, mapping the training sample to a binary space, and obtaining classification probability corresponding to mapping to a positive class space or a negative class space;
and calculating a loss value of the corresponding classification probability according to the supervision label corresponding to the training sample of the discriminator, judging that the discriminator converges to terminate the training when the loss value reaches a preset threshold value, and calling the next training sample to continue to carry out iterative training if the loss value does not reach the preset threshold value.
6. The method for producing a text abstract model of claim 1, wherein the pre-training of the text abstract model to converge comprises the steps of:
calling an encoder of a text abstract model to extract deep semantic features corresponding to covered partial core sentences in a single training sample, and encoding a first text feature vector;
a decoder of the text abstract model is called to decode the first text feature vector to obtain probability distribution corresponding to mapping of each word element in the first decoded text to a preset dictionary;
and calculating a loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample, judging the convergence of the model when the loss value reaches a preset threshold value, and stopping the pre-training, otherwise, calling another training sample to continuously carry out iterative training on the model.
7. The method for producing a text abstract model of claim 1, wherein the performing fine tuning training on the text abstract model to converge comprises the following steps:
a coder of a text abstract model is called to extract deep semantic features corresponding to paragraph texts of a single training sample, and a second text feature vector is coded;
a decoder of the text abstract model is called to decode the second text characteristic vector to obtain probability distribution corresponding to mapping of each word element in the second decoded text to a preset dictionary;
and calculating a loss value corresponding to the probability distribution according to the supervision label corresponding to the training sample, judging the convergence of the model when the loss value reaches a preset threshold value, and terminating the fine tuning training, otherwise, calling another training sample to continue to carry out iterative training on the model.
8. A text abstract model production device is characterized by comprising:
the data acquisition module is used for acquiring a training set, wherein the training set comprises a plurality of sample data, the sample data comprises a paragraph text and a corresponding abstract text thereof, the paragraph text comprises a core sentence and a non-core sentence, and the core sentence represents key semantics of the paragraph text;
the generator training module is used for taking a keyword sequence of a non-core sentence in a paragraph text in the sample data as a training sample, taking the non-core sentence as a supervision label, and carrying out self-supervision training on a preset generator so that the generator trained to be converged is suitable for generating an equivalent sentence of the non-core sentence according to the keyword sequence;
the pre-training module is used for replacing a corresponding non-core sentence in a paragraph text in the sample data by an equivalent sentence, covering a part of core sentences in the paragraph text as a training sample, and pre-training a text abstract model to be convergent by taking the core sentences as a supervision label;
and the fine tuning training module is used for taking the paragraph texts in the sample data as training samples and taking the abstract texts corresponding to the paragraph texts as supervision labels to carry out fine tuning training on the text abstract model until convergence.
9. A computer device comprising a central processing unit and a memory, characterized in that the central processing unit is adapted to invoke the execution of a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implemented according to the method of any one of claims 1 to 7, which, when invoked by a computer, performs the steps comprised by the corresponding method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210833616.1A CN115129819A (en) | 2022-07-14 | 2022-07-14 | Text abstract model production method and device, equipment and medium thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210833616.1A CN115129819A (en) | 2022-07-14 | 2022-07-14 | Text abstract model production method and device, equipment and medium thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115129819A true CN115129819A (en) | 2022-09-30 |
Family
ID=83384405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210833616.1A Pending CN115129819A (en) | 2022-07-14 | 2022-07-14 | Text abstract model production method and device, equipment and medium thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115129819A (en) |
2022-07-14: CN202210833616.1A patent/CN115129819A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011028638A (en) * | 2009-07-28 | 2011-02-10 | Nippon Telegr & Teleph Corp <Ntt> | Summary sentence creation apparatus, summary sentence creation method and program |
CN109885673A (en) * | 2019-02-13 | 2019-06-14 | 北京航空航天大学 | A kind of Method for Automatic Text Summarization based on pre-training language model |
CN113434642A (en) * | 2021-08-27 | 2021-09-24 | 广州云趣信息科技有限公司 | Text abstract generation method and device and electronic equipment |
CN114625866A (en) * | 2022-03-11 | 2022-06-14 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for training abstract generation model |
Non-Patent Citations (1)
Title |
---|
FANG XU; GUO YI; WANG QI; FAN ZHEN: "Seq2Seq short text summarization with core-word correction" (核心词修正的Seq2Seq短文摘要), Computer Engineering and Design (计算机工程与设计), no. 12, 16 December 2018 (2018-12-16) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117252154A (en) * | 2023-11-20 | 2023-12-19 | 北京语言大学 | Chinese simplified and complex character conversion method and system based on pre-training language model |
CN117252154B (en) * | 2023-11-20 | 2024-01-23 | 北京语言大学 | Chinese simplified and complex character conversion method and system based on pre-training language model |
CN118013962A (en) * | 2024-04-09 | 2024-05-10 | 华东交通大学 | Chinese chapter connective word recognition method based on two-way sequence generation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112084337B (en) | Training method of text classification model, text classification method and equipment | |
US11556713B2 (en) | System and method for performing a meaning search using a natural language understanding (NLU) framework | |
CN111291195B (en) | Data processing method, device, terminal and readable storage medium | |
CN111241237B (en) | Intelligent question-answer data processing method and device based on operation and maintenance service | |
CN110619051B (en) | Question sentence classification method, device, electronic equipment and storage medium | |
CN108197111A (en) | A kind of text automatic abstracting method based on fusion Semantic Clustering | |
CN109815336B (en) | Text aggregation method and system | |
CN115129819A (en) | Text abstract model production method and device, equipment and medium thereof | |
US20220300708A1 (en) | Method and device for presenting prompt information and storage medium | |
CN113178193A (en) | Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip | |
CN113836295B (en) | Text abstract extraction method, system, terminal and storage medium | |
CN114328807A (en) | Text processing method, device, equipment and storage medium | |
CN114997288B (en) | Design resource association method | |
CN113705315A (en) | Video processing method, device, equipment and storage medium | |
Teng et al. | Bidirectional tree-structured lstm with head lexicalization | |
CN114757184B (en) | Method and system for realizing knowledge question and answer in aviation field | |
CN116933774A (en) | Method for abstracting long text and device, equipment and medium thereof | |
CN113065349A (en) | Named entity recognition method based on conditional random field | |
CN110874408B (en) | Model training method, text recognition device and computing equipment | |
CN111061876A (en) | Event public opinion data analysis method and device | |
US11822887B2 (en) | Robust name matching with regularized embeddings | |
CN110309252B (en) | Natural language processing method and device | |
CN114676701B (en) | Text vector processing method, device, medium and electronic equipment | |
Yan et al. | Adversarial Multi-task Learning for Efficient Chinese Named Entity Recognition | |
CN115203400A (en) | Method, device and medium for generating title abstract of commodity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |