CN116306599B - Faithfulness optimization method, system, equipment and storage medium based on generated text - Google Patents


Info

Publication number
CN116306599B
CN116306599B (application CN202310580415.XA)
Authority
CN
China
Prior art keywords
text
entity
input
training
errors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310580415.XA
Other languages
Chinese (zh)
Other versions
CN116306599A (English)
Inventor
贾国庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mido Technology Co ltd
Original Assignee
Shanghai Mdata Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mdata Information Technology Co ltd filed Critical Shanghai Mdata Information Technology Co ltd
Priority to CN202310580415.XA priority Critical patent/CN116306599B/en
Publication of CN116306599A publication Critical patent/CN116306599A/en
Application granted granted Critical
Publication of CN116306599B publication Critical patent/CN116306599B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application provides a faithfulness optimization method, system, equipment and storage medium based on generated text, wherein the method comprises the following steps: acquiring an input text; performing text error correction on the input text to repair text errors in it; extracting the entity content in the corrected input text and replacing the entity content with identification information; converting the identifier-processed input text into a generated text according to a preset text generation task; and, in response to identification information existing in the generated text, converting the identification information into its corresponding entity content to obtain the final generated text. The application reduces both the phenomenon of the generated text being unfaithful to the original text and the phenomenon of entities in the generated text being inconsistent with those in the original text.

Description

Faithfulness optimization method, system, equipment and storage medium based on generated text
Technical Field
The application belongs to the technical field of text processing and relates to a processing method for generated text, and in particular to a faithfulness optimization method, system, equipment and storage medium based on generated text.
Background
At present, text generation is generally performed with a language model, but any language model may produce unfaithful text during generation. For example, in conditional text generation tasks such as text summarization or article generation, the faithfulness of the generated text is sometimes subject to high requirements. The reasons for unfaithful text include: (1) errors in the input text; (2) the language model learning erroneous information during the pre-training stage; and (3) the model producing text errors during the decoding stage.
While training larger language models can alleviate these problems, it requires substantial computational and data resources and does not completely solve the faithfulness problem. Alternatively, running inference multiple times and selecting the generated text most similar to the original by ROUGE or BLEU score can reduce the probability of erroneous text to some extent, but it cannot completely resolve clear entity errors, and it consumes a great deal of time in the inference stage.
Disclosure of Invention
The application aims to provide a faithfulness optimization method, system, equipment and storage medium based on generated text, to solve the problem of low faithfulness in generated text.
A first aspect of an embodiment of the present application provides a faithfulness optimization method based on generated text, the method including: acquiring an input text; performing text error correction on the input text to repair text errors in it; extracting the entity content in the corrected input text and replacing the entity content with identification information; converting the identifier-processed input text into a generated text according to a preset text generation task; and, in response to identification information existing in the generated text, converting the identification information into its corresponding entity content to obtain the final generated text.
In an implementation manner of the first aspect, the step of performing text error correction on the input text and repairing text errors in the input text includes: detecting and repairing spelling errors, grammar errors and/or punctuation errors in the input text through a pre-trained text error correction model.
In one implementation manner of the first aspect, the training process of the text error correction model includes: acquiring correct texts for training; generating error texts with spelling errors, grammar errors and/or punctuation errors from the correct texts; constructing a plurality of text pairs from the correct and error texts; and, for each text pair, inputting the error text to the text error correction model so that the model learns to output the correct text.
In an implementation manner of the first aspect, the step of extracting, from the corrected input text, the entity content and replacing the entity content with identification information includes: extracting the entity content in the input text through a pre-trained entity recognition model and replacing the entity content with identification information; the entity content includes persons, times and/or places.
In one implementation manner of the first aspect, the training process of the entity recognition model includes: acquiring training texts containing entity content; extracting and numbering the entity content in the training text, and setting a corresponding identifier for each piece of entity content; and adding the corresponding identifier before each piece of entity content in the input training text, then inputting the text into the entity recognition model, so that all entity content in the model's output text is replaced by the corresponding identifiers.
In an implementation manner of the first aspect, the step of converting the identifier-processed input text into a generated text according to the preset text generation task includes: using a pre-trained text generation model to convert the identifier-processed input text into a generated text according to the preset text generation task; the generated text includes a text summary or an article satisfying preset conditions.
In one implementation manner of the first aspect, the training process of the text generation model includes: acquiring training texts containing entity content; replacing the entity content in the training text with preset identifiers; and, when training text with identifiers is input into the text generation model for training, having the text generation model generate output text in which the entity content is replaced by the identifiers.
A second aspect of an embodiment of the present application provides a faithfulness optimization system based on generated text, the system comprising: a text acquisition module configured to acquire an input text; a text error correction module configured to perform text error correction on the input text and repair text errors in it; an entity recognition module configured to extract the entity content in the corrected input text and replace the entity content with identification information; a text generation module configured to convert the identifier-processed input text into a generated text according to a preset text generation task; and a text conversion module configured to, in response to identification information existing in the generated text, convert the identification information into its corresponding entity content to obtain the final generated text.
A third aspect of an embodiment of the present application provides an electronic device, including: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so as to enable the electronic device to execute the method.
A fourth aspect of the embodiments of the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method.
As described above, the faithfulness optimization method, system, equipment and storage medium based on generated text of the present application have the following beneficial effects:
The application provides a scheme for improving the faithfulness of generated text that neither consumes large amounts of training resources nor sacrifices much efficiency in the inference stage, and it can significantly reduce common entity errors in generated text. Through the combination of the text error correction model, the entity recognition model and the text generation model, the problems of errors in the input text, the language model learning erroneous information during pre-training, and the model producing text errors during decoding are effectively addressed.
Drawings
Fig. 1 shows an application scenario schematic diagram of a faithfulness optimization method based on generated text according to an embodiment of the present application.
Fig. 2 shows a schematic flow chart of a faithfulness optimization method based on generated text according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of a faithfulness optimization method based on generated text according to an embodiment of the application.
Fig. 4 shows a schematic structural diagram of a faithfulness optimization system based on generated text according to an embodiment of the present application.
Fig. 5 is a schematic structural connection diagram of an electronic device according to an embodiment of the application.
Description of element reference numerals
4—faithfulness optimization system based on generated text; 41—text acquisition module; 42—text error correction module; 43—entity recognition module; 44—text generation module; 45—text conversion module; 5—electronic device; 51—processor; 52—memory; 53—communication interface; 54—system bus; S21–S25—method steps.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or carried out in other, different embodiments, and the details in this description may be modified or varied in various respects without departing from the spirit and scope of the present application. It should be noted that, absent conflict, the following embodiments and the features within them may be combined with each other.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present application by way of illustration, and only the components related to the present application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
The following embodiments of the present application provide a faithfulness optimization method, system, equipment and storage medium based on generated text, applicable to, but not limited to, electronic devices; the description below takes this hardware application scenario as an example.
Referring to fig. 1, an application scenario diagram of a faithfulness optimization method based on generated text according to an embodiment of the present application is shown. As shown in fig. 1, the present embodiment provides a hardware application scenario for the faithfulness optimization method, which specifically includes: an electronic device. The electronic device uses the Python scripting language and the PyTorch deep learning framework. The input text is transmitted to the electronic device, which executes the faithfulness optimization method based on generated text and outputs a generated text with higher faithfulness.
Wherein the electronic device may be, for example, a computer comprising all or part of the components of a memory, a memory controller, one or more processing units (CPUs), a peripheral interface, RF circuitry, audio circuitry, speakers, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and an external port, etc.; the computer includes, but is not limited to, a personal computer such as a desktop computer, a notebook computer, a tablet computer, a smart phone, a personal digital assistant (Personal Digital Assistant, PDA for short), and the like. In other embodiments, the electronic device may also be a server, where the server may be disposed on one or more entity servers according to multiple factors such as functions, loads, and the like, and may also be a cloud server formed by a distributed or centralized server cluster, which is not limited in this embodiment.
The following describes the technical solution in the embodiment of the present application in detail with reference to the drawings in the embodiment of the present application.
Referring to fig. 2, a schematic flow chart of a faithfulness optimization method based on generated text according to an embodiment of the present application is shown. As shown in fig. 2, the present embodiment provides a faithfulness optimization method based on generated text, which specifically includes the following steps:
s21, acquiring an input text.
S22, performing text correction on the input text, and repairing text errors in the input text.
In one embodiment, the step of performing text correction on the input text and repairing text errors in the input text includes: and detecting and repairing spelling errors, grammar errors and/or punctuation errors existing in the input text through a pre-trained text error correction model.
In one embodiment, the training process of the text error correction model includes:
(1) The correct text for training is obtained.
Specifically, the correct text is "when encountering adversity, we must face it bravely".
(2) Generating error text with spelling errors, grammar errors and/or punctuation errors by using the correct text.
In particular, the error text may be "when encountering adversety, we must face it bravely", where "adversety" stands for a wrongly written word (a homophone error in the original Chinese).
(3) And constructing a plurality of text pairs through the correct text and the error text.
In particular, a text pair is constructed from the correct text "when encountering adversity, we must face it bravely" and the error text "when encountering adversety, we must face it bravely".
(4) Based on each text pair, a wrong text is input to the text correction model such that the text correction model outputs a correct text.
In particular, the error text "when encountering adversety, we must face it bravely" is input to the text error correction model, and the model outputs the correct text "when encountering adversity, we must face it bravely".
In practical applications, the text error correction model is a neural-network-based model which, by training on a large amount of data, can detect and repair errors in the input text. The text error correction model may be a Soft-Masked BERT model, a GECToR model, etc., or may be a model built from conventional rules for correcting errors in text.
Thus, a text error correction model that has been trained can help correct spelling errors, grammar errors, punctuation errors, etc. present in the input text.
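The text-pair construction of steps (1)–(4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the word-level confusion set and the function names are invented for the example, and a real system would also inject grammar and punctuation errors.

```python
def make_noisy(text, confusion):
    """Create an erroneous version of a correct sentence by swapping the
    first word found in a confusion set for its commonly confused form
    (simulating a spelling error)."""
    words = text.split()
    for i, word in enumerate(words):
        if word in confusion:
            words[i] = confusion[word]
            break
    return " ".join(words)

def build_text_pairs(correct_texts, confusion):
    """Construct (error text, correct text) pairs: the error text is the
    model input and the correct text is the training target."""
    return [(make_noisy(t, confusion), t) for t in correct_texts]

# Hypothetical confusion set mirroring the wrongly-written-word example
pairs = build_text_pairs(["we must face adversity bravely"],
                         {"adversity": "adversety"})
```

Training the correction model then amounts to supervised sequence-to-sequence learning over such pairs.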
S23, extracting the entity content in the corrected input text, and replacing the entity content with identification information. In this way, the application can greatly reduce entity errors in the generated text and improve its faithfulness without consuming excessive computing resources.
In an embodiment, the step of extracting the entity content in the corrected input text and replacing the entity content with identification information includes: extracting the entity content in the input text through a pre-trained entity recognition model and replacing the entity content with identification information; the entity content includes persons, times and/or places.
The entity recognition model is also a neural-network-based model which, trained on a large number of texts with entity labels, can detect and extract entity information in the input text. In implementation, a training set needs to be built to give the model the ability to identify entities. During inference in a concrete application, the input text is passed into the model, which automatically extracts the entity information and assigns each entity a number.
In one embodiment, the training process of the entity recognition model includes:
(1) Training text containing physical content is obtained.
Specifically, the training text is, for example, "What happened to Liu in the past three years".
(2) And extracting entity contents in the training text for numbering, and setting a corresponding identifier for each entity content.
Specifically, information such as persons, times and places in the input text is extracted and numbered, e.g. "time 1", "place 1", etc., and a corresponding identifier, i.e. a special token, is designed for each. In practice, entities may be numbered in order of extraction, or according to a numbering rule based on the entity's attributes, name, etc. From the training text above, the extractable entity content includes the person name "Liu" and the time "the past three years".
(3) And adding a corresponding identifier in front of the entity content in the training text to be input, and inputting the entity content into the entity recognition model, so that all entity content in the output text of the entity recognition model is replaced by the corresponding identifier.
Specifically, during model training, the corresponding special token is added before each entity in the input text, e.g. "[time1] March 7, 2023", and the corresponding entity in the output text is replaced by the special token, i.e. "March 7, 2023" is replaced by "[time1]".
Further, a vocabulary is designed to convert each token into a corresponding id representation. Say the sub-tokens of "hello" map to the ids 1, 3 and 2; then "hello" can be converted into the id sequence "1 3 2" and fed to the entity recognition model for training.
A special token is a vocabulary token with a special meaning: say "[MASK]" replaces a masked word, while "[CLS]" and "[SEP]" mark the beginning and end of a sentence, respectively. For example, the sentence "hello" with beginning and end markers added (assuming "[CLS]" and "[SEP]" have the ids 4 and 5) is converted to the id sequence "4 1 3 2 5", i.e. "[CLS] hello [SEP]".
The information of each entity is represented by adding its special token to the vocabulary. This can be done either by expanding the vocabulary, or by replacing one of the tokens identified as "[unused]" in the vocabulary with the special token to be added; these "[unused]" tokens are reserved precisely for vocabulary expansion.
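The id conversion and the "[unused]"-slot replacement described above can be sketched with a toy vocabulary. The ids, sub-tokens and helper names are illustrative assumptions, not the patent's actual vocabulary.

```python
def encode(tokens, vocab):
    """Convert tokens to ids and wrap them with the [CLS]/[SEP]
    sentence-boundary markers, as in the example above."""
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens] + [vocab["[SEP]"]]

def register_special_token(vocab, token):
    """Give a new special token (e.g. "[time1]") the id of the first free
    "[unusedN]" slot; only expand the vocabulary if no slot is free."""
    slot = next((name for name in vocab if name.startswith("[unused")), None)
    if slot is None:
        idx = max(vocab.values()) + 1   # no free slot: expand the vocabulary
    else:
        idx = vocab.pop(slot)           # reuse the reserved slot's id
    vocab[token] = idx
    return idx

# Toy vocabulary mirroring the "hello" -> "4 1 3 2 5" example
vocab = {"[CLS]": 4, "[SEP]": 5, "he": 1, "ll": 3, "o": 2, "[unused1]": 6}
ids = encode(["he", "ll", "o"], vocab)                # [4, 1, 3, 2, 5]
time_id = register_special_token(vocab, "[time1]")    # reuses id 6
```

Reusing "[unused]" slots keeps the embedding matrix the same size, which is why it saves resources compared with growing the vocabulary.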
In practical applications, the entity recognition model may be an NER model, and NER training can take many forms, including sequence labeling, pointer networks, machine reading comprehension, and the like. NER models include the BERT-CRF model, the GPLinker model, the PURE model, etc. The training and application of the entity recognition model thus saves resources.
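The replacement step can be sketched as follows, with the NER model's output approximated by a span list passed in by the caller (an assumption for illustration; the real system extracts the spans with a trained model such as BERT-CRF):

```python
def replace_entities(text, spans):
    """Replace each extracted entity span with a numbered special token.
    `spans` maps entity text -> entity type ("object", "time", "place"...).
    Returns the masked text and the entity number table (token -> entity)."""
    table = {}
    counters = {}
    for span, etype in spans.items():
        counters[etype] = counters.get(etype, 0) + 1
        token = "[%s%d]" % (etype, counters[etype])
        table[token] = span
        text = text.replace(span, token)
    return text, table

masked, table = replace_entities(
    "this year XX white spirit is selling fast",
    {"XX white spirit": "object", "this year": "time"})
# masked == "[time1] [object1] is selling fast"
```

The returned table is the "entity number table" that the later decoding stage consults.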
S24, converting the identifier-processed input text into a generated text according to a preset text generation task. This adds no extra inference time, so text can be generated more efficiently.
In an embodiment, the step of converting the identifier-processed input text into a generated text according to the preset text generation task includes: using a pre-trained text generation model to perform the conversion according to the preset text generation task; the generated text includes a text summary or an article satisfying preset conditions. The preset text generation task may be a text summary or outline generation task, or another article generation task meeting preset conditions. The preset conditions may be a preset format, preset paragraphs, preset fields, a preset description object or preset content, etc.
During text generation, to represent entity information consistently across training and inference, each piece of entity information is mapped to a corresponding special token, such as "[time1]", "[place1]", and so on.
In one embodiment, the training process of the text generation model includes:
(1) Training text containing physical content is obtained.
(2) And replacing the entity content in the training text with a preset identifier.
(3) When training text with identifiers is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifiers.
Thus, when the text generation model is trained, the entity information in the input text is replaced by special tokens, and the entity information in the output text is also replaced by special tokens.
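The training-data preparation in steps (1)–(3) can be sketched as building a (masked input, masked target) pair, where the same identifiers stand in for the entities on both sides (the table and the sentences below are illustrative):

```python
def make_generation_pair(source, target, table):
    """Build a text-generation training pair: mask the entity content in
    both the input text and the target text with its special token."""
    for token, entity in table.items():
        source = source.replace(entity, token)
        target = target.replace(entity, token)
    return source, target

table = {"[time1]": "this year", "[object1]": "XX white spirit"}
pair = make_generation_pair(
    "this year XX white spirit pre-sale is hot",
    "why is this year XX white spirit so hot",
    table)
# pair == ("[time1] [object1] pre-sale is hot", "why is [time1] [object1] so hot")
```

Because the model only ever sees the tokens, it cannot corrupt the entity strings themselves during decoding.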
S25, in response to identification information existing in the generated text, converting the identification information into its corresponding entity content to obtain the final generated text.
Specifically, during inference with the text generation model, whenever a special token is generated, the corresponding entity information can be retrieved directly by the token's number and filled into the generated text. In practice, the corresponding entity is looked up, by the generated special token, in a dictionary representing the correspondence.
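Step S25 amounts to a dictionary lookup over the generated text; a minimal sketch (the token pattern below is an assumption covering identifier forms like "[time1]" and "[object1]"):

```python
import re

def restore_entities(generated, table):
    """Replace every special-token identifier in the generated text with
    the entity retrieved from the entity number table; identifiers with
    no table entry are left untouched."""
    pattern = re.compile(r"\[[a-z]+\d+\]")
    return pattern.sub(lambda m: table.get(m.group(0), m.group(0)), generated)

text = restore_entities(
    "why is [time1] [object1] so hot",
    {"[time1]": "this year", "[object1]": "XX white spirit"})
# text == "why is this year XX white spirit so hot"
```

Leaving unknown tokens untouched makes missing table entries visible rather than silently dropped.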
Referring to fig. 3, a flow chart of a faithfulness optimization method based on generated text according to an embodiment of the application is shown. As shown in fig. 3, the input text first passes through the text error correction model, which repairs the text errors present in it. The NER model then extracts entity information from the corrected input text and adds special-token identifiers before the entities, yielding an entity number table. The input text with entity tokens added is input into the text generation model; during its decoding stage, whenever a special-token identifier appears, the subsequent generated content is replaced with the corresponding entity from the table, finally producing the generated text.
Taking the generation of a text summary as an example, and referring to FIG. 3, the input text is: "XX white spirit hot pre-sale has launched! The XX white spirit of this year has attracted particular interest. Why? Because the wine industry is once again challenging its limits. They not only incorporate designs with different imagery, but also integrate a certain lucky culture, are sealed and stored for more than twenty years, and adopt domestically leading technology, so this year's XX white spirit has all the more collection value and investment potential. So if you want to taste a special wine, seize the moment and click into the pre-sale." The target generated text is "Why is this year's XX white spirit pre-selling so hot?"
First, the text error correction model corrects the errors in the input text; for example, the wrongly written particle in "hot pre-sale" is repaired (in the original Chinese, the character "拉" is corrected to "啦").
Then the NER model extracts the entities, producing entity relation pairs: "XX white spirit" — entity 1, "[object1]"; "this year" — time 1, "[time1]"; and so on, yielding an entity number table. At the same time, the corresponding entity tokens are added to the original data, i.e. "[object1] XX white spirit hot pre-sale has launched! The [time1] [object1] XX white spirit of this year has attracted particular interest. Why? Because the wine industry is once again challenging its limits …", and the target output text is modified to "Why is [time1] [object1] pre-selling so hot?"
During the inference stage of the text generation model, if a special token is generated, say "[time1]", the entity number table is automatically queried for [time1] and the entity "this year" is found; likewise, for "[object1]" the table is automatically queried for [object1], the entity "XX white spirit" is found, and inference then continues. Such filling is performed every time a special token appears, finally yielding the generated text "Why is this year's XX white spirit pre-selling so hot?"
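The full flow of Fig. 3 composes the three models; a sketch with stub callables standing in for the trained models (the stubs, not the patent's models, are what actually run here):

```python
def faithfulness_pipeline(text, correct_fn, extract_fn, generate_fn):
    """Fig. 3 end to end: repair errors, mask entities into an entity
    number table, generate, then restore entities from the table."""
    text = correct_fn(text)
    masked, table = extract_fn(text)
    generated = generate_fn(masked)
    for token, entity in table.items():
        generated = generated.replace(token, entity)
    return generated

# Stub models for illustration only
correct = lambda t: t.replace("pre-sale laa", "pre-sale la")
extract = lambda t: (t.replace("XX white spirit", "[object1]"),
                     {"[object1]": "XX white spirit"})
generate = lambda t: "why is [object1] so hot"

out = faithfulness_pipeline("XX white spirit pre-sale laa",
                            correct, extract, generate)
# out == "why is XX white spirit so hot"
```

In deployment, each stub would be replaced by the corresponding trained model's inference call.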
Furthermore, a large amount of debugging and optimization can be carried out on the trained text error correction model and entity recognition model to ensure the system works properly in various scenarios. Once debugging and optimization are complete, the method can be applied to practical text generation tasks that require the generated text to be faithful to the original, such as document manuscript generation and product description generation. Because different scenarios emphasize different content, the text error correction model and the NER model need to be adjusted accordingly. For example, in one scenario a customer may require that place names never be wrong; the text error correction model must then be adjusted to prevent place names from being "corrected" into errors, and the NER model adjusted to ensure place names are extracted. Such scenario-specific needs are met through the debugging and optimization.
The protection scope of the faithfulness optimization method based on generated text according to the embodiments of the present application is not limited to the execution order of the steps listed in this embodiment; all schemes realized by adding or removing steps or replacing steps according to the prior art under the principles of the present application are included in the protection scope of the present application.
The embodiment of the application also provides a faithfulness optimization system based on generated text, which can implement the faithfulness optimization method described above; however, the implementing apparatus of the method includes, but is not limited to, the structure of the system listed in this embodiment, and all structural variations and substitutions of the prior art made according to the principles of the application are included in the protection scope of the application.
Referring now to FIG. 4, a schematic structural diagram of a faithfulness optimization system based on generated text according to an embodiment of the present application is shown. As shown in fig. 4, this embodiment provides a faithfulness optimization system 4 based on generated text, specifically including: a text acquisition module 41, a text error correction module 42, an entity recognition module 43, a text generation module 44, and a text conversion module 45.
The text obtaining module 41 is configured to obtain input text.
The text error correction module 42 is configured to perform text error correction on the input text, repairing text errors in the input text.
In one embodiment, the text error correction module 42 is specifically configured to detect and repair spelling errors, grammar errors, and/or punctuation errors present in the input text via a pre-trained text error correction model.
Specifically, the training process of the text error correction model includes: acquiring correct texts for training; generating error texts with spelling errors, grammar errors and/or punctuation errors from the correct texts; constructing a plurality of text pairs from the correct and error texts; and, for each text pair, inputting the error text to the text error correction model so that the model learns to output the correct text.
The entity recognition module 43 is configured to extract, for the text-corrected input text, entity contents in the input text, and replace the entity contents with identification information.
In one embodiment, the entity recognition module 43 is specifically configured to extract the entity content in the input text through a pre-trained entity recognition model, and replace the entity content with the identification information; the entity content includes persons, times and/or places.
Specifically, the training process of the entity recognition model comprises the following steps: acquiring training texts containing entity contents; extracting and numbering the entity contents in the training text, and setting a corresponding identifier for each entity content; and adding the corresponding identifier before each entity content in the training text to be input, then inputting the text into the entity recognition model, so that all entity contents in the model's output text are replaced by their corresponding identifiers.
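A minimal sketch of the numbering-and-replacement step follows, assuming the entities have already been extracted (here passed in as a plain list rather than produced by a trained recognizer). The bracketed `[E1]`-style identifier format and the `number_entities` name are hypothetical choices for illustration.

```python
def number_entities(text, entities):
    """Assign identifiers [E1], [E2], ... to each extracted entity and
    return the identifier-to-entity map together with the text in which
    every entity occurrence is replaced by its identifier."""
    mapping = {f"[E{i}]": ent for i, ent in enumerate(entities, start=1)}
    for ident, ent in mapping.items():
        text = text.replace(ent, ident)
    return mapping, text
```

The returned map is retained so that identifiers appearing in later generated text can be converted back to their original entity content.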
The text generation module 44 is configured to convert the identifier-processed input text into generated text according to a preset text generation task.
In one embodiment, the text generation module 44 is specifically configured to convert the identifier-processed input text into generated text according to a preset text generation task by using a pre-trained text generation model; the generated text includes a text abstract or an article satisfying a preset condition.
Specifically, the training process of the text generation model comprises the following steps: acquiring training texts containing entity contents; replacing the entity content in the training text with a preset identifier; and, when the training text with the identifier is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifier.
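The masking step in this training process can be sketched as follows, applying the same identifier map to both the source text and the training target (e.g. a reference abstract) so the generation model learns to copy identifiers through unchanged instead of paraphrasing entity content. The function name and the map format are assumptions for illustration.

```python
def mask_for_generation(source: str, target: str, entity_map: dict):
    """Replace every entity in both the training source and its target
    with the preset identifier, yielding a masked training pair for the
    text generation model."""
    for ident, ent in entity_map.items():
        source = source.replace(ent, ident)
        target = target.replace(ent, ident)
    return source, target
```

Because entities never appear verbatim in the training pairs, the generation model cannot hallucinate variants of them; faithfulness is preserved by restoring the identifiers afterwards.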
The text conversion module 45 is configured to convert the identification information into entity content corresponding to the identification information in response to the existence of the identification information in the generated text, so as to obtain a final generated text.
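The conversion performed by this module can be sketched as a single regular-expression substitution over the generated text, assuming a bracketed identifier format such as `[E1]` (a hypothetical format, not specified by the patent). Unknown identifiers are left untouched rather than guessed.

```python
import re

def restore_entities(generated: str, entity_map: dict) -> str:
    """Substitute each identifier found in the generated text with its
    original entity content, producing the final generated text."""
    return re.sub(
        r"\[E\d+\]",
        lambda m: entity_map.get(m.group(0), m.group(0)),  # keep unknown ids
        generated,
    )
```

If the generated text contains no identification information, the substitution is a no-op and the generated text is returned as the final result directly.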
In the several embodiments provided by the present application, it should be understood that the disclosed system or method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of modules/units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple modules or units may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules or units, which may be in electrical, mechanical or other forms.
The modules/units illustrated as separate components may or may not be physically separate, and components shown as modules/units may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules/units may be selected according to actual needs to achieve the objectives of the embodiments of the present application. For example, functional modules/units in various embodiments of the application may be integrated into one processing module, or each module/unit may exist alone physically, or two or more modules/units may be integrated into one module/unit.
Those of ordinary skill in the art will further appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate this interchangeability of hardware and software, the elements and steps of the examples have been described above generally in terms of their function. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 5 is a schematic diagram showing the structural connection of an electronic device according to an embodiment of the application. As shown in fig. 5, the electronic device 5 of the present application includes: a processor 51, a memory 52, a communication interface 53, and/or a system bus 54. The memory 52 and the communication interface 53 are connected to the processor 51 via the system bus 54 and communicate with each other; the memory 52 is configured to store a computer program, the communication interface 53 is configured to communicate with other devices, and the processor 51 is configured to run the computer program to cause the electronic device 5 to perform the steps of the faithfulness optimization method based on generated text.
The processor 51 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processing, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 52 may include a random access memory (Random Access Memory, simply referred to as RAM), and may further include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory.
The system bus 54 mentioned above may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The system bus 54 may be divided into an address bus, a data bus, a control bus, and the like. The communication interface is used for realizing communication between the database access device and other devices (such as a client, a read-write library and a read-only library).
The embodiment of the application also provides a computer readable storage medium. Those of ordinary skill in the art will appreciate that all or part of the steps in the method implementing the above embodiments may be implemented by a program to instruct a processor, where the program may be stored in a computer readable storage medium, where the storage medium is a non-transitory (non-transitory) medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof. The storage media may be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The descriptions of the processes or structures corresponding to the drawings each have their own emphasis; for parts of a certain process or structure that are not described in detail, reference may be made to the descriptions of the other processes or structures.
The above embodiments merely illustrate the principles of the present application and its effects, and are not intended to limit the application. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present application.

Claims (4)

1. A faithfulness optimization method based on generated text, the method comprising:
acquiring an input text;
performing text error correction on an input text, and repairing text errors in the input text;
extracting entity content from the text-corrected input text, and replacing the entity content with identification information;
according to a preset text generation task, converting the input text after identification processing into a generated text;
responding to the identification information in the generated text, and converting the identification information into entity content corresponding to the identification information to obtain a final generated text;
the step of extracting the entity content from the text-corrected input text and replacing the entity content with identification information comprises the following steps:
extracting the entity contents in the input text through a pre-trained entity recognition model, and replacing the entity contents with identification information; the entity content includes persons, times and/or places;
the training process of the entity recognition model comprises the following steps:
acquiring training texts containing entity contents;
extracting and numbering the entity contents in the training text, and setting a corresponding identifier for each entity content;
adding the corresponding identifier before each entity content in the training text to be input, and inputting the text into the entity recognition model, so that all entity contents in the output text of the entity recognition model are replaced by their corresponding identifiers;
the step of performing text correction on the input text and repairing text errors in the input text comprises the following steps:
detecting and repairing spelling errors, grammar errors and/or punctuation errors existing in the input text through a pre-trained text error correction model;
the training process of the text error correction model comprises the following steps:
acquiring a correct text for training;
generating error text with spelling errors, grammar errors and/or punctuation errors by using the correct text;
constructing a plurality of text pairs through the correct text and the error text;
inputting error text into the text correction model based on each text pair, so that the text correction model outputs correct text;
the step of converting the input text after the identification processing into the generated text according to the preset text generation task comprises the following steps:
converting the identifier-processed input text into generated text according to the preset text generation task by utilizing a pre-trained text generation model; the generated text comprises a text abstract or an article meeting preset conditions;
the training process of the text generation model comprises the following steps:
acquiring training texts containing entity contents;
replacing entity content in the training text with a preset identifier;
when the training text with the identifier is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifier.
2. A faithfulness optimization system based on generated text, the system comprising:
a text acquisition module configured to acquire an input text;
the text error correction module is configured to perform text error correction on an input text and repair text errors in the input text; performing text correction on the input text, and repairing text errors in the input text comprises:
detecting and repairing spelling errors, grammar errors and/or punctuation errors existing in the input text through a pre-trained text error correction model;
the training process of the text error correction model comprises the following steps:
acquiring a correct text for training;
generating error text with spelling errors, grammar errors and/or punctuation errors by using the correct text;
constructing a plurality of text pairs through the correct text and the error text;
inputting error text into the text correction model based on each text pair, so that the text correction model outputs correct text;
the entity recognition module is configured to extract entity content from the text-corrected input text, and replace the entity content with identification information; the step of extracting the entity content from the text-corrected input text and replacing the entity content with identification information comprises the following steps:
extracting the entity contents in the input text through a pre-trained entity recognition model, and replacing the entity contents with identification information; the entity content includes persons, times and/or places;
the training process of the entity recognition model comprises the following steps:
acquiring training texts containing entity contents;
extracting and numbering the entity contents in the training text, and setting a corresponding identifier for each entity content;
adding the corresponding identifier before each entity content in the training text to be input, and inputting the text into the entity recognition model, so that all entity contents in the output text of the entity recognition model are replaced by their corresponding identifiers;
the text generation module is configured to convert the identifier-processed input text into generated text according to a preset text generation task; the step of converting the identifier-processed input text into generated text according to the preset text generation task comprises the following steps:
converting the identifier-processed input text into generated text according to the preset text generation task by utilizing a pre-trained text generation model; the generated text comprises a text abstract or an article meeting preset conditions;
the training process of the text generation model comprises the following steps:
acquiring training texts containing entity contents;
replacing entity content in the training text with a preset identifier;
when the training text with the identifier is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifier;
and the text conversion module is configured to respond to the existence of the identification information in the generated text, and convert the identification information into entity content corresponding to the identification information to obtain a final generated text.
3. An electronic device, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to cause the electronic device to perform the method of claim 1.
4. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of claim 1.
CN202310580415.XA 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text Active CN116306599B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310580415.XA CN116306599B (en) 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310580415.XA CN116306599B (en) 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text

Publications (2)

Publication Number Publication Date
CN116306599A CN116306599A (en) 2023-06-23
CN116306599B true CN116306599B (en) 2023-09-08

Family

ID=86824319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310580415.XA Active CN116306599B (en) 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text

Country Status (1)

Country Link
CN (1) CN116306599B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717031A (en) * 2019-10-15 2020-01-21 南京摄星智能科技有限公司 Intelligent conference summary generation method and system
CN111460827A (en) * 2020-04-01 2020-07-28 北京爱咔咔信息技术有限公司 Text information processing method, system, equipment and computer readable storage medium
CN112185520A (en) * 2020-09-27 2021-01-05 志诺维思(北京)基因科技有限公司 Text structured processing system and method for medical pathology report picture
CN114036930A (en) * 2021-10-28 2022-02-11 北京明略昭辉科技有限公司 Text error correction method, device, equipment and computer readable medium
WO2022095563A1 (en) * 2020-11-06 2022-05-12 北京世纪好未来教育科技有限公司 Text error correction adaptation method and apparatus, and electronic device, and storage medium
CN115062104A (en) * 2022-05-17 2022-09-16 北京理工大学 Knowledge prompt-fused legal text small sample named entity identification method
CN115630632A (en) * 2022-09-29 2023-01-20 北京蜜度信息技术有限公司 Method, system, medium and terminal for correcting personal name in specific field based on context semantics
CN115965009A (en) * 2022-12-23 2023-04-14 中国联合网络通信集团有限公司 Training and text error correction method and device for text error correction model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10762293B2 (en) * 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US11429790B2 (en) * 2019-09-25 2022-08-30 International Business Machines Corporation Automated detection of personal information in free text
US11593557B2 (en) * 2020-06-22 2023-02-28 Crimson AI LLP Domain-specific grammar correction system, server and method for academic text
US20230121711A1 (en) * 2021-10-14 2023-04-20 Adobe Inc. Content augmentation with machine generated content to meet content gaps during interaction with target entities


Also Published As

Publication number Publication date
CN116306599A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
WO2020224219A1 (en) Chinese word segmentation method and apparatus, electronic device and readable storage medium
JP6909832B2 (en) Methods, devices, equipment and media for recognizing important words in audio
US20220262151A1 (en) Method, apparatus, and system for recognizing text in image
WO2018086519A1 (en) Method and device for identifying specific text information
CN111459977B (en) Conversion of natural language queries
WO2023184633A1 (en) Chinese spelling error correction method and system, storage medium, and terminal
CN114218945A (en) Entity identification method, device, server and storage medium
US11494431B2 (en) Generating accurate and natural captions for figures
US20230065965A1 (en) Text processing method and apparatus
JP2022003544A (en) Method for increasing field text, related device, and computer program product
CN113780289A (en) Image recognition method and device, storage medium and electronic equipment
US20200320255A1 (en) Language Processing Method and Device
CN116306599B (en) Faithfulness optimization method, system, equipment and storage medium based on generated text
CN112632956A (en) Text matching method, device, terminal and storage medium
CN112559725A (en) Text matching method, device, terminal and storage medium
US20230334075A1 (en) Search platform for unstructured interaction summaries
CN109902309B (en) Translation method, device, equipment and storage medium
CN115455949A (en) Chinese grammar error correction method and system, storage medium and terminal
KR102559849B1 (en) Malicious comment filter device and method
CN115455981A (en) Semantic understanding method, device, equipment and storage medium for multi-language sentences
WO2022271369A1 (en) Training of an object linking model
WO2021217915A1 (en) Human-machine dialog method and apparatus, and computer device and storage medium
CN113326698A (en) Method for detecting entity relationship, model training method and electronic equipment
CN112506952A (en) Data inquiry device and data inquiry method
CN116306598B (en) Customized error correction method, system, equipment and medium for words in different fields

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 301ab, No.10, Lane 198, zhangheng Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai 201204

Patentee after: Shanghai Mido Technology Co.,Ltd.

Address before: Room 301ab, No.10, Lane 198, zhangheng Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai 201204

Patentee before: SHANGHAI MDATA INFORMATION TECHNOLOGY Co.,Ltd.