CN116306599B - Faithfulness optimization method, system, equipment and storage medium based on generated text - Google Patents


Info

Publication number
CN116306599B
CN116306599B (application CN202310580415.XA)
Authority
CN
China
Prior art keywords
text
entity
input
training
errors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310580415.XA
Other languages
Chinese (zh)
Other versions
CN116306599A (English)
Inventor
贾国庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mido Technology Co ltd
Original Assignee
Shanghai Mdata Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mdata Information Technology Co ltd filed Critical Shanghai Mdata Information Technology Co ltd
Priority to CN202310580415.XA priority Critical patent/CN116306599B/en
Publication of CN116306599A publication Critical patent/CN116306599A/en
Application granted granted Critical
Publication of CN116306599B publication Critical patent/CN116306599B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application provides a faithfulness optimization method, system, equipment and storage medium based on generated text, wherein the method comprises the following steps: acquiring an input text; performing text error correction on the input text to repair text errors in it; extracting the entity content in the corrected input text and replacing the entity content with identification information; converting the identifier-processed input text into a generated text according to a preset text generation task; and, in response to identification information existing in the generated text, converting the identification information into its corresponding entity content to obtain the final generated text. The application reduces both the phenomenon of the generated text being unfaithful to the original text and the phenomenon of entities in the generated text being inconsistent with those in the original text.

Description

Faithfulness optimization method, system, equipment and storage medium based on generated text
Technical Field
The application belongs to the technical field of text processing and relates to a processing method for generated text, and in particular to a faithfulness optimization method, system, equipment and storage medium based on generated text.
Background
At present, text generation is generally performed with a language model, but any language model may produce unfaithful text during generation. For example, in conditional text generation tasks such as text summarization or article generation, the faithfulness of the generated text is sometimes subject to high requirements. The reasons for unfaithful text include: (1) errors in the input text; (2) the language model learning erroneous information during the pre-training stage; and (3) the model producing text errors during the decoding stage.
While training larger language models can alleviate these problems, it requires substantial computational and data resources and does not completely solve the faithfulness problem. Alternatively, running inference multiple times and selecting the generated text most similar to the original by ROUGE or BLEU score can reduce the probability of erroneous text to some extent, but it cannot completely resolve clear entity errors, and it consumes a great deal of time in the inference stage.
Disclosure of Invention
The application aims to provide a faithfulness optimization method, system, equipment and storage medium based on generated text, to solve the problem of low faithfulness in generated text.
A first aspect of an embodiment of the present application provides a faithfulness optimization method based on generated text, the method including: acquiring an input text; performing text error correction on the input text to repair text errors in it; extracting the entity content in the corrected input text and replacing the entity content with identification information; converting the identifier-processed input text into a generated text according to a preset text generation task; and, in response to identification information existing in the generated text, converting the identification information into its corresponding entity content to obtain the final generated text.
In an implementation manner of the first aspect, the step of performing text error correction on the input text and repairing text errors in the input text includes: detecting and repairing spelling errors, grammar errors and/or punctuation errors in the input text through a pre-trained text error correction model.
In one implementation manner of the first aspect, the training process of the text error correction model includes: acquiring correct texts for training; generating error texts with spelling errors, grammar errors and/or punctuation errors from the correct texts; constructing a plurality of text pairs from the correct and error texts; and, for each text pair, inputting the error text to the text error correction model so that the model learns to output the correct text.
In an implementation manner of the first aspect, the step of extracting, from the corrected input text, the entity content and replacing the entity content with identification information includes: extracting the entity content in the input text through a pre-trained entity recognition model and replacing the entity content with identification information; the entity content includes persons, times and/or places.
In one implementation manner of the first aspect, the training process of the entity recognition model includes: acquiring training texts containing entity content; extracting and numbering the entity content in the training text, and setting a corresponding identifier for each piece of entity content; and adding the corresponding identifier before each piece of entity content in the input training text, then inputting the text into the entity recognition model, so that all entity content in the model's output text is replaced by the corresponding identifiers.
In an implementation manner of the first aspect, the step of converting the identifier-processed input text into a generated text according to the preset text generation task includes: using a pre-trained text generation model to convert the identifier-processed input text into a generated text according to the preset text generation task; the generated text includes a text summary or an article satisfying preset conditions.
In one implementation manner of the first aspect, the training process of the text generation model includes: acquiring training texts containing entity content; replacing the entity content in the training text with preset identifiers; and, when training text with identifiers is input into the text generation model for training, having the text generation model generate output text in which the entity content is replaced by the identifiers.
A second aspect of an embodiment of the present application provides a faithfulness optimization system based on generated text, the system comprising: a text acquisition module configured to acquire an input text; a text error correction module configured to perform text error correction on the input text and repair text errors in it; an entity recognition module configured to extract the entity content in the corrected input text and replace the entity content with identification information; a text generation module configured to convert the identifier-processed input text into a generated text according to a preset text generation task; and a text conversion module configured to, in response to identification information existing in the generated text, convert the identification information into its corresponding entity content to obtain the final generated text.
A third aspect of an embodiment of the present application provides an electronic device, including: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so as to enable the electronic device to execute the method.
A fourth aspect of the embodiments of the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method.
As described above, the faithfulness optimization method, system, equipment and storage medium based on generated text of the present application have the following beneficial effects:
The application provides a scheme for improving the faithfulness of generated text that neither consumes large amounts of training resources nor sacrifices much efficiency in the inference stage, and it can significantly reduce common entity errors in generated text. Through the combination of the text error correction model, the entity recognition model and the text generation model, the problems of errors in the input text, the language model learning erroneous information during pre-training, and the model producing text errors during decoding are effectively addressed.
Drawings
Fig. 1 shows an application scenario schematic diagram of a faithfulness optimization method based on generated text according to an embodiment of the present application.
Fig. 2 shows a schematic flow chart of a faithfulness optimization method based on generated text according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of a faithfulness optimization method based on generated text according to an embodiment of the application.
Fig. 4 shows a schematic structural diagram of a faithfulness optimization system based on generated text according to an embodiment of the present application.
Fig. 5 is a schematic structural connection diagram of an electronic device according to an embodiment of the application.
Description of element reference numerals
4—faithfulness optimization system based on generated text; 41—text acquisition module; 42—text error correction module; 43—entity recognition module; 44—text generation module; 45—text conversion module; 5—electronic device; 51—processor; 52—memory; 53—communication interface; 54—system bus; S21–S25—method steps.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or carried out in other, different embodiments, and the details in this description may be modified or varied in various respects without departing from the spirit and scope of the present application. It should be noted that, absent conflict, the following embodiments and the features within them may be combined with each other.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present application by way of illustration, and only the components related to the present application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
The following embodiments of the present application provide a faithfulness optimization method, system, equipment and storage medium based on generated text, applicable to, but not limited to, electronic devices; the description below takes this hardware application scenario as an example.
Referring to fig. 1, an application scenario diagram of a faithfulness optimization method based on generated text according to an embodiment of the present application is shown. As shown in fig. 1, the present embodiment provides a hardware application scenario for the faithfulness optimization method, which specifically includes: an electronic device. The electronic device uses the Python scripting language and the PyTorch deep learning framework. The input text is transmitted to the electronic device, which executes the faithfulness optimization method based on generated text and outputs a generated text with higher faithfulness.
Wherein the electronic device may be, for example, a computer comprising all or part of the components of a memory, a memory controller, one or more processing units (CPUs), a peripheral interface, RF circuitry, audio circuitry, speakers, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and an external port, etc.; the computer includes, but is not limited to, a personal computer such as a desktop computer, a notebook computer, a tablet computer, a smart phone, a personal digital assistant (Personal Digital Assistant, PDA for short), and the like. In other embodiments, the electronic device may also be a server, where the server may be disposed on one or more entity servers according to multiple factors such as functions, loads, and the like, and may also be a cloud server formed by a distributed or centralized server cluster, which is not limited in this embodiment.
The following describes the technical solution in the embodiment of the present application in detail with reference to the drawings in the embodiment of the present application.
Referring to fig. 2, a schematic flow chart of a faithfulness optimization method based on generated text according to an embodiment of the present application is shown. As shown in fig. 2, the present embodiment provides a faithfulness optimization method based on generated text, which specifically includes the following steps:
s21, acquiring an input text.
S22, performing text correction on the input text, and repairing text errors in the input text.
In one embodiment, the step of performing text correction on the input text and repairing text errors in the input text includes: and detecting and repairing spelling errors, grammar errors and/or punctuation errors existing in the input text through a pre-trained text error correction model.
In one embodiment, the training process of the text error correction model includes:
(1) The correct text for training is obtained.
Specifically, the correct text is "when encountering adversity, we must face it bravely".
(2) Generating error text with spelling errors, grammar errors and/or punctuation errors by using the correct text.
In particular, the error text may be "when encountering adversety, we must face it bravely", where "adversety" stands for a wrongly written word (a homophone error in the original Chinese).
(3) And constructing a plurality of text pairs through the correct text and the error text.
In particular, a text pair is constructed from the correct text "when encountering adversity, we must face it bravely" and the error text "when encountering adversety, we must face it bravely".
(4) Based on each text pair, a wrong text is input to the text correction model such that the text correction model outputs a correct text.
In particular, the error text "when encountering adversety, we must face it bravely" is input to the text error correction model, and the model outputs the correct text "when encountering adversity, we must face it bravely".
In practical applications, the text error correction model is a neural-network-based model which, by training on a large amount of data, can detect and repair errors in the input text. The text error correction model may be a Soft-Masked BERT model, a GECToR model, etc., or may be a model built from conventional rules for correcting errors in text.
Thus, a text error correction model that has been trained can help correct spelling errors, grammar errors, punctuation errors, etc. present in the input text.
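The text-pair construction of steps (1)–(4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the word-level confusion set and the function names are invented for the example, and a real system would also inject grammar and punctuation errors.

```python
def make_noisy(text, confusion):
    """Create an erroneous version of a correct sentence by swapping the
    first word found in a confusion set for its commonly confused form
    (simulating a spelling error)."""
    words = text.split()
    for i, word in enumerate(words):
        if word in confusion:
            words[i] = confusion[word]
            break
    return " ".join(words)

def build_text_pairs(correct_texts, confusion):
    """Construct (error text, correct text) pairs: the error text is the
    model input and the correct text is the training target."""
    return [(make_noisy(t, confusion), t) for t in correct_texts]

# Hypothetical confusion set mirroring the wrongly-written-word example
pairs = build_text_pairs(["we must face adversity bravely"],
                         {"adversity": "adversety"})
```

Training the correction model then amounts to supervised sequence-to-sequence learning over such pairs.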
S23, extracting the entity content in the corrected input text, and replacing the entity content with identification information. In this way, the application can greatly reduce entity errors in the generated text and improve its faithfulness without consuming excessive computing resources.
In an embodiment, the step of extracting the entity content in the corrected input text and replacing the entity content with identification information includes: extracting the entity content in the input text through a pre-trained entity recognition model and replacing the entity content with identification information; the entity content includes persons, times and/or places.
The entity recognition model is also a neural-network-based model which, trained on a large number of texts with entity labels, can detect and extract entity information in the input text. In implementation, a training set needs to be built to give the model the ability to identify entities. During inference in a concrete application, the input text is passed into the model, which automatically extracts the entity information and assigns each entity a number.
In one embodiment, the training process of the entity recognition model includes:
(1) Training text containing physical content is obtained.
Specifically, the training text is, for example, "What happened to Liu in the past three years".
(2) And extracting entity contents in the training text for numbering, and setting a corresponding identifier for each entity content.
Specifically, information such as persons, times and places in the input text is extracted and numbered, e.g. "time 1", "place 1", etc., and a corresponding identifier, i.e. a special token, is designed for each. In practice, entities may be numbered in order of extraction, or according to a numbering rule based on the entity's attributes, name, etc. From the training text above, the extractable entity content includes the person name "Liu" and the time "the past three years".
(3) And adding a corresponding identifier in front of the entity content in the training text to be input, and inputting the entity content into the entity recognition model, so that all entity content in the output text of the entity recognition model is replaced by the corresponding identifier.
Specifically, during model training, the corresponding special token is added before each entity in the input text, e.g. "[time1] March 7, 2023", and the corresponding entity in the output text is replaced by the special token, i.e. "March 7, 2023" is replaced by "[time1]".
Further, a vocabulary is designed to convert each token into a corresponding id representation. Say the sub-tokens of "hello" map to the ids 1, 3 and 2; then "hello" can be converted into the id sequence "1 3 2" and fed to the entity recognition model for training.
A special token is a vocabulary token with a special meaning: say "[MASK]" replaces a masked word, while "[CLS]" and "[SEP]" mark the beginning and end of a sentence, respectively. For example, the sentence "hello" with beginning and end markers added (assuming "[CLS]" and "[SEP]" have the ids 4 and 5) is converted to the id sequence "4 1 3 2 5", i.e. "[CLS] hello [SEP]".
The information of each entity is represented by adding its special token to the vocabulary. This can be done either by expanding the vocabulary, or by replacing one of the tokens identified as "[unused]" in the vocabulary with the special token to be added; these "[unused]" tokens are reserved precisely for vocabulary expansion.
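The id conversion and the "[unused]"-slot replacement described above can be sketched with a toy vocabulary. The ids, sub-tokens and helper names are illustrative assumptions, not the patent's actual vocabulary.

```python
def encode(tokens, vocab):
    """Convert tokens to ids and wrap them with the [CLS]/[SEP]
    sentence-boundary markers, as in the example above."""
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens] + [vocab["[SEP]"]]

def register_special_token(vocab, token):
    """Give a new special token (e.g. "[time1]") the id of the first free
    "[unusedN]" slot; only expand the vocabulary if no slot is free."""
    slot = next((name for name in vocab if name.startswith("[unused")), None)
    if slot is None:
        idx = max(vocab.values()) + 1   # no free slot: expand the vocabulary
    else:
        idx = vocab.pop(slot)           # reuse the reserved slot's id
    vocab[token] = idx
    return idx

# Toy vocabulary mirroring the "hello" -> "4 1 3 2 5" example
vocab = {"[CLS]": 4, "[SEP]": 5, "he": 1, "ll": 3, "o": 2, "[unused1]": 6}
ids = encode(["he", "ll", "o"], vocab)                # [4, 1, 3, 2, 5]
time_id = register_special_token(vocab, "[time1]")    # reuses id 6
```

Reusing "[unused]" slots keeps the embedding matrix the same size, which is why it saves resources compared with growing the vocabulary.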
In practical applications, the entity recognition model may be an NER model, and NER training can take many forms, including sequence labeling, pointer networks, machine reading comprehension, and the like. NER models include the BERT-CRF model, the GPLinker model, the PURE model, etc. The training and application of the entity recognition model thus saves resources.
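The replacement step can be sketched as follows, with the NER model's output approximated by a span list passed in by the caller (an assumption for illustration; the real system extracts the spans with a trained model such as BERT-CRF):

```python
def replace_entities(text, spans):
    """Replace each extracted entity span with a numbered special token.
    `spans` maps entity text -> entity type ("object", "time", "place"...).
    Returns the masked text and the entity number table (token -> entity)."""
    table = {}
    counters = {}
    for span, etype in spans.items():
        counters[etype] = counters.get(etype, 0) + 1
        token = "[%s%d]" % (etype, counters[etype])
        table[token] = span
        text = text.replace(span, token)
    return text, table

masked, table = replace_entities(
    "this year XX white spirit is selling fast",
    {"XX white spirit": "object", "this year": "time"})
# masked == "[time1] [object1] is selling fast"
```

The returned table is the "entity number table" that the later decoding stage consults.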
S24, converting the identifier-processed input text into a generated text according to a preset text generation task. This adds no extra inference time, so text can be generated more efficiently.
In an embodiment, the step of converting the identifier-processed input text into a generated text according to the preset text generation task includes: using a pre-trained text generation model to perform the conversion according to the preset text generation task; the generated text includes a text summary or an article satisfying preset conditions. The preset text generation task may be a text summary or outline generation task, or another article generation task meeting preset conditions. The preset conditions may be a preset format, preset paragraphs, preset fields, a preset description object or preset content, etc.
During text generation, to represent entity information consistently across training and inference, each piece of entity information is mapped to a corresponding special token, such as "[time1]", "[place1]", and so on.
In one embodiment, the training process of the text generation model includes:
(1) Training text containing physical content is obtained.
(2) And replacing the entity content in the training text with a preset identifier.
(3) When training text with identifiers is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifiers.
Thus, when the text generation model is trained, the entity information in the input text is replaced by special tokens, and the entity information in the output text is also replaced by special tokens.
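The training-data preparation in steps (1)–(3) can be sketched as building a (masked input, masked target) pair, where the same identifiers stand in for the entities on both sides (the table and the sentences below are illustrative):

```python
def make_generation_pair(source, target, table):
    """Build a text-generation training pair: mask the entity content in
    both the input text and the target text with its special token."""
    for token, entity in table.items():
        source = source.replace(entity, token)
        target = target.replace(entity, token)
    return source, target

table = {"[time1]": "this year", "[object1]": "XX white spirit"}
pair = make_generation_pair(
    "this year XX white spirit pre-sale is hot",
    "why is this year XX white spirit so hot",
    table)
# pair == ("[time1] [object1] pre-sale is hot", "why is [time1] [object1] so hot")
```

Because the model only ever sees the tokens, it cannot corrupt the entity strings themselves during decoding.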
S25, in response to identification information existing in the generated text, converting the identification information into its corresponding entity content to obtain the final generated text.
Specifically, during inference with the text generation model, whenever a special token is generated, the corresponding entity information can be retrieved directly by the token's number and filled into the generated text. In practice, the corresponding entity is looked up, by the generated special token, in a dictionary representing the correspondence.
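Step S25 amounts to a dictionary lookup over the generated text; a minimal sketch (the token pattern below is an assumption covering identifier forms like "[time1]" and "[object1]"):

```python
import re

def restore_entities(generated, table):
    """Replace every special-token identifier in the generated text with
    the entity retrieved from the entity number table; identifiers with
    no table entry are left untouched."""
    pattern = re.compile(r"\[[a-z]+\d+\]")
    return pattern.sub(lambda m: table.get(m.group(0), m.group(0)), generated)

text = restore_entities(
    "why is [time1] [object1] so hot",
    {"[time1]": "this year", "[object1]": "XX white spirit"})
# text == "why is this year XX white spirit so hot"
```

Leaving unknown tokens untouched makes missing table entries visible rather than silently dropped.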
Referring to fig. 3, a flow chart of a faithfulness optimization method based on generated text according to an embodiment of the application is shown. As shown in fig. 3, the input text first passes through the text error correction model, which repairs the text errors present in it. The NER model then extracts entity information from the corrected input text and adds special-token identifiers before the entities, yielding an entity number table. The input text with entity tokens added is input into the text generation model; during its decoding stage, whenever a special-token identifier appears, the subsequent generated content is replaced with the corresponding entity from the table, finally producing the generated text.
Taking the generation of a text summary as an example, and referring to FIG. 3, the input text is: "XX white spirit hot pre-sale has launched! The XX white spirit of this year has attracted particular interest. Why? Because the wine industry is once again challenging its limits. They not only incorporate designs with different imagery, but also integrate a certain lucky culture, are sealed and stored for more than twenty years, and adopt domestically leading technology, so this year's XX white spirit has all the more collection value and investment potential. So if you want to taste a special wine, seize the moment and click into the pre-sale." The target generated text is "Why is this year's XX white spirit pre-selling so hot?"
First, the text error correction model corrects the errors in the input text; for example, the wrongly written particle in "hot pre-sale" is repaired (in the original Chinese, the character "拉" is corrected to "啦").
Then the NER model extracts the entities, producing entity relation pairs: "XX white spirit" — entity 1, "[object1]"; "this year" — time 1, "[time1]"; and so on, yielding an entity number table. At the same time, the corresponding entity tokens are added to the original data, i.e. "[object1] XX white spirit hot pre-sale has launched! The [time1] [object1] XX white spirit of this year has attracted particular interest. Why? Because the wine industry is once again challenging its limits …", and the target output text is modified to "Why is [time1] [object1] pre-selling so hot?"
During the inference stage of the text generation model, if a special token is generated, say "[time1]", the entity number table is automatically queried for [time1] and the entity "this year" is found; likewise, for "[object1]" the table is automatically queried for [object1], the entity "XX white spirit" is found, and inference then continues. Such filling is performed every time a special token appears, finally yielding the generated text "Why is this year's XX white spirit pre-selling so hot?"
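The full flow of Fig. 3 composes the three models; a sketch with stub callables standing in for the trained models (the stubs, not the patent's models, are what actually run here):

```python
def faithfulness_pipeline(text, correct_fn, extract_fn, generate_fn):
    """Fig. 3 end to end: repair errors, mask entities into an entity
    number table, generate, then restore entities from the table."""
    text = correct_fn(text)
    masked, table = extract_fn(text)
    generated = generate_fn(masked)
    for token, entity in table.items():
        generated = generated.replace(token, entity)
    return generated

# Stub models for illustration only
correct = lambda t: t.replace("pre-sale laa", "pre-sale la")
extract = lambda t: (t.replace("XX white spirit", "[object1]"),
                     {"[object1]": "XX white spirit"})
generate = lambda t: "why is [object1] so hot"

out = faithfulness_pipeline("XX white spirit pre-sale laa",
                            correct, extract, generate)
# out == "why is XX white spirit so hot"
```

In deployment, each stub would be replaced by the corresponding trained model's inference call.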
Furthermore, a large amount of debugging and optimization can be carried out on the trained text error correction model and entity recognition model to ensure the system works properly in various scenarios. Once debugging and optimization are complete, the method can be applied to practical text generation tasks that require the generated text to be faithful to the original, such as document manuscript generation and product description generation. Because different scenarios emphasize different content, the text error correction model and the NER model need to be adjusted accordingly. For example, in one scenario a customer may require that place names never be wrong; the text error correction model must then be adjusted to prevent place names from being "corrected" into errors, and the NER model adjusted to ensure place names are extracted. Such scenario-specific needs are met through the debugging and optimization.
The protection scope of the faithfulness optimization method based on generated text according to the embodiments of the present application is not limited to the execution order of the steps listed in this embodiment; all schemes realized by adding or removing steps or replacing steps according to the prior art under the principles of the present application are included in the protection scope of the present application.
The embodiment of the application also provides a faithfulness optimization system based on generated text, which can implement the faithfulness optimization method described above; however, the implementing apparatus of the method includes, but is not limited to, the structure of the system listed in this embodiment, and all structural variations and substitutions of the prior art made according to the principles of the application are included in the protection scope of the application.
Referring now to FIG. 4, a schematic structural diagram of a faithfulness optimization system based on generated text according to an embodiment of the present application is shown. As shown in fig. 4, this embodiment provides a faithfulness optimization system 4 based on generated text, specifically including: a text acquisition module 41, a text error correction module 42, an entity recognition module 43, a text generation module 44, and a text conversion module 45.
The text obtaining module 41 is configured to obtain input text.
The text error correction module 42 is configured to perform text error correction on the input text, repairing text errors in the input text.
In one embodiment, the text error correction module 42 is specifically configured to detect and repair spelling errors, grammar errors, and/or punctuation errors present in the input text via a pre-trained text error correction model.
Specifically, the training process of the text error correction model includes: acquiring correct texts for training; generating error texts with spelling errors, grammar errors and/or punctuation errors from the correct texts; constructing a plurality of text pairs from the correct and error texts; and, for each text pair, inputting the error text to the text error correction model so that the model learns to output the correct text.
The entity recognition module 43 is configured to extract, for the text-corrected input text, entity contents in the input text, and replace the entity contents with identification information.
In one embodiment, the entity recognition module 43 is specifically configured to extract the entity content in the input text through a pre-trained entity recognition model, and replace the entity content with the identification information; the entity content includes persons, times and/or places.
Specifically, the training process of the entity recognition model comprises the following steps: acquiring training texts containing entity contents; extracting and numbering the entity contents in the training text, and setting a corresponding identifier for each entity content; and adding the corresponding identifier before each entity content in the training text to be input, then inputting the text into the entity recognition model, so that all entity contents in the model's output text are replaced by their corresponding identifiers.
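A minimal sketch of the numbering-and-replacement step follows, assuming the entities have already been extracted (here passed in as a plain list rather than produced by a trained recognizer). The bracketed `[E1]`-style identifier format and the `number_entities` name are hypothetical choices for illustration.

```python
def number_entities(text, entities):
    """Assign identifiers [E1], [E2], ... to each extracted entity and
    return the identifier-to-entity map together with the text in which
    every entity occurrence is replaced by its identifier."""
    mapping = {f"[E{i}]": ent for i, ent in enumerate(entities, start=1)}
    for ident, ent in mapping.items():
        text = text.replace(ent, ident)
    return mapping, text
```

The returned map is retained so that identifiers appearing in later generated text can be converted back to their original entity content.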
The text generation module 44 is configured to convert the identifier-processed input text into generated text according to a preset text generation task.
In one embodiment, the text generation module 44 is specifically configured to convert the identifier-processed input text into generated text according to a preset text generation task by using a pre-trained text generation model; the generated text includes a text abstract or an article satisfying a preset condition.
Specifically, the training process of the text generation model comprises the following steps: acquiring training texts containing entity contents; replacing the entity content in the training text with a preset identifier; and, when the training text with the identifier is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifier.
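The masking step in this training process can be sketched as follows, applying the same identifier map to both the source text and the training target (e.g. a reference abstract) so the generation model learns to copy identifiers through unchanged instead of paraphrasing entity content. The function name and the map format are assumptions for illustration.

```python
def mask_for_generation(source: str, target: str, entity_map: dict):
    """Replace every entity in both the training source and its target
    with the preset identifier, yielding a masked training pair for the
    text generation model."""
    for ident, ent in entity_map.items():
        source = source.replace(ent, ident)
        target = target.replace(ent, ident)
    return source, target
```

Because entities never appear verbatim in the training pairs, the generation model cannot hallucinate variants of them; faithfulness is preserved by restoring the identifiers afterwards.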
The text conversion module 45 is configured to convert the identification information into entity content corresponding to the identification information in response to the existence of the identification information in the generated text, so as to obtain a final generated text.
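The conversion performed by this module can be sketched as a single regular-expression substitution over the generated text, assuming a bracketed identifier format such as `[E1]` (a hypothetical format, not specified by the patent). Unknown identifiers are left untouched rather than guessed.

```python
import re

def restore_entities(generated: str, entity_map: dict) -> str:
    """Substitute each identifier found in the generated text with its
    original entity content, producing the final generated text."""
    return re.sub(
        r"\[E\d+\]",
        lambda m: entity_map.get(m.group(0), m.group(0)),  # keep unknown ids
        generated,
    )
```

If the generated text contains no identification information, the substitution is a no-op and the generated text is returned as the final result directly.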
In the several embodiments provided by the present application, it should be understood that the disclosed system or method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of modules/units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple modules or units may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules or units, which may be in electrical, mechanical or other forms.
The modules/units illustrated as separate components may or may not be physically separate, and components shown as modules/units may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules/units may be selected according to actual needs to achieve the objectives of the embodiments of the present application. For example, functional modules/units in various embodiments of the application may be integrated into one processing module, or each module/unit may exist alone physically, or two or more modules/units may be integrated into one module/unit.
Those of ordinary skill in the art will further appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate this interchangeability of hardware and software, the elements and steps of the examples have been described above generally in terms of their function. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 5 is a schematic diagram showing the structural connection of an electronic device according to an embodiment of the application. As shown in fig. 5, the electronic device 5 of the present application includes: a processor 51, a memory 52, a communication interface 53, and/or a system bus 54. The memory 52 and the communication interface 53 are connected to the processor 51 via the system bus 54 and communicate with each other; the memory 52 is configured to store a computer program, the communication interface 53 is configured to communicate with other devices, and the processor 51 is configured to run the computer program to cause the electronic device 5 to perform the steps of the faithfulness optimization method based on generated text.
The processor 51 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processing, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 52 may include a random access memory (Random Access Memory, simply referred to as RAM), and may further include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory.
The system bus 54 mentioned above may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The system bus 54 may be divided into an address bus, a data bus, a control bus, and the like. The communication interface is used for realizing communication between the database access device and other devices (such as a client, a read-write library and a read-only library).
The embodiment of the application also provides a computer readable storage medium. Those of ordinary skill in the art will appreciate that all or part of the steps in the method implementing the above embodiments may be implemented by a program to instruct a processor, where the program may be stored in a computer readable storage medium, where the storage medium is a non-transitory (non-transitory) medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof. The storage media may be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The descriptions of the processes or structures corresponding to the drawings each have their own emphasis; for parts of a certain process or structure that are not described in detail, reference may be made to the descriptions of the other processes or structures.
The above embodiments merely illustrate the principles of the present application and its effects, and are not intended to limit the application. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present application.

Claims (4)

1. A faithfulness optimization method based on generated text, the method comprising:
acquiring an input text;
performing text error correction on an input text, and repairing text errors in the input text;
extracting entity content from the text-corrected input text, and replacing the entity content with identification information;
according to a preset text generation task, converting the input text after identification processing into a generated text;
responding to the identification information in the generated text, and converting the identification information into entity content corresponding to the identification information to obtain a final generated text;
the step of extracting the entity content from the text-corrected input text and replacing the entity content with identification information comprises the following steps:
extracting the entity contents in the input text through a pre-trained entity recognition model, and replacing the entity contents with identification information; the entity content includes persons, times and/or places;
the training process of the entity recognition model comprises the following steps:
acquiring training texts containing entity contents;
extracting and numbering the entity contents in the training text, and setting a corresponding identifier for each entity content;
adding the corresponding identifier before each entity content in the training text to be input, and inputting the text into the entity recognition model, so that all entity contents in the output text of the entity recognition model are replaced by their corresponding identifiers;
the step of performing text correction on the input text and repairing text errors in the input text comprises the following steps:
detecting and repairing spelling errors, grammar errors and/or punctuation errors existing in the input text through a pre-trained text error correction model;
the training process of the text error correction model comprises the following steps:
acquiring a correct text for training;
generating error text with spelling errors, grammar errors and/or punctuation errors by using the correct text;
constructing a plurality of text pairs through the correct text and the error text;
inputting error text into the text correction model based on each text pair, so that the text correction model outputs correct text;
the step of converting the input text after the identification processing into the generated text according to the preset text generation task comprises the following steps:
converting the identifier-processed input text into generated text according to the preset text generation task by utilizing a pre-trained text generation model; the generated text comprises a text abstract or an article meeting preset conditions;
the training process of the text generation model comprises the following steps:
acquiring training texts containing entity contents;
replacing entity content in the training text with a preset identifier;
when the training text with the identifier is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifier.
2. A faithfulness optimization system based on generated text, the system comprising:
a text acquisition module configured to acquire an input text;
the text error correction module is configured to perform text error correction on an input text and repair text errors in the input text; performing text correction on the input text, and repairing text errors in the input text comprises:
detecting and repairing spelling errors, grammar errors and/or punctuation errors existing in the input text through a pre-trained text error correction model;
the training process of the text error correction model comprises the following steps:
acquiring a correct text for training;
generating error text with spelling errors, grammar errors and/or punctuation errors by using the correct text;
constructing a plurality of text pairs through the correct text and the error text;
inputting error text into the text correction model based on each text pair, so that the text correction model outputs correct text;
the entity recognition module is configured to extract entity content from the text-corrected input text, and replace the entity content with identification information; the step of extracting the entity content from the text-corrected input text and replacing the entity content with identification information comprises the following steps:
extracting the entity contents in the input text through a pre-trained entity recognition model, and replacing the entity contents with identification information; the entity content includes persons, times and/or places;
the training process of the entity recognition model comprises the following steps:
acquiring training texts containing entity contents;
extracting and numbering the entity contents in the training text, and setting a corresponding identifier for each entity content;
adding the corresponding identifier before each entity content in the training text to be input, and inputting the text into the entity recognition model, so that all entity contents in the output text of the entity recognition model are replaced by their corresponding identifiers;
the text generation module is configured to convert the identifier-processed input text into generated text according to a preset text generation task; the step of converting the identifier-processed input text into generated text according to the preset text generation task comprises the following steps:
converting the identifier-processed input text into generated text according to the preset text generation task by utilizing a pre-trained text generation model; the generated text comprises a text abstract or an article meeting preset conditions;
the training process of the text generation model comprises the following steps:
acquiring training texts containing entity contents;
replacing entity content in the training text with a preset identifier;
when the training text with the identifier is input into the text generation model for training, the text generation model generates output text in which the entity content is replaced by the identifier;
and the text conversion module is configured to respond to the existence of the identification information in the generated text, and convert the identification information into entity content corresponding to the identification information to obtain a final generated text.
3. An electronic device, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to cause the electronic device to perform the method of claim 1.
4. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of claim 1.
CN202310580415.XA 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text Active CN116306599B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310580415.XA CN116306599B (en) 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310580415.XA CN116306599B (en) 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text

Publications (2)

Publication Number Publication Date
CN116306599A CN116306599A (en) 2023-06-23
CN116306599B true CN116306599B (en) 2023-09-08

Family

ID=86824319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310580415.XA Active CN116306599B (en) 2023-05-23 2023-05-23 Faithfulness optimization method, system, equipment and storage medium based on generated text

Country Status (1)

Country Link
CN (1) CN116306599B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717031A (en) * 2019-10-15 2020-01-21 南京摄星智能科技有限公司 Intelligent conference summary generation method and system
CN111460827A (en) * 2020-04-01 2020-07-28 北京爱咔咔信息技术有限公司 Text information processing method, system, equipment and computer readable storage medium
CN112185520A (en) * 2020-09-27 2021-01-05 志诺维思(北京)基因科技有限公司 Text structured processing system and method for medical pathology report picture
CN114036930A (en) * 2021-10-28 2022-02-11 北京明略昭辉科技有限公司 Text error correction method, device, equipment and computer readable medium
WO2022095563A1 (en) * 2020-11-06 2022-05-12 北京世纪好未来教育科技有限公司 Text error correction adaptation method and apparatus, and electronic device, and storage medium
CN115062104A (en) * 2022-05-17 2022-09-16 北京理工大学 Knowledge prompt-fused legal text small sample named entity identification method
CN115630632A (en) * 2022-09-29 2023-01-20 北京蜜度信息技术有限公司 Method, system, medium and terminal for correcting personal name in specific field based on context semantics
CN115965009A (en) * 2022-12-23 2023-04-14 中国联合网络通信集团有限公司 Training and text error correction method and device for text error correction model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10762293B2 (en) * 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US11429790B2 (en) * 2019-09-25 2022-08-30 International Business Machines Corporation Automated detection of personal information in free text
US11593557B2 (en) * 2020-06-22 2023-02-28 Crimson AI LLP Domain-specific grammar correction system, server and method for academic text
US20230121711A1 (en) * 2021-10-14 2023-04-20 Adobe Inc. Content augmentation with machine generated content to meet content gaps during interaction with target entities


Also Published As

Publication number Publication date
CN116306599A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
WO2020224219A1 (en) Chinese word segmentation method and apparatus, electronic device and readable storage medium
JP6909832B2 (en) Methods, devices, equipment and media for recognizing important words in audio
US20220262151A1 (en) Method, apparatus, and system for recognizing text in image
WO2018086519A1 (en) Method and device for identifying specific text information
CN111459977B (en) Conversion of natural language queries
WO2023184633A1 (en) Chinese spelling error correction method and system, storage medium, and terminal
CN114218945A (en) Entity identification method, device, server and storage medium
US11494431B2 (en) Generating accurate and natural captions for figures
US20230065965A1 (en) Text processing method and apparatus
JP2022003544A (en) Method for increasing field text, related device, and computer program product
CN113780289A (en) Image recognition method and device, storage medium and electronic equipment
US20200320255A1 (en) Language Processing Method and Device
CN116306599B (en) Faithfulness optimization method, system, equipment and storage medium based on generated text
CN112632956A (en) Text matching method, device, terminal and storage medium
CN112559725A (en) Text matching method, device, terminal and storage medium
US20230334075A1 (en) Search platform for unstructured interaction summaries
CN109902309B (en) Translation method, device, equipment and storage medium
CN115455949A (en) Chinese grammar error correction method and system, storage medium and terminal
KR102559849B1 (en) Malicious comment filter device and method
CN115455981A (en) Semantic understanding method, device, equipment and storage medium for multi-language sentences
WO2022271369A1 (en) Training of an object linking model
WO2021217915A1 (en) Human-machine dialog method and apparatus, and computer device and storage medium
CN113326698A (en) Method for detecting entity relationship, model training method and electronic equipment
CN112506952A (en) Data inquiry device and data inquiry method
CN116306598B (en) Customized error correction method, system, equipment and medium for words in different fields

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 301ab, No.10, Lane 198, zhangheng Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai 201204

Patentee after: Shanghai Mido Technology Co.,Ltd.

Address before: Room 301ab, No.10, Lane 198, zhangheng Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai 201204

Patentee before: SHANGHAI MDATA INFORMATION TECHNOLOGY Co.,Ltd.