CN115408997A - Text generation method, text generation device and readable storage medium - Google Patents


Info

Publication number
CN115408997A
Authority
CN
China
Prior art keywords
sentence
target
similar
text
sentences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210961108.1A
Other languages
Chinese (zh)
Inventor
徐华韫
黄明星
王福钋
曹富康
张航飞
王月宝
沈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Absolute Health Ltd
Original Assignee
Beijing Absolute Health Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Absolute Health Ltd
Priority to CN202210961108.1A
Publication of CN115408997A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a text generation method, a text generation device and a readable storage medium, and relates to the technical field of information processing. The method comprises the following steps: acquiring a text to be processed, dividing the text to be processed into a plurality of sentences, and determining, among the plurality of sentences, a target sentence that does not contain key information, wherein the key information is information representing key semantics of the text to be processed; inputting the target sentence into a similar text generation model to obtain at least one first similar sentence of the target sentence; acquiring at least one second similar sentence of the target sentence from a preset text knowledge base based on sentence similarity; performing synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence; and determining a target similar sentence among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and replacing the target sentence with the target similar sentence to generate a target text of the text to be processed.

Description

Text generation method, text generation device and readable storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a text generation method, a text generation apparatus, and a readable storage medium.
Background
In the internet insurance industry, traffic and content are two crucial factors that continuously drive the development of the whole industry. In recent years, as applications have matured, creating better content, and attracting and retaining customers who come for consultation by means of better conversational scripts, has become increasingly critical. In particular, in fields such as insurance sales and monetization, how a conversation robot can communicate with a client using rich and varied scripts has become a new problem.
The generation methods in the related art take a generation algorithm as the core: sentences containing keywords are preprocessed, and the preprocessed sentences are fed into a generation model to generate, after data augmentation, sentences similar to the original sentences. However, when such a method is used on a large scale, the logic between the paragraphs of an article easily becomes inconsistent, and replacing sentences that contain keywords can completely change the original meaning of those sentences, so the generated result is unsatisfactory.
Disclosure of Invention
In view of this, the present application provides a text generation method, a text generation apparatus, and a readable storage medium, which address the problem in the related art that newly generated text is of unsatisfactory quality.
In a first aspect, an embodiment of the present application provides a text generation method, including: acquiring a text to be processed, dividing the text to be processed into a plurality of sentences, and determining, among the plurality of sentences, a target sentence that does not contain key information, wherein the key information is information representing key semantics of the text to be processed; inputting the target sentence into a similar text generation model to obtain at least one first similar sentence of the target sentence; acquiring at least one second similar sentence of the target sentence from a preset text knowledge base based on sentence similarity; performing synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence; and determining a target similar sentence among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and replacing the target sentence with the target similar sentence to generate a target text of the text to be processed.
The text generation method according to the embodiment of the present application may further have the following additional technical features:
In the above technical solution, optionally, dividing the text to be processed into a plurality of sentences includes: dividing the text to be processed according to punctuation marks to obtain the plurality of sentences.
In any of the above technical solutions, optionally, inputting the target sentence into the similar text generation model to obtain at least one first similar sentence of the target sentence includes: inputting the target sentence into the similar text generation model to obtain a plurality of first candidate sentences; and filtering out the first candidate sentences that do not meet the specification, and, among the remaining first candidate sentences, determining those whose similarity to the target sentence is greater than or equal to a first threshold as the first similar sentences of the target sentence.
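The filtering and thresholding of the first candidate sentences can be sketched as follows. This is only a minimal illustration under stated assumptions, not the applicant's implementation: the similar text generation model is not specified in the source, so the candidates are taken as given; `similarity` uses Python's `difflib.SequenceMatcher` as a stand-in for whatever sentence similarity measure is actually used; and the function names, the rule callables and the default threshold value are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; the source does not name the actual measure."""
    return SequenceMatcher(None, a, b).ratio()


def select_first_similar(target, candidates, rules, threshold=0.6, top_k=None):
    """Filter model outputs by rules, then keep those close enough to the target.

    `rules` is a list of callables returning True when a candidate meets the
    specification (hypothetical interface). If `top_k` is given, the k most
    similar remaining candidates are returned instead of applying the threshold.
    """
    kept = [c for c in candidates if all(rule(c) for rule in rules)]
    scored = sorted(((similarity(target, c), c) for c in kept), reverse=True)
    if top_k is not None:
        return [c for _, c in scored[:top_k]]
    return [c for score, c in scored if score >= threshold]


if __name__ == "__main__":
    target = "claims are paid quickly"
    candidates = ["claims are paid fast", "um claims are paid quickly", "hello world"]
    # Hypothetical rule: reject candidates that start with a filler word.
    rules = [lambda s: not s.startswith("um ")]
    print(select_first_similar(target, candidates, rules))
```

In practice the rule set and the first threshold would be tuned on real model outputs; the values here are placeholders.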
In any of the above technical solutions, optionally, before acquiring at least one second similar sentence of the target sentence from the preset text knowledge base based on sentence similarity, the method further includes: crawling a plurality of pieces of text article data by using a crawler tool, and dividing the text article data according to punctuation marks to obtain a plurality of short sentences; and screening out second candidate sentences from the plurality of short sentences according to the key information, and storing the second candidate sentences in the preset text knowledge base.
In any of the above technical solutions, optionally, acquiring at least one second similar sentence of the target sentence from the preset text knowledge base based on sentence similarity includes: calculating the similarity between the target sentence and each of a plurality of second candidate sentences stored in the preset text knowledge base; and taking each second candidate sentence whose similarity is greater than or equal to a second threshold as a second similar sentence of the target sentence.
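Building the knowledge base and retrieving second similar sentences from it can be sketched as follows. The sketch rests on several assumptions that are not in the source: the crawled short sentences are already available as a list; the screening "according to the key information" is read here as dropping short sentences that contain any key term, since the source does not spell out the criterion; and similarity is computed as the cosine of character-bigram counts, a simple stand-in for a learned sentence representation. All function names and the default threshold are hypothetical.

```python
import math
from collections import Counter


def bigram_vector(sentence):
    """Bag of character bigrams; a stand-in for a learned sentence vector."""
    return Counter(sentence[i:i + 2] for i in range(len(sentence) - 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_knowledge_base(short_sentences, key_terms):
    """Assumed screening rule: keep only short sentences free of key terms."""
    return [s for s in short_sentences if not any(k in s for k in key_terms)]


def retrieve_second_similar(target, knowledge_base, threshold=0.5):
    """Return knowledge-base sentences whose similarity meets the threshold, best first."""
    target_vec = bigram_vector(target)
    scored = [(cosine(target_vec, bigram_vector(s)), s) for s in knowledge_base]
    return [s for score, s in sorted(scored, reverse=True) if score >= threshold]


if __name__ == "__main__":
    crawled = ["claims are paid fast", "Acme Life is great", "hello world"]
    kb = build_knowledge_base(crawled, key_terms=["Acme Life"])
    print(retrieve_second_similar("claims are paid quickly", kb))
```

A linear scan is shown for clarity; with a knowledge base on the order of millions of short sentences, an approximate nearest-neighbour index would likely be needed, which is outside the scope of this sketch.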
In any of the above technical solutions, optionally, performing synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence includes: dividing the target sentence into a plurality of word segments, and obtaining a word vector of each word segment; obtaining the synonyms of each word segment according to its word vector, selecting the synonyms of each word segment in turn, and recombining the target sentence to generate a plurality of third candidate sentences; and calculating the fluency of each third candidate sentence, and taking each third candidate sentence whose fluency is greater than or equal to a third threshold as a third similar sentence of the target sentence.
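The synonym replacement and fluency screening can be sketched as follows. Several simplifications are assumed and are not taken from the source: the target sentence is already segmented into tokens; the `synonyms` dictionary stands in for nearest neighbours looked up in a trained word vector model; "selecting the synonyms in turn" is read as enumerating every combination with `itertools.product`; and fluency is approximated as the fraction of word bigrams seen in a small reference corpus, in place of an unspecified sentence fluency algorithm. All names are hypothetical.

```python
from itertools import product


def synonym_candidates(tokens, synonyms, joiner=" "):
    """Recombine the sentence with every choice of original token or synonym,
    excluding the unchanged original."""
    options = [[tok] + list(synonyms.get(tok, [])) for tok in tokens]
    original = tuple(tokens)
    return [joiner.join(combo) for combo in product(*options) if combo != original]


def make_bigram_fluency(corpus_sentences):
    """Build a toy fluency scorer: share of word bigrams that occur in the corpus."""
    seen = set()
    for sentence in corpus_sentences:
        words = sentence.split()
        seen.update(zip(words, words[1:]))

    def fluency(sentence):
        words = sentence.split()
        pairs = list(zip(words, words[1:]))
        return sum(p in seen for p in pairs) / len(pairs) if pairs else 0.0

    return fluency


def select_third_similar(candidates, fluency, threshold):
    """Keep candidates whose fluency score meets the third threshold."""
    return [c for c in candidates if fluency(c) >= threshold]


if __name__ == "__main__":
    tokens = ["claims", "are", "paid", "quickly"]
    synonyms = {"paid": ["settled"], "quickly": ["fast", "rapidly"]}
    fluency = make_bigram_fluency(["claims are settled fast", "claims are paid rapidly"])
    print(select_third_similar(synonym_candidates(tokens, synonyms), fluency, 1.0))
```

The number of combinations grows multiplicatively with the number of replaceable word segments, so a real system would probably cap the number of synonyms per segment or sample combinations.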
In any of the above technical solutions, optionally, determining the target similar sentence among the at least one first similar sentence, the at least one second similar sentence, and the at least one third similar sentence includes: randomly extracting one similar sentence from the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence to serve as the target similar sentence.
In any of the above technical solutions, optionally, determining the target similar sentence among the at least one first similar sentence, the at least one second similar sentence, and the at least one third similar sentence includes: taking, among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, the similar sentence with the highest similarity to the target sentence as the target similar sentence.
In a second aspect, an embodiment of the present application provides a text generation apparatus, including: an acquisition module, configured to acquire a text to be processed, divide the text to be processed into a plurality of sentences, and determine, among the plurality of sentences, a target sentence that does not contain key information, wherein the key information is information representing key semantics of the text to be processed; a first processing module, configured to input the target sentence into a similar text generation model to obtain at least one first similar sentence of the target sentence; a second processing module, configured to acquire at least one second similar sentence of the target sentence from a preset text knowledge base based on sentence similarity; a third processing module, configured to perform synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence; and a generation module, configured to determine a target similar sentence among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and replace the target sentence with the target similar sentence to generate a target text of the text to be processed.
In a third aspect, embodiments of the present application provide a readable storage medium on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer device comprising a processor and a memory, the memory storing a program or instructions executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the method as in the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement a method as in the first aspect.
In the embodiment of the present application, in order to ensure that the finally generated target text is logically coherent and semantically consistent with the original text, the acquired text to be processed (i.e., the original text) is first divided into a plurality of sentences, and the sentences are screened to select the target sentences, namely those that do not contain the key information (i.e., information that must not be changed) representing the key semantics of the text to be processed. For example, in the insurance field, the key information may be insurance brand names such as "Peace" or "Life". Further, the target sentence is processed in three ways to obtain at least one first similar sentence, at least one second similar sentence and at least one third similar sentence, respectively. Specifically, the first processing generates first similar sentences of the target sentence by using a similar text generation model, the second processing obtains second similar sentences of the target sentence from the sentences stored in a preset text knowledge base, and the third processing generates third similar sentences of the target sentence by synonym replacement. Finally, a target similar sentence is determined among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and the target sentence is replaced with the target similar sentence to generate a target text of the text to be processed.
In the embodiments of the present application, the three kinds of processing support diverse expansion of the text and improve the diversity of textual expression; and by dividing the text into sentences and retaining the key information, the finally generated target text stays on topic, is logically rigorous, and reads fluently.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the present application are more readily apparent, specific embodiments of the present application are set out below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a first flowchart of a text generation method according to an embodiment of the present application;
FIG. 2 is a second flowchart of a text generation method according to an embodiment of the present application;
FIG. 3 is a block diagram showing the structure of a text generation apparatus according to an embodiment of the present application;
FIG. 4 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived from the embodiments in the present application by a person skilled in the art, are within the scope of protection of the present application.
The terms "first", "second" and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application are capable of operation in sequences other than those illustrated or described herein, and that the objects distinguished by "first", "second", etc. are generally of one class and their number is not limited; e.g., a first object can be one or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
In addition to generating diversified new texts, it is necessary to ensure that each generated new text is logically coherent and fluent, and that the meaning it expresses does not deviate noticeably from the original text; that is, the information representing the key semantics of the original text must remain unchanged. The embodiment of the present application provides a text generation method that can ensure both the diversity and the reliability of the new text.
The text generation method, the text generation device, and the readable storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings by specific embodiments and application scenarios thereof.
An embodiment of the present application provides a text generation method. FIG. 1 shows a first flowchart of the text generation method of the embodiment of the present application, and the method includes:
Step 101, acquiring a text to be processed, dividing the text to be processed into a plurality of sentences, and determining, among the plurality of sentences, a target sentence that does not contain key information, wherein the key information is information representing key semantics of the text to be processed;
Step 102, inputting the target sentence into a similar text generation model to obtain at least one first similar sentence of the target sentence;
Step 103, acquiring at least one second similar sentence of the target sentence from a preset text knowledge base based on sentence similarity;
Step 104, performing synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence;
Step 105, determining a target similar sentence among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and replacing the target sentence with the target similar sentence to generate a target text of the text to be processed.
In this embodiment, in order to ensure that the finally generated target text is logically coherent and semantically consistent with the original text, the acquired text to be processed (i.e., the original text) is first divided into a plurality of sentences, each of which is a short sentence, and the sentences are screened to select the target sentences, namely those that do not contain the key information (i.e., information that must not be changed) representing the key semantics of the text to be processed. For example, in the insurance field, the key information may be insurance brand names such as "Peace" or "Life".
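The sentence division and target sentence screening of step 101 can be sketched as follows. This is a minimal sketch under assumptions: the set of delimiter punctuation marks, the function names and the example brand name "Acme Life" are hypothetical, and key information is matched by plain substring search, whereas the source does not state how key information is detected.

```python
import re

# Hypothetical delimiter set: common Chinese and Western punctuation marks.
_DELIMS = "，。！？；,.!?;"
_SPLIT_RE = re.compile(f"([{re.escape(_DELIMS)}])")


def split_sentences(text):
    """Split text into short sentences, keeping each trailing punctuation mark."""
    # re.split with a capturing group yields [chunk, delim, chunk, delim, ..., chunk].
    parts = _SPLIT_RE.split(text)
    sentences = []
    for i in range(0, len(parts), 2):
        chunk = parts[i].strip()
        delim = parts[i + 1] if i + 1 < len(parts) else ""
        if chunk:
            sentences.append(chunk + delim)
    return sentences


def find_target_sentences(sentences, key_terms):
    """Return the indices of sentences that contain none of the key terms."""
    return [i for i, s in enumerate(sentences) if not any(k in s for k in key_terms)]


if __name__ == "__main__":
    text = "Acme Life offers this plan. It is easy to buy, and claims are fast!"
    sentences = split_sentences(text)
    print(sentences)
    print(find_target_sentences(sentences, key_terms=["Acme Life"]))
```

Only the sentences at the returned indices would be passed to the three processing steps; sentences containing key information are left untouched so that the key semantics of the original text are preserved.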
Further, the target sentence is processed in three ways to obtain at least one first similar sentence, at least one second similar sentence and at least one third similar sentence, respectively. Specifically, the first processing generates first similar sentences of the target sentence by using a similar text generation model, the second processing obtains second similar sentences of the target sentence from the sentences stored in a preset text knowledge base, and the third processing generates third similar sentences of the target sentence by synonym replacement.
Finally, a target similar sentence is determined among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and the target sentence is replaced with the target similar sentence to generate the target text (i.e., the new text) of the text to be processed.
It should be noted that the number of the target sentences determined for the text to be processed may be one or more, and the target text of the text to be processed is obtained after replacing one or more target sentences.
In the embodiments of the present application, the three kinds of processing support diverse expansion of the text and improve the diversity of textual expression; and by dividing the text into sentences and retaining the key information, the finally generated target text stays on topic, is logically rigorous, and reads fluently.
With the embodiment of the present application, a single article or passage can be expanded in large quantities, for example into ten, twenty or more articles, while the degree of repetition among the articles is kept at a low level.
The text generation method can be used in projects that populate the scripts of conversational robots in the insurance field: it can generate robot scripts that share the same goal and meaning but vary widely in wording, so that the robot's replies are not monotonous and the customer experience is better.
Fig. 2 shows a second flowchart of the text generation method according to the embodiment of the present application. The text generation method according to the embodiment of the present application will be described in detail below with reference to fig. 2.
In the embodiment of the application, in order to ensure that the finally generated target text is logically coherent and semantically consistent with the original text, the acquired text to be processed is first divided according to punctuation marks to obtain sentences of minimum granularity, and the subsequent operations are performed on these minimum-granularity sentences. One round of screening is performed on the sentences to select the target sentences, that is, the sentences that do not contain key information (information that must not be changed), and the following three processing steps are then performed on each target sentence to support diverse expansion of the text. The three processing steps are:
(1) Generating similar sentences of the target sentence by using the similar text generation model. Specifically, the target sentence is vectorized, and the vectorized target sentence is used as the input of the similar text generation model to generate a plurality of first candidate sentences. The generated first candidate sentences are then passed through a set of post-processing rules, and those that do not meet the specification, such as colloquial sentences and non-simplified sentences, are deleted. Among the remaining first candidate sentences, either those whose similarity to the target sentence is greater than or equal to the first threshold, or the top k when ranked by similarity to the target sentence, are taken as the first similar sentences of the target sentence for subsequent generation of the target text.
(2) Obtaining similar sentences of the target sentence from the sentences stored in the preset text knowledge base. Specifically, a crawler tool is used to crawl a large amount of text article data in the relevant field, such as articles or question-and-answer data. Once the text article data has accumulated to a certain magnitude, all of it is divided according to punctuation marks to obtain a plurality of short sentences of minimum granularity, and the key information of the text to be processed is used to screen out second candidate sentences that can be used for processing the text to be processed. These second candidate sentences form the preset text knowledge base used for similar-sentence replacement; the short sentences in the preset text knowledge base are accumulated to the order of millions, which helps ensure the accuracy of the second candidate sentences obtained. Then, the similarity between the target sentence and every second candidate sentence in the preset text knowledge base is calculated, and the second candidate sentences whose similarity is greater than or equal to the second threshold are taken as the second similar sentences of the target sentence for subsequent generation of the target text.
(3) Generating similar sentences of the target sentence by synonym replacement. Specifically, the target sentence is segmented into words by a word segmentation tool, and a word vector is obtained for each word segment from a trained word vector model, so that one or more synonyms of each word segment can be obtained. When one or more word segments in the target sentence have replaceable synonyms, the synonyms are selected in turn and substituted into the target sentence to recombine it, generating a plurality of third candidate sentences. The fluency of each third candidate sentence is calculated by a sentence fluency algorithm, and each third candidate sentence whose fluency is greater than or equal to the third threshold is taken as a third similar sentence of the target sentence for subsequent generation of the target text.
The three processing steps yield a candidate database, from which the target similar sentences of each target sentence of the text to be processed are obtained; when constructing the target text, one target similar sentence is selected, for example at random, to replace the corresponding target sentence. Similar-sentence replacement is performed on all target sentences in the text to be processed to obtain the target text.
The target similar sentence used for replacement may be extracted at random from the candidates, or the candidates may be ranked by similarity from high to low and the one with the highest similarity to the target sentence selected.
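The two replacement modes, random extraction and highest similarity, and the construction of the target text can be sketched as follows. The sketch is illustrative only: `candidate_db` is a hypothetical mapping from the index of a target sentence to its pooled first, second and third similar sentences; similarity again uses `difflib.SequenceMatcher` as a stand-in measure; and keeping the original sentence when its pool is empty is an assumption not stated in the source.

```python
import random
from difflib import SequenceMatcher


def choose_target_similar(target, pool, strategy="random", rng=None):
    """Pick one similar sentence from the pool, at random or by highest similarity."""
    if not pool:
        return target  # assumption: keep the original when no candidate exists
    if strategy == "random":
        return (rng or random).choice(pool)
    return max(pool, key=lambda c: SequenceMatcher(None, target, c).ratio())


def build_target_text(sentences, candidate_db, strategy="random", rng=None, joiner=" "):
    """Replace every target sentence that has candidates; keep all other sentences."""
    out = []
    for i, sentence in enumerate(sentences):
        if i in candidate_db:
            out.append(choose_target_similar(sentence, candidate_db[i], strategy, rng))
        else:
            out.append(sentence)
    return joiner.join(out)


if __name__ == "__main__":
    sentences = ["Acme Life offers this plan.", "claims are paid quickly."]
    candidate_db = {1: ["claims are paid fast.", "hello world."]}
    print(build_target_text(sentences, candidate_db, strategy="best"))
    print(build_target_text(sentences, candidate_db, rng=random.Random(0)))
```

Random extraction favours diversity across repeated runs, while the highest-similarity mode favours closeness to the original sentence; which is preferable depends on how many variants of the text are needed.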
In the embodiment of the application, the finally generated target text is different from the original text to be processed, so that language expression is enriched, text diversity is improved, the meaning of the original text is not deviated, and text reliability is guaranteed.
As a specific implementation of the text generation method, an embodiment of the present application provides a text generation apparatus. As shown in fig. 3, the text generating apparatus 300 includes: an acquisition module 301, a first processing module 302, a second processing module 303, a third processing module 304, and a generation module 305.
The acquisition module 301 is configured to acquire a text to be processed, divide the text to be processed into a plurality of sentences, and determine, among the plurality of sentences, a target sentence that does not contain key information, where the key information is information representing key semantics of the text to be processed; the first processing module 302 is configured to input the target sentence into the similar text generation model to obtain at least one first similar sentence of the target sentence; the second processing module 303 is configured to acquire at least one second similar sentence of the target sentence from a preset text knowledge base based on sentence similarity; the third processing module 304 is configured to perform synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence; and the generation module 305 is configured to determine a target similar sentence among the at least one first similar sentence, the at least one second similar sentence, and the at least one third similar sentence, and replace the target sentence with the target similar sentence, so as to generate a target text of the text to be processed.
In this embodiment, in order to ensure that the finally generated target text is logically coherent and semantically consistent with the original text, the acquired text to be processed (i.e., the original text) is first divided into a plurality of sentences, and the sentences are screened to select the target sentences, namely those that do not contain the key information (i.e., information that must not be changed) representing the key semantics of the text to be processed. For example, in the insurance field, the key information may be insurance brand names such as "Peace" or "Life". Further, the target sentence is processed in three ways to obtain at least one first similar sentence, at least one second similar sentence and at least one third similar sentence, respectively. Specifically, the first processing generates first similar sentences of the target sentence by using a similar text generation model, the second processing obtains second similar sentences of the target sentence from the sentences stored in a preset text knowledge base, and the third processing generates third similar sentences of the target sentence by synonym replacement. Finally, a target similar sentence is determined among the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and the target sentence is replaced with the target similar sentence to generate a target text of the text to be processed.
In the embodiment of the application, the three kinds of processing support diverse expansion of the text and improve the diversity of textual expression; and by dividing the text into sentences and retaining the key information, the finally generated target text stays on topic, is logically rigorous, and reads fluently.
Further, the acquiring module 301 is specifically configured to divide the text to be processed according to punctuation marks to obtain the plurality of sentences.
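The punctuation-based division can be sketched as follows. The application does not list the punctuation marks used, so the set below (Chinese and Western full stops, exclamation marks, question marks, and semicolons) is an assumption, and `split_sentences` is a hypothetical helper name.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-ending punctuation, keeping the mark attached."""
    # Assumed delimiter set; adjust it to the punctuation of the target corpus.
    pattern = r"[^。！？；.!?;]+[。！？；.!?;]*"
    return [m.strip() for m in re.findall(pattern, text) if m.strip()]
```

Keeping each mark attached to its sentence means the pieces can later be concatenated again without reconstructing the punctuation.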
Further, the first processing module 302 is specifically configured to: input the target sentence into the similar text generation model to obtain a plurality of first candidate sentences; filter out the first candidate sentences that do not meet the specification; and determine, among the retained first candidate sentences, each first candidate sentence whose similarity with the target sentence is greater than or equal to a first threshold as a first similar sentence of the target sentence.
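The filtering and thresholding of the first candidate sentences can be sketched as follows. The similar text generation model is not shown; its output is simply passed in as a list. The application does not define what "meeting the specification" means, so the checks below (non-empty, different from the target, not a duplicate, not excessively long) are assumptions, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity measure is actually used. The names `select_first_similar`, `first_threshold`, and `max_len_ratio` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the real measure."""
    return SequenceMatcher(None, a, b).ratio()


def select_first_similar(target: str, candidates: List[str],
                         first_threshold: float = 0.5,
                         max_len_ratio: float = 2.0) -> List[str]:
    seen = set()
    kept = []
    for cand in candidates:
        cand = cand.strip()
        # Assumed "specification" checks: non-empty, not the target itself, no duplicates.
        if not cand or cand == target or cand in seen:
            continue
        # Assumed "specification" check: reject candidates far longer than the target.
        if len(cand) > max_len_ratio * len(target):
            continue
        seen.add(cand)
        # Keep only candidates that are similar enough to the target sentence.
        if similarity(target, cand) >= first_threshold:
            kept.append(cand)
    return kept
```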
Further, the text generation apparatus 300 further includes a fourth processing module, configured to: crawl a plurality of text articles with a crawler tool and divide the text articles according to punctuation marks to obtain a plurality of short sentences; and screen out second candidate sentences from the plurality of short sentences according to the key information, and store the second candidate sentences in the preset text knowledge base.
Further, the second processing module 303 is specifically configured to: calculate the similarity between the target sentence and each of a plurality of second candidate sentences stored in the preset text knowledge base; and take each second candidate sentence whose similarity is greater than or equal to a second threshold as a second similar sentence of the target sentence.
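The retrieval from the knowledge base can be sketched as follows. The application does not specify how sentence similarity is computed; this sketch assumes cosine similarity over character-bigram counts, and represents the preset text knowledge base as a plain list of strings. The names `bigram_vector`, `retrieve_second_similar`, and `second_threshold` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bigram_vector(sentence: str) -> Counter:
    """Count character bigrams; an assumed, very simple sentence representation."""
    s = sentence.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_second_similar(target: str, knowledge_base: List[str],
                            second_threshold: float = 0.6) -> List[str]:
    target_vec = bigram_vector(target)
    # Score every stored sentence against the target sentence.
    scored = [(cosine(target_vec, bigram_vector(s)), s) for s in knowledge_base]
    # Keep the sentences at or above the threshold, most similar first.
    return [s for score, s in sorted(scored, reverse=True) if score >= second_threshold]
```

A linear scan such as this is adequate for a small knowledge base; a larger one would typically rely on precomputed sentence vectors and an index, which is outside the scope of this sketch.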
Further, the third processing module 304 is specifically configured to: divide the target sentence into a plurality of word segments and obtain a word vector of each word segment; obtain synonyms of each word segment according to the word vectors, select the synonyms of each word segment in turn, and recombine the target sentence to generate a plurality of third candidate sentences; and calculate the fluency of each third candidate sentence, and take each third candidate sentence whose fluency is greater than or equal to a third threshold as a third similar sentence of the target sentence.
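The synonym replacement can be sketched as follows. Several points are assumptions: the target sentence is given as an already segmented list of word segments, the word vectors come from a small in-memory dictionary rather than a trained embedding model, a synonym is taken to be the nearest neighbor by cosine similarity above `min_sim`, exactly one word segment is replaced per candidate, and the fluency score is supplied by the caller because the application does not say how it is computed. All function and parameter names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_synonyms(word: str, vectors: Dict[str, Vector],
                     top_k: int = 1, min_sim: float = 0.9) -> List[str]:
    """Return up to top_k vocabulary words whose vectors are closest to `word`."""
    if word not in vectors:
        return []
    scored = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    return [w for sim, w in sorted(scored, reverse=True)[:top_k] if sim >= min_sim]


def third_similar_sentences(tokens: List[str], vectors: Dict[str, Vector],
                            fluency: Callable[[List[str]], float],
                            third_threshold: float) -> List[str]:
    candidates = []
    # Visit the word segments in order and substitute one synonym at a time.
    for i, token in enumerate(tokens):
        for synonym in nearest_synonyms(token, vectors):
            candidates.append(tokens[:i] + [synonym] + tokens[i + 1:])
    # Keep only the recombined sentences that are fluent enough.
    return [" ".join(c) for c in candidates if fluency(c) >= third_threshold]
```

In practice the fluency callable might wrap a language-model score such as perplexity; that choice is left open here.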
Further, the generating module 305 is specifically configured to randomly select one similar sentence from the at least one first similar sentence, the at least one second similar sentence, and the at least one third similar sentence as the target similar sentence.
Alternatively, the generating module 305 is specifically configured to take, among the at least one first similar sentence, the at least one second similar sentence, and the at least one third similar sentence, the similar sentence with the highest similarity to the target sentence as the target similar sentence.
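The two selection strategies of the generating module can be sketched as follows. The helper names `pick_random` and `pick_most_similar` are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is an assumed stand-in for the similarity measure, which the application leaves open.

```python
import random
from difflib import SequenceMatcher
from typing import List, Optional


def pick_random(first: List[str], second: List[str], third: List[str],
                rng: Optional[random.Random] = None) -> Optional[str]:
    """Strategy 1: draw one sentence at random from the pooled similar sentences."""
    pool = [*first, *second, *third]
    if not pool:
        return None
    # An explicit random.Random instance makes the draw reproducible.
    return (rng or random).choice(pool)


def pick_most_similar(target: str, first: List[str], second: List[str],
                      third: List[str]) -> Optional[str]:
    """Strategy 2: take the pooled sentence most similar to the target sentence."""
    pool = [*first, *second, *third]
    return max(pool,
               key=lambda s: SequenceMatcher(None, target, s).ratio(),
               default=None)
```

Random selection favors variety across repeated runs, whereas the highest-similarity choice favors fidelity to the original sentence.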
The text generation apparatus 300 in the embodiment of the present application may be a computer device, or may be a component of a computer device, such as an integrated circuit or a chip. The computer device may be a terminal or a device other than a terminal. For example, the computer device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted computer device, a Mobile Internet Device (MID), a robot, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook, or a Personal Digital Assistant (PDA); it may also be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a television (TV), a teller machine, or a self-service machine. The embodiments of the present application are not specifically limited in this respect.
The text generation apparatus 300 in the embodiment of the present application may be an apparatus having an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system; the embodiments of the present application are not specifically limited in this respect.
The text generation apparatus 300 provided in this embodiment of the application can implement each process implemented in the text generation method embodiment of fig. 1, and is not described here again to avoid repetition.
The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the program or the instruction implements the processes of the text generation method embodiment, and can achieve the same technical effects, and in order to avoid repetition, details are not repeated here.
As shown in fig. 4, the computer device 400 includes a processor 401 and a memory 402, where the memory 402 stores a program or an instruction that can be executed on the processor 401, and when the program or the instruction is executed by the processor 401, the steps of the text generation method embodiment can be implemented, and the same technical effects can be achieved.
It should be noted that the computer devices in the embodiments of the present application include the mobile computer device and the non-mobile computer device described above.
The memory 402 may be used to store software programs as well as various data. The memory 402 may mainly include a first storage area for programs or instructions and a second storage area for data, where the first storage area may store an operating system and the application programs or instructions required for at least one function (such as a sound playing function or an image playing function). Further, the memory 402 may include volatile memory, nonvolatile memory, or both. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), a Static RAM (SRAM), a Dynamic RAM (DRAM), a Synchronous DRAM (SDRAM), a Double Data Rate SDRAM (DDR SDRAM), an Enhanced SDRAM (ESDRAM), a Synchlink DRAM (SLDRAM), or a Direct Rambus RAM (DRRAM). The memory 402 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 401 may include one or more processing units; optionally, the processor 401 may integrate an application processor, which mainly handles operations related to the operating system, user interface, application programs, etc., and a modem processor, which mainly handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the text generation method embodiment, and the same technical effect can be achieved.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
The embodiment of the present application further provides a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing text generation method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order, depending on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A text generation method, comprising:
acquiring a text to be processed, dividing the text to be processed into a plurality of sentences, and determining a target sentence which does not contain key information in the plurality of sentences, wherein the key information is information representing key semantics of the text to be processed;
inputting the target sentence into a similar text generation model to obtain at least one first similar sentence of the target sentence;
acquiring at least one second similar sentence of the target sentence in a preset text knowledge base based on the sentence similarity;
performing synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence;
and determining a target similar sentence in the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and replacing the target sentence with the target similar sentence to generate a target text of the text to be processed.
2. The method of claim 1, wherein the dividing the text to be processed into a plurality of sentences comprises:
and dividing the text to be processed according to punctuation marks to obtain the plurality of sentences.
3. The method of claim 1, wherein the inputting the target sentence into a similar text generation model to obtain at least one first similar sentence of the target sentence comprises:
inputting the target sentence into a similar text generation model to obtain a plurality of first candidate sentences;
and filtering out first candidate sentences which do not meet the specification in the plurality of first candidate sentences, and determining the first candidate sentences of which the similarity with the target sentence is greater than or equal to a first threshold value in the retained first candidate sentences as the first similar sentences of the target sentence.
4. The method of claim 1, further comprising, before the obtaining at least one second similar sentence of the target sentence in a preset text knowledge base based on the sentence similarity:
crawling a plurality of text article data by using a crawler tool, and dividing the text article data according to punctuations to obtain a plurality of short sentences;
and screening out a second candidate sentence from the plurality of short sentences according to the key information, and storing the second candidate sentence into the preset text knowledge base.
5. The method of claim 1, wherein the obtaining at least one second similar sentence of the target sentence in a preset text knowledge base based on the sentence similarity comprises:
respectively carrying out similarity calculation on a plurality of second candidate sentences stored in the preset text knowledge base and the target sentence;
and taking the second candidate sentence with the similarity larger than or equal to a second threshold value as a second similar sentence of the target sentence.
6. The method of claim 1, wherein the performing synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence comprises:
dividing the target sentence into a plurality of word segments, and obtaining a word vector of each word segment;
obtaining synonyms of each word segment according to the word vector, sequentially selecting the synonyms of each word segment, and recombining the target sentence to generate a plurality of third candidate sentences;
and calculating the fluency of each third candidate sentence, and taking the third candidate sentence with the fluency larger than or equal to a third threshold value as a third similar sentence of the target sentence.
7. The method according to any one of claims 1 to 6, wherein the determining a target similar sentence among the at least one first similar sentence, the at least one second similar sentence, and the at least one third similar sentence comprises:
randomly extracting one similar sentence from the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence as the target similar sentence.
8. The method according to any one of claims 1 to 6, wherein the determining a target similar sentence among the at least one first similar sentence, the at least one second similar sentence, and the at least one third similar sentence includes:
and taking the similar sentence with the highest similarity with the target sentence in the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence as the target similar sentence.
9. A text generation apparatus, comprising:
an acquisition module, used for acquiring a text to be processed, dividing the text to be processed into a plurality of sentences, and determining a target sentence which does not contain key information in the plurality of sentences, wherein the key information is information representing key semantics of the text to be processed;
the first processing module is used for inputting the target sentence into a similar text generation model to obtain at least one first similar sentence of the target sentence;
the second processing module is used for acquiring at least one second similar sentence of the target sentence from a preset text knowledge base based on the sentence similarity;
the third processing module is used for carrying out synonym replacement on the word segments in the target sentence to generate at least one third similar sentence of the target sentence;
and the generating module is used for determining a target similar sentence in the at least one first similar sentence, the at least one second similar sentence and the at least one third similar sentence, and replacing the target sentence with the target similar sentence to generate a target text of the text to be processed.
10. A readable storage medium on which a program or instructions are stored, characterized in that the program or instructions, when executed by a processor, implement the steps of the text generation method according to any one of claims 1 to 8.
CN202210961108.1A 2022-08-11 2022-08-11 Text generation method, text generation device and readable storage medium Pending CN115408997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210961108.1A CN115408997A (en) 2022-08-11 2022-08-11 Text generation method, text generation device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210961108.1A CN115408997A (en) 2022-08-11 2022-08-11 Text generation method, text generation device and readable storage medium

Publications (1)

Publication Number Publication Date
CN115408997A true CN115408997A (en) 2022-11-29

Family

ID=84159362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210961108.1A Pending CN115408997A (en) 2022-08-11 2022-08-11 Text generation method, text generation device and readable storage medium

Country Status (1)

Country Link
CN (1) CN115408997A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235546A (en) * 2023-11-14 2023-12-15 国泰新点软件股份有限公司 Multi-version file comparison method, device, system and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235546A (en) * 2023-11-14 2023-12-15 国泰新点软件股份有限公司 Multi-version file comparison method, device, system and storage medium
CN117235546B (en) * 2023-11-14 2024-03-12 国泰新点软件股份有限公司 Multi-version file comparison method, device, system and storage medium

Similar Documents

Publication Publication Date Title
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN109271521B (en) Text classification method and device
US20210271823A1 (en) Content generation using target content derived modeling and unsupervised language modeling
US9898464B2 (en) Information extraction supporting apparatus and method
CN112215008A (en) Entity recognition method and device based on semantic understanding, computer equipment and medium
CN111859986A (en) Semantic matching method, device, equipment and medium based on multitask twin network
CN113297366B (en) Emotion recognition model training method, device, equipment and medium for multi-round dialogue
CN109784365B (en) Feature selection method, terminal, readable medium and computer program
CN112651236B (en) Method and device for extracting text information, computer equipment and storage medium
CN111767394A (en) Abstract extraction method and device based on artificial intelligence expert system
CN111523960A (en) Product pushing method and device based on sparse matrix, computer equipment and medium
CN115017288A (en) Model training method, model training device, equipment and storage medium
CN114399396A (en) Insurance product recommendation method and device, computer equipment and storage medium
CN113204953A (en) Text matching method and device based on semantic recognition and device readable storage medium
CN115408997A (en) Text generation method, text generation device and readable storage medium
CN117235546B (en) Multi-version file comparison method, device, system and storage medium
CN110347934B (en) Text data filtering method, device and medium
CN111737548A (en) Click verification code identification method and device, computer equipment and storage medium
CN113705198B (en) Scene graph generation method and device, electronic equipment and storage medium
WO2022246162A1 (en) Content generation using target content derived modeling and unsupervised language modeling
CN114638229A (en) Entity identification method, device, medium and equipment of record data
CN109766539B (en) Standard word stock word segmentation method, device, equipment and computer readable storage medium
JP7099254B2 (en) Learning methods, learning programs and learning devices
CN112148855A (en) Intelligent customer service problem retrieval method, terminal and storage medium
Chaonithi et al. A hybrid approach for Thai word segmentation with crowdsourcing feedback system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100102 201 / F, block C, 2 lizezhong 2nd Road, Chaoyang District, Beijing

Applicant after: Beijing Shuidi Technology Group Co.,Ltd.

Address before: 100102 201, 2 / F, block C, No.2 lizezhong 2nd Road, Chaoyang District, Beijing

Applicant before: Beijing Health Home Technology Co.,Ltd.
