CN112380848A

CN112380848A - Text generation method, device, equipment and storage medium

Info

Publication number: CN112380848A
Application number: CN202011298668.0A
Authority: CN
Inventors: 邓黎明; 庄伯金; 钱江; 王少军
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2021-02-19
Anticipated expiration: 2040-11-19
Also published as: CN112380848B

Abstract

The invention relates to the technical field of artificial intelligence and discloses a text generation method, apparatus, device and storage medium. The method comprises:

- acquiring an original text to be edited and performing semantic recognition on it to obtain first content and second content;
- marking slot positions for the words in the first content and the second content, based on a preset correspondence between words and slot positions, to generate a first slot position text and a second slot position text;
- calling a language model to adjust the words corresponding to each slot position in the first slot position text, generating a first adjusted text;
- extracting, according to the slot position marks, the context content corresponding to each slot position in the second slot position text, and inputting it into an automatic question-answering model for question-answering search, to obtain answer data associated with the words corresponding to the slot positions;
- adjusting the content of the second slot position text based on the answer data, and combining it with the first adjusted text to form a final expression text.

This improves the accuracy and logical coherence of text generation.

Description

Text generation method, device, equipment and storage medium

Technical Field

The present invention relates to the field of natural language processing, and in particular, to a text generation method, apparatus, device, and storage medium.

Background

Automatic text generation can greatly improve productivity when processing documents, and applying text generation techniques can significantly reduce the paperwork burden on workers. Requirements for text generation differ between fields. In certain specialized fields, such as legal documents, the nature of the work makes the requirements stricter: the generated text must be not only grammatically correct but also logically sound.

At present, text generation mainly relies on a language model to modify and adjust the input sentences so as to produce grammatically correct text. However, existing language models mainly model co-occurrence relations in past corpora and do not accurately learn relations such as logical collocation. The generated text is therefore prone to logical errors, which makes it unsuitable for fields such as legal documents that place high demands on text quality.

Disclosure of Invention

The main object of the invention is to provide a text generation method, apparatus, device and storage medium. They solve the technical problem in the prior art that text generated with a language model is prone to logical errors.

The first aspect of the present invention provides a text generation method, where the text generation method includes:

acquiring an original text to be edited, and performing semantic recognition on the sentences in it to obtain first content and second content, wherein the first content is a set of sentences with clear semantics and the second content is a set of sentences with unclear semantics;

marking slot positions for the words of the sentences in the first content and the second content, respectively, based on a preset correspondence between words and slot positions, to generate a first slot position text and a second slot position text;

calling a preset language model, and adjusting words corresponding to each slot position in the first slot position text to generate a first adjusted text;

extracting context content corresponding to each slot in the second slot text according to the slot marks, and inputting the context content into a preset automatic question-answering model, wherein the automatic question-answering model carries out question-answering search processing based on the context content to obtain answer data associated with words corresponding to the slot, and the automatic question-answering model is a retrieval model constructed based on a knowledge graph in a specific field;

and adjusting the content of the second slot position text based on the answer data, and combining the second slot position text with the first adjusted text to obtain a final expression text of the original text.
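The five steps above can be pictured as a routing pipeline. The following Python sketch is a minimal, hypothetical illustration, not the patented implementation. It makes these assumptions:

- Each sentence is already segmented into a list of words, so each list position acts as a slot.
- `is_clear`, `lm_adjust` and `qa_adjust` are placeholder callables standing in for semantic recognition, the language model and the question-answering branch.

```python
from typing import Callable, List

Sentence = List[str]  # a segmented sentence; each list position acts as one slot


def generate_final_text(
    sentences: List[Sentence],
    is_clear: Callable[[Sentence], bool],
    lm_adjust: Callable[[Sentence], Sentence],
    qa_adjust: Callable[[Sentence], Sentence],
) -> List[Sentence]:
    """Route each sentence to one branch, then merge in the original order.

    Clear sentences (first content) go through the language-model branch;
    unclear sentences (second content) go through the question-answering branch.
    """
    first = {i: s for i, s in enumerate(sentences) if is_clear(s)}
    second = {i: s for i, s in enumerate(sentences) if not is_clear(s)}

    first_adjusted = {i: lm_adjust(s) for i, s in first.items()}
    second_adjusted = {i: qa_adjust(s) for i, s in second.items()}

    # Merge the two adjusted texts back into one text, keeping sentence order.
    merged = {**first_adjusted, **second_adjusted}
    return [merged[i] for i in sorted(merged)]


# Toy usage: "[LAW]" is a hypothetical slot filled from answer data.
example = generate_final_text(
    [["eat", "TV"], ["Person A", "violated", "[LAW]"]],
    is_clear=lambda s: "[LAW]" not in s,
    lm_adjust=lambda s: ["watch" if w == "eat" else w for w in s],
    qa_adjust=lambda s: ["Article 272" if w == "[LAW]" else w for w in s],
)
```

Keeping the sentence index through both branches is one simple way to "combine" the two adjusted texts without losing the original order. The text does not specify the merge order.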

Optionally, in a first implementation manner of the first aspect of the present invention, calling the preset language model to adjust the words corresponding to each slot position in the first slot position text and generate the first adjusted text includes:

performing grammatical analysis on the content of the first slot position text to obtain a first word to be adjusted in the first slot position text, wherein the first word to be adjusted is a word that does not conform to customary usage;

searching, according to the first word to be adjusted, a preset knowledge graph library for a word that has the same part of speech as the first word to be adjusted and conforms to customary usage, to obtain a first adjustment word;

and replacing the corresponding first word to be adjusted in the first slot position text with the first adjustment word to generate the first adjusted text.

Optionally, in a second implementation manner of the first aspect of the present invention, the following step includes the sub-steps listed below. The step is: extracting, according to the slot position marks, the context content corresponding to each slot position in the second slot position text, and inputting it into the preset automatic question-answering model, which performs question-answering search based on the context content to obtain answer data associated with the words corresponding to the slot positions.

extracting keywords from the context content to obtain at least one keyword;

taking the at least one keyword as a question index, and inputting the at least one keyword into the automatic question-answering model for question-answering retrieval to obtain answer data;

and comparing the answer data with the words corresponding to the slot positions to obtain content data associated with the words corresponding to the slot positions.

Optionally, in a third implementation manner of the first aspect of the present invention, comparing the retrieved question-answer data with the words corresponding to the slot positions to obtain content data associated with those words includes:

extracting keywords from the content data to obtain at least one answer keyword;

and comparing the at least one answer keyword with the words corresponding to the slot positions to obtain keyword data associated with the words corresponding to the slot positions.

Optionally, in a fourth implementation manner of the first aspect of the present invention, adjusting the content of the second slot position text based on the answer data and combining it with the first adjusted text to obtain the final expression text of the original text includes:

calling a preset language model, and adjusting words in the answer data to obtain adjusted data;

replacing the associated slot position content in the second slot position text with the adjusted data to generate a second adjusted text;

and combining the second adjusted text with the first adjusted text to obtain the final expression text of the original text.

Optionally, in a fifth implementation manner of the first aspect of the present invention, calling the preset language model to adjust the words in the answer data and obtain the adjusted data includes:

performing syntactic analysis on the sentences in the answer data to obtain a second word to be adjusted, wherein the second word to be adjusted is a word that does not conform to customary usage;

searching, according to the second word to be adjusted, a preset knowledge graph library for a word that has the same part of speech as the second word to be adjusted and conforms to customary usage, to obtain a second adjustment word;

and replacing the corresponding second word to be adjusted in the answer data with the second adjustment word to obtain the adjusted data.

Optionally, in a sixth implementation manner of the first aspect of the present invention, the final expression text of the original text is first obtained by adjusting the content of the second slot position text based on the answer data and merging it with the first adjusted text. After that, the method further includes:

merging the second adjusted text and the first adjusted text to obtain a third adjusted text;

calling the preset language model to adjust the words corresponding to each slot position in the third adjusted text, generating a fourth adjusted text;

extracting, according to the slot position marks, the context content corresponding to each slot position in the fourth adjusted text, and inputting it into the preset automatic question-answering model, which performs question-answering search based on the context content to obtain answer data associated with the words corresponding to the slot positions;

and adjusting the content of the fourth adjusted text based on the answer data to obtain the final expression text of the third adjusted text.
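The sixth implementation manner runs a second refinement pass over the merged text. Below is a minimal sketch of that pass. `lm_adjust` and `qa_adjust` are hypothetical placeholders for the language-model step and the question-answering step. Concatenating the first and second adjusted texts is an assumed merge order, because the text does not fix one.

```python
from typing import Callable, List

Sentence = List[str]


def readjust(
    first_adjusted: List[Sentence],
    second_adjusted: List[Sentence],
    lm_adjust: Callable[[Sentence], Sentence],
    qa_adjust: Callable[[Sentence], Sentence],
) -> List[Sentence]:
    """Second refinement pass over the merged text."""
    # Third adjusted text: the two adjusted texts merged (order is an assumption).
    third = first_adjusted + second_adjusted
    # Fourth adjusted text: language-model adjustment of every slot word.
    fourth = [lm_adjust(s) for s in third]
    # Final text: question-answering based adjustment of the fourth text.
    return [qa_adjust(s) for s in fourth]
```

Because both passes are plain functions over sentences, the same placeholders used for the first pass can be reused here unchanged.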

A second aspect of the present invention provides a text generating apparatus, including:

the semantic recognition module, configured to acquire an original text to be edited and perform semantic recognition on the sentences in it to obtain first content and second content, wherein the first content is a set of sentences with clear semantics and the second content is a set of sentences with unclear semantics;

the slot position marking module, configured to mark slot positions for the words of the sentences in the first content and the second content, respectively, based on a preset correspondence between words and slot positions, and to generate a first slot position text and a second slot position text;

the word adjusting module, configured to call a preset language model to adjust the words corresponding to each slot position in the first slot position text and to generate a first adjusted text;

the retrieval module, configured to extract, according to the slot position marks, the context content corresponding to each slot position in the second slot position text and input it into a preset automatic question-answering model. The automatic question-answering model performs question-answering search based on the context content to obtain answer data associated with the words corresponding to the slot positions, and is a retrieval model constructed based on a knowledge graph of a specific field;

and the text merging module, configured to adjust the content of the second slot position text based on the answer data and merge it with the first adjusted text to obtain the final expression text of the original text.

Optionally, in a first implementation manner of the second aspect of the present invention, the word adjusting module is specifically configured to:

Optionally, in a second implementation manner of the second aspect of the present invention, the retrieving module includes:

the keyword extraction unit is used for extracting keywords from the context content to obtain at least one keyword;

the question-answer retrieval unit is used for taking the at least one keyword as a question index and inputting the question index into the automatic question-answer model to retrieve questions and answers so as to obtain answer data;

and the data comparison unit is used for comparing the answer data with the words corresponding to the slot positions to obtain content data associated with the words corresponding to the slot positions.

Optionally, in a third implementation manner of the second aspect of the present invention, the data comparing unit is specifically configured to:

Optionally, in a fourth implementation manner of the second aspect of the present invention, the text merging module includes:

the data adjusting unit is used for calling a preset language model, adjusting words in the answer data and obtaining adjusted data;

the data replacement unit is used for replacing the slot position content associated with the second slot position text according to the adjusted data to generate a second adjusted text;

and the text merging unit is used for merging the second adjusted text and the first adjusted text to obtain the final expression text of the original text.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the data adjusting unit is specifically configured to:

Optionally, in a sixth implementation manner of the second aspect of the present invention, the text generating apparatus further includes a text readjusting module, which is specifically configured to:

A third aspect of the present invention provides a text generation device, including a memory storing instructions and at least one processor, the memory and the at least one processor being interconnected by a line. The at least one processor invokes the instructions in the memory to cause the text generation device to perform the steps of the text generation method described above.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon instructions which, when run on a computer, cause the computer to perform the steps of the text generation method described above.

According to the technical solution provided by the invention, the following steps are performed:

- an original text to be edited is acquired and semantically recognized to obtain first content and second content;
- slot positions are marked for the words in the first content and the second content, based on a preset correspondence between words and slot positions, to generate a first slot position text and a second slot position text;
- a language model is called to adjust the words corresponding to each slot position in the first slot position text, generating a first adjusted text;
- the context content corresponding to each slot position in the second slot position text is extracted according to the slot position marks and input into an automatic question-answering model for question-answering search, yielding answer data associated with the words corresponding to the slot positions;
- the content of the second slot position text is adjusted based on the answer data and combined with the first adjusted text to form a final expression text.

The final expression text generated by this technical solution is semantically correct and logically consistent, which improves the accuracy and logical coherence of text generation.

Drawings

FIG. 1 is a schematic diagram of a first embodiment of a text generation method in an embodiment of the present invention;

FIG. 2 is a diagram of a second embodiment of a text generation method according to an embodiment of the present invention;

FIG. 3 is a diagram of a third embodiment of a text generation method according to an embodiment of the present invention;

FIG. 4 is a diagram of a fourth embodiment of a text generation method according to an embodiment of the present invention;

FIG. 5 is a diagram of an embodiment of a text generation apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of another embodiment of a text generation apparatus according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an embodiment of a text generation device according to an embodiment of the present invention.

Detailed Description

The embodiments of the invention provide a text generation method, apparatus, device and storage medium, which perform the following steps:

- an original text to be edited is acquired and semantically recognized to obtain first content and second content;
- slot positions are marked for the words in the first content and the second content, based on a preset correspondence between words and slot positions, to generate a first slot position text and a second slot position text;
- a language model is called to adjust the words corresponding to each slot position in the first slot position text, generating a first adjusted text;
- the context content corresponding to each slot position in the second slot position text is extracted according to the slot position marks and input into an automatic question-answering model for question-answering search, yielding answer data associated with the words corresponding to the slot positions;
- the content of the second slot position text is adjusted based on the answer data and combined with the first adjusted text to form a final expression text.

The final expression text generated by this technical solution is semantically correct and logically consistent, which improves the accuracy and logical coherence of text generation.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, the specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of the text generation method in an embodiment of the present invention includes:

101, acquiring an original text to be edited, and performing semantic recognition on the sentences in it to obtain first content and second content, wherein the first content is a set of sentences with clear semantics and the second content is a set of sentences with unclear semantics;

It should be understood that the execution subject of the present invention may be a text generation apparatus, a terminal or a server, which is not limited herein. The embodiments of the present invention are described with a server as the execution subject.

It should be emphasized that, to ensure the integrity and security of the text data, the text data may be stored in a node of a blockchain.

An original text to be edited is obtained and semantically checked. The check mainly determines whether the text contains semantically ambiguous sentences, that is, sentences whose expression is unclear or illogical.

The semantic check mainly uses semantic recognition technology and a semantic dictionary. It checks whether word collocations are reasonable and whether sentences are unclear or illogical. Word collocation refers to whether the combination of word classes is reasonable, and it is checked mainly against part-of-speech collocation. For example, a "verb + noun" collocation is reasonable, whereas "adjective + verb" is not.
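A minimal sketch of the part-of-speech collocation check, assuming the semantic dictionary can be reduced to a table of adjacent POS pairs. `REASONABLE` and `UNREASONABLE` are hypothetical tables seeded only with the two examples given above. A real dictionary would be far larger.

```python
from typing import List, Tuple

# Hypothetical collocation tables seeded with the two examples from the text.
REASONABLE = {("verb", "noun")}
UNREASONABLE = {("adjective", "verb")}


def collocation_verdict(left_pos: str, right_pos: str) -> str:
    """Classify one adjacent part-of-speech pair."""
    pair = (left_pos, right_pos)
    if pair in REASONABLE:
        return "reasonable"
    if pair in UNREASONABLE:
        return "unreasonable"
    return "unknown"  # not covered by the table; other checks must decide


def flag_collocations(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the adjacent word pairs whose POS collocation is unreasonable."""
    return [
        (w1, w2)
        for (w1, p1), (w2, p2) in zip(tagged, tagged[1:])
        if collocation_verdict(p1, p2) == "unreasonable"
    ]
```

Returning "unknown" rather than "reasonable" for uncovered pairs keeps the table from silently approving collocations it has never seen.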

After the original text is obtained, the sentences in it are segmented into words, each word is tagged with its part of speech, and semantic and grammatical analysis is performed on the sentences and words of the original text. After the sentences are recognized with the semantic recognition method, the first content and the second content are generated from the recognition result. The first content is the set of sentences judged to have clear semantics, and the second content is the set of sentences whose semantics are unclear.
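The segmentation, tagging and partition steps can be sketched as follows. Several assumptions apply:

- Whitespace splitting and the tiny `LEXICON` dictionary are hypothetical stand-ins for a Chinese word segmenter and POS tagger.
- A sentence is treated as unclear only when it contains an unreasonable adjacent POS pair. This is a deliberately narrow proxy for the full semantic recognition described above.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a real segmenter and POS tagger.
LEXICON: Dict[str, str] = {
    "he": "pronoun", "watches": "verb", "tv": "noun",
    "beautiful": "adjective", "runs": "verb", "fast": "adverb",
}
UNREASONABLE = {("adjective", "verb")}


def tag(sentence: str) -> List[Tuple[str, str]]:
    """Segment on whitespace and tag each word; unknown words get 'unk'."""
    return [(w, LEXICON.get(w.lower(), "unk")) for w in sentence.split()]


def is_clear(sentence: str) -> bool:
    """A sentence counts as clear if no adjacent POS pair is unreasonable."""
    tags = [pos for _, pos in tag(sentence)]
    return not any((a, b) in UNREASONABLE for a, b in zip(tags, tags[1:]))


def split_contents(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Return (first content, second content)."""
    first = [s for s in sentences if is_clear(s)]
    second = [s for s in sentences if not is_clear(s)]
    return first, second
```

For example, `split_contents(["He watches TV", "beautiful runs fast"])` places the first sentence in the first content and the second sentence in the second content.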

102, marking, based on a preset correspondence between words and slot positions, the words of the sentences in the first content and the second content with slot positions, respectively, to generate a first slot position text and a second slot position text;

The words of the sentences in the first content and the second content are marked with slot positions according to the preset correspondence between words and slot positions, one word corresponding to one slot position. After marking, the original first content and second content are converted into a first slot position text and a second slot position text. The first content is a set of semantically clear sentences and the second content is a set of semantically ambiguous sentences. Therefore, the slot-marked words in the first slot position text are checked for conformity to customary expression, and the slot-marked words in the second slot position text are checked for conformity to logic.
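A minimal sketch of slot marking, assuming the preset correspondence is a plain word-to-slot-name table. `slot_of` and the bracketed `[SLOT:word]` mark format are hypothetical choices for illustration. The returned index records each slot's position so that later steps can locate and replace the slot word.

```python
from typing import Dict, List, Tuple


def mark_slots(
    words: List[str], slot_of: Dict[str, str]
) -> Tuple[str, Dict[int, Tuple[str, str]]]:
    """Mark every word that has a preset slot.

    slot_of is a hypothetical preset word -> slot-name table. Returns the slot
    position text and an index {position: (slot name, word)}.
    """
    marked: List[str] = []
    index: Dict[int, Tuple[str, str]] = {}
    for pos, word in enumerate(words):
        slot = slot_of.get(word)
        if slot is None:
            marked.append(word)  # no preset slot: keep the word unmarked
        else:
            marked.append(f"[{slot}:{word}]")
            index[pos] = (slot, word)
    return " ".join(marked), index
```

For example, `mark_slots(["eat", "TV", "now"], {"eat": "ACTION", "TV": "OBJECT"})` yields the text `"[ACTION:eat] [OBJECT:TV] now"` together with the index of both slots.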

103, calling a preset language model, adjusting words corresponding to each slot position in the first slot position text, and generating a first adjusted text;

After the first slot position text is obtained, a preset language model is called to adjust the words corresponding to each slot position in it. The main purpose is to check whether the words conform to customary expression in fields such as daily life, study and research. Specifically, the content of the first slot position text may be parsed to find words that do not conform to customary expression, and these are marked as words to be adjusted. Each word to be adjusted is input into the language model, which searches the knowledge graph library configured in it for words with the same part of speech. From these words, the model selects, according to the content of the first slot position text, a word that fits the context and conforms to customary usage, and replaces the word to be adjusted with it. The first slot position text is thereby converted into the first adjusted text, whose words and sentences conform to customary expression. For example, when the erroneous expression "eat TV" is input into the language model, the model looks for words with the same part of speech as "eat" according to the "verb + noun" collocation. Based on the overall context and semantics of the sentence, it selects "watch" to replace "eat", producing the customary expression "watch TV".
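The "eat TV" to "watch TV" example can be reproduced with a very small sketch. The language model is swapped here for a hypothetical bigram count table (`BIGRAM_COUNTS`) and a same-part-of-speech word list (`WORDS_BY_POS`). This is an assumed simplification, not the model the text describes. Each verb is compared with its same-POS alternatives by how often each alternative is followed by the next word. The verb is replaced only if an alternative scores strictly higher.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the language model's resources.
WORDS_BY_POS: Dict[str, List[str]] = {"verb": ["eat", "watch", "read"]}
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("watch", "TV"): 120, ("read", "book"): 95, ("eat", "rice"): 80,
}


def best_replacement(word: str, pos: str, next_word: str) -> str:
    """Pick the same-POS word that collocates best with the following word."""
    def score(candidate: str) -> int:
        return BIGRAM_COUNTS.get((candidate, next_word), 0)

    best = max(WORDS_BY_POS.get(pos, []), key=score, default=word)
    # Keep the original word unless a candidate is strictly more customary.
    return best if score(best) > score(word) else word


def adjust_verbs(tagged: List[Tuple[str, str]]) -> List[str]:
    """Check each 'verb + following word' pair and repair the verb if needed."""
    out = [w for w, _ in tagged]
    for i in range(len(tagged) - 1):
        word, pos = tagged[i]
        if pos == "verb":
            out[i] = best_replacement(word, pos, tagged[i + 1][0])
    return out
```

With these tables, `adjust_verbs([("eat", "verb"), ("TV", "noun")])` returns `["watch", "TV"]`, while `"eat rice"` is left unchanged.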

104, extracting, according to the slot position marks, the context content corresponding to each slot position in the second slot position text, and inputting it into a preset automatic question-answering model, which performs question-answering search based on the context content to obtain answer data associated with the words corresponding to the slot positions, wherein the automatic question-answering model is a retrieval model constructed based on a knowledge graph of a specific field;

The context content corresponding to each slot position in the second slot position text is extracted according to the slot position marks and input into the automatic question-answering model as a question. The model calls the knowledge graph of the relevant knowledge field configured on it, performs question-answering retrieval for the input question, and outputs the related data.

When the context content corresponding to each slot position in the second slot position text is input into the automatic question-answering model as a question, keyword extraction may first be performed on that context content to obtain at least one keyword or related keyword information. The extracted keywords are input into the model as a question index for question-answering retrieval to obtain question-answer data. This data is then compared with the words corresponding to the slot positions to obtain the content data associated with those words.
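The keyword, retrieval and comparison steps can be sketched as follows. The assumptions are:

- A stopword filter (`STOPWORDS`) stands in for keyword extraction.
- A two-entry list (`KNOWLEDGE`) stands in for the domain knowledge graph. Its entries paraphrase the legal example given in this description and are illustrative only.
- Ranking by keyword overlap is an assumed scoring rule.

```python
import re
from typing import List, Set, Tuple

STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "what", "which", "can", "be", "is"}

# Hypothetical domain knowledge base: (index terms, answer text).
KNOWLEDGE: List[Tuple[Set[str], str]] = [
    ({"misappropriate", "public", "funds", "bank"},
     "Article 185 applies to bank and other financial institution staff"),
    ({"misappropriate", "public", "funds", "company"},
     "Article 272 applies to company and enterprise staff"),
]


def keywords(text: str) -> Set[str]:
    """Very simple keyword extraction: lowercase tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def retrieve(context: str, top_k: int = 2) -> List[str]:
    """Use the context keywords as a question index; rank answers by overlap."""
    query = keywords(context)
    ranked = sorted(KNOWLEDGE, key=lambda entry: len(query & entry[0]), reverse=True)
    return [answer for terms, answer in ranked[:top_k] if query & terms]


def answers_for_slot(context: str, slot_word: str) -> List[str]:
    """Compare retrieved answers with the slot word and keep the related ones."""
    slot_terms = keywords(slot_word)
    return [a for a in retrieve(context) if slot_terms & keywords(a)]
```

The final comparison with the slot word is what narrows both retrieved articles down to the one relevant to a company employee.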

Specifically, the context content corresponding to a slot position is fed to the question input of the automatic question-answering system, and the model computes an answer. For example, when content such as "misappropriation of public funds" and "law" is input as a question, the model outputs a related answer, such as data on Article 185 of the national Criminal Law. Suppose the input question is: "While working at a company, Person A misappropriated 100,000 yuan of public funds; on which legal provisions can a judgment against Person A be based?" The model can simply query the knowledge graph and output question-answer data such as: "Misappropriation of public funds is covered by Article 185 and Article 272; however, the two articles apply respectively to financial institutions such as banks and to companies."

105, adjusting the content of the second slot position text based on the answer data, and combining it with the first adjusted text to obtain the final expression text of the original text.

After the related data is obtained through question-answering retrieval by the automatic question-answering model, the words in the related data are adjusted. The adjusted data replaces the associated slot position content in the second slot position text, and the resulting second slot position text is merged with the first adjusted text to obtain the final expression text.

The question-answer data is adjusted mainly by the language model. The model corrects expressions in the question-answer data that are illogical or do not conform to customary usage, for example by replacing a word with a word of the same part of speech that conforms to customary expression. The adjusted data is then used to replace the associated slot position content in the second slot position text, and the result is merged with the first adjusted text.
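The adjust, replace and merge sequence of step 105 is sketched below. `customary` is a hypothetical word-to-replacement table standing in for the language model's same-part-of-speech substitutions. Merging by simple concatenation is likewise an assumption.

```python
from typing import Dict, List

Sentence = List[str]


def adjust_answer(answer: Sentence, customary: Dict[str, str]) -> Sentence:
    """Swap non-customary words in the answer data for customary ones."""
    return [customary.get(w, w) for w in answer]


def fill_slot(sentence: Sentence, slot_pos: int, content: Sentence) -> Sentence:
    """Replace the word at slot_pos with the adjusted answer words."""
    return sentence[:slot_pos] + content + sentence[slot_pos + 1:]


def merge(first_adjusted: List[Sentence], second_adjusted: List[Sentence]) -> List[Sentence]:
    """Combine both adjusted texts (simple concatenation is an assumption)."""
    return first_adjusted + second_adjusted
```

For example, `fill_slot(["Person A", "is liable under", "[LAW]"], 2, adjust_answer(["Art.", "272"], {"Art.": "Article"}))` gives `["Person A", "is liable under", "Article", "272"]`.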

In this embodiment, the following steps are performed:

- an original text to be edited is acquired and semantically recognized to obtain first content and second content;
- slot positions are marked for the words in the first content and the second content, based on a preset correspondence between words and slot positions, to generate a first slot position text and a second slot position text;
- a language model is called to adjust the words corresponding to each slot position in the first slot position text, generating a first adjusted text;
- the context content corresponding to each slot position in the second slot position text is extracted according to the slot position marks and input into an automatic question-answering model for question-answering search, yielding answer data associated with the words corresponding to the slot positions;
- the content of the second slot position text is adjusted based on the answer data and combined with the first adjusted text to form a final expression text.

The final expression text generated by the technical solution of this embodiment is semantically correct and logically consistent, which improves the accuracy and logical coherence of text generation.

Referring to fig. 2, a second embodiment of the text generation method according to the embodiment of the present invention includes:

201, acquiring an original text to be edited, and performing semantic recognition on the sentences in it to obtain first content and second content, wherein the first content is a set of sentences with clear semantics and the second content is a set of sentences with unclear semantics;

202, marking, based on the preset correspondence between words and slot positions, the words of the sentences in the first content and the second content with slot positions, respectively, to generate a first slot position text and a second slot position text;

The words of the sentences in the first content and the second content are marked with slot positions according to the preset correspondence between words and slot positions, one word corresponding to one slot position. After marking, the original first content and second content are converted into a first slot position text and a second slot position text.

203, performing grammatical analysis on the content of the first slot position text to obtain a first word to be adjusted in the first slot position text, wherein the first word to be adjusted is a word that does not conform to customary usage;

Grammatical analysis is used to identify the words in the sentences of the first slot position text that do not conform to customary expression. The recognition combines sentence semantics and context with the expressions that people in each field customarily use when writing text. Words that deviate from these customary expressions are collected as first words to be adjusted.

The habitual expressions of each field are derived with existing big-data analysis tools. Texts written by people in the field are analyzed and subjected to probability statistics, and the words that conform to habitual expression are screened out. The results are compiled into callable analysis data, against which the habitual expression of a text can be identified and analyzed.
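
The patent does not specify which statistics are used. One simple stand-in, assumed here, is to count adjacent word pairs (bigrams) in a tokenized domain corpus. A word is then flagged as a candidate first word to be adjusted when its pair with the preceding word occurs fewer than `min_count` times. The corpus, the threshold and the function names are hypothetical.

```python
from collections import Counter


def build_bigram_counts(corpus):
    """Count adjacent word pairs in a tokenized domain corpus."""
    counts = Counter()
    for tokens in corpus:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def find_unhabitual_words(tokens, bigram_counts, min_count=1):
    """Return (position, word) for words whose left bigram is rarer than min_count."""
    flagged = []
    for i in range(1, len(tokens)):
        if bigram_counts[(tokens[i - 1], tokens[i])] < min_count:
            flagged.append((i, tokens[i]))
    return flagged


domain_corpus = [
    ["patients", "take", "medicine"],
    ["patients", "take", "tablets"],
    ["doctors", "prescribe", "medicine"],
]
counts = build_bigram_counts(domain_corpus)
flagged = find_unhabitual_words(["doctors", "take", "medicine"], counts)
```

With only the left neighbour checked, the word that follows an unusual word can also be flagged. In practice, a richer language model would replace these raw counts.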

204, searching a preset knowledge graph library, according to the first word to be adjusted, for words that have the same part of speech as the first word to be adjusted and conform to habitual expression, to obtain a first adjustment word;

After the first word to be adjusted is obtained, the language model searches its configured knowledge graph library for words that have the same part of speech as the first word to be adjusted and conform to habitual expression. Parts of speech are word classes determined mainly by grammatical features (including syntactic function and morphological change), with word meaning also taken into account. Modern Chinese words can be divided into 13 parts of speech, and the knowledge graph library classifies its words accordingly. The word to be adjusted is input into the language model, which retrieves words with the same part of speech. Among the retrieved words, the one that conforms to habitual expression is selected as the first adjustment word.

205, replacing the corresponding first word to be adjusted in the first slot position text with the first adjustment word to generate a first adjusted text;

After the first adjustment word is obtained, the corresponding first word to be adjusted is located in the first slot position text and replaced with the first adjustment word, which generates the first adjusted text.
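
Steps 204 and 205 can be sketched together under several assumptions. The knowledge graph library is modelled as a dictionary from a word to its related words, and a separate dictionary gives each word's part of speech. "Conforms to habitual expression" is approximated by how often a candidate co-occurs with its neighbours in a domain corpus. All data and names here are hypothetical.

```python
from collections import Counter


def habit_score(tokens, i, word, bigram_counts):
    """Sum of corpus counts of `word` paired with its left and right neighbours."""
    left = bigram_counts[(tokens[i - 1], word)] if i > 0 else 0
    right = bigram_counts[(word, tokens[i + 1])] if i + 1 < len(tokens) else 0
    return left + right


def choose_adjustment_word(tokens, i, graph, pos_of, bigram_counts):
    """Step 204 sketch: the same-POS candidate from the graph that best fits the context."""
    target = tokens[i]
    candidates = [w for w in graph.get(target, []) if pos_of.get(w) == pos_of.get(target)]
    if not candidates:
        return None
    return max(candidates, key=lambda w: habit_score(tokens, i, w, bigram_counts))


def replace_word(tokens, i, new_word):
    """Step 205 sketch: return a copy of the sentence with slot i replaced."""
    adjusted = list(tokens)
    adjusted[i] = new_word
    return adjusted


graph = {"take": ["prescribe", "give", "prescription"]}  # hypothetical related words
pos_of = {"take": "v", "prescribe": "v", "give": "v", "prescription": "n"}
counts = Counter({("doctors", "prescribe"): 1, ("prescribe", "medicine"): 1})
sentence = ["doctors", "take", "medicine"]
best = choose_adjustment_word(sentence, 1, graph, pos_of, counts)
first_adjusted = replace_word(sentence, 1, best)
```

The noun "prescription" is excluded by the part-of-speech filter before scoring. This mirrors the requirement that the adjustment word share the part of speech of the word it replaces.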

206, extracting the context content corresponding to each slot in the second slot position text according to the slot marks, and inputting the context content into a preset automatic question-answering model, which performs question-answering search based on the context content to obtain answer data associated with the word corresponding to each slot, wherein the automatic question-answering model is a retrieval model constructed based on a knowledge graph of a specific field;

207, adjusting the content of the second slot position text based on the answer data, and merging the second slot position text with the first adjusted text to obtain a final expression text of the original text.

After the automatic question-answering model retrieves the relevant data, words that are illogical or do not conform to habitual expression are adjusted and replaced on the basis of those data. The adjusted data replace the related slot content in the second slot position text, which is then merged with the first adjusted text to obtain the final expression text.

In this embodiment, the words corresponding to each slot in the first slot position text are checked, and the words that do not conform to habitual expression are selected. The language model replaces those words to obtain the first adjusted text. The adjusted second slot position text is then merged with the first adjusted text to generate the final expression text. The technical solution of this embodiment can generate expression text with correct grammar and reasonable logic, improving the accuracy and logical coherence of text generation.

Referring to fig. 3, a third embodiment of the text generation method according to the embodiment of the present invention includes:

301, acquiring an original text to be edited, and performing semantic recognition on sentences in the original text to obtain first content and second content, wherein the first content is a sentence set with definite semantics, and the second content is a sentence set with indefinite semantics;

302, based on the preset corresponding relation between the words and the slot positions, respectively marking the words of sentences in the first content and the second content to generate a first slot position text and a second slot position text;

303, calling a preset language model, adjusting words corresponding to each slot position in the first slot position text, and generating a first adjusted text;

After the first slot position text is obtained, a preset language model is called to adjust the words corresponding to each slot. This check mainly determines whether the words in the first slot position text conform to habitual expression in fields such as daily life, study and research. The language model then searches for words that have the same part of speech and conform to habitual expression, and uses them to replace the words that need adjustment. This generates the first adjusted text.

304, extracting context content corresponding to each slot position in the second slot position text according to the slot position mark, and extracting keywords from the context content corresponding to each slot position to obtain at least one keyword;

The second slot position text is adjusted mainly by correcting its illogical words, so in the second slot position text the slot marks identify those illogical words. The context content corresponding to each slot is extracted from the second slot position text according to the slot marks. Keywords are then extracted from that context content to obtain at least one keyword or piece of keyword information.

Keywords and keyword information can be extracted with a keyword extraction algorithm, for example one based on statistical features, on a word graph model, or on a topic model. Methods based on statistical features extract a document's keywords from the statistical information of its words. Methods based on a word graph model first build a language network graph of the document, then analyze the graph to find the words or phrases that play important roles in it; these are the document's keywords. Methods based on a topic model extract keywords from the topic distribution properties of the model. Keyword extraction algorithms are prior art and are not described further here.
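
As an illustration of the statistical-feature approach, the sketch below ranks the words of a slot's context by TF-IDF against a small reference corpus. The smoothed IDF formula, the `top_k` default and the toy corpus are choices made for this example, not details taken from the patent. A word-graph method such as TextRank, or a topic-model method, could sit behind the same function signature instead.

```python
import math
from collections import Counter


def tfidf_keywords(doc, corpus, top_k=2):
    """Return the top_k words of `doc` ranked by TF-IDF against `corpus`.

    doc: list of tokens; corpus: list of token lists used for document frequency.
    """
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF, never zero
        scores[word] = (count / len(doc)) * idf
    # Sort by score (descending), then alphabetically for stable ties.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


corpus = [
    ["the", "drug", "lowers", "pressure"],
    ["the", "patient", "takes", "the", "drug"],
    ["the", "weather", "is", "mild"],
]
context = ["the", "drug", "lowers", "blood", "pressure", "the", "drug"]
keywords = tfidf_keywords(context, corpus)
```

The frequent but uninformative word "the" appears in every reference document, so its IDF stays at the minimum of 1. "blood" appears twice as rarely as "drug" within the context, but its absence from the corpus lifts it into the top two.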

305, taking at least one keyword as a question index, and inputting the keyword into an automatic question-answering model for retrieving question-answering to obtain answer data;

After at least one keyword is extracted, it is used as a question index and input into the automatic question-answering model for question-answering retrieval. On receiving the question index, the automatic question-answering model uses its preset knowledge graph to search for question-answering data related to that index, and outputs the relevant answer data.

The automatic question-answering model relies on a preset knowledge base (such as Freebase) that contains a large amount of prior knowledge, and uses these knowledge resources to automatically answer questions posed in natural language. The knowledge base is a knowledge graph. It takes knowledge as its basic unit and entities as its main carriers, forming a huge database of human understanding of countless things and facts in the real world. Knowledge (or facts) is generally represented as triples of the form <head entity, relation, tail entity>, where an entity can be anything, such as a person, a place, or a specific concept. For example, <she, changed, China> is a simple triple whose head and tail are entities stored in the knowledge base.

The automatic question-answering model performs entity recognition on the input question to extract the name of its main entity, and then maps that name to a specific entity in the knowledge base. Different entities can share the same name, so a name is not a unique identifier; only the entity's unique serial number (id) is. Mapping the name to a specific entity id therefore locates the entity and its related information in the knowledge base. The model then uses the words of the question other than the entity name to predict which relation in the knowledge base the question asks about. Once the entity and the relation are found, the corresponding triples are retrieved directly from the knowledge base, and the output triples are the answer data.
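
The retrieval pipeline described in the two paragraphs above (triples, entity recognition, linking a name to unique ids, relation prediction, triple lookup) can be sketched over a toy in-memory knowledge base. Everything here is hypothetical: the entity ids, the triples and the relation names. Relation prediction uses cue-word overlap as a stand-in for the learned relation classifier a real system would use. When two entities share a name, this sketch keeps only the ids that have a triple with the predicted relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str      # unique entity id
    relation: str
    tail: str      # entity id or literal value


# Hypothetical toy knowledge base. "apple" names two different entities,
# so names map to lists of ids and only ids are unique.
NAME_TO_IDS = {"apple": ["E1", "E2"], "cupertino": ["E3"]}
TRIPLES = [
    Triple("E1", "headquartered_in", "E3"),
    Triple("E2", "color", "red"),
]
# Cue words standing in for a learned relation classifier (assumption).
RELATION_CUES = {
    "headquartered_in": {"where", "headquartered", "based"},
    "color": {"color", "colour"},
}


def answer(question_tokens):
    """Entity recognition -> entity ids -> relation prediction -> triple lookup."""
    # 1. Entity recognition: first token that is a known entity name.
    mention = next((t for t in question_tokens if t in NAME_TO_IDS), None)
    if mention is None:
        return []
    # 2. Entity linking: the name may map to several ids (same-name entities).
    candidate_ids = set(NAME_TO_IDS[mention])
    # 3. Relation prediction from the words other than the entity name.
    rest = set(question_tokens) - {mention}
    relation = max(RELATION_CUES, key=lambda r: len(RELATION_CUES[r] & rest))
    if not RELATION_CUES[relation] & rest:
        return []
    # 4. Triple lookup; the predicted relation also disambiguates same-name ids.
    return [t for t in TRIPLES if t.head in candidate_ids and t.relation == relation]
```

For example, `answer(["where", "is", "apple", "headquartered"])` resolves "apple" to the company id `E1`, because only that id carries a `headquartered_in` triple.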

306, comparing the answer data with the words corresponding to the slot positions to obtain content data associated with the words corresponding to the slot positions;

After the relevant answer data are output, they are compared with the words corresponding to the slots. This step mainly screens the answer data and keeps only the data related to those words. During the comparison, an association-degree calculation tool can compute the degree of association between each word in the answer data and the slot.

The association-degree calculation tool first computes correlation coefficients. A correlation coefficient, however, only expresses the correlation between the reference sequence and the comparison sequence at a single moment (a single point of the sequences). To obtain the overall correlation between the two sequences, the correlation coefficients must be averaged over all moments; this average is the correlation degree.
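
The procedure above (pointwise coefficients between a reference and a comparison sequence, then their mean as the correlation degree) matches grey relational analysis. The patent does not name that method, so the sketch below assumes it. It uses the commonly chosen distinguishing coefficient ρ = 0.5 and assumes the sequences are already numeric, normalized and of equal length. How words are turned into such sequences is not specified in the source.

```python
def grey_relational_grades(reference, comparisons, rho=0.5):
    """Grey relational grade of each comparison sequence against the reference.

    xi_i(k) = (d_min + rho * d_max) / (|x0(k) - x_i(k)| + rho * d_max), where d_min
    and d_max are the smallest and largest absolute differences over all i and k;
    the grade r_i is the mean of xi_i(k) over k. Sequences must have equal length.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    grades = []
    for d in deltas:
        if d_max == 0:  # every sequence equals the reference
            coeffs = [1.0] * len(d)
        else:
            coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


grades = grey_relational_grades([1.0, 1.0, 1.0], [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
```

A comparison sequence identical to the reference gets a grade of 1.0. The answer-data items with the highest grades would then be the ones kept as content data associated with the slot word.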

307, based on the content data associated with the words corresponding to the slot positions, adjusting the content of the second slot position text, and combining the second slot position text with the first adjusted text to obtain a final expression text of the original text.

In this embodiment, keywords are extracted from the context content of the slots in the second slot position text and input into the automatic question-answering model as a question index to obtain answer data. The answer data are compared with the words corresponding to the slots to find the data related to each slot, the words in the slots are replaced with those data, and the final expression text is obtained. The technical solution of this embodiment ensures that logical expression text is generated and improves the logical coherence of text generation.

Referring to fig. 4, a fourth embodiment of the text generation method according to the embodiment of the present invention includes:

401, acquiring an original text to be edited, and performing semantic recognition on sentences in the original text to obtain first content and second content, wherein the first content is a sentence set with definite semantics, and the second content is a sentence set with indefinite semantics;

402, based on the preset corresponding relation between the words and the slot positions, respectively marking the words of sentences in the first content and the second content to generate a first slot position text and a second slot position text;

403, calling a preset language model, adjusting words corresponding to each slot position in the first slot position text, and generating a first adjusted text;

404, extracting the context content corresponding to each slot in the second slot position text according to the slot marks, and inputting the context content into a preset automatic question-answering model, which performs question-answering search based on the context content to obtain answer data associated with the word corresponding to each slot, wherein the automatic question-answering model is a retrieval model constructed based on a knowledge graph of a specific field;

405, calling a preset language model, and adjusting words in answer data to obtain adjusted data;

After the relevant answer data are obtained, they are input into the language model, which adjusts their words with its preset algorithm. The adjustment is mainly an adjustment of semantic expression. Words in the answer data that do not conform to habitual expression are screened out and adjusted by the language model to produce content that conforms to habitual expression, which yields the adjusted data.

406, replacing the slot position content associated with the second slot position text according to the adjusted data to generate a second adjusted text;

When the adjusted data are obtained, the associated slot content is located in the second slot position text and replaced with the adjusted data to generate the second adjusted text.

407, merging the second adjusted text with the first adjusted text to obtain a final expression text of the original text.

The second adjusted text and the first adjusted text are merged according to the association relationships between their sentences, and the resulting text is the final expression text of the original text.
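
Steps 406 and 407 can be sketched under one assumption. The patent only mentions "association relationships between sentences", and this sketch takes that to mean each sentence's original position. Because the first and second content split the sentences of the original text, merging reduces to putting the sentences back in index order. The function names and sample sentences are hypothetical.

```python
def replace_slots(words, adjusted):
    """Step 406 sketch: overwrite the slots listed in `adjusted` (slot index -> new text)."""
    return [adjusted.get(i, w) for i, w in enumerate(words)]


def merge_texts(first_adjusted, second_adjusted):
    """Step 407 sketch: merge {sentence_index: sentence} maps back into original order."""
    overlap = first_adjusted.keys() & second_adjusted.keys()
    if overlap:
        raise ValueError(f"sentence indices present in both texts: {sorted(overlap)}")
    merged = {**first_adjusted, **second_adjusted}
    return " ".join(merged[i] for i in sorted(merged))


first_adjusted = {0: "Doctors prescribe the medicine."}
second_words = ["It", "lowers", "sugar."]
second_adjusted = {1: " ".join(replace_slots(second_words, {2: "blood pressure."}))}
final_text = merge_texts(first_adjusted, second_adjusted)
```

The overlap check makes a sentence claimed by both texts an error instead of letting one silently overwrite the other.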

In this embodiment, the language model is called to adjust the retrieved answer data, and the adjusted data replace the associated content of the second slot position text to obtain a logical second adjusted text. This text is then merged with the first adjusted text to generate the final expression text. The technical solution of this embodiment can generate text that is logical, grammatically correct and semantically fluent, improving the logical coherence of text generation.

The text generation method in the embodiments of the present invention has been described above; a text generation apparatus in the embodiments of the present invention is described below. Referring to fig. 5, an embodiment of the text generation apparatus in the embodiments of the present invention includes:

a semantic identification module 501, configured to acquire an original text to be edited, and perform semantic identification on sentences in the original text to obtain first content and second content, where the first content is a sentence set with definite semantics, and the second content is a sentence set with indefinite semantics;

a slot position marking module 502, configured to mark slot positions of words of sentences in the first content and the second content respectively based on a preset corresponding relationship between the words and the slot positions, and generate a first slot position text and a second slot position text;

a word adjusting module 503, configured to invoke a preset language model, and adjust a word corresponding to each slot in the first slot text to generate a first adjusted text;

a retrieval module 504, configured to extract the context content corresponding to each slot in the second slot position text according to the slot marks, and input the context content into a preset automatic question-answering model, which performs question-answering search based on the context content to obtain answer data associated with the word corresponding to each slot, where the automatic question-answering model is a retrieval model constructed based on a knowledge graph of a specific field;

and a text merging module 505, configured to adjust the content of the second slot text based on the answer data, and merge the second slot text with the first adjusted text to obtain a final expression text of the original text.

In this embodiment of the invention, the text generation apparatus runs the text generation method in the following steps. It acquires the original text to be edited and performs semantic recognition to obtain the first content and the second content. It marks the words in the first content and the second content with slots, based on the preset correspondence between words and slots, to generate the first slot position text and the second slot position text. It calls the language model to adjust the words corresponding to each slot in the first slot position text, generating the first adjusted text. It extracts the context content corresponding to each slot in the second slot position text according to the slot marks, and inputs that content into the automatic question-answering model for question-answering search to obtain answer data associated with the words corresponding to the slots. Finally, it adjusts the content of the second slot position text based on the answer data and merges it with the first adjusted text to form the final expression text. The final expression text generated by the technical solution of this embodiment is semantically correct and logical, improving the accuracy and logical coherence of text generation.

Referring to fig. 6, another embodiment of the text generating apparatus according to the embodiment of the present invention includes:

In this embodiment, the word adjusting module 503 is specifically configured to:

In this embodiment, the retrieving module 504 includes:

a keyword extraction unit 5041, configured to perform keyword extraction on the context content to obtain at least one keyword;

a question-answer retrieval unit 5042, configured to use the at least one keyword as a question index, and input the question index into the automatic question-answer model to perform question-answer retrieval, so as to obtain answer data;

a data comparison unit 5043, configured to compare the answer data with the word corresponding to the slot, so as to obtain content data associated with the word corresponding to the slot.

In this embodiment, the data comparing unit 5043 is specifically configured to:

In this embodiment, the text merging module 505 includes:

a data adjusting unit 5051, configured to call a preset language model, and adjust words in the answer data to obtain adjusted data;

a data replacement unit 5052, configured to replace, according to the adjusted data, slot content associated with the second slot text, so as to generate a second adjusted text;

a text merging unit 5053, configured to merge the second adjusted text with the first adjusted text to obtain a final expression text of the original text.

In this embodiment, the data adjusting unit 5051 is specifically configured to:

Optionally, the text readjustment module 506 is specifically configured to:

In this embodiment, the apparatus can perform semantic recognition again on the generated expression text. It then uses the language model and the automatic question-answering model again to readjust any text content recognized as illogical or not conforming to habitual expression. This repeats until an expression text that is grammatically correct, conforms to habitual expression and is logical is finally generated, ensuring the accuracy and logical coherence of text generation.
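
The repeat-until-clean behaviour above can be sketched as a bounded loop. The `detect` and `fix` callables are hypothetical stand-ins for the semantic recognition step and for the combined language-model and question-answering adjustment. The `max_rounds` cap is an added assumption that guarantees termination if the adjustments never converge.

```python
def readjust(text, detect, fix, max_rounds=5):
    """Re-run recognition and adjustment until no issue remains or max_rounds is reached.

    If the cap is hit, the last adjusted text is returned and may still contain issues.
    """
    for _ in range(max_rounds):
        issues = detect(text)
        if not issues:
            return text
        text = fix(text, issues)
    return text


# Toy stand-ins (hypothetical) for the recognition step and the model-based fix.
NON_HABITUAL = {"take": "prescribe"}


def detect(text):
    return [w for w in text.split() if w in NON_HABITUAL]


def fix(text, issues):
    return " ".join(NON_HABITUAL.get(w, w) for w in text.split())


final_text = readjust("doctors take medicine", detect, fix)
```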

Fig. 5 and fig. 6 above describe the text generation apparatus in the embodiments of the present invention in detail from the perspective of modular functional entities. The text generation device in the embodiments of the present invention is described in detail below from the perspective of hardware processing.

Fig. 7 is a schematic structural diagram of a text generation device 700 according to an embodiment of the present invention. The text generation device 700 may vary considerably depending on configuration or performance. It may include one or more processors (central processing units, CPUs) 710, a memory 720, and one or more storage media 730 (e.g., one or more mass storage devices) storing applications 733 or data 732. The memory 720 and the storage medium 730 may provide transient or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the text generation device 700. Further, the processor 710 may be configured to communicate with the storage medium 730 and execute, on the text generation device 700, the series of instruction operations in the storage medium 730.

The text generation device 700 may also include one or more power supplies 740, one or more wired or wireless network interfaces 750, one or more input-output interfaces 760, and/or one or more operating systems 731, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so forth. Those skilled in the art will appreciate that the text generation device architecture shown in FIG. 7 does not constitute a limitation of the text generation device, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the text generation method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A text generation method, characterized in that the text generation method comprises:

2. The text generation method according to claim 1, wherein the invoking of the preset language model adjusts a word corresponding to each slot in the first slot text, and generating the first adjusted text comprises:

3. The text generation method according to claim 1, wherein the extracting, according to the slot mark, the context content corresponding to each slot in the second slot text, and inputting the context content into a preset automatic question and answer model, wherein the automatic question and answer model performs question and answer search processing based on the context content, and obtaining answer data associated with the word corresponding to the slot includes:

extracting keywords from the context content to obtain at least one keyword;

4. The text generation method of claim 3, wherein the comparing the retrieved question and answer data with the words corresponding to the slots to obtain content data associated with the words corresponding to the slots comprises:

5. The text generation method according to any one of claims 1 to 4, wherein the adjusting the content of the second slot text based on the answer data and combining the second slot text with the first adjusted text to obtain the final expression text of the original text comprises:

6. The text generation method according to claim 5, wherein the invoking a preset language model adjusts words in the answer data, and obtaining adjusted data includes:

7. The text generation method according to claim 6, wherein after the adjusting the content of the second slot text based on the answer data and combining with the first adjusted text to obtain the final expression text of the original text, the method further comprises:

8. A text generation apparatus, characterized in that the text generation apparatus comprises:

the slot position marking module is used for respectively marking the slot positions of the words of the sentences in the first content and the second content based on the corresponding relation between the preset words and the slot positions to generate a first slot position text and a second slot position text;

9. A text generation device, characterized in that the text generation device comprises:

a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;

the at least one processor invokes the instructions in the memory to cause the text generation apparatus to perform the text generation method of any of claims 1-7.

10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement a text generation method as recited in any of claims 1-7.