WO2020020084A1 - Text generation method, apparatus and device

Info

Publication number: WO2020020084A1
Application number: PCT/CN2019/096894
Authority: WO
Inventors: 沈力行; 陈展
Original assignee: 杭州海康威视数字技术股份有限公司
Priority date: 2018-07-27
Filing date: 2019-07-19
Publication date: 2020-01-30
Also published as: CN110852084A; CN110852084B

Abstract

The embodiments of the present application provide a text generation method, an apparatus and a device. Said method comprises: for each module in a fixed writing format of a text to be generated, acquiring, from a preset database, a plurality of valid texts conforming to demand information of the module; inputting, for each module, the plurality of valid texts of the module into a pre-trained first recurrent neural network respectively, so as to obtain a first feature vector of each valid text; inputting, for each module, the first feature vector of each valid text into a pre-trained memory network, so as to obtain first position information, in a filling text of the module, of the segmented words in each valid text, and arranging the segmented words in each valid text according to the first position information to obtain the filling text; and arranging the filling text of each module according to the fixed writing format of the text to be generated, so as to obtain the text to be generated. Thus, a text to be generated that conforms to the natural language expression structure is obtained.

Description

Text generation method, apparatus and device

This application claims priority to Chinese patent application No. 201810846953.8, filed with the Chinese Patent Office on July 27, 2018 and entitled "Text generation method, apparatus and device", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the technical field of natural language processing, and in particular, to a text generation method, apparatus, and device.

Background

Natural language is the language that people use every day, and natural language processing technology enables humans and computers to communicate in natural language. It is widely used to generate text that has a fixed writing format, satisfies specified requirement information, and is expressed in natural language. For example, for each module in the fixed writing format of the text to be generated, natural language processing technology is used to determine, from a database, valid text that meets the text requirement information of the module. The determined valid text is then filled directly into the module to obtain the filled text of the module, and the filled texts of the modules are arranged according to the fixed writing format to obtain the text to be generated. The filled text of each module in the fixed writing format usually includes structured text, in which the words or sentences have a fixed structure, and/or unstructured text, in which the sentences have no fixed structure. For example, the modules in the fixed writing format of a hot news item are a "Title" module, a "Release Date" module, and a "Body" module. The filled text of the "Title" and "Release Date" modules is structured text, and the filled text of the "Body" module is unstructured text.

In the above natural language processing technology, valid text is filled directly into the module without considering the expression structure of the result. For a module with unstructured text, the filled text is therefore likely to be a mechanical combination of multiple valid texts that does not conform to the natural language expression structure, so the text to be generated that is assembled from such filled text does not conform to the natural language expression structure either. Take the "Body" module of the hot news item above as an example, with the text requirement information "2018 World Cup". The valid texts determined from the database for the "Body" module include: "The World Cup is held in Russia for the first time", "The 2018 World Cup is held in 12 stadiums in 11 cities in Russia", and "The competition will be held from June 14th to July 15th, 2018". Because the filled text of the "Body" module is unstructured text without a fixed structure, filling the valid texts directly into the module may produce the filled text "The competition will be held from June 14 to July 15, 2018, the 2018 World Cup will be held in 12 stadiums in 11 cities in Russia, and the World Cup will be held in Russia for the first time." A filled text that conforms to the natural language expression structure would instead be "The 2018 World Cup matches will be held from June 14 to July 15, 2018, in 12 stadiums in 11 cities in Russia. This is the first time the World Cup has been held in Russia."

It can be seen that, for a module with unstructured text, filling valid text directly into the module yields a text to be generated whose text structure does not conform to the natural language expression structure.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a text generation method, apparatus, and device that generate text conforming to the natural language expression structure. The specific technical solutions are as follows:

In a first aspect, an embodiment of the present application provides a text generating method, which includes:

For each module in the fixed writing format of the text to be generated, a plurality of valid texts that meet the module's requirement information are obtained from a preset database, and the requirement information is used to indicate the text content corresponding to the module;

For each module, input the multiple valid texts of the module into a first recurrent neural network trained in advance to obtain a first feature vector of each valid text of the module, where the first recurrent neural network is obtained by training with multiple pre-collected sample valid texts that meet specified requirement information;

For each module, input the first feature vector of each valid text of the module into a memory network trained in advance to obtain first position information, in the filled text of the module, of each segmented word in each valid text of the module. The text structure of the filled text is the same as the text structure of the first sample text used to train the memory network. The first sample text is a text that conforms to the natural language expression structure and meets the specified requirement information, and the memory network is obtained by training with multiple pre-collected first sample texts;

For each module, arrange the segmented words in each valid text of the module according to the obtained first position information to obtain the filled text of the module;

According to the fixed writing format of the text to be generated, the filling text of each module is arranged to obtain the text to be generated.

In a second aspect, an embodiment of the present application provides a text generating device, where the device includes:

A text acquisition module, configured to obtain, for each module in the fixed writing format of the text to be generated, multiple valid texts that meet the module's requirement information from a preset database, where the requirement information is used to indicate the text content corresponding to the module;

A feature extraction module, configured to input, for each module, the multiple valid texts of the module into a first recurrent neural network trained in advance to obtain a first feature vector of each valid text of the module, where the first recurrent neural network is obtained by training with multiple pre-collected sample valid texts that meet specified requirement information;

A position information determining module, configured to input, for each module, the first feature vector of each valid text of the module into a memory network trained in advance to obtain first position information, in the filled text of the module, of each segmented word in each valid text of the module. The text structure of the filled text is the same as the text structure of the first sample text used to train the memory network, the first sample text is a text that conforms to the natural language expression structure and meets the specified requirement information, and the memory network is obtained by training with multiple pre-collected first sample texts;

A text generating module, configured to arrange, for each module, the segmented words in each valid text of the module according to the obtained first position information to obtain the filled text of the module, and to arrange the filled text of each module according to the fixed writing format of the text to be generated to obtain the text to be generated.

In a third aspect, an embodiment of the present application provides a computer device, where the device includes:

A processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the bus; the memory is used to store a computer program; and the processor is used to execute the program stored in the memory to implement the steps of the text generation method provided in the first aspect above.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. A computer program is stored in the storage medium. When the computer program is executed by a processor, the steps of the text generation method provided in the first aspect are implemented.

In the text generation method, apparatus, and device provided in the embodiments of the present application, for each module, the memory network is obtained by training with a plurality of pre-collected first sample texts, and each first sample text is a sample that conforms to the natural language content structure and meets the requirement information of the module. Therefore, the first position information in the filled text, obtained from the memory network for each segmented word in the valid text, is the same as the position information of each segmented word in the first sample text. On this basis, the segmented words in the valid text are arranged according to the first position information, so the text structure of the resulting filled text is the same as that of the first sample text and likewise conforms to the natural language expression structure. This ensures that when the filled texts of the modules are arranged according to the fixed writing format of the text to be generated, the resulting text conforms to the natural language expression structure.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings used in the embodiments and the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a text generation method according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a recurrent neural network in a text generation method according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a memory network in a text generation method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a memory network in a text generation method according to another embodiment of the present application;

FIG. 5 is a schematic flowchart of a text generation method according to another embodiment of the present application;

FIG. 6 is a schematic structural diagram of a convolutional neural network in a text generation method according to another embodiment of the present application;

FIG. 7 is a schematic structural diagram of a sequence labeling model in a text generation method according to another embodiment of the present application;

FIG. 8 is a schematic structural diagram of a text generating device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a text generating apparatus according to another embodiment of the present application;

FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.

A text generation method according to an embodiment of the present application is first introduced below.

The text generation method provided in the embodiment of the present application can be applied to a computer device capable of text generation. The device includes a desktop computer, a portable computer, an Internet television, a smart mobile terminal, a wearable smart terminal, and a server, etc., and is not limited herein. Any computer equipment that can implement the embodiments of the present application belongs to the protection scope of the embodiments of the present application.

As shown in FIG. 1, the flow of a text generation method according to an embodiment of the present application may include:

S101. For each module in the fixed writing format of the text to be generated, a plurality of valid texts that meet the requirement information of the module are obtained from a preset database, and the requirement information is used to indicate the text content corresponding to the module.

For the same text to be generated, the text of every module describes the same event. For example, the "Title" module and the "Text" module in a press release covering the start of the 2018 World Cup both describe the start of the 2018 World Cup. For each module of the text to be generated, the requirement information of the module indicates the module's corresponding text content and can also indicate the event described by the text to be generated to which the module belongs.

There are various ways to obtain multiple valid texts from the preset database that meet the requirements of the module. Exemplarily, a method of performing keyword matching may be used to obtain text containing the requirement information of the module from a preset database. Alternatively, as an example, the requirement information may be used as the text to be answered, and the position of the answer matching the text to be answered is obtained from a preset database by using a reading comprehension technique, and the answer at this position is used as the valid text. Any method for obtaining valid text can be used in this application, and this embodiment does not limit this.
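The keyword-matching option mentioned above can be sketched as follows. This is only an illustrative sketch: the in-memory list `PRESET_DATABASE`, the function name `get_valid_texts`, and the rule that every keyword must appear in the text are assumptions made for the example, not details given in the application.

```python
from typing import Dict, List

# Hypothetical preset database: a flat list of candidate texts.
PRESET_DATABASE: List[str] = [
    "The World Cup is held in Russia for the first time",
    "The 2018 World Cup is held in 12 stadiums in 11 cities in Russia",
    "The 2008 Olympics opened in Beijing",
]


def get_valid_texts(requirement: str, database: List[str]) -> List[str]:
    """Return the texts that contain every keyword of the requirement information."""
    keywords = requirement.lower().split()
    return [text for text in database
            if all(keyword in text.lower() for keyword in keywords)]


# Requirement information per module (module name and requirement are illustrative).
requirements: Dict[str, str] = {"Body": "World Cup"}

# S101: collect the valid texts of each module.
valid_texts = {module: get_valid_texts(req, PRESET_DATABASE)
               for module, req in requirements.items()}
```

Literal matching of this kind only finds texts that share surface words with the requirement information, which is the limitation that the reading-comprehension alternative is meant to address.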

S102. For each module, input the multiple valid texts of the module into a first recurrent neural network trained in advance to obtain a first feature vector of each valid text of the module. The first recurrent neural network is obtained by training with multiple pre-collected sample valid texts that meet specified requirement information.

For each module, the events described by the sample valid texts that meet the specified requirement information share characteristics with the event described by the module's requirement information. The specified requirement information may be the same as or similar to the module's requirement information and can be set according to the needs of the specific application. For example, when the requirement information is "2018 World Cup", the specified requirement information may be "2018 World Cup", "2008 Olympics", "2018 NBA", and so on. When the requirement information is "Spring Game", the specified requirement information may be "Spring Game", "Winter Game", "Indoor Game", and so on.

In addition, the recurrent neural network (RNN) may, for example, have the structure shown in FIG. 2. The current input of the neuron 202 in the hidden layer may include the output 2010 of the input layer 201 and the output 2020 of the neuron 202 at the previous moment. This allows the recurrent neural network to remember the output at the previous moment and use it to determine the output at the current moment, and then to obtain the feature vector output by the output layer 203. For text in which the segmented words are not isolated from one another, the current segmented word and the previous segmented word can therefore be used to predict the next one. For example, if the current segmented word is "hit" and the previous segmented word is "driving", the next segmented word is likely to be "hurt". When extracting the feature vector of a valid text, a recurrent neural network can be used so that the extracted features include not only the features of individual segmented words but also the relationships between the segmented words in the text. On this basis, the first recurrent neural network in S102, obtained by training with a plurality of pre-collected sample valid texts that meet the specified requirement information, establishes a mapping between valid texts and feature vectors. This ensures that the obtained first feature vector reflects the semantic features of the valid text as a whole rather than only the features of the individual segmented words in it.

In addition, it can be understood that the recurrent neural network in any embodiment of the present application is similar to the first recurrent neural network in S102 above. The difference is that, in order to extract feature vectors from different input texts, the samples used to train the different recurrent neural networks are different.
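The recurrence described above, in which the hidden neuron receives both the current input and its own output from the previous moment, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the tanh activation, the one-hot word vectors, the hidden size, and the random untrained weights are not specified in the application, and the final hidden state is simply taken as the feature vector.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def rnn_feature_vector(inputs: List[Vector], w_in: Matrix,
                       w_rec: Matrix, bias: Vector) -> Vector:
    """Run a simple recurrent layer and return the last hidden state."""
    hidden = [0.0] * len(bias)
    for x in inputs:
        # Each new hidden value depends on the current input and on the
        # hidden state produced at the previous moment.
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(w_in[j], x))
                      + sum(w * h for w, h in zip(w_rec[j], hidden))
                      + bias[j])
            for j in range(len(bias))
        ]
    return hidden


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
HIDDEN_SIZE = 4  # assumed size, chosen only for the example
VOCAB = ["Xiao Ming", "flies", "kite"]
one_hot = {word: [1.0 if i == j else 0.0 for j in range(len(VOCAB))]
           for i, word in enumerate(VOCAB)}

# Untrained placeholder weights; a real network would learn these from samples.
w_in = random_matrix(HIDDEN_SIZE, len(VOCAB), rng)
w_rec = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rng)
bias = [0.0] * HIDDEN_SIZE

feature = rnn_feature_vector(
    [one_hot[w] for w in ["Xiao Ming", "flies", "kite"]], w_in, w_rec, bias)
reordered = rnn_feature_vector(
    [one_hot[w] for w in ["kite", "flies", "Xiao Ming"]], w_in, w_rec, bias)
```

Because each step feeds the previous hidden state back in, the same words in a different order give a different vector, which is the sense in which the feature vector reflects relationships between words and not just the words themselves.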

S103. For each module, input the first feature vector of each valid text of the module into a memory network trained in advance to obtain first position information, in the filled text of the module, of each segmented word in each valid text of the module. The text structure of the filled text is the same as the text structure of the first sample text used to train the memory network. The first sample text is a text that conforms to the natural language expression structure and meets the requirement information of the module. The memory network is obtained by training with a plurality of pre-collected first sample texts.

This takes into account a characteristic of the fixed writing format of the text to be generated: the arrangement of the modules relates only to the fixed format of the text to be generated and does not involve the structure of the text within it. For example, when the "Title" module is arranged after the "Body" module and the text in both modules conforms to the natural language expression structure, the text to be generated merely has an abnormal format; its text does not fail to conform to the natural language expression structure. Therefore, to make the text to be generated conform to the natural language expression structure, it is necessary to ensure that the filled text of each module conforms to the natural language expression structure.

To this end, a plurality of pre-collected first sample texts can be used to train the memory network, where each first sample text is a sample that conforms to the natural language content structure and meets the module's requirement information. The first position information in the filled text, obtained from the memory network for each segmented word in the valid text, is therefore the same as the position information of each segmented word in the first sample text. This guarantees that when the segmented words in each valid text are arranged according to the first position information in the subsequent step S104, the arrangement of the segmented words in the resulting filled text matches the position information of the segmented words in the first sample text, so that the filled text conforms to the natural language content structure.

The memory network in this embodiment may specifically have a structure as shown in FIG. 3:

The input layer 301 is a first recurrent neural network with the same structure as the recurrent neural network in the embodiment of FIG. 2 of the present application. It is used to obtain the first feature vector and input it to the hidden layer. Details are not repeated here; see the description of the embodiment shown in FIG. 2 above.

The hidden layer 302 may specifically include a neuron 3020, a neuron 3021, and a neuron 3022. When determining the position of each segmented word in the text, the contextual relationship between the segmented words affects the position of each one, so the hidden layer 302 may adopt the structure of a recurrent neural network. In addition, the position of each segmented word is related to the characteristics of the entire text, so the historical state information 3023 of each neuron also needs to be saved and used as input to each neuron. For example, the input of neuron 3021 may include the output of neuron 3020 and the state information 3023. In this way, features associated with the historical state information stored in the memory network can be extracted from the input. For example, in step S103 above, the memory network is trained with a plurality of pre-collected first sample texts, so it stores historical state information of first sample texts that conform to the natural language content structure and meet the module's requirement information. When the memory network is then used to determine the first position information in the filled text of each segmented word in the valid text, it can determine, based on the historical state information saved by each neuron, the start position 303 and the end position 304 of each segmented word within the features of the input valid text that indicate conformity to the natural language content structure.

For ease of understanding, the embodiments of the present application use segmented words as the unit of valid text in the description. In specific applications, the valid text is not limited to segmented words and may also include sentences and paragraphs.

Exemplarily, the first sample text for the requirement information "Spring Game" is "Xiao Hong kicks shuttlecock". In the first sample text, the segmented word "Xiao Hong" is in the first position, "kicks" is in the second position, and "shuttlecock" is in the third position. The memory network trained with this first sample text can then, from the feature vector input to the network, determine the position of the segmented word "Xiao Ming", which has the same characteristics as "Xiao Hong", as the first position; the position of the segmented word "flies", which has the same characteristics as "kicks", as the second position; and the position of the segmented word "kite", which has the same characteristics as "shuttlecock", as the third position.

S104. For each module, arrange the segmented words in each valid text of the module according to the obtained first position information to obtain the filled text of the module.

Because the first position information determined in step S103 is the same as the position of each segmented word in the first sample text, arranging the segmented words in the valid text according to the first position information in step S104 yields a filled text whose text structure is the same as that of the first sample text and conforms to the natural language expression structure. For example, based on the first position information of the segmented words "Xiao Ming", "flies", and "kite" corresponding to the first feature vector, the filled text "Xiao Ming flies kite" is obtained. Determining the first position information makes it possible to generate filled text that conforms to the natural language expression structure, and avoids the filled text that results from filling valid text directly into the module and does not conform to natural language expression habits, such as "Kite flies Xiao Ming".
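Once the first position information is available, the arrangement in S104 amounts to ordering the segmented words by position. The sketch below assumes the memory network output is already given as (word, position) pairs. The pairs are hard-coded stand-ins, and the function name `arrange_by_position` and the space-joined output are illustrative choices rather than details from the application.

```python
from typing import List, Tuple


def arrange_by_position(words_with_positions: List[Tuple[str, int]]) -> str:
    """Order segmented words by their first position information and join them."""
    ordered = sorted(words_with_positions, key=lambda pair: pair[1])
    return " ".join(word for word, _ in ordered)


# Stand-in for the memory network output: each segmented word of the valid
# text paired with its predicted position in the filled text.
predicted = [("kite", 3), ("Xiao Ming", 1), ("flies", 2)]

filled_text = arrange_by_position(predicted)
print(filled_text)  # Xiao Ming flies kite
```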

S105. According to the fixed writing format of the text to be generated, arrange the filled text of each module to obtain the text to be generated.

The fixed writing format of the text to be generated may include an arrangement rule for the modules. Identification information of each module is used to distinguish the modules, and the filled text of each module is then arranged using the identification information of the module. For example, if the fixed writing format of the text to be generated specifies that the "theme" module M1 is arranged before the "body" module M2, the filled text of the module with identification information M1 is arranged before the filled text of the module with identification information M2.

In the text generation method provided in the embodiment of the present application, for each module, the memory network is obtained by training with a plurality of pre-collected first sample texts, and each first sample text is a sample that conforms to the natural language content structure and meets the requirement information of the module. Therefore, the first position information in the filled text, obtained from the memory network for each segmented word in the valid text, is the same as the position information of each segmented word in the first sample text. On this basis, the segmented words in the valid text are arranged according to the first position information, so the text structure of the resulting filled text is the same as that of the first sample text and likewise conforms to the natural language expression structure. This ensures that when the filled texts of the modules are arranged according to the fixed writing format of the text to be generated, the resulting text conforms to the natural language expression structure.

Optionally, after step S101 in the embodiment shown in FIG. 1 of the present application, the text generating method provided in the embodiment of the present application may further include the following steps:

For each module, the first identification information of the module is marked for each valid text of the module.

The first identification information is preset information used to indicate the uniqueness of each module.

Correspondingly, step S105 in the embodiment shown in FIG. 1 of the present application may specifically include:

For each module, determine, according to a preset correspondence between the first identification information and the module position, the sixth position information of the filled text of the module in the text to be generated, where the preset correspondence between the first identification information and the module position represents the fixed writing format of the text to be generated;

Each filled text is arranged according to the sixth position information to obtain the text to be generated.

In order to obtain the text to be generated, the filled text of each module needs to be arranged according to the fixed writing format of the text to be generated to which the module belongs. Specifically, the fixed writing format of the text to be generated may be expressed in advance as a correspondence table or mapping (for example, key-value pairs) between the first identification information and the module position. According to this correspondence, the sixth position information of each filled text in the text to be generated can be determined, and each filled text is arranged according to the sixth position information, so that the resulting text conforms to the fixed writing format of the text to be generated.

Exemplarily, the fixed writing format for generating hot news includes ["Title" module, "Release time" module, "Body" module]. The filled text of the "Title" module, "2018 World Cup", is marked with the first identification information a1; the filled text of the "Release time" module, "June 14, 2018", is marked with the first identification information a2; and the filled text of the "Body" module, "The 2018 World Cup matches will start on June 14, 2018 and will run until July 15 in 12 stadiums in 11 cities in Russia. This is the first time the World Cup has been held in Russia.", is marked with the first identification information a3. According to the preset correspondence between the first identification information and the module position [a1 corresponds to position 01, a2 corresponds to position 02, and a3 corresponds to position 03], the sixth position information in the text to be generated is determined to be position 01 for the filled text of the "Title" module, position 02 for the filled text of the "Release time" module, and position 03 for the filled text of the "Body" module. Each filled text is then arranged according to the sixth position information to obtain the text to be generated [Title: 2018 World Cup starts; Release time: June 14, 2018; Body: 2018 World Cup matches start on June 14, 2018, and will continue until July 15, held in 12 stadiums in 11 cities in Russia, this is the first time the World Cup was held in Russia].
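The key-value correspondence between first identification information and module position can be sketched as a dictionary lookup followed by a sort. The names `ID_TO_POSITION` and `assemble`, the newline separator, and the shortened body text are assumptions for illustration only.

```python
from typing import Dict

# Preset correspondence between first identification information and module
# position; it encodes the fixed writing format (values follow the a1/a2/a3 example).
ID_TO_POSITION: Dict[str, int] = {"a1": 1, "a2": 2, "a3": 3}


def assemble(filled_texts: Dict[str, str], id_to_position: Dict[str, int]) -> str:
    """Arrange the filled texts by the position looked up for each identifier."""
    ordered_ids = sorted(filled_texts, key=lambda ident: id_to_position[ident])
    return "\n".join(filled_texts[ident] for ident in ordered_ids)


# Filled texts keyed by first identification information, in arbitrary order.
filled = {
    "a3": "Body: 2018 World Cup matches start on June 14, 2018.",
    "a1": "Title: 2018 World Cup starts",
    "a2": "Release time: June 14, 2018",
}

document = assemble(filled, ID_TO_POSITION)
```

Keeping the format in a mapping means a different fixed writing format only needs a different table, not different arrangement code.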

When obtaining text that meets the requirement information, using the requirement information as a keyword may yield text content that is neither accurate nor rich enough. To avoid this problem, the requirement information can be used as the text to be answered, and the text that meets the requirement information can be used as the answer to the text to be answered. Valid text is then obtained at the semantic level of the requirement information, which avoids the inaccurate and insufficiently rich results of matching only at the literal text level.

Therefore, optionally, step S101 in the embodiment shown in FIG. 1 of the present application may specifically include the following steps 1 to 5:

Step 1: For each module in the fixed writing format of the text to be generated, obtain a plurality of complete texts from the preset database that meet the events described by the text to be generated, as the backup text of the module.

For the same text to be generated, the complete text of each module is used to describe the same event. For example, the "Title" module and "Body" module in the press release covering the 2018 World Cup start all describe the 2018 World Cup start. For each module of the text to be generated, while the requirement information of each module indicates the complete text of the module itself, it can also indicate that the module belongs to the same event described by the text to be generated. On this basis, in order to ensure that the obtained valid text describes the same event and can obtain rich valid text, multiple complete texts in the preset database that match the events described by the text to be generated can be used as backup text for each module. Of course, although each module uses the same multiple complete texts, different modules have different requirements information, so the valid text of different requirements information is different texts in multiple complete texts, so it will not cause the problem of duplicate content . For example, for a civil indictment to be generated, the valid text of the "Party Natural Situations" module is the party's information text in the case data, and the valid text of the "Cause" module is the text of the lawsuit request in the case data.

Step 2: For each module, input each backup text of the module into a pre-trained second recurrent neural network to obtain a second feature vector of each backup text. The second recurrent neural network is obtained by training with a plurality of sample backup texts collected in advance.

Step 3: For each module, input the requirement information of the module into a pre-trained third recurrent neural network to obtain a third feature vector of the requirement information as the feature vector of the module. The third recurrent neural network is obtained by training with a plurality of pieces of sample requirement information of the module collected in advance.

In specific applications, when the requirement information is used as the text to be answered and the text that meets the requirement information is used as the answer, obtaining the valid text is equivalent to calculating the feature matching degree between each backup text and the requirement information. Therefore, for each module, it is necessary to obtain the second feature vector of each backup text of the module and the third feature vector of the requirement information of the module. In addition, the second recurrent neural network and the third recurrent neural network have the same structure as the recurrent neural network in the embodiment of FIG. 2 of the present application. The difference is that, in order to obtain the corresponding outputs for different inputs, the samples used to train each recurrent neural network are different. The same parts are not repeated here. For details, refer to the description of the embodiment shown in FIG. 2 above.
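As an illustration of what a "feature matching degree" between two feature vectors could look like, the following minimal sketch uses cosine similarity. This is an assumption made for illustration only: the application does not name a similarity measure, and in the described method the matching is performed by the fourth recurrent neural network in step 4. The function name `matching_degree` and the sample vectors are hypothetical.

```python
import math
from typing import Sequence


def matching_degree(second_vec: Sequence[float], third_vec: Sequence[float]) -> float:
    """Cosine similarity between a backup-text vector and a requirement vector.

    Assumption: cosine similarity stands in for the unspecified matching degree.
    """
    if len(second_vec) != len(third_vec):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(second_vec, third_vec))
    norm = math.sqrt(sum(a * a for a in second_vec)) * math.sqrt(
        sum(b * b for b in third_vec)
    )
    # A zero vector carries no direction, so treat it as "no match".
    return dot / norm if norm else 0.0


# Hypothetical feature vectors, not produced by any trained network.
backup_vectors = {"backup text 1": [0.9, 0.1, 0.0], "backup text 2": [0.0, 0.2, 0.9]}
requirement_vector = [1.0, 0.0, 0.0]
best = max(backup_vectors, key=lambda k: matching_degree(backup_vectors[k], requirement_vector))
```

Here `best` names the backup text whose vector points most nearly in the same direction as the requirement vector.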

Step 4: For each module, input the vector information corresponding to each backup text of the module into a pre-trained fourth recurrent neural network to obtain the second position information, in each backup text of the module, of the text that meets the requirement information of the module. The vector information corresponding to any backup text of the module includes the second feature vector of the backup text and the feature vector of the module. The fourth recurrent neural network is obtained by training with a plurality of pre-collected sample complete texts that are marked with third position information and describe the same event corresponding to the requirement information of the module. The third position information is the position information, in the sample complete text, of the text that meets the requirement information of the module.

Exemplarily, take one backup text and one sample complete text marked with the third position information. The sample complete text, which describes the event "Spring Game" described by the text to be generated, is "Spring is here, children can go out and play, Xiaohong went to kick the shuttlecock." The third position information marked in the sample complete text for the requirement information "Spring Game" includes the start position information of "Spring", namely the 1st position, and the end position information of "kick the shuttlecock", namely the 10th position. The fourth recurrent neural network is trained with this sample complete text. Then the vector information of the backup text "In spring, children can go out to play, and Xiaoming likes flying a kite", that is, its second feature vector together with the third feature vector of the requirement information "Spring Game" (the feature vector of the module), is input into the fourth recurrent neural network. This yields the second position information, in the backup text, of the text that meets the requirement information "Spring Game" of the module: the start position information of "Spring" is the 1st position, and the end position information of "flying a kite" is the 10th position.

Step 5: For each module, extract the text at the corresponding second position information from each backup text of the module, as valid text that meets the requirement information of the module.

After the second position information is obtained through the above step 4, for each module, the text at the corresponding second position information can be extracted from each backup text of the module as valid text that meets the requirement information of the module. Exemplarily, from the backup text "In spring, children can go out to play, and Xiaoming likes flying a kite", the text delimited by the second position information of that backup text, that is, from "Spring" at the 1st position to "flying a kite" at the 10th position, is extracted as the valid text that meets the requirement information "Spring Game" of the module.
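The extraction in step 5 can be sketched as a slice over the participles of a backup text. This is a minimal sketch under stated assumptions: positions are taken to be 1-based and inclusive (as the "1st position" and "10th position" wording suggests), and the segmentation and the span below are hypothetical stand-ins for the output of the fourth recurrent neural network.

```python
from typing import List, Tuple


def extract_valid_text(participles: List[str], span: Tuple[int, int]) -> List[str]:
    """Return the participles between a 1-based, inclusive start and end position."""
    start, end = span
    if not (1 <= start <= end <= len(participles)):
        raise ValueError(f"span {span} does not fit a text of {len(participles)} participles")
    return participles[start - 1:end]


# Hypothetical segmentation of a backup text; the application does not give one.
backup_participles = ["in", "spring", "children", "go", "out", "to", "play",
                      "Xiaoming", "flies", "a", "kite"]
second_position = (2, 4)  # assumed network output: start position 2, end position 4
valid_text = extract_valid_text(backup_participles, second_position)
```

Rejecting spans that fall outside the text keeps a bad prediction from silently producing an empty or truncated valid text.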

In specific applications, the same module may have multiple pieces of requirement information, in which case valid text needs to be obtained for each piece of requirement information of the module. In this regard, optionally, for each module in the fixed writing format of the text to be generated, when the module has multiple pieces of requirement information:

S101 in the embodiment shown in FIG. 1 in the foregoing application may specifically include the following steps:

For each module in the fixed writing format of the text to be generated, multiple valid texts that meet each piece of requirement information of the module are obtained from a preset database.

For example, in the fixed writing format of hot news to be generated, the requirement information of the "Body" module includes: requirement information Q1 "2018 World Cup holding time", requirement information Q2 "2018 World Cup holding place", and requirement information Q3 "special information for the 2018 World Cup". The multiple valid texts obtained from the preset database that meet each piece of requirement information of the "Body" module then include: the valid texts of requirement information Q1, namely A1 "The 2018 World Cup match starts on June 14, 2018" and A2 "The 2018 World Cup will continue until July 15"; the valid texts of requirement information Q2, namely A3 "in Russia" and A4 "held in 12 stadiums in 11 cities"; and the valid text of requirement information Q3, namely A5 "World Cup for the first time held in Russia".

Accordingly, S102 in the embodiment shown in FIG. 1 of the foregoing application may specifically include:

For each module, a plurality of valid texts of each requirement information of the module are respectively input into a first recurrent neural network trained in advance to obtain a first feature vector of each valid text.

Unlike S102 in the embodiment shown in FIG. 1, which processes the multiple valid texts corresponding to a single piece of requirement information of a module, this step processes the multiple valid texts corresponding to multiple pieces of requirement information of the same module.

Correspondingly, before S103 in the embodiment shown in FIG. 1 of the present application, the text generation method provided in the embodiment of the present application may further include:

For each module, each piece of requirement information of the module is input into a pre-trained third recurrent neural network to obtain a third feature vector of each piece of requirement information of the module. The third recurrent neural network is obtained by training with a plurality of pieces of sample requirement information of the module collected in advance.

In specific applications, if the same module has multiple pieces of requirement information, the valid texts corresponding to them need to be arranged according to their respective requirement information. Therefore, a feature vector of each piece of requirement information of the same module must be obtained for the subsequent determination of the position information of the multiple valid texts of the module.

Correspondingly, S103 in the embodiment shown in FIG. 1 of the present application may specifically include:

For each module, input the vector information corresponding to each piece of requirement information of the module into the pre-trained memory network, and obtain the valid text corresponding to each piece of requirement information of the module and the corresponding first position information. The first position information is the position information, in the filled text of the module, of each participle in the valid text. The vector information corresponding to any piece of requirement information of the module includes the first feature vector of each valid text corresponding to that requirement information and the third feature vector of that requirement information. The filled text is the text corresponding to the module, and its text structure is the same as the text structure of the first sample text, marked with fourth position information, that is used in training the memory network. The fourth position information is the position information, in the first sample text, of each text that meets the specified requirement information.

Because the first sample text is labeled with the fourth position information of each text that meets the specified requirement information, for each module, the vector information corresponding to each piece of requirement information of the module can be input into the pre-trained memory network to obtain the valid text corresponding to each piece of requirement information of the module and the corresponding first position information. The participles in each valid text are subsequently arranged according to the first position information, so the resulting filled text has the same structure as the first sample text. Since the first sample text conforms to natural language description habits, the filled text does too. In this embodiment, the third feature vector of the requirement information corresponding to each valid text is also input into the memory network, and the first sample text used to train the memory network is labeled with the fourth position information. This ensures that the determined positions of the participles in the valid texts are arranged according to the requirement information.

The memory network in this embodiment may specifically have a structure as shown in FIG. 4:

The memory network in this embodiment is similar to the memory network in the embodiment of FIG. 3 described above. The difference is that, to handle multiple pieces of requirement information, the memory network in this embodiment adds an input layer 401, which, for each module, obtains the third feature vector of each piece of requirement information of the module and inputs the third feature vector to the hidden layer. The recurrent neural network used to extract the third feature vector is not described again here; for details, refer to the description of the embodiment of FIG. 2 above. With the input layer 401, the neuron 406, and the historical state information 4033 corresponding to the neuron added, the output of the input layer 401 is used as an input of the neuron 4030 to obtain the valid text corresponding to each piece of requirement information of the module and the corresponding first position information. In addition, by adding the output of the neuron 406 to the output of the neuron 4032, the probability that the start position 404 and the cut-off position 405 of each output participle belong to different requirement information can be determined. Based on this probability, the positions of the participles in the determined valid texts are guaranteed to be arranged in correspondence with the requirement information.

In addition, the input layer 402, the hidden layer 403, the neuron 4030, the neuron 4031, the neuron 4032, the historical state information 4033, the start position 404, and the cut-off position 405 of each participle are the same as the input layer 301, the hidden layer 302, the neuron 3020, the neuron 3021, the neuron 3022, the historical state information 3023, the start position 303, and the cut-off position 304 of each participle in the memory network of the embodiment of FIG. 3 of this application, and are not repeated here. For details, see the description of the embodiment shown in FIG. 3.

Exemplarily, the first sample text, which meets the specified requirement information Q11 "2008 Olympic Games holding time", Q12 "2008 Olympic Games holding place", and Q13 "2008 Olympic Games special information" and is marked with the fourth position information, is: "The 2008 Olympic Games will be held in 6 cities in China from August 8 to August 24, 2008. This is the first time that the Olympic Games will be held in China." The fourth position information marked in the first sample text includes: the 4th and 6th positions, which are the position information of "August 8, 2008" and "August 24, 2008" that meet the specified requirement information Q11; the 8th and 9th positions, which are the position information of "in China" and "6 cities" that meet the specified requirement information Q12; and the 12th position, which is the position information of "the first time that the Olympic Games will be held in China" that meets the specified requirement information Q13. The valid texts A1 to A5 obtained above and the requirement information Q1 to Q3 of the "Body" module are input into the memory network, and the position information, in the filled text, of the valid text of each piece of requirement information is determined. Arranging the valid texts according to this position information yields filled text that has the same structure as the first sample text and conforms to natural language: "The 2018 World Cup will be held from June 14 to July 15, 2018 in 12 stadiums in 11 cities in Russia. This is the first time the World Cup has been held in Russia."
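The arrangement step in this example can be sketched as sorting text segments by their position in the filled text and joining them. This is a minimal sketch, not the memory network itself: the positions below are hypothetical values standing in for the network's output, and the segment boundaries are chosen for illustration.

```python
from typing import List, Tuple


def arrange_by_position(placed: List[Tuple[str, int]], sep: str = " ") -> str:
    """Order segments by their position in the filled text and join them."""
    positions = [position for _, position in placed]
    if len(set(positions)) != len(positions):
        raise ValueError("two segments claim the same position")
    ordered = sorted(placed, key=lambda item: item[1])
    return sep.join(segment for segment, _ in ordered)


# Hypothetical (segment, position) pairs; a trained memory network would supply them.
placed_segments = [
    ("in 12 stadiums in 11 cities in Russia", 3),
    ("The 2018 World Cup will be held", 1),
    ("from June 14 to July 15, 2018", 2),
]
filled_text = arrange_by_position(placed_segments)
```

The segments arrive in arbitrary order, and only the predicted positions determine the final wording order.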

In specific applications, in many texts to be generated with a fixed writing format, the complete text of some modules is likely to be structured text while the complete text of other modules is unstructured text. Structured type text has a fixed representation structure, so compared with unstructured type text, less of its information needs to be determined by a neural network, and a neural network usually occupies a large amount of computing resources. Therefore, in order to reduce the occupation of computing resources and improve the efficiency of text generation, the text type of each module can be determined, so that different text generation methods can be applied to modules with different text types in a targeted manner.

For this reason, as shown in FIG. 5, a flow of a text generation method according to another embodiment of the present application may include:

S501. For each module, input the requirement information of the module into a preset classification algorithm to obtain the text type of the filled text of the module, and the text type includes a structured type and an unstructured type. When the text type of the filled text of the module is an unstructured type, S502 to S505 are performed, and when the text type of the filled text of the module is a structured type, S506 to S508 are performed.

The preset classification algorithm may specifically be a support vector machine algorithm, a logistic regression algorithm, or a convolutional neural network trained in advance with a plurality of pieces of pre-collected sample requirement information corresponding to structured text and unstructured text. Alternatively, it can be judged whether the requirement information is preset information corresponding to a text type. For example, for a civil indictment to be generated, the preset information corresponding to the structured type is "the natural situation of the parties", "the respondent court", "payment", and "Attachment", and the preset information corresponding to the unstructured type is "suit request" and "facts and reasons". Any classification algorithm capable of determining the text type corresponding to a module based on the requirement information of the module can be used in this application, which is not limited in this embodiment.
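The preset-information variant of the classification can be sketched as a table lookup. This is a minimal sketch under stated assumptions: the names `TextType`, `PRESET_TYPES`, and `classify_requirement` are hypothetical, the matching is case-insensitive by choice, and falling back to the unstructured type for unknown requirement information is an assumption the application does not make.

```python
from enum import Enum


class TextType(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


# Preset information taken from the civil indictment example, keyed in lower case.
PRESET_TYPES = {
    "the natural situation of the parties": TextType.STRUCTURED,
    "the respondent court": TextType.STRUCTURED,
    "payment": TextType.STRUCTURED,
    "attachment": TextType.STRUCTURED,
    "suit request": TextType.UNSTRUCTURED,
    "facts and reasons": TextType.UNSTRUCTURED,
}


def classify_requirement(requirement: str,
                         default: TextType = TextType.UNSTRUCTURED) -> TextType:
    """Look up the text type of a module from its requirement information.

    Assumption: unknown requirement information falls back to `default`.
    """
    return PRESET_TYPES.get(requirement.strip().lower(), default)
```

A lookup of this kind needs no training data, which is why it suits formats whose modules are known in advance.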

When a convolutional neural network is used to determine the text type of the filled text, it may specifically have the structure shown in FIG. 6. The hidden layer of the neural network in this embodiment has two feature extraction channels. After the requirement information is input through the input layer 601, the channel 602 extracts local feature variables and the channel 603 extracts global feature variables, so that the extracted features reflect both the characteristics of each participle in the requirement information and the overall semantics of the participles. By synthesizing the local feature variables and the global feature variables, the probability that the requirement information belongs to each text type is output by the output layer 604, and based on the output probability, the text type of the filled text corresponding to the input requirement information is determined.

Structured type text includes words or sentences with a fixed expression structure. Unstructured type text includes sentences without a fixed expression structure. For example, the modules in the fixed writing format of a hot news item are a "Title" module, a "Release Date" module, and a "Body" module, where the text of the "Title" and "Release Date" modules is structured text and the text of the "Body" module is unstructured text.

S502: For each module in the fixed writing format of the text to be generated, obtain a plurality of valid texts from a preset database that meet the requirement information of the module, and the requirement information is used to indicate the text content corresponding to the module.

S503. For each module, input the multiple valid texts of the module into a pre-trained first recurrent neural network to obtain the first feature vector of each valid text. The first recurrent neural network is obtained by training with a plurality of pre-collected sample valid texts that meet the specified requirement information.

S504. For each module, input the first feature vector of each valid text of the module into a pre-trained memory network to obtain the first position information, in the filled text of the module, of each participle in each valid text of the module. The text structure of the filled text is the same as the text structure of the first sample text used in training the memory network, and the first sample text is a text that conforms to the natural language expression structure and meets the requirement information of the module. The memory network is obtained by training with a plurality of pre-collected first sample texts.

S505: For each module, arrange the participles in each valid text of the module according to the obtained first position information to obtain the filled text of the module.

Steps S502 to S505 are the same steps as S101 to S104 in the embodiment shown in FIG. 1 of this application, and are not repeated here. For details, refer to the description of the embodiment shown in FIG. 1 of this application.

S506. Input the multiple valid texts of the module into a pre-trained sequence labeling model to obtain the second identification information of each participle in each valid text. The sequence labeling model is obtained by training with a plurality of pre-collected second sample valid texts that are labeled in advance with second identification information and meet the requirement information of the module.

The second identification information is used to represent the uniqueness of each participle in the valid text. The sequence labeling model labels the input valid text with the second identification information, which is used in step S507 to determine the position information of each participle in the filled text. The sequence labeling model in this embodiment may specifically have the structure shown in FIG. 7. The valid text is input to the sequence labeling model through the input layer 701 in the form of a character string. After feature extraction by the hidden layer 702, the second identification information corresponding to each participle is determined, and each participle is labeled with its second identification information at the output layer 703. The participles in a text are associated with one another, and the context of a participle affects its semantics. Therefore, in this embodiment, each neuron in the hidden layer of the sequence labeling model is an LSTM (Long Short-Term Memory) network, a kind of RNN with a special structure. When LSTM networks serve as the neurons, information is exchanged between the neurons to extract a feature that reflects the overall semantics of the valid text, and based on this feature each participle of the valid text is labeled with second identification information.

S507: Determine, according to the second identification information, the fifth position information of each participle in each valid text in the filled text of the module by using a preset correspondence between the identification and the participle position information.

Wherein, the preset correspondence between the identifier and the segmentation position information may be a correspondence table between the identifier and the segmentation position information, and may also be a correspondence mapping (for example, key-value).

S508: Arrange the participles in each valid text according to the fifth position information of each participle in each valid text to obtain a filled text.

Exemplarily, the filled text corresponding to the "Title" module in the fixed writing format of hot news to be generated is structured type text. The valid text "2018 World Cup starts on June 14 in Russia" is input into the preset sequence labeling model to obtain the second identification information g1 of the participle "2018", the second identification information g2 of the participle "World Cup", and the second identification information g3 of the participle "kickoff". Using the preset correspondence between the identifier and the participle position information ["g1-position 1", "g2-position 2", "g3-position 3"], it is determined that the fifth position information of the participle "2018" is position 1, the fifth position information of the participle "World Cup" is position 2, and the fifth position information of the participle "kickoff" is position 3. The participles in each valid text are then arranged according to the fifth position information of each participle to obtain the filled text "2018 World Cup kickoff".
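Steps S507 and S508 can be sketched with a key-value mapping from identifier to position. This is a minimal sketch: the labeled participles stand in for the output of the sequence labeling model, and the assumptions that participles outside the mapping carry a hypothetical `"other"` identifier and are dropped are illustrative, since the application does not say how such participles are handled.

```python
from typing import Dict, List, Tuple


def arrange_structured(labeled: List[Tuple[str, str]],
                       id_to_position: Dict[str, int]) -> str:
    """Place each labeled participle at the position preset for its identifier.

    Assumption: participles whose identifier has no preset position are dropped.
    """
    placed = [(id_to_position[tag], word) for word, tag in labeled if tag in id_to_position]
    return " ".join(word for _, word in sorted(placed))


# Preset correspondence between identifier and participle position, as key-value pairs.
id_to_position = {"g1": 1, "g2": 2, "g3": 3}

# Hypothetical sequence labeling output as (participle, second identification information).
labeled_participles = [
    ("kickoff", "g3"),
    ("on June 14", "other"),
    ("2018", "g1"),
    ("in Russia", "other"),
    ("World Cup", "g2"),
]
title = arrange_structured(labeled_participles, id_to_position)
```

Because the positions come from a fixed table rather than a neural network, this path is cheap, which matches the motivation for treating structured modules separately.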

S509. According to the fixed writing format of the text to be generated, arrange the filled text of each module to obtain the text to be generated.

The above S509 is the same step as S105 in the embodiment of FIG. 1 of this application, which is not repeated here. For details, refer to the description of the embodiment of FIG. 1 in this application.
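The final arrangement in S509 can be sketched as emitting the filled text of each module in the order given by the fixed writing format. This is a minimal sketch: representing the format as an ordered list of module names and joining modules with newlines are assumptions, and the bracketed strings are placeholders rather than generated text.

```python
from typing import Dict, List


def assemble_text(filled_by_module: Dict[str, str], writing_format: List[str]) -> str:
    """Join the filled text of each module in the order of the fixed writing format."""
    missing = [name for name in writing_format if name not in filled_by_module]
    if missing:
        raise KeyError(f"no filled text for modules: {missing}")
    return "\n".join(filled_by_module[name] for name in writing_format)


# Module order of the hot news example; the dictionary order is deliberately different.
writing_format = ["Title", "Release Date", "Body"]
filled_by_module = {
    "Body": "[filled text of the Body module]",
    "Title": "2018 World Cup kickoff",
    "Release Date": "[filled text of the Release Date module]",
}
document = assemble_text(filled_by_module, writing_format)
```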

In the above-mentioned embodiment of FIG. 5 of the present application, structured type text has a fixed representation structure, so compared with unstructured type text, less of its information needs to be determined by a neural network, and a neural network usually occupies a large amount of computing resources. Therefore, determining the text type of each module and applying different text generation methods to modules with different text types reduces the occupation of computing resources and improves the efficiency of text generation.
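The branching of FIG. 5 can be sketched as a dispatcher that routes each module to one of two generation paths according to its text type. This is a minimal sketch: the three callables are hypothetical placeholders for the classification algorithm of S501, the S502 to S505 path, and the S506 to S508 path, and representing the text type as a plain string is an assumption.

```python
from typing import Callable, Dict

Path = Callable[[str], str]


def generate_filled_texts(requirements: Dict[str, str],
                          classify: Callable[[str], str],
                          unstructured_path: Path,
                          structured_path: Path) -> Dict[str, str]:
    """Route each module to the generation path that matches its text type."""
    filled: Dict[str, str] = {}
    for module, requirement in requirements.items():
        if classify(requirement) == "structured":
            filled[module] = structured_path(requirement)    # stands in for S506 to S508
        else:
            filled[module] = unstructured_path(requirement)  # stands in for S502 to S505
    return filled


# Stub callables used only to show the routing; real paths would run the trained models.
demo = generate_filled_texts(
    {"Title": "title requirement", "Body": "body requirement"},
    classify=lambda r: "structured" if r.startswith("title") else "unstructured",
    unstructured_path=lambda r: f"memory-network text for {r}",
    structured_path=lambda r: f"sequence-labeling text for {r}",
)
```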

Corresponding to the foregoing method embodiments, an embodiment of the present application further provides a text generating device.

As shown in FIG. 8, the structure of a text generating device according to an embodiment of the present application may include:

A text acquisition module 801, configured to, for each module in the fixed writing format of the text to be generated, obtain a plurality of valid texts from a preset database that meet the requirement information of the module, where the requirement information is used to indicate the text content corresponding to the module;

A feature extraction module 802, configured to, for each module, input the multiple valid texts of the module into a pre-trained first recurrent neural network to obtain a first feature vector of each valid text of the module, where the first recurrent neural network is obtained by training with a plurality of pre-collected sample valid texts that meet the specified requirement information;

A position information determining module 803, configured to, for each module, input the first feature vector of each valid text of the module into a pre-trained memory network to obtain the first position information, in the filled text, of each participle in each valid text of the module, where the text structure of the filled text is the same as the text structure of the first sample text used in training the memory network, the first sample text is a text that conforms to the natural language expression structure and meets the specified requirement information, and the memory network is obtained by training with a plurality of first sample texts collected in advance;

A text generating module 804, configured to, for each module, arrange the participles in each valid text of the module according to the obtained first position information to obtain the filled text, and to arrange the filled text of each module according to the fixed writing format of the text to be generated to obtain the text to be generated.

With the text generating device provided in the embodiment of the present application, for each module, the memory network used is obtained by training with a plurality of pre-collected first sample texts, and each first sample text is a text that conforms to the natural language expression structure and meets the requirement information of the module. Therefore, the first position information, in the filled text, of each participle in the valid text obtained from the memory network is the same as the position information of each participle in the first sample text. On this basis, the participles in the valid text are arranged according to the first position information, and the text structure of the obtained filled text is the same as that of the first sample text, so it also conforms to the natural language expression structure. Therefore, when the filled text of each module is arranged according to the fixed writing format of the text to be generated, the obtained text to be generated is also a text conforming to the natural language expression structure.

Optionally, the text generation module 804 is specifically configured to:

For each module, mark the first identification information of the module for each valid text of the module;

For each module, the sixth position information of the filled text of the module in the text to be generated is determined according to the preset correspondence between the first identification information and the module position, and the preset first identification information and the position of the module The correspondence relationship is used to represent a fixed writing format of the text to be generated;

Optionally, the text acquisition module 801 is specifically used for:

For each module in the fixed writing format of the text to be generated, obtaining a plurality of complete texts from the preset database that conform to the events described by the text to be generated, as the backup text of the module;

Correspondingly, the feature extraction module 802 is further configured to, for each module, input each backup text of the module into a pre-trained second recurrent neural network to obtain a second feature vector of each backup text, where the second recurrent neural network is obtained by training with a plurality of pre-collected sample backup texts; and to input the requirement information of the module into a pre-trained third recurrent neural network to obtain a third feature vector of the requirement information as the feature vector of the module, where the third recurrent neural network is obtained by training with a plurality of pieces of sample requirement information of the module collected in advance;

Correspondingly, the position information determining module 803 is further configured to, for each module, input the vector information corresponding to each backup text of the module into a pre-trained fourth recurrent neural network to obtain the second position information, in each backup text of the module, of the text that meets the requirement information of the module; where the vector information corresponding to any backup text of the module includes the second feature vector of the backup text and the feature vector of the module; the fourth recurrent neural network is obtained by training with a plurality of pre-collected sample complete texts that are marked with third position information and describe the same event corresponding to the specified requirement information; and the third position information is the position information, in the sample complete text, of the text that meets the requirement information of the module;

Correspondingly, the text acquisition module 801 is specifically configured to, for each module, extract the text at the corresponding second position information from each backup text of the module, as valid text that meets the requirement information of the module.

Optionally, for each module in the fixed writing format of the text to be generated, when the module has multiple pieces of requirement information:

Correspondingly, the text acquisition module 801 is specifically used for:

For each module in the fixed writing format of the text to be generated, obtain from a preset database a plurality of valid texts that meet each piece of requirement information of the module;

Correspondingly, the feature extraction module 802 is further configured to:

For each module, input multiple valid texts of each requirement information of the module into the first recurrent neural network trained in advance to obtain the first feature vector of each valid text;

For each module, input each piece of requirement information of the module into a pre-trained third recurrent neural network to obtain a third feature vector of each piece of requirement information of the module, where the third recurrent neural network is obtained by training with a plurality of pieces of sample requirement information of the module collected in advance;

Correspondingly, the location information determining module 803 is specifically configured to:

For each module, input the vector information corresponding to each piece of requirement information of the module into the pre-trained memory network, and obtain the valid text corresponding to each piece of requirement information of the module and the corresponding first position information; where the first position information is the position information, in the filled text of the module, of each participle in the valid text; the vector information corresponding to any piece of requirement information of the module includes the first feature vector of each valid text corresponding to that requirement information and the third feature vector of that requirement information; the text structure of the filled text is the same as the text structure of the first sample text, marked with fourth position information, that is used in training the memory network; and the fourth position information is the position information, in the first sample text, of each text that meets the requirement information.

As shown in FIG. 9, the structure of a text generating device according to another embodiment of the present application may include:

A text classification module 901 is configured to input the requirement information of the module into a preset classification algorithm for each module, and obtain a text type of the module's filled text, where the text type includes a structured type and an unstructured type;

A text acquisition module 902, configured to, for each module in the fixed writing format of the text to be generated, when the text type of the filled text of the module is the unstructured type, obtain from a preset database a plurality of valid texts that meet the requirement information of the module;

A feature extraction module 903, configured to, for each module, when the text type of the filled text of the module is the unstructured type, input the plurality of valid texts of the module into a pre-trained first recurrent neural network to obtain a first feature vector of each valid text;

A position information determining module 904, configured to, for each module, when the text type of the filled text of the module is the unstructured type, input the first feature vector of each valid text into a pre-trained memory network to obtain the first position information, in the filled text of the module, of each participle in each valid text;

The text acquisition module 902 is further configured to, for each module in the fixed writing format of the text to be generated, when the text type of the filled text of the module is the structured type, input the plurality of valid texts of the module into a pre-trained sequence labeling model to obtain second identification information of each participle in each valid text, where the sequence labeling model is obtained by training with a plurality of pre-collected second sample valid texts that are labeled in advance with second identification information and meet the requirement information of the module;

The position information determining module 904 is further configured to determine, according to the second identification information, the fifth position information of each participle in each valid text in the filled text of the module by using a preset correspondence between identification and participle position information;

The text generating module 905 is further configured to arrange the participles in each valid text according to the fifth position information of each participle in each valid text to obtain the filled text of the module, and to arrange the filled text of each module according to the fixed writing format of the text to be generated to obtain the text to be generated.
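The structured-type branch above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the tag names, the `TAG_TO_POSITION` table (standing in for the preset correspondence between identification and participle position information), and the function `fill_structured` are all hypothetical, and the sequence labeling model is replaced by hand-written tags.

```python
from typing import Dict, List, Tuple

# Hypothetical preset correspondence between identification (tag) and
# participle position in the filled text of a "Release Date" module.
TAG_TO_POSITION: Dict[str, int] = {"YEAR": 0, "MONTH": 1, "DAY": 2}


def fill_structured(
    tagged: List[Tuple[str, str]],
    tag_to_position: Dict[str, int] = TAG_TO_POSITION,
    sep: str = "-",
) -> str:
    """Arrange (participle, tag) pairs by the position their tag maps to.

    Participles whose tag has no preset position (for example "O") are dropped.
    """
    placed = [
        (tag_to_position[tag], word)
        for word, tag in tagged
        if tag in tag_to_position
    ]
    placed.sort(key=lambda item: item[0])
    return sep.join(word for _, word in placed)


# Tags as a sequence labeling model might emit them (illustrative values only).
tagged_participles = [("27", "DAY"), ("on", "O"), ("2018", "YEAR"), ("07", "MONTH")]
release_date = fill_structured(tagged_participles)
```

The lookup table plays the role of the fifth position information: once each participle has a tag, its slot in the filled text follows directly from the table.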

Corresponding to the foregoing embodiments, an embodiment of the present application further provides a computer device, as shown in FIG. 10, which may include:

A processor 1001, a communication interface 1002, a memory 1003, and a communication bus 1004, where the processor 1001, the communication interface 1002, and the memory 1003 communicate with each other through the communication bus 1004;

A memory 1003, configured to store a computer program;

The processor 1001 is configured to implement the steps of the text generating method in any one of the foregoing embodiments when the computer program stored in the memory 1003 is executed.

With the computer device provided in the embodiment of the present application, for each module, the memory network used is obtained by training with a plurality of pre-collected first sample texts, and each first sample text is a sample that conforms to a natural language expression structure and meets the requirement information of the module. Therefore, the first position information, in the filled text, of each participle in the valid texts obtained from the memory network is the same as the position information of each participle in the first sample text. On this basis, when the participles in the valid texts are arranged according to the first position information, the text structure of the resulting filled text is the same as that of the first sample text and thus also conforms to the natural language expression structure. Consequently, when the filled text of each module is arranged according to the fixed writing format of the text to be generated, the resulting text is guaranteed to conform to the natural language expression structure.
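The two arrangement steps described above can be shown in a short sketch. It assumes the first position information is a zero-based integer index per participle and that the fixed writing format is an ordered list of module names; `arrange_participles` and `assemble_text` are hypothetical names, and the positions are hand-written rather than produced by a memory network.

```python
from typing import Dict, List, Tuple


def arrange_participles(positioned: List[Tuple[str, int]]) -> str:
    """Order participles by their first position information and join them."""
    ordered = sorted(positioned, key=lambda item: item[1])
    return " ".join(word for word, _ in ordered)


def assemble_text(filled: Dict[str, str], writing_format: List[str]) -> str:
    """Lay out each module's filled text in the order of the fixed writing format."""
    return "\n".join(f"{name}: {filled[name]}" for name in writing_format)


# Illustrative participles and positions only.
title = arrange_participles([("erupts", 1), ("Volcano", 0)])
body = arrange_participles([("was", 2), ("hurt", 3), ("No", 0), ("one", 1)])
article = assemble_text({"Body": body, "Title": title}, ["Title", "Body"])
```

Because the positions are learned from sample texts that already read naturally, sorting by them is all that is needed at generation time to reproduce the same text structure.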

The foregoing memory may include RAM (Random Access Memory) and may also include NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The above processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the text generation method in any of the foregoing embodiments are implemented.

With the computer-readable storage medium provided by the embodiment of the present application, when the stored computer program is executed by a processor, the memory network used for each module is obtained by training with a plurality of pre-collected first sample texts, and each first sample text is a sample that conforms to a natural language expression structure and meets the requirement information of the module. Therefore, the first position information, in the filled text, of each participle in the valid texts obtained from the memory network is the same as the position information of each participle in the first sample text. On this basis, when the participles in the valid texts are arranged according to the first position information, the text structure of the resulting filled text is the same as that of the first sample text and thus also conforms to the natural language expression structure. Consequently, when the filled text of each module is arranged according to the fixed writing format of the text to be generated, the resulting text is guaranteed to conform to the natural language expression structure.

In still another embodiment of the present application, a computer program product containing instructions is also provided. When the computer program product is run on a computer, the computer is caused to execute the text generating method in any of the foregoing embodiments.

In another embodiment of the present application, an application program is also provided, and when the application program is running, the text generating method in any of the foregoing embodiments may be executed.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of the present application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or DSL (Digital Subscriber Line)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD (Digital Versatile Disc)), or a semiconductor medium (for example, an SSD (Solid State Disk)).

In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "including", "comprising", or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

Each embodiment in this specification is described in a related manner, and the same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on the differences from other embodiments. In particular, the embodiments of the apparatus and computer equipment are basically similar to the method embodiments, so the description is relatively simple. For the related parts, refer to the description of the method embodiments.

The above descriptions are merely preferred embodiments of the present application, and are not intended to limit the protection scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and principle of this application are included in the protection scope of this application.

The above are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall be included within the scope of protection of this application.

Claims

A text generation method, characterized in that the method includes:

For each module in the fixed writing format of the text to be generated, obtain a plurality of valid texts from a preset database that meet the requirement information of the module, and the requirement information is used to indicate the text content corresponding to the module;

For each module, input the plurality of valid texts of the module into a pre-trained first recurrent neural network to obtain a first feature vector of each valid text of the module, where the first recurrent neural network is obtained by training with a plurality of pre-collected sample valid texts that meet specified requirement information;

For each module, input the first feature vector of each valid text of the module into a pre-trained memory network to obtain first position information, in the filled text of the module, of each participle in each valid text of the module, where the text structure of the filled text is the same as the text structure of the first sample text used in training the memory network, the first sample text is a text that conforms to a natural language expression structure and meets specified requirement information, and the memory network is obtained by training with a plurality of pre-collected first sample texts;

For each module, arrange the participles in each valid text of the module according to the obtained first position information to obtain the filled text of the module;

According to the fixed writing format of the text to be generated, the filled text of each module is arranged to obtain the text to be generated.
The method according to claim 1, wherein the obtaining, for each module in the fixed writing format of the text to be generated, a plurality of valid texts that meet the requirement information of the module from a preset database includes:

For each module in the fixed writing format of the text to be generated, obtaining a plurality of complete texts from the preset database that conform to the events described by the text to be generated, as the backup text of the module;

For each module, input each backup text of the module into a pre-trained second recurrent neural network to obtain a second feature vector of each backup text, where the second recurrent neural network is obtained by training with a plurality of pre-collected sample backup texts;

For each module, input the requirement information of the module into a pre-trained third recurrent neural network to obtain a third feature vector of the requirement information as the feature vector of the module, where the third recurrent neural network is obtained by training with a plurality of pre-collected pieces of sample requirement information of the module;

For each module, input the vector information corresponding to each backup text of the module into a pre-trained fourth recurrent neural network to obtain second position information, in each backup text of the module, of the text that meets the requirement information of the module; wherein the vector information corresponding to any backup text of the module includes the second feature vector of the backup text and the feature vector of the module; the fourth recurrent neural network is obtained by training with a plurality of pre-collected sample complete texts that are labeled with third position information and describe the same event corresponding to specified requirement information, where the third position information is the position information, in the sample complete text, of the text that meets the requirement information of the module;

For each module, extract the text at the second position information from each backup text of the module as the valid text that meets the requirement information of the module.
The method according to claim 1, wherein the module has a plurality of pieces of requirement information;

The obtaining, for each module in the fixed writing format of the text to be generated, a plurality of valid texts that meet the requirement information of the module from a preset database includes:

For each module in the fixed writing format of the text to be generated, obtain from a preset database a plurality of valid texts that meet each piece of requirement information of the module;

For each module, inputting the plurality of valid texts of the module into a first recurrent neural network trained in advance to obtain a first feature vector of each valid text of the module includes:

For each of the modules, input a plurality of valid texts of each requirement information of the module into a first recurrent neural network trained in advance to obtain a first feature vector of each valid text of the module;

Before the inputting, for each module, the first feature vector of each valid text of the module into the pre-trained memory network to obtain the first position information, in the filled text of the module, of each participle in each valid text, the method further includes:

For each module, input each piece of requirement information of the module into a pre-trained third recurrent neural network to obtain a third feature vector of each piece of requirement information of the module, where the third recurrent neural network is obtained by training with a plurality of pre-collected pieces of sample requirement information of the module;

The inputting, for each module, the first feature vector of each valid text of the module into the pre-trained memory network to obtain the first position information, in the filled text of the module, of each participle in each valid text of the module includes:

For each module, input the vector information corresponding to each piece of requirement information of the module into the pre-trained memory network to obtain, for the valid texts corresponding to each piece of requirement information of the module, the corresponding first position information; wherein the first position information is the position information, in the filled text of the module, of each participle in the valid text; the vector information corresponding to any piece of requirement information of the module includes the first feature vector of each valid text corresponding to that requirement information and the third feature vector of that requirement information; the text structure of the filled text is the same as the text structure of the first sample text, labeled with fourth position information, that is used in training the memory network, where the fourth position information is the position information, in the first sample text, of each text that meets the requirement information.
The method according to claim 1, characterized in that, before the inputting, for each module, the plurality of valid texts of the module into a pre-trained first recurrent neural network to obtain a first feature vector of each valid text, the method further includes:

For each module, input the requirement information of the module into a preset classification algorithm to obtain the text type of the filled text of the module, where the text type includes a structured type and an unstructured type;

For each module, when the text type of the filled text of the module is the unstructured type, perform the step of inputting the plurality of valid texts of the module into the pre-trained first recurrent neural network to obtain the first feature vector of each valid text.
The method according to claim 4, characterized in that, after the inputting, for each module, the requirement information of the module into a preset classification algorithm to obtain the text type of the filled text of the module, the method further includes:

For each module, when the text type of the filled text of the module is the structured type, the following steps are performed:

Inputting the plurality of valid texts of the module into a pre-trained sequence labeling model to obtain second identification information of each participle in each valid text, where the sequence labeling model is obtained by training with a plurality of pre-collected second sample valid texts that are labeled in advance with second identification information and meet the requirement information of the module;

Determining, according to the second identification information, the fifth position information of each participle in each valid text in the filled text of the module by using a preset correspondence between the identification and the participle position information;

According to the fifth position information of each participle in each valid text, arrange the participles in each valid text to obtain the filled text of the module.
The method according to claim 1, wherein, after the obtaining, for each module in the fixed writing format of the text to be generated, a plurality of valid texts that meet the requirement information of the module from a preset database, the method further includes:

For each module, mark the first identification information of the module for each valid text of the module;

Arranging the filled text of each module according to the fixed writing format of the text to be generated to obtain the text to be generated includes:

For each module, determine sixth position information of the filled text of the module in the text to be generated according to a preset correspondence between first identification information and module positions, where the preset correspondence between first identification information and module positions is used to indicate the fixed writing format of the text to be generated;

Arrange each filled text according to the sixth position information to obtain the text to be generated.
A text generating device, characterized in that the device includes:

A text acquisition module, configured to, for each module in the fixed writing format of the text to be generated, obtain from a preset database a plurality of valid texts that meet the requirement information of the module, where the requirement information is used to indicate the text content corresponding to the module;

A feature extraction module, configured to, for each module, input the plurality of valid texts of the module into a pre-trained first recurrent neural network to obtain a first feature vector of each valid text of the module, where the first recurrent neural network is obtained by training with a plurality of pre-collected sample valid texts that meet specified requirement information;

A position information determining module, configured to, for each module, input the first feature vector of each valid text of the module into a pre-trained memory network to obtain first position information, in the filled text of the module, of each participle in each valid text of the module, where the text structure of the filled text is the same as the text structure of the first sample text used in training the memory network, the first sample text is a text that conforms to a natural language expression structure and meets specified requirement information, and the memory network is obtained by training with a plurality of pre-collected first sample texts;

A text generating module, configured to, for each module, arrange the participles in each valid text of the module according to the obtained first position information to obtain the filled text of the module, and to arrange the filled text of each module according to the fixed writing format of the text to be generated to obtain the text to be generated.
The apparatus according to claim 7, wherein the text acquisition module is specifically configured to:

For each module in the fixed writing format of the text to be generated, obtaining a plurality of complete texts from the preset database that conform to the events described by the text to be generated, as the backup text of the module;

The feature extraction module is further configured to, for each module, input each backup text of the module into a pre-trained second recurrent neural network to obtain a second feature vector of each backup text, where the second recurrent neural network is obtained by training with a plurality of pre-collected sample backup texts; and to input the requirement information of the module into a pre-trained third recurrent neural network to obtain a third feature vector of the requirement information as the feature vector of the module, where the third recurrent neural network is obtained by training with a plurality of pre-collected pieces of sample requirement information of the module;

The position information determining module is further configured to, for each module, input the vector information corresponding to each backup text of the module into a pre-trained fourth recurrent neural network to obtain second position information, in each backup text of the module, of the text that meets the requirement information of the module; wherein the vector information corresponding to any backup text of the module includes the second feature vector of the backup text and the feature vector of the module; the fourth recurrent neural network is obtained by training with a plurality of pre-collected sample complete texts that are labeled with third position information and describe the same event corresponding to specified requirement information, where the third position information is the position information, in the sample complete text, of the text that meets the requirement information of the module;

The text acquisition module is specifically configured to, for each module, extract the text at the second position information from each backup text of the module as the valid text that meets the requirement information of the module.
The device according to claim 7, characterized in that the module has a plurality of pieces of requirement information;

The text acquisition module is specifically configured to:

For each module in the fixed writing format of the text to be generated, obtain a plurality of valid texts from a preset database that meets each requirement information of the module;

The feature extraction module is further configured to:

For each of the modules, input a plurality of valid texts of each requirement information of the module into a first recurrent neural network trained in advance to obtain a first feature vector of each valid text;

For each module, input each piece of requirement information of the module into a pre-trained third recurrent neural network to obtain a third feature vector of each piece of requirement information of the module, where the third recurrent neural network is obtained by training with a plurality of pre-collected pieces of sample requirement information of the module;

The position information determining module is specifically configured to:

For each module, input the vector information corresponding to each piece of requirement information of the module into the pre-trained memory network to obtain, for the valid texts corresponding to each piece of requirement information of the module, the corresponding first position information; wherein the first position information is the position information, in the filled text of the module, of each participle in the valid text; the vector information corresponding to any piece of requirement information of the module includes the first feature vector of each valid text corresponding to that requirement information and the third feature vector of that requirement information; the text structure of the filled text is the same as the text structure of the first sample text, labeled with fourth position information, that is used in training the memory network, where the fourth position information is the position information, in the first sample text, of each text that meets the requirement information.
The apparatus according to claim 7, further comprising:

A text classification module, for each of the modules, inputting the requirement information of the module into a preset classification algorithm to obtain a text type of the module's filled text, the text type includes a structured type and an unstructured type;

For each module, when the text type of the filled text of the module is the unstructured type, the text acquisition module is configured to input the plurality of valid texts of the module into the pre-trained first recurrent neural network to obtain the first feature vector of each valid text.
The apparatus according to claim 10, wherein the text acquisition module is further configured to:

For each module, when the text type of the filled text of the module is the structured type, input the plurality of valid texts of the module into a pre-trained sequence labeling model to obtain second identification information of each participle in each valid text, where the sequence labeling model is obtained by training with a plurality of pre-collected second sample valid texts that are labeled in advance with second identification information and meet the requirement information of the module;

The position information determining module is further configured to determine, according to the second identification information, the fifth position information of each participle in each valid text in the filled text of the module by using a preset correspondence between identification and participle position information;

The text generating module is further configured to arrange the participles in each valid text according to the fifth position information of each participle in each valid text to obtain the filled text of the module, and to arrange the filled text of each module according to the fixed writing format of the text to be generated to obtain the text to be generated.
The apparatus according to claim 7, wherein the text generating module is specifically configured to:

For each module, mark the first identification information of the module for each valid text of the module;

For each module, determine sixth position information of the filled text of the module in the text to be generated according to a preset correspondence between first identification information and module positions, where the preset correspondence between first identification information and module positions is used to indicate the fixed writing format of the text to be generated;

Arrange each filled text according to the sixth position information to obtain the text to be generated.
A computer device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program; and the processor is used to implement the method steps according to any one of claims 1-6 when executing the program stored in the memory.
A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, and when the computer program is executed by a processor, the method steps according to any one of claims 1-6 are implemented.
An application program, wherein the application program, when run, executes the method steps according to any one of claims 1-6.