CN112507692A - Method and device for establishing style text generation model

Method and device for establishing style text generation model

Info

Publication number
CN112507692A
Authority
CN
China
Prior art keywords
text
discriminator
model
connector
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011445182.5A
Other languages
Chinese (zh)
Other versions
CN112507692B (en)
Inventor
陈亮宇
刘家辰
肖欣延
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011445182.5A priority Critical patent/CN112507692B/en
Publication of CN112507692A publication Critical patent/CN112507692A/en
Application granted granted Critical
Publication of CN112507692B publication Critical patent/CN112507692B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method and a device for establishing a style text generation model, and relates to the technical fields of natural language processing and deep learning. The scheme of the method is as follows: acquiring training samples, wherein the training samples comprise a plurality of first texts, and second texts which correspond to the generation tasks of the first texts and have a specific style; constructing a neural network model comprising a generation model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generation model and the language model; and performing adversarial training on the connector and the discriminator in the neural network model using the training samples, and obtaining a style text generation model from the generation model, the language model and the connector of the trained neural network model. With the method and the device, the target text generated by the style text generation model conforms to the corresponding generation task while having a more vivid style and greater fluency.

Description

Method and device for establishing style text generation model
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for building a style text generation model in the fields of natural language processing and deep learning technologies, an electronic device, and a readable storage medium.
Background
Style text means that, in a specific text generation task, the generated text has a particular style, such as positive emotion, negative emotion, humor, romance, headline style, martial-arts ("swordsman") style, and the like.
In the prior art, a style text is generally generated by a text generation model obtained through multi-task training that combines DAE (Denoising AutoEncoder) and seq2seq techniques. However, research shows that the style texts generated by text generation models obtained in this way are of poor quality.
Disclosure of Invention
The technical scheme adopted by the application to solve the technical problem is to provide a method for establishing a style text generation model, comprising the following steps: acquiring training samples, wherein the training samples comprise a plurality of first texts, and second texts which correspond to the generation tasks of the first texts and have a specific style; constructing a neural network model comprising a generation model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generation model and the language model; and performing adversarial training on the connector and the discriminator in the neural network model using the training samples, and obtaining a style text generation model from the generation model, the language model and the connector of the trained neural network model.
The technical solution adopted by the present application to solve the technical problem is to provide a device for establishing a style text generation model, comprising: an acquisition unit configured to acquire training samples, wherein the training samples comprise a plurality of first texts, and second texts which correspond to the generation tasks of the first texts and have a specific style; a construction unit configured to construct a neural network model comprising a generation model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generation model and the language model; and a training unit configured to perform adversarial training on the connector and the discriminator in the neural network model using the training samples, and to obtain a style text generation model from the generation model, the language model and the connector of the trained neural network model.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above method.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above method.
One embodiment of the above application has the following advantages or beneficial effects: the target text generated by the style text generation model conforms to the corresponding generation task while having a more vivid style and greater fluency. Because the connector in the style text generation model dynamically predicts the style intensity of each character in the target text, the technical problem in the prior art that the effect of text generated by a trained text generation model is unstable is solved, and the technical effect is achieved that the target text conforms to the corresponding generation task while having a more vivid style and greater fluency.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 5 is a schematic illustration according to a fifth embodiment of the present application;
FIG. 6 is a schematic illustration according to a sixth embodiment of the present application;
FIG. 7 is a block diagram of an electronic device for implementing a method of modeling a stylistic text generation model in accordance with an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present application. As shown in fig. 1, the method for establishing a style text generation model according to this embodiment may specifically include the following steps:
S101, acquiring training samples, wherein the training samples comprise a plurality of first texts, and second texts which correspond to the generation tasks of the first texts and have a specific style;
S102, constructing a neural network model comprising a generation model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generation model and the language model;
S103, performing adversarial training on the connector and the discriminator in the neural network model using the training samples, and obtaining a style text generation model from the generation model, the language model and the connector of the trained neural network model;
wherein the style text generation model is used for generating, from a text to be processed, a target text which corresponds to the generation task of the text to be processed and has a specific style.
In the method for establishing a style text generation model provided by this embodiment, after the neural network model comprising the generation model, the language model, the connector and the discriminator is constructed, the neural network model is trained by performing adversarial training on the connector and the discriminator, and the generation model, the language model and the connector of the trained neural network model are then used as the style text generation model.
In this embodiment, in the training samples acquired in S101, the generation task of the first texts may be one of a dialogue generation task, an abstract generation task, a title generation task and other text generation tasks; the specific style of the second texts may be one of positive emotion, negative emotion, humor, romance, headline style, swordsman (martial-arts) style, and the like.
In this embodiment, the plurality of first texts acquired in S101 correspond to the same generation task, and the specific style of the acquired second texts is one of the above styles. For example, if the generation task is title generation and the specific style is the swordsman style, the second text in the training sample obtained in this embodiment is a title of the first text written in the swordsman style.
After executing S101 to obtain training samples, the present embodiment executes S102 to construct a neural network model including a generative model, a language model, a connector and a discriminator, where the connector is used to connect the generative model and the language model.
In this embodiment, a generation model is obtained through pre-training and is used for outputting, according to an input text, a text corresponding to a specific generation task; the generation model in this embodiment corresponds to one generation task.
For example, if the generation task corresponding to the generation model is a title generation task, the generation model outputs a title text corresponding to the input text; and if the generating task corresponding to the generating model is the dialogue generating task, outputting a reply text corresponding to the input text by the generating model.
In this embodiment, a Language Model (LM) is obtained through pre-training and is used for outputting, according to the current input text, the next character that follows the current input text and has a specific style. The result of splicing the output character onto the current input text is then taken as the new current input text and fed into the language model again, and this iteration is repeated multiple times. The initial current input text is a partial output of the generation model, for example the first character or word of the text output by the generation model, and each output is a single character or word. The language model in this embodiment corresponds to one style.
For example, if the specific style corresponding to the language model is positive emotion, the language model outputs the next character which follows the current input text and has positive emotion; if the specific style corresponding to the language model is the martial-arts style, the language model outputs the next character which follows the current input text and has the martial-arts style.
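As an illustrative sketch of this iterative decoding (not part of the claimed subject matter), the loop below assumes a pre-trained style language model `style_lm` that maps a batch of token-id prefixes to per-position next-token logits; all names, shapes and the greedy decoding choice are assumptions of the sketch.

```python
import torch

def decode_with_style_lm(style_lm, prefix_ids, max_len=32, eos_id=2):
    """Repeatedly append the style LM's next character to the current input."""
    # The initial current input text is a partial output of the generation
    # model, e.g. its first character or word.
    current = list(prefix_ids)
    for _ in range(max_len):
        logits = style_lm(torch.tensor([current]))   # assumed shape: (1, len(current), vocab)
        next_id = int(torch.argmax(logits[0, -1]))   # next character with the target style
        if next_id == eos_id:
            break
        current.append(next_id)                      # splice onto the current input and iterate
    return current
```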
This embodiment obtains, through pre-training, a discriminator, which may be a double-tower DSSM (Deep Structured Semantic Model), used for judging whether a generated text satisfies the generation task of the input text, for example whether the generated text can serve as the title of the input text.
The connector in this embodiment is a neural network model based on an encoder-decoder structure. It is used for obtaining, from the input text of the generation model and the current input text of the language model, a first probability value for the next character after the current input text; then, for each character position, combining this first probability value with the second probability value and the third probability value of the character at the corresponding position in the output texts of the generation model and the language model, respectively, to obtain the final probability value of the character at that position; and finally obtaining the output text from the final probability values of the characters at all positions.
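As an illustrative sketch of such an encoder-decoder connector (the GRU choice, dimensions and layer types are assumptions of the sketch, not the patent's prescription): the encoder reads the generation model's input text, the decoder reads the language model's current input text, and a sigmoid head emits the first probability value (the gate) for the next character.

```python
import torch
import torch.nn as nn

class ConnectorSketch(nn.Module):
    def __init__(self, vocab_size=8000, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.gate_head = nn.Linear(d_model, 1)

    def forward(self, first_text_ids, current_input_ids):
        # Encoder reads the generation model's input text (the first text).
        _, enc_state = self.encoder(self.embed(first_text_ids))
        # Decoder reads the language model's current input text, conditioned
        # on the encoder's final state.
        dec_out, _ = self.decoder(self.embed(current_input_ids), enc_state)
        # First probability value (gate) for the next character.
        return torch.sigmoid(self.gate_head(dec_out[:, -1]))
```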
It can be understood that, when obtaining the final probability value of the character at each position, the connector may take the average of the three probability values as the final probability value, or may calculate it using the following formula:
final_prob = lm_prob * gate + g_prob * (1 - gate)
where final_prob represents the final probability value of the character at each position; lm_prob represents the third probability value of the character at that position output by the language model; gate represents the first probability value output by the connector for the character at that position; and g_prob represents the second probability value of the character at that position output by the generation model.
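A minimal sketch of this fusion follows; the shapes are illustrative assumptions, (seq_len, vocab) for the two distributions and (seq_len, 1) for the gate.

```python
import torch

def fuse_probabilities(gate, g_prob, lm_prob):
    # final_prob = lm_prob * gate + g_prob * (1 - gate)
    return lm_prob * gate + g_prob * (1.0 - gate)

# Example: a position with a high gate follows the language model (style),
# while a position with a low gate follows the generation model (task).
gate = torch.tensor([[0.9], [0.1]])
g_prob = torch.tensor([[0.2, 0.8], [0.2, 0.8]])
lm_prob = torch.tensor([[0.7, 0.3], [0.7, 0.3]])
print(fuse_probabilities(gate, g_prob, lm_prob))
```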
In this embodiment, after S102 of constructing the neural network model comprising the generation model, the language model, the connector and the discriminator, S103 is executed to perform adversarial training on the connector and the discriminator in the neural network model using the training samples, and the style text generation model is obtained from the generation model, the language model and the connector of the trained neural network model. The style text generation model obtained by training in this embodiment can generate, from an input text to be processed, a target text which corresponds to the generation task of the text to be processed and has a specific style.
In this embodiment, when the adversarial training of the connector and the discriminator is performed in S103, the parameters of the generation model and the language model are fixed, and only the connector and the discriminator are trained adversarially, so that the output text of the connector incorporates the specific style to the maximum extent while being judged by the discriminator to conform to the generation task corresponding to the input text.
Specifically, when performing the adversarial training of S103 on the connector and the discriminator in the neural network model using the training samples, this embodiment may adopt the following optional implementation: taking the second text in a training sample as the real sample corresponding to the first text; taking the first text as the input of the neural network model, and taking the output obtained after the first text is processed by the generation model, the language model and the connector as the generated sample corresponding to the first text; taking the first text together with its corresponding real sample and generated sample as the input of the discriminator, and obtaining the loss function of the connector and the loss function of the discriminator respectively from the output of the discriminator; and adjusting the parameters of the connector and the discriminator according to the loss function of the connector and the loss function of the discriminator until the neural network model converges.
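Schematically, one alternating update under this procedure could look like the sketch below, with the generation model and language model frozen (e.g., via requires_grad_(False)) and only the connector and the discriminator updated; generate_sample, the loss helpers and the optimizers are placeholders standing in for the patent's components, and the sketch assumes the generated sample is represented differentiably (e.g., as probability vectors) so that gradients can flow into the connector.

```python
def adversarial_step(first_text, second_text, generate_sample,
                     discriminator_loss_fn, connector_loss_fn,
                     opt_connector, opt_discriminator):
    # Discriminator step: real pair (first text, second text) vs. generated pair.
    generated = generate_sample(first_text).detach()  # block gradients into the connector
    loss_d = discriminator_loss_fn(first_text, second_text, generated)
    opt_discriminator.zero_grad()
    loss_d.backward()
    opt_discriminator.step()

    # Connector step: regenerate with gradients enabled, then update the
    # connector so its output carries the style and passes the discriminator.
    generated = generate_sample(first_text)
    loss_c = connector_loss_fn(first_text, generated)
    opt_connector.zero_grad()
    loss_c.backward()
    opt_connector.step()
    return float(loss_d), float(loss_c)
```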
In this embodiment, when S103 obtains the loss function of the discriminator from the output of the discriminator, the following optional implementation may be adopted: inputting the first text and its corresponding real sample into the discriminator to obtain a first output corresponding to the first text; inputting the first text and its corresponding generated sample into the discriminator to obtain a second output corresponding to the first text; and obtaining the loss function of the discriminator from the first output and the second output corresponding to the first text.
In this embodiment, the first output and the second output of the discriminator are values between 0 and 1; when obtaining the loss function of the discriminator from the first output and the second output, this embodiment may compute it in the manner of a cross-entropy loss function.
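For example, under the assumption that the real pair should score 1 and the generated pair should score 0, a binary cross-entropy version of this loss could be sketched as follows (the helper name and the targets are assumptions of the sketch):

```python
import torch
import torch.nn.functional as F

def discriminator_bce_loss(first_output, second_output):
    # First output (first text + real sample) should approach 1,
    # second output (first text + generated sample) should approach 0.
    return (F.binary_cross_entropy(first_output, torch.ones_like(first_output))
            + F.binary_cross_entropy(second_output, torch.zeros_like(second_output)))
```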
In this embodiment, when S103 obtains the loss function of the connector from the output of the discriminator, the following optional implementation may be adopted: determining the first probability value, obtained by the connector from the first text and the current input text of the language model, of the next character after the current input text; obtaining a probability mean from the first probability values of the characters at all positions; inputting the first text and its corresponding generated sample into the discriminator to obtain the second output corresponding to the first text; and obtaining the loss function of the connector from the obtained probability mean and the second output, for example taking the sum of the two as the loss function of the connector.
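A literal rendering of that example: loss-gs is the mean of the connector's first probability values over all positions, loss-gd is the discriminator's second output, and their sum is the connector loss. The patent does not specify sign conventions or weighting beyond the sum, so this sketch mirrors the stated combination as-is.

```python
import torch

def connector_loss(gate_values: torch.Tensor, second_output: torch.Tensor) -> torch.Tensor:
    loss_gs = gate_values.mean()      # probability mean of the first probability values
    loss_gd = second_output.mean()    # the discriminator's second output for the generated sample
    return loss_gs + loss_gd          # the patent takes the sum of the two terms
```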
Since the loss function of the connector in this embodiment is composed of these two parts, the output of the connector is driven both to carry the specific style and to be indistinguishable to the discriminator from a real sample.
Specifically, in this embodiment, when the parameters of the connector and the discriminator are adjusted according to the loss functions, the training target of the connector and the discriminator is that their loss functions converge simultaneously. When the loss function of the connector and the loss function of the discriminator converge simultaneously, the training of the neural network model is considered complete, and the generation model, the language model and the connector of the neural network model can be used as the style text generation model.
By using the style text generation model obtained in the embodiment, after the text to be processed is input, the style text generation model can generate the target text which corresponds to the generation task of the text to be processed and has a specific style.
According to the method provided by this embodiment, the neural network model is trained by performing adversarial training on the connector and the discriminator in the neural network model, and the generation model, the language model and the connector of the trained neural network model are used as the style text generation model. Because the connector in the style text generation model can dynamically predict the style intensity of each character in the target text, the target text generated by the style text generation model conforms to the corresponding generation task while having a more vivid style and greater fluency.
Fig. 2 is a schematic diagram according to a second embodiment of the present application. As shown in Fig. 2, this figure shows the network structure of the connector in the neural network model: the input of the encoder is the input of the generation model, i.e., the first text; the input of the decoder is the input of the language model, i.e., the current input text; the Gate output is the first probability value, output by the connector, of the next character after the current input text; the G output is the second probability value of the character at the corresponding position in the output text of the generation model; the LM output is the third probability value of the character at the corresponding position in the output text of the language model; and the Final output is the final probability value of the character at the corresponding position.
Fig. 3 is a schematic diagram according to a third embodiment of the present application. As shown in Fig. 3, this figure shows the network structure of the discriminator in the neural network model: A is the first text, and B is the real sample or the generated sample; the first text passes through a Transformer structure, is pooled, and then undergoes fully connected processing to obtain the fully connected result corresponding to the first text; the real sample or the generated sample passes through a Transformer structure, is pooled, and then undergoes fully connected processing to obtain the fully connected result corresponding to the real sample or the generated sample; the fully connected result corresponding to the first text and the fully connected result corresponding to the real sample are jointly processed by a fully connected layer to obtain the first output corresponding to the first text; and the fully connected result corresponding to the first text and the fully connected result corresponding to the generated sample are jointly processed by a fully connected layer to obtain the second output corresponding to the first text.
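A minimal double-tower sketch matching this structure follows; vocabulary size, dimensions and layer counts are illustrative, and the sketch shares one tower for both inputs for brevity, whereas the figure draws one Transformer per input.

```python
import torch
import torch.nn as nn

class TwoTowerDiscriminator(nn.Module):
    def __init__(self, vocab_size=8000, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.tower_fc = nn.Linear(d_model, d_model)
        self.match_fc = nn.Linear(2 * d_model, 1)

    def encode(self, ids):
        # One tower: Transformer -> pooling -> fully connected result.
        h = self.encoder(self.embed(ids))
        return self.tower_fc(h.mean(dim=1))

    def forward(self, text_a_ids, text_b_ids):
        # A is the first text; B is the real sample or the generated sample.
        a = self.encode(text_a_ids)
        b = self.encode(text_b_ids)
        # Joint fully connected layer produces a score in (0, 1).
        return torch.sigmoid(self.match_fc(torch.cat([a, b], dim=-1)))
```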
Fig. 4 is a schematic diagram according to a fourth embodiment of the present application. As shown in Fig. 4, this figure shows the adversarial training of the connector: the dashed box is the style text generation model; the input A is the first text, and the output B' is the generated sample; loss-gs denotes the probability mean; loss-gd denotes the second output; and the loss function of the connector is the sum of loss-gs and loss-gd.
Fig. 5 is a schematic diagram according to a fifth embodiment of the present application. As shown in Fig. 5, this figure shows the adversarial training of the discriminator: the dashed box is the style text generation model; the input A is the first text, the output B' is the generated sample, and the real B is the second text corresponding to the first text; loss-d denotes the loss function of the discriminator.
Fig. 6 is a schematic diagram according to a sixth embodiment of the present application. As shown in Fig. 6, the device for establishing a style text generation model of this embodiment includes:
the acquiring unit 601, configured to acquire training samples, wherein the training samples comprise a plurality of first texts, and second texts which correspond to the generation tasks of the first texts and have a specific style;
the building unit 602, configured to build a neural network model comprising a generation model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generation model and the language model;
the training unit 603, configured to perform adversarial training on the connector and the discriminator in the neural network model using the training samples, and to obtain a style text generation model from the generation model, the language model and the connector of the trained neural network model.
in the training sample of the obtaining unit 601 in this embodiment, the generating task of the first text may be one of a dialog generating task, an abstract generating task, a title generating task, and the like; the specific style of the second text may be one of positive emotion, negative emotion, humorous, romantic, headline, swordsman, and the like.
In this embodiment, a plurality of first texts acquired by the acquiring unit 601 correspond to the same generation task, and the specific style of the acquired second text is one of a plurality of styles.
In the embodiment, after the training samples are obtained by the obtaining unit 601, the building unit 602 builds a neural network model including a generative model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generative model and the language model.
The building unit 602 obtains, through pre-training, a generation model used for outputting, according to an input text, a text corresponding to a specific generation task; the generation model in this embodiment corresponds to one generation task.
The building unit 602 obtains, through pre-training, a Language Model (LM) used for outputting, according to the current input text, the next character that follows the current input text and has a specific style. The result of splicing the output character onto the current input text is then taken as the new current input text and fed into the language model again, and this iteration is repeated multiple times. The initial current input text is a partial output of the generation model, for example the first character or word of the text output by the generation model, and each output is a single character or word. The language model in this embodiment corresponds to one style.
The building unit 602 obtains, through pre-training, a discriminator, which may be a double-tower DSSM (Deep Structured Semantic Model), used for judging whether a generated text satisfies the generation task of the input text, for example whether the generated text can serve as the title of the input text.
The connector in the building unit 602 is a neural network model based on an encoder-decoder structure. It is used for obtaining, from the input text of the generation model and the current input text of the language model, a first probability value for the next character after the current input text; then, for each character position, combining this first probability value with the second probability value and the third probability value of the character at the corresponding position in the output texts of the generation model and the language model, respectively, to obtain the final probability value of the character at that position; and finally obtaining the output text from the final probability values of the characters at all positions.
It can be understood that, when obtaining the final probability value of the character at each position, the connector in the building unit 602 may take the average of the three probability values as the final probability value, or may calculate it using the following formula:
final_prob = lm_prob * gate + g_prob * (1 - gate)
where final_prob represents the final probability value of the character at each position; lm_prob represents the third probability value of the character at that position output by the language model; gate represents the first probability value output by the connector for the character at that position; and g_prob represents the second probability value of the character at that position output by the generation model.
In this embodiment, after the building unit 602 builds the neural network model comprising the generation model, the language model, the connector and the discriminator, the training unit 603 performs adversarial training on the connector and the discriminator in the neural network model using the training samples, and obtains the style text generation model from the generation model, the language model and the connector of the trained neural network model. The style text generation model trained by the training unit 603 can generate, from an input text to be processed, a target text which corresponds to the generation task of the text to be processed and has a specific style.
In this embodiment, when the training unit 603 performs the adversarial training of the connector and the discriminator, the parameters of the generation model and the language model are fixed, so that the output text of the connector incorporates the specific style to the maximum extent while being judged by the discriminator to conform to the generation task corresponding to the input text.
Specifically, when the training unit 603 in this embodiment performs adversarial training on the connector and the discriminator in the neural network model using the training samples, the following optional implementation may be adopted: taking the second text in a training sample as the real sample corresponding to the first text; taking the first text as the input of the neural network model, and taking the output obtained after the first text is processed by the generation model, the language model and the connector as the generated sample corresponding to the first text; taking the first text together with its corresponding real sample and generated sample as the input of the discriminator, and obtaining the loss function of the connector and the loss function of the discriminator respectively from the output of the discriminator; and adjusting the parameters of the connector and the discriminator according to the loss function of the connector and the loss function of the discriminator until the neural network model converges.
When the training unit 603 in this embodiment obtains the loss function of the discriminator from the output of the discriminator, the following optional implementation may be adopted: inputting the first text and its corresponding real sample into the discriminator to obtain a first output corresponding to the first text; inputting the first text and its corresponding generated sample into the discriminator to obtain a second output corresponding to the first text; and obtaining the loss function of the discriminator from the first output and the second output corresponding to the first text.
The first output and the second output of the discriminator in the training unit 603 are values between 0 and 1; when obtaining the loss function of the discriminator from the first output and the second output, the training unit 603 may compute it in the manner of a cross-entropy loss function.
When the training unit 603 in this embodiment obtains the loss function of the connector from the output of the discriminator, the following optional implementation may be adopted: determining the first probability value, obtained by the connector from the first text and the current input text of the language model, of the next character after the current input text; obtaining a probability mean from the first probability values of the characters at all positions; inputting the first text and its corresponding generated sample into the discriminator to obtain the second output corresponding to the first text; and obtaining the loss function of the connector from the obtained probability mean and the second output.
Since the loss function of the connector in the training unit 603 is composed of these two parts, the output of the connector is driven both to carry the specific style and to be indistinguishable to the discriminator from a real sample.
Specifically, when adjusting the parameters of the connector and the discriminator according to the loss functions, the training unit 603 sets the training target of the connector and the discriminator so that their loss functions converge simultaneously. When the loss function of the connector and the loss function of the discriminator converge simultaneously, the training of the neural network model is considered complete, and the generation model, the language model and the connector of the neural network model can be used as the style text generation model.
According to an embodiment of the present application, an electronic device and a computer-readable storage medium are also provided.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in Fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In Fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of modeling stylistic text generations provided herein. A non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of building a stylized text generation model provided herein.
The memory 702, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method of creating a model for generating a style text in the embodiments of the present application (e.g., the obtaining unit 601, the constructing unit 602, and the training unit 603 shown in fig. 6). The processor 701 executes various functional applications of the server and data processing, i.e., implements the method of establishing a stylistic text generation model in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 702.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 702 optionally includes memory located remotely from processor 701, and such remote memory may be connected over a network to an electronic device that is used in methods of building a stylized text generation model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for establishing a style text generation model may further comprise: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 7.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device of the method of creating a stylistic text generation model, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or the like. The output devices 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility existing in traditional physical hosts and VPS ("Virtual Private Server") services.
According to the technical scheme of the embodiments of the application, the neural network model is trained by performing adversarial training on the connector and the discriminator in the neural network model, and the generation model, the language model and the connector of the trained neural network model are used as the style text generation model. Because the connector in the style text generation model can dynamically predict the style intensity of each character in the target text, the target text generated by the style text generation model conforms to the corresponding generation task while having a more vivid style and greater fluency.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method of building a stylized text generation model, comprising:
acquiring a training sample, wherein the training sample comprises a plurality of first texts and second texts which correspond to the generation tasks of the first texts and have a specific style;
constructing a neural network model comprising a generative model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generative model and the language model;
performing adversarial training on a connector and a discriminator in the neural network model by using the training sample, and obtaining a style text generation model by using a generation model, a language model and the connector in the neural network model obtained by training;
the style text generation model is used for generating a target text which corresponds to a generation task of the text to be processed and has a specific style according to the text to be processed.
2. The method of claim 1, wherein the performing adversarial training on the connector and the discriminator in the neural network model by using the training sample comprises:
fixing parameters of a generation model and a language model in the neural network model;
and performing adversarial training on the connector and the discriminator in the neural network model by using the training sample.
3. The method of claim 2, wherein the performing adversarial training on the connector and the discriminator in the neural network model by using the training sample comprises:
taking a second text in the training sample as a real sample corresponding to the first text;
taking a first text as the input of the neural network model, and taking an output result obtained by processing the first text through a generation model, a language model and a connector as a generation sample corresponding to the first text;
taking the first text and the corresponding real sample and the generated sample as the input of a discriminator, and respectively obtaining a loss function of the connector and a loss function of the discriminator according to the output result of the discriminator;
and adjusting parameters of the connector and the discriminator according to the loss function of the connector and the loss function of the discriminator until the neural network model converges.
4. The method of claim 3, wherein obtaining the loss function of the discriminator according to the output result of the discriminator comprises:
inputting the first text and a real sample corresponding to the first text into a discriminator to obtain a first output of the discriminator corresponding to the first text;
inputting the first text and the corresponding generated sample into a discriminator to obtain a second output of the discriminator corresponding to the first text;
and obtaining a loss function of the discriminator according to the first output and the second output corresponding to the first text.
5. The method of claim 3, wherein obtaining the loss function of the connector according to the output result of the discriminator comprises:
determining a first probability value, obtained by the connector according to the first text and the current input text of the language model, of the next character located after the current input text;
obtaining a probability mean value according to the first probability values of the characters located at all positions;
inputting the first text and the corresponding generated sample into a discriminator to obtain a second output of the discriminator corresponding to the first text;
and obtaining a loss function of the connector according to the probability mean value and the second output.
6. An apparatus for building a stylized text generation model, comprising:
an acquisition unit, configured to acquire a training sample, wherein the training sample comprises a plurality of first texts and second texts which correspond to the generation tasks of the first texts and have a specific style;
a construction unit, configured to construct a neural network model comprising a generation model, a language model, a connector and a discriminator, wherein the connector is used for connecting the generation model and the language model;
a training unit, configured to perform adversarial training on the connector and the discriminator in the neural network model by using the training sample, and to obtain a style text generation model by using the generation model, the language model and the connector in the neural network model obtained by training;
the style text generation model is used for generating a target text which corresponds to a generation task of the text to be processed and has a specific style according to the text to be processed.
7. The apparatus of claim 6, wherein, when performing adversarial training on the connector and the discriminator in the neural network model by using the training sample, the training unit specifically performs:
fixing parameters of a generation model and a language model in the neural network model;
and performing adversarial training on the connector and the discriminator in the neural network model by using the training sample.
8. The apparatus of claim 7, wherein, when performing adversarial training on the connector and the discriminator in the neural network model by using the training sample, the training unit specifically performs:
taking a second text in the training sample as a real sample corresponding to the first text;
taking a first text as the input of the neural network model, and taking an output result obtained by processing the first text through a generation model, a language model and a connector as a generation sample corresponding to the first text;
taking the first text and the corresponding real sample and the generated sample as the input of a discriminator, and respectively obtaining a loss function of the connector and a loss function of the discriminator according to the output result of the discriminator;
and adjusting parameters of the connector and the discriminator according to the loss function of the connector and the loss function of the discriminator until the neural network model converges.
9. The apparatus according to claim 8, wherein the training unit, when obtaining the loss function of the discriminator according to the output result of the discriminator, specifically performs:
inputting the first text and a real sample corresponding to the first text into a discriminator to obtain a first output of the discriminator corresponding to the first text;
inputting the first text and the corresponding generated sample into a discriminator to obtain a second output of the discriminator corresponding to the first text;
and obtaining a loss function of the discriminator according to the first output and the second output corresponding to the first text.
10. The apparatus according to claim 8, wherein the training unit, when obtaining the loss function of the connector according to the output result of the discriminator, specifically performs:
determining a first probability value, obtained by the connector according to the first text and the current input text of the language model, of the next character located after the current input text;
obtaining a probability mean value according to the first probability values of the characters located at all positions;
inputting the first text and the corresponding generated sample into a discriminator to obtain a second output of the discriminator corresponding to the first text;
and obtaining a loss function of the connector according to the probability mean value and the second output.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202011445182.5A 2020-12-08 2020-12-08 Method and device for establishing style text generation model Active CN112507692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011445182.5A CN112507692B (en) 2020-12-08 2020-12-08 Method and device for establishing style text generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011445182.5A CN112507692B (en) 2020-12-08 2020-12-08 Method and device for establishing style text generation model

Publications (2)

Publication Number Publication Date
CN112507692A (en) 2021-03-16
CN112507692B CN112507692B (en) 2021-11-23

Family

ID=74971145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011445182.5A Active CN112507692B (en) 2020-12-08 2020-12-08 Method and device for establishing style text generation model

Country Status (1)

Country Link
CN (1) CN112507692B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134415A1 (en) * 2018-10-30 2020-04-30 Huawei Technologies Co., Ltd. Autoencoder-Based Generative Adversarial Networks for Text Generation
CN110288079A (en) * 2019-05-20 2019-09-27 阿里巴巴集团控股有限公司 Characteristic acquisition methods, device and equipment
CN110942774A (en) * 2019-12-12 2020-03-31 北京声智科技有限公司 Man-machine interaction system, and dialogue method, medium and equipment thereof
CN111563367A (en) * 2020-05-06 2020-08-21 首都师范大学 Short text automatic generation method, device, equipment and storage medium based on FocalGAN
CN111340143A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and system for obtaining confrontation sample generation model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU CHANG ET AL.: "Power User Intention Text Generation Based on Generative Adversarial Networks", Information Technology and Network Security *
CHEN BING'ER ET AL.: "Xu Song-Style Lyric Generation Based on LSTM", Network Security Technology and Application *

Also Published As

Publication number Publication date
CN112507692B (en) 2021-11-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant