CN112507721B - Method, apparatus, device and computer readable storage medium for generating text topics

Info

Publication number: CN112507721B (granted publication of application CN202011367702.5A; earlier publication CN112507721A, in Chinese)
Authority: CN (China)
Prior art keywords: topic, text, training, model, training sample
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 盛广智, 郑烨翰, 蔡远俊
Assignee (original and current): Beijing Baidu Netcom Science and Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Beijing Baidu Netcom Science and Technology Co., Ltd.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide methods, apparatuses, devices, and computer-readable storage media for generating text topics, relating to fields of artificial intelligence such as knowledge graphs and deep learning. A method of training a topic generation model includes obtaining a correct training sample and an incorrect training sample, the correct training sample including a text and a correct topic for the text, and the incorrect training sample including the text and an incorrect topic for the text; training a reward model based on the correct and incorrect training samples, the reward model being used to score topics generated for texts; obtaining a pre-trained topic generation model, the topic generation model being used to generate topics of texts; and optimizing the topic generation model through reinforcement learning training based on the reward model. Embodiments of the disclosure can automatically generate concise topics for a text, and the generated topics completely and correctly retain the key information in the text.

Description

Method, apparatus, device and computer readable storage medium for generating text topics
Technical Field
Embodiments of the present disclosure relate generally to the field of machine learning, and more particularly to a method, apparatus, device, computer-readable storage medium, and computer program product for generating text topics.
Background
With the development of the internet, the content ecosystem is becoming more and more important. Typically, a content creator needs to read a large amount of article material and find within it key information suitable for authoring. Reading and summarizing a large amount of article material requires significant effort and time. It is therefore desirable to automatically generate short topics for texts by technical means and provide them to content creators as references.
Disclosure of Invention
Embodiments of the present disclosure provide methods, apparatuses, devices, and computer-readable storage media for generating text topics.
In a first aspect of the present disclosure, a method of training a topic generation model is provided. The method includes obtaining a correct training sample and an incorrect training sample, the correct training sample including a text and a correct topic of the text, and the incorrect training sample including the text and an incorrect topic of the text; training a reward model based on the correct and incorrect training samples, the reward model being used to score topics generated for texts; obtaining a pre-trained topic generation model, the topic generation model being used to generate topics of texts; and optimizing the topic generation model through reinforcement learning training based on the reward model.
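The four steps of the first aspect can be sketched as a high-level pipeline. This is an illustrative sketch only: every function name below is a hypothetical placeholder standing in for the corresponding step, none of them come from the patent.

```python
def train_topic_generation_model(correct_samples, make_incorrect,
                                 train_reward, pretrain_generator,
                                 rl_optimize):
    """High-level sketch of the four steps of the first aspect.
    Each argument is a hypothetical callable for one step."""
    # Step 1: obtain correct samples and derive incorrect ones.
    incorrect_samples = [make_incorrect(s) for s in correct_samples]
    # Step 2: train a reward model on both sample sets.
    reward_model = train_reward(correct_samples, incorrect_samples)
    # Step 3: obtain a pre-trained topic generation model.
    generator = pretrain_generator()
    # Step 4: optimize the generator via RL against the reward model.
    return rl_optimize(generator, reward_model)

# Minimal demonstration with stub callables.
result = train_topic_generation_model(
    [("some text", "its correct topic")],
    lambda s: (s[0], "an incorrect topic"),
    lambda correct, incorrect: "reward-model",
    lambda: "pretrained-generator",
    lambda gen, rm: ("optimized", gen, rm),
)
```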
In a second aspect of the present disclosure, a method of generating a text topic is provided. The method includes obtaining an input text; and generating a topic of the input text using a topic generation model trained in accordance with the method of the first aspect of the present disclosure.
In a third aspect of the present disclosure, an apparatus for training a topic generation model is provided. The apparatus includes a training sample acquisition module configured to acquire a correct training sample and an incorrect training sample, the correct training sample including a text and a correct topic of the text, and the incorrect training sample including the text and an incorrect topic of the text; a reward model training module configured to train, based on the correct and incorrect training samples, a reward model for scoring topics generated for texts; a model pre-training module configured to acquire a pre-trained topic generation model, the topic generation model being used to generate topics of texts; and a reinforcement learning training module configured to optimize the topic generation model through reinforcement learning training based on the reward model.
In a fourth aspect of the present disclosure, an apparatus for generating a text topic is provided. The apparatus includes an acquisition module configured to acquire an input text; and a generation module configured to generate a topic of the input text using a topic generation model trained in accordance with the method of the first aspect of the present disclosure.
In a fifth aspect of the present disclosure, a computing device is provided that includes one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the computing device to implement a method as described in accordance with the first or second aspects of the present disclosure.
In a sixth aspect of the present disclosure, a computer readable storage medium having a computer program stored thereon is provided. The computer program, when executed by a processor, implements any of the steps of the method described in accordance with the first or second aspect of the present disclosure.
In a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to the first or second aspect of the present disclosure.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the disclosure, nor is it intended to be used to limit the scope of the disclosure.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the disclosure.
FIG. 1 illustrates a block diagram of an example system in which embodiments of the present disclosure may be implemented;
FIG. 2 shows a schematic diagram of training a topic generation model in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of an example method of training a topic generation model in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of an example method of generating text topics, according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an example apparatus for training a topic generation model in accordance with an embodiment of the present disclosure;
FIG. 6 illustrates a block diagram of an example apparatus to generate text topics in accordance with an embodiment of the present disclosure; and
FIG. 7 illustrates a block diagram of an example computing device capable of implementing various embodiments of the disclosure.
Like or corresponding reference characters indicate like or corresponding parts throughout the several views.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are illustrated in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The term "comprising" and its variations as used herein mean open-ended inclusion, i.e., "including but not limited to". The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first", "second", and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As described above, with the development of the internet, the content ecosystem is becoming more and more important. Typically, a content creator needs to read a large amount of article material and find within it key information suitable for authoring, which requires significant effort and time. It is therefore desirable to automatically generate short topics for texts by technical means and provide them to content creators as references. For example, for news material titled "Faraday Future (FF) says it has received the first payment from The9 and will complete technical negotiations within three weeks", if the article is automatically summarized into the topic "FF91 is expected to go on the market", a finance author can rapidly determine that the material has high value for financial writing and is suitable for creating an in-depth article.
Some text topic generation schemes directly extract key sentences from the text as topics. Other topic-phrase generation schemes combine extraction-discrimination and topic-generation techniques and, through strategy-based screening, present two high-confidence candidate topics for the user to choose from. These conventional solutions have the following drawbacks: (1) topics extracted directly from the text may not be short enough and may be padded with a large amount of unnecessary information, which reduces the user's reading efficiency; (2) existing topic-phrase generation schemes give several higher-confidence candidates rather than a unique result, and the candidates must be selected manually, which consumes manpower and time.
Embodiments of the present disclosure propose a solution for generating text topics that can address one or more of the above problems and other potential problems. According to the solution, concise topics can be automatically generated for a text, and the generated topics completely and correctly retain the key information in the text. In this way, the solution can help content creators quickly screen information and improve content creation efficiency.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. As used herein, the term "model" generally refers to a relational structure of a system that is expressed, generally or approximately, in mathematical language with reference to the characteristics of that system. The model may generally be generated by training on known data. The generated model may include model structures and model parameters, and so forth. Model parameters may vary depending on the type of particular model.
Fig. 1 illustrates a block diagram of an example system 100 in which embodiments of the present disclosure may be implemented. The system 100 may generally include a model training subsystem 110 and a model application subsystem 120. In the model training subsystem 110, a model training device 111 may obtain training data or training samples from a training data set 101 (e.g., a text library) to train a topic generation model 102, which is used to generate topics of texts. The trained topic generation model 102 can be provided to a model application device 121. In the model application subsystem 120, the model application device 121 may utilize the trained topic generation model 102 to generate a topic 104 of an input text 103. For example, an example of the input text 103 may be the full text of a news article titled "Faraday Future (FF) says it has received the first payment from The9 and will complete technical negotiations within three weeks", and an example of the topic 104 may be "FF91 is expected to go on the market".
It should be understood that the structure and function of system 100 are described for exemplary purposes only and do not imply any limitation on the scope of the present disclosure. Embodiments of the present disclosure may also be applied in environments having different structures and/or functions. It should also be appreciated that in fig. 1, model application device 121 and model training device 111 may be the same device or different devices.
FIG. 2 shows a schematic diagram of training the topic generation model 102 in accordance with an embodiment of the present disclosure. The training process can be divided into a reward model training phase 210 and a reinforcement learning training phase 220.
As shown in fig. 2, during the reward model training phase 210, the model training device 111 may obtain a correct training sample 201 from the training data set 101 (e.g., a text library). For example, the correct training sample 201 may include a text and the correct topic of the text. The correct topic of the text may be pre-annotated manually or generated in other ways, which will not be discussed in detail herein. The model training device 111 may construct an incorrect training sample 202 based on the correct training sample 201. For example, the incorrect training sample 202 may include the text and an incorrect topic for the text.
In some embodiments, the model training device 111 may extract knowledge items from the text and topic of the training sample 201. For example, the model training device 111 may utilize any known or yet-to-be-developed algorithms and/or tools to extract knowledge items from the text and topic of the training sample 201. Examples of such algorithms and/or tools include, but are not limited to, the OpenIE tool in Stanford CoreNLP. Knowledge items extracted from the text of the training sample 201 describe the relationships between entities that appear in the text; knowledge items extracted from the topic of the training sample 201 describe the relationships between entities that appear in the topic. Each knowledge item may be represented by a triple <entity, relationship, object>, where the object may be, for example, another entity that has the "relationship" with the "entity", and an entity may be, for example, a person, a place, and so on. An example knowledge item is <Ang Lee (entity), directed (relationship), Life of Pi (object)>.
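The triple representation above can be sketched in Python. This is a toy stand-in for an OpenIE-style extractor, not the patent's actual implementation: the `Triple` type, the single-pattern matcher, and the sample sentence are all illustrative assumptions.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A knowledge item: <entity, relationship, object>."""
    entity: str
    relation: str
    obj: str

def extract_triples(text: str) -> list:
    """Toy extractor: recognizes only the pattern
    '<entity> directed <object>' for illustration."""
    triples = []
    for sentence in text.split("."):
        parts = sentence.strip().split(" directed ")
        if len(parts) == 2:
            triples.append(Triple(parts[0].strip(), "directed", parts[1].strip()))
    return triples

triples = extract_triples("Ang Lee directed Life of Pi.")
# -> [Triple(entity='Ang Lee', relation='directed', obj='Life of Pi')]
```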
In some embodiments, the model training device 111 may construct an incorrect topic by modifying or deleting information in the correct topic, and generate the incorrect training sample 202 based on the text and the constructed incorrect topic. For example, the model training device 111 may delete entities (and/or objects) in the correct topic, or replace them with other entities (and/or objects), to construct the incorrect topic. The other entities (and/or objects) may be, for example, entities (and/or objects) that appear in the corresponding text, or ones that do not appear in the corresponding text. As another example, the model training device 111 may delete relationships in the correct topic or replace them with other relationships to construct the incorrect topic. The other relationships may likewise be relationships that occur in the corresponding text or ones that do not.
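One of the corruption strategies described above (replacing a triple's object with a different entity from the text) can be sketched as follows; the function name and the sample data are illustrative assumptions, not taken from the patent.

```python
import random

def corrupt_topic(topic_triples, text_entities, rng):
    """Build an incorrect topic from a correct one by replacing the
    object of one of its triples with a different entity drawn from
    the text (entity/relationship deletion are analogous)."""
    entity, relation, obj = rng.choice(topic_triples)
    candidates = [e for e in text_entities if e != obj]
    return (entity, relation, rng.choice(candidates))

rng = random.Random(0)  # seeded for reproducibility
correct_triples = [("FF91", "expected to", "go on the market")]
entities_in_text = ["The9", "first payment", "go on the market"]
wrong_triple = corrupt_topic(correct_triples, entities_in_text, rng)
```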
It should be appreciated that model training device 111 may also obtain or construct training samples 202 containing false topics in a different manner than that described above. Embodiments of the disclosure are not limited in this respect.
In some embodiments, the model training device 111 may train the reward model 203 based on the correct training sample 201 and the incorrect training sample 202 such that the reward model generates a first predetermined score (e.g., a score of 1) for correct training samples and a second predetermined score (e.g., a score of 0) for incorrect training samples. For example, the model training device 111 may acquire a pre-trained semantic understanding model and then perform transfer-learning training on it to classify correct and incorrect training samples. In this way, the model training device 111 obtains the reward model 203, which can give a topic generated for a text a score between the second and first predetermined scores (e.g., between 0 and 1). The trained reward model 203 will be used during the reinforcement learning training phase 220.
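The patent fine-tunes a pre-trained semantic understanding model; as a minimal self-contained sketch of the same idea, the classifier below is a logistic regression over one hand-crafted feature (word overlap between topic and text). The feature, the training data, and all names are illustrative assumptions; a real reward model would be a fine-tuned neural network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def features(text, topic):
    """Toy feature: fraction of topic words that appear in the text
    (a crude proxy for 'the topic retains key information')."""
    text_words = set(text.lower().split())
    topic_words = topic.lower().split()
    overlap = sum(w in text_words for w in topic_words) / max(len(topic_words), 1)
    return [1.0, overlap]  # bias term + overlap feature

def train_reward_model(samples, labels, lr=1.0, epochs=200):
    """Logistic-regression stand-in for the fine-tuned model:
    label 1 for correct (text, topic) pairs, 0 for incorrect ones."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for (text, topic), y in zip(samples, labels):
            x = features(text, topic)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def reward(w, text, topic):
    """Score in (0, 1) for a topic generated for a text."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(text, topic))))

text = "faraday future says it received the first payment from the9"
w = train_reward_model(
    [(text, "faraday received the first payment"),   # correct sample
     (text, "faraday completed mass production")],    # corrupted sample
    [1, 0])
```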
As shown in fig. 2, during the reinforcement learning training phase 220, the model training device 111 may obtain a pre-trained topic generation model 102. In some embodiments, the topic generation model 102 may be obtained by performing transfer-learning training on a pre-trained text generation model using a set of training samples that include texts and their corresponding correct topics. The pre-trained topic generation model 102 will be used as the baseline model for the reinforcement learning training phase.
In some embodiments, the model training device 111 may iteratively perform one or more of the following operations to apply reinforcement learning training to the pre-trained topic generation model 102. For example, the model training device 111 may obtain any training text 204 from the training data set 101, use the pre-trained topic generation model 102 to generate a topic 205 for the training text 204, and utilize the trained reward model 203 to generate a score 206 for the topic 205. The model training device 111 may then optimize the topic generation model 102 based on the score 206, for example by optimizing the model parameters of the topic generation model 102 using a proximal policy optimization (PPO) algorithm and/or any other suitable algorithm.
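The clipped surrogate at the heart of PPO, and the generate-score-update loop described above, can be sketched as follows. The loop's three callables are assumed placeholders (the patent does not specify them), and a real implementation would compute `ratio` and `advantage` from the policy's token log-probabilities and the reward-model score 206 minus a baseline.

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate for one action.
    ratio = pi_new(a|s) / pi_old(a|s); negated so that gradient
    *descent* maximizes the surrogate objective."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return -min(ratio * advantage, clipped * advantage)

def rl_step(generate, score, update, texts):
    """One reinforcement-learning pass (skeleton): generate a topic
    for each training text, score it with the reward model, and let
    `update` adjust the generator, e.g. via ppo_clip_loss."""
    for text in texts:
        topic = generate(text)
        update(text, topic, score(text, topic))
```

For instance, with `eps=0.2`, a ratio of 1.5 and positive advantage is clipped to 1.2, which caps how far a single favorable sample can push the policy away from the baseline model.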
The reinforcement-learning trained topic generation model 102 can be provided to a model application device 121 as shown in FIG. 1 for generating topics of text. Additionally, in some embodiments, the trained reward model 203 may also be provided to the model application device 121 as shown in fig. 1 to give a confidence score for the generated topic.
FIG. 3 illustrates a flowchart of an example method 300 of training a topic generation model in accordance with an embodiment of the present disclosure. The method 300 may be performed, for example, at the model training apparatus 111 as shown in fig. 1. The method 300 will be described in detail below in conjunction with fig. 1 and 2. It should be understood that method 300 may also include blocks not shown and/or that the blocks shown may be omitted. The scope of the present disclosure is not limited in this respect.
At block 310, the model training device 111 obtains a correct training sample 201 and an incorrect training sample 202, the correct training sample 201 including a text and a correct topic for the text, and the incorrect training sample 202 including the text and an incorrect topic for the text.

In some embodiments, obtaining the incorrect training sample includes: constructing an incorrect topic for the text by modifying or deleting information in the correct topic; and generating the incorrect training sample based on the text and the constructed incorrect topic.

In some embodiments, constructing the incorrect topic includes: extracting at least one knowledge item from the correct topic, the at least one knowledge item describing a relationship between entities that occur in the correct topic; and constructing the incorrect topic by modifying or deleting at least one of the entities and the relationship in the correct topic.
At block 320, the model training device 111 trains the reward model 203 based on the correct training sample 201 and the incorrect training sample 202, the reward model 203 being used to score the topics generated for the text.
In some embodiments, training the reward model includes: the reward model is trained based on the correct training sample and the incorrect training sample such that the reward model generates a first predetermined score for the correct training sample and a second predetermined score for the incorrect training sample.
At block 330, the model training device 111 obtains the pre-trained topic generation model 102, the topic generation model 102 being used to generate topics for text.
At block 340, the model training device 111 optimizes the topic generation model 102 through reinforcement learning training based on the reward model 203.
In some embodiments, optimizing the topic generation model by reinforcement learning training includes: acquiring training text 204; generating a topic 205 of the training text 204 using the topic generation model 102; generating a score 206 for the topic 205 using the reward model 203; and optimizing the topic generation model 102 based on the score 206.
In some embodiments, optimizing the topic generation model based on the score includes: optimizing the topic generation model based on the score using a proximal policy optimization (PPO) algorithm.
Fig. 4 illustrates a flowchart of an example method 400 of generating text topics, according to an embodiment of the present disclosure. The method 400 may be performed, for example, at the model application device 121 as shown in fig. 1. The method 400 will be described in detail below in conjunction with fig. 1. It should be understood that method 400 may also include blocks not shown and/or that the blocks shown may be omitted. The scope of the present disclosure is not limited in this respect.
At block 410, the model application device 121 obtains the input text 103. At block 420, the model application device 121 generates the topic 104 of the input text 103 using the trained topic generation model 102.
As can be seen from the above description, embodiments of the present disclosure propose a scheme for generating text topics. According to the scheme, concise topics can be automatically generated for the text, and the generated topics can completely and correctly reserve key information in the text. In this way, the scheme can help content creators to quickly screen information, and improve content creation efficiency.
Embodiments of the present disclosure also provide corresponding apparatus for implementing the above-described methods 300 and 400.
FIG. 5 illustrates a block diagram of an example apparatus 500 for training a topic generation model in accordance with an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 includes a training sample acquisition module 510 configured to acquire a correct training sample including a text and a correct topic of the text, and an incorrect training sample including the text and an incorrect topic of the text. The apparatus 500 further includes a reward model training module 520 configured to train, based on the correct and incorrect training samples, a reward model for scoring topics generated for texts. The apparatus 500 further includes a model pre-training module 530 configured to obtain a pre-trained topic generation model for generating topics of texts. In addition, the apparatus 500 includes a reinforcement learning training module 540 configured to optimize the topic generation model through reinforcement learning training based on the reward model.
In some embodiments, the training sample acquisition module 510 includes: an incorrect-topic construction module configured to construct an incorrect topic for the text by modifying or deleting information in the correct topic; and an incorrect-sample generation module configured to generate the incorrect training sample based on the text and the constructed incorrect topic.

In some embodiments, the incorrect-topic construction module includes: an extraction module configured to extract at least one knowledge item from the correct topic, the at least one knowledge item describing a relationship between entities that occur in the correct topic; and a construction module configured to construct the incorrect topic by modifying or deleting at least one of the entities and the relationship in the correct topic.
In some embodiments, the reward model training module is configured to: the reward model is trained based on the correct training sample and the incorrect training sample such that the reward model generates a first predetermined score for the correct training sample and a second predetermined score for the incorrect training sample.
In some embodiments, the reinforcement learning training module includes: the training sample acquisition module is configured to acquire training texts; the topic generation module is configured to generate topics of the training text by using the topic generation model; a score generation module configured to generate a score for a topic using the reward model; and a model optimization module configured to optimize the topic generation model based on the score.
In some embodiments, the model optimization module is configured to: optimize the topic generation model based on the score using a proximal policy optimization (PPO) algorithm.
Fig. 6 illustrates a block diagram of an example apparatus 600 to generate text topics according to an embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 includes an acquisition module 610 configured to acquire an input text. The apparatus 600 further includes a generation module 620 configured to generate a topic of the input text using the trained topic generation model.
The modules included in apparatus 500 and/or 600 may be implemented in various ways, including software, hardware, firmware, or any combination thereof. In some embodiments, one or more modules may be implemented using software and/or firmware, such as machine-executable instructions stored on a storage medium. In addition to or in lieu of machine-executable instructions, some or all of the modules in apparatus 500 and/or 600 may be implemented at least in part by one or more hardware logic components. By way of example and not limitation, exemplary types of hardware logic components that can be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Fig. 7 illustrates a block diagram of an example computing device 700 capable of implementing various embodiments of the disclosure. For example, the model training device 111 and/or the model application device 121 as shown in FIG. 1 may be implemented by the device 700. As shown in fig. 7, the device 700 includes a Central Processing Unit (CPU) 701, which may perform various suitable actions and processes according to computer program instructions stored in a Read Only Memory (ROM) 702 or computer program instructions loaded from a storage unit 708 into a Random Access Memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The various processes and processing described above, such as methods 300 and/or 400, may be performed by the CPU 701. For example, in some embodiments, methods 300 and/or 400 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the CPU 701, one or more actions of methods 300 and/or 400 described above may be performed.
The present disclosure may be methods, apparatus, systems, and/or computer program products. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: portable computer disks, hard disks, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), Static Random Access Memory (SRAM), portable Compact Disk Read-Only Memory (CD-ROM), Digital Versatile Disks (DVD), memory sticks, floppy disks, and mechanical encoding devices, such as punch cards or raised structures in grooves having instructions stored thereon, as well as any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to respective computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network interface card or network interface in each computing/processing device receives the computer readable program instructions from the network and forwards them for storage in a computer readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may execute the computer readable program instructions by personalizing the electronic circuitry with state information of the computer readable program instructions, in order to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (8)

1. A method of training a topic generation model, comprising:
acquiring a correct training sample and a wrong training sample, wherein the correct training sample comprises a text and a correct topic of the text, and the wrong training sample comprises the text and a wrong topic of the text;
training a reward model based on the correct training sample and the incorrect training sample, the reward model for scoring topics generated for text;
acquiring a pre-trained topic generation model, wherein the topic generation model is used for generating topics of texts; and
optimizing the topic generation model by reinforcement learning training based on the reward model,
wherein obtaining the erroneous training samples comprises:
constructing the wrong topic of the text by modifying or deleting information in the correct topic; and
generating the erroneous training sample based on the text and the constructed wrong topic of the text,
wherein constructing the error topic for the text comprises:
extracting at least one knowledge item from the correct topic, the at least one knowledge item describing a relationship between entities that occur in the correct topic; and
constructing the wrong topic of the text by modifying or deleting at least one of the entity and the relationship in the correct topic,
wherein training the reward model comprises:
training the reward model based on the correct training sample and the incorrect training sample such that the reward model generates a first predetermined score for the correct training sample and a second predetermined score for the incorrect training sample,
wherein optimizing the topic generation model by reinforcement learning training comprises:
acquiring a training text;
generating a theme of the training text by using the theme generation model;
generating a score for the topic using the reward model; and
optimizing the topic generation model based on the score.
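As an illustrative aid (not part of the claims), the erroneous-sample construction recited above can be sketched in a few lines of Python. The sketch assumes topics are represented as (head, relation, tail) knowledge items and that the two predetermined reward scores are 1.0 and 0.0; all names, pools, and the corruption operations chosen here are hypothetical, not specified by the patent.

```python
import random

def corrupt_topic(knowledge_item, entity_pool, relation_pool, rng):
    """Construct a wrong topic from a correct (head, relation, tail)
    knowledge item by modifying an entity, modifying the relation,
    or deleting the relation."""
    head, relation, tail = knowledge_item
    op = rng.choice(["modify_entity", "modify_relation", "delete_relation"])
    if op == "modify_entity":
        tail = rng.choice([e for e in entity_pool if e != tail])
    elif op == "modify_relation":
        relation = rng.choice([r for r in relation_pool if r != relation])
    else:
        relation = None  # deletion: the relation is dropped from the topic
    return (head, relation, tail)

def make_training_pair(text, correct_item, entity_pool, relation_pool, seed=0):
    """Pair the text with its correct topic (first predetermined score, 1.0)
    and with a constructed wrong topic (second predetermined score, 0.0)."""
    rng = random.Random(seed)
    wrong_item = corrupt_topic(correct_item, entity_pool, relation_pool, rng)
    correct_sample = {"text": text, "topic": correct_item, "score": 1.0}
    wrong_sample = {"text": text, "topic": wrong_item, "score": 0.0}
    return correct_sample, wrong_sample
```

A reward model trained on such pairs learns to regress toward 1.0 for faithful topics and 0.0 for corrupted ones, which is what the reinforcement-learning step then uses as its scoring signal.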
2. The method of claim 1, wherein optimizing the topic generation model based on the score comprises:
the topic generation model is optimized based on the score using a proximal policy optimization (PPO) algorithm.
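The "near-end policy optimization" of claim 2 is a literal rendering of the Chinese term for proximal policy optimization (PPO), whose core is a clipped surrogate objective. A minimal sketch of that objective follows; it is illustrative only, and the clipping range `eps` and the use of the reward-model score as the advantage are assumptions, not details given by the patent.

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective of proximal policy optimization (PPO):
    loss = -min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = exp(logp_new - logp_old) is the probability ratio of the
    generated topic under the new vs. old policy, and A is the advantage
    (e.g., the reward-model score minus a baseline)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return -min(ratio * advantage, clipped_ratio * advantage)
```

The clipping keeps each update close to the previous policy: once the probability ratio moves outside [1 - eps, 1 + eps], further movement no longer improves the objective.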
3. A method of generating a text topic, comprising:
acquiring an input text; and
generating a topic of the input text using a topic generation model trained in accordance with the method of claim 1 or 2.
4. An apparatus for training a topic generation model, comprising:
a training sample acquisition module configured to acquire a correct training sample and a wrong training sample, the correct training sample including text and a correct topic of the text, the wrong training sample including the text and a wrong topic of the text;
a reward model training module configured to train a reward model for scoring topics generated for text based on the correct training sample and the incorrect training sample;
a model pre-training module configured to obtain a pre-trained topic generation model for generating topics of text; and
a reinforcement learning training module configured to optimize the topic generation model through reinforcement learning training based on the reward model,
wherein the training sample acquisition module comprises:
an error topic construction module configured to construct the error topic of the text by modifying or deleting information in the correct topic; and
an error sample generation module configured to generate the erroneous training sample based on the text and the constructed wrong topic of the text,
wherein the error topic construction module comprises:
an extraction module configured to extract at least one knowledge item from the correct topic, the at least one knowledge item describing a relationship between entities that occur in the correct topic; and
a construction module configured to construct the wrong topic of the text by modifying or deleting at least one of the entity and the relationship in the correct topic,
wherein the reward model training module is configured to:
training the reward model based on the correct training sample and the incorrect training sample such that the reward model generates a first predetermined score for the correct training sample and a second predetermined score for the incorrect training sample,
wherein the reinforcement learning training module comprises:
the training sample acquisition module is configured to acquire training texts;
a topic generation module configured to generate a topic of the training text using the topic generation model;
a score generation module configured to generate a score for the topic using the reward model; and
a model optimization module configured to optimize the topic generation model based on the score.
5. The apparatus of claim 4, wherein the model optimization module is configured to:
the topic generation model is optimized based on the score using a proximal policy optimization (PPO) algorithm.
6. An apparatus for generating a text topic, comprising:
an acquisition module configured to acquire an input text; and
a generation module configured to generate a topic of the input text using a topic generation model trained in accordance with the method of claim 1 or 2.
7. A computing device, comprising:
one or more processors; and
a memory for storing one or more programs, which when executed by the one or more processors, cause the computing device to implement the method of any of claims 1-3.
8. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-3.
CN202011367702.5A 2020-11-27 2020-11-27 Method, apparatus, device and computer readable storage medium for generating text theme Active CN112507721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011367702.5A CN112507721B (en) 2020-11-27 2020-11-27 Method, apparatus, device and computer readable storage medium for generating text theme


Publications (2)

Publication Number Publication Date
CN112507721A CN112507721A (en) 2021-03-16
CN112507721B (en) 2023-08-11

Family

ID=74967708


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090440B (en) * 2022-12-29 2024-06-14 中国科学院自动化研究所 Spoken language text conversion method and device based on feedback and electronic equipment
CN116151232B (en) * 2023-04-24 2023-08-29 北京龙智数科科技服务有限公司 Method and device for generating model by multi-stage training text title

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189989A (en) * 2018-07-23 2019-01-11 北京市商汤科技开发有限公司 A kind of video presentation method and device, computer equipment and storage medium
CN110442733A (en) * 2019-08-08 2019-11-12 恒生电子股份有限公司 A kind of subject generating method, device and equipment and medium
WO2019222745A1 (en) * 2018-05-18 2019-11-21 Google Llc Sample-efficient reinforcement learning
CN110688450A (en) * 2019-09-24 2020-01-14 创新工场(广州)人工智能研究有限公司 Keyword generation method based on Monte Carlo tree search, keyword generation model based on reinforcement learning and electronic equipment
CN110737758A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating a model
CN111046156A (en) * 2019-11-29 2020-04-21 支付宝(杭州)信息技术有限公司 Method and device for determining reward data and server


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yuyu Yan et al., "An Interactive Visual Analytics System for Incremental Classification Based on Semi-supervised Topic Modeling," 2019 IEEE Pacific Visualization Symposium (PacificVis), 2019, pp. 1-10. *


Similar Documents

Publication Publication Date Title
Uc-Cetina et al. Survey on reinforcement learning for language processing
CN113807098B (en) Model training method and device, electronic equipment and storage medium
US10592607B2 (en) Iterative alternating neural attention for machine reading
US9373075B2 (en) Applying a genetic algorithm to compositional semantics sentiment analysis to improve performance and accelerate domain adaptation
US20180349355A1 (en) Artificial Intelligence Based Method and Apparatus for Constructing Comment Graph
US11281862B2 (en) Significant correlation framework for command translation
US10789431B2 (en) Method and system of translating a source sentence in a first language into a target sentence in a second language
US20230004721A1 (en) Method for training semantic representation model, device and storage medium
CN107861954B (en) Information output method and device based on artificial intelligence
CN109241286B (en) Method and device for generating text
CN113076739A (en) Method and system for realizing cross-domain Chinese text error correction
CN114861889B (en) Deep learning model training method, target object detection method and device
CN113408272B (en) Training method, device, equipment and storage medium of abstract generation model
CN112507721B (en) Method, apparatus, device and computer readable storage medium for generating text theme
US20160188569A1 (en) Generating a Table of Contents for Unformatted Text
CN112347241A (en) Abstract extraction method, device, equipment and storage medium
CN110738056B (en) Method and device for generating information
CN115186738B (en) Model training method, device and storage medium
Xiong et al. Linguistically Motivated Statistical Machine Translation
CN111625579B (en) Information processing method, device and system
Liu Research on literary translation based on the improved optimization model
RU2812301C2 (en) Method and server for performing context-sensitive translation
CN113537487B (en) Model training method, picture generating method and device
CN113177399B (en) Text processing method, device, electronic equipment and storage medium
CN113807081B (en) Chat text content error correction method and device based on context

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant