CN116542297A - Method and device for training a generative adversarial network based on text data - Google Patents

Method and device for training a generative adversarial network based on text data

Info

Publication number: CN116542297A
Application number: CN202310797490.1A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: text data, generation, target, generator, training
Inventors: 暴宇健, 王芳
Original and current assignee: Shenzhen Xumi Yuntu Space Technology Co Ltd
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310797490.1A
Legal status: Pending (the status listed is an assumption and is not a legal conclusion)


Classifications

    • G06N3/0475: Neural network architectures; generative networks
    • G06F16/335: Information retrieval of unstructured textual data; querying; filtering based on additional data, e.g. user or group profiles
    • G06F16/35: Information retrieval of unstructured textual data; clustering; classification
    • G06F16/90344: Database query processing by using string matching techniques
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/092: Neural network learning methods; reinforcement learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a method and apparatus for training a generative adversarial network based on text data. The method comprises: performing recognition processing on each text data through a first-generation generator to generate a target text data set corresponding to the text data set; inputting each target text data in the target text data set to a first-generation discriminator, which performs discrimination processing on each target text data to obtain a score of whether it contains harmful content; determining, according to these scores, the harmful-content rate of the target text data set; and using the harmful-content rate as the generation reward score of the first-generation generator, performing reinforcement learning training on the first-generation generator through a reinforcement learning algorithm. This technical scheme can improve the generator's ability to recognize harmful content.

Description

Method and device for training a generative adversarial network based on text data
Technical Field
The disclosure relates to the technical field of data processing, and in particular to a method and a device for training a generative adversarial network based on text data.
Background
With the continuous development of artificial intelligence technology, language models are widely applied in the field of natural language processing. However, current language models are poor at identifying harmful content, and the text they generate may include offensive content, private content, content related to pornography and gambling, and so on. Such harmful content can adversely affect people and may even cause social problems and legal disputes. How to avoid or reduce the harmful content generated by language models has therefore become a hot topic of current research.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a method, apparatus, electronic device, and computer-readable storage medium for training a generative adversarial network based on text data, so as to solve the technical problem in the prior art that generated text may include harmful content because language models are poor at identifying it.
In a first aspect of embodiments of the present disclosure, there is provided a method of training a generative adversarial network based on text data, the generative adversarial network comprising a first-generation generator and a first-generation discriminator, the method comprising: acquiring a text data set for training the first-generation generator; inputting each text data in the text data set to the first-generation generator, and performing recognition processing on each text data through the first-generation generator to generate a target text data set corresponding to the text data set; inputting each target text data in the target text data set to the first-generation discriminator, and performing discrimination processing on each target text data through the first-generation discriminator to obtain a score of whether each target text data contains harmful content; determining, according to these scores, the harmful-content rate of the target text data set; using the harmful-content rate of the target text data set as the generation reward score of the first-generation generator, performing reinforcement learning training on the first-generation generator through a reinforcement learning algorithm; and looping the above steps until the first-generation generator converges and training stops, obtaining a target generator.
In a second aspect of embodiments of the present disclosure, there is provided an apparatus for training a generative adversarial network based on text data, the generative adversarial network comprising a first-generation generator and a first-generation discriminator, the apparatus comprising: an acquisition module for acquiring a text data set for training the first-generation generator; a recognition processing module for inputting each text data in the text data set to the first-generation generator and performing recognition processing on each text data through the first-generation generator to generate a target text data set corresponding to the text data set; a discrimination processing module for inputting each target text data in the target text data set to the first-generation discriminator and performing discrimination processing on each target text data through the first-generation discriminator to obtain a score of whether each target text data contains harmful content; a determining module for determining the harmful-content rate of the target text data set according to the score of whether each target text data contains harmful content; a training module for using the harmful-content rate of the target text data set as the generation reward score of the first-generation generator and performing reinforcement learning training on the first-generation generator through a reinforcement learning algorithm; and a loop module for looping the above steps until the first-generation generator converges and training stops, obtaining the target generator.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects. A text data set for training the first-generation generator is acquired, and each text data in the set is input to the first-generation generator, which performs recognition processing on each text data to generate a target text data set corresponding to the text data set. Each target text data in the target text data set is then input to the first-generation discriminator, which performs discrimination processing on each target text data to obtain a score of whether it contains harmful content. The harmful-content rate of the target text data set is determined from these scores and used as the generation reward score of the first-generation generator, which undergoes reinforcement learning training through a reinforcement learning algorithm. These steps are looped until the first-generation generator converges and training stops, yielding the target generator.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments or the description of the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present disclosure; other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of a method for training a generative adversarial network based on text data provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another method for training a generative adversarial network based on text data provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of yet another method for training a generative adversarial network based on text data provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an apparatus for training a generative adversarial network based on text data provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
It should be noted that, the user information (including, but not limited to, terminal device information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
In the related art, large language models are widely applied across natural language processing. By learning from and modeling massive corpora, they can extract the internal rules and semantic information of text, enabling a range of natural language applications. Current usage scenarios include the following:
1. Language generation tasks: large language models can be used to generate text, such as dialogue, summaries, and articles, and have great application potential for dialogue systems, automatic writing, automatic summarization, and similar tasks.
2. Language understanding tasks: large language models can be used for language understanding tasks such as named entity recognition, sentiment analysis, and text classification, and have great application potential for search engines, recommendation systems, and the like.
3. Language enhancement tasks: large language models can be used for text data enhancement, expanding the original data set by generating new text data and thereby improving the generalization and robustness of models.
4. Machine translation: large language models may be used for machine translation; for example, Google's GNMT (Google Neural Machine Translation) model is a machine translation model based on a large language model that translates in a sequence-to-sequence manner.
5. Text generation and typesetting: large language models can be used for text generation and typesetting, such as automatically generating news headlines or performing automatic layout.
However, in practical applications, large language models may also generate potentially harmful content while providing users with high-quality content. First, training data for large language models is derived from the Internet and other text sources, which inevitably contain erroneous, false, or misleading information. Such errors may cause the model to produce similarly erroneous or misleading outputs when generating content, confusing and misleading users. Second, large language models may generate offensive or privacy-violating content. The models may absorb offensive, infringing, or privacy-violating information during learning, and in practice may unconsciously generate content containing similar information, causing ethical and legal disputes. Furthermore, large language models may involve malicious language and offensive topics when generating content. The training data may contain sensitive topics, extreme views, or offensive information, which can lead the model to produce content with negative impact, further expanding the spread of harmful speech.
Common large language models include GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from Transformers), XLNet, and others, which perform excellently on natural language processing tasks and have been widely used.
Taking GPT as an example: it is a large-scale neural language model based on the Transformer architecture. Its basic idea is to pre-train on massive text data and then fine-tune on specific tasks to achieve better performance. GPT maps an input sequence into a vector space, representing points in this space as context-dependent word vectors. During pre-training, GPT uses an autoregressive language model, learning model parameters by maximizing the conditional probability of the next word.
The main process of the GPT model is as follows:
(1) Preprocessing: GPT first performs word segmentation on the original text and converts the text into a sequence of word vectors. In the pre-training phase, GPT adopts a task similar to language modeling, i.e., predicting the next word given the context.
(2) Transformer encoder: GPT uses a multi-layer Transformer encoder to learn context-dependent word vector representations. In each layer, the Transformer encoder uses a self-attention mechanism and a feed-forward neural network to further process the output of the previous layer.
(3) Predicting the next word: at each prediction step, GPT takes the preceding text as input, obtains a representation of the current text through the multi-layer Transformer encoder, and computes the conditional probability of the next word through a softmax layer (see the sketch after this list).
(4) Fine-tuning: when fine-tuning on a specific task, GPT modifies some neural network structures or adds task-specific inputs on top of the pre-trained model according to the characteristics of that task, and then fine-tunes the whole model.
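As a concrete illustration of step (3), the sketch below shows how logits over a vocabulary are turned into the conditional probability of the next word via softmax. This is a toy stand-in written for this description, not GPT's actual code; the five-word vocabulary and logit values are invented for illustration.

```python
import torch
import torch.nn.functional as F

# Toy vocabulary and logits standing in for a Transformer's output at one step.
vocab = ["the", "cat", "sat", "down", "."]
logits = torch.tensor([1.2, 0.3, 2.5, 0.1, -0.4])

# Softmax converts logits into P(next word | preceding text).
probs = F.softmax(logits, dim=-1)
next_id = int(torch.argmax(probs))
print(vocab[next_id], float(probs[next_id]))  # highest-probability next word
```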
GPT performs well in natural language processing tasks such as text generation, text classification, and machine translation, but it may also generate potentially harmful content.
To address the above problems, the present disclosure proposes a method for training a generative adversarial network based on text data. The generator trained by this method resists harmful content generation, ensuring that it can effectively prevent the propagation of harmful information when generating text.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is the medium used to provide communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
A user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, and the third terminal device 103 to receive or transmit text data. The first terminal device 101, the second terminal device 102, and the third terminal device 103 may be various electronic devices with display screens, including but not limited to smartphones, tablets, portable computers, desktop computers, and the like.
The server 105 may be a server providing various services. For example, the server 105 may obtain a text data set for training the first-generation generator from the third terminal device 103 (or the first terminal device 101 or the second terminal device 102), input each text data in the text data set to the first-generation generator, and perform recognition processing on each text data through the first-generation generator to generate a target text data set corresponding to the text data set. It may then input each target text data in the target text data set to the first-generation discriminator, perform discrimination processing on each target text data to obtain a score of whether it contains harmful content, and determine the harmful-content rate of the target text data set from these scores. Using this harmful-content rate as the generation reward score of the first-generation generator, the server performs reinforcement learning training on the first-generation generator through a reinforcement learning algorithm and loops these steps until the first-generation generator converges and training stops, obtaining the target generator. The resulting target generator has a strong ability to recognize harmful content, so data containing harmful content can be avoided more effectively when text is generated by the target generator.
In some embodiments, the method for training a generative adversarial network based on text data provided by embodiments of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for training a generative adversarial network based on text data is generally disposed in the server 105. In other embodiments, some terminal devices may have functionality similar to a server's and may therefore perform the method. Execution of the method is thus not limited to the server side.
Methods and apparatuses for training a generative adversarial network based on text data according to embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 2 is a flow chart of a method for training a generative adversarial network based on text data provided by an embodiment of the present disclosure. The method may be performed by any electronic device having computer processing capabilities, for example the server shown in fig. 1.
As shown in fig. 2, the method of training a generative adversarial network based on text data includes steps S210 to S260.
In step S210, a text data set for training a first-generation generator is acquired.
In step S220, each text data in the text data set is input to the first-generation generator, which performs recognition processing on each text data to generate a target text data set corresponding to the text data set.
In step S230, each target text data in the target text data set is input to the first-generation discriminator, which performs discrimination processing on each target text data to obtain a score of whether each target text data contains harmful content.
In step S240, the harmful-content rate of the target text data set is determined according to the score of whether each target text data contains harmful content.
In step S250, the harmful-content rate of the target text data set is used as the generation reward score of the first-generation generator, and the first-generation generator undergoes reinforcement learning training through a reinforcement learning algorithm.
In step S260, the above steps are looped until the first-generation generator converges; training then stops, yielding the target generator.
By this method, a text data set for training the first-generation generator is acquired, and each text data in it is input to the first-generation generator, which performs recognition processing to generate a corresponding target text data set. Each target text data is then input to the first-generation discriminator, which scores whether it contains harmful content. From these scores the harmful-content rate of the target text data set is determined and used as the generation reward score of the first-generation generator, which undergoes reinforcement learning training through a reinforcement learning algorithm. Looping these steps until the first-generation generator converges yields a target generator with a strong ability to recognize harmful content, so that data containing harmful content is effectively avoided when text is generated by the target generator.
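Steps S210 to S260 can be summarized as the following training loop. This is a schematic sketch only: the generator, discriminator, and update function are dummy stand-ins with hypothetical interfaces (the patent prescribes no concrete API), and the 0.5 score threshold and convergence criterion are assumptions.

```python
import random

HARMFUL_THRESHOLD = 0.5  # assumed cut-off on the discriminator's score

def harm_rate(scores):
    """S240: fraction of target texts judged to contain harmful content."""
    return sum(s > HARMFUL_THRESHOLD for s in scores) / len(scores)

def train_loop(generate, score, update, texts, rounds=50, target_rate=0.05):
    rate = 1.0
    for _ in range(rounds):
        targets = [generate(t) for t in texts]   # S220: generator output
        scores = [score(t) for t in targets]     # S230: discriminator scores
        rate = harm_rate(scores)                 # S240: harmful-content rate
        update(reward=-rate)                     # S250: rate as (negative) reward
        if rate <= target_rate:                  # S260: loop until convergence
            break
    return rate

# Dummy stand-ins so the sketch runs end to end.
state = {"harm": 0.6}
final = train_loop(
    generate=lambda t: t,                                 # identity "generator"
    score=lambda t: random.random() * state["harm"],      # fake discriminator
    update=lambda reward: state.update(harm=state["harm"] * 0.9),
    texts=["sample text"] * 100,
)
print(f"final harm rate: {final:.2f}")
```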
In some embodiments of the present disclosure, the text data set for training the first-generation generator may be built by collecting a large amount of text data related to the target task. For example, the data can be obtained from public data sets on terminal devices or collected by a web crawler. The collected data should cover different fields, styles, and languages as far as possible, in order to capture richer linguistic knowledge. In this embodiment, the collected text data may need preprocessing, including removing invalid characters, punctuation marks, stop words, and the like, and word segmentation to divide the text into words or phrases. These preprocessing operations help improve the performance of subsequent models. To improve the robustness and generalization of the model, data enhancement can be applied to the training set: for example, random deletion, substitution, insertion, and exchange can generate more training samples.
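A minimal sketch of the preprocessing and augmentation described above, assuming whitespace tokenization stands in for proper word segmentation and using an illustrative stop-word list:

```python
import random
import re

STOP_WORDS = {"the", "a", "an", "of", "to"}  # illustrative, not exhaustive

def preprocess(text):
    # Remove punctuation/invalid characters, lower-case, drop stop words,
    # and split into tokens (whitespace split stands in for segmentation).
    text = re.sub(r"[^\w\s]", " ", text).lower()
    return [w for w in text.split() if w not in STOP_WORDS]

def augment(tokens, p=0.1):
    # Two of the augmentations the text mentions: random deletion and swap.
    out = [t for t in tokens if random.random() > p]
    if len(out) > 1 and random.random() < p:
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

tokens = preprocess("The model may generate harmful content!")
print(tokens, augment(tokens))
```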
In some embodiments of the present disclosure, a generative adversarial network for processing text data is constructed. First, the text processing task target to be completed by the adversarial network, such as text generation, is determined. The structures of the generator and the discriminator are then designed according to the characteristics of the task and the text data. The generator can adopt structures such as a recurrent neural network (RNN), a long short-term memory network (LSTM), or a variational autoencoder (VAE); the discriminator can adopt a convolutional neural network (CNN), a perceptron, and the like. Loss functions for the generator and the discriminator are defined: for a text generation task, a cross-entropy loss can serve as the generator's loss, while the discriminator's loss can be an adversarial loss such as JS divergence or the Wasserstein distance. The quality of the text data produced by the generator is improved by training the generator and the discriminator alternately. Specifically, during training, the generator first produces sample data, and the generated data and real data are input to the discriminator for discrimination. The discriminator updates its weights according to the difference between generated and real data, so that it can distinguish them more accurately; meanwhile, the generator updates its weights according to the discriminator's feedback, so that it can generate more realistic text data. In this application, the generator in this embodiment is a language model obtained by an existing training method, i.e., the first-generation generator described above, configured to generate target text data corresponding to input text data. The discriminator in this embodiment is the first-generation discriminator trained by the following method, used to judge whether target text data contains harmful content.
In some embodiments of the present disclosure, before each target text data in the target text data set is input to the first-generation discriminator, the method comprises: acquiring labeled text data for training the discriminator, the labeled text data comprising harmful text data and harmless text data acquired via the first-generation generator; inputting the harmful text data and the harmless text data to the discriminator, which judges whether each contains harmful content, yielding a score for the harmful text data and a score for the harmless text data; and stopping training when the two kinds of scores respectively meet initial preset conditions, obtaining the first-generation discriminator. Training can thus be performed on small-scale sample data to obtain a first-generation discriminator capable of judging whether the target text data generated by the first-generation generator contains harmful content.
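A minimal pre-training sketch for such a discriminator, assuming bag-of-words features and a small feed-forward classifier (the patent does not fix an architecture; PyTorch and the sizes below are this sketch's assumptions):

```python
import torch
import torch.nn as nn

VOCAB = 1000  # assumed bag-of-words vocabulary size

model = nn.Sequential(nn.Linear(VOCAB, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy labeled batch: label 1 = harmful text, 0 = harmless text.
x = torch.rand(8, VOCAB)
y = torch.tensor([1., 0., 1., 0., 0., 1., 0., 1.]).unsqueeze(1)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

scores = torch.sigmoid(model(x))  # per-text "contains harmful content" score
print(scores.squeeze().tolist())
```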
In some embodiments of the present disclosure, the first-generation generator may be an existing language model, and the text data it generates may contain harmful content that is erroneous, false, or misleading, or that is offensive, infringing, privacy-violating, or related to pornography and gambling.
In some embodiments of the present disclosure, after the first-generation generator converges, the method further comprises: acquiring a text data set for training the first-generation discriminator; inputting each text data in that set to the target generator, which performs recognition processing on each text data to generate a corresponding target text data set; inputting each target text data in that target text data set to the first-generation discriminator, which judges whether each target text data contains harmful content, obtaining a score for each; and stopping training when the scores meet a target preset condition, obtaining the target discriminator. After the first-generation generator converges, the first-generation discriminator can be optimized in this looping manner, so that the obtained target discriminator judges target text data more accurately.
In some embodiments of the present disclosure, the harmful-content rate of the target text data set is determined according to the score of whether each target text data contains harmful content. For example, if the target text data set contains 100 pieces of target text data and, according to their scores, 50 of them are determined to contain harmful content, then the harmful-content rate of the target text data set is 50%.
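In code, this worked example reduces to a one-line computation (the 0.5 threshold is an assumption; the patent only states that a score decides whether a text counts as harmful):

```python
scores = [0.9] * 50 + [0.1] * 50        # 100 target texts, 50 scored as harmful
harmful = sum(s > 0.5 for s in scores)  # assumed decision threshold
print(harmful / len(scores))            # 0.5, i.e. a 50% harmful-content rate
```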
In some embodiments of the present disclosure, using the harmful-content rate of the target text data set as the generation reward score of the first-generation generator and performing reinforcement learning training on the first-generation generator through a reinforcement learning algorithm comprises one of the following: performing the reinforcement learning training through a policy gradient algorithm; performing it through an actor-critic algorithm; performing it through an adversarial reinforcement training algorithm; or performing it through a Q-learning algorithm, in each case with the harmful-content rate serving as the generation reward score.
Specifically, when the first-generation generator undergoes reinforcement learning training through a policy gradient algorithm, the reward is generally defined as the cumulative reward the agent obtains when executing a certain policy; in a text generation task, the reward can be defined by the generated text quality or other indicators. For example, the generated text quality may be used as the reward signal so as to maximize it. In this application, text quality refers to whether the text data contains harmful content, and the reward signal is the generation reward score. A policy gradient algorithm maximizes the cumulative reward by learning a policy function; concretely, it updates the policy function parameters by gradient ascent. The reward mechanism thus plays an important role: it guides the agent to learn how to take optimal actions in different states so as to maximize the cumulative reward.
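A minimal single-step REINFORCE sketch of such an update, implementing gradient ascent on the expected reward as gradient descent on -log pi(a|s) * reward. The tiny network sizes, the one-step episode, and the use of the negative harmful-content rate as the reward are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

policy = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

state = torch.rand(1, 4)                  # e.g. an encoding of the current text
dist = Categorical(logits=policy(state))  # distribution over generation actions
action = dist.sample()

reward = -0.3  # e.g. the negative harmful-content rate as the reward signal

loss = (-dist.log_prob(action) * reward).mean()  # policy gradient surrogate
opt.zero_grad()
loss.backward()
opt.step()
```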
When the first-generation generator undergoes reinforcement learning training through an actor-critic algorithm, the reward is generally defined as the real-time or cumulative reward the agent obtains when executing a certain policy; in a text generation task it can again be defined by the generated text quality or other indicators, and in this application the reward signal is the generation reward score reflecting whether text data contains harmful content. The actor-critic algorithm learns a policy function and a value function simultaneously. The policy function guides the agent in taking optimal actions in different states, while the value function evaluates the value of the current state to guide updates of the policy function. The reward mechanism guides the agent to learn how to act optimally so as to maximize real-time or cumulative rewards, and the algorithm updates the policy function and value function parameters to maximize the state-action value function.
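A one-step actor-critic sketch under the same illustrative assumptions as above: the critic estimates the state value, and the advantage (reward minus value) scales the actor's policy gradient.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

actor = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
critic = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
params = list(actor.parameters()) + list(critic.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

state = torch.rand(1, 4)
dist = Categorical(logits=actor(state))
action = dist.sample()
reward = torch.tensor([[-0.3]])           # e.g. negative harmful-content rate

value = critic(state)                     # critic's estimate of state value
advantage = (reward - value).detach()     # how much better than expected

actor_loss = (-dist.log_prob(action) * advantage.squeeze()).mean()
critic_loss = (reward - value).pow(2).mean()

opt.zero_grad()
(actor_loss + critic_loss).backward()
opt.step()
```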
When the first-generation generator undergoes reinforcement learning training through an adversarial reinforcement training algorithm, the reward is again defined as the real-time or cumulative reward obtained when executing a policy, and in this application the reward signal is the generation reward score reflecting whether text data contains harmful content. The algorithm learns a policy function and an adversarial model simultaneously. The policy function instructs the agent in taking optimal actions in different states, and the adversarial model evaluates whether the current state is adversarial so as to guide updates of the policy function. The reward mechanism guides the agent toward actions that maximize real-time or cumulative rewards, and the algorithm updates the policy function and adversarial model parameters to maximize the state-action value function and the robustness of the adversarial model.
When the first-generation generator undergoes reinforcement learning training through a Q-learning algorithm, the reward is the feedback signal of the reinforcement learning algorithm, used to guide the agent's learning. In Q-learning, the reward is generally the real-time reward the agent obtains after performing an action in a state. During training, the Q-learning algorithm updates the state-action value function based on the current state and the real-time reward obtained by executing the action. The reward mechanism guides the agent to learn how to take optimal actions in different states so as to maximize the cumulative reward.
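The tabular form of that update is Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), sketched below. Mapping abstract state/action indices onto text generation decisions is an assumption made for illustration:

```python
import numpy as np

alpha, gamma = 0.1, 0.9     # learning rate and discount factor (assumed)
Q = np.zeros((5, 3))        # toy table: 5 states, 3 actions

def q_update(s, a, r, s2):
    # Standard tabular Q-learning update toward the bootstrapped target.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

q_update(s=0, a=1, r=-0.3, s2=2)  # r could be the negative harmful-content rate
print(Q[0, 1])
```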
With any of the four approaches above, the method provided by this application uses the harmful-content rate of the target text data set as the generation reward score of the first-generation generator and performs reinforcement learning training on the first-generation generator through a reinforcement learning algorithm. A target generator obtained in this way has a stronger ability to recognize harmful content, preventing text data with harmful content from reaching users and thereby improving user experience.
Fig. 3 is a flow chart of another method for training a generative adversarial network based on text data provided by an embodiment of the present disclosure.
As shown in fig. 3, the above method may further include steps S310 to S330.
In step S310, text data to be recognized is acquired.
Step S320, inputting the text data to be recognized into a target generator, and performing recognition processing on the text data to be recognized through the target generator to generate target text data corresponding to the text data to be recognized.
Step S330, filtering the target text data corresponding to the text data to be recognized based on preset filtering rules, and determining whether that target text data contains harmful content, wherein the preset filtering rules comprise rules set based on regular expressions, rules set based on a hard matching mode, and rules set based on a soft matching mode.
By this method, text data to be recognized is input to the target generator, which performs recognition processing to generate corresponding target text data; target text data generated this way has a lower probability of containing harmful content. To further guarantee the quality of the target text data, it can be filtered based on the preset filtering rules to determine whether it contains harmful content, further preventing the generation of target text data containing harmful content.
In some embodiments of the present disclosure, for the filtering rules set based on regular expressions: it is first necessary to determine the type of harmful content, such as pornographic, violent, politically sensitive, or terrorist content. Different types of harmful content differ in language expression, and different regular expressions need to be designed for different types. Text data containing harmful content is collected and classified, and the collected sample data is analyzed to find its characteristics and patterns; text mining, machine learning, and similar techniques may be employed for the analysis. Based on the analysis results, regular expressions are designed to match the harmful content. The regular expressions may rely on common linguistic features, such as specific words, specific characters, or specific parts of speech. The regular expressions are then verified against the sample data to check whether the matching results are correct; if incorrect matches are found, the regular expressions need to be adjusted and re-verified.
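An illustrative regex filter along these lines. The two patterns are placeholders invented for this sketch; a real rule set would contain category-specific pattern lists derived from the analyzed sample data:

```python
import re

HARMFUL_PATTERNS = [
    re.compile(r"\bviolen(ce|t)\b", re.IGNORECASE),
    re.compile(r"\bgambl(e|ing)\b", re.IGNORECASE),
]

def contains_harmful(text):
    # A text is flagged if any harmful-content pattern matches it.
    return any(p.search(text) for p in HARMFUL_PATTERNS)

print(contains_harmful("Online gambling sites ..."))  # True
print(contains_harmful("A harmless sentence."))       # False
```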
In some embodiments of the present disclosure, the setting processes for the hard matching mode and the soft matching mode are similar to that of the regular expressions and are not repeated here. Hard matching is a programming term referring to a direct match between two data sources: the program compares the data directly, and the match succeeds only on an exact match and fails otherwise. Hard matching is typically used for data sets of relatively small size and simple structure, for example when a text string must be matched exactly, and in practice it offers high precision. In contrast, soft matching is typically used for data sets with relatively large amounts of data and relatively complex structure, such as matching in speech recognition or natural language processing tasks. In soft matching, the program may preprocess the data and extract features using algorithms or models, and then match using the corresponding models. The advantage of soft matching is that more complex data sets can be processed.
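A small sketch contrasting the two modes, using exact substring matching for hard matching and a string-similarity ratio for soft matching (the difflib-based similarity and the 0.8 threshold are this sketch's assumptions; a real soft matcher might use embeddings or a model instead):

```python
from difflib import SequenceMatcher

BLOCKLIST = ["forbidden phrase"]  # placeholder blocklist

def hard_match(text):
    # Exact substring match: high precision, misses paraphrases and typos.
    return any(term in text for term in BLOCKLIST)

def soft_match(text, threshold=0.8):
    # Fuzzy match via string similarity; tolerant of small variations.
    return any(
        SequenceMatcher(None, term, text).ratio() >= threshold
        for term in BLOCKLIST
    )

print(hard_match("contains the forbidden phrase here"))  # True
print(soft_match("forbiden phrase"))                     # True despite the typo
```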
Fig. 4 is a flow chart of yet another method for training a generative adversarial network based on text data provided by an embodiment of the present disclosure.
As shown in fig. 4, the above method may further include steps S410 to S440.
In step S410, text data to be recognized is acquired.
In step S420, the text data to be recognized is input to the target generator, and the target generator performs recognition processing on the text data to be recognized to generate target text data corresponding to the text data to be recognized.
In step S430, entity extraction processing and entity relation extraction processing are performed, through an entity recognition model, on the target text data corresponding to the text data to be recognized, so as to obtain the entities and entity relations in that target text data, wherein the entity recognition model includes a bidirectional encoding layer and a conditional random field layer.
In step S440, it is determined whether the target text data corresponding to the text data to be recognized contains harmful content, according to the entities and entity relations in that target text data.
By this method, text data to be recognized is input to the target generator, which performs recognition processing to generate corresponding target text data; target text data generated this way has a lower probability of containing harmful content. To further guarantee the quality of the target text data, entity extraction and entity relation extraction can be performed on it through an entity recognition model, and whether it contains harmful content is determined according to the extracted entities and entity relations, further preventing the generation of target text data containing harmful content.
In some embodiments of the present disclosure, the entity recognition model includes a bidirectional encoding layer (i.e., a BERT layer) and a conditional random field layer (i.e., a CRF layer). In this embodiment, the BERT model performs entity recognition on the target text data to obtain an entity labeling result; common entity recognition tools such as BERT-NER or BERT-CRF may be used. In entity recognition, suitable entity types need to be selected, such as person names, place names, and organization names. The CRF model then extracts entity relations from the entity labeling result to obtain an entity relation labeling result; appropriate feature functions and parameters need to be designed to improve the accuracy and robustness of relation extraction. In this way, entities and entity relations are extracted from the target text data, and whether it contains harmful content can be judged from the associations between them.
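A sketch of the entity-extraction step using the transformers library's NER pipeline. The checkpoint name is a stand-in: the patent's own BERT+CRF model is not published, so an off-the-shelf BERT NER model merely illustrates the interface, and the downstream harmfulness rule is likewise an assumption:

```python
from transformers import pipeline

# Off-the-shelf BERT NER model as a stand-in for the patent's BERT+CRF model.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

text = "John Smith transferred funds to Acme Corp in Berlin."
for ent in ner(text):
    print(ent["entity_group"], ent["word"])  # e.g. PER John Smith, ORG Acme Corp

# A downstream rule (assumed) could then flag texts whose entity/relation
# combinations match known harmful patterns.
```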
In some embodiments of the present disclosure, the method may further include obtaining feedback information made by users on the recommended text data generated by the target generator, the feedback indicating whether the recommended text data contains harmful content, and updating the target generator, the preset filtering rules, and the entity recognition model based on that feedback. For example, the feedback information may mark the recommended text data as harmful or harmless. The harmful and harmless text data are then fed back into the training process, which may continue training the target generator to optimize its parameters and continue training the entity recognition model to optimize its parameters; new filtering rules may also be set for the harmful and harmless text data in the feedback. In this way, the target generator, the filtering rules, and the entity recognition model can be continuously optimized according to the feedback returned from the front end.
The following are device embodiments of the present disclosure, which may be used to perform the method embodiments of the present disclosure. The apparatus for training a generative adversarial network based on text data described below and the method for training a generative adversarial network based on text data described above correspond to each other. For details not disclosed in the device embodiments, please refer to the method embodiments of the present disclosure.
Fig. 5 is a schematic structural diagram of an apparatus for training a generative adversarial network based on text data according to an embodiment of the present disclosure.
As shown in fig. 5, the apparatus 500 for training a generative adversarial network based on text data includes an acquisition module 510, a recognition processing module 520, a discrimination processing module 530, a determining module 540, a training module 550, and a loop module 560.
Specifically, the acquisition module 510 is configured to acquire a text data set for training the first-generation generator.
The recognition processing module 520 is configured to input each text data in the text data set to the first-generation generator, which performs recognition processing on each text data to generate a target text data set corresponding to the text data set.
The discrimination processing module 530 is configured to input each target text data in the target text data set to the first-generation discriminator, which performs discrimination processing on each target text data to obtain a score of whether each target text data contains harmful content.
The determining module 540 is configured to determine the harmful-content rate of the target text data set according to the score of whether each target text data contains harmful content.
The training module 550 is configured to use the harmful-content rate of the target text data set as the generation reward score of the first-generation generator and to perform reinforcement learning training on the first-generation generator through a reinforcement learning algorithm.
The loop module 560 is configured to loop the above steps until the first-generation generator converges and training stops, obtaining the target generator.
The apparatus 500 for training a generative adversarial network based on text data can acquire a text data set for training the first-generation generator and input each text data in it to the first-generation generator, which performs recognition processing to generate a corresponding target text data set. Each target text data is then input to the first-generation discriminator, which scores whether it contains harmful content. From these scores the harmful-content rate of the target text data set is determined and used as the generation reward score of the first-generation generator, which undergoes reinforcement learning training through a reinforcement learning algorithm. Looping these steps until the first-generation generator converges yields a target generator with a strong ability to recognize harmful content, so that data containing harmful content is effectively avoided when text is generated by the target generator.
In some embodiments of the present disclosure, the training module 550 is further configured such that using the harmful-content rate of the target text data set as the generation reward score of the first-generation generator and performing reinforcement learning training on the first-generation generator through a reinforcement learning algorithm comprises one of the following: performing the reinforcement learning training through a policy gradient algorithm; through an actor-critic algorithm; through an adversarial reinforcement training algorithm; or through a Q-learning algorithm, in each case with the harmful-content rate serving as the generation reward score.
In some embodiments of the present disclosure, before each target text data in the target text data set is input to the first-generation discriminator, the apparatus 500 is further configured to: acquire labeled text data for training the discriminator, the labeled text data comprising harmful text data and harmless text data acquired via the first-generation generator; input the harmful text data and the harmless text data to the discriminator, which judges whether each contains harmful content, yielding a score for the harmful text data and a score for the harmless text data; and stop training when the two kinds of scores respectively meet initial preset conditions, obtaining the first-generation discriminator.
In some embodiments of the present disclosure, after the first generation generator converges, the apparatus 500 for generating a countermeasure network based on text data training is further configured to: acquire a text data set for training the first generation discriminator; input each text data in that text data set to the target generator, and perform recognition processing on it through the target generator to generate a corresponding target text data set; input each target text data in that corresponding target text data set to the first generation discriminator, and judge, through the first generation discriminator, whether each target text data contains harmful content, obtaining a corresponding score for each; and stop training when each such score meets the target preset condition, thereby obtaining the target discriminator.
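One way to read the stopping condition above is as a threshold check over the discriminator's per-sample scores; the sketch below is an assumption about that check, not the disclosed criterion, and the threshold value is illustrative:

```python
# Sketch of the "target preset condition" check: stop refining the
# discriminator once every generated sample's harmfulness score falls
# below a threshold. The threshold value is an assumption.
from typing import List

def meets_target_condition(scores: List[float], threshold: float = 0.05) -> bool:
    """scores: per-sample harmfulness scores in [0, 1]."""
    return all(s < threshold for s in scores)
```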
In some embodiments of the present disclosure, the apparatus 500 for generating a countermeasure network based on text data training is further configured to: acquire text data to be identified; input the text data to be identified into the target generator, and perform recognition processing on it through the target generator to generate target text data corresponding to the text data to be identified; and filter the corresponding target text data based on preset filtering rules to determine whether it contains harmful content, wherein the preset filtering rules comprise filtering rules set based on regular expressions, filtering rules set based on a hard matching mode, and filtering rules set based on a soft matching mode.
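A minimal sketch of the three rule types, using only the Python standard library; the example patterns, blocklist entries, and the 0.8 similarity cutoff are assumptions for illustration, not rules from the disclosure:

```python
# Sketch of the three preset filtering rule types described above.
import re
from difflib import SequenceMatcher

REGEX_RULES = [re.compile(r"\bexample-banned-pattern\b", re.IGNORECASE)]
HARD_BLOCKLIST = {"example banned phrase"}
SOFT_BLOCKLIST = ["example banned phrase"]

def is_harmful(text: str, soft_cutoff: float = 0.8) -> bool:
    # Regular-expression rules.
    if any(rule.search(text) for rule in REGEX_RULES):
        return True
    # Hard matching: exact substring containment.
    lowered = text.lower()
    if any(phrase in lowered for phrase in HARD_BLOCKLIST):
        return True
    # Soft matching: fuzzy similarity against known harmful phrases.
    return any(
        SequenceMatcher(None, lowered, phrase).ratio() >= soft_cutoff
        for phrase in SOFT_BLOCKLIST
    )
```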
In some embodiments of the present disclosure, the apparatus 500 for generating a countermeasure network based on text data training is further configured to: acquire text data to be identified; input the text data to be identified into the target generator, and perform recognition processing on it through the target generator to generate target text data corresponding to the text data to be identified; perform entity extraction processing and entity relation extraction processing on the corresponding target text data through an entity recognition model to obtain the entities and entity relations in that target text data, wherein the entity recognition model comprises a bidirectional encoding layer and a conditional random field layer; and determine whether the corresponding target text data contains harmful content according to those entities and entity relations.
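As a rough sketch of such a model's shape (a bidirectional encoder feeding a tag decoder), the following uses a BiLSTM with per-token argmax decoding; a faithful implementation would replace the argmax with a conditional random field layer, which is omitted here for brevity, and the dimensions and tag set are assumptions:

```python
# Sketch of an entity recognition model with a bidirectional encoding
# layer. The CRF layer of the disclosure is simplified to per-token
# argmax decoding here; dimensions and tag set are assumptions.
import torch
import torch.nn as nn

class EntityTagger(nn.Module):
    def __init__(self, vocab_size: int, num_tags: int,
                 embed_dim: int = 128, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional encoding layer.
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.encoder(self.embed(token_ids))
        scores = self.emissions(hidden)          # (batch, seq, num_tags)
        # A CRF layer would decode jointly over the sequence here;
        # greedy argmax is a simplification for this sketch.
        return scores.argmax(dim=-1)             # predicted tag ids
```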
In some embodiments of the present disclosure, the apparatus 500 for generating a countermeasure network based on text data training is further configured to: acquire feedback information made by a user for recommended text data generated by the target generator, wherein the feedback information comprises whether the recommended text data contains harmful content; and update the target generator, the preset filtering rules, and the entity recognition model based on that feedback information.
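Purely as an illustration of how such feedback might be routed to the three update targets (the disclosure does not specify a mechanism), the record fields and update hooks below are assumptions:

```python
# Sketch of routing user feedback to the three update targets named
# above. The record fields and update hooks are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Feedback:
    recommended_text: str
    is_harmful: bool   # the user's judgment

def apply_feedback(batch: List[Feedback],
                   fine_tune_generator: Callable[[List[str]], None],
                   extend_filter_rules: Callable[[List[str]], None],
                   retrain_entity_model: Callable[[List[Feedback]], None]) -> None:
    harmful = [f.recommended_text for f in batch if f.is_harmful]
    if harmful:
        fine_tune_generator(harmful)    # penalize these generations
        extend_filter_rules(harmful)    # grow the preset filtering rules
    retrain_entity_model(batch)         # refresh the recognition model
```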
Fig. 6 is a schematic diagram of an electronic device 6 provided by an embodiment of the present disclosure. As shown in fig. 6, the electronic device 6 of this embodiment includes: a processor 601, a memory 602 and a computer program 603 stored in the memory 602 and executable on the processor 601. The steps of the various method embodiments described above are implemented by the processor 601 when executing the computer program 603. Alternatively, the processor 601 may implement the functions of the modules in the above-described device embodiments when executing the computer program 603.
The electronic device 6 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 6 may include, but is not limited to, a processor 601 and a memory 602. It will be appreciated by those skilled in the art that Fig. 6 is merely an example of the electronic device 6 and is not limiting of it; the electronic device 6 may include more or fewer components than shown, or different components.
The processor 601 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 602 may be an internal storage unit of the electronic device 6, for example, a hard disk or a memory of the electronic device 6. The memory 602 may also be an external storage device of the electronic device 6, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 6. The memory 602 may also include both an internal storage unit and an external storage device of the electronic device 6. The memory 602 is used to store computer programs and other programs and data required by the electronic device 6.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the above functions may be distributed among different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The integrated modules, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely illustrative of the technical solution of the present disclosure and are not limiting thereof. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure and are intended to be included in the scope of the present disclosure.

Claims (10)

1. A method of training a generation countermeasure network based on text data, the generation countermeasure network comprising a first generation generator and a first generation discriminator, the method comprising:
acquiring a text data set for training the first generation generator;
inputting each text data in the text data set to the first generation generator, and identifying each text data through the first generation generator to generate a target text data set corresponding to the text data set;
inputting each target text data in the target text data set into the first generation discriminator, and discriminating each target text data through the first generation discriminator to obtain a score of whether each target text data contains harmful content or not;
determining the harmful rate of harmful content contained in the target text data set according to the score of whether each target text data contains harmful content;
taking the harmful rate of harmful content contained in the target text data set as the generation reward score of the first generation generator, and performing reinforcement learning training on the first generation generator through a reinforcement learning algorithm;
and cycling through the above steps until the first generation generator converges and training stops, to obtain the target generator.
2. The method of claim 1, wherein before inputting each target text data in the target text data set into the first generation discriminator, the method comprises:
acquiring labeled text data for training a discriminator, wherein the labeled text data comprises harmful text data and harmless text data acquired by the first generation generator;
inputting the harmful text data and the harmless text data into the discriminator, and respectively judging, through the discriminator, whether the harmful text data and the harmless text data contain harmful content, to obtain a score of whether the harmful text data contains harmful content and a score of whether the harmless text data contains harmful content;
and stopping training when the score of whether the harmful text data contains harmful content and the score of whether the harmless text data contains harmful content respectively meet the initial preset conditions, to obtain the first generation discriminator.
3. The method of claim 2, wherein after the first generation generator converges, the method further comprises:
acquiring a text data set for training the first generation discriminator;
inputting each text data in the text data set for training the first generation discriminator to the target generator, and performing recognition processing on each such text data through the target generator to generate a target text data set corresponding to the text data set for training the first generation discriminator;
inputting each target text data in the target text data set corresponding to the text data set for training the first generation discriminator into the first generation discriminator, and judging, through the first generation discriminator, whether each such target text data contains harmful content, to obtain a score of whether each such target text data contains harmful content;
and stopping training when the score of whether each target text data in the target text data set corresponding to the text data set for training the first generation discriminator contains harmful content meets the target preset condition, to obtain the target discriminator.
4. The method according to claim 1, wherein the method further comprises:
acquiring text data to be identified;
inputting the text data to be identified into the target generator, and identifying the text data to be identified through the target generator to generate target text data corresponding to the text data to be identified;
and filtering the target text data corresponding to the text data to be identified based on a preset filtering rule, and determining whether the target text data corresponding to the text data to be identified contains harmful content, wherein the preset filtering rule comprises a filtering rule set based on a regular expression, a filtering rule set based on a hard matching mode and a filtering rule set based on a soft matching mode.
5. The method according to claim 4, wherein the method further comprises:
acquiring text data to be identified;
inputting the text data to be identified into the target generator, and identifying the text data to be identified through the target generator to generate target text data corresponding to the text data to be identified;
performing entity extraction processing and entity relation extraction processing on the target text data corresponding to the text data to be identified through an entity recognition model to obtain the entities and entity relations in the target text data corresponding to the text data to be identified, wherein the entity recognition model comprises a bidirectional encoding layer and a conditional random field layer;
and determining whether the target text data corresponding to the text data to be identified contains harmful content according to the entities and entity relations in the target text data corresponding to the text data to be identified.
6. The method of claim 5, wherein the method further comprises:
acquiring feedback information made by a user for the recommended text data generated by the target generator, wherein the feedback information comprises whether the recommended text data contains harmful content or not;
updating the target generator, updating the preset filtering rule and updating the entity recognition model based on feedback information made by a user for the recommended text data generated by the target generator.
7. The method of claim 1, wherein taking the harmful rate of harmful content contained in the target text data set as the generation reward score of the first generation generator, and performing reinforcement learning training on the first generation generator through a reinforcement learning algorithm, comprises one of the following:
taking the harmful rate of harmful content contained in the target text data set as the generation reward score of the first generation generator, and performing reinforcement learning training on the first generation generator through a policy gradient algorithm;
taking the harmful rate of harmful content contained in the target text data set as the generation reward score of the first generation generator, and performing reinforcement learning training on the first generation generator through an actor-critic algorithm;
taking the harmful rate of harmful content contained in the target text data set as the generation reward score of the first generation generator, and performing reinforcement learning training on the first generation generator through an adversarial reinforcement training algorithm;
and taking the harmful rate of harmful content contained in the target text data set as the generation reward score of the first generation generator, and performing reinforcement learning training on the first generation generator through a Q-learning algorithm.
8. An apparatus for training a generation countermeasure network based on text data, the generation countermeasure network comprising a first generation generator and a first generation discriminator, the apparatus comprising:
an acquisition module used for acquiring a text data set for training the first generation generator;
a recognition processing module used for inputting each text data in the text data set to the first generation generator, and performing recognition processing on each text data through the first generation generator to generate a target text data set corresponding to the text data set;
a discrimination processing module used for inputting each target text data in the target text data set into the first generation discriminator, and performing discrimination processing on each target text data through the first generation discriminator to obtain a score of whether each target text data contains harmful content;
a determining module used for determining the harmful rate of harmful content contained in the target text data set according to the score of whether each target text data contains harmful content;
a training module used for taking the harmful rate of harmful content contained in the target text data set as the generation reward score of the first generation generator, and performing reinforcement learning training on the first generation generator through a reinforcement learning algorithm;
and a cycling module used for cycling through the above steps until the first generation generator converges and training stops, to obtain the target generator.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202310797490.1A 2023-07-03 2023-07-03 Method and device for generating countermeasure network based on text data training Pending CN116542297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310797490.1A CN116542297A (en) 2023-07-03 2023-07-03 Method and device for generating countermeasure network based on text data training

Publications (1)

Publication Number Publication Date
CN116542297A true CN116542297A (en) 2023-08-04

Family

ID=87447472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310797490.1A Pending CN116542297A (en) 2023-07-03 2023-07-03 Method and device for generating countermeasure network based on text data training

Country Status (1)

Country Link
CN (1) CN116542297A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896985A (en) * 2022-05-25 2022-08-12 华中农业大学 Humorous text automatic generation method, system, medium, equipment and terminal
CN114969269A (en) * 2022-06-23 2022-08-30 济南大学 False news detection method and system based on entity identification and relation extraction
CN115129866A (en) * 2022-05-17 2022-09-30 网易(杭州)网络有限公司 Training text generation method, model training device and electronic equipment
CN115544245A (en) * 2022-04-14 2022-12-30 东南大学 Unsupervised learning-based user aggressive comment style conversion method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116822510A (en) * 2023-08-28 2023-09-29 深圳市领存技术有限公司 Text processing method, system, electronic equipment and storage medium
CN116822510B (en) * 2023-08-28 2023-12-19 深圳市领存技术有限公司 Text processing method, system, electronic equipment and storage medium
CN116886446A (en) * 2023-09-06 2023-10-13 北京安天网络安全技术有限公司 Automatic attack detection method, electronic equipment and storage medium
CN116886446B (en) * 2023-09-06 2023-11-24 北京安天网络安全技术有限公司 Automatic attack detection method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230804