WO2020239061A1 - Text-based event detection method, apparatus, computer device and storage medium - Google Patents

Text-based event detection method, apparatus, computer device and storage medium

Info

Publication number
WO2020239061A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
event
instance
probability
discriminator
Prior art date
Application number
PCT/CN2020/093189
Other languages
English (en)
French (fr)
Other versions
WO2020239061A9 (zh)
Inventor
王晓智
刘知远
韩旭
孙茂松
李鹏
周杰
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2020239061A1 publication Critical patent/WO2020239061A1/zh
Publication of WO2020239061A9 publication Critical patent/WO2020239061A9/zh
Priority to US17/367,130 priority Critical patent/US20210334665A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks

Definitions

  • The embodiments of the present application relate to the field of artificial intelligence technology, and in particular to the field of natural language processing technology, and more specifically to a text-based event detection method, device, computer equipment, and storage medium.
  • Text-based event detection is an important subtask of event extraction, which is of great significance for various downstream natural language processing applications, such as question answering, information retrieval, and reading comprehension.
  • text-based event detection can be implemented by convolutional neural networks.
  • the training data is obtained in advance through manual annotation.
  • the training data also includes the trigger word manually marked in the text and the event corresponding to the trigger word;
  • The convolutional neural network is trained on the training data through machine learning, and unlabeled text is processed through the trained convolutional neural network to determine the trigger words in the unlabeled text, so as to determine the event corresponding to the unlabeled text through the trigger words.
  • The embodiments of the present application provide a text-based event detection method, device, computer equipment, and storage medium.
  • the technical solutions are as follows:
  • a text-based event detection method executed by a computer device, the method includes:
  • the event instances include text and events corresponding to the text;
  • the first data set contains standard event instances, and the second data set contains non-standard event instances;
  • The adversarial network includes a generator and a discriminator; the generator is used to select event instances from the second data set to input to the discriminator; the discriminator is used to output the first credible probability of the event instances in the first data set, and to output the second credible probability of the event instances input by the generator; the loss function of the adversarial network is used to adjust the parameters of the adversarial network to maximize the first credible probability and minimize the second credible probability;
  • the standard event instances in the second data set are obtained through the trained adversarial network.
  • a text-based event detection method executed by a computer device, the method includes:
  • The text to be detected is processed by an adversarial network, which is obtained by training on a first data set and a second data set; the first data set contains standard event instances, and the second data set contains non-standard event instances;
  • the adversarial network includes a generator and a discriminator; the generator is used to select event instances from the second data set for input to the discriminator; the discriminator is used to output the first credible probability of the event instances in the first data set, and to output the second credible probability of the event instances input by the generator; the loss function of the adversarial network is used to adjust the parameters of the adversarial network to maximize the first credible probability and minimize the second credible probability;
  • the event corresponding to the text to be detected is obtained.
  • a text-based event detection device is set in computer equipment, and the device includes:
  • the data set acquisition module is used to acquire a first data set and a second data set that respectively contain event instances.
  • The event instances include text and events corresponding to the text; the first data set contains standard event instances, and the second data set contains non-standard event instances;
  • the adversarial training module is used to train an adversarial network through the first data set and the second data set.
  • The adversarial network includes a generator and a discriminator; the generator is used to select event instances from the second data set to input to the discriminator; the discriminator is used to output the first credible probability of the event instances in the first data set, and to output the second credible probability of the event instances input by the generator; the loss function of the adversarial network is used to adjust the parameters of the adversarial network to maximize the first credible probability and minimize the second credible probability;
  • the instance acquisition module is used to acquire standard event instances in the second data set through the trained adversarial network.
  • A computer device that includes one or more processors and one or more memories, where at least one computer-readable instruction is stored in the one or more memories; the at least one computer-readable instruction is loaded and executed by the one or more processors to implement the above text-based event detection method.
  • One or more computer-readable storage media, in which at least one computer-readable instruction is stored; the at least one computer-readable instruction is loaded and executed by one or more processors to implement the above text-based event detection method.
  • Fig. 1 is a schematic diagram showing a flow of text-based event detection according to an exemplary embodiment
  • FIG. 2 is a framework diagram of adversarial network training and application involved in the embodiment shown in FIG. 1;
  • Fig. 3 is a flow chart showing a method for text-based event detection according to an exemplary embodiment
  • FIG. 4 is an overall framework diagram of the adversarial strategy involved in the embodiment shown in FIG. 3;
  • FIG. 5 is a framework diagram of adversarial network training and application involved in the embodiment shown in FIG. 3;
  • FIGS. 6 and 7 are schematic comparison diagrams of two precision-recall curves involved in the embodiment shown in FIG. 3;
  • Fig. 8 is a block diagram showing the structure of a text-based event detection device according to an exemplary embodiment
  • Fig. 9 is a schematic structural diagram showing a computer device according to an exemplary embodiment.
  • This application proposes a text-based event detection solution, which can quickly and accurately obtain credible event instances from automatically labeled event instances by means of adversarial training, thereby achieving efficient and high-accuracy event detection.
  • The word in a given text that can represent the event corresponding to the given text is called the trigger word of the given text.
  • For example, if the given text is "Mark Twain and Olivia Langdon married in 1870.",
  • the event corresponding to the given text is a marriage event, and the trigger word in the given text is "married".
  • Event detection refers to detecting event trigger words from a given text and then identifying the specific event type.
  • For example, the trigger word "married" can be extracted from the given text "Mark Twain and Olivia Langdon married in 1870", and it is further determined that the event corresponding to the given text is a marriage event.
  • Fig. 1 is a schematic diagram showing a flow of text-based event detection according to an exemplary embodiment.
  • the text-based event detection process can be executed by a computer device, which can be a device with certain computing capabilities such as a personal computer, a server, or a workstation.
  • The developer sets up an adversarial network in the computer device in advance, and the adversarial network includes a generator and a discriminator.
  • the computer equipment performs the following steps:
  • S11: Acquire a first data set and a second data set that respectively contain event instances; an event instance includes a text and an event corresponding to the text.
  • the above-mentioned first data set contains standard event instances.
  • a standard event instance refers to an event instance that is accurately labeled by default without errors or noise.
  • the aforementioned second data set contains non-standard event instances.
  • A non-standard event instance is an event instance whose labeling is not guaranteed to be accurate, that is, an event instance that may be inaccurately labeled, incorrectly labeled, or noisy.
  • The second data set is not limited to containing only non-standard event instances and may also include standard event instances; it is through the method in the embodiments of the present application that the standard event instances in the second data set are detected. It can be understood that the second data set may also contain only non-standard event instances, in which case the final detection result is that no standard event instance is detected from the second data set.
  • The first data set may include accurately labeled event instances (that is, the event instances in the first data set are credible), and the second data set includes both accurately labeled event instances and inaccurately labeled event instances (that is, the event instances in the second data set are non-standard); the inaccurately labeled event instances included in the second data set are also referred to as noise data in the second data set.
  • S12: Train an adversarial network through the first data set and the second data set.
  • The generator is used to select event instances from the second data set to be input to the discriminator;
  • the discriminator is used to output the first credible probability of the event instances in the first data set, and
  • to output the second credible probability for the event instances input by the generator;
  • the loss function of the adversarial network is used to adjust the parameters of the adversarial network to maximize the first credible probability and minimize the second credible probability.
  • The credible event instances obtained from the second data set through the adversarial network can be added to the first data set, so as to realize automatic expansion of the first data set.
  • Without the aforementioned adversarial network training, it cannot be directly determined which event instances in the second data set are accurately labeled and which are inaccurately labeled, and the event instances in the second data set would have to be treated as credible by default.
  • The training principle of the adversarial network is to train the generator and the discriminator through multiple rounds of iterative training on standard event instances and non-standard event instances, continuously adjusting the parameters of the generator and the discriminator according to the output results of the generator and the discriminator in each round and the preset loss function, so that the discriminator can finally determine more accurately which event instances in the second data set are accurately labeled and which are not.
  • The trained adversarial network can be used to select accurately labeled event instances from the second data set; the events corresponding to these selected, accurately labeled event instances are the events detected based on the texts in the event instances.
  • The solution shown in the embodiments of the present application trains the generator and the discriminator in the adversarial network through the first data set containing standard event instances and the second data set containing non-standard event instances, so that the trained discriminator can accurately determine whether the event instances in the second data set are credible.
  • Compared with manual annotation, this solution does not require a large amount of manual annotation, which saves data preparation time and improves the efficiency of text-based event detection.
  • This solution uses an adversarial-network approach for event detection, which can accurately eliminate noise data in the second data set and improve the accuracy of event detection.
  • According to an input event instance, the generator can output a confusion score for that event instance with respect to the discriminator (also called the confusion probability of the event instance).
  • The confusion probability is used to indicate the probability that the discriminator incorrectly judges whether the corresponding event instance is credible.
  • In other words, for an event instance in the second data set, which is non-standard, the confusion probability output by the generator refers to the probability that the discriminator judges the event instance to be accurately labeled.
  • The purpose of the generator is to recommend, from the second data set, the event instances that are most likely to confuse the discriminator; the discriminator by default regards the event instances in the first data set as accurately labeled and the event instances recommended by the generator as inaccurately labeled, and discriminates the confusing event instances among them.
  • the event instance recommended to the discriminator for discrimination can be determined according to the confusion probability output by the generator for each event instance.
  • The parameters of the above-mentioned adversarial network can be adjusted through the loss function and the respective output results of the generator and the discriminator.
  • In this way, both the generator and the discriminator can be optimized; that is to say, as the adversarial training progresses, the generator can select confusing event instances from the second data set more and more accurately, and the discriminator can judge more and more accurately whether an input event instance is accurately labeled.
  • The above-mentioned adversarial training process can be as shown in Figure 2.
  • FIG. 2 is a framework diagram of the adversarial network training and application involved in the embodiment shown in FIG. 1 above.
  • An adversarial network is preset.
  • The adversarial network contains a generator and a discriminator.
  • two data sets are set, namely the first data set and the second data set.
  • The first data set contains event instances whose texts and corresponding events are accurately labeled by default, and the second data set contains event instances that are not accurately labeled by default.
  • the number of event instances in the first data set may be less than the number of event instances included in the second data set.
  • The computer device inputs each event instance in the second data set to the generator (corresponding to step S21 in Figure 2), and the generator outputs the confusion probability for each input event instance (corresponding to step S22 in Figure 2); the recommended event instances in the second data set are then determined according to the confusion probabilities (corresponding to step S23 in Figure 2), and the recommended event instances are input to the discriminator (corresponding to step S24 in Figure 2). In addition, the computer device also inputs each event instance in the first data set to the discriminator (corresponding to step S25 in Figure 2). The discriminator respectively outputs the credible probabilities of the recommended event instances and of the event instances in the first data set (corresponding to step S26 in Figure 2). The computer device inputs the confusion probabilities output by the generator and the credible probabilities output by the discriminator into the loss function (corresponding to step S27 in Figure 2), and optimizes the parameters of the adversarial network through the loss value output by the loss function (corresponding to step S28 in Figure 2).
  • the above steps can be repeated until the output result of the discriminator converges (for example, the output result of the discriminator no longer changes significantly).
  • At this point, the training of the adversarial network is completed, and the trained adversarial network can be used to screen credible event instances from the second data set.
  • The events corresponding to the texts contained in the screened credible event instances are the events detected based on the texts. It can be understood that the recommended event instances are the event instances selected for input to the discriminator.
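  • For illustration only, the following is a minimal Python sketch of one training round of the loop in Figure 2. The function names (generator_confusion, discriminator_confidence), the recommendation size k, and the random scores standing in for real network outputs are assumptions of the sketch, not the patent's implementation; the loss mirrors the structure of the adversarial objective formalized later as equation (1).

      import math
      import random

      # Toy stand-ins for the network outputs: in the patent these come from
      # the generator and the discriminator; random numbers are used here only
      # so the control flow of steps S21-S28 can be shown end to end.
      def generator_confusion(instance):        # step S22: confusion probability
          return random.random()

      def discriminator_confidence(instance):   # step S26: credible probability
          return random.uniform(0.01, 0.99)

      def train_round(first_set, second_set, k=4):
          # S21-S23: score every instance in the second data set and recommend
          # the k most confusing ones to the discriminator.
          scores = {x: generator_confusion(x) for x in second_set}
          recommended = sorted(second_set, key=scores.get, reverse=True)[:k]
          # S24-S26: credible probabilities for the recommended instances and
          # for the instances of the first data set.
          p_first = [discriminator_confidence(x) for x in first_set]
          p_rec = [discriminator_confidence(x) for x in recommended]
          # S27: the loss rises when first-set instances get low credible
          # probabilities or recommended instances get high ones.
          loss = -(sum(math.log(p) for p in p_first) / len(p_first)
                   + sum(math.log(1 - p) for p in p_rec) / len(p_rec))
          return loss, recommended   # S28 would use the loss to update parameters

      loss, rec = train_round(["r1", "r2"], ["u1", "u2", "u3", "u4", "u5"])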
  • the first data set and the second data set can be quickly and automatically labeled on a large scale through preset rules, and can also be quickly and automatically labeled on a large scale through a weak supervision method.
  • Fig. 3 is a flowchart of a text-based event detection method according to an exemplary embodiment.
  • The text-based event detection method can be used in a computer device to perform the training of, and event detection with, the adversarial network shown in FIG. 2 above.
  • the text-based event detection method may include the following steps:
  • Step 301 Obtain a first data set and a second data set containing event instances respectively.
  • the event instance includes text and an event corresponding to the text; the first data set includes standard event instances, and the second data set includes non-standard event instances.
  • the event detection scheme shown in this application can be used in weakly-supervised learning application scenarios such as semi-supervised scenarios or remote-supervised scenarios.
  • In some embodiments, the computer device can first obtain the first data set, and then obtain event labeling rules according to the first data set, the event labeling rules including the correspondence between the events of standard instances and the trigger words in the texts of the standard instances,
  • where a standard instance is an event instance in the first data set; each text outside the first data set is then labeled according to the event labeling rules to obtain a candidate data set;
  • the discriminator is pre-trained to obtain the pre-trained discriminator, and each event instance in the candidate data set is processed by the pre-trained discriminator to obtain the credible probability of each event instance in the candidate data set;
  • according to the credible probability of each event instance in the candidate data set, the second data set is extracted from the candidate data set.
  • When the computer device obtains the first data set, it can obtain a manually labeled first data set.
  • When adapting the adversarial training strategy to a semi-supervised scenario, the discriminator can be pre-trained with small-scale labeled data (i.e., the first data set) so that it can, to a certain extent, detect event trigger words in text and identify the event type. Then, a potential instance discovery strategy is adopted: a large-scale candidate set is constructed by using the trigger words in the small-scale annotated data as heuristic seeds (corresponding to the above-mentioned event labeling rules). The pre-trained discriminator is then used to automatically discriminate the trigger words and event types of all instances in the candidate set, so as to construct a large-scale noisy data set.
  • the small-scale labeled data is regarded as the credible set R (that is, the first data set), and the large-scale automatically labeled data is regarded as the untrusted set U (that is, the second data set).
  • The above-mentioned potential instance discovery strategy based on trigger words is a simple strategy proposed by the embodiments of this application in order to exploit unlabeled data; it can automatically label the trigger words and event types of raw data.
  • The above-mentioned trigger-word-based strategy rests on a heuristic hypothesis: if a given word acts as an event trigger word in a known instance, then all other instances in the unlabeled data that mention the word are potential instances expressing the same event. For example, the word "married" in "Mark Twain and Olivia Langdon married in 1870" acts as a trigger word indicating the event "marriage". Based on this, all texts in the unlabeled data that contain the word "married" can be added, together with the event "marriage", to the set of potential instance candidates.
  • the potential instance discovery strategy based on trigger words involved in the embodiments of the present application is relatively concise, without considering the correlation between words, trigger words, and event types. Moreover, since the above-mentioned potential instance discovery strategy has fewer restrictions, it is possible to obtain large-scale candidate sets efficiently without relying on special manual design. At the same time, the candidate set can cover more examples and topics.
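  • As an illustration only, the following minimal Python sketch implements the heuristic above; the seed dictionary, the tokenization, and the data structures are assumptions made for the example, not the patent's implementation.

      def build_candidate_set(seed_triggers, unlabeled_texts):
          """seed_triggers maps a trigger word to its event type (e.g. built
          from the first data set); every unlabeled text mentioning a known
          trigger word becomes a potential instance of that event."""
          candidates = []
          for text in unlabeled_texts:
              tokens = {t.strip(".,").lower() for t in text.split()}
              for trigger, event in seed_triggers.items():
                  if trigger in tokens:
                      candidates.append({"text": text, "trigger": trigger, "event": event})
          return candidates

      seeds = {"married": "marriage"}
      texts = ["Mark Twain and Olivia Langdon married in 1870.",
               "They married last spring."]
      # -> two candidate instances, both labeled with the event "marriage"
      print(build_candidate_set(seeds, texts))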
  • When the computer device obtains the first data set and the second data set containing event instances, it can label each text according to preset event labeling rules to obtain an initial data set, the event labeling rules including the correspondence between events and trigger words; pre-train the discriminator through the initial data set; process each event instance in the initial data set through the pre-trained discriminator to obtain the credible probability of each event instance in the initial data set; and, according to the credible probabilities of the event instances in the initial data set, obtain the first data set and the second data set from the initial data set.
  • When obtaining the first data set and the second data set from the initial data set according to the respective credible probabilities of the event instances in the initial data set, the computer device may add the event instances in the initial data set whose credible probability is higher than a first probability threshold to the first data set, and add the event instances in the initial data set whose credible probability is not higher than the first probability threshold to the second data set.
  • the adaptation strategy for remotely supervised scenes is similar to the adaptation strategy for semi-supervised scenes.
  • All of the automatically labeled data (not all of which is accurate) can first be used to pre-train the discriminator.
  • Then, the discriminator is used to calculate the credibility score (i.e., the credible probability) of all the event instances in the automatically labeled data.
  • the entire set of automatically labeled data can be divided into two parts. Event instances with scores higher than the threshold will be added to the credible set R (ie, the first data set), and other event instances with lower scores will be added to the untrusted set U (ie, the second data set).
  • the credible set R can be used as a seed to obtain more labeled data with a potential instance discovery strategy based on trigger words in the above semi-supervised scenario.
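  • A minimal sketch of the split described above, assuming `score` is the pre-trained discriminator's credible probability for an instance and the threshold value is illustrative:

      def split_by_credibility(instances, score, threshold=0.8):
          # instances scoring above the threshold form the credible set R
          # (the first data set); the rest form the untrusted set U (the
          # second data set).
          R = [x for x in instances if score(x) > threshold]
          U = [x for x in instances if score(x) <= threshold]
          return R, U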
  • FIG. 4 is an overall framework diagram of the adversarial strategy involved in an embodiment of the present application.
  • The overall framework of the adversarial strategy provided by the embodiments of the present application includes a discriminator and a generator.
  • the discriminator is used to detect event trigger words and identify the event type of each instance in the data set. When noise data is given, the discriminator should resist noise and clearly point out that there are no trigger words and events.
  • the generator is used to select examples from the untrusted data set U (ie, the above-mentioned second data set) to confuse the discriminator as much as possible.
  • each event instance x ⁇ R in the first data set clearly indicates its tagged trigger word t and event type e.
  • Each instance x ∈ U in the second data set is untrustworthy, that is, it has a certain probability of being labeled incorrectly. Therefore, the embodiment of the present application uses a pre-designed discriminator to determine whether a given event instance can indicate its labeled event type; its purpose is to maximize the conditional probability P(e|x,t),
  • where x is the information of the instance,
  • t is the information of the trigger word,
  • e is the labeled event type,
  • P(e|x,t) is the probability that the instance and the trigger word can reflect the corresponding event type e,
  • and 1 - P(e|x,t) is the probability that the instance and the trigger word cannot express the corresponding event type e. The adversarial objective can be formalized as:

      φ = E_{x~P_R}[log P(e|x,t)] + E_{x~P_U}[log(1 - P(e|x,t))]    (1)

  • Here E is the symbol of mathematical expectation, and E_{x~P_U} refers to the expectation over a random instance x that obeys the distribution P_U.
  • P_R is the sampling distribution over the reliable data set R, and P_U is the probability distribution according to which the generator samples instances from the unreliable data set U. Although the two expectation terms in equation (1) are contradictory, the noise data in U has side effects on both of them. Therefore, when the generator and the discriminator reach a balance after being fully trained, the generator tends to choose informative instances with a higher probability than noisy instances, and the discriminator becomes more resistant to noise and can classify events better.
  • The adversarial network also includes an encoder, which is used to encode event instances into embedding vectors so that the generator and the discriminator can process them; the parameters of the encoder are also parameters that need to be optimized in the adversarial training.
  • Since the process of obtaining the first data set and the second data set involves pre-training the discriminator, the encoder also needs to be pre-trained in that process so that the discriminator can process the event instances.
  • Step 302: In each round of adversarial training, encode each event instance in the first data set and the second data set through the encoder to obtain the embedding vector of each event instance in the first data set and the second data set.
  • the embedding vector is used to indicate each word in the text corresponding to the event instance and the positional relationship between the various words.
  • An encoder based on Convolutional Neural Networks (CNN) or an encoder based on Bidirectional Encoder Representations from Transformers (BERT) can be selected as the encoder for encoding a given event instance.
  • A CNN-based encoder represents all the words in the event instance as input vectors, each composed of a word embedding vector and a position embedding vector that encodes the word's position relative to the candidate trigger word; the CNN-based encoder then slides convolution kernels over the input vectors to obtain the hidden embedding vectors.
  • A BERT-based encoder is similar to the CNN encoder: after summing the word, segment, and position embedding vectors of all the words in the event instance as the input vectors, the BERT-based encoder uses a multi-layer bidirectional Transformer encoder to obtain the hidden embedding vectors.
  • The candidate trigger word t divides the words in the event instance x into two parts.
  • Based on this, a dynamic multiple pooling operation is applied to the hidden embedding vectors h_k to obtain the embedding vector x of the event instance:

      [x]_j = max_{0<k≤i} [h_k]_j ⊕ max_{i<k≤n} [h_k]_j

  • where [·]_j in the above formula is the j-th dimension value of the vector, i refers to the position of the trigger word t, and ⊕ denotes concatenation.
  • the aforementioned encoder based on CNN and adopting dynamic multiple pooling may be called a dynamic multiple pooling CNN encoder.
  • the aforementioned encoder based on BERT and adopting dynamic multiple pooling may be referred to as a dynamic multiple pooling BERT encoder.
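  • The following PyTorch sketch illustrates the dynamic multiple pooling reconstructed above; the tensor shapes and the assumption that the trigger word is not the last token are choices made for the example, not specifications from the patent.

      import torch

      def dynamic_multi_pooling(h: torch.Tensor, i: int) -> torch.Tensor:
          """h: hidden embedding vectors, shape [n, d]; i: 0-based position of
          the candidate trigger word t (assumed not to be the last token).
          Returns the instance embedding x of shape [2 * d]."""
          left = h[: i + 1].max(dim=0).values    # max over positions up to t
          right = h[i + 1:].max(dim=0).values    # max over positions after t
          return torch.cat([left, right])

      x = dynamic_multi_pooling(torch.randn(12, 128), i=4)  # torch.Size([256])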
  • Step 303: Process the embedding vector of each event instance in the second data set through the generator to obtain the confusion probability of each event instance in the second data set.
  • The generator aims to deceive the discriminator by selecting the most confusing event instances from U. Therefore, the embodiment of the present application designs the generator to optimize the probability distribution P_U used to select event instances. That is, the generator calculates the confusion score of each instance in U to evaluate its degree of confusion, and further calculates the confusion probability P_U, for example as a softmax over the confusion scores:

      P_U(x) = exp(W·x + b) / Σ_{x'∈U} exp(W·x' + b)

  • where x is the embedding vector of event instance x calculated by the encoder,
  • and W and b are the parameters of the hyperplane.
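  • A minimal PyTorch sketch of the generator reconstructed above: a linear layer plays the role of the hyperplane (W, b), and the softmax is taken over the sampled batch of instances from U. The class name and the dimensions are illustrative assumptions.

      import torch

      class Generator(torch.nn.Module):
          def __init__(self, dim: int):
              super().__init__()
              self.hyperplane = torch.nn.Linear(dim, 1)  # parameters W and b

          def forward(self, batch: torch.Tensor) -> torch.Tensor:
              scores = self.hyperplane(batch).squeeze(-1)  # confusion scores
              return torch.softmax(scores, dim=0)          # P_U over the batch

      probs = Generator(256)(torch.randn(32, 256))  # sums to 1 across 32 instances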
  • Step 304: According to the confusion probability of each event instance in the second data set, recommend a second event instance from the second data set.
  • In some embodiments, the computer device can recommend the second event instance from the second data set according to the confusion probabilities. For example, the computer device can sort the event instances in the second data set in descending order of confusion probability, and acquire at least one top-ranked event instance as the recommended second event instance.
  • the computer device may also acquire an event instance with a confusion probability higher than a confusion probability threshold among the event instances in the second data set as the recommended second event instance.
  • the above-mentioned confusion probability threshold may be a probability threshold preset by a developer, or the above-mentioned confusion probability threshold may also be a threshold determined by the computer device according to the confusion probability of each event instance in the second data set.
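  • Both variants of step 304 can be sketched in a few lines of PyTorch; k and the threshold below are illustrative parameters, not values from the patent.

      import torch

      def recommend(confusion: torch.Tensor, k: int = None, threshold: float = None):
          # variant 1: the k instances with the highest confusion probability
          if k is not None:
              return torch.topk(confusion, k).indices
          # variant 2: all instances whose confusion probability exceeds the threshold
          return (confusion > threshold).nonzero(as_tuple=False).squeeze(-1)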
  • Step 305: Process the respective embedding vectors of the first event instance and the second event instance through the discriminator to obtain the respective credible probabilities of the first event instance and the second event instance.
  • the first event instance is an event instance in the first data set.
  • the discriminator is responsible for judging whether the given event instance correctly corresponds to its labeled trigger word and event type.
  • The discriminator can be implemented, for example, as a softmax over the event type embeddings:

      P(e|x,t) = exp(x·e) / Σ_{e'∈ℰ} exp(x·e')

  • where e is the embedding vector of event type e ∈ ℰ, x is the embedding vector of the event instance,
  • and P(e|x,t) represents the credible probability of event instance x.
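  • A minimal PyTorch sketch of the discriminator reconstructed above: the credible probability is a softmax over dot products between the instance embedding x and the embeddings of all event types. The class name and sizes are illustrative assumptions.

      import torch

      class Discriminator(torch.nn.Module):
          def __init__(self, dim: int, num_event_types: int):
              super().__init__()
              # one embedding vector e per event type in the collection ℰ
              self.event_emb = torch.nn.Embedding(num_event_types, dim)

          def forward(self, x: torch.Tensor) -> torch.Tensor:
              logits = x @ self.event_emb.weight.t()    # [batch, |ℰ|]
              return torch.softmax(logits, dim=-1)      # P(e|x,t) per event type

      p = Discriminator(256, 34)(torch.randn(8, 256))   # each row sums to 1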
  • The discriminator not only processes the second event instance recommended by the generator to output its credible probability, but also processes the first event instance in the first data set to output its credible probability.
  • Before processing the embedding vector of each event instance in the second data set through the generator, the computer device may also sample the second data set to obtain the event instances in the second data set; correspondingly, before processing the respective embedding vectors of the first event instance and the second event instance through the discriminator, the computer device also samples the first data set to obtain the first event instance.
  • In some embodiments, the first data set and the second data set can be sampled separately (for example, in a uniform random manner) to obtain a subset of the first data set and a subset of the second data set, and the subsequent steps are performed on the sampled subsets.
  • the foregoing sampling process for the first data set may be performed before step 302 or before step 305; the foregoing sampling process for the second data set may be performed before step 302, or may be performed before step 303.
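  • The per-round subset sampling can be as simple as uniform random sampling without replacement; a sketch, where the subset size m is an illustrative parameter:

      import random

      def sample_subset(data_set, m):
          # uniform random sampling without replacement of the subset used
          # in one round of adversarial training
          return random.sample(list(data_set), min(m, len(data_set)))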
  • Step 306: If the output result of the discriminator does not converge, calculate a loss value according to the loss function, the output result of the generator, and the output result of the discriminator.
  • If the output result of the discriminator changes little relative to the output results of the previous round or rounds, for example, if the difference between the output results is less than a preset difference threshold, the output result of the discriminator is considered to have converged; at this point, the training of the adversarial network is completed.
  • If the output result of the discriminator changes significantly relative to the output results of the previous round or rounds, for example, if the difference between the output results is not less than the preset difference threshold, the output result of the discriminator can be considered not to have converged; at this point, the parameters of the adversarial network need to be optimized, that is, the loss value is calculated from the loss function and the output results of the discriminator and the generator.
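  • A sketch of the convergence test described above, assuming the discriminator's outputs are collected per round and the difference threshold eps is illustrative:

      def has_converged(prev_outputs, outputs, eps=1e-3):
          # converged when every output changes by less than the preset
          # difference threshold relative to the previous round
          return all(abs(a - b) < eps for a, b in zip(prev_outputs, outputs))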
  • Step 307: Adjust the parameters of the adversarial network according to the loss value.
  • In some embodiments, the loss function includes a first loss function; when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the computer device may calculate a first loss value according to the first loss function, the first credible probability of the first event instance, the second credible probability of the second event instance, and the confusion probability of the second event instance.
  • When adjusting the parameters of the adversarial network according to the loss value, the computer device can adjust the parameters of the encoder and the discriminator according to the first loss value.
  • The optimized discriminator should give high scores (i.e., output high credible probabilities) to the event instances in R (i.e., the first data set), and at the same time distrust the event instances recommended by the generator from U (i.e., the second data set). Accordingly, the first loss function can be formalized as follows to optimize the discriminator:

      L_D = -( E_{x~P_R}[log P(e|x,t)] + E_{x~P_U}[log(1 - P(e|x,t))] )
  • In some embodiments, the loss function includes a second loss function; when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the computer device can calculate a second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance.
  • the computer device may adjust the parameters of the generator according to the second loss value.
  • The solution shown in this application expects the optimized generator to pay more attention to the most confusing event instances. Therefore, given an instance x ∈ U with its unreliable trigger word t and event type e, the second loss function can be formalized as follows to optimize the generator:

      L_G = E_{x~P_U}[log(1 - P(e|x,t))]

  • where P(e|x,t) is the output result calculated by the discriminator (i.e., the credible probability).
  • When optimizing the generator, only the part that calculates P_U(x) is taken as the parameters that need to be updated.
  • The loss function L_G corresponds to the second expectation term of equation (1): by minimizing L_G, the generator learns to assign high sampling probability to the instances that the discriminator judges credible, i.e., the most confusing instances.
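  • The two loss values can be sketched in PyTorch as follows, following the reconstructed formulas above. Here p_r and p_u are the discriminator's credible probabilities for first-set and recommended instances, and q_u is the generator's confusion probability P_U(x); weighting the U term by q_u to approximate the expectation over P_U, and detaching tensors so that each loss only updates its own module, are assumptions of this sketch.

      import torch

      def first_loss(p_r: torch.Tensor, p_u: torch.Tensor, q_u: torch.Tensor):
          # L_D: trust instances of R, distrust instances recommended from U;
          # q_u is detached so this loss does not update the generator
          return -(torch.log(p_r).mean()
                   + (q_u.detach() * torch.log(1 - p_u)).sum())

      def second_loss(p_u: torch.Tensor, q_u: torch.Tensor):
          # L_G = E_{x~P_U}[log(1 - P(e|x,t))]; only the part computing
          # P_U(x) (q_u) is updated, so the discriminator output is detached
          return (q_u * torch.log(1 - p_u.detach())).sum()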
  • When calculating the second loss value, the computer device may first obtain the average credible probability of the second event instance, and then calculate the second loss value according to the second loss function, the average credible probability, and the confusion probability of the second event instance.
  • That is, because the labels of the instances in U are unreliable, the embodiment of the present application may use the average score over all event types, P̄(x,t) = (1/|ℰ|) Σ_{e∈ℰ} P(e|x,t), to replace P(e|x,t) in the generator's loss function, where ℰ represents the collection of event types.
  • In some embodiments, when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the computer device may sample the first event instance to obtain a first sampling instance, sample the second event instance to obtain a second sampling instance, and calculate the loss value according to the loss function, the output result of the generator for the second sampling instance, and the output results of the discriminator for the first sampling instance and the second sampling instance respectively.
  • That is, the embodiment of the present application can sample a subset of R and a subset of U to estimate the underlying probability distributions, and form new loss functions in which the expectations over P_R and P_U are replaced by averages over the sampled subsets.
  • In the softmax that computes P_U(x), a temperature hyperparameter controls the sharpness of the probability distribution, to avoid the weight concentrating on certain specific instances.
  • A harmonic factor balances L_D and L_G; the two loss functions can be alternately optimized in the adversarial training, and the harmonic factor is reflected in the learning rate.
  • the above-mentioned sampling process can be executed before the event instances in the first data set and the second data set are processed by the encoder, generator and discriminator, that is, the encoder, generator and discriminator process the sampled event instances, Subsequently, the loss value is calculated from the output result of the sampled event instance.
  • The above sampling process can also be performed after the event instances in the first data set and the second data set are processed by the encoder, generator, and discriminator; that is, the encoder, generator, and discriminator process all the event instances in the first data set and the second data set, sampling is performed before the loss value is calculated, and the loss value is then calculated from the output results of the generator and the discriminator for the sampled event instances.
  • The hyperparameter settings of the above generator and discriminator may be as shown in Table 1 below (excerpt):

      Table 1
      Dropout probability of random inactivation: 5 × 10^-1
      Learning rate of the generator with dynamic multi-pooling CNN as the encoder: 5 × 10^-3
  • Step 308: For a target event instance recommended by the trained generator from the second data set, when the credible probability output by the trained discriminator for the target event instance is higher than the first probability threshold, add the target event instance to the first data set.
  • Iterative adversarial training can identify information-rich instances and filter out noise instances in U, enabling large-scale unlabeled data to be used to enrich small-scale labeled data.
  • FIG. 5 is a framework diagram of the adversarial network training and application involved in the above embodiment of the present application.
  • the computer device acquires the first data set and the second data set.
  • For the acquisition process of the first data set and the second data set, reference may be made to the description under step 301 above, which will not be repeated here.
  • The computer device samples the event instances in the second data set to obtain a second data subset and inputs it to the generator (S51); the generator outputs the confusion probability for each input event instance (S52); the recommended event instances in the second data subset are then determined according to the confusion probabilities (S53), and the recommended event instances are input to the discriminator (S54). In addition, the computer device also samples the first data set to obtain a first data subset, and inputs each event instance in the first data subset to the discriminator (S55). The discriminator outputs the credible probabilities of the recommended event instances and of the event instances in the first data subset (S56). The computer device determines, according to the output of the discriminator, whether the training has converged (S57); if so, the computer device recommends and discriminates the event instances in the second data set through the adversarial network to determine credible event instances from the second data set and add them to the first data set (S58); if not, the computer device inputs the confusion probabilities output by the generator and the credible probabilities output by the discriminator into the loss function, optimizes the parameters of the adversarial network through the loss value output by the loss function, and continues the next round of training.
  • FIG. 6 and FIG. 7 are schematic comparison diagrams of two precision-recall curves in a remote supervision scenario related to an embodiment of the present application.
  • Figure 6 shows the adversarial network model provided by the present application with dynamic multi-pooling CNN as the encoder, together with three weakly supervised models in the related art (i.e., correlation model 1, correlation model 2, and correlation model 3), and their respective precision-recall curves in text-based event detection applications.
  • Figure 7 shows the adversarial network model provided by the present application with dynamic multi-pooling BERT as the encoder, together with three weakly supervised models in the related art (i.e., correlation model 4, correlation model 5, and correlation model 6), and their respective precision-recall curves in text-based event detection applications.
  • The embodiment of this application uses the existing trigger words in the original training set (such as the ACE-2005 training set) as heuristic seeds, constructs a large-scale candidate set from a corpus (such as the New York Times corpus) through the above-mentioned trigger-word-based potential instance discovery strategy, trains the adversarial network shown in the embodiment of this application and filters out the noise instances to construct a new data set, then uses the new data set to expand the original training set into an extended training set, and tests the adversarial network trained on the extended training set on the original test set.
  • The adversarial network model that uses dynamic multi-pooling CNN as the encoder and is trained on the original training set is CNN model 1; the adversarial network model that uses dynamic multi-pooling CNN as the encoder and is trained on the extended training set is CNN model 2; the adversarial network model that uses dynamic multi-pooling BERT as the encoder and is trained on the original training set is BERT model 1; and the adversarial network model that uses dynamic multi-pooling BERT as the encoder and is trained on the extended training set is BERT model 2. Comparing the above CNN model 1, CNN model 2, BERT model 1, and BERT model 2 with the weakly supervised models trained on the ACE-2005 training set in the related art (correlation models 7 to 15) yields the comparison results in Table 3.
  • In Table 3 above, the P column represents the precision, the R column represents the recall, and the F1 column represents the harmonic mean of the precision and the recall.
  • In addition, this application uses the average precision and the Fleiss's Kappa coefficient to evaluate the weakly supervised models in the related art (correlation model 16 and correlation model 17) and the model of this application.
  • the ACE-2005 instance is a typical event instance in the ACE-2005 training set corresponding to a prosecution event.
  • the two instances in the extended instance are event instances obtained by sampling from the data set constructed by the scheme provided by this application.
  • the first event instance has the trigger word of the ACE-2005 instance, but the syntax is different; the second event instance has a new trigger word that is not included in the ACE-2005 instance.
  • In addition, 1.2% of the trigger words are newly discovered trigger words. This shows that the method shown in this application can not only find, from unlabeled data, new instances similar to the instances in the labeled data, but can also discover new trigger words, thereby expanding the coverage of the data set.
  • After the training is completed, the discriminator in the adversarial network can predict the event corresponding to an input text.
  • A recognition device (such as an online server deployed with the above trained adversarial network) can obtain the text to be recognized (such as a natural language sentence), process the text to be detected through the trained adversarial network, and then obtain the event corresponding to the text to be detected according to the output result of the discriminator in the adversarial network for the text to be detected, so as to realize event detection on the text to be recognized.
  • The solution shown in the embodiments of the present application trains the generator and the discriminator in the adversarial network through the first data set containing standard event instances and the second data set containing non-standard event instances, so that the trained discriminator can accurately determine whether the event instances in the second data set are credible.
  • Compared with manual annotation, this solution does not require a large amount of manual annotation, which saves data preparation time and improves the efficiency of text-based event detection.
  • This solution uses an adversarial-network approach for event detection, which can accurately eliminate noise data in the second data set and improve the accuracy of event detection.
  • The embodiment of the present application proposes an adversarial training mechanism, which can not only automatically extract more informative instances from the candidate set, but also improve the performance of the event detection model in noisy data scenarios (such as remote supervision).
  • The trigger-word-based potential instance discovery strategy and the adversarial training method can cooperate to obtain more diverse and accurate training data, and to reduce noise problems.
  • In other words, this application provides a new weakly supervised event detection model, which can expand the data set to achieve higher coverage, alleviate the low-coverage, topic-bias, and noise problems in event detection, and ultimately improve the effect of event detection.
  • The training and application schemes of the adversarial network shown in the various embodiments of this application can be applied to text-based event detection and to subsequent artificial intelligence (AI) application scenarios based on the detected events.
  • The training and application scheme of the adversarial network shown in the embodiments of this application enables AI to automatically recognize the corresponding event from text described in natural language, and to provide AI services such as intelligent question answering, information retrieval, and reading comprehension based on the recognized events.
  • The adversarial network shown in the embodiments of the present application can be applied to a natural-language-based service system.
  • A natural-language-based service system (such as an intelligent question answering service) can deploy the above trained adversarial network and provide a service interface to the outside.
  • The user's terminal can send natural language to the service system through the service interface; the service system generates the corresponding sentence text from the natural language, detects the event corresponding to the sentence text through the adversarial network, and then provides the user with an intelligent question answering service based on the detected event.
  • The adversarial network shown in the embodiments of the present application can also be deployed independently as an event detection system.
  • An event detection system deployed with the trained adversarial network can provide a service interface to the outside.
  • A natural-language-based service system, such as an intelligent question answering system, receives the natural language sent by the user's terminal, generates the corresponding sentence text from the natural language, and then sends the sentence text to the event detection system through the service interface.
  • The event detection system detects the event corresponding to the sentence text through the adversarial network, and sends the detected event to the service system, so that the service system can provide the user with an intelligent question answering service based on the detected event.
  • This application only takes the above service system providing users with an intelligent question answering service as an example.
  • In practical applications, the above service system may also provide users with other services based on the events detected in the text, such as information retrieval or reading comprehension.
  • Fig. 8 is a block diagram showing the structure of a text-based event detection device according to an exemplary embodiment.
  • the text-based event detection apparatus can be used in computer equipment to perform all or part of the steps in the embodiment shown in FIG. 1 or FIG. 3.
  • the device has functional modules or units that implement the foregoing method examples, and each functional module or unit can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the text-based event detection device may include:
  • the data set acquisition module 801 is used to acquire a first data set and a second data set containing event instances respectively.
  • The event instances include text and events corresponding to the text; the first data set contains standard event instances, and the second data set contains non-standard event instances;
  • the adversarial training module 802 is used to train the adversarial network through the first data set and the second data set.
  • The adversarial network includes a generator and a discriminator; the generator is used to select event instances from the second data set to input to the discriminator; the discriminator is used to output the credible probability of the event instances in the first data set, and the credible probability of the event instances recommended by the generator; the loss function of the adversarial network is used to adjust the parameters of the adversarial network to maximize the first credible probability and minimize the second credible probability.
  • The first credible probability is the credible probability output by the discriminator for the event instances belonging to the first data set, and the second credible probability is the credible probability output by the discriminator for the event instances recommended by the generator from the second data set.
  • The instance obtaining module 803 is configured to obtain the standard event instances in the second data set through the trained adversarial network.
  • In some embodiments, the adversarial network further includes an encoder, and the adversarial training module 802 is used to:
  • encode each event instance in the first data set and the second data set through the encoder to obtain the embedding vector of each event instance in the first data set and the second data set, where the embedding vector is used to indicate each word in the text of the corresponding event instance and the positional relationships between the words;
  • process the embedding vector of each event instance in the second data set through the generator to obtain the confusion probability of each event instance in the second data set, where the confusion probability is used to indicate the probability that the discriminator incorrectly judges whether the corresponding event instance is credible;
  • process the respective embedding vectors of the first event instance and the second event instance through the discriminator to obtain the respective credible probabilities of the first event instance and the second event instance,
  • where the first event instance is an event instance in the first data set;
  • and calculate the loss value according to the loss function, the output result of the generator, and the output result of the discriminator.
  • In some embodiments, the loss function includes a first loss function,
  • and the adversarial training module 802 is used to:
  • calculate the first loss value according to the first loss function, the first credible probability of the first event instance, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
  • adjust the parameters of the encoder and the discriminator according to the first loss value.
  • In some embodiments, the loss function includes a second loss function,
  • and the adversarial training module 802 is used to:
  • calculate the second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
  • adjust the parameters of the generator according to the second loss value.
  • In some embodiments, when calculating the second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance, the adversarial training module 802 is used to: obtain the average credible probability of the second event instance; and
  • calculate the second loss value according to the second loss function, the average credible probability, and the confusion probability of the second event instance.
  • In some embodiments, when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the adversarial training module 802 is used to: sample the first event instance to obtain a first sampling instance; sample the second event instance to obtain a second sampling instance; and
  • calculate the loss value according to the loss function, the output result of the generator for the second sampling instance, and the output results of the discriminator for the first sampling instance and the second sampling instance.
  • In some embodiments, the instance acquisition module 803 is used to:
  • for a target event instance recommended by the trained generator from the second data set, when the credible probability output by the trained discriminator for the target event instance is higher than the first probability threshold, add the target event instance to the first data set.
  • In some embodiments, the data set acquisition module 801 is used to: acquire the first data set, and obtain event labeling rules according to the first data set,
  • where the event labeling rules include the correspondence between the events of the standard instances and the trigger words in the texts of the standard instances, and a standard instance is an event instance in the first data set; label each text outside the first data set according to the event labeling rules to obtain a candidate data set;
  • process each event instance in the candidate data set through the pre-trained discriminator to obtain the credible probability of each event instance in the candidate data set; and,
  • according to the credible probability of each event instance in the candidate data set, extract the second data set from the candidate data set.
  • when acquiring the first data set, the data set acquisition module 801 is configured to acquire a manually annotated first data set.
  • the data set acquisition module 801 is configured to: label each text according to preset event labeling rules to obtain an initial data set, the event labeling rules including correspondences between events and trigger words; pre-train the discriminator with the initial data set; process each event instance in the initial data set with the pre-trained discriminator to obtain the respective credible probability of each event instance in the initial data set; and obtain the first data set and the second data set from the initial data set according to those credible probabilities.
  • when obtaining the first data set and the second data set from the initial data set according to the respective credible probabilities of the event instances in the initial data set, the data set acquisition module 801 is configured to add the event instances whose credible probability is higher than a second probability threshold to the first data set, and to add the event instances whose credible probability is not higher than the second probability threshold to the second data set.
  • the solution shown in the embodiments of this application trains the generator and the discriminator of the adversarial network with a first data set containing standard event instances and a second data set containing non-standard event instances, so that the trained discriminator can accurately determine whether the event instances in the second data set are credible.
  • on the one hand, this solution does not require large-scale manual annotation, which saves data preparation time and improves the efficiency of text-based event detection.
  • on the other hand, this solution performs event detection with an adversarial network, which can accurately filter out the noise data in the second data set and improve the accuracy of event detection.
  • Fig. 9 is a schematic structural diagram showing a computer device according to an exemplary embodiment.
  • the computer device 900 includes a central processing unit (CPU) 901, a system memory 904 including a random access memory (RAM) 902 and a read only memory (ROM) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901.
  • the computer device 900 also includes a basic input/output system (I/O system) 906 that helps transfer information between the components within the computer, and a mass storage device 907 for storing an operating system 913, application programs 914, and other program modules 915.
  • the basic input/output system 906 includes a display 908 for displaying information and an input device 909 such as a mouse and a keyboard for the user to input information.
  • the display 908 and the input device 909 are both connected to the central processing unit 901 through the input and output controller 910 connected to the system bus 905.
  • the basic input/output system 906 may also include an input and output controller 910 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input and output controller 910 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905.
  • the mass storage device 907 and its associated computer-readable medium provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • Computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices.
  • the computer device 900 may be connected to the Internet or other network devices through the network interface unit 911 connected to the system bus 905.
  • the memory also includes one or more programs stored therein, and the central processing unit 901 executes the one or more programs to implement all or part of the steps of the methods shown in FIG. 1, FIG. 2, or FIG. 3.
  • a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including a computer program (instructions) that can be executed by a processor of a computer device to complete all or part of the steps of the methods shown in the embodiments of this application.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.


Abstract

A text-based event detection method, including: obtaining a first data set and a second data set that each contain event instances, where the first data set contains standard event instances and the second data set contains non-standard event instances; training an adversarial network with the first data set and the second data set; and obtaining the standard event instances in the second data set through the trained adversarial network.

Description

Text-based event detection method and apparatus, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 2019104716051, filed with the Chinese Patent Office on May 31, 2019 and entitled "Text-based event detection method and apparatus, computer device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of artificial intelligence technology, specifically to the field of natural language processing technology, and in particular to a text-based event detection method and apparatus, a computer device, and a storage medium.
Background
Text-based event detection is an important subtask of event extraction and is of great significance to various downstream natural language processing applications such as question answering, information retrieval, and reading comprehension.
In the related art, text-based event detection can be implemented with a convolutional neural network. For example, training data is obtained in advance through manual annotation; in addition to a text (for example, a complete sentence), the training data also includes the manually annotated trigger word in that text and the event corresponding to the trigger word. The convolutional neural network is trained on the manually annotated training data, and the trained network then processes unannotated texts to determine their trigger words, so that the event corresponding to an unannotated text is determined through its trigger word.
However, the solutions in the related art require manually annotated training data and face bottlenecks in both training efficiency and training accuracy, so that neither the efficiency nor the accuracy of text-based event detection is high.
Summary
According to the various embodiments provided in this application, a text-based event detection method and apparatus, a computer device, and a storage medium are provided. The technical solutions are as follows:
A text-based event detection method, executed by a computer device, the method including:
obtaining a first data set and a second data set that each contain event instances, an event instance including a text and the event corresponding to the text, the first data set containing standard event instances and the second data set containing non-standard event instances;
training an adversarial network with the first data set and the second data set, the adversarial network including a generator and a discriminator, where the generator is configured to select event instances from the second data set for input to the discriminator, the discriminator is configured to output first credible probabilities for the event instances in the first data set and to output second credible probabilities for the event instances input by the generator, and the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probabilities and minimize the second credible probabilities; and
obtaining the standard event instances in the second data set through the trained adversarial network.
A text-based event detection method, executed by a computer device, the method including:
obtaining a text to be detected;
processing the text to be detected through an adversarial network, the adversarial network being obtained by training with a first data set and a second data set, the first data set containing standard event instances and the second data set containing non-standard event instances; the adversarial network includes a generator and a discriminator; the generator is configured to select event instances from the second data set for input to the discriminator; the discriminator is configured to output first credible probabilities for the event instances in the first data set and to output second credible probabilities for the event instances input by the generator; and the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probabilities and minimize the second credible probabilities; and
obtaining the event corresponding to the text to be detected according to the output result of the discriminator of the adversarial network for the text to be detected.
A text-based event detection apparatus, provided in a computer device, the apparatus including:
a data set acquisition module, configured to obtain a first data set and a second data set that each contain event instances, an event instance including a text and the event corresponding to the text, the first data set containing standard event instances and the second data set containing non-standard event instances;
an adversarial training module, configured to train an adversarial network with the first data set and the second data set, the adversarial network including a generator and a discriminator, where the generator is configured to select event instances from the second data set for input to the discriminator, the discriminator is configured to output first credible probabilities for the event instances in the first data set and to output second credible probabilities for the event instances input by the generator, and the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probabilities and minimize the second credible probabilities; and
an instance acquisition module, configured to obtain the standard event instances in the second data set through the trained adversarial network.
A computer device, including one or more processors and one or more memories storing at least one computer-readable instruction, the at least one computer-readable instruction being loaded and executed by the one or more processors to implement the text-based event detection method above.
One or more computer-readable storage media storing at least one computer-readable instruction, the at least one computer-readable instruction being loaded and executed by one or more processors to implement the text-based event detection method above.
The details of one or more embodiments of this application are set forth in the drawings and the description below. Other features, objectives, and advantages of this application will become more apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of text-based event detection according to an exemplary embodiment;
FIG. 2 is a framework diagram of adversarial network training and application involved in the embodiment shown in FIG. 1;
FIG. 3 is a flowchart of a text-based event detection method according to an exemplary embodiment;
FIG. 4 is an overall framework diagram of an adversarial strategy involved in the embodiment shown in FIG. 3;
FIG. 5 is a framework diagram of adversarial network training and application involved in the embodiment shown in FIG. 3;
FIG. 6 and FIG. 7 are schematic comparison diagrams of two precision-recall curves involved in the embodiment shown in FIG. 3;
FIG. 8 is a structural block diagram of a text-based event detection apparatus according to an exemplary embodiment;
FIG. 9 is a schematic structural diagram of a computer device according to an exemplary embodiment.
Detailed Description
Exemplary embodiments are described in detail here, and examples thereof are shown in the drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
This application proposes a text-based event detection solution that can quickly and accurately obtain credible event instances from automatically annotated event instances through adversarial training, thereby achieving efficient and highly accurate event detection. For ease of understanding, several terms involved in the embodiments of this application are explained below.
(1) Trigger word:
In this application, for a given text, among the words (single words or phrases) contained in the text, a word that can indicate the event corresponding to the text is called a trigger word of that text.
For example, suppose the given text is "Mark Twain and Olivia Langdon married in 1870". The event corresponding to this text is a marriage event, so the trigger word in the text is "married".
(2) Event detection:
In this application, event detection refers to detecting event trigger words in a given text and then identifying their specific event types. For example, the trigger word "married" can be extracted from the given text "Mark Twain and Olivia Langdon married in 1870", and it can further be determined that the event corresponding to the text is a marriage event.
The solutions of the subsequent embodiments of this application are training and application solutions for an adversarial network. FIG. 1 is a schematic flowchart of text-based event detection according to an exemplary embodiment. As shown in FIG. 1, the text-based event detection process can be executed by a computer device, which may be a device with a certain computing capability such as a personal computer, a server, or a workstation. A developer sets up an adversarial network containing a generator and a discriminator in the computer device in advance. When performing text-based event detection, the computer device executes the following steps:
S11: Obtain a first data set and a second data set that each contain event instances.
In the embodiments of this application, an event instance includes a text and the event corresponding to the text.
For example, suppose the text is "Mark Twain and Olivia Langdon married in 1870" and that the event corresponding to this text is a marriage event. A possible event instance then consists of two parts: the text "Mark Twain and Olivia Langdon married in 1870" and the "marriage event", where the "marriage event" is the annotation of the text.
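For concreteness, such an event instance could be represented in code roughly as follows; this is a minimal sketch, and the field names are illustrative rather than part of the embodiments:

```python
from dataclasses import dataclass

@dataclass
class EventInstance:
    text: str          # the raw sentence, e.g. "Mark Twain and Olivia Langdon married in 1870"
    trigger: str       # the annotated trigger word, e.g. "married"
    trigger_pos: int   # index of the trigger word within the tokenized text
    event_type: str    # the annotated event, e.g. "Marriage"

# a standard (credible) instance, as would appear in the first data set
instance = EventInstance(
    text="Mark Twain and Olivia Langdon married in 1870",
    trigger="married",
    trigger_pos=5,
    event_type="Marriage",
)
```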
The first data set contains standard event instances. A standard event instance is an event instance that is assumed by default to be accurately annotated and free of errors or noise.
The second data set contains non-standard event instances. A non-standard event instance is an event instance whose annotation is inaccurate, i.e., one containing annotation errors or noise data. It should be noted that the second data set is not limited to containing only non-standard event instances; it may also contain standard event instances. It is precisely through the method in the embodiments of this application that the standard event instances in the second data set are detected. It can be understood that the second data set may also contain only non-standard event instances; in that case, the final detection result is that no standard event instance is detected in the second data set.
In the embodiments of this application, the first data set may include accurately annotated event instances (i.e., the event instances in the first data set are credible), while the second data set contains both accurately annotated event instances and inaccurately annotated event instances (i.e., the event instances in the second data set are non-standard). The inaccurately annotated event instances contained in the second data set are also called the noise data of the second data set.
S12: Train an adversarial network with the first data set and the second data set. In the embodiments of this application, the generator is configured to select event instances from the second data set for input to the discriminator; the discriminator is configured to output first credible probabilities for the event instances in the first data set and second credible probabilities for the event instances input by the generator; and the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probabilities and minimize the second credible probabilities.
S13: Obtain the standard event instances in the second data set through the trained adversarial network.
In the embodiments of this application, after the adversarial network has been trained, the credible event instances obtained from the second data set through the adversarial network can be added to the first data set, thereby automatically expanding the first data set.
Before training, the adversarial network cannot be used to directly determine which event instances in the second data set are accurately annotated and which are not; the event instances in the second data set are, by default, treated as credible. The training principle of the adversarial network is to perform multiple rounds of iterative training of the generator and the discriminator on the standard and non-standard event instances and, based on the outputs of the generator and the discriminator in each round together with a preset loss function, continuously adjust the parameters of the generator and the discriminator, so that the discriminator can eventually determine fairly accurately which event instances in the second data set are accurately annotated and which are not. After the adversarial network has been trained, the accurately annotated event instances can be selected from the second data set through the trained network; the events corresponding to these selected instances are the events detected based on the texts in those instances.
In summary, the solution shown in the embodiments of this application trains the generator and the discriminator of the adversarial network with a first data set containing standard event instances and a second data set containing non-standard event instances, so that the trained discriminator can accurately judge whether the event instances in the second data set are credible. On the one hand, this solution does not require large-scale manual annotation, which saves data preparation time and improves the efficiency of text-based event detection; on the other hand, it performs event detection with an adversarial network, which can accurately filter out the noise data in the second data set and improves the accuracy of event detection.
In the solution shown in FIG. 1 above, the generator can output, for an input event instance, a confusion score of that instance with respect to the discriminator (also called the confusion probability of the event instance). In this application, the confusion probability indicates the probability that the discriminator misjudges whether the corresponding event instance is credible.
In other words, the confusion probability is the probability that the discriminator cannot correctly judge whether an event instance is accurately annotated. That is, for a (non-standard) event instance in the second data set, the confusion probability output by the generator for it is the probability that the discriminator judges the instance to be accurately annotated. The purpose of the generator is therefore to recommend, from the second data set, the event instances most likely to confuse the discriminator; the discriminator then discriminates between the event instances assumed to be accurately annotated (i.e., those in the first data set) and the confusing event instances recommended by the generator and assumed to be inaccurately annotated.
When recommending event instances through the generator, the event instances to be recommended to the discriminator for discrimination can be determined according to the confusion probabilities output by the generator for the individual event instances. Moreover, during adversarial training, the parameters of the adversarial network can be adjusted through the loss function and the respective output results of the generator and the discriminator. In this optimization process, both the generator and the discriminator are optimized; that is, as adversarial training proceeds, the generator selects confusing event instances from the second data set more and more accurately, while the discriminator judges more and more accurately whether an input event instance is accurately annotated. The adversarial training process can be as shown in FIG. 2.
FIG. 2 is a framework diagram of adversarial network training and application involved in the embodiment shown in FIG. 1. As shown in FIG. 2, an adversarial network containing a generator and a discriminator is set up in advance, together with two data sets, a first data set and a second data set; the first data set contains event instances whose texts are assumed to be accurately annotated with their corresponding events, while the second data set contains event instances assumed not to be accurately annotated. In one embodiment, the number of event instances in the first data set may be smaller than the number of event instances in the second data set.
During adversarial training, the computer device inputs the event instances of the second data set to the generator (step S21 in FIG. 2); the generator outputs confusion probabilities for the input event instances (step S22); the recommended event instances of the second data set are determined according to the confusion probabilities (step S23) and input to the discriminator (step S24); in addition, the computer device inputs the event instances of the first data set to the discriminator (step S25); the discriminator outputs the credible probabilities of the recommended event instances and of the event instances of the first data set (step S26); the computer device feeds the confusion probabilities output by the generator and the credible probabilities output by the discriminator into the loss function (step S27) and optimizes the parameters of the adversarial network with the loss value output by the loss function (step S28). The steps above can be executed repeatedly until the output of the discriminator converges (for example, until it no longer changes substantially), at which point the training of the adversarial network can be considered complete, and credible event instances can be screened from the second data set through the trained network. The events corresponding to the texts contained in the screened credible event instances are the events detected based on those texts. It can be understood that a recommended event instance is an event instance selected for input to the discriminator.
In the solutions shown in FIG. 1 and FIG. 2 above, the first data set and the second data set can be automatically annotated quickly and at scale through preset rules, or quickly and at scale through weak supervision.
FIG. 3 is a flowchart of a text-based event detection method according to an exemplary embodiment. The method can be used in a computer device to train the adversarial network shown in FIG. 2 above and to perform event detection. As shown in FIG. 3, the text-based event detection method can include the following steps:
Step 301: Obtain a first data set and a second data set that each contain event instances.
An event instance includes a text and the event corresponding to the text; the first data set contains standard event instances, and the second data set contains non-standard event instances.
The event detection solution shown in this application can be used in weakly supervised learning application scenarios such as semi-supervised scenarios and distant supervision scenarios.
1) In a semi-supervised scenario, the computer device can first obtain the first data set; then obtain event labeling rules according to the first data set, the event labeling rules including correspondences between the events of standard instances and the trigger words in the texts of those standard instances, a standard instance being an event instance in the first data set; then label the texts outside the first data set according to the event labeling rules to obtain a candidate data set; pre-train the discriminator with the first data set to obtain a pre-trained discriminator; process each event instance in the candidate data set with the pre-trained discriminator to obtain the credible probability of each event instance in the candidate data set; and extract the second data set from the candidate data set according to those credible probabilities.
In one embodiment, in the semi-supervised scenario, when obtaining the first data set, the computer device can obtain a manually annotated first data set.
That is, in the embodiments of this disclosure, when adapting the adversarial training strategy to the semi-supervised scenario, the discriminator can first be pre-trained with small-scale annotated data (i.e., the first data set) so that it can, to a certain extent, detect event trigger words in texts and identify event types. A potential instance discovery strategy is then adopted, using the trigger words in the small-scale annotated data as heuristic seeds (i.e., corresponding to the event labeling rules above) to construct a large-scale candidate set. The pre-trained discriminator is then used to automatically judge the trigger words and event types of all instances in the candidate set so as to construct a large-scale noisy data set. The small-scale annotated data serves as the reliable set R (i.e., the first data set), and the large-scale automatically annotated data serves as the unreliable set U (i.e., the second data set).
The trigger-word-based potential instance discovery strategy above is a simple strategy proposed by the embodiments of this application to exploit unannotated data; it can automatically annotate the trigger words and event types of raw data.
The trigger-word-based strategy rests on a heuristic assumption: if a given word acts as an event trigger word in a known instance, then all other instances in the unannotated data that mention this word are potential instances of that event. For example, the word "married" acts as a trigger word indicating the event "Marriage" in "Mark Twain and Olivia Langdon married in 1870"; on this basis, all texts in other unannotated data that contain the word "married" can be added, together with the event "Marriage", to the potential instance candidate set.
The trigger-word-based potential instance discovery strategy involved in the embodiments of this application is rather simple and does not need to consider the correlations between words, trigger words, and event types. Moreover, because the strategy imposes few restrictions, it can efficiently obtain a large-scale candidate set without relying on special manual design. At the same time, the candidate set can cover more instances and topics.
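As an illustration of this discovery step, a minimal sketch might look as follows; the whitespace tokenization and the seed dictionary format are simplifying assumptions, not part of the embodiments:

```python
from typing import Dict, List, Tuple

def build_candidate_set(
    seed_triggers: Dict[str, str],           # trigger word -> event type, mined from the labeled set R
    unlabeled_sentences: List[str],
) -> List[Tuple[str, str, int, str]]:
    """Return (text, trigger, trigger_pos, event_type) candidates.

    Heuristic: any unlabeled sentence mentioning a known trigger word is a
    potential instance of the corresponding event.
    """
    candidates = []
    for sentence in unlabeled_sentences:
        tokens = sentence.split()             # naive whitespace tokenization, for illustration only
        for pos, token in enumerate(tokens):
            event_type = seed_triggers.get(token.lower())
            if event_type is not None:
                candidates.append((sentence, token, pos, event_type))
    return candidates

seeds = {"married": "Marriage", "sued": "Sue"}
print(build_candidate_set(seeds, ["They married in Elmira.", "The firm sued its rival."]))
```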
2) In a distant supervision scenario, when obtaining the first data set and the second data set that each contain event instances, the computer device can label each text according to preset event labeling rules to obtain an initial data set, the event labeling rules including correspondences between events and trigger words; pre-train the discriminator with the initial data set; process each event instance in the initial data set with the pre-trained discriminator to obtain the respective credible probability of each event instance in the initial data set; and obtain the first data set and the second data set from the initial data set according to those credible probabilities.
In one embodiment, when obtaining the first data set and the second data set from the initial data set according to the respective credible probabilities of the event instances, the computer device can add the event instances of the initial data set whose credible probability is higher than a first probability threshold to the first data set, and add the event instances whose credible probability is not higher than the first probability threshold to the second data set.
The adaptation strategy for the distant supervision scenario is similar to that for the semi-supervised scenario. For example, all the automatically annotated data (which is not all accurate) can first be used to pre-train the discriminator. The discriminator is then used to compute credible scores (i.e., credible probabilities) for all event instances in the automatically annotated data. By setting a specific threshold, the whole set of automatically annotated data can then be split into two parts: event instances whose scores are above the threshold are added to the reliable set R (i.e., the first data set), and the other event instances with lower scores are added to the unreliable set U (i.e., the second data set). Meanwhile, the reliable set R can be used as seeds to obtain more annotated data with the trigger-word-based potential instance discovery strategy in the semi-supervised scenario above.
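A minimal sketch of this threshold split, assuming the pre-trained discriminator is available as a scoring function, might be:

```python
def split_by_confidence(instances, score_fn, threshold=0.9):
    """Split auto-labeled instances into a reliable set R and an unreliable set U.

    score_fn is assumed to return the pre-trained discriminator's credible
    probability for an instance under its own automatic annotation; the
    threshold value here is illustrative.
    """
    reliable, unreliable = [], []
    for inst in instances:
        (reliable if score_fn(inst) > threshold else unreliable).append(inst)
    return reliable, unreliable
```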
Once the first data set and the second data set have been obtained, they can be used to train the adversarial network. FIG. 4 is an overall framework diagram of an adversarial strategy involved in the embodiments of this disclosure. As shown in FIG. 4, the overall framework of the adversarial strategy provided by the embodiments of this application includes a discriminator and a generator. The discriminator is employed to detect event trigger words and identify the event type of each instance in the data set. When given noise data, the discriminator should resist the noise and explicitly indicate that there is no trigger word and no event. The generator is used to select instances from the unreliable data set U (i.e., the second data set above) so as to confuse the discriminator as much as possible.
Suppose that each event instance x∈R in the first data set clearly expresses its annotated trigger word t and event type e. In contrast, during adversarial training, each instance x∈U in the second data set is assumed to be unreliable, i.e., it has a certain probability of being mislabeled. Therefore, the embodiments of this application use a pre-designed discriminator to judge whether a given event instance can express its annotated event type, with the aim of maximizing the conditional probabilities P(e|x,t) for x∈R and 1-P(e|x,t) for x∈U. Here, x is the information of the instance, t is the information of the trigger word, e is the annotated event type, P(e|x,t) is the probability that the instance and trigger word express the corresponding event type e, and 1-P(e|x,t) is the probability that they cannot express the corresponding event type e.
When training the generator, it is made to select the most confusing event instances from the unreliable data set U (i.e., the second data set above) to deceive the discriminator, that is, to select event instances according to P(e|x,t), x∈U. The training process above is an adversarial minimax game, which can be written as:

min_G max_D ( E_{x~P_R}[log P(e|x,t)] + E_{x~P_U}[log(1-P(e|x,t))] )    (1)

where E is the symbol of mathematical expectation, and E_{x~P_U}[·] denotes the expectation over a random variable x that follows the distribution P_U. P_R is the distribution of the reliable data, and the generator samples adversarial instances from the unreliable data according to the probability distribution P_U. Although E_{x~P_R}[log P(e|x,t)] and E_{x~P_U}[log(1-P(e|x,t))] are mutually contradictory, the noise data in U has side effects on both. Therefore, when the generator and the discriminator reach an equilibrium after sufficient training, the generator tends to select informative instances, which have higher probabilities than the noisy instances, while the discriminator becomes more resistant to noise and can classify events better.
Besides the generator and the discriminator mentioned in the embodiments shown in FIG. 1 and FIG. 2 above, the adversarial network also contains an encoder, which encodes event instances into embedding vectors for the generator and the discriminator to process; the parameters of the encoder are also parameters to be optimized during adversarial training.
Correspondingly, since the process of obtaining the first and second data sets involves pre-training the discriminator, the encoder also needs to be pre-trained together with it so that the discriminator can process event instances during pre-training.
Step 302: In each round of adversarial training, encode each event instance in the first data set and the second data set through the encoder to obtain the embedding vector of each event instance in the first data set and the second data set.
The embedding vector indicates the words in the text of the corresponding event instance and the positional relationships between those words.
The encoder in the embodiments of this application encodes an event instance into its corresponding embedding vector so as to provide semantic features for the other modules of the adversarial network (i.e., the generator and the discriminator). For example, given an instance x=(w1,...,t,...,wn) consisting of n words and its candidate trigger word t, its embedding vector can be obtained through the embedding layers. In the embodiments of this application, several effective neural network models can be used to encode event instances.
For example, in the embodiments of this application, an encoder based on convolutional neural networks (CNN) or an encoder based on Bidirectional Encoder Representations from Transformers (BERT) can be chosen as the encoder for a given event instance. The principles of the two encoders are as follows:
1) CNN-based encoder: All words in the event instance are represented as input vectors, including word embedding vectors and position embedding vectors that encode the positions relative to the candidate trigger word. The CNN-based encoder slides convolution kernels over the input vectors to obtain the hidden embedding vectors as follows:
{h1,...,hn}=CNN(w1,...,t,...,wn)  (2)
2) BERT-based encoder: Similar to the CNN encoder, after the word piece embedding vectors and position embedding vectors of all words in the event instance are summed as input vectors, the BERT-based encoder uses a multi-layer bidirectional Transformer encoder to obtain the hidden embedding vectors as follows:
{h1,...,hn}=BERT(w1,...,t,...,wn)  (3)
The candidate trigger word t divides the words of the event instance x into two parts. In the embodiments of this application, a dynamic multi-pooling operation is further applied to the hidden embedding vectors to obtain the embedding vector x of the event instance:

x = [x1; x2]    (4)

where [x1]_j = max{[h_1]_j, ..., [h_i]_j} and [x2]_j = max{[h_{i+1}]_j, ..., [h_n]_j}; in these formulas, [·]_j denotes the j-th dimension of a vector, and i denotes the position of the trigger word t.
The CNN-based encoder with dynamic multi-pooling above can be called a dynamic multi-pooling CNN encoder. Correspondingly, the BERT-based encoder with dynamic multi-pooling can be called a dynamic multi-pooling BERT encoder.
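A minimal PyTorch sketch of the dynamic multi-pooling of Eq. (4), assuming the hidden embeddings are given as an (n × d) tensor, might be:

```python
import torch

def dynamic_multi_pooling(hidden: torch.Tensor, trigger_pos: int) -> torch.Tensor:
    """Max-pool the hidden embeddings on each side of the trigger word.

    hidden: (n, d) tensor of hidden embeddings {h_1, ..., h_n}
    trigger_pos: 0-based index i of the trigger word (assumed not the final token,
    so that the right-hand part is non-empty)
    Returns the instance embedding x = [x1; x2] of shape (2d,), as in Eq. (4).
    """
    left = hidden[: trigger_pos + 1]      # h_1 ... h_i (trigger included in the left part)
    right = hidden[trigger_pos + 1 :]     # h_{i+1} ... h_n
    x1 = left.max(dim=0).values
    x2 = right.max(dim=0).values
    return torch.cat([x1, x2], dim=-1)

x = dynamic_multi_pooling(torch.randn(10, 128), trigger_pos=4)   # -> shape (256,)
```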
Step 303: Process the embedding vector of each event instance in the second data set through the generator to obtain the confusion probability of each event instance in the second data set.
In the embodiments of this application, the generator aims to select the most confusing event instances from U to deceive the discriminator. The embodiments of this application therefore design the generator to select event instances by optimizing the probability distribution P_U: the generator computes a confusion score for each instance in U, evaluating how confusing it is, and further computes the confusion probability P_U as follows:

f(x) = W·x + b    (5)
P_U(x) = exp(f(x)) / Σ_{x'∈U} exp(f(x'))    (6)

where x is the embedding vector of the event instance x computed by the encoder, and W and b are the parameters of a hyperplane.
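Read this way, the generator reduces to a single linear scoring layer whose scores are normalized over a batch drawn from U; a sketch under that reading might be:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Scores how confusing each instance of U is; cf. Eqs. (5)-(6)."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.hyperplane = nn.Linear(embed_dim, 1)       # f(x) = W·x + b

    def forward(self, instance_embeddings: torch.Tensor) -> torch.Tensor:
        # instance_embeddings: (batch, dim) vectors x produced by the encoder
        return self.hyperplane(instance_embeddings).squeeze(-1)   # scores f(x)

# softmax over the scores of a batch drawn from U yields the confusion
# probabilities P_U(x) of Eq. (6):
#   p_u = torch.softmax(generator(x_u), dim=0)
```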
Step 304: Recommend second event instances from the second data set according to the confusion probabilities of the event instances in the second data set.
In the embodiments of this application, after the generator outputs the confusion probabilities of the event instances in the second data set, the computer device can recommend second event instances from the second data set according to those confusion probabilities. For example, the computer device can sort the event instances of the second data set in descending order of confusion probability and obtain at least one top-ranked event instance as a recommended second event instance.
Alternatively, in another possible implementation, the computer device can obtain, among the event instances of the second data set, the event instances whose confusion probability is higher than a confusion probability threshold as the recommended second event instances.
The confusion probability threshold can be a probability threshold preset by a developer, or a threshold determined by the computer device according to the confusion probabilities of the event instances of the second data set.
Step 305: Process the respective embedding vectors of the first event instances and the second event instances through the discriminator to obtain the respective credible probabilities of the first event instances and the second event instances.
A first event instance is an event instance in the first data set.
In the embodiments of this application, for a given event instance x with its annotated trigger word t and event type e, the discriminator is responsible for judging whether the given event instance correctly corresponds to its annotated trigger word and event type. After the event instance x is represented by its embedding vector x, the discriminator can be implemented as follows:

D(e|x,t) = e·x
P(e|x,t) = exp(D(e|x,t)) / Σ_{e'∈ε} exp(D(e'|x,t))

where e is the embedding vector of the event type e∈ε, ε is the set of event types, and P(e|x,t) denotes the credible probability of the event instance x.
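A corresponding PyTorch sketch of this discriminator, assuming one learned embedding vector per event type and instance embeddings produced as in the encoder sketch above, might be:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores an instance embedding against every event type embedding."""

    def __init__(self, num_event_types: int, embed_dim: int):
        super().__init__()
        # one embedding vector e per event type, matched against x by dot product
        self.event_embeddings = nn.Parameter(torch.randn(num_event_types, embed_dim))

    def forward(self, instance_embeddings: torch.Tensor) -> torch.Tensor:
        # instance_embeddings: (batch, dim); returns P(e|x,t): (batch, num_event_types)
        logits = instance_embeddings @ self.event_embeddings.T    # D(e|x,t) = e·x
        return torch.softmax(logits, dim=-1)
```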
In the embodiments of this application, besides processing the second event instances recommended by the generator to output their credible probabilities, the discriminator also processes the first event instances of the first data set to output their credible probabilities.
In one embodiment, before processing the embedding vectors of the event instances of the second data set through the generator, the computer device can also sample the second data set to obtain the event instances of the second data set; correspondingly, before processing the respective embedding vectors of the first and second event instances through the discriminator, the computer device also samples the first data set to obtain the first event instances.
Since there may be a large number of event instances in the first and second data sets, processing every event instance through steps 303 to 305 above would consume a great deal of processing time. Therefore, in each round of adversarial training, the embodiments of this application can sample the first data set and the second data set separately (for example, by uniform random sampling) to obtain a subset of the first data set and a subset of the second data set, and perform the subsequent steps on the sampled subsets.
The sampling of the first data set can be performed before step 302 or before step 305, and the sampling of the second data set can be performed before step 302 or before step 303.
Step 306: If the output result of the discriminator has not converged, calculate a loss value according to the loss function, the output result of the generator, and the output result of the discriminator.
In the embodiments of this application, if the output of the discriminator changes little relative to the output of the previous round or rounds, for example if the difference between the outputs is smaller than a preset difference threshold, the output of the discriminator can be considered to have converged, and the training of the adversarial network is complete. Correspondingly, if the output of the discriminator changes considerably relative to the previous round or rounds, for example if the difference between the outputs is not smaller than the preset difference threshold, the output of the discriminator can be considered not to have converged; in that case, the parameters of the adversarial network need to be optimized, i.e., the loss value is calculated through the loss function and the output results of the discriminator and the generator.
Step 307: Adjust the parameters of the adversarial network according to the loss value.
In one embodiment, the loss function includes a first loss function. When calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the computer device can calculate a first loss value according to the first loss function, the first credible probabilities of the first event instances, the second credible probabilities of the second event instances, and the confusion probabilities of the second event instances.
Correspondingly, when adjusting the parameters of the adversarial network according to the loss value, the computer device can adjust the parameters of the encoder and the discriminator according to the first loss value.
In the embodiments of this application, the optimized discriminator will give high scores (i.e., output high credible probabilities) to the event instances in R (the first data set) while distrusting the event instances and their labels in U (the second data set), i.e., outputting low credible probabilities for the event instances in U. The loss function for optimizing the discriminator can therefore be formalized as:

L_D = -( E_{x~P_R}[log P(e|x,t)] + E_{x~P_U}[log(1-P(e|x,t))] )    (7)

When optimizing the discriminator, the encoder part and D(e|x,t) are treated as the parameters to be updated. This loss function L_D corresponds to the max_D part of Equation (1).
In another possible implementation, the loss function includes a second loss function. When calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the computer device can calculate a second loss value according to the second loss function, the second credible probabilities of the second event instances, and the confusion probabilities of the second event instances.
Correspondingly, when adjusting the parameters of the adversarial network according to the loss value, the computer device can adjust the parameters of the generator according to the second loss value.
In the embodiments of this application, the higher the confusion probability the generator outputs for an event instance, the more confusing that instance is and the more likely it is to deceive the discriminator into making a wrong decision. The solution shown in this application wants the optimized generator to focus on the most confusing event instances. Therefore, given an instance x∈U with its unreliable trigger word t and event type e, the loss function for optimizing the generator can be formalized as:

L_G = E_{x~P_U}[log(1-P(e|x,t))]    (8)

where P(e|x,t) is the output result (i.e., the credible probability) computed by the discriminator. When optimizing the generator, the part that computes P_U(x) is treated as the parameters to be updated. This loss function L_G corresponds to the min_G part of Equation (1).
In one embodiment, when calculating the second loss value according to the second loss function, the second credible probabilities of the second event instances, and the confusion probabilities of the second event instances, the computer device can obtain the average credible probability of the second event instances from their second credible probabilities, and calculate the second loss value according to the second loss function, the average credible probability, and the confusion probabilities of the second event instances.
There may be some event instances in U (the second data set) that have no corresponding event type, i.e., NA, and these instances may be wrongly classified into other event types. Therefore, to further improve the training accuracy of the generator, the embodiments of this application can use the average score over all event types to replace P(e|x,t) in Equation (8) as follows:

P̄(x,t) = (1/|ε|) Σ_{e'∈ε} P(e'|x,t)    (9)

where ε denotes the set of event types.
In one embodiment, when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the computer device can sample the first event instances to obtain first sampled instances, sample the second event instances to obtain second sampled instances, and calculate the loss value according to the loss function, the generator's output result for the second sampled instances, and the discriminator's output results for the first sampled instances and the second sampled instances.
Since there may be a large number of instances in R and U, directly computing E_{x~P_R}[log P(e|x,t)] and E_{x~P_U}[log(1-P(e|x,t))] is very time-consuming. To improve training efficiency, the embodiments of this application can sample subsets of R and U to estimate the underlying probability distributions and form new loss functions:

L̂_D = -( Σ_{x∈R̂} log P(e|x,t) + Σ_{x∈Û} P̂_U(x)·log(1-P(e|x,t)) )    (10)
L̂_G = Σ_{x∈Û} P̂_U(x)·log(1-P(e|x,t))    (11)

where Û and R̂ are the subsets sampled from U and R, and P̂_U(x) is the estimate of Formula (6):

P̂_U(x) = exp(α·f(x)) / Σ_{x'∈Û} exp(α·f(x'))    (12)
where α is a hyperparameter that controls the sharpness of the probability distribution so as to prevent the weights from concentrating on certain specific instances. Finally, the overall optimization function is:

L = L̂_D + λ·L̂_G    (13)

where λ is a harmonizing factor; L̂_D and L̂_G can be optimized alternately during adversarial training, and λ is realized in the learning rate used for L̂_G.
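Under the reconstruction above, the two sampled losses of Eqs. (10)-(12) might be computed roughly as follows; the tensor layout is an assumption. In an alternating scheme, P̂_U would typically be held fixed (detached) while the discriminator is updated, as in the training-round sketch further below.

```python
import torch

def sampled_losses(p_r, p_u, f_u, alpha=1.0):
    """Estimate L̂_D and L̂_G of Eqs. (10)-(11) on sampled subsets R̂ and Û.

    p_r: (|R̂|,) credible probabilities P(e|x,t) of the sampled reliable instances
    p_u: (|Û|,) credible probabilities P(e|x,t) of the sampled unreliable instances
    f_u: (|Û|,) generator scores f(x) of the sampled unreliable instances
    """
    w_u = torch.softmax(alpha * f_u, dim=0)                                # P̂_U(x), Eq. (12)
    loss_d = -(torch.log(p_r).sum() + (w_u * torch.log(1 - p_u)).sum())    # Eq. (10)
    loss_g = (w_u * torch.log(1 - p_u)).sum()                              # Eq. (11)
    return loss_d, loss_g
```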
The sampling process above can be performed before the event instances of the first and second data sets are processed by the encoder, the generator, and the discriminator; that is, the encoder, the generator, and the discriminator process the sampled event instances, and the loss value is subsequently calculated from the outputs for the sampled event instances.
Alternatively, the sampling process can be performed after the encoder, the generator, and the discriminator have processed the event instances of the first and second data sets; that is, all event instances of the two data sets are processed by the encoder, the generator, and the discriminator, sampling is performed before the loss value is calculated, and the loss value is calculated from the outputs of the generator and the discriminator for the sampled event instances.
In one embodiment, the hyperparameters of the generator and the discriminator above can be set as shown in Table 1 below:
Table 1
Dropout probability: 5×10^-1
Learning rate of the generator with the dynamic multi-pooling CNN encoder: 5×10^-3
Learning rate of the discriminator with the dynamic multi-pooling CNN encoder: 2×10^-2
Learning rate of the generator with the dynamic multi-pooling BERT encoder: 2×10^-5
Learning rate of the discriminator with the dynamic multi-pooling BERT encoder: 1×10^-4
Step 308: For a target event instance in the second data set recommended by the trained generator, add the target event instance to the first data set when the credible probability output by the trained discriminator for the target event instance is higher than the first probability threshold.
During adversarial training, when the discriminator and the generator reach an equilibrium after a certain number of training rounds, all instances of the unreliable set U that are recommended by the generator and marked correct by the discriminator are moved from U to R. Performing adversarial training iteratively can identify informative instances and filter out the noisy instances in U, thereby using large-scale unannotated data to enrich small-scale annotated data.
FIG. 5 is a framework diagram of adversarial network training and application involved in the embodiments of this application above. As shown in FIG. 5, the computer device obtains the first data set and the second data set; for the obtaining process, refer to the description under step 301 above, which is not repeated here. In one round of adversarial training, the computer device samples the event instances of the second data set to obtain a second data subset and inputs it to the generator (S51); the generator outputs confusion probabilities for the input event instances (S52); the recommended event instances of the second data subset are determined according to the confusion probabilities (S53) and input to the discriminator (S54); in addition, the computer device samples the first data set to obtain a first data subset and inputs its event instances to the discriminator (S55); the discriminator outputs the credible probabilities of the recommended event instances and of the event instances of the first data subset (S56); the computer device determines from the discriminator's output whether it has converged (S57); if so, the computer device recommends and discriminates the event instances of the second data set through the adversarial network so as to determine credible event instances from the second data set and add them to the first data set (S58); if not, the computer device feeds the confusion probabilities output by the generator and the credible probabilities output by the discriminator into the loss function (S59), optimizes the parameters of the adversarial network with the loss value output by the loss function (S510), and returns to the next round of adversarial training.
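Putting the pieces together, one round of the training loop of FIG. 5 might be sketched as follows. The sampling sizes, optimizers, dataset format, and the separation into two updates are illustrative assumptions, not the embodiments' prescribed implementation; opt_d is assumed to cover the encoder and discriminator parameters, and the harmonizing factor λ of Eq. (13) is realized through opt_g's learning rate.

```python
import random
import torch

def train_round(encoder, generator, discriminator, R, U, opt_d, opt_g,
                alpha=1.0, k=64):
    """One adversarial round over sampled subsets R̂ and Û (cf. FIG. 5, S51-S510).

    R and U are assumed to be lists of (tokens, trigger_pos, event_type_id)
    triples; encoder is assumed to map such a batch to (batch, dim) embeddings.
    """
    r_hat = random.sample(R, min(k, len(R)))
    u_hat = random.sample(U, min(k, len(U)))
    e_r = torch.tensor([inst[2] for inst in r_hat])
    e_u = torch.tensor([inst[2] for inst in u_hat])

    # --- discriminator (and encoder) step: trust R̂, distrust the instances of Û ---
    x_r, x_u = encoder(r_hat), encoder(u_hat)
    p_r = discriminator(x_r)[torch.arange(len(r_hat)), e_r]
    p_u = discriminator(x_u)[torch.arange(len(u_hat)), e_u]
    w_u = torch.softmax(alpha * generator(x_u), dim=0).detach()   # P̂_U held fixed here
    loss_d = -(torch.log(p_r).sum() + (w_u * torch.log(1 - p_u)).sum())   # Eq. (10)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- generator step: shift P̂_U onto the instances that most confuse the discriminator ---
    x_u = encoder(u_hat).detach()
    p_u = discriminator(x_u)[torch.arange(len(u_hat)), e_u].detach()
    w_u = torch.softmax(alpha * generator(x_u), dim=0)
    loss_g = (w_u * torch.log(1 - p_u)).sum()                             # Eq. (11)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return float(loss_d), float(loss_g)
```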
FIG. 6 and FIG. 7 are schematic comparison diagrams of two precision-recall curves in the distant supervision scenario involved in the embodiments of this application.
FIG. 6 shows the respective precision-recall curves, in a text-based event detection application, of the adversarial network model provided by this application with the dynamic multi-pooling CNN encoder and of three weakly supervised models of the related art with the dynamic multi-pooling CNN encoder (related model 1, related model 2, and related model 3).
FIG. 7 shows the respective precision-recall curves, in a text-based event detection application, of the adversarial network model provided by this application with the dynamic multi-pooling BERT encoder and of three weakly supervised models of the related art with the dynamic multi-pooling BERT encoder (related model 4, related model 5, and related model 6).
Based on FIG. 6 and FIG. 7 above, a comparison of the area under curve (AUC) of the adversarial network models provided by this application and the weakly supervised models of the related art under the different encoders can be obtained, as shown in Table 2 below:

Table 2
[Table 2 is rendered as images in the original publication; it lists the AUC values of the compared models, which are not recoverable from this copy.]
As Table 2 above shows, in the distant supervision scenario, the two adversarial networks based on the different encoders provided by the embodiments of this application are clearly superior to the other weakly supervised models of the related art based on those two encoders.
For the semi-supervised scenario, the embodiments of this application use the existing trigger words in an original training set (for example, the ACE-2005 training set) as heuristic seeds and, through the trigger-word-based potential instance discovery strategy above, construct a large-scale candidate set from a corpus (for example, the New York Times corpus); the adversarial network shown in the embodiments of this application is trained to filter out noisy instances and construct a new data set, the original training set is then expanded with the new data set to obtain an expanded training set, and the adversarial network trained on the expanded training set is tested on the original test set. In the embodiments of this application, the adversarial network model with the dynamic multi-pooling CNN encoder trained on the original training set is CNN model 1; with the dynamic multi-pooling CNN encoder trained on the expanded training set, CNN model 2; with the dynamic multi-pooling BERT encoder trained on the original training set, BERT model 1; and with the dynamic multi-pooling BERT encoder trained on the expanded training set, BERT model 2. Comparing CNN model 1, CNN model 2, BERT model 1, and BERT model 2 with the weakly supervised models of the related art trained on the ACE-2005 training set (related models 7-15) yields the comparison results shown in Table 3.
Table 3
[Table 3 is rendered as images in the original publication; it compares the precision (P), recall (R), and F1 of the models above, and the numeric values are not recoverable from this copy.]
In Table 3 above, the P column denotes precision, the R column denotes recall, and the F1 column denotes the harmonic mean of precision and recall. As Table 3 shows, the solution provided by this application can be used to construct high-quality data sets without complex rules or large-scale knowledge bases, and can effectively collect diverse event instances, which benefits model training. In addition, this application can obtain better model performance by adding training data, demonstrating the effectiveness of the adversarial network models provided by this application.
To perform a fine-grained evaluation of the quality of the data set constructed with the trigger-word-based instance discovery strategy and the adversarial training strategy provided by the embodiments of this application, as shown in Table 4, this application evaluates the weakly supervised models of the related art (related model 16 and related model 17) and the model of this application by average accuracy and Fleiss's Kappa.
Table 4
Model | Average accuracy | Fleiss's Kappa
Related model 16 | 88.9 | —
Related model 17 | 91.0 | —
First iteration of this application's model | 91.7 | 61.3
Second iteration of this application's model | 87.5 | 52.0
As Table 4 shows, the trigger-word-based instance discovery strategy and the adversarial training strategy provided by the embodiments of this application can extract event instances with high accuracy.
To further demonstrate the effectiveness of the models provided by this application in improving data set coverage, the embodiments of this application give an example shown in Table 5.
Table 5
[Table 5 is rendered as an image in the original publication; it shows an ACE-2005 prosecution-event instance alongside two expansion instances sampled from the data set constructed through this application's solution, and the texts are not recoverable from this copy.]
The ACE-2005 instance is a typical event instance of the prosecution event in the ACE-2005 training set, and the two expansion instances are event instances sampled from the data set constructed through the solution provided by this application. Among the expansion instances, the first has the trigger word of the ACE-2005 instance but different syntax; the second has a new trigger word not contained in the ACE-2005 instances. Experiments show that, in the expanded data set constructed through the solution shown in this application, 1.2% of the trigger words are newly discovered trigger words. This indicates that the method shown in this application can not only find new instances from unlabeled data similar to the instances in the labeled data, but can also discover new trigger words, thereby expanding the coverage of the data set.
Through the trained adversarial network provided by the embodiments of this application above, event detection can be performed on the texts of the event instances contained in the second data set so as to expand the first data set, thereby obtaining a larger-scale, high-quality data set that facilitates the training of other models; in addition, the network can also be applied directly to scenarios where events are automatically detected from other unannotated texts.
For example, in one embodiment, the discriminator of the adversarial network can predict, for an input text, the event corresponding to that text. A recognition device (for example, an online server) deployed with the trained adversarial network can obtain a text to be recognized (for example, a natural-language sentence), process the text to be detected through the trained adversarial network, and obtain the event corresponding to the text to be detected according to the output result of the discriminator of the adversarial network for that text, thereby performing event detection on the text to be recognized.
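As an illustration of this inference path, a sketch of predicting the event of a new sentence with the trained encoder and discriminator might look as follows; the argmax decision rule and the batch format are assumptions carried over from the sketches above:

```python
import torch

def detect_event(tokens, trigger_pos, encoder, discriminator, event_names):
    """Predict the event expressed by a candidate trigger in an unlabeled sentence."""
    with torch.no_grad():
        x = encoder([(tokens, trigger_pos, -1)])   # (1, dim); the label slot is unused here
        probs = discriminator(x).squeeze(0)        # P(e|x,t) over all event types
    best = int(probs.argmax())
    return event_names[best], float(probs[best])

# e.g. detect_event("Mark Twain and Olivia Langdon married in 1870".split(), 5, ...)
# might return ("Marriage", 0.93) -- illustrative values only
```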
In summary, the solution shown in the embodiments of this application trains the generator and the discriminator of the adversarial network with a first data set containing standard event instances and a second data set containing non-standard event instances, so that the trained discriminator can accurately judge whether the event instances in the second data set are credible. On the one hand, this solution does not require large-scale manual annotation, which saves data preparation time and improves the efficiency of text-based event detection; on the other hand, it performs event detection with an adversarial network, which can accurately filter out the noise data in the second data set and improves the accuracy of event detection.
Specifically, the embodiments of this application propose an adversarial training mechanism that can not only automatically extract more informative instances from the candidate set but also improve the performance of event detection models in noisy-data scenarios (such as distant supervision). Experiments in the semi-supervised and distant-supervision scenarios show that, in the solution shown in this application, the trigger-word-based potential instance discovery strategy and the adversarial training method can cooperate to obtain more diverse and accurate training data and reduce the side effects of the noise problem, thereby clearly outperforming the current state-of-the-art event detection models. That is, this application provides a new weakly supervised event detection model that can expand data sets to achieve higher coverage, mitigate the problems of low coverage, topic bias, and noise in event detection, and ultimately improve the effect of event detection.
The adversarial network training and application solutions shown in the embodiments of this application can be applied to artificial intelligence (AI) scenarios in which events are detected based on texts and subsequent applications are performed according to the detected events. For example, with the solutions shown in the embodiments of this application, an AI can automatically recognize the corresponding events from texts described in natural language and, combining the recognized events, provide AI services such as intelligent question answering, information retrieval, and reading comprehension.
In one possible implementation scenario, the adversarial network shown in the embodiments of this application can be applied in a natural-language-based service system. For example, the service system can deploy the trained adversarial network and provide a service interface. When a user uses a service provided by the service system, such as an intelligent question answering service, the user's terminal can send natural language to the service system through the service interface; the service system generates the corresponding sentence text from the natural language, detects the event corresponding to the sentence text through the adversarial network, and subsequently provides the intelligent question answering service to the user according to the detected event.
Alternatively, in another possible implementation scenario, the adversarial network shown in the embodiments of this application can also be deployed independently as an event detection system. For example, an event detection system deployed with the trained adversarial network can provide a service interface; a natural-language-based service system, such as an intelligent question answering system, after receiving natural language sent by a user's terminal, generates the corresponding sentence text from the natural language and sends the sentence text to the event detection system through the service interface; the event detection system detects the event corresponding to the sentence text through the adversarial network and sends the detected event to the service system, so that the service system provides the intelligent question answering service to the user according to the detected event.
This application only takes the service system above providing an intelligent question answering service to the user as an example. In one embodiment, the service system can also provide the user with other services based on the events detected from texts, such as retrieval or reading comprehension.
It should be understood that although the steps in the flowcharts of the embodiments above are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in the embodiments above may include multiple sub-steps or stages, which are not necessarily executed and completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
FIG. 8 is a structural block diagram of a text-based event detection apparatus according to an exemplary embodiment. The text-based event detection apparatus can be used in a computer device to execute all or part of the steps in the embodiments shown in FIG. 1 or FIG. 3. The apparatus has the functional modules or units that implement the method examples above, and each functional module or unit can be implemented in whole or in part by software, hardware, or a combination thereof. The text-based event detection apparatus can include:
a data set acquisition module 801, configured to obtain a first data set and a second data set that each contain event instances, an event instance including a text and the event corresponding to the text, the first data set containing standard event instances and the second data set containing non-standard event instances;
an adversarial training module 802, configured to train an adversarial network with the first data set and the second data set, the adversarial network including a generator and a discriminator; the generator is configured to recommend event instances from the first data set and the second data set respectively; the discriminator is configured to output the credible probabilities of the event instances in the first data set and of the event instances recommended by the generator; the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probability and minimize the second credible probability, the first credible probability being the credible probability output by the discriminator for an event instance belonging to the first data set, and the second credible probability being the credible probability output by the discriminator for an event instance belonging to the second data set; and
an instance acquisition module 803, configured to obtain the standard event instances in the second data set through the trained adversarial network.
In one embodiment, the adversarial network further includes an encoder, and the adversarial training module 802 is configured to:
in each round of adversarial training, encode each event instance in the first data set and the second data set through the encoder to obtain the embedding vector of each event instance in the first data set and the second data set, the embedding vector indicating the words in the text of the corresponding event instance and the positional relationships between those words;
process the embedding vector of each event instance in the second data set through the generator to obtain the confusion probability of each event instance in the first data set and the second data set, the confusion probability indicating the probability that the discriminator misjudges whether the corresponding event instance is credible;
recommend second event instances from the second data set according to the confusion probabilities of the event instances in the second data set;
process the respective embedding vectors of the first event instances and the second event instances through the discriminator to obtain the respective credible probabilities of the first event instances and the second event instances, a first event instance being an event instance in the first data set;
if the output result of the discriminator has not converged, calculate a loss value according to the loss function, the output result of the generator, and the output result of the discriminator; and
adjust the parameters of the adversarial network according to the loss value.
In one embodiment, the loss function includes a first loss function;
when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the adversarial training module 802 is configured to
calculate a first loss value according to the first loss function, the first credible probability of the first event instance, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
when adjusting the parameters of the adversarial network according to the loss value, the adversarial training module 802 is configured to
adjust the parameters of the encoder and the discriminator according to the first loss value.
In one embodiment, the loss function includes a second loss function;
when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the adversarial training module 802 is configured to
calculate a second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
when adjusting the parameters of the adversarial network according to the loss value, the adversarial training module 802 is configured to
adjust the parameters of the generator according to the second loss value.
In one embodiment, when calculating the second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance, the adversarial training module 802 is configured to
obtain the average credible probability of the second event instances according to their second credible probabilities; and
calculate the second loss value according to the second loss function, the average credible probability, and the confusion probability of the second event instance.
In one embodiment, when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the adversarial training module 802 is configured to
sample the first event instances to obtain first sampled instances;
sample the second event instances to obtain second sampled instances; and
calculate the loss value according to the loss function, the output result of the generator for the second sampled instances, and the output results of the discriminator for the first sampled instances and the second sampled instances.
In one embodiment, the instance acquisition module 803 is configured to:
for a target event instance in the second data set recommended by the trained generator, add the target event instance to the first data set when the credible probability output by the trained discriminator for the target event instance is higher than a first probability threshold.
In one embodiment, the data set acquisition module 801 is configured to:
obtain the first data set;
obtain event labeling rules according to the first data set, the event labeling rules including correspondences between the events of standard instances and the trigger words in the texts of those standard instances, a standard instance being an event instance in the first data set;
label the texts outside the first data set according to the event labeling rules to obtain a candidate data set;
pre-train the discriminator with the first data set to obtain the pre-trained discriminator;
process each event instance in the candidate data set through the pre-trained discriminator to obtain the credible probability of each event instance in the candidate data set; and
extract the second data set from the candidate data set according to the credible probabilities of the event instances in the candidate data set.
In one embodiment, when obtaining the first data set, the data set acquisition module 801 is configured to obtain a manually annotated first data set.
In one embodiment, the data set acquisition module 801 is configured to:
label each text according to preset event labeling rules to obtain an initial data set, the event labeling rules including correspondences between events and trigger words;
pre-train the discriminator with the initial data set;
process each event instance in the initial data set through the pre-trained discriminator to obtain the respective credible probability of each event instance in the initial data set; and
obtain the first data set and the second data set from the initial data set according to the respective credible probabilities of the event instances in the initial data set.
In one embodiment, when obtaining the first data set and the second data set from the initial data set according to the respective credible probabilities of the event instances in the initial data set, the data set acquisition module 801 is configured to:
add, among the event instances in the initial data set, the event instances whose credible probability is higher than a second probability threshold to the first data set; and
add, among the event instances in the initial data set, the event instances whose credible probability is not higher than the second probability threshold to the second data set.
In summary, the solution shown in the embodiments of this application trains the generator and the discriminator of the adversarial network with a first data set containing standard event instances and a second data set containing non-standard event instances, so that the trained discriminator can accurately judge whether the event instances in the second data set are credible. On the one hand, this solution does not require large-scale manual annotation, which saves data preparation time and improves the efficiency of text-based event detection; on the other hand, it performs event detection with an adversarial network, which can accurately filter out the noise data in the second data set and improves the accuracy of event detection.
FIG. 9 is a schematic structural diagram of a computer device according to an exemplary embodiment. The computer device 900 includes a central processing unit (CPU) 901, a system memory 904 including a random access memory (RAM) 902 and a read-only memory (ROM) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901. The computer device 900 also includes a basic input/output system (I/O system) 906 that helps transfer information between the components within the computer, and a mass storage device 907 for storing an operating system 913, application programs 914, and other program modules 915.
The basic input/output system 906 includes a display 908 for displaying information and an input device 909, such as a mouse or a keyboard, for the user to input information. The display 908 and the input device 909 are both connected to the central processing unit 901 through an input/output controller 910 connected to the system bus 905. The basic input/output system 906 may also include the input/output controller 910 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 910 also provides output to a display screen, a printer, or other types of output devices.
The mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 907 and its associated computer-readable medium provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. The computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. The computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, a person skilled in the art will know that the computer storage media are not limited to those above. The system memory 904 and the mass storage device 907 above can be collectively referred to as the memory.
The computer device 900 can be connected to the Internet or other network devices through the network interface unit 911 connected to the system bus 905.
The memory also includes one or more programs stored therein, and the central processing unit 901 implements all or part of the steps of the methods shown in FIG. 1, FIG. 2, or FIG. 3 by executing the one or more programs.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including a computer program (instructions) that can be executed by a processor of a computer device to complete all or part of the steps of the methods shown in the embodiments of this application. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person skilled in the art will easily think of other implementations of this application after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed in this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application being indicated by the following claims.
It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims (20)

  1. A text-based event detection method, executed by a computer device, the method comprising:
    obtaining a first data set and a second data set that each contain event instances, an event instance comprising a text and the event corresponding to the text, the first data set containing standard event instances and the second data set containing non-standard event instances;
    training an adversarial network with the first data set and the second data set, the adversarial network comprising a generator and a discriminator, wherein the generator is configured to select event instances from the second data set for input to the discriminator, the discriminator is configured to output first credible probabilities for the event instances in the first data set and to output second credible probabilities for the event instances input by the generator, and the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probabilities and minimize the second credible probabilities; and
    obtaining the standard event instances in the second data set through the trained adversarial network.
  2. The method according to claim 1, wherein the adversarial network further comprises an encoder, and training the adversarial network with the first data set and the second data set comprises:
    in each round of adversarial training, encoding each event instance in the first data set and the second data set through the encoder to obtain an embedding vector of each event instance in the first data set and the second data set, the embedding vector indicating the words in the text of the corresponding event instance and the positional relationships between the words;
    processing the embedding vector of each event instance in the second data set through the generator to obtain a confusion probability of each event instance in the second data set, the confusion probability indicating the probability that the discriminator misjudges whether the corresponding event instance is credible;
    selecting second event instances from the second data set according to the confusion probabilities of the event instances in the second data set;
    processing the respective embedding vectors of first event instances and the second event instances through the discriminator to obtain an output result of the discriminator, the output result comprising the first credible probabilities of the first event instances and the second credible probabilities of the second event instances, a first event instance being an event instance in the first data set;
    if the output result of the discriminator has not converged, calculating a loss value according to the loss function, an output result of the generator, and the output result of the discriminator; and
    adjusting the parameters of the adversarial network according to the loss value.
  3. The method according to claim 2, wherein the loss function comprises a first loss function;
    calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator comprises:
    calculating a first loss value according to the first loss function, the first credible probability of the first event instance, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
    adjusting the parameters of the adversarial network according to the loss value comprises:
    adjusting the parameters of the encoder and the discriminator according to the first loss value.
  4. The method according to claim 2, wherein the loss function comprises a second loss function;
    calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator comprises:
    calculating a second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
    adjusting the parameters of the adversarial network according to the loss value comprises:
    adjusting the parameters of the generator according to the second loss value.
  5. The method according to claim 4, wherein calculating the second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance comprises:
    obtaining an average credible probability of the second event instances according to the second credible probabilities of the second event instances; and
    calculating the second loss value according to the second loss function, the average credible probability, and the confusion probability of the second event instance.
  6. The method according to claim 2, wherein calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator comprises:
    sampling the first event instances to obtain first sampled instances;
    sampling the second event instances to obtain second sampled instances; and
    calculating the loss value according to the loss function, the output result of the generator for the second sampled instances, and the output results of the discriminator for the first sampled instances and the second sampled instances.
  7. The method according to claim 1, wherein obtaining the standard event instances in the second data set through the trained adversarial network comprises:
    for a target event instance in the second data set selected by the trained generator, adding the target event instance to the first data set when the credible probability output by the trained discriminator for the target event instance is higher than a first probability threshold.
  8. The method according to claim 1, wherein obtaining the first data set and the second data set that each contain event instances comprises:
    obtaining the first data set;
    obtaining event labeling rules according to the first data set, the event labeling rules comprising correspondences between the events of standard instances and the trigger words in the texts of the standard instances, a standard instance being an event instance in the first data set;
    labeling the texts outside the first data set according to the event labeling rules to obtain a candidate data set;
    pre-training the discriminator with the first data set to obtain the pre-trained discriminator;
    processing each event instance in the candidate data set through the pre-trained discriminator to obtain a credible probability of each event instance in the candidate data set; and
    extracting the second data set from the candidate data set according to the credible probabilities of the event instances in the candidate data set.
  9. The method according to claim 8, wherein obtaining the first data set comprises:
    obtaining the manually annotated first data set.
  10. The method according to claim 1, wherein obtaining the first data set and the second data set that each contain event instances comprises:
    labeling each text according to preset event labeling rules to obtain an initial data set, the event labeling rules comprising correspondences between events and trigger words;
    pre-training the discriminator with the initial data set;
    processing each event instance in the initial data set through the pre-trained discriminator to obtain a respective credible probability of each event instance in the initial data set; and
    obtaining the first data set and the second data set from the initial data set according to the respective credible probabilities of the event instances in the initial data set.
  11. The method according to claim 10, wherein obtaining the first data set and the second data set from the initial data set according to the respective credible probabilities of the event instances in the initial data set comprises:
    adding, among the event instances in the initial data set, the event instances whose credible probability is higher than a second probability threshold to the first data set; and
    adding, among the event instances in the initial data set, the event instances whose credible probability is not higher than the second probability threshold to the second data set.
  12. A text-based event detection method, executed by a computer device, the method comprising:
    obtaining a text to be detected;
    processing the text to be detected through an adversarial network, the adversarial network being obtained by training with a first data set and a second data set, the first data set containing standard event instances and the second data set containing non-standard event instances, the adversarial network comprising a generator and a discriminator, wherein the generator is configured to select event instances from the second data set for input to the discriminator, the discriminator is configured to output first credible probabilities for the event instances in the first data set and to output second credible probabilities for the event instances input by the generator, and the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probabilities and minimize the second credible probabilities; and
    obtaining the event corresponding to the text to be detected according to the output result of the discriminator of the adversarial network for the text to be detected.
  13. A text-based event detection apparatus, provided in a computer device, the apparatus comprising:
    a data set acquisition module, configured to obtain a first data set and a second data set that each contain event instances, an event instance comprising a text and the event corresponding to the text, the first data set containing standard event instances and the second data set containing non-standard event instances;
    an adversarial training module, configured to train an adversarial network with the first data set and the second data set, the adversarial network comprising a generator and a discriminator, wherein the generator is configured to select event instances from the second data set for input to the discriminator, the discriminator is configured to output first credible probabilities for the event instances in the first data set and to output second credible probabilities for the event instances input by the generator, and the loss function of the adversarial network is used to adjust the parameters of the adversarial network so as to maximize the first credible probabilities and minimize the second credible probabilities; and
    an instance acquisition module, configured to obtain the standard event instances in the second data set through the trained adversarial network.
  14. The apparatus according to claim 13, wherein the adversarial network further comprises an encoder, and the adversarial training module is configured to: in each round of adversarial training, encode each event instance in the first data set and the second data set through the encoder to obtain an embedding vector of each event instance in the first data set and the second data set, the embedding vector indicating the words in the text of the corresponding event instance and the positional relationships between the words; process the embedding vector of each event instance in the second data set through the generator to obtain a confusion probability of each event instance in the first data set and the second data set, the confusion probability indicating the probability that the discriminator misjudges whether the corresponding event instance is credible; select second event instances from the second data set according to the confusion probabilities of the event instances in the second data set; process the respective embedding vectors of first event instances and the second event instances through the discriminator to obtain the respective credible probabilities of the first event instances and the second event instances, a first event instance being an event instance in the first data set; if the output result of the discriminator has not converged, calculate a loss value according to the loss function, the output result of the generator, and the output result of the discriminator; and adjust the parameters of the adversarial network according to the loss value.
  15. The apparatus according to claim 14, wherein the loss function comprises a first loss function;
    when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the adversarial training module is configured to calculate a first loss value according to the first loss function, the first credible probability of the first event instance, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
    when adjusting the parameters of the adversarial network according to the loss value, the adversarial training module is configured to adjust the parameters of the encoder and the discriminator according to the first loss value.
  16. The apparatus according to claim 14, wherein the loss function comprises a second loss function;
    when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the adversarial training module is configured to calculate a second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance; and
    when adjusting the parameters of the adversarial network according to the loss value, the adversarial training module is configured to adjust the parameters of the generator according to the second loss value.
  17. The apparatus according to claim 16, wherein, when calculating the second loss value according to the second loss function, the second credible probability of the second event instance, and the confusion probability of the second event instance, the adversarial training module is configured to obtain an average credible probability of the second event instances in the second data set according to the second credible probabilities of the second event instances, and to calculate the second loss value according to the second loss function, the average credible probability, and the confusion probability of the second event instance.
  18. The apparatus according to claim 14, wherein, when calculating the loss value according to the loss function, the output result of the generator, and the output result of the discriminator, the adversarial training module is configured to sample the first event instances to obtain first sampled instances, sample the second event instances to obtain second sampled instances, and calculate the loss value according to the loss function, the output result of the generator for the second sampled instances, and the output results of the discriminator for the first sampled instances and the second sampled instances.
  19. A computer device, comprising one or more processors and one or more memories storing computer-readable instructions, the computer-readable instructions being executed by the processors to implement the text-based event detection method according to any one of claims 1 to 12.
  20. One or more computer-readable storage media storing computer-readable instructions, the instructions being executed by one or more processors of a computer device to implement the text-based event detection method according to any one of claims 1 to 12.
PCT/CN2020/093189 2019-05-31 2020-05-29 Text-based event detection method and apparatus, computer device, and storage medium WO2020239061A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/367,130 US20210334665A1 (en) 2019-05-31 2021-07-02 Text-based event detection method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910471605.1 2019-05-31
CN201910471605.1A CN110188172B (zh) Text-based event detection method and apparatus, computer device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/367,130 Continuation US20210334665A1 (en) 2019-05-31 2021-07-02 Text-based event detection method and apparatus, computer device, and storage medium

Publications (2)

Publication Number Publication Date
WO2020239061A1 true WO2020239061A1 (zh) 2020-12-03
WO2020239061A9 WO2020239061A9 (zh) 2021-01-28

Family

ID=67719615

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093189 WO2020239061A1 (zh) Text-based event detection method and apparatus, computer device, and storage medium

Country Status (3)

Country Link
US (1) US20210334665A1 (zh)
CN (1) CN110188172B (zh)
WO (1) WO2020239061A1 (zh)


Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN110188172B (zh) Text-based event detection method and apparatus, computer device, and storage medium
  • EP3767536A1 (en) Latent code for unsupervised domain adaptation
  • CN112948535B (zh) Method and apparatus for extracting knowledge triples from text, and storage medium
  • CN111368056B (zh) Ancient poetry generation method and apparatus
  • CN111597328B (zh) New event topic extraction method
  • CN111813931B (zh) Event detection model construction method and apparatus, electronic device, and storage medium
  • CN111694924B (zh) Event extraction method and system
  • CN111767402B (zh) Limited-domain event detection method based on adversarial learning
  • CN111883222B (zh) Error detection method and apparatus for text data, terminal device, and storage medium
  • CN112364945B (zh) Meta-knowledge fine-tuning method and platform based on domain-invariant features
  • GB2608344A (en) Domain-invariant feature-based meta-knowledge fine-tuning method and platform
  • CN114462418B (zh) Event detection method and system, intelligent terminal, and computer-readable storage medium
  • CN114841162B (zh) Text processing method, apparatus, device, and medium
  • KR102655393B1 (ko) Method for training a neural network model for adversarial robustness and apparatus therefor
  • CN115878761B (zh) Event context generation method, device, and medium
  • CN116029356B (zh) Tool monitoring model training method, tool state monitoring method, and related apparatus


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7024370B2 (en) * 2002-03-26 2006-04-04 P) Cis, Inc. Methods and apparatus for early detection of health-related events in a population
US20100122270A1 (en) * 2008-11-12 2010-05-13 Lin Yeejang James System And Method For Consolidating Events In A Real Time Monitoring System
US20130212156A1 (en) * 2012-02-15 2013-08-15 Qpr Software Oyj Processing event instance data in a client-server architecture

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN106557566A (zh) Text training method and apparatus
  • US20190114348A1 (en) Using a Generative Adversarial Network for Query-Keyword Matching
  • CN109766432A (zh) Chinese abstract generation method and apparatus based on a generative adversarial network
  • CN109492764A (zh) Training method for generative adversarial networks, related device, and medium
  • CN110188172A (zh) Text-based event detection method and apparatus, computer device, and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN112862837A (zh) Image processing method and system based on a convolutional neural network
  • CN112862837B (zh) Image processing method and system based on a convolutional neural network
  • CN113392213A (zh) Event extraction method, electronic device, and storage apparatus
  • CN113392213B (zh) Event extraction method, electronic device, and storage apparatus
  • CN113326371A (zh) Event extraction method integrating a pre-trained language model with anti-noise distant supervision information
  • CN113326371B (zh) Event extraction method integrating a pre-trained language model with anti-noise distant supervision information
  • CN113724149A (zh) Weakly supervised thin-cloud removal method for visible-light remote sensing images
  • CN113724149B (zh) Weakly supervised thin-cloud removal method for visible-light remote sensing images
  • CN113987163A (zh) Ontology-guided lifelong event extraction method
  • CN113987163B (zh) Ontology-guided lifelong event extraction method

Also Published As

Publication number Publication date
CN110188172A (zh) 2019-08-30
WO2020239061A9 (zh) 2021-01-28
US20210334665A1 (en) 2021-10-28
CN110188172B (zh) 2022-10-28

Similar Documents

Publication Publication Date Title
WO2020239061A1 (zh) Text-based event detection method and apparatus, computer device, and storage medium
JP6453968B2 (ja) Threshold changing device
US11531874B2 (en) Regularizing machine learning models
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
US20210019599A1 (en) Adaptive neural architecture search
CN111428021A (zh) 基于机器学习的文本处理方法、装置、计算机设备及介质
US20220092416A1 (en) Neural architecture search through a graph search space
US11282501B2 (en) Speech recognition method and apparatus
CN110929515A (zh) 基于协同注意力和自适应调整的阅读理解方法及系统
Yan et al. Discrete-continuous action space policy gradient-based attention for image-text matching
Wang et al. Cost-effective object detection: Active sample mining with switchable selection criteria
US11809965B2 (en) Continual learning for multi modal systems using crowd sourcing
WO2021001517A1 (en) Question answering systems
Xu et al. AFAT: adaptive failure-aware tracker for robust visual object tracking
US11550831B1 (en) Systems and methods for generation and deployment of a human-personified virtual agent using pre-trained machine learning-based language models and a video response corpus
US20230096070A1 (en) Natural-language processing across multiple languages
US20230029590A1 (en) Evaluating output sequences using an auto-regressive language model neural network
CN112905166B (zh) Artificial intelligence programming system, computer device, and computer-readable storage medium
WO2023091144A1 (en) Forecasting future events from current events detected by an event detection engine using a causal inference engine
JP7161974B2 (ja) Quality control method
KR20200010679A (ko) Information classification apparatus based on heterogeneity learning
US11514920B2 (en) Method and system for determining speaker-user of voice-controllable device
US11921806B2 (en) Rearranging tags on a graphical user interface (GUI) based on known and unknown levels of web traffic
US11934794B1 (en) Systems and methods for algorithmically orchestrating conversational dialogue transitions within an automated conversational system
WO2023221592A1 (zh) Model collaborative training method and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20814762

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20814762

Country of ref document: EP

Kind code of ref document: A1