CN115935983A - Event extraction method and device, electronic equipment and storage medium - Google Patents


Publication number
CN115935983A
Authority
CN
China
Prior art keywords: text, event, processed, label, word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211717646.2A
Other languages
Chinese (zh)
Inventor
李晓平
顾文斌
杨祎聪
李松柏
孙勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hengsheng Juyuan Data Service Co ltd
Hangzhou Hengsheng Juyuan Information Technology Co ltd
Original Assignee
Shanghai Hengsheng Juyuan Data Service Co ltd
Hangzhou Hengsheng Juyuan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hengsheng Juyuan Data Service Co ltd, Hangzhou Hengsheng Juyuan Information Technology Co ltd filed Critical Shanghai Hengsheng Juyuan Data Service Co ltd
Priority to CN202211717646.2A
Publication of CN115935983A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application relate to the field of natural language processing and provide an event extraction method, an event extraction device, an electronic device, and a storage medium. When event extraction is performed on a text to be processed, a text classification model is first used to perform event primary classification on the text to be processed, obtaining a prediction category label and text heat information of the text. Then, according to the subject information of each event subject in the text and in combination with the text heat information, a target event subject matched with the prediction category label is found among all event subjects. Next, because the prediction category label is obtained by clustering at least one event type, event secondary classification is performed on the text using a key feature word bank, and a target event type is restored from the prediction category label, yielding the event label of the text to be processed. Event extraction can therefore be achieved without extracting trigger words, improving event extraction efficiency.

Description

Event extraction method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the field of natural language processing, in particular to an event extraction method and device, electronic equipment and a storage medium.
Background
Event extraction refers to extracting structured event information from a text containing such information. Generally, event information includes an event type and event elements, where the event elements include information such as the event main body. Event extraction has practical significance in many fields, such as information retrieval.
In the early stage, event extraction generally adopted pattern matching, extracting event information from text based on syntax trees or regular expressions. In recent years, with the development of machine learning and deep learning, event extraction using statistical models and deep learning models has become the mainstream of research. The latter can be further divided into pipeline extraction and joint extraction according to how the tasks are arranged. Pipeline extraction consists of multiple independent subtasks, of which the core link is trigger-word extraction; trigger words are complicated to extract and difficult to enumerate exhaustively. Joint extraction processes the links of trigger-word detection, event-type judgment, event-element extraction, and so on uniformly, and can exploit the mutual influence among these links, but its model complexity is higher.
Therefore, the conventional event extraction methods are complicated and consequently inefficient.
Disclosure of Invention
An object of the embodiments of the present application is to provide an event extraction method, an event extraction device, an electronic device, and a storage medium that can complete event extraction of a text without extracting trigger words, thereby improving event extraction efficiency.
In order to achieve the above purpose, the embodiments of the present application employ the following technical solutions:
in a first aspect, an embodiment of the present application provides an event extraction method, where the method includes:
acquiring a text to be processed and each event main body and main body information thereof in the text to be processed;
performing event primary classification on the text to be processed by using a pre-trained text classification model to obtain a prediction category label and text heat information of the text to be processed, wherein the prediction category label is obtained by clustering at least one event type;
obtaining a target event main body matched with the prediction category label according to the main body information of each event main body and the text heat information;
and performing event secondary classification on the text to be processed by utilizing a pre-established key feature word bank, and restoring a target event type from the prediction category label to obtain an event label of the text to be processed, wherein the event label comprises the target event main body and the target event type.
Optionally, the text classification model includes a Bert model and a multi-label classifier, the multi-label classifier including a plurality of category labels;
the step of performing event primary classification on the text to be processed by using a pre-trained text classification model to obtain a prediction category label and text heat information of the text to be processed comprises the following steps:
inputting the text to be processed into the text classification model, and obtaining an embedding sequence of the text to be processed by using the Bert model, wherein the embedding sequence comprises word embedding of a set CLS symbol and word embedding of each word in the text to be processed;
learning semantic information of the text to be processed based on an attention mechanism by using the Bert model, and obtaining an attention matrix corresponding to the text to be processed and an output vector of the CLS symbol; wherein the attention matrix represents the similarity relation between the CLS symbol and each word in the text to be processed;
classifying the output vector by using the multi-label classifier to obtain a probability value of each class label, and taking the class label with the probability value higher than a set threshold value as the prediction class label;
and performing linear transformation on the attention matrix by using the multi-label classifier to obtain the text heat information, wherein the text heat information represents the relevance between the CLS symbol under the prediction class label and each word in the text to be processed.
Optionally, the subject information includes location information;
the step of obtaining a target event subject matched with the prediction category label according to the subject information of each event subject and the text popularity information includes:
according to the position information of each event main body, the text to be processed is divided into sentences to obtain a text unit corresponding to each event main body;
calculating the text heat corresponding to each text unit according to the text heat information;
and taking the event main body corresponding to the text unit with the highest text popularity as the target event main body.
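The three steps above (one text unit per event main body, summing the heat within each unit, then taking the arg-max) can be sketched in Python. The data layout — (subject, token-span) pairs plus a per-token heat list taken from the CLS attention row — is a hypothetical simplification, not fixed by this application:

```python
def pick_target_subject(units, heat):
    """Return the event main body whose text unit has the highest text heat.

    units: list of (subject, (start, end)) pairs, one per event main body,
           where (start, end) is the token span of its sentence unit
    heat:  per-token relevance scores (hypothetical layout, e.g. from the
           CLS attention row after linear transformation)
    """
    best_subject, best_heat = None, float("-inf")
    for subject, (start, end) in units:
        unit_heat = sum(heat[start:end])  # text heat of this text unit
        if unit_heat > best_heat:
            best_subject, best_heat = subject, unit_heat
    return best_subject
```

For example, with two subjects whose units span tokens 0-2 and 3-5 and heat scores [0.1, 0.2, 0.1, 0.5, 0.4, 0.2], the second subject is selected because its unit accumulates more heat.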
Optionally, the text classification model includes a plurality of category labels, and each category label is obtained by clustering at least one event type; the key characteristic word library comprises a plurality of key characteristic words corresponding to each event type and the weight of each key characteristic word;
the step of performing event secondary classification on the text to be processed by using a pre-established key feature word library and restoring a target event type from the prediction category label comprises the following steps:
performing word segmentation on the text to be processed to obtain a plurality of reference words;
for each event type under the prediction category label, determining each target key feature word of the event type from the plurality of reference words based on the key feature word library;
obtaining the weight of each target key feature word and summing the weights to obtain the weight of the event type;
and taking the event type with the highest weight as the target event type.
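The secondary classification above reduces to a weighted keyword vote per event type under the predicted label; a minimal sketch follows, where the `{event_type: {keyword: weight}}` layout of the key feature word library is an assumption for illustration:

```python
def restore_event_type(reference_words, candidate_types, keyword_bank):
    """Restore the target event type from the prediction category label.

    candidate_types: the event types clustered under the predicted label
    keyword_bank:    {event_type: {key feature word: weight}} (assumed layout)
    For each candidate type, the weights of its key feature words that occur
    among the segmented reference words are summed; the type with the
    highest total weight is returned as the target event type.
    """
    words = set(reference_words)
    scores = {
        t: sum(w for kw, w in keyword_bank.get(t, {}).items() if kw in words)
        for t in candidate_types
    }
    return max(scores, key=scores.get)
```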
Optionally, the step of obtaining the text to be processed and each event subject and subject information thereof in the text to be processed includes:
acquiring an original text;
generating an abstract of the original text through an automatic abstract model to obtain the text to be processed;
and carrying out entity recognition on the text to be processed through an entity recognition model to obtain each event main body and main body information thereof in the text to be processed.
Optionally, the text classification model is trained by:
acquiring a supervised corpus, wherein the supervised corpus comprises a plurality of training samples and an event type of each training sample;
clustering all event types to obtain a plurality of category labels, wherein the category labels comprise at least one event type;
and training the text classification model by using the training samples and the class labels to obtain the trained text classification model.
Optionally, the clustering all event types to obtain a plurality of category labels includes:
converting the text of each training sample into vectors through a pre-trained word embedding model to obtain each piece of word embedding information;
dividing the word embedding information with the same event type into a group, and taking the average value of all the word embedding information in the group as the feature vector of the event type to obtain the feature vector of each event type;
calculating the correlation of every two event types according to the feature vector of each event type;
and performing hierarchical clustering on all event types according to the correlation of every two event types to obtain the plurality of category labels.
Optionally, the text classification model comprises a Bert model and a multi-label classifier, the multi-label classifier comprising the plurality of category labels;
the step of training the text classification model by using the training samples and the class labels to obtain a trained text classification model includes:
inputting the training samples and the class labels into the text classification model, and obtaining a sample embedding sequence of the training samples by using the Bert model, wherein the sample embedding sequence comprises word embedding for setting a CLS symbol and word embedding of each word in the training samples;
learning semantic information of the sample embedding sequence by using the Bert model based on an attention mechanism to obtain an output vector of the CLS symbol;
classifying the output vector of the CLS symbol by using the multi-label classifier to obtain a prediction class label of the training sample;
and training the text classification model based on the class label and the prediction class label of each training sample and a preset loss function to obtain the trained text classification model.
Optionally, the loss function is:
L_total(x_k, y_k) = [1 + γ(1 − F1_body(x_k, u_k))] · L_DB(x_k, y_k)
where L_total represents the loss function, k indexes the training samples, x represents a training sample, y represents the class label of the training sample, γ represents the coefficient of the event subject loss, F1_body indicates the event subject accuracy, and L_DB represents the classification loss function;
the event subject accuracy is:
F1_body(x_k, u_k) = (1/C) Σ_{i=1}^{C} 2·TP_ki / (2·TP_ki + FP_ki + FN_ki)
where C represents the total number of class labels, i indexes the class labels of the multi-label classifier, and TP_ki, FP_ki, and FN_ki represent the confusion-matrix indexes of the event subject classification result under the ith class label of the kth training sample;
the classification loss function is:
L_DB(x_k, y_k) = (1/C) Σ_{i=1}^{C} r̂_ki · [ y_ki · log(1 + e^{−(z_ki − v_i)}) + (1/λ) · (1 − y_ki) · log(1 + e^{λ(z_ki − v_i)}) ]
where r̂_ki represents the smoothed weight of the ith class label of the kth training sample, z_ki represents the predicted logit of the kth training sample on the ith class label, λ is a hyperparameter influencing the loss weight of negative samples, and v_i is the weight bias for the ith class label.
Optionally, the keyword library is built by:
performing word segmentation on the supervised corpus and removing stop words to obtain a word segmentation result of each training sample;
based on the word segmentation result of each training sample, removing the high-frequency public words of the training sample corresponding to each class label;
screening out a special high-frequency word of a training sample corresponding to the event type aiming at each event type under any one category label to obtain each key feature word of the event type;
for each key feature word of any event type under the category label, obtaining the weight of the key feature word according to the word frequency of the key feature word in the supervised corpus and the sample number of training samples corresponding to the event type;
and obtaining the weight of each key characteristic word of each event type under each category label to obtain the key characteristic word library.
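The library-building steps above can be sketched as follows. This is a simplified reading: "removing high-frequency public words" is approximated by keeping only words absent from every sibling event type, and the weight formula (term frequency divided by the type's sample count) is an assumption consistent with, but not dictated by, the description:

```python
from collections import Counter

def build_keyword_bank(samples_by_type, stopwords, top_k=20):
    """Build a key feature word bank per event type (assumed formulas).

    samples_by_type: {event_type: [tokenized training samples]}
    """
    # term frequency per event type, with stop words removed
    tf = {t: Counter(w for sample in samples for w in sample if w not in stopwords)
          for t, samples in samples_by_type.items()}
    bank = {}
    for t, counts in tf.items():
        # words that also occur under any other event type are treated as
        # "public" and dropped; only type-specific high-frequency words remain
        other_words = set().union(*(tf[o].keys() for o in tf if o != t))
        distinctive = {w: c for w, c in counts.items() if w not in other_words}
        n = len(samples_by_type[t])
        top = sorted(distinctive.items(), key=lambda kv: -kv[1])[:top_k]
        bank[t] = {w: c / n for w, c in top}  # weight = term freq / sample count
    return bank
```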
In a second aspect, an embodiment of the present application further provides an event extraction apparatus, where the apparatus includes:
the system comprises an obtaining module, a processing module and a processing module, wherein the obtaining module is used for obtaining a text to be processed and each event main body and main body information thereof in the text to be processed;
the event primary classification module is used for carrying out event primary classification on the text to be processed by utilizing a pre-trained text classification model to obtain a prediction category label and text heat information of the text to be processed, wherein the prediction category label is obtained by clustering at least one event type;
the event main body matching module is used for obtaining a target event main body matched with the prediction category label according to the main body information of each event main body and the text heat information;
and the event secondary classification module is used for carrying out event secondary classification on the text to be processed by utilizing a pre-established key feature word library, restoring a target event type from the prediction category label, and obtaining an event label of the text to be processed, wherein the event label comprises the target event main body and the target event type.
In a third aspect, an embodiment of the present application further provides an electronic device, which includes a processor and a memory, where the memory is used to store a program, and the processor is used to implement the event extraction method in the first aspect when executing the program.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the event extraction method in the first aspect.
Compared with the prior art, in the event extraction method, the event extraction device, the electronic device, and the storage medium provided by the embodiments of the present application, when event extraction is performed on a text to be processed, a text classification model is first used to perform event primary classification on the text, obtaining a prediction category label and text heat information of the text; then, according to the main body information of each event main body in the text, combined with the text heat information, a target event main body matched with the prediction category label is found among all event main bodies; next, since the prediction category label is obtained by clustering at least one event type, event secondary classification is performed on the text using the key feature word bank, and a target event type is restored from the prediction category label to obtain the event label of the text to be processed. Event extraction can thus be achieved without extracting trigger words, improving event extraction efficiency.
Drawings
Fig. 1 illustrates a clustering example diagram of event types provided in an embodiment of the present application.
Fig. 2 shows a schematic structural diagram of a text classification model provided in an embodiment of the present application.
Fig. 3 shows a flowchart of an event extraction method provided in an embodiment of the present application.
Fig. 4 shows a block schematic diagram of an event extraction device according to an embodiment of the present application.
Fig. 5 shows a block schematic diagram of an electronic device provided in an embodiment of the present application.
An icon: 100-event extraction means; 101-obtaining a module; 102-event primary classification module; 103-event subject matching module; 104-event secondary classification module; 10-an electronic device; 11-a processor; 12-a memory; 13-bus.
Detailed Description
The technical solution in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application.
The traditional pipeline extraction consists of multiple independent subtasks, mainly including trigger-word detection, event-type judgment, event-element extraction, and the like; the core link is trigger-word extraction, and trigger words are complicated to extract and difficult to enumerate exhaustively. Joint extraction processes the links of trigger-word detection, event-type judgment, event-element extraction, and so on uniformly, and can exploit the interactions among these links. In contrast, however, the complexity of the joint extraction model is higher, and its actual effect is not necessarily better than that of pipeline extraction.
To solve the above problems, in the embodiments of the present application, when event extraction is performed on a text to be processed, a text classification model is first used to perform event primary classification on the text, obtaining a prediction category label and text heat information of the text; then, according to the main body information of each event main body in the text, combined with the text heat information, a target event main body matched with the prediction category label is found among all event main bodies; then, since the prediction category label is obtained by clustering at least one event type, event secondary classification is performed on the text using the key feature word bank, and the target event type is restored from the prediction category label to obtain the event label of the text to be processed. Event extraction can thus be achieved without extracting trigger words, improving event extraction efficiency. This is described in detail below.
The text for extracting the event in the embodiment of the present application may be various texts such as news, blogs, treatises, electronic medical records, and the like, which is not limited in any way in the embodiment of the present application, and the following embodiment takes a news text as an example for description.
For convenience of understanding, before describing specific implementations of the embodiments of the present application, a training process of a text classification model and a process of establishing a keyword library are described.
First, a training process of the text classification model is introduced, which may include the following steps:
s1, obtaining a supervised corpus, wherein the supervised corpus comprises a plurality of training samples and an event type of each training sample.
In this embodiment, the training samples may be news texts from the Internet, each including a news headline and news body. The event type may be manually labeled for each training sample and may be a classification of the news events involved in the sample, such as executive departure.
In practical applications, some news texts may be relatively long, and if they are used directly as training samples, model training efficiency may suffer. Therefore, before model training, each news text can be summarized through an automatic abstract model to control the overall text length, so that a training sample comprising the news headline and a text abstract is obtained.
Optionally, the automatic summarization model may be a model that can automatically generate a summary in the prior art, for example, seq2seq, etc., and this is not limited in this application.
In this embodiment, before labeling event types for the training samples, an event system may be established according to business attributes. For example, for business related to an enterprise, an event system may be built around the enterprise's business, covering events related to personnel, finance, administration, and the like; the training samples are then labeled according to this event system. It should be noted that the event system can be flexibly established by the user according to the actual business, which is not limited by the embodiments of the present application.
And S2, clustering all event types to obtain a plurality of category labels, wherein the category labels comprise at least one event type.
In practical applications, event classification may be fairly fine-grained. For example, the event "executive departure" may be subdivided, according to actual requirements and the level of the executive involved, into "leading figure departure", "core executive departure", and "important executive departure". However, for news related to the event types "executive departure", "leading figure departure", "core executive departure", and "important executive departure", the general text descriptions are usually substantially consistent, so these types are difficult to distinguish.
For such text classification scenarios with a low degree of distinction among event types, in order to improve the accuracy of text classification, similar event types can be clustered into one category label; this both avoids the difficulty of distinguishing them and improves the efficiency of event extraction.
Therefore, after the supervised corpus is obtained, the event type of each training sample can be clustered to obtain a plurality of category labels, and then the plurality of category labels are utilized to classify texts.
Optionally, the process of clustering all event types in step S2 to obtain a plurality of category labels may include S21 to S24.
And S21, converting the text of each training sample into vectors through the pre-trained word embedding model to obtain each piece of word embedding information.
In this embodiment, the pre-trained word embedding model may be any prior-art model capable of converting text into vectors, for example, Word2Vec, GloVe, and the like, which is not limited in this embodiment.
And S22, dividing the word embedding information with the same event type into a group, and taking the average value of all the word embedding information in the group as the feature vector of the event type to obtain the feature vector of each event type.
In this embodiment, each word embedding information is grouped according to event type, word embedding information having the same event type (e.g., core high management career) is divided into one group, and an average value of all word embedding information in the group is calculated as a feature vector of the event type, so that a feature vector of each event type can be obtained.
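Step S22 amounts to an element-wise mean over each type's embedding vectors; a minimal sketch (the dict-of-lists layout is a hypothetical simplification):

```python
def type_feature_vectors(embeddings_by_type):
    """S22 sketch: the feature vector of an event type is the element-wise
    mean of the word-embedding vectors grouped under that type.

    embeddings_by_type: {event_type: [embedding vectors of its samples]}
    """
    return {
        event_type: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for event_type, vectors in embeddings_by_type.items()
    }
```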
And S23, calculating the correlation between every two event types according to the feature vector of each event type.
In this embodiment, for any two event types, the cosine similarity between the feature vectors of the two event types is calculated as the correlation between the two event types, and so on, and the correlation between every two event types is calculated.
And S24, performing hierarchical clustering on all event types according to the correlation of every two event types to obtain a plurality of category labels.
In this embodiment, hierarchical clustering is performed according to the correlation between event types. Initially, each event type is its own class. According to the correlations calculated in S23, the closest classes are merged to obtain a new class, and the average of the feature vectors of all event types in the new class is used as the feature vector of the new class; the feature vector of each new class can thus be obtained. The new classes are then further combined according to the processes of S23 to S24, and this is repeated until the merged result achieves the optimal classification effect, yielding the plurality of category labels.
For example, referring to fig. 1, assume there are 8 event types: leading figure departure, core executive departure, important executive departure, general executive violation, important executive violation, violating reduction of holdings, general reduction of holdings, and shareholder commitment not to reduce holdings. First, according to the process of S23 to S24, leading figure departure, core executive departure, and important executive departure are combined into virtual node 1; general executive violation and important executive violation are combined into virtual node 2; violating reduction of holdings and general reduction of holdings are combined into virtual node 3; and shareholder commitment not to reduce holdings remains its own class. Then, further according to the processes of S23 to S24, virtual node 1 and virtual node 2 are combined into category label 1, and virtual node 3 and shareholder commitment not to reduce holdings are combined into category label 2.
It should be noted that the combined result achieves the optimal classification effect, the evaluation needs to be performed by comprehensively considering the training effect of the text classification model and the secondary event classification effect, and meanwhile, the evaluation needs to be determined by combining the experience of the user on the actual service, which is not limited in the embodiment of the present application.
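The S23-S24 loop can be sketched as cosine-similarity agglomerative clustering. One simplification to note: the stopping rule here is a fixed cluster count, standing in for the "optimal classification effect" criterion, which the application leaves to evaluation and user experience:

```python
def cosine(u, v):
    """Cosine similarity, used as the correlation between feature vectors (S23)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm

def cluster_event_types(features, n_labels):
    """S23-S24 sketch: agglomerative clustering of event-type feature vectors.

    features: {event_type: feature vector} (e.g. the S22 averages)
    Repeatedly merges the two most correlated clusters until n_labels
    remain; a merged cluster's vector is the mean over its member types.
    """
    clusters = [([t], v) for t, v in features.items()]  # (member types, mean vector)
    while len(clusters) > n_labels:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]))
        (ma, va), (mb, vb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        # size-weighted mean keeps the vector equal to the mean over all members
        mean = [(a * na + b * nb) / (na + nb) for a, b in zip(va, vb)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [(ma + mb, mean)]
    return [sorted(members) for members, _ in clusters]
```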
And S3, training the text classification model by using the plurality of training samples and the plurality of class labels to obtain the trained text classification model.
In this embodiment, after the plurality of class labels are obtained in step S2, each training sample has one class label, and at this time, the text classification model may be trained by using the plurality of training samples and the class label of each training sample.
Referring to fig. 2, the text classification model may include a Bert model and a multi-label classifier, and the multi-label classifier corresponds to the plurality of class labels obtained in step S2, so that the process of training the text classification model by using the plurality of training samples and the plurality of class labels in step S3 to obtain the trained text classification model may include S31 to S34.
And S31, inputting a plurality of training samples and a plurality of class labels into the text classification model, and obtaining a sample embedding sequence of the training samples by using the Bert model, wherein the sample embedding sequence comprises word embedding for setting CLS symbols and word embedding of each word in the training samples.
In this embodiment, for the text classification task, the Bert model inserts a CLS symbol in front of the text, and uses an output vector corresponding to the symbol as a semantic representation of the whole text for text classification.
Then, the training sample with the CLS symbol inserted is converted into vectors to obtain the word embedding of the CLS symbol and the word embedding of each word in the training sample.
And S32, learning semantic information of the sample embedded sequence based on the attention mechanism by using the Bert model, and obtaining an output vector of the CLS symbol.
In this embodiment, the Bert model is built by stacking multiple Transformer layers, and the attention mechanism is the most critical part of the Transformer, so it is described with emphasis. The main function of the attention mechanism is to let the model put "attention" on part of the input, i.e., to distinguish the effect of different parts of the input on the output. Applied to the text classification task, this means enhancing the semantic representations of words: the context of a word helps enhance its semantic representation, but different context words contribute differently to this enhancement. To exploit context information discriminatively when enhancing the semantic representation of a target word, the attention mechanism can be used.
The attention mechanism mainly involves three concepts: Query, Key, and Value. In the scenario of enhancing the semantic representations of words, the target word and each word of its context have their own original Value. The attention mechanism takes the target word as the Query and each word of its context as a Key, uses the similarity between the Query and each Key as a weight, and fuses the Values of the context words into the original Value of the target word.
That is, the attention mechanism takes the semantic vector representations of the target word and each context word as input. Through linear transformations, it first obtains the Query vector representation of the target word, the Key vector representation of each context word, and the original Value representations of the target word and each context word. It then computes the similarity between the Query vector and each Key vector as weights, and fuses, by weighted sum, the Value vector of the target word with the Value vectors of the context words as the output, namely the enhanced semantic vector representation of the target word.
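The Query/Key/Value computation above, for a single target word, can be sketched as scaled dot-product attention (the linear transformations are assumed to have been applied already, so the function takes the Q/K/V vectors directly):

```python
import math

def enhanced_representation(query, keys, values):
    """Single-word sketch of the Query/Key/Value attention described above.

    query:  Query vector of the target word
    keys:   Key vectors of the target word and its context words
    values: Value vectors aligned with the keys
    """
    d = len(query)
    # similarity of the Query to each Key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    peak = max(scores)
    exp_scores = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    # weighted fusion of the Value vectors = enhanced semantic representation
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

With a Query that matches the first Key more closely than the second, the output is pulled toward the first Value vector, which is exactly the "distinguish the effect of different parts of the input" behavior described above.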
In this embodiment, after word embedding is completed, the Bert model learns the semantic information of each position in the text by using the attention mechanism and obtains an output vector for the CLS symbol; this output vector is then fed to a multi-label classifier to obtain the prediction class label of the training sample.
And S33, classifying the output vector of the CLS symbol by using a multi-label classifier to obtain a prediction class label of the training sample.
In this embodiment, the multi-label classifier includes a plurality of class labels, which are obtained in step S2, for example, class label 1, class label 2, and so on. Classifying the output vector of the CLS symbol with the multi-label classifier yields a probability value for each class label; each class label whose probability value exceeds a set threshold (for example, 0.5) is taken as a prediction class label of the training sample. The threshold can be set flexibly according to actual requirements.
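As a sketch, the per-label thresholding step might look like this; the label names and the threshold value are illustrative only.

```python
import numpy as np

def predict_labels(logits, labels, threshold=0.5):
    """Apply an independent sigmoid to each label's logit and keep
    every class label whose probability exceeds the threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [lab for lab, p in zip(labels, probs) if p > threshold]

labels = ["class label 1", "class label 2", "class label 3"]
print(predict_labels([2.0, -1.0, 0.3], labels))  # → ['class label 1', 'class label 3']
```

Because each label is thresholded independently, a sample can receive several prediction class labels, or none at all.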
And S34, training the text classification model based on the class label and the prediction class label of each training sample and a preset loss function to obtain the trained text classification model.
In this embodiment, the loss function is:

L_total(x_k, y_k) = [1 + γ·(1 − F1_body(x_k, y_k))]·L_DB(x_k, y_k)

where L_total represents the loss function, k indexes the training samples, x_k represents a training sample, and y_k represents the class label of the training sample (i.e., the class label obtained in step S2). γ represents the loss coefficient of the event subject: γ is 0 in the initial training stage, and once the text classification accuracy is stable (for example, the F1 value of text classification over the latest 5 epochs improves by no more than 0.01), γ is gradually increased, strengthening the influence of the event-subject accuracy on the loss; the model with the best final evaluation result is selected.
F1_body represents the accuracy of the event subject. When training the text classification model, the event-subject accuracy is calculated in order to increase the model's perception of the event subject, specifically:

F1_body(x_k, y_k) = (1/C)·Σ_{i=1}^{C} 2·TP_{k,i} / (2·TP_{k,i} + FP_{k,i} + FN_{k,i})

where C represents the total number of class labels and i indexes the class labels of the multi-label classifier; TP_{k,i}, FP_{k,i} and FN_{k,i} are the confusion-matrix counts for the event-subject classification result on the i-th class label of the k-th training sample. The four basic indexes TP, FP, FN and TN of the confusion matrix are as follows:
1. The true value is positive and the model predicts positive: True Positive (TP);
2. The true value is positive and the model predicts negative: False Negative (FN);
3. The true value is negative and the model predicts positive: False Positive (FP);
4. The true value is negative and the model predicts negative: True Negative (TN).
It should be noted that the true value refers to the actual class label of the training sample, while the model prediction refers to the predicted class label output by the text classification model. A positive true value means the training sample actually carries the class label (a positive sample); a negative true value means it does not (a negative sample). Likewise, a positive prediction means the model assigns the class label to the sample, and a negative prediction means it does not.
After the event-subject accuracy F1_body is calculated, the event-subject accuracy factor [1 + γ·(1 − F1_body(x_k, y_k))] is added to the loss function of the model. The purpose of this is to make the accuracy of the event subject part of the loss: since the training process reduces the overall loss, it will in turn increase the accuracy of the event subject.
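A minimal sketch of the total loss and the γ warm-up described above. The schedule parameters `step` and `tol` are assumptions; the embodiment only states that γ starts at 0 and grows once the classification F1 stabilizes.

```python
def total_loss(db_loss, f1_body, gamma):
    """L_total = [1 + gamma * (1 - F1_body)] * L_DB."""
    return (1.0 + gamma * (1.0 - f1_body)) * db_loss

def update_gamma(gamma, recent_f1, step=0.1, tol=0.01):
    """Increase gamma once the F1 of the last 5 epochs has improved
    by no more than `tol` (hypothetical increment `step`)."""
    if len(recent_f1) >= 5 and max(recent_f1[-5:]) - min(recent_f1[-5:]) <= tol:
        return gamma + step
    return gamma
```

With γ = 0 the loss reduces to the plain DB loss; as γ grows, samples whose event subject is poorly recognized contribute more to the loss.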
L_DB represents the classification loss function; to solve the class-imbalance and class co-occurrence problems of the class labels, the Distribution-Balanced loss (DB loss) is used. Class imbalance means, for example, that among 100,000 training samples there may be 10,000 samples of class label 1 but only 100 samples of class label 2; because the difference in sample counts is too large, the samples of class label 2 are easily ignored when training the model. Class co-occurrence means that, during training, samples of one class label are repeatedly brought along with samples of other class labels, which affects the accuracy of model training.
The DB loss function mainly comprises two parts. The first part is the re-balanced weight after resampling; following the Distribution-Balanced loss formulation, the smoothed weight can be written as:

r̂_{k,i} = α + sigmoid(β·(r_{k,i} − μ))

where r̂_{k,i} represents the weight of the i-th class label of the k-th training sample after smoothing, used to mitigate the class-imbalance and class co-occurrence problems of the class labels; r_{k,i} is the re-balancing ratio of the class-level to the instance-level sampling probability, and α, β and μ are smoothing hyperparameters. As before, x represents a training sample, y the class label of the training sample (i.e., the class label obtained in step S2), and z the predicted class label (logit) of the training sample.
The second part mitigates the over-suppression of negative samples in the multi-label classification problem through a negative-tolerant term:

y_{k,i}·log(1 + e^{−(z_{k,i} − v_i)}) + (1/λ)·(1 − y_{k,i})·log(1 + e^{λ·(z_{k,i} − v_i)})

where λ is a hyperparameter that affects the loss weight of the negative samples and v_i is the weight bias of the i-th class label.

Combining the two parts gives the DB loss function:

L_DB(x_k, z_k) = (1/C)·Σ_{i=1}^{C} r̂_{k,i}·[ y_{k,i}·log(1 + e^{−(z_{k,i} − v_i)}) + (1/λ)·(1 − y_{k,i})·log(1 + e^{λ·(z_{k,i} − v_i)}) ]
next, a process of establishing a keyword library is described, which may include the following steps:
and S10, performing word segmentation on the supervised corpus and removing stop words to obtain a word segmentation result of each training sample.
In this embodiment, the supervised corpus is segmented, stop words are removed, and the filtered words are counted, so as to obtain the segmentation result of each training sample.
And S20, based on the word segmentation result of each training sample, eliminating the high-frequency public words of the training sample corresponding to each class label.
In the present embodiment, for each category label, high-frequency words common to the event types under that category label, such as "leave job" and "resign", are removed.
And S30, screening out the special high-frequency words of the training sample corresponding to the event type aiming at each event type under any category label to obtain each key feature word of the event type.
In this embodiment, for each category label, the unique high-frequency words of each event type under the category label are screened out. For example, with reference to fig. 1, category label 1 includes "leader person resignation", "core high management resignation" and "important high management resignation"; for the event type "leader person resignation", the unique high-frequency words of its corresponding training samples, for example, president, CEO, president executive, highest executive, sponsor, employer, joint president, and the like, are screened out as the key feature words of that event type.
And S40, aiming at each key characteristic word of any event type under the category label, obtaining the weight of the key characteristic word according to the word frequency of the key characteristic word in the supervised corpus and the sample number of the training sample corresponding to the event type.
In this embodiment, after the key feature words of a certain event type are obtained in the manner of step S30, each key feature word is given a weight, calculated as:

w_i = n_i / n_median

where w_i represents the weight of key feature word i; n_i represents the ratio of the word frequency of key feature word i in the supervised corpus to the number of training samples corresponding to the event type in the supervised corpus; and n_median represents the median of the n_i values.
It should be noted that the above-mentioned process of calculating the weight of the key feature words is only an example, and in practice, the weight of each key word supports manual modification, and this is not limited in this embodiment of the present application.
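Under the stated definitions, the weighting step can be sketched as follows; `keyword_weights` and its inputs are illustrative names, and the median-normalised formula is reconstructed from the variable descriptions above.

```python
import statistics

def keyword_weights(freq, n_samples):
    """freq: word frequency of each key feature word in the supervised
    corpus; n_samples: number of training samples of the event type.
    w_i = n_i / n_median, with n_i = freq_i / n_samples."""
    n = {w: f / n_samples for w, f in freq.items()}
    n_median = statistics.median(n.values())
    return {w: v / n_median for w, v in n.items()}

w = keyword_weights({"CEO": 80, "president": 40, "chairman": 20}, n_samples=100)
print(w)  # → {'CEO': 2.0, 'president': 1.0, 'chairman': 0.5}
```

Normalising by the median keeps weights comparable across event types with very different corpus frequencies, before any manual adjustment.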
And S50, obtaining the weight of each key feature word of each event type under each category label to obtain a key feature word library.
According to the process of the steps S30 to S40, for each category label, each key feature word and the weight thereof of each event type under the category label are obtained, and therefore the establishment of the key feature word library is completed.
The training process of the text classification model and the establishment process of the key feature word bank are introduced above, and on this basis, detailed description is given to specific implementation of the embodiment of the present application.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an event extraction method according to an embodiment of the present application. The event extraction method is applied to the electronic equipment and can comprise the following steps:
s101, obtaining the text to be processed, each event main body in the text to be processed and main body information of the event main body.
In this embodiment, the text to be processed may be text that needs event extraction, for example, news text on the Internet, including a title and a body. In practice, some news texts are relatively long, so before event extraction the news text can be summarized by an automatic summarization model to control the length of the whole text.
Alternatively, the process of obtaining the text to be processed and each event body in the text to be processed and the body information thereof in step S101 may include S1011 to S1013.
S1011, acquiring the original text.
And S1012, generating an abstract of the original text through the automatic abstract model to obtain the text to be processed.
And S1013, performing entity identification on the text to be processed through the entity identification model to obtain each event main body and main body information of the text to be processed.
In the present embodiment, the automatic summarization model may be any existing model that can automatically generate a summary, for example, Seq2Seq. The entity recognition model may be any existing model that can perform entity recognition, such as BiLSTM.
In this embodiment, the event subject may be a name of a person, a name of an organization, a name of a place, and other entities identified by names, such as a company. The main body information may include position information, weight, and the like of the event main body in the text to be processed, which is not limited in any way by the embodiment of the present application.
S102, performing event primary classification on a text to be processed by using a pre-trained text classification model to obtain a prediction category label and text heat information of the text to be processed, wherein the prediction category label is obtained by clustering at least one event type.
In this embodiment, the process of performing the primary event classification on the text to be processed by using the pre-trained text classification model in step S102 may include S1021 to S1024.
And S1021, inputting the text to be processed into the text classification model, and obtaining an embedding sequence of the text to be processed by using the Bert model, wherein the embedding sequence comprises word embedding for setting CLS symbols and word embedding for each word in the text to be processed.
S1022, learning semantic information of the text to be processed based on an attention mechanism by using a Bert model, and obtaining an attention matrix corresponding to the text to be processed and an output vector of a CLS symbol; wherein the attention matrix represents the similarity relation between the CLS symbol and each word in the text to be processed.
And S1023, classifying the output vector by using a multi-label classifier to obtain the probability value of each class label, and taking the class label with the probability value higher than a set threshold value as a prediction class label.
And S1024, performing linear transformation on the attention matrix by using the multi-label classifier to obtain text heat information, wherein the text heat information represents the relevance between a CLS symbol under the prediction class label and each word in the text to be processed.
In this embodiment, the Bert model uses the attention mechanism during text classification inference, and in this process the attention matrix corresponding to the text to be processed can be extracted as follows:

Attention = softmax(Q·K^T / √d_k)

The Attention matrix reflects the similarity relation between q and k at corresponding positions, i.e., the similarity relation between the CLS symbol and each word in the text to be processed. The Attention matrix is intermediate data of model inference; it is computed from the Query matrix Q ∈ ℝ^{l×h}, the Key matrix K ∈ ℝ^{l×h} and the constant d_k, giving Attention ∈ ℝ^{l×l}, where l is the length of the input sequence and h is the dimension of the hidden layer.
The Attention matrix is linearly transformed by the multi-label classifier to obtain the word heat under each category label, hot ∈ ℝ^{l×l×n}, where n is the number of category labels, as follows:

hot = sigmoid(Attention·W)

where the W matrix comes from the multi-label classifier.
in this embodiment, after the attention moment array is linearly transformed by using the multi-tag classifier to obtain the word heat under each class tag, hot information of the first CLS symbol is obtained, that is, matrix information hot [0,: i.e., a two-dimensional matrix of l × n is obtained from a hot three-dimensional matrix, where the first dimension is 0, and the correlation between the CLS symbol and each word in the text to be processed can be obtained.
Meanwhile, in step S1023 the multi-label classifier classifies the output vector of the CLS symbol to obtain the prediction category label of the text to be processed, and the multi-label classifier linearly transforms the Attention matrix to obtain the word heat under each category label; the text heat information of the prediction category label can therefore be read from the three-dimensional hot matrix.
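Extracting the CLS row from the three-dimensional hot matrix can be sketched as follows; the tensor shape (l, l, n) follows the description above, and the names are illustrative.

```python
import numpy as np

def cls_word_heat(hot, label_idx):
    """hot: (l, l, n) word-heat tensor, where row 0 corresponds to the
    CLS symbol; label_idx: index of the prediction category label.
    Returns the heat of each of the l words under that label."""
    return hot[0, :, label_idx]

hot = np.arange(18).reshape(3, 3, 2)   # toy tensor: l = 3 words, n = 2 labels
print(cls_word_heat(hot, 1))           # → [1 3 5]
```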
S103, obtaining a target event main body matched with the prediction type label according to the main body information and the text heat information of each event main body.
In this embodiment, a text classification model is used to perform event primary classification on a text to be processed, so as to obtain a prediction category tag and text heat information of the text to be processed, and then the prediction category tag is matched with an event main body, so as to obtain a target event main body matched with the prediction category tag.
Alternatively, the subject information may include location information, and the process of obtaining the target event subject matching the prediction category label according to the subject information and the text popularity information of each event subject in step S103 may include S1031 to S1033.
And S1031, according to the position information of each event main body, carrying out clause division on the text to be processed to obtain a text unit corresponding to each event main body.
In this embodiment, the text to be processed is divided into sentences according to the position information of the event body, and the text between the previous event body and the next event body is used as the text unit corresponding to the previous event body until the text unit corresponding to each event body is obtained.
S1032, calculating the text heat corresponding to each text unit according to the text heat information.
In this embodiment, after the text to be processed is divided into the text units corresponding to each event subject, the word popularity corresponding to each text unit is summed according to the text popularity information obtained in step S102, so as to obtain the text popularity corresponding to each text unit.
And S1033, taking the event main body corresponding to the text unit with the highest text popularity as the target event main body.
In this embodiment, after the text popularity corresponding to each text unit is calculated, the event subject corresponding to the text unit with the highest text popularity is taken as the target event subject. For example, assume the prediction category label is "financial loss or index variation" and there are two event subjects in the text to be processed, "Huaneng International" and "Yiwu Rural Commercial Bank"; if, for the "financial loss or index variation" event, the text popularity of "Huaneng International" is calculated to be higher than that of "Yiwu Rural Commercial Bank", then "Huaneng International" is matched with "financial loss or index variation".
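Steps S1031 to S1033 can be sketched as follows; subject positions are word indices, and the function name and data layout are illustrative.

```python
def match_event_subject(subjects, word_heat):
    """subjects: (name, start position) pairs sorted by position;
    word_heat: per-word heat under the prediction category label.
    Each subject's text unit runs from its position up to the next
    subject (the last one runs to the end of the text); the subject
    whose unit has the highest summed heat is the target subject."""
    best, best_heat = None, float("-inf")
    for i, (name, start) in enumerate(subjects):
        end = subjects[i + 1][1] if i + 1 < len(subjects) else len(word_heat)
        unit_heat = sum(word_heat[start:end])
        if unit_heat > best_heat:
            best, best_heat = name, unit_heat
    return best

subjects = [("Huaneng International", 0), ("Yiwu Rural Commercial Bank", 4)]
print(match_event_subject(subjects, [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]))  # → Huaneng International
```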
And S104, performing secondary event classification on the text to be processed by utilizing the pre-established key feature word library, and reducing a target event type from the prediction category label to obtain an event label of the text to be processed, wherein the event label comprises a target event main body and a target event type.
In this embodiment, because similar event types are clustered in the model training process, the prediction class label obtained by model prediction includes at least one event type, and therefore the prediction class label needs to be split and restored to obtain the target event type.
In this embodiment, the text classification model includes a plurality of category labels, each category label being obtained by clustering at least one event type, and the key feature word library includes the key feature words and weights corresponding to each event type under each category label. For example, in combination with fig. 1, category label 1 is obtained by clustering three event types, namely "leader person resignation", "core high management resignation" and "important high management resignation", and the key feature word library includes the key feature words and weights corresponding to each of these three event types.
Therefore, the secondary event classification can be performed on the text to be processed in a key feature word scoring manner, and the target event type can be restored from the prediction category label, and the specific process can include S1041 to S1044.
S1041, performing word segmentation on the text to be processed to obtain a plurality of reference words.
S1042, aiming at each event type under the prediction category label, determining each target key feature word of the event type from a plurality of reference words based on the key feature word library.
And S1043, obtaining the weight of each target key feature word and summing the weights to obtain the weight of the event type.
And S1044, taking the event type with the highest weight as a target event type.
In this embodiment, when performing secondary event classification, the text to be processed is first segmented, and then, in combination with the key feature word library, it is determined whether each word is a key feature word of an event type under the prediction category label, so as to obtain a key feature word of each event type under the prediction category label. And then, summing the weights of the key characteristic words of each event type to serve as the weight of the event type, comparing the weights of the event types, and selecting the event type with the highest weight from the event types to serve as the final target event type.
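Steps S1041 to S1044 can be sketched as follows; the bank contents are made-up examples, and multi-word key feature words would need phrase matching rather than the token-set lookup used here.

```python
def restore_event_type(tokens, keyword_bank):
    """tokens: word-segmented text to be processed; keyword_bank:
    {event type: {key feature word: weight}} for the event types
    clustered under the prediction category label. Sums the weights
    of the key feature words present in the text and returns the
    event type with the highest total weight."""
    token_set = set(tokens)
    scores = {
        etype: sum(w for kw, w in kws.items() if kw in token_set)
        for etype, kws in keyword_bank.items()
    }
    return max(scores, key=scores.get)

bank = {
    "leader person resignation": {"chairman": 2.0, "CEO": 1.5},
    "core high management resignation": {"CFO": 1.8},
}
print(restore_event_type(["the", "chairman", "resigned"], bank))  # → leader person resignation
```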
Compared with the prior art, the embodiment of the application has the following beneficial effects:
firstly, the event extraction method provided by the embodiment of the application can obtain event types with an accuracy approximating that of text classification, without extracting trigger words.
Secondly, determining a hot spot area of the event information by adopting an Attention matrix based on an Attention mechanism, selecting an event main body close to the hot spot area as a final event main body, and completing the matching of the event type and the event main body.
Thirdly, aiming at a text classification scene with low event type discrimination, in order to improve the accuracy of text classification, firstly clustering and merging event types according to text word embedded information of labeled corpora, then carrying out primary event classification through a text classification model, and then carrying out secondary classification on similar events according to a key feature word grading mechanism on classification results, thereby better solving the problem of similar event classification, improving the classification error caused by insufficient perception of the model on local key information, and further improving the overall accuracy of event extraction.
In order to perform the corresponding steps in the above method embodiments and various possible embodiments, an implementation of the event extraction device is given below.
Referring to fig. 4, fig. 4 is a block diagram illustrating an event extraction apparatus 100 according to an embodiment of the present disclosure. The event extraction device 100 is applied to the electronic equipment 10 and comprises: the event classification method comprises an obtaining module 101, an event primary classification module 102, an event subject matching module 103 and an event secondary classification module 104.
An obtaining module 101, configured to obtain a to-be-processed text and each event subject in the to-be-processed text and subject information thereof.
The event primary classification module 102 is configured to perform event primary classification on a to-be-processed text by using a pre-trained text classification model to obtain a prediction category label and text heat information of the to-be-processed text, where the prediction category label is obtained by clustering at least one event type.
And the event main body matching module 103 is configured to obtain a target event main body matched with the prediction category tag according to the main body information and the text popularity information of each event main body.
And the event secondary classification module 104 is configured to perform event secondary classification on the text to be processed by using the pre-established key feature word bank, and restore the target event type from the prediction category label to obtain an event label of the text to be processed, where the event label includes a target event main body and a target event type.
Optionally, the obtaining module 101 is specifically configured to:
acquiring an original text;
generating an abstract of an original text through an automatic abstract model to obtain a text to be processed;
and carrying out entity recognition on the text to be processed through the entity recognition model to obtain each event main body and main body information of the text to be processed.
Optionally, the text classification model includes a Bert model and a multi-label classifier, the multi-label classifier including a plurality of category labels; the event primary classification module 102 is specifically configured to:
inputting a text to be processed into a text classification model, and obtaining an embedding sequence of the text to be processed by utilizing a Bert model, wherein the embedding sequence comprises word embedding of a set CLS symbol and word embedding of each word in the text to be processed;
learning semantic information of the text to be processed based on an attention mechanism by using a Bert model, and obtaining an attention matrix corresponding to the text to be processed and an output vector of a CLS symbol; wherein the attention matrix represents the similarity relation between the CLS symbol and each word in the text to be processed;
classifying the output vector by using a multi-label classifier to obtain the probability value of each class label, and taking the class label with the probability value higher than a set threshold value as a prediction class label;
and performing linear transformation on the attention matrix by using a multi-label classifier to obtain text heat information, wherein the text heat information represents the relevance between the CLS symbol under the prediction class label and each word in the text to be processed.
Optionally, the subject information includes location information, and the event subject matching module 103 is specifically configured to:
according to the position information of each event main body, carrying out sentence separation on the text to be processed to obtain a text unit corresponding to each event main body;
calculating the text heat corresponding to each text unit according to the text heat information;
and taking the event main body corresponding to the text unit with the highest text popularity as a target event main body.
Optionally, the text classification model includes a plurality of category labels, and each category label is obtained by clustering at least one event type; the key characteristic word bank comprises a plurality of key characteristic words corresponding to each event type and the weight of each key characteristic word; the event secondary classification module 104 is specifically configured to:
performing word segmentation on a text to be processed to obtain a plurality of reference words;
determining each target key feature word of the event type from a plurality of reference words based on a key feature word library aiming at each event type under the prediction category label;
obtaining the weight of each target key feature word and summing the weights to obtain the weight of the event type;
and taking the event type with the highest weight as a target event type.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the event extraction apparatus 100 described above may refer to the corresponding process in the foregoing method embodiments, and is not described herein again.
Referring to fig. 5, fig. 5 is a block diagram illustrating an electronic device 10 according to an embodiment of the present disclosure. The electronic device 10 includes a processor 11, a memory 12, and a bus 13, and the processor 11 is connected to the memory 12 through the bus 13.
The memory 12 is used for storing a program, and the processor 11 executes the program after receiving the execution instruction to implement the event extraction method disclosed in the above embodiment.
The Memory 12 may include a Random Access Memory (RAM) and may also include a non-volatile Memory (NVM).
The processor 11 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by instructions in the form of hardware integrated logic circuits or software in the processor 11. The processor 11 may be a general-purpose processor, and includes a Central Processing Unit (CPU), a Micro Control Unit (MCU), a Complex Programmable Logic Device (CPLD), a Field Programmable Gate Array (FPGA), and an embedded ARM.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by the processor 11, the event extraction method disclosed in the foregoing embodiment is implemented.
To sum up, according to the event extraction method, the event extraction device, the electronic device, and the storage medium provided in the embodiments of the present application, when extracting an event from a text to be processed, a text classification model is first used to perform event primary classification on the text to be processed, so as to obtain a prediction category label and text heat information of the text to be processed; then, according to the subject information of each event subject in the text to be processed, and in combination with text heat information, finding out a target event subject matched with the prediction type label from all event subjects; then, because the prediction category label is obtained by clustering at least one event type, event secondary classification is carried out on the text to be processed by utilizing the key feature word bank, and a target event type is restored from the prediction category label, so that an event label of the text to be processed can be obtained; therefore, under the condition that trigger words do not need to be extracted, event extraction can be achieved in a text classification mode, and the event extraction efficiency is improved.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (13)

1. An event extraction method, the method comprising:
acquiring a text to be processed, and each event main body and main body information thereof in the text to be processed;
performing event primary classification on the text to be processed by using a pre-trained text classification model to obtain a prediction category label and text heat information of the text to be processed, wherein the prediction category label is obtained by clustering at least one event type;
obtaining a target event main body matched with the prediction type label according to the main body information of each event main body and the text heat information;
and performing event secondary classification on the text to be processed by utilizing a pre-established key feature word bank, and reducing a target event type from the prediction category label to obtain an event label of the text to be processed, wherein the event label comprises the target event main body and the target event type.
2. The method of claim 1, wherein the text classification model comprises a Bert model and a multi-label classifier, the multi-label classifier comprising a plurality of category labels;
the step of performing event primary classification on the text to be processed by using a pre-trained text classification model to obtain a prediction category label and text heat information of the text to be processed comprises the following steps:
inputting the text to be processed into the text classification model, and obtaining an embedding sequence of the text to be processed by using the Bert model, wherein the embedding sequence comprises word embedding of a set CLS symbol and word embedding of each word in the text to be processed;
learning semantic information of the text to be processed based on an attention mechanism by using the Bert model, and obtaining an attention matrix corresponding to the text to be processed and an output vector of the CLS symbol; wherein the attention matrix represents the similarity relation between the CLS symbol and each word in the text to be processed;
classifying the output vector by using the multi-label classifier to obtain a probability value of each category label, and taking each category label with a probability value higher than a set threshold as the prediction category label;
and performing linear transformation on the attention matrix by using the multi-label classifier to obtain the text heat information, wherein the text heat information represents the relevance, under the prediction category label, between the CLS symbol and each word in the text to be processed.
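The primary classification step above can be sketched as follows; the weight matrices `W_cls` and `W_heat` and the 0.5 threshold are illustrative assumptions, not parameters named in the claim:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def primary_classify(cls_vector, attn_row, W_cls, W_heat, threshold=0.5):
    """Multi-label primary classification over the Bert CLS output.

    cls_vector: the output vector of the CLS symbol; attn_row: attention
    weights from the CLS symbol to each word; W_cls / W_heat: hypothetical
    linear layers of the multi-label classifier. Returns the indices of the
    predicted category labels and the per-word text heat.
    """
    probs = sigmoid(W_cls @ cls_vector)        # one probability per category label
    predicted = [i for i, p in enumerate(probs) if p > threshold]
    heat = W_heat @ attn_row                   # linear transform of the attention row
    return predicted, heat
```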
3. The method of claim 1, wherein the subject information includes location information;
the step of obtaining a target event subject matched with the prediction category label according to the subject information of each event subject and the text heat information comprises:
performing sentence segmentation on the text to be processed according to the position information of each event subject to obtain a text unit corresponding to each event subject;
calculating the text heat of each text unit according to the text heat information;
and taking the event subject corresponding to the text unit with the highest text heat as the target event subject.
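A minimal sketch of claim 3's subject selection, assuming each subject's text unit is given as a token span and that a unit's heat is aggregated by summation (the claim itself does not fix the aggregation):

```python
def pick_target_subject(subjects, heat):
    """subjects: list of (name, start, end) token spans, one per event
    subject's text unit; heat: per-word text heat scores.
    Returns the subject whose text unit carries the highest total heat."""
    best_name, best_score = None, float("-inf")
    for name, start, end in subjects:
        score = sum(heat[start:end])   # text heat of this text unit
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```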
4. The method of claim 1, wherein the text classification model comprises a plurality of category labels, each category label being obtained by clustering at least one event type; the key feature word library comprises a plurality of key feature words corresponding to each event type and the weight of each key feature word;
the step of performing event secondary classification on the text to be processed by using a pre-established key feature word library and restoring a target event type from the prediction category label comprises the following steps:
performing word segmentation on the text to be processed to obtain a plurality of reference words;
for each event type under the prediction category label, determining each target key feature word of the event type from the plurality of reference words based on the key feature word library;
obtaining the weight of each target key feature word and summing the weights to obtain the weight of the event type;
and taking the event type with the highest weight as the target event type.
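Claim 4's secondary classification reduces to a weighted keyword lookup; the dictionary layout below ({event_type: {word: weight}}) is an assumed representation of the key feature word library:

```python
def secondary_classify(tokens, event_types, keyword_lib):
    """tokens: the word-segmented text (the reference words); event_types:
    the event types clustered under the prediction category label;
    keyword_lib: {event_type: {key feature word: weight}}.
    Sums the weights of each event type's key feature words found among the
    tokens and returns the highest-weighted type (the target event type)."""
    token_set = set(tokens)
    scores = {}
    for etype in event_types:
        words = keyword_lib.get(etype, {})
        scores[etype] = sum(w for word, w in words.items() if word in token_set)
    return max(scores, key=scores.get)
```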
5. The method of claim 1, wherein the step of acquiring the text to be processed, and each event subject and its subject information in the text to be processed, comprises:
acquiring an original text;
generating an abstract of the original text through an automatic abstract model to obtain the text to be processed;
and carrying out entity recognition on the text to be processed through an entity recognition model to obtain each event subject and its subject information in the text to be processed.
6. The method of claim 1, wherein the text classification model is trained by:
acquiring a supervised corpus, wherein the supervised corpus comprises a plurality of training samples and an event type of each training sample;
clustering all event types to obtain a plurality of category labels, wherein each category label comprises at least one event type;
and training the text classification model by using the training samples and the class labels to obtain the trained text classification model.
7. The method of claim 6, wherein the step of clustering all event types to obtain a plurality of category labels comprises:
converting each training sample into vectors through a pre-trained word embedding model to obtain the word embedding information of each training sample;
grouping the word embedding information of training samples with the same event type, and taking the mean value of all word embedding information in a group as the feature vector of that event type, so as to obtain the feature vector of each event type;
calculating the correlation of every two event types according to the feature vector of each event type;
and performing hierarchical clustering on all event types according to the correlation of every two event types to obtain the plurality of category labels.
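The clustering in claims 6 and 7 can be sketched as greedy agglomerative merging on cosine correlation; average linkage and stopping at a target cluster count are assumptions, since the claim only specifies hierarchical clustering by pairwise correlation:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cluster_event_types(features, num_labels):
    """features: {event_type: feature vector (mean of its word embeddings)}.
    Repeatedly merges the two most correlated clusters until num_labels
    clusters remain; each final cluster becomes one category label."""
    clusters = [[t] for t in features]
    while len(clusters) > num_labels:
        best, pair = -2.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(features[a], features[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)   # average-linkage correlation
                if score > best:
                    best, pair = score, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)          # merge the closest pair
    return clusters
```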
8. The method of claim 6, wherein the text classification model comprises a Bert model and a multi-label classifier, the multi-label classifier comprising the plurality of category labels;
the step of training the text classification model by using the training samples and the category labels to obtain a trained text classification model comprises:
inputting the training samples and the category labels into the text classification model, and obtaining a sample embedding sequence of each training sample by using the Bert model, wherein the sample embedding sequence comprises the word embedding of a preset CLS symbol and the word embedding of each word in the training sample;
learning semantic information of the sample embedding sequence by using the Bert model based on an attention mechanism to obtain an output vector of the CLS symbol;
classifying the output vector of the CLS symbol by using the multi-label classifier to obtain a prediction category label of the training sample;
and training the text classification model based on the category label and the prediction category label of each training sample and a preset loss function to obtain the trained text classification model.
9. The method of claim 8, wherein the loss function is:
L_total(x_k, y_k) = [1 + γ(1 − F1_body(x_k, y_k))] · L_DB(x_k, y_k)
wherein L_total represents the loss function, k indexes the training samples, x_k represents the k-th training sample, y_k represents its category label, γ represents the event-subject loss coefficient, F1_body represents the event-subject accuracy, and L_DB represents the classification loss function;
the event subject accuracy is:
F1_body(x_k, y_k) = (1/C) · Σ_{i=1..C} [ 2·TP_ki / (2·TP_ki + FP_ki + FN_ki) ]
wherein C represents the total number of category labels, i indexes the category labels of the multi-label classifier, and TP_ki, FP_ki and FN_ki represent the confusion-matrix indexes of the event-subject classification result for the i-th category label of the k-th training sample;
the classification loss function is:
L_DB(x_k, y_k) = (1/C) · Σ_{i=1..C} [ y_ki · r̂_ki · log(1 + e^(−(z_ki − v_i))) + (1 − y_ki) · (1/λ) · r̂_ki · log(1 + e^(λ(z_ki − v_i))) ]
wherein r̂_ki represents the smoothed weight of the i-th category label of the k-th training sample, z_ki represents the predicted output of the training sample for the i-th category label, λ is a hyperparameter influencing the loss weight of negative samples, and v_i is the weight bias of the i-th category label.
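A numeric sketch of the claim-9 loss under the symbol definitions above; since the patent's own formula images are not reproduced in this text, the macro-averaged F1 and the exact form of L_DB (the standard distribution-balanced loss) should be treated as reconstructions:

```python
import numpy as np

def f1_body(tp, fp, fn):
    """Event-subject accuracy: per-label F1 averaged over the C category labels."""
    tp, fp, fn = map(np.asarray, (tp, fp, fn))
    return float(np.mean(2 * tp / (2 * tp + fp + fn)))

def db_loss(y, z, r_hat, v, lam):
    """Distribution-balanced classification loss L_DB over C labels.
    y: 0/1 targets; z: predicted logits; r_hat: smoothed per-label weights;
    v: per-label weight bias; lam: negative-sample loss hyperparameter."""
    pos = y * r_hat * np.log1p(np.exp(-(z - v)))
    neg = (1 - y) * r_hat * (1.0 / lam) * np.log1p(np.exp(lam * (z - v)))
    return float(np.mean(pos + neg))

def total_loss(y, z, r_hat, v, lam, gamma, tp, fp, fn):
    """L_total = [1 + gamma * (1 - F1_body)] * L_DB, as in claim 9."""
    return (1 + gamma * (1 - f1_body(tp, fp, fn))) * db_loss(y, z, r_hat, v, lam)
```

Note that a perfect event-subject F1 reduces L_total to the plain classification loss, while subject errors scale it up by at most (1 + gamma).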
10. The method of claim 6, wherein the key feature word library is created by:
performing word segmentation on the supervised corpus and removing stop words to obtain a word segmentation result of each training sample;
removing, based on the word segmentation result of each training sample, the high-frequency common words of the training samples corresponding to each category label;
for each event type under any category label, screening out the distinctive high-frequency words of the training samples corresponding to the event type to obtain the key feature words of the event type;
for each key feature word of any event type under the category label, obtaining the weight of the key feature word according to the word frequency of the key feature word in the supervised corpus and the number of training samples corresponding to the event type;
and obtaining the weight of each key feature word of each event type under each category label, so as to obtain the key feature word library.
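The library construction in claim 10 can be sketched as follows; the cutoffs for "high-frequency" words (`top_common`, `top_special`) are assumptions, and each word's weight is taken as its term frequency divided by the number of samples of the event type, per the claim's description:

```python
from collections import Counter

def build_keyword_lib(corpus, top_common=2, top_special=3):
    """corpus: {event_type: list of word-segmented samples, stop words removed}.
    Removes high-frequency words common across types, keeps each type's own
    high-frequency words as its key feature words, and weights each word by
    its frequency over the number of samples of that type."""
    all_counts = Counter(w for samples in corpus.values()
                         for sample in samples for w in sample)
    common = {w for w, _ in all_counts.most_common(top_common)}
    lib = {}
    for etype, samples in corpus.items():
        counts = Counter(w for sample in samples for w in sample if w not in common)
        n = len(samples)
        lib[etype] = {w: c / n for w, c in counts.most_common(top_special)}
    return lib
```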
11. An event extraction device, the device comprising:
an acquisition module, configured to acquire a text to be processed, and each event subject and its subject information in the text to be processed;
an event primary classification module, configured to perform event primary classification on the text to be processed by using a pre-trained text classification model to obtain a prediction category label and text heat information of the text to be processed, wherein the prediction category label is obtained by clustering at least one event type;
an event subject matching module, configured to obtain a target event subject matched with the prediction category label according to the subject information of each event subject and the text heat information;
and an event secondary classification module, configured to perform event secondary classification on the text to be processed by using a pre-established key feature word library and restore a target event type from the prediction category label to obtain an event label of the text to be processed, wherein the event label comprises the target event subject and the target event type.
12. An electronic device, comprising a processor and a memory, the memory being configured to store a program, the processor being configured to implement the event extraction method of any one of claims 1-10 when executing the program.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out an event extraction method according to any one of claims 1 to 10.
CN202211717646.2A 2022-12-29 2022-12-29 Event extraction method and device, electronic equipment and storage medium Pending CN115935983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211717646.2A CN115935983A (en) 2022-12-29 2022-12-29 Event extraction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115935983A (en) 2023-04-07

Family

ID=86552265

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116340831A (en) * 2023-05-24 2023-06-27 京东科技信息技术有限公司 Information classification method and device, electronic equipment and storage medium
CN116340831B (en) * 2023-05-24 2024-02-06 京东科技信息技术有限公司 Information classification method and device, electronic equipment and storage medium
CN116501898A (en) * 2023-06-29 2023-07-28 之江实验室 Financial text event extraction method and device suitable for few samples and biased data
CN116501898B (en) * 2023-06-29 2023-09-01 之江实验室 Financial text event extraction method and device suitable for few samples and biased data


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination