CN113641888B - Event-related news filtering learning method based on fusion topic information enhanced PU learning - Google Patents

Event-related news filtering learning method based on fusion topic information enhanced PU learning

Info

Publication number
CN113641888B
CN113641888B CN202110347488.5A CN202110347488A CN113641888B
Authority
CN
China
Prior art keywords
news
event
training
learning
topic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110347488.5A
Other languages
Chinese (zh)
Other versions
CN113641888A (en)
Inventor
余正涛
王冠文
线岩团
张玉
黄于欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202110347488.5A priority Critical patent/CN113641888B/en
Publication of CN113641888A publication Critical patent/CN113641888A/en
Application granted granted Critical
Publication of CN113641888B publication Critical patent/CN113641888B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9532Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks

Abstract

The invention relates to an event-related news filtering method that enhances PU (positive-unlabeled) learning with fused topic information. Topic information is extracted from the labeled and unlabeled event-related news data sets by unsupervised pre-training, and the extracted topic information is injected into both the initial training and the subsequent iterative training of PU learning. More sample information can therefore be exploited when only a few initial event-related news samples are available, and topic enhancement during the iterations lets the classifier trained at each round obtain genuinely reliable positive and negative samples from the unlabeled data, improving the performance of the final event-related news classifier. The invention improves the F1 value by 1.8% over the PU-learning baseline model, with a larger lead when the initial sample set is small and the number of iterations is high. Enhancing PU learning with topic information can effectively alleviate the lack of training data in the event-related news filtering task.

Description

Event-related news filtering learning method based on fusion topic information enhanced PU learning
Technical Field
The invention relates to an event-related news filtering method based on fusing topic information to enhance PU learning, and belongs to the technical field of natural language processing.
Background
Event-related news filtering can generally be treated as a binary classification problem, and common approaches fall into keyword retrieval and machine learning. Early researchers matched news text against a collection of domain-related keywords using string-matching algorithms such as KMP and Sunday. Machine learning is currently an effective solution for event-related news filtering: researchers make assumptions about the data distribution with statistical methods, such as SVMs and decision trees, to infer event-related news categories, and deep learning has also been applied, with deep networks extracting hidden text features for classification. However, because event-related news scenarios are complex and changeable, a complete keyword set is hard to construct, so keyword retrieval alone cannot accomplish the filtering task. At the same time, owing to the regional and specific nature of event-related news, only small-scale event-related news data can be collected from cases that have already occurred; such data can hardly cover all situations and scenarios, and a large amount of unlabeled event-related news remains hidden in historical news. This lack of training data makes it difficult for machine-learning-based text filtering methods to achieve ideal results. Therefore, how to achieve better filtering performance with only a small number of event-related news samples is the focus of the invention.
The invention mainly considers using topic information to enhance PU learning for event-related news classification. Building on the PU-learning methods proposed by Yu et al., Liu et al., Ren et al., Li et al. and Xiao et al., the method fully exploits the topic information in news, integrating it into PU learning to explore event-related news text classification.
Disclosure of Invention
The invention provides an event-related news filtering method based on fusing topic information to enhance PU learning, which fully exploits the topic information implicit in news to improve the accuracy of event-related news filtering, and obtains better results on event-related news filtering tasks than other baseline methods.
The technical scheme of the invention is as follows: the method for enhancing event-related news filtering of PU learning based on fusion topic information comprises the following specific steps:
step1, training a classifier, enhanced by adding an unsupervised topic model (VAE);
step2, predicting unlabeled data with the trained classifier model, and ranking the predicted probabilities of the unlabeled news from high to low;
step3, after the initial training and prediction process is completed, carrying out the iterations of PU learning, i.e., retraining the classifier on the newly obtained training set and repeating the whole prediction and training process;
step4, putting all samples into the classifier for training to obtain the required event-related news classification model, which then filters out the required event-related news more accurately.
As a preferred embodiment of the present invention, the specific steps of Step1 are:
step1.1, extracting non-event-related news data with an improved I-DNF algorithm to obtain reliable counterexamples (negative examples) equal in number to the initial event-related news samples.
Step1.2, using a variational autoencoder (VAE) topic model to extract latent features, interpreted here as topic features, from the word-vector space of a document. Following prior work on the VAE principle, the invention implements this VAE structure and pre-trains it unsupervised on the entire event-related news data set, which is then used to train the initial classifier.
Step1.3, using a network consisting of an embedding layer and a bidirectional long short-term memory network (BiLSTM) as the classifier.
As a preferable scheme of the invention, the specific steps of the step Step1.1 are as follows:
step1.1.1, if a text feature occurs in more than 90% of the positive example set but in no more than 10% of the unidentified set, the feature is regarded as a positive-example feature;
step1.1.2, a positive-example feature set is built from such features whose occurrence frequencies differ between the positive example set and the unidentified set;
step1.1.3, any sample document in the unidentified set U that contains no feature from the positive feature set is extracted from U and identified as a counterexample (reliable negative).
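The extraction rule above can be sketched as follows; the 90%/10% thresholds come from the description, while the function names are illustrative and documents are assumed to be tokenized into word lists (the actual I-DNF implementation may differ):

```python
from collections import Counter

def doc_freq(docs):
    """Fraction of documents containing each feature (word)."""
    df = Counter()
    for d in docs:
        for w in set(d):
            df[w] += 1
    return {w: c / len(docs) for w, c in df.items()}

def extract_reliable_negatives(pos_docs, unlabeled_docs,
                               pos_thresh=0.9, unl_thresh=0.1):
    """I-DNF-style extraction: a feature frequent in the positive set
    (>90% of documents) but rare in the unidentified set (<=10%) is a
    positive feature; any unlabeled document containing no positive
    feature is taken as a reliable negative (counterexample)."""
    p_df = doc_freq(pos_docs)
    u_df = doc_freq(unlabeled_docs)
    pos_features = {w for w, f in p_df.items()
                    if f > pos_thresh and u_df.get(w, 0.0) <= unl_thresh}
    return [d for d in unlabeled_docs
            if not pos_features.intersection(d)]
```

In practice the returned negatives would be truncated to the same size as the initial positive set, as Step1.1 requires.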
As a preferred embodiment of the present invention, the step step1.2 includes:
step1.2.1, the variational autoencoder (VAE) has an encoder-decoder architecture: the encoder compresses the input into a latent distribution, and the decoder reconstructs the input signal by sampling from this distribution in the latent space of the data;
step1.2.2, the VAE model typically assumes that the posterior over the latent distribution of the input data is approximately Gaussian, and reconstruction is then performed through the decoding network;
step1.2.3, the decoding network in the invention is implemented with a fully connected network (MLP).
As a preferable scheme of the invention, the specific steps of the step Step1.3 are as follows:
step1.3.1, first, word embedding is applied to the input text with an Embedding network layer to obtain the word-embedding vectors; in addition, the input text is passed through the VAE topic model to obtain the topic vector of the news text, yielding two kinds of encoded information;
step1.3.2, the news topic vector is used to guide the word-embedding vectors; the resulting new matrix is the news encoding vector fused with the topic vector;
step1.3.3, the topic-fused news encoding vector is passed through a bidirectional long short-term memory (BiLSTM) layer to model context and obtain the news semantic representation vector.
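The topic-fusion step of Step1.3.2 (copying the 1×m topic vector n times and concatenating it to the n×v word-embedding matrix) can be sketched with NumPy; the shapes follow the description, while the function name is illustrative:

```python
import numpy as np

def fuse_topic(word_emb, topic_vec):
    """Tile the 1 x m topic vector over all n positions and concatenate
    it to the n x v word-embedding matrix, giving an n x (v + m) matrix
    X' that carries the topic signal at every time step.  X' is what
    the description feeds into the BiLSTM layer."""
    n = word_emb.shape[0]
    tiled = np.tile(topic_vec.reshape(1, -1), (n, 1))  # n copies of the topic vector
    return np.concatenate([word_emb, tiled], axis=1)   # shape n x (v + m)
```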
As a preferred embodiment of the present invention, the specific steps of Step2 are:
step2.1, class-probability prediction is performed on the remaining unlabeled data samples in the data set with the classifier and the topic model; the prediction result is the probability that a news item belongs to event-related news.
Step2.2, the predicted probabilities of the unlabeled news are ranked from high to low; in each prediction round, according to a fixed iteration stride, the highest-probability samples are taken as reliable event-related (positive) news and the lowest-probability samples as reliable negatives. These samples are removed from the unlabeled pool and added to the training data for the subsequent iterative training.
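A minimal sketch of one Step2 selection round, assuming the classifier's positive-class probabilities are already computed; the function and variable names are illustrative:

```python
def select_reliable(unlabeled, probs, step):
    """One PU-learning prediction round: rank unlabeled samples by the
    classifier's positive-class probability, take the `step` highest as
    reliable positives and the `step` lowest as reliable negatives, and
    return the rest as the new unlabeled pool."""
    order = sorted(range(len(unlabeled)), key=lambda i: probs[i], reverse=True)
    pos_idx = order[:step]            # most likely event-related
    neg_idx = order[-step:]           # most likely non-related
    taken = set(pos_idx) | set(neg_idx)
    reliable_pos = [unlabeled[i] for i in pos_idx]
    reliable_neg = [unlabeled[i] for i in neg_idx]
    remaining = [unlabeled[i] for i in range(len(unlabeled)) if i not in taken]
    return reliable_pos, reliable_neg, remaining
```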
As a preferred embodiment of the present invention, the specific steps of Step3 are:
step3.1, after completing the initial training and prediction process, the classifier is retrained on the newly obtained training set and the whole prediction and training process is repeated.
Step3.2, after each iteration the amount of unlabeled data decreases and the training set grows; when all unlabeled data have been predicted as reliable samples, the iterative process is complete.
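The whole Step3 iteration can be sketched as a loop that ends once the unlabeled pool is exhausted; `train_fn` and `predict_fn` stand in for the topic-enhanced classifier and are placeholders, not the patent's actual implementation:

```python
def pu_learning_loop(labeled_pos, labeled_neg, unlabeled,
                     train_fn, predict_fn, step):
    """Skeleton of the iterative PU procedure: retrain on the growing
    training set and move `step` reliable positives and negatives out
    of the unlabeled pool each round until it is empty, then train the
    final classifier on all samples (Step4)."""
    pos, neg, u = list(labeled_pos), list(labeled_neg), list(unlabeled)
    while u:
        model = train_fn(pos, neg)
        probs = predict_fn(model, u)
        order = sorted(range(len(u)), key=lambda i: probs[i], reverse=True)
        k = min(step, (len(u) + 1) // 2)   # avoid over-drawing a small pool
        top, bottom = order[:k], order[-k:]
        taken = set(top) | set(bottom)
        pos += [u[i] for i in top]
        neg += [u[i] for i in bottom if i not in set(top)]
        u = [u[i] for i in range(len(u)) if i not in taken]
    return train_fn(pos, neg)  # final classifier trained on all samples
```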
The beneficial effects of the invention are as follows:
the invention applies the PU learning method to the event-related news filtering task, and effectively solves the problem of event-related news filtering under the condition of a small number of manual labels.
According to the invention, the topic information of the event-related news data is extracted in an unsupervised pre-training mode, and the training process of PU learning is enhanced by using the topic information, so that the accuracy is improved compared with that of ordinary PU learning.
An event related news data set is constructed and experiments are carried out by using the method, and experimental results show that the method provided by the invention obtains better results in the experiments compared with a PU learning method without theme enhancement.
Drawings
FIG. 1 is a general model diagram of the present invention;
FIG. 2 is a diagram of a PU learning training process in the present invention;
FIG. 3 is a graph of experimental results on a validation set in accordance with the present invention;
FIG. 4 is a graph of experimental results on unlabeled datasets in the present invention;
FIG. 5 is a graph of initial data versus experimental results for different scales in the present invention;
fig. 6 is a graph of iterative steps versus experimental results in the present invention.
Detailed Description
Example 1: as shown in fig. 1-5, a learning method for enhancing event-related news filtering for PU learning based on fusion topic information includes the following specific steps:
step1, training a classifier, enhanced by adding an unsupervised topic model (VAE).
Step2, predicting unlabeled data with the trained classifier model, and ranking the predicted probabilities of the unlabeled news from high to low.
Step3, after the initial training and prediction process is completed, carrying out the iterations of PU learning, i.e., retraining the classifier on the newly obtained training set and repeating the whole prediction and training process.
Step4, putting all samples into the classifier for training to obtain the required event-related news classification model, which then filters out the required event-related news more accurately.
The specific steps of the Step1 are as follows:
step1.1, extracting non-event-related news data with an improved I-DNF algorithm to obtain reliable counterexamples (negative examples) equal in number to the initial event-related news samples.
Step1.2, using a variational autoencoder (VAE) topic model to extract latent features, interpreted here as topic features, from the word-vector space of a document. Following prior work on the VAE principle, the invention implements this VAE structure and pre-trains it unsupervised on the entire event-related news data set, which is then used to train the initial classifier.
Step1.3, using a network consisting of an embedding layer and a bidirectional long short-term memory network (BiLSTM) as the classifier.
Example 2: as shown in fig. 1 to 5, this embodiment is the same as Embodiment 1 of the learning method for event-related news filtering based on fused topic information enhanced PU learning, except as follows:
as a preferable scheme of the invention, the specific steps of the step Step1.1 are as follows:
step1.1.1, if a text feature occurs in more than 90% of the positive example set but in no more than 10% of the unidentified set, the feature is regarded as a positive-example feature.
Step1.1.2, a positive-example feature set is built from such features whose occurrence frequencies differ between the positive example set and the unidentified set.
Step1.1.3, any sample document in the unidentified set U that contains no feature from the positive feature set is extracted from U and identified as a counterexample (reliable negative).
As a preferred embodiment of the present invention, the step step1.2 includes:
step1.2.1, the variational autoencoder (VAE) has an encoder-decoder architecture. The encoder compresses the input into a latent distribution Z, and the decoder reconstructs the input signal D by sampling from the distribution of Z in the latent space of the data:
P(D) = ∫ P(D|Z) P(Z) dZ (1)
where Z represents the latent distribution and P(D|Z) describes the probability of generating D from Z.
Step1.2.2, typically, the VAE model assumes that the posterior probability of the potential distribution Z of the input data D approximately meets the gaussian distribution, i.e
logP(Z|d (i) )=logN(z;μ (i)2(i) I) (2)
Wherein d is (i) Representing a certain real sample in D, each μ and δ 2 All are composed of d (i) Generated by a neural network. By the mu obtained (i) And delta 2(i) Each d can be obtained (i) Corresponding distribution P (Z (i) |d (i) ) Then through the decoding networkRestructuring +.>
step1.2.3, the generation of μ and δ^2 and the decoding network Decode are implemented with fully connected networks (MLP), where m represents the preset number of latent topics. After this computation, the latent topic distribution of the event-related news can be expressed as θ ∈ R^m. To make the reconstructed data as close as possible to the original data, the final optimization objective of the VAE is to maximize the generation probability P(d^(i)) of each d^(i), while a KL-divergence term keeps the posterior P(Z^(i)|d^(i)) as close as possible to its variational prior N(0, I); the objective takes the standard form
maximize E_{P(Z^(i)|d^(i))}[logP(d^(i)|Z)] − KL(P(Z^(i)|d^(i)) || N(0, I)) (3)
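A minimal NumPy sketch of the reparameterized Gaussian posterior and its KL term described in Step1.2; the linear maps stand in for the MLP encoder, and interpreting a softmax over z as the topic distribution θ is an assumption, not necessarily the patent's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_encode_sample(x, W_mu, W_logvar):
    """Reparameterization step of a VAE topic model: linear (MLP-style)
    maps produce mu and log(delta^2) for the Gaussian posterior P(Z|d),
    then z = mu + delta * eps with eps ~ N(0, I).  Returns the sample z
    and the KL(N(mu, delta^2 I) || N(0, I)) regularizer from eq. (3)."""
    mu = x @ W_mu                        # posterior mean
    logvar = x @ W_logvar                # log variance, delta^2 = exp(logvar)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # differentiable sample of Z
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return z, kl

def topic_distribution(z):
    """Softmax over z gives a latent topic distribution theta in R^m."""
    e = np.exp(z - z.max())
    return e / e.sum()
```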
As a preferable scheme of the invention, the specific steps of the step Step1.3 are as follows:
step1.3.1, firstly, word Embedding is carried out on an input text by using an Embedding network layer to obtain a word Embedding vectorWhere n represents the news text length and v is the word vector dimension. In addition, the input text is passed through the VAE topic model to obtain topic vector +.>Wherein m is a preset subject numberA number. Two kinds of encoded information are obtained.
Step1.3.2, use of news topic vectorsTo guide the word embedding vector X. Since the topic vector obtained by the topic model is a vector with a shape of 1*m, n copies of the topic vector are copied, and after the topic vector is spliced into the word embedded vector X, the new matrix X' is the news coded vector integrated into the topic vector.
And step1.3.3, modeling the context relation of the news coding vector after integrating the theme information through a two-way long-short-term memory network layer (BiLSTM) to obtain a news semantic characterization vector. The specific formula is shown below.
Where H is the BiLSTM encoded sentence vector, q is the BiLSTM hidden layer dimension, and y represents the final probability output.
As a preferred embodiment of the present invention, the specific steps of Step2 are:
step2.1, class-probability prediction is performed on the remaining unlabeled data samples in the data set with the classifier and the topic model; the prediction result is the probability that a news item belongs to event-related news.
Step2.2, the predicted probabilities of the unlabeled news are ranked from high to low; in each prediction round, according to a fixed iteration stride, the highest-probability samples are taken as reliable event-related (positive) news and the lowest-probability samples as reliable negatives. These samples are removed from the unlabeled pool and added to the training data for the subsequent iterative training.
As a preferred embodiment of the present invention, the specific steps of Step3 are:
step3.1, after completing the initial training and prediction process, the classifier is retrained on the newly obtained training set and the whole prediction and training process is repeated.
Step3.2, after each iteration the amount of unlabeled data decreases and the training set grows; when all unlabeled data have been predicted as reliable samples, the iterative process is complete.
The invention constructs an event-related news data set for experiments and, with the proposed method, conducts three types of experiments. The first is a comparison with a topic-free PU classification algorithm, analyzing the prediction performance of both during iterative training; the second compares initial data sets of different scales; the third compares iteration strides, verifying the effectiveness of the method against the topic-free PU classification algorithm under different strides. The experimental results verify the effectiveness of the method on the event-related news relevance analysis task and also demonstrate the effect of topic information in enhancing model performance during the PU-learning iterations.
The choice of experimental parameters directly influences the final result. The news texts in the data set are roughly 100 to 250 characters long; for convenient experimental verification, all data were manually labeled, comprising 10000 event-related and 20000 non-event-related news items. The maximum text length is set to 200 characters; Adam is adopted as the optimizer; the learning rate is set to 0.001; the dropout of the single-layer Bi-LSTM is set to 0.2; the batch size is set to 128; the number of training epochs is set to 20; the number of training iterations equals the ratio of the total amount of unlabeled data to the number of positive and negative samples extracted each time. The evaluation indices are accuracy (Acc.), precision (P), recall (R) and the F1 value.
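The evaluation indices above follow their standard definitions; a minimal sketch computing them from confusion-matrix counts (the inputs tp, fp, fn are hypothetical):

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts (standard definitions; accuracy additionally
    needs the true-negative count)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```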
Comparing the method with the traditional PU learning method through the three experiments, the invention finds that the method is superior to the traditional PU method in each iteration when the initial data scale and the iteration step are fixed. When the initial data size is smaller or the iteration step is larger, the performance of the method is improved more and is more stable.
The comparison with the topic-free PU classification algorithm mainly verifies the method's effectiveness on event-related news filtering with only a few event-related news samples, and the enhancement that topic information brings to the PU-learning iterations. Two groups of experiments were set up: one evaluates the generalization of the iteratively trained classifier on a held-out validation set, with results shown in fig. 3; the other evaluates the prediction of the classifier trained at each iteration on the remaining unlabeled samples, with results shown in fig. 4. Analysis of fig. 3 shows that with the data set and classification model used, the upper-limit F1 value reaches 83.4% in the "supervised" setting, while plain "PU learning" reaches only 73.9%, a difference of 13.7%; under the same experimental setting the proposed method reaches an F1 value of 75.7%, an improvement of 1.8% over "PU learning". This illustrates both the effectiveness of the method on event-related news filtering problems with only a small number of event-related news samples and the enhancement the topic model brings to PU learning.
Analysis of fig. 4 shows that the method's prediction performance on unlabeled data consistently leads the traditional PU-learning scheme, and the gap widens as the number of iterations grows, again illustrating that the method is an effective improvement over PU learning.
In the comparison experiments with initial data of different scales, the method again outperforms topic-free PU learning on the held-out validation set; the results are shown in fig. 5. When the initial labeled data size is only 500, traditional PU learning almost fails: PU learning depends on the initial labeled data, and when it is too small the trained classifier's precision is too low, the reliable positive and negative samples obtained in subsequent predictions carry large errors, and as iteration proceeds these errors accumulate and amplify until PU learning fails. As the initial data size increases, the per-iteration error shrinks and the final training result improves; this is a common phenomenon of PU learning, which the method here also follows. In contrast, the method shows better adaptability when the initial data set is small: with an initial data size of only 750, the F1 gap between the method and traditional PU learning reaches 9.4%, and the gap gradually narrows as the initial data size grows. This is because, when the initial data are scarce, the unsupervised topic model brings extra information to the small amount of event-related training data, so the initial classifier's performance does not degrade too much, which ultimately alleviates error accumulation.
In the iteration-stride comparison experiment, the method again outperforms topic-free PU learning on the held-out validation set; the results are shown in fig. 6. Analysis of fig. 6 shows that with iteration strides of 300 and 500, both the invention and traditional PU learning stay at a good level; as the stride grows further, the performance of traditional PU learning drops sharply while the method remains at a good level. However, when the stride reaches 1500 both fail, because with an initial data size of only 1000 the precision of the PU-trained classification model is limited, and even topic-information enhancement cannot reach the required precision.
Therefore, the learning method can better filter the event-related news, effectively solve the problem of lack of training data in the event-related news filtering task, and improve the accuracy of the event-related news filtering result.
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (3)

1. A learning method for event-related news filtering based on fused topic information enhanced PU learning, characterized by comprising the following specific steps:
step1, training a classifier, and adding an unsupervised topic model to enhance;
step2, predicting unlabeled data with the trained classifier model, and ranking the predicted probabilities of the unlabeled news from high to low;
the specific steps of the Step2 are as follows:
step2.1, performing class-probability prediction on the remaining unlabeled data samples in the data set with the classifier and the topic model; the prediction result is the probability that a news item belongs to event-related news;
step2.2, ranking the predicted probabilities of the unlabeled news from high to low; in each prediction round, according to a fixed iteration stride, taking the highest-probability samples as reliable event-related (positive) news and the lowest-probability samples as reliable negatives, removing these samples from the unlabeled pool, and adding them to the training data for the subsequent iterative training;
step3, after the primary training and prediction process is completed, carrying out iteration of PU learning, namely retraining a classifier on the newly obtained training set and repeating the whole prediction and training process;
step4, putting all samples into a classifier for training to obtain a required event related news classification model, and filtering out required event related news based on the event related news classification model;
the Step1 includes:
extracting non-event related news data by using an I-DNF algorithm, obtaining counterexamples with the same scale as the initial event related news, training an initial classifier, and adding an unsupervised topic model VAE for enhancement;
wherein, the network structure of the two-way long-short-term memory network BiLSTM is used as a classifier;
firstly, word embedding is performed on the input text with an Embedding network layer to obtain the word-embedding matrix X ∈ R^(n×v), where n represents the news text length and v is the word-vector dimension; in addition, the input text is passed through the VAE topic model to obtain the topic vector θ ∈ R^(1×m), where m is the preset number of topics, yielding two kinds of encoded information;
the topic vector θ of the news text is used to guide the word-embedding matrix X; since the topic vector obtained from the topic model has shape 1×m, n copies of it are concatenated to the word-embedding matrix X, and the resulting new matrix X' is the news encoding vector fused with the topic vector;
the topic-fused news encoding vector is modeled by a bidirectional long short-term memory network layer BiLSTM to obtain the news semantic representation vector, with the specific formula
H = BiLSTM(X'), y = σ(MLP(H))
where H ∈ R^(n×2q) is the BiLSTM-encoded sentence vector, q is the BiLSTM hidden-layer dimension, and y represents the final probability output.
2. The learning method of event-related news filtering based on merging topic information enhanced PU learning of claim 1, wherein: the specific steps of obtaining the counterexample with the same scale as the news related to the initial event are as follows:
step1.1.1, the frequency of occurrence of a text feature in the positive example set is more than 90%, and the frequency of occurrence of the text feature in the unidentified set is only 10%, so that the feature is regarded as the feature of the positive example;
step1.1.2, a positive example feature set is established from the features whose occurrence frequencies differ between the positive example set and the unidentified set in this way;
step1.1.3, any sample document in the unidentified set U that contains no feature from the positive example feature set is extracted from the unidentified set U and identified as a counterexample.
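Steps 1.1.1 through 1.1.3 can be sketched as follows, assuming each document is represented as a set of string features (the thresholds are the 90%/10% rule above; the sample documents are hypothetical).

```python
def idnf_reliable_negatives(positives, unlabeled, p_thresh=0.9, u_thresh=0.1):
    """I-DNF-style counterexample extraction.

    A feature is a positive-example feature when it appears in more than
    p_thresh of the positive set but in at most u_thresh of the unlabeled
    set. Unlabeled documents containing no such feature are returned as
    reliable negatives (counterexamples).
    """
    vocab = set().union(*positives, *unlabeled)

    def freq(feat, docs):
        return sum(feat in d for d in docs) / len(docs)

    pos_features = {f for f in vocab
                    if freq(f, positives) > p_thresh
                    and freq(f, unlabeled) <= u_thresh}
    negatives = [d for d in unlabeled if not (d & pos_features)]
    return negatives, pos_features

# Hypothetical toy corpus: "quake" appears in all positives, rarely in U.
P = [{"quake", "rescue"}, {"quake", "aid"}]
U = [{"quake", "stock"}] + [{"sports", f"doc{i}"} for i in range(9)]
negs, feats = idnf_reliable_negatives(P, U)
print(len(negs), feats)   # 9 {'quake'}
```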
3. The learning method of event-related news filtering based on fusion topic information enhanced PU learning of claim 1, wherein the specific steps of Step3 are as follows:
step3.1, after the initial training and prediction process is completed, the classifier is retrained on the newly obtained training set, and the whole prediction-and-training process is repeated;
step3.2, after each iteration, the amount of unlabeled data decreases and the training set grows; once all unlabeled data has been predicted as reliable samples, the iteration process ends.
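The iteration in steps 3.1 and 3.2 can be sketched generically as below. The confidence threshold and the toy centroid classifier in the usage example are hypothetical stand-ins; the loop structure (retrain, absorb confident predictions, stop when U is exhausted or stalls) mirrors the claim.

```python
def iterative_pu_train(fit, predict_conf, positives, negatives, unlabeled,
                       conf=0.8):
    """Repeat train -> predict until unlabeled data is absorbed.

    fit(P, N) returns a model; predict_conf(model, x) returns P(x positive).
    Items scored >= conf join the positives, items <= 1 - conf join the
    negatives; the rest stay unlabeled for the next round. Stops when U is
    empty or no item crosses either threshold.
    """
    P, N, U = list(positives), list(negatives), list(unlabeled)
    while U:
        model = fit(P, N)
        still = []
        for x in U:
            p = predict_conf(model, x)
            if p >= conf:
                P.append(x)
            elif p <= 1 - conf:
                N.append(x)
            else:
                still.append(x)
        if len(still) == len(U):   # nothing moved; avoid an infinite loop
            break
        U = still
    return P, N, U

# Toy 1-D example with a nearest-centroid "classifier".
def fit(P, N):
    return (sum(P) / len(P), sum(N) / len(N))

def predict_conf(model, x):
    cp, cn = model
    dp, dn = abs(x - cp), abs(x - cn)
    return dn / (dp + dn + 1e-9)

P2, N2, U2 = iterative_pu_train(fit, predict_conf,
                                [9.0, 10.0], [0.0, 1.0], [8.5, 0.5, 5.0])
print(U2)   # [5.0] stays unlabeled: it never crosses either threshold
```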
CN202110347488.5A 2021-03-31 2021-03-31 Event-related news filtering learning method based on fusion topic information enhanced PU learning Active CN113641888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110347488.5A CN113641888B (en) 2021-03-31 2021-03-31 Event-related news filtering learning method based on fusion topic information enhanced PU learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110347488.5A CN113641888B (en) 2021-03-31 2021-03-31 Event-related news filtering learning method based on fusion topic information enhanced PU learning

Publications (2)

Publication Number Publication Date
CN113641888A CN113641888A (en) 2021-11-12
CN113641888B true CN113641888B (en) 2023-08-29

Family

ID=78415731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110347488.5A Active CN113641888B (en) 2021-03-31 2021-03-31 Event-related news filtering learning method based on fusion topic information enhanced PU learning

Country Status (1)

Country Link
CN (1) CN113641888B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501898B (en) * 2023-06-29 2023-09-01 之江实验室 Financial text event extraction method and device suitable for few samples and biased data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108881196A (en) * 2018-06-07 2018-11-23 中国民航大学 The semi-supervised intrusion detection method of model is generated based on depth
CN110263166A (en) * 2019-06-18 2019-09-20 北京海致星图科技有限公司 Public sentiment file classification method based on deep learning
CN110852419A (en) * 2019-11-08 2020-02-28 中山大学 Action model based on deep learning and training method thereof
CN111538807A (en) * 2020-04-16 2020-08-14 上海交通大学 System and method for acquiring Web API knowledge based on Stack Overflow website
CN112434744A (en) * 2020-11-27 2021-03-02 北京奇艺世纪科技有限公司 Training method and device for multi-modal feature fusion model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050114313A1 (en) * 2003-11-26 2005-05-26 Campbell Christopher S. System and method for retrieving documents or sub-documents based on examples
US11468358B2 (en) * 2017-11-30 2022-10-11 Palo Alto Networks (Israel Analytics) Ltd. Framework for semi-supervised learning when no labeled data is given
US11544558B2 (en) * 2019-08-30 2023-01-03 Nec Corporation Continual learning of artificial intelligence systems based on bi-level optimization


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Social Media Relevance Filtering Using Perplexity-Based Positive-Unlabelled Learning;Sunghwan Mac Kim et al;《Proceedings of the Fourteenth International AAAI Conference on Web and Social Media》;第370-379页 *

Also Published As

Publication number Publication date
CN113641888A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN107239529B (en) Public opinion hotspot category classification method based on deep learning
CN111914091B (en) Entity and relation combined extraction method based on reinforcement learning
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN113868432B (en) Automatic knowledge graph construction method and system for iron and steel manufacturing enterprises
CN104408153A (en) Short text hash learning method based on multi-granularity topic models
CN112101027A (en) Chinese named entity recognition method based on reading understanding
Zhang et al. Data-driven meta-set based fine-grained visual recognition
CN112070139B (en) Text classification method based on BERT and improved LSTM
CN113553831B (en) Method and system for analyzing aspect level emotion based on BAGCNN model
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113378563B (en) Case feature extraction method and device based on genetic variation and semi-supervision
Wu et al. Optimized deep learning framework for water distribution data-driven modeling
CN113609284A (en) Method and device for automatically generating text abstract fused with multivariate semantics
CN108681532B (en) Sentiment analysis method for Chinese microblog
CN116245107A (en) Electric power audit text entity identification method, device, equipment and storage medium
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN113886562A (en) AI resume screening method, system, equipment and storage medium
CN113641888B (en) Event-related news filtering learning method based on fusion topic information enhanced PU learning
CN115952292A (en) Multi-label classification method, device and computer readable medium
CN116010793A (en) Classification model training method and device and category detection method
Jeyakarthic et al. Optimal bidirectional long short term memory based sentiment analysis with sarcasm detection and classification on twitter data
CN114676346A (en) News event processing method and device, computer equipment and storage medium
CN117787283A (en) Small sample fine granularity text named entity classification method based on prototype comparison learning
CN116680590B (en) Post portrait label extraction method and device based on work instruction analysis
CN116108840A (en) Text fine granularity emotion analysis method, system, medium and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant