CN110032733A - A kind of rumour detection method and system for news long text - Google Patents

A kind of rumour detection method and system for news long text Download PDF

Info

Publication number
CN110032733A
CN110032733A CN201910184862.7A CN201910184862A CN110032733A CN 110032733 A CN110032733 A CN 110032733A CN 201910184862 A CN201910184862 A CN 201910184862A CN 110032733 A CN110032733 A CN 110032733A
Authority
CN
China
Prior art keywords
rumour
paragraph
collection
data
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910184862.7A
Other languages
Chinese (zh)
Inventor
曹娟
钟雷
郭俊波
李锦涛
谢添
刘浩远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201910184862.7A priority Critical patent/CN110032733A/en
Publication of CN110032733A publication Critical patent/CN110032733A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a kind of rumour detection methods and system for news long text, it include: to obtain the text for being greater than default number of words in specified news platform as long text, extract the keyword of paragraph in long text, and social data is obtained with the keyword retrieval social platform, the related data of the paragraph is obtained using text relevant algorithm;Obtain labeled data collection, labeled data collection includes the multiple social datas for having marked rumour information, use the multiple disaggregated models of labeled data collection training, and the disaggregated model collection that training is completed is combined into Fusion Model, the confidence score of related data is obtained using Fusion Model, to represent paragraph as the probability of non-rumour.The present invention solves the problems, such as to be difficult to directly differentiate long article using heterologous detection method.

Description

A kind of rumour detection method and system for news long text
Technical field
The present invention relates to the rumour detection field in big data analysis, in particular to a kind of rumour for news long text Detection method and system.
Background technique
It is the main source that people obtain news information as Internet news media platform is closely bound up with our life One of.However, there is a large amount of deceptive information, especially part news media platforms in media platform to increase information source It introduces that the long article text quality for causing wherein to issue from media number is irregular, easily becomes the issue source of rumour, these letters It ceases and the normal spin and civil life of society is brought and seriously affected, carrying out rumour detection for media platform also becomes It is particularly important.Long article data involved in this patent, which refer to, is present in the data that text size in news media's platform is greater than 140. The mode of traditional manual identified rumour needs to expend a large amount of manpower and material resources, it is difficult to meet requirement of real-time, and long article number More disperse according to semantic information, the artificial cost that marks further increases.Currently, carrying out rumour detection using machine learning method Work in, mainly using microblogging, push away short text data in top grade platform as research object, for news such as " flash reports everyday " Long article data research in media platform is less.Microblogging, the short essay data pushed away in top grade platform can provide more for learning algorithm More learning characteristics.Such as content characteristic, user characteristics, propagation characteristic, temporal characteristics etc., in conjunction with currently a popular engineering Algorithm or deep learning algorithm are practised, has had reached higher accuracy for the rumour detection method of short essay data.And due to Masses cannot participate in the contents production of news media's platform, therefore long article data are rich without social media data in such platform Rich data characteristics, it is available that common detection algorithm usually only has content of text, and finds according to the observation, long article text It is usually weaker in the characteristic aspects distinction such as semanteme, emotion, punctuation mark, so that sorting algorithm is difficult to ensure accuracy.Therefore The present invention proposes a kind of new rumour detection method for news media's platform long article data.
In the rumour detection method based on content, main explicit features and semantic implicit features using syntax.It is aobvious In terms of formula character, the prior art proposes to use word feature, symbolic feature and the simple affective characteristics of content of text; Whether the prior art proposes using string length, word number, includes punctuation mark, issuing time etc. feature.It is implicit special Sign aspect, the prior art are expressed using the hidden layer of Recognition with Recurrent Neural Network study message, improve experiment effect;The prior art uses Term vector obtains the semantic meaning representation of text using convolutional neural networks as input.Since the platform datas text such as microblogging is shorter, Information is concentrated, and text style is different, therefore content characteristic can make rumour detection obtain better effect.And in news media's platform Data text it is longer, semanteme dispersion, clause grammer is plain, is difficult to obtain preferable classifying quality using only content characteristic.
Research at present for the detection of long article rumour is less, and the prior art is for " food health " " medical health " two necks The long article in domain carries out rumour identification, according to " rumour has the characteristics that abnormal emotion feature ", the method for proposing to use sentiment analysis Carry out rumour detection.But this method does not have universality, only effective to certain types of rumour.
For the problem present on, the present invention proposes that a kind of rumour for long article data in news media's platform detects Method.It has been observed that the rumour in long article data is usually only present in some paragraph, this method is by more mature micro- Rich short essay rumour detection method, is first accounted for long article as unit of paragraph, is extracted to each paragraph corresponding crucial Word, into microblog, search obtains microblog data, under the premise of guaranteeing that microblog data is relevant to long article paragraph content, uses Fusion Model calculates the confidence level of microblog data, and then obtains the confidence score of each paragraph in long article.
Summary of the invention
In view of the above-mentioned problems, the present invention proposes a kind of rumour detection method for long article data in news media's platform, It mainly solves the problems, such as to be assessed to find close data in microblog, while providing the credible of each paragraph in long article Spend score.
In particular it relates to a kind of rumour detection method for news long text, including:
Step 1 obtains the text for being greater than default number of words in specified news platform as long text, extracts the long text middle section The keyword fallen, and social data is obtained with the keyword retrieval social platform, the paragraph is obtained using text relevant algorithm Related data;
Step 2 obtains labeled data collection, which includes the multiple social datas for having marked rumour information, is made It is combined into Fusion Model with the multiple disaggregated models of labeled data collection training, and by the disaggregated model collection that training is completed, is melted using this Molding type obtains the confidence score of the related data, to represent the paragraph as the probability of non-rumour.
This is directed to the rumour detection method of news long text, and wherein the step 1 includes: to use TF- for each paragraph IDF method is extracted to obtain the keyword of paragraph.
This is directed to the rumour detection method of news long text, and wherein the step 1 includes: to calculate the social activity with the keyword Similarity between data and the paragraph, and gather the social data that the similarity is greater than threshold value, as the related data.
This is directed to the rumour detection method of news long text, and wherein multiple disaggregated model includes: supporting vector in step 2 Machine, random forest, extreme random tree, gradient promote decision tree, the promotion of limit gradient and Logic Regression Models.
This is directed to the rumour detection method of news long text, wherein the multiple disaggregated models of training in the step 2 specifically:
Labeled data collection is divided into training set and test set, and training set is divided into 5 foldings of same size, for the support Vector machine, the random forest, the extreme random tree, the gradient promote decision tree, the limit gradient lift scheme, choose instruction respectively Practice 4 foldings concentrated to be trained, residue 1 folds into capable prediction, by the support vector machines, the random forest, the extreme random tree, is somebody's turn to do Gradient promotes decision tree, the respective prediction result collection of limit gradient lift scheme is combined into the first middle trained collection, and each While secondary trained, which is predicted, if being b to the prediction result each time of test seti, by the support to Amount machine, the random forest, the extreme random tree, the gradient promote the respective prediction knot of decision tree, the limit gradient lift scheme Fruit is averaged to obtain the second middle trained collection, is instructed using the Logic Regression Models in the first middle trained collection and first centre Practice and train and test on collection, obtains the final Fusion Model.
The invention also discloses a kind of rumour detection system for news long text, including:
Module 1 obtains the text for being greater than default number of words in specified news platform as long text, extracts the long text middle section The keyword fallen, and social data is obtained with the keyword retrieval social platform, the paragraph is obtained using text relevant algorithm Related data;
Module 2 obtains labeled data collection, which includes the multiple social datas for having marked rumour information, is made It is combined into Fusion Model with the multiple disaggregated models of labeled data collection training, and by the disaggregated model collection that training is completed, is melted using this Molding type obtains the confidence score of the related data, to represent the paragraph as the probability of non-rumour.
This is directed to the rumour detection system of news long text, and wherein the module 1 includes: to use TF- for each paragraph IDF method is extracted to obtain the keyword of paragraph.
This is directed to the rumour detection system of news long text, and wherein the module 1 includes: to calculate the social activity with the keyword Similarity between data and the paragraph, and gather the social data that the similarity is greater than threshold value, as the related data.
This is directed to the rumour detection system of news long text, and wherein multiple disaggregated model includes: supporting vector in module 2 Machine, random forest, extreme random tree, gradient promote decision tree, the promotion of limit gradient and Logic Regression Models.
This is directed to the rumour detection system of news long text, wherein the multiple disaggregated models of training in the module 2 specifically:
Labeled data collection is divided into training set and test set, and training set is divided into 5 foldings of same size, for the support Vector machine, the random forest, the extreme random tree, the gradient promote decision tree, the limit gradient lift scheme, choose instruction respectively Practice 4 foldings concentrated to be trained, residue 1 folds into capable prediction, by the support vector machines, the random forest, the extreme random tree, is somebody's turn to do Gradient promotes decision tree, the respective prediction result collection of limit gradient lift scheme is combined into the first middle trained collection, and each While secondary trained, which is predicted, if being b to the prediction result each time of test seti, by the support to Amount machine, the random forest, the extreme random tree, the gradient promote the respective prediction knot of decision tree, the limit gradient lift scheme Fruit is averaged to obtain the second middle trained collection, is instructed using the Logic Regression Models in the first middle trained collection and first centre Practice and train and test on collection, obtains the final Fusion Model.
The technology of the present invention effect includes: to solve the problems, such as to be difficult to directly differentiate long article using heterologous detection method, is made The specific paragraph in long article there are rumour can be navigated to sectional detecting method, the method for the present invention is pervasive in news media's platform The rumour of middle long article data detects.
Detailed description of the invention
Fig. 1 is the method block diagram for obtaining long article paragraph;
Fig. 2 is Model Fusion stacking method schematic diagram;
Fig. 3 holistic approach block diagram of the present invention.
Specific implementation details
Key point of the present invention includes:
1, sectional detecting method.It is in the whole text that rumour/non-rumour is different from traditional short text, long article number in news media's platform According to rumour be usually only present in certain several paragraph, for this feature, long article is carried out ballad by the present invention as unit of paragraph Speech detection.Sectional detecting method can provide the confidence score of each paragraph in long article, navigate to the physical segment for rumour occur It falls, makes result that more there is interpretation.
2, heterologous detection method.Availability data feature is less in long article data, is directly difficult to using machine learning method It is assessed.The present invention carries out rumour detection using the microblog data that search obtains different source, before guaranteeing similar in content It puts, obtains the confidence score of corresponding long article paragraph.Heterologous detection method enriches data characteristics, and it is accurate to improve detection Property.
3, construction feature calculates microblog data confidence score using Fusion Model.The present invention has crawled Sina weibo platform The rumour data of middle official's certification, and non-rumour data are obtained by manually marking normal microblogging.Further, the present invention extracts Comment number in microblog data, 22 data characteristicses such as thumb up number, forwarding number, first using support vector machines, random forest, Gradient promotes decision tree etc., and totally 6 models carry out initial training to data, then using the stacking method in Fusion Model New training test data set is constructed, is finally trained on neotectonics data set using Logic Regression Models.Merge mould Type keeps the rumour testing result of microblog data more accurate.
To allow features described above and effect of the invention that can illustrate more clearly understandable, special embodiment below, and cooperate Bright book attached drawing is described in detail below.
The method of the present invention can provide confidence score to each of long article paragraph text, and pervasive in news simultaneously The rumour detection of long article data in media platform.Invention is illustrated with specific implementation method with reference to the accompanying drawing.
Step 1, to long article segment processing, obtain relevant microblog data.
Long article data characteristics distinction in news media's platform is weaker, is difficult to directly carry out from long article text merely Rumour/non-rumour classification, therefore, the present invention propose the data characteristics progress rumour detection using relevant microblog data rich, more The not strong disadvantage of long article feature differentiation is mended, and then obtains the assessment score of long article, obtains the flow chart of relevant microblog data such as Shown in Fig. 1.
Found according to statistics: the rumour of long article data exists only in the part paragraph in text in news media's platform, and its His paragraph remains as normal text, and the present invention first merges the paragraph of number of words in long article less (such as less than 25 words), It is authenticated as unit of paragraph.For each paragraph, TF-IDF (term frequency-inverse is used Document frequency) method extracts paragraph keyword, and wherein TF represents the normalization of the word frequency of occurrences in a document Value, the bigger word of the frequency of occurrences, TF value is bigger, calculation method are as follows:
Word sum in word frequency of occurrence/document in TF=document
IDF represents inverse document frequency of the word in collection of document, and the number of files comprising the word is fewer, and IDF value is bigger. Calculation method are as follows:
IDF=log (total number of documents/(number of files+1 comprising the word))
TF-IDF finally represents the importance of word in a document, the key that the present invention extracts using the product of TF and IDF Word is 4 forward words of TF-IDF score.
Microblog provides searching interface, and user inputs keyword and is obtained with corresponding microblog data.Utilize this Interface, the present invention develop data acquisition program, using the keyword extracted, crawl the homepage number in search return list According to.Further, in order to guarantee the content relevance of microblog data Yu long article paragraph, the present invention uses word embedding grammar Word2vec obtains the vector expression of word in microblog data and long article paragraph respectively, according to the TF-IDF weight of word, to word Vector weighting is averaged to obtain the vector expression of text.And the degree of correlation size using cosine correlation calculations between the two, if Microblog data and the vector expression of long article paragraph are respectivelyThen text relevant calculation formula are as follows:
Relevance threshold, reservation and the biggish microblog data of long article correlation are set, is obtained in long article using the above method The corresponding microblog data of each paragraph (related data).
Step 2, certification analysis is carried out to microblog data.
Microblog data corresponding for long article paragraph, the present invention promote decision using support vector machines, random forest, gradient The fusion of 6 models such as tree to microblog data carries out certification analysis.
Microblogging certification analysis is it is believed that belong to two classification problems, and the rumour data in training data are both from microblog The rumour data of official's certification, non-rumour data are from artificial mark.For each microblog data, the point in microblogging is extracted Praise the data characteristicses such as number, comment number, forwarding number totally 22 social characteristics.Stacking method uses the multiple model structures in upper layer first New training test data set is built out, is then trained again using underlying model.Method is as shown in Figure 2:
The present invention has used support vector machines (SVM), random forest (RF), extreme random tree (ET), gradient to mention at the middle and upper levels Decision tree (GDBT), limit gradient promotion (xgboost) totally 5 models are risen, data set is divided into training set and test set first, And training set is divided into 5 foldings of same size, for each model, 4 foldings chosen in training set are trained, another to fold into Row prediction (guarantees that each model is different with the data set to give a forecast), if the result predicted each time is ai, 5 times are predicted As a result combination forms matrix A, becomes new training dataset.While training each time, test set data are predicted, If the prediction result each time to test set is bi, 5 prediction results are averaged to obtain matrix B, become new test data Collection.Finally, training and testing on new training dataset A and new test data set B using Logic Regression Models, obtain most Whole evaluation model.
Fusion Model can reduce the deviation that single model occurs in assorting process, achieve in rumour detection more preferable Effect.For each long article paragraph, the confidence score of relevant microblog data is obtained using Model Fusion method, to represent The paragraph is the probability of non-rumour.
The following are system embodiment corresponding with above method embodiment, present embodiment can be mutual with above embodiment Cooperation is implemented.The relevant technical details mentioned in above embodiment are still effective in the present embodiment, in order to reduce repetition, Which is not described herein again.Correspondingly, the relevant technical details mentioned in present embodiment are also applicable in above embodiment.
The invention also discloses a kind of rumour detection system for news long text, including:
Module 1 obtains the text for being greater than default number of words in specified news platform as long text, extracts the long text middle section The keyword fallen, and social data is obtained with the keyword retrieval social platform, the paragraph is obtained using text relevant algorithm Related data;
Module 2 obtains labeled data collection, which includes the multiple social datas for having marked rumour information, is made It is combined into Fusion Model with the multiple disaggregated models of labeled data collection training, and by the disaggregated model collection that training is completed, is melted using this Molding type obtains the confidence score of the related data, to represent the paragraph as the probability of non-rumour.
This is directed to the rumour detection system of news long text, and wherein the module 1 includes: to use TF- for each paragraph IDF method is extracted to obtain the keyword of paragraph.
This is directed to the rumour detection system of news long text, and wherein the module 1 includes: to calculate the social activity with the keyword Similarity between data and the paragraph, and gather the social data that the similarity is greater than threshold value, as the related data.
This is directed to the rumour detection system of news long text, and wherein multiple disaggregated model includes: supporting vector in module 2 Machine, random forest, extreme random tree, gradient promote decision tree, the promotion of limit gradient and Logic Regression Models.
This is directed to the rumour detection system of news long text, wherein the multiple disaggregated models of training in the module 2 specifically:
Labeled data collection is divided into training set and test set, and training set is divided into 5 foldings of same size, for the support Vector machine, the random forest, the extreme random tree, the gradient promote decision tree, the limit gradient lift scheme, choose instruction respectively Practice 4 foldings concentrated to be trained, residue 1 folds into capable prediction, by the support vector machines, the random forest, the extreme random tree, is somebody's turn to do Gradient promotes decision tree, the respective prediction result collection of limit gradient lift scheme is combined into the first middle trained collection, and each While secondary trained, which is predicted, if being b to the prediction result each time of test seti, by the support to Amount machine, the random forest, the extreme random tree, the gradient promote the respective prediction knot of decision tree, the limit gradient lift scheme Fruit is averaged to obtain the second middle trained collection, is instructed using the Logic Regression Models in the first middle trained collection and first centre Practice and train and test on collection, obtains the final Fusion Model.

Claims (10)

1. a kind of rumour detection method for news long text characterized by comprising
Step 1 obtains the text for being greater than default number of words in specified news platform as long text, extracts paragraph in the long text Keyword, and social data is obtained with the keyword retrieval social platform, the phase of the paragraph is obtained using text relevant algorithm Close data;
Step 2 obtains labeled data collection, which includes the multiple social datas for having marked rumour information, uses this The multiple disaggregated models of labeled data collection training, and the disaggregated model collection that training is completed is combined into Fusion Model, use the fusion mould Type obtains the confidence score of the related data, to represent the paragraph as the probability of non-rumour.
2. being directed to the rumour detection method of news long text as described in claim 1, which is characterized in that the step 1 includes: pair In each paragraph, extract to obtain the keyword of paragraph using TF-IDF method.
3. being directed to the rumour detection method of news long text as described in claim 1, which is characterized in that the step 1 includes: meter Calculator has the similarity between the social data of the keyword and the paragraph, and gathers the social data that the similarity is greater than threshold value, As the related data.
4. being directed to the rumour detection method of news long text as described in claim 1, which is characterized in that multiple in step 2 Disaggregated model includes: support vector machines, random forest, extreme random tree, gradient promotion decision tree, limit gradient is promoted and logic Regression model.
5. being directed to the rumour detection method of news long text as described in claim 1, which is characterized in that training in the step 2 Multiple disaggregated models specifically:
Labeled data collection is divided into training set and test set, and training set is divided into 5 foldings of same size, for the supporting vector Machine, the random forest, the extreme random tree, the gradient promote decision tree, the limit gradient lift scheme, choose training set respectively In 4 foldings be trained, residue 1 folds into capable prediction, by the support vector machines, the random forest, the extreme random tree, the gradient Promote decision tree, the respective prediction result collection of limit gradient lift scheme is combined into the first middle trained collection, and instructs each time While white silk, which is predicted, if being b to the prediction result each time of test seti, by the supporting vector Machine, the random forest, the extreme random tree, the gradient promote decision tree, the respective prediction result of limit gradient lift scheme It is averaged to obtain the second middle trained collection, using the Logic Regression Models in the first middle trained collection and first middle trained It trains and tests on collection, obtain the final Fusion Model.
6. a kind of rumour detection system for news long text characterized by comprising
Module 1 obtains the text for being greater than default number of words in specified news platform as long text, extracts paragraph in the long text Keyword, and social data is obtained with the keyword retrieval social platform, the phase of the paragraph is obtained using text relevant algorithm Close data;
Module 2 obtains labeled data collection, which includes the multiple social datas for having marked rumour information, uses this The multiple disaggregated models of labeled data collection training, and the disaggregated model collection that training is completed is combined into Fusion Model, use the fusion mould Type obtains the confidence score of the related data, to represent the paragraph as the probability of non-rumour.
7. being directed to the rumour detection system of news long text as claimed in claim 6, which is characterized in that the module 1 includes: pair In each paragraph, extract to obtain the keyword of paragraph using TF-IDF method.
8. being directed to the rumour detection system of news long text as claimed in claim 6, which is characterized in that the module 1 includes: meter Calculator has the similarity between the social data of the keyword and the paragraph, and gathers the social data that the similarity is greater than threshold value, As the related data.
9. being directed to the rumour detection system of news long text as claimed in claim 6, which is characterized in that multiple in module 2 Disaggregated model includes: support vector machines, random forest, extreme random tree, gradient promotion decision tree, limit gradient is promoted and logic Regression model.
10. being directed to the rumour detection system of news long text as claimed in claim 6, which is characterized in that training in the module 2 Multiple disaggregated models specifically:
Labeled data collection is divided into training set and test set, and training set is divided into 5 foldings of same size, for the supporting vector Machine, the random forest, the extreme random tree, the gradient promote decision tree, the limit gradient lift scheme, choose training set respectively In 4 foldings be trained, residue 1 folds into capable prediction, by the support vector machines, the random forest, the extreme random tree, the gradient Promote decision tree, the respective prediction result collection of limit gradient lift scheme is combined into the first middle trained collection, and instructs each time While white silk, which is predicted, if being b to the prediction result each time of test seti, by the supporting vector Machine, the random forest, the extreme random tree, the gradient promote decision tree, the respective prediction result of limit gradient lift scheme It is averaged to obtain the second middle trained collection, using the Logic Regression Models in the first middle trained collection and first middle trained It trains and tests on collection, obtain the final Fusion Model.
CN201910184862.7A 2019-03-12 2019-03-12 A kind of rumour detection method and system for news long text Pending CN110032733A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910184862.7A CN110032733A (en) 2019-03-12 2019-03-12 A kind of rumour detection method and system for news long text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910184862.7A CN110032733A (en) 2019-03-12 2019-03-12 A kind of rumour detection method and system for news long text

Publications (1)

Publication Number Publication Date
CN110032733A true CN110032733A (en) 2019-07-19

Family

ID=67235919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910184862.7A Pending CN110032733A (en) 2019-03-12 2019-03-12 A kind of rumour detection method and system for news long text

Country Status (1)

Country Link
CN (1) CN110032733A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188284A (en) * 2019-04-25 2019-08-30 中国科学院计算技术研究所 A kind of rumour detection method and system based on retrieval auxiliary
CN110532563A (en) * 2019-09-02 2019-12-03 苏州美能华智能科技有限公司 The detection method and device of crucial paragraph in text
CN111143551A (en) * 2019-12-04 2020-05-12 支付宝(杭州)信息技术有限公司 Text preprocessing method, classification method, device and equipment
CN111475648A (en) * 2020-03-30 2020-07-31 东软集团股份有限公司 Text classification model generation method, text classification method, device and equipment
CN111489065A (en) * 2020-03-27 2020-08-04 北京理工大学 Node risk assessment integrating ICT supply chain network topology and product business information
CN111506710A (en) * 2020-07-01 2020-08-07 平安国际智慧城市科技股份有限公司 Information sending method and device based on rumor prediction model and computer equipment
CN111611981A (en) * 2020-06-28 2020-09-01 腾讯科技(深圳)有限公司 Information identification method and device and information identification neural network training method and device
CN111694955A (en) * 2020-05-08 2020-09-22 中国科学院计算技术研究所 Early dispute message detection method and system for social platform
CN111831790A (en) * 2020-06-23 2020-10-27 广东工业大学 False news identification method based on low threshold integration and text content matching
CN114897270A (en) * 2022-06-15 2022-08-12 青岛文达通科技股份有限公司 Semantic information fused public opinion propagation quantity prediction method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038240A (en) * 2017-12-26 2018-05-15 武汉大学 Based on content, the social networks rumour detection method of user's multiplicity
CN108228853A (en) * 2018-01-11 2018-06-29 北京信息科技大学 A kind of microblogging rumour recognition methods and system
CN108614855A (en) * 2018-03-19 2018-10-02 众安信息技术服务有限公司 A kind of rumour recognition methods

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038240A (en) * 2017-12-26 2018-05-15 武汉大学 Based on content, the social networks rumour detection method of user's multiplicity
CN108228853A (en) * 2018-01-11 2018-06-29 北京信息科技大学 A kind of microblogging rumour recognition methods and system
CN108614855A (en) * 2018-03-19 2018-10-02 众安信息技术服务有限公司 A kind of rumour recognition methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JUAN CAO 等: ""Automatic Rumor Detection on Microblogs: A Survey"", 《ARXIV:1807.03505V1》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188284B (en) * 2019-04-25 2022-01-28 中国科学院计算技术研究所 Rumor detection method and system based on retrieval assistance
CN110188284A (en) * 2019-04-25 2019-08-30 中国科学院计算技术研究所 A kind of rumour detection method and system based on retrieval auxiliary
CN110532563A (en) * 2019-09-02 2019-12-03 苏州美能华智能科技有限公司 The detection method and device of crucial paragraph in text
CN110532563B (en) * 2019-09-02 2023-06-20 苏州美能华智能科技有限公司 Method and device for detecting key paragraphs in text
CN111143551A (en) * 2019-12-04 2020-05-12 支付宝(杭州)信息技术有限公司 Text preprocessing method, classification method, device and equipment
CN111489065A (en) * 2020-03-27 2020-08-04 北京理工大学 Node risk assessment integrating ICT supply chain network topology and product business information
CN111475648A (en) * 2020-03-30 2020-07-31 东软集团股份有限公司 Text classification model generation method, text classification method, device and equipment
CN111475648B (en) * 2020-03-30 2023-11-14 东软集团股份有限公司 Text classification model generation method, text classification device and equipment
CN111694955A (en) * 2020-05-08 2020-09-22 中国科学院计算技术研究所 Early dispute message detection method and system for social platform
CN111694955B (en) * 2020-05-08 2023-09-12 中国科学院计算技术研究所 Early dispute message detection method and system for social platform
CN111831790A (en) * 2020-06-23 2020-10-27 广东工业大学 False news identification method based on low threshold integration and text content matching
CN111831790B (en) * 2020-06-23 2023-07-14 广东工业大学 False news identification method based on low threshold integration and text content matching
CN111611981A (en) * 2020-06-28 2020-09-01 腾讯科技(深圳)有限公司 Information identification method and device and information identification neural network training method and device
CN111506710B (en) * 2020-07-01 2020-11-06 平安国际智慧城市科技股份有限公司 Information sending method and device based on rumor prediction model and computer equipment
CN111506710A (en) * 2020-07-01 2020-08-07 平安国际智慧城市科技股份有限公司 Information sending method and device based on rumor prediction model and computer equipment
CN114897270A (en) * 2022-06-15 2022-08-12 青岛文达通科技股份有限公司 Semantic information fused public opinion propagation quantity prediction method and system

Similar Documents

Publication Publication Date Title
CN110032733A (en) A kind of rumour detection method and system for news long text
CN109829166B (en) People and host customer opinion mining method based on character-level convolutional neural network
KR101284788B1 (en) Apparatus for question answering based on answer trustworthiness and method thereof
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
US7761447B2 (en) Systems and methods that rank search results
CN103853738B (en) A kind of recognition methods of info web correlation region
CN103823859B (en) Name recognition algorithm based on combination of decision-making tree rules and multiple statistic models
CN106528599A (en) A rapid fuzzy matching algorithm for strings in mass audio data
CN101599071A (en) The extraction method of conversation text topic
CN107239439A (en) Public sentiment sentiment classification method based on word2vec
CN110888991B (en) Sectional type semantic annotation method under weak annotation environment
Cai et al. Large-scale question classification in cqa by leveraging wikipedia semantic knowledge
CN102637192A (en) Method for answering with natural language
CN106933800A (en) A kind of event sentence abstracting method of financial field
CN102750316A (en) Concept relation label drawing method based on semantic co-occurrence model
CN113312474A (en) Similar case intelligent retrieval system of legal documents based on deep learning
CN108520038B (en) Biomedical literature retrieval method based on sequencing learning algorithm
CN111221968B (en) Author disambiguation method and device based on subject tree clustering
CN1687924A (en) Method for producing internet personage information search engine
CN110134799B (en) BM25 algorithm-based text corpus construction and optimization method
CN106126502A (en) A kind of emotional semantic classification system and method based on support vector machine
CN104794209B (en) Chinese microblogging mood sorting technique based on Markov logical network and system
CN112084312B (en) Intelligent customer service system constructed based on knowledge graph
CN106960003A (en) Plagiarize the query generation method of the retrieval of the source based on machine learning in detection
CN115329085A (en) Social robot classification method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190719