CN108090046B - Microblog rumor identification method based on LDA and random forest - Google Patents


Info

Publication number
CN108090046B
Authority
CN
China
Prior art keywords
microblog
text content
rumor
lda
data
Prior art date
Legal status
Expired - Fee Related
Application number
CN201711483228.0A
Other languages
Chinese (zh)
Other versions
CN108090046A (en)
Inventor
曾子明
王婧
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
2017-12-29
Filing date
2017-12-29
Publication date
2021-05-04
Application filed by Wuhan University WHU
Priority to CN201711483228.0A
Publication of CN108090046A
Application granted
Publication of CN108090046B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a microblog rumor identification method based on LDA and random forests. Microblog data are collected from the official microblog platform with a web crawler and labeled manually. The text content is preprocessed and the numeric microblog data are z-score standardized, and a user credibility feature and a microblog influence feature are calculated from the standardized data. An LDA topic model yields the document-topic distribution of the optimized text content and the topic-word distribution, from which the perplexity is calculated. A microblog feature vector is then constructed, and a microblog rumor classifier is built by feeding the user credibility feature, the microblog influence feature and the LDA document-topic distribution into a random forest model as input features. Because the method mines the deep semantic information of microblog text, it achieves high rumor classification precision.

Description

Microblog rumor identification method based on LDA and random forest
Technical Field
The invention relates to social networks and text analysis, and in particular to a social network rumor identification method based on LDA and random forests.
Background
With the rapid development of the internet and mobile communication devices, online social platforms have become an important channel for people to publish and obtain information and to build and maintain social relationships. Microblogs attract a large number of users through convenient interaction, a friendly user experience and the influence of resident celebrities. According to UnionWare index data for August 2017, the number of monthly active microblog users reached 330 million. As one of the most active social platforms in China, microblogs aggregate a large amount of fragmented user-generated information. Because information on social platforms is highly disordered, it increases the uncertainty of individual cognition and breeds online rumors. Research has found that the rumors with the greatest social impact mostly originate on microblog platforms. In the absence of official information channels, rumors can relieve people's cognitive anxiety to some extent; however, their abuse often triggers storms of negative online public opinion that pose potential threats to social stability and public safety, which makes the identification of online rumors particularly important.
Current research on rumor identification centers on three areas: the textual features of rumors, the features of the users who post them, and the features of propagation networks, which are analyzed to understand how online rumors arise and spread.
These approaches have not made good use of the deep semantic features of rumor content, the credibility of the users who spread it, or their behavioral features.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a microblog rumor identification method based on LDA and random forests, comprising the following steps:
step 1, collecting microblog data from the official microblog platform with a web crawler, wherein the microblog data comprise the text content, number of likes, number of reposts, number of comments, number of posts, number of followings, number of followers, verification status and rumor status, and manually labeling the microblog data according to rumor information issued by the official microblog platform and national departments;
step 2, applying irrelevant-character filtering, word segmentation, stop-word removal and data conversion to the text content from step 1 to obtain the optimized text content and the optimized text content words, and counting the number of optimized text content words; forming the z-score standardized microblog data from the optimized text content, its word count and the z-score standardized numbers of likes, reposts, comments, posts, followings and followers from step 1; and calculating the user credibility feature and the microblog influence feature from the z-score standardized microblog data;
step 3, modeling the optimized text content and the optimized text content words from step 2 with an LDA topic model to obtain the LDA topic distribution probability, the document-topic distribution probability of the optimized text content, and the topic-word distribution probability of the optimized text content words; using the document-topic distribution probability as the deep semantic text feature for rumor identification; and calculating the perplexity from the document-topic and topic-word distribution probabilities;
step 4, constructing a microblog feature vector from the user credibility feature and the microblog influence feature from step 2 and the LDA topic distribution probability from step 3;
step 5, using the user credibility feature and the microblog influence feature from step 2 and the document-topic distribution probability of the optimized text content from step 3 as input features of a random forest model, calculating the optimal parameters of the CART-based random forest model by grid search with 10-fold cross-validation, designing a microblog rumor classifier by combining the optimal parameters with the microblog feature vectors from step 4, training it on the manually labeled microblog data from step 1 to obtain the final microblog rumor classifier, and applying the final classifier to rumor screening.
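Step 5 tunes the random forest by grid search under 10-fold cross-validation. The sketch below shows only that tuning loop, in plain Python: every parameter combination is scored by its mean accuracy over the 10 folds and the best one is kept. The trainer is a pluggable callable; the patent does not say which random-forest parameters were searched, and in practice this role would typically be filled by a library such as scikit-learn (its `GridSearchCV` with `RandomForestClassifier`, which grows CART-style trees). The threshold "classifier" in the usage part is a hypothetical stand-in so the example runs without extra packages.

```python
import itertools
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of nearly equal size."""
    if n < k:
        raise ValueError("need at least k samples for k-fold cross-validation")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def grid_search_cv(X, y, param_grid, fit_predict, k=10, seed=0):
    """Exhaustive grid search scored by mean k-fold accuracy.

    fit_predict(params, X_train, y_train, X_test) -> predicted labels.
    Assumption: in the patent's setting this callable would train a
    CART-based random forest; any trainer with this signature works here.
    """
    folds = k_fold_indices(len(X), k, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        accuracies = []
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            preds = fit_predict(params,
                                [X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            accuracies.append(correct / len(test_idx))
        score = sum(accuracies) / len(accuracies)
        if score > best_score:  # ties keep the earlier parameter combination
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical one-feature threshold "classifier" standing in for the forest.
def threshold_fit_predict(params, X_train, y_train, X_test):
    return [1 if x[0] > params["threshold"] else 0 for x in X_test]


X = [[i / 20] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
best, acc = grid_search_cv(X, y, {"threshold": [0.2, 0.475, 0.8]}, threshold_fit_predict)
print(best, acc)  # {'threshold': 0.475} 1.0
```

Keeping the trainer outside the search loop mirrors the two-stage structure of step 5: the search only picks parameters, and the final classifier is then retrained on all labeled data with those parameters.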
Preferably, the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where M is the number of microblog records, i is the record index, doc_i is the text content, like_i is the number of likes, repost_i is the number of reposts, comment_i is the number of comments, num_i is the number of posts published by the user, following_i is the number of accounts the user follows, follower_i is the number of followers, verify_i is the verification status, and fake_i is the rumor status;
in step 1, the manual labeling is as follows:
the verification status of the user is determined through the official microblog platform: verify_i indicates whether the user who published weibo_i has passed Sina Weibo personal verification; verify_i is 1 if so, and 0 otherwise. The microblog data are labeled for rumors against rumor information issued by national departments: if weibo_i is a rumor microblog, fake_i is 1; otherwise fake_i is 0;
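To make the record layout and the two manual labels concrete, here is a minimal Python sketch. The `post_id` and `author` fields, and the idea of handing the annotators' decisions to the code as lookup sets, are hypothetical conveniences that the patent does not describe; the other fields mirror weibo_i.

```python
from dataclasses import dataclass


@dataclass
class WeiboRecord:
    """One crawled microblog record weibo_i; fields mirror the patent's symbols."""
    post_id: str      # hypothetical crawler identifier, not part of weibo_i
    author: str       # hypothetical author handle, not part of weibo_i
    doc: str          # doc_i: text content
    like: int         # like_i: number of likes
    repost: int       # repost_i: number of reposts
    comment: int      # comment_i: number of comments
    num: int          # num_i: number of posts published by the author
    following: int    # following_i: accounts the author follows
    follower: int     # follower_i: followers of the author
    verify: int = 0   # verify_i: 1 if the author passed personal verification
    fake: int = 0     # fake_i: 1 if the post is labeled as a rumor


def label_records(records, verified_authors, rumor_post_ids):
    """Set verify_i and fake_i from manually compiled lookup sets (assumption).

    verified_authors: authors confirmed as verified on the platform.
    rumor_post_ids: posts an annotator matched to officially published rumors.
    """
    for r in records:
        r.verify = 1 if r.author in verified_authors else 0
        r.fake = 1 if r.post_id in rumor_post_ids else 0
    return records


recs = [WeiboRecord("p1", "alice", "...", 3, 1, 0, 120, 80, 500),
        WeiboRecord("p2", "bob", "...", 10, 40, 7, 30, 200, 90)]
label_records(recs, verified_authors={"alice"}, rumor_post_ids={"p2"})
print([(r.verify, r.fake) for r in recs])  # [(1, 0), (0, 1)]
```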
Preferably, the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized number of likes, z_repost_i is the z-score standardized number of reposts, z_comment_i is the z-score standardized number of comments, z_num_i is the z-score standardized number of posts, z_following_i is the z-score standardized number of followings, and z_follower_i is the z-score standardized number of followers;
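The following Python sketch shows how z_weibo_i could be assembled: the text is cleaned into op_doc_i, op_word_i and op_n_i, and each count column is z-score standardized across all M records. Several details are assumptions rather than the patent's specification: the regular expressions used as the "irrelevant character" filter, the tiny hypothetical stop-word list, whitespace splitting as a stand-in for a Chinese word segmenter, and the use of the population standard deviation.

```python
import re
import statistics

# Hypothetical stop-word list; a real pipeline would load a Chinese stop-word file.
STOP_WORDS = {"the", "a", "is", "of", "rt"}
# Raw count fields of weibo_i that are z-score standardized.
COUNT_FIELDS = ("like", "repost", "comment", "num", "following", "follower")


def optimize_text(doc, tokenize=str.split):
    """Return (op_doc, op_word, op_n) for one post.

    Whitespace splitting is only a stand-in; for Chinese text a word
    segmenter would be passed as `tokenize` (assumption).
    """
    text = re.sub(r"https?://\S+|@\S+", " ", doc)   # drop links and @mentions
    text = re.sub(r"[^\w\s]", " ", text).lower()     # drop punctuation and symbols
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return " ".join(words), words, len(words)


def z_scores(values):
    """(x - mean) / population standard deviation; all zeros if the std is 0."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [0.0 if std == 0 else (v - mean) / std for v in values]


def standardize(records):
    """records: list of dicts holding doc, the COUNT_FIELDS, verify and fake."""
    columns = {f: z_scores([r[f] for r in records]) for f in COUNT_FIELDS}
    result = []
    for i, r in enumerate(records):
        op_doc, op_word, op_n = optimize_text(r["doc"])
        row = {"op_doc": op_doc, "op_word": op_word, "op_n": op_n}
        row.update({f"z_{f}": columns[f][i] for f in COUNT_FIELDS})
        row.update(verify=r["verify"], fake=r["fake"])
        result.append(row)
    return result


rows = standardize([
    dict(doc="RT @bob: The haze is a hoax! http://t.cn/x", like=3, repost=1,
         comment=0, num=120, following=80, follower=500, verify=1, fake=1),
    dict(doc="Air quality report for today", like=10, repost=40,
         comment=7, num=30, following=200, follower=90, verify=0, fake=0),
])
print(rows[0]["op_word"], rows[0]["z_like"])  # ['haze', 'hoax'] -1.0
```

Standardizing per column across the whole data set puts counts with very different ranges (likes versus followers, for instance) on a comparable scale before they are combined into the credibility and influence features.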
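the user credibility feature in step 2 is:
[Formula image BDA0001534286180000031: user credibility feature Reliability_i; not reproduced in the text]
the microblog influence feature in step 2 is:
[Formula image BDA0001534286180000034: microblog influence feature Influence_i; not reproduced in the text]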
Preferably, the perplexity in step 3 is:
[Formula image BDA0001534286180000032: perplexity; not reproduced in the text]
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\}$$
[Formula image BDA0001534286180000033; not reproduced in the text]
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M$$
where M is the number of microblog records in step 1; op_n_i is the number of optimized text content words from step 2; op_word_i is the optimized text content words from step 2; p(op_word_i) is the probability of the optimized text content words in the optimized text content; D is the set of all optimized text content words; p(z_j | op_doc_i) is the probability of the j-th topic in the optimized text content of the i-th z-score standardized record from step 2; p(op_word_i | z_j) is the probability that the optimized text content words of the i-th z-score standardized record from step 2 occur in the j-th topic; K is the number of topics that minimizes the perplexity; pweibo_i is the LDA topic distribution of the i-th z-score standardized record from step 2; and p_{i,1}, ..., p_{i,K} are the probabilities of topics z_1, ..., z_K, respectively;
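The perplexity formula itself survives only as an image (BDA0001534286180000032). The where-clause, which builds p(op_word_i) from p(z_j | op_doc_i) and p(op_word_i | z_j), is consistent with the standard LDA perplexity, perplexity(D) = exp(-Σ_i log p(op_word_i) / Σ_i op_n_i) with p(w) = Σ_j p(z_j | op_doc_i) p(w | z_j). The Python sketch below assumes that standard definition; it is not guaranteed to match the patent's exact formula. `choose_k` picks K as the candidate with minimum perplexity, and `fit` is a hypothetical callable standing in for whatever LDA fitting routine is used.

```python
import math


def perplexity(docs, theta, phi, vocab_index):
    """Standard LDA perplexity: exp(-sum of log p(w) / total token count).

    docs[i]     : token list op_word_i of microblog i
    theta[i][j] : p(z_j | op_doc_i)
    phi[j][v]   : p(word v | z_j)
    vocab_index : maps a token to its column v in phi
    """
    log_lik, n_tokens = 0.0, 0
    for i, words in enumerate(docs):
        for w in words:
            v = vocab_index[w]
            # p(w) = sum_j p(z_j | doc_i) * p(w | z_j)
            p = sum(theta[i][j] * phi[j][v] for j in range(len(phi)))
            log_lik += math.log(p)
        n_tokens += len(words)
    return math.exp(-log_lik / n_tokens)


def choose_k(docs, vocab_index, candidates, fit):
    """fit(k) -> (theta, phi) for k topics (hypothetical); returns (best_k, scores)."""
    scores = {}
    for k in candidates:
        theta, phi = fit(k)
        scores[k] = perplexity(docs, theta, phi, vocab_index)
    best_k = min(scores, key=scores.get)
    return best_k, scores


docs = [["a", "b"], ["c", "d"]]
vocab_index = {"a": 0, "b": 1, "c": 2, "d": 3}
fits = {1: ([[1.0], [1.0]], [[0.25] * 4]),
        2: ([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]])}
best_k, scores = choose_k(docs, vocab_index, [1, 2], fits.__getitem__)
print(best_k)  # 2
```

With one uniform topic over four words every token has probability 0.25, so the perplexity is exactly 4; the two-topic model assigns 0.5 to every token and scores 2, which is why K = 2 is selected in the toy usage.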
Preferably, the microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M$$
where M is the number of microblog records in step 1, p_{i,1}, ..., p_{i,K} are the probabilities of topics z_1, ..., z_K from step 3, Reliability_i is the user credibility feature from step 2, and Influence_i is the microblog influence feature from step 2.
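The feature vector cweibo_i is a plain concatenation of the K topic probabilities with the two user-level features, so the feature matrix has M rows and K + 2 columns. A minimal Python sketch follows; because the Reliability_i and Influence_i formulas are available only as images, the numeric values in the usage line are made-up placeholders.

```python
def feature_vector(topic_probs, reliability, influence):
    """cweibo_i = (p_i1, ..., p_iK, Reliability_i, Influence_i)."""
    return [*topic_probs, reliability, influence]


def feature_matrix(thetas, reliabilities, influences):
    """Stack one cweibo_i row per microblog record (M rows, K + 2 columns)."""
    if not len(thetas) == len(reliabilities) == len(influences):
        raise ValueError("need one topic vector, reliability and influence per record")
    return [feature_vector(t, r, f)
            for t, r, f in zip(thetas, reliabilities, influences)]


# Placeholder values: K = 3 topics, two records.
X = feature_matrix([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], [0.5, -0.3], [-1.2, 0.9])
print(X[0])  # [0.7, 0.2, 0.1, 0.5, -1.2]
```

With K = 7, as in the embodiment, each vector would have 9 components.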
Compared with the prior art, the invention mines the deep semantic information of microblog text with the LDA topic model to obtain the document-topic distribution of the optimized text content, and uses this distribution, together with the user credibility feature and the microblog influence feature, as input variables for training a random forest classifier.
Drawings
FIG. 1 is a flowchart of the method according to an embodiment of the invention.
Detailed Description
To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the drawings and an embodiment. The embodiment described here is merely illustrative and explanatory and does not limit the invention.
Referring to FIG. 1, the microblog rumor identification method based on LDA and random forests according to an embodiment of the invention comprises the following steps:
step 1, collecting 2016 microblog data from the Sina Weibo platform with a web crawler, wherein the microblog data comprise the text content, number of likes, number of reposts, number of comments, number of posts, number of followings, number of followers, verification status and rumor status, and manually labeling the microblog data according to rumor information published by the official Sina Weibo rumor-refutation account and the 2016 haze rumors jointly exposed on December 30 by the Publicity and Education Center of the national Ministry of Environmental Protection and the Beijing Municipal Environmental Protection Publicity Center, which serve as the rumor evaluation benchmark;
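Preferably, the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M \tag{1}$$
where M = 872 is the number of microblog records, i is the record index, doc_i is the text content, like_i is the number of likes, repost_i is the number of reposts, comment_i is the number of comments, num_i is the number of posts published by the user, following_i is the number of accounts the user follows, follower_i is the number of followers, verify_i is the verification status, and fake_i is the rumor status;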
in step 1, the manual labeling is as follows:
the verification status of the user is determined through the official microblog platform: verify_i indicates whether the user who published weibo_i has passed Sina Weibo personal verification; verify_i is 1 if so, and 0 otherwise. The microblog data are labeled for rumors against rumor information announced by national departments: if weibo_i is a rumor microblog, fake_i is 1; otherwise fake_i is 0;
step 2, applying irrelevant-character filtering, word segmentation, stop-word removal and data conversion to the text content from step 1 to obtain the optimized text content and the optimized text content words, and counting the number of optimized text content words; forming the z-score standardized microblog data from the optimized text content, its word count and the z-score standardized numbers of likes, reposts, comments, posts, followings and followers from step 1; and calculating the user credibility feature and the microblog influence feature from the z-score standardized microblog data;
Preferably, the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M \tag{2}$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized number of likes, z_repost_i is the z-score standardized number of reposts, z_comment_i is the z-score standardized number of comments, z_num_i is the z-score standardized number of posts, z_following_i is the z-score standardized number of followings, and z_follower_i is the z-score standardized number of followers;
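the user credibility feature in step 2 is:
[Formula image BDA0001534286180000051: user credibility feature Reliability_i; not reproduced in the text]
the microblog influence feature in step 2 is:
[Formula image BDA0001534286180000052: microblog influence feature Influence_i; not reproduced in the text]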
step 3, modeling the optimized text content and the optimized text content words from step 2 with an LDA topic model to obtain the LDA topic distribution probability, the document-topic distribution probability of the optimized text content, and the topic-word distribution probability of the optimized text content words; using the document-topic distribution probability as the deep semantic text feature for rumor identification; and calculating the perplexity from the document-topic and topic-word distribution probabilities;
Preferably, the perplexity in step 3 is:
[Formula image BDA0001534286180000061: perplexity; not reproduced in the text]
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\} \tag{6}$$
[Formula image BDA0001534286180000062; not reproduced in the text]
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M \tag{8}$$
where M = 872 is the number of microblog records in step 1; op_n_i is the number of optimized text content words from step 2; op_word_i is the optimized text content words from step 2; p(op_word_i) is the probability of the optimized text content words in the optimized text content; D is the set of optimized text content words; p(z_j | op_doc_i) is the probability of the j-th topic in the optimized text content of the i-th z-score standardized record from step 2; p(op_word_i | z_j) is the probability that the optimized text content words of the i-th z-score standardized record from step 2 occur in the j-th topic; K = 7 is the number of topics that minimizes the perplexity; pweibo_i is the LDA topic distribution of the i-th z-score standardized record from step 2; and p_{i,1}, ..., p_{i,K} are the probabilities of topics z_1, ..., z_K, respectively;
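The patent does not say how the LDA model is fitted. As one possible sketch, the pure-Python collapsed Gibbs sampler below estimates the document-topic distribution p(z_j | op_doc_i) (the rows of `theta`, i.e. pweibo_i) and the topic-word distribution p(w | z_j) (`phi`) from the optimized word lists. The choice of collapsed Gibbs sampling and the hyperparameters `alpha`, `beta`, `iters` and `seed` are illustrative assumptions; in this embodiment K = 7 would be passed as `k`.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not the patent's code).

    docs: list of token lists (op_word_i). Returns (theta, phi, vocab) where
    theta[i][j] estimates p(z_j | op_doc_i) and phi[j][v] estimates
    p(vocab[v] | z_j).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: v for v, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # tokens of doc d assigned to topic j
    nkw = [[0] * n_vocab for _ in range(k)]  # tokens of word v assigned to topic j
    nk = [0] * k                             # tokens assigned to topic j
    z = []                                   # current topic of every token

    def move(d, t, v, delta):
        ndk[d][t] += delta
        nkw[t][v] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):           # random initial assignment
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            move(d, t, index[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = index[w]
                move(d, z[d][n], v, -1)      # remove the token's own assignment
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta)
                           / (nk[j] + n_vocab * beta) for j in range(k)]
                z[d][n] = rng.choices(range(k), weights=weights)[0]
                move(d, z[d][n], v, +1)

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][v] + beta) / (nk[j] + n_vocab * beta) for v in range(n_vocab)]
           for j in range(k)]
    return theta, phi, vocab


docs = [["haze", "hoax", "haze"], ["flu", "vaccine", "flu"], ["haze", "smog"]]
theta, phi, vocab = lda_gibbs(docs, k=2, iters=50)
print(vocab)  # ['flu', 'haze', 'hoax', 'smog', 'vaccine']
```

Each row of `theta` sums to 1 by construction (smoothed topic counts divided by the document length plus K times alpha), so it can be used directly as the topic part of the feature vector in step 4.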
step 4, constructing a microblog feature vector from the user credibility feature and the microblog influence feature from step 2 and the LDA topic distribution probability from step 3;
Preferably, the microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M \tag{9}$$
where M = 872 is the number of microblog records in step 1, p_{i,1}, ..., p_{i,K} are the probabilities of topics z_1, ..., z_K from step 3, Reliability_i is the user credibility feature from step 2, and Influence_i is the microblog influence feature from step 2.
step 5, using the user credibility feature and the microblog influence feature from step 2 and the document-topic distribution probability of the optimized text content from step 3 as input features of a random forest model, calculating the optimal parameters of the CART-based random forest model by grid search with 10-fold cross-validation, designing a microblog rumor classifier by combining the optimal parameters with the microblog feature vectors from step 4, training it on the manually labeled microblog data from step 1 to obtain the final microblog rumor classifier, and applying the final classifier to rumor screening.
It should be understood that the above description of the preferred embodiment is given for clarity and not for limitation; various changes, substitutions and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A microblog rumor identification method based on LDA and random forests, characterized by comprising the following steps:
step 1, collecting microblog data from the official microblog platform with a web crawler, wherein the microblog data comprise the text content, number of likes, number of reposts, number of comments, number of posts, number of followings, number of followers, verification status and rumor status, and manually labeling the microblog data according to rumor information issued by the official microblog platform and national departments;
step 2, applying irrelevant-character filtering, word segmentation, stop-word removal and data conversion to the text content from step 1 to obtain the optimized text content and the optimized text content words, and counting the number of optimized text content words; forming the z-score standardized microblog data from the optimized text content, the optimized text content words, their count and the z-score standardized numbers of likes, reposts, comments, posts, followings and followers from step 1; and calculating the user credibility feature and the microblog influence feature from the z-score standardized microblog data;
step 3, modeling the optimized text content and the optimized text content words from step 2 with an LDA topic model to obtain the LDA topic distribution probability, the document-topic distribution probability of the optimized text content, and the topic-word distribution probability of the optimized text content words; using the document-topic distribution probability as the deep semantic text feature for rumor identification; and calculating the perplexity from the document-topic and topic-word distribution probabilities;
step 4, constructing a microblog feature vector from the user credibility feature and the microblog influence feature from step 2 and the LDA topic distribution probability from step 3;
step 5, using the user credibility feature and the microblog influence feature from step 2 and the document-topic distribution probability of the optimized text content from step 3 as input features of a random forest model, calculating the optimal parameters of the CART-based random forest model by grid search with 10-fold cross-validation, designing a microblog rumor classifier by combining the optimal parameters with the microblog feature vectors from step 4, training it on the manually labeled microblog data from step 1 to obtain the final microblog rumor classifier, and applying the final classifier to rumor screening;
in step 1, the microblog data are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where M is the number of microblog records, i is the record index, doc_i is the text content, like_i is the number of likes, repost_i is the number of reposts, comment_i is the number of comments, num_i is the number of posts published by the user, following_i is the number of accounts the user follows, follower_i is the number of followers, verify_i is the verification status, and fake_i is the rumor status;
in step 1, the manual labeling is as follows:
the verification status of the user is determined through the official microblog platform: verify_i indicates whether the user who published weibo_i has passed Sina Weibo personal verification; verify_i is 1 if so, and 0 otherwise. The microblog data are labeled for rumors against rumor information issued by national departments: if weibo_i is a rumor microblog, fake_i is 1; otherwise fake_i is 0;
the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized number of likes, z_repost_i is the z-score standardized number of reposts, z_comment_i is the z-score standardized number of comments, z_num_i is the z-score standardized number of posts, z_following_i is the z-score standardized number of followings, and z_follower_i is the z-score standardized number of followers;
the user credibility feature in step 2 is:
[Formula image FDA0002944169790000021: user credibility feature Reliability_i; not reproduced in the text]
the microblog influence feature in step 2 is:
[Formula image FDA0002944169790000022: microblog influence feature Influence_i; not reproduced in the text]
the perplexity in step 3 is:
[Formula image FDA0002944169790000023: perplexity; not reproduced in the text]
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\}$$
[Formula image FDA0002944169790000024; not reproduced in the text]
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M$$
where M is the number of microblog records in step 1; op_n_i is the number of optimized text content words from step 2; op_word_i is the optimized text content words from step 2; p(op_word_i) is the probability of the optimized text content words in the optimized text content; D is the set of all optimized text content words; p(z_j | op_doc_i) is the probability of the j-th topic in the optimized text content of the i-th z-score standardized record from step 2; p(op_word_i | z_j) is the probability that the optimized text content words of the i-th z-score standardized record from step 2 occur in the j-th topic; K is the number of topics that minimizes the perplexity; pweibo_i is the LDA topic distribution of the i-th z-score standardized record from step 2; and p_{i,1}, ..., p_{i,K} are the probabilities of topics z_1, ..., z_K, respectively;
in step 4, the microblog feature vector is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M$$
where M is the number of microblog records in step 1, Reliability_i is the user credibility feature from step 2, and Influence_i is the microblog influence feature from step 2.
CN201711483228.0A 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest Expired - Fee Related CN108090046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711483228.0A CN108090046B (en) 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711483228.0A CN108090046B (en) 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest

Publications (2)

Publication Number Publication Date
CN108090046A (en) 2018-05-29
CN108090046B (en) 2021-05-04

Family

ID=62181216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711483228.0A Expired - Fee Related CN108090046B (en) 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest

Country Status (1)

Country Link
CN (1) CN108090046B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558483B (en) * 2018-10-16 2021-06-18 北京航空航天大学 Rumor recognition method based on naive Bayes model
CN111385655A (en) * 2018-12-29 2020-07-07 武汉斗鱼网络科技有限公司 Advertisement bullet screen detection method and device, server and storage medium
CN110413776B (en) * 2019-07-01 2021-09-14 武汉大学 High-performance computing method for the LDA text topic model based on CPU-GPU collaborative parallelism
CN110795560A (en) * 2019-10-21 2020-02-14 国网湖南省电力有限公司 Method and system for subdividing power grid electricity customers
CN111368092B (en) * 2020-02-21 2020-12-04 中国科学院电子学研究所苏州研究院 Knowledge graph construction method based on trusted webpage resources
CN112766359B (en) * 2021-01-14 2023-07-25 北京工商大学 Word double-dimension microblog rumor identification method for food safety public opinion
CN113190682B (en) * 2021-06-30 2021-09-28 平安科技(深圳)有限公司 Method and device for acquiring event influence degree based on tree model and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975478A (en) * 2016-04-09 2016-09-28 北京交通大学 Event detection method and device for online articles based on word vector analysis
CN106202211A (en) * 2016-06-27 2016-12-07 四川大学 Integrated microblog rumor identification method based on microblog type
CN107423339A (en) * 2017-04-29 2017-12-01 天津大学 Popular microblog prediction method based on extreme gradient boosting and random forest

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Automatic Detection of Rumor on Social Network; Qiao Zhang et al.; CCF International Conference on Natural Language Processing and Chinese Computing; 20151020; pp. 1-10 *
False rumors detection on sina weibo by propagation structures; Ke Wu et al.; 2015 IEEE 31st International Conference on Data Engineering; 20150601; pp. 651-662 *
User credibility evaluation on the Sina Weibo platform; 王峰 et al.; Journal of Frontiers of Computer Science and Technology (计算机科学与探索); 20131231; pp. 1125-1134 *

Also Published As

Publication number Publication date
CN108090046A (en) 2018-05-29

Similar Documents

Publication Publication Date Title
CN108090046B (en) Microblog rumor identification method based on LDA and random forest
Heidari et al. Deep contextualized word embedding for text-based online user profiling to detect social bots on twitter
Hu et al. Social spammer detection with sentiment information
Abbasi et al. Descriptive analytics: Examining expert hackers in web forums
CN104615608B (en) Data mining processing system and method
Agarwal et al. A focused crawler for mining hate and extremism promoting videos on YouTube.
CN106940732A (en) Method for discovering suspected paid-poster ("water army") accounts on microblogs
CN110990683B (en) Microblog rumor integrated identification method and device based on region and emotional characteristics
CN104573094B (en) Network account identification and matching method
Bouazizi et al. Sentiment analysis in twitter: From classification to quantification of sentiments within tweets
CN113055386B (en) Method and device for identifying and analyzing attack organization
US9563770B2 (en) Spammer group extraction apparatus and method
Parime et al. Cyberbullying detection and prevention: Data mining and psychological perspective
CN107305545A (en) Method for identifying network opinion leaders based on text sentiment orientation analysis
Feng et al. Stopping the cyberattack in the early stage: assessing the security risks of social network users
Ashcroft et al. A Step Towards Detecting Online Grooming--Identifying Adults Pretending to be Children
US20170011480A1 (en) Data analysis system, data analysis method, and data analysis program
Heidari et al. Online user profiling to detect social bots on twitter
Abinaya et al. Spam detection on social media platforms
Jin et al. Filtering spam in Weibo using ensemble imbalanced classification and knowledge expansion
Gaurav et al. Machine learning technique for fake news detection using text-based word vector representation
Al-Hashedi et al. Cyberbullying detection based on emotion
Song et al. Spatial and temporal sentiment analysis of twitter data
CN110110079B (en) Social network spam user detection method
Lee et al. Cyberbullying Detection on Social Network Services.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210504

Termination date: 20211229