CN108090046A - Microblog rumor identification method based on LDA and random forest

Info

Publication number
CN108090046A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711483228.0A
Other languages
Chinese (zh)
Other versions
CN108090046B (en)
Inventor
曾子明
王婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201711483228.0A
Publication of CN108090046A
Application granted
Publication of CN108090046B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a microblog rumor identification method based on LDA and random forest. Microblog data are collected from the official microblog platform with a web crawler and labeled manually. The text content is preprocessed and the numeric fields are z-score standardized, and a user reliability feature and a microblog influence feature are computed from the standardized data. An LDA model yields the distribution probability of topics over the optimized text content (document-topic distribution) and the distribution probability of optimized text content words over topics (topic-word distribution), from which the perplexity is computed. A microblog feature vector is then constructed. The user reliability feature, the microblog influence feature, and the LDA document-topic distribution probability serve as the input features of a random forest model to build the microblog rumor classifier. The invention mines the semantic information of microblog text in depth and achieves high rumor classification accuracy.

Description

Microblog rumor identification method based on LDA and random forest
Technical field
The present invention relates to fields such as social networks and text analysis, and in particular to a social network rumor identification method based on LDA and random forest.
Background
With the rapid development of the internet and mobile communication devices, online social platforms have become an important channel through which people publish and obtain information and develop and maintain social relationships. With its convenient mode of interaction, friendly interactive experience, and the appeal of the celebrities on the platform, Weibo has attracted a large number of users. According to the China Unicom Wo Index for August 2017, Weibo's monthly active users reached 330 million. As one of the most active social platforms in China, Weibo aggregates a large amount of fragmentary user-generated information. Because the information on social platforms is seriously disordered and individual cognition is uncertain, online rumors breed. Research has found that rumors with a large social impact largely originate from microblog platforms. When official channels are absent, rumors can to some extent relieve people's cognitive anxiety. However, rampant rumors often trigger negative disturbances of online public opinion and pose a potential threat to social stability and citizens' safety, so the identification of online rumors is particularly critical.
Current research on rumor identification centers mainly on rumor text features, the features of users who post rumors, and the properties of the propagation network, and analyzes how online rumors arise and spread.
In these methods, the deep semantic features of rumor content and the reliability and behavioral features of the users who spread it have not yet been put to good use.
Summary of the invention
To remedy the deficiencies of the prior art, the technical solution of the present invention is a microblog rumor identification method based on LDA and random forest, comprising the following steps:
Step 1: collect microblog data from the official microblog platform using a web crawler. The microblog data comprise the text content, like count, repost count, comment count, post count, following count, follower count, verification status, and rumor status. The microblog data are labeled manually according to the official microblog platform and the rumor information issued by state departments;
Step 2: apply irrelevant-character filtering, word segmentation, stop-word removal, and data conversion to the text content of step 1 to obtain the optimized text content and the optimized text content words, and count the number of optimized text content words. Apply z-score standardization to the like count, repost count, comment count, post count, following count, and follower count of step 1; together with the optimized text content, the optimized text content words, and their number, these form the z-score standardized microblog data. Compute the user reliability feature and the microblog influence feature from the z-score standardized microblog data;
Step 3: model the optimized text content and optimized text content words of step 2 with an LDA topic model to obtain the LDA topic distribution probability, the LDA distribution probability of topics over the optimized text content (document-topic distribution), and the LDA distribution probability of optimized text content words over topics (topic-word distribution). The document-topic distribution probability serves as the deep semantic text feature for rumor identification, and the perplexity is computed from the document-topic and topic-word distribution probabilities;
Step 4: construct the microblog feature vector from the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA topic distribution probability of step 3;
Step 5: take the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA document-topic distribution probability of step 3 as the input features of a random forest model. Use a grid search with 10-fold cross-validation to find the optimal parameters of the random forest model based on CART decision trees. Design the microblog rumor classifier from the optimal parameters and the microblog feature vector of step 4, and train it on the manually labeled microblog data of step 1 to obtain the final microblog rumor classifier, which is applied to rumor screening.
Preferably, the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where M is the number of microblog data items, i is the index of a microblog data item, doc_i is the text content, like_i is the like count, repost_i is the repost count, comment_i is the comment count, num_i is the post count, following_i is the following count, follower_i is the follower count, verify_i is the verification status, and fake_i is the rumor status.
The manual labeling in step 1 is as follows:
The user status is verified through the official microblog platform: verify_i indicates whether the user who posted weibo_i has passed Sina Weibo personal verification, and is 1 if so and 0 otherwise. The microblog data are labeled as rumor or non-rumor using the rumor information issued by state departments: fake_i is 1 if weibo_i is a rumor post and 0 otherwise.
Preferably, the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized like count, z_repost_i is the z-score standardized repost count, z_comment_i is the z-score standardized comment count, z_num_i is the z-score standardized post count, z_following_i is the z-score standardized following count, and z_follower_i is the z-score standardized follower count.
The user reliability feature in step 2 is:
The microblog influence feature in step 2 is:
$$\mathrm{Influence}_i = \log\left(e^{\mathrm{z\_follower}_i} + e^{\mathrm{z\_repost}_i} + e^{\mathrm{z\_comment}_i}\right) + \mathrm{z\_like}_i$$
Preferably, the perplexity in step 3 is:
$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{i=1}^{M} \log p(\mathrm{op\_word}_i)}{\sum_{i=1}^{M} \mathrm{op\_n}_i}\right\}$$
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\}$$
$$p(\mathrm{op\_word}_i) = \sum_{j=1}^{K} p(z_j \mid \mathrm{op\_doc}_i)\, p(\mathrm{op\_word}_i \mid z_j)$$
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M$$
where M is the number of microblog data items in step 1; op_n_i is the number of optimized text content words in step 2; op_word_i is the optimized text content words in step 2; p(op_word_i) is the probability of the optimized text content words in the optimized text content; D is the set of all optimized text content words; p(z_j | op_doc_i) is the probability that the j-th topic occurs in the optimized text content of the i-th z-score standardized microblog data item in step 2; p(op_word_i | z_j) is the probability that the optimized text content words of the i-th z-score standardized microblog data item occur in the j-th topic; K is the number of topics at which the perplexity is minimal; pweibo_i is the LDA topic distribution probability of the i-th z-score standardized microblog data item; and p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K, respectively.
Preferably, the microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M$$
where M is the number of microblog data items in step 1, p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K in step 3, Reliability_i is the user reliability feature of step 2, and Influence_i is the microblog influence feature of step 2.
Compared with the prior art, the present invention uses an LDA topic model to mine the semantic information of microblog text in depth and obtain the LDA document-topic distribution probability, which is combined with the defined user reliability and microblog influence features as the input variables of a random forest for classification training. The rumor identification effect of the invention is significant, and the model identifies rumors with high accuracy.
Description of the drawings
Fig. 1 is a flowchart of the method of an embodiment of the present invention.
Detailed description of the embodiments
To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawing and an embodiment. It should be understood that the embodiment described here serves only to illustrate and explain the present invention and does not limit it.
Referring to Fig. 1, the flowchart of the method of the embodiment, the present invention provides a microblog rumor identification method based on LDA and random forest, comprising the following steps:
Step 1: collect microblog data from 2016 on the Sina Weibo platform using a web crawler. The microblog data comprise the text content, like count, repost count, comment count, post count, following count, follower count, verification status, and rumor status. The microblog data are labeled manually, using as the rumor benchmark the rumor-refuting posts published by Sina Weibo's official rumor-refuting account and the 2016 haze rumors jointly exposed on December 30 by the Publicity and Education Center of the Ministry of Environmental Protection and the Beijing Environmental Protection Publicity and Education Center;
Preferably, the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M \tag{1}$$
where M = 872 is the number of microblog data items, i is the index of a microblog data item, doc_i is the text content, like_i is the like count, repost_i is the repost count, comment_i is the comment count, num_i is the post count, following_i is the following count, follower_i is the follower count, verify_i is the verification status, and fake_i is the rumor status.
The manual labeling in step 1 is as follows:
The user status is verified through the official microblog platform: verify_i indicates whether the user who posted weibo_i has passed Sina Weibo personal verification, and is 1 if so and 0 otherwise. The microblog data are labeled as rumor or non-rumor using the rumor information issued by state departments: fake_i is 1 if weibo_i is a rumor post and 0 otherwise.
Step 2: apply irrelevant-character filtering, word segmentation, stop-word removal, and data conversion to the text content of step 1 to obtain the optimized text content and the optimized text content words, and count the number of optimized text content words. Apply z-score standardization to the like count, repost count, comment count, post count, following count, and follower count of step 1; together with the optimized text content, the optimized text content words, and their number, these form the z-score standardized microblog data. Compute the user reliability feature and the microblog influence feature from the z-score standardized microblog data;
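The text-cleaning part of step 2 can be sketched roughly as follows. This is only an illustration: the regular expressions, the tiny stop-word set, and the function name `preprocess` are assumptions, and whitespace splitting stands in for a real Chinese word segmenter, which the source does not name.

```python
import re
from typing import List, Tuple

# Hypothetical stop-word set; a real pipeline would load a full stop-word file.
STOP_WORDS = {"的", "了", "是", "the", "a", "of"}

# What counts as an "irrelevant character" is an assumption here:
# URLs, @mentions, and anything that is neither a word character nor whitespace.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\S+")
NON_WORD_RE = re.compile(r"[^\w\s]")


def preprocess(doc: str) -> Tuple[List[str], int]:
    """Return (op_word, op_n): the cleaned tokens and their count."""
    text = URL_RE.sub(" ", doc)          # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = NON_WORD_RE.sub(" ", text)    # drop punctuation and symbols
    # Placeholder segmentation: split on whitespace, then remove stop words.
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return tokens, len(tokens)
```

Calling `preprocess` on a raw post returns the token list (corresponding to op_word_i) and its length (corresponding to op_n_i).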
Preferably, the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M \tag{2}$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized like count, z_repost_i is the z-score standardized repost count, z_comment_i is the z-score standardized comment count, z_num_i is the z-score standardized post count, z_following_i is the z-score standardized following count, and z_follower_i is the z-score standardized follower count.
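As a minimal sketch of the z-score standardization applied to each count column (likes, reposts, comments, posts, following, followers): the source does not say whether the population or the sample standard deviation is used, so the population form below is an assumption, as is the function name `z_score`.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize a column of counts to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For example, `z_like = z_score(like_counts)` would produce the z_like_i values for all M posts at once.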
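The user reliability feature in step 2 is:
The microblog influence feature in step 2 is:
$$\mathrm{Influence}_i = \log\left(e^{\mathrm{z\_follower}_i} + e^{\mathrm{z\_repost}_i} + e^{\mathrm{z\_comment}_i}\right) + \mathrm{z\_like}_i \tag{4}$$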
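The influence feature stated in claim 2 is a log-sum-exp of the standardized follower, repost, and comment counts plus the standardized like count. A small sketch of that expression follows; it assumes the logarithm is natural (the source writes only "log") and subtracts the maximum before exponentiating, a standard numerical-stability trick that is not part of the source.

```python
import math


def influence(z_follower, z_repost, z_comment, z_like):
    """log(e^z_follower + e^z_repost + e^z_comment) + z_like, computed stably."""
    terms = (z_follower, z_repost, z_comment)
    m = max(terms)
    # log(sum(exp(t))) == m + log(sum(exp(t - m))) avoids overflow for large t.
    return m + math.log(sum(math.exp(t - m) for t in terms)) + z_like
```

With all four inputs at zero (a post that is exactly average on every count) the value is log 3.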
Step 3: model the optimized text content and optimized text content words of step 2 with an LDA topic model to obtain the LDA topic distribution probability, the LDA distribution probability of topics over the optimized text content (document-topic distribution), and the LDA distribution probability of optimized text content words over topics (topic-word distribution). The document-topic distribution probability serves as the deep semantic text feature for rumor identification, and the perplexity is computed from the document-topic and topic-word distribution probabilities;
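Step 3 could be prototyped with an off-the-shelf LDA implementation. The sketch below uses scikit-learn's `LatentDirichletAllocation` and `CountVectorizer`; the choice of library is an assumption, since the source does not name one, and `fit_lda` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def fit_lda(token_docs, n_topics, seed=0):
    """Fit LDA on pre-tokenized documents.

    Returns (doc_topic, topic_word, vocabulary):
      doc_topic[i, j]  ~ p(z_j | op_doc_i)
      topic_word[j, w] ~ p(word w | z_j)
    """
    # The documents are already token lists, so the analyzer passes them through.
    vectorizer = CountVectorizer(analyzer=lambda tokens: tokens)
    counts = vectorizer.fit_transform(token_docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topic = lda.fit_transform(counts)  # rows are normalized to sum to 1
    components = np.asarray(lda.components_)
    # components_ holds unnormalized weights; normalize each topic row.
    topic_word = components / components.sum(axis=1, keepdims=True)
    return doc_topic, topic_word, vectorizer.vocabulary_
```

Each row of `doc_topic` plays the role of pweibo_i, the per-post topic distribution used later as the semantic feature.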
Preferably, the perplexity in step 3 is:
$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{i=1}^{M} \log p(\mathrm{op\_word}_i)}{\sum_{i=1}^{M} \mathrm{op\_n}_i}\right\} \tag{5}$$
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\} \tag{6}$$
$$p(\mathrm{op\_word}_i) = \sum_{j=1}^{K} p(z_j \mid \mathrm{op\_doc}_i)\, p(\mathrm{op\_word}_i \mid z_j) \tag{7}$$
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M \tag{8}$$
where M = 872 is the number of microblog data items in step 1; op_n_i is the number of optimized text content words in step 2; op_word_i is the optimized text content words in step 2; p(op_word_i) is the probability of the optimized text content words in the optimized text content; D is the set of optimized text content words; p(z_j | op_doc_i) is the probability that the j-th topic occurs in the optimized text content of the i-th z-score standardized microblog data item in step 2; p(op_word_i | z_j) is the probability that the optimized text content words of the i-th z-score standardized microblog data item occur in the j-th topic; K = 7 is the number of topics at which the perplexity is minimal; pweibo_i is the LDA topic distribution probability of the i-th z-score standardized microblog data item; and p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K, respectively.
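The perplexity used to pick the number of topics can be computed directly from the document-topic and topic-word probabilities. The sketch below reads the formula in claim 2 in the usual per-token way (summing the log-probability of every word occurrence in every post), which is an interpretation rather than something the source spells out; the function name and argument layout are hypothetical.

```python
import math


def perplexity(docs, doc_topic, topic_word):
    """Corpus perplexity from LDA probabilities.

    docs[i]          : list of word indices in post i
    doc_topic[i][j]  : p(z_j | op_doc_i)
    topic_word[j][w] : p(word w | z_j)
    """
    n_topics = len(topic_word)
    log_lik = 0.0
    n_tokens = 0
    for i, words in enumerate(docs):
        for w in words:
            # p(w | doc_i) = sum_j p(z_j | doc_i) * p(w | z_j)
            p_w = sum(doc_topic[i][j] * topic_word[j][w] for j in range(n_topics))
            log_lik += math.log(p_w)
        n_tokens += len(words)
    return math.exp(-log_lik / n_tokens)
```

One would evaluate this for a range of candidate topic counts and keep the one with the lowest value; the embodiment reports K = 7.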
Step 4: construct the microblog feature vector from the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA topic distribution probability of step 3;
Preferably, the microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M \tag{9}$$
where M = 872 is the number of microblog data items in step 1, p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K in step 3, Reliability_i is the user reliability feature of step 2, and Influence_i is the microblog influence feature of step 2.
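Assembling the feature vector of step 4 amounts to concatenating the K topic probabilities with the two scalar features. A brief sketch, in which the reliability value is treated as a precomputed input and the function name is hypothetical:

```python
def build_feature_vector(topic_probs, reliability, influence):
    """Return (p_1, ..., p_K, Reliability, Influence) as a flat list."""
    return list(topic_probs) + [reliability, influence]
```

With K = 7 topics, as in the embodiment, each post is described by a 9-dimensional vector.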
Step 5: take the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA document-topic distribution probability of step 3 as the input features of a random forest model. Use a grid search with 10-fold cross-validation to find the optimal parameters of the random forest model based on CART decision trees. Design the microblog rumor classifier from the optimal parameters and the microblog feature vector of step 4, and train it on the manually labeled microblog data of step 1 to obtain the final microblog rumor classifier, which is applied to rumor screening.
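The parameter search of step 5 could look roughly like the following with scikit-learn, whose tree learners are CART-style. The library, the parameter grid, the accuracy scoring, and the helper name `train_rumor_classifier` are all assumptions; the source gives neither the grid nor the tuned values.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def train_rumor_classifier(X, y, param_grid=None, seed=0):
    """Grid search with 10-fold cross-validation over a random forest.

    X: feature vectors (one row per post), y: 0/1 rumor labels.
    Returns (best_model, best_params, best_cv_accuracy).
    """
    if param_grid is None:
        # Illustrative grid only; not taken from the source.
        param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X, y)  # refits the best configuration on all data by default
    return search.best_estimator_, search.best_params_, search.best_score_
```

The returned model can then label unseen posts with `best_model.predict(new_feature_vectors)`.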
It should be understood that the above description of the preferred embodiment is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Under the teaching of the present invention, and without departing from the scope protected by the claims, those of ordinary skill in the art may make substitutions or modifications, all of which fall within the protection scope of the invention. The scope of protection claimed shall be determined by the appended claims.

Claims (2)

1. A microblog rumor identification method based on LDA and random forest, characterized in that it comprises the following steps:
Step 1: collect microblog data from the official microblog platform using a web crawler, the microblog data comprising the text content, like count, repost count, comment count, post count, following count, follower count, verification status, and rumor status, and label the microblog data manually according to the official microblog platform and the rumor information issued by state departments;
Step 2: apply irrelevant-character filtering, word segmentation, stop-word removal, and data conversion to the text content of step 1 to obtain the optimized text content and the optimized text content words, and count the number of optimized text content words; apply z-score standardization to the like count, repost count, comment count, post count, following count, and follower count of step 1, which together with the optimized text content, the optimized text content words, and their number form the z-score standardized microblog data; and compute the user reliability feature and the microblog influence feature from the z-score standardized microblog data;
Step 3: model the optimized text content and optimized text content words of step 2 with an LDA topic model to obtain the LDA topic distribution probability, the LDA distribution probability of topics over the optimized text content (document-topic distribution), and the LDA distribution probability of optimized text content words over topics (topic-word distribution); use the document-topic distribution probability as the deep semantic text feature for rumor identification; and compute the perplexity from the document-topic and topic-word distribution probabilities;
Step 4: construct the microblog feature vector from the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA topic distribution probability of step 3;
Step 5: take the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA document-topic distribution probability of step 3 as the input features of a random forest model; use a grid search with 10-fold cross-validation to find the optimal parameters of the random forest model based on CART decision trees; design the microblog rumor classifier from the optimal parameters and the microblog feature vector of step 4; and train it on the manually labeled microblog data of step 1 to obtain the final microblog rumor classifier, which is applied to rumor screening.
2. The microblog rumor identification method based on LDA and random forest according to claim 1, characterized in that the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where M is the number of microblog data items, i is the index of a microblog data item, doc_i is the text content, like_i is the like count, repost_i is the repost count, comment_i is the comment count, num_i is the post count, following_i is the following count, follower_i is the follower count, verify_i is the verification status, and fake_i is the rumor status;
the manual labeling in step 1 is as follows:
the user status is verified through the official microblog platform, verify_i indicating whether the user who posted weibo_i has passed Sina Weibo personal verification, being 1 if so and 0 otherwise; the microblog data are labeled as rumor or non-rumor using the rumor information issued by state departments, fake_i being 1 if weibo_i is a rumor post and 0 otherwise;
the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized like count, z_repost_i is the z-score standardized repost count, z_comment_i is the z-score standardized comment count, z_num_i is the z-score standardized post count, z_following_i is the z-score standardized following count, and z_follower_i is the z-score standardized follower count;
the user reliability feature in step 2 is:
the microblog influence feature in step 2 is:
$$\mathrm{Influence}_i = \log\left(e^{\mathrm{z\_follower}_i} + e^{\mathrm{z\_repost}_i} + e^{\mathrm{z\_comment}_i}\right) + \mathrm{z\_like}_i$$
the perplexity in step 3 is:
$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{i=1}^{M} \log p(\mathrm{op\_word}_i)}{\sum_{i=1}^{M} \mathrm{op\_n}_i}\right\}$$
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\}$$
$$p(\mathrm{op\_word}_i) = \sum_{j=1}^{K} p(z_j \mid \mathrm{op\_doc}_i)\, p(\mathrm{op\_word}_i \mid z_j)$$
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M$$
where M is the number of microblog data items in step 1; op_n_i is the number of optimized text content words in step 2; op_word_i is the optimized text content words in step 2; p(op_word_i) is the probability of the optimized text content words in the optimized text content; D is the set of all optimized text content words; p(z_j | op_doc_i) is the probability that the j-th topic occurs in the optimized text content of the i-th z-score standardized microblog data item in step 2; p(op_word_i | z_j) is the probability that the optimized text content words of the i-th z-score standardized microblog data item occur in the j-th topic; K is the number of topics at which the perplexity is minimal; pweibo_i is the LDA topic distribution probability of the i-th z-score standardized microblog data item; and p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K, respectively;
the microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M$$
where M is the number of microblog data items in step 1, p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K in step 3, Reliability_i is the user reliability feature of step 2, and Influence_i is the microblog influence feature of step 2.
CN201711483228.0A 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest Expired - Fee Related CN108090046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711483228.0A CN108090046B (en) 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711483228.0A CN108090046B (en) 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest

Publications (2)

Publication Number Publication Date
CN108090046A true CN108090046A (en) 2018-05-29
CN108090046B CN108090046B (en) 2021-05-04

Family

ID=62181216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711483228.0A Expired - Fee Related CN108090046B (en) 2017-12-29 2017-12-29 Microblog rumor identification method based on LDA and random forest

Country Status (1)

Country Link
CN (1) CN108090046B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975478A (en) * 2016-04-09 2016-09-28 北京交通大学 Word vector analysis-based online article belonging event detection method and device
CN106202211A (en) * 2016-06-27 2016-12-07 四川大学 A kind of integrated microblogging rumour recognition methods based on microblogging type
CN107423339A (en) * 2017-04-29 2017-12-01 天津大学 Popular microblogging Forecasting Methodology based on extreme Gradient Propulsion and random forest

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KE WU ET AL.: "False rumors detection on sina weibo by propagation structures", 《2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING》 *
QIAO ZHANG ET AL.: "Automatic Detection of Rumor on Social Network", 《CCF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING》 *
王峰 等: "新浪微博平台上的用户可信度评估", 《计算机科学与探索》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558483A (en) * 2018-10-16 2019-04-02 北京航空航天大学 A kind of rumour recognition methods based on model-naive Bayesian
CN109558483B (en) * 2018-10-16 2021-06-18 北京航空航天大学 Rumor recognition method based on naive Bayes model
CN111385655A (en) * 2018-12-29 2020-07-07 武汉斗鱼网络科技有限公司 Advertisement bullet screen detection method and device, server and storage medium
CN110413776A (en) * 2019-07-01 2019-11-05 武汉大学 High-performance computing method for the LDA text topic model based on CPU-GPU collaborative parallelism
CN110795560A (en) * 2019-10-21 2020-02-14 国网湖南省电力有限公司 Method and system for subdividing power grid electricity customers
CN111368092A (en) * 2020-02-21 2020-07-03 中国科学院电子学研究所苏州研究院 Knowledge graph construction method based on trusted webpage resources
CN111368092B (en) * 2020-02-21 2020-12-04 中国科学院电子学研究所苏州研究院 Knowledge graph construction method based on trusted webpage resources
CN112766359A (en) * 2021-01-14 2021-05-07 北京工商大学 Word double-dimensional microblog rumor recognition method for food safety public sentiment
CN112766359B (en) * 2021-01-14 2023-07-25 北京工商大学 Word double-dimension microblog rumor identification method for food safety public opinion
CN113190682A (en) * 2021-06-30 2021-07-30 平安科技(深圳)有限公司 Method and device for acquiring event influence degree based on tree model and computer equipment
CN113190682B (en) * 2021-06-30 2021-09-28 平安科技(深圳)有限公司 Method and device for acquiring event influence degree based on tree model and computer equipment

Also Published As

Publication number Publication date
CN108090046B (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN108090046A (en) Microblog rumor recognition method based on LDA and random forest
CN105045857A (en) Social network rumor recognition method and system
US8306932B2 (en) System and method for adaptive data masking
CN104573094B (en) Network account identity matching method
CN104991891B (en) Short text feature extraction method
CN107220352A (en) Method and apparatus for building a comment graph based on artificial intelligence
CN111767725B (en) Data processing method and device based on emotion polarity analysis model
CN106294590A (en) Social network spam user filtering method based on semi-supervised learning
CN106295702B (en) Social platform user classification method based on individual affective behavior analysis
CN108804701A (en) Person portrait model building method based on social network big data
CN112199608A (en) Social media rumor detection method based on network information propagation graph modeling
Valdivia et al. Neutrality in the sentiment analysis problem based on fuzzy majority
Chen et al. Modeling rumor diffusion process with the consideration of individual heterogeneity: Take the imported food safety issue as an example during the COVID-19 pandemic
CN107688576A (en) Construction and tendency classification method of a CNN-SVM model
CN109522416A (en) Construction method of a financial risk control knowledge graph
CN102945246A (en) Method and device for processing network information data
CN111191099A (en) User activity type identification method based on social media
CN104778283A (en) User occupation classification method and system based on microblog
CN112559734A (en) Presentation generation method and device, electronic equipment and computer readable storage medium
CN107392392A (en) Microblog forwarding prediction method based on deep learning
Fedushko et al. Determination of the account personal data adequacy of web-community member
CN115423639A (en) Social network-oriented secure community discovery method
CN108536757A (en) Method for guiding potentially harmful topics based on the user's history network
CN108073604A (en) Text processing method and device
Al-Hashedi et al. Cyberbullying detection based on emotion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210504

Termination date: 20211229