CN108090046A - Microblog rumor identification method based on LDA and random forest - Google Patents
Microblog rumor identification method based on LDA and random forest
- Publication number
- CN108090046A (application number CN201711483228.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a microblog rumor identification method based on LDA and random forest. Microblog data are collected from the official microblog platform with a web crawler and labeled manually. The text content is preprocessed and the numeric fields are z-score standardized, and a user reliability feature and a microblog influence feature are computed from the result. LDA is applied to obtain the document-topic distribution probabilities of the optimized text content and the topic-word distribution probabilities of the optimized text content words, from which the perplexity is computed. A microblog feature vector is then constructed. The user reliability feature, the microblog influence feature, and the LDA document-topic distribution probabilities serve as the input features of a random forest model, which is trained as the microblog rumor classifier. The invention mines the semantic information of microblog text in depth and achieves high rumor classification accuracy.
Description
Technical field
The present invention relates to fields such as social networks and text analysis, and more particularly to a social network rumor identification method based on LDA and random forest.
Background
With the rapid development of the internet and mobile communication devices, online social platforms have become an important channel through which people publish and obtain information and develop and maintain social relationships. Weibo has attracted a large number of users through its convenient mode of interaction, its friendly interactive experience, and the appeal of the celebrities who have joined it. According to the China Unicom WO Index for August 2017, Weibo had 330 million monthly active users. As one of China's most active social platforms, Weibo aggregates a large volume of fragmented user-generated content. Because information on social platforms is highly disordered, the uncertainty of individual cognition increases, and online rumors grow as a result. Studies have found that rumors with a large social impact originate largely from the microblog platform. When official channels are absent, rumors can relieve people's cognitive anxiety to some extent. However, rampant rumors often trigger negative disturbances in online public opinion and pose a potential threat to social stability and the safety of citizens, so identifying online rumors is particularly critical.
Current research on rumor identification focuses mainly on rumor text features, on the characteristics of users who post rumors, and on the properties of the propagation network, analyzing how online rumors arise and spread.
In the above methods, the deep semantic features of rumor content and the reliability and behavioral features of the users who spread it have not yet been put to good use.
Summary of the invention
To remedy the deficiencies of the prior art, the technical solution of the present invention is a microblog rumor identification method based on LDA and random forest, comprising the following steps:
Step 1: collect microblog data from the official microblog platform with a web crawler. The microblog data comprise the text content, like count, repost count, comment count, post count, following count, follower count, verification status, and rumor status. Label the microblog data manually according to the rumor information published by the official microblog platform and by government departments;
Step 2: apply irrelevant-character filtering, word segmentation, stop-word removal, and data conversion to the text content of step 1 to obtain the optimized text content and the optimized text content words, and count the optimized text content words. Apply z-score standardization to the like count, repost count, comment count, post count, following count, and follower count of step 1. The optimized text content, the optimized text content words, the word count, and the standardized values together form the z-score standardized microblog data, from which the user reliability feature and the microblog influence feature are computed;
Step 3: model the optimized text content and the optimized text content words of step 2 with an LDA topic model to obtain the LDA topic distribution probabilities, the document-topic distribution probabilities of the optimized text content, and the topic-word distribution probabilities of the optimized text content words. The document-topic distribution probabilities serve as the deep semantic text feature for rumor identification, and the perplexity is computed from the document-topic and topic-word distribution probabilities;
Step 4: construct the microblog feature vector from the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA topic distribution probabilities of step 3;
Step 5: use the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA document-topic distribution probabilities of step 3 as the input features of a random forest model. Compute the optimal parameters of the random forest model, which is built on CART decision trees, with a grid search algorithm under 10-fold cross-validation. Design the microblog rumor classifier from the optimal parameters and the microblog feature vector of step 4, train it on the manually labeled microblog data of step 1 to obtain the final microblog rumor classifier, and apply it to rumor screening.
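The parameter search in step 5 can be sketched in plain Python as follows. This is only a minimal illustration of grid search under k-fold cross-validation, not the patent's implementation: the names `k_fold_indices`, `grid_search`, and the callback `fit_and_score` are hypothetical, and the callback stands in for training a random forest on the training indices and returning its accuracy on the held-out indices. The patent does not state which parameters are searched.

```python
from itertools import product

def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k folds (interleaved, no shuffling)."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, held_out

def grid_search(param_grid, n_samples, fit_and_score, k=10):
    """Return the parameter combination with the best mean cross-validated score.

    fit_and_score(params, train_idx, test_idx) is a hypothetical callback that
    trains a model with `params` on train_idx and returns its score on test_idx.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = [fit_and_score(params, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Every sample is held out exactly once, so each parameter combination is scored on all of the labeled data before the best combination is chosen.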
Preferably, the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where M is the number of microblog records, i is the index of a microblog record, doc_i is the text content, like_i is the like count, repost_i is the repost count, comment_i is the comment count, num_i is the post count, following_i is the following count, follower_i is the follower count, verify_i is the verification status, and fake_i is the rumor status;
The manual labeling in step 1 is:
The verification status comes from the official microblog platform: verify_i indicates whether the user who posted weibo_i has passed Sina Weibo personal verification, with verify_i = 1 if so and verify_i = 0 otherwise. Rumor labels are assigned from the rumor information published by government departments: fake_i = 1 if weibo_i is a rumor microblog and fake_i = 0 otherwise;
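The record defined above can be represented as a small data structure. The sketch below is an assumption about one convenient layout, not something the patent specifies; the class name `WeiboRecord` and the helper `label` are hypothetical, while the field names follow the symbols doc_i through fake_i.

```python
from dataclasses import dataclass

@dataclass
class WeiboRecord:
    doc: str        # text content
    like: int       # like count
    repost: int     # repost count
    comment: int    # comment count
    num: int        # number of posts by the user
    following: int  # number of accounts the user follows
    follower: int   # number of followers
    verify: int     # 1 if the user passed personal verification, else 0
    fake: int       # 1 if the post is labeled a rumor, else 0

def label(crawled: dict, is_verified: bool, is_rumor: bool) -> WeiboRecord:
    """Attach the two binary labels to the crawled fields (hypothetical helper)."""
    return WeiboRecord(**crawled, verify=int(is_verified), fake=int(is_rumor))
```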
Preferably, the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized like count, z_repost_i is the z-score standardized repost count, z_comment_i is the z-score standardized comment count, z_num_i is the z-score standardized post count, z_following_i is the z-score standardized following count, and z_follower_i is the z-score standardized follower count;
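As an illustration of the z-score standardization applied to each count field, the following sketch subtracts the column mean and divides by the column standard deviation. The patent does not say whether the population or the sample standard deviation is used; the population form is assumed here, and the function name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one numeric column: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

The function would be applied once per field (likes, reposts, comments, posts, following, followers) across all M records.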
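The user reliability feature in step 2 is:
The microblog influence feature in step 2 is:
$$\mathrm{Influence}_i = \log\left(e^{\mathrm{z\_follower}_i} + e^{\mathrm{z\_repost}_i} + e^{\mathrm{z\_comment}_i}\right) + \mathrm{z\_like}_i$$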
Preferably, the perplexity in step 3 is:
$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{i=1}^{M}\log p(\mathrm{op\_word}_i)}{\sum_{i=1}^{M}\mathrm{op\_n}_i}\right\}$$
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\}$$
$$p(\mathrm{op\_word}_i) = \sum_{j=1}^{K} p(z_j \mid \mathrm{op\_doc}_i)\, p(\mathrm{op\_word}_i \mid z_j)$$
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M$$
where M is the number of microblog records of step 1, op_n_i is the number of optimized text content words of step 2, op_word_i is the optimized text content words of step 2, p(op_word_i) is the probability of the optimized text content words in the optimized text content, D is the set of all optimized text content words, p(z_j | op_doc_i) is the probability of the j-th topic in the optimized text content of the i-th z-score standardized microblog record of step 2, p(op_word_i | z_j) is the probability of the optimized text content words of the i-th z-score standardized microblog record under the j-th topic, K is the number of topics at which the perplexity is minimal, pweibo_i is the LDA topic distribution of the i-th z-score standardized microblog record, and p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K;
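The perplexity can be computed from the two LDA distributions as sketched below. The sketch assumes the usual reading of the formula, in which the log-probability of a document's words is the sum over its individual words of log Σ_j p(z_j | doc) p(word | z_j); it also assumes every word has non-zero probability under at least one topic with non-zero weight (as with a smoothed LDA model). The names `perplexity`, `theta`, and `phi` are hypothetical.

```python
import math

def perplexity(docs, theta, phi):
    """Perplexity of a corpus under fitted LDA distributions.

    docs:  list of token lists (the optimized text content words per record)
    theta: theta[i][j] = p(z_j | op_doc_i), document-topic probabilities
    phi:   phi[j][w]   = p(w | z_j), topic-word probabilities (dict per topic)
    """
    log_likelihood = 0.0
    total_words = 0
    for i, words in enumerate(docs):
        for w in words:
            # Mixture probability of the word: sum over topics.
            p_w = sum(theta[i][j] * phi[j].get(w, 0.0) for j in range(len(phi)))
            log_likelihood += math.log(p_w)
            total_words += 1
    return math.exp(-log_likelihood / total_words)
```

A lower value means the model predicts the observed words better, which is why the topic count K is chosen where the perplexity is minimal.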
Preferably, the microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M$$
where M is the number of microblog records of step 1, p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K of step 3, Reliability_i is the user reliability feature of step 2, and Influence_i is the microblog influence feature of step 2.
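Assembling the feature vector amounts to concatenating the K topic probabilities with the two scalar features. In the sketch below the reliability value is taken as an input, since its formula is not reproduced in this text; the function name `feature_vector` and the sum-to-one check are assumptions added for illustration.

```python
def feature_vector(topic_probs, reliability, influence, tol=1e-6):
    """Build cweibo_i = (p_i1, ..., p_iK, Reliability_i, Influence_i)."""
    # Sanity check (an added assumption): topic probabilities form a distribution.
    if abs(sum(topic_probs) - 1.0) > tol:
        raise ValueError("topic probabilities should sum to 1")
    return [*topic_probs, reliability, influence]
```

With K topics, each record yields a vector of length K + 2, and the M vectors stacked row by row form the input matrix of the classifier.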
Compared with the prior art, the present invention mines the semantic information of microblog text in depth with the LDA topic model to obtain the document-topic distribution probabilities of the optimized text content, and uses them together with the defined user reliability feature and microblog influence feature as the input variables of a random forest for classification training. The rumor identification effect of the invention is significant, and the rumor identification accuracy of the model is high.
Brief description of the drawings
Fig. 1 is a flowchart of the method according to an embodiment of the present invention.
Detailed description of the embodiments
To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawing and an embodiment. It should be understood that the embodiment described here serves only to illustrate and explain the invention and is not intended to limit it.
Referring to Fig. 1, the flowchart of the method according to an embodiment of the present invention, the invention provides a microblog rumor identification method based on LDA and random forest, comprising the following steps:
Step 1: collect microblog data from 2016 on the Sina Weibo platform with a web crawler. The microblog data comprise the text content, like count, repost count, comment count, post count, following count, follower count, verification status, and rumor status. Label the microblog data manually, taking as the rumor benchmark the rumor-refuting posts published by Sina Weibo's official rumor-refuting account and the 2016 haze rumors jointly exposed on December 30 by the Publicity and Education Center of the Ministry of Environmental Protection and the Beijing Environmental Protection Publicity and Education Center;
Preferably, the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M \tag{1}$$
where M = 872 is the number of microblog records, i is the index of a microblog record, doc_i is the text content, like_i is the like count, repost_i is the repost count, comment_i is the comment count, num_i is the post count, following_i is the following count, follower_i is the follower count, verify_i is the verification status, and fake_i is the rumor status;
The manual labeling in step 1 is:
The verification status comes from the official microblog platform: verify_i indicates whether the user who posted weibo_i has passed Sina Weibo personal verification, with verify_i = 1 if so and verify_i = 0 otherwise. Rumor labels are assigned from the rumor information published by government departments: fake_i = 1 if weibo_i is a rumor microblog and fake_i = 0 otherwise;
Step 2: apply irrelevant-character filtering, word segmentation, stop-word removal, and data conversion to the text content of step 1 to obtain the optimized text content and the optimized text content words, and count the optimized text content words. Apply z-score standardization to the like count, repost count, comment count, post count, following count, and follower count of step 1. The optimized text content, the optimized text content words, the word count, and the standardized values together form the z-score standardized microblog data, from which the user reliability feature and the microblog influence feature are computed;
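The text preprocessing in step 2 might look like the sketch below. The patent does not define which characters count as irrelevant, which segmenter is used, or which stop-word list applies, so everything concrete here is an assumption: URLs, @-mentions, and punctuation are filtered; the tokenizer is a pluggable function that defaults to whitespace splitting (a real Chinese pipeline would pass a word segmenter instead); and `STOP_WORDS` is a tiny hypothetical list.

```python
import re

STOP_WORDS = {"的", "了", "是"}  # hypothetical, deliberately tiny stop-word list

def clean(text):
    """Filter characters assumed irrelevant: URLs, @-mentions, punctuation."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\S+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def preprocess(text, tokenize=str.split, stop_words=STOP_WORDS):
    """Return (op_doc, op_word, op_n) for one microblog text."""
    op_doc = clean(text)
    op_word = [tok for tok in tokenize(op_doc) if tok not in stop_words]
    return op_doc, op_word, len(op_word)
```

The three return values correspond to the optimized text content, the optimized text content words, and their count.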
Preferably, the z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M \tag{2}$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized like count, z_repost_i is the z-score standardized repost count, z_comment_i is the z-score standardized comment count, z_num_i is the z-score standardized post count, z_following_i is the z-score standardized following count, and z_follower_i is the z-score standardized follower count;
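The user reliability feature in step 2 is:
The microblog influence feature in step 2 is:
$$\mathrm{Influence}_i = \log\left(e^{\mathrm{z\_follower}_i} + e^{\mathrm{z\_repost}_i} + e^{\mathrm{z\_comment}_i}\right) + \mathrm{z\_like}_i \tag{4}$$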
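The influence feature given in the claims, the logarithm of the sum of the exponentials of the standardized follower, repost, and comment counts plus the standardized like count, can be evaluated as sketched below. The base of the logarithm is not stated in the source and the natural logarithm is assumed; subtracting the maximum before exponentiating is a standard numerical-stability step added here, and the function name `influence` is hypothetical.

```python
import math

def influence(z_follower, z_repost, z_comment, z_like):
    """Influence = log(e^z_follower + e^z_repost + e^z_comment) + z_like."""
    terms = (z_follower, z_repost, z_comment)
    m = max(terms)
    # log-sum-exp: shifting by the maximum avoids overflow for large z values.
    log_sum = m + math.log(sum(math.exp(z - m) for z in terms))
    return log_sum + z_like
```

Because the log-sum-exp term is dominated by the largest of its three inputs, a record scores high on influence when any one of follower count, reposts, or comments is far above average.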
Step 3: model the optimized text content and the optimized text content words of step 2 with an LDA topic model to obtain the LDA topic distribution probabilities, the document-topic distribution probabilities of the optimized text content, and the topic-word distribution probabilities of the optimized text content words. The document-topic distribution probabilities serve as the deep semantic text feature for rumor identification, and the perplexity is computed from the document-topic and topic-word distribution probabilities;
Preferably, the perplexity in step 3 is:
$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{i=1}^{M}\log p(\mathrm{op\_word}_i)}{\sum_{i=1}^{M}\mathrm{op\_n}_i}\right\} \tag{5}$$
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\} \tag{6}$$
$$p(\mathrm{op\_word}_i) = \sum_{j=1}^{K} p(z_j \mid \mathrm{op\_doc}_i)\, p(\mathrm{op\_word}_i \mid z_j) \tag{7}$$
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M \tag{8}$$
where M = 872 is the number of microblog records of step 1, op_n_i is the number of optimized text content words of step 2, op_word_i is the optimized text content words of step 2, p(op_word_i) is the probability of the optimized text content words in the optimized text content, D is the set of optimized text content words, p(z_j | op_doc_i) is the probability of the j-th topic in the optimized text content of the i-th z-score standardized microblog record of step 2, p(op_word_i | z_j) is the probability of the optimized text content words of the i-th z-score standardized microblog record under the j-th topic, K = 7 is the number of topics at which the perplexity is minimal, pweibo_i is the LDA topic distribution of the i-th z-score standardized microblog record, and p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K;
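The embodiment reports K = 7 as the topic count with minimal perplexity. One way such a value could be found is a sweep over candidate topic counts, sketched below. The range of candidates is not given in the source, and `perplexity_for` is a hypothetical callback that would fit an LDA model with k topics and return its perplexity.

```python
def select_topic_count(candidate_ks, perplexity_for):
    """Return (best_k, scores), where best_k minimizes perplexity_for(k).

    perplexity_for is a hypothetical callback: fit LDA with k topics and
    return the resulting perplexity on the corpus.
    """
    scores = {k: perplexity_for(k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Keeping the full `scores` mapping makes it easy to inspect how sharply the perplexity rises on either side of the chosen K.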
Step 4: construct the microblog feature vector from the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA topic distribution probabilities of step 3;
Preferably, the microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M \tag{9}$$
where M = 872 is the number of microblog records of step 1, p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K of step 3, Reliability_i is the user reliability feature of step 2, and Influence_i is the microblog influence feature of step 2.
Step 5: use the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA document-topic distribution probabilities of step 3 as the input features of a random forest model. Compute the optimal parameters of the random forest model, which is built on CART decision trees, with a grid search algorithm under 10-fold cross-validation. Design the microblog rumor classifier from the optimal parameters and the microblog feature vector of step 4, train it on the manually labeled microblog data of step 1 to obtain the final microblog rumor classifier, and apply it to rumor screening.
It should be understood that the above description of the preferred embodiment is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Under the teaching of the invention, and without departing from the scope protected by the claims, those of ordinary skill in the art may make substitutions or variations, all of which fall within the protection scope of the invention. The claimed scope of the invention is determined by the appended claims.
Claims (2)
1. A microblog rumor identification method based on LDA and random forest, characterized by comprising the following steps:
Step 1: collect microblog data from the official microblog platform with a web crawler, the microblog data comprising the text content, like count, repost count, comment count, post count, following count, follower count, verification status, and rumor status, and label the microblog data manually according to the rumor information published by the official microblog platform and by government departments;
Step 2: apply irrelevant-character filtering, word segmentation, stop-word removal, and data conversion to the text content of step 1 to obtain the optimized text content and the optimized text content words, and count the optimized text content words; apply z-score standardization to the like count, repost count, comment count, post count, following count, and follower count of step 1, so that the optimized text content, the optimized text content words, the word count, and the standardized values together form the z-score standardized microblog data; and compute the user reliability feature and the microblog influence feature from the z-score standardized microblog data;
Step 3: model the optimized text content and the optimized text content words of step 2 with an LDA topic model to obtain the LDA topic distribution probabilities, the document-topic distribution probabilities of the optimized text content, and the topic-word distribution probabilities of the optimized text content words; use the document-topic distribution probabilities as the deep semantic text feature for rumor identification; and compute the perplexity from the document-topic and topic-word distribution probabilities;
Step 4: construct the microblog feature vector from the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA topic distribution probabilities of step 3;
Step 5: use the user reliability feature of step 2, the microblog influence feature of step 2, and the LDA document-topic distribution probabilities of step 3 as the input features of a random forest model; compute the optimal parameters of the random forest model based on CART decision trees with a grid search algorithm under 10-fold cross-validation; design the microblog rumor classifier from the optimal parameters and the microblog feature vector of step 4; and train it on the manually labeled microblog data of step 1 to obtain the final microblog rumor classifier, which is applied to rumor screening.
2. The microblog rumor identification method based on LDA and random forest according to claim 1, characterized in that the microblog data in step 1 are:
$$\mathrm{weibo}_i = \{\mathrm{doc}_i, \mathrm{like}_i, \mathrm{repost}_i, \mathrm{comment}_i, \mathrm{num}_i, \mathrm{following}_i, \mathrm{follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where M is the number of microblog records, i is the index of a microblog record, doc_i is the text content, like_i is the like count, repost_i is the repost count, comment_i is the comment count, num_i is the post count, following_i is the following count, follower_i is the follower count, verify_i is the verification status, and fake_i is the rumor status;
The manual labeling in step 1 is:
The verification status comes from the official microblog platform: verify_i indicates whether the user who posted weibo_i has passed Sina Weibo personal verification, with verify_i = 1 if so and verify_i = 0 otherwise. Rumor labels are assigned from the rumor information published by government departments: fake_i = 1 if weibo_i is a rumor microblog and fake_i = 0 otherwise;
The z-score standardized microblog data in step 2 are:
$$\mathrm{z\_weibo}_i = \{\mathrm{op\_doc}_i, \mathrm{op\_word}_i, \mathrm{op\_n}_i, \mathrm{z\_like}_i, \mathrm{z\_repost}_i, \mathrm{z\_comment}_i, \mathrm{z\_num}_i, \mathrm{z\_following}_i, \mathrm{z\_follower}_i, \mathrm{verify}_i, \mathrm{fake}_i\}, \quad 1 \le i \le M$$
where op_doc_i is the optimized text content, op_word_i is the optimized text content words, op_n_i is the number of optimized text content words, z_like_i is the z-score standardized like count, z_repost_i is the z-score standardized repost count, z_comment_i is the z-score standardized comment count, z_num_i is the z-score standardized post count, z_following_i is the z-score standardized following count, and z_follower_i is the z-score standardized follower count;
The user reliability feature in step 2 is:
The microblog influence feature in step 2 is:
$$\mathrm{Influence}_i = \log\left(e^{\mathrm{z\_follower}_i} + e^{\mathrm{z\_repost}_i} + e^{\mathrm{z\_comment}_i}\right) + \mathrm{z\_like}_i$$
The perplexity in step 3 is:
$$\mathrm{perplexity}(D) = \exp\left\{-\frac{\sum_{i=1}^{M}\log p(\mathrm{op\_word}_i)}{\sum_{i=1}^{M}\mathrm{op\_n}_i}\right\}$$
$$D = \{\mathrm{op\_word}_1, \ldots, \mathrm{op\_word}_M\}$$
$$p(\mathrm{op\_word}_i) = \sum_{j=1}^{K} p(z_j \mid \mathrm{op\_doc}_i)\, p(\mathrm{op\_word}_i \mid z_j)$$
$$\mathrm{pweibo}_i = (p_{i,1}, \ldots, p_{i,K}), \quad 1 \le i \le M$$
where M is the number of microblog records of step 1, op_n_i is the number of optimized text content words of step 2, op_word_i is the optimized text content words of step 2, p(op_word_i) is the probability of the optimized text content words in the optimized text content, D is the set of all optimized text content words, p(z_j | op_doc_i) is the probability of the j-th topic in the optimized text content of the i-th z-score standardized microblog record of step 2, p(op_word_i | z_j) is the probability of the optimized text content words of the i-th z-score standardized microblog record under the j-th topic, K is the number of topics at which the perplexity is minimal, pweibo_i is the LDA topic distribution of the i-th z-score standardized microblog record, and p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K;
The microblog feature vector in step 4 is:
$$\mathrm{cweibo}_i = (p_{i,1}, \ldots, p_{i,K}, \mathrm{Reliability}_i, \mathrm{Influence}_i), \quad 1 \le i \le M$$
where M is the number of microblog records of step 1, p_{i,1} to p_{i,K} are the probabilities of topics z_1 to z_K of step 3, Reliability_i is the user reliability feature of step 2, and Influence_i is the microblog influence feature of step 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711483228.0A CN108090046B (en) | 2017-12-29 | 2017-12-29 | Microblog rumor identification method based on LDA and random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711483228.0A CN108090046B (en) | 2017-12-29 | 2017-12-29 | Microblog rumor identification method based on LDA and random forest |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108090046A (en) | 2018-05-29 |
CN108090046B (en) | 2021-05-04 |
Family
ID=62181216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711483228.0A Expired - Fee Related CN108090046B (en) | 2017-12-29 | 2017-12-29 | Microblog rumor identification method based on LDA and random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108090046B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105975478A (en) * | 2016-04-09 | 2016-09-28 | 北京交通大学 | Word vector analysis-based online article belonging event detection method and device |
CN106202211A (en) * | 2016-06-27 | 2016-12-07 | 四川大学 | Integrated microblog rumor identification method based on microblog type |
CN107423339A (en) * | 2017-04-29 | 2017-12-01 | 天津大学 | Popular microblog prediction method based on extreme gradient boosting and random forest |
Non-Patent Citations (3)
Title |
---|
KE WU ET AL.: "False rumors detection on sina weibo by propagation structures", 《2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING》 * |
QIAO ZHANG ET AL.: "Automatic Detection of Rumor on Social Network", 《CCF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING》 * |
王峰 等: "新浪微博平台上的用户可信度评估", 《计算机科学与探索》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558483A (en) * | 2018-10-16 | 2019-04-02 | 北京航空航天大学 | Rumor recognition method based on naive Bayes model |
CN109558483B (en) * | 2018-10-16 | 2021-06-18 | 北京航空航天大学 | Rumor recognition method based on naive Bayes model |
CN111385655A (en) * | 2018-12-29 | 2020-07-07 | 武汉斗鱼网络科技有限公司 | Advertisement bullet screen detection method and device, server and storage medium |
CN110413776A (en) * | 2019-07-01 | 2019-11-05 | 武汉大学 | High-performance computing method for the LDA text topic model based on CPU-GPU cooperative parallelism |
CN110795560A (en) * | 2019-10-21 | 2020-02-14 | 国网湖南省电力有限公司 | Method and system for subdividing power grid electricity customers |
CN111368092A (en) * | 2020-02-21 | 2020-07-03 | 中国科学院电子学研究所苏州研究院 | Knowledge graph construction method based on trusted webpage resources |
CN111368092B (en) * | 2020-02-21 | 2020-12-04 | 中国科学院电子学研究所苏州研究院 | Knowledge graph construction method based on trusted webpage resources |
CN112766359A (en) * | 2021-01-14 | 2021-05-07 | 北京工商大学 | Word double-dimensional microblog rumor recognition method for food safety public sentiment |
CN112766359B (en) * | 2021-01-14 | 2023-07-25 | 北京工商大学 | Word double-dimension microblog rumor identification method for food safety public opinion |
CN113190682A (en) * | 2021-06-30 | 2021-07-30 | 平安科技(深圳)有限公司 | Method and device for acquiring event influence degree based on tree model and computer equipment |
CN113190682B (en) * | 2021-06-30 | 2021-09-28 | 平安科技(深圳)有限公司 | Method and device for acquiring event influence degree based on tree model and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108090046B (en) | 2021-05-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210504; Termination date: 20211229 |