CN105045857A - Social network rumor recognition method and system - Google Patents
- Publication number
- CN105045857A (application number CN201510401458.2A)
- Authority
- CN
- China
- Prior art keywords
- feature
- microblogging
- popularity
- microblog
- text
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a social network rumor recognition method and system. The method comprises: obtaining a microblog instance together with its microblog information and user information, and extracting from them the microblog content features of the instance, which include shallow text features and deep implicit microblog features; extracting the basic attribute features and deep implicit features of the user from the user information, and extracting microblog popularity features from the microblog information, where the popularity features include a volatility feature and a difference feature based on popularity and popularity trend, and a forwarding feature; and constructing a feature vector from the shallow text features, the deep implicit microblog features, the basic attribute features, the deep implicit user features, and the microblog popularity features, training a classifier, inputting the feature vector into the classifier, and outputting a result.
Description
Technical field
The present invention relates to the field of social network analysis, and in particular to a method and system for recognizing rumors in social networks.
Background art
As social networks have become popular and widespread, the volume of information on them has grown explosively, but the quality of that information has not improved correspondingly. Junk information of all kinds, and false information such as rumors in particular, floods social networks, and the spread and diffusion of rumors do great harm to people's lives and to the development of society.
Identifying rumors in social networks promptly and accurately not only helps build a healthy Internet environment, helps people better judge whether information is true, and stops the serious harm caused by malicious rumors in time; it can also play a positive role in public opinion monitoring, information guidance, and related areas.
Existing rumor recognition methods fall into two main classes. The first class is manual: a message is judged mainly by people reporting it to the authorities. Such methods cannot contain the spread and diffusion of a rumor at its early stage, so their timeliness is poor, and they require a large amount of labor and money, so their cost is high. The second class is based on machine learning: whether a microblog is a rumor is treated as a classification problem, and a classification learning algorithm is applied to various features of the microblog to recognize rumors.
Classification features are currently selected from three sources: the content of the microblog, its publisher, and its propagation. For content, existing work mainly uses shallow text features (for example, whether the content contains a link or a picture, or mentions other users) and does not analyze the text more deeply to mine implicit features such as semantics, topic, and sentiment. For the publisher, existing work mainly selects static features, including basic attributes such as the publisher's follower count and friend count, and does not consider the publisher's credibility or influence. For propagation, related work concentrates on propagation models of microblog rumors, constructing a forwarding graph rooted at the rumor to simulate how it spreads, or is confined to a few simple forwarding attributes, and does not further analyze the other features a rumor exhibits while it propagates.
The features selected in these studies discriminate poorly and have limitations, so the final recognition results are unsatisfactory. In summary, existing methods lack an automatic way to recognize microblog rumors accurately.
Summary of the invention
In view of the defects of the prior art and the distinctive characteristics of microblog rumors, the object of the present invention is to use features from three aspects (the content of the microblog, the publishing user, and the popularity of the microblog), together with classification methods from machine learning, to recognize microblog rumors automatically and to effectively improve the accuracy and recall of that recognition. To this end, the present invention proposes a social network rumor recognition method and system.
The invention provides a social network rumor recognition method, comprising:
Step 1: acquiring a microblog instance, obtaining the microblog information and user information of the microblog instance, and extracting, according to the microblog information and the user information, the microblog content features of the microblog instance, wherein the microblog content features comprise shallow text features and deep implicit microblog features;
Step 2: extracting, according to the user information, the basic attribute features and the deep implicit user features of the user, and extracting, according to the microblog information, the microblog popularity features of the microblog, wherein the microblog popularity features comprise a volatility feature and a difference feature based on popularity and popularity trend, and a forwarding feature;
Step 3: constructing a feature vector from the shallow text features, the deep implicit microblog features, the basic attribute features, the deep implicit user features, and the microblog popularity features, training a classifier, inputting the feature vector into the classifier, and outputting a result, thereby completing the recognition of social network rumors.
In the social network rumor recognition method, the deep implicit microblog features comprise a hot-topic tendency feature, an internal-external consistency feature, a sentiment polarity feature, and an opinion tendency feature of the comments.
In the social network rumor recognition method, the deep implicit user features comprise a social feature, an opinion-forwarding feature, and a microblog matching-degree feature.
In the social network rumor recognition method, the step of extracting the hot-topic tendency feature and the internal-external consistency feature comprises performing word segmentation and part-of-speech tagging on the microblog text, extracting the nouns and verbs that carry real meaning as keywords, using the TF-IDF measure from text feature extraction as the weight by which the keywords are ranked, and taking the K highest-weighted words as the keywords of the microblog text.
In the social network rumor recognition method, the step of extracting the sentiment polarity feature comprises performing word segmentation and part-of-speech tagging on the microblog text and extracting keywords by means of a sentiment dictionary, an emoticon dictionary, a punctuation dictionary, and a sensitive-word dictionary, the extracted words being the content words in the microblog and the words that match an entry in one of the dictionaries.
The present invention also proposes a social network rumor recognition system, comprising:
A microblog content feature extraction module, configured to acquire a microblog instance, obtain the microblog information and user information of the microblog instance, and extract, according to the microblog information and the user information, the microblog content features of the microblog instance, wherein the microblog content features comprise shallow text features and deep implicit microblog features;
A microblog popularity feature extraction module, configured to extract, according to the user information, the basic attribute features and the deep implicit user features of the user, and to extract, according to the microblog information, the microblog popularity features of the microblog, wherein the microblog popularity features comprise a volatility feature and a difference feature based on popularity and popularity trend, and a forwarding feature;
A rumor recognition module, configured to construct a feature vector from the shallow text features, the deep implicit microblog features, the basic attribute features, the deep implicit user features, and the microblog popularity features, train a classifier, input the feature vector into the classifier, and output a result, thereby completing the recognition of social network rumors.
In the social network rumor recognition system, the deep implicit microblog features comprise a hot-topic tendency feature, an internal-external consistency feature, a sentiment polarity feature, and an opinion tendency feature of the comments.
In the social network rumor recognition system, the deep implicit user features comprise a social feature, an opinion-forwarding feature, and a microblog matching-degree feature.
In the social network rumor recognition system, the step of extracting the hot-topic tendency feature and the internal-external consistency feature comprises performing word segmentation and part-of-speech tagging on the microblog text, extracting the nouns and verbs that carry real meaning as keywords, using the TF-IDF measure from text feature extraction as the weight by which the keywords are ranked, and taking the K highest-weighted words as the keywords of the microblog text.
In the social network rumor recognition system, the step of extracting the sentiment polarity feature comprises performing word segmentation and part-of-speech tagging on the microblog text and extracting keywords by means of a sentiment dictionary, an emoticon dictionary, a punctuation dictionary, and a sensitive-word dictionary, the extracted words being the content words in the microblog and the words that match an entry in one of the dictionaries.
From the above scheme, the advantages of the invention are as follows:
The invention targets the distinctive characteristics of microblog rumors and introduces deep implicit features of the microblog content and of the publishing user, which effectively distinguish rumor microblogs from ordinary microblogs during recognition. It also incorporates the popularity and popularity-trend features that change as a microblog propagates, which significantly improves the accuracy and recall of rumor recognition during classification.
Brief description of the drawings
Fig. 1 is the overall flowchart of one embodiment of the invention;
Fig. 2 is a flowchart of extracting the hot-topic tendency feature and the internal-external consistency feature of the content in one embodiment of the invention;
Fig. 3 is a flowchart of extracting the sentiment polarity feature of the content in one embodiment of the invention;
Fig. 4 is a flowchart of extracting the opinion tendency feature of the comments in one embodiment of the invention;
Fig. 5 is a flowchart of extracting the user's social feature, opinion-forwarding feature, and historical microblog matching-degree feature in one embodiment of the invention;
Fig. 6 is a flowchart of extracting the volatility and difference features of popularity in one embodiment of the invention.
Detailed description of the embodiments
The invention is described below with reference to the drawings and specific embodiments, which are not to be taken as limiting the invention.
As shown in Fig. 1, a specific embodiment of the social network rumor recognition method comprises the following steps:
(1) Acquisition of the microblog instance. According to the unique identifier of the input microblog, the microblog information and the corresponding user information are obtained. The microblog information comprises the microblog text, the historical retweet-count vector of the microblog, and all comment texts of the microblog. The user information comprises the user's basic attributes (follower count, friend count, mutual-follow count), the historical microblogs from the past month, and their corresponding retweet-count vectors. Step (1) corresponds to steps 101 and 102 in Fig. 1.
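The inputs gathered in step (1) can be pictured as a small record type. The following is only an illustrative sketch: the class and field names (`MicroblogInstance`, `UserInfo`, `retweet_history`, and so on) are hypothetical and do not come from the patent, which specifies only what information is collected.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserInfo:
    # Basic attributes of the publishing user (names are illustrative).
    follower_count: int
    friend_count: int
    mutual_follow_count: int
    # Microblogs posted in the past month and their retweet-count vectors.
    history_posts: List[str] = field(default_factory=list)
    history_retweet_vectors: List[List[int]] = field(default_factory=list)


@dataclass
class MicroblogInstance:
    microblog_id: str            # unique identifier used to fetch the data
    text: str                    # microblog text
    retweet_history: List[int]   # retweet count at successive time intervals
    comments: List[str]          # all comment texts
    user: UserInfo


example = MicroblogInstance(
    microblog_id="m-001",
    text="Example post text",
    retweet_history=[0, 10, 50, 60],
    comments=["really?", "this is false"],
    user=UserInfo(follower_count=1200, friend_count=300, mutual_follow_count=150),
)
```

Later feature extractors would then take a `MicroblogInstance` as their only argument, which keeps the content, user, and popularity features tied to the same microblog.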
(2) Extraction of microblog content features, comprising the shallow text features of the content and the deep implicit features (hot-topic tendency feature, internal-external consistency feature, sentiment polarity feature, and opinion tendency feature of the comments).
Step (2) corresponds to step 103 in Fig. 1.
In step 103, the shallow text features of the microblog content are extracted first, including: the interval between the registration time of the microblog account and the publication time of the microblog; whether the microblog text contains an external link; whether it contains pictures, audio, video, or the like; and whether it mentions other users. Then the hot-topic lexicon 104 and the external page 105 pointed to by a link in the microblog text are used to extract the hot-topic tendency feature and the internal-external consistency feature, as detailed in Fig. 2, and the various dictionaries 106 (sentiment, symbols, and so on) are used to extract the sentiment polarity feature and the opinion tendency feature of the comments, as detailed in Fig. 3 and Fig. 4 respectively.
In extracting the hot-topic tendency feature and the internal-external consistency feature, a tool is first used to perform word segmentation and part-of-speech tagging on the microblog text. Only the nouns and verbs that carry real meaning are extracted as keywords, TF-IDF (term frequency-inverse document frequency) from text feature extraction is used as the weight by which the keywords are ranked, and the K highest-weighted words after ranking are selected as the keywords of the microblog text. This process is shown in steps 201 and 202.
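The keyword selection in steps 201 and 202 can be sketched as follows. This is a minimal illustration under stated assumptions: the text is taken to be already segmented and tagged by some external tool, the tags `"n"` and `"v"` for nouns and verbs are hypothetical, and the smoothed IDF variant used here is one common choice rather than a formula given in the patent.

```python
import math
from collections import Counter


def top_k_keywords(tagged_tokens, corpus, k, keep_tags=("n", "v")):
    """Return the k highest TF-IDF nouns/verbs of one microblog.

    tagged_tokens: list of (word, pos_tag) pairs for the microblog.
    corpus: list of token lists used to estimate document frequencies.
    """
    # Keep only nouns and verbs as keyword candidates.
    candidates = [word for word, tag in tagged_tokens if tag in keep_tags]
    if not candidates:
        return []
    tf = Counter(candidates)
    n_docs = len(corpus)
    scores = {}
    for word, freq in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (an assumption; the patent does not fix the variant).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (freq / len(candidates)) * idf
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


tagged = [("earthquake", "n"), ("hit", "v"), ("the", "x"),
          ("city", "n"), ("earthquake", "n")]
corpus = [["city", "news"], ["city", "weather"], ["earthquake", "city"]]
keywords = top_k_keywords(tagged, corpus, k=2)
```

In this toy input, "city" appears in every corpus document and so receives the lowest weight, while the repeated and rarer "earthquake" ranks first.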
In step 203, the hot-topic lexicon 204 is used to extract the hot-topic tendency feature of the content.
Suppose $W=\{w_1, w_2, \ldots, w_n\}$ is the keyword set extracted from microblog $T$, where $w_i$ denotes one of the keywords, and $\mathrm{HotTopicWordBase}=\{T_1, T_2, \ldots, T_m\}$ is the hot-topic lexicon organized by topic, where $T_i$ is the set of words under one hot topic. The hot-topic tendency of the microblog content is then computed as follows:
$$\mathrm{hot\_feature}(W)=\max\bigl(\mathrm{simi}(W,T_1),\ \mathrm{simi}(W,T_2),\ \ldots,\ \mathrm{simi}(W,T_m)\bigr)$$
In the above formula, $\mathrm{simi}(W,T_i)$ denotes the Jaccard similarity between the keyword set $W$ of microblog $T$ and the word set $T_i$ under a given hot topic.
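The hot-topic tendency computation above translates directly into code. The sketch below assumes the hot-topic lexicon is held as a dictionary from topic name to word set; that representation, and the function names, are illustrative choices and not taken from the patent.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two word collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention for two empty sets
    return len(a & b) / len(a | b)


def hot_feature(keywords, hot_topic_lexicon):
    """Maximum Jaccard similarity between the keyword set and any topic."""
    return max(
        (jaccard(keywords, words) for words in hot_topic_lexicon.values()),
        default=0.0,  # empty lexicon: no hot-topic tendency
    )


lexicon = {
    "disaster": {"earthquake", "rescue", "flood", "aid"},
    "sports": {"match", "goal"},
}
score = hot_feature({"earthquake", "city", "rescue"}, lexicon)
```

Here the keyword set shares two of five distinct words with the "disaster" topic and none with "sports", so the feature value is 2/5.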
In step 205, the external page 206 pointed to by a link in the microblog text is used to extract the internal-external consistency feature of the content. The external page is described by its page title (title), page description (description), and page keywords (keywords), and the feature is computed as follows:
In the above formula, Rel(T, title), Rel(T, description), and Rel(T, keywords) denote the relevance of the microblog text to the title, description, and keywords of the external page, respectively, each measured by Jaccard similarity.
In extracting the sentiment polarity feature, a tool is first used to perform word segmentation and part-of-speech tagging 301 on the microblog text. Keywords are then extracted by means of the sentiment dictionary, emoticon dictionary, punctuation dictionary, and sensitive-word dictionary: the content words (verbs, nouns, adjectives) in the microblog and the words that match an entry in one of the dictionaries are extracted. This process is shown in step 302.
An improved TF-IDF is used to compute the term weights 304 of the extracted words. The weight is computed as follows:
In the above formula, $\mathrm{weight}_k$ denotes the weight of the $k$-th term in the microblog, $f_k$ denotes the frequency of the current term in this microblog, $l$ denotes the number of terms in this microblog, and $\mathrm{level}_k$ denotes the level of the current term. The term levels are set as shown in the following table:
In step 305, the term weights obtained in the preceding step are used to build the feature vector of the microblog text, which serves as the input to the sentiment polarity classifier that yields the sentiment polarity feature of the microblog. The present invention uses a support vector machine (SVM) and divides sentiment polarity into three types: positive, negative, and neutral.
In step 401, a tool is first used to perform word segmentation and part-of-speech tagging on the microblog text.
In step 402, in addition to the dictionaries used in sentiment polarity feature extraction, an opinion polarity dictionary is also used to extract the opinion tendency feature of the comments. First, based on the results of word segmentation and part-of-speech tagging and on the sentiment dictionary, emoticon dictionary, punctuation dictionary, sensitive-word dictionary, and opinion polarity dictionary, the content words (verbs, nouns, adjectives) in the microblog and the words that match an entry in one of the dictionaries are extracted, and the keywords are supplemented with the 2-grams, 3-grams, and 4-grams of the microblog text.
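The n-gram supplement in step 402 could look like the sketch below. The patent does not say whether the 2-, 3-, and 4-grams are taken over characters or over segmented words; this example assumes character n-grams, and the function name `char_ngrams` is hypothetical.

```python
def char_ngrams(text, sizes=(2, 3, 4)):
    """Return all character n-grams of the given sizes, in order.

    Character-level grams are an assumption; word-level grams would
    apply the same sliding window to a token list instead.
    """
    grams = []
    for n in sizes:
        # range() is empty when the text is shorter than n.
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


grams = char_ngrams("abcd")
```

For the four-character string above this yields three 2-grams, two 3-grams, and one 4-gram, which would then be appended to the dictionary-matched keywords.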
In step 405, the improved TF-IDF is used to compute the term weights of the extracted words, in the same way as in step 304.
The term weights obtained in the preceding step are used to build a feature vector for each comment on the microblog, which serves as the input to the opinion polarity classifier that yields the opinion polarity of each comment, as shown in step 406. In the present invention, the opinion polarity of a comment is divided into three classes: support, oppose, and other.
In step 407, the numbers of comments on the microblog whose opinion polarity is support and oppose are counted, and the opinion tendency feature of the comments is computed as follows:
where $N_{pos}$ denotes the number of comments on this microblog whose opinion polarity is support, and $N_{neg}$ denotes the number of comments whose opinion polarity is oppose.
(3) Extraction of the features of the publishing user, comprising the basic attribute features of the publishing user and the deep implicit features (social feature, opinion-forwarding feature, and microblog matching-degree feature).
Step (3) corresponds to step 107 in Fig. 1.
In step 107, the basic attribute information of the publishing user is first used to extract the user's basic attribute features, including: the user's verification type, gender, whether personal information is provided, number of microblogs posted, follower count, friend count, and so on. Then, in combination with the user's historical microblog data, the deep implicit features of the user (social feature, opinion-forwarding feature, and microblog matching-degree feature) are extracted, as detailed in Fig. 5.
In step 501, the user's information is obtained first, comprising the user's basic attribute information and the historical microblogs the user posted in the most recent month.
In step 502, the user's social feature is computed from the user's follower count, friend count, and mutual-follow count, as follows:
where fol_num denotes the user's follower count, fri_num denotes the user's friend count (that is, the number of users this user follows), and bi_fol_num denotes the number of users who follow, and are followed by, this user.
In step 503, the user's opinion-forwarding feature is computed from the user's historical microblogs, as follows:
In the above formula, status_num denotes the number of microblogs this user has posted, and retweets_num denotes the total number of times the microblogs posted by this user have been forwarded.
In step 504, the user's microblog matching-degree feature is computed from the user's historical microblogs. First, a topic model is used to obtain the topic distributions of the user's historical microblogs and of the current microblog, denoted $\mathrm{his\_topic}=\{his\_p_1, his\_p_2, \ldots, his\_p_k\}$ and $\mathrm{now\_topic}=\{now\_p_1, now\_p_2, \ldots, now\_p_k\}$, where $k$ is the specified number of topics. The user's matching-degree feature is the cosine similarity between the topic distribution of the historical microblogs and that of the current microblog:
$$\cos(\mathrm{his\_topic}, \mathrm{now\_topic})=\frac{\sum_{i=1}^{k} his\_p_i \cdot now\_p_i}{\sqrt{\sum_{i=1}^{k} his\_p_i^{2}}\ \sqrt{\sum_{i=1}^{k} now\_p_i^{2}}}$$
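The matching-degree feature of step 504 is a plain cosine similarity between two topic distributions of equal length. The sketch below assumes the distributions have already been produced by some topic model; the function name and the zero-vector convention are illustrative choices.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length k")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # convention: no match when a distribution is all zeros
    return dot / (norm_p * norm_q)


his_topic = [0.5, 0.5]   # topic distribution of the user's history
now_topic = [1.0, 0.0]   # topic distribution of the current microblog
match_degree = cosine_similarity(his_topic, now_topic)
```

A value near 1 means the current microblog is on the topics the user usually posts about; a value near 0 means it departs from the user's history.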
(4) Extraction of microblog popularity features, comprising the volatility feature and difference feature based on popularity and popularity trend, and simple forwarding features.
Step (4) corresponds to step 108 in Fig. 1; the detailed process is shown in Fig. 6.
In step 601, the historical retweet-count vector of the current microblog is first obtained according to the unique identifier of the microblog.
Step 602 obtains the volatility feature of popularity. First, a period of time after microblog $T$ is posted is divided into $n$ equally spaced time intervals, denoted $\mathrm{Interval}=\{I_1, I_2, \ldots, I_n\}$, where $I_k$ denotes the time elapsed from the posting of the microblog to the $k$-th time interval. The retweet counts of the microblog at the moments corresponding to the $n$ time intervals can be expressed as a vector:
$$\mathrm{retweets\_vector}=\{count_1, count_2, \ldots, count_n\}$$
where $count_i$ denotes the current retweet count of the microblog at the moment corresponding to the $i$-th time interval. The curve formed by the retweet counts over the time intervals is treated here as an approximate popularity trend curve. The fluctuation rate between two adjacent points on this curve can be computed from the slope of the straight line connecting the two points, as follows:
In the above formula, $wave\_rate_i$ denotes the fluctuation rate between two adjacent points on the curve, and $count_i$ denotes the $i$-th component of the retweet-count vector. The normalization factor in the formula is the average retweet count of the user's historical microblogs within one month, which measures the average popularity of the microblogs this user posts.
The final popularity volatility feature of the microblog can be represented by the maximum fluctuation rate of the approximate popularity trend curve, that is, the maximum of the fluctuation rates between adjacent points in the retweet-count vector, computed as follows:
$$\mathrm{wave\_feature}=\max(wave\_rate_1, wave\_rate_2, \ldots, wave\_rate_{n-1})$$
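The volatility feature can be sketched in a few lines. The exact fluctuation-rate formula is not reproduced in this text, so the version below is an assumption: it takes the slope between adjacent retweet counts over one equally spaced interval and divides it by the user's average monthly retweet count, which matches the verbal description but may differ in detail from the patent's formula. Only the final maximum is taken directly from the text.

```python
def wave_rates(counts, interval, avg_retweets):
    """Normalized slopes between adjacent points of the retweet curve.

    ASSUMED form: (count[i+1] - count[i]) / (interval * avg_retweets).
    interval: length of one equally spaced time interval.
    avg_retweets: user's average retweet count over the past month.
    """
    if interval <= 0 or avg_retweets <= 0:
        raise ValueError("interval and avg_retweets must be positive")
    return [
        (counts[i + 1] - counts[i]) / (interval * avg_retweets)
        for i in range(len(counts) - 1)
    ]


def wave_feature(counts, interval, avg_retweets):
    """Maximum fluctuation rate over the n-1 adjacent pairs."""
    rates = wave_rates(counts, interval, avg_retweets)
    return max(rates) if rates else 0.0


rates = wave_rates([0, 10, 50, 60], interval=1.0, avg_retweets=20.0)
volatility = wave_feature([0, 10, 50, 60], interval=1.0, avg_retweets=20.0)
```

In this toy series the jump from 10 to 50 retweets dominates, so a microblog that spreads far faster than its author's usual posts gets a high volatility value.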
In step 603, the difference feature of popularity is computed in combination with the average retweet count of the user's historical microblogs within one month, as follows:
where $retweet\_count_t$ denotes the retweet count the microblog has obtained so far, which is also the current popularity of the microblog.
(5) Classification-based microblog rumor recognition. A feature vector is constructed from the content features of the microblog, the features of the publishing user, and the popularity features of the microblog described above, as in step 109 of Fig. 1. Each dimension of the feature vector is shown in the following table.
Then, using the constructed classifier and the content features, publishing-user features, and popularity features obtained in steps (2), (3), and (4) above, a prediction is made as to whether the microblog is a rumor, and the corresponding recognition result is output; this corresponds to step 110 in Fig. 1. The classifier may be any common classifier such as an SVM or a decision tree. To build the classifier, the feature vectors of the training data are first obtained (in the same way as in step 109) and used as the input of the classifier to tune its parameters; once the parameters have been tuned, the resulting trained classifier is applied to rumor prediction.
The present invention proposes a microblog rumor recognition method and system based on multi-feature classification. Drawing on the distinctive characteristics of microblog rumors, the system introduces new implicit features of the content and of the publishing user that had not previously been proposed, innovatively incorporates the popularity and popularity-trend features of the microblog, and combines them with classification learning methods from machine learning to recognize microblog rumors accurately and automatically.
Of course, the present invention may have various other embodiments. Those of ordinary skill in the art may make various corresponding changes and variations according to the present invention without departing from its spirit and essence, but all such changes and variations shall fall within the scope of protection of the claims appended to the present invention.
Claims (10)
1. A social network rumor recognition method, characterized by comprising:
Step 1: obtaining a microblog information instance, obtaining the microblog information and user information of the microblog information instance, and extracting, according to the microblog information and the user information, the microblog content features of the microblog information instance, wherein the microblog content features comprise shallow text features and deep implicit microblog features;
Step 2: extracting, according to the user information, the basic attribute features of the user and deep implicit user features, and extracting, according to the microblog information, the microblog popularity features of the microblog, wherein the microblog popularity features comprise a volatility feature and a difference feature based on popularity and popularity trend, and a forwarding feature;
Step 3: constructing a feature vector from the shallow text features, the deep implicit microblog features, the basic attribute features, the deep implicit user features, and the microblog popularity features, training a classifier, inputting the feature vector into the classifier, and outputting a result, thereby completing the identification of social network rumors.
2. The social network rumor recognition method as claimed in claim 1, characterized in that the deep implicit microblog features comprise a focus tendency feature, an internal-external consistency feature, a sentiment polarity feature, and an opinion tendency feature of the comments.
3. The social network rumor recognition method as claimed in claim 1, characterized in that the deep implicit user features comprise a social feature, an opinion forwarding feature, and a microblog matching degree feature.
4. The social network rumor recognition method as claimed in claim 2, characterized in that the step of extracting the focus tendency feature and the internal-external consistency feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, extracting nouns and verbs that carry expressive meaning as keywords, using the TF-IDF measure from text feature extraction as the weight for ranking the keywords, and taking the K highest-weighted words as the keywords of the microblog text.
5. The social network rumor recognition method as claimed in claim 2, characterized in that the step of extracting the sentiment polarity feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, and extracting keywords by means of a sentiment dictionary, an emoticon dictionary, a punctuation dictionary, and a sensitive-word dictionary, the extracted words being the content words in the microblog and the words that can be matched in the dictionaries.
6. A social network rumor recognition system, characterized by comprising:
a microblog content feature extraction module, configured to obtain a microblog information instance, obtain the microblog information and user information of the microblog information instance, and extract, according to the microblog information and the user information, the microblog content features of the microblog information instance, wherein the microblog content features comprise shallow text features and deep implicit microblog features;
a microblog popularity feature extraction module, configured to extract, according to the user information, the basic attribute features of the user and deep implicit user features, and to extract, according to the microblog information, the microblog popularity features of the microblog, wherein the microblog popularity features comprise a volatility feature and a difference feature based on popularity and popularity trend, and a forwarding feature;
a rumor identification module, configured to construct a feature vector from the shallow text features, the deep implicit microblog features, the basic attribute features, the deep implicit user features, and the microblog popularity features, train a classifier, input the feature vector into the classifier, and output a result, thereby completing the identification of social network rumors.
7. The social network rumor recognition system as claimed in claim 6, characterized in that the deep implicit microblog features comprise a focus tendency feature, an internal-external consistency feature, a sentiment polarity feature, and an opinion tendency feature of the comments.
8. The social network rumor recognition system as claimed in claim 6, characterized in that the deep implicit user features comprise a social feature, an opinion forwarding feature, and a microblog matching degree feature.
9. The social network rumor recognition system as claimed in claim 7, characterized in that the step of extracting the focus tendency feature and the internal-external consistency feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, extracting nouns and verbs that carry expressive meaning as keywords, using the TF-IDF measure from text feature extraction as the weight for ranking the keywords, and taking the K highest-weighted words as the keywords of the microblog text.
10. The social network rumor recognition system as claimed in claim 7, characterized in that the step of extracting the sentiment polarity feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, and extracting keywords by means of a sentiment dictionary, an emoticon dictionary, a punctuation dictionary, and a sensitive-word dictionary, the extracted words being the content words in the microblog and the words that can be matched in the dictionaries.
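The keyword extraction recited in claims 4 and 9 (keep nouns and verbs, weight them by TF-IDF, take the K highest-weighted words) can be illustrated with the sketch below. It rests on several assumptions that are not in the claims: the text is already segmented and part-of-speech tagged, the tags `"n"` and `"v"` mark nouns and verbs, the smoothed IDF formula is one common variant chosen for illustration, and the names `candidate_words` and `top_k_keywords` as well as the toy English corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

KEEP_TAGS = {"n", "v"}  # assumed tag names for nouns and verbs


def candidate_words(tokens: Sequence[Token]) -> List[str]:
    """Keep only nouns and verbs from an already segmented, tagged text."""
    return [word for word, tag in tokens if tag in KEEP_TAGS]


def top_k_keywords(doc: Sequence[Token],
                   corpus: Sequence[Sequence[Token]],
                   k: int) -> List[str]:
    """Rank the nouns and verbs of `doc` by TF-IDF and return the top K words."""
    words = candidate_words(doc)
    if not words:
        return []
    term_counts = Counter(words)
    doc_sets = [set(candidate_words(d)) for d in corpus]
    n_docs = len(doc_sets)
    weights: Dict[str, float] = {}
    for word, count in term_counts.items():
        doc_freq = sum(word in s for s in doc_sets)
        # Smoothed IDF (an assumed variant) avoids division by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        weights[word] = (count / len(words)) * idf
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return ranked[:k]


# Toy corpus (invented) standing in for segmented microblog texts.
corpus = [
    [("earthquake", "n"), ("hit", "v"), ("city", "n"), ("the", "x")],
    [("city", "n"), ("hold", "v"), ("festival", "n")],
    [("city", "n"), ("open", "v"), ("park", "n")],
]
keywords = top_k_keywords(corpus[0], corpus, k=2)
```

Here "city" occurs in every text, so its IDF is lowest and it ranks below "earthquake" and "hit", which is the behavior the TF-IDF weighting is meant to provide.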
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510401458.2A CN105045857A (en) | 2015-07-09 | 2015-07-09 | Social network rumor recognition method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510401458.2A CN105045857A (en) | 2015-07-09 | 2015-07-09 | Social network rumor recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105045857A true CN105045857A (en) | 2015-11-11 |
Family
ID=54452404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510401458.2A Pending CN105045857A (en) | 2015-07-09 | 2015-07-09 | Social network rumor recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105045857A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609475A (en) * | 2012-01-19 | 2012-07-25 | 浙江省公众信息产业有限公司 | Method for monitoring content of microblog and monitoring system |
CN103077240A (en) * | 2013-01-10 | 2013-05-01 | 北京工商大学 | Microblog water army identifying method based on probabilistic graphical model |
CN103631901A (en) * | 2013-11-20 | 2014-03-12 | 清华大学 | Rumor control method based on maximum spanning tree of user-trusted network |
CN103902621A (en) * | 2012-12-28 | 2014-07-02 | 深圳先进技术研究院 | Method and device for identifying network rumor |
US20150067849A1 (en) * | 2013-08-29 | 2015-03-05 | International Business Machines Corporation | Neutralizing propagation of malicious information |
Non-Patent Citations (1)
Title |
---|
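He Gang, Lü Xueqiang, Li Zhuo, Xu Liping (贺刚、吕学强、李卓、徐丽萍): "Research on Microblog Rumor Identification" ("微博谣言识别研究"), Library and Information Service (《国书情报工作》) * |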
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809162A (en) * | 2016-03-10 | 2016-07-27 | 腾讯科技(深圳)有限公司 | Method and device for acquiring WIFI hot pot and picture associated information |
CN105787101A (en) * | 2016-03-18 | 2016-07-20 | 联想(北京)有限公司 | Information processing method and electronic equipment |
CN105787101B (en) * | 2016-03-18 | 2019-06-07 | 联想(北京)有限公司 | A kind of information processing method and electronic equipment |
CN106096638A (en) * | 2016-06-03 | 2016-11-09 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
CN106096638B (en) * | 2016-06-03 | 2018-08-07 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
CN106202211A (en) * | 2016-06-27 | 2016-12-07 | 四川大学 | A kind of integrated microblogging rumour recognition methods based on microblogging type |
CN106126700A (en) * | 2016-07-01 | 2016-11-16 | 复旦大学 | A kind of analysis method of microblogging gossip propagation |
CN106126700B (en) * | 2016-07-01 | 2020-05-12 | 复旦大学 | Analysis method for propagation of microblog rumors |
WO2018014543A1 (en) * | 2016-07-20 | 2018-01-25 | 平安科技(深圳)有限公司 | Information query method, information query device, storage medium, and terminal |
CN107644029A (en) * | 2016-07-20 | 2018-01-30 | 平安科技(深圳)有限公司 | Information query method and information query device |
CN107797998A (en) * | 2016-08-29 | 2018-03-13 | 腾讯科技(深圳)有限公司 | The recognition methods of user-generated content containing rumour and device |
CN107797998B (en) * | 2016-08-29 | 2021-05-07 | 腾讯科技(深圳)有限公司 | Rumor-containing user generated content identification method and device |
CN106354845A (en) * | 2016-08-31 | 2017-01-25 | 上海交通大学 | Microblog rumor recognizing method and system based on propagation structures |
CN106447285B (en) * | 2016-09-12 | 2020-06-12 | 北京大学 | Recruitment information matching method based on multi-dimensional domain key knowledge |
CN106447285A (en) * | 2016-09-12 | 2017-02-22 | 北京大学 | Multidimensional field key knowledge-based recruitment information matching method |
CN106528655A (en) * | 2016-10-18 | 2017-03-22 | 百度在线网络技术(北京)有限公司 | Text subject recognition method and device |
CN107741939A (en) * | 2016-10-31 | 2018-02-27 | 腾讯科技(深圳)有限公司 | A kind of recognition methods of info web and device |
CN107741939B (en) * | 2016-10-31 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Webpage information identification method and device |
CN106570162A (en) * | 2016-11-04 | 2017-04-19 | 北京百度网讯科技有限公司 | Canard identification method and device based on artificial intelligence |
CN106570162B (en) * | 2016-11-04 | 2020-07-28 | 北京百度网讯科技有限公司 | Artificial intelligence-based rumor recognition method and device |
CN106776557B (en) * | 2016-12-13 | 2020-09-08 | 竹间智能科技(上海)有限公司 | Emotional state memory identification method and device of emotional robot |
CN106776557A (en) * | 2016-12-13 | 2017-05-31 | 竹间智能科技(上海)有限公司 | Affective state memory recognition methods and the device of emotional robot |
CN107180077A (en) * | 2017-04-18 | 2017-09-19 | 北京交通大学 | A kind of social networks rumour detection method based on deep learning |
CN107451923A (en) * | 2017-07-14 | 2017-12-08 | 北京航空航天大学 | A kind of online social networks rumour Forecasting Methodology based on forwarding Analytic Network Process |
CN107562814A (en) * | 2017-08-14 | 2018-01-09 | 中国农业大学 | A kind of earthquake emergency and the condition of a disaster acquisition of information sorting technique and system |
US11630957B2 (en) | 2017-09-04 | 2023-04-18 | Huawei Technologies Co., Ltd. | Natural language processing method and apparatus |
CN108038240A (en) * | 2017-12-26 | 2018-05-15 | 武汉大学 | Based on content, the social networks rumour detection method of user's multiplicity |
CN108228853A (en) * | 2018-01-11 | 2018-06-29 | 北京信息科技大学 | A kind of microblogging rumour recognition methods and system |
CN108491480B (en) * | 2018-03-12 | 2021-05-11 | 义语智能科技(上海)有限公司 | Rumor detection method and apparatus |
CN108491480A (en) * | 2018-03-12 | 2018-09-04 | 义语智能科技(上海)有限公司 | Rumour detection method and equipment |
CN108563686B (en) * | 2018-03-14 | 2021-07-30 | 中国科学院自动化研究所 | Social network rumor identification method and system based on hybrid neural network |
CN108563686A (en) * | 2018-03-14 | 2018-09-21 | 中国科学院自动化研究所 | Social networks rumour recognition methods based on hybrid neural networks and system |
WO2019196259A1 (en) * | 2018-04-09 | 2019-10-17 | 平安科技(深圳)有限公司 | Method for identifying false message and device thereof |
CN109558483B (en) * | 2018-10-16 | 2021-06-18 | 北京航空航天大学 | Rumor recognition method based on naive Bayes model |
CN109558483A (en) * | 2018-10-16 | 2019-04-02 | 北京航空航天大学 | A kind of rumour recognition methods based on model-naive Bayesian |
CN110807556B (en) * | 2019-11-05 | 2022-05-31 | 重庆邮电大学 | Method and device for predicting propagation trend of microblog rumors or/and dagger topics |
CN110807556A (en) * | 2019-11-05 | 2020-02-18 | 重庆邮电大学 | Method and device for predicting propagation trend of microblog rumors or/and dagger rumors |
CN111079444B (en) * | 2019-12-25 | 2020-09-29 | 北京中科研究院 | Network rumor detection method based on multi-modal relationship |
CN111079444A (en) * | 2019-12-25 | 2020-04-28 | 北京中科研究院 | Network rumor detection method based on multi-modal relationship |
CN111259658A (en) * | 2020-02-05 | 2020-06-09 | 中国科学院计算技术研究所 | General text classification method and system based on category dense vector representation |
CN111553167A (en) * | 2020-04-28 | 2020-08-18 | 腾讯科技(深圳)有限公司 | Text type identification method and device and storage medium |
CN111966919A (en) * | 2020-07-13 | 2020-11-20 | 江汉大学 | Event message processing method, device and equipment |
CN112035669A (en) * | 2020-09-09 | 2020-12-04 | 中国科学技术大学 | Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling |
CN112199574A (en) * | 2020-09-23 | 2021-01-08 | 夏一雪 | Network public opinion artificial intelligence early warning system under big data environment |
CN112199468A (en) * | 2020-09-23 | 2021-01-08 | 夏一雪 | Network public opinion artificial intelligence decision-making system under big data environment |
CN112560495A (en) * | 2020-12-09 | 2021-03-26 | 新疆师范大学 | Microblog rumor detection method based on emotion analysis |
CN112560495B (en) * | 2020-12-09 | 2024-03-15 | 新疆师范大学 | Microblog rumor detection method based on emotion analysis |
CN113177164A (en) * | 2021-05-13 | 2021-07-27 | 聂佼颖 | Multi-platform collaborative new media content monitoring and management system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105045857A (en) | Social network rumor recognition method and system | |
Jiang et al. | Sentiment computing for the news event based on the social media big data | |
CN106354872B (en) | Text clustering method and system | |
CN106940732A (en) | A kind of doubtful waterborne troops towards microblogging finds method | |
CN104615608B (en) | A kind of data mining processing system and method | |
CN104331451B (en) | A kind of recommendation degree methods of marking of network user's comment based on theme | |
Aragón et al. | Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish. | |
CN110390018A (en) | A kind of social networks comment generation method based on LSTM | |
El-Halees | Mining opinions in user-generated contents to improve course evaluation | |
CN107239512B (en) | A kind of microblogging comment spam recognition methods of combination comment relational network figure | |
CN104331506A (en) | Multiclass emotion analyzing method and system facing bilingual microblog text | |
CN103793503A (en) | Opinion mining and classification method based on web texts | |
CN106354845A (en) | Microblog rumor recognizing method and system based on propagation structures | |
CN103761239A (en) | Method for performing emotional tendency classification to microblog by using emoticons | |
CN101599071A (en) | The extraction method of conversation text topic | |
CN104899335A (en) | Method for performing sentiment classification on network public sentiment of information | |
CN105183717A (en) | OSN user emotion analysis method based on random forest and user relationship | |
CN106126502A (en) | A kind of emotional semantic classification system and method based on support vector machine | |
CN105893484A (en) | Microblog Spammer recognition method based on text characteristics and behavior characteristics | |
Mehra et al. | Sentimental analysis using fuzzy and naive bayes | |
CN105740382A (en) | Aspect classification method for short comment texts | |
Alawneh et al. | Sentiment analysis-based sexual harassment detection using machine learning techniques | |
CN107305545A (en) | A kind of recognition methods of the network opinion leader based on text tendency analysis | |
CN104915399A (en) | Recommended data processing method based on news headline and recommended data processing method system based on news headline | |
CN106569999A (en) | Multi-granularity short text semantic similarity comparison method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20151111 |