CN105045857A - Social network rumor recognition method and system - Google Patents


Info

Publication number
CN105045857A
Authority
CN
China
Legal status
Pending
Application number
CN201510401458.2A
Other languages
Chinese (zh)
Inventor
熊锦华
张巧
程学旗
张水源
许洪波
余智华
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS
Priority to CN201510401458.2A
Publication of CN105045857A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a social network rumor recognition method and system. The method comprises: obtaining a microblog instance together with its microblog information and user information; extracting, from the microblog information and the user information, content features of the microblog instance, including shallow text features and deep latent microblog features; extracting basic attribute features and deep latent user features from the user information, and extracting popularity features from the microblog information, the popularity features including volatility and difference features based on popularity and popularity trend, and retweet features; and constructing a feature vector from the shallow text features, the deep latent microblog features, the basic attribute features, the deep latent user features and the popularity features, training a classifier, inputting the feature vector into the classifier and outputting the result.

Description

Social network rumor recognition method and system
Technical field
The present invention relates to the field of social network analysis, and in particular to a social network rumor recognition method and system.
Background
With the popularity and spread of social networks, the volume of information they carry is growing explosively, but its quality has not improved accordingly. Junk information of all kinds, especially deceptive information such as rumors, floods social networks, and the propagation and diffusion of rumors cause great harm and negative effects on people's lives and on the development of society.
Identifying rumor messages in social networks promptly and accurately not only helps build a healthy Internet environment, helps people judge whether information is true, and stops the serious harm caused by malicious rumors in time, but also plays a positive role in public opinion monitoring and information guidance.
Existing rumor recognition methods fall into two main classes. The first class is manual: announced messages are judged through manual reporting. Such methods cannot contain the propagation and diffusion of a rumor at its early stage, respond slowly, and require substantial labor and financial resources, so their cost is high. The second class is based on machine learning: deciding whether a microblog is a rumor is treated as a classification problem, and a classification learning algorithm is applied to various features of the microblog. The classification features used so far fall into three categories: the microblog content, the publisher, and the microblog's propagation. For content features, current methods mainly use shallow text features (for example, whether the content contains a link or a picture, or mentions other users) and do not analyze the text more deeply to mine latent features such as its semantics, topic and sentiment. For the publisher, they mainly select static features such as follower count and friend count, without considering the publisher's credibility and influence. For propagation features, related work mainly studies propagation models of microblog rumors, building a retweet graph rooted at the rumor to simulate its spread, or is limited to a few simple retweet attributes, without further analyzing other characteristics of a rumor during propagation. The features selected in these studies discriminate poorly and have limitations, which leads to poor final recognition performance. In summary, existing methods lack an automatic approach that can identify microblog rumors accurately.
Summary of the invention
In view of the defects of the prior art and the distinctive characteristics of microblog rumors, the object of the invention is to use features from three aspects, namely the microblog content, the publishing user and the microblog's popularity, together with classification methods from machine learning, to identify microblog rumors automatically and to effectively improve the accuracy and recall of microblog rumor recognition. To this end, the invention proposes a social network rumor recognition method and system.
The invention provides a social network rumor recognition method, comprising:
Step 1: obtaining a microblog instance, and obtaining the microblog information and user information of the microblog instance; according to the microblog information and the user information, extracting content features of the microblog instance, the content features comprising shallow text features and deep latent microblog features;
Step 2: according to the user information, extracting basic attribute features and deep latent user features of the user, and according to the microblog information, extracting popularity features of the microblog, the popularity features comprising volatility and difference features based on popularity and popularity trend, and retweet features;
Step 3: constructing a feature vector from the shallow text features, the deep latent microblog features, the basic attribute features, the deep latent user features and the popularity features; training a classifier; and inputting the feature vector into the classifier and outputting the result, thereby identifying social network rumors.
In the social network rumor recognition method, the deep latent microblog features comprise a hot-topic tendency feature, an internal-external consistency feature, a sentiment polarity feature and a comment opinion tendency feature.
In the social network rumor recognition method, the deep latent user features comprise a social influence feature, a view-retweet feature and a microblog matching-degree feature.
In the social network rumor recognition method, extracting the hot-topic tendency feature and the internal-external consistency feature comprises: performing word segmentation and part-of-speech tagging on the microblog text; extracting meaningful nouns and verbs as candidate keywords; weighting the candidates by TF-IDF, as used in text feature extraction, for ranking; and taking the K highest-weighted words as the keywords of the microblog text.
In the social network rumor recognition method, extracting the sentiment polarity feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, and extracting keywords with a sentiment dictionary, an emoticon dictionary, a punctuation dictionary and a sensitive-word dictionary, that is, extracting the content words in the microblog and the words that match an entry in these dictionaries.
The present invention also proposes a social network rumor recognition system, comprising:
A microblog content feature extraction module, for obtaining a microblog instance, obtaining the microblog information and user information of the microblog instance, and, according to the microblog information and the user information, extracting content features of the microblog instance, the content features comprising shallow text features and deep latent microblog features;
A microblog popularity feature extraction module, for extracting, according to the user information, basic attribute features and deep latent user features of the user, and extracting, according to the microblog information, popularity features of the microblog, the popularity features comprising volatility and difference features based on popularity and popularity trend, and retweet features;
A rumor identification module, for constructing a feature vector from the shallow text features, the deep latent microblog features, the basic attribute features, the deep latent user features and the popularity features, training a classifier, inputting the feature vector into the classifier and outputting the result, thereby identifying social network rumors.
In the social network rumor recognition system, the deep latent microblog features comprise a hot-topic tendency feature, an internal-external consistency feature, a sentiment polarity feature and a comment opinion tendency feature.
In the social network rumor recognition system, the deep latent user features comprise a social influence feature, a view-retweet feature and a microblog matching-degree feature.
In the social network rumor recognition system, extracting the hot-topic tendency feature and the internal-external consistency feature comprises: performing word segmentation and part-of-speech tagging on the microblog text; extracting meaningful nouns and verbs as candidate keywords; weighting the candidates by TF-IDF, as used in text feature extraction, for ranking; and taking the K highest-weighted words as the keywords of the microblog text.
In the social network rumor recognition system, extracting the sentiment polarity feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, and extracting keywords with a sentiment dictionary, an emoticon dictionary, a punctuation dictionary and a sensitive-word dictionary, that is, extracting the content words in the microblog and the words that match an entry in these dictionaries.
From the above scheme, the invention has the following advantages:
Targeting the distinctive characteristics of microblog rumors, the invention introduces deep latent features of the microblog content and of the publishing user, which effectively distinguish rumor microblogs from ordinary microblogs during recognition. It also incorporates popularity and popularity-trend features that change as a microblog propagates, which significantly improves the accuracy and recall of rumor recognition in the classification process.
Brief description of the drawings
Fig. 1 is the overall flowchart of an embodiment of the invention;
Fig. 2 is a flowchart of extracting the hot-topic tendency feature and the internal-external consistency feature of the content in an embodiment of the invention;
Fig. 3 is a flowchart of extracting the sentiment polarity feature of the content in an embodiment of the invention;
Fig. 4 is a flowchart of extracting the comment opinion tendency feature in an embodiment of the invention;
Fig. 5 is a flowchart of extracting the social influence feature, the view-retweet feature and the historical microblog matching-degree feature of the user in an embodiment of the invention;
Fig. 6 is a flowchart of extracting the popularity volatility and difference features in an embodiment of the invention.
Detailed description of embodiments
The present invention is described below with reference to the drawings and specific embodiments, which do not limit the invention.
As shown in Fig. 1, a specific embodiment of the social network rumor recognition method comprises the following steps:
(1) Acquiring the microblog instance. Given the unique identifier of an input microblog, the microblog information and the corresponding user information are obtained. The microblog information comprises the microblog text, the microblog's historical repost-count vector and all comment texts of the microblog; the user information comprises the user's basic attributes (follower count, friend count and mutual-follow count) and the user's historical microblogs from the past month with their repost-count vectors. Step (1) corresponds to steps 101 and 102 in Fig. 1.
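The inputs gathered in step (1) can be held in two small records. The sketch below is a hypothetical layout, not the patent's data model: the class and field names are assumptions chosen to mirror the quantities listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserProfile:
    """Publisher information gathered in step (1); field names are hypothetical."""
    followers: int   # follower count (fol_num)
    friends: int     # number of accounts the user follows (fri_num)
    mutual: int      # mutual-follow count (bi_fol_num)
    history_texts: List[str] = field(default_factory=list)               # posts from the past month
    history_retweet_vectors: List[List[int]] = field(default_factory=list)  # one repost-count vector per post


@dataclass
class MicroblogInstance:
    """One microblog to be classified; field names are hypothetical."""
    weibo_id: str               # unique identifier used to fetch the data
    text: str                   # microblog text
    retweet_vector: List[int]   # cumulative repost counts per time interval
    comments: List[str]         # all comment texts
    user: UserProfile           # the publishing user
```

For example, `MicroblogInstance("123", "...", [0, 5, 40], ["..."], UserProfile(1200, 300, 150))` bundles everything the later feature extractors read.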
(2) Extracting microblog content features, comprising the shallow text features of the content and its deep latent features (the hot-topic tendency feature, the internal-external consistency feature, the sentiment polarity feature and the comment opinion tendency feature).
Step (2) corresponds to step 103 in Fig. 1.
In step 103, the shallow text features of the microblog content are extracted first, including: the interval between the microblog's publication time and the account's registration time, whether the microblog text contains external links, whether it contains pictures, audio or video, and whether it mentions other users. Then the hot-topic tendency feature and the internal-external consistency feature are extracted using the hot-topic dictionary 104 and the external page 105 linked from the microblog text, as detailed in Fig. 2; and the sentiment polarity feature and the comment opinion tendency feature are extracted using dictionaries 106 (sentiment, symbol and others), as detailed in Fig. 3 and Fig. 4 respectively.
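The shallow text features of step 103 can be sketched as follows. The patent does not say how links or mentions are detected, so the URL and `@name` regular expressions are assumptions, and the media flags are assumed to come from the post's metadata rather than from the text.

```python
import re
from datetime import datetime

# Assumed surface patterns; the patent does not specify how links or mentions are detected.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")  # \w also matches CJK characters in Python 3 str patterns


def shallow_text_features(text, published_at, registered_at,
                          has_picture=False, has_audio=False, has_video=False):
    """Return the shallow content features listed in step 103 as a dict.

    The media flags are assumed to be read from the post's metadata, not from the text.
    """
    hours = (published_at - registered_at).total_seconds() / 3600.0
    return {
        "hours_since_registration": hours,
        "has_url": int(bool(URL_RE.search(text))),
        "has_picture": int(has_picture),
        "has_audio": int(has_audio),
        "has_video": int(has_video),
        "has_mention": int(bool(MENTION_RE.search(text))),
    }
```

Encoding the binary features as 0/1 integers lets them be concatenated directly into the numeric feature vector of step (5).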
To extract the hot-topic tendency feature and the internal-external consistency feature, a tool is first used to perform word segmentation and part-of-speech tagging on the microblog text. Only meaningful nouns and verbs are extracted as candidate keywords, and TF-IDF (term frequency-inverse document frequency), as used in text feature extraction, serves as the ranking weight. After sorting, the K highest-weighted words are taken as the keywords of the microblog text. This process is shown in steps 201 and 202.
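Steps 201 and 202 can be sketched with a plain TF-IDF ranking. This assumes the text is already segmented and tagged with an ICTCLAS/jieba-style tag set (noun tags start with `n`, verb tags with `v`) and uses the common `tf * log(N / df)` variant; the patent does not fix the exact TF-IDF formula or the tagger.

```python
import math
from collections import Counter

# Assumed ICTCLAS/jieba-style tag set: noun tags start with "n", verb tags with "v".
CONTENT_POS_PREFIXES = ("n", "v")


def candidate_words(tagged_doc):
    """Keep nouns and verbs from a segmented, POS-tagged text given as (word, pos) pairs."""
    return [w for w, pos in tagged_doc if pos.startswith(CONTENT_POS_PREFIXES)]


def top_k_keywords(tagged_doc, tagged_corpus, k):
    """Rank the noun/verb candidates of one text by TF-IDF and return the K highest.

    tagged_corpus must be non-empty; it supplies the document frequencies.
    """
    words = candidate_words(tagged_doc)
    if not words:
        return []
    tf = Counter(words)
    vocabularies = [{w for w, _ in doc} for doc in tagged_corpus]
    n_docs = len(vocabularies)
    scores = {}
    for w, count in tf.items():
        df = sum(1 for vocab in vocabularies if w in vocab)
        # log(N / df); df is clamped to 1 in case the text itself is not in the corpus
        scores[w] = (count / len(words)) * math.log(n_docs / max(df, 1))
    # highest weight first; ties broken by the word itself so the output is deterministic
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```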
In step 203, the hot-topic dictionary 204 is used to extract the hot-topic tendency feature of the content.
Let $W=\{w_1, w_2, \ldots, w_n\}$ be the keyword set extracted from microblog T, where $w_i$ denotes one keyword, and let $HotTopicWordBase=\{T_1, T_2, \ldots, T_m\}$ be the hot-topic dictionary organized by topic, where $T_i$ is the word set of one hot topic. The hot-topic tendency of the microblog content is computed as follows:
$$hot\_feature(W) = \max\big(simi(W,T_1),\ simi(W,T_2),\ \ldots,\ simi(W,T_m)\big)$$
In the above formula, $simi(W, T_i)$ denotes the Jaccard similarity between the keyword set W of microblog T and the word set $T_i$ of one hot topic.
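A minimal sketch of $hot\_feature$, assuming the hot-topic dictionary is represented as a mapping from a (hypothetical) topic name to its word set. Returning 0.0 for an empty dictionary is an assumption the patent does not address.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two word collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hot_feature(keywords, hot_topic_dictionary):
    """max_i simi(W, T_i) over the word sets of the hot-topic dictionary.

    hot_topic_dictionary maps a topic name to its word set (assumed layout).
    """
    return max((jaccard(keywords, words) for words in hot_topic_dictionary.values()),
               default=0.0)
```

For instance, keywords {地震, 北京, 伤亡} against a topic {地震, 伤亡, 救援} share 2 of 4 distinct words, giving 0.5.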
In step 205, the external page 206 linked from the microblog text is used to extract the internal-external consistency feature of the content. The external page is described by its page title (title), page description (description) and page keywords (keywords), and the feature is computed as follows:
$$con\_feature(T, url\_page) = \begin{cases} 0, & \text{if } T \text{ contains no URL} \\ \max\big(Rel(T,title),\ Rel(T,description),\ Rel(T,keywords)\big), & \text{if } T \text{ contains a URL} \end{cases}$$
In the above formula, Rel(T, title), Rel(T, description) and Rel(T, keywords) denote the relevance of the microblog text to the external page's title, description and keywords respectively, each measured by Jaccard similarity.
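The piecewise $con\_feature$ can be sketched as below, assuming the linked page has already been fetched and its title, description and keywords segmented into word lists (the dict layout is hypothetical). Passing `None` stands for "T contains no URL".

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two word collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def con_feature(text_words, page=None):
    """Internal-external consistency of a microblog and its linked page.

    page is None when the microblog contains no URL; otherwise it is assumed to be a
    dict with pre-segmented "title", "description" and "keywords" word lists.
    """
    if page is None:
        return 0.0
    return max(jaccard(text_words, page.get(part, ()))
               for part in ("title", "description", "keywords"))
```

Note that this sketch cannot distinguish "no URL" from "URL whose page could not be fetched"; a real pipeline would need to decide how to treat the latter.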
To extract the sentiment polarity feature, a tool first performs word segmentation and part-of-speech tagging on the microblog text (301). Keywords are then extracted with the sentiment dictionary, emoticon dictionary, punctuation dictionary and sensitive-word dictionary: the content words (verbs, nouns and adjectives) in the microblog and the words that match a dictionary entry are extracted, as shown in step 302.
An improved TF-IDF scheme computes the weight of each extracted term (304), as follows:
$$weight_k = \frac{(\log(f_k)+1.0)\times\log_2(level_k+1)}{\sum_{j=1}^{l}\big[(\log(f_j)+1.0)\times\log_2(level_j+1)\big]^2}$$
In the above formula, $weight_k$ denotes the weight of the k-th term in the microblog, $f_k$ the frequency of that term in the microblog, l the number of terms in the microblog, and $level_k$ the level of the term. Term levels are set as shown in the following table:
In step 305, the term weights obtained above are used to build the feature vector of the microblog text, which is input to a sentiment polarity classifier to obtain the microblog's sentiment polarity feature. The present invention uses a support vector machine (SVM) and divides sentiment polarity into three types: positive, negative and neutral.
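Steps 304 and 305 up to the SVM input can be sketched as follows. The normalizer follows the formula exactly as written above (sum of squared raw scores); the natural logarithm for $\log(f_k)$ is an assumption, and since the term-level table is not reproduced here, `default_level=1` is a hypothetical placeholder. The SVM itself is not shown.

```python
import math


def term_weights(term_freqs, term_levels):
    """Improved TF-IDF weights (step 304) for the l terms extracted from one microblog.

    term_freqs[k] is f_k (an in-text frequency >= 1) and term_levels[k] is level_k.
    The normalizer is the sum of squared raw scores, as in the formula above.
    """
    raw = [(math.log(f) + 1.0) * math.log2(level + 1)
           for f, level in zip(term_freqs, term_levels)]
    norm = sum(r * r for r in raw)
    return [r / norm if norm else 0.0 for r in raw]


def text_vector(term_counts, term_levels, vocabulary, default_level=1):
    """Place a microblog's term weights into a fixed-vocabulary vector (step 305).

    default_level is a hypothetical level for terms missing from term_levels.
    """
    terms = [t for t in term_counts if t in vocabulary]
    weights = term_weights([term_counts[t] for t in terms],
                           [term_levels.get(t, default_level) for t in terms])
    index = {t: i for i, t in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for t, w in zip(terms, weights):
        vector[index[t]] = w
    return vector
```

The resulting vector is what a three-class (positive, negative, neutral) sentiment classifier would consume; the same weighting is reused for comments in step 405.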
In step 401, a tool is first used to perform word segmentation and part-of-speech tagging on the microblog text.
In step 402, an opinion polarity dictionary is used in addition to the dictionaries of the sentiment polarity extraction, to extract the comment opinion tendency feature. Based on the segmentation and tagging results and on the sentiment, emoticon, punctuation, sensitive-word and opinion polarity dictionaries, the content words (verbs, nouns and adjectives) in the microblog and the words that match a dictionary entry are extracted, and the keywords are supplemented with the 2-grams, 3-grams and 4-grams of the microblog text.
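The n-gram supplement of step 402 can be sketched as below. Whether the patent's n-grams are over characters or over segmented words is not stated; the sketch works on any token sequence and joins tokens without a separator, which suits character n-grams of Chinese text (an assumption).

```python
def ngrams(tokens, n):
    """Contiguous n-grams of a token sequence, joined without a separator."""
    return ["".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def comment_terms(matched_words, tokens, sizes=(2, 3, 4)):
    """Dictionary-matched content words plus the 2-, 3- and 4-grams of the text."""
    terms = list(matched_words)
    for n in sizes:
        terms.extend(ngrams(tokens, n))
    return terms
```

For example, `comment_terms(["谣言"], list("不信谣"))` yields the matched word followed by 不信, 信谣 and 不信谣; a three-character text has no 4-gram.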
In step 405, the term weights of the extracted words are computed with the improved TF-IDF scheme, as in step 304.
The term weights obtained above are used to build a feature vector for each comment on the microblog, which is input to an opinion polarity classifier to obtain the opinion polarity of each comment, as shown in step 406. In the present invention, comment opinion polarity is divided into three classes: support, oppose and other.
In step 407, the numbers of comments on the microblog whose opinion polarity is support and oppose are counted, and the comment opinion tendency feature is computed as follows:
$$view\_senti\_feature = \log\left(\frac{N_{pos}}{N_{neg}}\right)$$
where $N_{pos}$ is the number of the microblog's comments whose opinion polarity is support, and $N_{neg}$ is the number of comments whose opinion polarity is oppose.
(3) Extracting features of the publishing user, comprising the user's basic attribute features and deep latent features (the social influence feature, the view-retweet feature and the microblog matching-degree feature).
Step (3) corresponds to step 107 in Fig. 1.
In step 107, the basic attribute features of the publishing user are first extracted from the user's basic attribute information, including: the user's verification type, gender, whether personal information is provided, the number of microblogs published, follower count, friend count, and so on. Then the user's deep latent features (the social influence feature, the view-retweet feature and the microblog matching-degree feature) are extracted by combining the user's basic attributes with the user's historical microblog data, as detailed in Fig. 5.
In step 501, the user's information is obtained first, including the user's basic attribute information and the historical microblogs the user published in the past month.
In step 502, the user's social influence feature is computed from the user's follower count, friend count and mutual-follow count, as follows:
$$social\_inf\_feature = \log\left(\frac{fol\_num - bi\_fol\_num}{fri\_num + 1}\right)$$
where fol_num is the user's follower count, fri_num is the user's friend count (i.e., the number of users this user follows), and bi_fol_num is the number of users who follow each other with this user.
In step 503, the user's view-retweet feature is computed from the user's historical microblogs, as follows:
$$view\_retweet\_feature = \frac{retweets\_num}{statuses\_num}$$
In the above formula, statuses_num is the number of microblogs published by the user, and retweets_num is the total number of times the user's microblogs were reposted.
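Steps 502 and 503 reduce to two small formulas. The sketch follows them literally; the handling of the undefined cases (all followers mutual, or a user with no posts) is an assumption, since the patent does not address them.

```python
import math


def social_inf_feature(fol_num, fri_num, bi_fol_num):
    """log((fol_num - bi_fol_num) / (fri_num + 1)), as in step 502."""
    one_way_followers = fol_num - bi_fol_num
    if one_way_followers <= 0:
        # The logarithm is undefined here; the patent does not say how this case is handled.
        raise ValueError("fol_num must exceed bi_fol_num for the logarithm to be defined")
    return math.log(one_way_followers / (fri_num + 1))


def view_retweet_feature(retweets_num, statuses_num):
    """Average number of reposts per microblog published by the user (step 503).

    Returning 0.0 for a user with no posts is an assumption.
    """
    return retweets_num / statuses_num if statuses_num else 0.0
```

Subtracting mutual follows from the follower count keeps only one-way followers, so the ratio grows for accounts that many people follow without reciprocation.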
In step 504, the user's microblog matching-degree feature is computed from the user's historical microblogs. A topic model is first used to obtain the topic distributions of the user's historical microblogs and of the current microblog, denoted $his\_topic=\{his\_p_1, his\_p_2, \ldots, his\_p_k\}$ and $now\_topic=\{now\_p_1, now\_p_2, \ldots, now\_p_k\}$, where k is the specified number of topics. The user's matching-degree feature is the cosine similarity between the topic distribution of the historical microblogs and that of the current microblog:
$$weibo\_match\_feature = cos\_simi(his\_topic, now\_topic) = \frac{his\_topic \cdot now\_topic}{|his\_topic| \times |now\_topic|}$$
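Given the two k-dimensional topic distributions (produced by a topic model such as LDA, which is assumed here since the patent only says "topic model"), the matching degree is a plain cosine similarity. Returning 0.0 for an all-zero vector is an assumption.

```python
import math


def weibo_match_feature(his_topic, now_topic):
    """Cosine similarity between two k-dimensional topic distributions."""
    if len(his_topic) != len(now_topic):
        raise ValueError("topic vectors must have the same number of topics k")
    dot = sum(h * n for h, n in zip(his_topic, now_topic))
    norm = (math.sqrt(sum(h * h for h in his_topic))
            * math.sqrt(sum(n * n for n in now_topic)))
    return dot / norm if norm else 0.0
```

A value near 1 means the current post is on the user's usual topics; a value near 0 flags an unusual post for this user.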
(4) Extracting microblog popularity features, comprising the volatility and difference features based on popularity and popularity trend, and simple retweet features.
Step (4) corresponds to step 108 in Fig. 1; the detailed process is shown in Fig. 6.
In step 601, the historical repost-count vector of the current microblog is first obtained using the microblog's unique identifier.
Step 602 obtains the popularity volatility feature. A period of time after microblog T is published is first divided into n equally spaced time intervals, denoted $Interval=\{I_1, I_2, \ldots, I_n\}$, where $I_k$ is the time elapsed from the microblog's publication to the k-th interval. The repost counts the microblog has obtained at the moments corresponding to the n intervals form a vector:
$$retweets\_vector = \{count_1, count_2, \ldots, count_n\}$$
where $count_i$ is the microblog's current repost count at the i-th time interval. The curve formed by the microblog's repost counts over the time intervals is taken here as an approximate popularity trend curve. The rate of change between two adjacent points of this curve is computed as the slope of the line connecting them:
$$wave\_rate_i = \frac{count_{i+1} - count_i}{\overline{retweet\_count} \times (I_{i+1} - I_i)}$$
In the above formula, $wave\_rate_i$ is the rate of change between two adjacent points of the curve, $count_i$ is the i-th component of the repost vector, and $\overline{retweet\_count}$ is a normalization factor: the average repost count of the user's historical microblogs over the past month, which measures the average popularity of the microblogs this user publishes.
The final popularity volatility feature of the microblog is the maximum rate of change of the approximate popularity trend curve, that is, the maximum rate of change between adjacent points of the repost vector:
$$wave\_feature = \max(wave\_rate_1, wave\_rate_2, \ldots, wave\_rate_{n-1})$$
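The volatility computation of step 602 is sketched below. It assumes the interval times and repost counts are given as aligned lists; rejecting a non-positive user average is an assumption, since the formula is undefined for a user whose past posts were never reposted.

```python
def wave_feature(retweets_vector, intervals, avg_user_retweets):
    """wave_feature = max_i wave_rate_i over the popularity trend curve.

    retweets_vector[i] is count_i, the cumulative repost count at interval I_i;
    intervals[i] is I_i, the time elapsed since publication; avg_user_retweets is
    the user's average repost count over the past month (the normalization factor).
    """
    if len(retweets_vector) != len(intervals) or len(intervals) < 2:
        raise ValueError("need at least two aligned (interval, count) points")
    if avg_user_retweets <= 0:
        # The formula is undefined for users whose past posts were never reposted.
        raise ValueError("the normalization factor must be positive")
    rates = [
        (retweets_vector[i + 1] - retweets_vector[i])
        / (avg_user_retweets * (intervals[i + 1] - intervals[i]))
        for i in range(len(intervals) - 1)
    ]
    return max(rates)
```

With counts `[0, 10, 40, 50]` at intervals `[1, 2, 3, 4]` and a user average of 10, the slopes are 1.0, 3.0 and 1.0, so the feature is 3.0: the sharpest burst, measured relative to the user's usual reach.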
In step 603, in conjunction with the average transfer amount of user's history microblogging in month, calculate the otherness feature of popularity, as follows:
i d f _ f e a t u r e = r e t w e e t _ count T - r e w e e t _ c o u n t r e t w e e t _ c o u n t
where $\mathrm{retweet\_count}_T$ is the retweet count currently available for the microblog, i.e. its current popularity.
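A corresponding sketch for the difference feature follows. It assumes the one-month average is the plain arithmetic mean of the retweet counts of the user's posts from that month (the text says only "average"). `monthly_average` is a hypothetical helper, and `idf_feature` reuses the patent's name for the result. Note that this name is unrelated to the IDF in TF-IDF.

```python
from typing import Sequence


def monthly_average(history_retweets: Sequence[float]) -> float:
    """Hypothetical helper: mean retweet count of the user's posts from the past month."""
    if not history_retweets:
        raise ValueError("no historical posts in the window")
    return sum(history_retweets) / len(history_retweets)


def idf_feature(current_retweets: float, avg_retweets: float) -> float:
    """Popularity difference feature: relative deviation from the user's average."""
    if avg_retweets <= 0:
        # Assumption: the source does not define the feature for a zero average.
        raise ValueError("normalization factor must be positive")
    return (current_retweets - avg_retweets) / avg_retweets


if __name__ == "__main__":
    avg = monthly_average([2, 4, 6])      # 4.0
    print(idf_feature(12, avg))           # 2.0, i.e. 200% above this user's average
```

A positive value means the post is doing better than this user's posts usually do, and a negative value means it is doing worse. This is why the feature is normalized per user instead of using raw retweet counts.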
(5) Rumor identification based on classification. A feature vector is constructed from the content features of the microblog, the features of the publishing user and the popularity features of the microblog, corresponding to step 109 in Fig. 1; each dimension of the feature vector is listed in the following table.
The trained classifier is then applied to the microblog content features, publishing-user features and microblog popularity features obtained in steps (2), (3) and (4) above to predict whether the microblog is a rumor, and the corresponding prediction is output; this step corresponds to step 110 in Fig. 1. The classifier can be any common classifier, such as an SVM or a decision tree. To build the classifier, the feature vectors of the training data are first obtained (in the same way as in step 109) and used as its input to tune its parameters. Once parameter tuning is complete, the final trained classifier is applied to rumor prediction.
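The classification stage (steps 109 and 110) can be illustrated with a dependency-free sketch. The patent leaves the classifier open (SVM, decision tree, etc.). Here a decision stump, i.e. a one-level decision tree, stands in so the example runs without external libraries. `build_feature_vector` simply concatenates the three feature groups, which is an assumption about the vector layout. The toy training rows are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Sequence


def build_feature_vector(content: Sequence[float], user: Sequence[float],
                         popularity: Sequence[float]) -> List[float]:
    """Step 109 (sketch): concatenate content, user and popularity features."""
    return [*content, *user, *popularity]


@dataclass
class DecisionStump:
    """One-level decision tree standing in for the classifier of step 110.

    With polarity 1 it predicts 1 (rumor) when x[feature] > threshold, else 0;
    polarity -1 inverts the rule.
    """
    feature: int = 0
    threshold: float = 0.0
    polarity: int = 1

    def predict_one(self, x: Sequence[float]) -> int:
        above = x[self.feature] > self.threshold
        return int(above) if self.polarity == 1 else int(not above)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "DecisionStump":
        """Pick the (feature, threshold, polarity) with the fewest training errors."""
        if not X or len(X) != len(y):
            raise ValueError("need a non-empty training set with one label per row")
        best_errors = len(y) + 1
        for f in range(len(X[0])):
            # Candidate thresholds: every observed value of feature f.
            for t in sorted({row[f] for row in X}):
                for polarity in (1, -1):
                    candidate = DecisionStump(f, t, polarity)
                    errors = sum(candidate.predict_one(row) != label
                                 for row, label in zip(X, y))
                    if errors < best_errors:  # strict: earlier candidates win ties
                        best_errors = errors
                        self.feature, self.threshold, self.polarity = f, t, polarity
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [self.predict_one(row) for row in X]


if __name__ == "__main__":
    # Toy rows: [content, user, wave_feature, idf_feature]; label 1 = rumor.
    X = [
        build_feature_vector([0.1], [0.5], [0.5, 0.1]),
        build_feature_vector([0.3], [0.2], [0.4, 0.0]),
        build_feature_vector([0.3], [0.5], [3.0, 2.0]),
        build_feature_vector([0.1], [0.2], [2.5, 1.5]),
    ]
    y = [0, 0, 1, 1]
    clf = DecisionStump().fit(X, y)
    print(clf)                                  # DecisionStump(feature=2, threshold=0.5, polarity=1)
    print(clf.predict([[0.2, 0.4, 2.8, 1.0]]))  # [1]
```

In practice a library classifier such as scikit-learn's `SVC` or `DecisionTreeClassifier` would replace the stump. The workflow stays the same: fit on labeled training vectors, then call `predict` on new ones.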
The present invention proposes a microblog rumor identification method and system based on multi-feature classification. Drawing on characteristics specific to microblog rumors, the system introduces new implicit features of the content and of the publishing user that have not been proposed in prior work. It also innovatively incorporates the popularity features of the microblog and combines all of these with classification learning methods from machine learning to identify microblog rumors accurately and automatically.
Of course, the present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the claims appended to the present invention.

Claims (10)

1. A social network rumor identification method, characterized by comprising:
Step 1: acquiring a microblog information instance, obtaining the microblog information and user information of the microblog information instance, and extracting, according to the microblog information and the user information, microblog content features of the microblog information instance, the microblog content features comprising shallow text features and deep hidden microblog features;
Step 2: extracting, according to the user information, basic attribute features and deep hidden user features of the user, and extracting, according to the microblog information, microblog popularity features of the microblog, the microblog popularity features comprising fluctuation features and difference features based on popularity and popularity trend, as well as retweet features;
Step 3: constructing a feature vector from the shallow text features, the deep hidden microblog features, the basic attribute features, the deep hidden user features and the microblog popularity features, training a classifier, and inputting the feature vector into the classifier and outputting the result, thereby identifying social network rumors.
2. The social network rumor identification method according to claim 1, characterized in that the deep hidden microblog features comprise a focus-tendency feature, an internal-external consistency feature, a sentiment polarity feature and an opinion-tendency feature of comments.
3. The social network rumor identification method according to claim 1, characterized in that the deep hidden user features comprise a social feature, an opinion-retweet feature and a microblog matching-degree feature.
4. The social network rumor identification method according to claim 2, characterized in that the step of extracting the focus-tendency feature and the internal-external consistency feature comprises: performing word segmentation and part-of-speech tagging on the microblog text; extracting meaningful nouns and verbs as keywords; using TF-IDF, as used in text feature extraction, as the weight for ranking the keywords; and taking the K words with the highest weights as the keywords of the microblog text.
5. The social network rumor identification method according to claim 2, characterized in that the step of extracting the sentiment polarity feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, and extracting keywords by means of a sentiment dictionary, an emoticon dictionary, a punctuation dictionary and a sensitive-word dictionary, i.e. extracting the content words in the microblog and the words that match entries in the dictionaries.
6. A social network rumor identification system, characterized by comprising:
a microblog content feature extraction module, configured to acquire a microblog information instance, obtain the microblog information and user information of the microblog information instance, and extract, according to the microblog information and the user information, microblog content features of the microblog information instance, the microblog content features comprising shallow text features and deep hidden microblog features;
a microblog popularity feature extraction module, configured to extract, according to the user information, basic attribute features and deep hidden user features of the user, and to extract, according to the microblog information, microblog popularity features of the microblog, the microblog popularity features comprising fluctuation features and difference features based on popularity and popularity trend, as well as retweet features;
a rumor identification module, configured to construct a feature vector from the shallow text features, the deep hidden microblog features, the basic attribute features, the deep hidden user features and the microblog popularity features, train a classifier, and input the feature vector into the classifier and output the result, thereby identifying social network rumors.
7. The social network rumor identification system according to claim 6, characterized in that the deep hidden microblog features comprise a focus-tendency feature, an internal-external consistency feature, a sentiment polarity feature and an opinion-tendency feature of comments.
8. The social network rumor identification system according to claim 6, characterized in that the deep hidden user features comprise a social feature, an opinion-retweet feature and a microblog matching-degree feature.
9. The social network rumor identification system according to claim 7, characterized in that the step of extracting the focus-tendency feature and the internal-external consistency feature comprises: performing word segmentation and part-of-speech tagging on the microblog text; extracting meaningful nouns and verbs as keywords; using TF-IDF, as used in text feature extraction, as the weight for ranking the keywords; and taking the K words with the highest weights as the keywords of the microblog text.
10. The social network rumor identification system according to claim 7, characterized in that the step of extracting the sentiment polarity feature comprises: performing word segmentation and part-of-speech tagging on the microblog text, and extracting keywords by means of a sentiment dictionary, an emoticon dictionary, a punctuation dictionary and a sensitive-word dictionary, i.e. extracting the content words in the microblog and the words that match entries in the dictionaries.
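The text-processing steps of claims 4 and 5 (repeated in claims 9 and 10) can be sketched as below. This sketch assumes that word segmentation and part-of-speech tagging have already been done upstream by a Chinese segmenter, so the functions take `(word, tag)` pairs. Several other details are also assumptions rather than taken from the claims: the tag prefixes (`n` for nouns, `v` for verbs, `a` for adjectives, following the ICTCLAS/jieba convention), the treatment of nouns, verbs and adjectives as "content words", the smoothed IDF formula, and the toy lexicons. The claims do not fix a TF-IDF variant.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# A segmented, POS-tagged microblog: (word, tag) pairs from an upstream segmenter.
Tagged = Sequence[Tuple[str, str]]

# Assumed ICTCLAS/jieba-style tags: nouns start with "n", verbs "v", adjectives "a".
KEYWORD_TAGS = ("n", "v")
CONTENT_WORD_TAGS = ("n", "v", "a")


def candidate_keywords(doc: Tagged) -> List[str]:
    """Claim 4: keep nouns and verbs as keyword candidates."""
    return [w for w, tag in doc if tag.startswith(KEYWORD_TAGS)]


def top_k_keywords(doc: Tagged, corpus: Sequence[Tagged], k: int) -> List[str]:
    """Rank candidates by TF-IDF and return the K highest-weighted words.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, as one common variant.
    """
    words = candidate_keywords(doc)
    if not words:
        return []
    tf = Counter(words)
    df: Counter = Counter()
    for other in corpus:
        df.update(set(candidate_keywords(other)))
    n_docs = len(corpus)

    def weight(w: str) -> float:
        idf = math.log((1 + n_docs) / (1 + df[w])) + 1.0
        return tf[w] / len(words) * idf

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(tf, key=lambda w: (-weight(w), w))[:k]


def dictionary_matches(doc: Tagged, lexicons: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Claim 5: content words plus the words found in each lexicon
    (e.g. sentiment, emoticon, punctuation and sensitive-word dictionaries)."""
    words = [w for w, _ in doc]
    matches = {name: [w for w in words if w in lex] for name, lex in lexicons.items()}
    matches["content_words"] = [w for w, tag in doc if tag.startswith(CONTENT_WORD_TAGS)]
    return matches


if __name__ == "__main__":
    d1 = [("officials", "n"), ("deny", "v"), ("rumor", "n"), ("!", "x")]
    d2 = [("rumor", "n"), ("spreads", "v"), ("fast", "a")]
    d3 = [("weather", "n"), ("is", "v"), ("nice", "a")]
    print(top_k_keywords(d1, [d1, d2, d3], k=2))  # ['deny', 'officials']
    print(dictionary_matches(d1, {"punctuation": {"!"}, "sentiment": {"deny"}}))
```

In the example, "rumor" also appears in a second document, so its IDF is lower and it drops out of the top two keywords. Turning the dictionary matches into an actual polarity score is not specified in the claims and is left out here.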

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510401458.2A CN105045857A (en) 2015-07-09 2015-07-09 Social network rumor recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510401458.2A CN105045857A (en) 2015-07-09 2015-07-09 Social network rumor recognition method and system

Publications (1)

Publication Number Publication Date
CN105045857A true CN105045857A (en) 2015-11-11

Family

ID=54452404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510401458.2A Pending CN105045857A (en) 2015-07-09 2015-07-09 Social network rumor recognition method and system

Country Status (1)

Country Link
CN (1) CN105045857A (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787101A (en) * 2016-03-18 2016-07-20 联想(北京)有限公司 Information processing method and electronic equipment
CN105809162A (en) * 2016-03-10 2016-07-27 腾讯科技(深圳)有限公司 Method and device for acquiring WIFI hot pot and picture associated information
CN106096638A (en) * 2016-06-03 2016-11-09 腾讯科技(深圳)有限公司 A kind of data processing method and device
CN106126700A (en) * 2016-07-01 2016-11-16 复旦大学 A kind of analysis method of microblogging gossip propagation
CN106202211A (en) * 2016-06-27 2016-12-07 四川大学 A kind of integrated microblogging rumour recognition methods based on microblogging type
CN106354845A (en) * 2016-08-31 2017-01-25 上海交通大学 Microblog rumor recognizing method and system based on propagation structures
CN106447285A (en) * 2016-09-12 2017-02-22 北京大学 Multidimensional field key knowledge-based recruitment information matching method
CN106528655A (en) * 2016-10-18 2017-03-22 百度在线网络技术(北京)有限公司 Text subject recognition method and device
CN106570162A (en) * 2016-11-04 2017-04-19 北京百度网讯科技有限公司 Canard identification method and device based on artificial intelligence
CN106776557A (en) * 2016-12-13 2017-05-31 竹间智能科技(上海)有限公司 Affective state memory recognition methods and the device of emotional robot
CN107180077A (en) * 2017-04-18 2017-09-19 北京交通大学 A kind of social networks rumour detection method based on deep learning
CN107451923A (en) * 2017-07-14 2017-12-08 北京航空航天大学 A kind of online social networks rumour Forecasting Methodology based on forwarding Analytic Network Process
CN107562814A (en) * 2017-08-14 2018-01-09 中国农业大学 A kind of earthquake emergency and the condition of a disaster acquisition of information sorting technique and system
WO2018014543A1 (en) * 2016-07-20 2018-01-25 平安科技(深圳)有限公司 Information query method, information query device, storage medium, and terminal
CN107741939A (en) * 2016-10-31 2018-02-27 腾讯科技(深圳)有限公司 A kind of recognition methods of info web and device
CN107797998A (en) * 2016-08-29 2018-03-13 腾讯科技(深圳)有限公司 The recognition methods of user-generated content containing rumour and device
CN108038240A (en) * 2017-12-26 2018-05-15 武汉大学 Based on content, the social networks rumour detection method of user's multiplicity
CN108228853A (en) * 2018-01-11 2018-06-29 北京信息科技大学 A kind of microblogging rumour recognition methods and system
CN108491480A (en) * 2018-03-12 2018-09-04 义语智能科技(上海)有限公司 Rumour detection method and equipment
CN108563686A (en) * 2018-03-14 2018-09-21 中国科学院自动化研究所 Social networks rumour recognition methods based on hybrid neural networks and system
CN109558483A (en) * 2018-10-16 2019-04-02 北京航空航天大学 A kind of rumour recognition methods based on model-naive Bayesian
WO2019196259A1 (en) * 2018-04-09 2019-10-17 平安科技(深圳)有限公司 Method for identifying false message and device thereof
CN110807556A (en) * 2019-11-05 2020-02-18 重庆邮电大学 Method and device for predicting propagation trend of microblog rumors or/and dagger rumors
CN111079444A (en) * 2019-12-25 2020-04-28 北京中科研究院 Network rumor detection method based on multi-modal relationship
CN111259658A (en) * 2020-02-05 2020-06-09 中国科学院计算技术研究所 General text classification method and system based on category dense vector representation
CN111553167A (en) * 2020-04-28 2020-08-18 腾讯科技(深圳)有限公司 Text type identification method and device and storage medium
CN111966919A (en) * 2020-07-13 2020-11-20 江汉大学 Event message processing method, device and equipment
CN112035669A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling
CN112199468A (en) * 2020-09-23 2021-01-08 夏一雪 Network public opinion artificial intelligence decision-making system under big data environment
CN112199574A (en) * 2020-09-23 2021-01-08 夏一雪 Network public opinion artificial intelligence early warning system under big data environment
CN112560495A (en) * 2020-12-09 2021-03-26 新疆师范大学 Microblog rumor detection method based on emotion analysis
CN113177164A (en) * 2021-05-13 2021-07-27 聂佼颖 Multi-platform collaborative new media content monitoring and management system based on big data
US11630957B2 (en) 2017-09-04 2023-04-18 Huawei Technologies Co., Ltd. Natural language processing method and apparatus

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102609475A (en) * 2012-01-19 2012-07-25 浙江省公众信息产业有限公司 Method for monitoring content of microblog and monitoring system
CN103077240A (en) * 2013-01-10 2013-05-01 北京工商大学 Microblog water army identifying method based on probabilistic graphical model
CN103631901A (en) * 2013-11-20 2014-03-12 清华大学 Rumor control method based on maximum spanning tree of user-trusted network
CN103902621A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 Method and device for identifying network rumor
US20150067849A1 (en) * 2013-08-29 2015-03-05 International Business Machines Corporation Neutralizing propagation of malicious information

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN102609475A (en) * 2012-01-19 2012-07-25 浙江省公众信息产业有限公司 Method for monitoring content of microblog and monitoring system
CN103902621A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 Method and device for identifying network rumor
CN103077240A (en) * 2013-01-10 2013-05-01 北京工商大学 Microblog water army identifying method based on probabilistic graphical model
US20150067849A1 (en) * 2013-08-29 2015-03-05 International Business Machines Corporation Neutralizing propagation of malicious information
CN103631901A (en) * 2013-11-20 2014-03-12 清华大学 Rumor control method based on maximum spanning tree of user-trusted network

Non-Patent Citations (1)

Title
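贺刚 (He Gang), 吕学强 (Lü Xueqiang), 李卓 (Li Zhuo), 徐丽萍 (Xu Liping): "Research on Microblog Rumor Identification" (微博谣言识别研究), Library and Information Service (《图书情报工作》) *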

Cited By (48)

Publication number Priority date Publication date Assignee Title
CN105809162A (en) * 2016-03-10 2016-07-27 腾讯科技(深圳)有限公司 Method and device for acquiring WIFI hot pot and picture associated information
CN105787101A (en) * 2016-03-18 2016-07-20 联想(北京)有限公司 Information processing method and electronic equipment
CN105787101B (en) * 2016-03-18 2019-06-07 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN106096638A (en) * 2016-06-03 2016-11-09 腾讯科技(深圳)有限公司 A kind of data processing method and device
CN106096638B (en) * 2016-06-03 2018-08-07 腾讯科技(深圳)有限公司 A kind of data processing method and device
CN106202211A (en) * 2016-06-27 2016-12-07 四川大学 A kind of integrated microblogging rumour recognition methods based on microblogging type
CN106126700A (en) * 2016-07-01 2016-11-16 复旦大学 A kind of analysis method of microblogging gossip propagation
CN106126700B (en) * 2016-07-01 2020-05-12 复旦大学 Analysis method for propagation of microblog rumors
WO2018014543A1 (en) * 2016-07-20 2018-01-25 平安科技(深圳)有限公司 Information query method, information query device, storage medium, and terminal
CN107644029A (en) * 2016-07-20 2018-01-30 平安科技(深圳)有限公司 Information query method and information query device
CN107797998A (en) * 2016-08-29 2018-03-13 腾讯科技(深圳)有限公司 The recognition methods of user-generated content containing rumour and device
CN107797998B (en) * 2016-08-29 2021-05-07 腾讯科技(深圳)有限公司 Rumor-containing user generated content identification method and device
CN106354845A (en) * 2016-08-31 2017-01-25 上海交通大学 Microblog rumor recognizing method and system based on propagation structures
CN106447285B (en) * 2016-09-12 2020-06-12 北京大学 Recruitment information matching method based on multi-dimensional domain key knowledge
CN106447285A (en) * 2016-09-12 2017-02-22 北京大学 Multidimensional field key knowledge-based recruitment information matching method
CN106528655A (en) * 2016-10-18 2017-03-22 百度在线网络技术(北京)有限公司 Text subject recognition method and device
CN107741939A (en) * 2016-10-31 2018-02-27 腾讯科技(深圳)有限公司 A kind of recognition methods of info web and device
CN107741939B (en) * 2016-10-31 2020-05-12 腾讯科技(深圳)有限公司 Webpage information identification method and device
CN106570162A (en) * 2016-11-04 2017-04-19 北京百度网讯科技有限公司 Canard identification method and device based on artificial intelligence
CN106570162B (en) * 2016-11-04 2020-07-28 北京百度网讯科技有限公司 Artificial intelligence-based rumor recognition method and device
CN106776557B (en) * 2016-12-13 2020-09-08 竹间智能科技(上海)有限公司 Emotional state memory identification method and device of emotional robot
CN106776557A (en) * 2016-12-13 2017-05-31 竹间智能科技(上海)有限公司 Affective state memory recognition methods and the device of emotional robot
CN107180077A (en) * 2017-04-18 2017-09-19 北京交通大学 A kind of social networks rumour detection method based on deep learning
CN107451923A (en) * 2017-07-14 2017-12-08 北京航空航天大学 A kind of online social networks rumour Forecasting Methodology based on forwarding Analytic Network Process
CN107562814A (en) * 2017-08-14 2018-01-09 中国农业大学 A kind of earthquake emergency and the condition of a disaster acquisition of information sorting technique and system
US11630957B2 (en) 2017-09-04 2023-04-18 Huawei Technologies Co., Ltd. Natural language processing method and apparatus
CN108038240A (en) * 2017-12-26 2018-05-15 武汉大学 Based on content, the social networks rumour detection method of user's multiplicity
CN108228853A (en) * 2018-01-11 2018-06-29 北京信息科技大学 A kind of microblogging rumour recognition methods and system
CN108491480B (en) * 2018-03-12 2021-05-11 义语智能科技(上海)有限公司 Rumor detection method and apparatus
CN108491480A (en) * 2018-03-12 2018-09-04 义语智能科技(上海)有限公司 Rumour detection method and equipment
CN108563686B (en) * 2018-03-14 2021-07-30 中国科学院自动化研究所 Social network rumor identification method and system based on hybrid neural network
CN108563686A (en) * 2018-03-14 2018-09-21 中国科学院自动化研究所 Social networks rumour recognition methods based on hybrid neural networks and system
WO2019196259A1 (en) * 2018-04-09 2019-10-17 平安科技(深圳)有限公司 Method for identifying false message and device thereof
CN109558483B (en) * 2018-10-16 2021-06-18 北京航空航天大学 Rumor recognition method based on naive Bayes model
CN109558483A (en) * 2018-10-16 2019-04-02 北京航空航天大学 A kind of rumour recognition methods based on model-naive Bayesian
CN110807556B (en) * 2019-11-05 2022-05-31 重庆邮电大学 Method and device for predicting propagation trend of microblog rumors or/and dagger topics
CN110807556A (en) * 2019-11-05 2020-02-18 重庆邮电大学 Method and device for predicting propagation trend of microblog rumors or/and dagger rumors
CN111079444B (en) * 2019-12-25 2020-09-29 北京中科研究院 Network rumor detection method based on multi-modal relationship
CN111079444A (en) * 2019-12-25 2020-04-28 北京中科研究院 Network rumor detection method based on multi-modal relationship
CN111259658A (en) * 2020-02-05 2020-06-09 中国科学院计算技术研究所 General text classification method and system based on category dense vector representation
CN111553167A (en) * 2020-04-28 2020-08-18 腾讯科技(深圳)有限公司 Text type identification method and device and storage medium
CN111966919A (en) * 2020-07-13 2020-11-20 江汉大学 Event message processing method, device and equipment
CN112035669A (en) * 2020-09-09 2020-12-04 中国科学技术大学 Social media multi-modal rumor detection method based on propagation heterogeneous graph modeling
CN112199574A (en) * 2020-09-23 2021-01-08 夏一雪 Network public opinion artificial intelligence early warning system under big data environment
CN112199468A (en) * 2020-09-23 2021-01-08 夏一雪 Network public opinion artificial intelligence decision-making system under big data environment
CN112560495A (en) * 2020-12-09 2021-03-26 新疆师范大学 Microblog rumor detection method based on emotion analysis
CN112560495B (en) * 2020-12-09 2024-03-15 新疆师范大学 Microblog rumor detection method based on emotion analysis
CN113177164A (en) * 2021-05-13 2021-07-27 聂佼颖 Multi-platform collaborative new media content monitoring and management system based on big data

Similar Documents

Publication Publication Date Title
CN105045857A (en) Social network rumor recognition method and system
Jiang et al. Sentiment computing for the news event based on the social media big data
CN106354872B (en) Text clustering method and system
CN106940732A (en) A kind of doubtful waterborne troops towards microblogging finds method
CN104615608B (en) A kind of data mining processing system and method
CN104331451B (en) A kind of recommendation degree methods of marking of network user's comment based on theme
Aragón et al. Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish.
CN110390018A (en) A kind of social networks comment generation method based on LSTM
El-Halees Mining opinions in user-generated contents to improve course evaluation
CN107239512B (en) A kind of microblogging comment spam recognition methods of combination comment relational network figure
CN104331506A (en) Multiclass emotion analyzing method and system facing bilingual microblog text
CN103793503A (en) Opinion mining and classification method based on web texts
CN106354845A (en) Microblog rumor recognizing method and system based on propagation structures
CN103761239A (en) Method for performing emotional tendency classification to microblog by using emoticons
CN101599071A (en) The extraction method of conversation text topic
CN104899335A (en) Method for performing sentiment classification on network public sentiment of information
CN105183717A (en) OSN user emotion analysis method based on random forest and user relationship
CN106126502A (en) A kind of emotional semantic classification system and method based on support vector machine
CN105893484A (en) Microblog Spammer recognition method based on text characteristics and behavior characteristics
Mehra et al. Sentimental analysis using fuzzy and naive bayes
CN105740382A (en) Aspect classification method for short comment texts
Alawneh et al. Sentiment analysis-based sexual harassment detection using machine learning techniques
CN107305545A (en) A kind of recognition methods of the network opinion leader based on text tendency analysis
CN104915399A (en) Recommended data processing method based on news headline and recommended data processing method system based on news headline
CN106569999A (en) Multi-granularity short text semantic similarity comparison method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151111