CN106940732A - Method for discovering suspected water army accounts on microblogs - Google Patents

Method for discovering suspected water army accounts on microblogs

Info

Publication number
CN106940732A
Authority
CN
China
Prior art keywords
user
microblogging
data
water army
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710212983.9A
Other languages
Chinese (zh)
Inventor
刘春阳
乔杨
赵志云
李雄
张华平
张旭
庞琳
王萌
商建云
王卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Publication of CN106940732A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The present invention relates to a method for discovering suspected water army accounts (paid posters) in microblog data and belongs to the field of computer application technology. The method comprises six steps: collecting the relevant microblog data; preprocessing the data; extracting user features; building a training set; training a water army detection model; and predicting labels for unlabeled data. Compared with the prior art, the invention makes full use of the data and supports group discovery conveniently without building a complicated classification and detection model, which reduces algorithmic complexity. The algorithm is highly modular, can be applied to large-scale data computation, and is comparatively stable. Besides detecting whether a single user is a water army account, the invention can also screen a batch of users involved in a particular event, and it can run stably under large-scale data computing frameworks.

Description

Method for discovering suspected water army accounts on microblogs
Technical field
The present method relates to a method for discovering suspected water army accounts on microblogs and belongs to the technical field of social network analysis and data mining.
Background art
Over the past several years, social networks have become one of the main ways people keep in touch with friends and family over the internet. Statistics show that the average time people spend on social networking sites far exceeds the time spent on other websites. Most social networking sites also offer access from mobile devices, which makes visits even more frequent.
The rapid spread and widespread use of social networks allow these sites to collect large amounts of information about users' interests, their friends, and the users themselves. Unfortunately, the convenient channels for spreading information and the large volume of valuable data have also attracted many illegal groups and individuals, who treat social networks as a convenient way to reap large profits or pursue illegal ends. A large number of rumors and fraudulent messages now circulate on some social networking sites. In particular, in today's social media people are heavily affected by water army accounts: paid posters publish large amounts of filler ("water-pouring") content, and water army bots publish massive volumes of spam so as to spread junk information as widely as possible, which seriously degrades the online experience.
Traditional online water army behavior appeared early, was relatively small in scale, was not highly concealed, and produced junk information with obvious characteristics. Recognition methods therefore relied mainly on spam content analysis, such as analysis of e-mail content. In addition, blacklists and whitelists built from large numbers of identified cases recorded suspicious and normal users respectively, which improved the efficiency and accuracy of recognition. Furthermore, in the e-mail domain the resources needed to produce spam are similar across spammers, so e-mail spammers could be located well from the resources they used and their network-level features. As the network environment grew more complex and the harm done by water armies increased, users also became better at guarding against them. To achieve their goals, water army behavior has become steadily more complex and closer to that of normal users, and traditional e-mail-oriented recognition methods cannot find these concealed water army accounts.
Web 2.0 is an emerging, user-centered internet model that uses web applications to promote information exchange and collaboration between people online. Current research on recognizing water armies in Web 2.0 can be divided by target domain into e-mail, e-commerce, social network, and forum research. By research method, it can be divided into recognition methods based on user-generated content features, on user-related features, and on environmental features.
Research on recognizing water armies in Web 2.0 adapts the traditional recognition work to the new setting. Research at home and abroad has made considerable progress, but many important problems remain to be solved. Research abroad initially concentrated on the e-mail domain and in recent years has spread rapidly to social networks and e-commerce. Research within China is comparatively scarce. Current recognition methods are mainly based on content features, user features, environmental features, or a combination of these. For example, in 2010 Ratkiewicz et al. designed the "Truthy" system, which collects, analyzes, and visualizes online the spread of tweets on trending topics, and uses features extracted from tweets, such as hashtags ('#'), short links, and emoticons, to recognize the abuse of political information on Twitter. In 2011, Qazvinian et al. attempted to detect rumors on Twitter. They decomposed the problem into two supervised machine learning tasks: first retrieve the microblog posts that relate to a rumor, then classify those posts to identify the ones that support it. Classification used a linear combination of log-likelihoods over three categories of features: text content, user history, and microblog-specific patterns. The experimental results showed that text features (word frequency, part of speech) remained the most important, while the latter two categories also noticeably improved classification performance.
In practice, however, supervised learning over a very large number of features can secure the recognition rate, but the difficulty of extracting a high-dimensional feature set and an excess of individual features can keep system performance from meeting the demands of real applications. Moreover, because the data are sparse, the full data set (follower relations, following relations, repost information, and so on) is often unavailable. Given such incomplete data, the feature set should be simplified as far as possible, using features that are easy to extract, and a better-designed recognition model should be introduced, so that feature extraction and prediction remain efficient.
Summary of the invention
The purpose of the present invention is to solve the problem that group discovery cannot be carried out accurately when user relationship links are sparse, by proposing a method for discovering suspected water army accounts on microblogs.
The idea of the invention is that, in massive data, the information that is easiest to obtain and most complete is the text published by social network users. A text-based method of group discovery and expansion is therefore proposed: natural language processing is applied to each user's text data to extract that user's feature information, a model is built from the features, cluster analysis is then carried out by comparing the similarity between users to obtain the group communities, and the salient features of each group are extracted to expand the group.
The purpose of the present invention is achieved through the following technical solution:
A method for discovering suspected water army accounts on microblogs, comprising the following steps:
Step 1: collect the relevant microblog data and obtain the following information: the text of the posts published by microblog users; the text of the comments made by users; the interactions users carry out on the microblog, including comments, reposts, and likes; and the basic attributes of users, including follower count, following count, and following relations;
The microblog information to be analyzed is obtained with crawler technology, from data resources that the microblog platform makes public, and from directly purchased water army posts or accounts. It mainly comprises the items listed in Step 1: post text, comment text, interaction information (comments, reposts, likes), and basic user attributes (follower count, following count, following relations);
Step 2: preprocess the sample data obtained in Step 1 as follows: first clean the data, then perform Chinese word segmentation on the microblog text, and finally parse the data by hierarchical relationship to obtain the user-to-post-text mapping and the user-to-comment-text mapping, while retaining the user-following, user-follower, and user-repost relation data;
Step 3: extract user features from the data preprocessed in Step 2: for every user in the microblog data, extract the direct features "follower count" and "following count"; then, from the user's extracted microblog content, compute the indirect features "follower-to-following ratio", "original post ratio", "repost ratio", "average number of @ mentions per post", "posting frequency", "access-mode count over all posts", "access-mode count over reposts", and "whether the user has taken part in reposting a post reposted more than m times";
Step 4: build the training set: if the user of the method does not provide a training set, collect users labeled in advance, with the class labels "water army" and "non-water army", and extract their features to build the training set; if a training set is provided, use the provided labeled data as the training set;
Preferably, during user feature extraction, the feature set used in training can be adjusted according to different recognition needs; the complete feature set described in Step 2 need not be used.
Step 5: train the water army detection model: use the labeled feature set data from Step 4 to train the classification and detection model;
Preferably, this embodiment uses the logistic regression (LogisticRegression) algorithm as the classification and detection model. Given n features x = (x_1, x_2, …, x_n), let the conditional probability p(y=1|x) be the probability that the observed sample y occurs given the factors x. Expressed with the sigmoid function:
$$p(y=1 \mid x) = \pi(x) = \frac{1}{1+e^{-g(x)}}$$
where g(x) = w_0 + w_1 x_1 + … + w_n x_n, w_0 is the intercept, and w_1, …, w_n are the weights of features 1 to n. The probability that y does not occur given x is:
$$p(y=0 \mid x) = 1 - p(y=1 \mid x) = \frac{1}{1+e^{g(x)}}$$
Step 6: use the trained water army detection model to identify water army users. The specific process is: the user of the method adds the microblog users to be predicted; if only a user's ID or nickname is available, that user's microblog data are first collected and the user features computed through Steps 1 to 3; the resulting features are then fed to the detection model for prediction.
Preferably, applying the identification process of Step 6 to every user in an event in turn makes it possible to determine whether the event involves a water army.
Beneficial effects
The microblog text data used in the invention capture many angles and many features of a microblog user and describe a microblog account completely, which avoids the low accuracy caused by low feature dimensionality. Event-related features are also added to the feature set, which limits the influence of water army accounts that appear in an event only by chance and further improves recognition accuracy. While detecting whether an account is a water army account, the method also helps determine whether a water army is promoting an event, which gives it strong practical value. The invention makes full use of the data and carries out group discovery quickly and conveniently without building a complicated classification and detection model, which reduces algorithmic complexity; the algorithm is highly modular, can be applied to large-scale data computation, and is comparatively stable. Besides detecting whether a single user is a water army account, the invention can screen a batch of users in a particular event to judge whether a water army is promoting the event behind the scenes. Data obtained during intermediate processing, such as user relationships, can also be used for deeper mining, for example tracing the source of a water army or identifying water army groups, which improves information utilization and has considerable practical value. The method is highly modular and can be applied stably under large-scale data computing frameworks.
Brief description of the drawings
Fig. 1 is a flow chart of the method for discovering suspected water army accounts on microblogs according to an embodiment of the invention;
Fig. 2 is a flow chart of microblog data collection and preprocessing according to an embodiment of the invention;
Fig. 3 is a flow chart of training the detection model with the extracted features and predicting new samples according to an embodiment of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and an embodiment:
Take a Sina Weibo user as an example:
The user's ID is 5364402211. To judge whether this microblog account is a water army account, only the user ID 5364402211 needs to be provided. The method then collects and analyzes the related data by ID and finally outputs a prediction of whether the account is a water army account. The process is shown in Fig. 1 and described in detail below.
Collect the relevant microblog data according to Step 1:
The Sina Weibo data under study are either crawled or taken directly from the public data that the platform provides. Collection sets up a buffered URL queue and searches web links with breadth-first search (BFS); each node page is scanned and downloaded, the page is parsed, irrelevant noise is removed, and the metadata that describe user attributes are kept: the text of the posts the user published, the text of the posts the user commented on, the user's follower count, following count, repost relations, and registration information. Relevant information can also be extracted directly by calling the API provided by the microblog platform or from feeds such as RSS. The water army user data needed for training are obtained by purchasing accounts, reposts, or comments online to get water army user IDs, and then collecting the corresponding user and microblog data with the same collection method;
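The breadth-first collection described above can be sketched as follows. This is a minimal illustration, not the crawler used in the patent: `bfs_crawl`, `fetch_links`, and the in-memory `GRAPH` are hypothetical names, and the toy link graph stands in for real page downloads and parsing.

```python
from collections import deque

def bfs_crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first starting from `seed`.

    `fetch_links(url)` is a hypothetical callback that returns the URLs
    linked from `url`; a real crawler would download and parse the page here.
    """
    queue = deque([seed])   # buffered URL queue
    seen = {seed}           # never enqueue the same page twice
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# Toy in-memory link graph standing in for real pages (illustrative only).
GRAPH = {
    "user/a": ["user/b", "user/c"],
    "user/b": ["user/d"],
    "user/c": ["user/d"],
    "user/d": [],
}
visited = bfs_crawl("user/a", lambda u: GRAPH.get(u, []))
```

The `seen` set keeps a page that is reachable by several paths from being downloaded more than once, and `max_pages` bounds the crawl.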
Preprocess the data according to Step 2, as shown in Fig. 2:
Microblogs contain a large amount of semi-structured and unstructured data, so the collected data must first be cleaned and integrated: the metadata are consolidated and stored, and the corresponding mappings are set up to ease the later steps.
1) Data cleaning: check the integrity of the raw data collected and remove microblog users whose user information or microblog information is incomplete, together with their microblog content;
2) Text segmentation: segment the user's microblog text (published posts and commented posts) with a word segmentation tool (such as the ICTCLAS segmentation system) or method, remove stop words, and obtain the vector space model (VSM) of the text;
3) From the data processed in 1) and 2), build the user-to-post-text VSM mapping and the user-to-comment-text VSM mapping; the user-repost, user-follower, and user-following mappings can be obtained at the same time.
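The three preprocessing parts can be illustrated with a small sketch. It assumes records are dictionaries with hypothetical `user_id` and `text` keys, uses a placeholder stop-word list, and substitutes a whitespace tokenizer for a real Chinese segmenter such as ICTCLAS, so it shows only the shape of the pipeline.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of"}   # placeholder stop-word list

def clean(records):
    """Part 1: drop records that lack a user ID or text (integrity check)."""
    return [r for r in records if r.get("user_id") and r.get("text")]

def tokenize(text):
    """Part 2: stand-in tokenizer; Chinese text needs a real segmenter."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def build_user_vsm(records):
    """Part 3: map each user ID to a bag-of-words term-frequency vector."""
    vsm = {}
    for r in clean(records):
        vsm.setdefault(r["user_id"], Counter()).update(tokenize(r["text"]))
    return vsm

records = [
    {"user_id": "u1", "text": "buy the best phone"},
    {"user_id": "u1", "text": "best phone of the year"},
    {"user_id": None, "text": "orphan post"},   # removed by cleaning
    {"user_id": "u2", "text": ""},              # removed by cleaning
]
user_vsm = build_user_vsm(records)
```

A sparse `Counter` per user is one simple way to hold a vector space model; the same function could be run over comment text to obtain the user-to-comment-text mapping.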
Extract user features according to Step 3:
In this part a user is described by a multi-dimensional feature set that includes both direct features obtained straight from collection and indirect features obtained by secondary computation, as listed in Table 1:
Table 1: User features
The follower count, following count, and follower-to-following ratio are obtained by counting the user's follower and following relations from part 3) of Step 2: the follower count comes from the user-follower relation, the following count from the user-following relation, and the follower-to-following ratio is the follower count divided by the following count.
The repost ratio, original post ratio, and average number of "@" mentions per post are obtained from the statistics of part 2) of Step 2. Each of the user's posts is classified by its content as a repost or an original post and counted; dividing each count by the user's total number of posts gives the repost ratio and the original post ratio. The number of @ symbols is counted through the VSM model, and its ratio to the user's total number of posts is the average number of "@" mentions per post.
The posting frequency, access-mode count over all posts, access-mode count over reposts, and whether the user has taken part in reposting a post reposted more than 100 times are obtained from part 2) of Step 2. The publication times of the user's posts are sorted, and the difference (in hours) between the latest and earliest publication times is taken as the time interval; the posting frequency is the total number of posts divided by this interval. The access mode of each post is counted, and the total over all access modes is the access-mode count over all posts. The access mode of each repost is counted, and the total is the access-mode count over reposts. For each of the user's reposts, the number of times the original post was reposted is counted; if any original post was reposted more than 100 times, the user is marked as having taken part in reposting a post reposted more than 100 times.
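The feature definitions above can be put together in a short sketch. The record layout (`text`, `is_forward`, `created`, `source`, `origin_forwards`) and the function name are hypothetical, and reading the access-mode counts as the number of distinct posting clients is an interpretation of the text, not something the source states explicitly.

```python
from datetime import datetime

def extract_features(followers, following, posts, m=100):
    """Compute the ten features for one user from a list of post dicts."""
    n = len(posts)
    forwards = [p for p in posts if p["is_forward"]]
    times = sorted(p["created"] for p in posts)
    # Interval between the earliest and latest post, in hours.
    hours = (times[-1] - times[0]).total_seconds() / 3600 if n > 1 else 0.0
    return {
        "followers": followers,
        "following": following,
        "follower_following_ratio": followers / following if following else 0.0,
        "original_ratio": (n - len(forwards)) / n if n else 0.0,
        "forward_ratio": len(forwards) / n if n else 0.0,
        "avg_mentions": sum(p["text"].count("@") for p in posts) / n if n else 0.0,
        "posts_per_hour": n / hours if hours else 0.0,
        # Assumption: access-mode count = number of distinct clients used.
        "access_modes_all": len({p["source"] for p in posts}),
        "access_modes_forward": len({p["source"] for p in forwards}),
        "joined_hot_forward": float(any(p["origin_forwards"] > m for p in forwards)),
    }

# Illustrative posts, not real data.
posts = [
    {"text": "hello @bob", "is_forward": False,
     "created": datetime(2017, 1, 1, 0), "source": "web", "origin_forwards": 0},
    {"text": "//@amy: sale @carl", "is_forward": True,
     "created": datetime(2017, 1, 1, 2), "source": "android", "origin_forwards": 250},
]
features = extract_features(38, 182, posts)
```

Zero denominators (no posts, no followed accounts, or all posts at the same instant) fall back to 0.0 here; a production system would need to decide how such users should be treated.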
Applying the feature extraction above to the user with ID 5364402211 gives the feature vector x = (38, 182, 0.21, 0.19, 0.0078, 0.494, 0.118, 9.0, 5.0, 1.0).
The features used in training can be adjusted as needed when the actual model is trained. Reducing the feature set lowers recognition accuracy to some degree but correspondingly improves system performance. On the other hand, if recall matters more than overall recognition accuracy, a combination of attribute features and content features can be used to obtain higher recall.
Build the training set according to Step 4:
The training set contains the data used to train the water army detection model. It is organized by user, with one training record per user. A training record is a two-tuple <X, L>, where X is a feature vector:
X = (X_1, X_2, …, X_d),
The feature vector is the input of the classification model, where X_i denotes the i-th feature value of the user, i ∈ {1, 2, …, d}, and d is the number of features; L is the user's label, "yes" or "no", indicating whether or not the user is a water army account.
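One way to hold the two-tuple <X, L> in code is sketched below. The `Sample` type, the `to_arrays` helper, and the feature values are illustrative assumptions; only the "yes"/"no" labels and the fixed dimension d come from the text.

```python
from typing import List, NamedTuple

class Sample(NamedTuple):
    x: List[float]   # feature vector X = (X_1, ..., X_d)
    label: str       # "yes" = water army, "no" = not water army

def to_arrays(samples):
    """Split samples into feature rows and 0/1 targets, checking dimensions."""
    d = len(samples[0].x)
    if any(len(s.x) != d for s in samples):
        raise ValueError("all feature vectors must have the same length d")
    X = [list(s.x) for s in samples]
    y = [1 if s.label == "yes" else 0 for s in samples]
    return X, y

# Made-up records with d = 3 features each.
training_set = [Sample([10, 500, 0.02], "yes"), Sample([950, 300, 3.17], "no")]
X, y = to_arrays(training_set)
```

Mapping "yes" to 1 and "no" to 0 matches the y = 1 / y = 0 convention of the logistic regression model used in Step 5.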
Train the water army detection model according to Step 5, as shown in Fig. 3:
The recognition of suspected water army accounts is treated as a classification problem: the training set built in Step 4 is used to train a classification model, and the model is saved for prediction. This embodiment uses the logistic regression (LogisticRegression) algorithm as the classification model. Given n features x = (x_1, x_2, …, x_n), let the conditional probability p(y=1|x) be the probability that the observed sample y occurs given the factors x. Expressed with the sigmoid function:
$$p(y=1 \mid x) = \pi(x) = \frac{1}{1+e^{-g(x)}}$$
where g(x) = w_0 + w_1 x_1 + … + w_n x_n, and the probability that y does not occur given x is:
$$p(y=0 \mid x) = 1 - p(y=1 \mid x) = \frac{1}{1+e^{g(x)}}$$
What this part has to do is train the weights w = (w_0, w_1, …, w_n).
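The source does not say how the weights are fitted. One common choice, shown here as a minimal sketch and not as the patent's procedure, is batch gradient descent on the logistic log-loss. The function names, learning rate, epoch count, and toy data are all illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def g(w, x):
    """Linear score g(x) = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def train(X, y, lr=0.1, epochs=2000):
    """Fit w = (w0, ..., wn) by batch gradient descent on the log-loss."""
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(g(w, xi)) - yi      # dLoss/dg for one sample
            grad[0] += err                    # intercept term
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Toy data: two illustrative features already scaled to [0, 1].
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w = train(X, y)
p_pos = sigmoid(g(w, [0.85, 0.85]))
p_neg = sigmoid(g(w, [0.15, 0.15]))
```

Raw features such as follower counts span very different ranges, so in practice they would usually be scaled before training; an off-the-shelf logistic regression implementation would serve equally well.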
According to Step 6, predict labels for unlabeled data and identify water army accounts:
To detect whether a user is a water army account, the microblog ID supplied for detection is first used to collect the related microblog data and extract features through Steps 1 to 3 above; the extracted features are fed to the trained detection model, which outputs a prediction of whether the user is a water army account. The feature vector x extracted in Step 3 is passed to the model trained in Step 5; if p(y=1|x) > p(y=0|x), the user with ID 5364402211 is judged to be a water army account, and otherwise not.
To detect whether a water army is present in an event, the same per-user flow is applied to every microblog user in the event; as soon as any user is detected as a water army user, the event is deemed to involve a water army.
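The comparison of the two class probabilities in this step reduces to a sign test on the linear score g(x). A short derivation, using only the sigmoid form given in claim 3:

```latex
\begin{aligned}
p(y=1 \mid x) > p(y=0 \mid x)
&\iff \frac{1}{1+e^{-g(x)}} > \frac{1}{1+e^{g(x)}} \\
&\iff 1+e^{g(x)} > 1+e^{-g(x)} \\
&\iff g(x) > 0 \\
&\iff p(y=1 \mid x) > \tfrac{1}{2}.
\end{aligned}
```

So the decision rule is equivalent to thresholding p(y=1|x) at 1/2, and a prediction needs only the weighted sum g(x), not the exponentials.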
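The event-level rule can be sketched in a few lines. `event_contains_water_army` and `is_water_army` are hypothetical names, and the fixed scores stand in for the per-user output of a trained model.

```python
def event_contains_water_army(user_ids, is_water_army):
    """Return (flag, flagged_users) for the users taking part in an event.

    `is_water_army(user_id)` is a hypothetical per-user predictor, for
    example a wrapper around feature extraction plus the trained model.
    """
    flagged = [u for u in user_ids if is_water_army(u)]
    return bool(flagged), flagged

# Stand-in predictor backed by fixed scores (illustrative values only).
SCORES = {"u1": 0.12, "u2": 0.91, "u3": 0.40}
has_army, flagged = event_contains_water_army(SCORES, lambda u: SCORES[u] > 0.5)
```

Returning the flagged users as well as the boolean keeps the information needed for follow-up analysis. Because a single positive marks the whole event, the per-user false positive rate matters a great deal for events with many participants.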
The method has been applied successfully to water army detection over microblog content at the scale of millions of posts, with recognition accuracy above 80%. The invention has also been applied by state security departments to public opinion monitoring on social networks, where it has been effective at water army detection on the microblog platform, has detected a large number of microblog water army accounts, and has made a notable contribution to maintaining order on the internet.
The specific embodiment above is given to illustrate the content and implementation of the invention. The details in the embodiment are not intended to limit the scope of the claims but to help in understanding the method of the invention. Those skilled in the art should understand that various modifications, changes, or substitutions to the steps of the preferred embodiment are possible without departing from the spirit and scope of the invention and its appended claims. The invention should therefore not be limited to what the preferred embodiment and the drawings disclose.

Claims (4)

1. A method for discovering suspected water army accounts on microblogs, characterized by comprising the following steps:
Step 1: collect the relevant microblog data and obtain the following information: the text of the posts published by microblog users; the text of the comments made by users; the interactions users carry out on the microblog, including comments, reposts, and likes; and the basic attributes of users, including follower count, following count, and following relations;
Step 2: preprocess the sample data obtained in Step 1 as follows: first clean the data, then perform Chinese word segmentation on the microblog text, and finally parse the data by hierarchical relationship to obtain the user-to-post-text mapping and the user-to-comment-text mapping, while retaining the user-following, user-follower, and user-repost relation data;
Step 3: extract user features from the data preprocessed in Step 2: for every user in the microblog data, extract the features "follower count" and "following count"; then, from the user's extracted microblog content, compute the indirect features "follower-to-following ratio", "original post ratio", "repost ratio", "average number of @ mentions per post", "posting frequency", "access-mode count over all posts", "access-mode count over reposts", and "whether the user has taken part in reposting a post reposted more than m times";
Step 4: build the training set: if the user of the method does not provide a training set, collect users labeled in advance, with the class labels "water army" and "non-water army", and extract their features to build the training set; if a training set is provided, use the provided labeled data as the training set;
Step 5: train the water army detection model: use the labeled feature set data from Step 4 to train the classification and detection model;
Step 6: use the trained water army detection model to identify water army users, the specific process being: the user of the method adds the microblog users to be predicted; if only a user's ID or nickname is available, that user's microblog data are first collected and the user features computed through Steps 1 to 3; the resulting features are then fed to the detection model for prediction.
2. The method for discovering suspected water army accounts on microblogs according to claim 1, characterized in that: during user feature extraction, the feature set used in training can be adjusted according to different recognition needs, and the complete feature set described in Step 2 need not be used.
3. The method for discovering suspected water army accounts on microblogs according to claim 1, characterized in that: the classification and detection model is the logistic regression (LogisticRegression) algorithm, that is, given n features x = (x_1, x_2, …, x_n), let the conditional probability p(y=1|x) be the probability that the observed sample y occurs given the factors x; expressed with the sigmoid function:
$$p(y=1 \mid x) = \pi(x) = \frac{1}{1+e^{-g(x)}};$$
where g(x) = w_0 + w_1 x_1 + … + w_n x_n, w_0 is the intercept, and w_1, …, w_n are the weights of features 1 to n; the probability that y does not occur given x is:
$$p(y=0 \mid x) = 1 - p(y=1 \mid x) = \frac{1}{1+e^{g(x)}}.$$
4. The method for discovering suspected water army accounts on microblogs according to any one of claims 1 to 3, characterized in that: the water army user identification process of Step 6 is applied to every user in an event in turn, which makes it possible to determine whether the event involves a water army.
CN201710212983.9A 2016-05-30 2017-04-01 Method for discovering suspected water army accounts on microblogs Pending CN106940732A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610371264 2016-05-30
CN2016103712647 2016-05-30

Publications (1)

Publication Number Publication Date
CN106940732A (en) 2017-07-11

Family

ID=59463575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710212983.9A Pending CN106940732A (en) 2016-05-30 2017-04-01 Method for discovering suspected water army accounts on microblogs

Country Status (1)

Country Link
CN (1) CN106940732A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077240A (en) * 2013-01-10 2013-05-01 北京工商大学 Microblog water army identifying method based on probabilistic graphical model
CN104915397A (en) * 2015-05-28 2015-09-16 国家计算机网络与信息安全管理中心 Method and device for predicting microblog propagation tendencies

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
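Sun Weiqiang (孙卫强): "Research on Internet Water Army Identification Based on Deep Belief Networks" (基于深度信念网络的网络水军识别研究), China Master's Theses Full-text Database, Information Science and Technology Series *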

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451923A (en) * 2017-07-14 2017-12-08 北京航空航天大学 A kind of online social networks rumour Forecasting Methodology based on forwarding Analytic Network Process
CN107659647A (en) * 2017-09-26 2018-02-02 精硕科技(北京)股份有限公司 The recognition methods of water note and device
CN109559245A (en) * 2017-09-26 2019-04-02 北京国双科技有限公司 A kind of method and device identifying specific user
CN109559245B (en) * 2017-09-26 2022-02-25 北京国双科技有限公司 Method and device for identifying specific user
CN107832413A (en) * 2017-11-07 2018-03-23 电子科技大学 A kind of detection method of microblogging inactive users
CN107895010A (en) * 2017-11-13 2018-04-10 华东师范大学 A kind of method that detection network navy is thumbed up based on network
CN108921587A (en) * 2018-05-24 2018-11-30 腾讯科技(深圳)有限公司 A kind of data processing method, device and server
CN108763574A (en) * 2018-06-06 2018-11-06 电子科技大学 A kind of microblogging rumour detection algorithm based on gradient boosted tree detects characteristic set with rumour
CN109558555A (en) * 2018-08-20 2019-04-02 湖北大学 Microblog water army detection method and detection system based on artificial immunity danger theory
CN110032859A (en) * 2018-12-25 2019-07-19 阿里巴巴集团控股有限公司 Abnormal account's discrimination method and device and medium
CN109783586A (en) * 2019-01-21 2019-05-21 福州大学 Waterborne troops's comment detection system and method based on cluster resampling
CN109783586B (en) * 2019-01-21 2022-10-21 福州大学 Water army comment detection method based on clustering resampling
CN110110079A (en) * 2019-03-21 2019-08-09 中国人民解放军战略支援部队信息工程大学 A kind of social networks junk user detection method
CN110110079B (en) * 2019-03-21 2021-06-08 中国人民解放军战略支援部队信息工程大学 Social network spam user detection method
CN110457558A (en) * 2019-07-31 2019-11-15 沃民高新科技(北京)股份有限公司 The recognition methods and device of network navy, storage medium and processor
CN110727861A (en) * 2019-09-23 2020-01-24 上海蜜度信息技术有限公司 Method and equipment for microblog water army identification
CN110727763A (en) * 2019-10-09 2020-01-24 南京邮电大学 Method for identifying special ethnic group in social media propagation
CN110727763B (en) * 2019-10-09 2022-10-14 南京邮电大学 Method for identifying special ethnic group in social media propagation
CN110956210A (en) * 2019-11-29 2020-04-03 重庆邮电大学 Semi-supervised network water force identification method and system based on AP clustering
CN110956210B (en) * 2019-11-29 2023-03-28 重庆邮电大学 Semi-supervised network water force identification method and system based on AP clustering
CN111198992A (en) * 2020-01-07 2020-05-26 精硕科技(北京)股份有限公司 Identification method and identification device for mother and infant crowd, electronic equipment and storage medium
CN111259962A (en) * 2020-01-17 2020-06-09 中南大学 Sybil account detection method for time sequence social data
CN113837512A (en) * 2020-06-23 2021-12-24 中国移动通信集团辽宁有限公司 Abnormal user identification method and device
CN112597309A (en) * 2020-12-25 2021-04-02 西南电子技术研究所(中国电子科技集团公司第十研究所) Detection system for identifying microblog data stream of sudden event in real time
CN112800304A (en) * 2021-01-08 2021-05-14 上海海事大学 Microblog water army group detection method based on clustering
CN112906383A (en) * 2021-02-05 2021-06-04 成都信息工程大学 Integrated adaptive water army identification method based on incremental learning
CN115840844A (en) * 2022-12-17 2023-03-24 深圳市新联鑫网络科技有限公司 Internet platform user behavior analysis system based on big data
CN115840844B (en) * 2022-12-17 2023-08-15 深圳市新联鑫网络科技有限公司 Internet platform user behavior analysis system based on big data
CN116150507A (en) * 2023-04-04 2023-05-23 湖南蚁坊软件股份有限公司 Water army group identification method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN106940732A (en) A kind of doubtful waterborne troops towards microblogging finds method
CN106980692B (en) Influence calculation method based on microblog specific events
CN104615608B (en) A kind of data mining processing system and method
Lee et al. Uncovering social spammers: social honeypots+ machine learning
CN106886518A (en) A kind of method of microblog account classification
CN106354845A (en) Microblog rumor recognizing method and system based on propagation structures
CN105045857A (en) Social network rumor recognition method and system
CN103793503A (en) Opinion mining and classification method based on web texts
Al-Zoubi et al. Spam profiles detection on social networks using computational intelligence methods: the effect of the lingual context
CN110457404A (en) Social media account-classification method based on complex heterogeneous network
CN107291886A (en) A kind of microblog topic detecting method and system based on incremental clustering algorithm
CN110990683B (en) Microblog rumor integrated identification method and device based on region and emotional characteristics
CN105893484A (en) Microblog Spammer recognition method based on text characteristics and behavior characteristics
CN106681989A (en) Method for predicting microblog forwarding probability
Zulfiker et al. Analyzing the public sentiment on COVID-19 vaccination in social media: Bangladesh context
Yang et al. Comparison and modelling of country-level microblog user and activity in cyber-physical-social systems using Weibo and Twitter data
Chen et al. The best answers? think twice: online detection of commercial campaigns in the CQA forums
Wei et al. A new evaluation algorithm for the influence of user in social network
Cheng et al. ISC: An iterative social based classifier for adult account detection on twitter
Lin et al. Finding the key users in Facebook fan pages via a clustering approach
Zadeh et al. Mining social network for semantic advertisement
Xianlei et al. Finding domain experts in microblogs
Yin et al. Research of integrated algorithm establishment of a spam detection system
Altinel et al. Identifying topic-based opinion leaders in social networks by content and user information
Zheng et al. A study on microblog classification based on information publicness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170711