CN106940732A - A method for discovering suspected water-army accounts on microblogs - Google Patents
- Publication number
- CN106940732A
- Authority
- CN
- China
- Prior art keywords
- user
- microblog
- data
- water army
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The present invention relates to a method for discovering suspected "water army" accounts (paid posters) in microblog data and belongs to the field of computer application technology. The method consists of six steps: collecting the relevant microblog data; preprocessing the data; extracting user features; building a training set; training a water-army detection model; and predicting labels for unlabeled data. Compared with the prior art, the invention makes full use of the available data and makes group discovery convenient without requiring a complicated classification model, which reduces algorithmic complexity. The algorithm is highly modular, so it can be applied stably under large-scale data computing frameworks. Besides detecting whether a single user is a water-army account, the invention can also screen a batch of users involved in a particular event.
Description
Technical field
The present method relates to a method for discovering suspected water-army accounts on microblogs and belongs to the field of social network analysis and data mining.
Background
Over the past several years, social networks have become one of the main ways people keep in touch with friends and family over the internet. Statistics show that people spend far more time on average on social networking sites than on other websites. Most social networking sites also offer access from mobile devices, which makes visits even more frequent.
The rapid spread and wide use of social networks let these sites collect a large amount of information about users' interests, their friends, and the users themselves. Unfortunately, the ease of spreading information and the large volume of valuable data have also attracted many illegal groups and individuals, who regard social networks as a convenient way to make large profits or pursue illegal aims. At present, some social networking sites carry a large number of rumors and spoofed messages. In particular, people on today's social media are strongly affected by water-army accounts: for example, water-army accounts post large volumes of filler ("water-pouring") content, and water-army bots publish massive amounts of spam to spread junk information as widely as possible, which seriously degrades the online experience.
Traditional water-army activity appeared earlier, was relatively small in scale, was not well concealed, and produced junk information with obvious characteristics. It was therefore identified mainly by analyzing spam content, for example the content of e-mail. In addition, blacklists and whitelists built up from many identifications record suspicious-user and normal-user information respectively, which improves the efficiency and accuracy of identification. Furthermore, in the e-mail domain, water-army accounts need similar resources to produce spam, so they can be located well from the resources they use and their network-level features. As the network environment has grown more complex and the harm done by water armies has increased, users' ability to guard against them has also grown. To achieve their goals, water-army accounts have become more sophisticated and increasingly resemble normal users, and the methods used to identify e-mail water armies cannot find these concealed accounts.
Web 2.0 is an emerging, user-centered internet model that uses web applications to promote information exchange and collaboration between people. Current Web 2.0 water-army identification research can be divided by target domain into research on e-mail, e-commerce, social networks, and forums. By research method, it can be divided into identification based on user-generated content features, on user-related features, and on environmental features.
Web 2.0 water-army identification research adapts traditional water-army identification to the new setting. Research at home and abroad has made considerable progress, but many important problems remain to be solved. Research abroad initially concentrated on the e-mail domain and in recent years has spread rapidly to social networks and e-commerce. Research in China is comparatively scarce. Current identification methods are mainly based on content features, user features, environmental features, or a combination of these. For example, in 2010 Ratkiewicz et al. designed the "Truthy" system, which collects, analyzes, and visualizes online the spread of tweets on trending topics, and identifies political abuse on Twitter using features extracted from tweets such as hashtags ('#'), short links, and emoticons.
In 2011, Qazvinian et al. attempted to detect rumors on Twitter. They decomposed the problem into a two-step supervised machine learning task: first retrieve the microblogs related to a rumor, then classify them to identify those that endorse the rumor. The classifier used a linear combination of log-likelihoods from three categories of features: text content, user history, and microblog-specific memes. The experimental results showed that text features (word frequency, part of speech) remained the most important, while the other two feature categories also clearly improved classification performance.
In practice, however, supervised learning with too many features may secure the recognition rate, but a high-dimensional feature set, and the difficulty of extracting some individual features, can keep system performance from meeting the requirements of practical applications. Moreover, because data are sparse, the full data set (follower relations, following relations, repost information, and so on) is often unobtainable. When the data are incomplete in this way, the feature set should be simplified as far as possible, easily extracted features should be used, and a more carefully designed identification model should be introduced to keep feature extraction and prediction efficient.
Summary of the invention
The purpose of the present invention is to solve the problem that group discovery cannot be performed accurately when user-relationship link data are sparse, by proposing a method for discovering suspected water-army accounts on microblogs.
The idea of the invention is that, among massive data, the information that is easiest to obtain and most complete is the text that social network users publish. The invention therefore proposes a text-based group discovery and expansion method: it applies natural language processing to each user's text data to extract that user's feature information, builds a model from the features, performs cluster analysis by comparing the similarity between users to obtain communities, and extracts each community's salient features to expand the group.
The purpose of the invention is achieved through the following technical solution:
A method for discovering suspected water-army accounts on microblogs, comprising the following steps:
Step 1: Collect the relevant microblog data and obtain the following information: the text of microblogs posted by users; the text of comments written by users; users' interactions on the microblog platform, including comments, reposts, and likes; and users' basic attributes, including follower count, following count, and following relations.
The microblog information to be analyzed is obtained by web crawling, from data resources published by the microblog platform, and from water-army microblogs or accounts purchased directly; it consists mainly of the items listed above.
Step 2: Preprocess the sample data obtained in Step 1 as follows: first clean the data, then perform Chinese word segmentation on the microblog text, and finally parse the data by hierarchical relationship to obtain the user-to-microblog-text mapping and the user-to-comment-text mapping, while retaining the user-following, user-follower, and user-repost relation data.
Step 3: Extract user features from the data preprocessed in Step 2: for every user in the microblog data, extract the direct features "follower count" and "following count"; then compute from the user's extracted microblog content the indirect features "follower-to-following ratio", "original microblog ratio", "repost ratio", "average number of @ mentions per microblog", "posting frequency", "access-method count over all microblogs", "access-method count over reposts", and "whether the user has taken part in reposting a microblog reposted more than m times".
Step 4: Build the training set: if the user of the method does not supply a training set, collect users labeled in advance as water army or non-water army and extract their features to build the training set; if a training set is supplied, use the supplied labeled data as the training set.
Preferably, during user feature extraction, the feature set used in training can be adjusted to the identification requirements; the complete feature set described in Step 2 need not be used.
Step 5: Train the water-army detection model: train the classification model on the labeled feature data from Step 4.
Preferably, this embodiment uses the logistic regression algorithm (LogisticRegression) as the classification model. Given n features x = (x1, x2, …, xn), let the conditional probability p(y=1|x) be the probability that the observation y occurs given the factors x. Expressed with the sigmoid function, it is:

$$p(y=1 \mid x) = \frac{1}{1 + e^{-g(x)}}$$

where g(x) = w0 + w1x1 + … + wnxn, w0 is the intercept, and w1, …, wn are the weights of features 1 to n. The probability that y does not occur given x is:

$$p(y=0 \mid x) = 1 - p(y=1 \mid x) = \frac{1}{1 + e^{g(x)}}$$
Step 6: Use the trained water-army detection model to identify water-army users, as follows: the user of the method adds the microblog users to be predicted; if only a user's ID or nickname is available, first collect that user's microblog data and compute the features following Steps 1 to 3, then apply the features to the detection model to obtain a prediction.
Preferably, applying the identification process of Step 6 to every user in an event in turn determines whether the event involves a water army.
Beneficial effects
The microblog text data used in the invention capture a microblog user from multiple angles and through multiple features, and so describe a microblog account completely; this avoids the low accuracy caused by a low-dimensional feature set. Event-related features are also added to the feature set, which reduces the influence of water-army accounts that appear in an event only by chance and further improves recognition accuracy. Detecting whether an account is a water-army account also helps determine whether a water army is promoting an event, which gives the method strong practical value. The invention makes full use of the data and performs group discovery quickly and conveniently without a complicated classification model, which reduces algorithmic complexity; the algorithm is highly modular and can be applied stably to large-scale data computation. Besides detecting whether a single user is a water-army account, the invention can screen a batch of users in a particular event to judge whether a water army is promoting the event behind the scenes. The user-relation data obtained during intermediate processing can also be used for deeper mining, such as tracing the source of a water army or identifying water-army groups, which improves information utilization.
Brief description of the drawings
Fig. 1 is a flow chart of the method for discovering suspected water-army accounts on microblogs according to an embodiment of the invention;
Fig. 2 is a flow chart of microblog data collection and preprocessing according to the embodiment;
Fig. 3 is a flow chart of training the detection model on the extracted features and predicting new samples according to the embodiment.
Embodiment
The invention is described in detail below with reference to the drawings and an embodiment.
Take a Sina Weibo user as an example:
The user's ID is 5364402211. To judge whether this microblog account is a water-army account, only the ID 5364402211 needs to be supplied; the method collects and analyzes the related data for that ID and finally outputs a prediction of whether the account is a water-army account. The process is shown in Fig. 1 and described in detail below.
Collect the relevant microblog data according to Step 1:
The Sina Weibo data to be studied are either crawled or taken directly from the public data the platform provides. The crawler maintains a buffered URL queue, follows web links using breadth-first search (BFS), downloads the page at each node, parses it, removes irrelevant noise, and retains the metadata that describe the user's attributes: the text of microblogs the user posts, the text of the user's comments, the user's follower count, following count, repost relations, and registration information. Alternatively, the relevant information can be extracted directly through the API provided by the microblog platform or through feeds such as RSS. The water-army user data needed for training are obtained by purchasing accounts, reposts, or comments online to obtain water-army user IDs, and then collecting the corresponding user and microblog data with the collection method above.
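The breadth-first collection with a URL queue can be sketched as follows. This is a minimal illustration rather than the patent's crawler: the `fetch_links` callback and the in-memory `LINKS` graph are hypothetical stand-ins for page download and parsing, so the example runs without network access.

```python
from collections import deque

def bfs_crawl(seed_url, fetch_links, max_pages=100):
    """Visit pages breadth-first starting from seed_url.

    fetch_links(url) is a hypothetical callback that returns the URLs
    linked from a page. Returns the URLs in the order they were visited.
    """
    queue = deque([seed_url])   # buffered URL queue
    seen = {seed_url}           # avoid queueing the same URL twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Toy link graph standing in for real user pages (assumed data).
LINKS = {
    "user/A": ["user/B", "user/C"],
    "user/B": ["user/D"],
    "user/C": ["user/D", "user/A"],
    "user/D": [],
}

order = bfs_crawl("user/A", lambda u: LINKS.get(u, []))
print(order)  # ['user/A', 'user/B', 'user/C', 'user/D']
```

A real collector would replace the callback with page download and parsing, and would apply rate limits and the platform's terms of use.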
Preprocess the data according to Step 2, as shown in Fig. 2:
Microblogs contain a large amount of semi-structured and unstructured data, so the collected microblog data must first be cleaned and integrated: the metadata are consolidated and stored, and the corresponding mappings are built to support the subsequent steps.
1) Data cleaning: check the integrity of the collected raw data and remove microblog users whose user information or microblog information is incomplete, together with their microblog content;
2) Text segmentation: segment the user's microblog text (posted microblogs and comments) with a word segmentation tool (such as the ICTCLAS segmentation system) or method, remove stop words, and obtain the vector space model (VSM) of the text;
3) From the data processed in 1) and 2), build the user-to-microblog-text VSM mapping and the user-to-comment-text VSM mapping; the user-repost, user-follower, and user-following mappings can be obtained at the same time.
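The three preprocessing parts can be sketched as below. This is a simplified sketch under stated assumptions: the record schema (`user_id`, `text`), the tiny `STOP_WORDS` set, and the whitespace tokenizer are all hypothetical; a real pipeline would call a Chinese word segmenter such as ICTCLAS in place of `tokenize`.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would load a full Chinese list.
STOP_WORDS = {"the", "a", "is", "of"}

def clean(records):
    """Part 1: drop records with a missing user id or empty text."""
    return [r for r in records if r.get("user_id") and r.get("text")]

def tokenize(text):
    """Part 2: whitespace tokenizer standing in for a Chinese segmenter."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def build_user_vsm(records):
    """Part 3: map each user id to a bag-of-words term-frequency vector."""
    vsm = {}
    for r in clean(records):
        vsm.setdefault(r["user_id"], Counter()).update(tokenize(r["text"]))
    return vsm

records = [
    {"user_id": "u1", "text": "the sale is great"},
    {"user_id": "u1", "text": "great sale"},
    {"user_id": None, "text": "orphan post"},   # incomplete: no user
    {"user_id": "u2", "text": ""},              # incomplete: no text
]
user_vsm = build_user_vsm(records)
print(user_vsm)
```

The same function can be run once over posted microblogs and once over comments to obtain the two separate mappings.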
Extract user features according to Step 3:
In this part, each user is described by a multidimensional feature set that includes both direct features obtained by collection and indirect features obtained by secondary computation, as shown in Table 1:
Table 1: User features
The follower count, following count, and follower-to-following ratio are obtained from the follower and following relations built in part 3) of Step 2: the follower count comes from the user-follower relation, the following count comes from the user-following relation, and the follower-to-following ratio is the follower count divided by the following count.
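A minimal sketch of these three features is shown below. The relation dictionaries and the convention of returning a ratio of 0 when the user follows nobody are assumptions for illustration; the relation sets are synthetic and merely sized to the counts reported for the example user (38 and 182).

```python
def follower_features(user_id, user_followers, user_following):
    """Return (follower count, following count, follower-to-following ratio)."""
    followers = len(user_followers.get(user_id, ()))
    following = len(user_following.get(user_id, ()))
    # Assumption: define the ratio as 0.0 when the user follows nobody.
    ratio = followers / following if following else 0.0
    return followers, following, ratio

uid = "5364402211"
user_followers = {uid: {f"follower{i}" for i in range(38)}}
user_following = {uid: {f"followee{i}" for i in range(182)}}

followers, following, ratio = follower_features(uid, user_followers, user_following)
print(followers, following, round(ratio, 2))  # 38 182 0.21
```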
The repost ratio, original microblog ratio, and average number of "@" mentions per microblog are computed from part 2) of Step 2. Each of the user's microblogs is classified by its content as a repost or an original microblog and counted; dividing each count by the user's total number of microblogs gives the repost ratio and the original microblog ratio. The number of @ symbols is counted from the VSM, and its ratio to the user's total number of microblogs is the average number of "@" mentions.
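These content ratios can be sketched as follows. The post schema (`text`, `is_repost`) is hypothetical, and the sketch counts "@" characters directly in the raw text rather than in a VSM.

```python
def content_ratios(posts):
    """Return (original ratio, repost ratio, average @ mentions per post).

    posts: list of dicts with 'text' and 'is_repost' (hypothetical schema).
    """
    total = len(posts)
    if total == 0:
        return 0.0, 0.0, 0.0
    reposts = sum(1 for p in posts if p["is_repost"])
    mentions = sum(p["text"].count("@") for p in posts)
    return (total - reposts) / total, reposts / total, mentions / total

posts = [
    {"text": "original thought", "is_repost": False},
    {"text": "look @alice @bob", "is_repost": True},
    {"text": "again @alice", "is_repost": True},
    {"text": "plain repost", "is_repost": True},
]
original_ratio, repost_ratio, avg_mentions = content_ratios(posts)
print(original_ratio, repost_ratio, avg_mentions)  # 0.25 0.75 0.75
```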
The posting frequency, the access-method count over all microblogs, the access-method count over reposts, and whether the user has taken part in reposting a microblog reposted more than 100 times are obtained from part 2) of Step 2. The posting times of the user's microblogs are sorted, and the difference in hours between the latest and earliest posting times is taken as the time interval; the posting frequency is the total number of microblogs divided by this interval. The access methods recorded in each microblog are counted, and their total over all microblogs is the access-method count over all microblogs. The access methods recorded in each repost are counted, and their total is the access-method count over reposts. For each of the user's reposts, the number of times the original microblog has been reposted is counted; if any original microblog has been reposted more than 100 times, the user is marked as having reposted a microblog reposted more than 100 times.
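The four activity features can be sketched as below. The post schema is hypothetical, and one interpretation is assumed: the text does not say whether the access-method count is a count of distinct posting clients or a raw total, and this sketch takes it to be the number of distinct clients.

```python
from datetime import datetime

def activity_features(posts, m=100):
    """Return (posting frequency per hour, access-method count over all
    posts, access-method count over reposts, viral-repost flag).

    posts: list of dicts with 'time' (datetime), 'source' (client name),
    'is_repost' (bool) and 'origin_reposts' (times the original post was
    reposted). This schema is hypothetical.
    """
    times = [p["time"] for p in posts]
    hours = (max(times) - min(times)).total_seconds() / 3600 if times else 0.0
    post_freq = len(posts) / hours if hours > 0 else 0.0
    # Assumption: "access-method count" means the number of distinct clients.
    all_sources = len({p["source"] for p in posts})
    repost_sources = len({p["source"] for p in posts if p["is_repost"]})
    viral = any(p["is_repost"] and p["origin_reposts"] > m for p in posts)
    return post_freq, all_sources, repost_sources, float(viral)

posts = [
    {"time": datetime(2016, 5, 1, 0, 0), "source": "web",
     "is_repost": False, "origin_reposts": 0},
    {"time": datetime(2016, 5, 1, 6, 0), "source": "iphone",
     "is_repost": False, "origin_reposts": 0},
    {"time": datetime(2016, 5, 1, 12, 0), "source": "web",
     "is_repost": True, "origin_reposts": 250},
]
features = activity_features(posts)
print(features)  # (0.25, 2, 1, 1.0)
```

The threshold `m` defaults to 100 to match this embodiment; the claims leave it as a parameter.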
Applying the feature extraction above to the user with ID 5364402211 gives the feature vector x = (38, 182, 0.21, 0.19, 0.0078, 0.494, 0.118, 9.0, 5.0, 1.0).
When training the actual model, the features used can be adjusted as required. Reducing the feature set lowers recognition accuracy to some degree but correspondingly improves system performance. On the other hand, if recall matters more than overall accuracy, a combination of attribute features and content features can be used to obtain higher recall.
Build the training set according to Step 4:
The training set contains the data used to train the water-army detection model, organized by user: each user corresponds to one training record. A training record is a two-tuple <X, L>, where X is a feature vector:
X = (X1, X2, …, Xd),
The feature vector is the input to the classification model; Xi is the i-th feature value of the user, i ∈ {1, 2, …, d}, and d is the number of features. L is the user's label, "yes" or "no", indicating whether or not the user is a water-army account.
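One way to represent the two-tuple <X, L> in code is sketched below. The `Sample` type, the `to_arrays` helper, and the two-feature toy vectors are hypothetical and only illustrate the structure.

```python
from typing import List, NamedTuple

class Sample(NamedTuple):
    x: List[float]   # feature vector X = (X1, ..., Xd)
    label: str       # "yes" = water-army account, "no" = normal account

def to_arrays(samples):
    """Split samples into a feature matrix and a 0/1 label list."""
    X = [list(s.x) for s in samples]
    y = [1 if s.label == "yes" else 0 for s in samples]
    return X, y

# Toy two-feature samples (made-up values, not real accounts).
samples = [Sample([0.9, 0.1], "yes"), Sample([0.1, 2.0], "no")]
X, y = to_arrays(samples)
print(X, y)  # [[0.9, 0.1], [0.1, 2.0]] [1, 0]
```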
Train the water-army detection model according to Step 5, as shown in Fig. 3:
Identifying suspected water-army accounts is treated as a classification problem: the training set built in Step 4 is used to train a classification model, and the model is saved for later prediction. This embodiment uses the logistic regression algorithm (LogisticRegression) as the classification model. Given n features x = (x1, x2, …, xn), let the conditional probability p(y=1|x) be the probability that the observation y occurs given the factors x. Expressed with the sigmoid function, it is:

$$p(y=1 \mid x) = \frac{1}{1 + e^{-g(x)}}$$

where g(x) = w0 + w1x1 + … + wnxn. The probability that y does not occur given x is:

$$p(y=0 \mid x) = 1 - p(y=1 \mid x) = \frac{1}{1 + e^{g(x)}}$$

The task in this part is to learn the weights w = (w0, w1, …, wn).
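Learning the weights can be sketched with plain batch gradient descent on the logistic loss. The patent does not state which optimizer is used, so gradient descent, the learning rate, the epoch count, and the two-feature toy data are all assumptions for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """p(y=1|x) with g(x) = w[0] + w[1]*x[0] + ... + w[n]*x[n-1]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit w = (w0, w1, ..., wn) by batch gradient descent."""
    n = len(X[0])
    w = [0.0] * (n + 1)              # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Toy data (made up): [repost ratio, follower-to-following ratio].
X = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05],
     [0.2, 1.5], [0.1, 2.0], [0.3, 1.2]]
y = [1, 1, 1, 0, 0, 0]

w = train_logreg(X, y)
print([round(predict_proba(w, xi), 3) for xi in X])
```

In practice an off-the-shelf implementation with regularization and feature scaling would usually be preferable to this hand-written loop.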
According to Step 6, predict labels for unlabeled data to identify water-army accounts:
To check whether a user is a water-army account, the microblog ID supplied by the user of the method is first used to collect the relevant microblog data and extract features following Steps 1 to 3; the extracted features are then applied to the trained detection model, which outputs the prediction of whether the user is a water-army account. Here, the feature vector x extracted in Step 3 is fed to the model trained in Step 5; if p(y=1|x) > p(y=0|x), the user with ID 5364402211 is judged to be a water-army account, and otherwise a normal account.
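Assuming the standard sigmoid form p(y=1|x) = 1/(1 + e^{-g(x)}) for the logistic regression model, the decision rule reduces to a sign test on g(x), which may make the comparison easier to implement:

```latex
\begin{aligned}
p(y=1 \mid x) > p(y=0 \mid x)
&\iff \frac{1}{1+e^{-g(x)}} > \frac{e^{-g(x)}}{1+e^{-g(x)}} \\
&\iff 1 > e^{-g(x)} \\
&\iff g(x) = w_0 + w_1 x_1 + \dots + w_n x_n > 0 .
\end{aligned}
```

Equivalently, the account is flagged when p(y=1|x) > 0.5.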
To check whether an event involves a water army, the same per-user detection flow is applied to every microblog user in the event; if any user is detected as a water-army user, the event is deemed to involve a water army.
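The event-level check can be sketched as a loop over the per-user prediction. The weights `W`, the two-feature vectors, and the function names are hypothetical; any trained model exposing a probability could be passed in as `predict_fn`.

```python
import math

# Hypothetical trained weights, intercept first (illustration only).
W = [-1.0, 3.0, -2.0]

def predict(x):
    """p(y=1|x) under the hypothetical weights W."""
    g = W[0] + sum(w * v for w, v in zip(W[1:], x))
    return 1.0 / (1.0 + math.exp(-g))

def event_contains_water_army(user_features, predict_fn, threshold=0.5):
    """Flag the event if any participating user is predicted as water army.

    user_features: dict mapping user id to that user's feature vector.
    Returns (event flag, list of flagged user ids).
    """
    flagged = [uid for uid, x in user_features.items()
               if predict_fn(x) > threshold]
    return bool(flagged), flagged

event_users = {"u1": [0.9, 0.1], "u2": [0.1, 2.0]}
has_water_army, flagged = event_contains_water_army(event_users, predict)
print(has_water_army, flagged)  # True ['u1']
```

Returning the flagged IDs alongside the event flag keeps the per-user results available for follow-up analysis.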
The method has been applied successfully to water-army detection over microblog content at the scale of millions of items, with a recognition accuracy above 80%. The invention has also been applied by state security departments to public-opinion monitoring on social networks, where it has been effective at detecting water-army accounts on the microblog platform: it has detected a large number of microblog water-army accounts and contributed to maintaining order on the internet.
The specific embodiment above is given to illustrate the content and implementation of the invention. The details in the embodiment are not intended to limit the scope of the claims but to aid understanding of the method. Those skilled in the art will understand that various modifications, changes, or substitutions to the steps of the preferred embodiment are possible without departing from the spirit and scope of the invention and its appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and the drawings.
Claims (4)
1. A method for discovering suspected water-army accounts on microblogs, characterized by comprising the following steps:
Step 1: Collect the relevant microblog data and obtain the following information: the text of microblogs posted by users; the text of comments written by users; users' interactions on the microblog platform, including comments, reposts, and likes; and users' basic attributes, including follower count, following count, and following relations;
Step 2: Preprocess the sample data obtained in Step 1 as follows: first clean the data, then perform Chinese word segmentation on the microblog text, and finally parse the data by hierarchical relationship to obtain the user-to-microblog-text mapping and the user-to-comment-text mapping, while retaining the user-following, user-follower, and user-repost relation data;
Step 3: Extract user features from the data preprocessed in Step 2: for every user in the microblog data, extract the features "follower count" and "following count"; then compute from the user's extracted microblog content the indirect features "follower-to-following ratio", "original microblog ratio", "repost ratio", "average number of @ mentions per microblog", "posting frequency", "access-method count over all microblogs", "access-method count over reposts", and "whether the user has taken part in reposting a microblog reposted more than m times";
Step 4: Build the training set: if the user of the method does not supply a training set, collect users labeled in advance as water army or non-water army and extract their features to build the training set; if a training set is supplied, use the supplied labeled data as the training set;
Step 5: Train the water-army detection model: train the classification model on the labeled feature data from Step 4;
Step 6: Use the trained water-army detection model to identify water-army users, as follows: the user of the method adds the microblog users to be predicted; if only a user's ID or nickname is available, first collect that user's microblog data and compute the features following Steps 1 to 3, then apply the features to the detection model to obtain a prediction.
2. The method for discovering suspected water-army accounts on microblogs according to claim 1, characterized in that: during user feature extraction, the feature set used in training can be adjusted to the identification requirements, and the complete feature set described in Step 2 need not be used.
3. The method for discovering suspected water-army accounts on microblogs according to claim 1, characterized in that: the classification model is the logistic regression algorithm (LogisticRegression); that is, given n features x = (x1, x2, …, xn), let the conditional probability p(y=1|x) be the probability that the observation y occurs given the factors x, expressed with the sigmoid function as:

$$p(y=1 \mid x) = \frac{1}{1 + e^{-g(x)}}$$

where g(x) = w0 + w1x1 + … + wnxn, w0 is the intercept, and w1, …, wn are the weights of features 1 to n; the probability that y does not occur given x is:

$$p(y=0 \mid x) = 1 - p(y=1 \mid x) = \frac{1}{1 + e^{g(x)}}$$
4. The method for discovering suspected water-army accounts on microblogs according to any one of claims 1 to 3, characterized in that: applying the water-army user identification process of Step 6 to every user in an event in turn determines whether the event involves a water army.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610371264 | 2016-05-30 | ||
CN2016103712647 | 2016-05-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106940732A true CN106940732A (en) | 2017-07-11 |
Family
ID=59463575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710212983.9A Pending CN106940732A (en) | 2016-05-30 | 2017-04-01 | A kind of doubtful waterborne troops towards microblogging finds method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106940732A (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451923A (en) * | 2017-07-14 | 2017-12-08 | 北京航空航天大学 | A kind of online social networks rumour Forecasting Methodology based on forwarding Analytic Network Process |
CN107659647A (en) * | 2017-09-26 | 2018-02-02 | 精硕科技(北京)股份有限公司 | The recognition methods of water note and device |
CN107832413A (en) * | 2017-11-07 | 2018-03-23 | 电子科技大学 | A kind of detection method of microblogging inactive users |
CN107895010A (en) * | 2017-11-13 | 2018-04-10 | 华东师范大学 | A kind of method that detection network navy is thumbed up based on network |
CN108763574A (en) * | 2018-06-06 | 2018-11-06 | 电子科技大学 | A kind of microblogging rumour detection algorithm based on gradient boosted tree detects characteristic set with rumour |
CN108921587A (en) * | 2018-05-24 | 2018-11-30 | 腾讯科技(深圳)有限公司 | A kind of data processing method, device and server |
CN109558555A (en) * | 2018-08-20 | 2019-04-02 | 湖北大学 | Microblog water army detection method and detection system based on artificial immunity danger theory |
CN109559245A (en) * | 2017-09-26 | 2019-04-02 | 北京国双科技有限公司 | A kind of method and device identifying specific user |
CN109783586A (en) * | 2019-01-21 | 2019-05-21 | 福州大学 | Waterborne troops's comment detection system and method based on cluster resampling |
CN110032859A (en) * | 2018-12-25 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Abnormal account's discrimination method and device and medium |
CN110110079A (en) * | 2019-03-21 | 2019-08-09 | 中国人民解放军战略支援部队信息工程大学 | A kind of social networks junk user detection method |
CN110457558A (en) * | 2019-07-31 | 2019-11-15 | 沃民高新科技(北京)股份有限公司 | The recognition methods and device of network navy, storage medium and processor |
CN110727763A (en) * | 2019-10-09 | 2020-01-24 | 南京邮电大学 | Method for identifying special ethnic group in social media propagation |
CN110727861A (en) * | 2019-09-23 | 2020-01-24 | 上海蜜度信息技术有限公司 | Method and equipment for microblog water army identification |
CN110956210A (en) * | 2019-11-29 | 2020-04-03 | 重庆邮电大学 | Semi-supervised network water force identification method and system based on AP clustering |
CN111198992A (en) * | 2020-01-07 | 2020-05-26 | 精硕科技(北京)股份有限公司 | Identification method and identification device for mother and infant crowd, electronic equipment and storage medium |
CN111259962A (en) * | 2020-01-17 | 2020-06-09 | 中南大学 | Sybil account detection method for time sequence social data |
CN112597309A (en) * | 2020-12-25 | 2021-04-02 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Detection system for identifying microblog data stream of sudden event in real time |
CN112800304A (en) * | 2021-01-08 | 2021-05-14 | 上海海事大学 | Microblog water army group detection method based on clustering |
CN112906383A (en) * | 2021-02-05 | 2021-06-04 | 成都信息工程大学 | Integrated adaptive water army identification method based on incremental learning |
CN113837512A (en) * | 2020-06-23 | 2021-12-24 | 中国移动通信集团辽宁有限公司 | Abnormal user identification method and device |
CN115840844A (en) * | 2022-12-17 | 2023-03-24 | 深圳市新联鑫网络科技有限公司 | Internet platform user behavior analysis system based on big data |
CN116150507A (en) * | 2023-04-04 | 2023-05-23 | 湖南蚁坊软件股份有限公司 | Water army group identification method, device, equipment and medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077240A (en) * | 2013-01-10 | 2013-05-01 | 北京工商大学 | Microblog water army identifying method based on probabilistic graphical model |
CN104915397A (en) * | 2015-05-28 | 2015-09-16 | 国家计算机网络与信息安全管理中心 | Method and device for predicting microblog propagation tendencies |
Non-Patent Citations (1)
Title |
---|
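Sun Weiqiang: "Research on Internet Water Army Identification Based on Deep Belief Networks", China Master's Theses Full-text Database, Information Science and Technology * |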
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451923A (en) * | 2017-07-14 | 2017-12-08 | 北京航空航天大学 | A kind of online social networks rumour Forecasting Methodology based on forwarding Analytic Network Process |
CN107659647A (en) * | 2017-09-26 | 2018-02-02 | 精硕科技(北京)股份有限公司 | The recognition methods of water note and device |
CN109559245A (en) * | 2017-09-26 | 2019-04-02 | 北京国双科技有限公司 | A kind of method and device identifying specific user |
CN109559245B (en) * | 2017-09-26 | 2022-02-25 | 北京国双科技有限公司 | Method and device for identifying specific user |
CN107832413A (en) * | 2017-11-07 | 2018-03-23 | 电子科技大学 | A kind of detection method of microblogging inactive users |
CN107895010A (en) * | 2017-11-13 | 2018-04-10 | 华东师范大学 | A kind of method that detection network navy is thumbed up based on network |
CN108921587A (en) * | 2018-05-24 | 2018-11-30 | 腾讯科技(深圳)有限公司 | A kind of data processing method, device and server |
CN108763574A (en) * | 2018-06-06 | 2018-11-06 | 电子科技大学 | A kind of microblogging rumour detection algorithm based on gradient boosted tree detects characteristic set with rumour |
CN109558555A (en) * | 2018-08-20 | 2019-04-02 | 湖北大学 | Microblog water army detection method and detection system based on artificial immunity danger theory |
CN110032859A (en) * | 2018-12-25 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Abnormal account's discrimination method and device and medium |
CN109783586A (en) * | 2019-01-21 | 2019-05-21 | 福州大学 | Waterborne troops's comment detection system and method based on cluster resampling |
CN109783586B (en) * | 2019-01-21 | 2022-10-21 | 福州大学 | Water army comment detection method based on clustering resampling |
CN110110079A (en) * | 2019-03-21 | 2019-08-09 | 中国人民解放军战略支援部队信息工程大学 | A kind of social networks junk user detection method |
CN110110079B (en) * | 2019-03-21 | 2021-06-08 | 中国人民解放军战略支援部队信息工程大学 | Social network spam user detection method |
CN110457558A (en) * | 2019-07-31 | 2019-11-15 | 沃民高新科技(北京)股份有限公司 | The recognition methods and device of network navy, storage medium and processor |
CN110727861A (en) * | 2019-09-23 | 2020-01-24 | 上海蜜度信息技术有限公司 | Method and equipment for microblog water army identification |
CN110727763A (en) * | 2019-10-09 | 2020-01-24 | 南京邮电大学 | Method for identifying special ethnic group in social media propagation |
CN110727763B (en) * | 2019-10-09 | 2022-10-14 | 南京邮电大学 | Method for identifying special ethnic group in social media propagation |
CN110956210A (en) * | 2019-11-29 | 2020-04-03 | 重庆邮电大学 | Semi-supervised network water force identification method and system based on AP clustering |
CN110956210B (en) * | 2019-11-29 | 2023-03-28 | 重庆邮电大学 | Semi-supervised network water force identification method and system based on AP clustering |
CN111198992A (en) * | 2020-01-07 | 2020-05-26 | 精硕科技(北京)股份有限公司 | Identification method and identification device for mother and infant crowd, electronic equipment and storage medium |
CN111259962A (en) * | 2020-01-17 | 2020-06-09 | 中南大学 | Sybil account detection method for time sequence social data |
CN113837512A (en) * | 2020-06-23 | 2021-12-24 | 中国移动通信集团辽宁有限公司 | Abnormal user identification method and device |
CN112597309A (en) * | 2020-12-25 | 2021-04-02 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Detection system for identifying microblog data stream of sudden event in real time |
CN112800304A (en) * | 2021-01-08 | 2021-05-14 | 上海海事大学 | Microblog water army group detection method based on clustering |
CN112906383A (en) * | 2021-02-05 | 2021-06-04 | 成都信息工程大学 | Integrated adaptive water army identification method based on incremental learning |
CN115840844A (en) * | 2022-12-17 | 2023-03-24 | 深圳市新联鑫网络科技有限公司 | Internet platform user behavior analysis system based on big data |
CN115840844B (en) * | 2022-12-17 | 2023-08-15 | 深圳市新联鑫网络科技有限公司 | Internet platform user behavior analysis system based on big data |
CN116150507A (en) * | 2023-04-04 | 2023-05-23 | 湖南蚁坊软件股份有限公司 | Water army group identification method, device, equipment and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106940732A (en) | A kind of doubtful waterborne troops towards microblogging finds method | |
CN106980692B (en) | Influence calculation method based on microblog specific events | |
CN104615608B (en) | A kind of data mining processing system and method | |
Lee et al. | Uncovering social spammers: social honeypots+ machine learning | |
CN106886518A (en) | A kind of method of microblog account classification | |
CN106354845A (en) | Microblog rumor recognizing method and system based on propagation structures | |
CN105045857A (en) | Social network rumor recognition method and system | |
CN103793503A (en) | Opinion mining and classification method based on web texts | |
Al-Zoubi et al. | Spam profiles detection on social networks using computational intelligence methods: the effect of the lingual context | |
CN110457404A (en) | Social media account-classification method based on complex heterogeneous network | |
CN107291886A (en) | A kind of microblog topic detecting method and system based on incremental clustering algorithm | |
CN110990683B (en) | Microblog rumor integrated identification method and device based on region and emotional characteristics | |
CN105893484A (en) | Microblog Spammer recognition method based on text characteristics and behavior characteristics | |
CN106681989A (en) | Method for predicting microblog forwarding probability | |
Zulfiker et al. | Analyzing the public sentiment on COVID-19 vaccination in social media: Bangladesh context | |
Yang et al. | Comparison and modelling of country-level microblog user and activity in cyber-physical-social systems using Weibo and Twitter data | |
Chen et al. | The best answers? think twice: online detection of commercial campaigns in the CQA forums | |
Wei et al. | A new evaluation algorithm for the influence of user in social network | |
Cheng et al. | ISC: An iterative social based classifier for adult account detection on twitter | |
Lin et al. | Finding the key users in Facebook fan pages via a clustering approach | |
Zadeh et al. | Mining social network for semantic advertisement | |
Xianlei et al. | Finding domain experts in microblogs | |
Yin et al. | Research of integrated algorithm establishment of a spam detection system | |
Altinel et al. | Identifying topic-based opinion leaders in social networks by content and user information | |
Zheng et al. | A study on microblog classification based on information publicness |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170711 |