CN110428291A - Method for identifying black-market gangs using a directed acyclic graph - Google Patents

Publication number
CN110428291A
Authority
CN
China
Prior art keywords
group
clique
directed acyclic
acyclic graph
characteristic value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910726773.0A
Other languages
Chinese (zh)
Inventor
陈曦
魏国富
辜乘风
钟丹阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Data Security Solutions Co Ltd
Priority to CN201910726773.0A
Publication of CN110428291A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for identifying black-market gangs using a directed acyclic graph, addressing the shortcomings of existing black-market gang identification. The method comprises three steps: step 1, find invitation groups using a graph database; step 2, extract the behavioral feature values of each invitation group; step 3, classify the behavioral feature values of the invitation groups using a machine learning classification method. The design uses the Gini coefficient to describe how evenly child nodes are distributed within a group, describes the dynamic attributes of the graph through the growth pattern of the group, and finally identifies black-market behavior. Because the invention takes a group-level view rather than the single-user view of existing models, it can uncover more problems and find more abnormal accounts that exhibit black-market behavior.

Description

Method for identifying black-market gangs using a directed acyclic graph
Technical field
The present invention relates to the field of brand promotion, and specifically to a method for identifying black-market gangs using a directed acyclic graph.
Background
Brand promotion is the series of activities through which an enterprise shapes the image of itself, its products and its services so that they are widely accepted by consumers; its main purpose is to raise brand awareness. Brand promotion is usually carried out when a new website or a new enterprise is launched. When doing brand promotion, a merchant usually offers merchandise incentives or discounts. When people exploit these offers for profit, "wool pulling" events (abuse of promotional offers) and the "wool party" (the people who carry them out) arise. When wool pulling reaches a certain scale, wool-pulling gangs form. User invitation is one way a merchant does brand promotion: users earn incentives by inviting new users, and the merchant gains new users in return. A new user can be invited to register only once; after registering successfully, the user cannot be invited again by anyone else. This relationship can be represented as a directed acyclic graph. The user identifier here is usually a mobile phone number.
To maximize profit, black-market operators usually pull wool in the way that is most efficient and hardest to detect. Identifying the corresponding black-market behavior with a traditional database requires a large number of join operations, which is very expensive; a typical database already struggles with a three-level join and cannot be used in practice, so a graph database is needed. At present, most security products identify black-market activity by profiling the behavior of individual users or by matching directly against a blacklist. Common approaches are matching against a list library and applying statistical rules. List matching depends entirely on the list, so it places very high demands on list quality; but lists usually lag behind, so this kind of matching cannot recognize phone numbers that appear after the list was last updated, and it cannot identify gangs. Statistical rules have the following drawbacks: rules are easily bypassed, because when black-market operators discover that their wool-pulling behavior has been detected they try a different approach, and once the new approach succeeds the rule has been bypassed; writing rules requires engineers with a thorough understanding of the business, which is difficult; rules become hard to maintain as their number grows; and rules cannot identify gangs either. Research in this area is ongoing.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method for identifying black-market gangs using a directed acyclic graph, so as to solve the problems raised in the background above.
To achieve this purpose, the embodiments of the present invention provide the following technical solution:
A method for identifying black-market gangs using a directed acyclic graph, comprising the following steps:
Step 1: find invitation groups using a graph database;
Step 2: extract the behavioral feature values of each invitation group;
Step 3: classify the behavioral feature values of the invitation groups using a machine learning classification method.
As a further aspect of the embodiments of the present invention: invitation groups are found using the graph database as follows: identify each initial account that has not been invited but has invited others, then search downward level by level from that initial account until no further accounts are found. This yields all accounts involved in wool-pulling behavior and the groups they form.
As a further aspect of the embodiments of the present invention: the behavioral feature values of an invitation group in step 2 include the depth of the group, the device sharing rate within the group, statistics of the number of child nodes of each parent node in the group, and the growth pattern of the group.
As a further aspect of the embodiments of the present invention: the depth of a group is the number of layers traversed from the initial account to the farthest invited account; a group deeper than 6 layers is usually regarded as a black-market gang. The device sharing rate within a group is the number of nodes in the group divided by the number of devices used by the group: $\text{device sharing rate} = \dfrac{\text{number of nodes in the group}}{\text{number of devices used by the group}}$. A black-market gang has a high device sharing rate.
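The two features just described can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the `children` adjacency map and the `device_of` map are hypothetical, and the sharing rate follows the definition given in the embodiment (nodes in the group divided by devices used by the group).

```python
from collections import deque

def group_depth(root, children):
    """Number of invitation layers from the root to the farthest invited account."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:  # an account can be invited only once
                depth[child] = depth[node] + 1
                queue.append(child)
    return max(depth.values())

def device_sharing_rate(accounts, device_of):
    """Accounts in the group divided by the distinct devices they use."""
    devices = {device_of[a] for a in accounts}
    return len(accounts) / len(devices)

# Hypothetical toy group: A invited B and C, B invited D, D invited E.
children = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
device_of = {"A": "imei-1", "B": "imei-1", "C": "imei-2", "D": "imei-1", "E": "imei-2"}

depth = group_depth("A", children)
rate = device_sharing_rate(list(device_of), device_of)
```

In this toy group five accounts share two devices, so the sharing rate is well above 1, which is the kind of signal the text associates with black-market gangs.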
As a further aspect of the embodiments of the present invention: the growth pattern of a group includes the parent-node change rate and the change rate of the invitation-layer depth. For the parent-node change rate, the inviters are sorted by time; if two adjacent inviters differ, the pair is recorded as 1, and if the inviter does not change, it is recorded as 0 (the formula image is not reproduced in this text). The distance from a registrant to the ancestor node is called the invitation-layer depth; if the depths of two adjacent entries differ, the pair is recorded as 1, and if the depths are the same, it is recorded as 0 (the formula image is not reproduced in this text).
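The 0/1 marking described above can be illustrated with a short Python sketch. The source formula is not available, so the normalization used here (the share of adjacent pairs that differ) is an assumption, as are the function name and the sample data.

```python
def change_rate(values):
    """Share of adjacent pairs whose values differ.

    Each adjacent pair is marked 1 if the two values differ and 0 otherwise.
    Dividing by the number of pairs is an assumed normalization; the source
    formula is not available.
    """
    if len(values) < 2:
        return 0.0
    flags = [1 if a != b else 0 for a, b in zip(values, values[1:])]
    return sum(flags) / len(flags)

# Hypothetical registrations sorted by time: (inviter, invitation-layer depth).
registrations = [("A", 1), ("A", 1), ("B", 2), ("C", 3), ("C", 3)]

inviter_rate = change_rate([inviter for inviter, _ in registrations])
depth_rate = change_rate([layer for _, layer in registrations])
```

A group grown by a script tends to produce very regular sequences (for example, a new inviter on every registration), which pushes these rates toward 0 or 1.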
As a further aspect of the embodiments of the present invention: step 3 proceeds as follows: label the behavioral feature values of the invitation groups, feed the labeled feature values into a machine learning classification model to obtain the trained model parameters, and load the trained model parameters into the production environment.
As a further aspect of the embodiments of the present invention: the machine learning classification model is a random forest model, a mature technique that performs well in practice.
Compared with the prior art, the beneficial effects of the embodiments of the present invention are:
The design uses the Gini coefficient to describe how evenly child nodes are distributed within a group, describes the dynamic attributes of the graph through the growth pattern of the group, and finally identifies black-market behavior;
The invention takes a group-level view and can therefore uncover more problems: compared with existing single-user models, it can find more abnormal accounts that exhibit black-market behavior.
Brief description of the drawings
Fig. 1 is a schematic diagram of the strategy used for redeeming prizes, as referred to in the background, in the method for identifying black-market gangs using a directed acyclic graph.
Fig. 2 is a schematic diagram of the data import operation in the method for identifying black-market gangs using a directed acyclic graph.
Fig. 3 is a schematic diagram of the device sharing rate within a group in the method for identifying black-market gangs using a directed acyclic graph.
Fig. 4 is a schematic diagram of machine behavior in which the invitation layers within a group are too deep, in the method for identifying black-market gangs using a directed acyclic graph.
Fig. 5 is a schematic diagram of a group in which all beneficiaries are other users, in the method for identifying black-market gangs using a directed acyclic graph.
Fig. 6 is a schematic diagram showing the presence of machine behavior in the method for identifying black-market gangs using a directed acyclic graph.
Detailed description of the embodiments
The technical solution of this patent is explained in further detail below with reference to specific embodiments.
Embodiment 1
The raw data comprises a points redemption table and an invitation registration table. The points redemption table records events in which a redeemer redeems a prize for a beneficiary; the invitation registration table records events in which an inviter invites an invitee to register. The points redemption table is shown in Table 1 and the invitation registration table in Table 2.
Table 1
Field name                        Field
Order number                      order_id
Prize name                        bonus_name
Inventory name                    inventory_name
Item code                         inventory_id
Beneficiary mobile number         recvr_phone
Beneficiary device unique ID      recvr_imei
Beneficiary IP                    recvr_ip
Beneficiary registration time     recvr_reg_time
Redeemer mobile number            sendr_phone
Redeemer device unique ID         sendr_imei
Redeemer IP                       sendr_ip
Activity name                     event_name
Activity number                   event_id
Redemption date                   order_date
Table 2
The node information is as follows: Account corresponds to the beneficiary mobile number and redeemer mobile number in the points redemption table and to the inviter mobile number and invitee mobile number in the invitation registration table; IMEI corresponds to the unique device identifier of an Account; OneDay corresponds to the invitation date, registration date and redemption date; BonusOrder corresponds to the order number; BonusName corresponds to the prize name; the remaining information is the IP address.
The edge information is as follows: the most important edges are the invitation relationship, the prize relationship and the device-binding relationship. To simplify statistics, the prize relationship is split into two edges, one from the redeemer to the prize order and one from the prize order to the beneficiary. Account-(invite)->Account corresponds to the invitation relationship; Account-(send_bonus)->BonusOrder corresponds to the prize-sending relationship; BonusOrder-(recv_bonus)->Account corresponds to the prize-receiving relationship; Account-(use_imei)->IMEI corresponds to the relationship between an account and a device.
Importing node information: import the mobile numbers in both tables into Account nodes; import the IMEI information in both tables into IMEI nodes; import the remaining information into the corresponding nodes.
Importing edge information: relationships that appear within a single row of a data table are imported into the graph database as edges. The sendr_phone and recvr_phone of every row of the invitation registration table are imported as Account-(invite)->Account edges; the sendr_phone and order_id of every row of the points redemption table are imported as Account-(send_bonus)->BonusOrder edges; the order_id and recvr_phone of every row of the points redemption table are imported as BonusOrder-(recv_bonus)->Account edges; the mobile number and corresponding unique device identifier in every row of both tables are imported as Account-(use_imei)->IMEI edges; and the remaining edge information is imported into the corresponding edges.
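As a rough illustration of the edge import, the following Python sketch turns table rows into edge sets keyed by relationship name. It uses plain dictionaries in place of a graph database; the function name is hypothetical, and the presence of `sendr_imei` and `recvr_imei` columns in the invitation registration table is an assumption, since Table 2 is not shown.

```python
def build_edges(invite_rows, exchange_rows):
    """Collect (source, target) pairs for each relationship type."""
    edges = {"invite": set(), "send_bonus": set(), "recv_bonus": set(), "use_imei": set()}
    for row in invite_rows:
        # Account-(invite)->Account
        edges["invite"].add((row["sendr_phone"], row["recvr_phone"]))
    for row in exchange_rows:
        # Account-(send_bonus)->BonusOrder and BonusOrder-(recv_bonus)->Account
        edges["send_bonus"].add((row["sendr_phone"], row["order_id"]))
        edges["recv_bonus"].add((row["order_id"], row["recvr_phone"]))
    for row in list(invite_rows) + list(exchange_rows):
        # Account-(use_imei)->IMEI, for whichever phone/device pairs a row carries
        for role in ("sendr", "recvr"):
            phone, imei = row.get(f"{role}_phone"), row.get(f"{role}_imei")
            if phone and imei:
                edges["use_imei"].add((phone, imei))
    return edges

# Hypothetical sample rows.
invite_rows = [
    {"sendr_phone": "p1", "recvr_phone": "p2", "sendr_imei": "i1", "recvr_imei": "i1"},
]
exchange_rows = [
    {"order_id": "o1", "sendr_phone": "p2", "recvr_phone": "p1",
     "sendr_imei": "i1", "recvr_imei": "i1"},
]
edges = build_edges(invite_rows, exchange_rows)
```

Using sets means that a relationship repeated across many rows is stored once, which mirrors how a graph database would merge duplicate edges on import.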
Feature extraction. Obtaining the invitation groups: first obtain the start nodes by searching the whole graph database for Account nodes whose out-degree is greater than 1 and whose in-degree equals 0. Then obtain the group information by propagating each start node's information along the Account-(invite)->Account directed acyclic graph until no new Account node can be found; aggregating by start node information yields the groups.
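The group discovery step can be sketched without a graph database as a breadth-first traversal over the invitation edges. This is an illustrative sketch only: the function name and the `min_out_degree` parameter are hypothetical, with the default of 2 standing in for "out-degree greater than 1".

```python
from collections import defaultdict, deque

def find_groups(invite_edges, min_out_degree=2):
    """Map each start node to the accounts reachable from it via invite edges."""
    children = defaultdict(list)
    invited = set()
    for inviter, invitee in invite_edges:
        children[inviter].append(invitee)
        invited.add(invitee)

    # Start nodes: never invited (in-degree 0) and out-degree greater than 1.
    roots = [a for a in children
             if a not in invited and len(children[a]) >= min_out_degree]

    groups = {}
    for root in roots:
        members, seen, queue = [root], {root}, deque([root])
        while queue:
            node = queue.popleft()
            for child in children.get(node, []):
                if child not in seen:
                    seen.add(child)
                    members.append(child)
                    queue.append(child)
        groups[root] = members
    return groups

# Hypothetical invitation edges: X invited only one user, so X is not a start node.
invite_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("X", "Y")]
groups = find_groups(invite_edges)
```

Because each account can be invited only once, every account belongs to at most one group, so the traversal never has to merge groups.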
Group features:
Depth of the group: the maximum distance from the start node to any child node in the group.
Device sharing rate within the group: the number of nodes in the group divided by the number of devices used by the group.
Number of parent nodes in the group: the number of nodes in the group whose out-degree is greater than 0.
Gini coefficient of the set of child-node counts: the child-node counts of the nodes in the group whose out-degree is greater than 0 are substituted into the Gini coefficient formula, in its standard form $G = \dfrac{\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|}{2n\sum_{i=1}^{n} x_i}$, where $G$ is the Gini coefficient, $x_i$ is the i-th value in the set of child-node counts, $x_j$ is the j-th value in the set, and $n$ is the size of the set.
Parent-node change rate: the inviters in the group are sorted by time; if two adjacent inviters differ, the pair is recorded as 1, and if the inviter does not change, it is recorded as 0 (the formula image is not reproduced in this text).
Change rate of the invitation-layer depth: the distance from a registrant in the group to the ancestor node is called the invitation-layer depth; if the depths of two adjacent entries differ, the pair is recorded as 1, and if the depths are the same, it is recorded as 0 (the formula image is not reproduced in this text).
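The parent-node count and the Gini coefficient of the child-node counts can be sketched as follows. The sketch assumes the standard mean-absolute-difference form of the Gini coefficient, which matches the symbols $x_i$, $x_j$ and $n$ defined in the text; the function name and sample data are hypothetical.

```python
def gini(values):
    """Gini coefficient: sum of |x_i - x_j| over all pairs, divided by 2 * n * sum(x)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    diff_sum = sum(abs(xi - xj) for xi in values for xj in values)
    return diff_sum / (2 * n * total)

# Hypothetical group: A invited three accounts, B invited one, C invited nobody.
children = {"A": ["B", "C", "D"], "B": ["E"], "C": []}

# Only nodes with out-degree greater than 0 count as parent nodes.
child_counts = [len(kids) for kids in children.values() if kids]
parent_count = len(child_counts)
g = gini(child_counts)
```

A value of 0 means every parent invited the same number of accounts, which is unusually uniform for organic invitation behavior; larger values mean invitations are concentrated in a few parents.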
Model training: label each group produced in the process, marking wool-party groups as 1 and normal users as 0. Train a random forest classification model on the extracted features and labels. In the standard weighted-neighborhood form, the prediction is $\hat{y} = \dfrac{1}{m}\sum_{j=1}^{m}\sum_{i=1}^{n} W_j(x_i, x')\,y_i$, where $\hat{y}$ is the predicted probability that the group is a wool party; $W_j(x_i, x')$ is the non-negative weight of the i-th training point relative to the new data point $x'$ in the j-th tree, equal to $1/k'$ if $x_i$ and $x'$ belong to the same leaf node containing $k'$ points and 0 otherwise; $m$ is a random forest hyperparameter, the number of trees; $x_i$ and $y_i$ are the i-th sample in the training data; and $n$ is the size of the data set.
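The weighted-neighborhood reading of the random forest prediction can be made concrete with a small Python sketch. It assumes the weight is $1/k'$ for training points that share a leaf with the new point and 0 otherwise, and it takes leaf assignments as given rather than training any trees; the function name and the toy leaf assignments are hypothetical.

```python
def forest_predict(leaf_ids_train, leaf_ids_new, y_train):
    """Average, over trees, of the mean label in the leaf that the new point falls into.

    leaf_ids_train[j][i] is the leaf of training point i in tree j;
    leaf_ids_new[j] is the leaf of the new point in tree j.
    """
    m = len(leaf_ids_train)
    total = 0.0
    for j in range(m):
        same_leaf = [i for i, leaf in enumerate(leaf_ids_train[j])
                     if leaf == leaf_ids_new[j]]
        k = len(same_leaf)
        if k:  # each of the k points in the leaf gets weight 1/k
            total += sum(y_train[i] for i in same_leaf) / k
    return total / m

# Hypothetical forest of two trees over four labeled groups (1 = wool party).
y_train = [1, 1, 0, 0]
leaf_ids_train = [[0, 0, 1, 1],   # tree 0
                  [0, 1, 1, 1]]   # tree 1
leaf_ids_new = [0, 1]             # leaves reached by the new group
probability = forest_predict(leaf_ids_train, leaf_ids_new, y_train)
```

In practice a library implementation would build the trees and compute this average internally; the sketch only shows what the predicted probability means in terms of the training labels.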
Model deployment: export the trained model as a PMML file, which records the structure and all parameters of the model; load the PMML file online to identify wool parties, that is, to detect abnormal groups.
In each subplot of Figs. 3-6, the x-axis is the sequence number of a registrant within the group (the x-th user to register, ordered by time) and the y-axis is the shortest distance from that user to the start node. Normal behavior should look random; the obvious regularity in the figures indicates machine behavior.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection. No reference sign in the claims shall be construed as limiting the claim concerned.
In addition, although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way only for clarity; those skilled in the art should treat the specification as a whole, and the technical solutions of the various embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims (7)

1. A method for identifying black-market gangs using a directed acyclic graph, characterized by comprising the following steps:
Step 1: find invitation groups using a graph database;
Step 2: extract the behavioral feature values of each invitation group;
Step 3: classify the behavioral feature values of the invitation groups using a machine learning classification method.
2. The method for identifying black-market gangs using a directed acyclic graph according to claim 1, characterized in that invitation groups are found using the graph database in step 1 as follows: identify an initial account that has not been invited but has invited others, and search downward level by level from the initial account until no further accounts are found.
3. The method for identifying black-market gangs using a directed acyclic graph according to claim 1, characterized in that the behavioral feature values of an invitation group in step 2 include the depth of the group, the device sharing rate within the group, statistics of the number of child nodes of each parent node in the group, and the growth pattern of the group.
4. The method for identifying black-market gangs using a directed acyclic graph according to claim 3, characterized in that the depth of the group is the number of layers traversed from the initial account to the farthest invited account, and the device sharing rate within the group is given by: $\text{device sharing rate} = \dfrac{\text{number of nodes in the group}}{\text{number of devices used by the group}}$.
5. The method for identifying black-market gangs using a directed acyclic graph according to claim 3 or 4, characterized in that the growth pattern of the group includes the parent-node change rate and the change rate of the invitation-layer depth; the inviters are sorted by time, and if two adjacent inviters differ, the pair is recorded as 1, and if the inviter does not change, it is recorded as 0 (the formula image is not reproduced in this text); the distance from a registrant to the ancestor node is called the invitation-layer depth, and if the depths of two adjacent entries differ, the pair is recorded as 1, and if the depths are the same, it is recorded as 0 (the formula image is not reproduced in this text).
6. The method for identifying black-market gangs using a directed acyclic graph according to claim 1, characterized in that step 3 proceeds as follows: label the behavioral feature values of the invitation groups, feed the labeled feature values into a machine learning classification model to obtain the trained model parameters, and load the trained model parameters into the production environment.
7. The method for identifying black-market gangs using a directed acyclic graph according to claim 6, characterized in that the machine learning classification model is a random forest model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910726773.0A CN110428291A (en) 2019-08-07 2019-08-07 Method for identifying black-market gangs using a directed acyclic graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910726773.0A CN110428291A (en) 2019-08-07 2019-08-07 Method for identifying black-market gangs using a directed acyclic graph

Publications (1)

Publication Number Publication Date
CN110428291A (en) 2019-11-08

Family

ID=68414670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910726773.0A Method for identifying black-market gangs using a directed acyclic graph 2019-08-07 2019-08-07 Pending CN110428291A (en)

Country Status (1)

Country Link
CN (1) CN110428291A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106850346A (en) * 2017-01-23 2017-06-13 北京京东金融科技控股有限公司 Change and assist in identifying method, device and the electronic equipment of blacklist for monitor node
CN108038700A (en) * 2017-12-22 2018-05-15 上海前隆信息科技有限公司 A kind of anti-fraud data analysing method and system
CN109064318A (en) * 2018-08-24 2018-12-21 苏宁消费金融有限公司 A kind of internet financial risks monitoring system of knowledge based map
CN109299811A (en) * 2018-08-20 2019-02-01 众安在线财产保险股份有限公司 A method of the identification of fraud clique and Risk of Communication prediction based on complex network
CN109409918A (en) * 2018-08-24 2019-03-01 深圳壹账通智能科技有限公司 The recognition methods of wool party, device, equipment and storage medium based on user behavior
CN110032583A (en) * 2019-03-12 2019-07-19 平安科技(深圳)有限公司 A kind of recognition methods of fraud clique, device, readable storage medium storing program for executing and terminal device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984695A (en) * 2020-07-21 2020-11-24 微梦创科网络科技(中国)有限公司 Method and system for determining black grouping based on Spark
CN111984695B (en) * 2020-07-21 2024-02-20 微梦创科网络科技(中国)有限公司 Method and system for determining black clusters based on Spark
CN112184334A (en) * 2020-10-27 2021-01-05 北京嘀嘀无限科技发展有限公司 Method, apparatus, device and medium for determining problem users
CN114596111A (en) * 2022-03-03 2022-06-07 浙江吉利控股集团有限公司 Risk identification model generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191108)