CN110428291A - A method for identifying black-market (heichan) gangs using a directed acyclic graph
- Publication number
- CN110428291A (application number CN201910726773.0A)
- Authority
- CN
- China
- Prior art keywords
- group
- clique
- directed acyclic
- acyclic graph
- characteristic value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Data Mining & Analysis (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Biophysics (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for identifying black-market (heichan) gangs using a directed acyclic graph, addressing the shortcomings of existing black-market gang identification. The method has three steps: step 1, find invitation groups using a graph database; step 2, extract the behavioural feature values of each invitation group; step 3, classify the behavioural feature values of the invitation groups with a machine learning classification method. The Gini coefficient describes how evenly child nodes are distributed within a group, the group's growth pattern describes the dynamic properties of the graph, and together these features are used to identify black-market behaviour. Because the method takes a group-level view, it can uncover more problems: compared with existing single-user models, it finds more abnormal accounts that exhibit black-market behaviour.
Description
Technical field
The present invention relates to the field of brand promotion, and specifically to a method for identifying black-market gangs using a directed acyclic graph.
Background art
Brand promotion is the set of activities by which an enterprise shapes the image of itself, its products, and its services so that they are widely accepted by consumers; its main purpose is to raise brand awareness. New websites and new enterprises usually run brand promotion when they launch, and merchants typically offer product incentives or preferential policies as part of it. When people exploit these offers to make money, "wool pulling" events and "wool parties" arise, and when wool pulling reaches a certain scale, wool-pulling gangs form. User invitation is one way merchants promote a brand: a user earns an incentive by inviting new users, and the merchant gains new users in return. A new user can be invited to register only once, that is, after registering successfully the user cannot be invited by anyone else, so the invitation relationship can be represented as a directed acyclic graph. Users are usually identified by mobile phone number.
To maximize profit, black-market operators usually pull wool in whatever way is most efficient and least likely to be discovered. Identifying this behaviour with a traditional database requires a large number of join operations, which is very expensive: a general-purpose database already struggles with three levels of joins, so the approach is impractical and a graph database is needed. At present, most security products identify black-market activity either by profiling the behaviour of a single user or by direct blacklist matching. Common methods are matching against a list library and applying statistical rules. List matching depends entirely on the list, so it demands very high list quality; lists also lag behind reality, so this kind of matching cannot recognize phone numbers that appear after the list was last updated, and it cannot identify gangs. Statistical rules have the following shortcomings: rules are easily bypassed, because when black-market operators find that their wool pulling is being detected they try a different approach, and if the new approach succeeds the rule has been bypassed; writing rules requires engineers who understand the business thoroughly, which is difficult; when there are many rules they are hard to maintain; and rules cannot identify gangs. Research in this area is ongoing.
Summary of the invention
The embodiments of the present invention aim to provide a method for identifying black-market gangs using a directed acyclic graph, so as to solve the problems raised in the background art above.
To achieve this, the embodiments of the present invention provide the following technical solution:
A method for identifying black-market gangs using a directed acyclic graph, with the following steps:
Step 1: find invitation groups using a graph database;
Step 2: extract the behavioural feature values of each invitation group;
Step 3: classify the behavioural feature values of the invitation groups using a machine learning classification method.
As a further embodiment of the present invention, invitation groups are found with the graph database as follows: identify the initial accounts, which were not invited by anyone but have invited others, and search downward from each initial account level by level until there are no further accounts below. This yields all accounts involved in wool pulling and the groups they form.
As a further embodiment of the present invention, the behavioural feature values of an invitation group in step 2 include the depth of the group, the device sharing rate in the group, statistics on the number of child nodes of each parent node in the group, and the growth pattern of the group.
As a further embodiment of the present invention, the depth of a group is the number of layers from the initial account to the farthest invited account; a group deeper than 6 layers is usually classed as a black-market gang. The device sharing rate in a group is the number of accounts in the group divided by the number of devices the group uses:
$$\text{device sharing rate} = \frac{\text{number of accounts in the group}}{\text{number of devices used by the group}}$$
Black-market gangs have a high device sharing rate.
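A minimal sketch of these two features, assuming a group is already available as a mapping from account to its distance from the initial account, plus a mapping from account to the set of device identifiers it has used. The function names, argument names, and sample data are hypothetical; the sharing rate follows the embodiment's wording (group node count divided by the number of devices the group uses).

```python
from typing import Dict, Set


def group_depth(depth_by_account: Dict[str, int]) -> int:
    """Number of layers from the initial account to the farthest invited account."""
    return max(depth_by_account.values())


def device_sharing_rate(devices_by_account: Dict[str, Set[str]]) -> float:
    """Accounts in the group divided by distinct devices used by the group."""
    all_devices = set().union(*devices_by_account.values())
    return len(devices_by_account) / len(all_devices)


# Hypothetical group: A invited B and C, B invited D; three accounts share one device.
depths = {"A": 0, "B": 1, "C": 1, "D": 2}
devices = {"A": {"imei-1"}, "B": {"imei-1"}, "C": {"imei-1"}, "D": {"imei-2"}}

depth = group_depth(depths)
rate = device_sharing_rate(devices)
```

A depth above the 6-layer threshold mentioned above, or a sharing rate well above 1, would be a signal worth passing to the classifier rather than a verdict on its own.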
As a further embodiment of the present invention, the growth pattern of a group comprises the parent-node change rate and the change rate of the invitation layer depth. For the parent-node change rate, the inviters are sorted by time and each adjacent pair is compared: if the inviter changes, the pair is recorded as 1; if the inviter does not change, it is recorded as 0 [formula missing from the source text]. The distance from a registrant to the ancestor node is called the invitation layer depth: if the depths of two adjacent inviters differ, the pair is recorded as 1; if the depths are the same, it is recorded as 0 [formula missing from the source text].
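The 0/1 indicators described above can be sketched as follows. The text here does not give the normalisation, so dividing by the number of adjacent pairs is an assumption of this sketch, not the patent's formula; `change_rate` and the sample sequences are hypothetical. The same function serves both features: pass the time-ordered inviters for the parent-node change rate, or the time-ordered invitation layer depths for the depth change rate.

```python
from typing import Hashable, Sequence


def change_rate(sequence: Sequence[Hashable]) -> float:
    """Share of adjacent pairs whose values differ.

    Each adjacent pair contributes 1 if the value changes and 0 otherwise.
    ASSUMPTION: the sum is divided by the number of adjacent pairs.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a != b) / len(pairs)


# Registrations sorted by time (hypothetical data).
inviters = ["A", "A", "B", "B", "C"]   # inviter of each registration
depths = [1, 1, 2, 2, 3]               # invitation layer depth of each registration

parent_change_rate = change_rate(inviters)   # indicators 0, 1, 0, 1
depth_change_rate = change_rate(depths)      # indicators 0, 1, 0, 1
```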
As a further embodiment of the present invention, step 3 proceeds as follows: label the behavioural feature values of the invitation groups, feed the labelled feature values into a machine learning classification model to obtain the trained model parameters, and load the trained model parameters into the production environment.
As a further embodiment of the present invention, the machine learning classification model is a random forest model, a mature technique that works well in practice.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
The design is sound: the Gini coefficient describes how evenly child nodes are distributed within a group, the group's growth pattern describes the dynamic properties of the graph, and together they identify black-market behaviour;
The invention takes a group-level view and can therefore uncover more problems: compared with existing single-user models, it finds more abnormal accounts that exhibit black-market behaviour.
Description of the drawings
Fig. 1 is a schematic diagram of the prize-redemption strategy referred to in the background art of the method for identifying black-market gangs using a directed acyclic graph.
Fig. 2 is a schematic diagram of the data import operation in the method.
Fig. 3 is a schematic diagram of the device sharing rate within a group in the method.
Fig. 4 is a schematic diagram of machine behaviour in which the invitation depth within a group is too large.
Fig. 5 is a schematic diagram of a group in which the beneficiaries are all other users.
Fig. 6 is a schematic diagram showing the presence of machine behaviour.
Detailed description
The technical solution of this patent is explained in further detail below with reference to a specific embodiment.
Embodiment 1
The raw data comprise a points redemption table and an invitation registration table. The points redemption table records events in which a redeemer redeems a prize for a beneficiary; the invitation registration table records events in which an inviter invites an invitee to register. The points redemption table is shown in Table 1 and the invitation registration table in Table 2.
Table 1
Field name | Field |
---|---|
Order number | order_id |
Prize name | bonus_name |
Inventory name | inventory_name |
Product code | inventory_id |
Beneficiary phone number | recvr_phone |
Beneficiary device unique identifier | recvr_imei |
Beneficiary IP | recvr_ip |
Beneficiary registration time | recvr_reg_time |
Redeemer phone number | sendr_phone |
Redeemer device unique identifier | sendr_imei |
Redeemer IP | sendr_ip |
Event name | event_name |
Event ID | event_id |
Redemption date | order_date |
Table 2
The node information is as follows: Account corresponds to the beneficiary phone number, redeemer phone number, inviter phone number, and invitee phone number in the two tables; IMEI corresponds to the unique device identifier of an Account; OneDay corresponds to the invitation date, registration date, and redemption date; BonusOrder corresponds to the order number; BonusName corresponds to the prize name; IP covers the other information.
The edge information is as follows: the most important edges are the invitation relationship, the prize relationship, and the device binding relationship. To simplify the statistics, the prize relationship is split into two edges: redeemer to prize order, and prize order to beneficiary. Account-(invite)->Account corresponds to the invitation relationship; Account-(send_bonus)->BonusOrder corresponds to the prize-sending relationship; BonusOrder-(recv_bonus)->Account corresponds to the prize-receiving relationship; Account-(use_imei)->IMEI corresponds to the relationship between an account and a device.
Importing node information: import the phone numbers in the two tables into Account nodes; import the IMEI information in the two tables into IMEI nodes; import the remaining information into the corresponding nodes.
Importing edge information: each relationship that appears within a row of a data table is imported into the graph database as an edge. For every row of the invitation registration table, the link between sendr_phone and recvr_phone is imported as an Account-(invite)->Account edge. For every row of the points redemption table, the relationship between sendr_phone and order_id is imported as an Account-(send_bonus)->BonusOrder edge, and the relationship between order_id and recvr_phone is imported as a BonusOrder-(recv_bonus)->Account edge. For every row of both tables, each phone number and its corresponding unique device identifier are imported as an Account-(use_imei)->IMEI edge. Other edge information is imported into the corresponding edges in the same way.
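The row-to-edge mapping described above can be sketched in plain Python before any graph database is involved. This is an illustration under assumptions: rows are modelled as dictionaries keyed by the field names of Table 1, the `(source, label, target)` tuple layout and the function names are hypothetical, and no particular graph database API is implied. Device edges are shown only for the points redemption table, because the fields of Table 2 beyond sendr_phone and recvr_phone are not listed in this text.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source node, edge label, target node)


def invite_edges(rows: Iterable[Dict[str, str]]) -> List[Edge]:
    """One Account-(invite)->Account edge per invitation registration row."""
    return [(r["sendr_phone"], "invite", r["recvr_phone"]) for r in rows]


def redemption_edges(rows: Iterable[Dict[str, str]]) -> List[Edge]:
    """Prize and device edges for each points redemption row."""
    edges: List[Edge] = []
    for r in rows:
        # The prize relationship is split in two: redeemer -> order -> beneficiary.
        edges.append((r["sendr_phone"], "send_bonus", r["order_id"]))
        edges.append((r["order_id"], "recv_bonus", r["recvr_phone"]))
        # Each phone number is bound to the device it used.
        edges.append((r["sendr_phone"], "use_imei", r["sendr_imei"]))
        edges.append((r["recvr_phone"], "use_imei", r["recvr_imei"]))
    return edges


# Hypothetical sample rows.
invitations = [{"sendr_phone": "p1", "recvr_phone": "p2"}]
redemptions = [{
    "order_id": "o1", "sendr_phone": "p1", "sendr_imei": "i1",
    "recvr_phone": "p2", "recvr_imei": "i1",
}]

all_edges = invite_edges(invitations) + redemption_edges(redemptions)
```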
Feature extraction. Obtaining the invitation groups: first obtain the start nodes by searching the whole graph database for Account nodes whose out-degree is greater than 1 and whose in-degree equals 0. Then obtain the group information: propagate each start node's information along the Account-(invite)->Account directed acyclic graph until no new Account node can be found, and aggregate by start node to obtain the group information.
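The group search above amounts to a breadth-first traversal from each start node. The following is a minimal in-memory sketch, assuming the invite edges are available as `(inviter, invitee)` pairs; a real deployment would run the equivalent query in the graph database. The function name `find_groups` and the sample edges are hypothetical. The start-node filter follows the text as given (in-degree 0, out-degree greater than 1).

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def find_groups(invites: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Map each start node to {account: distance from the start node}."""
    children: Dict[str, List[str]] = defaultdict(list)
    invited = set()
    for inviter, invitee in invites:
        children[inviter].append(invitee)
        invited.add(invitee)

    # Start nodes: never invited (in-degree 0) and out-degree greater than 1.
    roots = [a for a in children if a not in invited and len(children[a]) > 1]

    groups: Dict[str, Dict[str, int]] = {}
    for root in roots:
        depth = {root: 0}
        queue = deque([root])
        while queue:  # stop when no new Account node can be found
            node = queue.popleft()
            for child in children.get(node, []):
                if child not in depth:
                    depth[child] = depth[node] + 1
                    queue.append(child)
        groups[root] = depth
    return groups


# Hypothetical edges: A invites B and C, B invites D; X invites only Y.
groups = find_groups([("A", "B"), ("A", "C"), ("B", "D"), ("X", "Y")])
```

Because each account can be invited only once, every account reached from a start node belongs to exactly one group, so the per-root dictionaries do not overlap.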
Extracting group features:
Depth of the group: the maximum distance from the start node to any child node in the group.
Device sharing rate in the group: the number of nodes in the group divided by the number of devices the group uses.
Number of parent nodes in the group: the number of nodes in the group whose out-degree is greater than 0.
Gini coefficient of the group's child-node counts: the set of child-node counts of the nodes in the group whose out-degree is greater than 0 is fed into the Gini coefficient formula
$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert x_i - x_j\rvert}{2n\sum_{i=1}^{n}x_i}$$
where $G$ is the Gini coefficient, $x_i$ and $x_j$ are the i-th and j-th values in the set of child-node counts, and $n$ is the size of the set.
Parent-node change rate: the inviters in the group are sorted by time; if two adjacent inviters differ, the pair is recorded as 1, and if the inviter does not change, it is recorded as 0 [formula missing from the source text].
Change rate of the invitation layer depth: the distance from a registrant in the group to the ancestor node is called the invitation layer depth; if the depths of two adjacent inviters differ, the pair is recorded as 1, and if the depths are the same, it is recorded as 0 [formula missing from the source text].
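The Gini coefficient of the child-node counts can be sketched as below, using the mean-absolute-difference form of the Gini coefficient with the symbols named above ($x_i$, $x_j$, $n$). The function name `gini` and the handling of empty or all-zero input are assumptions of this sketch.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: sum of |x_i - x_j| over all ordered pairs / (2 * n * sum of x_i)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # assumption: treat an empty or all-zero set as perfectly even
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * total)


# Child-node counts of the parent nodes in two hypothetical groups.
even_group = [2, 2, 2]        # every parent invited the same number of users
skewed_group = [0, 0, 0, 4]   # illustrative extreme: one entry holds all the invitations

g_even = gini(even_group)
g_skewed = gini(skewed_group)
```

A value near 0 means invitations are spread evenly across parent nodes; a value approaching 1 means a few nodes account for most invitations.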
Model training: in production, each group is labelled: wool-party groups are labelled 1 and groups of normal users are labelled 0. A random forest classification model is trained on the extracted features and labels:
$$\hat{y} = \frac{1}{m}\sum_{j=1}^{m}\sum_{i=1}^{n} W_j(x_i, x')\,y_i$$
where $\hat{y}$ is the predicted probability that the group is a wool party; $W_j(x_i, x')$ is the non-negative weight of the i-th training point relative to the new data point $x'$ in the j-th tree: if $x_i$ and $x'$ fall in the same leaf node containing $k'$ points, $W_j(x_i, x') = \frac{1}{k'}$, otherwise it is 0; $m$ is the random forest hyperparameter giving the number of trees; $x_i$ and $y_i$ are the i-th sample in the training data; and $n$ is the size of the data set.
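To make the prediction rule concrete, the sketch below evaluates it for an already-trained forest, where each tree is reduced to the leaf index it assigns to every training point and to the new point $x'$. It only illustrates the weighting described above (weight 1/k' for training points sharing a leaf with $x'$, averaged over the m trees); it does not train a forest, and the function name, argument layout, and sample data are hypothetical.

```python
from typing import List, Sequence


def forest_predict(train_leaves: Sequence[Sequence[int]],
                   new_leaves: Sequence[int],
                   labels: Sequence[int]) -> float:
    """Average, over trees, of the mean label in the leaf that the new point falls into.

    train_leaves[j][i] is the leaf index of training point i in tree j;
    new_leaves[j] is the leaf index of the new point x' in tree j.
    """
    m = len(train_leaves)
    total = 0.0
    for leaves, new_leaf in zip(train_leaves, new_leaves):
        same_leaf: List[int] = [i for i, leaf in enumerate(leaves) if leaf == new_leaf]
        k = len(same_leaf)
        if k:
            # Each point in the shared leaf gets weight 1/k; all others get 0.
            total += sum(labels[i] for i in same_leaf) / k
    return total / m


# Four labelled groups (1 = wool party, 0 = normal) and two hypothetical trees.
labels = [1, 1, 0, 0]
train_leaves = [[0, 0, 1, 1],   # tree 0
                [0, 1, 1, 1]]   # tree 1
new_leaves = [0, 1]             # leaf reached by the new group in each tree

y_hat = forest_predict(train_leaves, new_leaves, labels)
```

In this example tree 0 contributes 1.0 and tree 1 contributes 1/3, so the predicted probability is their average, 2/3.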
Model deployment: the trained model is exported as a PMML file, which records the structure and all parameters of the model. The PMML file is loaded online to identify wool parties, that is, to recognize abnormal groups.
In Figs. 3-6, the x-axis of each subplot is the sequence number of a registrant in the group (the x-th user to register after sorting by time), and the y-axis is the shortest distance from that user to the start node. Normal behaviour should look random; an obvious regular pattern in a plot indicates machine behaviour.
The above is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection. No reference sign in the claims should be construed as limiting the claim concerned.
In addition, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way merely for clarity; those skilled in the art should take the specification as a whole, and the technical solutions of the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.
Claims (7)
1. A method for identifying black-market gangs using a directed acyclic graph, characterized by the following steps:
Step 1: find invitation groups using a graph database;
Step 2: extract the behavioural feature values of each invitation group;
Step 3: classify the behavioural feature values of the invitation groups using a machine learning classification method.
2. The method for identifying black-market gangs using a directed acyclic graph according to claim 1, characterized in that in step 1 invitation groups are found using the graph database as follows: identify the initial accounts that were not invited but have invited others, and search downward from each initial account level by level until there are no further accounts below.
3. The method for identifying black-market gangs using a directed acyclic graph according to claim 1, characterized in that the behavioural feature values of an invitation group in step 2 include the depth of the group, the device sharing rate in the group, statistics on the number of child nodes of each parent node in the group, and the growth pattern of the group.
4. The method for identifying black-market gangs using a directed acyclic graph according to claim 3, characterized in that the depth of the group is the number of layers from the initial account to the farthest invited account, and the device sharing rate in the group is given by the following formula:
$$\text{device sharing rate} = \frac{\text{number of accounts in the group}}{\text{number of devices used by the group}}$$
5. The method for identifying black-market gangs using a directed acyclic graph according to claim 3 or 4, characterized in that the growth pattern of the group comprises the parent-node change rate and the change rate of the invitation layer depth; the inviters are sorted by time, and if two adjacent inviters differ, the pair is recorded as 1, while if the inviter does not change, it is recorded as 0 [formula missing from the source text]; the distance from a registrant to the ancestor node is called the invitation layer depth, and if the depths of two adjacent inviters differ, the pair is recorded as 1, while if the depths are the same, it is recorded as 0 [formula missing from the source text].
6. The method for identifying black-market gangs using a directed acyclic graph according to claim 1, characterized in that step 3 proceeds as follows: label the behavioural feature values of the invitation groups, feed the labelled feature values into a machine learning classification model to obtain the trained model parameters, and load the trained model parameters into the production environment.
7. The method for identifying black-market gangs using a directed acyclic graph according to claim 6, characterized in that the machine learning classification model is a random forest model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910726773.0A CN110428291A (en) | 2019-08-07 | 2019-08-07 | A method of Hei Chan clique is identified using directed acyclic graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110428291A | 2019-11-08 |
Family
ID=68414670
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191108 |