CN110070364A - Method, apparatus, and storage medium for detecting clique fraud based on a graph model - Google Patents

Method, apparatus, and storage medium for detecting clique fraud based on a graph model

Info

Publication number
CN110070364A
Authority
CN
China
Legal status
Pending
Application number
CN201910239821.3A
Other languages
Chinese (zh)
Inventor
黄剑飞
陈振
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201910239821.3A
Publication of CN110070364A
Priority to PCT/CN2019/124807 (WO2020192184A1)
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Abstract

This disclosure relates to a method, an apparatus, and a storage medium for detecting clique fraud based on a graph model, which address the technical problem that clique fraud is difficult to identify in the related art. The method includes: acquiring user base data and historical suspicious-user data; generating a user association graph from the acquired data, where the nodes of the user association graph are user association subgraphs generated from data features and the edge weights of the user association graph include the similarity between nodes; generating candidate clique sets from the user association graph using a community detection algorithm; calculating the suspicion degree of the candidate clique sets; and outputting a determination result for the candidate cliques according to the calculation result.

Description

Method, apparatus, and storage medium for detecting clique fraud based on a graph model
Technical field
This disclosure relates to the field of network technology and, in particular, to a method, an apparatus, and a storage medium for detecting clique fraud based on a graph model.
Background
The financial sector places high demands on transaction risk control, because it must guarantee the security of fund transactions. In practice, fraud can occur: for example, a fraudster induces many ordinary consumers to transfer money to them but never gives those consumers the promised return, and profits in this way. To identify such fraud, high-risk fraudsters must be found so that measures can be taken to minimize consumers' financial losses. A transaction model can be used to identify fraudsters: for example, a payment account is classified as a fraudster account, and the fund transactions carried out by that account are classified as risky transactions.
Summary of the invention
The disclosure provides a method, an apparatus, and a storage medium for detecting clique fraud based on a graph model, to solve the technical problem in the related art that clique fraud is difficult to identify.
To achieve the above object, a first aspect of the embodiments of the present disclosure provides a method for detecting clique fraud based on a graph model, the method comprising:
Acquiring user base data and historical suspicious-user data;
Generating a user association graph from the acquired data, wherein the nodes of the user association graph are user association subgraphs generated from data features, and the edge weights of the user association graph include the similarity between nodes;
Generating candidate clique sets from the user association graph using a community detection algorithm;
Calculating the suspicion degree of the candidate clique sets;
Outputting a determination result for the candidate cliques according to the calculation result.
Optionally, generating the user association graph comprises:
Selecting feature combinations, and the number of groups, from the user base data and the historical suspicious-user data;
Generating user association subgraphs by feature-consistency equivalence or fuzzy equivalence, and splicing them, with the user association subgraphs as nodes, into an unweighted user association graph;
Using the similarity between nodes in the unweighted user association graph as edge weights to regenerate a similarity-weighted user association graph.
Optionally, generating the candidate clique sets using the community detection algorithm comprises:
Generating n clique sets from the similarity-weighted user association graph using the community detection algorithm, where n is a positive integer;
Confirming that the number of users in each clique set is less than or equal to a maximum threshold;
Confirming that the number of clique sets whose number of users is less than a minimum threshold is less than or equal to a preset threshold;
Determining the clique sets as candidate clique sets.
Optionally, the method further comprises:
For any clique set whose number of users is greater than the maximum threshold, invoking the community detection algorithm to divide it so that the number of users in each resulting clique set is less than or equal to the maximum threshold;
If the number of clique sets whose number of users is less than the minimum threshold is greater than the preset threshold, invoking a hierarchical clustering algorithm to agglomerate the clique sets whose number of users is less than the minimum threshold.
Optionally, the community detection algorithm includes a graph label propagation algorithm or the GN (Girvan-Newman) algorithm; the hierarchical clustering algorithm includes an agglomerative algorithm or a divisive algorithm.
Optionally, calculating the suspicion score of the candidate clique set comprises:
Selecting a target data feature from the data features, where the difference between the distribution of the target data feature in the candidate clique set and its distribution in the overall data exceeds a target threshold;
Calculating the suspicion score of the candidate clique set according to the proportion of the target data feature in the candidate clique set.
Optionally, calculating the suspicion score of the candidate clique set comprises:
Extracting the clique features of each candidate clique set;
Inputting the clique features into a trained regression model, so that the regression model outputs the suspicion score of the candidate clique set.
Optionally, calculating the suspicion score of the candidate clique set comprises:
Selecting a target data feature from the data features, where the difference between the distribution of the target data feature in the candidate clique set and its distribution in the overall data exceeds a target threshold;
Calculating a first suspicion score of the candidate clique set according to the proportion of the target data feature in the candidate clique set;
Extracting the clique features of each candidate clique set;
Inputting the clique features into a trained regression model, so that the regression model outputs a second suspicion score of the candidate clique set;
Calculating a comprehensive suspicion score of the candidate clique set according to the first suspicion score and the second suspicion score.
A second aspect of the embodiments of the present disclosure provides an apparatus for detecting clique fraud based on a graph model, the apparatus comprising:
An acquisition module, configured to acquire user base data and historical suspicious-user data;
A first generation module, configured to generate a user association graph from the acquired data, wherein the nodes of the user association graph are user association subgraphs generated from data features, and the edge weights of the user association graph include the similarity between nodes;
A second generation module, configured to generate candidate clique sets from the user association graph using a community detection algorithm;
A computing module, configured to calculate the suspicion degree of the candidate clique sets;
An output module, configured to output a determination result for the candidate cliques according to the calculation result.
Optionally, the first generation module comprises:
A first selection submodule, configured to select feature combinations, and the number of groups, from the user base data and the historical suspicious-user data;
A first generation submodule, configured to generate user association subgraphs by feature-consistency equivalence or fuzzy equivalence and to splice them, with the user association subgraphs as nodes, into an unweighted user association graph;
A second generation submodule, configured to regenerate a similarity-weighted user association graph using the similarity between nodes in the unweighted user association graph as edge weights.
Optionally, the second generation module comprises:
A third generation submodule, configured to generate n clique sets from the similarity-weighted user association graph using the community detection algorithm, where n is a positive integer;
A first confirmation submodule, configured to confirm that the number of users in each clique set is less than or equal to a maximum threshold;
A second confirmation submodule, configured to confirm that the number of clique sets whose number of users is less than a minimum threshold is less than or equal to a preset threshold;
A third confirmation submodule, configured to determine the clique sets as candidate clique sets.
Optionally, the apparatus further comprises:
A division module, configured to invoke the community detection algorithm to divide any clique set whose number of users is greater than the maximum threshold, so that the number of users in each resulting clique set is less than or equal to the maximum threshold;
An agglomeration module, configured to invoke a hierarchical clustering algorithm to agglomerate the clique sets whose number of users is less than the minimum threshold, if the number of such clique sets is greater than the preset threshold.
Optionally, the community detection algorithm includes a graph label propagation algorithm or the GN (Girvan-Newman) algorithm; the hierarchical clustering algorithm includes an agglomerative algorithm or a divisive algorithm.
Optionally, the computing module comprises:
A second selection submodule, configured to select a target data feature from the data features, where the difference between the distribution of the target data feature in the candidate clique set and its distribution in the overall data exceeds a target threshold;
A first computing submodule, configured to calculate the suspicion score of the candidate clique set according to the proportion of the target data feature in the candidate clique set.
Optionally, the computing module comprises:
A first extraction submodule, configured to extract the clique features of each candidate clique set;
A first input submodule, configured to input the clique features into a trained regression model, so that the regression model outputs the suspicion score of the candidate clique set.
Optionally, the computing module comprises:
A third selection submodule, configured to select a target data feature from the data features, where the difference between the distribution of the target data feature in the candidate clique set and its distribution in the overall data exceeds a target threshold;
A second computing submodule, configured to calculate a first suspicion score of the candidate clique set according to the proportion of the target data feature in the candidate clique set;
A second extraction submodule, configured to extract the clique features of each candidate clique set;
A second input submodule, configured to input the clique features into a trained regression model, so that the regression model outputs a second suspicion score of the candidate clique set;
A third computing submodule, configured to calculate a comprehensive suspicion score of the candidate clique set according to the first suspicion score and the second suspicion score.
A third aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any method of the first aspect.
A fourth aspect of the embodiments of the present disclosure provides an apparatus for detecting clique fraud based on a graph model, comprising:
A memory storing a computer program; and
A processor, configured to execute the computer program in the memory to implement the steps of any method of the first aspect.
The above technical solutions achieve at least the following technical effects:
The disclosure generates a user association graph from the acquired user data and generates candidate clique sets using a community detection algorithm. By calculating the suspicion degree of each candidate clique set, it can determine whether that set is a fraud clique, which solves the technical problem in the related art that clique fraud is difficult to identify. In addition, by combining the community detection algorithm with a hierarchical clustering algorithm, the disclosure addresses two problems in clique-partitioning results: some cliques are too large, and there are too many small cliques. Furthermore, the disclosure improves the data-processing capacity of the graph model through similarity indexing, and it generates the similarity-weighted user association graph by assembling subgraphs with configurable similarity edge weights. This approach is more flexible and parallelizable, which further improves large-scale data processing in fraud scenarios.
Other features and advantages of the disclosure are described in detail in the detailed description of the embodiments that follows.
Description of the drawings
The accompanying drawings provide a further understanding of the disclosure and constitute part of the specification. Together with the following detailed description, they serve to explain the disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 2 is a flowchart of the step of generating the user association graph in a method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 3 is a flowchart of the step of generating the candidate clique sets in a method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 4 is a flowchart of the step of calculating the suspicion score in a method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 5 is a flowchart of the step of calculating the suspicion score in another method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 6 is a flowchart of the step of calculating the suspicion score in yet another method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 7 is a block diagram of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 8 is a block diagram of the first generation module of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 9 is a block diagram of the second generation module of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 10 is a block diagram of another apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 11 is a block diagram of the computing module of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 12 is a block diagram of the computing module of another apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 13 is a block diagram of the computing module of yet another apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 14 is a block diagram of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Detailed description of the embodiments
Specific embodiments of the disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to describe and explain the disclosure and are not intended to limit it.
Because attacks are everywhere, fraud detection has become critically important. Our investigation found that financial fraud detection in the related art mainly uses the following approaches, each of which has shortcomings:
Methods based on black/white lists or reputation-database lookups: these require regular maintenance to add new list entries or reputation data. Such maintenance is relatively costly (for example, paid data purchased from third parties), and the responsiveness and coverage of these methods are limited.
Methods based on a rule engine: online financial fraud tactics change constantly, and once fraudsters change their tactics, rule-engine methods often fail, so substantial operational and financial resources must be spent updating the rule engine.
Methods based on supervised machine learning: supervised machine learning is the most widely used learning approach in fraud detection. Models such as decision trees, random forests, support vector machines (Support Vector Machine), and naive Bayes perform complex computations over hundreds of variables (a high-dimensional space) to pinpoint fraudulent behavior. However, supervised methods depend on labeled data, and in financial fraud scenarios labeled data are hard to obtain and positive and negative samples are highly imbalanced (positive samples are only labeled after a fraud has occurred, and because fraud tactics change so often, samples are scarce and labeling is difficult). Without enough fraud labels, the capability of supervised machine learning is limited.
Methods based on unsupervised learning: unsupervised learning is a branch of exploratory research in fraud detection, mainly based on clustering and graph methods. Current unsupervised techniques are immature and difficult, and there is no ready-made solution that applies unsupervised machine learning effectively to fraud detection. The main difficulties include handling large-scale data and quantifying suspicion.
Fig. 1 is a flowchart of a method for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure, which addresses the technical problem in the related art that clique fraud is difficult to identify. As shown in Fig. 1, the method includes:
S11: Acquire user base data and historical suspicious-user data.
S12: Generate a user association graph from the acquired data, where the nodes of the user association graph are user association subgraphs generated from data features, and the edge weights of the user association graph include the similarity between nodes.
S13: Generate candidate clique sets from the user association graph using a community detection algorithm.
S14: Calculate the suspicion degree of the candidate clique sets.
S15: Output a determination result for the candidate cliques according to the calculation result.
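Steps S11 to S15 form a simple pipeline. The following Python sketch only illustrates how the stages hand data to one another; every callable is a hypothetical stand-in supplied by the caller, and the decision cut-off `threshold` is an assumed parameter rather than a value taken from the disclosure.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # account -> {neighbour account: edge weight}
Clique = Set[str]                    # a candidate clique is a set of account IDs


def detect_fraud_cliques(
    acquire: Callable[[], dict],                 # S11: user base data + historical suspicious-user data
    build_graph: Callable[[dict], Graph],        # S12: similarity-weighted user association graph
    partition: Callable[[Graph], List[Clique]],  # S13: community detection -> candidate cliques
    score: Callable[[Clique, dict], float],      # S14: suspicion score of one candidate clique
    threshold: float,                            # S15: decision cut-off (hypothetical parameter)
) -> List[Tuple[List[str], float, bool]]:
    """Run S11-S15 and return (members, suspicion score, judged_fraud) per candidate."""
    data = acquire()
    graph = build_graph(data)
    results = []
    for clique in partition(graph):
        s = score(clique, data)
        results.append((sorted(clique), s, s > threshold))
    return results


# Usage with trivial stand-ins for each stage:
res = detect_fraud_cliques(
    acquire=lambda: {"suspects": {"a"}},
    build_graph=lambda d: {"a": {"b": 1.0}, "b": {"a": 1.0}, "c": {}},
    partition=lambda g: [{"a", "b"}, {"c"}],
    score=lambda c, d: len(c & d["suspects"]) / len(c),
    threshold=0.4,
)
print(res)  # [(['a', 'b'], 0.5, True), (['c'], 0.0, False)]
```

Passing the stages in as functions keeps each one swappable, which matches the text's point that several community detection and scoring methods are admissible.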
In step S11, the user data may be data from the various client accounts a user applies for, for example the user data submitted when applying for a Meituan account, an Alipay account, or a WeChat account; the account may also be a bank card the user applies for, such as a debit card or a credit card. The user data may also be data about users who pay through a payment platform, for example users who pay with Meituan, with Alipay, or with WeChat. The user base data include the application form filled in by the applicant, People's Bank of China credit-report query information, mobile-terminal behavioral data, and e-commerce and social data authorized by the applicant. The historical suspicious-user data may include black/white-list information, where list entries may be any entity type in the network, such as accounts, addresses, or telephone numbers. Blacklists include frauds accumulated within the institution, seriously overdue accounts, or blacklists exchanged with other institutions; whitelists include the telephone numbers, addresses, and similar data of VIP customers or of entries manually marked as risk-free.
After the user base data and historical suspicious-user data are acquired, step S12 is performed: a user association graph is generated from the acquired data, where the nodes of the user association graph are user association subgraphs generated from data features, and the edge weights of the user association graph include the similarity between nodes.
Referring to Fig. 2, generating the user association graph from the acquired data may include the following steps:
S121: Select feature combinations, and the number of groups, from the user base data and the historical suspicious-user data.
S122: Generate user association subgraphs by feature-consistency equivalence or fuzzy equivalence, and splice them, with the user association subgraphs as nodes, into an unweighted user association graph.
S123: Use the similarity between nodes in the unweighted user association graph as edge weights to regenerate a similarity-weighted user association graph.
In step S121, the features in the data may include the device ID, IP address, IMSI (International Mobile Subscriber Identity), IMEI (International Mobile Equipment Identity), geographic information, login time, and so on. Each feature combination consists of at least one feature selected from these features, and there is at least one group.
After the feature combinations and the number of groups are selected, accounts are linked through the different feature combinations by feature-consistency equivalence or fuzzy equivalence to form user association subgraphs. For example, if different accounts log in with the same device ID, they can be linked by feature-consistency equivalence; if different accounts log in from partly identical IP addresses, that is, from the same local area network, the accounts can be linked by fuzzy feature equivalence. After the user association subgraphs are generated, they are spliced, with the subgraphs as nodes, into the unweighted user association graph. The similarity between nodes in the unweighted user association graph is then used as edge weights to regenerate the similarity-weighted user association graph. The similarity can be computed with a similarity measure function, and edges can optionally be pruned by weight when the similarity-weighted graph is generated.
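As a minimal sketch of steps S121 to S123, the Python below reads the subgraph splicing one plausible way (the text leaves the exact construction open): accounts are vertices, each feature group buckets accounts by a match key, each bucket is linked into one association subgraph, and the subgraphs are merged at shared accounts. The record fields, the two feature groups, the first-three-octets IP rule used for "same LAN", and Jaccard similarity as the similarity measure function are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Set

Record = Dict[str, str]


def ip_prefix(ip: str) -> str:
    """Fuzzy key: first three octets of an IPv4 address (assumed 'same LAN' rule)."""
    return ".".join(ip.split(".")[:3])


# Hypothetical feature groups: each maps a record to the key it is matched on.
FEATURE_GROUPS: Dict[str, Callable[[Record], str]] = {
    "device_id": lambda r: r["device_id"],   # feature-consistency (exact) equivalence
    "ip_lan": lambda r: ip_prefix(r["ip"]),  # fuzzy equivalence
}


def build_unweighted_graph(records: List[Record]) -> Dict[str, Set[str]]:
    """Splice the per-feature association subgraphs into one undirected graph."""
    adj: Dict[str, Set[str]] = {r["account"]: set() for r in records}
    for key_fn in FEATURE_GROUPS.values():
        buckets: Dict[str, Set[str]] = defaultdict(set)
        for r in records:
            buckets[key_fn(r)].add(r["account"])
        # Accounts sharing a key form one association subgraph; link them pairwise.
        for members in buckets.values():
            for a, b in combinations(sorted(members), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj


def weight_edges(adj: Dict[str, Set[str]], min_weight: float = 0.0) -> Dict[str, Dict[str, float]]:
    """Weight each edge by Jaccard similarity of closed neighbourhoods (assumed
    measure) and prune edges lighter than min_weight."""
    weighted: Dict[str, Dict[str, float]] = {u: {} for u in adj}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                nu, nv = adj[u] | {u}, adj[v] | {v}
                w = len(nu & nv) / len(nu | nv)
                if w >= min_weight:
                    weighted[u][v] = w
                    weighted[v][u] = w
    return weighted


records = [
    {"account": "a1", "device_id": "D1", "ip": "10.0.0.5"},
    {"account": "a2", "device_id": "D1", "ip": "10.0.1.7"},     # same device as a1
    {"account": "a3", "device_id": "D2", "ip": "10.0.0.9"},     # same LAN as a1
    {"account": "a4", "device_id": "D3", "ip": "192.168.1.2"},  # unrelated
]
graph = weight_edges(build_unweighted_graph(records), min_weight=0.5)
print(graph["a1"])
```

Pairwise linking inside a bucket is quadratic in the bucket size, so a production system would probably cap or sample very large buckets (for example, a shared public IP). The `min_weight` cut-off plays the role of the optional weight-based pruning.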
After the user association graph is generated, step S13 is performed: candidate clique sets are generated from the user association graph using a community detection algorithm. Referring to Fig. 3, generating the candidate clique sets using the community detection algorithm includes:
S131: Generate n clique sets from the similarity-weighted user association graph using the community detection algorithm, where n is a positive integer. The community detection algorithm includes a graph label propagation algorithm or the GN algorithm.
S132: Confirm that the number of users in each clique set is less than or equal to the maximum threshold.
S133: Confirm that the number of clique sets whose number of users is less than the minimum threshold is less than or equal to the preset threshold. The maximum threshold is greater than the minimum threshold.
S134: Determine the clique sets as candidate clique sets.
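Step S131 names label propagation as one admissible community detection algorithm. The sketch below is a small, dependency-free asynchronous weighted label propagation plus the S132/S133 acceptance check; it illustrates the technique and is not the disclosure's implementation. Library versions exist, for example NetworkX's `asyn_lpa_communities` and `girvan_newman`. The function names, the `dict`-of-neighbour-weights graph, and the example thresholds (20, 3 and 15, borrowed from the examples in the text) are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def label_propagation(graph: Graph, max_iter: int = 20, seed: int = 0) -> List[Set[str]]:
    """Asynchronous weighted label propagation: each node adopts the label with the
    largest total edge weight among its neighbours, until no label changes."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # every node starts in its own community
    order = list(graph)
    for _ in range(max_iter):
        rng.shuffle(order)
        changed = False
        for node in order:
            if not graph[node]:
                continue  # an isolated node keeps its own label
            weight_by_label: Dict[str, float] = defaultdict(float)
            for nbr, w in graph[node].items():
                weight_by_label[labels[nbr]] += w
            best = max(weight_by_label.values())
            tied = sorted(lab for lab, s in weight_by_label.items() if s == best)
            # Keep the current label on ties so the process can settle.
            new = labels[node] if labels[node] in tied else rng.choice(tied)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return list(groups.values())


def is_acceptable(cliques: List[Set[str]], max_size: int, min_size: int, max_small: int) -> bool:
    """S132/S133: every clique has at most max_size users, and at most max_small
    cliques have fewer than min_size users."""
    if any(len(c) > max_size for c in cliques):
        return False
    return sum(1 for c in cliques if len(c) < min_size) <= max_small


# Two dense triangles joined by one weak edge, plus an isolated account "g".
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
         ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0), ("c", "d", 0.1)]
graph: Graph = {n: {} for n in "abcdefg"}
for u, v, w in edges:
    graph[u][v] = w
    graph[v][u] = w
cliques = label_propagation(graph)
print(sorted(sorted(c) for c in cliques))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
print(is_acceptable(cliques, max_size=20, min_size=3, max_small=15))  # True
```

The weak 0.1 edge never outweighs the two strong in-triangle edges, so labels cannot leak between the triangles; this is why weighting edges by similarity in S123 matters for the partition.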
When the number of users (for example, the number of distinct accounts) in a clique set is greater than the maximum threshold, for example when a clique set contains more than 20 distinct accounts, the community detection algorithm is invoked again to divide it so that the number of users in each resulting clique set is less than or equal to the maximum threshold. If the number of clique sets whose number of users is less than the minimum threshold is greater than the preset threshold, for example if more than 15 clique sets contain fewer than 3 distinct accounts, a hierarchical clustering algorithm is invoked to agglomerate the clique sets whose number of users is less than the minimum threshold. The hierarchical clustering may use either the agglomerative or the divisive method.
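The split-and-merge refinement can be sketched as follows. Oversized cliques are handed to a caller-supplied `split` function (in practice, the community detection algorithm run on the induced subgraph), and small cliques are merged agglomeratively, each time joining a small clique to the partner with the highest average-linkage similarity. Average linkage, the rule that a merge may not exceed `max_size`, and all names here are assumptions; the disclosure only states that agglomerative or divisive hierarchical clustering may be used.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[str, Dict[str, float]]
Clique = Set[str]


def linkage(graph: Graph, a: Clique, b: Clique) -> float:
    """Average linkage (assumed): mean edge weight over all cross pairs, missing edge = 0."""
    total = sum(graph[u].get(v, 0.0) for u in a for v in b)
    return total / (len(a) * len(b))


def refine(graph: Graph, cliques: List[Clique], max_size: int, min_size: int,
           max_small: int, split: Callable[[Graph], List[Clique]]) -> List[Clique]:
    # 1) Divide oversized cliques by running `split` on their induced subgraph.
    work = [set(c) for c in cliques]
    done: List[Clique] = []
    while work:
        c = work.pop()
        if len(c) <= max_size:
            done.append(c)
            continue
        sub = {u: {v: w for v, w in graph[u].items() if v in c} for u in c}
        parts = [set(p) for p in split(sub) if p]
        if len(parts) > 1:
            work.extend(parts)
        else:
            done.append(c)  # splitter made no progress: keep as-is rather than loop forever
    # 2) Agglomerate small cliques while too many of them remain.
    while sum(1 for c in done if len(c) < min_size) > max_small:
        best = None  # (linkage, index of small clique, index of merge partner)
        for i, a in enumerate(done):
            if len(a) >= min_size:
                continue
            for j, b in enumerate(done):
                if i == j or len(a) + len(b) > max_size:
                    continue
                s = linkage(graph, a, b)
                if s > 0 and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:
            break  # no connected pair can be merged without exceeding max_size
        _, i, j = best
        done[i] |= done[j]
        del done[j]
    return done


graph: Graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 0.2},
                "c": {"b": 0.2, "d": 1.0}, "d": {"c": 1.0}}
merged = refine(graph, [{"a"}, {"b"}, {"c", "d"}], max_size=20, min_size=2,
                max_small=0, split=lambda g: [set(g)])
print(sorted(sorted(c) for c in merged))  # [['a', 'b'], ['c', 'd']]
```

Here the singleton `{"b"}` joins `{"a"}` rather than `{"c", "d"}`, because its average linkage to `a` (1.0) is higher than to the pair (0.1).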
After the candidate clique sets are generated, step S14 is performed: the suspicion degree of the candidate clique sets is calculated. Ways of calculating the suspicion degree include, but are not limited to, the following three:
First method: referring to Fig. 4, calculating the suspicion score of the candidate clique set includes the following steps:
S141a: Select a target data feature from the data features, where the difference between the distribution of the target data feature in the candidate clique set and its distribution in the overall data exceeds a target threshold. Here, the overall data means all of the user base data.
S142a: Calculate the suspicion score of the candidate clique set according to the proportion of the target data feature in the candidate clique set.
For example, suppose a client has 100 accounts newly registered on a given day, 8 of which were registered with virtual mobile phone numbers, so accounts registered with virtual numbers make up 8% of that day's new registrations. In one generated candidate clique set containing 10 accounts, 7 were registered with virtual numbers, a proportion of 70%. Because 70% differs greatly from 8%, registration with a virtual mobile phone number is taken as the target data feature. Its proportion in the candidate clique set, 0.7, can be used as the suspicion score of the candidate clique set.
Alternatively, suppose that of the 100 accounts a client newly registered on a given day, 8 were registered by historical suspicious users, i.e., 8% of that day's new registrations. In one generated candidate clique set containing 10 accounts, 8 were registered by historical suspicious users, a proportion of 80%. Because 80% differs greatly from 8%, registration by a historical suspicious user is taken as the target data feature. Its proportion in the candidate clique set, 0.8, can be used as the suspicion score of the candidate clique set.
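The first calculation method can be sketched in a few lines of Python. The sketch treats each data feature as the set of accounts that carry it, takes the signed difference between the feature's share in the candidate clique and its share in the overall data as the "distribution difference", and, when several target features qualify, uses the largest share as the score. Those three choices, the function names, and the 0.3 threshold are assumptions; the example data reproduce the virtual-phone-number example above.

```python
from typing import Dict, Optional, Set, Tuple


def share(accounts: Set[str], flagged: Set[str]) -> float:
    """Proportion of `accounts` that carry a feature, i.e. belong to `flagged`."""
    return len(accounts & flagged) / len(accounts) if accounts else 0.0


def distribution_score(clique: Set[str], population: Set[str],
                       features: Dict[str, Set[str]],
                       diff_threshold: float) -> Tuple[float, Optional[str]]:
    """Select target features whose share in the clique exceeds their share in the
    overall data by more than diff_threshold (signed difference: an assumption);
    return the largest such share and the feature that produced it."""
    best_score, best_feature = 0.0, None
    for name, flagged in features.items():
        in_clique = share(clique, flagged)
        overall = share(population, flagged)
        if in_clique - overall > diff_threshold and in_clique > best_score:
            best_score, best_feature = in_clique, name
    return best_score, best_feature


# First example in the text: 100 new accounts, 8 registered with virtual numbers,
# 7 of which fall inside a 10-account candidate clique.
population = {f"u{i}" for i in range(100)}
clique = {f"u{i}" for i in range(10)}
virtual = {f"u{i}" for i in range(7)} | {"u50"}
print(distribution_score(clique, population, {"virtual_phone": virtual}, 0.3))
# (0.7, 'virtual_phone')
```

A feature that is equally common inside and outside the clique (for example, one every account has) is never selected, which is the point of comparing against the overall distribution.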
Second method: referring to Fig. 5, calculating the suspicion score of the candidate clique set may include the following steps:
S141b: Extract the clique features of each candidate clique set. The clique features include at least the proportion of historical suspicious users, and may also include features such as clique size and the proportion of accounts that share a device.
S142b: Input the clique features into a trained regression model, so that the regression model outputs the suspicion score of the candidate clique set. The regression model may be a GBDT (Gradient Boosting Decision Tree) model.
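For the second method, the following pure-Python sketch extracts three of the clique features named in S141b and trains a toy gradient-boosted model built from depth-1 trees (stumps) under squared loss. It only shows the mechanics of GBDT regression; a real system would more likely use a library such as scikit-learn's `GradientBoostingRegressor`. The feature order, the training labels (1.0 for a clique later confirmed as fraud, 0.0 otherwise), the hyperparameters, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Stump = Tuple[int, float, float, float]   # (feature index, threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def clique_features(clique: Set[str], suspects: Set[str], device_of: Dict[str, str]) -> List[float]:
    """[historical-suspect share, clique size, share of accounts on a shared device]."""
    size = len(clique)
    per_device = Counter(device_of[u] for u in clique)
    on_shared = sum(1 for u in clique if per_device[device_of[u]] > 1)
    return [len(clique & suspects) / size, float(size), on_shared / size]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Depth-1 regression tree that minimises squared error on the residuals r."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, (j, t, lv, rv))
    if best is None:  # every feature is constant: predict the mean residual everywhere
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1]


def stump_value(s: Stump, x: List[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv


def fit_gbdt(X: List[List[float]], y: List[float], n_rounds: int = 50, lr: float = 0.1) -> Model:
    """Gradient boosting under squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of (y - F)^2 / 2
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [p + lr * stump_value(s, x) for p, x in zip(pred, X)]
    return base, lr, stumps


def predict(model: Model, x: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_value(s, x) for s in stumps)


# Hypothetical training set: features of past cliques, label 1.0 = confirmed fraud clique.
X = [[0.8, 10.0, 0.6], [0.7, 12.0, 0.5], [0.0, 8.0, 0.0], [0.1, 15.0, 0.1]]
y = [1.0, 1.0, 0.0, 0.0]
model = fit_gbdt(X, y)
print(round(predict(model, [0.75, 11.0, 0.55]), 3))  # high score for a suspect-heavy clique
```

Each round shrinks the remaining residual by the learning rate, so more rounds or a larger `lr` fit the training labels more tightly; a library implementation adds deeper trees, subsampling, and regularization on top of this loop.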
Third method: referring to Fig. 6, calculating the suspicion score of the candidate clique set may include the following steps:
S141c: Select a target data feature from the data features, where the difference between the distribution of the target data feature in the candidate clique set and its distribution in the overall data exceeds a target threshold.
S142c: Calculate a first suspicion score of the candidate clique set according to the proportion of the target data feature in the candidate clique set.
S143c: Extract the clique features of each candidate clique set.
S14c: Input the clique features into the trained regression model, so that the regression model outputs a second suspicion score of the candidate clique set.
S144c: Calculate a comprehensive suspicion score of the candidate clique set according to the first suspicion score and the second suspicion score.
Then according to calculated result, the judgement result of the clique to be determined is exported.For example, when comprehensive suspicion degree score is super When crossing preset value, then it can be determined that the clique to be determined for fraud clique.
For example, if the first suspicion score of a to-be-determined clique is 0.7 and the second suspicion score is 0.8, the combined suspicion score of the clique set can be taken as the average of the two, 0.75. Since this exceeds the preset value of 0.6, the to-be-determined clique is judged to be a fraud clique.
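The worked example corresponds to the arithmetic below. The weighted form is an assumption for illustration, since the text only mentions taking the average; with `weight=0.5` it reduces to the plain average, giving (0.7 + 0.8) / 2 = 0.75 > 0.6. The function names and the default preset value are hypothetical.

```python
def combined_score(first, second, weight=0.5):
    """Weighted combination of the two suspicion scores; weight=0.5 gives
    the plain average used in the example above."""
    return weight * first + (1 - weight) * second


def is_fraud_clique(first, second, preset=0.6, weight=0.5):
    """Judge the clique as a fraud clique when the combined score exceeds
    the preset value."""
    return combined_score(first, second, weight) > preset


print(round(combined_score(0.7, 0.8), 2), is_fraud_clique(0.7, 0.8))  # 0.75 True
```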
The disclosure generates a user association graph from the acquired user data and uses a community partitioning algorithm to generate to-be-determined clique sets. By calculating the suspicion degree of each to-be-determined clique set, it can distinguish whether the set is a fraud clique, addressing the difficulty of detecting clique fraud in the related art. In addition, by combining the community partitioning algorithm with a hierarchical clustering algorithm, the disclosure addresses two problems in the partition result: some cliques being too large, and too many cliques being too small. Furthermore, the disclosure improves the data-processing capacity of the graph model through similarity indexing, and generates the similarity-weighted user association graph by assembling subgraphs with configurable similarity edge weights. This approach is more flexible and can run in parallel, further improving large-scale data processing in fraud scenarios.
It should be noted that, for simplicity of description, the method embodiment shown in FIG. 1 is expressed as a series of combined actions, but those skilled in the art should understand that the disclosure is not limited by the described order of actions. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the disclosure.
FIG. 7 shows a device for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure. As shown in FIG. 7, the device 300 for detecting clique fraud based on a graph model includes:
an acquisition module 310, configured to acquire basic user data and historical suspect user data;
a first generation module 320, configured to generate a user association graph from the acquired data, where the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph include the similarity between nodes;
a second generation module 330, configured to generate to-be-determined clique sets from the user association graph using a community partitioning algorithm;
a calculation module 340, configured to calculate the suspicion degree of each to-be-determined clique set; and
an output module 350, configured to output the determination result for the to-be-determined clique according to the calculation result.
Optionally, as shown in FIG. 8, the first generation module 320 includes:
a first selection submodule 321, configured to select feature combinations and group numbers in the basic user data and the historical suspect user data;
a first generation submodule 322, configured to generate corresponding user association subgraphs by means of feature-consistency equality or fuzzy equivalence, and to splice the user association subgraphs, as nodes, into an unweighted user association graph; and
a second generation submodule 323, configured to generate a similarity-weighted user association graph by taking the similarity between nodes of the unweighted user association graph as edge weights.
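A minimal Python sketch of how submodules 321 to 323 might fit together, under assumptions the text does not state: each user record is a dict of categorical features, feature-consistency equality is read as two users sharing an identical value of a selected feature, an optional `normalize` callback stands in for fuzzy equivalence, and node similarity is the Jaccard similarity of the users' feature values. An inverted index (value → users) ensures only users that share a value are ever compared, which is one plausible reading of the similarity indexing mentioned in the summary; the source does not confirm it. For simplicity the sketch links users directly rather than modeling each subgraph as a separate node, and `build_weighted_graph` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def build_weighted_graph(records, features, normalize=None):
    """records: user ID -> {feature: value}. Returns {(u, v): weight}, u < v.

    normalize(feature, value) maps a raw value to a grouping key; the default
    keeps exact equality, and a custom function can approximate fuzzy
    equivalence (e.g. lower-casing addresses). Hypothetical interface.
    """
    normalize = normalize or (lambda feature, value: value)
    index = defaultdict(set)  # (feature, key) -> users sharing that key
    keys = {}                 # user -> set of (feature, key) pairs
    for user, feats in records.items():
        keys[user] = set()
        for f in features:
            if f in feats:
                k = (f, normalize(f, feats[f]))
                index[k].add(user)
                keys[user].add(k)
    # Each index bucket links all users sharing one value (a per-feature
    # association subgraph); the union of the buckets is the unweighted graph.
    edges = set()
    for users in index.values():
        edges.update(combinations(sorted(users), 2))
    # Edge weight: Jaccard similarity of the two users' (feature, key) sets.
    return {(u, v): len(keys[u] & keys[v]) / len(keys[u] | keys[v])
            for u, v in edges}
```

Two users with identical device and phone values get weight 1.0, while users who share only the device (one of three distinct keys between them) get 1/3.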
Optionally, as shown in FIG. 9, the second generation module 330 includes:
a third generation submodule 331, configured to generate n clique sets from the similarity-weighted user association graph using the community partitioning algorithm, where n is a positive integer;
a first confirmation submodule 332, configured to confirm that the number of users in each clique set is less than or equal to a maximum threshold;
a second confirmation submodule 333, configured to confirm that the number of clique sets whose number of users is below a minimum threshold is less than or equal to a preset threshold; and
a third confirmation submodule 334, configured to determine the clique sets as to-be-determined clique sets.
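The following Python sketch shows one way to realize submodules 331 to 334, assuming weighted asynchronous label propagation as the community partitioning algorithm (one of the options named in this document) and a graph given as a dict of undirected edge weights. The parameters `max_size`, `min_size`, and `max_small` stand for the maximum threshold, minimum threshold, and preset threshold; the function names are hypothetical, and users with no edges are not returned by this sketch.

```python
import random
from collections import defaultdict


def label_propagation(weights, max_iter=20, seed=0):
    """Weighted asynchronous label propagation on an undirected graph given
    as {(u, v): weight}; returns a list of communities (sets of nodes)."""
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    labels = {node: node for node in adj}  # each node starts with its own label
    order = sorted(adj)
    rng = random.Random(seed)
    for _ in range(max_iter):
        rng.shuffle(order)
        changed = False
        for node in order:
            # Sum edge weights per neighboring label.
            score = defaultdict(float)
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            best = max(score.values())
            # Ties are broken by the smallest label to keep runs repeatable.
            new_label = min(lab for lab, s in score.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


def partition_is_acceptable(cliques, max_size, min_size, max_small):
    """Checks of submodules 332/333: no clique above the maximum threshold,
    and at most `max_small` cliques below the minimum threshold."""
    if any(len(c) > max_size for c in cliques):
        return False
    return sum(1 for c in cliques if len(c) < min_size) <= max_small
```

On two dense triangles joined by one weak edge, the strong intra-triangle weights always outvote the bridge, so the two triangles come out as separate communities.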
Optionally, as shown in FIG. 10, the device 300 for detecting clique fraud based on a graph model further includes:
a division module 360, configured to call the community partitioning algorithm on each clique set whose number of users exceeds the maximum threshold and divide it, so that the number of users in each clique set is less than or equal to the maximum threshold; and
an agglomeration module 370, configured to, if the number of clique sets whose number of users is below the minimum threshold exceeds the preset threshold, call a hierarchical clustering algorithm to agglomerate the clique sets whose number of users is below the minimum threshold.
Optionally, the community partitioning algorithm includes a graph label propagation algorithm or the GN (Girvan-Newman) algorithm; the hierarchical clustering algorithm includes an agglomerative algorithm or a divisive algorithm.
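Modules 360 and 370 can be sketched as below. Division is shown only as extracting the subgraph induced by an oversized clique, on which the community partitioning algorithm would be run again. Agglomeration is a restricted greedy form of agglomerative hierarchical clustering with average linkage: it merges only pairs that involve an undersized clique, until at most `max_small` undersized cliques remain. The linkage choice, the stopping rule, and all names are assumptions for illustration.

```python
from itertools import combinations


def induced_subgraph(weights, nodes):
    """Edges with both endpoints in `nodes`; an oversized clique can be
    re-partitioned by running community detection on this subgraph."""
    return {e: w for e, w in weights.items() if e[0] in nodes and e[1] in nodes}


def average_linkage(a, b, weights):
    """Mean edge weight between two node sets (missing edges count as 0)."""
    total = 0.0
    for u in a:
        for v in b:
            total += weights.get((u, v), weights.get((v, u), 0.0))
    return total / (len(a) * len(b))


def agglomerate_small(cliques, weights, min_size, max_small):
    """While too many cliques are below `min_size`, merge the most similar
    pair that contains at least one small clique."""
    cliques = [set(c) for c in cliques]

    def n_small():
        return sum(1 for c in cliques if len(c) < min_size)

    while n_small() > max_small:
        best = None
        for i, j in combinations(range(len(cliques)), 2):
            if len(cliques[i]) >= min_size and len(cliques[j]) >= min_size:
                continue  # only merges involving a small clique are considered
            sim = average_linkage(cliques[i], cliques[j], weights)
            if sim > 0 and (best is None or sim > best[0]):
                best = (sim, i, j)
        if best is None:
            break  # remaining small cliques have no links to merge along
        _, i, j = best
        cliques[i] |= cliques[j]  # i < j, so deleting j keeps i valid
        del cliques[j]
    return cliques
```

Restricting merges to pairs that contain an undersized clique keeps already acceptable cliques from being fused with each other, which matches the goal stated for module 370.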
Optionally, as shown in FIG. 11, the calculation module 340 includes:
a second selection submodule 341a, configured to select target data features from the data features, where the distribution of a target data feature within the to-be-determined clique set differs from its distribution in the overall data by more than a target threshold; and
a first calculation submodule 342a, configured to calculate the suspicion score of the to-be-determined clique set according to the proportion of the target data features within that clique set.
Optionally, as shown in FIG. 12, the calculation module 340 includes:
a first extraction submodule 341b, configured to extract the clique features of each to-be-determined clique set; and
a first input submodule 342b, configured to input the clique features into a trained regression model so that the regression model outputs the suspicion score of the to-be-determined clique set.
Optionally, as shown in FIG. 13, the calculation module 340 includes:
a third selection submodule 341c, configured to select target data features from the data features, where the distribution of a target data feature within the to-be-determined clique set differs from its distribution in the overall data by more than a target threshold;
a second calculation submodule 342c, configured to calculate a first suspicion score of the to-be-determined clique set according to the proportion of the target data features within that clique set;
a second extraction submodule 343c, configured to extract the clique features of each to-be-determined clique set;
a second input submodule 344c, configured to input the clique features into a trained regression model so that the regression model outputs a second suspicion score of the to-be-determined clique set; and
a third calculation submodule 345c, configured to calculate the combined suspicion score of the to-be-determined clique set from the first suspicion score and the second suspicion score.
With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be elaborated here.
The disclosure also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method for detecting clique fraud based on a graph model described in any of the above optional embodiments.
The disclosure also provides a device for detecting clique fraud based on a graph model, including:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method for detecting clique fraud based on a graph model described in any of the above optional embodiments.
FIG. 14 is a block diagram of a device 400 for detecting clique fraud based on a graph model according to an exemplary embodiment. As shown in FIG. 14, the device 400 may include a processor 401, a memory 402, a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
The processor 401 controls the overall operation of the device 400 to complete all or part of the steps of the above method for detecting clique fraud based on a graph model. The memory 402 stores various types of data to support operation on the device 400; such data may include, for example, instructions for any application or method operating on the device 400 and application-related data. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signals may be further stored in the memory 402 or sent via the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons; these buttons may be virtual or physical. The communication component 405 is used for wired or wireless communication between the device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near-field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the device 400 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method for detecting clique fraud based on a graph model.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, for example the memory 402 including program instructions, which can be executed by the processor 401 of the device 400 to complete the above method for detecting clique fraud based on a graph model.
Preferred embodiments of the disclosure have been described in detail above with reference to the drawings; however, the disclosure is not limited to the specific details of those embodiments. Within the scope of the technical concept of the disclosure, a variety of simple variations can be made to the technical solutions of the disclosure, and these simple variations all fall within the scope of protection of the disclosure.
It should further be noted that the specific technical features described in the above embodiments can be combined in any appropriate way where there is no contradiction. To avoid unnecessary repetition, the disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the disclosure may be combined arbitrarily, and as long as such combinations do not depart from the idea of the disclosure, they should likewise be regarded as content disclosed by the disclosure.

Claims (10)

1. A method for detecting clique fraud based on a graph model, characterized in that the method comprises:
acquiring basic user data and historical suspect user data;
generating a user association graph from the acquired data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph include the similarity between nodes;
generating to-be-determined clique sets from the user association graph using a community partitioning algorithm;
calculating the suspicion degree of the to-be-determined clique sets; and
outputting a determination result for the to-be-determined clique according to the calculation result.
2. The method according to claim 1, wherein generating the user association graph comprises:
selecting feature combinations and group numbers in the basic user data and the historical suspect user data;
generating corresponding user association subgraphs by means of feature-consistency equality or fuzzy equivalence, and splicing the user association subgraphs, as nodes, into an unweighted user association graph; and
generating a similarity-weighted user association graph by taking the similarity between nodes of the unweighted user association graph as edge weights.
3. The method according to claim 2, wherein generating the to-be-determined clique sets using the community partitioning algorithm comprises:
generating n clique sets from the similarity-weighted user association graph using the community partitioning algorithm, where n is a positive integer;
confirming that the number of users in each clique set is less than or equal to a maximum threshold;
confirming that the number of clique sets whose number of users is below a minimum threshold is less than or equal to a preset threshold; and
determining the clique sets as the to-be-determined clique sets.
4. The method according to claim 3, further comprising:
calling the community partitioning algorithm to divide each clique set whose number of users exceeds the maximum threshold, so that the number of users in each clique set is less than or equal to the maximum threshold; and
if the number of clique sets whose number of users is below the minimum threshold exceeds the preset threshold, calling a hierarchical clustering algorithm to agglomerate the clique sets whose number of users is below the minimum threshold.
5. The method according to claim 4, wherein the community partitioning algorithm comprises a graph label propagation algorithm or the GN algorithm, and the hierarchical clustering algorithm comprises an agglomerative algorithm or a divisive algorithm.
6. The method according to claim 1, wherein calculating the suspicion score of the to-be-determined clique set comprises:
selecting target data features from the data features, wherein the distribution of a target data feature within the to-be-determined clique set differs from its distribution in the overall data by more than a target threshold; and
calculating the suspicion score of the to-be-determined clique set according to the proportion of the target data features within that clique set.
7. The method according to claim 1, wherein calculating the suspicion score of the to-be-determined clique set comprises:
extracting the clique features of each to-be-determined clique set; and
inputting the clique features into a trained regression model so that the regression model outputs the suspicion score of the to-be-determined clique set.
8. A device for detecting clique fraud based on a graph model, characterized in that the device comprises:
an acquisition module, configured to acquire basic user data and historical suspect user data;
a first generation module, configured to generate a user association graph from the acquired data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph include the similarity between nodes;
a second generation module, configured to generate to-be-determined clique sets from the user association graph using a community partitioning algorithm;
a calculation module, configured to calculate the suspicion degree of the to-be-determined clique sets; and
an output module, configured to output a determination result for the to-be-determined clique according to the calculation result.
9. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the steps of the method according to any one of claims 1 to 7.
10. A device for detecting clique fraud based on a graph model, characterized by comprising:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 7.
CN201910239821.3A 2019-03-27 2019-03-27 Method and apparatus, storage medium based on the fraud of graph model detection clique Pending CN110070364A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910239821.3A CN110070364A (en) 2019-03-27 2019-03-27 Method and apparatus, storage medium based on the fraud of graph model detection clique
PCT/CN2019/124807 WO2020192184A1 (en) 2019-03-27 2019-12-12 Gang fraud detection based on graph model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910239821.3A CN110070364A (en) 2019-03-27 2019-03-27 Method and apparatus, storage medium based on the fraud of graph model detection clique

Publications (1)

Publication Number Publication Date
CN110070364A true CN110070364A (en) 2019-07-30

Family

ID=67366679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910239821.3A Pending CN110070364A (en) 2019-03-27 2019-03-27 Method and apparatus, storage medium based on the fraud of graph model detection clique

Country Status (2)

Country Link
CN (1) CN110070364A (en)
WO (1) WO2020192184A1 (en)

Also Published As

Publication number Publication date
WO2020192184A1 (en) 2020-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination