CN110070364A - Method, apparatus, and storage medium for detecting clique fraud based on a graph model
- Publication number
- CN110070364A (application CN201910239821.3A)
- Authority
- CN
- China
- Prior art keywords
- clique
- user
- determined
- association
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
Abstract
The present disclosure relates to a method and an apparatus for detecting clique fraud based on a graph model, and to a storage medium, and addresses the technical problem in the related art that clique fraud is difficult to identify. The method comprises: obtaining basic user data and historical suspicious user data; generating a user association graph from the obtained data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph comprise the similarity of the nodes; generating to-be-determined clique sets from the user association graph using a community partitioning algorithm; calculating the suspicion degree of the to-be-determined clique sets; and outputting a determination result for the to-be-determined cliques according to the calculation result.
Description
Technical field
The present disclosure relates to the field of network technology, and in particular to a method, an apparatus, and a storage medium for detecting clique fraud based on a graph model.
Background
The financial sector places high demands on transaction risk control and must guarantee the security of fund transactions. In practice, fraud does occur. For example, a fraudster may induce many ordinary consumers to transfer money to the fraudster without giving those consumers anything in return, and profit in this way. To identify such fraud and pick out high-risk fraudsters, so that measures can be taken to avoid consumers' financial losses as far as possible, a transaction model can be used to identify fraudsters. For example, a payment account is classified as a fraudster's account, and fund transactions carried out by that account are classified as risky transactions.
Summary of the invention
The present disclosure provides a method, an apparatus, and a storage medium for detecting clique fraud based on a graph model, to solve the technical problem in the related art that clique fraud is difficult to identify.
To achieve the above object, a first aspect of the embodiments of the present disclosure provides a method for detecting clique fraud based on a graph model, the method comprising:
obtaining basic user data and historical suspicious user data;
generating a user association graph from the obtained data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph comprise the similarity of the nodes;
generating to-be-determined clique sets from the user association graph using a community partitioning algorithm;
calculating the suspicion degree of the to-be-determined clique sets;
outputting a determination result for the to-be-determined cliques according to the calculation result.
Optionally, generating the user association graph comprises:
selecting feature combinations, and the number of such groups, from the basic user data and the historical suspicious user data;
generating user association subgraphs by matching features on exact equality or on fuzzy equality, and splicing the user association subgraphs together as nodes to form an unweighted user association graph;
using the similarity between nodes of the unweighted user association graph as edge weights to generate a similarity-weighted user association graph.
Optionally, generating the to-be-determined clique sets using the community partitioning algorithm comprises:
generating n clique sets from the similarity-weighted user association graph using the community partitioning algorithm, where n is a positive integer;
confirming that the number of users in each clique set is less than or equal to a maximum threshold;
confirming that the number of clique sets whose number of users is below a minimum threshold is less than or equal to a preset threshold;
determining the clique sets to be the to-be-determined clique sets.
Optionally, the method further comprises:
for a clique set whose number of users exceeds the maximum threshold, calling the community partitioning algorithm to divide it further, so that the number of users in each resulting clique set is less than or equal to the maximum threshold;
if the number of clique sets whose number of users is below the minimum threshold exceeds the preset threshold, calling a hierarchical clustering algorithm to agglomerate the clique sets whose number of users is below the minimum threshold.
Optionally, the community partitioning algorithm comprises a graph label propagation algorithm or the GN (Girvan-Newman) algorithm; the hierarchical clustering algorithm comprises an agglomerative algorithm or a divisive algorithm.
Optionally, calculating the suspicion degree score of the to-be-determined clique set comprises:
selecting a target data feature from the data features, wherein the difference between the distribution of the target data feature in the to-be-determined clique set and its distribution in the overall data exceeds a target threshold;
calculating the suspicion degree score of the to-be-determined clique set according to the proportion of the target data feature in the to-be-determined clique set.
Optionally, calculating the suspicion degree score of the to-be-determined clique set comprises:
extracting clique features of each to-be-determined clique set;
inputting the clique features into a trained regression model so that the regression model outputs the suspicion degree score of the to-be-determined clique set.
Optionally, calculating the suspicion degree score of the to-be-determined clique set comprises:
selecting a target data feature from the data features, wherein the difference between the distribution of the target data feature in the to-be-determined clique set and its distribution in the overall data exceeds a target threshold;
calculating a first suspicion degree score of the to-be-determined clique set according to the proportion of the target data feature in the to-be-determined clique set;
extracting clique features of each to-be-determined clique set;
inputting the clique features into a trained regression model so that the regression model outputs a second suspicion degree score of the to-be-determined clique set;
calculating a comprehensive suspicion degree score of the to-be-determined clique set according to the first suspicion degree score and the second suspicion degree score.
A second aspect of the embodiments of the present disclosure provides an apparatus for detecting clique fraud based on a graph model, the apparatus comprising:
an obtaining module, configured to obtain basic user data and historical suspicious user data;
a first generation module, configured to generate a user association graph from the obtained data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph comprise the similarity of the nodes;
a second generation module, configured to generate to-be-determined clique sets from the user association graph using a community partitioning algorithm;
a computing module, configured to calculate the suspicion degree of the to-be-determined clique sets;
an output module, configured to output a determination result for the to-be-determined cliques according to the calculation result.
Optionally, the first generation module comprises:
a first selection submodule, configured to select feature combinations, and the number of such groups, from the basic user data and the historical suspicious user data;
a first generation submodule, configured to generate user association subgraphs by matching features on exact equality or on fuzzy equality, and to splice the user association subgraphs together as nodes to form an unweighted user association graph;
a second generation submodule, configured to use the similarity between nodes of the unweighted user association graph as edge weights to generate a similarity-weighted user association graph.
Optionally, the second generation module comprises:
a third generation submodule, configured to generate n clique sets from the similarity-weighted user association graph using the community partitioning algorithm, where n is a positive integer;
a first confirmation submodule, configured to confirm that the number of users in each clique set is less than or equal to a maximum threshold;
a second confirmation submodule, configured to confirm that the number of clique sets whose number of users is below a minimum threshold is less than or equal to a preset threshold;
a third confirmation submodule, configured to determine the clique sets to be the to-be-determined clique sets.
Optionally, the apparatus further comprises:
a division module, configured to call the community partitioning algorithm on a clique set whose number of users exceeds the maximum threshold, so that the number of users in each resulting clique set is less than or equal to the maximum threshold;
an agglomeration module, configured to call a hierarchical clustering algorithm to agglomerate the clique sets whose number of users is below the minimum threshold if the number of such clique sets exceeds the preset threshold.
Optionally, the community partitioning algorithm comprises a graph label propagation algorithm or the GN (Girvan-Newman) algorithm; the hierarchical clustering algorithm comprises an agglomerative algorithm or a divisive algorithm.
Optionally, the computing module comprises:
a second selection submodule, configured to select a target data feature from the data features, wherein the difference between the distribution of the target data feature in the to-be-determined clique set and its distribution in the overall data exceeds a target threshold;
a first computation submodule, configured to calculate the suspicion degree score of the to-be-determined clique set according to the proportion of the target data feature in the to-be-determined clique set.
Optionally, the computing module comprises:
a first extraction submodule, configured to extract clique features of each to-be-determined clique set;
a first input submodule, configured to input the clique features into a trained regression model so that the regression model outputs the suspicion degree score of the to-be-determined clique set.
Optionally, the computing module comprises:
a third selection submodule, configured to select a target data feature from the data features, wherein the difference between the distribution of the target data feature in the to-be-determined clique set and its distribution in the overall data exceeds a target threshold;
a second computation submodule, configured to calculate a first suspicion degree score of the to-be-determined clique set according to the proportion of the target data feature in the to-be-determined clique set;
a second extraction submodule, configured to extract clique features of each to-be-determined clique set;
a second input submodule, configured to input the clique features into a trained regression model so that the regression model outputs a second suspicion degree score of the to-be-determined clique set;
a third computation submodule, configured to calculate a comprehensive suspicion degree score of the to-be-determined clique set according to the first suspicion degree score and the second suspicion degree score.
A third aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of any one of the implementations of the first aspect.
A fourth aspect of the embodiments of the present disclosure provides an apparatus for detecting clique fraud based on a graph model, comprising:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method of any one of the implementations of the first aspect.
The above technical solutions achieve at least the following technical effects:
The present disclosure generates a user association graph from the obtained user data and generates to-be-determined clique sets using a community partitioning algorithm. By calculating the suspicion degree of a to-be-determined clique set, it can tell whether that set is a fraud clique, which solves the technical problem in the related art that clique fraud is difficult to identify. In addition, the disclosure uses both a community partitioning algorithm and a hierarchical clustering algorithm, which addresses the problems that some cliques in the partitioning result are too large and that there are too many small cliques. Furthermore, the disclosure improves the data-processing capability of the graph model by means of similarity indexing, and generates the similarity-weighted user association graph through subgraph assembly with configurable similarity edge weights. This approach is more flexible and can run in parallel, which can further improve large-scale data processing in fraud scenarios.
Other features and advantages of the present disclosure are described in detail in the detailed description below.
Brief description of the drawings
The accompanying drawings provide a further understanding of the disclosure and constitute a part of the specification. Together with the detailed description below they serve to explain the disclosure, but they do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 2 is a flowchart of the step of generating the user association graph in a method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 3 is a flowchart of the step of generating the to-be-determined clique sets in a method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 4 is a flowchart of the step of calculating the suspicion degree score in a method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 5 is a flowchart of the step of calculating the suspicion degree score in another method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 6 is a flowchart of the step of calculating the suspicion degree score in yet another method for detecting clique fraud based on a graph model according to an exemplary embodiment.
Fig. 7 is a block diagram of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 8 is a block diagram of the first generation module of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 9 is a block diagram of the second generation module of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 10 is a block diagram of another apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 11 is a block diagram of the computing module of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 12 is a block diagram of the computing module of another apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 13 is a block diagram of the computing module of yet another apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Fig. 14 is a block diagram of an apparatus for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure.
Detailed description
Specific embodiments of the disclosure are described in detail below with reference to the drawings. It should be understood that the specific embodiments described here serve only to describe and explain the disclosure and do not limit it.
Given how widespread attacks are, fraud detection is now critically important. A survey of the related art shows that financial fraud detection mainly uses the following approaches, each with its own shortcomings, summarized as follows:
Methods based on blacklists, whitelists, and reputation-database lookup. New blacklist, whitelist, or reputation-database entries must be added through irregular maintenance. This maintenance is relatively costly (for example, purchasing paid data from third parties), and the responsiveness and coverage of the method are limited.
Methods based on rule engines. Online financial fraud techniques change constantly. Once fraudsters change their techniques, a rule-engine-based method often fails, and substantial operational and financial resources must be invested in updating the rule engine.
Methods based on supervised machine learning. Supervised machine learning is the most widely used learning approach in fraud detection. Machine learning models can use algorithms such as decision trees, random forests, support vector machines (SVM), and naive Bayes to perform complex computations over hundreds of variables (a high-dimensional space) and accurately pinpoint fraudulent behavior. However, supervised machine learning depends on labeled data, and in financial fraud scenarios labeled data is hard to obtain and positive and negative samples are imbalanced (positive samples can be labeled only after fraud has occurred, and because fraud techniques keep changing, samples are few and labeling is difficult). Without enough labeled fraud data, the capability of supervised machine learning is limited.
Methods based on unsupervised learning. Unsupervised learning is an exploratory branch of current fraud detection research, based mainly on clustering and graph methods. Unsupervised techniques are currently relatively immature and difficult to apply, and there is no ready-made solution for using unsupervised machine learning effectively for fraud detection. The main difficulties include how to handle large-scale data and how to quantify the suspicion determination.
Fig. 1 is a flowchart of a method for detecting clique fraud based on a graph model according to an exemplary embodiment of the disclosure, which solves the technical problem in the related art that clique fraud is difficult to identify. As shown in Fig. 1, the method comprises:
S11: obtaining basic user data and historical suspicious user data.
S12: generating a user association graph from the obtained data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph comprise the similarity of the nodes.
S13: generating to-be-determined clique sets from the user association graph using a community partitioning algorithm.
S14: calculating the suspicion degree of the to-be-determined clique sets.
S15: outputting a determination result for the to-be-determined cliques according to the calculation result.
In step S11, the user data may be data for the various client accounts a user applies for, such as the user data from applying for a Meituan account, an Alipay account, or a WeChat account. The account may also be a bank card number the user applies for, such as a debit card or a credit card. The user data may also be data of users who pay through a payment platform, such as user data for payments made with Meituan, Alipay, or WeChat. The basic user data includes the application form data filled in by the applicant, People's Bank of China credit report query information, mobile behavior data, e-commerce data, and social data authorized by the applicant. The historical suspicious user data may include blacklist and whitelist information. A list entry can be any entity type in the network, such as an account, an address, or a telephone number. The blacklist includes fraud cases accumulated within the bank, seriously overdue accounts, or blacklists exchanged with other institutions; the whitelist includes phone numbers, addresses, and the like of VIP customers or entries manually marked as risk-free.
After the basic user data and historical suspicious user data are obtained, step S12 is executed: a user association graph is generated from the obtained data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph comprise the similarity of the nodes.
Referring to Fig. 2, generating the user association graph from the obtained data may comprise the following steps:
S121: selecting feature combinations, and the number of such groups, from the basic user data and the historical suspicious user data.
S122: generating user association subgraphs by matching features on exact equality or on fuzzy equality, and splicing the user association subgraphs together as nodes to form an unweighted user association graph.
S123: using the similarity between nodes of the unweighted user association graph as edge weights to generate a similarity-weighted user association graph.
In step S121, the features in the data may be the device ID, IP address, IMSI (international mobile subscriber identity), IMEI (international mobile equipment identity), geographic information, login time, and similar features. A feature combination is a group of at least one feature selected from the features in the data, and there is at least one such group.
After the feature combinations and the number of groups are selected, different feature combinations are linked by exact feature equality or fuzzy feature equality to form user association subgraphs. For example, if different accounts log in from the same device ID, the accounts can be associated by exact feature equality. If the IP addresses from which different accounts log in are partly identical, that is, the accounts logged in under the same local area network, the two accounts can be associated by fuzzy feature equality. After the user association subgraphs are generated, they are spliced together as nodes to form an unweighted user association graph. Then the similarity between nodes of the unweighted user association graph is used as the edge weight to generate the similarity-weighted user association graph. The similarity can be computed with a similarity measure function, and the graph can optionally be pruned based on the weight values.
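The graph construction in steps S121 to S123 can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: each account is treated directly as a node (the disclosure uses user association subgraphs as nodes), the field names `device_id` and `ip`, the "same first three octets" rule for fuzzy equality, and the Jaccard similarity are all assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical account records; field names are illustrative assumptions.
ACCOUNTS = {
    "a1": {"device_id": "d1", "ip": "10.0.0.5"},
    "a2": {"device_id": "d1", "ip": "10.0.1.9"},
    "a3": {"device_id": "d2", "ip": "10.0.0.77"},
    "a4": {"device_id": "d3", "ip": "192.168.1.2"},
}

def net_prefix(ip):
    # Assumed fuzzy key: the first three octets of an IPv4 address.
    return ip.rsplit(".", 1)[0]

def exact_match(a, b):
    # Exact feature equality: same device ID.
    return a["device_id"] == b["device_id"]

def fuzzy_match(a, b):
    # Fuzzy feature equality: IP addresses share a prefix.
    return net_prefix(a["ip"]) == net_prefix(b["ip"])

def build_unweighted_graph(accounts):
    # An edge links two accounts when either matching rule holds.
    edges = set()
    for u, v in combinations(sorted(accounts), 2):
        if exact_match(accounts[u], accounts[v]) or fuzzy_match(accounts[u], accounts[v]):
            edges.add((u, v))
    return edges

def similarity(a, b):
    # Assumed similarity measure: Jaccard index over feature tokens.
    fa = {("device", a["device_id"]), ("net", net_prefix(a["ip"]))}
    fb = {("device", b["device_id"]), ("net", net_prefix(b["ip"]))}
    return len(fa & fb) / len(fa | fb)

def build_weighted_graph(accounts, min_weight=0.0):
    # Weight every edge by similarity; drop edges below min_weight (pruning).
    weighted = {}
    for u, v in build_unweighted_graph(accounts):
        w = similarity(accounts[u], accounts[v])
        if w >= min_weight:
            weighted[(u, v)] = w
    return weighted

graph = build_weighted_graph(ACCOUNTS)
```

Raising `min_weight` corresponds to the optional pruning step: weak edges are dropped before community partitioning.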
After the user association graph is generated, step S13 is executed: to-be-determined clique sets are generated from the user association graph using a community partitioning algorithm. Referring to Fig. 3, generating the to-be-determined clique sets using the community partitioning algorithm comprises:
S131: generating n clique sets from the similarity-weighted user association graph using the community partitioning algorithm, where n is a positive integer. The community partitioning algorithm comprises a graph label propagation algorithm or the GN (Girvan-Newman) algorithm.
S132: confirming that the number of users in each clique set is less than or equal to a maximum threshold.
S133: confirming that the number of clique sets whose number of users is below a minimum threshold is less than or equal to a preset threshold. The maximum threshold is greater than the minimum threshold.
S134: determining the clique sets to be the to-be-determined clique sets.
When the number of users (for example, the number of distinct accounts) in a clique set exceeds the maximum threshold, for example when one clique set contains more than 20 distinct accounts, the community partitioning algorithm is called again to divide it so that the number of users in each clique set is less than or equal to the maximum threshold. If the number of clique sets whose number of users is below the minimum threshold exceeds the preset threshold, for example when more than 15 clique sets contain fewer than 3 distinct accounts, a hierarchical clustering algorithm is called to agglomerate the clique sets whose number of users is below the minimum threshold. Either the agglomerative or the divisive hierarchical clustering method may be chosen here.
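The size-control logic described above can be sketched as follows. This is a hedged illustration only: `split_in_half` is a hypothetical stand-in for re-running the community partitioning algorithm, the merge step uses single-linkage agglomeration over edge weights as one possible hierarchical clustering choice, and the function and parameter names are assumptions. The defaults mirror the example figures in the text (20, 3, and 15).

```python
from itertools import combinations

def split_in_half(members):
    # Placeholder splitter (an assumption): a real system would re-run the
    # community partitioning algorithm on the oversized set instead.
    ordered = sorted(members)
    mid = len(ordered) // 2
    return [set(ordered[:mid]), set(ordered[mid:])]

def link(a, b, weight):
    # Single-linkage similarity: the strongest edge between the two sets.
    return max(weight.get(frozenset((u, v)), 0.0) for u in a for v in b)

def enforce_size_limits(cliques, weight, max_size=20, min_size=3,
                        max_small=15, split=split_in_half):
    # Step 1: keep splitting any set larger than max_size.
    work, sized = list(cliques), []
    while work:
        c = work.pop()
        parts = split(c) if len(c) > max_size else [c]
        if len(parts) > 1:
            work.extend(parts)
        else:
            sized.append(c)
    # Step 2: if too many sets are smaller than min_size, merge the most
    # strongly linked pair of small sets until the count is acceptable.
    # Note: a merged set is not re-checked against max_size in this sketch.
    small = [c for c in sized if len(c) < min_size]
    rest = [c for c in sized if len(c) >= min_size]
    while len(small) > max_small and len(small) >= 2:
        i, j = max(combinations(range(len(small)), 2),
                   key=lambda p: link(small[p[0]], small[p[1]], weight))
        merged = small[i] | small[j]
        small = [c for k, c in enumerate(small) if k not in (i, j)]
        (small if len(merged) < min_size else rest).append(merged)
    return rest + small
```

Here `weight` maps `frozenset((u, v))` to an edge weight, and pairs without an entry are treated as having similarity 0.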
After the to-be-determined clique sets are generated, step S14 is executed: the suspicion degree of the to-be-determined clique sets is calculated. Ways of calculating the suspicion degree include, but are not limited to, the following three:
First calculation method: referring to Fig. 4, calculating the suspicion degree score of the to-be-determined clique set comprises the following steps:
S141a: selecting a target data feature from the data features, wherein the difference between the distribution of the target data feature in the to-be-determined clique set and its distribution in the overall data exceeds a target threshold. Here, the overall data refers to all of the basic user data.
S142a: calculating the suspicion degree score of the to-be-determined clique set according to the proportion of the target data feature in the to-be-determined clique set.
For example, suppose a client has 100 newly registered accounts on a given day, of which 8 were registered with virtual mobile phone numbers. The proportion of accounts registered with virtual mobile phone numbers among that day's newly registered accounts is then 8%. In one generated to-be-determined clique set containing 10 accounts, 7 were registered with virtual mobile phone numbers, a proportion of 70%. Comparing 70% with 8%, the difference is large. Registration with a virtual mobile phone number is therefore the target data feature, its proportion in the to-be-determined clique set is 0.7, and this proportion can be used as the suspicion degree score of the to-be-determined clique set.
Alternatively, suppose a client has 100 newly registered accounts on a given day, of which 8 were registered by historical suspicious users. The proportion of accounts registered by historical suspicious users among that day's newly registered accounts is then 8%. In one generated to-be-determined clique set containing 10 accounts, 8 were registered by historical suspicious users, a proportion of 80%. Comparing 80% with 8%, the difference is large. Registration by a historical suspicious user is therefore the target data feature, its proportion in the to-be-determined clique set is 0.8, and this proportion can be used as the suspicion degree score of the to-be-determined clique set.
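The first example above can be reproduced with a short sketch. The text does not say how the distributional difference is measured, so the absolute difference between the two proportions and the threshold value 0.3 are assumptions, as are the function and field names.

```python
def feature_share(records, feature):
    # Fraction of records for which the boolean feature is set.
    return sum(1 for r in records if r[feature]) / len(records)

def distribution_score(clique, overall, feature, target_threshold):
    # Returns the clique's share of the feature as its suspicion score when
    # the share differs from the overall share by more than the threshold
    # (absolute difference is an assumed measure); otherwise returns None
    # because the feature does not qualify as a target data feature.
    clique_share = feature_share(clique, feature)
    overall_share = feature_share(overall, feature)
    if abs(clique_share - overall_share) > target_threshold:
        return clique_share
    return None

# 10 accounts in the clique, 7 registered with a virtual mobile number.
clique = [{"virtual_number": i < 7} for i in range(10)]
# 100 new accounts that day in total, 8 with a virtual mobile number.
others = [{"virtual_number": i < 1} for i in range(90)]
overall = clique + others

score = distribution_score(clique, overall, "virtual_number", target_threshold=0.3)
```

With these figures the clique share is 0.7 against an overall share of 0.08, so the feature qualifies and the score equals the clique share.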
Second calculation method: referring to Fig. 5, calculating the suspicion degree score of the to-be-determined clique set may comprise the following steps:
S141b: extracting clique features of each to-be-determined clique set. The clique features include at least the proportion of historical suspicious users, and may also include features such as the clique size and the proportion of accounts that share a device.
S142b: inputting the clique features into a trained regression model so that the regression model outputs the suspicion degree score of the to-be-determined clique set. The regression model may be a GBDT (Gradient Boosting Decision Tree) model.
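The feature extraction in step S141b might look like the sketch below. The field names, the exact definition of the shared-device proportion, and the `score_clique` wrapper are assumptions for illustration. The trained regression model is represented only as a callable, since the text does not describe its training data or parameters; in practice it could be the prediction function of a fitted GBDT regressor.

```python
from collections import Counter

def clique_features(clique):
    # Builds the feature vector [size, suspicious share, shared-device share].
    size = len(clique)
    suspicious_share = sum(1 for a in clique if a["historical_suspect"]) / size
    device_counts = Counter(a["device_id"] for a in clique)
    # Assumed definition: an account "shares a device" when at least one
    # other account in the same clique uses the same device ID.
    shared_device_share = sum(
        1 for a in clique if device_counts[a["device_id"]] > 1
    ) / size
    return [size, suspicious_share, shared_device_share]

def score_clique(clique, model):
    # `model` is any callable mapping a feature vector to a score.
    return model(clique_features(clique))

# Hypothetical clique of four accounts.
sample_clique = [
    {"device_id": "d1", "historical_suspect": True},
    {"device_id": "d1", "historical_suspect": False},
    {"device_id": "d2", "historical_suspect": True},
    {"device_id": "d3", "historical_suspect": False},
]
features = clique_features(sample_clique)
```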
Third calculation method: referring to Fig. 6, calculating the suspicion degree score of the to-be-determined clique set may comprise the following steps:
S141c: selecting a target data feature from the data features, wherein the difference between the distribution of the target data feature in the to-be-determined clique set and its distribution in the overall data exceeds a target threshold.
S142c: calculating a first suspicion degree score of the to-be-determined clique set according to the proportion of the target data feature in the to-be-determined clique set.
S143c: extracting clique features of each to-be-determined clique set.
S144c: inputting the clique features into a trained regression model so that the regression model outputs a second suspicion degree score of the to-be-determined clique set.
S145c: calculating a comprehensive suspicion degree score of the to-be-determined clique set according to the first suspicion degree score and the second suspicion degree score.
Then, according to the calculation result, the determination result for the to-be-determined clique is output. For example, when the comprehensive suspicion degree score exceeds a preset value, the to-be-determined clique can be determined to be a fraud clique.
For example, if the first suspicion degree score of a to-be-determined clique is 0.7 and the second suspicion degree score is 0.8, the comprehensive suspicion degree score of the to-be-determined clique set can be taken as the average of the two scores, 0.75. This exceeds the preset value of 0.6, so the to-be-determined clique is determined to be a fraud clique.
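The averaging example can be written out as a small sketch. The text only says that the average of the two scores can be taken; the optional weights below are an assumed generalization, and the function names are hypothetical.

```python
def comprehensive_score(first, second, w_first=0.5, w_second=0.5):
    # Weighted average of the two suspicion scores; equal weights give the
    # plain average used in the example.
    return (w_first * first + w_second * second) / (w_first + w_second)

def is_fraud_clique(first, second, preset_value=0.6):
    # A clique is flagged when its comprehensive score exceeds the preset value.
    return comprehensive_score(first, second) > preset_value

combined = comprehensive_score(0.7, 0.8)
flagged = is_fraud_clique(0.7, 0.8)
```

For scores 0.7 and 0.8 the average is 0.75, which exceeds 0.6, so the clique is flagged.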
The present disclosure generates a user association graph from the obtained user data and generates to-be-determined clique sets using a community partitioning algorithm. By calculating the suspicion degree of a to-be-determined clique set, it can tell whether that set is a fraud clique, which solves the technical problem in the related art that clique fraud is difficult to identify. In addition, the disclosure uses both a community partitioning algorithm and a hierarchical clustering algorithm, which addresses the problems that some cliques in the partitioning result are too large and that there are too many small cliques. Furthermore, the disclosure improves the data-processing capability of the graph model by means of similarity indexing, and generates the similarity-weighted user association graph through subgraph assembly with configurable similarity edge weights. This approach is more flexible and can run in parallel, which can further improve large-scale data processing in fraud scenarios.
It is worth noting that, for simplicity of description, the method embodiment shown in FIG. 1 is expressed as a series of combined actions, but those skilled in the art should understand that the disclosure is not limited by the described order of actions. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions involved are not necessarily required by the disclosure.
FIG. 7 shows an apparatus for detecting gang fraud based on a graph model according to an exemplary embodiment of the disclosure. As shown in FIG. 7, the apparatus 300 for detecting gang fraud based on a graph model includes:
an acquisition module 310, configured to acquire basic user data and historical suspected-user data;
a first generation module 320, configured to generate a user association graph from the acquired data, where the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph include the similarity between nodes;
a second generation module 330, configured to generate to-be-determined gang sets from the user association graph using a community partitioning algorithm;
a calculation module 340, configured to calculate the suspicion degree of the to-be-determined gang sets;
an output module 350, configured to output the judgment result for the to-be-determined gang according to the calculation result.
Optionally, as shown in FIG. 8, the first generation module 320 includes:
a first selection submodule 321, configured to select feature combinations, and the number of such combinations, from the basic user data and the historical suspected-user data;
a first generation submodule 322, configured to generate the corresponding user association subgraphs by exact feature equality or fuzzy equivalence, and to splice the user association subgraphs together as nodes to generate an unweighted user association graph;
a second generation submodule 323, configured to use the similarity between nodes of the unweighted user association graph as edge weights to generate the user similarity-weighted association graph.
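As a rough illustration of what submodules 321 to 323 do, the sketch below groups users by exact equality on each feature combination, treats each group as one subgraph node, and weights edges by node similarity. The record layout, the field names, the minimum group size of two, and the use of Jaccard similarity over member sets are all assumptions made for this sketch; the disclosure does not fix a similarity measure, and fuzzy equivalence is not shown. The unweighted and weighted graphs are also built in one pass here for brevity.

```python
from collections import defaultdict
from itertools import combinations


def build_subgraphs(users, feature_combos):
    """Group users whose values agree exactly on a feature combination.

    Each group with at least two users stands in for one user association
    subgraph, i.e. one node of the association graph.
    """
    nodes = {}
    for combo in feature_combos:
        groups = defaultdict(set)
        for user in users:
            key = tuple(user.get(f) for f in combo)
            if None not in key:  # skip users missing a feature in the combo
                groups[key].add(user["id"])
        for key, members in groups.items():
            if len(members) >= 2:
                nodes[(combo, key)] = frozenset(members)
    return nodes


def jaccard(a, b):
    """Jaccard similarity of two user sets (an assumed similarity measure)."""
    return len(a & b) / len(a | b)


def build_weighted_graph(nodes, min_similarity=0.0):
    """Connect subgraph nodes and weight each edge by node similarity."""
    edges = {}
    for (n1, u1), (n2, u2) in combinations(nodes.items(), 2):
        sim = jaccard(u1, u2)
        if sim > min_similarity:
            edges[(n1, n2)] = sim
    return edges


# Hypothetical user records; "device" and "phone" are illustrative features.
users = [
    {"id": "u1", "device": "d1", "phone": "p1"},
    {"id": "u2", "device": "d1", "phone": "p1"},
    {"id": "u3", "device": "d1", "phone": "p2"},
    {"id": "u4", "device": "d2", "phone": "p2"},
]
nodes = build_subgraphs(users, [("device",), ("phone",)])
edges = build_weighted_graph(nodes)
for (a, b), w in edges.items():
    print(a, b, round(w, 3))
```

Because each feature combination is grouped independently, the grouping step could be run in parallel per combination, which is in line with the flexibility the disclosure claims for this construction.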
Optionally, as shown in FIG. 9, the second generation module 330 includes:
a third generation submodule 331, configured to generate n gang sets from the user similarity-weighted association graph using the community partitioning algorithm, where n is a positive integer;
a first confirmation submodule 332, configured to confirm that the number of users in each gang set is less than or equal to a maximum threshold;
a second confirmation submodule 333, configured to confirm that the number of gang sets whose number of users is less than a minimum threshold is less than or equal to a preset threshold;
a third confirmation submodule 334, configured to determine the gang sets as the to-be-determined gang sets.
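The two confirmation conditions can be written as a single predicate. This is a minimal sketch; the function and parameter names are illustrative and do not come from the disclosure.

```python
def partition_is_acceptable(gangs, max_size, min_size, max_small_gangs):
    """Return True when a partition meets both size conditions.

    gangs: list of sets of user ids produced by community partitioning.
    max_size: the maximum threshold on users per gang set.
    min_size: the minimum threshold; gang sets below it count as small.
    max_small_gangs: the preset threshold on the number of small gang sets.
    """
    no_oversized = all(len(g) <= max_size for g in gangs)
    small_count = sum(1 for g in gangs if len(g) < min_size)
    return no_oversized and small_count <= max_small_gangs


gangs = [{"u1", "u2", "u3"}, {"u4", "u5"}, {"u6"}]
print(partition_is_acceptable(gangs, max_size=3, min_size=2, max_small_gangs=1))
```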
Optionally, as shown in FIG. 10, the apparatus 300 for detecting gang fraud based on a graph model further includes:
a division module 360, configured to call the community partitioning algorithm on any gang set whose number of users is greater than the maximum threshold and divide it, so that the number of users in each gang set is less than or equal to the maximum threshold;
an agglomeration module 370, configured to call a hierarchical clustering algorithm to agglomerate the gang sets whose number of users is less than the minimum threshold, if the number of such gang sets is greater than the preset threshold.
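The agglomeration step of module 370 could look roughly like the greedy bottom-up merge below. The stopping rule follows the text (merge until the number of small gang sets is no greater than the preset threshold), but the pairwise similarity function, the toy `links` data, and all names are assumptions for illustration. Splitting oversized gang sets (module 360) would re-run community partitioning on the corresponding subgraph and is not shown.

```python
from itertools import combinations


def agglomerate_small_gangs(gangs, min_size, max_small_gangs, similarity):
    """Greedy agglomerative merging of undersized gang sets.

    While more than max_small_gangs gang sets have fewer than min_size
    users, merge the two most similar small gang sets into one.
    """
    gangs = [set(g) for g in gangs]
    while True:
        small = [i for i, g in enumerate(gangs) if len(g) < min_size]
        if len(small) <= max_small_gangs or len(small) < 2:
            return gangs
        i, j = max(combinations(small, 2),
                   key=lambda p: similarity(gangs[p[0]], gangs[p[1]]))
        merged = gangs[i] | gangs[j]
        gangs = [g for k, g in enumerate(gangs) if k not in (i, j)]
        gangs.append(merged)


# Hypothetical user-to-user link weights used only by this example.
links = {("a", "b"): 0.9, ("c", "d"): 0.2, ("a", "c"): 0.1}


def link_similarity(g1, g2):
    """Sum of link weights running between two gang sets (assumed measure)."""
    return sum(w for (u, v), w in links.items()
               if (u in g1 and v in g2) or (u in g2 and v in g1))


result = agglomerate_small_gangs(
    [{"a"}, {"b"}, {"c"}, {"d"}, {"e", "f", "g"}],
    min_size=2, max_small_gangs=2, similarity=link_similarity)
print(result)
```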
Optionally, the community partitioning algorithm includes a graph label propagation algorithm or the GN (Girvan-Newman) algorithm; the hierarchical clustering algorithm includes an agglomerative algorithm or a divisive algorithm.
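For readers unfamiliar with label propagation, the following is a minimal weighted variant in plain Python. It is a sketch, not the disclosure's implementation: nodes are visited in sorted order and ties are broken by the smallest label so that the result is reproducible, whereas published variants usually randomize both. The adjacency format and node names are assumptions.

```python
from collections import defaultdict


def label_propagation(adjacency, max_rounds=100):
    """Deterministic label propagation over a weighted undirected graph.

    adjacency: dict mapping node -> dict of neighbour -> edge weight.
    Returns a dict mapping node -> community label.
    """
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            if not adjacency[node]:
                continue  # an isolated node keeps its own label
            weight_by_label = defaultdict(float)
            for neighbour, weight in adjacency[node].items():
                weight_by_label[labels[neighbour]] += weight
            best = max(weight_by_label.values())
            # Deterministic tie-break: choose the smallest label.
            new_label = min(l for l, w in weight_by_label.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Two triangles joined by one weak edge (c-d); weights are similarities.
adjacency = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
labels = label_propagation(adjacency)
print(labels)
```

The weak bridge between the two triangles does not pull them into one community because each node adopts the label carrying the largest total edge weight among its neighbours.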
Optionally, as shown in FIG. 11, the calculation module 340 includes:
a second selection submodule 341a, configured to select target data features from the data features, where the difference between the distribution of a target data feature in the to-be-determined gang set and its distribution in the overall data exceeds a target threshold;
a first calculation submodule 342a, configured to calculate the suspicion score of the to-be-determined gang set according to the proportion of the target data features in the to-be-determined gang set.
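One possible reading of submodules 341a and 342a is sketched below for binary features. The disclosure does not say how the distribution difference is measured or how the proportions are turned into a score, so the absolute difference of rates and the mean of in-gang proportions are assumptions, as are the feature names and the sample data.

```python
def feature_rate(users, feature):
    """Share of users for which a binary feature is set."""
    return sum(1 for u in users if u.get(feature)) / len(users)


def select_target_features(gang_users, all_users, features, target_threshold):
    """Keep features whose rate in the gang set differs from the overall
    rate by more than target_threshold."""
    return [f for f in features
            if abs(feature_rate(gang_users, f) - feature_rate(all_users, f))
            > target_threshold]


def first_suspicion_score(gang_users, target_features):
    """Average in-gang proportion of the target features (0.0 if none)."""
    if not target_features:
        return 0.0
    rates = [feature_rate(gang_users, f) for f in target_features]
    return sum(rates) / len(rates)


# Hypothetical data: 4 users in the gang set, 10 users overall.
gang = [{"virtual_number": True, "night_login": True},
        {"virtual_number": True, "night_login": False},
        {"virtual_number": True, "night_login": True},
        {"virtual_number": False, "night_login": False}]
others = [{"virtual_number": False, "night_login": i < 3} for i in range(6)]
all_users = gang + others

targets = select_target_features(gang, all_users,
                                 ["virtual_number", "night_login"], 0.3)
print(targets, first_suspicion_score(gang, targets))
```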
Optionally, as shown in FIG. 12, the calculation module 340 includes:
a first extraction submodule 341b, configured to extract the gang features of each to-be-determined gang set;
a first input submodule 342b, configured to input the gang features into a trained regression model so that the regression model outputs the suspicion score of the to-be-determined gang set.
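The disclosure does not list the gang features or the type of regression model, so the sketch below is only one plausible instance: three hand-picked features (size, internal edge density, share of historically suspected users) scored by a logistic-regression style function. The weights and bias are placeholders standing in for a trained model, not values from the source.

```python
import math


def extract_gang_features(members, edges, suspected_users):
    """Return [size, edge density, share of historically suspected users].

    members must be a non-empty set of user ids; edges is a list of
    (user, user) pairs from the association graph.
    """
    size = len(members)
    internal = sum(1 for u, v in edges if u in members and v in members)
    possible = size * (size - 1) / 2
    density = internal / possible if possible else 0.0
    suspect_share = len(members & suspected_users) / size
    return [float(size), density, suspect_share]


def regression_score(features, weights, bias):
    """Logistic-regression style score in the open interval (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder parameters standing in for a trained model (not from the source).
WEIGHTS = [0.05, 1.5, 2.0]
BIAS = -2.0

members = {"u1", "u2", "u3", "u4"}
edges = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"), ("u3", "u4"), ("u4", "u9")]
features = extract_gang_features(members, edges, suspected_users={"u1", "u7"})
score = regression_score(features, WEIGHTS, BIAS)
print(features, round(score, 3))
```

In practice the parameters would be fitted on gang sets labelled from the historical suspected-user data, and any regression library could replace the hand-written scoring function.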
Optionally, as shown in FIG. 13, the calculation module 340 includes:
a third selection submodule 341c, configured to select target data features from the data features, where the difference between the distribution of a target data feature in the to-be-determined gang set and its distribution in the overall data exceeds a target threshold;
a second calculation submodule 342c, configured to calculate the first suspicion score of the to-be-determined gang set according to the proportion of the target data features in the to-be-determined gang set;
a second extraction submodule 343c, configured to extract the gang features of each to-be-determined gang set;
a second input submodule 344c, configured to input the gang features into a trained regression model so that the regression model outputs the second suspicion score of the to-be-determined gang set;
a third calculation submodule 345c, configured to calculate the comprehensive suspicion score of the to-be-determined gang set according to the first suspicion score and the second suspicion score.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
The disclosure also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the steps of the method for detecting gang fraud based on a graph model described in any of the above optional embodiments.
The disclosure also provides an apparatus for detecting gang fraud based on a graph model, including:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method for detecting gang fraud based on a graph model described in any of the above optional embodiments.
FIG. 14 is a block diagram of an apparatus 400 for detecting gang fraud based on a graph model according to an exemplary embodiment. As shown in FIG. 14, the apparatus 400 may include a processor 401, a memory 402, a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
The processor 401 controls the overall operation of the apparatus 400 so as to complete all or some of the steps of the above method for detecting gang fraud based on a graph model. The memory 402 stores various types of data to support operation of the apparatus 400; such data may include, for example, instructions for any application or method operating on the apparatus 400, as well as application-related data. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, for example static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 402 or sent through the communication component 405. The audio component also includes at least one loudspeaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons. The buttons may be virtual or physical. The communication component 405 is used for wired or wireless communication between the apparatus 400 and other devices. Wireless communication may use, for example, Wi-Fi, Bluetooth, near-field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method for detecting gang fraud based on a graph model.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, for example the memory 402 including program instructions, which can be executed by the processor 401 of the apparatus 400 to complete the above method for detecting gang fraud based on a graph model.
The preferred embodiments of the disclosure have been described in detail above with reference to the accompanying drawings. However, the disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the disclosure, various simple variations can be made to its technical solutions, and these simple variations all fall within the protection scope of the disclosure.
It should further be noted that the specific technical features described in the above embodiments can be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the disclosure can also be combined with one another in any way; as long as a combination does not depart from the idea of the disclosure, it should likewise be regarded as content disclosed by the disclosure.
Claims (10)
1. A method for detecting gang fraud based on a graph model, characterized in that the method comprises:
acquiring basic user data and historical suspected-user data;
generating a user association graph according to the acquired data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph include the similarity between nodes;
generating to-be-determined gang sets from the user association graph using a community partitioning algorithm;
calculating the suspicion degree of the to-be-determined gang sets;
outputting the judgment result for the to-be-determined gang according to the calculation result.
2. The method according to claim 1, characterized in that generating the user association graph comprises:
selecting feature combinations, and the number of such combinations, from the basic user data and the historical suspected-user data;
generating the corresponding user association subgraphs by exact feature equality or fuzzy equivalence, and splicing the user association subgraphs together as nodes to generate an unweighted user association graph;
using the similarity between nodes of the unweighted user association graph as edge weights to generate a user similarity-weighted association graph.
3. The method according to claim 2, characterized in that generating the to-be-determined gang sets using the community partitioning algorithm comprises:
generating n gang sets from the user similarity-weighted association graph using the community partitioning algorithm, n being a positive integer;
confirming that the number of users in each gang set is less than or equal to a maximum threshold;
confirming that the number of gang sets whose number of users is less than a minimum threshold is less than or equal to a preset threshold;
determining the gang sets as the to-be-determined gang sets.
4. The method according to claim 3, characterized by further comprising:
calling the community partitioning algorithm on any gang set whose number of users is greater than the maximum threshold and dividing it, so that the number of users in each gang set is less than or equal to the maximum threshold;
if the number of gang sets whose number of users is less than the minimum threshold is greater than the preset threshold, calling a hierarchical clustering algorithm to agglomerate the gang sets whose number of users is less than the minimum threshold.
5. The method according to claim 4, characterized in that the community partitioning algorithm includes a graph label propagation algorithm or the GN algorithm, and the hierarchical clustering algorithm includes an agglomerative algorithm or a divisive algorithm.
6. The method according to claim 1, characterized in that calculating the suspicion score of the to-be-determined gang set comprises:
selecting target data features from the data features, wherein the difference between the distribution of a target data feature in the to-be-determined gang set and its distribution in the overall data exceeds a target threshold;
calculating the suspicion score of the to-be-determined gang set according to the proportion of the target data features in the to-be-determined gang set.
7. The method according to claim 1, characterized in that calculating the suspicion score of the to-be-determined gang set comprises:
extracting the gang features of each to-be-determined gang set;
inputting the gang features into a trained regression model so that the regression model outputs the suspicion score of the to-be-determined gang set.
8. An apparatus for detecting gang fraud based on a graph model, characterized in that the apparatus comprises:
an acquisition module, configured to acquire basic user data and historical suspected-user data;
a first generation module, configured to generate a user association graph according to the acquired data, wherein the nodes of the user association graph are user association subgraphs generated according to data features, and the edge weights of the user association graph include the similarity between nodes;
a second generation module, configured to generate to-be-determined gang sets from the user association graph using a community partitioning algorithm;
a calculation module, configured to calculate the suspicion degree of the to-be-determined gang sets;
an output module, configured to output the judgment result for the to-be-determined gang according to the calculation result.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
10. An apparatus for detecting gang fraud based on a graph model, characterized by comprising:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910239821.3A CN110070364A (en) | 2019-03-27 | 2019-03-27 | Method and apparatus, storage medium based on the fraud of graph model detection clique |
PCT/CN2019/124807 WO2020192184A1 (en) | 2019-03-27 | 2019-12-12 | Gang fraud detection based on graph model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910239821.3A CN110070364A (en) | 2019-03-27 | 2019-03-27 | Method and apparatus, storage medium based on the fraud of graph model detection clique |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110070364A true CN110070364A (en) | 2019-07-30 |
Family
ID=67366679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910239821.3A Pending CN110070364A (en) | 2019-03-27 | 2019-03-27 | Method and apparatus, storage medium based on the fraud of graph model detection clique |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110070364A (en) |
WO (1) | WO2020192184A1 (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827159A (en) * | 2019-11-11 | 2020-02-21 | 上海交通大学 | Financial medical insurance fraud early warning method, device and terminal based on relational graph |
CN111090729A (en) * | 2019-12-16 | 2020-05-01 | 深圳市卡牛科技有限公司 | Method, device, server and storage medium for identifying fraudulent group |
CN111325350A (en) * | 2020-02-19 | 2020-06-23 | 第四范式(北京)技术有限公司 | Suspicious tissue discovery system and method |
CN111339436A (en) * | 2020-02-11 | 2020-06-26 | 腾讯科技(深圳)有限公司 | Data identification method, device, equipment and readable storage medium |
CN111401959A (en) * | 2020-03-18 | 2020-07-10 | 多点(深圳)数字科技有限公司 | Risk group prediction method and device, computer equipment and storage medium |
CN111428217A (en) * | 2020-04-12 | 2020-07-17 | 中信银行股份有限公司 | Method and device for identifying cheat group, electronic equipment and computer readable storage medium |
CN111476662A (en) * | 2020-04-13 | 2020-07-31 | 中国工商银行股份有限公司 | Anti-money laundering identification method and device |
CN111709756A (en) * | 2020-06-16 | 2020-09-25 | 银联商务股份有限公司 | Method and device for identifying suspicious communities, storage medium and computer equipment |
WO2020192184A1 (en) * | 2019-03-27 | 2020-10-01 | 北京三快在线科技有限公司 | Gang fraud detection based on graph model |
CN111931047A (en) * | 2020-07-31 | 2020-11-13 | 中国平安人寿保险股份有限公司 | Artificial intelligence-based black product account detection method and related device |
CN112184334A (en) * | 2020-10-27 | 2021-01-05 | 北京嘀嘀无限科技发展有限公司 | Method, apparatus, device and medium for determining problem users |
CN112308694A (en) * | 2020-11-24 | 2021-02-02 | 拉卡拉支付股份有限公司 | Method and device for discovering cheating group |
CN112651764A (en) * | 2019-10-12 | 2021-04-13 | 武汉斗鱼网络科技有限公司 | Target user identification method, device, equipment and storage medium |
CN112907308A (en) * | 2019-11-19 | 2021-06-04 | 京东数字科技控股有限公司 | Data detection method and device and computer readable storage medium |
CN113326178A (en) * | 2021-06-22 | 2021-08-31 | 北京奇艺世纪科技有限公司 | Abnormal account number propagation method and device, electronic equipment and storage medium |
WO2021169631A1 (en) * | 2020-02-29 | 2021-09-02 | 深圳壹账通智能科技有限公司 | Fraudster identification method, apparatus and device, and storage medium |
CN113592517A (en) * | 2021-08-09 | 2021-11-02 | 深圳前海微众银行股份有限公司 | Method and device for identifying cheating passenger groups, terminal equipment and computer storage medium |
CN114820219A (en) * | 2022-05-23 | 2022-07-29 | 杭银消费金融股份有限公司 | Complex network-based cheating community identification method and system |
CN115150052A (en) * | 2022-06-08 | 2022-10-04 | 北京天融信网络安全技术有限公司 | Method, device, equipment and storage medium for tracking and identifying attack group |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8694979B2 (en) * | 2012-06-26 | 2014-04-08 | International Business Machines Corporation | Efficient egonet computation in a weighted directed graph |
CN105812195B (en) * | 2014-12-30 | 2019-05-07 | 阿里巴巴集团控股有限公司 | The method and apparatus of computer identification batch account |
CN107194623B (en) * | 2017-07-20 | 2021-01-05 | 深圳市分期乐网络科技有限公司 | Group partner fraud discovery method and device |
CN108681936B (en) * | 2018-04-26 | 2021-11-02 | 浙江邦盛科技有限公司 | Fraud group identification method based on modularity and balanced label propagation |
CN110070364A (en) * | 2019-03-27 | 2019-07-30 | 北京三快在线科技有限公司 | Method and apparatus, storage medium based on the fraud of graph model detection clique |
-
2019
- 2019-03-27 CN CN201910239821.3A patent/CN110070364A/en active Pending
- 2019-12-12 WO PCT/CN2019/124807 patent/WO2020192184A1/en active Application Filing
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020192184A1 (en) * | 2019-03-27 | 2020-10-01 | 北京三快在线科技有限公司 | Gang fraud detection based on graph model |
CN112651764B (en) * | 2019-10-12 | 2023-03-31 | 武汉斗鱼网络科技有限公司 | Target user identification method, device, equipment and storage medium |
CN112651764A (en) * | 2019-10-12 | 2021-04-13 | 武汉斗鱼网络科技有限公司 | Target user identification method, device, equipment and storage medium |
CN110827159B (en) * | 2019-11-11 | 2023-11-03 | 上海交通大学 | Financial medical insurance fraud early warning method, device and terminal based on relation diagram |
CN110827159A (en) * | 2019-11-11 | 2020-02-21 | 上海交通大学 | Financial medical insurance fraud early warning method, device and terminal based on relational graph |
CN112907308A (en) * | 2019-11-19 | 2021-06-04 | 京东数字科技控股有限公司 | Data detection method and device and computer readable storage medium |
CN111090729B (en) * | 2019-12-16 | 2024-04-09 | 深圳市卡牛科技有限公司 | Identification method, device, server and storage medium for fraudulent group |
CN111090729A (en) * | 2019-12-16 | 2020-05-01 | 深圳市卡牛科技有限公司 | Method, device, server and storage medium for identifying fraudulent group |
CN111339436A (en) * | 2020-02-11 | 2020-06-26 | 腾讯科技(深圳)有限公司 | Data identification method, device, equipment and readable storage medium |
CN111325350B (en) * | 2020-02-19 | 2023-09-29 | 第四范式(北京)技术有限公司 | Suspicious tissue discovery system and method |
CN111325350A (en) * | 2020-02-19 | 2020-06-23 | 第四范式(北京)技术有限公司 | Suspicious tissue discovery system and method |
WO2021169631A1 (en) * | 2020-02-29 | 2021-09-02 | 深圳壹账通智能科技有限公司 | Fraudster identification method, apparatus and device, and storage medium |
CN111401959A (en) * | 2020-03-18 | 2020-07-10 | 多点(深圳)数字科技有限公司 | Risk group prediction method and device, computer equipment and storage medium |
CN111401959B (en) * | 2020-03-18 | 2023-09-29 | 多点(深圳)数字科技有限公司 | Risk group prediction method, apparatus, computer device and storage medium |
CN111428217A (en) * | 2020-04-12 | 2020-07-17 | 中信银行股份有限公司 | Method and device for identifying cheat group, electronic equipment and computer readable storage medium |
CN111476662A (en) * | 2020-04-13 | 2020-07-31 | 中国工商银行股份有限公司 | Anti-money laundering identification method and device |
CN111709756A (en) * | 2020-06-16 | 2020-09-25 | 银联商务股份有限公司 | Method and device for identifying suspicious communities, storage medium and computer equipment |
CN111931047A (en) * | 2020-07-31 | 2020-11-13 | 中国平安人寿保险股份有限公司 | Artificial intelligence-based black product account detection method and related device |
CN112184334A (en) * | 2020-10-27 | 2021-01-05 | 北京嘀嘀无限科技发展有限公司 | Method, apparatus, device and medium for determining problem users |
CN112308694A (en) * | 2020-11-24 | 2021-02-02 | 拉卡拉支付股份有限公司 | Method and device for discovering cheating group |
CN113326178A (en) * | 2021-06-22 | 2021-08-31 | 北京奇艺世纪科技有限公司 | Abnormal account number propagation method and device, electronic equipment and storage medium |
CN113592517A (en) * | 2021-08-09 | 2021-11-02 | 深圳前海微众银行股份有限公司 | Method and device for identifying cheating passenger groups, terminal equipment and computer storage medium |
CN114820219A (en) * | 2022-05-23 | 2022-07-29 | 杭银消费金融股份有限公司 | Complex network-based cheating community identification method and system |
CN115150052A (en) * | 2022-06-08 | 2022-10-04 | 北京天融信网络安全技术有限公司 | Method, device, equipment and storage medium for tracking and identifying attack group |
Also Published As
Publication number | Publication date |
---|---|
WO2020192184A1 (en) | 2020-10-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||