CN110209660A - Fraud gang mining method, apparatus and electronic device - Google Patents

Fraud gang mining method, apparatus and electronic device

Info

Publication number
CN110209660A
Authority
CN
China
Prior art keywords
data
community
rule
clique
fraud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910496109.1A
Other languages
Chinese (zh)
Other versions
CN110209660B (en)
Inventor
张亮杰
袁力
王亚亮
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Financial Technologies Ltd Arxan Beijing
Original Assignee
Financial Technologies Ltd Arxan Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Financial Technologies Ltd Arxan Beijing
Priority to CN201910496109.1A
Publication of CN110209660A
Application granted
Publication of CN110209660B
Active (legal status): Current
Anticipated expiration (legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Technology Law (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a fraud gang mining method, apparatus and electronic device. The method first preprocesses the raw data to obtain processed data. This removes noise data before the graph is built, which reduces the influence of noise on graph construction.

Preprocessing and community partitioning are then iterated until one of two conditions holds: the number of nodes in every resulting group of community data does not exceed a preset threshold, or the number of nodes in every group no longer changes. Each fully partitioned group is then visualized to produce a community network graph.

Because none of the resulting community network graphs contains a very large number of nodes, they match the characteristics of fraud gangs. They are also easy to visualize and to evaluate in the subsequent fraud gang evaluation, so the fraud gangs finally identified are more accurate. This alleviates the technical problem of poor accuracy in existing fraud gang mining methods.

Description

Fraud gang mining method, apparatus and electronic device
Technical field
The present invention relates to the field of computer technology, and in particular to a fraud gang mining method, apparatus and electronic device.
Background art
Loans and loan-like transactions have become widespread in the financial sector, and cases of gang fraud have increased with them. These cases cause losses of varying degrees to investors, companies and the state. Enterprises and institutions urgently need a technical solution that detects gang fraud cases in time, so that losses can be prevented in advance and recovered promptly.
An existing fraud gang mining method first splits the data with a community discovery algorithm to obtain multiple communities. It then performs fraud gang evaluation on each community to identify the fraud gangs among them.

However, existing community discovery algorithms partition the data purely on the basis of network topology and do not take the actual business requirements into account. As a result, the partition contains many large communities, which are hard to handle in the subsequent fraud gang evaluation. These large communities also contain noise nodes or noise relationships, so the fraud gangs finally identified are inaccurate.
In summary, existing fraud gang mining methods have poor accuracy.
Summary of the invention
The purpose of the present invention is to provide a fraud gang mining method, apparatus and electronic device that alleviate the technical problem of poor accuracy in existing fraud gang mining methods.
The fraud gang mining method provided by the invention comprises the following:
acquiring raw data;
preprocessing the raw data according to a preprocessing rule to obtain processed data, where the processed data contains no noise data;
saving the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
partitioning the graph data with a community discovery algorithm to obtain multiple groups of community data;
analyzing the multiple groups of community data based on a preset rule library, and determining a target preprocessing rule from the analysis result;
taking the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the raw data, and returning to the step of preprocessing the raw data according to the preprocessing rule, until every group of community data satisfies a preset condition, where the preset condition is that the number of nodes in every group of community data is not greater than a preset threshold, or that the number of nodes in every group of community data no longer changes;
visualizing every group of community data that satisfies the preset condition to obtain community network graphs; and
performing fraud gang evaluation on each community network graph using preset fraud gang mining rules, and determining from the evaluation result whether the gang corresponding to the community network graph is a fraud gang.
Further, the preprocessing rule includes a preset data cleaning rule and a noise recognition rule. Preprocessing the raw data according to the preprocessing rule to obtain the processed data includes:
cleaning the raw data according to the preset data cleaning rule to obtain cleaned data;
identifying the noise data in the cleaned data based on the noise recognition rule; and
removing the noise data from the cleaned data to obtain the processed data.
Further, the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
Further, the preset rule library contains correspondences between preset features and processing rules. Analyzing the multiple groups of community data based on the preset rule library and determining the target preprocessing rule from the analysis result includes:
extracting features from the multiple groups of community data to obtain their target features;
matching the target features against the preset features in the preset rule library;
determining, from the processing rules and according to the matching result, the target processing rule corresponding to the target features; and
taking the target processing rule as the target preprocessing rule.
Further, the community network graph includes individual nodes, attribute nodes, and association relationships between the individual nodes and the attribute nodes.
Further, performing fraud gang evaluation on the community network graph using the preset fraud gang mining rules, and determining from the evaluation result whether the corresponding gang is a fraud gang, includes:
analyzing how well the community network graph satisfies the preset fraud gang mining rules;
scoring the community network graph according to that degree of satisfaction to obtain its fraud gang score; and
determining from the fraud gang score whether the gang corresponding to the community network graph is a fraud gang.
The present invention also provides a fraud gang mining apparatus, comprising:
an acquisition module for acquiring raw data;
a preprocessing module for preprocessing the raw data according to a preprocessing rule to obtain processed data, where the processed data contains no noise data;
a saving module for saving the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
a partitioning module for partitioning the graph data with a community discovery algorithm to obtain multiple groups of community data;
an analysis module for analyzing the multiple groups of community data based on a preset rule library and determining a target preprocessing rule from the analysis result;
a return execution module for taking the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the raw data, and returning to the step of preprocessing the raw data according to the preprocessing rule until every group of community data satisfies a preset condition, where the preset condition is that the number of nodes in every group of community data is not greater than a preset threshold, or that the number of nodes in every group of community data no longer changes;
a visualization module for visualizing every group of community data that satisfies the preset condition to obtain community network graphs; and
a fraud gang evaluation module for performing fraud gang evaluation on each community network graph using preset fraud gang mining rules, and determining from the evaluation result whether the gang corresponding to the community network graph is a fraud gang.
Further, the preprocessing rule includes a preset data cleaning rule and a noise recognition rule, and the preprocessing module includes:
a data cleaning unit for cleaning the raw data according to the preset data cleaning rule to obtain cleaned data;
a recognition unit for identifying the noise data in the cleaned data based on the noise recognition rule; and
a removal unit for removing the noise data from the cleaned data to obtain the processed data.
Further, the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
The present invention also provides an electronic device, including a memory and a processor. The memory stores a computer program executable on the processor, and the processor implements the steps of the method described above when executing the computer program.
In the embodiments of the present invention, the method proceeds as follows:
raw data is first acquired and preprocessed according to the preprocessing rule to obtain processed data;
the processed data is saved to a graph database to obtain graph data corresponding to the storage structure of the graph database;
the graph data is partitioned by a community discovery algorithm into multiple groups of community data;
these groups are analyzed based on the preset rule library, and a target preprocessing rule is determined from the analysis result;
the target preprocessing rule then becomes the preprocessing rule and the groups of community data become the raw data, and the preprocessing step is executed again, until every group of community data satisfies the preset condition (the number of nodes in every group is not greater than the preset threshold, or the number of nodes in every group no longer changes);
finally, every group of community data that satisfies the preset condition is visualized to obtain community network graphs, fraud gang evaluation is performed on each graph using the preset fraud gang mining rules, and the evaluation result determines whether the gang corresponding to each graph is a fraud gang.

As can be seen from the above, the fraud gang mining method of the invention preprocesses the raw data before the graph is built. Noise data is therefore deleted first, and its influence on graph construction is reduced.

The method also iterates preprocessing and partitioning. It stops when the number of nodes in every resulting group of community data is not greater than the preset threshold, which matches the characteristics of a fraud gang. It also stops when the number of nodes in every group no longer changes, which means each group is already a minimum unit that cannot be divided further.

Every fully partitioned group is then visualized to obtain a community network graph. No graph obtained in this way contains a very large number of nodes, so the graphs match the characteristics of fraud gangs and are easy to visualize and evaluate. The fraud gangs finally identified are therefore more accurate, which alleviates the poor accuracy of existing fraud gang mining methods.
Brief description of the drawings
To illustrate the specific embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. Obviously, these drawings show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a fraud gang mining method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method, provided by an embodiment of the present invention, for preprocessing raw data according to a preprocessing rule to obtain processed data;
Fig. 3 is a flowchart of a method, provided by an embodiment of the present invention, for analyzing multiple groups of community data based on a preset rule library and determining a target preprocessing rule from the analysis result;
Fig. 4 is a flowchart of a method, provided by an embodiment of the present invention, for performing fraud gang evaluation on a community network graph using preset fraud gang mining rules and determining from the evaluation result whether the corresponding gang is a fraud gang;
Fig. 5 is a schematic diagram of a fraud gang mining apparatus provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
To make the present embodiment easier to understand, a fraud gang mining method disclosed in an embodiment of the present invention is first described in detail.
Embodiment 1:
According to an embodiment of the present invention, an embodiment of a fraud gang mining method is provided. The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although the flowcharts show a logical order, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of a fraud gang mining method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: acquire raw data;
In the embodiments of the present invention, the raw data can come from many channels. Examples include the electronic application data a user submits when applying for a loan-related business, handwritten application materials the user submits, and relevant data crawled from the Internet. The embodiments of the present invention do not specifically restrict how the raw data is obtained.
Step S104: preprocess the raw data according to the preprocessing rule to obtain processed data, where the processed data contains no noise data;
After the raw data is acquired, it is preprocessed according to the preprocessing rule. This process is described in detail below and is not repeated here.
Step S106: save the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
Once the processed data is obtained, saving it to the graph database yields graph data corresponding to the graph database's storage structure.
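The patent does not specify the graph database or its storage structure. As a rough in-memory stand-in, the sketch below turns processed records into the individual-node / attribute-node structure described later for the community network graph. Each applicant becomes an individual node, each shared field value (for example a phone number or employer) becomes an attribute node, and an edge links them. The record layout, the field names and the function `build_graph` are hypothetical.

```python
from collections import defaultdict


def build_graph(records, attribute_fields):
    """Build an undirected individual/attribute graph as an adjacency map.

    Individual nodes are keyed ("person", id); attribute nodes are keyed
    (field, value), e.g. ("company", "A"). This is a hypothetical layout,
    not the patent's actual graph-database schema.
    """
    adj = defaultdict(set)
    for rec in records:
        person = ("person", rec["id"])
        person_links = adj[person]  # creates the entry even if the person has no attributes
        for field in attribute_fields:
            value = rec.get(field)
            if value in (None, ""):
                continue  # missing values must not create a shared attribute node
            attr = (field, value)
            person_links.add(attr)
            adj[attr].add(person)
    return dict(adj)


records = [
    {"id": "u1", "phone": "13800138000", "company": "A"},
    {"id": "u2", "phone": "13800138000", "company": "B"},
    {"id": "u3", "company": "A"},
]
graph = build_graph(records, ["phone", "company"])
```

A real deployment would write the same nodes and edges to the graph database rather than keep them in a Python dict.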
Step S108: partition the graph data with a community discovery algorithm to obtain multiple groups of community data;
After the graph data is obtained, a community discovery algorithm partitions it into multiple groups of community data. Specifically, the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
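The sketch below illustrates the label-propagation option. It is not Louvain, and not necessarily the variant the patent uses. It runs asynchronous label propagation over an adjacency map: every node starts with its own label and repeatedly adopts the label most common among its neighbours, until a full sweep changes nothing.

To keep the output reproducible, it visits nodes in sorted order. On a tie, a node keeps its current label if that label is among the most common ones, and otherwise takes the smallest. Common implementations randomize the visiting order instead, so results on real graphs can differ.

```python
from collections import Counter, defaultdict


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation over an adjacency map {node: set(neighbours)}.

    Nodes must be mutually comparable, because a sorted visiting order is used
    for reproducibility. Returns the communities as a list of node sets.
    """
    labels = {node: node for node in adj}  # every node starts in its own community
    order = sorted(adj)
    for _ in range(max_iter):
        changed = False
        for node in order:
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in adj[node])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            # Tie-break: keep the current label if it is among the best, else take the smallest.
            new_label = labels[node] if labels[node] in best else min(best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # no label changed during a full sweep
    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(groups.values(), key=min)
```

Labels can only spread along edges, so disconnected parts of the graph always end up in different communities.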
Step S110: analyze the multiple groups of community data based on the preset rule library, and determine a target preprocessing rule from the analysis result;
This process is described in detail below and is not repeated here.
Step S112: take the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the raw data, and return to the step of preprocessing the raw data according to the preprocessing rule, until every group of community data satisfies the preset condition. The preset condition is that the number of nodes in every group of community data is not greater than the preset threshold, or that the number of nodes in every group of community data no longer changes;
That is, the preprocessing step is executed repeatedly until the resulting groups of community data satisfy the preset condition.
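The control flow of steps S104 to S112 can be sketched as a loop. In this sketch the preprocessing step, the partitioning step (graph storage plus community discovery) and the rule-analysis step are passed in as hypothetical callables.

As an assumption, "the number of nodes no longer changes" is read as the sorted list of community sizes being identical in two consecutive rounds. The default `max_size` stands in for the unspecified preset threshold. The `max_rounds` cap is an added safeguard not mentioned in the patent.

```python
def iterative_mining(raw_data, rules, preprocess, partition, derive_rules,
                     max_size=50, max_rounds=20):
    """Iterate preprocessing and community partitioning (steps S104 to S112).

    preprocess(data, rules) -> processed data            (step S104)
    partition(processed)    -> list of node sets           (steps S106 to S108)
    derive_rules(groups)    -> target preprocessing rule   (step S110)
    All three callables are hypothetical stand-ins.
    """
    data = raw_data
    previous_sizes = None
    groups = []
    for _ in range(max_rounds):  # safety cap (assumption, not in the patent)
        groups = partition(preprocess(data, rules))
        sizes = sorted(len(g) for g in groups)
        # Preset condition: every group is small enough, or the sizes stopped changing.
        if all(s <= max_size for s in sizes) or sizes == previous_sizes:
            break
        rules = derive_rules(groups)  # the target rule becomes the new preprocessing rule
        data = groups                 # the community data becomes the new raw data (S112)
        previous_sizes = sizes
    return groups
```

Passing the steps in as functions keeps the loop independent of the graph database and of the community discovery algorithm chosen.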
Step S114: visualize every group of community data that satisfies the preset condition to obtain community network graphs;
Once every group of community data satisfies the preset condition, visualizing each group yields a community network graph.
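The patent does not name a visualization tool. One hedged option is to export each final community as a Graphviz DOT document, drawing individual nodes as ellipses and attribute nodes as boxes. The function name `to_dot` and the `(kind, value)` node layout are assumptions. The sketch does not escape quote characters inside values.

```python
def to_dot(name, adj):
    """Render one community (adjacency map of (kind, value) nodes) as Graphviz DOT text."""
    def label(node):
        return f"{node[0]}:{node[1]}"

    lines = [f'graph "{name}" {{']
    for node in sorted(adj):
        shape = "ellipse" if node[0] == "person" else "box"  # individual vs attribute node
        lines.append(f'  "{label(node)}" [shape={shape}];')
    seen = set()
    for node in sorted(adj):
        for nb in sorted(adj[node]):
            edge = frozenset((node, nb))
            if edge in seen:
                continue  # undirected graph: emit each edge once
            seen.add(edge)
            lines.append(f'  "{label(node)}" -- "{label(nb)}";')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be rendered with Graphviz, for example with `dot -Tsvg community.dot -o community.svg`.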
Step S116: perform fraud gang evaluation on each community network graph using the preset fraud gang mining rules, and determine from the evaluation result whether the gang corresponding to the community network graph is a fraud gang.
As can be seen from the above, the method preprocesses the raw data to remove noise data before the graph is built. It then iterates preprocessing and partitioning until every group of community data meets one of two conditions. Either the group has no more nodes than the preset threshold, which matches the characteristics of a fraud gang, or its node count no longer changes, meaning it is already a minimum unit that cannot be divided further.

The resulting community network graphs therefore contain no very large communities and are easy to visualize and evaluate. The fraud gangs finally identified are more accurate, which alleviates the poor accuracy of existing fraud gang mining methods.
The above briefly introduces the fraud gang mining method of the invention; the specific steps involved are described in detail below.
In an optional embodiment of the present invention, the preprocessing rule includes a preset data cleaning rule and a noise recognition rule. Referring to Fig. 2, step S104 (preprocessing the raw data according to the preprocessing rule to obtain processed data) includes the following steps:
Step S201: clean the raw data according to the preset data cleaning rule to obtain cleaned data;
Specifically, the preset data cleaning rule may include field format checking, format conversion, error correction, decimal-place handling and the like. The embodiments of the present invention do not specifically restrict the preset data cleaning rule.
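The cleaning operations listed above can be illustrated on a single application record. Everything concrete here is a hypothetical example rather than the patent's rule set:
- the field names `phone`, `company` and `amount`;
- the assumed 11-digit mobile-number format;
- the misspelling map;
- the two-decimal rounding policy.

```python
import re
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

PHONE_PATTERN = re.compile(r"1\d{10}")  # assumed format: 11-digit mobile number


def clean_record(rec, corrections=None):
    """Apply example cleaning rules to one application record (hypothetical fields)."""
    corrections = corrections or {}
    out = dict(rec)

    # Format conversion: strip spaces and hyphens from the phone number.
    phone = re.sub(r"[\s-]", "", str(rec.get("phone") or ""))
    # Field format check: discard phone numbers that do not match the pattern.
    out["phone"] = phone if PHONE_PATTERN.fullmatch(phone) else None

    # Error correction: normalise whitespace, then map known misspellings.
    company = " ".join(str(rec.get("company") or "").split())
    out["company"] = corrections.get(company, company) or None

    # Decimal-place handling: keep amounts to two decimal places.
    try:
        amount = Decimal(str(rec["amount"]).replace(",", ""))
        out["amount"] = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    except (KeyError, InvalidOperation):
        out["amount"] = None
    return out
```

`Decimal` is used instead of `float` so that rounding to two places is exact and predictable.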
Step S202: identify the noise data in the cleaned data based on the noise recognition rule;
Specifically, in the first round the noise recognition rule is a preset noise recognition rule. In each subsequent round it is the target preprocessing rule determined in the previous round, by analyzing that round's groups of community data against the preset rule library. The process of determining the target preprocessing rule is described in detail below.
The noise recognition rule identifies the noise data in the cleaned data. Noise data here means useless data, namely attribute nodes or association relationships that interfere with the analysis.
That is, some attributes link many applicants together into a large community, but a large community formed through such attributes does not have the characteristics of a small-scale fraud gang.
For example, suppose 10,000 individuals all belong to company A and all of them have applied for loans. Through the shared attribute "company A", these 10,000 individuals would be placed in one very large community, which is clearly not a fraud gang.

The attribute node for company A can therefore be removed, so that the 10,000 individuals are no longer linked through it. This avoids forming a useless large community and facilitates the subsequent fraud gang mining.
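In the spirit of the company A example, one possible noise recognition rule treats any attribute node linked to more than a preset number of nodes as noise and removes it together with its edges. The threshold `max_members`, the `is_attribute` predicate, the `(kind, value)` node layout and the function name are hypothetical. The patent leaves the concrete rule to the preset and derived rule sets.

```python
def remove_noise_attributes(adj, is_attribute, max_members):
    """Drop attribute nodes linked to more than max_members nodes, plus their edges.

    Returns the cleaned adjacency map and the set of removed (noise) nodes.
    """
    noise = {n for n, nbrs in adj.items() if is_attribute(n) and len(nbrs) > max_members}
    cleaned = {
        node: {nb for nb in nbrs if nb not in noise}  # cut edges into noise nodes
        for node, nbrs in adj.items()
        if node not in noise
    }
    return cleaned, noise
```

With company A removed, individuals who shared only that employer are no longer connected. A smaller shared attribute, such as a phone number, still links the individuals who share it.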
Step S203: remove the noise data from the cleaned data to obtain the processed data.
The above describes the data preprocessing in detail. The process of analyzing the multiple groups of community data based on the preset rule library is described in detail below.
In an optional embodiment of the present invention, the preset rule library contains correspondences between preset features and processing rules. Referring to Fig. 3, step S110 (analyzing the multiple groups of community data based on the preset rule library and determining the target preprocessing rule from the analysis result) includes the following steps:
Step S301: extract features from the multiple groups of community data to obtain their target features;
Specifically, features are extracted from the groups of community data based on the preset features in the preset rule library. For example, if a preset feature is "belongs to one company", each group of community data is checked for whether its members belong to one company. The result of that check is the target feature of the group.
Step S302: match the target features against the preset features in the preset rule library;
Step S303: determine, from the processing rules and according to the matching result, the target processing rule corresponding to the target features;
Step S304: take the target processing rule as the target preprocessing rule.
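Steps S301 to S304 amount to a lookup from extracted features to processing rules. The sketch below uses one hypothetical preset feature, "most individuals in the community share one company". It maps that feature to a hypothetical processing rule that drops the company's attribute node in the next preprocessing round. The 80% share threshold, the `(kind, value)` node layout and all names are assumptions for illustration.

```python
from collections import Counter

# Preset rule library (hypothetical): preset feature name -> processing-rule factory.
RULE_LIBRARY = {
    "shared_company": lambda attr: ("drop_attribute", attr),
}


def extract_features(community, adj, min_share=0.8):
    """Step S301: extract target features from one community (hypothetical feature set)."""
    people = [n for n in community if n[0] == "person"]
    if not people:
        return []
    companies = Counter(
        nb for p in people for nb in adj.get(p, ()) if nb[0] == "company"
    )
    return [("shared_company", company)
            for company, count in companies.items()
            if count / len(people) >= min_share]


def target_rules(communities, adj):
    """Steps S302 to S304: match target features against the library to get target rules."""
    rules = set()
    for community in communities:
        for feature, attr in extract_features(community, adj):
            make_rule = RULE_LIBRARY.get(feature)  # match against the preset features
            if make_rule is not None:
                rules.add(make_rule(attr))
    return rules
```

Storing rule factories in a dictionary lets new preset features and processing rules be added without changing the matching code.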
The above describes the determination of the target preprocessing rule in detail. The fraud gang evaluation process is described in detail below.
In an optional embodiment of the present invention, referring to Fig. 4, step S116 includes the following steps. Step S116 performs fraud gang evaluation on the community network graph using the preset fraud gang mining rules, and determines from the evaluation result whether the corresponding gang is a fraud gang.
Step S401: analyze how well the community network graph satisfies the preset fraud gang mining rules;
Specifically, a community network graph contains individual nodes, attribute nodes, and association relationships between individual nodes and attribute nodes. The preset fraud gang mining rules are summarized from the analysis of actual fraud gangs. They can be adjusted and are not enumerated here.
Step S402: score the community network graph according to its degree of satisfaction to obtain its fraud gang score;
In implementation, each community network graph can be scored according to the percentage to which it satisfies the preset fraud gang mining rules (i.e. its degree of satisfaction).

Alternatively, a weight can be assigned in advance to each preset fraud gang mining rule. The percentage to which a community network graph satisfies a given rule is multiplied by that rule's weight to obtain the graph's score under that rule. The scores under all the rules are then summed to give the graph's fraud gang score.
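The weighted variant of step S402 can be written as score = w_1·p_1 + w_2·p_2 + … + w_K·p_K. Here p_k is the fraction (0 to 1) to which the community network graph satisfies rule k, and w_k is that rule's preset weight.

The sketch below is minimal. It assumes each rule's degree of satisfaction has already been computed, and it defaults missing weights to 1, which reduces to a plain sum of satisfaction percentages. Rule names in the usage are hypothetical.

```python
def gang_score(compliance, weights=None):
    """Weighted fraud gang score for one community network graph.

    compliance: rule name -> fraction (0..1) to which the graph satisfies that rule.
    weights:    rule name -> preset weight; rules without a weight count with weight 1.
    """
    weights = weights or {}
    return sum(p * weights.get(rule, 1.0) for rule, p in compliance.items())


score = gang_score({"shared_phone": 0.5, "same_device": 1.0},
                   {"shared_phone": 2.0, "same_device": 3.0})
```

Here `score` is 0.5 × 2.0 + 1.0 × 3.0 = 4.0.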
Other implementations are of course possible; the embodiments of the present invention do not specifically restrict the scoring process.
Step S403: determine from the fraud gang score whether the gang corresponding to the community network graph is a fraud gang.
After the fraud gang scores are obtained, all community network graphs can be sorted in descending order of fraud gang score to obtain a ranked sequence. The gangs corresponding to the top N community network graphs in that sequence are taken as fraud gangs, where N is a positive integer greater than 1;
Alternatively, each fraud gang score can be compared with a preset score threshold. The gang corresponding to every community network graph whose fraud gang score exceeds the threshold is taken as a fraud gang.
Similarly, the embodiment of the present invention is to the process of determining fraud clique without concrete restriction.
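Both selection strategies of step S403 take only a few lines to sketch. The function names and community identifiers below are hypothetical.

```python
from typing import Dict, List


def top_n_cliques(scores: Dict[str, float], n: int) -> List[str]:
    """Sort community graphs by fraud clique score, highest first, and keep the first N."""
    if n <= 1:
        raise ValueError("N is described as a positive integer greater than 1")
    return sorted(scores, key=scores.__getitem__, reverse=True)[:n]


def above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep community graphs whose fraud clique score is strictly greater than the threshold."""
    return [cid for cid, s in scores.items() if s > threshold]


scores = {"community-1": 0.90, "community-2": 0.20, "community-3": 0.65}  # hypothetical
print(top_n_cliques(scores, 2))      # ['community-1', 'community-3']
print(above_threshold(scores, 0.5))  # ['community-1', 'community-3']
```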
In the fraud clique mining method of the present invention, the raw data is first preprocessed to obtain processed data. This preprocessing removes noise data before the graph is built, which reduces the influence of noise on graph construction. The method also iterates preprocessing and partitioning until each of the resulting groups of community data satisfies one of two conditions: its node count is no greater than the preset threshold, or its node count no longer changes. In other words, iteration continues until the node count of each group matches the characteristics of a fraud clique, or until the groups cannot be partitioned any further (the partition has reached its minimum unit). Each fully partitioned group of community data is then visualized to obtain a community network graph. As a result, no community network graph contains an excessive number of nodes, and each one matches the characteristics of a fraud clique. Each graph is therefore easy to visualize and to evaluate for fraud cliques. The fraud cliques finally determined are accurate, which alleviates the technical problem of poor accuracy in existing fraud clique mining methods.
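The iteration above can be sketched as a loop over pending groups. This is a simplified illustration that rests on several assumptions:

- Graphs are in-memory adjacency dicts rather than a graph database.
- `connected_components` stands in for a real community discovery algorithm such as Louvain.
- `choose_rule` stands in for the rule-library analysis that picks the next target preprocessing rule.
- `drop_hubs` is a hypothetical noise rule.
- `max_rounds` is a safety cap that the patent does not mention.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]        # undirected adjacency: node -> neighbours
Rule = Callable[[Graph], Graph]    # preprocessing rule: graph -> cleaned graph


def induced(g: Graph, nodes: Set[str]) -> Graph:
    """Subgraph of g on `nodes`, with neighbour sets trimmed to those nodes."""
    return {n: g[n] & nodes for n in nodes if n in g}


def connected_components(g: Graph) -> List[Set[str]]:
    """Stand-in partitioner only (NOT Louvain or label propagation)."""
    seen: Set[str] = set()
    parts: List[Set[str]] = []
    for start in sorted(g):
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(g[node] - comp)
        seen |= comp
        parts.append(comp)
    return parts


def mine_groups(graph: Graph, rule: Rule, choose_rule: Callable[[Graph], Rule],
                partition: Callable[[Graph], List[Set[str]]],
                max_nodes: int, max_rounds: int = 20) -> List[Set[str]]:
    """Alternate preprocessing and partitioning until every group has at most
    max_nodes nodes or its node count stops changing within a round."""
    pending: List[Tuple[Graph, Rule]] = [(graph, rule)]
    final: List[Set[str]] = []
    for _ in range(max_rounds):                  # safety cap, not part of the described method
        if not pending:
            break
        next_round: List[Tuple[Graph, Rule]] = []
        for sub, current_rule in pending:
            cleaned = current_rule(sub)          # preprocess this group
            for part in partition(cleaned):      # community discovery
                if len(part) <= max_nodes or len(part) == len(sub):
                    final.append(part)           # preset condition met
                else:                            # analyse, pick the next rule, iterate
                    piece = induced(cleaned, part)
                    next_round.append((piece, choose_rule(piece)))
        pending = next_round
    final.extend(set(sub) for sub, _ in pending)  # cap reached: keep groups as they are
    return final


def drop_hubs(limit: int) -> Rule:
    """Hypothetical noise rule: nodes with more than `limit` neighbours are noise."""
    def rule(g: Graph) -> Graph:
        return induced(g, {n for n, nb in g.items() if len(nb) <= limit})
    return rule


edges = [("u1", "phone:1"), ("u2", "phone:1"), ("u3", "dev:9"), ("u4", "dev:9"),
         ("u1", "ip:shared"), ("u2", "ip:shared"), ("u3", "ip:shared"), ("u4", "ip:shared")]
g: Graph = {}
for a, b in edges:
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)
groups = mine_groups(g, drop_hubs(3), lambda sub: drop_hubs(3), connected_components, max_nodes=3)
```

In this example, removing the shared `ip:shared` hub splits the graph into two three-node groups, and both meet the node-count condition in the first round. The loop also accepts a group whose node count does not change within a round (preprocessing removed nothing and the partitioner could not split it), which matches the second stopping condition.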
Embodiment two:
An embodiment of the present invention further provides a fraud clique mining apparatus, which is described in detail below.
Fig. 5 is a schematic diagram of a fraud clique mining apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus mainly includes an acquisition module 10, a preprocessing module 20, a saving module 30, a partition processing module 40, an analysis module 50, a return execution module 60, a visualization processing module 70, and a fraud clique evaluation module 80, wherein:
The acquisition module is configured to acquire raw data;
The preprocessing module is configured to preprocess the raw data according to a preprocessing rule to obtain processed data, wherein the processed data contains no noise data;
The saving module is configured to save the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
The partition processing module is configured to partition the graph data using a community discovery algorithm to obtain multiple groups of community data;
The analysis module is configured to analyze the multiple groups of community data based on a preset rule library and to determine a target preprocessing rule according to the analysis result;
The return execution module is configured to take the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the raw data, and to return to the step of preprocessing the raw data according to the preprocessing rule, until every group of community data satisfies a preset condition. The preset condition is that the number of nodes in each group of community data is no greater than a preset threshold, or that the number of nodes in each group of community data no longer changes;
The visualization processing module is configured to perform visualization processing on each group of community data that satisfies the preset condition to obtain a community network graph;
The fraud clique evaluation module is configured to perform fraud clique evaluation on the community network graph using preset fraud clique mining rules, and to determine, according to the evaluation result, whether the clique corresponding to the community network graph is a fraud clique.
In the fraud clique mining apparatus of the present invention, the raw data is first preprocessed to obtain processed data. This preprocessing removes noise data before the graph is built, which reduces the influence of noise on graph construction. The apparatus also iterates preprocessing and partitioning until each of the resulting groups of community data satisfies one of two conditions: its node count is no greater than the preset threshold, or its node count no longer changes. In other words, iteration continues until the node count of each group matches the characteristics of a fraud clique, or until the groups cannot be partitioned any further (the partition has reached its minimum unit). Each fully partitioned group of community data is then visualized to obtain a community network graph. As a result, no community network graph contains an excessive number of nodes, and each one matches the characteristics of a fraud clique. Each graph is therefore easy to visualize and to evaluate for fraud cliques. The fraud cliques finally determined are accurate, which alleviates the technical problem of poor accuracy in existing fraud clique mining methods.
Optionally, the preprocessing rule includes a preset data cleaning rule and a noise recognition rule, and the preprocessing module includes:
a data cleaning unit, configured to perform data cleaning on the raw data according to the preset data cleaning rule to obtain cleaned data;
a recognition unit, configured to identify the noise data in the cleaned data based on the noise recognition rule;
a removal unit, configured to remove the noise data from the cleaned data to obtain the processed data.
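A minimal sketch of the three preprocessing units on tabular records follows. It assumes each record is a dict with an ID field and a few attribute fields. The concrete rules below are hypothetical examples, not rules taken from the patent: the cleaning rule trims strings and drops ID-less and duplicate records, and the noise rule flags placeholder values and over-shared values.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical placeholder values treated as noise; real rules would be tuned per data source.
PLACEHOLDERS = {"", "null", "none", "unknown", "n/a"}


def clean(records: List[Record], key: str) -> List[Record]:
    """Cleaning unit: trim strings, drop records without `key`, drop exact duplicates."""
    out: List[Record] = []
    seen: Set[Tuple[Tuple[str, Optional[str]], ...]] = set()
    for rec in records:
        norm = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        if not norm.get(key):
            continue
        signature = tuple(sorted(norm.items()))
        if signature in seen:
            continue
        seen.add(signature)
        out.append(norm)
    return out


def identify_noise(records: List[Record], key: str, fields: List[str],
                   max_share: int) -> Set[Tuple[str, str]]:
    """Recognition unit: placeholder values, plus values shared by more than `max_share` records."""
    owners: Dict[Tuple[str, str], Set[str]] = {}
    noise: Set[Tuple[str, str]] = set()
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value is None:
                continue
            if value.lower() in PLACEHOLDERS:
                noise.add((field, value))
            else:
                owners.setdefault((field, value), set()).add(rec[key])
    noise |= {fv for fv, who in owners.items() if len(who) > max_share}
    return noise


def remove_noise(records: List[Record], noise: Set[Tuple[str, str]]) -> List[Record]:
    """Removal unit: blank out noisy attribute values so they never become graph nodes."""
    return [{k: None if (k, v) in noise else v for k, v in rec.items()} for rec in records]
```

Shared attributes are exactly the signal that fraud clique mining looks for. In practice, `max_share` would therefore be set far above any plausible clique size, so that it only catches values such as a switchboard number reused by thousands of records.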
Optionally, the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
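As an example of the label propagation option, here is a compact asynchronous label propagation sketch. It follows the generic textbook formulation: nodes are visited in random order, ties are broken randomly, and a node keeps its label when that label is already a majority label among its neighbours. It is not code from the patent, and `label_propagation` and its parameters are illustrative names.

```python
import random
from collections import Counter
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node -> neighbours


def label_propagation(g: Graph, seed: int = 0, max_iter: int = 100) -> List[Set[str]]:
    """Each node repeatedly adopts the label held by most of its neighbours
    until every node already holds such a label."""
    rng = random.Random(seed)                 # fixed seed for reproducible runs
    labels = {n: n for n in g}                # every node starts in its own community
    order = sorted(g)
    for _ in range(max_iter):
        rng.shuffle(order)                    # new random visiting order each pass
        changed = False
        for node in order:
            if not g[node]:
                continue                      # isolated node keeps its own label
            counts = Counter(labels[m] for m in g[node])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue                      # already holds a majority label: keep it
            labels[node] = rng.choice(candidates)  # random tie-break
            changed = True
        if not changed:
            break                             # converged
    groups: Dict[str, Set[str]] = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return list(groups.values())
```

A label can never cross a missing edge, so disconnected subgraphs always end up in different communities. If a third-party dependency is acceptable, NetworkX (2.8 and later) provides `louvain_communities` in `networkx.community` for the Louvain option.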
Optionally, the preset rule library contains correspondences between preset features and processing rules, and the analysis module includes:
a feature extraction unit, configured to perform feature extraction on the multiple groups of community data to obtain target features of the multiple groups of community data;
a matching unit, configured to match the target features against the preset features in the preset rule library;
a first determination unit, configured to determine, according to the matching result, the target processing rule corresponding to the target features from among the processing rules;
a setting unit, configured to take the target processing rule as the target preprocessing rule.
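The analysis module can be pictured as a lookup from extracted features to processing rules. The sketch below assumes two hypothetical features (node count and maximum degree), a hypothetical two-entry rule library, and a first-match policy. The patent does not specify any of these.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]   # undirected adjacency of one group of community data
Features = Dict[str, float]


def extract_features(g: Graph) -> Features:
    """Feature extraction unit: hypothetical target features of one group."""
    return {
        "node_count": float(len(g)),
        "max_degree": float(max((len(nb) for nb in g.values()), default=0)),
    }


# Hypothetical rule library: (preset feature, test on extracted features, processing rule).
RULE_LIBRARY: List[Tuple[str, Callable[[Features], bool], str]] = [
    ("hub_node", lambda f: f["max_degree"] > 50, "drop_high_degree_attributes"),
    ("oversized_group", lambda f: f["node_count"] > 200, "tighten_noise_threshold"),
]


def target_preprocessing_rules(groups: Dict[str, Graph],
                               default: str = "keep_current_rule") -> Dict[str, str]:
    """Matching and determination units: the first library entry whose test passes wins."""
    chosen: Dict[str, str] = {}
    for gid, g in groups.items():
        feats = extract_features(g)
        chosen[gid] = next((rule for _, test, rule in RULE_LIBRARY if test(feats)), default)
    return chosen
```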
Optionally, the community network graph contains individual nodes, attribute nodes, and the association relationships between individual nodes and attribute nodes.
Optionally, the fraud clique evaluation module includes:
an analysis unit, configured to analyze the degree to which the community network graph satisfies the preset fraud clique mining rules;
a scoring unit, configured to score the community network graph according to the degree of satisfaction to obtain the fraud clique score of the community network graph;
a second determination unit, configured to determine, according to the fraud clique score, whether the clique corresponding to the community network graph is a fraud clique.
For details of Embodiment 2, refer to the description of Embodiment 1; they are not repeated here.
Embodiment three:
An embodiment of the present invention provides an electronic device. Referring to Fig. 6, the electronic device includes a processor 90, a memory 91, a bus 92, and a communication interface 93; the processor 90, the communication interface 93, and the memory 91 are connected via the bus 92. The processor 90 is configured to execute executable modules, such as computer programs, stored in the memory 91. When the processor executes the computer program, it implements the steps of the method described in the method embodiment.
The memory 91 may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one disk storage. The communication connection between this system's network element and at least one other network element is implemented through at least one communication interface 93 (wired or wireless), which may use the Internet, a wide area network, a local area network, a metropolitan area network, and so on.
The bus 92 may be an ISA bus, a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 6, but this does not mean that there is only one bus or only one type of bus.
The memory 91 is configured to store a program, and the processor 90 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 90.
The processor 90 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 90 or by instructions in the form of software. The processor 90 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like. It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, capable of implementing or executing the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 91; the processor 90 reads the information in the memory 91 and completes the steps of the above method in combination with its hardware.
In another embodiment, a computer-readable medium having non-volatile program code executable by a processor is further provided; the program code causes the processor to perform the steps of the method described in Embodiment 1.
The computer program product of the fraud clique mining method, apparatus, and electronic device provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method described in the foregoing method embodiments. For the specific implementation, refer to the method embodiments; details are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above can be found in the corresponding processes of the foregoing method embodiments; details are not repeated here.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified and limited, the terms "installed", "linked", and "connected" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
In the description of the present invention, it should be noted that terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation; therefore, they should not be construed as limiting the present invention. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A fraud clique mining method, characterized by comprising:
Acquiring raw data;
Preprocessing the raw data according to a preprocessing rule to obtain processed data, wherein the processed data contains no noise data;
Saving the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
Partitioning the graph data using a community discovery algorithm to obtain multiple groups of community data;
Analyzing the multiple groups of community data based on a preset rule library, and determining a target preprocessing rule according to the analysis result;
Taking the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the raw data, and returning to the step of preprocessing the raw data according to the preprocessing rule, until every group of community data satisfies a preset condition, wherein the preset condition comprises: the number of nodes in each group of community data is no greater than a preset threshold, or the number of nodes in each group of community data no longer changes;
Performing visualization processing on each group of community data that satisfies the preset condition to obtain a community network graph;
Performing fraud clique evaluation on the community network graph using preset fraud clique mining rules, and determining, according to the evaluation result, whether the clique corresponding to the community network graph is a fraud clique.
2. The method according to claim 1, characterized in that the preprocessing rule comprises a preset data cleaning rule and a noise recognition rule, and preprocessing the raw data according to the preprocessing rule to obtain the processed data comprises:
Performing data cleaning on the raw data according to the preset data cleaning rule to obtain cleaned data;
Identifying the noise data in the cleaned data based on the noise recognition rule;
Removing the noise data from the cleaned data to obtain the processed data.
3. The method according to claim 1, characterized in that the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
4. The method according to claim 1, characterized in that the preset rule library contains correspondences between preset features and processing rules, and analyzing the multiple groups of community data based on the preset rule library and determining the target preprocessing rule according to the analysis result comprises:
Performing feature extraction on the multiple groups of community data to obtain target features of the multiple groups of community data;
Matching the target features against the preset features in the preset rule library;
Determining, according to the matching result, a target processing rule corresponding to the target features from among the processing rules;
Taking the target processing rule as the target preprocessing rule.
5. The method according to claim 1, characterized in that the community network graph contains individual nodes, attribute nodes, and association relationships between the individual nodes and the attribute nodes.
6. The method according to claim 1, characterized in that performing fraud clique evaluation on the community network graph using the preset fraud clique mining rules and determining, according to the evaluation result, whether the clique corresponding to the community network graph is a fraud clique comprises:
Analyzing the degree to which the community network graph satisfies the preset fraud clique mining rules;
Scoring the community network graph according to the degree of satisfaction to obtain a fraud clique score of the community network graph;
Determining, according to the fraud clique score, whether the clique corresponding to the community network graph is a fraud clique.
7. A fraud clique mining apparatus, characterized by comprising:
An acquisition module, configured to acquire raw data;
A preprocessing module, configured to preprocess the raw data according to a preprocessing rule to obtain processed data, wherein the processed data contains no noise data;
A saving module, configured to save the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
A partition processing module, configured to partition the graph data using a community discovery algorithm to obtain multiple groups of community data;
An analysis module, configured to analyze the multiple groups of community data based on a preset rule library and to determine a target preprocessing rule according to the analysis result;
A return execution module, configured to take the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the raw data, and to return to the step of preprocessing the raw data according to the preprocessing rule, until every group of community data satisfies a preset condition, wherein the preset condition comprises: the number of nodes in each group of community data is no greater than a preset threshold, or the number of nodes in each group of community data no longer changes;
A visualization processing module, configured to perform visualization processing on each group of community data that satisfies the preset condition to obtain a community network graph;
A fraud clique evaluation module, configured to perform fraud clique evaluation on the community network graph using preset fraud clique mining rules, and to determine, according to the evaluation result, whether the clique corresponding to the community network graph is a fraud clique.
8. The apparatus according to claim 7, characterized in that the preprocessing rule comprises a preset data cleaning rule and a noise recognition rule, and the preprocessing module comprises:
A data cleaning unit, configured to perform data cleaning on the raw data according to the preset data cleaning rule to obtain cleaned data;
A recognition unit, configured to identify the noise data in the cleaned data based on the noise recognition rule;
A removal unit, configured to remove the noise data from the cleaned data to obtain the processed data.
9. The apparatus according to claim 7, characterized in that the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
10. An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
CN201910496109.1A 2019-06-10 2019-06-10 Cheating group mining method and device and electronic equipment Active CN110209660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910496109.1A CN110209660B (en) 2019-06-10 2019-06-10 Cheating group mining method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110209660A true CN110209660A (en) 2019-09-06
CN110209660B CN110209660B (en) 2021-12-24

Family

ID=67791653

Country Status (1)

Country Link
CN (1) CN110209660B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant