CN110209660A - Fraud clique mining method, device and electronic equipment - Google Patents
- Publication number
- CN110209660A (application number CN201910496109.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- community
- rule
- clique
- fraud
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Marketing (AREA)
- Computational Linguistics (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Quality & Reliability (AREA)
- Technology Law (AREA)
- Entrepreneurship & Innovation (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a fraud clique mining method, device and electronic equipment. In the method, the initial data is first preprocessed to obtain processed data. Preprocessing deletes noise data before the graph is built, which reduces the influence of noise data on graph construction. Preprocessing and partitioning are then iterated until the number of nodes in every group of the resulting community data is no greater than a preset threshold, or until the number of nodes in every group of community data no longer changes. Each group of partitioned community data is then visualized to obtain a community network graph. As a result, no community network graph contains a large number of nodes, which matches the characteristics of fraud cliques and makes visualization and the subsequent fraud clique evaluation easier. The fraud cliques finally determined are accurate, which alleviates the technical problem that existing fraud clique mining methods have poor accuracy.
Description
Technical field
The present invention relates to the technical field of computers, and more particularly to a fraud clique mining method, device and electronic equipment.
Background
With the spread of loans and loan-like services in the financial sector, cases of clique fraud are gradually increasing and cause varying degrees of loss to investors, companies and the state. Enterprises and institutions urgently need a technical solution that can detect clique fraud cases, so that fraud can be prevented before it happens, or stopped promptly and the losses recovered.
Existing fraud clique mining methods first partition the data with a community discovery algorithm to obtain multiple communities, and then perform fraud clique evaluation on each community to determine the fraud cliques among them. However, existing community discovery algorithms partition data purely according to the characteristics of the network topology and never consider actual requirements. As a result, the partition contains many large communities. These large communities are inconvenient for the subsequent fraud clique evaluation, and they also contain noise nodes or relationships, so the fraud cliques finally determined are not accurate.
In summary, existing fraud clique mining methods have poor accuracy.
Summary of the invention
The purpose of the present invention is to provide a fraud clique mining method, device and electronic equipment, so as to alleviate the technical problem that existing fraud clique mining methods have poor accuracy.
A fraud clique mining method provided by the present invention comprises: obtaining initial data; preprocessing the initial data according to a preprocessing rule to obtain processed data, wherein the processed data does not contain noise data; saving the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database; partitioning the graph data with a community discovery algorithm to obtain multiple groups of community data; analyzing the multiple groups of community data based on a preset rule base, and determining a target preprocessing rule according to the analysis result; taking the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the initial data, and returning to the step of preprocessing the initial data according to the preprocessing rule, until every group of community data meets a preset condition, the preset condition being that the number of nodes in every group of community data is no greater than a preset threshold, or that the number of nodes in every group of community data no longer changes; visualizing every group of community data that meets the preset condition to obtain a community network graph; and performing fraud clique evaluation on the community network graph with preset fraud clique mining rules, and determining from the evaluation result whether the clique corresponding to the community network graph is a fraud clique.
Further, the preprocessing rule includes a preset data cleaning rule and a noise identification rule, and preprocessing the initial data according to the preprocessing rule to obtain the processed data comprises: cleaning the initial data according to the preset data cleaning rule to obtain cleaned data; identifying the noise data in the cleaned data based on the noise identification rule; and removing the noise data from the cleaned data to obtain the processed data.
Further, the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, and community discovery algorithms based on label propagation.
Further, the preset rule base contains the correspondence between preset features and processing rules, and analyzing the multiple groups of community data based on the preset rule base and determining the target preprocessing rule according to the analysis result comprises: performing feature extraction on the multiple groups of community data to obtain the target features of the multiple groups of community data; matching the target features against the preset features in the preset rule base; determining, according to the matching result, the target processing rule among the processing rules that corresponds to the target features; and taking the target processing rule as the target preprocessing rule.
Further, the community network graph contains individual nodes, attribute nodes, and the association relationships between the individual nodes and the attribute nodes.
Further, performing fraud clique evaluation on the community network graph with the preset fraud clique mining rules, and determining from the evaluation result whether the clique corresponding to the community network graph is a fraud clique, comprises: analyzing how far the community network graph conforms to the preset fraud clique mining rules; scoring the community network graph according to this degree of conformance to obtain the fraud clique score of the community network graph; and determining from the fraud clique score whether the clique corresponding to the community network graph is a fraud clique.
The present invention also provides a fraud clique mining device, comprising: an obtaining module, for obtaining initial data; a preprocessing module, for preprocessing the initial data according to a preprocessing rule to obtain processed data, wherein the processed data does not contain noise data; a saving module, for saving the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database; a partition processing module, for partitioning the graph data with a community discovery algorithm to obtain multiple groups of community data; an analysis module, for analyzing the multiple groups of community data based on a preset rule base and determining a target preprocessing rule according to the analysis result; a return execution module, for taking the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the initial data, and returning to the step of preprocessing the initial data according to the preprocessing rule, until every group of community data meets a preset condition, the preset condition being that the number of nodes in every group of community data is no greater than a preset threshold, or that the number of nodes in every group of community data no longer changes; a visualization processing module, for visualizing every group of community data that meets the preset condition to obtain a community network graph; and a fraud clique evaluation module, for performing fraud clique evaluation on the community network graph with preset fraud clique mining rules and determining from the evaluation result whether the clique corresponding to the community network graph is a fraud clique.
Further, the preprocessing rule includes a preset data cleaning rule and a noise identification rule, and the preprocessing module comprises: a data cleaning unit, for cleaning the initial data according to the preset data cleaning rule to obtain cleaned data; an identification unit, for identifying the noise data in the cleaned data based on the noise identification rule; and a removal unit, for removing the noise data from the cleaned data to obtain the processed data.
Further, the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, and community discovery algorithms based on label propagation.
The present invention also provides electronic equipment comprising a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor implements the steps of the method described above when executing the computer program.
In the embodiments of the present invention, initial data is first obtained. The initial data is then preprocessed according to a preprocessing rule to obtain processed data, and the processed data is saved to a graph database to obtain graph data corresponding to the storage structure of the graph database. The graph data is partitioned with a community discovery algorithm to obtain multiple groups of community data. The multiple groups of community data are analyzed based on a preset rule base, and a target preprocessing rule is determined according to the analysis result. The target preprocessing rule is taken as the preprocessing rule and the multiple groups of community data as the initial data, and the method returns to the step of preprocessing the initial data according to the preprocessing rule, until every group of community data meets a preset condition. The preset condition is that the number of nodes in every group of community data is no greater than a preset threshold, or that the number of nodes in every group of community data no longer changes. Finally, every group of community data that meets the preset condition is visualized to obtain a community network graph, fraud clique evaluation is performed on the community network graph with preset fraud clique mining rules, and the evaluation result determines whether the clique corresponding to the community network graph is a fraud clique.
As described above, in the fraud clique mining method of the present invention the initial data is first preprocessed to obtain processed data. Preprocessing deletes noise data before the graph is built, which reduces the influence of noise data on graph construction. Preprocessing and partitioning are iterated until the number of nodes in every group of the resulting community data is no greater than the preset threshold, that is, until the size of every group matches the characteristics of a fraud clique; or until the number of nodes in every group no longer changes, that is, until the groups cannot be divided any further because the partition has reached its smallest units. Each group of partitioned community data is then visualized to obtain a community network graph. As a result, no community network graph contains a large number of nodes, which matches the characteristics of fraud cliques and makes visualization and the subsequent fraud clique evaluation easier. The fraud cliques finally determined are accurate, which alleviates the technical problem that existing fraud clique mining methods have poor accuracy.
Brief description of the drawings
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a fraud clique mining method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method, provided by an embodiment of the present invention, for preprocessing the initial data according to the preprocessing rule to obtain the processed data;
Fig. 3 is a flowchart of the method, provided by an embodiment of the present invention, for analyzing the multiple groups of community data based on the preset rule base and determining the target preprocessing rule according to the analysis result;
Fig. 4 is a flowchart of the method, provided by an embodiment of the present invention, for performing fraud clique evaluation on the community network graph with the preset fraud clique mining rules and determining from the evaluation result whether the clique corresponding to the community network graph is a fraud clique;
Fig. 5 is a schematic diagram of a fraud clique mining device provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the embodiments. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
To facilitate understanding of this embodiment, a fraud clique mining method disclosed in the embodiments of the present invention is first described in detail.
Embodiment one:
According to an embodiment of the present invention, an embodiment of a fraud clique mining method is provided. It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in a different order.
Fig. 1 is a flowchart of a fraud clique mining method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102, obtain initial data;
In the embodiments of the present invention, the initial data can be acquired through many channels. For example, it can be the electronic application data submitted by a user when applying for a loan-related service, or handwritten application data submitted by the user; it can also be relevant data crawled from the Internet. The embodiments of the present invention do not specifically limit how the initial data is acquired.
Step S104, preprocess the initial data according to the preprocessing rule to obtain processed data, wherein the processed data does not contain noise data;
After the initial data is obtained, it is preprocessed according to the preprocessing rule. This process is described in detail later and is not repeated here.
Step S106, save the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
After the processed data is obtained, it is saved to the graph database, which yields graph data corresponding to the storage structure of the graph database.
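As a rough illustration of step S106, the sketch below turns cleaned application records into individual nodes, attribute nodes and the relationships between them. The record fields (`applicant`, `phone`, `company`) and the in-memory adjacency dictionary that stands in for a graph database are assumptions made for illustration; the source does not name a specific database or schema.

```python
from collections import defaultdict

def build_graph(records, attribute_fields):
    """Build an undirected graph linking individual nodes to attribute nodes.

    Nodes are tuples such as ("individual", "u1") or ("phone", "555-0101").
    The adjacency dict is a hypothetical stand-in for a graph database.
    """
    graph = defaultdict(set)
    for record in records:
        person = ("individual", record["applicant"])
        graph[person]  # touch the key so isolated individuals still appear
        for field in attribute_fields:
            value = record.get(field)
            if value is None:
                continue
            attribute = (field, value)
            graph[person].add(attribute)
            graph[attribute].add(person)
    return dict(graph)

# Hypothetical cleaned records.
records = [
    {"applicant": "u1", "phone": "555-0101", "company": "A"},
    {"applicant": "u2", "phone": "555-0101", "company": "A"},
    {"applicant": "u3", "phone": "555-0199", "company": "B"},
]
graph = build_graph(records, ["phone", "company"])
```

Two applicants who share a phone number end up as neighbours of the same attribute node, which is what later lets a community discovery algorithm group them.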
Step S108, partition the graph data with a community discovery algorithm to obtain multiple groups of community data;
After the graph data is obtained, it is partitioned with a community discovery algorithm to obtain multiple groups of community data. Specifically, the community discovery algorithm includes, but is not limited to, either of the following: the Louvain community discovery algorithm, and community discovery algorithms based on label propagation.
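To make step S108 concrete, here is a minimal sketch of a label propagation style partition over an adjacency dictionary. It is a simplified, deterministic variant written for illustration (fixed visiting order, ties broken by the smallest label); it is not the Louvain algorithm, and practical implementations of label propagation usually visit nodes in random order. The graph and node names are made up.

```python
from collections import Counter

def label_propagation(graph, max_rounds=100):
    """Partition `graph` (node -> set of neighbours) into communities.

    Every node starts with its own label. In each round a node adopts the
    most common label among its neighbours; ties go to the smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[n] for n in graph[node])
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return sorted(communities.values(), key=lambda c: sorted(c))

# Two separate triangles should come out as two communities.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = label_propagation(graph)
```

With this tie-breaking rule, weakly connected groups tend to merge into one community, which mirrors the large-community problem that the iterative preprocessing is meant to address.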
Step S110, analyze the multiple groups of community data based on the preset rule base, and determine the target preprocessing rule according to the analysis result;
This process is described in detail later and is not repeated here.
Step S112, take the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the initial data, and return to the step of preprocessing the initial data according to the preprocessing rule, until every group of community data meets the preset condition; the preset condition is that the number of nodes in every group of community data is no greater than a preset threshold, or that the number of nodes in every group of community data no longer changes;
That is, the preprocessing step is executed again, and the loop repeats until the resulting groups of community data meet the preset condition.
Step S114, visualize every group of community data that meets the preset condition to obtain a community network graph;
Once every group of community data meets the preset condition, visualizing each group yields the community network graph.
Step S116, perform fraud clique evaluation on the community network graph with the preset fraud clique mining rules, and determine from the evaluation result whether the clique corresponding to the community network graph is a fraud clique.
As the above description shows, in the fraud clique mining method of the present invention the initial data is first preprocessed to obtain processed data. Preprocessing deletes noise data before the graph is built, which reduces the influence of noise data on graph construction. Preprocessing and partitioning are iterated until the number of nodes in every group of the resulting community data is no greater than the preset threshold, that is, until the size of every group matches the characteristics of a fraud clique; or until the number of nodes in every group no longer changes, that is, until the groups cannot be divided any further because the partition has reached its smallest units. Each group of partitioned community data is then visualized to obtain a community network graph. As a result, no community network graph contains a large number of nodes, which matches the characteristics of fraud cliques and makes visualization and the subsequent fraud clique evaluation easier. The fraud cliques finally determined are accurate, which alleviates the technical problem that existing fraud clique mining methods have poor accuracy.
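The loop formed by steps S104 to S112 can be sketched as follows. This is a toy under stated assumptions, not the patented implementation: connected components stand in for the community discovery algorithm, the "rule base" simply flags the best-connected attribute of the largest community as noise, the cleaned edge list is reused as the next round's initial data, and all names and the `attr:` prefix are hypothetical.

```python
from collections import defaultdict

def preprocess(edges, banned_attributes):
    """S104: drop every edge that touches an attribute flagged as noise."""
    return [(p, a) for p, a in edges if a not in banned_attributes]

def build_graph(edges):
    """S106: adjacency dict standing in for a graph database."""
    graph = defaultdict(set)
    for person, attribute in edges:
        graph[person].add(attribute)
        graph[attribute].add(person)
    return graph

def connected_components(graph):
    """S108: stand-in for a community discovery algorithm."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities

def pick_noise_attribute(graph, communities):
    """S110: toy rule base, flag the best-connected attribute of the largest community."""
    largest = max(communities, key=len)
    attributes = [n for n in largest if n.startswith("attr:")]
    return max(attributes, key=lambda a: (len(graph[a]), a))

def mine_communities(edges, max_nodes, max_rounds=10):
    banned, previous_sizes, communities = set(), None, []
    for _ in range(max_rounds):
        edges = preprocess(edges, banned)
        graph = build_graph(edges)
        communities = connected_components(graph)
        sizes = sorted(len(c) for c in communities)
        # Preset condition: all communities small enough, or sizes unchanged.
        if all(s <= max_nodes for s in sizes) or sizes == previous_sizes:
            break
        previous_sizes = sizes
        banned.add(pick_noise_attribute(graph, communities))  # S112: loop again
    return communities

edges = [
    ("u1", "attr:company=A"), ("u2", "attr:company=A"),
    ("u3", "attr:company=A"), ("u4", "attr:company=A"),
    ("u1", "attr:phone=555-0101"), ("u2", "attr:phone=555-0101"),
    ("u3", "attr:phone=555-0102"), ("u4", "attr:phone=555-0102"),
]
result = mine_communities(edges, max_nodes=3)
```

In this toy run the first round produces one seven-node community held together by the shared company attribute; once that attribute is flagged as noise, the second round yields two three-node communities and the loop stops.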
The above gives a brief introduction to the fraud clique mining method of the present invention. The specific details involved are described below.
In an optional embodiment of the present invention, the preprocessing rule includes a preset data cleaning rule and a noise identification rule. With reference to Fig. 2, step S104, preprocessing the initial data according to the preprocessing rule to obtain the processed data, includes the following steps:
Step S201, clean the initial data according to the preset data cleaning rule to obtain cleaned data;
Specifically, the preset data cleaning rule can include field format checks, format conversion, error correction, handling of decimal places and so on. The embodiments of the present invention do not specifically limit the preset data cleaning rule.
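The kinds of cleaning rule listed for step S201 might look like the sketch below. The field names, the 11-digit phone check and the specific corrections are assumptions chosen for illustration; the source names only the rule categories.

```python
import re

def clean_record(record):
    """Apply a few illustrative cleaning rules to one application record."""
    cleaned = {}
    # Field format check and format conversion: keep digits only, and
    # reject phone numbers that are not 11 digits long (assumed format).
    phone = re.sub(r"\D", "", record.get("phone", ""))
    cleaned["phone"] = phone if len(phone) == 11 else None
    # Error correction: collapse stray whitespace and normalise case.
    cleaned["company"] = " ".join(record.get("company", "").split()).upper() or None
    # Decimal place handling: round the amount to two decimal places.
    cleaned["amount"] = round(float(record["amount"]), 2)
    return cleaned

raw = {"phone": "138-0013-8000", "company": "  acme   ltd ", "amount": "1234.5678"}
cleaned = clean_record(raw)
```

Normalising values this way matters for graph construction, because two spellings of the same phone number or company would otherwise become two different attribute nodes.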
Step S202, identify the noise data in the cleaned data based on the noise identification rule;
Specifically, the noise identification rule used in the first round is a preset noise identification rule. In later rounds, the noise identification rule is the target preprocessing rule determined from the result of analyzing, based on the preset rule base, the multiple groups of community data obtained in the previous round. The process of determining the target preprocessing rule is described in detail later.
The noise identification rule identifies the noise data in the cleaned data. Noise data means useless attribute nodes or association relationships that cause interference.
That is, applying individuals are associated into a large community through certain attributes, but the attributes that gather them into a large community are not characteristic of a small-scale fraud clique.
For example, 10,000 subjects belong to company A and all of them have applied for loans. Based on the attribute "company A", these 10,000 subjects would be partitioned into one large community, but that community is certainly not a fraud clique. The attribute node "company A" can therefore be removed. The 10,000 subjects are then no longer associated with one another through this attribute node, which avoids forming a useless large community and facilitates the subsequent mining of fraud cliques.
Step S203, remove the noise data from the cleaned data to obtain the processed data.
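Steps S202 and S203 can be illustrated with the company A example scaled down to five individuals. The degree threshold used here as the noise identification rule is an assumption; the source only says that such attribute nodes are identified by rule and removed.

```python
def find_noise_attributes(graph, max_degree):
    """S202: flag attribute nodes linked to more than `max_degree` individuals."""
    return {
        node for node, neighbours in graph.items()
        if node[0] != "individual" and len(neighbours) > max_degree
    }

def remove_nodes(graph, noise):
    """S203: drop the noise nodes and every relationship that touches them."""
    return {
        node: neighbours - noise
        for node, neighbours in graph.items()
        if node not in noise
    }

# Five individuals share company A; only two of them share a phone number.
individuals = [("individual", f"u{i}") for i in range(5)]
company, phone = ("company", "A"), ("phone", "555-0101")
graph = {person: {company} for person in individuals}
graph[company] = set(individuals)
graph[phone] = set(individuals[:2])
for person in individuals[:2]:
    graph[person].add(phone)

noise = find_noise_attributes(graph, max_degree=3)
pruned = remove_nodes(graph, noise)
```

After pruning, only the two individuals who share a phone number remain connected, so the large employer-based community no longer forms.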
The above describes the data preprocessing process in detail. The process of analyzing the multiple groups of community data based on the preset rule base is described in detail below.
In an optional embodiment of the present invention, the preset rule base contains the correspondence between preset features and processing rules. With reference to Fig. 3, step S110, analyzing the multiple groups of community data based on the preset rule base and determining the target preprocessing rule according to the analysis result, includes the following steps:
Step S301, perform feature extraction on the multiple groups of community data to obtain the target features of the multiple groups of community data;
Specifically, feature extraction on the multiple groups of community data is based on the preset features in the preset rule base. For example, if the preset feature is "belongs to one company", each group of community data is checked for whether its members belong to one company, which yields the feature (that is, the target feature) of whether each group of community data belongs to one company.
Step S302, match the target features against the preset features in the preset rule base;
Step S303, determine, according to the matching result, the target processing rule among the processing rules that corresponds to the target features;
Step S304, take the target processing rule as the target preprocessing rule.
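Steps S301 to S304 amount to a lookup from detected features to processing rules. The sketch below assumes a community is a list of member records and that a rule base entry pairs a feature detector with the name of a processing rule; the feature names, rule names and record fields are hypothetical.

```python
def shares_value(field):
    """Build a preset feature: do all members of a community share `field`?"""
    def feature(community):
        values = {member.get(field) for member in community}
        return len(values) == 1 and None not in values
    return feature

# Hypothetical rule base: preset feature -> (detector, processing rule).
RULE_BASE = {
    "same_company": (shares_value("company"), "drop_attribute:company"),
    "same_wifi": (shares_value("wifi"), "drop_attribute:wifi"),
}

def target_preprocessing_rules(communities, rule_base=RULE_BASE):
    rules = []
    for name, (detector, processing_rule) in rule_base.items():
        # S301/S302: extract the feature for each community and match it.
        if any(detector(community) for community in communities):
            # S303/S304: the matched processing rule becomes a target rule.
            rules.append(processing_rule)
    return rules

communities = [
    [{"company": "A", "wifi": "w1"}, {"company": "A", "wifi": "w2"}],
    [{"company": "B", "wifi": "w3"}, {"company": "C", "wifi": "w4"}],
]
rules = target_preprocessing_rules(communities)
```

Here the first community is held together by a shared employer, so the rule that drops the company attribute is selected for the next preprocessing round.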
The above describes in detail the process of determining the target preprocessing rule. The fraud clique evaluation process is described in detail below.
In an optional embodiment of the present invention, with reference to Fig. 4, step S116, performing fraud clique evaluation on the community network graph with the preset fraud clique mining rules and determining from the evaluation result whether the clique corresponding to the community network graph is a fraud clique, includes the following steps:
Step S401, analyze how far the community network graph conforms to the preset fraud clique mining rules;
Specifically, the community network graph contains individual nodes, attribute nodes, and the association relationships between individual nodes and attribute nodes. The preset fraud clique mining rules are rules summarized from the analysis of actual fraud cliques. They are adjustable and are not illustrated further here.
Step S402, score the community network graph according to its degree of conformance to obtain the fraud clique score of the community network graph;
In practice, each community network graph can be scored according to the percentage to which it conforms to the preset fraud clique mining rules (that is, its degree of conformance). Alternatively, a weight can be preset for each preset fraud clique mining rule. The percentage to which a community network graph conforms to a given rule is multiplied by that rule's weight to give the graph's score under that rule, and the scores under all preset fraud clique mining rules are added up to give the fraud clique score of the community network graph.
Other implementations are of course possible; the embodiments of the present invention do not specifically limit the scoring process.
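The weighted scoring option described for step S402 is a weighted sum, sketched below. The rule names, weights and conformance percentages are made-up values for illustration.

```python
def fraud_clique_score(conformance, weights):
    """Sum of (rule weight x conformance percentage) over all preset rules.

    `conformance` maps a rule name to a value between 0.0 and 1.0; a rule
    that is missing from `conformance` counts as 0.0.
    """
    return sum(weights[rule] * conformance.get(rule, 0.0) for rule in weights)

# Hypothetical preset rules and weights.
weights = {"shared_device": 0.5, "shared_contact": 0.3, "night_applications": 0.2}
conformance = {"shared_device": 1.0, "shared_contact": 0.5, "night_applications": 0.0}
score = fraud_clique_score(conformance, weights)
```

Keeping the weights in a plain mapping reflects the statement that the mining rules are adjustable: changing how much a rule counts does not require changing the scoring code.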
Step S403, determine from the fraud clique score whether the clique corresponding to the community network graph is a fraud clique.
After the fraud clique scores are obtained, all community network graphs can be sorted in descending order of fraud clique score to obtain a ranked sequence of community network graphs, and the cliques corresponding to the top N community network graphs in the sequence are taken as fraud cliques, where N is a positive integer greater than 1;
Alternatively, after the fraud clique scores are obtained, each score can be compared with a preset score threshold. If the fraud clique score is greater than the preset score threshold, the clique corresponding to that community network graph is taken as a fraud clique.
Likewise, the embodiments of the present invention do not specifically limit the process of determining fraud cliques.
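Both selection strategies described for step S403 are short to express. The community names and scores below are hypothetical.

```python
def top_n_cliques(scores, n):
    """Rank community network graphs by score, highest first, and keep the top n."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]

def cliques_above_threshold(scores, threshold):
    """Keep every community network graph whose score exceeds the threshold."""
    return [name for name, score in scores.items() if score > threshold]

scores = {"community_1": 0.65, "community_2": 0.9, "community_3": 0.2}
top_two = top_n_cliques(scores, 2)
flagged = cliques_above_threshold(scores, 0.6)
```

The top N approach always returns a fixed number of candidates for review, while the threshold approach returns however many communities score high enough, which may be none.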
In the fraud group mining method of the present invention, the original data is first preprocessed to obtain processed data. This preprocessing removes noise data before the graph is built, which reduces the influence of noise data on graph construction. The method also iterates the preprocessing and partitioning steps until the number of nodes in every group of community data among the finally obtained groups is not greater than the preset threshold, or until the number of nodes in every group of community data no longer changes. In other words, iteration continues until the number of nodes in every group of community data matches the characteristics of a fraud group, or until the community data cannot be divided any further (that is, the partition has reached its minimum units). Visualization processing is then performed on each fully partitioned group of community data to obtain community network graphs. As a result, no community network graph contains a large number of nodes, each graph fits the characteristics of a fraud group, and the graphs are convenient to visualize and to evaluate subsequently. The fraud groups finally determined therefore have good accuracy, which alleviates the technical problem of poor accuracy in existing fraud group mining methods.
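The iteration and its two stopping conditions can be sketched as a small loop. This is a sketch under assumptions: `refine` stands in for one full round of preprocessing plus partitioning, `max_rounds` is a safety cap that the patent does not mention, and `halve_oversized` is a toy placeholder rather than a real community discovery algorithm.

```python
from typing import Callable, List, Set

Community = Set[str]
# One refinement round: preprocess the communities again and re-partition them.
RefineFn = Callable[[List[Community]], List[Community]]


def iterate_partition(communities: List[Community], refine: RefineFn,
                      max_nodes: int, max_rounds: int = 50) -> List[Community]:
    for _ in range(max_rounds):
        sizes = sorted(len(c) for c in communities)
        if all(size <= max_nodes for size in sizes):
            break  # condition 1: every community is small enough
        refined = refine(communities)
        if sorted(len(c) for c in refined) == sizes:
            break  # condition 2: node counts no longer change
        communities = refined
    return communities


def halve_oversized(communities: List[Community]) -> List[Community]:
    """Toy stand-in for a refinement round: split communities above 2 nodes."""
    result: List[Community] = []
    for community in communities:
        members = sorted(community)
        if len(members) > 2:
            middle = len(members) // 2
            result += [set(members[:middle]), set(members[middle:])]
        else:
            result.append(set(members))
    return result


start = [{"a", "b", "c", "d", "e", "f", "g", "h"}]
final = iterate_partition(start, halve_oversized, max_nodes=2)
final_sizes = [len(c) for c in final]
```

In this toy run the single 8-node community is split twice, after which every community has 2 nodes and the first stopping condition ends the loop.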
Embodiment two:
The embodiment of the present invention also provides a fraud group mining device, which is described in detail below.
Fig. 5 is a schematic diagram of a fraud group mining device according to an embodiment of the present invention. As shown in Fig. 5, the fraud group mining device mainly includes an acquisition module 10, a preprocessing module 20, a saving module 30, a partition processing module 40, an analysis module 50, a return execution module 60, a visualization processing module 70 and a fraud group evaluation module 80, wherein:
An acquisition module, configured to acquire original data;
A preprocessing module, configured to preprocess the original data according to a preprocessing rule to obtain processed data, wherein the processed data does not contain noise data;
A saving module, configured to save the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
A partition processing module, configured to partition the graph data by a community discovery algorithm to obtain multiple groups of community data;
An analysis module, configured to analyze the multiple groups of community data based on a preset rule library and to determine a target preprocessing rule according to the analysis result;
A return execution module, configured to take the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the original data, and to return to the step of preprocessing the original data according to the preprocessing rule, until every group of community data meets a preset condition; the preset condition includes: the number of nodes in every group of community data is not greater than a preset threshold, or the number of nodes in every group of community data no longer changes;
A visualization processing module, configured to perform visualization processing on every group of community data that meets the preset condition to obtain a community network graph;
A fraud group evaluation module, configured to evaluate the community network graph against preset fraud group mining rules and to determine, according to the evaluation result, whether the group corresponding to the community network graph is a fraud group.
In the fraud group mining device of the present invention, the original data is first preprocessed to obtain processed data. This preprocessing removes noise data before the graph is built, which reduces the influence of noise data on graph construction. The device also iterates the preprocessing and partitioning steps until the number of nodes in every group of community data among the finally obtained groups is not greater than the preset threshold, or until the number of nodes in every group of community data no longer changes. In other words, iteration continues until the number of nodes in every group of community data matches the characteristics of a fraud group, or until the community data cannot be divided any further (that is, the partition has reached its minimum units). Visualization processing is then performed on each fully partitioned group of community data to obtain community network graphs. As a result, no community network graph contains a large number of nodes, each graph fits the characteristics of a fraud group, and the graphs are convenient to visualize and to evaluate subsequently. The fraud groups finally determined therefore have good accuracy, which alleviates the technical problem of poor accuracy in existing fraud group mining methods.
Optionally, the preprocessing rule includes a preset data cleaning rule and a noise identification rule, and the preprocessing module includes:
A data cleaning unit, configured to clean the original data according to the preset data cleaning rule to obtain cleaned data;
An identification unit, configured to identify noise data in the cleaned data based on the noise identification rule;
A removal unit, configured to remove the noise data from the cleaned data to obtain the processed data.
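The three units above (cleaning, noise identification, noise removal) can be illustrated with a minimal sketch. The record layout (`person`, `phone`), the cleaning rule (trim, drop incomplete records, de-duplicate) and the noise rule (an attribute value shared by more than `max_sharers` individuals) are all assumptions chosen for illustration; the patent does not fix concrete rules here.

```python
from collections import Counter
from typing import Dict, List, Set

Record = Dict[str, str]


def clean(records: List[Record]) -> List[Record]:
    """Hypothetical cleaning rule: trim, drop incomplete rows, de-duplicate."""
    seen = set()
    cleaned = []
    for record in records:
        person = record.get("person", "").strip()
        phone = record.get("phone", "").strip()
        if not person or not phone or (person, phone) in seen:
            continue
        seen.add((person, phone))
        cleaned.append({"person": person, "phone": phone})
    return cleaned


def identify_noise(records: List[Record], max_sharers: int) -> Set[str]:
    """Hypothetical noise rule: a phone shared by too many people is noise."""
    counts = Counter(record["phone"] for record in records)
    return {phone for phone, count in counts.items() if count > max_sharers}


def remove_noise(records: List[Record], noisy: Set[str]) -> List[Record]:
    return [record for record in records if record["phone"] not in noisy]


raw = [
    {"person": " p1", "phone": "111"},
    {"person": "p1", "phone": "111"},   # duplicate after trimming
    {"person": "p2", "phone": "000"},
    {"person": "p3", "phone": "000"},
    {"person": "p4", "phone": "000"},
    {"person": "p5", "phone": ""},      # incomplete record
    {"person": "p2", "phone": "111"},
]
cleaned = clean(raw)
noisy = identify_noise(cleaned, max_sharers=2)
processed = remove_noise(cleaned, noisy)
```

Removing such widely shared values before the graph is built keeps a single placeholder value from linking otherwise unrelated individuals into one oversized community.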
Optionally, the community discovery algorithm includes, but is not limited to, any of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
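As an illustration of the second option, the following is a small pure-Python sketch of label propagation. It is a simplified variant, not the implementation used in the patent: nodes are visited in sorted order and ties are broken deterministically (keep the current label if it is tied for most frequent, otherwise take the smallest label) so that the result is reproducible, whereas typical implementations randomize the order and the tie-breaking. The edge list is a made-up example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def label_propagation(adjacency: Dict[str, Set[str]],
                      max_rounds: int = 100) -> List[Set[str]]:
    labels = {node: node for node in adjacency}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue  # isolated node keeps its own label
            best = max(counts.values())
            if counts[labels[node]] == best:
                continue  # current label is already among the most frequent
            labels[node] = min(l for l, c in counts.items() if c == best)
            changed = True
        if not changed:
            break
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(groups.values(), key=sorted)


# Two triangles joined by the single edge c-z.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("x", "y"), ("x", "z"), ("y", "z"), ("c", "z")]
communities = label_propagation(build_adjacency(edges))
```

On this example the two triangles end up as two separate communities. Because label propagation is sensitive to visiting order and tie-breaking, results on real data can differ between runs of a randomized implementation.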
Optionally, the preset rule library includes correspondences between preset features and processing rules, and the analysis module includes:
A feature extraction unit, configured to perform feature extraction on the multiple groups of community data to obtain target features of the multiple groups of community data;
A matching unit, configured to match the target features against the preset features in the preset rule library;
A first determination unit, configured to determine, among the processing rules and according to the matching result, the target processing rule corresponding to the target features;
A setting unit, configured to take the target processing rule as the target preprocessing rule.
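The lookup performed by the analysis module can be sketched as a mapping from preset features to processing rules. Everything named here is hypothetical: the feature names, the rule names, the community summary fields and the two limits are invented for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical preset rule library: preset feature -> processing rule name.
RULE_LIBRARY: Dict[str, str] = {
    "hub_attribute": "drop_attributes_shared_by_many",
    "oversized_community": "tighten_noise_threshold",
}


def extract_features(communities: List[Dict[str, int]],
                     max_nodes: int, max_degree: int) -> Set[str]:
    """Derive target features from simple per-community summaries."""
    features: Set[str] = set()
    for community in communities:
        if community["node_count"] > max_nodes:
            features.add("oversized_community")
        if community["max_attribute_degree"] > max_degree:
            features.add("hub_attribute")
    return features


def select_target_rules(features: Set[str],
                        library: Dict[str, str]) -> List[str]:
    """Match target features against the library and return the matched rules."""
    return [library[name] for name in sorted(features) if name in library]


summaries = [
    {"node_count": 120, "max_attribute_degree": 80},
    {"node_count": 6, "max_attribute_degree": 3},
]
features = extract_features(summaries, max_nodes=50, max_degree=20)
target_rules = select_target_rules(features, RULE_LIBRARY)
```

The selected rules would then serve as the preprocessing rules for the next round of preprocessing and partitioning.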
Optionally, the community network graph includes: individual nodes, attribute nodes, and the associations between the individual nodes and the attribute nodes.
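One possible in-memory representation of such a community network graph is sketched below. The class and the example node names (applicants, a phone number, a device identifier) are assumptions for illustration; the text above only states that the graph holds individual nodes, attribute nodes and their associations.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class CommunityGraph:
    individuals: Set[str] = field(default_factory=set)
    attributes: Set[str] = field(default_factory=set)
    # Each association links one individual node to one attribute node.
    associations: Set[Tuple[str, str]] = field(default_factory=set)

    def associate(self, individual: str, attribute: str) -> None:
        self.individuals.add(individual)
        self.attributes.add(attribute)
        self.associations.add((individual, attribute))

    def individuals_sharing(self, attribute: str) -> List[str]:
        """Individuals linked to the given attribute node, in sorted order."""
        return sorted(i for i, a in self.associations if a == attribute)

    def node_count(self) -> int:
        return len(self.individuals) + len(self.attributes)


graph = CommunityGraph()
graph.associate("applicant_1", "phone:0001")
graph.associate("applicant_2", "phone:0001")
graph.associate("applicant_2", "device:abcd")
sharing = graph.individuals_sharing("phone:0001")
```

Keeping individuals and attributes as separate node sets makes it straightforward to count nodes for the preset threshold and to list which individuals share a given attribute.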
Optionally, the fraud group evaluation module includes:
An analysis unit, configured to analyze how well the community network graph satisfies the preset fraud group mining rules;
A scoring unit, configured to score the community network graph according to how well the rules are satisfied, to obtain the fraud group score of the community network graph;
A second determination unit, configured to determine, according to the fraud group score, whether the group corresponding to the community network graph is a fraud group.
For the specific content of embodiment two, refer to the description in embodiment one above; details are not repeated here.
Embodiment three:
The embodiment of the present invention provides an electronic device. Referring to Fig. 6, the electronic device includes a processor 90, a memory 91, a bus 92 and a communication interface 93; the processor 90, the communication interface 93 and the memory 91 are connected through the bus 92. The processor 90 is configured to execute executable modules stored in the memory 91, such as a computer program. When executing the computer program, the processor implements the steps of the method described in the method embodiment.
The memory 91 may include a high-speed random access memory (RAM, Random Access Memory) and may also include a non-volatile memory, for example at least one disk memory. The communication connection between the network element of this system and at least one other network element is realized through at least one communication interface 93 (which may be wired or wireless), and the Internet, a wide area network, a local network, a metropolitan area network and the like can be used.
The bus 92 may be an ISA bus, a PCI bus, an EISA bus or the like. The bus can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one bidirectional arrow is shown in Fig. 6, but this does not mean that there is only one bus or only one type of bus.
The memory 91 is used to store a program, and the processor 90 executes the program after receiving an execution instruction. The method performed by the flow-defined device disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 90 or implemented by the processor 90.
The processor 90 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 90 or by instructions in the form of software. The processor 90 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory or a register. The storage medium is located in the memory 91; the processor 90 reads the information in the memory 91 and completes the steps of the above method in combination with its hardware.
In another embodiment, a computer-readable medium having non-volatile program code executable by a processor is also provided; the program code causes the processor to execute the steps of the method described in embodiment one above.
The computer program product of the fraud group mining method, device and electronic equipment provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method described in the foregoing method embodiment; for the specific implementation, refer to the method embodiment, and details are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system and device described above can be found in the corresponding processes of the foregoing method embodiment; details are not repeated here.
In addition, in the description of the embodiments of the present invention, unless otherwise expressly specified and limited, the terms "installed", "linked" and "connected" should be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or an internal connection between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific situation.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore cannot be understood as limiting the present invention. In addition, the terms "first", "second" and "third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A fraud group mining method, characterized by comprising:
Acquiring original data;
Preprocessing the original data according to a preprocessing rule to obtain processed data, wherein the processed data does not contain noise data;
Saving the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
Partitioning the graph data by a community discovery algorithm to obtain multiple groups of community data;
Analyzing the multiple groups of community data based on a preset rule library, and determining a target preprocessing rule according to the analysis result;
Taking the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the original data, and returning to the step of preprocessing the original data according to the preprocessing rule, until every group of community data meets a preset condition; the preset condition includes: the number of nodes in every group of community data is not greater than a preset threshold, or the number of nodes in every group of community data no longer changes;
Performing visualization processing on every group of community data that meets the preset condition to obtain a community network graph;
Evaluating the community network graph against preset fraud group mining rules, and determining, according to the evaluation result, whether the group corresponding to the community network graph is a fraud group.
2. The method according to claim 1, characterized in that the preprocessing rule includes a preset data cleaning rule and a noise identification rule, and preprocessing the original data according to the preprocessing rule to obtain processed data comprises:
Cleaning the original data according to the preset data cleaning rule to obtain cleaned data;
Identifying noise data in the cleaned data based on the noise identification rule;
Removing the noise data from the cleaned data to obtain the processed data.
3. The method according to claim 1, characterized in that the community discovery algorithm includes, but is not limited to, any of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
4. The method according to claim 1, characterized in that the preset rule library includes correspondences between preset features and processing rules, and analyzing the multiple groups of community data based on the preset rule library and determining a target preprocessing rule according to the analysis result comprises:
Performing feature extraction on the multiple groups of community data to obtain target features of the multiple groups of community data;
Matching the target features against the preset features in the preset rule library;
Determining, among the processing rules and according to the matching result, the target processing rule corresponding to the target features;
Taking the target processing rule as the target preprocessing rule.
5. The method according to claim 1, characterized in that the community network graph includes: individual nodes, attribute nodes, and the associations between the individual nodes and the attribute nodes.
6. The method according to claim 1, characterized in that evaluating the community network graph against preset fraud group mining rules and determining, according to the evaluation result, whether the group corresponding to the community network graph is a fraud group comprises:
Analyzing how well the community network graph satisfies the preset fraud group mining rules;
Scoring the community network graph according to how well the rules are satisfied, to obtain the fraud group score of the community network graph;
Determining, according to the fraud group score, whether the group corresponding to the community network graph is a fraud group.
7. A fraud group mining device, characterized by comprising:
An acquisition module, configured to acquire original data;
A preprocessing module, configured to preprocess the original data according to a preprocessing rule to obtain processed data, wherein the processed data does not contain noise data;
A saving module, configured to save the processed data to a graph database to obtain graph data corresponding to the storage structure of the graph database;
A partition processing module, configured to partition the graph data by a community discovery algorithm to obtain multiple groups of community data;
An analysis module, configured to analyze the multiple groups of community data based on a preset rule library and to determine a target preprocessing rule according to the analysis result;
A return execution module, configured to take the target preprocessing rule as the preprocessing rule and the multiple groups of community data as the original data, and to return to the step of preprocessing the original data according to the preprocessing rule, until every group of community data meets a preset condition; the preset condition includes: the number of nodes in every group of community data is not greater than a preset threshold, or the number of nodes in every group of community data no longer changes;
A visualization processing module, configured to perform visualization processing on every group of community data that meets the preset condition to obtain a community network graph;
A fraud group evaluation module, configured to evaluate the community network graph against preset fraud group mining rules and to determine, according to the evaluation result, whether the group corresponding to the community network graph is a fraud group.
8. The device according to claim 7, characterized in that the preprocessing rule includes a preset data cleaning rule and a noise identification rule, and the preprocessing module includes:
A data cleaning unit, configured to clean the original data according to the preset data cleaning rule to obtain cleaned data;
An identification unit, configured to identify noise data in the cleaned data based on the noise identification rule;
A removal unit, configured to remove the noise data from the cleaned data to obtain the processed data.
9. The device according to claim 7, characterized in that the community discovery algorithm includes, but is not limited to, any of the following: the Louvain community discovery algorithm, or a community discovery algorithm based on label propagation.
10. An electronic device, comprising a memory and a processor, the memory storing a computer program that can run on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910496109.1A CN110209660B (en) | 2019-06-10 | 2019-06-10 | Cheating group mining method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910496109.1A CN110209660B (en) | 2019-06-10 | 2019-06-10 | Cheating group mining method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110209660A (en) | 2019-09-06 |
CN110209660B (en) | 2021-12-24 |
Family
ID=67791653
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910496109.1A | Active | CN110209660B (en) | 2019-06-10 | 2019-06-10 | Cheating group mining method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110209660B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150161622A1 (en) * | 2013-12-10 | 2015-06-11 | Florian Hoffmann | Fraud detection using network analysis |
CN106408413A (en) * | 2016-09-23 | 2017-02-15 | 快睿登信息科技(上海)有限公司 | Multi-cycle installment decision making method and system |
CN107194623A (en) * | 2017-07-20 | 2017-09-22 | 深圳市分期乐网络科技有限公司 | A kind of discovery method and device of clique's fraud |
US20180285876A1 (en) * | 2017-03-30 | 2018-10-04 | Ncr Corporation | Domain-specific configurable fraud prevention |
CN108764917A (en) * | 2018-05-04 | 2018-11-06 | 阿里巴巴集团控股有限公司 | It is a kind of fraud clique recognition methods and device |
CN109658222A (en) * | 2018-10-16 | 2019-04-19 | 深圳壹账通智能科技有限公司 | Risk analysis method, device, equipment and computer readable storage medium |
CN109784636A (en) * | 2018-12-13 | 2019-05-21 | 中国平安财产保险股份有限公司 | Fraudulent user recognition methods, device, computer equipment and storage medium |
CN109802915A (en) * | 2017-11-16 | 2019-05-24 | 中国移动通信集团河南有限公司 | A kind of telecommunication fraud detection processing method and device |
CN109816535A (en) * | 2018-12-13 | 2019-05-28 | 中国平安财产保险股份有限公司 | Cheat recognition methods, device, computer equipment and storage medium |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110647590A (en) * | 2019-09-23 | 2020-01-03 | 税友软件集团股份有限公司 | Target community data identification method and related device |
CN113129010A (en) * | 2020-01-10 | 2021-07-16 | 联洋国融(北京)科技有限公司 | Fraud group mining system and method based on complex network model |
CN112419074A (en) * | 2020-11-13 | 2021-02-26 | 中保车服科技服务股份有限公司 | Vehicle insurance fraud group identification method and device |
CN112288330A (en) * | 2020-11-24 | 2021-01-29 | 拉卡拉支付股份有限公司 | Method and device for identifying cheating community |
CN112910888A (en) * | 2021-01-29 | 2021-06-04 | 杭州迪普科技股份有限公司 | Illegal domain name registration group mining method and device |
CN112926990A (en) * | 2021-03-25 | 2021-06-08 | 支付宝(杭州)信息技术有限公司 | Method and device for fraud identification |
CN113240259A (en) * | 2021-04-30 | 2021-08-10 | 顶象科技有限公司 | Method and system for generating rule policy group and electronic equipment |
CN113743954A (en) * | 2021-06-29 | 2021-12-03 | 阳光保险集团股份有限公司 | Vehicle insurance risk network identification method and device, electronic equipment and medium |
CN113743954B (en) * | 2021-06-29 | 2024-04-02 | 阳光保险集团股份有限公司 | Vehicle risk network identification method and device, electronic equipment and medium |
CN113870021A (en) * | 2021-12-03 | 2021-12-31 | 北京芯盾时代科技有限公司 | Data analysis method and device, storage medium and electronic equipment |
CN113870021B (en) * | 2021-12-03 | 2022-03-08 | 北京芯盾时代科技有限公司 | Data analysis method and device, storage medium and electronic equipment |
CN117575782A (en) * | 2024-01-15 | 2024-02-20 | 杭银消费金融股份有限公司 | Leiden community discovery algorithm-based group fraud identification method |
CN117575782B (en) * | 2024-01-15 | 2024-05-07 | 杭银消费金融股份有限公司 | Leiden community discovery algorithm-based group fraud identification method |
Also Published As
Publication number | Publication date |
---|---|
CN110209660B (en) | 2021-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110209660A (en) | Cheat clique's method for digging, device and electronic equipment | |
US11636487B2 (en) | Graph decomposition for fraudulent transaction analysis | |
CN109615524B (en) | Money laundering crime group partner identification method, money laundering crime group partner identification device, computer equipment and storage medium | |
WO2021254027A1 (en) | Method and apparatus for identifying suspicious community, and storage medium and computer device | |
CN109784636A (en) | Fraudulent user recognition methods, device, computer equipment and storage medium | |
WO2015135321A1 (en) | Method and device for mining social relationship based on financial data | |
CN111222976B (en) | Risk prediction method and device based on network map data of two parties and electronic equipment | |
CN111199474B (en) | Risk prediction method and device based on network map data of two parties and electronic equipment | |
CN111210326A (en) | Method and system for constructing user portrait | |
CN107704512A (en) | Financial product based on social data recommends method, electronic installation and medium | |
CN110060087B (en) | Abnormal data detection method, device and server | |
CN108363686A (en) | A kind of character string segmenting method, device, terminal device and storage medium | |
CN105389486B (en) | A kind of authentication method based on mouse behavior | |
CN106156092A (en) | Data processing method and device | |
CN113537960B (en) | Determination method, device and equipment for abnormal resource transfer link | |
CN110427628A (en) | Web assets classes detection method and device based on neural network algorithm | |
CN111444232A (en) | Method for mining digital currency exchange address and storage medium | |
CN113093958A (en) | Data processing method and device and server | |
CN116503166A (en) | Tracking method and tracking system for transaction funds on Ethernet chain | |
CN112581271A (en) | Merchant transaction risk monitoring method, device, equipment and storage medium | |
CN114118816B (en) | Risk assessment method, apparatus, device and computer storage medium | |
CN110009056B (en) | Method and device for classifying social account numbers | |
CN111126788A (en) | Risk identification method and device and electronic equipment | |
CN115237355B (en) | Directional exchange method and system based on flash memory data identification | |
CN111242763A (en) | Method and device for determining target user group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |