CN109886284B - Fraud detection method and system based on hierarchical clustering - Google Patents

Fraud detection method and system based on hierarchical clustering

Info

Publication number
CN109886284B
Authority
CN
China
Prior art keywords
node
module
data
leaf
leaf node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811522918.7A
Other languages
Chinese (zh)
Other versions
CN109886284A (en)
Inventor
蒋昌俊
闫春钢
丁志军
刘关俊
张亚英
张友军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201811522918.7A
Publication of CN109886284A
Application granted
Publication of CN109886284B

Abstract

A fraud detection method and system based on hierarchical clustering. The method acquires and analyzes transaction feature information to obtain feature analysis data, and selects a clustering model according to the feature analysis data; acquires a sample data set, hierarchically clusters the sample data set according to the clustering model to construct a tree structure, and divides the sample data set into the leaf nodes of the tree structure; classifies the leaf nodes to obtain node type data; and processes the leaf nodes in the clustering tree model according to the node type data to complete fraudulent transaction detection. This addresses the technical problems of incomplete performance consideration, low detection accuracy and class imbalance in the prior art.

Description

Fraud detection method and system based on hierarchical clustering
Technical Field
The invention relates to a financial fraud detection system, in particular to a fraud detection method and system based on hierarchical clustering.
Background
With the rapid development of electronic commerce, the online transaction amount is increased rapidly, and transaction fraud events are frequent. Due to the openness of the internet environment, a fraudster can master various fraud means such as phishing websites, phone fraud, and the like; meanwhile, due to the characteristics of diversity, anonymity and the like of payment modes, fraud modes are continuously changed. Faced with these problems, it is difficult for financial companies to detect fraudulent transactions through conventional rule-based expert systems, which causes serious economic losses to companies and individuals. Therefore, it is of great practical significance to research how to establish an effective transaction fraud detection model.
To address the increasingly serious problem of transaction fraud, many machine learning models have been applied to fraudulent transaction detection, including classification models such as the support vector machine (SVM), K-nearest neighbor (KNN) and random forest. However, the number of legitimate transaction samples in a transaction data set is much larger than the number of fraudulent transaction samples; that is, the data set exhibits class imbalance, which greatly reduces the classification performance of conventional models. Four main factors give rise to this problem: imbalance ratio, sample size, separability and intra-class sub-clustering. Existing improvements reduce the negative influence of class imbalance on conventional classification models at two levels, the data level and the algorithm level. Data-level methods mainly resample the data to change the ratio of positive to negative samples in the data set, but this carries a risk of under-fitting or over-fitting. Algorithm-level methods mainly modify the structure of an existing classification model or introduce a cost-sensitive function so that the model leans toward learning the minority-class samples during training, but such methods lack generality and are highly complex. Moreover, both kinds of method in essence consider only one of the four factors, the imbalance ratio, and ignore the other three.
In conclusion, the prior art has the technical problems of incomplete performance consideration, low detection accuracy and unbalanced category.
Disclosure of Invention
In view of the above disadvantages of the prior art, an object of the present invention is to provide a fraud detection method and system based on hierarchical clustering, which solve the technical problems of incomplete performance consideration, low detection accuracy and unbalanced classification in the prior art. A fraud detection method based on hierarchical clustering comprises the following steps: acquiring and analyzing transaction characteristic information to obtain characteristic analysis data, and selecting a clustering model according to the characteristic analysis data; acquiring a sample data set, hierarchically clustering the sample data set according to a clustering model to construct a tree structure, and dividing the sample data set into leaf nodes of the tree structure; classifying the leaf nodes to obtain node type data; and processing leaf nodes in the clustering tree model according to the node type data to finish fraud transaction detection.
In one embodiment of the present invention, obtaining and analyzing the transaction feature information to obtain feature analysis data, and selecting the clustering model according to the feature analysis data includes: acquiring an actual data set, and extracting transaction characteristic information in the actual data set; obtaining feature analysis data based on separability analysis of the transaction feature information; processing the characteristic analysis data into distribution judgment data; and selecting a clustering model according to the distribution judgment data.
In an embodiment of the present invention, acquiring a sample data set, hierarchically clustering the sample data set according to a clustering model to construct a tree structure, and partitioning the sample data set into leaf nodes of the tree structure, includes: creating a tree structure; acquiring and storing a sample data set and node condition data of leaf nodes; selecting applicable processing logic of the current leaf node according to the node condition data; hierarchically clustering the current node according to the applicable processing logic and dividing it into the tree structure; and iterating the preceding steps until the sample data set is completely divided into leaf nodes of the tree structure.
In one embodiment of the present invention, classifying leaf nodes to obtain node type data includes: acquiring all leaf nodes in a tree structure; extracting category information, balance ratio data and sample number information of leaf nodes; classifying the current leaf nodes according to the category information, the balance ratio data and the sample number information; and acquiring node type data of the current leaf node, and circularly executing the steps until all the leaf nodes are classified into single-class leaf nodes, class balance leaf nodes and leaf nodes containing abnormal samples.
In an embodiment of the present invention, processing leaf nodes in a clustering tree model according to node type data to complete fraud transaction detection includes: acquiring node type data, and selecting an applicable processing mode of a node according to the node type data; and traversing and processing the leaf nodes in the tree structure according to an applicable processing mode.
In an embodiment of the present invention, traversing and processing leaf nodes in a tree structure according to an applicable processing manner includes: judging the type of the current leaf node according to the node type data; if the current leaf node is a single-category node, directly returning the type of the leaf node; if the current leaf node is a category balancing node, training samples in the leaf node by using a preset classification method; if the current leaf node is a leaf node containing an abnormal sample, detecting the leaf node by using preset abnormal detection logic; the foregoing operations are performed on leaf nodes in the tree structure.
In an embodiment of the present invention, a fraud detection system based on hierarchical clustering is characterized by comprising: the system comprises a clustering model selection module, a tree structure module, a leaf node classification module and a fraud detection module; the cluster model selection module is used for acquiring and analyzing the transaction characteristic information to obtain characteristic analysis data and selecting a cluster model according to the characteristic analysis data; the tree structure module is used for acquiring the sample data set, hierarchically clustering the sample data set according to the clustering model to construct a tree structure, and dividing the sample data set into leaf nodes of the tree structure, and the tree structure module is connected with the clustering model selection module; the leaf node classification module is used for classifying the leaf nodes to obtain node type data and is connected with the tree structure module; and the fraud detection module is used for processing the leaf nodes in the clustering tree model according to the node type data to complete fraud transaction detection, and is connected with the leaf node classification module.
In an embodiment of the present invention, the cluster model selecting module includes: the system comprises a transaction characteristic extraction module, a characteristic analysis module, an analysis data processing module and a model selection module; the transaction characteristic extraction module is used for acquiring an actual data set and extracting transaction characteristic information in the actual data set; the characteristic analysis module is used for obtaining characteristic analysis data based on separability analysis of the transaction characteristic information, and the transaction characteristic extraction module is connected with the characteristic analysis module; the analysis data processing module is used for processing the characteristic analysis data into distribution judgment data and is connected with the characteristic analysis module; and the model selection module is used for selecting the clustering model according to the distribution judgment data and is connected with the analysis data processing module.
In one embodiment of the present invention, the tree structure module includes: a cluster tree creating module, a node condition acquisition module, a processing logic selection module, a tree dividing module and a sample data iteration module; the cluster tree creating module is used for creating a tree structure; the node condition acquisition module is used for acquiring and storing the sample data set and the node condition data of the leaf nodes, and is connected with the cluster tree creating module; the processing logic selection module is used for selecting the applicable processing logic of the current leaf node according to the node condition data, and is connected with the node condition acquisition module; the tree dividing module is used for hierarchically clustering the current node according to the applicable processing logic and dividing it into the tree structure, and is connected with the processing logic selection module; and the sample data iteration module is used for iterating the preceding steps until the sample data set is completely divided into leaf nodes of the tree structure, and is connected with the tree dividing module.
In an embodiment of the present invention, the leaf node classifying module includes: a leaf node acquisition module, a node data extraction module, a current node classification module and a node class traversal module; the leaf node acquisition module is used for acquiring all leaf nodes in the tree structure; the node data extraction module is used for extracting the category information, the balance ratio data and the sample number information of the leaf nodes, and is connected with the leaf node acquisition module; the current node classification module is used for classifying the current leaf node according to the category information, the balance ratio data and the sample number information, and is connected with the node data extraction module; and the node class traversal module is used for acquiring the node type data of the current leaf node and repeating the preceding steps until all the leaf nodes are classified into single-class leaf nodes, class-balanced leaf nodes and leaf nodes containing abnormal samples, and is connected with the current node classification module.
In one embodiment of the present invention, the fraud detection module includes: an applicable mode selection module and a traversal detection module; the applicable mode selection module is used for acquiring the node type data and selecting the applicable processing mode of the node according to the node type data; and the traversal detection module is used for traversing and processing the leaf nodes in the tree structure according to the applicable processing mode, and is connected with the applicable mode selection module.
In an embodiment of the present invention, the traversal detection module includes: the system comprises a node type judging module, a single-class returning module, a balanced node training module, an abnormal node detecting module and a tree structure traversing detecting module; the node type judging module is used for judging the type of the current leaf node according to the node type data; the single-category returning module is used for directly returning the type of the leaf node when the current leaf node is the single-category node, and the single-category returning module is connected with the node type judging module; the balanced node training module is used for training samples in the leaf nodes by using a preset classification method when the current leaf node is a category balanced node, and is connected with the node type judging module; the abnormal node detection module is used for detecting the leaf nodes by using preset abnormal detection logic when the current leaf nodes are leaf nodes containing abnormal samples, and the abnormal node detection module is connected with the node type judgment module; and the tree structure traversal detection module is used for executing the operation on the leaf nodes in the tree structure, and is connected with the node type judgment module.
As described above, the fraud detection method and system based on hierarchical clustering provided by the present invention have the following beneficial effects: the four essential factors that influence classification performance, namely the imbalance ratio, the sample size, the separability and the intra-class sub-clustering, are considered together, which makes up for the prior art's consideration of the imbalance ratio alone. An unsupervised clustering model is used for hierarchical clustering, so that a large class-imbalanced data set is divided into a number of data subsets of three kinds. The problem is thereby simplified by divide and conquer, and class imbalance is addressed from a new angle.
In conclusion, the invention solves the technical problems of incomplete performance consideration, low detection accuracy and unbalanced category in the prior art.
Drawings
FIG. 1 is a schematic diagram showing the steps of the hierarchical clustering-based fraud detection method according to the present invention.
FIG. 2 is a flowchart illustrating step S1 in FIG. 1 in an embodiment.
FIG. 3 is a flowchart illustrating step S2 in FIG. 1 in an embodiment.
FIG. 4 is a schematic diagram of the clustering tree structure of the present invention.
FIG. 5 is a flowchart illustrating step S3 in FIG. 1 in an embodiment.
FIG. 6 is a flowchart illustrating step S4 in FIG. 1 in an embodiment.
FIG. 7 is a flowchart illustrating step S42 in FIG. 6 in an embodiment.
FIG. 8 is a schematic diagram of the modules of the hierarchical clustering-based fraud detection system according to the present invention.
FIG. 9 is a block diagram of the clustering model selection module 11 in FIG. 8 in an embodiment.
FIG. 10 is a block diagram of the tree structure module 12 in FIG. 8 in an embodiment.
FIG. 11 is a block diagram of the leaf node classification module 13 in FIG. 8 in an embodiment.
FIG. 12 is a block diagram of the fraud detection module 14 in FIG. 8 in an embodiment.
FIG. 13 is a block diagram of the traversal detection module 142 in FIG. 12 in an embodiment.
Description of the element reference numerals
1 fraud detection system based on hierarchical clustering
11 clustering model selection module
12 tree structure module
13 leaf node classification module
14 fraud detection module
111 transaction feature extraction module
112 feature analysis module
113 analysis data processing module
114 model selection module
121 cluster tree creating module
122 node condition acquisition module
123 processing logic selection module
124 tree dividing module
125 sample data iteration module
131 leaf node obtaining module
132 node data extraction module
133 current node classification module
134 node class traversal module
141 applicable mode selecting module
142 traversal detection module
1421 node type judgment module
1422 Single-class Return Module
1423 balanced node training module
1424 abnormal node detection module
1425 tree structure traversal detection module
Description of step designations
FIG. 1: S1 to S4
FIG. 2: S11 to S14
FIG. 3: S21 to S25
FIG. 5: S31 to S34
FIG. 6: S41 to S42
FIG. 7: S421 to S425
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
Referring to fig. 1 to 12, it should be understood that the structures shown in the drawings of this specification serve only to accompany the disclosed content so that those skilled in the art can read and understand it. They are not intended to limit the conditions under which the present invention can be implemented, and therefore carry no essential technical significance. In addition, terms such as "upper", "lower", "left", "right", "middle" and "one" are used in this specification for clarity of description only; neither the terms nor the relative relationships they describe are intended to limit the scope of the present invention.
Referring to fig. 1, which shows the steps of the fraud detection method based on hierarchical clustering according to the present invention, the method includes:
S1, acquiring and analyzing transaction feature information to obtain feature analysis data, and selecting a clustering model according to the feature analysis data; a fraud detection model based on hierarchical clustering is thus provided for the class imbalance problem in fraudulent transaction detection;
S2, acquiring a sample data set, hierarchically clustering the sample data set according to the clustering model to construct a tree structure, and dividing the sample data set into the leaf nodes of the tree structure; optionally, the tree structure is a clustering tree that the fraud detection model forms by hierarchical clustering, in the course of which the original data set is divided into the leaf nodes of the clustering tree over multiple iterations;
S3, classifying the leaf nodes to obtain node type data; optionally, each leaf node is a data subset;
and S4, processing the leaf nodes in the clustering tree model according to the node type data to complete fraudulent transaction detection; in the end, only the data subset in each leaf node needs to be processed accordingly to detect the abnormal transaction samples it contains.
Referring to fig. 2, which is a detailed flowchart of step S1 in fig. 1 in an embodiment, as shown in fig. 2, step S1 is performed to obtain and analyze transaction feature information to obtain feature analysis data, and selecting a cluster model according to the feature analysis data includes:
S11, acquiring an actual data set and extracting the transaction feature information in the actual data set. Of the four essential factors influencing classification performance, two are handled as follows. Regarding sample size, a class-imbalanced data set can be used as the input of the model without any resampling preprocessing, so the sample size equals the size of the whole data set. Regarding the imbalance ratio, the model automatically filters out most of the majority-class samples during hierarchical clustering and finally constructs a number of class-balanced leaf nodes; in other words, the model adjusts the class imbalance ratio in the data set automatically;
S12, performing separability analysis on the transaction feature information to obtain feature analysis data; regarding separability, a suitable clustering model is selected according to the characteristics of the data set so that more samples are filtered out during hierarchical clustering;
S13, processing the feature analysis data into distribution judgment data, from which a suitable clustering model is selected based on separability: if the data set follows a Gaussian distribution, a Gaussian mixture model (GMM) is used, and if the abnormal samples aggregate in Euclidean space, K-Means is used. Optionally, the corresponding clustering tree model is constructed from the real transaction data of a financial company. First, the features of the real data set are analyzed for separability in order to select the most appropriate clustering model. The distribution of the data set can be observed in Euclidean space; for visualization, the data set is reduced in dimensionality with PCA to obtain an intuitive scatter plot in two dimensions;
S14, selecting a clustering model according to the distribution judgment data. Regarding intra-class sub-clustering, because the model is built on an unsupervised clustering algorithm, the influence of intra-class sub-clustering on classification performance is greatly reduced. Optionally, the scatter plot shows that the data set is distributed in clusters in Euclidean space, in which case K-Means may be selected as the clustering model.
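As a rough illustration of the visualization step in S13, the sketch below projects feature vectors onto their first two principal components using power iteration and only the Python standard library. The function names (`pca_2d`, `power_iteration`) and the toy input are assumptions made for illustration; the patent does not say how PCA is computed, and a real system would more likely rely on a numerical library.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def power_iteration(matrix, avoid=None, iters=200, seed=0):
    """Dominant eigenvector of a symmetric matrix, kept orthogonal to `avoid`."""
    def clean(v):
        if avoid is not None:                  # remove the part along `avoid`
            shift = dot(v, avoid)
            v = [x - shift * a for x, a in zip(v, avoid)]
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    rng = random.Random(seed)
    v = clean([rng.random() + 0.1 for _ in matrix])
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        if math.sqrt(dot(w, w)) < 1e-12:       # no variance left in this direction
            break
        v = clean(w)
    return v


def pca_2d(rows):
    """Project each row onto the first two principal components."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    dims = range(len(means))
    cov = [[sum(r[i] * r[j] for r in centered) / (len(rows) - 1) for j in dims]
           for i in dims]
    first = power_iteration(cov)
    second = power_iteration(cov, avoid=first, seed=1)
    return [(dot(r, first), dot(r, second)) for r in centered]


# Toy input: five 3-D points lying on one line, so all variance is on axis 1.
points_2d = pca_2d([(t, 2.0 * t, 0.0) for t in range(5)])
for a, b in points_2d:
    print(f"{a:8.3f} {b:8.3f}")
```

The resulting pairs can be drawn as a scatter plot; visibly separated groups of abnormal samples would point toward K-Means, in line with S14.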
Referring to fig. 3 and 4, which are a detailed flowchart of step S2 in fig. 1 in an embodiment and a schematic diagram of the clustering tree structure of the present invention, step S2, acquiring a sample data set, hierarchically clustering the sample data set according to a clustering model to construct a tree structure, and partitioning the sample data set into leaf nodes of the tree structure, includes:
S21, creating a tree structure. The most important part of the whole model is the algorithm that constructs the clustering tree through hierarchical clustering; it is a recursive algorithm whose construction process is explained below;
S22, acquiring and storing the sample data set and the node condition data of leaf nodes. The algorithm initially takes as input a data set Dataset, a leaf node balance ratio BRate and a leaf node minimum sample number MSize; the numbers of positive and negative samples in Dataset are then counted and stored in N1 and N0 respectively;
S23, selecting the applicable processing logic of the current leaf node according to the node condition data; optionally, the current Dataset is checked against the three leaf node conditions in turn. If N1 or N0 is 0, the single-class leaf node condition is met and the current leaf node is processed with "singleLable" (directly returning the class of the data subset in the leaf node). If the ratio of N1 to N0 is less than BRate, the class-balanced leaf node condition is met and an "SVM" (support vector machine classifier) is used to classify the data subset in the current leaf node. If the total of N1 and N0 is less than MSize, the condition for a leaf node containing abnormal samples is met and "KNN" (K-nearest neighbor model) is used for anomaly detection on the data subset in the current leaf node. When none of the three leaf node conditions is satisfied, the current node is treated as a non-leaf node: the data set in the current node is clustered with "KMeans" (K-Means clustering model) or "GMM" (Gaussian mixture model), the current procedure is called recursively on the data subset of each resulting cluster, and the results become the subtrees of the current node;
S24, hierarchically clustering the current node according to the applicable processing logic and dividing it into the tree structure; the tree structure is built by repeated iteration with the selected clustering model. Optionally, in each node, "cluster number" is the ID of the cluster to which the current node was assigned by the clustering operation at the previous level, "normal" is the number of normal samples, "abnormal" is the number of abnormal samples, and "model" is the model used to process the data subset in the current node;
and S25, iterating the preceding steps until the sample data set is completely divided into leaf nodes of the tree structure; the data set is progressively divided into the leaf nodes in the process.
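The recursive procedure of S21 to S25 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the helper names `kmeans` and `build_tree` are hypothetical, label 1 marks an abnormal sample, the balance test is read as "majority count divided by minority count does not exceed BRate", and a node that K-Means cannot split is turned into a leaf so that the recursion always ends. The node fields "normal", "abnormal" and "model" follow S24.

```python
import math
import random


def kmeans(points, k=2, iters=25, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def build_tree(points, labels, brate=3.0, msize=5, k=2):
    """Recursively split (points, labels) into a clustering tree of dict nodes."""
    n1 = sum(labels)            # abnormal samples (label 1)
    n0 = len(labels) - n1       # normal samples (label 0)
    node = {"normal": n0, "abnormal": n1}
    if n1 == 0 or n0 == 0:
        node["model"] = "singleLable"          # single-class leaf
    elif max(n1, n0) / min(n1, n0) <= brate:
        node["model"] = "SVM"                  # class-balanced leaf
    elif n1 + n0 < msize:
        node["model"] = "KNN"                  # leaf containing abnormal samples
    else:
        assign = kmeans(points, k)
        groups = [[i for i, a in enumerate(assign) if a == c] for c in range(k)]
        groups = [g for g in groups if g]
        if len(groups) < 2:                    # assumption: unsplittable node becomes a leaf
            node["model"] = "KNN"
        else:
            node["model"] = "KMeans"           # non-leaf node: recurse into each cluster
            node["children"] = [
                build_tree([points[i] for i in g], [labels[i] for i in g],
                           brate, msize, k)
                for g in groups
            ]
    return node


# Toy data: 20 normal points near the origin, 4 abnormal points far away.
toy_points = [(i * 0.1, 0.0) for i in range(20)] + [(10.0 + i * 0.1, 10.0) for i in range(4)]
toy_labels = [0] * 20 + [1] * 4
toy_tree = build_tree(toy_points, toy_labels)
print(toy_tree["model"], [(c["normal"], c["abnormal"], c["model"]) for c in toy_tree["children"]])
```

On the toy data the root is split once and both children become single-class leaves, which is the filtering effect the description attributes to hierarchical clustering.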
Referring to fig. 5, which is a detailed flowchart of step S3 in fig. 1 according to an embodiment, as shown in fig. 5, the step S3 of classifying leaf nodes to obtain node type data includes:
S31, acquiring all leaf nodes in the tree structure;
S32, extracting the category information, the balance ratio data and the sample number information of the leaf nodes, so that the four essential factors influencing classification performance (imbalance ratio, sample size, separability and intra-class sub-clustering) are considered together;
S33, classifying the current leaf node according to the category information, the balance ratio data and the sample number information; optionally, three kinds of leaf node are finally formed: single-class leaf nodes, class-balanced leaf nodes and leaf nodes containing abnormal samples;
and S34, acquiring the node type data of the current leaf node, and repeating the preceding steps until all the leaf nodes are classified into single-class leaf nodes, class-balanced leaf nodes and leaf nodes containing abnormal samples.
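Steps S31 to S34 amount to walking the tree and labelling each leaf. A minimal sketch follows, assuming the tree is stored as nested dictionaries with "normal" and "abnormal" counts; the names `LeafType`, `leaves`, `classify_leaf` and `classify_tree`, and the default balance ratio of 3.0, are hypothetical. A leaf that is neither single-class nor balanced is treated here as a leaf containing abnormal samples.

```python
from collections import Counter
from enum import Enum


class LeafType(Enum):
    SINGLE_CLASS = "single-class"
    CLASS_BALANCED = "class-balanced"
    HAS_ABNORMAL = "contains abnormal samples"


def leaves(node):
    """Yield every leaf of a tree stored as nested dicts (S31)."""
    children = node.get("children", [])
    if not children:
        yield node
    for child in children:
        yield from leaves(child)


def classify_leaf(leaf, brate):
    """Assign a leaf type from the class counts of the leaf (S32 and S33)."""
    n0, n1 = leaf["normal"], leaf["abnormal"]
    if n0 == 0 or n1 == 0:
        return LeafType.SINGLE_CLASS
    if max(n0, n1) / min(n0, n1) <= brate:
        return LeafType.CLASS_BALANCED
    # Assumption: whatever remains is a small, imbalanced leaf.
    return LeafType.HAS_ABNORMAL


def classify_tree(root, brate=3.0):
    """Label every leaf in place and count the leaves of each type (S34)."""
    counts = Counter()
    for leaf in leaves(root):
        leaf["type"] = classify_leaf(leaf, brate)
        counts[leaf["type"]] += 1
    return counts


example_tree = {
    "normal": 120, "abnormal": 9, "children": [
        {"normal": 100, "abnormal": 0},
        {"normal": 16, "abnormal": 8},
        {"normal": 4, "abnormal": 1},
    ],
}
for leaf_type, count in classify_tree(example_tree).items():
    print(leaf_type.value, count)
```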
Referring to fig. 6, which is a detailed flowchart of step S4 in fig. 1, in an embodiment, as shown in fig. 6, step S4, processing leaf nodes in the clustering tree model according to the node type data to complete fraud transaction detection, includes:
S41, acquiring node type data and selecting the applicable processing mode of each node according to the node type data; the model combines the ideas of clustering models, anomaly detection methods and decision tree classification models to construct a decision-tree-like model, the clustering tree, by hierarchical clustering;
and S42, traversing and processing the leaf nodes in the tree structure according to the applicable processing mode; the three kinds of leaf node produced in the process are handled in three different ways so that more fraudulent transaction samples are detected.
Referring to fig. 7, which is a detailed flowchart of step S42 in fig. 6 in one embodiment, step S42, traversing and processing the leaf nodes in the tree structure according to the applicable processing mode, includes:
S421, judging the type of the current leaf node according to the node type data;
S422, if the current leaf node is a single-class node, directly returning the type of the leaf node. A single-class leaf node is one in which all samples of the data subset belong to the same class; optionally, for such a node the class to which its samples belong is returned directly. To evaluate the clustering tree model, a confusion matrix is computed from the fraud detection results, as shown in Table 1.
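TABLE 1 Confusion matrix for the two-class task
                   Predicted positive     Predicted negative
Actual positive    TP (true positive)     FN (false negative)
Actual negative    FP (false positive)    TN (true negative)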
Then, from Table 1, the recall, the precision and their harmonic mean (F1) are calculated as follows, where TP, FP and FN are the numbers of true positives, false positives and false negatives:
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Finally, five common fraud detection models are run on the same data and compared with the proposed model on these three metrics. The results of the experiment are shown in Table 2.
TABLE 2 Results of the experiment
Model                F1     Precision  Recall
Clustering Tree      0.807  0.712      0.932
AdaBoosting          0.752  0.608      0.985
Random Forest        0.747  0.607      0.971
Decision Tree        0.661  0.502      0.965
SVM                  0.657  0.494      0.981
Logistic Regression  0.651  0.487      0.979
Table 2 shows that, compared with the second-ranked model, AdaBoosting, the model proposed here improves precision by about 10 percentage points (0.712 versus 0.608) while recall falls by only about 5 percentage points (0.932 versus 0.985), and that its F1 score is markedly higher than that of every other model;
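The three metrics can be cross-checked against Table 2 with a few lines of Python. The sketch assumes F1 is the harmonic mean of precision and recall; the function names are illustrative, and the precision and recall values are copied from Table 2 rather than recomputed from raw counts, which the text does not give.

```python
def recall(tp, fn):
    """Share of actual positives that were detected."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predicted positives that are actual positives."""
    return tp / (tp + fp)


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# (reported F1, precision, recall) as listed in Table 2.
TABLE_2 = {
    "Clustering Tree": (0.807, 0.712, 0.932),
    "AdaBoosting": (0.752, 0.608, 0.985),
    "Random Forest": (0.747, 0.607, 0.971),
    "Decision Tree": (0.661, 0.502, 0.965),
    "SVM": (0.657, 0.494, 0.981),
    "Logistic Regression": (0.651, 0.487, 0.979),
}

for name, (reported, p, r) in TABLE_2.items():
    print(f"{name:20s} reported F1 = {reported:.3f}  recomputed F1 = {f1_score(p, r):.3f}")
```

The recomputed values agree with the reported F1 column to within 0.001, the residue being explained by the three-decimal rounding of precision and recall.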
s423, if the current leaf node is a category balancing node, training samples in the leaf node by using a preset classification method, and the category balancing leaf node, where the sample subset in the leaf node has reached a category balancing ratio, that is, the ratio of the number of most samples to the number of least samples reaches the preset balancing ratio, optionally, for the category balancing leaf node, performing model training on the data set in the leaf node by using a decision tree, an SVM, a random forest and other traditional classification methods;
s424, if the current leaf node is a leaf node containing an abnormal sample, detecting the leaf node by using a preset abnormal detection logic, where the leaf node contains an abnormal sample leaf node, and the leaf node does not satisfy the conditions of the first two leaf nodes, but the total number of samples is less than the minimum number of samples allowed by a preset single node, so as to prevent the occurrence of model overfitting, and optionally, for the leaf node containing an abnormal sample, processing by using an abnormal detection method, such as an abnormal detection method based on distance, etc.;
and S425, performing the operation on the leaf nodes in the tree structure.
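The per-leaf handling in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the leaf type strings ("single", "balanced", "abnormal"), the `threshold` parameter and the label convention (1 = abnormal, 0 = normal) are all hypothetical. A nearest-centroid classifier stands in for the decision tree, SVM or random forest mentioned in the text, and the mean distance to the k nearest neighbours serves as one possible distance-based anomaly score.

```python
import math


def knn_distance_scores(points, k=1):
    """Distance-based anomaly score: mean distance to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def train_nearest_centroid(points, labels):
    """Stand-in classifier (assumption): predict the label of the closest class centroid."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    centroids = {
        y: tuple(sum(col) / len(ps) for col in zip(*ps)) for y, ps in groups.items()
    }
    return lambda p: min(centroids, key=lambda y: math.dist(p, centroids[y]))


def process_leaf(leaf_type, points, labels, k=1, threshold=1.0):
    """Return one predicted label per sample (1 = abnormal, 0 = normal)."""
    if leaf_type == "single":
        # Single-class leaf: every sample shares the one label present.
        return [labels[0]] * len(points)
    if leaf_type == "balanced":
        # Class-balanced leaf: fit a conventional classifier on the subset.
        model = train_nearest_centroid(points, labels)
        return [model(p) for p in points]
    if leaf_type == "abnormal":
        # Small leaf with abnormal samples: flag points far from their neighbours.
        return [1 if s > threshold else 0 for s in knn_distance_scores(points, k)]
    raise ValueError(f"unknown leaf type: {leaf_type}")


# The isolated point (10, 10) is the only one flagged.
print(process_leaf("abnormal", [(0, 0), (0, 1), (1, 0), (10, 10)], [], threshold=3.0))
```

In practice the threshold would have to be tuned on the data, since the text does not specify one.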
Referring to fig. 8, a schematic diagram of the modules of a hierarchical clustering-based fraud detection system according to the present invention is shown. As shown in fig. 8, the hierarchical clustering-based fraud detection system 1 comprises: a clustering model selection module 11, a tree structure module 12, a leaf node classification module 13 and a fraud detection module 14. The clustering model selection module 11 is configured to acquire and analyze transaction characteristic information to obtain characteristic analysis data, and to select a clustering model according to the characteristic analysis data; the invention provides a fraud detection model based on hierarchical clustering aimed at the problem of class imbalance in fraudulent transaction detection. The tree structure module 12 is configured to acquire a sample data set, hierarchically cluster the sample data set according to the clustering model to construct a tree structure, and partition the sample data set into the leaf nodes of the tree structure. Optionally, the tree structure is a clustering tree: the fraud detection model forms a clustering tree by hierarchical clustering, and in the process the original data set is partitioned into the leaf nodes of the clustering tree through multiple iterations. The tree structure module 12 is connected with the clustering model selection module 11. The leaf node classification module 13 is configured to classify the leaf nodes to obtain node type data; optionally, each leaf node is a data subset. The leaf node classification module 13 is connected with the tree structure module 12. The fraud detection module 14 is configured to process the leaf nodes in the clustering tree model according to the node type data to complete fraudulent transaction detection; finally, only the data subset in each leaf node needs to be processed accordingly to detect the abnormal transaction samples in it. The fraud detection module 14 is connected with the leaf node classification module 13.
Referring to fig. 9, which is a schematic diagram of the specific modules of the clustering model selection module 11 in fig. 8 in an embodiment. As shown in fig. 9, the clustering model selection module 11 includes: a transaction feature extraction module 111, a feature analysis module 112, an analysis data processing module 113 and a model selection module 114. The transaction feature extraction module 111 is configured to obtain an actual data set and extract the transaction feature information in it. Four essential factors affect classification performance. As for the sample size, a class-imbalanced data set can be used as the input of the model without any resampling preprocessing, so the sample size equals the size of the entire data set. As for the imbalance ratio, the model automatically filters out majority-class samples during hierarchical clustering and finally constructs some class-balanced leaf nodes; in other words, the model can automatically adjust the class imbalance ratio in the data set. The feature analysis module 112 is configured to obtain the feature analysis data by analyzing the transaction feature information based on separability. As for separability, in order to filter out more samples during hierarchical clustering, a suitable clustering model can be selected according to the characteristics of the data set. The feature analysis module 112 is connected with the transaction feature extraction module 111. The analysis data processing module 113 is configured to process the feature analysis data into distribution judgment data, so that a suitable clustering model can be selected based on separability: a Gaussian mixture model (GMM) is used if the data set exhibits the characteristics of a Gaussian distribution, and K-Means is used if the abnormal samples are clustered together in Euclidean space. Optionally, a corresponding clustering tree model is constructed based on the real transaction data of a financial company. First, the features of the real data set are analyzed based on separability in order to select the most appropriate clustering model. The distribution characteristics of the data set can be observed in Euclidean space; for visualization, the dimensionality of the data set is reduced by the PCA method so as to obtain a more intuitive scatter plot in two-dimensional space. The analysis data processing module 113 is connected with the feature analysis module 112. The model selection module 114 is configured to select a clustering model according to the distribution judgment data. As for intra-class sub-clustering, because the model is built on an unsupervised clustering algorithm, the influence of intra-class sub-clustering on classification performance can be greatly reduced. Optionally, the scatter plot shows that the data set is distributed in clusters in Euclidean space; in this case, K-Means may be selected as the clustering model. The model selection module 114 is connected with the analysis data processing module 113.
Referring to fig. 10, which is a schematic diagram of the specific modules of the tree structure module 12 in fig. 8 in an embodiment. As shown in fig. 10, the tree structure module 12 includes: a cluster tree creating module 121, a node condition obtaining module 122, a processing logic selecting module 123, a tree dividing module 124 and a sample data iteration module 125. The cluster tree creating module 121 is configured to create the tree structure; the most important part of the entire model is the algorithm that builds the cluster tree through hierarchical clustering, and this algorithm is recursive. The node condition obtaining module 122 is configured to obtain and store the sample data set and the node condition data of leaf nodes: the algorithm takes as input a data set Dataset, the balance ratio BRate of a leaf node and the minimum sample number MSize of a leaf node, and then counts the positive and negative samples in Dataset and stores the counts in N1 and N0 respectively. The node condition obtaining module 122 is connected with the cluster tree creating module 121. The processing logic selecting module 123 is configured to select the applicable processing logic of the current leaf node according to the node condition data. Optionally, it determines in turn whether the current Dataset meets one of the three leaf node conditions. If the value of N1 or N0 is 0, the single-category leaf node condition is met, and the current leaf node is processed with "SingleLable" (directly returning the category of the data subset in the leaf node). If the ratio of N1 to N0 is less than BRate, the class-balanced leaf node condition is met, and "SVM" (a support vector machine classifier) is used to classify the data subset in the current leaf node. If the total of N1 and N0 is less than MSize, the condition for a leaf node containing abnormal samples is met, and "KNN" (a K-nearest-neighbor model) is used for anomaly detection on the data subset in the current leaf node. When none of the three leaf node conditions is met, the current node is a non-leaf node: the data set in the current node is clustered by using "KMeans" (the K-Means clustering model) or "GMM" (a Gaussian mixture model), the current procedure is called recursively on the data subset assigned to each cluster, and the results become the subtrees of the current node. The processing logic selecting module 123 is connected with the node condition obtaining module 122. The tree dividing module 124 is configured to divide the current node into the tree structure by hierarchical clustering according to the applicable processing logic, the tree structure being constructed through continuous iteration with the selected clustering model. Optionally, in each leaf node, "cluster number" indicates the ID of the cluster to which the current node belongs after the clustering operation of the previous layer, "normal" indicates the number of normal samples, "abnormal" indicates the number of abnormal samples, and "model" indicates the model used for processing the data subset in the current node. The tree dividing module 124 is connected with the processing logic selecting module 123. The sample data iteration module 125 is configured to iterate the above steps until the sample data set is completely divided into the leaf nodes of the tree structure, and the sample data iteration module 125 is connected with the tree dividing module 124.
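The recursive construction described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the source: label 1 is taken to mean an abnormal (positive) sample; the balance test follows the majority-to-minority reading of step S423 (majority count divided by minority count at most `brate`) rather than the literal "ratio of N1 to N0"; `two_means` is a tiny stand-in for the K-Means or GMM clustering step; and a node whose samples cannot be split further is turned into a "KNN" leaf as an added safeguard against endless recursion. The node fields "normal", "abnormal" and "model" follow the text; all function and parameter names are hypothetical.

```python
import math


def two_means(points, iters=20):
    """Tiny 2-cluster K-Means stand-in; returns a cluster id (0 or 1) per point."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # farthest point from c0
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if math.dist(p, c0) <= math.dist(p, c1) else 1 for p in points]
        groups = [[p for p, a in zip(points, assign) if a == g] for g in (0, 1)]
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        c0, c1 = (tuple(sum(col) / len(g) for col in zip(*g)) for g in groups)
    return assign


def build_tree(points, labels, brate, msize, cluster_fn=two_means):
    """Recursively build the cluster tree as nested dicts."""
    n1 = sum(1 for y in labels if y == 1)  # abnormal (positive) samples
    n0 = len(labels) - n1                  # normal (negative) samples
    node = {"normal": n0, "abnormal": n1}
    if n1 == 0 or n0 == 0:
        node["model"] = "SingleLable"      # single-category leaf
    elif max(n1, n0) / min(n1, n0) <= brate:
        node["model"] = "SVM"              # class-balanced leaf
    elif n1 + n0 < msize:
        node["model"] = "KNN"              # leaf containing abnormal samples
    else:
        assign = cluster_fn(points)
        ids = sorted(set(assign))
        if len(ids) < 2:
            node["model"] = "KNN"          # safeguard: cannot split further
        else:
            node["model"] = "KMeans"       # non-leaf node: recurse per cluster
            node["children"] = [
                build_tree(
                    [p for p, a in zip(points, assign) if a == cid],
                    [y for y, a in zip(labels, assign) if a == cid],
                    brate, msize, cluster_fn,
                )
                for cid in ids
            ]
    return node


# Ten normal points near x = 0, and a mixed group of six points near x = 10.
pts = [(0.0, i * 0.1) for i in range(10)] + [(10.0, i * 0.1) for i in range(6)]
lbl = [0] * 10 + [0, 0, 0, 1, 1, 1]
tree = build_tree(pts, lbl, brate=2, msize=3)
print(tree["model"], [child["model"] for child in tree["children"]])
```

In this toy run the root is split once: the first cluster becomes a single-category leaf and the second a class-balanced leaf, which shows how majority-class samples are filtered out during clustering.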
Referring to fig. 11, which is a schematic block diagram of the leaf node classification module 13 in fig. 8 in an embodiment. As shown in fig. 11, the leaf node classification module 13 includes: a leaf node obtaining module 131, a node data extracting module 132, a current node classifying module 133 and a node class traversal module 134. The leaf node obtaining module 131 is configured to obtain all leaf nodes in the tree structure. The node data extracting module 132 is configured to extract the category information, the balance ratio data and the sample number information of the leaf nodes, so that the four essential factors affecting classification performance (imbalance ratio, sample size, separability and intra-class sub-clustering) are considered together. The node data extracting module 132 is connected with the leaf node obtaining module 131. The current node classifying module 133 is configured to classify the current leaf node according to the category information, the balance ratio data and the sample number information; optionally, three types of leaf nodes are finally formed: single-class leaf nodes, class-balanced leaf nodes and leaf nodes containing abnormal samples. The current node classifying module 133 is connected with the node data extracting module 132. The node class traversal module 134 is configured to obtain the node type data of the current leaf node and to execute the foregoing steps in a loop until all leaf nodes are classified into single-class leaf nodes, class-balanced leaf nodes and leaf nodes containing abnormal samples. The node class traversal module 134 is connected with the current node classifying module 133.
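A possible sketch of this leaf classification pass is shown below. It assumes, purely for illustration, that the tree is stored as nested dictionaries in which a leaf carries "normal" and "abnormal" sample counts and has no "children" key. The function names and the type strings are hypothetical, and the balance test again uses the majority-to-minority ratio, which is an interpretation rather than something the source fixes.

```python
def iter_leaves(node):
    """Yield every leaf of a nested-dict tree (a leaf has no 'children')."""
    children = node.get("children")
    if not children:
        yield node
    else:
        for child in children:
            yield from iter_leaves(child)


def leaf_type(normal, abnormal, brate, msize):
    """Map a leaf's sample counts to one of the three leaf types."""
    if normal == 0 or abnormal == 0:
        return "single-class"
    if max(normal, abnormal) / min(normal, abnormal) <= brate:
        return "class-balanced"
    if normal + abnormal < msize:
        return "contains-abnormal"
    return "unclassified"  # would still need splitting; should not occur in a finished tree


def classify_leaves(tree, brate, msize):
    """Return the type of every leaf, in left-to-right order."""
    return [
        leaf_type(leaf["normal"], leaf["abnormal"], brate, msize)
        for leaf in iter_leaves(tree)
    ]


example = {
    "children": [
        {"normal": 10, "abnormal": 0},
        {"children": [{"normal": 3, "abnormal": 3}, {"normal": 4, "abnormal": 1}]},
    ]
}
print(classify_leaves(example, brate=2, msize=10))
```

The resulting list of types is the kind of node type data that a later detection stage could dispatch on.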
Referring to fig. 12, which is a schematic block diagram of the fraud detection module 14 in fig. 8 in an embodiment. As shown in fig. 12, the fraud detection module 14 includes: an applicable mode selection module 141 and a traversal detection module 142. The applicable mode selection module 141 is configured to obtain the node type data and select an applicable processing mode for a node according to the node type data; by combining the ideas of a clustering model, an anomaly detection method and a decision tree classification model, a decision-tree-like model, namely the clustering tree, is constructed by hierarchical clustering. The traversal detection module 142 is configured to traverse the leaf nodes in the tree structure and process them according to the applicable processing mode; the three types of leaf nodes generated in the process are each handled with their own processing mode so as to detect more fraudulent transaction samples. The traversal detection module 142 is connected with the applicable mode selection module 141.
Referring to fig. 13, which is a schematic diagram of the specific modules of the traversal detection module 142 in fig. 12 in an embodiment. As shown in fig. 13, the traversal detection module 142 includes: a node type judging module 1421, a single-category returning module 1422, a balanced node training module 1423, an abnormal node detection module 1424 and a tree structure traversal detection module 1425. The node type judging module 1421 is configured to determine the type of the current leaf node according to the node type data. The single-category returning module 1422 is configured to directly return the type of the leaf node when the current leaf node is a single-category node. In a single-category leaf node, all samples of the data subset belong to the same category; optionally, for a single-category leaf node, the type of the samples in the leaf node is returned directly. For the evaluation of the clustering tree model, a confusion matrix is first calculated from the fraud detection results, and the recall (Recall), the precision (Precision) and their harmonic mean (F1) are calculated; finally, five common fraud detection models are run on the same data for comparison. Compared with the other models, the proposed model improves precision by about 10 percentage points over the second-ranked model, AdaBoosting, while its recall is only about 5 percentage points lower, and it achieves a clear improvement in F1. The single-category returning module 1422 is connected with the node type judging module 1421. The balanced node training module 1423 is configured to train on the samples in the leaf node by using a preset classification method when the current leaf node is a class-balanced node. In a class-balanced leaf node, the sample subset has reached the class balance ratio, that is, the ratio of the number of majority-class samples to the number of minority-class samples reaches the preset balance ratio; optionally, for a class-balanced leaf node, a model is trained on the data set in the leaf node by using a traditional classification method such as a decision tree, an SVM or a random forest. The balanced node training module 1423 is connected with the node type judging module 1421. The abnormal node detection module 1424 is configured to detect the leaf node by using preset anomaly detection logic when the current leaf node is a leaf node containing abnormal samples. Such a leaf node does not satisfy the conditions of the first two leaf node types, but its total number of samples is less than the preset minimum number of samples allowed in a single node; it is kept as a leaf in order to prevent model overfitting. Optionally, a leaf node containing abnormal samples is processed by an anomaly detection method, such as a distance-based anomaly detection method. The abnormal node detection module 1424 is connected with the node type judging module 1421. The tree structure traversal detection module 1425 is configured to perform the foregoing operations on the leaf nodes in the tree structure, and the tree structure traversal detection module 1425 is connected with the node type judging module 1421.
In summary, the fraud detection method and system based on hierarchical clustering provided by the invention have the following beneficial effects. The fraud detection model based on hierarchical clustering comprehensively considers the four factors influencing classification performance and, to a certain extent, avoids the limitations of the two existing kinds of methods. Aimed at the problem of class imbalance in fraudulent transaction detection, the invention provides a fraud detection model based on hierarchical clustering. The model forms a clustering tree by hierarchical clustering; in the process, the original data set is divided into the leaf nodes of the clustering tree through multiple iterations, and each leaf node is a data subset. Finally, only the data subset in each leaf node needs to be processed accordingly to detect the abnormal transaction samples in it. The invention thereby addresses the technical problems of the prior art, namely incomplete consideration of the factors affecting performance, low detection accuracy and class imbalance. Hierarchical clustering is performed with an unsupervised clustering model, a large class-imbalanced data set is divided into a number of data subsets of three kinds, and each kind is handled separately and more simply, so that the class imbalance problem is solved from a new angle. Four essential factors influencing classification performance are comprehensively considered: the imbalance ratio, the sample size, the separability and the intra-class sub-clustering, which makes up for the defect that the prior art considers only the single factor of the imbalance ratio. The invention has good security and accuracy, and has high commercial value and practicability.

Claims (10)

1. A fraud detection method based on hierarchical clustering is characterized by comprising the following steps:
acquiring and analyzing transaction characteristic information to obtain characteristic analysis data, and selecting a clustering model according to the characteristic analysis data;
acquiring a sample data set, hierarchically clustering the sample data set according to the clustering model to construct a tree structure, and dividing the sample data set into leaf nodes of the tree structure;
classifying the leaf nodes to obtain node type data;
processing the leaf nodes in the clustering tree model according to the node type data to complete fraud transaction detection;
the acquiring of the sample data set, hierarchically clustering the sample data set according to the clustering model to construct a tree structure, and partitioning the sample data set into leaf nodes of the tree structure includes:
creating the tree structure;
acquiring and storing node condition data of the sample data set and leaf nodes, wherein the node condition data comprise a data set Dataset, a balance ratio BRate of a leaf node, and a minimum sample number Msize of the leaf node, and positive and negative sample numbers of the data set Dataset are calculated through the data set Dataset, the balance ratio BRate of the leaf node and the minimum sample number Msize of the leaf node, and the positive and negative sample numbers of the data set Dataset are respectively stored in N1 and N0;
selecting applicable processing logic of the current leaf node according to the node condition data, wherein the applicable processing logic comprises,
judging whether the current Dataset meets three conditions of the leaf node, wherein the three conditions are as follows:
if the value of N1 or N0 of the data set Dataset is 0, the single-category leaf node condition is satisfied, and the current leaf node is processed by directly returning the category of the data subset in the current leaf node,
if the ratio of N1 to N0 of the data set Dataset is less than the balance ratio BRate of the leaf node, the class-balanced leaf node condition is satisfied, and a support vector machine classifier is used to classify the data subset in the current leaf node,
if the total of N1 and N0 of the data set Dataset is less than the minimum sample number Msize of the leaf node, the condition for a leaf node containing abnormal samples is satisfied, and a K-nearest-neighbor model is used to perform anomaly detection on the data subset in the current leaf node,
if the data set Dataset satisfies none of the three conditions, the current node is a non-leaf node, a K-Means clustering model or a Gaussian mixture model is used for clustering the data set in the current node, the current process is recursively called for the data subset divided into each cluster, and the result is taken as a sub-tree of the current node;
dividing the current node into the tree structure by hierarchical clustering according to the applicable processing logic;
and iterating until the sample data set is completely divided into the leaf nodes in the tree structure.
2. The method of claim 1, wherein the obtaining and analyzing transaction characteristic information to obtain characteristic analysis data, and the selecting a clustering model based on the characteristic analysis data comprises:
acquiring an actual data set, and extracting transaction characteristic information in the actual data set;
obtaining the characteristic analysis data based on separability analysis of the transaction characteristic information;
processing the characteristic analysis data into distribution judgment data;
and selecting the clustering model according to the distribution judgment data.
3. The method of claim 1, wherein classifying the leaf nodes to obtain node type data comprises:
acquiring all the leaf nodes in the tree structure;
extracting the category information, the balance ratio data and the sample number information of the leaf nodes;
classifying the current leaf node according to the category information, the balance ratio data and the sample number information;
and acquiring the node type data of the current leaf node, and executing in a circulating way until all the leaf nodes are classified into single-class leaf nodes, class balance leaf nodes and leaf nodes containing abnormal samples.
4. The method of claim 1, wherein processing the leaf nodes in the clustering tree model according to the node type data to perform fraud transaction detection comprises:
acquiring the node type data, and selecting an applicable processing mode of the node according to the node type data, wherein the applicable processing mode comprises a decision tree model constructed in a hierarchical clustering mode by combining ideas of a clustering model, an anomaly detection method and a decision tree classification model;
and traversing the leaf nodes in the tree structure according to the applicable processing mode.
5. The method according to claim 4, wherein said traversal processing of said leaf nodes in said tree structure according to said applicable processing mode comprises the steps of:
S1', judging the type of the current leaf node according to the node type data;
S2', if the current leaf node is a single-class node, directly returning the type of the leaf node;
S3', if the current leaf node is a class balance node, training the sample in the leaf node by using a preset classification method;
S4', if the current leaf node is a leaf node containing an abnormal sample, detecting the leaf node by using preset abnormal detection logic;
performing the operations of steps S1' to S4' on the leaf nodes in the tree structure.
6. A hierarchical clustering-based fraud detection system, comprising: the system comprises a clustering model selection module, a tree structure module, a leaf node classification module and a fraud detection module;
the clustering model selecting module is used for acquiring and analyzing transaction characteristic information to obtain characteristic analysis data, and selecting a clustering model according to the characteristic analysis data;
the tree structure module is used for acquiring a sample data set, hierarchically clustering the sample data set according to the clustering model to construct a tree structure, and dividing the sample data set into leaf nodes of the tree structure;
the leaf node classifying module is used for classifying the leaf nodes to obtain node type data;
the fraud detection module is used for processing the leaf nodes in the clustering tree model according to the node type data to complete fraud transaction detection;
wherein the tree structure module comprises: a cluster tree creating module, a node condition obtaining module, a processing logic selecting module, a tree dividing module and a sample data iteration module,
the node condition obtaining module is configured to obtain and store the sample data set and the node condition data of leaf nodes, where the node condition data includes a data set Dataset, a balance ratio BRate of a leaf node and a minimum sample number Msize of the leaf node, the positive and negative sample numbers of the data set Dataset are calculated through the data set Dataset, the balance ratio BRate of the leaf node and the minimum sample number Msize of the leaf node, and the positive and negative sample numbers of the data set Dataset are stored in N1 and N0, respectively;
the processing logic selecting module is used for selecting the applicable processing logic of the current leaf node according to the node condition data, wherein the applicable processing logic comprises,
judging whether the current Dataset meets three conditions of the leaf node, wherein the three conditions are as follows:
if the value of N1 or N0 of the data set Dataset is 0, the single-category leaf node condition is satisfied, and the current leaf node is processed by directly returning the category of the data subset in the current leaf node,
if the ratio of N1 to N0 of the data set Dataset is less than the balance ratio BRate of the leaf node, the class-balanced leaf node condition is satisfied, and a support vector machine classifier is used to classify the data subset in the current leaf node,
if the total of N1 and N0 of the data set Dataset is less than the minimum sample number Msize of the leaf node, the condition for a leaf node containing abnormal samples is satisfied, and a K-nearest-neighbor model is used to perform anomaly detection on the data subset in the current leaf node,
if the data set Dataset satisfies none of the three conditions, the current node is a non-leaf node, a K-Means clustering model or a Gaussian mixture model is used for clustering the data set in the current node, the current process is recursively called for the data subset divided into each cluster, and the result is taken as a sub-tree of the current node;
the tree dividing module is used for dividing the current node into the tree structure by hierarchical clustering according to the applicable processing logic;
and the sample data iteration module is used for iterating until the sample data set is completely divided into the leaf nodes in the tree structure.
7. The system of claim 6, wherein the cluster model selecting module comprises: the system comprises a transaction characteristic extraction module, a characteristic analysis module, an analysis data processing module and a model selection module;
the transaction characteristic extraction module is used for acquiring an actual data set and extracting transaction characteristic information in the actual data set;
the characteristic analysis module is used for analyzing separability of the transaction characteristic information to obtain the characteristic analysis data;
the analysis data processing module is used for processing the characteristic analysis data into distribution judgment data;
and the model selection module is used for selecting the clustering model according to the distribution judgment data.
8. The system of claim 6, wherein the leaf node classification module comprises: the system comprises a leaf node acquisition module, a node data extraction module, a current node classification module and a node class traversal module;
the leaf node acquisition module is used for acquiring all the leaf nodes in the tree structure;
the node data extraction module is used for extracting the category information, the balance ratio data and the sample number information of the leaf nodes;
the current node classification module is used for classifying the current leaf nodes according to the category information, the balance ratio data and the sample number information;
and the node type traversal module is used for acquiring the node type data of the current leaf node and executing in a circulating way until all the leaf nodes are classified into single-type leaf nodes, type balance leaf nodes and leaf nodes containing abnormal samples.
9. The system of claim 6, wherein the fraud detection module comprises: an applicable mode selection module and a traversal detection module;
the applicable mode selection module is used for acquiring the node type data and selecting an applicable processing mode of the node according to the node type data, wherein the applicable processing mode comprises a decision tree model constructed in a hierarchical clustering mode by combining ideas of a clustering model, an anomaly detection method and a decision tree classification model;
and the traversal detection module is used for performing traversal processing on the leaf nodes in the tree structure according to the applicable processing mode.
10. The system of claim 9, wherein the traversal detection module comprises: the system comprises a node type judging module, a single-class returning module, a balanced node training module, an abnormal node detecting module and a tree structure traversing detecting module;
the node type judging module is configured to execute step S1', and judge the type of the current leaf node according to the node type data;
the single-category returning module is configured to execute step S2', and when the current leaf node is a single-category node, directly return the type of the leaf node;
the balanced node training module is configured to execute step S3', and when the current leaf node is a category balanced node, train the sample in the leaf node by using a preset classification method;
the abnormal node detecting module is configured to execute step S4', and when the current leaf node is a leaf node containing an abnormal sample, detect the leaf node using a preset abnormal detection logic;
the tree structure traversal detection module is used for executing the operations of steps S1' to S4' on the leaf nodes in the tree structure.
CN201811522918.7A 2018-12-12 2018-12-12 Fraud detection method and system based on hierarchical clustering Active CN109886284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811522918.7A CN109886284B (en) 2018-12-12 2018-12-12 Fraud detection method and system based on hierarchical clustering

Publications (2)

Publication Number Publication Date
CN109886284A CN109886284A (en) 2019-06-14
CN109886284B true CN109886284B (en) 2021-02-12

Family

ID=66925020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811522918.7A Active CN109886284B (en) 2018-12-12 2018-12-12 Fraud detection method and system based on hierarchical clustering

Country Status (1)

Country Link
CN (1) CN109886284B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110322349B (en) * 2019-06-25 2023-08-22 创新先进技术有限公司 Data processing method, device and equipment
CN110825823B (en) * 2019-10-15 2023-04-07 清华大学 Method and system for hierarchical clustering
CN111310812B (en) * 2020-02-06 2023-04-28 佛山科学技术学院 Hierarchical human body activity recognition method and system based on data driving
CN111709472B (en) * 2020-06-15 2022-09-23 国家计算机网络与信息安全管理中心 Method for dynamically fusing rules to fraud behavior recognition model
CN113034145B (en) * 2021-05-24 2021-09-03 智安链云科技(北京)有限公司 Method and device for judging transaction category of user abnormal encrypted digital asset

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105357583A (en) * 2015-10-16 2016-02-24 Tcl集团股份有限公司 Method and device for discovering interest and preferences of intelligent television user
CN109509028A (en) * 2018-11-15 2019-03-22 北京奇虎科技有限公司 A kind of advertisement placement method and device, storage medium, computer equipment

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
DE60233935D1 (en) * 2002-07-19 2009-11-19 Mitsubishi Electric Inf Tech Method and device for data processing
CN101826105B (en) * 2010-04-02 2013-06-05 南京邮电大学 Phishing webpage detection method based on Hungary matching algorithm
CN101976348A (en) * 2010-10-21 2011-02-16 中国科学院深圳先进技术研究院 Image clustering method and system
CN103699678B (en) * 2013-12-31 2016-09-28 苏州大学 A kind of hierarchy clustering method based on multistage stratified sampling and system
CN103793484B (en) * 2014-01-17 2017-03-15 五八同城信息技术有限公司 The fraud identifying system based on machine learning in classification information website
CN104102706A (en) * 2014-07-10 2014-10-15 西安交通大学 Hierarchical clustering-based suspicious taxpayer detection method
CN105787743A (en) * 2016-02-26 2016-07-20 中国银联股份有限公司 Fraudulent trading detection method based on sample clustering
US10409911B2 (en) * 2016-04-29 2019-09-10 Cavium, Llc Systems and methods for text analytics processor
CN106296343A (en) * 2016-08-01 2017-01-04 王四春 A kind of e-commerce transaction monitoring method based on the Internet and big data
CN108268526A (en) * 2016-12-30 2018-07-10 中国移动通信集团北京有限公司 A kind of data classification method and device

Also Published As

Publication number Publication date
CN109886284A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
US20170083920A1 (en) Hybrid method of decision tree and clustering technology
CN111695626A (en) High-dimensional unbalanced data classification method based on mixed sampling and feature selection
US20200286095A1 (en) Method, apparatus and computer programs for generating a machine-learning system and for classifying a transaction as either fraudulent or genuine
CN112417176B (en) Method, equipment and medium for mining implicit association relation between enterprises based on graph characteristics
Pandey et al. Stratified linear systematic sampling based clustering approach for detection of financial risk group by mining of big data
CN110581840B (en) Intrusion detection method based on double-layer heterogeneous integrated learner
CN115577357A (en) Android malicious software detection method based on stacking integration technology
Pristyanto et al. The effect of feature selection on classification algorithms in credit approval
CN109902731B (en) Performance fault detection method and device based on support vector machine
Wang et al. Credit Card Fraud Detection using Logistic Regression
Xu et al. Sample selection-based hierarchical extreme learning machine
Rahman et al. An efficient approach for selecting initial centroid and outlier detection of data clustering
Zeng et al. Research on audit opinion prediction of listed companies based on sparse principal component analysis and kernel fuzzy clustering algorithm
Liu et al. A weight-incorporated similarity-based clustering ensemble method
CN116305103A (en) Neural network model backdoor detection method based on confidence coefficient difference
WO2022183019A1 (en) Methods for mitigation of algorithmic bias discrimination, proxy discrimination and disparate impact
Wang et al. Enhanced soft subspace clustering through hybrid dissimilarity
Sun et al. SMOTE-kTLNN: A hybrid re-sampling method based on SMOTE and a two-layer nearest neighbor classifier
Lubis et al. KNN method on credit risk classification with binary particle swarm optimization based feature selection
CN114066173A (en) Capital flow behavior analysis method and storage medium
Anitha et al. An extensive investigation of outlier detection by cluster validation indices
Tang et al. Graph neural network-based node classification with hard sample strategy
Lee et al. Validation measures of bicluster solutions
Zhao An evolutionary intelligent data analysis in promoting smart community

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant