CN112581265A - Internet financial client application fraud detection method based on AdaBoost - Google Patents


Info

Publication number
CN112581265A
Authority
CN
China
Prior art keywords
data
adaboost
sample
training
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011536761.0A
Other languages
Chinese (zh)
Inventor
江远强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baiweijinke Shanghai Information Technology Co ltd
Original Assignee
Baiweijinke Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baiweijinke Shanghai Information Technology Co ltd filed Critical Baiweijinke Shanghai Information Technology Co ltd
Priority to CN202011536761.0A priority Critical patent/CN112581265A/en
Publication of CN112581265A publication Critical patent/CN112581265A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The application discloses an AdaBoost-based application fraud detection method for Internet financial clients, comprising the following steps: selecting normal and fraudulent clients as modeling samples and collecting their credit data; after missing-value imputation, outlier handling and normalization of the collected credit data, dividing it into a training set and a test set by K-fold cross-validation; constructing a plurality of weak classifiers on the training set and weighting them into an AdaBoost application fraud classifier; optimizing the AdaBoost weak-classifier weights with a particle swarm algorithm and retraining on the training set; inputting test-set samples to the trained AdaBoost for detection, comparing the output with the actual labels, and evaluating the model against logistic regression and a support vector machine using model precision evaluation indexes; and deploying the optimized AdaBoost fraud detection model to the application platform to detect client fraud in real time. The method optimizes the AdaBoost weak-classifier weights by particle swarm and generates a corrected strong classifier to detect application fraud by Internet financial clients.

Description

Internet financial client application fraud detection method based on AdaBoost
Technical Field
The invention relates to the technical field of risk control in the Internet finance industry, and in particular to an AdaBoost-based application fraud detection method for Internet financial clients.
Background
With the development of Internet finance, online lending has grown rapidly. In an online environment, strongly relevant credit features such as a borrower's financial information are hard to obtain, so a large number of weakly relevant credit features must be collected from different network platforms. From these features, a fraud-behavior model built with machine learning, such as logistic regression or a support vector machine, detects whether a given client's application is fraudulent: the independent variables are the personal basic information submitted when a sample client registers and applies, together with operation-behavior buried-point data obtained from monitoring software, and the dependent variable is the probability of a fraudulent application. Research shows that the fraud rate of online lending is significantly higher than that of traditional offline lending, and fraudulent users, whose misclassification cost is high, strongly influence the classification results of base classifiers and ensemble models, so the demand for an AdaBoost-based application fraud detection method for Internet financial clients grows daily.
Common logistic regression and support vector machine algorithms are single classifiers with limited precision; compared with a simple classification model, an ensemble model performs better. The AdaBoost (adaptive boosting) algorithm combines several weak classifiers into a strong classifier; it is simple and effective, its sub-classifiers can be constructed by different methods, and it markedly improves the prediction performance of the learning algorithm, so it is widely applied in many fields. How to apply AdaBoost to financial client fraud detection has become an important research direction, and for the above problems an AdaBoost-based application fraud detection method for Internet financial clients is therefore provided.
Disclosure of Invention
The object of the invention is to provide an AdaBoost-based application fraud detection method for Internet financial clients that solves the problems raised in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
an Internet financial client application fraud detection method based on AdaBoost comprises the following six steps:
S1, collecting data: selecting a certain proportion and quantity of normal and fraudulent applicants from an Internet financial platform as modeling samples, collecting the personal basic information submitted when each sample client registers and applies, obtaining operation-behavior buried-point data from monitoring software as credit data, and using each sample's normal or fraudulent performance as label data;
S2, preprocessing data: after missing-value imputation, outlier handling and normalization of the collected credit data, dividing it into a training set and a test set by K-fold cross-validation;
S3, constructing a plurality of weak classifiers on the training set and weighting them into an AdaBoost application fraud classifier;
S4, optimizing the AdaBoost weak-classifier weights with a particle swarm algorithm and retraining on the training set;
S5, inputting test-set samples to the trained AdaBoost for detection, comparing the output with the actual labels, and evaluating the model against logistic regression and a support vector machine using model precision evaluation indexes;
S6, deploying the optimized AdaBoost fraud detection model to the application platform, acquiring each real-time applicant's data, importing it into the prediction model as a sample to be tested, and outputting whether it is fraudulent so as to approve applicants in real time; performance data is periodically fed back into the model for training to update the model online.
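A minimal end-to-end sketch of steps S1-S6 using scikit-learn's off-the-shelf AdaBoostClassifier (the PSO weight re-optimization of S4 is omitted here); the synthetic data, the 0.3 test fraction and all variable names are illustrative assumptions, not taken from the patent:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # S1: stand-in for collected credit data
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = fraud, 0 = normal (synthetic label)

X = MinMaxScaler().fit_transform(X)      # S2: normalize features into [0, 1]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# S3: AdaBoost combines weighted decision stumps into a strong classifier
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# S5: evaluate on the held-out test set (S4 and S6 are omitted in this sketch)
acc = clf.score(X_te, y_te)
print(round(acc, 2))
```

In a real deployment the synthetic features would be replaced by the credit and buried-point data of S1, and the fitted model would be served behind the application platform as in S6.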
Preferably, in S1, normal-repayment and overdue clients in a certain proportion and quantity are selected from the back end of the Internet financial platform according to post-loan performance as modeling samples, the personal basic information submitted at account registration and application is collected, and operation-behavior buried-point data is obtained from monitoring software. The user's personal application information comprises: mobile phone number, education background, marital status, employer, address and contact information, together with the personal basic information, credit transaction information, public information and special-record data obtained from the credit report. The buried-point data comprises device behavior data and log data collected at the tracking points. The device behavior data includes: number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, application frequency per geographic location, application frequency per IP, device battery percentage and average gyroscope acceleration. The log data includes: logins within 7 days, time from first click to credit application, maximum number of sessions in one day, behavior statistics for the week before the credit application, and the like. In addition, subject to compliance requirements, the method is not limited to these sources and may draw on multi-dimensional big data including mobile Internet behavior data, in-app behavior data from the loan APP, credit history and operator data.
Preferably, in S2, AdaBoost is sensitive to abnormal samples, which may receive high weights during iteration and degrade the prediction accuracy of the final strong learner. After removing outliers and reducing noise in the sample data collected in S1, the data is normalized so that all values fall in [0, 1], reducing the differences between features, with the normalization formula:

X_norm = (X − X_min) / (X_max − X_min)

wherein X_norm is the normalized data; X_min and X_max are respectively the minimum and maximum values in the data set; and X is the original data.
The normalized data set is divided into a training set and a test set by K-fold: the data set is first shuffled, then uniformly divided into K disjoint subsets, and the training and test sets are drawn at random for cross-validation.
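A minimal sketch of S2: min-max normalization to [0, 1] followed by a K-fold split (K = 5 is an assumed value here, since the patent leaves K unspecified; the toy matrix is invented for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0],
              [40.0, 800.0], [50.0, 1000.0], [60.0, 1200.0]])

# X_norm = (X - X_min) / (X_max - X_min), computed per feature column
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - x_min) / (x_max - x_min)

# Shuffle, then split into K disjoint folds for cross-validation
kf = KFold(n_splits=5, shuffle=True, random_state=0)
splits = [(tr, te) for tr, te in kf.split(X_norm)]
print(X_norm.min(), X_norm.max(), len(splits))  # → 0.0 1.0 5
```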
Preferably, in S3, the AdaBoost (adaptive boosting) algorithm repeatedly searches the sample feature space and obtains sample weights, continuously adjusting the training-sample weights during iteration: samples with low prediction accuracy have their weights increased, and samples with high accuracy have them decreased. The weak predictors are then combined into a strong predictor by weighted majority voting, i.e. a weak predictor with a smaller prediction error rate receives a larger weight so that it plays a larger role in the vote, which markedly improves the prediction performance of the learning algorithm.
Select m training samples T = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)} from the sample space, where y_i ∈ {−1, +1} marks negative and positive samples: if x_i shows no sign of fraud, (x_i, y_i) = (x_i, −1), and otherwise (x_i, y_i) = (x_i, +1). Let f(x) denote the weak classifier algorithm, K the number of weak-classifier iterations, and H(x) the classifier output by training. The specific steps are as follows:
S31, initializing sample weights:
The weight distribution of the training samples is initialized as

D_1 = (w_11, w_12, …, w_1m); w_1i = 1/m, i = 1, 2, …, m, where m is the number of samples,

wherein D_1 is the weight of each sample in the first iteration, D_t is the weight distribution of the training data before the t-th iteration begins, and w_ti is the weight of the i-th sample in the t-th iteration.
S32, iterative training
For t = 1, 2, …, T, train iteratively on the data samples to obtain a weak classifier h_t(x).
The weak classifier h_t(x) is trained with the sample distribution of the t-th round; under the current distribution D_t, the weak classifier is called to obtain the classification rule of the t-th round:

h_t(x): X → {−1, +1}.
S33, weight normalization
The weight w_ti of the i-th sample in the t-th iteration is normalized:

w_ti ← w_ti / Σ_{j=1}^{m} w_tj

wherein w_ti is the weight of sample (x_i, y_i) in the t-th iteration.
S34, calculating the error rate of each weak classifier
Using the sample distribution of the t-th round, train the weak classifier h_t(x) and compute its classification error rate e_t, the sum of the weights of all misclassified samples, which represents the training error (misjudgment) rate of the t-th weak classifier:

e_t = Σ_{i=1}^{m} w_ti · I(h_t(x_i) ≠ y_i)

wherein w_ti is the weight of sample (x_i, y_i) in the t-th iteration; I(h_t(x_i) ≠ y_i) indicates the misclassified samples counted in e_t; and y_i is the true label of the i-th sample.
S35, calculating the coefficient a_t of h_t(x)
The weight a_t of each weak classifier measures its importance and is computed as:

a_t = (1/2) · ln((1 − e_t) / e_t)

wherein e_t is the training error rate of the t-th weak classifier.
S36, updating the sample weights, increasing the weights of misclassified samples
The weight distribution D_{t+1} of the training data set is updated from the previous weights:

D_{t+1} = (w_{t+1,1}, w_{t+1,2}, …, w_{t+1,m})

w_{t+1,i} = (w_ti / Z_t) · exp(−a_t · y_i · h_t(x_i))

wherein Z_t is a normalization factor that makes the weight distribution D_{t+1} a probability distribution:

Z_t = Σ_{i=1}^{m} w_ti · exp(−a_t · y_i · h_t(x_i))

wherein w_ti is the weight of the i-th sample in the t-th iteration, a_t is the weight of each weak classifier, and h_t(x_i) is the weak classifier's prediction.
S37, constructing a linear combination of the basic classifiers
After the iterations are complete, the weak classifiers are combined into f(x):

f(x) = Σ_{t=1}^{T} a_t · h_t(x)

then the mathematical sign function is applied to obtain the final strong classifier H(x):

H(x) = sign(f(x))

wherein the sign function is: sign(z) = +1 if z ≥ 0, and −1 otherwise.
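The steps S31-S37 above can be sketched as follows, with a decision stump standing in for the weak classifier f(x); the tiny one-feature data set and all helper names are illustrative assumptions:

```python
import numpy as np

def stump_train(X, y, w):
    """Pick the single-feature threshold rule minimizing weighted error."""
    best = (None, None, 1, np.inf)  # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost_fit(X, y, T=10):
    m = len(y)
    w = np.full(m, 1.0 / m)                      # S31: D_1 = (1/m, ..., 1/m)
    ensemble = []
    for _ in range(T):                           # S32: iterate t = 1..T
        j, thr, pol, e = stump_train(X, y, w)    # S34: error rate e_t
        e = max(e, 1e-10)                        # guard against e_t = 0
        a = 0.5 * np.log((1 - e) / e)            # S35: a_t
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w = w * np.exp(-a * y * pred)            # S36: reweight samples
        w /= w.sum()                             # S33: normalize (Z_t)
        ensemble.append((j, thr, pol, a))
    return ensemble

def adaboost_predict(ensemble, X):
    f = np.zeros(len(X))                         # S37: f(x) = sum a_t h_t(x)
    for j, thr, pol, a in ensemble:
        f += a * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(f >= 0, 1, -1)               # H(x) = sign(f(x))

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])              # +1 = fraud, -1 = normal
model = adaboost_fit(X, y, T=5)
print(adaboost_predict(model, X))                # → [-1 -1 -1  1  1  1]
```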
experimental results show that AdaBoost has higher detection rate, low generalization error rate, no need of parameter adjustment and difficult occurrence of overfitting phenomenon.
Preferably, in S4, to address the problem that once a weak classifier's coefficient is fixed during an iteration it cannot be changed later, so that redundant or useless weak classifiers may keep large weights, the particle swarm algorithm is used to optimize the AdaBoost weak-classifier weights: accurate weak classifiers obtain larger weights while useless or redundant ones obtain smaller weights, further improving the accuracy and readability of AdaBoost.
S41, initializing the particle swarm parameters and encoding AdaBoost
The AdaBoost weak-classifier weights to be optimized are encoded as each particle's position vector; each particle's initial position parameters are random numbers in [0, 1], and the particle velocity and position matrices are generated. The number of particles is determined by the data scale and the size of the training set, generally between 20 and 40;
S42, setting the fitness function
The error rate of the AdaBoost ensemble defined by a particle's weights is used as that particle's fitness:

fit(i) = e_i = Σ_{j=1}^{m} w_j · I(H_i(x_j) ≠ y_j)

wherein m is the number of samples, e_i is the error rate, i.e. the fitness value, of the i-th particle, H_i is the strong classifier assembled with the i-th particle's candidate weak-classifier weights, w_j is the weight of the j-th sample, and y_j is its true class.
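A minimal sketch of such a fitness evaluation, assuming (as an illustration, not from the patent text) that each weak classifier's predictions on the training set are precomputed and a particle's position is the vector of candidate weights:

```python
import numpy as np

def fitness(position, weak_preds, y):
    """position: (T,) candidate weights; weak_preds: (T, m) values in {-1, +1}."""
    f = position @ weak_preds      # weighted vote f(x) under the candidate weights
    H = np.where(f >= 0, 1, -1)    # H(x) = sign(f(x))
    return np.mean(H != y)         # error rate = fitness value (lower is better)

y = np.array([1, -1, 1, -1])
weak_preds = np.array([[1, -1, 1, -1],   # a perfect weak classifier
                       [1, 1, 1, 1]])    # a useless one
print(fitness(np.array([1.0, 0.1]), weak_preds, y))  # → 0.0
```

Down-weighting the useless classifier (the second row) drives the fitness to zero, which is exactly the behavior the swarm search rewards.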
S43, position and velocity update
In each iteration the particle fitness values are computed with the fitness function fit and compared to determine each particle's individual extremum and the global extremum, which are then used to update particle velocities and positions;
Assume M particles form a swarm in a D-dimensional search space. In each iteration, a particle updates its velocity V_id and position X_id through its individual extremum and the global extremum, with the update formulas:

V_id^{k+1} = w · V_id^k + c_1 · r_1 · (p_id^k − X_id^k) + c_2 · r_2 · (p_gd^k − X_id^k)

X_id^{k+1} = X_id^k + V_id^{k+1}

wherein w is the inertia weight (it balances the local and global search ability of the PSO, generally 0.5); c_1 and c_2 are acceleration factors (usually taken as 2, typically between 0 and 4); r_1 and r_2 are random numbers distributed in [0, 1]; d = 1, 2, …, D, where D is the data dimension; i = 1, 2, …, M, where M is the number of particles; k is the iteration number; p_id^k is the individual extremum (pbest) of the current particle; and p_gd^k is the global extremum (gbest) of the swarm.
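One PSO velocity/position step under the update formulas above; M, D, the seed and the initial pbest/gbest values are illustrative assumptions, while w = 0.5 and c1 = c2 = 2 follow the typical values quoted in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
M, D = 4, 3                                   # particles, search dimensions
w, c1, c2 = 0.5, 2.0, 2.0                     # inertia and acceleration factors

X = rng.random((M, D))                        # positions start in [0, 1]
V = np.zeros((M, D))
pbest = X.copy()                              # individual best positions
gbest = X[0].copy()                           # global best position (assumed)

# V^{k+1} = w V^k + c1 r1 (pbest - X) + c2 r2 (gbest - X);  X^{k+1} = X + V^{k+1}
r1, r2 = rng.random((M, D)), rng.random((M, D))
V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
X = X + V
print(X.shape)  # → (4, 3)
```

Note that the particle sitting at gbest gets zero velocity on the first step, so it stays put until another particle finds a better position.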
S44, iterative optimization
After each iteration the result is analyzed: the current fitness value fit(i) is compared with the particle's individual extremum fit(p_id); if fit(i) < fit(p_id), the individual extremum p_id of the current particle is updated with the current position. Likewise, if fit(i) < fit(p_gd), the global extremum p_gd is updated. The velocity V_id and position X_id of the current particle are updated at the same time.
S45, finding the global optimal solution
All individuals of each generation of the swarm are evaluated and iteration continues until the fitness falls below the set value or the maximum iteration number k_max is reached, at which point the minimum individual fitness fitness_min is found, yielding the individual optimal solutions p_best and the global optimal solution g_best of the particle swarm.
S46, decoding the global optimal solution
The global optimal solution g_best of the particle swarm, i.e. the best individual found, is decoded into the AdaBoost weak-classifier weights to obtain the AdaBoost detection model, and the training set is input for learning and training.
Preferably, in S5, the actual and predicted results on the samples are compared to obtain a confusion matrix, from which the following indexes are computed: true positive rate TPR (True Positive Rate), false positive rate FPR (False Positive Rate), AUC (Area Under Curve) and KS (Kolmogorov-Smirnov):

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

KS = max(TPR − FPR)

wherein a true positive (TP) means the model correctly predicts a positive-class sample as positive; a true negative (TN) means the model correctly predicts a negative-class sample as negative; a false positive (FP) means the model incorrectly predicts a negative-class sample as positive; and a false negative (FN) means the model incorrectly predicts a positive-class sample as negative. In this application, fraud samples are the positive class and normal samples the negative class.
Plotting TPR on the vertical axis against FPR on the horizontal axis gives the ROC (Receiver Operating Characteristic) curve; the AUC value (Area Under the ROC Curve) obtained from it serves as the standard for measuring model accuracy, and the closer the AUC is to 1, the better the model.
The KS value is the maximum difference between TPR and FPR and reflects the model's best discriminating power; the threshold at which it occurs is generally taken as the optimal threshold for separating good and bad users, and a KS above 0.2 generally indicates good prediction accuracy.
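With the definitions above, the indexes can be computed for a toy score vector using scikit-learn's roc_curve and roc_auc_score (the labels and scores here are invented for illustration, with fraud = 1 as the positive class):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])            # 1 = fraud, 0 = normal
score = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.4])

fpr, tpr, _ = roc_curve(y_true, score)  # (FPR, TPR) pairs over all thresholds
auc = roc_auc_score(y_true, score)      # area under the ROC curve
ks = np.max(tpr - fpr)                  # KS = max(TPR - FPR)
print(round(auc, 3), round(ks, 3))      # → 0.938 0.75
```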
Preferably, in S6, the optimized AdaBoost fraud detection model is deployed to the application platform; each real-time applicant's data is acquired and imported into the prediction model as a sample to be tested, which outputs whether it is fraudulent, achieving real-time approval of applicants; performance data is periodically fed back into the model for training, updating the model online.
Compared with the prior art, the invention has the beneficial effects that:
1. In the invention, the AdaBoost algorithm combines several weak classifiers into a strong classifier; it is simple and effective, its sub-classifiers can be constructed by different methods, and it markedly improves the prediction performance of the learning algorithm.
2. In the invention, the particle swarm algorithm requires few parameters to tune, starts from random solutions and reaches the global optimum from local optima through iteration, and suits complex nonlinear, nondifferentiable and multimodal problems.
3. In the invention, the particle swarm algorithm is adopted to optimize the AdaBoost weak-classifier weights, so that accurate weak classifiers obtain larger weights while useless or redundant ones obtain smaller weights, further improving the accuracy and readability of AdaBoost.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Referring to fig. 1, the present invention provides a technical solution:
an Internet financial client application fraud detection method based on AdaBoost comprises the following six steps:
S1, collecting data: selecting a certain proportion and quantity of normal and fraudulent applicants from an Internet financial platform as modeling samples, collecting the personal basic information submitted when each sample client registers and applies, obtaining operation-behavior buried-point data from monitoring software as credit data, and using each sample's normal or fraudulent performance as label data;
S2, preprocessing data: after missing-value imputation, outlier handling and normalization of the collected credit data, dividing it into a training set and a test set by K-fold cross-validation;
S3, constructing a plurality of weak classifiers on the training set and weighting them into an AdaBoost application fraud classifier;
S4, optimizing the AdaBoost weak-classifier weights with a particle swarm algorithm and retraining on the training set;
S5, inputting test-set samples to the trained AdaBoost for detection, comparing the output with the actual labels, and evaluating the model against logistic regression and a support vector machine using model precision evaluation indexes;
S6, deploying the optimized AdaBoost fraud detection model to the application platform, acquiring each real-time applicant's data, importing it into the prediction model as a sample to be tested, and outputting whether it is fraudulent so as to approve applicants in real time; performance data is periodically fed back into the model for training to update the model online.
In S1, normal-repayment and overdue clients in a certain proportion and quantity are selected from the back end of the Internet financial platform according to post-loan performance as modeling samples, the personal basic information submitted at account registration and application is collected, and operation-behavior buried-point data is obtained from monitoring software. The user's personal application information comprises: mobile phone number, education background, marital status, employer, address and contact information, together with the personal basic information, credit transaction information, public information and special-record data obtained from the credit report. The buried-point data comprises device behavior data and log data collected at the tracking points. The device behavior data includes: number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, application frequency per geographic location, application frequency per IP, device battery percentage and average gyroscope acceleration. The log data includes: logins within 7 days, time from first click to credit application, maximum number of sessions in one day, behavior statistics for the week before the credit application, and the like. In addition, subject to compliance requirements, the method is not limited to these sources and may draw on multi-dimensional big data including mobile Internet behavior data, in-app behavior data from the loan APP, credit history and operator data. This arrangement helps compile comprehensive user information for the subsequent prediction of the user's credit risk.
In S2, AdaBoost is sensitive to abnormal samples, which may receive high weights during iteration and degrade the prediction accuracy of the final strong learner. After removing outliers and reducing noise in the sample data collected in S1, the data is normalized so that all values fall in [0, 1], reducing the differences between features, with the normalization formula:

X_norm = (X − X_min) / (X_max − X_min)

wherein X_norm is the normalized data; X_min and X_max are respectively the minimum and maximum values in the data set; and X is the original data.
The normalized data set is divided into a training set and a test set by K-fold: the data set is first shuffled, then uniformly divided into K disjoint subsets, and the training and test sets are drawn at random for cross-validation, which simplifies the computation.
In S3, the AdaBoost (adaptive boosting) algorithm repeatedly searches the sample feature space and obtains sample weights, continuously adjusting the training-sample weights during iteration: samples with low prediction accuracy have their weights increased, and samples with high accuracy have them decreased. The weak predictors are then combined into a strong predictor by weighted majority voting, i.e. a weak predictor with a smaller prediction error rate receives a larger weight so that it plays a larger role in the vote, which markedly improves the prediction performance of the learning algorithm.
Select m training samples T = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)} from the sample space, where y_i ∈ {−1, +1} marks negative and positive samples: if x_i shows no sign of fraud, (x_i, y_i) = (x_i, −1), and otherwise (x_i, y_i) = (x_i, +1). Let f(x) denote the weak classifier algorithm, K the number of weak-classifier iterations, and H(x) the classifier output by training. The specific steps are as follows:
S31, initializing sample weights:
The weight distribution of the training samples is initialized as

D_1 = (w_11, w_12, …, w_1m); w_1i = 1/m, i = 1, 2, …, m, where m is the number of samples,

wherein D_1 is the weight of each sample in the first iteration, D_t is the weight distribution of the training data before the t-th iteration begins, and w_ti is the weight of the i-th sample in the t-th iteration.
S32, iterative training
For t = 1, 2, …, T, train iteratively on the data samples to obtain a weak classifier h_t(x).
The weak classifier h_t(x) is trained with the sample distribution of the t-th round; under the current distribution D_t, the weak classifier is called to obtain the classification rule of the t-th round:

h_t(x): X → {−1, +1}.
S33, weight normalization
The weight w_ti of the i-th sample in the t-th iteration is normalized:

w_ti ← w_ti / Σ_{j=1}^{m} w_tj

wherein w_ti is the weight of sample (x_i, y_i) in the t-th iteration.
S34, calculating the error rate of each weak classifier
Using the sample distribution of the t-th round, train the weak classifier h_t(x) and compute its classification error rate e_t, the sum of the weights of all misclassified samples, which represents the training error (misjudgment) rate of the t-th weak classifier:

e_t = Σ_{i=1}^{m} w_ti · I(h_t(x_i) ≠ y_i)

wherein w_ti is the weight of sample (x_i, y_i) in the t-th iteration; I(h_t(x_i) ≠ y_i) indicates the misclassified samples counted in e_t; and y_i is the true label of the i-th sample.
S35, calculating the coefficient a_t of h_t(x)
The weight a_t of each weak classifier measures its importance and is computed as:

a_t = (1/2) · ln((1 − e_t) / e_t)

wherein e_t is the training error rate of the t-th weak classifier.
S36, updating the sample weights, increasing the weights of misclassified samples
The weight distribution D_{t+1} of the training data set is updated from the previous weights:

D_{t+1} = (w_{t+1,1}, w_{t+1,2}, …, w_{t+1,m})

w_{t+1,i} = (w_ti / Z_t) · exp(−a_t · y_i · h_t(x_i))

wherein Z_t is a normalization factor that makes the weight distribution D_{t+1} a probability distribution:

Z_t = Σ_{i=1}^{m} w_ti · exp(−a_t · y_i · h_t(x_i))

wherein w_ti is the weight of the i-th sample in the t-th iteration, a_t is the weight of each weak classifier, and h_t(x_i) is the weak classifier's prediction.
S37, constructing a linear combination of the basic classifiers
After the iterations are complete, the weak classifiers are combined into f(x):

f(x) = Σ_{t=1}^{T} a_t · h_t(x)

then the mathematical sign function is applied to obtain the final strong classifier H(x):

H(x) = sign(f(x))

wherein the sign function is: sign(z) = +1 if z ≥ 0, and −1 otherwise.
experimental results show that AdaBoost has higher detection rate, low generalization error rate, no need of parameter adjustment and difficult occurrence of overfitting phenomenon.
In S4, to address the problem that once a weak classifier's coefficient is fixed during an iteration it cannot be changed later, so that redundant or useless weak classifiers may keep large weights, the particle swarm algorithm is used to optimize the AdaBoost weak-classifier weights: accurate weak classifiers obtain larger weights while useless or redundant ones obtain smaller weights, further improving the accuracy and readability of AdaBoost.
S41, initializing particle swarm parameters and encoding AdaBoost
The weak-classifier weights of AdaBoost that need optimization are encoded as the position vector of each particle; the initial position parameters of each particle are random numbers in [0, 1]. The velocity matrix and position matrix of the particles are then generated. The number of particles is determined by the specific data scale and the training-set data scale, and is generally between 20 and 40;
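A minimal initialization sketch under the assumptions above (positions in [0, 1], a swarm of 20-40 particles); the velocity range and all names are illustrative:

```python
import random

def init_swarm(n_particles, dim, seed=0):
    """Particle positions in [0, 1] encode candidate weak-classifier weights;
    velocities start small so early steps stay inside the search range."""
    rng = random.Random(seed)
    positions  = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_particles)]
    return positions, velocities

# 30 particles (within the 20-40 range mentioned above) for 5 weak classifiers.
positions, velocities = init_swarm(n_particles=30, dim=5)
```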
S42, setting the fitness function
The error rate e_t of AdaBoost is taken as the fitness function of each particle in the swarm, with the expression:

fit(i) = e_t = Σ_{i=1}^{m} w_{t,i} · I(h_t(x_i) ≠ y_i)

wherein m represents the number of samples; e_t denotes the error rate, i.e. the fitness value, of the particle; w_{t,i} represents the weight of the i-th sample at the t-th iteration; and y_i represents the true class of the i-th sample.
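A sketch of such a fitness evaluation, assuming the particle's position vector holds the candidate weak-classifier weights and the weak classifiers' predictions are precomputed (all names and the tiny data set are illustrative):

```python
def ensemble_error(position, stump_preds, labels, sample_weights):
    """Weighted error rate of the ensemble whose weak-classifier weights
    are given by the particle's position vector (lower fitness is better)."""
    error = 0.0
    for i, y in enumerate(labels):
        score = sum(a * preds[i] for a, preds in zip(position, stump_preds))
        predicted = 1 if score >= 0 else -1
        if predicted != y:
            error += sample_weights[i]
    return error

# Two weak classifiers' predictions on three samples.
stump_preds = [[1, -1, -1], [1, 1, -1]]
labels = [1, 1, -1]
sample_weights = [1/3, 1/3, 1/3]
fit_good = ensemble_error([0.2, 0.8], stump_preds, labels, sample_weights)
fit_bad  = ensemble_error([0.8, 0.2], stump_preds, labels, sample_weights)
```

Here the particle [0.2, 0.8] classifies every sample correctly, while [0.8, 0.2] misclassifies one, so the first has the better (smaller) fitness.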
S43, position and speed update
Calculating the particle fitness value in each iteration according to the fitness function fit, comparing the particle fitness values, and determining the individual extreme value and the global optimal extreme value of each particle to update the optimal position and speed of each particle;
assuming that M particles form a particle swarm in a D-dimensional search space, in each iteration process, the particles update the speed V of the particles through an individual extremum and a global extremumidAnd position XidThe update formula is as follows:
Figure BDA0002853286610000132
Figure BDA0002853286610000133
wherein w is an inertial weight (w balances the local search capability and the global search capability of the PSO, generally 0.5); c. C1、c2An acceleration factor (usually taken to be 2, typically between 0 and 4); r is1、r2Is distributed in [0,1]]D is 1,2, …, D is the data dimension; i is 1,2, …, M is the number of particles; k is the number of iterations;
Figure BDA0002853286610000134
representing individual extrema of the current particle;
Figure BDA0002853286610000135
representing the global extremum of the current particle.
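The velocity and position formulas can be sketched for a single particle as follows (function name and the w, c_1, c_2 defaults follow the values suggested above; the random generator is an assumption):

```python
import random

def pso_step(x, v, pbest, gbest, w=0.5, c1=2.0, c2=2.0, rng=None):
    """One per-dimension update of a single particle:
    v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x = x + v."""
    rng = rng or random.Random(0)
    new_v = [w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
             for xi, vi, pi, gi in zip(x, v, pbest, gbest)]
    new_x = [xi + nvi for xi, nvi in zip(x, new_v)]
    return new_x, new_v

# A particle already sitting at both extrema, with zero velocity, does not move.
x1, v1 = pso_step([0.5, 0.5], [0.0, 0.0], [0.5, 0.5], [0.5, 0.5])
# A particle below both extrema is pulled toward them.
x2, v2 = pso_step([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0])
```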
S44, iterative optimization
Analyzing the result after each iteration, and using the current fitness value fit (i) obtained by the iteration and the individual extreme value of the individual current particle
Figure BDA0002853286610000141
Make a comparison if
Figure BDA0002853286610000142
Update the individual extremum of the current particle with fit (i)
Figure BDA0002853286610000143
If it is not
Figure BDA0002853286610000144
Update the global extremum of the current particle with fit (i)
Figure BDA0002853286610000145
Simultaneous velocity VidAnd position XidUpdates the velocity and position of the current particle.
S45, finding the global optimal solution
All individuals in each generation of the particle swarm are evaluated, and iteration continues until the fitness value is smaller than the set value or the maximum iteration number k_max is reached. The minimum individual fitness fitness_min is then found, yielding the corresponding individual optimal solution g_best and the global optimal solution p_best of the particle swarm.
S46, decoding the global optimal solution
The global optimal solution p_best of the particle swarm, i.e. the optimal individual obtained, is decoded and assigned to the weak-classifier weights of AdaBoost to obtain the AdaBoost detection model, and the training set is input for learning and training. This arrangement is favorable for enhancing prediction precision.
In S5, the actual and predicted results on the test samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: true positive rate TPR (True Positive Rate), false positive rate FPR (False Positive Rate), AUC (Area Under Curve) and KS (Kolmogorov-Smirnov):

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

KS = max(TPR - FPR)

wherein a true positive (TP) means the model correctly predicts a positive-class sample as the positive class; a true negative (TN) means the model correctly predicts a negative-class sample as the negative class; a false positive (FP) means the model incorrectly predicts a negative-class sample as the positive class; and a false negative (FN) means the model incorrectly predicts a positive-class sample as the negative class. In this application, fraud samples are taken as the positive class and normal samples as the negative class.
Plotting with TPR as the vertical axis and FPR as the horizontal axis yields the ROC curve (Receiver Operating Characteristic curve). The AUC value (Area Under the ROC Curve) obtained from the ROC curve is taken as the evaluation standard for measuring model accuracy: the closer the AUC value is to 1, the better the model.
The KS value is the maximum value of the difference between TPR and FPR and reflects the optimal distinguishing power of the model; the threshold at this point is generally taken as the optimal threshold for separating good and bad users. Generally, a KS greater than 0.2 indicates that the model has good prediction accuracy.
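The confusion-matrix indexes above can be sketched as follows, with fraud (+1) as the positive class; the tiny label arrays are illustrative, not data from the patent:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN with the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn

y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, -1, -1, -1, 1, -1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
tpr = tp / (tp + fn)          # TPR = TP / (TP + FN)
fpr = fp / (fp + tn)          # FPR = FP / (FP + TN)
ks_here = tpr - fpr           # KS is the maximum of TPR - FPR over thresholds
```

With scored (rather than hard-labeled) predictions, KS and AUC would be computed by sweeping the decision threshold and tracking TPR - FPR and the area under the ROC curve.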
This arrangement is favorable for intuitively judging model accuracy.
In S6, the optimized AdaBoost fraud detection model is deployed to the application platform. Data of clients applying in real time are acquired and imported into the prediction model as samples to be detected, and the model outputs whether the application is fraudulent, realizing real-time approval of applying clients. Performance data are periodically input into the model for training, realizing online updating of the model.
The Internet financial client application fraud detection system based on AdaBoost comprises the following modules:
the data acquisition module is used for acquiring a modeling sample which comprises personal application information, operation behavior buried point data and fraud performance as evaluation results;
the data processing module is used for performing missing-value completion, abnormal-value processing and normalization processing on the collected credit data;
the model building module is used for training and building a plurality of weak classifiers and weighting and synthesizing an application fraud classifier of Adaboost;
the parameter optimization module is used for optimizing the Adaboost weak classifier weight by adopting a particle swarm algorithm, and training and optimizing again;
the fraud detection module is used for detecting fraud of applying clients in real time with the AdaBoost fraud detection model.
The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. The foregoing is only a preferred embodiment of the present invention; since the limited text cannot enumerate the objectively unlimited specific structures, it will be apparent to those skilled in the art that a number of modifications, refinements or changes may be made, and the technical features described above may be combined in a suitable manner, without departing from the principle of the present invention. Such modifications, variations, combinations or adaptations using the spirit and scope of the invention, as defined by the claims, may be directed to other uses and embodiments.

Claims (7)

1. An Internet financial client application fraud detection method based on AdaBoost is characterized by comprising the following six steps:
s1, collecting data, selecting a certain proportion and quantity of normal applications and fraudulent customers as modeling samples from an internet financial platform, collecting personal basic information when a customer account of a sample is registered and applied, obtaining operation behavior buried point data from monitoring software as credit data, and using normal applications or fraudulent manifestations corresponding to the sample as label data;
s2, preprocessing the data, namely after performing missing-value completion, abnormal-value processing and normalization processing on the collected credit data, dividing the data into a training set and a test set by K-fold cross validation;
s3, weighting and synthesizing an application fraud classifier of Adaboost in a training set by constructing a plurality of weak classifiers;
s4, optimizing the Adaboost weak classifier weight by adopting a particle swarm algorithm, and training again on the training set;
s5, inputting test-set samples into the trained AdaBoost for detection, comparing the output with the actual labels, and comparing against logistic regression and support vector machine models according to model precision evaluation indexes;
s6, deploying the fraud detection model of the optimized AdaBoost to an application platform, acquiring data of a real-time application client, importing the data serving as a sample to be detected into a prediction model, outputting whether fraud is caused or not, realizing real-time examination and approval of the application client, inputting the performance data into the model for training regularly, and realizing online updating of the model.
2. The method for detecting the application fraud of the internet financial client based on AdaBoost of claim 1, characterized in that in S1, a certain proportion and quantity of normal-repayment and overdue clients are selected as modeling samples according to the post-loan performance from the back end of the internet financial platform, personal basic information of the sample client accounts at registration and application is collected, and operation behavior buried-point data are obtained from monitoring software. The personal application information of the user comprises: the mobile phone number, education background, marital status, working unit, address and contact information, as well as the personal basic information, credit transaction information, public information and special-record data acquired from the credit investigation report. The buried-point data comprise equipment behavior data and log data collected at the buried points, wherein the equipment behavior data comprise: the number of platform logins, the number of clicks, the click frequency, the total and average input time, mobile phone number data, GPS position, MAC address, IP address data, geographic-information application frequency, IP application frequency, equipment battery-level ratio and average gyroscope acceleration; and the log data comprise: the number of logins within 7 days, the time from the first click to the credit application, the maximum number of sessions within one day, behavior statistics for the week before the credit application, and the like. In addition, under compliance requirements, the method is not limited to obtaining full-domain multi-dimensional big data, including mobile internet behavior data, behavior data in the loan APP, credit history and operator data.
3. The method for detecting Internet financial client application fraud based on AdaBoost of claim 1, wherein in S2, AdaBoost is sensitive to abnormal samples, and abnormal samples may obtain high weights during iteration, affecting the prediction accuracy of the final strong learner. After abnormal points are eliminated and noise is reduced, the sample data collected in step S1 are normalized with the following formula, converting all data into [0, 1], reducing the differences between the data and making the data distribution smoother:

X_norm = (X - X_min) / (X_max - X_min)

wherein X_norm is the normalized data; X_min and X_max respectively represent the minimum and maximum values in the data set; and X is the original data.
The normalized data set is divided into a training set and a test set by K-fold splitting: the data set is first shuffled, then uniformly divided into K disjoint subsets, and the training and test sets are randomly divided for cross validation.
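A minimal sketch of this normalization and K-fold split (function names and sample values are illustrative, not from the patent):

```python
import random

def min_max_normalize(values):
    """X_norm = (X - X_min) / (X_max - X_min), mapping the data into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices, then split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

normalized = min_max_normalize([10.0, 20.0, 30.0, 40.0])
folds = k_fold_indices(n=10, k=5)
```

Each fold can then serve once as the test set while the remaining folds form the training set.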
4. The method for detecting internet financial client application fraud based on AdaBoost of claim 1, wherein in S3, the adaptive boosting algorithm (AdaBoost) obtains sample weights by repeatedly searching the sample feature space and continuously adjusts the weights of the training samples during iteration, increasing (decreasing) the weights of samples with low (high) prediction accuracy, and forms a strong predictor by a weighted majority-voting method, i.e. increasing (decreasing) the weights of weak predictors with smaller (larger) prediction error rates so that they play a larger (smaller) role in the vote, thereby significantly improving the prediction performance of the learning algorithm.
Select m training samples T = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, wherein y_i ∈ {-1, +1} denotes negative and positive samples: if x_i shows no manifestation of fraud, (x_i, y_i) = (x_i, -1), and otherwise (x_i, y_i) = (x_i, +1). Given a weak-classifier algorithm f(x) and the number T of weak-classifier iterations, the training outputs the classifier H(x). The specific steps are as follows:
s31, initializing the sample weights:
The weight distribution of the training samples is initialized as

D_1 = (w_{1,1}, w_{1,2}, …, w_{1,m}), w_{1,i} = 1/m, i = 1, 2, …, m,

wherein m represents the number of samples, D_1 represents the weight of each sample in the first iteration, D_t represents the weight distribution of the training data before the start of the t-th iteration, and w_{t,i} represents the weight of the i-th sample at the t-th iteration.
S32, iterative training
For t = 1, 2, …, T, iterative training is performed on the data samples to obtain a weak classifier h_t(x). Under the current distribution D_t, the weak classifier is called to obtain the classification rule of the t-th round:

h_t(x): X → {-1, +1}.
s33, weight normalization
The weight w_{t,i} of the i-th sample at the t-th iteration is normalized:

w_{t,i} ← w_{t,i} / Σ_{j=1}^{m} w_{t,j}

wherein w_{t,i} is the weight of sample (x_i, y_i) in the t-th iteration.
S34, calculating the classification error rate of each weak classifier
The weak classifier h_t(x) is trained with the sample distribution of the t-th round, and its classification error rate e_t, the sum of the weights of all misclassified samples, represents the training error rate of the t-th weak classifier:

e_t = Σ_{i=1}^{m} w_{t,i} · I(h_t(x_i) ≠ y_i)

wherein w_{t,i} is the weight of sample (x_i, y_i) in the t-th iteration; I(h_t(x_i) ≠ y_i) is the indicator of a misclassified sample used in calculating the error rate e_t; and y_i is the true label of the i-th sample;
s35, calculating the coefficient a_t of h_t(x)
The weight a_t of each weak classifier measures the importance of that classifier, and its calculation expression is as follows:

a_t = (1/2) · ln((1 - e_t) / e_t)

wherein e_t represents the training error rate of the t-th weak classifier.
S36, updating the sample weights and increasing the weights of misclassified samples
The weight distribution D_{t+1} of the training data set is updated from the previous round's weights, with the updating expressions as follows:

D_{t+1} = (w_{t+1,1}, w_{t+1,2}, …, w_{t+1,m})

w_{t+1,i} = (w_{t,i} / Z_t) · exp(-a_t · y_i · h_t(x_i))

wherein Z_t is a normalization factor that makes the weight distribution D_{t+1} a probability distribution:

Z_t = Σ_{i=1}^{m} w_{t,i} · exp(-a_t · y_i · h_t(x_i))

wherein w_{t,i} is the weight of the i-th sample in the t-th iteration; a_t is the weight of the t-th weak classifier; and h_t(x_i) is the prediction of the weak classifier for sample x_i.
S37, constructing a linear combination of the basic classifiers
After the iterations are completed, the weak classifiers are combined into f(x):

f(x) = Σ_{t=1}^{T} a_t · h_t(x)

A mathematical sign function is then applied to obtain the final strong classifier H(x):

H(x) = sign(f(x)) = sign(Σ_{t=1}^{T} a_t · h_t(x))

wherein the sign function is defined as:

sign(x) = +1 if x > 0, and -1 if x < 0.
Experimental results show that AdaBoost achieves a high detection rate and a low generalization error rate, requires little parameter adjustment, and is not prone to overfitting.
5. The method for detecting the fraud application of the Internet financial client based on AdaBoost according to claim 1, wherein in S4, because each weak-classifier coefficient, once determined in an iteration, cannot be changed later, redundant or useless weak classifiers may retain large weights; the weights of the AdaBoost weak classifiers are therefore optimized by the particle swarm algorithm, so that weak classifiers with high accuracy obtain large weights and useless or redundant weak classifiers obtain small weights, thereby further improving the accuracy and interpretability of AdaBoost.
S41, initializing particle swarm parameters and encoding AdaBoost
The weak-classifier weights of AdaBoost that need optimization are encoded as the position vector of each particle; the initial position parameters of each particle are random numbers in [0, 1]. The velocity matrix and position matrix of the particles are then generated. The number of particles is determined by the specific data scale and the training-set data scale, and is generally between 20 and 40;
s42, setting the fitness function
The error rate e_t of AdaBoost is taken as the fitness function of each particle in the swarm, with the expression:

fit(i) = e_t = Σ_{i=1}^{m} w_{t,i} · I(h_t(x_i) ≠ y_i)

wherein m represents the number of samples; e_t denotes the error rate, i.e. the fitness value, of the particle; w_{t,i} represents the weight of the i-th sample at the t-th iteration; and y_i represents the true class of the i-th sample.
S43, position and velocity update
The particle fitness value is calculated in each iteration according to the fitness function fit; the fitness values are compared to determine the individual extremum of each particle and the global optimal extremum, which are used to update each particle's best position and velocity.
Assuming that M particles constitute a particle swarm in a D-dimensional search space, during each iteration a particle updates its own velocity V_id and position X_id through the individual extremum and the global extremum, with the update formulas:

V_id^{k+1} = w · V_id^k + c_1 · r_1 · (p_id^k - X_id^k) + c_2 · r_2 · (p_gd^k - X_id^k)

X_id^{k+1} = X_id^k + V_id^{k+1}

wherein w is the inertia weight (w balances the local and global search capability of the PSO; generally 0.5); c_1, c_2 are acceleration factors (usually taken as 2, typically between 0 and 4); r_1, r_2 are random numbers uniformly distributed in [0, 1]; d = 1, 2, …, D is the data dimension; i = 1, 2, …, M is the particle index; k is the iteration number; p_id^k represents the individual extremum of the current particle, and p_gd^k represents the global extremum of the swarm.
S44, iterative optimization
The result is analyzed after each iteration: the current fitness value fit(i) obtained in the iteration is compared with the individual extremum of the current particle. If fit(i) is smaller than the individual extremum, the individual extremum of the current particle is updated with fit(i); if fit(i) is also smaller than the global extremum, the global extremum of the swarm is updated with fit(i). At the same time, the update formulas for velocity V_id and position X_id update the velocity and position of the current particle.
S45, finding the global optimal solution
All individuals in each generation of the particle swarm are evaluated, and iteration continues until the fitness value is smaller than the set value or the maximum iteration number k_max is reached. The minimum individual fitness fitness_min is then found, yielding the corresponding individual optimal solution g_best and the global optimal solution p_best of the particle swarm.
S46, decoding the global optimal solution
The global optimal solution p_best of the particle swarm, i.e. the optimal individual obtained, is decoded and assigned to the weak-classifier weights of AdaBoost to obtain the AdaBoost detection model, and the training set is input for learning and training.
6. The method for detecting internet financial client application fraud based on AdaBoost of claim 1, wherein in S5, the actual and predicted results on the test samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: true positive rate TPR (True Positive Rate), false positive rate FPR (False Positive Rate), AUC (Area Under Curve) and KS (Kolmogorov-Smirnov):

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

KS = max(TPR - FPR)

wherein a true positive (TP) means the model correctly predicts a positive-class sample as the positive class; a true negative (TN) means the model correctly predicts a negative-class sample as the negative class; a false positive (FP) means the model incorrectly predicts a negative-class sample as the positive class; and a false negative (FN) means the model incorrectly predicts a positive-class sample as the negative class. In this application, fraud samples are taken as the positive class and normal samples as the negative class.
Plotting with TPR as the vertical axis and FPR as the horizontal axis yields the ROC curve (Receiver Operating Characteristic curve). The AUC value (Area Under the ROC Curve) obtained from the ROC curve is taken as the evaluation standard for measuring model accuracy: the closer the AUC value is to 1, the better the model.
The KS value is the maximum value of the difference between TPR and FPR and reflects the optimal distinguishing power of the model; the threshold at this point is generally taken as the optimal threshold for separating good and bad users. Generally, a KS greater than 0.2 indicates that the model has good prediction accuracy.
7. The method for detecting the application fraud of the Internet financial client based on AdaBoost according to claim 1, wherein in S6, the optimized AdaBoost fraud detection model is deployed to the application platform; data of clients applying in real time are acquired and imported into the prediction model as samples to be detected to output whether the application is fraudulent, realizing real-time approval of applying clients; and performance data are periodically input into model training to realize online updating of the model.
CN202011536761.0A 2020-12-23 2020-12-23 Internet financial client application fraud detection method based on AdaBoost Pending CN112581265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011536761.0A CN112581265A (en) 2020-12-23 2020-12-23 Internet financial client application fraud detection method based on AdaBoost

Publications (1)

Publication Number Publication Date
CN112581265A true CN112581265A (en) 2021-03-30

Family

ID=75139392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011536761.0A Pending CN112581265A (en) 2020-12-23 2020-12-23 Internet financial client application fraud detection method based on AdaBoost

Country Status (1)

Country Link
CN (1) CN112581265A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436010A (en) * 2021-08-10 2021-09-24 四川新网银行股份有限公司 System and method for identifying public deposit payment loan fraud in real time
CN115577287A (en) * 2022-09-30 2023-01-06 湖南工程学院 Data processing method, apparatus and computer-readable storage medium
CN115618238A (en) * 2022-12-14 2023-01-17 湖南工商大学 Credit card fraud detection method based on parameter offset correction integrated learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309953A (en) * 2013-05-24 2013-09-18 合肥工业大学 Method for labeling and searching for diversified pictures based on integration of multiple RBFNN classifiers
CN108229581A (en) * 2018-01-31 2018-06-29 西安工程大学 Based on the Diagnosis Method of Transformer Faults for improving more classification AdaBoost
CN109447158A (en) * 2018-10-31 2019-03-08 中国石油大学(华东) A kind of Adaboost Favorable Reservoir development area prediction technique based on unbalanced data
CN112053223A (en) * 2020-08-14 2020-12-08 百维金科(上海)信息科技有限公司 Internet financial fraud behavior detection method based on GA-SVM algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KEWEN LI et al.: "Improved PSO_AdaBoost Ensemble Algorithm for Imbalanced Data", Sensors *


Similar Documents

Publication Publication Date Title
US11257041B2 (en) Detecting disability and ensuring fairness in automated scoring of video interviews
CN112581265A (en) Internet financial client application fraud detection method based on AdaBoost
CN112581263A (en) Credit evaluation method for optimizing generalized regression neural network based on wolf algorithm
CN112037012A (en) Internet financial credit evaluation method based on PSO-BP neural network
CN111932269B (en) Equipment information processing method and device
CN109389494B (en) Loan fraud detection model training method, loan fraud detection method and device
CN111444951B (en) Sample recognition model generation method, device, computer equipment and storage medium
CN112215702A (en) Credit risk assessment method, mobile terminal and computer storage medium
CN113240155A (en) Method and device for predicting carbon emission and terminal
CN109840413B (en) Phishing website detection method and device
US20220188644A1 (en) Latent-space misalignment measure of responsible ai for machine learning models
CN112700324A (en) User loan default prediction method based on combination of Catboost and restricted Boltzmann machine
CN111833175A (en) Internet financial platform application fraud behavior detection method based on KNN algorithm
CN115376518B (en) Voiceprint recognition method, system, equipment and medium for real-time noise big data
CN115412301A (en) Network security prediction analysis method and system
CN113239638A (en) Overdue risk prediction method for optimizing multi-core support vector machine based on dragonfly algorithm
CN112581264A (en) Grasshopper algorithm-based credit risk prediction method for optimizing MLP neural network
CN112733995A (en) Method for training neural network, behavior detection method and behavior detection device
CN112464281A (en) Network information analysis method based on privacy grouping and emotion recognition
Smith et al. Making generative classifiers robust to selection bias
CN114912027A (en) Learning scheme recommendation method and system based on learning outcome prediction
CN114049205A (en) Abnormal transaction identification method and device, computer equipment and storage medium
CN111833173A (en) LSTM-based third-party platform payment fraud online detection method
IMBALANCE Ensemble Adaboost In Classification And Regression Trees To Overcome Class Imbalance In Credit Status Of Bank Customers
CN112348656A (en) BA-WNN-based personal loan credit scoring method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210330