CN112581262A - Whale algorithm-based fraud detection method for optimizing LVQ neural network - Google Patents


Info

Publication number
CN112581262A
CN112581262A (application CN202011536682.XA)
Authority
CN
China
Prior art keywords
whale
neural network
data
lvq
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011536682.XA
Other languages
Chinese (zh)
Inventor
Jiang Yuanqiang (江远强)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baiweijinke Shanghai Information Technology Co ltd
Original Assignee
Baiweijinke Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baiweijinke Shanghai Information Technology Co ltd filed Critical Baiweijinke Shanghai Information Technology Co ltd
Priority to CN202011536682.XA priority Critical patent/CN112581262A/en
Publication of CN112581262A publication Critical patent/CN112581262A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/03: Credit; Loans; Processing thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Technology Law (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of risk control in the internet finance industry, in particular to a whale algorithm-based fraud detection method that optimizes an LVQ neural network and comprises six steps. Compared with neural networks such as BP, RBF and SOM, the LVQ neural network has a simple structure, a short training time, and stronger nonlinear classification capability. Compared with optimization algorithms such as the genetic algorithm and the particle swarm algorithm, the whale algorithm has simple parameter settings, strong function-optimization and global-optimization capability, and good convergence stability. In this invention, the initial weights of the LVQ neural network are optimized by the whale algorithm to improve the network's global fitting capability, learning rate and prediction accuracy, which can meet the requirement of real-time detection of internet financial fraud.

Description

Whale algorithm-based fraud detection method for optimizing LVQ neural network
Technical Field
The invention relates to the technical field of wind control in the Internet financial industry, in particular to a whale algorithm-based method for detecting fraud by optimizing an LVQ neural network.
Background
In recent years, with the continued economic progress and development in China, more and more people have begun to use credit for consumption, raising the demand for credit evaluation. Machine learning algorithms such as logistic regression, decision trees and support vector machines have been successfully applied to credit evaluation, but these algorithms perform only moderately at identifying financial fraud. With the development of artificial intelligence, neural networks have come to play an important role in internet financial fraud identification. Networks such as the error back propagation (BP), radial basis function (RBF) and self-organizing map (SOM) networks have become important research directions in internet financial credit assessment. However, BP and RBF neural networks suffer from slow learning, a tendency to fall into local minima, and low prediction precision, while SOM adopts unsupervised learning rules and lacks class information; hence the growing demand for a fraud detection method based on a whale-algorithm-optimized LVQ neural network.
The Learning Vector Quantization (LVQ) neural network is a feed-forward neural network that trains its competition layer with a supervised learning method and evolved from the self-organizing feature map (SOM) neural network. As an extension of the SOM, the LVQ neural network combines the competitive-learning idea with a supervised learning algorithm, overcoming the SOM's lack of class information. It does not require the input to be normalized or orthogonalized; it only needs to compute the distance between the input vector and the competition layer directly, thereby realizing pattern discrimination, and under the action of the competition layer it effectively avoids the strong limitation of linear networks that the data be linearly separable. Compared with neural networks such as BP, RBF and SOM, the LVQ neural network has a shorter training time, stronger classification capability and higher prediction precision.
The LVQ neural network also has drawbacks: only the winning node is updated at each step, which wastes the information between the input samples and the other competing nodes, and the network is very sensitive to its initial values at the start of learning. Overly large initial weights leave some nodes underused as "dead" nodes, so the input vectors cannot be well clustered by the competition layer, which affects the network's convergence speed and classification accuracy.
In the prior art, swarm intelligence algorithms such as the genetic algorithm and the particle swarm algorithm are used to optimize the initial weights of the LVQ neural network, but in practice the genetic algorithm has a long running time and poor local search capability, while the particle swarm algorithm's population position-update mechanism gives it poor global search capability in later iterations, reducing the neural network's prediction precision. To address these problems, the initial weights of the LVQ neural network are selected by a novel optimization algorithm to improve its global fitting capability, learning rate and prediction accuracy, and a fraud detection method based on a whale-algorithm-optimized LVQ neural network is provided.
Disclosure of Invention
The invention aims to provide a whale algorithm-based fraud detection method for optimizing an LVQ neural network, so as to solve the problems raised in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
a whale algorithm-based fraud detection method for optimizing an LVQ neural network comprises the following six steps:
s1, collecting a certain proportion of normal and fraudulent customers as modeling samples, collecting customer account registration personal basic information of the modeling samples, and obtaining operation behavior buried point data from monitoring software as credit data;
s2, removing abnormal data in the credit data by using the Pauta (3σ) criterion, and then dividing the samples into a training set and a test set;
s3, constructing an LVQ neural network, determining the network topology and initializing the network parameters, screening the most representative credit evaluation indexes as the input of the LVQ model through a logistic regression algorithm, and taking the client's fraud performance as the output of the LVQ model;
s4, initializing whale algorithm parameters, optimizing the weights of the LVQ neural network through a whale algorithm to obtain optimal initial weights, endowing the optimal initial weights to the LVQ neural network, and inputting training set samples for learning;
s5, inputting a test set sample to the trained LVQ neural network for prediction, and comparing model precision evaluation indexes with LVQ neural network prediction models optimized by genetic algorithm and particle swarm optimization;
s6, deploying the optimized LVQ neural network fraud detection model to an application platform, acquiring data of a real-time application client, importing the data serving as a sample to be detected into the detection model, outputting a fraud prediction result, realizing real-time detection of the application client, inputting data of a new fraud client into model training at regular intervals, and realizing iterative updating of the model.
Preferably, in S1, normal and fraudulent clients in a certain proportion and quantity are selected as modeling samples from the back end of the internet financial platform according to post-loan performance, and the personal basic information supplied at account registration and application, together with the operation-behavior buried-point (event-tracking) data, are acquired from the monitoring software. The personal application information of a user comprises: mobile phone number, education, marital status, employer, address and contact information, plus the personal basic information, credit transaction information, public information and special-record data obtained from the credit report. The buried-point data comprise device behavior data and log data collected at each tracking point, where the device behavior data include: number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, geographic-information application frequency, IP application frequency, device battery percentage and average gyroscope acceleration; and the log data include: logins within 7 days, time from first click to credit application, maximum number of sessions within one day, behavior statistics for the week before the credit application, and the like. In addition, under compliance requirements, the method is not limited to obtaining full-domain multi-dimensional big data including mobile internet behavior data, in-app behavior data of the loan APP, credit history and operator data.
Preferably, in S2, since the LVQ neural network does not need the input to be normalized or orthogonalized but does need abnormal data to be processed to reduce noise, this patent uses the Pauta (3σ) criterion to eliminate abnormal data, as follows:
S21, for a collected data set $X = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{iN})$, calculate the arithmetic mean and the residual of each point:

$$\bar{x}_i = \frac{1}{N}\sum_{j=1}^{N} x_{ij}, \qquad v_{ij} = x_{ij} - \bar{x}_i$$

S22, obtain the root mean square deviation by Bessel's formula:

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} v_{ij}^2}$$

where $N$ is the number of samples in the data set, $\bar{x}_i$ is the arithmetic mean, $v_{ij}$ is the residual, and $\sigma$ is the root mean square deviation.

S23, judge whether each point satisfies

$$|v_{ij}| \le 3\sigma$$

If it does, $x_{ij}$ is normal data and is retained; otherwise $x_{ij}$ is deleted as an outlier.
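The S21–S23 screening can be sketched as follows; this is a minimal illustration of the Pauta (3σ) criterion, and the sample values are hypothetical:

```python
import math

def remove_outliers_3sigma(data):
    """Keep x_ij when |v_ij| <= 3*sigma, where v_ij is the residual from
    the arithmetic mean and sigma is the Bessel-corrected RMS deviation."""
    n = len(data)
    mean = sum(data) / n
    residuals = [x - mean for x in data]
    sigma = math.sqrt(sum(v * v for v in residuals) / (n - 1))  # Bessel: N - 1
    return [x for x, v in zip(data, residuals) if abs(v) <= 3 * sigma]

# 100.0 is a gross outlier among twenty readings of 10.0
clean = remove_outliers_3sigma([10.0] * 20 + [100.0])
```

Note that with very small samples the 3σ bound can never be exceeded, so the criterion is only meaningful for data sets of more than roughly ten points.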
Preferably, in S3, the Learning Vector Quantization (LVQ) neural network is a forward neural network with a training competition layer evolved from a Self-organizing feature mapping (SOM) neural network.
The LVQ network consists of three layers, namely an input layer, a competition layer and an output layer, wherein one connection of the nodes of the input layer and the nodes of the competition layer corresponds to one connection weight; the competition layer is mainly responsible for finishing the classification of input vectors in nodes of the input layer, and each node corresponds to a reference vector formed by all connection weights corresponding to the corresponding node; and the nodes in the competition layer are connected with the nodes in the output layer in a one-to-one correspondence manner, and the output layer is used for outputting the detection result outwards.
The LVQ network algorithm can be divided into an LVQ1 algorithm and an LVQ2 algorithm, the LVQ1 algorithm is adopted in the patent, and the basic idea is as follows: if the category of the input vector is consistent with the category corresponding to the linear output layer node, the corresponding competition layer node weight moves along the direction of the input vector; otherwise, the corresponding weight of the node of the competition layer moves along the reverse direction of the input vector. The LVQ1 algorithm mainly comprises the following steps:
S31, create and initialize the LVQ neural network; the number of competition-layer nodes can be selected according to the empirical formula:

$$s = \sqrt{n + m} + a$$

where s is the number of competition-layer nodes, n is the number of input-layer nodes, m is the number of output-layer nodes, and a is a constant between 0 and 10.
S32, select an input vector $X = (x_1, x_2, \ldots, x_j, \ldots, x_n)^T$, where n is the number of input nodes, with desired output $D = (d_1, d_2, \ldots, d_m)^T$, where m is the number of output-layer nodes; initialize the connection weights $W_{ij}$ between the input layer and the competition layer and the learning rate $\eta$ ($\eta > 0$). Send the input vector X into the input layer and compute the distance $d_i$ between the i-th competition-layer node and the input vector:

$$d_i = \sqrt{\sum_{j=1}^{n}\left(x_j - W_{ij}\right)^2}, \qquad i = 1, 2, \ldots, s$$

where $d_i$ is the distance between the i-th competition-layer node and the input vector, s is the number of competition-layer nodes, $x_j$ is the j-th input component, and $W_{ij}$ is the connection weight between the input layer and the competition layer.
S33, find the competition-layer node whose weight vector is at the minimum distance from the input vector; this is the winning (excitation) node i, its distance is recorded as $d_i$, and the class label of the linear output-layer node connected to it is recorded as $C_i$.
S34, update the connection weights. Judge whether the LVQ classification is correct by whether the network's output class agrees with the expected class, and correct the weight of the winning node with the corresponding rule. Record the class label of the input vector as $C_x$. If the class $C_i$ of winning node i coincides with the input vector's class $C_x$, i.e. $C_i = C_x$, the network output class matches the target class and the weight is corrected toward the input vector:

$$W_{ij\text{-}new} = W_{ij\text{-}old} + \eta\,(x - W_{ij\text{-}old})$$

otherwise the weight is corrected away from the input vector:

$$W_{ij\text{-}new} = W_{ij\text{-}old} - \eta\,(x - W_{ij\text{-}old})$$

where $\eta$ is the learning rate, and $W_{ij\text{-}old}$, $W_{ij\text{-}new}$ are the weights connecting node i and node j before and after the adjustment.
S35, update the learning rate:

$$\eta(t) = \eta_0\left(1 - \frac{t}{t_{max}}\right)$$

and train iteratively: while $t < t_{max}$, set t = t + 1 and return to step S32 to input the next sample; training ends when the LVQ neural network reaches the set maximum number of iterations or meets the error-precision requirement.
S36, LVQ output.
When sample data are fed to the LVQ input, the competition-layer nodes produce a winning node through the winner-takes-all competition rule; the winner outputs 1 and the losers output 0. The output node connected to the group containing the winning node outputs 1 and the others output 0, thereby classifying the samples. However, the convergence speed and prediction accuracy of the LVQ neural network are affected by the initial weights.
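The S31–S36 procedure can be sketched as a minimal LVQ1 loop; the toy data, prototype positions and learning-rate value below are illustrative assumptions, not the patent's actual network:

```python
import math

def lvq1_train(X, labels, prototypes, proto_labels, eta0=0.3, epochs=20):
    """LVQ1: move the winning prototype toward a same-class sample and
    away from a different-class sample; eta decays linearly over time."""
    protos = [list(p) for p in prototypes]
    t, t_max = 0, epochs * len(X)
    for _ in range(epochs):
        for x, cx in zip(X, labels):
            dists = [math.dist(x, w) for w in protos]   # competition layer
            i = dists.index(min(dists))                 # winning node
            eta = eta0 * (1 - t / t_max)                # decaying learning rate
            sign = 1 if proto_labels[i] == cx else -1   # C_i == C_x ?
            protos[i] = [w + sign * eta * (xj - w) for w, xj in zip(protos[i], x)]
            t += 1
    return protos

def lvq1_predict(x, protos, proto_labels):
    dists = [math.dist(x, w) for w in protos]
    return proto_labels[dists.index(min(dists))]

# toy two-class data: cluster near (0,0) = normal, near (1,1) = fraud
X = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (1.0, 1.0), (1.2, 1.0), (1.0, 1.2)]
y = [0, 0, 0, 1, 1, 1]
protos = lvq1_train(X, y, [(0.1, 0.1), (0.9, 0.9)], [0, 1])
```

After training, `lvq1_predict` labels a new point by the class of its nearest prototype, which is exactly the winner-takes-all rule of S36.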
Preferably, in S4, a whale algorithm is used to optimize the initial weights of the LVQ neural network. The Whale Optimization Algorithm (WOA) is a recent swarm-intelligence optimization algorithm that simulates the bubble-net hunting behavior of humpback whales: it finds the optimal solution by continually approaching the prey through the spiral bubble-net strategy, shrinking encirclement, spiral position updating and a random hunting mechanism. The whale algorithm optimizes the initial weights of the LVQ neural network as follows:
S41, initialize the artificial whale population, each individual encoding a candidate weight vector W.
Set the population size $W_{Size}$ and the maximum number of iterations $T_{max}$. With the current iteration t = 0, randomly generate $W_{Size}$ candidate solutions to form the population $W_P = \{X_1, X_2, \ldots, X_i, \ldots, X_{W_{Size}}\}$, where the i-th candidate $X_i$ stores one set of LVQ initial weights $X_i = [W_1, W_2, \ldots, W_j, \ldots, W_n]$ (its dimension n equals the number of LVQ neural network weights); this gives each whale its initial position $x_0$.
S42, encircling the prey.
When a humpback whale identifies prey, the prey's position is not known a priori, so the target prey position is taken to be the position of the best (or near-best) whale individual in the current population, and the other individual whales move toward it. The position update can be described as:

$$X(t+1) = X^*(t) - A \cdot D$$

$$D = |C \cdot X^*(t) - X(t)|$$

where D is the distance coefficient between the current whale and the best whale; $X^*(t)$ is the position vector of the best whale individual in the current population (the current local-optimal solution), updated in real time; X(t) is the position vector of the current whale; and t is the current iteration number. A and C are coefficient variables: A is a random parameter in the interval [-2, 2] that determines the switch between wandering foraging and shrinking encirclement, and C is a random number in the interval [0, 2] that controls the influence of the distance between a randomly selected whale position vector $X_{rand}$ and the current position X(t). The coefficient variables A and C are expressed as follows:
$$A = 2a \cdot r_1 - a, \qquad C = 2 \cdot r_2$$

where $r_1$ and $r_2$ are random numbers in [0, 1], and a is the convergence factor (control parameter):

$$a = 2\left(1 - \frac{t}{T_{max}}\right)$$

where t is the current iteration number and $T_{max}$ is the maximum number of iterations.
The convergence factor a decreases linearly from 2 to 0, continuously narrowing the search range during training; through its effect on the coefficient A this realizes the shrinking encirclement of the prey and improves the convergence speed of the whale algorithm.
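The coefficient definitions above translate directly to code; a minimal sketch, where $r_1$ and $r_2$ are fresh uniform draws on each update:

```python
import random

def woa_coefficients(t, t_max, rng):
    """a decays linearly from 2 to 0; A is uniform in [-a, a];
    C is uniform in [0, 2]."""
    a = 2 * (1 - t / t_max)
    A = 2 * a * rng.random() - a   # A = 2*a*r1 - a
    C = 2 * rng.random()           # C = 2*r2
    return a, A, C

rng = random.Random(0)
a0, A0, C0 = woa_coefficients(0, 100, rng)     # start of the run: a = 2
aT, AT, CT = woa_coefficients(100, 100, rng)   # end of the run: a = 0
```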
S43, bubble-net attack.
The humpback whale's bubble-net foraging attack comprises two behaviors: shrinking encirclement, realized by decreasing the convergence factor a, and spiral position updating, in which the whale makes bubbles and moves upward along a spiral from below, gradually tightening the circle around the prey.
When a humpback whale attacks prey through the bubble net, shrinking encirclement and spiral ascent happen simultaneously. To simulate this, the position of the best whale among the n whales is taken as the position vector $X^*(t)$, and each other whale updates its position by either the shrinking-encirclement mechanism or the spiral mechanism, chosen at random with probability p; this ensures both global search and local exploitation during optimization and reduces blind spots.
This patent selects between the shrinking-encirclement mechanism and the spiral position-update mechanism with equal probability p = 50%. The mathematical model can be expressed as:

$$X(t+1) = \begin{cases} X^*(t) - A \cdot D, & p < 0.5 \\ D_P \cdot e^{bl} \cdot \cos(2\pi l) + X^*(t), & p \ge 0.5 \end{cases}$$

where p is a random number in [0, 1]; $D_P = |X^*(t) - X(t)|$ is the distance between the current whale and the prey (the best position so far); $X^*(t)$ is the position vector of the best whale individual in the current population and X(t) that of the current whale; t is the current iteration number; b is a constant defining the shape of the logarithmic spiral; and l is a random number in [-1, 1]: when l = -1 the artificial whale is closest to the food, and when l = 1 it is farthest from the food.
S44, searching for prey.
Besides attacking prey with the bubble net, humpback whales also search for prey at random. The search mode depends on the coefficient A, whose fluctuation range shrinks with a: as a decreases from 2 to 0 over the iterations, A is a random value in [-a, a]. When A lies in [-1, 1], i.e. |A| < 1, the position found by the whale group is taken as the target prey's position, and the group closes in on the target prey to attack it.
When A > 1 or A < -1, i.e. |A| > 1, the whale group performs a mobile search away from the prey to find more suitable prey; this enhances the exploration ability of the algorithm and lets WOA carry out a global search. The mathematical model is as follows:
$$D = |C \cdot X_{rand} - X(t)|$$

$$X(t+1) = X_{rand} - A \cdot D$$

where $X_{rand}$ is the position vector of a randomly selected whale.
s45 setting fitness function
As the model training pursues the minimization of the mean square error, the mean square error of the LVQ neural network is taken as a fitness function of the whale algorithm, and the fitness function is expressed as the following formula:
Figure BDA0002853274880000081
wherein, yi,y'iRespectively an actual value and a predicted value of the ith sample; n is the number of training sample sets.
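A direct transcription of this fitness function; in the patent's setting the predictions would come from the LVQ network being evaluated:

```python
def mse_fitness(y_true, y_pred):
    """Mean squared error over the training set; lower fitness is better."""
    n = len(y_true)
    return sum((yi - ypi) ** 2 for yi, ypi in zip(y_true, y_pred)) / n
```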
S46, iterative updating.
Train the LVQ neural network with the training samples and compute the fitness value of each individual in the whale population from the training error. Compare the fitness of every whale at its updated position X(t+1) with that of the current best whale $X^*(t)$, select the individual with the best fitness as the next generation's best whale, and update the best position. Assign the j-th component of the best whale to the j-th initial weight of the LVQ neural network:

$$\omega_j = X^*_j(t), \qquad j = 1, 2, \ldots, n$$

where n is the number of LVQ weights. Record the best whale position vector at the final iteration, assign the corresponding initial weights to the LVQ neural network, and feed the training-set samples into the LVQ neural network for learning and training.
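Putting S41–S46 together, a compact WOA loop can be sketched as follows. This is an illustration under assumed settings (b = 1 for the spiral, a 50/50 split between encirclement and spiral, and a sphere function standing in for the LVQ training error; none of these specifics are fixed by the patent beyond p = 50%):

```python
import math
import random

def woa_minimize(fitness, dim, pop_size=20, t_max=100, seed=1):
    """Minimal Whale Optimization Algorithm: evolve a population of
    candidate vectors (e.g. LVQ initial weights) and return the best."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for t in range(t_max):
        a = 2 * (1 - t / t_max)                    # convergence factor 2 -> 0
        for i, x in enumerate(pop):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best; |A| >= 1: search near a random whale
                ref = best if abs(A) < 1 else rng.choice(pop)
                pop[i] = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                l = rng.uniform(-1, 1)             # spiral toward the best, b = 1
                pop[i] = [abs(best[j] - x[j]) * math.exp(l) * math.cos(2 * math.pi * l)
                          + best[j] for j in range(dim)]
        best = min(pop + [best], key=fitness)      # elitist: best never worsens
    return best

# sphere function as a stand-in for the LVQ training error
w = woa_minimize(lambda v: sum(c * c for c in v), dim=4)
```

In the patent's pipeline, the lambda would be replaced by a function that loads the candidate vector into the LVQ network, runs one training pass, and returns the S45 mean squared error.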
Preferably, in S5, the test set is input into the trained LVQ neural network to output detection results, and the prediction accuracy of the model is calculated, making it convenient to evaluate the training result of the LVQ neural network.
Preferably, in S5, the actual and predicted results of the samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: true positive rate (TPR), false positive rate (FPR), AUC (Area Under Curve) and KS (Kolmogorov-Smirnov):

$$TPR = \frac{TP}{TP + FN}$$

$$FPR = \frac{FP}{FP + TN}$$

$$KS = \max(TPR - FPR)$$

where a True Positive (TP) means the model correctly predicts a positive-class sample as positive; a True Negative (TN) means the model correctly predicts a negative-class sample as negative; a False Positive (FP) means the model incorrectly predicts a negative-class sample as positive; and a False Negative (FN) means the model incorrectly predicts a positive-class sample as negative. In this application, fraud samples are the positive class and normal samples the negative class.
Plotting TPR on the vertical axis against FPR on the horizontal axis gives the ROC (Receiver Operating Characteristic) curve; the AUC value (Area Under the ROC Curve) serves as the evaluation standard for model accuracy, and the closer the AUC is to 1, the better the model.
The KS value is the maximum difference between TPR and FPR and reflects the model's best discriminating power; the threshold at that point is generally used as the optimal cut-off between good and bad users, and a KS above 0.2 generally indicates acceptable prediction accuracy.
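For hard 0/1 predictions the confusion-matrix rates reduce to a single operating point; the sketch below computes TPR, FPR and TPR − FPR at that one threshold (the patent's KS is the maximum of this difference over all thresholds of a score-based model, so this is a simplified illustration):

```python
def fraud_metrics(y_true, y_pred):
    """TPR and FPR with fraud = positive class (1), normal = negative (0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn)   # recall on fraud samples
    fpr = fp / (fp + tn)   # share of normal samples wrongly flagged
    return tpr, fpr, tpr - fpr

tpr, fpr, ks = fraud_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                             [1, 1, 1, 0, 1, 0, 0, 0])
```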
Preferably, in S6, the optimized LVQ neural network fraud detection model is deployed to the application platform, data of the real-time application client is acquired and imported as a sample to be detected into the detection model to output a fraud prediction result, so as to implement real-time detection of the application client, and periodically input new data of the fraud client into model training to implement iterative update of the model.
Compared with the prior art, the invention has the beneficial effects that:
1. compared with neural networks such as BP, RBF and SOM, the LVQ neural network has the advantages of simple structure, short training time, stronger nonlinear classification processing capability and the like.
2. Compared with optimization algorithms such as a genetic algorithm, a particle swarm algorithm and the like, the whale algorithm has the advantages of simple parameter setting, strong function optimization capability, strong global optimization capability, good convergence stability and the like.
3. In the invention, the whale algorithm is utilized to optimize the initial weight of the LVQ neural network to improve the global fitting capability, the learning rate and the prediction precision of the LVQ neural network, and the requirement of real-time detection of internet financial fraud behaviors can be met.
Drawings
FIG. 1 is a schematic view of the overall process of the present invention.
Detailed Description
Referring to fig. 1, the present invention provides a technical solution:
a whale algorithm-based fraud detection method for optimizing an LVQ neural network comprises the following six steps:
s1, collecting a certain proportion of normal and fraudulent customers as modeling samples, collecting customer account registration personal basic information of the modeling samples, and obtaining operation behavior buried point data from monitoring software as credit data;
s2, removing abnormal data in the credit data by using the Pauta (3σ) criterion, and then dividing the samples into a training set and a test set;
s3, constructing an LVQ neural network, determining the network topology and initializing the network parameters, screening the most representative credit evaluation indexes as the input of the LVQ model through a logistic regression algorithm, and taking the client's fraud performance as the output of the LVQ model;
s4, initializing whale algorithm parameters, optimizing the weights of the LVQ neural network through a whale algorithm to obtain optimal initial weights, endowing the optimal initial weights to the LVQ neural network, and inputting training set samples for learning;
s5, inputting a test set sample to the trained LVQ neural network for prediction, and comparing model precision evaluation indexes with LVQ neural network prediction models optimized by genetic algorithm and particle swarm optimization;
s6, deploying the optimized LVQ neural network fraud detection model to an application platform, acquiring data of a real-time application client, importing the data serving as a sample to be detected into the detection model, outputting a fraud prediction result, realizing real-time detection of the application client, inputting data of a new fraud client into model training at regular intervals, and realizing iterative updating of the model.
In S1, normal and fraudulent clients in a certain proportion and quantity are selected as modeling samples from the back end of the internet financial platform according to post-loan performance, and the personal basic information supplied at account registration and application, together with the operation-behavior buried-point (event-tracking) data, are acquired from the monitoring software. The personal application information of a user comprises: mobile phone number, education, marital status, employer, address and contact information, plus the personal basic information, credit transaction information, public information and special-record data obtained from the credit report. The buried-point data comprise device behavior data and log data collected at each tracking point, where the device behavior data include: number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, geographic-information application frequency, IP application frequency, device battery percentage and average gyroscope acceleration; and the log data include: logins within 7 days, time from first click to credit application, maximum number of sessions within one day, behavior statistics for the week before the credit application, and the like. In addition, under compliance requirements, the method is not limited to obtaining full-domain multi-dimensional big data including mobile internet behavior data, in-app behavior data of the loan APP, credit history and operator data; this arrangement helps to gather user information comprehensively for subsequent prediction of the user's credit risk.
In S2, since the LVQ neural network does not require normalization or orthogonalization of the input data, but does require abnormal data to be processed to reduce noise, this patent uses the Pauta criterion (the 3σ rule) to eliminate abnormal data, as follows:
S21, for a collected data set $X_i = (x_{i1}, x_{i2}, \dots, x_{ij}, \dots, x_{iN})$, calculate its arithmetic mean and the residuals:

$$\bar{x}_i = \frac{1}{N}\sum_{j=1}^{N} x_{ij}, \qquad v_{ij} = x_{ij} - \bar{x}_i$$

S22, obtain the root mean square deviation according to Bessel's formula:

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} v_{ij}^{2}}$$

where N is the number of samples in the data set, $\bar{x}_i$ is the arithmetic mean, $v_{ij}$ is the residual, and $\sigma$ is the root mean square deviation.

S23, judge whether the data satisfy $|v_{ij}| \le 3\sigma$: if so, $x_{ij}$ is normal data and is retained; otherwise $x_{ij}$ is deleted. This setting eliminates abnormal data and reduces interference with the prediction model.
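The S21–S23 procedure above can be sketched in a few lines. The following Python sketch (function and variable names are our own, for illustration) applies the 3σ criterion with the Bessel-corrected deviation to one feature column:

```python
import numpy as np

def pauta_filter(x, k=3.0):
    """Remove outliers from a 1-D feature column via the Pauta (3-sigma) criterion.

    A value is kept when |x_ij - mean| <= k * sigma, where sigma is the
    Bessel-corrected root mean square deviation (divide by N - 1).
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()                                       # arithmetic mean
    resid = x - mean                                      # residuals v_ij
    sigma = np.sqrt((resid ** 2).sum() / (len(x) - 1))    # Bessel's formula
    return x[np.abs(resid) <= k * sigma]

data = np.append(np.full(19, 10.0), 100.0)  # 19 regular points and one outlier
clean = pauta_filter(data)                  # the 100.0 point is rejected
```

Note that with very small samples no point can exceed 3σ of its own sample mean, so the criterion is only meaningful once the column has enough observations.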
In S3, the Learning Vector Quantization (LVQ) neural network is a feedforward neural network with a trainable competition layer, evolved from the Self-Organizing feature Map (SOM) neural network.
The LVQ network consists of three layers: an input layer, a competition layer and an output layer. Each connection between an input-layer node and a competition-layer node carries one connection weight; the competition layer is responsible for classifying the input vectors, each of its nodes corresponding to a reference vector formed by that node's connection weights; the competition-layer nodes are connected one-to-one to the output-layer nodes, and the output layer outputs the detection result.
The LVQ network algorithm can be divided into an LVQ1 algorithm and an LVQ2 algorithm, the LVQ1 algorithm is adopted in the patent, and the basic idea is as follows: if the category of the input vector is consistent with the category corresponding to the linear output layer node, the corresponding competition layer node weight moves along the direction of the input vector; otherwise, the corresponding weight of the node of the competition layer moves along the reverse direction of the input vector. The LVQ1 algorithm mainly comprises the following steps:
S31, create and initialize the LVQ neural network; the number of competition-layer nodes can be selected according to the following empirical formula:

$$s = \sqrt{n + m} + a$$
wherein s is the number of nodes of the competition layer; n is the number of nodes of the input layer; m is the number of output layer nodes; a is a constant of 0 to 10.
S32, select an input vector $X = (x_1, x_2, \dots, x_j, \dots, x_n)^{T}$, where n is the number of input nodes, with desired output $D = (d_1, d_2, \dots, d_m)^{T}$, where m is the number of output-layer nodes; initialize the connection weights $W_{ij}$ between the input layer and the competition layer and the learning rate $\eta$ ($\eta > 0$). Feed the input vector X into the input layer and calculate the distance $d_i$ between the i-th competition-layer node and the input vector:

$$d_i = \sqrt{\sum_{j=1}^{n} (x_j - W_{ij})^{2}}$$

where $d_i$ is the distance between the i-th competition-layer node and the input vector, $i = 1, 2, \dots, s$, with s the number of competition-layer nodes; $x_j$ is the j-th input component and $W_{ij}$ is the connection weight between the input layer and the competition layer.
S33, find the competition-layer node whose distance to the input vector is minimal; this node i is the winning (excitation) node, its distance is denoted $d_i$, and the class label of the linear output-layer node connected to it is denoted $C_i$.

S34, update the connection weights. Whether the LVQ classification is correct is judged by whether the network output class matches the expected class, and the winning node's weights are corrected by different rules. Denote the class label corresponding to the input vector as $C_x$. If the class $C_i$ of winning node i coincides with $C_x$, i.e. $C_i = C_x$ (the network output class matches the target class), the weight is corrected toward the input vector:

$$W_{ij}^{new} = W_{ij}^{old} + \eta\,(x_j - W_{ij}^{old})$$

otherwise the weight is corrected away from the input vector:

$$W_{ij}^{new} = W_{ij}^{old} - \eta\,(x_j - W_{ij}^{old})$$

where $\eta$ is the learning rate, and $W_{ij}^{old}$, $W_{ij}^{new}$ are the connection weights between node i and node j before and after adjustment;
S35, update the learning rate with a linear decay

$$\eta(t) = \eta_0\left(1 - \frac{t}{t_{max}}\right)$$

and iterate the training: while $t < t_{max}$, set $t = t + 1$ and return to S32 to input the next sample; when the number of LVQ training iterations reaches the set maximum or the error accuracy requirement is met, training ends.
S36, LVQ output
When sample data is fed into the LVQ, the competition-layer nodes produce a winning node through the winner-takes-all competition rule: the winner outputs 1 and the losers output 0. The output node connected to the group containing the winning node outputs 1 and the others output 0, thereby classifying and identifying the sample. However, the convergence speed and prediction accuracy of the LVQ neural network are affected by the initial weights.
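Steps S31–S36 condense into a short LVQ1 training loop. The following Python sketch is illustrative only — the prototype counts, learning-rate schedule and the toy two-class data are our own choices, not the patent's:

```python
import numpy as np

def train_lvq1(X, y, protos_per_class=2, eta0=0.3, t_max=100, seed=0):
    """Minimal LVQ1: pull the winning prototype toward same-class inputs,
    push it away from different-class inputs, with a decaying learning rate."""
    rng = np.random.default_rng(seed)
    W, Wc = [], []
    for c in np.unique(y):
        # initialize each class's prototypes from random samples of that class
        idx = rng.choice(np.flatnonzero(y == c), protos_per_class, replace=False)
        W.append(X[idx]); Wc += [c] * protos_per_class
    W, Wc = np.vstack(W).astype(float), np.array(Wc)

    for t in range(t_max):
        eta = eta0 * (1 - t / t_max)                      # S35: decaying rate
        for xi, yi in zip(X, y):
            i = np.argmin(((W - xi) ** 2).sum(axis=1))    # S33: winning node
            step = eta * (xi - W[i])
            W[i] += step if Wc[i] == yi else -step        # S34: weight update
    return W, Wc

def predict_lvq(W, Wc, X):
    """Label each sample with the class of its nearest prototype."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)
    return Wc[d.argmin(axis=1)]

# toy two-class data: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, Wc = train_lvq1(X, y)
accuracy = (predict_lvq(W, Wc, X) == y).mean()
```

Here the prototypes are seeded from random training samples; the patent instead obtains the initial weights from the whale algorithm, which is exactly the quantity being optimized in S4.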
In S4, a whale algorithm is used to optimize the initial weights of the LVQ neural network. The Whale Optimization Algorithm (WOA) is a recent swarm-intelligence optimization algorithm that simulates the bubble-net hunting behavior of humpback whales; it finds the optimal solution through a hunting process that continuously approaches the prey via the spiral bubble-net strategy: shrinking encirclement, spiral position updating, and a random search mechanism. The whale algorithm optimizes the initial weights of the LVQ neural network as follows:
S41, initialize the artificial whale population, each individual of which encodes a weight vector W.
Set the population size $W_{Size}$ and the maximum number of iterations $T_{max}$; with the current iteration t = 0, randomly generate $W_{Size}$ candidate solutions forming the population $WP = \{X_1, X_2, \dots, X_i, \dots, X_{W_{Size}}\}$, where $X_i$ is the i-th candidate solution and stores one set of LVQ initial weights, $X_i = [W_1, W_2, \dots, W_j, \dots, W_n]$ (its dimension equal to the number n of LVQ weights), giving the whale its initial position $x_0$.
S42, identifying and encircling the prey.
When humpback whales identify prey, its position is not known a priori, so the position of the best (or near-best) individual whale in the current population is taken as the target prey position, and the other individual whales move toward it; the position update can be described as:

$$X(t+1) = X^{*}(t) - A \cdot D$$
$$D = |C \cdot X^{*}(t) - X(t)|$$

where D is the distance coefficient between the current whale and the best whale; $X^{*}(t)$ is the position vector of the best whale individual in the current population, i.e. the current best solution, and t is the current iteration number; $X(t)$ is the position vector of the current whale individual, updated in real time; A and C are coefficient variables: A is a random parameter in the interval $[-2, 2]$ that determines the switch between wandering foraging and encircling contraction, and C is a random number in the interval $[0, 2]$ that controls how strongly the reference position weighs against the current position $X(t)$. The coefficient variables A and C are given by:
$$A = 2a \cdot r_1 - a$$
$$C = 2 \cdot r_2$$

where $r_1$ and $r_2$ are random numbers in $[0, 1]$; a is the convergence factor (control parameter), given by:

$$a = 2\left(1 - \frac{t}{T_{max}}\right)$$

where t is the current iteration number and $T_{max}$ is the maximum number of iterations.
As the convergence factor a decreases linearly from 2 to 0, the search range shrinks steadily during training, which, through the coefficient A, realizes the encircling contraction around the prey and improves the convergence speed of the whale algorithm.
S43, bubble-net attacking.
Humpback whales attack prey by bubble-net foraging, which mainly comprises shrinking encirclement and spiral position updating: shrinking encirclement tightens the circle around the prey by decreasing the convergence factor a, while in the spiral update the whale, blowing bubbles, moves upward along a spiral path that gradually contracts the encirclement of the prey.
When a humpback whale attacks prey through the bubble net, shrinking encirclement and spiral position updating occur simultaneously. To simulate this, the best whale's position is taken as $X^{*}(t)$, and each non-optimal whale updates its position by either the shrinking-encirclement mechanism or the spiral mechanism, chosen at random with probability p; this balances global search and local exploitation during optimization and reduces blind spots.
This patent selects between the shrinking-encirclement mechanism and the spiral position-update mechanism with equal probability p = 0.5; the mathematical model can be expressed as:
$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5 \\ D_{p} \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \ge 0.5 \end{cases}$$

where p is a random number in $[0, 1]$; $D_{p} = |X^{*}(t) - X(t)|$ is the distance between the current whale and the prey (the current best position); $X^{*}(t)$ is the position vector of the best whale individual in the current population, $X(t)$ is the position vector of the current whale, and t is the current iteration number; b is a constant defining the shape of the logarithmic spiral; l is a random number in $[-1, 1]$, where l = −1 means the artificial whale is closest to the food and l = 1 means it is farthest.
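The bubble-net update above (shrinking encirclement versus logarithmic spiral, chosen with probability 0.5) maps directly to code. A minimal Python sketch of a single whale's position update, with assumed names, is:

```python
import numpy as np

def whale_update(X, X_best, a, b=1.0, rng=None):
    """One WOA position update: shrinking encirclement with probability
    p < 0.5, otherwise the logarithmic-spiral (bubble-net) move."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        A = 2 * a * rng.random(X.shape) - a        # A in [-a, a]
        C = 2 * rng.random(X.shape)                # C in [0, 2]
        D = np.abs(C * X_best - X)
        return X_best - A * D                      # shrink toward the best whale
    l = rng.uniform(-1, 1)
    D_p = np.abs(X_best - X)                       # distance to the "prey"
    return D_p * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best

new_pos = whale_update(np.zeros(4), np.ones(4), a=1.0,
                       rng=np.random.default_rng(0))
```

Because l is drawn from $[-1, 1]$, the spiral branch can land the whale either very close to the best position (l near −1) or well outside it (l near 1), which is what keeps local exploitation from collapsing too early.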
S44, searching for prey.
Besides attacking prey with the bubble net, humpback whales also search for prey randomly, driven by the coefficient A, whose fluctuation range shrinks with a: as a decreases from 2 to 0 during the iterations, A is a random value in $[-a, a]$. When $|A| < 1$, the position found by the whale group is treated as the target prey's position, and the group closes in on the target prey to attack it.
When $|A| > 1$, the whale group instead searches away from the prey, looking for more suitable prey; this enhances the exploration ability of the algorithm and enables the WOA to search globally. The mathematical model is as follows:
D=|C·Xrand-X(t)|
X(t+1)=Xrand-A·D
where $X_{rand}$ is a randomly selected whale position vector;
s45 setting fitness function
As the model training pursues the minimization of the mean square error, the mean square error of the LVQ neural network is taken as a fitness function of the whale algorithm, and the fitness function is expressed as the following formula:
$$f = \frac{1}{N}\sum_{i=1}^{N} (y_i - y_i')^{2}$$

where $y_i$ and $y_i'$ are respectively the actual and predicted values of the i-th sample, and N is the number of training samples.
S46, iterative update.
Train the LVQ neural network with the training samples and compute the fitness value of each individual in the whale population from its training error. During each position update $X(t+1)$, compare the fitness of all whale individuals with that of the current best whale $X^{*}(t)$, select the individual with the best fitness as the next generation's best whale and update its position. The j-th parameter value of the best whale is assigned to the j-th weight of the LVQ neural network:

$$\omega_j = X_{i,j}(t), \quad j = 1, 2, \dots, m$$

where m is the number of output-layer nodes.
Record the best whale position vector at the final iteration, $X^{*} = [W_1^{*}, W_2^{*}, \dots, W_n^{*}]$, assign it to the LVQ neural network as its initial weights, and input the training-set samples into the LVQ neural network for learning and training; the repeated iteration in this setting helps to improve prediction accuracy.
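Putting S41–S46 together, the loop below is an illustrative WOA sketch. For brevity the fitness here is a stand-in sphere function rather than the LVQ training MSE the patent would use; swapping in the real fitness only changes the callable passed in:

```python
import numpy as np

def woa_minimize(fitness, dim, n_whales=20, t_max=100, b=1.0, seed=0):
    """Whale Optimization Algorithm sketch: shrinking encirclement,
    spiral bubble-net move, and random exploratory search.

    In the patent's setting `fitness` would be the mean squared error of an
    LVQ network trained from the candidate initial weights; any callable
    mapping a weight vector to a scalar works.
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (n_whales, dim))
    fits = np.array([fitness(x) for x in pop])
    best = pop[fits.argmin()].copy()

    for t in range(t_max):
        a = 2 * (1 - t / t_max)                      # convergence factor: 2 -> 0
        for i in range(n_whales):
            r1, r2, p = rng.random(3)
            A, C = 2 * a * r1 - a, 2 * r2
            if p < 0.5:
                if abs(A) < 1:                       # encircle the best whale
                    D = np.abs(C * best - pop[i])
                    pop[i] = best - A * D
                else:                                # random exploratory search
                    x_rand = pop[rng.integers(n_whales)]
                    D = np.abs(C * x_rand - pop[i])
                    pop[i] = x_rand - A * D
            else:                                    # spiral bubble-net move
                l = rng.uniform(-1, 1)
                D_p = np.abs(best - pop[i])
                pop[i] = D_p * np.exp(b * l) * np.cos(2 * np.pi * l) + best
        fits = np.array([fitness(x) for x in pop])
        if fits.min() < fitness(best):               # keep the best-so-far
            best = pop[fits.argmin()].copy()
    return best

# stand-in fitness: sphere function (the patent would use the LVQ training MSE)
best = woa_minimize(lambda w: float((w ** 2).sum()), dim=5)
```

The returned `best` vector would then be copied into the LVQ network's weight matrix as its initial weights before the training of S3 begins.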
In S5, the test set is input into the trained LVQ neural network to output the application detection results, and the prediction accuracy of the model is calculated by comparison with the actual data.
In S5, the actual and predicted results on the samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: the true positive rate TPR (True Positive Rate), the false positive rate FPR (False Positive Rate), AUC (Area Under Curve) and KS (Kolmogorov–Smirnov). The calculation formulas are as follows:
$$TPR = \frac{TP}{TP + FN}$$
$$FPR = \frac{FP}{FP + TN}$$
KS=max(TPR-FPR)
wherein True Positive (TP) means the model correctly predicts a positive-class sample as positive; True Negative (TN) means the model correctly predicts a negative-class sample as negative; False Positive (FP) means the model incorrectly predicts a negative-class sample as positive; False Negative (FN) means the model incorrectly predicts a positive-class sample as negative. In this application, fraud samples are taken as the positive class and normal samples as the negative class.
Plotting TPR on the vertical axis against FPR on the horizontal axis gives the ROC (Receiver Operating Characteristic) curve; the AUC value (Area Under the ROC Curve) obtained from it serves as the evaluation standard for the accuracy of the model, and the closer the AUC value is to 1, the better the model.
The KS value is the maximum of the difference TPR − FPR and reflects the model's best discriminating power; the threshold at which it occurs is generally taken as the optimal threshold for separating good and bad users, and KS > 0.2 generally indicates that the model has acceptable prediction accuracy.
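The confusion-matrix indexes above can be sketched as follows (illustrative Python; in the patent's convention fraud is the positive class):

```python
import numpy as np

def tpr_fpr(y_true, y_score, threshold):
    """TPR = TP/(TP+FN) and FPR = FP/(FP+TN) at a given score threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

def ks_statistic(y_true, y_score):
    """KS = max over thresholds of (TPR - FPR)."""
    return max(t - f for t, f in
               (tpr_fpr(y_true, y_score, s) for s in np.unique(y_score)))

# perfectly separated scores: KS reaches its maximum of 1.0
ks = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

Sweeping the threshold over the distinct scores and collecting the (FPR, TPR) pairs also yields the points of the ROC curve from which the AUC is computed.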
In S6, the optimized LVQ neural network fraud detection model is deployed to the application platform; data of real-time loan applicants are acquired and imported into the detection model as samples to be detected, and fraud prediction results are output, realizing real-time detection of applicants. Data of newly confirmed fraud customers are periodically fed into model training, realizing iterative updating of the model.
The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. The foregoing is only a preferred embodiment of the present invention; it should be noted that, the possible concrete structures being objectively unlimited while textual expression is limited, those skilled in the art can make various modifications, refinements or changes without departing from the principle of the invention, and the technical features described above can be combined in any suitable manner; such modifications, variations, combinations or adaptations, made using the spirit and scope of the invention as defined by the claims, may also be directed to other uses and embodiments.

Claims (8)

1. A whale algorithm-based method for detecting fraud behaviors by optimizing LVQ neural network is characterized by comprising the following steps: the method comprises the following six steps:
s1, collecting a certain proportion of normal and fraudulent customers as modeling samples, collecting customer account registration personal basic information of the modeling samples, and obtaining operation behavior buried point data from monitoring software as credit data;
s2, removing abnormal data in the credit data by using a Levina criterion, and then dividing the samples into a training set and a testing set;
s3, constructing an LVQ neural network, determining the network topology and initializing the network parameters, screening the most representative credit evaluation indexes via a logistic regression algorithm as the input of the LVQ model, and taking the customer's fraud performance as the output of the LVQ model;
s4, initializing whale algorithm parameters, optimizing the weights of the LVQ neural network through a whale algorithm to obtain optimal initial weights, endowing the optimal initial weights to the LVQ neural network, and inputting training set samples for learning;
s5, inputting a test set sample to the trained LVQ neural network for prediction, and comparing model precision evaluation indexes with LVQ neural network prediction models optimized by genetic algorithm and particle swarm optimization;
s6, deploying the optimized LVQ neural network fraud detection model to an application platform, acquiring data of a real-time application client, importing the data serving as a sample to be detected into the detection model, outputting a fraud prediction result, realizing real-time detection of the application client, inputting data of a new fraud client into model training at regular intervals, and realizing iterative updating of the model.
2. The method for detecting fraud behavior based on the whale-algorithm-optimized LVQ neural network as claimed in claim 1, wherein in S1, normal and fraudulent customers are selected in a certain proportion and quantity from the back end of the internet financial platform, according to post-loan performance, as modeling samples; the personal basic information submitted when each sample customer registered and applied for an account is collected, and operation-behavior buried-point data is obtained from monitoring software; the personal application information of the user includes: mobile phone number, education background, marital status, employer, address and contact information, together with the personal basic information, credit transaction information, public information and special-record data obtained from the credit report; the buried-point data comprises device behavior data and log data collected at the buried points, wherein the device behavior data includes: the number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, application frequency per geographic location, application frequency per IP address, device battery percentage, and average gyroscope acceleration; and the log data includes: the number of logins within 7 days, the time from the first click to the credit application, the maximum number of sessions within one day, behavior statistics for the week before the credit application, and the like; in addition, under compliance requirements, the method is not limited to obtaining full-domain multi-dimensional big data including mobile internet behavior data, in-app behavior data from the loan APP, credit history, and operator data.
3. The method for detecting fraud as claimed in claim 1, wherein in S2, since the LVQ neural network does not require normalization or orthogonalization of the input data, but does require abnormal data to be processed to reduce noise, the Pauta criterion (3σ rule) is used to eliminate abnormal data, as follows:
S21, for a collected data set $X_i = (x_{i1}, x_{i2}, \dots, x_{ij}, \dots, x_{iN})$, calculate its arithmetic mean and the residuals:

$$\bar{x}_i = \frac{1}{N}\sum_{j=1}^{N} x_{ij}, \qquad v_{ij} = x_{ij} - \bar{x}_i$$

S22, obtain the root mean square deviation according to Bessel's formula:

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} v_{ij}^{2}}$$

where N is the number of samples in the data set, $\bar{x}_i$ is the arithmetic mean, $v_{ij}$ is the residual, and $\sigma$ is the root mean square deviation.

S23, judge whether the data satisfy $|v_{ij}| \le 3\sigma$: if so, $x_{ij}$ is normal data and is retained; otherwise $x_{ij}$ is deleted.
4. The method for detecting fraud behavior based on the whale-algorithm-optimized LVQ neural network as claimed in claim 1, wherein in S3, the Learning Vector Quantization (LVQ) neural network is a feedforward neural network with a trainable competition layer, evolved from the Self-Organizing feature Map (SOM) neural network.
The LVQ network consists of three layers: an input layer, a competition layer and an output layer. Each connection between an input-layer node and a competition-layer node carries one connection weight; the competition layer is responsible for classifying the input vectors, each of its nodes corresponding to a reference vector formed by that node's connection weights; the competition-layer nodes are connected one-to-one to the output-layer nodes, and the output layer outputs the detection result.
The LVQ network algorithm can be divided into the LVQ1 algorithm and the LVQ2 algorithm; the LVQ1 algorithm is adopted here, and its basic idea is: if the category of the input vector is consistent with the category of the linear output-layer node, the corresponding competition-layer node weight moves toward the input vector; otherwise it moves away from the input vector. The LVQ1 algorithm mainly comprises the following steps:
S31, create the LVQ neural network and determine the network topology according to the input and output; the number of competition-layer nodes can be selected according to the following empirical formula:

$$s = \sqrt{n + m} + a$$
wherein s is the number of nodes of the competition layer; n is the number of nodes of the input layer; m is the number of output layer nodes; a is a constant of 0 to 10.
S32, select an input vector $X = (x_1, x_2, \dots, x_j, \dots, x_n)^{T}$, where n is the number of input nodes, with desired output $D = (d_1, d_2, \dots, d_m)^{T}$, where m is the number of output-layer nodes; initialize the connection weights $W_{ij}$ between the input layer and the competition layer and the learning rate $\eta$ ($\eta > 0$). Feed the input vector X into the input layer and calculate the distance $d_i$ between the i-th competition-layer node and the input vector:

$$d_i = \sqrt{\sum_{j=1}^{n} (x_j - W_{ij})^{2}}$$

where $d_i$ is the distance between the i-th competition-layer node and the input vector, $i = 1, 2, \dots, s$, with s the number of competition-layer nodes; $x_j$ is the j-th input component and $W_{ij}$ is the connection weight between the input layer and the competition layer.
S33, find the competition-layer node whose distance to the input vector is minimal; this node i is the winning (excitation) node, its distance is denoted $d_i$, and the class label of the linear output-layer node connected to it is denoted $C_i$.

S34, update the connection weights. Whether the LVQ classification is correct is judged by whether the network output class matches the expected class, and the winning node's weights are corrected by different rules. Denote the class label corresponding to the input vector as $C_x$. If the class $C_i$ of winning node i coincides with $C_x$, i.e. $C_i = C_x$ (the network output class matches the target class), the weight is corrected toward the input vector:

$$W_{ij}^{new} = W_{ij}^{old} + \eta\,(x_j - W_{ij}^{old})$$

otherwise the weight is corrected away from the input vector:

$$W_{ij}^{new} = W_{ij}^{old} - \eta\,(x_j - W_{ij}^{old})$$

where $\eta$ is the learning rate, and $W_{ij}^{old}$, $W_{ij}^{new}$ are the connection weights between node i and node j before and after adjustment;
S35, update the learning rate with a linear decay

$$\eta(t) = \eta_0\left(1 - \frac{t}{t_{max}}\right)$$

and iterate the training: while $t < t_{max}$, set $t = t + 1$ and return to S32 to input the next sample; when the number of LVQ training iterations reaches the set maximum or the error accuracy requirement is met, training ends.
S36, LVQ output
When sample data is fed into the LVQ, the competition-layer nodes produce a winning node through the winner-takes-all competition rule: the winner outputs 1 and the losers output 0. The output node connected to the group containing the winning node outputs 1 and the others output 0, thereby classifying and identifying the sample. However, the convergence speed and prediction accuracy of the LVQ neural network are affected by the initial weights.
5. The method as claimed in claim 1, wherein in S4 a whale algorithm is used to optimize the initial weights of the LVQ neural network; the Whale Optimization Algorithm (WOA) is a recent swarm-intelligence optimization algorithm that simulates the bubble-net hunting behavior of humpback whales and finds the optimal solution through a hunting process that continuously approaches the prey via the spiral bubble-net strategy: shrinking encirclement, spiral position updating, and a random search mechanism. The whale algorithm optimizes the initial weights of the LVQ neural network as follows:
S41, initialize the artificial whale population, each individual of which encodes a weight vector W.
Set the population size $W_{Size}$ and the maximum number of iterations $T_{max}$; with the current iteration t = 0, randomly generate $W_{Size}$ candidate solutions forming the population $WP = \{X_1, X_2, \dots, X_i, \dots, X_{W_{Size}}\}$, where $X_i$ is the i-th candidate solution and stores one set of LVQ initial weights, $X_i = [W_1, W_2, \dots, W_j, \dots, W_n]$ (its dimension equal to the number n of LVQ weights), giving the whale its initial position $x_0$.
S42, identifying and encircling the prey.
When humpback whales identify prey, its position is not known a priori, so the position of the best (or near-best) individual whale in the current population is taken as the target prey position, and the other individual whales move toward it; the position update can be described as:

$$X(t+1) = X^{*}(t) - A \cdot D$$
$$D = |C \cdot X^{*}(t) - X(t)|$$

where D is the distance coefficient between the current whale and the best whale; $X^{*}(t)$ is the position vector of the best whale individual in the current population, i.e. the current best solution, and t is the current iteration number; $X(t)$ is the position vector of the current whale individual, updated in real time; A and C are coefficient variables: A is a random parameter in the interval $[-2, 2]$ that determines the switch between wandering foraging and encircling contraction, and C is a random number in the interval $[0, 2]$ that controls how strongly the reference position weighs against the current position $X(t)$. The coefficient variables A and C are given by:
$$A = 2a \cdot r_1 - a$$
$$C = 2 \cdot r_2$$

where $r_1$ and $r_2$ are random numbers in $[0, 1]$; a is the convergence factor (control parameter), given by:

$$a = 2\left(1 - \frac{t}{T_{max}}\right)$$

where t is the current iteration number and $T_{max}$ is the maximum number of iterations.
As the convergence factor a decreases linearly from 2 to 0, the search range shrinks steadily during training, which, through the coefficient A, realizes the encircling contraction around the prey and improves the convergence speed of the whale algorithm.
S43, bubble-net attacking.
Humpback whales attack prey by bubble-net foraging, which mainly comprises shrinking encirclement and spiral position updating: shrinking encirclement tightens the circle around the prey by decreasing the convergence factor a, while in the spiral update the whale, blowing bubbles, moves upward along a spiral path that gradually contracts the encirclement of the prey.
When a humpback whale attacks prey through the bubble net, shrinking encirclement and spiral position updating occur simultaneously. To simulate this, the best whale's position is taken as $X^{*}(t)$, and each non-optimal whale updates its position by either the shrinking-encirclement mechanism or the spiral mechanism, chosen at random with probability p; this balances global search and local exploitation during optimization and reduces blind spots.
This patent selects between the shrinking-encirclement mechanism and the spiral position-update mechanism with equal probability p = 0.5; the mathematical model can be expressed as:
$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5 \\ D_{p} \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \ge 0.5 \end{cases}$$

where p is a random number in $[0, 1]$; $D_{p} = |X^{*}(t) - X(t)|$ is the distance between the current whale and the prey (the current best position); $X^{*}(t)$ is the position vector of the best whale individual in the current population, $X(t)$ is the position vector of the current whale, and t is the current iteration number; b is a constant defining the shape of the logarithmic spiral; l is a random number in $[-1, 1]$, where l = −1 means the artificial whale is closest to the food and l = 1 means it is farthest.
S44, searching for prey.
Besides attacking prey with the bubble net, humpback whales also search for prey randomly, driven by the coefficient A, whose fluctuation range shrinks with a: as a decreases from 2 to 0 during the iterations, A is a random value in $[-a, a]$. When $|A| < 1$, the position found by the whale group is treated as the target prey's position, and the group closes in on the target prey to attack it.
When $|A| > 1$, the whale group instead searches away from the prey, looking for more suitable prey; this enhances the exploration ability of the algorithm and enables the WOA to search globally. The mathematical model is as follows:
D=|C·Xrand-X(t)|
X(t+1)=Xrand-A·D
where $X_{rand}$ is a randomly selected whale position vector;
s45 setting fitness function
As the model training pursues the minimization of the mean square error, the mean square error of the LVQ neural network is taken as a fitness function of the whale algorithm, and the fitness function is expressed as the following formula:
$$f = \frac{1}{N}\sum_{i=1}^{N} (y_i - y_i')^{2}$$

where $y_i$ and $y_i'$ are respectively the actual and predicted values of the i-th sample, and N is the number of training samples.
S46, iteration updating
Training an LVQ neural network by using training samples, calculating the fitness value of each individual in a whale population according to the error of each training sample, comparing fitness functions of all whale individuals and the optimal whale X (t) in a whale position updating X (t +1), selecting the whale individual with the best fitness function as the optimal whale of the next generation, updating the position of the optimal whale, and respectively endowing j parameter values of the whale to the jth weight in the LVQ neural network, wherein the fitness functions comprise the following steps:
ω_j = X_{i,j}(t), j = 1, 2, …, m
The optimal whale position vector at the final iteration is recorded as X* = (x*_1, x*_2, …, x*_m).
The components of this optimal whale position vector are assigned as the initial weights of the LVQ neural network, and the training set samples are input into the LVQ neural network for learning and training.
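The iterative scheme of S43–S46 can be sketched as one compact WOA loop, assuming the spiral constant b = 1, a linearly decreasing a, and box bounds on the weights; the best vector returned would supply the initial LVQ weights (ω_j = X*_j). The function name `woa_minimize` and all defaults are illustrative, not from the patent:

```python
import numpy as np

def woa_minimize(fitness, dim, n_whales=20, n_iter=50, bounds=(-1.0, 1.0), seed=0):
    """Compact whale optimization algorithm loop minimizing `fitness`
    (e.g. the LVQ training MSE) over `dim` weights."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_whales, dim))          # initial whale positions
    fit = np.array([fitness(x) for x in X])
    best = X[fit.argmin()].copy()
    best_fit = float(fit.min())
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)                    # a decreases from 2 to 0
        for i in range(n_whales):
            r1, r2, p = rng.random(3)
            A, C = 2 * a * r1 - a, 2 * r2
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                if abs(A) < 1:                        # encircle the best whale
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                                 # explore via a random whale
                    x_rand = X[rng.integers(n_whales)]
                    D = np.abs(C * x_rand - X[i])
                    X[i] = x_rand - A * D
            else:                                     # spiral bubble-net attack (b = 1)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)
            f = float(fitness(X[i]))
            if f < best_fit:
                best_fit, best = f, X[i].copy()
    return best, best_fit
```

Replacing the toy objective with the LVQ network's training MSE turns this into the weight-initialization step described in S46.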
6. The whale algorithm-based fraud detection method for optimizing the LVQ neural network as claimed in claim 1, wherein in S5, the test set is input into the trained LVQ neural network to output the application detection results and calculate the prediction accuracy of the model.
7. The whale algorithm-based fraud detection method for optimizing the LVQ neural network as claimed in claim 1, wherein in S5, the actual and predicted results of the training samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: true positive rate TPR (True Positive Rate), false positive rate FPR (False Positive Rate), AUC (Area Under Curve) and KS (Kolmogorov-Smirnov) value, wherein the calculation formulas are as follows:
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
KS=max(TPR-FPR)
wherein True Positive (TP) means the model correctly predicts a positive class sample as the positive class; True Negative (TN) means the model correctly predicts a negative class sample as the negative class; False Positive (FP) means the model incorrectly predicts a negative class sample as the positive class; and False Negative (FN) means the model incorrectly predicts a positive class sample as the negative class. In this application, fraud samples are taken as the positive class and normal samples as the negative class.
An ROC curve (Receiver Operating Characteristic curve) is obtained by plotting TPR on the vertical axis against FPR on the horizontal axis, and the AUC value (Area Under the ROC Curve) is used as the evaluation standard for measuring model accuracy: the closer the AUC value is to 1, the better the model.
The KS value is the maximum difference between TPR and FPR and reflects the optimal distinguishing ability of the model; the threshold at which it is attained is generally taken as the optimal threshold for separating good and bad users. In general, KS > 0.2 indicates that the model has acceptable prediction accuracy.
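The confusion-matrix rates and the KS statistic above can be computed by sweeping a score threshold; this sketch assumes fraud = label 1 and that both classes are present, and the name `tpr_fpr_ks` is an illustrative choice:

```python
import numpy as np

def tpr_fpr_ks(y_true, scores, thresholds=None):
    """TPR and FPR over score thresholds, plus KS = max(TPR - FPR).

    Fraud samples are the positive class (label 1), as in the application.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    if thresholds is None:
        thresholds = np.unique(scores)
    tprs, fprs = [], []
    for thr in thresholds:
        pred = (scores >= thr).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fn = np.sum((pred == 0) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        tn = np.sum((pred == 0) & (y_true == 0))
        tprs.append(tp / (tp + fn))      # TPR = TP / (TP + FN)
        fprs.append(fp / (fp + tn))      # FPR = FP / (FP + TN)
    ks = max(t - f for t, f in zip(tprs, fprs))
    return np.array(tprs), np.array(fprs), ks
```

A perfectly separating score list yields KS = 1; integrating the (FPR, TPR) pairs would give the AUC.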
8. The whale algorithm-based fraud detection method for optimizing the LVQ neural network as claimed in claim 1, wherein in S6, the optimized LVQ neural network fraud detection model is deployed to the application platform; data of real-time applicant clients are obtained and imported into the detection model as samples to be detected so as to output fraud prediction results, realizing real-time detection of applicant clients; and data of newly identified fraudulent clients are periodically fed back into model training to realize iterative updating of the model.
CN202011536682.XA 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network Pending CN112581262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011536682.XA CN112581262A (en) 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011536682.XA CN112581262A (en) 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network

Publications (1)

Publication Number Publication Date
CN112581262A true CN112581262A (en) 2021-03-30

Family

ID=75139391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011536682.XA Pending CN112581262A (en) 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network

Country Status (1)

Country Link
CN (1) CN112581262A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440495A (en) * 2013-07-31 2013-12-11 华北电力大学(保定) Method for automatically identifying hydrophobic grades of composite insulators
CN108830431A (en) * 2018-08-03 2018-11-16 广东工业大学 A kind of Electricity price forecasting solution and relevant apparatus based on whale optimization algorithm
CN110030843A (en) * 2019-04-12 2019-07-19 广西大学 Based on the heat accumulation type aluminum melting furnace parameter optimization setting method for improving whale optimization algorithm
CN110110930A (en) * 2019-05-08 2019-08-09 西南交通大学 A kind of Recognition with Recurrent Neural Network Short-Term Load Forecasting Method improving whale algorithm
CN110648017A (en) * 2019-08-30 2020-01-03 广东工业大学 Short-term impact load prediction method based on two-layer decomposition technology
CN111767815A (en) * 2020-06-22 2020-10-13 浙江省机电设计研究院有限公司 Tunnel water leakage identification method


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240067A (en) * 2021-05-14 2021-08-10 江苏科技大学 RBF neural network optimization method based on improved manta ray foraging optimization algorithm
CN113393331B (en) * 2021-06-10 2022-08-23 罗嗣扬 Database and algorithm based big data insurance accurate wind control, management, intelligent customer service and marketing system
CN113393331A (en) * 2021-06-10 2021-09-14 罗忠明 Database and algorithm based big data insurance accurate wind control, management, intelligent customer service and marketing system
CN113627071B (en) * 2021-06-26 2023-06-20 西安科技大学 Coal-fired boiler NO based on whale algorithm optimization long-short-term memory network X Soft measurement method
CN113627071A (en) * 2021-06-26 2021-11-09 西安科技大学 Coal-fired boiler NO based on whale algorithm optimization long-time memory networkXSoft measurement method
CN113591240B (en) * 2021-07-27 2023-09-05 重庆大学 Modeling method for thermal error model of tooth grinding machine based on bidirectional LSTM network
CN113591240A (en) * 2021-07-27 2021-11-02 重庆大学 Gear grinding machine thermal error model modeling method based on bidirectional LSTM network
CN113610535A (en) * 2021-07-29 2021-11-05 浙江惠瀜网络科技有限公司 Risk monitoring method and device suitable for consumption staging business process
CN115277067A (en) * 2022-06-15 2022-11-01 广州理工学院 Computer network information vulnerability detection method based on artificial fish swarm algorithm
CN115688982A (en) * 2022-10-11 2023-02-03 华能江苏综合能源服务有限公司 Building photovoltaic data completion method based on WGAN and whale optimization algorithm
CN115688982B (en) * 2022-10-11 2024-01-30 华能江苏综合能源服务有限公司 Building photovoltaic data complement method based on WGAN and whale optimization algorithm
CN115438592A (en) * 2022-11-08 2022-12-06 成都中科合迅科技有限公司 Industrial research and development design data modeling method based on system engineering
CN115660073A (en) * 2022-12-28 2023-01-31 民航成都物流技术有限公司 Intrusion detection method and system based on harmony whale optimization algorithm
CN115660073B (en) * 2022-12-28 2024-02-06 民航成都物流技术有限公司 Intrusion detection method and system based on harmony whale optimization algorithm
CN116594353A (en) * 2023-07-13 2023-08-15 湖北工业大学 Machine tool positioning error compensation modeling method and system based on CWP-BPNN
CN116594353B (en) * 2023-07-13 2023-11-07 湖北工业大学 Machine tool positioning error compensation modeling method and system based on CWP-BPNN
CN117426754A (en) * 2023-12-22 2024-01-23 山东锋士信息技术有限公司 PNN-LVQ-based feature weight self-adaptive pulse wave classification method
CN117426754B (en) * 2023-12-22 2024-04-19 山东锋士信息技术有限公司 PNN-LVQ-based feature weight self-adaptive pulse wave classification method

Similar Documents

Publication Publication Date Title
CN112581262A (en) Whale algorithm-based fraud detection method for optimizing LVQ neural network
CN112581263A (en) Credit evaluation method for optimizing generalized regression neural network based on wolf algorithm
CN105488528B (en) Neural network image classification method based on improving expert inquiry method
US11816183B2 (en) Methods and systems for mining minority-class data samples for training a neural network
CN103716204B (en) Abnormal intrusion detection ensemble learning method and apparatus based on Wiener process
CN110336768B (en) Situation prediction method based on combined hidden Markov model and genetic algorithm
CN112634018A (en) Overdue monitoring method for optimizing recurrent neural network based on ant colony algorithm
CN112581264A (en) Grasshopper algorithm-based credit risk prediction method for optimizing MLP neural network
CN111061959B (en) Group intelligent software task recommendation method based on developer characteristics
CN112348655A (en) Credit evaluation method based on AFSA-ELM
CN112215446A (en) Neural network-based unit dynamic fire risk assessment method
CN112529683A (en) Method and system for evaluating credit risk of customer based on CS-PNN
CN109768989A (en) Networks security situation assessment model based on LAHP-IGFNN
CN112529685A (en) Loan user credit rating method and system based on BAS-FNN
CN113903395A (en) BP neural network copy number variation detection method and system for improving particle swarm optimization
CN112634019A (en) Default probability prediction method for optimizing grey neural network based on bacterial foraging algorithm
CN113239638A (en) Overdue risk prediction method for optimizing multi-core support vector machine based on dragonfly algorithm
CN115115389A (en) Express customer loss prediction method based on value subdivision and integrated prediction
CN114117787A (en) Short-term wind power prediction method based on SSA (simple sequence analysis) optimization BP (back propagation) neural network
CN108737429B (en) Network intrusion detection method
CN116015967B (en) Industrial Internet intrusion detection method based on improved whale algorithm optimization DELM
CN117370766A (en) Satellite mission planning scheme evaluation method based on deep learning
CN116452904B (en) Image aesthetic quality determination method
CN112529684A (en) Customer credit assessment method and system based on FWA _ DBN
CN110487519B (en) Structural damage identification method based on ALO-INM and weighted trace norm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210330