CN112581262A - Whale algorithm-based fraud detection method for optimizing LVQ neural network - Google Patents


Info

Publication number
CN112581262A
CN112581262A (application CN202011536682.XA)
Authority
CN
China
Prior art keywords
whale
neural network
data
lvq
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011536682.XA
Other languages
Chinese (zh)
Inventor
Jiang Yuanqiang (江远强)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baiweijinke Shanghai Information Technology Co ltd
Original Assignee
Baiweijinke Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baiweijinke Shanghai Information Technology Co ltd filed Critical Baiweijinke Shanghai Information Technology Co ltd
Priority to CN202011536682.XA priority Critical patent/CN112581262A/en
Publication of CN112581262A publication Critical patent/CN112581262A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/03: Credit; Loans; Processing thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Technology Law (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of risk control in the internet finance industry, in particular to a whale algorithm-based fraud detection method that optimizes an LVQ neural network and comprises six steps. Compared with neural networks such as BP, RBF and SOM, the LVQ neural network has a simple structure, a short training time, and stronger nonlinear classification capability. Compared with optimization algorithms such as the genetic algorithm and the particle swarm algorithm, the whale algorithm has simple parameter settings, strong function-optimization and global-optimization capability, and good convergence stability. In this invention, the initial weights of the LVQ neural network are optimized by the whale algorithm to improve the network's global fitting capability, learning rate and prediction accuracy, which can meet the requirement of real-time detection of internet financial fraud.

Description

Whale algorithm-based fraud detection method for optimizing LVQ neural network
Technical Field
The invention relates to the technical field of wind control in the Internet financial industry, in particular to a whale algorithm-based method for detecting fraud by optimizing an LVQ neural network.
Background
In recent years, with the continued economic progress and development in China, more and more people have begun to use credit for consumption, raising the demand for credit evaluation. Machine learning algorithms such as logistic regression, decision trees and support vector machines have been successfully applied to credit evaluation, but these algorithms perform only moderately at identifying financial fraud. With the development of artificial intelligence, neural networks have come to play an important role in internet financial fraud identification. Networks such as the error back propagation (BP), radial basis function (RBF) and self-organizing map (SOM) networks have become important research directions in internet financial credit assessment. However, BP and RBF neural networks suffer from slow learning, a tendency to fall into local minima, and low prediction precision, while SOM adopts unsupervised learning rules and lacks class information; hence the growing demand for a fraud detection method based on a whale-algorithm-optimized LVQ neural network.
The Learning Vector Quantization (LVQ) neural network is a feed-forward neural network that trains its competition layer with a supervised learning method and evolved from the self-organizing feature map (SOM) neural network. As an extension of the SOM, the LVQ neural network combines the competitive-learning idea with a supervised learning algorithm, overcoming the SOM's lack of class information. It does not require the input to be normalized or orthogonalized; it only needs to compute the distance between the input vector and the competition layer directly, thereby realizing pattern discrimination, and under the action of the competition layer it effectively avoids the strong limitation of linear networks that the data be linearly separable. Compared with neural networks such as BP, RBF and SOM, the LVQ neural network has a shorter training time, stronger classification capability and higher prediction precision.
The LVQ neural network also has drawbacks: only the winning node is updated at each step, which wastes the information between the input samples and the other competing nodes, and the network is very sensitive to its initial values at the start of learning. Overly large initial weights leave some nodes underused as "dead" nodes, so the input vectors cannot be well clustered by the competition layer, which affects the network's convergence speed and classification accuracy.
In the prior art, swarm intelligence algorithms such as the genetic algorithm and the particle swarm algorithm are used to optimize the initial weights of the LVQ neural network, but in practice the genetic algorithm has a long running time and poor local search capability, while the particle swarm algorithm's population position-update mechanism gives it poor global search capability in later iterations, reducing the neural network's prediction precision. To address these problems, the initial weights of the LVQ neural network are selected by a novel optimization algorithm to improve its global fitting capability, learning rate and prediction accuracy, and a fraud detection method based on a whale-algorithm-optimized LVQ neural network is provided.
Disclosure of Invention
The invention aims to provide a whale algorithm-based fraud detection method for optimizing an LVQ neural network, so as to solve the problems raised in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
a whale algorithm-based fraud detection method for optimizing an LVQ neural network comprises the following six steps:
s1, collecting a certain proportion of normal and fraudulent customers as modeling samples, collecting customer account registration personal basic information of the modeling samples, and obtaining operation behavior buried point data from monitoring software as credit data;
s2, removing abnormal data in the credit data by using the Pauta (3σ) criterion, and then dividing the samples into a training set and a test set;
s3, constructing an LVQ neural network, determining the network topology and initializing the network parameters, screening the most representative credit evaluation indexes as the input of the LVQ model through a logistic regression algorithm, and taking the client's fraud performance as the output of the LVQ model;
s4, initializing whale algorithm parameters, optimizing the weights of the LVQ neural network through a whale algorithm to obtain optimal initial weights, endowing the optimal initial weights to the LVQ neural network, and inputting training set samples for learning;
s5, inputting a test set sample to the trained LVQ neural network for prediction, and comparing model precision evaluation indexes with LVQ neural network prediction models optimized by genetic algorithm and particle swarm optimization;
s6, deploying the optimized LVQ neural network fraud detection model to an application platform, acquiring data of a real-time application client, importing the data serving as a sample to be detected into the detection model, outputting a fraud prediction result, realizing real-time detection of the application client, inputting data of a new fraud client into model training at regular intervals, and realizing iterative updating of the model.
Preferably, in S1, normal and fraudulent clients in a certain proportion and quantity are selected as modeling samples from the back end of the internet financial platform according to post-loan performance, and the personal basic information supplied at account registration and application, together with the operation-behavior buried-point (event-tracking) data, are acquired from the monitoring software. The personal application information of a user comprises: mobile phone number, education, marital status, employer, address and contact information, plus the personal basic information, credit transaction information, public information and special-record data obtained from the credit report. The buried-point data comprise device behavior data and log data collected at each tracking point, where the device behavior data include: number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, geographic-information application frequency, IP application frequency, device battery percentage and average gyroscope acceleration; and the log data include: logins within 7 days, time from first click to credit application, maximum number of sessions within one day, behavior statistics for the week before the credit application, and the like. In addition, under compliance requirements, the method is not limited to obtaining full-domain multi-dimensional big data including mobile internet behavior data, in-app behavior data of the loan APP, credit history and operator data.
Preferably, in S2, since the LVQ neural network does not need the input to be normalized or orthogonalized but does need abnormal data to be processed to reduce noise, this patent uses the Pauta (3σ) criterion to eliminate abnormal data, as follows:
S21, for a collected data set $X = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{iN})$, calculate the arithmetic mean and the residual of each point:

$$\bar{x}_i = \frac{1}{N}\sum_{j=1}^{N} x_{ij}, \qquad v_{ij} = x_{ij} - \bar{x}_i$$

S22, obtain the root mean square deviation by Bessel's formula:

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} v_{ij}^2}$$

where $N$ is the number of samples in the data set, $\bar{x}_i$ is the arithmetic mean, $v_{ij}$ is the residual, and $\sigma$ is the root mean square deviation.

S23, judge whether each point satisfies

$$|v_{ij}| \le 3\sigma$$

If it does, $x_{ij}$ is normal data and is retained; otherwise $x_{ij}$ is deleted as an outlier.
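The S21–S23 screening can be sketched as follows; this is a minimal illustration of the Pauta (3σ) criterion, and the sample values are hypothetical:

```python
import math

def remove_outliers_3sigma(data):
    """Keep x_ij when |v_ij| <= 3*sigma, where v_ij is the residual from
    the arithmetic mean and sigma is the Bessel-corrected RMS deviation."""
    n = len(data)
    mean = sum(data) / n
    residuals = [x - mean for x in data]
    sigma = math.sqrt(sum(v * v for v in residuals) / (n - 1))  # Bessel: N - 1
    return [x for x, v in zip(data, residuals) if abs(v) <= 3 * sigma]

# 100.0 is a gross outlier among twenty readings of 10.0
clean = remove_outliers_3sigma([10.0] * 20 + [100.0])
```

Note that with very small samples the 3σ bound can never be exceeded, so the criterion is only meaningful for data sets of more than roughly ten points.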
Preferably, in S3, the Learning Vector Quantization (LVQ) neural network is a forward neural network with a training competition layer evolved from a Self-organizing feature mapping (SOM) neural network.
The LVQ network consists of three layers, namely an input layer, a competition layer and an output layer, wherein one connection of the nodes of the input layer and the nodes of the competition layer corresponds to one connection weight; the competition layer is mainly responsible for finishing the classification of input vectors in nodes of the input layer, and each node corresponds to a reference vector formed by all connection weights corresponding to the corresponding node; and the nodes in the competition layer are connected with the nodes in the output layer in a one-to-one correspondence manner, and the output layer is used for outputting the detection result outwards.
The LVQ network algorithm can be divided into an LVQ1 algorithm and an LVQ2 algorithm, the LVQ1 algorithm is adopted in the patent, and the basic idea is as follows: if the category of the input vector is consistent with the category corresponding to the linear output layer node, the corresponding competition layer node weight moves along the direction of the input vector; otherwise, the corresponding weight of the node of the competition layer moves along the reverse direction of the input vector. The LVQ1 algorithm mainly comprises the following steps:
S31, create and initialize the LVQ neural network; the number of competition-layer nodes can be selected according to the empirical formula:

$$s = \sqrt{n + m} + a$$

where s is the number of competition-layer nodes, n is the number of input-layer nodes, m is the number of output-layer nodes, and a is a constant between 0 and 10.
S32, select an input vector $X = (x_1, x_2, \ldots, x_j, \ldots, x_n)^T$, where n is the number of input nodes, with desired output $D = (d_1, d_2, \ldots, d_m)^T$, where m is the number of output-layer nodes; initialize the connection weights $W_{ij}$ between the input layer and the competition layer and the learning rate $\eta$ ($\eta > 0$). Send the input vector X into the input layer and compute the distance $d_i$ between the i-th competition-layer node and the input vector:

$$d_i = \sqrt{\sum_{j=1}^{n}\left(x_j - W_{ij}\right)^2}, \qquad i = 1, 2, \ldots, s$$

where $d_i$ is the distance between the i-th competition-layer node and the input vector, s is the number of competition-layer nodes, $x_j$ is the j-th input component, and $W_{ij}$ is the connection weight between the input layer and the competition layer.
S33, find the competition-layer node whose weight vector is at the minimum distance from the input vector; this is the winning (excitation) node i, its distance is recorded as $d_i$, and the class label of the linear output-layer node connected to it is recorded as $C_i$.
S34, update the connection weights. Judge whether the LVQ classification is correct by whether the network's output class agrees with the expected class, and correct the weight of the winning node with the corresponding rule. Record the class label of the input vector as $C_x$. If the class $C_i$ of winning node i coincides with the input vector's class $C_x$, i.e. $C_i = C_x$, the network output class matches the target class and the weight is corrected toward the input vector:

$$W_{ij\text{-}new} = W_{ij\text{-}old} + \eta\,(x - W_{ij\text{-}old})$$

otherwise the weight is corrected away from the input vector:

$$W_{ij\text{-}new} = W_{ij\text{-}old} - \eta\,(x - W_{ij\text{-}old})$$

where $\eta$ is the learning rate, and $W_{ij\text{-}old}$, $W_{ij\text{-}new}$ are the weights connecting node i and node j before and after the adjustment.
S35, update the learning rate:

$$\eta(t) = \eta_0\left(1 - \frac{t}{t_{max}}\right)$$

and train iteratively: while $t < t_{max}$, set t = t + 1 and return to step S32 to input the next sample; training ends when the LVQ neural network reaches the set maximum number of iterations or meets the error-precision requirement.
S36, LVQ output.
When sample data are fed to the LVQ input, the competition-layer nodes produce a winning node through the winner-takes-all competition rule; the winner outputs 1 and the losers output 0. The output node connected to the group containing the winning node outputs 1 and the others output 0, thereby classifying the samples. However, the convergence speed and prediction accuracy of the LVQ neural network are affected by the initial weights.
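The S31–S36 procedure can be sketched as a minimal LVQ1 loop; the toy data, prototype positions and learning-rate value below are illustrative assumptions, not the patent's actual network:

```python
import math

def lvq1_train(X, labels, prototypes, proto_labels, eta0=0.3, epochs=20):
    """LVQ1: move the winning prototype toward a same-class sample and
    away from a different-class sample; eta decays linearly over time."""
    protos = [list(p) for p in prototypes]
    t, t_max = 0, epochs * len(X)
    for _ in range(epochs):
        for x, cx in zip(X, labels):
            dists = [math.dist(x, w) for w in protos]   # competition layer
            i = dists.index(min(dists))                 # winning node
            eta = eta0 * (1 - t / t_max)                # decaying learning rate
            sign = 1 if proto_labels[i] == cx else -1   # C_i == C_x ?
            protos[i] = [w + sign * eta * (xj - w) for w, xj in zip(protos[i], x)]
            t += 1
    return protos

def lvq1_predict(x, protos, proto_labels):
    dists = [math.dist(x, w) for w in protos]
    return proto_labels[dists.index(min(dists))]

# toy two-class data: cluster near (0,0) = normal, near (1,1) = fraud
X = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (1.0, 1.0), (1.2, 1.0), (1.0, 1.2)]
y = [0, 0, 0, 1, 1, 1]
protos = lvq1_train(X, y, [(0.1, 0.1), (0.9, 0.9)], [0, 1])
```

After training, `lvq1_predict` labels a new point by the class of its nearest prototype, which is exactly the winner-takes-all rule of S36.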
Preferably, in S4, a whale algorithm is used to optimize the initial weights of the LVQ neural network. The Whale Optimization Algorithm (WOA) is a recent swarm-intelligence optimization algorithm that simulates the bubble-net hunting behavior of humpback whales: it finds the optimal solution by continually approaching the prey through the spiral bubble-net strategy, shrinking encirclement, spiral position updating and a random hunting mechanism. The whale algorithm optimizes the initial weights of the LVQ neural network as follows:
S41, initialize the artificial whale population, each individual encoding a candidate weight vector W.
Set the population size $W_{Size}$ and the maximum number of iterations $T_{max}$. With the current iteration t = 0, randomly generate $W_{Size}$ candidate solutions to form the population $W_P = \{X_1, X_2, \ldots, X_i, \ldots, X_{W_{Size}}\}$, where the i-th candidate $X_i$ stores one set of LVQ initial weights $X_i = [W_1, W_2, \ldots, W_j, \ldots, W_n]$ (its dimension n equals the number of LVQ neural network weights); this gives each whale its initial position $x_0$.
S42, encircling the prey.
When a humpback whale identifies prey, the prey's position is not known a priori, so the target prey position is taken to be the position of the best (or near-best) whale individual in the current population, and the other individual whales move toward it. The position update can be described as:

$$X(t+1) = X^*(t) - A \cdot D$$

$$D = |C \cdot X^*(t) - X(t)|$$

where D is the distance coefficient between the current whale and the best whale; $X^*(t)$ is the position vector of the best whale individual in the current population (the current local-optimal solution), updated in real time; X(t) is the position vector of the current whale; and t is the current iteration number. A and C are coefficient variables: A is a random parameter in the interval [-2, 2] that determines the switch between wandering foraging and shrinking encirclement, and C is a random number in the interval [0, 2] that controls the influence of the distance between a randomly selected whale position vector $X_{rand}$ and the current position X(t). The coefficient variables A and C are expressed as follows:
$$A = 2a \cdot r_1 - a, \qquad C = 2 \cdot r_2$$

where $r_1$ and $r_2$ are random numbers in [0, 1], and a is the convergence factor (control parameter):

$$a = 2\left(1 - \frac{t}{T_{max}}\right)$$

where t is the current iteration number and $T_{max}$ is the maximum number of iterations.
The convergence factor a decreases linearly from 2 to 0, continuously narrowing the search range during training; through its effect on the coefficient A this realizes the shrinking encirclement of the prey and improves the convergence speed of the whale algorithm.
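The coefficient definitions above translate directly to code; a minimal sketch, where $r_1$ and $r_2$ are fresh uniform draws on each update:

```python
import random

def woa_coefficients(t, t_max, rng):
    """a decays linearly from 2 to 0; A is uniform in [-a, a];
    C is uniform in [0, 2]."""
    a = 2 * (1 - t / t_max)
    A = 2 * a * rng.random() - a   # A = 2*a*r1 - a
    C = 2 * rng.random()           # C = 2*r2
    return a, A, C

rng = random.Random(0)
a0, A0, C0 = woa_coefficients(0, 100, rng)     # start of the run: a = 2
aT, AT, CT = woa_coefficients(100, 100, rng)   # end of the run: a = 0
```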
S43, bubble-net attack.
The humpback whale's bubble-net foraging attack comprises two behaviors: shrinking encirclement, realized by decreasing the convergence factor a, and spiral position updating, in which the whale makes bubbles and moves upward along a spiral from below, gradually tightening the circle around the prey.
When a humpback whale attacks prey through the bubble net, shrinking encirclement and spiral ascent happen simultaneously. To simulate this, the position of the best whale among the n whales is taken as the position vector $X^*(t)$, and each other whale updates its position by either the shrinking-encirclement mechanism or the spiral mechanism, chosen at random with probability p; this ensures both global search and local exploitation during optimization and reduces blind spots.
This patent selects between the shrinking-encirclement mechanism and the spiral position-update mechanism with equal probability p = 50%. The mathematical model can be expressed as:

$$X(t+1) = \begin{cases} X^*(t) - A \cdot D, & p < 0.5 \\ D_P \cdot e^{bl} \cdot \cos(2\pi l) + X^*(t), & p \ge 0.5 \end{cases}$$

where p is a random number in [0, 1]; $D_P = |X^*(t) - X(t)|$ is the distance between the current whale and the prey (the best position so far); $X^*(t)$ is the position vector of the best whale individual in the current population and X(t) that of the current whale; t is the current iteration number; b is a constant defining the shape of the logarithmic spiral; and l is a random number in [-1, 1]: when l = -1 the artificial whale is closest to the food, and when l = 1 it is farthest from the food.
S44, searching for prey.
Besides attacking prey with the bubble net, humpback whales also search for prey at random. The search mode depends on the coefficient A, whose fluctuation range shrinks with a: as a decreases from 2 to 0 over the iterations, A is a random value in [-a, a]. When A lies in [-1, 1], i.e. |A| < 1, the position found by the whale group is taken as the target prey's position, and the group closes in on the target prey to attack it.
When A > 1 or A < -1, i.e. |A| > 1, the whale group performs a mobile search away from the prey to find more suitable prey; this enhances the exploration ability of the algorithm and lets WOA carry out a global search. The mathematical model is as follows:
$$D = |C \cdot X_{rand} - X(t)|$$

$$X(t+1) = X_{rand} - A \cdot D$$

where $X_{rand}$ is the position vector of a randomly selected whale.
s45 setting fitness function
As the model training pursues the minimization of the mean square error, the mean square error of the LVQ neural network is taken as a fitness function of the whale algorithm, and the fitness function is expressed as the following formula:
Figure BDA0002853274880000081
wherein, yi,y'iRespectively an actual value and a predicted value of the ith sample; n is the number of training sample sets.
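A direct transcription of this fitness function; in the patent's setting the predictions would come from the LVQ network being evaluated:

```python
def mse_fitness(y_true, y_pred):
    """Mean squared error over the training set; lower fitness is better."""
    n = len(y_true)
    return sum((yi - ypi) ** 2 for yi, ypi in zip(y_true, y_pred)) / n
```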
S46, iterative updating.
Train the LVQ neural network with the training samples and compute the fitness value of each individual in the whale population from the training error. Compare the fitness of every whale at its updated position X(t+1) with that of the current best whale $X^*(t)$, select the individual with the best fitness as the next generation's best whale, and update the best position. Assign the j-th component of the best whale to the j-th initial weight of the LVQ neural network:

$$\omega_j = X^*_j(t), \qquad j = 1, 2, \ldots, n$$

where n is the number of LVQ weights. Record the best whale position vector at the final iteration, assign the corresponding initial weights to the LVQ neural network, and feed the training-set samples into the LVQ neural network for learning and training.
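Putting S41–S46 together, a compact WOA loop can be sketched as follows. This is an illustration under assumed settings (b = 1 for the spiral, a 50/50 split between encirclement and spiral, and a sphere function standing in for the LVQ training error; none of these specifics are fixed by the patent beyond p = 50%):

```python
import math
import random

def woa_minimize(fitness, dim, pop_size=20, t_max=100, seed=1):
    """Minimal Whale Optimization Algorithm: evolve a population of
    candidate vectors (e.g. LVQ initial weights) and return the best."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for t in range(t_max):
        a = 2 * (1 - t / t_max)                    # convergence factor 2 -> 0
        for i, x in enumerate(pop):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best; |A| >= 1: search near a random whale
                ref = best if abs(A) < 1 else rng.choice(pop)
                pop[i] = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                l = rng.uniform(-1, 1)             # spiral toward the best, b = 1
                pop[i] = [abs(best[j] - x[j]) * math.exp(l) * math.cos(2 * math.pi * l)
                          + best[j] for j in range(dim)]
        best = min(pop + [best], key=fitness)      # elitist: best never worsens
    return best

# sphere function as a stand-in for the LVQ training error
w = woa_minimize(lambda v: sum(c * c for c in v), dim=4)
```

In the patent's pipeline, the lambda would be replaced by a function that loads the candidate vector into the LVQ network, runs one training pass, and returns the S45 mean squared error.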
Preferably, in S5, the test set is input into the trained LVQ neural network to output detection results, and the prediction accuracy of the model is calculated, making it convenient to evaluate the training result of the LVQ neural network.
Preferably, in S5, the actual and predicted results of the samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: true positive rate (TPR), false positive rate (FPR), AUC (Area Under Curve) and KS (Kolmogorov-Smirnov):

$$TPR = \frac{TP}{TP + FN}$$

$$FPR = \frac{FP}{FP + TN}$$

$$KS = \max(TPR - FPR)$$

where a True Positive (TP) means the model correctly predicts a positive-class sample as positive; a True Negative (TN) means the model correctly predicts a negative-class sample as negative; a False Positive (FP) means the model incorrectly predicts a negative-class sample as positive; and a False Negative (FN) means the model incorrectly predicts a positive-class sample as negative. In this application, fraud samples are the positive class and normal samples the negative class.
Plotting TPR on the vertical axis against FPR on the horizontal axis gives the ROC (Receiver Operating Characteristic) curve; the AUC value (Area Under the ROC Curve) serves as the evaluation standard for model accuracy, and the closer the AUC is to 1, the better the model.
The KS value is the maximum difference between TPR and FPR and reflects the model's best discriminating power; the threshold at that point is generally used as the optimal cut-off between good and bad users, and a KS above 0.2 generally indicates acceptable prediction accuracy.
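For hard 0/1 predictions the confusion-matrix rates reduce to a single operating point; the sketch below computes TPR, FPR and TPR − FPR at that one threshold (the patent's KS is the maximum of this difference over all thresholds of a score-based model, so this is a simplified illustration):

```python
def fraud_metrics(y_true, y_pred):
    """TPR and FPR with fraud = positive class (1), normal = negative (0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn)   # recall on fraud samples
    fpr = fp / (fp + tn)   # share of normal samples wrongly flagged
    return tpr, fpr, tpr - fpr

tpr, fpr, ks = fraud_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                             [1, 1, 1, 0, 1, 0, 0, 0])
```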
Preferably, in S6, the optimized LVQ neural network fraud detection model is deployed to the application platform, data of the real-time application client is acquired and imported as a sample to be detected into the detection model to output a fraud prediction result, so as to implement real-time detection of the application client, and periodically input new data of the fraud client into model training to implement iterative update of the model.
Compared with the prior art, the invention has the beneficial effects that:
1. compared with neural networks such as BP, RBF and SOM, the LVQ neural network has the advantages of simple structure, short training time, stronger nonlinear classification processing capability and the like.
2. Compared with optimization algorithms such as a genetic algorithm, a particle swarm algorithm and the like, the whale algorithm has the advantages of simple parameter setting, strong function optimization capability, strong global optimization capability, good convergence stability and the like.
3. In the invention, the whale algorithm is utilized to optimize the initial weight of the LVQ neural network to improve the global fitting capability, the learning rate and the prediction precision of the LVQ neural network, and the requirement of real-time detection of internet financial fraud behaviors can be met.
Drawings
FIG. 1 is a schematic view of the overall process of the present invention.
Detailed Description
Referring to fig. 1, the present invention provides a technical solution:
a whale algorithm-based fraud detection method for optimizing an LVQ neural network comprises the following six steps:
s1, collecting a certain proportion of normal and fraudulent customers as modeling samples, collecting customer account registration personal basic information of the modeling samples, and obtaining operation behavior buried point data from monitoring software as credit data;
s2, removing abnormal data in the credit data by using the Pauta (3σ) criterion, and then dividing the samples into a training set and a test set;
s3, constructing an LVQ neural network, determining the network topology and initializing the network parameters, screening the most representative credit evaluation indexes as the input of the LVQ model through a logistic regression algorithm, and taking the client's fraud performance as the output of the LVQ model;
s4, initializing whale algorithm parameters, optimizing the weights of the LVQ neural network through a whale algorithm to obtain optimal initial weights, endowing the optimal initial weights to the LVQ neural network, and inputting training set samples for learning;
s5, inputting a test set sample to the trained LVQ neural network for prediction, and comparing model precision evaluation indexes with LVQ neural network prediction models optimized by genetic algorithm and particle swarm optimization;
s6, deploying the optimized LVQ neural network fraud detection model to an application platform, acquiring data of a real-time application client, importing the data serving as a sample to be detected into the detection model, outputting a fraud prediction result, realizing real-time detection of the application client, inputting data of a new fraud client into model training at regular intervals, and realizing iterative updating of the model.
In S1, normal and fraudulent clients in a certain proportion and quantity are selected as modeling samples from the back end of the internet financial platform according to post-loan performance, and the personal basic information supplied at account registration and application, together with the operation-behavior buried-point (event-tracking) data, are acquired from the monitoring software. The personal application information of a user comprises: mobile phone number, education, marital status, employer, address and contact information, plus the personal basic information, credit transaction information, public information and special-record data obtained from the credit report. The buried-point data comprise device behavior data and log data collected at each tracking point, where the device behavior data include: number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, geographic-information application frequency, IP application frequency, device battery percentage and average gyroscope acceleration; and the log data include: logins within 7 days, time from first click to credit application, maximum number of sessions within one day, behavior statistics for the week before the credit application, and the like. In addition, under compliance requirements, the method is not limited to obtaining full-domain multi-dimensional big data including mobile internet behavior data, in-app behavior data of the loan APP, credit history and operator data; this arrangement helps to gather user information comprehensively for subsequent prediction of the user's credit risk.
In S2, since the LVQ neural network does not require normalization or orthogonalization of the input data, but does require abnormal data to be processed to reduce noise, this patent uses the Pauta criterion (the 3σ rule) to eliminate abnormal data, as follows:
S21, for a collected data set $X_i = (x_{i1}, x_{i2}, \dots, x_{ij}, \dots, x_{iN})$, calculate its arithmetic mean and the residuals:

$$\bar{x}_i = \frac{1}{N}\sum_{j=1}^{N} x_{ij}, \qquad v_{ij} = x_{ij} - \bar{x}_i$$

S22, obtain the root mean square deviation according to Bessel's formula:

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} v_{ij}^{2}}$$

where N is the number of samples in the data set, $\bar{x}_i$ is the arithmetic mean, $v_{ij}$ is the residual, and $\sigma$ is the root mean square deviation.

S23, judge whether the data satisfy $|v_{ij}| \le 3\sigma$: if so, $x_{ij}$ is normal data and is retained; otherwise $x_{ij}$ is deleted. This setting eliminates abnormal data and reduces interference with the prediction model.
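The S21–S23 procedure above can be sketched in a few lines. The following Python sketch (function and variable names are our own, for illustration) applies the 3σ criterion with the Bessel-corrected deviation to one feature column:

```python
import numpy as np

def pauta_filter(x, k=3.0):
    """Remove outliers from a 1-D feature column via the Pauta (3-sigma) criterion.

    A value is kept when |x_ij - mean| <= k * sigma, where sigma is the
    Bessel-corrected root mean square deviation (divide by N - 1).
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()                                       # arithmetic mean
    resid = x - mean                                      # residuals v_ij
    sigma = np.sqrt((resid ** 2).sum() / (len(x) - 1))    # Bessel's formula
    return x[np.abs(resid) <= k * sigma]

data = np.append(np.full(19, 10.0), 100.0)  # 19 regular points and one outlier
clean = pauta_filter(data)                  # the 100.0 point is rejected
```

Note that with very small samples no point can exceed 3σ of its own sample mean, so the criterion is only meaningful once the column has enough observations.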
In S3, the Learning Vector Quantization (LVQ) neural network is a feedforward neural network with a trainable competition layer, evolved from the Self-Organizing feature Map (SOM) neural network.
The LVQ network consists of three layers: an input layer, a competition layer and an output layer. Each connection between an input-layer node and a competition-layer node carries one connection weight; the competition layer is responsible for classifying the input vectors, each of its nodes corresponding to a reference vector formed by that node's connection weights; the competition-layer nodes are connected one-to-one to the output-layer nodes, and the output layer outputs the detection result.
The LVQ network algorithm can be divided into an LVQ1 algorithm and an LVQ2 algorithm, the LVQ1 algorithm is adopted in the patent, and the basic idea is as follows: if the category of the input vector is consistent with the category corresponding to the linear output layer node, the corresponding competition layer node weight moves along the direction of the input vector; otherwise, the corresponding weight of the node of the competition layer moves along the reverse direction of the input vector. The LVQ1 algorithm mainly comprises the following steps:
S31, create and initialize the LVQ neural network; the number of competition-layer nodes can be selected according to the following empirical formula:

$$s = \sqrt{n + m} + a$$
wherein s is the number of nodes of the competition layer; n is the number of nodes of the input layer; m is the number of output layer nodes; a is a constant of 0 to 10.
S32, select an input vector $X = (x_1, x_2, \dots, x_j, \dots, x_n)^{T}$, where n is the number of input nodes, with desired output $D = (d_1, d_2, \dots, d_m)^{T}$, where m is the number of output-layer nodes; initialize the connection weights $W_{ij}$ between the input layer and the competition layer and the learning rate $\eta$ ($\eta > 0$). Feed the input vector X into the input layer and calculate the distance $d_i$ between the i-th competition-layer node and the input vector:

$$d_i = \sqrt{\sum_{j=1}^{n} (x_j - W_{ij})^{2}}$$

where $d_i$ is the distance between the i-th competition-layer node and the input vector, $i = 1, 2, \dots, s$, with s the number of competition-layer nodes; $x_j$ is the j-th input component and $W_{ij}$ is the connection weight between the input layer and the competition layer.
S33, find the competition-layer node whose distance to the input vector is minimal; this node i is the winning (excitation) node, its distance is denoted $d_i$, and the class label of the linear output-layer node connected to it is denoted $C_i$.

S34, update the connection weights. Whether the LVQ classification is correct is judged by whether the network output class matches the expected class, and the winning node's weights are corrected by different rules. Denote the class label corresponding to the input vector as $C_x$. If the class $C_i$ of winning node i coincides with $C_x$, i.e. $C_i = C_x$ (the network output class matches the target class), the weight is corrected toward the input vector:

$$W_{ij}^{new} = W_{ij}^{old} + \eta\,(x_j - W_{ij}^{old})$$

otherwise the weight is corrected away from the input vector:

$$W_{ij}^{new} = W_{ij}^{old} - \eta\,(x_j - W_{ij}^{old})$$

where $\eta$ is the learning rate, and $W_{ij}^{old}$, $W_{ij}^{new}$ are the connection weights between node i and node j before and after adjustment;
S35, update the learning rate with a linear decay

$$\eta(t) = \eta_0\left(1 - \frac{t}{t_{max}}\right)$$

and iterate the training: while $t < t_{max}$, set $t = t + 1$ and return to S32 to input the next sample; when the number of LVQ training iterations reaches the set maximum or the error accuracy requirement is met, training ends.
S36, LVQ output
When sample data is fed into the LVQ, the competition-layer nodes produce a winning node through the winner-takes-all competition rule: the winner outputs 1 and the losers output 0. The output node connected to the group containing the winning node outputs 1 and the others output 0, thereby classifying and identifying the sample. However, the convergence speed and prediction accuracy of the LVQ neural network are affected by the initial weights.
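Steps S31–S36 condense into a short LVQ1 training loop. The following Python sketch is illustrative only — the prototype counts, learning-rate schedule and the toy two-class data are our own choices, not the patent's:

```python
import numpy as np

def train_lvq1(X, y, protos_per_class=2, eta0=0.3, t_max=100, seed=0):
    """Minimal LVQ1: pull the winning prototype toward same-class inputs,
    push it away from different-class inputs, with a decaying learning rate."""
    rng = np.random.default_rng(seed)
    W, Wc = [], []
    for c in np.unique(y):
        # initialize each class's prototypes from random samples of that class
        idx = rng.choice(np.flatnonzero(y == c), protos_per_class, replace=False)
        W.append(X[idx]); Wc += [c] * protos_per_class
    W, Wc = np.vstack(W).astype(float), np.array(Wc)

    for t in range(t_max):
        eta = eta0 * (1 - t / t_max)                      # S35: decaying rate
        for xi, yi in zip(X, y):
            i = np.argmin(((W - xi) ** 2).sum(axis=1))    # S33: winning node
            step = eta * (xi - W[i])
            W[i] += step if Wc[i] == yi else -step        # S34: weight update
    return W, Wc

def predict_lvq(W, Wc, X):
    """Label each sample with the class of its nearest prototype."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)
    return Wc[d.argmin(axis=1)]

# toy two-class data: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, Wc = train_lvq1(X, y)
accuracy = (predict_lvq(W, Wc, X) == y).mean()
```

Here the prototypes are seeded from random training samples; the patent instead obtains the initial weights from the whale algorithm, which is exactly the quantity being optimized in S4.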
In S4, a whale algorithm is used to optimize the initial weights of the LVQ neural network. The Whale Optimization Algorithm (WOA) is a recent swarm-intelligence optimization algorithm that simulates the bubble-net hunting behavior of humpback whales; it finds the optimal solution through a hunting process that continuously approaches the prey via the spiral bubble-net strategy: shrinking encirclement, spiral position updating, and a random search mechanism. The whale algorithm optimizes the initial weights of the LVQ neural network as follows:
S41, initialize the artificial whale population, each individual of which encodes a weight vector W.
Set the population size $W_{Size}$ and the maximum number of iterations $T_{max}$; with the current iteration t = 0, randomly generate $W_{Size}$ candidate solutions forming the population $WP = \{X_1, X_2, \dots, X_i, \dots, X_{W_{Size}}\}$, where $X_i$ is the i-th candidate solution and stores one set of LVQ initial weights, $X_i = [W_1, W_2, \dots, W_j, \dots, W_n]$ (its dimension equal to the number n of LVQ weights), giving the whale its initial position $x_0$.
S42, identifying and encircling the prey.
When humpback whales identify prey, its position is not known a priori, so the position of the best (or near-best) individual whale in the current population is taken as the target prey position, and the other individual whales move toward it; the position update can be described as:

$$X(t+1) = X^{*}(t) - A \cdot D$$
$$D = |C \cdot X^{*}(t) - X(t)|$$

where D is the distance coefficient between the current whale and the best whale; $X^{*}(t)$ is the position vector of the best whale individual in the current population, i.e. the current best solution, and t is the current iteration number; $X(t)$ is the position vector of the current whale individual, updated in real time; A and C are coefficient variables: A is a random parameter in the interval $[-2, 2]$ that determines the switch between wandering foraging and encircling contraction, and C is a random number in the interval $[0, 2]$ that controls how strongly the reference position weighs against the current position $X(t)$. The coefficient variables A and C are given by:
$$A = 2a \cdot r_1 - a$$
$$C = 2 \cdot r_2$$

where $r_1$ and $r_2$ are random numbers in $[0, 1]$; a is the convergence factor (control parameter), given by:

$$a = 2\left(1 - \frac{t}{T_{max}}\right)$$

where t is the current iteration number and $T_{max}$ is the maximum number of iterations.
As the convergence factor a decreases linearly from 2 to 0, the search range shrinks steadily during training, which, through the coefficient A, realizes the encircling contraction around the prey and improves the convergence speed of the whale algorithm.
S43, bubble-net attacking.
Humpback whales attack prey by bubble-net foraging, which mainly comprises shrinking encirclement and spiral position updating: shrinking encirclement tightens the circle around the prey by decreasing the convergence factor a, while in the spiral update the whale, blowing bubbles, moves upward along a spiral path that gradually contracts the encirclement of the prey.
When a humpback whale attacks prey through the bubble net, shrinking encirclement and spiral position updating occur simultaneously. To simulate this, the best whale's position is taken as $X^{*}(t)$, and each non-optimal whale updates its position by either the shrinking-encirclement mechanism or the spiral mechanism, chosen at random with probability p; this balances global search and local exploitation during optimization and reduces blind spots.
This patent selects between the shrinking-encirclement mechanism and the spiral position-update mechanism with equal probability p = 0.5; the mathematical model can be expressed as:
$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5 \\ D_{p} \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \ge 0.5 \end{cases}$$

where p is a random number in $[0, 1]$; $D_{p} = |X^{*}(t) - X(t)|$ is the distance between the current whale and the prey (the current best position); $X^{*}(t)$ is the position vector of the best whale individual in the current population, $X(t)$ is the position vector of the current whale, and t is the current iteration number; b is a constant defining the shape of the logarithmic spiral; l is a random number in $[-1, 1]$, where l = −1 means the artificial whale is closest to the food and l = 1 means it is farthest.
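The bubble-net update above (shrinking encirclement versus logarithmic spiral, chosen with probability 0.5) maps directly to code. A minimal Python sketch of a single whale's position update, with assumed names, is:

```python
import numpy as np

def whale_update(X, X_best, a, b=1.0, rng=None):
    """One WOA position update: shrinking encirclement with probability
    p < 0.5, otherwise the logarithmic-spiral (bubble-net) move."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        A = 2 * a * rng.random(X.shape) - a        # A in [-a, a]
        C = 2 * rng.random(X.shape)                # C in [0, 2]
        D = np.abs(C * X_best - X)
        return X_best - A * D                      # shrink toward the best whale
    l = rng.uniform(-1, 1)
    D_p = np.abs(X_best - X)                       # distance to the "prey"
    return D_p * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best

new_pos = whale_update(np.zeros(4), np.ones(4), a=1.0,
                       rng=np.random.default_rng(0))
```

Because l is drawn from $[-1, 1]$, the spiral branch can land the whale either very close to the best position (l near −1) or well outside it (l near 1), which is what keeps local exploitation from collapsing too early.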
S44, searching for prey.
Besides attacking prey with the bubble net, humpback whales also search for prey randomly, driven by the coefficient A, whose fluctuation range shrinks with a: as a decreases from 2 to 0 during the iterations, A is a random value in $[-a, a]$. When $|A| < 1$, the position found by the whale group is treated as the target prey's position, and the group closes in on the target prey to attack it.
When $|A| > 1$, the whale group instead searches away from the prey, looking for more suitable prey; this enhances the exploration ability of the algorithm and enables the WOA to search globally. The mathematical model is as follows:
D=|C·Xrand-X(t)|
X(t+1)=Xrand-A·D
where $X_{rand}$ is a randomly selected whale position vector;
s45 setting fitness function
As the model training pursues the minimization of the mean square error, the mean square error of the LVQ neural network is taken as a fitness function of the whale algorithm, and the fitness function is expressed as the following formula:
$$f = \frac{1}{N}\sum_{i=1}^{N} (y_i - y_i')^{2}$$

where $y_i$ and $y_i'$ are respectively the actual and predicted values of the i-th sample, and N is the number of training samples.
S46, iterative update.
Train the LVQ neural network with the training samples and compute the fitness value of each individual in the whale population from its training error. During each position update $X(t+1)$, compare the fitness of all whale individuals with that of the current best whale $X^{*}(t)$, select the individual with the best fitness as the next generation's best whale and update its position. The j-th parameter value of the best whale is assigned to the j-th weight of the LVQ neural network:

$$\omega_j = X_{i,j}(t), \quad j = 1, 2, \dots, m$$

where m is the number of output-layer nodes.
Record the best whale position vector at the final iteration, $X^{*} = [W_1^{*}, W_2^{*}, \dots, W_n^{*}]$, assign it to the LVQ neural network as its initial weights, and input the training-set samples into the LVQ neural network for learning and training; the repeated iteration in this setting helps to improve prediction accuracy.
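Putting S41–S46 together, the loop below is an illustrative WOA sketch. For brevity the fitness here is a stand-in sphere function rather than the LVQ training MSE the patent would use; swapping in the real fitness only changes the callable passed in:

```python
import numpy as np

def woa_minimize(fitness, dim, n_whales=20, t_max=100, b=1.0, seed=0):
    """Whale Optimization Algorithm sketch: shrinking encirclement,
    spiral bubble-net move, and random exploratory search.

    In the patent's setting `fitness` would be the mean squared error of an
    LVQ network trained from the candidate initial weights; any callable
    mapping a weight vector to a scalar works.
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (n_whales, dim))
    fits = np.array([fitness(x) for x in pop])
    best = pop[fits.argmin()].copy()

    for t in range(t_max):
        a = 2 * (1 - t / t_max)                      # convergence factor: 2 -> 0
        for i in range(n_whales):
            r1, r2, p = rng.random(3)
            A, C = 2 * a * r1 - a, 2 * r2
            if p < 0.5:
                if abs(A) < 1:                       # encircle the best whale
                    D = np.abs(C * best - pop[i])
                    pop[i] = best - A * D
                else:                                # random exploratory search
                    x_rand = pop[rng.integers(n_whales)]
                    D = np.abs(C * x_rand - pop[i])
                    pop[i] = x_rand - A * D
            else:                                    # spiral bubble-net move
                l = rng.uniform(-1, 1)
                D_p = np.abs(best - pop[i])
                pop[i] = D_p * np.exp(b * l) * np.cos(2 * np.pi * l) + best
        fits = np.array([fitness(x) for x in pop])
        if fits.min() < fitness(best):               # keep the best-so-far
            best = pop[fits.argmin()].copy()
    return best

# stand-in fitness: sphere function (the patent would use the LVQ training MSE)
best = woa_minimize(lambda w: float((w ** 2).sum()), dim=5)
```

The returned `best` vector would then be copied into the LVQ network's weight matrix as its initial weights before the training of S3 begins.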
In S5, the test set is input into the trained LVQ neural network to output the application detection results, and the prediction accuracy of the model is calculated by comparison with the actual data.
In S5, the actual and predicted results on the samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: the true positive rate TPR (True Positive Rate), the false positive rate FPR (False Positive Rate), AUC (Area Under Curve) and KS (Kolmogorov–Smirnov). The calculation formulas are as follows:
$$TPR = \frac{TP}{TP + FN}$$
$$FPR = \frac{FP}{FP + TN}$$
KS=max(TPR-FPR)
wherein True Positive (TP) means the model correctly predicts a positive-class sample as positive; True Negative (TN) means the model correctly predicts a negative-class sample as negative; False Positive (FP) means the model incorrectly predicts a negative-class sample as positive; False Negative (FN) means the model incorrectly predicts a positive-class sample as negative. In this application, fraud samples are taken as the positive class and normal samples as the negative class.
Plotting TPR on the vertical axis against FPR on the horizontal axis gives the ROC (Receiver Operating Characteristic) curve; the AUC value (Area Under the ROC Curve) obtained from it serves as the evaluation standard for the accuracy of the model, and the closer the AUC value is to 1, the better the model.
The KS value is the maximum of the difference TPR − FPR and reflects the model's best discriminating power; the threshold at which it occurs is generally taken as the optimal threshold for separating good and bad users, and KS > 0.2 generally indicates that the model has acceptable prediction accuracy.
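The confusion-matrix indexes above can be sketched as follows (illustrative Python; in the patent's convention fraud is the positive class):

```python
import numpy as np

def tpr_fpr(y_true, y_score, threshold):
    """TPR = TP/(TP+FN) and FPR = FP/(FP+TN) at a given score threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

def ks_statistic(y_true, y_score):
    """KS = max over thresholds of (TPR - FPR)."""
    return max(t - f for t, f in
               (tpr_fpr(y_true, y_score, s) for s in np.unique(y_score)))

# perfectly separated scores: KS reaches its maximum of 1.0
ks = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

Sweeping the threshold over the distinct scores and collecting the (FPR, TPR) pairs also yields the points of the ROC curve from which the AUC is computed.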
In S6, the optimized LVQ neural network fraud detection model is deployed to the application platform; data of real-time loan applicants are acquired and imported into the detection model as samples to be detected, and fraud prediction results are output, realizing real-time detection of applicants. Data of newly confirmed fraud customers are periodically fed into model training, realizing iterative updating of the model.
The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. The foregoing is only a preferred embodiment of the present invention; it should be noted that, the possible concrete structures being objectively unlimited while textual expression is limited, those skilled in the art can make various modifications, refinements or changes without departing from the principle of the invention, and the technical features described above can be combined in any suitable manner; such modifications, variations, combinations or adaptations, made using the spirit and scope of the invention as defined by the claims, may also be directed to other uses and embodiments.

Claims (8)

1. A whale algorithm-based method for detecting fraud behaviors by optimizing LVQ neural network is characterized by comprising the following steps: the method comprises the following six steps:
s1, collecting a certain proportion of normal and fraudulent customers as modeling samples, collecting customer account registration personal basic information of the modeling samples, and obtaining operation behavior buried point data from monitoring software as credit data;
s2, removing abnormal data in the credit data by using a Levina criterion, and then dividing the samples into a training set and a testing set;
s3, constructing an LVQ neural network, determining the network topology and initializing the network parameters, screening the most representative credit evaluation indexes via a logistic regression algorithm as the input of the LVQ model, and taking the customer's fraud performance as the output of the LVQ model;
s4, initializing whale algorithm parameters, optimizing the weights of the LVQ neural network through a whale algorithm to obtain optimal initial weights, endowing the optimal initial weights to the LVQ neural network, and inputting training set samples for learning;
s5, inputting a test set sample to the trained LVQ neural network for prediction, and comparing model precision evaluation indexes with LVQ neural network prediction models optimized by genetic algorithm and particle swarm optimization;
s6, deploying the optimized LVQ neural network fraud detection model to an application platform, acquiring data of a real-time application client, importing the data serving as a sample to be detected into the detection model, outputting a fraud prediction result, realizing real-time detection of the application client, inputting data of a new fraud client into model training at regular intervals, and realizing iterative updating of the model.
2. The method for detecting fraud behavior based on the whale-algorithm-optimized LVQ neural network as claimed in claim 1, wherein in S1, normal and fraudulent customers are selected in a certain proportion and quantity from the back end of the internet financial platform, according to post-loan performance, as modeling samples; the personal basic information submitted when each sample customer registered and applied for an account is collected, and operation-behavior buried-point data is obtained from monitoring software; the personal application information of the user includes: mobile phone number, education background, marital status, employer, address and contact information, together with the personal basic information, credit transaction information, public information and special-record data obtained from the credit report; the buried-point data comprises device behavior data and log data collected at the buried points, wherein the device behavior data includes: the number of platform logins, number of clicks, click frequency, total and average input time, mobile phone number data, GPS position, MAC address, IP address data, application frequency per geographic location, application frequency per IP address, device battery percentage, and average gyroscope acceleration; and the log data includes: the number of logins within 7 days, the time from the first click to the credit application, the maximum number of sessions within one day, behavior statistics for the week before the credit application, and the like; in addition, under compliance requirements, the method is not limited to obtaining full-domain multi-dimensional big data including mobile internet behavior data, in-app behavior data from the loan APP, credit history, and operator data.
3. The method for detecting fraud as claimed in claim 1, wherein in S2, since the LVQ neural network does not require normalization or orthogonalization of the input data, but does require abnormal data to be processed to reduce noise, the Pauta criterion (3σ rule) is used to eliminate abnormal data, as follows:
S21, for a collected data set $X_i = (x_{i1}, x_{i2}, \dots, x_{ij}, \dots, x_{iN})$, calculate its arithmetic mean and the residuals:

$$\bar{x}_i = \frac{1}{N}\sum_{j=1}^{N} x_{ij}, \qquad v_{ij} = x_{ij} - \bar{x}_i$$

S22, obtain the root mean square deviation according to Bessel's formula:

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} v_{ij}^{2}}$$

where N is the number of samples in the data set, $\bar{x}_i$ is the arithmetic mean, $v_{ij}$ is the residual, and $\sigma$ is the root mean square deviation.

S23, judge whether the data satisfy $|v_{ij}| \le 3\sigma$: if so, $x_{ij}$ is normal data and is retained; otherwise $x_{ij}$ is deleted.
4. The method for detecting fraud behavior based on the whale-algorithm-optimized LVQ neural network as claimed in claim 1, wherein in S3, the Learning Vector Quantization (LVQ) neural network is a feedforward neural network with a trainable competition layer, evolved from the Self-Organizing feature Map (SOM) neural network.
The LVQ network consists of three layers: an input layer, a competition layer and an output layer. Each connection between an input-layer node and a competition-layer node carries one connection weight; the competition layer is responsible for classifying the input vectors, each of its nodes corresponding to a reference vector formed by that node's connection weights; the competition-layer nodes are connected one-to-one to the output-layer nodes, and the output layer outputs the detection result.
The LVQ network algorithm can be divided into the LVQ1 algorithm and the LVQ2 algorithm; the LVQ1 algorithm is adopted here, and its basic idea is: if the category of the input vector is consistent with the category of the linear output-layer node, the corresponding competition-layer node weight moves toward the input vector; otherwise it moves away from the input vector. The LVQ1 algorithm mainly comprises the following steps:
S31, create the LVQ neural network and determine the network topology according to the input and output; the number of competition-layer nodes can be selected according to the following empirical formula:

$$s = \sqrt{n + m} + a$$
wherein s is the number of nodes of the competition layer; n is the number of nodes of the input layer; m is the number of output layer nodes; a is a constant of 0 to 10.
S32, select an input vector $X = (x_1, x_2, \dots, x_j, \dots, x_n)^{T}$, where n is the number of input nodes, with desired output $D = (d_1, d_2, \dots, d_m)^{T}$, where m is the number of output-layer nodes; initialize the connection weights $W_{ij}$ between the input layer and the competition layer and the learning rate $\eta$ ($\eta > 0$). Feed the input vector X into the input layer and calculate the distance $d_i$ between the i-th competition-layer node and the input vector:

$$d_i = \sqrt{\sum_{j=1}^{n} (x_j - W_{ij})^{2}}$$

where $d_i$ is the distance between the i-th competition-layer node and the input vector, $i = 1, 2, \dots, s$, with s the number of competition-layer nodes; $x_j$ is the j-th input component and $W_{ij}$ is the connection weight between the input layer and the competition layer.
S33, find the competition-layer node whose distance to the input vector is minimal; this node i is the winning (excitation) node, its distance is denoted $d_i$, and the class label of the linear output-layer node connected to it is denoted $C_i$.

S34, update the connection weights. Whether the LVQ classification is correct is judged by whether the network output class matches the expected class, and the winning node's weights are corrected by different rules. Denote the class label corresponding to the input vector as $C_x$. If the class $C_i$ of winning node i coincides with $C_x$, i.e. $C_i = C_x$ (the network output class matches the target class), the weight is corrected toward the input vector:

$$W_{ij}^{new} = W_{ij}^{old} + \eta\,(x_j - W_{ij}^{old})$$

otherwise the weight is corrected away from the input vector:

$$W_{ij}^{new} = W_{ij}^{old} - \eta\,(x_j - W_{ij}^{old})$$

where $\eta$ is the learning rate, and $W_{ij}^{old}$, $W_{ij}^{new}$ are the connection weights between node i and node j before and after adjustment;
S35, update the learning rate with a linear decay

$$\eta(t) = \eta_0\left(1 - \frac{t}{t_{max}}\right)$$

and iterate the training: while $t < t_{max}$, set $t = t + 1$ and return to S32 to input the next sample; when the number of LVQ training iterations reaches the set maximum or the error accuracy requirement is met, training ends.
S36, LVQ output
When sample data is fed into the LVQ, the competition-layer nodes produce a winning node through the winner-takes-all competition rule: the winner outputs 1 and the losers output 0. The output node connected to the group containing the winning node outputs 1 and the others output 0, thereby classifying and identifying the sample. However, the convergence speed and prediction accuracy of the LVQ neural network are affected by the initial weights.
5. The method as claimed in claim 1, wherein in S4 a whale algorithm is used to optimize the initial weights of the LVQ neural network; the Whale Optimization Algorithm (WOA) is a recent swarm-intelligence optimization algorithm that simulates the bubble-net hunting behavior of humpback whales and finds the optimal solution through a hunting process that continuously approaches the prey via the spiral bubble-net strategy: shrinking encirclement, spiral position updating, and a random search mechanism. The whale algorithm optimizes the initial weights of the LVQ neural network as follows:
S41, initialize the artificial whale population, each individual of which encodes a weight vector W.
Set the population size $W_{Size}$ and the maximum number of iterations $T_{max}$; with the current iteration t = 0, randomly generate $W_{Size}$ candidate solutions forming the population $WP = \{X_1, X_2, \dots, X_i, \dots, X_{W_{Size}}\}$, where $X_i$ is the i-th candidate solution and stores one set of LVQ initial weights, $X_i = [W_1, W_2, \dots, W_j, \dots, W_n]$ (its dimension equal to the number n of LVQ weights), giving the whale its initial position $x_0$.
S42, identifying and encircling the prey.
When humpback whales identify prey, its position is not known a priori, so the position of the best (or near-best) individual whale in the current population is taken as the target prey position, and the other individual whales move toward it; the position update can be described as:

$$X(t+1) = X^{*}(t) - A \cdot D$$
$$D = |C \cdot X^{*}(t) - X(t)|$$

where D is the distance coefficient between the current whale and the best whale; $X^{*}(t)$ is the position vector of the best whale individual in the current population, i.e. the current best solution, and t is the current iteration number; $X(t)$ is the position vector of the current whale individual, updated in real time; A and C are coefficient variables: A is a random parameter in the interval $[-2, 2]$ that determines the switch between wandering foraging and encircling contraction, and C is a random number in the interval $[0, 2]$ that controls how strongly the reference position weighs against the current position $X(t)$. The coefficient variables A and C are given by:
$$A = 2a \cdot r_1 - a$$
$$C = 2 \cdot r_2$$

where $r_1$ and $r_2$ are random numbers in $[0, 1]$; a is the convergence factor (control parameter), given by:

$$a = 2\left(1 - \frac{t}{T_{max}}\right)$$

where t is the current iteration number and $T_{max}$ is the maximum number of iterations.
As the convergence factor a decreases linearly from 2 to 0, the search range shrinks steadily during training, which, through the coefficient A, realizes the encircling contraction around the prey and improves the convergence speed of the whale algorithm.
S43, bubble-net attacking.
Humpback whales attack prey by bubble-net foraging, which mainly comprises shrinking encirclement and spiral position updating: shrinking encirclement tightens the circle around the prey by decreasing the convergence factor a, while in the spiral update the whale, blowing bubbles, moves upward along a spiral path that gradually contracts the encirclement of the prey.
When a humpback whale attacks prey through the bubble net, shrinking encirclement and spiral position updating occur simultaneously. To simulate this, the best whale's position is taken as $X^{*}(t)$, and each non-optimal whale updates its position by either the shrinking-encirclement mechanism or the spiral mechanism, chosen at random with probability p; this balances global search and local exploitation during optimization and reduces blind spots.
This patent selects between the shrinking-encirclement mechanism and the spiral position-update mechanism with equal probability p = 0.5; the mathematical model can be expressed as:
$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5 \\ D_{p} \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \ge 0.5 \end{cases}$$

where p is a random number in $[0, 1]$; $D_{p} = |X^{*}(t) - X(t)|$ is the distance between the current whale and the prey (the current best position); $X^{*}(t)$ is the position vector of the best whale individual in the current population, $X(t)$ is the position vector of the current whale, and t is the current iteration number; b is a constant defining the shape of the logarithmic spiral; l is a random number in $[-1, 1]$, where l = −1 means the artificial whale is closest to the food and l = 1 means it is farthest.
S44, searching for prey.
Besides attacking prey with the bubble net, humpback whales also search for prey randomly, driven by the coefficient A, whose fluctuation range shrinks with a: as a decreases from 2 to 0 during the iterations, A is a random value in $[-a, a]$. When $|A| < 1$, the position found by the whale group is treated as the target prey's position, and the group closes in on the target prey to attack it.
When $|A| > 1$, the whale group instead searches away from the prey, looking for more suitable prey; this enhances the exploration ability of the algorithm and enables the WOA to search globally. The mathematical model is as follows:
D=|C·Xrand-X(t)|
X(t+1)=Xrand-A·D
where $X_{rand}$ is a randomly selected whale position vector;
s45 setting fitness function
As the model training pursues the minimization of the mean square error, the mean square error of the LVQ neural network is taken as a fitness function of the whale algorithm, and the fitness function is expressed as the following formula:
$$f = \frac{1}{N}\sum_{i=1}^{N} (y_i - y_i')^{2}$$

where $y_i$ and $y_i'$ are respectively the actual and predicted values of the i-th sample, and N is the number of training samples.
S46, iteration updating
Training an LVQ neural network by using training samples, calculating the fitness value of each individual in a whale population according to the error of each training sample, comparing fitness functions of all whale individuals and the optimal whale X (t) in a whale position updating X (t +1), selecting the whale individual with the best fitness function as the optimal whale of the next generation, updating the position of the optimal whale, and respectively endowing j parameter values of the whale to the jth weight in the LVQ neural network, wherein the fitness functions comprise the following steps:
ω_j = X_{i,j}(t), j = 1, 2, …, m
The optimal whale position vector at the final iteration is recorded as X* = (x*_1, x*_2, …, x*_m).
The components of this optimal whale position vector are assigned as the initial weights of the LVQ neural network, and the training set samples are input into the LVQ neural network for learning and training.
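The iterative scheme of S43–S46 can be sketched as one compact WOA loop, assuming the spiral constant b = 1, a linearly decreasing a, and box bounds on the weights; the best vector returned would supply the initial LVQ weights (ω_j = X*_j). The function name `woa_minimize` and all defaults are illustrative, not from the patent:

```python
import numpy as np

def woa_minimize(fitness, dim, n_whales=20, n_iter=50, bounds=(-1.0, 1.0), seed=0):
    """Compact whale optimization algorithm loop minimizing `fitness`
    (e.g. the LVQ training MSE) over `dim` weights."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_whales, dim))          # initial whale positions
    fit = np.array([fitness(x) for x in X])
    best = X[fit.argmin()].copy()
    best_fit = float(fit.min())
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)                    # a decreases from 2 to 0
        for i in range(n_whales):
            r1, r2, p = rng.random(3)
            A, C = 2 * a * r1 - a, 2 * r2
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                if abs(A) < 1:                        # encircle the best whale
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                                 # explore via a random whale
                    x_rand = X[rng.integers(n_whales)]
                    D = np.abs(C * x_rand - X[i])
                    X[i] = x_rand - A * D
            else:                                     # spiral bubble-net attack (b = 1)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)
            f = float(fitness(X[i]))
            if f < best_fit:
                best_fit, best = f, X[i].copy()
    return best, best_fit
```

Replacing the toy objective with the LVQ network's training MSE turns this into the weight-initialization step described in S46.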
6. The whale algorithm-based fraud detection method for optimizing the LVQ neural network as claimed in claim 1, wherein in S5, the test set is input into the trained LVQ neural network to output the application detection results and calculate the prediction accuracy of the model.
7. The whale algorithm-based fraud detection method for optimizing the LVQ neural network as claimed in claim 1, wherein in S5, the actual and predicted results of the training samples are compared to obtain a confusion matrix, from which the following indexes can be calculated: true positive rate TPR (True Positive Rate), false positive rate FPR (False Positive Rate), AUC (Area Under Curve) and KS (Kolmogorov-Smirnov) value, wherein the calculation formulas are as follows:
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
KS=max(TPR-FPR)
wherein True Positive (TP) means the model correctly predicts a positive class sample as the positive class; True Negative (TN) means the model correctly predicts a negative class sample as the negative class; False Positive (FP) means the model incorrectly predicts a negative class sample as the positive class; and False Negative (FN) means the model incorrectly predicts a positive class sample as the negative class. In this application, fraud samples are taken as the positive class and normal samples as the negative class.
An ROC curve (Receiver Operating Characteristic curve) is obtained by plotting TPR on the vertical axis against FPR on the horizontal axis, and the AUC value (Area Under the ROC Curve) is used as the evaluation standard for measuring model accuracy: the closer the AUC value is to 1, the better the model.
The KS value is the maximum difference between TPR and FPR and reflects the optimal distinguishing ability of the model; the threshold at which it is attained is generally taken as the optimal threshold for separating good and bad users. In general, KS > 0.2 indicates that the model has acceptable prediction accuracy.
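The confusion-matrix rates and the KS statistic above can be computed by sweeping a score threshold; this sketch assumes fraud = label 1 and that both classes are present, and the name `tpr_fpr_ks` is an illustrative choice:

```python
import numpy as np

def tpr_fpr_ks(y_true, scores, thresholds=None):
    """TPR and FPR over score thresholds, plus KS = max(TPR - FPR).

    Fraud samples are the positive class (label 1), as in the application.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    if thresholds is None:
        thresholds = np.unique(scores)
    tprs, fprs = [], []
    for thr in thresholds:
        pred = (scores >= thr).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fn = np.sum((pred == 0) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        tn = np.sum((pred == 0) & (y_true == 0))
        tprs.append(tp / (tp + fn))      # TPR = TP / (TP + FN)
        fprs.append(fp / (fp + tn))      # FPR = FP / (FP + TN)
    ks = max(t - f for t, f in zip(tprs, fprs))
    return np.array(tprs), np.array(fprs), ks
```

A perfectly separating score list yields KS = 1; integrating the (FPR, TPR) pairs would give the AUC.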
8. The whale algorithm-based fraud detection method for optimizing the LVQ neural network as claimed in claim 1, wherein in S6, the optimized LVQ neural network fraud detection model is deployed to the application platform; data of real-time applicant clients are obtained and imported into the detection model as samples to be detected so as to output fraud prediction results, realizing real-time detection of applicant clients; and data of newly identified fraudulent clients are periodically fed back into model training to realize iterative updating of the model.
CN202011536682.XA 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network Pending CN112581262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011536682.XA CN112581262A (en) 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011536682.XA CN112581262A (en) 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network

Publications (1)

Publication Number Publication Date
CN112581262A true CN112581262A (en) 2021-03-30

Family

ID=75139391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011536682.XA Pending CN112581262A (en) 2020-12-23 2020-12-23 Whale algorithm-based fraud detection method for optimizing LVQ neural network

Country Status (1)

Country Link
CN (1) CN112581262A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440495A (en) * 2013-07-31 2013-12-11 华北电力大学(保定) Method for automatically identifying hydrophobic grades of composite insulators
CN108830431A (en) * 2018-08-03 2018-11-16 广东工业大学 A kind of Electricity price forecasting solution and relevant apparatus based on whale optimization algorithm
CN110030843A (en) * 2019-04-12 2019-07-19 广西大学 Based on the heat accumulation type aluminum melting furnace parameter optimization setting method for improving whale optimization algorithm
CN110110930A (en) * 2019-05-08 2019-08-09 西南交通大学 A kind of Recognition with Recurrent Neural Network Short-Term Load Forecasting Method improving whale algorithm
CN110648017A (en) * 2019-08-30 2020-01-03 广东工业大学 Short-term impact load prediction method based on two-layer decomposition technology
CN111767815A (en) * 2020-06-22 2020-10-13 浙江省机电设计研究院有限公司 Tunnel water leakage identification method


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240067A (en) * 2021-05-14 2021-08-10 江苏科技大学 RBF neural network optimization method based on improved manta ray foraging optimization algorithm
CN113393331B (en) * 2021-06-10 2022-08-23 罗嗣扬 Database and algorithm based big data insurance accurate wind control, management, intelligent customer service and marketing system
CN113393331A (en) * 2021-06-10 2021-09-14 罗忠明 Database and algorithm based big data insurance accurate wind control, management, intelligent customer service and marketing system
CN113627071B (en) * 2021-06-26 2023-06-20 西安科技大学 Coal-fired boiler NO based on whale algorithm optimization long-short-term memory network X Soft measurement method
CN113627071A (en) * 2021-06-26 2021-11-09 西安科技大学 Coal-fired boiler NO based on whale algorithm optimization long-time memory networkXSoft measurement method
CN113591240B (en) * 2021-07-27 2023-09-05 重庆大学 Modeling method for thermal error model of tooth grinding machine based on bidirectional LSTM network
CN113591240A (en) * 2021-07-27 2021-11-02 重庆大学 Gear grinding machine thermal error model modeling method based on bidirectional LSTM network
CN113610535A (en) * 2021-07-29 2021-11-05 浙江惠瀜网络科技有限公司 Risk monitoring method and device suitable for consumption staging business process
CN115277067A (en) * 2022-06-15 2022-11-01 广州理工学院 Computer network information vulnerability detection method based on artificial fish swarm algorithm
CN115688982A (en) * 2022-10-11 2023-02-03 华能江苏综合能源服务有限公司 Building photovoltaic data completion method based on WGAN and whale optimization algorithm
CN115688982B (en) * 2022-10-11 2024-01-30 华能江苏综合能源服务有限公司 Building photovoltaic data complement method based on WGAN and whale optimization algorithm
CN115438592A (en) * 2022-11-08 2022-12-06 成都中科合迅科技有限公司 Industrial research and development design data modeling method based on system engineering
CN115660073A (en) * 2022-12-28 2023-01-31 民航成都物流技术有限公司 Intrusion detection method and system based on harmony whale optimization algorithm
CN115660073B (en) * 2022-12-28 2024-02-06 民航成都物流技术有限公司 Intrusion detection method and system based on harmony whale optimization algorithm
CN116594353A (en) * 2023-07-13 2023-08-15 湖北工业大学 Machine tool positioning error compensation modeling method and system based on CWP-BPNN
CN116594353B (en) * 2023-07-13 2023-11-07 湖北工业大学 Machine tool positioning error compensation modeling method and system based on CWP-BPNN
CN117426754A (en) * 2023-12-22 2024-01-23 山东锋士信息技术有限公司 PNN-LVQ-based feature weight self-adaptive pulse wave classification method
CN117426754B (en) * 2023-12-22 2024-04-19 山东锋士信息技术有限公司 PNN-LVQ-based feature weight self-adaptive pulse wave classification method

Similar Documents

Publication Publication Date Title
CN112581262A (en) Whale algorithm-based fraud detection method for optimizing LVQ neural network
CN112581263A (en) Credit evaluation method for optimizing generalized regression neural network based on wolf algorithm
CN105488528B (en) Neural network image classification method based on improving expert inquiry method
US11816183B2 (en) Methods and systems for mining minority-class data samples for training a neural network
CN103716204B (en) Abnormal intrusion detection ensemble learning method and apparatus based on Wiener process
CN110336768B (en) Situation prediction method based on combined hidden Markov model and genetic algorithm
CN112634018A (en) Overdue monitoring method for optimizing recurrent neural network based on ant colony algorithm
CN112581264A (en) Grasshopper algorithm-based credit risk prediction method for optimizing MLP neural network
CN111061959B (en) Group intelligent software task recommendation method based on developer characteristics
CN112348655A (en) Credit evaluation method based on AFSA-ELM
CN112215446A (en) Neural network-based unit dynamic fire risk assessment method
CN112529683A (en) Method and system for evaluating credit risk of customer based on CS-PNN
CN109768989A (en) Networks security situation assessment model based on LAHP-IGFNN
CN112529685A (en) Loan user credit rating method and system based on BAS-FNN
CN113903395A (en) BP neural network copy number variation detection method and system for improving particle swarm optimization
CN112634019A (en) Default probability prediction method for optimizing grey neural network based on bacterial foraging algorithm
CN113239638A (en) Overdue risk prediction method for optimizing multi-core support vector machine based on dragonfly algorithm
CN115115389A (en) Express customer loss prediction method based on value subdivision and integrated prediction
CN114117787A (en) Short-term wind power prediction method based on SSA (simple sequence analysis) optimization BP (back propagation) neural network
CN108737429B (en) Network intrusion detection method
CN116015967B (en) Industrial Internet intrusion detection method based on improved whale algorithm optimization DELM
CN117370766A (en) Satellite mission planning scheme evaluation method based on deep learning
CN116452904B (en) Image aesthetic quality determination method
CN112529684A (en) Customer credit assessment method and system based on FWA _ DBN
CN110487519B (en) Structural damage identification method based on ALO-INM and weighted trace norm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210330