CN112529683A - Method and system for evaluating credit risk of customer based on CS-PNN - Google Patents

Info

Publication number: CN112529683A
Authority: CN, China
Prior art keywords: pnn, layer, data, model, training
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202011351678.6A
Other languages: Chinese (zh)
Inventor: 江远强
Current assignee: Baiweijinke Shanghai Information Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Baiweijinke Shanghai Information Technology Co ltd
Application CN202011351678.6A filed by Baiweijinke Shanghai Information Technology Co ltd; published as CN112529683A; legal status pending.

Classifications

    • G06Q40/03 Credit; Loans; Processing thereof
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The invention relates to the technical field of risk control in the Internet finance industry, and in particular to a customer credit risk assessment method and system based on CS-PNN (a probabilistic neural network whose smoothing factor is optimized by the cuckoo search algorithm). Compared with BP (back-propagation) and RBF (radial basis function) neural networks, the PNN integrates Bayesian decision theory and density function estimation on the basis of radial basis functions, and has a simple network structure, few parameters to tune, short running time and no local minima. Compared with other optimization algorithms such as GA, PSO and ACO, the CS algorithm searches for the global optimum by simulating the brood-parasitic breeding behavior of cuckoos combined with Lévy-flight search; it needs few parameter settings, converges quickly, is general, robust and easy to implement, and balances local and global search efficiently. The CS-PNN model, obtained by optimizing the PNN smoothing factor with CS, has a simple network structure, fast convergence, good fault tolerance, high robustness, high classification accuracy and the ability to absorb new samples easily, and can meet the real-time credit risk assessment requirements of a loan system.

Description

Method and system for evaluating credit risk of customer based on CS-PNN
Technical Field
The invention belongs to the technical field of risk control in the Internet finance industry, and specifically provides a customer credit risk assessment method and system based on CS-PNN.
Background
With the rapid development of Internet finance, assessing the credit risk of customers has become an important research field. Credit risk assessment uses the information submitted by a loan or credit card applicant, together with information provided by third parties, to calculate the applicant's credit risk, assigns the applicant to one of several risk levels, and uses that level as the basis for approving the loan or credit card.
Credit assessment is essentially a classification problem in pattern recognition: machine learning methods classify applicants into customers of different credit grades according to characteristics such as age, gender, marital status and income. A classification model is trained on the regularities found in historical data and is then used to predict the default risk of future borrowers. The prior art mainly uses machine learning methods such as logistic regression, support vector machines and decision trees. In Internet financial credit evaluation, artificial neural networks have proved in recent years to be well-performing models. An artificial neural network is an information processing model that imitates the brain; it fits nonlinear models through a parallel network of neurons and has self-learning and associative memory capabilities.
The probabilistic neural network (PNN) is a feedforward neural network that combines radial basis function neurons with competitive neurons and fuses Bayesian decision theory and density function estimation on the basis of radial basis functions. Unlike a BP network, the PNN learning algorithm does not adjust the connection weights between neurons during training: learning depends entirely on the data samples, no lengthy training on large amounts of data is needed, and only one parameter, the smoothing factor, has to be determined. The PNN therefore has a simple network structure, fast convergence, good fault tolerance, high robustness, high classification accuracy and the ability to add new samples easily, and can meet the requirements of real-time data processing.
However, PNN performance depends strongly on the smoothing factor δ. The larger δ is, the smoother the fitted function; within a suitable range, a larger δ gives higher prediction accuracy and faster operation. Conversely, if δ is too small, many neurons are needed to follow even slow changes of the function, and network performance becomes poor. Choosing δ reasonably is therefore essential to the classification performance of the network. The prior art uses swarm intelligence optimization algorithms, such as genetic algorithms, particle swarm optimization and ant colony optimization, to optimize the smoothing factor automatically. These algorithms have limitations with respect to the nature of the target problem, parameter tuning and computation time: most of them favor either global or local search, and few balance the two. Their training tends to be slow, prone to premature convergence and to getting trapped in local optima, and poorly robust, so learning speed, learning accuracy and search efficiency are all limited.
Finding the optimal smoothing factor δ has become the bottleneck of the PNN. How to use a more suitable intelligent algorithm to optimize the PNN smoothing factor δ and apply the resulting model to customer credit risk assessment is a technical problem that practitioners in the field urgently need to solve.
Disclosure of Invention
The invention aims to provide a method and a system for evaluating customer credit risk based on CS-PNN, so as to solve the problems described in the Background.
In order to achieve the purpose, the invention provides the following technical scheme:
A method and a system for evaluating customer credit risk based on CS-PNN, the method comprising the following steps:
S1, constructing sample data: sampling customers with existing loan performance as modeling samples and collecting their credit feature data;
S2, preprocessing the collected data, normalizing the preprocessed data with the Min-Max method, and dividing it into a training set and a test set in a ratio of 7:3;
S3, on the basis of the training set, using logistic regression or a random forest to select the 10 features that most influence repayment status as inputs, and building a PNN prediction model with whether repayment is overdue as the output;
S4, optimizing the PNN smoothing factor with the CS algorithm, which takes model accuracy as its objective, obtains the optimal smoothing factor by iterative search, and passes the result to the PNN as its initial parameter to obtain the CS-PNN prediction model;
S5, predicting the test-set data with the trained CS-PNN prediction model and evaluating the predictions with the root mean square error, the average relative error and the Theil inequality coefficient, so as to verify the quality of the optimized model and obtain the final system model;
S6, deploying the offline-trained CS-PNN model to an application platform, extracting feature values from users' real-time online application data, normalizing them, feeding them into the trained probabilistic neural network, and outputting a credit risk score for the user.
Preferably, in S1, customers with existing loan performance are sampled as modeling samples and their credit feature data are collected; the credit feature data include basic personal information, event-tracking (buried-point) data on operating behavior, and third-party data.
Preferably, in S2, the neural network inputs differ greatly in order of magnitude because they are measured in different units. If the raw values were fed in directly, neuron training would saturate, so before training the data must be normalized to the same order of magnitude. The preprocessed data are normalized with the Min-Max method:

$$\hat{D}_i = \frac{D_i - D_{min}}{D_{max} - D_{min}}$$

where $\hat{D}_i$ is the normalized value, $D_{max}$ and $D_{min}$ are the maximum and minimum of the training sample set, and $D_i$ is the raw value.
The normalized samples are rearranged into an input matrix X and the corresponding output matrix Y, and are divided into a training set and a test set in a ratio of 7:3 according to application time.
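A minimal NumPy sketch of S2, assuming the raw features are already cleaned into a numeric array and each sample carries an application timestamp. It splits chronologically first and then takes $D_{min}$ and $D_{max}$ from the training set only, matching the definition above; the function names, the toy data and the guard for constant features are illustrative choices, not part of the patent.

```python
import numpy as np

def chronological_split(X, y, application_time, train_ratio=0.7):
    """Order samples by application time; the earliest 70% form the training set."""
    order = np.argsort(application_time, kind="stable")
    cut = int(round(len(order) * train_ratio))
    return X[order[:cut]], X[order[cut:]], y[order[:cut]], y[order[cut:]]

def fit_min_max(X_train):
    """Per-feature D_min and D_max, taken from the training set only."""
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_min_max(X, d_min, d_max):
    """(D_i - D_min) / (D_max - D_min); a constant feature gets span 1 to avoid division by zero."""
    span = np.where(d_max > d_min, d_max - d_min, 1.0)
    return (X - d_min) / span

# Toy data: 10 applications with two features on very different scales (age, income).
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(18, 60, 10), rng.uniform(1e3, 1e5, 10)])
y = rng.integers(0, 2, 10)
t = np.arange(10)[::-1]                     # raw rows arrive newest first
X_tr, X_te, y_tr, y_te = chronological_split(X, y, t)
d_min, d_max = fit_min_max(X_tr)
X_tr_n = apply_min_max(X_tr, d_min, d_max)
X_te_n = apply_min_max(X_te, d_min, d_max)  # may fall slightly outside [0, 1]
```

Fitting the scaling on the training set only keeps test-set statistics out of training; the same stored `d_min` and `d_max` would be reused to normalize online applications in S6.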
Preferably, in S3, the probabilistic neural network (PNN) is a feedforward neural network that combines radial basis function neurons with competitive neurons; it uses a Gaussian function as its basis function and is derived from probability density function estimation and the Bayesian classification rule. Unlike a BP network, the PNN learning algorithm does not adjust the connection weights between neurons during training; learning depends entirely on the data samples, no lengthy training on large amounts of data is required, and only one parameter, the smoothing factor, needs to be determined. The PNN is simple to operate, robust and highly parallel in structure, lends itself to parallel implementation, and can meet the requirement of processing data in real time.
A PNN is constructed with four layers: an input layer, a pattern (mode) layer, a summation layer and an output layer. The number of nodes in each layer of the PNN is determined, and the PNN model is built with the training sample data as its input. The specific steps are as follows:
Step 3-1: input layer
The first layer of the PNN, the input layer, receives the values of the training samples and passes the feature vector to the network. The number of input-layer neurons equals the sample feature dimension; the input vector is $X = (x_1, x_2, \ldots, x_p)^T$, where p is the sample dimension.
Step 3-2: pattern layer
The second layer of the PNN, the pattern layer, is connected to the input layer through connection weights. The number of pattern-layer neurons equals the total number of training samples over all classes. The pattern layer computes the match, i.e. the similarity, between the input feature vector and each pattern in the training set, and passes the resulting distance through a Gaussian function to obtain its output:

$$\Phi_i(X) = \exp\left[-\frac{(X - W_i)^T (X - W_i)}{2\delta^2}\right]$$

where X is the input feature vector, $W_i$ is the weight vector between the input layer and the i-th pattern-layer neuron (the i-th stored training sample), and δ is the smoothing factor.
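To illustrate how the smoothing factor shapes the pattern layer, the sketch below evaluates the Gaussian pattern-layer output for one input against three stored patterns at a small and a large δ. The function name `pattern_layer` and the toy vectors are hypothetical.

```python
import numpy as np

def pattern_layer(x, W, delta):
    """Outputs exp(-(x - W_i)^T (x - W_i) / (2 delta^2)), one per stored pattern W_i (row of W)."""
    sq_dist = np.sum((W - x) ** 2, axis=1)    # squared Euclidean distance to every pattern
    return np.exp(-sq_dist / (2.0 * delta ** 2))

W = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])   # three stored training vectors
x = np.array([0.0, 0.0])
narrow = pattern_layer(x, W, delta=0.5)  # small delta: essentially only the exact match responds
wide = pattern_layer(x, W, delta=5.0)    # large delta: distant patterns also contribute
```

With δ = 0.5 the pattern at distance 5 contributes about $e^{-50}$, while with δ = 5 it still contributes about $e^{-0.5} \approx 0.61$, which is the "width of the bell-shaped curve" effect described for δ.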
Step 3-3: summation layer
The third layer of the PNN, the summation layer, connects the pattern-layer neurons of each class. Each class has exactly one summation-layer neuron, which adds only the outputs of the pattern-layer neurons belonging to its class and is not connected to those of other classes; its output is proportional to the kernel-based estimate of that class's probability density. The number of summation-layer neurons equals the number of sample classes.
Step 3-4: output layer
The fourth layer of the PNN, the output layer, consists of threshold discriminators whose neurons are competitive neurons; the class with the largest posterior probability density among the estimated densities is taken as the output of the whole system. The number of output-layer neurons equals the number of classes in the training data. The layer receives the class probability density functions from the summation layer and outputs a vector $Y = (y_1, y_2, \ldots, y_c)^T$ with one component per class, where c is the number of classes. The class probability density is estimated as:

$$f(X) = \frac{1}{(2\pi)^{p/2}\,\delta^{p}\, m}\sum_{i=1}^{m}\exp\left[-\frac{(X - x_i)^T (X - x_i)}{2\delta^2}\right]$$

where f(X) is the probability density function of the class; p is the dimension of the training feature vectors; $x_i$ is the i-th training feature vector of the class; m is the number of training samples of the class; and δ is the smoothing factor. The value of δ sets the width of the bell-shaped curve centered on each sample point; it is the only parameter that needs to be tuned in the PNN and strongly influences classification accuracy.
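A compact NumPy sketch of the full four-layer PNN, under assumptions the patent does not state: equal class priors, densities computed in log space to avoid underflow for small δ, and a hypothetical `PNN` class name. `predict_proba` normalizes the class densities into posteriors, which is one possible way to produce the risk score mentioned in S6.

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network with one shared smoothing factor delta (hypothetical name)."""

    def __init__(self, delta):
        self.delta = delta

    def fit(self, X, y):
        # No weights are learned: every training row becomes one pattern-layer neuron of its class.
        self.classes_ = np.unique(y)
        self.patterns_ = [X[y == c] for c in self.classes_]
        return self

    def class_log_densities(self, X):
        """log f_k(X) for every class k (summation layer), using log-sum-exp to avoid underflow."""
        p = X.shape[1]
        out = np.empty((X.shape[0], len(self.classes_)))
        for k, P in enumerate(self.patterns_):
            sq = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)  # shape (n_query, m_k)
            a = -sq / (2.0 * self.delta ** 2)                         # log of each pattern output
            a_max = a.max(axis=1, keepdims=True)
            log_sum = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))
            # Normalization 1 / ((2*pi)^(p/2) * delta^p * m_k) from the density formula.
            out[:, k] = log_sum - np.log(len(P)) - p * np.log(self.delta) - 0.5 * p * np.log(2 * np.pi)
        return out

    def predict_proba(self, X):
        """Posterior class probabilities, assuming equal class priors."""
        L = self.class_log_densities(X)
        L = L - L.max(axis=1, keepdims=True)
        P = np.exp(L)
        return P / P.sum(axis=1, keepdims=True)

    def predict(self, X):
        # Output layer: the competitive neuron with the largest estimated density wins.
        return self.classes_[np.argmax(self.class_log_densities(X), axis=1)]

# Toy usage: two well-separated clusters, label 1 = overdue.
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
model = PNN(delta=0.2).fit(X_train, y_train)
X_new = np.array([[0.05, 0.05], [0.95, 0.95]])
pred = model.predict(X_new)
risk = model.predict_proba(X_new)[:, 1]   # probability of the "overdue" class
```

Because `fit` only stores samples, adding newly labeled customers amounts to appending rows, which reflects the "sample addition" property claimed for the PNN.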
Preferably, in S4, since the smoothing factor δ is the only parameter to be tuned in the PNN, the CS algorithm is used to find the optimal smoothing factor of the probabilistic neural network, specifically:
S41: determine the network topology of the PNN and the number of nodes in each layer from the given training samples, and set the initialization parameters of the CS algorithm: population size M, maximum number of iterations $t_{max}$, discovery probability $P_a$ and step-size factor $\alpha_0$;
S42: encode the smoothing factor δ to be optimized and randomly generate M nests $x_i^{(0)}$ ($i = 1, 2, \ldots, M$) within a specified range. Each nest encodes one candidate value of the smoothing factor, so the population corresponds to the solution set $\{\delta_1, \delta_2, \ldots, \delta_M\}$;
S43: define the fitness function and evaluate the fitness of every nest in the population with:

$$F = \frac{1}{n}\sum_{i=1}^{n}\left(y'(i) - y(i)\right)^2$$

where n is the total number of samples, and y'(i) and y(i) are the actual and expected outputs of the i-th sample. For 0/1 class labels this equals the misclassification rate, i.e. one minus the accuracy targeted in S4.
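Assuming the mean-squared-error reading of the fitness formula, a short check shows why minimizing it on 0/1 overdue labels is the same as maximizing accuracy. The helper name `fitness` is illustrative.

```python
import numpy as np

def fitness(y_pred, y_true):
    """Mean squared error between actual outputs y'(i) and expected outputs y(i)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

y_true = np.array([0, 1, 1, 0])   # expected: overdue (1) or not (0)
y_pred = np.array([0, 1, 0, 0])   # PNN class outputs for the same samples
err = fitness(y_pred, y_true)
accuracy = float(np.mean(y_pred == y_true))
```

Each wrong 0/1 prediction contributes exactly 1 to the squared error, so `err` equals `1 - accuracy`.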
S44: each cuckoo performs a Lévy flight, whose purpose is to replace worse nests with new, possibly better ones. The path and position of each nest are updated as:

$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda), \quad i = 1, 2, \ldots, M$$

where $x_i^{(t)}$ and $x_i^{(t+1)}$ are the position vectors of the i-th nest in generations t and t+1, and ⊕ denotes entry-wise multiplication. α is the step-size control quantity that determines the random search range; it is generally 0.1 and is updated during the search to narrow the search range:

$$\alpha = \alpha_0\left(x_i^{(t)} - x_{best}\right)$$

where $\alpha_0$ is a constant and $x_{best}$ is the current best nest.
L(λ) is the Lévy-flight random search path; it follows a Lévy distribution and is generated as:

$$L(\lambda) = \frac{\mu}{|v|^{1/\beta}}$$

where μ and v both follow normal distributions:

$$\mu \sim N\left(0, \sigma_\mu^2\right), \qquad v \sim N\left(0, \sigma_v^2\right)$$

$$\sigma_\mu = \left[\frac{\Gamma(1+\beta)\sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right)\beta\, 2^{(\beta-1)/2}}\right]^{1/\beta}, \qquad \sigma_v = 1$$

Γ denotes the standard gamma function and β = 1.5.
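The Lévy step of S44 can be sketched with Mantegna's method as reconstructed above, with β = 1.5. The function names and the default `alpha0=0.01` are assumptions (the text only says α is generally 0.1), and nests are stored one per row.

```python
import math
import numpy as np

def mantegna_sigma(beta=1.5):
    """sigma_mu from the Gamma-function formula; sigma_v is 1."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(shape, beta=1.5, rng=None):
    """L(lambda) = mu / |v|^(1/beta) with mu ~ N(0, sigma_mu^2), v ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.normal(0.0, mantegna_sigma(beta), shape)
    v = rng.normal(0.0, 1.0, shape)
    return mu / np.abs(v) ** (1 / beta)

def levy_flight(nests, best, alpha0=0.01, beta=1.5, rng=None):
    """x_i^(t+1) = x_i^(t) + alpha * L(lambda) entry-wise, with alpha = alpha0 * (x_i^(t) - x_best)."""
    alpha = alpha0 * (nests - best)
    return nests + alpha * levy_step(nests.shape, beta, rng)

rng = np.random.default_rng(42)
nests = np.array([[0.2], [0.5], [0.9]])   # three candidate smoothing factors, one per row
moved = levy_flight(nests, nests[0], rng=rng)
```

Because α is proportional to $x_i^{(t)} - x_{best}$, the current best nest does not move, and nests close to it take small steps.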
S45: after the position update, generate a random number r ∈ [0, 1] and compare it with the discovery probability $P_a$. If r > $P_a$, change $x_i^{(t+1)}$ randomly according to the Lévy principle, compute the fitness of the new population, compare it with that of the previous generation, keep the better nests, and record the best nest $x_{best}$; otherwise keep the nest unchanged.
S46: check the iteration count. If it is less than the maximum $t_{max}$, repeat S44 and S45 for the next iteration; otherwise stop the algorithm and decode the best nest $x_{best}$ to obtain the optimal smoothing factor $\delta_{best}$;
S47: substitute the optimized smoothing factor $\delta_{best}$ into the PNN and train the CS-PNN prediction model with the training samples.
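Putting S42 to S46 together, the following is a minimal one-dimensional cuckoo search over δ. It assumes the S45 move is the usual random walk built from the difference of two randomly chosen nests (the patent does not spell that move out). In the patent's setting, `fitness(delta)` would train a PNN with smoothing factor `delta` and return the S43 error on held-out data; here a toy quadratic with its minimum at 0.3 stands in so the sketch runs on its own. All parameter defaults are assumptions.

```python
import math
import numpy as np

def cuckoo_search(fitness, lower, upper, M=15, t_max=100, pa=0.25, alpha0=0.01, beta=1.5, seed=0):
    """Minimize fitness(delta) over [lower, upper]; returns (best delta, best fitness)."""
    rng = np.random.default_rng(seed)
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    nests = rng.uniform(lower, upper, M)            # S42: M candidate smoothing factors
    fit = np.array([fitness(d) for d in nests])     # S43: evaluate every nest
    best = nests[np.argmin(fit)]

    def greedy_replace(candidates):
        # Keep a new position only if it improves that nest's fitness, then refresh x_best.
        nonlocal best
        candidates = np.clip(candidates, lower, upper)
        new_fit = np.array([fitness(d) for d in candidates])
        improved = new_fit < fit
        nests[improved], fit[improved] = candidates[improved], new_fit[improved]
        best = nests[np.argmin(fit)]

    for _ in range(t_max):                           # S46: iterate up to t_max
        # S44: Levy flight with alpha = alpha0 * (x_i - x_best)
        step = rng.normal(0.0, sigma, M) / np.abs(rng.normal(0.0, 1.0, M)) ** (1 / beta)
        greedy_replace(nests + alpha0 * (nests - best) * step)
        # S45: nests with r > pa take a random walk built from two other nests
        r = rng.random(M)
        walk = rng.random(M) * (nests[rng.permutation(M)] - nests[rng.permutation(M)])
        greedy_replace(np.where(r > pa, nests + walk, nests))
    return best, fit.min()

best_delta, best_fit = cuckoo_search(lambda d: (d - 0.3) ** 2, lower=0.01, upper=2.0)
```

Clipping keeps every candidate inside the search range; re-evaluating unchanged nests in S45 wastes some fitness calls but keeps the sketch short.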
Preferably, in S5, to compare the prediction performance of the model with PNN models optimized by a genetic algorithm, particle swarm optimization and ant colony optimization, three indicators are used to evaluate the prediction effect: root mean square error (RMSE), average relative error (ARE) and the Theil inequality coefficient (Theil IC):

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2}$$

$$\text{ARE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - y_i'\right|}{y_i}$$

$$\text{Theil IC} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i'^2}}$$

where $y_i$ is the true value of the test sample set, $y_i'$ is the predicted value, and n is the number of samples.
RMSE and ARE measure the dispersion and the overall error of the model respectively: the smaller they are, the smaller the prediction error and the more stable and effective the model. Theil IC takes values in [0, 1]; the closer it is to 0, the smaller the error and the better the prediction performance of the model.
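The three S5 indicators translate directly into NumPy. ARE divides by $y_i$, so it is undefined when any true value is 0, as with a raw 0/1 overdue label; this sketch does not guard against that, and the function names and sample values are illustrative.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def are(y, y_hat):
    """Average relative error; undefined if any true value y_i is 0."""
    return float(np.mean(np.abs(y - y_hat) / np.abs(y)))

def theil_ic(y, y_hat):
    """Theil inequality coefficient in [0, 1]; 0 means a perfect prediction."""
    return rmse(y, y_hat) / (float(np.sqrt(np.mean(y ** 2))) + float(np.sqrt(np.mean(y_hat ** 2))))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
scores = rmse(y_true, y_pred), are(y_true, y_pred), theil_ic(y_true, y_pred)
```

Here only the last sample is off by 1, giving RMSE 0.5, ARE 0.0625 and a Theil IC of roughly 0.085.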
Preferably, in S6, the offline-trained CS-PNN model is deployed to an application platform; feature values are extracted from real-time online application data, normalized and fed into the trained CS-PNN model, which outputs a credit assessment for the user. Performance data are periodically fed back into the model for retraining, so that the model is updated online.
Preferably, a CS-PNN-based customer credit risk evaluation system is also provided, comprising the following modules:
a dataset acquisition and labeling module: obtains from the loan system back end a training data set including application, repayment, operation and third-party data;
a data preprocessing and normalization module: performs preprocessing, including data cleaning, missing-value handling, outlier handling, data transformation and data formatting, normalizes the preprocessed data, and divides it into a training set and a test set;
a PNN model construction module: determines the number of nodes in the input, pattern, summation and output layers of the probabilistic neural network, and builds the probabilistic neural network model with the training sample data as input;
a CS-PNN model construction module: obtains the optimal smoothing factor by iterative CS optimization and passes the result to the PNN as its initial parameter to obtain the CS-PNN prediction model;
a PNN training and testing module: trains the optimized PNN on the training set and validates it on the test set to obtain the prediction accuracy of the model;
a PNN prediction module: uses the trained PNN model to assess and predict the credit risk of online applicants.
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with BP and RBF neural networks, the PNN integrates Bayesian decision theory and density function estimation on the basis of radial basis functions, and has a simple network structure, few parameters to tune, short running time and no local minima.
2. Compared with other optimization algorithms such as GA, PSO and ACO, the CS algorithm searches for the global optimum by simulating the brood-parasitic breeding behavior of cuckoos combined with Lévy-flight search; it needs few parameter settings, converges quickly, is general, robust and easy to implement, and balances local and global search efficiently.
3. The CS-PNN model obtained by optimizing the PNN smoothing factor with CS has a simple network structure, fast convergence, good fault tolerance, high robustness, high classification accuracy and the ability to absorb new samples easily, and can meet the real-time credit risk assessment requirements of a loan system.
Drawings
Fig. 1 is a schematic view of the overall structure of the present invention.
Detailed Description
Example 1:
Referring to Fig. 1, the present invention provides the following technical solution:
A method and a system for evaluating customer credit risk based on CS-PNN, the method comprising the following steps:
S1, constructing sample data: sampling customers with existing loan performance as modeling samples and collecting their credit feature data;
S2, preprocessing the collected data, normalizing the preprocessed data with the Min-Max method, and dividing it into a training set and a test set in a ratio of 7:3;
S3, on the basis of the training set, using logistic regression or a random forest to select the 10 features that most influence repayment status as inputs, and building a PNN prediction model with whether repayment is overdue as the output;
S4, optimizing the PNN smoothing factor with the CS algorithm, which takes model accuracy as its objective, obtains the optimal smoothing factor by iterative search, and passes the result to the PNN as its initial parameter to obtain the CS-PNN prediction model;
S5, predicting the test-set data with the trained CS-PNN prediction model and evaluating the predictions with the root mean square error, the average relative error and the Theil inequality coefficient, so as to verify the quality of the optimized model and obtain the final system model;
S6, deploying the offline-trained CS-PNN model to an application platform, extracting feature values from users' real-time online application data, normalizing them, feeding them into the trained probabilistic neural network, and outputting a credit risk score for the user.
In S1, customers with existing loan performance are sampled as modeling samples and their credit feature data are collected; the credit feature data include basic personal information, event-tracking (buried-point) data on operating behavior, and third-party data. This arrangement facilitates the collection of user data.
In S2, the neural network inputs differ greatly in order of magnitude because they are measured in different units. If the raw values were fed in directly, neuron training would easily saturate, so before training the data must be normalized to the same order of magnitude. The preprocessed data are normalized with the Min-Max method:

$$\hat{D}_i = \frac{D_i - D_{min}}{D_{max} - D_{min}}$$

where $\hat{D}_i$ is the normalized value, $D_{max}$ and $D_{min}$ are the maximum and minimum of the training sample set, and $D_i$ is the raw value.
The normalized samples are rearranged into an input matrix X and the corresponding output matrix Y, and are divided into a training set and a test set in a ratio of 7:3 according to application time, which facilitates data processing.
In S3, the probabilistic neural network (PNN) is a feedforward neural network that combines radial basis function neurons with competitive neurons; it uses a Gaussian function as its basis function and is derived from probability density function estimation and the Bayesian classification rule. Unlike a BP network, the PNN learning algorithm does not adjust the connection weights between neurons during training; learning depends entirely on the data samples, no lengthy training on large amounts of data is required, and only the smoothing factor needs to be determined. The PNN is simple to operate, robust and highly parallel in structure, lends itself to parallel implementation, and can meet the requirement of processing data in real time.
A PNN is constructed with four layers: an input layer, a pattern (mode) layer, a summation layer and an output layer. The number of nodes in each layer of the PNN is determined, and the PNN model is built with the training sample data as its input. The specific steps are as follows:
Step 3-1: input layer
The first layer of the PNN, the input layer, receives the values of the training samples and passes the feature vector to the network. The number of input-layer neurons equals the sample feature dimension; the input vector is $X = (x_1, x_2, \ldots, x_p)^T$, where p is the sample dimension.
Step 3-2: pattern layer
The second layer of the PNN, the pattern layer, is connected to the input layer through connection weights. The number of pattern-layer neurons equals the total number of training samples over all classes. The pattern layer computes the match, i.e. the similarity, between the input feature vector and each pattern in the training set, and passes the resulting distance through a Gaussian function to obtain its output:

$$\Phi_i(X) = \exp\left[-\frac{(X - W_i)^T (X - W_i)}{2\delta^2}\right]$$

where X is the input feature vector, $W_i$ is the weight vector between the input layer and the i-th pattern-layer neuron (the i-th stored training sample), and δ is the smoothing factor.
Step 3-3: summation layer
The third layer of the PNN, the summation layer, connects the pattern-layer neurons of each class. Each class has exactly one summation-layer neuron, which adds only the outputs of the pattern-layer neurons belonging to its class and is not connected to those of other classes; its output is proportional to the kernel-based estimate of that class's probability density. The number of summation-layer neurons equals the number of sample classes.
Step 3-4: output layer
The fourth layer of the PNN, called the output layer, consists of threshold discriminators whose neurons are competitive neurons: the class with the largest posterior probability density among the estimated densities is taken as the output of the whole system. The number of output-layer neurons equals the number of classes in the training data. The layer receives the class probability density functions output by the summation layer and outputs an m-dimensional vector $Y = (y_1, y_2, \ldots, y_m)^T$. The class density is estimated as:
$$f(x) = \frac{1}{(2\pi)^{p/2}\,\delta^{p}\,m}\sum_{i=1}^{m}\exp\left[-\frac{(x - x_i)^T (x - x_i)}{2\delta^2}\right]$$
where f(x) is the probability density function, p is the dimension of the training sample feature vector, $x_i$ is a training sample feature vector, m is the number of training samples of the class, and δ is the smoothing factor. The value of δ determines the width of the bell-shaped curve centered on each sample point. It is the only parameter that needs to be tuned in the PNN and has a strong influence on classification accuracy, so defining the value of δ is the main means of controlling the accuracy of sample classification.
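The density estimate and the competitive output layer can be sketched as below. This assumes equal class priors and equal misclassification costs, under which the class with the largest estimated density also has the largest posterior probability; the function names and the toy data are hypothetical and not taken from the invention.

```python
import numpy as np

# Illustrative sketch; function names and data are hypothetical.
def class_density(x, samples, delta):
    """Parzen-window estimate f(x) for one class.

    x: shape (p,) input vector; samples: shape (m, p) training vectors x_i
    of that class; delta: smoothing factor.
    """
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    m, p = samples.shape
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    norm = (2.0 * np.pi) ** (p / 2.0) * delta ** p * m
    return np.sum(np.exp(-sq_dist / (2.0 * delta ** 2))) / norm

def pnn_predict(x, X_train, y_train, delta):
    """Output layer: return (winning class, vector of class densities)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    dens = np.array([class_density(x, X_train[y_train == c], delta) for c in classes])
    # Competitive neuron: the class with the largest estimated density wins.
    return classes[np.argmax(dens)], dens

X_train = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y_train = [0, 0, 1, 1]  # e.g. 0 = repaid on time, 1 = overdue (illustrative)
label, dens = pnn_predict([0.15, 0.15], X_train, y_train, delta=0.1)
print(label)  # 0
```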
In S4, since the smoothing factor δ is the only parameter of the PNN that needs tuning, the CS (cuckoo search) algorithm is used to find the optimal smoothing factor of the probabilistic neural network, specifically as follows:
S41: According to the given training samples, determine the network topology of the PNN and the number of nodes in each layer, and set the initialization parameters of the CS algorithm: population size M, maximum number of iterations $t_{max}$, discovery probability $P_a$, and step-size factor $\alpha_0$;
S42: Encode the smoothing factor δ to be optimized and randomly generate M nests within a specified range, $X^{(0)} = \{x_1^{(0)}, x_2^{(0)}, \ldots, x_M^{(0)}\}$. Each nest is one candidate value of the smoothing factor δ, so the population corresponds to a set of candidate solutions $\{\delta_1, \delta_2, \delta_3, \ldots, \delta_M\}$, i = 1, 2, …, M;
S43: Define the fitness function and evaluate the fitness of each nest in the population as:
$$F = \frac{1}{n}\sum_{i=1}^{n}\left(y'(i) - y(i)\right)^2$$
where n is the total number of samples, and y'(i) and y(i) are the actual output value and the expected output value of the i-th sample, respectively;
S44: Each cuckoo performs a Lévy flight, whose aim is to replace worse nests with new and possibly better ones. The path and position of each nest are updated as:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda)$$
where $x_i^{(t)}$ and $x_i^{(t+1)}$ are the position vectors of the i-th nest in generation t and generation t+1, respectively, and $\oplus$ denotes entry-wise multiplication. α is the step-size control quantity that determines the random search range and is generally taken as 0.1; the step size is updated continuously during the search to narrow the search range, according to:
$$\alpha = \alpha_0\left(x_i^{(t)} - x_{best}\right)$$
where $\alpha_0$ is a constant and $x_{best}$ is the current best nest.
$L(\lambda)$ is the Lévy flight random search path, which follows a Lévy distribution and satisfies:
$$L(\lambda) = \frac{\mu}{|v|^{1/\beta}}$$
where μ and v both follow normal distributions:
$$\mu \sim N\left(0, \sigma_\mu^2\right), \qquad v \sim N\left(0, \sigma_v^2\right)$$
$$\sigma_\mu = \left\{\frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\left[(1+\beta)/2\right]\,\beta\,2^{(\beta-1)/2}}\right\}^{1/\beta}, \qquad \sigma_v = 1$$
where Γ is the standard gamma function and β = 1.5;
S45: After the positions are updated, generate a random number r ∈ [0, 1] and compare it with the discovery probability $P_a$. If r > $P_a$, randomly change $x_i^{(t+1)}$ according to the Lévy principle, compute the fitness of the new population, compare it with that of the previous generation, keep the better values, and record the best nest $x_{best}$; otherwise, keep the nest unchanged;
S46: Check the iteration count. If it is less than the maximum number of iterations $t_{max}$, repeat S44 and S45 for the next iteration; otherwise, terminate the algorithm and take the best nest $x_{best}$ as the optimal smoothing factor $\delta_{best}$;
S47: Substitute the optimized smoothing factor $\delta_{best}$ into the PNN framework and input the training samples to train the CS-PNN prediction model, which ensures that the optimal smoothing factor $\delta_{best}$ is determined accurately.
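Steps S41 to S47 can be sketched as a compact cuckoo search over the single scalar δ. The sketch rests on assumptions the text does not fix: the fitness is the mean squared error between predicted and expected labels on a held-out validation split (for 0/1 labels this equals the error rate, so minimizing it maximizes accuracy, in line with S4); δ is searched in the illustrative range [0.01, 2.0]; and in S45 nests with r > $P_a$ are moved by a random walk between two randomly chosen nests, a rule commonly used in cuckoo search implementations rather than one confirmed by the invention. All function names and default values are hypothetical.

```python
import math
import numpy as np

# Illustrative sketch; names, bounds, and defaults are hypothetical.
def pnn_predict_batch(X, X_train, y_train, delta):
    """Classify each row of X (NumPy arrays) with a Gaussian PNN."""
    classes = np.unique(y_train)
    d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)  # (q, N)
    k = np.exp(-d2 / (2.0 * delta ** 2))
    # 1 / ((2*pi)^(p/2) * delta^p) is common to all classes and cancels in argmax.
    scores = np.stack([k[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[np.argmax(scores, axis=1)]

def fitness(delta, X_tr, y_tr, X_val, y_val):
    """S43 (assumed MSE form); lower is better."""
    y_pred = pnn_predict_batch(X_val, X_tr, y_tr, delta)
    return float(np.mean((y_pred - y_val) ** 2))

def levy_step(beta, size, rng):
    """Mantegna's algorithm: L = mu / |v|^(1/beta), mu ~ N(0, sigma_mu^2), v ~ N(0, 1)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_mu = (num / den) ** (1 / beta)
    mu = rng.normal(0.0, sigma_mu, size)
    v = rng.normal(0.0, 1.0, size)
    return mu / np.abs(v) ** (1 / beta)

def cs_optimize_delta(X_tr, y_tr, X_val, y_val, M=15, t_max=50, pa=0.25,
                      alpha0=0.1, beta=1.5, bounds=(0.01, 2.0), seed=0):
    """Cuckoo search for the PNN smoothing factor; returns (delta_best, fitness)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds

    def evaluate(deltas):
        return np.array([fitness(d, X_tr, y_tr, X_val, y_val) for d in deltas])

    nests = rng.uniform(lo, hi, M)             # S42: M random candidate deltas
    fit = evaluate(nests)                      # S43
    best = nests[np.argmin(fit)]
    for _ in range(t_max):                     # S46: stop after t_max generations
        # S44: Levy flight with alpha = alpha0 * (x_i - x_best), entry-wise product.
        step = alpha0 * (nests - best) * levy_step(beta, M, rng)
        new = np.clip(nests + step, lo, hi)
        new_fit = evaluate(new)
        better = new_fit < fit                 # greedy: keep the better nest
        nests[better], fit[better] = new[better], new_fit[better]
        # S45 (assumed rule): nests with r > P_a take a random-walk move.
        change = rng.random(M) > pa
        walk = rng.random(M) * (nests[rng.permutation(M)] - nests[rng.permutation(M)])
        trial = np.clip(nests + change * walk, lo, hi)
        trial_fit = evaluate(trial)
        better = trial_fit < fit
        nests[better], fit[better] = trial[better], trial_fit[better]
        best = nests[np.argmin(fit)]           # record x_best
    return float(best), float(fit.min())
```

S47 then corresponds to building the final PNN on the training set with the returned δ.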
In step 5, to compare the prediction performance of the model with that of PNN models optimized by a genetic algorithm, a particle swarm algorithm, and an ant colony algorithm, three indices are selected to evaluate the prediction effect of the model: root mean square error (RMSE), average relative error (ARE), and the Theil inequality coefficient (Theil IC):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$
$$\mathrm{ARE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - y'_i}{y_i}\right|$$
$$\mathrm{Theil\ IC} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i'^2}}$$
where $y_i$ is the true value of the test sample set, $y'_i$ is the predicted value, and n is the number of samples. RMSE and ARE measure the dispersion and the overall error of the model, respectively; the smaller they are, the smaller the prediction error and the more stable and effective the model. The Theil IC takes values between 0 and 1; the closer it is to 0, the smaller the error and the better the prediction performance of the model. These indices allow the prediction performance of the model to be judged intuitively.
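The three indices can be computed as follows; this is a minimal sketch with hypothetical function names. ARE divides by each true value, so it is undefined when a true value is 0 (for example, a 0/1 "not overdue" label); the sketch raises an error in that case instead of guessing a convention.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical.
def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def are(y_true, y_pred):
    """Average relative error; requires every true value to be nonzero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    if np.any(y_true == 0):
        raise ValueError("ARE is undefined when a true value is 0")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def theil_ic(y_true, y_pred):
    """Theil inequality coefficient in [0, 1]; 0 means a perfect fit."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    num = np.sqrt(np.mean((y_true - y_pred) ** 2))
    den = np.sqrt(np.mean(y_true ** 2)) + np.sqrt(np.mean(y_pred ** 2))
    return float(num / den)

y_true = [2.0, 4.0, 5.0]
y_pred = [2.0, 3.0, 6.0]
r = rmse(y_true, y_pred)      # sqrt(2/3), about 0.816
a = are(y_true, y_pred)       # (0 + 0.25 + 0.2) / 3, about 0.15
u = theil_ic(y_true, y_pred)
```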
In step 6, the offline-trained CS-PNN model is deployed to the application platform. Feature values are extracted from the user data of online real-time applications, normalized, and input into the trained probabilistic neural network, which outputs the user's credit assessment. Performance data are fed back into model training at regular intervals to update the model online, which supports continuous optimization of the model.
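Because a PNN stores training patterns instead of learning connection weights, the periodic update in step 6 can be sketched as appending newly observed performance data. The class `OnlinePNNScorer` below is hypothetical; it assumes the Min-Max bounds are frozen at their training values (so new data may fall slightly outside [0, 1]) and that δ is kept fixed between periodic CS re-optimizations.

```python
import numpy as np

class OnlinePNNScorer:
    """Hypothetical deployment wrapper for a trained PNN (step 6 sketch)."""

    def __init__(self, X_train_raw, y_train, delta):
        X = np.asarray(X_train_raw, dtype=float)
        # Keep the training min/max so online data get the same Min-Max scaling.
        self.d_min, self.d_max = X.min(axis=0), X.max(axis=0)
        self.X = self._scale(X)
        self.y = np.asarray(y_train)
        self.delta = delta

    def _scale(self, X):
        span = np.where(self.d_max > self.d_min, self.d_max - self.d_min, 1.0)
        return (X - self.d_min) / span

    def score(self, x_raw):
        """Return (predicted class, class scores) for one raw application."""
        x = self._scale(np.asarray(x_raw, dtype=float))
        k = np.exp(-np.sum((self.X - x) ** 2, axis=1) / (2 * self.delta ** 2))
        classes = np.unique(self.y)
        scores = np.array([k[self.y == c].mean() for c in classes])
        return classes[np.argmax(scores)], scores

    def add_performance_data(self, X_new_raw, y_new):
        """Periodic update: store new labeled patterns; no weights are retrained."""
        X_new = self._scale(np.asarray(X_new_raw, dtype=float))
        self.X = np.vstack([self.X, X_new])
        self.y = np.concatenate([self.y, np.asarray(y_new)])
```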
A system for assessing the credit risk of a customer based on CS-PNN is also provided, comprising the following modules:
a dataset acquisition and labeling module, used to obtain from the loan system back end a training data set comprising application, repayment, operation, and third-party data;
a data preprocessing and normalization module: data preprocessing includes data cleaning, missing-value handling, outlier handling, data transformation, and data formatting; the preprocessed data are normalized and divided into a training set and a test set;
a PNN model construction module, used to determine the numbers of nodes in the input layer, mode (hidden) layer, summation layer, and output layer of the probabilistic neural network to be established, and to establish the probabilistic neural network model with the training sample data as input;
a CS-PNN model construction module, used to iteratively optimize the probabilistic neural network with the CS algorithm to obtain the optimal smoothing factor, and to use the optimization result as the initial parameter of the PNN to obtain the CS-PNN prediction model;
a PNN training and testing module, used to train the optimized PNN on the training set and verify it on the test set to obtain the prediction accuracy of the model;
a PNN prediction module, used to apply the trained PNN model to credit risk assessment and prediction for online loan applicants, so that a customer's credit level can be predicted in advance and corresponding credit privileges can be granted.
The working process is as follows. The invention comprises the following steps:
S1: construct sample data: sample customers with existing loan performance as modeling samples and collect their credit feature data;
S2: preprocess the collected data, normalize the preprocessed data with the Min-Max method, and divide them into a training set and a test set at a ratio of 7:3;
S3: on the training set, use logistic regression or a random forest to select the 10 features that most influence repayment status as input, and build a PNN prediction model with whether repayment is overdue as output;
S4: optimize the smoothing factor of the PNN with the CS algorithm, taking model accuracy as the objective; obtain the optimal smoothing factor through iterative optimization and use the result as the initial parameter of the PNN to obtain the CS-PNN prediction model;
S5: predict the test-set data with the trained CS-PNN prediction model and evaluate the predictions with the root mean square error, the average relative error, and the Theil inequality coefficient to verify the quality of the optimized model and obtain the final system model;
S6: deploy the offline-trained CS-PNN model to the application platform, extract feature values from online real-time application user data, normalize them, input them into the CS-PNN model, and output the user's credit risk assessment.
Through S1 to S6, customer data are collected as input, and the CS-PNN model outputs a credit risk score for the customer to predict whether the customer will repay on time.
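For S3, the text allows either logistic regression or a random forest to pick the 10 most influential features. A minimal sketch of the logistic-regression option, assuming features are standardized and ranked by the absolute value of their fitted coefficients, with an unregularized model trained by batch gradient descent; the helper name `select_top_features` and its hyperparameters are hypothetical.

```python
import numpy as np

# Illustrative sketch; helper name and hyperparameters are hypothetical.
def select_top_features(X, y, k=10, lr=0.1, n_iter=2000):
    """Return indices of the k features with the largest |logistic coefficient|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)          # 1 = overdue, 0 = repaid on time
    std = X.std(axis=0)
    # Standardize so coefficient magnitudes are comparable across features.
    Z = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted overdue probability
        grad = p - y                            # gradient of the log-loss
        w -= lr * (Z.T @ grad) / len(y)
        b -= lr * grad.mean()
    order = np.argsort(-np.abs(w))
    return order[:k], w
```

The selected column indices would then define the PNN input layer, so its neuron count equals k.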
The parts of embodiment 2 that are the same as embodiment 1 are not repeated; the differences are as follows. In S1, customers with existing loan performance are sampled as modeling samples and their credit feature data are collected; the credit feature data include basic personal information, operation-behavior event-tracking (buried-point) data, and third-party data, which facilitates the collection of user data.
In S2, the input quantities of the neural network differ greatly in order of magnitude because they have different units. If they were input directly, neuron training would easily saturate, so before training the data must be normalized to the same order of magnitude. The preprocessed data are normalized with the Min-Max method:
$$\hat{D}_i = \frac{D_i - D_{min}}{D_{max} - D_{min}}$$
where $\hat{D}_i$ is the normalized data, $D_{max}$ is the maximum of the training sample set, $D_{min}$ is the minimum of the training sample set, and $D_i$ is the original data.
The normalized training samples are reorganized into an input matrix X and a corresponding output matrix Y, and are divided into a training set and a test set at a ratio of 7:3 according to application time, which facilitates data processing.
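A minimal sketch of the chronological 7:3 split and the Min-Max formula above. It assumes an array of application timestamps is available, and it computes $D_{min}$ and $D_{max}$ on the earlier 70% only, matching the formula's reference to the training sample set and keeping later-period information out of the scaling; normalizing before the split, as the order of the text suggests, is the alternative. Function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical.
def chronological_split(X, y, app_time, train_ratio=0.7):
    """Sort samples by application time and split 7:3 (earlier -> training set)."""
    order = np.argsort(app_time, kind="stable")
    X, y = np.asarray(X, float)[order], np.asarray(y)[order]
    cut = int(round(train_ratio * len(y)))
    return X[:cut], y[:cut], X[cut:], y[cut:]

def min_max_fit(X_train):
    """D_min and D_max, per feature, taken from the training sample set only."""
    return X_train.min(axis=0), X_train.max(axis=0)

def min_max_apply(X, d_min, d_max):
    """D_hat = (D - D_min) / (D_max - D_min); constant columns map to 0."""
    span = np.where(d_max > d_min, d_max - d_min, 1.0)
    return (X - d_min) / span
```

The same `d_min` and `d_max` would be reused for the test set and for online data in step 6.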
In S3, the probabilistic neural network (PNN) is a feedforward neural network that combines radial basis function neurons with competitive neurons. It uses a Gaussian function as its basis function and is derived from probability density function estimation and the Bayesian classification rule. Unlike a BP network, the PNN's learning algorithm does not adjust the connection weights between neurons during training: learning depends entirely on the data samples, no lengthy training on large amounts of data is required, and only the smoothing factor has to be determined. The PNN is simple to operate, robust, and highly parallel in structure, so it can be implemented in parallel and can meet the requirement of real-time data processing.
A PNN is constructed with an input layer, a mode layer, a summation layer, and an output layer. The numbers of nodes in the input layer, mode (hidden) layer, summation layer, and output layer of the PNN are determined, and the PNN model is established with the training sample data as input. The specific steps are as follows:
Step 3-1: input layer
The first layer of the PNN, called the input layer, receives values from the training samples and passes the feature vectors to the network. The number of neurons in the input layer equals the number of sample feature dimensions. The input vector is $X = (x_1, x_2, \ldots, x_n)^T$, where n is the sample dimension.
Step 3-2: mode layer
The second layer of the PNN, called the mode layer, is connected to the input layer through connection weights. The number of neurons in the mode layer equals the total number of training samples over all classes. The mode layer computes the matching relationship, that is, the similarity, between the input feature vector and each pattern in the training set, and passes the distance through a Gaussian function to obtain its output:
$$\Phi_i(X) = \exp\left[-\frac{(X - W_i)^T (X - W_i)}{2\delta^2}\right]$$
where X is the input feature vector, $W_i$ is the weight between the input layer and the i-th mode-layer neuron, and δ is the smoothing factor.
Step 3-3: summation layer
The third layer of the PNN, called the summation layer, connects the mode-layer neurons of each class. Each class has exactly one summation-layer neuron, which adds only the outputs of the mode-layer neurons belonging to its own class and is not connected to the mode-layer neurons of other classes; its output is proportional to the kernel-based estimate of that class's probability density. The number of neurons in the summation layer equals the number of sample classes.
Step 3-4: output layer
The fourth layer of the PNN, called the output layer, consists of threshold discriminators whose neurons are competitive neurons: the class with the largest posterior probability density among the estimated densities is taken as the output of the whole system. The number of output-layer neurons equals the number of classes in the training data. The layer receives the class probability density functions output by the summation layer and outputs an m-dimensional vector $Y = (y_1, y_2, \ldots, y_m)^T$. The class density is estimated as:
$$f(x) = \frac{1}{(2\pi)^{p/2}\,\delta^{p}\,m}\sum_{i=1}^{m}\exp\left[-\frac{(x - x_i)^T (x - x_i)}{2\delta^2}\right]$$
where f(x) is the probability density function, p is the dimension of the training sample feature vector, $x_i$ is a training sample feature vector, m is the number of training samples of the class, and δ is the smoothing factor. The value of δ determines the width of the bell-shaped curve centered on each sample point. It is the only parameter that needs to be tuned in the PNN and has a strong influence on classification accuracy, so defining the value of δ is the main means of controlling the accuracy of sample classification.
In S4, since the smoothing factor δ is the only parameter of the PNN that needs tuning, the CS (cuckoo search) algorithm is used to find the optimal smoothing factor of the probabilistic neural network, specifically as follows:
S41: According to the given training samples, determine the network topology of the PNN and the number of nodes in each layer, and set the initialization parameters of the CS algorithm: population size M, maximum number of iterations $t_{max}$, discovery probability $P_a$, and step-size factor $\alpha_0$;
S42: Encode the smoothing factor δ to be optimized and randomly generate M nests within a specified range, $X^{(0)} = \{x_1^{(0)}, x_2^{(0)}, \ldots, x_M^{(0)}\}$. Each nest is one candidate value of the smoothing factor δ, so the population corresponds to a set of candidate solutions $\{\delta_1, \delta_2, \delta_3, \ldots, \delta_M\}$, i = 1, 2, …, M;
S43: Define the fitness function and evaluate the fitness of each nest in the population as:
$$F = \frac{1}{n}\sum_{i=1}^{n}\left(y'(i) - y(i)\right)^2$$
where n is the total number of samples, and y'(i) and y(i) are the actual output value and the expected output value of the i-th sample, respectively;
S44: Each cuckoo performs a Lévy flight, whose aim is to replace worse nests with new and possibly better ones. The path and position of each nest are updated as:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda)$$
where $x_i^{(t)}$ and $x_i^{(t+1)}$ are the position vectors of the i-th nest in generation t and generation t+1, respectively, and $\oplus$ denotes entry-wise multiplication. α is the step-size control quantity that determines the random search range and is generally taken as 0.1; the step size is updated continuously during the search to narrow the search range, according to:
$$\alpha = \alpha_0\left(x_i^{(t)} - x_{best}\right)$$
where $\alpha_0$ is a constant and $x_{best}$ is the current best nest.
$L(\lambda)$ is the Lévy flight random search path, which follows a Lévy distribution and satisfies:
$$L(\lambda) = \frac{\mu}{|v|^{1/\beta}}$$
where μ and v both follow normal distributions:
$$\mu \sim N\left(0, \sigma_\mu^2\right), \qquad v \sim N\left(0, \sigma_v^2\right)$$
$$\sigma_\mu = \left\{\frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\left[(1+\beta)/2\right]\,\beta\,2^{(\beta-1)/2}}\right\}^{1/\beta}, \qquad \sigma_v = 1$$
where Γ is the standard gamma function and β = 1.5;
S45: After the positions are updated, generate a random number r ∈ [0, 1] and compare it with the discovery probability $P_a$. If r > $P_a$, randomly change $x_i^{(t+1)}$ according to the Lévy principle, compute the fitness of the new population, compare it with that of the previous generation, keep the better values, and record the best nest $x_{best}$; otherwise, keep the nest unchanged;
S46: Check the iteration count. If it is less than the maximum number of iterations $t_{max}$, repeat S44 and S45 for the next iteration; otherwise, terminate the algorithm and take the best nest $x_{best}$ as the optimal smoothing factor $\delta_{best}$;
S47: Substitute the optimized smoothing factor $\delta_{best}$ into the PNN framework and input the training samples to train the CS-PNN prediction model, which ensures that the optimal smoothing factor $\delta_{best}$ is determined accurately.
In step 5, to compare the prediction performance of the model with that of PNN models optimized by a genetic algorithm, a particle swarm algorithm, and an ant colony algorithm, three indices are selected to evaluate the prediction effect of the model: root mean square error (RMSE), average relative error (ARE), and the Theil inequality coefficient (Theil IC):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$
$$\mathrm{ARE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - y'_i}{y_i}\right|$$
$$\mathrm{Theil\ IC} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i'^2}}$$
where $y_i$ is the true value of the test sample set, $y'_i$ is the predicted value, and n is the number of samples. RMSE and ARE measure the dispersion and the overall error of the model, respectively; the smaller they are, the smaller the prediction error and the more stable and effective the model. The Theil IC takes values between 0 and 1; the closer it is to 0, the smaller the error and the better the prediction performance of the model. These indices allow the prediction performance of the model to be judged intuitively.
In step 6, the offline-trained CS-PNN model is deployed to the application platform. Feature values are extracted from the user data of online real-time applications, normalized, and input into the trained probabilistic neural network, which outputs the user's credit assessment. Performance data are fed back into model training at regular intervals to update the model online, which supports continuous optimization of the model.
A system for assessing the credit risk of a customer based on CS-PNN is also provided, comprising the following modules:
a dataset acquisition and labeling module, used to obtain from the loan system back end a training data set comprising application, repayment, operation, and third-party data;
a data preprocessing and normalization module: data preprocessing includes data cleaning, missing-value handling, outlier handling, data transformation, and data formatting; the preprocessed data are normalized and divided into a training set and a test set;
a PNN model construction module, used to determine the numbers of nodes in the input layer, mode (hidden) layer, summation layer, and output layer of the probabilistic neural network to be established, and to establish the probabilistic neural network model with the training sample data as input;
a CS-PNN model construction module, used to iteratively optimize the probabilistic neural network with the CS algorithm to obtain the optimal smoothing factor, and to use the optimization result as the initial parameter of the PNN to obtain the CS-PNN prediction model;
a PNN training and testing module, used to train the optimized PNN on the training set and verify it on the test set to obtain the prediction accuracy of the model;
a PNN prediction module, used to apply the trained PNN model to credit risk assessment and prediction for online loan applicants, so that a customer's credit level can be predicted in advance and corresponding credit privileges can be granted.
Compared with BP and RBF neural networks, the PNN integrates Bayesian decision theory and density function estimation on the basis of radial basis functions, and offers a simple network structure, few tuning parameters, fast running time, and freedom from local minima. Compared with other optimization algorithms such as GA, PSO, and ACO, the CS algorithm searches for the global optimum by combining the simulated brood-parasitic breeding behavior of cuckoos with the Lévy flight search principle; it requires few parameter settings, converges quickly, is general, robust, and easy to implement, and balances local and global search efficiently. The CS-PNN model obtained by optimizing the PNN smoothing factor with CS has a simple network structure, fast convergence, good fault tolerance, high robustness, high classification accuracy, and a strong ability to incorporate new samples, and can meet the requirement of real-time credit risk assessment in the loan system.
The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and core concepts of the invention. The foregoing is only a preferred embodiment of the present invention. It should be noted that, because textual expression is limited while the possible specific structures are objectively unlimited, those skilled in the art may make a number of modifications, refinements, or changes without departing from the principle of the invention, and the technical features described above may be combined in suitable ways; such modifications, variations, combinations, or adaptations, made within the spirit and scope of the invention as defined by the claims, may be directed to other uses and embodiments.

Claims (8)

1. A method and system for evaluating the credit risk of a customer based on CS-PNN, characterized in that the method comprises the following steps:
S1: constructing sample data: sampling customers with existing loan performance as modeling samples and collecting their credit feature data;
S2: preprocessing the collected data, normalizing the preprocessed data with the Min-Max method, and dividing them into a training set and a test set at a ratio of 7:3;
S3: on the training set, using logistic regression or a random forest to select the 10 features that most influence repayment status as input, and building a PNN prediction model with whether repayment is overdue as output;
S4: optimizing the smoothing factor of the PNN with the CS algorithm, taking model accuracy as the objective, obtaining the optimal smoothing factor through iterative optimization, and using the result as the initial parameter of the PNN to obtain the CS-PNN prediction model;
S5: predicting the test-set data with the trained CS-PNN prediction model and evaluating the predictions with the root mean square error, the average relative error, and the Theil inequality coefficient to verify the quality of the optimized model and obtain the final system model;
S6: deploying the offline-trained CS-PNN model to an application platform, extracting feature values from online real-time application user data, normalizing them, inputting the normalized values into the CS-PNN model, and outputting the user's credit risk score.
2. The method and system for assessing the credit risk of a customer based on CS-PNN according to claim 1, wherein in S1, customers with existing loan performance are sampled as modeling samples and their credit feature data are collected, the credit feature data comprising basic personal information, operation-behavior event-tracking (buried-point) data, and third-party data.
3. The method and system for assessing the credit risk of a customer based on CS-PNN according to claim 1, wherein in S2, the input quantities of the neural network have different units and differ greatly in magnitude; if they were input directly, neuron training would easily saturate, so before training the data must be normalized to the same order of magnitude, and the preprocessed data are normalized with the Min-Max method:
$$\hat{D}_i = \frac{D_i - D_{min}}{D_{max} - D_{min}}$$
where $\hat{D}_i$ is the normalized data, $D_{max}$ is the maximum of the training sample set, $D_{min}$ is the minimum of the training sample set, and $D_i$ is the original data.
The normalized training samples are reorganized into an input matrix X and a corresponding output matrix Y and, according to application time, divided into a training set and a test set at a ratio of 7:3.
4. The method and system for assessing the credit risk of a customer based on CS-PNN according to claim 1, wherein in S3, the probabilistic neural network (PNN) is a feedforward neural network that combines radial basis function neurons with competitive neurons, uses a Gaussian function as its basis function, and is derived from probability density function estimation and the Bayesian classification rule. Unlike a BP network, the PNN's learning algorithm does not adjust the connection weights between neurons during training: learning depends entirely on the data samples, no lengthy training on large amounts of data is required, and only the smoothing factor has to be determined. The PNN is simple to operate, robust, and highly parallel in structure, so it can be implemented in parallel and can meet the requirement of real-time data processing.
A PNN is constructed with an input layer, a mode layer, a summation layer, and an output layer. The numbers of nodes in the input layer, mode (hidden) layer, summation layer, and output layer of the PNN are determined, and the PNN model is established with the training sample data as input. The specific steps are as follows:
Step 3-1: input layer
The first layer of the PNN, called the input layer, receives values from the training samples and passes the feature vectors to the network. The number of neurons in the input layer equals the number of sample feature dimensions. The input vector is $X = (x_1, x_2, \ldots, x_n)^T$, where n is the sample dimension.
Step 3-2: mode layer
The second layer of the PNN, called the mode layer, is connected to the input layer through connection weights. The number of neurons in the mode layer equals the total number of training samples over all classes. The mode layer computes the matching relationship, that is, the similarity, between the input feature vector and each pattern in the training set, and passes the distance through a Gaussian function to obtain its output:
$$\Phi_i(X) = \exp\left[-\frac{(X - W_i)^T (X - W_i)}{2\delta^2}\right]$$
where X is the input feature vector, $W_i$ is the weight between the input layer and the i-th mode-layer neuron, and δ is the smoothing factor.
Step 3-3: summation layer
The third layer of the PNN, called the summation layer, connects the mode-layer neurons of each class. Each class has exactly one summation-layer neuron, which adds only the outputs of the mode-layer neurons belonging to its own class and is not connected to the mode-layer neurons of other classes; its output is proportional to the kernel-based estimate of that class's probability density. The number of neurons in the summation layer equals the number of sample classes.
Step 3-4: output layer
The fourth layer of the PNN, called the output layer, consists of threshold discriminators whose neurons are competitive neurons: the class with the largest posterior probability density among the estimated densities is taken as the output of the whole system. The number of output-layer neurons equals the number of classes in the training data. The layer receives the class probability density functions output by the summation layer and outputs an m-dimensional vector $Y = (y_1, y_2, \ldots, y_m)^T$. The class density is estimated as:
$$f(x) = \frac{1}{(2\pi)^{p/2}\,\delta^{p}\,m}\sum_{i=1}^{m}\exp\left[-\frac{(x - x_i)^T (x - x_i)}{2\delta^2}\right]$$
where f(x) is the probability density function, p is the dimension of the training sample feature vector, $x_i$ is a training sample feature vector, m is the number of training samples of the class, and δ is the smoothing factor. The value of δ determines the width of the bell-shaped curve centered on each sample point; it is the only parameter that needs to be tuned in the PNN and has a strong influence on classification accuracy.
5. The method and system for assessing the credit risk of a customer based on CS-PNN according to claim 1, wherein in S4, since the smoothing factor δ is the only parameter of the PNN that needs tuning, the CS algorithm is used to find the optimal smoothing factor of the probabilistic neural network, specifically comprising:
S41: according to the given training samples, determining the network topology of the PNN and the number of nodes in each layer, and setting the initialization parameters of the CS algorithm: population size M, maximum number of iterations $t_{max}$, discovery probability $P_a$, and step-size factor $\alpha_0$;
S42: encoding the smoothing factor δ to be optimized and randomly generating M nests within a specified range, $X^{(0)} = \{x_1^{(0)}, x_2^{(0)}, \ldots, x_M^{(0)}\}$, each nest being one candidate value of the smoothing factor δ, so that the population corresponds to a set of candidate solutions $\{\delta_1, \delta_2, \delta_3, \ldots, \delta_M\}$, i = 1, 2, …, M;
S43: defining the fitness function and evaluating the fitness of each nest in the population as:
$$F = \frac{1}{n}\sum_{i=1}^{n}\left(y'(i) - y(i)\right)^2$$
where n is the total number of samples, and y'(i) and y(i) are the actual output value and the expected output value of the i-th sample, respectively;
S44: performing a Lévy flight for each cuckoo, whose aim is to replace worse nests with new and possibly better ones, the path and position of each nest being updated as:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda)$$
where $x_i^{(t)}$ and $x_i^{(t+1)}$ are the position vectors of the i-th nest in generation t and generation t+1, respectively, and $\oplus$ denotes entry-wise multiplication; α is the step-size control quantity that determines the random search range and is generally taken as 0.1; the step size is updated continuously during the search to narrow the search range, according to:
$$\alpha = \alpha_0\left(x_i^{(t)} - x_{best}\right)$$
where $\alpha_0$ is a constant and $x_{best}$ is the current best nest;
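$L(\lambda)$ is the random search path of the Lévy flight; it follows a Lévy distribution and satisfies:
$$L(\lambda) = \frac{\mu}{|v|^{1/\beta}}$$
where μ and v both follow normal distributions:
$$\mu \sim N\left(0, \sigma_\mu^2\right), \qquad v \sim N\left(0, \sigma_v^2\right)$$
$$\sigma_\mu = \left[\frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right)\,\beta\,2^{(\beta-1)/2}}\right]^{1/\beta}, \qquad \sigma_v = 1$$
where Γ is the standard gamma function, β = 1.5, and $x_{best}$ again denotes the current best-quality nest.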
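A Lévy step of this kind is usually drawn with Mantegna's algorithm. The sketch below assumes that reading: it computes σ_μ for β = 1.5, draws L(λ) = μ/|v|^(1/β), and applies the S44 update with α = α_0 (x_i − x_best) entry-wise. The function names and the default α_0 = 0.01 are assumptions for illustration.

```python
import math
import numpy as np

def mantegna_sigma_u(beta=1.5):
    """sigma_mu from Mantegna's algorithm; sigma_v is fixed at 1."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(shape, beta=1.5, rng=None):
    """Draw L(lambda) = mu / |v|^(1/beta) with mu ~ N(0, sigma_mu^2), v ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.normal(0.0, mantegna_sigma_u(beta), size=shape)
    v = rng.normal(0.0, 1.0, size=shape)
    return mu / np.abs(v) ** (1 / beta)

def levy_update(nests, best, alpha0=0.01, beta=1.5, rng=None):
    """x_i(t+1) = x_i(t) + alpha * L(lambda), with alpha = alpha0 * (x_i(t) - x_best), entry-wise."""
    nests = np.asarray(nests, dtype=float)
    alpha = alpha0 * (nests - best)
    return nests + alpha * levy_step(nests.shape, beta, rng)

rng = np.random.default_rng(0)
print(round(mantegna_sigma_u(1.5), 4))  # 0.6966
new = levy_update(np.array([0.3, 0.8, 1.5]), best=0.8, rng=rng)
print(new[1])  # 0.8: the best nest has zero step size
```

Because α is proportional to the distance from x_best, nests far from the current best take long exploratory jumps while nests near it barely move.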
S45: after the positions are updated, generate a random number $r \in [0, 1]$ and compare it with the discovery probability $P_a$. If $r > P_a$, change the nest position $x_i^{(t+1)}$ randomly according to the Lévy principle, compute the fitness of the new population, compare it with the fitness of the previous generation, keep the better value, and record the best nest $x_{best}$; otherwise, keep the nest unchanged.
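S45 does not spell out the random-change formula in text, so the sketch below assumes the biased random walk often used in cuckoo search, x_i + r·(x_j − x_k) with two randomly permuted partner nests j and k, applied where r > P_a as S45 states, followed by greedy selection. The function `discovery_step`, the bounds, and the toy fitness are hypothetical.

```python
import numpy as np

def discovery_step(nests, fitness, fit_fn, pa=0.25, bounds=(0.01, 2.0), rng=None):
    """Hypothetical S45 step for a 1-D population of smoothing factors (shape (M,)).

    Nests whose random draw r exceeds pa take a biased random-walk step
    (an assumed rule); a change is kept only if it improves the fitness."""
    rng = np.random.default_rng() if rng is None else rng
    nests = np.asarray(nests, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    m = nests.shape[0]
    change = rng.random(m) > pa                      # condition as written in S45
    j, k = rng.permutation(m), rng.permutation(m)    # two random partner nests
    cand = nests + change * rng.random(m) * (nests[j] - nests[k])
    cand = np.clip(cand, *bounds)
    cand_fit = np.array([fit_fn(d) for d in cand])
    better = cand_fit < fitness                      # keep the better value per nest
    new_nests = np.where(better, cand, nests)
    new_fit = np.where(better, cand_fit, fitness)
    best = new_nests[np.argmin(new_fit)]             # record x_best
    return new_nests, new_fit, best

rng = np.random.default_rng(0)
fit_fn = lambda d: (d - 0.5) ** 2   # toy fitness with its minimum at delta = 0.5
nests = rng.uniform(0.01, 2.0, 8)
fit = np.array([fit_fn(d) for d in nests])
nests2, fit2, best = discovery_step(nests, fit, fit_fn, rng=rng)
```

The greedy comparison guarantees that no nest's fitness gets worse from one generation to the next, matching the "keep the better value" wording of S45.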
S46: check the iteration count: if it is less than the maximum number of iterations $t_{max}$, repeat S44 and S45 and continue with the next iteration; otherwise, terminate the algorithm, and the best nest $x_{best}$ gives the optimal smoothing factor $\delta_{best}$.
S47: substitute the optimized smoothing factor $\delta_{best}$ into the PNN framework and input the training samples to train the CS-PNN prediction model.
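Putting S41 to S47 together, a compact end-to-end sketch might look like the following. It rests on several assumptions that the claims do not fix: fitness is the mean squared error of PNN predictions on a held-out validation split (scoring on the training samples themselves would reward δ → 0), the S45 random change is a biased random walk, δ is searched in [0.01, 2.0], and M, t_max, P_a and α_0 take illustrative values. All function names are hypothetical.

```python
import math
import numpy as np

def pnn_predict(X_tr, y_tr, X, delta):
    """PNN output layer: class with the largest mean Gaussian kernel value."""
    classes = np.unique(y_tr)
    d2 = ((X[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * delta ** 2))
    # The (2*pi)^(p/2) * delta^p normalizer is shared by all classes, so it is omitted
    scores = np.stack([k[:, y_tr == c].mean(axis=1) for c in classes], axis=1)
    return classes[np.argmax(scores, axis=1)]

def fitness(delta, X_tr, y_tr, X_val, y_val):
    """Assumed fitness: MSE between predicted and expected labels on a validation split."""
    return float(np.mean((pnn_predict(X_tr, y_tr, X_val, delta) - y_val) ** 2))

def levy(size, rng, beta=1.5):
    """Mantegna's algorithm: mu / |v|^(1/beta)."""
    sigma_mu = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
                / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0.0, sigma_mu, size) / np.abs(rng.normal(0.0, 1.0, size)) ** (1 / beta)

def cs_optimize_delta(X_tr, y_tr, X_val, y_val, M=15, t_max=50, pa=0.25,
                      alpha0=0.01, bounds=(0.01, 2.0), seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    f = lambda d: fitness(d, X_tr, y_tr, X_val, y_val)

    def greedy(nests, fit, cand):
        # Keep each candidate only if it improves on the nest it replaces
        cand = np.clip(cand, lo, hi)
        cand_fit = np.array([f(d) for d in cand])
        better = cand_fit < fit
        return np.where(better, cand, nests), np.where(better, cand_fit, fit)

    nests = rng.uniform(lo, hi, M)                      # S42: M random nests
    fit = np.array([f(d) for d in nests])               # S43: initial fitness
    for _ in range(t_max):                              # S46: iteration limit
        best = nests[np.argmin(fit)]
        # S44: Levy flight with step size alpha = alpha0 * (x_i - x_best)
        nests, fit = greedy(nests, fit, nests + alpha0 * (nests - best) * levy(M, rng))
        # S45: nests with r > pa take a random-walk step (assumed rule)
        change = rng.random(M) > pa
        walk = rng.random(M) * (nests[rng.permutation(M)] - nests[rng.permutation(M)])
        nests, fit = greedy(nests, fit, nests + change * walk)
    i = int(np.argmin(fit))
    return float(nests[i]), float(fit[i])               # delta_best and its fitness

# Toy data: two well-separated classes, split into training and validation sets
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(4.0, 0.5, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
idx = rng.permutation(80)
tr, va = idx[:60], idx[60:]
delta_best, best_fit = cs_optimize_delta(X[tr], y[tr], X[va], y[va], M=10, t_max=20)
print(delta_best, best_fit)
```

For S47, `delta_best` would then be passed to `pnn_predict` together with the training samples to score new applicants. On real credit data, a cross-validated fitness is likely more stable than a single validation split.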
6. The method and system for assessing credit risk of a customer based on CS-PNN as claimed in claim 1, wherein: in step 5, to compare the model's predictive performance with PNN models optimized by a genetic algorithm, particle swarm optimization, and ant colony optimization, three indices, namely the root mean square error (RMSE), the average relative error (ARE), and the Theil inequality coefficient (Theil IC), are selected to evaluate the prediction effect of the model, as follows:
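$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$
$$ARE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - y'_i}{y_i}\right|$$
$$Theil\ IC = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}{y'_i}^2}}$$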
where $y_i$ is the true value of the test sample set, $y'_i$ is the predicted value, and n is the number of samples. RMSE measures the dispersion of the errors and ARE measures the overall error; the smaller their values, the smaller the prediction error of the model and the more stable and effective the model. Theil IC takes values in [0, 1]; the closer it is to 0, the smaller the error and the better the predictive performance of the model.
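The three evaluation indices can be computed as below, assuming the standard definitions of RMSE, ARE and Theil's inequality coefficient. ARE divides by $y_i$, so it is undefined when a true value is 0 (for example, a 0/1 default label); this sketch raises an error in that case rather than silently returning infinity. Function names are illustrative.

```python
import numpy as np

def _pair(y, y_hat):
    return np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)

def rmse(y, y_hat):
    y, y_hat = _pair(y, y_hat)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def are(y, y_hat):
    y, y_hat = _pair(y, y_hat)
    if np.any(y == 0):
        raise ValueError("ARE is undefined when a true value is 0")
    return float(np.mean(np.abs((y - y_hat) / y)))

def theil_ic(y, y_hat):
    y, y_hat = _pair(y, y_hat)
    denom = np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(y_hat ** 2))
    return rmse(y, y_hat) / float(denom)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred), are(y_true, y_pred))  # 0.5 0.0625
print(round(theil_ic(y_true, y_pred), 4))  # 0.0853
```

Theil IC normalizes the RMSE by the magnitudes of both series, which is what keeps it inside [0, 1] and makes it comparable across models with different output scales.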
7. The method and system for assessing credit risk of a customer based on CS-PNN as claimed in claim 1, wherein in step 6, the offline-trained CS-PNN model is deployed to the application platform; the data of users applying online are extracted in real time, their feature values are normalized and input into the trained CS-PNN model, and the user's credit assessment is output; data from users with observed repayment performance are periodically fed back into model training so that the model is updated online.
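Claim 7 requires online applicants' feature values to be normalized in the same way as the training data. Assuming min-max scaling (the claims do not name the normalization method), one approach is to store the training minima and ranges offline and reuse them at scoring time, clipping unseen extremes into [0, 1]. The class name `MinMaxNormalizer` is hypothetical.

```python
import numpy as np

class MinMaxNormalizer:
    """Hypothetical min-max scaler fitted offline and reused for online applications."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        # Constant features get span 1 to avoid division by zero
        self.span_ = np.where(span > 0, span, 1.0)
        return self

    def transform(self, X, clip=True):
        Z = (np.asarray(X, dtype=float) - self.min_) / self.span_
        # Online values outside the training range are clipped into [0, 1]
        return np.clip(Z, 0.0, 1.0) if clip else Z

scaler = MinMaxNormalizer().fit([[0.0, 10.0], [10.0, 10.0]])
Z = scaler.transform([[5.0, 10.0], [20.0, 10.0]])
print(Z)
```

The fitted minima and ranges would be persisted with the model and refreshed whenever the model is retrained on new performance data, so that offline training and online scoring see identically scaled inputs.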
8. The method and system for assessing credit risk of a customer based on CS-PNN as claimed in claim 1, wherein there is also provided a system for assessing credit risk of a customer based on CS-PNN, comprising the following modules:
a dataset acquisition and labeling module: used to obtain, from the back end of the loan system, a training dataset comprising application, repayment, operation, and third-party data;
a data preprocessing and normalization module: the preprocessing comprises data cleaning, missing-value handling, outlier handling, data transformation, and data formatting; the preprocessed data are normalized and divided into a training set and a test set;
a PNN model construction module: determines the number of nodes in the input, hidden, summation, and output layers of the probabilistic neural network to be established, and takes the training sample data as input to build the probabilistic neural network model;
a CS-PNN model construction module: iteratively optimizes the probabilistic neural network with the CS algorithm to obtain the optimal smoothing factor, and outputs the optimization result as the initial parameter of the PNN to obtain the CS-PNN prediction model;
a PNN training and testing module: trains the optimized PNN with the training set and validates it with the test set to obtain the prediction accuracy of the model;
a PNN prediction module: uses the trained PNN model to assess and predict the credit risk of online applicants.
CN202011351678.6A 2020-11-27 2020-11-27 Method and system for evaluating credit risk of customer based on CS-PNN Pending CN112529683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011351678.6A CN112529683A (en) 2020-11-27 2020-11-27 Method and system for evaluating credit risk of customer based on CS-PNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011351678.6A CN112529683A (en) 2020-11-27 2020-11-27 Method and system for evaluating credit risk of customer based on CS-PNN

Publications (1)

Publication Number Publication Date
CN112529683A true CN112529683A (en) 2021-03-19

Family

ID=74993825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011351678.6A Pending CN112529683A (en) 2020-11-27 2020-11-27 Method and system for evaluating credit risk of customer based on CS-PNN

Country Status (1)

Country Link
CN (1) CN112529683A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255235A (en) * 2021-06-28 2021-08-13 中国人民解放军国防科技大学 Approximate modeling method, device, equipment and medium for complex structure of aircraft
CN113793214A (en) * 2021-09-27 2021-12-14 武汉众邦银行股份有限公司 Control and management method and device for solving credit granting risk of small and micro enterprises
CN113888096A (en) * 2021-10-28 2022-01-04 江南大学 Goods management method and system for logistics park
CN116385141A (en) * 2023-03-23 2023-07-04 杭州青橄榄网络技术有限公司 Overdraft limit management method based on user identity
CN116579842A (en) * 2023-07-13 2023-08-11 南开大学 Credit data analysis method and system based on user behavior data
CN117437063A (en) * 2023-12-11 2024-01-23 交通银行股份有限公司湖南省分行 Financial risk prediction method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327357A (en) * 2016-08-17 2017-01-11 深圳先进技术研究院 Load identification method based on improved probabilistic neural network
CN106384122A (en) * 2016-09-05 2017-02-08 江苏科技大学 Device fault mode identification method based on improved CS-LSSVM
CN107292453A (en) * 2017-07-24 2017-10-24 国网江苏省电力公司电力科学研究院 A kind of short-term wind power prediction method based on integrated empirical mode decomposition Yu depth belief network
CN108089099A (en) * 2017-12-18 2018-05-29 广东电网有限责任公司佛山供电局 The diagnostic method of distribution network failure based on depth confidence network
CN108596212A (en) * 2018-03-29 2018-09-28 红河学院 Based on the Diagnosis Method of Transformer Faults for improving cuckoo chess game optimization neural network
CN109547431A (en) * 2018-11-19 2019-03-29 国网河南省电力公司信息通信公司 A kind of network security situation evaluating method based on CS and improved BP
CN109637121A (en) * 2018-06-05 2019-04-16 南京理工大学 A kind of road traffic congestion prediction technique in short-term based on CS-SVR algorithm
CN110244216A (en) * 2019-07-01 2019-09-17 桂林电子科技大学 Analog-circuit fault diagnosis method based on cloud model optimization PNN
CN110909802A (en) * 2019-11-26 2020-03-24 西安邮电大学 Improved PSO (particle swarm optimization) based fault classification method for optimizing PNN (portable network) smoothing factor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZIHAO XIE, XIAOHUI YANG: "Fault Diagnosis in Industrial Chemical Processes using Optimal Probabilistic Neural Network", THE CANADIAN JOURNAL OF CHEMICAL ENGINEERING *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255235A (en) * 2021-06-28 2021-08-13 中国人民解放军国防科技大学 Approximate modeling method, device, equipment and medium for complex structure of aircraft
CN113255235B (en) * 2021-06-28 2021-09-24 中国人民解放军国防科技大学 Approximate modeling method, device, equipment and medium for complex structure of aircraft
CN113793214A (en) * 2021-09-27 2021-12-14 武汉众邦银行股份有限公司 Control and management method and device for solving credit granting risk of small and micro enterprises
CN113888096A (en) * 2021-10-28 2022-01-04 江南大学 Goods management method and system for logistics park
CN116385141A (en) * 2023-03-23 2023-07-04 杭州青橄榄网络技术有限公司 Overdraft limit management method based on user identity
CN116385141B (en) * 2023-03-23 2023-11-03 杭州青橄榄网络技术有限公司 Overdraft limit management method based on user identity
CN116579842A (en) * 2023-07-13 2023-08-11 南开大学 Credit data analysis method and system based on user behavior data
CN116579842B (en) * 2023-07-13 2023-10-03 南开大学 Credit data analysis method and system based on user behavior data
CN117437063A (en) * 2023-12-11 2024-01-23 交通银行股份有限公司湖南省分行 Financial risk prediction method and system

Similar Documents

Publication Publication Date Title
CN112529683A (en) Method and system for evaluating credit risk of customer based on CS-PNN
WO2022121289A1 (en) Methods and systems for mining minority-class data samples for training neural network
CN112581263A (en) Credit evaluation method for optimizing generalized regression neural network based on wolf algorithm
CN112037012A (en) Internet financial credit evaluation method based on PSO-BP neural network
CN111105104A (en) Short-term power load prediction method based on similar day and RBF neural network
CN112308288A (en) Particle swarm optimization LSSVM-based default user probability prediction method
CN115688024B (en) Network abnormal user prediction method based on user content characteristics and behavior characteristics
CN112529685A (en) Loan user credit rating method and system based on BAS-FNN
CN112967088A (en) Marketing activity prediction model structure and prediction method based on knowledge distillation
CN113344615A (en) Marketing activity prediction method based on GBDT and DL fusion model
CN112037011A (en) Credit scoring method based on FOA-RBF neural network
CN112581264A (en) Grasshopper algorithm-based credit risk prediction method for optimizing MLP neural network
CN112348655A (en) Credit evaluation method based on AFSA-ELM
CN115115389A (en) Express customer loss prediction method based on value subdivision and integrated prediction
CN115600729A (en) Grid load prediction method considering multiple attributes
CN114742564A (en) False reviewer group detection method fusing complex relationships
CN114021612A (en) Novel personal credit assessment method and system
Urgun et al. Composite power system reliability evaluation using importance sampling and convolutional neural networks
Abdelaziz et al. Convolutional Neural Network With Genetic Algorithm for Predicting Energy Consumption in Public Buildings
CN110109005B (en) Analog circuit fault testing method based on sequential testing
CN112529684A (en) Customer credit assessment method and system based on FWA _ DBN
CN116956160A (en) Data classification prediction method based on self-adaptive tree species algorithm
Gao et al. Establishment of economic forecasting model of high-tech industry based on genetic optimization neural network
CN115689001A (en) Short-term load prediction method based on pattern matching
CN115423091A (en) Conditional antagonistic neural network training method, scene generation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210319