CN117557361B - User credit risk assessment method and system based on data analysis - Google Patents

Publication number
CN117557361B
Authority
CN
China
Prior art keywords
data
risk assessment
time sequence
attribute data
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311502563.6A
Other languages
Chinese (zh)
Other versions
CN117557361A (en)
Inventor
熊刚
黄�俊
周烈华
彭忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weichuang Software Wuhan Co ltd
Original Assignee
Weichuang Software Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weichuang Software Wuhan Co ltd
Priority to CN202311502563.6A
Publication of CN117557361A
Application granted
Publication of CN117557361B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a user credit risk assessment method and system based on data analysis. The method comprises the following steps: obtaining basic information and historical credit behavior data of different users, performing preprocessing and index screening, and constructing a data set; building a risk assessment model based on a multi-branch deep learning network, training the risk assessment model on the data set, and optimizing the hyperparameters of the multi-branch deep learning network with an improved gannet optimization algorithm; and carrying out credit risk assessment on the user based on the trained risk assessment model. The multi-branch deep learning network learns the features of the non-time sequence attribute data and the features of the time sequence attribute data of each sample separately, and its hyperparameters are optimized by the improved gannet optimization algorithm, so that the trained risk assessment model can improve the accuracy of user credit risk assessment.

Description

User credit risk assessment method and system based on data analysis
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a user credit risk assessment method and system based on data analysis.
Background
User credit risk assessment is an important issue in the internet financial industry, which involves assessing and predicting the credit status of users, thereby helping financial institutions to formulate appropriate credit policies and risk management measures. Traditional user credit risk assessment methods mainly rely on limited data sources such as personal credit reports, financial statements and the like, and the data often cannot fully reflect the real credit status of the user.
With the rapid development and popularity of the internet, a large amount of user data is continuously generated and accumulated. Such data includes personal information of the user, behavior tracks, consumption records, etc., which are rich in characteristic information and behavior patterns. If the data can be analyzed and modeled through big data technology and machine learning algorithm, potential credit rules and modes can be mined from a large amount of user data, and the credit status of the user can be estimated and predicted more comprehensively and accurately.
The invention patent with publication number CN113362167A discloses a credit risk assessment method based on a class-boundary resampling ensemble learning model, in which a plurality of classifiers are combined by the Bagging ensemble learning algorithm to construct a credit risk assessment model, and prediction accuracy is improved by addressing data imbalance. However, different machine learning and deep learning models differ in feature extraction capability and evaluation effect, so designing a prediction model with strong feature extraction capability and good evaluation effect is a primary task of credit risk assessment based on big data. In addition, the prediction performance and generalization capability of such models depend heavily on their hyperparameters; improper hyperparameter selection may cause underfitting, overfitting, unstable performance, or excessive training time. Optimizing the model's hyperparameters is therefore also important for improving the accuracy of user credit risk assessment.
Disclosure of Invention
In view of the above, the present invention provides a user credit risk assessment method and system based on data analysis, which address the problem that the accuracy of user credit risk assessment needs to be improved.
The invention discloses a user credit risk assessment method based on data analysis, which comprises the following steps:
Basic information and historical credit behavior data of different users are obtained, preprocessing and index screening are carried out, non-time sequence attribute data and time sequence attribute data of the different users are respectively extracted to form samples, and a data set is constructed;
Constructing a risk assessment model based on a multi-branch deep learning network, wherein each branch of the multi-branch deep learning network is used for extracting the characteristics of non-time sequence attribute data and the characteristics of time sequence attribute data of a sample respectively;
Training the risk assessment model through the data set, and optimizing the hyperparameters of the multi-branch deep learning network by adopting an improved gannet optimization algorithm;
and carrying out credit risk assessment on the user based on the trained risk assessment model.
On the basis of the technical scheme, preferably, the basic information comprises names, ages, sexes, marital status, educational backgrounds, occupation types, working units and working years;
the historical credit behavior data includes financial status and credit records of the user over a period of time;
the financial conditions include income, expense, deposit and liability of the user;
the credit records include credit card usage records, loan records, overdue records, fraud records, and public records.
On the basis of the above technical solution, preferably, the preprocessing and feature screening are performed, non-time sequence attribute data and time sequence attribute data of different users are respectively extracted to form samples, and the construction of the data set specifically includes:
Performing data cleaning, missing value processing and positive and negative sample division on basic information and historical credit behavior data of different users;
Performing feature screening on the basic information and the historical credit behavior data by a correlation analysis and principal component analysis method;
Carrying out quantization processing on the screened basic information to form non-time sequence attribute data of the sample;
According to the time stamp of the historical credit behavior data, carrying out data alignment and combination on the screened historical credit behavior data to form time sequence attribute data of a sample;
And normalizing the non-time sequence attribute data and the time sequence attribute data of the sample, and adding the non-time sequence attribute data and the time sequence attribute data into the data set.
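The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names (`ts`, `income`, `overdue`), the education vocabulary, the zero default for fields missing at a time stamp, and the choice of min-max scaling are all assumptions made for the example.

```python
from collections import defaultdict

def quantize_category(value, vocabulary):
    """Map a categorical value to its integer index in a fixed vocabulary."""
    return vocabulary.index(value)

def align_by_timestamp(records, fields):
    """Merge records that share a time stamp into one row per time stamp,
    sorted by time. Fields missing at a time stamp default to 0.0 (assumed)."""
    merged = defaultdict(dict)
    for rec in records:
        merged[rec["ts"]].update({k: v for k, v in rec.items() if k != "ts"})
    return [[merged[ts].get(f, 0.0) for f in fields] for ts in sorted(merged)]

def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical user data
education_levels = ["high_school", "bachelor", "master"]
static_row = [quantize_category("bachelor", education_levels), 35]

records = [
    {"ts": "2023-02", "income": 5200.0},
    {"ts": "2023-01", "income": 5000.0},
    {"ts": "2023-01", "overdue": 1.0},
]
series = align_by_timestamp(records, ["income", "overdue"])
income_scaled = min_max_normalize([row[0] for row in series])
```

Here `static_row` plays the role of the non-time sequence attribute data and `series` the time sequence attribute data of one sample.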
On the basis of the technical scheme, preferably, the risk assessment model based on the multi-branch deep learning network comprises an input layer, a first branch network, a second branch network, a feature fusion layer, a fully connected layer and an output layer;
The input layer is used for reading the non-time sequence attribute data and the time sequence attribute data of the sample separately and converting their format;
The first branch network is connected to the input layer and the feature fusion layer and is used for extracting features of the non-time sequence attribute data of the sample; the first branch network adopts a convolutional neural network;
The second branch network is connected to the input layer and the feature fusion layer and is used for extracting features of the time sequence attribute data of the sample; the second branch network adopts a Reformer neural network;
The feature fusion layer is used for fusing the features extracted by the first branch network and the second branch network;
The feature fusion layer, the fully connected layer and the output layer are connected in sequence.
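The data flow through the two branches and the fusion layer can be sketched in plain Python. This is only a structural illustration under stated assumptions: the temporal branch is replaced by a simple per-feature mean (a stand-in, not a Reformer), fusion is assumed to be concatenation, and all weights and inputs are made-up values.

```python
import math

def conv1d_relu(x, kernel):
    """First branch stand-in: valid 1-D convolution followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)))
            for i in range(len(x) - k + 1)]

def temporal_summary(series):
    """Second branch stand-in (NOT a Reformer): mean of each feature over time."""
    steps = len(series)
    return [sum(row[c] for row in series) / steps for c in range(len(series[0]))]

def fuse(static_features, temporal_features):
    """Feature fusion by concatenation (an assumed fusion rule)."""
    return static_features + temporal_features

def dense_sigmoid(features, weights, bias):
    """Fully connected layer with one output unit and a sigmoid activation."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

static_x = [0.2, 0.5, 0.1, 0.9]            # non-time sequence attributes
series_x = [[0.1, 0.0], [0.3, 1.0]]        # time sequence attributes, 2 steps
static_f = conv1d_relu(static_x, [1.0, -1.0])
temporal_f = temporal_summary(series_x)
fused = fuse(static_f, temporal_f)
risk = dense_sigmoid(fused, [0.5] * len(fused), 0.0)
```

The point of the sketch is the wiring: each branch sees only its own input, and the fully connected layer sees the fused vector.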
On the basis of the above technical solution, preferably, optimizing the hyperparameters of the multi-branch deep learning network by adopting the improved gannet optimization algorithm specifically includes:
Initializing the positions of the gannet population by chaotic mapping, and setting the maximum number of iterations T;
Calculating the fitness value of each gannet individual, taking the minimum mean square error of the deep learning network's prediction as the fitness function, and storing the current optimal individual position;
Selecting whether to enter the search stage or the development stage with a random probability p;
In the search stage, introducing a refractive factor to evaluate the depth of the prey in the water, and searching for the prey in a U-shaped or V-shaped diving mode according to that depth; performing mirror image processing on the current optimal individual position, and updating the optimal individual position;
In the development stage, evaluating the capturing capability of each gannet individual according to its fitness value, judging from the capturing capability whether to capture the prey or walk randomly, and updating the position of the gannet individual;
Calculating the fitness value of each gannet individual;
Judging whether the iteration termination condition is met; if so, outputting the position of the optimal individual as the hyperparameters of the deep learning network; if not, continuing the position updates of the search stage and the development stage until the iteration termination condition is met.
On the basis of the above technical solution, preferably, in the search stage, introducing the refractive factor to evaluate the depth of the prey in the water and searching for the prey in a U-shaped or V-shaped diving mode according to that depth specifically includes:
Calculating the average value $f'$ of the fitness values of all gannet individuals;
Calculating the refractive factor $n = \left(f(X_i(t)) - f(X_b(t))\right)/\varepsilon$, where $\varepsilon$ is a preset fitness threshold and $f(\cdot)$ is the fitness function; $X_i(t)$ and $X_b(t)$ are the position of individual $i$ and the position of the optimal individual at the $t$-th iteration, $i = 1, 2, \dots, N$, and $N$ is the population size;
Estimating the depth of the prey in the water from the refractive factor as $h = n \cdot f(X_b(t))$;
If $h \geq f'$, searching for the prey in the U-shaped diving mode; otherwise, searching for the prey in the V-shaped diving mode.
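The dive-mode decision above can be worked through with a small example. The fitness values and the threshold below are hypothetical; the function name `choose_dive_mode` is introduced only for illustration.

```python
def choose_dive_mode(fitness, best_index, i, epsilon):
    """Return (n, h, mode) for individual i given all fitness values."""
    f_best = fitness[best_index]
    n = (fitness[i] - f_best) / epsilon      # refractive factor
    h = n * f_best                           # estimated depth of the prey
    f_mean = sum(fitness) / len(fitness)     # average fitness f'
    mode = "U" if h >= f_mean else "V"
    return n, h, mode

# Hypothetical mean-square-error values; lower is better
fitness = [0.40, 0.10, 0.25]
best = min(range(len(fitness)), key=fitness.__getitem__)
n, h, mode = choose_dive_mode(fitness, best, 0, epsilon=0.05)
```

With these numbers, individual 0 has $n \approx 6$ and $h \approx 0.6$, which exceeds the average fitness of 0.25, so it searches in the U-shaped mode. The best individual itself has $n = 0$ and therefore always falls into the V-shaped mode.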
On the basis of the above technical solution, preferably, the formula for mirroring the current optimal individual position and updating the optimal individual position is as follows:
Wherein $U$ and $L$ are the upper limit and the lower limit of the search space respectively, $X_b(t)$ is the optimal individual position before mirror image processing, and $X_b'(t)$ is the mirror image position obtained by mirror image processing of $X_b(t)$;
the remaining symbol in the formula denotes the updated optimal individual position.
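Since the formula itself is not reproduced in this text, the following is only a sketch of one common way to realize such a mirror step. It assumes the reflection $X_b'(t) = U + L - X_b(t)$ about the centre of the search space and a greedy choice between the original and the mirrored position; both are assumptions, not the patent's stated formula. The test function `shifted_sphere` is hypothetical.

```python
def mirror_update(best, lower, upper, fitness_fn):
    """Reflect `best` about the centre of [lower, upper] and keep the fitter point.
    Reflection rule and greedy selection are assumptions made for this sketch."""
    mirrored = [u + l - x for x, l, u in zip(best, lower, upper)]
    return mirrored if fitness_fn(mirrored) < fitness_fn(best) else best

def shifted_sphere(x):
    """Hypothetical fitness with its minimum at (3, 3); lower is better."""
    return sum((v - 3.0) ** 2 for v in x)

updated = mirror_update([0.5, 1.0], [0.0, 0.0], [4.0, 4.0], shifted_sphere)
unchanged = mirror_update([3.0, 3.0], [0.0, 0.0], [4.0, 4.0], shifted_sphere)
```

In the first call the mirrored point `[3.5, 3.0]` is closer to the minimum and replaces the original; in the second call the original is already optimal and is kept.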
On the basis of the above technical solution, preferably, in the development stage, evaluating the capturing capability of each gannet individual according to its fitness value, judging from the capturing capability whether to capture the prey or walk randomly, and updating the position of the gannet individual specifically includes:
Evaluating the capturing capability $C = f(X_i)$ from the fitness value of the gannet individual; if the capturing condition is satisfied, the individual moves toward the current optimal individual to capture the prey; otherwise, it performs a random walk using the Lévy flight strategy. The position updating formula is as follows:
Wherein $X_i(t)$ and $X_i(t+1)$ are the positions of individual $i$ at the $t$-th and $(t+1)$-th iterations, $w$ is the adaptive weight, and $c_1$ and $c_2$ are the preset maximum and minimum values of the weight, respectively.
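The Lévy flight random walk mentioned above is commonly generated with Mantegna's algorithm. The sketch below shows that generic technique; the exponent `beta=1.5`, the step `scale`, and the clipping to the search bounds are assumptions, and the patent's own position updating formula (not reproduced in this text) may combine the step differently.

```python
import math
import random

def levy_step(beta=1.5, rng=random):
    """One Levy-distributed step length via Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def random_walk(position, lower, upper, scale=0.01, rng=random):
    """Perturb each coordinate by a scaled Levy step, then clip to the bounds.
    The scaling by the range width is an assumption of this sketch."""
    moved = [x + scale * (u - l) * levy_step(rng=rng)
             for x, l, u in zip(position, lower, upper)]
    return [min(max(x, l), u) for x, l, u in zip(moved, lower, upper)]

rng = random.Random(0)  # fixed seed so the example is repeatable
new_pos = random_walk([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], rng=rng)
```

Most Lévy steps are small, with occasional long jumps, which is why the strategy is used to escape local optima.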
In a second aspect of the present invention, a system for credit risk assessment of a user based on data analysis is disclosed, the system comprising:
the data set construction module: the method comprises the steps of acquiring basic information and historical credit behavior data of different users, preprocessing and index screening, respectively extracting non-time sequence attribute data and time sequence attribute data of the different users to form samples, and constructing a data set;
Model building module: the risk assessment model based on the multi-branch deep learning network is built, and each branch of the multi-branch deep learning network is used for extracting the characteristics of non-time sequence attribute data and the characteristics of time sequence attribute data of a sample respectively;
Model training module: the risk assessment model is trained through the data set, and the hyperparameters of the risk assessment model are optimized by adopting an improved gannet optimization algorithm;
risk assessment module: for performing a user credit risk assessment based on the trained risk assessment model.
In a third aspect of the invention, a computer-readable storage medium is disclosed, storing computer instructions that cause a computer to implement the method according to the first aspect of the invention.
Compared with the prior art, the invention has the following beneficial effects:
1) According to the invention, basic information and historical credit behavior data of different users are obtained, non-time sequence attribute data and time sequence attribute data are extracted to form samples, the characteristics of the non-time sequence attribute data and the characteristics of the time sequence attribute data of the samples are respectively learned through a multi-branch deep learning network, the credit risk assessment of the users is carried out by training a risk assessment model, the credit risk assessment can be carried out by combining the non-time sequence attribute and the time sequence attribute in the data of the users, and the accuracy of the credit risk assessment is improved.
2) In the multi-branch deep learning network, a convolutional neural network extracts the features of the user's basic information, a Reformer neural network mines the time sequence patterns in the user's historical credit behavior, and the features of the time sequence attribute data are fused with the features of the non-time sequence attribute data. The memory-efficient design of the Reformer neural network speeds up the learning of time sequence patterns and, at the same time, reduces the number of hyperparameters to be optimized.
3) The hyperparameters of the multi-branch deep learning network are optimized by an improved gannet optimization algorithm. The positions of the gannet population are initialized by chaotic mapping to obtain uniformly distributed initial values, and a refractive factor is introduced in the search stage to evaluate the depth of the prey in the water, so that the search mode is determined quickly and search efficiency is improved.
4) In the search stage, the improved gannet optimization algorithm performs mirror image processing on the current optimal individual position to add disturbance and update the optimal individual position in time, thereby avoiding being trapped in a local optimum during the search stage.
5) In the development stage of the improved gannet optimization algorithm, the capturing capability of each gannet individual is evaluated according to its fitness value and classified by combining the optimal individual positions before and after updating. This alleviates deficiencies in judging the capturing capability, enables rapid decisions and dynamic weight adjustment, reduces the randomness of prey capture, lowers the complexity of the algorithm, and accelerates its convergence.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a user credit risk assessment method based on data analysis according to the present invention;
Fig. 2 is a schematic diagram of a multi-branch deep learning network structure according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will clearly and fully describe the technical aspects of the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
Referring to fig. 1, the method for evaluating credit risk of a user based on data analysis of the present invention includes:
S1, acquiring basic information and historical credit behavior data of different users, preprocessing and index screening, and constructing a data set.
S11, acquiring basic information and historical credit behavior data of the user through big data.
The user's basic information and historical credit behavior are a very important part of credit risk assessment and can help institutions or individuals make more accurate credit decisions.
The basic information of the user generally includes name, age, sex, marital status, educational background, occupation type, work unit, years of work, etc. This information reflects the user's stability and reliability, income stability, career development potential, and financial stability.
The historical credit behavior data includes financial status and credit records of the user over a period of time.
The financial conditions include income, expense, deposit and liability conditions of the user, etc., which reflect the consumer's consumption habits and payment capabilities.
Credit records include credit card usage records, loan records, overdue records, fraud records, public records, and the like. Fraud records such as suspected credit card theft, false applications, etc., may reflect bad credit behavior and credit risk for the user. Public records include a user's court decision records, bankruptcy records, tax violation records, etc., which may reflect the user's legal credit status and financial stability.
The purpose of collecting this data is to obtain relevant information about the user and thus evaluate the user's credit risk. When data is collected, relevant laws and regulations must be complied with, and the privacy and data security of users must be protected. Data preprocessing is also needed to ensure the quality and accuracy of the data.
S12, data preprocessing and index screening are carried out, non-time sequence attribute data and time sequence attribute data are extracted to form a sample, and a data set is constructed.
Firstly, data cleaning, missing value processing and positive and negative sample division are carried out on the basic information and historical credit behavior data of different users. Users with good credit are positive samples; all others are negative samples.
Then, feature screening is performed on the basic information and the historical credit behavior data through correlation analysis and principal component analysis.
The basic information of the user has non-time sequence attributes, while the historical credit behavior data has time sequence attributes, so the invention processes the two separately and extracts the non-time sequence attribute data and the time sequence attribute data of different users. Specifically, the screened basic information is quantized to form the non-time sequence attribute data of the sample. The screened historical credit behavior data is then aligned and combined according to its time stamps to form the time sequence attribute data of the sample.
Finally, the non-time sequence attribute data and the time sequence attribute data of the sample are normalized and standardized, and combined with the sample label to form the sample, so as to construct the data set.
The invention respectively extracts non-time sequence attribute data and time sequence attribute data in the user data to construct a data set of a risk assessment model.
S2, building a risk assessment model based on the multi-branch deep learning network.
Fig. 2 is a schematic diagram of the multi-branch deep learning network. The risk assessment model based on the multi-branch deep learning network comprises an input layer, a first branch network, a second branch network, a feature fusion layer, a fully connected layer and an output layer.
The input layer is used for reading the non-time sequence attribute data and the time sequence attribute data of the sample separately and performing format conversion.
The first branch network is connected to the input layer and the feature fusion layer and is used for extracting features of the non-time sequence attribute data of the sample. The first branch network may employ a convolutional neural network.
The second branch network is connected to the input layer and the feature fusion layer and is used for extracting features of the time sequence attribute data of the sample. The second branch network may employ a Reformer neural network.
The feature fusion layer is used for fusing the features extracted by the first branch network and the second branch network.
The feature fusion layer, the fully connected layer and the output layer are connected in sequence.
The multi-branch deep learning network has 2 branches: the first branch network extracts the features of the non-time sequence attribute data of the samples, and the second branch network extracts the features of the time sequence attribute data of the samples. The invention extracts the features of the time sequence attribute data with a Reformer neural network, which uses locality-sensitive hashing (LSH) to reduce the complexity of processing long sequences and reversible residual layers to use the available memory more efficiently, so that time sequence features in long sequence data can be extracted well. Combining these time sequence features with the non-time sequence features extracted by the convolutional neural network can improve the accuracy of the risk assessment model.
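To illustrate the idea behind locality-sensitive hashing, the sketch below implements the angular hashing scheme usually associated with the Reformer: a vector is projected onto random directions and assigned to the bucket with the largest value among the projections and their negations. The dimensions, seed, and vectors are hypothetical, and this is not a Reformer implementation, only the bucketing step.

```python
import random

def lsh_bucket(vector, projections):
    """Angular LSH: index of the largest entry in [xR, -xR]."""
    scores = [sum(v * r for v, r in zip(vector, row)) for row in projections]
    scores = scores + [-s for s in scores]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(42)
dim, half_buckets = 4, 4
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(half_buckets)]

a = [1.0, 0.2, -0.3, 0.5]
b = [2.0, 0.4, -0.6, 1.0]      # same direction as a, twice the length
c = [-1.0, -0.2, 0.3, -0.5]    # opposite direction to a

bucket_a = lsh_bucket(a, projections)
bucket_b = lsh_bucket(b, projections)
bucket_c = lsh_bucket(c, projections)
```

Vectors pointing in the same direction always land in the same bucket, so attention can be restricted to items within a bucket instead of comparing every pair of positions in a long sequence.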
S3, training the risk assessment model through the data set, and optimizing the hyperparameters of the risk assessment model by adopting an improved gannet optimization algorithm.
The data set is divided into a training set and a testing set, and the parameters of the risk assessment model are trained. Because the hyperparameters of the multi-branch deep learning network have a great influence on the performance of the risk assessment model, they are optimized by the improved gannet optimization algorithm during training.
Optimizing the hyperparameters of the risk assessment model by adopting the improved gannet optimization algorithm specifically comprises the following steps:
S31, initializing the positions of the gannet population by adopting chaotic mapping, and setting the maximum number of iterations T.
The hyperparameters to be optimized are combined into the individual vector of a gannet, a boundary range $[L, U]$ is set for the hyperparameters, and the positions $\{X_i \mid i = 1, 2, \dots, N\}$ of the gannet population are initialized within this range, where $N$ is the population size. The original gannet optimization algorithm generates the initial population by random initialization, which cannot guarantee a uniform distribution and affects the subsequent optimization efficiency. The invention therefore initializes the positions of the gannet population with the Tent chaotic map, a piecewise linear map with good randomness and ergodicity, which improves the global search capability and hence the performance of the algorithm.
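A Tent-map initialization of this kind might look like the sketch below. The map parameter `a=0.7`, the seed value `x0=0.37`, and the two example hyperparameters (learning rate and number of hidden units) are assumptions chosen for illustration; the text does not specify them. An asymmetric parameter is used because the symmetric map with `a=0.5` degenerates quickly in binary floating point.

```python
def tent_sequence(x0, length, a=0.7):
    """Iterate the tent map x -> x/a if x < a else (1 - x)/(1 - a)."""
    seq, x = [], x0
    for _ in range(length):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        seq.append(x)
    return seq

def init_population(n_individuals, lower, upper, x0=0.37):
    """Map one chaotic sequence onto the hyperparameter box [lower, upper]."""
    dim = len(lower)
    chaos = tent_sequence(x0, n_individuals * dim)
    return [[lower[d] + chaos[i * dim + d] * (upper[d] - lower[d])
             for d in range(dim)]
            for i in range(n_individuals)]

# Hypothetical ranges: learning rate in [1e-4, 1e-1], hidden units in [16, 256]
lower, upper = [1e-4, 16.0], [1e-1, 256.0]
population = init_population(5, lower, upper)
```

Each row of `population` is one individual vector of candidate hyperparameters inside the boundary range.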
S32, calculating the fitness value of each gannet individual by taking the minimum mean square error of the multi-branch deep learning network's prediction as the fitness function, and storing the current optimal individual position.
The optimization target of the multi-branch deep learning network is the minimum mean square error of prediction, so this error is used as the fitness function; the fitness value of each gannet individual is calculated, the individuals are sorted by fitness, and the current optimal individual position is stored.
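The fitness evaluation can be sketched as follows. In practice each evaluation would train the network with the candidate hyperparameters and measure the error on held-out data; here a hypothetical one-parameter stand-in (`evaluate`, with a decision threshold as the only hyperparameter) replaces that step so the example stays self-contained.

```python
def mean_squared_error(predicted, actual):
    """Mean of squared differences between predictions and labels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rank_population(population, evaluate):
    """Return (fitness list, index of the best individual); lower is better."""
    fitness = [evaluate(ind) for ind in population]
    best = min(range(len(fitness)), key=fitness.__getitem__)
    return fitness, best

# Hypothetical stand-in for training and validating the network
labels = [1.0, 0.0, 1.0]
scores = [0.9, 0.2, 0.6]

def evaluate(individual):
    threshold = individual[0]
    predictions = [1.0 if s >= threshold else 0.0 for s in scores]
    return mean_squared_error(predictions, labels)

fitness, best = rank_population([[0.95], [0.5], [0.1]], evaluate)
```

The individual with threshold 0.5 classifies all three toy samples correctly, so it has fitness 0 and is stored as the current optimal individual.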
S33, selecting whether to enter the search stage or the development stage according to a random probability p.
A random number p is generated in [0,1]; if p is greater than 0.5, the search stage is entered, otherwise the development stage is entered.
S34, in the search stage, introducing a refractive factor to evaluate the depth of the prey in the water, and searching for the prey in a U-shaped or V-shaped diving mode according to that depth.
The gannet optimization algorithm is meant to determine the search mode by judging the depth of the prey in the water. However, the original gannet optimization algorithm selects the search mode by random probability and does not actually evaluate the depth, so this random selection converges slowly and reduces the optimization speed. Because of light refraction, the observed position of the prey in the water differs from its actual position. Based on this principle, the invention takes the observed position of the prey as a local optimal solution and the actual position as the global optimal solution, introduces a refractive factor to estimate the depth of the prey in the water, and performs a diving search to approach the prey's actual position.
Let X_i(t) and X_b(t) be the position of individual i and the position of the optimal individual at the t-th iteration, respectively. Calculate the refraction factor n:
Wherein ε is a preset fitness threshold and f(·) is the fitness function.
Estimating the depth h of the prey in the water based on the refraction factor:
h = n × f(X_b(t)).
Calculating the average value f' of the fitness values of all gannet individuals;
If h ≥ f', the prey is searched for in the U-shaped diving mode; otherwise, the prey is searched for in the V-shaped diving mode. Specifically, the formula for searching for the prey is:
Wherein X_i(t) and X_i(t+1) are the positions of individual i at the t-th and (t+1)-th iterations, respectively; A = a(2r_1 − 1), a = 2t_1 cos(2πr_2), B = b(2r_3 − 1), b = 2t_1 V(2πr_4); V(·) denotes the V-shaped function; u_1 ∈ [−a, a], v_1 ∈ [−b, b]; r_1, r_2, r_3 and r_4 are random numbers in (0, 1); and X_b(t) and X_r(t) are the optimal individual position and the position of one randomly selected individual at the t-th iteration, respectively.
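The choice of diving mode and the coefficients a, A, b and B defined above can be sketched as follows. Several parts are assumptions rather than content of this text: the refraction factor n is passed in as an argument because its formula is not reproduced here; the V-shaped function follows the definition in the published gannet optimization algorithm; and t_1 is assumed to decay as 1 − t/T. The function names are hypothetical.

```python
import math
import random


def v_shape(x):
    """V-shaped function on [0, 2*pi] (assumed form, taken from the published GOA)."""
    return 1.0 - x / math.pi if x <= math.pi else x / math.pi - 1.0


def choose_dive_mode(n, best_fitness, fitness_values):
    """Return "U" if the estimated depth h = n * f(X_b) reaches the mean fitness f', else "V"."""
    h = n * best_fitness
    mean_fitness = sum(fitness_values) / len(fitness_values)
    return "U" if h >= mean_fitness else "V"


def dive_coefficients(t, max_iter, rng=random):
    """Compute a, A (U-shaped dive) and b, B (V-shaped dive) for iteration t."""
    t1 = 1.0 - t / max_iter  # assumption: t_1 shrinks linearly over the iterations
    r1, r2, r3, r4 = (rng.random() for _ in range(4))
    a = 2.0 * t1 * math.cos(2.0 * math.pi * r2)
    b = 2.0 * t1 * v_shape(2.0 * math.pi * r4)
    A = a * (2.0 * r1 - 1.0)
    B = b * (2.0 * r3 - 1.0)
    return a, A, b, B


mode = choose_dive_mode(n=2.0, best_fitness=0.5, fitness_values=[0.5, 1.0, 1.2])
a, A, b, B = dive_coefficients(t=10, max_iter=100, rng=random.Random(0))
```

With n = 2.0 the estimated depth is h = 1.0, which exceeds the mean fitness of 0.9, so the U-shaped dive is selected in this toy call.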
S35, performing mirror processing on the current optimal individual position, and updating the optimal individual position.
To balance local search and global search, the invention performs mirror processing on the current optimal individual position and compares the fitness of the position before and after mirroring in order to update the optimal individual position.
The formula of the mirror processing is:
Wherein U and L are the upper and lower limits of the search space, respectively, X_b(t) is the optimal individual position before mirror processing, and X_b'(t) is the mirror position obtained by mirroring X_b(t);
The formula for updating the optimal individual position is:
where the left-hand side is the updated optimal individual position.
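The mirror step of S35 can be sketched as follows. The mirror formula is not reproduced in this text, so the sketch assumes the common reflection about the centre of the search space, X_b'(t) = U + L − X_b(t); this is an assumption, not the formula of the invention. The greedy comparison follows the description above: the position with the lower fitness is kept. The function names are hypothetical.

```python
def mirror_position(best, lower, upper):
    """Reflect a position about the centre of [lower, upper] (assumed mirror formula)."""
    return [u + l - x for x, l, u in zip(best, lower, upper)]


def update_best(best, lower, upper, fitness_fn):
    """Keep whichever of the original and mirrored positions has the lower fitness."""
    mirrored = mirror_position(best, lower, upper)
    return mirrored if fitness_fn(mirrored) < fitness_fn(best) else best


# Toy fitness with its minimum at (9, 2): the mirrored position wins here.
new_best = update_best(
    [1.0, 8.0], [0.0, 0.0], [10.0, 10.0],
    lambda x: (x[0] - 9.0) ** 2 + (x[1] - 2.0) ** 2,
)
```

Because the mirrored point lies on the opposite side of the search space, this check gives the best individual a chance to leave a local optimum at the cost of one extra fitness evaluation.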
S36, in the development stage, evaluating the capturing capacity of each gannet individual from its fitness value, deciding according to that capacity whether the individual captures the prey or performs a random walk, and updating the position of the gannet individual.
The invention evaluates the capturing capacity C = f(X_i) from the fitness value of the gannet individual. If the capture condition on C is met, the individual moves toward the current optimal individual to capture the prey; otherwise, it performs a random walk using the Lévy flight strategy. The position update formula is:
Wherein w is the dynamic adaptive weight, and c_1 and c_2 are the preset maximum and minimum weights, respectively. The weight w changes with the number of iterations, taking smaller values in the early stage and larger values in the later stage, which balances the algorithm's need for large-scale global search in the early stage against small-range development in the later stage.
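The two ingredients of the development stage, a Lévy-flight random walk and an iteration-dependent weight, can be sketched as follows. The position update formula is not reproduced in this text, so everything here is an assumption: the Lévy step uses Mantegna's algorithm, the weight is assumed to grow linearly from c_2 to c_1, and the names `levy_step`, `adaptive_weight` and `levy_random_walk`, as well as the default values, are hypothetical.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed Lévy step using Mantegna's algorithm."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero in the (very unlikely) degenerate draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def adaptive_weight(t, max_iter, c1=0.9, c2=0.4):
    """Weight rising from the minimum c2 to the maximum c1 (assumed linear schedule)."""
    return c2 + (c1 - c2) * t / max_iter


def levy_random_walk(position, lower, upper, scale=0.01, rng=random):
    """Perturb each coordinate by a Lévy step and clip the result to [lower, upper]."""
    moved = []
    for x, l, u in zip(position, lower, upper):
        candidate = x + scale * (u - l) * levy_step(rng=rng)
        moved.append(min(max(candidate, l), u))
    return moved


w = adaptive_weight(t=50, max_iter=100)
moved = levy_random_walk([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], rng=random.Random(1))
```

The heavy tail of the Lévy distribution produces mostly short moves with occasional long jumps, which is why it is a common choice for the random-walk branch of a metaheuristic.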
S37, calculating the fitness value of each gannet individual, and storing the current optimal individual position X_b(t).
S38, judging whether the iteration termination condition is met, namely whether f(X_b(t)) < ε; if so, outputting the position of the optimal individual as the hyperparameters of the deep learning network; if not, returning to step S33 and continuing to update the positions in the search stage and the development stage until the iteration termination condition is met.
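The control flow of steps S33 to S38 can be sketched as the loop below. It is a skeleton under stated assumptions: `search_step` and `develop_step` are hypothetical callables standing in for the position updates of the search and development stages, and the cap at `max_iter` iterations is an assumption that combines the threshold test of S38 with the maximum iteration count T set in S31.

```python
import random


def optimise(population, fitness_fn, search_step, develop_step,
             epsilon, max_iter, rng=random):
    """Iterate until f(X_b) < epsilon or max_iter iterations have been run."""
    best = min(population, key=fitness_fn)
    for t in range(1, max_iter + 1):
        if fitness_fn(best) < epsilon:  # S38: termination condition
            break
        # S33: choose the stage for this iteration from a random probability p.
        step = search_step if rng.random() > 0.5 else develop_step
        population = [step(x, best, t) for x in population]
        # S37: re-evaluate and keep the best position found so far.
        candidate = min(population, key=fitness_fn)
        if fitness_fn(candidate) < fitness_fn(best):
            best = candidate
    return best


# Toy run: both stages simply halve every coordinate.
halve = lambda x, best, t: [0.5 * v for v in x]
sphere = lambda x: sum(v * v for v in x)
best = optimise([[4.0, 4.0], [2.0, -2.0]], sphere, halve, halve,
                epsilon=1e-3, max_iter=50, rng=random.Random(0))
```

In a real run, each call to `fitness_fn` involves training the network, so caching fitness values per individual would avoid the repeated evaluations this skeleton performs.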
S4, performing credit risk assessment on the user based on the trained risk assessment model.
The trained risk assessment model gives the best prediction performance obtained by the optimization. The basic information and historical credit behavior data of the user to be assessed are acquired and processed in the same way as in step S1, and the result is input into the trained risk assessment model to obtain the credit risk assessment result of the user.
Corresponding to the method embodiment, the invention also provides a user credit risk assessment system based on data analysis, which comprises:
a data set construction module, configured to acquire basic information and historical credit behavior data of different users, perform preprocessing and index screening, extract the non-time sequence attribute data and time sequence attribute data of the different users respectively to form samples, and construct a data set;
a model building module, configured to build a risk assessment model based on a multi-branch deep learning network, wherein the branches of the multi-branch deep learning network are used to extract the features of the non-time sequence attribute data and the features of the time sequence attribute data of a sample, respectively;
a model training module, configured to train the risk assessment model with the data set and to optimize the hyperparameters of the risk assessment model with the improved gannet optimization algorithm;
a risk assessment module, configured to perform user credit risk assessment based on the trained risk assessment model.
The system embodiment corresponds one-to-one to the method embodiment; for details omitted from the brief description of the system embodiment, refer to the method embodiment.
The invention also discloses an electronic device, comprising: at least one processor, at least one memory, a communication interface, and a bus; the processor, the memory and the communication interface complete communication with each other through the bus; the memory stores program instructions executable by the processor that the processor invokes to implement the aforementioned methods of the present invention.
The invention also discloses a computer readable storage medium storing computer instructions for causing a computer to implement all or part of the steps of the methods of the embodiments of the invention. The storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be distributed over a plurality of network elements. One of ordinary skill in the art may select some or all of the modules according to actual needs, without inventive effort, to achieve the objectives of the present embodiment.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (8)

1. A method for credit risk assessment of a user based on data analysis, the method comprising:
Basic information and historical credit behavior data of different users are obtained, preprocessing and index screening are carried out, non-time sequence attribute data and time sequence attribute data of the different users are respectively extracted to form samples, and a data set is constructed;
Constructing a risk assessment model based on a multi-branch deep learning network, wherein each branch of the multi-branch deep learning network is used for extracting the characteristics of non-time sequence attribute data and the characteristics of time sequence attribute data of a sample respectively;
Training the risk assessment model through the data set, and optimizing the hyperparameters of the multi-branch deep learning network by adopting an improved gannet optimization algorithm;
Performing user credit risk assessment based on the trained risk assessment model;
the basic information comprises names, ages, sexes, marital conditions, educational backgrounds, occupation types, work units and work years;
the historical credit behavior data includes financial status and credit records of the user over a period of time;
the financial conditions include income, expense, deposit and liability of the user;
The credit records include credit card usage records, loan records, overdue records, fraud records, and public records;
the optimizing the hyperparameters of the multi-branch deep learning network by adopting the improved gannet optimization algorithm specifically comprises the following steps:
initializing the positions of the gannet population by chaotic mapping, and setting the maximum number of iterations T;
Calculating the fitness value of each gannet individual, taking the mean square error of the prediction of the deep learning network as the fitness function to be minimized, and storing the current optimal individual position;
selecting whether to enter a search stage or a development stage according to a random probability p;
In the search stage, introducing a refraction factor to evaluate the depth of the prey in the water, and searching for the prey in a U-shaped or a V-shaped diving mode according to the depth of the prey in the water; performing mirror processing on the current optimal individual position, and updating the optimal individual position;
In the development stage, evaluating the capturing capacity of each gannet individual from its fitness value, judging according to that capacity whether the individual captures the prey or performs a random walk, and updating the position of the gannet individual;
Calculating the fitness value of each gannet individual;
Judging whether the iteration termination condition is met; if so, outputting the position of the optimal individual as the hyperparameters of the deep learning network; if not, continuing to update the positions in the search stage and the development stage until the iteration termination condition is met.
2. The method for evaluating credit risk of a user based on data analysis according to claim 1, wherein the preprocessing and feature screening are performed to extract non-time-series attribute data and time-series attribute data of different users respectively to form samples, and the constructing the data set specifically includes:
Performing data cleaning, missing value processing and positive and negative sample division on basic information and historical credit behavior data of different users;
Performing feature screening on the basic information and the historical credit behavior data by a correlation analysis and principal component analysis method;
Carrying out quantization processing on the screened basic information to form non-time sequence attribute data of the sample;
According to the time stamp of the historical credit behavior data, carrying out data alignment and combination on the screened historical credit behavior data to form time sequence attribute data of a sample;
And normalizing the non-time sequence attribute data and the time sequence attribute data of the sample, and adding the non-time sequence attribute data and the time sequence attribute data into the data set.
3. The data analysis-based user credit risk assessment method according to claim 2, wherein the multi-branch deep learning network-based risk assessment model comprises an input layer, a first branch network, a second branch network, a feature fusion layer, a fully connected layer and an output layer;
The input layer is used for reading the non-time sequence attribute data and the time sequence attribute data of the sample respectively and converting their format;
The first branch network is connected to the input layer and to the feature fusion layer, and is used for extracting features of the non-time sequence attribute data of the sample; the first branch network adopts a convolutional neural network;
The second branch network is connected to the input layer and to the feature fusion layer, and is used for extracting features of the time sequence attribute data of the sample; the second branch network adopts a Reformer neural network;
the feature fusion layer is used for fusing the features extracted by the first branch network and the second branch network;
The feature fusion layer, the fully connected layer and the output layer are connected in sequence.
4. The method for evaluating credit risk of a user based on data analysis according to claim 1, wherein, in the search stage, introducing a refraction factor to evaluate the depth of the prey in the water specifically comprises:
calculating the average value f' of the fitness values of all gannet individuals;
Calculating a refraction factor n:
Wherein ε is a preset fitness threshold, and f(·) is the fitness function; X_i(t) and X_b(t) are the position of individual i and the position of the optimal individual at the t-th iteration, respectively;
Estimating the depth h of the prey in the water based on the refraction factor: h = n × f(X_b(t));
if h ≥ f', searching for the prey in the U-shaped diving mode; otherwise, searching for the prey in the V-shaped diving mode.
5. The method for evaluating credit risk of a user based on data analysis according to claim 4, wherein the formula for mirroring the current optimal individual position and updating the optimal individual position is:
Wherein U and L are the upper and lower limits of the search space, respectively, X_b(t) is the optimal individual position before mirror processing, and X_b'(t) is the mirror position obtained by mirroring X_b(t);
and the left-hand side of the update formula is the updated optimal individual position.
6. The method for evaluating credit risk of a user based on data analysis according to claim 5, wherein, in the development stage, evaluating the capturing capacity of each gannet individual, judging according to that capacity whether the individual captures the prey or performs a random walk, and updating the position of the gannet individual specifically comprises:
Evaluating the capturing capacity C = f(X_i) from the fitness value of the gannet individual; if the capture condition on C is met, moving toward the current optimal individual to capture the prey; otherwise, performing a random walk using the Lévy flight strategy, wherein the position update formula is:
Wherein X_i(t) and X_i(t+1) are the positions of individual i at the t-th and (t+1)-th iterations, respectively, w is the adaptive weight, and c_1 and c_2 are the preset maximum and minimum weights, respectively.
7. A user credit risk assessment system based on data analysis using the method of any one of claims 1 to 6, the system comprising:
a data set construction module, configured to acquire basic information and historical credit behavior data of different users, perform preprocessing and index screening, extract the non-time sequence attribute data and time sequence attribute data of the different users respectively to form samples, and construct a data set;
a model building module, configured to build a risk assessment model based on a multi-branch deep learning network, wherein the branches of the multi-branch deep learning network are used to extract the features of the non-time sequence attribute data and the features of the time sequence attribute data of a sample, respectively;
a model training module, configured to train the risk assessment model with the data set and to optimize the hyperparameters of the risk assessment model with the improved gannet optimization algorithm;
a risk assessment module, configured to perform user credit risk assessment based on the trained risk assessment model.
8. A computer readable storage medium storing computer instructions for causing a computer to implement the method of any one of claims 1 to 6.
CN202311502563.6A 2023-11-10 2023-11-10 User credit risk assessment method and system based on data analysis Active CN117557361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311502563.6A CN117557361B (en) 2023-11-10 2023-11-10 User credit risk assessment method and system based on data analysis

Publications (2)

Publication Number Publication Date
CN117557361A CN117557361A (en) 2024-02-13
CN117557361B true CN117557361B (en) 2024-04-26

Family

ID=89821336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311502563.6A Active CN117557361B (en) 2023-11-10 2023-11-10 User credit risk assessment method and system based on data analysis

Country Status (1)

Country Link
CN (1) CN117557361B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1034081A (en) * 1987-01-03 1989-07-19 陈树芳 The teaching tools of classified literals and study thereof
GB201213491D0 (en) * 2012-07-30 2012-09-12 Gaiasoft Ip Ltd Content delivery system
WO2014108762A2 (en) * 2013-01-14 2014-07-17 Yogesh Chunilal Rathod Dynamic products & services card & account and/or global payments & mobile network(s) mediated & managed dynamic e-commerce, advertising & marketing platform(s) and service(s)
CN105654361A (en) * 2015-12-30 2016-06-08 广东科海信息科技股份有限公司 Method and system for assessing credit based on community O2O
CN112581264A (en) * 2020-12-23 2021-03-30 百维金科(上海)信息科技有限公司 Grasshopper algorithm-based credit risk prediction method for optimizing MLP neural network
CN113468817A (en) * 2021-07-13 2021-10-01 淮阴工学院 Ultra-short-term wind power prediction method based on IGOA (optimized El-electric field model)
CN113487403A (en) * 2021-06-29 2021-10-08 百维金科(上海)信息科技有限公司 Credit risk assessment system, method, device and medium
CN114330815A (en) * 2021-11-10 2022-04-12 淮阴工学院 Ultra-short-term wind power prediction method and system based on improved GOA (generic object oriented architecture) optimized LSTM (least Square TM)
CN115131131A (en) * 2022-07-06 2022-09-30 浙江财经大学 Credit risk assessment method for unbalanced data set multi-stage integration model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
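Gannet optimization algorithm: A new metaheuristic algorithm for solving engineering optimization problems; Jeng-Shyang Pan et al.; Mathematics and Computers in Simulation; 2022-12-31; Vol. 202; pp. 343-373 *
Research on Credit Risk Management of Small and Micro Enterprises at QS Bank; Guo Chunxia; China Master's Theses Full-text Database, Economics and Management Science Series; 2019-05-15 (No. 5); pp. J152-1445 *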

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant