CN111815432A - Financial service risk prediction method and device

Info

Publication number: CN111815432A
Application number: CN202010654184.9A
Authority: CN
Inventors: 相妹; 卢健; 马格
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2020-10-23
Anticipated expiration: 2040-07-08
Also published as: CN111815432B

Abstract

The embodiments of the application provide a financial service risk prediction method and device. The method comprises: receiving a financial service risk prediction request for a target user; selecting one of a plurality of financial service risk prediction models as a target financial service risk prediction model based on the financial service request type, wherein each financial service risk prediction model is trained on its own corresponding training set, and each training set is a target data set obtained by applying a Spark system to perform data binning on historical user financial information of a plurality of users; and inputting the user financial information of the target user into the target financial service risk prediction model, and taking the output of that model as the financial service risk prediction result of the target user. The application can improve the reliability, efficiency and degree of automation of the binning of training data, and can thereby improve the accuracy, efficiency and degree of automation with which a model trained on the binned data predicts financial service risk for users of a financial institution.

Description

Financial service risk prediction method and device

Technical Field

The application relates to the technical field of data processing, in particular to a financial service risk prediction method and device.

Background

As financial institutions such as banks offer the public more kinds of financial services and serve more target groups, they need to assess in advance the risks that may arise from providing a financial service to a user. At present, financial institutions generally use machine learning models to automate financial service risk prediction. However, these models usually need a large amount of historical data for training, and preparing that data requires substantial manual effort for data mining and processing. Manual data processing is time-consuming and labor-intensive, and human mistakes can introduce processing errors.

Based on the above, in the existing financial service risk prediction process, data binning is performed on the training data in advance. Common binning methods fall into two groups: supervised methods, such as chi-square binning and minimum-entropy binning, and unsupervised methods, such as equidistant (equal-width) binning and equal-frequency binning. Unsupervised binning does not consider the dependent variable, which limits how much it can improve the trained machine learning model. A model trained on an unevenly distributed data set is unstable and requires manual intervention, so the accuracy and degree of automation of financial service risk prediction cannot be guaranteed. Supervised binning, in turn, is complex to operate, time-consuming and labor-intensive. That is, the existing financial service risk prediction process cannot achieve both accuracy and efficiency at the same time.

Disclosure of Invention

In view of the problems in the prior art, the present application provides a financial service risk prediction method and device that can improve the reliability, efficiency and degree of automation of the binning of training data, and can thereby improve the accuracy, efficiency and degree of automation with which a financial service risk prediction model trained on the binned data predicts financial service risk for users of a financial institution.

In order to solve the technical problem, the application provides the following technical solutions:

In a first aspect, the present application provides a financial service risk prediction method, including:

receiving a financial service risk prediction request for a target user, wherein the financial service risk prediction request comprises user financial information of the target user and a corresponding financial service request type;

selecting one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model based on the financial service request type, wherein each financial service risk prediction model is obtained by respectively applying a corresponding training set for training, and the training set is a target data set obtained by performing data binning on historical user financial information of a plurality of users by applying a Spark system in advance;

and inputting the user financial information of the target user into the target financial service risk prediction model, and taking the output of the target financial service risk prediction model as a financial service risk prediction result of the target user.

Further, the financial service request type includes: a user category, an application type of the financial loan, an application state of the financial loan, and a request type;

correspondingly, the selecting one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model based on the financial service request type comprises:

selecting, from a preset model table, the financial service risk prediction model corresponding to the user category of the target user, the application type of the financial loan, the application state of the financial loan and the request type, and determining that model as the target financial service risk prediction model currently corresponding to the target user.

Further, before the receiving a financial service risk prediction request for a target user, the method further comprises:

respectively extracting a numerical data set conforming to numerical characteristics and a character data set conforming to character characteristics from historical user financial information of a plurality of users;

acquiring a plurality of split points corresponding to each element column in the numerical data set;

performing decision tree calculation on each split point by applying the Spark system to obtain a Gini coefficient value of each split point;

determining, among the plurality of split points corresponding to each element column, the split point with the smallest Gini coefficient value as the optimal split point of that element column;

generating an optimal binning split point list according to the optimal split points of the element columns;

and generating a target data set corresponding to the historical user financial information of the plurality of users based on the optimal binning split point list, each element column and the character data set.
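The selection of an optimal split point by minimum Gini value can be sketched on a single machine as follows. This is a minimal illustration, not the Spark implementation of the application: it assumes a binary label per record (for example, a default flag), interprets the "Gini coefficient value" of a split point as the weighted Gini impurity of the two sides of the split, and uses hypothetical function names (`gini`, `split_gini`, `best_split`).

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values: Sequence[float], labels: Sequence[int], point: float) -> float:
    """Weighted Gini impurity after splitting records into value <= point and value > point."""
    left = [y for x, y in zip(values, labels) if x <= point]
    right = [y for x, y in zip(values, labels) if x > point]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(values: Sequence[float], labels: Sequence[int],
               candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (split point, Gini value) for the candidate with the smallest Gini value."""
    score, point = min((split_gini(values, labels, p), p) for p in candidates)
    return point, score


# One numeric element column and an assumed binary label per record.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
point, score = best_split(values, labels, sorted(set(values)))
print(point, score)  # 3.0 0.0
```

Repeating `best_split` for every numeric element column would give one optimal split point per column, which corresponds to the list generated in the step above.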

Further, the acquiring a plurality of split points corresponding to each element column in the numerical data set comprises:

searching each element column in the numerical data set for missing values and, if an element column contains a missing value, filling the missing value with the minimum value of that element column;

performing deduplication on the values in each element column;

sorting the values in each element column in ascending order, and determining each value in each sorted element column as an initial split point;

judging whether the number of initial split points corresponding to each element column is greater than a number threshold and, if an element column has more initial split points than the number threshold, randomly selecting from that element column a number of values equal to the number threshold and determining the selected values as the split points of that element column;

if an element column has a number of initial split points less than or equal to the number threshold, determining each initial split point in that element column as a split point of that element column.
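The candidate split point steps above can be sketched as follows. This is an illustrative single-machine version under stated assumptions: missing values are represented as `None`, the function name `candidate_split_points` and the `seed` parameter are hypothetical, and the randomly selected values are re-sorted for convenience (the text only requires random selection).

```python
import random
from typing import List, Optional, Sequence


def candidate_split_points(column: Sequence[Optional[float]],
                           max_points: int,
                           seed: int = 0) -> List[float]:
    """Build candidate split points for one numeric element column."""
    present = [v for v in column if v is not None]
    if not present:
        return []
    # Fill missing values with the minimum value of the column.
    fill = min(present)
    filled = [fill if v is None else v for v in column]
    # Deduplicate, then sort in ascending order: these are the initial split points.
    initial = sorted(set(filled))
    # Above the threshold, keep a random sample of exactly max_points values.
    if len(initial) > max_points:
        rng = random.Random(seed)
        return sorted(rng.sample(initial, max_points))  # re-sorting is an assumption
    return initial


points = candidate_split_points([3.0, None, 1.0, 3.0, 7.0, 5.0], max_points=10)
print(points)  # [1.0, 3.0, 5.0, 7.0]
```

The threshold bounds the number of Gini evaluations per column, which matters when a continuous feature has many distinct values.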

Further, the generating a target data set corresponding to the historical user financial information of the plurality of users based on the optimal binning split point list, each element column and the character data set comprises:

applying the Spark system to execute, in parallel, a preset binary search step on each element column based on the optimal binning split point list, to obtain a discretized element column corresponding to each element column;

and splicing each discretized element column with the character data set to obtain the target data set corresponding to the historical user financial information of the plurality of users.

Further, the binary search step comprises:

applying a binary search algorithm in the Spark system to obtain, for each value in each element column, the index number of the bin segment in the optimal binning split point list to which the value belongs, and replacing each value in the element column with the obtained index number.
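The binary search step can be illustrated with Python's standard `bisect` module. This is a sketch under assumptions: the split point list for a column is sorted in ascending order, a value equal to a split point falls into the lower bin, and the function name `discretize` is hypothetical. The parallel, per-column execution on Spark described above is not shown.

```python
from bisect import bisect_left
from typing import List, Sequence


def discretize(column: Sequence[float], split_points: Sequence[float]) -> List[int]:
    """Replace each value with the index of the bin segment it falls into.

    With sorted split points s[0] < s[1] < ..., bin 0 holds values <= s[0],
    bin i holds values in (s[i-1], s[i]], and the last bin holds values > s[-1].
    """
    return [bisect_left(split_points, value) for value in column]


splits = [3.0, 7.0]
print(discretize([1.0, 3.0, 4.5, 7.0, 9.9], splits))  # [0, 0, 1, 1, 2]
```

Binary search keeps the cost per value logarithmic in the number of split points, so discretizing a column stays cheap even when the split point list is long.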

Further, the target data set comprises a training set;

correspondingly, before the receiving of the financial service risk prediction request for the target user, the method further includes:

and applying the training set to train to obtain a financial service risk prediction model.

Further, the target data set further comprises a test set;

correspondingly, after the financial service risk prediction model is obtained by training on the training set, the method further includes:

evaluating the performance of the financial service risk prediction model on the test set, and adjusting the financial service risk prediction model based on the corresponding evaluation result.

In a second aspect, the present application provides a financial service risk prediction apparatus, comprising:

a request receiving module, configured to receive a financial service risk prediction request for a target user, wherein the financial service risk prediction request comprises user financial information of the target user and a corresponding financial service request type;

a model selection module, configured to select one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model based on the financial service request type, wherein each financial service risk prediction model is trained on its own corresponding training set, and each training set is a target data set obtained in advance by applying a Spark system to perform data binning on historical user financial information of a plurality of users;

and a risk prediction module, configured to input the user financial information of the target user into the target financial service risk prediction model and take the output of the target financial service risk prediction model as the financial service risk prediction result of the target user.

correspondingly, the request receiving module is used for executing the following contents:

Further, the apparatus further comprises:

the data dividing module is used for respectively extracting a numerical data set which accords with numerical type characteristics and a character data set which accords with character type characteristics from historical user financial information of a plurality of users;

a split point acquisition module, configured to acquire a plurality of split points corresponding to each element column in the numerical data set;

a decision tree calculation module, configured to perform decision tree calculation on each split point by applying the Spark system to obtain a Gini coefficient value of each split point;

an optimal split point acquisition module, configured to determine, among the plurality of split points corresponding to each element column, the split point with the smallest Gini coefficient value as the optimal split point of that element column;

a list generation module, configured to generate an optimal binning split point list according to the optimal split points of the element columns;

and a target data set generation module, configured to generate a target data set corresponding to the historical user financial information of the plurality of users based on the optimal binning split point list, each element column and the character data set.

Further, the segmentation point obtaining module is configured to perform the following:

Further, the target data set generation module is configured to perform the following:

Further, the target data set generation module further comprises a binary search unit;

the binary search unit is configured to execute the binary search step, which comprises: applying a binary search algorithm in the Spark system to obtain, for each value in each element column, the index number of the bin segment in the optimal binning split point list to which the value belongs, and replacing each value in the element column with the obtained index number.

Further, the target data set comprises a training set;

correspondingly, the financial service risk prediction device further comprises:

and the model training module is used for applying the training set to train to obtain a financial service risk prediction model.

Further, the target data set further comprises a test set;

a model testing module, configured to evaluate the performance of the financial service risk prediction model on the test set and adjust the financial service risk prediction model based on the corresponding evaluation result.

In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the financial service risk prediction method.

In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the financial service risk prediction method.

According to the above technical solutions, the financial service risk prediction method and device provided by the application: receive a financial service risk prediction request for a target user, wherein the request comprises user financial information of the target user and a corresponding financial service request type; select one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model based on the financial service request type, wherein each model is trained on its own corresponding training set, and each training set is a target data set obtained in advance by applying a Spark system to perform data binning on historical user financial information of a plurality of users; and input the user financial information of the target user into the target model and take its output as the financial service risk prediction result of the target user. Because different models are used for different request types, the method applies to a wider range of financial service risk predictions and handles each kind of prediction request more efficiently. Because the Spark system performs the data binning of the historical user financial information, the reliability, efficiency and degree of automation of the binning of training data are improved. Training the model on the data set binned by the Spark system improves the reliability and stability of the model in use, reduces the loss of feature information, and lets the data set entering the model retain more of its own feature information, which helps reduce the risk of overfitting and improve the accuracy and stability of the model. Applying the model to predict financial service risk for the target user improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the institution identifies risky users. For example, the credit risk rating or default risk probability of an individual or enterprise applying for any of various financial loans can be predicted quickly and accurately before, during and after the loan, which improves the operational safety and stability of the financial institution.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a financial service risk prediction method in an embodiment of the present application.

Fig. 2 is a schematic flowchart of the financial service risk prediction method including step 110 in an embodiment of the present application.

Fig. 3 is a schematic flowchart illustrating steps 010 to 060 in the financial service risk prediction method in the embodiment of the present application.

Fig. 4 is a schematic flowchart of the financial service risk prediction method including step 070 in an embodiment of the present application.

Fig. 5 is a schematic flowchart of the financial service risk prediction method including step 080 in an embodiment of the present application.

Fig. 6 is a schematic diagram of a first structure of a financial service risk prediction apparatus in an embodiment of the present application.

Fig. 7 is a schematic diagram of a second structure of a financial service risk prediction apparatus in an embodiment of the present application.

Fig. 8 is a schematic diagram of a third structure of a financial service risk prediction apparatus in an embodiment of the present application.

Fig. 9 is a schematic diagram of a fourth structure of a financial service risk prediction apparatus in an embodiment of the present application.

Fig. 10 is a schematic structural diagram of a financial service risk prediction system provided in an application example of the present application.

Fig. 11 is a schematic structural diagram of a preprocessing unit provided in an application example of the present application.

Fig. 12 is a schematic structural diagram of a decision tree binning unit provided in an application example of the present application.

Fig. 13 is a schematic structural diagram of a feature discretization unit provided in an application example of the present application.

Fig. 14 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.

Data binning (also called discrete binning) is a data preprocessing technique and an important data processing operation in feature engineering for data mining. It groups many continuous values into a small number of "bins", and is used to reduce the influence of minor observation errors, improve the stability of the model and reduce the risk of overfitting. At present, common binning methods are mainly classified into supervised methods such as chi-square binning and minimum-entropy binning, and unsupervised methods such as equidistant (equal-width) binning and equal-frequency binning.
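As a small illustration of the two unsupervised methods named above, the following sketch computes equidistant and equal-frequency bin edges for one skewed column and counts how many values land in each bin. The function names and the sample values are illustrative assumptions, not part of the application.

```python
from bisect import bisect_left
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior bin edges that divide [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior bin edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins - 1] for i in range(1, n_bins)]


def bin_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin; a value equal to an edge goes to the lower bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    return counts


values = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # one outlier skews the range
width_edges = equal_width_edges(values, 3)
freq_edges = equal_frequency_edges(values, 3)
print(width_edges, bin_counts(values, width_edges))  # [34.0, 67.0] [8, 0, 1]
print(freq_edges, bin_counts(values, freq_edges))    # [3, 6] [3, 3, 3]
```

Neither method looks at the dependent variable, which is the limitation of unsupervised binning that the application sets out to address.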

Unsupervised binning methods do not consider the dependent variable, which does not help improve the model's performance. Supervised binning methods such as chi-square binning, in turn, are complex to operate, time-consuming and labor-intensive.

To address problems such as the unstable performance of linear binning on unevenly distributed data sets and the need for manual intervention, and to speed up computation when processing and modeling large-scale data sets, an optimal binning method combined with a fast, efficient and general-purpose computing framework is urgently needed.

Based on this, in view of the problem that the existing financial service risk prediction process cannot achieve both accuracy and efficiency, embodiments of the present application provide a financial service risk prediction method, a financial service risk prediction apparatus, an electronic device, and a computer-readable storage medium. A financial service risk prediction request for a target user is received, wherein the request includes user financial information of the target user and a corresponding financial service request type; one of a plurality of preset financial service risk prediction models is selected as a target financial service risk prediction model based on the financial service request type, wherein each model is trained on its own corresponding training set, and each training set is a data set obtained in advance by applying a Spark system to perform data binning on historical financial information of a plurality of users; and the user financial information of the target user is input into the target model, whose output is taken as the financial service risk prediction result of the target user. Because different models are used for different request types, the method applies to a wider range of financial service risk predictions and handles each kind of prediction request more efficiently. Because the Spark system performs the data binning of the historical financial information, the reliability, efficiency and degree of automation of the binning of training data are improved. Training the model on the data set binned by the Spark system improves the reliability and stability of the model in use, reduces the loss of feature information, and lets the data set entering the model retain more of its own feature information, which helps reduce the risk of overfitting and improve the accuracy and stability of the model. Applying the model to predict financial service risk for the target user improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the institution identifies risky users. For example, the credit risk rating or default risk probability of an individual or enterprise applying for any of various financial loans can be predicted quickly and accurately before, during and after the loan, which improves the operational safety and stability of the financial institution.

Specifically, the following examples are given to illustrate the respective embodiments.

In order to solve the problem that the existing financial service risk prediction process cannot achieve both accuracy and efficiency, the application provides an embodiment of a financial service risk prediction method which, referring to Fig. 1, specifically includes the following:

Step 100: receiving a financial service risk prediction request for a target user, wherein the financial service risk prediction request comprises user financial information of the target user and a corresponding financial service request type.

It is understood that the user financial information of the target user includes historical transaction data or transaction requests of the user within a preset time period. Specific examples of the user financial information include the credit records, asset transaction statements and tax payment records of an individual or an enterprise.

It is understood that the financial service request type includes: a user category, an application type of the financial loan, an application state of the financial loan, and a request type. Specifically:

The user category may include: individual users, enterprise users, etc.

The application type of the financial loan may include: pledge loans, home loans, car loans, home renovation loans, travel loans, business loans, education loans, etc.

The application state of the financial loan may include: before the loan, during the loan, after the loan, etc.

The request type may include: a credit risk rating request, a default risk probability prediction request, etc.

Step 200: selecting one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model based on the financial service request type, wherein each financial service risk prediction model is obtained by respectively applying a corresponding training set for training, and the training set is a target data set obtained by performing data binning on historical user financial information of a plurality of users by applying a Spark system in advance.

In step 200, the Spark system is a fast, general-purpose cluster computing engine for large-scale data. The decision tree algorithm can bin a variable's features optimally according to the relationship between that variable and the dependent variable. Optimal binning of continuous numerical features is achieved by applying the decision tree algorithm recursively. Continuous features in a wide business data table are then discretized quickly by using Spark's RDD data set representation, operations such as Select and Map, and binary search, which improves model accuracy and speeds up computation on large-scale data.
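The recursive use of a decision-tree split to bin a continuous feature can be sketched as follows. This is a single-machine illustration under assumptions, not the Spark implementation: each record is a (value, binary label) pair, split quality is the weighted Gini impurity, and `tree_bin`, `max_depth` and `min_leaf` are hypothetical names and stopping rules chosen for the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def tree_bin(pairs: List[Tuple[float, int]],
             max_depth: int = 2,
             min_leaf: int = 1) -> List[float]:
    """Recursively split (value, label) pairs and return the sorted split points."""
    labels = [y for _, y in pairs]
    if max_depth == 0 or gini(labels) == 0.0:
        return []  # depth exhausted or segment already pure
    best = None
    # The largest distinct value cannot separate anything, so it is skipped.
    for point in sorted({x for x, _ in pairs})[:-1]:
        left = [y for x, y in pairs if x <= point]
        right = [y for x, y in pairs if x > point]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, point)
    if best is None:
        return []
    point = best[1]
    left_pairs = [(x, y) for x, y in pairs if x <= point]
    right_pairs = [(x, y) for x, y in pairs if x > point]
    # Recurse on both sides; concatenation keeps the split points in ascending order.
    return (tree_bin(left_pairs, max_depth - 1, min_leaf)
            + [point]
            + tree_bin(right_pairs, max_depth - 1, min_leaf))


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(tree_bin(list(zip(values, labels))))  # [3.0, 6.0]
```

In this example the label changes twice along the value axis, and the recursion recovers both boundaries, which fixed-width bins would place independently of the labels.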

Step 300: and inputting the user financial information of the target user into the target financial service risk prediction model, and taking the output of the target financial service risk prediction model as a financial service risk prediction result of the target user.

It is to be understood that the financial service risk prediction model may specifically be a classification or regression model for credit risk rating or default risk probability, trained with a machine learning algorithm. It may be, but is not limited to, a logistic regression (LR) model, a generalized linear model (GLM), a gradient boosting decision tree (GBDT) model, an XGBoost boosted tree model, and the like.

As can be seen from the above description, the financial service risk prediction method provided in the embodiment of the present application uses different models for different request types, so it applies to a wider range of financial service risk predictions and handles each kind of prediction request more efficiently. Because the Spark system performs the data binning of the historical financial information of a plurality of users, the reliability, efficiency and degree of automation of the binning of training data are improved. Training the model on the data set binned by the Spark system improves the reliability and stability of the model in use, reduces the loss of feature information, and lets the data set entering the model retain more of its own feature information, which helps reduce the risk of overfitting and improve the accuracy and stability of the model. Applying the model to predict financial service risk for the target user improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the institution identifies risky users. For example, the credit risk rating or default risk probability of an individual or enterprise applying for any of various financial loans can be predicted quickly and accurately before, during and after the loan, which improves the operational safety and stability of the financial institution.

In order to provide a preferred way of predicting an application process, in an embodiment of the method for predicting a risk of a financial service provided by the present application, the financial service request type includes: the user category, the application type, the application state and the request type of the financial loan; referring to fig. 2, step 100 of the method for predicting risk of financial services specifically includes the following steps:

Step 110: selecting, from a preset model table, the financial service risk prediction model corresponding to the user category of the target user and to the application type, application state and request type of the financial loan, and determining it as the target financial service risk prediction model currently corresponding to the target user.

It is understood that the model table is used for storing the corresponding relationship among the user category of each user, the application type of the financial loan, the application state, the request type and the financial service risk prediction model. See table 1 below for specific examples.

TABLE 1

Based on table 1, if the user category in the financial service request types included in the currently received financial service risk prediction request of the target user is an enterprise user, the application type of the financial loan is a car loan, the application state of the financial loan is before the loan, and the request type is a credit risk rating request, the corresponding financial service risk prediction model is found in the model table as a model 11, and the model 11 is used as the current target financial service risk prediction model of the target user.

In the model table, the difference between the financial service risk prediction models lies in the difference of the selection of training data, and each financial service risk prediction model is obtained by training the user history data corresponding to the user category, the application type of the financial loan, the application state and the request type which correspond to the financial service risk prediction model in the model table.
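The lookup in step 110 can be sketched as a keyed table. This is a minimal illustration, not the patented implementation: the names `RequestType`, `MODEL_TABLE` and `select_model` are hypothetical, and only the first table row (enterprise user, car loan, pre-loan, credit risk rating request, model 11) comes from the example in the text.

```python
from typing import Dict, NamedTuple


class RequestType(NamedTuple):
    """The four fields that make up a financial service request type."""
    user_category: str   # e.g. "enterprise" or "individual"
    loan_type: str       # application type of the financial loan
    loan_state: str      # "pre-loan", "in-loan" or "post-loan"
    request_type: str    # e.g. "credit_risk_rating"


# Hypothetical model table. The first row mirrors the example in the text;
# the second row is invented purely for illustration.
MODEL_TABLE: Dict[RequestType, str] = {
    RequestType("enterprise", "car_loan", "pre-loan", "credit_risk_rating"): "model_11",
    RequestType("individual", "car_loan", "pre-loan", "default_risk_probability"): "model_x",
}


def select_model(request: RequestType) -> str:
    """Return the identifier of the target model for this request type."""
    try:
        return MODEL_TABLE[request]
    except KeyError:
        raise LookupError(f"no model registered for {request}") from None


target = select_model(
    RequestType("enterprise", "car_loan", "pre-loan", "credit_risk_rating")
)
print(target)  # model_11
```

Keying the table on the full four-field tuple means a request that differs in any one field resolves to a different model, which matches the statement that the models differ only in the training data selected for each combination.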

As can be seen from the above description, the financial service risk prediction method provided in the embodiment of the present application further broadens the applicability of financial service risk prediction and improves the efficiency of risk prediction across the various prediction requests. It can quickly and accurately predict the credit risk rating or default risk probability of individuals or enterprises who, as users of a financial institution, apply for various financial loans, before, during and after the loan, and thereby improves the operational safety and stability of the financial institution.

In order to provide a decision tree application in the Spark system, in an embodiment of the financial service risk prediction method provided in the present application, referring to fig. 3, before step 100 of the financial service risk prediction method, the following content is further included:

Step 010: extracting, from the historical user financial information of a plurality of users, a numerical data set conforming to numerical features and a character data set conforming to character features, respectively.

Step 020: acquiring a plurality of segmentation points corresponding to each element column in the numerical data set.

Step 030: applying the Spark system to perform a decision tree calculation on each segmentation point, so as to obtain the Gini coefficient value of each segmentation point.

Step 040: for each element column, determining the segmentation point with the smallest Gini coefficient value among the segmentation points of that column as the optimal segmentation point of the column.

Step 050: generating an optimal binning segmentation point list from the optimal segmentation points of the element columns.

Step 060: generating a target data set corresponding to the historical user financial information of the plurality of users based on the optimal binning segmentation point list, each element column and the character data set.

As can be seen from the above description, in the financial service risk prediction method provided in the embodiment of the present application, the decision tree method selects segmentation points according to the label values corresponding to the features, and retains and uses the information carried by the features themselves to a great extent, thereby avoiding problems of the linear binning method such as unstable performance and the subjectivity of manual intervention when selecting segmentation points. The Gini index is calculated for all possible segmentation points of each feature, and segmentation points are output according to adjustable parameters such as tree depth and number of bins, so as to achieve an optimal discretization of the feature data.
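Steps 030 to 050 can be sketched for a single element column as follows. This is a single-machine illustration in plain Python rather than a Spark job, and it rests on stated assumptions: the Gini value of a segmentation point is taken to be the size-weighted Gini impurity of the two sides, a value goes to the left side when it is less than or equal to the point, and the names `gini`, `best_split` and `tree_bins` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a label collection: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[int],
               candidates: Sequence[float]) -> Optional[float]:
    """Return the candidate with the smallest weighted Gini, or None."""
    best_point, best_score = None, float("inf")
    n = len(values)
    for point in candidates:
        left = [y for x, y in zip(values, labels) if x <= point]
        right = [y for x, y in zip(values, labels) if x > point]
        if not left or not right:
            continue  # a split that leaves one side empty separates nothing
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_point, best_score = point, score
    return best_point


def tree_bins(values: Sequence[float], labels: Sequence[int],
              candidates: Sequence[float], max_depth: int = 2) -> List[float]:
    """Recursively collect split points down to max_depth, sorted ascending."""
    if max_depth == 0 or len(set(labels)) < 2:
        return []  # depth exhausted, or the node is already pure
    point = best_split(values, labels, candidates)
    if point is None:
        return []
    points = [point]
    for keep_left in (True, False):
        part = [(x, y) for x, y in zip(values, labels) if (x <= point) == keep_left]
        xs = [x for x, _ in part]
        ys = [y for _, y in part]
        points += tree_bins(xs, ys, candidates, max_depth - 1)
    return sorted(points)


values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
print(tree_bins(values, labels, sorted(set(values))))  # [3]
```

In the example the split at 3 separates the two label classes perfectly, so both child nodes are pure and the recursion stops after one segmentation point even though `max_depth` would allow more.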

In order to provide a preferred way of setting the segmentation point, in an embodiment of the method for predicting risk of financial services provided by the present application, step 020 in the method for predicting risk of financial services specifically includes the following steps:

Step 021: searching each element column in the numerical data set for missing values and, if an element column contains a missing value, filling the missing value with the minimum value of that element column.

Step 022: de-duplicating the numerical values in each element column.

Step 023: sorting the numerical values in each element column in ascending order, and taking the sorted values of each element column as its initial segmentation points.

Step 024: judging whether the number of initial segmentation points of each element column is greater than a number threshold. For an element column whose number of initial segmentation points is greater than the number threshold, executing step 025: randomly selecting, from the initial segmentation points of the element column, a number of values equal to the number threshold as the segmentation points of the element column.

For an element column whose number of initial segmentation points is less than or equal to the number threshold, executing step 026: determining each initial segmentation point of the element column as a segmentation point of the element column.

From the above description, the financial service risk prediction method provided by the embodiment of the application can effectively improve the reliability and efficiency of the segmentation point setting, further can effectively improve the accuracy and efficiency of obtaining the optimal binning segmentation point list, and further can effectively improve the reliability, efficiency and automation degree of the binning process of the training data.
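Steps 021 to 026 can be sketched for one element column as below. The function name `candidate_split_points`, the use of `None` to mark a missing value, and the optional `seed` parameter are assumptions made for illustration; re-sorting the random sample is also an addition here, made so that the result stays in ascending order.

```python
import random
from typing import List, Optional, Sequence


def candidate_split_points(
    column: Sequence[Optional[float]],
    threshold: int,
    seed: Optional[int] = None,
) -> List[float]:
    """Return the segmentation points of one numeric element column."""
    present = [v for v in column if v is not None]
    if not present:
        return []  # nothing to fill from, so no candidates
    # Step 021: fill missing values with the column minimum.
    minimum = min(present)
    filled = [minimum if v is None else v for v in column]
    # Steps 022-023: de-duplicate, then sort in ascending order.
    initial = sorted(set(filled))
    # Steps 024-025: too many candidates, so sample `threshold` of them.
    if len(initial) > threshold:
        rng = random.Random(seed)
        return sorted(rng.sample(initial, threshold))
    # Step 026: few enough candidates, keep them all.
    return initial


print(candidate_split_points([3.0, None, 1.0, 3.0, 2.0], threshold=10))
# [1.0, 2.0, 3.0]
```

Because the missing value is filled with the column minimum, filling never introduces a new distinct value, so it does not change the candidate list; it only ensures every sample can later be assigned to a bin.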

In order to provide a parallel computing manner in Spark, in an embodiment of the method for predicting risk of financial services provided in the present application, step 060 of the method for predicting risk of financial services specifically includes the following steps:

Step 061: applying the Spark system to execute, in parallel, a preset binary search step on each element column based on the optimal binning segmentation point list, so as to obtain a discretized element column corresponding to each element column.

Step 062: splicing the discretized element columns with the character data set to obtain the target data set corresponding to the historical user financial information of the plurality of users.

As can be seen from the above description, in the financial service risk prediction method provided in the embodiment of the present application, the RDD of Spark represents the data set in a form that can be computed in parallel, and operations such as Select and Map are executed on it in parallel, so that distributed parallel computation makes large-scale computation fast and efficient.
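Steps 061 and 062 can be sketched with the standard-library binary search. This plain-Python version uses the built-in `map` as a stand-in for the per-element Map operation that Spark would distribute; it does not call Spark. The function names and the column-dictionary layout are hypothetical, and placing a value that equals a segmentation point into the lower bin is an assumption.

```python
from bisect import bisect_left
from typing import Dict, List


def discretize_column(values: List[float], split_points: List[float]) -> List[int]:
    """Replace each value with the index of the bin it falls into.

    split_points must be sorted ascending. bisect_left performs a binary
    search and counts the points strictly below the value, so a value equal
    to a split point lands in the lower bin.
    """
    return list(map(lambda v: bisect_left(split_points, v), values))


def build_target_dataset(
    numeric: Dict[str, List[float]],
    character: Dict[str, List[str]],
    best_points: Dict[str, List[float]],
) -> Dict[str, list]:
    """Step 061 (discretize every numeric column), then step 062 (splice)."""
    discrete = {
        name: discretize_column(column, best_points[name])
        for name, column in numeric.items()
    }
    # Character columns need no binning and are appended unchanged.
    return {**discrete, **character}


result = build_target_dataset(
    numeric={"income": [2.0, 5.0, 9.0]},
    character={"region": ["N", "S", "N"]},
    best_points={"income": [3.0, 7.0]},
)
print(result)  # {'income': [0, 1, 2], 'region': ['N', 'S', 'N']}
```

Each element is looked up independently of the others, which is what makes the operation suitable for parallel execution: no element's bin index depends on any other element.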

In order to provide a binary search calculation method in Spark, in an embodiment of the financial service risk prediction method provided in the present application, the binary search step in the financial service risk prediction method specifically includes the following steps:

As can be seen from the above description, the financial service risk prediction method provided in the embodiment of the present application can output binning segmentation points according to parameters such as adjustable tree depth and binning segment number, so as to implement optimal classification of discretization of feature data.

In order to perform model training on the classification result, in one embodiment of the financial service risk prediction method provided by the application, the target data set comprises a training set; referring to fig. 4, the financial service risk prediction method further includes the following steps after step 060 and before step 100:

Step 070: training on the training set to obtain a financial service risk prediction model.

From the above description, the financial service risk prediction method provided by the embodiment of the application can effectively improve the application reliability and stability of the financial service risk prediction model, reduce the loss of the characteristic information, and enable the data set entering the model to retain more self characteristic information, thereby being beneficial to reducing the overfitting risk of the model and improving the accuracy and stability of the data model.

In order to perform model testing on the classification result, in one embodiment of the financial service risk prediction method provided by the application, the target data set further comprises a test set; referring to fig. 5, the financial service risk prediction method further includes the following steps after step 070 and before step 100:

Step 080: testing the effect of the financial service risk prediction model on the test set, and adjusting the financial service risk prediction model according to the corresponding test result.

From the above description, the financial service risk prediction method provided by the embodiment of the application can further improve the application reliability and stability of the financial service risk prediction model, reduce the loss of the characteristic information, and enable the data set entering the model to retain more self characteristic information, thereby being beneficial to reducing the overfitting risk of the model and improving the accuracy and stability of the data model.

In terms of software, in order to solve the problem that the existing financial service risk prediction process cannot simultaneously satisfy the accuracy and efficiency of the financial service risk prediction, the present application provides an embodiment of a financial service risk prediction apparatus for executing all or part of the content in the financial service risk prediction method, and referring to fig. 6, the financial service risk prediction apparatus specifically includes the following contents:

the request receiving module 10 is configured to receive a financial service risk prediction request for a target user, where the financial service risk prediction request includes user financial information of the target user and a corresponding financial service request type.

It is understood that the user financial information of the target user includes historical transaction data or transaction requests of the user within a preset time period.

The model selecting module 20 is configured to select one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model based on the financial service request type, where each financial service risk prediction model is obtained by training on its corresponding training set, and the training set is a target data set obtained in advance by applying a Spark system to bin the historical user financial information of a plurality of users.

The risk prediction module 30 is configured to input the user financial information of the target user into the target financial service risk prediction model, and to take the output of the target financial service risk prediction model as the financial service risk prediction result of the target user.

As can be seen from the above description, the financial service risk prediction apparatus provided in the embodiment of the present application performs prediction with different models for different request types, which broadens the applicability of financial service risk prediction and improves the efficiency of risk prediction across the various prediction requests. Applying the Spark system to bin the historical user financial information of a plurality of users improves the reliability, efficiency and degree of automation of the binning of training data. Training the financial service risk prediction model on the data set binned by the Spark system improves the reliability and stability of the model in application and reduces the loss of feature information, so that the data set entering the model retains more of its own feature information, which helps to reduce the risk of overfitting and to improve the accuracy and stability of the model. Applying the financial service risk prediction model to predict the financial service risk of a target user improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the financial institution identifies risky users. For example, the credit risk rating or default risk probability of individuals or enterprises who, as users of the financial institution, apply for various financial loans can be predicted quickly and accurately before, during and after the loan, which improves the operational safety and stability of the financial institution.

In order to provide a preferred way of predicting the application process, in an embodiment of the financial service risk prediction apparatus provided by the present application, the financial service request type includes: the user category, the application type, the application state and the request type of the financial loan; the request receiving module 10 in the financial service risk prediction device is used for executing the following steps:

As can be seen from the above description, the financial service risk prediction apparatus provided in the embodiment of the present application further broadens the applicability of financial service risk prediction and improves the efficiency of risk prediction across the various prediction requests. It can quickly and accurately predict the credit risk rating or default risk probability of individuals or enterprises who, as users of a financial institution, apply for various financial loans, before, during and after the loan, and thereby improves the operational safety and stability of the financial institution.

In order to provide a decision tree application in the Spark system, in an embodiment of the financial service risk prediction apparatus provided in the present application, referring to fig. 7, the financial service risk prediction apparatus further includes the following components:

The data dividing module 01 is configured to extract, from the historical user financial information of a plurality of users, a numerical data set conforming to numerical features and a character data set conforming to character features, respectively.

The segmentation point obtaining module 02 is configured to obtain a plurality of segmentation points corresponding to each element column in the numerical data set.

The decision tree calculation module 03 is configured to apply the Spark system to perform a decision tree calculation on each segmentation point, so as to obtain the Gini coefficient value of each segmentation point.

The optimal segmentation point obtaining module 04 is configured to determine, for each element column, the segmentation point with the smallest Gini coefficient value among the segmentation points of that column as the optimal segmentation point of the column.

The list generating module 05 is configured to generate an optimal binning segmentation point list from the optimal segmentation points of the element columns.

The target data set generating module 06 is configured to generate a target data set corresponding to the historical user financial information of the plurality of users based on the optimal binning segmentation point list, each element column and the character data set.

As can be seen from the above description, in the financial service risk prediction apparatus provided in the embodiment of the present application, the decision tree method selects segmentation points according to the label values corresponding to the features, and retains and uses the information carried by the features themselves to a great extent, thereby avoiding problems of the linear binning method such as unstable performance and the subjectivity of manual intervention when selecting segmentation points. The Gini index is calculated for all possible segmentation points of each feature, and segmentation points are output according to adjustable parameters such as tree depth and number of bins, so as to achieve an optimal discretization of the feature data.

In order to provide a preferred way of setting a cut point, in an embodiment of the financial service risk prediction apparatus provided in the present application, the cut point obtaining module 02 of the financial service risk prediction apparatus is specifically configured to perform the following:

From the above description, the financial service risk prediction device provided by the embodiment of the application can effectively improve the reliability and efficiency of the setting of the segmentation points, further can effectively improve the accuracy and efficiency of obtaining the optimal binning segmentation point list, and further can effectively improve the reliability, efficiency and automation degree of the binning process of the training data.

In order to provide a parallel computing manner in Spark, in an embodiment of the financial service risk prediction apparatus provided in the present application, the target data set generation module 06 in the financial service risk prediction apparatus is specifically configured to execute the following:

As can be seen from the above description, in the financial service risk prediction apparatus provided in the embodiment of the present application, the RDD of Spark represents the data set in a form that can be computed in parallel, and operations such as Select and Map are executed on it in parallel, so that distributed parallel computation makes large-scale computation fast and efficient.

In order to provide a binary search calculation method in Spark, in an embodiment of the financial service risk prediction apparatus provided in the present application, the target data set generating module 06 in the financial service risk prediction apparatus further includes a binary search unit.

As can be seen from the above description, the financial service risk prediction apparatus provided in the embodiment of the present application can output binning segmentation points according to parameters such as adjustable tree depth and binning segment number, so as to implement optimal classification of discretization of feature data.

In order to perform model training on the classification result, in one embodiment of the financial service risk prediction apparatus provided by the present application, the target data set comprises a training set; referring to fig. 8, the financial service risk prediction apparatus further includes the following contents:

The model training module 07 is configured to train on the training set to obtain a financial service risk prediction model.

From the above description, the financial service risk prediction device provided in the embodiment of the present application can effectively improve the application reliability and stability of the financial service risk prediction model, reduce the loss of the feature information, and enable the data set entering the model to retain more feature information of the data set, thereby facilitating the reduction of the overfitting risk of the model and the improvement of the accuracy and stability of the data model.

In order to perform model testing on the classification result, in one embodiment of the financial service risk prediction apparatus provided by the present application, the target data set further comprises a test set; referring to fig. 9, the financial service risk prediction apparatus further includes the following contents:

The model testing module 08 is configured to test the effect of the financial service risk prediction model on the test set, and to adjust the financial service risk prediction model according to the corresponding test result.

From the above description, the financial service risk prediction device provided in the embodiment of the present application can further improve the application reliability and stability of the financial service risk prediction model, reduce the loss of the feature information, and enable the data set entering the model to retain more feature information of the data set, thereby facilitating the reduction of the overfitting risk of the model and the improvement of the accuracy and stability of the data model.

In order to further explain the scheme, the present application further provides a specific application example in which the financial service risk prediction method is implemented by a financial service risk prediction system. Referring to fig. 10, the system mainly comprises a data wide table unit 1, a preprocessing unit 2, a feature extraction unit 3, a modeling unit 4 and a model evaluation unit 5. The Spark-based decision tree binning process provided by this application example is mainly embodied in the preprocessing unit 2.

In the overall flow, the preprocessing unit 2 sits between the source data import of the data wide table unit 1 and the feature extraction unit 3. It is mainly used for discretizing the continuous features in the data set and outputting the binned, discretized data set to the feature extraction unit 3 for subsequent model training. As shown in fig. 11, the preprocessing unit 2 specifically includes a data set splitting unit 21, a decision tree binning unit 22 and a feature discretization unit 23, wherein:

The data set splitting unit 21 is configured to split the data set in the data wide table unit 1 and output two parts, namely a training set and a test set.

The decision tree binning unit 22 is configured to model the continuous feature columns in the data set with a decision tree algorithm, so as to output the optimal binning rule of each continuous feature column. To avoid feature leakage in subsequent modeling, decision tree binning rules are learned only from the training set data, which keeps the test set independent. The specific implementation of decision tree binning is described below.

The feature discretization unit 23 is configured to discretize the continuous feature columns of the training set and of the test set, respectively, using binary search and Spark operations according to the optimal binning rules learned from the training set by the decision tree binning unit 22, and to output the corresponding discrete feature data sets, namely training set_discrete features and test set_discrete features, as the training set and test set inputs for subsequent feature extraction and modeling.
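The split performed by the data set splitting unit 21 can be sketched as a seeded shuffle followed by a cut at a set proportion. The function name `split_dataset`, the default ratio of 0.8 and the fixed seed are assumptions for illustration; the text only states that the split follows a set proportion.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the rows reproducibly and cut them into (training, test)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    shuffled = list(rows)                    # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = split_dataset(range(10), train_ratio=0.8)
print(len(train), len(test))  # 8 2
```

Binning rules would then be learned from `train` alone and applied unchanged to both `train` and `test`, which is how the test set is kept independent of the rules it is later used to evaluate.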

The main task of the decision tree binning unit 22 is to apply a decision tree algorithm to perform recursive training on training set data, learn and output an optimal binning rule for each feature column. Referring to fig. 12, the decision tree binning unit 22 mainly includes:

The feature column classification unit 22_1 examines all feature columns in the training set wide table one by one to determine their data types, and divides the feature columns into character features and numerical features.

The missing value filling unit 22_2 judges, for each numerical feature column, whether the samples contain missing values and, where they do, calculates the minimum value of the numerical samples of the column and fills the missing values with it.

The all-possible-segmentation-point calculation unit 22_3 de-duplicates all element values of each feature column and sorts them in ascending order to obtain all possible segmentation points of the feature. If the number of possible segmentation points is too large, a certain number of them are randomly selected according to a preset threshold. This effectively reduces the amount of calculation in the subsequent screening of optimal binning segmentation points.

The Gini index calculation unit 22_4 defines the calculation logic of the Gini index (Gini coefficient): given the element set D of a feature column and a segmentation point, it calculates and returns the Gini index of that segmentation point.

The decision tree construction unit 22_5 calls the Gini index calculation unit 22_4 to calculate the Gini indexes of all possible segmentation points according to set parameters such as decision tree depth and number of binning segmentation points, obtains the optimal segmentation point, and splits the training set at it. The decision tree binning unit recurses until the set number of optimal binning segmentation points has been screened out.

The optimal binning storage unit 22_6 stores the calculation result, that is, the optimal binning segmentation point list of all feature columns in the training set, in Spark's tabular data structure DataFrame, and saves it to disk for later use.
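The classification done by the feature column classification unit 22_1 can be sketched with the float-conversion test that step S2 describes. The names `is_numeric_column` and `classify_columns` are hypothetical, and skipping missing entries (`None` or an empty string) when deciding the type is an assumption, since the text does not say how missing values are treated at this stage.

```python
from typing import Dict, List, Sequence, Tuple


def is_numeric_column(column: Sequence[object]) -> bool:
    """True if every non-missing value in the column converts to float."""
    for value in column:
        if value is None or value == "":
            continue  # assumed: missing entries do not decide the type
        try:
            float(value)
        except (TypeError, ValueError):
            return False
    return True


def classify_columns(
    table: Dict[str, Sequence[object]]
) -> Tuple[List[str], List[str]]:
    """Split column names into (numerical features, character features)."""
    numeric: List[str] = []
    character: List[str] = []
    for name, column in table.items():
        (numeric if is_numeric_column(column) else character).append(name)
    return numeric, character


table = {
    "age": ["31", "45", None],
    "city": ["Beijing", "Shanghai", "Shenzhen"],
}
print(classify_columns(table))  # (['age'], ['city'])
```

One consequence of this rule is that a column consisting only of missing entries is classified as numerical, so a real pipeline would probably want to handle that case explicitly.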

The training set and the test set are both subjected to the computation process of the feature discretization unit 23. Referring to fig. 13, the feature discretization unit 23 specifically includes the following contents:

The training set 23_1 and the test set 23_2 are the feature wide tables of the training set and the test set, respectively, output by the data set splitting unit 21.

The optimal binning importing unit 23_3 uses Spark to import and read the binning rule result stored by the optimal binning storage unit 22_6.

The Spark calculation unit 23_4 improves operation efficiency by using the RDD data set representation, which Spark can compute in parallel, together with the Select and Map operations. The Select operation selects all elements of a feature column and the optimal binning list corresponding to that column, and the Map operation performs the binary search on each element in parallel.

The binary search unit 23_5 uses binary search, for a given element of a feature column, to find in the optimal binning list of that column the index number of the bin segment in which the element lies, and replaces the element with this index number, thereby discretizing the continuous feature of the column.

The training set_feature unit 23_6 processes all continuous features in the training set through the Spark calculation unit 23_4 and the binary search unit 23_5 to obtain discretized numerical feature columns, then splices them with the character feature columns that do not need binning, and outputs a complete training data set that can be used directly by the model.

The test set_feature unit 23_7 processes the test set in the same way as the training set_feature unit 23_6, and outputs a complete test data set that can be used directly by the model.

Here, RDD (Resilient Distributed Dataset) is the most basic data abstraction in Spark and represents a data set that can be computed in parallel. Computing on the RDD in parallel greatly improves the query speed of the data, and the Map operation quickly applies the function calculations of the decision tree construction process to each element of the RDD in parallel.

Based on the above, the specific process of implementing the financial service risk prediction method by the financial service risk prediction system is as follows:

S1: splitting the source data imported into the data wide table unit 1 at a set proportion of the total data volume, and outputting a training set and a test set.

S2: attempting to convert every feature column in the training set to floating point numbers, one column at a time, to determine its data type. A column that can be converted is a numerical feature; otherwise it is a character feature.

S3: for each numerical feature column, judging whether the samples contain missing values and, where they do, calculating the minimum value of the numerical samples of the column and filling the missing values with it.

S4: de-duplicating all element values of each numerical feature column and sorting them in ascending order to obtain all possible segmentation points of the feature column. If the number of possible segmentation points is too large, randomly selecting a certain number of them according to a preset threshold. This effectively reduces the amount of calculation in the subsequent screening of optimal binning segmentation points.

S5: defining the calculation logic of the Gini index: given the element set D of a feature column and a segmentation point, calculating and returning the Gini index of that segmentation point.

S6: calculating the Gini indexes of all possible segmentation points according to set parameters such as decision tree depth and number of binning segmentation points, selecting the segmentation point with the smallest Gini index as the optimal segmentation point, and splitting the training set at it. The decision tree binning unit recurses until the set number of optimal binning segmentation points has been screened out.

For a sample set D, the Gini index is calculated as follows:

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$

where $C_k$ is the subset of samples in D that belong to the k-th class, and K is the number of classes (decision trees are mostly two-class).

If the sample set D is divided into two parts $D_1$ and $D_2$ according to whether the feature column A takes a certain possible value a, namely:

$$D_1 = \{(x, y) \in D \mid A(x) = a\}, \quad D_2 = D - D_1$$

then, under the condition of the feature column A, the Gini index of the set D is calculated as:

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

s7: and storing the calculation result, namely the optimal box dividing and splitting point list C of all the feature columns in the training set by using a tabular data structure DataFrame of Spark, and storing the data in a disk for calling.

S8: Use Spark to import and read the binning rule (the optimal binning split point list C) saved in S7.
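
The save-then-reload round trip of S7 and S8 can be pictured with the sketch below. It deliberately uses a JSON file from the Python standard library as a stand-in for the Spark DataFrame storage named in the text, so that it runs without a cluster; the row layout (`feature`, `split_points`) and the function names are assumptions, not the format used in the application.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_split_points(split_points: Dict[str, List[float]], path: str) -> None:
    # One row per feature column, mirroring a two-column table.
    rows = [{"feature": name, "split_points": points}
            for name, points in split_points.items()]
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)


def load_split_points(path: str) -> Dict[str, List[float]]:
    with open(path, encoding="utf-8") as handle:
        rows = json.load(handle)
    return {row["feature"]: row["split_points"] for row in rows}


# Hypothetical binning rule learned on the training set.
rules = {"age": [25.0, 40.0, 60.0], "income": [3000.0, 8000.0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    rule_path = os.path.join(tmp_dir, "split_points.json")
    save_split_points(rules, rule_path)
    restored = load_split_points(rule_path)
```

Keeping the rule on disk means the same split points learned from the training set can later be applied unchanged to the test set.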

S9: Perform operations (1) and (2) below on all continuous features in the training set to obtain the discretized numerical feature columns. Then splice these columns with the character-type feature columns that do not require binning, and output the complete training data set A_TARGET that can be used directly by the model.

S10: Perform operations (1) and (2) below on the test set, and output the complete test data set B_TARGET that can be used directly by the model.

(1) Spark's RDD data set representation, which can be computed in parallel, and its Select and Map operation methods are used to improve operating efficiency. The Select method selects all elements of a feature column together with the optimal bin list corresponding to that column, and the Map method performs the binary search on each element in parallel.

(2) For each element of a feature column, a binary search is used to find, in the optimal bin list of that column, the index of the bin interval in which the element lies; the element is then replaced by this index, which discretizes the continuous feature of the column.
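
Operations (1) and (2) can be sketched as below. Python's built-in `map` stands in for the parallel Map method of a Spark RDD, so the example shows the per-element logic only, not distributed execution. The boundary convention (a value equal to a split point falls into the lower bin) and the names `bin_index` and `discretize_column` are assumptions.

```python
from typing import List, Sequence


def bin_index(value: float, split_points: Sequence[float]) -> int:
    """Binary search for the bin of `value` in an ascending split point list.

    Bin i covers (split_points[i-1], split_points[i]]; values above the last
    split point fall into bin len(split_points).
    """
    low, high = 0, len(split_points)
    while low < high:
        mid = (low + high) // 2
        if split_points[mid] < value:
            low = mid + 1
        else:
            high = mid
    return low


def discretize_column(column: Sequence[float], split_points: Sequence[float]) -> List[int]:
    # Each element is processed independently, which is what makes the
    # operation suitable for a parallel map.
    return list(map(lambda v: bin_index(v, split_points), column))


# Hypothetical optimal split points and feature values.
age_splits = [25, 40, 60]
ages = [18, 25, 26, 40, 59, 61, 75]
age_bins = discretize_column(ages, age_splits)
```

Three split points yield four bins, indexed 0 to 3, and each lookup costs a number of comparisons logarithmic in the length of the split point list.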

The method and related data provided by the application example of the present application are applied to predicting the credit risk rating or default risk probability of individuals or enterprises applying for various financial loans, covering the pre-loan, in-loan and post-loan stages.

The source data are processed with the Spark-based decision tree binning method provided by the application example of the present application to form a feature wide-table data set that contains no missing values, has standardized feature coding, and can be input directly into a model. A machine learning algorithm is then trained on this data set to obtain a classification or regression model for credit risk rating or default risk probability. The machine learning model used may be, but is not limited to, a logistic regression (LR) model, a generalized linear model (GLM), a gradient boosting decision tree (GBDT) model, an XGBoost boosted tree model, and the like.

From the above description, the financial service risk prediction method provided by the application example of the present application has the following advantages:

1. The processing speed for large-scale data is improved. For a large-scale data set with tens of millions of features, operating on the data with the pandas data structure takes a long time and occupies substantial resources. Spark's RDD data set, which can be computed in parallel, together with its operation methods such as Select and Map, achieves fast and efficient large-scale operation through distributed parallel computing.

2. The selection of binning split points is optimized. The decision tree method selects split points according to the label values corresponding to the features, so the features' own information is retained and used to a great extent, and the problems of the linear binning method, such as unstable performance and the subjectivity of manual intervention when choosing split points, are avoided. The Gini indexes of all possible split points of each feature are calculated, and split points are output according to adjustable parameters such as the tree depth and the number of splits, achieving an optimal discretization of the feature data.

3. The accuracy and stability of the model are improved. The binning rule is learned from the training set only, which avoids the possibility of feature leakage. At the same time, the optimal binning achieved by the decision tree algorithm reduces the loss of feature information, so the data set entering the model retains more of its own feature information, which helps reduce the overfitting risk of the model and improve the accuracy and stability of the data model.

4. The model effect was compared before and after binning. Without binning, the AUC value used as the model evaluation index was 0.662; after processing with the binning method, the AUC value rose to 0.754. The Spark-based decision tree binning method therefore plays a significant role in improving the accuracy of the model.
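
For readers unfamiliar with the evaluation index, the sketch below computes AUC from its pairwise definition: the share of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores are toy values invented for the example and have no connection to the 0.662 and 0.754 figures reported above; the function name `auc` is likewise hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels (1 = default, 0 = no default) and predicted default scores.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
toy_auc = auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic pairwise loop is chosen for clarity; a rank-based formula gives the same value more efficiently on large data sets.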

In terms of hardware, in order to solve the problem that the accuracy and efficiency of financial service risk prediction cannot be simultaneously satisfied in the existing financial service risk prediction process, the present application provides an embodiment of an electronic device for implementing all or part of the contents in the financial service risk prediction method, where the electronic device specifically includes the following contents:

Fig. 14 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 14, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, fig. 14 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications or other functions.

In one embodiment, the financial services risk prediction function may be integrated into the central processor. Wherein the central processor may be configured to control:

As can be seen from the above description, the electronic device provided in the embodiment of the present application performs different model predictions for different request types, which can effectively improve the application universality of financial service risk prediction and the efficiency of risk prediction for the various prediction requests. Applying the Spark system to bin the historical financial information of a plurality of users can effectively improve the reliability, efficiency and degree of automation of the binning process for training data. Training the financial service risk prediction model on the data set obtained after Spark-based binning can effectively improve the application reliability and stability of the model and reduces the loss of feature information, so that the data set entering the model retains more of its own feature information; this helps reduce the overfitting risk of the model and improve the accuracy and stability of the data model. Applying the financial service risk prediction model to predict the financial service risk of a target user can effectively improve the accuracy, efficiency and degree of automation of the prediction process, reduce the labor cost and time cost of the financial institution, and improve the accuracy and efficiency with which the financial institution identifies risky users. For example, the credit risk rating or default risk probability of individuals or enterprises applying for various financial loans, covering the pre-loan, in-loan and post-loan stages, can be predicted quickly and accurately, which in turn can effectively improve the operational safety and stability of the financial institution.

In another embodiment, the financial service risk prediction apparatus may be configured separately from the central processor 9100, for example, the financial service risk prediction apparatus may be configured as a chip connected to the central processor 9100, and the financial service risk prediction function is realized by the control of the central processor.

As shown in fig. 14, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 14; further, the electronic device 9600 may further include components not shown in fig. 14, which can be referred to in the related art.

As shown in fig. 14, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.

The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store information relating to failures, and may also store a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, or the like.

The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.

The memory 9140 can be a solid state memory, e.g., a Read Only Memory (ROM), a Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be provided with further data; an example of such a memory is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which is used for storing application programs and function programs, or a flow for executing operations of the electronic device 9600 by the central processor 9100.

The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).

The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.

Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.

Embodiments of the present application further provide a computer-readable storage medium capable of implementing all the steps in the financial service risk prediction method in the foregoing embodiments, where the computer-readable storage medium stores thereon a computer program, and when the computer program is executed by a processor, the computer program implements all the steps of the financial service risk prediction method in which an execution subject is a server or a client, for example, when the processor executes the computer program, the processor implements the following steps:

As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application performs different model predictions for different request types, which can effectively improve the application universality of financial service risk prediction and the efficiency of risk prediction for the various prediction requests. Applying the Spark system to bin the historical financial information of a plurality of users can effectively improve the reliability, efficiency and degree of automation of the binning process for training data. Training the financial service risk prediction model on the data set obtained after Spark-based binning can effectively improve the application reliability and stability of the model and reduces the loss of feature information, so that the data set entering the model retains more of its own feature information; this helps reduce the overfitting risk of the model and improve the accuracy and stability of the data model. Applying the financial service risk prediction model to predict the financial service risk of a target user can effectively improve the accuracy, efficiency and degree of automation of the prediction process, reduce the labor cost and time cost of the financial institution, and improve the accuracy and efficiency with which the financial institution identifies risky users. For example, the credit risk rating or default risk probability of individuals or enterprises applying for various financial loans, covering the pre-loan, in-loan and post-loan stages, can be predicted quickly and accurately, which in turn can effectively improve the operational safety and stability of the financial institution.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A financial services risk prediction method, comprising:

2. The financial services risk prediction method of claim 1, wherein the financial services request type comprises: the user category, the application type, the application state and the request type of the financial loan;

3. The financial services risk prediction method of claim 1, further comprising, prior to the receiving a financial services risk prediction request for a target user:

4. The method according to claim 3, wherein the obtaining a plurality of segmentation points corresponding to each element column in the numerical data set comprises:

5. The method for predicting financial service risk according to claim 3, wherein the generating a target data set corresponding to historical user financial information of a plurality of users based on the optimal bin-dividing point list, each element column and the character data set comprises:

6. The financial services risk prediction method of claim 5, wherein the dichotomy lookup step comprises:

7. The financial services risk prediction method of claim 3, wherein the target data set comprises a training set;

8. The financial services risk prediction method of claim 7, wherein the target data set further comprises a test set;

9. A financial services risk prediction apparatus, comprising:

10. The financial services risk prediction apparatus of claim 9, wherein the financial services request type comprises: the user category, the application type, the application state and the request type of the financial loan;

11. The financial services risk prediction device of claim 9, further comprising:

12. The financial services risk prediction device of claim 11, wherein the cut point acquisition module is configured to perform the following:

13. The financial services risk prediction device of claim 11, wherein the target data set generation module is configured to perform the following:

14. The financial services risk prediction device of claim 13, wherein the target data set generation module further comprises: a binary search unit;

15. The financial services risk prediction device of claim 11, wherein the target data set includes a training set;

16. The financial services risk prediction device of claim 15, wherein the target data set further comprises a test set;

17. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of financial service risk prediction of any one of claims 1 to 8 when executing the program.

18. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the financial services risk prediction method of any one of claims 1 to 8.