CN111815432B

CN111815432B - Financial service risk prediction method and device

Info

Publication number: CN111815432B
Application number: CN202010654184.9A
Authority: CN
Inventors: 相妹; 卢健; 马格
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2023-08-18
Anticipated expiration: 2040-07-08
Also published as: CN111815432A

Abstract

Embodiments of the application provide a financial service risk prediction method and device. The method comprises: receiving a financial service risk prediction request for a target user; selecting, based on a financial service request type, one of a plurality of financial service risk prediction models as a target financial service risk prediction model, wherein each financial service risk prediction model is trained on its own corresponding training set, and each training set is a target data set obtained by applying a Spark system to perform data binning on historical user financial information of a plurality of users; and inputting the user financial information of the target user into the target financial service risk prediction model and taking the output of the model as the financial service risk prediction result of the target user. The application can effectively improve the reliability, efficiency and degree of automation of the binning of training data, and in turn the accuracy, efficiency and degree of automation with which a model trained on the binned data predicts the financial service risk of users of a financial institution.

Description

Financial service risk prediction method and device

Technical Field

The application relates to the technical field of data processing, in particular to a financial service risk prediction method and a financial service risk prediction device.

Background

As financial institutions such as banks offer an increasing variety of financial services to a growing population of target users, they need to assess in advance the risk that providing a given financial service to a given user may carry. At present, financial institutions predict financial service risk automatically by means of machine learning models. However, these models usually require a large amount of historical data for training, and preparing that data requires a large amount of manual data mining and processing. Manual data processing is time-consuming and labor-intensive, and it is also prone to human error.

For this reason, existing financial service risk prediction processes may bin the training data in advance. Common binning methods include supervised methods such as chi-square binning and minimum-entropy binning, and unsupervised methods such as equal-width (equidistant) binning and equal-frequency binning. However, unsupervised binning methods do not take the dependent variable into account, so they offer only limited improvement to the trained machine learning model; on unevenly distributed data sets they yield unstable models and require manual intervention, so the accuracy and degree of automation of financial service risk prediction cannot be guaranteed. Supervised binning methods, in turn, are complex and time-consuming to run. That is, existing financial service risk prediction processes cannot achieve accuracy and efficiency at the same time.

Disclosure of Invention

In view of the problems in the prior art, the application provides a financial service risk prediction method and device that can effectively improve the reliability, efficiency and degree of automation of the binning of training data, and in turn the accuracy, efficiency and degree of automation with which a financial service risk prediction model trained on the binned data predicts the financial service risk of users of a financial institution.

In order to solve the technical problems, the application provides the following technical scheme:

In a first aspect, the present application provides a financial service risk prediction method, including:

receiving a financial service risk prediction request for a target user, wherein the financial service risk prediction request comprises user financial information of the target user and a corresponding financial service request type;

selecting, based on the financial service request type, one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model, wherein each financial service risk prediction model is trained on a corresponding training set, and the training set is a target data set obtained in advance by applying a Spark system to perform data binning on historical user financial information of a plurality of users;

and inputting the user financial information of the target user into the target financial service risk prediction model, and taking the output of the target financial service risk prediction model as the financial service risk prediction result of the target user.

Further, the financial service request type includes: user category, application type of the financial loan, application state, and request type;

correspondingly, the selecting one of a plurality of preset financial service risk prediction models as the target financial service risk prediction model based on the financial service request type comprises:

according to the user category of the target user, the application type of the financial loan, the application state and the request type, selecting the corresponding financial service risk prediction model from a preset model table, and determining it as the target financial service risk prediction model currently corresponding to the target user.

Further, before the receiving the financial service risk prediction request for the target user, the method further includes:

respectively extracting, from historical user financial information of a plurality of users, a numerical data set conforming to numerical features and a character data set conforming to character features;

acquiring a plurality of segmentation points corresponding to each element column in the numerical data set;

applying the Spark system to perform a decision tree calculation on each segmentation point, so as to obtain a Gini coefficient value for each segmentation point;

determining, among the plurality of segmentation points corresponding to each element column, the segmentation point with the smallest Gini coefficient value as the optimal segmentation point of that element column;

generating an optimal binning point list according to the optimal segmentation points of each element column;

and generating target data sets corresponding to the historical user financial information of the plurality of users based on the optimal binning point list, each element column and the character data set.
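The selection of an optimal segmentation point can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the coefficient scored for each segmentation point is the weighted Gini impurity of a binary good/bad label, that values less than or equal to the point go to the left branch, and it runs in plain Python rather than on Spark. The names `gini`, `split_gini` and `best_split` are hypothetical.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels (0 = good, 1 = bad)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def split_gini(values: Sequence[float], labels: Sequence[int], point: float) -> float:
    """Weighted Gini impurity after splitting the column at `point`."""
    if not labels:
        raise ValueError("cannot score a split on an empty column")
    left = [y for x, y in zip(values, labels) if x <= point]
    right = [y for x, y in zip(values, labels) if x > point]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(
    values: Sequence[float], labels: Sequence[int], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return (point, score) for the candidate with the smallest weighted Gini."""
    if not candidates:
        raise ValueError("no candidate segmentation points")
    score, point = min((split_gini(values, labels, c), c) for c in candidates)
    return point, score
```

Each candidate is scored independently of the others, which is what makes the scoring step suitable for parallel execution across candidates or columns.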

Further, the obtaining a plurality of segmentation points corresponding to each element column in the numerical data set includes:

if a search finds a missing value in any element column of the numerical data set, filling the missing value with the minimum value of the element column in which it occurs;

de-duplicating the values in each element column;

sorting the values in each element column in ascending order, and determining each value in the sorted element column as an initial segmentation point;

judging whether the number of initial segmentation points corresponding to each element column is greater than a number threshold; if there is an element column whose number of initial segmentation points is greater than the number threshold, randomly selecting from that element column a number of values equal to the number threshold, and determining the selected values as the segmentation points of that element column;

if there is an element column whose number of initial segmentation points is less than or equal to the number threshold, determining the initial segmentation points in that element column as the segmentation points of that element column.
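The steps above can be sketched for a single element column as follows. This is an illustrative sketch only: missing values are assumed to be represented as `None`, and the function name `candidate_split_points`, the parameter `max_points` (standing in for the number threshold) and the fixed `seed` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def candidate_split_points(
    column: Sequence[Optional[float]],
    max_points: int,
    seed: int = 0,
) -> List[float]:
    """Return the candidate segmentation points for one numeric element column."""
    present = [v for v in column if v is not None]
    if not present:
        return []
    # Fill missing values with the minimum of the column they occur in.
    fill = min(present)
    filled = [fill if v is None else v for v in column]
    # De-duplicate and sort ascending: every distinct value is an initial point.
    initial = sorted(set(filled))
    if len(initial) > max_points:
        # Too many initial points: keep a random sample of exactly max_points.
        rng = random.Random(seed)
        return sorted(rng.sample(initial, max_points))
    # Otherwise every initial point is kept.
    return initial
```

Capping the number of candidates bounds the cost of the later per-point scoring, at the price of possibly missing the exact best cut on high-cardinality columns.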

Further, the generating target data sets corresponding to the historical user financial information of the plurality of users based on the optimal binning point list, each element column and the character data set includes:

applying the Spark system to execute a preset binary search step on each element column in parallel, based on the optimal binning point list, so as to obtain a discretized element column corresponding to each element column;

and splicing the discretized element columns with the character data set to obtain the target data sets corresponding to the historical user financial information of the plurality of users.

Further, the binary search step includes:

applying a binary search algorithm in the Spark system to obtain, for each value in each element column, the index number of the bin interval in the optimal binning point list into which the value falls, and replacing each value in the element column with the index number so obtained.
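The binary search step can be illustrated with Python's standard `bisect` module. The sketch below is an assumption-laden illustration rather than the patented Spark code: it assumes that bin `i` covers the half-open interval `(c[i-1], c[i]]` for sorted cut points `c`, with the first and last bins open-ended, and the function name `discretize` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def discretize(column: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Replace each value with the index of the bin interval it falls into.

    With sorted cut points c0 < c1 < ..., bin 0 is (-inf, c0], bin 1 is
    (c0, c1], and the last bin is (c_last, +inf).
    """
    cuts = sorted(cut_points)
    # bisect_left performs a binary search, so each lookup is O(log k)
    # for k cut points.
    return [bisect_left(cuts, v) for v in column]
```

For example, with cut points `[10, 20, 30]`, the value `20` maps to bin 1 and the value `31` maps to bin 3. Because each value is looked up independently, the same function could be applied to many columns or partitions in parallel.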

Further, the target data set comprises a training set;

correspondingly, before receiving the financial service risk prediction request for the target user, the method further comprises:

and training by using the training set to obtain a financial service risk prediction model.

Further, the target data set further comprises a test set;

correspondingly, after the training set is applied to train to obtain the financial service risk prediction model, the method further comprises the following steps:

and testing the performance of the financial service risk prediction model on the test set, and adjusting the financial service risk prediction model based on the corresponding test result.
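A hold-out evaluation of this kind might look like the sketch below. The text does not specify how the target data set is divided or which metric is used, so the 80/20 shuffle split, the accuracy metric, and the names `split_dataset` and `accuracy` are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[int], int]  # (binned feature indices, label)


def split_dataset(
    rows: Sequence[Row], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Shuffle the target data set and split it into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model: Callable[[Sequence[int]], int], test_rows: Sequence[Row]) -> float:
    """Fraction of test rows for which the model predicts the recorded label."""
    if not test_rows:
        raise ValueError("test set is empty")
    hits = sum(1 for features, label in test_rows if model(features) == label)
    return hits / len(test_rows)
```

A score below an agreed threshold would then trigger an adjustment of the model, for example retraining with different hyperparameters.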

In a second aspect, the present application provides a financial service risk prediction apparatus, comprising:

a request receiving module, configured to receive a financial service risk prediction request for a target user, wherein the financial service risk prediction request comprises user financial information of the target user and a corresponding financial service request type;

a model selection module, configured to select, based on the financial service request type, one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model, wherein each financial service risk prediction model is trained on a corresponding training set, and the training set is a target data set obtained in advance by applying a Spark system to perform data binning on historical user financial information of a plurality of users;

and a risk prediction module, configured to input the user financial information of the target user into the target financial service risk prediction model and take the output of the target financial service risk prediction model as the financial service risk prediction result of the target user.

correspondingly, the request receiving module is used for executing the following contents:

Further, the apparatus further comprises:

a data dividing module, configured to respectively extract, from historical user financial information of a plurality of users, a numerical data set conforming to numerical features and a character data set conforming to character features;

a segmentation point acquisition module, configured to acquire a plurality of segmentation points corresponding to each element column in the numerical data set;

a decision tree calculation module, configured to apply the Spark system to perform a decision tree calculation on each segmentation point, so as to obtain a Gini coefficient value for each segmentation point;

an optimal segmentation point acquisition module, configured to determine, among the plurality of segmentation points corresponding to each element column, the segmentation point with the smallest Gini coefficient value as the optimal segmentation point of that element column;

a list generation module, configured to generate an optimal binning point list according to the optimal segmentation points of each element column;

and a target data set generation module, configured to generate target data sets corresponding to the historical user financial information of the plurality of users based on the optimal binning point list, each element column and the character data set.

Further, the segmentation point acquisition module is configured to execute the following contents:

Further, the target data set generating module is configured to perform the following:

Further, the target data set generating module further includes: a binary search unit;

the binary search unit is configured to perform the binary search step, where the binary search step includes: applying a binary search algorithm in the Spark system to obtain, for each value in each element column, the index number of the bin interval in the optimal binning point list into which the value falls, and replacing each value in the element column with the index number so obtained.

Further, the target data set comprises a training set;

correspondingly, the financial service risk prediction device further comprises:

and the model training module is used for applying the training set to train to obtain the financial service risk prediction model.

Further, the target data set further comprises a test set;

and a model test module, configured to test the performance of the financial service risk prediction model on the test set and to adjust the financial service risk prediction model based on the corresponding test result.

In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the financial service risk prediction method when executing the program.

In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the financial service risk prediction method.

As can be seen from the above technical solutions, the financial service risk prediction method and device provided by the present application receive a financial service risk prediction request for a target user, the request comprising user financial information of the target user and a corresponding financial service request type; select, based on the financial service request type, one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model, wherein each financial service risk prediction model is trained on a corresponding training set, and the training set is a target data set obtained in advance by applying a Spark system to perform data binning on historical user financial information of a plurality of users; and input the user financial information of the target user into the target financial service risk prediction model, taking the output of the model as the financial service risk prediction result of the target user. Using different models for different request types effectively broadens the applicability of financial service risk prediction and improves the efficiency of risk prediction across the various kinds of prediction request. Applying the Spark system to bin the historical financial information of a plurality of users effectively improves the reliability, efficiency and degree of automation of the binning of training data. Training the financial service risk prediction model on the data set binned by the Spark system effectively improves the reliability and stability of the model in use, reduces the loss of feature information so that the data set entering the model retains more of its own feature information, and helps reduce the risk of model overfitting and improve the accuracy and stability of the data model. Applying the financial service risk prediction model to predict the financial service risk of the target user effectively improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the financial institution identifies risky users. For example, credit risk ratings or default risk probabilities can be predicted quickly and accurately, before, during and after the loan, for individual or enterprise users applying to the financial institution for various financial loans, which effectively improves the operational safety and stability of the financial institution.

Drawings

To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of a financial service risk prediction method according to an embodiment of the application.

Fig. 2 is a flowchart of a financial service risk prediction method including step 110 according to an embodiment of the present application.

Fig. 3 is a flowchart illustrating steps 010 to 060 in a financial service risk prediction method according to an embodiment of the present application.

Fig. 4 is a flowchart of a financial service risk prediction method including step 070 according to an embodiment of the application.

Fig. 5 is a flowchart of a financial service risk prediction method including step 080 according to an embodiment of the present application.

Fig. 6 is a schematic diagram of a first configuration of a financial service risk prediction apparatus according to an embodiment of the present application.

Fig. 7 is a schematic diagram of a second configuration of a financial service risk prediction apparatus according to an embodiment of the present application.

Fig. 8 is a schematic diagram of a third configuration of a financial service risk prediction apparatus according to an embodiment of the present application.

Fig. 9 is a schematic diagram of a fourth configuration of a financial service risk prediction apparatus according to an embodiment of the present application.

Fig. 10 is a schematic structural diagram of a financial service risk prediction system provided by an application example of the present application.

Fig. 11 is a schematic structural diagram of a preprocessing unit provided by an application example of the present application.

Fig. 12 is a schematic structural diagram of a decision tree binning unit provided by an application example of the present application.

Fig. 13 is a schematic structural diagram of a feature discretization unit provided by an application example of the present application.

Fig. 14 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Data binning (also known as discrete binning) is a data preprocessing technique and an important data processing operation in the feature engineering stage of data mining. It groups a large number of continuous values into a smaller number of bins, and is used to reduce the influence of minor observation errors, improve model stability and reduce the risk of model overfitting. At present, common binning methods include supervised methods such as chi-square binning and minimum-entropy binning, and unsupervised methods such as equal-width (equidistant) binning and equal-frequency binning.
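The two unsupervised methods named above can be contrasted with a short sketch. It is an illustration only: the function names, the convention that bin `i` covers `(edge[i-1], edge[i]]`, and the sample data are assumptions introduced here.

```python
from bisect import bisect_left
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior bin edges that split [min, max] into n_bins equal-width intervals."""
    if n_bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior bin edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least as many values as bins")
    return [ordered[(n * i) // n_bins - 1] for i in range(1, n_bins)]


def bin_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count how many values fall into each bin (edge[i-1], edge[i]]."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    return counts
```

On the skewed sample `[1, 2, 3, 4, 5, 6, 7, 100]` with four bins, equal-width binning places seven of the eight values in the first bin, while equal-frequency binning places two values in each bin. Neither method looks at the label, which is the limitation of unsupervised binning.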

Unsupervised binning methods do little to improve model performance because they do not take the dependent variable into account, while supervised binning methods such as chi-square binning are complex and time-consuming to run.

To overcome the unstable performance and need for manual intervention of linear binning on unevenly distributed data sets, and to process and model large-scale data sets at higher computing speed, an optimal binning method applied on a fast and efficient general-purpose computing framework is needed.

On this basis, considering that existing financial service risk prediction processes cannot achieve accuracy and efficiency at the same time, embodiments of the application provide a financial service risk prediction method, a financial service risk prediction device, an electronic device and a computer-readable storage medium. A financial service risk prediction request for a target user is received, the request comprising user financial information of the target user and a corresponding financial service request type; one of a plurality of preset financial service risk prediction models is selected as a target financial service risk prediction model based on the financial service request type, wherein each financial service risk prediction model is trained on a corresponding training set, and the training set is a data set obtained in advance by applying a Spark system to perform data binning on historical financial information of a plurality of users; and the user financial information of the target user is input into the target financial service risk prediction model, the output of which is taken as the financial service risk prediction result of the target user. Using different models for different request types effectively broadens the applicability of financial service risk prediction and improves the efficiency of risk prediction across the various kinds of prediction request. Applying the Spark system to bin the historical financial information of a plurality of users effectively improves the reliability, efficiency and degree of automation of the binning of training data. Training the financial service risk prediction model on the data set binned by the Spark system effectively improves the reliability and stability of the model in use, reduces the loss of feature information so that the data set entering the model retains more of its own feature information, and helps reduce the risk of model overfitting and improve the accuracy and stability of the data model. Applying the financial service risk prediction model to predict the financial service risk of the target user effectively improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the financial institution identifies risky users. For example, credit risk ratings or default risk probabilities can be predicted quickly and accurately, before, during and after the loan, for individual or enterprise users applying to the financial institution for various financial loans, which effectively improves the operational safety and stability of the financial institution.

The following examples are given by way of illustration.

To solve the problem that existing financial service risk prediction processes cannot achieve accuracy and efficiency at the same time, the application provides an embodiment of a financial service risk prediction method. Referring to fig. 1, the financial service risk prediction method specifically comprises the following:

Step 100: receiving a financial service risk prediction request for a target user, wherein the financial service risk prediction request comprises user financial information of the target user and a corresponding financial service request type.

It can be understood that the user financial information of the target user includes historical transaction data or transaction requests of the user within a preset time period. Specific examples of user financial information include credit reporting data, asset flow records, tax payment records and other data of an individual or an enterprise.

It can be understood that the financial service request type includes: user category, application type of the financial loan, application state, and request type. Specifically:

The user category may specifically include: personal users, enterprise users, etc.

The application type of the financial loan may specifically include: mortgage loans, housing loans, car loans, home renovation loans, travel loans, business loans, education loans, etc.

The application state of the financial loan may specifically include: pre-loan, mid-loan, post-loan, etc.

The request type may specifically include: credit risk rating requests, default risk probability prediction requests, etc.
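One way to represent such a request in code is sketched below. The class and member names are hypothetical and chosen only to mirror the four fields listed above; the text does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class UserCategory(Enum):
    PERSONAL = "personal user"
    ENTERPRISE = "enterprise user"


class LoanType(Enum):
    MORTGAGE = "mortgage"
    HOUSING = "housing"
    CAR = "car"
    RENOVATION = "renovation"
    TRAVEL = "travel"
    BUSINESS = "business"
    EDUCATION = "education"


class LoanStage(Enum):
    PRE_LOAN = "pre-loan"
    MID_LOAN = "mid-loan"
    POST_LOAN = "post-loan"


class RequestType(Enum):
    CREDIT_RISK_RATING = "credit risk rating"
    DEFAULT_PROBABILITY = "default risk probability"


@dataclass(frozen=True)
class ServiceRequestType:
    """The four-part financial service request type (hashable, usable as a key)."""
    user_category: UserCategory
    loan_type: LoanType
    loan_stage: LoanStage
    request_type: RequestType


@dataclass
class RiskPredictionRequest:
    """A risk prediction request: who it is for, their data, and the request type."""
    user_id: str
    financial_info: Dict[str, float]
    service_type: ServiceRequestType
```

Making `ServiceRequestType` frozen keeps it hashable, so the same object can later serve directly as a lookup key when choosing a model.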

Step 200: selecting, based on the financial service request type, one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model, wherein each financial service risk prediction model is trained on a corresponding training set, and the training set is a target data set obtained in advance by applying a Spark system to perform data binning on historical user financial information of a plurality of users.

In step 200, the Spark system is a fast, general-purpose cluster computing engine for large-scale data. A decision tree algorithm can bin a feature optimally according to the relationship between that feature and the dependent variable. Optimal binning of continuous numerical features is achieved with a decision tree algorithm applied recursively, and the continuous features in a wide business data table are discretized quickly by using Spark's RDD data set representation, operations such as select and map, and binary search, thereby improving model accuracy and speeding up computation on large-scale data.
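The recursive decision-tree binning described here can be sketched for one continuous feature as follows. This is a simplified single-machine illustration, not the Spark implementation: it assumes a binary label, uses weighted Gini impurity as the split criterion, and the stopping parameters `max_depth` and `min_samples` as well as all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[float, int]  # (feature value, binary label)


def _gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def _best_point(pairs: Sequence[Pair]) -> Optional[float]:
    """Cut point with the smallest weighted Gini, or None if no split exists."""
    distinct = sorted({x for x, _ in pairs})
    best: Optional[Tuple[float, float]] = None
    n = len(pairs)
    # The largest value is skipped: cutting there would leave the right side empty.
    for point in distinct[:-1]:
        left = [y for x, y in pairs if x <= point]
        right = [y for x, y in pairs if x > point]
        score = len(left) / n * _gini(left) + len(right) / n * _gini(right)
        if best is None or score < best[1]:
            best = (point, score)
    return None if best is None else best[0]


def tree_bin_points(
    values: Sequence[float],
    labels: Sequence[int],
    max_depth: int = 2,
    min_samples: int = 2,
) -> List[float]:
    """Recursively split one feature and return the sorted list of cut points."""
    points: List[float] = []

    def recurse(subset: List[Pair], depth: int) -> None:
        if depth == 0 or len(subset) < 2 * min_samples:
            return
        if _gini([y for _, y in subset]) == 0.0:
            return  # already pure, nothing to gain from splitting
        point = _best_point(subset)
        if point is None:
            return
        left = [(x, y) for x, y in subset if x <= point]
        right = [(x, y) for x, y in subset if x > point]
        if len(left) < min_samples or len(right) < min_samples:
            return
        points.append(point)
        recurse(left, depth - 1)
        recurse(right, depth - 1)

    recurse(list(zip(values, labels)), max_depth)
    return sorted(points)
```

Since each feature column is binned independently, a distributed engine could apply a function like `tree_bin_points` to every column with a map-style operation; that parallelization is omitted here.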

Step 300: inputting the user financial information of the target user into the target financial service risk prediction model, and taking the output of the target financial service risk prediction model as the financial service risk prediction result of the target user.

It will be appreciated that the financial service risk prediction model may be a classification or regression model for credit risk rating or default risk, obtained by training a machine learning algorithm. It may be, but is not limited to, a logistic regression (LR) model, a generalized linear model (GLM), a gradient boosting decision tree (GBDT) model, an XGBoost boosted tree model, and the like.

As can be seen from the above description, the financial service risk prediction method provided by the embodiment of the present application uses different models for different request types, which effectively broadens the applicability of financial service risk prediction and improves the efficiency of risk prediction across the various kinds of prediction request. Applying the Spark system to bin the historical financial information of a plurality of users effectively improves the reliability, efficiency and degree of automation of the binning of training data. Training the financial service risk prediction model on the data set binned by the Spark system effectively improves the reliability and stability of the model in use, reduces the loss of feature information so that the data set entering the model retains more of its own feature information, and helps reduce the risk of model overfitting and improve the accuracy and stability of the data model. Applying the financial service risk prediction model to predict the financial service risk of the target user effectively improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the financial institution identifies risky users. For example, credit risk ratings or default risk probabilities can be predicted quickly and accurately, before, during and after the loan, for users applying to the financial institution for various financial loans, which effectively improves the operational safety and stability of the financial institution.

In order to provide a preferred way of predicting an application process, in one embodiment of the financial service risk prediction method provided by the present application, the financial service request types include: user category, application type, application state and request type of financial loan; referring to fig. 2, the step 100 in the financial service risk prediction method specifically includes the following:

Step 110: select, from a preset model table, the financial service risk prediction model corresponding to the user category of the target user, the application type of the financial loan, the application state and the request type, and determine that model as the target financial service risk prediction model currently corresponding to the target user.

It can be understood that the model table is used for storing the corresponding relation among the user category of each user, the application type of the financial loan, the application state, the request type and the financial service risk prediction model. See table 1 below for specific examples.

TABLE 1

Based on the above table 1, if the user category in the financial service request types included in the financial service risk prediction request of the currently received target user is an enterprise user, the application type of the financial loan is a car loan, the application state of the financial loan is before the loan, and the request type is a credit risk rating request, the corresponding financial service risk prediction model is found in the model table to be the model 11, and the model 11 is used as the current target financial service risk prediction model of the target user.

In the model table, the financial service risk prediction models differ only in their training data: each model is trained on the historical user data that matches its own user category, financial loan application type, application state and request type in the model table.
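The lookup in step 110 can be illustrated with a minimal sketch. The key fields follow the four request-type attributes named above; the class name `RequestType`, the function `select_model`, the string values, and the second table entry are assumptions made for illustration, and only the first entry mirrors the "model 11" example in the text.

```python
from typing import Dict, NamedTuple


class RequestType(NamedTuple):
    """The four attributes that together identify one row of the model table."""
    user_category: str      # e.g. "enterprise" or "individual"
    loan_type: str          # e.g. "car_loan"
    application_state: str  # "pre_loan", "in_loan" or "post_loan"
    request_type: str       # e.g. "credit_risk_rating"


# Hypothetical model table. Only the first row mirrors the example in the text.
MODEL_TABLE: Dict[RequestType, str] = {
    RequestType("enterprise", "car_loan", "pre_loan", "credit_risk_rating"): "model_11",
    RequestType("individual", "car_loan", "pre_loan", "default_risk_probability"): "model_12",
}


def select_model(request: RequestType) -> str:
    """Return the identifier of the model registered for this request type."""
    try:
        return MODEL_TABLE[request]
    except KeyError:
        raise LookupError(f"no model registered for {request}") from None


target = select_model(
    RequestType("enterprise", "car_loan", "pre_loan", "credit_risk_rating")
)
```

Using the whole attribute tuple as the key keeps the selection a single constant-time lookup; a request type with no registered model fails loudly instead of falling back to an unrelated model.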

As can be seen from the above description, the financial service risk prediction method provided by the embodiment of the present application can further broaden the applicability of financial service risk prediction and improve the efficiency of risk prediction for various prediction requests. It can quickly and accurately predict credit risk ratings or default risk probabilities, before, during and after the loan, for individual or enterprise users of the financial institution who apply for various financial loans, and thereby improve the operational safety and stability of the financial institution.

In order to provide a decision tree application in Spark system, in one embodiment of the financial service risk prediction method provided by the present application, referring to fig. 3, before step 100 in the financial service risk prediction method, the method further specifically includes the following:

Step 010: extract, from the historical user financial information of a plurality of users, a numerical data set conforming to numerical type features and a character data set conforming to character type features.

Step 020: acquire a plurality of segmentation points corresponding to each element column in the numerical data set.

Step 030: apply the Spark system to perform decision tree calculation on each segmentation point to obtain the Gini coefficient value of each segmentation point.

Step 040: for each element column, determine the segmentation point with the smallest Gini coefficient value among the plurality of segmentation points corresponding to that element column as the optimal segmentation point of that element column.

Step 050: generate an optimal binning segmentation point list from the optimal segmentation points of the element columns.

Step 060: generate the target data set corresponding to the historical user financial information of the plurality of users based on the optimal binning segmentation point list, the element columns and the character data set.

As can be seen from the above description, in the financial service risk prediction method provided by the embodiment of the present application, the decision tree method selects the segmentation points according to the label values corresponding to the features, so that the information carried by the features themselves is retained and used to a great extent, and the unstable performance and subjective manual intervention associated with selecting segmentation points by linear binning methods are avoided. The Gini index is calculated for all possible segmentation points of each feature, and the segmentation points are output according to adjustable parameters such as the tree depth and the number of bins, which realizes an optimal classification for discretizing the feature data.
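Steps 030 and 040 can be sketched on a single machine as follows. This is an illustrative sketch, not the Spark implementation: the function names are hypothetical, the labels are assumed to be class labels such as 0 and 1, and the convention that values less than or equal to the cut go to the left part is an assumption suited to continuous columns.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini index of a label set: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values: Sequence[float], labels: Sequence[int], cut: float) -> float:
    """Size-weighted Gini index of the two parts produced by one cut point."""
    left = [y for x, y in zip(values, labels) if x <= cut]   # assumed convention
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_cut(values: Sequence[float], labels: Sequence[int],
             candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (cut point, Gini value) for the candidate with the smallest Gini."""
    score, cut = min((split_gini(values, labels, c), c) for c in candidates)
    return cut, score


values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
optimal = best_cut(values, labels, candidates=values)
```

In this toy column the cut at 3 separates the two classes completely, so it has the smallest Gini value and is chosen as the optimal segmentation point.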

In order to provide a preferable manner of setting the cut point, in one embodiment of the financial service risk prediction method provided by the present application, the step 020 in the financial service risk prediction method specifically includes the following steps:

Step 021: if a search finds that an element column in the numerical data set contains missing values, fill each missing value with the minimum value of the element column in which it occurs.

Step 022: de-duplicate the numerical values in each element column.

Step 023: sort the numerical values in each element column from small to large, and determine each numerical value in the sorted element column as an initial segmentation point.

Step 024: judge whether the number of initial segmentation points corresponding to each element column is greater than a number threshold. For an element column whose number of initial segmentation points is greater than the number threshold, execute step 025: randomly select from the element column a number of values equal to the number threshold, and determine them as the segmentation points of the element column.

For an element column whose number of initial segmentation points is less than or equal to the number threshold, execute step 026: determine the initial segmentation points of the element column as the segmentation points of the element column.

From the above description, it can be seen that the financial service risk prediction method provided by the embodiment of the present application can effectively improve the reliability and efficiency of the segmentation point setting, further can effectively improve the accuracy and efficiency of obtaining the optimal box-dividing segmentation point list, and further can effectively improve the reliability, efficiency and automation degree of the box-dividing process of training data.
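Steps 021 to 026 can be sketched for one element column as follows. The function name `candidate_cut_points`, the use of `None` to mark a missing value, and the fixed random seed are assumptions made for illustration.

```python
import random
from typing import List, Optional


def candidate_cut_points(column: List[Optional[float]], threshold: int,
                         seed: int = 0) -> List[float]:
    """Build the segmentation points of one numerical element column."""
    present = [v for v in column if v is not None]
    if not present:
        return []
    # Step 021: fill missing values with the column minimum.
    minimum = min(present)
    filled = [minimum if v is None else v for v in column]
    # Steps 022 and 023: de-duplicate, then sort from small to large.
    points = sorted(set(filled))
    # Steps 024 and 025: if there are too many points, sample `threshold` of them.
    if len(points) > threshold:
        points = sorted(random.Random(seed).sample(points, threshold))
    # Step 026: otherwise every initial point is kept.
    return points


small = candidate_cut_points([3.0, None, 1.0, 3.0, 2.0], threshold=10)
large = candidate_cut_points([float(i) for i in range(100)], threshold=5)
```

Capping the number of candidates bounds the cost of the later Gini calculation per column, at the price of possibly missing the exact best cut in columns with many distinct values.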

In order to provide a parallel computing manner in Spark, in one embodiment of the financial service risk prediction method provided by the present application, step 060 in the financial service risk prediction method specifically includes the following:

Step 061: apply the Spark system to execute a preset binary search step on each element column in parallel, based on the optimal binning segmentation point list, to obtain the discretized element column corresponding to each element column.

Step 062: splice the discretized element columns with the character data set to obtain the target data set corresponding to the historical user financial information of the plurality of users.

As can be seen from the above description, in the financial service risk prediction method provided by the embodiment of the present application, Spark's RDD is a data set that can be computed in parallel, and its Select, Map and other operation methods run in parallel; this distributed parallel computation makes large-scale operations fast and efficient.
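Steps 061 and 062 can be illustrated with a minimal single-machine sketch. It uses Python's standard `bisect` module in place of the preset binary search step and plain comprehensions in place of Spark's parallel Map, so it shows the logic only, not the distributed execution. The function names, the column names, and the convention that a value equal to a cut point falls into the lower bin are assumptions.

```python
from bisect import bisect_left
from typing import Dict, List


def discretize_column(values: List[float], cut_points: List[float]) -> List[int]:
    """Replace each value with the index of the bin it falls into.

    `cut_points` must be sorted. A value equal to a cut point goes to the
    lower bin (assumed convention).
    """
    return [bisect_left(cut_points, v) for v in values]


def build_target_rows(numeric_cols: Dict[str, List[float]],
                      cut_lists: Dict[str, List[float]],
                      char_cols: Dict[str, List[str]]) -> List[dict]:
    """Discretize every numerical column, then splice with the character columns."""
    binned = {name: discretize_column(col, cut_lists[name])
              for name, col in numeric_cols.items()}
    merged = {**binned, **char_cols}
    n_rows = len(next(iter(merged.values())))
    return [{name: col[i] for name, col in merged.items()} for i in range(n_rows)]


bins = discretize_column([1, 3, 4, 10], cut_points=[3, 7])
rows = build_target_rows(
    numeric_cols={"income": [1.0, 8.0]},
    cut_lists={"income": [3.0, 7.0]},
    char_cols={"region": ["north", "south"]},
)
```

Because each element is looked up independently, the per-element function is the kind of work that a parallel Map over the column can distribute.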

In order to provide a binary search calculation method in Spark, in one embodiment of the financial service risk prediction method provided by the present application, the binary search step in the financial service risk prediction method specifically includes the following steps:

From the above description, it can be seen that the financial service risk prediction method provided by the embodiment of the application can output the box division points according to the parameters such as the adjustable tree depth and the box division number, so as to realize the optimal classification of the discretization of the feature data.

In order to train a model on the binning result, in one embodiment of the financial service risk prediction method provided by the application, the target data set comprises a training set; referring to fig. 4, after step 060 and before step 100 in the financial service risk prediction method, the following are specifically included:

Step 070: train a financial service risk prediction model using the training set.

From the above description, the financial service risk prediction method provided by the embodiment of the application can effectively improve the reliability and stability of the financial service risk prediction model in application, reduce the loss of feature information, and let the data set entering the model retain more of its own feature information, which helps to reduce the risk of model overfitting and to improve the accuracy and stability of the model.

In order to test the model on the binning result, in one embodiment of the financial service risk prediction method provided by the application, the target data set further comprises a test set; referring to fig. 5, after step 070 and before step 100 in the financial service risk prediction method, the following are specifically included:

Step 080: test the effect of the financial service risk prediction model on the test set, and adjust the financial service risk prediction model based on the corresponding effect test result.

From the above description, the financial service risk prediction method provided by the embodiment of the application can further improve the reliability and stability of the financial service risk prediction model in application, reduce the loss of feature information, and let the data set entering the model retain more of its own feature information, which helps to reduce the risk of model overfitting and to improve the accuracy and stability of the model.

In order to solve the problem that the accuracy and efficiency of the existing financial service risk prediction process cannot be simultaneously satisfied, the present application provides an embodiment of a financial service risk prediction device for executing all or part of the contents in the financial service risk prediction method, referring to fig. 6, the financial service risk prediction device specifically includes the following contents:

the request receiving module 10 is configured to receive a financial service risk prediction request for a target user, where the financial service risk prediction request includes user financial information of the target user and a corresponding financial service request type.

It can be understood that the user financial information of the target user includes historical transaction data or transaction requests of the user within a preset time period.

The model selection module 20 is configured to select one of a plurality of preset financial service risk prediction models as a target financial service risk prediction model based on the financial service request type, where each financial service risk prediction model is trained by applying a training set corresponding to each financial service risk prediction model, and the training set is a target data set obtained by pre-applying a Spark system to perform data binning on historical user financial information of a plurality of users.

The risk prediction module 30 is configured to input user financial information of the target user into the target financial service risk prediction model, and take output of the target financial service risk prediction model as a financial service risk prediction result of the target user.

As can be seen from the above description, the financial service risk prediction device provided by the embodiment of the present application selects a different prediction model for each request type, which broadens the applicability of financial service risk prediction and improves the efficiency of risk prediction for the various prediction requests. Binning the historical financial information of a plurality of users with the Spark system improves the reliability, efficiency and degree of automation of the data binning process used for training. Training the financial service risk prediction model on the data set binned by the Spark system improves the reliability and stability of the model in application, reduces the loss of feature information, and lets the data set entering the model retain more of its own feature information, which helps to reduce the risk of model overfitting and to improve the accuracy and stability of the model. Applying the financial service risk prediction model to predict the financial service risk of the target user improves the accuracy, efficiency and degree of automation of the prediction process, reduces the labor and time costs of the financial institution, and improves the accuracy and efficiency with which the financial institution identifies risky users. For example, credit risk ratings or default risk probabilities can be predicted quickly and accurately, before, during and after the loan, for users of the financial institution who apply for various financial loans, which improves the operational safety and stability of the financial institution.

In order to provide a preferred way of predicting an application process, in one embodiment of the financial service risk prediction apparatus provided by the present application, the financial service request type includes: user category, application type, application state and request type of financial loan; the request receiving module 10 in the financial service risk prediction apparatus is configured to perform the following:

As can be seen from the above description, the financial service risk prediction device provided by the embodiment of the present application can further broaden the applicability of financial service risk prediction and improve the efficiency of risk prediction for various prediction requests. It can quickly and accurately predict credit risk ratings or default risk probabilities, before, during and after the loan, for individual or enterprise users of the financial institution who apply for various financial loans, and thereby improve the operational safety and stability of the financial institution.

In order to provide a decision tree application in Spark system, in an embodiment of the financial service risk prediction device provided by the present application, referring to fig. 7, the financial service risk prediction device further specifically includes the following contents:

The data dividing module 01 is configured to extract, from the historical user financial information of a plurality of users, a numerical data set conforming to numerical type features and a character data set conforming to character type features.

The segmentation point acquisition module 02 is configured to acquire a plurality of segmentation points corresponding to each element column in the numerical data set.

The decision tree calculation module 03 is configured to apply the Spark system to perform decision tree calculation on each segmentation point to obtain the Gini coefficient value of each segmentation point.

The optimal segmentation point obtaining module 04 is configured to determine, for each element column, the segmentation point with the smallest Gini coefficient value among the plurality of segmentation points corresponding to that element column as the optimal segmentation point of that element column.

The list generation module 05 is configured to generate an optimal binning segmentation point list from the optimal segmentation points of the element columns.

The target data set generating module 06 is configured to generate the target data set corresponding to the historical user financial information of the plurality of users based on the optimal binning segmentation point list, the element columns and the character data set.

As can be seen from the above description, in the financial service risk prediction device provided by the embodiment of the application, the decision tree method selects the segmentation points according to the label values corresponding to the features, so that the information carried by the features themselves is retained and used to a great extent, and the unstable performance and subjective manual intervention associated with selecting segmentation points by linear binning methods are avoided. The Gini index is calculated for all possible segmentation points of each feature, and the segmentation points are output according to adjustable parameters such as the tree depth and the number of bins, which realizes an optimal classification for discretizing the feature data.

In order to provide a preferred manner of setting the cut point, in one embodiment of the financial service risk prediction device provided by the present application, the cut point obtaining module 02 in the financial service risk prediction device is specifically configured to execute the following:

As can be seen from the above description, the financial service risk prediction device provided by the embodiment of the present application can effectively improve the reliability and efficiency of the segmentation point setting, further can effectively improve the accuracy and efficiency of obtaining the optimal box-dividing segmentation point list, and further can effectively improve the reliability, efficiency and automation degree of the box-dividing process of training data.

In order to provide a parallel computing manner in Spark, in one embodiment of the financial service risk prediction device provided by the present application, the target data set generating module 06 in the financial service risk prediction device is specifically configured to perform the following:

As can be seen from the above description, in the financial service risk prediction device provided by the embodiment of the application, Spark's RDD is a data set that can be computed in parallel, and its Select, Map and other operation methods run in parallel; this distributed parallel computation makes large-scale operations fast and efficient.

In order to provide a binary search calculation method in Spark, in one embodiment of the financial service risk prediction apparatus provided by the present application, the target data set generating module 06 in the financial service risk prediction apparatus further specifically includes the following contents: and a binary search unit.

As can be seen from the above description, the financial service risk prediction device provided by the embodiment of the present application can output the box division points according to the parameters such as the adjustable tree depth and the box division number, so as to implement the optimal classification of the discretization of the feature data.

In order to train a model on the binning result, in one embodiment of the financial service risk prediction apparatus provided by the present application, the target data set comprises a training set; referring to fig. 8, the financial service risk prediction apparatus further specifically includes the following:

The model training module 07 is configured to train a financial service risk prediction model using the training set.

From the above description, it can be seen that the financial service risk prediction device provided by the embodiment of the application can effectively improve the reliability and stability of the financial service risk prediction model in application, reduce the loss of feature information, and let the data set entering the model retain more of its own feature information, which helps to reduce the risk of model overfitting and to improve the accuracy and stability of the model.

In order to test the model on the binning result, in one embodiment of the financial service risk prediction apparatus provided by the present application, the target data set further comprises a test set; referring to fig. 9, the financial service risk prediction apparatus further specifically includes the following:

The model test module 08 is configured to test the effect of the financial service risk prediction model on the test set and to adjust the financial service risk prediction model based on the corresponding effect test result.

As can be seen from the above description, the financial service risk prediction device provided by the embodiment of the application can further improve the reliability and stability of the financial service risk prediction model in application, reduce the loss of feature information, and let the data set entering the model retain more of its own feature information, which helps to reduce the risk of model overfitting and to improve the accuracy and stability of the model.

In order to further explain the scheme, the application also provides a specific application example for realizing the financial service risk prediction method by applying a financial service risk prediction system, and referring to fig. 10, the financial service risk prediction system mainly comprises a data large-width table unit 1, a preprocessing unit 2, a feature extraction unit 3, a modeling unit 4 and a model evaluation unit 5. The Spark-based decision tree binning process provided by the application example of the application is mainly embodied in the preprocessing unit 2.

In the overall flow, the preprocessing unit 2 sits between the source data import of the data large-width table unit 1 and the feature extraction unit 3. Its main task is to discretize the continuous features in the data set and then output the binned, discretized data set to the feature extraction unit 3 for subsequent model training. As shown in fig. 11, it specifically includes a data set splitting unit 21, a decision tree binning unit 22 and a feature discretizing unit 23. Wherein:

The data set splitting unit 21 is configured to split the data set in the data large-width table unit 1, and output two parts, namely a training set and a testing set.

The decision tree binning unit 22 is configured to model the continuous feature columns in the data set with a decision tree algorithm and to output the optimal binning rule of each continuous feature column. To avoid possible feature leakage in subsequent modeling, decision tree binning rules are learned only from the training set data, which keeps the test set independent. The concrete binning implementation of the decision tree algorithm is described below.

The feature discretization unit 23 is configured to discretize the continuous feature columns of the training set and the test set, using binary search and Spark operation methods, according to the optimal binning rule learned from the training set in the decision tree binning unit 22, and to output the corresponding discrete feature data sets, namely training set_discrete feature and test set_discrete feature, as the training set and test set inputs for subsequent feature extraction and modeling.
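The split performed by the data set splitting unit 21 can be sketched as follows. The function name `split_dataset`, the shuffle before splitting, the default ratio and the seed are all assumptions; the text only states that the data set is split into a training set and a test set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(rows: Sequence[T], train_ratio: float = 0.7,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle the rows reproducibly and split them by the given proportion."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


train_rows, test_rows = split_dataset(list(range(10)), train_ratio=0.5)
```

Splitting before any binning rule is learned is what allows the rule to be fitted on the training rows only and then applied unchanged to the test rows.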

The main task of the decision tree binning unit 22 is to apply a decision tree algorithm for recursive training on the training set data, learn and output the optimal binning rules for each feature column. Referring to fig. 12, the decision tree binning unit 22 mainly comprises:

The feature column classification unit 22_1 traverses all feature columns in the training set large-width table, judges the data type of each feature column one by one, and classifies all feature columns into two types: character type features and numerical type features.

The missing value filling unit 22_2 determines, for each numerical type feature column, whether the samples contain missing values; if so, it calculates the minimum value of the numerical samples in that column and fills the missing values with it.

The all possible segmentation points calculation unit 22_3 de-duplicates all element values of each feature column and sorts them from small to large to obtain all possible segmentation points of the feature. If the number of possible segmentation points is excessive, a certain number of them can be randomly selected according to a preset threshold, which effectively reduces the amount of calculation in the subsequent screening of optimal binning segmentation points.

The Gini index calculating unit 22_4 is configured to define the calculation logic of the Gini index (Gini coefficient): given the element set D of a feature column and a segmentation point, it calculates and returns the Gini index of that segmentation point.

The decision tree construction unit 22_5 calls the Gini index calculating unit 22_4, according to set parameters such as the depth of the decision tree and the number of binning segmentation points, to calculate the Gini indexes of all possible segmentation points, selects the optimal segmentation point, and splits the training set at that point. The decision tree binning unit processes recursively until the set number of optimal binning segmentation points has been screened out.

The optimal binning storage unit 22_6 stores the calculation result, that is, the optimal binning point list of all feature columns in the training set, in a table data structure DataFrame of Spark, and stores the result to a disk for calling.
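The recursive interplay of the Gini index calculating unit 22_4 and the decision tree construction unit 22_5 can be sketched for a single feature column as follows. This is a depth-limited illustration in plain Python rather than the Spark implementation: the function names are hypothetical, the "values less than or equal to the cut go left" convention is an assumption, and the cap on the total number of binning segmentation points mentioned in the text is omitted for brevity.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini index of a label set."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def tree_cut_points(pairs: List[Tuple[float, int]], candidates: Sequence[float],
                    max_depth: int) -> List[float]:
    """Recursively pick the lowest-Gini cut, then recurse into both parts."""
    if max_depth <= 0 or len({y for _, y in pairs}) < 2:
        return []  # depth exhausted or the part is already pure
    best = None
    for cut in candidates:
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        if not left or not right:
            continue  # this cut does not actually split the current part
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, cut)
    if best is None:
        return []
    cut = best[1]
    lower = [(x, y) for x, y in pairs if x <= cut]
    upper = [(x, y) for x, y in pairs if x > cut]
    return (tree_cut_points(lower, candidates, max_depth - 1)
            + [cut]
            + tree_cut_points(upper, candidates, max_depth - 1))


pairs = list(zip([1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 1, 1, 0, 0]))
cuts_depth_1 = tree_cut_points(pairs, [1, 2, 3, 4, 5, 6, 7], max_depth=1)
cuts_depth_2 = tree_cut_points(pairs, [1, 2, 3, 4, 5, 6, 7], max_depth=2)
```

Each extra level of depth can at most double the number of bins, which is why the tree depth acts as an adjustable control on how finely a column is discretized.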

Both the training set and the test set are subjected to a calculation process by the feature discretization unit 23. Referring to fig. 13, the feature discretization unit 23 specifically includes the following:

the training set 23_1 and the test set 23_2 respectively obtain the feature large-width tables of the training set and the test set output by the data set splitting unit 21.

The optimum binning import unit 23_3 imports and reads the binning rule result held by the optimum binning storage unit 22_6 using Spark.

The Spark calculation unit 23_4 uses the parallel-computable RDD data set representation of Spark and its Select and Map operation methods to improve the operation efficiency. The Select method screens out all elements of a given feature column and the optimal binning list corresponding to that feature column, and the Map method performs a parallel binary search for each element.

For a given element of a given feature column, the binary search unit 23_5 uses binary search to find, in the optimal binning list of that column, the index number of the bin interval in which the element lies, and replaces the element with that index number, thereby discretizing the continuous feature of the column.

The training set_feature unit 23_6 applies the operations of the Spark calculation unit 23_4 and the binary search unit 23_5 to all continuous features in the training set to obtain discretized numerical feature columns. These are then spliced with the character type feature columns, which do not need binning, to output a complete training data set that can be used directly by the model.

The test set_feature unit 23_7 performs operation processing consistent with the training set_feature unit 23_6 on the test set, and outputs a complete test data set which can be directly used by the model.

Here, RDD (Resilient Distributed Dataset) is the most basic data abstraction in Spark and is a data set that can be computed in parallel. Using RDDs greatly improves data query speed through parallel computation: with the Map operation, the function computations in the decision tree construction process are completed quickly for each element of the parallel-computable RDD.

Based on the above, the specific process of implementing the financial service risk prediction method through the financial service risk prediction system is as follows:

S1: Split the source data imported by the data large-width table unit 1 according to the total amount of data and a set proportion, and output two parts: a training set and a test set.

S2: Attempt a floating-point conversion on each feature column in the training set, one by one, to judge its data type. A column that can be converted is a numerical type feature; otherwise it is a character type feature.

S3: For each numerical type feature column, judge whether the samples contain missing values; if so, calculate the minimum value of the numerical samples in the column and fill the missing values with it.

S4: De-duplicate all element values of each numerical feature column and sort them from small to large to obtain all possible segmentation points of the feature column. If the number of possible segmentation points is excessive, a certain number of them can be randomly selected according to a preset threshold, which effectively reduces the amount of calculation in the subsequent screening of optimal binning segmentation points.

S5: Define the calculation logic of the Gini index: given the element set D of a feature column and a segmentation point, calculate and return the Gini index of that segmentation point.

S6: According to set parameters such as the depth of the decision tree and the number of binning segmentation points, calculate the Gini index for all possible segmentation points, select the segmentation point with the smallest Gini index as the optimal segmentation point, and split the training set at it. The decision tree binning unit processes recursively until the set number of optimal binning segmentation points has been screened out.

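For a sample set D, the Gini index is calculated, following the standard CART definition, as:

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$

where C_k is the subset of samples in D belonging to the k-th class and K is the number of classes (decision trees here are mostly two-class).

If the sample set D is divided into two parts D_1 and D_2 according to whether the feature column A takes a certain possible value a, namely:

$$D_1 = \{(x, y) \in D \mid A(x) = a\}, \quad D_2 = D - D_1$$

then, under the condition of the feature column A, the Gini index of the set D is:

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$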

S7: Store the calculation result, namely the optimal binning segmentation point list C of all feature columns in the training set, in Spark's tabular data structure DataFrame, and save it to disk for later use.

S8: Import and read the stored binning rule result (the optimal binning segmentation point list C) using Spark.

S9: Apply the following operations (1) and (2) to all continuous features in the training set to obtain discretized numerical feature columns. Then splice them with the character type feature columns, which do not need binning, and output a complete training data set A_TARGET that can be used directly by the model.

S10: Apply the following operations (1) and (2) to the test set, and output a complete test data set B_TARGET that can be used directly by the model.

(1) The RDD data set representation and Select and Map operation methods which can be calculated in parallel by Spark are used for improving the operation efficiency. The selection method realizes screening of all elements in a certain characteristic column and screening of an optimal bin list corresponding to the characteristic column, and the Map method realizes parallel binary search of each element.

(2) And aiming at a certain element of a certain characteristic column, using a binary search method to correspondingly find out the index number of the box section where the element is positioned from the optimal box division list of the column, and replacing the element with the index number to realize discretization of the continuous characteristic of the column.
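Operations (1) and (2) can be sketched with Python's standard `bisect` module. This is a single-machine illustration under assumptions: the names `bin_index` and `discretize` are hypothetical, the bin boundaries are taken as right-closed intervals, and the built-in `map` stands in for the parallel Map operation that Spark would apply to an RDD.

```python
from bisect import bisect_left

def bin_index(value, cut_points):
    """Index of the bin containing value, found by binary search.

    cut_points must be sorted ascending. With cut points [c0, c1, ...],
    bin 0 is (-inf, c0], bin 1 is (c0, c1], and so on (assumed convention).
    """
    return bisect_left(cut_points, value)

def discretize(column, cut_points):
    """Replace every element of a feature column with its bin index."""
    return list(map(lambda v: bin_index(v, cut_points), column))

cuts = [3.0, 5.0]  # hypothetical optimal binning list for one column
print(discretize([1.0, 3.0, 4.2, 5.0, 9.9], cuts))  # [0, 0, 1, 1, 2]
```

Because each element is looked up independently of the others, the per-element function is well suited to a parallel map over a distributed data set.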

The method and related data provided by this application example are applied to credit risk ratings or default risk probabilities, covering the pre-loan, in-loan and post-loan stages, of individuals or enterprises applying for various financial loans.

The Spark decision tree-based binning method provided by this application example processes source data to form a feature wide-table data set that contains no missing values, has standardized feature encoding, and can be input directly into a model. A classification or regression model for credit risk rating or default risk probability is then obtained by training with a machine learning algorithm. The machine learning model used may be, but is not limited to: a logistic regression (LR) model, a generalized linear regression (GLM) model, a gradient boosting decision tree (GBDT) model, an XGBoost boosted tree model, and the like.

As can be seen from the above description, the financial service risk prediction method provided by the application example of the present application has the following advantages:

1. The processing speed of large-scale data is improved. For large-scale data sets with millions of features, operating with the pandas data structure is time-consuming and resource-intensive. Spark's RDD data set and its operation methods such as Select and Map can be computed in parallel, so distributed parallel computation achieves fast and efficient large-scale operation.

2. The selection of binning points is optimized. The decision tree method selects segmentation points according to the label values corresponding to the features, so the features' own information is largely retained and used, and the problems of unstable performance and subjective manual intervention that arise when segmentation points are chosen by linear binning methods are avoided. The Gini index is calculated for all possible segmentation points of each feature, and segmentation points are output according to adjustable parameters such as tree depth and number of segments, achieving optimal classification in the discretization of feature data.

3. The accuracy and stability of the model are improved. The binning rule learning part uses only the training set, which avoids the possibility of feature leakage. At the same time, the optimal binning achieved by the decision tree algorithm reduces the loss of feature information, so the data set entering the model retains more of its own feature information, which helps reduce the risk of model overfitting and improve the accuracy and stability of the data model.

4. The model effect is improved, as shown by comparing the model before and after binning: without binning, the model evaluation metric AUC is 0.662; after processing with the binning method, the AUC rises to 0.754. The Spark-based decision tree binning method therefore markedly improves model accuracy.

In order to solve, from the hardware aspect, the problem that accuracy and efficiency cannot both be achieved in existing financial service risk prediction processes, the application provides an embodiment of an electronic device for implementing all or part of the financial service risk prediction method. The electronic device specifically comprises the following:

Fig. 14 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in Fig. 14, the electronic device 9600 may include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, Fig. 14 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications functions or other functions.

In one embodiment, the financial service risk prediction function may be integrated into the central processor. Wherein the central processor may be configured to control:

As can be seen from the above description, the electronic device provided by the embodiment of the present application performs different model predictions for different request types, which can effectively improve the application universality of financial service risk prediction and the efficiency of risk prediction for various prediction requests. Applying the Spark system to bin the historical financial information of a plurality of users can effectively improve the reliability, efficiency and degree of automation of the data binning process used for training. Training the financial service risk prediction model on the data set binned by the Spark system can effectively improve the application reliability and stability of the model, reduces the loss of feature information so that the data set entering the model retains more of its own feature information, and helps reduce the risk of model overfitting and improve the accuracy and stability of the data model. Applying the financial service risk prediction model to predict the financial service risk of the target user can effectively improve the accuracy, efficiency and degree of automation of the financial service risk prediction process, effectively reduce the labor and time costs of the financial institution, and effectively improve the accuracy and efficiency with which the financial institution identifies risky users. For example, credit risk ratings or default risk probabilities covering the pre-loan, in-loan and post-loan stages can be predicted quickly and accurately for users applying to the financial institution for various financial loans, which effectively improves the operational safety and stability of the financial institution.

In another embodiment, the financial service risk prediction apparatus may be configured separately from the central processor 9100, for example, the financial service risk prediction apparatus may be configured as a chip connected to the central processor 9100, and the financial service risk prediction function is implemented by control of the central processor.

As shown in Fig. 14, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 need not include all of the components shown in Fig. 14; in addition, the electronic device 9600 may further include components not shown in Fig. 14, for which reference may be made to the related art.

As shown in Fig. 14, the central processor 9100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device; the central processor 9100 receives inputs and controls the operation of the various components of the electronic device 9600.

The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store the information about failure, as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, and the like.

The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.

The memory 9140 may be a solid state memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), a SIM card, etc. It may also be a memory that retains information even when powered down and that can be selectively erased and provided with further data, an example of which is sometimes referred to as an EPROM or the like. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which stores application programs and function programs, or a flow for executing operations of the electronic device 9600 by the central processor 9100.

The memory 9140 may also include a data store 9143, the data store 9143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).

The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. A communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, as in the case of conventional mobile communication terminals.

Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and to receive audio input from the microphone 9132 to implement usual telecommunications functions. The audio processor 9130 can include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100 so that sound can be recorded locally through the microphone 9132 and sound stored locally can be played through the speaker 9131.

An embodiment of the present application also provides a computer-readable storage medium capable of implementing all steps of the financial service risk prediction method in the above embodiment. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all steps of the financial service risk prediction method of the above embodiment in which the execution subject is a server or a client. For example, the processor implements the following steps when executing the computer program:

As can be seen from the above description, the computer-readable storage medium provided by the embodiment of the present application performs different model predictions for different request types, which can effectively improve the application universality of financial service risk prediction and the efficiency of risk prediction for various prediction requests. Applying the Spark system to bin the historical financial information of a plurality of users can effectively improve the reliability, efficiency and degree of automation of the data binning process used for training. Training the financial service risk prediction model on the data set binned by the Spark system can effectively improve the application reliability and stability of the model, reduces the loss of feature information so that the data set entering the model retains more of its own feature information, and helps reduce the risk of model overfitting and improve the accuracy and stability of the data model. Applying the financial service risk prediction model to predict the financial service risk of the target user can effectively improve the accuracy, efficiency and degree of automation of the financial service risk prediction process, effectively reduce the labor and time costs of the financial institution, and effectively improve the accuracy and efficiency with which the financial institution identifies risky users. For example, credit risk ratings or default risk probabilities covering the pre-loan, in-loan and post-loan stages can be predicted quickly and accurately for users applying to the financial institution for various financial loans, which effectively improves the operational safety and stability of the financial institution.

It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The principles and embodiments of the present invention have been described with reference to specific examples, which are provided only to facilitate understanding of the method and core ideas of the present invention. Those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the present invention.

Claims

1. A financial service risk prediction method, comprising:

respectively carrying out decision tree calculation on each segmentation point by applying a Spark system to obtain a Gini coefficient value of each segmentation point;

generating target data sets corresponding to historical user financial information of a plurality of users based on the optimal box division point list, each element column and the character data sets;

2. The financial service risk prediction method according to claim 1, wherein the financial service request type includes: user category, application type, application state and request type of financial loan;

3. The method of claim 1, wherein the obtaining a plurality of cut points corresponding to each element column in the numerical dataset includes:

4. The financial service risk prediction method according to claim 1, wherein the generating a target data set corresponding to historical user financial information of a plurality of users based on the optimal binning-dividing-point list, each element column, and the character data set includes:

5. The financial service risk prediction method according to claim 4, wherein the binary search step comprises:

6. The financial service risk prediction method of claim 1, wherein the target data set comprises a training set;

7. The financial service risk prediction method of claim 6, wherein the target data set further comprises a test set;

8. A financial service risk prediction apparatus, comprising:

the decision tree calculation module is used for respectively carrying out decision tree calculation on each segmentation point by applying a Spark system so as to obtain a Gini coefficient value of each segmentation point;

the target data set generation module is used for generating target data sets corresponding to historical user financial information of a plurality of users based on the optimal box division point list, each element column and the character data sets;

9. The financial service risk prediction apparatus according to claim 8, wherein the financial service request type includes: user category, application type, application state and request type of financial loan;

10. The financial service risk prediction apparatus according to claim 8, wherein the cut point acquisition module is configured to perform:

11. The financial service risk prediction apparatus of claim 8, wherein the target data set generation module is configured to:

12. The financial service risk prediction apparatus of claim 11, wherein the target data set generation module further comprises: a binary search unit;

13. The financial service risk prediction apparatus of claim 8, wherein the target data set comprises a training set;

14. The financial service risk prediction apparatus of claim 13, wherein the target data set further comprises a test set;

15. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the financial service risk prediction method of any one of claims 1 to 7 when the program is executed by the processor.

16. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the financial service risk prediction method of any one of claims 1 to 7.