CN110659817A

CN110659817A - Data processing method and device, machine readable medium and equipment

Info

Publication number: CN110659817A
Application number: CN201910872797.7A
Authority: CN
Inventors: 周曦; 姚志强; 胡佩涛
Original assignee: Shanghai Cloud From Enterprise Development Co Ltd
Current assignee: Shanghai Cloud From Enterprise Development Co Ltd
Priority date: 2019-09-16
Filing date: 2019-09-16
Publication date: 2020-01-07

Abstract

The invention provides a data processing method, which comprises the following steps: acquiring a service request of a financial service object; acquiring the attribute and behavior data of the financial service object; matching a corresponding service model according to the service type of the service request, wherein the service model is generated by training of a plurality of application components; and processing the attribute and behavior data of the financial service object through the service model, and outputting a financial service processing result. The invention removes the requirement that operators master modeling skills, and enables business experts to generate and edit their own scoring cards using existing application components. The application components turn the programming skills of programming experts into universal components that other people can use, which facilitates the popularization and promotion of artificial intelligence technology.

Description

Data processing method and device, machine readable medium and equipment

Technical Field

The present invention relates to the field of financial technologies, and in particular, to a data processing method, apparatus, machine-readable medium, and device.

Background

With the development of artificial intelligence, the technology is gradually moving out of laboratories and into various industries and daily life. Artificial intelligence can recognize patterns, predict future events, formulate rules and drive automated processes quickly; it gives users a good experience and achieves high accuracy in specific application scenarios. These capabilities are rapidly becoming a competitive factor for successful financial services enterprises. The scoring card is a common tool in the financial field and stands to benefit from artificial intelligence.

However, the popularization of the current artificial intelligence technology in the financial industry has the following limitations:

1. Recruiting professionals is costly

Artificial intelligence techniques depend on statistics and computer science and require extensive training to master, so they are mostly in the hands of people with doctoral or master's degrees and other specialized skills. The rapid increase in demand has caused personnel costs to surge, and small financial institutions cannot easily sustain such high labor costs.

Recruited professionals typically do not come from a financial background and are not familiar with banking.

Recruiting professionals in financial artificial intelligence is therefore not a practical option for small and medium financial institutions;

2. Training existing personnel is difficult

As previously mentioned, artificial intelligence techniques rely on statistics and computer science and require extensive training to master. Existing bank staff cannot quickly learn artificial intelligence modeling methods, and even with some knowledge they cannot optimize models skillfully.

Therefore, there is a need to solve the above problems.

Disclosure of Invention

In view of the above-mentioned shortcomings of the prior art, the present invention provides a data processing method, device, machine-readable medium and apparatus, which are used to solve the problems of the prior art.

To achieve the above and other related objects, the present invention provides a data processing method, including:

acquiring the attribute and behavior data of the financial business object;

and processing the attribute and the behavior data of the financial business object and outputting a financial business processing result.

Optionally, the processing the attribute and behavior data of the financial transaction object includes inputting the attribute and behavior data of the financial transaction object into the generated transaction model.

Optionally, the attributes of the financial transaction object include name, age, region, occupation, income, education level, and asset status.

Optionally, the behavior data includes whether a loan has occurred, whether it is overdue.

Optionally, the processing result includes whether to give credit and the amount and interest rate of the credit.

Optionally, the method for generating the business model includes:

preprocessing sample data;

performing binning on the preprocessed sample data to output a binned data table;

calculating WoE values for each data field in the binned data table to output a WoE value data table;

calculating the IV value of each data field in the binned data table according to the WoE value data table to output an IV value data table;

screening the data fields in the IV value data table according to a set screening threshold value;

outputting a scoring card model according to the screened data fields and the binned data table;

and outputting the corresponding scoring card according to the WoE value data table and the model parameters of the scoring card model.

Optionally, the method for generating a business model further includes:

evaluating the scoring card model;

and outputting the scoring card model and the evaluation indexes of the scoring card model.

Optionally, the preprocessing the sample data includes:

receiving sample data;

sampling the sample data to output a first data table;

processing missing data in the first data table to output a second data table;

and processing the abnormal value in the second data table to output a third data table.

To achieve the above and other related objects, the present invention also provides a data processing method, including:

acquiring a service request of a financial service object;

acquiring the attribute and behavior data of the financial business object;

matching a corresponding service model according to the service type of the service request, wherein the service model is generated by training of a plurality of application components;

and processing the attribute and behavior data of the financial service object through the corresponding application component, and outputting a financial service processing result.

Optionally, the application component comprises:

the data preprocessing component is used for preprocessing the sample data;

the data binning component is used for binning the preprocessed sample data to output a binned data table;

the WoE value calculation component is used for calculating WoE values for the respective data fields in the binned data table to output a WoE value data table;

the IV value calculation component is used for calculating the IV value of each data field in the binned data table according to the WoE value data table to output an IV value data table;

the feature selection component is used for screening the data fields in the IV value data table according to a set screening threshold value;

the model generation component is used for outputting a scoring card model according to the screened data fields and the binned data table;

and the scoring card generating component is used for outputting the corresponding scoring card according to the WoE value data table and the model parameters of the scoring card model.

Optionally, the application component further comprises:

an evaluation component for evaluating the scoring card model;

and the derivation component is used for outputting the scoring card model and the evaluation indexes of the scoring card model.

Optionally, the data preprocessing component includes:

a data receiving component for receiving sample data;

the data sampling component is used for sampling the sample data to output a first data table;

the missing value processing component is used for processing the missing data in the first data table to output a second data table;

and the abnormal value processing component is used for processing the abnormal value in the second data table to output a third data table.

Optionally, processing the missing data includes replacing the missing value with one of the following: pre-value, post-value, maximum value, minimum value, mean value, a self-defined value.

Optionally, processing the outliers comprises filling the outliers, and the filling method comprises mode filling, median filling, mean filling and specified value filling.

Optionally, the binning process includes equal frequency binning, equal width binning, and chi-square binning.

Optionally, the scoring card model is a logistic regression model, a probabilistic regression model, a decision tree, or a neural network.

Optionally, the scoring card generating component converts the output value of the scoring card model into the scoring card score.

Optionally, the evaluation index for evaluating the effect of the score card model includes AUC value, KS value, Accuracy value, Precision value, Recall value, ROC curve, KS curve, PR curve.

Optionally, the evaluation index of the output scoring card model includes an AUC value, a KS value, an Accuracy value, a Precision value, a Recall value, an ROC curve, a KS curve, a PR curve, a scoring card scale, and a scoring card detail.

To achieve the above and other related objects, the present invention also provides a data processing apparatus comprising:

the service request acquisition module is used for acquiring a service request of a financial service object;

the data acquisition module is used for acquiring the attribute and behavior data of the financial business object;

the model matching module is used for matching a corresponding service model according to the service type of the service request, and the service model is generated by training a plurality of application components;

and the result output module is used for processing the attribute and the behavior data of the financial business object through the corresponding business model and outputting a financial business processing result.

Optionally, the business model is generated by a business model generating component, and the business model generating component includes:

the data preprocessing component is used for preprocessing the sample data;

Optionally, the business model generating component further comprises:

an evaluation component for evaluating the scoring card model;

Optionally, the data preprocessing component includes:

a data receiving component for receiving sample data;

To achieve the above and other related objects, the present invention also provides an apparatus comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform one or more of the methods described previously.

To achieve the above objects and other related objects, the present invention also provides one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform one or more of the methods described above.

As described above, the data processing method, apparatus, machine-readable medium and device provided by the present invention have the following beneficial effects:

The invention removes the requirement that operators master modeling skills, and enables business experts to generate and edit their own scoring cards using existing application components. The application components turn the programming skills of programming experts into universal components that other people can use, which facilitates the popularization and promotion of artificial intelligence technology.

Drawings

FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for generating a score card model according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating exemplary pre-processing of sample data according to an embodiment of the present invention;

FIG. 4 is a flow chart of a data processing method according to another embodiment of the present invention;

FIG. 5 is a diagram illustrating application components included in generating a score card model according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a data preprocessing component according to an embodiment of the present invention;

FIG. 7 is a block diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

As shown in fig. 1, the present invention provides a data processing method, including:

S10, acquiring the attribute and behavior data of the financial business object;

The attributes of the financial business object include name, age, region, occupation, income, education level and asset status. The behavior data includes whether a loan has been taken out and whether it is overdue.

And S11, processing the attribute and behavior data of the financial service object and outputting a financial service processing result.

Wherein, the processing result comprises whether to give credit and the amount and interest rate of the credit.

In one embodiment, the processing the attribute and behavior data of the financial transaction object includes inputting the attribute and behavior data of the financial transaction object into the generated transaction model.

In an embodiment, as shown in fig. 2, the method for generating the business model includes:

S221, preprocessing sample data;

As shown in FIG. 3, preprocessing the sample data includes:

S2210, receiving sample data;

in one embodiment, the sample data received by the data receiving component is presented in the form of a data table, including instance name, file path, table name, field format.

S2211 samples the sample data to output a first data table;

Specifically, the data is sampled randomly or by stratification: sample data is drawn at random according to a given proportion or number, each draw is independent, and a first data table is finally output. In one embodiment, the data may also be sampled with replacement.

S2212 processes the missing data in the first data table to output a second data table;

Because values are frequently missing in business data, this component fills the missing areas with data that is convenient for modeling, so as to improve modeling quality. Specifically, a null value or a specified value may be replaced with a pre-value, a post-value, a maximum value, a minimum value, a mean value, or a user-defined value, and a character-type null value or an empty character string may be replaced with a pre-value, a post-value, or a user-defined value.
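The replacement strategies listed above can be sketched as follows. This is a minimal illustration, not the component's actual implementation: the function name `fill_missing` and its parameters are hypothetical, missing entries are represented as `None`, and "pre-value" and "post-value" are assumed to mean the nearest preceding and following non-missing value in the same column.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]], method: str,
                 custom: float = 0.0) -> List[Optional[float]]:
    """Replace None entries with a value chosen by `method` (hypothetical helper)."""
    present = [v for v in values if v is not None]
    out = list(values)
    if method in ("max", "min", "mean", "custom"):
        if method == "custom":
            fill = custom
        elif not present:
            return out  # no observed values to derive a statistic from
        elif method == "max":
            fill = max(present)
        elif method == "min":
            fill = min(present)
        else:
            fill = sum(present) / len(present)
        return [fill if v is None else v for v in out]
    if method == "pre":
        # Assumption: "pre-value" carries the previous non-missing value forward.
        last = None
        for i, v in enumerate(out):
            if v is None:
                out[i] = last
            else:
                last = v
        return out
    if method == "post":
        # Assumption: "post-value" takes the next non-missing value.
        nxt = None
        for i in range(len(out) - 1, -1, -1):
            if out[i] is None:
                out[i] = nxt
            else:
                nxt = out[i]
        return out
    raise ValueError(f"unknown method: {method}")
```

Under these assumptions a leading gap stays missing with `"pre"` and a trailing gap stays missing with `"post"`, since there is no neighboring value to copy.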

S2213 processes the abnormal value in the second data table to output a third data table.

Because outliers often occur in business data, this component first finds the outliers and then fills those positions with data that is convenient for modeling, so as to improve modeling quality. In one embodiment, outliers are identified using a box plot, and the abnormal data is filled to produce the third data table. The filling methods include mode filling, median filling, mean filling and specified-value filling: mode filling replaces outliers with the mode of the selected data field, median filling replaces outliers with the median of the selected data field, mean filling replaces outliers with the mean of the selected data field, and specified-value filling replaces outliers with NA or another special value.
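A minimal sketch of box-plot outlier detection followed by filling is given below. Several details are assumptions rather than statements from the text: the fences are placed at 1.5 times the interquartile range beyond the quartiles (the usual box-plot convention), the fill statistic is computed from the non-outlier values only, and the name `fill_outliers` and its parameters are hypothetical.

```python
import statistics
from typing import List, Optional


def fill_outliers(values: List[float], method: str = "median",
                  specified: Optional[float] = None,
                  k: float = 1.5) -> List[Optional[float]]:
    """Replace values outside the box-plot fences with a fill value."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles of the field
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr         # assumed fence rule
    normal = [v for v in values if low <= v <= high]
    if method == "mode":
        fill = statistics.mode(normal)
    elif method == "median":
        fill = statistics.median(normal)
    elif method == "mean":
        fill = statistics.mean(normal)
    elif method == "specified":
        fill = specified                            # e.g. None standing in for NA
    else:
        raise ValueError(f"unknown method: {method}")
    return [v if low <= v <= high else fill for v in values]
```

For example, in the column `[10, 12, 11, 13, 12, 11, 100]` only the value 100 falls outside the fences and is replaced; the other entries are returned unchanged.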

In an embodiment, pre-processing the sample data further comprises: extracting partial fields from the sample data.

S222, performing box separation processing on the preprocessed sample data to output a box separation data table;

Binning is a necessary step in building a scoring card. It refers to smoothing stored data values by considering their neighbors (surrounding values). Equal depth means that every bin holds the same number of data items, and equal width means that every bin covers a value range of the same size. The binning methods include equal-frequency binning, equal-width binning and chi-square binning. Chi-square binning is particularly common and can be applied to both discrete and continuous data.

The basic idea of chi-square binning is to infer from sample data whether the distribution of the population differs significantly from the expected distribution, or whether two categorical variables are related or independent. The null hypothesis is generally that the observed frequencies do not differ from the expected frequencies, or that the two variables are independent of each other. In practice, the chi-square value is calculated under the assumption that the null hypothesis holds; it represents the degree of deviation between the observed values and the theoretical values.
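The chi-square value described above can be illustrated with the Pearson statistic for a small contingency table, where each row is a bin and each column is a class (for example bad and good). This is a sketch under that assumed layout; the function name `chi_square` is hypothetical and the text does not specify how the component computes the value.

```python
from typing import Sequence


def chi_square(observed: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if bin membership and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (obs - expected) ** 2 / expected
    return chi2
```

A value near zero means two adjacent bins have nearly the same class distribution. Chi-square binning procedures commonly merge such bins first, although the merging rule used here is not stated in the text.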

Equal-frequency binning: the interval boundaries are chosen so that each interval contains approximately the same number of instances. For example, with $N = 10$, each interval should contain about 10% of the instances.

Equal-width binning: the range from the minimum value to the maximum value is divided into $N$ equal parts. If $A$ is the minimum value and $B$ is the maximum value, the width of each interval is $W = (B - A)/N$, and the interval boundaries are $A + W, A + 2W, \dots, A + (N-1)W$. Only the boundaries are considered here, so the number of instances in each part may differ.
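The two boundary rules can be sketched as follows. The function names are hypothetical, and the equal-frequency version uses a simple rank-based cut, which is only one of several reasonable ways to pick the boundaries.

```python
from typing import List


def equal_width_edges(values: List[float], n_bins: int) -> List[float]:
    """Interior boundaries A + W, A + 2W, ..., A + (N-1)W."""
    a, b = min(values), max(values)
    w = (b - a) / n_bins
    return [a + k * w for k in range(1, n_bins)]


def equal_frequency_edges(values: List[float], n_bins: int) -> List[float]:
    """Interior boundaries chosen so each bin holds about len(values)/N items."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
```

For the values 1 to 100 and four bins, the equal-frequency boundaries fall at 26, 51 and 76, so each half-open interval holds 25 values.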

S223 calculating WoE values of each data field in the box data table to output a WoE value data table;

WoE (Weight of Evidence) is a form of encoding for the original independent variables, and it allows a logistic regression model to be converted into the standard scoring card format. WoE reflects the contribution of each value of an independent variable. After WoE encoding, the independent variables are standardized to some extent and are insensitive to outliers.
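The text does not give the WoE formula, so the sketch below uses a commonly used definition as an assumption: for each bin, the logarithm of the bin's share of bad users divided by its share of good users. Some references use the inverse ratio, which only flips the sign. The function name and the input layout are hypothetical.

```python
import math
from typing import Dict, Tuple


def woe_table(bin_counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each bin label to its WoE value.

    bin_counts maps a bin label to (bad_count, good_count); every count is
    assumed to be non-zero so that the logarithm is defined.
    """
    bad_total = sum(bad for bad, _ in bin_counts.values())
    good_total = sum(good for _, good in bin_counts.values())
    return {
        label: math.log((bad / bad_total) / (good / good_total))
        for label, (bad, good) in bin_counts.items()
    }
```

With this sign convention a positive WoE marks a bin in which bad users are over-represented relative to the whole sample.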

S224, calculating the IV value of each data field in the box data table according to the WoE value data table to output an IV value data table;

IV stands for Information Value, also rendered as information amount.

The most important and most direct criterion for selecting the variables that enter the model is the predictive power of the variables, and IV is one index that measures the predictive power of an independent variable.

S225, screening the data fields in the IV value data table according to the set screening threshold value;

in this embodiment, the IV value is used to screen the feature, and generally, when the IV value is 0.3 or more, the feature prediction ability is strong. Sometimes the IV of a variable that is inherently important to the business is low due to sample problems. In order to solve the problem, the platform provides a flexible manual feature selection function, and a user can eliminate some features with poor correlation or strong consistency according to expert experience.
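As an illustration of the IV calculation and the threshold screening, the sketch below assumes the commonly used definition in which IV is the sum over bins of the difference between the bad share and the good share, multiplied by the bin's WoE. The text does not state the formula, and the function names and input layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def information_value(bin_counts: Dict[str, Tuple[int, int]]) -> float:
    """IV of one field; bin_counts maps bin label -> (bad_count, good_count)."""
    bad_total = sum(bad for bad, _ in bin_counts.values())
    good_total = sum(good for _, good in bin_counts.values())
    iv = 0.0
    for bad, good in bin_counts.values():
        bad_share = bad / bad_total
        good_share = good / good_total
        # (share difference) * WoE of the bin; counts assumed non-zero.
        iv += (bad_share - good_share) * math.log(bad_share / good_share)
    return iv


def select_fields(iv_by_field: Dict[str, float], threshold: float) -> List[str]:
    """Keep the fields whose IV reaches the configured screening threshold."""
    return [field for field, iv in iv_by_field.items() if iv >= threshold]
```

A manual override, as described above for business-critical variables with low IV, would simply add or remove field names from the list returned by `select_fields`.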

S226, outputting a scoring card model according to the screened data fields and the box data table;

the scoring card model can be a logistic regression model, a probabilistic regression model, a decision tree, or a neural network. In this embodiment, the scorecard model selects the logistic regression model, which has the advantages of simplicity, stability, strong interpretability, mature technology, easy detection and deployment, and the like, and is the most frequently used algorithm for the scorecard model.

S227, outputting the corresponding scoring card according to the WoE value data table and the model parameters of the scoring card model.

In building scoring card models, logistic regression is often used to model the data. However, in prediction using logistic regression, logistic regression returns a probability value and not a scorecard score. Therefore, in the present embodiment, the corresponding rating card is generated from the WoE value data table and the model parameters of the rating card model.

As shown in fig. 4, the present invention further provides a data processing method, including:

S20, acquiring a service request of the financial service object;

S21, acquiring the attribute and behavior data of the financial business object;

S22, matching a corresponding service model according to the service type of the service request, wherein the service model is generated by training a plurality of application components;

and S23, processing the attribute and behavior data of the financial service object through the corresponding application component, and outputting a financial service processing result. Wherein, the processing result comprises whether to give credit and the amount and interest rate of the credit.

In an embodiment, the method further comprises configuring parameters of the application component.

Generally, different service types correspond to different service models, and the scoring card model is further described in this embodiment. As shown in fig. 5, the application components may include a plurality of components, and the function of each application component may be a complete function, which can be directly used by a user when selecting the application components, thereby improving convenience of use. And each application component can be adjusted according to actual needs and then used, so that different functions are realized by combining different application components differently, and the operation flexibility is improved. And establishing input and output links for the required application components according to the step of generating the financial business processing result.

In one embodiment, the generation of the score card is described as a specific embodiment.

Scoring card: the credit scoring card is one of the most common means of financial risk control. It scores a client's credit with a credit scoring model based on the client's attributes and behavior data, and on that basis determines whether to grant credit and the credit amount and interest rate, so as to identify and reduce risk in financial transactions.

The application components may specifically include a data preprocessing component 110, a data binning component 111, a WoE value calculation component 112, an IV value calculation component 113, a feature selection component 114, a model generation component 115, and a scorecard generation component 116.

The data preprocessing component 110 is configured to preprocess sample data; the preprocessing of the sample data specifically refers to processing the sample data into data meeting requirements.

As shown in fig. 6, the data preprocessing component includes:

a data receiving component 1110 for receiving sample data;

A data sampling component 1111, configured to sample the sample data to output a first data table;

A missing value processing component 1112, configured to process missing data in the first data table to output a second data table;

An abnormal value processing component 1113, configured to process the abnormal value in the second data table to output a third data table.

In an embodiment, the application component may further comprise a data source reading component for extracting partial fields from the sample data.

The data binning component 111 is used for performing binning processing on the preprocessed sample data to output a binning data table;

A WoE value calculating component 112 for calculating WoE values for the respective data fields in the binned data table to output a WoE value data table;

An IV value calculating component 113 for calculating IV values of the respective data fields in the binned data table from the WoE value data table to output an IV value data table;

A feature selection component 114 for screening data fields in the IV value data table according to a set screening threshold;

The model generation component 115 is used for outputting a scoring card model according to the screened data fields and the binned data table; the scoring card model may be a logistic regression model, a probabilistic regression model, a decision tree, or a neural network. In this embodiment, the scorecard model selects the logistic regression model, which has the advantages of simplicity, stability, strong interpretability, mature technology, easy detection and deployment, and the like, and is the most frequently used algorithm for the scorecard model.

In building scoring card models, logistic regression is often used to model the data. However, in prediction using logistic regression, logistic regression returns a probability value and not a scorecard score. Accordingly, the scoring card generating component 116 is operable to generate a corresponding scoring card from the WoE value data sheet and the model parameters of the scoring card model.

In one embodiment, the conversion of the score card is described in detail.

Score card definition

Let the probability that a user is a bad user be:

$P(Y=1 \mid x) = p$

The probability that the user is a good user is then:

$P(Y=0 \mid x) = 1 - p$

The ratio of bad users to good users (bad users in the numerator), called the odds, can then be calculated:

$\text{odds} = \frac{p}{1-p}$

The score scale of the scoring card can be defined by a linear expression in the logarithm of the odds:

$\text{score} = A + B \ln(\text{odds})$

where $A$ and $B$ are constants.

Scoring card conversion

The conversion steps are as follows:

Set the score to $p_{0}$ when $\text{odds} = \theta_{0}$.

Set PDO (points to double the odds), the number of points by which the score changes each time the odds double.

Substituting the score $p_{0}$ at $\text{odds} = \theta_{0}$ and the score $p_{0} + \text{PDO}$ at $\text{odds} = 2\theta_{0}$ into the score equation gives:

$p_{0} = A + B \ln(\theta_{0})$

$p_{0} + \text{PDO} = A + B \ln(2\theta_{0})$

The values of $A$ and $B$ can then be calculated:

$B = \frac{\text{PDO}}{\ln 2}$

$A = p_{0} - B \ln(\theta_{0})$
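The two constants can be computed directly from the formulas above. The sketch below uses the reference score of 500 and PDO of 20 given in this embodiment; the base odds of 1.0 and the function names are assumptions made only for illustration.

```python
import math
from typing import Tuple


def scale_constants(base_score: float, base_odds: float,
                    pdo: float) -> Tuple[float, float]:
    """Return (A, B) with B = PDO / ln 2 and A = p0 - B * ln(theta0)."""
    b = pdo / math.log(2)
    a = base_score - b * math.log(base_odds)
    return a, b


def score_from_probability(p: float, a: float, b: float) -> float:
    """score = A + B * ln(odds), with odds = p / (1 - p)."""
    return a + b * math.log(p / (1 - p))


# Hypothetical base odds of 1.0; base score and PDO as in this embodiment.
A, B = scale_constants(base_score=500, base_odds=1.0, pdo=20)
```

With these values a user at the base odds scores 500, and doubling the odds moves the score by exactly one PDO, to 520.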

typically, the score will be rounded to the nearest integer to simplify the presentation and interpretability of the score card. This rounding will yield an approximation of the score, but the effect is small and negligible.

To make the scoring card easier for business personnel to use, it can be presented in more detail, showing the influence of each value of each variable on the score.

It is known that:

$P(Y=1 \mid x) = \frac{e^{\theta x}}{1+e^{\theta x}}$

$P(Y=0 \mid x) = \frac{1}{1+e^{\theta x}}$

$\text{odds} = \frac{P(Y=1 \mid x)}{P(Y=0 \mid x)} = e^{\theta x}$

Then the scoring card can be expressed as:

$\text{score} = A + B \ln(\text{odds})$

$\text{score} = A + B \ln(e^{\theta x})$

$\text{score} = A + B \sum_{i} \theta_{i} x_{i}$

where $\theta_{i} x_{i} = (\theta_{i} w_{i1})\delta_{i1} + (\theta_{i} w_{i2})\delta_{i2} + \dots + (\theta_{i} w_{im})\delta_{im}$;

$m$ is the number of values of $x_{i}$ after binning;

$w_{ij}$ is the WoE value corresponding to the $j$-th value of the variable $x_{i}$, for $j = 1, \dots, m$;

and $\delta_{ij}$ is a binary variable that is 1 if $x_{i}$ takes its $j$-th value after binning and 0 otherwise.

In this embodiment, the reference score is 500, and PDO is 20.
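The per-bin breakdown can be sketched as follows: each bin of each variable contributes B times the variable's coefficient times the bin's WoE, rounded to an integer as described above. The coefficients, WoE values, field names and function names here are hypothetical, and folding the model intercept into the base points is an assumption about how the constant term is presented.

```python
import math
from typing import Dict


def scorecard_points(coefs: Dict[str, float],
                     woe: Dict[str, Dict[str, float]],
                     b: float) -> Dict[str, Dict[str, int]]:
    """Points for each (field, bin): round(B * theta_i * w_ij)."""
    return {
        field: {label: round(b * coefs[field] * w) for label, w in bins.items()}
        for field, bins in woe.items()
    }


def total_score(a: float, b: float, intercept: float,
                points: Dict[str, Dict[str, int]],
                applicant_bins: Dict[str, str]) -> int:
    """Base points (A plus the intercept term) plus the points of the chosen bins."""
    base = round(a + b * intercept)
    return base + sum(points[f][applicant_bins[f]] for f in applicant_bins)


# Hypothetical model: one field with two bins, reference score 500, PDO 20.
B = 20 / math.log(2)
points = scorecard_points({"age": 1.0},
                          {"age": {"young": math.log(2), "old": -math.log(2)}},
                          B)
```

Here `points` assigns 20 points to the "young" bin and -20 to the "old" bin, so an applicant in the "young" bin receives 520 when the base points are 500.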

The final rating card is shown in table 1.

TABLE 1

Generally, after the score card model is established, the effect of the score card model needs to be evaluated, so the application component further comprises an evaluation component 117 for evaluating the score card model. During testing, the data for testing can be input into the scoring card model to evaluate the effect of the scoring card model through the output indexes. Wherein, the evaluation index can comprise an AUC value, a KS value, an Accuracy value, a Precision value, a Recall value, a ROC curve, a KS curve and a PR curve;

AUC (Area Under the Curve) represents the probability that, when a sample A is randomly selected from all positive examples and a sample B is randomly selected from all negative examples, the classifier assigns A a higher probability of being positive than B. When an ROC curve is drawn, all samples are first sorted by the prediction probability of the classifier, so the AUC reflects the ability of the classifier to rank the samples: the larger the AUC, the better the ranking ability, that is, the more positive examples the classifier ranks ahead of negative examples. A larger AUC indicates higher accuracy of the algorithm and model, and an AUC above 0.7 is generally sufficient for online deployment.
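The pairwise definition above translates directly into code. The sketch below compares every positive example with every negative example and counts ties as half a win, which is a common convention and an assumption here; the function name is hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed tie handling
    return wins / (len(pos) * len(neg))
```

This quadratic version is meant for clarity on small test sets; library implementations typically obtain the same quantity from the sorted scores.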

The KS value is the maximum distance between the two lines in the KS plot and reflects the discriminating capability of the classifier. A larger KS value indicates higher accuracy of the algorithm and model, and a value above 0.7 is generally sufficient for online deployment.
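A minimal sketch of the KS value follows, assuming the two lines in the KS plot are the cumulative shares of positive and negative examples as the score threshold is lowered. Labels are assumed to be 0 or 1, and the function name is hypothetical.

```python
from typing import Sequence


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Maximum gap between the cumulative positive and negative shares."""
    pairs = sorted(zip(scores, labels), reverse=True)  # highest score first
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples that share the same score before measuring the gap.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        best = max(best, abs(cum_pos / pos_total - cum_neg / neg_total))
        i = j
    return best
```

A classifier that separates the two classes perfectly reaches a KS value of 1, while one whose scores are unrelated to the labels stays near 0.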

Accuracy refers to the ratio of the number of correctly predicted samples to the total number of predicted samples, regardless of whether the predicted samples are positive or negative examples.

Precision refers to the ratio of the number of correctly predicted positive samples to the number of all predicted positive samples, i.e., how many of all predicted positive samples are true positive samples. Precision only focuses on the part predicted as positive samples, while Accuracy considers all samples.

Recall refers to the ratio of the number of correctly predicted positive samples to the total number of actual positive samples, i.e., how many of the actual positive samples are correctly identified.
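The three definitions above can be checked on a small example. The sketch assumes binary labels with 1 as the positive class; the function name is hypothetical, and returning 0.0 when a denominator is zero is an illustrative choice.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, Precision and Recall from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In this example three of five predictions are correct, two of the three predicted positives are truly positive, and two of the three actual positives are found.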

ROC curve (Receiver Operating Characteristic): the ROC curve is commonly used for model comparison in binary classification problems and mainly expresses the trade-off between the true positive rate (TPR) and the false positive rate (FPR). Specifically, TPR and FPR are plotted on the vertical axis and the horizontal axis, respectively, under different classification threshold settings. The ROC curve can be viewed as a "confrontation" between the positive and negative examples among all samples as the threshold is moved. The closer the curve is to the upper left corner, the more positive examples are ranked ahead of negative examples, and the better the overall performance of the model.
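The threshold sweep described above can be sketched as follows: samples are sorted by descending score, and each distinct score is used in turn as the threshold, yielding one (FPR, TPR) point. The function names and the sample data are assumptions; the trapezoidal area is included only to show that the area under these points agrees with the AUC.

```python
def roc_points(labels, scores):
    """Return the list of (FPR, TPR) points obtained by lowering the threshold
    through every distinct score, starting from (0, 0)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    order = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        j = i
        # Samples with equal scores cross the threshold together.
        while j < len(order) and order[j][0] == order[i][0]:
            if order[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels (1 = positive example) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = trapezoid_area(points)
```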

KS curve (Kolmogorov-Smirnov): this index measures the difference between the cumulative distributions of good and bad samples. The greater the cumulative difference between good and bad samples, the greater the KS index, and the stronger the risk discrimination ability of the model.

PR curve (Precision-Recall): the PR curve plots Precision against Recall. What the PR curve and the ROC curve have in common is that both use TPR (Recall), and the effect of the classifier can be measured by the area under the curve in both cases. The difference is that the ROC curve uses FPR while the PR curve uses Precision, so both indexes of the PR curve focus on the positive examples. Because the positive examples are the main concern in class imbalance problems, the PR curve is widely considered superior to the ROC curve in that case.
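For comparison with the ROC construction, a PR curve can be sketched with the same threshold sweep, recording (Recall, Precision) instead of (FPR, TPR). The function name `pr_points` and the sample data are assumptions for illustration.

```python
def pr_points(labels, scores):
    """Return (recall, precision) points obtained by lowering the threshold
    through every distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(order):
        j = i
        # Samples with equal scores cross the threshold together.
        while j < len(order) and order[j][0] == order[i][0]:
            if order[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((tp / n_pos, tp / (tp + fp)))
        i = j
    return points


# Hypothetical labels (1 = positive example) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pr = pr_points(labels, scores)
```

Neither coordinate depends on the number of true negatives, which is why the curve stays informative when negative examples greatly outnumber positive ones.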

In one embodiment, the application component further comprises a derivation component 118 for outputting the score card model and the evaluation indexes of the score card model, including but not limited to the aforementioned indexes (AUC, KS, Accuracy, Recall) and visual reports (ROC, KS, P-R), and may further output the scale of the score card and the score card details.

As shown in fig. 7, the present invention also provides a data processing apparatus, including:

a service request obtaining module 10, configured to obtain a service request of a financial service object;

the data acquisition module 11 is used for acquiring the attribute and behavior data of the financial business object;

the model matching module 12 is configured to match a corresponding service model according to the service type of the service request, where the service model is generated by training a plurality of application components;

and the result output module 13 is used for processing the attribute and the behavior data of the financial service object through the corresponding service model and outputting a financial service processing result.

The attributes of the financial transaction object include name, age, location, occupation, income, cultural degree, and asset condition. The behavior data includes whether loan occurs or not and whether overdue occurs or not. The processing result comprises whether the credit is given or not and the amount and interest rate of the credit.
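The cooperation of the four modules above can be sketched as a small pipeline. Everything named here is hypothetical: the `ServiceRequest` fields, the registry keyed by service type, and in particular `toy_credit_model`, which is a placeholder rule standing in for a trained score card model rather than anything described in this embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ServiceRequest:
    object_id: str     # identifier of the financial business object (hypothetical)
    service_type: str  # e.g. "credit" (hypothetical value)


# A service model maps attribute and behavior data to a processing result.
ServiceModel = Callable[[dict], dict]


def toy_credit_model(data: dict) -> dict:
    """Placeholder for a trained model: grants credit if nothing is overdue."""
    return {"credit_granted": not data["overdue"]}


MODEL_REGISTRY: Dict[str, ServiceModel] = {"credit": toy_credit_model}


def acquire_data(object_id: str, store: Dict[str, dict]) -> dict:
    """Data acquisition: look up attribute and behavior data of the object."""
    return store[object_id]


def match_model(request: ServiceRequest) -> ServiceModel:
    """Model matching: select the service model by the request's service type."""
    try:
        return MODEL_REGISTRY[request.service_type]
    except KeyError:
        raise ValueError(f"no model for service type {request.service_type!r}")


def process(request: ServiceRequest, store: Dict[str, dict]) -> dict:
    """Result output: run the matched model on the acquired data."""
    return match_model(request)(acquire_data(request.object_id, store))


store = {"u1": {"age": 30, "income": 8000, "loan": True, "overdue": False}}
result = process(ServiceRequest("u1", "credit"), store)
```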

In an embodiment, the apparatus further includes a parameter configuration module configured to configure a parameter of the application component.

In this embodiment, the business model is generated by a business model generating component, which may include a plurality of application components. Each application component may provide a complete function, so that a user who selects an application component can use it directly, which improves convenience of use. Each application component can also be adjusted according to actual needs before use, so that different functions are realized by combining different application components in different ways, which improves operational flexibility. Input and output links are established for the required application components according to the steps of generating the financial business processing result.

The data preprocessing component comprises:

a data receiving component 1110 for receiving sample data;

Because outliers often occur in business data, the component can first find the outliers and then replace them with values that facilitate modeling, so as to improve the quality of the modeling. In one embodiment, the abnormal values are identified by using a box plot, and the abnormal data in the third data table is filled. The data filling methods comprise mode filling, median filling, mean filling and specified value filling: mode filling fills the abnormal values with the mode of the selected data field, median filling fills them with the median of the selected data field, mean filling fills them with the mean of the selected data field, and specified value filling fills them with NA or another special value.
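A minimal sketch of box-plot outlier detection followed by filling is given below. It assumes the conventional box-plot rule (values outside 1.5 times the interquartile range beyond the quartiles are outliers), which the text does not specify, and it computes the fill statistic from the non-outlier values, which is also an assumption. The function names and data are hypothetical.

```python
import statistics


def box_plot_bounds(values, k=1.5):
    """Lower and upper fences of a box plot; k=1.5 is the usual convention
    and is an assumption here."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fill_outliers(values, strategy="median", specified=None):
    """Replace values outside the box-plot fences using the chosen strategy."""
    low, high = box_plot_bounds(values)
    normal = [v for v in values if low <= v <= high]
    if strategy == "mode":
        fill = statistics.mode(normal)
    elif strategy == "median":
        fill = statistics.median(normal)
    elif strategy == "mean":
        fill = statistics.mean(normal)
    elif strategy == "specified":
        fill = specified  # e.g. None as a stand-in for NA
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [v if low <= v <= high else fill for v in values]


# Hypothetical data field with one abnormal value (95).
raw = [10, 12, 11, 13, 12, 11, 95, 12]
filled = fill_outliers(raw, strategy="median")
```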

In one embodiment, the conversion of the score card is described in detail.

Score card definition

The probability that a user is a bad user is: $p(Y=1 \mid x) = p$

The probability that a user is a good user is: $p(Y=0 \mid x) = 1-p$

$\mathrm{odds} = \frac{p}{1-p}$

$\mathrm{score} = A + B \ln(\mathrm{odds})$

where A and B are constants.

Scoring card conversion

The conversion steps are as follows:

Set the score to $p_{0}$ when $\mathrm{odds} = \theta_{0}$, and let PDO (points to double the odds) be the score increase when the odds double:

$p_{0} = A + B \ln(\theta_{0})$

$p_{0} + \mathrm{PDO} = A + B \ln(2\theta_{0})$

Then the values of A and B can be calculated, i.e.:

$B = \frac{\mathrm{PDO}}{\ln 2}$

$A = p_{0} - B \ln(\theta_{0})$

It is known that

$p(Y=1 \mid x) = \frac{e^{\theta x}}{1+e^{\theta x}}$

$p(Y=0 \mid x) = \frac{1}{1+e^{\theta x}}$

$\mathrm{odds} = \frac{p(Y=1 \mid x)}{p(Y=0 \mid x)} = e^{\theta x}$

Then the score card can be expressed as:

$\mathrm{score} = A + B \ln(\mathrm{odds})$

$\mathrm{score} = A + B \ln(e^{\theta x})$

$\mathrm{score} = A + B \sum_{i} \theta_{i} x_{i}$

where m is the number of values of $x_{i}$ after binning;

$w_{im}$ is the WoE value corresponding to the m-th value of variable $x_{i}$;

In this embodiment, the reference score is 500, and PDO is 20.
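The conversion can be illustrated numerically with the reference score of 500 and PDO of 20 stated above. The reference odds are not given in this embodiment, so the value 1/20 used below is purely an assumption, as are the function names.

```python
import math


def scorecard_constants(base_score, pdo, theta0):
    """Solve p0 = A + B*ln(theta0) and p0 + PDO = A + B*ln(2*theta0) for A, B."""
    b = pdo / math.log(2)
    a = base_score - b * math.log(theta0)
    return a, b


def score_from_probability(p_bad, a, b):
    """score = A + B*ln(odds), with odds = p / (1 - p) as defined in the text."""
    odds = p_bad / (1.0 - p_bad)
    return a + b * math.log(odds)


THETA0 = 1 / 20  # assumed reference odds; not specified in the source
A, B = scorecard_constants(base_score=500, pdo=20, theta0=THETA0)

# A probability whose odds equal THETA0 maps to the reference score,
# and doubling the odds shifts the score by exactly PDO.
p_ref = THETA0 / (1 + THETA0)
p_double = 2 * THETA0 / (1 + 2 * THETA0)
score_ref = score_from_probability(p_ref, A, B)
score_double = score_from_probability(p_double, A, B)
```

With these inputs B is approximately 28.85, and the two scores come out at 500 and 520, matching the two defining equations.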

The final rating card is shown in table 2.

TABLE 2

Serial number	Variable	Interval (left-closed, right-open)	Score
1	zhima_score	-inf,603	16
2	zhima_score	603,611	17
3	zhima_score	611,615	18
4	zhima_score	615,635	19
5	zhima_score	635+	21
6	step_number	-inf,-1	13
7	step_number	-1,3.6667	17
8	step_number	3.6667,1944	16
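Applying a score card such as Table 2 amounts to looking up, for each variable, the left-closed, right-open interval that contains the value and summing the corresponding scores. The sketch below copies the rows of Table 2; reading "635+" as the interval from 635 to positive infinity is an assumption, the function names are hypothetical, and any base score offset is left out because the table does not show one.

```python
import math

# Rows of Table 2 as (lower bound, upper bound, score); intervals are [low, high).
SCORECARD = {
    "zhima_score": [
        (-math.inf, 603, 16),
        (603, 611, 17),
        (611, 615, 18),
        (615, 635, 19),
        (635, math.inf, 21),  # "635+" read as [635, +inf), an assumption
    ],
    "step_number": [
        (-math.inf, -1, 13),
        (-1, 3.6667, 17),
        (3.6667, 1944, 16),
    ],
}


def bin_score(variable, value):
    """Score of the interval that contains the value for the given variable."""
    for low, high, points in SCORECARD[variable]:
        if low <= value < high:
            return points
    raise ValueError(f"{variable}={value} falls outside every listed interval")


def total_score(applicant):
    """Sum of the per-variable scores listed in the table."""
    return sum(bin_score(name, value) for name, value in applicant.items())


# Hypothetical applicant.
total = total_score({"zhima_score": 620, "step_number": 2})
```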

An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the device may be used as a terminal device or as a server. Examples of the terminal device may include a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, an intelligent television, a wearable device, and the like.

The present embodiment also provides a non-volatile readable storage medium in which one or more modules (programs) are stored; when the one or more modules are applied to a device, the device may be caused to execute the instructions of the steps included in the data processing method of fig. 1 according to the present embodiment.

Fig. 8 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.

Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.

Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.

In this embodiment, the processor of the terminal device includes modules for executing the functions of the modules of the data processing apparatus described above; for specific functions and technical effects, refer to the foregoing embodiments, which are not described here again.

Fig. 9 is a schematic hardware structure diagram of a terminal device according to an embodiment of the present application. FIG. 9 is a specific embodiment of the implementation of FIG. 8. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.

The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 4 in the above embodiment.

The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.

Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication component 1203, power component 1204, multimedia component 1205, speech component 1206, input/output interfaces 1207, and/or sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.

The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.

The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.

The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.

The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.

The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.

The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.

As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 involved in the embodiment of fig. 9 can be implemented as the input device in the embodiment of fig. 8.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A method of data processing, the method comprising:

acquiring the attribute and behavior data of the financial business object;

2. The data processing method of claim 1, wherein processing the attribute and behavior data of the financial transaction object comprises inputting the attribute and behavior data of the financial transaction object into a generated transaction model.

3. The data processing method of claim 1, wherein the attributes of the financial transaction object include name, age, locale, occupation, income, cultural degree, and asset condition.

4. The data processing method of claim 1, wherein the behavior data includes whether a loan has occurred and whether it is overdue.

5. The data processing method according to claim 1, wherein the processing result includes whether or not to give credit, and the amount and interest rate of the credit.

6. The data processing method of claim 2, wherein the method for generating the business model comprises:

preprocessing sample data;

7. The data processing method of claim 6, wherein the method for generating the business model further comprises:

evaluating the scoring card model;

8. The data processing method of claim 6, wherein the pre-processing the sample data comprises:

receiving sample data;

sampling the sample data to output a first data table;

processing missing data in the first data table to output a second data table;

9. A method of data processing, the method comprising:

acquiring a service request of a financial service object;

acquiring the attribute and behavior data of the financial business object;

and processing the attribute and behavior data of the financial service object through the service model, and outputting a financial service processing result.

10. A data processing method according to claim 9, characterized in that the method further comprises configuring parameters of said application components.

11. The data processing method of claim 9, wherein the attributes of the financial transaction object include name, age, location, occupation, income, cultural degree, and asset condition.

12. The data processing method of claim 9, wherein the behavior data includes whether a loan has occurred and whether it is overdue.

13. The data processing method according to claim 9, wherein the processing result includes whether or not to give credit, and the amount and interest rate of the credit.

14. The data processing method of claim 9, wherein the application component comprises:

the data preprocessing component is used for preprocessing the sample data;

15. The data processing method of claim 14, wherein the application component further comprises:

an evaluation component for evaluating the scoring card model;

16. The data processing method of claim 14, wherein the data pre-processing component comprises:

a data receiving component for receiving sample data;

17. The data processing method of claim 16, wherein processing the missing data comprises replacing the missing value with one of a null value or a specified value: pre-value, post-value, maximum value, minimum value, mean value, a self-defined value.

18. The data processing method of claim 16, wherein processing the outliers comprises filling the outliers, and wherein the filling comprises mode filling, median filling, mean filling, and specified value filling.

19. The data processing method of claim 14, wherein the binning process comprises equal frequency binning, equal width binning, chi-square binning.

20. The data processing method of claim 14, wherein the scoring card model is a logistic regression model, a probabilistic regression model, a decision tree, a neural network.

21. The data processing method of claim 14, wherein the scorecard generation component converts the output value of the scorecard model into a scorecard score.

22. The data processing method of claim 15, wherein the evaluation index for evaluating the effect of the scorecard model comprises an AUC value, a KS value, an Accuracy value, a Precision value, a Recall value, a ROC curve, a KS curve, and a PR curve.

23. The data processing method of claim 15, wherein the evaluation index of the outputted score card model includes an AUC value, a KS value, an Accuracy value, a Precision value, a Recall value, a ROC curve, a KS curve, a PR curve, a score card scale, and a score card detail.

24. A data processing apparatus, characterized in that the apparatus comprises:

25. The data processing apparatus of claim 24, further comprising a parameter configuration module configured to configure parameters of the application components.

26. The data processing apparatus of claim 24, wherein the attributes of the financial transaction object include name, age, locale, occupation, income, cultural degree, and asset condition.

27. The data processing apparatus of claim 24, wherein the behavior data comprises whether a loan has occurred and whether it is overdue.

28. The data processing apparatus of claim 24, wherein the processing result includes whether or not to give credit and the amount and interest rate of the credit.

29. The data processing apparatus of claim 24, wherein the business model is generated by a business model generation component comprising:

the data preprocessing component is used for preprocessing the sample data;

30. The data processing apparatus of claim 29, wherein the business model generation component further comprises:

an evaluation component for evaluating the scoring card model;

31. The data processing apparatus of claim 28, wherein the data pre-processing component comprises:

a data receiving component for receiving sample data;

32. The data processing apparatus of claim 31, wherein processing missing data comprises replacing missing values by one of a null value or a specified value: pre-value, post-value, maximum value, minimum value, mean value, a self-defined value.

33. The data processing apparatus of claim 31, wherein processing the outliers comprises filling the outliers, and wherein the filling method comprises mode filling, median filling, mean filling, and specified value filling.

34. The data processing apparatus of claim 29, wherein the binning process comprises equal frequency binning, equal width binning, chi-square binning.

35. The data processing apparatus of claim 29, wherein the scoring card model is a logistic regression model, a probabilistic regression model, a decision tree, a neural network.

36. The data processing apparatus of claim 29, wherein the scorecard generation component converts the output value of the scorecard model into a scorecard score.

37. The data processing apparatus of claim 30, wherein the evaluation index for evaluating the effect of the scorecard model comprises an AUC value, a KS value, an Accuracy value, a Precision value, a Recall value, a ROC curve, a KS curve, a PR curve.

38. The data processing apparatus of claim 30, wherein the evaluation index of the outputted scorecard model includes an AUC value, a KS value, an Accuracy value, a Precision value, a Recall value, a ROC curve, a KS curve, a PR curve, a scorecard scale, and a scorecard detail.

39. An apparatus, comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-8 or 9-23.

40. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method of one or more of claims 1-8 or 9-23.