CN108876076A - Personal credit scoring method and device based on signaling data - Google Patents

Personal credit scoring method and device based on signaling data

Info

Publication number
CN108876076A
Authority
CN
China
Prior art keywords
index
group
sample
regression models
logic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710322533.5A
Other languages
Chinese (zh)
Inventor
张湛梅
张晓川
徐睿
崔志顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Guangdong Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201710322533.5A priority Critical patent/CN108876076A/en
Publication of CN108876076A publication Critical patent/CN108876076A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Strategic Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the invention discloses a personal credit scoring method and device based on signaling data. The method includes: obtaining a sample account group, and selecting positive samples and negative samples from the sample account group according to preset rules; binning a first preset index group, and obtaining the WOE value of each bin according to the proportion of negative samples in each bin; obtaining an estimator of the parameters of a pre-built logistic regression model according to the WOE value of each bin; configuring a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model; and scoring the personal credit of a user according to the logistic regression model to obtain the user's personal credit score. The embodiment improves the logistic regression model on the basis of signaling data so that signaling data carries a larger weight in the model, which makes the scoring more accurate than in the prior art.

Description

Personal credit scoring method and device based on signaling data
Technical field
Embodiments of the present invention relate to the technical field of credit reporting, and in particular to a personal credit scoring method and device based on signaling data.
Background
As of the end of September 2015, the central bank's credit reporting system covered 870 million natural persons and 21.02 million enterprises and other organizations. With bank credit information at its core, the information collected by the system also includes social security, housing provident fund, civil judgments and their enforcement, public utilities, and telecommunications payment records. In fact, telecom operators hold detailed user behavior data and user background information, and also have signaling data with user location information, detailed call records, and payment and consumption records; these data have likewise long been incorporated into the national credit reporting system. Credit reporting is among the most practically valuable fields for big data applications, and telecom operators have been exploring it continuously.
In the course of realizing the embodiments of the present invention, the inventors found that the personal credit scoring methods established by telecom operators mainly consider many factors, such as the user's basic information, service subscription information, consumption capacity, communication behavior, historical arrears records, and social circle. Because no emphasis is placed among these factors, the scoring results actually obtained are inaccurate.
Summary of the invention
One purpose of the embodiments of the present invention is to solve the problem in the prior art that scoring results are inaccurate because no emphasis is placed among the factors considered in scoring.
An embodiment of the present invention proposes a personal credit scoring method based on signaling data, including:
Obtaining a sample account group, and selecting positive samples and negative samples from the sample account group according to preset rules;
Binning a first preset index group, and obtaining the WOE value of each bin according to the proportion of negative samples in each bin;
Obtaining an estimator of the parameters of a pre-built logistic regression model according to the WOE value of each bin;
Configuring a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model;
Scoring the personal credit of a user according to the logistic regression model to obtain the user's personal credit score.
Optionally, selecting positive samples and negative samples from the sample account group according to preset rules includes:
Determining the degree of dispersion of a second preset index group of each sample account using the entropy method;
Selecting positive samples and negative samples from the sample account group according to the degree of dispersion of the second preset index group of each sample account.
Optionally, configuring a penalty term for the estimator of the logistic regression model parameters includes:
Analyzing the first preset index group to obtain a first index group related to signaling data and a second index group unrelated to signaling data;
Constructing a penalty term relating the coefficients of the indices in the second index group to the coefficients of the indices in the first index group;
Configuring the penalty term to the estimator of the logistic regression model parameters.
Optionally, the penalty term is
where $\psi_1$ is the penalty coefficient, $\beta_j$ is the coefficient of the j-th index in the second index group, and $\beta_{k_n}$ is the coefficient of the $k_n$-th index in the first index group.
Optionally, scoring the personal credit of the user according to the logistic regression model includes:
Converting the logistic regression model into a scoring model under a preset constraint condition;
Taking the data corresponding to the user's second preset index group as the input of the scoring model, and obtaining the score of each bin for each index in the second preset index group;
Obtaining the user's personal credit score from the scores of the bins of the indices.
An embodiment of the present invention proposes a personal credit scoring device based on signaling data, including:
An obtaining module, configured to obtain a sample account group and to select positive samples and negative samples from the sample account group according to preset rules;
A binning module, configured to bin a first preset index group and to obtain the WOE value of each bin according to the proportion of negative samples in each bin;
A modeling module, configured to obtain an estimator of the parameters of a pre-built logistic regression model according to the WOE value of each bin;
A configuration module, configured to configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model;
A scoring module, configured to score the personal credit of a user according to the logistic regression model to obtain the user's personal credit score.
Optionally, the obtaining module is configured to determine the degree of dispersion of a second preset index group of each sample account using the entropy method, and to select positive samples and negative samples from the sample account group according to the degree of dispersion of the second preset index group of each sample account.
Optionally, the configuration module is configured to analyze the first preset index group to obtain a first index group related to signaling data and a second index group unrelated to signaling data; to construct a penalty term relating the coefficients of the indices in the second index group to the coefficients of the indices in the first index group; and to configure the penalty term to the estimator of the logistic regression model parameters.
Optionally, the penalty term is
where $\psi_1$ is the penalty coefficient, $\beta_j$ is the coefficient of the j-th index in the second index group, and $\beta_{k_n}$ is the coefficient of the $k_n$-th index in the first index group.
Optionally, the scoring module is configured to convert the logistic regression model into a scoring model under a preset constraint condition; to take the data corresponding to the user's second preset index group as the input of the scoring model and obtain the score of each bin for each index in the second preset index group; and to obtain the user's personal credit score from the scores of the bins of the indices.
As can be seen from the above technical solution, the personal credit scoring method and device based on signaling data proposed by the embodiments of the present invention improve the logistic regression model on the basis of signaling data so that signaling data carries a larger weight in the model, which makes the scoring more accurate than in the prior art.
Brief description of the drawings
The features and advantages of the present invention will be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:
Fig. 1 shows a schematic flowchart of the personal credit scoring method based on signaling data provided by one embodiment of the present invention;
Fig. 2 shows a schematic flowchart of the personal credit scoring method based on signaling data provided by another embodiment of the present invention;
Fig. 3 shows a schematic structural diagram of the personal credit scoring device based on signaling data provided by one embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.
Fig. 1 shows a schematic flowchart of the personal credit scoring method based on signaling data provided by one embodiment of the present invention. Referring to Fig. 1, the method can be implemented by a processor and specifically includes the following steps:
110. Obtain a sample account group, and select positive samples and negative samples from the sample account group according to preset rules;
It should be noted that a sample account here may be any information that uniquely identifies a user, such as the mobile phone number or customer number of a user of an enterprise. Positive samples and negative samples are then selected from these accounts according to some partitioning rule, which determines how many good users and bad users there are, or what percentage of the total sample each accounts for.
There are many possible partitioning rules, for example the Fibonacci method.
120. Bin the first preset index group, and obtain the WOE value of each bin according to the proportion of negative samples in each bin;
It should be noted that binning is a relatively mature technique and is not described further here.
130. Obtain an estimator of the parameters of the pre-built logistic regression model according to the WOE value of each bin;
140. Configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model;
It should be noted that, to make the evaluation factors of the model emphasize signaling data, the relationship between other non-signaling data and signaling data needs to be constrained, so that the weight of signaling data in the logistic regression model is greater than that of other non-signaling data.
150. Score the personal credit of the user according to the logistic regression model to obtain the user's personal credit score.
It should be noted that, once the model has been established, the information on the user's first preset index group is used as the input of the model to obtain the user's personal credit score.
It can be seen that this embodiment improves the logistic regression model on the basis of signaling data so that signaling data carries a larger weight in the model, which makes the scoring more accurate than in the prior art.
The above steps are described in detail below.
First, the method of selecting positive and negative samples in step 110 may include the following steps:
Determining the degree of dispersion of the second preset index group of each sample account using the entropy method;
Selecting positive samples and negative samples from the sample account group according to the degree of dispersion of the second preset index group of each sample account.
Second, step 140 specifically includes:
Analyzing the first preset index group to obtain a first index group related to signaling data and a second index group unrelated to signaling data;
Constructing a penalty term relating the coefficients of the indices in the second index group to the coefficients of the indices in the first index group;
Configuring the penalty term to the estimator of the logistic regression model parameters.
The form of the penalty term is
where $\psi_1$ is the penalty coefficient, $\beta_j$ is the coefficient of the j-th index in the second index group, and $\beta_{k_n}$ is the coefficient of the $k_n$-th index in the first index group.
Step 150 specifically includes:
Converting the logistic regression model into a scoring model under a preset constraint condition;
Taking the data corresponding to the user's second preset index group as the input of the scoring model, and obtaining the score of each bin for each index in the second preset index group;
Obtaining the user's personal credit score from the scores of the bins of the indices.
The constraint condition involved in the conversion step may be the range of the score, for example 1-100.
It can be seen that this embodiment scores personal credit with an adaptive logistic regression model based on signaling data, adaptively selecting the indices and coefficients that are effective for credit scoring. This keeps the personal credit scoring model stable while indices are screened, reflects the important role of signaling data, and reduces the error of the model coefficients, so that the scoring model is more reasonable.
Fig. 2 shows a schematic flowchart of the personal credit scoring method based on signaling data provided by another embodiment of the present invention. The design principle of the invention is described in detail below with reference to Fig. 2:
1. Design idea
This scheme mainly optimizes the traditional logistic regression personal credit scoring model, using a logistic regression model that adapts on the basis of signaling data to score personal credit. The main flow of the scheme is as follows. First, the entropy method is used to extract positive and negative samples as standard sample data for subsequent scoring modeling, and indices relevant to measuring personal credit are selected as the input variables for modeling, including indices on basic information, consumption capacity, credit record, social relationships, and behavior preference, as well as indices on signaling data. Next, the indices of the standard sample data are preprocessed by binning and computing their WOE values. Then an adaptive logistic regression model based on signaling data is established and trained adaptively using signaling data, automatically selecting the indices and coefficients that are effective for credit scoring. Finally, the regression model is converted into a personal credit scorecard for personal credit scoring.
210. Extract standard sample data for scoring modeling;
To establish a personal credit scoring system, a set of standard samples must first be selected as a reference, distinguishing which users are good users and which are bad users; the subsequent scoring model is analyzed on the basis of these data.
This technical solution scores users with the entropy method combined with arrears-related indices and sorts the scores from high to low. The higher the score, the higher the user's degree of arrears and the greater the probability of default, so the top 1% of users by score are taken as bad users, i.e. positive samples; 10% of the total number of users are randomly selected from the remaining users as good users, i.e. negative samples. The specific steps are as follows:
1. Select the total number of service suspensions in the last three months, the total arrears amount in the last three months, and the customer billing cycle type as indices; together these indices measure the user's arrears and default behavior. Because the value ranges of the indices differ, the indices need to be standardized to avoid overemphasizing any single index. The standardization formula is as follows:
where $U_{ij}$, $i = 1, 2, \dots, m$, $j = 1, 2, 3$, is the i-th record of the j-th index in the raw data, m is the total number of users, and $V_{ij}$ is the standardized data.
2. Computing the entropy makes it possible to judge the degree of dispersion of the three indices (total number of service suspensions in the last three months, total arrears amount in the last three months, and customer billing cycle type); the greater the dispersion, the greater the influence of the index on the overall evaluation.
First, the entropy of each index is computed, which measures the degree of dispersion of the index. The calculation formula is as follows:
where $r_{ij}$ denotes the share of the i-th record in the j-th index.
Then the weight of each index is computed, i.e. the coefficient by which each of the three indices is multiplied when the total score is calculated. The calculation formula is as follows:
where $h_j = 1 - e_j$, $j = 1, 2, 3$, is the difference coefficient of the j-th index.
Finally, the entropy-method score $S_i$ of each user is computed from the weights and values of the indices.
3. The scores $S_i$ are sorted from high to low; a higher score indicates more serious arrears and default. The top 1% of users by score are taken as bad users, i.e. positive samples; 10% of the total number of users are randomly selected from the remaining users as good users, i.e. negative samples. The combined set of positive and negative samples is the standard sample data used to establish the credit scoring model.
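The sample-selection steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the formulas are not reproduced in this text, so the sketch assumes min-max standardization, the usual entropy-weight definitions (entropy normalized by ln m, weights proportional to $h_j$), and a weighted sum as the user score. The function names `entropy_scores` and `select_samples` and their parameters are hypothetical.

```python
import math
import random


def entropy_scores(records):
    """Return (weights, scores) for rows of non-negative index values."""
    m = len(records)
    if m < 2:
        raise ValueError("need at least two records")
    columns = list(zip(*records))

    # Assumed min-max standardization so every index lies in [0, 1].
    standardized = []
    for col in columns:
        low, high = min(col), max(col)
        span = high - low
        standardized.append([(v - low) / span if span else 0.0 for v in col])

    # Entropy e_j of each index and difference coefficient h_j = 1 - e_j.
    diffs = []
    for col in standardized:
        total = sum(col)
        if total == 0:
            diffs.append(0.0)  # a constant index carries no information
            continue
        entropy = 0.0
        for v in col:
            if v > 0:
                r = v / total  # share r_ij of record i in index j
                entropy -= r * math.log(r)
        diffs.append(1.0 - entropy / math.log(m))

    # Assumed weights: difference coefficients normalized to sum to 1.
    diff_sum = sum(diffs)
    n = len(diffs)
    weights = [h / diff_sum for h in diffs] if diff_sum else [1.0 / n] * n
    scores = [
        sum(w * col[i] for w, col in zip(weights, standardized))
        for i in range(m)
    ]
    return weights, scores


def select_samples(scores, bad_frac=0.01, good_frac=0.10, seed=0):
    """Top bad_frac by score become bad users; a random good_frac of all
    users, drawn from the remainder, become good users."""
    m = len(scores)
    order = sorted(range(m), key=lambda i: scores[i], reverse=True)
    n_bad = max(1, int(m * bad_frac))
    bad, rest = order[:n_bad], order[n_bad:]
    n_good = min(len(rest), int(m * good_frac))
    good = random.Random(seed).sample(rest, n_good)
    return bad, good
```

The fixed `seed` only makes the random draw of good users repeatable; any source of randomness would serve.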
220. Select indices relevant to measuring personal credit and preprocess them by binning and related steps;
Indices that can comprehensively assess the user's credit situation are selected. To form a scorecard that makes credit scores easy to evaluate in later scoring, the indices need to be binned and their WOE values obtained.
To assess the user's credit situation comprehensively, the five traditional scoring aspects (the user's basic information, consumption capacity, credit record, social relationships, and behavior preference) are extracted, and the user's signaling data is added as well; the signaling data here mainly concerns location information. The user's resident positions in the daytime and at night are considered: users whose daytime resident position is in a high-end office building or CBD and whose nighttime resident position is in a high-end residential community tend to have better credit standing.
The user's basic information mainly includes brand, time on network, identity, and similar information. Consumption capacity measures the user's consumption tier, consumption level, and consumption activity in telecom spending, and mainly includes account balance, main package fee, total number of calls last month, and average top-up amount over the previous three calendar months. Credit record measures the user's ability to honor contracts and includes total arrears over the previous three calendar months, days of service barring in the previous calendar month, and days of two-way suspension in the previous calendar month. Social relationships measure the strength of the user's social ties, assessed through social influence and the credit scores of surrounding people, and include the number of high-frequency contact numbers, the average call duration with high-frequency contact numbers, the number of close contacts, and the average consumption per close contact. Behavior preference measures the user's activity and application preferences in using apps, and includes the top-1 preferred app type, the number of visits to social networking apps, the traffic used by social networking apps, the number of visits to e-commerce shopping apps, and the number of visits to stock apps. For the user's signaling data, the indices mainly selected are the number of times the resident position on working days from 10:00 to 17:00 is a high-end office building or CBD, and the number of times the resident position from 22:00 to 6:00 the next day is a high-end residential community.
To form a scorecard that makes credit scores easy to evaluate in later scoring, the indices need to be binned. For a continuous index, a reasonable binning keeps the amount of data in each bin fairly balanced, neither too much nor too little, and the proportion of negative samples across the bins should show a monotonically increasing or decreasing trend. The WOE value is used here: it both measures the trend of each bin and serves as the variable input of the subsequent regression model. The calculation formula is as follows:
For a discrete index, when the index takes few values, each value can directly serve as a bin and its WOE value can be computed; when it takes many values, some values can be merged before the corresponding WOE values are computed.
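As an illustration of the WOE step, the sketch below computes one WOE value per bin. The WOE formula is not reproduced in this text, so the sketch assumes one common convention: the natural log of the bin's share of all bad users divided by its share of all good users (the opposite sign convention is equally common). The function name `woe_by_bin` and the label coding (1 for a bad user, 0 for a good user) are assumptions made for the example.

```python
import math


def woe_by_bin(bin_ids, labels):
    """bin_ids[i] is the bin of record i; labels[i] is 1 (bad) or 0 (good)."""
    bad_total = sum(labels)
    good_total = len(labels) - bad_total
    if bad_total == 0 or good_total == 0:
        raise ValueError("need both good and bad records")

    # Count bad and good records per bin.
    counts = {}
    for b, y in zip(bin_ids, labels):
        bad, good = counts.get(b, (0, 0))
        counts[b] = (bad + y, good + (1 - y))

    woe = {}
    for b, (bad, good) in counts.items():
        if bad == 0 or good == 0:
            # WOE is undefined here; merging bins is the usual remedy.
            raise ValueError(f"bin {b!r} lacks good or bad records")
        # Assumed convention: ln(share of bad users / share of good users).
        woe[b] = math.log((bad / bad_total) / (good / good_total))
    return woe
```

Raising an error for a bin that lacks one class mirrors the advice in the text to merge values when a bin is too thin.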
230. Train the scoring model adaptively using signaling data;
240. Automatically select the indices and coefficients that are effective for credit scoring
First, an adaptive logistic regression model based on signaling data is established for personal credit scoring.
Logistic regression is widely used in credit scoring models; its structure is simple, and the effect of its coefficients is easy to explain in business terms.
Let P denote the probability that a user is a bad user. The logistic regression model can then be expressed as
$$\operatorname{logit}(P) = \ln\frac{P}{1 - P} = \beta_0 + \beta_1 x_1 + \dots + \beta_s x_s \qquad (1)$$
where $x_i$ ($i = 1, 2, \dots, s$) are the indices. P takes values between 0 and 1, and after the logit transformation the range becomes any real value; what needs to be solved for is $\beta = (\beta_0, \beta_1, \dots, \beta_s)^T$.
When predicting with logistic regression, all indices can be entered into the model, but indices that contribute little to prediction then also enter the model and increase the deviation of its predictions. The usual solution is variable selection, for example forward selection, backward elimination, or stepwise regression, which removes indices with insignificant effect.
However, in these traditional regression approaches, stepwise regression treats variable selection and parameter estimation as two separate stages, which makes model selection unstable. The adaptive logistic regression model based on signaling data uses signaling data to carry out variable selection and coefficient estimation adaptively and simultaneously, effectively reducing the estimation bias of the model coefficients.
The logistic regression model is first solved with the Adaptive-Lasso method. Given data $(X^{(i)}, y^{(i)})$, $i = 1, 2, \dots, n$, where $X^{(i)} = (x_{i1}, \dots, x_{is})$ is the WOE value vector of the i-th record in the sample data (n records in total), $x_{i1}$ is the WOE value of the first index of the i-th record, and $y^{(i)}$ is the target variable: $y^{(i)} = 1$ if the i-th record is a positive sample and $y^{(i)} = 0$ if it is a negative sample. Under the Adaptive-Lasso method, the estimator of $\beta = (\beta_0, \beta_1, \dots, \beta_s)^T$ is then defined by formula (2).
The first part of formula (2) measures the goodness of fit of the model and is the part that an ordinary logistic regression model optimizes when it is solved. The second part is the penalty on the coefficients, with $\lambda_n$ as the penalty parameter. The weight on each coefficient in this penalty is built from the estimate of $\beta_j$ obtained by least-squares estimation of formula (1): when $|\beta_j|$ is large, the weight gives a smaller penalty, which yields a smaller bias; when $|\beta_j|$ is small, the weight gives a larger penalty and the coefficient shrinks to approximately 0, which realizes variable selection.
At the same time, the solving process needs to use the indices on signaling data to adaptively control the coefficients of the other indices, ensuring that the signaling indices contribute a higher weight, so a further penalty term is added on top of the Adaptive-Lasso method.
Denote by $k_1$ and $k_2$ the subscripts, among all indices $x_i$ ($i = 1, 2, \dots, s$), of the two signaling indices: the number of times the resident position on working days from 10:00 to 17:00 is a high-end office building or CBD, and the number of times the resident position from 22:00 to 6:00 the next day is a high-end residential community. That is, $x_{k_1}$ denotes the number of times the resident position on working days from 10:00 to 17:00 is a high-end office building or CBD, and $\beta_{k_1}$ denotes the coefficient of index $x_{k_1}$.
To ensure that the signaling indices $x_{k_1}$ and $x_{k_2}$ contribute a higher weight, the differences between the coefficients $\beta_j$ need to be controlled. A penalty term with penalty coefficient $\psi_1$ is therefore added to control the coefficient values of $x_{k_1}$ and $x_{k_2}$. Limiting the size of this term ensures that the coefficients of $x_{k_1}$ and $x_{k_2}$ must be greater than the coefficients of the other indices, i.e. that the signaling indices contribute a higher weight in the model.
In summary, the estimator of $\beta = (\beta_0, \beta_1, \dots, \beta_s)^T$ for the adaptive logistic regression model based on signaling data is defined by formula (3).
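The structure of such a penalized objective can be sketched as below. This is only an illustration under stated assumptions: the text does not reproduce formulas (2) and (3), so the adaptive weights are taken as the standard Adaptive-Lasso form $1 / |\text{initial estimate}|^{\gamma}$, and the signaling penalty is a hypothetical hinge form that is positive whenever a non-signaling coefficient exceeds a signaling coefficient in magnitude. The function `penalized_loss` and all of its parameter names are invented for the example.

```python
import math


def penalized_loss(beta, X, y, init_beta, signaling_idx, lam, psi1, gamma=1.0):
    """beta[0] is the intercept; beta[1:] align with the columns of X,
    whose entries are WOE values. signaling_idx holds positions in beta."""
    # Part 1: negative log-likelihood of the logistic model.
    nll = 0.0
    for row, target in zip(X, y):
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        # log(1 + exp(z)), computed in a numerically stable way
        log1pexp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        nll += log1pexp - target * z

    # Part 2: Adaptive-Lasso penalty; large initial estimates are
    # penalized less (assumed weight 1 / |initial estimate|**gamma).
    lasso = sum(
        abs(b) / max(abs(b0), 1e-8) ** gamma
        for b, b0 in zip(beta[1:], init_beta[1:])
    )

    # Part 3 (hypothetical form): penalize every non-signaling coefficient
    # whose magnitude exceeds that of a signaling coefficient.
    others = [j for j in range(1, len(beta)) if j not in signaling_idx]
    gap = sum(
        max(0.0, abs(beta[j]) - abs(beta[k]))
        for j in others
        for k in signaling_idx
    )
    return nll + lam * lasso + psi1 * gap
```

Minimizing this function over `beta` with a general-purpose optimizer would give one candidate estimator; the choice of optimizer, and of `lam`, `psi1`, and `gamma`, is left open here.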
250, Rating Model is converted by regression model
It is the process of a scaling by the form that regression coefficient is converted to credit scoring, makes in order to facilitate business personnel With and scoring between difference have business meaning, it usually needs meet three point requirement:
1, scoring control in a certain range, as 0-900/.
2, in specific score, handy family and bad user have certain proportionate relationship, use here.
It measures, such as wishes that score value ratio of handy family and bad user when 600 points is 50:1.
3, the increase of score value should be able to reflect the variation at handy family and bad user's ratio, such as wish that score value does not increase by 50 Point, odds is also doubled.
The credit scoring equation most commonly used in the industry at present is:
score = offset + factor × ln(odds)
To meet the three conditions above, this formula must satisfy the following two equations:
A. score = offset + factor × ln(odds)
B. score + pdo = offset + factor × ln(2 × odds)
where pdo is the number of points by which the score must increase for the odds to double. Subtracting A from B gives pdo = factor × ln(2), so
factor = pdo / ln(2), offset = score − factor × ln(odds).
This yields the final scoring equation:
score = offset + factor × ln(odds)
Suppose the ratio of good users to bad users at a score of 600 is 50:1, and the score increases by 50 points whenever the odds double. Then:
factor = 50 / ln(2) = 72.13, offset = 600 − 72.13 × ln(50) = 317.83
The final scoring equation is therefore: score = 317.83 + 72.13 × ln(odds).
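The scaling arithmetic above can be checked with a short sketch. The function names `scorecard_scaling` and `score_from_odds` are illustrative. Note that with the unrounded factor the offset comes out at about 317.81; the value 317.83 in the text results from rounding the factor to 72.13 before computing the offset.

```python
import math

def scorecard_scaling(base_score, base_odds, pdo):
    """Return (factor, offset) such that
    score = offset + factor * ln(odds),
    score equals base_score at base_odds,
    and the score rises by pdo points each time the odds double."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def score_from_odds(odds, factor, offset):
    """Apply the scoring equation to a good:bad odds value."""
    return offset + factor * math.log(odds)

factor, offset = scorecard_scaling(base_score=600, base_odds=50, pdo=50)
print(round(factor, 2), round(offset, 2))
# Doubling the odds from 50 to 100 should add exactly pdo = 50 points.
print(round(score_from_odds(100, factor, offset), 2))
```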
From the left-hand side of the logistic regression equation, −logit(P) = ln(odds). Substituting the estimator $\hat{\beta}$ of β obtained in step 4 into the scoring equation gives:
Here $x_i$ denotes the WOE value of the bin corresponding to the value of the i-th variable (index), and $\hat{\beta}_i$ is the regression model coefficient obtained from formula (3).
The score corresponding to each bin of each variable can therefore be obtained from the scoring formula.
Here WOE denotes the WOE value of the corresponding bin of the variable.
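As a hedged illustration of how per-bin scores follow from the scoring equation, the sketch below assumes the usual linear form logit(P) = β0 + Σ βi·xi, with P the probability of a negative sample, so that score = offset − factor × (β0 + Σ βi·WOEi). The decomposition into a base score plus per-bin points, and the names `bin_points` and `total_score`, are assumptions made for illustration rather than formulas taken from this text.

```python
def bin_points(factor, beta_i, woe):
    """Points contributed by one bin of one variable: -factor * beta_i * WOE."""
    return -factor * beta_i * woe

def total_score(factor, offset, beta0, betas, woes):
    """Base score plus the points of the bin each variable's value falls into.

    betas: estimated coefficients, one per variable.
    woes: WOE value of the selected bin for each variable, in the same order.
    """
    base = offset - factor * beta0
    return base + sum(bin_points(factor, b, w) for b, w in zip(betas, woes))

# Example with made-up coefficients and WOE values (illustrative only).
score = total_score(factor=72.13, offset=317.83, beta0=-3.9,
                    betas=[0.8, 1.2], woes=[-0.5, 0.3])
print(round(score, 2))
```

Because the score is additive in the variables, a lookup table of `bin_points` values per variable and bin is enough to score a user without re-evaluating the regression.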
For simplicity of description, the method embodiments are presented as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Fig. 3 shows a structural diagram of the personal credit scoring apparatus based on signaling data provided by one embodiment of the present invention. Referring to Fig. 3, the apparatus includes an acquisition module 310, a binning module 320, a modeling module 330, a configuration module 340, and a scoring module 350, wherein:
The acquisition module 310 is configured to obtain a sample account group and to select positive samples and negative samples from the sample account group according to preset rules;
The binning module 320 is configured to bin the first preset index group and to obtain the WOE value corresponding to each bin according to the proportion of negative samples in each bin;
The modeling module 330 is configured to obtain the estimator of the parameters of a pre-built logistic regression model according to the WOE value corresponding to each bin;
The configuration module 340 is configured to configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and other non-signaling data to the logistic regression model;
The scoring module 350 is configured to score the personal credit of a user according to the logistic regression model and to obtain the user's personal credit score.
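The WOE step performed by the binning module can be sketched as follows. The text states only that WOE is obtained from the proportion of negative samples in each bin, so the sign convention used here, WOE = ln(share of negatives in the bin / share of positives in the bin), is an assumption; the opposite convention is also common. The function name `woe_by_bin` is illustrative.

```python
import math

def woe_by_bin(bin_labels, y):
    """Compute a WOE value per bin.

    bin_labels: the bin identifier of each sample for one index.
    y: 1 for a negative sample, 0 for a positive sample.
    Assumed convention: WOE = ln((neg_in_bin / neg_total) / (pos_in_bin / pos_total)).
    """
    total_neg = sum(y)
    total_pos = len(y) - total_neg
    counts = {}
    for b, label in zip(bin_labels, y):
        neg, pos = counts.get(b, (0, 0))
        counts[b] = (neg + label, pos + (1 - label))

    woe = {}
    for b, (neg, pos) in counts.items():
        if neg == 0 or pos == 0:
            # WOE is undefined for a bin lacking one class; merge bins
            # or apply smoothing before calling this function.
            raise ValueError(f"bin {b!r} contains only one class")
        woe[b] = math.log((neg / total_neg) / (pos / total_pos))
    return woe

bins = ["low", "low", "low", "low", "high", "high", "high", "high"]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
print(woe_by_bin(bins, labels))
```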
After receiving an instruction to start scoring, the acquisition module 310 obtains the account sample group from a pre-established database, divides the account sample group, and sends the division result to the binning module 320. The binning module 320 bins the received positive and negative samples according to the first preset index group, obtains the WOE data corresponding to each bin, and sends the data to the modeling module 330. The modeling module 330 uses the received data to solve the pre-established model, obtains the estimators of the unknown parameters in the model, and sends the model to the configuration module 340. The configuration module 340 configures a penalty term for the model to limit the contributions of signaling indices and non-signaling indices to the model, and sends the completed model to the scoring module 350. The scoring module 350 scores users based on the completed model.
As can be seen, this embodiment performs personal credit scoring with the adaptive logistic regression model based on signaling data, adaptively selecting the indices and coefficients that are effective for credit scoring. This ensures that the personal credit scoring model remains stable when screening indices and reflects the important role of signaling data, reduces the error of the model coefficients, and makes the scoring model more reasonable.
Each functional module of the apparatus is described in detail below:
The acquisition module 310 is configured to judge, using the entropy method, the dispersion degree of the second preset index group of each sample account, and to select positive samples and negative samples from the sample account group according to the dispersion degree of the second preset index group of each sample account.
The configuration module 340 is configured to analyze the first preset index group to obtain a first index group related to signaling data and a second index group unrelated to signaling data; to construct a penalty term from the coefficients of the indices in the second index group and the coefficients of the indices in the first index group; and to configure the penalty term for the estimator of the logistic regression model parameters.
The scoring module 350 is configured to convert the logistic regression model into a scoring model under preset constraints; to take the data corresponding to the user's second preset index group as the input of the scoring model and obtain the score of each bin for each index in the second preset index group; and to obtain the user's personal credit score from the score of each bin for each index.
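The text does not spell out how the acquisition module applies the entropy method, so the following is only a sketch of the standard entropy weight calculation, under the assumption that each index is a non-negative column of values across sample accounts. The names `entropy_dispersion` and `entropy_weights` are illustrative, and how the dispersion degrees are combined per account to pick positive and negative samples is left open.

```python
import math

def entropy_dispersion(column):
    """Dispersion degree d = 1 - e of one non-negative index across n accounts,
    where e = -(1 / ln n) * sum(p * ln p) and p = value / column total."""
    n = len(column)
    total = sum(column)
    if n < 2 or total <= 0:
        raise ValueError("need at least two accounts and a positive column sum")
    e = 0.0
    for v in column:
        p = v / total
        if p > 0:  # 0 * ln(0) is treated as 0
            e -= p * math.log(p)
    e /= math.log(n)
    return 1.0 - e

def entropy_weights(columns):
    """Normalize the dispersion degrees of several indices into weights."""
    d = [entropy_dispersion(c) for c in columns]
    s = sum(d)
    if s == 0:
        raise ValueError("all indices are uniform; weights are undefined")
    return [x / s for x in d]

# An index concentrated on one account is maximally dispersed (d = 1),
# while a uniform index carries no discriminating information (d = 0).
print(entropy_dispersion([1, 0, 0, 0]), entropy_dispersion([2, 2, 2, 2]))
```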
As can be seen, compared with existing logistic regression personal credit scoring technology, this technical solution improves the traditional logistic regression personal credit scoring model by using the adaptive logistic regression model based on signaling data. This ensures that the personal credit scoring model remains stable when screening indices and reflects the important role of signaling data, reduces the error of the model coefficients, and makes the scoring model more reasonable.
In summary, compared with traditional personal credit scoring methods, the effects achieved by this scheme compare as follows:
Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments.
It should be noted that the components of the apparatus of the present invention are divided logically according to the functions they implement. The present invention is not limited to this division, however, and the components may be re-divided or combined as needed.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. In the present apparatus, a PC remotely controls the equipment or apparatus over the Internet and precisely controls each operating step of the equipment or apparatus. The present invention may also be implemented as equipment or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. A program implementing the present invention may be stored on a computer-readable medium, and the files or documents generated by the program can be used for statistics, for example to generate data reports and CPK reports, and batch testing and statistics can be performed on power amplifiers. It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A personal credit scoring method based on signaling data, characterized by comprising:
obtaining a sample account group, and selecting positive samples and negative samples from the sample account group according to preset rules;
binning a first preset index group, and obtaining the WOE value corresponding to each bin according to the proportion of negative samples in each bin;
obtaining an estimator of the parameters of a pre-built logistic regression model according to the WOE value corresponding to each bin;
configuring a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and other non-signaling data to the logistic regression model;
scoring the personal credit of a user according to the logistic regression model to obtain the user's personal credit score.
2. The method according to claim 1, characterized in that selecting positive samples and negative samples from the sample account group according to preset rules comprises:
judging the dispersion degree of a second preset index group of each sample account using the entropy method;
selecting positive samples and negative samples from the sample account group according to the dispersion degree of the second preset index group of each sample account.
3. The method according to claim 1, characterized in that configuring a penalty term for the estimator of the logistic regression model parameters comprises:
analyzing the first preset index group to obtain a first index group related to signaling data and a second index group unrelated to signaling data;
constructing a penalty term from the coefficients of the indices in the second index group and the coefficients of the indices in the first index group;
configuring the penalty term for the estimator of the logistic regression model parameters.
4. The method according to claim 3, characterized in that the penalty term is
wherein $\psi_1$ is the penalty coefficient, $\beta_j$ is the coefficient of the j-th index in the second index group, and $\beta_{k_n}$ is the coefficient of the $k_n$-th index in the first index group.
5. The method according to any one of claims 1 to 4, characterized in that scoring the personal credit of the user according to the logistic regression model comprises:
converting the logistic regression model into a scoring model under preset constraints;
taking the data corresponding to the user's second preset index group as the input of the scoring model, and obtaining the score of each bin for each index in the second preset index group;
obtaining the user's personal credit score from the score of each bin for each index.
6. A personal credit scoring apparatus based on signaling data, characterized by comprising:
an acquisition module, configured to obtain a sample account group and to select positive samples and negative samples from the sample account group according to preset rules;
a binning module, configured to bin a first preset index group and to obtain the WOE value corresponding to each bin according to the proportion of negative samples in each bin;
a modeling module, configured to obtain an estimator of the parameters of a pre-built logistic regression model according to the WOE value corresponding to each bin;
a configuration module, configured to configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and other non-signaling data to the logistic regression model;
a scoring module, configured to score the personal credit of a user according to the logistic regression model and to obtain the user's personal credit score.
7. The apparatus according to claim 6, characterized in that the acquisition module is configured to judge the dispersion degree of a second preset index group of each sample account using the entropy method, and to select positive samples and negative samples from the sample account group according to the dispersion degree of the second preset index group of each sample account.
8. The apparatus according to claim 6, characterized in that the configuration module is configured to analyze the first preset index group to obtain a first index group related to signaling data and a second index group unrelated to signaling data; to construct a penalty term from the coefficients of the indices in the second index group and the coefficients of the indices in the first index group; and to configure the penalty term for the estimator of the logistic regression model parameters.
9. The apparatus according to claim 8, characterized in that the penalty term is
wherein $\psi_1$ is the penalty coefficient, $\beta_j$ is the coefficient of the j-th index in the second index group, and $\beta_{k_n}$ is the coefficient of the $k_n$-th index in the first index group.
10. The apparatus according to any one of claims 6 to 9, characterized in that the scoring module is configured to convert the logistic regression model into a scoring model under preset constraints; to take the data corresponding to the user's second preset index group as the input of the scoring model and obtain the score of each bin for each index in the second preset index group; and to obtain the user's personal credit score from the score of each bin for each index.
CN201710322533.5A 2017-05-09 2017-05-09 The personal credit methods of marking and device of data based on instruction Pending CN108876076A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710322533.5A CN108876076A (en) 2017-05-09 2017-05-09 The personal credit methods of marking and device of data based on instruction

Publications (1)

Publication Number Publication Date
CN108876076A true CN108876076A (en) 2018-11-23

Family

ID=64287486

Country Status (1)

Country Link
CN (1) CN108876076A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030110112A1 (en) * 1999-12-30 2003-06-12 Johnson Christopher D. Methods and systems for automated inferred valuation of credit scoring
CN101996381A (en) * 2009-08-14 2011-03-30 中国工商银行股份有限公司 Method and system for calculating retail asset risk
CN104537067A (en) * 2014-12-30 2015-04-22 广东电网有限责任公司信息中心 Box separation method based on k-means clustering
CN105894089A (en) * 2016-04-21 2016-08-24 百度在线网络技术(北京)有限公司 Method of establishing credit investigation model, credit investigation determination method and the corresponding apparatus thereof
CN106097095A (en) * 2016-06-08 2016-11-09 腾讯科技(深圳)有限公司 Determine the method and device of credit
CN106204106A (en) * 2016-06-28 2016-12-07 武汉斗鱼网络科技有限公司 A kind of specific user's recognition methods and system
CN106503562A (en) * 2015-09-06 2017-03-15 阿里巴巴集团控股有限公司 A kind of Risk Identification Method and device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584047A (en) * 2018-11-29 2019-04-05 北京玖富普惠信息技术有限公司 A kind of credit method, system, computer equipment and medium
CN109584047B (en) * 2018-11-29 2021-01-26 北京玖富普惠信息技术有限公司 Credit granting method, system, computer equipment and medium
CN109325639A (en) * 2018-12-06 2019-02-12 南京安讯科技有限责任公司 A kind of credit scoring card automation branch mailbox method for credit forecast assessment
WO2020125106A1 (en) * 2018-12-21 2020-06-25 苏宁易购集团股份有限公司 Similarity model-based data processing method and system
WO2020143233A1 (en) * 2019-01-07 2020-07-16 平安科技(深圳)有限公司 Method and device for building scorecard model, computer apparatus and storage medium
CN110428270A (en) * 2019-08-07 2019-11-08 佰聆数据股份有限公司 The potential preference client recognition methods of the channel of logic-based regression algorithm
CN110544155A (en) * 2019-09-02 2019-12-06 中诚信征信有限公司 User credit score acquisition method, acquisition device, server and storage medium
CN110544155B (en) * 2019-09-02 2023-05-19 中诚信征信有限公司 User credit score acquisition method, acquisition device, server and storage medium
CN110727510A (en) * 2019-09-25 2020-01-24 浙江大搜车软件技术有限公司 User data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination