CN110349007A - Method, apparatus and electronic device for user-grouped credit limit increase based on a variable discrimination index - Google Patents

Method, apparatus and electronic device for user-grouped credit limit increase based on a variable discrimination index

Info

Publication number
CN110349007A
Authority
CN
China
Prior art keywords
group
user
variable
model
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910587759.7A
Other languages
Chinese (zh)
Inventor
乾春涛
沈赟
郑彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyu Information and Technology Co Ltd
Original Assignee
Shanghai Qiyu Information and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiyu Information and Technology Co Ltd
Priority to CN201910587759.7A
Publication of CN110349007A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a method, apparatus, electronic device and computer-readable medium for user-grouped credit limit increase based on a variable discrimination index. The method calculates a discrimination index for each variable of the user data in a historical financial user set and determines the grouping variables and the grouping rule from that index. The users in the historical financial user set are then grouped according to the grouping rule, and a separate limit-increase model is built for each group. A new user is grouped according to the same rule, and the limit-increase model of the matching user group is used to determine the limit-increase strategy. Because users are grouped by variable discrimination index, the model can choose a different assessment method for each user group, which can effectively improve the model's predictive power and identify the risk of each user group more precisely.

Description

Method, apparatus and electronic device for user-grouped credit limit increase based on a variable discrimination index
Technical field
The present invention relates to the field of computer information processing, and in particular to a method, apparatus and electronic device for user-grouped credit limit increase based on a variable discrimination index.
Background
Because the credit reporting system is underdeveloped, only a small proportion of users hold credit cards issued by commercial banks. Most people have incomplete credit records and little credit information, so commercial banks find it difficult to extend financial services to them. The rapid development of financial technology has accelerated the pace of inclusive finance. Internet financial institutions and small-loan companies have clients submit various materials and use on-site or telephone interviews to judge the authenticity of a client's credit need and the client's repayment ability, which to some extent solves the credit problem of financial service customers who have no credit record. Companies compete to research and design risk policies, for example using the number of times a client's credit record has been queried, gender and so on, to judge and identify the client's financial risk.
In practice, these methods have several drawbacks: 1. the data a client submits may be falsified, and verification is costly and difficult; 2. there is a risk that the client transfers the assets concerned after receiving the financial service; 3. in real application scenarios the population is unstable, which makes the policy unstable as well. A single, simply designed risk policy may therefore miss high-quality clients and accept low-quality ones.
Summary of the invention
The present invention aims to solve the problem that existing limit-increase models cannot apply targeted limit-increase strategies to different user groups, which leaves the risk policy poorly adapted.
To solve the above technical problem, a first aspect of the present invention provides a method for user-grouped credit limit increase based on a variable discrimination index, comprising:
calculating a discrimination index for each variable of the user data in a historical financial user set;
determining grouping variables and a grouping rule according to the discrimination index;
grouping the users in the historical financial user set according to the grouping rule, and building a separate limit-increase model for each group;
for a new user, grouping the user according to the grouping rule, and determining a limit-increase strategy using the limit-increase model of the user group to which the user belongs.
According to a preferred embodiment of the present invention, calculating the discrimination index of each variable of the user data in the historical financial user set comprises:
building a machine self-learning classification model, and determining, through machine self-learning, the variable with the highest discrimination and a grouping rule based on that variable.
According to a preferred embodiment of the present invention, the machine self-learning classification model is a decision tree model.
According to a preferred embodiment of the present invention, determining through machine self-learning the variable with the highest discrimination and the grouping rule based on that variable comprises:
using the classification scheme learned by the decision tree model as the grouping rule.
According to a preferred embodiment of the present invention, building a separate limit-increase model for each group comprises:
for each user group, building a training data set, a test data set and a limit-increase model from the historical financial user data of that user group.
According to a preferred embodiment of the present invention, building a separate limit-increase model for each group further comprises:
training and testing the corresponding limit-increase model using the training data set and the test data set of each user group.
According to a preferred embodiment of the present invention, the method further comprises calculating the K-S value curve of each model and adjusting the grouping rule when the K-S value does not meet a preset target.
A second aspect of the present invention provides an apparatus for user-grouped credit limit increase based on a variable discrimination index, comprising:
an index calculation module, configured to calculate a discrimination index for each variable of the user data in a historical financial user set;
a rule determination module, configured to determine grouping variables and a grouping rule according to the discrimination index;
a grouping and modeling module, configured to group the users in the historical financial user set according to the grouping rule and to build a separate limit-increase model for each group;
a strategy determination module, configured to group a new user according to the grouping rule and to determine a limit-increase strategy using the limit-increase model of the user group to which the user belongs.
According to a preferred embodiment of the present invention, the index calculation module is configured to build a machine self-learning classification model and to determine, through machine self-learning, the variable with the highest discrimination and a grouping rule based on that variable.
According to a preferred embodiment of the present invention, the machine self-learning classification model is a decision tree model.
According to a preferred embodiment of the present invention, the rule determination module is further configured to use the classification scheme learned by the decision tree model as the grouping rule.
According to a preferred embodiment of the present invention, the grouping and modeling module is further configured to:
for each user group, build a training data set, a test data set and a limit-increase model from the historical financial user data of that user group.
According to a preferred embodiment of the present invention, the grouping and modeling module is further configured to:
train and test the corresponding limit-increase model using the training data set and the test data set of each user group.
According to a preferred embodiment of the present invention, the apparatus further comprises:
an adjustment module, configured to calculate the K-S value curve of each model and to adjust the grouping rule when the K-S value does not meet a preset target.
To solve the above technical problem, a third aspect of the present invention provides an electronic device comprising a processor and a memory storing computer-executable instructions which, when executed, cause the processor to perform the above method.
To solve the above technical problem, a fourth aspect of the present invention provides a computer-readable storage medium storing one or more programs which, when executed by a processor, implement the above method.
To address the shortcomings of existing risk policy techniques, the present invention provides a method, apparatus and electronic device for determining a limit-increase strategy based on user grouping. Grouping users by variable discrimination index lets the model choose a different assessment method for each user group, which can effectively improve the model's predictive power and identify the risk of each user group more precisely.
Brief description of the drawings
To make the technical problem solved by the present invention, the technical means adopted and the technical effects achieved clearer, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below show only exemplary embodiments of the present invention; those skilled in the art can derive drawings of other embodiments from them without creative effort.
Fig. 1 is a flowchart of the method of the present invention for determining a limit-increase strategy based on user grouping;
Fig. 2 is a schematic diagram of the method for determining a limit-increase strategy based on user grouping according to one embodiment of the present invention;
Fig. 3 is a module diagram of the apparatus for determining a limit-increase strategy based on user grouping according to the third embodiment of the present invention;
Fig. 4 is a module diagram of an apparatus for index-based user-grouped credit limit increase according to the fourth embodiment of the present invention;
Fig. 5 is a structural block diagram of an exemplary embodiment of an electronic device according to the present invention;
Fig. 6 is a schematic diagram of a computer-readable medium embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are now described more fully with reference to the accompanying drawings. Although each exemplary embodiment can be implemented in many specific ways, the invention should not be understood as limited to the embodiments set forth here. Rather, these exemplary embodiments are provided to make the disclosure more complete and to convey the inventive concept fully to those skilled in the art.
Provided the technical concept of the invention is respected, the structures, properties, effects or other features described in one specific embodiment may be combined in any suitable manner into one or more other embodiments.
In describing specific embodiments, structures, properties, effects or other features are detailed so that those skilled in the art can fully understand the embodiments. This does not exclude that, in particular cases, those skilled in the art may practice the invention with a technical solution that omits those structures, properties, effects or other features.
The flowcharts in the drawings are only illustrative. They do not mean that a solution of the invention must include every item, operation and step in the flowchart, nor that the steps must be performed in the order shown. For example, some operations/steps in a flowchart can be split and some can be merged or partly merged; without departing from the inventive concept, the execution order shown in a flowchart can be changed according to the actual situation.
The block diagrams in the drawings generally represent functional entities, which do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The same reference numerals in the drawings denote the same or similar elements, components or parts, so repeated descriptions of them may be omitted below. It should also be understood that although ordinal attributes such as first, second and third may be used here to describe various devices, elements, components or parts, these should not be limited by those attributes; the attributes serve only to distinguish one from another. For example, a first device could also be called a second device without departing from the essence of the technical solution of the invention. In addition, the term "and/or" covers all combinations of any one or more of the listed items.
Fig. 1 is a flowchart of the method of the present invention for determining a limit-increase strategy based on user grouping. As shown in Fig. 1, the method comprises the following steps.
S1: Determine a grouping rule for the historical financial user set.
The starting point of the invention is to group financial users and then, for each user group, build a risk control strategy that reflects the user characteristics revealed by that group's distinctive data. If a model is built without distinguishing user groups, its performance degrades sharply when the composition of the credit product's user population, or the proportion of each component, changes. This is because the most basic assumption in modeling is that the modeling sample represents the future user population. Only by grouping users, therefore, can the adverse effect of changes in user composition be prevented and the risk control strategy made stable.
We must therefore first determine the grouping rule. Choosing it may require weighing multiple factors, but once the rule is fixed it should be able to group, in a reasonable way, any new user who may appear. The grouping rule is applied first during model training, and it guarantees that a strategy model trained on a given population is later applied to the same population.
To determine the grouping rule as well as possible, the invention does not set it subjectively from human experience but derives it from an analysis of the data in a historical financial user set. Of course, so that the invention can handle the new financial users it is likely to encounter in practice, the historical financial user set should be selected with two considerations in mind: the users in the data set should be diverse, and they should resemble the user composition likely to be encountered in practice.
A grouping rule generally comprises the grouping variables and the classification rule applied to those variables. The grouping variable need not be a single variable; it can also be a combination of two variables. For example, the two variables "age" and "gender" can divide users into four classes: "male, 30 or younger", "female, 30 or younger", "male, older than 30" and "female, older than 30". This is the classification rule. It is only a simple illustration; in practice the rule, such as the age cut-off, must be determined by analyzing or computing on the historical financial user set.
Determining the grouping rule therefore means first determining the grouping variables and then determining the classification rule based on those variables. The invention proposes several schemes for determining the grouping rule.
One preferred embodiment calculates an importance index for each variable of the user data in the historical financial user set and determines the grouping variables, and the classification rule based on them, according to that importance index. Importance here means how much the variable (independent variable) matters to the result the model outputs (dependent variable). It can generally be measured by computing the variable's IV (information value), information gain, Gini coefficient and so on.
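As a hedged illustration of the IV measure mentioned above, the following sketch computes the information value of one already-binned variable against a binary label. The function name `information_value`, the smoothing constant 0.5 and the 0/1 label encoding are assumptions made for this example; the patent does not specify them.

```python
import math
from collections import defaultdict


def information_value(bins, labels, smoothing=0.5):
    """Information value (IV) of a binned variable against a 0/1 label.

    bins      -- bin identifier of each sample (any hashable value)
    labels    -- 1 for a "bad" (e.g. defaulted) user, 0 for a "good" user
    smoothing -- small count added to every cell to avoid log(0);
                 the value 0.5 is an illustrative assumption
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1

    keys = set(good) | set(bad)
    total_good = sum(good[k] + smoothing for k in keys)
    total_bad = sum(bad[k] + smoothing for k in keys)

    iv = 0.0
    for k in keys:
        p_good = (good[k] + smoothing) / total_good
        p_bad = (bad[k] + smoothing) / total_bad
        woe = math.log(p_good / p_bad)  # weight of evidence of this bin
        iv += (p_good - p_bad) * woe
    return iv
```

A variable whose bins all have the same good/bad mix gets an IV of zero, while a variable whose bins separate good and bad users gets a larger value, so candidate grouping variables can be ranked by this number.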
Another preferred embodiment calculates a distribution stability index for each variable of the user data in the historical financial user set and determines the grouping variables, and the classification rule based on them, according to that distribution stability index. Distribution stability means the degree to which the variable's sample distribution stays stable, being little affected by time, environment, application scenario and so on. For example, the stability of a variable can be measured with the PSI (Population Stability Index).
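A minimal sketch of a PSI calculation follows, assuming the variable has already been binned identically in a reference sample and a comparison sample. The function name, the `eps` floor and the use of raw per-bin counts as input are assumptions for illustration only.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between two binned distributions of the same variable.

    expected_counts -- per-bin counts in the reference (e.g. modeling) sample
    actual_counts   -- per-bin counts in the comparison (e.g. recent) sample
    eps -- floor on each proportion to avoid log(0); an illustrative assumption
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    total_e = sum(expected_counts)
    total_a = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p_e = max(e / total_e, eps)
        p_a = max(a / total_a, eps)
        psi += (p_a - p_e) * math.log(p_a / p_e)
    return psi
```

Identical distributions give a PSI of zero, and the value grows as the two distributions drift apart, so a variable with a low PSI across time periods is a more stable candidate for grouping.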
Yet another preferred embodiment calculates a discrimination index for each variable of the user data in the historical financial user set and determines the grouping variables, and the classification rule based on them, according to that discrimination index.
The discrimination of a variable here refers not only to how much the variable itself influences the model output but to how much different values of the independent variable change the dependent variable, that is, whether different values of the independent variable lead the model output (dependent variable) to take different values. Discrimination generally cannot be obtained directly from a single algorithm, so the invention preferably builds a machine self-learning classification model, determines through machine self-learning the variable with the highest discrimination and the classification rule based on that variable, and thereby directly obtains a reference for the grouping rule.
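The patent does not name a specific tree-building algorithm. As one hedged sketch of how a learned classifier can yield both a grouping variable and a cut-off, the code below searches for the single split (a one-level decision tree) that most reduces Gini impurity. The names `gini` and `best_split`, the dict-based row format and the choice of Gini impurity as the criterion are assumptions for this example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, variables):
    """Return (variable, threshold, impurity_decrease) of the best single split.

    rows      -- list of dicts mapping variable name to a numeric value
    labels    -- 0/1 dependent variable, one per row
    variables -- candidate variable names to test
    A split sends rows with value <= threshold to the left branch.
    """
    parent = gini(labels)
    n = len(rows)
    best = (None, None, 0.0)
    for var in variables:
        values = sorted({row[var] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[var] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[var] > threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent - weighted
            if decrease > best[2]:
                best = (var, threshold, decrease)
    return best
```

The returned variable and threshold can be read directly as a candidate grouping rule; applying the same search again inside each branch would give a second grouping variable.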
Yet another preferred embodiment calculates an influence index of the independent variables on the dependent variable for the user data in the historical financial user set and determines the grouping variables, and the classification rule based on them, according to that influence index.
Influence here does not simply mean importance; it refers to the degree of correlation between the dependent variable and the independent variable, and it differs from the importance index above. Importance mainly considers the variable's contribution to the output and is an absolute value, whereas influence mainly considers the association between the variable and the output and is a relative value.
The invention proposes using the Boruta algorithm to calculate the influence of the independent variables on the dependent variable. This is described in detail below.
S2: Divide the users in the historical financial user set into at least two user groups according to the grouping rule.
Once the grouping rule is determined, models can be built separately for the different user groups. Although the invention mainly considers limit-increase models, its principle can also be applied to other models. Different user groups need not use the same kind of model, but machine self-learning models such as xgboost or neural networks are preferred. Machine learning models usually need to be trained, so the historical financial user set must first be grouped according to the grouping rule determined in step S1, so that each model is trained for a specific group.
S3: For each user group, build a training data set, a test data set and a limit-increase model from the historical financial user data of that user group.
This step is carried out separately for each user group. Note that the training and test data sets are both drawn from the corresponding subset of the historical financial user set after grouping.
S4: Train and test the corresponding limit-increase model using the training data set and the test data set of each user group.
After the historical financial user set is grouped, a historical financial user subset is available for each user group, and modeling, training and testing can be carried out on it. Note that modeling, training and testing are also performed independently for each user group.
Building risk policy models on the basis of user grouping lets the model choose a different assessment method for each class of user group, which can effectively improve the model's predictive power, identify the risk of each user group more precisely and reduce the cost of financial credit evaluation, so that inclusive financial services can be offered to more people.
S5: For a new user, group the user according to the grouping rule and determine the limit-increase strategy using the limit-increase model of the user's group.
Once the models for all user groups have been trained and tested, the actual limit-increase strategy can be determined. A limit-increase model determines, on the one hand, whether the limit should be raised and, on the other, the specific amount of the increase. The invention places no specific restriction on the limit-increase model itself. For a new user, however, the invention requires that the user be grouped by the same grouping rule used when training the models and that the limit-increase model of the user's group be used to determine the limit-increase strategy.
Embodiment one
Fig. 2 is a schematic diagram of the method for determining a limit-increase strategy based on user grouping according to one embodiment of the present invention. As shown in Fig. 2, the purpose of this embodiment is to obtain a limit-increase risk control strategy. To obtain it, a grouping rule is first formulated, and the user data in the acquired historical financial user set is grouped into four user groups: user group one, user group two, user group three and user group four.
In this embodiment we build the grouping rule using the discrimination index. That is, we first calculate the discrimination index of each variable of the user data in the historical financial user set. To find the highly discriminating variables among many variables, we use a decision tree model. A decision tree model is a machine self-learning classification model, and the classification scheme it learns is used as the grouping rule.
A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class.
A classification tree (decision tree) is a form of supervised learning. In supervised learning a set of samples is given, each with a group of attributes and a class determined in advance, and a classifier is learned from them that can assign the correct class to newly seen objects. We build a decision tree model and train it with the data in the historical financial user data set. During training, label data related to the credit limit can be chosen as the dependent variable, or default information can be used as the dependent variable. After training on a large amount of data we obtain a classification tree. In this embodiment, the topmost split of the tree might be "whether age is greater than 27", or the two most important splits might be "whether age is greater than 27" and "gender". We thus obtain two grouping variables, "whether age is greater than 27" and "gender", which divide users into four classes:
1. male, age greater than 27;
2. female, age greater than 27;
3. male, age not greater than 27;
4. female, age not greater than 27.
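The four classes listed above can be expressed as a small grouping function. This is only a sketch: the function name `assign_group`, the string encoding of gender and the default threshold argument are assumptions for illustration, while the cut-off 27 and the numbering of the groups follow the list above.

```python
def assign_group(age, gender, age_threshold=27):
    """Map a user to one of the four groups listed above (1 to 4).

    gender is expected to be "male" or "female"; the string encoding
    is an assumption made for this example.
    """
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    older = age > age_threshold
    if older:
        return 1 if gender == "male" else 2
    return 3 if gender == "male" else 4
```

The same function would be used both to partition the historical data for training and to place each new user, which keeps training and application populations aligned.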
Then, for each user group, that is, for each subset of the historical financial user set corresponding to a user group, a certain amount of user data is extracted as the training set and the test set. For example, half of the data in each subset can be used for training and the other half for testing.
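A minimal sketch of such a per-group split is shown below. The function name `split_by_group`, the random shuffle and the fixed seed are assumptions for illustration; the text only requires that each group's data be divided into a training part and a test part, for example half and half.

```python
import random


def split_by_group(users, group_of, train_fraction=0.5, seed=0):
    """Split users into per-group training and test sets.

    users    -- list of user records
    group_of -- function mapping a user record to a group id
    Returns {group_id: (train_list, test_list)}.
    """
    by_group = {}
    for user in users:
        by_group.setdefault(group_of(user), []).append(user)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {}
    for group_id, members in by_group.items():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        splits[group_id] = (shuffled[:cut], shuffled[cut:])
    return splits
```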
Next comes the model building stage, in which a limit-increase model is built for each user group. When building the model, the data in the training and test sets must first be preprocessed so that each variable is standardized. Preprocessing involves:
1) Handling missing values
One approach is to delete every record that has missing values. This suits cases where the data sample is large and few records have missing values, so that deleting them has little overall effect. Another approach is to construct a new variable that flags missing values: 1 if the value is missing, 0 otherwise. This approach treats missingness itself as meaningful information that must be flagged rather than simply discarded. A third approach is to replace the missing value with another value. There are many choices for the replacement; for example, the mean can be used for numeric variables and the most frequent value (the mode) for categorical variables.
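The three approaches to missing values described above can be sketched as follows. Representing a missing value as `None`, and all four function names, are assumptions for this example; the imputation helpers also assume at least one value in the column is present.

```python
from collections import Counter


def drop_incomplete(records):
    """Keep only records (dicts) in which no field is missing."""
    return [r for r in records if all(v is not None for v in r.values())]


def add_missing_flag(values):
    """Return a 0/1 indicator list: 1 where the value is missing (None)."""
    return [1 if v is None else 0 for v in values]


def impute_mean(values):
    """Replace None in a numeric column with the mean of the present values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def impute_mode(values):
    """Replace None in a categorical column with the most frequent value."""
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```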
2) Re-encoding the values of categorical variables
The values of categorical variables are usually labels, typically stored as strings, so they need to be re-encoded as numbers.
3) Binning continuous variables and then re-encoding them in the same way as categorical variables
Binning can be custom, with user-defined bin boundaries. It can be equal-width, where the bin boundaries form an arithmetic progression; for example, age can be binned at intervals of 10 into the ranges 0-10, 10-20, 20-30, …. It can also be equal-depth, where each bin holds the same number of records or a specified proportion of them.
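The three binning modes can be sketched as below. The function names are hypothetical, and treating each bin as half-open (a value equal to a boundary goes into the higher bin) is an assumption, since the ranges in the text leave the edges ambiguous.

```python
def custom_bin(value, boundaries):
    """Index of the bin for value given ascending custom boundaries.

    Values below the first boundary fall in bin 0; each boundary reached
    (value >= boundary) moves the value one bin higher.
    """
    return sum(1 for b in boundaries if value >= b)


def equal_width_bin(value, width=10):
    """Index of the equal-width bin: [0, 10) -> 0, [10, 20) -> 1, and so on."""
    return int(value // width)


def equal_depth_bins(values, n_bins):
    """Assign each value a bin index so that bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins
```

Note that this simple equal-depth version assigns bins purely by rank, so tied values can end up in neighboring bins; a production version would need a rule for ties.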
4) Standardizing and normalizing continuous variables
Standardization and normalization are both ways of making variables dimensionless; their purpose is to bring data on different scales onto a common scale.
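The text does not say which formulas are used. As a sketch under that caveat, the code below shows two common choices: z-score standardization and min-max normalization. Both function names and the handling of constant columns (returning zeros) are assumptions for illustration.

```python
import math


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]  # constant column carries no scale
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no scale
    return [(v - lo) / (hi - lo) for v in values]
```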
In addition, some raw variables can be transformed to make them more interpretable in the model. These are derived variables built on the basic variables. For example, from the due date and the actual repayment date we can compute a variable indicating whether a repayment is overdue, as well as variables indicating whether the number of overdue days exceeds some threshold, such as whether it is more than 7 days overdue, more than 30 days overdue, and so on.
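A short sketch of deriving such overdue variables follows. The function name `overdue_features` and the feature names in the returned dict are hypothetical; the 7-day and 30-day thresholds come from the example in the text.

```python
from datetime import date


def overdue_features(due_date, paid_date, thresholds=(7, 30)):
    """Derive overdue indicators from a due date and an actual repayment date.

    Returns a dict with the overdue day count, an overall overdue flag,
    and one flag per threshold (days overdue strictly greater than it).
    """
    days = max((paid_date - due_date).days, 0)
    features = {"overdue_days": days, "is_overdue": int(days > 0)}
    for t in thresholds:
        features[f"overdue_over_{t}d"] = int(days > t)
    return features
```

For instance, a repayment due on 1 June 2019 and paid on 11 June 2019 is 10 days overdue, so the 7-day flag is set and the 30-day flag is not.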
The dependent variable should also be determined at this stage. For a limit-increase model or a credit-limit model, a limit-increase score can be obtained for the user and used to calculate the increase amount, so the dependent variable can be the user's limit-increase score.
Next, we build a model for each user group separately. The models built here are Logistic models and/or XGBoost models, and each model is trained with the training set of its own user group. After training is completed, each model is tested with the corresponding test set.
If a test does not reach the required standard, the model parameters need to be adjusted accordingly. When the test shows that the model's low discriminative power is caused by the segmentation rule, it is also necessary to return to the initial step of establishing the segmentation rule and readjust the selection of the segmentation variable and the classification rule based on that variable.
Finally, we obtain models for all user groups that have passed testing. These models can be used to determine the limit-increase strategy for new financial users. Unlike the prior art, however, when applying the limit-increase models we must first segment the new user according to the segmentation rule and then route the user to the appropriate limit-increase model according to the segmentation result.
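The routing step can be sketched as below. Everything here is a hypothetical stand-in: the single-variable rule in `segment_of`, the field names `months_on_book` and `on_time_ratio`, and the two lambda "models", which merely take the place of trained Logistic or XGBoost models.

```python
def segment_of(user):
    """Hypothetical segmentation rule: split on one variable at a fixed threshold."""
    return "long_tenure" if user["months_on_book"] >= 12 else "short_tenure"

# Hypothetical stand-ins for the trained per-segment models; each returns a score in [0, 1].
models = {
    "long_tenure": lambda u: min(1.0, u["on_time_ratio"]),
    "short_tenure": lambda u: min(1.0, 0.5 * u["on_time_ratio"]),
}

def limit_increase_score(user):
    """Segment the new user first, then score the user with that segment's model."""
    return models[segment_of(user)](user)
```

The point of the sketch is only the order of operations: the same segmentation rule used at training time is applied to the new user before any model is invoked.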
One way of making this adjustment is to calculate the K-S curve of each model and to adjust the segmentation rule when the K-S value does not meet the preset target.
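A minimal sketch of this check, assuming the K-S value is the maximum gap between the cumulative score distributions of the two classes. The target of 0.3 and the function names are hypothetical; the patent does not state a specific target.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of label 1 and label 0."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        cum_pos += label
        cum_neg += 1 - label
        # Evaluate the gap only after the last record of a run of tied scores.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best

def needs_resegmentation(scores, labels, target=0.3):
    """Flag a model whose K-S value falls short of the (hypothetical) preset target."""
    return ks_statistic(scores, labels) < target
```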
Embodiment two
Unlike Embodiment one, this embodiment uses a different segmentation rule. In this embodiment, an influence index of each independent variable of the user data in the historical financial user set on the dependent variable is calculated, and the segmentation variable and the segmentation rule are determined according to the influence index. Here the Boruta algorithm is used to calculate the influence of the independent variables on the dependent variable.
Boruta is a variable selection algorithm. More precisely, it is a wrapper algorithm built around random forest. As we know, feature selection is a critical step in predictive modeling, and it is especially important when building a model on a data set that contains many variables.
Calculating the influence of the independent variables on the dependent variable with the Boruta algorithm comprises: establishing a limit-increase model for the entire historical financial user set and training it with that set; ranking the importance of the independent variables with the Boruta algorithm; and selecting the independent variables with higher importance.
Ranking the importance of the independent variables with the Boruta algorithm comprises: creating shadow features of the independent variables and concatenating them to the feature matrix of the independent variables to form a new feature matrix; training the limit-increase model with the new feature matrix and calculating the importance scores of the shadow features and of the original independent-variable features; and taking the maximum importance score among the shadow features and recording one hit whenever the importance score of an independent-variable feature exceeds it. The cumulative hit count of each independent-variable feature is used as the importance index by which the independent variables are ranked.
The steps of the Boruta algorithm are as follows:
1. First, it adds randomness to the given data set by creating shuffled copies of all variables (the shadow variables).
2. Then, it trains a random forest classifier on the extended data set and applies a feature importance measure (mean decrease in accuracy by default) to evaluate the importance of each variable; a higher value means a more important variable.
3. In each iteration, it checks whether a real feature has higher importance than the best shadow feature (that is, whether the feature's score is higher than the maximum shadow feature score), and it progressively removes the features it deems highly unimportant.
4. Finally, the algorithm stops when all features have been confirmed or rejected, or when it reaches a specified limit on the number of random forest runs.
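The shadow-feature and hit-counting idea can be sketched in a few lines. This is not the Boruta algorithm itself: to stay self-contained, the sketch swaps the random forest importance for absolute Pearson correlation with the target, and it omits the confirm/reject statistical test. All names and the sample data are hypothetical.

```python
import random

def abs_corr(xs, ys):
    """Stand-in importance score: absolute Pearson correlation (NOT random forest importance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0

def boruta_hits(features, target, n_iter=50, seed=0):
    """Count how often each real feature beats the best shadow feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: a freshly shuffled copy of every real feature.
        shadow_scores = []
        for column in features.values():
            shadow = column[:]
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, target))
        threshold = max(shadow_scores)
        for name, column in features.items():
            if abs_corr(column, target) > threshold:
                hits[name] += 1
    return hits

def rank_by_hits(hits):
    """Rank features by cumulative hit count, highest first."""
    return sorted(hits, key=hits.get, reverse=True)

# Hypothetical data: one feature tied to the target, one essentially unrelated to it.
target = list(range(20))
features = {
    "signal": [2 * t for t in target],
    "noise": [3, 17, 8, 1, 12, 19, 5, 14, 0, 10, 16, 7, 2, 18, 9, 4, 13, 11, 6, 15],
}
hits = boruta_hits(features, target)
ranking = rank_by_hits(hits)
```

Because the stand-in score is computed one feature at a time, the sketch does not need to concatenate real and shadow features into one matrix; a model-based importance measure such as random forest would require that step.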
Boruta follows an all-relevant feature selection approach: it captures all features that are relevant to the outcome variable. In contrast, most traditional feature selection algorithms follow a minimal-optimal approach, relying on a small subset of features that yields the minimum error on a chosen classifier.
When a random forest model is fitted to a data set, the features that perform poorly in each iteration can be discarded recursively. This approach minimizes the error of the random forest model and ultimately yields a minimal-optimal feature subset. It does so by selecting an over-pruned version of the input data set, which in turn can lose some relevant features.
Those skilled in the art will understand that all or part of the steps of the above embodiments may be implemented as a program executed by a data processing device (including a computer), that is, a computer program. When the computer program is executed, the above method provided by the present invention can be realized. Moreover, the computer program may be stored in a computer-readable storage medium, which may be a readable storage medium such as a magnetic disk, an optical disc, ROM, or RAM, or a storage array composed of multiple storage media, such as a disk or tape storage array. The storage medium is not limited to centralized storage; it may also be distributed storage, such as cloud storage based on cloud computing.
Apparatus embodiments of the present invention are described below; the apparatus can be used to carry out the method embodiments of the present invention. Details described in the apparatus embodiments should be regarded as supplementing the method embodiments above; for details not disclosed in the apparatus embodiments, refer to the method embodiments above.
Embodiment three
Fig. 3 is a module diagram of the user-segmentation-based limit-increase strategy determination apparatus of the third embodiment of the present invention. As shown in Fig. 3, the apparatus includes a rule establishing module, a grouping module, a model building module, and a strategy determining module.
The rule establishing module is used to determine the segmentation rule for the historical financial user set.
Determining the segmentation rule requires first determining the segmentation variable and then determining the classification rule based on that variable. The present invention proposes several schemes for determining the segmentation rule.
One preferred scheme is to calculate an importance index for each variable of the user data in the historical financial user set, and to determine the segmentation variable and the classification rule based on that variable according to the importance index. Importance here means how much the variable (the independent variable) matters to the result output by the model (the dependent variable). In general, it can be measured by calculating the variable's IV value, information gain, Gini coefficient, and so on.
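As an illustration of one such measure, the following sketch computes the IV (information value) of an already binned variable against a 0/1 label, using the common definition as a sum over bins of (good share minus bad share) times the log of their ratio. Treating label 1 as "bad" and skipping bins with an empty class are simplifying assumptions; the function name is hypothetical.

```python
from math import log

def information_value(bins, labels):
    """IV = sum over bins of (good% - bad%) * ln(good% / bad%), with label 1 meaning 'bad'."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    iv = 0.0
    for b in set(bins):
        bad = sum(l for x, l in zip(bins, labels) if x == b)
        good = sum(1 - l for x, l in zip(bins, labels) if x == b)
        if good == 0 or bad == 0:
            # Simplification: skip bins with an empty class instead of smoothing.
            continue
        p_good, p_bad = good / total_good, bad / total_bad
        iv += (p_good - p_bad) * log(p_good / p_bad)
    return iv
```

A variable whose bins all have the same good-to-bad mix gets an IV of zero; the more the mix differs across bins, the larger the IV.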
Another preferred scheme is to calculate a distribution stability index for each variable of the user data in the historical financial user set, and to determine the segmentation variable and the classification rule based on that variable according to the distribution stability index. Distribution stability refers to the degree to which the distribution of the sample remains stable, that is, how little the variable is affected by time, environment, application scenario, and so on. For example, the stability of a variable can be measured with the PSI (Population Stability Index).
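A minimal sketch of the PSI calculation, using the common definition over per-bin record counts from a baseline sample and a comparison sample. It assumes every bin is non-empty in both samples; the function and parameter names are hypothetical.

```python
from math import log

def psi(expected_counts, actual_counts):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct, a_pct = e / e_total, a / a_total
        value += (a_pct - e_pct) * log(a_pct / e_pct)
    return value
```

Identical bin proportions give a PSI of zero, and the value grows as the two distributions drift apart.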
Yet another preferred scheme is to calculate a discrimination index for each variable of the user data in the historical financial user set, and to determine the segmentation variable and the classification rule based on that variable according to the discrimination index.
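One way to picture a discrimination index is a one-level decision tree: for each variable, find the single threshold that best separates the labels, and pick the variable with the largest impurity reduction. The sketch below assumes Gini impurity as the criterion, which the patent does not specify; the function names and sample data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Best threshold for one variable and the impurity reduction it achieves."""
    base, n = gini(labels), len(labels)
    best_gain, best_threshold = 0.0, None
    for t in sorted(set(values))[1:]:
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        gain = base - (len(left) * gini(left) + len(right) * gini(right)) / n
        if gain > best_gain:
            best_gain, best_threshold = gain, t
    return best_gain, best_threshold

def pick_segmentation_variable(data, labels):
    """Choose the variable whose single split separates the labels best."""
    scored = {name: best_split(column, labels) for name, column in data.items()}
    name = max(scored, key=lambda k: scored[k][0])
    return name, scored[name][1]

# Hypothetical data: `age` separates the labels cleanly, `city_code` barely at all.
labels = [0, 0, 0, 1, 1, 1]
data = {"age": [20, 25, 30, 35, 40, 45], "city_code": [1, 2, 1, 2, 1, 2]}
variable, threshold = pick_segmentation_variable(data, labels)
```

The chosen variable and threshold together form a simple segmentation rule, for example "age below 35" versus "age 35 or above" in this sample.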
Yet another preferred scheme is to calculate an influence index of each independent variable of the user data in the historical financial user set on the dependent variable, and to determine the segmentation variable and the classification rule based on that variable according to the influence index.
The grouping module is used to divide the users in the historical financial user set into at least two user groups according to the segmentation rule.
The grouping module first segments the historical financial user set using the segmentation rule determined by the rule establishing module, so that each model is trained for a specific group. Training is carried out separately for each user group; note that both the training data set and the test data set are drawn from the corresponding subset of the historical financial user set after segmentation.
The model building module is used to establish, for each user group, a training data set, a test data set, and a limit-increase model from the historical financial user data in that user group, and to train and test the corresponding limit-increase model with the training data set and test data set of each user group.
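Building the per-group data sets can be sketched as follows. The segmentation rule, the field `months_on_book`, the 30% test ratio, and the function names are all hypothetical.

```python
import random

def segment_of(user):
    """Hypothetical segmentation rule: split on one variable at a fixed threshold."""
    return "long_tenure" if user["months_on_book"] >= 12 else "short_tenure"

def split_by_group(users, segment_rule, test_ratio=0.3, seed=0):
    """Partition users by segment, then split each segment into train and test sets."""
    rng = random.Random(seed)
    groups = {}
    for user in users:
        groups.setdefault(segment_rule(user), []).append(user)
    datasets = {}
    for name, members in groups.items():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_ratio))
        datasets[name] = {"train": shuffled[n_test:], "test": shuffled[:n_test]}
    return datasets

# Hypothetical historical user set
users = [{"id": i, "months_on_book": i} for i in range(20)]
datasets = split_by_group(users, segment_of)
```

Each group's training and test sets come only from that group's own records, so a model is never tested on users from another segment.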
Building the risk strategy model on the basis of user group positioning lets the model choose a different evaluation method for each class of user group. This can effectively improve the model's predictive power, identify the risk of each user group more precisely, and reduce the cost of financial credit evaluation, so that inclusive financial services can be offered to more people.
The strategy determining module is used to segment a new user according to the segmentation rule and to determine the limit-increase strategy using the limit-increase model of the user group corresponding to the user.
Once the models for all user groups have been trained and tested, the actual limit-increase strategy can be determined. The limit-increase model is used, on the one hand, to decide whether the limit should be increased and, on the other hand, to determine the specific amount of the increase. The present invention places no specific limitation on the limit-increase model itself. However, for a new user, the present invention requires that the user also be segmented according to the segmentation rule used when the models were trained, and that the limit-increase strategy be determined using the limit-increase model of the user group corresponding to the user.
Embodiment four
Fig. 4 is a module diagram of the apparatus for index-based user-segmented limit increase of the fourth embodiment of the present invention. As shown in Fig. 4, the apparatus includes an index computing module, a rule determining module, a grouping modeling module, and a strategy determining module.
The index computing module is used to calculate an index for each variable of the user data in the historical financial user set. The index may be any of the indexes described above, including the importance index, the discrimination index, the stability index, the influence index, and so on.
The rule determining module is used to determine the segmentation variable and the segmentation rule according to the index. A variety of different segmentation rules can be established in the manner described above.
The grouping modeling module is used to segment the users in the historical financial user set according to the segmentation rule and to establish a separate limit-increase model for each segment. This module is also used to train and test each model.
The strategy determining module is used, for a new user, to segment the user according to the segmentation rule and to determine the limit-increase strategy using the limit-increase model of the user group corresponding to the user.
Those skilled in the art will understand that the modules in the above apparatus embodiments may be distributed in the apparatus as described, or may be changed accordingly and distributed in one or more apparatuses different from the above embodiments. The modules of the above embodiments may be combined into one module or further split into multiple submodules.
An electronic device embodiment of the present invention is described below; the electronic device can be regarded as a physical implementation of the method and apparatus embodiments above. Details described in the electronic device embodiment should be regarded as supplementing the method or apparatus embodiments above; for details not disclosed in the electronic device embodiment, refer to the method or apparatus embodiments above.
Fig. 5 is a structural block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the electronic device 510 of this exemplary embodiment takes the form of a general-purpose data processing device. The components of the electronic device 510 may include, but are not limited to: at least one processing unit 511, at least one storage unit 512, a bus 516 connecting the different system components (including the storage unit 512 and the processing unit 511), a display unit 513, and so on.
The storage unit 512 stores a computer-readable program, which may be source code or read-only program code. The program can be executed by the processing unit 511, so that the processing unit 511 performs the steps of the various embodiments of the present invention. For example, the processing unit 511 can perform the steps shown in Fig. 1.
The storage unit 512 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 5121 and/or a cache memory unit 5122, and may further include a read-only memory unit (ROM) 5123. The storage unit 512 may also include a program/utility 5124 having a set of (at least one) program modules 5125. Such program modules 5125 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 516 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The electronic device 510 may also communicate with one or more external devices 520 (such as a keyboard, a display, a network device, a Bluetooth device, etc.), enabling a user to interact with the electronic device 510 via these external devices 520, and/or enabling the electronic device 510 to communicate with one or more other data processing devices (such as a router, a modem, etc.). Such communication may take place through an input/output (I/O) interface 514, or through a network adapter 515 with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet). The network adapter 515 can communicate with the other modules of the electronic device 510 through the bus 516. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in the electronic device 510, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Fig. 6 is a schematic diagram of a computer-readable medium embodiment of the present invention. As shown in Fig. 6, the computer program may be stored on one or more computer-readable media. A computer-readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. When the computer program is executed by one or more data processing devices, the computer-readable medium enables the above method of the present invention to be realized.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described in the present invention may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes instructions that cause a data processing device (such as a personal computer, a server, or a network device) to perform the above method according to the present invention.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
The program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In summary, the present invention may be implemented as a method, an apparatus, an electronic device, or a computer-readable medium that executes a computer program. In practice, a general-purpose data processing device such as a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of the present invention.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the present invention is not inherently related to any particular computer, virtual device, or electronic device; various general-purpose devices may also implement the present invention. The above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, etc., made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for user-segmented credit limit increase based on a variable discrimination index, characterized by comprising:
calculating a discrimination index for each variable of the user data in a historical financial user set;
determining a segmentation variable and a segmentation rule according to the discrimination index;
segmenting the users in the historical financial user set according to the segmentation rule, and establishing a separate limit-increase model for each segment;
for a new user, segmenting the user according to the segmentation rule, and determining a limit-increase strategy using the limit-increase model of the user group corresponding to the user.
2. The method according to claim 1, characterized in that:
calculating the discrimination index for each variable of the user data in the historical financial user set comprises:
establishing a machine self-learning classification model, and determining through machine self-learning the variable with the highest discrimination and the segmentation rule based on that variable.
3. The method according to any one of claims 1-2, characterized in that:
the machine self-learning classification model is a decision tree model.
4. The method according to any one of claims 1-3, characterized in that determining through machine self-learning the variable with the highest discrimination and the segmentation rule based on that variable comprises:
using the classification scheme obtained by the decision tree model through self-learning as the segmentation rule.
5. The method according to any one of claims 1-4, characterized in that establishing a separate limit-increase model for each segment comprises:
for each user group, establishing a training data set, a test data set, and a limit-increase model from the historical financial user data in that user group.
6. The method according to any one of claims 1-5, characterized in that establishing a separate limit-increase model for each segment further comprises:
training and testing the corresponding limit-increase model with the training data set and the test data set of each user group.
7. The method according to any one of claims 1-6, characterized by further comprising:
calculating the K-S curve of each model, and adjusting the segmentation rule when the K-S value does not meet a preset target.
8. An apparatus for user-segmented credit limit increase based on a variable discrimination index, characterized by comprising:
an index computing module, for calculating a discrimination index for each variable of the user data in a historical financial user set;
a rule determining module, for determining a segmentation variable and a segmentation rule according to the discrimination index;
a grouping modeling module, for segmenting the users in the historical financial user set according to the segmentation rule and establishing a separate limit-increase model for each segment;
a strategy determining module, for segmenting a new user according to the segmentation rule and determining a limit-increase strategy using the limit-increase model of the user group corresponding to the user.
9. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions which, when executed, cause the processor to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs, and the method of any one of claims 1-7 is realized when the one or more programs are executed by a processor.
CN201910587759.7A 2019-07-02 2019-07-02 The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable discrimination index Pending CN110349007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910587759.7A CN110349007A (en) 2019-07-02 2019-07-02 The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable discrimination index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910587759.7A CN110349007A (en) 2019-07-02 2019-07-02 The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable discrimination index

Publications (1)

Publication Number Publication Date
CN110349007A true CN110349007A (en) 2019-10-18

Family

ID=68177912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910587759.7A Pending CN110349007A (en) 2019-07-02 2019-07-02 The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable discrimination index

Country Status (1)

Country Link
CN (1) CN110349007A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582646A (en) * 2020-04-09 2020-08-25 上海淇毓信息科技有限公司 User policy risk early warning method and device and electronic equipment
CN112017062A (en) * 2020-07-15 2020-12-01 北京淇瑀信息科技有限公司 Resource limit distribution method and device based on guest group subdivision and electronic equipment
CN112017062B (en) * 2020-07-15 2024-06-07 北京淇瑀信息科技有限公司 Resource quota distribution method and device based on guest group subdivision and electronic equipment
CN112308466A (en) * 2020-11-26 2021-02-02 东莞市盟大塑化科技有限公司 Enterprise qualification auditing method and device, computer equipment and storage medium
CN112819527A (en) * 2021-01-29 2021-05-18 百果园技术(新加坡)有限公司 User grouping processing method and device
CN112819527B (en) * 2021-01-29 2024-05-24 百果园技术(新加坡)有限公司 User grouping processing method and device
CN113724061A (en) * 2021-08-18 2021-11-30 杭州信雅达泛泰科技有限公司 Consumer financial product credit scoring method and device based on customer grouping

Similar Documents

Publication Publication Date Title
CN110349000A (en) Method, apparatus and electronic equipment are determined based on the volume strategy that mentions of tenant group
CN110415103A (en) The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable disturbance degree index
CN110349007A (en) The method, apparatus and electronic equipment that tenant group mentions volume are carried out based on variable discrimination index
CN108399509A (en) Determine the method and device of the risk probability of service request event
CN108648074A (en) Loan valuation method, apparatus based on support vector machines and equipment
KR20180041174A (en) Risk Assessment Methods and Systems
CN111199474A (en) Risk prediction method and device based on network diagram data of two parties and electronic equipment
CN110033284A (en) Source of houses verification method, apparatus, equipment and storage medium
CN111583017A (en) Risk strategy generation method and device based on guest group positioning and electronic equipment
CN113283795B (en) Data processing method and device based on two-classification model, medium and equipment
CN114219360A (en) Monitoring safety prediction method and system based on model optimization
CN113742492A (en) Insurance scheme generation method and device, electronic equipment and storage medium
CN110399473A (en) The method and apparatus for determining answer for customer problem
JP6856503B2 (en) Impression estimation model learning device, impression estimation device, impression estimation model learning method, impression estimation method, and program
CN111210332A (en) Method and device for generating post-loan management strategy and electronic equipment
CN110348995A (en) A kind of credit risk control method, apparatus and electronic equipment based on risk attribution
Sampath et al. A generalized decision support framework for large‐scale project portfolio decisions
US20210358044A1 (en) Analysis and visual presentation of dataset components
Yuping et al. New methods of customer segmentation and individual credit evaluation based on machine learning
CN116911994B (en) External trade risk early warning system
CN110033165A (en) The recommended method of overdue loaning bill collection mode, device, medium, electronic equipment
CN112508690A (en) Risk assessment method and device based on joint distribution adaptation and electronic equipment
CN113298121A (en) Message sending method and device based on multi-data source modeling and electronic equipment
CN111382909A (en) Rejection inference method based on survival analysis model expansion bad sample and related equipment
CN116611911A (en) Credit risk prediction method and device based on support vector machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 1118, No.4, Lane 800, Tongpu Road, Putuo District, Shanghai 200062

Applicant after: SHANGHAI QIYU INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 201500 room a1-5962, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai (Shanghai Hengtai Economic Development Zone)

Applicant before: SHANGHAI QIYU INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China
