WO2018036402A1 - Method and apparatus for detecting key variables in a model - Google Patents

Method and apparatus for detecting key variables in a model

Info

Publication number
WO2018036402A1
Authority
WO
WIPO (PCT)
Prior art keywords
variable
credit
variables
target sample
target
Prior art date
Application number
PCT/CN2017/097434
Other languages
English (en)
French (fr)
Inventor
席炎
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to SG11201901614SA
Publication of WO2018036402A1
Priority to US16/283,381 (US20190220924A1)
Priority to PH12019500406A (PH12019500406A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Definitions

  • the present application relates to the field of computer applications, and in particular, to a method and apparatus for detecting key variables in a model.
  • a large amount of service data from users can be collected as modeling samples in a certain business scenario, and a business model is then constructed by training on the modeling samples with a statistical model or a machine learning method. After the business model is built, business data can be input into the business model, and corresponding business predictions are made in the business scenario according to the output of the business model.
  • the present application proposes a method for detecting key variables in a model, the method comprising:
  • the target sample comprising a number of variables
  • a key variable having the highest degree of influence on the first result is determined based on the difference between each second result in the second result set and the first result.
  • the application also proposes a credit promotion guidance method, the method comprising:
  • the target sample includes a plurality of variables
  • the physical meaning corresponding to the key variable with the highest degree of influence is output as a credit promotion guide to the user corresponding to the target sample.
  • the present application also proposes a detection device for evaluating key variables in a model, the device comprising:
  • a first input module configured to input a target sample into the model to obtain a first result;
  • the target sample includes a plurality of variables;
  • a first replacement module configured to replace, in sequence, a value of a variable in the target sample with a detection threshold corresponding to the variable
  • a second input module configured to input the target samples in which the values of the variables are sequentially replaced into the model to obtain a second result set
  • a first determining module configured to determine a key variable having the highest degree of influence on the first result based on the difference between each second result in the second result set and the first result.
  • the application also provides a credit promotion guidance device, the device comprising:
  • a third input module configured to input a target sample into a credit evaluation model to obtain a first credit score; the target sample includes a plurality of variables;
  • a second replacement module configured to replace, in sequence, a value of a variable in the target sample with a detection threshold corresponding to the variable
  • a fourth input module configured to input the target samples in which the values of the variables are sequentially replaced into the credit evaluation model to obtain a second credit score set
  • a second determining module configured to determine a key variable having the highest impact on the first credit score based on the difference between each second credit score in the second credit score set and the first credit score;
  • an output module configured to output the physical meaning corresponding to the key variable with the highest degree of influence as a credit promotion guide to the user corresponding to the target sample.
  • in the present application, a first result is obtained by inputting the target sample into the model; the value of each variable in the target sample is replaced, in sequence, with the detection threshold corresponding to that variable, and the target samples with replaced values are input into the model separately to obtain a second result set; a key variable having the highest degree of influence on the first result is then determined based on the difference between each second result in the second result set and the first result. Because each second result, obtained after a variable's value is replaced, is compared with the actual first result obtained by the target sample, the key variable with the highest degree of influence on the first result can be determined without an in-depth understanding of the model's algorithm;
  • when the model is a credit evaluation model, the difference between each second credit score and the first credit score can be used to determine the key variable that has the highest impact on the user's credit score, without an in-depth understanding of the model's algorithm, which reduces the complexity of detecting the variable with the greatest influence on the credit score;
  • the physical meaning corresponding to the key variable is output as a credit promotion guide to the user corresponding to the target sample, so that the user can intuitively understand the way to improve the credit, thereby improving the user experience.
  • FIG. 1 is a flowchart of a method for detecting a key variable in a model according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for guiding credit promotion according to an embodiment of the present application
  • FIG. 3 is a process flow diagram of an output credit promotion guide in a credit evaluation model according to an embodiment of the present application
  • FIG. 4 is a logic block diagram of a detecting device for a key variable in a model according to an embodiment of the present application
  • FIG. 5 is a hardware structural diagram of a server that carries the device for detecting key variables in a model according to an embodiment of the present application
  • FIG. 6 is a logic block diagram of a credit score promotion guidance device according to an embodiment of the present application.
  • FIG. 7 is a hardware structural diagram of a server that carries the credit score promotion guidance device according to an embodiment of the present application.
  • the business risk model is an evaluation model used to evaluate business risks.
  • a large amount of service data can be collected as modeling samples in a certain business scenario; the modeling samples are classified according to whether they contain a predefined business risk event, and a business risk model is then built by training on the modeling samples with a statistical model or a machine learning method.
  • the collected business data can be input into the business risk model as a target sample for risk assessment, to predict the probability of such a business risk event occurring in a future period of time; the probability is then converted into a corresponding business risk score that reflects the risk level of the business.
  • the credit information company takes the user's business data as a target sample, inputs it into the model for credit evaluation, and outputs the user's credit score. Users usually have a strong desire to improve their credit score; the credit information company therefore needs to know which variable in the user's business data has the highest impact on the final credit score, that is, which variable is lowering the user's credit score, so that a credit promotion guide targeting the user's credit weakness can be output to the user.
  • a specific credit promotion guidance algorithm can be designed by delving into the modeling algorithm of the evaluation model; this algorithm is used to examine the user's target sample and find the key variable with the highest impact on the credit score, and the business behavior corresponding to the key variable is then output to the user as a credit promotion guide.
  • the design of the above detection algorithm generally requires an in-depth understanding of the modeling algorithm of the evaluation model.
  • for traditional modeling algorithms such as logistic regression and decision trees, the models built on them have a simple structure and a high degree of interpretability, so designing a detection algorithm that delves into these algorithms usually causes no difficulty.
  • the present application obtains a first result by inputting a target sample into a model; the value of each variable in the target sample is replaced, in sequence, with the detection threshold corresponding to that variable, and the target samples with replaced values are respectively input into the model to obtain a second result set; a key variable having the highest degree of influence on the first result is then determined based on the difference between each second result in the second result set and the first result. The key variable can thus be determined by comparing the second result, obtained in the model after a variable's value is replaced, with the first result actually obtained by the target sample.
  • based on the difference between the credit scores, it is possible to determine the key variable that has the highest impact on the user's credit score without an in-depth understanding of the model's algorithm, thereby reducing the complexity of detecting the variable with the highest impact on the credit score;
  • the user can intuitively understand the way to improve the credit, thereby improving the user experience.
  • FIG. 1 shows a method for detecting a key variable in a model according to an embodiment of the present application. The method is applied to a server and performs the following steps:
  • Step 101 Input a target sample into a model to obtain a first result; the target sample includes a plurality of variables;
  • Step 102 Replace, in sequence, the value of each variable in the target sample with the detection threshold corresponding to that variable.
  • Step 103 Input the target samples whose variable values have been sequentially replaced into the model to obtain a second result set.
  • Step 104 Determine a key variable having the highest degree of influence on the first result based on the difference between each second result in the second result set and the first result.
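Steps 101 to 104 can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the model is treated as an opaque callable, and the names `detect_key_variable` and `toy_model`, the variable names, and all numeric values are assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

Sample = Dict[str, float]


def detect_key_variable(
    model: Callable[[Sample], float],
    sample: Sample,
    thresholds: Sample,
) -> Tuple[str, Dict[str, float]]:
    """Return the variable whose replacement changes the result the most."""
    first_result = model(sample)                    # Step 101
    differences: Dict[str, float] = {}
    for name in sample:                             # Steps 102 and 103
        replaced = dict(sample)                     # copy, so only one value changes
        replaced[name] = thresholds[name]
        differences[name] = model(replaced) - first_result
    key_variable = max(differences, key=differences.get)   # Step 104
    return key_variable, differences


# Hypothetical stand-in for a trained model; it is only ever called, never inspected.
def toy_model(s: Sample) -> float:
    return 600 + s["income"] / 100 - 40 * s["defaults"]


sample = {"income": 5000.0, "defaults": 3.0}
thresholds = {"income": 6000.0, "defaults": 1.0}
key, diffs = detect_key_variable(toy_model, sample, thresholds)
```

The function needs only the ability to call the model, which matches the text's point that no knowledge of the modeling algorithm is required.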
  • the above server may include a server for training and using a business model, a server cluster, or a cloud platform built based on a server cluster.
  • the above model may include a mathematical model that is obtained by training on a large number of collected modeling samples with a preset modeling algorithm and is used for business prediction.
  • the above business model may be an evaluation model, by which the user's business risk may be scored for a certain period of time, and the score result may be output.
  • when training the business model, the above-mentioned server can adopt modeling methods such as scorecards, regression analysis, or neural networks and, with the help of mature data mining tools such as SAS (Statistical Analysis System) and SPSS (Statistical Product and Service Solutions), build the above business model by training on a large number of collected modeling samples.
  • the server may collect a target sample of the target user.
  • the target sample may include a plurality of business variables, and these business variables may in turn include a plurality of behavior variables.
  • the target sample and the variables included in the modeling sample may be variables that affect the business, and the variables may also include business variables corresponding to the user's business behavior.
  • the number of behavior variables included in the above target samples and modeling samples can be customized based on actual needs.
  • the variables in the above target samples may all be defined as behavior variables.
  • the target sample may be input into the trained evaluation model to perform business prediction, and a first result corresponding to the target sample is obtained.
  • the server may replace, in sequence, the value of each variable included in the target sample with the detection threshold corresponding to that variable, and then input each target sample whose value has been replaced into the business model for business prediction.
  • the above detection threshold may be a threshold that characterizes the overall level, within the target user population, of the value of a variable contained in the collected target sample.
  • each variable included in the target sample may correspond to its own detection threshold, which is used to replace the value of that variable.
  • the target user population may be defined as all the people who carry out the business corresponding to the target sample, or as a specific business group to which the target user corresponding to the target sample belongs; this is not particularly limited in this example.
  • the detection threshold may be defined as any one of the average, median, or mode of the corresponding business variable in the target user population.
  • the average, median and mode are the basic statistical concepts.
  • the average is the sum of all sample values divided by the number of samples.
  • the median is the middle value when all sample values are sorted by size, or the average of the two middle values when the number of samples is even.
  • the mode is the value that occurs most often among all sample values.
  • when the mode is used as the above-mentioned detection threshold, there may be more than one mode; in this case, the average of the modes, or any one of them, may be used as the detection threshold.
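The three candidate thresholds, including the multiple-mode case, might be computed as in the sketch below. It uses Python's standard `statistics` module; the function name `detection_threshold`, the `kind` parameter, and the sample values are assumptions for illustration, and averaging the modes is just one of the two options the text allows.

```python
from statistics import mean, median, multimode
from typing import Sequence


def detection_threshold(values: Sequence[float], kind: str = "mean") -> float:
    """Compute a detection threshold for one variable from population values."""
    if kind == "mean":
        return mean(values)        # sum of the values divided by their count
    if kind == "median":
        return median(values)      # middle value, or average of the two middle values
    if kind == "mode":
        modes = multimode(values)  # every value tied for the highest frequency
        return mean(modes)         # several modes: average them (picking one also works)
    raise ValueError(f"unknown threshold kind: {kind}")


# Hypothetical values of one variable across the target user population.
population_values = [2, 3, 3, 5, 5, 8]
median_threshold = detection_threshold(population_values, "median")
mode_threshold = detection_threshold(population_values, "mode")
```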
  • when the values of the variables in the target sample are sequentially replaced with the corresponding detection thresholds, in general, the variables can be replaced one at a time, each with its own detection threshold.
  • the target sample may include a plurality of behavior variables corresponding to the same behavior; in this case, the values of those behavior variables may be replaced together with their respective detection thresholds.
  • the plurality of target samples obtained by sequentially replacing the variables may be input into the above business model for business prediction to obtain a second result set.
  • the server may further save the correspondence between each variable whose value is replaced and the result obtained when the target sample with that value replaced is input into the above business model.
  • subsequently, the server can locate the business variable whose value was replaced by querying this correspondence with any second result in the second result set.
  • the server may compare the first result with each second result in the second result set, calculate the difference between the first result and each second result, and then determine, based on the calculated differences, the key variable having the highest degree of influence on the first result.
  • the server may calculate the difference obtained by subtracting the first result from each second result in the second result set.
  • the second result in the second result set that has the largest difference from the first result may be determined as the key result.
  • the server may use the key result as a query index and query the previously saved correspondence to determine the variable whose value was replaced to produce the key result.
  • the variable found in this way is the key variable that has the highest impact on the first result.
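The saved correspondence and the lookup of the key result could look like the sketch below. Storing the correspondence as a list of (variable, second result) pairs is an assumption made here so that two variables producing the same second result do not collide; the function name and the numbers are likewise illustrative.

```python
from typing import List, Tuple


def find_key_variable(
    first_result: float,
    correspondence: List[Tuple[str, float]],
) -> Tuple[str, float]:
    """Pick the pair whose second result exceeds the first result by the most."""
    key_variable, key_result = max(
        correspondence,
        key=lambda pair: pair[1] - first_result,   # second result minus first result
    )
    return key_variable, key_result


# Hypothetical correspondence saved while the second result set was produced.
correspondence = [("income", 540.0), ("defaults", 610.0), ("employment", 525.0)]
key_variable, key_result = find_key_variable(530.0, correspondence)
```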
  • in this way, by comparing the result obtained in the model after a variable's value is replaced with the actual result obtained when no value is replaced, the key business variable that has the highest impact on the first result can be determined quickly and easily, without a deep understanding of the model's algorithm, thereby reducing the complexity of determining the variable with the highest impact on the first result.
  • the foregoing business model may be a credit evaluation model.
  • the following description takes the case where the above business model is a credit evaluation model as an example.
  • FIG. 2 shows a method for guiding credit promotion according to an embodiment of the present application. The method is applied to a server and performs the following steps:
  • Step 201 Enter a target sample into a credit evaluation model to obtain a first credit score; the target sample includes a plurality of variables;
  • the above server may include a server or server cluster for training and using a credit evaluation model, or a cloud platform built on a server cluster.
  • the above credit evaluation model may include a mathematical model that is obtained by training on a large number of collected modeling samples with a preset modeling algorithm and is used for credit evaluation.
  • the credit evaluation model may be a credit risk assessment model, by which the user's credit risk may be scored and the score result may be output.
  • the credit score is a credit score obtained by performing credit evaluation on the collected target sample by the credit evaluation model, and the credit score is used to measure the credit risk of the user in a future period of time.
  • the credit evaluation model can perform credit risk assessment on business data collected from a specific credit business scenario and obtain a corresponding credit score, which is used to measure the user's credit risk in a future period of time.
  • when training the credit evaluation model, the above-mentioned server may use modeling methods such as scorecards, regression analysis, or neural networks and, with the help of mature data mining tools such as SAS (Statistical Analysis System) and SPSS (Statistical Product and Service Solutions), build the above credit evaluation model by training on a large number of collected modeling samples.
  • the server may collect a target sample of the target user.
  • the target user is the user who needs to conduct a credit risk assessment.
  • the above modeling samples and the foregoing target samples may all include business data collected from a specific business scenario.
  • the business data used as modeling samples serves to train the model, and the business data used as a target sample serves to evaluate the credit risk of the target user.
  • the target sample may include a plurality of variables that may affect the user's credit risk, and these variables may include several behavior variables.
  • the variables included in the above target sample and modeling samples may be variables that affect credit risk; for example, they may include the user's income consumption data, historical credit data, default data, employment status, and other variables that affect credit risk.
  • income consumption data, historical credit data, and default data correspond to the user's consumption behavior, credit behavior, and default behavior, respectively. Therefore, income consumption data, historical credit data, and default data can be referred to as behavior variables in the target sample.
  • the number of behavior variables included in the above target samples and modeling samples can be customized based on actual needs.
  • the variables in the above target samples may all be defined as behavior variables.
  • the target sample may be input into the trained credit evaluation model for risk assessment, and a first credit score corresponding to the target sample is obtained.
  • Step 202 Replace, in sequence, the value of each variable in the target sample with the detection threshold corresponding to that variable.
  • Step 203 Input the target samples whose variable values have been sequentially replaced into the credit evaluation model to obtain a second credit score set.
  • in order to detect the variable in the target sample that has the highest degree of influence on the first credit score, the server may replace, in sequence, the value of each variable included in the target sample with the detection threshold corresponding to that variable, and then input each target sample whose value has been replaced into the credit evaluation model for credit risk assessment.
  • the above detection threshold may be a threshold that characterizes the overall level, within the target user population, of the value of a variable contained in the collected target sample.
  • each variable included in the target sample may correspond to its own detection threshold, which is used to replace the value of that variable.
  • the target user population may be defined as all the people who carry out the business corresponding to the target sample, or as a specific business group to which the target user corresponding to the target sample belongs; this is not particularly limited in this example.
  • the detection threshold may be defined as any one of the average, median, or mode of the corresponding business variable in the target user population.
  • to compute these statistics, the values of the variable for all users in the target user population can usually be collected as value samples and then used in the calculation.
  • for example, the target sample may include business variables such as income consumption data, historical credit data, default data, and the employment status of the user.
  • the income consumption data of all users in the target user population can be collected as value samples, and the average, median, or mode of the collected income consumption data is then calculated.
  • the average is the sum of all sample values divided by the number of samples.
  • the median is the middle value when all sample values are sorted by size, or the average of the two middle values when the number of samples is even.
  • the mode is the value that occurs most often among all sample values.
  • the detection threshold may be set directly to any one of the average, median, or mode of the variable's values in the target user population. In this way, only a simple statistical analysis of the values of each variable in the target user population is needed to set a detection threshold for each variable in the target sample.
  • when the mode is used as the above-mentioned detection threshold, there may be more than one mode; in this case, the average of the modes, or any one of them, may be used as the detection threshold.
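Setting a threshold for every variable from population data can be sketched as below. The record layout (one dictionary per user), the function name `thresholds_from_population`, and the figures are assumptions for illustration; the median is used here, but `statistics.mean` could be passed instead.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence


def thresholds_from_population(
    population: List[Dict[str, float]],
    statistic: Callable[[Sequence[float]], float] = median,
) -> Dict[str, float]:
    """Apply one statistic to each variable's values across all users."""
    names = population[0].keys()
    return {
        name: statistic([user[name] for user in population])
        for name in names
    }


# Hypothetical business data for three users in the target user population.
population = [
    {"income": 4000, "defaults": 0},
    {"income": 6000, "defaults": 1},
    {"income": 8000, "defaults": 5},
]
population_thresholds = thresholds_from_population(population)
```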
  • the detection threshold may also be defined as a threshold, obtained by statistically analyzing the values of the variable in the target user population with a specific statistical analysis algorithm, that characterizes the overall level of the variable's values in the target user population.
  • when any one of the average, median, or mode in the target user population is defined as the above detection threshold, it usually cannot accurately reflect the overall level of the business variable's values in the target user population.
  • the values of the variable for all users in the target user population may therefore also be collected as value samples and statistically analyzed with a specific statistical analysis algorithm to obtain a threshold that characterizes the overall level of the variable's values in the target user population; the obtained threshold is then defined as the detection threshold.
  • the statistical analysis algorithm used when performing statistical analysis on the above-mentioned sample of values, may be the same as or different from the algorithm used to construct the above-mentioned evaluation model.
  • for example, algorithms such as regression analysis can be applied to the above value samples with mature data mining tools such as SAS or SPSS to obtain the distribution of all the sample values, and a threshold that characterizes the overall level of the variable's values in the target user population is then determined from that distribution. The specific statistical analysis process is not detailed in this example; those skilled in the art can refer to the related art when putting it into practice.
  • the detection threshold may be separately defined for the business variables in the target sample by other mathematical quantization methods.
  • whichever method is used, the detection threshold defined for a business variable in the target sample only needs to characterize the overall level of that variable's values in the target user population; the possible definitions are not enumerated one by one in this example.
  • when the values of the variables in the target sample are sequentially replaced with the corresponding detection thresholds, in general, the variables can be replaced one at a time, each with its own detection threshold.
  • for example, suppose the target sample contains three variables V1, V2, and V3, and the detection thresholds corresponding to V1, V2, and V3 are V1-t, V2-t, and V3-t, respectively.
  • first, V1-t is used to replace the value of variable V1 to obtain a target sample consisting of V1-t, V2, and V3.
  • next, the value of V2 is replaced by V2-t to obtain a target sample consisting of V1, V2-t, and V3.
  • finally, V3-t is used to replace the value of variable V3 to obtain a target sample consisting of V1, V2, and V3-t.
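The V1, V2, V3 example can be reproduced with a short sketch. The numeric values standing in for V1, V2, V3 and for V1-t, V2-t, V3-t are made up, and `sequential_replacements` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def sequential_replacements(
    sample: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[Tuple[str, Dict[str, float]]]:
    """Build one copy of the sample per variable, with only that variable replaced."""
    variants = []
    for name in sample:
        replaced = dict(sample)            # leave the original sample untouched
        replaced[name] = thresholds[name]
        variants.append((name, replaced))  # remember which variable was replaced
    return variants


v_sample = {"V1": 1, "V2": 2, "V3": 3}
v_thresholds = {"V1": 10, "V2": 20, "V3": 30}   # stand-ins for V1-t, V2-t, V3-t
variants = sequential_replacements(v_sample, v_thresholds)
```

Keeping the replaced variable's name next to each variant gives the correspondence that is later used to map a score back to a variable.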
  • the above target samples may contain multiple behavior variables corresponding to the same behavior.
  • for example, suppose the target sample includes variables such as "default amount", "number of defaults", and "income consumption data".
  • the variable "income consumption data" corresponds to the user's consumption behavior.
  • the variables "default amount" and "number of defaults" both correspond to the user's default behavior, so they are behavior variables in the target sample that correspond to the same behavior.
  • in this case, the values of the multiple behavior variables may be replaced together with the detection thresholds corresponding to those behavior variables.
  • for example, suppose the target sample contains three variables V1, V2, and V3, and the detection thresholds corresponding to V1, V2, and V3 are V1-t, V2-t, and V3-t, respectively.
  • suppose further that V2 and V3 correspond to the same behavior.
  • V2-t and V3-t are then used to replace the values of variables V2 and V3, respectively, in a single replacement, to obtain a target sample composed of V1, V2-t, and V3-t.
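Replacing all variables of one behavior together might be sketched as follows. The `groups` mapping from a behavior to its variables, the behavior labels, and the numeric values are assumptions introduced for this example.

```python
from typing import Dict, List, Tuple


def grouped_replacements(
    sample: Dict[str, float],
    thresholds: Dict[str, float],
    groups: Dict[str, List[str]],
) -> List[Tuple[str, Dict[str, float]]]:
    """Build one copy of the sample per behavior, replacing all of its variables at once."""
    variants = []
    for behavior, names in groups.items():
        replaced = dict(sample)
        for name in names:                 # every variable tied to this behavior
            replaced[name] = thresholds[name]
        variants.append((behavior, replaced))
    return variants


g_sample = {"V1": 1, "V2": 2, "V3": 3}
g_thresholds = {"V1": 10, "V2": 20, "V3": 30}   # stand-ins for V1-t, V2-t, V3-t
groups = {"consumption": ["V1"], "default": ["V2", "V3"]}   # V2 and V3 share a behavior
grouped = grouped_replacements(g_sample, g_thresholds, groups)
```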
  • the plurality of target samples obtained by sequentially replacing the variables may be input into the above credit evaluation model for credit risk assessment to obtain a second credit score set.
  • for example, suppose the target sample contains three variables V1, V2, and V3, and the detection thresholds corresponding to V1, V2, and V3 are V1-t, V2-t, and V3-t, respectively.
  • after sequential replacement, three target samples are obtained: one consisting of V1-t, V2, and V3; one consisting of V1, V2-t, and V3; and one consisting of V1, V2, and V3-t.
  • the credit risk assessment is performed by inputting the above three target samples into the credit evaluation model to obtain a credit score set. In this case, the score set includes three credit scores.
  • the server may also save the correspondence between each variable whose value is replaced and the credit score obtained when the target sample with that value replaced is input into the above evaluation model.
  • subsequently, the server can locate the business variable whose value was replaced by querying this correspondence with any credit score in the second credit score set.
  • Step 204: Determine, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score.
  • When detecting the key variable with the highest degree of influence on the first credit score, the server may compare the first credit score numerically with each credit score in the second credit score set, calculate the difference between the first credit score and each of those scores, and then determine the key variable from the calculated differences.
  • In one embodiment, the server may calculate the difference obtained by subtracting the first credit score from each credit score in the second credit score set; the calculated difference may be greater than 0 or less than 0.
  • If the difference is greater than 0, then after the value of some variable in the target sample was replaced with its detection threshold, the credit score obtained from the model is greater than the score the target sample obtained without the replacement.
  • In this case, the increase in the credit score may be due to the variable whose value was replaced.
  • If the difference is less than 0, the credit score obtained from the model after the replacement is smaller than the score the target sample obtained without the replacement. In this case, the variable whose value was replaced may have lowered the credit score.
  • A credit score is usually inversely related to the risk level: the higher the credit score, the lower the corresponding risk.
  • Therefore, the credit score in the second credit score set with the largest difference from the first credit score may be determined as the key credit score.
  • After the key credit score is determined, the server may use it as a query index into the previously saved correspondence to find the variable whose value was replaced.
  • The variable that corresponds to the key credit score is the key variable finally detected as having the highest degree of influence on the first credit score.
  • For example, if the target sample with a given variable replaced yields the credit score with the largest difference from the first credit score, then replacing that variable's value with its overall level in the target user population
  • increases the resulting credit score significantly, and reduces the risk significantly, compared with replacing any of the other variables.
  • When the variable is not replaced, the user's risk is relatively high precisely because that variable lowers the first credit score, which indicates that the target user's performance on the variable is below the overall level of the target user population. In this scenario it is therefore reasonable to determine this variable as the key business variable.
  • Step 205: Output the physical meaning corresponding to the key variable with the highest degree of influence as a credit promotion guide to the user corresponding to the target sample.
  • Once the key variable is determined, the physical meaning corresponding to it may be output as a credit promotion guide to the target user corresponding to the target sample.
  • In one embodiment, the physical meaning corresponding to the key variable may be the user behavior corresponding to that variable.
  • The server may further determine whether the key variable is a behavior variable; if it is, the server may output the behavior corresponding to the key variable
  • as a behavior guide to the target user corresponding to the target sample.
  • From the behavior guide, the target user can learn which behavior may have raised their risk and lowered their credit score. The target user can then reduce their risk and improve their credit score by improving that behavior.
  • For example, in a credit business scenario, suppose the key variable is the number-of-defaults variable in the target sample,
  • so that the business behavior corresponding to the key variable is default behavior.
  • The system can then output a credit promotion guide such as "avoid an excessive number of defaults to improve your credit score".
  • After viewing the credit promotion guide output by the system, a user with a low credit score can pay closer attention to their contract performance, repay on time as far as possible, and reduce default records in order to improve their credit score.
  • If every credit score in the second credit score set minus the first credit score is less than 0, the credit promotion guide need not be output; instead, a preset prompt message may be output to the target user, indicating that the target user's credit risk is controllable. For example, when the credit score is obtained from a credit risk assessment model, the prompt message may be "your credit record is good".
  • If the credit score defined by the evaluation model is instead directly related to the risk level, that is, the higher the credit score, the higher the corresponding risk, the key variable with the highest degree of influence on the first credit score is determined
  • by reversing the implementation process shown above.
  • In this case, the difference obtained by subtracting each credit score in the second credit score set from the first credit score may be calculated,
  • the credit score with the largest such difference is determined as the key credit score, and the key variable with the highest degree of influence on the first credit score is then determined by looking up the correspondence described above.
  • FIG. 3 is a flowchart of a process for outputting a credit promotion guide in a credit evaluation model according to this example.
  • As shown in FIG. 3, the credit risk assessment model is a model containing three business variables V1, V2, and V3, all of which are behavior variables; the detection thresholds corresponding to V1, V2, and V3 are V1-t, V2-t, and V3-t, respectively.
  • V1-t, V2-t, and V3-t are the average values of V1, V2, and V3 in the target user population, respectively
  • (FIG. 3 shows a mean function computing the average values of V1, V2, and V3 in the target user population to obtain V1-t, V2-t, and V3-t).
  • In the initial state, after collecting the target user's target sample, the server may input it into the model for credit evaluation to obtain a credit score, recorded as Score1.
  • To determine the key business variable with the highest degree of influence on Score1, the values of V1, V2, and V3 may be replaced in turn with their corresponding detection thresholds.
  • First, the value of business variable V1 is replaced with V1-t to obtain a target sample consisting of V1-t, V2, and V3; V2 and V3 are then replaced in the same way.
  • After the replacements, the three target samples (V1-t, V2, V3; V1, V2-t, V3; and V1, V2, V3-t) are each input into the model for credit risk assessment to obtain credit scores. In this example, the higher the credit score, the higher the target user's credit rating and the lower the default probability.
  • The credit score that the target sample consisting of V1-t, V2, and V3 obtains from the model is recorded as Score_V1.
  • The server can save the correspondence between V1 and Score_V1.
  • The credit score that the target sample consisting of V1, V2-t, and V3 obtains from the model is recorded as Score_V2.
  • The server can save the correspondence between V2 and Score_V2.
  • The credit score that the target sample consisting of V1, V2, and V3-t obtains from the model is recorded as Score_V3.
  • The server can save the correspondence between V3 and Score_V3.
  • When outputting the credit promotion guide, the server may calculate the differences obtained by subtracting Score1 from each of Score_V1, Score_V2, and Score_V3.
  • The credit score with the largest difference from Score1 is determined as the key score, the correspondence is queried, and the business variable corresponding to the key score is determined as the key variable.
  • The business behavior corresponding to the key variable is then the credit promotion guide to be output.
  • For example, if the difference between Score_V1 and Score1 is the largest, the server can query the correspondence, determine the business variable V1 corresponding to Score_V1 as the key variable with the highest degree of influence on Score1, and output the business behavior corresponding to V1 to the user as the key business behavior.
  • If that business behavior is default behavior, the system may output a credit promotion guide such as "avoid an excessive number of defaults to improve your credit score"; after viewing it, the target user can pay closer attention to their contract performance, repay on time as far as possible, and reduce default records to improve their credit score Score1.
  • If the differences between Score_V1, Score_V2, and Score_V3 and Score1 are all less than 0, the target user's performance on the business behaviors corresponding to V1, V2, and V3 is better than the overall level of the target user population.
  • In this scenario, the behavior guide need not be output; alternatively, the system may output to the user a prompt message indicating that the target user's current credit record is good.
  • As the embodiments above show, a first result is obtained by inputting the target sample into the model; the value of each variable in the target sample is replaced in turn with the detection threshold corresponding to that variable,
  • and each target sample obtained by the successive replacements is input into the model to obtain a second result set; the key variable with the highest degree of influence on the first result is then determined based on the difference between each second result in the second result set and the first result.
  • The key variable is thus found by comparing the second results, obtained from the model after each variable's value is replaced, with the first result that the target sample actually obtained,
  • without any need for a deep understanding of the model's algorithm;
  • when the solution is applied to a credit evaluation model, the same comparison of credit scores determines the key variable with the highest degree of influence on the user's credit score, again without a deep understanding of the model's algorithm, which reduces the complexity of detecting the variable that most influences the credit score;
  • in addition, the physical meaning corresponding to the key variable is output as a credit promotion guide to the user corresponding to the target sample, so that the user can see intuitively how to improve their credit, which improves the user experience.
  • Corresponding to the method embodiments above, the present application also provides device embodiments.
  • Referring to FIG. 4, the present application proposes a detection device 40 for key variables in a model, which is applied to a server. Referring to FIG. 5, the hardware architecture of the server that carries the detection device 40
  • generally includes a CPU, memory, non-volatile storage, a network interface, an internal bus, and the like.
  • Taking a software implementation as an example, the detection device 40 can generally be understood as a computer program loaded in memory
  • that, when run by the CPU, forms a logical device combining hardware and software. The device 40 includes:
  • a first input module 401, configured to input a target sample into the model to obtain a first result,
  • where the target sample includes a plurality of variables;
  • a first replacement module 402, configured to replace, in turn, the value of each variable in the target sample with the detection threshold corresponding to that variable;
  • a second input module 403, configured to input each target sample obtained by the successive replacements into the model to obtain a second result set;
  • a first determining module 404, configured to determine, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result.
  • The detection threshold characterizes the overall level of the values of the corresponding variable in the target population;
  • the detection threshold is the mean, median, or mode of the values of the corresponding variable in the target population.
  • The replacement module 402 is specifically configured to:
  • calculate the difference obtained by subtracting the first result from each second result in the second result set, and determine the variable whose replacement produced the second result with the largest difference as the key variable with the highest degree of influence on the first result.
  • Referring to FIG. 6, the present application provides a credit promotion guidance device 60, which is applied to a server.
  • Referring to FIG. 7, the hardware architecture of the server that carries the credit promotion guidance device 60 generally includes a CPU, memory, non-volatile storage, a network interface, an internal bus, and the like.
  • Taking a software implementation as an example, the device 60 can be understood as a logical device combining hardware and software, the device 60 comprising:
  • a third input module 601, configured to input a target sample into a credit evaluation model to obtain a first credit score, where the target sample includes a plurality of variables;
  • a second replacement module 602, configured to replace, in turn, the value of each variable in the target sample with the detection threshold corresponding to that variable;
  • a fourth input module 603, configured to input each target sample obtained by the successive replacements into the credit evaluation model to obtain a second credit score set;
  • a second determining module 604, configured to determine, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score;
  • an output module 605, configured to output the physical meaning corresponding to the key variable with the highest degree of influence as a credit promotion guide to the user corresponding to the target sample.
  • The detection threshold characterizes the overall level of the values of the corresponding variable in the target population;
  • the detection threshold is the mean, median, or mode of the values of the corresponding variable in the target population.
  • The second replacement module 602 is further configured to:
  • if the target sample includes a plurality of behavior sub-variables corresponding to the same behavior variable,
  • replace the values of the plurality of behavior sub-variables with the detection thresholds respectively corresponding to them.
  • The second determining module 604 is specifically configured to:
  • calculate the difference obtained by subtracting the first credit score from each second credit score in the second credit score set, and determine the variable whose replacement produced the second credit score with the largest difference as the key variable with the highest degree of influence on the first credit score.
  • The output module 605 is specifically configured to:
  • determine whether the key variable is a behavior variable; if the key variable is a behavior variable, output the behavior corresponding to the key variable as a behavior guide to the target user corresponding to the target sample.
  • The output module 605 is further configured to output a preset prompt message when every second credit score in the second credit score set minus the first credit score is less than 0; the prompt message indicates that the credit risk of the target user corresponding to the target sample is controllable.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Technology Law (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

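A method and device for detecting key variables in a model. The method includes: inputting a target sample into a model to obtain a first result, the target sample containing several variables (101); replacing the value of each variable in the target sample, in turn, with the detection threshold corresponding to that variable (102); inputting each target sample obtained by the successive replacements into the model to obtain a second result set (103); and determining, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result (104). The method and device can reduce the complexity of detecting the variable that most influences a model's output.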

Description

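Method and device for detecting key variables in a model
This application claims priority to Chinese patent application No. 201610741714.7, filed on August 26, 2016 and entitled "Method and device for detecting key variables in a model", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer applications, and in particular to a method and device for detecting key variables in a model.
Background
In the related art, a large amount of business data from users is typically collected in a given business scenario as modeling samples, and the modeling samples are then trained using statistical models or machine learning methods to build a business model. Once the business model is built, business data can be input into it, and business predictions for that scenario can be made from its output.
In practice, however, after business data is input into the business model as a business sample and a result is obtained, the input data usually contains several business variables, and the model usually cannot determine which of those variables has the highest degree of influence on the final output. Actual business needs therefore cannot be met.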
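Summary
This application proposes a method for detecting key variables in a model, the method including:
inputting a target sample into a model to obtain a first result, the target sample containing several variables;
replacing the value of each variable in the target sample, in turn, with the detection threshold corresponding to that variable;
inputting each target sample obtained by the successive replacements into the model to obtain a second result set;
determining, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result.
This application also proposes a credit promotion guidance method, the method including:
inputting a target sample into a credit evaluation model to obtain a first credit score, the target sample containing several variables;
replacing the value of each variable in the target sample, in turn, with the detection threshold corresponding to that variable;
inputting each target sample obtained by the successive replacements into the credit evaluation model to obtain a second credit score set;
determining, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score;
outputting the physical meaning corresponding to the key variable with the highest degree of influence as a credit promotion guide to the user corresponding to the target sample.
This application also proposes a device for detecting key variables in an evaluation model, the device including:
a first input module, configured to input a target sample into a model to obtain a first result, the target sample containing several variables;
a first replacement module, configured to replace the value of each variable in the target sample, in turn, with the detection threshold corresponding to that variable;
a second input module, configured to input each target sample obtained by the successive replacements into the model to obtain a second result set;
a first determining module, configured to determine, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result.
This application also proposes a credit promotion guidance device, the device including:
a third input module, configured to input a target sample into a credit evaluation model to obtain a first credit score, the target sample containing several variables;
a second replacement module, configured to replace the value of each variable in the target sample, in turn, with the detection threshold corresponding to that variable;
a fourth input module, configured to input each target sample obtained by the successive replacements into the credit evaluation model to obtain a second credit score set;
a second determining module, configured to determine, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score;
an output module, configured to output the physical meaning corresponding to the key variable with the highest degree of influence as a credit promotion guide to the user corresponding to the target sample.
In this application, a target sample is input into a model to obtain a first result; the value of each variable in the target sample is replaced in turn with the detection threshold corresponding to that variable, and each target sample obtained by the successive replacements is input into the model to obtain a second result set; the key variable with the highest degree of influence on the first result is then determined based on the difference between each second result in the second result set and the first result. The key variable can thus be determined simply by comparing the second results, obtained from the model after each variable's value is replaced, with the first result that the target sample actually obtained, without any need for a deep understanding of the model's algorithm.
When the technical solution of this application is applied to a credit evaluation model, the key variable with the highest degree of influence on a user's credit score can be determined by comparing the credit scores that the target sample obtains from the credit evaluation model after each variable's value is replaced with the credit score that the target sample actually obtained, without any need for a deep understanding of the model's algorithm. This reduces the complexity of detecting the variable that most influences the credit score. In addition, outputting the physical meaning of the key variable as a credit promotion guide to the user corresponding to the target sample lets the user see intuitively how to improve their credit, which improves the user experience.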
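Brief Description of the Drawings
FIG. 1 is a flowchart of a method for detecting key variables in a model according to an embodiment of this application;
FIG. 2 is a flowchart of a credit promotion guidance method according to an embodiment of this application;
FIG. 3 is a flowchart of a process for outputting a credit promotion guide in a credit evaluation model according to an embodiment of this application;
FIG. 4 is a logical block diagram of a device for detecting key variables in a model according to an embodiment of this application;
FIG. 5 is a hardware structure diagram of a server carrying the device for detecting key variables in a model according to an embodiment of this application;
FIG. 6 is a logical block diagram of a credit score promotion guidance device according to an embodiment of this application;
FIG. 7 is a hardware structure diagram of a server carrying the credit score promotion guidance device according to an embodiment of this application.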
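Detailed Description
A business risk model is an evaluation model used to assess business risk. In the related art, a large amount of business data is typically collected in a given business scenario as modeling samples, the modeling samples are classified according to whether they contain a predefined business risk event, and the modeling samples are then trained using statistical models or machine learning methods to build the business risk model.
Once the business risk model is built, collected business data can be input into it as a target sample for risk assessment, to predict the probability that such a business risk event occurs within a future period. That probability is then converted into a corresponding business risk score that reflects the risk level of the business.
In practice, after collected business data is input as a target sample into the completed evaluation model and the corresponding business risk score is obtained, it is usually desirable to detect which of the variables in the target sample is the key variable with the highest degree of influence on the final risk score.
For example, in a credit business scenario where the business risk model is a credit risk evaluation model, after a credit reporting company inputs a user's business data into the model as a target sample and outputs the user's credit score, the user usually has a strong desire to improve that score. The credit reporting company therefore needs to know which variable in the user's business data has the highest degree of influence on the final credit score, that is, which variable has pulled the score down, so that it can output a targeted credit promotion guide based on the user's credit weak point.
In the related art, the key variable in a target sample that most influences the risk score is usually detected by means of a specific detection algorithm.
For example, in a credit business scenario, a specific credit promotion guidance algorithm can be designed by going deep into the modeling algorithm of the evaluation model. That algorithm detects the key variable in the user's target sample with the highest degree of influence on the resulting credit score, and the business behavior corresponding to that key variable is then output to the user as a credit promotion guide.
In this approach, designing the detection algorithm usually requires a deep understanding of the modeling algorithm of the evaluation model. For traditional modeling algorithms such as logistic regression and decision trees, models built on them have a simple structure and are highly interpretable, so designing the detection algorithm by going deep into these algorithms is usually not difficult.
However, with the development of big-data mining and the growth in computing performance, more and more complex algorithms are being used in evaluation models, such as GBDT (Gradient Boosting Decision Tree) and deep neural networks. Models generated by these complex algorithms are hard to interpret, so it is usually difficult to go deep into the model's algorithm when designing the detection algorithm, which makes that design difficult.
In view of this, in this application a target sample is input into a model to obtain a first result; the value of each variable in the target sample is replaced in turn with the detection threshold corresponding to that variable, and each target sample obtained by the successive replacements is input into the model to obtain a second result set; the key variable with the highest degree of influence on the first result is then determined based on the difference between each second result in the second result set and the first result. The key variable can thus be determined simply by comparing the second results, obtained from the model after each variable's value is replaced, with the first result that the target sample actually obtained, without any need for a deep understanding of the model's algorithm.
When the technical solution of this application is applied to a credit evaluation model, the key variable with the highest degree of influence on a user's credit score can be determined by comparing the credit scores that the target sample obtains from the credit evaluation model after each variable's value is replaced with the credit score that the target sample actually obtained, without any need for a deep understanding of the model's algorithm. This reduces the complexity of detecting the variable that most influences the credit score. In addition, outputting the physical meaning of the key variable as a credit promotion guide to the user corresponding to the target sample lets the user see intuitively how to improve their credit, which improves the user experience.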
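This application is described below through specific embodiments and specific application scenarios.
Referring to FIG. 1, FIG. 1 shows a method for detecting key variables in a model according to an embodiment of this application. The method is applied to a server and performs the following steps:
Step 101: input a target sample into a model to obtain a first result, the target sample containing several variables;
Step 102: replace the value of each variable in the target sample, in turn, with the detection threshold corresponding to that variable;
Step 103: input each target sample obtained by the successive replacements into the model to obtain a second result set;
Step 104: determine, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result.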
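The server may include a server, a server cluster, or a cloud platform built on a server cluster, used to train and use the business model.
The model may be a mathematical model for business prediction, built by training a large number of collected modeling samples with a preset modeling algorithm. For example, in practice the business model may be an evaluation model that scores a user's business risk over a future period and outputs the score.
The specific process of training on a large number of collected modeling samples to build the model is not described in detail in this application; those skilled in the art may refer to the related art. For example, in practice the server may build the business model using modeling methods such as scorecards, regression analysis, or neural networks, and relatively mature data mining tools such as SAS (Statistical Analysis System) and SPSS (Statistical Product and Service Solutions), by training on a large number of collected modeling samples.
In this example, after the business model has been trained, the server may collect the target sample of a target user. The business data serving as the target sample and as the modeling samples may each include several business variables, and these business variables may in turn include several behavior variables. For example, when the business model is an evaluation model, the variables contained in the target sample and the modeling samples may be variables that affect the business, and they may include business variables that correspond to the user's business behavior.
Note that the number of behavior variables contained in the target sample and the modeling samples can be customized according to actual needs. For example, in practice, in order to detect the user behavior with the highest degree of influence on the output of the business model, all variables in the target sample may be defined as behavior variables.
After the server has collected the target user's target sample, it may input the target sample into the trained evaluation model for business prediction and obtain a first result corresponding to the target sample.
After the first result has been obtained, in order to detect the variable in the target sample with the highest degree of influence on the first result, the server may replace the value of each variable contained in the target sample, in turn, with the detection threshold corresponding to that variable, and then input each target sample obtained by the successive replacements into the business model for business prediction.
The detection threshold may be a threshold that characterizes the overall level, within the target user population, of the values of a variable contained in the collected target sample. Every variable in the target sample may have its own detection threshold used to replace its value.
The target user population may be defined as everyone who carries out the business corresponding to the target sample, or as a particular business group to which the target user corresponding to the target sample belongs; this is not specifically limited in this example.
In one embodiment, the detection threshold may be defined as any one of the mean, median, or mode of the values of the corresponding business variable in the target user population. The mean, median, and mode are all basic statistical concepts. The mean is the sum of all value samples divided by the number of value samples. The median is found by sorting all value samples and taking the middle one, or the average of the two middle ones. The mode is the value that occurs most often among all value samples.
In this way, detection thresholds can be set for each variable in the target sample with only a simple statistical calculation, using the values of that variable in the target user population as value samples.
When the mode is used as the detection threshold, there may be more than one mode; in that case, either the average of the modes or any one of them may be used as the detection threshold.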
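The threshold definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `detection_threshold`, the `method` parameter, and the sample data are hypothetical and do not come from the application; the multi-mode case averages the modes, which is one of the two options the text allows.

```python
from statistics import mean, median, multimode


def detection_threshold(values, method="mean"):
    """Summarise one variable's values in the target population as a threshold.

    `values` holds that variable's value for every user in the population.
    """
    if method == "mean":
        return mean(values)
    if method == "median":
        return median(values)
    if method == "mode":
        # Several modes may exist; average them (taking any one is also allowed).
        return mean(multimode(values))
    raise ValueError(f"unknown method: {method}")


# Hypothetical population values for a "number of defaults" variable.
population_defaults = [0, 1, 1, 2, 2, 5]
median_threshold = detection_threshold(population_defaults, "median")
mode_threshold = detection_threshold(population_defaults, "mode")
```

Here the values 1 and 2 both occur twice, so the mode-based threshold is the average of the two modes.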
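In this example, when the value of each variable in the target sample is replaced in turn with the detection threshold corresponding to that variable, the variables can normally simply be replaced one at a time.
In practice, however, the target sample may contain multiple behavior variables that correspond to the same behavior. If the target sample contains multiple behavior variables that correspond to the same behavior, the values of those behavior variables may be replaced at the same time with the detection thresholds corresponding to each of them.
In this example, after the values of the variables in the target sample have been replaced in turn with their detection thresholds, the multiple target samples obtained by the successive replacements may each be input into the business model for business prediction to obtain a second result set.
In addition, after the target samples obtained by the successive replacements have been input into the business model, the server may also save the correspondence between each variable whose value was replaced and the result that the target sample obtained from the business model after that replacement.
In this way, the server can later locate the business variable whose value was replaced by querying this correspondence with any second result in the second result set.
In this example, when detecting the key variable with the highest degree of influence on the first result, the server may compare the first result numerically with each second result in the second result set, calculate the difference between the first result and each second result, and then determine the key variable from the calculated differences.
In one embodiment, the server may calculate the difference obtained by subtracting the first result from each second result in the second result set. To determine the variable with the highest influence on the first result, the second result in the second result set with the largest difference from the first result may be determined as the key result. After the key result is determined, the server may use it as a query index into the previously saved correspondence to find the variable whose value was replaced. The variable that corresponds to the key result is the key variable finally detected as having the highest degree of influence on the first result.
In this way, the key business variable with the highest degree of influence on the first result can be determined quickly and simply, by comparing the results that the target sample obtains from the model after each variable's value is replaced with the result it actually obtains without any replacement. No deep understanding of the model's algorithm is needed, which reduces the complexity of determining the variable that most influences the first result.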
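Steps 101 to 104 can be sketched as follows. This is a minimal sketch, assuming the model is any callable that maps a sample (a dict of variable values) to a number; `detect_key_variable`, `toy_model`, and all sample values are hypothetical names and data introduced only for illustration.

```python
def detect_key_variable(model, sample, thresholds):
    """Return the key variable and the per-variable differences.

    model: callable taking a dict of variable values and returning a number.
    sample: the target sample, variable name -> value.
    thresholds: variable name -> detection threshold.
    """
    first_result = model(sample)  # step 101
    second_results = {}
    for name, threshold in thresholds.items():
        probe = dict(sample)      # copy so the original sample stays intact
        probe[name] = threshold   # step 102: replace one variable's value
        second_results[name] = model(probe)  # step 103
    # Step 104: second result minus first result, largest difference wins.
    diffs = {name: r - first_result for name, r in second_results.items()}
    key_variable = max(diffs, key=diffs.get)
    return key_variable, diffs


def toy_model(s):
    # Hypothetical stand-in for a trained, opaque scoring model.
    return 700 - 40 * s["defaults"] + s["income"] // 100


sample = {"defaults": 4, "income": 3000}
thresholds = {"defaults": 1, "income": 5000}
key, diffs = detect_key_variable(toy_model, sample, thresholds)
```

The model is treated purely as a black box: only its outputs are compared, which is the point of the approach described above.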
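Note that, in practice, the business model may be a credit evaluation model. The following description takes a credit evaluation model as the example.
Referring to FIG. 2, FIG. 2 shows a credit promotion guidance method according to an embodiment of this application. The method is applied to a server and performs the following steps:
Step 201: input a target sample into a credit evaluation model to obtain a first credit score, the target sample containing several variables;
The server may include a server, a server cluster, or a cloud platform built on a server cluster, used to train and use the credit evaluation model.
The credit evaluation model may be a mathematical model for credit evaluation, built by training a large number of collected modeling samples with a preset modeling algorithm. For example, in practice the credit evaluation model may be a credit risk assessment model that scores a user's credit risk and outputs the score.
The credit score is the score obtained when the credit evaluation model evaluates the collected target sample; it measures the user's credit risk over a future period.
For example, in a credit business scenario, the credit evaluation model can perform credit risk assessment on business data collected from a specific credit business scenario and obtain a corresponding credit score, which then measures the probability that a user will default on credit within a future period.
The specific process of training on a large number of collected modeling samples to build the credit evaluation model is not described in detail in this application; those skilled in the art may refer to the related art.
For example, in practice the server may build the credit evaluation model using modeling methods such as scorecards, regression analysis, or neural networks, and relatively mature data mining tools such as SAS (Statistical Analysis System) and SPSS (Statistical Product and Service Solutions), by training on a large number of collected modeling samples.
In this example, after the credit evaluation model has been trained, the server may collect the target sample of a target user, that is, the user whose credit risk is to be assessed. Both the modeling samples and the target sample may include business data collected from a specific business scenario. Business data serving as modeling samples is used to train the model, while business data serving as the target sample is used to assess the target user's credit risk.
The business data serving as the target sample and as the modeling samples may each include several variables that may affect the user's credit risk, and these variables may in turn include several behavior variables.
For example, in a credit business scenario, the variables contained in the target sample and the modeling samples may be variables that affect credit risk, such as the user's income and consumption data, historical credit data, default data, and employment status. Among these, income and consumption data, historical credit data, and default data correspond to the user's consumption behavior, credit behavior, and default behavior, respectively, and can therefore be called the behavior variables in the target sample.
Note that the number of behavior variables contained in the target sample and the modeling samples can be customized according to actual needs. For example, in practice, in order to detect the user behavior with the highest degree of influence on the credit score, all variables in the target sample may be defined as behavior variables.
After the server has collected the target user's target sample, it may input the target sample into the trained credit evaluation model for risk assessment and obtain a first credit score corresponding to the target sample.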
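Step 202: replace the value of each variable in the target sample, in turn, with the detection threshold corresponding to that variable;
Step 203: input each target sample obtained by the successive replacements into the credit evaluation model to obtain a second credit score set;
After the first credit score has been obtained, in order to detect the variable in the target sample with the highest degree of influence on the first credit score, the server may replace the value of each variable contained in the target sample, in turn, with the detection threshold corresponding to that variable, and then input each target sample obtained by the successive replacements into the credit evaluation model for credit risk assessment.
The detection threshold may be a threshold that characterizes the overall level, within the target user population, of the values of a variable contained in the collected target sample. Every variable in the target sample may have its own detection threshold used to replace its value.
The target user population may be defined as everyone who carries out the business corresponding to the target sample, or as a particular business group to which the target user corresponding to the target sample belongs; this is not specifically limited in this example.
In one embodiment, the detection threshold may be defined as any one of the mean, median, or mode of the values of the corresponding business variable in the target user population.
In the related art, to measure the overall level of a variable's values in a target user population, the values of that variable for all users in the population are usually collected as value samples; the mean, median, or mode of all collected value samples is then calculated, and any one of them is used to characterize the overall level of the variable's values in the population.
For example, in a credit business scenario, the target sample may include business variables such as income and consumption data, historical credit data, default data, and the user's employment status. To determine the overall level of the income and consumption data variable in the target user population, the income and consumption data of all users in the population can be collected as value samples; the mean, median, or mode of the specific consumption amounts in that data is then calculated, and any one of them is used as the overall level in the target user population.
The mean, median, and mode are all basic statistical concepts.
The mean is the sum of all value samples divided by the number of value samples.
The median is found by sorting all value samples and taking the middle one, or the average of the two middle ones.
The mode is the value that occurs most often among all value samples.
Therefore, in practice, any one of the mean, median, or mode of a variable's values in the target user population can be set directly as the detection threshold. In this way, detection thresholds can be set for each variable in the target sample with only a simple statistical calculation, using the values of that variable in the target user population as value samples.
When the mode is used as the detection threshold, there may be more than one mode; in that case, either the average of the modes or any one of them may be used as the detection threshold.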
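In another embodiment, the detection threshold may instead be defined as a threshold obtained by applying a specific statistical analysis algorithm to the value samples of a variable in the target user population, such that the threshold characterizes the overall level of that variable's values in the population.
This is because the mean, median, or mode of a business variable's values in the target user population usually does not precisely reflect the overall level of that variable's values in the population.
Therefore, in practice, besides defining the detection threshold as the mean, median, or mode of a variable's values in the target user population, the overall level of a variable's values in a target user population can also be measured by taking the values of that variable for all users in the population as value samples and analyzing them with a specific statistical analysis algorithm. The analysis yields a threshold that characterizes the overall level of the variable's values in the population, and that threshold is then defined as the detection threshold.
The statistical analysis algorithm applied to the value samples may be the same as, or different from, the algorithm used to build the evaluation model.
For example, in practice, algorithms such as regression analysis and relatively mature data mining tools such as SAS or SPSS may be used to analyze the value samples and obtain the distribution of all their values; a threshold that characterizes the overall level of the variable's values in the target user population is then determined from that distribution. The specific statistical analysis process is not described in detail in this example; those skilled in the art may refer to the related art when implementing it.
Of course, besides the definitions of the detection threshold shown above, other mathematical quantification methods may also be used in practice to define detection thresholds for the business variables in the target sample.
It should be emphasized that, whichever mathematical quantification method is used, the detection threshold finally defined for each business variable in the target sample is intended to characterize the overall level of that variable's values in the target user population; the methods are not listed one by one in this example.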
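In this example, when the value of each variable in the target sample is replaced in turn with the detection threshold corresponding to that variable, the variables can normally simply be replaced one at a time.
For example, suppose the target sample contains three variables V1, V2, and V3, whose detection thresholds are V1-t, V2-t, and V3-t, respectively. First, V1-t replaces the value of V1 to obtain a target sample consisting of V1-t, V2, and V3. Next, V2-t replaces the value of V2 to obtain a target sample consisting of V1, V2-t, and V3. Finally, V3-t replaces the value of V3 to obtain a target sample consisting of V1, V2, and V3-t.
In practice, however, the target sample may contain multiple behavior variables that correspond to the same behavior.
For example, in a credit business scenario, suppose the target sample includes variables such as "default amount", "number of defaults", and "income and consumption data". The variable "income and consumption data" corresponds uniquely to the user's consumption behavior, whereas the variables "default amount" and "number of defaults" both correspond to the user's default behavior. In this case, "default amount" and "number of defaults" are the behavior variables in the target sample that correspond to the same behavior.
In this example, if the target sample contains multiple behavior variables that correspond to the same behavior, the values of those behavior variables may be replaced at the same time with the detection thresholds corresponding to each of them.
For example, suppose the target sample contains three variables V1, V2, and V3, whose detection thresholds are V1-t, V2-t, and V3-t, respectively, and that V2 and V3 correspond to the same behavior. First, V1-t replaces the value of V1 to obtain a target sample consisting of V1-t, V2, and V3. Next, V2-t and V3-t replace the values of V2 and V3 at the same time to obtain a target sample consisting of V1, V2-t, and V3-t.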
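The grouped replacement in the V1, V2, V3 example can be sketched as below. This is an illustrative sketch only: `build_probe_samples`, the `groups` argument (a list of tuples naming the variables that share one behavior), and the numeric values are assumptions made for the example, not part of the application.

```python
def build_probe_samples(sample, thresholds, groups):
    """Build one modified copy of the sample per group of variables.

    groups: list of tuples; variables in the same tuple correspond to the
    same behavior and are replaced together. A single-variable tuple gives
    the ordinary one-at-a-time replacement.
    """
    probes = {}
    for group in groups:
        probe = dict(sample)
        for name in group:
            probe[name] = thresholds[name]
        probes[group] = probe
    return probes


sample = {"V1": 10, "V2": 20, "V3": 30}
thresholds = {"V1": 1, "V2": 2, "V3": 3}   # stand-ins for V1-t, V2-t, V3-t
groups = [("V1",), ("V2", "V3")]           # V2 and V3 share one behavior
probes = build_probe_samples(sample, thresholds, groups)
```

Keying the result by group keeps the link between each modified sample and the variables that were replaced, which is the correspondence the server later queries.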
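In this example, after the values of the variables in the target sample have been replaced in turn with their detection thresholds, the multiple target samples obtained by the successive replacements may each be input into the credit evaluation model for credit risk assessment to obtain a second credit score set.
For example, suppose the target sample contains three variables V1, V2, and V3, whose detection thresholds are V1-t, V2-t, and V3-t, respectively. Replacing the values of V1, V2, and V3 in turn with V1-t, V2-t, and V3-t yields a target sample consisting of V1-t, V2, and V3, one consisting of V1, V2-t, and V3, and one consisting of V1, V2, and V3-t. The three target samples can then each be input into the credit evaluation model for credit risk assessment to obtain a credit score set, which in this case contains three credit scores.
In addition, after the target samples obtained by the successive replacements have been input into the credit evaluation model, the server may also save the correspondence between each variable whose value was replaced and the credit score that the target sample obtained from the evaluation model after that replacement.
In this way, the server can later locate the business variable whose value was replaced by querying this correspondence with any credit score in the second credit score set.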
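Step 204: determine, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score.
In this example, when detecting the key variable with the highest degree of influence on the first credit score, the server may compare the first credit score numerically with each credit score in the second credit score set, calculate the difference between the first credit score and each of those scores, and then determine the key variable from the calculated differences.
In one embodiment, the server may calculate the difference obtained by subtracting the first credit score from each credit score in the second credit score set; the calculated difference may be greater than 0 or less than 0.
If the difference is greater than 0, then after the value of some variable in the target sample was replaced with its detection threshold, the credit score obtained from the model is greater than the score the target sample obtained from the model without the replacement. In this case, the increase in the credit score may be due to the variable whose value was replaced.
If the difference is less than 0, then after the value of some variable in the target sample was replaced with its detection threshold, the credit score obtained from the model is smaller than the score the target sample obtained from the model without the replacement. In this case, the business variable whose value was replaced may have lowered the credit score.
A credit score is usually inversely related to the risk level: the higher the credit score, the lower the corresponding risk.
Therefore, in this case, to determine the variable with the highest influence on the first credit score, the credit score in the second credit score set with the largest difference from the first credit score may be determined as the key credit score.
After the key credit score is determined, the server may use it as a query index into the previously saved correspondence to find the variable whose value was replaced. The variable that corresponds to the key credit score is the key variable finally detected as having the highest degree of influence on the first credit score.
For example, if the target sample with a given variable replaced yields the credit score with the largest difference from the first credit score, then replacing that variable's value with its overall level in the target user population increases the resulting credit score significantly, and reduces the risk significantly, compared with replacing any of the other variables.
In this case, when the variable is not replaced, the user's risk is relatively high precisely because that variable lowers the first credit score, which indicates that the target user's performance on the variable is below the overall level of the target user population. In this scenario it is therefore reasonable to determine this variable as the key business variable.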
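Step 205: output the physical meaning corresponding to the key variable with the highest degree of influence as a credit promotion guide to the user corresponding to the target sample.
Once the key variable with the highest degree of influence on the first credit score has been determined, the physical meaning corresponding to it may be output as a credit promotion guide to the target user corresponding to the target sample.
In one embodiment, the physical meaning corresponding to the key variable may be the user behavior corresponding to that variable. After determining the key variable as described above, the server may further determine whether the key variable is a behavior variable; if it is, the server may output the behavior corresponding to the key variable as a behavior guide to the target user corresponding to the target sample.
From the behavior guide, the target user can learn which behavior may have raised their risk and lowered their credit score. The target user can then reduce their risk and improve their credit score by improving that behavior.
For example, in a credit business scenario, suppose the key variable is the number-of-defaults variable in the target sample, so that the corresponding business behavior is default behavior. The system can then output a credit promotion guide such as "avoid an excessive number of defaults to improve your credit score". After viewing this guide, a user with a low credit score can pay closer attention to their contract performance, repay on time as far as possible, and reduce default records in order to improve their credit score.
In this way, the key business variable affecting the credit score can be determined quickly and simply, by comparing the credit scores that the target sample obtains from the evaluation model after each variable's value is replaced with the credit score it actually obtained. No deep understanding of the model's algorithm is needed, which reduces the complexity of determining the variable that most influences the credit score.
In addition, outputting a credit promotion guide lets the user see their credit "weak point" intuitively, so that they can raise their credit rating by improving it.
In this example, if every credit score in the second credit score set minus the first credit score is less than 0, then, because the credit score is inversely related to the risk level, the target user corresponding to the target sample performs better than the overall level of the target user population on every variable contained in the target sample (that is, replacing a value with the overall level actually increases the risk).
Therefore, in this scenario, the credit promotion guide need not be output; instead, a preset prompt message may be output to the target user, indicating that the target user's credit risk is controllable. For example, when the credit score is obtained from a credit risk assessment model, the prompt message may be "your credit record is good".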
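The output logic of step 205, including the case where every difference is negative, might look like the sketch below. Everything named here is an assumption made for illustration: `credit_guidance`, the `behavior_of` mapping from behavior variables to behavior descriptions, the guide wording, and the scores. Returning `None` for a non-behavior key variable is one possible choice; the text only says what to output when the key variable is a behavior variable.

```python
GOOD_RECORD_MESSAGE = "Your credit record is good"  # preset prompt from the text


def credit_guidance(first_score, second_scores, behavior_of):
    """Return a guide string, the preset prompt, or None.

    second_scores: variable name -> score after that variable was replaced.
    behavior_of: behavior variable name -> behavior description (hypothetical).
    Assumes a higher credit score means lower risk.
    """
    diffs = {var: score - first_score for var, score in second_scores.items()}
    if all(d < 0 for d in diffs.values()):
        # The user beats the population level on every variable.
        return GOOD_RECORD_MESSAGE
    key_variable = max(diffs, key=diffs.get)
    behavior = behavior_of.get(key_variable)
    if behavior is None:
        return None  # key variable is not a behavior variable
    return f"Improve your {behavior} to raise your credit score"


behaviors = {"default_count": "default behavior"}
weak = credit_guidance(570, {"default_count": 690, "employment": 590}, behaviors)
strong = credit_guidance(720, {"default_count": 700, "employment": 715}, behaviors)
```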
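Of course, in practice, if the credit score defined by the evaluation model is directly related to the risk level, that is, the higher the credit score, the higher the corresponding risk, the process of determining the key variable with the highest degree of influence on the first credit score is the reverse of the process shown above.
In this case, to determine the business variable with the highest influence on the first credit score, the difference obtained by subtracting each credit score in the second credit score set from the first credit score may be calculated, and the credit score with the largest such difference is determined as the key credit score. The key variable with the highest degree of influence on the first credit score is then determined by looking up the correspondence described above.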
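The two score conventions differ only in the direction of the subtraction, which a small sketch can make explicit. The function name `key_variable` and the flag `higher_score_means_higher_risk` are hypothetical, as are the scores.

```python
def key_variable(first_score, second_scores, higher_score_means_higher_risk=False):
    """Pick the key variable under either score convention.

    second_scores: variable name -> score after that variable was replaced.
    """
    if higher_score_means_higher_risk:
        # Reverse case: first score minus each second score.
        diffs = {v: first_score - s for v, s in second_scores.items()}
    else:
        # Usual case: each second score minus the first score.
        diffs = {v: s - first_score for v, s in second_scores.items()}
    return max(diffs, key=diffs.get)


scores = {"V1": 650, "V2": 560}
usual = key_variable(600, scores)
reverse = key_variable(600, scores, higher_score_means_higher_risk=True)
```

With these numbers the usual convention selects V1 (replacing it raises the score most), while the reverse convention selects V2 (replacing it lowers the score most).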
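The technical solutions in the above embodiments are described in detail below with a specific example.
Referring to FIG. 3, FIG. 3 is a flowchart of a process for outputting a credit promotion guide in a credit evaluation model according to this example.
As shown in FIG. 3, the credit risk assessment model is a model containing three business variables V1, V2, and V3, all of which are behavior variables; the detection thresholds corresponding to V1, V2, and V3 are V1-t, V2-t, and V3-t, respectively.
V1-t, V2-t, and V3-t are the average values of V1, V2, and V3 in the target user population, respectively (FIG. 3 shows a mean function computing the average values of V1, V2, and V3 in the target user population to obtain V1-t, V2-t, and V3-t).
In the initial state, after collecting the target user's target sample, the server may input it into the model for credit evaluation to obtain a credit score, recorded as Score1.
To determine the key business variable with the highest degree of influence on Score1, the values of V1, V2, and V3 may be replaced in turn with their corresponding detection thresholds.
First, V1-t replaces the value of business variable V1 to obtain a target sample consisting of V1-t, V2, and V3.
Next, V2-t replaces the value of business variable V2 to obtain a target sample consisting of V1, V2-t, and V3.
Finally, V3-t replaces the value of business variable V3 to obtain a target sample consisting of V1, V2, and V3-t.
After the replacements, the three target samples (V1-t, V2, V3; V1, V2-t, V3; and V1, V2, V3-t) are each input into the model for credit risk assessment to obtain credit scores. In this example, the higher the credit score, the higher the target user's credit rating and the lower the default probability.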
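Suppose that:
The credit score that the target sample consisting of V1-t, V2, and V3 obtains from the model is recorded as Score_V1. The server can save the correspondence between V1 and Score_V1.
The credit score that the target sample consisting of V1, V2-t, and V3 obtains from the model is recorded as Score_V2. The server can save the correspondence between V2 and Score_V2.
The credit score that the target sample consisting of V1, V2, and V3-t obtains from the model is recorded as Score_V3. The server can save the correspondence between V3 and Score_V3.
When outputting the credit promotion guide, the server may calculate the differences obtained by subtracting Score1 from each of Score_V1, Score_V2, and Score_V3.
The difference between Score_V1 and Score1 is recorded as delta_Score_V1.
The difference between Score_V2 and Score1 is recorded as delta_Score_V2.
The difference between Score_V3 and Score1 is recorded as delta_Score_V3.
The credit score with the largest difference from Score1 is then determined as the key score, the correspondence is queried, and the business variable corresponding to the key score is determined as the key variable. The business behavior corresponding to the key variable is then the credit promotion guide to be output.
Suppose that delta_Score_V1, the difference between Score_V1 and Score1, is the largest. The server can then query the correspondence, determine the business variable V1 corresponding to Score_V1 as the key variable with the highest degree of influence on the credit score Score1, and output the business behavior corresponding to V1 to the user as the key business behavior.
For example, if the business behavior corresponding to V1 is default behavior, the system may output a credit promotion guide such as "avoid an excessive number of defaults to improve your credit score". After viewing this guide, the target user can pay closer attention to their contract performance, repay on time as far as possible, and reduce default records in order to improve their credit score Score1.
Of course, if the differences between Score_V1, Score_V2, and Score_V3 and Score1 are all less than 0,
the target user's performance on the business behaviors corresponding to V1, V2, and V3 is better than the overall level of the target user population. In this scenario, the behavior guide need not be output; alternatively, the system may output to the user a prompt message indicating that the target user's current credit record is good.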
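The worked example can be traced with concrete numbers. The scores below are invented purely for illustration; only the names Score1, Score_V1, Score_V2, Score_V3 and the delta naming come from the text. The sketch stores the correspondence keyed by score, as the text describes, which assumes the three scores are distinct (equal scores would overwrite one another in the dict).

```python
score1 = 620  # hypothetical Score1 of the unmodified target sample

# Hypothetical Score_V1, Score_V2, Score_V3 after each replacement.
scores = {"V1": 680, "V2": 630, "V3": 610}

# Saved correspondence: credit score -> variable whose value was replaced.
correspondence = {score: var for var, score in scores.items()}

# delta_Score_Vi = Score_Vi - Score1
deltas = {f"delta_Score_{var}": score - score1 for var, score in scores.items()}

# The score with the largest difference from Score1 is the key score;
# looking it up in the correspondence gives the key variable.
key_score = max(correspondence, key=lambda s: s - score1)
key_variable = correspondence[key_score]
```

In an implementation it may be safer to key the correspondence by variable rather than by score, so that ties cannot collide; that is a design suggestion, not something the application states.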
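As the embodiments above show, a target sample is input into a model to obtain a first result; the value of each variable in the target sample is replaced in turn with the detection threshold corresponding to that variable, and each target sample obtained by the successive replacements is input into the model to obtain a second result set; the key variable with the highest degree of influence on the first result is then determined based on the difference between each second result in the second result set and the first result. The key variable can thus be determined simply by comparing the second results, obtained from the model after each variable's value is replaced, with the first result that the target sample actually obtained, without any need for a deep understanding of the model's algorithm.
When the technical solution of this application is applied to a credit evaluation model, the key variable with the highest degree of influence on a user's credit score can be determined by comparing the credit scores that the target sample obtains from the credit evaluation model after each variable's value is replaced with the credit score that the target sample actually obtained, without any need for a deep understanding of the model's algorithm. This reduces the complexity of detecting the variable that most influences the credit score. In addition, outputting the physical meaning of the key variable as a credit promotion guide to the user corresponding to the target sample lets the user see intuitively how to improve their credit, which improves the user experience.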
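Corresponding to the above method embodiments, the present application also provides apparatus embodiments.
Refer to FIG. 4. The present application proposes an apparatus 40 for detecting a key variable in a model, applied to a server. Refer to FIG. 5: the hardware architecture of the server carrying the apparatus 40 typically includes a CPU, memory, non-volatile storage, a network interface, an internal bus, and so on. Taking a software implementation as an example, the apparatus 40 can generally be understood as a computer program loaded in memory which, after being run by the CPU, forms a logical apparatus combining software and hardware. The apparatus 40 includes:
a first input module 401, configured to input a target sample into a model to obtain a first result, the target sample containing several variables;
a first replacement module 402, configured to replace the values of the variables in the target sample in turn with the detection thresholds corresponding to those variables;
a second input module 403, configured to separately input the target samples obtained after the variable values have been replaced in turn into the model to obtain a second result set;
a first determining module 404, configured to determine, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result.
In this example, the detection threshold characterizes the overall level of the values of its corresponding variable in the target population;
wherein the detection threshold is the mean, median or mode of the values of its corresponding variable in the target population.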
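The three statistics can give noticeably different thresholds on skewed data, as the sketch below shows. The helper `detection_threshold`, its `kind` parameter and the sample data are illustrative assumptions; only Python's standard `statistics` module is used.

```python
from statistics import mean, median, mode

_STATISTICS = {"mean": mean, "median": median, "mode": mode}


def detection_threshold(values, kind="mean"):
    """Summarize a variable's values in the target population by one statistic."""
    return _STATISTICS[kind](values)


# Hypothetical overdue counts for five users; one user is an outlier.
overdue_counts = [0, 0, 1, 2, 12]
for kind in ("mean", "median", "mode"):
    print(kind, detection_threshold(overdue_counts, kind))
```

Here the mean is pulled up by the outlier while the median and mode are not, which is one reason an implementer might prefer one statistic over another for a given variable.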
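In this example, the first determining module 404 is specifically configured to:
compute, for each second result in the second result set, the difference obtained by subtracting the first result from that second result;
determine the variable whose value was replaced to produce the second result with the largest difference as the key variable with the highest degree of influence on the first result.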
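In a software implementation, the four modules might map onto methods of a single class, as in this sketch. The class name, method names and the toy linear model are hypothetical; the application does not prescribe any particular code structure.

```python
class KeyVariableDetector:
    """Hypothetical software counterpart of apparatus 40."""

    def __init__(self, model, thresholds):
        self.model = model            # callable: sample dict -> numeric result
        self.thresholds = thresholds  # variable name -> detection threshold

    def first_result(self, sample):
        # First input module 401: evaluate the unmodified target sample.
        return self.model(sample)

    def replaced_samples(self, sample):
        # First replacement module 402: replace one variable at a time.
        return {name: {**sample, name: t} for name, t in self.thresholds.items()}

    def second_results(self, sample):
        # Second input module 403: evaluate every replaced sample.
        return {name: self.model(s) for name, s in self.replaced_samples(sample).items()}

    def key_variable(self, sample):
        # First determining module 404: the largest (second - first) difference wins.
        first = self.first_result(sample)
        diffs = {name: r - first for name, r in self.second_results(sample).items()}
        return max(diffs, key=diffs.get)


detector = KeyVariableDetector(
    model=lambda s: 3 * s["a"] - 2 * s["b"],   # toy model, for illustration only
    thresholds={"a": 5, "b": 4},
)
print(detector.key_variable({"a": 1, "b": 6}))
```

The detector only ever calls the model; it never inspects the model's internals, which matches the black-box character of the method.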
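Refer to FIG. 6. The present application proposes a credit improvement guidance apparatus 60, applied to a server. Refer to FIG. 7: the hardware architecture of the server carrying the credit improvement guidance apparatus 60 typically includes a CPU, memory, non-volatile storage, a network interface, an internal bus, and so on. Taking a software implementation as an example, the credit improvement guidance apparatus 60 can generally be understood as a computer program loaded in memory which, after being run by the CPU, forms a logical apparatus combining software and hardware. The apparatus 60 includes:
a third input module 601, configured to input a target sample into a credit evaluation model to obtain a first credit score, the target sample containing several variables;
a second replacement module 602, configured to replace the values of the variables in the target sample in turn with the detection thresholds corresponding to those variables;
a fourth input module 603, configured to separately input the target samples obtained after the variable values have been replaced in turn into the credit evaluation model to obtain a second credit score set;
a second determining module 604, configured to determine, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score;
an output module 605, configured to output the physical meaning corresponding to the key variable with the highest degree of influence as credit improvement guidance to the user corresponding to the target sample.
In this example, the detection threshold characterizes the overall level of the values of its corresponding variable in the target population;
wherein the detection threshold is the mean, median or mode of the values of its corresponding variable in the target population.
In this example, the second replacement module 602 is further configured to:
if the target sample contains multiple behavior sub-variables corresponding to the same behavior variable, replace the values of all of these behavior sub-variables with the detection thresholds respectively corresponding to them.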
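Replacing all sub-variables of one behavior together could look like the sketch below. The grouping dictionary `behavior_groups` and the variable names are invented for illustration.

```python
def replace_by_behavior(sample, thresholds, behavior_groups):
    """Yield (behavior, copy of sample with all of its sub-variables replaced)."""
    for behavior, names in behavior_groups.items():
        probe = dict(sample)
        for name in names:
            # All sub-variables of the same behavior are replaced together.
            probe[name] = thresholds[name]
        yield behavior, probe


sample = {"defaults_30d": 2, "defaults_90d": 5, "monthly_spend": 800}
thresholds = {"defaults_30d": 0, "defaults_90d": 1, "monthly_spend": 1200}
behavior_groups = {
    "default": ["defaults_30d", "defaults_90d"],   # two sub-variables, one behavior
    "spending": ["monthly_spend"],
}

probes = dict(replace_by_behavior(sample, thresholds, behavior_groups))
print(probes)
```

Grouping this way yields one replaced sample, and therefore one second credit score, per behavior rather than per sub-variable.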
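In this example, the second determining module 604 is specifically configured to:
compute, for each second credit score in the second credit score set, the difference obtained by subtracting the first credit score from that second credit score;
determine the variable whose value was replaced to produce the second credit score with the largest difference as the key variable with the highest degree of influence on the first credit score.
In this example, the output module 605 is specifically configured to:
determine whether the key variable is a behavior variable; if the key variable is a behavior variable, output the behavior corresponding to the key variable as behavior guidance to the target user corresponding to the target sample.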
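The behavior-variable check might be sketched as follows. Returning `None` for a non-behavior variable is an assumption of this sketch, as are the variable names and the message format; this passage does not state what happens in that case.

```python
# Hypothetical registry: only behavior variables have an associated behavior.
BEHAVIOR_OF = {
    "V1": "defaulting on repayments",
    "V2": "late utility payments",
}


def behavior_guidance(key_variable, behavior_of=BEHAVIOR_OF):
    """Return a guidance string if the key variable is a behavior variable."""
    behavior = behavior_of.get(key_variable)
    if behavior is None:
        # Assumption: a non-behavior variable yields no guidance in this sketch.
        return None
    return f"Behavior to improve: {behavior}"


print(behavior_guidance("V1"))
print(behavior_guidance("age"))
```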
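In this example, the output module 605 is further configured to:
when the differences obtained by subtracting the first credit score from each second credit score in the second credit score set are all less than 0, output a preset prompt message; the prompt message indicates that the credit risk of the target user corresponding to the target sample is under control.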
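Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses or adaptations of the present application that follow its general principles and include common general knowledge or customary technical means in the art not disclosed in the present application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application being indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.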

Claims (18)

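  1. A method for detecting a key variable in a model, characterized in that the method comprises:
    inputting a target sample into a model to obtain a first result, the target sample containing several variables;
    replacing the values of the variables in the target sample in turn with the detection thresholds corresponding to those variables;
    separately inputting the target samples obtained after the variable values have been replaced in turn into the model to obtain a second result set;
    determining, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result.
  2. The method according to claim 1, characterized in that the detection threshold characterizes the overall level of the values of its corresponding variable in the target population;
    wherein the detection threshold is the mean, median or mode of the values of its corresponding variable in the target population.
  3. The method according to claim 1, characterized in that determining, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result comprises:
    computing, for each second result in the second result set, the difference obtained by subtracting the first result from that second result;
    determining the variable whose value was replaced to produce the second result with the largest difference as the key variable with the highest degree of influence on the first result.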
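  4. A credit improvement guidance method, characterized in that the method comprises:
    inputting a target sample into a credit evaluation model to obtain a first credit score, the target sample containing several variables;
    replacing the values of the variables in the target sample in turn with the detection thresholds corresponding to those variables;
    separately inputting the target samples obtained after the variable values have been replaced in turn into the credit evaluation model to obtain a second credit score set;
    determining, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score;
    outputting the physical meaning corresponding to the key variable with the highest degree of influence as credit improvement guidance to the user corresponding to the target sample.
  5. The method according to claim 4, characterized in that the detection threshold characterizes the overall level of the values of its corresponding variable in the target population;
    wherein the detection threshold is the mean, median or mode of the values of its corresponding variable in the target population.
  6. The method according to claim 4, characterized in that, if the target sample contains multiple behavior variables and these behavior variables correspond to the same behavior, the values of all of these behavior variables are replaced with the detection thresholds respectively corresponding to them.
  7. The method according to claim 4, characterized in that determining, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score comprises:
    computing, for each second credit score in the second credit score set, the difference obtained by subtracting the first credit score from that second credit score;
    determining the variable whose value was replaced to produce the second credit score with the largest difference as the key variable with the highest degree of influence on the first credit score.
  8. The method according to claim 4, characterized in that outputting the physical meaning corresponding to the key variable with the highest degree of influence as credit improvement guidance to the user corresponding to the target sample comprises:
    determining whether the key variable is a behavior variable;
    if the key variable is a behavior variable, outputting the behavior corresponding to the key variable as behavior guidance to the target user corresponding to the target sample.
  9. The method according to claim 8, characterized in that, when the differences obtained by subtracting the first credit score from each second credit score in the second credit score set are all less than 0, a preset prompt message is output; the prompt message indicates that the credit risk of the target user corresponding to the target sample is under control.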
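  10. An apparatus for detecting a key variable in a model, characterized in that the apparatus comprises:
    a first input module, configured to input a target sample into a model to obtain a first result, the target sample containing several variables;
    a first replacement module, configured to replace the values of the variables in the target sample in turn with the detection thresholds corresponding to those variables;
    a second input module, configured to separately input the target samples obtained after the variable values have been replaced in turn into the model to obtain a second result set;
    a first determining module, configured to determine, based on the difference between each second result in the second result set and the first result, the key variable with the highest degree of influence on the first result.
  11. The apparatus according to claim 10, characterized in that the detection threshold characterizes the overall level of the values of its corresponding variable in the target population;
    wherein the detection threshold is the mean, median or mode of the values of its corresponding variable in the target population.
  12. The apparatus according to claim 10, characterized in that the first replacement module is specifically configured to:
    compute, for each second result in the second result set, the difference obtained by subtracting the first result from that second result;
    determine the variable whose value was replaced to produce the second result with the largest difference as the key variable with the highest degree of influence on the first result.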
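  13. A credit improvement guidance apparatus, characterized in that the apparatus comprises:
    a third input module, configured to input a target sample into a credit evaluation model to obtain a first credit score, the target sample containing several variables;
    a second replacement module, configured to replace the values of the variables in the target sample in turn with the detection thresholds corresponding to those variables;
    a fourth input module, configured to separately input the target samples obtained after the variable values have been replaced in turn into the credit evaluation model to obtain a second credit score set;
    a second determining module, configured to determine, based on the difference between each second credit score in the second credit score set and the first credit score, the key variable with the highest degree of influence on the first credit score;
    an output module, configured to output the physical meaning corresponding to the key variable with the highest degree of influence as credit improvement guidance to the user corresponding to the target sample.
  14. The apparatus according to claim 13, characterized in that the detection threshold characterizes the overall level of the values of its corresponding variable in the target population;
    wherein the detection threshold is the mean, median or mode of the values of its corresponding variable in the target population.
  15. The apparatus according to claim 13, characterized in that the second replacement module is further configured to:
    if the target sample contains multiple behavior sub-variables corresponding to the same behavior variable, replace the values of all of these behavior sub-variables with the detection thresholds respectively corresponding to them.
  16. The apparatus according to claim 13, characterized in that the second determining module is specifically configured to:
    compute, for each second credit score in the second credit score set, the difference obtained by subtracting the first credit score from that second credit score;
    determine the variable whose value was replaced to produce the second credit score with the largest difference as the key variable with the highest degree of influence on the first credit score.
  17. The apparatus according to claim 13, characterized in that the output module is specifically configured to:
    determine whether the key variable is a behavior variable; if the key variable is a behavior variable, output the behavior corresponding to the key variable as behavior guidance to the target user corresponding to the target sample.
  18. The apparatus according to claim 17, characterized in that the output module is further configured to:
    when the differences obtained by subtracting the first credit score from each second credit score in the second credit score set are all less than 0, output a preset prompt message; the prompt message indicates that the credit risk of the target user corresponding to the target sample is under control.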
PCT/CN2017/097434 2016-08-26 2017-08-15 Method and device for determining key variable in model WO2018036402A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SG11201901614SA SG11201901614SA (en) 2016-08-26 2017-08-15 Method and device for determining key variable in model
US16/283,381 US20190220924A1 (en) 2016-08-26 2019-02-22 Method and device for determining key variable in model
PH12019500406A PH12019500406A1 (en) 2016-08-26 2019-02-26 Method and device for determining key variable in model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610741714.7 2016-08-26
CN201610741714.7A CN107784411A (zh) 2016-08-26 2016-08-26 Method and device for determining key variable in model

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/283,381 Continuation US20190220924A1 (en) 2016-08-26 2019-02-22 Method and device for determining key variable in model

Publications (1)

Publication Number Publication Date
WO2018036402A1 true WO2018036402A1 (zh) 2018-03-01

Family

ID=61246425

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/097434 WO2018036402A1 (zh) 2016-08-26 2017-08-15 Method and device for determining key variable in model

Country Status (6)

Country Link
US (1) US20190220924A1 (zh)
CN (1) CN107784411A (zh)
PH (1) PH12019500406A1 (zh)
SG (1) SG11201901614SA (zh)
TW (1) TWI677830B (zh)
WO (1) WO2018036402A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200410586A1 (en) * 2018-05-31 2020-12-31 Simplecredit Micro-Lending Co., Ltd. Adjusting Method and Adjusting Device, Server and Storage Medium for Scorecard Model
CN109191096A (zh) * 2018-08-22 2019-01-11 阿里巴巴集团控股有限公司 一种签约风险量化方法、代扣风险量化方法、装置及设备
US11475515B1 (en) * 2019-10-11 2022-10-18 Wells Fargo Bank, N.A. Adverse action methodology for credit risk models
CN112017042A (zh) * 2020-10-22 2020-12-01 北京淇瑀信息科技有限公司 基于tweedie分布的资源配额确定方法、装置和电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222125A (zh) * 2010-04-13 2011-10-19 利弗莫尔软件技术公司 工程设计优化中确定最大影响设计变量的方法和系统
CN104392096A (zh) * 2014-10-23 2015-03-04 华为技术有限公司 一种统计方法及装置
CN105260863A (zh) * 2015-11-26 2016-01-20 国家电网公司 一种基于电力电缆故障信息的故障单影响因素分析方法
US9286286B1 (en) * 2015-01-03 2016-03-15 Chahid Kamel Ghaddar Method, apparatus, and computer program product for optimizing parameterized models using functional paradigm of spreadsheet software
CN105740280A (zh) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 检测变量重要性的方法和装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8078524B2 (en) * 2001-02-22 2011-12-13 Fair Isaac Corporation Method and apparatus for explaining credit scores
US9569797B1 (en) * 2002-05-30 2017-02-14 Consumerinfo.Com, Inc. Systems and methods of presenting simulated credit score information
TWI340345B (en) * 2006-08-10 2011-04-11 Uniminer Inc Method for selecting critical variables
US8515862B2 (en) * 2008-05-29 2013-08-20 Sas Institute Inc. Computer-implemented systems and methods for integrated model validation for compliance and credit risk
US20130151388A1 (en) * 2011-12-12 2013-06-13 Visa International Service Association Systems and methods to identify affluence levels of accounts
US8712907B1 (en) * 2013-03-14 2014-04-29 Credibility Corp. Multi-dimensional credibility scoring

Also Published As

Publication number Publication date
TW201807623A (zh) 2018-03-01
CN107784411A (zh) 2018-03-09
US20190220924A1 (en) 2019-07-18
PH12019500406A1 (en) 2020-03-02
SG11201901614SA (en) 2019-03-28
TWI677830B (zh) 2019-11-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17842825

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17842825

Country of ref document: EP

Kind code of ref document: A1