WO2022062193A1 - Individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and device and storage medium - Google Patents

Individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and device and storage medium

Info

Publication number
WO2022062193A1
Authority
WO
WIPO (PCT)
Prior art keywords
credit
time
historical
data
model
Prior art date
Application number
PCT/CN2020/135274
Other languages
French (fr)
Chinese (zh)
Inventor
程人
李青山
司华友
Original Assignee
南京博雅区块链研究院有限公司
北京大学
博雅正链(北京)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京博雅区块链研究院有限公司, 北京大学, 博雅正链(北京)科技有限公司
Publication of WO2022062193A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • the invention relates to the field of financial credit reporting, in particular to a personal credit evaluation and interpretation method, device, equipment and storage medium based on time series attribution analysis.
  • Attribution analysis technology is an effective way to mine and identify the triggering factors of credit events in the field of credit reporting. At present, this technology has been applied to personal credit evaluation to try to make a valuable logical explanation for the evaluation results.
  • the core scoring model of the scoring system is trained based on the historical credit data of a certain period of time.
  • the attributes of the credit reporting subject will change over time, and even some new attributes that have a significant impact on the evaluation results will appear.
  • Using such a scoring model may even fail to obtain objective and valid evaluation results, let alone provide valuable logical explanations for the evaluation results.
  • a first aspect of the present invention provides a personal credit evaluation and interpretation method based on time series attribution analysis, and its specific technical scheme is as follows:
  • a personal credit evaluation and interpretation method based on time series attribution analysis which includes:
  • the credit scoring model is a weighted scoring model or an unweighted scoring model
  • the credit scoring model is trained separately on several groups of time-labeled historical credit reporting data sets to obtain several trained, time-labeled historical credit scoring models, wherein: each historical credit reporting data set includes multiple pieces of historical credit data, the historical credit data in the same group has the same time label, the historical credit data in different groups has different time labels, and the time label represents the generation time of the historical credit data to which it belongs;
  • according to the category of the credit scoring model, several time-labeled future credit scoring models are predicted based on the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit reporting data sets, wherein the time labels of the future credit scoring models are all different;
  • a second aspect of the present invention provides a personal credit evaluation and interpretation device based on time series attribution analysis, which includes:
  • a model initialization module for constructing a credit scoring model and initializing model parameters, where the credit scoring model is a weighted scoring model or an unweighted scoring model;
  • the historical credit scoring model acquisition module is used to train the credit scoring model separately on several groups of time-labeled historical credit reporting data sets to obtain several trained, time-labeled historical credit scoring models, wherein: each historical credit reporting data set includes multiple pieces of historical credit data, the historical credit data in the same group has the same time label, the historical credit data in different groups has different time labels, and the time label represents the generation time of the historical credit data to which it belongs;
  • the future credit scoring model acquisition module is used to predict, according to the category of the credit scoring model, several time-labeled future credit scoring models based on the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit reporting data sets, wherein the time labels of the future credit scoring models are all different;
  • the credit evaluation module is used to input the credit data to be evaluated into the selected time-labeled historical or future credit scoring model, so as to obtain the credit evaluation result of the corresponding credit reporting subject at the time point indicated by the time label;
  • the interpretation module is used to interpret the credit evaluation result.
  • a third aspect of the present invention provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the personal credit evaluation and interpretation method based on time series attribution analysis described in the first aspect of the present invention.
  • a fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the personal credit evaluation and interpretation method based on time series attribution analysis described in the first aspect of the present invention.
  • Compared with the credit scoring models in the prior art, the present invention has the following significant advantages:
  • FIG. 1 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention
  • FIG. 5 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of an apparatus for evaluating and explaining personal credit based on time series attribution analysis in an embodiment of the present invention
  • FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
  • FIG. 8 is a data structure diagram of credit reporting data of a credit reporting subject in an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a logic for obtaining a historical scoring model and a future scoring model in an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a linear regression model of the attribute “debt ratio” in an embodiment of the present invention.
  • FIG. 11 is a flowchart of an evaluation result interpretation process in an embodiment of the present invention.
  • FIG. 12 is a logic diagram of a method for obtaining several approximate sample data by perturbing the attribute value of each attribute of the credit data in an embodiment of the present invention.
  • although the present invention provides method operation steps or device structures as shown in the following embodiments or drawings, more or fewer operation steps or module units may be included in the method or device based on routine practice or without creative effort.
  • the execution order of these steps or the module structure of the apparatus is not limited to the execution order or module structure shown in the embodiments of the present invention or the accompanying drawings.
  • when the method or the module structure is applied in an actual device or terminal product, it can be executed sequentially or in parallel according to the method or the module structure shown in the embodiments or the accompanying drawings.
  • the core scoring model of the existing credit scoring system is obtained by training based on the historical credit data of a certain period of time.
  • the attributes of the credit reporting subject will change over time, and even some new attributes that have a significant impact on the evaluation results will appear.
  • Using the existing scoring models may not even be able to obtain objective and valid evaluation results, let alone provide valuable logical explanations for the evaluation results.
  • the present invention provides a personal credit evaluation and interpretation method, device, equipment and storage medium based on time series attribution analysis, which builds a series of credit scoring models over time for multiple historical time points and multiple future time points.
  • Choosing an appropriate credit scoring model can realize the evaluation of the credit status of the credit reporting subject at a specific point in the past, present or future, thereby significantly improving the evaluation effect and ensuring the interpretability of the evaluation results.
  • Credit data: data related to the credit of the credit reporting subject.
  • the credit data includes several attributes (also called features), and each attribute has an attribute value.
  • Each attribute value of each piece of credit data may be collected from one data source, or may be collected from several data sources.
  • FIG. 8 shows the data format of the credit data of the credit reporting subject in one embodiment; the credit data includes 12 attributes collected from three data sources, and each attribute corresponds to an attribute value.
  • the three data sources are financial institutions (such as banks), consumer platforms (such as Taobao, Meituan, etc.), and social networks.
  • the first two data sources can directly reflect the credit behavior of credit reporting subjects, and the introduction of social networks can improve the scoring accuracy of the scoring model to a certain extent.
  • Time label: the time label of the credit data indicates the generation time of the credit data.
  • the span of the time label can be a year, a quarter, a month, etc.
  • the span of the time label in the embodiment shown in FIG. 8 is a year, and the time label of the credit data shown there is "2018", which means that all attribute data of that credit data were generated in 2018.
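The definitions above can be pictured as a small record type. The following is a minimal sketch, not the data structure of FIG. 8: the class name `CreditRecord`, the field `subject_id`, and every attribute name other than the debt ratio are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CreditRecord:
    """One piece of credit data for one credit reporting subject."""

    subject_id: str  # hypothetical identifier, not named in the source
    time_label: str  # generation time of the data, e.g. "2018" for a yearly span
    # attribute name -> attribute value; names here are illustrative assumptions
    attributes: Dict[str, float] = field(default_factory=dict)


record = CreditRecord(
    subject_id="subject-001",
    time_label="2018",
    attributes={"debt_ratio": 0.35, "monthly_spend": 4200.0, "contacts": 120.0},
)
```

All attribute values in one record share the record's time label, which is what later allows records to be grouped by generation time.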
  • scoring model F can be expressed in the following form, where:
  • x i is the attribute value of the i-th attribute of the credit reporting subject
  • ω i is the weight corresponding to the i-th attribute
  • n is the number of attributes of the credit reporting subject.
  • if the scoring model F can be expressed in this form, it is defined as a weighted scoring model.
  • otherwise, the scoring model F is defined as an unweighted scoring model.
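One form consistent with the symbol definitions above is a plain weighted sum of attribute values. The sketch below assumes that form; the function name `weighted_score` and the dictionary-based interface are hypothetical, and the patent's exact expression may differ (for example by a link function or an intercept).

```python
from typing import Dict, Tuple


def weighted_score(
    weights: Dict[str, float], values: Dict[str, float]
) -> Tuple[float, Dict[str, float]]:
    """Return the total score and the per-attribute terms w_i * x_i.

    Assumes the scoring model is a weighted sum over the n attributes.
    """
    contributions = {name: weights[name] * values[name] for name in weights}
    return sum(contributions.values()), contributions
```

Because each term depends on exactly one attribute, a model of this kind exposes how much every attribute contributes to the total, which is the property the weighted category relies on for interpretation.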
  • the weighted scoring model can choose a logistic regression model.
  • logistic regression has always been the mainstream classification algorithm in the industry. It has the advantages of simplicity and stability, strong interpretability, and easy detection and deployment.
  • GBDT: gradient boosting decision tree.
  • DNN: deep neural network.
  • a deep neural network can be understood as a neural network with multiple hidden layers; through techniques such as activation functions and backpropagation, its large number of parameters can be adjusted in a very high-dimensional space, so that complex classification boundaries can be identified and a good classification effect achieved.
  • the method for evaluating and explaining personal credit based on time series attribution analysis includes the following steps:
  • each historical credit data set includes multiple pieces of historical credit data
  • the historical credit data in the same group has the same time label
  • the historical credit data in different groups has different time labels
  • the time label represents the generation time of the historical credit data to which it belongs.
  • the span (granularity) of the time label can be a year, a quarter, a month, or even a day.
  • the shorter the span of the time label, the finer the granularity.
  • the more historical credit data collected with the same time label, and the more uniform the data distribution in the data set, the better the scoring effect of the trained historical credit scoring model.
  • the span of the time label can be selected according to specific needs.
  • for example, the time label is a year and the current year is 2020; the credit data of the past three years is collected and sampled to obtain three historical credit data sets: the 2017 credit data set, the 2018 credit data set and the 2019 credit data set.
  • For example, all credit data in the 2017 credit data set were generated in 2017, and each piece of credit data represents the credit status of a credit reporting subject in 2017.
  • The three historical credit data sets are each used as training samples to train the credit scoring model, yielding three corresponding historical credit scoring models: the 2017 historical credit scoring model, the 2018 historical credit scoring model and the 2019 historical credit scoring model. Of course, in practical applications, more years of credit data can be collected to train more historical credit scoring models.
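The grouping-then-training step described above can be sketched as follows. This is an illustration under assumptions: the tuple layout of `Sample`, the function names, and the pluggable `train` callable are all hypothetical, and any trainer (weighted or unweighted) could be passed in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

# (time label, attribute values, class label); this layout is an assumption
Sample = Tuple[str, List[float], int]
Model = TypeVar("Model")


def group_by_time_label(
    samples: Iterable[Sample],
) -> Dict[str, List[Tuple[List[float], int]]]:
    """Split the historical credit data into one data set per time label."""
    groups: Dict[str, List[Tuple[List[float], int]]] = defaultdict(list)
    for time_label, x, y in samples:
        groups[time_label].append((x, y))
    return dict(groups)


def train_historical_models(
    samples: Iterable[Sample],
    train: Callable[[List[Tuple[List[float], int]]], Model],
) -> Dict[str, Model]:
    """Train one model per time label and key each model by that label."""
    groups = group_by_time_label(samples)
    return {label: train(data) for label, data in sorted(groups.items())}
```

Keying the result by time label keeps each historical model tied to the period its training data came from.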
  • the credit scoring model is a weighted scoring model, such as a logistic regression model.
  • The specific implementation process of step S200 is as follows:
  • the logistic regression expression can be solved iteratively by adopting the classical gradient descent idea, and the training speed is usually very fast.
  • the results obtained by logistic regression can be easily converted into a standard scorecard model, that is, the final total credit score can be split to obtain the dimensional credit score corresponding to each attribute:
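Both ideas, iterative fitting by gradient descent and splitting the total into per-attribute terms, can be sketched in plain Python. This is a minimal illustration, not the patent's training procedure: the learning rate, epoch count, function names, and the use of the linear term w_i * x_i as the per-attribute score are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, m = len(xs[0]), len(xs)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        # Step against the average gradient.
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def attribute_contributions(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """Split the linear part of the score into one term per attribute."""
    return [wi * xi for wi, xi in zip(w, x)]
```

A production scorecard would usually also rescale these terms to a points scale; that conversion is omitted here.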
  • the credit scoring model adopts an unweighted scoring model, such as a gradient boosting decision tree (GBDT) or a deep neural network.
  • The specific implementation process of step S200 is as follows:
  • the model is trained on the historical credit data over multiple rounds of iteration; each round generates a weak classifier trained on the residual of the classifier from the previous round, and the weak classifiers obtained in each round are finally combined by weighted summation into an overall classifier.
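The residual-fitting loop can be illustrated with a deliberately small example. The sketch below is a simplification under stated assumptions: it uses a single attribute, squared-error residuals, one-split decision stumps as the weak learners, and a fixed learning rate as the summation weight; none of these choices come from the source.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= t, value if x > t)


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Pick the single split that minimises squared error on the residuals."""
    best: Stump = (xs[0], 0.0, 0.0)
    best_err = float("inf")
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        err = sum((r - (lv if x <= t else rv)) ** 2 for x, r in zip(xs, residuals))
        if err < best_err:
            best_err, best = err, (t, lv, rv)
    return best


def predict(stumps: Sequence[Stump], x: float, lr: float) -> float:
    """Weighted sum of all weak learners, each scaled by the learning rate."""
    return sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)


def fit_boosting(
    xs: Sequence[float], ys: Sequence[float], rounds: int = 50, lr: float = 0.5
) -> List[Stump]:
    """Each round fits a new stump to what the current ensemble still gets wrong."""
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [y - predict(stumps, x, lr) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return stumps
```

Real GBDT implementations fit multi-attribute trees to the gradient of a classification loss; the loop structure is the same.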
  • the credit scoring model is a weighted scoring model, such as a logistic regression model.
  • step S300 is as follows:
  • S301 Classify and summarize each attribute weight of each historical credit scoring model according to the attribute, so as to obtain several attribute weight sets with time labels.
  • in step S200, m historical credit scoring models are trained. As shown in FIG. 9, three historical models are trained. Of course, in practical embodiments, more historical credit scoring models may need to be trained.
  • j is the time label
  • x ij is the attribute value of the i-th attribute at the time point corresponding to the time label j
  • ω ij is the weight of the i-th attribute at the time point corresponding to the time label j
  • n is the number of attributes.
  • the attribute weight set corresponding to the i-th attribute is: (ω i1 , . . . , ω im ).
  • Figure 10 shows the linear regression model trained by taking the attribute "debt ratio" as an example.
  • the prediction of the attribute weight of each attribute at a certain time point in the future can be realized, so as to obtain the attribute weight with time labels of each attribute at a certain time point in the future.
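Extrapolating one attribute's weight over time can be sketched with ordinary least squares on (time label, weight) pairs. The function names and the sample weight values below are hypothetical; only the idea of fitting a line to the weight series of one attribute, such as the debt ratio, comes from the text.

```python
from typing import Sequence, Tuple


def fit_line(ts: Sequence[float], ws: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line through (t, w); returns slope, mean t, mean w."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_w = sum(ws) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov = sum((t - mean_t) * (w - mean_w) for t, w in zip(ts, ws))
    return cov / var_t, mean_t, mean_w


def predict_weight(ts: Sequence[float], ws: Sequence[float], future_t: float) -> float:
    """Predict the attribute weight at a future time label."""
    slope, mean_t, mean_w = fit_line(ts, ws)
    return mean_w + slope * (future_t - mean_t)


# Illustrative weights of one attribute in the 2017-2019 historical models.
w_2020 = predict_weight([2017, 2018, 2019], [0.10, 0.12, 0.14], 2020)
```

Repeating this for every attribute yields a full weight vector for the chosen future time label, which defines the future weighted scoring model.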
  • the credit scoring model adopts an unweighted scoring model, such as a gradient boosting decision tree (GBDT) or a deep neural network.
  • when the credit scoring model adopts an unweighted scoring model, a scoring model that can effectively evaluate the credit of the credit reporting subject at future time points is obtained by making full use of the historical credit data of the credit reporting subject: the trend of the data over time is learned from the historical credit data, and a series of credit data corresponding to future time points is predicted.
  • step S300 is as follows:
  • S304' use the trained vector regression model to predict and obtain several prediction vectors with time labels.
  • after step S300 is performed, three time-labeled future credit scoring models are obtained: the 2020 future credit scoring model, the 2021 future credit scoring model and the 2022 future credit scoring model.
  • the current time is 2020. To evaluate the credit status of the credit reporting subject in 2018, input the credit data of the credit reporting subject into the 2018 historical credit scoring model to obtain the credit evaluation result of the credit reporting subject in 2018. To evaluate the credit status of the credit reporting subject in 2022, input the credit data of the credit reporting subject into the 2022 future credit scoring model to obtain the credit evaluation result of the credit reporting subject in 2022.
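Selecting the model for a requested time point amounts to a lookup by time label. The sketch below assumes models are stored in two dictionaries keyed by time label and are plain callables; these storage choices and the name `select_model` are hypothetical.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[Sequence[float]], float]


def select_model(
    historical: Dict[str, Model], future: Dict[str, Model], time_label: str
) -> Model:
    """Return the historical or future scoring model for the given time label."""
    if time_label in historical:
        return historical[time_label]
    if time_label in future:
        return future[time_label]
    raise KeyError(f"no scoring model for time label {time_label!r}")
```

With the example above, `select_model(historical, future, "2018")` would return the 2018 historical model and `"2022"` the 2022 future model.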
  • the credit scoring model is a weighted scoring model, such as a logistic regression model.
  • the credit reporting subject can focus on the attributes in the top-ranked interval, and improve the credit score of the credit reporting subject by improving the attribute values of these attributes.
  • the historical score of each attribute may also be considered.
  • step S500 further includes:
  • the proportion of the number of credit reporting subjects in each score segment can be counted.
  • the proportion of people referred to here is accumulated over the scores from low to high; its actual meaning is the number of people whose score is greater than or equal to a certain score segment.
  • the score segment in which the credit subject's score is located can be obtained, and then the proportion of the credit subject's score for that attribute can be known from the proportion of the number of people in the score segment;
  • Attributes in the same interval are sorted again according to the score ratio.
  • the lower the score ratio the higher the ranking.
  • the final ranking result is presented to the user, and the weight ratio and score ratio will also be displayed together.
  • the credit reporting subject can weigh and select the importance of attributes to the evaluation results in combination with the weight ratio and score ratio to improve their credit.
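The two-level ordering can be sketched as a sort with a compound key. How the intervals are formed is an assumption here: attributes are bucketed by weight ratio with a fixed, hypothetical `interval_width`, higher buckets rank first, and within a bucket the attribute with the lower score ratio ranks first, as the text states.

```python
from typing import Dict, List


def rank_attributes(
    weight_ratio: Dict[str, float],
    score_ratio: Dict[str, float],
    interval_width: float = 0.1,
) -> List[str]:
    """Order attributes by weight-ratio interval, then by ascending score ratio."""

    def key(name: str):
        bucket = int(weight_ratio[name] / interval_width)
        # Negative bucket so that larger weight ratios come first.
        return (-bucket, score_ratio[name])

    return sorted(weight_ratio, key=key)


# Illustrative ratios; attribute names other than the debt ratio are hypothetical.
ranking = rank_attributes(
    {"debt_ratio": 0.42, "income": 0.45, "contacts": 0.13},
    {"debt_ratio": 0.2, "income": 0.7, "contacts": 0.1},
)
```

Here `debt_ratio` and `income` fall in the same interval, and `debt_ratio` is listed first because its score ratio is lower, meaning it offers the most room for improvement among the heavily weighted attributes.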
  • the credit scoring model adopts an unweighted scoring model, such as a gradient boosting decision tree (GBDT) or a deep neural network.
  • when the credit scoring model adopts an unweighted scoring model, the locally interpretable diagnosis algorithm (LIME) is used to interpret the credit evaluation result.
  • the interpretation of the credit evaluation results using the locally interpretable diagnosis algorithm includes:
  • a local weighted scoring model is obtained by training based on the approximate sample set and the sample evaluation result set.
  • the evaluation result can be interpreted based on the attribute weight of the local weighted scoring model.
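The perturb, evaluate, and fit sequence can be sketched in plain Python. This is a LIME-style illustration under assumptions, not the patent's procedure: Gaussian perturbations, an exponential proximity kernel, and a weighted least-squares linear surrogate solved by Gaussian elimination are choices made here, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def local_weights(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    num_samples: int = 200,
    scale: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Fit a proximity-weighted linear surrogate around x; return its attribute weights."""
    rng = random.Random(seed)
    rows, targets, kernel = [], [], []
    for _ in range(num_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]  # approximate sample near x
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        rows.append(z + [1.0])  # trailing 1.0 is the intercept column
        targets.append(model(z))  # evaluation result from the black-box model
        kernel.append(math.exp(-dist2 / (2 * scale**2)))
    d = len(x) + 1
    # Normal equations of weighted least squares: (X^T K X) w = X^T K y.
    a = [[sum(k * r[i] * r[j] for k, r in zip(kernel, rows)) for j in range(d)] for i in range(d)]
    b = [sum(k * r[i] * t for k, r, t in zip(kernel, rows, targets)) for i in range(d)]
    return solve(a, b)[:-1]  # drop the intercept
```

The returned weights play the role of the attribute weights of the local weighted scoring model and can be read the same way as the weights of a weighted scoring model, but only in the neighbourhood of the evaluated record.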
  • the personal credit evaluation method of the present invention constructs a series of credit scoring models over time for multiple historical time points and multiple future time points. Choosing an appropriate credit scoring model can realize the evaluation of the credit status of the credit reporting subject at a certain point in time, thereby significantly improving the evaluation effect of the evaluation model and ensuring the interpretability of the evaluation results.
  • the personal credit evaluation device based on time series attribution analysis in this embodiment includes a model initialization module 10, a historical credit scoring model acquisition module 20, a future credit scoring model acquisition module 30, a credit evaluation module 40 and an interpretation module 50, wherein:
  • the model initialization module 10 is used for constructing a credit scoring model and initializing model parameters, and the credit scoring model is a weighted scoring model or an unweighted scoring model.
  • the historical credit scoring model acquisition module 20 is used to train the credit scoring model separately on several groups of time-labeled historical credit reporting data sets to obtain several trained, time-labeled historical credit scoring models, wherein: each historical credit data set includes multiple pieces of historical credit data, the historical credit data in the same group has the same time label, the historical credit data in different groups has different time labels, and the time label indicates the generation time of the historical credit data to which it belongs.
  • the future credit scoring model acquisition module 30 is configured to predict, according to the category of the credit scoring model, several time-labeled future credit scoring models based on the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit reporting data sets.
  • the credit evaluation module 40 is used to input the credit data to be evaluated into the selected time-labeled historical or future credit scoring model, so as to obtain the credit evaluation result of the corresponding credit reporting subject at the time point indicated by the time label.
  • the interpretation module 50 is used for interpreting the credit evaluation result.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 60 includes a processor 61 and a memory 63, and the processor 61 and the memory 63 are connected, for example, through a bus 62.
  • the processor 61 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable device, transistor logic device, hardware component or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor 61 may also be a combination for realizing computing functions, for example, including a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the bus 62 may include a path to transfer information between the components described above.
  • the bus 62 may be a PCI bus, an EISA bus, or the like.
  • the bus 62 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is shown in the figure, but it does not mean that there is only one bus or one type of bus.
  • the memory 63 may be ROM or another type of static storage device that can store static information and instructions, RAM or another type of dynamic storage device that can store information and instructions, or EEPROM, CD-ROM or other optical disk storage, a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation.
  • the memory 63 is used to store the application code of the solution of the present application, and is controlled and executed by the processor 61 .
  • the processor 61 is configured to execute the application program code stored in the memory 63 to implement the personal credit evaluation method of the first embodiment.
  • the embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the personal credit evaluation method in the first embodiment is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Technology Law (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Development Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

An individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and a device and a storage medium. The method comprises: constructing a credit scoring model; separately training the credit scoring model by using multiple groups of historical credit investigation data sets having time labels to obtain multiple historical credit scoring models; predicting, according to the types of the credit scoring models, multiple future credit scoring models on the basis of the multiple historical credit scoring models having time labels or the multiple groups of historical credit investigation data sets having the time labels; inputting credit investigation data to be assessed into a selected historical credit scoring model or future credit scoring model, so as to obtain a credit investigation assessment result of a credit investigation subject corresponding to said credit investigation data; and explaining the credit investigation assessment result. In the method, a series of credit scoring models are constructed for multiple historical time points and multiple future time points, credit assessment of a credit investigation subject at a specific time point can be realized by selecting an appropriate credit scoring model, and explanation with reference value is made for the assessment result, so that individual credit scores are guided to be improved.

Description

基于时序归因分析的个人信用评估与解释方法、装置、设备及存储介质Personal credit evaluation and interpretation method, device, equipment and storage medium based on time series attribution analysis 技术领域technical field
The present invention relates to the field of financial credit reporting, and in particular to a personal credit evaluation and interpretation method, apparatus, device and storage medium based on time series attribution analysis.
Background Art
The rapid development of information technologies such as big data and cloud computing has provided the financial industry with massive data and advanced technology for its credit reporting business. In particular, personal credit reporting based on Internet big data has great development potential.
Using powerful machine learning classification models, existing personal credit scoring systems can already evaluate personal credit. However, these systems share a widely noted problem: they cannot give a meaningful logical explanation of the evaluation result, and therefore cannot give the credit reporting subject effective, actionable suggestions for improving the credit score.
Attribution analysis is an effective way to mine and identify the factors that trigger credit events in the financial credit reporting field. It has already been applied to personal credit evaluation in an attempt to give a meaningful logical explanation of the evaluation result.
However, the main problem at present is that the core scoring model of a scoring system is trained on historical credit reporting data from one fixed period. The attributes of a credit reporting subject change over time, and new attributes that significantly affect the evaluation result may even appear. A scoring model trained in this way may fail to produce an objective and valid evaluation result, let alone a meaningful logical explanation of that result.
Summary of the Invention
To solve at least one of the above technical problems, a first aspect of the present invention provides a personal credit evaluation and interpretation method based on time series attribution analysis. The specific technical solution is as follows:
A personal credit evaluation and interpretation method based on time series attribution analysis, comprising:
constructing a credit scoring model and initializing model parameters, the credit scoring model being a weighted scoring model or an unweighted scoring model;
separately training the credit scoring model with several groups of time-labeled historical credit reporting data sets to obtain several trained, time-labeled historical credit scoring models, wherein each historical credit reporting data set includes multiple pieces of historical credit reporting data, historical credit reporting data in the same group have the same time label, historical credit reporting data in different groups have different time labels, and the time label represents the time at which the historical credit reporting data to which it belongs was generated;
according to the category of the credit scoring model, predicting several time-labeled future credit scoring models on the basis of the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit reporting data sets, wherein the time labels of the future credit scoring models all differ from one another;
inputting credit reporting data to be evaluated into a selected time-labeled historical credit scoring model or future credit scoring model, to obtain the credit evaluation result, at the time point corresponding to the time label, of the credit reporting subject corresponding to the credit reporting data to be evaluated; and
interpreting the credit evaluation result.
A second aspect of the present invention provides a personal credit evaluation and interpretation apparatus based on time series attribution analysis, comprising:
a model initialization module, configured to construct a credit scoring model and initialize model parameters, the credit scoring model being a weighted scoring model or an unweighted scoring model;
a historical credit scoring model acquisition module, configured to separately train the credit scoring model with several groups of time-labeled historical credit reporting data sets to obtain several trained, time-labeled historical credit scoring models, wherein each historical credit reporting data set includes multiple pieces of historical credit reporting data, historical credit reporting data in the same group have the same time label, historical credit reporting data in different groups have different time labels, and the time label represents the time at which the historical credit reporting data to which it belongs was generated;
a future credit scoring model acquisition module, configured to predict, according to the category of the credit scoring model, several time-labeled future credit scoring models on the basis of the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit reporting data sets, wherein the time labels of the future credit scoring models all differ from one another;
a credit evaluation module, configured to input credit reporting data to be evaluated into a selected time-labeled historical credit scoring model or future credit scoring model, to obtain the credit evaluation result, at the time point corresponding to the time label, of the credit reporting subject corresponding to the credit reporting data to be evaluated; and
an interpretation module, configured to interpret the credit evaluation result.
A third aspect of the present invention provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the personal credit evaluation and interpretation method based on time series attribution analysis according to the first aspect of the present invention.
A fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the personal credit evaluation and interpretation method based on time series attribution analysis according to the first aspect of the present invention.
Compared with credit scoring models in the prior art, the present invention has the following significant advantages:
A series of credit scoring models that evolve over time is constructed for multiple historical time points and multiple future time points. By selecting an appropriate credit scoring model, the credit status of a credit reporting subject at a specific time point can be evaluated, which significantly improves the evaluation and keeps the evaluation result interpretable, thereby providing a useful reference for how to improve a personal credit score.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a personal credit evaluation and interpretation method based on time series attribution analysis in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a personal credit evaluation and interpretation apparatus based on time series attribution analysis in an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present invention;
FIG. 8 is a data structure diagram of the credit reporting data of one credit reporting subject in an embodiment of the present invention;
FIG. 9 is a logic diagram of obtaining historical scoring models and future scoring models in an embodiment of the present invention;
FIG. 10 is a schematic diagram of the linear regression model for the attribute "debt ratio" in an embodiment of the present invention;
FIG. 11 is a flowchart of the evaluation result interpretation process in an embodiment of the present invention;
FIG. 12 is a logic diagram of a method for obtaining several approximate sample data by perturbing the attribute values of the attributes of the credit reporting data in an embodiment of the present invention.
Detailed Description
To make the above objects, features and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Although the present invention provides the method steps or apparatus structures shown in the following embodiments or drawings, the method or apparatus may include more or fewer steps or module units on the basis of routine work that requires no inventive effort. For steps or structures that have no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to the order or structure shown in the embodiments or drawings. When the method or module structure is applied in an actual apparatus or end product, it may be executed sequentially or in parallel according to the method or module structure shown in the embodiments or drawings.
The core scoring model of an existing credit scoring system is trained on historical credit reporting data from one fixed period. However, the attributes of a credit reporting subject change over time, and new attributes that significantly affect the evaluation result may even appear. An existing scoring model may therefore fail to produce an objective and valid evaluation result, let alone a meaningful logical explanation of that result.
In view of these defects of existing credit scoring models, the present invention provides a personal credit evaluation and interpretation method, apparatus, device and storage medium based on time series attribution analysis, which constructs a series of credit scoring models that evolve over time for multiple historical time points and multiple future time points. By selecting an appropriate credit scoring model, the credit status of a credit reporting subject at a specific time point in the past, present or future can be evaluated, which significantly improves the evaluation and keeps the evaluation result interpretable.
Before the embodiments of the present invention are introduced, the following technical terms are explained:
Credit reporting data: data related to the credit of a credit reporting subject. Credit reporting data includes several attributes (also called features), and each attribute has an attribute value. The attribute values of one piece of credit reporting data may be collected from a single data source or from several data sources. FIG. 8 shows the data format of the credit reporting data of a credit reporting subject in one embodiment. This credit reporting data includes 12 attributes collected from three data sources, and each attribute has a corresponding attribute value. The three data sources are financial institutions (such as banks), consumer platforms (such as Taobao and Meituan) and social networks. The first two data sources directly reflect the credit behavior of the credit reporting subject, while introducing social networks can improve the scoring accuracy of the scoring model to some extent.
Of course, before model training or credit evaluation, the credit reporting data must be preprocessed as necessary, for example by converting it into vector form.
Time label: the time label of credit reporting data represents the time at which the credit reporting data was generated. The span of a time label may be a year, a quarter, a month and so on. In the embodiment shown in FIG. 8 the span of the time label is one year, and the time label of the credit reporting data shown there is "2018", which means that all attribute data of that credit reporting data was generated in 2018.
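The two terms above can be illustrated with a minimal Python sketch of one time-labeled record and its conversion to a vector. The class name `CreditRecord`, the helper `to_vector` and the two attribute names are hypothetical and chosen only for illustration; the embodiment in FIG. 8 uses 12 attributes that are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CreditRecord:
    """One piece of credit reporting data (hypothetical layout)."""
    subject_id: str               # identifies the credit reporting subject
    time_label: str               # e.g. "2018" when the span is one year
    attributes: Dict[str, float]  # attribute name -> attribute value


def to_vector(record: CreditRecord, attribute_order: List[str]) -> List[float]:
    """Preprocess a record into a vector with a fixed attribute order."""
    return [record.attributes[name] for name in attribute_order]


# Illustrative values only.
record = CreditRecord("subject-001", "2018", {"income": 120000.0, "debt_ratio": 0.35})
vector = to_vector(record, ["income", "debt_ratio"])
```

Fixing the attribute order once and reusing it for every record keeps vectors from different time labels comparable.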
Weighted scoring model and unweighted scoring model:
A scoring model can be expressed as $Score = F(x)$, where $x$ is the attribute vector of the credit reporting data of the credit reporting subject, $F$ is the selected scoring model, and $Score$ is the credit score finally produced by the scoring model.
If the scoring model $F$ can be expressed in the following form:
$F(x) = \alpha_1 x_1 + \dots + \alpha_n x_n$;
where $x_i$ is the attribute value of the $i$-th attribute of the credit reporting subject, $\alpha_i$ is the weight corresponding to the $i$-th attribute, and $n$ is the number of attributes of the credit reporting subject,
then the scoring model $F$ is defined as a weighted scoring model.
Otherwise, the scoring model $F$ is defined as an unweighted scoring model.
A logistic regression model may be chosen as the weighted scoring model. As the simplest classification algorithm, logistic regression has long been a mainstream classification algorithm in industry; it is simple and stable, highly interpretable, and easy to monitor and deploy.
For the unweighted scoring model, algorithm models such as a gradient boosting decision tree (GBDT) or a deep neural network may be chosen. GBDT is an ensemble algorithm whose base learners are classification and regression trees; its advantages are strong classification performance and the ability to perform feature selection during training. A deep neural network can be understood as a neural network with multiple hidden layers; using techniques such as activation functions and backpropagation, it adjusts a very large number of parameters in a very high-dimensional space, so that it can identify complex classification boundaries and achieve good classification results.
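As a small illustration of the definition above, a weighted scoring model is just the weighted sum $F(x) = \alpha_1 x_1 + \dots + \alpha_n x_n$. The function name `weighted_score` and the numbers are hypothetical.

```python
from typing import Sequence


def weighted_score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate F(x) = sum_i alpha_i * x_i for one attribute vector."""
    if len(weights) != len(x):
        raise ValueError("weights and attribute vector must have the same length")
    return sum(alpha * value for alpha, value in zip(weights, x))


# Illustrative weights and attribute values only.
score = weighted_score([0.5, -2.0, 1.0], [2.0, 0.25, 3.0])
```

Any model whose output cannot be written in this form (for example a tree ensemble or a neural network) falls under the unweighted category in the sense used here.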
Method embodiment
As shown in FIG. 1, the personal credit evaluation and interpretation method based on time series attribution analysis provided by an embodiment of the present invention includes the following steps:
S100. Construct a credit scoring model and initialize model parameters, the credit scoring model being a weighted scoring model or an unweighted scoring model.
S200. Separately train the credit scoring model with several groups of time-labeled historical credit reporting data sets to obtain several trained, time-labeled historical credit scoring models. Each historical credit reporting data set includes multiple pieces of historical credit reporting data; historical credit reporting data in the same group have the same time label, historical credit reporting data in different groups have different time labels, and the time label represents the time at which the historical credit reporting data to which it belongs was generated.
The span (granularity) of the time label may be a year, a quarter, a month or even a day. Generally speaking, the shorter the span of the time label (the finer the granularity), the more uniform the data distribution within a historical credit reporting data set sharing one time label, and the better the scoring performance of the trained historical credit scoring model.
In practice, the span of the time label can be chosen according to specific needs. In the embodiment of FIG. 9, the span of the time label is one year and the current year is 2020. Credit reporting data from the past three years is collected and sampled to obtain three groups of historical credit reporting data sets: the 2017 credit reporting data set, the 2018 credit reporting data set and the 2019 credit reporting data set. For example, all credit reporting data in the 2017 credit reporting data set was generated in 2017, and each piece of credit reporting data represents the credit status of one credit reporting subject in 2017.
The three groups of historical credit reporting data sets are each used as training samples to train the credit scoring model, which yields three corresponding historical credit scoring models: the 2017 historical credit scoring model, the 2018 historical credit scoring model and the 2019 historical credit scoring model. Of course, in practice credit reporting data from more years can be collected to train more historical credit scoring models.
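The per-year training in step S200 can be sketched as follows. This is a minimal illustration that assumes a logistic regression trained by batch gradient descent, a single attribute and binary labels (1 = default, 0 = no default). The function names, the learning rate and the toy data are assumptions, not values from the embodiment.

```python
import math
from typing import Dict, List, Tuple

# (attribute vector, label); label semantics are an assumption: 1 = default.
Dataset = List[Tuple[List[float], int]]


def train_logistic(data: Dataset, epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression weights and bias by batch gradient descent."""
    n_attr = len(data[0][0])
    weights, bias = [0.0] * n_attr, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_attr, 0.0
        for x, y in data:
            z = bias + sum(w * v for w, v in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i, v in enumerate(x):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def train_historical_models(datasets: Dict[str, Dataset]) -> Dict[str, Tuple[List[float], float]]:
    """Train one model per time label; each model sees only its own data set."""
    return {label: train_logistic(data) for label, data in datasets.items()}


# Toy data sets keyed by time label (single attribute: debt ratio).
datasets = {
    "2017": [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)],
    "2018": [([0.2], 0), ([0.3], 0), ([0.7], 1), ([0.9], 1)],
    "2019": [([0.1], 0), ([0.4], 0), ([0.6], 1), ([0.8], 1)],
}
historical_models = train_historical_models(datasets)
```

The result is a dictionary keyed by time label, so a later step can pick the model for a given year by a simple lookup.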
Specifically:
When the credit scoring model is a weighted scoring model, such as a logistic regression model,
step S200 is implemented as follows:
The logistic regression expression can be solved iteratively using classical gradient descent, and training is usually very fast. The result of logistic regression can readily be converted into a standard scorecard form, in which the final total credit score can be split into dimension credit scores, one for each attribute:
(1) when solving the model, first segment the variables by variable binning;
(2) then use weight of evidence (WOE) encoding to encode the binned discrete variables as continuous variables;
(3) then solve (train) the model.
The final result can be expressed as follows:
$F(x) = A - B(\alpha_1 x_1 + \dots + \alpha_n x_n)$, $\alpha_i = \theta_i w_i$.
Here $A$ and $B$ are constants. It follows that the credit score contributed by the $i$-th attribute is $-B\alpha_i x_i = -B\theta_i w_i x_i$.
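The binning, WOE encoding and score decomposition described above can be sketched as follows. This is a simplified illustration under several assumptions: WOE is taken as ln(share of goods / share of bads) with additive 0.5 smoothing (conventions differ), the per-attribute contribution is taken as $-B\alpha_i x_i$ as implied by $F(x) = A - B(\alpha_1 x_1 + \dots + \alpha_n x_n)$, and all function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bin_index(value: float, edges: Sequence[float]) -> int:
    """Return the bin number of value, given ascending bin edges."""
    return sum(1 for edge in edges if value >= edge)


def woe_table(values: Sequence[float], labels: Sequence[int], edges: Sequence[float]) -> List[float]:
    """Weight of evidence per bin: ln(good share / bad share), label 1 = bad."""
    n_bins = len(edges) + 1
    good, bad = [0] * n_bins, [0] * n_bins
    for value, label in zip(values, labels):
        (bad if label == 1 else good)[bin_index(value, edges)] += 1
    total_good, total_bad = sum(good), sum(bad)
    table = []
    for b in range(n_bins):
        good_share = (good[b] + 0.5) / (total_good + 0.5 * n_bins)  # smoothed to avoid log(0)
        bad_share = (bad[b] + 0.5) / (total_bad + 0.5 * n_bins)
        table.append(math.log(good_share / bad_share))
    return table


def scorecard(a: float, b: float, alphas: Sequence[float], x: Sequence[float]) -> Tuple[float, List[float]]:
    """Total score A - B * sum(alpha_i * x_i) and the per-attribute parts -B * alpha_i * x_i."""
    parts = [-b * alpha * value for alpha, value in zip(alphas, x)]
    return a + sum(parts), parts


woe = woe_table([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], edges=[0.5])
total, parts = scorecard(600.0, 20.0, [0.5, -1.0], [1.0, 2.0])
```

Because the total is the constant $A$ plus the sum of the parts, each part can be reported to the credit reporting subject as the dimension score of one attribute.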
When the credit scoring model is an unweighted scoring model, such as a gradient boosting decision tree (GBDT), a deep neural network or a similar algorithm model,
step S200 is implemented as follows:
Using the historical credit reporting data, training proceeds over multiple rounds of iteration. Each round produces one weak classifier, and each classifier is trained on the residuals of the classifier from the previous round. Finally, the weak classifiers obtained in all rounds are combined by weighted summation into one overall classifier.
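The residual-fitting loop described above can be sketched in plain Python. This is a toy illustration rather than a production GBDT: it assumes squared-error loss, uses one-split decision stumps as the weak learners, and applies a fixed learning rate as the weight of every round. All names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # feature index, threshold, left value, right value


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Pick the single split that minimizes squared error on the residuals."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (f, t, lv, rv), err
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        best = (0, float("inf"), mean, mean)
    return best


def stump_predict(stump: Stump, row: Sequence[float]) -> float:
    f, t, lv, rv = stump
    return lv if row[f] <= t else rv


def fit_boosting(X, y, rounds: int = 20, lr: float = 0.5):
    """Each round fits a stump to the current residuals; lr weights every stump."""
    base = sum(y) / len(y)
    preds, stumps = [base] * len(y), []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def boosting_predict(model, row: Sequence[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


gbdt = fit_boosting([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1])
```

The final prediction is the base value plus the weighted sum of all stumps, which corresponds to the weighted summation of weak classifiers described above.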
S300. According to the category of the credit scoring model, predict several time-labeled future credit scoring models on the basis of the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit reporting data sets, wherein the time labels of the future credit scoring models all differ from one another.
Specifically:
When the credit scoring model is a weighted scoring model, such as a logistic regression model,
step S300 is implemented as follows, as shown in FIG. 2:
S301. Group and collect the attribute weights of the historical credit scoring models by attribute, to obtain several time-labeled attribute weight sets.
The credit reporting data of the embodiment in FIG. 8 is again taken as an example. Step S200 yields m trained historical credit scoring models; in FIG. 9, three historical models are trained. Of course, in an actual embodiment more historical credit scoring models need to be trained.
The historical credit scoring model with time label $j$ can be expressed as:
$F(x_j) = \alpha_{1j} x_{1j} + \dots + \alpha_{ij} x_{ij} + \dots + \alpha_{nj} x_{nj}$;
where $j$ is the time label, $x_{ij}$ is the attribute value of the $i$-th attribute at the time point corresponding to time label $j$, $\alpha_{ij}$ is the weight of the $i$-th attribute at the time point corresponding to time label $j$, and $n$ is the number of attributes.
The attribute weight set corresponding to the $i$-th attribute is then $(\alpha_{i1}, \dots, \alpha_{im})$.
S302. Using each attribute weight set as a training data set, train several linear regression models in one-to-one correspondence with the attribute weight sets.
That is, for the $i$-th attribute, regression analysis is performed on the attribute weight set $(\alpha_{i1}, \dots, \alpha_{im})$ to fit a linear regression model corresponding to the $i$-th attribute. A total of $n$ linear regression models are fitted.
FIG. 10 shows the linear regression model trained for the attribute "debt ratio" as an example.
S303. Use the trained linear regression models to predict, for each attribute, time-labeled attribute weights at several future time points.
Once the attribute weight regression model for each attribute has been trained, the attribute weight of each attribute at a future time point can be predicted, which yields time-labeled attribute weights for each attribute at several future time points.
S304. Construct several time-labeled future credit scoring models from the predicted time-labeled attribute weights of the attributes at several future time points.
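Steps S301 to S304 can be sketched as follows: collect the weight of each attribute across the historical models, fit one least-squares line per attribute against time, and evaluate each line at the future time points. The function names and the weight values are hypothetical, and integer years are assumed as time labels.

```python
from typing import Dict, List, Sequence, Tuple


def fit_line(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * t + intercept."""
    mt, my = sum(ts) / len(ts), sum(ys) / len(ys)
    sxx = sum((t - mt) ** 2 for t in ts)
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sxx
    return slope, my - slope * mt


def predict_future_weights(historical: Dict[int, List[float]],
                           future_years: Sequence[int]) -> Dict[int, List[float]]:
    """historical maps a year to that year's weight vector (alpha_1j, ..., alpha_nj)."""
    years = sorted(historical)
    n_attr = len(historical[years[0]])
    lines = []
    for i in range(n_attr):
        series = [historical[y][i] for y in years]  # S301: weight set of attribute i
        lines.append(fit_line(years, series))       # S302: one regression per attribute
    # S303/S304: predicted weights per future year form that year's weighted model
    return {fy: [slope * fy + intercept for slope, intercept in lines] for fy in future_years}


historical_weights = {2017: [0.10, 0.50], 2018: [0.20, 0.50], 2019: [0.30, 0.50]}
future_weights = predict_future_weights(historical_weights, [2020, 2021])
```

With only three historical points, as in FIG. 9, the fitted lines are sensitive to noise, which is consistent with the remark above that more historical models are needed in practice.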
When the credit scoring model is an unweighted scoring model, such as a gradient boosting decision tree (GBDT), a deep neural network or a similar algorithm model:
Because an unweighted scoring model has no attribute weights to extrapolate, a scoring model that can effectively evaluate the credit of a credit reporting subject at a future time point is obtained in another way: the historical credit reporting data is fully used to learn how the data changes over time, and a series of credit reporting data corresponding to future time points is predicted from it. Specifically,
step S300 is implemented as follows, as shown in FIG. 4:
S301'. Obtain the probability distribution of each historical credit reporting data set and the parameter values of that distribution, the probability distribution being a Gaussian distribution.
S302'. Use a kernel function operation to transform the probability distribution of each historical credit reporting data set into a reproducing kernel Hilbert space, obtaining several time-labeled historical vectors in one-to-one correspondence with the historical credit reporting data sets.
S303'. Using the historical vectors as a training data set, train a vector regression model.
S304'. Use the trained vector regression model to predict several time-labeled prediction vectors.
S305'. Use a kernel function operation to inversely transform the prediction vectors back into the probability distribution space, thereby obtaining several groups of time-labeled predicted credit reporting data sets.
S306'. Separately train the credit scoring model with the several groups of time-labeled predicted credit reporting data sets to obtain several trained, time-labeled future credit scoring models.
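The idea behind steps S301' to S305' can be illustrated with a deliberately simplified sketch. It does not implement the reproducing kernel Hilbert space embedding; instead it substitutes a plainer technique, treating each attribute as an independent Gaussian, extrapolating the mean and standard deviation linearly over time, and sampling a predicted data set from the extrapolated parameters. Labels for the predicted data (needed for S306') are not covered. All names and values are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def fit_line(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * t + intercept."""
    mt, my = statistics.mean(ts), statistics.mean(ys)
    sxx = sum((t - mt) ** 2 for t in ts)
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sxx
    return slope, my - slope * mt


def predict_params(history: Dict[int, List[List[float]]], year: int) -> List[Tuple[float, float]]:
    """Extrapolate per-attribute Gaussian (mean, std) to the requested year."""
    years = sorted(history)
    n_attr = len(history[years[0]][0])
    params = []
    for i in range(n_attr):
        means = [statistics.mean([row[i] for row in history[y]]) for y in years]
        stds = [statistics.pstdev([row[i] for row in history[y]]) for y in years]
        m_slope, m_icpt = fit_line(years, means)
        s_slope, s_icpt = fit_line(years, stds)
        # A standard deviation cannot be negative, so clamp the extrapolation at zero.
        params.append((m_slope * year + m_icpt, max(0.0, s_slope * year + s_icpt)))
    return params


def sample_dataset(params: List[Tuple[float, float]], size: int, seed: int = 0) -> List[List[float]]:
    """Draw a predicted data set from the extrapolated Gaussian parameters."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(size)]


history = {
    2017: [[0.1], [0.3]],
    2018: [[0.2], [0.4]],
    2019: [[0.3], [0.5]],
}
params_2020 = predict_params(history, 2020)
predicted_2020 = sample_dataset(params_2020, size=4)
```

A fuller treatment would replace the parameter regression with the kernel embedding and vector regression of S302' to S305', which can capture changes in the distribution that a mean and a standard deviation alone cannot.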
Referring again to FIG. 9, after step S300 is executed, three time-labeled future credit scoring models are obtained: the 2020 future credit scoring model, the 2021 future credit scoring model and the 2022 future credit scoring model.
S400. Input the credit reporting data to be evaluated into a selected time-labeled historical credit scoring model or future credit scoring model, to obtain the credit evaluation result, at the time point corresponding to the time label, of the credit reporting subject corresponding to the credit reporting data to be evaluated.
As shown in FIG. 9, the current year is 2020. To evaluate the credit status of a credit reporting subject in 2018, the subject's credit reporting data is input into the 2018 historical credit scoring model, which gives the subject's credit evaluation result for 2018. To evaluate the credit status of the subject in 2022, the subject's credit reporting data is input into the 2022 future credit scoring model, which gives the subject's credit evaluation result for 2022.
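Step S400 amounts to a lookup by time label followed by a model evaluation. The sketch below assumes weighted models stored as weight vectors keyed by time label; the function name `assess_at` and the weights are hypothetical.

```python
from typing import Dict, List, Sequence


def assess_at(models: Dict[str, List[float]], time_label: str, x: Sequence[float]) -> float:
    """Score attribute vector x with the model that carries the given time label."""
    if time_label not in models:
        raise KeyError(f"no credit scoring model for time label {time_label!r}")
    weights = models[time_label]
    return sum(alpha * value for alpha, value in zip(weights, x))


# Historical and future models share one lookup table (illustrative weights).
models = {"2018": [0.5, -1.0], "2022": [0.7, -1.5]}
score_2018 = assess_at(models, "2018", [2.0, 0.5])
score_2022 = assess_at(models, "2022", [2.0, 0.5])
```

Keeping historical and future models in one table means the caller does not need to know which kind of model serves a given time point.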
S500. Interpret the credit evaluation result.
When the credit scoring model is a weighted scoring model, such as a logistic regression model,
S500 is implemented as follows, as shown in FIG. 3:
S501. Obtain the weight of each attribute from the credit evaluation result and calculate the total weight.
S502. Calculate the share of each attribute's weight in the total weight.
S503. Rank the attributes by importance according to their weight share.
S504. Divide the ranked attributes evenly into several intervals.
For a weighted scoring model, a higher weight means that the corresponding attribute has a greater influence on the evaluation result, that is, a greater influence on the creditworthiness of the credit reporting subject. The credit reporting subject can therefore focus on the attributes in the top-ranked interval and improve the credit score by improving the values of those attributes.
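Steps S501 to S504 can be sketched as follows. The sketch assumes that weight shares are computed from absolute values, since logistic regression weights may be negative; the source does not state this. The function name and attribute names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_by_weight(weights: Dict[str, float], n_intervals: int) -> Tuple[List[List[str]], Dict[str, float]]:
    """Return attributes split into ranked intervals, plus each attribute's weight share."""
    total = sum(abs(w) for w in weights.values())                   # S501: total weight
    share = {name: abs(w) / total for name, w in weights.items()}   # S502: weight share
    ordered = sorted(share, key=share.get, reverse=True)            # S503: importance ranking
    size = math.ceil(len(ordered) / n_intervals)                    # S504: even split
    intervals = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return intervals, share


attribute_weights = {"income": 0.4, "debt_ratio": -0.3, "late_payments": 0.2, "account_age": 0.1}
intervals, share = rank_by_weight(attribute_weights, n_intervals=2)
```

The first interval then holds the attributes that the credit reporting subject would look at first.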
由于某些属性的权重可能会出现极端情况,从而导致属性的排序结果并不能客观、真实地反映属性的重要性。Due to the extreme weights of some attributes, the ranking results of attributes cannot objectively and truly reflect the importance of attributes.
鉴于此,可选的,还可以考虑各属性的历史得分情况。In view of this, optionally, the historical score of each attribute may also be considered.
As shown in FIG. 3, S500 optionally further includes:
S505. Compile the score distribution of each attribute from the historical credit data.
Specifically, for a given attribute, the proportion of credit subjects in each score segment can be counted. These proportions are accumulated from the lowest score segment to the highest, so the value for a segment is the proportion of subjects whose score falls in that segment or a lower one.
S506. Determine the score ratio of each attribute.
Specifically, for a given attribute, the score segment containing the credit subject's score is identified, and the cumulative proportion of that segment gives the subject's score ratio for the attribute.
S507. Reorder the attributes within each interval based on their score ratios.
Attributes in the same interval are sorted again by score ratio: the lower the score ratio, the higher the rank. The final ranking is presented to the user together with the weight ratios and score ratios.
The credit subject can combine the weight ratio and the score ratio to weigh how much each attribute matters to the evaluation result, and choose which attributes to improve in order to raise their creditworthiness.
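Steps S505 to S507 can be illustrated with a minimal sketch. It assumes that the score ratio of an attribute is the share of historical subjects whose score for that attribute is at or below the subject's own score, and that the attributes have already been grouped into intervals by weight ratio. The attribute names, scores and the helpers `cumulative_share` and `rerank_within_intervals` are hypothetical and are not taken from the application.

```python
from bisect import bisect_right

def cumulative_share(history, score):
    """Share of historical subjects whose score for one attribute is <= score."""
    ordered = sorted(history)
    return bisect_right(ordered, score) / len(ordered)

def rerank_within_intervals(intervals, shares):
    """Inside each interval, place attributes with the lowest score ratio first."""
    return [sorted(group, key=lambda name: shares[name]) for group in intervals]

# Hypothetical input: intervals already ordered by weight ratio (S501-S504).
intervals = [["income", "debt_ratio"], ["age", "tenure"]]
history = {
    "income": [40, 55, 60, 70, 90],
    "debt_ratio": [20, 30, 50, 80, 85],
    "age": [10, 20, 30, 40, 50],
    "tenure": [5, 15, 25, 35, 45],
}
subject = {"income": 70, "debt_ratio": 30, "age": 50, "tenure": 15}

# S505/S506: score ratio of the subject for each attribute.
shares = {name: cumulative_share(history[name], subject[name]) for name in subject}
# S507: reorder attributes inside each interval by score ratio.
ranking = rerank_within_intervals(intervals, shares)
print(shares)
print(ranking)
```

In this sketch the order of the intervals themselves is left untouched, so the weight ratio still decides which group of attributes comes first and the score ratio only breaks ties within a group.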
The credit scoring model may instead be an unweighted scoring model, such as a gradient boosting decision tree (GBDT) or a deep neural network.
Because such a model exposes no attribute weights, explaining the evaluation result requires training a local weighted scoring model from the credit scoring model and the attribute data of the credit subject. Specifically, the local weighted scoring model can be obtained with Local Interpretable Model-agnostic Explanations (LIME), which can in principle explain the evaluation result of any unweighted scoring model.
As shown in FIG. 5 and FIG. 11, explaining the credit evaluation result with LIME specifically includes:
S501’. Perturb the attributes of the credit data to be evaluated to obtain an approximate sample set composed of several samples close to the credit data to be evaluated.
The credit data of the embodiment in FIG. 8 is again taken as an example. As shown in FIG. 12, perturbing the attribute values of the credit data (income and debt ratio in the figure) yields several samples close to the credit data to be evaluated, which together form the approximate sample set.
S502’. Input the approximate sample set into the historical or future credit scoring model that produced the evaluation result, to obtain a sample evaluation result set.
S503’. Train a local weighted scoring model on the approximate sample set and the sample evaluation result set.
S504’. Explain the evaluation result based on the attribute weights of the local weighted scoring model.
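The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: `black_box` is a hypothetical stand-in for the unweighted scoring model, the perturbation is Gaussian noise, the local model is a linear model fitted by weighted least squares, and the exponential proximity kernel (with the parameters `scale` and `width`) follows common LIME practice rather than anything stated in the text.

```python
import math
import random

def black_box(x):
    """Hypothetical stand-in for an unweighted (nonlinear) scoring model."""
    income, debt_ratio = x
    return 1.0 / (1.0 + math.exp(-(0.8 * income - 1.5 * debt_ratio ** 2)))

def solve(a, b):
    """Solve the small linear system a @ w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def explain_locally(model, x0, n_samples=500, scale=0.3, width=0.5, seed=0):
    rng = random.Random(seed)
    # S501': perturb the attributes of x0 to build the approximate sample set.
    samples = [[v + rng.gauss(0.0, scale) for v in x0] for _ in range(n_samples)]
    # S502': score every sample with the model being explained.
    scores = [model(s) for s in samples]
    # Assumed proximity kernel: samples closer to x0 count more in the fit.
    prox = [math.exp(-sum((a - b) ** 2 for a, b in zip(s, x0)) / width ** 2)
            for s in samples]
    # S503': weighted least squares on [1, x - x0] via the normal equations.
    rows = [[1.0] + [a - b for a, b in zip(s, x0)] for s in samples]
    d = len(rows[0])
    xtx = [[sum(p * r[i] * r[j] for p, r in zip(prox, rows)) for j in range(d)]
           for i in range(d)]
    xty = [sum(p * r[i] * y for p, r, y in zip(prox, rows, scores))
           for i in range(d)]
    coef = solve(xtx, xty)
    # S504': the slopes serve as local attribute weights.
    return coef[1:]

weights = explain_locally(black_box, [1.0, 0.5])
print(dict(zip(["income", "debt_ratio"], weights)))
```

For this stand-in model the local weight of income comes out positive and that of the debt ratio negative, which matches the direction in which each attribute moves the score near the evaluated point. The resulting weights can then be ranked in the same way as the weights of a weighted scoring model.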
It can be seen that the personal credit evaluation method of the present invention builds a series of credit scoring models over time, for multiple historical time points and multiple future time points. Selecting the appropriate credit scoring model makes it possible to evaluate the credit status of a credit subject at a specific point in time, which significantly improves the evaluation performance of the model while keeping the evaluation results interpretable.
Device embodiment
As shown in FIG. 6, the personal credit evaluation device based on time series attribution analysis in this embodiment includes a model initialization module 10, a historical credit scoring model acquisition module 20, a future credit scoring model acquisition module 30, a credit evaluation module 40 and an interpretation module 50, wherein:
The model initialization module 10 is configured to construct a credit scoring model and initialize model parameters, the credit scoring model being a weighted scoring model or an unweighted scoring model.
The historical credit scoring model acquisition module 20 is configured to train the credit scoring model separately on several groups of time-labeled historical credit data sets to obtain several trained, time-labeled historical credit scoring models, wherein each historical credit data set includes multiple pieces of historical credit data, historical credit data in the same group have the same time label, historical credit data in different groups have different time labels, and the time label indicates when the historical credit data to which it belongs was generated.
The future credit scoring model acquisition module 30 is configured to predict, according to the category of the credit scoring model, several time-labeled future credit scoring models based on the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit data sets, wherein the time labels of the future credit scoring models are all different.
The credit evaluation module 40 is configured to input credit data to be evaluated into a selected time-labeled historical credit scoring model or future credit scoring model, to obtain a credit evaluation result, at the time point corresponding to the time label, for the credit subject to which the credit data to be evaluated corresponds.
The interpretation module 50 is configured to interpret the credit evaluation result.
Because the processing performed by each functional module of the personal credit evaluation device in this embodiment is the same as that of the personal credit evaluation method in Embodiment 1, it is not described again here; refer to the description of Embodiment 1.
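As a rough illustration of how the credit evaluation module 40 might pick a model, the sketch below keeps the historical and predicted models in one dictionary keyed by time label and selects the model whose label is closest to the requested date. The nearest-label rule, the dictionary layout, the weighted-sum scoring and all names and numbers are assumptions made for this example; the text only states that a model with a time label is selected.

```python
from datetime import date

def select_model(models, when):
    """Return (label, model) for the time label closest to `when` (assumed rule)."""
    label = min(models, key=lambda t: abs((t - when).days))
    return label, models[label]

def evaluate(weights, record):
    """Placeholder weighted scoring rule: weighted sum of attribute values."""
    return sum(weights[name] * record[name] for name in weights)

# Hypothetical weighted models (attribute -> weight), keyed by time label.
models = {
    date(2019, 1, 1): {"income": 0.50, "debt_ratio": -0.30},  # historical
    date(2020, 1, 1): {"income": 0.45, "debt_ratio": -0.35},  # historical
    date(2021, 1, 1): {"income": 0.40, "debt_ratio": -0.40},  # predicted
}

record = {"income": 2.0, "debt_ratio": 1.0}
label, model = select_model(models, date(2020, 11, 15))
score = evaluate(model, record)
print(label, score)
```

Keying both kinds of model by the same time label lets one lookup serve evaluations for past and future time points alike.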
Electronic device embodiment
FIG. 7 is a schematic structural diagram of the electronic device provided by an embodiment of the present application. As shown in FIG. 7, the electronic device 60 includes a processor 61 and a memory 63, which are connected, for example through a bus 62.
The processor 61 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules and circuits described in connection with the disclosure of this application. The processor 61 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 62 may include a path that carries information between the above components. The bus 62 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in the figure, which does not mean that there is only one bus or one type of bus.
The memory 63 may be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage, a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these.
The memory 63 stores the application program code of the solution of the present application, and its execution is controlled by the processor 61. The processor 61 executes the application program code stored in the memory 63 to implement the personal credit evaluation method of Embodiment 1.
Finally, an embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the personal credit evaluation method of Embodiment 1.
The present invention has been described above in sufficient detail and with a certain degree of particularity. Those of ordinary skill in the art should understand that the descriptions in the embodiments are merely exemplary, and that all changes made without departing from the true spirit and scope of the present invention fall within its scope of protection. The scope of protection claimed is defined by the claims, not by the above description of the embodiments.

Claims (10)

  1. A personal credit evaluation and interpretation method based on time series attribution analysis, characterized in that it comprises:
    constructing a credit scoring model and initializing model parameters, the credit scoring model being a weighted scoring model or an unweighted scoring model;
    training the credit scoring model separately on several groups of time-labeled historical credit data sets to obtain several trained, time-labeled historical credit scoring models, wherein each historical credit data set comprises multiple pieces of historical credit data, historical credit data in the same group have the same time label, historical credit data in different groups have different time labels, and the time label indicates when the historical credit data to which it belongs was generated;
    according to the category of the credit scoring model, predicting several time-labeled future credit scoring models based on the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit data sets, wherein the time labels of the future credit scoring models are all different;
    inputting credit data to be evaluated into a selected time-labeled historical credit scoring model or future credit scoring model to obtain a credit evaluation result, at the time point corresponding to the time label, for the credit subject to which the credit data to be evaluated corresponds;
    interpreting the credit evaluation result.
  2. The personal credit evaluation and interpretation method according to claim 1, characterized in that:
    when the credit scoring model is a weighted scoring model, predicting several time-labeled future credit scoring models based on the several time-labeled historical credit scoring models comprises:
    grouping the attribute weights of the historical credit scoring models by attribute to obtain several time-labeled attribute weight sets;
    using each attribute weight set as a training data set, training several linear regression models in one-to-one correspondence with the attribute weight sets;
    using the trained linear regression models to predict, for each attribute, time-labeled attribute weights at several future time points;
    constructing the several time-labeled future credit scoring models from the predicted time-labeled attribute weights of each attribute at the several future time points.
  3. The personal credit evaluation and interpretation method according to claim 2, characterized in that interpreting the evaluation result comprises:
    obtaining the weight of each attribute from the credit evaluation result and calculating the total weight;
    calculating the weight ratio of each attribute's weight to the total weight;
    ranking the attributes by importance according to their weight ratios;
    dividing the ranked attributes evenly into several intervals.
  4. The personal credit evaluation and interpretation method according to claim 3, characterized in that interpreting the evaluation result further comprises:
    compiling the score distribution of each attribute from the historical credit data;
    determining the score ratio of each attribute;
    reordering the attributes within each interval based on the score ratio of each attribute.
  5. The personal credit evaluation and interpretation method according to claim 1, characterized in that:
    when the credit scoring model is an unweighted scoring model, predicting several time-labeled future credit scoring models based on the several groups of time-labeled historical credit data sets comprises:
    obtaining the probability distribution of each historical credit data set and the parameter values of the probability distribution, the probability distribution being a Gaussian distribution;
    using a kernel function operation to transform the probability distribution of each historical credit data set into a reproducing kernel Hilbert space, obtaining several time-labeled historical vectors in one-to-one correspondence with the historical credit data sets;
    using the several historical vectors as a training data set, training a vector regression model;
    using the trained vector regression model to predict several time-labeled prediction vectors;
    using a kernel function operation to inversely transform the several prediction vectors into the probability distribution space, thereby obtaining several groups of time-labeled predicted credit data sets;
    training the credit scoring model separately on the several groups of time-labeled predicted credit data sets to obtain several trained, time-labeled future credit scoring models.
  6. The personal credit evaluation and interpretation method according to claim 5, characterized in that the evaluation result is interpreted using a local interpretable model-agnostic explanation method, comprising:
    perturbing the attributes of the credit data to be evaluated to obtain an approximate sample set composed of several samples close to the credit data to be evaluated;
    inputting the approximate sample set into the historical credit scoring model or future credit scoring model that produced the evaluation result, to obtain a sample evaluation result set;
    training a local weighted scoring model on the approximate sample set and the sample evaluation result set;
    interpreting the evaluation result based on the attribute weights of the local weighted scoring model.
  7. The personal credit evaluation and interpretation method according to claim 1, characterized in that:
    the weighted scoring model includes a logistic regression model;
    the unweighted scoring model includes a gradient boosting decision tree model and a neural network model.
  8. A personal credit evaluation and interpretation device based on time series attribution analysis, characterized in that it comprises:
    a model initialization module, configured to construct a credit scoring model and initialize model parameters, the credit scoring model being a weighted scoring model or an unweighted scoring model;
    a historical credit scoring model acquisition module, configured to train the credit scoring model separately on several groups of time-labeled historical credit data sets to obtain several trained, time-labeled historical credit scoring models, wherein each historical credit data set comprises multiple pieces of historical credit data, historical credit data in the same group have the same time label, historical credit data in different groups have different time labels, and the time label indicates when the historical credit data to which it belongs was generated;
    a future credit scoring model acquisition module, configured to predict, according to the category of the credit scoring model, several time-labeled future credit scoring models based on the several time-labeled historical credit scoring models or the several groups of time-labeled historical credit data sets, wherein the time labels of the future credit scoring models are all different;
    a credit evaluation module, configured to input credit data to be evaluated into a selected time-labeled historical credit scoring model or future credit scoring model to obtain a credit evaluation result, at the time point corresponding to the time label, for the credit subject to which the credit data to be evaluated corresponds;
    an interpretation module, configured to interpret the credit evaluation result.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the personal credit evaluation and interpretation method based on time series attribution analysis according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the program, when executed by a processor, implements the personal credit evaluation and interpretation method based on time series attribution analysis according to any one of claims 1 to 7.
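The weight extrapolation recited in claim 2 can be sketched as follows: one ordinary least-squares line is fitted per attribute over the time-labeled historical weights and then evaluated at future time labels. The yearly labels, the attribute names, the weights and the helper names `fit_line` and `predict_future_weights` are hypothetical and serve only to illustrate the idea.

```python
def fit_line(ts, ys):
    """Ordinary least squares for y = a * t + b."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    a = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    return a, my - a * mt

def predict_future_weights(historical, future_times):
    """Map {time_label: {attribute: weight}} to the same shape for future_times."""
    times = sorted(historical)
    attributes = list(historical[times[0]])
    # One time-labeled weight series, and one regression line, per attribute.
    lines = {k: fit_line(times, [historical[t][k] for t in times])
             for k in attributes}
    return {t: {k: a * t + b for k, (a, b) in lines.items()}
            for t in future_times}

# Hypothetical attribute weights of three historical models, labeled by year.
historical = {
    2018: {"income": 0.50, "debt_ratio": -0.30},
    2019: {"income": 0.45, "debt_ratio": -0.35},
    2020: {"income": 0.40, "debt_ratio": -0.40},
}
future = predict_future_weights(historical, [2021, 2022])
print(future)
```

Each entry of `future` holds the predicted weights for one future time label and can be used as the parameters of a future weighted scoring model.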
PCT/CN2020/135274 2020-09-28 2020-12-10 Individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and device and storage medium WO2022062193A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011039030.5 2020-09-28
CN202011039030.5A CN112215696A (en) 2020-09-28 2020-09-28 Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis

Publications (1)

Publication Number Publication Date
WO2022062193A1 true WO2022062193A1 (en) 2022-03-31

Family

ID=74051781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135274 WO2022062193A1 (en) 2020-09-28 2020-12-10 Individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and device and storage medium

Country Status (2)

Country Link
CN (1) CN112215696A (en)
WO (1) WO2022062193A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907144A (en) * 2022-11-21 2023-04-04 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Event prediction method and device, terminal equipment and storage medium
CN116416056A (en) * 2023-04-04 2023-07-11 深圳征信服务有限公司 Credit data processing method and system based on machine learning
CN116596284A (en) * 2023-07-18 2023-08-15 益企商旅(山东)科技服务有限公司 Travel decision management method and system based on customer requirements
CN117389179A (en) * 2023-10-17 2024-01-12 司空定制家居科技有限公司 Remote intelligent centralized control method and system for inspection equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862593B (en) * 2021-01-28 2024-05-03 深圳前海微众银行股份有限公司 Credit scoring card model training method, device and system and computer storage medium
CN117934159A (en) * 2024-03-21 2024-04-26 北京信立合创信息技术有限公司 Personal credit report query monitoring and early warning method based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190171428A1 (en) * 2017-12-04 2019-06-06 Banjo, Inc. Automated model management methods
CN109934371A (en) * 2017-12-18 2019-06-25 普华讯光(北京)科技有限公司 The method that solvency risk identification and prediction are carried out to enterprise based on electricity consumption data
CN110634060A (en) * 2018-06-21 2019-12-31 马上消费金融股份有限公司 User credit risk assessment method, system, device and storage medium
CN110866819A (en) * 2019-10-18 2020-03-06 华融融通(北京)科技有限公司 Automatic credit scoring card generation method based on meta-learning
CN111222982A (en) * 2020-01-16 2020-06-02 随手(北京)信息技术有限公司 Internet credit overdue prediction method, device, server and storage medium
CN111325344A (en) * 2020-02-24 2020-06-23 支付宝(杭州)信息技术有限公司 Method and apparatus for evaluating model interpretation tools

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140365356A1 (en) * 2013-06-11 2014-12-11 Fair Isaac Corporation Future Credit Score Projection
US20180349790A1 (en) * 2017-05-31 2018-12-06 Microsoft Technology Licensing, Llc Time-Based Features and Moving Windows Sampling For Machine Learning
CN109978682A (en) * 2019-03-28 2019-07-05 上海拍拍贷金融信息服务有限公司 Credit-graded approach, device and computer storage medium
CN111260249B (en) * 2020-02-13 2022-08-05 武汉大学 Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model
CN111652279B (en) * 2020-04-30 2024-04-30 中国平安财产保险股份有限公司 Behavior evaluation method and device based on time sequence data and readable storage medium
CN111598329A (en) * 2020-05-13 2020-08-28 中国科学院计算机网络信息中心 Time sequence data prediction method based on automatic parameter adjustment recurrent neural network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907144A (en) * 2022-11-21 2023-04-04 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Event prediction method and device, terminal equipment and storage medium
CN116416056A (en) * 2023-04-04 2023-07-11 深圳征信服务有限公司 Credit data processing method and system based on machine learning
CN116416056B (en) * 2023-04-04 2023-10-03 深圳征信服务有限公司 Credit data processing method and system based on machine learning
CN116596284A (en) * 2023-07-18 2023-08-15 益企商旅(山东)科技服务有限公司 Travel decision management method and system based on customer requirements
CN116596284B (en) * 2023-07-18 2023-09-26 益企商旅(山东)科技服务有限公司 Travel decision management method and system based on customer requirements
CN117389179A (en) * 2023-10-17 2024-01-12 司空定制家居科技有限公司 Remote intelligent centralized control method and system for inspection equipment
CN117389179B (en) * 2023-10-17 2024-05-03 北京思木企业管理咨询中心(有限合伙) Remote intelligent centralized control method and system for inspection equipment

Also Published As

Publication number Publication date
CN112215696A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
WO2022062193A1 (en) Individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and device and storage medium
Qu et al. Review of bankruptcy prediction using machine learning and deep learning techniques
Patil et al. Prediction system for student performance using data mining classification
Jain et al. An efficient approach for multiclass student performance prediction based upon machine learning
El Fouki et al. Multidimensional Approach Based on Deep Learning to Improve the Prediction Performance of DNN Models.
Liang et al. A double channel CNN-LSTM model for text classification
Es–SABERY et al. An improved ID3 classification algorithm based on correlation function and weighted attribute
Lenin et al. Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms.
CN116306785A (en) Student performance prediction method of convolution long-short term network based on attention mechanism
Suresh et al. Predicting the e-learners learning style by using support vector regression technique
Gillmann et al. Quantification of Economic Uncertainty: a deep learning approach
CN114529063A (en) Financial field data prediction method, device and medium based on machine learning
Ajay et al. Prediction of student performance using random forest classification technique
Brandsætera et al. Explainable artificial intelligence: How subsets of the training data affect a prediction
Las Johansen et al. Predicting academic performance of information technology students using c4. 5 classification algorithm: a model development
Akhil Bank Loan Personal Modelling Using Classification Algorithms of Machine Learning
Karaçor et al. Exploiting visual features in financial time series prediction
Rüdian et al. Comparison and Prospect of Two Heaven Approaches: SVM and ANN for Identifying Students' Learning Performance
Verma et al. NATION-WISE AFFILIATION PREDICTION FOR THE REAL-TIME.
Anwar et al. Predicting student graduation using artificial neural network: A preliminary study of diploma in accountancy program at uitm sabah
CN110580261B (en) Deep technology tracking method for high-tech company
AU2021104628A4 (en) A novel machine learning technique for classification using deviation parameters
Ayad et al. A Proposed Model for Loan Approval Prediction Using Explainable Artificial Intelligence
Krishna et al. Student’s Performance Prediction
Wang Future Job Stability Prediction Based on Multiple Machine Learning Algorithms

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20955035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18/08/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20955035

Country of ref document: EP

Kind code of ref document: A1