CN112215696A - Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis - Google Patents

Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis


Publication number
CN112215696A
Authority
CN
China
Prior art keywords
credit
historical
time
model
scoring model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011039030.5A
Other languages
Chinese (zh)
Inventor
孙圣力
程人
李青山
司华友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boya Chain Beijing Technology Co ltd
Nanjing Boya Blockchain Research Institute Co ltd
Peking University
Original Assignee
Boya Chain Beijing Technology Co ltd
Nanjing Boya Blockchain Research Institute Co ltd
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Boya Chain Beijing Technology Co ltd, Nanjing Boya Blockchain Research Institute Co ltd, Peking University
Priority to CN202011039030.5A (CN112215696A)
Priority to PCT/CN2020/135274 (WO2022062193A1)
Publication of CN112215696A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention provides a personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis. The method comprises the following steps: constructing a credit scoring model; training the credit scoring model separately on a plurality of groups of time-labelled historical credit investigation data sets to obtain a plurality of historical credit scoring models; according to the category of the credit scoring model, predicting a plurality of future credit scoring models based on either the time-labelled historical credit scoring models or the time-labelled historical credit investigation data sets; inputting the credit investigation data to be evaluated into a selected historical or future credit scoring model to obtain a credit evaluation result for the corresponding credit investigation subject; and interpreting the credit evaluation result. The invention constructs a series of credit scoring models for a plurality of historical time points and a plurality of future time points, so that the credit of a credit investigation subject at a specific time point can be evaluated by selecting the appropriate model and the evaluation result can be given an interpretation with reference value, thereby guiding the improvement of the personal credit score.

Description

Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis
Technical Field
The invention relates to the field of financial credit investigation, in particular to a personal credit assessment and interpretation method, a personal credit assessment and interpretation device, personal credit assessment and interpretation equipment and a storage medium based on time sequence attribution analysis.
Background
The rapid development of information technologies such as big data and cloud computing provides massive data and advanced techniques for credit investigation business in the financial industry, and personal credit investigation based on internet big data has huge development potential.
Existing personal credit scoring systems can already evaluate personal credit using powerful machine learning classification models. However, these systems generally share one shortcoming: the evaluation results cannot be given a valuable logical interpretation, so the credit investigation subject cannot be given effective advice on how to improve the credit score.
Attribution analysis is an effective way to mine and identify the causes of credit events in the credit finance field, and it has been applied to personal credit assessment in an attempt to give valuable logical interpretations of the assessment results.
The main problem at present, however, is that the core scoring model of a scoring system is trained on historical credit data from one fixed time period. The attributes of a credit investigation subject may change over time, and new attributes may even appear, both of which strongly affect the evaluation result. Such a scoring model may therefore fail to produce objective and effective assessment results, let alone support a valuable logical interpretation of them.
Disclosure of Invention
In order to solve at least one of the above technical problems, a first aspect of the present invention provides a personal credit assessment and interpretation method based on time series attribution analysis, which has the following specific technical solutions:
a method for personal credit assessment and interpretation based on temporal attribution analysis, comprising:
establishing a credit scoring model and initializing model parameters, wherein the credit scoring model is a weighted scoring model or a non-weighted scoring model;
respectively training the credit scoring model by using a plurality of groups of historical credit assessment data sets with time labels to obtain a plurality of trained historical credit scoring models with time labels, wherein: each historical credit investigation data set comprises a plurality of pieces of historical credit investigation data, the historical credit investigation data in the same group have the same time label, the historical credit investigation data in different groups have different time labels, and the time labels represent the data generation time of the historical credit investigation data to which the time labels belong;
according to the category of the credit scoring model, predicting and acquiring a plurality of future credit scoring models with time labels based on the plurality of historical credit scoring models with time labels or the plurality of groups of historical credit investigation data sets with time labels, wherein the time labels of the future credit scoring models are different;
inputting credit investigation data to be evaluated into a selected historical credit scoring model or a selected future credit scoring model with a time tag so as to obtain a credit investigation evaluation result of a credit investigation subject corresponding to the credit investigation data to be evaluated at a time point corresponding to the time tag;
and interpreting the credit assessment result.
A second aspect of the present invention provides a personal credit assessment and interpretation device based on time sequence attribution analysis, which comprises:
the model initialization module is used for constructing a credit scoring model and initializing model parameters, wherein the credit scoring model is a weighted scoring model or a non-weighted scoring model;
the historical credit scoring model acquisition module is used for respectively training the credit scoring models by utilizing a plurality of groups of historical credit investigation data sets with time labels to acquire a plurality of trained historical credit scoring models with time labels, wherein: each historical credit investigation data set comprises a plurality of pieces of historical credit investigation data, the historical credit investigation data in the same group have the same time label, the historical credit investigation data in different groups have different time labels, and the time labels represent the data generation time of the historical credit investigation data to which the time labels belong;
the future credit scoring model acquisition module is used for predicting and acquiring a plurality of future credit scoring models with time labels based on the plurality of historical credit scoring models with time labels or the plurality of groups of historical credit investigation data sets with time labels according to the category of the credit scoring model, wherein the time labels of the future credit scoring models are different;
the credit evaluation module is used for inputting credit assessment data to be evaluated into a selected historical credit scoring model or a selected future credit scoring model with a time tag so as to obtain a credit assessment result of a credit assessment main body corresponding to the credit assessment data to be evaluated at a time point corresponding to the time tag;
and the interpretation module is used for interpreting the credit investigation result.
A third aspect of the present invention provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for personal credit assessment and interpretation based on time series attribution analysis according to the first aspect of the present invention.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for personal credit assessment and interpretation based on time series attribution analysis according to the first aspect of the present invention.
Compared with the credit scoring model in the prior art, the method has the following remarkable advantages:
A series of credit scoring models over time is constructed for a plurality of historical time points and a plurality of future time points. The credit condition of a credit investigation subject at a given time point can be evaluated by selecting the appropriate credit scoring model, so that the evaluation effect is markedly improved, the interpretability of the evaluation result is ensured, and a valuable reference is provided for improving the personal credit score.
Drawings
FIG. 1 is a flow chart of a method for personal credit assessment and interpretation based on time series attribution analysis according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for personal credit assessment and interpretation based on time series attribution analysis according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for personal credit assessment and interpretation based on time-series attribution analysis according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for personal credit assessment and interpretation based on time-series attribution analysis according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for personal credit assessment and interpretation based on time-series attribution analysis according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a personal credit evaluation and interpretation apparatus based on time-series attribution analysis according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the invention;
FIG. 8 is a data structure diagram of credit investigation data of a credit investigation subject in the embodiment of the invention;
FIG. 9 is a logic diagram illustrating the acquisition of historical scoring models and future scoring models in accordance with an embodiment of the present invention;
FIG. 10 is a schematic diagram of a linear regression model with the attribute "liability ratio" in an embodiment of the present invention;
FIG. 11 is a flowchart of an evaluation result interpretation process in the embodiment of the present invention;
FIG. 12 is a logic diagram of a method for obtaining a plurality of approximate sample data by perturbing attribute values of each attribute of credit investigation data according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Although the present invention provides the method operation steps or apparatus structures as shown in the following embodiments or figures, more or less operation steps or module units may be included in the method or apparatus based on conventional or non-inventive labor. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure shown in the embodiment or the drawings of the present invention. The described methods or modular structures, when applied in an actual device or end product, may be executed sequentially or in parallel according to embodiments or the methods or modular structures shown in the figures.
The core scoring model of the existing credit scoring system is trained based on historical credit data of a certain fixed time period. However, the attribute of the credit investigation subject may change with the passage of time, and even some new attributes may appear, which have a great influence on the evaluation result. It is even possible that objective and effective assessment results cannot be obtained using existing scoring models, let alone that valuable logical interpretations can be made on the results of the assessment.
In view of the above-mentioned drawbacks of the existing credit assessment model, the present invention provides a personal credit assessment and interpretation method, device, equipment and storage medium based on time-series attribution analysis, which constructs a series of credit assessment models over time for a plurality of historical time points and a plurality of future time points. The credit condition of the credit investigation subject at a certain past, present or future specific time point can be evaluated by selecting a proper credit scoring model, so that the evaluation effect is remarkably improved, and the interpretability of the evaluation result is ensured.
Before the embodiments of the present invention are described, the following terms are defined:
Credit investigation data: data relating to the credit of a credit investigation subject. Credit investigation data comprises a number of attributes (also referred to as features), each of which has an attribute value. The attribute values of a piece of credit investigation data may be acquired from one data source or from several. For example, FIG. 8 shows the data format of the credit investigation data of a credit investigation subject in one embodiment: the data includes 12 attributes collected from three data sources, and each attribute has a corresponding attribute value. The three data sources are financial institutions (such as banks), consumption platforms (such as Taobao and Meituan) and social networks. The first two data sources directly reflect the credit behavior of the subject, and introducing the social network can improve the scoring accuracy of the scoring model to a certain extent.
Of course, before model training or credit assessment, the credit investigation data needs to be preprocessed, for example converted into vector form.
Time label: the time label of a piece of credit investigation data indicates when that data was generated. The span of a time label may be a year, a quarter, a month, and so on. In the embodiment shown in FIG. 8 the span is a year, and the time label of the credit investigation data shown there is "2018", which means that all attribute data of that credit investigation data was generated in 2018.
Weighted scoring model and unweighted scoring model:
A scoring model may be expressed as $\mathrm{Score} = F(x)$, where $x$ is the attribute vector of the credit investigation data of a credit investigation subject, $F$ is the selected scoring model, and $\mathrm{Score}$ is the credit score finally produced by the scoring model.
If the scoring model $F$ can be expressed in the following form:
$F(x) = \alpha_1 x_1 + \dots + \alpha_n x_n$
where $x_i$ is the attribute value of the $i$-th attribute of the credit investigation subject, $\alpha_i$ is the weight corresponding to the $i$-th attribute, and $n$ is the number of attributes of the credit investigation subject,
then the scoring model $F$ is defined as a weighted scoring model.
Otherwise, the scoring model $F$ is defined as a weightless scoring model.
A logistic regression model may be chosen as the weighted scoring model. Logistic regression is one of the simplest classification algorithms and has long been a mainstream classification algorithm in industry; it is simple, stable, highly interpretable, and easy to inspect and deploy.
Algorithm models such as a gradient boosting decision tree (GBDT) or a deep neural network may be chosen as the weightless scoring model. GBDT is an ensemble algorithm whose base learner is a classification and regression tree; its advantages are an outstanding classification effect and the ability to perform feature selection during training. A deep neural network can be understood as a neural network with multiple hidden layers; with techniques such as activation functions and back propagation, its very large number of parameters can be adjusted in an extremely high-dimensional space, so that complex classification boundaries are fully identified and a good classification effect is achieved.
Method embodiment
As shown in fig. 1, the method for evaluating and interpreting personal credit based on time series attribution analysis according to the embodiment of the present invention comprises the following steps:
S100, constructing a credit scoring model and initializing model parameters, wherein the credit scoring model is a weighted scoring model or a non-weighted scoring model.
S200, respectively training the credit scoring model by using a plurality of groups of historical credit investigation data sets with time labels to obtain a plurality of trained historical credit scoring models with time labels. Wherein: each historical credit investigation data set comprises a plurality of pieces of historical credit investigation data, the historical credit investigation data in the same group have the same time label, the historical credit investigation data in different groups have different time labels, and the time labels represent the data generation time of the historical credit investigation data to which the time labels belong.
The span (granularity) of the time label may be a year, a quarter, a month or even a day. Generally, the shorter the span (the finer the granularity), the more uniform the data distribution within a historical credit investigation data set sharing one time label, and the better the scoring effect of the trained historical credit scoring model.
In practical application, the span of the time tag can be selected according to specific needs. In the embodiment of fig. 9, the time label is year, the current year is 2020, and three sets of historical credit investigation data sets, namely a 2017 credit investigation data set, a 2018 credit investigation data set and a 2019 credit investigation data set, are obtained by collecting and sampling credit investigation data of the last three years. For example, all credit investigation data in the credit investigation data set in 2017 are generated in 2017, and each credit investigation data represents the credit condition of one credit investigation subject in 2017.
The three groups of historical credit investigation data sets are each used as training samples to train the credit scoring model, yielding three historical credit scoring models: a 2017 historical credit scoring model, a 2018 historical credit scoring model and a 2019 historical credit scoring model. In practical applications, credit investigation data from more years can of course be collected and more historical credit scoring models trained.
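As an illustration of step S200, the following is a minimal sketch of training one model per time label. It assumes a logistic regression model fitted by plain batch gradient descent; the record layout `(time_label, attribute_vector, label)`, the function names, the label convention (1 = good credit) and the toy data are all hypothetical and not taken from the embodiment.

```python
import math
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit attribute weights (bias stored last) by batch gradient descent on log-loss."""
    n = len(rows[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(rows, labels):
            # zip stops after the n attribute weights; the bias w[-1] is added separately
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            for i, xi in enumerate(x):
                grad[i] += (p - y) * xi
            grad[-1] += p - y
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def train_historical_models(records):
    """Group records by time label and train one model per group."""
    groups = defaultdict(lambda: ([], []))
    for t, x, y in records:
        groups[t][0].append(x)
        groups[t][1].append(y)
    return {t: train_logistic(xs, ys) for t, (xs, ys) in sorted(groups.items())}

# Hypothetical toy data: (time label, [attribute values], 1 = good credit)
records = [
    (2017, [0.9, 0.1], 0), (2017, [0.2, 0.8], 1),
    (2018, [0.8, 0.3], 0), (2018, [0.1, 0.9], 1),
    (2019, [0.7, 0.2], 0), (2019, [0.3, 0.7], 1),
]
models = train_historical_models(records)
```

Keeping the models in a dictionary keyed by time label makes the later selection of a model for a given time point a simple lookup.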
Specifically, when the credit scoring model is a weighted scoring model, such as a logistic regression model, step S200 is implemented as follows:
The logistic regression model can be solved iteratively by classical gradient descent, and training is very fast. The result of the logistic regression can readily be converted into a standard scorecard, so that a total credit score is obtained and can be split into the credit score corresponding to each attribute. When the model is solved:
(1) the variables are first segmented using a variable binning method;
(2) the binned discrete variables are then encoded as continuous variables using WOE (weight of evidence) encoding;
(3) the model is then trained.
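Steps (1) and (2) can be sketched as follows. This is an illustrative sketch only: it assumes equal-width binning, the convention WOE = ln(share of goods in the bin / share of bads in the bin), and a small smoothing constant for sparse bins. None of these choices is specified in the source, and the attribute values and labels are made up.

```python
import math

def equal_width_bins(values, n_bins):
    """Return the bin index (0..n_bins-1) of each value, using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def woe_table(bins, labels, n_bins, eps=0.5):
    """WOE per bin; labels use 1 = good, 0 = bad (assumed), eps smooths sparse bins."""
    good = [eps] * n_bins
    bad = [eps] * n_bins
    for b, y in zip(bins, labels):
        if y == 1:
            good[b] += 1
        else:
            bad[b] += 1
    g_tot, b_tot = sum(good), sum(bad)
    return [math.log((good[b] / g_tot) / (bad[b] / b_tot)) for b in range(n_bins)]

# Hypothetical values of one attribute and the matching good/bad labels
liability_ratio = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95]
is_good = [1, 1, 1, 1, 0, 1, 0, 0]

bins = equal_width_bins(liability_ratio, 3)   # step (1): binning
woe = woe_table(bins, is_good, 3)             # step (2): WOE value of each bin
encoded = [woe[b] for b in bins]              # continuous input for model training
```

In this toy data the bin with the lowest liability ratios receives the highest WOE, since it contains only good subjects.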
The final result can be expressed as follows:
$F(x) = A - B(\alpha_1 x_1 + \dots + \alpha_n x_n), \quad \alpha_i = \theta_i w_i$
where $A$ and $B$ are constants. It can be seen that the credit score corresponding to each attribute is $-B\theta_i x_i$.
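The scorecard form can be evaluated with a few lines of code. The sketch below computes the total $F(x) = A - B\sum_i \alpha_i x_i$ and, as an assumption of this sketch about how the total is split, treats each term $-B\alpha_i x_i$ of the sum as the contribution of attribute $i$. The constants `A` and `B` and the weights are arbitrary illustrative values.

```python
def scorecard(alpha, x, A=600.0, B=50.0):
    """Return (total, per_attribute) for F(x) = A - B * sum(alpha_i * x_i)."""
    parts = [-B * a * xi for a, xi in zip(alpha, x)]
    return A + sum(parts), parts

# Hypothetical weights and encoded attribute values
total, parts = scorecard(alpha=[0.8, -0.4], x=[0.5, 0.25])
```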
When the credit scoring model is a weightless scoring model, such as a gradient boosting decision tree (GBDT) or a deep neural network, step S200 is implemented as follows:
Using the historical credit investigation data, training proceeds over multiple rounds of iteration. Each round produces a weak classifier, each classifier is trained on the residual of the previous round's classifier, and the weak classifiers obtained in all rounds are finally weighted and summed to obtain the overall classifier.
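The residual-fitting loop described above can be sketched for a single attribute as follows. This is a simplified illustration rather than a full GBDT: it assumes squared-error loss, one-split decision stumps as weak learners, and a fixed learning rate as the weight of each weak learner; the data is hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on the residuals."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    return best[1:]

def boost(xs, ys, rounds=20, lr=0.3):
    """Each round fits a stump to the residual left by the previous rounds."""
    base = sum(ys) / len(ys)
    pred = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        thr, lm, rm = fit_stump(xs, residuals)
        stumps.append((thr, lm, rm))
        pred = [p + lr * (lm if x <= thr else rm) for x, p in zip(xs, pred)]
    return base, stumps, lr

def predict(model, x):
    """Weighted sum of all weak learners on top of the base prediction."""
    base, stumps, lr = model
    return base + sum(lr * (lm if x <= thr else rm) for thr, lm, rm in stumps)

# Hypothetical data: one attribute value per subject, 1 = good credit
xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
ys = [1, 1, 1, 0, 0, 0]
model = boost(xs, ys)
```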
S300, according to the category of the credit scoring model, predicting and acquiring a plurality of future credit scoring models with time labels based on a plurality of historical credit scoring models with time labels or a plurality of groups of historical credit assessment data sets with time labels, wherein the time labels of the future credit scoring models are different.
Specifically, when the credit scoring model is a weighted scoring model, such as a logistic regression model, step S300 is implemented as follows, as shown in FIG. 2:
S301, classifying and summarizing the attribute weights of the historical credit scoring models by attribute to obtain a plurality of attribute weight sets with time labels.
The credit investigation data in the embodiment of FIG. 8 is again taken as an example. Through step S200, m historical credit scoring models are trained; in FIG. 9, three historical models are trained. In practice, more or fewer historical credit scoring models may of course be trained.
The historical credit scoring model with time label $j$ can be expressed as:
$F(x_j) = \alpha_{1j} x_{1j} + \dots + \alpha_{ij} x_{ij} + \dots + \alpha_{nj} x_{nj}$
where $j$ is the time label, $x_{ij}$ is the attribute value of the $i$-th attribute at the time point corresponding to time label $j$, $\alpha_{ij}$ is the weight of the $i$-th attribute at the time point corresponding to time label $j$, and $n$ is the number of attributes.
The attribute weight set corresponding to the $i$-th attribute is then $(\alpha_{i1}, \dots, \alpha_{im})$, where $m$ is the number of historical credit scoring models.
And S302, training by taking each attribute weight set as a training data set to obtain a plurality of linear regression models corresponding to the attribute weight sets one by one.
That is, for the $i$-th attribute, regression analysis is performed on the attribute weight set $(\alpha_{i1}, \dots, \alpha_{im})$ to fit a linear regression model for that attribute. In total, $n$ linear regression models are fitted.
Fig. 10 shows a linear regression model trained with the attribute "liability ratio" as an example.
S303, respectively predicting the attribute weight with the time label of each attribute at a plurality of future time points by using the trained linear regression models.
After the attribute weight regression model corresponding to each attribute is trained, the attribute weight of each attribute at a certain time point in the future can be predicted, and thus the attribute weight with a time label of each attribute at a plurality of time points in the future is obtained.
S304, constructing a plurality of future credit scoring models with time labels based on the predicted attribute weights with time labels of the attributes at a plurality of future time points.
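Steps S301 to S304 can be sketched as follows, assuming ordinary least squares with the time label (here the year) as the only regressor. The weight values and function names are hypothetical.

```python
def fit_line(ts, ws):
    """Ordinary least squares for w = slope * t + intercept."""
    n = len(ts)
    t_mean, w_mean = sum(ts) / n, sum(ws) / n
    slope = (sum((t - t_mean) * (w - w_mean) for t, w in zip(ts, ws))
             / sum((t - t_mean) ** 2 for t in ts))
    return slope, w_mean - slope * t_mean

def predict_future_weights(historical, future_labels):
    """historical: {time_label: [alpha_1j, ..., alpha_nj]}; returns the same shape."""
    ts = sorted(historical)
    n_attrs = len(historical[ts[0]])
    # S301-S302: one weight series and one fitted line per attribute
    lines = [fit_line(ts, [historical[t][i] for t in ts]) for i in range(n_attrs)]
    # S303-S304: extrapolate each weight; the weight vector is the future model
    return {t: [s * t + b for s, b in lines] for t in future_labels}

# Hypothetical weights of two attributes in three historical models
historical = {2017: [0.30, -0.10], 2018: [0.35, -0.15], 2019: [0.40, -0.20]}
future = predict_future_weights(historical, [2020, 2021, 2022])
```

Because a weighted scoring model is fully determined by its weight vector, the predicted weights for each future time label directly define the corresponding future credit scoring model.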
When the credit scoring model is a weightless scoring model, such as a gradient boosting decision tree (GBDT) or a deep neural network:
Since the credit scoring model is a weightless scoring model, in order for the scoring model to evaluate the credit of credit investigation subjects at future time points effectively, the historical credit investigation data can be fully utilized: the trend of the data over time is learned from it, so that a series of credit investigation data corresponding to future time points can be predicted. Specifically, as shown in FIG. 4, step S300 is implemented as follows:
S301', obtaining the probability distribution of each historical credit investigation data set and the parameter values of that distribution, the probability distribution being a Gaussian distribution.
S302', transforming the probability distribution of each historical credit investigation data set into a reproducing kernel Hilbert space by a kernel function operation, to obtain a plurality of time-labelled historical vectors in one-to-one correspondence with the historical credit investigation data sets.
S303', training on the plurality of historical vectors as a training data set to obtain a vector regression model.
S304', predicting a plurality of time-labelled prediction vectors using the trained vector regression model.
S305', inversely transforming the prediction vectors back to the probability distribution space by a kernel function operation, thereby obtaining a plurality of groups of time-labelled predicted credit investigation data sets.
S306', training the credit scoring model separately on the plurality of groups of time-labelled predicted credit investigation data sets to obtain a plurality of trained future credit scoring models with time labels.
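The overall flow of predicting future data sets can be illustrated with a deliberately simplified sketch. Note the substitution: instead of embedding each distribution in a reproducing kernel Hilbert space and regressing on the embedded vectors (S302' to S305'), this sketch regresses directly on the Gaussian parameters (mean and standard deviation) of a single attribute and then samples a predicted data set from the extrapolated Gaussian. It is a stand-in for illustration, not the kernel-based procedure itself; all names and data are hypothetical.

```python
import random
import statistics

def gaussian_params(values):
    """S301' (simplified): mean and standard deviation of one attribute."""
    return statistics.fmean(values), statistics.stdev(values)

def fit_line(ts, ys):
    """Ordinary least squares for y = slope * t + intercept."""
    n = len(ts)
    t_mean, y_mean = sum(ts) / n, sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    return slope, y_mean - slope * t_mean

def predict_datasets(history, future_labels, size=200, seed=0):
    """history: {time_label: [attribute values]}; returns sampled future data sets."""
    rng = random.Random(seed)
    ts = sorted(history)
    params = [gaussian_params(history[t]) for t in ts]
    mu_line = fit_line(ts, [p[0] for p in params])   # trend of the mean
    sd_line = fit_line(ts, [p[1] for p in params])   # trend of the spread
    out = {}
    for t in future_labels:
        mu = mu_line[0] * t + mu_line[1]
        sd = max(sd_line[0] * t + sd_line[1], 1e-6)  # keep the spread positive
        out[t] = [rng.gauss(mu, sd) for _ in range(size)]
    return out

# Hypothetical values of one attribute in three historical data sets
history = {2017: [0.2, 0.3, 0.4], 2018: [0.3, 0.4, 0.5], 2019: [0.4, 0.5, 0.6]}
predicted = predict_datasets(history, [2020, 2021])
```

The sampled data sets would then play the role of the predicted credit investigation data sets on which the future models are trained in S306'.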
With continued reference to fig. 9, after step S300 is performed, three future credit scoring models with time labels are obtained, namely a 2020 future credit scoring model, a 2021 future credit scoring model and a 2022 future credit scoring model.
S400, inputting credit data to be evaluated into a selected historical credit scoring model or a selected future credit scoring model with a time tag so as to obtain a credit evaluation result of a credit main body corresponding to the credit data to be evaluated at a time point corresponding to the time tag.
As shown in FIG. 9, with 2020 as the current time, to evaluate the credit of a credit investigation subject in 2018, the subject's credit investigation data is input into the 2018 historical credit scoring model, which yields the subject's credit evaluation result for 2018. To evaluate the subject's credit in 2022, the credit investigation data is input into the 2022 future credit scoring model, which yields the subject's credit evaluation result for 2022.
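Selecting a model by time label amounts to a lookup. The minimal sketch below assumes weighted scoring models stored as weight lists in a dictionary keyed by time label; the weights and the helper name `evaluate` are hypothetical.

```python
def evaluate(models, time_label, x):
    """Score attribute vector x with the model whose time label matches."""
    if time_label not in models:
        raise KeyError(f"no scoring model for time label {time_label}")
    return sum(a * xi for a, xi in zip(models[time_label], x))

# Hypothetical historical (2018) and future (2022) weight vectors
models = {2018: [0.35, -0.15], 2022: [0.55, -0.35]}
score_2018 = evaluate(models, 2018, [1.0, 2.0])
score_2022 = evaluate(models, 2022, [1.0, 2.0])
```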
S500, interpreting the credit investigation evaluation result.
When the credit scoring model is a weighted scoring model, such as a logistic regression model, S500 is implemented as follows, as shown in FIG. 3:
S501, obtaining the weight of each attribute from the credit investigation evaluation result and calculating the total weight.
S502, calculating the ratio of the weight of each attribute to the total weight.
S503, sorting the attributes by importance according to the weight ratio.
S504, uniformly dividing the sorted attributes into a plurality of intervals.
For a weighted scoring model, a higher weight naturally means that the corresponding attribute has a greater influence on the evaluation result, that is, on the creditworthiness of the credit investigation subject. The credit investigation subject can therefore focus on the attributes in the top interval and raise the credit score by improving the values of those attributes.
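Steps S501 to S504 can be sketched as follows. The sketch uses absolute weights so that negative weights also count toward importance; that choice, the attribute names and the weight values are assumptions made for illustration.

```python
def rank_attributes(weights, n_intervals):
    """weights: {attribute: weight}. Returns (ratios, intervals), most important first."""
    total = sum(abs(w) for w in weights.values())              # S501: total weight
    ratios = {a: abs(w) / total for a, w in weights.items()}   # S502: weight ratio
    ordered = sorted(ratios, key=ratios.get, reverse=True)     # S503: sort by ratio
    size = -(-len(ordered) // n_intervals)                     # ceiling division
    intervals = [ordered[i:i + size]                           # S504: equal intervals
                 for i in range(0, len(ordered), size)]
    return ratios, intervals

# Hypothetical attribute weights
weights = {"liability_ratio": 0.4, "overdue_count": 0.3,
           "income": 0.2, "friends_score": 0.1}
ratios, intervals = rank_attributes(weights, n_intervals=2)
```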
The weight of some attributes may be extreme, so that the ranking result of the attributes cannot objectively and truly reflect the importance of the attributes.
In view of this, optionally, historical scores for each attribute may also be considered.
As shown in fig. 3, optionally, S500 further includes:
S505, counting the score distribution of each attribute from the historical credit investigation data.
Specifically, for a given attribute, the proportion of credit investigation subjects falling in each score segment can be counted. The proportion referred to here is accumulated across score segments from low to high, so that its actual meaning is the share of subjects whose score is greater than or equal to a given score segment.
S506, counting the score proportion of each attribute.
Specifically, for a given attribute, the score segment containing the score of the credit investigation subject is obtained, and the score proportion of that attribute for the subject is then read from the proportion of subjects counted for that score segment.
S507, reordering the attributes within each interval based on the score proportion of the attributes.
The attributes within the same interval are sorted again by score proportion, with a lower score proportion ranked higher. The final ranking is presented to the user together with the weight ratio and the score proportion of each attribute.
The credit investigation subject can weigh the weight ratio and the score proportion together when deciding which attributes to prioritize in order to improve its credit standing.
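Steps S505 to S507 can be sketched as below. The sketch reads the score proportion as the share of historical subjects whose score segment is greater than or equal to the subject's own segment, which is one interpretation of the text. The fixed segment width of 10, the attribute names, and all scores are hypothetical.

```python
from typing import Dict, List


def score_proportion(history: List[float], score: float, width: float = 10.0) -> float:
    """S505-S506: share of historical subjects in the subject's score segment or above."""
    if not history:
        raise ValueError("historical scores must not be empty")
    segment = int(score // width)  # assumption: fixed-width score segments
    at_or_above = sum(1 for s in history if int(s // width) >= segment)
    return at_or_above / len(history)


def reorder_within_intervals(
    intervals: List[List[str]], proportions: Dict[str, float]
) -> List[List[str]]:
    """S507: inside each interval, place attributes with a lower score proportion first."""
    # sorted() is stable, so attributes with equal proportions keep their weight order.
    return [sorted(group, key=lambda name: proportions[name]) for group in intervals]


# Hypothetical historical attribute scores and the subject's own scores.
history = {
    "overdue_count": [35, 52, 58, 71, 90],
    "debt_ratio": [20, 45, 66, 83, 95],
    "income": [30, 40, 50, 60, 70],
    "age": [25, 35, 45, 55, 65],
}
subject = {"overdue_count": 55, "debt_ratio": 85, "income": 42, "age": 61}

proportions = {name: score_proportion(history[name], subject[name]) for name in subject}
# Intervals as produced by the weight-ratio ranking (attribute names only).
intervals = [["overdue_count", "debt_ratio"], ["income", "age"]]
reordered = reorder_within_intervals(intervals, proportions)
```

Attributes never move between intervals in this sketch, so the weight ratio still decides the coarse ordering and the score proportion only refines it.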
The following applies when the credit scoring model is a weightless scoring model, such as a gradient boosting decision tree (GBDT) or a deep neural network.
Because a weightless scoring model exposes no attribute weights, interpreting the evaluation result requires training a local weighted scoring model from the credit scoring model together with the attribute data of the credit investigation subject. Specifically, the local weighted scoring model can be obtained with the Local Interpretable Model-agnostic Explanations (LIME) algorithm, which can in theory explain the evaluation result of any weightless scoring model.
As shown in fig. 5 and fig. 11, interpreting the credit investigation evaluation result with the LIME algorithm specifically includes:
S501', perturbing the attributes of the credit investigation data to be evaluated to obtain an approximate sample set consisting of a plurality of sample data close to the credit investigation data to be evaluated.
The credit data in the embodiment of fig. 8 is still taken as an example. As shown in fig. 12, by perturbing the attribute values (such as income and liability ratio in the figure) of each attribute of the credit data, a plurality of sample data close to the credit data to be evaluated can be obtained, and finally an approximate sample set is formed.
S502', the approximate sample set is input into a historical credit scoring model or a future credit scoring model which generates an evaluation result, and a sample evaluation result set is obtained.
S503', training to obtain a local weighted scoring model based on the approximate sample set and the sample evaluation result set.
S504', interpreting the evaluation result based on the attribute weights of the local weighted scoring model.
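Steps S501' to S504' can be sketched with the standard library alone. This is a simplified LIME-style local surrogate, not the `lime` package and not necessarily the procedure used in the embodiment: it fits a proximity-weighted linear regression by solving the normal equations. The Gaussian perturbation, the exponential proximity kernel, the parameter defaults, and the toy black-box model are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def perturb(x: Sequence[float], n: int, scale: float, rng: random.Random) -> List[List[float]]:
    """S501': draw n samples close to x by adding Gaussian noise to each attribute."""
    return [[v + rng.gauss(0.0, scale) for v in x] for _ in range(n)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def local_weights(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    n: int = 500,
    scale: float = 0.1,
    kernel_width: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Fit a proximity-weighted linear surrogate around x and return its attribute weights."""
    rng = random.Random(seed)
    samples = perturb(x, n, scale, rng)                    # S501'
    scores = [model(s) for s in samples]                   # S502'
    # Proximity kernel: samples nearer to x count more in the fit.
    prox = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(s, x)) / kernel_width ** 2)
        for s in samples
    ]
    rows = [[1.0] + list(s) for s in samples]              # leading 1.0 is the intercept
    d = len(x) + 1
    xtwx = [[sum(p * r[i] * r[j] for p, r in zip(prox, rows)) for j in range(d)] for i in range(d)]
    xtwy = [sum(p * r[i] * y for p, r, y in zip(prox, rows, scores)) for i in range(d)]
    for i in range(d):
        xtwx[i][i] += 1e-9                                 # tiny ridge term for stability
    coef = solve(xtwx, xtwy)                               # S503'
    return coef[1:]                                        # S504': weights without intercept


def black_box(s: Sequence[float]) -> float:
    """Toy weightless model: score rises with income and falls with debt ratio."""
    return 1.0 / (1.0 + math.exp(-(2.0 * s[0] - 3.0 * s[1])))


weights = local_weights(black_box, [0.5, 0.2])  # attributes: [income, debt_ratio]
```

The returned weights describe the model only in the neighbourhood of the evaluated record. They can then be ranked in the same way as the weights of a weighted scoring model.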
Therefore, the personal credit assessment method provided by the invention constructs a series of time-tagged credit scoring models for a plurality of historical time points and a plurality of future time points. By selecting the appropriate credit scoring model, the credit status of a credit investigation subject at a specific time point can be evaluated, which improves the evaluation effect of the evaluation model while ensuring the interpretability of the evaluation result.
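For a weighted scoring model, the claims recite obtaining future models by fitting one linear regression per attribute over the time-tagged historical weights and extrapolating to future time points. A minimal sketch of that idea follows, assuming yearly time tags and ordinary least squares. The function names and example weights are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * t + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two time points are required")
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        raise ValueError("time tags must not all be identical")
    slope = sum((t - mean_t) * (y - mean_y) for t, y in points) / var_t
    return slope, mean_y - slope * mean_t


def predict_future_weights(
    historical_weights: Dict[int, Dict[str, float]], future_years: List[int]
) -> Dict[int, Dict[str, float]]:
    """Fit one line per attribute over time, then extrapolate to each future year."""
    attributes = list(next(iter(historical_weights.values())).keys())
    lines = {
        a: fit_line([(year, w[a]) for year, w in historical_weights.items()])
        for a in attributes
    }
    return {
        year: {a: slope * year + intercept for a, (slope, intercept) in lines.items()}
        for year in future_years
    }


# Hypothetical attribute weights of three historical models.
historical_weights = {
    2017: {"income": 0.50, "debt_ratio": -0.30},
    2018: {"income": 0.55, "debt_ratio": -0.35},
    2019: {"income": 0.60, "debt_ratio": -0.40},
}
future_weights = predict_future_weights(historical_weights, [2021, 2022])
```

Each entry of `future_weights` can then serve as the weight vector of the future credit scoring model with the corresponding time tag.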
Device embodiment
As shown in fig. 6, the personal credit evaluation device based on time series attribution analysis in the present embodiment includes a model initialization module 10, a historical credit score model acquisition module 20, a future credit score model acquisition module 30, a credit evaluation module 40 and an interpretation module 50. Wherein:
the model initialization module 10 is configured to construct a credit scoring model and initialize model parameters, where the credit scoring model is a weighted scoring model or a non-weighted scoring model.
A historical credit scoring model obtaining module 20, configured to train the credit scoring models with sets of time-tagged historical credit assessment data sets respectively to obtain a plurality of trained time-tagged historical credit scoring models, where: each historical credit investigation data set comprises a plurality of pieces of historical credit investigation data, the historical credit investigation data in the same group have the same time label, the historical credit investigation data in different groups have different time labels, and the time labels represent the data generation time of the historical credit investigation data to which the time labels belong.
A future credit scoring model obtaining module 30, configured to predict and obtain a plurality of future credit scoring models with time labels based on the plurality of historical credit scoring models with time labels or the plurality of groups of historical credit assessment data sets with time labels according to the category of the credit scoring model, where the time labels of the future credit scoring models are different;
the credit evaluation module 40 is configured to input credit investigation data to be evaluated into a selected historical credit scoring model or a selected future credit scoring model with a time tag, so as to obtain a credit investigation evaluation result of the credit investigation subject corresponding to the credit investigation data to be evaluated at a time point corresponding to the time tag;
and the interpreting module 50 is used for interpreting the credit assessment result.
Since the processing procedure of each functional module of the personal credit evaluation device in this embodiment is consistent with the processing procedure of the personal credit evaluation method in the first embodiment, the processing procedure of each functional module of the personal credit evaluation device in this embodiment is not described repeatedly, and reference may be made to the related description of the first embodiment.
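The cooperation of modules 10 to 50 can be sketched as a small composition of callables. This is only an illustration of the data flow: the class name, field names, and stub behaviours are hypothetical, and module 40 is reduced to an inline lookup by time tag.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, float]
Model = Callable[[Record], float]


@dataclass
class CreditEvaluationDevice:
    """Wires modules 10-50 together; every callable is a hypothetical placeholder."""
    init_model: Callable[[], Any]            # module 10: build and initialise a model
    train_historical: Callable[..., Any]     # module 20: one model per time tag
    predict_future: Callable[..., Any]       # module 30: models for future time tags
    explain: Callable[..., Any]              # module 50: interpret the result

    def run(
        self, datasets: Dict[int, List[Record]], record: Record, year: int
    ) -> Tuple[float, Dict[str, float]]:
        template = self.init_model()
        historical = self.train_historical(template, datasets)
        future = self.predict_future(historical, datasets)
        model = {**historical, **future}[year]   # module 40: select by time tag
        score = model(record)
        return score, self.explain(model, record, score)


def make_linear(weights: Dict[str, float]) -> Model:
    """Stub scoring model: weighted sum of the listed attributes."""
    return lambda r: sum(weights[k] * r[k] for k in weights)


device = CreditEvaluationDevice(
    init_model=lambda: {"income": 0.0},
    train_historical=lambda template, data: {
        year: make_linear({"income": 0.1 * len(rows)}) for year, rows in data.items()
    },
    predict_future=lambda historical, data: {2022: make_linear({"income": 0.5})},
    explain=lambda model, record, score: {"income": 1.0},
)
datasets = {2018: [{"income": 1.0}], 2019: [{"income": 2.0}, {"income": 3.0}]}
score, explanation = device.run(datasets, {"income": 2.0}, 2022)
```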
Electronic device embodiment
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 7, the electronic device includes a processor 61 and a memory 63, and the processor 61 is connected to the memory 63, for example, through a bus 62.
The processor 61 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable device, transistor logic device, hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 61 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 62 may include a path that carries information between the aforementioned components. The bus 62 may be a PCI bus or an EISA bus, etc. The bus 62 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean only one bus or one type of bus.
Memory 63 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 63 stores the application program code of the present application, and execution of that code is controlled by the processor 61. The processor 61 is configured to execute the application program code stored in the memory 63 to implement the personal credit evaluation method of the first embodiment.
Finally, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the personal credit assessment method in the first embodiment is implemented.
The invention has been described above with a certain degree of particularity. It will be understood by those of ordinary skill in the art that the description of the embodiments is merely exemplary and that all changes that come within the true spirit and scope of the invention are desired to be protected. The scope of the invention is defined by the appended claims rather than by the foregoing description of the embodiments.

Claims (10)

1. A method for personal credit assessment and interpretation based on time series attribution analysis, comprising:
establishing a credit scoring model and initializing model parameters, wherein the credit scoring model is a weighted scoring model or a non-weighted scoring model;
respectively training the credit scoring model by using a plurality of groups of historical credit assessment data sets with time labels to obtain a plurality of trained historical credit scoring models with time labels, wherein: each historical credit investigation data set comprises a plurality of pieces of historical credit investigation data, the historical credit investigation data in the same group have the same time label, the historical credit investigation data in different groups have different time labels, and the time labels represent the data generation time of the historical credit investigation data to which the time labels belong;
according to the category of the credit scoring model, predicting and acquiring a plurality of future credit scoring models with time labels based on the plurality of historical credit scoring models with time labels or the plurality of groups of historical credit investigation data sets with time labels, wherein the time labels of the future credit scoring models are different;
inputting credit investigation data to be evaluated into a selected historical credit scoring model or a selected future credit scoring model with a time tag so as to obtain a credit investigation evaluation result of a credit investigation subject corresponding to the credit investigation data to be evaluated at a time point corresponding to the time tag;
and interpreting the credit assessment result.
2. The personal credit assessment and interpretation method of claim 1, wherein:
when the credit scoring model is a weighted scoring model, predicting and acquiring a plurality of future credit scoring models with time labels based on the plurality of historical credit scoring models with time labels, wherein the predicting and acquiring comprises the following steps:
classifying and summarizing the attribute weights of the historical credit scoring models according to attributes to obtain a plurality of attribute weight sets with time labels;
training to obtain a plurality of linear regression models corresponding to the attribute weight sets one by taking each attribute weight set as a training data set;
respectively predicting the attribute weight with a time label of each attribute at a plurality of future time points by using a plurality of trained linear regression models;
and constructing a plurality of future credit scoring models with time labels based on the predicted attribute weights with time labels of the attributes at a plurality of future time points.
3. The personal credit evaluation and interpretation method of claim 2, wherein said interpreting the result of said evaluation comprises:
acquiring the weight of each attribute from the credit investigation evaluation result and calculating the total weight;
calculating the weight ratio of the weight of each attribute to the total weight;
ranking the importance of each attribute according to the weight ratio;
and uniformly dividing the sorted attributes into a plurality of intervals.
4. The personal credit evaluation method of claim 3, wherein said interpreting the evaluation further comprises:
counting the score distribution of each attribute from historical credit investigation data;
counting the score proportion of each attribute;
and reordering the attributes in the intervals based on the score proportion of the attributes.
5. The personal credit assessment and interpretation method of claim 1, wherein:
when the credit scoring model is a weightless scoring model, predicting and acquiring a plurality of future credit scoring models with time labels based on the plurality of groups of historical credit investigation data sets with time labels comprises:
acquiring probability distribution of each historical credit investigation data set and parameter values of the probability distribution, wherein the probability distribution is Gaussian distribution;
transforming the probability distribution of each historical credit investigation data set to a reproducing kernel Hilbert space by using kernel function operation to obtain a plurality of historical vectors which are in one-to-one correspondence with each historical credit investigation data set and have time labels;
training to obtain a vector regression model by taking the historical vectors as a training data set;
predicting and obtaining a plurality of prediction vectors with time labels by using the trained vector regression model;
inversely transforming the prediction vectors to a probability distribution space by using kernel function operation, thereby obtaining a plurality of groups of prediction credit data sets with time labels;
and respectively training the credit scoring model by using the plurality of groups of predicted credit assessment data sets with time labels to obtain a plurality of trained future credit scoring models with time labels.
6. The personal credit assessment and interpretation method of claim 5, wherein interpreting said evaluation result using the Local Interpretable Model-agnostic Explanations (LIME) method comprises:
perturbing the attributes of the credit investigation data to be evaluated to obtain an approximate sample set consisting of a plurality of sample data close to the credit investigation data to be evaluated;
inputting the approximate sample set into a historical credit scoring model or a future credit scoring model which generates the evaluation result to obtain a sample evaluation result set;
training to obtain a local weighted scoring model based on the approximate sample set and the sample evaluation result set;
interpreting the evaluation result based on the attribute weight of the local weighted scoring model.
7. The personal credit assessment and interpretation method of claim 1, wherein:
the weighted scoring model comprises a logistic regression model;
the weightless scoring model comprises a gradient boosting decision tree model and a neural network model.
8. An apparatus for personal credit assessment and interpretation based on time series attribution analysis, comprising:
the model initialization module is used for constructing a credit scoring model and initializing model parameters, wherein the credit scoring model is a weighted scoring model or a non-weighted scoring model;
the historical credit scoring model acquisition module is used for respectively training the credit scoring models by utilizing a plurality of groups of historical credit investigation data sets with time labels to acquire a plurality of trained historical credit scoring models with time labels, wherein: each historical credit investigation data set comprises a plurality of pieces of historical credit investigation data, the historical credit investigation data in the same group have the same time label, the historical credit investigation data in different groups have different time labels, and the time labels represent the data generation time of the historical credit investigation data to which the time labels belong;
the future credit scoring model acquisition module is used for predicting and acquiring a plurality of future credit scoring models with time labels based on the plurality of historical credit scoring models with time labels or the plurality of groups of historical credit investigation data sets with time labels according to the category of the credit scoring model, wherein the time labels of the future credit scoring models are different;
the credit evaluation module is used for inputting credit investigation data to be evaluated into a selected historical credit scoring model or a selected future credit scoring model with a time tag so as to obtain a credit investigation evaluation result of the credit investigation subject corresponding to the credit investigation data to be evaluated at a time point corresponding to the time tag;
and the interpretation module is used for interpreting the credit investigation result.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for personal credit assessment and interpretation based on time series attribution analysis according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, wherein the computer-readable storage medium has stored thereon a computer program, which when executed by a processor, implements the method for personal credit assessment and interpretation based on time-series attribution analysis according to any one of claims 1-7.
CN202011039030.5A 2020-09-28 2020-09-28 Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis Pending CN112215696A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011039030.5A CN112215696A (en) 2020-09-28 2020-09-28 Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis
PCT/CN2020/135274 WO2022062193A1 (en) 2020-09-28 2020-12-10 Individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011039030.5A CN112215696A (en) 2020-09-28 2020-09-28 Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis

Publications (1)

Publication Number Publication Date
CN112215696A true CN112215696A (en) 2021-01-12

Family

ID=74051781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011039030.5A Pending CN112215696A (en) 2020-09-28 2020-09-28 Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis

Country Status (2)

Country Link
CN (1) CN112215696A (en)
WO (1) WO2022062193A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862593A (en) * 2021-01-28 2021-05-28 深圳前海微众银行股份有限公司 Credit scoring card model training method, device, system and computer storage medium
CN112862593B (en) * 2021-01-28 2024-05-03 深圳前海微众银行股份有限公司 Credit scoring card model training method, device and system and computer storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907144A (en) * 2022-11-21 2023-04-04 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Event prediction method and device, terminal equipment and storage medium
CN116416056B (en) * 2023-04-04 2023-10-03 深圳征信服务有限公司 Credit data processing method and system based on machine learning
CN116596284B (en) * 2023-07-18 2023-09-26 益企商旅(山东)科技服务有限公司 Travel decision management method and system based on customer requirements

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140365356A1 (en) * 2013-06-11 2014-12-11 Fair Isaac Corporation Future Credit Score Projection
US20180349790A1 (en) * 2017-05-31 2018-12-06 Microsoft Technology Licensing, Llc Time-Based Features and Moving Windows Sampling For Machine Learning
CN109978682A (en) * 2019-03-28 2019-07-05 上海拍拍贷金融信息服务有限公司 Credit-graded approach, device and computer storage medium
CN111260249A (en) * 2020-02-13 2020-06-09 武汉大学 Electric power communication service reliability assessment and prediction method and device based on LSTM and random forest mixed model
CN111598329A (en) * 2020-05-13 2020-08-28 中国科学院计算机网络信息中心 Time sequence data prediction method based on automatic parameter adjustment recurrent neural network
CN111652279A (en) * 2020-04-30 2020-09-11 中国平安财产保险股份有限公司 Behavior evaluation method and device based on time sequence data and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10209974B1 (en) * 2017-12-04 2019-02-19 Banjo, Inc Automated model management methods
CN109934371A (en) * 2017-12-18 2019-06-25 普华讯光(北京)科技有限公司 The method that solvency risk identification and prediction are carried out to enterprise based on electricity consumption data
CN110634060A (en) * 2018-06-21 2019-12-31 马上消费金融股份有限公司 User credit risk assessment method, system, device and storage medium
CN110866819A (en) * 2019-10-18 2020-03-06 华融融通(北京)科技有限公司 Automatic credit scoring card generation method based on meta-learning
CN111222982A (en) * 2020-01-16 2020-06-02 随手(北京)信息技术有限公司 Internet credit overdue prediction method, device, server and storage medium
CN111325344A (en) * 2020-02-24 2020-06-23 支付宝(杭州)信息技术有限公司 Method and apparatus for evaluating model interpretation tools

Also Published As

Publication number Publication date
WO2022062193A1 (en) 2022-03-31

Similar Documents

Publication Publication Date Title
CN108629687B (en) Anti-money laundering method, device and equipment
CN112215696A (en) Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis
Alsubaie et al. Cost-sensitive prediction of stock price direction: Selection of technical indicators
CN111160473A (en) Feature mining method and device for classified labels
Du et al. An integrated framework based on latent variational autoencoder for providing early warning of at-risk students
El Fouki et al. Multidimensional Approach Based on Deep Learning to Improve the Prediction Performance of DNN Models.
CN109299252A (en) The viewpoint polarity classification method and device of stock comment based on machine learning
Tian et al. Inductive representation learning on dynamic stock co-movement graphs for stock predictions
Liang et al. A double channel CNN-LSTM model for text classification
CN111325344A (en) Method and apparatus for evaluating model interpretation tools
CN114519508A (en) Credit risk assessment method based on time sequence deep learning and legal document information
Rofik et al. The Optimization of Credit Scoring Model Using Stacking Ensemble Learning and Oversampling Techniques
Dharwadkar et al. Customer retention and credit risk analysis using ANN, SVM and DNN
Dey Growing importance of machine learning in compliance and regulatory reporting
Silva et al. Developing and Assessing a Human-Understandable Metric for Evaluating Local Interpretable Model-Agnostic Explanations.
Suresh et al. Predicting the e-learners learning style by using support vector regression technique
Garg et al. A roadmap to deep learning: a state-of-the-art step towards machine learning
Liang et al. Feature construction using genetic programming for figure-ground image segmentation
CN113159419A (en) Group feature portrait analysis method, device and equipment and readable storage medium
CN111340356A (en) Method and apparatus for evaluating model interpretation tools
Supriyadi et al. Performance comparison of machine learning algorithms for student personality classification
Thripuranthakam et al. Stock Market Prediction Using Machine Learning and Twitter Sentiment Analysis: A Survey
Kennardi et al. Evaluation on neural network models for video-based stress recognition
Das A new technique for classification method with imbalanced training data
Bansal et al. Comparison of Machine Learning Algorithms for Prediction of Stock Prices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210112
