CN108428001B - Credit score prediction method and device - Google Patents

Info

Publication number
CN108428001B
Authority
CN
China
Prior art keywords
credit score
classifier
predictor model
predictor
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710076216.XA
Other languages
Chinese (zh)
Other versions
CN108428001A (en)
Inventor
黄巩怡
郑博
陈谦
刘成烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710076216.XA
Publication of CN108428001A
Application granted
Publication of CN108428001B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The invention discloses a credit score prediction method and device, belonging to the technical field of big data. The method comprises: acquiring sample data provided by each of at least two data sources; training on the sample data provided by each data source separately to obtain at least two predictor models corresponding to the at least two data sources, and obtaining the misjudgment rate of each predictor model; inputting the feature data of a target user into each predictor model to obtain the credit score output by each predictor model; and aggregating the credit scores output by the predictor models according to the misjudgment rate of each predictor model to obtain the credit score of the target user. Because the credit score is aggregated from at least two predictor models and their misjudgment rates, it can represent the credibility of the target user in at least two aspects, so the predicted credit score is more comprehensive and the prediction accuracy is improved.

Description

Credit score prediction method and device
Technical Field
The invention relates to the technical field of big data, in particular to a credit score prediction method and device.
Background
A user generates feature data in many aspects of daily life. Some organizations collect a user's feature data in a specific aspect and determine a credit score from it. The credit score is data that reflects the user's credit; it can be used to measure the user's credibility and to evaluate the user's credit risk. The feature data describes the user's characteristics in a specific aspect, and the credit score indicates the user's trustworthiness in that aspect. For example, a bank may collect a user's financial feature data and determine the user's financial trustworthiness and ability to repay debts, while a public transportation company may collect a user's traffic feature data and determine the user's trustworthiness in travel, such as the risk of fare evasion.
As users' feature data grows increasingly complex, a prediction model is generally obtained in order to predict a user's credit score accurately, and prediction is performed based on that model. To obtain the prediction model, one of the above organizations serves as a data source: the feature data and credit scores of at least two sample users in the specific aspect, as provided by that data source, are acquired, and a prediction model is trained on them. The model predicts a user's credit score from the user's feature data in the specific aspect, so that the credit score represents the user's credibility in that aspect. Then, when the credit score of a target user is to be predicted, the target user's feature data in the specific aspect is acquired and input into the prediction model, which determines the target user's credit score.
In the process of implementing the invention, the inventors found that the related art has at least the following problem: only a single data source is used to obtain the prediction model, so the model can be trained only on the feature data and credit scores of sample users in a single aspect. The model can therefore predict the target user's credit score only in that single aspect and cannot predict it comprehensively, so the prediction is not accurate enough.
Disclosure of Invention
In order to solve the problems of the related art, the embodiment of the invention provides a credit score prediction method and a credit score prediction device. The technical scheme is as follows:
In a first aspect, a credit score prediction method is provided, the method comprising:
acquiring sample data provided by each of at least two data sources, wherein the sample data provided by each data source comprises feature data and credit scores of at least two sample users, the feature data provided by different data sources describes characteristics of the sample users in different aspects, and the credit scores provided by different data sources represent the credibility of the sample users in different aspects;
training on the sample data provided by each of the at least two data sources separately to obtain at least two predictor models corresponding to the at least two data sources, and obtaining the misjudgment rate of each predictor model, wherein the misjudgment rate represents the probability that the predictor model makes a prediction error;
inputting the feature data of a target user into each predictor model to obtain the credit score output by each predictor model; and
aggregating the credit scores output by the predictor models according to the misjudgment rate of each predictor model to obtain the credit score of the target user.
In a second aspect, a credit score prediction apparatus is provided, the apparatus comprising:
an acquisition module, configured to acquire sample data provided by each of at least two data sources, wherein the sample data provided by each data source comprises feature data and credit scores of at least two sample users, the feature data provided by different data sources describes characteristics of the sample users in different aspects, and the credit scores provided by different data sources represent the credibility of the sample users in different aspects;
a training module, configured to train on the sample data provided by each of the at least two data sources separately to obtain at least two predictor models corresponding to the at least two data sources, and to obtain the misjudgment rate of each predictor model, wherein the misjudgment rate represents the probability that the predictor model makes a prediction error;
a prediction module, configured to input the feature data of a target user into each predictor model to obtain the credit score output by each predictor model; and
a statistics module, configured to aggregate the credit scores output by the predictor models according to the misjudgment rate of each predictor model to obtain the credit score of the target user.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
In the method and device provided by the embodiments of the invention, the sample data provided by each data source comprises feature data and credit scores of at least two sample users, the feature data provided by different data sources describes characteristics of the sample users in different aspects, and the credit scores provided by different data sources represent the credibility of the sample users in different aspects. At least two predictor models are therefore obtained from the feature data and credit scores of sample users in at least two aspects, as provided by at least two data sources. The credit score aggregated from the at least two predictor models and their misjudgment rates can represent the credibility of the target user in at least two aspects, so the predicted credit score is more comprehensive and the prediction accuracy is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a credit score prediction method according to an embodiment of the present invention;
FIG. 2A is a schematic diagram of an implementation environment provided by embodiments of the invention;
FIG. 2B is a schematic diagram of another exemplary implementation environment provided by embodiments of the invention;
FIG. 2C is a data flow diagram provided by an embodiment of the invention;
FIG. 3A is a flowchart of a credit score prediction method according to an embodiment of the present invention;
FIG. 3B is a flowchart of a credit score prediction method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a credit score prediction apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In daily life, an organization may collect a user's feature data in a particular aspect to describe the user's characteristics in that aspect, and may determine a credit score from that feature data to indicate the user's trustworthiness in that aspect. For example, a bank may determine a credit score from the financial feature data it collects about a user. The financial feature data describes the user's financial characteristics, and the credit score indicates the user's financial trustworthiness.
The collected feature data may be used to train a predictive model based on which the credit score of any target user may be predicted. However, only a single data source is used to obtain the prediction model, and the obtained prediction model can only predict the credit score of the target user in a single aspect. As the feature data of the users become more complex and tend to be diversified, each user has feature data in at least two aspects of daily life, and the prediction model cannot predict the credit scores of the target user in at least two aspects, so that the prediction is not accurate enough.
In the embodiment of the present invention, the prediction device performs the prediction according to the sample data provided by each of the at least two data sources. The prediction device may be a server, a server cluster composed of a plurality of servers, or a cloud computing service center, and the embodiment of the present invention does not limit the prediction device.
FIG. 1 is a flowchart of a credit score prediction method according to an embodiment of the present invention. The method is executed by a prediction device. Referring to FIG. 1, the method includes:
101. The prediction device obtains sample data provided by each of at least two data sources.
The sample data provided by each data source comprises characteristic data and credit scores of at least two sample users, the characteristic data provided by different data sources is used for describing the characteristics of the sample users in different aspects, and the credit scores provided by different data sources are used for representing the credibility of the sample users in different aspects, so that the credibility of the target users in at least two aspects can be described based on the credit scores predicted by the prediction model, and the prediction accuracy is improved.
The credit score indicates the credibility of the user. The credit score may be positively correlated with the trustworthiness of the user, i.e., a greater credit score indicates that the user is more trustworthy. The credit score may also be negatively correlated with the trustworthiness of the user, i.e., a greater credit score indicates that the user is less trustworthy.
In addition, the credit score may be expressed as one of two values: a first value indicates that the user is trustworthy and a second value indicates that the user is not trustworthy; e.g., 0 indicates that the user is trustworthy and 1 indicates that the user is not trustworthy. Alternatively, the credit score may be expressed as a number within a given range of values. For example, where the credit score is negatively correlated with the trustworthiness of the user, the range of values may be (0, 1), and a user with a credit score of 0.2 is more trustworthy than a user with a credit score of 0.8.
The at least two data sources include at least one of a financial data source, a traffic data source, a social data source, a health data source, and a basic data source, although other types of data sources may also be included.
The financial data source provides the financial feature data and financial credit scores of at least two sample users. The financial feature data includes, but is not limited to, the credit information and public records of a sample user. The credit information includes credit card records, bank loan records, personal property records, and other credit and loan records; the public records include records of the personal housing provident fund, personal pension insurance, and the like. The embodiment of the present invention does not limit these records.
The financial feature data is obtained from the financial condition of the sample user over a past period or from the user's operations on related financial information. The prediction device may obtain it from user data that the sample user fills in in various applications on a mobile phone, tablet, or personal computer, or from reports that finance-related applications generate based on the sample user's operations; such applications may include financial management platforms, shopping software, and the like. Alternatively, the financial feature data may be obtained by a third-party server and then sent to the prediction device, where the third-party server may be a bank server, an insurance company server, a financial management server, or the like; the embodiment of the present invention is not limited thereto.
The financial credit score indicates the trustworthiness of the sample user in financial terms, with a greater financial credit score indicating greater financial trustworthiness. For example, a financial credit score may indicate whether a sample user is able to repay debts, with a greater financial credit score indicating a greater ability to repay.
The traffic data source provides the traffic feature data and traffic credit scores of at least two sample users. The traffic data source may include a public transportation system server, checkpoint cameras, a travel company server, and the like. The traffic feature data includes, but is not limited to, tourism travel data, geographic location data, private-car usage information, and public transportation travel information. The tourism travel data includes ticket booking, hotel reservation, and the like; the geographic location data includes navigation, check-ins, ride-hailing, and the like; the private-car usage information includes how often and how far the user drives; and the public transportation travel information includes how often and how far the user travels by public transport.
The traffic credit score indicates the trustworthiness of the sample user in traffic, with a greater traffic credit score indicating greater trustworthiness in traffic. For example, a traffic credit score may indicate whether a sample user is at risk of fare evasion, with a greater traffic credit score indicating a lower risk of fare evasion.
The social data source provides the social feature data and social credit scores of at least two sample users. The social data source may include a social networking server, a mailbox server, and the like. The social feature data includes, but is not limited to, records of chats, e-mails, voice calls, microblog and personal-space posts, Douban reviews, Zhihu questions and answers, official-account article reads, number of friends, Moments comments and likes, emoticons, and social account avatars, as well as virtual value-added service data. The virtual value-added service data includes avatar outfits for virtual accounts, purchases of in-game items, video membership services, paid cloud storage space, music data packages, and the like.
The social credit score is used to indicate how trustworthy the sample user is socially, with a greater social credit score indicating that the sample user is socially more trustworthy. For example, a social credit score may be used to represent the social activity of a sample user, with a greater social credit score representing a sample user that is more active.
The health data source provides the health feature data and health credit scores of at least two sample users. The health data source may include hospitals, gyms, and the like, and the health feature data includes, but is not limited to, the exercise records, medical records, and the like of the sample users.
The health credit score is used to indicate the trustworthiness of the sample user in terms of health, and a greater health credit score indicates that the sample user is more trustworthy in terms of health, i.e., the sample user is healthier. For example, a health credit score may be used to represent the health status of a sample user, and a greater health credit score indicates a sample user that is healthier and less at risk of developing an illness.
The basic data source provides the basic feature data and basic credit scores of at least two sample users. The basic data source may include a terminal of the sample user, an application that the sample user has registered to use, and the like, and the basic feature data may include, but is not limited to, basic demographic data of the sample user, such as name, age, gender, region, educational background, occupation, and marital status.
The basic credit score represents the credibility of the sample user, and a greater basic credit score indicates that the sample user is more credible. For example, a sample user with a college degree may have a greater basic credit score.
The sample data may be obtained from user data that a user fills in in various applications on a mobile phone, tablet, or personal computer, or from reports that those applications generate based on the user's operations. The applications may include an instant messaging application, a game client, an application download platform, a financial management platform, shopping software, and the like; the embodiment of the present invention is not limited thereto.
It should be noted that, after the sample data is obtained and before training on it, the sample data may be preprocessed to filter out abnormal sample data. The preprocessing operations include deduplication, cleaning, filling, and the like. For example, the same data source may record the same sample user more than once, in which case the sample data should be deduplicated; the sample data of a sample user may be false or wrongly formatted, in which case the sample data should be cleaned; or the sample data may be incomplete, in which case the missing values should be filled.
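The three preprocessing operations can be illustrated with a minimal Python sketch. The record layout (dicts with `user_id`, numeric feature fields, and a `credit_score` label), the function name `preprocess`, and the choice of mean filling are assumptions made for illustration; the embodiment does not prescribe them.

```python
from statistics import mean


def preprocess(records, feature_fields):
    """Deduplicate, clean, and fill the sample records of one data source.

    Each record is a dict with a "user_id" key, numeric feature fields, and a
    "credit_score" label. All field names here are hypothetical.
    """
    # Deduplication: keep only the first record seen for each sample user.
    seen, unique = set(), []
    for rec in records:
        if rec["user_id"] not in seen:
            seen.add(rec["user_id"])
            unique.append(dict(rec))  # copy so the caller's data is untouched

    # Cleaning: drop records whose label is missing or wrongly formatted.
    cleaned = [r for r in unique
               if isinstance(r.get("credit_score"), (int, float))]

    # Filling: replace a missing feature with the mean of the observed values
    # (mean filling is an assumed strategy, not one named by the source).
    for field in feature_fields:
        observed = [r[field] for r in cleaned if r.get(field) is not None]
        fill = mean(observed) if observed else 0.0
        for r in cleaned:
            if r.get(field) is None:
                r[field] = fill
    return cleaned


sample_records = [
    {"user_id": 1, "income": 10.0, "credit_score": 1},
    {"user_id": 1, "income": 10.0, "credit_score": 1},      # duplicate user
    {"user_id": 2, "income": None, "credit_score": 0},      # incomplete
    {"user_id": 3, "income": 30.0, "credit_score": "bad"},  # wrong format
]
clean_records = preprocess(sample_records, ["income"])
```

In this sketch the duplicate record and the wrongly formatted record are dropped, and the missing `income` of user 2 is filled from the remaining observed values.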
Further, different data sources store different types of feature data. If a single uniform strategy were used to preprocess the sample data of every data source, the strategy could be unsuitable for some feature data, causing data loss or introducing unnecessary noise. The preprocessing strategy may therefore be determined separately for the sample data provided by each data source, so that the sample data is preprocessed according to the characteristics of its data source.
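One way to picture a per-source strategy is a lookup table that maps each data source to its own filling rule. The sketch below is only an illustration: the source names, the table `FILL_STRATEGIES`, and the specific rules (median for financial data, zero for social data, mean as the fallback) are assumptions, not choices stated in the embodiment.

```python
from statistics import mean, median

# Hypothetical mapping from data source to the rule used to fill missing values.
FILL_STRATEGIES = {
    "financial": median,           # assumed: robust to extreme amounts
    "social": lambda values: 0.0,  # assumed: a missing count means no activity
}


def fill_missing(source, values, default=mean):
    """Fill None entries in `values` using the strategy chosen for `source`."""
    strategy = FILL_STRATEGIES.get(source, default)
    observed = [v for v in values if v is not None]
    fill = strategy(observed) if observed else 0.0
    return [fill if v is None else v for v in values]


financial_filled = fill_missing("financial", [1.0, None, 3.0, 100.0])
social_filled = fill_missing("social", [None, 2.0])
health_filled = fill_missing("health", [2.0, None, 4.0])
```

Keeping the rules in a table means a new data source can be given its own rule without touching the filling code.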
102. The prediction device trains on the sample data provided by each of the at least two data sources separately to obtain at least two predictor models corresponding to the at least two data sources, and obtains the misjudgment rate of each predictor model.
Specifically, the process for training the predictor model may include the following step 1021, and the process for determining the misjudgment rate of the predictor model may include the following step 1022:
1021. For each data source, the prediction device trains on the sample data provided by that data source to obtain a predictor model.
Sample data provided by different data sources has different characteristics; for example, the sample data provided by some data sources has consistent dimensions, while the sample data provided by others has multiple dimensions. After obtaining the sample data provided by the different data sources, the prediction device may therefore determine the same or different training modes for the sample data of each data source according to its characteristics, and train on the sample data of each data source in the corresponding training mode to obtain the predictor model corresponding to that data source.
In practical application, the predictor model may include at least one classifier, and the algorithm used to train the predictor model may be chosen according to the characteristics of the sample data provided by each data source. For example, it can be determined whether, for each data source, a non-ensemble learning algorithm is used to train one classifier or an ensemble learning algorithm is used to train at least two classifiers.
In a first possible implementation manner, for the sample data provided by any data source, the prediction device may train on the sample data with a non-ensemble learning algorithm to obtain one classifier, which is the predictor model corresponding to the data source. The non-ensemble learning algorithm may include a linear partition training algorithm, a logistic regression training algorithm, a decision tree training algorithm, and the like.
In a second possible implementation manner, for the sample data provided by any data source, the prediction device may train on the sample data with an ensemble learning algorithm to obtain at least two classifiers, which together constitute the predictor model corresponding to the data source. The ensemble learning algorithm may include a boosting training algorithm, a bagging (bootstrap aggregating) training algorithm, a random forest training algorithm, and the like.
For example, the prediction device may train on the sample data with a boosting training algorithm: it sets the same weight for every piece of sample data and trains a classifier on the at least two pieces of sample data and their weights; it then determines the misjudged sample data from the prediction results of that classifier, increases the weights of the misjudged sample data, and trains another classifier on the sample data and the adjusted weights. By repeatedly adjusting the weight of each piece of sample data, at least two classifiers are obtained, and the misjudgment rate of the trained classifiers gradually decreases as the number of training rounds increases.
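The reweighting loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the embodiment's algorithm: each classifier is a one-feature threshold rule (a decision stump), labels are 0 or 1, and misjudged samples have their weights multiplied by a fixed hypothetical `factor` before renormalization. A standard boosting algorithm such as AdaBoost derives the update from the weighted error instead.

```python
def train_stump(xs, ys, weights):
    """Fit a one-feature threshold classifier that minimizes weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (0, 1):
            # The rule predicts `polarity` when x >= threshold, else the opposite.
            preds = [polarity if x >= threshold else 1 - polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else 1 - polarity


def boost(xs, ys, rounds=3, factor=2.0):
    """Train `rounds` classifiers, upweighting misjudged samples each round."""
    n = len(xs)
    weights = [1.0 / n] * n  # every sample starts with the same weight
    classifiers = []
    for _ in range(rounds):
        clf = train_stump(xs, ys, weights)
        classifiers.append(clf)
        # Increase the weight of each misjudged sample, then renormalize.
        weights = [w * factor if clf(x) != y else w
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return classifiers


features = [1, 2, 3, 4]
labels = [0, 0, 1, 1]  # e.g. 0 = trustworthy, 1 = not trustworthy
ensemble = boost(features, labels, rounds=3)
```

The returned list of classifiers plays the role of the at least two classifiers that make up one predictor model.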
By performing the above steps for each data source, the prediction device trains a predictor model on the sample data provided by that data source, so that at least two predictor models corresponding to the at least two data sources are obtained.
1022. The prediction device inputs the feature data of the at least two sample users in the sample data into the predictor model to obtain the predicted credit scores of the at least two sample users output by the predictor model, and determines the misjudgment rate of the predictor model from the credit scores of the at least two sample users in the sample data and the predicted credit scores output by the predictor model.
The misjudgment rate indicates the probability that the predictor model makes a prediction error. To calculate it, the credit scores of the at least two sample users in the sample data are compared with the predicted credit scores output by the predictor model to count the sample users whose scores were predicted wrongly, and the ratio of that count to the total number of the at least two sample users is taken as the misjudgment rate of the predictor model.
For each sample user, the predicted credit score output by the predictor model can be regarded as the actual value for that user, and the user's credit score in the sample data can be regarded as the theoretical value. Whether the predictor model predicts the user's credit score correctly is determined from the deviation between the theoretical value and the actual value: when the deviation between the predicted credit score and the credit score is less than a preset threshold, the prediction is considered correct; when the deviation is not less than the preset threshold, the prediction is considered wrong. The preset threshold specifies the maximum deviation allowed between a correctly predicted credit score and the theoretical credit score, and it can be set according to the accuracy required of the predictor model.
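The calculation in step 1022 can be written as a short Python sketch. The function name `misjudgment_rate` and the default threshold of 0.5 are assumptions for illustration; the source leaves the preset threshold to be chosen according to the required accuracy.

```python
def misjudgment_rate(true_scores, predicted_scores, threshold=0.5):
    """Return the fraction of sample users whose score was predicted wrongly.

    A prediction counts as wrong when its deviation from the sample's credit
    score is not less than `threshold` (0.5 is an assumed default).
    """
    if not true_scores or len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(true_scores, predicted_scores)
                if abs(p - t) >= threshold)
    return wrong / len(true_scores)


# Four sample users: the last two predictions deviate by 0.8 and 0.6.
rate = misjudgment_rate([0, 1, 1, 0], [0.1, 0.9, 0.2, 0.6])
```

A deviation exactly equal to the threshold is counted as a wrong prediction, matching the "not less than" condition in the text.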
It should be noted that, in practical application, the trained predictor model includes at least one classifier, and the determined misjudgment rate of each predictor model includes the misjudgment rate of at least one classifier in the predictor model.
Specifically, when the predictor model comprises at least one classifier, for each classifier the prediction device inputs the feature data of the at least two sample users in the sample data into the classifier to obtain the predicted credit scores output by the classifier, and determines the misjudgment rate of the classifier from the credit scores in the sample data and the predicted credit scores output by the classifier. After every classifier in the predictor model has been traversed and its misjudgment rate obtained, the misjudgment rates of the classifiers together form the misjudgment rate of the predictor model. For example, if the predictor model includes classifier 1, classifier 2, and classifier 3 with misjudgment rates a, b, and c respectively, the misjudgment rate of the predictor model is (a, b, c).
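As an illustration of the (a, b, c) example, the following Python sketch traverses the classifiers of one predictor model and collects their misjudgment rates into a tuple. The function name `classifier_rates`, the representation of classifiers as plain callables, and the 0.5 threshold are assumptions.

```python
def classifier_rates(classifiers, features, true_scores, threshold=0.5):
    """Return a tuple with the misjudgment rate of each classifier in a model."""
    rates = []
    for clf in classifiers:
        # Count the sample users this classifier misjudges.
        wrong = sum(1 for x, y in zip(features, true_scores)
                    if abs(clf(x) - y) >= threshold)
        rates.append(wrong / len(true_scores))
    return tuple(rates)


# Three hypothetical classifiers of one predictor model.
model = [
    lambda x: 1 if x >= 3 else 0,  # classifier 1
    lambda x: 1,                   # classifier 2: always predicts 1
    lambda x: 0,                   # classifier 3: always predicts 0
]
model_rates = classifier_rates(model, [1, 2, 3, 4], [0, 0, 1, 1])
```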
103. The prediction device inputs the feature data of the target user into each predictor model to obtain the credit score output by each predictor model.
After obtaining the at least two predictor models corresponding to the at least two data sources and the misjudgment rate of each predictor model, the feature data of the target user can be respectively input into each predictor model to obtain the credit score output by each predictor model, so that the at least two credit scores corresponding to the at least two predictor models are obtained.
Further, for each of the at least two predictor models, inputting the feature data of the target user into the predictor model means inputting it into each classifier of the predictor model. Each classifier outputs a credit score based on the input feature data, so that the prediction device obtains the credit score output by each of the at least one classifier of the predictor model.
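Step 103 can be sketched as a loop over predictor models and their classifiers. In this Python illustration the predictor models are kept in a dict keyed by data source, and the target user's feature data is also keyed by data source so that each model receives the features of its own aspect; both layouts and the name `score_target` are assumptions.

```python
def score_target(predictor_models, target_features):
    """Collect the credit score output by every classifier of every model.

    predictor_models: dict mapping a data source name to its list of classifiers.
    target_features:  dict mapping a data source name to the target user's
                      feature data for that aspect (hypothetical layout).
    """
    return {
        source: [clf(target_features[source]) for clf in classifiers]
        for source, classifiers in predictor_models.items()
    }


models = {
    "financial": [lambda x: 1 if x >= 3 else 0, lambda x: 0],
    "social": [lambda x: 1 if x >= 10 else 0],
}
target_scores = score_target(models, {"financial": 4, "social": 2})
```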
104. And the predicting device counts the credit value output by each predicting submodel according to the misjudgment rate of each predicting submodel to obtain the credit value of the target user.
After the credit score output by each predictor model based on the feature data of the target user and the misjudgment rate of each predictor model are obtained, the credit score of the target user can be determined according to the misjudgment rate and the credit score of each predictor model. Optionally, the credit score of the target user may be determined according to the credit score output by at least one classifier in each predictor model and the misjudgment rate of at least one classifier.
In a possible implementation manner, the predicting apparatus may apply the following formula to count the credit score output by at least one classifier in each predictor model according to the misjudgment rate of at least one classifier in each predictor model, so as to obtain the credit score of the target user:
Figure BDA0001224414260000101
Figure BDA0001224414260000102
wherein j represents the identifier of the predictor model, J represents the number of predictor models, j is a positive integer not greater than J, t represents the identifier of the classifier, T represents the number of classifiers in the predictor model, t is a positive integer not greater than T, x represents the feature data of the target user, E_jt represents the misjudgment rate of classifier t in predictor model j, C_jt(x) represents the credit score output by classifier t in predictor model j, P(x) represents the credit score of the target user, and sign is the sign function,
Figure BDA0001224414260000103
Further, in practical applications the value of P(x) usually lies between 0 and 1 and is small, so the difference between the credit scores of different users is not obvious. Therefore, after obtaining the credit score P(x), the prediction device may convert it to obtain a credit score in another value range.
Specifically, the credit score P(x) may be used as a first credit score. After the credit scores output by each predictor model are counted based on the above formula to obtain the first credit score of the target user, the following formula is applied to calculate a second credit score of the target user:
S=B+ln(1/P(x));
wherein P(x) represents the first credit score of the target user, B represents a preset reference value, and S represents the second credit score of the target user; the numerical range of the second credit score is different from that of the first credit score.
In practical applications, the first credit score P(x) may be used to represent the default probability of the target user: P(x) is negatively correlated with the user's credibility, and the larger P(x) is, the more likely the target user is to default and the less credible the target user is. The above calculation yields a second credit score S that is negatively correlated with P(x), so that S is positively correlated with the user's credibility: the larger S is, the less likely the target user is to default and the more credible the target user is.
In addition, the numerical range of S can be adjusted by setting the preset reference value B, so that P(x), which has a fixed numerical range, is translated to another numerical range and credit scores are represented by different numerical values. The specific value of B may be determined according to the numerical range required for the credit score; e.g., B may be set to 100 when the credit score is required to be represented in three digits.
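The conversion from the first credit score to the second credit score can be illustrated with a minimal sketch. The function name `second_credit_score` and the restriction of P(x) to the interval (0, 1] are assumptions made for illustration; only the formula S=B+ln(1/P(x)) and the example value B=100 come from the description above.

```python
import math


def second_credit_score(p: float, base: float = 100.0) -> float:
    """Return S = B + ln(1 / P(x)) for a first credit score P(x).

    Assumption: P(x) lies in (0, 1], so the logarithm is defined
    and non-negative.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("P(x) must lie in (0, 1]")
    return base + math.log(1.0 / p)


# A smaller default probability P(x) gives a larger second score S.
scores = {p: second_credit_score(p) for p in (0.9, 0.5, 0.1)}
```

With B set to 100, the second credit score grows as P(x) shrinks, which matches the positive correlation between S and credibility described above.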
In another possible implementation manner, the predicting apparatus may perform statistics on the credit score output by at least one classifier in each predictor model by applying the following formula according to the misjudgment rate of at least one classifier in each predictor model, to obtain the credit score of the target user:
Figure BDA0001224414260000111
Figure BDA0001224414260000112
Figure BDA0001224414260000113
wherein j represents the identifier of the predictor model, J represents the number of predictor models, j is a positive integer not greater than J, t represents the identifier of the classifier, T represents the number of classifiers in the predictor model, t is a positive integer not greater than T, x represents the feature data of the target user, E_jt represents the misjudgment rate of classifier t in predictor model j, C_jt(x) represents the credit score output by classifier t in predictor model j, H(x) represents the credit score of the target user, and sign is the sign function.
The two calculation modes represent the credit score of the target user by P(x) and H(x) respectively. The difference is that P(x) yields a value between 0 and 1 that represents the credibility of the target user: P(x) is inversely related to credibility, and the larger P(x) is, the lower the credibility of the corresponding target user. H(x) yields a credit score of 0 or 1: a credit score of 0 indicates that the corresponding target user is trusted, and a credit score of 1 indicates that the corresponding target user is not trusted.
Further, calculating the credit score by the above formulas requires that the different predictor models have the same number of classifiers. In practical applications, however, different predictor models are trained on sample data provided by different data sources, so the numbers of classifiers in different predictor models may or may not be equal.
In order to ensure normal calculation of credit score, after at least two predictor models are obtained, each predictor model can be divided into at least one classifier set so as to ensure that the number of the classifier sets of different predictor models is equal, and statistics is carried out according to the misjudgment rate of each classifier set in each predictor model and the output credit score to obtain the credit score of a target user.
Specifically, after the plurality of predictor models corresponding to the plurality of data sources are obtained in step 102, and before the misjudgment rate of each predictor model is obtained, the method may include the following steps 1121 to 1122:
1121. Sort the at least one classifier in each predictor model in descending order of misjudgment rate.
For each of the at least two predictor models, when the predictor model includes at least one classifier, the at least one classifier may be sorted in descending order of misjudgment rate.
Specifically, when sample data of the j-th data source is trained to generate a predictor model including T classifiers, the t-th of the T classifiers may be denoted by C_jt(x), where T is a positive integer and t is a positive integer not greater than T. Accordingly, the misjudgment rate of the classifier may be denoted by E_jt. After the misjudgment rate E_jt of each classifier C_jt(x) of the T classifiers is obtained, the T classifiers may be sorted in descending order of misjudgment rate. The sorting result contains the T classifiers with their corresponding misjudgment rates, and may be written as (C_j1(x), E_j1), (C_j2(x), E_j2), …, (C_jT(x), E_jT), where E_j1 > E_j2 > … > E_jT.
In addition, when training on the sample data of the j-th data source yields a predictor model including only one classifier, the classifier can be directly denoted by C_j1(x) and its misjudgment rate by E_j1, so the sorting result of the predictor model is (C_j1(x), E_j1).
1122. Divide the at least one classifier into at least one classifier set according to the sorting result.
After the sorting result of each predictor model is obtained, the number of the classifier sets can be set according to the lengths of different sorting results. For at least one classifier in each predictor model, the at least one classifier can be divided into at least one classifier set according to the number of the classifier sets and the number of classifiers in the predictor model, each classifier set of the same predictor model comprises the same number of classifiers, and the number of the classifier sets of different predictor models is equal.
For example, after a first predictor model corresponding to the social data source and a second predictor model corresponding to the traffic data source are obtained, the number of classifiers of the first predictor model is 200, and the number of classifiers of the second predictor model is 100. The prediction apparatus may set the number of classifier sets to be 10, divide 200 classifiers in the first prediction sub model into 10 classifier sets, each classifier set includes 20 classifiers that are sorted from large to small according to the misjudgment rate, divide 100 classifiers in the second prediction sub model into 10 classifier sets, and each classifier set includes 10 classifiers that are sorted from large to small according to the misjudgment rate.
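Steps 1121 and 1122 can be sketched as follows. This is only an illustration under stated assumptions: each classifier is represented by a hypothetical (name, misjudgment rate) pair, the misjudgment rates are made-up values, and the sets are formed by cutting the sorted list into consecutive chunks, which the description does not prescribe.

```python
from typing import List, Tuple

# (hypothetical classifier name, misjudgment rate E_jt)
Classifier = Tuple[str, float]


def divide_into_sets(classifiers: List[Classifier],
                     num_sets: int) -> List[List[Classifier]]:
    """Sort by misjudgment rate (descending) and split into equal sets."""
    if num_sets <= 0 or len(classifiers) % num_sets != 0:
        raise ValueError("classifier count must be a multiple of num_sets")
    # Step 1121: descending order of misjudgment rate.
    ordered = sorted(classifiers, key=lambda c: c[1], reverse=True)
    # Step 1122: consecutive chunks of equal size (an assumption).
    size = len(ordered) // num_sets
    return [ordered[i * size:(i + 1) * size] for i in range(num_sets)]


# Made-up misjudgment rates for 200 social and 100 traffic classifiers.
social = [(f"social_{i}", (i + 1) / 1000) for i in range(200)]
traffic = [(f"traffic_{i}", (i + 1) / 500) for i in range(100)]

social_sets = divide_into_sets(social, 10)    # 10 sets of 20 classifiers
traffic_sets = divide_into_sets(traffic, 10)  # 10 sets of 10 classifiers
```

Both predictor models end up with the same number of classifier sets, which is the property the subsequent statistics rely on.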
Accordingly, the process of obtaining the misjudgment rate of each predictor model in step 102 may be replaced by the following step 1123:
1123. For each classifier set, input the feature data of a plurality of sample users into the classifier set to obtain the credit scores of the plurality of sample users output by each classifier; perform statistics on the credit scores of the plurality of sample users output by each classifier in the classifier set to obtain the predicted credit scores of the plurality of sample users output by the classifier set; and determine the misjudgment rate of the classifier set according to the credit scores of the plurality of sample users in the sample data and the predicted credit scores of the plurality of sample users output by the classifier set.
For each sample user, when the credit scores output by the classifiers in the classifier set are counted, the plurality of credit scores can be combined according to a preset voting policy into a single credit score, which corresponds to the classifier set and comprehensively reflects the prediction results of the classifiers in the set. The preset voting policy may be to select the credit score that occurs most frequently among the credit scores output by the classifiers, or to calculate the average of the credit scores output by the classifiers, and so on, which is not limited in the embodiment of the present invention.
When the misjudgment rate of the classifier set is determined, the credit scores of the plurality of sample users in the sample data are compared with the predicted credit scores of the plurality of sample users output by the classifier set to obtain the number of wrongly predicted sample users, and the ratio of that number to the total number of sample users is taken as the misjudgment rate of the classifier set. Whether the prediction for a sample user is wrong may be determined based on the deviation of the predicted credit score from the actual credit score, which is not described in detail herein.
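The voting policies and the misjudgment rate of a classifier set can be sketched as below. The threshold classifiers, the sample data, and the `tolerance` parameter that decides when a deviation counts as a wrong prediction are all hypothetical; the description leaves the deviation criterion open.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence


def majority_vote(scores: Sequence[float]) -> float:
    """Most frequent score; ties go to the score seen first."""
    return Counter(scores).most_common(1)[0][0]


def average_vote(scores: Sequence[float]) -> float:
    """Average of the scores output by the classifiers."""
    return mean(scores)


def set_misjudgment_rate(classifiers: Sequence[Callable[[float], float]],
                         features: Sequence[float],
                         actual: Sequence[float],
                         vote: Callable[[Sequence[float]], float] = majority_vote,
                         tolerance: float = 0.0) -> float:
    """Ratio of wrongly predicted sample users to all sample users."""
    wrong = 0
    for x, y in zip(features, actual):
        predicted = vote([clf(x) for clf in classifiers])
        # Assumption: a prediction is wrong if it deviates by more
        # than `tolerance` from the actual credit score.
        if abs(predicted - y) > tolerance:
            wrong += 1
    return wrong / len(actual)


# Hypothetical classifier set: three threshold classifiers on one feature.
clf_set = [lambda x, th=th: 1 if x > th else 0 for th in (0.3, 0.5, 0.7)]
sample_features = [0.1, 0.4, 0.6, 0.9]
sample_scores = [0, 0, 1, 0]

rate = set_misjudgment_rate(clf_set, sample_features, sample_scores)
```

In this toy data only the last sample user is predicted wrongly, so one of four predictions counts as a misjudgment.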
Then, the above step 103 "inputting the feature data of the target user into each predictor model respectively to obtain the credit score output by each predictor model" may be replaced by the following step 1031:
1031. Input the feature data of the target user into the predictor model; for each classifier set in the predictor model, perform statistics on the credit scores output by the classifiers in the classifier set to obtain the credit score output by the classifier set, thereby obtaining the credit score output by each classifier set in the predictor model.
Specifically, after at least two credit scores output by at least two classifiers in the classifier set are obtained, the at least two credit scores may be combined according to a preset voting policy into a single credit score, which corresponds to the classifier set and comprehensively reflects the prediction results of the classifiers in the set.
The preset voting policy may be to select the credit score that occurs most frequently among the at least two credit scores output by the at least two classifiers, or to calculate the average of the at least two credit scores, and so on, which is not limited in the embodiment of the present invention.
Then step 104 above may be replaced by step 1041 below:
1041. Perform statistics on the credit score output by at least one classifier set in each predictor model according to the misjudgment rate of the at least one classifier set in each predictor model to obtain the credit score of the target user.
In a possible implementation manner, the predicting apparatus may apply the following formula to count the credit score output by at least one classifier set in each predictor model according to the misjudgment rate of at least one classifier set in each predictor model, so as to obtain the credit score of the target user:
Figure BDA0001224414260000141
Figure BDA0001224414260000142
wherein j represents the identifier of the predictor model, J represents the number of predictor models, j is a positive integer not greater than J, t represents the identifier of the classifier set, T represents the number of classifier sets in the predictor model, t is a positive integer not greater than T, x represents the feature data of the target user, E_jt represents the misjudgment rate of classifier set t in predictor model j, C_jt(x) represents the credit score output by classifier set t in predictor model j, P(x) represents the credit score of the target user, and sign is the sign function,
Figure BDA0001224414260000143
In another possible implementation manner, the predicting apparatus may apply the following formula to count the credit score output by at least one classifier set in each predictor model according to the misjudgment rate of at least one classifier set in each predictor model, so as to obtain the credit score of the target user:
Figure BDA0001224414260000144
Figure BDA0001224414260000145
Figure BDA0001224414260000146
wherein j represents the identifier of the predictor model, J represents the number of predictor models, j is a positive integer not greater than J, t represents the identifier of the classifier set, T represents the number of classifier sets in the predictor model, t is a positive integer not greater than T, x represents the feature data of the target user, E_jt represents the misjudgment rate of classifier set t in predictor model j, C_jt(x) represents the credit score output by classifier set t in predictor model j, H(x) represents the credit score of the target user, and sign is the sign function.
It should be noted that, because the feature data of the target user includes feature data of at least two aspects, after the credit score of the target user is obtained, the credit score needs to be interpreted to determine which aspect of the feature data it is mainly caused by. When the credit score of the target user is calculated, the contribution of the credit score output by each predictor model is inversely related to the misjudgment rate of that predictor model: the lower the misjudgment rate, the higher the proportion of its credit score in the result and the greater its influence. The credit score prediction result of the target user may therefore be considered to be mainly caused by the feature data corresponding to the predictor model with the lowest misjudgment rate. For example, if the social predictor model has the lowest misjudgment rate and the target user is determined to have a low credit score and to be untrusted, the main reason for that determination may be considered to be social inactivity.
According to the method provided by the embodiment of the invention, the sample data provided by each data source includes the feature data and credit scores of at least two sample users; the feature data provided by different data sources describe the characteristics of the sample users in different aspects, and the credit scores provided by different data sources represent the credibility of the sample users in different aspects. At least two predictor models are therefore trained from the feature data and credit scores, in at least two aspects, provided by at least two data sources. The credit score obtained by statistics over the outputs of the at least two predictor models and their misjudgment rates can represent the credibility of the target user in at least two aspects, so the predicted credit score is more comprehensive and the prediction accuracy is improved.
Furthermore, the same or different training modes can be determined for the sample data from different data sources according to the characteristics of each data source's sample data, and training is performed on the sample data of each data source using the corresponding training mode. Because the differences between data sources are taken into account, this avoids the inaccurate prediction model that results from training the sample data of at least two data sources in one unified training mode.
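The interpretation step can be sketched in a few lines. The sketch assumes that each predictor model is summarized by a single scalar misjudgment rate (for instance an average over its classifiers, which the description does not specify); the aspect names and rate values are made up.

```python
from typing import Dict


def main_factor(misjudgment_rates: Dict[str, float]) -> str:
    """Return the aspect whose predictor model has the lowest misjudgment rate."""
    return min(misjudgment_rates, key=misjudgment_rates.get)


# Hypothetical scalar misjudgment rates per predictor model.
rates = {"financial": 0.12, "social": 0.05, "traffic": 0.20}
dominant = main_factor(rates)
```

Here the social predictor model has the lowest misjudgment rate, so the prediction result would be attributed mainly to the social feature data.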
Furthermore, the misjudgment rate of each prediction submodel is obtained, and after the credit score of the target user is obtained, the credit score can be explained and analyzed according to the misjudgment rate of each prediction submodel, so that the credibility of the target user in different aspects can be analyzed conveniently.
On the basis of the embodiment shown in fig. 1, an implementation environment is provided in the embodiment of the present invention, and fig. 2A is a schematic diagram of an implementation environment provided in the embodiment of the present invention, where the implementation environment includes a statistics server and at least two model training servers, and each model training server is connected to the statistics server through a network.
Each model training server is used for storing the characteristic data and the credit score of the sample user and the target user, training according to the characteristic data and the credit score of at least two sample users to obtain a prediction sub-model, and determining the misjudgment rate of the prediction sub-model.
And the statistical server is used for carrying out statistics according to the misjudgment rate of each predictor model and the credit score output by each predictor model to obtain the credit score of the target user.
Different model training servers may store feature data and credit scores of different aspects of sample users and target users, for example, referring to fig. 2B, the at least two model training servers may include a financial data server, a social data server, a health data server, a traffic data server, and a basic data server, and the statistical server may perform statistics according to the credit scores and corresponding misjudgment rates output by the financial data server, the social data server, the health data server, the traffic data server, and the basic data server, respectively, to determine the credit scores, as shown in fig. 2C.
Based on the implementation environment shown in fig. 2A, an embodiment of the invention provides a credit score prediction method, and fig. 3A is a flowchart of the credit score prediction method provided in the embodiment of the invention. Referring to fig. 3A, the interactive body of the credit score prediction method includes a statistics server and at least two model training servers. The method comprises the following steps:
301. Each model training server acquires sample data, trains according to the sample data to obtain a predictor model, and acquires the misjudgment rate of the predictor model.
Each model training server can obtain the characteristic data and the credit score of at least two sample users, train according to the characteristic data and the credit score, obtain a prediction submodel and determine the misjudgment rate of the prediction submodel.
The statistical server can send the sample user identifiers of at least two sample users to each model training server. Each model training server determines, from its stored user data, the sample data matching the sample user identifiers, and performs training according to the sample data of the at least two sample users.
The sample user identifier may be a user name, a number assigned to the sample user by the statistical server, and the like, and when different model training servers obtain the same sample user identifier, the feature data of the same sample user in different aspects may be determined respectively.
Optionally, in order to ensure security of data transmission, when the statistics server sends at least two sample user identifiers and corresponding credit scores to each model training server, an encryption transmission mode may be adopted, and each model training server needs to perform a decryption operation to obtain the sample user identifiers and the corresponding credit scores.
The process by which each model training server trains on sample data is similar to step 1021 above, and the process of determining the misjudgment rate of the predictor model is similar to step 1022 above. In the embodiment of fig. 2A, the t-th classifier in the predictor model obtained by the j-th model training server is C_jt(x), and the misjudgment rate of the classifier is E_jt.
302. The statistical server acquires a target user identifier of a target user and sends the target user identifier to each model training server.
In the embodiment of the invention, the model training server stores the user identification and the characteristic data which correspond to each other, and each time prediction is carried out, the statistical server can determine the target user of the credit score to be predicted, acquire the target user identification of the target user and respectively send the target user identification to each model training server, so that the model training server can acquire the corresponding characteristic data according to the target user identification.
303. Each model training server receives the target user identification, obtains the characteristic data of the target user according to the target user identification, inputs the characteristic data into the prediction submodel, obtains the credit score of the target user output by the prediction submodel, and sends the misjudgment rate of the prediction submodel and the credit score of the target user to the statistical server.
304. The statistical server receives the misjudgment rate and the credit score sent by each model training server, and performs statistics on the credit score output by each predictor model according to the misjudgment rate of each predictor model to obtain the credit score of the target user.
When the statistical server has determined the credit score output by each classifier in each predictor model, namely C_11(x), C_12(x), C_13(x), …, C_jt(x), …, C_JT(x), and the misjudgment rates E_11, E_12, E_13, …, E_jt, …, E_JT, P(x) or H(x) can be calculated by applying the formulas in step 104.
Optionally, after P(x) is obtained, P(x) may further be used as the first credit score, and the formula for calculating the second credit score S is applied to obtain the second credit score of the target user.
According to the method provided by the embodiment of the invention, the characteristic data adopted by different model training servers when training represents the characteristics of the sample user in different aspects, and the credit scores adopted by different model training servers when training represent the credibility of the sample user in different aspects, so that the statistical server adopts the credit scores provided by at least two predictor models and the misjudgment rate of the at least two predictor models for statistics, the obtained credit score can represent the credibility of the target user in at least two aspects, the predicted credit score is more comprehensive, and the prediction accuracy is improved.
In the related art, when obtaining the prediction model, the data source is required to transmit the feature data of the sample user and the target user to the statistical server. The feature data is usually huge in data size, long in time consumption in the transmission process, large in occupied transmission resources and difficult to implement. Moreover, the feature data often relates to the privacy of the user, and once the feature data is disclosed, the risk that the privacy of the user is stolen is caused. Meanwhile, the data formats of the feature data stored in different data sources may be different, and data barriers exist between different data sources, which may result in that the statistical server cannot identify all the feature data.
In the embodiment of the invention, the model training server executes the process of training the characteristic data and predicting the credit score according to the stored characteristic data, the characteristic data does not need to be transmitted to the statistical server, and only the credit score of the target user and the misjudgment rate of the prediction sub-model need to be transmitted to the statistical server.
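The message flow of steps 302 to 304 can be sketched as follows, to show that only misjudgment rates and credit scores leave a model training server. All class and function names are hypothetical, and the `aggregate` function is a stand-in that weights each score by (1 - E_jt); it is NOT the formula of step 104, whose figures are not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ModelTrainingServer:
    features: Dict[str, float]                   # user id -> feature data, kept locally
    classifiers: List[Callable[[float], float]]  # trained classifiers C_jt
    rates: List[float]                           # misjudgment rates E_jt

    def predict(self, user_id: str) -> List[Tuple[float, float]]:
        """Step 303: reply with (E_jt, C_jt(x)) pairs, never with x itself."""
        x = self.features[user_id]
        return [(e, clf(x)) for e, clf in zip(self.rates, self.classifiers)]


def aggregate(replies: List[List[Tuple[float, float]]]) -> float:
    """Stand-in statistic (an assumption): average weighted by 1 - E_jt."""
    pairs = [pair for reply in replies for pair in reply]
    total = sum(1.0 - e for e, _ in pairs)
    return sum((1.0 - e) * score for e, score in pairs) / total


class StatisticalServer:
    def __init__(self, servers: List[ModelTrainingServer]) -> None:
        self.servers = servers

    def credit_score(self, user_id: str) -> float:
        """Steps 302 and 304: send the identifier, then combine the replies."""
        replies = [server.predict(user_id) for server in self.servers]
        return aggregate(replies)


financial = ModelTrainingServer({"u1": 0.8}, [lambda x: x], [0.2])
social = ModelTrainingServer({"u1": 0.2}, [lambda x: x], [0.4])
score = StatisticalServer([financial, social]).credit_score("u1")
```

In this arrangement the statistical server sees the target user identifier, the scores, and the misjudgment rates, while the feature data stays on the server that stores it.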
In practical application, each model training server can directly send the trained predictor models to the statistical server, and the statistical server respectively obtains the credit values of the target users based on each predictor model and performs statistics to determine the final credit values.
Specifically, referring to fig. 3B, fig. 3B is a flowchart of another credit score prediction method according to an embodiment of the present invention. The interaction subject of the credit score prediction method comprises a statistical server and at least two model training servers. The method comprises the following steps:
311. Each model training server acquires sample data, trains according to the sample data to obtain a predictor model, and acquires the misjudgment rate of the predictor model.
312. Each model training server sends its predictor model and the corresponding misjudgment rate to the statistical server.
After the model training server trains on the sample data to obtain the predictor model, it can send the predictor model and the misjudgment rates directly to the statistical server; for example, the model training server sends (C_j1(x), E_j1), (C_j2(x), E_j2), …, (C_jT(x), E_jT).
313. The statistical server receives the predictor model and misjudgment rate sent by each model training server, and stores the at least two predictor models with their corresponding misjudgment rates.
After the statistical server obtains each predictor model and the corresponding misjudgment rate, it may combine the at least two predictor models through the formulas in step 104 to determine P(x) or H(x), where P(x) or H(x) is the combined result of the at least two predictor models.
314. The statistical server acquires the user identifier of the target user and sends the user identifier to each model training server.
315. Each model training server acquires the feature data of the target user according to the user identifier and sends the feature data of the target user to the statistical server.
Each model training server looks up the feature data corresponding to the user identifier of the target user, so that the feature data of the target user can be acquired; this feature data is the specific content of x.
316. The statistical server receives the feature data of the target user sent by each model training server, the feature data of the target user is respectively input into each predictor model to obtain the credit score output by each predictor model, and the credit score output by each predictor model is counted according to the misjudgment rate of each predictor model to obtain the credit score of the target user.
The difference between steps 311 to 316 above and the embodiment of fig. 2A above is that each model training server sends the predictor model, the misjudgment rate, and the feature data of the target user to the statistical server, and the statistical server inputs the feature data of the target user into each predictor model to obtain the credit score output by each predictor model. The other processing procedures are similar and are not described herein again.
According to the method provided by the embodiment of the invention, the characteristic data adopted by different model training servers when training represents the characteristics of the sample user in different aspects, and the credit scores adopted by different model training servers when training represent the credibility of the sample user in different aspects, so that the statistical server adopts the credit scores provided by at least two predictor models and the misjudgment rate of the at least two predictor models for statistics, the obtained credit score can represent the credibility of the target user in at least two aspects, the predicted credit score is more comprehensive, and the prediction accuracy is improved.
Fig. 4 is a schematic structural diagram of a credit score prediction apparatus according to an embodiment of the present invention. Referring to fig. 4, the apparatus includes: an acquisition module 401, a training module 402, a prediction module 403, and a statistics module 404.
An obtaining module 401, configured to execute step 101 in the embodiment shown in fig. 1, or execute the obtaining process of step 301 in the embodiment shown in fig. 3A, or execute the obtaining process of step 311 in the embodiment shown in fig. 3B.
A training module 402, configured to perform step 102 in the embodiment shown in fig. 1, or configured to perform the training process of step 301 in the embodiment shown in fig. 3A, or configured to perform the training process of step 311 in the embodiment shown in fig. 3B.
The prediction module 403 is configured to execute step 103 in the embodiment shown in fig. 1, or execute the process of obtaining the credit score output by the predictor model in step 303 in the embodiment shown in fig. 3A, or execute the process of obtaining the credit score output by the predictor model in step 316 in the embodiment shown in fig. 3B.
A statistic module 404, configured to perform step 104 in the embodiment shown in fig. 1, or configured to perform the process of obtaining the credit score of the target user in step 304 in the embodiment shown in fig. 3A, or configured to perform the process of obtaining the credit score of the target user in step 316 in the embodiment shown in fig. 3B.
Optionally, the training module 402 includes:
A training submodule, configured to execute the process of obtaining the predictor model in each of the above embodiments.
An input submodule, configured to execute the process of inputting the feature data into the predictor model in each of the above embodiments.
A determining submodule, configured to execute the process of determining the misjudgment rate of the predictor model in each of the above embodiments.
Optionally, the apparatus further comprises: the device comprises a calculation module and a set division module.
The calculating module is used for executing the process of calculating the second credit score according to the first credit score in each embodiment.
The set dividing module is configured to perform a process of dividing at least one classifier into at least one classifier set according to the sorting result in each of the above embodiments.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that, when the credit score prediction apparatus provided in the above embodiment predicts a credit score, the above division into functional modules is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the prediction apparatus may be divided into different functional modules to perform all or part of the functions described above. In addition, the credit score prediction apparatus provided in the above embodiments and the credit score prediction method embodiments belong to the same concept. The specific implementation process is described in the method embodiments and is not repeated here.
Fig. 5 is a schematic structural diagram of a server 500 according to an embodiment of the present invention. The server 500 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 522 (e.g., one or more processors), a memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. The memory 532 and the storage medium 530 may be transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the server 500, the series of instruction operations in the storage medium 530.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The server 500 may be configured to perform the steps performed by the prediction apparatus in the credit score prediction method provided in the foregoing embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (18)

1. A method for predicting a credit score, the method comprising:
acquiring sample data provided by each of at least two data sources, wherein the sample data provided by each data source comprises characteristic data and credit scores of at least two sample users, the characteristic data provided by different data sources is used for describing the characteristics of the sample users in different aspects, and the credit scores provided by different data sources are used for representing the credibility of the sample users in different aspects;
training according to sample data provided by each data source in the at least two data sources respectively to obtain at least two predictor models corresponding to the at least two data sources, and obtaining the misjudgment rate of each predictor model, wherein the misjudgment rate is used for representing the probability of prediction error of the predictor models;
respectively inputting the characteristic data of a target user into each predictor model to obtain a credit score output by each predictor model;
according to the misjudgment rate of each predictor model, counting the credit score output by each predictor model to obtain the credit score of the target user;
the training according to the sample data provided by each of the at least two data sources respectively to obtain at least two predictor models corresponding to the at least two data sources and obtain the misjudgment rate of each predictor model includes:
for each data source, training according to sample data provided by the data source to obtain a predictor model;
inputting the characteristic data of at least two sample users in the sample data into the predictor model to obtain the prediction credit scores of the at least two sample users output by the predictor model;
and determining the misjudgment rate of the predictor model according to the credit scores of the at least two sample users in the sample data and the predicted credit scores of the at least two sample users output by the predictor model.
2. The method of claim 1, wherein the at least two data sources include at least one of a financial data source, a traffic data source, a social data source, a health data source, a base data source;
the financial data source is used for providing financial characteristic data and financial credit scores of at least two sample users;
the traffic data source is used for providing traffic characteristic data and traffic credit scores of at least two sample users;
the social data source is used for providing social characteristic data and social credit scores of at least two sample users;
the health data source is used for providing health feature data and health credit scores of at least two sample users;
the base data source is for providing base feature data and base credit scores for at least two sample users.
3. The method of claim 1, wherein determining the misjudgment rate of the predictor model according to the credit scores of the at least two sample users in the sample data and the predicted credit scores of the at least two sample users output by the predictor model comprises:
comparing and counting the credit scores of the at least two sample users in the sample data and the predicted credit scores of the at least two sample users output by the predictor model to obtain the number of sample users with wrong prediction;
and taking the ratio of the number of the sample users with wrong prediction to the total number of the at least two sample users as the misjudgment rate of the predictor model.
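As an illustration only, the ratio described in claim 3 can be sketched in Python as follows. The sketch assumes that credit scores are discrete labels, so that a prediction counts as wrong whenever the predicted label differs from the label in the sample data; the function name `misjudgment_rate` and the label values +1 and -1 are hypothetical and do not come from the claims.

```python
from typing import Sequence


def misjudgment_rate(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Return the share of sample users whose predicted score differs from the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) < 2:
        raise ValueError("at least two sample users are required")
    # ASSUMPTION: a prediction is wrong when the two labels are not equal.
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical labels: +1 = credible, -1 = not credible.
rate = misjudgment_rate([1, -1, 1, 1], [1, 1, 1, -1])
print(rate)
```

Here two of the four sample users are predicted incorrectly, so the ratio of wrong predictions to the total number of sample users is one half.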
4. The method of claim 1, wherein each predictor model comprises at least one classifier, and wherein the misjudgment rate of each predictor model comprises the misjudgment rate of at least one classifier in each predictor model;
the step of respectively inputting the characteristic data of the target user into each predictor model to obtain the credit score output by each predictor model comprises the following steps:
and for each of the at least two predictor models, inputting the characteristic data of the target user into a classifier of the predictor model to obtain a credit score output by at least one classifier of the predictor model.
5. The method of claim 4, wherein the credit score output by each predictor model comprises the credit score output by at least one classifier in each predictor model, and the number of classifiers in different predictor models is equal;
the step of counting the credit score output by each predictor model according to the misjudgment rate of each predictor model to obtain the credit score of the target user comprises the following steps:
according to the misjudgment rate of at least one classifier in each predictor model, applying the following formula to the credit score output by at least one classifier in each predictor model for statistics to obtain the credit score of the target user:
Figure FDA0002989047000000031
wherein j represents the identifier of a predictor model, J represents the number of predictor models, and j is a positive integer not greater than J; t represents the identifier of a classifier, T represents the number of classifiers in a predictor model, and t is a positive integer not greater than T; x represents the feature data of the target user; E_jt represents the misjudgment rate of classifier t in predictor model j; C_jt(x) represents the credit score output by classifier t in predictor model j; P(x) represents the credit score of the target user; and sign is the sign function,
Figure FDA0002989047000000032
6. The method of claim 5, further comprising:
counting the credit score output by each predictor model, and after obtaining a first credit score of the target user, calculating a second credit score of the target user by applying the following formula, wherein the numerical range of the second credit score is different from that of the first credit score:
S=B+ln(1/P(x));
wherein P(x) represents the first credit score of the target user, B represents a preset benchmark value, and S represents the second credit score of the target user.
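The conversion S = B + ln(1/P(x)) in claim 6 can be sketched in Python as below. This is a minimal illustration: the function name `second_credit_score` and the example values B = 600 and P(x) = 0.5 are hypothetical, and the check that P(x) is positive is an assumption added so that the logarithm is defined.

```python
import math


def second_credit_score(p_x: float, base: float) -> float:
    """Map a first credit score P(x) to a second score S = B + ln(1 / P(x))."""
    # ASSUMPTION: P(x) must be positive, otherwise ln(1 / P(x)) is undefined.
    if p_x <= 0:
        raise ValueError("P(x) must be positive")
    return base + math.log(1.0 / p_x)


# Hypothetical inputs: benchmark value B = 600, first credit score P(x) = 0.5.
s = second_credit_score(0.5, 600.0)
print(s)
```

Because ln(1/P(x)) grows as P(x) shrinks, a smaller first credit score maps to a larger second credit score under this formula, and P(x) = 1 maps exactly to the benchmark value B.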
7. The method of claim 4, wherein the credit score output by each predictor model comprises the credit score output by at least one classifier in each predictor model, and the number of classifiers in different predictor models is equal;
the step of counting the credit score output by each predictor model according to the misjudgment rate of each predictor model to obtain the credit score of the target user comprises the following steps:
according to the misjudgment rate of at least one classifier in each predictor model, applying the following formula to the credit score output by at least one classifier in each predictor model for statistics to obtain the credit score of the target user:
Figure FDA0002989047000000041
wherein j represents the identifier of a predictor model, J represents the number of predictor models, and j is a positive integer not greater than J; t represents the identifier of a classifier, T represents the number of classifiers in a predictor model, and t is a positive integer not greater than T; x represents the feature data of the target user; E_jt represents the misjudgment rate of classifier t in predictor model j; C_jt(x) represents the credit score output by classifier t in predictor model j; H(x) represents the credit score of the target user; and sign is the sign function.
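The formula of claim 7 is published as an image and is not reproduced in this text, so the following Python sketch should not be read as that formula. It only illustrates one common way of combining classifier outputs C_jt(x) with misjudgment rates E_jt inside a sign function, namely an AdaBoost-style weighted vote. The weight 0.5 * ln((1 - E_jt) / E_jt), the convention that each classifier outputs +1 or -1, and the names `sign` and `weighted_vote` are all assumptions.

```python
import math
from typing import Sequence


def sign(value: float) -> int:
    """Return +1 for a positive value, -1 for a negative value, and 0 for zero."""
    return (value > 0) - (value < 0)


def weighted_vote(scores: Sequence[Sequence[int]],
                  error_rates: Sequence[Sequence[float]]) -> int:
    """Combine outputs C_jt(x) of J predictor models with T classifiers each.

    ASSUMPTION: the weight 0.5 * ln((1 - E) / E) follows the AdaBoost
    convention; it is not taken from the claims.
    """
    total = 0.0
    for model_scores, model_errors in zip(scores, error_rates):
        for c_jt, e_jt in zip(model_scores, model_errors):
            if not 0.0 < e_jt < 1.0:
                raise ValueError("each misjudgment rate must lie strictly between 0 and 1")
            # Classifiers with a lower misjudgment rate receive a larger weight.
            total += 0.5 * math.log((1.0 - e_jt) / e_jt) * c_jt
    return sign(total)


# Hypothetical case: J = 2 predictor models, T = 2 classifiers per model.
h = weighted_vote([[1, -1], [1, 1]], [[0.2, 0.4], [0.3, 0.45]])
print(h)
```

Under this assumed weighting, a classifier whose misjudgment rate is close to 0.5 contributes almost nothing to the vote, while a classifier with a low misjudgment rate dominates it.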
8. The method of claim 4, further comprising:
sequencing at least one classifier in each predictor model according to the sequence of the misjudgment rate from large to small, dividing the classifier into at least one classifier set according to the sequencing result, and acquiring the misjudgment rate of each classifier set, wherein each classifier set of the same predictor model comprises the same number of classifiers, and the number of the classifier sets of different predictor models is equal;
after the feature data of the target user is input into the classifier of the predictor model and the credit score output by at least one classifier of the predictor model is obtained, the method further comprises the following steps:
for each classifier set, carrying out statistics according to the credit score output by each classifier in the classifier set to obtain the credit score output by the classifier set;
the step of counting the credit score output by each predictor model according to the misjudgment rate of each predictor model to obtain the credit score of the target user comprises the following steps:
and counting the credit score output by at least one classifier set in each predictor model according to the misjudgment rate of at least one classifier set in each predictor model to obtain the credit score of the target user.
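The grouping step of claim 8 can be sketched in Python as follows. The sort order (misjudgment rate from large to small) and the equal set sizes follow the claim. The claim does not state here how the misjudgment rate or the credit score of a classifier set is computed, so the use of the arithmetic mean in `set_statistics` is an assumption, and both function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def split_into_sets(error_rates: Sequence[float], set_size: int) -> List[List[int]]:
    """Sort classifier indices by misjudgment rate, largest first, then chunk them."""
    if set_size <= 0 or len(error_rates) % set_size != 0:
        raise ValueError("set_size must evenly divide the number of classifiers")
    order = sorted(range(len(error_rates)), key=lambda i: error_rates[i], reverse=True)
    return [order[i:i + set_size] for i in range(0, len(order), set_size)]


def set_statistics(groups: Sequence[Sequence[int]],
                   error_rates: Sequence[float],
                   scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (misjudgment rate, credit score) for each classifier set.

    ASSUMPTION: both values are arithmetic means over the classifiers in the set.
    """
    return [(mean(error_rates[i] for i in g), mean(scores[i] for i in g)) for g in groups]


# Hypothetical predictor model with four classifiers, grouped into sets of two.
errors = [0.1, 0.4, 0.3, 0.2]
outputs = [1, -1, 1, 1]
groups = split_into_sets(errors, 2)
print(groups, set_statistics(groups, errors, outputs))
```

The per-set values returned by `set_statistics` would then take the place of the per-classifier values when the credit score of the target user is computed.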
9. A credit score prediction apparatus, the apparatus comprising:
the acquisition module is used for acquiring sample data provided by each of at least two data sources, wherein the sample data provided by each data source comprises characteristic data and credit scores of at least two sample users, the characteristic data provided by different data sources is used for describing the characteristics of the sample users in different aspects, and the credit scores provided by different data sources are used for representing the credibility of the sample users in different aspects;
the training module is used for respectively training according to the sample data provided by each data source in the at least two data sources to obtain at least two predictor models corresponding to the at least two data sources and obtain the misjudgment rate of each predictor model, and the misjudgment rate is used for expressing the probability of prediction error of the predictor models;
the prediction module is used for respectively inputting the characteristic data of the target user into each predictor model to obtain the credit score output by each predictor model;
the statistical module is used for counting the credit score output by each predictor model according to the misjudgment rate of each predictor model to obtain the credit score of the target user;
the training module comprises:
the training submodule is used for training, for each data source, according to the sample data provided by the data source to obtain a predictor model;
the input submodule is used for inputting the characteristic data of at least two sample users in the sample data into the predictor model to obtain the prediction credit scores of the at least two sample users output by the predictor model;
and the determining submodule is used for determining the misjudgment rate of the predictor model according to the credit scores of the at least two sample users in the sample data and the predicted credit scores of the at least two sample users output by the predictor model.
10. The apparatus of claim 9, wherein the at least two data sources comprise at least one of a financial data source, a traffic data source, a social data source, a health data source, a base data source;
the financial data source is used for providing financial characteristic data and financial credit scores of at least two sample users;
the traffic data source is used for providing traffic characteristic data and traffic credit scores of at least two sample users;
the social data source is used for providing social characteristic data and social credit scores of at least two sample users;
the health data source is used for providing health feature data and health credit scores of at least two sample users;
the base data source is for providing base feature data and base credit scores for at least two sample users.
11. The apparatus according to claim 9, wherein the determining submodule is configured to compare and count the credit scores of the at least two sample users in the sample data and the predicted credit scores of the at least two sample users output by the predictor model, so as to obtain the number of sample users with wrong prediction; and to take the ratio of the number of the sample users with wrong prediction to the total number of the at least two sample users as the misjudgment rate of the predictor model.
12. The apparatus of claim 9, wherein each predictor model comprises at least one classifier, and wherein the misjudgment rate of each predictor model comprises the misjudgment rate of at least one classifier in each predictor model;
the prediction module is used for inputting, for each of the at least two predictor models, the characteristic data of the target user into a classifier of the predictor model to obtain the credit score output by at least one classifier of the predictor model.
13. The apparatus of claim 12, wherein the credit score output by each predictor model comprises the credit score output by at least one classifier in each predictor model, and wherein the number of classifiers in different predictor models is equal;
the statistical module is used for applying the following formula to carry out statistics on the credit score output by at least one classifier in each predictor model according to the misjudgment rate of at least one classifier in each predictor model to obtain the credit score of the target user:
Figure FDA0002989047000000071
wherein j represents the identifier of a predictor model, J represents the number of predictor models, and j is a positive integer not greater than J; t represents the identifier of a classifier, T represents the number of classifiers in a predictor model, and t is a positive integer not greater than T; x represents the feature data of the target user; E_jt represents the misjudgment rate of classifier t in predictor model j; C_jt(x) represents the credit score output by classifier t in predictor model j; P(x) represents the credit score of the target user; and sign is the sign function,
Figure FDA0002989047000000072
14. The apparatus of claim 13, further comprising:
a calculating module, configured to count credit scores output by each predictor model, and after obtaining a first credit score of the target user, apply the following formula to calculate a second credit score of the target user, where the second credit score is different from the first credit score in a numerical range:
S=B+ln(1/P(x));
wherein P(x) represents the first credit score of the target user, B represents a preset benchmark value, and S represents the second credit score of the target user.
15. The apparatus of claim 12, wherein the credit score output by each predictor model comprises the credit score output by at least one classifier in each predictor model, and wherein the number of classifiers in different predictor models is equal;
the statistical module is used for applying the following formula to carry out statistics on the credit score output by at least one classifier in each predictor model according to the misjudgment rate of at least one classifier in each predictor model to obtain the credit score of the target user:
Figure FDA0002989047000000081
wherein j represents the identifier of a predictor model, J represents the number of predictor models, and j is a positive integer not greater than J; t represents the identifier of a classifier, T represents the number of classifiers in a predictor model, and t is a positive integer not greater than T; x represents the feature data of the target user; E_jt represents the misjudgment rate of classifier t in predictor model j; C_jt(x) represents the credit score output by classifier t in predictor model j; H(x) represents the credit score of the target user; and sign is the sign function.
16. The apparatus of claim 12, further comprising:
the set dividing module is used for sequencing at least one classifier in each predictor model from large to small according to the misjudgment rate, dividing the classifier into at least one classifier set according to the sequencing result, and acquiring the misjudgment rate of each classifier set, wherein each classifier set of the same predictor model comprises the same number of classifiers, and the number of the classifier sets of different predictor models is equal;
the statistic module is further used for carrying out statistics on each classifier set according to the credit score output by each classifier in the classifier set to obtain the credit score output by the classifier set; and counting the credit score output by at least one classifier set in each predictor model according to the misjudgment rate of at least one classifier set in each predictor model to obtain the credit score of the target user.
17. A server, characterized in that the server comprises:
one or more processors;
a memory;
the memory stores a program for execution on the server by the processor to implement the steps in the credit score prediction method of any of claims 1-8.
18. A computer-readable storage medium, characterized in that the storage medium stores a program for implementing the steps in the credit score prediction method according to any one of claims 1 to 8.
CN201710076216.XA 2017-02-13 2017-02-13 Credit score prediction method and device Active CN108428001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710076216.XA CN108428001B (en) 2017-02-13 2017-02-13 Credit score prediction method and device

Publications (2)

Publication Number Publication Date
CN108428001A CN108428001A (en) 2018-08-21
CN108428001B true CN108428001B (en) 2021-05-25

Family

ID=63154950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710076216.XA Active CN108428001B (en) 2017-02-13 2017-02-13 Credit score prediction method and device

Country Status (1)

Country Link
CN (1) CN108428001B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209930B (en) * 2019-12-20 2023-08-11 上海淇玥信息技术有限公司 Method and device for generating trust policy and electronic equipment
CN111494964B (en) * 2020-06-30 2020-11-20 腾讯科技(深圳)有限公司 Virtual article recommendation method, model training method, device and storage medium
CN112508696A (en) * 2021-02-05 2021-03-16 北京淇瑀信息科技有限公司 Channel user quality evaluation method and device and electronic equipment
CN112925911B (en) * 2021-02-25 2022-08-12 平安普惠企业管理有限公司 Complaint classification method based on multi-modal data and related equipment thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004206167A (en) * 2002-12-20 2004-07-22 Fujitsu Ltd Case prediction device and method
CN104867051A (en) * 2015-06-17 2015-08-26 韩璐 Classification method and device based on support vector machine
CN106022892A (en) * 2016-05-30 2016-10-12 深圳市华傲数据技术有限公司 Credit scoring model update method and credit scoring model update system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130091050A1 (en) * 2011-10-10 2013-04-11 Douglas Merrill System and method for providing credit to underserved borrowers
CN106127570A (en) * 2016-06-16 2016-11-16 腾讯科技(深圳)有限公司 The stability indicator of credit investigation system generates method and device

Also Published As

Publication number Publication date
CN108428001A (en) 2018-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant