CN111695084A - Model generation method, credit score generation method, device, equipment and storage medium - Google Patents

Info

Publication number
CN111695084A
Authority
CN
China
Prior art keywords
user
credit
sample
credit score
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010340909.7A
Other languages
Chinese (zh)
Inventor
李金洋
郎官宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010340909.7A
Publication of CN111695084A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02: Banking, e.g. interest calculation or account maintenance

Abstract

The invention provides a model generation method, a credit score generation method, a device, equipment and a storage medium, and belongs to the technical field of electronic equipment. In the method, users corresponding to different credit degrees are selected as sample users, where the sample users include at least users whose credit degree is greater than a preset threshold and users whose credit degree is not greater than the preset threshold. Sample characteristics of the sample users are obtained from the user-related information of the sample users, and sample credit scores of the sample users are generated. The sample characteristics and the sample credit scores are then used as training data to train an initial credit score model and obtain a target credit score model, which can improve the accuracy of credit scores subsequently determined with the target credit score model.

Description

Model generation method, credit score generation method, device, equipment and storage medium
Technical Field
The invention belongs to the technical field of electronic equipment, and particularly relates to a model generation method, a credit score generation method, a device, equipment and a storage medium.
Background
With the continuing spread of the internet, the number of internet users keeps growing. When using the internet, users sometimes need to borrow resources from resource providers in the network, where the resource may be money, an electronic device, clothing, and so on. It is therefore often necessary to determine a credit score for a user, on the basis of which a credit assessment is made.
In the prior art, the score corresponding to a user's user-related information is usually looked up in a preset credit score card and used as the credit score of the user. Because the preset credit score card is usually built manually from experience, a credit score determined by direct lookup has low accuracy.
Disclosure of Invention
The invention provides a model generation method, a credit score generation method, a device, equipment and a storage medium, which are used for solving the problem of low accuracy of credit score.
In a first aspect of the present invention, there is provided a model generation method, including:
selecting users corresponding to different credit degrees as sample users; the sample users at least comprise users with credit degrees larger than a preset threshold value and users with credit degrees not larger than the preset threshold value;
acquiring sample characteristics of the sample user according to the user related information of the sample user, and generating a sample credit score of the sample user;
and taking the sample characteristics and the sample credit scores as training data, and training an initial credit score model to obtain a target credit score model.
In a second aspect of the present invention, there is also provided a credit score generation method, including:
generating user characteristics of a user to be predicted according to user related information of the user to be predicted;
taking the user characteristics of the user to be predicted as the input of a target credit scoring model, and generating the credit score of the user to be predicted by using the target credit scoring model;
wherein the target credit score model is generated using the model generation method of the first aspect.
In a third aspect of the present invention, there is also provided a model generation apparatus, including:
the selecting module is used for selecting users corresponding to different credit degrees as sample users; the sample users at least comprise users with credit degrees larger than a preset threshold value and users with credit degrees not larger than the preset threshold value;
the acquisition module is used for acquiring the sample characteristics of the sample user according to the user related information of the sample user and generating the sample credit score of the sample user;
and the training module is used for training an initial credit score model by taking the sample characteristics and the sample credit score as training data so as to obtain a target credit score model.
In a fourth aspect of the present invention, there is also provided a credit score generation apparatus, including:
the first generation module is used for generating the user characteristics of the user to be predicted according to the user related information of the user to be predicted;
and the second generation module is used for taking the user characteristics of the user to be predicted as the input of a target credit scoring model and generating the credit score of the user to be predicted by utilizing the target credit scoring model;
wherein the target credit score model is generated using the model generation apparatus of the third aspect.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform any of the methods described above.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the methods described above.
According to the model generation method provided by the embodiment of the invention, users corresponding to different credit degrees can be selected as sample users, wherein the sample users at least comprise users with the credit degree larger than a preset threshold value and users with the credit degree not larger than the preset threshold value, then, the sample characteristics of the sample users are obtained according to the user related information of the sample users, the sample credit scores of the sample users are generated, and finally, the sample characteristics and the sample credit scores are used as training data to train an initial credit score model so as to obtain a target credit score model. As the sample users corresponding to different credit degrees are used in the process of generating the target credit scoring model, the initial credit scoring model can learn the characteristics of the user with higher credibility and the characteristics of the user with ordinary or lower credibility in the training process, the prediction capability of the finally generated target credit scoring model can be ensured to a certain extent, and the accuracy of the credit scoring determined by subsequently using the target credit scoring model is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart of the steps of a model generation method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of the steps of a credit score generation method provided by an embodiment of the present invention;
FIG. 3 is a diagram illustrating a credit score generation process provided by an embodiment of the present invention;
FIG. 4 is a block diagram of a model generation apparatus provided by an embodiment of the present invention;
FIG. 5-1 is a block diagram of a credit score generation apparatus provided by an embodiment of the present invention;
FIG. 5-2 is a block diagram of another credit score generation apparatus provided by an embodiment of the present invention;
FIG. 5-3 is a block diagram of still another credit score generation apparatus provided by an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
Fig. 1 is a flowchart of steps of a model generation method provided in an embodiment of the present invention, and as shown in fig. 1, the method may include:
Step 101: selecting users corresponding to different credit degrees as sample users, where the sample users include at least users whose credit degree is greater than a preset threshold and users whose credit degree is not greater than the preset threshold.
In the embodiment of the present invention, the user may be a user in a network platform, for example, a user in a video network platform. The preset threshold may be set according to actual requirements, and may represent an average credit degree of a user in the network, and the credit degree of the user may be considered to be higher if the credit degree of the user is greater than the preset threshold, and the credit degree of the user may be considered to be normal or lower if the credit degree of the user is not greater than the preset threshold. Correspondingly, at least the user with the credit degree larger than the preset threshold value and the user with the credit degree not larger than the preset threshold value are selected as the sample user, so that the sample user comprises the user with the higher credit degree and the user with the common or lower credit degree, and further, when the training is carried out based on the sample user, the characteristics of the users with different credit degrees can be learned by the initial credit scoring model.
Step 102: obtaining the sample characteristics of the sample users according to the user-related information of the sample users, and generating the sample credit scores of the sample users.
In the embodiment of the invention, the user related information can be information related to the sample user, the sample credit score can be a score for reflecting the real credit degree of the sample user, and further, the sample characteristics of the sample user can be obtained and the sample credit score of the sample user can be generated according to the user related information, so that a training sample which is convenient for the initial credit score model to process can be obtained, and further, the subsequent training is convenient.
Step 103: taking the sample characteristics and the sample credit scores as training data, and training an initial credit score model to obtain a target credit score model.
In the embodiment of the invention, the initial credit scoring model can be chosen and constructed according to actual requirements. For example, the initial credit scoring model may be a Logistic Regression (LR) model, a Light Gradient Boosting Machine (LightGBM) model, and so on. The initial credit scoring model can be trained iteratively on the sample characteristics and the sample credit scores until its loss value meets the requirement, that is, until its prediction accuracy meets the requirement. The initial credit scoring model whose prediction accuracy meets the requirement can then be used as the target credit scoring model.
In summary, in the model generation method provided in the embodiment of the present invention, users corresponding to different credit degrees may be selected as sample users, where the sample users at least include users whose credit degrees are greater than a preset threshold and users whose credit degrees are not greater than the preset threshold, then, according to user-related information of the sample users, sample characteristics of the sample users are obtained, and sample credit scores of the sample users are generated, and finally, the sample characteristics and the sample credit scores are used as training data to train an initial credit score model, so as to obtain a target credit score model. As the sample users corresponding to different credit degrees are used in the process of generating the target credit scoring model, the initial credit scoring model can learn the characteristics of the user with higher credibility and the characteristics of the user with ordinary or lower credibility in the training process, the prediction capability of the finally generated target credit scoring model can be ensured to a certain extent, and the accuracy of the credit scoring determined by subsequently using the target credit scoring model is further improved.
Optionally, the operation of selecting users corresponding to different credit degrees as sample users may be implemented by the following substeps (1) to (2):
Substep (1): acquiring the loan-related information of borrowed users.
In this step, a borrowed user is a user who has borrowed a resource from a resource provider. Accordingly, when the loan-related information of borrowed users is acquired, users of the platform where the resource provider is located may be taken as the borrowed users, and their loan-related information may then be obtained from the background server of the platform. The loan-related information may be an overdue-related parameter of the borrowed user within a preset time period.
Substep (2): according to the loan-related information of the borrowed users, selecting at least one non-overdue borrowed user from the borrowed users as a sample user, and selecting at least one borrowed user whose overdue condition meets a preset condition from the borrowed users as a sample user. The non-overdue borrowed users are users whose credit degree is greater than the preset threshold, and the borrowed users whose overdue condition meets the preset condition are users whose credit degree is not greater than the preset threshold.
Specifically, the overdue-related parameters may include at least the number of overdue occurrences and/or the overdue duration. When the sample users are obtained, a borrowed user whose overdue-related parameter is greater than a preset parameter threshold may be determined to be a borrowed user whose overdue condition meets the preset condition, and at least one borrowed user is selected from these users. A borrowed user whose overdue-related parameter is not greater than the preset parameter threshold is determined to be a non-overdue borrowed user, and at least one borrowed user is selected from the non-overdue borrowed users. This process of selecting the sample users is the process of defining the Y value. The preset parameter threshold may be set according to actual conditions. Because both borrowed users whose overdue-related parameters are not greater than the preset parameter threshold and borrowed users whose overdue-related parameters are greater than it are selected as sample users, the initial credit scoring model can learn, during training, both the characteristics of good users with higher credibility and the characteristics of bad users with lower credibility, which helps ensure the prediction capability of the finally generated target credit scoring model.
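The selection of sample users by overdue-related parameter can be sketched as follows. This is a minimal illustration only: the `Borrower` class, the `select_sample_users` function, the use of the overdue count as the single overdue-related parameter, and the 0/1 encoding of the Y value are all assumptions made for the example, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class Borrower:
    user_id: str
    overdue_count: int  # assumed overdue-related parameter within the preset period


def select_sample_users(borrowers, overdue_threshold):
    """Label borrowers by comparing the overdue parameter with the preset threshold."""
    # Not greater than the threshold: treated as non-overdue (higher credit degree).
    good = [b for b in borrowers if b.overdue_count <= overdue_threshold]
    # Greater than the threshold: overdue condition meets the preset condition.
    bad = [b for b in borrowers if b.overdue_count > overdue_threshold]
    if not good or not bad:
        raise ValueError("need at least one sample user on each side of the threshold")
    # Assumed Y value encoding: 0 = non-overdue, 1 = overdue.
    return [(b.user_id, 0) for b in good] + [(b.user_id, 1) for b in bad]


borrowers = [Borrower("u1", 0), Borrower("u2", 3), Borrower("u3", 1)]
labeled = select_sample_users(borrowers, overdue_threshold=1)
```

The check that both groups are non-empty mirrors the requirement that the sample users include at least one user on each side of the threshold.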
Optionally, the user-related information may include related information of at least two dimensions, which may be obtained from a network. The related information may include at least user portrait information, usage information of applications and networks in the devices used by the user, viewing behavior information of the user, payment information, and/or click behavior information for a specified type of advertisement. For example, the user portrait information may include the gender, age, occupation and consumption level of the user, the city in which the user logs in to the network platform, the membership status in the network platform, and attributes of the device used to log in to the network platform, such as its memory, price, brand, and screen size. The usage information of applications and networks in the device used by the user may include the types of applications installed in the device and the usage data generated while using them, the Wi-Fi network usage locations and usage types, work and residence addresses determined from GPS-located usage positions, and the like. The viewing behavior information of the user may include viewing behavior in different time periods, on different channels, and over the networks used. The payment information may include membership purchase payments, platform mall payments, live-stream tipping information, and in-game charge information. Further, the specified type of advertisement may be a loan advertisement, and the click behavior information for the specified type of advertisement may include the number of times loan advertisements are clicked.
Because the data in the network is more comprehensive, in the embodiment of the invention, the relevance of the trained model can be ensured to a certain extent by acquiring the relevant information of the user in a plurality of different dimensions from the network to participate in the subsequent training.
Optionally, the operation of obtaining the sample features of the sample users according to the user-related information of the sample users may be implemented by the following substeps (3) to (5):
Substep (3): vectorizing the related information of the at least two dimensions to obtain features to be selected of at least two dimensions.
In this step, vectorization means converting the related information into a form that a computer can process, that is, into feature form. Specifically, the related information of each dimension may be input into a preset vector space model, which may be a bag-of-words model; the model processes the related information, and its output is used as the feature to be selected. Other vectorization methods may of course be used, and the embodiment of the present invention is not limited in this respect.
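As a hedged illustration of bag-of-words vectorization, the sketch below turns whitespace-separated tokens into count vectors. The function names, the tokenization by `split()`, and the example strings describing installed application types are assumptions for the example; the text does not specify how the vector space model is built.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a fixed column index (sorted for determinism)."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(document.split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical related information: types of applications installed by two users.
installed_apps = ["video loan game", "video video music"]
vocab = build_vocabulary(installed_apps)
vectors = [bag_of_words(doc, vocab) for doc in installed_apps]
```

Each position in a vector corresponds to one token of the vocabulary, so related information of different users is mapped into the same feature space.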
Substep (4): determining the importance corresponding to the feature to be selected of each dimension.
Optionally, when the importance corresponding to the feature to be selected of a dimension is determined, a decision tree corresponding to that feature may be fitted according to the negative gradient of the loss function of the initial credit score model, and the gains of the branches in the decision tree are then summed to obtain the importance of the feature to be selected of the dimension.
For example, a constant that minimizes the loss function may first be estimated as the root node. The negative gradient of the loss function under the current initial credit scoring model is then calculated, and the leaf node regions of the decision tree are estimated from the negative gradient so as to fit an approximation of the residual. Next, the value that minimizes the loss function in each leaf node region is estimated by line search and used as the leaf node, and the nodes are connected in the order in which the root node and the leaf nodes were determined to obtain the decision tree. Finally, the gains corresponding to the nodes on each branch are calculated and summed to obtain the importance of the feature to be selected of the dimension. Further, taking the case where the initial credit scoring model is the LightGBM model as an example, the gain of a node may be defined as follows:
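$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$
where $G_L$ and $H_L$ are, respectively, the first-order gradient sum and the second-order gradient sum of the loss function on the left child node of the node, $G_R$ and $H_R$ are, respectively, the first-order gradient sum and the second-order gradient sum of the loss function on the right child node of the node, $\lambda$ is the regularization term, and $\gamma$ is a penalty factor for the tree complexity.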
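The gain computation and its summation into a feature importance can be sketched as below. The sketch assumes the standard second-order split gain used by gradient-boosted tree libraries, expressed with the same quantities (left and right gradient sums, a regularization term, and a complexity penalty); the function names and the example numbers are hypothetical.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Assumed standard split gain:
    0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma
    """
    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def feature_importance(splits, lam=1.0, gamma=0.0):
    """Sum the gains of all splits made on one feature."""
    return sum(split_gain(*s, lam=lam, gamma=gamma) for s in splits)


# Hypothetical (G_L, H_L, G_R, H_R) tuples for two splits on the same feature.
splits = [(4.0, 1.0, -4.0, 1.0), (2.0, 1.0, -2.0, 1.0)]
importance = feature_importance(splits)
```

A split that separates samples with opposite gradient signs yields a large gain, which is why features that produce such splits accumulate high importance.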
Substep (5): selecting the features to be selected of the first m dimensions with the highest importance as the sample features, where m is an integer not less than 1.
In an actual application scenario, the sample features available for training often span many dimensions. Therefore, in the embodiment of the present invention, the features to be selected may be ranked by importance, and the features to be selected of the first m dimensions with the highest importance are then selected as the sample features to participate in training. Selecting features with high importance in this way can improve, to a certain extent, the accuracy of the target credit scoring model trained on them. The specific value of m may be set according to requirements; for example, m may be 3, 5, and so on.
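The ranking and selection of the first m features can be sketched as follows. The function name and the short feature keys are assumptions for the example; the importance values are taken from Table 1 of this document.

```python
def select_top_m(importances, m):
    """Return the names of the m features with the highest importance."""
    if m < 1:
        raise ValueError("m must be an integer not less than 1")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:m]]


# Importance values from Table 1; the keys are hypothetical short names.
importances = {
    "age": 1208.03,
    "gender": 1636.69,
    "education_level": 2070.44,
    "occupation": 819.07,
}
top3 = select_top_m(importances, 3)
```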
For example, table 1 shows several candidate features with higher Importance for the LightGBM model and their corresponding Importance levels (importances). In the embodiment of the invention, the training is carried out by combining the multi-dimensional user characteristics in the network, so that the characteristics contained in the characteristic dictionary used for training are richer, and the accuracy of the model obtained by training can be improved to a certain extent. The feature dictionary may be composed of sample features, and the network mentioned in the embodiment of the present invention may be the entire internet, or may be a certain network platform, for example, a network platform for playing videos, a network platform for playing music, or the like.
Candidate feature                                                                      Importance
Number of Apple login devices                                                          4700.91
Number of loan apps installed within 30 days of the application date                   2141.79
Education level                                                                        2070.44
Gender                                                                                 1636.69
Average daily viewing duration in the 04-07 time period over the past month            1241.58
Age                                                                                    1208.03
Tier of the city where the device most often logs in                                   1202.89
Average daily share of viewing duration in the 04-07 time period over the past month   906.14
Occupation                                                                             819.07
Phone read-only memory (ROM)                                                           809.35
TABLE 1
Further, the LightGBM model is an implementation of the gradient boosting decision tree (GBDT) ensemble learning algorithm and an optimized framework relative to the XGBoost model; its principle is similar to that of the GBDT algorithm and the XGBoost model. Compared with the XGBoost model, and without reducing accuracy, the LightGBM model is faster, uses less memory, supports parallel learning, can process large-scale data directly, and supports the direct use of categorical features. Therefore, using the LightGBM model as the initial credit scoring model for training in the embodiment of the invention can, to a certain extent, yield a target credit scoring model with a better effect.
Optionally, the operation of generating the sample credit score of the sample user may be implemented by the following substeps (6) to (8):
Substep (6): determining the score corresponding to the related information of each dimension according to a preset correspondence between related information and scores.
The preset correspondence between related information and scores may be predefined, and may be a financial score card. Specifically, the score corresponding to the related information of each dimension can be looked up in this correspondence.
Substep (7): selecting the related information of the first n dimensions with the highest corresponding scores as target related information, where n is an integer not less than 1.
In this step, the related information may be sorted by score, and the n pieces of related information with the highest scores may be selected as the target related information according to the sorting result. The specific value of n may be set according to requirements; for example, n may be 3, 5, and so on. For example, part of the sorting result may be as shown in Table 2:
Related information                                                            IV
Number of Apple login devices                                                  0.15
Number of loan apps installed within 30 days of the application date           0.13
Gender                                                                         0.12
Industry                                                                       0.09
Average daily viewing duration in the 04-07 time period over the past month    0.08
Average daily share of viewing duration over the past month                    0.07
Tier of the city where the device most often logs in                           0.07
Number of loan product types applied for in the past 7 days                    0.07
Online viewing duration on the mobile terminal over the past month             0.07
TABLE 2
Substep (8): determining the sum of the scores of the target related information as the sample credit score of the sample user.
In this step, the scores of the target related information are summed to obtain the final sample credit score. The value of the sample credit score may be used to reflect the true trustworthiness of the sample user. In the embodiment of the invention, the related information with high scores is selected as the target related information and the sample credit score is determined from it. When the related information spans a large number of dimensions, this can reduce the computational complexity of the sample credit score to a certain extent while keeping the sample credit score representative.
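Substeps (6) to (8) can be sketched together as a lookup, a top-n selection, and a sum. The scorecard contents, dimension names, and point values below are invented for illustration; the text only states that a preset correspondence such as a financial score card is used.

```python
def sample_credit_score(related_info, scorecard, n):
    """Look up a score per dimension, keep the n highest, and sum them."""
    scores = []
    for dimension, value in related_info.items():
        # Dimensions or values absent from the scorecard contribute nothing.
        if dimension in scorecard and value in scorecard[dimension]:
            scores.append(scorecard[dimension][value])
    top_n = sorted(scores, reverse=True)[:n]
    return sum(top_n)


# Hypothetical scorecard: dimension -> value -> score.
scorecard = {
    "gender": {"female": 30, "male": 25},
    "education": {"bachelor": 40, "high_school": 20},
    "loan_apps_30d": {"0": 50, "1-2": 35, "3+": 10},
}
user = {"gender": "female", "education": "bachelor", "loan_apps_30d": "3+"}
score = sample_credit_score(user, scorecard, n=2)
```

With n=2, only the two highest-scoring dimensions (education and gender in this example) contribute to the sample credit score.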
Optionally, when generating a sample credit score of a sample user, the sample feature may be input into the initial credit score model, and the sample feature may be processed according to the initial credit score model to generate a predicted credit score; calculating a loss value for the initial credit score model based on the predicted credit score and the sample credit score; if the loss value is not in the preset range, adjusting the initial credit scoring model based on the loss value, continuing training the adjusted initial credit scoring model until the loss value is in the preset range, and taking the initial credit scoring model as the target credit scoring model.
In particular, the initial credit scoring model may include a multi-layer structure, each of which may implement a different process. The sample features are input into a credit scoring model that processes the sample features through the layers that it includes, and finally, a predicted credit score is generated. The predicted credit score and the sample credit score may then be input to a loss function, where the loss function may be pre-constructed with the degree of deviation between the predicted credit score and the sample credit score as an increasing function of the argument. Thus, the greater the deviation between the predicted credit score and the sample credit score, i.e., the poorer the predictive power of the initial credit score model, the greater the loss value of the loss function is guaranteed, and vice versa. Therefore, the calculated loss value can more accurately represent the precision of the initial credit scoring model, and when the initial credit scoring model is adjusted through the loss value subsequently, the initial credit scoring model can be adjusted correspondingly, namely, corresponding punishment is given to the initial credit scoring model according to the error degree of the initial credit scoring model.
Further, if the loss value is not within the preset range, the processing capacity of the initial credit scoring model may not yet meet the evaluation requirement, so the parameters of the model may be adjusted and training continued in order to optimize it further. Specifically, the parameters of the initial credit scoring model can be adjusted based on the degree of error represented by the loss value, and training continues on the adjusted model until the loss value is within the preset range, that is, until the processing capacity of the model meets the evaluation requirement. The adjusted model is then used as the target credit scoring model, which ensures the precision of the trained target credit scoring model. It should be noted that the foregoing processes of determining sample users, generating sample features, and training may be implemented by an offline sample collection module, a feature processing module, and a model training module, respectively.
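The adjust-and-continue loop described above can be illustrated with a deliberately small sketch. The assumptions are considerable: the "model" here is a one-feature linear predictor, the loss is mean squared error, the "preset range" is taken to mean `loss <= max_loss`, and parameters are adjusted by plain gradient descent. None of these choices is stated in the description; they only make the loop concrete.

```python
from typing import List, Tuple


def train_until_loss_in_range(
    xs: List[float],
    ys: List[float],
    max_loss: float = 1e-4,   # assumed form of the "preset range"
    lr: float = 0.05,         # assumed learning rate
    max_steps: int = 10_000,  # safety cap so the loop always terminates
) -> Tuple[float, float, float]:
    """Adjust (w, b) until the loss falls within the preset range."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(max_steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n  # grows with the deviation
        if loss <= max_loss:
            break  # the current model is taken as the target model
        # Adjust the parameters based on the error the loss represents.
        w -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * 2 * sum(errors) / n
    return w, b, loss


# Toy data generated from y = 2x + 1 (hypothetical sample features and scores).
w, b, final_loss = train_until_loss_in_range([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```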
Further, in the embodiment of the present invention, after the target credit scoring model is obtained, that is, after the loss value falls within the preset range, the target credit scoring model may be verified against test data. If the target credit scoring model passes this verification, a secondary verification is performed on it based on verification data; if the target credit scoring model fails the secondary verification, training of the model continues. The test data comprises the sample features and sample credit scores of test sample users, and the verification data comprises the sample features and sample credit scores of verification sample users. The time period corresponding to the data source of the test data is the same as that of the training data, whereas the time period corresponding to the data source of the verification data is later than that of the training data. The data source may be user-related data of users in the network, and the time period corresponding to the data source may be the period during which the user-related information was generated. For example, the time period corresponding to the data sources of the test data and the training data may be from June 1, 2018 to April 1, 2019, and the time period corresponding to the data source of the verification data may be from April 1, 2019 to June 1, 2019.
Specifically, the verification process may include inputting the sample features of the test sample users into the target credit scoring model, determining a first evaluation parameter according to the deviation between the predicted credit score given by the model and the sample credit score of the test sample users, and determining whether the target credit scoring model passes the verification based on the first evaluation parameter. The secondary verification may be performed by inputting the sample features of the verification sample users into the target credit scoring model, determining a second evaluation parameter according to the deviation between the predicted credit score given by the model and the sample credit score of the verification sample users, and determining whether the target credit scoring model passes the secondary verification based on the second evaluation parameter. Because the time period corresponding to the data source of the verification data is later, performing the secondary verification on the verification data tests the model's ability to generalize to future conditions. Continuing training when the secondary verification fails helps ensure that the final model predicts future conditions accurately, which further improves the accuracy of the model.
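The two-stage check can be sketched as below. The description does not say how the evaluation parameters are computed or what threshold is applied, so this sketch assumes the mean absolute deviation and a single hypothetical `max_deviation` threshold for both stages; the behaviour on failing the first verification is likewise an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (sample features, sample credit score)
Model = Callable[[Sequence[float]], float]


def evaluation_parameter(model: Model, data: List[Sample]) -> float:
    """Assumed evaluation parameter: mean absolute deviation of predictions."""
    return sum(abs(model(features) - score) for features, score in data) / len(data)


def verify_twice(model: Model, test_data: List[Sample],
                 verification_data: List[Sample], max_deviation: float) -> str:
    # First verification: test data from the same time period as the training data.
    if evaluation_parameter(model, test_data) > max_deviation:
        return "failed verification"
    # Secondary verification: verification data from a later time period.
    if evaluation_parameter(model, verification_data) > max_deviation:
        return "failed secondary verification: continue training"
    return "passed"


# Hypothetical model and data.
toy_model: Model = lambda features: 100.0 * features[0]
test_data = [([5.0], 500.0), ([6.0], 610.0)]
verification_data = [([4.0], 450.0)]
outcome = verify_twice(toy_model, test_data, verification_data, max_deviation=20.0)
```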
Further, compared with training on personal credit investigation data from the banking system, users make very heavy use of the network, so the user-related data contained in the network is more comprehensive than personal credit investigation data. Therefore, a model trained on information acquired from the network in the embodiment of the invention can compensate, to a certain extent, for the insufficient coverage of personal credit investigation data, so that the model covers more information about the user and the subsequent prediction effect for the user can be improved.
It should be noted that, because there are multiple initial credit scoring models that can be used for training, in the embodiment of the present invention, different initial credit scoring models may be used for training, and then a model that is more suitable for requirements in performance is selected according to the performance of the model obtained by training the different initial credit scoring models. For example, table 3 shows the performance of the model obtained by training with the LR model as the initial credit scoring model and the performance of the model obtained by training with the LightGBM model as the initial credit scoring model.
Figure BDA0002468454510000121
TABLE 3
Where OOT represents the combined set of the training set and the test set. ks refers to the maximum difference between the cumulative percentage of good users and the cumulative percentage of bad users under the same scoring criterion. auc indicates the area under the receiver operating characteristic (ROC) curve, which is a standard measure for judging the quality of the model. It can be seen that when the LR model is used as the initial credit scoring model, the ks value of the target credit scoring model varies within a smaller range across the training set, the test set, and the combined set, so the ks value is more stable; when the LightGBM model is used as the initial credit scoring model, the ks value varies within a larger range across these sets, so the stability is worse. The auc value of the target credit scoring model obtained with the LR model is smaller than that obtained with the LightGBM model. When the LightGBM model is used as the initial credit scoring model, the trained model is not easy to over-fit and the effect is better, but the stability is poorer. When the LR model is used as the initial credit scoring model, the trained model is stable, but its effect is not optimal.
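The two metrics defined above can be computed as in the following sketch. It assumes that a higher model output means a higher default risk and that labels use 1 for bad users and 0 for good users; the function names and the toy data are hypothetical and unrelated to the values in Table 3.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Maximum gap between cumulative bad and cumulative good percentages."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all users that share this score before comparing.
        while i < len(pairs) and pairs[i][0] == threshold:
            cum_bad += pairs[i][1]
            cum_good += 1 - pairs[i][1]
            i += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


toy_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1]
ks_value = ks_statistic(toy_scores, toy_labels)
auc_value = auc(toy_scores, toy_labels)
```

The pairwise AUC is quadratic in the number of users, which is acceptable for illustration only; a rank-based computation would be preferable at scale.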
Fig. 2 is a flowchart illustrating steps of a credit score generation method according to an embodiment of the present invention, where as shown in fig. 2, the method may include:
Step 201, generating user characteristics of the user to be predicted according to the user related information of the user to be predicted.
In the embodiment of the present invention, the user to be predicted may be a user who needs to determine a credit score, for example, a user who needs to perform resource lending, and the user to be predicted may be obtained by obtaining users in each application software in the network. For example, a user of video playing software may be acquired as the user to be predicted. The process of generating user features may be implemented by an online full-scale feature generation module.
Further, the specific information included in the user-related information and the manner of determining the user characteristics according to the user-related information in the embodiment of the present invention may refer to the description of the user-related information in the foregoing embodiment, and the description of the implementation manner of obtaining the sample characteristics according to the user-related information in the foregoing embodiment, which are not described herein again.
Step 202, taking the user characteristics of the user to be predicted as the input of a target credit scoring model, and generating the credit score of the user to be predicted by using the target credit scoring model.
In the embodiment of the invention, the credit score of the user to be predicted is generated directly by the target credit scoring model, so that the real-time performance of the obtained credit score can be ensured to the greatest extent. The process of directly generating the credit score of the user to be predicted through the target credit scoring model can be implemented by an online prediction scoring module. Further, the target credit scoring model may be generated by the model generation method provided in the foregoing embodiment. Because the target credit scoring model is trained on sample users corresponding to different credit degrees, it can learn, during training, both the characteristics of users with higher credibility and the characteristics of users with ordinary or lower credibility, so the accuracy of the credit score determined by the target credit scoring model can be improved to a certain extent.
In summary, the credit score generation method provided in the embodiment of the present invention may generate the user characteristics of the user to be predicted according to the user-related information of the user to be predicted, use the user characteristics of the user to be predicted as the input of the target credit score model, and generate the credit score of the user to be predicted by using the target credit score model. The target credit scoring model is obtained by training the sample users corresponding to different credit degrees, so that the characteristics of the user with higher credibility and the characteristics of the user with ordinary or lower credibility can be learned in the training process, and the accuracy of the credit scoring determined by using the target credit scoring model can be improved to a certain extent.
Optionally, the user to be predicted may be multiple users in a preset network, where the preset network may be set according to actual requirements, and for example, the preset network may be the whole internet, or may be a certain video website, a music website, or the like. The user to be predicted may be a part of users in a preset network or may be all users. Further, after generating the credit score of the user to be predicted, the following steps may also be performed:
Step A, storing the credit score of the user to be predicted and the user identification (ID) of the user to be predicted into a credit score database.
In this step, the credit scoring database may be an online database, and specifically may be an online database based on Couchbase. The credit scoring database can be used for storing credit scores, and a preset interface can be provided on it so that the electronic equipment can call it conveniently. Because the credit scores are stored in an online database, the electronic equipment can acquire a credit score from the preset credit scoring database more quickly, which improves the acquisition efficiency to a certain extent. The user identification (ID) may be pre-assigned to each user to be predicted and uniquely identifies that user.
Step B, upon receiving a search request, searching the credit score database for the credit score corresponding to the user ID of the user to be searched contained in the search request, and taking that credit score as the credit score of the user to be searched.
In this step, the search request may be sent when the credit score of the user to be searched needs to be looked up. During the search, the user ID of the user to be searched may be compared with each user ID in the credit score database to determine the matching user ID, and the credit score corresponding to the matching user ID is then determined as the credit score of the user to be searched. In the embodiment of the invention, the generated credit score of the user to be predicted is stored in the credit score database, so that when the credit score of a certain user needs to be determined later, it does not need to be generated by the model again and can be looked up directly in the credit score database, which improves the acquisition efficiency of the credit score to a certain extent.
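Steps A and B amount to a keyed store and lookup, which can be sketched as follows. The class `CreditScoreStore` and its methods are hypothetical, and an in-memory dictionary stands in for the online credit score database and its preset interface.

```python
from typing import Dict, Optional


class CreditScoreStore:
    """In-memory stand-in for the online credit score database (an assumption)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def save(self, user_id: str, credit_score: float) -> None:
        # Step A: store the credit score together with the user ID.
        self._scores[user_id] = credit_score

    def lookup(self, user_id: str) -> Optional[float]:
        # Step B: return the score whose user ID matches, or None if absent.
        return self._scores.get(user_id)


store = CreditScoreStore()
store.save("user-001", 652.0)  # hypothetical user ID and score
found = store.lookup("user-001")
missing = store.lookup("user-999")
```

A hash-based lookup avoids comparing the requested ID against every stored ID one by one, while returning the same result as the comparison described above.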
Further, the user-related information of a user is time-sensitive: the personal condition of the user changes with the passage of time, the behavior of the user changes accordingly, and so the user-related information changes as well. As a result, a previously determined credit score may no longer accurately represent the true trustworthiness of the user. Therefore, in the embodiment of the present invention, the credit scores stored in the credit score database may also be updated. Specifically, the update operation may be implemented by the following steps C to E:
Step C, for any user in the credit score database, acquiring the updated user characteristics of the user according to a first preset period; the updated user characteristics are generated according to updated user related information of the user, wherein the updated user related information is acquired within a preset time from the current time.
In this step, the first preset period may be set according to actual conditions; for example, it may be 1 day, 2 days, or one week. The electronic device may obtain the user characteristics of the user once every first preset period to obtain the updated user characteristics. The preset duration may likewise be set according to actual requirements. The preset duration may be 0, that is, the updated user-related information is obtained in real time at the current time and the updated user characteristics are generated in real time. In this case, when the updated user characteristics are obtained, the recent user-related information of the user is collected from the network to obtain the updated user-related information, which is then vectorized to obtain the updated user characteristics of the user. Obtaining the updated user characteristics in real time ensures their timeliness to the maximum extent. Alternatively, the preset duration may be non-zero, that is, the updated user-related information is obtained in advance and the updated user characteristics are generated in advance. For example, the updated user-related information of the user may be obtained according to a second preset period and vectorized to generate the updated user characteristics of the user, which are then stored in a preset offline feature library. Accordingly, in this step, the updated user characteristics of the user may be obtained from the preset offline feature library. The preset offline feature library may be a Hive-based database used for storing the updated user characteristics of users, and the duration of the second preset period may be the preset duration.
In the embodiment of the invention, the updated user characteristics are periodically updated and stored according to the second preset period, so that the updated user characteristics can be stably and reliably provided for the electronic equipment through the offline characteristic library. Accordingly, when the updated user features of the user are obtained, the timeliness of the obtained updated user features can be ensured to a certain extent and the obtaining efficiency is improved in a mode of directly obtaining the updated user features from the offline feature library.
Step D, taking the updated user characteristics as the input of the target credit scoring model, and generating the credit score of the user by using the target credit scoring model.
For example, taking the target credit scoring model trained by the LR model as an example, the target credit scoring model may generate the default probability value by the following formula:
$$P = \frac{1}{1 + e^{-\sum b_i x_i}} \qquad (1)$$

$$\ln\frac{P}{1 - P} = \sum b_i x_i \qquad (2)$$

wherein $\sum b_i x_i = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_i x_i$, $b_i$ represents the i-th preset weight parameter in the model, $x_i$ represents the i-th input user characteristic, and $P$ may be written as $P(y_i = 1 \mid x_i)$, which represents the probability that the user $y_i$ to whom the user characteristics correspond defaults. The closer the value of $P$ is to 0, the lower the probability that the user will default; the closer it is to 1, the higher that probability. When the probability value equals 0, the user can be considered not to default, and when it equals 1, the user can be considered to default. Applying the logit transformation to $P$, i.e. taking the natural logarithm of $P/(1-P)$ with $P$ given by formula (1), yields the linear regression equation shown in formula (2). The parameters in the formulas are obtained by continuous adjustment during model training. Since the error term follows a binomial distribution, the parameters can be estimated by maximum likelihood during adjustment. Taking a single variable as an example, when $x_i$ tends to positive infinity, $P(y_i = 1 \mid x_i)$ tends to 1; when $x_i$ tends to negative infinity, $P(y_i = 1 \mid x_i)$ tends to 0. Because the independent variables and the dependent variable are linearly related in formula (2), the results produced by the model are more continuous and the resulting probability values are more evenly distributed.
Finally, a credit score may be determined based on the default probability value. In particular, the probability value may be determined directly as a credit score. Or, according to a preset default probability value and score corresponding relation, determining a score corresponding to the default probability value, and then determining the score as a credit score.
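A minimal sketch of the default probability and one possible probability-to-score correspondence follows. The logistic form is the standard one for an LR model; the weights, the features, and the linear mapping onto a 0 to 1000 scale (lower default probability giving a higher score) are illustrative assumptions, since the description does not specify the preset correspondence.

```python
import math
from typing import Sequence


def default_probability(weights: Sequence[float], features: Sequence[float]) -> float:
    """Logistic default probability; weights[0] plays the role of b0."""
    z = weights[0] + sum(b * x for b, x in zip(weights[1:], features))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probability_to_score(p: float, min_score: float = 0.0,
                         max_score: float = 1000.0) -> float:
    """Hypothetical correspondence: lower default probability, higher score."""
    return max_score - p * (max_score - min_score)


# Hypothetical weights (b0, b1, b2) and user characteristics (x1, x2).
p = default_probability([-1.0, 0.5, 2.0], [2.0, 0.25])
logit = math.log(p / (1.0 - p))  # recovers b0 + b1*x1 + b2*x2
score = probability_to_score(default_probability([0.0, 1.0], [0.0]))
```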
Step E, updating the credit score corresponding to the user in the credit score database by using the generated credit score.
In this step, the generated credit score can overwrite the credit score corresponding to the user in the credit score database, thereby realizing the update. In this way, by periodically updating the credit scores in the credit score database according to the first preset period, the accuracy of the credit score that the electronic device looks up from the credit score database can be ensured.
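One pass of steps C to E can be sketched as a single refresh function. Dictionaries stand in for the credit score database and the offline feature library, the function name `refresh_scores` is hypothetical, and scheduling the pass once per first preset period is left to the caller. Keeping the old score when no updated features are available is an assumption, not something the description states.

```python
from typing import Callable, Dict, Sequence


def refresh_scores(
    score_db: Dict[str, float],
    offline_features: Dict[str, Sequence[float]],
    model: Callable[[Sequence[float]], float],
) -> int:
    """Run one update pass and return the number of scores overwritten."""
    updated = 0
    for user_id in list(score_db):
        features = offline_features.get(user_id)  # step C
        if features is None:
            continue  # assumption: keep the old score if nothing new exists
        score_db[user_id] = model(features)       # steps D and E
        updated += 1
    return updated


# Hypothetical stored scores, updated features, and scoring model.
score_db = {"u1": 500.0, "u2": 620.0}
offline_features = {"u1": [25.0]}
toy_model = lambda features: 700.0 + features[0]
count = refresh_scores(score_db, offline_features, toy_model)
```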
Optionally, after generating the credit score of the user to be predicted by using the target credit score model, the following steps F to H may be further performed:
Step F, determining the crowd category to which the user to be predicted belongs according to the user related information of the user to be predicted.
Specifically, in this step, the preset judgment condition corresponding to each preset crowd category may be obtained first, where the preset crowd category may be preset according to the attribute characteristics of the people, and different people may belong to different preset crowd categories. For example, the predetermined population categories may include elderly people, middle aged people, and young people, or alternatively, people with high income stability, people with low income stability, people with high consumption ability, people with low consumption ability, and so on. Different preset crowd categories have different resource repayment capabilities, for example, the people with high income stability often have higher resource repayment capabilities than the people with low income stability, that is, the credibility of different people is different. The preset judgment condition may be preset and is used for representing a condition that needs to be met by the behavior of the user belonging to the preset crowd category, then based on the preset judgment condition, the user related information of the user to be predicted is respectively matched, and finally, the preset crowd category corresponding to the preset judgment condition matched with the user related information is determined as the crowd category to which the user to be predicted belongs.
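The matching against preset judgment conditions can be sketched with one predicate per preset crowd category. The category names, the `months_with_income` field, and the thresholds are invented for illustration; the description does not define concrete conditions, and returning the first matching category is an assumption.

```python
from typing import Any, Callable, Dict, Optional

Condition = Callable[[Dict[str, Any]], bool]


def crowd_category(user_info: Dict[str, Any],
                   conditions: Dict[str, Condition]) -> Optional[str]:
    """Return the first preset crowd category whose condition the user meets."""
    for category, condition in conditions.items():
        if condition(user_info):
            return category
    return None  # no preset category matched


# Hypothetical preset judgment conditions, checked in insertion order.
preset_conditions: Dict[str, Condition] = {
    "high income stability": lambda info: info.get("months_with_income", 0) >= 10,
    "low income stability": lambda info: info.get("months_with_income", 0) < 10,
}
category = crowd_category({"months_with_income": 11}, preset_conditions)
```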
Step G, determining a credit score interval to which the credit score of the user to be predicted belongs.
The credit scoring intervals can be divided in advance: users whose credit scores belong to the same credit scoring interval have approximately the same degree of credibility, and users whose credit scores belong to different credit scoring intervals have different degrees of credibility. Specifically, the interval range corresponding to each preset credit scoring interval may be obtained, the interval range into which the credit score of the user to be predicted falls is then determined, and finally the preset credit scoring interval corresponding to that interval range is determined as the credit scoring interval to which the credit score of the user to be predicted belongs.
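Locating the interval for a given score is a sorted-boundary lookup, sketched here with the standard `bisect` module. The boundaries are borrowed from the intervals listed in Table 4 of this description; treating each interval as including its lower bound and excluding its upper bound, except for the last one, is an assumption made to resolve the shared endpoints.

```python
import bisect
from typing import Sequence, Tuple

# Interval boundaries as listed in Table 4.
BOUNDS = [0, 378, 435, 478, 516, 552, 586, 623, 663, 716, 1000]


def score_interval(score: float, bounds: Sequence[float] = BOUNDS) -> Tuple[float, float]:
    """Return the (lower, upper) bounds of the interval containing the score."""
    if not bounds[0] <= score <= bounds[-1]:
        raise ValueError("score lies outside all preset intervals")
    index = bisect.bisect_right(bounds, score) - 1
    index = min(index, len(bounds) - 2)  # the top bound belongs to the last interval
    return bounds[index], bounds[index + 1]


interval = score_interval(600)
```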
Step H, recommending the user to be predicted to a resource provider corresponding to the credit scoring interval and the crowd category.
In embodiments of the invention, the resources provided by different resource providers may differ. For example, a resource provider may provide money, as in money lending software; it may provide clothing, as in software for renting clothing; or it may provide electronic equipment, as in software for renting products such as computers and cell phones. Because the resources provided differ in value, different resource providers have different requirements on the trustworthiness of the user. For example, since clothing tends to be less valuable than electronic devices, a resource provider providing clothing tends to have lower requirements on the user's trustworthiness than one providing electronic devices. Therefore, in order to provide as many users as possible to each resource provider, in the embodiment of the present invention, corresponding credit scoring intervals and crowd categories may be set in advance for different resource providers according to each provider's requirement on user credibility.
Further, in this step, the credit scoring interval to which the credit score of the user to be predicted belongs may be matched against the credit scoring interval corresponding to each resource provider, and the crowd category to which the user to be predicted belongs may be matched against the crowd category corresponding to each resource provider. A resource provider whose credit scoring interval matches that of the user to be predicted, and whose crowd category matches that of the user to be predicted, is determined as the resource provider corresponding to the credit scoring interval and the crowd category. When both match, the credibility of the user to be predicted can be considered to meet the credibility required by the resource provider, and the user to be predicted can therefore be recommended to that resource provider.
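The two-way match in step H can be sketched as a filter over preset provider requirements. The `ResourceProvider` type, the provider names, and the score ranges and categories assigned to them are hypothetical examples only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ResourceProvider:
    name: str
    min_score: float                  # lower bound of the preset credit scoring interval
    max_score: float                  # upper bound of the preset credit scoring interval
    crowd_categories: FrozenSet[str]  # preset crowd categories


def matching_providers(score: float, category: str,
                       providers: List[ResourceProvider]) -> List[str]:
    """Providers whose score interval and crowd category both match the user."""
    return [
        p.name
        for p in providers
        if p.min_score <= score <= p.max_score and category in p.crowd_categories
    ]


providers = [
    ResourceProvider("clothing rental", 478, 1000, frozenset({"young", "middle-aged"})),
    ResourceProvider("electronics rental", 623, 1000, frozenset({"middle-aged"})),
]
recommended = matching_providers(600, "middle-aged", providers)
```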
In the embodiment of the invention, the user to be predicted is recommended to a suitable resource provider, so that users can be distributed to different resource providers in a tiered and intelligent manner, which further improves the business conversion rate. Meanwhile, the users recommended are those whose credit scoring interval and crowd category match the credit scoring interval and crowd category corresponding to the resource provider, that is, users who are a good fit for the resource provider. This ensures, to a certain extent, the pass rate of the users at the resource provider and reduces the risk the users bring to the resource provider. Of course, credit decisions may also be made based on credit scores, which is not limited by the embodiments of the present invention.
For example, fig. 3 is a schematic diagram of a credit score generation process according to an embodiment of the present invention, and as shown in fig. 3, a sample user may be obtained by generating a good-bad sample, then sample characteristics corresponding to the sample user are obtained by feature derivation, then model training is performed based on these characteristics, after training, user characteristics of other users in a network may be obtained through a full-scale feature generation link, and finally, a prediction scoring is performed according to user characteristics of other users in the network by using a trained target credit score model. The other users may be users in a credit score database, and the score obtained by predicting the score is the credit score corresponding to the user in the credit score database.
It should be noted that after the credit score interval to which the credit score of the user to be predicted belongs is determined, the distribution condition of the credit score interval of different users to be predicted in the network and the proportion of overdue users in the credit score interval, that is, the bad proportion, can also be counted. Based on the distribution and the proportion, the characteristics of the users in different credit scoring intervals are analyzed. Further, the proportion of users included in the credit score interval compared with all users in the whole network, that is, the full distribution, may also be determined. So that the characteristics of the users in the whole network can be analyzed based on the full distribution, and guidance is provided for subsequent work. For example, table 4 shows a distribution, a corresponding ratio, and a corresponding total distribution for each interval.
Interval      Bad ratio   Full distribution
[716,1000]    0.88%       9.77%
[663,716]     1.88%       8.04%
[623,663]     2.52%       7.67%
[586,623]     3.72%       8.03%
[552,586]     4.04%       8.00%
[516,552]     4.98%       8.86%
[478,516]     6.30%       9.71%
[435,478]     7.88%       12.58%
[378,435]     10.13%      14.83%
[0,378]       16.38%      12.51%
All           5.87%       100%
TABLE 4
It can be seen that the higher the credit score, the smaller the proportion of bad, i.e., the higher the credit score, the lower the proportion of user overdue. For users in the overall network, the credit scores of the users present a left-biased distribution.
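The per-interval statistics of the kind shown in Table 4 can be computed as in the sketch below. It assumes each user is represented by a credit score and an overdue flag, that all scores lie within the preset bounds, and that intervals include their lower bound; the toy users and the two-interval split are hypothetical and do not reproduce the table's figures.

```python
import bisect
from typing import Dict, List, Sequence, Tuple


def interval_statistics(
    users: List[Tuple[float, bool]],  # (credit score, is overdue)
    bounds: Sequence[float],
) -> Dict[Tuple[float, float], Tuple[float, float]]:
    """Map each non-empty interval to (bad ratio, share of all users)."""
    n_intervals = len(bounds) - 1
    counts = [0] * n_intervals
    bad = [0] * n_intervals
    for score, overdue in users:
        # Assumes bounds[0] <= score <= bounds[-1].
        i = min(bisect.bisect_right(bounds, score) - 1, n_intervals - 1)
        counts[i] += 1
        bad[i] += int(overdue)
    total = len(users)
    return {
        (bounds[i], bounds[i + 1]): (bad[i] / n, n / total)
        for i, n in enumerate(counts)
        if n
    }


toy_users = [(300, True), (400, False), (450, True),
             (700, False), (800, False), (900, True)]
stats = interval_statistics(toy_users, [0, 500, 1000])
```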
Fig. 4 is a block diagram of a model generation apparatus according to an embodiment of the present invention, and as shown in fig. 4, the apparatus 40 may include:
a selecting module 401, configured to select users corresponding to different credit degrees as sample users; the sample users at least comprise users with credit degrees larger than a preset threshold value and users with credit degrees not larger than the preset threshold value.
An obtaining module 402, configured to obtain a sample feature of the sample user according to the user-related information of the sample user, and generate a sample credit score of the sample user.
The training module 403 is configured to train the initial credit score model by using the sample features and the sample credit score as training data to obtain a target credit score model.
In summary, the model generating apparatus provided in the embodiment of the present invention may select users corresponding to different credit degrees as sample users, where the sample users at least include users whose credit degrees are greater than a preset threshold and users whose credit degrees are not greater than the preset threshold, then obtain sample characteristics of the sample users according to user-related information of the sample users, generate sample credit scores of the sample users, and finally train an initial credit score model by using the sample characteristics and the sample credit scores as training data to obtain a target credit score model. As the sample users corresponding to different credit degrees are used in the process of generating the target credit scoring model, the initial credit scoring model can learn the characteristics of the user with higher credibility and the characteristics of the user with ordinary or lower credibility in the training process, the prediction capability of the finally generated target credit scoring model can be ensured to a certain extent, and the accuracy of the credit scoring determined by subsequently using the target credit scoring model is further improved.
Optionally, the selecting module 401 is specifically configured to:
Acquiring the loan-related information of the borrowed users.
Selecting at least one non-overdue borrowed user from the borrowed users according to the lending related information of the borrowed users to serve as the sample user; and selecting at least one borrowed user with overdue conditions meeting preset conditions from the borrowed users to serve as the sample user.
The loaned users who are not overdue are users with the credit degree larger than a preset threshold, and the loaned users with the overdue condition meeting the preset condition are users with the credit degree not larger than the preset threshold.
Optionally, the loan related information is overdue related parameters of the borrowed user within a preset time length;
the selecting module 401 is further specifically configured to:
determining the borrowed user with the overdue related parameter larger than the preset parameter threshold as the borrowed user with the overdue condition meeting the preset condition; and selecting at least one borrowed user from borrowed users with overdue conditions meeting preset conditions to serve as the sample user.
Determining the borrowed user with the overdue related parameter not larger than the preset parameter threshold as a borrowed user without overdue; selecting at least one borrowed user from the non-overdue borrowed users as the sample user.
Wherein the overdue related parameters at least comprise overdue times and/or overdue duration.
Optionally, the user-related information includes related information of at least two dimensions.
The obtaining module 402 is specifically configured to:
Vectorizing the related information of at least two dimensions to obtain the candidate features of at least two dimensions.
And determining the importance corresponding to the feature to be selected of any dimension.
And selecting the corresponding candidate features of the first m dimensions with the highest importance as the sample features, wherein m is an integer not less than 1.
Optionally, the related information at least includes user portrait information, information about usage of applications and networks in devices used by the user, information about viewing behavior of the user, payment information, and/or click behavior information for a specific type of advertisement.
Optionally, the obtaining module 402 is further specifically configured to:
fit a decision tree corresponding to the candidate feature of the dimension according to the negative gradient of the loss function of the initial credit scoring model; and
sum the gains of all branches in the decision tree to obtain the importance of the candidate feature of the dimension.
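As a non-limiting illustration, the gain-based importance could be sketched as follows. The example assumes a squared-error loss, so that the negative gradient equals the residual, and measures the gain of a split as the reduction in the sum of squared errors. The loss function, the gain definition, the tree depth, and all function names are assumptions for this sketch; the embodiment does not fix them.

```python
def negative_gradient(y_true, y_pred):
    # For the assumed loss 0.5 * (y - F)^2, the negative gradient w.r.t. F is y - F.
    return [t - p for t, p in zip(y_true, y_pred)]


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, residuals):
    """Return (gain, threshold) of the best split on one feature, or (0.0, None)."""
    parent = sse(residuals)
    best_gain, best_thr = 0.0, None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr


def feature_importance(xs, residuals, max_depth=2):
    """Sum the gains of all splits of a small tree fitted on a single feature."""
    if max_depth == 0 or len(xs) < 2:
        return 0.0
    gain, thr = best_split(xs, residuals)
    if thr is None:
        return 0.0
    total = gain
    for keep_left in (True, False):
        part = [(x, r) for x, r in zip(xs, residuals) if (x <= thr) == keep_left]
        total += feature_importance([x for x, _ in part],
                                    [r for _, r in part],
                                    max_depth - 1)
    return total
```

A feature whose values separate the residuals well accumulates a large total gain, while a feature unrelated to the residuals yields no useful split and an importance close to zero.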
Optionally, the obtaining module 402 is further specifically configured to:
determine the score corresponding to the related information of each dimension according to a preset correspondence between related information and scores;
select the related information of the first n dimensions with the highest corresponding scores as target related information, where n is an integer not less than 1; and
determine the sum of the scores of the target related information as the sample credit score of the sample user.
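As a non-limiting illustration, generating the sample credit score could be sketched as follows. The dimension names, the values, and the contents of the preset correspondence table are hypothetical and serve only to make the example concrete.

```python
def sample_credit_score(related_info, score_table, n):
    """Sum the n highest per-dimension scores looked up in a preset table."""
    if n < 1:
        raise ValueError("n must be an integer not less than 1")
    # Look up the score of each dimension's related information.
    scores = [score_table[dim][value] for dim, value in related_info.items()]
    # Keep the n largest scores and add them up.
    return sum(sorted(scores, reverse=True)[:n])


# Hypothetical preset correspondence between related information and scores.
SCORE_TABLE = {
    "membership": {"vip": 30, "none": 0},
    "payment": {"on_time": 40, "late": 5},
    "viewing": {"high": 20, "low": 10},
}

user_info = {"membership": "vip", "payment": "on_time", "viewing": "low"}
score = sample_credit_score(user_info, SCORE_TABLE, n=2)  # 40 + 30
```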
Optionally, the training module 403 is specifically configured to:
input the sample features into the initial credit scoring model, and process the sample features according to the initial credit scoring model to generate a predicted credit score;
calculate a loss value for the initial credit scoring model based on the predicted credit score and the sample credit score; and
if the loss value is not within a preset range, adjust the initial credit scoring model based on the loss value and continue training the adjusted initial credit scoring model until the loss value is within the preset range, and take the adjusted initial credit scoring model as the target credit scoring model.
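As a non-limiting illustration, the training procedure could be sketched as follows. The example substitutes a plain linear model trained by gradient descent on the mean squared error for the initial credit scoring model; the model type, the learning rate, the loss limit, and the step cap are all assumptions for this sketch.

```python
def train_until_loss_in_range(features, targets, lr=0.1, loss_limit=1e-4, max_steps=10000):
    """Adjust a linear model until the mean squared error falls within the preset range."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    loss = float("inf")
    for _ in range(max_steps):
        # Predict a credit score for every sample.
        preds = [sum(w * x for w, x in zip(weights, row)) + bias for row in features]
        errors = [p - t for p, t in zip(preds, targets)]
        # Loss between predicted credit scores and sample credit scores.
        loss = sum(e * e for e in errors) / len(errors)
        if loss <= loss_limit:
            break  # loss is within the preset range; stop training
        # Otherwise adjust the model based on the loss gradient and continue.
        for j in range(n_features):
            grad = 2 * sum(e * row[j] for e, row in zip(errors, features)) / len(errors)
            weights[j] -= lr * grad
        bias -= lr * 2 * sum(errors) / len(errors)
    return weights, bias, loss
```

The `max_steps` cap is a practical safeguard added for the sketch so that the loop terminates even if the loss never reaches the preset range.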
Fig. 5-1 is a block diagram of a credit score generation apparatus according to an embodiment of the present invention, and as shown in fig. 5-1, the apparatus 50 may include:
a first generating module 501, configured to generate a user characteristic of a user to be predicted according to user-related information of the user to be predicted.
A second generating module 502, configured to use the user characteristics of the user to be predicted as an input of a target credit scoring model, and generate a credit score of the user to be predicted by using the target credit scoring model.
Wherein the target credit scoring model is generated using the apparatus of any of the preceding embodiments.
In summary, the credit score generation apparatus provided in the embodiment of the present invention may generate the user characteristics of the user to be predicted according to the user-related information of the user to be predicted, use the user characteristics of the user to be predicted as the input of the target credit scoring model, and generate the credit score of the user to be predicted by using the target credit scoring model. Because the target credit scoring model is trained on sample users corresponding to different credit degrees, it can learn during training both the characteristics of users with a higher credit degree and the characteristics of users with an ordinary or lower credit degree, so the accuracy of the credit score determined by using the target credit scoring model can be improved to a certain extent.
Optionally, as shown in fig. 5-2, the apparatus 50 further includes:
the storage module 503 is configured to store the credit score of the user to be predicted and the user identifier ID of the user to be predicted in a credit score database.
The search module 504 is configured to, in a case that a search request is received, search the credit score database, according to the user ID of the user to be searched contained in the search request, for the credit score corresponding to that user ID, and use it as the credit score of the user to be searched.
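As a non-limiting illustration, the storage and search modules could be sketched with an in-memory mapping as follows. The class name, the method names, and the shape of the search request (a dictionary with a `user_id` key) are assumptions for this example; a real deployment would presumably use a persistent database.

```python
class CreditScoreDatabase:
    """Minimal in-memory stand-in for the credit score database."""

    def __init__(self):
        self._scores = {}

    def store(self, user_id, credit_score):
        # Keep the credit score together with the user identifier (ID).
        self._scores[user_id] = credit_score

    def search(self, search_request):
        # Return the score for the user ID in the request, or None if absent.
        return self._scores.get(search_request["user_id"])


db = CreditScoreDatabase()
db.store("user-001", 712.0)
found = db.search({"user_id": "user-001"})
missing = db.search({"user_id": "user-999"})
```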
Optionally, the apparatus 50 further includes:
an obtaining module 505, configured to obtain, for any user in the credit score database, an updated user characteristic of the user according to a first preset period; the updated user characteristics are generated according to updated user related information of the user, wherein the updated user related information is acquired within a preset time from the current time.
An input module 506, configured to use the updated user characteristics as an input of the target credit scoring model, and generate a credit score of the user by using the target credit scoring model.
An updating module 507, configured to update the credit score corresponding to the user in the credit score database by using the generated credit score.
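As a non-limiting illustration, one refresh pass over the credit score database could be sketched as follows. The database is modeled as a plain dictionary, and `fetch_recent_features` and `score_model` are hypothetical callables standing in for the obtaining module and the target credit scoring model. Invoking the function once per first preset period is left to the caller.

```python
def refresh_scores(score_db, fetch_recent_features, score_model):
    """Recompute and overwrite the stored credit score of every user in score_db."""
    for user_id in list(score_db):
        # Features built from user related information gathered recently.
        updated_features = fetch_recent_features(user_id)
        # Feed the updated features to the target credit scoring model.
        score_db[user_id] = score_model(updated_features)
    return score_db
```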
Optionally, as shown in fig. 5-3, the apparatus 50 further includes:
a first determining module 508, configured to determine, according to the user-related information of the user to be predicted, a crowd category to which the user to be predicted belongs.
A second determining module 509, configured to determine a credit score interval to which the credit score of the user to be predicted belongs.
A recommending module 510, configured to recommend the user to be predicted to a resource provider corresponding to the credit scoring interval and the crowd category.
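As a non-limiting illustration, the recommendation could be sketched as a lookup keyed by credit score interval and crowd category. The interval bounds, the crowd categories, and the resource provider names below are hypothetical values chosen for the example.

```python
import bisect

# Hypothetical interval bounds: interval 0 is below 600, 1 is [600, 700), 2 is 700 and above.
INTERVAL_BOUNDS = [600, 700]

# Hypothetical mapping from (interval index, crowd category) to a resource provider.
PROVIDERS = {
    (2, "student"): "provider_a",
    (2, "office_worker"): "provider_b",
    (1, "office_worker"): "provider_c",
}


def score_interval(credit_score):
    """Return the index of the credit score interval that contains credit_score."""
    return bisect.bisect_right(INTERVAL_BOUNDS, credit_score)


def recommend(credit_score, crowd_category):
    """Return the matching resource provider, or None when no provider is configured."""
    return PROVIDERS.get((score_interval(credit_score), crowd_category))
```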
An embodiment of the present invention further provides an electronic device. As shown in fig. 6, the electronic device includes a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 communicate with one another through the communication bus 604;
a memory 603 for storing a computer program;
the processor 601 is configured to implement the following steps when executing the program stored in the memory 603:
selecting users corresponding to different credit degrees as sample users; the sample users at least comprise users with credit degrees larger than a preset threshold value and users with credit degrees not larger than the preset threshold value;
acquiring sample characteristics of the sample user according to the user related information of the sample user, and generating a sample credit score of the sample user;
and taking the sample characteristics and the sample credit scores as training data, and training an initial credit score model to obtain a target credit score model.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one thick line in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the model generation method and/or the credit score generation method of any of the above embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the model generation method and/or the credit score generation method of any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A method of model generation, the method comprising:
selecting users corresponding to different credit degrees as sample users; the sample users at least comprise users with credit degrees larger than a preset threshold value and users with credit degrees not larger than the preset threshold value;
acquiring sample characteristics of the sample user according to the user related information of the sample user, and generating a sample credit score of the sample user;
and taking the sample characteristics and the sample credit scores as training data, and training an initial credit score model to obtain a target credit score model.
2. The method according to claim 1, wherein the selecting users corresponding to different credit degrees as sample users comprises:
acquiring the loan related information of the borrowed user;
selecting at least one non-overdue borrowed user from the borrowed users according to the lending related information of the borrowed users to serve as the sample user; selecting at least one borrowed user with overdue conditions meeting preset conditions from the borrowed users to serve as the sample user;
the non-overdue borrowed users are users with the credit degree larger than the preset threshold, and the borrowed users with the overdue condition meeting the preset condition are users with the credit degree not larger than the preset threshold.
3. The method of claim 2, wherein the loan-related information is an overdue-related parameter of the borrowed user within a predetermined time period;
the selecting, according to the lending related information of the borrowed users, at least one non-overdue borrowed user and at least one borrowed user whose overdue condition meets a preset condition from the borrowed users comprises:
determining the borrowed user with the overdue related parameter larger than the preset parameter threshold as the borrowed user with the overdue condition meeting the preset condition; selecting at least one borrowed user from borrowed users with overdue conditions meeting preset conditions to serve as the sample user;
determining the borrowed user with the overdue related parameter not larger than the preset parameter threshold as a borrowed user without overdue; selecting at least one borrowed user from the non-overdue borrowed users as the sample user;
wherein the overdue related parameters at least comprise overdue times and/or overdue duration.
4. The method of claim 1, wherein the user-related information comprises at least two dimensions of related information;
the obtaining of the sample characteristics of the sample user according to the user-related information of the sample user includes:
vectorizing the related information of at least two dimensions to obtain the to-be-selected features of at least two dimensions;
for the feature to be selected of any dimension, determining the importance degree corresponding to the feature to be selected of the dimension;
and selecting the corresponding candidate features of the first m dimensions with the highest importance as the sample features, wherein m is an integer not less than 1.
5. The method of claim 4, wherein the related information comprises at least user representation information, application and network usage information of a device used by the user, user viewing behavior information, payment information, and/or click behavior information for a specific type of advertisement.
6. The method according to claim 4, wherein the determining the importance corresponding to the candidate features of the dimension comprises:
fitting a decision tree corresponding to the feature to be selected of the dimensionality according to the negative gradient of the loss function of the initial credit scoring model;
and counting the sum of the gains of all branches in the decision tree to obtain the importance of the feature to be selected of the dimensionality.
7. The method of claim 1, wherein the user-related information comprises at least two dimensions of related information; the generating a sample credit score for the sample user, comprising:
determining a score corresponding to the relevant information of each dimension according to a preset corresponding relation between the relevant information and the score;
selecting the related information of the first n dimensionalities with the maximum corresponding score as target related information; n is an integer not less than 1;
and determining the sum of the scores of the target related information as the sample credit score of the sample user.
8. The method of claim 1, wherein training an initial credit score model using the sample features and the sample credit scores as training data to obtain a target credit score model comprises:
inputting the sample features into the initial credit score model, and processing the sample features according to the initial credit score model to generate a predicted credit score;
calculating a loss value for the initial credit score model based on the predicted credit score and the sample credit score;
if the loss value is not in the preset range, adjusting the initial credit scoring model based on the loss value, continuing training the adjusted initial credit scoring model until the loss value is in the preset range, and taking the adjusted initial credit scoring model as the target credit scoring model.
9. A method for generating a credit score, the method comprising:
generating user characteristics of a user to be predicted according to user related information of the user to be predicted;
taking the user characteristics of the user to be predicted as the input of a target credit scoring model, and generating the credit score of the user to be predicted by using the target credit scoring model;
wherein the target credit score model is generated using the method of any one of claims 1 to 8.
10. The method according to claim 9, wherein the users to be predicted are a plurality of users in a preset network; after the generating the credit score of the user to be predicted by using the target credit score model, the method further includes:
storing the credit score of the user to be predicted and the user identification ID of the user to be predicted into a credit score database;
and under the condition of receiving a search request, searching a credit score corresponding to the user ID from the credit score database according to the user ID of the user to be searched contained in the search request, wherein the credit score is used as the credit score of the user to be searched.
11. The method of claim 10, further comprising:
for any user in the credit score database, acquiring the updated user characteristics of the user according to a first preset period; the updated user characteristics are generated according to updated user related information of the user, wherein the updated user related information is acquired within a preset time length from the current time;
taking the updated user characteristics as an input of the target credit scoring model, and generating a credit score of the user by using the target credit scoring model;
and updating the credit score corresponding to the user in the credit score database by using the generated credit score.
12. The method of claim 9, wherein after generating the credit score of the user to be predicted using the target credit score model, the method further comprises:
determining the crowd category to which the user to be predicted belongs according to the user related information of the user to be predicted;
determining a credit score interval to which the credit score of the user to be predicted belongs;
and recommending the user to be predicted to a resource provider corresponding to the credit scoring interval and the crowd category.
13. An apparatus for model generation, the apparatus comprising:
the selecting module is used for selecting users corresponding to different credit degrees as sample users; the sample users at least comprise users with credit degrees larger than a preset threshold value and users with credit degrees not larger than the preset threshold value;
the acquisition module is used for acquiring the sample characteristics of the sample user according to the user related information of the sample user and generating the sample credit score of the sample user;
and the training module is used for training an initial credit score model by taking the sample characteristics and the sample credit score as training data so as to obtain a target credit score model.
14. An apparatus for generating a credit score, the apparatus comprising:
the first generation module is used for generating the user characteristics of the user to be predicted according to the user related information of the user to be predicted;
the second generation module is used for taking the user characteristics of the user to be predicted as the input of a target credit scoring model and generating the credit score of the user to be predicted by using the target credit scoring model;
wherein the target credit scoring model is generated using the apparatus of claim 13.
15. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-12 when executing a program stored in the memory.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010340909.7A CN111695084A (en) 2020-04-26 2020-04-26 Model generation method, credit score generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010340909.7A CN111695084A (en) 2020-04-26 2020-04-26 Model generation method, credit score generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111695084A true CN111695084A (en) 2020-09-22

Family

ID=72476700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010340909.7A Pending CN111695084A (en) 2020-04-26 2020-04-26 Model generation method, credit score generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111695084A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898476A (en) * 2018-06-14 2018-11-27 中国银行股份有限公司 A kind of loan customer credit-graded approach and device
CN109064229A (en) * 2018-08-13 2018-12-21 河海大学 A kind of advertisement recommender system based on somatosensory device
CN110909963A (en) * 2018-09-14 2020-03-24 中国软件与技术服务股份有限公司 Credit scoring card model training method and taxpayer abnormal risk assessment method
CN109670940A (en) * 2018-11-12 2019-04-23 深圳壹账通智能科技有限公司 Credit Risk Assessment Model generation method and relevant device based on machine learning
CN109684538A (en) * 2018-12-03 2019-04-26 重庆邮电大学 A kind of recommended method and recommender system based on individual subscriber feature
CN109711875A (en) * 2018-12-19 2019-05-03 口碑(上海)信息技术有限公司 Content recommendation method and device
CN109784707A (en) * 2019-01-04 2019-05-21 深圳壹账通智能科技有限公司 Rating business credit method, apparatus, computer equipment and storage medium
CN110070391A (en) * 2019-04-17 2019-07-30 同盾控股有限公司 Data processing method, device, computer-readable medium and electronic equipment
CN110544155A (en) * 2019-09-02 2019-12-06 中诚信征信有限公司 User credit score acquisition method, acquisition device, server and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214592A (en) * 2020-11-05 2021-01-12 中科讯飞互联(北京)信息科技有限公司 Reply dialogue scoring model training method, dialogue reply method and device
CN112214592B (en) * 2020-11-05 2024-06-11 科大讯飞(北京)有限公司 Method for training reply dialogue scoring model, dialogue reply method and device thereof
CN112862593A (en) * 2021-01-28 2021-05-28 深圳前海微众银行股份有限公司 Credit scoring card model training method, device, system and computer storage medium
CN112862593B (en) * 2021-01-28 2024-05-03 深圳前海微众银行股份有限公司 Credit scoring card model training method, device and system and computer storage medium
CN112801775A (en) * 2021-01-29 2021-05-14 中国工商银行股份有限公司 Client credit evaluation method and device
CN113781247A (en) * 2021-09-18 2021-12-10 平安医疗健康管理股份有限公司 Protocol data recommendation method and device, computer equipment and storage medium
CN114358920A (en) * 2022-01-07 2022-04-15 北京百度网讯科技有限公司 Method and device for iterating credit scoring card model, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110442790B (en) Method, device, server and storage medium for recommending multimedia data
WO2020135535A1 (en) Recommendation model training method and related apparatus
CN108829808B (en) Page personalized sorting method and device and electronic equipment
CN111695084A (en) Model generation method, credit score generation method, device, equipment and storage medium
CN108885624B (en) Information recommendation system and method
CN110647683B (en) Information recommendation method and device
CN111797320A (en) Data processing method, device, equipment and storage medium
CN113111250A (en) Service recommendation method and device, related equipment and storage medium
CN114371946B (en) Information push method and information push server based on cloud computing and big data
CN111061948B (en) User tag recommendation method and device, computer equipment and storage medium
US20210097424A1 (en) Dynamic selection of features for training machine learning models
CN114254615A (en) Volume assembling method and device, electronic equipment and storage medium
CN110543601B (en) Method and system for recommending context-aware interest points based on intelligent set
CN111931069A (en) User interest determination method and device and computer equipment
Joorabloo et al. A probabilistic graph-based method to solve precision-diversity dilemma in recommender systems
KR20200130767A (en) Method and device for evaluating whether cryptocurrency is listed on cryptocurrency market using artificial neural network
CN114897607A (en) Data processing method and device for product resources, electronic equipment and storage medium
CN114329167A (en) Hyper-parameter learning, intelligent recommendation, keyword and multimedia recommendation method and device
WO2024113641A1 (en) Video recommendation method and apparatus, and electronic device, computer-readable storage medium and computer program product
KR102637198B1 (en) Method, computing device and computer program for sharing, renting and selling artificial intelligence model through artificial intelligence model production platform
CN116431779B (en) FAQ question-answering matching method and device in legal field, storage medium and electronic device
CN118152759A (en) Data processing method and device, equipment, storage medium and program product
CN117635182A (en) Target recommendation method and device
Aramuthakannan et al. Movie recommendation system using taymon optimized deep learning network
CN116756414A (en) Project recommendation and investment project recommendation methods and devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination