CN113011624A - User default prediction method, device, equipment and medium - Google Patents

Publication number
CN113011624A
Authority
CN
China
Prior art keywords
user
data
mobile communication
credit investigation
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911308640.8A
Other languages
Chinese (zh)
Inventor
吴飞
韩屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Shanghai ICT Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Shanghai ICT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Shanghai ICT Co Ltd
Priority to CN201911308640.8A
Publication of CN113011624A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The embodiment of the invention discloses a user default prediction method, a device, equipment and a medium. The method comprises the following steps: acquiring credit investigation data of a target user and mobile communication data of the target user; calculating default probability of the target user by utilizing at least one classification model based on credit investigation data of the target user and mobile communication data of the target user; and judging whether the target user defaults according to the default probability of the target user. The user default prediction method, device, equipment and medium can improve the accuracy of user default prediction.

Description

User default prediction method, device, equipment and medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for predicting a user default.
Background
Credit investigation data is also called credit information. Credit investigation data reflecting the credit condition of an enterprise is called enterprise credit information, and credit investigation data reflecting the credit condition of an individual is called personal credit information.
Credit investigation data can be used to predict whether a user will default. Currently, only a single classification model is used for this prediction, and the accuracy of predicting whether a user will default with a single classification model is low.
Disclosure of Invention
The embodiment of the invention provides a user default prediction method, a device, equipment and a medium, which can improve the accuracy of user default prediction.
In one aspect, an embodiment of the present invention provides a user default prediction method, including:
acquiring credit investigation data of a target user and mobile communication data of the target user;
calculating default probability of the target user by utilizing at least one classification model based on credit investigation data of the target user and mobile communication data of the target user; the classification model is used for calculating default probability of the user;
and judging whether the target user defaults according to the default probability of the target user.
In an embodiment of the present invention, judging whether the target user defaults according to the default probability of the target user includes:
calculating the average default probability of the target user according to the default probability;
and if the average default probability of the target user is greater than the preset probability, predicting that the target user will default.
In an embodiment of the present invention, before obtaining credit investigation data of a target user and mobile communication data of the target user, the method for predicting user default provided in the embodiment of the present invention further includes:
and training at least one classification model by using credit investigation data of at least one user and mobile communication data of at least one user.
In one embodiment of the present invention, training at least one classification model using credit investigation data of at least one user and mobile communication data of at least one user comprises:
for each user in at least one user, determining derivative data corresponding to the user according to credit investigation data of the user and mobile communication data of the user;
and training at least one classification model according to credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user.
In an embodiment of the present invention, for each of at least one user, determining derivative data corresponding to the user according to credit investigation data of the user and mobile communication data of the user includes:
and for each user in at least one user, taking the product of the credit investigation value corresponding to the first credit investigation feature in the credit investigation data of the user and the mobile communication value corresponding to the first mobile communication feature in the mobile communication data of the user as one item of derivative data corresponding to the user.
In an embodiment of the present invention, for each of at least one user, determining derivative data corresponding to the user according to credit investigation data of the user and mobile communication data of the user includes:
and for each user of at least one user, dividing the mobile communication value corresponding to the second mobile communication feature in the mobile communication data of the user by the mobile communication value corresponding to the third mobile communication feature, and taking the product of the resulting quotient and the credit investigation value corresponding to the second credit investigation feature in the credit investigation data of the user as one item of derivative data corresponding to the user.
In an embodiment of the present invention, for each of at least one user, determining derivative data corresponding to the user according to credit investigation data of the user and mobile communication data of the user includes:
for each user of at least one user, preprocessing credit investigation data of the user and mobile communication data of the user;
and determining the derived data of the user by using the data obtained after preprocessing.
In one embodiment of the present invention, training at least one classification model according to credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user comprises:
determining the feature importance of each feature in at least one credit investigation feature corresponding to the credit investigation data, at least one mobile communication feature corresponding to the mobile communication data and at least one derivative feature corresponding to the derivative data;
removing feature data corresponding to features of which the feature importance is smaller than the preset feature importance from credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user;
and training at least one classification model by using the data obtained after the removal.
In another aspect, an embodiment of the present invention provides a user default prediction apparatus, including:
the acquisition module is used for acquiring credit investigation data of a target user and mobile communication data of the target user;
the calculation module is used for calculating the default probability of the target user by utilizing at least one classification model based on the credit investigation data of the target user and the mobile communication data of the target user; the classification model is used for calculating default probability of the user;
and the prediction module is used for judging whether the target user defaults according to the default probability of the target user.
In an embodiment of the present invention, the prediction module is specifically configured to:
calculating the average default probability of the target user according to the default probability;
and if the average default probability is greater than the preset probability, predicting that the target user will default.
In an embodiment of the present invention, an apparatus for predicting a user default provided in the embodiment of the present invention further includes:
and the training module is used for training at least one classification model by utilizing credit investigation data of at least one user and mobile communication data of at least one user.
In one embodiment of the invention, a training module comprises:
the determining unit is used for determining derivative data corresponding to the user according to credit investigation data of the user and mobile communication data of the user aiming at each user in at least one user;
and the training unit is used for training at least one classification model according to credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user.
In an embodiment of the present invention, the determining unit is specifically configured to:
and for each user in at least one user, taking the product of the credit investigation value corresponding to the first credit investigation feature in the credit investigation data of the user and the mobile communication value corresponding to the first mobile communication feature in the mobile communication data of the user as one item of derivative data corresponding to the user.
In an embodiment of the present invention, the determining unit is specifically configured to:
and for each user of at least one user, dividing the mobile communication value corresponding to the second mobile communication feature in the mobile communication data of the user by the mobile communication value corresponding to the third mobile communication feature, and taking the product of the resulting quotient and the credit investigation value corresponding to the second credit investigation feature in the credit investigation data of the user as one item of derivative data corresponding to the user.
In an embodiment of the present invention, the determining unit is specifically configured to:
for each user of at least one user, preprocessing credit investigation data of the user and mobile communication data of the user;
and determining the derived data of the user by using the data obtained after preprocessing.
In an embodiment of the present invention, the training unit is specifically configured to:
determining the feature importance of each feature in at least one credit investigation feature corresponding to the credit investigation data, at least one mobile communication feature corresponding to the mobile communication data and at least one derivative feature corresponding to the derivative data;
removing feature data corresponding to features of which the feature importance is smaller than the preset feature importance from credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user;
and training at least one classification model by using the data obtained after the removal.
In another aspect, an embodiment of the present invention provides a user default prediction apparatus, including: a memory, a processor, and a computer program stored on the memory and executable on the processor;
the processor implements the user default prediction method provided by the embodiments of the present invention when executing the computer program.
In another aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the user default prediction method provided by the embodiment of the present invention.
According to the user default prediction method, device, equipment and medium of the embodiments of the present invention, the default probability of the user is calculated by using at least one classification model, and whether the user defaults is judged according to the default probability calculated by the at least one classification model, so that the accuracy of user default prediction can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a first flowchart illustrating a method for predicting a user default according to an embodiment of the present invention;
FIG. 2 is a second flowchart illustrating a method for predicting a user default according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a user default prediction apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram of an exemplary hardware architecture of a computing device capable of implementing the user default prediction method and apparatus according to embodiments of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In order to solve the problems in the prior art, embodiments of the present invention provide a method, an apparatus, a device, and a medium for predicting a user default. First, a user default prediction method provided by the embodiment of the present invention is described below.
Fig. 1 shows a first flowchart of a user default prediction method according to an embodiment of the present invention. The user default prediction method may include:
S101: acquiring credit investigation data of a target user and mobile communication data of the target user.
S102: calculating the default probability of the target user by utilizing at least one classification model based on the credit investigation data of the target user and the mobile communication data of the target user.
Here, the classification model is used for calculating the default probability of the user.
S103: judging whether the target user defaults according to the default probability of the target user.
The credit investigation data of the embodiment of the invention includes, but is not limited to: the credit card consumption limit of the user, the monthly average credit card consumption amount of the user, the monthly average number of credit card transactions of the user, and the like. The mobile communication data includes, but is not limited to: the number of financial applications (APPs) on the user's terminal device, the monthly mobile communication package fee of the user, the monthly average mobile communication cost of the user, and the like.
Illustratively, assume that three classification models are used to calculate the user's default probability: LightGBM (a distributed gradient boosting framework based on decision tree algorithms), the gradient boosting decision tree (GBDT), and eXtreme Gradient Boosting (XGBoost).
In an embodiment of the present invention, the average default probability of the target user may be calculated from the default probabilities of the target user calculated by the at least one classification model; if the average default probability of the target user is greater than the preset probability, the target user is predicted to default.
In an embodiment of the present invention, the average default probability may be an arithmetic mean, a weighted mean, a geometric mean, a harmonic mean, or a quadratic mean (root mean square) of the default probabilities.
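The averaging options listed above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation; the function names and the per-model weights are hypothetical, and the source does not say how weights would be chosen.

```python
import math


def arithmetic_mean(probs):
    """Plain average of the per-model default probabilities."""
    return sum(probs) / len(probs)


def weighted_mean(probs, weights):
    """Weighted average; the weights are hypothetical per-model weights."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


def geometric_mean(probs):
    """n-th root of the product of the probabilities."""
    return math.prod(probs) ** (1.0 / len(probs))


def harmonic_mean(probs):
    """Reciprocal of the mean reciprocal; every probability must be > 0."""
    return len(probs) / sum(1.0 / p for p in probs)


def quadratic_mean(probs):
    """Root mean square of the probabilities."""
    return math.sqrt(sum(p * p for p in probs) / len(probs))


if __name__ == "__main__":
    # Example probabilities as produced by three classification models.
    probs = [0.44, 0.49, 0.81]
    for fn in (arithmetic_mean, geometric_mean, harmonic_mean, quadratic_mean):
        print(fn.__name__, round(fn(probs), 4))
```

For the same inputs, the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, which is never larger than the quadratic mean, so the choice of mean shifts how easily a fixed preset probability is exceeded.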
Illustratively, take the arithmetic mean default probability as an example. The default probability of the target user calculated by LightGBM is 0.44, the default probability calculated by GBDT is 0.49, and the default probability calculated by XGBoost is 0.81.
The arithmetic mean default probability of the target user is: (0.44 + 0.49 + 0.81) / 3 = 0.58.
Whether the target user defaults is then judged according to the arithmetic mean default probability of 0.58.
In one embodiment of the present invention, if the average default probability of the target user is greater than the preset probability, the target user is predicted to default.
Illustratively, assuming that the preset probability is 0.5, the average default probability 0.58 of the target user is greater than the preset probability 0.5, and the target user is predicted to default.
Illustratively, assuming that the preset probability is 0.75, the average default probability 0.58 of the target user is less than the preset probability 0.75, and the target user is predicted not to default.
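The worked example above can be reproduced with a short sketch. The function name `predict_default` and the dictionary of model outputs are illustrative assumptions; in practice the probabilities would come from the trained classification models.

```python
def predict_default(model_probabilities, preset_probability):
    """Average the per-model default probabilities and compare to a threshold.

    Returns (will_default, average). The user is predicted to default only
    when the average is strictly greater than the preset probability.
    """
    average = sum(model_probabilities) / len(model_probabilities)
    return average > preset_probability, average


if __name__ == "__main__":
    # Default probabilities from the example in the text.
    outputs = {"LightGBM": 0.44, "GBDT": 0.49, "XGBoost": 0.81}
    for preset in (0.5, 0.75):
        will_default, average = predict_default(list(outputs.values()), preset)
        print(f"preset={preset}: average={average:.2f}, default={will_default}")
```

With the same average of 0.58, the two preset probabilities lead to opposite decisions, which shows that the preset probability is the parameter that controls how conservative the prediction is.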
According to the user default prediction method provided by the embodiment of the invention, the default probability of the user is calculated by each of the at least one classification model, and whether the user defaults is predicted from the average of these default probabilities, so that the accuracy of user default prediction can be improved. In addition, the embodiment of the invention introduces mobile communication data, which can further improve the accuracy of user default prediction.
In one embodiment of the present invention, the at least one classification model needs to be trained before it is used to predict the default probability of the target user.
In one embodiment of the invention, the at least one classification model may be trained using credit investigation data of at least one user and mobile communication data of the at least one user.
For example, it is assumed that the credit investigation data of the user includes data corresponding to X credit investigation features, and the mobile communication data of the user includes data corresponding to Y mobile communication features.
The obtained credit investigation data and mobile communication data of at least one user are shown in Table 1.
TABLE 1
[Table 1 is an image in the original publication and is not reproduced here.]
In Table 1, ZT_ij denotes the data of credit investigation feature j in the credit investigation data of user i, and YT_ij denotes the data of mobile communication feature j in the mobile communication data of user i.
At least one classification model is trained using the credit investigation data and mobile communication data of the N users in Table 1.
The embodiment of the invention does not limit the way of training the classification model by utilizing credit investigation data and mobile communication data, and any available training way can be applied to the embodiment of the invention.
In one embodiment of the present invention, when training at least one classification model by using credit investigation data of at least one user and mobile communication data of at least one user, for each user of at least one user, derivative data corresponding to the user can be determined according to the credit investigation data of the user and the mobile communication data of the user; and training at least one classification model according to credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user.
In an embodiment of the present invention, when determining the derivative data corresponding to each of the at least one user according to the credit investigation data of the user and the mobile communication data of the user, the product of the credit investigation value corresponding to a first credit investigation feature in the credit investigation data of the user and the mobile communication value corresponding to a first mobile communication feature in the mobile communication data of the user may be used as one item of derivative data corresponding to the user.
For example, assume that the first credit investigation feature is the credit card consumption limit of the user, and the first mobile communication feature is the number of financial APPs on the user's terminal device.
The number of financial APPs on the user's terminal device in the mobile communication data of the user is multiplied by the credit card consumption limit of the user in the credit investigation data of the user, and the resulting product is used as derivative data of the user. A larger number of financial APPs on the terminal device generally indicates, indirectly, that the user has more fund flow. The credit card consumption limit of the user is already a variable with a large influence on the model in default modeling; multiplying it by the number of financial APPs amplifies the differences in this feature between samples, which can increase the sensitivity of the default prediction model to the feature and thereby improve the accuracy of the model.
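The product-type derived feature described above might be computed as in the following sketch. The field names `credit_limit`, `finance_app_count`, and `limit_x_apps`, as well as the sample values, are hypothetical and chosen only for illustration.

```python
def product_feature(credit_limit, finance_app_count):
    """Derived feature: credit card consumption limit times financial APP count."""
    return credit_limit * finance_app_count


def add_product_feature(users):
    """Return copies of the user records with the derived feature attached."""
    enriched = []
    for user in users:
        record = dict(user)  # copy so the input records are left unchanged
        record["limit_x_apps"] = product_feature(
            user["credit_limit"], user["finance_app_count"]
        )
        enriched.append(record)
    return enriched


if __name__ == "__main__":
    # Made-up sample records, not data from the source.
    users = [
        {"credit_limit": 20000, "finance_app_count": 3},
        {"credit_limit": 20000, "finance_app_count": 1},
    ]
    for record in add_product_feature(users):
        print(record)
```

Two users with the same credit limit end up with different derived values once the APP count is factored in, which is the amplification of between-sample differences that the text describes.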
In an embodiment of the present invention, when determining the derivative data corresponding to each of the at least one user according to the credit investigation data of the user and the mobile communication data of the user, the mobile communication value corresponding to a second mobile communication feature in the mobile communication data of the user may be divided by the mobile communication value corresponding to a third mobile communication feature, and the product of the resulting quotient and the credit investigation value corresponding to a second credit investigation feature in the credit investigation data of the user may be used as one item of derivative data corresponding to the user.
Illustratively, the second mobile communication feature is the monthly average mobile communication cost of the user, the third mobile communication feature is the monthly mobile communication package fee of the user, and the second credit investigation feature is the monthly average number of credit card transactions of the user.
The monthly average mobile communication cost of the user is divided by the monthly mobile communication package fee of the user, both taken from the mobile communication data of the user; the resulting quotient is multiplied by the monthly average number of credit card transactions of the user in the credit investigation data of the user, and the resulting product is used as another item of derivative data of the user. The quotient reflects how far the user's monthly spending exceeds the package fee, and indirectly reflects the user's planning ability and consumption stability. Multiplying the quotient by the monthly average number of credit card transactions amplifies that feature, which can improve the accuracy of the model.
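The quotient-then-product derived feature can be sketched as below. The function and parameter names are hypothetical. The source does not say what to do when the package fee is zero; returning `None` (so the value can be treated as missing later) is an assumption made here.

```python
def overage_transaction_feature(avg_monthly_cost, package_fee, avg_monthly_transactions):
    """Derived feature: (average monthly cost / package fee) * transaction count.

    A quotient above 1 means the user typically spends more than the package fee.
    Returns None when the package fee is zero (an assumption, see lead-in).
    """
    if package_fee == 0:
        return None
    overage_ratio = avg_monthly_cost / package_fee
    return overage_ratio * avg_monthly_transactions


if __name__ == "__main__":
    # Made-up sample values: cost 120, package fee 80, 30 transactions per month.
    print(overage_transaction_feature(120.0, 80.0, 30))
```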
Illustratively, assume that the obtained credit investigation data and mobile communication data of at least one user are as shown in Table 1 above. The resulting derivative data are shown in Table 2.
TABLE 2
[Table 2 is an image in the original publication and is not reproduced here.]
In Table 2, DT_ij denotes the data of derived feature j in the derivative data of user i.
At least one classification model is trained using the credit investigation data, mobile communication data and derivative data of the N users in Table 2.
In an embodiment of the present invention, when determining, for each of at least one user, derivative data corresponding to the user according to credit investigation data of the user and mobile communication data of the user, the credit investigation data of the user and the mobile communication data of the user may be preprocessed for each of the at least one user; and determining the derived data of the user by using the data obtained after preprocessing.
Preprocessing in embodiments of the present invention includes, but is not limited to, missing value imputation, feature deletion, and outlier replacement.
For example, assume that the feature value of credit investigation feature 1 in the credit investigation data of user 1 is missing.
In an embodiment of the present invention, the missing feature value of credit investigation feature 1 in the credit investigation data of user 1 may be filled with the average feature value of credit investigation feature 1 in the credit investigation data of users 2 to N.
The average feature value of the embodiment of the present invention may be an arithmetic mean, a weighted mean, a geometric mean, a harmonic mean, or a quadratic mean (root mean square) of the feature values.
In an embodiment of the present invention, the missing feature value of credit investigation feature 1 in the credit investigation data of user 1 may instead be filled with the mode of the feature values of credit investigation feature 1 in the credit investigation data of users 2 to N.
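Mean and mode imputation for a single feature column can be sketched as follows. The sketch uses the arithmetic mean only, represents a missing value as `None`, and uses hypothetical names; these are illustrative choices, not details from the source.

```python
from statistics import mean, mode


def impute(values, strategy="mean"):
    """Fill missing entries (None) in one feature column.

    The fill value is computed only from the users whose value is present,
    using either the arithmetic mean or the mode.
    Returns (filled_values, fill_value).
    """
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill_value = mean(observed)
    elif strategy == "mode":
        fill_value = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill_value if v is None else v for v in values], fill_value


if __name__ == "__main__":
    # User 1 is missing the feature; users 2 to 4 supply the fill value.
    column = [None, 10.0, 20.0, 30.0]
    print(impute(column, "mean"))
    print(impute([None, 3, 3, 5], "mode"))
```

Returning the fill value alongside the filled column makes it possible to reuse the same value later for a target user whose feature is missing.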
In one embodiment of the invention, a feature may be deleted if, for that feature, multiple users (e.g., more than half of the users) lack the feature value for that feature.
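Deleting features that are missing for too many users can be sketched as follows. The one-half threshold follows the example in the text; the record layout (one dictionary per user, `None` for a missing value) and the names are assumptions.

```python
def drop_sparse_features(rows, max_missing_fraction=0.5):
    """Delete features whose share of missing values exceeds the threshold.

    rows is a list of per-user dicts that all share the same keys.
    Returns (filtered_rows, kept_feature_names).
    """
    total = len(rows)
    kept = []
    for feature in rows[0]:
        missing = sum(1 for row in rows if row[feature] is None)
        # "More than half" means strictly greater than the threshold.
        if missing / total <= max_missing_fraction:
            kept.append(feature)
    filtered = [{feature: row[feature] for feature in kept} for row in rows]
    return filtered, kept


if __name__ == "__main__":
    # Made-up records: feature "a" is missing for 3 of 4 users.
    rows = [
        {"a": None, "b": 1.0},
        {"a": None, "b": None},
        {"a": None, "b": 3.0},
        {"a": 4.0, "b": 5.0},
    ]
    print(drop_sparse_features(rows))
```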
In one embodiment of the invention, for an abnormal feature value, the average value of other normal feature values may be used to replace the abnormal feature value.
In one embodiment of the present invention, if a plurality of users (for example, more than half of the users) have abnormal feature values corresponding to a feature, the feature may be deleted.
In one embodiment of the invention, a box plot may be used to determine whether a feature value is an abnormal feature value.
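A box plot check combined with mean replacement can be sketched as below. The source names only the box plot; the conventional whisker rule (values more than 1.5 times the interquartile range beyond the quartiles are outliers) is an assumption here, as are the function names. The quartiles come from `statistics.quantiles` with its default method.

```python
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Lower and upper fences of a box plot: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers(values, k=1.5):
    """Replace values outside the fences with the mean of the normal values."""
    low, high = boxplot_fences(values, k)
    normal = [v for v in values if low <= v <= high]
    if not normal:
        return list(values)  # nothing to average over; leave the column as is
    normal_mean = sum(normal) / len(normal)
    return [v if low <= v <= high else normal_mean for v in values]


if __name__ == "__main__":
    # Made-up feature column with one obviously abnormal value.
    column = [10, 12, 11, 13, 12, 11, 100]
    print(boxplot_fences(column))
    print(replace_outliers(column))
```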
After the credit investigation data of the user and the mobile communication data of the user are preprocessed, the derived data of the user can be determined by utilizing the preprocessed data.
The process of determining the derivative data of the user from the preprocessed data is similar to the process of determining the derivative data from data that has not been preprocessed; refer to the description of that process above. It is not repeated here.
It can be understood that, after preprocessing credit investigation data and mobile communication data of each of at least one user, when predicting the default probability of the target user by using at least one classification model based on the credit investigation data and mobile communication data of the target user, the credit investigation data and mobile communication data of the target user also need to be preprocessed accordingly.
For example, if credit investigation feature j was deleted when the credit investigation data and the mobile communication data of each of the at least one user were preprocessed, credit investigation feature j also needs to be deleted from the credit investigation data of the target user.
For another example, if missing feature values of credit investigation feature j were filled with the average value during that preprocessing, then a missing feature value of credit investigation feature j in the credit investigation data of the target user is likewise filled with the average value.
For another example, if missing feature values of credit investigation feature j were filled with the mode during that preprocessing, then a missing feature value of credit investigation feature j in the credit investigation data of the target user is likewise filled with the mode.
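One way to keep the target user's preprocessing consistent with the training-side preprocessing is to record the decisions once and replay them. The sketch below is an assumption-laden illustration: it assumes the fill value is the average computed over the training users (the text does not say which average is used), represents missing values as `None`, and uses hypothetical names throughout.

```python
import statistics


def fit_preprocessing(train_rows, deleted_features, mean_filled_features):
    """Record which features were deleted and which fill values were used."""
    fill_values = {}
    for f in mean_filled_features:
        observed = [r[f] for r in train_rows if r.get(f) is not None]
        fill_values[f] = statistics.mean(observed)  # assumption: training-side mean
    return {"deleted": set(deleted_features), "fill": fill_values}


def apply_preprocessing(row, plan):
    """Replay the recorded decisions on one user's data (e.g., the target user)."""
    out = {f: v for f, v in row.items() if f not in plan["deleted"]}
    for f, fill in plan["fill"].items():
        if out.get(f) is None:
            out[f] = fill
    return out


train = [
    {"credit_j": 10, "credit_k": 1, "fee": 30},
    {"credit_j": 20, "credit_k": None, "fee": 45},
    {"credit_j": 30, "credit_k": 3, "fee": 50},
]
plan = fit_preprocessing(train, deleted_features=["credit_j"],
                         mean_filled_features=["credit_k"])
target = {"credit_j": 5, "credit_k": None, "fee": 40}
target_clean = apply_preprocessing(target, plan)
```

A mode-based fill would follow the same pattern, with the recorded fill value being the mode instead of the mean.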
In an embodiment of the present invention, training the at least one classification model according to the credit investigation data of at least one user, the mobile communication data of the at least one user and the derivative data corresponding to the at least one user may proceed as follows: determining the feature importance of each feature among the at least one credit investigation feature corresponding to the credit investigation data, the at least one mobile communication feature corresponding to the mobile communication data and the at least one derivative feature corresponding to the derivative data; removing, from the credit investigation data, the mobile communication data and the derivative data of the at least one user, the feature data corresponding to features whose feature importance is smaller than a preset feature importance; and training the at least one classification model using the data obtained after the removal.
In one embodiment of the invention, a random forest algorithm may be used to determine the feature importance of each feature.
Illustratively, assume that the preset feature importance is 0.5.
Based on Table 2 above, the feature importance of credit investigation feature 1 is 0.25, …, the feature importance of credit investigation feature i is 0.75, …, the feature importance of credit investigation feature X is 0.8, the feature importance of mobile communication feature 1 is 0.55, …, the feature importance of mobile communication feature i is 0.73, …, the feature importance of mobile communication feature Y is 0.81, the feature importance of derivative feature 1 is 0.6, …, the feature importance of derivative feature i is 0.78, …, and the feature importance of derivative feature Z is 0.86.
Since the feature importance of credit investigation feature 1 (0.25) is smaller than the preset feature importance (0.5), the feature data corresponding to credit investigation feature 1 is removed.
The result of removing the feature data corresponding to credit investigation feature 1 is shown in Table 3.
TABLE 3
[Table 3 appears as an image in the original publication (image reference BDA0002323884240000131).]
At least one classification model is trained based on the data shown in table 3.
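The screening step in this example can be sketched in a few lines of Python. The importance values are the ones quoted above; the feature names are hypothetical labels, the features elided in the example are omitted, and how the importances are obtained (for instance from a random forest) is outside this sketch.

```python
PRESET_IMPORTANCE = 0.5  # preset feature importance assumed in the example

# Importance values quoted in the example; elided features are not listed.
importance = {
    "credit_feature_1": 0.25,
    "credit_feature_i": 0.75,
    "credit_feature_X": 0.80,
    "mobile_feature_1": 0.55,
    "mobile_feature_i": 0.73,
    "mobile_feature_Y": 0.81,
    "derived_feature_1": 0.60,
    "derived_feature_i": 0.78,
    "derived_feature_Z": 0.86,
}


def select_features(importance, threshold):
    """Keep features whose importance is not smaller than the threshold."""
    return [f for f, score in importance.items() if score >= threshold]


def remove_low_importance(rows, kept):
    """Drop the feature data of every feature that was screened out."""
    return [{f: r[f] for f in kept if f in r} for r in rows]


kept = select_features(importance, PRESET_IMPORTANCE)
removed = sorted(set(importance) - set(kept))
```

With these values only `credit_feature_1` is screened out, which matches the example; the retained list `kept` would also be reused when the target user's data is prepared for prediction.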
Based on the above, a user default prediction method provided by the embodiment of the present invention is shown in fig. 2. Fig. 2 is a second flowchart illustrating a user default prediction method according to an embodiment of the present invention.
First, credit investigation data and mobile communication data of at least one user are obtained.
The credit investigation data and mobile communication data of the at least one user are preprocessed.
Derivative data of the at least one user is determined using the preprocessed credit investigation data and mobile communication data of the at least one user.
The feature importance of each feature in the credit investigation data, the mobile communication data and the derivative data of the at least one user is determined.
The features are screened based on feature importance, and the feature data corresponding to features whose feature importance is smaller than the preset feature importance is removed.
N classification models for predicting the default probability of a user are trained using the credit investigation data, mobile communication data and derivative data of the at least one user that remain after the removal.
Credit investigation data and mobile communication data of the target user are obtained.
The credit investigation data and mobile communication data of the target user are preprocessed in the corresponding manner.
Derivative data of the target user is determined using the preprocessed credit investigation data and mobile communication data of the target user.
The feature data corresponding to features whose feature importance is smaller than the preset feature importance is removed from the credit investigation data, the mobile communication data and the derivative data of the target user.
The default probability of the target user is calculated using the N classification models, based on the credit investigation data, mobile communication data and derivative data of the target user that remain after the removal.
The average default probability of the target user is calculated.
Default prediction: it is judged whether the average default probability of the target user is greater than the preset probability; if so, the target user is predicted to default, and if not, the target user is predicted not to default.
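The prediction-side steps of this flow can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the N classification models are stand-in callables that return a probability, the retained feature list is assumed to come from the training-side screening, and the preset probability of 0.5 is an assumed value.

```python
from statistics import mean


def predict_default(target_row, kept_features, models, preset_probability):
    """Average the default probabilities of N models and compare with the preset."""
    # Keep only the features that survived the importance screening.
    features = {f: target_row[f] for f in kept_features}
    probabilities = [model(features) for model in models]
    average = mean(probabilities)
    return average, average > preset_probability


# Hypothetical stand-ins for N = 3 trained classification models.
models = [lambda f: 0.7, lambda f: 0.4, lambda f: 0.7]
target = {"credit_feature_1": 1.0, "credit_feature_i": 0.2, "derived_feature_1": 3.0}

average, defaults = predict_default(
    target,
    kept_features=["credit_feature_i", "derived_feature_1"],
    models=models,
    preset_probability=0.5,  # assumed value; the text only calls it "preset"
)
```

Averaging over several models is the only aggregation rule the flow describes, so the sketch does not weight the models or vote on their individual decisions.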
Corresponding to the above method embodiment, the embodiment of the present invention further provides a device for predicting user default.
Fig. 3 is a schematic structural diagram of a user default prediction apparatus according to an embodiment of the present invention. The user default prediction apparatus may include:
an obtaining module 301, configured to obtain credit investigation data of a target user and mobile communication data of the target user;
a calculating module 302, configured to calculate a default probability of the target user by using at least one classification model based on the credit investigation data of the target user and the mobile communication data of the target user,
where the classification model is used for calculating the default probability of a user; and
a prediction module 303, configured to determine whether the target user defaults according to the default probability of the target user.
In an embodiment of the present invention, the prediction module 303 may be specifically configured to:
calculating the average default probability of the target user according to the default probability;
and if the average default probability is greater than the preset probability, determining that the target user defaults.
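The division of work among modules 301 to 303 can be pictured with a small Python sketch. The class names mirror the module names, but everything else is an assumption made for illustration: the data sources are in-memory dictionaries, the models are toy callables, and the preset probability is set to 0.5.

```python
class ObtainingModule:
    """Obtains the credit investigation and mobile communication data of a user."""

    def __init__(self, credit_data, mobile_data):
        self.credit_data = credit_data
        self.mobile_data = mobile_data

    def obtain(self, user_id):
        return {**self.credit_data[user_id], **self.mobile_data[user_id]}


class CalculatingModule:
    """Calculates one default probability per classification model."""

    def __init__(self, models):
        self.models = models

    def calculate(self, features):
        return [model(features) for model in self.models]


class PredictionModule:
    """Averages the probabilities and compares the average with the preset."""

    def __init__(self, preset_probability):
        self.preset_probability = preset_probability

    def predict(self, probabilities):
        average = sum(probabilities) / len(probabilities)
        return average > self.preset_probability


# Toy models and feature names, purely hypothetical.
def toy_model_a(features):
    return min(1.0, features["overdue_count"] / 4)


def toy_model_b(features):
    return 0.9 if features["monthly_fee"] < 20 else 0.3


obtaining = ObtainingModule({"u1": {"overdue_count": 3}}, {"u1": {"monthly_fee": 10}})
calculating = CalculatingModule([toy_model_a, toy_model_b])
prediction = PredictionModule(preset_probability=0.5)

probabilities = calculating.calculate(obtaining.obtain("u1"))
defaults = prediction.predict(probabilities)
```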
In an embodiment of the present invention, the apparatus for predicting a user default provided in the embodiment of the present invention may further include:
a training module, configured to train the at least one classification model by using credit investigation data of at least one user and mobile communication data of the at least one user.
In one embodiment of the present invention, the training module may include:
a determining unit, configured to determine, for each user of the at least one user, derivative data corresponding to the user according to the credit investigation data of the user and the mobile communication data of the user;
and a training unit, configured to train the at least one classification model according to the credit investigation data of the at least one user, the mobile communication data of the at least one user and the derivative data corresponding to the at least one user.
In an embodiment of the present invention, the determining unit may be specifically configured to:
for each user of the at least one user, taking the product of the credit investigation value corresponding to a first credit investigation feature in the credit investigation data of the user and the mobile communication value corresponding to a first mobile communication feature in the mobile communication data of the user as one piece of derivative data corresponding to the user.
In an embodiment of the present invention, the determining unit may be specifically configured to:
for each user of the at least one user, taking, as one piece of derivative data corresponding to the user, the product of (i) the quotient of the mobile communication value corresponding to a second mobile communication feature in the mobile communication data of the user and the mobile communication value corresponding to a third mobile communication feature, and (ii) the credit investigation value corresponding to a second credit investigation feature in the credit investigation data of the user.
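The two kinds of derivative data handled by the determining unit can be written out as a short Python sketch. It assumes that "the quotient of A and B" means A divided by B, and it treats a zero denominator as producing a missing value; neither point is stated in the text, and the function names are hypothetical.

```python
def product_feature(credit_value_1, mobile_value_1):
    """First kind: credit investigation value times mobile communication value."""
    return credit_value_1 * mobile_value_1


def ratio_product_feature(mobile_value_2, mobile_value_3, credit_value_2):
    """Second kind: (mobile value 2 / mobile value 3) times credit investigation value 2."""
    if mobile_value_3 == 0:
        return None  # assumption: an undefined quotient is treated as missing
    return (mobile_value_2 / mobile_value_3) * credit_value_2


# Hypothetical feature values for one user.
derived_1 = product_feature(credit_value_1=2, mobile_value_1=3)
derived_2 = ratio_product_feature(mobile_value_2=10, mobile_value_3=4, credit_value_2=2)
```

Each function yields one piece of derivative data per user, so applying both to every user adds two derivative features to the training set.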
In an embodiment of the present invention, the determining unit may be specifically configured to:
for each user of at least one user, preprocessing credit investigation data of the user and mobile communication data of the user;
and determining the derived data of the user by using the data obtained after preprocessing.
In an embodiment of the present invention, the training unit may specifically be configured to:
determining the feature importance of each feature in at least one credit investigation feature corresponding to the credit investigation data, at least one mobile communication feature corresponding to the mobile communication data and at least one derivative feature corresponding to the derivative data;
removing feature data corresponding to features of which the feature importance is smaller than the preset feature importance from credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user;
and training at least one classification model by using the data obtained after the removal.
Fig. 4 is a block diagram of an exemplary hardware architecture of a computing device capable of implementing the user default prediction method and apparatus according to embodiments of the present invention. As shown in Fig. 4, computing device 400 includes an input device 401, an input interface 402, a central processor 403, a memory 404, an output interface 405, and an output device 406. The input interface 402, the central processor 403, the memory 404, and the output interface 405 are connected to each other through a bus 410, and the input device 401 and the output device 406 are connected to the bus 410 through the input interface 402 and the output interface 405, respectively, and thereby to the other components of the computing device 400.
Specifically, the input device 401 receives input information from the outside and transmits the input information to the central processor 403 through the input interface 402; the central processor 403 processes the input information based on computer-executable instructions stored in the memory 404 to generate output information, stores the output information temporarily or permanently in the memory 404, and then transmits the output information to the output device 406 through the output interface 405; the output device 406 outputs the output information to the outside of the computing device 400 for use by a user.
That is, the computing device shown in Fig. 4 may also be implemented as a user default prediction device, which may include: a memory storing a computer program; and a processor, which implements the user default prediction method provided by the embodiments of the present invention when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium; the computer program, when executed by a processor, implements the user default prediction method provided by embodiments of the present invention.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (13)

1. A user default prediction method, the method comprising:
acquiring credit investigation data of a target user and mobile communication data of the target user;
calculating a default probability of the target user based on the credit investigation data and the mobile communication data by using at least one classification model; the classification model is used for calculating default probability of the user;
and judging whether the target user defaults according to the default probability.
2. The method of claim 1, wherein the judging whether the target user defaults according to the default probability comprises:
calculating the average default probability of the target user according to the default probability;
and if the average default probability is greater than a preset probability, judging that the target user defaults.
3. The method of claim 1, wherein before the obtaining credit investigation data of the target user and mobile communication data of the target user, the method further comprises:
training the at least one classification model by using credit investigation data of at least one user and mobile communication data of the at least one user.
4. The method of claim 3, wherein the training the at least one classification model by using credit investigation data of at least one user and mobile communication data of the at least one user comprises:
for each user of the at least one user, determining derivative data corresponding to the user according to credit investigation data of the user and mobile communication data of the user;
and training the at least one classification model according to the credit investigation data of the at least one user, the mobile communication data of the at least one user and the derivative data corresponding to the at least one user.
5. The method according to claim 4, wherein the determining, for each of the at least one user, the derivative data corresponding to the user according to the credit investigation data of the user and the mobile communication data of the user comprises:
for each user of the at least one user, taking the product of the credit investigation value corresponding to a first credit investigation feature in the credit investigation data of the user and the mobile communication value corresponding to a first mobile communication feature in the mobile communication data of the user as one piece of derivative data corresponding to the user.
6. The method according to claim 4, wherein the determining, for each of the at least one user, the derivative data corresponding to the user according to the credit investigation data of the user and the mobile communication data of the user comprises:
for each user of the at least one user, taking, as one piece of derivative data corresponding to the user, the product of (i) the quotient of the mobile communication value corresponding to a second mobile communication feature in the mobile communication data of the user and the mobile communication value corresponding to a third mobile communication feature, and (ii) the credit investigation value corresponding to a second credit investigation feature in the credit investigation data of the user.
7. The method according to claim 4, wherein the determining, for each of the at least one user, the derivative data corresponding to the user according to the credit investigation data of the user and the mobile communication data of the user comprises:
for each user of the at least one user, preprocessing credit investigation data of the user and mobile communication data of the user;
and determining the derived data of the user by using the data obtained after preprocessing.
8. The method of claim 4, wherein the training the at least one classification model according to the credit investigation data of the at least one user, the mobile communication data of the at least one user, and the derivative data corresponding to the at least one user comprises:
determining the feature importance of each feature in at least one credit investigation feature corresponding to the credit investigation data, at least one mobile communication feature corresponding to the mobile communication data and at least one derivative feature corresponding to the derivative data;
removing feature data corresponding to features of which the feature importance is smaller than the preset feature importance from credit investigation data of the at least one user, mobile communication data of the at least one user and derivative data corresponding to the at least one user;
and training the at least one classification model by using the data obtained after the removal.
9. A user default prediction apparatus, comprising:
the acquisition module is used for acquiring credit investigation data of a target user and mobile communication data of the target user;
a calculation module for calculating a default probability of the target user based on the credit investigation data and the mobile communication data by using at least one classification model; the classification model is used for calculating default probability of the user;
and the prediction module is used for judging whether the target user defaults according to the default probability.
10. The apparatus of claim 9, wherein the prediction module is specifically configured to:
calculating the average default probability of the target user according to the default probability;
and if the average default probability is greater than a preset probability, judging that the target user defaults.
11. The apparatus of claim 9, further comprising:
a training module, configured to train the at least one classification model by using credit investigation data of at least one user and mobile communication data of the at least one user.
12. A user default prediction device, the device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor;
the processor, when executing the computer program, implements the user default prediction method of any of claims 1-8.
13. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the user default prediction method of any of claims 1-8.
CN201911308640.8A 2019-12-18 2019-12-18 User default prediction method, device, equipment and medium Pending CN113011624A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911308640.8A CN113011624A (en) 2019-12-18 2019-12-18 User default prediction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911308640.8A CN113011624A (en) 2019-12-18 2019-12-18 User default prediction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113011624A true CN113011624A (en) 2021-06-22

Family

ID=76382405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911308640.8A Pending CN113011624A (en) 2019-12-18 2019-12-18 User default prediction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113011624A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018090657A1 (en) * 2016-11-18 2018-05-24 同济大学 Bp_adaboost model-based method and system for predicting credit card user default
CN108256691A (en) * 2018-02-08 2018-07-06 成都智宝大数据科技有限公司 Refund Probabilistic Prediction Model construction method and device
CN109034658A (en) * 2018-08-22 2018-12-18 重庆邮电大学 A kind of promise breaking consumer's risk prediction technique based on big data finance
CN109191282A (en) * 2018-08-23 2019-01-11 北京玖富普惠信息技术有限公司 Methods of marking and system are monitored in a kind of loan of Behavior-based control model
CN109344998A (en) * 2018-09-06 2019-02-15 盈盈(杭州)网络技术有限公司 A kind of customer default probability forecasting method based on medical and beauty treatment scene
CN109360084A (en) * 2018-09-27 2019-02-19 平安科技(深圳)有限公司 Appraisal procedure and device, storage medium, the computer equipment of reference default risk
CN109949152A (en) * 2019-04-15 2019-06-28 武汉理工大学 A kind of personal credit's violation correction method
CN110348721A (en) * 2019-06-29 2019-10-18 北京淇瑀信息科技有限公司 Financial default risk prediction technique, device and electronic equipment based on GBST
CN110349009A (en) * 2019-07-02 2019-10-18 北京淇瑀信息科技有限公司 A kind of bull debt-credit violation correction method, apparatus and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091206A (en) * 2023-01-31 2023-05-09 金电联行(北京)信息技术有限公司 Credit evaluation method, credit evaluation device, electronic equipment and storage medium
CN116091206B (en) * 2023-01-31 2023-10-20 金电联行(北京)信息技术有限公司 Credit evaluation method, credit evaluation device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination