CN113011624A

CN113011624A - User default prediction method, device, equipment and medium

Info

Publication number: CN113011624A
Application number: CN201911308640.8A
Authority: CN
Inventors: Wu Fei (吴飞); Han Yi (韩屹)
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Shanghai ICT Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Shanghai ICT Co Ltd
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2021-06-22

Abstract

The embodiments of the invention disclose a user default prediction method, device, equipment and medium. The method comprises: acquiring credit investigation data and mobile communication data of a target user; calculating the default probability of the target user by using at least one classification model based on the credit investigation data and the mobile communication data of the target user; and judging whether the target user defaults according to the default probability of the target user. The user default prediction method, device, equipment and medium can improve the accuracy of user default prediction.

Description

User default prediction method, device, equipment and medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for predicting a user default.

Background

Credit investigation data is also called credit information. Credit investigation data reflecting the credit condition of an enterprise is called enterprise credit information, and credit investigation data reflecting the credit condition of an individual is called personal credit information.

Credit investigation data can be used to predict whether a user will default. At present, only a single classification model is used for this prediction, and the resulting prediction accuracy is low.

Disclosure of Invention

The embodiment of the invention provides a user default prediction method, a device, equipment and a medium, which can improve the accuracy of user default prediction.

In one aspect, an embodiment of the present invention provides a user default prediction method, including:

acquiring credit investigation data of a target user and mobile communication data of the target user;

calculating the default probability of the target user by using at least one classification model based on the credit investigation data and the mobile communication data of the target user, where the classification model is used for calculating the default probability of a user;

and judging whether the target user defaults according to the default probability of the target user.

In an embodiment of the present invention, judging whether the target user defaults according to the default probability of the target user includes:

calculating the average default probability of the target user according to the default probability;

and if the average default probability of the target user is greater than a preset probability, predicting that the target user defaults.

In an embodiment of the present invention, before the credit investigation data and the mobile communication data of the target user are acquired, the user default prediction method provided in the embodiments of the present invention further includes:

training the at least one classification model by using credit investigation data and mobile communication data of at least one user.

In an embodiment of the present invention, training the at least one classification model by using the credit investigation data and the mobile communication data of the at least one user includes:

for each of the at least one user, determining derivative data corresponding to the user according to the credit investigation data and the mobile communication data of the user;

and training the at least one classification model according to the credit investigation data, the mobile communication data and the derivative data of the at least one user.

In an embodiment of the present invention, determining, for each of the at least one user, the derivative data corresponding to the user according to the credit investigation data and the mobile communication data of the user includes:

for each of the at least one user, taking the product of the credit investigation value corresponding to a first credit investigation feature in the credit investigation data of the user and the mobile communication value corresponding to a first mobile communication feature in the mobile communication data of the user as one item of derivative data corresponding to the user; and/or

for each of the at least one user, dividing the mobile communication value corresponding to a second mobile communication feature in the mobile communication data of the user by the mobile communication value corresponding to a third mobile communication feature, and taking the product of the resulting quotient and the credit investigation value corresponding to a second credit investigation feature in the credit investigation data of the user as one item of derivative data corresponding to the user.

In an embodiment of the present invention, the determining may also include:

for each of the at least one user, preprocessing the credit investigation data and the mobile communication data of the user;

and determining the derivative data of the user by using the preprocessed data.

In an embodiment of the present invention, training the at least one classification model according to the credit investigation data, the mobile communication data and the derivative data of the at least one user includes:

determining the feature importance of each feature among the at least one credit investigation feature corresponding to the credit investigation data, the at least one mobile communication feature corresponding to the mobile communication data, and the at least one derivative feature corresponding to the derivative data;

removing, from the credit investigation data, the mobile communication data and the derivative data of the at least one user, the feature data corresponding to features whose feature importance is smaller than a preset feature importance;

and training the at least one classification model by using the data remaining after the removal.

In another aspect, an embodiment of the present invention provides a user default prediction apparatus, including:

the acquisition module is used for acquiring credit investigation data of a target user and mobile communication data of the target user;

the calculation module is used for calculating the default probability of the target user by using at least one classification model based on the credit investigation data and the mobile communication data of the target user, where the classification model is used for calculating the default probability of a user;

and the prediction module is used for judging whether the target user defaults according to the default probability of the target user.

In an embodiment of the present invention, the prediction module is specifically configured to:

calculate the average default probability of the target user according to the default probability;

and if the average default probability is greater than the preset probability, determine that the target user defaults.

In an embodiment of the present invention, the user default prediction apparatus provided in the embodiments of the present invention further includes:

a training module, configured to train the at least one classification model by using credit investigation data and mobile communication data of at least one user.

In an embodiment of the present invention, the training module includes:

a determining unit, configured to determine, for each of the at least one user, derivative data corresponding to the user according to the credit investigation data and the mobile communication data of the user;

and a training unit, configured to train the at least one classification model according to the credit investigation data, the mobile communication data and the derivative data of the at least one user.

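In an embodiment of the present invention, the determining unit is specifically configured to: for each of the at least one user, take the product of the credit investigation value of a first credit investigation feature and the mobile communication value of a first mobile communication feature as one item of derivative data; and/or take the product of the quotient of the values of a second and a third mobile communication feature and the value of a second credit investigation feature as one item of derivative data. The credit investigation data and mobile communication data of the user may first be preprocessed, and the derivative data determined from the preprocessed data.

In an embodiment of the present invention, the training unit is specifically configured to: determine the feature importance of each credit investigation feature, mobile communication feature and derivative feature; remove the feature data corresponding to features whose feature importance is smaller than the preset feature importance; and train the at least one classification model by using the data remaining after the removal.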

In another aspect, an embodiment of the present invention provides a user default prediction device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor;

the processor implements the user default prediction method provided by the embodiments of the present invention when executing the computer program.

In another aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the user default prediction method provided by the embodiment of the present invention.

With the user default prediction method, device, equipment and medium of the embodiments of the present invention, the default probability of the user is calculated by using at least one classification model, and whether the user defaults is judged according to the default probabilities calculated by the at least one classification model, which can improve the accuracy of user default prediction.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a first flowchart illustrating a method for predicting a user default according to an embodiment of the present invention;

FIG. 2 is a second flowchart illustrating a method for predicting a user default according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a user default prediction apparatus according to an embodiment of the present invention;

FIG. 4 is a block diagram of an exemplary hardware architecture of a computing device capable of implementing the user default prediction method and apparatus according to embodiments of the present invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

In order to solve the problems in the prior art, embodiments of the present invention provide a method, an apparatus, a device, and a medium for predicting a user default. First, a user default prediction method provided by the embodiment of the present invention is described below.

Fig. 1 shows a first flowchart of a user default prediction method according to an embodiment of the present invention. The user default prediction method may include:

S101: Acquire credit investigation data of a target user and mobile communication data of the target user.

S102: Calculate the default probability of the target user by using at least one classification model based on the credit investigation data and the mobile communication data of the target user.

The classification model is used for calculating the default probability of a user.

S103: Judge whether the target user defaults according to the default probability of the target user.

The credit investigation data in the embodiments of the invention includes but is not limited to: the user's credit card consumption limit, the user's average monthly credit card spending amount, and the user's average monthly number of credit card transactions. The mobile communication data includes but is not limited to: the number of financial applications (APPs) on the user's terminal device, the user's monthly mobile communication package fee, and the user's average monthly mobile communication cost.

For example, assume that three classification models are used to calculate the user's default probability: LightGBM, a distributed gradient boosting framework based on decision tree algorithms; the gradient boosting decision tree (GBDT); and eXtreme Gradient Boosting (XGBoost).

In an embodiment of the present invention, the average default probability of the target user may be calculated from the default probabilities of the target user calculated by the at least one classification model; if the average default probability of the target user is greater than the preset probability, the target user is predicted to default.

In an embodiment of the present invention, the average default probability may be the arithmetic mean, a weighted mean, the geometric mean, the harmonic mean, or the quadratic mean (root mean square) of the default probabilities.
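As a minimal sketch (plain Python, standard library only), the five averaging options listed above could be implemented as follows. The function name, the `kind` strings and the per-model `weights` argument are hypothetical; the patent does not say how the weights of a weighted average would be chosen.

```python
import math
from typing import Optional, Sequence


def average_default_probability(
    probs: Sequence[float],
    kind: str = "arithmetic",
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine the per-model default probabilities into one average.

    kind: "arithmetic", "weighted", "geometric", "harmonic" or "quadratic".
    """
    if not probs:
        raise ValueError("need at least one model probability")
    n = len(probs)
    if kind == "arithmetic":
        return sum(probs) / n
    if kind == "weighted":
        # Hypothetical: one non-negative weight per classification model.
        if weights is None or len(weights) != n:
            raise ValueError("weighted average needs one weight per model")
        return sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    if kind == "geometric":
        # n-th root of the product; a single zero probability gives 0.
        return math.prod(probs) ** (1.0 / n)
    if kind == "harmonic":
        if any(p == 0 for p in probs):
            return 0.0
        return n / sum(1.0 / p for p in probs)
    if kind == "quadratic":
        # Root mean square, the "squared average".
        return math.sqrt(sum(p * p for p in probs) / n)
    raise ValueError(f"unknown kind: {kind!r}")


probs = [0.44, 0.49, 0.81]  # e.g. LightGBM, GBDT, XGBoost outputs
print(round(average_default_probability(probs), 2))  # 0.58
```

For positive probabilities these means always satisfy harmonic ≤ geometric ≤ arithmetic ≤ quadratic, so the choice of average shifts how readily a borderline user crosses the preset probability.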

For example, take the arithmetic average default probability. The default probability of the target user calculated by LightGBM is 0.44, that calculated by GBDT is 0.49, and that calculated by XGBoost is 0.81.

The arithmetic average default probability of the target user is (0.44 + 0.49 + 0.81) / 3 = 0.58.

Whether the target user defaults is then judged according to its arithmetic average default probability of 0.58.

In an embodiment of the present invention, if the average default probability of the target user is greater than the preset probability, the target user is predicted to default.

For example, if the preset probability is 0.5, the average default probability of the target user, 0.58, is greater than 0.5, so the target user is predicted to default.

For example, if the preset probability is 0.75, the average default probability of the target user, 0.58, is less than 0.75, so the target user is predicted not to default.
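The threshold decision in the two examples above can be sketched as follows, assuming the arithmetic average and a strict "greater than" comparison. `predict_default` is a hypothetical name, and the three probabilities are the example values attributed to LightGBM, GBDT and XGBoost.

```python
from typing import Sequence


def predict_default(model_probs: Sequence[float], preset_probability: float) -> bool:
    """Return True if the arithmetic average default probability exceeds the threshold."""
    if not model_probs:
        raise ValueError("need at least one model probability")
    average = sum(model_probs) / len(model_probs)
    # Strictly greater than, matching "greater than the preset probability".
    return average > preset_probability


probs = [0.44, 0.49, 0.81]  # example outputs of LightGBM, GBDT and XGBoost
print(predict_default(probs, 0.5))   # True:  0.58 > 0.5
print(predict_default(probs, 0.75))  # False: 0.58 < 0.75
```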

In the user default prediction method provided by the embodiments of the invention, the default probability of the user is predicted by at least one classification model, and whether the user defaults is predicted from the average of the default probabilities produced by the at least one classification model, which can improve the accuracy of user default prediction. In addition, the embodiments of the invention introduce mobile communication data, which can further improve the accuracy of user default prediction.

In an embodiment of the present invention, before the at least one classification model is used to predict the default probability of the target user, the at least one classification model needs to be trained.

In an embodiment of the present invention, the at least one classification model may be trained using credit investigation data and mobile communication data of at least one user.

For example, it is assumed that the credit investigation data of the user includes data corresponding to X credit investigation features, and the mobile communication data of the user includes data corresponding to Y mobile communication features.

The obtained credit investigation data and mobile communication data of the at least one user are shown in Table 1.

TABLE 1

In Table 1, ZT_ij denotes the data of credit investigation feature j in the credit investigation data of user i, and YT_ij denotes the data of mobile communication feature j in the mobile communication data of user i.

At least one classification model is trained using credit investigation data and mobile communication data of the N users in table 1.

The embodiments of the invention do not limit the way in which the classification models are trained on credit investigation data and mobile communication data; any available training method can be applied.

In one embodiment of the present invention, when training at least one classification model by using credit investigation data of at least one user and mobile communication data of at least one user, for each user of at least one user, derivative data corresponding to the user can be determined according to the credit investigation data of the user and the mobile communication data of the user; and training at least one classification model according to credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user.

In an embodiment of the present invention, when the derivative data corresponding to each of the at least one user are determined according to the credit investigation data and the mobile communication data of the user, the product of the credit investigation value of a first credit investigation feature in the user's credit investigation data and the mobile communication value of a first mobile communication feature in the user's mobile communication data may be taken as one item of derivative data corresponding to the user.

For example, assume that the first credit investigation feature is the user's credit card consumption limit and the first mobile communication feature is the number of financial APPs on the user's terminal device.

The number of financial APPs on the user's terminal device, taken from the user's mobile communication data, is multiplied by the user's credit card consumption limit, taken from the user's credit investigation data, and the product is used as one item of derivative data of the user. A larger number of financial APPs on the terminal device generally reflects, indirectly, a larger fund flow for the user. The credit card consumption limit is already a variable with a strong influence in default modeling; multiplying it by the number of financial APPs amplifies the differences in this feature between samples, which can increase the sensitivity of the default prediction model to the feature and improve the accuracy of the model.
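A minimal sketch of this first derivative feature, assuming each user is represented as a dictionary of feature values. The field names `credit_card_limit`, `num_financial_apps` and `limit_x_apps` are hypothetical, not taken from the patent.

```python
from typing import Dict, List

# Hypothetical field names; the patent does not specify a data schema.
LIMIT = "credit_card_limit"   # first credit investigation feature
APPS = "num_financial_apps"   # first mobile communication feature


def add_limit_times_apps(users: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the user records with the first derivative feature added."""
    result = []
    for user in users:
        row = dict(user)  # copy so the input records are not modified
        row["limit_x_apps"] = user[LIMIT] * user[APPS]
        result.append(row)
    return result


users = [
    {LIMIT: 10000.0, APPS: 1},
    {LIMIT: 20000.0, APPS: 5},
]
for row in add_limit_times_apps(users):
    print(row["limit_x_apps"])  # 10000.0, then 100000.0
```

In this illustration the raw limits differ by a factor of 2 while the derived values differ by a factor of 10, which is the amplification effect described above.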

In an embodiment of the present invention, when the derivative data corresponding to each of the at least one user are determined, the mobile communication value of a second mobile communication feature in the user's mobile communication data may be divided by the mobile communication value of a third mobile communication feature, the resulting quotient multiplied by the credit investigation value of a second credit investigation feature in the user's credit investigation data, and the product taken as one item of derivative data corresponding to the user.

For example, the second mobile communication feature is the user's average monthly mobile communication cost, the third mobile communication feature is the user's monthly mobile communication package fee, and the second credit investigation feature is the user's average monthly number of credit card transactions.

The user's average monthly mobile communication cost is divided by the user's monthly mobile communication package fee, the resulting quotient is multiplied by the user's average monthly number of credit card transactions, and the product is taken as another item of derivative data of the user. The quotient reflects how far the user exceeds the package each month, and thus indirectly reflects the user's planning ability and consumption stability. Multiplying it by the average monthly number of credit card transactions amplifies that feature and can improve the accuracy of the model.
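The second derivative feature can be sketched the same way. The function and parameter names are hypothetical, and returning `None` when the package fee is zero is an assumption; the patent does not address division by zero.

```python
from typing import Optional


def overage_times_transactions(
    avg_monthly_comm_cost: float,
    monthly_package_fee: float,
    avg_monthly_card_transactions: float,
) -> Optional[float]:
    """(average monthly cost / monthly package fee) x average monthly card transactions.

    Returns None when the package fee is zero, so the value can be handled as
    missing during preprocessing (assumption, not stated in the patent).
    """
    if monthly_package_fee == 0:
        return None
    overage_ratio = avg_monthly_comm_cost / monthly_package_fee  # > 1 means over the package
    return overage_ratio * avg_monthly_card_transactions


print(overage_times_transactions(150.0, 100.0, 20.0))  # 30.0
print(overage_times_transactions(50.0, 100.0, 20.0))   # 10.0
```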

For example, assume that the obtained credit investigation data and mobile communication data of the at least one user are as shown in Table 1 above. The generated derivative data are shown in Table 2.

TABLE 2

In Table 2, DT_ij denotes the data of derivative feature j in the derivative data of user i.

At least one classification model is trained using the credit investigation data, mobile communication data and derivative data of the N users in Table 2.

In an embodiment of the present invention, when the derivative data corresponding to each of the at least one user are determined, the credit investigation data and the mobile communication data of the user may first be preprocessed, and the derivative data of the user then determined from the preprocessed data.

The preprocessing in the embodiments of the present invention includes, but is not limited to, missing-value imputation, feature deletion, and outlier replacement.

For example, assume that the value of credit investigation feature 1 in the credit investigation data of user 1 is missing.

In an embodiment of the present invention, the missing value may be filled with the average of the values of credit investigation feature 1 in the credit investigation data of users 2 to N.

The average may be the arithmetic mean, a weighted mean, the geometric mean, the harmonic mean, or the quadratic mean of those values.

In an embodiment of the present invention, the missing value may instead be filled with the mode of the values of credit investigation feature 1 in the credit investigation data of users 2 to N.
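A sketch of mean and mode imputation for one feature column, assuming missing values are represented as `None`. It uses only the Python `statistics` module; the arithmetic mean is shown, and any of the other averages could be substituted. The function name `impute` is hypothetical.

```python
import statistics
from typing import List, Optional


def impute(column: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Fill the missing entries (None) of one feature column.

    The fill value is the arithmetic mean or the mode of the values that are
    present, i.e. of the other users' values for the same feature.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in column]


# Credit investigation feature 1 for users 1..4; user 1's value is missing.
feature_1 = [None, 2.0, 4.0, 4.0]
print(impute(feature_1, "mode"))  # [4.0, 2.0, 4.0, 4.0]
```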

In an embodiment of the invention, a feature may be deleted if its value is missing for many users (e.g., more than half of the users).
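Deleting features whose values are missing for more than half of the users could look like this. The 0.5 ratio mirrors the example above; the function name and the column-oriented input format are assumptions.

```python
from typing import Dict, List, Optional


def drop_sparse_features(
    columns: Dict[str, List[Optional[float]]], max_missing_ratio: float = 0.5
) -> Dict[str, List[Optional[float]]]:
    """Keep only the features whose share of missing values is at most max_missing_ratio."""
    kept = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values)
        if missing / len(values) <= max_missing_ratio:
            kept[name] = values
    return kept


columns = {
    "f1": [None, None, None, 1.0],  # 75% missing: deleted
    "f2": [1.0, None, 2.0, 3.0],    # 25% missing: kept
    "f3": [None, None, 1.0, 2.0],   # exactly half missing: kept
}
print(sorted(drop_sparse_features(columns)))  # ['f2', 'f3']
```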

In an embodiment of the invention, an abnormal feature value may be replaced with the average of the other, normal values of the same feature.

In an embodiment of the present invention, if many users (for example, more than half of the users) have abnormal values for a feature, the feature may be deleted.

In an embodiment of the invention, a box plot may be used to determine whether a feature value is an outlier.
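One common way to use a box plot for this is Tukey's rule: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are treated as outliers. The sketch below assumes that rule (the patent does not specify the whisker length) and replaces each outlier with the mean of the normal values of the same feature, returning `None` when more than half of the values are flagged so the caller can delete the feature. Both function names are hypothetical.

```python
import statistics
from typing import List, Optional, Tuple


def boxplot_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Whisker limits of a box plot: [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers(values: List[float]) -> Optional[List[float]]:
    """Replace outliers with the mean of the normal values of the feature.

    Returns None if more than half of the values are outliers, meaning the
    feature should be deleted instead.
    """
    if len(values) < 2:
        return list(values)  # too few points to build a box plot
    low, high = boxplot_bounds(values)
    normal = [v for v in values if low <= v <= high]
    if len(values) - len(normal) > len(values) / 2:
        return None
    fill = statistics.fmean(normal)
    return [v if low <= v <= high else fill for v in values]


print(replace_outliers([10.0, 12.0, 11.0, 13.0, 12.0, 100.0]))
# the 100.0 is replaced by the mean of the other five values, 11.6
```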

After the credit investigation data and the mobile communication data of the user are preprocessed, the derivative data of the user can be determined from the preprocessed data.

Determining the derivative data from the preprocessed data works in the same way as determining it from data that have not been preprocessed, as described above, and is not repeated here.

It can be understood that, if the credit investigation data and mobile communication data of each of the at least one user were preprocessed during training, the credit investigation data and mobile communication data of the target user also need to be preprocessed in the same way before the at least one classification model is used to predict the default probability of the target user.

For example, if credit investigation feature j was deleted from the credit investigation data during preprocessing of the at least one user's data, credit investigation feature j also needs to be deleted from the credit investigation data of the target user.

For another example, if missing values of credit investigation feature j were filled with the average value during preprocessing of the at least one user's data, then a missing value of credit investigation feature j in the target user's credit investigation data is also filled with the average value.

For another example, if missing values of credit investigation feature j were filled with the mode during preprocessing of the at least one user's data, then a missing value of credit investigation feature j in the target user's credit investigation data is also filled with the mode.
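To apply identical preprocessing to the target user, the decisions made on the training users (which features were deleted and which fill values were used) must be stored and replayed. The sketch below assumes that the target user's missing values are filled with the statistics computed on the training users, a common practice that the patent does not state explicitly; the class `FittedPreprocessor` and its fields are hypothetical.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]  # one user's feature values; None = missing


@dataclass
class FittedPreprocessor:
    """Preprocessing decisions learned from the training users (hypothetical helper)."""

    dropped: List[str] = field(default_factory=list)
    fill_values: Dict[str, float] = field(default_factory=dict)

    @classmethod
    def fit(cls, users: List[Record], strategy: Dict[str, str]) -> "FittedPreprocessor":
        prep = cls()
        for name in users[0]:
            observed = [u[name] for u in users if u.get(name) is not None]
            if len(observed) < len(users) / 2:
                # More than half of the users lack this feature: delete it.
                prep.dropped.append(name)
            elif strategy.get(name, "mean") == "mode":
                prep.fill_values[name] = statistics.mode(observed)
            else:
                prep.fill_values[name] = statistics.fmean(observed)
        return prep

    def transform(self, user: Record) -> Dict[str, float]:
        """Replay the same deletions and fill values on any user, e.g. the target."""
        out: Dict[str, float] = {}
        for name, value in user.items():
            if name in self.dropped:
                continue
            out[name] = self.fill_values[name] if value is None else value
        return out


train = [
    {"limit": 10000.0, "apps": 2.0, "sparse": None},
    {"limit": None, "apps": 2.0, "sparse": None},
    {"limit": 20000.0, "apps": None, "sparse": 1.0},
]
prep = FittedPreprocessor.fit(train, strategy={"apps": "mode"})
target = {"limit": None, "apps": None, "sparse": 3.0}
print(prep.transform(target))  # {'limit': 15000.0, 'apps': 2.0}
```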

In an embodiment of the present invention, when training at least one classification model according to credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user, the feature importance of each feature of at least one credit investigation feature corresponding to the credit investigation data, at least one mobile communication feature corresponding to the mobile communication data and at least one derivative feature corresponding to the derivative data can be determined; removing feature data corresponding to features of which the feature importance is smaller than the preset feature importance from credit investigation data of at least one user, mobile communication data of at least one user and derivative data corresponding to at least one user; and training at least one classification model by using the data obtained after the removal.

In one embodiment of the invention, a random forest algorithm may be used to determine the feature importance of each feature.

For example, assume that the preset feature importance is 0.5.

Based on table 2 above, the feature importance of the credit investigation feature 1 is 0.25, … …, the feature importance of the credit investigation feature i is 0.75, … …, the feature importance of the credit investigation feature X is 0.8, the feature importance of the mobile communication feature 1 is 0.55, … …, the feature importance of the mobile communication feature i is 0.73, … …, the feature importance of the mobile communication feature Y is 0.81, the feature importance of the derivative feature 1 is 0.6, … …, the feature importance of the derivative feature i is 0.78, … …, and the feature importance of the derivative feature Z is 0.86.

Because the feature importance of credit investigation feature 1 (0.25) is smaller than the preset feature importance (0.5), the feature data corresponding to credit investigation feature 1 are removed.

The result of removing the feature data corresponding to credit investigation feature 1 is shown in Table 3.

TABLE 3

At least one classification model is trained based on the data shown in table 3.
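The screening step can be sketched as a filter over a mapping from feature name to importance. In practice the importances might come from a random forest (for example, scikit-learn's `RandomForestClassifier.feature_importances_`); here they are passed in directly, using some of the example values above with hypothetical feature names. The helper names are also hypothetical.

```python
from typing import Dict, List


def select_features(importance: Dict[str, float], preset_importance: float) -> List[str]:
    """Return the features whose importance is not smaller than the preset value."""
    return [name for name, score in importance.items() if score >= preset_importance]


def drop_unimportant(rows: List[Dict[str, float]], keep: List[str]) -> List[Dict[str, float]]:
    """Remove the data of screened-out features from every user's record."""
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical feature names; importance values taken from the example above.
importance = {
    "credit_1": 0.25, "credit_X": 0.8,
    "mobile_1": 0.55, "mobile_Y": 0.81,
    "derived_1": 0.6, "derived_Z": 0.86,
}
keep = select_features(importance, preset_importance=0.5)
print(keep)  # every feature except "credit_1"
```

The same `keep` list must later be applied to the target user's data, so that the models see exactly the features they were trained on.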

Based on the above, a user default prediction method provided by the embodiment of the present invention is shown in fig. 2. Fig. 2 is a second flowchart illustrating a user default prediction method according to an embodiment of the present invention.

First, credit investigation data and mobile communication data of at least one user are obtained.

And preprocessing credit investigation data and mobile communication data of at least one user.

And determining the derivative data of at least one user by utilizing the preprocessed credit investigation data and mobile communication data of at least one user.

Determining a feature importance of each of the features of the credit data, the mobile communication data and the derived data of the at least one user.

And screening the features based on the feature importance, and removing feature data corresponding to the features with the feature importance smaller than the preset feature importance.

Training N classification models for predicting default probability of the user by utilizing credit investigation data, mobile communication data and derivative data of at least one user after characteristic data corresponding to the characteristics with the characteristic importance smaller than the preset characteristic importance are removed;

and acquiring credit investigation data and mobile communication data of the target user.

And correspondingly preprocessing credit investigation data and mobile communication data of the target user.

Derivative data of the target user are determined from the preprocessed credit investigation data and mobile communication data of the target user.

The feature data corresponding to the features whose importance is smaller than the preset feature importance are removed from the credit investigation data, mobile communication data and derivative data of the target user.

The default probability of the target user is calculated with each of the N classification models, based on the credit investigation data, mobile communication data and derivative data of the target user that remain after this removal.

The average default probability of the target user over the N classification models is calculated.

Default prediction: if the average default probability of the target user is greater than a preset probability, the target user is predicted to default; otherwise, the target user is predicted not to default.
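The last three steps reduce to an average followed by a strict comparison. In the sketch below, each trained model is represented, as an assumption, by any callable that returns a default probability. The preset probability is left to the caller because the text gives no value for it. The function names are hypothetical.

```python
from typing import Callable, Sequence

# A model maps a user's feature vector to a default probability in [0, 1].
ProbabilityModel = Callable[[Sequence[float]], float]


def average_default_probability(models: Sequence[ProbabilityModel],
                                x: Sequence[float]) -> float:
    """Mean of the default probabilities produced by the N models."""
    if not models:
        raise ValueError("at least one classification model is required")
    return sum(model(x) for model in models) / len(models)


def predict_default(models: Sequence[ProbabilityModel], x: Sequence[float],
                    preset_probability: float) -> bool:
    """True (default predicted) only if the average exceeds the preset probability."""
    return average_default_probability(models, x) > preset_probability
```

Because the comparison is strict ("greater than"), an average exactly equal to the preset probability is predicted as no default.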

Corresponding to the method embodiments above, an embodiment of the present invention further provides a user default prediction apparatus.

Fig. 3 is a schematic structural diagram of a user default prediction apparatus according to an embodiment of the present invention. The user default prediction apparatus may include:

an obtaining module 301, configured to obtain credit investigation data of a target user and mobile communication data of the target user;

a calculating module 302, configured to calculate a default probability of the target user using at least one classification model, based on the credit investigation data and the mobile communication data of the target user, where each classification model is used to calculate the default probability of a user; and

a prediction module 303, configured to determine, according to the default probability of the target user, whether the target user defaults.

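In an embodiment of the present invention, the prediction module 303 may be specifically configured to: calculate an average default probability of the target user from the default probabilities calculated by the at least one classification model; and, if the average default probability is greater than a preset probability, determine that the target user defaults.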

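In an embodiment of the present invention, the user default prediction apparatus may further include: a training module, configured to train the at least one classification model using credit investigation data of at least one user and mobile communication data of the at least one user.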

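In one embodiment of the present invention, the training module may include: a determining unit, configured to determine, for each user of the at least one user, derivative data corresponding to the user according to the credit investigation data and the mobile communication data of the user; and a training unit, configured to train the at least one classification model according to the credit investigation data, the mobile communication data and the derivative data corresponding to the at least one user.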

In an embodiment of the present invention, the determining unit may be specifically configured to:

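In an embodiment of the present invention, the training unit may be specifically configured to: remove, from the credit investigation data, the mobile communication data and the derivative data corresponding to the at least one user, the feature data corresponding to features whose feature importance is smaller than a preset feature importance; and train the at least one classification model using the data remaining after the removal.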

Fig. 4 is a block diagram of an exemplary hardware architecture of a computing device capable of implementing the user default prediction method and apparatus according to embodiments of the present invention. As shown in Fig. 4, computing device 400 includes an input device 401, an input interface 402, a central processing unit 403, a memory 404, an output interface 405 and an output device 406. The input interface 402, the central processing unit 403, the memory 404 and the output interface 405 are connected to each other through a bus 410. The input device 401 and the output device 406 are connected to the bus 410 through the input interface 402 and the output interface 405, respectively, and thereby to the other components of the computing device 400.

Specifically, the input device 401 receives input information from outside and transmits it to the central processing unit 403 through the input interface 402. The central processing unit 403 processes the input information based on computer-executable instructions stored in the memory 404 to generate output information, stores the output information temporarily or permanently in the memory 404, and then transmits it to the output device 406 through the output interface 405. The output device 406 outputs the output information outside the computing device 400 for use by a user.

That is, the computing device shown in Fig. 4 may also be implemented as a user default prediction device, which may include: a memory storing a computer program; and a processor that, when executing the computer program, implements the user default prediction method provided by the embodiments of the present invention.

An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium; the computer program, when executed by a processor, implements the user default prediction method provided by embodiments of the present invention.

It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.

The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.

It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

The foregoing describes only specific embodiments of the present invention. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, and these processes are not repeated here. The scope of the present invention is not limited thereto. Any equivalent modification or substitution that a person skilled in the art can readily conceive within the technical scope of the present invention shall fall within the scope of the present invention.

Claims

1. A user default prediction method, comprising:

acquiring credit investigation data of a target user and mobile communication data of the target user;

calculating, using at least one classification model, a default probability of the target user based on the credit investigation data and the mobile communication data, wherein each classification model is used to calculate the default probability of a user; and

determining, according to the default probability, whether the target user defaults.

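2. The method of claim 1, wherein determining, according to the default probability, whether the target user defaults comprises:

calculating an average default probability of the target user from the default probabilities calculated by the at least one classification model; and

if the average default probability is greater than a preset probability, determining that the target user defaults.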

3. The method of claim 1, wherein, before acquiring the credit investigation data of the target user and the mobile communication data of the target user, the method further comprises:

training the at least one classification model using credit investigation data of at least one user and mobile communication data of the at least one user.

4. The method of claim 3, wherein training the at least one classification model using the credit investigation data of the at least one user and the mobile communication data of the at least one user comprises:

determining, for each user of the at least one user, derivative data corresponding to the user according to the credit investigation data of the user and the mobile communication data of the user; and

training the at least one classification model according to the credit investigation data of the at least one user, the mobile communication data of the at least one user and the derivative data corresponding to the at least one user.

5. The method according to claim 4, wherein determining, for each user of the at least one user, the derivative data corresponding to the user according to the credit investigation data and the mobile communication data of the user comprises:

for each user of the at least one user, taking, as one item of derivative data corresponding to the user, the product of the credit investigation value of a first credit investigation feature in the credit investigation data of the user and the mobile communication value of a first mobile communication feature in the mobile communication data of the user.

6. The method according to claim 4, wherein determining, for each user of the at least one user, the derivative data corresponding to the user according to the credit investigation data and the mobile communication data of the user comprises:

for each user of the at least one user, taking, as one item of derivative data corresponding to the user, the product of (i) the quotient obtained by dividing the mobile communication value of a second mobile communication feature in the mobile communication data of the user by the mobile communication value of a third mobile communication feature in that data, and (ii) the credit investigation value of a second credit investigation feature in the credit investigation data of the user.

7. The method according to claim 4, wherein determining, for each user of the at least one user, the derivative data corresponding to the user according to the credit investigation data and the mobile communication data of the user comprises:

for each user of the at least one user, preprocessing credit investigation data of the user and mobile communication data of the user;

8. The method of claim 4, wherein training the at least one classification model according to the credit investigation data of the at least one user, the mobile communication data of the at least one user and the derivative data corresponding to the at least one user comprises:

removing, from the credit investigation data of the at least one user, the mobile communication data of the at least one user and the derivative data corresponding to the at least one user, the feature data corresponding to features whose feature importance is smaller than a preset feature importance; and

training the at least one classification model using the data remaining after the removal.

9. A user default prediction apparatus, comprising:

an obtaining module, configured to obtain credit investigation data of a target user and mobile communication data of the target user;

a calculation module, configured to calculate, using at least one classification model, a default probability of the target user based on the credit investigation data and the mobile communication data, wherein each classification model is used to calculate the default probability of a user; and

a prediction module, configured to determine, according to the default probability, whether the target user defaults.

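10. The apparatus of claim 9, wherein the prediction module is specifically configured to: calculate an average default probability of the target user from the default probabilities calculated by the at least one classification model; and, if the average default probability is greater than a preset probability, determine that the target user defaults.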

11. The apparatus of claim 9, further comprising:

a training module, configured to train the at least one classification model using credit investigation data of at least one user and mobile communication data of the at least one user.

12. A user default prediction device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor;

wherein the processor, when executing the computer program, implements the user default prediction method according to any one of claims 1 to 8.

13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the user default prediction method according to any one of claims 1 to 8.