CN112613920A - Loss probability prediction method and device - Google Patents

Info

Publication number
CN112613920A
Authority
CN
China
Prior art keywords
target
time interval
customer
client
characteristic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011640346.XA
Other languages
Chinese (zh)
Inventor
朱红伟
吴正良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202011640346.XA
Publication of CN112613920A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G06Q40/06 Asset management; Financial planning or analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a loss probability prediction method and a loss probability prediction device. The method comprises the following steps: acquiring feature data of a target customer in a target observation time interval; and predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model, where the start time of the target presentation time interval is after the end time of the target observation time interval. The XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost (churned) customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased. With the method provided by the application, the loss probability of a customer in the presentation time interval can therefore be obtained efficiently from the customer's feature data in the target observation time interval and the pre-trained XGBoost model.

Description

Loss probability prediction method and device
Technical Field
The application relates to the field of computers, in particular to a loss probability prediction method and device.
Background
The predicted probability of customer loss (churn) has become data that many industries need. At present it is mainly estimated manually from customer information. In some scenarios, however, the number of customers is huge; collating customer information and calculating the loss probability by hand consumes a large amount of manpower and is inefficient. How to predict the loss probability efficiently has therefore become a technical problem that urgently needs to be solved in the field.
Disclosure of Invention
In order to solve the technical problem, the application provides a loss probability prediction method and a loss probability prediction device, which are used for efficiently predicting the loss probability of a customer.
In order to achieve the above purpose, the technical solutions provided in the embodiments of the present application are as follows:
An embodiment of the application provides a loss probability prediction method, which comprises the following steps:
acquiring feature data of a target customer in a target observation time interval;
predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model; the start time of the target presentation time interval is after the end time of the target observation time interval;
the XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased; the start time of the presentation time interval is after the end time of the observation time interval.
Optionally, the method further comprises:
determining the group type of each customer in the customer set, each group type having a corresponding level division mode;
and determining whether the level of each customer has decreased according to the level division mode corresponding to the group type to which that customer belongs.
Optionally, when the feature data comprises a plurality of feature data types, the method further comprises:
when the loss probability of the target customer in the target presentation time interval exceeds a preset threshold, obtaining the feature data type with the largest weight among the plurality of feature data types according to the feature data of the target customer in the target observation time interval and the pre-trained XGBoost model.
Optionally, the method further comprises:
obtaining an analysis report of a target customer set; the analysis report comprises a list of the customers in the target customer set whose loss probability exceeds the preset threshold, together with their loss reasons; each loss reason is obtained by analyzing the feature data type with the largest weight among the plurality of feature data types.
Optionally, the feature data comprises:
at least one of customer basic information, customer financial asset information, customer bank card information, customer transaction information, customer insurance information, and bank agent customer payroll information.
An embodiment of the present application further provides a loss probability prediction device, which includes:
an obtaining module, used for obtaining feature data of a target customer in a target observation time interval;
a prediction module, used for predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model; the start time of the target presentation time interval is after the end time of the target observation time interval;
the XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased; the start time of the presentation time interval is after the end time of the observation time interval.
Optionally, the apparatus further comprises:
a determining module, used for determining the group type of each customer in the customer set, each group type having a corresponding level division mode;
and a dividing module, used for determining whether the level of each customer has decreased according to the level division mode corresponding to the group type to which that customer belongs.
Optionally, when the feature data comprises a plurality of feature data types, the device further comprises:
a reason obtaining module, used for obtaining the feature data type with the largest weight among the plurality of feature data types according to the feature data of the target customer in the target observation time interval and the pre-trained XGBoost model when the loss probability of the target customer in the target presentation time interval exceeds a preset threshold.
Optionally, the apparatus further comprises:
a report acquisition module, used for acquiring an analysis report of a target customer set; the analysis report comprises a list of the customers in the target customer set whose loss probability exceeds the preset threshold, together with their loss reasons; each loss reason is obtained by analyzing the feature data type with the largest weight among the plurality of feature data types.
Optionally, the feature data comprises:
at least one of customer basic information, customer financial asset information, customer bank card information, customer transaction information, customer insurance information, and bank agent customer payroll information.
The above technical scheme has the following beneficial effects:
The embodiments of the application provide a loss probability prediction method and a loss probability prediction device. The method comprises the following steps: acquiring feature data of a target customer in a target observation time interval; and predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model, where the start time of the target presentation time interval is after the end time of the target observation time interval. The XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased; the start time of the presentation time interval is after the end time of the observation time interval. With the method provided by the application, the loss probability of a customer in the presentation time interval can therefore be obtained efficiently from the customer's feature data in the target observation time interval and the pre-trained XGBoost model.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a loss probability prediction method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a presentation time interval and an observation time interval provided in an embodiment of the present application;
Fig. 3 is a schematic diagram of a method for training an XGBoost model provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a loss probability prediction device provided in an embodiment of the present application.
Detailed Description
In order to help better understand the scheme provided by the embodiment of the present application, before describing the method provided by the embodiment of the present application, a scenario of an application of the scheme of the embodiment of the present application is described.
The predicted probability of customer loss (churn) has become data that many industries need. At present it is mainly estimated manually from customer information. In some scenarios, however, the number of customers is huge; collating customer information and calculating the loss probability by hand consumes a large amount of manpower and is inefficient. How to predict the loss probability efficiently has therefore become a technical problem that urgently needs to be solved in the field.
In order to solve the above technical problem, an embodiment of the present application provides a loss probability prediction method and device. The method includes: acquiring feature data of a target customer in a target observation time interval; and predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model, where the start time of the target presentation time interval is after the end time of the target observation time interval. The XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased; the start time of the presentation time interval is after the end time of the observation time interval. With the method provided by the application, the loss probability of a customer in the presentation time interval can therefore be obtained efficiently from the customer's feature data in the observation time interval and the pre-trained XGBoost model.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the drawings are described in detail below.
Referring to Fig. 1, a schematic flow chart of a loss probability prediction method provided in an embodiment of the present application is shown. As shown in Fig. 1, the method includes the following steps:
S101: obtaining feature data of a target customer in a target observation time interval.
S102: predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model; the start time of the target presentation time interval is after the end time of the target observation time interval.
It should be noted that, in the embodiment of the present application, the XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased; the start time of the presentation time interval is after the end time of the observation time interval.
In this embodiment, as a possible implementation manner, the feature data includes: at least one of customer basic information, customer financial asset information, customer bank card information, customer transaction information, customer insurance information, and bank agent customer payroll information.
To improve prediction accuracy, as a possible implementation, the method provided in this embodiment further includes: determining the group type of each customer in the customer set, each group type having a corresponding level division mode; and determining whether the level of each customer has decreased according to the level division mode corresponding to the group type to which that customer belongs. Because each customer group has its own level division mode, a division rule suited to the characteristics of each group can be formulated, which raises the prediction accuracy of the method.
A specific application of the customer group type division provided in the embodiment of the present application is described below:
Personal VIP customers are identified by their customer-grade flag, and features such as financial asset balance, current deposit balance, time deposit balance, loan balance and investment balance are obtained for them; the customers are then divided into 6 groups using the K-means clustering algorithm combined with business experience.
(1) Type division of customer groups:
High-asset current-deposit type customers: monthly daily-average financial assets are at least 1 million (100 × 10,000), and monthly daily-average current deposits account for more than 80% of total financial assets.
High-asset time-deposit type customers: monthly daily-average financial assets are at least 1 million, and monthly daily-average time deposits account for more than 80% of total financial assets.
High-asset investment type customers: monthly daily-average financial assets are at least 1 million, and monthly daily-average investment assets account for more than 80% of total financial assets.
Low-asset current-deposit type customers: monthly daily-average financial assets are less than 1 million, and monthly daily-average current deposits account for more than 80% of total financial assets.
Low-asset time-deposit type customers: monthly daily-average financial assets are less than 1 million, and monthly daily-average time deposits account for more than 80% of total financial assets.
Low-asset investment type customers: monthly daily-average financial assets are less than 1 million, and monthly daily-average investment assets account for more than 80% of total financial assets.
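The six-group rule above (an asset tier crossed with the dominant product type) can be sketched as a small function. This is only an illustrative reading of the thresholds: the function name, argument names and group labels are hypothetical, and returning `None` when no product exceeds 80% is an assumption, since the text does not say how such customers are handled.

```python
from typing import Optional

HIGH_ASSET_THRESHOLD = 1_000_000  # monthly daily-average financial assets
DOMINANT_SHARE = 0.80             # share of total assets a product must exceed


def customer_group(avg_total: float, avg_current: float,
                   avg_time: float, avg_investment: float) -> Optional[str]:
    """Map monthly daily-average balances to one of the six group types."""
    if avg_total <= 0:
        return None
    tier = "high" if avg_total >= HIGH_ASSET_THRESHOLD else "low"
    balances = {
        "current-deposit": avg_current,
        "time-deposit": avg_time,
        "investment": avg_investment,
    }
    for product, balance in balances.items():
        if balance / avg_total > DOMINANT_SHARE:
            return f"{tier}-asset {product} type"
    # Assumption: customers without a dominant product fall outside the rule.
    return None


print(customer_group(1_200_000, 1_000_000, 100_000, 100_000))
```

At most one product can exceed 80% of the total, so the order in which the products are checked does not affect the result.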
(2) Loss marking
Customers are marked as downgraded according to their group type: each type of customer is divided into N levels by financial asset scale, and a drop of one level is regarded as customer loss.
High-asset current-deposit type customers: a 20% reduction in the customer's current-deposit asset scale is one level down and is defined as loss;
High-asset time-deposit type customers: a 20% reduction in the customer's time-deposit asset scale is one level down and is defined as loss;
High-asset investment type customers: a 20% reduction in the customer's investment asset scale is one level down and is defined as loss;
Low-asset current-deposit type customers: a 40% reduction in the customer's current-deposit asset scale is one level down and is defined as loss;
Low-asset time-deposit type customers: a 40% reduction in the customer's time-deposit asset scale is one level down and is defined as loss;
Low-asset investment type customers: a 40% reduction in the customer's investment asset scale is one level down and is defined as loss.
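A minimal sketch of this labeling rule follows. It assumes that a "level" is measured as a relative drop in the group's dominant asset between the start and the end of the comparison period (20% per level for high-asset groups, 40% for low-asset groups); the text does not spell out how the N levels are bounded, so this reading, and all names in the code, are assumptions.

```python
# Relative drop in the dominant asset that corresponds to one level.
LEVEL_STEP = {"high": 0.20, "low": 0.40}


def levels_dropped(tier: str, start_assets: float, end_assets: float) -> int:
    """Number of whole levels lost between two asset snapshots."""
    if start_assets <= 0 or end_assets >= start_assets:
        return 0
    drop = (start_assets - end_assets) / start_assets
    # Small epsilon so that a drop of exactly one step counts as one level.
    return int(drop / LEVEL_STEP[tier] + 1e-9)


def is_lost(tier: str, start_assets: float, end_assets: float) -> bool:
    """A customer is labeled as lost when at least one level is dropped."""
    return levels_dropped(tier, start_assets, end_assets) >= 1


print(is_lost("high", 1_000_000, 790_000))  # 21% drop in a high-asset group
print(is_lost("low", 500_000, 350_000))     # 30% drop in a low-asset group
```

Under this reading the same 30% drop labels a high-asset customer as lost but not a low-asset one, which is the group-specific behaviour the rule is meant to provide.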
The presentation time interval and the observation time interval in the embodiment of the present application are described below:
Referring to Fig. 2, a schematic diagram of a presentation time interval and an observation time interval provided in an embodiment of the present application is shown. In order to improve the prediction accuracy of the method, as shown in Fig. 2, the end time of the observation period (observation time interval) and the start time of the presentation period (presentation time interval) may be the same time. As an example, the observation time interval may be 6 months and the presentation time interval may be 3 months.
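The example window lengths (6 months of observation followed by 3 months of presentation) can be laid out on a calendar as below. The sketch assumes month-aligned, half-open intervals `[start, end)` in which the presentation interval begins at the instant the observation interval ends; the function names are hypothetical.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def build_windows(observation_start: date,
                  observation_months: int = 6,
                  presentation_months: int = 3):
    """Return (observation, presentation) as half-open (start, end) pairs."""
    obs_start = add_months(observation_start, 0)  # snap to first of month
    obs_end = add_months(obs_start, observation_months)
    pres_start = obs_end  # presentation begins where observation ends
    pres_end = add_months(pres_start, presentation_months)
    return (obs_start, obs_end), (pres_start, pres_end)


observation, presentation = build_windows(date(2020, 1, 1))
print(observation, presentation)
```

Features are computed only from data inside the observation pair and labels only from the presentation pair, so no information from the period being predicted leaks into the features.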
After the customers are grouped, the method provided by the embodiment of the application may also generate and preprocess the customers' feature data. How the feature data are generated is described in a specific embodiment below:
(1) Feature construction.
Customer features are built from the bank's own data, such as customer basic information, customer grade, account-management relationship, financial assets, bank card information, debit card transactions, credit card transactions, channel transactions, transfers, product holdings, deposits, loans, wealth management, treasury bonds, funds, precious metals, insurance, third-party depository and agency payroll; M feature items are generated for personal VIP customers along dimensions such as type, amount, quantity and time.
(2) Feature derivation.
Feature derivation extracts, based on expert experience and brute-force methods, the features that have a larger influence on the result. It mainly covers four kinds: summary features, combination features, statistical methods and user-defined methods.
Amount and quantity features are summarized over the last 1, 2, 3, 6, 9 and 12 months along the time dimension, which makes the features more significant and better aligned with business logic; customer basic information is combined with flag features to obtain combination features, such as age combined with whether a credit card is held, or gender combined with whether a wealth-management product is held; amount and quantity features can be further derived by statistical methods to obtain features such as the mean, maximum, minimum and standard deviation; in addition, the customer's original features can be derived from a business perspective based on the experience of customer managers.
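As an illustration of the summary and statistical derivations, the sketch below turns one monthly amount series into trailing-window features for 1, 2, 3, 6, 9 and 12 months. The feature naming scheme, the use of the population standard deviation, and the choice to skip windows longer than the available history are assumptions made for the example.

```python
from statistics import mean, pstdev

WINDOWS = (1, 2, 3, 6, 9, 12)  # trailing months, as listed in the text


def derive_features(monthly_amounts, name="txn_amount"):
    """Derive summary statistics from a monthly series (oldest value first)."""
    features = {}
    for months in WINDOWS:
        if len(monthly_amounts) < months:
            continue  # not enough history for this window
        recent = monthly_amounts[-months:]
        features[f"{name}_sum_{months}m"] = sum(recent)
        features[f"{name}_mean_{months}m"] = mean(recent)
        features[f"{name}_max_{months}m"] = max(recent)
        features[f"{name}_min_{months}m"] = min(recent)
        features[f"{name}_std_{months}m"] = pstdev(recent)
    return features


print(derive_features([10, 20, 30]))
```

With three months of history only the 1-, 2- and 3-month windows are produced; longer windows appear as more history becomes available.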
(3) Feature preprocessing.
The generated features are preprocessed with commonly used methods.
First, missing-value filling: feature variables whose proportion of missing values exceeds a certain percentage (e.g., 90%) are deleted; feature variables whose missing proportion is below that percentage are filled as follows: numerical variables are filled with the mean or with 0; categorical (enumerated) variables are filled with the mode.
Second, capping: two end points are computed for amount-type features; values below the lower end point are set to the lower end point, and values above the upper end point are set to the upper end point (e.g., the 5% quantile as the lower end point and the 95% quantile as the upper end point).
Third, binning of continuous features: continuous features are mapped into intervals (bins) to discretize them. The binning thresholds are computed based on the maximum entropy of a decision tree, converting the continuous features into discrete features.
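The first two preprocessing steps (missing-value filling and capping) can be sketched with the standard library alone. Missing values are represented as `None`, the end points are taken as nearest-rank percentiles, and all function names are hypothetical; the decision-tree binning step is not covered here.

```python
from statistics import mean, mode


def missing_ratio(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def fill_numeric(values, use_mean=True):
    """Fill missing numeric entries with the mean of the others, or with 0."""
    present = [v for v in values if v is not None]
    fill = mean(present) if (use_mean and present) else 0
    return [fill if v is None else v for v in values]


def fill_categorical(values):
    """Fill missing categorical entries with the most frequent value."""
    fill = mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def cap_outliers(values, lower_pct=5, upper_pct=95):
    """Clamp values to the [lower_pct, upper_pct] nearest-rank percentiles."""
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(pct):
        rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100)
        return ordered[rank - 1]

    low, high = nearest_rank(lower_pct), nearest_rank(upper_pct)
    return [min(max(v, low), high) for v in values]


amounts = list(range(1, 20)) + [1000]  # one extreme value
print(cap_outliers(amounts)[-1])
```

A column would first be checked with `missing_ratio` and dropped if the result exceeds the chosen percentage (0.9 in the example from the text); only the remaining columns are filled and capped.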
(4) Feature selection.
The features finally used by the model are selected from the preprocessed features in two steps.
Step one: the samples of each customer group are used for model training with a combination of the logistic regression algorithm and the random forest algorithm; the features are ranked by importance in each training result, and the top a features are selected as the features to be modeled for each customer group.
Step two: removing variables of the same kind. The variables to be modeled are clustered into b classes with the K-means clustering algorithm, and the most relevant feature variable in each class is retained.
These two steps yield the c features of the final model.
After the customers' feature data are formed, the XGBoost model is trained. One specific embodiment of training the XGBoost model is described below:
Referring to Fig. 3, a schematic diagram of a method for training an XGBoost model provided in an embodiment of the present application is shown. As shown in Fig. 3, the training method includes: after the customers are grouped (data clustering), the customers belonging to the same customer group are divided into a training set and a test set; an XGBoost model is then fitted on a trial basis, and its parameters are adjusted by comparing precision and recall (in general, the higher the preset loss probability threshold, the higher the precision and the lower the recall), so as to obtain the model that best balances precision and recall. As an example, the ratio of the number of customers in the training set to that in the test set may be 7:3.
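The 7:3 split and the threshold trade-off described above can be illustrated without any model at all. In the sketch below the predicted probabilities are made-up numbers standing in for the output of a trained classifier, and the function names are hypothetical; no XGBoost call is shown.

```python
import random


def split_customers(customer_ids, train_parts=7, test_parts=3, seed=0):
    """Shuffle customers and split them train:test = train_parts:test_parts."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    cut = len(ids) * train_parts // (train_parts + test_parts)
    return ids[:cut], ids[cut:]


def precision_recall(labels, probabilities, threshold):
    """Precision and recall when 'lost' is predicted for p > threshold."""
    predicted = [p > threshold for p in probabilities]
    tp = sum(1 for y, hit in zip(labels, predicted) if hit and y == 1)
    fp = sum(1 for y, hit in zip(labels, predicted) if hit and y == 0)
    fn = sum(1 for y, hit in zip(labels, predicted) if not hit and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


train_ids, test_ids = split_customers(range(10))
labels = [1, 1, 1, 0, 0, 0]                # 1 = customer was actually lost
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]    # illustrative probabilities only
for threshold in (0.5, 0.8):
    print(threshold, precision_recall(labels, scores, threshold))
```

On this toy data, raising the threshold from 0.5 to 0.8 increases precision and lowers recall, which is the general tendency the text notes.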
In the embodiment of the present application, as a possible implementation, when the feature data includes a plurality of feature data types, the method further includes: when the loss probability of the target customer in the target presentation time interval exceeds a preset threshold, obtaining the feature data type with the largest weight among the plurality of feature data types according to the feature data of the target customer in the target observation time interval and the pre-trained XGBoost model. Obtaining the feature data type with the largest weight indicates the likely reason for the customer's loss and thus provides support for downstream applications.
Further, as a possible implementation, the method provided by the embodiment of the present application also includes: obtaining an analysis report of a target customer set; the analysis report comprises a list of the customers in the target customer set whose loss probability exceeds the preset threshold, together with their loss reasons; each loss reason is obtained by analyzing the feature data type with the largest weight among the plurality of feature data types.
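A sketch of assembling such an analysis report is given below. It takes the predicted probabilities and the per-customer weights of each feature data type as given inputs; how those weights are read out of the trained model is not specified in the text, so the input format, the field names and the function name are all assumptions.

```python
def build_report(probabilities, feature_weights, threshold):
    """List customers above the threshold with their top-weighted feature type.

    probabilities:   {customer_id: predicted loss probability}
    feature_weights: {customer_id: {feature_data_type: weight}}
    """
    report = []
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    for customer_id, probability in ranked:
        if probability > threshold:
            weights = feature_weights[customer_id]
            report.append({
                "customer_id": customer_id,
                "loss_probability": probability,
                # Feature data type with the largest weight for this customer.
                "top_feature_type": max(weights, key=weights.get),
            })
    return report


probabilities = {"A": 0.9, "B": 0.2, "C": 0.6}
feature_weights = {
    "A": {"financial assets": 0.5, "transactions": 0.3, "insurance": 0.2},
    "B": {"financial assets": 0.1, "transactions": 0.8, "insurance": 0.1},
    "C": {"financial assets": 0.2, "transactions": 0.7, "insurance": 0.1},
}
for row in build_report(probabilities, feature_weights, threshold=0.5):
    print(row)
```

The top feature type is only a starting point for the loss reason; the text describes the reason as the result of analyzing that feature type, which remains a business step outside this sketch.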
To sum up, the embodiment of the present application provides a loss probability prediction method, including: acquiring feature data of a target customer in a target observation time interval; and predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model, where the start time of the target presentation time interval is after the end time of the target observation time interval. The XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased; the start time of the presentation time interval is after the end time of the observation time interval. With the method provided by the application, the loss probability of a customer in the presentation time interval can therefore be obtained efficiently from the customer's feature data in the observation time interval and the pre-trained XGBoost model.
Based on the loss probability prediction method provided by the embodiment, the embodiment of the application also provides a loss probability prediction device.
Referring to Fig. 4, a schematic structural diagram of a loss probability prediction device provided in an embodiment of the present application is shown. As shown in Fig. 4, the loss probability prediction device includes:
An obtaining module 100, used for obtaining feature data of a target customer in a target observation time interval.
A prediction module 200, used for predicting the loss probability of the target customer in a target presentation time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model; the start time of the target presentation time interval is after the end time of the target observation time interval.
The XGBoost model is trained on training samples; a training sample comprises feature data of a customer set in an observation time interval and the lost customers marked in that customer set for a presentation time interval; the lost customers are the customers in the customer set whose level has decreased; the start time of the presentation time interval is after the end time of the observation time interval.
In the embodiment of the present application, as a possible implementation, the device further includes: a determining module, configured to determine the group type to which each customer in the customer set belongs, where each group type has a corresponding level division mode; and a dividing module, configured to determine whether the level of each customer has been downgraded according to the level division mode corresponding to the group type to which the customer belongs.
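How a per-group level division mode could yield a downgrade label is sketched below. The group names, the balance thresholds, and the use of a single asset balance as the basis for the level are all assumptions made for illustration; the application does not specify them.

```python
from typing import Callable, Dict


# Hypothetical level-division rules: each group type maps a balance to a level.
def _retail_level(balance: float) -> int:
    if balance >= 500_000:
        return 3
    if balance >= 50_000:
        return 2
    return 1


def _payroll_level(balance: float) -> int:
    return 2 if balance >= 10_000 else 1


LEVEL_RULES: Dict[str, Callable[[float], int]] = {
    "retail": _retail_level,
    "payroll": _payroll_level,
}


def is_downgraded(
    group_type: str, observed_balance: float, performance_balance: float
) -> bool:
    """True when the level computed for the performance interval is lower
    than the level computed for the observation interval."""
    level_of = LEVEL_RULES[group_type]
    return level_of(performance_balance) < level_of(observed_balance)
```

For example, `is_downgraded("retail", 600_000, 60_000)` evaluates to `True` under these illustrative thresholds, so that customer would be marked as an attrition customer in the training samples.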
In the embodiment of the present application, as a possible implementation, when the feature data includes multiple feature data types, the device further includes: a reason obtaining module, configured to, when the loss probability of the target customer in the target performance time interval exceeds a preset threshold, obtain the feature data with the largest weight among the multiple feature data types according to the feature data of the target customer in the target observation time interval and the pre-trained XGBoost model.
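The selection step performed by the reason obtaining module can be sketched as follows, assuming the per-feature weights are already available as a mapping from feature type to weight. The function name, feature names, and weight values are hypothetical. One possible source of such a mapping is the importance scores returned by `Booster.get_score` in the XGBoost Python package, although the application does not state how the weights are computed.

```python
from typing import Dict, Optional


def main_loss_factor(
    probability: float, threshold: float, weights: Dict[str, float]
) -> Optional[str]:
    """Return the feature type with the largest weight when the loss
    probability exceeds the threshold; otherwise return None."""
    if probability <= threshold or not weights:
        return None
    return max(weights, key=weights.get)


# Illustrative weights only; not taken from the application.
weights = {"financial_assets": 0.41, "transactions": 0.35, "bank_card": 0.24}
factor = main_loss_factor(0.82, 0.6, weights)
```

Here `factor` is `"financial_assets"`, the entry with the largest weight; a probability at or below the threshold yields `None`.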
In the embodiment of the present application, as a possible implementation, the device further includes: a report acquisition module, configured to acquire an analysis report of a target customer set; the analysis report includes a list of attrition customers, namely the customers in the target customer set whose loss probability exceeds the preset threshold, and the loss reasons; the loss reasons are obtained by analyzing the feature data with the largest weight among the multiple feature data types.
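A minimal sketch of assembling such an analysis report is given below. The `ReportEntry` type, the `build_report` function, the customer identifiers, and the probabilities are hypothetical; the sketch assumes that the loss probability and the largest-weight feature of each customer have already been computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReportEntry:
    customer_id: str
    probability: float
    reason: str


def build_report(
    probabilities: Dict[str, float],
    top_features: Dict[str, str],
    threshold: float,
) -> List[ReportEntry]:
    """List customers whose loss probability exceeds the threshold,
    highest probability first, each with its loss reason."""
    entries = [
        ReportEntry(cid, p, top_features.get(cid, "unknown"))
        for cid, p in probabilities.items()
        if p > threshold
    ]
    return sorted(entries, key=lambda e: e.probability, reverse=True)


# Illustrative inputs only.
report = build_report(
    {"c001": 0.91, "c002": 0.35, "c003": 0.72},
    {"c001": "financial_assets", "c003": "transactions"},
    threshold=0.6,
)
```

With these inputs the report lists `c001` and then `c003`, while `c002` is omitted because its probability does not exceed the threshold.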
In this embodiment, as a possible implementation manner, the feature data includes: at least one of customer basic information, customer financial asset information, customer bank card information, customer transaction information, customer insurance information, and bank agent customer payroll information.
In summary, the embodiment of the application provides a loss probability prediction device which, by means of a pre-trained XGBoost model, can efficiently obtain the loss probability of a customer in the performance time interval from the customer's feature data in the target observation time interval.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the method disclosed in the embodiments corresponds to the system disclosed in the embodiments, its description is brief; for relevant details, refer to the description of the system.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A loss probability prediction method, comprising:
acquiring feature data of a target customer in a target observation time interval;
predicting the loss probability of the target customer in a target performance time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model; the start time of the target performance time interval is after the end time of the target observation time interval;
the XGBoost model is trained on training samples; the training samples comprise feature data of a customer set in an observation time interval and the customer set with its attrition customers marked; the attrition customers are the customers in the customer set whose level has been downgraded; the start time of the performance time interval is after the end time of the observation time interval.
2. The method of claim 1, further comprising:
determining the group type to which each customer in the customer set belongs; each group type has a corresponding level division mode;
and determining whether the level of each customer has been downgraded according to the level division mode corresponding to the group type to which the customer belongs.
3. The method of claim 1, wherein, when the feature data comprises a plurality of feature data types, the method further comprises:
when the loss probability of the target customer in the target performance time interval exceeds a preset threshold, obtaining the feature data with the largest weight among the plurality of feature data types according to the feature data of the target customer in the target observation time interval and the pre-trained XGBoost model.
4. The method of claim 1, further comprising:
obtaining an analysis report of a target customer set; the analysis report comprises a list of attrition customers, namely the customers in the target customer set whose loss probability exceeds the preset threshold, and the loss reasons; the loss reasons are obtained by analyzing the feature data with the largest weight among the plurality of feature data types.
5. The method of claim 1, wherein the feature data comprises:
at least one of customer basic information, customer financial asset information, customer bank card information, customer transaction information, customer insurance information, and bank agent customer payroll information.
6. A loss probability prediction device, the device comprising:
an obtaining module, configured to obtain feature data of a target customer in a target observation time interval;
a prediction module, configured to predict the loss probability of the target customer in a target performance time interval according to the feature data of the target customer in the target observation time interval and a pre-trained XGBoost model; the start time of the target performance time interval is after the end time of the target observation time interval;
the XGBoost model is trained on training samples; the training samples comprise feature data of a customer set in an observation time interval and the customer set with its attrition customers marked; the attrition customers are the customers in the customer set whose level has been downgraded; the start time of the performance time interval is after the end time of the observation time interval.
7. The apparatus of claim 6, further comprising:
a determining module, configured to determine the group type to which each customer in the customer set belongs; each group type has a corresponding level division mode;
and a dividing module, configured to determine whether the level of each customer has been downgraded according to the level division mode corresponding to the group type to which the customer belongs.
8. The apparatus of claim 6, wherein, when the feature data comprises a plurality of feature data types, the apparatus further comprises:
a reason obtaining module, configured to, when the loss probability of the target customer in the target performance time interval exceeds a preset threshold, obtain the feature data with the largest weight among the plurality of feature data types according to the feature data of the target customer in the target observation time interval and the pre-trained XGBoost model.
9. The apparatus of claim 6, further comprising:
a report acquisition module, configured to acquire an analysis report of a target customer set; the analysis report comprises a list of attrition customers, namely the customers in the target customer set whose loss probability exceeds the preset threshold, and the loss reasons; the loss reasons are obtained by analyzing the feature data with the largest weight among the plurality of feature data types.
10. The apparatus of claim 6, wherein the feature data comprises:
at least one of customer basic information, customer financial asset information, customer bank card information, customer transaction information, customer insurance information, and bank agent customer payroll information.
CN202011640346.XA 2020-12-31 2020-12-31 Loss probability prediction method and device Pending CN112613920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011640346.XA CN112613920A (en) 2020-12-31 2020-12-31 Loss probability prediction method and device

Publications (1)

Publication Number Publication Date
CN112613920A true CN112613920A (en) 2021-04-06

Family

ID=75253197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011640346.XA Pending CN112613920A (en) 2020-12-31 2020-12-31 Loss probability prediction method and device

Country Status (1)

Country Link
CN (1) CN112613920A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862546A (en) * 2021-04-25 2021-05-28 平安科技(深圳)有限公司 User loss prediction method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636446A (en) * 2018-11-16 2019-04-16 北京奇虎科技有限公司 Customer churn prediction technique, device and electronic equipment
CN110197187A (en) * 2018-02-24 2019-09-03 腾讯科技(深圳)有限公司 Method, equipment, storage medium and the processor that customer churn is predicted
CN110322085A (en) * 2018-03-29 2019-10-11 北京九章云极科技有限公司 A kind of customer churn prediction method and apparatus
CN110837931A (en) * 2019-11-08 2020-02-25 中国农业银行股份有限公司 Customer churn prediction method, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197187A (en) * 2018-02-24 2019-09-03 腾讯科技(深圳)有限公司 Method, equipment, storage medium and the processor that customer churn is predicted
CN110322085A (en) * 2018-03-29 2019-10-11 北京九章云极科技有限公司 A kind of customer churn prediction method and apparatus
CN109636446A (en) * 2018-11-16 2019-04-16 北京奇虎科技有限公司 Customer churn prediction technique, device and electronic equipment
CN110837931A (en) * 2019-11-08 2020-02-25 中国农业银行股份有限公司 Customer churn prediction method, device and storage medium

Similar Documents

Publication Publication Date Title
CN110837931B (en) Customer churn prediction method, device and storage medium
CN104781837B (en) System and method for forming predictions using event-based sentiment analysis
CN110111139B (en) Behavior prediction model generation method and device, electronic equipment and readable medium
US20050080821A1 (en) System and method for managing collections accounts
CN111738843B (en) Quantitative risk evaluation system and method using running water data
US20170124458A1 (en) Method and system for generating predictive models for scoring and prioritizing opportunities
US10325235B2 (en) Method and system for analyzing and optimizing distribution of work from a plurality of queues
CN112419030B (en) Method, system and equipment for evaluating financial fraud risk
CN111738819A (en) Method, device and equipment for screening characterization data
CN112836750A (en) System resource allocation method, device and equipment
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN116596659A (en) Enterprise intelligent credit approval method, system and medium based on big data wind control
Kim et al. Predicting corporate defaults using machine learning with geometric-lag variables
CN112613920A (en) Loss probability prediction method and device
CN112950359A (en) User identification method and device
CN111476657A (en) Information pushing method, device and system
CN114626940A (en) Data analysis method and device and electronic equipment
CN112712270B (en) Information processing method, device, equipment and storage medium
Wang et al. Sequential one-step estimator by sub-sampling for customer churn analysis with massive data sets
CN113850483A (en) Enterprise credit risk rating system
CN113269610A (en) Bank product recommendation method and device and storage medium
CN112734352A (en) Document auditing method and device based on data dimensionality
CN117726434A (en) Credit scoring card model training method, application method and related products
CN117035843A (en) Customer loss prediction method and device, electronic equipment and medium
CN115545781A (en) Customer mining model generation method and device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination