CN116151916A - Intelligent marketing method based on XGBoost model

Info

Publication number: CN116151916A
Application number: CN202211730008.4A
Authority: CN
Inventors: 马攀; 谭广
Original assignee: Chongqing Fumin Bank Co Ltd
Current assignee: Chongqing Fumin Bank Co Ltd
Priority date: 2022-12-30
Filing date: 2022-12-30
Publication date: 2023-05-23

Abstract

The invention relates to the technical field of intelligent marketing, and in particular to an intelligent marketing method based on an XGBoost model. The method comprises the following steps: S1: collect data and construct a multi-dimensional user tag system; S2: automatically handle missing data values according to the missing rate; S3: set the user response label as the dependent variable Y and the multi-dimensional user tags as the independent variable X, take historically contacted users as the training set, and train a model using the XGBoost classification algorithm; S4: use the model to generate a response score index, identify high-intention users, and apply the result to intelligent marketing scenarios. With this technical scheme, marketing can be targeted precisely at users with intent, improving user experience.

Description

Intelligent marketing method based on XGBoost model

Technical Field

The invention relates to the technical field of intelligent marketing, in particular to an intelligent marketing method based on an XGBoost model.

Background

Banks commonly market financial products through manual marketing and machine marketing. Manual marketing is inefficient and costly. Machine marketing, which mainly works by broadcasting preset marketing scripts, reduces labor costs. In addition, marketing to customers by robot avoids problems such as misstatements caused by human fatigue, so machine marketing is becoming the trend in bank marketing.

However, current machine marketing has a low success rate. Customers differ widely and their interests are hard to capture, and there is no method or model that can identify a user's degree of intent. Precise marketing to users with intent is therefore difficult, which leads to a low marketing conversion rate, high cost, and a poor user experience.

Disclosure of Invention

The object of the invention is to provide an intelligent marketing method based on an XGBoost model that can target marketing precisely at users with intent, thereby improving user experience.

To solve the above problems, the invention provides the following basic scheme: an intelligent marketing method based on an XGBoost model, comprising:

S1: collecting data and constructing a multi-dimensional user tag system;

S2: automatically handling missing data values according to the missing rate;

S3: setting the user response label as the dependent variable Y and the multi-dimensional user tags as the independent variable X, taking historically contacted users as the training set, and training a model using the XGBoost classification algorithm;

S4: using the model to generate a response score index, identifying high-intention users, and applying the result to intelligent marketing scenarios.

The basic scheme has the following beneficial effects: S1 tags customers along multiple dimensions, so that customers are described more accurately; S2 automatically routes samples with missing values, so the missing features need no fill-in preprocessing; the training in S3 enables the XGBoost model to identify high-intention users, so that subsequent marketing can target users with intent; S4 obtains the final result and applies it to intelligent marketing scenarios, improving user experience.

As a preferred scheme, the multi-dimensional user tag system has first-level, second-level, and third-level classifications of user tags, and the tags are further refined and classified under the third-level classification.

Refining the classifications and tag types provides the model with complete and accurate user features, so the results are more accurate and the user conversion rate is effectively improved.

As a preferred scheme, in S2, for features with a missing rate in the range of 10% to 70%, the missing values are treated as a separate class. When the tree splits on such a feature, the missing values are first ignored and the optimal split point is selected from the samples that have values; the samples with missing values are then placed into the left child node and the right child node in turn, the loss is calculated for each, and the split direction with the smaller loss is kept.

When the sample size is not large enough, directly deleting missing values reduces the sample size further, while filling them in can bias the data. Features whose missing rate falls within an acceptable range are therefore analyzed so as to lose as little information as possible. This setting helps retain more samples for new marketing products or services that lack samples.

As a preferred scheme, the objective function of the XGBoost model in S3 = conventional loss function + model complexity, where the "model complexity" is an L2 regularization term.

Introducing the L2 regularization term prevents the parameter of any single feature from becoming too large and having a decisive influence on the result, ensuring that the model predicts well.

As a preferred scheme, the value range of the L2 regularization parameter is [0, +∞), the default is 1, and common tuning candidates are selected from the range (0, 1).

Keeping the L2 regularization term as small as possible reduces the "model complexity" and further helps ensure that the model predicts well.

As a preferred scheme, in S3 the classification performance of the trained model is further evaluated using the ROC curve and the AUC value, based on the following quantities: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN); FPRate, the proportion of all negative samples that are predicted positive; and TPRate, the proportion of all positive samples that are predicted positive.

Evaluating classification performance verifies whether the model is qualified, which ensures the accuracy of the final marketing.

Preferably, step S3 further includes step S32: continuously supplementing the historically contacted user data and new multi-dimensional user tags, and validating the model, iteratively optimizing it, and correcting its parameters.

Continuously expanding the training set and the multi-dimensional user tags enriches the model's data sources; continuous iteration and correction make the model more accurate and keep it up to date.

As a preferred scheme, in step S4 the customer group to be marketed to is used as the prediction data set. The factors considered in determining this group include policy coverage, product admission standards, and People's Bank of China standards; users admitted under the bank's own risk policy are then taken as the final target customer group.

Considering multiple factors when determining the customer group to be marketed to narrows the scope as far as possible, so that the prediction of high-intention customers is more accurate from the outset, and screening at the start saves computing resources.

Preferably, the method further comprises S5: using the marketing response model to obtain the user's purchase intention for a given credit product.

Predicting the user's purchase intention for a specific product enables more precise marketing and improves the user conversion rate.

Drawings

FIG. 1 is a logic diagram of an intelligent marketing method based on the XGBoost model;

FIG. 2 is a schematic diagram of the axes of the ROC curve.

Detailed Description

The technical scheme of the application is further described in detail through the following specific embodiments:

FIG. 1 shows an intelligent marketing method based on XGBoost model, comprising the following steps:

S1: Collect data and construct a multi-dimensional user tag system, which comprises the following steps:

S11: Collect data. The collected information includes, but is not limited to, the bank's device behavior data, collated user behavior data, historical marketing contact records, historical marketing response records, and business indicators. The collected behavior includes, but is not limited to, the user's login device, login time, and login frequency in the bank's APP or official website; dwell time and page interactions in marketing steps; and the time taken and drop-off information for each step of a business process such as the credit application process.

S12: Generate multi-dimensional user tags. The tags have a first-level, a second-level, and a third-level classification; the second-level classification refines the first-level classification, and the third-level classification refines the second-level classification. The specific categories are shown in Table 1:

TABLE 1

The third-level classification also contains more finely subdivided tags. The multi-dimensional user tags finally generated are shown in Table 2:

TABLE 2

S2: Missing data values are handled in one of three ways. For features with a missing rate below 10%, the missing values are filled with the mean, median, or mode, chosen according to the feature's distribution. Features with a missing rate above 70% are deleted outright, which avoids the data errors that artificial filling would introduce. For features with a missing rate between 10% and 70%, the missing values are treated as a separate class and set aside for the time being; this class can be assigned any value that has no practical meaning.
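The three-way rule above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of `None` to mark a missing value, and the choice of the median as the fill statistic are all assumptions made for the example.

```python
from statistics import median


def missing_rate(values):
    """Fraction of entries in a feature column that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def triage_feature(values, low=0.10, high=0.70):
    """Return (action, processed_values) for one feature column.

    action is "fill", "drop", or "keep_missing".
    """
    rate = missing_rate(values)
    if rate < low:
        # Low missing rate: fill with a summary statistic (median assumed here).
        fill = median(v for v in values if v is not None)
        return "fill", [fill if v is None else v for v in values]
    if rate > high:
        # High missing rate: discard the feature entirely.
        return "drop", None
    # In between: leave the missing values in place for the tree learner.
    return "keep_missing", list(values)
```

A column with one missing value out of eleven would be filled, whereas a column that is 80% missing would be dropped.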

In this embodiment the third approach is generally preferred. During XGBoost training, when a feature has missing values and the tree splits on that feature, the missing values are first ignored and the optimal split point is selected from the samples that have values. The samples with missing values are then placed into the left child node and the right child node in turn, the loss is calculated for each, and the split direction with the smaller overall loss is kept. At prediction time, samples with missing values follow that direction.
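As a rough illustration of the split procedure just described, the following pure-Python sketch picks a threshold from the samples that have values and then tries the missing-value samples in each child. It is a toy under stated assumptions: squared error around the child mean stands in for the loss, the function names are hypothetical, and the XGBoost library itself works from gradient statistics rather than this simplified loss.

```python
def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def split_with_missing(xs, ys):
    """Return (threshold, direction) for one feature; None in xs marks a missing value."""
    present = [(x, y) for x, y in zip(xs, ys) if x is not None]
    missing_ys = [y for x, y in zip(xs, ys) if x is None]

    # Step 1: choose the split point using only samples that have a value.
    distinct = sorted({x for x, _ in present})
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in present if x < t]
        right = [y for x, y in present if x >= t]
        loss = sse(left) + sse(right)
        if best is None or loss < best[0]:
            best = (loss, t, left, right)
    if best is None:
        raise ValueError("need at least two distinct non-missing values")
    _, t, left, right = best

    # Step 2: send the missing-value samples left, then right, and compare losses.
    loss_left = sse(left + missing_ys) + sse(right)
    loss_right = sse(left) + sse(right + missing_ys)
    return t, ("left" if loss_left <= loss_right else "right")
```

The returned direction is the one that would be reused at prediction time for samples whose value is missing.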

S3: Train the XGBoost model, which specifically comprises the following steps:

S31: Train the model using the XGBoost classification algorithm so as to ultimately classify users, specifically as follows:

S31-1: Set the user response label as the dependent variable Y. In this embodiment, a user who responded is regarded as a positive sample and marked 1; otherwise the user is a negative sample and marked 0. The multi-dimensional user tags are the independent variable X. During training, historically contacted users serve as the training set; they have both ground-truth result labels and usable feature data, which meets the data requirements of a supervised model. In the training process:

Objective function of the XGBoost model = conventional loss function + model complexity;

The objective function of the XGBoost model is the function the model finally optimizes. It is used to judge whether the model parameters are optimal, and the smaller its value the better. The "conventional loss function" is the sum of the mean squared errors between the true and predicted values. The "model complexity" is an L2 regularization term, namely the sum of the squares of all feature parameters. This sum should be as small as possible, which prevents the parameter of any single feature from becoming too large and having a decisive influence on the result, and so ensures that the model predicts well.
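Written out, the objective described above might take the following form. The notation is an assumption introduced here for illustration and does not appear in the source: y_i is the true label of sample i, \hat{y}_i is the model's prediction, w_j are the model parameters, and \lambda \ge 0 is the L2 regularization parameter. The squared-error first term follows the description in the text.

```latex
\mathrm{Obj} \;=\; \underbrace{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}_{\text{conventional loss function}}
\;+\; \underbrace{\lambda \sum_{j} w_j^{2}}_{\text{model complexity (L2 term)}}
```

With \lambda = 0 the second term vanishes and only the loss is minimized; larger values of \lambda penalize large parameters more heavily.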

In one embodiment, to keep the "model complexity" as small as possible, the value range of the L2 regularization parameter is [0, +∞) and the default is 1; common tuning candidates are selected from the range (0, 1), and the final parameter value is chosen with reference to the AUC value.

S31-2: Select part of the historically contacted users' data as a test set. A test-set AUC > 0.8 is regarded as qualified (the larger the AUC, the better the classification performance). The AUC is the area enclosed by the ROC curve (receiver operating characteristic curve) and the coordinate axes. The verification criteria are shown in Table 3:

TABLE 3

As shown in FIG. 2, the horizontal axis of the ROC curve is FPRate and the vertical axis is TPRate. FPRate = FP/(FP+TN) is the proportion of all negative samples that are predicted positive; TPRate = TP/(TP+FN) is the proportion of all positive samples that are predicted positive.
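The two rates and the area under the curve can be computed directly from labels and scores. The sketch below is a dependency-free illustration with hypothetical function names; it sweeps every distinct score as a threshold and integrates with the trapezoidal rule. It assumes the data contains at least one positive and one negative sample.

```python
def roc_points(labels, scores):
    """Return ROC points (FPRate, TPRate), starting from (0, 0)."""
    pos = sum(labels)               # TP + FN
    neg = len(labels) - pos         # FP + TN
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area between the ROC curve and the horizontal axis (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def is_qualified(labels, scores, minimum=0.8):
    """Apply the AUC > 0.8 qualification criterion from S31-2."""
    return auc(roc_points(labels, scores)) > minimum
```

For example, with labels `[1, 1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.2]`, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9 and the model would count as qualified.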

S31-3: When features are added, features are taken offline, or similar changes occur, retrain the model iteratively.

S32: Continuously supplement the historically contacted user data and new multi-dimensional user tags, and validate the model, iteratively optimize it, and correct its parameters.

S4: Use the model to predict response scores, identify high-intention users, and apply the result to intelligent marketing scenarios, specifically as follows:

S41: Select the prediction data set, which is the customer group to be marketed to. Determining this group requires weighing several considerations, including but not limited to policy coverage, product admission standards, and People's Bank of China standards; users admitted under the bank's own risk policy are then taken as the final target customer group.

S42: Input the model features into the trained marketing response model to predict a new user's response probability and generate the response score index. The model features include, but are not limited to, behavior features, attribute features, and account features. Behavior features include, but are not limited to, the user's source channel/platform, whether the user is a multi-channel/platform user, whether the user is a new user, activity frequency, the number of days since the user was last active, and whether the user has a history of overdue payments. Attribute features include, but are not limited to, gender, age, education, and region. Account features include, but are not limited to, whether an APP account is registered, whether an account has been opened, and whether a card is bound.

S43: Apply the response score index to intelligent marketing scenarios. High-intention users are contacted in a targeted way through marketing channels such as SMS, AI voice, outbound calls, and the APP, while low-intention users are filtered out, which saves marketing cost and improves the marketing conversion rate. The marketing response model ultimately outputs a marketing response probability for each predicted user. The larger this value, the more likely the user is to respond, that is, the higher the user's intent is considered to be. When selecting whom to market to, resources are steered toward users with a higher response probability, so the user feedback obtained is greater.
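The filtering step in S43 might look like the following sketch. The function name, the dictionary of user IDs to probabilities, and the 0.5 cutoff are all hypothetical; the source does not state a threshold.

```python
def select_high_intention(scores, threshold=0.5):
    """Return user IDs whose response probability meets the threshold, highest first.

    `scores` maps a user ID to the model's predicted response probability.
    The default threshold is an assumed value, not one given in the source.
    """
    chosen = [(uid, p) for uid, p in scores.items() if p >= threshold]
    chosen.sort(key=lambda item: item[1], reverse=True)
    return [uid for uid, _ in chosen]
```

Sorting by probability lets a limited marketing budget be spent from the top of the list downward.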

Example two

This embodiment differs from the first embodiment in that it further comprises S5:

S5: Use the marketing response model to obtain the user's purchase intention for a given credit product and make precise recommendations, specifically as follows:

S51: The business flow of a credit product is generally registration -> real-name verification -> account opening -> credit application -> credit approval -> loan disbursement -> post-loan performance. Within this flow, submitting a credit application is the criterion that defines a responding user: a user who has submitted a credit application is regarded as a positive sample and marked 1; otherwise the user is a negative sample and marked 0.

S52: Taking both business relevance and data availability into account, select features including the customer's basic information, historical lending information, and credit report information as the independent variable X.

S53: Select roughly equal numbers of positive and negative samples from the historical transaction data to train the model.

S54: Input the independent variable X to obtain the user's purchase intention for the given credit product.
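Steps S51 and S53 can be sketched as follows. The stage identifiers are illustrative English names for the stages listed in S51, the function names are hypothetical, and random downsampling of the larger class is only one assumed way of obtaining equal numbers of positive and negative samples; the source does not say how the balance is achieved.

```python
import random

# Business flow from S51, in order; the identifiers are illustrative names.
FUNNEL = [
    "registration", "real_name", "account_opening", "credit_application",
    "credit_approval", "disbursement", "post_loan",
]


def response_label(furthest_stage):
    """Return 1 if the user got at least as far as submitting a credit application, else 0."""
    reached = FUNNEL.index(furthest_stage)
    return 1 if reached >= FUNNEL.index("credit_application") else 0


def balance_samples(rows, seed=0):
    """Downsample the larger class so positives and negatives are equal in number.

    Each row is a (features, label) pair with label 0 or 1.
    """
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(pos, n) + rng.sample(neg, n)
```

A user whose furthest stage is account opening is labeled 0, while one who reached credit application or any later stage is labeled 1.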

Example three

This embodiment differs from the second embodiment in that, in S43, the marketing channel for each high-intention user is further predicted; that is, the training set is used to train the model to predict which marketing channel a user will accept. The independent variable X is the same as the model input features for high-intention users, namely the multi-dimensional user tags, with attribute features weighted more heavily than other features, so no additional data needs to be prepared. When marketing formally begins, a channel the user is likely to accept is selected, which avoids annoying the user with repeated marketing through multiple channels and improves the marketing success rate.

The foregoing is merely an embodiment of the present invention, and common knowledge such as well-known specific structures and features is not described here in detail. A person of ordinary skill in the art knows the common technical knowledge of the field as of the filing date or priority date, has access to all the prior art in the field, and is able to apply the conventional experimental means available before that date. With the teaching of this application, such a person can perfect and implement this scheme using their own abilities, and certain typical well-known structures or methods should not become an obstacle to implementing this application. It should be noted that those skilled in the art can make modifications and improvements without departing from the structure of the present invention; these should also be regarded as falling within the scope of protection of the present invention, and they do not affect the effect of implementing the invention or the utility of the patent. The scope of protection of the present application shall be determined by the content of the claims, and the specific embodiments and other descriptions in the specification may be used to interpret the content of the claims.

Claims

1. An intelligent marketing method based on an XGBoost model, characterized by comprising the following steps:

S1: collecting data and constructing a multi-dimensional user tag system;

S2: automatically handling missing data values according to the missing rate;

S3: setting the user response label as the dependent variable Y and the multi-dimensional user tags as the independent variable X, taking historically contacted users as the training set, and training a model using the XGBoost classification algorithm;

S4: using the model to generate a response score index, identifying high-intention users, and applying the result to intelligent marketing scenarios.

2. The intelligent marketing method based on the XGBoost model of claim 1, characterized in that: the multi-dimensional user tag system has first-level, second-level, and third-level classifications of user tags, and the tags are further refined and classified under the third-level classification.

3. The intelligent marketing method based on the XGBoost model of claim 1, characterized in that: in S2, for features with a missing rate in the range of 10% to 70%, the missing values are treated as a separate class; when the tree splits on such a feature, the missing values are first ignored and the optimal split point is selected from the samples that have values; the samples with missing values are then placed into the left child node and the right child node in turn, the loss is calculated for each, and the split direction with the smaller loss is kept.

4. The intelligent marketing method based on the XGBoost model of claim 1, characterized in that: the objective function of the XGBoost model in S3 = conventional loss function + model complexity, where the "model complexity" is an L2 regularization term.

5. The intelligent marketing method based on the XGBoost model of claim 4, characterized in that: the value range of the L2 regularization parameter is [0, +∞), the default is 1, and common tuning candidates are selected from the range (0, 1).

6. The intelligent marketing method based on the XGBoost model of claim 1, characterized in that: in S3, the classification performance of the trained model is further evaluated using the ROC curve and the AUC value, based on the following quantities: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN); FPRate, the proportion of all negative samples that are predicted positive; and TPRate, the proportion of all positive samples that are predicted positive.

7. The intelligent marketing method based on the XGBoost model of claim 1, characterized in that: step S3 further includes step S32: continuously supplementing the historically contacted user data and new multi-dimensional user tags, and validating the model, iteratively optimizing it, and correcting its parameters.

8. The intelligent marketing method based on the XGBoost model of claim 1, characterized in that: in step S4, the customer group to be marketed to is used as the prediction data set; the factors considered in determining this group include policy coverage, product admission standards, and People's Bank of China standards; and users admitted under the bank's own risk policy are taken as the final target customer group.

9. The intelligent marketing method based on the XGBoost model of claim 1, characterized in that it further comprises S5: using the marketing response model to obtain the user's purchase intention for a given credit product.