CN115187312A

CN115187312A - Customer loss prediction method and system based on deep learning

Info

Publication number: CN115187312A
Application number: CN202210931243.1A
Authority: CN
Inventors: 赵晖; 杨立立; 邵宇颉
Original assignee: Xinjiang University
Current assignee: Xinjiang University
Priority date: 2022-08-04
Filing date: 2022-08-04
Publication date: 2022-10-14

Abstract

The invention relates to a customer churn prediction method and system based on deep learning, and belongs to the field of artificial intelligence. Relevant attributes are extracted from the logs generated by users on an e-commerce platform, and user behavior feature information is constructed from them. User behavior features of multiple dimensions are constructed according to the causes of user churn, and the final user features are determined using the Pearson correlation coefficient and the chi-square test. Because changes in user interest and a user's global behavior habits influence churn prediction differently, a strategy for fusing dynamic and static user features is proposed. An XGB-LGCNN prediction model is built, and user churn is predicted with this classifier model. This addresses the low efficiency and inaccurate churn probability prediction of traditional machine learning methods. The churn probability of a user can be judged accurately, the precision of churn judgment is improved, and the cost of blind manual verification and statistics is reduced.

Description

Customer loss prediction method and system based on deep learning

Technical Field

The invention relates to the field of artificial intelligence, in particular to user churn prediction, and specifically to a customer churn prediction method and system based on deep learning.

Background

With the popularization of mobile devices and the rapid development of the internet, more and more people prefer to shop online. Compared with traditional physical stores, online shopping is less constrained in time and offers a wider range of goods, and these advantages have made the e-commerce market increasingly prosperous.

With the further popularization of emerging technologies such as 5G, the market environment continues to improve and new shopping modes that combine online and offline channels have appeared, further unlocking the potential of the domestic online consumption market.

Because e-commerce has developed rapidly and the barrier to market entry is low, many e-commerce platforms have emerged and market competition has intensified. As the market becomes more homogeneous and is divided into ever finer segments, more and more companies are paying attention to customer relationship management. For an enterprise, the cost of acquiring a new user is 5 times the cost of retaining an existing customer. Existing customers are also more loyal, and they are much more likely than potential new customers to purchase through the platform again. In addition, existing customers bring further benefits by promoting the platform. Most importantly, customers are the main source of enterprise profit: their purchases on the platform generate revenue, which is an important precondition for the subsequent expansion of the enterprise. Maintaining good customer relationships is therefore of great significance for e-commerce enterprises that want to keep a long-term competitive advantage in a fiercely competitive environment. Accurately identifying users who are likely to churn brings the following benefits to the platform: (1) users at an early stage of churn can be retained in time and later churn can be prevented; (2) enterprise losses are reduced; (3) user experience is optimized.

However, in churn prediction, different types of users have different behavior habits and recent interests, and the reasons why they churn also tend to differ. At present, most research uses traditional machine learning or manual follow-up visits. With the rapid growth in the number of customers, the efficiency and accuracy of churn prediction by traditional machine learning or manual methods can hardly meet the analysis requirements of today's enterprises. Churn prediction is common in the banking and telecommunications industries, where churn features are fixed and simple, so traditional machine learning methods work well. In e-commerce, by contrast, user data are complex and voluminous and churn features are difficult to define, which makes churn prediction difficult. A new method for predicting the churn of e-commerce platform users is therefore urgently needed.

Disclosure of Invention

The invention aims to provide a customer churn prediction method and system based on deep learning, and to solve the problem in the prior art that the churn of e-commerce platform users is difficult to predict accurately. The invention is a churn prediction method based on user behavior feature data. By extracting effective features, it achieves efficient, accurate and targeted churn prediction, helps enterprises retain their existing users, and reduces the probability of user churn as far as possible. The method extracts relevant attributes from the logs generated by users on the e-commerce platform, constructs features, builds a churn model, and predicts the future churn probability of users.

The above object of the present invention is achieved by the following technical solutions:

The customer churn prediction method based on deep learning comprises the following steps:

Step 1, data source: acquire the log data of user behavior recorded by an e-commerce platform, extract feature attributes, and construct user behavior feature information. The different causes of user churn are analyzed, and the churn features of a user are constructed from four dimensions, 'personal portrait', 'purchasing power', 'participation degree' and 'loyalty', where each dimension represents a different relationship between the user and the platform. The churn features are constructed as follows:

personal portrait: user id, gender, registration duration and age;

purchasing power: last consumption, average consumption amount per unit, average consumption amount of commodities, total consumption amount and total consumption times;

participation degree: total login days, total login times and total commodity clicking times;

loyalty: last login, login frequency and total use duration;

Step 2, feature processing: preprocess the user information data acquired in step 1. The preprocessing steps are as follows:

(1) Delete user feature records that contain large outliers or obviously unreasonable values;

(2) Fill null numerical values according to why the user feature is null, so that the filled value has a real physical meaning;

(3) Delete noise data so that they do not affect the accuracy of the final churn prediction model;

(4) Convert the categorical variables among the user features into numerical form;

(5) Delete invalid user feature records;

(6) Standardize the data, that is, process the user's numerical features with the z-score function;

User behavior features of multiple dimensions are constructed according to the causes of user churn, and the final user features are determined using the Pearson correlation coefficient and the chi-square test, so that features more effective for churn prediction are selected and redundant or strongly interfering features are removed. Because changes in user interest and a user's global behavior habits influence churn prediction differently, a strategy for fusing dynamic and static user features is proposed: the user behavior features are classified into dynamic features and static features to form the required user behavior feature data set, which is divided into training samples and test samples according to a given proportion;

Step 3, model construction: build an extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model (XGB-LGCNN). The XGB-LGCNN prediction model consists of a long short-term memory gated convolutional neural network model (LGCNN) and an extreme gradient boosting tree (XGBoost). The LGCNN consists of a long short-term memory neural network (LSTM) layer, a gating mechanism, a convolutional neural network (CNN) layer and a final fully connected layer. XGBoost is a variant of the gradient boosting decision tree (GBDT). The LGCNN processes the user's dynamic features, XGBoost processes the user's static features, and the two are finally fused through a fully connected layer to obtain better feature data;

Step 4, model training: train the XGB-LGCNN prediction model using the final user churn features selected in step 2, compare the predictions of the trained model with the corresponding true labels in the test samples, evaluate the prediction accuracy and F1 value of the trained model, and select the trained XGB-LGCNN model with the higher accuracy as the model for practical application;

Step 5, user churn prediction: input the user data into the XGB-LGCNN prediction model trained in step 4 to obtain the churn probability of the user under test.
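The evaluation in step 4 can be illustrated with a minimal sketch. It assumes binary labels in which 1 means "churned", and it turns predicted churn probabilities into labels with a cutoff of 0.5; the function names and the cutoff are illustrative assumptions, not values taken from the source.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn churn probabilities into 0/1 labels (the cutoff is an assumed value)."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels where 1 means 'churned'."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); reported as 0 when no positives exist at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    true_labels = [1, 0, 1, 1, 0, 0]
    predicted = threshold([0.9, 0.2, 0.4, 0.7, 0.6, 0.1])
    print(accuracy_and_f1(true_labels, predicted))
```

Reporting F1 alongside accuracy is useful when churned users are a minority of the test samples, since accuracy alone can look high for a model that rarely predicts churn.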

Among the user information obtained in step 1, "total consumption amount", "total login days", "login frequency", "total commodity clicking times" and "total use duration" are dynamic feature data. In addition to their totals over the whole observation period, they need to be computed in groups: the observation period recorded in the data set is 90 days, which is divided into 13 groups at the specified time granularity of 7 days.
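The grouping of a dynamic feature can be sketched as follows. The sketch assumes the feature is available as one value per day and that the final group simply holds the days left over (6 days when 90 days are split by 7); the source does not say how the last group is handled, so this is an assumption, as is the function name.

```python
def group_by_granularity(daily_values, days_per_group=7):
    """Sum consecutive daily values into fixed-size groups.

    The last group may cover fewer days than days_per_group.
    """
    if days_per_group <= 0:
        raise ValueError("days_per_group must be positive")
    return [
        sum(daily_values[start:start + days_per_group])
        for start in range(0, len(daily_values), days_per_group)
    ]


if __name__ == "__main__":
    # One commodity click per day across a 90-day observation period.
    daily_clicks = [1] * 90
    weekly = group_by_granularity(daily_clicks)
    print(len(weekly), weekly)
```

The resulting sequence of 13 values per dynamic feature is the kind of time-ordered input that a sequence model such as an LSTM can consume.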

The finally selected features have a large influence on the prediction result. The final user features are determined using the Pearson correlation coefficient and the chi-square test. According to the calculated results, the features with the highest importance and the lowest mutual correlation are retained, so that features more effective for churn prediction are selected and redundant or strongly interfering features are removed.
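As an illustration of the redundancy check, the sketch below computes the Pearson correlation coefficient between numerical features and greedily keeps a feature only if it is not strongly correlated with any feature already kept. The threshold of 0.9, the greedy order and the feature names are assumptions made for the example; the source does not specify them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| with every already kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    table = {
        "total_login_days": [1, 2, 3, 4, 5],
        "total_login_times": [2, 4, 6, 8, 10],  # perfectly correlated with the first
        "total_click_times": [5, 1, 4, 2, 3],
    }
    print(drop_redundant(table))
```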

The data standardization in step 2 (6) processes the user's numerical features with the z-score function, specifically:

The z-score, also called the standard score, is obtained by dividing the difference between a value and the mean by the standard deviation. It is the signed number of standard deviations by which an observation or data point lies above the mean, and is given by the formula:

$$z = \frac{x - \bar{x}}{s}$$

where $x$ is the original data, $\bar{x}$ is the mean, and $s$ is the standard deviation;

The z-score is also a data standardization method. If the sample is large enough and approximately normally distributed, about 68% of the standardized data lie between -1 and 1, about 95% between -2 and 2, and more than 99% between -3 and 3. The z-score is suitable when the data distribution is too scattered, when the maximum and minimum values cannot be determined, or when the data contain too many outliers.
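A minimal sketch of z-score standardization for one numerical feature follows. It assumes that the standard deviation in the formula is the sample standard deviation (divisor n - 1); some libraries use the population standard deviation instead, and the source does not say which is intended.

```python
import statistics


def z_scores(values):
    """Standardize values as (x - mean) / s, with s the sample standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate the standard deviation")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    if s == 0:
        return [0.0] * len(values)  # constant feature: every value equals the mean
    return [(x - mean) / s for x in values]


if __name__ == "__main__":
    total_login_days = [1, 2, 3]
    print(z_scores(total_login_days))
```

After standardization each feature has mean 0 and unit standard deviation, which keeps features measured on very different scales, such as amounts and counts, comparable.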

The processed dynamic and static feature vectors of the user are input into different modules of the model, so that the feature extraction capability of each module is used more effectively. The LGCNN processes the user's dynamic features and XGBoost processes the user's static features. Their outputs are finally fused through a fully connected layer to obtain better feature data, which improves the prediction effect and yields the final prediction result.
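The fusion step alone can be sketched as below, under strong simplifying assumptions: the outputs of the static branch and the dynamic branch are taken as already computed lists of numbers, they are concatenated, and a single fully connected layer with a sigmoid produces a churn probability. The weights here are hand-picked placeholders; in a trained model they would be learned, and the source does not specify the exact form of the branch outputs.

```python
import math


def fuse(static_out, dynamic_out, weights, bias):
    """Concatenate two branch outputs and apply one fully connected layer + sigmoid."""
    x = list(static_out) + list(dynamic_out)
    if len(x) != len(weights):
        raise ValueError("one weight is required per concatenated input value")
    logit = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    static_branch = [0.8]         # placeholder for the static-feature branch output
    dynamic_branch = [0.3, -0.1]  # placeholder for the dynamic-feature branch output
    print(fuse(static_branch, dynamic_branch, weights=[1.5, 0.7, 0.2], bias=-0.5))
```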

Another object of the present invention is to provide a customer churn prediction system based on deep learning, which includes:

the data source module, which collects the raw data of users, retains the required features according to the screened user churn features to generate the required data table, and stores it in a database, wherein the data table contains the user behavior data related to the churn features constructed around the four dimensions 'personal portrait', 'purchasing power', 'participation degree' and 'loyalty';

the preprocessing module, which deletes user feature records containing large outliers or obviously unreasonable values, deletes invalid user feature records and noise data, fills null numerical values according to why the user feature is null so that the filled value has a real physical meaning, standardizes the data and stores them in the database; it then constructs user behavior features of multiple dimensions according to the causes of user churn, determines the final user features using the Pearson correlation coefficient and the chi-square test, selects the features more effective for churn prediction, removes redundant or strongly interfering features, and stores the result in the database to form the required preprocessed user behavior feature data set;

the dynamic and static feature processing module, which classifies the user behavior features into dynamic and static features, the dynamic features being divided into data of the required shape according to the specified time granularity; the behavior data of a user thus finally consist of static feature data and dynamic feature data, which are stored in the database to form the required user behavior feature data set, and this data set is randomly divided into training samples and test samples according to a given proportion;

the churn prediction module, which comprises a user data reading sub-module, a churn algorithm model and a prediction result output sub-module. Before churn prediction is carried out, the operator uploads the log data of user behavior recorded by the e-commerce platform to be predicted; a user behavior data set containing the selected final churn features is then uploaded through the user data reading sub-module, the churn algorithm model performs churn prediction on the uploaded data set, and the predicted result can be printed or exported and saved in a designated local folder.
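The random division into training and test samples can be sketched as follows. The test ratio of 0.2 and the fixed seed are assumptions for illustration; the source only states that the split is random and made according to a proportion.

```python
import random


def split_samples(samples, test_ratio=0.2, seed=0):
    """Randomly split samples into (training, test) lists by the given ratio."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    user_ids = list(range(10))
    train, test = split_samples(user_ids, test_ratio=0.2, seed=42)
    print(len(train), len(test))
```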

The beneficial effects of the invention are as follows. The invention provides a churn prediction method for e-commerce platform users, a churn prediction system, an electronic device and a computer-readable storage medium. The features in the collected user behavior feature information are carefully selected according to the causes of user churn: features more effective for churn prediction are kept, and highly redundant or interfering features are removed. Because changes in user interest and a user's global behavior habits influence churn prediction differently, a strategy for fusing dynamic and static user features is proposed: the user behavior features are classified into dynamic and static features, processed by different networks and then fused, so that the hidden information contained in the relationships and changes of user interests and behavior habits is mined, the user's latent intention is understood, and model prediction performance is further improved. This addresses the low efficiency and inaccurate churn probability prediction of traditional machine learning methods. Churn prediction is common in the banking and telecommunications industries, where churn features are fixed and simple, so traditional machine learning methods work well; in e-commerce, user data are complex and voluminous and churn features are difficult to define, which makes churn prediction difficult. The method can judge the churn probability of a user accurately, improve the precision of churn judgment, and reduce the cost of blind manual verification and statistics.

The invention combines deep learning with traditional machine learning, which addresses the low efficiency and inaccurate churn probability prediction of traditional machine learning methods used on their own.

The invention carefully selects the features in the collected user behavior feature information according to the causes of user churn, keeping the features more effective for churn prediction and removing redundant or strongly interfering ones. A strategy for fusing dynamic and static user features is also proposed, based on the different influences of user interest changes and global behavior habits on churn prediction. The user churn features are classified into dynamic features and static features. The dynamic features are divided according to the specified time granularity, while the static features are processed differently, to form the required user behavior feature data set. The data are processed by different networks and then fused, so as to mine the hidden information contained in the relationships and changes of user interests and behavior habits, understand the user's latent intention, and further improve model prediction performance.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention.

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a structural diagram of the long short-term memory gated convolutional neural network model (LGCNN) of the present invention;

FIG. 3 is a structural diagram of the extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model (XGB-LGCNN) of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention. In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof. Among them, the related keywords are explained as follows:

Extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model (eXtreme Gradient Boosting-Long Short-Term Memory Gated Convolutional Neural Network, XGB-LGCNN)

Long short-term memory gated convolutional neural network model (LGCNN)

Extreme gradient boosting tree (eXtreme Gradient Boosting, XGBoost)

Long short-term memory neural network (Long Short-Term Memory, LSTM)

Convolutional Neural Network (CNN)

Referring to FIG. 1 to FIG. 4, in the deep learning-based e-commerce user churn prediction method, prediction system and electronic device, relevant attributes are extracted from the logs generated by users on the e-commerce platform and user behavior feature information is constructed. User behavior features of multiple dimensions are constructed according to the causes of user churn, and the final user features are determined using the Pearson correlation coefficient and the chi-square test. Because changes in user interest and a user's global behavior habits influence churn prediction differently, a strategy for fusing dynamic and static user features is proposed. The user churn features are classified into dynamic features and static features. The dynamic features are divided according to the specified time granularity, while the static features are processed differently, to form the required user behavior feature data set, which is divided into training samples and test samples according to a given proportion. An extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model (XGB-LGCNN) is built and trained on the training samples using the relevant user churn features. The prediction accuracy and F1 value of the trained model are evaluated by comparing its predictions with the corresponding true labels in the test samples, and the model with the higher accuracy is selected for practical application. User churn is then predicted with this classifier model.
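For a categorical feature such as gender, the chi-square test of independence against the churn label can be sketched as below. The sketch returns only the test statistic and the degrees of freedom; comparing the statistic with a critical value (for example 3.841 for one degree of freedom at the 0.05 level) is left to the caller. The function name and the sample data are illustrative assumptions.

```python
from collections import Counter


def chi_square_statistic(categories, labels):
    """Return (chi-square statistic, degrees of freedom) for a feature vs. a label."""
    n = len(categories)
    if n == 0 or n != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = Counter(zip(categories, labels))  # contingency table cells
    row_totals = Counter(categories)
    col_totals = Counter(labels)
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n  # count expected under independence
            stat += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


if __name__ == "__main__":
    gender = ["male"] * 10 + ["female"] * 10
    churned = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
    print(chi_square_statistic(gender, churned))
```

A large statistic suggests that the feature and the churn label are not independent, which is the usual argument for keeping the feature.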

In a first aspect, the invention provides a deep learning-based method for predicting the churn of e-commerce platform users, which comprises the following steps:

Step 1, data source: acquire the log data of user behavior recorded by the e-commerce platform, extract feature attributes, and construct user behavior feature information. To this end, the different causes of user churn need to be analyzed in order to construct user churn features. Based on this idea, we propose to build the churn features of a user from the four dimensions "personal portrait", "purchasing power", "participation degree" and "loyalty". Different dimensions represent different relationships between the user and the platform. The churn features are constructed as follows:

personal portrait: user id, gender, registration duration and age;

purchasing power: last consumption, average consumption amount per unit, average consumption amount of commodities, total consumption amount and total consumption times;

participation degree: total login days, total login times and total commodity clicking times;

loyalty: last login, login frequency and total use duration.

Step 2, feature processing: preprocess the user information data acquired in step 1. The preprocessing steps are as follows:

(1) Delete user feature records that contain large outliers or obviously unreasonable values;

(2) Fill null numerical values according to why the user feature is null, so that the filled value has a real physical meaning;

(3) Delete noise data so that they do not affect the accuracy of the final churn prediction model;

(4) Convert the categorical variables among the user features into numerical form;

(5) Delete invalid user feature records;

(6) Standardize the data, that is, process the user's numerical features with the z-score function;

User behavior features of multiple dimensions are constructed according to the causes of user churn, and the final user features are determined using the Pearson correlation coefficient and the chi-square test, so that features more effective for churn prediction are selected and highly redundant or interfering features are removed. Because changes in user interest and a user's global behavior habits influence churn prediction differently, a strategy for fusing dynamic and static user features is proposed. The user behavior features are classified into dynamic features and static features to form the required user behavior feature data set, which is divided into training samples and test samples according to a given proportion.

Step 3, model construction: build an extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model (XGB-LGCNN). The XGB-LGCNN prediction model consists of a long short-term memory gated convolutional neural network model (LGCNN) and an extreme gradient boosting tree (XGBoost). The LGCNN consists of a long short-term memory neural network (LSTM) layer, a gating mechanism, a convolutional neural network (CNN) layer and a final fully connected layer. XGBoost is a variant of the gradient boosting decision tree (GBDT). The LGCNN processes the user's dynamic features, XGBoost processes the user's static features, and the two are finally fused through a fully connected layer to obtain better feature data.

Step 4, model training: train the XGB-LGCNN prediction model on the training samples using the final user churn features selected in step 2, evaluate the prediction accuracy and F1 value of the trained model by comparing its predictions with the corresponding true labels in the test samples, and select the trained XGB-LGCNN model with the higher accuracy as the model for practical application.

In a second aspect, the present invention provides a system for predicting the churn of e-commerce platform users, which includes a data source module, a preprocessing module, a dynamic and static feature processing module and a churn prediction module.

The data source module collects the raw data of users, retains the required features according to the screened user churn features to generate the required data table, and stores it in a database. The data table contains the user behavior data related to the churn features constructed around the four dimensions 'personal portrait', 'purchasing power', 'participation degree' and 'loyalty'.

The preprocessing module deletes user feature records containing large outliers or obviously unreasonable values, deletes invalid user feature records and noise data, fills null numerical values according to why the user feature is null so that the filled value has a real physical meaning, standardizes the data and stores them in the database. It then constructs user behavior features of multiple dimensions according to the causes of user churn, determines the final user features using the Pearson correlation coefficient and the chi-square test, selects the features more effective for churn prediction, removes redundant or strongly interfering features, and stores the result in the database to form the required preprocessed user behavior feature data set.

The dynamic and static feature processing module classifies the user behavior features into dynamic and static features, and the dynamic features are divided into data of the required shape according to the specified time granularity. The behavior data of a user thus finally consist of static feature data and dynamic feature data, which are stored in the database to form the required user behavior feature data set. This data set is randomly divided into training samples and test samples according to a given proportion.

The churn prediction module comprises three sub-modules: user data reading, the churn algorithm model and prediction result output. Before churn prediction is carried out, the operator uploads the log data of user behavior recorded by the e-commerce platform to be predicted. A user behavior data set containing the selected final churn features is then uploaded through the user data reading sub-module, the churn algorithm model performs churn prediction on the uploaded data set, and the predicted result can be printed or exported and saved in a designated local folder.

In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the computer program.

In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

Example 1:

Referring to FIG. 1, a customer churn prediction method based on deep learning includes the following steps:

Step 1, data source: acquire the log data of user behavior recorded by the e-commerce platform, extract feature attributes, and construct user behavior feature information. To this end, the different causes of user churn need to be analyzed in order to construct user churn features. Based on this idea, we propose to build the churn features of a user from the four dimensions "personal portrait", "purchasing power", "participation degree" and "loyalty". Different dimensions represent different relationships between the user and the platform. The churn features are constructed as follows:

personal portrait: user id, gender, registration duration and age;

purchasing power: last consumption, average consumption amount per unit, average consumption amount of commodities, total consumption amount and total consumption times;

participation degree: total login days, total login times and total commodity clicking times;

loyalty: last login, login frequency and total use duration.

Step 2, feature processing: process the personal information data acquired in step 1. The 16 acquired features are calculated from 5 different tables, which need to be joined on the primary key "user id", so that the merged table contains only a unique "user id" and the corresponding 15 churn features. For the merged table, the preprocessing steps are as follows:

(1) Delete user feature data with large outliers or obviously unreasonable values.

(2) Fill in null values. For the merged feature table, a user who logs in during the observation window does not necessarily purchase or click on a commodity, which can produce a large number of null values in the merged table. These null values need to be filled, and to ensure that the filled values have a real physical meaning, the null values in the two dimensions of purchasing power and engagement are set to 0, meaning that the user did not perform the activity. For example, if the user did not purchase any commodity in the observation window, the user order table contains no record of that user and the merged table holds a null value, which is filled with 0 to indicate zero consumption times. For the feature "last consumption", the null value cannot simply be set to 0, since 0 would mean that the user purchased a commodity on the observation point (June 31), which does not match the user's real behavior; for users with no consumption in the observation window, the "last consumption" attribute is set to -1.

(3) Delete noisy data. Users newly registered during the prediction window have no behavior logs in the observation window; they are noisy data and should be deleted so as not to affect the accuracy of the final model's churn prediction.

(4) Convert categorical variables to numerical form. The user's gender is a categorical variable; because it takes only the values male, female and unknown, and the number of categories is small, one-hot encoding is applied to it, generating three feature columns: sex_male, sex_female and sex_unknown.

(5) Delete invalid user feature data. In earlier versions of the APP, the "age" field of the personal information was not mandatory, so this feature is null for some earlier users, who account for 14.8% of all observed users. However, because "age" became a mandatory field for newly registered users in later versions, and in order to keep the personal portrait dimension of the churn features complete, users whose "age" feature is null are deleted.

(6) Standardize the data. The numerical features of the user are processed using the z-score function.
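The preprocessing steps (2), (4), (5) and (6) above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the implementation used in the invention: the two miniature tables, their column names and their values are hypothetical, and only two of the five source tables are imitated.

```python
import pandas as pd

# Hypothetical miniature tables; names and values are illustrative only.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "sex": ["male", "female", "unknown", "male"],
    "age": [25.0, 31.0, None, 40.0],
    "registration_months": [12, 3, 8, 20],
})
orders = pd.DataFrame({
    "user_id": [1, 4],
    "total_purchases": [3, 1],
    "days_since_last_purchase": [5, 40],
})

# Join on the primary key so that each user_id appears exactly once.
df = users.merge(orders, on="user_id", how="left")

# (2) Fill nulls with values that keep a physical meaning:
#     0 purchases for users without orders, -1 for "last consumption".
df["total_purchases"] = df["total_purchases"].fillna(0)
df["days_since_last_purchase"] = df["days_since_last_purchase"].fillna(-1)

# (4) One-hot encode the categorical gender column.
df = pd.get_dummies(df, columns=["sex"], prefix="sex", dtype=int)

# (5) Drop users whose "age" is null.
df = df.dropna(subset=["age"]).reset_index(drop=True)

# (6) z-score standardization of the numeric features.
numeric_cols = ["age", "registration_months", "total_purchases"]
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)
```

The "last consumption" column is deliberately left out of the standardized columns in this sketch so that the -1 marker stays recognizable.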

Multi-dimensional user behavior features are constructed according to the causes of user churn. To further verify whether these features are reasonable and to reduce task difficulty, the proposed features are screened. Irrelevant features can interfere with the task and degrade model performance, and different features should carry different meanings. If several features follow the same pattern of numerical change, they are strongly linearly correlated; such features need to be pruned so that only one or two of the correlated features are retained, which speeds up model training without losing key information. The linear relationship between two variables is measured using the Pearson coefficient, which is calculated as:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\operatorname{cov}(X, Y)$ is the covariance between the two variables and $\sigma_X \sigma_Y$ is the product of their standard deviations. By calculating the Pearson correlation coefficient, the pairwise linear relationships among the 15 variables can be clearly observed. Four pairs of variables show strong positive correlation: "total login days" and "total login times"; "average consumption amount per unit" and "average commodity consumption amount"; "total consumption times" and "total consumption amount"; and "total commodity clicking times" and "total use duration".
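As an illustration of the Pearson screening described above, the following sketch computes the coefficient from its definition and lists feature pairs whose absolute correlation exceeds a threshold. The threshold of 0.8, the feature names and the toy values are assumptions made for the example; the text does not state a cut-off.

```python
import numpy as np

def pearson(x, y):
    """Pearson coefficient: cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
    return cov / (x.std() * y.std())                # population standard deviations

def correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, rho) for pairs with |rho| above the assumed threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(features[a], features[b])
            if abs(rho) > threshold:
                pairs.append((a, b, round(float(rho), 3)))
    return pairs

# Hypothetical toy values for three features of five users.
features = {
    "total_login_days":  [1, 2, 3, 4, 5],
    "total_login_times": [2, 4, 6, 8, 11],
    "age":               [30, 22, 41, 25, 33],
}
pairs = correlated_pairs(features)
```

In this toy data only the two login features are flagged, so one of them would be a candidate for removal.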

To further explore the potential relationship between the user churn label and the non-negative churn features, the chi-square test is used. The chi-square test is a hypothesis test that judges whether the null hypothesis holds from the deviation between theoretical and actual values, and it helps determine whether two variables are independent or related. The chi-square statistic is calculated as follows:

$$\chi^2(X, Y) = \sum \frac{(A - T)^2}{T}$$

where X is the user feature, Y is the user churn label, A is the actual value (the observed frequency of a given feature value) and T is the theoretical value (the expected frequency). A higher chi-square score indicates a stronger correlation between the churn feature and the label, and a lower score indicates a weaker correlation. The SelectKBest function provided by the sklearn scientific toolkit is used to help calculate the chi-square test value. Sparse features such as the user's gender (sex_male, sex_female, sex_unknown) are deleted, and the feature "last consumption", which contains a negative value, is also deleted. The chi-square scores between the relevant features and the churn label are then obtained and combined with the Pearson correlation coefficients between features: among strongly correlated features, the redundant ones are deleted and one is retained. Accordingly, the 4 features with smaller scores, "average amount of consumption of the commodity", "total use time", "total amount of consumption" and "total number of days of login", are deleted, and the 9 features "total number of times of clicking the commodity", "total number of times of consumption amount", "total number of times of login", "average amount of consumption per unit", "total number of times of consumption", "registration time", "login frequency of login", and "age" are retained as the final user features.
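The chi-square scoring can be illustrated with a small NumPy sketch. It forms A as the per-class sum of each non-negative feature and T as the sum expected if the feature were independent of the label, which is one common way of applying the statistic to count-like features; whether this matches the exact computation used in the invention is an assumption. The toy matrix and the helper name `chi2_scores` are hypothetical. In practice the text names sklearn's SelectKBest for this step.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score of each non-negative feature column against the label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one indicator column per class
    observed = Y.T @ X                                   # A: feature total per class
    class_prob = Y.mean(axis=0)                          # share of samples in each class
    expected = np.outer(class_prob, X.sum(axis=0))       # T: total expected under independence
    return ((observed - expected) ** 2 / expected).sum(axis=0)

# Hypothetical data: column 0 differs strongly between classes, column 1 does not.
X = [[10, 1],
     [8, 1],
     [1, 1],
     [0, 1]]
y = [0, 0, 1, 1]          # 1 = churned, 0 = retained (assumed encoding)

scores = chi2_scores(X, y)
top_k = np.argsort(-scores)[:1]   # indices of the k = 1 best-scoring features
```

The sketch assumes every feature column has a positive total; a column of zeros would need to be dropped first because T would be zero.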

Different features are constructed in different ways. One type of feature, such as "total login days", is obtained by counting over a certain period of time: "total login days" is obtained by accumulating the number of days on which the user logged in within a given segment of the observation window. Another type, such as "registration duration", can only be obtained from the user's use of the APP as a whole, and its value cannot be obtained by counting within a certain period. Both types of features reflect how the user uses the platform, but to further reflect the trend of user activity over time, the two types need to be distinguished.

Therefore, the invention divides the user's behavior features into two types: user dynamic features and user static features. The user dynamic features reflect changes in user interest and represent how the user's use of the platform changes over time. The user static features reflect the user's global behavior habits and represent the user's overall use of the platform within the observation window. The dynamic and static features of the user are divided as follows:

User dynamic features: total consumption amount, total login days, login frequency, total commodity clicking times and total use duration.

User static features: registration duration, last consumption, average consumption amount per unit, average commodity consumption amount, total consumption times and total login times.

The user static features reflect the user's overall use of the platform within the observation window. For the registration duration and the last consumption, the most recent date on which the user registered or placed an order within the observation window must be obtained, and the difference between that date and the observation point date is computed in months or days; this value cannot be divided according to the time granularity. The average consumption amount per unit and the average commodity consumption amount reflect the user's average consumption on the platform within the observation window. In addition, the total consumption times is classified as a static feature because the dynamic feature of total consumption amount already reflects the dynamic change of the user's purchasing behavior; to avoid redundant consumption information among the dynamic features, the total consumption times, one of the user's purchasing power features, is classified as a static feature, with the aim of examining the overall number of purchases the user made on the platform within the observation window.

The user's dynamic behavior activities are divided according to a specified time granularity to obtain data on how the user's behavior changes over time, so that the relationship between the user's future churn and historical behavior can be mined further. In the invention, the 6 dynamic features are divided with the specified time granularity t = 7, so that they are reconstructed from the original 6 × 1 dimensions into 6 × 13 dimensions, while the 7 × 1 dimensions of the user's static features are retained. For each user u, the daily dynamic behavior features are counted, and the behavior time features of the user are expressed as a matrix of size d × B, where t denotes the time granularity, B denotes the number of time segments (B = 13), and d denotes the number of different behavioral activities of the user (d = 6, corresponding to the six user dynamic features). The static behavior features of the user are expressed as a vector of length 7, the number of static features of the user.

These together form the required user behavior feature data set, which is divided proportionally into training samples and test samples. The experiments use the original data set, with 70% as the training set and 30% as the test set.
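The division by time granularity and the 70/30 split can be sketched as follows. The sketch assumes that each dynamic feature is available as one value per day over a 90-day window and that a segment is summarized by summing its days; the aggregation rule, the zero padding of the last (6-day) segment, the random seed and the helper names are assumptions, not details given in the text.

```python
import numpy as np

def segment_daily(daily, t=7):
    """Aggregate a (d, n_days) daily matrix into (d, B) segments of t days each."""
    daily = np.asarray(daily, dtype=float)
    d, n_days = daily.shape
    n_seg = -(-n_days // t)                 # ceiling division: 90 days -> 13 segments
    padded = np.zeros((d, n_seg * t))       # pad the incomplete last segment with zeros
    padded[:, :n_days] = daily
    return padded.reshape(d, n_seg, t).sum(axis=2)

def split_indices(n, train_frac=0.7, seed=0):
    """Shuffle sample indices and split them into training and test parts."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    cut = int(round(n * train_frac))
    return perm[:cut], perm[cut:]

# Hypothetical user: 6 dynamic features, each with value 1 on all 90 days.
daily = np.ones((6, 90))
weekly = segment_daily(daily, t=7)          # shape (6, 13)

train_idx, test_idx = split_indices(100)    # 70 training and 30 test samples
```

With t = 7 and 90 days, the first twelve segments cover 7 days each and the thirteenth covers the remaining 6 days.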

Step 3, model construction: construct the extreme gradient boosting tree and long short-term memory gated convolutional neural network prediction model (XGB-LGCNN). The XGB-LGCNN prediction model mainly consists of a long short-term memory gated convolutional neural network model (LGCNN) and an extreme gradient boosting tree (XGBoost); the specific structure is shown in fig. 3. The LGCNN model mainly consists of a long short-term memory neural network (LSTM) layer, a gating mechanism, a convolutional neural network (CNN) layer, and a final fully connected layer. XGBoost is a variant of the gradient boosting decision tree (GBDT) and has achieved excellent performance in various data competitions.

A long short-term memory neural network (LSTM) is able to handle long-distance dependency problems. In this task, the LSTM mines the temporal changes in user behavior from the behavior features of the user's platform usage within the observation window. Through its network structure, the LSTM realizes long- and short-term memory and retains over the long term the time-series information that is valuable in the user's dynamic features. The internal network structure of the LSTM contains three gates (an input gate, an output gate and a forget gate) and an internal memory cell. At time t, the update formulas of the different gates inside the LSTM are as follows:

$$H_t = o_t \odot \tanh(c_t) \tag{6}$$

Formulas (1), (2) and (3) respectively represent the updates of the input gate, the forget gate and the output gate at time t. W, U and b are learned parameters, the activation function is the sigmoid function, and the input at time t represents the dynamic features of user u. Formulas (4) and (5) represent the update of the memory cell, and formula (6) gives the hidden state at time t. The hidden state at the last time step, $H_t \in \mathbb{R}^{d \times m}$, is taken as the overall context representation of the user's dynamic features.
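Formulas (1) to (5) are not legible in this text. Assuming the standard LSTM formulation, which agrees with the description of sigmoid gates, learned parameters W, U and b, and with formula (6), they would take the following form. The input symbol $x_t$, the candidate cell state $\tilde{c}_t$ and the subscripts on the parameters are notational assumptions.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i H_{t-1} + b_i) && (1)\\
f_t &= \sigma(W_f x_t + U_f H_{t-1} + b_f) && (2)\\
o_t &= \sigma(W_o x_t + U_o H_{t-1} + b_o) && (3)\\
\tilde{c}_t &= \tanh(W_c x_t + U_c H_{t-1} + b_c) && (4)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && (5)\\
H_t &= o_t \odot \tanh(c_t) && (6)
\end{aligned}
```

Here $\sigma$ is the sigmoid function and $\odot$ is the Hadamard product, so each gate scales its argument element by element with a value between 0 and 1.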

The gating mechanism works as follows. After the user's dynamic features pass through the LSTM, the important feature information is stored in the hidden state $H_t$. Different user features differ in how strongly they affect user churn. The way a tree model selects split features indirectly reflects a screening of user features, and user churn is directly related to certain features. In this part, a feature gating mechanism is proposed to capture the key features of user behavior among the dynamic features. Gated linear units have been used in language modeling, where the gating unit selects or discards the next word, which embodies exactly this idea of feature selection. For this task, the gating mechanism is defined as follows:

$$G = H_t \odot \sigma(H_t * W + b)$$

where $\odot$ denotes the Hadamard product, $W \in \mathbb{R}^{d \times m}$ and $b \in \mathbb{R}^{m}$ are model training parameters, and the activation function is the sigmoid function, whose output lies in [0, 1]; an output of 0 means discard and an output of 1 means keep. With the help of the gating mechanism, the more valuable feature encodings are retained.
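A minimal NumPy sketch of the gate $G = H_t \odot \sigma(H_t * W + b)$ is given below. It is written under the assumption that the product $H_t * W$ is an ordinary matrix product, which requires W to have shape m × m in this sketch rather than the d × m shape stated in the text; the sizes d = 6 and m = 4 and the random values are also hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_gate(H, W, b):
    """Gate each entry of H with a sigmoid weight between 0 (discard) and 1 (keep)."""
    return H * sigmoid(H @ W + b)   # element-wise product of H and its gate

rng = np.random.default_rng(0)
d, m = 6, 4                          # hypothetical sizes: 6 dynamic features, hidden width 4
H = rng.normal(size=(d, m))          # stands in for the LSTM hidden state H_t

# With zero parameters every gate equals sigmoid(0) = 0.5, so G is half of H.
G_half = feature_gate(H, np.zeros((m, m)), np.zeros(m))

# A strongly positive bias opens the gate: G is almost equal to H.
G_open = feature_gate(H, np.zeros((m, m)), np.full(m, 50.0))
```

During training W and b would be learned, so that the gate opens for informative features and closes for uninformative ones.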

The convolution operation of convolutional neural networks (CNN) has been very successful at capturing local features, and it has been applied successfully in various image recognition fields. The convolution operation of a CNN has a great advantage in processing grid-structured data similar in form to image data. Therefore, on data divided by time granularity, the convolution operation can better mine the user's latent temporal feature interactions from multiple perspectives.

The extraction of local features by a convolutional neural network (CNN) is mainly realized by a convolution filter, which "slides" over the data with a certain stride to encode behavior features of different granularities and summarizes the different feature combinations inside the convolution window. The filter is a two-dimensional matrix of size kh × kw. A filter $F \in \mathbb{R}^{kh \times kw}$ is applied to the features $G \in \mathbb{R}^{B \times L}$ screened by the gating mechanism, with the specific formula as follows:

where $G_{u,\{l:l+kh-1,\,k:k+kw-1\}}$ denotes the convolution region spanning rows l to l + kh - 1 and columns k to k + kw - 1. To further improve computational efficiency while retaining key features, the local features extracted by the convolution kernel are down-sampled using max pooling, calculated as shown in the following formula.

$$V'_u = \operatorname{maxpool}(V_u)$$
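The sliding-window convolution and the max pooling step can be sketched in NumPy as follows. The sketch omits any bias term and activation function, uses a stride of 1 and a 2 × 2 pooling window, and works on a small hypothetical 4 × 4 input; all of these are assumptions, since the convolution formula itself is not legible in this text.

```python
import numpy as np

def conv2d_valid(G, F, stride=1):
    """Slide filter F over G and sum the element-wise products in each window."""
    kh, kw = F.shape
    rows = (G.shape[0] - kh) // stride + 1
    cols = (G.shape[1] - kw) // stride + 1
    V = np.empty((rows, cols))
    for l in range(rows):
        for k in range(cols):
            r, c = l * stride, k * stride
            # Window covers rows r..r+kh-1 and columns c..c+kw-1.
            V[l, k] = np.sum(G[r:r + kh, c:c + kw] * F)
    return V

def max_pool(V, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    rows, cols = V.shape[0] // size, V.shape[1] // size
    trimmed = V[:rows * size, :cols * size]
    return trimmed.reshape(rows, size, cols, size).max(axis=(1, 3))

G = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical gated features
F = np.ones((2, 2))                            # hypothetical 2 x 2 filter
V = conv2d_valid(G, F)                         # 3 x 3 local feature map
V_pooled = max_pool(V, size=2)                 # 1 x 1 after pooling
```

In the model the filter values would be learned; the all-ones filter here simply sums each window so that the result is easy to check by hand.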

The long short-term memory gated convolutional neural network (LGCNN) model mainly consists of a long short-term memory neural network (LSTM) layer, a gating mechanism, a convolutional neural network (CNN) layer, and a final fully connected layer. The structure of the LGCNN model is shown in fig. 2.

The model input consists of the 6 user dynamic features, which are divided according to the time granularity t = 7 to obtain the input matrices. The input data is encoded by the LSTM layer, which captures the important temporal information of the features. The gating mechanism then selects among the data, retaining the valuable feature encoding information. The CNN convolution operation combines the different user dynamic features, and after the max pooling operation the user dynamic information encoding $V'_u$ is obtained. Finally, $V'_u$ is input into the fully connected layer Dense to obtain the user churn probability; the output dimension of the Dense layer is set to 2.

The probability formula is as follows:

$$\hat{y} = \sigma(W V'_u + b) \tag{7}$$

where $\sigma$ is the activation function, W and b are learned parameters, and $\hat{y}$ is the output probability value. The parameters of the different network layers of the LGCNN are shown in the following table.

The extreme gradient boosting tree (XGBoost) can only process structured data. In this task, the user's static features fit the characteristics of structured data. During training, a tree model partitions and selects among the data features, which is itself a process of feature transformation and construction and offers a new idea for transforming the static features. In an application at Facebook, the tree structure of GBDT was exploited: the original sample was converted into a high-dimensional sparse matrix through the output of the GBDT leaf nodes in order to predict click-through rate. This idea is used here in user churn prediction. There are 7 user static features, each of which intuitively represents a different behavior of the user while using the APP but cannot represent interactions between features. Such interactions can be captured effectively by a tree model: the tree model is trained by repeatedly splitting nodes, each node split represents a change in and interaction between attributes, and each path from the root node to a leaf node represents an interaction between different features. The tree model comprises two decision trees, its internal nodes represent partitions on different features, and it produces 7 leaf nodes in total. After the original feature X is input into the tree model, it falls into the fourth leaf node of the first tree and the first leaf node of the second tree. To represent which leaf nodes are reached, the result is encoded with the one-hot method as X', where X' = {0, 1,0}.
The original feature X is thus converted into X' by the tree model: through the splitting of the internal nodes of the tree model and the one-hot encoding, the interaction between the different attributes of the original feature is accomplished.
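The leaf-node encoding can be illustrated with a few lines of plain Python. The text only states that the two trees have 7 leaf nodes in total; the split into 4 leaves for the first tree and 3 for the second is an assumption made for this example, as is the helper name `encode_leaves`.

```python
def encode_leaves(leaf_positions, leaves_per_tree):
    """Concatenate one one-hot block per tree, marking the leaf the sample fell into."""
    encoded = []
    for pos, n_leaves in zip(leaf_positions, leaves_per_tree):
        block = [0] * n_leaves
        block[pos] = 1            # zero-based index of the reached leaf
        encoded.extend(block)
    return encoded

# Sample reaches the fourth leaf of tree 1 (index 3) and the first leaf of tree 2 (index 0).
x_encoded = encode_leaves(leaf_positions=[3, 0], leaves_per_tree=[4, 3])
```

Under this assumed split the sample is represented by a 7-dimensional sparse vector with exactly one active position per tree.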

The extreme gradient boosting tree (XGBoost) is a variant of the gradient boosting decision tree (GBDT) and has achieved excellent performance in various data competitions. The GBDT algorithm trains each new tree by fitting the residual between the prediction of the previous trees and the actual value; the leaf nodes of each tree carry corresponding scores, and the prediction of the whole model is obtained by accumulating these scores. The XGBoost algorithm uses a second-order Taylor expansion to optimize the loss function, which improves the training speed and prediction precision of the model. It also adds a regularization term to the objective function to prevent overfitting, and it supports missing values in the samples. In view of the good performance of XGBoost, new features are constructed using the XGBoost algorithm. XGBoost selects the feature on which to split a node by calculating the gain produced by splitting on the different features. To reduce the computational load caused by the high-dimensional sparse matrix, the hyper-parameters of XGBoost are set to learning_rate = 0.04, n_estimators = 5 and max_depth = 4. For the entire XGBoost model, the final number of leaf nodes of the tree model is 80.
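The figure of 80 leaf nodes is consistent with the stated hyper-parameters if every tree is assumed to be a binary tree grown fully to its maximum depth, as the short calculation below shows. The helper name `max_leaf_count` is hypothetical, and trees that stop splitting early would have fewer leaves than this upper bound.

```python
# Hyper-parameters as stated in the text.
params = {"learning_rate": 0.04, "n_estimators": 5, "max_depth": 4}

def max_leaf_count(n_estimators, max_depth):
    """Upper bound on leaves: each binary tree of depth max_depth has at most 2**max_depth leaves."""
    return n_estimators * 2 ** max_depth

total_leaves = max_leaf_count(params["n_estimators"], params["max_depth"])  # 5 * 16
active_per_sample = params["n_estimators"]   # each sample reaches exactly one leaf per tree
```

Each user is therefore encoded as an 80-dimensional one-hot style vector with 5 active positions, which keeps the sparse matrix small.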

The prediction of user churn by the extreme gradient boosting tree and long short-term memory gated convolutional neural network prediction model (XGB-LGCNN) is determined jointly by the extreme gradient boosting tree (XGBoost) and the long short-term memory gated convolutional neural network model (LGCNN). It is worth noting that XGBoost cannot be trained in parallel with the LGCNN model, because the static features need to obtain their final leaf node information from an already trained XGBoost model. Therefore, in the XGB-LGCNN model, the XGBoost model is a pre-trained model, and the leaf node information it generates is input as new features into the fully connected layer Dense2 to take part in the training of the whole model. The output dimension of Dense2 is set to 2.

The user's static features are input into the XGBoost model to construct new interaction features, and the sparse matrix obtained from the tree model is further processed by a fully connected layer, Dense2, to obtain a denser matrix x. This is concatenated with the user's latent temporal information $V'_u$, obtained by encoding the user's dynamic features with the LGCNN, as shown in the following formula:

$$e = [x, V'_u]$$

where $[\cdot,\cdot]$ denotes the vector concatenation operation. After e is input into Dense1, the churn probability is output in the same way as in formula (7).
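The fusion of the static and dynamic branches can be sketched in NumPy as a forward pass with random, untrained weights. The length 8 of the dynamic encoding, the choice of softmax as the output activation, the active leaf positions and all weight values are assumptions for illustration; only the 80-dimensional leaf vector, the output dimension 2 of Dense2 and the concatenation follow the text.

```python
import numpy as np

def dense(x, W, b):
    """Fully connected layer without activation."""
    return x @ W + b

def softmax(z):
    e = np.exp(z - z.max())      # subtract the maximum for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

# Static branch: 80-dimensional leaf encoding with one active leaf per tree (5 trees).
leaf_onehot = np.zeros(80)
leaf_onehot[[3, 20, 37, 50, 70]] = 1.0
W2, b2 = rng.normal(size=(80, 2)), np.zeros(2)      # Dense2: 80 -> 2
x = dense(leaf_onehot, W2, b2)

# Dynamic branch: stand-in for the LGCNN encoding V'_u (length 8 is hypothetical).
V_u = rng.normal(size=8)

# Concatenate both branches, e = [x, V'_u], and map to two class probabilities.
e = np.concatenate([x, V_u])
W1, b1 = rng.normal(size=(e.size, 2)), np.zeros(2)  # Dense1: 10 -> 2
probs = softmax(dense(e, W1, b1))
```

Because the weights are random, the probabilities carry no meaning here; the sketch only shows how the shapes fit together.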

The experimental parameters of the model are set as follows: the number of training epochs is epochs = 50, batch_size = 512, the optimizer is Adam, the loss function is the cross-entropy loss, the learning rate is set to 0.01, and L2 = $10^{-5}$. Finally, the best result on the test set is taken for each metric.

Step 4, model training: build the extreme gradient boosting tree and long short-term memory gated convolutional neural network prediction model (XGB-LGCNN) and train it with the training samples, using the final user churn features selected in step 2. The prediction accuracy and F1 value of the trained model are evaluated by comparing its predictions with the corresponding true labels in the test samples, and the trained XGB-LGCNN model with the higher accuracy is selected as the model for practical application.
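The two evaluation measures named in step 4 can be computed from predicted and true labels as in the following plain-Python sketch. The label encoding (1 for a churned user) and the toy label lists are assumptions for illustration.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the churn class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Hypothetical test labels and model predictions (1 = churned, 0 = retained).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

The F1 value balances precision and recall, which makes it a useful complement to accuracy when churned and retained users are not equally frequent.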

Example 2:

The acquired user information mainly includes "user id", "registration time length", "average consumption amount of commodity", "total consumption amount", "last login", "total usage time length", "total number of times of clicking commodity", "total number of times of consuming", "total number of times of logging in", "average consumption amount per unit", "total number of times of consuming", "login frequency" and "total number of days of logging in". The total consumption amount, total login days, login frequency, total commodity clicking times and total use duration are dynamic feature data which, in addition to being totaled over the whole observation period, need to be processed in groups. The observation period recorded in the data set is 90 days, which is divided into 13 groups with the specified time granularity of 7 days; because the table would otherwise be too long, only one group of data is shown for each. The data set covers different users of a certain e-commerce platform and contains 248416 user records in total.

The following table shows the top 10 data examples of the selected data set:

(user data sheet)

(continue: user data table)

The data is processed according to the data processing method in step 2. Numerical null values such as "total consumption times" are filled according to the circumstances of the null values and the real physical meaning of the filled values: if the user did not purchase any commodity in the observation window, the user order table contains no record of that user and the merged table holds a null value, which is filled with 0 to indicate zero consumption times. For the feature "last consumption", the null value cannot simply be set to 0, since 0 would mean that the user purchased a commodity on the observation point (June 31), which does not match the user's real behavior; for users with no consumption in the observation window, the "last consumption" attribute is set to -1. The filled customer information data set has characteristics consistent with the real data.

The dynamic and static features of the user are then separated from the preprocessed data according to the rules. The user's dynamic features are divided according to the time granularity, while the static features are processed separately, forming the required user behavior feature data set, which is divided proportionally into training samples and test samples. The training samples are used as input to the extreme gradient boosting tree and long short-term memory gated convolutional neural network prediction model (XGB-LGCNN), and the parameters are adjusted to optimize the model; verification gives a model accuracy of 73.42%, an F1 value of 77.44% and an AUC value of 72.64%. The other part is used as a validation set to confirm the accuracy of the model and to verify that the model is not overfitting.

When user churn is instead predicted using the same preprocessing method with the decision-tree-based AdaBoost model, trained and validated on the same data set, the accuracy is 72.80%, the F1 value is 76.00% and the AUC value is 72.19%. The scheme disclosed by the invention therefore performs better in predicting user churn.

An embodiment of the present invention also provides an electronic device. As shown in fig. 4, the electronic device may include a processor 901 and a memory 902, where the processor 901 and the memory 902 may be connected by a bus or in another manner; fig. 4 takes connection by a bus as an example.

Processor 901 may be a Central Processing Unit (CPU). Processor 901 may also be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.

The memory 902, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the method embodiments of the present invention. The processor 901 executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the methods in the above-described method embodiments.

The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more modules are stored in the memory 902, which when executed by the processor 901 performs the methods in the above-described method embodiments.

The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.

Those skilled in the art will appreciate that all or part of the processes in the methods of the embodiments described above can be implemented by hardware that is related to instructions of a computer program, and the program can be stored in any computer-readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.

The invention uses specific user information data and predicts the churn of e-commerce users with the self-built prediction model, so that a reasonable marketing strategy can be formulated from the resulting data, user churn can be reduced, churned users can be won back, and economic benefit can be increased. The method uses deep learning to predict the churn of e-commerce users and overcomes the low prediction accuracy, low speed and other shortcomings of traditional manual prediction methods.

In the deep-learning-based method for predicting the churn of e-commerce users, a dynamic and static feature fusion method is used to mine the hidden information contained in the relationships and changes of user interest and behavior habits, and thus to understand the user's latent intention. An XGB-LGCNN prediction model is also provided, which improves prediction precision by learning the spatial features and temporal order of the sample data. The data of e-commerce users is analyzed to learn which users will churn and to predict the customers who may churn in the future; on this basis, users about to churn and users likely to churn are extracted and classified, and the resulting data is used to make user retention decisions. The method performs churn prediction on users in a real e-commerce environment and builds user churn prediction models from the two perspectives of machine learning and deep learning, thereby providing guidance for e-commerce operators.

The above description is only a preferred example of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like of the present invention shall be included in the protection scope of the present invention.

Claims

1. A customer churn prediction method based on deep learning is characterized in that: the method comprises the following steps:

personal portrait: user id, gender, registration duration, age;

loyalty: last login, login frequency and total use duration;

2.1, deleting user feature data with large outliers or obviously unreasonable values;

2.2, filling numerical null values according to the condition of the user characteristic null values and the real physical meaning of the filling values;

2.3, deleting noise data so as to avoid influencing the accuracy of the loss prediction of the final model;

2.4, carrying out numerical processing on the category variables in the user characteristics;

2.5, deleting invalid user characteristic information data;

2.6, carrying out data standardization processing, and processing the numerical characteristics of the user by using a z-score function;

constructing user behavior features of multiple dimensions according to the causes of user churn, determining the final user features using the Pearson coefficient and the chi-square test, selecting the features that are more effective for user churn prediction, and removing features that are strongly redundant or interfering; in view of the different influences of changes in user interest and of the user's global behavior habits on churn prediction, providing a fusion strategy for user dynamic and static features; classifying the user behavior features into user dynamic features and user static features to form the required user behavior feature data set, and dividing the data set proportionally into training samples and test samples;

step 3, model construction: building an extreme gradient boosting tree and long short-term memory gated convolutional neural network prediction model, which consists of a long short-term memory gated convolutional neural network model and an extreme gradient boosting tree, wherein the long short-term memory gated convolutional neural network model consists of a long short-term memory neural network layer, a gating mechanism, a convolutional neural network layer and a final fully connected layer; the extreme gradient boosting tree is a variant of the gradient boosting decision tree; the long short-term memory gated convolutional neural network model is used to process the user's dynamic features, the extreme gradient boosting tree is used to process the user's static features, and better feature data is finally obtained through fusion in a fully connected layer;

step 4, model training: training the constructed extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model by using the final user churn features selected in step 2 and the training sample; evaluating the prediction accuracy and the F1 value of the trained model by comparing its prediction results with the corresponding true labels in the test sample; and selecting the trained extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model with the higher accuracy as the model for practical application;

step 5, user churn prediction: inputting the user data into the extreme gradient boosting tree-long short-term memory gated convolutional neural network prediction model trained in step 4 to obtain the churn probability of the user to be tested.
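The evaluation in step 4 compares predicted labels with the true labels of the test sample and reports accuracy and the F1 value. A minimal sketch of that comparison follows, assuming binary labels in which 1 means "churned" and a 0.5 decision threshold on the predicted probability; the function names and the threshold are illustrative assumptions, not taken from the claims.

```python
def to_labels(probabilities, threshold=0.5):
    """Turn predicted churn probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 means 'churned'."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical test-sample labels and model outputs, for illustration only.
true_labels = [1, 0, 1, 1, 0, 0]
predicted = to_labels([0.9, 0.2, 0.4, 0.7, 0.6, 0.1])
accuracy, f1 = accuracy_and_f1(true_labels, predicted)
```

Reporting F1 next to accuracy matters when churned users are a minority, because a model that predicts "no churn" for everyone can still reach a high accuracy.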

2. The deep learning based customer churn prediction method as claimed in claim 1, wherein: in the user information obtained in step 1, "total consumption amount", "total number of login days", "login frequency", "total number of commodity clicks", and "total duration of use" are dynamic feature data; in addition to the totals over the whole observation period, these data need to be processed in groups; the observation period recorded in the data set is 90 days, which is divided into 13 groups at the specified time granularity of 7 days.
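The grouping in this claim can be sketched as follows. The sketch assumes that a dynamic feature is available as one value per day and that the last group simply holds the remaining 6 days (90 = 12 × 7 + 6); the claim does not state how the partial group is handled, so that choice and the function name are assumptions.

```python
import math


def group_by_granularity(daily_values, granularity_days=7):
    """Sum a daily series into consecutive groups of `granularity_days` days.

    The final group may cover fewer days when the period is not an exact multiple.
    """
    if granularity_days <= 0:
        raise ValueError("granularity_days must be positive")
    n_groups = math.ceil(len(daily_values) / granularity_days)
    return [
        sum(daily_values[i * granularity_days:(i + 1) * granularity_days])
        for i in range(n_groups)
    ]


# Hypothetical user who logs in once a day for the whole 90-day observation period.
daily_logins = [1] * 90
weekly_logins = group_by_granularity(daily_logins)  # 13 groups
total_logins = sum(weekly_logins)                   # whole-period total is preserved
```

The 13 grouped values form the time sequence of one dynamic feature, while the whole-period total stays available as a separate feature.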

3. The customer churn prediction method based on deep learning of claim 1, wherein: because the finally selected features have a large influence on the prediction result, the final user features are determined by using the Pearson coefficient and the Chi-Square test, and according to the calculation results the features with the highest feature importance and the lowest mutual correlation are retained, so that the features more effective for user churn prediction are selected and the features with strong redundancy or interference are removed.
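One way to read this claim is sketched below: the Chi-Square statistic against the churn label serves as the importance score, and the Pearson coefficient between features detects redundancy. How the two measures are combined, the thresholds `min_score` and `max_corr`, and all feature names are assumptions made for illustration; the claim does not fix them.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def chi_square(feature, label):
    """Chi-Square statistic of independence between a discrete feature and the label."""
    n = len(feature)
    f_counts, l_counts, joint = {}, {}, {}
    for f, l in zip(feature, label):
        f_counts[f] = f_counts.get(f, 0) + 1
        l_counts[l] = l_counts.get(l, 0) + 1
        joint[(f, l)] = joint.get((f, l), 0) + 1
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n
            stat += (joint.get((f, l), 0) - expected) ** 2 / expected
    return stat


def select_features(features, label, min_score=1.0, max_corr=0.9):
    """Keep relevant features, skipping any that duplicate an already kept one."""
    scores = {name: chi_square(col, label) for name, col in features.items()}
    kept = []
    for name in sorted(features, key=lambda k: scores[k], reverse=True):
        if scores[name] < min_score:
            continue  # weak relation to the churn label
        if all(abs(pearson(features[name], features[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical, already numerically encoded features for eight users.
churned = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "login_level": [2, 2, 2, 1, 1, 0, 0, 0],
    "login_level_copy": [4, 4, 4, 2, 2, 0, 0, 0],  # rescaled duplicate
    "spend_level": [1, 1, 0, 1, 0, 0, 1, 0],
    "is_member": [1, 0, 1, 0, 1, 0, 1, 0],         # unrelated to the label
}
selected = select_features(features, churned)
```

In this toy data the rescaled duplicate is dropped as redundant and the membership flag is dropped as uninformative, which mirrors the two removal criteria named in the claim.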

4. The customer churn prediction method based on deep learning of claim 1, wherein: the data standardization processing in step 2.6 processes the numerical features of the user by using the z-score function, specifically as follows:

the z-score, also called the standard score, is obtained by dividing the difference between a value and the mean by the standard deviation; the standard score is the signed number of standard deviations by which the value of an observation or data point lies above the mean of the observed or measured values, and the formula is as follows:

$$z = \frac{X - \bar{X}}{s}$$

wherein $X$ is the original data, $\bar{X}$ is the mean, and $s$ is the standard deviation;

the z-score is also a method of data standardization; if the amount of data is large enough and the data are approximately normally distributed, about 68% of the standardized data lie between -1 and 1, about 95% lie between -2 and 2, and about 99.7% lie between -3 and 3; the z-score is suitable when the data distribution is too disordered, when the maximum and minimum values cannot be determined, or when the data contain too many outliers.
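The standardization described in this claim can be sketched as follows. The sketch assumes that s is the sample standard deviation (divisor n - 1); the claim does not say whether the sample or the population form is meant, and the feature values are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize a numeric feature: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumption)
    if s == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(x - mean) / s for x in values]


# Hypothetical "total consumption amount" values for five users.
total_spend = [120.0, 80.0, 100.0, 60.0, 140.0]
standardized = z_scores(total_spend)
```

After standardization the feature has mean 0 and standard deviation 1, so features measured in different units become comparable before they are fed to the model.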

5. The deep learning based customer churn prediction method as claimed in claim 1, wherein: the processed dynamic and static feature vectors of the user are input into different modules of the model respectively, so as to make more effective use of the feature extraction capability of each module; the LGCNN is used for processing the dynamic features of the user, the XGBoost is used for processing the static features of the user, and the two are finally fused through a fully connected layer to obtain better feature data, thereby improving the prediction effect and obtaining the final prediction result.
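The final fusion can be illustrated with a single fully connected unit over the concatenated branch outputs. This is only a sketch under stated assumptions: the dynamic branch is assumed to emit a small vector, the static branch a single score, and the weights, bias, and sigmoid output are hypothetical values chosen for illustration; the claim does not specify the layer size or the form of the XGBoost output.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(dynamic_vec, static_vec, weights, bias):
    """Concatenate both branch outputs and apply one fully connected unit."""
    fused = list(dynamic_vec) + list(static_vec)
    if len(fused) != len(weights):
        raise ValueError("one weight is required per fused input value")
    logit = sum(w * v for w, v in zip(weights, fused)) + bias
    return sigmoid(logit)


# Hypothetical branch outputs: three values from the dynamic (LGCNN) branch
# and one score from the static (XGBoost) branch.
dynamic_out = [0.2, -0.1, 0.4]
static_out = [0.7]
# Hypothetical parameters; in practice they would be learned during training.
churn_probability = fuse(dynamic_out, static_out,
                         weights=[0.5, -0.3, 0.8, 1.2], bias=-0.6)
```

Keeping the two branches separate until this point lets each module work on the kind of input it handles best, as the claim describes, and leaves the weighting between them to the fused layer.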

6. A customer churn prediction system based on deep learning, characterized in that the system comprises:

the dynamic and static feature processing module is used for classifying the user behavior features and dividing them into dynamic and static features of the user, the dynamic features being split into data of the required shape according to the specified time granularity; the behavior data of each user finally consists of static feature data and dynamic feature data and is stored in a database to form the required user behavior feature data set, which is randomly divided into a training sample and a test sample according to a proportion;