AU2020100702A4 - A Method of Prediction of Coupon Usage based on Xgboost - Google Patents
- Publication number
- AU2020100702A4
- Authority
- AU
- Australia
- Prior art keywords
- xgboost
- coupons
- data
- predict
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
Abstract
Abstract This invention is in the field of the marketing and retail industry and serves as a new method to predict the probability of coupon usage, powered by the XGBoost algorithm. The invention aims to help merchants predict the number of coupons that users will use over a period of time in the future. It uses XGBoost as the training model together with the previously introduced source data, including four basic pieces of information: user id, merchant id, coupon id and discount rate. With these algorithms and advantages, the program is ultimately able to predict the number of coupons users will use over time.
Figure 1 (flowchart): Analyze Data Set File → Determine Characteristic Parameters → Input Xgboost Model → Train Xgboost Model → Get Xgboost Key Parameters → Use Xgboost Model To Predict
Description
TITLE
A Method of Prediction of Coupon Usage based on Xgboost
FIELD OF THE INVENTION
This invention is in the field of the marketing and retail industry and serves as a new method to predict the probability of coupon usage, powered by the XGBoost algorithm.
BACKGROUND OF THE INVENTION
Nowadays, with the development of science, technology and the social economy, the era of big data has arrived. Older marketing strategies can no longer meet the needs of the current era, and big data is increasingly applied in various fields. In today's new retail era, if companies and merchants want to develop further, they must rely on big data to innovate and reform their marketing strategies. Retail marketing innovation is the correct choice and path to enhance merchants' competitiveness. In the era of big data, businesses should make use of advanced technology to promote marketing reform and innovation, satisfy the actual needs of customers as far as possible, and highlight the functional advantages of big data. In our solution, we use a method that differs from conventional approaches: XGBoost. This method greatly improves the efficiency of data processing because of several innovations over previous methods:
1. It designs and builds a highly scalable end-to-end tree boosting system.
2. A theoretically justified weighted quantile sketch is proposed to compute candidate split points.
3. A novel sparsity-aware algorithm is introduced for parallel tree learning, giving missing values a default direction.
4. An effective cache-aware block structure is proposed for out-of-core tree learning, using the cache to speed up lookups of the sorted, indexed column data.
In addition, our solution also uses multi-threading and GPU acceleration to process data more efficiently.
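The multi-threading and GPU acceleration mentioned above are exposed through XGBoost's standard training parameters. A minimal sketch follows; the parameter names come from the public XGBoost API, and the values are illustrative assumptions, not taken from the patent text:

```python
# Hypothetical training configuration illustrating the parallelism options.
params = {
    "objective": "binary:logistic",  # coupon used / not used
    "max_depth": 6,
    "eta": 0.1,
    "nthread": 8,                    # CPU multi-threading
    # "tree_method": "gpu_hist",     # uncomment on a GPU-enabled XGBoost build
}
print(sorted(params))
```

On a GPU build of XGBoost, setting `tree_method` to `gpu_hist` moves histogram construction onto the GPU; on CPU-only installs the `nthread` setting alone parallelises tree construction.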
SUMMARY OF THE INVENTION
The invention aims to help merchants predict the number of coupons that users will use over a period of time in the future. It uses XGBoost as the training model together with the previously introduced source data, including four basic pieces of information: user id, merchant id, coupon id and discount rate. The data also include the date of consumption and the date of coupon receipt. We first import the csv files and packages into Python, then process the data, converting illegal values and null values into data the computer can recognise, and extract characteristics from these data for further processing. The processed data are then fed into the XGBoost algorithm, which is trained to obtain a model with trained parameters. Finally, the test set is used for testing, and the results are evaluated and analysed. The core of this program is XGBoost, whose core algorithm is to keep adding trees, each learning a new function that fits the residual left by the previous trees. After obtaining multiple trees, we predict the score of a sample: the score of the leaf node the sample reaches in each decision tree, summed over all decision trees, is the predicted value of the sample. After building the trees, we deduplicate the data and replace the illegal values; the remaining features are converted into DMatrix data to form the core data set of the program. For optimisation we use xgb.cv to remove superfluous parts, making the program more concise and efficient, and then train the optimised data with xgb.train. After training we make the prediction by loading the model; once prediction is finished, the result labels are normalised so that the outputs lie between 0 and 1. Since we aim to make the predicted value as close as possible to the future true value, we also have an evaluation file whose result is the AUC: the closer the AUC is to 1, the more authentic the program; if it equals 0.5, the program has no credibility and is unusable. Finally, the result automatically appears as a file in the same directory. The innovations of this invention in using XGBoost are: 1. designing and building a highly scalable end-to-end tree boosting system; 2. proposing a theoretically justified weighted quantile sketch to compute candidate split points; 3. introducing a novel sparsity-aware algorithm for parallel tree learning, so that missing values have a default direction; 4. proposing an effective cache-aware block structure for out-of-core tree learning; 5. using the cache to accelerate lookups of the column data scrambled by sorting. With these algorithms and advantages, our program is ultimately able to predict the number of coupons users will use over time.
DESCRIPTION OF DRAWING
Figure 1 shows the process of this invention, including the following steps:
1. Reading and pre-processing the data; 2. Feature extraction;
3. Training the XGBoost model; 4. Evaluating the XGBoost model and the results of the prediction.
DESCRIPTION OF PREFERRED EMBODIMENT
4.1 The description of the main purpose
The present invention utilises two provided data tables----a record form of users' offline consumption and coupon collection, and a record form of users' online click/consumption and coupon collection----combined with the users' O2O offline coupon-usage forecast samples (the time interval of all the data above is from January 1st to June 30th, 2016). The final forecast is to check whether the coupons users received in July 2016 were used.
4.2 Overview of resolution
Step One, Read the Data
Two tasks need to be completed in this step:
1. Read data from the original csv files
2. Perform preprocessing on the original data
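The two tasks above can be sketched with pandas on a toy table. The column names (`User_id`, `Coupon_id`, `Discount_rate`, ...) follow the public O2O dataset convention and are illustrative assumptions, as is the `"150:20"` full-reduction voucher format:

```python
import pandas as pd
import numpy as np

# Small stand-in for the offline record table read from csv.
raw = pd.DataFrame({
    "User_id":       [1, 1, 2, 3],
    "Merchant_id":   [10, 10, 20, 30],
    "Coupon_id":     ["101", "null", "102", "103"],
    "Discount_rate": ["150:20", "null", "0.9", "30:5"],
    "Date_received": ["20160501", "null", "20160502", "20160503"],
    "Date":          ["20160510", "20160420", "null", "null"],
})

# Preprocessing: turn the literal string "null" into real NaN so the
# program can recognise missing values, as the summary describes.
df = raw.replace("null", np.nan)

def to_rate(x):
    """Convert a 'full:off' voucher like '150:20' into an equivalent rate."""
    if pd.isna(x):
        return np.nan
    if ":" in x:
        full, off = map(float, x.split(":"))
        return 1.0 - off / full
    return float(x)

df["rate"] = df["Discount_rate"].map(to_rate)
print(df[["Coupon_id", "rate"]])
```

In the real program the frame would come from `pd.read_csv` on the original files rather than an inline literal.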
Step Two , Feature Extraction
In this step, it is necessary to combine the two provided data forms----a record form of users' offline consumption and coupon collection and verification, and a record form of users' online click/consumption and coupon collection and cancellation----to extract user-related features, merchant-related features, coupon features, user-merchant interaction features and other features. After that, the program divides the data into two datasets, a test set and a training set, and completes feature extraction and storage according to the corresponding data tables and samples.
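A minimal sketch of extracting user-related features of the kind listed in section 4.3.1 (coupons consumed, coupons not consumed, write-off rate). The column names `User_id` and `used` are hypothetical stand-ins, not field names from the patent:

```python
import pandas as pd

# Toy records: one row per coupon received; "used" marks whether it was
# later written off (consumed).
records = pd.DataFrame({
    "User_id": [1, 1, 1, 2, 2],
    "used":    [1, 0, 1, 0, 0],
})

# Aggregate per user into the features described in section 4.3.1.
user_feats = records.groupby("User_id")["used"].agg(
    coupons_used="sum",
    coupons_received="count",
)
user_feats["coupons_not_used"] = (
    user_feats["coupons_received"] - user_feats["coupons_used"]
)
user_feats["write_off_rate"] = (
    user_feats["coupons_used"] / user_feats["coupons_received"]
)
print(user_feats)
```

The merchant, coupon and user-merchant feature groups follow the same pattern with different `groupby` keys (merchant id, coupon id, or the user-merchant pair).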
Step Three , Model Training
After completing the feature-extraction engineering, the program reads and preprocesses the feature data. After integrating these feature data, the program starts training the XGBoost model.
Step Four, Evaluate
After completing all the steps above, the program evaluates the trained model and the prediction results to improve the accuracy of the predictions and the practicality of the invention. AUC is used in the final evaluation: the closer the AUC is to 1.0, the more authentic the detection method; when it equals 0.5, the authenticity and the application value are the lowest.
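The AUC behaviour described above can be demonstrated with `sklearn.metrics.roc_auc_score` (a standard implementation used here for illustration, not the patent's own evaluation file):

```python
from sklearn.metrics import roc_auc_score

# AUC = 1.0 means a perfect ranking of positives above negatives;
# AUC = 0.5 means no discrimination at all, i.e. chance level.
y_true  = [0, 0, 1, 1]
perfect = [0.1, 0.2, 0.8, 0.9]   # ranks both positives above both negatives
random_ = [0.5, 0.5, 0.5, 0.5]   # identical scores: no discrimination

print(roc_auc_score(y_true, perfect))  # 1.0
print(roc_auc_score(y_true, random_))  # 0.5
```

Because AUC depends only on the ranking of scores, the 0-1 normalisation of the predictions mentioned in the summary does not change its value.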
4.3 The description of essential steps
4.3.1 Feature Extraction
Feature extraction is an important part of the whole process. Good feature extraction makes the model and its predictions more accurate, effectively improving the practicality of the entire program. When completing the feature-extraction step, it is necessary not only to fully consider the importance of the data themselves, but also to adequately analyse the relationships between them. Based on this analysis, the program makes meaningful additions to the data. Considering the fields in data table 1 and data table 2, we can extract the following features:
User Features
Number of coupons consumed;
Number of coupons not consumed;
The ratio of the number of coupons used to the number of coupons not used;
Number of coupons received;
Coupon write-off rate;
Ordinary consumption;
Total consumption;
Proportion of users using coupons
Other Features
Number of all coupons received by users;
Number of specific coupons received by users;
Average time interval for users to receive coupons;
Number of coupons received by users for specific merchants;
Number of different merchants from which users received coupons;
Number of coupons received by users on a particular day;
Number of specific coupons received by users on a particular day

Online User Features
The number of user online operations;
The number of user online clicks;
Online CTR;
The number of user online purchases;
Online purchase rate;
Number of online pickups;
User online pickup rate;
The number of times users do not consume online

Offline Coupon Features
Total number of coupons issued;
Total number of coupons used;
The rate of coupon usage;
The number of coupons not used;
The number of coupons issued on a particular day;
Coupon type (discount or rebate);
Number of different discount coupons received;
Number of different discount coupons used;
Number of different discount coupons not used

User-Merchant Features
Number of merchant coupons received by users;
The number of times a user does not write off after receiving a merchant's coupons;
Number of write-offs made by users after receiving merchant coupons;
User write-off rate after receiving merchant coupons;
The proportion of the number of times that users have not written off, per merchant;
The number of times the user consumes in the store;
The number of times the user consumes in the store without using merchant coupons;
The number of coupons the user received at this store on a particular day

4.3.2 XGBoost Model
XGBoost is an open-source machine learning project developed by Chen Tianqi and others. It efficiently implements the GBDT algorithm and adds many algorithmic and engineering improvements. It is widely used in Kaggle competitions and many other machine learning competitions and has achieved good results. XGBoost is still essentially a GBDT, but it strives to maximise speed and efficiency, hence the name X (Extreme) GBoosted.
Ensemble Learning and Boosting Ideas
Ensemble learning combines multiple weakly supervised models in order to obtain a better and more comprehensive strongly supervised model. The underlying idea is that even if one weak classifier makes a wrong prediction, the other weak classifiers can correct it.
The Boosting method trains the base classifiers serially, so there is a dependency between them. Its basic idea is to stack base classifiers layer by layer; during training, each layer gives a higher weight to the samples misclassified by the previous base classifier. At test time, the final result is obtained by weighting the results of the classifiers of each layer.
GBDT(Gradient Boosting Decision Tree)
GBDT is a model based on Boosting. It sums the outputs of all the weak classifiers to obtain the predicted value, and each next weak classifier fits the residual with respect to the current prediction (this residual is the error between the predicted value and the true value). Each weak classifier here is a tree.
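The residual-fitting idea can be demonstrated in miniature with shallow regression trees (an illustrative sketch of the GBDT principle on synthetic data, not the patent's implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0])

# GBDT in miniature: each new tree fits the residual of the current
# ensemble prediction, and the prediction is the sum of all trees.
pred = np.zeros_like(y)
trees, lr = [], 0.5
for _ in range(20):
    resid = y - pred                       # residual left by the trees so far
    t = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    pred += lr * t.predict(X)              # add the new tree's contribution
    trees.append(t)

print(np.mean((y - pred) ** 2))            # error shrinks as trees are added
```

Each iteration reduces the training error because the new tree explicitly targets whatever the current ensemble still gets wrong.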
Major innovations
Designed and built a highly scalable end-to-end tree boosting system.
Proposed a theoretically justified weighted quantile sketch to compute candidate split points.
Introduced a novel sparsity-aware algorithm for parallel tree learning, giving missing values a default direction.
Proposed an effective cache-aware block structure for out-of-core tree learning, using the cache to speed up lookups of the sorted, indexed column data.
Core Algorithm
1. Constantly add trees, performing feature splits to grow each tree. Each time a tree is added, a new function f(x) is learned to fit the residual of the last prediction.
2. When we obtain k trees after training and need to predict the score of a sample, the sample falls, according to its characteristics, to a corresponding leaf node in each tree, and each leaf node corresponds to a score.
3. Finally, we only need to add up the scores corresponding to each tree to obtain the predicted value of the sample.
Our goal is to make the predicted value of the tree ensemble as close to the true value as possible while having the greatest generalisation ability. Similar to the earlier GBDT routine, XGBoost accumulates the scores of multiple trees to obtain the final predicted score (at each iteration, a tree is added on top of the existing trees to fit the residual between the current prediction and the true value).
Claims (1)
- What We Claim is: 1. A Method of Prediction of Coupon Usage based on Xgboost, characterised in that the project uses three core technologies. The first is XGBoost, a computing method that implements the GBDT algorithm with several improvements; we use it to train on the imported data and then make predictions. The second is multi-threaded operation, the technique of executing multiple threads concurrently in software or hardware; a computer with hardware support for multithreading can execute more than one thread at the same time, improving overall performance. Systems with this capability include symmetric multiprocessors, chip-level multiprocessing and simultaneous multithreading processors; within a program, these independently running fragments are called threads, and the programming concept that uses them is called multithreading. The third is feature extraction, a concept from computer vision and image processing that refers to using a computer to extract information and determine whether each point belongs to a feature; the result of feature extraction is to divide the points into different subsets, which often correspond to isolated points, continuous curves or continuous regions. The quality of the extracted features has a crucial effect on generalisation performance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020100702A AU2020100702A4 (en) | 2020-05-05 | 2020-05-05 | A Method of Prediction of Coupon Usage based on Xgboost |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020100702A AU2020100702A4 (en) | 2020-05-05 | 2020-05-05 | A Method of Prediction of Coupon Usage based on Xgboost |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2020100702A4 true AU2020100702A4 (en) | 2020-06-11 |
Family
ID=70969027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2020100702A Ceased AU2020100702A4 (en) | 2020-05-05 | 2020-05-05 | A Method of Prediction of Coupon Usage based on Xgboost |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2020100702A4 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111815066A (en) * | 2020-07-21 | 2020-10-23 | 上海数鸣人工智能科技有限公司 | User click prediction method based on gradient lifting decision tree |
CN111815066B (en) * | 2020-07-21 | 2021-03-26 | 上海数鸣人工智能科技有限公司 | User click prediction method based on gradient lifting decision tree |
CN112161173A (en) * | 2020-09-10 | 2021-01-01 | 国网河北省电力有限公司检修分公司 | Power grid wiring parameter detection device and detection method |
US11741489B2 (en) | 2021-08-11 | 2023-08-29 | International Business Machines Corporation | AI enabled coupon code generation for improved user experience |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |