AU2020100702A4 - A Method of Prediction of Coupon Usage based on Xgboost - Google Patents
- Publication number
- AU2020100702A4
- Authority
- AU
- Australia
- Prior art keywords
- xgboost
- coupons
- data
- predict
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
Abstract
Abstract This invention is in the field of the marketing and retail industry and serves as a new method to predict the probability of coupon usage, powered by the XGBoost algorithm. The invention aims to help merchants predict the number of coupons that users will use over a period of time in the future. It uses XGBoost as the training model together with the previously introduced source data, including four basic pieces of information: user id, merchant id, coupon id and discount rate. With these algorithms and advantages, the program is ultimately able to predict the number of coupons users will use over time.
Figure 1 (flowchart): Analyze Data Set File → Determine Characteristic Parameters → Input Xgboost Model → Train Xgboost Model → Get Xgboost Key Parameters → Use Xgboost Model To Predict
Description
TITLE
A Method of Prediction of Coupon Usage based on Xgboost
FIELD OF THE INVENTION
This invention is in the field of the marketing and retail industry and serves as a new method to predict the probability of coupon usage, powered by the XGBoost algorithm.
BACKGROUND OF THE INVENTION
Nowadays, with the development of science, technology and the social economy, the era of big data has arrived. Older marketing strategies can no longer meet the needs of the current era, and big data is increasingly applied in various fields. In today's new retail era, if companies and merchants want to develop further, they must rely on big data to innovate and reform their marketing strategies. Retail marketing innovation is the correct choice and path to enhance merchants' competitiveness. In the era of big data, businesses should make use of advanced technology to promote marketing reform and innovation, satisfy the actual needs of customers as far as possible, and highlight the functional advantages of big data. In our solution, we use a method that differs from conventional approaches: XGBoost. This method greatly improves the efficiency of data processing because of several innovations over previous methods:
1. It designs and builds a highly scalable end-to-end tree boosting system.
2. A theoretically justified weighted quantile sketch is proposed to compute candidate split points.
3. A novel sparsity-aware algorithm is introduced for parallel tree learning, giving missing values a default direction.
4. An effective cache-aware block structure is proposed for out-of-core tree learning, using the cache to speed up lookups of the sorted, indexed column data.
In addition, our solution also uses multi-threading and GPU acceleration to process data more efficiently.
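The multi-threading and GPU acceleration mentioned above are exposed through XGBoost's standard training parameters. A minimal sketch follows; the parameter names come from the public XGBoost API, and the values are illustrative assumptions, not taken from the patent text:

```python
# Hypothetical training configuration illustrating the parallelism options.
params = {
    "objective": "binary:logistic",  # coupon used / not used
    "max_depth": 6,
    "eta": 0.1,
    "nthread": 8,                    # CPU multi-threading
    # "tree_method": "gpu_hist",     # uncomment on a GPU-enabled XGBoost build
}
print(sorted(params))
```

On a GPU build of XGBoost, setting `tree_method` to `gpu_hist` moves histogram construction onto the GPU; on CPU-only installs the `nthread` setting alone parallelises tree construction.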
SUMMARY OF THE INVENTION
The invention aims to help merchants predict the number of coupons that users will use over a period of time in the future. It uses XGBoost as the training model together with the previously introduced source data, including four basic pieces of information: user id, merchant id, coupon id and discount rate. The data also include the date of consumption and the date of coupon receipt. We first import the csv files and packages into Python, then process the data, converting illegal values and null values into data the computer can recognise, and extract characteristics from these data for further processing. The processed data are then fed into the XGBoost algorithm, which is trained to obtain a model with trained parameters. Finally, the test set is used for testing, and the results are evaluated and analysed. The core of this program is XGBoost, whose core algorithm is to keep adding trees, each learning a new function that fits the residual left by the previous trees. After obtaining multiple trees, we predict the score of a sample: the score of the leaf node the sample reaches in each decision tree, summed over all decision trees, is the predicted value of the sample. After building the trees, we deduplicate the data and replace the illegal values; the remaining features are converted into DMatrix data to form the core data set of the program. For optimisation we use xgb.cv to remove superfluous parts, making the program more concise and efficient, and then train the optimised data with xgb.train. After training we make the prediction by loading the model; once prediction is finished, the result labels are normalised so that the outputs lie between 0 and 1. Since we aim to make the predicted value as close as possible to the future true value, we also have an evaluation file whose result is the AUC: the closer the AUC is to 1, the more authentic the program; if it equals 0.5, the program has no credibility and is unusable. Finally, the result automatically appears as a file in the same directory. The innovations of this invention in using XGBoost are: 1. designing and building a highly scalable end-to-end tree boosting system; 2. proposing a theoretically justified weighted quantile sketch to compute candidate split points; 3. introducing a novel sparsity-aware algorithm for parallel tree learning, so that missing values have a default direction; 4. proposing an effective cache-aware block structure for out-of-core tree learning; 5. using the cache to accelerate lookups of the column data scrambled by sorting. With these algorithms and advantages, our program is ultimately able to predict the number of coupons users will use over time.
DESCRIPTION OF DRAWING
Figure 1 shows the process of this invention, including the following steps:
1. Reading and pre-processing the data; 2. Feature extraction;
3. Training the XGBoost model; 4. Evaluating the XGBoost model and the results of the prediction.
DESCRIPTION OF PREFERRED EMBODIMENT
4.1 The description of the main purpose
The present invention utilises two provided data tables----a record form of users' offline consumption and coupon collection, and a record form of users' online click/consumption and coupon collection----combined with the users' O2O offline coupon-usage forecast samples (the time interval of all the data above is from January 1st to June 30th, 2016). The final forecast is to check whether the coupons users received in July 2016 were used.
4.2 Overview of resolution
Step One, Read the Data
Two tasks need to be completed in this step:
1. Read data from the original csv files
2. Perform preprocessing on the original data
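The two tasks above can be sketched with pandas on a toy table. The column names (`User_id`, `Coupon_id`, `Discount_rate`, ...) follow the public O2O dataset convention and are illustrative assumptions, as is the `"150:20"` full-reduction voucher format:

```python
import pandas as pd
import numpy as np

# Small stand-in for the offline record table read from csv.
raw = pd.DataFrame({
    "User_id":       [1, 1, 2, 3],
    "Merchant_id":   [10, 10, 20, 30],
    "Coupon_id":     ["101", "null", "102", "103"],
    "Discount_rate": ["150:20", "null", "0.9", "30:5"],
    "Date_received": ["20160501", "null", "20160502", "20160503"],
    "Date":          ["20160510", "20160420", "null", "null"],
})

# Preprocessing: turn the literal string "null" into real NaN so the
# program can recognise missing values, as the summary describes.
df = raw.replace("null", np.nan)

def to_rate(x):
    """Convert a 'full:off' voucher like '150:20' into an equivalent rate."""
    if pd.isna(x):
        return np.nan
    if ":" in x:
        full, off = map(float, x.split(":"))
        return 1.0 - off / full
    return float(x)

df["rate"] = df["Discount_rate"].map(to_rate)
print(df[["Coupon_id", "rate"]])
```

In the real program the frame would come from `pd.read_csv` on the original files rather than an inline literal.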
Step Two , Feature Extraction
In this step, it is necessary to combine the two provided data forms----a record form of users' offline consumption and coupon collection and verification, and a record form of users' online click/consumption and coupon collection and cancellation----to extract user-related features, merchant-related features, coupon features, user-merchant interaction features and other features. After that, the program divides the data into two datasets, a test set and a training set, and completes feature extraction and storage according to the corresponding data tables and samples.
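A minimal sketch of extracting user-related features of the kind listed in section 4.3.1 (coupons consumed, coupons not consumed, write-off rate). The column names `User_id` and `used` are hypothetical stand-ins, not field names from the patent:

```python
import pandas as pd

# Toy records: one row per coupon received; "used" marks whether it was
# later written off (consumed).
records = pd.DataFrame({
    "User_id": [1, 1, 1, 2, 2],
    "used":    [1, 0, 1, 0, 0],
})

# Aggregate per user into the features described in section 4.3.1.
user_feats = records.groupby("User_id")["used"].agg(
    coupons_used="sum",
    coupons_received="count",
)
user_feats["coupons_not_used"] = (
    user_feats["coupons_received"] - user_feats["coupons_used"]
)
user_feats["write_off_rate"] = (
    user_feats["coupons_used"] / user_feats["coupons_received"]
)
print(user_feats)
```

The merchant, coupon and user-merchant feature groups follow the same pattern with different `groupby` keys (merchant id, coupon id, or the user-merchant pair).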
Step Three , Model Training
After completing the feature-extraction engineering, the program reads and preprocesses the feature data. After integrating these feature data, the program starts training the XGBoost model.
Step Four, Evaluate
After completing all the steps above, the program evaluates the trained model and the prediction results to improve the accuracy of the predictions and the practicality of the invention. AUC is used in the final evaluation: the closer the AUC is to 1.0, the more authentic the detection method; when it equals 0.5, the authenticity and the application value are the lowest.
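The AUC behaviour described above can be demonstrated with `sklearn.metrics.roc_auc_score` (a standard implementation used here for illustration, not the patent's own evaluation file):

```python
from sklearn.metrics import roc_auc_score

# AUC = 1.0 means a perfect ranking of positives above negatives;
# AUC = 0.5 means no discrimination at all, i.e. chance level.
y_true  = [0, 0, 1, 1]
perfect = [0.1, 0.2, 0.8, 0.9]   # ranks both positives above both negatives
random_ = [0.5, 0.5, 0.5, 0.5]   # identical scores: no discrimination

print(roc_auc_score(y_true, perfect))  # 1.0
print(roc_auc_score(y_true, random_))  # 0.5
```

Because AUC depends only on the ranking of scores, the 0-1 normalisation of the predictions mentioned in the summary does not change its value.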
4.3 The description of essential steps
4.3.1 Feature Extraction
Feature extraction is an important part of the whole process. Good feature extraction makes the model and its predictions more accurate, effectively improving the practicality of the entire program. When completing the feature-extraction step, it is necessary not only to fully consider the importance of the data themselves, but also to adequately analyse the relationships between them. Based on this analysis, the program makes meaningful additions to the data. Considering the fields in data table 1 and data table 2, we can extract the following features:
User Features
Number of coupons consumed;
Number of coupons not consumed;
The ratio of the number of coupons used to the number of coupons not used;
Number of coupons received;
Coupon write-off rate;
Ordinary consumption;
Total consumption;
Proportion of users using coupons
Other Features
Number of all coupons received by users;
Number of specific coupons received by users;
Average time interval for users to receive coupons;
Number of coupons received by users for specific merchants;
Number of different merchants from which users received coupons;
Number of coupons received by users on a particular day;
Number of specific coupons received by users on a particular day

Online User Features
The number of user online operations;
The number of user online clicks;
Online CTR;
The number of user online purchases;
Online purchase rate;
Number of online pickups;
User online pickup rate;
The number of times users do not consume online

Offline Coupon Features
Total number of coupons issued;
Total number of coupons used;
The rate of coupon usage;
The number of coupons not used;
The number of coupons issued on a particular day;
Coupon type (discount or rebate);
Number of different discount coupons received;
Number of different discount coupons used;
Number of different discount coupons not used

User-Merchant Features
Number of merchant coupons received by users;
The number of times a user does not write off after receiving a merchant's coupons;
Number of write-offs made by users after receiving merchant coupons;
User write-off rate after receiving merchant coupons;
The proportion of the number of times that users have not written off, per merchant;
The number of times the user consumes in the store;
The number of times the user consumes in the store without using merchant coupons;
The number of coupons the user received at this store on a particular day

4.3.2 XGBoost Model
XGBoost is an open-source machine learning project developed by Chen Tianqi and others. It efficiently implements the GBDT algorithm and adds many algorithmic and engineering improvements. It is widely used in Kaggle competitions and many other machine learning competitions and has achieved good results. XGBoost is still essentially a GBDT, but it strives to maximise speed and efficiency, hence the name X (Extreme) GBoosted.
Ensemble Learning and Boosting Ideas
Ensemble learning combines multiple weakly supervised models in order to obtain a better and more comprehensive strongly supervised model. The underlying idea is that even if one weak classifier makes a wrong prediction, the other weak classifiers can correct it.
The Boosting method trains the base classifiers serially, so there is a dependency between them. Its basic idea is to stack base classifiers layer by layer; during training, each layer gives a higher weight to the samples misclassified by the previous base classifier. At test time, the final result is obtained by weighting the results of the classifiers of each layer.
GBDT(Gradient Boosting Decision Tree)
GBDT is a model based on Boosting. It sums the outputs of all the weak classifiers to obtain the predicted value, and each next weak classifier fits the residual with respect to the current prediction (this residual is the error between the predicted value and the true value). Each weak classifier here is a tree.
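The residual-fitting idea can be demonstrated in miniature with shallow regression trees (an illustrative sketch of the GBDT principle on synthetic data, not the patent's implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0])

# GBDT in miniature: each new tree fits the residual of the current
# ensemble prediction, and the prediction is the sum of all trees.
pred = np.zeros_like(y)
trees, lr = [], 0.5
for _ in range(20):
    resid = y - pred                       # residual left by the trees so far
    t = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    pred += lr * t.predict(X)              # add the new tree's contribution
    trees.append(t)

print(np.mean((y - pred) ** 2))            # error shrinks as trees are added
```

Each iteration reduces the training error because the new tree explicitly targets whatever the current ensemble still gets wrong.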
Major innovations
Designed and built a highly scalable end-to-end tree boosting system.
Proposed a theoretically justified weighted quantile sketch to compute candidate split points.
Introduced a novel sparsity-aware algorithm for parallel tree learning, giving missing values a default direction.
Proposed an effective cache-aware block structure for out-of-core tree learning, using the cache to speed up lookups of the sorted, indexed column data.
Core Algorithm
1. Constantly add trees, performing feature splits to grow each tree. Each time a tree is added, a new function f(x) is learned to fit the residual of the last prediction.
2. When we obtain k trees after training and need to predict the score of a sample, the sample falls, according to its characteristics, to a corresponding leaf node in each tree, and each leaf node corresponds to a score.
3. Finally, we only need to add up the scores corresponding to each tree to obtain the predicted value of the sample.
Our goal is to make the predicted value of the tree ensemble as close to the true value as possible while having the greatest generalisation ability. Similar to the earlier GBDT routine, XGBoost accumulates the scores of multiple trees to obtain the final predicted score (at each iteration, a tree is added on top of the existing trees to fit the residual between the current prediction and the true value).
Claims (1)
- What We Claim is: 1. A Method of Prediction of Coupon Usage based on Xgboost, characterised in that the project uses three core technologies. The first is XGBoost, a computing method that implements the GBDT algorithm with several improvements; we use it to train on the imported data and then make predictions. The second is multi-threaded operation, the technique of executing multiple threads concurrently in software or hardware; a computer with hardware support for multithreading can execute more than one thread at the same time, improving overall performance. Systems with this capability include symmetric multiprocessors, chip-level multiprocessing and simultaneous multithreading processors; within a program, these independently running fragments are called threads, and the programming concept that uses them is called multithreading. The third is feature extraction, a concept from computer vision and image processing that refers to using a computer to extract information and determine whether each point belongs to a feature; the result of feature extraction is to divide the points into different subsets, which often correspond to isolated points, continuous curves or continuous regions. The quality of the extracted features has a crucial effect on generalisation performance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020100702A AU2020100702A4 (en) | 2020-05-05 | 2020-05-05 | A Method of Prediction of Coupon Usage based on Xgboost |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020100702A AU2020100702A4 (en) | 2020-05-05 | 2020-05-05 | A Method of Prediction of Coupon Usage based on Xgboost |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2020100702A4 true AU2020100702A4 (en) | 2020-06-11 |
Family
ID=70969027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2020100702A Ceased AU2020100702A4 (en) | 2020-05-05 | 2020-05-05 | A Method of Prediction of Coupon Usage based on Xgboost |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2020100702A4 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111815066A (en) * | 2020-07-21 | 2020-10-23 | 上海数鸣人工智能科技有限公司 | User click prediction method based on gradient lifting decision tree |
CN111815066B (en) * | 2020-07-21 | 2021-03-26 | 上海数鸣人工智能科技有限公司 | User click prediction method based on gradient lifting decision tree |
CN112161173A (en) * | 2020-09-10 | 2021-01-01 | 国网河北省电力有限公司检修分公司 | Power grid wiring parameter detection device and detection method |
US11741489B2 (en) | 2021-08-11 | 2023-08-29 | International Business Machines Corporation | AI enabled coupon code generation for improved user experience |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |