AU2020100702A4 - A Method of Prediction of Coupon Usage based on Xgboost - Google Patents

Info

Publication number
AU2020100702A4
Authority
AU
Australia
Prior art keywords
xgboost
coupons
data
predict
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020100702A
Inventor
Xu Chen
Yuxiang Fei
Nuoyi Li
Xiangbo Sun
Haopeng WANG
Yuan Xie
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fei Yuxiang Miss
Li Nuoyi Miss
Original Assignee
Fei Yuxiang Miss
Li Nuoyi Miss
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fei Yuxiang Miss, Li Nuoyi Miss
Priority to AU2020100702A
Application granted
Publication of AU2020100702A4
Legal status: Ceased
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates

Abstract

This invention is in the field of marketing and the retail industry, and serves as a new method to predict the probability of coupon usage, powered by the XGBoost algorithm. The invention aims to help merchants predict the number of coupons that users will use over a future period. It uses XGBoost as the training model, with source data comprising four basic items of information: user ID, merchant ID, coupon ID, and discount rate. With these algorithms and advantages, the program will eventually be able to predict the number of coupons users will use over time.

Figure 1 (flowchart): data set file; analyze data set file; determine characteristic parameters; input XGBoost model; train model; get XGBoost key parameters; use XGBoost model to predict.

Description

TITLE
A Method of Prediction of Coupon Usage based on Xgboost
FIELD OF THE INVENTION
This invention is in the field of marketing and the retail industry, and serves as a new method to predict the probability of coupon usage, powered by the XGBoost algorithm.
BACKGROUND OF THE INVENTION
With the development of science, technology, and the social economy, the era of big data has arrived. Older marketing strategies can no longer meet current needs, and big data is increasingly applied in various fields. In today's new retail era, companies and merchants that want to develop further must rely on big data to innovate and reform their marketing strategies. Retail marketing innovation is the right path to enhancing merchants' competitiveness. Businesses should adopt advanced technology, use it to drive marketing reform and innovation, meet the actual needs of customers as fully as possible, and exploit the functional advantages of big data.
Our solution uses a method that differs from conventional methods: XGBoost. This method greatly improves the efficiency of data processing because of the following innovations:
1. It designs and builds a highly scalable end-to-end tree boosting system.
2. It proposes a theoretically justified weighted quantile sketch to calculate the candidate split set.
3. It introduces a novel sparsity-aware algorithm for parallel tree learning, which gives missing values a default direction.
4. It proposes an effective cache-aware block structure for out-of-core tree learning, using the cache to speed up the lookup of indexed column data after sorting.
In addition, our solution uses multithreading and GPU acceleration to process data more efficiently.
SUMMARY OF THE INVENTION
The invention aims to help merchants predict the number of coupons that users will use over a future period. It uses XGBoost as the training model, with source data comprising four basic items of information (user ID, merchant ID, coupon ID, and discount rate) together with the date of receipt and the date of consumption. We first import the CSV files and packages into Python, then process the data, converting illegal values and null values into values the computer can recognize, and extract features from the data for further processing. The processed data is then fed into the XGBoost algorithm, which is trained to produce a model with trained parameters. Finally, the test set is used for testing, and the results are evaluated and analyzed.
2020100702 05 May 2020
The core of the program is XGBoost, whose core algorithm is to keep adding trees, each of which learns a new function that fits the residual left by the previous trees. After obtaining multiple trees, we predict the score of a sample: the sample falls to one leaf node in each decision tree, each leaf node carries a score, and the sum of the scores from all trees is the predicted value of the sample.
Before training, we deduplicate the data and replace illegal values. The remaining features are converted to DMatrix data, which forms the core input of the program. For optimization, we use xgb.cv to eliminate unnecessary parts so that the program becomes more concise and efficient, and then train on the optimized data with train. After training, we load the model to make predictions, and the predicted results are normalized so that they lie between 0 and 1.
We aim to make the predicted value as close as possible to the future true value, so there is also an evaluation step whose result is the AUC. The closer the AUC is to 1, the more reliable the program; if the AUC equals 0.5, the program has no credibility and is not usable. Finally, the result is written automatically as a file in the same directory.
The innovations of XGBoost used by the invention are:
1. A highly scalable end-to-end tree boosting system is designed and built.
2. A theoretically justified weighted quantile sketch is proposed to calculate the candidate split set.
3. A novel sparsity-aware algorithm is introduced for parallel tree learning, so that missing values have a default direction.
4. An effective cache-aware block structure for out-of-core tree learning is proposed, using the cache to speed up the lookup of indexed column data after sorting.
With these algorithms and advantages, the program will eventually be able to predict the number of coupons users will use over time.
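The summary says that prediction results are normalized to lie between 0 and 1 but does not name the method. The following is a minimal sketch that assumes min-max scaling; the function name `min_max_normalize` and the sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: there is no ordering to preserve, so return 0.5 for each.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw model outputs (assumption: any real-valued scores).
raw_scores = [-1.2, 0.3, 2.8, 0.3]
normalized = min_max_normalize(raw_scores)
print(normalized)
```

Because this scaling is strictly increasing, it keeps the ranking of the samples unchanged, so a rank-based metric such as AUC is not affected by it.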
DESCRIPTION OF DRAWING
Figure 1 shows the process of this invention, including the following steps:
1. Reading and pre-processing data
2. Feature extraction
3. Training the XGBoost model
4. Evaluating the XGBoost model and the results of the prediction
DESCRIPTION OF PREFERRED EMBODIMENT
4.1 The description of the main purpose
The present invention uses two provided data tables, a record of users' offline consumption and coupon collection and a record of users' online clicks, consumption, and coupon collection, combined with the users' O2O offline coupon usage forecast sample (all of the above data covers January 1 to June 30, 2016). The final task is to predict whether the coupons that users received in July 2016 are used.
4.2 Overview of the solution
Step One, Read the Data
Two tasks need to be completed in this step:
1. Read data from the original CSV files
2. Perform preprocessing on the original data
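The two tasks of Step One can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the column names, the `null` marker for missing values, the `YYYYMMDD` date format, and the sample rows are all hypothetical, and an in-memory string stands in for the original CSV file.

```python
import csv
import io

from datetime import datetime

# Hypothetical sample rows; the real files and their column names may differ.
SAMPLE_CSV = """user_id,merchant_id,coupon_id,discount_rate,date_received,date
1001,200,3001,0.9,20160105,20160110
1002,201,null,null,null,20160212
1003,200,3002,0.8,20160301,null
"""


def is_missing(text):
    """Treat empty strings and the literal 'null' as missing values."""
    return text is None or text.strip().lower() in ("", "null")


def parse_date(text):
    """Return a date for a YYYYMMDD string, or None when the value is missing."""
    return None if is_missing(text) else datetime.strptime(text.strip(), "%Y%m%d").date()


def parse_float(text):
    """Return a float, or None when the value is missing."""
    return None if is_missing(text) else float(text)


def load_records(handle):
    """Read rows from a CSV file object and convert missing values to None."""
    records = []
    for row in csv.DictReader(handle):
        records.append({
            "user_id": row["user_id"],
            "merchant_id": row["merchant_id"],
            "coupon_id": None if is_missing(row["coupon_id"]) else row["coupon_id"],
            "discount_rate": parse_float(row["discount_rate"]),
            "date_received": parse_date(row["date_received"]),
            "date": parse_date(row["date"]),
        })
    return records


records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), "records loaded")
```

To read from disk instead, the same `load_records` function could be given a file object opened with `open(path, newline="")`.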
Step Two, Feature Extraction
In this step, the two provided data tables (a record of users' offline consumption, coupon collection, and write-off, and a record of users' online clicks, consumption, coupon collection, and write-off) are combined to extract user-related features, merchant-related features, coupon features, user-merchant interaction features, and other features. The program then divides the data into a training set and a test set, and completes feature extraction and storage according to the corresponding data tables and samples.
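As an illustration of what extracting user-related features might look like, the sketch below aggregates a few per-user counts and a write-off rate from raw records. The record layout, the field order, and the feature names are assumptions made for this example, not the inventors' actual code.

```python
from collections import defaultdict

# Hypothetical records: (user_id, coupon_id or None, date_received or None, date_consumed or None)
rows = [
    ("u1", "c1", "20160105", "20160110"),  # coupon received and used
    ("u1", "c2", "20160201", None),        # coupon received but not used
    ("u1", None, None, "20160215"),        # ordinary purchase without a coupon
    ("u2", "c1", "20160301", None),        # coupon received but not used
]


def user_features(rows):
    """Aggregate simple per-user coupon statistics from raw records."""
    stats = defaultdict(lambda: {"received": 0, "used": 0, "ordinary": 0})
    for user_id, coupon_id, _received, consumed in rows:
        s = stats[user_id]
        if coupon_id is not None:
            s["received"] += 1
            if consumed is not None:
                s["used"] += 1
        elif consumed is not None:
            s["ordinary"] += 1

    features = {}
    for user_id, s in stats.items():
        features[user_id] = {
            "coupons_received": s["received"],
            "coupons_used": s["used"],
            "coupons_not_used": s["received"] - s["used"],
            # Write-off rate: share of received coupons that were actually used.
            "write_off_rate": s["used"] / s["received"] if s["received"] else 0.0,
            "ordinary_consumption": s["ordinary"],
            "total_consumption": s["used"] + s["ordinary"],
        }
    return features


features = user_features(rows)
print(features["u1"])
```

Merchant, coupon, and user-merchant features could be built the same way by changing the grouping key.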
Step Three, Model Training
After feature extraction is complete, the program reads and preprocesses the feature data. Once these feature data have been integrated, the program starts training the XGBoost model.
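The source does not list the training parameters it used. As a hedged sketch, a parameter dictionary for a binary "will the coupon be used" model might look like the following; the parameter names are standard XGBoost options, but every value here is an illustrative assumption.

```python
# Parameter names are standard XGBoost options; all values are illustrative guesses.
params = {
    "objective": "binary:logistic",  # predict a probability that the coupon is used
    "eval_metric": "auc",            # matches the AUC evaluation described in the text
    "max_depth": 5,                  # maximum depth of each tree
    "eta": 0.1,                      # learning rate (shrinkage applied to each new tree)
    "subsample": 0.8,                # fraction of rows sampled per tree
    "colsample_bytree": 0.8,         # fraction of features sampled per tree
    "nthread": 4,                    # number of CPU threads (multithreading)
    "tree_method": "hist",           # histogram-based split finding
}
num_boost_round = 200  # number of trees to add; an assumption, normally tuned

# With the xgboost package installed, these would typically be passed on as:
#   dtrain = xgboost.DMatrix(feature_matrix, label=labels)
#   booster = xgboost.train(params, dtrain, num_boost_round=num_boost_round)
print(sorted(params))
```

In practice the number of rounds and the tree parameters would be chosen by cross-validation rather than fixed in advance.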
Step Four, Evaluate
After completing all the steps above, the program evaluates the trained model and the prediction results, in order to improve the accuracy of the predictions and the practicality of the invention. AUC is used in the final evaluation. The closer the AUC is to 1.0, the more reliable the prediction method; when it equals 0.5, the reliability is lowest and the method has no application value.
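To make the AUC interpretation concrete, here is a minimal sketch that computes AUC directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The function name and the sample data are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0, 0]  # 1 = coupon used, 0 = coupon not used (hypothetical)
perfect = auc_score(labels, [0.9, 0.1, 0.8, 0.3, 0.2])
uninformative = auc_score(labels, [0.5, 0.5, 0.5, 0.5, 0.5])
mixed = auc_score(labels, [0.9, 0.1, 0.25, 0.3, 0.2])
print(perfect, uninformative, mixed)
```

A model that ranks every used coupon above every unused one scores 1.0, while a model that assigns the same score to everything scores 0.5, which is why 0.5 is treated as having no predictive value.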
4.3 The description of essential steps
4.3.1 Feature Extraction
Feature extraction is an important part of the whole process. Good feature extraction makes the model and its predictions more accurate, effectively improving the practicality of the entire program. In completing the feature extraction step, it is necessary not only to fully consider the importance of the data itself, but also to adequately analyze the relationships among the data. Based on this analysis, the program adds meaningful derived fields to the data. Considering the fields in data table 1 and data table 2, we can extract the following features:
User Features
Number of coupons consumed;
Number of coupons not consumed;
The ratio of the number of coupons used to the number of coupons not used;
Number of coupons received;
Coupon write-off rate;
Ordinary consumption;
Total consumption;
Proportion of users using coupons
Other Features
Number of all coupons received by users;
Number of specific coupons received by users;
Average time interval for users to receive coupons;
Number of coupons received by users for specific merchants;
Number of different merchants from which users received coupons;
Number of coupons received by users on a particular day;
Number of specific coupons received by users on a particular day
Online User Features
The number of user online operations;
The number of online clicks by users;
Online CTR;
The number of user online purchases;
Online purchase rate;
Number of online pickups;
User online pickup rate;
The number of times users do not consume online
Offline Coupon Features
Total number of coupons issued;
Total number of coupons used;
The rate of coupon usage;
The number of coupons not used;
The number of coupons issued on a particular day;
Coupon type (discount, or rebate = 1);
Number of different discount coupons received;
Number of different discount coupons used;
Number of different discount coupons not used
User-Merchant Features
Number of merchant coupons received by users;
The number of times a user does not write off a merchant coupon after receiving it;
Number of write-offs made by users after receiving merchant coupons;
User write-off rate after receiving merchant coupons;
The proportion of the number of times that users have not written off coupons for each merchant;
The number of times the user consumes in the store;
The number of times the user consumes in the store without using merchant coupons;
The number of coupons the user received at this store on a particular day
4.3.2 XGBoost Model
XGBoost is an open source machine learning project developed by Chen Tianqi and others. It has efficiently implemented the GBDT algorithm and made many improvements in algorithms and engineering. It is widely used in Kaggle competitions and many other machine learning competitions and has achieved good results. XGBoost is still essentially a GBDT, but strives to maximize speed and efficiency, so it is called X (Extreme) GBoosted.
Ensemble Learning and Boosting Ideas
Ensemble learning combines multiple weak supervised models in order to obtain a better and more comprehensive strong supervised model. The underlying idea is that even if one weak classifier makes a wrong prediction, the other weak classifiers can correct the mistake.
The Boosting method trains the base classifiers serially, so each base classifier depends on those before it. Its basic idea is to stack the base classifiers layer by layer, with each layer giving higher weight during training to the samples that the previous base classifier got wrong. At test time, the final result is obtained by weighting the results of the classifiers of each layer.
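The reweighting idea can be illustrated with one round of the classic AdaBoost update. This is a swapped-in example of the general boosting idea described above, not the procedure XGBoost itself uses (XGBoost is gradient-based); the function name and sample values are hypothetical.

```python
import math


def reweight(weights, correct):
    """One AdaBoost-style round: raise the weights of misclassified samples.

    weights: current sample weights, assumed to sum to 1.
    correct: booleans, True where the current base classifier was right.
    Returns the normalized new weights and the classifier's vote weight alpha.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < error < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - error) / error)
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]  # the last sample was misclassified
new_weights, alpha = reweight(weights, correct)
print(new_weights, alpha)
```

After the update, the misclassified sample carries half of the total weight, so the next base classifier is pushed to get it right; `alpha` is the weight given to this classifier's vote in the final combination.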
GBDT (Gradient Boosting Decision Tree)
GBDT is a model based on Boosting. The predicted value is the sum of the results of all weak classifiers, and each subsequent weak classifier fits the residual of the current prediction (the error between the predicted value and the true value; more generally, the negative gradient of the loss function). Each weak classifier is a tree.
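The residual-fitting loop can be sketched in a few lines. The example below is a deliberately simplified illustration: it assumes squared-error loss, a single numeric feature, and depth-one trees (stumps), and it omits the second-order gradients and regularization that XGBoost adds. All names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbdt(xs, ys, n_trees=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def gbdt_predict(model, x):
    """The prediction is the base value plus the scaled sum of all stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_gbdt(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
final_mse = sum((y - gbdt_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(baseline_mse, final_mse)
```

Each added stump reduces the training error left by the previous ones, which is the behavior the paragraph above describes.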
Major innovations
A highly scalable end-to-end tree boosting system is designed and built.
A theoretically justified weighted quantile sketch is proposed to calculate the candidate split set.
A novel sparsity-aware algorithm is introduced for parallel tree learning, which gives missing values a default direction.
An effective cache-aware block structure for out-of-core tree learning is proposed, using the cache to speed up the lookup of indexed column data after sorting.
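The idea behind the weighted quantile candidate set can be shown on a small in-memory list. This sketch computes the cut points exactly instead of using a streaming sketch, so it only illustrates the goal: choose split candidates so that each bucket holds roughly the same total weight. In XGBoost the weights are second-order gradient statistics; here they are made-up numbers, and the function name is hypothetical.

```python
def weighted_quantile_candidates(values, weights, n_buckets):
    """Pick up to n_buckets - 1 split candidates with roughly equal weight per bucket."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    step = total / n_buckets
    candidates = []
    cumulative = 0.0
    next_cut = step
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= next_cut and len(candidates) < n_buckets - 1:
            candidates.append(value)
            # Move the target past the weight already covered.
            while next_cut <= cumulative:
                next_cut += step
    return candidates


values = [1, 2, 3, 4, 5, 6, 7, 8]
uniform = weighted_quantile_candidates(values, [1, 1, 1, 1, 1, 1, 1, 1], n_buckets=4)
skewed = weighted_quantile_candidates(values, [4, 4, 1, 1, 1, 1, 2, 2], n_buckets=4)
print(uniform, skewed)
```

With equal weights the candidates are ordinary quantiles; when the weight is concentrated on small values, the candidates crowd toward that region, which is where finer splits matter most.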
Core Algorithm
1. Constantly add trees, and constantly perform feature splits to grow each tree. Each time a tree is added, it learns a new function f(x) to fit the residuals of the last prediction.
2. When we get k trees after training, we need to predict the score of a sample. According to the features of this sample, it falls to a corresponding leaf node in each tree, and each leaf node corresponds to a score.
3. Finally, the scores from each tree are added up to give the predicted value of the sample.
Our goal is to make the predicted value of the tree ensemble as close to the true value as possible while retaining the greatest generalization ability. As in GBDT, XGBoost accumulates the scores of multiple trees to obtain the final predicted score: at each iteration, a tree is added to the existing trees to fit the residual between the previous prediction and the true value.
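The prediction step described above (route a sample to one leaf per tree, then add up the leaf scores) can be sketched as follows. The two hand-written trees, their feature names, thresholds, and leaf scores are all hypothetical. Each split also carries a default direction for missing values, and the final sigmoid assumes a binary logistic objective.

```python
import math

# Hypothetical trees. Internal nodes hold a feature, a threshold, a default
# branch for missing values, and two children; leaves hold a score.
tree_a = {"feature": "discount_rate", "threshold": 0.85, "default": "left",
          "left": {"leaf": 0.4}, "right": {"leaf": -0.2}}
tree_b = {"feature": "coupons_used", "threshold": 1.5, "default": "right",
          "left": {"leaf": -0.3}, "right": {"leaf": 0.6}}
trees = [tree_a, tree_b]


def leaf_score(node, sample):
    """Walk from the root to a leaf and return that leaf's score."""
    while "leaf" not in node:
        value = sample.get(node["feature"])
        if value is None:
            branch = node["default"]  # missing value: follow the default direction
        else:
            branch = "left" if value < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def raw_margin(trees, sample):
    """Sum of the leaf scores from every tree."""
    return sum(leaf_score(tree, sample) for tree in trees)


def predict_probability(trees, sample):
    """Map the summed score to (0, 1) with the logistic function."""
    return 1.0 / (1.0 + math.exp(-raw_margin(trees, sample)))


complete = {"discount_rate": 0.8, "coupons_used": 3}
partial = {"coupons_used": 0}  # discount_rate is missing
print(raw_margin(trees, complete), predict_probability(trees, complete))
print(raw_margin(trees, partial), predict_probability(trees, partial))
```

The second sample shows the default direction at work: with no discount rate available, the first tree still returns a score by following its default branch.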

Claims (1)

  1. What We Claim is:
    1. A Method of Prediction of Coupon Usage based on Xgboost, characterized in that the project uses three core technologies:
    The first is XGBoost, a computing method that implements the GBDT method with some improvements; we use it to process the imported data and then make predictions. The second is multithreaded operation, the technique of executing multiple threads concurrently through software or hardware. A computer with hardware support for multithreading can execute more than one thread at the same time, thus improving overall performance; systems with this capability include symmetric multiprocessors, multi-core processors, and chip-level multiprocessing or simultaneous multithreading processors. In a program, these independently running program fragments are called threads, and the programming concept of using them is called multithreading. The third is feature extraction, a concept in computer vision and image processing: it refers to using a computer to extract image information and determine whether each image point belongs to an image feature. The result of feature extraction is that the points of the image are divided into different subsets, which often correspond to isolated points, continuous curves, or continuous regions. The quality of the features has a crucial effect on generalization performance.
AU2020100702A 2020-05-05 2020-05-05 A Method of Prediction of Coupon Usage based on Xgboost Ceased AU2020100702A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020100702A AU2020100702A4 (en) 2020-05-05 2020-05-05 A Method of Prediction of Coupon Usage based on Xgboost

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020100702A AU2020100702A4 (en) 2020-05-05 2020-05-05 A Method of Prediction of Coupon Usage based on Xgboost

Publications (1)

Publication Number Publication Date
AU2020100702A4 true AU2020100702A4 (en) 2020-06-11

Family

ID=70969027

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020100702A Ceased AU2020100702A4 (en) 2020-05-05 2020-05-05 A Method of Prediction of Coupon Usage based on Xgboost

Country Status (1)

Country Link
AU (1) AU2020100702A4 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815066A (en) * 2020-07-21 2020-10-23 上海数鸣人工智能科技有限公司 User click prediction method based on gradient lifting decision tree
CN111815066B (en) * 2020-07-21 2021-03-26 上海数鸣人工智能科技有限公司 User click prediction method based on gradient lifting decision tree
CN112161173A (en) * 2020-09-10 2021-01-01 国网河北省电力有限公司检修分公司 Power grid wiring parameter detection device and detection method
US11741489B2 (en) 2021-08-11 2023-08-29 International Business Machines Corporation AI enabled coupon code generation for improved user experience

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry