CN107301562A - A big-data method for predicting O2O coupon usage
- Publication number
- CN107301562A CN201710341039.3A
- Authority
- CN
- China
- Prior art keywords
- user
- consumption
- data
- feature
- feature group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
- G06Q30/0211—Determining the effectiveness of discounts or incentives
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention claims a big-data method for predicting O2O coupon usage, comprising: 101, preprocessing the user's historical consumption dataset; 102, labeling the user's historical consumption dataset, then dividing it and building a training set and a prediction set; 103, performing feature engineering on the user's historical consumption dataset; 104, performing feature selection and handling imbalanced data; 105, performing multi-classifier ensemble learning on the resulting data; 106, using the built model to predict the user's coupon usage from the user's historical consumption data and to optimize the delivery of O2O coupons. The invention mainly processes customer consumption data and applies multi-classifier ensemble learning to it to build a prediction model, so as to predict users' future coupon usage and optimize the delivery of O2O coupons.
Description
Technical field
The invention belongs to the field of intelligent information processing and specifically relates to a big-data method for predicting O2O coupon usage.
Background art
With the improvement and spread of mobile devices, "mobile Internet plus" has brought every industry into a stage of rapid development, and O2O (Online to Offline) consumption attracts the most attention. The O2O industry naturally connects hundreds of millions of consumers, and apps of all kinds record more than ten billion user behavior records every day, which makes it one of the best meeting points of big-data research and commercial operation. Using coupons to reactivate existing users or to attract new customers into stores is an important O2O marketing method. Coupons delivered at random, however, are a meaningless disturbance to most users. For merchants, indiscriminately issued coupons may damage brand reputation and make marketing costs hard to estimate. With the arrival of the big-data era, mining data effectively to create commercial value has become an inevitable trend for enterprises. Techniques such as data mining and machine learning can be used to analyze consumer behavior and predict users' coupon usage. The common approach at present is to use a supervised classifier to build a model from information such as users' historical consumption data and coupon consumption, and to predict whether a user will consume after receiving a coupon, the probability of consumption, and so on. Most existing methods do not perform multi-classifier ensemble learning, or their ensemble learning works poorly, and some methods do not handle feature construction, feature selection, and imbalanced data appropriately. As a result, their prediction accuracy is unsatisfactory and they perform poorly on real datasets. The work done in the present invention on these aspects greatly improves the accuracy of big-data prediction of coupon usage and improves the practicality of the algorithm.
This patent proposes a big-data method for predicting O2O coupon usage that predicts users' consumption behavior. The work also covers preprocessing of the users' historical consumption dataset, data labeling, feature engineering, feature selection, and imbalanced-data handling, and predicts users' coupon usage through multi-classifier ensemble learning. Because different algorithms approach the result from different starting points, they can meet the needs of different users; a reasonable weighted combination of several machine learning algorithms captures the diversity of consumers' coupon usage more fully and predicts it more accurately. This allows personalized delivery, which can effectively raise the coupon redemption rate, lets consumers with particular preferences obtain real benefits, and gives merchants stronger marketing capability.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes a big-data method for predicting O2O coupon usage that provides merchants with a more efficient way of delivering coupons, lets consumers obtain real benefits, and gives merchants stronger marketing capability. The technical solution of the present invention is as follows:
A big-data method for predicting O2O coupon usage, comprising the following steps:
101. Obtain the user's historical consumption dataset and preprocess it;
102. Label the preprocessed historical consumption dataset of the user, then divide it and build a training set and a prediction set;
103. Divide the feature groups of the user's training set into three major categories: the label-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group;
104. Perform feature selection on the feature groups and handle the imbalanced data;
105. Perform multi-classifier ensemble learning on the data after feature selection and imbalanced-data handling;
106. Build the model, predict the user's coupon usage from the user's historical consumption data, and optimize the delivery of O2O coupons.
Further, the preprocessing of the user's historical consumption dataset in step 101 comprises the following steps:
S1011. Obtain the user's historical consumption data through the merchant platform;
S1012. Fill the missing values in the user's historical consumption data: in the raw data table missing values are the character string 'null', and they are uniformly converted to the NULL type;
S1013. Convert the distance information in the user's historical consumption data to the double type, and convert dates from character strings to the DateTime type;
S1014. Convert the discount rate: "spend x, save y" offers are converted into discount form; the conversion formula is:
Further, in step 102 the user's historical consumption dataset is labeled, divided, and used to build the training set and the prediction set, specifically:
S1021. Label the training set according to the training-set labeling principle: first select the records that have a coupon-receipt date, then label records in which the coupon was used for consumption within 15 days as 1 and all remaining records as 0;
S1022. According to the principle of corresponding online and offline data distributions, divide the behavior patterns in the original table into three classes: pure coupon-receipt behavior in the label month, consumption behavior in the preceding month, and coupon-receipt and consumption behavior in the preceding half month.
Further, in step 103 the feature groups of the training set of the user's historical consumption dataset are divided into three major categories, namely the label-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group, specifically:
S1031. Divide the attributes of the dataset by type into key attributes and value attributes. Key attributes mainly serve as the keys for extracting sub-feature-group features and as the keys for merging multiple sub-feature groups; value attributes are mainly used to extract the corresponding features;
S1032. According to the structure of the training set, divide the feature groups into three major categories: the label-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group; each of the three feature groups is in turn divided into 8 sub-feature groups according to the attribute used as key;
S1033. Discretize the relevant features. First, discretize the distance feature: besides treating distance as a numeric feature, also treat it as a nominal feature and discretize it into 12 dimensions; in addition to the statistics on the original numeric value, add count statistics for each of these dimensions. Second, process the time information;
S1034. Before feature extraction, analyze the relevant information that can be obtained from each single attribute, to provide a sound basis for the subsequent extraction of sub-feature groups;
S1035. Count the number of coupons received and the number of consumptions per day in the training set by date to obtain a consumption-behavior chart; based on the peaks, troughs, and trends of the chart, use whether a date falls within the few days before a holiday as a feature;
S1036. In the current month a user may receive several coupons with different distances, different discount rates, and different dates. Based on this fact, build 8 ranking features over the label-month sequence: the rank of the date on which the user received the same coupon this month; the number of times the user received the same coupon in one day; the ratio of the number of coupons the user received at this merchant to the number of coupons the user received this month; the rank of the converted discount rate; the distance rank; the rank of the receipt date; the rank of the "spend" threshold; and the rank of the "save" amount;
S1037. According to the consumption behavior of users and merchants, count the number of occurrences of each behavior category, obtain a behavior-ratio feature group by combining them, and add this feature group.
Further, the feature group of step S1038 includes the following features: the percentage of each user's total coupon receipts that were received at each merchant; the percentage of each user's total consumptions that were made at each merchant; the percentage of each user's total consumptions at each merchant that were coupon consumptions within 15 days; the percentage of each user's total consumptions that were coupon consumptions within 15 days; each user's share of consumptions at each distance; each user's share of consumptions at each discount rate; the share of holiday consumption; the share of coupons received on holidays; the ratio of each merchant's issued coupons to the total number of coupons; and the share of each coupon among the coupons issued by each merchant.
Further, the feature selection and imbalanced-data handling of step S104 are specifically:
S1041. Perform feature selection using a feature-clustering-based method;
S1042. For the class-imbalance classification problem of the dataset, adopt negative-sample undersampling with a sampling ratio of 10:1, ensuring that the ratio of positive to negative samples is 1:1.
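The undersampling in S1042 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each sample is a `(features, label)` pair, and the function name `undersample_negatives` and the fixed random seed are hypothetical. With ten negatives per positive, as in the 10:1 figure above, keeping as many negatives as there are positives yields the 1:1 ratio.

```python
import random
from typing import List, Sequence, Tuple


def undersample_negatives(samples: Sequence[Tuple[object, int]],
                          seed: int = 0) -> List[Tuple[object, int]]:
    """Randomly drop negatives until positives and negatives are equal in number."""
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    return positives + kept


# 3 positives and 30 negatives, i.e. a 1:10 imbalance.
data = [(i, 1) for i in range(3)] + [(i, 0) for i in range(3, 33)]
balanced = undersample_negatives(data)
```

All positives are kept, so the information in the minority class is preserved; only the majority class is thinned.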
Further, the feature-clustering-based method mainly comprises the following steps: initialize two empty sets, put all attributes of the existing dataset into set A, and leave the other set B empty; randomly select a subset from set A and put it into set B; then start iterating: in each round, select one attribute from set A and move it into set B such that the sum of the reduction in training error of the attributes in set B and the increase in training error of the attributes in set A is minimal; when the difference between the training error of B and the training error of A is minimal, stop iterating. Sets A and B are then the two separated views. Finally, in each separated view, use xgboost-based feature selection to select the TOP K features for training, where K is 30% of the total feature dimension.
Further, in step S105 the dataset resulting from the above operations is used for training and testing, and multi-classifier ensemble learning is performed to obtain the results and the final complete processing scheme, specifically:
S1501. Build the multi-classifier ensemble using the stacking strategy;
S1502. Choose three models for the ensemble: XGBoost (eXtreme Gradient Boosting) on all features, GBDT (Gradient Boosting Decision Tree) on all features, and XGB on 700-dimensional features. The ranking criterion is the average AUC computed per coupon ID, so only the rank of each classifier's output is considered; the strategy used in the ensemble stage is the rank-based RANK_AVG method: $\sum_i \mathrm{weight}_i / \mathrm{rank}_i$.
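A minimal sketch of the RANK_AVG fusion, under stated assumptions: rank 1 is given to the sample with the highest predicted score, ties are broken by position, and the model names and weights below are hypothetical placeholders rather than values from the patent.

```python
from typing import Dict, List


def rank_desc(scores: List[float]) -> List[int]:
    """Rank scores so that the highest score gets rank 1 (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_avg(model_scores: Dict[str, List[float]],
             weights: Dict[str, float]) -> List[float]:
    """Fuse models as sum_i weight_i / rank_i for every sample."""
    n = len(next(iter(model_scores.values())))
    fused = [0.0] * n
    for name, scores in model_scores.items():
        for j, r in enumerate(rank_desc(scores)):
            fused[j] += weights[name] / r
    return fused


# Hypothetical outputs of two models on three samples.
scores = {"xgb_all": [0.9, 0.2, 0.5], "gbdt_all": [0.8, 0.3, 0.1]}
fused = rank_avg(scores, {"xgb_all": 0.6, "gbdt_all": 0.4})
```

Because only ranks enter the sum, models whose raw outputs are on different scales can be combined without calibration, which suits a metric such as AUC that depends only on ordering.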
Advantages and beneficial effects of the present invention:
The present invention proposes a big-data method for predicting O2O coupon usage that predicts users' consumption behavior. The work also covers preprocessing of the users' historical consumption dataset, data labeling, feature engineering, feature selection, and imbalanced-data handling. The user's consumption model is obtained through a series of steps and algorithms and, according to step 105 of this patent, multi-classifier ensemble learning is used to predict the coupon usage of potential users. Because different algorithms approach the result from different starting points, they can meet the needs of different users; a reasonable weighted combination of several machine learning algorithms captures the diversity of consumers' coupon usage more fully and predicts it more accurately. This allows personalized delivery, which can effectively raise the coupon redemption rate, lets consumers with particular preferences obtain real benefits, gives merchants stronger marketing capability, and reduces merchants' marketing costs.
Brief description of the drawings
Fig. 1 is a flow chart of coupon-usage prediction in a preferred embodiment provided by the present invention;
Fig. 2 is a diagram of the labeling principle of the present invention;
Fig. 3 is a diagram of the scheme for dividing and building the training set and the prediction set of the present invention;
Fig. 4 is a diagram of the attribute type division of the present invention;
Fig. 5 is a flow chart of the feature view separation of the present invention;
Fig. 6 is a diagram of the model ensemble of the present invention;
Fig. 7 is a diagram of the discount rate conversion;
Fig. 8 is a diagram of the features that can be extracted from a single attribute.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical solution by which the present invention solves the above technical problems is as follows:
Embodiment one
To further explain the solution of the present invention, the following example takes users' coupon-receipt and consumption records from April 1, 2016 to June 15, 2016 as the training set and users' coupon-receipt behavior from May 15, 2016 to July 31, 2016 as the prediction set. Refer to Fig. 1, which is a flow chart of the big-data method for predicting O2O coupon usage provided by this embodiment:
Step 1: Collect users' offline consumption and coupon-receipt behavior, including: user ID; merchant ID; coupon ID (null indicates consumption without a coupon); discount rate, where x in [0,1] denotes a discount rate and x:y denotes "spend x, save y", in yuan; the distance from the place where the user is most often active to the merchant's nearest store, equal to x*500 meters (for a chain, the nearest store is taken), with x in [0,10], where null means this information is absent, 0 means less than 500 meters, and 10 means more than 5 kilometers; the coupon-receipt date; and the consumption date. If Date=null & Coupon_id!=null, the record indicates a coupon that was received but not used, i.e. a negative sample; if Date!=null & Coupon_id=null, it is an ordinary consumption date; if Date!=null & Coupon_id!=null, it is the date of consumption with a coupon, i.e. a positive sample. Also collect information such as users' online clicks, consumption, and coupon-receipt behavior.
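The three record cases listed in Step 1 can be written as a small function. This is only a sketch: `None` stands in for the raw value null, and the function and parameter names (`record_type`, `coupon_id`, `date_pay`) are hypothetical. The case where both fields are missing is not described in the text and is returned as "unknown".

```python
from typing import Optional


def record_type(coupon_id: Optional[str], date_pay: Optional[str]) -> str:
    """Classify one offline record by the rules listed in Step 1."""
    if coupon_id is not None and date_pay is None:
        return "negative"   # coupon received but never used
    if coupon_id is None and date_pay is not None:
        return "ordinary"   # ordinary consumption without a coupon
    if coupon_id is not None and date_pay is not None:
        return "positive"   # consumption with a coupon
    return "unknown"        # both fields missing; not covered by Step 1
```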
Step 2: Preprocess the dataset from Step 1. In the raw data table missing values are the character string 'null'; to simplify later operations they are uniformly converted to the NULL type. The distance information in the user's historical consumption data is converted to the double type, with "null" converted to null; dates are converted from character strings to the DateTime type, with "null" converted to null. The discount rate is converted: "spend x, save y" offers are changed into discount form (converted by the formula), and three columns are added: discount-rate type, "spend" threshold, and "save" amount. The discount rate conversion is shown in Fig. 7.
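The preprocessing in Step 2 can be sketched with the Python standard library. Several points here are assumptions rather than statements from the patent: the conversion formula itself is not reproduced in this text, so the sketch uses rate = 1 - y/x for a "spend x, save y" offer, which is a common choice; the date layout `YYYYMMDD`, the type codes 0 and 1, and all function names are hypothetical.

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_null(value: str) -> Optional[str]:
    """Map the literal string 'null' to None; leave other values unchanged."""
    return None if value == "null" else value


def parse_distance(value: str) -> Optional[float]:
    """Convert the distance field to a float, keeping missing values as None."""
    v = parse_null(value)
    return None if v is None else float(v)


def parse_date(value: str) -> Optional[datetime]:
    """Convert a date string to a datetime; the 'YYYYMMDD' layout is an assumption."""
    v = parse_null(value)
    return None if v is None else datetime.strptime(v, "%Y%m%d")


def parse_discount(value: str) -> Tuple[Optional[float], Optional[int],
                                        Optional[float], Optional[float]]:
    """Return (rate, type, full, minus) for one discount field.

    type 0: the field is already a discount rate in [0, 1].
    type 1: the field is 'x:y' (spend x, save y); rate = 1 - y/x is an ASSUMED formula.
    """
    v = parse_null(value)
    if v is None:
        return None, None, None, None
    if ":" in v:
        full, minus = (float(p) for p in v.split(":"))
        return 1.0 - minus / full, 1, full, minus
    return float(v), 0, None, None
```

The last three tuple entries correspond to the three added columns named in Step 2 (discount-rate type, "spend" threshold, "save" amount).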
Step 3: Label the training set according to the training-set labeling principle: first select the records that have a coupon-receipt date, then label records in which the coupon was used for consumption within 15 days as 1 and the rest as 0; the principle is shown in Fig. 2. Build the training set and the prediction set according to the division scheme: following the principle of corresponding online and offline data distributions, divide the behavior patterns in the original table into three classes, namely pure coupon-receipt behavior in the label month, consumption behavior in the preceding month, and coupon-receipt and consumption behavior in the preceding half month; the scheme is shown in Fig. 3.
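The 15-day labeling rule of Step 3 can be illustrated as below. This is a sketch under assumptions: the function name `label_record` is hypothetical, records without a receipt date return `None` to signal that they are not part of the labeled set, and counting the 15th day itself as inside the window is a choice the text does not settle.

```python
from datetime import date
from typing import Optional


def label_record(date_received: Optional[date], date_pay: Optional[date],
                 window_days: int = 15) -> Optional[int]:
    """Return 1 if the coupon was used within window_days of receipt, else 0."""
    if date_received is None:
        return None          # no coupon received: not part of the labeled set
    if date_pay is None:
        return 0             # coupon received but never used
    delta = (date_pay - date_received).days
    return 1 if 0 <= delta <= window_days else 0
```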
Step 4: Divide the attributes of the dataset by type into key attributes and value attributes. The key attributes are the three attributes user ID, merchant ID, and coupon ID; they serve mainly as the keys for extracting sub-feature-group features and as the keys for merging multiple sub-feature groups. The value attributes comprise 4 attributes, including distance, discount rate, and coupon use date, and are mainly used to extract the corresponding features. The division is shown in Fig. 4;
Step 5: According to the structure of the training set, divide the feature groups into three major categories: the label-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group. Each of the three feature groups is in turn divided into 8 sub-feature groups according to the attribute used as key; the combination principle of the feature groups is shown in Fig. 3;
Step 6: Discretize the relevant features. First, discretize the distance feature: besides treating distance as a numeric feature, also treat it as a nominal feature and discretize it into 12 dimensions (the raw data contains 12 distance values, namely 0-10 and null); in addition to the statistics on the original numeric value, add count statistics for each dimension. Taking the user sub-feature group as an example, counting how many coupons the user received at each distance gives the user's distance preference for receiving coupons. Second, process the time information: from users' consumption records between 2016.1.1 and 2016.6.31 (the time features are date_pay and date_received), analyze how often users receive coupons and consume in the early, middle, and late parts of the month. Specifically, on the basis of one-hot encoding, first discretize the feature and build the features "is early month", "is mid month", and "is late month", each taking the value 0 or 1, which amounts to one-hot encoding the time information. In addition, discretize the date into 10 dimensions (day of the week, last 7 days, last 14 days, and last 21 days) and count the occurrences of each sub-feature group in these 10 dimensions. Taking the user-merchant coupon-receipt sub-feature group as an example, with user_id and merchant_id as the key, the count of date_received values in the last 7 days has the business meaning of whether the user has recently received a coupon from this merchant;
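Two of the discretizations in Step 6 can be sketched as follows. The sketch assumes the conventional boundaries for the three parts of the month (days 1-10, 11-20, and 21 onward), which the text does not spell out; the feature names such as `is_early` and `distance_0_count` are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Optional


def ten_day_period(d: date) -> Dict[str, int]:
    """One-hot flags for the early (1-10), mid (11-20) and late (21+) part of the month."""
    return {
        "is_early": int(d.day <= 10),
        "is_mid": int(10 < d.day <= 20),
        "is_late": int(d.day > 20),
    }


def distance_counts(distances: Iterable[Optional[int]]) -> Dict[str, int]:
    """Count records per distance level: 0..10 plus 'null' gives 12 dimensions."""
    counts = Counter("null" if x is None else str(x) for x in distances)
    keys = [str(i) for i in range(11)] + ["null"]
    return {f"distance_{k}_count": counts.get(k, 0) for k in keys}
```

Applied to the coupons one user received, `distance_counts` gives the per-distance receipt counts that the text describes as the user's distance preference.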
Step 7: Before feature extraction, analyze the relevant information that can be obtained from each single attribute, to provide a sound basis for the subsequent extraction of sub-feature groups; in this embodiment the features that can be extracted from a single attribute are shown in Fig. 8;
Step 8: According to the different key and value attributes, further divide each feature group into 8 major sub-feature groups; the specific features are as follows:
Step 9: Count the number of coupons received and the number of consumptions per day in the training set by date to obtain a consumption-behavior chart; based on the peaks, troughs, and trends of the chart, use whether a date falls within the few days before a holiday as a feature;
Step 10: In the current month a user may receive several coupons with different distances, different discount rates, and different dates. Based on this fact, build 8 ranking features over the label-month sequence, as follows:
db_user_cid_date_received_rank: rank of the date on which the user received the same coupon this month
db_user_cid_oneday_cishu: number of times the user received the same coupon in one day
db_user_everycid_rate: ratio of the number of coupons the user received at this merchant to the number of coupons the user received this month
db_user_rate_rank: rank of the converted discount rate
db_user_distance_rank: distance rank (missing values filled with 0)
db_user_date_received_rank: rank of the receipt date
db_user_man_rank: rank of the "spend" threshold
db_user_jian_rank: rank of the "save" amount
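As an illustration of the first ranking feature, db_user_cid_date_received_rank, the sketch below ranks receipt dates within each (user, coupon) group. The ranking direction (earliest date gets rank 1) and the dense handling of equal dates are assumptions, as are the record layout and the function name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (user_id, coupon_id, date_received as 'YYYYMMDD')


def date_received_rank(records: List[Record]) -> List[int]:
    """Rank each record's receipt date within its (user, coupon) group.

    Rank 1 is the earliest date; equal dates share a rank (dense ranking).
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for user_id, coupon_id, received in records:
        groups[(user_id, coupon_id)].append(received)
    # 'YYYYMMDD' strings sort chronologically, so plain string sorting is enough.
    rank_of = {
        key: {d: i + 1 for i, d in enumerate(sorted(set(dates)))}
        for key, dates in groups.items()
    }
    return [rank_of[(u, c)][d] for u, c, d in records]


rows = [
    ("u1", "c1", "20160503"),
    ("u1", "c1", "20160501"),
    ("u1", "c2", "20160502"),
    ("u1", "c1", "20160503"),
]
ranks = date_received_rank(rows)
```

The other ranking features in the list follow the same pattern with a different grouping key or ranked column.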
Step 11: According to the consumption behavior of users and merchants, count the number of occurrences of each behavior category, obtain a behavior-ratio feature group by combining them, and add this feature group. Some of the features are: the ratio of each user's coupon receipts at each merchant to the user's total coupon receipts; the percentage of each user's total consumptions that were made at each merchant; the percentage of each user's total consumptions at each merchant that were coupon consumptions within 15 days; the percentage of each user's total consumptions that were coupon consumptions within 15 days; each user's share of consumptions at each distance; each user's share of consumptions at each discount rate; the share of holiday consumption; the share of coupons received on holidays; the ratio of each merchant's issued coupons to the total number of coupons; and the share of each coupon among the coupons issued by each merchant.
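The first ratio feature in Step 11 (a user's coupon receipts at one merchant divided by the user's total receipts) might be computed as in the sketch below. The input layout, one `(user_id, merchant_id)` pair per received coupon, and the function name `receive_share` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def receive_share(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Share of each user's coupon receipts that came from each merchant."""
    per_pair = Counter(pairs)                      # receipts per (user, merchant)
    per_user = Counter(user for user, _ in pairs)  # receipts per user
    return {(u, m): n / per_user[u] for (u, m), n in per_pair.items()}


shares = receive_share([("u1", "m1"), ("u1", "m1"), ("u1", "m2"), ("u2", "m1")])
```

The remaining ratio features differ only in which counts form the numerator and denominator.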
Step 12: The feature dimension built in the feature engineering part is large (899 dimensions). So many features may cause the curse of dimensionality on the one hand and easily lead to overfitting on the other, so dimensionality reduction is needed. This algorithm uses a feature-clustering-based feature selection scheme, specifically: initialize two empty sets, put all attributes of the existing dataset into set A, and leave the other set B empty. Randomly select a subset from set A and put it into set B. Then start iterating: in each round, select one attribute from set A and move it into set B such that the sum of the reduction in training error of the attributes in set B and the increase in training error of the attributes in set A is minimal. When the difference between the training error of B and the training error of A is minimal, stop iterating. Sets A and B are then the two separated views. Finally, in each separated view, use xgboost-based feature selection to select the TOP 270 features for training. The feature view separation flow is shown in Fig. 5;
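The final selection step can be sketched in isolation. The sketch does not reproduce the view separation itself and does not call xgboost; it assumes that per-feature importance scores for one view have already been obtained from some model, and it only shows how K follows from the 30% rule (30% of 899 dimensions rounds to 270). The function name and the toy scores are hypothetical.

```python
from typing import Dict, List


def top_k_features(importance: Dict[str, float], total_dims: int,
                   ratio: float = 0.3) -> List[str]:
    """Keep the K most important features of one view, K = round(ratio * total_dims)."""
    k = round(ratio * total_dims)
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return ranked[:k]


k_for_899 = round(0.3 * 899)
# Toy importance scores for one view, purely illustrative.
selected = top_k_features({"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.05},
                          total_dims=10, ratio=0.2)
```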
Step 13: For the class-imbalance classification problem of the dataset, because the dataset here is large enough, negative-sample undersampling is adopted with a sampling ratio of 10:1, ensuring that the ratio of positive to negative samples is 1:1.
Step 14: This completes feature dataset construction and labeling, the division of the training and test sets, and feature engineering. In step S105 the dataset resulting from the above operations is used for training and testing, and multi-classifier ensemble learning is performed to obtain the results and the final complete processing scheme. Specifically, the result of a single classifier is one-sided, whereas ensemble learning over multiple classifiers can effectively improve classification accuracy and reduce overfitting. This patent therefore uses the stacking strategy to build the multi-classifier ensemble;
This patent chooses three models for the ensemble: XGBoost (eXtreme Gradient Boosting) on all features, GBDT (Gradient Boosting Decision Tree) on all features, and XGB (eXtreme Gradient Boosting) on 700-dimensional features. Because the ranking criterion is the average AUC computed per coupon ID, the task is essentially a ranking optimization problem; and because the outputs of different classifiers are on different scales, only the rank of each classifier's output is considered. The strategy used in the ensemble stage is the rank-based RANK_AVG method: $\sum_i \mathrm{weight}_i / \mathrm{rank}_i$. The ensemble scheme is shown in Fig. 6.
Step 15: Choose three models for the ensemble (XGB on all features, GBDT on all features, and XGB on 700-dimensional features). Because the ranking criterion is the average AUC computed per coupon ID, the task is essentially a ranking optimization problem, so the strategy used in the ensemble stage is the rank-based RANK_AVG method: $\sum_i \mathrm{weight}_i / \mathrm{rank}_i$. The model fusion scheme is shown in Fig. 6.
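The ranking criterion named above, the average AUC per coupon ID, can be sketched as follows. Two points are assumptions not stated in the text: coupons whose records contain only one class are skipped (AUC is undefined for them), and tied scores count as half a correct pair. The row layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc_by_coupon(rows: List[Tuple[str, int, float]]) -> float:
    """Average the AUC over coupon IDs, skipping coupons with a single class."""
    groups: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for coupon_id, label, score in rows:
        groups[coupon_id][0].append(label)
        groups[coupon_id][1].append(score)
    aucs = [auc(y, s) for y, s in groups.values() if 0 < sum(y) < len(y)]
    return sum(aucs) / len(aucs) if aucs else float("nan")


rows = [
    ("c1", 1, 0.9), ("c1", 0, 0.1),                   # perfectly ordered
    ("c2", 1, 0.2), ("c2", 0, 0.8), ("c2", 1, 0.9),   # one of two pairs correct
    ("c3", 1, 0.5),                                   # single class, skipped
]
result = mean_auc_by_coupon(rows)
```

Since the score depends only on the ordering of predictions within each coupon, it is consistent with fusing models by rank rather than by raw probability.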
Step 16: Using the built model, predict users' coupon usage from their historical consumption data. Based on the predicted coupon usage, merchants can optimize the delivery of O2O coupons, strengthen their marketing capability, and increase the profit generated per unit of investment.
The above embodiments should be understood as merely illustrating the present invention and not as limiting its scope of protection. After reading the contents of the present invention, a person skilled in the art can make various changes or modifications to it, and such equivalent changes and modifications likewise fall within the scope of the claims of the present invention.
Claims (8)
1. A big-data method for predicting O2O coupon usage, characterized in that it comprises the following steps:
101. Obtain the user's historical consumption dataset and preprocess it;
102. Label the preprocessed historical consumption dataset of the user, then divide it and build a training set and a prediction set;
103. Divide the feature groups of the user's training set into three major categories: the label-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group;
104. Perform feature selection on the feature groups and handle the imbalanced data of the dataset;
105. Perform multi-classifier ensemble learning on the data after feature selection and imbalanced-data handling;
106. Build the model, predict the user's coupon usage from the user's historical consumption data, and optimize the delivery of O2O coupons.
2. The big-data method for predicting O2O coupon usage according to claim 1, characterized in that the preprocessing of the user's historical consumption dataset in step 101 comprises the following steps:
S1011. Obtain the user's historical consumption data through the merchant platform;
S1012. Fill the missing values in the user's historical consumption data: in the raw data table missing values are the character string 'null', and they are uniformly converted to the NULL type;
S1013. Convert the distance information in the user's historical consumption data to the double type, and convert dates from character strings to the DateTime type;
S1014. Convert the discount rate: "spend x, save y" offers are converted into discount form; the conversion formula is:
3. The big-data method for predicting O2O coupon usage according to claim 1 or 2, characterized in that step 102 labels the user's historical consumption data set and partitions it to build the training set and the prediction set, specifically:
S1021, label the training set according to the training-set labeling rule: first select the records that have a coupon-receipt date, then label as 1 the records in which the coupon was used for a purchase within 15 days, and label the remaining records as 0;
S1022, following the principle that the online and offline data distributions should correspond, divide the behavior patterns in the original table into three classes: pure coupon-receiving behavior in the labeling month, consumption behavior in the preceding month, and coupon-receiving and consumption behavior in the preceding month and a half.
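The labeling rule in S1021 can be sketched as a small function. The names `label_record` and `window_days` are hypothetical, and treating the 15-day window as inclusive of day 15 is an assumption; the claim does not say how the boundary is handled.

```python
from datetime import date
from typing import Optional


def label_record(date_received: Optional[date],
                 date_used: Optional[date],
                 window_days: int = 15) -> Optional[int]:
    """Return 1 if the coupon was used within the window, else 0.

    Records without a receipt date are not labeled (None), mirroring the
    selection step in S1021. The inclusive boundary is an assumption.
    """
    if date_received is None:
        return None  # no coupon received: excluded from the labeled set
    if date_used is None:
        return 0  # coupon received but never used
    elapsed = (date_used - date_received).days
    return 1 if 0 <= elapsed <= window_days else 0
```

A record with a coupon received on 1 May and used on 10 May would be labeled 1, while one used on 20 May would be labeled 0.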
4. The big-data method for predicting O2O coupon usage according to claim 3, characterized in that step 103 divides the features of the training set of the user's historical consumption data set into three major groups, namely the labeling-month feature group, the consumption-month feature group, and the coupon-receiving-and-consumption-month feature group, specifically:
S1031, classify the attributes of the data set by type, dividing them into key-type and value-type attributes; key-type attributes mainly serve as the keys for extracting the features of each sub-feature group and as the keys for merging multiple sub-feature groups, while value-type attributes are mainly used to extract the corresponding features;
S1032, according to the structure of the training set, divide the features into three major groups, namely the labeling-month feature group, the consumption-month feature group, and the coupon-receiving-and-consumption-month feature group; the three groups are further divided, according to their different keys, into 8 sub-feature groups;
S1033, discretize the relevant features: first, discretize the distance feature, so that distance is used as a numeric feature and also as a nominal feature discretized into 12 dimensions, and, in addition to the statistics on the original numeric value, add count statistics for each of these dimensions; second, process the time information;
S1034, before feature extraction, analyze the relevant information available from each individual attribute, so as to provide a sound basis for the subsequent extraction of the sub-feature groups;
S1035, count by date the daily number of coupons received and the daily number of purchases in the training set to obtain a consumption-behavior chart, and, based on the peaks, troughs, and trends of that chart, use as a feature whether a date falls within the few days before a holiday;
S1036, a user may receive, within the same month, several coupons with different distances, different discount rates, and different dates; based on this fact, construct 8 ranking features based on ordering within the labeling month: the rank of the date on which the user received the same coupon that month, the number of times the user received the same coupon in one day, the ratio of the number of coupons the user received at the merchant to the number of coupons the user received that month, the rank of the converted discount rate, the rank of the distance, the rank of the receipt date, the rank of the spending threshold, and the rank of the reduction amount;
S1037, according to the consumption behavior of users and merchants, count the occurrences under each behavior category, obtain a behavior-ratio feature group by combining these counts, and add this feature group.
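Two of the ranking features in S1036 can be illustrated with plain Python. This is a sketch under assumptions: the record fields `user_id`, `coupon_id`, and `date_received`, the function name, and the use of a dense rank with 1 as the earliest date are all hypothetical choices, not specified in the claim.

```python
from collections import Counter, defaultdict
from datetime import date


def add_rank_features(records):
    """Return copies of the records with two ranking features added.

    date_rank:      dense rank (1 = earliest) of the receipt date among the
                    user's receipts of the same coupon
    same_day_count: number of times the user received that coupon on that day
    """
    dates_by_key = defaultdict(set)
    same_day = Counter()
    for rec in records:
        key = (rec["user_id"], rec["coupon_id"])
        dates_by_key[key].add(rec["date_received"])
        same_day[key + (rec["date_received"],)] += 1

    # Map each distinct date to its position in the sorted list of dates.
    rank_by_key = {
        key: {day: i for i, day in enumerate(sorted(days), start=1)}
        for key, days in dates_by_key.items()
    }

    enriched = []
    for rec in records:
        key = (rec["user_id"], rec["coupon_id"])
        day = rec["date_received"]
        enriched.append({
            **rec,
            "date_rank": rank_by_key[key][day],
            "same_day_count": same_day[key + (day,)],
        })
    return enriched
```

The remaining ranking features (discount rate, distance, spending threshold, reduction amount) could follow the same pattern by ranking a different field within each user's records.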
5. The big-data method for predicting O2O coupon usage according to claim 4, characterized in that the feature group of step S1037 includes the following features: each user's number of coupons received at each merchant as a percentage of the user's total coupons received; each user's number of purchases at each merchant as a percentage of the user's total purchases; each user's number of coupon uses within 15 days at each merchant as a percentage of the user's total purchases within 15 days; each user's number of coupon-using purchases as a percentage of the user's total purchases; each user's share of purchases at each distance; each user's share of purchases at each discount rate; the share of purchases made on holidays; the share of coupons received on holidays; each merchant's number of coupons issued as a proportion of the total number of coupons; and the share of each individual coupon issued by each merchant.
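Most of the features in claim 5 are ratios of a per-pair count to a per-user total. A minimal sketch of that pattern follows, with a hypothetical function name and an assumed input of (user, merchant) pairs, one pair per coupon received or per purchase.

```python
from collections import Counter


def user_merchant_share(events):
    """Share of each user's events that occur at each merchant.

    `events` is an iterable of (user_id, merchant_id) pairs.
    Returns {(user_id, merchant_id): share}, where the shares of one user
    sum to 1.
    """
    events = list(events)
    per_pair = Counter(events)
    per_user = Counter(user for user, _ in events)
    return {pair: count / per_user[pair[0]] for pair, count in per_pair.items()}
```

Feeding the function coupon-receipt events gives the first feature in the list; feeding it purchase events gives the second.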
6. The big-data method for predicting O2O coupon usage according to claim 4, characterized in that the feature selection and the handling of unbalanced data in step 104 are specifically:
S1041, perform feature selection using a feature-clustering method;
S1042, to address the class-imbalance problem of the data set, adopt a strategy of undersampling the negative samples with a sampling rate of 10:1, ensuring that the ratio of positive to negative samples is 1:1.
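Negative-sample undersampling as described in S1042 might look like the sketch below. The function and parameter names are hypothetical, and the claim does not say how the negatives are chosen; uniform random sampling without replacement is assumed here.

```python
import random


def undersample_negatives(samples, labels, neg_per_pos=1.0, seed=0):
    """Randomly drop negatives until there are about neg_per_pos per positive.

    All positive samples (label 1) are kept. Negatives (label 0) are drawn
    uniformly without replacement; this sampling scheme is an assumption.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    keep_count = min(len(negatives), round(neg_per_pos * len(positives)))
    kept = sorted(positives + rng.sample(negatives, keep_count))
    return [samples[i] for i in kept], [labels[i] for i in kept]
```

With the default `neg_per_pos=1.0`, a data set of 5 positives and 50 negatives is reduced to 5 positives and 5 negatives, which matches the 1:1 target in the claim.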
7. The big-data method for predicting O2O coupon usage according to claim 6, characterized in that the feature-clustering method mainly comprises the following steps: initialize two sets, placing all attributes of the existing data set in set A and leaving the other set B empty; randomly select a subset of set A and move it into set B, then begin iterating: in each round, select one attribute from set A and move it into set B such that the sum of the reduction in the training error of the attributes in set B and the increase in the training error of the attributes in set A is minimal; stop iterating when the difference between the training error of B and the training error of A is minimal, at which point sets A and B are the two separated views; finally, in each separated view, use xgboost-based feature selection to select the top K features for training, where K is 30% of the total feature dimension.
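The final top-K selection in claim 7 can be sketched independently of the view-splitting loop. The sketch assumes importance scores are already available as a name-to-score mapping (for example, exported from a trained gradient-boosting model); the function name and the round-down rule for K are assumptions.

```python
def top_k_features(importances, percent=30):
    """Return the names of the top-K features by importance.

    `importances` maps feature name -> importance score. K is `percent`
    percent of the feature count, rounded down but at least 1; the rounding
    rule is an assumption, as the claim only states 30%.
    """
    k = max(1, len(importances) * percent // 100)
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]
```

Applied to each of the two views separately, this yields the per-view feature subsets that are then used for training.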
8. The big-data method for predicting O2O coupon usage according to claim 7, characterized in that step 105 trains and tests on the data set produced by the preceding operations, performs multi-classifier ensemble learning, obtains the results, and thereby arrives at the complete processing scheme, specifically:
S1501, build the multi-classifier ensemble using a stacking strategy;
S1502, select three models for the ensemble: XGBoost and GBDT trained on all features, and XGB trained on 700-dimensional features; because the evaluation criterion is the average AUC computed per coupon ID, only the ranking of each classifier's outputs needs to be considered, and the ensemble stage likewise uses a ranking-based RANK_AVG method: $\sum_i \mathrm{weight}_i / \mathrm{rank}_i$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710341039.3A CN107301562A (en) | 2017-05-16 | 2017-05-16 | A kind of O2O reward vouchers use big data Forecasting Methodology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107301562A (en) | 2017-10-27 |
Family
ID=60137179
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710341039.3A Pending CN107301562A (en) | 2017-05-16 | 2017-05-16 | A kind of O2O reward vouchers use big data Forecasting Methodology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107301562A (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107944913A (en) * | 2017-11-21 | 2018-04-20 | 重庆邮电大学 | High potential user's purchase intention Forecasting Methodology based on big data user behavior analysis |
CN108876436A (en) * | 2018-05-25 | 2018-11-23 | 广东工业大学 | A kind of electric business discount coupon based on integrated model uses probability forecasting method |
CN108959562A (en) * | 2018-07-04 | 2018-12-07 | 北京京东尚科信息技术有限公司 | Apply the magnanimity regular data processing method and system on block chain |
CN108985335A (en) * | 2018-06-19 | 2018-12-11 | 中国原子能科学研究院 | The integrated study prediction technique of nuclear reactor cladding materials void swelling |
CN109034658A (en) * | 2018-08-22 | 2018-12-18 | 重庆邮电大学 | A kind of promise breaking consumer's risk prediction technique based on big data finance |
CN109102324A (en) * | 2018-07-12 | 2018-12-28 | 阿里巴巴集团控股有限公司 | Model training method, the red packet material based on model are laid with prediction technique and device |
CN109146580A (en) * | 2018-09-30 | 2019-01-04 | 青岛大学 | A kind of O2O coupon distribution method and system based on big data analysis |
CN109389431A (en) * | 2018-09-30 | 2019-02-26 | 北京三快在线科技有限公司 | Distribution method, device, electronic equipment and the readable storage medium storing program for executing of discount coupon |
CN109509033A (en) * | 2018-12-14 | 2019-03-22 | 重庆邮电大学 | A kind of user buying behavior big data prediction technique under consumer finance scene |
CN109711906A (en) * | 2019-01-10 | 2019-05-03 | 哈步数据科技(上海)有限公司 | A kind of distribution method and system of favor information |
WO2019085704A1 (en) * | 2017-11-06 | 2019-05-09 | 北京京东尚科信息技术有限公司 | Method and apparatus for increasing the number of active users |
CN109741117A (en) * | 2019-02-19 | 2019-05-10 | 贵州大学 | A kind of discount coupon distribution method based on intensified learning |
CN109741114A (en) * | 2019-01-10 | 2019-05-10 | 博拉网络股份有限公司 | A kind of user under big data financial scenario buys prediction technique |
CN109934623A (en) * | 2019-02-26 | 2019-06-25 | 中山大学 | Individual economy consuming capacity prediction technique based on user's APP usage behavior |
CN110210888A (en) * | 2019-04-18 | 2019-09-06 | 深圳壹账通智能科技有限公司 | Resource service condition monitoring method, device, electronic equipment and storage medium |
CN110348999A (en) * | 2019-06-29 | 2019-10-18 | 北京淇瑀信息科技有限公司 | The recognition methods of financial risks sensitive users, device and electronic equipment |
CN110766467A (en) * | 2019-10-25 | 2020-02-07 | 深圳乐信软件技术有限公司 | Electronic ticket delivery monitoring method and device, server and storage medium |
CN110782277A (en) * | 2019-10-12 | 2020-02-11 | 上海陆家嘴国际金融资产交易市场股份有限公司 | Resource processing method, resource processing device, computer equipment and storage medium |
CN110827093A (en) * | 2019-11-14 | 2020-02-21 | 北京爱笔科技有限公司 | Method and device for accurate marketing |
CN110992106A (en) * | 2019-12-11 | 2020-04-10 | 上海风秩科技有限公司 | Training data acquisition method and device, and model training method and device |
CN111553542A (en) * | 2020-05-15 | 2020-08-18 | 无锡职业技术学院 | User coupon verification and sale rate prediction method |
CN112561557A (en) * | 2019-09-26 | 2021-03-26 | 治略资讯整合股份有限公司 | Coupon distribution system and coupon distribution method |
CN112819538A (en) * | 2021-02-04 | 2021-05-18 | 长沙理工大学 | User task prediction method and device, computer equipment and storage medium |
CN113010869A (en) * | 2021-03-11 | 2021-06-22 | 北京百度网讯科技有限公司 | Method, apparatus, device and readable storage medium for managing digital content |
CN113760521A (en) * | 2020-09-22 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Virtual resource allocation method and device |
US20220051282A1 (en) * | 2018-01-19 | 2022-02-17 | Intuit Inc. | Method and system for using machine learning techniques to identify and recommend relevant offers |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090132347A1 (en) * | 2003-08-12 | 2009-05-21 | Russell Wayne Anderson | Systems And Methods For Aggregating And Utilizing Retail Transaction Records At The Customer Level |
CN104809226A (en) * | 2015-05-07 | 2015-07-29 | 武汉大学 | Method for early classifying imbalance multi-variable time sequence data |
CN105389480A (en) * | 2015-12-14 | 2016-03-09 | 深圳大学 | Multiclass unbalanced genomics data iterative integrated feature selection method and system |
Non-Patent Citations (1)
Title |
---|
立刻有: "『 天池竞赛』O2O优惠券使用预测思路总结", 《HTTPS://BLOG.CSDN.NET/SHINE19930820/ARTICLE/DETAILS/53995369》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20171027 |