CN107301562A - A big data prediction method for O2O coupon usage - Google Patents

A big data prediction method for O2O coupon usage

Info

Publication number
CN107301562A
Authority
CN
China
Prior art keywords
user
consumption
data
feature
syndrome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710341039.3A
Other languages
Chinese (zh)
Inventor
王进
范磊
杨阳
欧阳卫华
邵帅
李航
邓欣
胡峰
李智星
陈乔松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201710341039.3A
Publication of CN107301562A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0211 Determining the effectiveness of discounts or incentives
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention claims a big data prediction method for O2O coupon usage, comprising: 101, preprocessing the user's historical consumption data set; 102, labeling the user's historical consumption data set, and dividing it to build a training set and a prediction set; 103, performing feature engineering on the user's historical consumption data set; 104, performing feature selection and handling imbalanced data; 105, performing multi-classifier ensemble learning on the resulting data; 106, using the model thus built to predict the user's coupon usage from the user's historical consumption data, so as to optimize the distribution of O2O coupons. The invention mainly processes user consumption data and applies multi-classifier ensemble learning to it to build a prediction model, thereby predicting the user's future coupon usage and optimizing the distribution of O2O coupons.

Description

A big data prediction method for O2O coupon usage
Technical field
The invention belongs to the field of intelligent information processing, and in particular relates to a big data prediction method for O2O coupon usage.
Background
With the improvement and spread of mobile devices, "mobile Internet + every industry" has entered a stage of rapid development, and O2O (Online to Offline) consumption attracts the most attention. The O2O industry naturally connects hundreds of millions of consumers, and apps of all kinds record more than 10 billion user behavior records every day, which makes it one of the best meeting points between big data research and commercial operation. Using coupons to reactivate existing users or to attract new customers into stores is an important O2O marketing approach. However, randomly distributed coupons are a meaningless disturbance for most users. For merchants, indiscriminately issued coupons may damage brand reputation and make marketing costs hard to estimate. With the arrival of the big data era, mining data effectively to create commercial value has become an inevitable trend for enterprises. Techniques such as data mining and machine learning can be used to analyze consumer behavior and predict users' coupon usage. The common current approach is to use a supervised classifier to build a model from information such as users' historical consumption data and coupon consumption, and to predict whether a user will consume after receiving a coupon, the probability of consumption, and so on. Most existing methods either do not use multi-classifier ensemble learning or obtain unsatisfactory ensemble results, and some methods do not handle feature construction, feature selection and imbalanced data appropriately. As a result, prediction accuracy is not satisfactory and performance on real data sets is poor. The work done by the present invention in these respects improves the accuracy of big data prediction of coupon usage and the practicality of the algorithm.
This patent proposes a big data prediction method for O2O coupon usage that predicts users' consumption behavior. The work also covers preprocessing of the user's historical consumption data set, data labeling, feature engineering, feature selection and imbalanced-data handling, and predicts users' coupon usage through multi-classifier ensemble learning. Different algorithms approach the result from different starting points and can meet the needs of different users; a reasonable weighted combination of several machine learning algorithms better captures the diversity of consumers' coupon usage and predicts it more accurately. This enables personalized distribution, effectively raises the coupon redemption rate, lets consumers with particular preferences obtain real benefits, and gives merchants stronger marketing capability.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes a big data prediction method for O2O coupon usage that gives merchants a more efficient way to distribute coupons, lets consumers obtain real benefits, and gives merchants stronger marketing capability. The technical scheme is as follows:
A big data prediction method for O2O coupon usage, comprising the following steps:
101. Obtain the user's historical consumption data set and preprocess it;
102. Label the preprocessed historical consumption data set of the user, and divide it to build a training set and a prediction set;
103. Divide the feature groups of the user's training set into three major categories: the labeling-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group;
104. Perform feature selection on the feature groups and handle the imbalanced data in the data set;
105. Perform multi-classifier ensemble learning on the data after feature selection and imbalanced-data handling;
106. Build the model and predict the user's coupon usage from the user's historical consumption data, so as to optimize the distribution of O2O coupons.
Further, the preprocessing of the user's historical consumption data set in step 101 comprises the following steps:
S1011. Obtain the user's historical consumption data through the merchant platform;
S1012. Fill missing values in the user's historical consumption data: in the raw data table, missing values are the character string 'null', and they are uniformly converted to the NULL type;
S1013. Convert the distance information in the user's historical consumption data to the double type, and convert dates from character strings to the DateTime type;
S1014. Convert the discount rate: "spend x, save y" (full-reduction) offers are converted to discount form, and the conversion formula is:
Further, step 102 labels the user's historical consumption data set and divides it to build the training set and the prediction set, specifically:
S1021. Label the training set according to the training-set labeling principle: first filter out the records that have a coupon receipt date, then label as 1 the records in which the coupon was used for consumption within 15 days, and label the rest as 0;
S1022. According to the principle of correspondence between online and offline data distributions, divide the behavior patterns in the original table into three classes: pure coupon-receipt behavior in the labeling month, consumption behavior in the preceding month, and coupon-receipt and consumption behavior in the preceding half month.
Further, step 103 divides the feature groups of the training set of the user's historical consumption data set into three major categories, namely the labeling-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group, specifically:
S1031. Divide the attributes of the data set by type into key-type and value-type attributes. Key-type attributes are mainly used as the keys for extracting sub-feature-group features and as the keys for merging multiple sub-feature groups; value-type attributes are mainly used to extract the corresponding features;
S1032. According to the structure of the training set, divide the feature groups into three major categories: the labeling-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group. Each of the three feature groups is further divided into 8 sub-feature groups according to the key used each time;
S1033. Discretize the relevant features. First, the distance feature is discretized: besides being used as a numeric feature, it is also used as a nominal feature and discretized into 12 dimensions, and in addition to statistics on the original value, counts under each dimension are added. Second, the time information is processed;
S1034. Before feature extraction, analyze the relevant information obtainable from each single attribute, to provide a sound basis for the subsequent extraction of sub-feature groups;
S1035. Count the daily number of coupon receipts and of consumptions in the training set by date to obtain a consumer behavior chart; based on the peaks, troughs and trend of the chart, use whether a date falls in the days just before a holiday as a feature;
S1036. A user may receive several coupons in the current month with different distances, different discount rates and different dates. Based on this fact, build 8 ranking features based on ordering within the labeling month: the rank of the receipt date among the same coupon received by the user this month; the number of times the user received the same coupon in one day; the ratio of the number of coupons the user received at the merchant to the number of coupons the user received this month; the rank of the converted discount rate; the rank of the distance; the rank of the receipt date; the rank of the spending threshold ("man"); and the rank of the reduction amount ("jian");
S1037. According to the consumption behavior of users and merchants, count the occurrences under each behavior category, obtain the behavior ratio feature group by combining these counts, and add this feature group.
Further, the feature group of step S1037 includes the following features: the share of each user's coupon receipts at each merchant in the user's total coupon receipts; the share of each user's consumption count at each merchant in the total consumption count; the share of each user's coupon uses within 15 days at each merchant in the total consumption count within 15 days; the share of each user's coupon consumption count in the total consumption count; the share of each user's consumption count at each distance; the share of each user's consumption at each discount rate; the share of holiday consumption; the share of holiday coupon receipts; the share of each merchant's issued coupons in the total number of coupons; and the share of each coupon's issue count within each merchant.
Further, the feature selection and the handling of imbalanced data in step 104 are specifically:
S1041. Perform feature selection using a feature-clustering-based method;
S1042. For the class-imbalance problem of the data set, adopt negative-sample undersampling with a sampling ratio of 10:1, ensuring that the ratio of positive to negative samples is 1:1.
Further, the feature-clustering-based method mainly comprises the following steps: initialize two empty sets, put all attributes of the existing data set into set A, and leave the other set B empty; randomly select a subset from set A and put it into set B; then start iterating: in each round, select one attribute from set A and move it into set B such that the sum of the reduction in training error of the attributes in set B and the increase in training error of the attributes in set A is minimal; when the difference between the training error of B and the training error of A is minimal, stop iterating. Sets A and B are then the two separated views. Finally, within each separated view, xgboost-based feature selection picks the TOP K features for training, where K is 30% of the total number of feature dimensions.
Further, step 105 trains and tests on the data set produced by the above operations, performs multi-classifier ensemble learning, obtains the results, and finally yields the complete processing scheme, specifically:
S1501. Build the multi-classifier ensemble using a stacking strategy;
S1502. Select three models for the ensemble: XGBoost (eXtreme Gradient Boosting) on all features, GBDT (Gradient Boosting Decision Tree) on all features, and XGB on the 700-dimension features. Because the evaluation criterion is the average AUC computed per coupon ID, only the ranking of each classifier's output is considered, and the strategy used in the classifier ensemble stage is likewise the ranking-based RANK_AVG method: ∑ weight_i / rank_i
The advantages and beneficial effects of the present invention are as follows:
The present invention proposes a big data prediction method for O2O coupon usage that predicts users' consumption behavior. The work also covers preprocessing of the user's historical consumption data set, data labeling, feature engineering, feature selection and imbalanced-data handling. A consumption model of the user is obtained through this series of steps and algorithms, and, according to step 105 of this patent, multi-classifier ensemble learning is used to predict the coupon usage of potential users. Different algorithms approach the result from different starting points and can meet the needs of different users; a reasonable weighted combination of several machine learning algorithms better captures the diversity of consumers' coupon usage and predicts it more accurately. This enables personalized distribution, effectively raises the coupon redemption rate, lets consumers with particular preferences obtain real benefits, gives merchants stronger marketing capability, and reduces merchants' marketing costs.
Brief description of the drawings
Fig. 1 is a flow chart of coupon usage prediction in a preferred embodiment of the present invention;
Fig. 2 is a diagram of the labeling principle of the present invention;
Fig. 3 is a diagram of the scheme for dividing and building the training set and the prediction set of the present invention;
Fig. 4 is a diagram of the attribute type division of the present invention;
Fig. 5 is a flow chart of the feature view separation of the present invention;
Fig. 6 is a diagram of the model ensemble of the present invention;
Fig. 7 is a diagram of the discount rate conversion;
Fig. 8 is a diagram of the features extractable from a single attribute.
Detailed description of the embodiments
The technical scheme in the embodiments of the present invention is described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme by which the present invention solves the above technical problem is:
Embodiment one
To further illustrate the scheme of the present invention, the technical scheme is described using, as an example, users' coupon-receipt and consumption records from April 1, 2016 to June 15, 2016 as the training set and users' coupon-receipt behavior from May 15, 2016 to July 31, 2016 as the prediction set. Referring to Fig. 1, Fig. 1 is a flow chart of the big data prediction method for O2O coupon usage provided by this embodiment:
Step 1: Collect users' offline consumption and coupon-receipt behavior. Each record contains the user ID, the merchant ID and the coupon ID (null means consumption without a coupon); the discount rate, where x in [0,1] denotes a discount rate and x:y denotes "spend x, save y" in yuan; the distance from the places where the user is most often active to the nearest store of the merchant, expressed as x*500 meters with x in [0,10] (for a chain, the nearest store is taken), where null means the information is unavailable, 0 means less than 500 meters and 10 means more than 5 kilometers; the coupon receipt date; and the consumption date. If Date=null & Coupon_id!=null, the record means that a coupon was received but not used, i.e. a negative sample; if Date!=null & Coupon_id=null, it is the date of an ordinary consumption; if Date!=null & Coupon_id!=null, it is the date of a consumption with a coupon, i.e. a positive sample. Users' online click/consumption and coupon-receipt behavior is also collected.
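The three record types described in Step 1 can be illustrated with a minimal sketch. The function name `record_type` and the string labels are hypothetical, and Python's `None` stands in for the 'null' value; the case where both fields are missing is not described in the text and is returned as "unknown" here.

```python
from typing import Optional


def record_type(coupon_id: Optional[str], date: Optional[str]) -> str:
    """Classify one offline record by which fields are present (None = 'null')."""
    if date is None and coupon_id is not None:
        return "negative"   # coupon received but never used
    if date is not None and coupon_id is None:
        return "ordinary"   # consumption without a coupon
    if date is not None and coupon_id is not None:
        return "positive"   # consumption made with a coupon
    return "unknown"        # neither field set; not covered by the text


if __name__ == "__main__":
    print(record_type("c100", None))        # negative
    print(record_type(None, "20160520"))    # ordinary
    print(record_type("c100", "20160520"))  # positive
```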
Step 2: Preprocess the data set from Step 1. In the raw data table, missing values are the character string 'null'; for the convenience of subsequent operations, they are uniformly converted to the NULL type. The distance information in the user's historical consumption data is converted to the double type, with "null" converted to null. Dates are converted from character strings to the DateTime type, with "null" converted to null. The discount rate is converted: "spend x, save y" (full-reduction) offers are converted to discount form by the conversion formula, and three columns are added: discount rate type, spending threshold ("man") and reduction amount ("jian"). The discount rate conversion is shown in Fig. 7.
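A minimal sketch of this preprocessing follows. The conversion formula itself is not reproduced in this text, so the sketch assumes the common form rate = 1 - y/x for an "x:y" offer; that formula, the `%Y%m%d` date format, the function names and the "full_reduction"/"rate" type labels are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional, Tuple


def to_null(value: str) -> Optional[str]:
    """Map the literal string 'null' to None; keep everything else."""
    return None if value == "null" else value


def parse_distance(value: str) -> Optional[float]:
    v = to_null(value)
    return None if v is None else float(v)


def parse_date(value: str, fmt: str = "%Y%m%d") -> Optional[datetime]:
    # The date format is an assumption; the text only says dates are strings.
    v = to_null(value)
    return None if v is None else datetime.strptime(v, fmt)


def parse_discount(value: str) -> Tuple[Optional[float], Optional[str],
                                        Optional[float], Optional[float]]:
    """Return (rate, kind, man, jian) for a raw discount field."""
    v = to_null(value)
    if v is None:
        return (None, None, None, None)
    if ":" in v:
        man, jian = (float(part) for part in v.split(":"))
        # Assumed conversion: spend `man`, save `jian` -> rate = 1 - jian/man.
        return (1.0 - jian / man, "full_reduction", man, jian)
    return (float(v), "rate", None, None)


if __name__ == "__main__":
    print(parse_discount("200:20"))
    print(parse_discount("0.95"))
    print(parse_distance("null"), parse_date("20160415"))
```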
Step 3: Label the training set according to the training-set labeling principle: first filter out the records that have a coupon receipt date, then label as 1 the records in which the coupon was used for consumption within 15 days, and label the rest as 0; the specific principle is shown in Fig. 2. Build the training set and the prediction set according to the division scheme for the two sets: following the principle of correspondence between online and offline data distributions, the behavior patterns in the original table are divided into three classes, namely pure coupon-receipt behavior in the labeling month, consumption behavior in the preceding month, and coupon-receipt and consumption behavior in the preceding half month; the specific scheme is shown in Fig. 3.
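The labeling rule can be sketched as below. The function name `label` is hypothetical, and treating "within 15 days" as an inclusive window of 0 to 15 days after receipt is an assumption; the text does not say whether the boundary day counts.

```python
from datetime import date
from typing import Optional


def label(received: Optional[date], consumed: Optional[date],
          window_days: int = 15) -> Optional[int]:
    """Return 1/0 for a record with a receipt date, None if it is filtered out."""
    if received is None:
        return None  # no coupon receipt date: not part of the labeled set
    if consumed is not None and 0 <= (consumed - received).days <= window_days:
        return 1     # coupon used within the window (inclusive, assumed)
    return 0         # never used, or used too late


if __name__ == "__main__":
    print(label(date(2016, 5, 1), date(2016, 5, 10)))  # 1
    print(label(date(2016, 5, 1), date(2016, 5, 30)))  # 0
    print(label(date(2016, 5, 1), None))               # 0
```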
Step 4: Divide the attributes of the data set by type: the original attributes are divided into key-type and value-type attributes. The key type mainly comprises three attributes, user ID, merchant ID and coupon ID, which are mainly used as the keys for extracting sub-feature-group features and as the keys for merging multiple sub-feature groups. The value type comprises 4 attributes, including distance, discount rate and coupon use date, which are mainly used to extract the corresponding features; the specific division is shown in Fig. 4;
Step 5: According to the structure of the training set, divide the feature groups into three major categories: the labeling-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group. Each of the three feature groups is further divided into 8 sub-feature groups according to the key used each time; the principle for combining feature groups is shown in Fig. 3;
Step 6: Discretize the relevant features. First, the distance feature is discretized: besides being used as a numeric feature, it is also used as a nominal feature and discretized into 12 dimensions (the raw data contain 12 distance values, namely 0-10 and null); in addition to statistics on the original value, counts under each dimension are added. Taking the user sub-feature group as an example, counting how many coupons the user received at each distance gives the user's distance preference when receiving coupons. Second, the time information is processed. Based on users' consumption records from 2016.1.1 to 2016.6.30 (the time features are date_pay and date_received), the frequency of coupon receipt and consumption in the early, middle and late ten-day periods of the month is analyzed. Specifically, in the manner of one-hot encoding, the feature is first discretized into "is early period", "is middle period" and "is late period", each taking the value 0 or 1, which is equivalent to one-hot encoding the time information. The date is also discretized into 10 dimensions (day of the week, last 7 days, last 14 days and last 21 days), and the counts of each sub-feature group under these 10 dimensions are computed. Taking the user-merchant coupon-receipt sub-feature group as an example, with user_id and merchant_id as keys, the count of date_received within the last 7 days carries the business meaning of whether the user has recently received a coupon from that merchant;
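The discretizations in Step 6 can be sketched as follows. All function names are hypothetical. The sketch assumes that the three ten-day periods are days 1-10, 11-20 and 21 to month end, that the 10 date dimensions are the 7 weekdays plus the three "last N days" windows, and that those windows are measured back from a caller-supplied reference date; none of these details is spelled out in the text.

```python
from datetime import date
from typing import Iterable, List, Optional


def distance_one_hot(distance: Optional[int]) -> List[int]:
    """12 slots: indexes 0..10 for the raw code, index 11 for a missing value."""
    vec = [0] * 12
    vec[11 if distance is None else distance] = 1
    return vec


def ten_day_flags(d: date) -> List[int]:
    """[is_early, is_middle, is_late] for days 1-10, 11-20, 21-end (assumed)."""
    idx = 0 if d.day <= 10 else 1 if d.day <= 20 else 2
    return [1 if i == idx else 0 for i in range(3)]


def date_bucket_counts(dates: Iterable[date], reference: date) -> List[int]:
    """10 counts: 7 weekday slots, then last 7, 14 and 21 days before reference."""
    counts = [0] * 10
    for d in dates:
        counts[d.weekday()] += 1          # Monday = 0 ... Sunday = 6
        age = (reference - d).days
        for slot, window in ((7, 7), (8, 14), (9, 21)):
            if 0 <= age < window:
                counts[slot] += 1
    return counts


if __name__ == "__main__":
    print(distance_one_hot(3))
    print(ten_day_flags(date(2016, 6, 15)))
    print(date_bucket_counts([date(2016, 6, 14), date(2016, 6, 1)],
                             date(2016, 6, 15)))
```

Calling `date_bucket_counts` with the receipt dates of one (user_id, merchant_id) pair gives the kind of "recently received a coupon from this merchant" signal the step describes.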
Step 7: Before feature extraction, analyze the relevant information obtainable from each single attribute, to provide a sound basis for the subsequent extraction of sub-feature groups; in this embodiment, the features extractable from a single attribute are shown in Fig. 8;
Step 8: According to the value attributes under each key, each feature group is further divided into 8 major sub-feature groups; the specific features are as follows:
Step 9: Count the daily number of coupon receipts and of consumptions in the training set by date to obtain a consumer behavior chart; based on the peaks, troughs and trend of the chart, use whether a date falls in the days just before a holiday as a feature;
Step 10: A user may receive several coupons in the current month with different distances, different discount rates and different dates. Based on this fact, build 8 ranking features based on ordering within the labeling month; the specific features are as follows:
db_user_cid_date_received_rank: rank of the receipt date among the same coupon received by the user this month
db_user_cid_oneday_cishu: number of times the user received the same coupon in one day
db_user_everycid_rate: ratio of the number of coupons the user received at the merchant to the number of coupons the user received this month
db_user_rate_rank: rank of the converted discount rate
db_user_distance_rank: rank of the distance (missing values are filled with 0)
db_user_date_received_rank: rank of the receipt date
db_user_man_rank: rank of the spending threshold ("man")
db_user_jian_rank: rank of the reduction amount ("jian")
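A ranking feature of this kind can be sketched with a small grouped-rank helper. The helper `rank_within_group` and the row layout are hypothetical, and the choice of ascending dense ranking (ties share a rank, no gaps) is an assumption, since the text does not specify the ranking convention.

```python
from collections import defaultdict
from typing import Dict, List


def rank_within_group(rows: List[Dict], group_key: str,
                      value_key: str, out_key: str) -> List[Dict]:
    """Add out_key = ascending dense rank of value_key within each group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    for members in groups.values():
        ordered = sorted({m[value_key] for m in members})
        position = {value: i + 1 for i, value in enumerate(ordered)}
        for m in members:
            m[out_key] = position[m[value_key]]
    return rows


if __name__ == "__main__":
    rows = [
        {"user_id": "u1", "date_received": "20160601"},
        {"user_id": "u1", "date_received": "20160605"},
        {"user_id": "u1", "date_received": "20160601"},
        {"user_id": "u2", "date_received": "20160610"},
    ]
    rank_within_group(rows, "user_id", "date_received",
                      "db_user_date_received_rank")
    for row in rows:
        print(row)
```

The same helper applied to a discount-rate, distance, "man" or "jian" column would give the other per-user ranks in the list.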
Step 11: According to the consumption behavior of users and merchants, count the occurrences under each behavior category, obtain the behavior ratio feature group by combining these counts, and add this feature group. Some of the features are as follows: the share of each user's coupon receipts at each merchant in the user's total coupon receipts; the share of each user's consumption count at each merchant in the total consumption count; the share of each user's coupon uses within 15 days at each merchant in the total consumption count within 15 days; the share of each user's coupon consumption count in the total consumption count; the share of each user's consumption count at each distance; the share of each user's consumption at each discount rate; the share of holiday consumption; the share of holiday coupon receipts; the share of each merchant's issued coupons in the total number of coupons; and the share of each coupon's issue count within each merchant.
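As an illustration of one behavior ratio feature, the sketch below computes the share of each user's coupon receipts at each merchant in that user's total receipts. The function name and the (user_id, merchant_id) pair layout are hypothetical; the other ratios in the list follow the same count-then-divide pattern.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def user_merchant_receipt_share(
        receipts: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """receipts: one (user_id, merchant_id) pair per coupon received."""
    receipts = list(receipts)
    per_pair = Counter(receipts)                    # receipts per user-merchant
    per_user = Counter(user for user, _ in receipts)  # receipts per user
    return {pair: n / per_user[pair[0]] for pair, n in per_pair.items()}


if __name__ == "__main__":
    data = [("u1", "m1"), ("u1", "m1"), ("u1", "m2"), ("u2", "m1")]
    for pair, share in user_merchant_receipt_share(data).items():
        print(pair, round(share, 3))
```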
Step 12: In the feature engineering part, the constructed feature dimension is large (899 dimensions). So many features may cause the curse of dimensionality on the one hand and easily lead to over-fitting on the other, so dimensionality reduction is needed. This algorithm uses a feature selection scheme based on feature clustering, specifically: initialize two empty sets, put all attributes of the existing data set into set A, and leave the other set B empty. Randomly select a subset from set A and put it into set B. Then start iterating: in each round, select one attribute from set A and move it into set B such that the sum of the reduction in training error of the attributes in set B and the increase in training error of the attributes in set A is minimal. When the difference between the training error of B and the training error of A is minimal, stop iterating. Sets A and B are then the two separated views. Finally, within each separated view, xgboost-based feature selection picks the TOP 270 features for training; the flow of feature view separation is shown in Fig. 5;
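The final top-K step can be sketched as below; the view separation itself is not shown. The function name `top_k_features` is hypothetical, and the importance scores are assumed to come from an already trained model (for example a gradient boosting model) as a name-to-score mapping. With 899 features and a 30% fraction, rounding gives the 270 features mentioned in the text.

```python
from typing import Dict, List


def top_k_features(importance: Dict[str, float],
                   fraction: float = 0.3) -> List[str]:
    """Keep the highest-scoring `fraction` of features, best first."""
    k = round(len(importance) * fraction)   # 899 * 0.3 = 269.7 -> 270
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Placeholder scores for illustration only, not real model output.
    scores = {f"f{i}": float(i) for i in range(899)}
    selected = top_k_features(scores)
    print(len(selected), selected[:3])
```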
Step 13: For the class-imbalance problem of the data set, because the data set here is sufficiently large, negative-sample undersampling is adopted with a sampling ratio of 10:1, ensuring that the ratio of positive to negative samples is 1:1.
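Negative-sample undersampling toward a 1:1 class ratio can be sketched as follows. The function name and the fixed random seed are assumptions, and the sketch targets only the stated 1:1 outcome by keeping all positives and drawing an equal number of negatives at random; how the 10:1 sampling ratio in the text maps onto this is not specified.

```python
import random
from typing import List, Sequence, Tuple


def undersample_negatives(samples: Sequence, labels: Sequence[int],
                          seed: int = 0) -> Tuple[List, List[int]]:
    """Keep every positive and an equally sized random subset of negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    keep = sorted(pos + kept_neg)           # preserve original record order
    return [samples[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    y = [1, 1] + [0] * 20
    x = list(range(len(y)))
    xs, ys = undersample_negatives(x, y)
    print(xs, ys)
```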
Step 14: At this point the feature data set has been built and labeled, the training set and test set have been divided, and feature engineering is complete. Step 105 trains and tests on the data set produced by the above operations, performs multi-classifier ensemble learning, obtains the results, and finally yields the complete processing scheme, specifically: because the result of a single classifier is one-sided, combining multiple classifiers through ensemble learning can effectively improve classification accuracy and reduce over-fitting. This patent builds the multi-classifier ensemble using a stacking strategy;
This patent selects three models for the ensemble: XGBoost (eXtreme Gradient Boosting) on all features, GBDT (Gradient Boosting Decision Tree) on all features, and XGB (eXtreme Gradient Boosting) on the 700-dimension features. Because the evaluation criterion is the average AUC computed per coupon ID, the task is essentially a ranking optimization problem; and because the outputs of different classifiers are on different scales, only the ranking of each classifier's output is considered. The strategy used in the classifier ensemble stage is likewise the ranking-based RANK_AVG method: ∑ weight_i / rank_i. The specific classifier ensemble scheme is shown in Fig. 6.
Step 15: Select three models for the ensemble (XGB on all features, GBDT on all features, and XGB on the 700-dimension features). Because the evaluation criterion is the average AUC computed per coupon ID, the task is essentially a ranking optimization problem, so the strategy used in the classifier ensemble stage is likewise the ranking-based RANK_AVG method: ∑ weight_i / rank_i; the specific model fusion scheme is shown in Fig. 6.
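The RANK_AVG fusion, Σ weight_i / rank_i, can be sketched as below. The function name `rank_avg` is hypothetical. The sketch assumes that rank 1 goes to the highest score within each model, and that ties are broken by sample order; the text gives neither convention, and the model weights are placeholders.

```python
from typing import List, Sequence


def rank_avg(model_scores: Sequence[Sequence[float]],
             weights: Sequence[float]) -> List[float]:
    """Fuse per-model scores as sum_i weight_i / rank_i for every sample."""
    n = len(model_scores[0])
    fused = [0.0] * n
    for scores, weight in zip(model_scores, weights):
        # Rank 1 = highest score in this model; only the order matters,
        # so models with differently scaled outputs can be combined.
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            fused[i] += weight / rank
    return fused


if __name__ == "__main__":
    model_a = [0.9, 0.1, 0.5]      # probabilities
    model_b = [10.0, 30.0, 20.0]   # raw scores on another scale
    print(rank_avg([model_a, model_b], [0.5, 0.5]))
```

Because only ranks enter the sum, a model whose outputs are raw margins and one whose outputs are probabilities contribute on the same footing.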
Step 16: Using the model thus built, predict the user's coupon usage from the user's historical consumption data. Based on the predicted coupon usage, merchants can optimize the distribution of O2O coupons, strengthen their marketing capability, and increase the profit generated per unit of investment.
The above embodiments should be understood as merely illustrating the present invention and not as limiting its scope of protection. After reading the content recorded in the present invention, a person skilled in the art can make various changes or modifications to the present invention, and these equivalent changes and modifications likewise fall within the scope of the claims of the present invention.

Claims (8)

1. A big data prediction method for O2O coupon usage, characterized in that it comprises the following steps:
101. Obtain the user's historical consumption data set and preprocess it;
102. Label the preprocessed historical consumption data set of the user, and divide it to build a training set and a prediction set;
103. Divide the feature groups of the user's training set into three major categories: the labeling-month feature group, the consumption-month feature group, and the coupon-receipt-and-consumption-month feature group;
104. Perform feature selection on the feature groups and handle the imbalanced data in the data set;
105. Perform multi-classifier ensemble learning on the data after feature selection and imbalanced-data handling;
106. Build the model and predict the user's coupon usage from the user's historical consumption data, so as to optimize the distribution of O2O coupons.
2. The big-data method for predicting O2O reward voucher usage according to claim 1, characterized in that
the preprocessing of the user's historical consumption data set in step 101 comprises the following steps:
S1011, obtaining the user's historical consumption data through the merchant platform;
S1012, filling missing values in the user's historical consumption data: in the raw data table, missing values are the character string 'null' and are uniformly converted to the NULL type;
S1013, converting the distance information in the user's historical consumption data to the double type, and converting dates from character strings to the DateTime type;
S1014, converting the discount rate: "spend X, save Y" (full-reduction) offers are converted into discount form, the conversion formula being:
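Steps S1012 and S1013 can be illustrated with a minimal sketch. The field names `distance`, `date_received`, and `date`, as well as the `YYYYMMDD` date format, are assumptions made for the example and are not stated in the claim. The discount conversion of S1014 is deliberately left out, because its formula is not reproduced in the text above.

```python
from datetime import datetime


def clean_record(raw):
    """Normalize one raw record (a dict of strings) along the lines of S1012-S1013."""
    # S1012: the string 'null' marks a missing value and becomes None.
    rec = {k: (None if v == "null" else v) for k, v in raw.items()}
    # S1013: distance becomes a floating-point number.
    if rec.get("distance") is not None:
        rec["distance"] = float(rec["distance"])
    # S1013: date strings become datetime objects (format is an assumption).
    for key in ("date_received", "date"):
        if rec.get(key) is not None:
            rec[key] = datetime.strptime(rec[key], "%Y%m%d")
    return rec


row = clean_record({"user_id": "1439408", "distance": "1",
                    "date_received": "20160528", "date": "null"})
```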
3. The big-data method for predicting O2O reward voucher usage according to claim 1 or 2, characterized in that step 102 labels the user's historical consumption data set and divides it to build a training set and a prediction set, specifically:
S1021, labeling the training set according to the training-set labeling principle: first selecting the records that have a voucher collection date, then labeling as 1 the records in which the voucher was used for consumption within 15 days, and labeling the rest as 0;
S1022, according to the principle that the online and offline data distributions correspond, dividing the behavior patterns in the original table into three classes: pure voucher-collection behavior in the labeling month, consumption behavior in the preceding month, and voucher-collection and consumption behavior in the preceding half month.
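The labeling rule of S1021 can be sketched as follows. This is an illustration under stated assumptions: the function name `label_record` is hypothetical, records without a collection date are returned as `None` so that the caller can drop them, and day 15 is treated as inside the window, which the claim does not specify.

```python
from datetime import datetime, timedelta


def label_record(date_received, date_consumed, window_days=15):
    """Return None for records without a voucher collection date, else 1 or 0."""
    if date_received is None:
        return None  # not part of the labeled training set
    if date_consumed is None:
        return 0  # voucher collected but never used
    delta = date_consumed - date_received
    # Assumption: the window is inclusive of day 15.
    in_window = timedelta(0) <= delta <= timedelta(days=window_days)
    return 1 if in_window else 0
```

For example, a voucher collected on 1 May and used on 10 May would be labeled 1, while one used on 20 May would be labeled 0.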
4. The big-data method for predicting O2O reward voucher usage according to claim 3, characterized in that step 103 divides the feature groups of the training set of the user's historical consumption data set into three major categories, namely the labeling-month feature group, the consumption-month feature group, and the voucher-collection-and-consumption-month feature group, specifically:
S1031, dividing the attributes of the data set by type into key-type and value-type attributes; key-type attributes mainly serve as the keys for extracting sub-feature-group features and for merging multiple sub-feature groups, while value-type attributes are mainly used to extract the corresponding features;
S1032, according to the structure of the training set, dividing the feature groups into three major categories, namely the labeling-month feature group, the consumption-month feature group, and the voucher-collection-and-consumption-month feature group; the three feature groups are further divided, according to their respective keys, into 8 sub-feature groups;
S1033, discretizing the relevant features: first, discretizing the distance feature, which is treated both as a numeric feature and as a nominal feature and is discretized into 12 dimensions, so that, in addition to statistics on the original numeric value, count statistics under each of these dimensions are added; second, processing the time information;
S1034, before feature extraction, analyzing the relevant information obtainable from each single attribute, so as to provide a thorough basis for the subsequent extraction of sub-feature groups;
S1035, counting by date the daily number of vouchers collected and the daily number of consumptions in the training set to obtain a consumption behavior chart, and, based on the peaks, troughs, and trend of that chart, using whether a date falls within a few days before a holiday as a feature;
S1036, since a user may collect, in the current month, multiple reward vouchers with different distances, different discount rates, and different dates, constructing 8 ranking features based on ordering within the labeling month: the rank of the collection date among the user's collections of the same reward voucher this month, the number of times the user collected the same reward voucher in one day, the ratio of the number of reward vouchers the user collected at the merchant to the number the user collected in the current month, the rank of the converted discount rate, the distance rank, the collection-date rank, the rank of the spending threshold ("full" amount), and the rank of the reduction amount;
S1037, according to the consumption behavior of users and merchants, counting the number of occurrences under each behavior category, obtaining a behavior ratio feature group by combining these counts, and adding this feature group.
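One of the ranking features of S1036, the rank of a collection date among the same user's collections of the same reward voucher, might be computed as in the sketch below. The record layout `(user_id, coupon_id, date_received)` and the function name are assumptions for illustration; dates are assumed to be `YYYYMMDD` strings so that string comparison matches chronological order, and equal dates share the same rank.

```python
from collections import defaultdict


def receive_date_rank(records):
    """records: list of (user_id, coupon_id, date_received as 'YYYYMMDD').

    Returns, for each record, the 1-based rank of its collection date among
    the same user's collections of the same voucher (earliest date = 1).
    """
    groups = defaultdict(list)
    for user, coupon, day in records:
        groups[(user, coupon)].append(day)
    ranks = []
    for user, coupon, day in records:
        earlier = sum(1 for d in groups[(user, coupon)] if d < day)
        ranks.append(earlier + 1)
    return ranks


records = [("u1", "c1", "20160601"), ("u1", "c1", "20160610"),
           ("u1", "c2", "20160605"), ("u1", "c1", "20160603")]
ranks = receive_date_rank(records)
```

The other ranking features (discount rate, distance, threshold, reduction) would follow the same grouping pattern with a different sort key.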
5. The big-data method for predicting O2O reward voucher usage according to claim 4, characterized in that the feature group of step S1037 includes the following features: each user's voucher collections at each merchant as a percentage of total voucher collections; each user's consumption count at each merchant as a percentage of total consumption count; each user's voucher uses within 15 days at each merchant as a percentage of total consumptions within 15 days; each user's voucher-consumption count as a percentage of total consumption count; each user's share of consumption count at each distance; each user's share of consumption at each discount rate; the holiday consumption share; the holiday voucher-collection share; the number of reward vouchers issued by each merchant as a proportion of the total number of reward vouchers; and the share of each reward voucher among those issued by each merchant.
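The first ratio feature in claim 5 can be sketched as below. The claim does not say which total is used as the denominator; this example assumes it is the user's own total number of collections. The input layout `(user_id, merchant_id)` and the function name are likewise hypothetical.

```python
from collections import Counter


def user_merchant_receive_share(receipts):
    """receipts: list of (user_id, merchant_id), one entry per voucher collected.

    Returns {(user, merchant): collections at that merchant divided by the
    user's total collections}. The denominator choice is an assumption.
    """
    per_pair = Counter(receipts)
    per_user = Counter(user for user, _ in receipts)
    return {(u, m): c / per_user[u] for (u, m), c in per_pair.items()}


shares = user_merchant_receive_share(
    [("u1", "m1"), ("u1", "m1"), ("u1", "m2"), ("u2", "m1")]
)
```

The remaining ratios in the claim have the same count-over-count shape and differ only in what is counted in the numerator and denominator.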
6. The big-data method for predicting O2O reward voucher usage according to claim 4, characterized in that the feature selection and the handling of unbalanced data in step 104 are specifically:
S1041, performing feature selection using a method based on feature clustering;
S1042, for the class-imbalanced classification problem of the data set, adopting negative-sample undersampling with a sampling rate of 10:1, ensuring that the ratio of positive to negative samples is 1:1.
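Negative-sample undersampling as in S1042 can be sketched as follows. The sketch assumes that the goal is simply to keep all positive samples and draw an equal number of negatives at random; how the stated 10:1 sampling rate relates to the final 1:1 ratio is not spelled out in the claim, so the example data merely starts from a 10:1 imbalance. The function name and the fixed seed are hypothetical.

```python
import random


def undersample_negatives(samples, ratio=1.0, seed=0):
    """samples: list of (features, label) with label 1 (positive) or 0 (negative).

    Keeps every positive and a random subset of negatives whose size is
    ratio * number_of_positives (capped at the available negatives).
    """
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    keep = min(len(neg), int(round(ratio * len(pos))))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = pos + rng.sample(neg, keep)
    rng.shuffle(balanced)
    return balanced


data = [((i,), 1) for i in range(5)] + [((i,), 0) for i in range(50)]
balanced = undersample_negatives(data)
```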
7. The big-data method for predicting O2O reward voucher usage according to claim 6, characterized in that the method based on feature clustering mainly comprises the following steps: two sets are initialized, with all attributes of the existing data set placed in set A and the other set B left empty; a subset is randomly selected from set A and placed in set B, and iteration then begins: in each round, one attribute is selected from set A and moved to set B such that the sum of the reduction in training error of the attributes of set B and the increase in training error of the attributes of set A is minimal; iteration stops when the difference between the training error of B and the training error of A is minimal, at which point sets A and B are the two separated views; finally, in each of the separated views, the TOP K features are selected using xgboost-based feature selection and used for training, where K is 30% of the total feature dimension.
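The final TOP K selection in claim 7 can be sketched as below; the view separation itself is not covered. The sketch assumes that a per-feature importance score is already available for one view (for instance from a trained xgboost model) and is passed in as a plain dictionary. The function name, the rounding of K, and the minimum of one feature are assumptions.

```python
def top_k_features(importances, fraction=0.30):
    """importances: {feature_name: importance score} for one view.

    Returns the names of the top K features, with K = fraction of the total
    feature dimension (rounded, at least 1).
    """
    k = max(1, round(len(importances) * fraction))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


# Hypothetical importance scores for a view with 10 features.
scores = {f"f{i}": float(i) for i in range(10)}
selected = top_k_features(scores)
```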
8. The big-data method for predicting O2O reward voucher usage according to claim 7, characterized in that step 105 trains and tests on the data set resulting from the above operations and performs multi-classifier ensemble learning to obtain the result, finally yielding the complete processing scheme, specifically:
S1501, constructing the multi-classifier ensemble using a stacking strategy;
S1502, selecting three models for ensembling: XGBoost on all features, GBDT on all features, and XGB on 700-dimensional features; because the evaluation criterion is the average AUC per reward voucher ID, only the ranking of each classifier's output needs to be considered, and the strategy used in the classifier ensembling stage is the RANK_AVG method, which is likewise based on ranking optimization: $\sum_i \mathrm{weight}_i / \mathrm{rank}_i$.
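The evaluation criterion named in S1502, the average AUC per reward voucher ID, can be illustrated with the sketch below. It computes a pairwise AUC for each voucher ID and averages the results. Skipping voucher IDs that have only positive or only negative records is an assumption, as are the function names and the example data.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ordered correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined with a single class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc_by_coupon(coupon_ids, labels, scores):
    """Average the AUC over voucher IDs, skipping single-class IDs (assumption)."""
    groups = defaultdict(lambda: ([], []))
    for c, y, s in zip(coupon_ids, labels, scores):
        groups[c][0].append(y)
        groups[c][1].append(s)
    aucs = [a for a in (auc(ys, ss) for ys, ss in groups.values()) if a is not None]
    return sum(aucs) / len(aucs)


result = mean_auc_by_coupon(
    ["a", "a", "a", "b", "b", "c"],
    [1, 0, 0, 1, 0, 1],
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.5],
)
```

Because this metric depends only on the order of scores within each voucher ID, fusing models by rank rather than by raw probability does not lose information that the metric would use.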
CN201710341039.3A 2017-05-16 2017-05-16 A kind of O2O reward vouchers use big data Forecasting Methodology Pending CN107301562A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710341039.3A CN107301562A (en) 2017-05-16 2017-05-16 A kind of O2O reward vouchers use big data Forecasting Methodology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710341039.3A CN107301562A (en) 2017-05-16 2017-05-16 A kind of O2O reward vouchers use big data Forecasting Methodology

Publications (1)

Publication Number Publication Date
CN107301562A true CN107301562A (en) 2017-10-27

Family

ID=60137179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710341039.3A Pending CN107301562A (en) 2017-05-16 2017-05-16 A kind of O2O reward vouchers use big data Forecasting Methodology

Country Status (1)

Country Link
CN (1) CN107301562A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132347A1 (en) * 2003-08-12 2009-05-21 Russell Wayne Anderson Systems And Methods For Aggregating And Utilizing Retail Transaction Records At The Customer Level
CN104809226A (en) * 2015-05-07 2015-07-29 武汉大学 Method for early classifying imbalance multi-variable time sequence data
CN105389480A (en) * 2015-12-14 2016-03-09 深圳大学 Multiclass unbalanced genomics data iterative integrated feature selection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
立刻有: "『 天池竞赛』O2O优惠券使用预测思路总结", 《HTTPS://BLOG.CSDN.NET/SHINE19930820/ARTICLE/DETAILS/53995369》 *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019085704A1 (en) * 2017-11-06 2019-05-09 北京京东尚科信息技术有限公司 Method and apparatus for increasing the number of active users
CN109754273A (en) * 2017-11-06 2019-05-14 北京京东尚科信息技术有限公司 The method and apparatus for promoting any active ues quantity
CN107944913A (en) * 2017-11-21 2018-04-20 重庆邮电大学 High potential user's purchase intention Forecasting Methodology based on big data user behavior analysis
US20220051282A1 (en) * 2018-01-19 2022-02-17 Intuit Inc. Method and system for using machine learning techniques to identify and recommend relevant offers
CN108876436A (en) * 2018-05-25 2018-11-23 广东工业大学 A kind of electric business discount coupon based on integrated model uses probability forecasting method
CN108985335B (en) * 2018-06-19 2021-04-27 中国原子能科学研究院 Integrated learning prediction method for irradiation swelling of nuclear reactor cladding material
CN108985335A (en) * 2018-06-19 2018-12-11 中国原子能科学研究院 The integrated study prediction technique of nuclear reactor cladding materials void swelling
CN108959562A (en) * 2018-07-04 2018-12-07 北京京东尚科信息技术有限公司 Apply the magnanimity regular data processing method and system on block chain
CN109102324B (en) * 2018-07-12 2021-08-20 创新先进技术有限公司 Model training method, and red packet material laying prediction method and device based on model
CN109102324A (en) * 2018-07-12 2018-12-28 阿里巴巴集团控股有限公司 Model training method, the red packet material based on model are laid with prediction technique and device
CN109034658A (en) * 2018-08-22 2018-12-18 重庆邮电大学 A kind of promise breaking consumer's risk prediction technique based on big data finance
CN109389431A (en) * 2018-09-30 2019-02-26 北京三快在线科技有限公司 Distribution method, device, electronic equipment and the readable storage medium storing program for executing of discount coupon
CN109146580A (en) * 2018-09-30 2019-01-04 青岛大学 A kind of O2O coupon distribution method and system based on big data analysis
CN109509033A (en) * 2018-12-14 2019-03-22 重庆邮电大学 A kind of user buying behavior big data prediction technique under consumer finance scene
CN109741114A (en) * 2019-01-10 2019-05-10 博拉网络股份有限公司 A kind of user under big data financial scenario buys prediction technique
CN109711906A (en) * 2019-01-10 2019-05-03 哈步数据科技(上海)有限公司 A kind of distribution method and system of favor information
CN109741117A (en) * 2019-02-19 2019-05-10 贵州大学 A kind of discount coupon distribution method based on intensified learning
CN109934623A (en) * 2019-02-26 2019-06-25 中山大学 Individual economy consuming capacity prediction technique based on user's APP usage behavior
CN110210888A (en) * 2019-04-18 2019-09-06 深圳壹账通智能科技有限公司 Resource service condition monitoring method, device, electronic equipment and storage medium
CN110348999A (en) * 2019-06-29 2019-10-18 北京淇瑀信息科技有限公司 The recognition methods of financial risks sensitive users, device and electronic equipment
CN110348999B (en) * 2019-06-29 2023-12-22 北京淇瑀信息科技有限公司 Financial risk sensitive user identification method and device and electronic equipment
CN112561557A (en) * 2019-09-26 2021-03-26 治略资讯整合股份有限公司 Coupon distribution system and coupon distribution method
CN110782277A (en) * 2019-10-12 2020-02-11 上海陆家嘴国际金融资产交易市场股份有限公司 Resource processing method, resource processing device, computer equipment and storage medium
CN110766467A (en) * 2019-10-25 2020-02-07 深圳乐信软件技术有限公司 Electronic ticket delivery monitoring method and device, server and storage medium
CN110827093A (en) * 2019-11-14 2020-02-21 北京爱笔科技有限公司 Method and device for accurate marketing
CN110992106A (en) * 2019-12-11 2020-04-10 上海风秩科技有限公司 Training data acquisition method and device, and model training method and device
CN110992106B (en) * 2019-12-11 2023-11-03 上海风秩科技有限公司 Training data acquisition method, training data acquisition device, model training method and model training device
CN111553542B (en) * 2020-05-15 2023-09-05 无锡职业技术学院 User coupon verification rate prediction method
CN111553542A (en) * 2020-05-15 2020-08-18 无锡职业技术学院 User coupon verification and sale rate prediction method
CN113760521A (en) * 2020-09-22 2021-12-07 北京沃东天骏信息技术有限公司 Virtual resource allocation method and device
CN112819538A (en) * 2021-02-04 2021-05-18 长沙理工大学 User task prediction method and device, computer equipment and storage medium
CN113010869A (en) * 2021-03-11 2021-06-22 北京百度网讯科技有限公司 Method, apparatus, device and readable storage medium for managing digital content
CN113010869B (en) * 2021-03-11 2023-08-29 北京百度网讯科技有限公司 Method, apparatus, device and readable storage medium for managing digital content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171027
