WO2001020512A2 - Method for modeling market response rates - Google Patents
- Publication number
- WO2001020512A2 (PCT/US2000/024414)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- prospects
- selecting
- list
- variables
- group
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
Definitions
- the present invention generally relates to direct marketing, and more specifically to modeling market response rates to direct solicitations.
- Direct marketing usually involves directly contacting persons or entities, such as for example by mail, with a specific message or solicitation.
- the persons to be contacted are usually identified by a mailing list.
- Today's direct marketer faces a variety of problems, such as rising postal and printing costs, which affect the cost of doing business.
- the success of a direct mailing is dependent on the number of responses created by the direct mailing (i.e., the response rate).
- when a direct mail piece is blindly mailed to everyone on a mailing list (e.g., mass mailings), the response rate will in all likelihood be low.
- Cost-conscious direct marketers use their knowledge about persons identified on a mailing list (i.e., a prospect) to determine the best prospects to mail to.
- a marketer will use a set of descriptor variables about each prospect, such as demographics and credit card ownership, to target good prospects (i.e., prospects who will find the mailing interesting).
- the Rao and Steckel model includes acquiring a set of descriptor variables and conducting a knowledge engineering session to screen the variables.
- a marketing committee may be appointed and prior experience and intuition may be used to pick out the demographic variables most relevant to the response rate.
- the probability of a response is modeled as a beta-logistic distribution whose parameters are estimated by maximum likelihood. A response score, R(i), and a profit score, P(i), are then generated for each prospect i.
- each prospect is assigned a value of R(i) x P(i), and each prospect is ranked high to low based on the assigned value of R(i) x P(i).
- the ranking of prospects is intended to account for both responsiveness and profitability of the direct mailing proposal.
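The prior-art ranking described above can be sketched in a few lines. This is only an illustration of sorting by R(i) x P(i); the `Prospect` type and the score values are hypothetical, and the beta-logistic estimation that would produce the scores is not shown.

```python
from typing import NamedTuple

class Prospect(NamedTuple):
    prospect_id: str
    response_score: float  # R(i), assumed to be already estimated
    profit_score: float    # P(i), assumed to be already estimated

def rank_prospects(prospects):
    """Return prospects sorted high to low by R(i) x P(i)."""
    return sorted(
        prospects,
        key=lambda p: p.response_score * p.profit_score,
        reverse=True,
    )

# Hypothetical scores, for illustration only.
prospects = [
    Prospect("a", 0.02, 100.0),
    Prospect("b", 0.10, 50.0),
    Prospect("c", 0.05, 30.0),
]
ranked = rank_prospects(prospects)
```

A mailing would then go to the top of `ranked` down to whatever cutoff the budget allows.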
- the variable screening process based on opinion is subjective and error-prone.
- the present invention is a method for modeling market response rates.
- the method is used to evaluate and filter large contact lists with the aim of accomplishing two goals.
- the tactical goal is to improve a market response rate to cut costs associated with mailing, phone and electronic mail campaigns and produce more leads.
- the strategic goal is to assess the incremental risk of non-responsiveness associated with the incremental volume derived from growing a market in different directions (i.e., the tradeoffs between growing business (e.g., more solicitation) and risk of loss).
- the method of the present invention uses an internal experience database and an external demographic database, coupled with variable screening schemes and non-parametric modeling techniques.
- the method comprises a number of steps.
- a data acquisition step associates descriptor variables with prospects.
- a variable selection step picks out the descriptor variables used to identify the prospects most likely to respond to a direct marketing solicitation.
- a model selection step examines and assesses a number of competitive algorithms, and selects the algorithm that will best predict the response rate.
- a parameter estimation step ensures the best fit of data once an algorithm is chosen.
- a validation step ensures the robustness of the modeling process. Robustness means the model will work even though the data in the future is likely to be different from the data used to build the model.
- Fig. 1 is a flow diagram illustrating the steps of one embodiment of a method for modeling market response rates in accordance with the present invention;
- Fig. 2 is a block diagram illustrating an example of cross-referencing of mailing records with multiple universal files in the data acquisition step of the method illustrated in Fig. 1;
- Fig. 3 is a flow diagram illustrating steps included in the variable selection step of the method illustrated in Fig. 1;
- Fig. 4 is a table showing an exemplary subset of Census Tract and Block
- Fig. 5 is a table showing the results of testing of models constructed using training data
- Fig. 6 is a table showing the performance of ZIP5 classifiers
- Fig. 7 is a table showing the performance of Census Tract and Block Group classifiers
- Fig. 8 is a table showing the performance of Donnelley classifiers.
- Fig. 9 is a graph summarizing the results shown in Figs. 6-8.
- the first embodiment of the invention is a method for estimating response rates for a direct marketing campaign.
- this invention will be described with reference to a direct marketing campaign that uses the mail; however, other channels can be used with this invention, such as the phone, the Internet (electronic mail), fax machines, etc.
- a direct marketing campaign through the mail can provide a variety of information to the intended recipients.
- One possible example is to mail pieces advertising long term health care insurance. This invention should not be limited to advertising long term health care insurance and can be used for a variety of other insurance applications as well as other areas that do not relate to insurance.
- the method of this invention associates demographic variables to prospects and uses non-parametric modeling techniques to predict mailing response rates for the prospects.
- the method is operable in two modes, a training mode and a testing mode, for cross-validation purposes.
- the data set is divided into two sets, a training data set that is used to build the model, and a test data set that is used to test the robustness of the model.
- in the training mode, historical mailing data is analyzed off-line and a decision logic (i.e., a model) is formulated to estimate the mailing response rates.
- the decision logic analyzes prospects on the fly and predicts the response rates for prospects.
- the method first comprises the step of acquiring data.
- This step generally comprises attaching household or area level demographics to a prospect (e.g., a mailing record), randomly sampling the prospects, and splitting the randomly sampled prospects into a training set and a testing set.
- mailing records on a mailing list are cross-referenced with a universal file (i.e., the entire data set) so that information regarding the demographic variables associated with a mailing record is attached to that mailing record.
- the mailing records are cross referenced with multiple universal files.
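The cross-referencing step can be pictured as a keyed lookup. The sketch below assumes each universal file is a dictionary keyed by a shared field; the field names (`zip5`, `median_income`) and the sample values are hypothetical and are not taken from any vendor file.

```python
def attach_demographics(mailing_records, universal_file, key="zip5"):
    """Attach demographics to each mailing record by a shared key."""
    matched, unmatched = [], []
    for record in mailing_records:
        demographics = universal_file.get(record[key])
        if demographics is None:
            unmatched.append(record)  # no entry in this universal file
        else:
            matched.append({**record, **demographics})
    return matched, unmatched

# Hypothetical data, for illustration only.
universal_file = {
    "10001": {"median_income": 72000},
    "60601": {"median_income": 65000},
}
mailing_records = [
    {"name": "A", "zip5": "10001", "responded": True},
    {"name": "B", "zip5": "99999", "responded": False},
]
matched, unmatched = attach_demographics(mailing_records, universal_file)
```

Records left in `unmatched` could be tried against a second universal file, which is one way to read the use of multiple files.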
- the mailing records can be broken down into groups using universal files available from various vendors, including for example Donnelley, Census Tract and Block Group ("CTBG") and ZIP5. If multiple universal files are used, preferably the mailing records are broken down into subgroups.
- Group 1 Matched with
- individual, household and area level demographic information is attached to the mailing record; this information is useful for identifying the segments having the strongest relationship to the mailing response rate.
- an equal number of responders and non-responders is included in the group.
- all of the responders and a sub-sample of the non-responders (i.e., a random drawing of non-responders) are typically included, because non-responders far outnumber responders.
- Each group is next randomly split into two sets - a training set and a testing set.
- the training set may be about 2/3 of the size of the group, and the testing set may be about 1/3 of the group.
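A minimal sketch of the sampling and splitting just described, assuming each record is a dictionary with a boolean `responded` field (a hypothetical layout). It keeps every responder, draws an equal number of non-responders at random, and splits the result roughly 2/3 to 1/3.

```python
import random

def balanced_split(records, seed=0, train_fraction=2 / 3):
    """Build a balanced group, then split it into training and testing sets."""
    rng = random.Random(seed)
    responders = [r for r in records if r["responded"]]
    non_responders = [r for r in records if not r["responded"]]
    # Keep all responders; randomly sub-sample an equal number of non-responders.
    sample_size = min(len(responders), len(non_responders))
    group = responders + rng.sample(non_responders, sample_size)
    rng.shuffle(group)
    cut = round(len(group) * train_fraction)
    return group[:cut], group[cut:]

# Hypothetical list: 30 responders among 1,000 records.
records = [{"id": i, "responded": i < 30} for i in range(1000)]
train, test = balanced_split(records)
```

Fixing the seed makes the split reproducible, which helps when several classifiers are later compared on the same testing set.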
- the method next comprises the step of selecting descriptor variables.
- descriptor variables are selected using the misclassification rate as a measure of the discrimination power of each input variable given the same size of tree it constructs.
- a model takes as input a list of prospects attached with demographic variables (X's) and known responses (Y's) and produces as output four numbers: first, the number of known responders classified as responders (sensitivity of the classifier); second, the number of known responders classified as non-responders (missed-opportunity); third, the number of known non-responders classified as responders (wasted-mail); and fourth, the number of known non-responders classified as non-responders (specificity of the classifier). It is preferred that both missed-opportunity and wasted-mail be minimized.
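The four output numbers are the cells of a two-by-two confusion table. The sketch below counts them from known responses and a classifier's mail or no-mail decisions; the dictionary keys and the sample data are hypothetical labels chosen for this illustration.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes for known responses vs. classifier decisions."""
    counts = {
        "responders_found": 0,        # responder classified as responder
        "missed_opportunity": 0,      # responder classified as non-responder
        "wasted_mail": 0,             # non-responder classified as responder
        "non_responders_skipped": 0,  # non-responder classified as non-responder
    }
    for is_responder, mailed in zip(actual, predicted):
        if is_responder and mailed:
            counts["responders_found"] += 1
        elif is_responder:
            counts["missed_opportunity"] += 1
        elif mailed:
            counts["wasted_mail"] += 1
        else:
            counts["non_responders_skipped"] += 1
    return counts

actual = [1, 1, 1, 0, 0, 0, 0, 0]     # known responses (Y's)
predicted = [1, 1, 0, 1, 0, 0, 0, 0]  # classifier decisions
counts = confusion_counts(actual, predicted)
misclassification_rate = (
    counts["missed_opportunity"] + counts["wasted_mail"]
) / len(actual)
```

The misclassification rate computed at the end is the fraction of prospects falling in the two error cells.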
- preferably, CART, a commercially available statistical algorithm for classification, is used for variable selection. Assuming there are N input variables and one output variable, and that there are equal numbers of responders and non-responders in the training data set, a tree model is constructed for each input variable, and the output variable is used to measure how good the variable is, as shown in Fig. 3. Next, the tree is allowed to grow until the size of each terminal node is preferably no smaller than 1/100 of the original data set. Next, the tree is pruned until the number of terminal nodes is preferably around 10, which provides a balance between robustness and accuracy. Next, the misclassification rate of the tree model is computed.
- each of the N tree models is ranked in ascending order of its misclassification rate.
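The idea of scoring each input variable by the error of a tree built on that variable alone can be illustrated with a much simpler stand-in. The sketch below is not CART: it assumes a one-split tree (a decision stump) per variable instead of a grown and pruned tree with about 10 terminal nodes, and the variable names and data are hypothetical.

```python
def stump_error(values, labels):
    """Lowest misclassification rate of any single-threshold rule on one variable."""
    n = len(labels)
    best = min(sum(labels), n - sum(labels)) / n  # no split: predict the majority
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Each side predicts its majority class, so its errors are the minority count.
        errors = min(sum(left), len(left) - sum(left)) + min(
            sum(right), len(right) - sum(right)
        )
        best = min(best, errors / n)
    return best

def rank_variables(table, labels, top_k=20):
    """Rank variables in ascending order of single-variable misclassification rate."""
    scored = sorted((stump_error(column, labels), name) for name, column in table.items())
    return [name for _, name in scored[:top_k]]

labels = [1, 1, 1, 0, 0, 0]  # balanced responders and non-responders
table = {
    "income": [9, 8, 7, 3, 2, 1],  # separates the classes perfectly
    "noise": [1, 2, 1, 2, 1, 2],   # carries little information
}
ranked = rank_variables(table, labels)
```

With real data the same ranking loop would wrap whatever tree learner is available; only the per-variable error function changes.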
- the top 20 trees and their input variables are selected. For example, a subset of CTBG variables selected by CART is shown in Fig. 4. This step secondly involves selecting variables out of the available Donnelley, CTBG and ZIP5 variables. In this regard, each input variable is grouped into two samples: responders and non-responders. The mean difference of the two groups for the particular input variable is next tested. In addition, the variance difference of the two groups is tested. If both the mean and variance values are significantly different, then the input variable is selected. The selection criterion in this case is that both P-values from the two-sample T-test and F-test are significant at the 0.01 level.
- the variables that are common to the two groups of variables are preferably used.
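The second screen and the final intersection can be sketched as follows. This computes only the test statistics: it assumes the unequal-variance (Welch) form of the two-sample t statistic, and it stops short of P-values, which would need t and F distribution functions from a statistics library. The variable names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_statistics(responders, non_responders):
    """Return (t statistic, F ratio) comparing one variable across two groups."""
    m1, m2 = mean(responders), mean(non_responders)
    v1, v2 = variance(responders), variance(non_responders)
    # Welch form of the t statistic (an assumption; the pooled form is also common).
    t_stat = (m1 - m2) / sqrt(v1 / len(responders) + v2 / len(non_responders))
    # Larger variance over smaller; both variances must be non-zero.
    f_ratio = max(v1, v2) / min(v1, v2)
    return t_stat, f_ratio

t_stat, f_ratio = two_sample_statistics([2.0, 4.0, 6.0], [1.0, 1.5, 2.0])

# Variables chosen by the tree-based screen and by the two tests (hypothetical names).
tree_selected = {"median_income", "home_value", "pct_over_65"}
test_selected = {"median_income", "pct_over_65", "household_size"}
common = sorted(tree_selected & test_selected)
```

Keeping only `common` is one reading of using the variables shared by the two groups of selected variables.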
- the method next comprises the step of selecting a classifier that will best model the mailing response rate.
- available classifiers that are to be considered as possible models are selected.
- commercially available classifiers including METROMAIL, Multivariate Adaptive Regression Splines, Logistic Regression, Neural Networks with Back-Propagation, CART and No Data Optimal Classifier (e.g., human intuition)
- the selected classifiers are compared using available universal files, such as for example, the ZIP5 universal file.
- the selected classifiers are constructed using the selected universal file that is split into a training set, to construct the classifier, and a test set, to test the robustness of the constructed classifier.
- multiple universal files are used.
- the ZIP5 universal file is split into a training set and a testing set, and the training set is used to construct a number of different classifiers such as
- METROMAIL
- MRS: Multivariate Adaptive Regression Splines
- LR: Logistic Regression
- NN-BP: Neural Networks with Back-Propagation
- NDOC: No Data Optimal Classifier
- the ZIP5 testing set is used to test the constructed classifiers for robustness.
- the ZIP5 universal file contained 8,407 responders and 454,732 non-responders.
- the results of testing each model, constructed using the training data and validated using the test data, are shown.
- the cost per missed-opportunity (i.e., no mailing was made to a prospect who would have responded)
- the cost per wasted-mail is estimated to be $.33 (i.e., the cost of postage).
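Given the two per-error costs, a classifier's total misclassification cost is a weighted sum of its two error counts. The sketch below uses the $0.33 wasted-mail cost stated above; the missed-opportunity cost is left as a parameter, and the error counts in the example are hypothetical.

```python
WASTED_MAIL_COST = 0.33  # cost of postage, per the text

def total_misclassification_cost(missed, wasted, missed_opportunity_cost):
    """Dollar cost of a classifier's errors on a test set."""
    return missed * missed_opportunity_cost + wasted * WASTED_MAIL_COST

# Hypothetical error counts, with a missed-opportunity cost of $20.
cost = total_misclassification_cost(missed=10, wasted=1000, missed_opportunity_cost=20.0)
```

Classifiers can then be compared on this single dollar figure rather than on raw error counts.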
- the CART classifier was determined to be the best. The results of the test, and the best classifier will vary according to the classifiers used, the universal file used, and the assumptions made.
- the method further comprises the step of estimating the parameters.
- NODAC no-data optimal classifier
- the parameters that can be set in tree structured classification include the priors, π(i), and variable misclassification costs, C(j
- the priors are more or less fixed; not much can be done about the 1.56% response rate on average. Consequently, instead of bumping up the 1.56% response rate, it is preferred that the classifier's prediction accuracy is improved by using better estimates of misclassification costs.
- break-even costs for the ZIP5 priors are $17.85 and $0.33 for missed-opportunity and wasted-mail, respectively.
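One reading that reproduces the $17.85 figure: the break-even missed-opportunity cost is the value at which mailing to everyone (paying for wasted mail to every non-responder) costs the same as mailing to no one (missing every responder). The sketch below applies that reading to the ZIP5 counts given earlier; the function name is hypothetical.

```python
RESPONDERS = 8_407        # ZIP5 responders, per the text
NON_RESPONDERS = 454_732  # ZIP5 non-responders, per the text
WASTED_MAIL_COST = 0.33   # dollars per wasted mail piece

def break_even_missed_cost(responders, non_responders, wasted_cost):
    """Missed-opportunity cost at which mail-all and mail-none cost the same.

    mail-all cost  = non_responders * wasted_cost
    mail-none cost = responders * missed_cost
    """
    return wasted_cost * non_responders / responders

break_even = break_even_missed_cost(RESPONDERS, NON_RESPONDERS, WASTED_MAIL_COST)
```

Rounded to cents, `break_even` comes to 17.85, which matches the stated break-even cost of missed-opportunity for ZIP5.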
- the estimate of the wasted-mail cost is presumed, with high confidence, to be accurate to within 10%.
- the missed-opportunity cost depends on how the profit is modeled.
- the lower bound of the figure is the break-even cost of missed-opportunity: $17.85 for the ZIP5.
- if the estimate of the missed-opportunity cost is increased, we will be tempted to mail out to all prospects, because the cost of a missed opportunity would be too high.
- Fig. 6 illustrates the performance of ZIP5 classifiers.
- Figs. 7 and 8 show the performance of CTBG and Donnelley, respectively.
- there are two numbers in each cell of Fig. 6. One is the total misclassification cost. The other is the percentage improvement over METROMAIL. If the missed-opportunity cost is estimated as $20, then CART is clearly the better classifier. If it is estimated as $60, then both CART and NODAC give similar performance. If the estimate of missed-opportunity is further increased to $100, then CART acts exactly like NODAC (i.e., mailing out to all prospects). Note that the performance of NODAC is the same throughout the three different estimates of missed-opportunity because the break-even cost is around $20. Consequently, NODAC just mails out to all prospects as long as the cost of missed-opportunity is estimated to be greater than $20.
- CART is the better classifier if the missed-opportunity cost is around $20. If the missed-opportunity is estimated beyond $60, then both CART and NODAC behave in the same way.
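The NODAC behavior described above follows from a simple rule: with no data about individual prospects, the only choices are to mail everyone or no one, and the cheaper of the two is taken. The sketch below applies that rule to the full ZIP5 counts purely for illustration; it does not reproduce the test-set figures in Fig. 6, and the function name is hypothetical.

```python
RESPONDERS = 8_407
NON_RESPONDERS = 454_732
WASTED_MAIL_COST = 0.33

def nodac_cost(missed_opportunity_cost):
    """Return the no-data decision and its total misclassification cost."""
    mail_all = NON_RESPONDERS * WASTED_MAIL_COST       # every non-responder is wasted mail
    mail_none = RESPONDERS * missed_opportunity_cost   # every responder is missed
    if mail_all <= mail_none:
        return "mail all", mail_all
    return "mail none", mail_none

results = {cost: nodac_cost(cost) for cost in (10, 20, 60, 100)}
```

At $20, $60 and $100 the decision is "mail all" and the cost is the same fixed wasted-mail total, which is why the NODAC figure does not move across the three estimates; below the break-even point (the $10 case) the rule flips to "mail none".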
- the results shown in Figs. 6-8 are summarized in Fig. 9.
- the X-axis is the cost estimate of missed-opportunity, while the Y-axis is the dollars saved over the METROMAIL classifier, summed across the three demographics. If a lead's value is less than $20, then it is not worth doing any business. Note that a lead's value is the same as the missed-opportunity cost. At the break-even cost, $20, CART can save $318,000 over the current METROMAIL classifier. If a lead's value is $60, then either CART or NODAC can save up to $1.8MM. If a lead is valued at $100, then either CART or NODAC can save over $4.4MM over METROMAIL.
- CART is the better classifier if a lead is valued less than $60. If a lead's value is greater than $60, then there is not much gained using CART over
- the missed-opportunity cost is 303 times greater than that of the wasted-mail cost
- the method of the present invention provides a consistent and sustainable process for building response models which can be used in a variety of direct marketing scenarios that use the phone, Internet (electronic mail), fax machines, etc.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002389222A CA2389222A1 (en) | 1999-09-15 | 2000-09-06 | Method for modeling market response rates |
AU73513/00A AU7351300A (en) | 1999-09-15 | 2000-09-06 | Method for modeling market response rates |
EP00961577A EP1224590A2 (en) | 1999-09-15 | 2000-09-06 | Method for modeling market response rates |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US39659999A | 1999-09-15 | 1999-09-15 | |
US09/396,599 | 1999-09-15 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001020512A2 (en) | 2001-03-22 |
WO2001020512A8 (en) | 2002-04-11 |
Family
ID=23567909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2000/024414 WO2001020512A2 (en) | 1999-09-15 | 2000-09-06 | Method for modeling market response rates |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1224590A2 (en) |
AU (1) | AU7351300A (en) |
CA (1) | CA2389222A1 (en) |
WO (1) | WO2001020512A2 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7698159B2 (en) | 2004-02-13 | 2010-04-13 | Genworth Financial Inc. | Systems and methods for performing data collection |
US7801748B2 (en) | 2003-04-30 | 2010-09-21 | Genworth Financial, Inc. | System and process for detecting outliers for insurance underwriting suitable for use by an automated system |
US7813945B2 (en) | 2003-04-30 | 2010-10-12 | Genworth Financial, Inc. | System and process for multivariate adaptive regression splines classification for insurance underwriting suitable for use by an automated system |
US7818186B2 (en) | 2001-12-31 | 2010-10-19 | Genworth Financial, Inc. | System for determining a confidence factor for insurance underwriting suitable for use by an automated system |
US7844477B2 (en) | 2001-12-31 | 2010-11-30 | Genworth Financial, Inc. | Process for rule-based insurance underwriting suitable for use by an automated system |
US7844476B2 (en) | 2001-12-31 | 2010-11-30 | Genworth Financial, Inc. | Process for case-based insurance underwriting suitable for use by an automated system |
US7895062B2 (en) | 2001-12-31 | 2011-02-22 | Genworth Financial, Inc. | System for optimization of insurance underwriting suitable for use by an automated system |
US7899688B2 (en) | 2001-12-31 | 2011-03-01 | Genworth Financial, Inc. | Process for optimization of insurance underwriting suitable for use by an automated system |
US8005693B2 (en) | 2001-12-31 | 2011-08-23 | Genworth Financial, Inc. | Process for determining a confidence factor for insurance underwriting suitable for use by an automated system |
US8793146B2 (en) | 2001-12-31 | 2014-07-29 | Genworth Holdings, Inc. | System for rule-based insurance underwriting suitable for use by an automated system |
US10055795B2 (en) | 2001-06-08 | 2018-08-21 | Genworth Holdings, Inc. | Systems and methods for providing a benefit product with periodic guaranteed minimum income |
CN111047343A (en) * | 2018-10-15 | 2020-04-21 | 京东数字科技控股有限公司 | Method, device, system and medium for information push |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8433634B1 (en) | 2001-06-08 | 2013-04-30 | Genworth Financial, Inc. | Systems and methods for providing a benefit product with periodic guaranteed income |
-
2000
- 2000-09-06 EP EP00961577A patent/EP1224590A2/en not_active Withdrawn
- 2000-09-06 WO PCT/US2000/024414 patent/WO2001020512A2/en not_active Application Discontinuation
- 2000-09-06 AU AU73513/00A patent/AU7351300A/en not_active Abandoned
- 2000-09-06 CA CA002389222A patent/CA2389222A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
No Search * |
Also Published As
Publication number | Publication date |
---|---|
WO2001020512A8 (en) | 2002-04-11 |
CA2389222A1 (en) | 2001-03-22 |
AU7351300A (en) | 2001-04-17 |
EP1224590A2 (en) | 2002-07-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
WWE | Wipo information: entry into national phase | Ref document number: 73513/00 Country of ref document: AU |
AK | Designated states | Kind code of ref document: C1 Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: C1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
D17 | Declaration under article 17(2)a | |
WWE | Wipo information: entry into national phase | Ref document number: 2000961577 Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 2389222 Country of ref document: CA |
WWP | Wipo information: published in national office | Ref document number: 2000961577 Country of ref document: EP |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
NENP | Non-entry into the national phase in: | Ref country code: JP |
WWW | Wipo information: withdrawn in national office | Ref document number: 2000961577 Country of ref document: EP |