US20080154884A1 - System and method for analyzing and correcting retail data - Google Patents
- Publication number
- US20080154884A1 (application US11/926,381)
- Authority
- US
- United States
- Prior art keywords
- data
- sources
- stage
- attribute
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0204—Market segmentation
Definitions
- the present invention relates to computer software, and more particularly, but not exclusively, relates to systems and methods for analyzing and correcting retail data.
- POS: point of sale
- IRI: Information Resources, Inc.
- ACN: A.C. Nielsen
- Sampling errors are those errors attributable to the normal (random) variation that would be expected due to the fact that, by the very act of sampling, measurements are not being taken from the entire population. Sampling errors can be reduced by increasing the sample size, since the standard deviation of the sampling distribution (often referred to as the “standard error”) is inversely proportional to the square root of the sample size.
- Biases are systematic errors that affect any sample taken by a particular sampling method. Because these errors are systematic, they are not affected by the size of the sample. Examples of panel biases include, but are not limited to:
- While both bias and sampling error are present in consumer panel data, for panels of a size significant enough to be of use in tracking consumer purchases (e.g., the IRI and ACN panels), the vast majority of the error that is present is due to bias. Further, since bias is unaffected by sample size, the negative impact of bias relative to the negative impact of sampling error worsens as the panel size increases.
- The negative impact of bias is substantially larger than that of sampling error for most products. Increasing the size of the sample (i.e., the size of the panel) will reduce only the sampling error and may, in fact, worsen any bias that may be present. Given the sizes of today's consumer panels, there is limited advantage to be gained by increasing the size of the panel, since over 90% of the total error is often due to non-sampling errors (i.e., bias).
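- The relationship between sampling error and bias described above can be sketched numerically. The sketch below is only an illustration: the population standard deviation and the fixed bias are hypothetical values, not figures from this document, and total error is assumed to be measured as mean squared error (bias squared plus standard error squared).

```python
import math

# Hypothetical values chosen only for illustration; not taken from the source.
SIGMA = 50.0   # assumed population standard deviation of a purchase measure
BIAS = 2.0     # assumed systematic error of the panel, independent of n


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def bias_share(sigma: float, bias: float, n: int) -> float:
    """Fraction of mean squared error (bias^2 + SE^2) that is due to bias."""
    se = standard_error(sigma, n)
    return bias ** 2 / (bias ** 2 + se ** 2)


# The standard error falls as the panel grows, while the bias share rises.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(standard_error(SIGMA, n), 3), round(bias_share(SIGMA, BIAS, n), 3))
```

Under these assumptions, enlarging the panel shrinks only the random component, so the fixed bias accounts for a growing share of what error remains.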
- Coverage includes both the number of channels in which measurements are reported and the business usefulness of those measurements. While Information Resources, Inc.'s (IRI's) point-of-sale (POS) based services provide excellent coverage of the Food/Grocery, Drug, Mass (excluding WALMART®), Convenience, and Military channels, these channels may account for only 50% of a manufacturer's sales—and as little as 20% of its sales growth. Non-tracked, growth channels—e.g., Club, Dollar, WALMART®—are, thus, becoming an increasingly important part of manufacturers' businesses while at the same time having little data available in the way of actionable sales measurement information. Further advancements are also needed in this area.
- One form of the present invention is a unique system for analyzing and correcting retail data.
- Another form includes operating a computer system that has several client workstations and servers coupled together over a network.
- At least one server is a database server that stores sale data for various data sources, product identifier and attribute categorizations, calculated factors, and other data. External sources can be used to feed the data store on a scheduled or on-demand basis.
- At least one server is a server that contains business logic for analyzing and correcting some of the data sources stored in the database server.
- Some client workstations can be used to administer settings used in the process of analyzing and correcting the data sources.
- Other client workstations can be used to view the corrected and/or uncorrected data in a multi-dimensional format using a graphical user interface.
- Another form includes providing a computer system that uses multiple data sources to support inferences that would not be feasible based upon any single data source when used alone.
- Sales are positioned along product, venue, and time dimension hierarchies. Characteristics of the data source determine the level of aggregation at which the data can be positioned in the framework. For example, POS data may be available weekly in a particular channel; however, direct store delivery (DSD) data may be available at a daily level, and still other measures may be available only at a monthly or quarterly level.
- DSD: direct store delivery
- the situation is similar along the product and venue dimensions—ranging from the specificity of the sale of a particular UPC-coded item at a particular store to the generality of total category sales within a channel (across all geographies).
- the data fusion process itself is an iterative one, utilizing both competitive and complementary fusion methods.
- In competitive fusion, two or more data sources that provide overlapping measurements along at least one dimension are compared (“competed”) against each other at some level of aggregation along the product, venue, and time dimensions. More accurate/reliable sources are used to correct less accurate/reliable sources.
- In complementary fusion, relationships modeled where data sources overlap are projected to areas of the data framework in which fewer (or even a single) sources exist, enhancing the accuracy/reliability of those fewer (or single) sources even in domains where data from the other sources upon which the models were based do not exist.
- The process is iterative in that the competitive and complementary fusion methodologies can be repeated at varying levels of aggregation of the data framework.
- Another form includes providing a method for identifying and quantifying biases in consumer panel data so that the inherent utility of the consumer panel data may be enhanced.
- This method is termed competitive fusion.
- At least two data sources are used, with at least one assumed to be more accurate than the other—e.g., scanner-based POS data and consumer panel purchase data.
- the data sources are aligned along a common framework (i.e., data model or hierarchy) along the dimensions of product (item), venue (channel and/or geography), and/or time, with aggregation along these dimensions as necessary.
- the attributes associated with the framework are identified along which the framework may be characterized.
- the data sources are compared along these attributes—quantifying the impact of the attributes on the less-accurate data source.
- the usefulness of the consumer panel data may be enhanced.
- the effect of the biases may be corrected for via modeling; i.e., the raw data may be adjusted to reduce or eliminate the effect of the biases.
- panel management practices may be changed in order to remove or lessen the source of bias in the panel itself.
- Yet another form of the present invention includes providing a method for using complementary fusion to “project” the results and relationships from the competitive fusion method onto consumer panel data in a channel with incomplete/less data than desired (e.g. data from WALMART®) to help enhance the accuracy of the Panel data source.
- competitive fusion may be used again in several possible ways and at several levels of aggregation along the venue, time, and/or product dimensions in order to develop independent estimates against which the complementary-fused estimate may be competed:
- FIG. 1 is a diagrammatic view of a computer system of one embodiment of the present invention.
- FIG. 2 is a multi-dimensional diagram illustrating the data space used by the system of FIG. 1 .
- FIG. 3 is a block diagram illustrating selected data sources that are used by the system of FIG. 1 .
- FIG. 4 is a high-level process flow diagram for the system of FIG. 1 .
- FIG. 5A is a first part process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing competitive and complementary fusion.
- FIG. 5B is a second part process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing competitive and complementary fusion.
- FIG. 6A is a first part process flow diagram for the system of FIG. 1 demonstrating a preferred process for calculating and applying factors in competitive fusion.
- FIG. 6B is a second part process flow diagram for the system of FIG. 1 demonstrating a preferred process for calculating and applying factors in competitive fusion.
- FIG. 6C is a third part process flow diagram for the system of FIG. 1 demonstrating a preferred process for calculating and applying factors in competitive fusion.
- FIG. 7A is a first part process flow diagram for the system of FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion.
- FIG. 7B is a second part process flow diagram for the system of FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion.
- FIG. 7C is a third part process flow diagram for the system of FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion.
- FIG. 8 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing complementary fusion.
- FIG. 9 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in iteratively performing competitive and complementary fusion steps.
- FIG. 10 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in calculating blended factors where multiple factor measures are available for the same factor.
- FIG. 11 is a data table illustrating hypothetical data elements stored in the database of FIG. 1 to be used in accordance with the procedure of FIG. 6 .
- FIG. 12 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for a first attribute in accordance with the procedure of FIG. 6 .
- FIG. 13 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for a second attribute in accordance with the procedure of FIG. 6 .
- FIG. 14 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for a third attribute in accordance with the procedure of FIG. 6 .
- FIG. 15 is a data table illustrating hypothetical data elements stored in the database of FIG. 1 , with attribute summaries, and used in accordance with the procedure of FIG. 7 .
- FIG. 16 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for three attributes in accordance with the procedure of FIG. 7 .
- FIG. 17 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and used in accordance with the complementary fusion procedure of FIG. 8 .
- FIG. 18 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 , adjusted using complementary fusion according to the factors calculated in accordance with the procedure of FIG. 7 , as described in the procedure of FIG. 8 .
- FIG. 19 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and are used to perform another iteration of competitive fusion, including calculating blended factors, as described in the procedures of FIG. 9 and FIG. 10 .
- FIG. 20 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and updated based upon the blended factor, as described in the procedures of FIG. 9 and FIG. 10 .
- FIG. 21 is a data table illustrating hypothetical real, original, and corrected values stored in the database of FIG. 1 to show how the competitive and complementary fusion process helped improve the data, as described in the procedures of FIG. 9 .
- FIG. 22 is a simulated screen of a user interface for one or more client workstations of FIG. 1 that allows a user to view the multi-dimensional elements in the database, as described in the procedures of FIG. 4 and FIG. 5 .
- FIG. 1 is a diagrammatic view of computer system 20 of one embodiment of the present invention.
- Computer system 20 includes computer network 22 .
- Computer network 22 couples together a number of computers 21 over network pathways 23 a - e .
- system 20 includes several servers, namely business logic server 24 and database server 25 .
- System 20 also includes external data sources 26 , which in various embodiments include other computers, files, electronic and/or paper data sources. External data sources 26 are optionally coupled to network 22 over pathway 23 f .
- System 20 also includes client workstations 30 a , 30 b , and 30 c (collectively client workstations 30 ). While computers 21 are each illustrated as being either a server or a client, it should be understood that any of computers 21 may be arranged to provide both a client and server functionality, solely a client functionality, or solely a server functionality. Furthermore, it should be understood that while six computers 21 are illustrated, more or fewer may be utilized in alternative embodiments.
- Computers 21 include one or more processors or CPUs ( 50 a , 50 b , 50 c , 50 d , and 50 e , respectively) and one or more types of memory ( 52 a , 52 b , 52 c , 52 d , and 52 e , respectively).
- Each memory 52 a , 52 b , 52 c , 52 d , and 52 e includes a removable memory device.
- Each processor may be comprised of one or more components configured as a single unit. Alternatively, when of a multi-component form, a processor may have one or more components located remotely relative to the others. One or more components of each processor may be of the electronic variety defining digital circuitry, analog circuitry, or both.
- each processor is of a conventional, integrated circuit microprocessor arrangement, such as one or more PENTIUM III or PENTIUM 4 processors supplied by INTEL Corporation of 2200 Mission College Boulevard, Santa Clara, Calif. 95052, USA.
- Each memory is one form of computer-readable device.
- Each memory may include one or more types of solid-state electronic memory, magnetic memory, or optical memory, just to name a few.
- each memory may include solid-state electronic Random Access Memory (RAM), Sequentially Accessible Memory (SAM) (such as the First-In, First-Out (FIFO) variety or the Last-In-First-Out (LIFO) variety), Programmable Read-Only Memory (PROM), Electronically Programmable Read-Only Memory (EPROM), or Electrically Erasable Programmable Read-Only Memory (EEPROM); an optical disc memory (such as a DVD or CD ROM); a magnetically encoded hard disc, floppy disc, tape, or cartridge media; or a combination of any of these memory types.
- each memory may be volatile, nonvolatile, or a hybrid combination of volatile and nonvolatile varieties.
- each computer 21 is coupled to a display.
- Computers 21 may be of the same type, or be a heterogeneous combination of different computing devices.
- the displays may be of the same type, or a heterogeneous combination of different visual devices.
- each computer 21 may also include one or more operator input devices such as a keyboard, mouse, track ball, light pen, and/or microtelecommunicator, to name just a few representative examples.
- one or more other output devices may be included such as loudspeaker(s) and/or a printer.
- Various display and input device arrangements are possible.
- Computer network 22 can be in the form of a wired or wireless Local Area Network (LAN), Municipal Area Network (MAN), Wide Area Network (WAN) such as the Internet, a combination of these, or such other network arrangement as would occur to those skilled in the art.
- the operating logic of system 20 can be embodied in signals transmitted over network 22 , in programming instructions, dedicated hardware, or a combination of these. It should be understood that more or fewer computers 21 can be coupled together by computer network 22 .
- system 20 operates at one or more physical locations where business logic server 24 is configured as a server that hosts and runs application business logic 33 , database server 25 is configured as a database 34 that stores reference data 35 (e.g. product identifiers 36 a , attributes 36 b , and a dictionary 36 c ), at least two retail data sources (such as point-of-sale and panel data) 38 , calculated factors 39 , and other data 40 .
- In one embodiment, external data 26 is imported to database server 25 from a mainframe extract file that is generated on a periodic basis.
- In another embodiment, external data sources are not used.
- database 34 of database server 25 is a relational database and/or a data warehouse.
- database 34 can be a series of files, a combination of database tables and external files, calls to external web or other services that return data, and various other arrangements for accessing data for use in a program as would occur to one of ordinary skill in the art.
- Client workstations 30 are configured for providing one or more user interfaces to allow a user to modify settings used by business logic 33 and/or to view the retail data sources 38 of database 34 in a multi-dimensional format. Typical applications of system 20 would include more or fewer client workstations of this type at one or more physical locations, but three have been illustrated in FIG. 1 to preserve clarity.
- business logic server 24 and database server 25 could be provided on the same computer or varying other arrangements of computers at one or more physical locations and still be within the spirit of the invention. Farms of dedicated servers could also be provided to support the specific features if desired.
- FIG. 2 is a multi-dimensional cube 60 that illustrates a way of conceptually thinking about the elements stored in database 34 of system 20 .
- Cube 60 contains three dimensions: complexity 62 , sources 64 , and aggregation 66 .
- The data in database 34 is categorized according to complexity 62 , sources 64 , and aggregation 66 axes of multi-dimensional cube 60 for analysis, viewing, and/or reporting.
- Cube 60 helps illustrate the concept that the aggregation dimension 66 is multi-dimensional, although other dimensions could be used than illustrated.
- Examples of elements of the source dimension 64 include client (internal) data 65 a , scanning (point-of-sale) data 65 b , panel data 65 c , audit data 66 d , and other (external) data 66 e , to name a few.
- Examples of elements of the aggregation dimension 66 include time 67 a , item (product) 67 b , channel (venue) 67 c , geography (venue) 67 d , and other 67 e , to name a few examples.
- Various dimensions of cube 60 are used in the competitive fusion and complementary fusion processes described herein.
- FIG. 3 is a block diagram illustrating further examples of the one or more retail data sources ( 36 in FIG. 1 and 64 in FIG. 2 ) that can be used by the system of FIG. 1 in the competitive fusion and complementary fusion processes described herein.
- Point-of-sale data 70
- consumer panel data 72
- audit/survey data 74 including causal (promotional) data
- population census data 78 including geo-demographic data
- store universe data 80 including geo-demographic data
- other data sources 82 and specialty panels 84
- the types of data that can be used with system 20 are not limited to traditional retailers. For example, data collected during any part of the supply chain could be used as a data source.
- FIG. 4 illustrates the high-level procedures for performing “competitive fusion” and “complementary fusion”.
- In “competitive fusion”, two or more data sources that provide overlapping measurements along at least one dimension are compared (“competed”) against each other at some level of aggregation along the product, venue, and/or time dimensions. More accurate/reliable sources are used to correct less accurate/reliable sources.
- procedure 150 is at least partially implemented in the operating logic of system 20 .
- Procedure 150 begins with business logic server 24 identifying at least two data sources, with at least one data source being more accurate than another (stage 152 ).
- At least one data source (see e.g. 36 in FIG. 1 and 64 in FIG. 2 ) is used as the “reference” data source and another is used as the “target” data source with the biases to be identified and quantified.
- the reference data source is more accurate than the target data source.
- scanner-based point-of-sale (POS) data is typically a good “reference” source, due to its inherent accuracy and high level of granularity along the dimensions of time, venue, and product.
- manufacturer-supplied shipment data, especially where such data is based upon direct store delivery (DSD) information, may be utilized as a “reference” source.
- retailer-specific data sources (e.g., “frequent shopper” program data from loyalty cards)
- the product characteristics of the data sources should ideally be available at the item level, where “item” is by UPC, SKU, or another unique product identifier.
- As for the venue characteristics of the data sources, they should ideally be available at the retailer and market level, where “retailer” is a store (or chain of stores) within a particular retail channel and “market” is a geographic construct (e.g., Chicago area).
- As for the time characteristics of the data sources, they should ideally be available at the weekly level (or even daily in some cases), although monthly data (or 4-week “quad” data) or various other time frames are also acceptable.
- more aggregated levels of the product (e.g., “brand”), venue (e.g., “food” or “mass” channel for retailer and/or “region” or “total U.S.” for market), and time (e.g., quarterly or annual data) dimensions
- After the data sources have been identified (stage 152 ), they are next aligned along a common framework (stage 154 ), such as along the item, venue, and/or time dimensions. Depending upon the characteristics (and quality) of the data sources, some aggregation along these dimensions may be required in order for the alignment to be possible. For example, UPC-level POS data may need to be aggregated at the SKU or even brand level in order to be aligned with data from other sources (particularly in the cases in which venue-specific UPCs are involved). Similarly, store-level data may need to be aggregated at the local market or even regional level in order to be aligned with consumer panel purchase data. Finally, weekly (or even daily) POS data may need to be aggregated at the 4-week quad level in order to be aligned with shipment/delivery data. Various other arrangements for aligning the data along a common framework are also possible.
- the item structure is provided by a multiple-level hierarchy, in which UPCs are the lowest level and are aggregated along category-related characteristics.
- Venue structure is provided along both geographical and channel dimensions, with FIPS-code-level transactions being aligned along market and regions and store locations being part of a sub-chain, chain, and parent store hierarchy.
- Time structure is presently provided at the weekly level at the lowest level of aggregation, with daily data being aggregated at the weekly level before placement into the structure, although a daily data compatible structure or other variation is also possible.
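- The alignment step described above can be sketched as a roll-up of fine-grained records to a coarser common level. This is a minimal illustration only: the UPC-to-brand mapping, the record layout, and the choice of Monday as the start of a week are all assumptions, not details from this document.

```python
from collections import defaultdict
from datetime import date

# Hypothetical UPC -> brand mapping; codes and names are made up for illustration.
UPC_TO_BRAND = {"0001": "BrandA", "0002": "BrandA", "0003": "BrandB"}


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (assumed week definition)."""
    return date.fromordinal(day.toordinal() - day.weekday())


def aggregate(records):
    """Roll (upc, market, day, units) rows up to (brand, market, week) totals."""
    totals = defaultdict(float)
    for upc, market, day, units in records:
        key = (UPC_TO_BRAND[upc], market, week_start(day))
        totals[key] += units
    return dict(totals)


# Hypothetical daily UPC-level records for one market.
daily_pos = [
    ("0001", "Chicago", date(2007, 10, 1), 10.0),
    ("0002", "Chicago", date(2007, 10, 3), 5.0),
    ("0003", "Chicago", date(2007, 10, 9), 7.0),
]
weekly = aggregate(daily_pos)
```

Once two sources have been rolled up to the same (item, venue, time) keys in this way, their values can be compared key by key.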
- overlapping attribute segments of at least one dimension are available to use for data comparison and correction.
- Certain attributes associated with the data sources are identified along which more detailed comparisons may be made.
- product attributes are available from reference data 35 of database 34 .
- one or more pieces of information from product identifier 36 a , attributes 36 b , and dictionary 36 c references can be used to access or modify attributes, attribute hierarchies, and mappings.
- These attributes represent category-specific dimensions along which products in that category may be characterized (e.g., diet vs. regular in carbonated soft drinks, active ingredient in internal analgesics, product size in most categories).
- The term “attribute” as used herein is meant in the generic sense to cover various types of descriptors.
- Business logic server 24 compares the data sources and calculates factors for the attributes of at least one element of the common framework (stage 158 ). Each segment of a given attribute will have its own factor, as described in detail herein. The presence of attribute-related bias may be identified by comparison of the data sources. In the examples illustrated herein, volumetric comparisons are made (e.g., equivalent units); however, various other measures (e.g., dollar sales, actual units) could also be utilized, as long as the same type of measure is being used for the comparison. For example, it would not be useful to compare dollar sales to actual units, but it would be useful to compare dollars to dollars.
- the comparison itself is between the value of the target data source (e.g., projected panel volume) and that of the reference data source (e.g., POS data).
- This comparison can be by way of two-sample inference, regression analysis, or other statistical tests appropriate for determining whether any differences between the two data sources are associated with the attributes along which they have been characterized at a statistically significant level. Where such differences (biases) are identified, they are quantified, and factors are calculated for use in bias correction/adjustment.
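- As one hedged illustration of the statistical comparison mentioned above, the sketch below computes a paired t statistic on the differences between target and reference values for a single attribute segment. The paired design, the weekly volumes, and the segment name are assumptions made for the example; the document does not prescribe a specific test.

```python
import math
import statistics


def paired_t_statistic(target, reference):
    """t statistic for the mean of paired differences (target - reference)."""
    diffs = [t - r for t, r in zip(target, reference)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err


# Hypothetical weekly volumes for one attribute segment (e.g., "diet").
panel_volume = [80.0, 78.0, 83.0, 79.0, 81.0, 77.0]
pos_volume = [100.0, 99.0, 102.0, 101.0, 100.0, 98.0]

t_stat = paired_t_statistic(panel_volume, pos_volume)
# A large |t| relative to the chosen critical value suggests a systematic gap.
print(round(t_stat, 2))
```

A strongly negative statistic here would indicate that the panel consistently understates the segment relative to POS, which is the kind of difference a correction factor is meant to quantify.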
- the factors are used to correct bias in the less accurate data source (stage 160 ), which in this example is consumer panel data.
- By using the factors to correct the bias in the less accurate “target” data source, the effect of these biases is reduced or eliminated.
- These biases can be corrected by adjusting the raw data, or by way of post-adjustment.
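- A minimal sketch of calculating and applying factors follows. It assumes that a factor is simply the ratio of reference volume to target volume within each attribute segment and that correction is multiplicative; both are illustrative assumptions, as are the function names and the volumes.

```python
def segment_factors(reference, target):
    """Factor per attribute segment = reference volume / target volume."""
    return {seg: reference[seg] / target[seg] for seg in target if seg in reference}


def apply_factors(target, factors):
    """Scale each target segment by its factor; leave unmatched segments unchanged."""
    return {seg: vol * factors.get(seg, 1.0) for seg, vol in target.items()}


# Hypothetical volumes by segment of a single attribute ("diet" vs. "regular").
pos = {"diet": 120.0, "regular": 200.0}
panel = {"diet": 100.0, "regular": 250.0}

factors = segment_factors(pos, panel)      # panel understates diet, overstates regular
corrected = apply_factors(panel, factors)  # corrected panel volumes per segment
```

With a single attribute, the corrected target matches the reference by construction; the more interesting case is when factors for several attributes are applied in turn, which this sketch does not attempt.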
- the factors are also used to supplement the data that is incomplete in the less complete data source (stage 162 ), such as consumer panel data.
- Incomplete data is used in a general sense to mean that less data was provided than desired or that the data is less accurate than desired, to name a few non-limiting examples.
- highly accurate data (e.g., POS data)
- less accurate data (e.g., panel data)
- Relationships modeled where data sources overlap are projected to areas of the data framework in which fewer (or even a single) sources exist, enhancing the accuracy and reliability of those fewer (or single) sources even in domains where data from the other sources upon which the models were based do not exist.
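- The projection described above can be sketched as follows. The sketch assumes that factors learned in channels where both sources exist are combined by a weighted average and then applied to a channel that has panel data only. The averaging rule, the weights, the channel names, and the volumes are all hypothetical.

```python
def projected_factor(factors_by_channel, weights):
    """Weighted average of per-channel factors for one attribute segment."""
    total_weight = sum(weights[ch] for ch in factors_by_channel)
    return sum(f * weights[ch] for ch, f in factors_by_channel.items()) / total_weight


# Hypothetical factors for the "diet" segment in channels where POS and panel overlap.
diet_factors = {"food": 1.20, "drug": 1.10}
# Hypothetical weights, e.g. each channel's share of reference volume.
channel_weights = {"food": 3.0, "drug": 1.0}

factor = projected_factor(diet_factors, channel_weights)

# Panel-only channel: no reference source exists, so the projected factor is applied.
untracked_panel_volume = 40.0
adjusted_volume = untracked_panel_volume * factor
```

Other projection rules (for example, a regression on channel characteristics) would fit the same description; the weighted average is used here only because it is the simplest to read.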
- Users and/or reports can access database 34 from one of client workstations 30 to view/analyze the corrected and adjusted data (stage 164 ). Users and/or reports can also access database 34 from one of client workstations 30 to view and/or modify settings used by system 20 to make data corrections. The steps are repeated as desired (stage 166 ). The process then ends at stage 168 .
- FIGS. 5A-5B are first and second parts of a process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing competitive and complementary fusion using POS and panel data as the data sources. While in this and other figures, the first data source (the “source” data source) is described as being POS data and the second data source (the “target” data source) is described as being panel data, it will be appreciated that the system and methodologies can be used with other data sources as appropriate.
- procedure 170 is at least partially implemented in the operating logic of system 20 . Procedure 170 begins in FIG. 5A with receiving updates for reference data 35 and/or data sources 38 on a periodic basis (stage 172 ).
- a parameter specification for the number of weeks used in calculating the factors is thirteen, and the minimum week range included in database 34 is then set to be thirteen weeks prior to the update week.
- Database 34 may be built and maintained using various data sources and can include various types of data, as would occur to one of ordinary skill in the art.
- system 20 supports the option to pull the desired period (e.g. all thirteen weeks) of the data sources 38 , append the recent period (e.g. four weeks) needed since the last factor update to the existing database 34 , and/or be able to recreate the data a week at a time.
- the system can optionally drop the same number of weeks from the start week of database 34 as were appended to the end week. For example, if the option was chosen to append the four weeks needed since the last factor update, the system should drop the four oldest weeks from the existing database 34 when appending the four new weeks.
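- The append-and-drop option described above can be sketched as a rolling window. This is a minimal illustration, not the system's actual implementation; the function name `roll_weeks` and the use of plain integer week numbers are assumptions made for the example.

```python
def roll_weeks(existing_weeks, new_weeks, drop_oldest=True):
    """Append newly received weeks and optionally drop as many of the oldest.

    existing_weeks and new_weeks are lists of week numbers (hypothetical
    stand-ins for the weekly records held in the database).
    """
    combined = sorted(existing_weeks + new_weeks)
    if drop_oldest:
        # Drop the same number of weeks from the start as were appended.
        combined = combined[len(new_weeks):]
    return combined


# Thirteen weeks on hand, four new weeks since the last factor update.
window = roll_weeks(list(range(1, 14)), [14, 15, 16, 17])
print(window)  # weeks 5 through 17, still thirteen weeks long
```

- Keeping the window length fixed means every factor update is calculated over the same number of weeks.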
- the received updates to reference data 35 and/or data sources 38 are stored in database 34 (stage 174 ).
- the system determines that data adjustments should be made to correct bias (decision point 175 ).
- Application business logic 33 ensures reference data 35 and data sources 38 are up to date, and if not, updates them accordingly (stage 176 ).
- reference data 35 is reviewed to ensure that the default attributes for the current category will be appropriate for the client or scenario, and adjustments are made to reference data 35 as appropriate (stage 177 ).
- attribute segments may be reviewed and translated to more succinct segmentations that better classify the product identifiers. Other variations are also possible.
- a product-identifier-to-attribute-segment mapping is prepared for the product identifiers (e.g. UPC's) (stage 178 ). If the attributes are determined to be irrelevant, they can be removed from further consideration in this process.
- the attribute table 36 b is a reference table that maps each product identifier 36 a to a set of attribute variables. While UPC's are described as a common product identifier, other identifiers could also be used. For example, not every dataset has a UPC, but may have a product identifier at a higher, lower, or equivalent level. Rules are used to determine supportable attribute segments and relevant attributes.
- the UPC is assigned to a new segment “not supportable.” All segments with less than a 5% share are assigned to “not supportable.” Furthermore, in one embodiment, if the final “not supportable” category accounts for >50% of the category share, then the attribute is designated as “irrelevant.” Other ways for determining relevance can also be used, or relevance can simply be ignored. Stage 178 can be repeated to arrive at the final level of segments to use (rolled-up or drilled-down) as appropriate.
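- The supportability and relevance rules above can be sketched as follows. This is an illustrative reading of the 5% and 50% thresholds only; the function name, the dictionary input, and the sample volumes are hypothetical.

```python
def classify_segments(segment_volumes, min_share=0.05, max_unsupported=0.50):
    """Map small segments to "not supportable" and flag irrelevant attributes.

    segment_volumes: dict of segment name -> volume for one attribute.
    Returns (mapping, irrelevant), where mapping gives the final segment
    for each original segment.
    """
    total = sum(segment_volumes.values())
    if total <= 0:
        raise ValueError("category volume must be positive")
    mapping = {}
    unsupported_volume = 0.0
    for segment, volume in segment_volumes.items():
        if volume / total < min_share:
            # Segments with less than a 5% share are not supportable.
            mapping[segment] = "not supportable"
            unsupported_volume += volume
        else:
            mapping[segment] = segment
    # The attribute is irrelevant if "not supportable" exceeds 50% of share.
    irrelevant = unsupported_volume / total > max_unsupported
    return mapping, irrelevant


mapping, irrelevant = classify_segments({"big": 60.0, "medium": 37.0, "tiny": 3.0})
print(mapping, irrelevant)
```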
- source (e.g. POS) and target (e.g. panel) data 38 are retrieved from database 34 and summarized by attribute segments (stage 180 ).
- Factors are calculated for attribute segments (stage 181 ).
- the significance of the attribute segments is determined (stage 182 ). If any non-significant factors are determined, the significant attribute factors can be re-aligned (stage 183 ).
- the factors for each attribute segment are applied to the target (panel) data to correct bias (stage 184 ).
- the factors are also applied to the target (panel) data to correct data that is incomplete (e.g. less available) (stage 186 ).
- the competitive and/or complementary data fusion steps can be repeated as desired or appropriate (stage 187 ).
- FIGS. 6-10 illustrate the competitive and complementary fusion stages in further detail.
- FIGS. 6A-6C are first, second, and third parts of a process flow diagram for the system of FIG. 1 demonstrating a preferred process for iteratively calculating and applying factors in competitive fusion.
- procedure 200 is at least partially implemented in the operating logic of system 20 .
- Procedure 200 begins on FIG. 6A with summing source (POS) data by the most granular product and time dimension (e.g. UPC) (stage 202 ) and summing target (panel) data by the most granular product and time dimension (e.g. UPC) (stage 204 ). In one embodiment, they are both summed to weekly (e.g. 52) totals.
- Business logic server 24 determines the period of time to use in the analysis (stage 206 ), such as to use all of the weekly totals summed in the prior step or to use only part of the weekly totals that cover a desired time period, such as the most recent 13 weeks, to name a few examples.
- Outliers are also eliminated (stage 207 ) at this point or another appropriate point before final calculations. For example, in one embodiment, although thirteen weeks are contained in the dataset, only 11 weeks are actually used in calculations. Research indicates that panel volume is extremely vulnerable to outliers. To minimize the potential impact of outliers, the week with the lowest coverage and the week with the highest coverage are eliminated from further use in calculations for the current update.
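- The outlier rule above can be sketched as below. The text does not define "coverage" precisely; this example assumes it is the ratio of panel volume to POS volume in a given week, and the tuple layout and function name are hypothetical.

```python
def drop_coverage_outliers(weekly):
    """Remove the weeks with the lowest and highest coverage.

    weekly: list of (week, pos_volume, panel_volume) tuples.
    Coverage is assumed here to be panel_volume / pos_volume.
    """
    coverage = {week: panel / pos for week, pos, panel in weekly}
    lowest = min(coverage, key=coverage.get)
    highest = max(coverage, key=coverage.get)
    return [row for row in weekly if row[0] not in (lowest, highest)]


# Thirteen hypothetical weeks with steadily rising panel volume.
weekly = [(week, 100.0, 50.0 + week) for week in range(1, 14)]
kept = drop_coverage_outliers(weekly)
print([row[0] for row in kept])  # eleven weeks remain
```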
- Business logic server 24 then merges the source (POS) data, target (panel) data, and product-identifier-to-attribute-segment mapping reference data (stage 208 ). Attributes can optionally be sorted in order of importance (stage 210 ). In one embodiment, the least important is first and the most important is last. If the factors for the most important attribute segments are the last ones applied, they usually have the most significant mathematical effect, because no factor for a less important attribute segment will be applied after that last calculation to further skew the results.
- An initial factor of 1.0 is assigned to each attribute segment (stage 212 ).
- source (POS) and target (panel) data are then summarized for the segments of the current attribute (stage 214 ).
- a factor is calculated for each attribute segment of the current attribute as source data volume divided by target data volume (stage 216 ). Other mathematical variations could also be used.
- For each segment of the current attribute, the system determines whether the attribute segment is significant (stage 218 ). In one embodiment, shares are calculated for the attribute segments, such as by dividing the Calculation Period Segment Total U.S. POS volume by the Calculation Period Category Total U.S. POS volume.
- Significance is then determined by first analyzing a confidence interval (CI) for each share to determine if there is overlap between the POS share CI and the panel share CI. If there is overlap, then the difference between source and target shares is not significant and the attribute segment will be designated as “nonsignificant.” Other ways for determining significance can also be used, or significance can be assumed.
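- The overlap test above can be sketched as follows. The text does not say how the confidence intervals are computed; this example assumes a normal-approximation interval for a proportion, and the sample-size arguments, the `z` value, and both function names are assumptions made purely for illustration.

```python
import math


def share_ci(share, n, z=1.96):
    """Normal-approximation confidence interval for a share (an assumption)."""
    half_width = z * math.sqrt(share * (1.0 - share) / n)
    return share - half_width, share + half_width


def is_significant(pos_share, pos_n, panel_share, panel_n, z=1.96):
    """A segment is significant only if the two intervals do not overlap."""
    pos_lo, pos_hi = share_ci(pos_share, pos_n, z)
    panel_lo, panel_hi = share_ci(panel_share, panel_n, z)
    overlap = pos_lo <= panel_hi and panel_lo <= pos_hi
    return not overlap


print(is_significant(0.30, 10000, 0.20, 1000))  # intervals are far apart
print(is_significant(0.30, 10000, 0.29, 1000))  # intervals overlap
```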
- each volume is multiplied by the factor for the corresponding segment (stage 224 ). Again, other mathematical variations could also be used.
- the factors for each attribute segment are then saved to factor data store 39 of database 34 (stage 226 ). If another attribute is present (decision point 228 ), the next attribute is made the current attribute (stage 230 ) and stages 214 - 226 are repeated. These stages are repeated until all attributes are processed. Continuing with FIG.
- a category adjustment factor is applied to all product identifiers as necessary (stage 232 ) to adjust for the level of coverage.
- the use of a category adjustment factor depends on the type of measure being used. For example, where volume is used, coverage adjustments may not be necessary, but where shares are used, further coverage adjustments may be necessary. Any final factors for the category adjustment factor are saved to the factor data store 39 of database 34 (stage 234 ). The process 200 then ends at stage 236 .
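- The iterative loop of stages 214 through 226 can be sketched as below. This is a simplified illustration under stated assumptions: every segment is treated as significant, no category adjustment factor is applied, and the row layout, function name, and sample volumes are hypothetical.

```python
from collections import defaultdict


def iterative_factors(rows, attributes):
    """Calculate and apply factors one attribute at a time.

    rows: list of dicts with 'pos', 'panel', and one key per attribute.
    attributes: attribute names, least important first.
    Returns (factors, adjusted_panel_volumes).
    """
    adjusted = [row["panel"] for row in rows]
    factors = {}
    for attr in attributes:
        pos_total = defaultdict(float)
        panel_total = defaultdict(float)
        # Summarize source and previously adjusted target data by segment.
        for row, volume in zip(rows, adjusted):
            pos_total[row[attr]] += row["pos"]
            panel_total[row[attr]] += volume
        # Factor = source volume divided by target volume.
        for segment in pos_total:
            target = panel_total[segment]
            factors[(attr, segment)] = pos_total[segment] / target if target else 1.0
        # Apply the factors before moving on to the next attribute.
        adjusted = [
            volume * factors[(attr, row[attr])] for row, volume in zip(rows, adjusted)
        ]
    return factors, adjusted


rows = [
    {"upc": "A", "pos": 100.0, "panel": 40.0, "manufacturer": "private", "type": "regular"},
    {"upc": "B", "pos": 200.0, "panel": 160.0, "manufacturer": "branded", "type": "regular"},
    {"upc": "C", "pos": 50.0, "panel": 10.0, "manufacturer": "private", "type": "special"},
    {"upc": "D", "pos": 150.0, "panel": 90.0, "manufacturer": "branded", "type": "special"},
]
factors, adjusted = iterative_factors(rows, ["manufacturer", "type"])
print(factors[("manufacturer", "private")], sum(adjusted))
```

- Because each pass starts from the previously adjusted volumes, the segment totals of the last attribute processed match the source totals exactly in this sketch.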
- FIGS. 7A-7C are first, second, and third parts of a process flow diagram for the system of FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion.
- procedure 250 is at least partially implemented in the operating logic of system 20 .
- Procedure 250 begins on FIG. 7A with summing the more reliable (source) data source (e.g., POS data) by the most granular product and time dimension (e.g. UPC) (stage 252 ) and summing the less accurate (target) data source (e.g., panel data) by the most granular product and time dimension (stage 254 ).
- Business logic server 24 determines the period of time to use in the analysis (stage 256 ) and eliminates outliers (stage 257 ), as discussed in FIG. 6 .
- Source data, target data, and product identifiers to attribute segment mapping data are merged (stage 258 ).
- An initial factor of 1.0 is assigned to each attribute segment (stage 260 ).
- Source and target data are summarized to the segments for all attributes (stage 262 ).
- a factor is calculated for each attribute segment as source volume divided by target volume (stage 264 ).
- Business logic server 24 determines whether the attribute segment is significant (stage 266 ), as described in FIG. 6 . Where two or more segments for any particular attribute are insignificant (decision point 268 ), then the significant factors are re-aligned to account for the elimination of the insignificant segment factors in the particular attribute (stage 270 ).
- each volume is multiplied by the factor for each corresponding segment (stage 272 ). In other words, all of the factors applicable to the volume are applied simultaneously, as opposed to iteratively as shown in FIG. 6 .
- the factors are then saved to factor data store 39 for each attribute segment (stage 274 ).
- a category adjustment factor is applied to all product identifiers as necessary (stage 276 ), as described in FIG. 6 .
- the final factors for the category adjustment factor are saved to the factor data store 39 of database 34 (stage 277 ).
- the procedure 250 then ends at stage 278 .
- Procedure 250 should only be used in the appropriate circumstances, such as when the attributes are not affected by each other and iteration is not needed for greater accuracy, to name one example. If attributes are affected by each other and procedure 250 is used instead of the iterative procedure of FIG. 6 , then the results will be mathematically different, with the procedure of FIG. 6 producing a more accurate result.
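- The simultaneous variant can be sketched as below. The sketch assumes all segments are significant and omits the re-alignment and category adjustment stages; the row layout, function name, and sample volumes are hypothetical.

```python
from collections import defaultdict


def simultaneous_factors(rows, attributes):
    """Calculate every factor from the original data, then apply all at once.

    rows: list of dicts with 'pos', 'panel', and one key per attribute.
    Returns (factors, adjusted_panel_volumes).
    """
    factors = {}
    for attr in attributes:
        pos_total = defaultdict(float)
        panel_total = defaultdict(float)
        for row in rows:
            pos_total[row[attr]] += row["pos"]
            panel_total[row[attr]] += row["panel"]
        for segment in pos_total:
            target = panel_total[segment]
            factors[(attr, segment)] = pos_total[segment] / target if target else 1.0
    adjusted = []
    for row in rows:
        volume = row["panel"]
        # All factors applicable to this volume are applied together.
        for attr in attributes:
            volume *= factors[(attr, row[attr])]
        adjusted.append(volume)
    return factors, adjusted


rows = [
    {"upc": "A", "pos": 100.0, "panel": 40.0, "manufacturer": "private", "type": "regular"},
    {"upc": "B", "pos": 200.0, "panel": 160.0, "manufacturer": "branded", "type": "regular"},
    {"upc": "C", "pos": 50.0, "panel": 10.0, "manufacturer": "private", "type": "special"},
    {"upc": "D", "pos": 150.0, "panel": 90.0, "manufacturer": "branded", "type": "special"},
]
factors, adjusted = simultaneous_factors(rows, ["manufacturer", "type"])
print(sum(adjusted), sum(row["pos"] for row in rows))
```

- In this hypothetical data every factor corrects for the same overall shortfall in the panel, so multiplying them together overshoots the POS total. This is the kind of interaction between attributes that the caution above refers to.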
- FIG. 8 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing complementary fusion.
- procedure 280 is at least partially implemented in the operating logic of system 20 .
- Procedure 280 begins with merging source data, target data, and product identifier data to attribute segment mapping data (stage 282 ).
- the factors previously calculated in accordance with FIG. 6 or FIG. 7 are applied to the product identifier-level target data based on the attribute segment mapping to correct the data for incompleteness (e.g. less data than desired) (stage 286 ).
- the target data elements that are corrected in this process can be the same, different, or overlapping from the target data that was used to help calculate the factors.
- the procedure 280 then ends at stage 288 .
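- Stage 286 can be sketched as a lookup of previously saved factors through the attribute segment mapping. This is a minimal illustration; the row layout, the factor values, and the choice to fall back to a factor of 1.0 for a segment with no saved factor are assumptions.

```python
def apply_factors(target_rows, factors, attributes):
    """Apply saved segment factors to product-identifier-level target data.

    target_rows: list of dicts with 'panel' and one key per attribute.
    factors: dict keyed by (attribute, segment).
    Segments without a saved factor are left unchanged (factor 1.0),
    which is an assumption of this sketch.
    """
    adjusted_rows = []
    for row in target_rows:
        volume = row["panel"]
        for attr in attributes:
            volume *= factors.get((attr, row[attr]), 1.0)
        adjusted_rows.append({**row, "adjusted": volume})
    return adjusted_rows


# Hypothetical factors calculated where POS and panel data overlap.
factors = {("manufacturer", "private"): 3.0, ("type", "special"): 1.25}

# Panel data for a retailer with no POS coverage.
target_rows = [
    {"upc": "X", "retailer": "club", "panel": 8.0, "manufacturer": "private", "type": "special"},
    {"upc": "Y", "retailer": "club", "panel": 5.0, "manufacturer": "branded", "type": "regular"},
]
result = apply_factors(target_rows, factors, ["manufacturer", "type"])
print([row["adjusted"] for row in result])
```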
- FIG. 9 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in repeating the competitive and complementary fusion steps multiple times.
- procedure 290 is at least partially implemented in the operating logic of system 20 .
- Procedure 290 begins with determining what additional public or private data sources are available to use for competitive fusion along venue, time, and/or product dimensions (stage 292 ). Using one or more of those data sources, additional factors are calculated that are independent estimates against which the complementary-fused estimate may be competed (stage 294 ). The newly calculated factors are applied to the product identifier-level target data (e.g. POS data) to further adjust the data (stage 296 ).
- the competitive and complementary fusion steps can be repeated as desired and/or appropriate (stage 298 ).
- the procedure 290 then ends at stage 299 .
- FIG. 10 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in calculating blended factors where multiple factor measures are available for the same factor.
- procedure 300 is at least partially implemented in the operating logic of system 20 .
- Procedure 300 can be used when competitive fusion is being performed and at least two data sources are available for the same factor (stage 302 ).
- Factors for each aggregation of the current data source are calculated by dividing source data volume by target data volume (stage 305 ).
- a blended factor is calculated (stage 308 ) in which the more accurate source is given a higher weight and the less accurate source is given a lower weight.
- a blended factor uses an “inverse-variance-weighted” method (see 444 on FIG. 19 as an example).
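- An inverse-variance-weighted blend can be sketched as below. The formula is the standard inverse-variance weighting; how the variance of each factor estimate would be obtained is not described in the text, so the variances and the function name here are hypothetical.

```python
def blend_factors(estimates):
    """Blend factor estimates, weighting each by the inverse of its variance.

    estimates: list of (factor, variance) pairs, one per data source.
    Lower variance (a more accurate source) yields a higher weight.
    """
    weights = [1.0 / variance for _, variance in estimates]
    weighted_sum = sum(w * factor for w, (factor, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Two hypothetical estimates of the same factor.
blended = blend_factors([(1.2, 0.01), (1.5, 0.04)])
print(blended)  # closer to 1.2, the lower-variance estimate
```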
- FIG. 11 is a data table illustrating hypothetical data elements that are adjusted according to the preferred embodiment competitive fusion procedure of FIG. 6 .
- POS data 320 , panel data 322 , and attribute information 324 are shown in a summarized form by UPC 326 . For each attribute and its corresponding segments, various steps are performed as discussed below.
- the data is assumed to be relevant and the POS and panel data shown in table 330 are then summarized for the segments of the current attribute (stage 214 ), which in the current iteration is manufacturer 332 .
- Private brand label summaries 334 and non-private brand label summaries 336 for POS 338 and panel data 340 are calculated from table 330 as illustrated.
- a factor 342 for each attribute segment of the current attribute, in this case private label manufacturer 334 and non-private label manufacturer 336 segments, is calculated as POS volume 338 divided by panel volume 340 (stage 216 ).
- Business logic server 24 determines whether the current attribute segment is significant (stage 218 ). For purposes of illustrating the current example, all attribute segments are also assumed significant.
- each panel volume 344 is multiplied by the factor 342 for its corresponding segment (stage 224 ) to arrive at an adjusted panel value 346 .
- Factors 342 are saved to the factor data store 39 of database 34 (stage 226 ).
- stages 214 to 226 repeat for each attribute, with previously adjusted data being used in the calculation.
- FIG. 13 illustrates data elements being adjusted according to factors calculated for a second attribute in accordance with the procedure of FIG. 6 .
- the POS and panel data shown in table 350 are then summarized for the segments of the current attribute (stage 214 ), which in the current iteration is type 352 .
- Summaries for regular type 354 and special type 356 for POS 358 and panel data 360 are calculated from table 350 as illustrated.
- a factor 362 for each attribute segment of the current attribute, in this case regular type 354 and special type 356 segments, is calculated as POS Volume 358 divided by panel volume 360 (stage 216 ).
- the previously adjusted panel volume 364 is multiplied by the factor 362 for its corresponding segment (stage 224 ) to arrive at yet another adjusted panel value 366 .
- Factors 362 are saved to the factor data store 39 of database 34 (stage 226 ).
- FIG. 14 illustrates data elements being adjusted according to factors calculated for a third attribute in accordance with the procedure of FIG. 6 .
- the POS and panel data shown in table 370 are then summarized for the segments of the current attribute (stage 214 ), which in the current iteration is size 372 .
- Summaries for size big 374 , size medium 375 , and size small 376 for POS 378 and panel data 380 are calculated from table 370 as illustrated.
- a factor 382 for each attribute segment of the current attribute, in this case size big 374 , medium 375 , and small 376 segments, is calculated as POS Volume 378 divided by panel volume 380 (stage 216 ).
- each previously adjusted panel volume 384 is multiplied by the factor 382 for its corresponding segment (stage 224 ) to arrive at yet another adjusted panel value 386 .
- Factors 382 are saved to the factor data store 39 of database 34 (stage 226 ). After processing all attributes, the final factors are saved to the factor data store 39 of database 34 (stage 234 ). The process then ends at stage 236 .
- FIGS. 15 and 16 illustrate data elements being adjusted according to factors calculated according to an alternative embodiment competitive fusion process in accordance with the procedure of FIG. 7 .
- Business logic server 24 determines the period of time to use in the analysis (stage 256 ), and merges POS, panel, and attribute information by UPC as shown in table 390 (stage 258 ).
- POS data 392 and panel data 394 are summarized for all attribute segments (stage 262 ), in this case by manufacturer 396 , type 398 , and size 400 .
- factors for each attribute segment 402 are calculated as each respective POS volume 404 divided by each respective panel volume 406 (stage 264 ).
- Each panel volume 407 is multiplied by the factors 408 a - 408 c appropriate for its corresponding segment (stage 272 ) to calculate an adjusted panel value 410 .
- the process ends at stage 278 .
- FIG. 17 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and used in accordance with the complementary fusion procedure of FIG. 8 .
- POS, panel and attribute information are merged by UPC (stage 282 ) for multiple retailers, as shown in table 420 .
- Client shipment data 424 is also merged by UPC.
- Shares are calculated for POS data 420 a - 420 b and panel data 422 a - 422 c for the segments of each attribute (stage 284 ).
- the previously calculated factors 430 a - 430 c ( 408 a - 408 c in FIG. 16 ) are applied to the UPC level panel data 432 a - 432 c to further adjust the data to correct for incompleteness (stage 286 ) and arrive at an adjusted panel value 434 a - 434 c .
- the complementary fusion process then ends at stage 288 .
- FIGS. 19 and 20 illustrate performing another iteration of competitive fusion, including calculating blended factors, as described in the procedures of FIG. 9 and FIG. 10 .
- Additional public or private data sources are identified as available to use for competitive fusion (stage 292 ).
- channel specific totals 440 a - 440 f across attributes have been identified for use in competitive fusion.
- client shipment total 440 e and panel total 440 f can also be used for comparison.
- additional factors 442 have been calculated that are independent estimates against which the complementary-fused data from FIG. 18 may be competed (stage 294 ).
- a blended factor 444 has been calculated since multiple data sources were available for the same factor (stages 302 - 308 in FIG. 10 ). As shown in FIGS. 19 and 20 , each volume 446 a - 446 c of the previously adjusted UPC-level panel data is then multiplied by the blended factor to arrive at the newly adjusted panel values 450 a - 450 c (stage 298 in FIG. 9 , and stage 310 in FIG. 10 ).
- FIG. 21 is a data table illustrating hypothetical table 460 of end results for POS data elements by retailers 2 and 3 , with a comparison to reality figures 462 a - 462 b , pre-fusion figures 464 a - 464 b , and post-fusion figures 466 a - 466 b to show how the competitive and complementary fusion processes according to FIGS. 4-10 and illustrated in the hypothetical of FIGS. 11-20 helped improve the data accuracy.
- FIG. 22 is a simulated screen of a user interface for one or more client workstations 30 that allows a user to view the multi-dimensional elements in the database, as described in the procedures of FIG. 4 and FIG. 5 .
- the updated data can be used by various systems, users, and/or reports as appropriate.
- a method comprising identifying a plurality of data sources, wherein at least a first data source is more accurate than a second data source; identifying a plurality of overlapping attribute segments to use for comparing the data sources; calculating a factor as a function of each of the plurality of overlapping attribute segments; and using the factors to update a first group of values in the second data source to reduce bias.
- a method comprising receiving point-of-sale data and panel data on a periodic basis; identifying a plurality of product identifiers and a plurality of attributes to analyze; retrieving and summarizing the point-of-sale data and the panel data by the plurality of product identifiers, the plurality of attributes, and a plurality of corresponding attribute segments for a specified time period; calculating a factor for each attribute segment of a particular attribute; and applying the factors for the particular attribute segment to the panel data to correct panel bias.
- a method comprising receiving point-of-sale data and panel data on a periodic basis; identifying a plurality of product identifiers and a plurality of attributes to analyze; retrieving and summarizing the point-of-sale data and the panel data by the plurality of product identifiers, the plurality of attributes, and a plurality of corresponding attribute segments for a specified time period; calculating a plurality of factors, wherein one factor is calculated for each attribute segment of the plurality of attributes; and applying the factors to the second data source to reduce bias; and applying the factors to the second data source to reduce incompleteness.
- a method comprising identifying a plurality of product identifiers and a plurality of attributes to analyze for at least two data sources, wherein at least a first data source is more accurate than a second data source; retrieving and summarizing the first data source and the second data source by the plurality of product identifiers, the plurality of attributes, and a plurality of corresponding attribute segments for a specified time period; calculating a plurality of factors, wherein one factor is calculated for each attribute segment of the plurality of attributes; applying the factors to the second data source to reduce bias; and applying the factors to a different or overlapping dataset of the second data source to reduce incompleteness.
- a system comprising one or more servers being operable to store retail data from at least two data sources, store product identifier and attribute categorizations, and store a plurality of factor calculations; wherein the at least two data sources includes a first data source that is more accurate than a second data source; and wherein one or more of said servers contains business logic that is operable to identify and retrieve a plurality of overlapping attribute segments to use for comparing the at least two data sources, compare each of the overlapping attribute segments, calculate a factor for each of the overlapping attribute segments, and use the factors to update a first group of values in the second data source to reduce bias.
- an apparatus comprises a device encoded with logic executable by one or more processors to: identify and retrieve a plurality of overlapping attribute segments to use for comparing at least two data sources, wherein the at least two data sources includes a first data source that is more accurate than a second data source, compare each of the overlapping attribute segments, calculate a factor for each of the overlapping attribute segments, and use the factors to update a first group of values in the second data source to reduce bias.
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Data Mining & Analysis (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A computer system and method are disclosed that analyze and correct retail data. The system includes several client workstations and one or more servers coupled together over a network. A database stores various data used by the system. A business logic server uses competitive and complementary fusion to analyze and correct some of the data sources stored in the database server. The data fusion process itself is an iterative one, utilizing both competitive and complementary fusion methods. In competitive fusion, two or more data sources that provide overlapping attributes are compared against each other. More accurate/reliable sources are used to correct less accurate/reliable sources. In complementary fusion, relationships modeled where data sources overlap are projected to areas of the data framework in which fewer sources exist, enhancing the accuracy/reliability of those fewer sources even in the absence of the other sources upon which the models were based.
Description
- The present invention relates to computer software, and more particularly, but not exclusively, relates to systems and methods for analyzing and correcting retail data.
- The measurement of sales in retail channels can be done via a variety of methods. Initially, sample-based audits of consumer purchases at check-out were extensively utilized—but were costly and subject to significant potential inaccuracies. With the advent and accuracy improvement in scanner-based point of sale (POS) data, tracking services such as those offered by Information Resources, Inc. (IRI), and A.C. Nielsen (ACN) are able to provide highly-granular (in terms of item, venue, and time), highly-accurate measurement of sales in several retail channels—including food/grocery, drug, mass merchandise, convenience, and military commissary. These POS-based offerings can be sample-based—i.e., rely on a statistically determined subset of the target population—or census-based—i.e., use all available data from all available venues.
- While POS-based measurement offerings do an excellent job of reporting “what” sold, they provide little insight into “why” something sold—since they provide no consumer-level data. To fill this need, market research companies such as IRI and ACN have recruited national consumer panels—in which panelists report their households' purchases on a regular basis. This longitudinal sample allows the development of much deeper consumer insights (e.g., brand switching, trial and repeat, etc.).
- However, consumer panels are not without their problems. As with any sample-based survey, consumer panels are subject to two types of errors, sampling errors and biases, which combine into the total error as follows: $(\text{Total Error})^2 = (\text{Sampling Error})^2 + (\text{Bias})^2$.
- Sampling errors are those errors attributable to the normal (random) variation that would be expected because, by the very act of sampling, measurements are not being taken from the entire population. Sampling errors can be reduced by increasing the sample size, since the standard deviation of the sampling distribution (often referred to as the “standard error”) is inversely proportional to the square root of the sample size.
- Biases are systematic errors that affect any sample taken by a particular sampling method. Because these errors are systematic, they are not affected by the size of the sample. Examples of panel biases include, but are not limited to:
- Recruitment bias—in which households recruited to participate in the panel are not representative of the target population (e.g., the overall population of the United States);
- Self-selection bias—in which households who choose to participate in the panel have slightly different buying habits than the average household (e.g., an orientation toward using promotions or adopting new products);
- Panelist turnover bias—in which the reporting effectiveness (accuracy and consistency) of panelists may vary over the time period in which they participate in the panel;
- Hereditary bias—in which individuals within a household share a tendency toward certain behaviors or medical conditions;
- Compliance bias—in which certain purchases or purchase occasions are consistently underreported by panelists;
- Item placement bias—in which panelists report products purchased that have not been accurately captured and/or classified in the hierarchy maintained by the data collector; and
- Projection bias—in which the weighting or projection system cannot fully adjust all geo-demographics or is stressed by over- or under-sampled segments of the target population.
- While both bias and sampling error are present in consumer panel data, for panels of a size significant enough to be of use in tracking consumer purchases (e.g., the IRI and ACN panels), the vast majority of the error that is present is due to bias. Further, since bias is unaffected by sample size, the negative impact of bias relative to the negative impact of sampling error worsens as the panel size increases.
- The negative impact of bias is substantially larger than that of sampling error for most products. Increasing the size of the sample (i.e., the size of the panel) will reduce only the sampling error and may, in fact, worsen any bias that may be present. Given the sizes of today's consumer panels, there is limited advantage to be gained by increasing the size of the panel—since over 90% of the total error is often due to non-sampling errors (i.e., bias).
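- The relationship between panel size and the share of error due to bias can be illustrated numerically. This sketch applies the total error decomposition given earlier with a standard error of sigma divided by the square root of n; the bias and sigma values are hypothetical and chosen only to show the trend.

```python
import math


def bias_share(bias, sigma, n):
    """Fraction of squared total error attributable to bias.

    Uses (Total Error)^2 = (Sampling Error)^2 + (Bias)^2, with the
    sampling error taken as the standard error sigma / sqrt(n).
    """
    sampling_error = sigma / math.sqrt(n)
    total_squared = sampling_error ** 2 + bias ** 2
    return bias ** 2 / total_squared


# The bias stays fixed while the sampling error shrinks as the panel grows.
for n in (100, 10_000, 100_000):
    print(n, round(bias_share(bias=0.05, sigma=1.0, n=n), 4))
```

- With these illustrative numbers, bias accounts for a fifth of the squared error at a panel of 100 and for well over nine tenths at a panel of 10,000, which is why adding panelists yields diminishing returns.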
- There has been little progress in the area of developing a systematic method of identifying and quantifying these biases. Further advancements are needed in this area.
- Another area of concern in retail sales measurement is “coverage”. Coverage includes both the number of channels in which measurements are reported and the business usefulness of those measurements. While Information Resources, Inc.'s (IRI's) point-of-sale (POS) based services provide excellent coverage of the Food/Grocery, Drug, Mass (excluding WALMART®), Convenience, and Military channels, these channels may account for only 50% of a manufacturer's sales—and as little as 20% of its sales growth. Non-tracked, growth channels—e.g., Club, Dollar, WALMART®—are, thus, becoming an increasingly important part of manufacturers' businesses while at the same time having little data available in the way of actionable sales measurement information. Further advancements are also needed in this area.
- One form of the present invention is a unique system for analyzing and correcting retail data.
- Other forms include unique systems and methods to identify, quantify, and correct consumer panel biases. Yet another form includes unique systems and methods to model relationships where data sources overlap to project values in areas in which fewer sources exist.
- Another form includes operating a computer system that has several client workstations and servers coupled together over a network. At least one server is a database server that stores sales data for various data sources, product identifier and attribute categorizations, calculated factors, and other data. External sources can be used to feed the data store on a scheduled or on-demand basis. At least one server contains business logic for analyzing and correcting some of the data sources stored in the database server. Some client workstations can be used to administer settings used in the process of analyzing and correcting the data sources. Other client workstations can be used to view the corrected and/or uncorrected data in a multi-dimensional format using a graphical user interface.
- Another form includes providing a computer system that uses multiple data sources to support inferences that would not be feasible based upon any single data source when used alone. Sales are positioned along product, venue, and time dimension hierarchies. Characteristics of the data source determine the level of aggregation at which the data can be positioned in the framework. For example, POS data may be available weekly in a particular channel; however, direct store delivery (DSD) data may be available at a daily level, and still other measures may be available only at a monthly or quarterly level. The situation is similar along the product and venue dimensions—ranging from the specificity of the sale of a particular UPC-coded item at a particular store to the generality of total category sales within a channel (across all geographies).
- Once this data framework is populated, the data fusion process itself is an iterative one, utilizing both competitive and complementary fusion methods. In “competitive fusion”, two or more data sources that provide overlapping measurements along at least one dimension are compared (“competed”) against each other at some level of aggregation along the product, venue, and time dimensions. More accurate/reliable sources are used to correct less accurate/reliable sources. In “complementary fusion”, relationships modeled where data sources overlap are projected to areas of the data framework in which fewer sources (or even a single source) exist, enhancing the accuracy/reliability of those sources even in domains where data from the other sources upon which the models were based do not exist. The process is iterative in that the competitive and complementary fusion methodologies can be repeated at varying levels of aggregation of the data framework.
- Another form includes providing a method for identifying and quantifying biases in consumer panel data so that the inherent utility of the consumer panel data may be enhanced. This method is termed competitive fusion. At least two data sources are used, with at least one assumed to be more accurate than the other (e.g., scanner-based POS data versus consumer panel purchase data). The data sources are aligned along a common framework (i.e., data model or hierarchy) along the dimensions of product (item), venue (channel and/or geography), and/or time, with aggregation along these dimensions as necessary. Attributes along which the framework may be characterized are then identified. The data sources are compared along these attributes, quantifying the impact of the attributes on the less accurate data source.
- After these biases have been identified and quantified, the usefulness of the consumer panel data may be enhanced. The effect of the biases may be corrected for via modeling; i.e., the raw data may be adjusted to reduce or eliminate the effect of the biases. Furthermore, as appropriate, panel management practices may be changed in order to remove or lessen the source of bias in the panel itself.
- Yet another form of the present invention includes providing a method for using complementary fusion to “project” the results and relationships from the competitive fusion method onto consumer panel data in a channel with incomplete/less data than desired (e.g. data from WALMART®) to help enhance the accuracy of the Panel data source. At this point, competitive fusion may be used again in several possible ways and at several levels of aggregation along the venue, time, and/or product dimensions in order to develop independent estimates against which the complementary-fused estimate may be competed:
-
- Publicly available data about the incomplete channel (e.g., channel reports, reported sales and financials, store databases, geo-demographics, etc.) may be used to develop an independent venue (channel) estimate.
- Publicly available data about the category of interest (e.g., category studies, industry reports, reported sales/financials, etc.) may be used to develop an independent category estimate.
- Private data from manufacturer-partners (e.g., shipment data, delivery data, retailer-supplied data, etc.) may be used to develop independent channel and category estimates. Due to the potentially sensitive nature of some of these data sources, this competitive fusion may be performed inside a manufacturer's facility—as an auxiliary input to the baseline model.
- Private data from retailer-partners within a Collaborative Retail Exchange may be used in some venues to develop independent channel and category estimates.
- Yet other forms, embodiments, objects, advantages, benefits, features, and aspects of the present invention will become apparent from the detailed description and drawings contained herein.
- FIG. 1 is a diagrammatic view of a computer system of one embodiment of the present invention.
- FIG. 2 is a multi-dimensional diagram illustrating the data space used by the system of FIG. 1.
- FIG. 3 is a block diagram illustrating selected data sources that are used by the system of FIG. 1.
- FIG. 4 is a high-level process flow diagram for the system of FIG. 1.
- FIG. 5A is a first part process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing competitive and complementary fusion.
- FIG. 5B is a second part process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing competitive and complementary fusion.
- FIG. 6A is a first part process flow diagram for the system of FIG. 1 demonstrating a preferred process for calculating and applying factors in competitive fusion.
- FIG. 6B is a second part process flow diagram for the system of FIG. 1 demonstrating a preferred process for calculating and applying factors in competitive fusion.
- FIG. 6C is a third part process flow diagram for the system of FIG. 1 demonstrating a preferred process for calculating and applying factors in competitive fusion.
- FIG. 7A is a first part process flow diagram for the system of FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion.
- FIG. 7B is a second part process flow diagram for the system of FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion.
- FIG. 7C is a third part process flow diagram for the system of FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion.
- FIG. 8 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing complementary fusion.
- FIG. 9 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in iteratively performing competitive and complementary fusion steps.
- FIG. 10 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in calculating blended factors where multiple factor measures are available for the same factor.
- FIG. 11 is a data table illustrating hypothetical data elements stored in the database of FIG. 1 to be used in accordance with the procedure of FIG. 6.
- FIG. 12 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for a first attribute in accordance with the procedure of FIG. 6.
- FIG. 13 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for a second attribute in accordance with the procedure of FIG. 6.
- FIG. 14 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for a third attribute in accordance with the procedure of FIG. 6.
- FIG. 15 is a data table illustrating hypothetical data elements stored in the database of FIG. 1, with attribute summaries, and used in accordance with the procedure of FIG. 7.
- FIG. 16 is a data table illustrating hypothetical data elements that are stored in the database of FIG. 1 and are adjusted according to factors for three attributes in accordance with the procedure of FIG. 7.
- FIG. 17 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and used in accordance with the complementary fusion procedure of FIG. 8.
- FIG. 18 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1, adjusted using complementary fusion according to the factors calculated in accordance with the procedure of FIG. 7, as described in the procedure of FIG. 8.
- FIG. 19 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and are used to perform another iteration of competitive fusion, including calculating blended factors, as described in the procedures of FIG. 9 and FIG. 10.
- FIG. 20 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and updated based upon the blended factor, as described in the procedures of FIG. 9 and FIG. 10.
- FIG. 21 is a data table illustrating hypothetical real, original, and corrected values stored in the database of FIG. 1 to show how the competitive and complementary fusion process helped improve the data, as described in the procedures of FIG. 9.
- FIG. 22 is a simulated screen of a user interface for one or more client workstations of FIG. 1 that allows a user to view the multi-dimensional elements in the database, as described in the procedures of FIG. 4 and FIG. 5.
- For the purposes of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended. Any alterations and further modifications in the described embodiments, and any further applications of the principles of the invention as described herein are contemplated as would normally occur to one skilled in the art to which the invention relates.
- One embodiment of the present invention includes a unique system for identifying, quantifying, and correcting consumer panel biases, and then using overlapping areas of the data sources to project values in areas where fewer or less complete sources exist.
- FIG. 1 is a diagrammatic view of computer system 20 of one embodiment of the present invention. Computer system 20 includes computer network 22. Computer network 22 couples together a number of computers 21 over network pathways 23 a-e. More specifically, system 20 includes several servers, namely business logic server 24 and database server 25. System 20 also includes external data sources 26, which in various embodiments include other computers, files, electronic and/or paper data sources. External data sources 26 are optionally coupled to the network over pathway 23 f. System 20 also includes client workstations 30. Although computers 21 are each illustrated as being either a server or a client, it should be understood that any of computers 21 may be arranged to provide both a client and server functionality, solely a client functionality, or solely a server functionality. Furthermore, it should be understood that while six computers 21 are illustrated, more or fewer may be utilized in alternative embodiments.
- Computers 21 include one or more processors or CPUs (50 a, 50 b, 50 c, 50 d, and 50 e, respectively) and one or more types of memory (52 a, 52 b, 52 c, 52 d, and 52 e, respectively). Each memory
- In one embodiment, each processor is of a conventional, integrated circuit microprocessor arrangement, such as one or more PENTIUM III or PENTIUM 4 processors supplied by INTEL Corporation of 2200 Mission College Boulevard, Santa Clara, Calif. 95052, USA.
- Each memory (removable or generic) is one form of computer-readable device. Each memory may include one or more types of solid-state electronic memory, magnetic memory, or optical memory, just to name a few. By way of non-limiting example, each memory may include solid-state electronic Random Access Memory (RAM), Sequentially Accessible Memory (SAM) (such as the First-In, First-Out (FIFO) variety or the Last-In-First-Out (LIFO) variety), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), or Electrically Erasable Programmable Read-Only Memory (EEPROM); an optical disc memory (such as a DVD or CD ROM); a magnetically encoded hard disc, floppy disc, tape, or cartridge media; or a combination of any of these memory types. Also, each memory may be volatile, nonvolatile, or a hybrid combination of volatile and nonvolatile varieties.
- Although not shown in FIG. 1 to preserve clarity, in one embodiment each computer 21 is coupled to a display. Computers 21 may be of the same type, or be a heterogeneous combination of different computing devices. Likewise, the displays may be of the same type, or a heterogeneous combination of different visual devices. Although again not shown to preserve clarity, each computer 21 may also include one or more operator input devices such as a keyboard, mouse, track ball, light pen, and/or microtelecommunicator, to name just a few representative examples. Also, besides the display, one or more other output devices may be included such as loudspeaker(s) and/or a printer. Various display and input device arrangements are possible.
- Computer network 22 can be in the form of a wired or wireless Local Area Network (LAN), Metropolitan Area Network (MAN), Wide Area Network (WAN) such as the Internet, a combination of these, or such other network arrangement as would occur to those skilled in the art. The operating logic of system 20 can be embodied in signals transmitted over network 22, in programming instructions, dedicated hardware, or a combination of these. It should be understood that more or fewer computers 21 can be coupled together by computer network 22.
- In one embodiment, system 20 operates at one or more physical locations where business logic server 24 is configured as a server that hosts and runs application business logic 33, and database server 25 is configured as a database 34 that stores reference data 35 (e.g. product identifiers 36 a, attributes 36 b, and a dictionary 36 c), at least two retail data sources 38 (such as point-of-sale and panel data), calculated factors 39, and other data 40. In one embodiment, external data 26 is imported to database server 25 from a mainframe extract file that is generated on a periodic basis. Various other scenarios are also possible for using and importing external data to database server 25. In another embodiment, external data sources are not used. In one embodiment, database 34 of database server 25 is a relational database and/or a data warehouse. Alternatively or additionally, database 34 can be a series of files, a combination of database tables and external files, calls to external web or other services that return data, and various other arrangements for accessing data for use in a program as would occur to one of ordinary skill in the art. Client workstations 30 are configured for providing one or more user interfaces to allow a user to modify settings used by business logic 33 and/or to view the retail data sources 38 of database 34 in a multi-dimensional format. Typical applications of system 20 would include more or fewer client workstations of this type at one or more physical locations, but three have been illustrated in FIG. 1 to preserve clarity. Furthermore, although two servers are shown, it will be appreciated by those of ordinary skill in the art that one or more features provided by business logic server 24 and database server 25 could be provided on the same computer or varying other arrangements of computers at one or more physical locations and still be within the spirit of the invention. Farms of dedicated servers could also be provided to support the specific features if desired.
- FIG. 2 is a multi-dimensional cube 60 that illustrates a way of conceptually thinking about the elements stored in database 34 of system 20. Cube 60 contains three dimensions: complexity 62, sources 64, and aggregation 66. In one embodiment, at least part of the data in database 34 is categorized according to the complexity 62, sources 64, and aggregation 66 axes of multi-dimensional cube 60 for analysis, viewing, and/or reporting. Cube 60 helps illustrate the concept that the aggregation dimension 66 is multi-dimensional, although dimensions other than those illustrated could be used. Examples of elements of the source dimension 64 include client (internal) data 65 a, scanning (point-of-sale) data 65 b, panel data 65 c, audit data 66 d, and other (external) data 66 e. Examples of elements of the aggregation dimension 66 include time 67 a, item (product) 67 b, channel (venue) 67 c, geography (venue) 67 d, and other 67 e, to name a few examples. Various dimensions of cube 60 are used in the competitive fusion and complementary fusion processes described herein.
- FIG. 3 is a block diagram illustrating further examples of the one or more retail data sources (36 in FIG. 1 and 64 in FIG. 2) that can be used by the system of FIG. 1 in the competitive fusion and complementary fusion processes described herein. Point-of-sale data 70, consumer panel data 72, audit/survey data 74 including causal (promotional) data, shipment data 76 from anywhere in the supply chain, population census data 78 including geo-demographic data, store universe data 80, other data sources 82, and specialty panels 84 are examples of the types of data that can be used with system 20. The types of data that can be used with system 20 are not limited to traditional retailers. For example, data collected during any part of the supply chain could be used as a data source.
- Referring also to FIG. 4, one embodiment for implementing system 20 is illustrated in flow chart form as procedure 150, which demonstrates a high-level process for the system of FIG. 1 and will be discussed in more detail below. FIG. 4 illustrates the high-level procedures for performing “competitive fusion” and “complementary fusion”. In “competitive fusion”, two or more data sources that provide overlapping measurements along at least one dimension are compared (“competed”) against each other at some level of aggregation along the product, venue, and/or time dimensions. More accurate/reliable sources are used to correct less accurate/reliable sources. In “complementary fusion”, relationships modeled where data sources overlap are projected to areas of the data framework in which fewer sources (or even a single source) exist, enhancing the accuracy/reliability of those sources even in domains where data from the other sources upon which the models were based do not exist. The process is iterative in that the competitive and complementary fusion methodologies can be repeated at varying levels of aggregation of the data framework.
- In one form, procedure 150 is at least partially implemented in the operating logic of system 20. Procedure 150 begins with business logic server 24 identifying at least two data sources, with at least one data source being more accurate than another (stage 152). At least one data source (see e.g. 36 in FIG. 1 and 64 in FIG. 2) is used as the “reference” data source and another is used as the “target” data source with the biases to be identified and quantified. In one embodiment, the reference data source is more accurate than the target data source. For purposes of the tracking of sales in retail channels, scanner-based point-of-sale (POS) data is typically a good “reference” source, due to its inherent accuracy and high level of granularity along the dimensions of time, venue, and product. Alternatively or additionally, manufacturer-supplied shipment data, especially where such data is based upon direct store delivery (DSD) information, may be utilized as a “reference” source. As yet another alternative, retailer-specific data sources (e.g., “frequent shopper” program data from loyalty cards) are also appropriate.
- Various examples herein illustrate using consumer panel purchase data as the target data source to be corrected. However, the current invention can be used with other data sources, such as sample-based or survey-based data sources whose overall accuracy is limited by the presence of biases, to name a few non-limiting examples.
- The product characteristics of the data sources should ideally be available at the item level, where “item” is by UPC, SKU, or another unique product identifier. In terms of the venue characteristics of the data sources, they should ideally be available at the retailer and market level, where “retailer” is a store (or chain of stores) within a particular retail channel and “market” is a geographic construct (e.g., Chicago area). In terms of the time characteristics of the data sources, they should ideally be available at the weekly level (or even daily in some cases), although monthly data (or 4-week “quad” data) or various other time frames are also acceptable. Where these levels of granularity are not possible, more aggregated levels of the product (e.g., “brand”), venue (e.g., “food” or “mass” channel for retailer and/or “region” or “total U.S.” for market), and/or time (e.g., quarterly or annual data) dimensions may be used.
- After the data sources have been identified (stage 152), they are next aligned along a common framework (stage 154), such as along the item, venue, and/or time dimensions. Depending upon the characteristics (and quality) of the data sources, some aggregation along these dimensions may be required in order for the alignment to be possible. For example, UPC-level POS data may need to be aggregated at the SKU or even brand level in order to be aligned with data from other sources (particularly in the cases in which venue-specific UPCs are involved). Similarly, store-level data may need to be aggregated at the local market or even regional level in order to be aligned with consumer panel purchase data. Finally, weekly (or even daily) POS data may need to be aggregated at the 4-week quad level in order to be aligned with shipment/delivery data. Various other arrangements for aligning the data along a common framework are also possible.
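- As a rough illustration of this alignment step, the following Python sketch rolls UPC-level weekly volumes up to brand-level 4-week quad totals so that they could be compared with a coarser source. The mapping `UPC_TO_BRAND`, the helper names, the sample volumes, and the quad numbering (weeks 1 to 4 form quad 1) are all hypothetical and are not taken from the described system.

```python
from collections import defaultdict

# Hypothetical UPC -> brand mapping (illustrative values only).
UPC_TO_BRAND = {"0001": "BrandA", "0002": "BrandA", "0003": "BrandB"}


def quad_of(week: int) -> int:
    """Map a 1-based week number to a 1-based 4-week 'quad' period (assumed numbering)."""
    return (week - 1) // 4 + 1


def aggregate_to_brand_quad(rows):
    """Sum (upc, week, volume) rows into {(brand, quad): total volume}."""
    totals = defaultdict(float)
    for upc, week, volume in rows:
        totals[(UPC_TO_BRAND[upc], quad_of(week))] += volume
    return dict(totals)


# Made-up weekly POS rows: (upc, week, volume).
weekly_pos = [
    ("0001", 1, 10.0),
    ("0001", 2, 12.0),
    ("0002", 4, 8.0),
    ("0003", 5, 5.0),
    ("0001", 5, 7.0),
]
aligned = aggregate_to_brand_quad(weekly_pos)
```

Aggregating both sources with the same function would place them on a common (brand, quad) grid; a venue roll-up (store to market or region) could be added in the same way.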
- In one embodiment, the item structure is provided by a multiple-level hierarchy, in which UPCs are the lowest level and are aggregated along category-related characteristics. Venue structure is provided along both geographical and channel dimensions, with FIPS-code-level transactions being aligned along markets and regions and store locations being part of a sub-chain, chain, and parent store hierarchy. Time structure is presently provided at the weekly level at the lowest level of aggregation, with daily data being aggregated at the weekly level before placement into the structure, although a daily data compatible structure or other variation is also possible.
- As a result of aligning the data sources along a common framework (stage 154), overlapping attribute segments of at least one dimension are available to use for data comparison and correction. Certain attributes associated with the data sources are identified along which more detailed comparisons may be made. In one embodiment, product attributes are available from reference data 35 of database 34. For example, one or more pieces of information from product identifier 36 a, attributes 36 b, and dictionary 36 c references can be used to access or modify attributes, attribute hierarchies, and mappings. These attributes represent category-specific dimensions along which products in that category may be characterized (e.g., diet vs. regular in carbonated soft drinks, active ingredient in internal analgesics, product size in most categories). The term attribute used herein is meant in the generic sense to cover various types of descriptors.
- Business logic server 24 compares the data sources and calculates factors for the attributes of at least one element of the common framework (stage 158). Each segment of a given attribute will have its own factor, as described in detail herein. The presence of attribute-related bias may be identified by comparison of the data sources. In the examples illustrated herein, volumetric comparisons are made (e.g., equivalent units); however, various other measures (e.g., dollar sales, actual units) could also be utilized, as long as the same type of measure is being used for the comparison. For example, it would not be useful to compare dollar sales to actual units, but it would be useful to compare dollars to dollars. The comparison itself is between the value of the target data source (e.g., projected panel volume) and that of the reference data source (e.g., POS data). This comparison can be by way of two-sample inference, regression analysis, or other statistical tests appropriate for determining whether any differences between the two data sources are associated with the attributes along which they have been characterized at a statistically significant level. Where such differences (biases) are identified, they are quantified, and factors are calculated for use in bias correction/adjustment.
- The factors are used to correct bias in the less accurate data source (stage 160), which in this example is consumer panel data. By using the factors to correct the bias in the less accurate “target” data source, the effect of these biases is reduced or eliminated. These biases can be corrected by adjusting the raw data, or by way of post-adjustment.
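- A minimal sketch of this comparison and correction for a single attribute is given below, using the simple ratio form (reference volume divided by target volume) that the description later gives for procedure 200. The function names, segment labels, and volumes are hypothetical, and leaving a segment unadjusted when no reference volume exists is an assumption made only for this example.

```python
def segment_factors(source_by_segment, target_by_segment):
    """Factor per attribute segment = reference (source) volume / target volume."""
    factors = {}
    for segment, target_volume in target_by_segment.items():
        source_volume = source_by_segment.get(segment)
        if source_volume is None or target_volume == 0:
            # Assumption for this sketch: no usable comparison, so leave unadjusted.
            factors[segment] = 1.0
        else:
            factors[segment] = source_volume / target_volume
    return factors


def apply_factors(target_rows, factors):
    """Multiply each (product_id, segment, volume) target row by its segment factor."""
    return [(pid, seg, vol * factors.get(seg, 1.0)) for pid, seg, vol in target_rows]


# Made-up volumes: the panel over-reports "diet" and under-reports "regular".
pos_by_segment = {"diet": 250.0, "regular": 600.0}
panel_by_segment = {"diet": 500.0, "regular": 400.0}
factors = segment_factors(pos_by_segment, panel_by_segment)

panel_rows = [
    ("upc1", "diet", 300.0),
    ("upc2", "diet", 200.0),
    ("upc3", "regular", 400.0),
]
corrected = apply_factors(panel_rows, factors)
```

After correction, the panel volume in each segment sums to the reference volume for that segment. A fuller treatment would first test whether each segment difference is statistically significant, as the description notes.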
- In “complementary fusion”, the factors are also used to supplement the data that is incomplete in the less complete data source (stage 162), such as consumer panel data. Incomplete data is used in a general sense to mean that less data was provided than desired or that the data is less accurate than desired, to name a few non-limiting examples. Where highly accurate data (e.g. POS data) is not provided, less accurate data (e.g. panel data) becomes more important to analyze and correct. Relationships modeled where data sources overlap are projected to areas of the data framework in which fewer sources (or even a single source) exist, enhancing the accuracy and reliability of those sources even in domains where data from the other sources upon which the models were based do not exist.
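- The projection idea can be sketched as follows: factors are estimated only from channels in which both POS and panel data exist, and are then applied to panel data for a channel that has no POS coverage. The channel names, segment labels, volumes, and function names are hypothetical, and pooling all overlapping channels into one factor per segment is a simplifying assumption rather than the described model.

```python
def overlap_factors(pos, panel):
    """Per-segment factors (POS / panel) pooled over channels present in both sources."""
    pos_total, panel_total = {}, {}
    for (channel, segment), volume in pos.items():
        if (channel, segment) in panel:
            pos_total[segment] = pos_total.get(segment, 0.0) + volume
            panel_total[segment] = panel_total.get(segment, 0.0) + panel[(channel, segment)]
    return {s: pos_total[s] / panel_total[s] for s in pos_total if panel_total[s] > 0}


def project(panel, channel, factors):
    """Apply overlap-derived factors to the panel volumes of one channel."""
    return {
        seg: vol * factors.get(seg, 1.0)  # assumption: unknown segments stay unadjusted
        for (ch, seg), vol in panel.items()
        if ch == channel
    }


# Made-up data: POS covers "food" and "drug" only; the panel also covers "club".
pos = {("food", "diet"): 100.0, ("drug", "diet"): 50.0, ("food", "regular"): 300.0}
panel = {
    ("food", "diet"): 200.0,
    ("drug", "diet"): 100.0,
    ("food", "regular"): 200.0,
    ("club", "diet"): 80.0,
    ("club", "regular"): 40.0,
}
factors = overlap_factors(pos, panel)
club_estimate = project(panel, "club", factors)
```

The resulting estimate for the untracked channel could then be competed against an independent estimate in a further competitive fusion pass.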
- Users and/or reports can access database 34 from one of client workstations 30 to view/analyze the corrected and adjusted data (stage 164). Users and/or reports can also access database 34 from one of client workstations 30 to view and/or modify settings used by system 20 to make data corrections. The steps are repeated as desired (stage 166). The process then ends at stage 168.
- FIGS. 5A-5B are first and second parts of a process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing competitive and complementary fusion using POS and panel data as the data sources. While in this and other figures, the first data source (the “source” data source) is described as being POS data and the second data source (the “target” data source) is described as being panel data, it will be appreciated that the system and methodologies can be used with other data sources as appropriate. In one form, procedure 170 is at least partially implemented in the operating logic of system 20. Procedure 170 begins in FIG. 5A with receiving updates for reference data 35 and/or data sources 38 on a periodic basis (stage 172).
- In one embodiment, a parameter specification for the number of weeks used in calculating the factors is thirteen, and the minimum week range included in database 34 is then set to be thirteen weeks prior to the update week. Database 34 may be built and maintained using various data sources and can include various types of data, as would occur to one of ordinary skill in the art. In one embodiment, system 20 supports the option to pull the desired period (e.g. all thirteen weeks) of the data sources 38, append the recent period (e.g. four weeks) needed since the last factor update to the existing database 34, and/or be able to recreate the data a week at a time. In such a scenario, for space conservation, the system can optionally drop the same number of weeks from the start week of database 34 as were appended to the end week. For example, if the option was chosen to append the four weeks needed since the last factor update, the system should drop the four oldest weeks from the existing database 34 when appending the four new weeks.
- The received updates to reference data 35 and/or data sources 38 are stored in database 34 (stage 174). At some point in time, such as on a scheduled or as-requested basis, the system determines that data adjustments should be made to correct bias (decision point 175). Application business logic 33 ensures reference data 35 and data sources 38 are up to date, and if not, updates them accordingly (stage 176). Optionally, reference data 35 is reviewed to ensure that the default attributes for the current category will be appropriate for the client or scenario, and adjustments are made to reference data 35 as appropriate (stage 177). As one non-limiting example, attribute segments may be reviewed and translated to more succinct segmentations that better classify the product identifiers. Other variations are also possible.
- A product-identifier-to-attribute-segment mapping is prepared for the product identifiers (e.g. UPCs) (stage 178). If the attributes are determined to be irrelevant, they can be removed from further consideration in this process. The attribute table 36 b is a reference table that maps each product identifier 36 a to a set of attribute variables. While UPCs are described as a common product identifier, other identifiers could also be used. For example, not every dataset has a UPC, but may have a product identifier at a higher, lower, or equivalent level. Rules are used to determine supportable attribute segments and relevant attributes. In one embodiment, if a segment assignment is missing, then the UPC is assigned to a new segment, “not supportable.” All segments with less than a 5% share are also assigned to “not supportable.” Furthermore, in one embodiment, if the final “not supportable” category accounts for more than 50% of the category share, then the attribute is designated as “irrelevant.” Other ways for determining relevance can also be used, or relevance can simply be ignored. Stage 178 can be repeated to arrive at the final level of segments to use (rolled-up or drilled-down) as appropriate.
- Continuing with FIG. 5B, source (e.g. POS) and target (e.g. panel) data 38 are retrieved from database 34 and summarized by attribute segments (stage 180). Factors are calculated for attribute segments (stage 181). The significance of the attribute segments is determined (stage 182). If any non-significant factors are determined, the significant attribute factors can be re-aligned (stage 183). The factors for each attribute segment are applied to the target (panel) data to correct bias (stage 184). The factors are also applied to the target (panel) data to correct data that is incomplete (e.g. less available) (stage 186). The competitive and/or complementary data fusion steps can be repeated as desired or appropriate (stage 187). Users and/or reports can access database 34 from one of client workstations 30 to view/analyze the corrected and adjusted data (stage 188). The procedure 170 then ends at stage 190. FIGS. 6-10 illustrate the competitive and complementary fusion stages in further detail.
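- The supportability and relevance rules described for stage 178 could be sketched as follows. The thresholds (5% and 50%) come from the description; the function name, the use of volume as the basis for "share", and the sample data are assumptions made for illustration only.

```python
NOT_SUPPORTABLE = "not supportable"


def classify_attribute(upc_segment, upc_volume, min_share=0.05, max_unsupported=0.50):
    """Return (final UPC -> segment mapping, whether the attribute is relevant).

    upc_segment: UPC -> segment label (missing or None means unassigned).
    upc_volume:  UPC -> volume used to compute shares (assumed share basis).
    """
    total = sum(upc_volume.values())
    if total <= 0:
        raise ValueError("category volume must be positive")

    # Rule 1: a UPC with no segment assignment goes to "not supportable".
    assigned = {upc: upc_segment.get(upc) or NOT_SUPPORTABLE for upc in upc_volume}

    # Rule 2: segments with less than min_share of the category are folded in too.
    seg_volume = {}
    for upc, seg in assigned.items():
        seg_volume[seg] = seg_volume.get(seg, 0.0) + upc_volume[upc]
    small = {seg for seg, vol in seg_volume.items() if vol / total < min_share}
    final = {upc: NOT_SUPPORTABLE if seg in small else seg for upc, seg in assigned.items()}

    # Rule 3: the attribute is irrelevant if "not supportable" exceeds max_unsupported.
    unsupported = sum(upc_volume[u] for u, s in final.items() if s == NOT_SUPPORTABLE)
    return final, (unsupported / total) <= max_unsupported


# Made-up category: u3 sits in a 4% segment and u4 has no segment assignment.
volumes = {"u1": 60.0, "u2": 30.0, "u3": 4.0, "u4": 6.0}
segments = {"u1": "diet", "u2": "regular", "u3": "caffeine-free"}
final_segments, is_relevant = classify_attribute(segments, volumes)
```

Here the "not supportable" group ends up with 10% of the category, so the attribute remains relevant; an attribute for which most volume is unassigned would be flagged as irrelevant.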
FIGS. 6A-6C are first, second, and third parts of a process flow diagram for the system ofFIG. 1 demonstrating a preferred process for iteratively calculating and applying factors in competitive fusion. In one form,procedure 200 is at least partially implemented in the operating logic ofsystem 20.Procedure 200 begins onFIG. 6A with summing source (POS) data by the most granular product and time dimension (e.g. UPC) (stage 202) and summing target (panel) data by the most granular product and time dimension (e.g. UPC) (stage 204). In one embodiment, they are both summed to weekly (e.g. 52) totals.Business logic server 24 determines the period of time to use in the analysis (stage 206), such as to use all of the weekly totals summed in the prior step or to use only part of the weekly totals that cover a desired time period, such as the most recent 13 weeks, to name a few examples. Outliers are also eliminated (stage 207) at this point or another appropriate point before final calculations. For example, in one embodiment, although thirteen weeks are contained in the dataset, only 11 weeks are actually used in calculations. Research indicates that panel volume is extremely vulnerable to outliers. To minimize the potential impact of outliers, the week with the lowest coverage and the week with the highest coverage are eliminated from further use in calculations for the current update. In one embodiment, although the outlier weeks are eliminated from further use in calculations for the current update, they are not removed from the dataset as they may be used in subsequent updates.Business logic server 24 then merges the source (POS) data, target (panel) data, and product identifier to attribute segment mapping reference data (stage 208). Attributes can optionally be sorted in order by importance (stage 210). In one embodiment, the least important is first and the most important is last. 
If the factors for the most important attribute segments are the last ones applied, they usually have the most significant mathematical effect, because no factor for a less important attribute segment will be applied after that last calculation to further skew the results. - An initial factor of 1.0 is assigned to all attribute segments (stage 212). Continuing with
FIG. 6B, source (POS) and target (panel) data are then summarized for the segments of the current attribute (stage 214). A factor is calculated for each attribute segment of the current attribute as source data volume divided by target data volume (stage 216). Other mathematical variations could also be used. For each segment of the current attribute, it is determined whether the attribute segment is significant (stage 218). In one embodiment, shares are calculated for the attribute segments, such as by dividing the Calculation Period Segment Total U.S. POS volume by the Calculation Period Category Total U.S. POS volume. Significance is then determined by first analyzing a confidence interval (CI) for each share to determine if there is overlap between the POS share CI and the panel share CI. If there is overlap, then the difference between source and target shares is not significant and the attribute segment will be designated as “nonsignificant.” Other ways for determining significance can also be used, or significance can be assumed. - In one embodiment, if two or more segments for the current attribute were nonsignificant (stage 220), then the significant factors (that remain) will need to be re-aligned to account for non-significant segment factors being removed (stage 222). At the product identifier-level target (panel) data, each volume is multiplied by the factor for the corresponding segment (stage 224). Again, other mathematical variations could also be used. The factors for each attribute segment are then saved to
factor data store 39 of database 34 (stage 226). If another attribute is present (decision point 228), the next attribute is made the current attribute (stage 230) and stages 214-226 are repeated. These stages are repeated until all attributes are processed. Continuing with FIG. 6C, a category adjustment factor is applied to all product identifiers as necessary (stage 232) to adjust for the level of coverage. In one embodiment, the use of a category adjustment factor depends on the type of measure being used. For example, where volume is used, coverage adjustments may not be necessary, but where shares are used, further coverage adjustments may be necessary. Any final factors for the category adjustment factor are saved to the factor data store 39 of database 34 (stage 234). The process 200 then ends at stage 236. - FIGS. 7A-7C are first, second, and third parts of a process flow diagram for the system of
FIG. 1 demonstrating an alternate process for calculating and applying factors in competitive fusion. In one form, procedure 250 is at least partially implemented in the operating logic of system 20. Procedure 250 begins on FIG. 7A with summing the more reliable (source) data source (e.g., POS data) by the most granular product and time dimension (e.g. UPC) (stage 252) and summing the less accurate (target) data source (e.g., panel data) by the most granular product and time dimension (stage 254). Business logic server 24 determines the period of time to use in the analysis (stage 256) and eliminates outliers (stage 257), as discussed in FIG. 6. Source data, target data, and product identifiers to attribute segment mapping data are merged (stage 258). An initial factor of 1.0 is assigned to each attribute segment (stage 260). Source and target data are summarized to the segments for all attributes (stage 262). - Continuing with
FIG. 7B, factors are calculated for each attribute segment as source volume divided by target volume (stage 264). Business logic server 24 determines whether the attribute segment is significant (stage 266), as described in FIG. 6. Where two or more segments for any particular attribute are insignificant (decision point 268), then the significant factors are re-aligned to account for the elimination of the insignificant segment factors in the particular attribute (stage 270). At the product identifier-level target data, each volume is multiplied by the factor for each corresponding segment (stage 272). In other words, all of the factors applicable to the volume are applied simultaneously, as opposed to iteratively as shown in FIG. 6. The factors are then saved to factor data store 39 for each attribute segment (stage 274). - Continuing with
FIG. 7C, a category adjustment factor is applied to all product identifiers as necessary (stage 276), as described in FIG. 6. The final factors for the category adjustment factor are saved to the factor data store 39 of database 34 (stage 277). The procedure 250 then ends at stage 278. Procedure 250 should only be used in the appropriate circumstances, such as when the attributes are not affected by each other and iteration is not needed for greater accuracy, to name one example. If attributes are affected by each other and procedure 250 is used instead of the iterative procedure of FIG. 6, then the results will be mathematically different, with the procedure of FIG. 6 producing a more accurate result. -
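The difference between the iterative procedure of FIG. 6 and the simultaneous procedure of FIG. 7 can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the UPC rows, the two attributes, and all function names are hypothetical, every segment is assumed significant, and no category adjustment factor (stages 232 and 276) is applied.

```python
from collections import defaultdict

# Hypothetical UPC-level rows: (upc, {attribute: segment}, pos_volume, panel_volume).
ROWS = [
    ("A", {"manufacturer": "private", "type": "regular"}, 100.0, 50.0),
    ("B", {"manufacturer": "private", "type": "special"}, 60.0, 40.0),
    ("C", {"manufacturer": "branded", "type": "regular"}, 200.0, 160.0),
    ("D", {"manufacturer": "branded", "type": "special"}, 40.0, 50.0),
]
ATTRIBUTES = ["manufacturer", "type"]  # assumed order: least important first


def segment_factors(rows, panel, attribute):
    """Factor per segment = summed source (POS) volume / summed target (panel) volume."""
    pos_total, panel_total = defaultdict(float), defaultdict(float)
    for upc, segments, pos, _ in rows:
        pos_total[segments[attribute]] += pos
        panel_total[segments[attribute]] += panel[upc]
    return {seg: pos_total[seg] / panel_total[seg] for seg in pos_total}


def iterative_fusion(rows, attributes):
    """FIG. 6 style: each attribute's factors are computed from already-adjusted panel data."""
    panel = {upc: volume for upc, _, _, volume in rows}
    for attribute in attributes:
        factors = segment_factors(rows, panel, attribute)
        for upc, segments, _, _ in rows:
            panel[upc] *= factors[segments[attribute]]
    return panel


def simultaneous_fusion(rows, attributes):
    """FIG. 7 style: all factors are computed from the original panel data, then applied together."""
    original = {upc: volume for upc, _, _, volume in rows}
    factors = {a: segment_factors(rows, original, a) for a in attributes}
    adjusted = dict(original)
    for upc, segments, _, _ in rows:
        for a in attributes:
            adjusted[upc] *= factors[a][segments[a]]
    return adjusted


print(iterative_fusion(ROWS, ATTRIBUTES))
print(simultaneous_fusion(ROWS, ATTRIBUTES))
```

In this toy data the two attributes interact, so the two functions return different adjusted volumes. After the iterative pass, the adjusted panel totals for the last attribute's segments equal the corresponding POS totals, whereas the simultaneous pass compounds both corrections on the original volumes and would need a later adjustment to bring the category total back in line.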
FIG. 8 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in performing complementary fusion. In one form, procedure 280 is at least partially implemented in the operating logic of system 20. Procedure 280 begins with merging source data, target data, and product identifier data to attribute segment mapping data (stage 282). The factors previously calculated in accordance with FIG. 6 or FIG. 7 are applied to the product identifier-level target data based on the attribute segment mapping to correct the data for incompleteness (e.g. less data than desired) (stage 286). The target data elements that are corrected in this process can be the same, different, or overlapping from the target data that was used to help calculate the factors. The procedure 280 then ends at stage 288. -
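As a rough illustration of stage 286, the following Python sketch applies previously saved factors to product identifier-level target data through an attribute segment mapping. The data layout and the function name are hypothetical. The sketch also assumes that the factors for a product's several attributes combine by multiplication and that a segment with no saved factor keeps the initial factor of 1.0; the description does not spell out either detail for this stage.

```python
def apply_saved_factors(panel_volumes, upc_segments, saved_factors):
    """Adjust each target (panel) volume by the saved factors of its attribute segments.

    panel_volumes: {upc: volume} for the target rows to correct (hypothetical layout)
    upc_segments:  {upc: {attribute: segment}} mapping reference data
    saved_factors: {attribute: {segment: factor}} as stored after competitive fusion
    """
    adjusted = {}
    for upc, volume in panel_volumes.items():
        factor = 1.0  # initial factor, kept when no saved factor applies
        for attribute, segment in upc_segments[upc].items():
            factor *= saved_factors.get(attribute, {}).get(segment, 1.0)
        adjusted[upc] = volume * factor
    return adjusted


# Usage with made-up values: UPC "Y" was not covered by any saved factor.
saved = {"manufacturer": {"private": 2.0}, "size": {"big": 1.5}}
mapping = {
    "X": {"manufacturer": "private", "size": "big"},
    "Y": {"manufacturer": "branded", "size": "small"},
}
print(apply_saved_factors({"X": 10.0, "Y": 4.0}, mapping, saved))
```

Because the lookup is keyed by attribute segment rather than by product identifier, the same saved factors can be applied to target rows that were not part of the factor calculation, which is the case the paragraph above describes as "different, or overlapping" target data.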
FIG. 9 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in repeating the competitive and complementary fusion steps multiple times. In one form, procedure 290 is at least partially implemented in the operating logic of system 20. Procedure 290 begins with determining what additional public or private data sources are available to use for competitive fusion along venue, time, and/or product dimensions (stage 292). Using one or more of those data sources, additional factors are calculated that are independent estimates against which the complementary-fused estimate may be competed (stage 294). The newly calculated factors are applied to the product identifier-level target data (e.g. POS data) to further adjust the data (stage 296). The competitive and complementary fusion steps can be repeated as desired and/or appropriate (stage 298). The procedure 290 then ends at stage 299. -
FIG. 10 is a process flow diagram for the system of FIG. 1 demonstrating the stages involved in calculating blended factors where multiple factor measures are available for the same factor. In one form, procedure 300 is at least partially implemented in the operating logic of system 20. Procedure 300 can be used when competitive fusion is being performed and at least two data sources are available for the same factor (stage 302). For each aggregation (venue, time, or product) that has at least two factor measures, specific totals are calculated across attributes (stage 304). Factors for each aggregation of the current data source are calculated by dividing source data volume by target data volume (stage 305). If there are more data sources (decision point 306), then the procedure moves to the next data source (stage 307) and repeats stages 304-305. Then, a blended factor is calculated (stage 308) where the more accurate source is given a higher weight and the less accurate source is given a lower weight. One simple way of calculating a blended factor is to calculate a central tendency (e.g., mean or median) of the various factors as the overall factor. This treats all estimates as of equal value (reliability, accuracy, precision), which in reality may or may not be the case. In a preferred embodiment, the “blended factor” uses an “inverse-variance-weighted” method (see 444 on FIG. 19 as an example). This name originates from the fact that more “reliable” estimates (i.e., those with more precision and, thus, less variability) are given more weight than those that are less “reliable” (more variable). Once the blended estimate has been calculated, each volume of the product identifier-level target data is multiplied by the blended factor (stage 310). The procedure 300 then ends at stage 312. - A hypothetical example will now be described in
FIGS. 11-21 with reference to the procedures described in FIGS. 6-10. FIG. 11 is a data table illustrating hypothetical data elements that are adjusted according to the preferred embodiment competitive fusion procedure of FIG. 6. POS data 320, panel data 322, and attribute information 324 are shown in a summarized form by UPC 326. For each attribute and its corresponding segments, various steps are performed as discussed below. - Turning to
FIG. 12, the data is assumed to be relevant and the POS and panel data shown in table 330 are then summarized for the segments of the current attribute (stage 214), which in the current iteration is manufacturer 332. Private brand label summaries 334 and non-private brand label summaries 336 for POS 338 and panel data 340 are calculated from table 330 as illustrated. A factor 342 for each attribute segment of the current attribute, in this case private label manufacturer 334 and non-private label manufacturer 336 segments, is calculated as POS volume 338 divided by panel volume 340 (stage 216). Business logic server 24 determines whether the current attribute segment is significant (stage 218). For purposes of illustrating the current example, all attribute segments are also assumed significant. At the UPC level panel data, each panel volume 344 is multiplied by the factor 342 for its corresponding segment (stage 224) to arrive at an adjusted panel value 346. Factors 342 are saved to the factor data store 39 of database 34 (stage 226). - As shown in
FIGS. 13 and 14, stages 214 to 226 repeat for each attribute, with previously adjusted data being used in the calculation. FIG. 13 illustrates data elements being adjusted according to factors calculated for a second attribute in accordance with the procedure of FIG. 6. The POS and panel data shown in table 350 are then summarized for the segments of the current attribute (stage 214), which in the current iteration is type 352. Summaries for regular type 354 and special type 356 for POS 358 and panel data 360 are calculated from table 350 as illustrated. A factor 362 for each attribute segment of the current attribute, in this case regular type 354 and special type 356 segments, is calculated as POS volume 358 divided by panel volume 360 (stage 216). At the UPC level panel data, the previously adjusted panel volume 364 is multiplied by the factor 362 for its corresponding segment (stage 224) to arrive at yet another adjusted panel value 366. Factors 362 are saved to the factor data store 39 of database 34 (stage 226). -
FIG. 14 illustrates data elements being adjusted according to factors calculated for a third attribute in accordance with the procedure of FIG. 6. The POS and panel data shown in table 370 are then summarized for the segments of the current attribute (stage 214), which in the current iteration is size 372. Summaries for size big 374, size medium 375, and size small 376 for POS 378 and panel data 380 are calculated from table 370 as illustrated. A factor 382 for each attribute segment of the current attribute, in this case size big 374, medium 375, and small 376 segments, is calculated as POS volume 378 divided by panel volume 380 (stage 216). At the UPC level panel data, each previously adjusted panel volume 384 is multiplied by the factor 382 for its corresponding segment (stage 224) to arrive at yet another adjusted panel value 386. Factors 382 are saved to the factor data store 39 of database 34 (stage 226). After processing all attributes, the final factors are saved to the factor data store 39 of database 34 (stage 234). The process then ends at stage 236. -
FIGS. 15 and 16 illustrate data elements being adjusted according to factors calculated according to an alternative embodiment competitive fusion process in accordance with the procedure of FIG. 7. Business logic server 24 determines the period of time to use in the analysis (stage 256), and merges POS, panel, and attribute information by UPC as shown in table 390 (stage 258). POS data 392 and panel data 394 are summarized for all attribute segments (stage 262), in this case by manufacturer 396, type 398, and size 400. As shown in FIG. 16, factors for each attribute segment 402 are calculated as each respective POS volume 404 divided by each respective panel volume 406 (stage 264). Each panel volume 407 is multiplied by the factors 408 a-408 c appropriate for its corresponding segment (stage 272) to calculate an adjusted panel value 410. The process then ends at stage 278. -
FIG. 17 is a data table illustrating hypothetical data elements by retailer that are stored in the database of FIG. 1 and used in accordance with the complementary fusion procedure of FIG. 8. POS, panel and attribute information are merged by UPC (stage 282) for multiple retailers, as shown in table 420. Client shipment data 424, another data source available, is also merged by UPC. Shares are calculated for POS data 420 a-420 b and panel data 422 a-422 c for the segments of each attribute (stage 284). As shown in FIG. 18, the previously calculated factors 430 a-430 c (408 a-408 c in FIG. 16) are applied to the UPC level panel data 432 a-432 c to further adjust the data to correct for incompleteness (stage 286) and arrive at an adjusted panel value 434 a-434 c. The complementary fusion process then ends at stage 288. -
FIGS. 19 and 20 illustrate performing another iteration of competitive fusion, including calculating blended factors, as described in the procedures of FIG. 9 and FIG. 10. Additional public or private data sources are identified as available to use for competitive fusion (stage 292). As shown in table 438, channel specific totals 440 a-440 f across attributes have been identified for use in competitive fusion. In addition to POS and Panel totals for retailers 1 and 2 (440 a-440 d), client shipment total 440 e and panel total 440 f can also be used for comparison. Using these totals 440 a-440 f, additional factors 442 have been calculated that are independent estimates against which the complementary-fused data from FIG. 18 may be competed (stage 294). A blended factor 444 has been calculated since multiple data sources were available for the same factor (stages 302-308 in FIG. 10). As shown in FIGS. 19 and 20, each volume 446 a-446 c of the previously adjusted UPC-level panel data is then multiplied by the blended factor to arrive at the newly adjusted panel values 450 a-450 c (stage 298 in FIG. 9, and stage 310 in FIG. 10). -
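A blended factor of the inverse-variance-weighted kind named in the description of FIG. 10 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the numbers are hypothetical, and a variance estimate is assumed to be available for each factor, since the description does not say how the variability of each estimate is obtained.

```python
def blended_factor(estimates):
    """Inverse-variance-weighted mean of several estimates of the same factor.

    estimates: list of (factor, variance) pairs, each variance > 0 (assumed known).
    A lower variance means a more reliable estimate and therefore a larger weight.
    """
    weights = [1.0 / variance for _, variance in estimates]
    weighted_sum = sum(w * factor for w, (factor, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Made-up example: a precise estimate of 1.2 and a noisier estimate of 1.8.
estimates = [(1.2, 0.01), (1.8, 0.04)]
blend = blended_factor(estimates)
simple_mean = sum(f for f, _ in estimates) / len(estimates)
print(blend, simple_mean)  # the blend sits closer to 1.2 than the simple mean does

# Stage 310: multiply each product identifier-level target volume by the blend.
adjusted = {upc: volume * blend for upc, volume in {"X": 10.0, "Y": 4.0}.items()}
```

When all variances are equal, the weights cancel and the result reduces to the simple mean, which is the equal-value treatment the description contrasts with the preferred embodiment.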
FIG. 21 is a data table illustrating hypothetical table 460 of end results for POS data elements by retailers, with figures 462 a-462 b, pre-fusion figures 464 a-464 b, and post-fusion figures 466 a-466 b to show how the competitive and complementary fusion processes according to FIGS. 4-10 and illustrated in the hypothetical of FIGS. 11-20 helped improve the data accuracy. -
FIG. 22 is a simulated screen of a user interface for one or more client workstations 30 that allows a user to view the multi-dimensional elements in the database, as described in the procedures of FIG. 4 and FIG. 5. - Alternatively or additionally, once data fusion has been performed as described herein, the updated data can be used by various systems, users, and/or reports as appropriate.
- In one embodiment of the present invention, a method is disclosed comprising identifying a plurality of data sources, wherein at least a first data source is more accurate than a second data source; identifying a plurality of overlapping attribute segments to use for comparing the data sources; calculating a factor as a function of each of the plurality of overlapping attribute segments; and using the factors to update a first group of values in the second data source to reduce bias.
- In another embodiment of the present invention, a method is disclosed comprising receiving point-of-sale data and panel data on a periodic basis; identifying a plurality of product identifiers and a plurality of attributes to analyze; retrieving and summarizing the point-of-sale data and the panel data by the plurality of product identifiers, the plurality of attributes, and a plurality of corresponding attribute segments for a specified time period; calculating a factor for each attribute segment of a particular attribute; and applying the factors for the particular attribute segment to the panel data to correct panel bias.
- In yet another embodiment, a method is disclosed comprising receiving point-of-sale data and panel data on a periodic basis; identifying a plurality of product identifiers and a plurality of attributes to analyze; retrieving and summarizing the point-of-sale data and the panel data by the plurality of product identifiers, the plurality of attributes, and a plurality of corresponding attribute segments for a specified time period; calculating a plurality of factors, wherein one factor is calculated for each attribute segment of the plurality of attributes; and applying the factors to the second data source to reduce bias; and applying the factors to the second data source to reduce incompleteness.
- In yet a further embodiment, a method is disclosed comprising identifying a plurality of product identifiers and a plurality of attributes to analyze for at least two data sources, wherein at least a first data source is more accurate than a second data source; retrieving and summarizing the first data source and the second data source by the plurality of product identifiers, the plurality of attributes, and a plurality of corresponding attribute segments for a specified time period; calculating a plurality of factors, wherein one factor is calculated for each attribute segment of the plurality of attributes; applying the factors to the second data source to reduce bias; and applying the factors to a different or overlapping dataset of the second data source to reduce incompleteness.
- In another embodiment, a system is disclosed that comprises one or more servers being operable to store retail data from at least two data sources, store product identifier and attribute categorizations, and store a plurality of factor calculations; wherein the at least two data sources includes a first data source that is more accurate than a second data source; and wherein one or more of said servers contains business logic that is operable to identify and retrieve a plurality of overlapping attribute segments to use for comparing the at least two data sources, compare each of the overlapping attribute segments, calculate a factor for each of the overlapping attribute segments, and use the factors to update a first group of values in the second data source to reduce bias.
- In yet a further embodiment, an apparatus is disclosed that comprises a device encoded with logic executable by one or more processors to: identify and retrieve a plurality of overlapping attribute segments to use for comparing at least two data sources, wherein the at least two data sources includes a first data source that is more accurate than a second data source, compare each of the overlapping attribute segments, calculate a factor for each of the overlapping attribute segments, and use the factors to update a first group of values in the second data source to reduce bias.
- A person of ordinary skill in the computer software art will recognize that the client and/or server arrangements, user interface screen content, and data layouts could be organized differently to include fewer or additional options or features than as portrayed in the illustrations and still be within the spirit of the invention.
- While the invention has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only the preferred embodiment has been shown and described and that all equivalents, changes, and modifications that come within the spirit of the inventions as described herein and/or by the following claims are desired to be protected.
Claims (1)
1. A method comprising:
identifying a plurality of data sources, wherein at least a first data source is more accurate than a second data source;
identifying a plurality of overlapping attribute segments to use for comparing the data sources;
calculating a factor as a function of each of the plurality of overlapping attribute segments; and
using the factors to update a first group of values in the second data source to reduce bias.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/926,381 US20080154884A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/783,323 US7873529B2 (en) | 2004-02-20 | 2004-02-20 | System and method for analyzing and correcting retail data |
US88680107P | 2007-01-26 | 2007-01-26 | |
US88679807P | 2007-01-26 | 2007-01-26 | |
US88712207P | 2007-01-29 | 2007-01-29 | |
US89150707P | 2007-02-24 | 2007-02-24 | |
US89193307P | 2007-02-27 | 2007-02-27 | |
US97930507P | 2007-10-11 | 2007-10-11 | |
US11/926,381 US20080154884A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/783,323 Continuation US7873529B2 (en) | 2004-02-20 | 2004-02-20 | System and method for analyzing and correcting retail data |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080154884A1 (en) | 2008-06-26 |
Family
ID=34861204
Family Applications (21)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/783,323 Active 2027-11-12 US7873529B2 (en) | 2004-02-20 | 2004-02-20 | System and method for analyzing and correcting retail data |
US11/926,347 Abandoned US20080162462A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,335 Abandoned US20080162461A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,318 Abandoned US20080143474A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,354 Abandoned US20080162223A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,381 Abandoned US20080154884A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,309 Abandoned US20080162571A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,360 Abandoned US20080162464A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,332 Abandoned US20080136582A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,343 Abandoned US20080162572A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,358 Abandoned US20080162463A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,366 Abandoned US20080140480A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,383 Abandoned US20080154885A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,368 Abandoned US20080162465A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,308 Abandoned US20080162460A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,329 Abandoned US20080256027A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,374 Abandoned US20080136583A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,372 Abandoned US20080162466A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,320 Abandoned US20080154843A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,351 Abandoned US20080162404A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,316 Abandoned US20080147459A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
Family Applications Before (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/783,323 Active 2027-11-12 US7873529B2 (en) | 2004-02-20 | 2004-02-20 | System and method for analyzing and correcting retail data |
US11/926,347 Abandoned US20080162462A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,335 Abandoned US20080162461A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,318 Abandoned US20080143474A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,354 Abandoned US20080162223A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
Family Applications After (15)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/926,309 Abandoned US20080162571A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,360 Abandoned US20080162464A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,332 Abandoned US20080136582A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,343 Abandoned US20080162572A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,358 Abandoned US20080162463A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,366 Abandoned US20080140480A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,383 Abandoned US20080154885A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,368 Abandoned US20080162465A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,308 Abandoned US20080162460A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,329 Abandoned US20080256027A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,374 Abandoned US20080136583A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,372 Abandoned US20080162466A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,320 Abandoned US20080154843A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,351 Abandoned US20080162404A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
US11/926,316 Abandoned US20080147459A1 (en) | 2004-02-20 | 2007-10-29 | System and method for analyzing and correcting retail data |
Country Status (3)
Country | Link |
---|---|
US (21) | US7873529B2 (en) |
EP (1) | EP1723588A4 (en) |
WO (1) | WO2005081876A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7873529B2 (en) | 2004-02-20 | 2011-01-18 | Symphonyiri Group, Inc. | System and method for analyzing and correcting retail data |
Families Citing this family (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8166101B2 (en) | 2003-08-21 | 2012-04-24 | Microsoft Corporation | Systems and methods for the implementation of a synchronization schemas for units of information manageable by a hardware/software interface system |
US8238696B2 (en) | 2003-08-21 | 2012-08-07 | Microsoft Corporation | Systems and methods for the implementation of a digital images schema for organizing units of information manageable by a hardware/software interface system |
US10325272B2 (en) * | 2004-02-20 | 2019-06-18 | Information Resources, Inc. | Bias reduction using data fusion of household panel data and transaction data |
US20080288889A1 (en) * | 2004-02-20 | 2008-11-20 | Herbert Dennis Hunt | Data visualization application |
US20080167916A1 (en) * | 2004-06-14 | 2008-07-10 | Symphonyrpm, Inc. | Decision object for associating a plurality of business plans |
CA2532374A1 (en) * | 2005-01-07 | 2006-07-07 | Masco Corporation Of Indiana | Style trend tracking tool |
US20060173864A1 (en) * | 2005-01-28 | 2006-08-03 | Microsoft Corporation | Systems and methods for reconciling image metadata |
US9069436B1 (en) * | 2005-04-01 | 2015-06-30 | Intralinks, Inc. | System and method for information delivery based on at least one self-declared user attribute |
US20080005155A1 (en) * | 2006-04-11 | 2008-01-03 | University Of Southern California | System and Method for Generating a Service Oriented Data Composition Architecture for Integrated Asset Management |
US7996256B1 (en) | 2006-09-08 | 2011-08-09 | The Procter & Gamble Company | Predicting shopper traffic at a retail store |
US8676745B2 (en) * | 2006-12-29 | 2014-03-18 | Accenture Global Services Limited | Integrated number management module and service order system |
US8504598B2 (en) | 2007-01-26 | 2013-08-06 | Information Resources, Inc. | Data perturbation of non-unique values |
WO2008157287A2 (en) * | 2007-06-14 | 2008-12-24 | The Nielsen Company (U.S.), Inc. | Methods and apparatus to weight incomplete respondent data |
US8909632B2 (en) * | 2007-10-17 | 2014-12-09 | International Business Machines Corporation | System and method for maintaining persistent links to information on the Internet |
US8655708B2 (en) * | 2008-12-19 | 2014-02-18 | The Toronto Dominion Bank | Systems and methods for generating and using trade areas associated with business branches based on correlated demographics |
US9292592B2 (en) * | 2009-05-29 | 2016-03-22 | Red Hat, Inc. | Object-based modeling using composite model object having independently updatable component objects |
US9292485B2 (en) | 2009-05-29 | 2016-03-22 | Red Hat, Inc. | Extracting data cell transformable to model object |
US8838469B2 (en) * | 2009-06-12 | 2014-09-16 | Accenture Global Services Limited | System and method for optimizing display space allocation of merchandising using regression analysis to generate space elasticity curves |
US8122038B2 (en) * | 2009-06-15 | 2012-02-21 | Microsoft Corporation | Period to date functions for time intelligence functionality |
US20110106808A1 (en) * | 2009-10-30 | 2011-05-05 | Salesforce.Com, Inc. | Multi-dimensional content organization and delivery |
US20120209697A1 (en) * | 2010-10-14 | 2012-08-16 | Joe Agresti | Bias Reduction in Internet Measurement of Ad Noting and Recognition |
US8447721B2 (en) * | 2011-07-07 | 2013-05-21 | Platfora, Inc. | Interest-driven business intelligence systems and methods of data analysis using interest-driven data pipelines |
US9967218B2 (en) * | 2011-10-26 | 2018-05-08 | Oath Inc. | Online active learning in user-generated content streams |
US9251037B2 (en) | 2011-11-04 | 2016-02-02 | Hewlett Packard Enterprise Development Lp | Providing elastic insight to information technology performance data |
US9253176B2 (en) | 2012-04-27 | 2016-02-02 | Intralinks, Inc. | Computerized method and system for managing secure content sharing in a networked secure collaborative exchange environment |
US9251360B2 (en) | 2012-04-27 | 2016-02-02 | Intralinks, Inc. | Computerized method and system for managing secure mobile device content viewing in a networked secure collaborative exchange environment |
US9553860B2 (en) | 2012-04-27 | 2017-01-24 | Intralinks, Inc. | Email effectivity facility in a networked secure collaborative exchange environment |
CA2871600A1 (en) | 2012-04-27 | 2013-10-31 | Intralinks, Inc. | Computerized method and system for managing networked secure collaborative exchange |
US8543523B1 (en) | 2012-06-01 | 2013-09-24 | Rentrak Corporation | Systems and methods for calibrating user and consumer data |
US20140089051A1 (en) * | 2012-09-25 | 2014-03-27 | Frank Piotrowski | Methods and apparatus to align panelist data with retailer sales data |
US10445777B2 (en) * | 2013-10-29 | 2019-10-15 | Verizon Patent And Licensing Inc. | Methods and systems for delivering electronic content to users in population based geographic zones |
EP3069462A4 (en) | 2013-11-14 | 2017-05-03 | Intralinks, Inc. | Litigation support in cloud-hosted file sharing and collaboration |
US9784774B2 (en) | 2014-01-06 | 2017-10-10 | The Nielsen Company (Us), Llc | Methods and apparatus to determine an operational status of a device |
GB2530685A (en) | 2014-04-23 | 2016-03-30 | Intralinks Inc | Systems and methods of secure data exchange |
US10255345B2 (en) * | 2014-10-09 | 2019-04-09 | Business Objects Software Ltd. | Multivariate insight discovery approach |
US10740774B2 (en) * | 2015-07-15 | 2020-08-11 | The Nielsen Company (Us), Llc | Reducing processing requirements to correct for bias in ratings data having interdependencies among demographic statistics |
US10033702B2 (en) | 2015-08-05 | 2018-07-24 | Intralinks, Inc. | Systems and methods of secure data exchange |
WO2017106677A1 (en) * | 2015-12-18 | 2017-06-22 | Wal-Mart Stores, Inc. | Systems and methods for resolving data discrepancy |
US10373099B1 (en) * | 2015-12-18 | 2019-08-06 | Palantir Technologies Inc. | Misalignment detection system for efficiently processing database-stored data and automatically generating misalignment information for display in interactive user interfaces |
US11354683B1 (en) | 2015-12-30 | 2022-06-07 | Videomining Corporation | Method and system for creating anonymous shopper panel using multi-modal sensor fusion |
US10262331B1 (en) | 2016-01-29 | 2019-04-16 | Videomining Corporation | Cross-channel in-store shopper behavior analysis |
US10963893B1 (en) | 2016-02-23 | 2021-03-30 | Videomining Corporation | Personalized decision tree based on in-store behavior analysis |
US10387896B1 (en) | 2016-04-27 | 2019-08-20 | Videomining Corporation | At-shelf brand strength tracking and decision analytics |
US10354262B1 (en) | 2016-06-02 | 2019-07-16 | Videomining Corporation | Brand-switching analysis using longitudinal tracking of at-shelf shopper behavior |
US10776740B2 (en) * | 2016-06-07 | 2020-09-15 | International Business Machines Corporation | Detecting potential root causes of data quality issues using data lineage graphs |
US10776728B1 (en) | 2016-06-07 | 2020-09-15 | The Nielsen Company (Us), Llc | Methods, systems and apparatus for calibrating data using relaxed benchmark constraints |
US11062365B2 (en) * | 2017-03-23 | 2021-07-13 | Walmart Apollo, Llc | Systems and methods for correcting incorrect product information in an electronic data catalog |
CN107358035A (en) * | 2017-06-28 | 2017-11-17 | 广东技术师范学院 | A kind of portable medical data digging system |
US11449880B2 (en) | 2018-11-01 | 2022-09-20 | Nielsen Consumer Llc | Methods, systems, apparatus and articles of manufacture to model eCommerce sales |
US11537961B2 (en) * | 2019-04-22 | 2022-12-27 | Walmart Apollo, Llc | Forecasting system |
CA3078881A1 (en) | 2019-04-22 | 2020-10-22 | Walmart Apollo, Llc | Forecasting system |
US11544653B2 (en) * | 2019-06-24 | 2023-01-03 | Overstock.Com, Inc. | System and method for improving product catalog representations based on product catalog adherence scores |
US11205214B2 (en) | 2019-07-29 | 2021-12-21 | Luke MARIETTA | Method and system for automatically replenishing consumable items |
CN112667699A (en) * | 2019-10-15 | 2021-04-16 | 深圳海知科技有限公司 | Intelligent security comparison method and system based on individual, group and overall multilevel |
CN112132705B (en) * | 2020-09-30 | 2023-07-11 | 国网智能科技股份有限公司 | Substation panoramic data storage and reproduction method and system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6401070B1 (en) * | 1996-10-11 | 2002-06-04 | Freddie Mac | System and method for providing house price forecasts based on repeat sales model |
Family Cites Families (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4731585A (en) * | 1987-02-24 | 1988-03-15 | Kabushiki Kaisha Toshiba | Antenna coupling circuit for magnetic resonance imaging |
US5041972A (en) | 1988-04-15 | 1991-08-20 | Frost W Alan | Method of measuring and evaluating consumer response for the development of consumer products |
US5758257A (en) | 1994-11-29 | 1998-05-26 | Herz; Frederick | System and method for scheduling broadcast of and access to video programs and other data using customer profiles |
US5719573A (en) * | 1995-06-01 | 1998-02-17 | Cirrus Logic, Inc. | Analog modulator for A/D converter utilizing leap-frog filter |
US6011432A (en) * | 1998-07-23 | 2000-01-04 | Lucent Technologies Inc. | Continuous tuning of switched capacitor circuits using DC-isolated tuning elements |
US6430539B1 (en) | 1999-05-06 | 2002-08-06 | Hnc Software | Predictive modeling of consumer financial behavior |
US6662192B1 (en) | 2000-03-29 | 2003-12-09 | Bizrate.Com | System and method for data collection, evaluation, information generation, and presentation |
US6708156B1 (en) | 2000-04-17 | 2004-03-16 | Michael Von Gonten, Inc. | System and method for projecting market penetration |
US6636585B2 (en) | 2000-06-26 | 2003-10-21 | Bearingpoint, Inc. | Metrics-related testing of an operational support system (OSS) of an incumbent provider for compliance with a regulatory scheme |
US6636862B2 (en) | 2000-07-05 | 2003-10-21 | Camo, Inc. | Method and system for the dynamic analysis of data |
TW581955B (en) | 2000-10-27 | 2004-04-01 | Manugistics Inc | Supply chain demand forecasting and planning |
US7523047B1 (en) * | 2000-12-20 | 2009-04-21 | Demandtec, Inc. | Price optimization system |
US20020099597A1 (en) | 2000-12-27 | 2002-07-25 | Michael Gamage | Method for analyzing assortment of retail product |
US20030088474A1 (en) | 2001-03-23 | 2003-05-08 | Restaurant Services, Inc. ("RSI"). | System, method and computer program product for an electronics and appliances supply chain management framework |
US7171379B2 (en) * | 2001-03-23 | 2007-01-30 | Restaurant Services, Inc. | System, method and computer program product for normalizing data in a supply chain management framework |
US20030046120A1 (en) * | 2001-03-23 | 2003-03-06 | Restaurant Services, Inc. | System, method and computer program product for evaluating the success of a promotion in a supply chain management framework |
US20030028417A1 (en) | 2001-05-02 | 2003-02-06 | Fox Edward J. | Method for evaluating retail locations |
US20040260599A1 (en) | 2001-05-04 | 2004-12-23 | Stefan Ziegele | System and methods for estimating product sales in highly fragmented geographically segments of service provider location |
US20030083925A1 (en) | 2001-11-01 | 2003-05-01 | Weaver Chana L. | System and method for product category management analysis |
US20030171978A1 (en) | 2002-03-11 | 2003-09-11 | Jenkins Margalyn Toi | Efficient retail item assortment |
US7027843B2 (en) * | 2002-03-21 | 2006-04-11 | Lenovo (Singapore) Pte. Ltd. | Wireless device power optimization |
US20060020641A1 (en) | 2002-03-25 | 2006-01-26 | Data Quality Solutions | Business process management system and method |
US7734495B2 (en) | 2002-04-23 | 2010-06-08 | Kimberly-Clark Worldwide, Inc. | Methods and system for allocating shelf space |
US7426520B2 (en) | 2003-09-10 | 2008-09-16 | Exeros, Inc. | Method and apparatus for semantic discovery and mapping between data sources |
US7379890B2 (en) | 2003-10-17 | 2008-05-27 | Makor Issues And Rights Ltd. | System and method for profit maximization in retail industry |
US10325272B2 (en) | 2004-02-20 | 2019-06-18 | Information Resources, Inc. | Bias reduction using data fusion of household panel data and transaction data |
US7949639B2 (en) | 2004-02-20 | 2011-05-24 | Symphonyiri Group, Inc. | Attribute segments and data table bias reduction |
US7873529B2 (en) | 2004-02-20 | 2011-01-18 | Symphonyiri Group, Inc. | System and method for analyzing and correcting retail data |
US20080288889A1 (en) | 2004-02-20 | 2008-11-20 | Herbert Dennis Hunt | Data visualization application |
US20080168028A1 (en) | 2004-02-20 | 2008-07-10 | Kruger Michael W | System and method for analyzing and correcting retail data |
US20080168027A1 (en) | 2004-02-20 | 2008-07-10 | Kruger Michael W | System and method for analyzing and correcting retail data |
US20080256028A1 (en) | 2004-02-20 | 2008-10-16 | Kruger Michael W | System and method for analyzing and correcting retail data |
US20080147699A1 (en) | 2004-02-20 | 2008-06-19 | Kruger Michael W | System and method for analyzing and correcting retail data |
US20080168104A1 (en) | 2004-06-22 | 2008-07-10 | Kruger Michael W | System and method for analyzing and correcting retail data |
Date | Country | Application | Publication | Status |
---|---|---|---|---|
2004-02-20 | US | US10/783,323 | US7873529B2 (en) | Active |
2005-02-22 | WO | PCT/US2005/005297 | WO2005081876A2 (en) | Application Filing |
2005-02-22 | EP | EP05713819A | EP1723588A4 (en) | Withdrawn |
2007-10-29 | US | US11/926,347 | US20080162462A1 (en) | Abandoned |
2007-10-29 | US | US11/926,335 | US20080162461A1 (en) | Abandoned |
2007-10-29 | US | US11/926,318 | US20080143474A1 (en) | Abandoned |
2007-10-29 | US | US11/926,354 | US20080162223A1 (en) | Abandoned |
2007-10-29 | US | US11/926,381 | US20080154884A1 (en) | Abandoned |
2007-10-29 | US | US11/926,309 | US20080162571A1 (en) | Abandoned |
2007-10-29 | US | US11/926,360 | US20080162464A1 (en) | Abandoned |
2007-10-29 | US | US11/926,332 | US20080136582A1 (en) | Abandoned |
2007-10-29 | US | US11/926,343 | US20080162572A1 (en) | Abandoned |
2007-10-29 | US | US11/926,358 | US20080162463A1 (en) | Abandoned |
2007-10-29 | US | US11/926,366 | US20080140480A1 (en) | Abandoned |
2007-10-29 | US | US11/926,383 | US20080154885A1 (en) | Abandoned |
2007-10-29 | US | US11/926,368 | US20080162465A1 (en) | Abandoned |
2007-10-29 | US | US11/926,308 | US20080162460A1 (en) | Abandoned |
2007-10-29 | US | US11/926,329 | US20080256027A1 (en) | Abandoned |
2007-10-29 | US | US11/926,374 | US20080136583A1 (en) | Abandoned |
2007-10-29 | US | US11/926,372 | US20080162466A1 (en) | Abandoned |
2007-10-29 | US | US11/926,320 | US20080154843A1 (en) | Abandoned |
2007-10-29 | US | US11/926,351 | US20080162404A1 (en) | Abandoned |
2007-10-29 | US | US11/926,316 | US20080147459A1 (en) | Abandoned |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7873529B2 (en) | 2004-02-20 | 2011-01-18 | Symphonyiri Group, Inc. | System and method for analyzing and correcting retail data |
Also Published As
Publication number | Publication date |
---|---|
US20080154885A1 (en) | 2008-06-26 |
US20080162404A1 (en) | 2008-07-03 |
US20080143474A1 (en) | 2008-06-19 |
US20080162462A1 (en) | 2008-07-03 |
WO2005081876A3 (en) | 2005-11-17 |
US20080154843A1 (en) | 2008-06-26 |
US20080162460A1 (en) | 2008-07-03 |
US20080162461A1 (en) | 2008-07-03 |
US20080162463A1 (en) | 2008-07-03 |
US20080162465A1 (en) | 2008-07-03 |
US20080136582A1 (en) | 2008-06-12 |
EP1723588A2 (en) | 2006-11-22 |
US7873529B2 (en) | 2011-01-18 |
WO2005081876A2 (en) | 2005-09-09 |
US20080162572A1 (en) | 2008-07-03 |
US20080140480A1 (en) | 2008-06-12 |
US20080162223A1 (en) | 2008-07-03 |
US20080256027A1 (en) | 2008-10-16 |
EP1723588A4 (en) | 2009-07-15 |
US20080136583A1 (en) | 2008-06-12 |
US20050187972A1 (en) | 2005-08-25 |
US20080162466A1 (en) | 2008-07-03 |
US20080162464A1 (en) | 2008-07-03 |
US20080162571A1 (en) | 2008-07-03 |
US20080147459A1 (en) | 2008-06-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7873529B2 (en) | | System and method for analyzing and correcting retail data |
US20080168027A1 (en) | | System and method for analyzing and correcting retail data |
US20080256028A1 (en) | | System and method for analyzing and correcting retail data |
US20230316206A1 (en) | | Methods and apparatus for the formatting of data values that may be arbitrary or indeterminate collected from a plurality of sources |
US20080262900A1 (en) | | Methods and apparatus to facilitate sales estimates |
US20080168028A1 (en) | | System and method for analyzing and correcting retail data |
US20080147699A1 (en) | | System and method for analyzing and correcting retail data |
US20100070339A1 (en) | | Associating an Entity with a Category |
US7174304B1 (en) | | System and method for estimating product distribution using a product specific universe |
US20080168104A1 (en) | | System and method for analyzing and correcting retail data |
US20090192914A1 (en) | | Adaptive Lead Pricing |
US20190164180A1 (en) | | Methods, systems, apparatus and articles of manufacture to generate projection weights for a panel |
Sun et al. | | Geo-level Bayesian hierarchical media mix modeling |
Korevaar et al. | | Comparison of statistical methods used to meta-analyse results from interrupted time series studies: an empirical study |
Anand et al. | | Retail Analysis—Walmart’s Trend Assessment |
Smith et al. | | Territorial Economic Impact Index: Measuring the ongoing effects of long-term disruptions to Pacific Island Territories |
US20090313284A1 (en) | | Data Integration Method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |