WO2022221743A1 - System and method for estimation of treatment effects from observational and corrupted a/b testing data - Google Patents

Info

Publication number
WO2022221743A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
estimation
treatment
matrix
treatment effects
Prior art date
Application number
PCT/US2022/025140
Other languages
French (fr)
Inventor
Vivek Francis FARIAS
Andrew Li
Tianyi PENG
Original Assignee
Massachusetts Institute Of Technology
Priority date: 2021-04-15
Filing date: 2022-04-15
Publication date: 2022-10-20
Application filed by Massachusetts Institute Of Technology filed Critical Massachusetts Institute Of Technology
Publication of WO2022221743A1 publication Critical patent/WO2022221743A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements

Abstract

A system is provided for building experiments in the real world that suffer from imperfect controls and for inferring correctly from those experiments. The system contains a storage device for storing transaction data and intervention data, and an estimation engine that performs the steps of: receiving transaction data and intervention data, also referred to herein as observational data; organizing the observational data as a matrix or tensor; transforming the transaction data and intervention data into a panel format; using a de-biased matrix completion algorithm to learn treatment effects of promotions at each store at each time period; and validating the learned treatment effects.

Description

SYSTEM AND METHOD FOR ESTIMATION OF TREATMENT EFFECTS FROM OBSERVATIONAL AND CORRUPTED A/B TESTING DATA
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of priority to U.S. Provisional Patent Application No.
63/175,500, entitled Causal Inference In Tensor Data with Multiple Treatments Applied in
General Treatment Patterns, which was filed on April 15, 2021. The disclosure of the prior application is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
The present invention relates to systems for estimating the impact of decisions prior to making such decisions, and particularly to doing so in a manner that minimizes the effect of interference.
BACKGROUND OF THE INVENTION
When considering a digital service that interacts with customers in a variety of different modes on its platform, and seeks to understand the differential impact of any of these modes of interaction on a customer outcome of interest, such impacts are often effectively measured via A/B testing. Modern A/B testing seeks to estimate "conditional average treatment effects" at increasingly fine levels of granularity. In commerce applications, the demands of finer consumer/product-level granularity and a large palette of potential treatments lead to an untenable explosion in the number of A/B tests required (from the perspective of samples, and the cost/complexity of maintaining such tests). Given this challenge, it is desirable to estimate answers to questions typically answered by A/B testing using data that is already available - so-called "observational data".
Different methods have been attempted to address this problem, with unsatisfactory results. The 'Synthetic Control' [Abadie et al. 2003, 2010] framework presents a conceptual approach to addressing the problem above, by compensating for the lack of an actual control (that would exist in A/B testing) with a 'synthetic' one constructed as a composite of other units. In the present example this would correspond to constructing a control for a given customer (or group of customers) with a combination of other customers in the dataset. The synthetic control framework is restricted in its scope: (1) it allows for a limited set of data structures with respect to the observational data available, allowing for only 'panels' (or matrices); and (2) more importantly, it allows for a very restricted set of treatment patterns (i.e., the pattern of data elements that are subject to a treatment must essentially be block shaped). These restrictions preclude the sort of observational data one would have access to on a typical digital platform/service.
Recently, there have been efforts to relax synthetic control restrictions by employing vanilla matrix completion techniques, but they continue to be limited in their use. For example, the approach of [Athey et al. 2018] presents no error recovery guarantees, so that any inference on the correctness of the estimates is not available (i.e., the approach provides no guarantees on whether the treatment effects estimated by the approach are in fact correct). As another example, the approach of [Xiong and Pelger 2019], which places restrictions on the data requiring it to be 'stationary', is untenable on a fast-evolving digital platform where emergent trends are routine. As a further example, the approaches of [Amjad et al. 2018] and [Agarwal et al. 2020], like synthetic control, place severe restrictions on the treatment pattern. Therefore, there is a need to address the shortcomings of the current state of the art.
SUMMARY OF THE INVENTION
The present system and method provide a new approach to estimating treatment effects from data panels that might consist entirely of observational data, or else of data obtained from A/B tests that have potentially been corrupted for a variety of reasons. The present system and method provide a new approach to overcoming the lack of an obvious control in these settings that would otherwise have served to establish counterfactual outcomes. It is first described how such data may be manipulated so as to be viewed as the sum of a low-rank counterfactual matrix, a noise matrix, and a treatment matrix. The present system and method then prescribe an optimization algorithm to recover the average treatment effect corresponding to the treatment matrix along with the counterfactual matrix. It is shown that the optimization algorithm of the estimation engine has many desirable properties, including, but not limited to, it being optimal from among all possible algorithms for this problem in a sense made precise herein.
A system is provided for building experiments in the real world that suffer from imperfect controls and for inferring correctly from those experiments. The system contains a storage device for storing transaction data and intervention data, and an estimation engine that performs the steps of: receiving transaction data and intervention data, also referred to herein as observational data; organizing the observational data as a matrix or tensor; transforming the transaction data and intervention data into a panel format; using a de-biased matrix completion algorithm to learn treatment effects of promotions at each store at each time period; and validating the learned treatment effects.
Other systems, methods, features, and advantages of the present invention will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram illustrating a network in which the present system and method may be implemented.
FIG. 2 is a schematic diagram further illustrating the estimation system of FIG. 1.
FIG. 3 is a flowchart that illustrates steps performed by the present estimation system.
FIG. 4 is a diagram where the left side provides a promotion pattern of real data, while the right side illustrates an estimation of τ and test errors.
FIG. 5 is a diagram providing an example of transaction data transformed into a panel format.
FIG. 6 is a diagram providing an example of intervention data transformed into a panel format.
DETAILED DESCRIPTION
The problem of causal inference with panel data is a central problem in econometrics.
The following is a formulation of a fundamental version of this problem: Let M* be a low-rank matrix and E be a random matrix with independent, zero-mean, sub-Gaussian entries. For a 'treatment' matrix Z with entries in {0, 1} and some 'treatment effect' τ* ∈ R, we observe the matrix O with entries

$$O_{ij} = M^*_{ij} + E_{ij} + \tau_{ij} Z_{ij},$$

where the τ_ij are unknown, heterogeneous treatment effects whose average over the treated entries is τ*. The problem requires that we estimate the average treatment effect τ*. This setup finds broad applications in causal inference questions for multi-variate time-series data, which arise in areas ranging from macroeconomics and policy evaluation to e-commerce.
The present system and method provide an estimator for τ* that is rate-optimal and asymptotically normal under general conditions on the structure of Z. In particular, the present recovery guarantees are valid under a set of conditions on the treatment matrix Z which relate to its projection on the tangent space of M*. Should these conditions on Z be violated, even by an amount that grows negligibly small with problem size, no algorithm can recover τ*. Therefore, the present system and method generalize the synthetic control paradigm to allow for general treatment patterns. The recovery guarantees of the present system and method are the first of their type. Utilization of an estimator of the present invention on synthetic and real-world data shows a substantial advantage over competing matrix completion-based estimators.
Consider the following common econometric problem: one is provided a sequence of T observations on each of n distinct entities, or units. As a concrete example, in the policy evaluation literature, an entity might correspond to a geographic region with the associated sequence of observations corresponding to some economic time series of interest. In e-commerce, an entity may correspond to a product with the associated sequence of observations corresponding to the sales of that product over time, or with a customer, with the associated sequence of observations corresponding to site-visits for that customer over time.
For each entity, some subset of its observations is potentially impacted by the application of a 'treatment', or intervention. For example, this may correspond to the implementation of a new policy (in the policy evaluation context), or the application of a new type of promotion (in the e-commerce context). The econometric question at hand is to estimate the average effect of this treatment. This problem is of immense applied importance in modern econometrics. This problem can be formalized: Let M* ∈ R^{n×T} be a fixed, unknown matrix and E be a zero-mean random matrix; we refer to M* + E as the "counterfactual" matrix, with each row corresponding to a distinct "unit". A known 'treatment' matrix Z with entries in {0, 1} encodes observations impacted by an intervention. Specifically, we observe a matrix O with entries

$$O_{ij} = M^*_{ij} + E_{ij} + \tau_{ij} Z_{ij},$$

where the τ_ij are unknown, heterogeneous treatment effects. The goal is to recover the average treatment effect

$$\tau^* = \frac{\sum_{ij} \tau_{ij} Z_{ij}}{\sum_{ij} Z_{ij}}.$$
Herein, we refer to this problem as the panel data problem. Note that while we focus on a constant treatment effect, which one may view as capturing the average treatment effect of an intervention, an extension to multiple treatments is also presented.
To allow for meaningful solutions to this problem, assumptions are made on M* and Z; consider the following:
(1) Imputation of Counterfactual Observations: One must make assumptions on M* that, loosely speaking, allow for the imputation of counterfactual entries of M* (i.e., entries in the support of Z). One assumption made to this end is to at least assume that M* is a low-rank (say, rank-r) matrix.
(2) Identifiability: We must rule out the existence of a rank-r matrix M', distinct from M*, for which M' = M* + δZ for some δ ≠ 0, or else identifying τ* is impossible even if E is identically zero.
The synthetic control paradigm, as described in Abadie et al., 2010, Abadie and
Gardeazabal, 2003, which is incorporated herein by reference in its entirety, for instance, requires that Z has support on part of a single row and that the treated row of M* be in the row-space of the untreated rows. An estimator based on linear regression can then allow for estimation of counterfactual values of M*. Other approaches based on the use of propensity scores require knowledge of a generative model for Z and M* to estimate counterfactual values.
More generally, central to any successful approach, one needs a set of assumptions on M* and Z and a complementary approach to imputing counterfactual values of M*.
Against this backdrop, it is natural to imagine that the recent literature on matrix completion might be leveraged to fruitfully solve the panel data problem in greater generality.
Full generality here would mean allowing for arbitrary rank-r matrices M*, and all Z such that M* remains identifiable; that is, all Z for which there exists no rank-r matrix M' = M* + δZ for some non-zero δ.
Attempts to leverage the matrix completion literature to solve the panel data problem essentially view treated entries of M* (i.e., entries in the support of Z) as missing, and then seek to leverage matrix completion techniques to impute these missing entries. Whereas this has the benefit of not assuming any structure a-priori to the impact of the treatments, this approach runs into certain fundamental challenges:
1. Structured Z: Matrix completion techniques typically require that entries be missing at random. In the context of the panel data problem, however, Z is highly structured, so that treating the entries in the support of Z as missing raises immediate challenges. Conversely, if Z were a random matrix with i.i.d. entries (corresponding to the missingness pattern most commonly assumed in matrix completion), the panel data problem is already trivial.
2. Error Guarantees: Ignoring the challenge above, imagine one were able to construct an estimate of M* (say, M̂), and we then proceeded to estimate τ* by measuring the average difference between observations and estimated counterfactuals on the treated entries. It is unclear that even an (optimal) O(1/√n) guarantee on the error in estimating M* would yield useful guarantees on the resulting estimate of τ*. As it turns out, even optimal entry-wise bounds on the error in estimating M* would yield substantially sub-optimal guarantees on the estimate of τ*.
These issues lay bare the challenges with any approach that views the treated observations as 'missing': not only does one need to deal with a potentially non-random missingness pattern, but in addition, one would need some sort of entry-wise control on the estimation error of counterfactual observations. And even that would yield substantially sub-optimal guarantees for τ*. These challenges are reflected in the available results. For example, Athey et al., 2018, which is incorporated by reference in its entirety herein, shows that the recovery of subspaces is possible (with error measured in the Frobenius norm) for a certain stylized choice of Z. This does not lead to guarantees for estimating treatment effects. Agarwal et al., 2020 and Amjad et al., 2018 require, like standard synthetic control, a 'block' treatment pattern, along with an assumption on a certain sub-matrix of M* that trivializes the completion problem. Xiong and Pelger, 2019 make distributional assumptions on M* and Z that effectively require stationarity in the panel data. Additional work, not necessarily related to panel data, but on matrix completion with non-standard observation patterns, exists but by and large is not obviously useful or applicable to the present problem.
The present system and method provide for a unique and novel estimation of multiple treatment effects in observational data that allows for general treatment patterns, provides confidence intervals for any estimated treatment effect, and is extensible to tensors. The present approach is powered by a novel de-biasing technique provided by the present system, which can be implemented for general treatment patterns and multiple treatment effects. The system yields sharp, statistically optimal recovery guarantees and provides a user with a precise joint distribution on the error of estimating all potential treatment effects. In empirical experiments, it is noted that the present system and method provide significantly improved error rates relative to prior approaches, even in settings where those approaches might be used. In particular, the functionality of the estimation approach of the present system and method addresses the need to answer questions germane to multiple A/B tests using observational data.
FIG. 1 is a schematic diagram illustrating a network 2 in which the present system and method may be implemented. As shown by FIG. 1, the network 2 may contain a data server 10 that is in communication with the estimation system of the present invention 100 via the internet
30. In addition, the network 2 may contain a database 20 that is in communication with the estimation system 100 of the present invention via the internet 30. It should be noted that the use of a network is not a requirement of the present system and method, but instead, the present system and method may be implemented on a stand-alone computer. Observational data may be stored within the server 10 and/or the database 20, which is used by the estimation system 100, as is described in further detail herein. It should also be noted that there need not be both a server 10 and a database 20. Instead, only one of the two may be within the network.
Alternatively, the observational data may already be stored within the estimation system 100, thereby alleviating the need for a server 10 or an external database 20 that would be storing and transmitting the observational data. In addition, there may be more than one server 10 and/or more than one external database 20.
The term observational data is intended to cover any data collected in the course of monitoring some performance indicator (or indicators) of interest across a number of distinct observational units (customers, stores, etc.) over time.
Functionality as performed by the present system and method is defined by software modules within the estimation system 100. The estimation system 100, which is illustrated in further detail in FIG. 2, may contain a processor 102, an internal storage device 104, a memory
106 having software 110 stored therein that defines functionality of the estimation system, input and output (I/O) devices (or peripherals) 174, and a local bus 172, or local interface allowing for communication within the estimation system 100. The local interface can be, for example but not limited to, one or more buses or other wired or wireless connections, as is known in the art.
The local bus may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications.
Further, the local bus 172 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
The processor 102 is a hardware device for executing software, particularly that stored in the memory 106. The processor 102 can be any custom made or commercially available single core or multi-core processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the present server, a semiconductor based microprocessor (in the form of a microchip or chip set), a microprocessor, or generally any device for executing software instructions.
The memory 106 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). Moreover, the memory may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory can have a distributed architecture, where various components are situated remotely from one another, but can be accessed by the processor.
The software 110 defines functionality performed by the estimation system 100, in accordance with the present invention. The software 110 in the memory 106 may include one or more separate programs, each of which contains an ordered listing of executable instructions for implementing logical functions of the estimation system 100, as described below. The memory may contain an operating system (O/S) 170. The operating system 170 essentially controls the execution of programs within the estimation system 100 and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. It is noted that the combination of the processor 102, and the memory 106, having the software 110 and operating system 170 therein, may be referred to herein as the estimation engine 108.
The I/O devices 174 may include input devices, for example but not limited to, a keyboard, mouse, scanner, microphone, etc. Furthermore, the I/O devices 174 may also include output devices, for example but not limited to, a printer, display, etc. Finally, the I/O devices
174 may further include devices that communicate via both inputs and outputs, for instance but not limited to, a modulator/demodulator (modem; for accessing another device, system, or network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, or other device.
When the estimation system 100 is in operation, the processor 102 is configured to execute the software 110 stored within the memory 106, to communicate data to and from the memory 106, and to generally control operations of the estimation system 100 pursuant to the software 110, as explained herein.
When the functionality of the estimation system 100 is in operation, the processor 102 is configured to execute the software 110 stored within the memory 106, to communicate data to and from the memory 106, and to generally control operations of the estimation system 100 pursuant to the software 110. The operating system 170 is read by the processor 102, perhaps buffered within the processor 102, and then executed.
When functionality of the estimation system 100 is implemented in software, it should be noted that instructions for implementing the estimation system, can be stored on any computer- readable medium for use by or in connection with any computer-related device, system, or method. Such a computer-readable medium may, in some embodiments, correspond to either or both the memory or the storage device. In the context of this document, a computer-readable medium is an electronic, magnetic, optical, or other physical device or means that can contain or store a computer program for use by or in connection with a computer-related device, system, or method. Instructions for implementing the system can be embodied in any computer-readable medium for use by or in connection with the processor or other such instruction execution system, apparatus, or device. Although the processor has been mentioned by way of example, such instruction execution system, apparatus, or device may, in some embodiments, be any computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the processor or other such instruction execution system, apparatus, or device.
Such a computer-readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM,
EEPROM, or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). Note that the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
In an alternative embodiment, where functionality of the estimation system 100 is implemented in hardware, the functionality can be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
The following provides a more detailed description of the present system and method in accordance with exemplary embodiments of the invention. FIG. 3 is a flowchart that illustrates steps performed by the present system. The following description, with regard to FIG. 3, describes these steps in detail. It should be noted that any process descriptions should be understood as representing modules, segments, portions of code, or steps that include one or more instructions for implementing specific logical functions in the process, and alternative implementations are included within the scope of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention. As shown by block 112, the estimation engine 108 first receives transaction and intervention data, also referred to as observational data. This data is used for estimating promotion effects. Two types of observational data are received. A first type of data received is transaction data, for example, at a store level, over a specified period of time. A second type of data is promotions data, for example, at the store level, indicating the starting time and the ending time of a promotion at the store. Such data may be received from the server 10 (FIG. 1), the database 20 (FIG. 1), stored within the storage device 104 of the estimation system 100 (FIG.
2), or received in real-time.
The estimation engine 108 organizes the observational data as a matrix or tensor, as shown by block 114. Entries in the matrix or tensor are either marked 'untreated' or else marked with any treatments applicable to the entry. Herein, the term "treatment" refers to interventions, such as, but not limited to, product promotions or advertising strategies. In practice, data organized in typical NoSQL databases such as Bigtable are routinely organized in this fashion.
Concretely, on a digital platform the modes of such a tensor would correspond respectively to customers, products and time at a minimum; the keys in the database would thus be (customer, product, time) tuples. Matrix completion methods present a means to allow for inference with general treatment patterns.
As shown by block 122, the received raw data, namely, the transaction and intervention data, is transformed into a panel format. An example of transaction data transformed into a panel format is illustrated by FIG. 5 and an example of intervention data transformed into a panel format is illustrated by FIG. 6. Specifically, there are at least two different sequences of paneling data. A first paneling sequence involves paneling data with sales, in which a matrix is created, where rows are individual stores, columns are time units (e.g., weeks), and entries are the sales at each store at each time. A second paneling sequence involves paneling data with interventions, in which a matrix is created, where rows are individual stores, columns are time units, and entries indicate whether a promotion occurred at each store at each time.
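By way of illustration only, the following is a minimal sketch of the paneling step just described, using pandas pivot tables. The column names (store_id, week, sales, start_week, end_week) and the function name build_panels are hypothetical and are not part of the specification; any equivalent transformation of the raw transaction and intervention data into the two panels may be used.

```python
# Illustrative sketch (not part of the specification) of transforming raw
# transaction and promotion records into the two panels described above.
# Column names are hypothetical.
import numpy as np
import pandas as pd

def build_panels(transactions: pd.DataFrame, promotions: pd.DataFrame):
    """transactions: rows of (store_id, week, sales).
    promotions: rows of (store_id, start_week, end_week)."""
    # Panel of outcomes O: rows are stores, columns are weeks, entries are sales.
    O = transactions.pivot_table(
        index="store_id", columns="week", values="sales", aggfunc="sum"
    ).fillna(0.0)

    # Panel of interventions Z: entry is 1 if a promotion ran at that store/week.
    Z = pd.DataFrame(0, index=O.index, columns=O.columns)
    for promo in promotions.itertuples():
        weeks = [w for w in O.columns if promo.start_week <= w <= promo.end_week]
        Z.loc[promo.store_id, weeks] = 1

    return O.to_numpy(dtype=float), Z.to_numpy(dtype=float)
```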
As shown by block 132, the estimation engine 108 then learns the effects of treatment. Specifically, the estimation engine 108 uses a novel de-biased matrix completion algorithm to learn the treatment effects of promotions at each store at each time period. Equations 7 and 8, as provided herein, are used as part of this de-biased matrix completion algorithm to learn the treatment effects of promotions at each store at each time period. The estimation engine 108, through this functionality, also produces confidence intervals on these same treatment effects.
Theorem 2, described herein, combined with the following, assists in producing confidence intervals. The present system and method begin by considering the following natural convex estimator for M* and τ* to get to a 'rough' first estimate of τ*:

$$(\hat{M}, \hat{\tau}) \in \operatorname*{arg\,min}_{M,\,\tau}\; \tfrac{1}{2}\,\lVert O - M - \tau Z\rVert_F^2 + \lambda\,\lVert M\rVert_*.$$

Here λ > 0 is a regularization parameter. Denote by (M̂, τ̂) an optimal solution to this program. While no error guarantees have been available for τ̂ heretofore, this is arguably a natural choice of estimator.
Crucially, this estimator utilizes all observations to simultaneously learn both M* (and thus, counterfactual observations) as well as τ*.
It is unlikely that τ̂ is an optimal estimator. Specifically, this is because the regularizer in the convex estimation problem above introduces bias in the estimation of M*, and thereby introduces a bias in our estimate of τ*. Denote by M̂ = ÛΣ̂V̂ᵀ the singular value decomposition of M̂, and by P_{T̂⊥}(·) the projection onto the orthogonal complement of the tangent space of M̂. For a rank-r matrix X, we colloquially refer to the tangent space of the manifold of up-to-rank-r matrices at X as simply the tangent space of X. The estimation engine 108 derives a correction term that serves to de-bias τ̂ and proposes, for this purpose, the estimator:

$$\tau^d = \hat{\tau} - \frac{\lambda\,\langle Z,\, \hat{U}\hat{V}^\top\rangle}{\lVert P_{\hat{T}^\perp}(Z)\rVert_F^2}.$$

It is worth noting that a natural extension to the panel data problem involves multiple treatment matrices Z_1, ..., Z_k, each with an associated treatment effect τ*_m, so that O = M* + E + Σ_m τ_m ∘ Z_m for some fixed k. While the following focuses primarily on the single-treatment case (given its fundamental role and the state of available estimators for that case), multiple treatments can also be addressed by the current estimation engine.
Functionality of the estimation engine 108 was determined based on structured testing.
We begin by formally defining our problem; we in fact present a generalization to the problem described in the previous section, allowing for multiple treatments. Let M* ∈ R^{n×n} be a fixed rank-r matrix with singular value decomposition (SVD) denoted by M* = U*Σ*V*ᵀ, where U*, V* ∈ R^{n×r} have orthonormal columns, and Σ* ∈ R^{r×r} is diagonal with diagonal entries σ*_1 ≥ ⋯ ≥ σ*_r > 0; let κ = σ*_1/σ*_r be the condition number of M*. There are k treatments that can be applied to each entry, and for each treatment m = 1, ..., k we are given a treatment matrix Z_m ∈ {0, 1}^{n×n} which encodes the entries which have received the m-th treatment (0 meaning no treatment, and 1 meaning being treated). Note that multiple treatments are allowed to be applied to an entry. We then observe a single matrix of outcomes:

$$O = M^* + E + \sum_{m=1}^{k} \tau_m \circ Z_m$$

(∘ is the Hadamard or 'entrywise' product), where each τ_m ∈ R^{n×n} is an unknown matrix of treatment effects, and E ∈ R^{n×n} is a (possibly heterogeneous) random noise matrix. Finally, let τ* ∈ R^k be the vector of average treatment effects, whose value is defined as

$$\tau^*_m = \frac{\langle \tau_m, Z_m\rangle}{\langle Z_m, Z_m\rangle},$$

and let Δ_m = (τ_m − τ*_m) ∘ Z_m be the associated 'residual' matrices. Our problem is to estimate τ* after having observed O and Z_1, ..., Z_k. It is worth noting that the representation above is powerful: for instance, it subsumes the setting where the intervention on any entry is associated with a (0, 1)-valued covariate vector, and the treatment effect observed on that entry is some linear function of this covariate vector plus idiosyncratic noise. Recovery of τ* is then equivalent to recovering covariate-dependent heterogeneous treatment effects.
Our problem also subsumes the synthetic control setting where k = 1 and Z_1 must place support on a single row; the focus of our later analysis will be the case where Z_1 is allowed to be general.
The assumptions imposed in order to state meaningful results can be divided into two groups. The first are assumptions on M* and E that are, by this point, canonical in the matrix completion literature:
Assumption 1 (Random Noise). The entries of E are independent, mean-zero, sub-Gaussian random variables with bounded sub-Gaussian norm.
Assumption 2 (Incoherence). M* is incoherent: ‖U*‖_{2,∞} ≤ √(μr/n) and ‖V*‖_{2,∞} ≤ √(μr/n), where ‖·‖_{2,∞} denotes the maximum l2-norm of the rows of a matrix.
In addition to these standard conditions on M* and E, which we will assume throughout this disclosure, we will also need to impose conditions on the relationship between M* and the Z_m's. Loosely speaking, these conditions preclude treatment matrices that can be "disguised" within M*, in the sense that their projections onto the tangent space of M* are large. Specifically, the formal statements relate to a particular decomposition of the linear space of n × n matrices,

$$\mathbb{R}^{n\times n} = T^* \oplus T^{*\perp},$$

where T* is the tangent space of M* in the manifold consisting of matrices with rank no larger than rank(M*):

$$T^* = \{\,U^*A^\top + BV^{*\top} : A, B \in \mathbb{R}^{n\times r}\,\}.$$

Equivalently, the orthogonal space of T*, denoted T*⊥, is the subspace of matrices whose columns and rows are orthogonal, respectively, to the spaces spanned by U* and V*. Let P_{T*}(·) and P_{T*⊥}(·) denote the projection operators onto T* and T*⊥, respectively.
The estimation engine functionality is constructed in two steps, stated as the following two equations:

$$(\hat{M}, \hat{\tau}) \in \operatorname*{arg\,min}_{M,\,\tau}\; \tfrac{1}{2}\,\Bigl\lVert O - M - \sum_{m=1}^{k}\tau_m Z_m\Bigr\rVert_F^2 + \lambda\,\lVert M\rVert_* \qquad (7)$$

$$\tau^d = \hat{\tau} - D^{-1}\delta \qquad (8)$$

In Eq. 8, define by D ∈ R^{k×k} the Gram matrix with entries D_{lm} = ⟨P_{T̂⊥}(Z_l), P_{T̂⊥}(Z_m)⟩, and by δ ∈ R^k the 'error' vector with components δ_l = λ⟨Z_l, ÛV̂ᵀ⟩, where we have let T̂ denote the tangent space of M̂ and M̂ = ÛΣ̂V̂ᵀ its singular value decomposition.
The first step, Eq. 7, is a natural convex optimization formulation that is used to compute a 'rough' estimate of the average treatment effects. The objective function's first term penalizes choices of M and τ which differ from the observed O, and the second term seeks to penalize the rank of M using the nuclear norm as a (convex) proxy. The tuning parameter λ > 0, which will be specified in our theoretical guarantees, encodes the relative weight of these two objectives.
After the first step, having (M̂, τ̂) as a minimizer of Eq. 7, we could simply use τ̂ as our estimator for τ*. However, a brief analysis of the first-order optimality conditions for Eq. 7 yields a simple, but powerful decomposition of τ̂ − τ* that suggests a first-order improvement to τ̂ via de-biasing:

Lemma 1. Suppose (M̂, τ̂) is a minimizer of Eq. 7. Let M̂ = ÛΣ̂V̂ᵀ, let T̂ denote the tangent space of M̂, and let Δ^τ denote the residual matrix of heterogeneous treatment effects defined above. Denote

$$\Delta_1 = \frac{\lambda\,\langle Z, \hat{U}\hat{V}^\top\rangle}{\lVert P_{\hat{T}^\perp}(Z)\rVert_F^2},\qquad \Delta_2 = \frac{\langle P_{\hat{T}^\perp}(Z),\, E + \Delta^{\tau}\rangle}{\lVert P_{\hat{T}^\perp}(Z)\rVert_F^2},\qquad \Delta_3 = \frac{\langle P_{\hat{T}^\perp}(Z),\, M^*\rangle}{\lVert P_{\hat{T}^\perp}(Z)\rVert_F^2}.$$

Then

$$\hat{\tau} - \tau^* = \Delta_1 + \Delta_2 + \Delta_3. \qquad (9)$$

Consider this error decomposition, Eq. 9, and note that Δ1 is entirely a function of observed quantities. Thus, it is known and removable. The second step, Eq. 8, does exactly this. The resulting de-biased estimator, denoted τ^d, is important. The main results characterize the error τ^d − τ*. The crux of this can be gleaned from the second and third terms of the decomposition: if T̂ is sufficiently 'close' to T*, then Δ3 becomes negligible (because P_{T*⊥}(M*) = 0). Showing closeness of T̂ and T* is the main technical challenge of this work. The remaining error, contributed by Δ2, can then be characterized as a particular 'weighted average' of the (independent) entries of E and the residual matrices, which we show to be min-max optimal.
Lemma 1 is then proved for a single treatment (k = 1). Since k = 1, we suppress redundant subscripts. Consider the first-order optimality conditions of Eq. 7:

$$\langle Z,\, O - \hat{M} - \hat{\tau} Z\rangle = 0, \qquad (10)$$

$$O - \hat{M} - \hat{\tau} Z = \lambda\,(\hat{U}\hat{V}^\top + \hat{W}),\quad \hat{W} \in \hat{T}^\perp,\ \lVert \hat{W}\rVert \le 1 \qquad (11)$$

(Ŵ is called the 'dual certificate'). Combining Eq. 11 and Eq. 10, we have:

$$\lambda\,\langle Z,\, \hat{U}\hat{V}^\top + \hat{W}\rangle = 0, \qquad (12)$$

$$\lambda\,\langle Z, \hat{W}\rangle = -\lambda\,\langle Z, \hat{U}\hat{V}^\top\rangle. \qquad (13)$$

Next, applying ⟨P_{T̂⊥}(Z), ·⟩ to both sides of Eq. 11 and using Eq. 13:

$$(\hat{\tau} - \tau^*)\,\lVert P_{\hat{T}^\perp}(Z)\rVert_F^2 = \lambda\,\langle Z, \hat{U}\hat{V}^\top\rangle + \langle P_{\hat{T}^\perp}(Z),\, E + \Delta^{\tau}\rangle + \langle P_{\hat{T}^\perp}(Z),\, M^*\rangle. \qquad (14)$$

This is equivalent to Eq. 9, completing the proof.
So, to summarize, the estimation engine 108 functionality, as defined by the software 110 in the memory 106 of the estimation engine 108 and executed by the processor 102 of the estimation engine 108, is constructed in two steps: 1) solve the convex program in Eq. 7 to obtain an initial estimate (M̂, τ̂); and 2) de-bias according to Eq. 8. While the estimator has been presented in a setting that allows for multiple treatments (i.e., k > 1), for the sake of simplicity our results here are restricted to the single-treatment (i.e., k = 1) setting. Recall that the work on synthetic control is for a single treatment and a particular form of Z_1 (support on a single row); our results demonstrated herein are for a single treatment but general Z_1. To ease notation, the present disclosure suppresses treatment-specific subscripts (Z_1, τ_1, etc.).
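By way of illustration only, the following is a minimal numerical sketch of the two-step procedure for a single treatment, consistent with the reconstruction of Eqs. 7 and 8 given above. Eq. 7 is solved here by simple alternating minimization (an exact update of τ alternated with singular value soft-thresholding of M); the solver, stopping rule, and function names are implementation choices and assumptions of this sketch, not requirements of the specification.

```python
# Illustrative sketch of the two-step estimator (single treatment, k = 1).
import numpy as np

def svt(A, thresh):
    """Singular value soft-thresholding: the proximal operator of thresh * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt

def convex_step(O, Z, lam, n_iter=500, tol=1e-7):
    """Eq. 7: minimize 0.5 * ||O - M - tau*Z||_F^2 + lam * ||M||_* by alternating
    exact updates of tau and M (both subproblems have closed-form solutions)."""
    M, tau = np.zeros_like(O), 0.0
    for _ in range(n_iter):
        tau = float((Z * (O - M)).sum() / (Z * Z).sum())   # least-squares update of tau
        M_new = svt(O - tau * Z, lam)                       # nuclear-norm prox update of M
        if np.linalg.norm(M_new - M) <= tol * max(1.0, np.linalg.norm(M)):
            return M_new, tau
        M = M_new
    return M, tau

def debias(O, Z, M_hat, tau_hat, lam, rank_tol=1e-8):
    """Eq. 8 (single treatment): remove the regularization-induced bias from tau_hat."""
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    r = int((s > rank_tol).sum())
    U, Vt = U[:, :r], Vt[:r, :]
    # P_{T_perp}(Z): project Z onto the orthogonal complement of the tangent space of M_hat.
    PZ = (np.eye(Z.shape[0]) - U @ U.T) @ Z @ (np.eye(Z.shape[1]) - Vt.T @ Vt)
    delta = lam * float((Z * (U @ Vt)).sum())   # bias term lam * <Z, U V^T>
    D = float((PZ * PZ).sum())                  # ||P_{T_perp}(Z)||_F^2
    return tau_hat - delta / D
```

In a multi-treatment setting, the scalar D becomes the k × k Gram matrix and δ a length-k vector, as sketched in the multiple-treatments section below.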
As mentioned previously, the results require a set of conditions that relate the treatment matrix Z to the tangent space T* of M*.
It is assumed that there exist positive constants Cr1, Cr2 such that the projection of Z onto the tangent space T* is suitably small relative to its projection onto T*⊥.
This assumption is necessary for identifying τ* (in a manner made formal by Proposition 2 illustrated herein). Let Δ^τ be the matrix of treatment effect 'residuals.' Our first result establishes a bound on the error rate of τ^d. Note that Δ^τ is a zero-mean matrix and is zero outside the support of Z. Thus, the requirement hereinbelow that its contribution to the error be small is mild. It is trivially met in synthetic control settings. It is also easily seen to be met when Δ^τ has independent, sub-Gaussian entries. Finally, as it turns out, the condition can also admit random sub-Gaussian matrices with complex correlation patterns.
Theorem 1 (Optimal Error Rate). Under the foregoing assumptions, for any C2 > 0 and sufficiently large n, with probability at least 1 − O(n^{−C2}), the bound of Eq. 20 on |τ^d − τ*| holds. Here, Ce is a constant depending (polynomially) on the problem parameters (where Cr1 and Cr2 are the constants appearing in the assumption above).
To begin parsing this result, consider a 'typical' scenario in which the bound of Eq. 20 simplifies. This is minimax optimal (up to log n factors), as shown herein:
Proposition 1 (Minimax Lower Bound). For any estimator τ̂, there exists a problem instance satisfying the assumptions above on which the expected estimation error matches the upper bound of Eq. 20 up to logarithmic factors.
Finally, it is worth considering some special cases under which Eq. 20 reduces further to the rate of Eq. 22, which is the optimal rate (up to log n) achievable even when M* and Δ^τ are known. Any of the following are, alone, sufficient to imply Eq. 22:
Independent residuals: the entries of Δ^τ are independent and sub-Gaussian, with o(1) sub-Gaussian norm.
Synthetic control and block Z: the support of Z consists of an l × c block that is sufficiently sparse.
Panel data regression: Z is sufficiently dense. This recovers the error guarantee available for panel data regression (up to log factors).
A second main result establishes asymptotic normality for the estimation engine 108. This naturally requires some additional control over the variability of Δ^τ. We consider the setting in which the entries of Δ^τ on the support of Z are independent random variables.
Theorem 2 (Asymptotic Normality). Suppose each entry of Δ^τ is a mean-zero, independent random variable with suitably bounded sub-Gaussian norm. Then, with high probability, the error τ^d − τ* coincides, up to a negligible term, with the 'weighted average' of the entries of E and Δ^τ identified above. Consequently, τ^d − τ*, normalized by its standard deviation, is asymptotically standard normal, provided that no single entry's variance dominates the total.
Asymptotic normality is of econometric interest, as it enables inference. Specifically, inference can be performed using a 'plug-in' estimator of the limiting variance, obtained by substituting the tangent space of M^d for that of M*, where M^d is a de-biased estimator for M*.
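By way of illustration only, the following sketches a plug-in confidence interval based on the asymptotic normality discussed above. The homoscedastic variance form used here, σ̂² / ‖P_{T̂⊥}(Z)‖²_F with σ̂² estimated from untreated residuals, is an assumption of this sketch that is consistent with the 'weighted average' error characterization above; it is not a verbatim restatement of Theorem 2, and the function name is hypothetical.

```python
# Illustrative sketch of a plug-in normal-approximation confidence interval.
# The homoscedastic variance form sigma^2 / ||P_{T_perp}(Z)||_F^2 is an assumption
# of this sketch, not a verbatim restatement of Theorem 2.
import numpy as np
from scipy import stats

def plugin_confidence_interval(O, Z, M_hat, tau_d, rank_tol=1e-8, alpha=0.05):
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    r = int((s > rank_tol).sum())
    U, Vt = U[:, :r], Vt[:r, :]
    PZ = (np.eye(Z.shape[0]) - U @ U.T) @ Z @ (np.eye(Z.shape[1]) - Vt.T @ Vt)

    # Estimate the noise level from residuals on untreated entries.
    sigma2 = float(np.var((O - M_hat)[Z == 0]))

    se = float(np.sqrt(sigma2 / (PZ ** 2).sum()))
    z = stats.norm.ppf(1.0 - alpha / 2.0)
    return tau_d - z * se, tau_d + z * se
```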
Referring to block 142 of FIG. 3, the estimation engine 108 then validates the learned treatment effects. The learned treatment effects are checked for robustness. Specific steps in validating learned treatment effects include validating the estimation using simulation, cross-validating the estimation for counterfactuals, and validating the estimation against data collected from randomized controlled experiments, if available. The following provides an example of validating learned treatment effects. It is noted that the present invention is not intended to be limited to the specific experiments exemplified herein.
A set of experiments was conducted on semi-synthetic datasets (the treatment is introduced artificially and thus ground-truth treatment-effect values are known) and real datasets (the treatment is real and ground-truth treatment-effect values are unknown). The results show that the present estimation engine 108 estimator τ^d is more accurate than existing methods and that its performance is robust to various treatment patterns.
The following four benchmarks were implemented: (i) Synthetic Difference-in-Differences (SDID), as exemplified in Synthetic Difference in Differences, 2019, by Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager, which is incorporated by reference herein in its entirety; (ii) Matrix Completion with Nuclear Norm Minimization (MC-NNM), as exemplified in Matrix Completion Methods for Causal Panel Data Models, Journal of the American Statistical Association, pages 1-41, 2021, by Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, and Khashayar Khosravi, which is incorporated by reference herein in its entirety; (iii) Robust Synthetic Control (RSC), as exemplified in Robust Synthetic Control, The Journal of Machine Learning Research, 19(1): 802-852, 2018, by Muhammad Amjad, Devavrat Shah, and Dennis Shen, which is incorporated by reference herein in its entirety; and (iv) Ordinary Least Squares (OLS), which selects a fit minimizing the squared error of a regression of O on Z together with fixed effects expressed via the all-ones vector. It is worth noting that SDID and RSC only apply to traditional synthetic control patterns (block and stagger herein).
Semi-Synthetic Data (Tobacco)
The first dataset consists of the annual tobacco consumption per capita for 38 states during 1970-
2001, collected from the prominent synthetic control study of Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American statistical Association, 105(490):493-505, 2010, which is incorporated by reference herein in its entirety (the treated unit California is removed). Similar to Matrix completion methods for causal panel data models. Journal of the American Statistical
Association, pages 1-41, 2021, we view the collected data as M* and introduce artificial treatments. We considered two families of patterns that are common in the economics literature: block and stagger. Block patterns model simultaneous adoption of the treatment, while stagger patterns model adoption at different times. In both cases, treatment continues forever once adopted. Specifically, given the parameters (m1, m2), a set of m1 rows of Z are selected uniformly at random. On these rows, Zij = 1 if and only if j ≥ ti, where for block patterns, ti = m2, and for stagger patterns, ti is selected uniformly from values greater than m2.
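By way of illustration only, the following sketches the block and stagger treatment patterns just described. The parameter names m1 and m2 follow the text above; the function name and defaults are hypothetical.

```python
# Illustrative sketch of the block and stagger treatment patterns described above.
import numpy as np

def make_pattern(n_units, n_periods, m1, m2, kind="block", seed=None):
    """Pick m1 treated units at random; treated unit i adopts at time t_i and stays
    treated thereafter, i.e., Z[i, j] = 1 iff j >= t_i. Assumes m2 < n_periods - 1."""
    rng = np.random.default_rng(seed)
    Z = np.zeros((n_units, n_periods), dtype=int)
    for i in rng.choice(n_units, size=m1, replace=False):
        if kind == "block":
            t_i = m2                                    # simultaneous adoption at m2
        else:
            t_i = int(rng.integers(m2 + 1, n_periods))  # stagger: uniform over times > m2
        Z[i, t_i:] = 1
    return Z
```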
To model heterogeneous treatment effects, the per-entry effects τ_ij are generated around the target average τ* with i.i.d. fluctuations and a unit-specific component characterizing each treated unit. The observation is then O_ij = M*_ij + E_ij + τ_ij Z_ij. The magnitude of τ* is fixed through all experiments relative to the mean value of M*. The hyperparameters for all algorithms were tuned using rank r ≈ 5 (estimated via the spectrum of M*). Next, we compare the performances of the various algorithms on an ensemble of 1,000 instances, with the treatment start for block patterns chosen to match the year 1988, when California passed its law for tobacco control. The results are reported in the first two rows of Table 1 below in terms of the average normalized error |τ^d − τ*| / |τ*|.
Note that the treatment patterns here are 'home court' for the SDID and RSC synthetic control methods, but our approach nonetheless outperforms these benchmarks. One potential reason is that these methods do not leverage all of the available data for learning counterfactuals: MC-NNM and SDID ignore treated observations. RSC ignores even more: in addition, it does not leverage some of the untreated observations in M* on treated units (i.e., observations O_ij for j < t_i on treated units).
Table 1: Comparison of the present algorithm of the present estimation engine 108 (De-biased Convex) to benchmarks on semi-synthetic datasets (Block and Stagger correspond to the Tobacco dataset; the Adaptive pattern corresponds to the Sales dataset). Average normalized error is reported.
As shown by Table 1, the errors using the approach introduced by this invention are substantially lower than state-of-the-art alternatives across all settings.
Semi-Synthetic Data (Sales)
The second dataset consists of weekly sales of 167 products over 147 weeks, collected from a Kaggle competition. In this application, treatment corresponds to various 'promotions' of a product (e.g., price reductions, advertisements, etc.). We introduced an artificial promotion Z, used the collected data as M* (mean value approximately 12,170), and the goal was to estimate the average treatment effect. The heterogeneous treatment effects follow the same generation process as above.
Now the challenge in these settings is that these promotions are often decided based on previous sales. Put another way, the treatment matrix Z is constructed adaptively. We considered a simple model for generating adaptive patterns for Z: Fix parameters (a, b). If the sales of a product reach their lowest point among the past a weeks, then we added promotions for the following b weeks (this models a common preference for promoting low-sale products). Across our instances, (a, b) was generated at random. This represents a treatment pattern where it is unclear how typical synthetic control approaches (SDID, RSC) might even be applied.
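By way of illustration only, the following sketches the adaptive promotion rule just described: if a product's sales reach their lowest point over the past a weeks, a promotion is added for the following b weeks. The rule is applied here to the counterfactual panel M*, the function name is hypothetical, and the distribution used to draw (a, b) in the experiments is not reproduced.

```python
# Illustrative sketch of the adaptive promotion pattern described above.
import numpy as np

def adaptive_pattern(M, a, b):
    """M: (products x weeks) counterfactual sales panel. Returns the 0-1 matrix Z."""
    n, T = M.shape
    Z = np.zeros((n, T), dtype=int)
    for i in range(n):
        j = a                      # need at least `a` past weeks before applying the rule
        while j < T:
            if M[i, j] <= M[i, j - a:j].min():     # lowest point among the past a weeks
                Z[i, j + 1:min(j + 1 + b, T)] = 1  # promote for the following b weeks
                j += b + 1                         # resume after the promoted window
            else:
                j += 1
    return Z
```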
The rank of M* is estimated via the spectrum, with r ≈ 35. See Table 1 for the results averaged over 1,000 instances. The average normalized error for our algorithm is substantially lower than the 27.6% obtained by MC-NNM. We conjecture that the reason for this is that highly structured missing-ness patterns are challenging for matrix-completion algorithms; we overcome this limitation by leveraging the treated data as well. Of course, there is a natural trade-off here: if the heterogeneity in the treatment effects were on the order of the variation in M*, then it is unclear that the treated data would help (and it might, in fact, hurt). But for most practical applications, the treatment effects we seek to estimate are typically small relative to the nominal observed values.
Real Data
This dataset consists of daily sales and promotion information for 571 drug stores over 942 days, collected from the Rossmann Store Sales dataset. The promotion dataset Z is binary (1 indicates a promotion is running on that specific day at that store). The real pattern is highly complex (see FIG. 4) and hence synthetic-control type methods (SDID, RSC) again do not apply. Our goal here is to estimate the average increase of sales τ* brought by the promotion. The left side of FIG. 4 shows the promotion pattern of the real data, while the right side illustrates an estimation of τ and test errors.
The hyperparameters for all algorithms were tuned using rank r ≈ 70 (estimated via cross validation). A test set Ω consisting of 20% of the treated entries is randomly sampled and hidden. The test error is then calculated on Ω, normalized by the mean value Ō of O. FIG. 4 shows the results averaged over 100 instances. The algorithm of the estimation engine 108 provides superior test error. This is potentially a conservative measure, since it captures error in approximating both M* and τ*, and the variation contributed by M* to the observations is substantially larger than that contributed by τ*. Now, whereas the ground truth for τ* is not known here, the negative treatment effects estimated by MC-NNM and OLS seem less likely, since store-wise promotions are typically associated with positive effects on sales.
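By way of illustration only, the following sketches the hold-out evaluation just described: a random 20% of the treated entries is hidden, the model is re-fit, and error is measured on the hidden entries. The exact normalization used for the reported test error is not reproduced here, so a mean-normalized absolute error is used as a stand-in; fit_fn stands for any fitting routine (such as the two-step estimator sketched earlier) and is a hypothetical name.

```python
# Illustrative sketch of the hold-out evaluation on treated entries.
# The normalization below is a stand-in, not the exact metric reported in FIG. 4.
import numpy as np

def holdout_test_error(O, Z, fit_fn, frac=0.2, seed=None):
    """fit_fn(O, Z, hidden_mask) -> (M_hat, tau_hat), fit while ignoring hidden entries."""
    rng = np.random.default_rng(seed)
    treated = np.argwhere(Z == 1)
    picked = treated[rng.choice(len(treated), size=int(frac * len(treated)), replace=False)]
    hidden = np.zeros(Z.shape, dtype=bool)
    hidden[picked[:, 0], picked[:, 1]] = True

    M_hat, tau_hat = fit_fn(O, Z, hidden)            # fit without the hidden entries
    pred = M_hat[hidden] + tau_hat * Z[hidden]       # predicted outcomes on hidden entries
    return float(np.abs(O[hidden] - pred).mean() / abs(O.mean()))
```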
With the treatment effects learned, the learned treatment effects can be used for promotion decisions, to the benefit of a user of the estimation system. Business decisions can be made based on the estimated promotion effects. For example, in the decision of whether to begin distribution of a new product line, the estimation system can be used to make a decision to roll out the new product line where tested promotions have significant positive effects. In addition, it may be decided to perform further investigation where there are less positive effects noted. In addition, when promotions have significant negative effects, a decision may be made by the user to discard the new product line, or to further investigate the product line with different promotions.
Extension to Multiple Treatments
Whereas the focus has been on the case of a single treatment, we consider in this section an extension of our estimator to the setting of multiple treatments. Specifically, for some (fixed) k, we consider that a unit may be simultaneously subject to multiple treatments, so that

$$O = M^* + E + \sum_{m=1}^{k} \tau_m \circ Z_m,$$

where Z_m is a 0-1 matrix whose support indicates the observations impacted by treatment m. Thus, our goal here is to estimate a treatment effect vector τ* ∈ R^k.
The convex estimator utilized for scalar τ* has a natural generalization; specifically, we consider as our first step computing a 'rough' estimate of the treatment effect, τ̂, by solving a natural optimization problem:

$$(\hat{M}, \hat{\tau}) \in \operatorname*{arg\,min}_{M,\,\tau}\; \tfrac{1}{2}\,\Bigl\lVert O - M - \sum_{m=1}^{k}\tau_m Z_m\Bigr\rVert_F^2 + \lambda\,\lVert M\rVert_*. \qquad (25)$$
The de-biasing step in this case is slightly more involved, but follows from a decomposition of the quantity τ̂ − τ* derived from the optimality conditions for the convex program. In particular, we have the following:
Lemma 2. Suppose (M̂, τ̂) is a minimizer for the program in Eq. 25. Let M̂ = ÛΣ̂V̂ᵀ and denote by T̂ the tangent space of M̂. Then, for each treatment l = 1, 2, ..., k, we have

$$\sum_{m=1}^{k} (\hat{\tau}_m - \tau^*_m)\,\langle P_{\hat{T}^\perp}(Z_l),\, P_{\hat{T}^\perp}(Z_m)\rangle = \lambda\,\langle Z_l, \hat{U}\hat{V}^\top\rangle + \Bigl\langle P_{\hat{T}^\perp}(Z_l),\, E + \sum_m \Delta_m\Bigr\rangle + \langle P_{\hat{T}^\perp}(Z_l),\, M^*\rangle.$$
This decomposition immediately suggests the appropriate de-biasing required. Specifically, define by D ∈ ℝ^{k×k} the Gram matrix with entries

[Equation]

and the "error" vectors with components

[Equation]
Then, Lemma 2 establishes

[Equation]

Noting that Δ₁ is entirely a function of the solution to Eq. 25, and thus is known, we propose the de-biased estimator:

[Equation]
Our definition implicitly assumes that D is invertible. We view this as a natural assumption on
(the absence of) collinearity in treatments.
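The entries of D and the exact correction term are rendered as images in the original; as a minimal sketch under stated assumptions, the code below takes D_{lm} = <P_{T⊥}(Z_l), P_{T⊥}(Z_m)>, where P_{T⊥} projects onto the orthogonal complement of the tangent space of M̂, and applies a correction of the assumed form τ^d = τ̂ − D⁻¹Δ₁, with the known vector Δ₁ (computed per Lemma 2) supplied by the caller:

import numpy as np

def debias(tau_rough, M_hat, Zs, delta1, rank_tol=1e-8):
    # Assumed de-biasing: tau_d = tau_rough - D^{-1} delta1, with D the Gram matrix
    # of the treatment patterns projected off the tangent space of M_hat.
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    r = int(np.sum(s > rank_tol))
    U, V = U[:, :r], Vt[:r, :].T

    def proj_T_perp(A):
        # P_{T_perp}(A) = (I - U U^T) A (I - V V^T)
        return A - U @ (U.T @ A) - (A @ V) @ V.T + U @ (U.T @ A @ V) @ V.T

    PZ = [proj_T_perp(Z) for Z in Zs]
    D = np.array([[np.sum(Pl * Pm) for Pm in PZ] for Pl in PZ])
    # Invertibility of D corresponds to the absence of collinearity in treatments.
    return np.asarray(tau_rough) - np.linalg.solve(D, np.asarray(delta1))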
Now, the RMSE for this estimator is simply

[Equation]
For fixed k, τ^d as defined herein is also a near-optimal estimator under suitable conditions on M* and the treatment matrices Z_l that are analogous to those for the single-treatment case. Specifically, we see that if

[Equation]

(which will likely require that k be fixed), then the third term is negligible. This is since

[Equation] = 0,

so that the error is dominated by the second term. Now if, in addition, the error revealed by the second term is optimal (by way of comparison with the

[Equation]

error of the least squares estimator for the case when M* is known).

Claims

CLAIMS

We claim:
1. A system for building experiments in the real world that suffer from imperfect controls and for inferring correctly from the experiments, wherein the system comprises: a storage device for storing transaction data and intervention data; and an estimation engine that performs the steps of: receiving transaction data and intervention data, also referred to herein as observational data; organizing the observational data as a matrix or tensor; transforming the transaction data and intervention data into a panel format; using a de-biased matrix completion algorithm to learn treatment effects of promotions at each store at each time period; and validating the learned treatment effects.
2. The system of claim 1, wherein the observational data comprises at least two types of data, wherein a first type of observational data received is transaction data, and a second type of observational data is promotions data.
3. The system of claim 2, wherein the transaction data is at a store level, over a specified period of time.
4. The system of claim 2, wherein the promotions data is at the store level, indicating the starting time and the ending time of a promotion at the store.
5. The system of claim 1, wherein the learned treatment effects are checked for robustness.
6. The system of claim 1, wherein the steps of validating the learned treatment effects include validating the estimation using simulation, cross-validating the estimation for counterfactuals, and validating the estimation for data collected from randomized controlled experiments.
PCT/US2022/025140 2021-04-15 2022-04-15 System and method for estimation of treatment effects from observational and corrupted a/b testing data WO2022221743A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163175500P 2021-04-15 2021-04-15
US63/175,500 2021-04-15

Publications (1)

Publication Number Publication Date
WO2022221743A1 true WO2022221743A1 (en) 2022-10-20

Family

ID=83639775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/025140 WO2022221743A1 (en) 2021-04-15 2022-04-15 System and method for estimation of treatment effects from observational and corrupted a/b testing data

Country Status (1)

Country Link
WO (1) WO2022221743A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117954114A (en) * 2024-03-26 2024-04-30 北京大学 Real world data borrowing method and system based on tendency grading and power priori


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080294996A1 (en) * 2007-01-31 2008-11-27 Herbert Dennis Hunt Customized retailer portal within an analytic platform



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22789064

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22789064

Country of ref document: EP

Kind code of ref document: A1