US20080195440A1: Data quality management using business process modeling

 Publication number: US 2008/0195440 A1 (application Ser. No. 12/058,044)
 Authority: US (United States)
 Prior art keywords: error, controls, data quality
 Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

 G06Q 10/067—Business modelling
 G06F 16/215—Improving data quality; data cleansing, e.g. deduplication, removing invalid entries or correcting typographical errors
 G06F 17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
 G06Q 10/06—Resources, workflows, human or project management; enterprise planning; organisational models
 G06Q 10/06311—Scheduling, planning or task assignment for a person or group
 G06Q 10/06375—Prediction of business process outcome or impact based on a proposed change
 G06Q 10/06395—Quality analysis or management
Abstract
A business process modeling framework is used for data quality analysis. The modeling framework represents the sources of transactions entering the information processing system, the various tasks within the process that manipulate or transform these transactions, and the data repositories in which the transactions are stored or aggregated. A subset of these tasks is designated as the potential error introduction sources, and the rate and magnitude of various error classes at each such task are probabilistically modeled. This model can be used to predict how changes in transaction volumes and business processes impact data quality at the aggregate level in the data repositories. The model can also account for the presence of error correcting controls and assess how the placement and effectiveness of these controls alter the propagation and aggregation of errors. Optimization techniques are used for the placement of error correcting controls that meet target quality requirements while minimizing the cost of operating these controls. This analysis also contributes to the development of business "dashboards" that allow decision-makers to monitor and react to key performance indicators (KPIs) based on aggregation of the transactions being processed. Data quality estimation in real time provides the accuracy of these KPIs (in terms of the probability that a KPI is above or below a given value), which may condition the action undertaken by the decision-maker.
Description
 1. Field of the Invention
 The present application generally relates to modeling and quantitative analysis techniques for managing the quality of data and, more particularly, to extending a business process model with constructs to identify the data sources whose quality is of interest, the data-transformative tasks where errors may be introduced, the error detection and correction controls in the process, and the data repositories whose quality is to be assessed.
 2. Background Description
 As companies increasingly adopt information systems that cover a range of functional areas, they have electronic access to vast amounts of transactional data. Increasingly, companies are looking to develop dashboards in which a variety of key performance indicators composed from the transactional data are displayed to assist business decisions. The quality of the data contained in these enterprise information systems has important consequences, both from the internal perspective of making business decisions based on the data and because of the legal obligation to provide accurate reporting to external agencies and stakeholders. As a result, companies spend considerable time and money to assess and improve the quality of data in the transactions that flow through their information systems and are stored in their repositories.
 A considerable body of literature exists on the issue of data quality assessment from the perspective of auditing a given information processing system. The prior work on data quality management comes from the fields of financial accounting and auditing and information systems.
 Data quality and control assessment has been studied in the accounting literature since the early 1970s. Most of the studies have approached reliability assessment with the accounting system viewed as a "black box" that transforms data into aggregations of account balances contained in various ledgers (see, for example, W. R. Knechel, "The Use of Quantitative Models in the Review and Evaluation of Internal Control: A Survey and Review", Journal of Accounting Literature, (Vol. 2), Spring 1983, pp. 205-219). This approach works well from the perspective of an auditor who is interested in assessing the reliability with which the black box performs the data transformations. We review this literature to make note of the key concepts, definitions, and analyses that we adopt and extend in order to develop data quality modeling and analysis techniques at the detailed level of the transformational tasks and processes that are contained within the accounting system.
 B. E. Cushing in "A Mathematical Approach to the Analysis and Design of Internal Control Systems", The Accounting Review 1974, pp. 24-41, developed a mathematical formulation for measuring the reliability of an accounting system. He used the probability that the system makes no errors of any kind in its outputs as the system reliability measure. He also derived a cost measurement by taking into consideration the cost of executing error correction controls and the risk of undetected errors in the system. This is useful for evaluating the reliability of a given system. However, Cushing's control model takes the system structure as given; it does not address any problem from the system design perspective. We apply the same basic concepts of reliability and cost measurement to the problems of evaluating system reliability for a detailed process model and of designing the optimal set of corrective controls with the objective of cost minimization.
 S. S. Hamlen in "A Chance-Constrained Mixed Integer Programming Model for Internal Control Systems", The Accounting Review 1980, pp. 578-593, proposed a mixed integer programming model for designing an internal control system. Her model minimizes the cost of controls subject to a given percentage of quality improvement desired in the output from the system. In order to formulate a linear program, the model imposes instrumental polynomial terms with their respective constraints, which have the drawback of growing exponentially with the number of terms. The accounting system is modeled as a set of controls that can correct a set of error types (which could be errors in various ledgers). We extend Hamlen's approach to a more detailed model that identifies error sources within the business process of the accounting system and controls that may be selectively applied to these error sources. Our model also allows us to assess the effect of applying a control to an error source on the resulting probability of errors at all the ledgers that are linked to that error source. This leads to greater flexibility in selecting controls to apply, with the potential of better solutions. We also show how our optimization problem formulation, though more detailed than Hamlen's, can be reduced to a non-exponential series of knapsack problems without having to convert a nonlinear system into a linear one.
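The patent states that its control-placement formulation reduces to a series of knapsack problems. As a rough, invented illustration of that idea (the control names, costs, discrete improvement units, and the dynamic program itself are our own, not taken from the patent), the cheapest subset of controls meeting a quality-improvement target can be found with a 0/1 knapsack-style dynamic program:

```python
# Hypothetical illustration: choosing error-correcting controls with a
# 0/1 knapsack dynamic program. All control names, costs, and
# "improvement" values are invented for this sketch.

def min_cost_controls(controls, target):
    """controls: list of (name, cost, improvement-in-discrete-units).
    Returns (cost, names) of the cheapest subset whose summed
    improvement meets `target`, or None if the target is unreachable."""
    best = {0: (0, [])}  # improvement level -> (min cost, chosen names)
    for name, cost, gain in controls:
        # Snapshot the current states so each control is used at most once.
        for level, (c, chosen) in list(best.items()):
            new_level = min(level + gain, target)  # cap at the target
            cand = (c + cost, chosen + [name])
            if new_level not in best or cand[0] < best[new_level][0]:
                best[new_level] = cand
    return best.get(target)

controls = [("manual review", 5, 3), ("validation rule", 2, 2),
            ("reconciliation", 4, 4)]
result = min_cost_controls(controls, 6)
```

In a real formulation, each control's improvement would be derived from the error incidence probabilities and the process network rather than given as a fixed integer.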
 Other research in the accounting literature has focused on probabilistic modeling and quantitative assessment of accounting information system reliability. These studies have modeled reliability assessment at the level of the accounting system as a whole, using probabilistic or deterministic methods, and treat the transaction streams and transformative processes within the accounting information system as a black box. Recent studies have begun to develop more detailed models for the assessment of accounting system reliability.
 R. B. Lea, S. J. Adams, and R. F. Boykin in "Modeling of the audit risk assessment process at the assertion level within an account balance", Auditing: A Journal of Practice & Theory 1992 (Vol. 11, Supplement), pp. 152-179, discussed audit risk assessment models at different levels of detail within accounting systems. They model how risks of error at the level of the various transaction streams are related to the risk of error at the account balance level to which they contribute. They note that the level of tolerable error at the transaction stream level cannot be assumed to be the same as that for the account balance level. Their risk model covers both inherent risk (in the absence of internal controls) and control risk. We follow their motivation to decompose an account balance into its constituent transaction streams but extend their purely additive model to include (a) the volume of transactions in the various streams and (b) the probabilistic network structure of these transaction streams, identifying the various sources of errors (as represented by a process model). This allows us to overcome the assumption made by their model that the errors in the various transaction streams are independent.
 R. Nado, M. Chams, J. Delisio, and W. Hamscher in "Comet: An Application of Model-Based Reasoning to Accounting Systems", Proceedings of the Eighth Innovative Applications of Artificial Intelligence Conference, AAAI Press (1996), pp. 1482-1490, developed a process model based reasoning system, which they called "Comet", for analyzing the effectiveness of controls. This is one of the earliest attempts to decompose the accounting system structure to the level of tasks that process transactions and implement internal controls. They modeled accounting systems as a hierarchically structured graph with nodes representing the transaction processing activities and collection points. The potential for failure in each activity is propagated to the collection points, which are the accounts being audited. Controls are modeled in terms of the probability that they will not cover the failures. This model can be used to select the key set of controls that reduce the risk of failure below a threshold. However, the paper does not clarify the quantitative model (if any) that is used. It models only the probability of failures but ignores the magnitude of error in these failures. It also implicitly assumes identical and fixed costs for all controls. Our model adopts the basic process modeling concepts introduced in this paper and extends them to develop the quantitative framework described hereinafter. This enables rigorous quantitative analysis, including Monte Carlo simulation of inherent and control risk and optimization of control usage based on risk and cost.
 Research on data quality in the information systems literature has focused on identifying the important characteristics that define the quality of data (see, for example, Y. Wand and R. Y. Wang, "Anchoring data quality dimensions in ontological foundations", Communications of the ACM (39:11) (1996), pp. 86-95, and R. Y. Wang, "A Product Perspective on Total Data Quality Management", Communications of the ACM, (41:2) (1998), pp. 58-65). Recently, the management of data quality and the quality of associated data management processes has been identified as a critical issue (see D. Ballou, R. Wang, H. Pazer, and G. Tayi, "Modeling Information Manufacturing Systems to Determine Information Product Quality", Management Science (44:4), April 1998, pp. 462-484). However, most of the papers describe the criteria for information systems design to improve or achieve good data quality (DQ) or information quality (IQ). To our knowledge, none of the papers has tackled data quality management from the point of view of quantitative reliability assessment and optimization, nor have they brought the costs of quality and quality improvement into the DQ or IQ assessment. We consider these issues to be critical from the practical perspective of design and management of enterprise information systems.
 Wand and Wang, supra, are amongst the first who studied the data quality in the context of information systems design. They suggested rigorous definitions of data quality dimensions by anchoring them in ontological foundations and showed that such dimensions can provide guidance to systems designers on data quality issues. They developed a set of Ontological Concepts, and defined Design Deficiencies and Data Quality Dimensions. Then they presented the analysis of Dimensions and the Implications to Information Systems Design. Wang, supra, and Ballou et al., supra, developed the Total Data Quality Management methodology (TDQM). TDQM consists of the concepts and the principles of information quality (IQ) and the information product (IP), and procedures of information management system (IMS) for defining, measuring, analyzing, and improving information products.
 L. L. Pipino, Y. W. Lee, and R. Y. Wang, in "Data Quality Assessment", Communications of the ACM, (45:4), (2002), pp. 211-218, introduced three functional forms for data quality metrics: simple ratio, min or max operators, and weighted average. Based on these functional forms, they developed illustrative metrics for important data quality dimensions. Finally, they presented an approach that combines subjective and objective assessments of data quality, and demonstrated how the approach can be used effectively in practice.
 H. Xu in "Managing accounting information quality: an Australian study", (2000), pp. 628-634, developed and tested a model that identifies the critical success factors (CSF) influencing data quality in accounting information systems (AIS). He first proposed a list of factors influencing the data quality of AIS from the literature, and then conducted pilot case studies, using the findings from the pilot study together with the literature to identify possible critical success factors for data quality of accounting information systems. He then carried out case studies of accounting information quality in Australian organizations to test and refine the initial research model, comparing the proposed critical success factors with those observed in practice.
 E. M. Pierce in "Assessing Data Quality with Control Matrices", Communications of the ACM, (47:2), (2004), pp. 82-86, developed a technique for information quality management based on practice from the auditing field: an information product control matrix used to evaluate the reliability of an information product. Pierce defined the components of the matrix and presented a way to link data problems to the quality controls that should detect and correct these problems during the information manufacturing process.
 D. Strong, Y. W. Lee, and R. Wang in "Data Quality in Context", Communications of the ACM, (40:5), (1997), pp. 58-65, propose a data-consumer perspective for data quality assessment as opposed to the traditional intrinsic DQ assessment. They presented a set of DQ dimensions that consists of not only Intrinsic DQ but also Accessibility DQ, Contextual DQ, and Representational DQ; the latter three concern the user-task context. They argued that data quality assessment should incorporate the task context of users and the processes by which users access and manipulate data to meet their task requirements.
 Building on Strong et al.'s idea, C. Cappiello, C. Francalanci, and B. Pernici in "Data quality assessment from the user's perspective", International Workshop on Information Quality in Information Systems, 2004, proposed a data quality assessment model that takes user requirements into consideration in the assessment phase. In their mathematical formulation, parameters and matrices are introduced to capture the preferences and requirements of users and user classes. Their model showed how data quality assessment should take into account how user requirements vary with the accessed service.
 Our invention addresses the issue of data quality management from the perspectives of the owner or the consumer of the information processing system and predicting and managing the quality of its data when faced with anticipated changes in the business environment in which the system operates. Such changes could include:

 Changes in the relative volume of transactions arriving from different input sources. For example, a small but fast-growing business unit alters the mix of sales transactions over time and therefore impacts the overall quality of sales data.
 Changes in the business processes and policies that transform the data in the transactions. For example, automated systems replace manual tasks or sections of a process are outsourced.
 Changes in the business controls that attempt to detect and fix errors in the transactions. For example, the thresholds that trigger a control are altered, or controls are added or removed as part of process reengineering.
 This invention provides the modeling and analysis for predicting how these changes impact data quality. Then, on the basis of this predictive ability, optimization techniques are used for the placement of error correcting controls that meet target quality requirements while minimizing the cost of operating these controls. This analysis also contributes to the development of business "dashboards" that allow decision-makers to monitor and react to key performance indicators (KPIs) based on aggregation of the transactions being processed. Data quality estimation in real time provides the accuracy of these KPIs (in terms of the probability that a KPI is above or below a given value), which may condition the action undertaken by the decision-maker.
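The real-time KPI accuracy estimate just described (the probability that a KPI is above or below a given value) can be sketched with a small Monte Carlo simulation. The error probability, the Gaussian taint distribution, and the threshold below are illustrative assumptions, not values from the patent:

```python
# Hedged sketch: Monte Carlo estimate of P(KPI > threshold) when each
# book value may carry a valuation error. p_error and taint_sigma are
# invented parameters for illustration.
import random

def prob_kpi_above(book_values, threshold, p_error=0.05,
                   taint_sigma=0.1, trials=10_000, seed=42):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        kpi = 0.0
        for x in book_values:
            if rng.random() < p_error:
                z = rng.gauss(0.0, taint_sigma)  # taint of the error
                kpi += x - z * x                 # audit value = book - taint*book
            else:
                kpi += x                         # error-free transaction
        if kpi > threshold:
            hits += 1
    return hits / trials
```

A decision-maker's dashboard could display this probability next to the KPI itself, so that actions are conditioned on the KPI's estimated accuracy.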
 Our approach to modeling data quality takes advantage of the increasing emphasis in many businesses on the formal modeling of business processes and their underlying information processing systems. Although the initial objective of process modeling is usually for resource planning, and services and workflow design purposes, data quality estimation can be an important secondary outcome.
 A business process model can be used to represent the sources of transactions entering the information processing system and the various tasks within the process that manipulate or transform these transactions. We associate a subset of these tasks as the potential error introduction sources and probabilistically model the rate and magnitude of various error classes at each such task. We also define the information repositories such as accounting ledgers and other databases where the transactions are eventually stored and whose quality needs to be assessed. A network of links (often with probabilistic branches) connects the transaction sources, error sources, and the information repositories.
 The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

FIG. 1 is a block diagram of a process network consisting of transaction sources, error sources and audit targets;

FIG. 2A is a block diagram illustrating preventive controls on an error source, and FIG. 2B is a block diagram illustrating feedforward control on an error source;

FIG. 3 is a block diagram illustrating a sequence of feedforward controls at an error source; and

FIG. 4 is an influence diagram of a simple control system.

 A business process model represents the flow of physical items or informational artifacts through a sequence of tasks and subprocesses that operate on them. The flow may be controlled by different types of "gateways" that can diverge or converge flows using constructs such as branches, forks, merges, and joins. These elements form a directed graph with the tasks and gateways as nodes. The graphs may be cyclic (with the probability of a cycle being less than one) as well as hierarchical, where one of the nodes could be a subprocess containing its own directed graph.
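As a hedged sketch of this directed-graph view (class names, field names, and the example tasks are our own invention, not the patent's), the extended process model might be represented as tasks tagged by role and connected by probabilistic links:

```python
# Minimal sketch of the extended process model: a directed graph whose
# nodes (tasks) are tagged as transaction sources, error sources, or
# audit targets. All names here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    kind: str = "task"  # "source" (transaction source), "error", "target"
    successors: list = field(default_factory=list)  # (Task, branch probability)

def link(upstream, downstream, prob=1.0):
    """Connect two tasks with a (possibly probabilistic) branch."""
    upstream.successors.append((downstream, prob))

# A tiny network: orders originate at a transaction source, pass through
# an error-prone pricing task, and land in a sales-ledger audit target.
orders = Task("order entry", kind="source")
pricing = Task("pricing", kind="error")
ledger = Task("sales ledger", kind="target")
link(orders, pricing)
link(pricing, ledger)
```

Branch probabilities below 1.0 on the links would model the probabilistic gateways (branches, merges, forks, joins) described in the text.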
 We extend the business process modeling framework by adding the following attributes relevant to modeling data quality. Consider a business process with T tasks, including all the tasks in its subprocesses. We assign some of these tasks to be transaction sources, error sources, and audit targets as defined next.
 A start event or initial task in a process may be assigned to be a transaction source. This is the origination point of a transaction in which an error is yet to be introduced. A transaction source is characterized by a volume of transactions over a predefined time period and a random variable signifying the quantitative value of the transaction. For financial accounting data, this is typically the book value of the transaction.

 Let T_{S}⊂T be the set of transaction sources in the process model.
 Let x_{k} be a random variable representing the book value of a transaction originating from the transaction source t_{k}∈T_{S}.
Errors can occur when data originating from a transaction source passes through a subsequent task that is assigned to be an error source. Error sources are tasks that operate on the incoming transactions and may introduce errors into them.

 Let T_{E}⊂T be the set of error sources in the process model.
 Let P_{i}(ε) be the error incidence probability for error class ε in the error source t_{i}∈T_{E}.
Borrowing from financial accounting practice, we consider three classes of error:

 1. Valuation error, which is defined as an error in the magnitude or value of a valid transaction. This can happen when a transaction's book value contains the wrong number due to a data entry or mathematical calculation error.
 Substituting ε=v, let p_{i}(v) be the probability that a valuation error is introduced at the error source t_{i}.
 Let z_{i }be a random variable representing the taint of the valuation error. “Taint” is defined as the ratio of the error magnitude to the book value. If a valuation error is introduced at the error source t_{i}, the magnitude of that error in the book value is defined as:

e_{i}^{v} = z_{i}·x_{i}, (1)

 where x_{i} is the observable book value of a transaction at error source t_{i} and e_{i}^{v} is the random discrepancy between this book value and the true value of the transaction, known as its audit value.
 2. Existence error is defined as the introduction of spurious transaction entries at the error source. This can happen if the task at the error source erroneously introduces a new or duplicate transaction into the business process or fails to follow a business rule that calls for the cancellation or rejection of a real transaction.

 Substituting ε=e, let p_{i}(e) be the probability that an existence error is introduced at the error source t_{i}.
 Let x_{i} ^{e }be the random variable for the book value of the spurious transaction.
 If an existence error is introduced at the error source i, the magnitude of that error in the book value is defined as:

e_{i}^{e} = x_{i}^{e}. (2)

 3. Completeness error occurs when a valid transaction is lost or goes missing at the error source. This can happen, for example, when a valid transaction is erroneously deleted or canceled, or if there is a failure to create a new data record as required by a business rule at the task.

 Substituting ε=c, let p_{i}(c) be the probability that a completeness error is introduced at the error source t_{i}.
 If a completeness error is introduced at the error source i, the magnitude of that error in the book value is defined as:

e_{i}^{c} = x_{i}. (3)

 From the above definitions of the three error classes, note that an error source can introduce only one class of error in any single transaction.
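A minimal sketch of this error model at a single error source, drawing at most one of the three mutually exclusive error classes per transaction. The probabilities passed in, the Gaussian taint distribution, and the range of spurious book values are assumptions made for illustration only:

```python
# Hedged sketch of the probabilistic error model at one error source.
# Each transaction suffers at most one of the three mutually exclusive
# error classes; the distributions are illustrative, not the patent's.
import random

def apply_error_source(x, p_v, p_e, p_c, rng):
    """Given book value x, return (error magnitude, error class), where
    the class is "v" (valuation), "e" (existence), "c" (completeness),
    or None when no error is introduced."""
    assert p_v + p_e + p_c <= 1.0
    u = rng.random()
    if u < p_v:                          # valuation: e_i^v = z_i * x_i
        z = rng.gauss(0.0, 0.1)          # taint z_i (assumed distribution)
        return z * x, "v"
    if u < p_v + p_e:                    # existence: spurious transaction
        return rng.uniform(0.0, x), "e"  # e_i^e = x_i^e (assumed range)
    if u < p_v + p_e + p_c:              # completeness: transaction lost
        return x, "c"                    # e_i^c = x_i
    return 0.0, None
```

Running every transaction of a stream through such a function, task by task along the process network, is the basis for the Monte Carlo simulation of aggregate error mentioned in the claims.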
 Audit targets are repositories in the business process where transactions can be stored and retrieved. These could be databases containing business and financial data that is used by the company in its decisionmaking and evaluation of its strategy, or used to generate quarterly and annual financial reports to external parties such as shareholders and regulatory agencies.

 Let T_{A}⊂T be the set of audit targets in the process model (where we model repositories as tasks).
 Let X_{j }be the set of transactions in an audit target t_{j}∈T_{A }and X_{j} ^{ε} be the subset of transactions containing error of class ε.
 Let x_{j }be the book value of a transaction in X_{j }and e_{j} ^{ε} be the magnitude of an erroneous transaction in X_{j} ^{ε}.
 As described in more detail below, we consider three mutually exclusive classes of error: valuation, existence, and completeness, denoted by the set {v,e,c}. Let E_{j}⊂{v,e,c} be the subset of error classes of interest in the audit target.
Our objective in data quality assessment is to quantify the error in these repositories according to various error metrics.
 1. Rate One: Error Incidence, is the ratio of the number of erroneous transactions of error class ε to the total number of transactions:

R1_{j}^{ε} = |X_{j}^{ε}| / |X_{j}|. (4)

 2. Rate Two: Proportion of net monetary error, is the ratio of the total monetary error over all erroneous transactions of error class ε to the total book value over all transactions:

R2_{j}^{ε} = (Σ_{X_{j}^{ε}} e_{j}^{ε}) / (Σ_{X_{j}} x_{j}). (5)

 The proportion of net monetary error can be decomposed into the two following rates:

 3. Rate Three: Proportion of dollar units in error, or tainting, is the ratio of the total monetary error over all erroneous transactions of error class ε to the total book value over the same set of erroneous transactions:

R3_{j}^{ε} = (Σ_{X_{j}^{ε}} e_{j}^{ε}) / (Σ_{X_{j}^{ε}} x_{j}). (6)

 4. Rate Four: Proportion of dollar units containing error, is the ratio of the total book value of all erroneous transactions of error class ε to the total book value over all transactions:

R4_{j}^{ε} = (Σ_{X_{j}^{ε}} x_{j}) / (Σ_{X_{j}} x_{j}). (7)

 The transaction sources, error sources, and audit targets are connected to each other by a network of links and gateways. Gateways are defined as the means by which (a) the output from a single task diverges into the inputs of multiple tasks or (b) the outputs from multiple tasks converge into the input of a single task. The following types of gateways are common in a process network:
1. Branch: The branch gateway sends the output of a single task to the input of one out of multiple alternative tasks. The branching decision is probabilistic (either directly specified or derived from other branching criteria).
2. Merge: The merge gateway allows the output of multiple tasks to feed into the input of a single task which is performed when it receives an input from any one of the tasks being merged.
3. Fork: The fork gateway sends the output of a single task to the inputs of multiple tasks at the same time, resulting in the creation of parallel streams of task activity.
4. Join: The join gateway allows the output of multiple tasks to feed into the input of a single task which is performed only when it receives input from all of the tasks being joined. This is usually present to synchronize the parallel task activities created as a result of a fork upstream in the process network.  We can traverse a process network with the objective of identifying the following parameters that link transaction sources to error sources, and error sources to audit targets:

 Let V_{ki} be the volume of transactions that flow from a transaction source t_{k} to a task t_{i} designated as an error source.
 Let P_{ij }be the probability that a transaction that flows through an error source t_{i }will subsequently be stored in an audit target repository t_{j}.
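As an illustration, P_{ij} can be derived by traversing the process network and multiplying branch probabilities along each path from the error source to the audit target. The sketch below is an assumption-laden toy: it handles only branch and merge gateways, assumes an acyclic network, and the node names and probabilities are hypothetical.

```python
# Sketch: derive the reach probability P_ij by summing, over all paths
# from an error source to an audit target, the product of branch
# probabilities along each path. Assumes an acyclic process network.

def reach_prob(graph, src, dst):
    """graph[node]: list of (successor, branch-probability) pairs."""
    if src == dst:
        return 1.0
    # Sum over outgoing branches; leaf nodes not in the graph contribute 0.
    return sum(p * reach_prob(graph, nxt, dst)
               for nxt, p in graph.get(src, []))

# Hypothetical three-node fragment of a process network.
graph = {"error_src": [("check", 0.7), ("archive", 0.3)],
         "check":     [("ledger", 1.0)],
         "archive":   [("ledger", 0.5)]}
P = reach_prob(graph, "error_src", "ledger")   # 0.7*1.0 + 0.3*0.5
```

Fork and join gateways, which duplicate and synchronize transactions rather than route them probabilistically, would need volume-based bookkeeping rather than this simple path-probability sum.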

FIG. 1 shows a network diagram linking the transaction sources, error sources, and audit targets. The dashed links between any two nodes denote a (possibly null) set of tasks and gateways (hidden in the figure) that intermediate the flow of transactions between the two nodes in the direction shown. By definition, these hidden tasks cannot be transaction sources, error sources, or audit targets.  As shown by the figure, an error introduced at an error source may be stored in several audit targets. Also, a single audit target may contain errors introduced at multiple error sources. With the data quality attributes defined above and the network interconnections depicted in
FIG. 1, the propagation of transactions and their errors can now be calculated. The volume of transactions, V_{i}, reaching error source t_{i}∈T_{E} from all transaction sources t_{k}∈T_{S} is:

$$V_i = \sum_{t_k \in T_S} V_{ki}. \quad (8)$$

The book value, x_{i}, of a transaction reaching error source t_{i} is:

$$x_i = \frac{\sum_{t_k \in T_S} x_k \cdot V_{ki}}{V_i}. \quad (9)$$

The magnitude of error, e_{i}^{ε}, of error class ε introduced by error source t_{i} is given by Equations (1), (2), and (3) for valuation, existence, and completeness errors, respectively.
 The transactions passing through error sources propagate to audit targets based on the probability P_{ij }which is determined from the process network. The aggregation of all transactions from all error sources results in X_{j }defined above, the set of transactions in an audit target, t_{j}∈T_{A}. The subset of these transactions containing errors depends on the error incidence probability p_{i}(ε) for each error source and the volume of transactions flowing through it.
 At each audit target, t_{j}∈T_{A}, we calculate the set of error rates defined above, corresponding to each of the error classes. Let ε∈[v,e,c] denote the class of error for which we calculate the error rates.

$$R1_j^{\varepsilon} = \frac{|X_j^{\varepsilon}|}{|X_j|} = \frac{\sum_{t_i \in T_E} V_i \cdot p_i(\varepsilon) \cdot P_{ij}}{\sum_{t_i \in T_E} V_i \cdot P_{ij}}, \quad (10)$$

$$R2_j^{\varepsilon} = \frac{\sum_{x_j^{\varepsilon}} e_j^{\varepsilon}}{\sum_{x_j} x_j} = \frac{\sum_{t_i \in T_E} V_i \cdot e_i^{\varepsilon} \cdot p_i(\varepsilon) \cdot P_{ij}}{\sum_{t_i \in T_E} V_i \cdot x_i \cdot P_{ij}}, \quad (11)$$

$$R3_j^{\varepsilon} = \frac{\sum_{x_j^{\varepsilon}} e_j^{\varepsilon}}{\sum_{x_j^{\varepsilon}} x_j} = \frac{\sum_{t_i \in T_E} V_i \cdot e_i^{\varepsilon} \cdot p_i(\varepsilon) \cdot P_{ij}}{\sum_{t_i \in T_E} V_i \cdot x_i \cdot p_i(\varepsilon) \cdot P_{ij}}, \quad (12)$$

$$R4_j^{\varepsilon} = \frac{\sum_{x_j^{\varepsilon}} x_j}{\sum_{x_j} x_j} = \frac{\sum_{t_i \in T_E} V_i \cdot x_i \cdot p_i(\varepsilon) \cdot P_{ij}}{\sum_{t_i \in T_E} V_i \cdot x_i \cdot P_{ij}}. \quad (13)$$

These equations calculate error rates for a single error class. As described above, an error source may introduce up to three classes of errors: valuation, existence, or completeness. For a given transaction, however, only a single class of error is possible. Due to this mutual exclusion, the sets of erroneous transactions X_{j}^{v}, X_{j}^{c}, and X_{j}^{e} have no transactions in common (i.e., their pairwise intersections are null sets).
As a result of this property, the combined error rates for all error classes are:

$$R1_j = \sum_{\varepsilon \in E_j} R1_j^{\varepsilon} = \frac{\sum_{\varepsilon \in E_j} \sum_{t_i \in T_E} V_i \cdot p_i(\varepsilon) \cdot P_{ij}}{\sum_{t_i \in T_E} V_i \cdot P_{ij}} = \frac{\sum_{t_i \in T_E} \left( V_i \cdot P_{ij} \sum_{\varepsilon \in E_j} p_i(\varepsilon) \right)}{\sum_{t_i \in T_E} V_i \cdot P_{ij}} \quad (14)$$

$$R2_j = \sum_{\varepsilon \in E_j} R2_j^{\varepsilon} = \frac{\sum_{\varepsilon \in E_j} \sum_{t_i \in T_E} V_i \cdot e_i^{\varepsilon} \cdot p_i(\varepsilon) \cdot P_{ij}}{\sum_{t_i \in T_E} V_i \cdot x_i \cdot P_{ij}} = \frac{\sum_{t_i \in T_E} \left( V_i \cdot P_{ij} \sum_{\varepsilon \in E_j} e_i^{\varepsilon} \cdot p_i(\varepsilon) \right)}{\sum_{t_i \in T_E} V_i \cdot x_i \cdot P_{ij}} \quad (15)$$

where the set E_{j}⊂[v,e,c] consists of the error classes of interest at the audit target.
 These metrics can be directly calculated if point estimates (or means only) are given for the input random variables (such as the transaction book values x_{k }and the taints z_{i}). If instead, probability distributions are specified for the random variables, Monte Carlo simulation can be done to arrive at probability distributions for the outputs.
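With point estimates for the inputs, the audit-target rates of Equations (10) and (11) can be evaluated directly. The sketch below is illustrative: the dictionary layout, field names, and numbers are assumptions, and a Monte Carlo variant would simply repeat the same computation over inputs sampled from the specified distributions.

```python
# Point-estimate evaluation of R1 (error incidence, Eq. 10) and
# R2 (proportion of net monetary error, Eq. 11) at one audit target.

def error_rates(error_sources, P_j):
    """error_sources: list of dicts with keys
         V - transaction volume reaching the error source (Eq. 8)
         x - mean book value of a transaction (Eq. 9)
         p - error incidence probability p_i(eps)
         e - error magnitude e_i^eps
       P_j: dict mapping error-source index -> P_ij for audit target t_j."""
    num1 = sum(s["V"] * s["p"] * P_j[i] for i, s in enumerate(error_sources))
    den1 = sum(s["V"] * P_j[i] for i, s in enumerate(error_sources))
    num2 = sum(s["V"] * s["e"] * s["p"] * P_j[i]
               for i, s in enumerate(error_sources))
    den2 = sum(s["V"] * s["x"] * P_j[i] for i, s in enumerate(error_sources))
    return num1 / den1, num2 / den2     # (R1, R2)

# Two hypothetical error sources feeding one audit target.
sources = [{"V": 1000, "x": 50.0, "p": 0.02, "e": 5.0},
           {"V": 500,  "x": 80.0, "p": 0.05, "e": 8.0}]
P_j = {0: 1.0, 1: 0.6}                  # routing probabilities P_ij
R1, R2 = error_rates(sources, P_j)
```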
 Cost of error arises from the failure to reduce or correct errors that accumulate at the audit targets of a transaction process. The cost may arise from additional costs or losses incurred by operating the business with incorrect information (for example, poor targeting of potential customers due to erroneous sales data). The cost could also take the form of penalties assessed by regulatory and legal agencies due to misstatements made as a result of incorrect data in financial ledgers.
 Let ω_{1 }be the unit cost per erroneous transaction.
 Let ω_{2 }be the unit cost per unit of monetary error.
 Then, the total cost due to the number of erroneous transactions for audit target t_{j }can be obtained as follows, applying Equation (14):

$$\Omega_{1,j} = R1_j \cdot \omega_1 \cdot |X_j| = \omega_1 \cdot \sum_{t_i \in T_E} \left( V_i \cdot P_{ij} \sum_{\varepsilon \in E_j} p_i(\varepsilon) \right) \quad (16)$$

The total cost due to the magnitude of monetary error for audit target t_{j} can be obtained as follows, applying Equation (15):

$$\Omega_{2,j} = R2_j \cdot \omega_2 \cdot \sum_{x_j} x_j = \omega_2 \cdot \sum_{t_i \in T_E} \left( V_i \cdot P_{ij} \sum_{\varepsilon \in E_j} e_i^{\varepsilon} \cdot p_i(\varepsilon) \right) \quad (17)$$

The total cost across all audit targets t_{j}∈T_{A} is:

$$\Omega = \sum_{t_j \in T_A} \left( \Omega_{1,j} + \Omega_{2,j} \right). \quad (18)$$

The set of equations introduced in this section enables the assessment of data quality at an audit target both in terms of error rates and cost. This assessment takes into account the structure of the business process and the location of transaction sources and error sources within it. Process owners can use this assessment to quantify the impact of changes in process structure or transaction volumes on the quality of data being stored. In auditing terminology, this level of analysis estimates the inherent risk of the accounting system. In the next section, we begin to estimate the effect of applying error detection and correction controls in order to reduce error rates and costs.
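The cost aggregation of Equations (16) through (18) can be sketched as follows, assuming a single error class of interest at every audit target (so the inner sums over ε∈E_j reduce to one term). The data layout and figures are illustrative.

```python
# Sketch of Equations (16)-(18): cost of error summed over audit targets,
# for a single error class. omega1 is cost per erroneous transaction,
# omega2 is cost per unit of monetary error.

def cost_of_error(error_sources, P, omega1, omega2):
    """P[i][j]: probability that a transaction through error source i
       is stored in audit target j. Returns the total cost Omega (Eq. 18)."""
    n_targets = len(P[0])
    total = 0.0
    for j in range(n_targets):
        omega_1j = omega1 * sum(s["V"] * P[i][j] * s["p"]
                                for i, s in enumerate(error_sources))  # Eq. 16
        omega_2j = omega2 * sum(s["V"] * P[i][j] * s["e"] * s["p"]
                                for i, s in enumerate(error_sources))  # Eq. 17
        total += omega_1j + omega_2j                                   # Eq. 18
    return total

sources = [{"V": 1000, "p": 0.02, "e": 5.0},
           {"V": 500,  "p": 0.05, "e": 8.0}]
P = [[1.0, 0.0],
     [0.6, 0.4]]        # routing into two hypothetical audit targets
omega = cost_of_error(sources, P, omega1=2.0, omega2=0.5)
```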
 Businesses implement internal control systems to reduce the incidence of errors in their business processes. Controls may be implemented either to prevent errors from being introduced or to monitor for and detect errors after they have been generated at error sources. In the latter case, the control may attempt to correct errors as they are detected (feedforward control) or report them so that the error-producing action may eventually be corrected (feedback control) (see B. E. Cushing, "A Further Note on the Mathematical Approach to Internal Control", The Accounting Review, Vol. 50, No. 1, 1975, pp. 141-154).
 For our model, we consider the controls that have a direct impact on reducing the number of erroneous transactions introduced at an error source. This includes preventive and feedforward controls but excludes feedback controls because they lack the direct corrective action on erroneous transactions.
FIGS. 2A and 2B show how these control types interact with an error source and alter its probability of introducing an error from p(ε) to p(ε_{c}). More particularly, FIG. 2A shows the impact of a preventive control on an error source, and FIG. 2B shows the impact of a feedforward control on an error source. Note that the controls may only affect the probability of an error, not its taint. An error source may have a sequence of feedforward controls associated with it to monitor, detect, and fix errors that may be introduced by the error source or by any of the intervening controls. This is shown in
FIG. 3. The figure also depicts the possibility that not all of the transactions that leave an error source are sent to a control. Random sampling or a business rule may be used to select the subset of transactions that are sent to each control. To develop the mathematical formula for calculating p(ε_{K}), the probability of error after a sequence of controls K is applied, let us consider the simplest case of one error source and one feedforward control. Four variables describe the state of this control system:

 the error status $E = (\varepsilon, \bar{\varepsilon})$,
 the control signaling status $C_s = (c_s, \bar{c}_s)$,
 the control fixing status $C_f = (c_f, \bar{c}_f)$, and
 the error status after the application of the control, $E_c = (\varepsilon, \bar{\varepsilon})$.
There are eight possible states in the control system, as described in the table below, along with the resulting impact on the error status E_{c} after the application of the control:

TABLE 1
Control System States

State | E | C_s | C_f | Description | E_c
1 | ε | c_s | c_f | An error exists; the control signals the error and fixes it. | ε̄
2 | ε | c_s | c̄_f | An error exists; the control signals the error but does not fix it. | ε
3 | ε | c̄_s | c_f | An error exists; the control does not signal the error, but somehow takes an action of fixing it. | ε̄
4 | ε | c̄_s | c̄_f | An error exists; the control neither signals the error nor fixes it. | ε
5 | ε̄ | c̄_s | c_f | An error does not exist; the control does not signal an error, but somehow takes an action of error "fixing". | ε
6 | ε̄ | c̄_s | c̄_f | An error does not exist; the control neither signals nor fixes an error. | ε̄
7 | ε̄ | c_s | c_f | An error does not exist; the control signals an error and takes an action of error "fixing". | ε
8 | ε̄ | c_s | c̄_f | An error does not exist; the control signals an error, but takes no fixing action. | ε̄
We define the following exogenous attributes of a feedforward control that represent the effectiveness of the control (we show later that preventive controls can be formulated as a special case): 
 $p(c_s \mid \varepsilon)$: the probability that the control signals an error ε in an error source, given that the error ε exists.
 $p(c_s \mid \bar{\varepsilon})$: the probability that the control signals an error ε in an error source, given that the error ε does not exist (counterfactual).
 $p(c_f \mid c_s)$: the probability that the control takes an action of error fixing, given that it signals an error ε in an error source.
 $p(c_f \mid \bar{c}_s)$: the probability that the control takes an action of error fixing, given that it does not signal an error ε in an error source (counterfactual).
The influence diagram of this control system is shown in FIG. 4. The diagram shows that if the status of C_{s} is known, then the status of C_{f} is independent of the status of E; i.e., C_{f} and E are conditionally independent given C_{s}:

$$p(E, C_f \mid C_s) = p(E \mid C_s) \cdot p(C_f \mid C_s) \quad (19)$$

From this conditional independence, we have:

$$p(C_f \mid C_s, E) = \frac{p(E, C_f \mid C_s)}{p(E \mid C_s)} = \frac{p(E \mid C_s) \cdot p(C_f \mid C_s)}{p(E \mid C_s)} = p(C_f \mid C_s) \quad (20)$$

Using this, we derive the probability of any state in the control system as follows:

$$p(E, C_s, C_f) = p(C_f, C_s \mid E) \cdot p(E) = p(C_f \mid C_s, E) \cdot p(C_s \mid E) \cdot p(E) = p(C_f \mid C_s) \cdot p(C_s \mid E) \cdot p(E) \quad (21)$$

We assume the following for feedforward controls:

 If a control does not signal an error, there will never be an action of fixing an error; i.e., $p(c_f \mid \bar{c}_s) = 0$ and $p(\bar{c}_f \mid \bar{c}_s) = 1$.
 If a control does signal an error, there will always be an action of fixing an error; i.e., $p(c_f \mid c_s) = 1$ and $p(\bar{c}_f \mid c_s) = 0$.

These assumptions are always true for preventive controls, along with $p(c_s \mid \bar{\varepsilon}) = 0$. That is, we formulate a preventive control as a special case of the feedforward control where $p(c_s \mid \varepsilon)$ is the only parameter that can have a value between 0 and 1. This parameter represents the effectiveness of the control in preventing an error from being generated by the error source.
 Under these assumptions, Equation (21) reduces to the following for each of the eight states in the control system:

$$p(\varepsilon, c_s, c_f) = p(c_s \mid \varepsilon) \cdot p(\varepsilon)$$

$$p(\varepsilon, c_s, \bar{c}_f) = 0$$

$$p(\varepsilon, \bar{c}_s, c_f) = 0$$

$$p(\varepsilon, \bar{c}_s, \bar{c}_f) = p(\bar{c}_s \mid \varepsilon) \cdot p(\varepsilon)$$

$$p(\bar{\varepsilon}, \bar{c}_s, c_f) = 0$$

$$p(\bar{\varepsilon}, \bar{c}_s, \bar{c}_f) = p(\bar{c}_s \mid \bar{\varepsilon}) \cdot p(\bar{\varepsilon})$$

$$p(\bar{\varepsilon}, c_s, c_f) = p(c_s \mid \bar{\varepsilon}) \cdot p(\bar{\varepsilon})$$

$$p(\bar{\varepsilon}, c_s, \bar{c}_f) = 0$$

Now we derive p(ε_{c}), the probability of error ε in an error source after a single control c has been applied:

$$\begin{aligned} p(\varepsilon_c) &= p(\varepsilon, \bar{c}_s, \bar{c}_f) + p(\bar{\varepsilon}, c_s, c_f) + p(\bar{\varepsilon}, \bar{c}_s, c_f) + p(\varepsilon, c_s, \bar{c}_f) \\ &= p(\varepsilon, \bar{c}_s, \bar{c}_f) + p(\bar{\varepsilon}, c_s, c_f) \\ &= p(\bar{c}_s \mid \varepsilon) \cdot p(\varepsilon) + p(c_s \mid \bar{\varepsilon}) \cdot p(\bar{\varepsilon}) \\ &= p(\varepsilon) \cdot (1 - p(c_s \mid \varepsilon)) + (1 - p(\varepsilon)) \cdot p(c_s \mid \bar{\varepsilon}). \end{aligned} \quad (22)$$

If the control c is applied only to a fraction y of all the transactions coming out of the error source, Equation (22) is modified to:

$$\begin{aligned} p(\varepsilon_c) &= y \cdot \left[ p(\varepsilon) \cdot (1 - p(c_s \mid \varepsilon)) + (1 - p(\varepsilon)) \cdot p(c_s \mid \bar{\varepsilon}) \right] + (1 - y) \cdot p(\varepsilon) \\ &= p(\varepsilon) \cdot (1 - y\, p(c_s \mid \varepsilon)) + (1 - p(\varepsilon)) \cdot y\, p(c_s \mid \bar{\varepsilon}) \end{aligned} \quad (23)$$

Next, we consider p(ε_{K}), the probability of error ε after the application of a sequence of controls K to an error source, as shown in FIG. 3. Let K = [c_{j}], j = 1 . . . J, where c_{j} is the jth control in the sequence. Then, the probability of error after the application of the jth control is:

$$p(\varepsilon_{c,j}) = p(\varepsilon_{c,j-1}) \cdot (1 - y_j\, p(c_{s,j} \mid \varepsilon)) + (1 - p(\varepsilon_{c,j-1})) \cdot y_j\, p(c_{s,j} \mid \bar{\varepsilon}) \quad (24)$$

where the j subscript on the other variables denotes the respective variable for the jth control.
 Equation (24) can be iterated to compute p(ε_{K}) = p(ε_{c,J}), starting at p(ε_{c,0}) = p(ε). This quantifies the effect of applying a regime of controls K to a single error source. Using the error propagation formulation described above, we can now assess the impact of the controls on the error rates and cost of error at the audit targets in the business process. In auditing terminology, this level of analysis estimates the control risk in the accounting system.
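The iteration of Equation (24) can be sketched as a short function. The control parameters below are illustrative; each control is a triple of its sampling fraction y_j, its detection probability p(c_s|ε), and its false-alarm probability p(c_s|ε̄).

```python
# Sketch of Equation (24): the posterior error probability after a
# sequence of feedforward controls is applied to one error source.

def posterior_error_prob(p_error, controls):
    """controls: list of (y, p(c_s|err), p(c_s|no err)) triples,
       applied in sequence (the order does not matter when the
       false-alarm term is zero, per Eq. 29)."""
    p = p_error                          # p(eps_{c,0}) = p(eps)
    for y, p_sig_err, p_sig_noerr in controls:
        # Eq. (24): surviving errors plus false alarms "fixed" into errors.
        p = p * (1 - y * p_sig_err) + (1 - p) * y * p_sig_noerr
    return p

controls = [(1.0, 0.9, 0.0),   # applied to all transactions, 90% detection
            (0.5, 0.8, 0.0)]   # sampled on half the transactions
p_K = posterior_error_prob(0.10, controls)
```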
 The application of controls at an error source incurs a cost. We consider this cost to be linearly proportional to the number of transactions passing through the control. This cost consists of the cost to detect if an error exists and the cost to fix the error if found. Let ω(c_{s}) be the cost to monitor, detect and signal an error (incurred on all transactions passing through the control) and ω(c_{f}) be the cost of fixing each error (incurred only on the transactions deemed erroneous). Then, the cost per transaction passing through the control is:

$$\omega(c) = \omega(c_s) + \omega(c_f) \cdot \left( p(c_f, c_s, \varepsilon) + p(c_f, \bar{c}_s, \varepsilon) + p(c_f, c_s, \bar{\varepsilon}) + p(c_f, \bar{c}_s, \bar{\varepsilon}) \right)$$

Applying the assumptions for feedforward controls and Equation (21),

$$\omega(c) = \omega(c_s) + \omega(c_f) \cdot \left( p(c_s \mid \bar{\varepsilon}) \cdot (1 - p(\varepsilon)) + p(c_s \mid \varepsilon) \cdot p(\varepsilon) \right). \quad (25)$$

Considering T_{E} error sources and a sequence of controls K_{i} at each error source t_{i}∈T_{E}, we have the total cost of controls in the business process:

$$\Omega_C = \sum_{t_i \in T_E} \left( V_i \sum_{c_j \in K_i} y_j\, \omega(c_j) \right) \quad (26)$$

where V_{i}, as defined in Equation (8), is the volume of transactions reaching the error source t_{i}.
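Equations (25) and (26) can be sketched as follows; the cost figures and control parameters are illustrative assumptions.

```python
# Sketch of Eq. (25) (per-transaction control cost) and Eq. (26)
# (total control cost across error sources).

def control_cost_per_txn(w_signal, w_fix, p_error, p_sig_err, p_sig_noerr):
    # Eq. (25): detection cost on every transaction, plus fixing cost on
    # the transactions the control signals (true and false positives).
    return w_signal + w_fix * (p_sig_noerr * (1 - p_error)
                               + p_sig_err * p_error)

def total_control_cost(error_sources):
    """Each error source: volume V and a list of (y_j, per-txn cost) pairs
       for the controls in its sequence K_i."""
    return sum(s["V"] * sum(y * w for y, w in s["controls"])
               for s in error_sources)          # Eq. (26)

w_c = control_cost_per_txn(w_signal=0.10, w_fix=2.0,
                           p_error=0.05, p_sig_err=0.9, p_sig_noerr=0.01)
total = total_control_cost([{"V": 1000, "controls": [(1.0, w_c)]}])
```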
 Now we are in a position to formulate optimization problems that trade off the cost of controls at the error sources with the cost of error at the audit targets. This is done in the next section.
 The business process and control models developed above allow us to formulate the following series of optimization problems.

 For these formulations, we use the following variables:

 The overall system reliability (1−R) across all the audit targets, where R is either $\sum_{t_j \in T_A} R1_j$, as defined by Equation (14), or $\sum_{t_j \in T_A} R2_j$, as defined by Equation (15).

 The total cost of error across all audit targets, Ω as given in Equation (18).
 The total cost of controls in the business process, Ω_{C }as given in Equation (26).
 The decision variables y_{j}: each y_{j} is the fraction of transactions at error source t_{i}∈T_{E} that will be sent to a control c_{j}∈K_{i}, where K_{i} is the sequence of controls available for error source t_{i}.
 Using the above notation, the optimization formulations are as follows:

 1. Maximize the system reliability (1−R), subject to a budget $\hat{\Omega}_C$ for the total control cost in the business process, i.e., $\Omega_C \le \hat{\Omega}_C$.
 2. Minimize the control cost $\Omega_C$, subject to a target system reliability $(1 - \hat{R})$, i.e., $R \le \hat{R}$.
 3. Minimize the cost of error Ω, subject to a budget $\hat{\Omega}_C$ for the total control cost in the business process, i.e., $\Omega_C \le \hat{\Omega}_C$.
 4. Minimize the control cost $\Omega_C$, subject to a budget $\hat{\Omega}$ for the total cost of error in the business process, i.e., $\Omega \le \hat{\Omega}$.
 5. Minimize the total cost in the process $(\Omega + \Omega_C)$.

 As a special case with a tractable solution, consider optimization problem 4 above, where the cost of control must be minimized so as to keep the cost of error in the system below a threshold budget $\hat{\Omega}$.
 We solve this problem by dividing it into two subproblems. One subproblem is at the audit targets stage, where we wish to minimize the total cost of error, given sets of controlled error levels and their corresponding control cost for each error source. The second subproblem is to come up with these sets at the error sources stage, where we wish to minimize the control cost for a given error level.
 For the audit target stage subproblem, Equation (18) calculates the total cost of error across all audit targets Ω, which can be written as follows, if we consider only a single class of error in our analysis:

$$\Omega = \sum_{t_j \in T_A} \left( \sum_{t_i \in T_E} V_i \cdot P_{ij} \cdot (\omega_1 + \omega_2\, e_i^{\varepsilon}) \cdot p_i(\varepsilon) \right) \quad (27)$$

To meet the $\Omega \le \hat{\Omega}$ requirement, we need to apply controls at one or more error sources to reduce the "posterior" error rates p(ε_{K}) at some cost. For each error source t_{i}, we characterize a set of pairs {(ω_{i,k_i}, p_{i}(ε_{k_i})) | k_{i}∈{1, 2, . . . , K_{i}}}, where ω_{i,k_i} is the cost of reducing the error level at t_{i} to p_{i}(ε_{k_i}). As described below for the second subproblem, where we optimize the cost of controls at the error sources stage, k_{i} is a control strategy that can be applied at the error source t_{i}. Table 2 below shows the different levels of controls and the associated cost and reliability levels.

TABLE 2
The different error levels (by applying controls) and associated cost of control

error source 1: (ω_{1,1}, p_1(ε_1)), (ω_{1,2}, p_1(ε_2)), . . . , (ω_{1,K_1}, p_1(ε_{K_1}))
error source 2: (ω_{2,1}, p_2(ε_1)), (ω_{2,2}, p_2(ε_2)), . . . , (ω_{2,K_2}, p_2(ε_{K_2}))
error source 3: (ω_{3,1}, p_3(ε_1)), (ω_{3,2}, p_3(ε_2)), . . . , (ω_{3,K_3}, p_3(ε_{K_3}))
. . .
error source I: (ω_{I,1}, p_I(ε_1)), (ω_{I,2}, p_I(ε_2)), . . . , (ω_{I,K_I}, p_I(ε_{K_I}))
The objective here is to pick an appropriate level of control at each error source so as to keep the system-level cost of error below the threshold budget $\hat{\Omega}$. This can be written as follows:

$$\begin{aligned} \min \quad & \sum_{t_i \in T_E} \sum_{k_i=1}^{K_i} \omega_{i,k_i} \cdot z_{i,k_i} \\ \text{s.t.} \quad & \Omega = \sum_{t_j \in T_A} \left( \sum_{t_i \in T_E} V_i \cdot P_{ij} \cdot (\omega_1 + \omega_2\, e_i^{\varepsilon}) \cdot \sum_{k_i=1}^{K_i} z_{i,k_i} \cdot p_i(\varepsilon_{k_i}) \right) \le \hat{\Omega} \\ & \sum_{k_i=1}^{K_i} z_{i,k_i} \le 1, \qquad z_{i,k_i} = \begin{cases} 1 & \text{if control level } k_i \text{ is chosen for error source } i, \\ 0 & \text{otherwise} \end{cases} \end{aligned} \quad (28)$$

The decision variables are z_{i,k_i}, k_{i}∈{1, 2, . . . , K_{i}}: z_{i,k_i} is a binary variable that takes the value 1 if the pair (ω_{i,k_i}, p_{i}(ε_{k_i})) is chosen for error source i. The constraint $\sum_{k_i=1}^{K_i} z_{i,k_i} \le 1$ implies that only one reliability level can be chosen for each error source i. We recognize this problem as the multiple-choice knapsack problem (see, for example, S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementations, John Wiley and Sons Ltd., England, 1990), which can be solved by dynamic programming in O(K×W), where K is the total number of levels across all error sources and W is related to the accuracy with which $\hat{\Omega}$ needs to be achieved.
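For a small instance, the selection in Equation (28) can be illustrated by brute-force enumeration; realistic instances would use the multiple-choice knapsack dynamic program discussed above. All names and data below are hypothetical.

```python
# Brute-force sketch of Eq. (28): choose one (control cost, error level)
# pair per error source so the system cost of error stays within budget,
# at minimum total control cost.
from itertools import product

def select_levels(levels, weights, budget):
    """levels[i]: list of (control_cost, error_prob) pairs for source i,
       including (0.0, p_i(eps)) for 'no control'.
       weights[i]: V_i * sum_j P_ij * (omega1 + omega2 * e_i), per Eq. (27).
       Returns (min control cost, chosen level indices), or None."""
    best = None
    for choice in product(*(range(len(l)) for l in levels)):
        cost_ctrl = sum(levels[i][k][0] for i, k in enumerate(choice))
        cost_err = sum(w * levels[i][k][1]
                       for i, (k, w) in enumerate(zip(choice, weights)))
        if cost_err <= budget and (best is None or cost_ctrl < best[0]):
            best = (cost_ctrl, choice)
    return best

levels = [[(0.0, 0.10), (40.0, 0.02), (90.0, 0.005)],
          [(0.0, 0.05), (30.0, 0.01)]]
weights = [1000.0, 800.0]
cost, choice = select_levels(levels, weights, budget=60.0)
```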
 Next, we develop a control model to compute the minimum cost control strategy for each level at each error source. Although this implies the need to solve an optimization model to compute each (cost, error level) pair, we will show that this optimization model is a knapsack problem which is relatively easy to solve.
 For the subproblem at the level of the error sources, our objective is to come up with a set of (ω_{i,k} _{ i }, p_{i}(ε_{k} _{ i })) pairs for each error source. In doing so, we wish to minimize the cost ω_{i,k} _{ i }of reducing the error level at error source t_{i }to p_{i}(ε_{k} _{ i }).
 Equation (24) provides the means for iteratively calculating p_{i}(ε_{k_i}) for a given set of controls K_{i} at error source t_{i} and a control strategy defined by the fraction of transactions y_{j}, j∈{1, 2, . . . , |K_{i}|}, reaching each control c_{j}∈K_{i}. If we reasonably assume that a control attempting to fix a non-error will not introduce an error (i.e., that in states 5 and 7 of Table 1, E_{c} = ε̄), the error incidence rate p(ε_{K_i}) simplifies to:
$$p(\varepsilon_{K_i}) = p(\varepsilon) \cdot \prod_{j=1}^{|K_i|} \left( 1 - y_j\, p(c_{s,j} \mid \varepsilon) \right), \qquad \ln p(\varepsilon_{K_i}) = \ln p(\varepsilon) + \sum_{j=1}^{|K_i|} \ln\!\left( 1 - y_j\, p(c_{s,j} \mid \varepsilon) \right) \quad (29)$$

Observe that we have linearized the expression for p(ε_{K_i}) using logarithms. This shows that the sequence in which the controls c_{j}∈K_{i} are applied is inconsequential. A simple optimization formulation for a single error source with multiple controls is then as follows: given the control units c_{j}∈K_{i} and a target error level p(ε_{k_i}), find the optimal control strategy k_{i}, specified in terms of y_{j}, j∈{1, 2, . . . , |K_{i}|}, that minimizes the control cost:

$$
\min \; \sum_{j=1}^{|K_i|} \omega(c_j)\cdot y_j
\quad \text{s.t.} \quad
\ln p(\varepsilon) + \sum_{j=1}^{|K_i|} \ln\bigl(1 - y_j\, p(c_{s,j}\mid \varepsilon)\bigr) \;\le\; \ln p(\varepsilon_{k_i}),
\qquad y_j \in \{0,1\}
\qquad (30)
$$

where ω(c_j) is the per-transaction cost of applying the jth control to the error source, as defined in Equation (25). Although we have assumed (implicitly, by making y_j binary) that each control is applied either to all of the transactions or to none, this can easily be relaxed to allow controlling a fraction of the transactions. Notice that the above problem is a knapsack problem that can be solved by dynamic programming (see, again, Martello and Toth 1990, supra) in O(J × R) time, where J is the number of controls and R is a number based on the desired accuracy of p(ε_{k_i}).
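As a concrete illustration, the residual error rate of Equation (29) can be evaluated directly for a given control strategy. The sketch below uses illustrative numbers (a 5% base error rate and three hypothetical controls), not values from the patent:

```python
import math

def residual_error_rate(p_eps, detect_probs, fractions):
    """Error incidence rate after controls, per Equation (29):
    p(eps_K) = p(eps) * prod_j (1 - y_j * p(c_{s,j}|eps))."""
    rate = p_eps
    for p_detect, y in zip(detect_probs, fractions):
        rate *= 1.0 - y * p_detect
    return rate

# Illustrative (assumed) data: 5% base error rate, three controls with
# detection probabilities 0.9, 0.6, 0.8 applied to fractions 1, 1, 0.5.
p_after = residual_error_rate(0.05, [0.9, 0.6, 0.8], [1.0, 1.0, 0.5])

# The log-linearized form of Equation (29) gives the same value,
# confirming that the order of applying controls is inconsequential.
log_rate = math.log(0.05) + sum(
    math.log(1.0 - y * p) for p, y in zip([0.9, 0.6, 0.8], [1.0, 1.0, 0.5])
)
```

Because the product commutes, permuting the controls leaves the residual rate unchanged, which is what makes the greedy selection below valid.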
 Noting from Equation (29) that the sequence of applying controls does not affect the probability of error after the application of controls, we construct a simple algorithm that finds the optimal control strategy k_i for a given target error level p(ε_{k_i}). This is shown in Table 3. We select the control with the highest cost-effectiveness ratio and apply it to all the transactions at the error source. If the resulting error level is still higher than the target, we apply the control with the next highest cost-effectiveness ratio. When the error level falls below the target, we adjust the sampling fraction y_j of the last selected control so as to achieve the target error level exactly. Thus, the sampling fractions of all controls will be 1 or 0, with the exception of one control, whose sampling fraction will be in [0, 1].
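A minimal Python sketch of this greedy procedure (Table 3 below) follows; the control names, detection probabilities, and costs are hypothetical. The running error level P here is tracked *before* applying the chosen control, so the fractional step uses the algebraically equivalent form y = (P − target)/(P · p):

```python
def select_controls(p_eps, controls, target):
    """Greedy control-strategy selection (Table 3).

    p_eps    -- error rate p(eps) before any controls
    controls -- list of (name, p_detect, cost): p(c_{s,j}|eps) and omega(c_j)
    target   -- desired error level p(eps_{k_i})
    Returns {name: y_j}, or None if the target cannot be reached.
    """
    fractions = {name: 0.0 for name, _, _ in controls}
    candidates = list(controls)
    P = p_eps  # error level before the next control is applied
    while P > target:
        if not candidates:
            return None  # all controls at y = 1 still exceed the target
        # Step 2: pick the highest cost-effectiveness ratio p_detect / cost.
        best = max(candidates, key=lambda c: c[1] / c[2])
        candidates.remove(best)
        name, p_detect, _ = best
        if P * (1.0 - p_detect) > target:
            fractions[name] = 1.0   # apply to all transactions (step 4)
            P *= 1.0 - p_detect     # step 3 update
        else:
            # Fractional application of the last control hits the target.
            fractions[name] = (P - target) / (P * p_detect)
            P = target
    return fractions

# Hypothetical controls: (name, detection probability, per-transaction cost).
strategy = select_controls(
    0.05, [("A", 0.9, 1.0), ("B", 0.6, 0.5), ("C", 0.8, 2.0)], 0.004
)
```

In this example control B has the highest ratio and is applied in full, A is applied fractionally to reach the target, and C is never needed, matching the all-or-nothing structure noted above.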

TABLE 3
Algorithm for Control Strategy Selection

Given: target error level p(ε_{k_i}); candidate control set K = {c_1, c_2, . . . , c_J}; solution set y_j = 0 for all j ∈ {1, 2, . . . , J}; set P = p(ε).

1. Calculate the cost-effectiveness ratio, p(c_{s,j}|ε)/ω(c_j), for each candidate control unit in K.
2. Choose the control that has the highest value: j* = argmax_{c_j ∈ K} p(c_{s,j}|ε)/ω(c_j).
3. Update P = P · (1 − p(c_{s,j*}|ε)).
4. If P > p(ε_{k_i}): set y_{j*} = 1 and take c_{j*} off the candidate list K; if K ≠ ∅, go to step 2, else terminate the procedure with failure.
   Otherwise: set y_{j*} = [P − p(ε_{k_i}) · (1 − p(c_{s,j*}|ε))] / [P · p(c_{s,j*}|ε)] and terminate the procedure with success.

 We have described a framework for the quantitative modeling of data quality in a business process. We have shown how the model can be used to make assessments of data quality in a predefined process as well as to develop optimal control system designs that meet reliability or cost requirements.
 These techniques will be of value to business process owners as well as to evaluators of data quality (such as auditors, in the case of business processes with financial transactions and accounts). However, the users of these techniques must adopt a methodology by which the data quality model is developed and maintained. The methodology comprises the following steps:
 1. Create a model of an existing business process. Various modeling tools are commercially available for this purpose.
2. Utilizing the modeling framework developed in the Process Model section, identify the transaction sources, error sources, and audit targets.
 2.1. For transaction sources, obtain or estimate the volume of transactions over a given time period (e.g., per day, month, or quarter) and estimate the transaction book values. This may be a simple average book value or a probability distribution based on historical transaction data.
 2.2. For error sources, obtain the probability of errors prior to the application of any controls. This may be obtained from the logs of controls that already exist. For a new business process, or for error sources that do not have logs of past control activity, an estimate must be made based on comparable error sources with available data. The taint of the error sources must also be obtained from historical logs or otherwise estimated. Note that the taint may be a point estimate or a probability distribution.
 2.3. For audit targets, specify the types of errors of interest and whether any error level requirements exist for them.
 3. Run the error propagation analysis described in the Process Model section to estimate error rates and cost of error at the audit targets. For a model with probability distributions, a Monte Carlo simulation can be performed to estimate error rates and costs in terms of probability distributions. The process analyst may develop multiple scenarios to test different expectations of future process changes, such as changes in transaction volumes and business process topology and policies.
4. Utilize the control systems model developed in the Control Model section to associate error sources with a set of controls. These may be existing or available controls. For each control, estimate its error detection and correction effectiveness, as defined by the probabilities p(c_s|ε) and p(c_s|ε̄). These data are available if the controls are periodically subject to internal or external auditing, where they are evaluated with test data containing known errors. The cost of controls can be estimated from the time spent on each control to search for and then fix errors.
5. Analyze the impact of selected controls using the assessment technique described in the Control Model section. The process analyst may run multiple scenarios with different control selections as well as the scenarios developed in step 3 above. The cost of the selected controls can be compared with the reliability level or cost of error at the audit targets.
6. When a manual search for the optimal control design is intractable, the optimization techniques shown in step 5 are applicable. Here, we assume that each error source has a set of potential controls, and the problem is to select the fraction of the total transactions to send to each.

 Although our model and analyses have been motivated by the types of transaction errors and error correction controls in the accounting and auditing domain, they can be extended to other domains and definitions of data quality. For example, we can consider that "error" sources introduce uncertainty about the data in a transaction rather than mistakes. Sources of uncertainty could be prices of raw material, customer demand, product development times, service delivery times, etc. We can adapt the error propagation techniques of this invention to propagate these uncertainties to the data repositories. We can also then consider the analogues of "controls" that may reduce these uncertainties, but at a cost. For example, uncertainty about raw material prices can be reduced by establishing long-term contracts or hedging with options. Variability in delivery times may be reduced by automating processes. These uncertainty reduction actions come at a cost, and we can trade off these costs against the consequent level or cost of the uncertainty in the data repositories.
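The Monte Carlo estimation called for in step 3 can be sketched for the simplest topology, a single error source feeding one control and then an audit target. All parameter values below are illustrative assumptions, not data from the patent:

```python
import random

def simulate_error_rate(n, p_error, p_detect, y, seed=42):
    """Monte Carlo sketch of step 3: sample n transactions, introduce an
    error at the source with probability p_error, route a fraction y of
    transactions through a control that detects and fixes an error with
    probability p_detect, and count the errors reaching the audit target."""
    rng = random.Random(seed)
    errors_at_target = 0
    for _ in range(n):
        has_error = rng.random() < p_error
        routed = rng.random() < y  # this transaction reaches the control
        if has_error and routed and rng.random() < p_detect:
            has_error = False      # control caught and fixed the error
        errors_at_target += has_error
    return errors_at_target / n

# For large n the estimate should approach the analytic value implied by
# Equation (29), p(eps) * (1 - y * p(c_s|eps)) = 0.05 * (1 - 0.9) here.
estimate = simulate_error_rate(200_000, 0.05, 0.9, 1.0)
```

The same sampling loop extends naturally to probability distributions over book values and taints, which is where simulation pays off over the closed-form propagation.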
 In conclusion, our invention contributes to the analysis of data quality by incorporating a business process framework for the assessment and optimization of data quality. This invention applies not only to the literature and practice of financial accounting and auditing, but also to business decision-support systems.
 While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
Claims (11)
1. A data quality management method comprising the steps of:
creating a model of a new or existing business process;
utilizing a modeling framework, identifying transaction sources, error sources, and audit targets;
running error propagation analysis to estimate error rates and cost of error at the audit targets;
utilizing a control systems model to associate error sources with a set of controls; and
analyzing an impact of selected controls using an assessment technique.
2. The data quality management method recited in claim 1, further comprising the steps for transaction sources of obtaining or estimating a volume of transactions over a given time period and estimating transaction book values.
3. The data quality management method recited in claim 2, wherein estimating transaction book values is based on a simple average book value or a probability distribution based on historical transaction data.
4. The data quality management method recited in claim 1, further comprising the steps for error sources of obtaining a probability of errors prior to application of any controls and a taint of the error sources.
5. The data quality management method recited in claim 4, wherein the probability of errors and the taint of the error sources are obtained from logs of controls that already exist.
6. The data quality management method recited in claim 4, wherein for a new business process or for error sources that do not have logs of past control activity, an estimation is done based on comparable error sources with available data.
7. The data quality management method recited in claim 1, further comprising the steps for audit targets of specifying types of errors of interest and if any error level requirements exist for them.
8. The data quality management method recited in claim 1, further comprising the step for a model with probability distributions of performing a Monte Carlo simulation to estimate error rates and costs in terms of probability distributions.
9. The data quality management method recited in claim 1, further comprising the step for each control of estimating its error detection and correction effectiveness.
10. The data quality management method recited in claim 1, further comprising the step of maximizing the reliability level at audit targets subject to meeting a budget for a cost of controls.
11. The data quality management method recited in claim 1, further comprising the step of minimizing a cost of controls subject to meeting a minimum reliability level at audit targets.
Priority Applications (2)
Application Number  Priority Date  Filing Date  Title 

US11/357,134 US20070198312A1 (en)  20060221  20060221  Data quality management using business process modeling 
US12/058,044 US20080195440A1 (en)  20060221  20080328  Data quality management using business process modeling 
Applications Claiming Priority (2)
Application Number  Priority Date  Filing Date  Title 

US12/058,044 US20080195440A1 (en)  20060221  20080328  Data quality management using business process modeling 
US15/255,685 US9836713B2 (en)  20060221  20160902  Data quality management using business process modeling 
Related Parent Applications (1)
Application Number  Title  Priority Date  Filing Date  

US11/357,134 Continuation US20070198312A1 (en)  20060221  20060221  Data quality management using business process modeling 
Related Child Applications (1)
Application Number  Title  Priority Date  Filing Date 

US15/255,685 Continuation US9836713B2 (en)  20060221  20160902  Data quality management using business process modeling 
Publications (1)
Publication Number  Publication Date 

US20080195440A1 true US20080195440A1 (en)  20080814 
Family
ID=38429449
Family Applications (3)
Application Number  Title  Priority Date  Filing Date 

US11/357,134 Abandoned US20070198312A1 (en)  20060221  20060221  Data quality management using business process modeling 
US12/058,044 Abandoned US20080195440A1 (en)  20060221  20080328  Data quality management using business process modeling 
US15/255,685 Active US9836713B2 (en)  20060221  20160902  Data quality management using business process modeling 
Family Applications Before (1)
Application Number  Title  Priority Date  Filing Date 

US11/357,134 Abandoned US20070198312A1 (en)  20060221  20060221  Data quality management using business process modeling 
Family Applications After (1)
Application Number  Title  Priority Date  Filing Date 

US15/255,685 Active US9836713B2 (en)  20060221  20160902  Data quality management using business process modeling 
Country Status (1)
Country  Link 

US (3)  US20070198312A1 (en) 
Families Citing this family (15)
Publication number  Priority date  Publication date  Assignee  Title 

US20080027738A1 (en) *  20060731  20080131  Microsoft Corporation  Increasing business value through increased usage and adoption 
US8185827B2 (en) *  20071026  20120522  International Business Machines Corporation  Role tailored portal solution integrating near realtime metrics, business logic, online collaboration, and web 2.0 content 
US8200522B2 (en) *  20071026  20120612  International Business Machines Corporation  Repeatable and standardized approach for deployment of a portable SOA infrastructure within a client environment 
US8296718B2 (en)  20071031  20121023  International Business Machines Corporation  SOA software components that endure from prototyping to production 
US8126758B2 (en) *  20080115  20120228  International Business Machines Corporation  Method and apparatus for information boosting in related but disconnected databases 
US8171415B2 (en) *  20080611  20120501  International Business Machines Corporation  Outage management portal leveraging backend resources to create a role and user tailored frontend interface for coordinating outage responses 
US8606762B2 (en) *  20080916  20131210  Sap Ag  Data quality administration framework 
US10157369B2 (en)  20090205  20181218  International Business Machines Corporation  Role tailored dashboards and scorecards in a portal solution that integrates retrieved metrics across an enterprise 
US8769516B2 (en) *  20100819  20140701  International Business Machines Corporation  Systems and methods for automated support for repairing input model errors 
CA2788356C (en) *  20110831  20160503  Accenture Global Services Limited  Data quality analysis and management system 
US20160098654A1 (en) *  20141001  20160407  Morgan Stanley  Data quality analysis tool 
US20170185926A1 (en) *  20151228  20170629  Sap Se  Object registration 
US10242079B2 (en)  20161107  20190326  Tableau Software, Inc.  Optimizing execution of data transformation flows 
CN107168995A (en) *  20170329  20170915  联想(北京)有限公司  A kind of data processing method and server 
US10394691B1 (en) *  20171005  20190827  Tableau Software, Inc.  Resolution of data flow errors using the lineage of detected error conditions 
Citations (5)
Publication number  Priority date  Publication date  Assignee  Title 

US5842202A (en) *  19961127  19981124  Massachusetts Institute Of Technology  Systems and methods for data quality management 
US20030065541A1 (en) *  20010323  20030403  Restaurant Services, Inc.  System, method and computer program product for adding supply chain components in a supply chain management analysis 
US20030233249A1 (en) *  20020325  20031218  Walsh John G.  Method and system for enterprise business process management 
US20040015381A1 (en) *  20020109  20040122  Johnson Christopher D.  Digital cockpit 
US20060161814A1 (en) *  20030709  20060720  Carl Wocke  Method and system of data analysis using neural networks 

NonPatent Citations (1)
Title 

Elizabeth M. Pierce, "Assessing Data Quality with Control Matrices," Communications of the ACM, Vol. 47, No. 2, Feb. 2004, pp. 82-86 *
Cited By (33)
Publication number  Priority date  Publication date  Assignee  Title 

US8738425B1 (en)  20080314  20140527  DataInfoCom USA Inc.  Apparatus, system and method for processing, analyzing or displaying data related to performance metrics 
US8209218B1 (en)  20080314  20120626  DataInfoCom Inc.  Apparatus, system and method for processing, analyzing or displaying data related to performance metrics 
US8364519B1 (en)  20080314  20130129  DataInfoCom USA Inc.  Apparatus, system and method for processing, analyzing or displaying data related to performance metrics 
US20090322782A1 (en) *  20080627  20091231  Microsoft Corporation  Dashboard controls to manipulate visual data 
US10114875B2 (en) *  20080627  20181030  Microsoft Technology Licensing, Llc  Dashboard controls to manipulate visual data 
US20110119523A1 (en) *  20091116  20110519  International Business Machines Corporation  Adaptive remote decision making under quality of information requirements 
US8660022B2 (en)  20091116  20140225  International Business Machines Corporation  Adaptive remote decision making under quality of information requirements 
US20120290543A1 (en) *  20100618  20121115  HCL America Inc.  Accounting for process data quality in process analysis 
US20110313812A1 (en) *  20100618  20111222  HCL America Inc.  Accounting for data dependencies in process models, analysis, and management 
US8996479B2 (en)  20100929  20150331  Microsoft Technology Licensing, Llc  Comparing and selecting data cleansing service providers 
US8510276B2 (en)  20100929  20130813  Microsoft Corporation  Comparing and selecting data cleansing service providers 
US8751436B2 (en)  20101117  20140610  Bank Of America Corporation  Analyzing data quality 
US10248672B2 (en)  20110919  20190402  Citigroup Technology, Inc.  Methods and systems for assessing data quality 
WO2013043686A1 (en) *  20110919  20130328  Citigroup Technology, Inc.  Methods and systems for assessing data quality 
CN103135981A (en) *  20111025  20130605  德商赛克公司  Selective change propagation techniques for supporting partial roundtrips in modeltomodel transformations 
US20130151423A1 (en) *  20111209  20130613  Wells Fargo Bank, N.A.  Valuation of data 
US9678487B1 (en)  20121009  20170613  DataInfoCom USA, Inc.  System and method for allocating a fixed quantity distributed over a set of quantities 
US9031889B1 (en) *  20121109  20150512  DataInfoCom USA Inc.  Analytics scripting systems and methods 
US9230211B1 (en)  20121109  20160105  DataInfoCom USA, Inc.  Analytics scripting systems and methods 
US9424518B1 (en)  20121109  20160823  DataInfoCom USA, Inc.  Analytics scripting systems and methods 
US10371857B1 (en)  20130529  20190806  DataInfoCom USA, Inc.  System and method for well log analysis 
US9617834B1 (en)  20130826  20170411  DataInfoCom USA, Inc.  Prescriptive reservoir asset management 
US9605529B1 (en)  20130826  20170328  DataInfoCom USA, Inc.  Prescriptive reservoir asset management 
US9785731B1 (en)  20130826  20171010  DataInfoCom USA, Inc.  Prescriptive reservoir asset management 
US9617843B1 (en)  20130826  20170411  DataInfoCom USA, Inc.  Prescriptive reservoir asset management 
US10095984B1 (en)  20131113  20181009  DataInfoCom USA, Inc.  System and method for well trace analysis 
US10095926B1 (en)  20131113  20181009  DataInfoCom USA, Inc.  System and method for well trace analysis 
US10095982B1 (en)  20131113  20181009  DataInfoCom USA, Inc.  System and method for well trace analysis 
US10095983B1 (en)  20131113  20181009  DataInfoCom USA, Inc.  System and method for well trace analysis 
US9600504B2 (en)  20140908  20170321  International Business Machines Corporation  Data quality analysis and cleansing of source data with respect to a target system 
US10055431B2 (en)  20140908  20180821  International Business Machines Corporation  Data quality analysis and cleansing of source data with respect to a target system 
TWI557674B (en) *  20140917  20161111  東芝股份有限公司  Quality controlling device and control method thereof 
US10324778B2 (en)  20170227  20190618  International Business Machines Corporation  Utilizing an error prediction and avoidance component for a transaction processing system 
Also Published As
Publication number  Publication date 

US9836713B2 (en)  20171205 
US20070198312A1 (en)  20070823 
US20160371612A1 (en)  20161222 
Legal Events
Date  Code  Title  Description 

STCB  Information on status: application discontinuation 
Free format text: ABANDONED  FAILURE TO RESPOND TO AN OFFICE ACTION 