US11106643B1 - System and method for integrating systems to implement data quality processing - Google Patents
System and method for integrating systems to implement data quality processing
- Publication number
- US11106643B1 (application US15/666,685; publication US201715666685A)
- Authority
- US
- United States
- Prior art keywords
- data quality
- data
- rule
- quality rule
- production environment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
Abstract
System and method for integrating systems to implement data quality processing. A business rule creation module is configured to create a business rule associated with a business term. A data quality specification module is configured to create a data quality specification based on the business rule. The data quality specification comprises (1) an identity of a column of a table stored in a database comprising data to be tested; (2) a test to perform on the data to be tested; and (3) reference data required to perform the test on the data. A validation module is configured to receive the data quality specification; retrieve data associated with the column from the database; and test the retrieved data in accordance with the test using the reference data. A result publication module is configured to return a result of the test to the data quality specification module.
Description
The invention relates to integrating systems to implement data quality assessments.
Embodiments of the present invention are directed to a system and method for integrating systems to implement data quality processing. A business rule creation module is configured to create a business rule associated with a business term. A data quality specification module is configured to create a data quality specification based on the business rule. The data quality specification comprises (1) an identity of a column of a table stored in a database comprising data to be tested; (2) a test to perform on the data to be tested; and (3) reference data required to perform the test on the data. A validation module is configured to receive the data quality specification; retrieve data associated with the column from the database; and test the retrieved data in accordance with the test using the reference data. A result publication module is configured to return a result of the test to the data quality specification module.
In some embodiments, the data quality specification further comprises at least one of (1) an effective date for the test; and (2) a test pass/fail indicator.
In some embodiments, the system further includes a lineage module configured to maintain data describing the test performed on the data.
The foregoing summary, as well as the following detailed description of embodiments of the invention, will be better understood when read in conjunction with the appended drawings of an exemplary embodiment. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
In the drawings:
FIG. 1 is a diagram illustrating an exemplary system of the present invention;
FIGS. 2A and 2B are exemplary user interfaces that may be used in accordance with a preferred embodiment of the present invention;
FIG. 3 is a flow diagram illustrating an exemplary process of the present invention; and
FIG. 4 is an exemplary system for carrying out embodiments of the present invention.
Maintaining data quality is a significant issue facing organizations. If a data quality issue is observed or suspected, even in a single record, it is necessary to systematically identify the full universe of exceptions/data quality issues that may exist to quantify the size and scope of the issue and understand the impact on an organization.
The present invention allows for systematic identification of data quality issues that may exist within the processing systems of an organization. More particularly, the invention involves creation of a well-defined data quality specification from metadata stored in a first system in a manner that a second system can readily access, ingest, and implement a data quality rule in accordance with the specification and a defined workflow.
With reference to FIG. 1, the overall system 100 is described. First system 110 is one that allows for creation of a data quality rule specification and stores data, metadata and the specifications. The second system 120 is one that receives data quality rule specifications and context data from first system 110 and, using such information, retrieves data from one or more databases 115, as indicated by the rule specification, and tests such data in accordance with the rules indicated in the rule specifications. An end user may access system 100 via a user device 105 employing a web browser. All communications among the components of system 100 may be through central server 125.
Business rules and specifications are created in first system 110 and second system 120 retrieves the data to be tested from a particular table in a database 115 as specified in the specification. Second system 120 performs the testing on the data retrieved from the database 115 and sends the results back to first system 110. The results of the testing comprise data describing the number of exceptions to the rule.
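The round trip just described can be sketched as follows. This is an illustrative sketch only; the function and field names are assumptions for exposition, not identifiers disclosed in the patent:

```python
# Illustrative sketch (assumed names) of the round trip between the two
# systems: the first system supplies a rule specification, the second
# system retrieves the column data, tests it, and reports the number of
# exceptions back.

def run_specification(spec, fetch_column):
    """Test every value of the specified column and count exceptions."""
    values = fetch_column(spec["column"])      # e.g. SCHEMA.TABLE.TABLECOLUMN
    exceptions = [v for v in values if not spec["test"](v)]
    return {"rows_tested": len(values), "exceptions": len(exceptions)}

# Example: the FICO rule described in the specification (0, or greater
# than 300 and at most 850).
spec = {
    "column": "SCHEMA.TABLE.TABLECOLUMN",
    "test": lambda v: v == 0 or 300 < v <= 850,
}
# A stand-in for the database retrieval performed by the second system.
result = run_specification(spec, lambda col: [0, 720, 850, 200, 901])
```

Here 200 and 901 violate the rule, so the result reported back would describe two exceptions out of five rows tested.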
With reference to FIGS. 2A and 2B, exemplary interfaces 200 and 210 are illustrated. Such interfaces may be accessed by user device 105 employing a web browser. Interface 200 may be used by a business stakeholder/end user to create, review and approve a proposed business rule in first system 110 (using, e.g., a business rule creation module) by way of a workflow. The proposed business rule may include the name of the business rule, an English description of the rule, the associated business term, the data quality specification that implements the rule, and the policy implemented by the business rule. By way of example:
Business Rule: Original FICO Score
English Description of the Rule: The FICO Equivalent at Acquisition must be 0 or between 300 and 850.
Business Term: Original FICO Score
Data Quality Rule Specification:
-
- https://businessentity.firstsystem.com/term/DQRuleSpecificationExample (a link to the content of the data quality rule specification)
Policy Implemented by the Business Rule:
-
- Ensuring FICO score is within acceptable limits
Interface 210 may be used by a data steward/technical stakeholder to create the data quality and profiling rule specification, corresponding to the business rule, in second system 120 (using, e.g., a data quality specification module). The data quality and profiling rule specification may include the data quality specification name; an English description of the rule itself; an effective date of the rule; a threshold; an execution frequency; the columns in the table that the data quality rule specification will apply to; the data quality dimension; the categorization by rule family; the allowed value set; and the referenced columns.
By way of example:
Data Quality Specification Name: Original FICO Score
Data Quality Rule Description: FICO Equivalent at Acquisition must be 0 or a number greater than 300 and less than or equal to 850
Effective Date of the Rule: Jun. 2, 2017
Threshold: 97
Execution Frequency: Monthly
Columns in the table that the data quality rule specification will apply to: SCHEMA.TABLE.TABLECOLUMN (identifying a column in a specified database)
Data Quality Dimension: Completeness
Categorization by Rule Family: Functional
Allowed Value Set: Must be a number between 0 and 1000
Referenced Columns: SCHEMA.TABLE.TABLECOLUMN (identifying a column in a specified database)
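The specification fields above can be collected into a simple structure. The following is a hypothetical sketch; the class and field names are illustrative assumptions, not a schema published by the patent:

```python
from dataclasses import dataclass, field

@dataclass
class DataQualityRuleSpecification:
    # Fields mirror the example specification above; names are assumed.
    name: str
    description: str
    effective_date: str
    threshold: float            # percent of records that must pass
    execution_frequency: str
    columns: list               # columns the rule applies to
    dimension: str              # e.g. completeness, validity, accuracy
    rule_family: str            # e.g. functional, null check, range
    allowed_value_set: str
    referenced_columns: list = field(default_factory=list)

# Populated with the example values from the text.
spec = DataQualityRuleSpecification(
    name="Original FICO Score",
    description="FICO Equivalent at Acquisition must be 0 or a number "
                "greater than 300 and less than or equal to 850",
    effective_date="2017-06-02",
    threshold=97,
    execution_frequency="Monthly",
    columns=["SCHEMA.TABLE.TABLECOLUMN"],
    dimension="Completeness",
    rule_family="Functional",
    allowed_value_set="Must be a number between 0 and 1000",
    referenced_columns=["SCHEMA.TABLE.TABLECOLUMN"],
)
```

Serializing such a structure is one plausible way the first system could pass everything the second system needs in a single integrated workflow message, as the text describes.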
The threshold may be, for example, the percentage of records that must pass out of the whole for the rule to be considered as passing, regardless of the number of exceptions to the rule. Other thresholds or scoring mechanisms may be used within the scope of the present invention.
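Under this scoring scheme, the rule passes when the percentage of passing records meets the threshold, however many absolute exceptions exist. A minimal sketch of that check (the function name is an assumption):

```python
def passes_threshold(rows_passed, rows_failed, threshold_pct):
    """Rule passes when the passing percentage meets the threshold,
    regardless of the absolute number of exceptions."""
    total = rows_passed + rows_failed
    passing_pct = 100.0 * rows_passed / total
    return passing_pct >= threshold_pct
```

With the exemplary results given later in the text (3,606,376 rows passed, 133 failed, threshold 97), the passing fraction is about 99.996%, so the rule passes despite 133 exceptions.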
Data quality dimensions may include completeness (the proportion of stored data against the potential of 100% complete); consistency (data sets match across data stores); uniqueness (a data record will be recorded once based upon how that record is identified); validity (data sets conform to the rules and syntax of its definition); accuracy (the degree to which data captured correctly describes the real world object or event being described); and timeliness (the degree to which data represents reality from the required point in time).
The categorization by rule family describes the type of check that is being performed (e.g., functional, null check, range/valid values, service level agreement, standard, reconciliation).
The allowed value set is maintained in the first system 110 and passed to the second system 120 so that the second system 120 can use the allowed values in connection with the testing performed by it. This avoids needing to maintain allowed value sets in multiple platforms. Instead, in accordance with aspects of the present invention, all the information needed by the second system 120 to perform the test is passed to it by way of an integrated workflow.
The reference columns are indications of database columns containing data that will be needed as a reference in connection with performing the data quality test (e.g., FICO Score can be zero if the value of a SCHEMA.TABLE.TABLECOLUMN such as ACTIVE FLAG is “Y”). Second system 120 validates the data quality rule specification and a data quality analyst may then write the appropriate data quality rule in second system 120 based on the specification. System 120 then executes the data quality rules (e.g., using a validation module) and results/metrics of the data quality rule execution are fed back into first system 110 (e.g., using a result publication module). Thus, once the test is run, the results are published to the record associated with the data quality rule and can be accessed via a link to the results.
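A check that combines the range rule with such a referenced column might look like the following. The exact logic is a hypothetical assembly of the two examples in the text (the range rule, and the ACTIVE FLAG condition), not a rule disclosed verbatim:

```python
def fico_rule(fico, active_flag):
    """Hypothetical check combining the FICO range rule with a
    referenced column: a zero value is acceptable when the referenced
    ACTIVE FLAG column is 'Y'; otherwise the value must be greater
    than 300 and at most 850."""
    if fico == 0:
        return active_flag == "Y"
    return 300 < fico <= 850
```

The point of passing the referenced column along with the specification is exactly this: the second system can evaluate the conditional part of the rule without consulting the first system again.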
Link to Results:
-
- https://businessentity.systemone.com/term/DQRuleResults
Exemplary Results:
-
- 3,606,376 rows passed, 133 rows failed
- Passing fraction = 99.99631% . . . Threshold: 97
- Result “Pass”
In some embodiments of the present invention, the first system 110 can obtain access to data lineage (e.g., processing history/audit trail) for a specific schema.table.tablecolumn. This streamlines the process of generating and documenting/accessing the flow of data.
In some embodiments, the process of creating a business rule, a corresponding data quality specification, implementation of the specification, testing of data in accordance with the specification, and delivering results of the testing, is managed in accordance with an automated workflow. Thus, for example, upon completion of a business rule by a business stakeholder, a notification is sent to a technical stakeholder that a business rule has been completed and a corresponding data quality specification must be created. Tasks associated with the same may be maintained in a queue, and completed according to priority. Notifications of tasks to be completed may be sent, e.g., via electronic mail or indicated by way of a web application interface.
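The queue-and-notify portion of this workflow can be sketched as follows. The class design and names are assumptions for illustration; the patent does not prescribe a particular implementation:

```python
import heapq

class WorkflowQueue:
    """Minimal sketch (assumed design) of the automated workflow:
    completion of a business rule enqueues a specification task for a
    technical stakeholder, ordered by priority, and fires a notification
    hook (e.g. electronic mail or a web application interface)."""

    def __init__(self, notify):
        self._heap = []
        self._counter = 0   # tie-breaker keeps FIFO order within a priority
        self._notify = notify

    def business_rule_completed(self, rule_name, priority=1):
        heapq.heappush(self._heap, (priority, self._counter, rule_name))
        self._counter += 1
        self._notify(f"Create data quality specification for: {rule_name}")

    def next_task(self):
        """Return the highest-priority (lowest number), oldest unworked task."""
        return heapq.heappop(self._heap)[2]

messages = []
q = WorkflowQueue(messages.append)
q.business_rule_completed("Original FICO Score", priority=2)
q.business_rule_completed("Active Flag Check", priority=1)
```

A priority heap with an insertion counter matches the behavior described: tasks are completed according to priority, falling back to oldest-first among equal priorities.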
With reference to FIG. 3, a flow diagram illustrating a method of the present invention is shown. In step 300, the process for defining a rule specification is commenced. Steps 1.1 through 1.4 are performed in, e.g., the first system 110 of FIG. 1, in a preferred embodiment. In step 1.1, the business rule for a business term is defined (using, e.g., interface 200 of FIG. 2A). In step 1.2, a data quality rule specification is created, corresponding to the business rule (using, e.g., interface 210 of FIG. 2B). In step 1.3, the data quality rule for the table column is defined. In step 1.4, the rule is transmitted for implementation in, e.g., central server 125 of FIG. 1. In step 1.5, it is determined if central server 125 has received the rule. If not, in step 1.6, an error notification is sent and the process is repeated from step 1.3. If central server 125 has received the rule, the process for implementing the rule configuration is commenced, in central server 125.
In step 2.0, the new rule is populated in the work queue of the central server 125. In step 2.1, an end user is notified of the new rule and that it is in the production work queue. In step 2.2, a data quality control analyst logs into the production work queue of central server 125. In step 2.3, the oldest or prioritized unworked rule is selected from the work queue. In step 2.4, the rule is tagged with the status and assigned to a data quality analyst. In step 2.5, the code for the rule configuration is written and tested. In step 2.6, the test results are reviewed and validated by the data steward. In step 2.7, the rule configuration performance is reviewed and fine-tuned. In step 2.10, code is written such that the data quality rules will run automatically. In step 2.8, it is determined if the data quality rule is ready for production. If not, the process repeats from step 2.4. If so, in step 2.9 the rule configuration and automatic running is scheduled for the next product release. In step 2.11, all needed change control approvals are obtained. In step 2.15, a notification is sent to data stewards and other interested persons that the rule is completed. Referring back now to step 2.12, in a parallel process, the central server 125 reads the status of the rules in the work queue and assesses whether the rule is finished, in step 2.13. If so, the process returns to step 2.15. If not, in step 2.14, an automatic notification is sent to the interested individuals, indicating the unfinished rules.
At this point, the rule is ready to be deployed in the production environment. In step 3.1, the rule is promoted to the production environment, including the automatic running of the rule, in step 4.0. In step 3.2, the rule configuration is run in production mode. In step 3.3, the test results are reviewed and validated with the data steward. In step 3.4, the rule configuration performance is reviewed and fine-tuned. In step 3.5, it is determined whether the rules are acceptable to deploy. If not, in step 3.6, the rule configuration is rolled back and the process begins again with step 2.4. If so, in step 3.7, the results are viewed in the first system 110. If the results are not acceptable, the process moves to step 3.6 with a roll back of the rule configuration. If the results are acceptable, the process moves to step 3.8, where it is determined whether the rule is scheduled to be released. If not, the process returns to step 3.7. If so, the process moves to step 3.9, where data stewards and other interested parties are notified that the rule configuration is implemented. From there, testing is performed by the second system 120 and the results of the testing are made available.
Use of the data quality specification described herein, as a metadata-driven workflow, represents an improvement to technology because it imposes automated controls over the data quality review process, thereby eliminating errors and inconsistencies. Through use of the inventive process, a complete record is ensured which documents traceability between a rule, the specification, the technical elements, the requirements, and the business context in a manner that imposes automated controls.

In some embodiments, the methods are carried out by a system that employs a client/server architecture such as, for example, the exemplary embodiments described as follows with reference to FIG. 4. The data that may be used as an input to the system and the outputs from the system(s) may be stored in one or more databases 401. Database server(s) 402 may include a database services management application 403 that manages storage and retrieval of data from the database(s) 401. The databases 401 may be relational databases; however, other data organizational structures may be used without departing from the scope of the present invention. Thus, for example, database 401 may be representative of database 115 of FIG. 1.
One or more application server(s) 404 are in communication with the database server 402. The application server 404 communicates requests for data to the database server 402. The database server 402 retrieves the requested data. The application server 404 may also send data to the database server 402 for storage in the database(s) 401. The application server 404 comprises one or more processors 405, non-transitory computer readable storage media 407 that store programs (computer readable instructions) for execution by the processor(s), and an interface 406 between the processor(s) 405 and computer readable storage media 407. The application server 404 may store the computer programs and code used to implement the methods of the present invention. Thus, for example, first system 110 and second system 120 could take the form of application server 404 and would store the software necessary to implement creation of the business rule, data quality specification, and running of the data quality testing.
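The division of labor just described, an application server forwarding storage and retrieval requests to a database server, can be sketched minimally. The classes below are stand-ins assumed for illustration only; the reference numerals in the comments map back to FIG. 4 but the code is not the patented implementation.

```python
# Minimal sketch of the application-server / database-server split
# described for FIG. 4. All class and method names are hypothetical.

class DatabaseServer:
    """Stands in for database server 402 managing database(s) 401."""
    def __init__(self):
        self._store = {}

    def save(self, key, value):
        self._store[key] = value

    def fetch(self, key):
        return self._store.get(key)


class ApplicationServer:
    """Stands in for application server 404 communicating with 402."""
    def __init__(self, db):
        self.db = db

    def store_rule(self, rule_id, rule):
        # send data to the database server for storage
        self.db.save(rule_id, rule)

    def load_rule(self, rule_id):
        # communicate a request for data to the database server
        return self.db.fetch(rule_id)


app = ApplicationServer(DatabaseServer())
app.store_rule("DQ-001", {"test": "not null"})
print(app.load_rule("DQ-001"))
```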
To the extent data and information are communicated over a network (e.g., the Internet or an intranet), one or more network servers 408 may be employed. The network server 408 also comprises one or more processors 409, computer readable storage media 411 that store programs (computer readable instructions) for execution by the processor(s), and an interface 410 between the processor(s) 409 and computer readable storage media 411. The network server 408 is employed to deliver content that can be accessed through the communications network 412, e.g., by an end user employing computing device 413 (e.g., device 105 of FIG. 1). When data is requested through an application, such as an Internet browser, the network server 408 receives and processes the request. The network server 408 then sends the requested data or application along with user interface instructions for displaying an interface on device 413, such as a point of sale terminal.
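The request handling just described, receive a request from a device and return the requested data together with user interface instructions, can be sketched as a simple dispatch function. The path, payload shape, and UI instruction string below are hypothetical assumptions for illustration.

```python
# Sketch of the network server's request handling: a request arrives over
# communications network 412 and the response carries both the data and
# user-interface instructions for rendering on device 413. The routing
# table and payloads are hypothetical.

def handle_request(path, content_store):
    """Return (status, body) for a request for the given path."""
    if path in content_store:
        return 200, {
            "data": content_store[path],
            "ui_instructions": "render as data-quality dashboard",
        }
    return 404, {"error": "not found"}


store = {"/rules/DQ-001": {"status": "passed"}}
status, body = handle_request("/rules/DQ-001", store)
print(status, body["data"])
```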
The computers referenced herein are specially programmed to perform the functionality described herein.
The non-transitory computer readable storage media (e.g., 407 or 411) that store the programs (i.e., software modules comprising computer readable instructions) may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer readable storage media may include, but is not limited to, RAM, ROM, Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer system and processed.
It will be appreciated by those skilled in the art that changes could be made to the exemplary embodiments shown and described above without departing from the broad inventive concept thereof. It is understood, therefore, that this invention is not limited to the exemplary embodiments shown and described, but it is intended to cover modifications within the spirit and scope of the present invention as defined by the claims. For example, specific features of the exemplary embodiments may or may not be part of the claimed invention and features of the disclosed embodiments may be combined. Unless specifically set forth herein, the terms “a”, “an” and “the” are not limited to one element but instead should be read as meaning “at least one”.
It is to be understood that at least some of the figures and descriptions of the invention have been simplified to focus on elements that are relevant for a clear understanding of the invention, while eliminating, for purposes of clarity, other elements that those of ordinary skill in the art will appreciate may also comprise a portion of the invention. However, because such elements are well known in the art, and because they do not necessarily facilitate a better understanding of the invention, a description of such elements is not provided herein.
Further, to the extent that the method does not rely on the particular order of steps set forth herein, the particular order of the steps should not be construed as limitation on the claims. The claims directed to the method of the present invention should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the steps may be varied and still remain within the spirit and scope of the present invention.
Claims (27)
1. A computer-implemented method, comprising:
generating a business rule;
generating a data quality specification for data to be tested, wherein the data quality specification is generated using the business rule and stored metadata, and wherein the data quality specification includes a test to perform on a set of specified reference data;
generating a data quality rule corresponding to the data quality specification, wherein the data quality rule is specific to the test, and wherein the data quality rule is used to determine whether data in a production environment conforms to the data quality specification;
transmitting the data quality rule, wherein when the data quality rule is received at a validation module, the validation module repeatedly executes a process for validating the data quality rule on the set of specified reference data, wherein the specified reference data is specific data used for the test, and wherein the data quality rule is continually updated until a determination is made to promote the data quality rule;
receiving results of the process to validate the data quality rule, wherein the results include the updated data quality rule;
promoting the updated data quality rule to a production environment;
automating the updated data quality rule to automatically process data in the production environment; and
implementing the updated data quality rule on data in the production environment to determine whether the data in the production environment conforms to the data quality specification.
2. The method of claim 1 , further comprising:
automatically transmitting a notification indicating the generation of the business rule.
3. The method of claim 1 , wherein the business rule is generated using an automated workflow.
4. The method of claim 1 , wherein the business rule includes a policy implemented by the business rule.
5. The method of claim 1 , wherein the data quality specification includes a threshold corresponding to a percentage of records needed for the data quality rule to pass.
6. The method of claim 5 , wherein the percentage of records includes a number of allowed exceptions to the data quality rule.
7. The method of claim 1 , wherein the data quality specification includes an allowed value set that is maintained in a single platform.
8. The method of claim 1 , wherein the data quality specification includes an audit trail which includes a processing history of the set of reference data and wherein the data quality specification includes stored metadata indicating data location information of the set of reference data.
9. The method of claim 1 , wherein the validation module repeatedly executes a new process for validating the data quality rule on a new set of specified reference data in a pre-production environment and wherein the data quality rule is continually updated until a new determination is made to promote the data quality rule.
10. A system, comprising:
one or more processors; and
a non-transitory computer-readable medium including instructions that, when executed by the one or more processors, cause the one or more processors to perform operations, including:
generating a business rule;
generating a data quality specification for data to be tested, wherein the data quality specification is generated using the business rule and stored metadata, and wherein the data quality specification includes a test to perform on a set of specified reference data;
generating a data quality rule corresponding to the data quality specification, wherein the data quality rule is specific to the test, and wherein the data quality rule is used to determine whether data in a production environment conforms to the data quality specification;
transmitting the data quality rule, wherein when the data quality rule is received at a validation module, the validation module repeatedly executes a process for validating the data quality rule on the set of specified reference data, wherein the specified reference data is specific data used for the test, and wherein the data quality rule is continually updated until a determination is made to promote the data quality rule;
receiving results of the process to validate the data quality rule, wherein the results include the updated data quality rule;
promoting the updated data quality rule to a production environment;
automating the updated data quality rule to automatically process data in the production environment; and
implementing the updated data quality rule on data in the production environment to determine whether the data in the production environment conforms to the data quality specification.
11. The system of claim 10 , further comprising:
automatically transmitting a notification indicating the generation of the business rule.
12. The system of claim 10 , wherein the business rule is generated using an automated workflow.
13. The system of claim 10 , wherein the business rule includes a policy implemented by the business rule.
14. The system of claim 10 , wherein the data quality specification includes a threshold corresponding to a percentage of records needed for the data quality rule to pass.
15. The system of claim 14 , wherein the percentage of records includes a number of allowed exceptions to the data quality rule.
16. The system of claim 10 , wherein the data quality specification includes an allowed value set that is maintained in a single platform.
17. The system of claim 10 , wherein the data quality specification includes an audit trail which includes a processing history of the set of reference data and wherein the data quality specification includes stored metadata indicating data location information of the set of reference data.
18. The system of claim 10 , wherein the validation module repeatedly executes a new process for validating the data quality rule on a new set of specified reference data in a pre-production environment and wherein the data quality rule is continually updated until a new determination is made to promote the data quality rule.
19. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions that, when executed by one or more processors, cause the one or more processors to:
generate a business rule;
generate a data quality specification for data to be tested, wherein the data quality specification is generated using the business rule and stored metadata, and wherein the data quality specification includes a test to perform on a set of specified reference data;
generate a data quality rule corresponding to the data quality specification, wherein the data quality rule is specific to the test, and wherein the data quality rule is used to determine whether data in a production environment conforms to the data quality specification;
transmit the data quality rule, wherein when the data quality rule is received at a validation module, the validation module repeatedly executes a process for validating the data quality rule on the set of specified reference data, wherein the specified reference data is specific data used for the test, and wherein the data quality rule is continually updated until a determination is made to promote the data quality rule;
receive results of the process to validate the data quality rule, wherein the results include the updated data quality rule;
promote the updated data quality rule to a production environment;
automate the updated data quality rule to automatically process data in the production environment; and
implement the updated data quality rule on data in the production environment to determine whether the data in the production environment conforms to the data quality specification.
20. The computer-program product of claim 19 , further comprising instructions to cause the one or more processors to:
automatically transmit a notification indicating the generation of the business rule.
21. The computer-program product of claim 19 , wherein the business rule is generated using an automated workflow.
22. The computer-program product of claim 19 , wherein the business rule includes a policy implemented by the business rule.
23. The computer-program product of claim 19 , wherein the data quality specification includes a threshold corresponding to a percentage of records needed for the data quality rule to pass.
24. The computer-program product of claim 23 , wherein the percentage of records includes a number of allowed exceptions to the data quality rule.
25. The computer-program product of claim 19 , wherein the data quality specification includes an allowed value set that is maintained in a single platform.
26. The computer-program product of claim 19 , wherein the data quality specification includes an audit trail which includes a processing history of the set of reference data and wherein the data quality specification includes stored metadata indicating data location information of the set of reference data.
27. The computer-program product of claim 19 , wherein the validation module repeatedly executes a new process for validating the data quality rule on a new set of specified reference data in a pre-production environment and wherein the data quality rule is continually updated until a new determination is made to promote the data quality rule.
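Claims 5, 14, and 23 above describe a pass threshold expressed as a percentage of records, and claims 6, 15, and 24 frame the same limit in terms of allowed exceptions. A minimal sketch of such a check follows; the function name, record shape, and example rule are hypothetical illustrations, not taken from the patent.

```python
# Hypothetical sketch of the threshold check in claims 5-6: a data quality
# rule passes when the percentage of conforming records meets the
# threshold; equivalently, failures stay within the allowed exceptions.

def rule_passes(records, predicate, threshold_pct):
    """True when at least threshold_pct percent of records satisfy predicate."""
    if not records:
        return True  # vacuously passes; a real system would flag empty input
    passed = sum(1 for r in records if predicate(r))
    return 100.0 * passed / len(records) >= threshold_pct


records = [{"ssn": "123456789"}, {"ssn": "12345"}, {"ssn": "987654321"}]
is_nine_digits = lambda r: len(r["ssn"]) == 9 and r["ssn"].isdigit()
print(rule_passes(records, is_nine_digits, threshold_pct=60))
```

Two of the three sample records conform (about 66.7 percent), so the rule passes at a 60 percent threshold but would fail at 70 percent, which is the same as allowing one exception in three records.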
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/666,685 US11106643B1 (en) | 2017-08-02 | 2017-08-02 | System and method for integrating systems to implement data quality processing |
US17/382,405 US20220019566A1 (en) | 2017-08-02 | 2021-07-22 | System and method for integrating systems to implement data quality processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/666,685 US11106643B1 (en) | 2017-08-02 | 2017-08-02 | System and method for integrating systems to implement data quality processing |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/382,405 Continuation US20220019566A1 (en) | 2017-08-02 | 2021-07-22 | System and method for integrating systems to implement data quality processing |
Publications (1)
Publication Number | Publication Date |
---|---|
US11106643B1 (en) | 2021-08-31 |
Family
ID=77464923
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/666,685 Active 2037-10-03 US11106643B1 (en) | 2017-08-02 | 2017-08-02 | System and method for integrating systems to implement data quality processing |
US17/382,405 Abandoned US20220019566A1 (en) | 2017-08-02 | 2021-07-22 | System and method for integrating systems to implement data quality processing |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/382,405 Abandoned US20220019566A1 (en) | 2017-08-02 | 2021-07-22 | System and method for integrating systems to implement data quality processing |
Country Status (1)
Country | Link |
---|---|
US (2) | US11106643B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12050568B2 (en) * | 2022-09-09 | 2024-07-30 | Genworth Holdings, Inc. | System and method for implementing a data quality framework and engine |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8700577B2 (en) * | 2009-12-07 | 2014-04-15 | Accenture Global Services GmbH | Method and system for accelerated data quality enhancement |
US8412735B2 (en) * | 2009-12-07 | 2013-04-02 | Accenture Global Services Limited | Data quality enhancement for smart grid applications |
US10565172B2 (en) * | 2017-02-24 | 2020-02-18 | International Business Machines Corporation | Adjusting application of a set of data quality rules based on data analysis |
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050182739A1 (en) * | 2004-02-18 | 2005-08-18 | Tamraparni Dasu | Implementing data quality using rule based and knowledge engineering |
US20090076841A1 (en) * | 2006-12-07 | 2009-03-19 | Baker Geoffrey B | Rules-based software and methods for health care measurement applications and uses thereof |
US20090018996A1 (en) * | 2007-01-26 | 2009-01-15 | Herbert Dennis Hunt | Cross-category view of a dataset using an analytic platform |
US8781882B1 (en) * | 2008-08-07 | 2014-07-15 | Accenture Global Services Limited | Automotive industry high performance capability assessment |
US20110173149A1 (en) * | 2010-01-13 | 2011-07-14 | Ab Initio Technology Llc | Matching metadata sources using rules for characterizing matches |
US20120311426A1 (en) * | 2011-05-31 | 2012-12-06 | Oracle International Corporation | Analysis of documents using rules |
US10013439B2 (en) * | 2011-06-27 | 2018-07-03 | International Business Machines Corporation | Automatic generation of instantiation rules to determine quality of data migration |
US20130297567A1 (en) * | 2012-05-04 | 2013-11-07 | International Business Machines Corporation | Data stream quality management for analytic environments |
US20170235848A1 (en) * | 2012-08-29 | 2017-08-17 | Dennis Van Dusen | System and method for fuzzy concept mapping, voting ontology crowd sourcing, and technology prediction |
US20140075028A1 (en) * | 2012-09-10 | 2014-03-13 | Bank Of America Corporation | Centralized Data Provisioning |
US20150121280A1 (en) * | 2013-10-29 | 2015-04-30 | Matt Slatner | Systems and Methods for Data Transfer with Graphic Interface |
US20150261796A1 (en) * | 2014-03-13 | 2015-09-17 | Ab Initio Technology Llc | Specifying and applying logical validation rules to data |
US20160011926A1 (en) * | 2014-07-08 | 2016-01-14 | International Business Machines Corporation | Method for processing data quality exceptions in a data processing system |
US20160070724A1 (en) * | 2014-09-08 | 2016-03-10 | International Business Machines Corporation | Data quality analysis and cleansing of source data with respect to a target system |
US20160070725A1 (en) * | 2014-09-08 | 2016-03-10 | International Business Machines Corporation | Data quality analysis and cleansing of source data with respect to a target system |
US20160070733A1 (en) * | 2014-09-10 | 2016-03-10 | Ab Initio Technology Llc | Conditional validation rules |
US20160098654A1 (en) * | 2014-10-01 | 2016-04-07 | Morgan Stanley | Data quality analysis tool |
US20180113898A1 (en) * | 2016-10-25 | 2018-04-26 | Mastercard International Incorporated | Systems and methods for assessing data quality |
US20190138627A1 (en) * | 2017-11-06 | 2019-05-09 | Bank Of America Corporation | Dynamic Lineage Validation System |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230099164A1 (en) * | 2021-09-29 | 2023-03-30 | Jpmorgan Chase Bank, N.A. | Systems and methods for automated data quality semantic constraint identification using rich data type inferences |
US12105687B2 (en) * | 2021-09-29 | 2024-10-01 | Jpmorgan Chase Bank, N.A. | Systems and methods for automated data quality semantic constraint identification using rich data type inferences |
CN115328948A (en) * | 2022-02-22 | 2022-11-11 | 杭州美创科技有限公司 | Master data quality management method, master data quality management device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US20220019566A1 (en) | 2022-01-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11249981B2 (en) | Data quality analysis | |
US20220019566A1 (en) | System and method for integrating systems to implement data quality processing | |
US8813178B1 (en) | Systems and methods for preparing and submitting documents to comply with securities regulations | |
WO2016141491A1 (en) | Systems and methods for managing data | |
US20180285394A1 (en) | System and Method for Data Integration | |
US8522202B2 (en) | System and method for managing computer environment setup requests | |
CN111767340B (en) | Data processing method, device, electronic equipment and medium | |
US9513997B2 (en) | Test data management | |
US11645194B2 (en) | Systems for enterprise-wide end-to-end automated software testing | |
US11816163B2 (en) | Systems and methods for improved transactional mainframes | |
CN111340584A (en) | Method, device, equipment and storage medium for determining fund side | |
US11196751B2 (en) | System and method for controlling security access | |
CN108038667B (en) | Policy generation method, device and equipment | |
US11200102B1 (en) | System for tracking transaction data across applications | |
CN117036062A (en) | Accounting resource calculation method and device | |
US20200258093A1 (en) | Compliance standards mapping | |
US9230284B2 (en) | Centrally managed and accessed system and method for performing data processing on multiple independent servers and datasets | |
US7516048B2 (en) | Externalized metric calculation engine | |
CN113515510A (en) | Data processing method, data processing system, electronic device and readable storage medium | |
US11831490B1 (en) | Systems, methods, and media for performing information technology service management correlation for infrastructure environment functions | |
CN114327377B (en) | Method and device for generating demand tracking matrix, computer equipment and storage medium | |
US12086815B2 (en) | Approaches for analyzing entity relationships | |
US20220027991A1 (en) | Systems and methods for tracking and reporting user interactions with data objects | |
US10402390B1 (en) | Model validation system | |
CN116382871A (en) | Information change identification and task rerun method, device, equipment, medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |