US8838652B2 - Techniques for application data scrubbing, reporting, and analysis - Google Patents
- Publication number
- US8838652B2 (application US12/050,414)
- Authority
- US
- United States
- Prior art keywords
- data
- machine
- report
- merge
- rules
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
- G06F17/30—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Definitions
- a typical enterprise in today's highly automated environment can have a variety of systems and data sources. Each system can produce different versions of the same data types that the enterprise manages and tracks. So, similar or same data is often repetitively stored within the enterprise. In fact, with some data sources the information may be incomplete whereas in other data sources the information may be more robust.
- a method for data analysis is provided. A first schema for a first data source and a second schema for a second data source are acquired. The first and second schemas are used for detecting data types and patterns for the data types in both the data sources. Next, some first patterns associated with the first data source are matched to other second patterns associated with the second data source in response to matching rules. Finally, a report is generated that identifies the matched first patterns of the first data source to the second patterns of the second data source.
- FIG. 1 is a diagram of a method for data analysis, according to an example embodiment.
- FIG. 2 is a diagram of a method for processing a data analysis tool, according to an example embodiment.
- FIG. 3 is a diagram of a data analysis system, according to an example embodiment.
- FIG. 4 is a diagram of another data analysis system, according to an example embodiment.
- a “schema” as used herein refers to a file or table that defines a data source's structure and syntax.
- Some example schemas can include extensible markup language (XML) schemas, relational database schemas, directory schemas, and the like.
- a schema describes limitations on the structure of the universe of data that can be associated with a particular data source.
- a “data source” refers to a repository for the data that a schema defines.
- the repository can be a relational database table, a file, a directory, etc.
- a data source is produced by applications in an automated fashion or produced in a semi-automated fashion via interfaces that users interact with.
- a data source can include data produced in a strictly automated fashion via processing applications and at the same time include manually entered data received from a user via a Graphical User Interface (GUI), such as World-Wide Web (WWW) site via WWW pages and interfaces, a SQL update, proprietary applications' interfaces, etc.
- the techniques presented herein are implemented in whole or in part in the Novell® network and proxy server products, directory services products, operating system products, and/or identity based products, distributed by Novell®, Inc., of Provo, Utah.
- FIG. 1 is a diagram of a method 100 for data analysis, according to an example embodiment.
- the method 100 (hereinafter “data analysis service”) is implemented in a machine-accessible and readable medium.
- the data analysis service is operational over and processes within a network.
- the network may be wired, wireless, or a combination of wired and wireless.
- the data analysis service acquires a first schema for a first data source and a second schema for a second data source.
- the schemas include the structural and syntactical restrictions associated with identifying and validating data types within the data housed in the data sources.
- the acquisition of the first and second schemas can occur in a variety of manners.
- a data analyst may specifically identify the schemas and the data sources via an interface, such as a World-Wide Web (WWW) page/form.
- the data sources may be identified and the unique identities associated with the data sources permit a repository to be queried and the proper schemas returned.
- a policy may be used to construct a name or identifier for the corresponding schema. Once the schema name or identifier is known, the data analysis service can acquire the schema of interest.
- the data analysis service uses the first and second schemas to detect data types and patterns for those data types in both the data sources.
- the schema defines data types and their corresponding syntax and/or structure.
- the data analysis service uses this information to parse the data sources and identify data types and patterns from data in the data sources.
- the schemas are defined in extensible markup language (XML) as XML schema definitions (XSD's).
- the schema provides at least some structure and syntax for initially recognizing and parsing data types and patterns that occur in the data source to which the schema is associated.
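The detection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the schema is reduced to a hypothetical dictionary (`SCHEMA`) that maps data type names to regular expressions, and `signature` is an assumed helper that abstracts a value into a pattern by replacing digits with N and letters with A.

```python
import re
from collections import defaultdict

# Hypothetical, simplified stand-in for a schema: each data type name maps to
# a regular expression describing the syntax allowed for its values.
SCHEMA = {
    "phone-number": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]+"),
}


def signature(value):
    """Abstract a value into a pattern: digits become N, letters become A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("N")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def detect(records, schema):
    """Return {data type: set of pattern signatures} for schema-valid values."""
    found = defaultdict(set)
    for record in records:
        for field, value in record.items():
            rule = schema.get(field)
            if rule is not None and rule.fullmatch(value):
                found[field].add(signature(value))
    return dict(found)


records = [
    {"phone-number": "(801) 555-0100", "email": "ann@example.com"},
    {"phone-number": "(801) 555-0199", "email": "not-an-email"},
]
patterns = detect(records, SCHEMA)
```

Values that fail the schema's syntax check (here, `not-an-email`) contribute no pattern; a fuller sketch might record them for a later black list report.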
- the data analysis service matches some first patterns associated with the first data source to other second patterns associated with the second data source in response to matching rules.
- the matching rules provide a link between data types or patterns across the two data sources.
- the matching rules can be acquired from a Meta schema that ties the first schema to the second schema, such that the matching rules are pre-existing and acquired via inspection of the Meta schema.
- the matching rules are acquired in response to a predefined policy that associates patterns or data types between the two schemas.
- the matching rules are predefined but as stated above can be acquired in a variety of manners and from a variety of sources.
- An example matching rule may match a first data type identified in the first data source as phone-number with a second data type identified in the second data source as contact-information, even when the first data type is 10 digits (3 digit U.S. area code plus traditional 7 digits) and the second data type is 13 digits (3 digit country code, 3 digit area code, and 7 digit phone number). So, the matching rule provides a mechanism to automatically match patterns or data types across the two data sources.
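The phone-number example above might be sketched as follows. The rule format (`MATCHING_RULE`) and the helper names are hypothetical; the sketch assumes that two entries match when their trailing 10 national digits agree, which is one plausible reading of the example rather than a rule the source spells out.

```python
import re

# Hypothetical matching rule: ties "phone-number" in the first source to
# "contact-information" in the second, comparing national digits only.
MATCHING_RULE = {
    "first_type": "phone-number",
    "second_type": "contact-information",
    "national_digits": 10,
}


def national_number(value, digits):
    """Strip separators and keep only the trailing national digits."""
    only_digits = re.sub(r"\D", "", value)
    return only_digits[-digits:]


def match(first_rows, second_rows, rule):
    """Pair entries across the two sources whose national numbers agree."""
    n = rule["national_digits"]
    index = {}
    for row in second_rows:
        index[national_number(row[rule["second_type"]], n)] = row
    pairs = []
    for row in first_rows:
        key = national_number(row[rule["first_type"]], n)
        if key in index:
            pairs.append((row, index[key]))
    return pairs


first = [{"id": "u1", "phone-number": "(801) 555-0100"}]
second = [
    {"id": "e7", "contact-information": "001.801.555.0100"},
    {"id": "e8", "contact-information": "001.212.555.0123"},
]
matched = match(first, second, MATCHING_RULE)
```

Here the 10 digit entry `u1` is paired with the 13 digit entry `e7` because both reduce to the same national number.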
- the data analysis service applies policy against the data in the first and second data sources to generate statistics to use with the matching rules.
- Meta conditions defined in policies can indicate that certain metrics about the data in the data sources are to be captured as the data sources are parsed and analyzed. Some metrics may include pattern variations for each defined data type, the frequency with which a particular pattern for a particular data type occurs within a data source, identification of data source entries where sub data types are missing under a parent data type when required to be present in accordance with that data source's schema, etc. These metrics or statistics can be used as conditions that are evaluated for the rules to take actions, such as not matching entries that lack a corresponding sub data type required to be present. So, application of policy can be used to generate statistics that are fed into the matching rules, and the matching rules may rely on or use the statistics as part of their application.
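A minimal sketch of such statistics gathering is shown below. The function names, the record layout, and the use of an empty string for a missing sub data type are all assumptions made for illustration.

```python
from collections import Counter


def signature(value):
    """Abstract a value into a pattern: digits become N, letters become A."""
    return "".join(
        "N" if c.isdigit() else "A" if c.isalpha() else c for c in value
    )


def collect_statistics(entries, data_type, required_subtypes):
    """Count pattern variations and note entries missing required sub types."""
    pattern_counts = Counter()
    incomplete = []
    for position, entry in enumerate(entries):
        value = entry.get(data_type)
        if value:
            pattern_counts[signature(value)] += 1
        missing = [s for s in required_subtypes if not entry.get(s)]
        if missing:
            incomplete.append((position, missing))
    return {"pattern_counts": pattern_counts, "incomplete": incomplete}


def eligible_for_matching(entries, stats):
    """Example rule condition: skip entries missing a required sub data type."""
    skip = {position for position, _ in stats["incomplete"]}
    return [e for i, e in enumerate(entries) if i not in skip]


entries = [
    {"phone": "(801) 555-0100", "home-address": "1 Main St"},
    {"phone": "801.555.0101", "home-address": ""},
    {"phone": "(801) 555-0102", "home-address": "9 Oak Ave"},
]
stats = collect_statistics(entries, "phone", ["home-address"])
usable = eligible_for_matching(entries, stats)
```

The statistics feed a rule condition: the second entry lacks a home address, so it is excluded before matching.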
- the data analysis service generates a matching report that identifies the matched first patterns of the first data source to the corresponding second patterns of the second data source.
- the matching report includes a variety of information, such as: identifiers for data types, statistics related to the data types, patterns, statistics for the patterns, identifiers for the matching rules, matching rules applied to particular ones of the data types and/or patterns, etc.
- the data analysis service merges selective ones of the first patterns with selective ones of the second patterns to produce a master source in response to merge rules. So, data associated with some matched patterns are merged together in a single master data source. This permits a single master data source to be generated for the enterprise in response to matching patterns and then enforcing merge rules.
- As an example merge rule, suppose a data type or pattern associated with a user in a first data source includes sub data types and data that identify office location, name, email, and supervisor for that user. Now suppose the second data source includes a matching data type or pattern for an employee that includes sub data types and data that identify social security number (SSN), salary, date of hire, age, dependents, and department number.
- the matching rules, which the data analysis service processes at 130, associate the user and employee data types together; and a merge rule that the data analysis service processes, at 150, results in combining the data associated with the user of the first data source and the data associated with the employee of the second data source in a single master data source for the enterprise. It is noted that a single master data source schema may be used to acquire the merge rule that permits the data merge.
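The user and employee example can be sketched as below. `MERGE_RULE` is a hypothetical rule format that simply lists which sub data types each side contributes; the matched pairs are assumed to come from an earlier matching step, and the field names and sample values are invented for illustration.

```python
# Hypothetical merge rule: the sub data types each source contributes
# to a merged master entry.
MERGE_RULE = {
    "from_first": ["name", "email", "office_location", "supervisor"],
    "from_second": [
        "ssn", "salary", "date_of_hire", "age", "dependents",
        "department_number",
    ],
}


def merge_pair(user, employee, rule):
    """Combine one matched user/employee pair into a single master entry."""
    master = {}
    for field in rule["from_first"]:
        if field in user:
            master[field] = user[field]
    for field in rule["from_second"]:
        if field in employee:
            master[field] = employee[field]
    return master


def build_master(matched_pairs, rule):
    """Apply the merge rule to every matched pair."""
    return [merge_pair(u, e, rule) for u, e in matched_pairs]


user = {
    "name": "Ann Lee",
    "email": "ann@example.com",
    "office_location": "Provo",
    "supervisor": "B. Ray",
}
employee = {
    "ssn": "000-00-0000",
    "salary": 50000,
    "date_of_hire": "2005-04-01",
    "age": 41,
    "dependents": 2,
    "department_number": 17,
}
master_source = build_master([(user, employee)], MERGE_RULE)
```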
- the data analysis service permits a data analyst to interact with and modify the report, which was generated at 140 , by altering the merge rules or adding new merge rules and then reiterating the processing 110 - 150 after the data analyst modifies the report.
- the report may include the merge rules or references to the merge rules or alternatively the merge rules may be completely separate from the report.
- the data analysis service is adapted to check for the existence of a pre-existing report when the data analysis service iterates the processing at 110 (at startup or initialization for a new processing iteration). The metrics and content of the report can be used to drive and modify the matching and merging of the data.
- the data analysis service produces a duplicate report that identifies selective first patterns from the first data source that are duplicated in selective second data patterns from the second data source.
- the duplicate report essentially identifies data that is duplicated across the data sources.
- the data analysis service can use the duplicate report to retain a single version of the duplicated pattern in a modified version of a master data source. Whether duplicates are retained or removed from a master data source that combines the first and second data sources can be driven by policy.
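One way to picture the duplicate report and the policy-driven choice to retain or remove duplicates is the sketch below. The key fields, the normalization (trimmed, case-insensitive comparison), and the `retain_duplicates` flag are assumptions, not details taken from the source.

```python
def duplicate_report(first_rows, second_rows, key_fields):
    """List entries of the second source that duplicate the first source."""
    def key(row):
        return tuple(str(row.get(f, "")).strip().lower() for f in key_fields)

    seen = {key(row): i for i, row in enumerate(first_rows)}
    report = []
    for j, row in enumerate(second_rows):
        k = key(row)
        if k in seen:
            report.append(
                {"key": k, "first_index": seen[k], "second_index": j}
            )
    return report


def build_master(first_rows, second_rows, report, retain_duplicates):
    """Combine both sources; policy decides whether duplicates are kept."""
    drop = set() if retain_duplicates else {d["second_index"] for d in report}
    kept_second = [r for j, r in enumerate(second_rows) if j not in drop]
    return list(first_rows) + kept_second


first = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Bo Chen", "email": "bo@example.com"},
]
second = [
    {"name": "ann lee ", "email": "ANN@example.com"},
    {"name": "Cy Diaz", "email": "cy@example.com"},
]
report = duplicate_report(first, second, ["name", "email"])
deduplicated = build_master(first, second, report, retain_duplicates=False)
everything = build_master(first, second, report, retain_duplicates=True)
```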
- the data analysis service generates white and black list reports for a master data source.
- the white list identifies data from the first and second data sources that conforms to policy.
- the black list report identifies other data from the first and second data sources that do not conform to the policy and that are to be cleaned or edited for correction to conform to the policy.
- For example, suppose policy dictates that every employee data type is to include a sub data type associated with home address and that a few entries in one of the data sources, for example the first data source, lack a home address.
- This data can be flagged in the black list report along with pointers to its location within the first data source or identifying information such that the data can be quickly located within the first data source.
- either an automated process or an editor can take the report and fix the data. This is but one example of many that can be achieved using the black list report. In fact, if a large volume of black list entries are present automated scripts may be used to correct the issues.
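The home address example can be sketched as a small report generator. The policy format (a list of required sub data types) and the report entry layout (source name plus row index as the "pointer") are hypothetical choices made for this illustration.

```python
# Hypothetical policy: sub data types every employee entry must carry.
REQUIRED_FIELDS = ["name", "home_address"]


def list_reports(sources, required_fields):
    """Split entries into a white list (conforming) and a black list."""
    white, black = [], []
    for source_name, rows in sources.items():
        for index, row in enumerate(rows):
            missing = [f for f in required_fields if not row.get(f)]
            entry = {"source": source_name, "index": index}
            if missing:
                # Keep a pointer so the entry can be located and fixed later.
                black.append({**entry, "missing": missing})
            else:
                white.append(entry)
    return white, black


sources = {
    "first": [
        {"name": "Ann Lee", "home_address": "1 Main St"},
        {"name": "Bo Chen", "home_address": ""},
    ],
    "second": [{"name": "Cy Diaz", "home_address": "9 Oak Ave"}],
}
white, black = list_reports(sources, REQUIRED_FIELDS)
```

Each black list entry records where the problem sits (`source`, `index`) and what is wrong (`missing`), which is the information an editor or a correction script would need.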
- the data analysis service is meant to be an iterative and interactive process that permits enterprise data to be analyzed, cleansed, and reported on. So, the data analysis service can iterate the first and second data sources multiple times producing revised reports (matching, merging, duplicate, white list, and/or black list reports).
- a data analyst inspects the reports and interactively modifies the reports, rules (matching and/or merging rules), and/or policy and re-executes the data analysis service.
- the end result is a master data source that conforms to enterprise data policies and reports on the state of the enterprise data sources that comprise the master data source.
- FIG. 2 is a diagram of a method 200 for processing a data analysis tool, according to an example embodiment.
- the method 200 (hereinafter “data analysis tool”) is implemented in a machine-accessible and readable medium and is operational over a network.
- the network may be wired, wireless, or a combination of wired and wireless.
- the data analysis tool presents a different and enhanced perspective to the data analysis service, which is represented by the method 100 of the FIG. 1 and which is discussed above.
- the data analysis tool interacts with a data analyst via an interface presented to the data analyst.
- the interface can be any Graphical User Interface (GUI) or command line Application Programming Interface (API) that permits the data analyst to access a variety of features associated with the data analysis tool for purposes of providing structured and automated mechanisms for analyzing, cleansing (scrubbing), and reporting on enterprise data.
- the interface is one or more WWW pages accessible via an Internet WWW browser.
- the interface is a relational database API.
- the interface is a directory-based API.
- the data analysis tool receives identifiers for data schemas and data sources associated with those data schemas from the data analyst via the interface. So, the data analyst identifies a set of data sources that the data analyst believes are related and wants to analyze, scrub, and generate reports for. This can be done in a variety of manners.
- the analyst may identify a Meta schema that provides the details for acquiring the individual data schemas and identifiers for the corresponding data sources.
- the data sources are identified and the schemas acquired in response to the identifiers associated with those data sources.
- the schemas are identified and the data sources are acquired therefrom.
- the data analysis tool acquires merge rules from the data analyst via the interface.
- the merge rules identify conditions within the data sources for merging different data types defined in the data schemas together with one another.
- the analyst may manually enter some merge rules via the interface.
- the analyst can also identify a repository for acquiring the merge rules.
- the analyst can identify a master schema that ties the data sources together in a master data source and the master schema includes the merge rules.
- the data analysis tool parses the data sources using the data schemas. When the data source is parsed, patterns are matched across the data sources and the merge rules are enforced against the matched patterns. Next, the data analysis tool produces a merge report and a master data source that combines the data sources together in accordance with the merge rules.
- the data analysis tool identifies matching rules from the data schemas.
- the matching rules assist in identifying data types and patterns in the data sources during the parsing process.
- one schema entry may provide the pattern conditions that identify an employee's phone number as “(NNN) NNN-NNNN” where N represents a numeric character.
- a phone number is represented as a 10 digit number having separators of parentheses, a space, and a dash; the 10 digit number includes an initial 3 digit area code.
- Another schema may include an entry for phone number that has pattern conditions as follows: “NNN.NNN.NNN.NNNN”. This last schema entry includes a country code (3 digits), an area code (3 digits), and a traditional phone number (7 digits); the last schema also uses a period character as a separator.
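The two pattern conditions above can be turned into executable checks with a short translation step. This sketch assumes that every `N` in a pattern condition stands for one digit and that every other character is a literal separator; `pattern_to_regex` is a hypothetical helper name.

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern condition into a compiled regular expression.

    Assumption: "N" means one numeric character; anything else is literal.
    """
    parts = [r"\d" if ch == "N" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


US_PHONE = pattern_to_regex("(NNN) NNN-NNNN")
INTL_PHONE = pattern_to_regex("NNN.NNN.NNN.NNNN")

us_ok = US_PHONE.fullmatch("(801) 555-0100") is not None
intl_ok = INTL_PHONE.fullmatch("001.801.555.0100") is not None
cross = US_PHONE.fullmatch("001.801.555.0100") is not None
```

Using `fullmatch` means a value must satisfy the whole pattern, so a value in the second schema's format does not pass the first schema's check.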
- the data analysis tool acquires from the data analyst, via the interface, one or more matching rules. So, an analyst can interactively supply matching rules to the data analysis tool for immediate enforcement during the parsing process.
- the data analysis tool receives modified merge rules from the data analyst, via the interface; these identify modified conditions within the data sources for merging the different data types and are used to re-parse the data sources to produce a modified master data source.
- the data analyst can decide in response to the merge report that modifications should be done and can use the interface to communicate the modifications as changed or even new merge rules (or matching rules) and then re-execute the parsing process of the data analysis tool to produce another version of the master data source and the merge report.
- the data analysis tool compares the merge report to one or more previously generated merge reports for profiling changes in the data types for the data sources (the data types are defined in the schemas) over a configurable period of time.
- This can produce a lot of useful information for the enterprise. For example, the enterprise may determine that a particular authoritative data source is no longer authoritative because another application and its data source have had more influence on the overall state of the enterprise data. This can be used to change policy to make that other application and its data source the authority for designated data types within the enterprise data warehouse.
- Other information can be ascertained as well, such as determining that a more universally accepted pattern is emerging for a particular data type, for example a phone number that includes a country code in addition to an area code.
- a variety of other useful information can be ascertained by profiling the data types over time; thus, the above presented examples were presented for purposes of illustration only and were not intended to limit the teachings presented herein to just the presented examples.
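Profiling across merge reports might look like the sketch below. It assumes, purely for illustration, that each merge report can be reduced to per-data-type pattern counts; the report layout, the function names, and the sample counts are hypothetical.

```python
def profile_changes(previous_report, current_report):
    """Return per-data-type changes in pattern counts between two reports."""
    changes = {}
    for data_type in set(previous_report) | set(current_report):
        before = previous_report.get(data_type, {})
        after = current_report.get(data_type, {})
        delta = {
            p: after.get(p, 0) - before.get(p, 0)
            for p in set(before) | set(after)
            if after.get(p, 0) != before.get(p, 0)
        }
        if delta:
            changes[data_type] = delta
    return changes


def dominant_pattern(report, data_type):
    """Return the most frequent pattern for a data type, if any."""
    counts = report.get(data_type, {})
    return max(counts, key=counts.get) if counts else None


# Invented sample counts from two hypothetical merge reports.
earlier = {"phone-number": {"(NNN) NNN-NNNN": 900, "NNN.NNN.NNN.NNNN": 100}}
later = {"phone-number": {"(NNN) NNN-NNNN": 400, "NNN.NNN.NNN.NNNN": 650}}

changes = profile_changes(earlier, later)
shifted = dominant_pattern(earlier, "phone-number") != dominant_pattern(
    later, "phone-number"
)
```

In this invented data the pattern with a country code overtakes the shorter one, which is the kind of emerging pattern the text describes.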
- the data analysis tool generates a duplicate report that identifies duplicate data types across different ones of the data sources.
- Policy may dictate whether the duplicates are retained or whether they are removed from the master data source.
- the analyst may view the duplicate report via the interface or via a link provided within the interface.
- the analyst may also use the interface to override policy to remove or keep duplicates in the master data source.
- the interface and the data analysis tool provide an automated mechanism for an analyst to iteratively and interactively analyze, cleanse, and generate reports on enterprise data sources.
- the data analysis tool generates a black list report that identifies data types from the data sources that are to be corrected by manual or subsequent automated mechanisms.
- the interface may permit the analyst to view the black list report and dynamically jump to the problem areas in the data sources and make manual corrections.
- the black list report can be used as input data to an automated script that then serially accesses the problem data sources and corrects the problem data.
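An automated correction script driven by the black list report could be sketched as follows. The black list entry layout (`source`, `index`, `field`) and the target phone format are assumptions; entries that cannot be fixed automatically are returned for manual correction.

```python
import re


def correct_phone(value):
    """Reformat a 10 digit phone number, or return None if not fixable."""
    digits = re.sub(r"\D", "", value)
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def apply_black_list(sources, black_list):
    """Visit each flagged entry in turn and correct it in place if possible."""
    unresolved = []
    for item in black_list:
        row = sources[item["source"]][item["index"]]
        fixed = correct_phone(row[item["field"]])
        if fixed is None:
            unresolved.append(item)  # left for manual correction
        else:
            row[item["field"]] = fixed
    return unresolved


sources = {"first": [{"phone": "801.555.0100"}, {"phone": "555-0100"}]}
black_list = [
    {"source": "first", "index": 0, "field": "phone"},
    {"source": "first", "index": 1, "field": "phone"},
]
unresolved = apply_black_list(sources, black_list)
```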
- the data analysis tool can also generate white list reports for the data sources.
- the white list may identify statistics on the data that complies with the enterprise data policies. This may be used to determine that one data source should be used over another because it is cleaner and requires less correction.
- FIG. 3 is a diagram of a data analysis system 300 , according to an example embodiment.
- the data analysis system 300 is implemented in a machine-accessible and readable medium as instructions that process on one or more machines of a network.
- the data analysis system 300 is operational over the network; the network may be wired, wireless, or a combination of wired and wireless.
- the data analysis system 300 implements, among other things, the data analysis service and the data analysis tool represented by the methods 100 and 200 of the FIGS. 1 and 2 , respectively.
- the data analysis system 300 includes a data analysis tool 301 and a data analyzer 302 . Each of these will now be discussed in turn.
- the data analysis tool 301 is implemented in a machine-accessible and computer-readable medium as instructions that execute on a machine (computer or processor-enabled device) of the network. Example processing associated with some aspects of the data analysis tool 301 was presented in detail above with reference to the method 100 of the FIG. 1 .
- the data analysis tool 301 is configured or adapted to provide an interactive interface to a data analyst.
- the data analysis tool 301 permits the data analyst to identify data sources that are to be analyzed, scrubbed, and reported on.
- the data analysis tool 301 generates a merge report, which includes statistics regarding actions taken when the data types and patterns within the data sources are identified and which identifies the merge rules and policies applied to the data in the data sources.
- the data analysis tool 301 also generates a duplicate report that identifies duplicate data types that span two or more of the data sources. Information regarding the duplicate report and processing associated with the duplicate report were presented in detail above with reference to the methods 100 and 200 of the FIGS. 1 and 2 , respectively.
- the data sources can come from a variety of enterprise information repositories or enterprise authorities, such as but not limited to: a directory, a relational database table, a file, a WWW page, output produced from an application that also processes on a machine of the network, and/or various combinations of these things.
- the analyst can use the data analysis tool 301 to modify one or more of the merge rules or policies (discussed below) during at least one iteration of the processing associated with the data analyzer 302.
- the data analyzer 302 is implemented in a machine-accessible and computer-readable medium as instructions that execute on the machine or a different machine of the network. Example processing associated with some aspects of the data analyzer 302 was presented in detail above with reference to the methods 100 and 200 of the FIGS. 1 and 2 , respectively.
- the data analyzer 302 acquires a separate data schema for each of the data sources and uses the data schemas to parse the data sources for purposes of identifying data types and patterns in the data sources. Examples and details regarding this processing were discussed in detail above with reference to the methods 100 and 200 of the FIGS. 1 and 2 , respectively.
- the data analyzer 302 uses merge rules and policies to merge some of the data types and their corresponding data from the data sources together in a master data source.
- merge rules and policies were discussed in detail above with reference to the methods 100 and 200 of the FIGS. 1 and 2 , respectively.
- the data analysis tool 301 and the data analyzer 302 combine to provide an interactive and iterative mechanism for a data analyst to have data sources of an enterprise analyzed, scrubbed, and reported on.
- the analysis includes detecting patterns and data types in an automated fashion using the schemas and producing statistics and reports regarding the analysis.
- the scrubbing or cleansing includes merging various data types and patterns in accordance with the merge rules and policies. Both the analysis and the scrubbing include reporting.
- An analyst can iteratively interact with the data analyzer 302 via the data analysis tool 301 to continually iterate over the data sources until a desired enterprise state for the master data source is achieved.
- FIG. 4 is a diagram of another data analysis system 400 , according to an example embodiment.
- the data analysis system 400 is implemented in a machine-accessible and computer-readable medium and is processed on machines of a network.
- the network may be wired, wireless, or a combination of wired and wireless.
- the data analysis system 400 implements, among other things, the data analysis service and the data analysis tool represented by the methods 100 and 200 of the FIGS. 1 and 2 , respectively.
- the data analysis system 400 presents an alternative arrangement and perspective to the data analysis system 300 discussed above with reference to the FIG. 3 .
- the data analysis system 400 includes applications 401 and a data analyzer 402 . Each of these and their interactions with one another will now be discussed in turn.
- the applications 401 are each implemented in a machine-accessible and computer-readable medium as instructions that process on the same or different machines of the network.
- Each application 401 produces application data housed in a particular data source.
- Each data source includes its own schema and that schema defines structure and syntax for data included in that data source.
- the schema may be directly associated with the output produced by the application 401 or with a data source in which the output associated with the application 401 is stored.
- each application 401 produces application data defined by its own schema.
- the data analyzer 402 is implemented in a machine-accessible and computer-readable medium as instructions that process on any machine of the network. Example processing associated with the data analyzer 402 was presented in detail within the discussion associated with the method 100 of the FIG. 1 , the method 200 of the FIG. 2 , and the system 300 of the FIG. 3 .
- the data analyzer 402 parses the application data using the schemas and further uses merging rules and policies to then map the application data to a master data source.
- the schemas provide matching rules for detecting data types and patterns in the application data. Example entries for schemas were provided above with reference to the methods 100 and 200 of the FIGS. 1 and 2 , respectively.
- the merging rules define, via conditions, which patterns or data types from one application data set are to be merged and combined with other patterns or other data types from another application data set.
- the policies can override conditions defined in the merge rules and act as Meta conditions on the merging rules.
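The relationship between merge rule conditions and overriding policies can be sketched as below. This models a policy as a function that returns `True` or `False` to force a decision, or `None` to defer to the merge rules; that three-valued convention and all names here are assumptions made for illustration.

```python
def same_name(pair):
    """Merge rule condition: merge when the two entries share a name."""
    return pair["user"]["name"] == pair["employee"]["name"]


def require_department(pair):
    """Policy: forbid merging employees that lack a department number."""
    if not pair["employee"].get("department_number"):
        return False
    return None  # no opinion; defer to the merge rules


def should_merge(pair, merge_rules, policies):
    """Policies act as meta conditions and are consulted before the rules."""
    for policy in policies:
        verdict = policy(pair)
        if verdict is not None:
            return verdict
    return any(rule(pair) for rule in merge_rules)


pair_ok = {
    "user": {"name": "Ann Lee"},
    "employee": {"name": "Ann Lee", "department_number": 17},
}
pair_vetoed = {
    "user": {"name": "Bo Chen"},
    "employee": {"name": "Bo Chen"},
}
merged_ok = should_merge(pair_ok, [same_name], [require_department])
merged_vetoed = should_merge(pair_vetoed, [same_name], [require_department])
```

The second pair satisfies the merge rule condition, yet the policy overrides it because the employee entry has no department number.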
- the data analyzer 402 iterates the application data a configurable number of times in response to modified merge rules and modified policies. So, as stated above with reference to the methods 100 and 200 of the FIGS. 1 and 2 , respectively, the data analyzer 402 is an iterative tool that a data analyst can employ to analyze, scrub, and report on the enterprise data (application data).
- the data analyzer 402 generates a merge report, a duplicate data report, a white list report, and a black list report.
- the merge report identifies statistics associated with actions taken by the data analyzer 402 in recognizing patterns and data types and in applying merge rules and enforcing policies to produce the master data source.
- the duplicate data report identifies data types that are potentially duplicates of one another across different sets of the application data.
- the white list report provides details on the correctness of the data content included in the application data sets in view of enterprise data policies.
- the black list report provides details on perceived errors in the data content of the application data sets in view of the schemas and/or the enterprise data policies.
- the data analyzer 402 also uses a master schema to assist in mapping the application data sets to the master data source.
- the mapping may be acquired via a master schema that ties together the individual application schemas for the application data sets and provides the mapping from them to the master data source.
- once the master data source is generated, the data analyzer 402 acquires a unique identity for it from an identity manager.
- the identity manager processes on a machine of the network and provides unique identity assignments to resources of the enterprise for use in security enforcement within the network of the enterprise.
- the identity manager may also supply authentication services to the resources of the enterprise. Also, security restrictions are enforced against the master data source via the identity manager. So, the master data source can be locked down once the data analyst believes that it is in an acceptable state or condition.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/050,414 US8838652B2 (en) | 2008-03-18 | 2008-03-18 | Techniques for application data scrubbing, reporting, and analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/050,414 US8838652B2 (en) | 2008-03-18 | 2008-03-18 | Techniques for application data scrubbing, reporting, and analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
US20090240694A1 US20090240694A1 (en) | 2009-09-24 |
US8838652B2 US8838652B2 (en) | 2014-09-16 |
Family
ID=41089891
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/050,414 Expired - Fee Related US8838652B2 (en) | 2008-03-18 | 2008-03-18 | Techniques for application data scrubbing, reporting, and analysis |
Country Status (1)
Country | Link |
---|---|
US (1) | US8838652B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11442952B2 (en) * | 2018-09-24 | 2022-09-13 | Salesforce, Inc. | User interface for commerce architecture |
US11442969B2 (en) * | 2020-04-24 | 2022-09-13 | Capital One Services, Llc | Computer-based systems configured for efficient entity resolution for database merging and reconciliation |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8458148B2 (en) * | 2009-09-22 | 2013-06-04 | Oracle International Corporation | Data governance manager for master data management hubs |
US8407191B1 (en) * | 2010-06-29 | 2013-03-26 | Emc Corporation | Priority based data scrubbing on a deduplicated data store |
US8614966B1 (en) | 2011-12-19 | 2013-12-24 | Sprint Communications Company L.P. | Wireless communication device that determines per-resource data call efficiency metrics |
US10754830B2 (en) * | 2014-08-07 | 2020-08-25 | Netflix, Inc. | Activity information schema discovery and schema change detection and notification |
US11514069B1 (en) * | 2016-06-10 | 2022-11-29 | Amazon Technologies, Inc. | Aggregation of contextual data and internet of things (IoT) device data |
US11720553B2 (en) | 2016-11-11 | 2023-08-08 | Sap Se | Schema with methods specifying data rules, and method of use |
US10452628B2 (en) * | 2016-11-11 | 2019-10-22 | Sap Se | Data analysis schema and method of use in parallel processing of check methods |
US11157563B2 (en) * | 2018-07-13 | 2021-10-26 | Bank Of America Corporation | System for monitoring lower level environment for unsanitized data |
US11062052B2 (en) * | 2018-07-13 | 2021-07-13 | Bank Of America Corporation | System for provisioning validated sanitized data for application development |
US11200239B2 (en) * | 2020-04-24 | 2021-12-14 | International Business Machines Corporation | Processing multiple data sets to generate a merged location-based data set |
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6148298A (en) | 1998-12-23 | 2000-11-14 | Channelpoint, Inc. | System and method for aggregating distributed data |
US6507843B1 (en) | 1999-08-14 | 2003-01-14 | Kent Ridge Digital Labs | Method and apparatus for classification of data by aggregating emerging patterns |
US20030208460A1 (en) * | 2002-05-06 | 2003-11-06 | Ncr Corporation | Methods, systems and data structures to generate and link reports |
US20030225752A1 (en) | 1999-08-04 | 2003-12-04 | Reuven Bakalash | Central data warehouse with integrated data aggregation engine for performing centralized data aggregation operations |
US20040103124A1 (en) * | 2002-11-26 | 2004-05-27 | Microsoft Corporation | Hierarchical differential document representative of changes between versions of hierarchical document |
US20040181543A1 (en) * | 2002-12-23 | 2004-09-16 | Canon Kabushiki Kaisha | Method of using recommendations to visually create new views of data across heterogeneous sources |
US20050039117A1 (en) * | 2003-08-15 | 2005-02-17 | Fuhwei Lwo | Method, system, and computer program product for comparing two computer files |
US20050060332A1 (en) * | 2001-12-20 | 2005-03-17 | Microsoft Corporation | Methods and systems for model matching |
US20060117057A1 (en) * | 2004-11-30 | 2006-06-01 | Thomas Legault | Automated relational schema generation within a multidimensional enterprise software system |
US20060136428A1 (en) * | 2004-12-16 | 2006-06-22 | International Business Machines Corporation | Automatic composition of services through semantic attribute matching |
US20060155725A1 (en) * | 2004-11-30 | 2006-07-13 | Canon Kabushiki Kaisha | System and method for future-proofing devices using metaschema |
US20060238919A1 (en) | 2005-04-20 | 2006-10-26 | The Boeing Company | Adaptive data cleaning |
US7219104B2 (en) | 2002-04-29 | 2007-05-15 | Sap Aktiengesellschaft | Data cleansing |
US7240279B1 (en) * | 2002-06-19 | 2007-07-03 | Microsoft Corporation | XML patterns language |
US20070239769A1 (en) * | 2006-04-07 | 2007-10-11 | Cognos Incorporated | Packaged warehouse solution system |
US20080027958A1 (en) | 2006-07-31 | 2008-01-31 | Microsoft Corporation | Data Cleansing for a Data Warehouse |
US20080046874A1 (en) * | 2006-08-21 | 2008-02-21 | International Business Machines Corporation | Data reporting application programming interfaces in an xml parser generator for xml validation and deserialization |
US20080052294A1 (en) * | 2002-09-26 | 2008-02-28 | Larkin Michael K | Web services data aggregation system and method |
US20090006315A1 (en) * | 2007-06-29 | 2009-01-01 | Sougata Mukherjea | Structured method for schema matching using multiple levels of ontologies |
US20090006156A1 (en) * | 2007-01-26 | 2009-01-01 | Herbert Dennis Hunt | Associating a granting matrix with an analytic platform |
US20090070237A1 (en) * | 2007-09-11 | 2009-03-12 | Goldman Sachs & Co. | Data reconciliation |
US7505888B2 (en) * | 2004-11-30 | 2009-03-17 | International Business Machines Corporation | Reporting model generation within a multidimensional enterprise software system |
US20090240726A1 (en) * | 2008-03-18 | 2009-09-24 | Carter Stephen R | Techniques for schema production and transformation |
Non-Patent Citations (3)
Title |
---|
Chimezie Ogbuji, "Validating XML with Schematron", Nov. 22, 2000, XML.com, pp. 1-6. * |
James W. Hunt and M. Douglas McIlroy, "An Algorithm for Differential File Comparison", Jun. 1976, Bell Laboratories, Computing Science Technical Report, pp. 1-9. * |
Leigh Dodds, "Schematron: validating XML using XSLT", Apr. 2001, ingenta ltd, pp. 1-16. * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11442952B2 (en) * | 2018-09-24 | 2022-09-13 | Salesforce, Inc. | User interface for commerce architecture |
US11442969B2 (en) * | 2020-04-24 | 2022-09-13 | Capital One Services, Llc | Computer-based systems configured for efficient entity resolution for database merging and reconciliation |
US20220405310A1 (en) * | 2020-04-24 | 2022-12-22 | Capital One Services, Llc | Computer-based systems configured for efficient entity resolution for database merging and reconciliation |
US11640416B2 (en) * | 2020-04-24 | 2023-05-02 | Capital One Services, Llc | Computer-based systems configured for efficient entity resolution for database merging and reconciliation |
US20230259535A1 (en) * | 2020-04-24 | 2023-08-17 | Capital One Services, Llc | Computer-based systems configured for efficient entity resolution for database merging and reconciliation |
US11934431B2 (en) * | 2020-04-24 | 2024-03-19 | Capital One Services, Llc | Computer-based systems configured for efficient entity resolution for database merging and reconciliation |
Also Published As
Publication number | Publication date |
---|---|
US20090240694A1 (en) | 2009-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8838652B2 (en) | Techniques for application data scrubbing, reporting, and analysis | |
Kagdi et al. | Blending conceptual and evolutionary couplings to support change impact analysis in source code | |
Zhang et al. | On complexity and optimization of expensive queries in complex event processing | |
US8516309B1 (en) | Method of debugging a software system | |
US7917815B2 (en) | Multi-layer context parsing and incident model construction for software support | |
US8615526B2 (en) | Markup language based query and file generation | |
US8005803B2 (en) | Best practices analyzer | |
US11599539B2 (en) | Column lineage and metadata propagation | |
US10417430B2 (en) | Security remediation | |
US20130179863A1 (en) | Bug variant detection using program analysis and pattern identification | |
Wang et al. | Synthesizing mapping relationships using table corpus | |
US20110302187A1 (en) | Schema definition generating device and schema definition generating method | |
Fürber et al. | Using semantic web resources for data quality management | |
US9706005B2 (en) | Providing automatable units for infrastructure support | |
WO2017041578A1 (en) | Method and device for acquiring database change information | |
Ruijters et al. | FFORT: a benchmark suite for fault tree analysis | |
Vo et al. | Discovering Conditional Functional Dependencies in XML Data. | |
US7844601B2 (en) | Quality of service feedback for technology-neutral data reporting | |
Sun et al. | A transformation‐based approach to testing concurrent programs using UML activity diagrams | |
Sneed | Testing a web application | |
Abbott et al. | Automated recognition of event scenarios for digital forensics | |
Iqbal et al. | Interlinking developer identities within and across open source projects: The linked data approach | |
Thaler et al. | The IWi process model corpus | |
Bahana et al. | Web crawler and back-end for news aggregator system (Noox project) | |
Polychniatis et al. | Detecting cross-language dependencies generically |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: NOVELL, INC., UTAH Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JENSEN, NATHAN BLAINE;SCHEUBER-HEINZ, VOLKER GUNNAR;CARTER, STEPHEN R.;AND OTHERS;SIGNING DATES FROM 20080313 TO 20080317;REEL/FRAME:020807/0822 |
AS | Assignment | Owner name: NOVELL INTELLECTUAL PROPERTY HOLDINGS, INC., WASHI Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CPTN HOLDINGS LLC;REEL/FRAME:027465/0206 Effective date: 20110909 Owner name: CPTN HOLDINGS LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NOVELL,INC.;REEL/FRAME:027465/0227 Effective date: 20110427 |
AS | Assignment | Owner name: NOVELL INTELLECTUAL PROPERTY HOLDING, INC., WASHIN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CPTN HOLDINGS LLC;REEL/FRAME:027325/0131 Effective date: 20110909 |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: RPX CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NOVELL INTELLECTUAL PROPERTY HOLDINGS, INC.;REEL/FRAME:037809/0057 Effective date: 20160208 |
AS | Assignment | Owner name: JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT, IL Free format text: SECURITY AGREEMENT;ASSIGNORS:RPX CORPORATION;RPX CLEARINGHOUSE LLC;REEL/FRAME:038041/0001 Effective date: 20160226 |
AS | Assignment | Owner name: RPX CLEARINGHOUSE LLC, CALIFORNIA Free format text: RELEASE (REEL 038041 / FRAME 0001);ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:044970/0030 Effective date: 20171222 Owner name: RPX CORPORATION, CALIFORNIA Free format text: RELEASE (REEL 038041 / FRAME 0001);ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:044970/0030 Effective date: 20171222 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.) |
AS | Assignment | Owner name: JEFFERIES FINANCE LLC, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:RPX CORPORATION;REEL/FRAME:046486/0433 Effective date: 20180619 |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20180916 |
AS | Assignment | Owner name: RPX CORPORATION, CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JEFFERIES FINANCE LLC;REEL/FRAME:054486/0422 Effective date: 20201023 |