US20100114629A1 - Extracting Enterprise Information Through Analysis of Provenance Data - Google Patents

Extracting Enterprise Information Through Analysis of Provenance Data

Publication number
US20100114629A1
Authority
US
United States
Prior art keywords
data
enterprise
provenance
graph
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/265,993
Other versions
US9053437B2 (en
Inventor
Sharon C. Adler
Francisco Phelan Curbera
Yurdaer Nezihi Doganata
Chung-Sheng Li
Douglas C. Lovell
Axel Martens
Kevin Patrick McAuliffe
Huong Thu Morris
Nirmal K. Mukhi
Aleksander A. Slominski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US12/265,993 priority Critical patent/US9053437B2/en
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ADLER, SHARON C., CURBERA, FRANCISCO PHELAN, DOGANATA, YURDAER NEZIHI, LI, CHUNG-SHENG, MCAULIFFE, KEVIN PATRICK, MUKHI, NIRMAL K., LOVELL, DOUGLAS C., MARTENS, AXEL, MORRIS, HUONG THU, SLOMINSKI, ALEKSANDER A.
Publication of US20100114629A1 publication Critical patent/US20100114629A1/en
Application granted granted Critical
Publication of US9053437B2 publication Critical patent/US9053437B2/en
Expired - Fee Related legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0633 Workflow analysis
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals

Definitions

  • the present application is related to the U.S. patent applications respectively identified as: (i) attorney docket no. YOR920080508US1, entitled “Processing of Provenance Data for Automatic Discovery of Enterprise Process Information;” (ii) attorney docket no. YOR920080509US1, entitled “Validating Compliance in Enterprise Operations Based On Provenance Data;” and (iii) attorney docket no. YOR920080592US1, entitled “Influencing Behavior of Enterprise Operations During Process Enactment Using Provenance Data,” all of which are filed concurrently herewith, and the disclosures of which are incorporated by reference herein in their entirety.
  • the present invention relates to provenance data and, more particularly, to techniques for extracting information through analysis of provenance data.
  • Illustrative embodiments of the invention provide techniques for extracting information through analysis of provenance data.
  • a computer-implemented method of extracting information regarding an execution of an enterprise process comprises the following steps.
  • Provenance data is generated, wherein the provenance data is based on collected data associated with an actual end-to-end execution of the enterprise process and is indicative of a lineage of one or more data items.
  • a provenance graph is generated that provides a visual representation of the generated provenance data, wherein nodes of the graph represent records associated with the collected data and edges of the graph represent relations between the records. At least a portion of the generated provenance data from the graph is analyzed so as to extract information about the execution of the enterprise process based on the analysis.
  • embodiments of the invention provide techniques to extract information about an enterprise process from its execution traces which are captured by employing enterprise provenance graph technology. Hence, analysis of an enterprise provenance graph and its data is performed to extract information and discover knowledge about enterprise practices, policies, operational aspects, process patterns, workflow, statistics and models.
  • FIG. 1 illustrates a system for collecting and processing provenance data for automatic discovery of enterprise process information, according to an embodiment of the invention.
  • FIG. 2 illustrates a provenance record, according to an embodiment of the invention.
  • FIG. 3 illustrates a provenance data model, according to an embodiment of the invention.
  • FIG. 4A illustrates an enterprise application scenario used to generate a sample provenance graph, according to an embodiment of the invention.
  • FIG. 4B illustrates a provenance graph extracted from an enterprise scenario, according to an embodiment of the invention.
  • FIG. 4C illustrates a provenance sub-graph that represents a control-point, according to an embodiment of the invention.
  • FIG. 5 illustrates a provenance graph enrichment process, according to an embodiment of the invention.
  • FIG. 6 illustrates an enterprise process information extraction system, according to an embodiment of the invention.
  • FIG. 7 illustrates process pattern extraction and pattern search components, according to an embodiment of the invention.
  • FIG. 8 illustrates a statistical pattern analysis methodology, according to an embodiment of the invention.
  • FIG. 9 illustrates an example of extracting workflow information, according to an embodiment of the invention.
  • FIG. 10 illustrates a relationship discovery methodology, according to an embodiment of the invention.
  • FIG. 11 illustrates a relation discovery subsystem, according to an embodiment of the invention.
  • FIG. 12 illustrates an example of finding a relationship between two enterprise objects, according to an embodiment of the invention.
  • FIG. 13 illustrates a computer system in accordance with which one or more components/steps of the techniques of the invention may be implemented, according to an embodiment of the invention.
  • enterprise is understood to broadly refer to any entity that is created or formed to achieve some purpose, examples of which include, but are not limited to, an undertaking, an endeavor, a venture, a business, a concern, a corporation, an establishment, a firm, an organization, or the like.
  • enterprise processes are processes that the enterprise performs in the course of attempting to achieve that purpose.
  • enterprise processes may comprise business processes.
  • the term “provenance” is understood to broadly refer to an indication or determination of where something, such as a unit of data, came from or an indication or determination of what it was derived from. That is, the term “provenance” refers to the history or lineage of a particular item.
  • “provenance information” or “provenance data” is information or data that provides this indication or the results of such a determination.
  • enterprise provenance data may comprise business provenance data.
  • Principles of the invention provide a technique to extract information about an enterprise process from its execution traces which are captured by employing enterprise provenance graph technology.
  • a focus of the invention is the analysis of an enterprise provenance graph to extract information and discover knowledge about enterprise practices, policies, operational aspects, process patterns and models.
  • provenance technology is utilized to capture various aspects of the enterprise operations from relevant system events.
  • various data records that capture different aspects of the operations are collected and correlated to form a graph which is called a “provenance graph.”
  • Provenance records reflecting data objects, tasks, or processes are created by listening to the underlying events of the IT (information technology) infrastructure.
  • the records collected about enterprise processes are connected through correlation and a representation of end-to-end process is obtained.
  • graph analysis techniques are used to extract information related to the enterprise practices.
  • One embodiment of the invention provides a sub-system that extracts common process patterns that are followed during the execution of the enterprise process. In the absence of a model, or when the execution does not follow the model, extracted patterns help to understand the actual process, optimize the system and compare it with existing solutions. A comprehensive list of process patterns is published in the literature and available for reuse. Once the patterns that are employed in the enterprise process are discovered, they are indexed and stored. In addition to extracting common process patterns, embodiments of the invention allow searching for a specific process pattern. A custom pattern may be specified through a user interface, and a pattern search function is invoked to search for this particular pattern.
  • Another embodiment of the invention provides for the statistical analysis of the executed paths.
  • Statistics of the execution traces reveal the enterprise practices generally adopted within the enterprise, exceptional situations and exception rates, performance parameters such as expected delays, throughput, etc. More importantly, graph analysis techniques help to predict how the enterprise process will evolve. This way it becomes possible to answer questions like how long a task will last to completion and what are the likely tasks and activities that will come next.
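  • As a rough illustration of the statistics described above, the following Python sketch counts path frequencies, computes an exception ratio, and estimates the likely next task from a handful of traces. The trace format (an ordered list of task names per process instance), the task names and all function names are assumptions made for this example; they are not taken from the disclosed system.

```python
from collections import Counter

# Hypothetical execution traces: one ordered list of task names per process instance.
TRACES = [
    ["CreateChallenge", "SendClaim", "EvaluateChallenge"],
    ["CreateChallenge", "SendClaim", "EvaluateChallenge"],
    ["CreateChallenge", "SendClaim", "RejectClaim"],
    ["CreateChallenge", "EvaluateChallenge"],
]


def path_frequencies(traces):
    """Count how often each complete execution path occurs."""
    return Counter(tuple(trace) for trace in traces)


def exception_ratio(traces, normal_path):
    """Fraction of traces that deviate from the path assumed to be normal."""
    deviations = sum(1 for trace in traces if tuple(trace) != tuple(normal_path))
    return deviations / len(traces)


def next_task_probabilities(traces, current_task):
    """Estimate which task follows current_task from observed successions."""
    followers = Counter()
    for trace in traces:
        for here, nxt in zip(trace, trace[1:]):
            if here == current_task:
                followers[nxt] += 1
    total = sum(followers.values())
    if total == 0:
        return {}
    return {task: count / total for task, count in followers.items()}
```

  • Delay and throughput statistics could be added in the same way if each trace entry also carried start and end timestamps, which this sketch leaves out.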
  • Yet another embodiment of the invention provides for the discovery of the workflow from the execution traces.
  • the actual workflow of the execution traces may differ from the initial workflow model.
  • the analysis reveals the discrepancies.
  • Workflow is the depiction of a series of tasks and information flow that are executed to fulfill a requirement or a goal. It is a visual abstraction of the process.
  • One other embodiment provides for the discovery of a relationship between two enterprise objects that are part of the process.
  • a methodology is described that discovers the relationship between two objects that are not directly connected or correlated within the provenance graph. This is a complex relation formed by combining the atomic relation between other objects on the path that connects the two objects. Finding relations is important to discover roles and relations and help auditing to ensure enterprise integrity.
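  • One way to picture a relation composed from atomic relations is a shortest-path search over the relation edges. The Python sketch below uses breadth-first search, which is a stand-in chosen for this example rather than the methodology claimed here. The relation names are borrowed from the provenance graph vocabulary, but the edge directions, the sample edges and the function name are assumptions.

```python
from collections import deque

# Hypothetical atomic relations: (source, relation, target).
RELATIONS = [
    ("SalesManager", "actor", "CreateChallenge"),
    ("CreateChallenge", "writes", "OfferedChallenge"),
    ("OfferedChallenge", "priorVersion", "AchievedChallenge"),
    ("SalesEmployee", "employeeOf", "SalesManager"),
]


def find_relation_path(relations, start, goal):
    """Breadth-first search for the shortest chain of atomic relations.

    Edges are followed in both directions; a relation walked against its
    direction is reported with a leading '~'. Returns None if no path exists.
    """
    neighbors = {}
    for source, name, target in relations:
        neighbors.setdefault(source, []).append((name, target))
        neighbors.setdefault(target, []).append(("~" + name, source))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for name, nxt in neighbors.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, name, nxt)]))
    return None
```

  • In this sketch the returned chain of (source, relation, target) steps is the composite relation between two objects that have no direct edge.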
  • the provenance graph encapsulates various aspects of the enterprise and embodiments of the invention disclose a methodology to discover these aspects. These include people and roles (referred to as “organizational aspect”), the tasks and activities executed (referred to as “functional aspect”), and the data pedigree and flow (referred to as “data aspect”).
  • Section I provides illustrative embodiments of an enterprise provenance approach that provides for creation and maintenance of a provenance data model and graph. This approach is disclosed in the above-referenced U.S. patent application identified as attorney docket no. YOR920080508US1, entitled “Processing of Provenance Data for Automatic Discovery of Enterprise Process Information,” filed concurrently herewith and incorporated by reference herein in its entirety. Section II of the detailed description below then provides description of the above-mentioned illustrative embodiments for extracting enterprise practices and policies through inspection and analysis of provenance data.
  • an enterprise provenance approach as one that comprises capturing and managing the lineage of enterprise artifacts to discover functional, organizational, data and resource aspects of an enterprise. Examining enterprise provenance data gives insight into the chain of cause and effect relations and facilitates understanding the root causes of the resultant event.
  • our approach comprises the following steps: (1) identifying the control points, relevant enterprise artifacts and required correlations; (2) probing the actual execution of the enterprise process to collect data; (3) correlating and enriching the collected data and the relations among them to create a provenance graph; (4) analyzing aggregated information to enable enterprise activity monitoring or to interfere with the execution by generating alerts; and (5) providing access to information stored in the graph for detailed investigation and root cause analysis.
  • FIG. 1 shows a system for capturing and processing provenance data for automatic discovery of enterprise process information, according to an embodiment of the invention.
  • the enterprise process information discovery system comprises storage unit 101 , multi-capturing/recording components 103 , provenance data management sub-system 107 , rules library 109 , provenance graph enrichment engine 111 , text analysis engine 110 , enterprise data repository 120 , provenance data query interface 113 , graph visualizer 117 and dashboard 115 .
  • the provenance data management component 107 supports the specification of the provenance data model 105 , i.e., the list of enterprise objects to be captured and the level of details. It is also used to define the correlation rules between two data records.
  • Capturing/recording components 103 are used to capture, process, and reformat application events of the underlying information system 100 (including, for example, computers, servers, repositories, email systems and other enterprise systems) and record the meta-data of enterprise operations into the provenance store. Hence, capturing/recording components 103 map the captured event data onto the data model defined ( 122 ) by provenance data management component 107 . The information is then transferred ( 121 ) to storage unit 101 , which is the store for provenance data.
  • Provenance data management component 107 generates rules ( 130 ) that are stored in rules library 109 for provenance graph enrichment engine 111 .
  • the rules define a correlation between the enterprise artifacts which is then used to connect them in the provenance graph representation.
  • Provenance graph enrichment engine 111 links and enriches the collected data to produce the provenance graph. To do so, provenance graph enrichment engine 111 accesses ( 126 ) the content of the provenance store 101 through provenance data query interface 113 as well as the original enterprise data. It also employs text analysis engine 110 to discover relationships among data records by analyzing the unstructured text contained in some of the data records. As an example, the analysis of e-mail may reveal that it is a rejection and is used to establish a link between the e-mail and an approval task.
  • the enriched enterprise data is accessed through query interface 113 and is used to display information about actual enterprise operations. This can be done in one of several ways. One way is to deploy a query into the provenance store which emits the results in real-time, feeding an existing dashboard 115 in order to display key performance indicators as an example. Secondly, a query front-end enables visualization and navigation through the provenance graph by using graph visualizer component 117 .
  • the central component of the architecture is data store 101 where the provenance graph and the associated data records are kept.
  • After the probed event data coming from the runtime systems 100 is transformed into provenance data by capturing/recording component 103, it is written to the store through a database connection (121).
  • Provenance graph enrichment engine 111 is notified via connection 124.
  • Provenance graph enrichment engine 111 examines the new data records, runs the associated rules from the rules library, and utilizes the existing enterprise data as well as text analysis engine 110 to determine a possible correlation. If new data items or relations are discovered, they are written to the provenance store via query interface 113.
  • Ensuring compliance through the information system 100 requires laying out a data model that covers the relevant aspects of the enterprise operations. Creating a data model is the first step to bridge enterprise operations to information systems. The data model should support relevant and salient aspects of the enterprise.
  • FIG. 2 illustrates a comprehensive, generic data model that can be extended to meet the domain specific needs.
  • the data of enterprise artifacts stored in the provenance store depicted as Provenance Record 210 , falls into one of the following five dimensions or classes:
  • a data record is the representation of an enterprise artifact that was produced or changed during execution of an enterprise process.
  • those artifacts include documents, e-mails, and database records.
  • In the provenance store, each version of such an artifact is represented separately.
  • Task Record 220: A task record is the representation of the execution of one particular task. Such a task might be part of a formally defined enterprise process or be stand-alone; it might be fully automated or manual.
  • a process record represents one instance of a process.
  • tasks are executed by processes. Hence, each task is associated to the corresponding process record.
  • Resource Record 215: A resource record represents a person, a runtime or a different kind of resource that is relevant to the selected scope of enterprise provenance, e.g., as the actor of a particular task.
  • Custom Records 250 provide the extension point to capture domain specific, mostly virtual artifacts such as compliance goals, alerts, checkpoints, etc. This will be explained in greater detail below.
  • Relation Records 260 represent the edges. These are the records generally produced as a result of relation analysis among the collected records. For simplicity of explanation, we only consider binary relations between records. However, relations between relation records are possible, and such higher-degree relations could be expressed in accordance with illustrative principles of the invention. Some relations are rather basic at the IT (information technology) level, such as the read and write relations between tasks and data. Other relations are derived from the context, such as that between a manager and an achieved challenge.
  • the inventive enterprise provenance solution provides a generic data model that can be extended to meet the application domain specific needs.
  • FIG. 3 depicts the UML (Unified Modeling Language) representation of the provenance graph data model.
  • the provenance graph comprises six different sets of records, namely, Process 310 , Data 320 , Task 330 , Resource 340 , Relation 380 and Custom 350 record types.
  • Each record is an extensible XML data structure and all records share common attributes: id and type are used to identify and classify the record within the graph; the appId (application specific id) and display name refer to characteristics of the corresponding enterprise artifact. These attributes are inherited from a parent record type, RecordType 370 .
  • Data, task and process records are added to the provenance graph as the business operations are executed. Resource and custom records are often added after the fact by analytics.
  • FIG. 3 shows several specializations of the basic record types. The challenge document and key control point type, however, are specific to a particular application.
  • ProcessRecordType 310 is differentiated from the other record types by trigger, startTime, endTime, runtime and model attributes.
  • DataRecordType 320 has creator, creationTime, location and hashValue attributes. These attributes are consistent with the original purpose of having these records in the graph.
  • Two data record types that are specific to a particular application are exemplified: EmailRecordType 322 and ChallengeDocumentType 324.
  • The e-mail record type contains all the attributes necessary to represent an e-mail document, such as subject, from, to, cc, bcc, sendTime, receiveTime, reference and attachments, while ChallengeDocumentType represents application-specific document attributes.
  • RelationRecordType 380 has source and target attributes.
  • Various other relation types are also depicted as extensions of RelationRecordType in 382.
  • CustomRecordType 350 is introduced, and KeyControlPointType 352 is shown as an example of a custom record type.
  • KeyControlPointType 352 is used to relate records to a particular compliance control point.
  • ProvenanceGraphType 360 is introduced to represent the attributes of the graph which are listed as relations, dataRecords, taskRecords, processRecords, resourceRecords, customRecords.
  • the domainId attribute is introduced to specify the particular domain for which this provenance graph is generated.
  • EmployeeRecordType 344 contains the attributes that define an employee within the organization. These attributes are listed as an email address, a userid, indicator of being a manager or not, the name of employee's manager and employee's role in executing the tasks.
  • RecordType 370 is the parent of all record types, from which they inherit the id, type, application id, display name and xml attributes.
  • the children of RecordType 370 are ProcessRecordType 310, DataRecordType 320, TaskRecordType 330, CustomRecordType 350 and RelationRecordType 380, as mentioned previously.
  • ExtensibleType 394 can be considered the ancestor of all types which has three children, namely, RecordType ( 370 ), RecordReferenceType ( 390 ) and ContentReferenceType ( 396 ).
  • ExtensibleType passes one attribute, extensions, to the children. This attribute gives flexibility to have multiple extensions of the same model.
  • ContentReferenceType 396 and RecordReferenceType 390 are used to refer to the location of actual data.
  • the provenance graph is a meta-information repository and the actual data resides within the enterprise at the addresses specified in record and content reference types.
  • ResourceRecordType (340) has two children. That is, there are two kinds of resource records, employees and machines. These are the entities that activate task items. In the model, employee resources are represented by EmployeeRecordType 344 and machine resources are represented by RuntimeRecordType (346).
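  • To make the type hierarchy more concrete, the following Python sketch mirrors part of the data model with dataclasses. It is an approximation made for illustration: the source describes extensible XML structures, the attribute names are Python renderings of those listed above, and the `ProvenanceGraph.add` and `neighbors` helpers are hypothetical additions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Attributes inherited by every record type (RecordType 370)."""
    id: str
    type: str
    app_id: str            # application-specific id of the enterprise artifact
    display_name: str
    extensions: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataRecord(Record):
    creator: str = ""
    creation_time: str = ""
    location: str = ""
    hash_value: str = ""


@dataclass
class TaskRecord(Record):
    pass


@dataclass
class EmployeeRecord(Record):
    email: str = ""
    userid: str = ""
    is_manager: bool = False
    manager_name: str = ""
    role: str = ""


@dataclass
class RelationRecord(Record):
    source: str = ""   # id of the record the edge starts from
    target: str = ""   # id of the record the edge points to


@dataclass
class ProvenanceGraph:
    domain_id: str
    records: Dict[str, Record] = field(default_factory=dict)
    relations: List[RelationRecord] = field(default_factory=list)

    def add(self, record: Record) -> None:
        """Store a node record, or an edge if it is a RelationRecord."""
        if isinstance(record, RelationRecord):
            # Edges may only connect records that are already in the graph.
            if record.source not in self.records or record.target not in self.records:
                raise KeyError("relation endpoints must be existing records")
            self.relations.append(record)
        else:
            self.records[record.id] = record

    def neighbors(self, record_id: str) -> List[str]:
        """Ids of records reachable over one outgoing relation."""
        return [r.target for r in self.relations if r.source == record_id]
```

  • Because the graph holds only meta-information, a fuller version would store references to the actual enterprise data rather than the data itself, as the reference types above suggest.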
  • FIG. 4A illustrates this scenario.
  • the manager creates the challenge ( 1 ) using a Web-front-end to the central record management system.
  • This task triggers an automated email informing the employee about the challenge.
  • the employee has to provide evidence ( 2 )—which can take various forms: a contract or receipt, a fax from the sales customer, a pointer to a different revenue database, etc.
  • the evidence is available electronically and it is attached to an e-mail sent to his manager by the employee.
  • the manager evaluates the challenge and, in case of achievement, marks its status ( 3 ).
  • the latest achievement data is collected and fed into the payroll system ( 4 ).
  • the paycheck is issued to the employee ( 5 ).
  • In order to assure the compliance of the overall process with legal accounting regulations, various control points are introduced. Each control point reflects one locally verifiable requirement, which today is validated manually for a small number of sampled transactions by internal and/or external auditors. Typically, control points are established for the interaction of various systems, and the verification of a control point requires the correlation of structured and/or unstructured data. In FIG. 4A, two control points are shown. Control point A requires the manager to obtain, evaluate carefully, and maintain the evidence of any achieved challenge. Control point B requires the paycheck to reflect the accumulated commissions correctly.
  • an auditor selects an achieved challenge, requests the evidence, and compares the sales targets with the documented achievements.
  • This seemingly simple task has proven to be quite complicated in practice.
  • the evidence is not directly linked to the challenge. In some cases, it is not even stored in a central repository but kept locally by the manager. The auditor therefore has to contact the manager, and the manager has to find the right documents.
  • Our observations have shown a compliance failure rate of 70%, largely because the evidence could not be located.
  • FIG. 4B depicts the provenance graph for the scenario explained above.
  • DataRecord types are identified by cylindrical shapes while ResourceRecord types are hexagonal, and TaskRecord types are rectangular.
  • the corresponding task records are represented in FIG. 4B as ChallengeProcess node 470 , CreateChallenge node 420 , and MarkAchievenment node 410 .
  • the corresponding resource records are represented as SalesManager node 450 and SalesEmployee node 460 .
  • Corresponding data records are represented as OfferedChallenge node 430 and AchievedChallenge node 440 .
  • the diamond shapes on the edges between nodes represent the corresponding relation records: partOf 422 , writes 426 , priorVersion 432 , reads 434 , priorTask 424 , actor 452 , partOf 472 , actor 458 , managerOf 454 , writes 412 , managerOf 456 , employeeOf 462 .
  • the provenance sub-graph of FIG. 4C shows how to represent a control point (in particular, control point A shown in FIG. 4A), which indicates a requirement that the sales manager must obtain and review the document that supports the achieved challenge. Representing control points at the IT level enables computing compliance automatically.
  • the corresponding task record is represented in the sub-graph of the control point ( 468 ) in FIG. 4C as SendClaim node 476 .
  • the corresponding resource records are represented as SalesManager node 470 and SalesEmployee node 471 .
  • Corresponding data records are represented as AchievedChallenge node 472 , ClaimEmail node 474 , and SupportingDocument node 478 .
  • the diamond shapes on the edges between nodes represent the corresponding relation records. For the sake of simplicity, they have not been separately numbered since their specific relationships to the nodes they attach are dependent on the process being modeled (and fully understood from the scenario explained above in the context of FIG. 4A ).
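  • As a loose sketch of how compliance with a control point might be computed over such a sub-graph, the Python below checks whether each achieved challenge has an e-mail that refers to it, carries an attachment and was received by someone. The relation names (`refersTo`, `attachment`, `recipient`), the node ids and the function names are all hypothetical; the source leaves the relations in FIG. 4C unnamed, and the check covers only the presence of evidence, not its review.

```python
# Hypothetical relation records: (source, relation, target).
# The relation names are illustrative; the source does not number or name them.
EDGES = [
    ("ClaimEmail-1", "refersTo", "AchievedChallenge-1"),
    ("ClaimEmail-1", "attachment", "SupportingDocument-1"),
    ("SalesManager-1", "recipient", "ClaimEmail-1"),
    ("ClaimEmail-2", "refersTo", "AchievedChallenge-2"),  # no attachment, no recipient
]


def has_evidence(edges, challenge):
    """True if some e-mail refers to the challenge, carries an attachment,
    and was received by someone (assumed here to be the manager)."""
    emails = {s for s, rel, t in edges if rel == "refersTo" and t == challenge}
    with_document = {s for s, rel, _ in edges if rel == "attachment"}
    received = {t for _, rel, t in edges if rel == "recipient"}
    return any(e in with_document and e in received for e in emails)


def compliance_report(edges, challenges):
    """Map each achieved challenge to whether the control point is met."""
    return {c: has_evidence(edges, c) for c in challenges}
```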
  • FIG. 5 shows the process of enriching the provenance graph.
  • Provenance graph 500 is enriched by finding the relations among existing provenance records and discovering new ones.
  • the relations among the provenance records are defined by the rule files stored in the rule library 109 .
  • a simple rule may indicate that if the value of the “From” field of an e-mail document is equal to the e-mail address of a person record, a “sender” relation is set between the e-mail DataRecord and the person ResourceRecord.
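  • The “sender” rule above can be sketched in a few lines of Python. Records are modeled here as plain dictionaries, and the field names, the case-insensitive comparison and the `enrich` driver are assumptions made for this example; the actual rule file format is not given in this text.

```python
def sender_rule(email_record, person_record):
    """Return a 'sender' relation if the e-mail's From field matches the person.

    Addresses are compared case-insensitively, which is an assumption of this
    sketch; the source only says the two values are equal.
    """
    sender = email_record.get("from", "").strip().lower()
    address = person_record.get("email", "").strip().lower()
    if sender and sender == address:
        return {
            "type": "sender",
            "source": email_record["id"],
            "target": person_record["id"],
        }
    return None


def enrich(records, rules):
    """Apply every rule to every ordered pair of records; collect new relations."""
    relations = []
    for first in records:
        for second in records:
            if first is second:
                continue
            for rule in rules:
                relation = rule(first, second)
                if relation is not None:
                    relations.append(relation)
    return relations
```

  • Evaluating every pair is quadratic in the number of records; an engine that is notified of newly created records, as described here, would only need to test the new records against the existing ones.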
  • provenance graph enrichment engine 111 is notified via a graph event listener 510 .
  • the attributes of these newly created records are queried through graph query interface 520 and the received information is passed to the analytics component 540 .
  • the main function of the analytics is to find relations or new records by computing the rules stored in the rules library 109 over the attributes of provenance records.
  • Existing enterprise data 120 could also be used to find new relations, such as management or organizational relations.
  • Text analysis engine 110 is employed when rules require the analysis of unstructured content.
  • a system for utilizing the actual execution traces of an enterprise operation in order to extract information about the enterprise.
  • the execution trace is captured in the form of a graph from the instances of enterprise operations in a manner as described above in section I. Recall from FIG. 1 that the graph data is stored in the provenance store 101 and accessed through a query interface 113.
  • the enterprise information that is encapsulated by the graph is extracted by various system components.
  • the functional descriptions of these components are given below.
  • Process patterns are patterns of activities within an organization that solve common problems. Repeatable ways of bringing together activities to solve common problems form patterns. Many known process patterns have been identified and listed to help process modeling and design; for example, see W. M. P. van der Aalst et al., “Workflow Patterns,” Distributed and Parallel Databases, 14(3), pages 5-51, July 2003, the disclosure of which is incorporated by reference herein in its entirety. These patterns are divided into six categories.
  • Extracting process patterns helps in understanding how the enterprise operations are modeled.
  • the abstractions provided by the models help in understanding performance issues and optimizing the operation end-to-end.
  • Statistical pattern analyzer: The way enterprise operations execute is not deterministic. In many cases, unpredictable human behavior is the cause of variations in process executions. The statistical pattern analyzer examines many execution traces for the same process and extracts statistical information about the execution patterns. Examples include the execution paths that are most frequently used, process evolution predictions, exception ratios, delay statistics, throughput, etc.
  • Workflow is a sequence of operations or tasks and information flow that are executed to achieve a goal.
  • Execution trace contains the information about the workflow.
  • the workflow analyzer examines the trace and predicts the workflow.
  • a process model is extracted as a result. This way an abstraction of the execution path is obtained.
  • Organizational aspect analyzer: This analyzer determines the human resources behind the execution of non-automated tasks and their roles. It finds out who executed what and when.
  • Functional aspect analyzer: During the execution of an enterprise process, different tasks are invoked. This analyzer finds the list of executed tasks, their order, the duration of each task, and the outputs of each task.
  • Data aspect analyzer: Identifying the enterprise artifacts and their pedigree is the purpose of the data aspect analyzer.
  • Enterprise artifacts are documents, e-mails, forms and other structured or unstructured data that are consumed during the execution of an enterprise process.
  • Pattern Search: This is a special search function to search for special process patterns within the execution trace.
  • Relation finder: The relationship between two enterprise objects can be discovered by analyzing the paths between these objects. The relation finder looks into existing atomic relations between enterprise objects and determines the relationship between two objects where there is no direct relation.
  • embodiments of the invention extract information about enterprise operations by examining the execution traces which are captured by creation of a provenance graph. Recall from FIG. 1 (described in section I above) that a data model captures the relevant aspects of the enterprise and that the data associated with enterprise artifacts are stored in the provenance store 101 as ProvenanceRecord 210 ( FIG. 2 ). Recall that the data falls into one of the following five record types:
  • a data record is the representation of an enterprise artifact that was produced or changed during execution of an enterprise process.
  • those artifacts include documents, e-mails, and database records.
  • In the provenance store, each version of such an artifact is represented separately.
  • Task Record: Recall that a task record is the representation of the execution of one particular task. Such a task might be part of a formally defined enterprise process or be stand alone, and it might be fully automated or manual.
  • Process Record: Recall that a process record represents one instance of a process. In automated business management systems, tasks are executed by processes. Hence, each task is associated with the corresponding process record.
  • Resource Record: Recall that a resource record represents a person, a runtime or a different kind of resource that is relevant to the selected scope of enterprise provenance, e.g., as actor of a particular task.
  • Custom Records: Recall that a custom record provides the extension point to capture domain specific, mostly virtual artifacts such as compliance goals, alerts, checkpoints, etc.
  • Embodiments of the invention assume that a provenance graph is created from the execution traces of enterprise operations. Recall that a sample provenance graph is depicted in FIG. 4C .
  • FIG. 6 depicts an enterprise process information extraction system 650 .
  • system 650 sends queries to provenance data 600 (in form of a provenance graph), stored in provenance store 605 , via provenance graph query interface 610 .
  • the queries are sent to retrieve information for analysis.
  • the components of system 650 are process pattern extractor 620 , pattern search 622 , organizational aspect analyzer 630 , functional aspect analyzer 632 , data aspect analyzer 634 , relation (path) finder 640 , statistical pattern analyzer 642 and workflow analyzer 644 .
  • the system also includes a user interface 660 . Descriptions of these components are given above.
  • Process execution trace 600 is a directed graph.
  • a number of known languages are available to query a directed graph such as a provenance graph.
  • One example is SPARQL (SPARQL Protocol and RDF Query Language) for RDF (Resource Description Framework), which is a directed, labeled graph data format.
  • provenance graph query interface 610 provides for a set of SPARQL interfaces to access graph information 600 in store 605 . These include, for example, graph nodes, edges, their connections, paths between the nodes, etc.
  • organizational aspect analyzer 630 sends queries to find out people and their roles during enterprise operations.
  • People are ResourceRecords that constitute some of the nodes of the provenance graph. People's roles are related to the tasks they invoke.
  • the edges that connect the people records to task records reveal various roles. Examples may include approver, sender, receiver, reviewer, etc.
  • the functional aspect analyzer 632 sends queries via 610 to retrieve the list of tasks and activities.
  • organizational aspect analyzer 630 determines people's roles by searching for the edges that connect people records to task records.
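  • The role discovery described above can be illustrated with a minimal Python sketch. It is not the SPARQL-based implementation of the embodiment: the in-memory graph, the node names and the edge labels are hypothetical, and every ResourceRecord is assumed to be a person.

```python
from collections import defaultdict

# Hypothetical provenance graph: node id -> record type.
nodes = {
    "alice": "ResourceRecord",
    "bob": "ResourceRecord",
    "review_claim": "TaskRecord",
    "approve_claim": "TaskRecord",
    "claim_form": "DataRecord",
}

# Edges as (source, relation label, target); the labels are illustrative.
edges = [
    ("alice", "reviewer", "review_claim"),
    ("bob", "approver", "approve_claim"),
    ("alice", "sender", "approve_claim"),
    ("review_claim", "writes", "claim_form"),
]


def find_roles(nodes, edges):
    """Map each person to the (role, task) pairs implied by resource-to-task edges."""
    roles = defaultdict(list)
    for source, relation, target in edges:
        # Only edges from a resource node to a task node reveal a role.
        if nodes.get(source) == "ResourceRecord" and nodes.get(target) == "TaskRecord":
            roles[source].append((relation, target))
    return dict(roles)


roles = find_roles(nodes, edges)
print(roles)
```

  • The edge from review_claim to claim_form is ignored because it does not start at a resource node, so only people-to-task edges contribute roles.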
  • Data aspect analyzer 634 extracts data record objects from the graph via 610 and finds their relations in time. A new DataRecord node is created for a new version of the data object on the provenance graph.
  • Data aspect analyzer 634 determines the creation and modification dates of every data item, which tasks consume the data objects, and the people who create and modify the data. All of these functions are performed by using a query language like SPARQL, which enables retrieving various properties of the graph.
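  • As a rough illustration of how creation and modification history could be derived from versioned DataRecord nodes, the following Python sketch groups records by artifact. The field names version, timestamp and writtenBy are assumptions made for this example (only appId appears in the data model), and the plain list stands in for results that the embodiment would retrieve through the query interface.

```python
from datetime import datetime

# Hypothetical DataRecord nodes: one node per version of an artifact.
data_records = [
    {"id": "d1", "appId": "claim-42", "version": 1,
     "timestamp": datetime(2008, 11, 3, 9, 0), "writtenBy": "alice"},
    {"id": "d2", "appId": "claim-42", "version": 2,
     "timestamp": datetime(2008, 11, 4, 14, 30), "writtenBy": "bob"},
    {"id": "d3", "appId": "invoice-7", "version": 1,
     "timestamp": datetime(2008, 11, 5, 8, 15), "writtenBy": "alice"},
]


def artifact_history(records):
    """Return creation and modification details for each artifact (keyed by appId)."""
    history = {}
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        if rec["appId"] not in history:
            # The earliest record of an artifact marks its creation.
            history[rec["appId"]] = {
                "created": rec["timestamp"],
                "creator": rec["writtenBy"],
                "modifications": [],
            }
        else:
            # Every later version is a modification.
            history[rec["appId"]]["modifications"].append(
                (rec["timestamp"], rec["writtenBy"])
            )
    return history


history = artifact_history(data_records)
print(history["claim-42"]["creator"], len(history["claim-42"]["modifications"]))
```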
  • FIG. 7 depicts the components of process pattern extractor 620 .
  • Common process patterns are identified as reusable process design patterns.
  • Common process pattern subcomponent 710 contains the representations of common patterns where each pattern is represented in terms of provenance graph data model. The representation is based on TaskRecord type of ProvenanceRecord as depicted in FIG. 1 and the relationship between TaskRecords. As an example, if multiple task records are merging into the same TaskRecord, then it is an indication of a “simple merge” pattern. Similarly, all patterns can be expressed in terms of the provenance graph TaskRecords and their relations. As a result, graph patterns associated with every common process pattern are formed. These graph patterns are stored in block 710 while associated queries for every pattern are kept in a library of common process pattern queries 720 .
  • a common process pattern is a sub-graph where the nodes are TaskRecords and the edges are relations between the TaskRecords.
  • An associated query for a common pattern retrieves all sub-graphs of the provenance graph with the same properties.
  • a common process pattern query sent by block 720 returns a list of matching patterns from the provenance graph in block 740 .
  • Common process patterns are then extracted in block 750 from the list of returned sub-graph patterns in block 740 .
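  • The “simple merge” indication mentioned above (multiple task records merging into the same TaskRecord) can be sketched in Python as a check on incoming task-to-task edges. This is a simplified structural test under assumed node names, not the pattern query library of block 720, and it does not verify the full semantics of the published workflow pattern.

```python
from collections import defaultdict

# Hypothetical graph: node id -> record type, plus directed (source, target) edges.
node_types = {
    "t1": "TaskRecord",
    "t2": "TaskRecord",
    "t3": "TaskRecord",
    "t4": "TaskRecord",
    "d1": "DataRecord",
}
edges = [("t1", "t3"), ("t2", "t3"), ("t3", "t4"), ("d1", "t4")]


def find_simple_merges(node_types, edges):
    """Return {task: [preceding tasks]} for tasks reached from two or more tasks."""
    predecessors = defaultdict(set)
    for source, target in edges:
        # Only relations between two TaskRecords take part in the pattern.
        if node_types.get(source) == "TaskRecord" and node_types.get(target) == "TaskRecord":
            predecessors[target].add(source)
    return {task: sorted(preds) for task, preds in predecessors.items() if len(preds) >= 2}


merges = find_simple_merges(node_types, edges)
print(merges)
```

  • Task t4 is not reported because only one of its two incoming edges originates from a TaskRecord.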
  • Execution traces 600 may contain patterns other than the common process patterns.
  • the search function 622 is used to extract patterns that are not expressed as a common pattern.
  • special pattern query 770 is formed for the pattern to be searched for and the corresponding results are retrieved by block 770 .
  • FIG. 8 shows how to compute the statistics of execution paths between two points in a process.
  • the process may have many execution traces and, for statistics to be meaningful, it is assumed that sufficient numbers of execution traces are collected before the statistics module is run.
  • the algorithm starts with identifying the start and end points ( 800 ). These are nodes of the provenance graph. The algorithm is run over all the traces. If the current trace is NOT null (NO in 820 ), then a Bayesian Learning algorithm 840 is run between the start and end nodes. Existing execution traces are used to create a Markovian model for the provenance graph. The Markovian model assigns probabilities for each node and transitions between the nodes.
  • the learning algorithm is based on the statistics of transitions between nodes and the average time spent in each node. If the trace is null and thus no other traces are available to analyze (YES in 820 ), then the available statistics are displayed ( 830 ). A known path finding algorithm can be employed.
  • path statistics are re-computed ( 860 ) and the next trace ( 810 ) is examined. Hence, every new trace contributes to the path statistics in step 860 .
  • Path statistics include standardized regression coefficients predicting one variable from another. Once the statistical model is computed for a path, then this model is used for predicting how the enterprise process will evolve. As an example, from the statistical model, one can predict what the next execution step will be.
  • Well-known machine learning techniques are available for analyzing a path and computing path statistics by using a Bayesian network model (see Machine Learning, Tom Mitchell, McGraw Hill, 1997, the disclosure of which is incorporated by reference herein in its entirety).
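  • To make the idea of predicting the next execution step concrete, the following Python sketch estimates a first-order Markov chain by counting transitions over a set of traces. It is a plain maximum-likelihood estimate substituted for illustration, not the Bayesian learning algorithm 840 of FIG. 8; the trace contents are invented, and the time spent in each node is left out.

```python
from collections import Counter, defaultdict

# Hypothetical execution traces: each is the ordered list of tasks that ran.
traces = [
    ["submit", "review", "approve", "pay"],
    ["submit", "review", "reject"],
    ["submit", "review", "approve", "pay"],
    ["submit", "review", "approve", "pay"],
]


def transition_probabilities(traces):
    """Estimate P(next task | current task) from observed transitions."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def predict_next(model, state):
    """Return the most likely next task, or None if the state was never left."""
    followers = model.get(state)
    if not followers:
        return None
    return max(followers, key=followers.get)


model = transition_probabilities(traces)
print(model["review"], predict_next(model, "review"))
```

  • Every additional trace changes the counts, which mirrors how each new trace contributes to the path statistics in step 860.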
  • Workflow is the sequence of tasks in an enterprise operation and provides for an abstraction or a model for the actual work. It provides an easier way to understand and explain complex processes in visual form.
  • workflow analyzer 644 depicted in FIG. 6 extracts the sequence of tasks from the provenance graph.
  • the sequence of tasks is retrieved from the provenance graph via query interface 610 by searching for the pedigree of TaskRecords, DataRecords and their relations.
  • FIG. 9 is an example workflow 900 for the enterprise operation which includes not only the sequence of tasks but also the data consumed between the tasks.
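  • One way to picture the extraction of a task sequence together with the data consumed between tasks is the Python sketch below. It assumes that task order can be inferred from write and read relations between TaskRecords and DataRecords, which is a simplification chosen for this example; the task and data names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical relations taken from a provenance graph.
writes = [  # (task, data it produced)
    ("receive_claim", "claim_email"),
    ("validate_claim", "validated_claim"),
    ("approve_claim", "approval_note"),
]
reads = [  # (task, data it consumed)
    ("validate_claim", "claim_email"),
    ("approve_claim", "validated_claim"),
    ("pay_claim", "approval_note"),
]


def extract_workflow(writes, reads):
    """Return (ordered tasks, data flows) where a producer precedes its consumers."""
    producers = {}
    for task, data in writes:
        producers.setdefault(data, []).append(task)

    sorter = TopologicalSorter()
    flows = []
    for task, _ in writes:
        sorter.add(task)
    for consumer, data in reads:
        sorter.add(consumer)
        for producer in producers.get(data, []):
            # The consumer depends on every task that produced its input.
            sorter.add(consumer, producer)
            flows.append((producer, data, consumer))
    return list(sorter.static_order()), flows


order, flows = extract_workflow(writes, reads)
print(order)
```

  • If the recorded relations contained a cycle, static_order() would raise graphlib.CycleError, which a real analyzer would have to handle.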
  • An edge connecting the two nodes of a graph is an atomic relation.
  • SupportingDocument 478 and ClaimEmail 474 nodes are connected by an attachedTo relation edge.
  • the atomic relation reveals that SupportingDocument is attached to the ClaimEmail.
  • More complex relations can be composed by combining the atomic relations.
  • FIG. 10 shows the sequence of steps to find the relationship between two provenance records X and Y ( 1000 ).
  • the first step is to find all paths between X and Y ( 1010 ). Once the connecting paths are discovered, the paths are expressed in terms of their atomic relations ( 1020 ), and the main relationship is expressed as the combination of atomic relations ( 1030 ).
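  • The three steps can be sketched in Python as follows. The example enumerates simple paths with a depth-first walk and follows edges only in their stored direction; both choices are assumptions of this sketch. Apart from the attachedTo edge, the relations and node names are hypothetical.

```python
# Edges as (source, atomic relation, target).
edges = [
    ("SupportingDocument", "attachedTo", "ClaimEmail"),
    ("ClaimEmail", "sentBy", "Customer"),
    ("ClaimEmail", "readBy", "ReviewClaim"),
    ("ReviewClaim", "performedBy", "Adjuster"),
]


def find_relation_paths(edges, start, goal):
    """Return every simple path from start to goal as a list of atomic relations."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))

    paths = []

    def walk(node, visited, relations):
        if node == goal:
            paths.append(relations)
            return
        for relation, target in adjacency.get(node, []):
            if target not in visited:  # keep the path free of repeated nodes
                walk(target, visited | {target}, relations + [relation])

    walk(start, {start}, [])
    return paths


def compose(path):
    """Express a path as the combination of its atomic relations."""
    return " -> ".join(path)


paths = find_relation_paths(edges, "SupportingDocument", "Adjuster")
print([compose(p) for p in paths])
```

  • Here the two records have no direct edge, yet the composed relation shows that the document is attached to an e-mail read by a task that the adjuster performed.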
  • FIG. 11 shows how the relation finder subsystem works. Recall that this component is shown in FIG. 6 as relation (path) finder 640 .
  • a pair of provenance records 1100 for which the relationship is sought is entered through user interface 660 .
  • One of the path finding algorithms 1110 is used to find the path between the records via the query interface 610 .
  • A* and Dijkstra are some of the most frequently used path finding algorithms that may be used.
  • the datalog format is known and disclosed in, for example, S Ceri et al., “What you always wanted to know about Datalog (and never dared to ask),” IEEE Transactions on Knowledge and Data Engineering 1(1), 1989, pp. 146-66; and Datalog User Manual, John D. Ramsdell of The MITRE Corporation, 2004, the disclosures of which are incorporated by reference herein in their entirety.
  • FIG. 13 illustrates a computer system in accordance with which one or more components/steps of the techniques of the invention may be implemented. It is to be further understood that the individual components/steps may be implemented on one such computer system or on more than one such computer system.
  • the individual computer systems and/or devices may be connected via a suitable network, e.g., the Internet or World Wide Web.
  • the system may be realized via private or local networks. In any case, the invention is not limited to any particular network.
  • the computer system shown in FIG. 13 may represent one or more of the components/steps shown and described above in the context of FIGS. 1 through 12.
  • the computer system may be used to implement one or more of the components of the information extraction system depicted in FIG. 6 .
  • the computer system may generally include a processor 1301 , memory 1302 , input/output (I/O) devices 1303 , and network interface 1304 , coupled via a computer bus 1305 or alternate connection arrangement.
  • processor as used herein is intended to include any processing device, such as, for example, one that includes a CPU and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices.
  • memory as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard disk drive), a removable memory device (e.g., diskette), flash memory, etc.
  • the memory may be considered a computer readable storage medium.
  • input/output devices or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., display, etc.) for presenting results associated with the processing unit.
  • network interface as used herein is intended to include, for example, one or more transceivers to permit the computer system to communicate with another computer system via an appropriate communications protocol.
  • software components including instructions or code for performing the methodologies described herein may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.
  • ROM read-only memory
  • RAM random access memory

Abstract

Techniques are disclosed for extracting information through analysis of provenance data. For example, a computer-implemented method of extracting information regarding an execution of an enterprise process comprises the following steps. Provenance data is generated, wherein the provenance data is based on collected data associated with an actual end-to-end execution of the enterprise process and is indicative of a lineage of one or more data items. A provenance graph is generated that provides a visual representation of the generated provenance data, wherein nodes of the graph represent records associated with the collected data and edges of the graph represent relations between the records. At least a portion of the generated provenance data from the graph is analyzed so as to extract information about the execution of the enterprise process based on the analysis.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • The present application is related to the U.S. patent applications respectively identified as: (i) attorney docket no. YOR920080508US1, entitled “Processing of Provenance Data for Automatic Discovery of Enterprise Process Information;” (ii) attorney docket no. YOR920080509US1, entitled “Validating Compliance in Enterprise Operations Based On Provenance Data;” and (iii) attorney docket no. YOR920080592US1, entitled “Influencing Behavior of Enterprise Operations During Process Enactment Using Provenance Data,” all of which are filed concurrently herewith, and the disclosures of which are incorporated by reference herein in their entirety.
  • FIELD OF THE INVENTION
  • The present invention relates to provenance data and, more particularly, to techniques for extracting information through analysis of provenance data.
  • BACKGROUND OF THE INVENTION
  • Today's enterprise applications span multiple systems and organizations, integrating legacy and newly developed software components to deliver value to enterprise operations. However, the actual execution of enterprise applications may not be predicted in advance. This is mainly because of process dependencies on human activities and lack of automation. Nevertheless, the information about what has actually happened during the execution of an enterprise process is hidden in the process execution trace.
  • It would be desirable to be able to easily and effectively extract such information. However, this is not possible with existing enterprise process management systems.
  • SUMMARY OF THE INVENTION
  • Illustrative embodiments of the invention provide techniques for extracting information through analysis of provenance data.
  • For example, in one embodiment, a computer-implemented method of extracting information regarding an execution of an enterprise process comprises the following steps. Provenance data is generated, wherein the provenance data is based on collected data associated with an actual end-to-end execution of the enterprise process and is indicative of a lineage of one or more data items. A provenance graph is generated that provides a visual representation of the generated provenance data, wherein nodes of the graph represent records associated with the collected data and edges of the graph represent relations between the records. At least a portion of the generated provenance data from the graph is analyzed so as to extract information about the execution of the enterprise process based on the analysis.
  • Advantageously, embodiments of the invention provide techniques to extract information about an enterprise process from its execution traces which are captured by employing enterprise provenance graph technology. Hence, analysis of an enterprise provenance graph and its data is performed to extract information and discover knowledge about enterprise practices, policies, operational aspects, process patterns, workflow, statistics and models.
  • These and other objects, features, and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates a system for collecting and processing provenance data for automatic discovery of enterprise process information, according to an embodiment of the invention.
  • FIG. 2 illustrates a provenance record, according to an embodiment of the invention.
  • FIG. 3 illustrates a provenance data model, according to an embodiment of the invention.
  • FIG. 4A illustrates an enterprise application scenario used to generate a sample provenance graph, according to an embodiment of the invention.
  • FIG. 4B illustrates a provenance graph extracted from an enterprise scenario, according to an embodiment of the invention.
  • FIG. 4C illustrates a provenance sub-graph that represents a control-point, according to an embodiment of the invention.
  • FIG. 5 illustrates a provenance graph enrichment process, according to an embodiment of the invention.
  • FIG. 6 illustrates an enterprise process information extraction system, according to an embodiment of the invention.
  • FIG. 7 illustrates process pattern extraction and pattern search components, according to an embodiment of the invention.
  • FIG. 8 illustrates a statistical pattern analysis methodology, according to an embodiment of the invention.
  • FIG. 9 illustrates an example of extracting workflow information, according to an embodiment of the invention.
  • FIG. 10 illustrates a relationship discovery methodology, according to an embodiment of the invention.
  • FIG. 11 illustrates a relation discovery subsystem, according to an embodiment of the invention.
  • FIG. 12 illustrates an example of finding a relationship between two enterprise objects, according to an embodiment of the invention.
  • FIG. 13 illustrates a computer system in accordance with which one or more components/steps of the techniques of the invention may be implemented, according to an embodiment of the invention.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • As used herein, the term “enterprise” is understood to broadly refer to any entity that is created or formed to achieve some purpose, examples of which include, but are not limited to, an undertaking, an endeavor, a venture, a business, a concern, a corporation, an establishment, a firm, an organization, or the like. Thus, “enterprise processes” are processes that the enterprise performs in the course of attempting to achieve that purpose. By way of one example only, enterprise processes may comprise business processes.
  • As used herein, the term “provenance” is understood to broadly refer to an indication or determination of where something, such as a unit of data, came from or an indication or determination of what it was derived from. That is, the term “provenance” refers to the history or lineage of a particular item. Thus, “provenance information” or “provenance data” is information or data that provides this indication or results of such determination. By way of one example only, enterprise provenance data may comprise business provenance data.
  • Principles of the invention provide a technique to extract information about an enterprise process from its execution traces which are captured by employing enterprise provenance graph technology. Hence, a focus of the invention is the analysis of an enterprise provenance graph to extract information and discover knowledge about enterprise practices, policies, operational aspects, process patterns and models.
  • Since the analysis is based on the recorded enterprise activities and objects, effective trace of actual enterprise execution is important. In the illustrative embodiments to be described herein, provenance technology is utilized to capture various aspects of the enterprise operations from relevant system events. In order to establish enterprise provenance, various data records that capture different aspects of the operations are collected and correlated to form a graph which is called a “provenance graph.” Provenance records reflecting data objects, tasks, or processes are created by listening to the underlying events of the IT (information technology) infrastructure. The records collected about enterprise processes are connected through correlation and a representation of end-to-end process is obtained. Once a provenance graph is obtained, graph analysis techniques are used to extract information related to the enterprise practices.
  • One embodiment of the invention provides a sub-system that extracts common process patterns that are followed during the execution of the enterprise process. In the absence of a model or when the execution does not follow the model, extracted patterns help understand the actual process, optimize the system and compare it with existing solutions. A comprehensive list of process patterns is published in the literature and available for reuse. Once the patterns that are employed in the enterprise process are discovered, they are indexed and stored. Beyond the common process patterns, embodiments of the invention allow searching for a specific process pattern. A custom pattern may be specified through a user interface, and a pattern search function is invoked to search for this particular pattern.
  • Another embodiment of the invention provides for the statistical analysis of the executed paths. Statistics of the execution traces reveal the enterprise practices generally adopted within the enterprise, exceptional situations and exception rates, performance parameters such as expected delays, throughput, etc. More importantly, graph analysis techniques help to predict how the enterprise process will evolve. This way it becomes possible to answer questions like how long a task will last to completion and what are the likely tasks and activities that will come next.
  • Yet another embodiment of the invention provides for the discovery of the workflow from the execution traces. The actual workflow of the execution traces may differ from the initial workflow model. The analysis reveals the discrepancies. Workflow is the depiction of a series of tasks and information flow that are executed to fulfill a requirement or a goal. It is a visual abstraction of the process.
  • One other embodiment provides for the discovery of a relationship between two enterprise objects that are part of the process. A methodology is described that discovers the relationship between two objects that are not directly connected or correlated within the provenance graph. This is a complex relation formed by combining the atomic relation between other objects on the path that connects the two objects. Finding relations is important to discover roles and relations and help auditing to ensure enterprise integrity.
  • Further, as will be explained, the provenance graph encapsulates various aspects of the enterprise and embodiments of the invention disclose a methodology to discover these aspects. These include people and roles (referred to as “organizational aspect”), the tasks and activities executed (referred to as “functional aspect”), and the data pedigree and flow (referred to as “data aspect”).
  • Below, the detailed description, in Section I, provides illustrative embodiments of an enterprise provenance approach that provides for creation and maintenance of a provenance data model and graph. This approach is disclosed in the above-referenced U.S. patent application identified as attorney docket no. YOR920080508US1, entitled “Processing of Provenance Data for Automatic Discovery of Enterprise Process Information,” filed concurrently herewith and incorporated by reference herein in its entirety. Section II of the detailed description below then provides description of the above-mentioned illustrative embodiments for extracting enterprise practices and policies through inspection and analysis of provenance data.
  • I. Provenance Data Model and Graph
  • We define an enterprise provenance approach as one that comprises capturing and managing the lineage of enterprise artifacts to discover functional, organizational, data and resource aspects of an enterprise. Examining enterprise provenance data gives insight into the chain of cause and effect relations and facilitates understanding the root causes of the resultant event.
  • In one embodiment of the invention, our approach comprises the following steps: (1) identifying the control points, relevant enterprise artifacts and required correlations; (2) probing the actual execution of the enterprise process to collect data; (3) correlating and enriching the collected data and the relations among them to create a provenance graph; (4) analyzing aggregated information to enable enterprise activity monitoring or to interfere with the execution by generating alerts; and (5) providing access to information stored in the graph for detailed investigation and root cause analysis.
  • FIG. 1 shows a system for capturing and processing provenance data for automatic discovery of enterprise process information, according to an embodiment of the invention. The enterprise process information discovery system comprises storage unit 101, multi-capturing/recording components 103, provenance data management sub-system 107, rules library 109, provenance graph enrichment engine 111, text analysis engine 110, enterprise data repository 120, provenance data query interface 113, graph visualizer 117 and dashboard 115.
  • The provenance data management component 107 supports the specification of the provenance data model 105, i.e., the list of enterprise objects to be captured and the level of details. It is also used to define the correlation rules between two data records. Capturing/recording components 103 are used to capture, process, and reformat application events of the underlying information system 100 (including, for example, computers, servers, repositories, email systems and other enterprise systems) and record the meta-data of enterprise operations into the provenance store. Hence, capturing/recording components 103 map the captured event data onto the data model defined (122) by provenance data management component 107. The information is then transferred (121) to storage unit 101, which is the store for provenance data.
  • Provenance data management component 107 generates rules (130) that are stored in rules library 109 for provenance graph enrichment engine 111. The rules define a correlation between the enterprise artifacts which is then used to connect them in the provenance graph representation.
  • Provenance graph enrichment engine 111 links and enriches the collected data to produce the provenance graph. To do so, provenance graph enrichment engine 111 accesses (126) the content of the provenance store 101 through provenance data query interface 113 as well as the original enterprise data. It also employs text analysis engine 110 to discover relationships among data records by analyzing the unstructured text contained in some of the data records. As an example, the analysis of e-mail may reveal that it is a rejection and is used to establish a link between the e-mail and an approval task.
  • The enriched enterprise data is accessed through query interface 113 and is used to display information about actual enterprise operations. This can be done in one of several ways. One way is to deploy a query into the provenance store which emits the results in real-time, feeding an existing dashboard 115 in order to display key performance indicators as an example. Secondly, a query front-end enables visualization and navigation through the provenance graph by using graph visualizer component 117.
  • The central component of the architecture is data store 101 where the provenance graph and the associated data records are kept. When the probed event data coming from the runtime systems 100 is transformed into provenance data by capturing/recording component 103, it is written to the store through a database connection (121). As new data are captured and recorded, provenance graph enrichment engine 111 is notified via connection 124. Provenance graph enrichment engine 111 examines the new data records, runs associated rules from the rules library, and utilizes the existing enterprise data as well as text analysis engine 110 to determine a possible correlation. If new data items or relations are discovered, they are written to the provenance store via query interface 113.
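  • The enrichment loop can be pictured with the small Python sketch below, which reuses the rejection e-mail example given earlier. Everything in it is hypothetical: the record fields, the rejects relation name, and the keyword match that stands in for text analysis engine 110.

```python
import re

# A keyword match used as a stand-in for real text analysis.
REJECTION = re.compile(r"\b(rejected|rejection|declined)\b", re.IGNORECASE)


def rejection_rule(new_record, store):
    """Link a rejecting e-mail to approval tasks that share its appId."""
    if new_record["type"] != "email" or not REJECTION.search(new_record["text"]):
        return []
    return [
        (new_record["id"], "rejects", rec["id"])
        for rec in store
        if rec["type"] == "approvalTask" and rec["appId"] == new_record["appId"]
    ]


def enrich(new_record, store, rules):
    """Run every rule on a newly captured record, then add the record to the store."""
    relations = []
    for rule in rules:
        relations.extend(rule(new_record, store))
    store.append(new_record)
    return relations


store = [{"id": "t1", "type": "approvalTask", "appId": "claim-42"}]
email = {"id": "e1", "type": "email", "appId": "claim-42",
         "text": "Your claim has been rejected."}
relations = enrich(email, store, [rejection_rule])
print(relations)
```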
  • Ensuring compliance through the information system 100 requires laying out a data model that covers the relevant aspects of the enterprise operations. Creating a data model is the first step to bridge enterprise operations to information systems. The data model should support relevant and salient aspects of the enterprise.
  • FIG. 2 illustrates a comprehensive, generic data model that can be extended to meet the domain specific needs. As shown, the data of enterprise artifacts stored in the provenance store, depicted as Provenance Record 210, falls into one of the following five dimensions or classes:
  • Data Record 230: A data record is the representation of an enterprise artifact that was produced or changed during execution of an enterprise process. Typically, those artifacts include documents, e-mails, and database records. In the provenance store, each version of such an artifact is represented separately.
  • Task Record 220: A task record is the representation of the execution of one particular task. Such task might be part of a formally defined enterprise process or be stand alone; it might be fully automated or manual.
  • Process Record 240: A process record represents one instance of a process. In automated enterprise management systems, tasks are executed by processes. Hence, each task is associated to the corresponding process record.
  • Resource Record 215: A resource record represents a person, a runtime or a different kind of resource that is relevant to the selected scope of enterprise provenance, e.g., as actor of a particular task.
  • Custom Records 250: Custom records provide the extension point to capture domain specific, mostly virtual artifacts such as compliance goals, alerts, checkpoints, etc. This will be explained in greater detail below.
  • These five classes of records represent the nodes of the provenance graph. To define the correlation between two records, Relation Records 260 represent the edges. These are the records generally produced as a result of relation analysis among the collected records. For simplicity of explanation, we only consider binary relations between records. However, relations between relation records are possible and such higher degree relation could be expressed in accordance with illustrative principles of the invention. Some relations are rather basic on the IT (information technology) level, such as the read and write between tasks and data. Other relations are derived from the context, such as that between manager and achieved challenge.
  • As mentioned above, the inventive enterprise provenance solution provides a generic data model that can be extended to meet the application domain specific needs.
  • FIG. 3 depicts the UML (Unified Modeling Language) representation of the provenance graph data model. Basically, the provenance graph comprises six different sets of records, namely, Process 310, Data 320, Task 330, Resource 340, Relation 380 and Custom 350 record types. Each record is an extensible XML data structure and all records share common attributes: id and type are used to identify and classify the record within the graph; the appId (application specific id) and display name refer to characteristics of the corresponding enterprise artifact. These attributes are inherited from a parent record type, RecordType 370. Data, task and process records are added to the provenance graph as the business operations are executed. Resource and custom records are often added after the fact by analytics. Those five record classes represent the nodes of the provenance graph. A semantic relation between two enterprise artifacts is expressed by an edge between the corresponding nodes materialized as a relation record. FIG. 3 shows several specializations of the basic record types. The challenge document and key control point type, however, are specific to a particular application.
  • ProcessRecordType 310 is differentiated from the other record types by the trigger, startTime, endTime, runtime and model attributes. DataRecordType 320, on the other hand, has the creator, creationTime, location and hashValue attributes. These attributes are consistent with the original purpose of having these records in the graph. In FIG. 3, two data record types are exemplified which are specific to a particular application: EmailRecordType 322 and ChallengeDocumentType 324. The e-mail record type contains all the attributes necessary to represent an e-mail document, such as subject, from, to, cc, bcc, sendTime, receiveTime, reference and attachments, while ChallengeDocumentType represents the attributes of an application-specific document.
  • Relations connect two provenance records. Hence, a RelationRecordType 380 has source and target attributes. Various other relation types are also depicted as extensions of RelationRecordType in 382.
  • In order to keep the data model generic and flexible, CustomRecordType 350 is introduced, and KeyControlPointType 352 is shown as an example of a custom record type. KeyControlPointType 352 is used to relate records to a particular compliance control point. ProvenanceGraphType 360 is introduced to represent the attributes of the graph, which are listed as relations, dataRecords, taskRecords, processRecords, resourceRecords and customRecords. In addition to the graph attributes, the domainId attribute is introduced to specify the particular domain for which this provenance graph is generated. EmployeeRecordType 344 contains the attributes that define an employee within the organization. These attributes are listed as an email address, a userid, an indicator of being a manager or not, the name of the employee's manager and the employee's role in executing the tasks. RecordType 370 is the parent of all record types, from which they inherit the id, type, application id, display name and xml attributes. The children of RecordType 370 are ProcessRecordType 310, DataRecordType 320, TaskRecordType 330, CustomRecordType 350 and RelationRecordType 380, as mentioned previously. Following the concept of object oriented modeling, ExtensibleType 394 can be considered the ancestor of all types; it has three children, namely, RecordType (370), RecordReferenceType (390) and ContentReferenceType (396). ExtensibleType passes one attribute, extensions, to the children. This attribute gives flexibility to have multiple extensions of the same model. The content and record reference types, ContentReferenceType 396 and RecordReferenceType 390, are used to refer to the location of actual data. Note that the provenance graph is a meta-information repository and the actual data resides within the enterprise at the addresses specified in record and content reference types. ResourceRecordType (340) has two children. That is, there are two kinds of resource records, employees and machines. These are the entities that activate task items. In the model, an employee resource is represented by EmployeeRecordType 344 and machine resources are represented as RuntimeRecordType (346).
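  • As an illustration only, the record hierarchy described above might be sketched in Python as follows. The class names follow FIG. 3, but the snake_case field names, the single nodes dictionary (in place of the separate dataRecords, taskRecords, etc. attributes), the generated relation ids and the sample values are assumptions made for brevity, not part of the described data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecordType:
    """Common attributes shared by all records (RecordType 370)."""
    id: str
    type: str
    app_id: str = ""        # application-specific id of the enterprise artifact
    display_name: str = ""


@dataclass
class DataRecordType(RecordType):
    creator: str = ""
    creation_time: str = ""
    location: str = ""
    hash_value: str = ""


@dataclass
class TaskRecordType(RecordType):
    pass


@dataclass
class ResourceRecordType(RecordType):
    pass


@dataclass
class EmployeeRecordType(ResourceRecordType):
    email: str = ""
    user_id: str = ""
    is_manager: bool = False
    manager: str = ""
    role: str = ""


@dataclass
class RelationRecordType(RecordType):
    source: str = ""        # id of the record the edge starts from
    target: str = ""        # id of the record the edge points to


@dataclass
class ProvenanceGraphType:
    domain_id: str
    nodes: Dict[str, RecordType] = field(default_factory=dict)
    relations: List[RelationRecordType] = field(default_factory=list)

    def add_node(self, record: RecordType) -> None:
        self.nodes[record.id] = record

    def relate(self, rel_type: str, source: str, target: str) -> RelationRecordType:
        # Both endpoints must already be nodes of the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("relation endpoints must be existing records")
        relation = RelationRecordType(
            id=f"rel-{len(self.relations) + 1}", type=rel_type,
            source=source, target=target)
        self.relations.append(relation)
        return relation


# Hypothetical sample data loosely based on the scenario of FIG. 4B.
graph = ProvenanceGraphType(domain_id="sales-compensation")
graph.add_node(EmployeeRecordType(id="r1", type="employee",
                                  display_name="SalesManager", is_manager=True))
graph.add_node(TaskRecordType(id="t1", type="task", display_name="CreateChallenge"))
edge = graph.relate("actor", "r1", "t1")
```

  • In this sketch, relation records are kept apart from node records, mirroring the distinction between the five node classes and the edges that connect them.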
  • In order to demonstrate how a provenance graph captures various aspects of the enterprise, we take a closer look at a sample scenario related to distribution of variable compensation of sales employees. Our example represents a simplified version of the actual process seen in a customer engagement. The process can be described as follows: A sales employee receives commissions for the generated revenue or profit as variable part of his income. To align these incentives specifically to the line of business, geography, and individual situation of the employees, managers create challenges. A challenge is a document that describes in detail each sales target and the associated compensation. If an employee is able to provide evidence about the achievement of a particular challenge, commission is added to his next payment statement as an incentive.
  • Although from modeling point of view there is one end-to-end process instance that spans all activities from the creation of a particular challenge to the issuance of the corresponding payment statement, in practice, various distributed systems are involved in the execution of the process. Processing structured as well as unstructured documents and running formal sub-processes as well as ad-hoc tasks increases the operational complexity. FIG. 4A illustrates this scenario.
  • In the first step, the manager creates the challenge (1) using a Web-front-end to the central record management system. This task triggers an automated email informing the employee about the challenge. To claim the achievement, the employee has to provide evidence (2)—which can take various forms: a contract or receipt, a fax from the sales customer, a pointer to a different revenue database, etc. Typically, the evidence is available electronically and it is attached to an e-mail sent to his manager by the employee. Upon reviewing the evidence, the manager evaluates the challenge and, in case of achievement, marks its status (3). Periodically, the latest achievement data is collected and fed into the payroll system (4). Finally, the paycheck is issued to the employee (5).
  • In order to assure the compliance of the overall process with legal accounting regulations, various control points are introduced. Each control point reflects one locally verifiable requirement that is validated today manually for a small number of sampled transactions by internal and/or external auditors. Typically, control points are established for the interaction of various systems and the verification of the control point requires the correlation of structured and/or unstructured data. In FIG. 4A, two control points are shown. Control point A requires the manager to obtain, evaluate carefully, and maintain the evidence of any achieved challenge. Control point B requires the paycheck to reflect the accumulated commissions correctly.
  • To verify control point A, an auditor selects an achieved challenge, requests the evidence, and compares the sales targets with the documented achievements. This seemingly simple task has proven to be quite complicated in practice. Firstly, the evidence is not directly linked to the challenge. In some cases, it is not even stored in a central repository but kept locally by the manager. The auditor therefore has to contact the manager, and the manager has to find the right documents. Our observations have shown a compliance failure rate of 70%, largely because the evidence could not be located. Also, we have observed lengthy email exchanges between an auditor and a manager until the correct evidence could be identified. As a result of this cumbersome process, only a small fraction of the total number of transactions can be sampled, which implies a high number of undetected questionable situations and possibly fraud. In addition, there has been no support available to track down the root cause once a questionable situation was detected. This is a major drawback of the existing auditing method. To enable an enterprise to prevent future wrongdoing or simply to detect a pattern of fraudulent behavior, it is essential to answer the following question: “Why did this happen?” Our proposed enterprise provenance approach targets exactly this question.
  • In the given example, one might argue that the process is not well designed. But regardless how carefully an application is architected, there will always be gaps between the different systems involved, there will always be data that does not fit into predefined forms, and there will always be exceptions in the execution. Rather than requiring a full scale, heavyweight data integration, our approach focuses on the recording of meta-data of relevant objects and events into a centralized and easily accessible store with links into the original systems; the automated correlation of those meta-data to establish execution traces, versioning histories, and other relevant relations; and finally the deep analysis to detect situations after the fact, raise alerts while monitoring continuously, and even interfere with the execution to prevent compliance violations.
  • FIG. 4B depicts the provenance graph for the scenario explained above. The relevant enterprise artifacts and their relations with respect to the scenario are illustrated. DataRecord types are identified by cylindrical shapes while ResourceRecord types are hexagonal, and TaskRecord types are rectangular. Thus, with respect to the scenario in FIG. 4A, the corresponding task records are represented in FIG. 4B as ChallengeProcess node 470, CreateChallenge node 420, and MarkAchievement node 410. Further, the corresponding resource records are represented as SalesManager node 450 and SalesEmployee node 460. Corresponding data records are represented as OfferedChallenge node 430 and AchievedChallenge node 440. The diamond shapes on the edges between nodes represent the corresponding relation records: partOf 422, writes 426, priorVersion 432, reads 434, priorTask 424, actor 452, partOf 472, actor 458, managerOf 454, writes 412, managerOf 456, employeeOf 462.
  • The provenance sub-graph of FIG. 4C shows how to represent a control point (in particular, control point A shown in FIG. 4A), which expresses the requirement that the sales manager must obtain and review the document supporting the achieved challenge. Representing control points at the IT level enables computing compliance automatically.
  • More particularly, with respect to the scenario in FIG. 4A, the corresponding task record is represented in the sub-graph of the control point (468) in FIG. 4C as SendClaim node 476. Further, the corresponding resource records are represented as SalesManager node 470 and SalesEmployee node 471. Corresponding data records are represented as AchievedChallenge node 472, ClaimEmail node 474, and SupportingDocument node 478. Again, the diamond shapes on the edges between nodes represent the corresponding relation records. For the sake of simplicity, they have not been separately numbered since their specific relationships to the nodes they attach are dependent on the process being modeled (and fully understood from the scenario explained above in the context of FIG. 4A).
  • FIG. 5 shows the process of enriching the provenance graph. Provenance graph 500 is enriched by finding the relations among existing provenance records and discovering the new ones. The relations among the provenance records are defined by the rule files stored in the rule library 109. As an example, a simple rule may indicate that if the value of “From” field of an e-mail document is equal to the e-mail address of a person record, “sender” relation is set between the e-mail DataRecord and the person ResourceRecord. For every new item created in the graph, provenance graph enrichment engine 111 is notified via a graph event listener 510. The attributes of these newly created records are queried through graph query interface 520 and the received information is passed to the analytics component 540.
  • The main function of the analytics is to find relations or new records by computing the rules stored in the rules library 109 over the attributes of provenance records. Existing enterprise data 120 could also be used to find new relations, such as management or organizational relations. Text analysis engine 110 is employed when rules require the analysis of an unstructured content.
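  • The "sender" rule given as an example above can be sketched as follows. This is a minimal illustration, not the rule format used by rules library 109: the dictionary layout of the records, the function names sender_rule and enrich, the direction of the relation (e-mail record as source, person record as target) and the case-insensitive address comparison are all assumptions.

```python
def sender_rule(new_record, graph_records):
    """Relate an e-mail data record to the person whose address matches "from"."""
    if new_record.get("type") != "email":
        return []
    sender_address = new_record["from"].strip().lower()
    return [
        {"relation": "sender", "source": new_record["id"], "target": person["id"]}
        for person in graph_records
        if person.get("type") == "person"
        and person.get("email", "").strip().lower() == sender_address
    ]


def enrich(new_record, graph_records, rules):
    """Run every rule over a newly captured record and collect new relations."""
    discovered = []
    for rule in rules:
        discovered.extend(rule(new_record, graph_records))
    return discovered


# Hypothetical records; the addresses are placeholders.
people = [
    {"id": "r1", "type": "person", "email": "manager@example.com"},
    {"id": "r2", "type": "person", "email": "employee@example.com"},
]
claim_email = {"id": "d7", "type": "email", "from": "Employee@example.com"}
new_relations = enrich(claim_email, people, [sender_rule])
```

  • Here enrich plays the role of the analytics component invoked once the graph event listener reports a new record; each discovered relation would then be written back to the store as a relation record.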
  • II. Extracting Information from Provenance Data
  • In the illustrative embodiments described herein, a system is described for utilizing the actual execution traces of an enterprise operation in order to extract information about the enterprise. The execution trace is captured in the form of a graph from the instances of enterprise operations in a manner as described above in section I. Recall from FIG. 1 that the graph data is stored in the provenance store 101 and accessed through a query interface 113.
  • The enterprise information that is encapsulated by the graph is extracted by various system components. The functional descriptions of these components are given below.
  • Process pattern extractor: Process patterns are the patterns of activities within an organization that solve common problems. Repeatable ways of bringing together activities to solve common problems form patterns. Many known process patterns are identified and listed to help process modeling and design, for example, see W. M. P van der Aalst et al., “Workflow Patterns,” Distributed and Parallel Databases, 14(3), pages 5-51, July 2003, the disclosure of which is incorporated by reference herein in its entirety. These patterns are divided into six categories:
      • Basic Control,
      • Advanced Branching,
      • Structural,
      • Multiple Instances,
      • State Based; and
      • Cancellation.
  • Extracting process patterns helps in understanding how the enterprise operations are modeled. The abstractions provided by the models help in understanding the performance issues and optimizing the operation end-to-end.
  • Statistical pattern analyzer: The way enterprise operations execute is not deterministic. In many cases, unpredictable human behavior is the cause of variations in process executions. Statistical pattern analyzer examines many execution traces for the same process and extracts statistical information about the execution patterns. Examples include the execution paths that are most frequently used, process evolution predictions, exception ratios, delay statistics, throughput, etc.
  • Workflow Analyzer: A workflow is a sequence of operations or tasks, together with the information flow, that is executed to achieve a goal. The execution trace contains the information about the workflow. The workflow analyzer examines the trace and predicts the workflow. A process model is extracted as a result. This way an abstraction of the execution path is obtained.
  • Organizational aspect analyzer: This analyzer determines the human resources behind the execution of non-automated tasks and their roles. It finds out who executed what and when.
  • Functional aspect analyzer: During the execution of an enterprise process, different tasks are invoked. This analyzer finds the list of executed tasks, their orders, duration of each task, and outputs of each task.
  • Data aspect analyzer: Identifying the enterprise artifacts and their pedigree is the purpose of the data aspect analyzer. Enterprise artifacts are documents, e-mails, forms and other structured or unstructured data that are consumed during the execution of an enterprise process.
  • Pattern Search: This is a special search function to search for special process patterns within the execution trace.
  • Relation finder: The relationship between two enterprise objects can be discovered by analyzing the paths between these objects. The relation finder looks into existing atomic relations between enterprise objects and determines the relationship between two objects where there is no direct relation.
  • As mentioned above, embodiments of the invention extract information about enterprise operations by examining the execution traces which are captured by creation of a provenance graph. Recall from FIG. 1 (described in section I above) that a data model captures the relevant aspects of the enterprise and that the data associated with enterprise artifacts are stored in the provenance store 101 as ProvenanceRecord 210 (FIG. 2). Recall that the data falls into one of the following five record types:
  • Data Record: Recall that a data record is the representation of an enterprise artifact that was produced or changed during execution of an enterprise process. Typically those artifacts include documents, e-mails, and database records. In the provenance store, each version of such an artifact is represented separately.
  • Task Record: Recall that a task record is the representation of the execution of one particular task. Such a task might be part of a formally defined enterprise process or be stand-alone, and it might be fully automated or manual.
  • Process Record: Recall that a process record represents one instance of a process. In automated business management systems, tasks are executed by processes. Hence, each task is associated with the corresponding process record.
  • Resource Record: Recall that a resource record represents a person, a runtime or a different kind of resource that is relevant to the selected scope of enterprise provenance, e.g., as actor of a particular task.
  • Custom Records: Recall that a custom record provides the extension point to capture domain specific, mostly virtual artifacts such as compliance goals, alerts, checkpoints, etc.
  • Embodiments of the invention assume that a provenance graph is created from the execution traces of enterprise operations. Recall that a sample provenance graph is depicted in FIG. 4C.
  • FIG. 6 depicts an enterprise process information extraction system 650. In general, system 650 sends queries to provenance data 600 (in form of a provenance graph), stored in provenance store 605, via provenance graph query interface 610. The queries are sent to retrieve information for analysis.
  • The components of system 650 are process pattern extractor 620, pattern search 622, organizational aspect analyzer 630, functional aspect analyzer 632, data aspect analyzer 634, relation (path) finder 640, statistical pattern analyzer 642 and workflow analyzer 644. The system also includes a user interface 660. Descriptions of these components are given above.
  • Process execution trace 600 is a directed graph. A number of known languages are available to query a directed graph such as a provenance graph. One example is SPARQL (SPARQL Protocol and RDF Query Language) for RDF (Resource Description Framework), which is a directed, labeled graph data format. In one implementation, provenance graph query interface 610 provides for a set of SPARQL interfaces to access graph information 600 in store 605. These include, for example, graph nodes, edges, their connections, paths between the nodes, etc.
  • As an example, organizational aspect analyzer 630 sends queries to find out people and their roles during enterprise operations. People are ResourceRecords that constitute some of the nodes of the provenance graph. People's roles are related to the tasks they invoke. The edges that connect the people records to task records reveal various roles. Examples may include approver, sender, receiver, reviewer, etc. The functional aspect analyzer 632 sends queries via 610 to retrieve the list of tasks and activities.
  • Together with the results retrieved by functional aspect analyzer 632, organizational aspect analyzer 630 determines people's roles by searching for the edges that connect people records to task records. Data aspect analyzer 634 extracts data record objects from the graph via 610 and finds their relations in time. A new DataRecord node is created for a new version of the data object on the provenance graph. Data aspect analyzer 634 determines the creation and modification dates of every data item, which tasks consume the data objects, and the people who create and modify the data. All of these functions are performed by using a query language like SPARQL, which enables retrieving various properties of the graph.
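  • The role lookup performed by the organizational aspect analyzer can be illustrated without a SPARQL engine by scanning edges held in memory. The sketch below is an assumption-laden stand-in for the SPARQL queries described above: the triple layout (relation, source, target), the edge directions, the RECORD_KIND table and the function name roles_of_people are hypothetical, while the node and relation names are borrowed from FIG. 4B.

```python
# Edges as (relation, source, target) triples; directions are assumed.
EDGES = [
    ("actor", "SalesManager", "CreateChallenge"),
    ("actor", "SalesManager", "MarkAchievement"),
    ("writes", "CreateChallenge", "OfferedChallenge"),
    ("writes", "MarkAchievement", "AchievedChallenge"),
    ("managerOf", "SalesManager", "SalesEmployee"),
]

# Record class of each node, as the functional aspect analyzer might report it.
RECORD_KIND = {
    "SalesManager": "resource",
    "SalesEmployee": "resource",
    "CreateChallenge": "task",
    "MarkAchievement": "task",
    "OfferedChallenge": "data",
    "AchievedChallenge": "data",
}


def roles_of_people(edges, kind):
    """Collect, per resource record, the relations that link it to task records."""
    roles = {}
    for relation, source, target in edges:
        if kind.get(source) == "resource" and kind.get(target) == "task":
            roles.setdefault(source, []).append((relation, target))
    return roles


roles = roles_of_people(EDGES, RECORD_KIND)
```

  • Only edges between a resource record and a task record contribute to a role; the managerOf edge between two resource records is ignored by this particular lookup.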
  • FIG. 7 depicts the components of process pattern extractor 620. Common process patterns are identified as reusable process design patterns. Common process pattern subcomponent 710 contains the representations of common patterns, where each pattern is represented in terms of the provenance graph data model. The representation is based on the TaskRecord type of ProvenanceRecord as depicted in FIG. 2 and the relationship between TaskRecords. As an example, if multiple task records are merging into the same TaskRecord, then it is an indication of a “simple merge” pattern. Similarly, all patterns can be expressed in terms of the provenance graph TaskRecords and their relations. As a result, graph patterns associated with every common process pattern are formed. These graph patterns are stored in block 710 while associated queries for every pattern are kept in a library of common process pattern queries 720.
  • As discussed above, a common process pattern is a sub-graph where the nodes are TaskRecords and the edges are relations between the TaskRecords. An associated query for a common pattern retrieves all sub-graphs of the provenance graph with the same properties.
  • A common process pattern query sent by block 720 returns a list of matching patterns from the provenance graph in block 740. Common process patterns are then extracted in block 750 from the list of returned sub-graph patterns in block 740.
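  • As a hedged illustration of such a pattern query, the "simple merge" indication mentioned above (several task records leading into the same TaskRecord) can be detected by counting distinct predecessors. The edge representation as (predecessor, successor) pairs, the function name find_simple_merges and the task names are assumptions for the example; an actual implementation would issue the query through the graph query interface.

```python
from collections import defaultdict


def find_simple_merges(prior_task_edges):
    """Return tasks that have two or more distinct predecessor tasks."""
    predecessors = defaultdict(set)
    for predecessor, successor in prior_task_edges:
        predecessors[successor].add(predecessor)
    return {
        task: sorted(preds)
        for task, preds in predecessors.items()
        if len(preds) >= 2
    }


# Hypothetical task-to-task relations aggregated from several traces.
task_edges = [
    ("ReceiveFax", "ReviewEvidence"),
    ("ReceiveEmail", "ReviewEvidence"),
    ("ReviewEvidence", "MarkAchievement"),
]
merges = find_simple_merges(task_edges)
```

  • Each entry of the result corresponds to one matching sub-graph: the merging task together with the tasks that lead into it.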
  • Execution traces 600 may contain patterns other than the common process patterns. The search function 622 is used to extract patterns that are not expressed as a common pattern. As shown in FIG. 7, special pattern query 770 is formed for the pattern to be searched for and the corresponding results are retrieved by block 770.
  • FIG. 8 shows how to compute the statistics of execution paths between two points in a process. The process may have many execution traces and, for the statistics to be meaningful, it is assumed that a sufficient number of execution traces is collected before the statistics module is run. The algorithm starts with identifying the start and end points (800). These are nodes of the provenance graph. The algorithm is run over all the traces. If the current trace is not null (NO in 820), then a Bayesian Learning algorithm 840 is run between the start and end nodes. Existing execution traces are used to create a Markovian model for the provenance graph. The Markovian model assigns probabilities for each node and transitions between the nodes. The learning algorithm is based on the statistics of transitions between nodes and the average time spent in each node. If the trace is null and thus no other traces are available to analyze (YES in 820), then the available statistics are displayed (830). A known path finding algorithm can be employed.
  • Once the path is found, path statistics are re-computed (860) and the next trace (810) is examined. Hence, every new trace contributes to the path statistics in step 860. Path statistics include standardized regression coefficients predicting one variable from another. Once the statistical model is computed for a path, this model is used for predicting how the enterprise process will evolve. As an example, from the statistical model, one can predict what the next execution step will be. Well-known machine learning techniques are available for analyzing a path and computing path statistics by using a Bayesian network model (see Machine Learning, Tom Mitchell, McGraw Hill, 1997, the disclosure of which is incorporated by reference herein in its entirety).
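  • To make the idea of transition statistics concrete, the sketch below estimates first-order Markov transition probabilities from a set of traces by simple counting and uses them to predict the most likely next step. It is a simplified stand-in, not the Bayesian learning algorithm 840: it processes all traces in one batch, ignores the time spent in each node, and the trace contents (including the RejectClaim task) are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(traces):
    """Estimate P(next task | current task) from observed task sequences."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return {
        node: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for node, c in counts.items()
    }


def predict_next(model, node):
    """Return the most probable next task, or None if the node was never left."""
    options = model.get(node)
    if not options:
        return None
    return max(options, key=options.get)


# Four hypothetical execution traces of the same process.
traces = [
    ["CreateChallenge", "SendClaim", "MarkAchievement"],
    ["CreateChallenge", "SendClaim", "MarkAchievement"],
    ["CreateChallenge", "SendClaim", "MarkAchievement"],
    ["CreateChallenge", "SendClaim", "RejectClaim"],
]
model = transition_probabilities(traces)
next_step = predict_next(model, "SendClaim")
```

  • With these sample traces, three of the four observed transitions out of SendClaim lead to MarkAchievement, so that task is predicted as the next execution step.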
  • Workflow is the sequence of tasks in an enterprise operation and provides for an abstraction or a model for the actual work. It provides an easier way to understand and explain complex processes in visual form. Recall that workflow analyzer 644 depicted in FIG. 6 extracts the sequence of tasks from the provenance graph. The sequence of tasks is retrieved from the provenance graph via query interface 610 by searching for the pedigree of TaskRecords, DataRecords and their relations. FIG. 9 is an example workflow 900 for the enterprise operation which includes not only the sequence of tasks but also the data consumed between the tasks.
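  • One way to recover a task sequence from task-to-task relations is a topological sort, sketched below with the graphlib module of the Python standard library (Python 3.9 or later). The pair layout (earlier task, later task) for priorTask relations, the function name task_sequence and the sample tasks are assumptions; the data consumed between tasks, which workflow 900 also shows, is left out of this sketch.

```python
from graphlib import TopologicalSorter


def task_sequence(prior_task_edges):
    """Order tasks so that every task appears after all of its predecessors."""
    predecessors = {}
    for earlier, later in prior_task_edges:
        predecessors.setdefault(earlier, set())
        predecessors.setdefault(later, set()).add(earlier)
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical priorTask relations, deliberately listed out of order.
edges = [
    ("SendClaim", "MarkAchievement"),
    ("CreateChallenge", "SendClaim"),
]
sequence = task_sequence(edges)
```

  • For a linear chain such as this one the order is unique; when tasks run in parallel branches, several valid orders exist and the sort returns one of them.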
  • Discovering the relationship between two enterprise objects is important to understand the roles and impact of various resources on the operation. An edge connecting the two nodes of a graph is an atomic relation. Referring back to FIG. 4C, as an example, SupportingDocument 478 and ClaimEmail 474 nodes are connected with attachedTo relation edge. The atomic relation reveals that SupportingDocument is attached to the ClaimEmail. More complex relations can be composed by combining the atomic relations.
  • FIG. 10 shows the sequence of steps to find the relationship between two provenance records X and Y (1000). The first step is to find all paths between X and Y (1010). Once the connecting paths are discovered, the paths are expressed in terms of their atomic relations (1020), and the main relationship is expressed as the combination of atomic relations (1030).
  • FIG. 11 shows how the relation finder subsystem works. Recall that this component is shown in FIG. 6 as relation (path) finder 640.
  • A pair of provenance records 1100 for which the relationship is sought is entered through user interface 660. One of the path finding algorithms 1110 is used to find the path between the records via the query interface 610. For example, A* and Dijkstra are some of the most frequently used path finding algorithms that may be used. Once the path is found, atomic relations are extracted (1120) and the main relation is built (1130).
  • In the example below, a relationship between SalesManager and SupportingDocument depicted in FIG. 12 is sought. Based on the steps of FIG. 11, the following steps are taken:
      • 1. Finding paths between SalesManager and SupportingDocument
        • First path: SalesManager—ClaimEmail—SupportingDocument
        • Second path: SalesManager—SalesEmployee—ClaimEmail—SupportingDocument
      • 2. Atomic Relations for the first path:
        • attachedTo(ClaimEmail, SupportingDocument): SupportingDocument is attached to the ClaimEmail
        • receiver(SalesManager, ClaimEmail): SalesManager receives the ClaimEmail
      • 3. Building the path from atomic relations in datalog format:
        • Relations(SalesManager, SupportingDocument): attachedTo(ClaimEmail, SupportingDocument) AND receiver(SalesManager, ClaimEmail)
      • 4. Discovered Relation: SupportingDocument is attached to the e-mail received by SalesManager
  • The datalog format is known and disclosed in, for example, S Ceri et al., “What you always wanted to know about Datalog (and never dared to ask),” IEEE Transactions on Knowledge and Data Engineering 1(1), 1989, pp. 146-66; and Datalog User Manual, John D. Ramsdell of The MITRE Corporation, 2004, the disclosures of which are incorporated by reference herein in their entirety.
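  • The four steps of the SalesManager and SupportingDocument example can be sketched as follows. This is an illustration under stated assumptions: paths are enumerated by a plain depth-first search over undirected edges rather than by A* or Dijkstra, the function names find_paths and compose are hypothetical, and the managerOf and sender labels on the second path are assumed, since the example above lists atomic relations only for the first path.

```python
def find_paths(edges, start, goal):
    """Enumerate all simple paths between two records, ignoring edge direction."""
    neighbors = {}
    for relation, a, b in edges:
        atom = (relation, a, b)
        neighbors.setdefault(a, []).append((b, atom))
        neighbors.setdefault(b, []).append((a, atom))
    paths = []

    def walk(node, visited, atoms):
        if node == goal:
            paths.append(atoms)
            return
        for nxt, atom in neighbors.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, atoms + [atom])

    walk(start, {start}, [])
    return paths


def compose(atoms):
    """Express a path as a conjunction of its atomic relations."""
    return " AND ".join(f"{rel}({a}, {b})" for rel, a, b in atoms)


# Atomic relations as (relation, first argument, second argument).
edges = [
    ("receiver", "SalesManager", "ClaimEmail"),
    ("attachedTo", "ClaimEmail", "SupportingDocument"),
    ("managerOf", "SalesManager", "SalesEmployee"),   # assumed label
    ("sender", "SalesEmployee", "ClaimEmail"),        # assumed label
]
paths = find_paths(edges, "SalesManager", "SupportingDocument")
first_relation = compose(paths[0])
```

  • The search finds both paths listed in step 1, and composing the first one yields the same two atoms as step 3, joined by AND in path order.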
  • Lastly, FIG. 13 illustrates a computer system in accordance with which one or more components/steps of the techniques of the invention may be implemented. It is to be further understood that the individual components/steps may be implemented on one such computer system or on more than one such computer system. In the case of an implementation on a distributed computing system, the individual computer systems and/or devices may be connected via a suitable network, e.g., the Internet or World Wide Web. However, the system may be realized via private or local networks. In any case, the invention is not limited to any particular network.
  • Thus, the computer system shown in FIG. 13 may represent one or more of the components/steps shown and described above in the context of FIGS. 1 through 12. For example, the computer system may be used to implement one or more of the components of the information extraction system depicted in FIG. 6.
  • The computer system may generally include a processor 1301, memory 1302, input/output (I/O) devices 1303, and network interface 1304, coupled via a computer bus 1305 or alternate connection arrangement.
  • It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices.
  • The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard disk drive), a removable memory device (e.g., diskette), flash memory, etc. The memory may be considered a computer readable storage medium.
  • In addition, the phrase “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., display, etc.) for presenting results associated with the processing unit.
  • Still further, the phrase “network interface” as used herein is intended to include, for example, one or more transceivers to permit the computer system to communicate with another computer system via an appropriate communications protocol.
  • Accordingly, software components including instructions or code for performing the methodologies described herein may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.
  • In any case, it is to be appreciated that the techniques of the invention, described herein and shown in the appended figures, may be implemented in various forms of hardware, software, or combinations thereof, e.g., one or more operatively programmed general purpose digital computers with associated memory, implementation-specific integrated circuit(s), functional circuitry, etc. Given the techniques of the invention provided herein, one of ordinary skill in the art will be able to contemplate other implementations of the techniques of the invention.
  • Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.

Claims (25)

1. A computer-implemented method of extracting information regarding an execution of an enterprise process, comprising the steps of:
generating provenance data, wherein the provenance data is based on collected data associated with an actual end-to-end execution of the enterprise process and is indicative of a lineage of one or more data items;
generating a provenance graph that provides a visual representation of the generated provenance data, wherein nodes of the graph represent records associated with the collected data and edges of the graph represent relations between the records; and
analyzing at least a portion of the generated provenance data from the graph so as to extract information about the execution of the enterprise process based on the analysis.
2. The method of claim 1, wherein the analysis step further comprises determining one or more process patterns in the enterprise process.
3. The method of claim 2, wherein the one or more process patterns comprise one or more common process patterns.
4. The method of claim 2, wherein the one or more process patterns comprise one or more custom process patterns.
5. The method of claim 4, wherein a custom pattern is specified by a user through a user interface and a pattern search function is invoked to search for this particular pattern.
6. The method of claim 1, wherein the analysis step further comprises performing a statistical analysis on at least a portion of the generated provenance data.
7. The method of claim 6, wherein the statistical analysis predicts a subsequent evolution of the enterprise process.
8. The method of claim 1, wherein the analysis step further comprises discovering an actual workflow from at least a portion of the generated provenance data.
9. The method of claim 1, wherein the analysis step further comprises discovering a relation between two enterprise objects that are part of the enterprise process from at least a portion of the generated provenance data.
10. The method of claim 9, wherein the two enterprise objects for which the relation is discovered are not directly connected or correlated within the provenance graph.
11. The method of claim 1, wherein the analysis step further comprises discovering one or more human resources behind the execution of non-automated tasks and their roles in the enterprise process.
12. The method of claim 1, wherein the analysis step further comprises discovering a list of one or more executed tasks, orders of the tasks, durations of the tasks and outputs of the tasks in the enterprise process.
13. The method of claim 1, wherein the analysis step further comprises discovering one or more artifacts and their pedigree in the enterprise process.
14. Apparatus for extracting information regarding an execution of an enterprise process, comprising:
a memory; and
a processor coupled to the memory and configured to:
generate provenance data, wherein the provenance data is based on collected data associated with an actual end-to-end execution of the enterprise process and is indicative of a lineage of one or more data items;
generate a provenance graph that provides a visual representation of the generated provenance data, wherein nodes of the graph represent records associated with the collected data and edges of the graph represent relations between the records; and
analyze at least a portion of the generated provenance data from the graph so as to extract information about the execution of the enterprise process based on the analysis.
15. The apparatus of claim 14, wherein the analysis further comprises determining one or more process patterns in the enterprise process.
16. The apparatus of claim 15, wherein the one or more process patterns comprise one or more common process patterns.
17. The apparatus of claim 15, wherein the one or more process patterns comprise one or more custom process patterns.
18. The apparatus of claim 14, wherein the analysis further comprises performing a statistical analysis on at least a portion of the generated provenance data.
19. The apparatus of claim 14, wherein the analysis further comprises discovering an actual workflow from at least a portion of the generated provenance data.
20. The apparatus of claim 14, wherein the analysis further comprises discovering a relation between two enterprise objects that are part of the enterprise process from at least a portion of the generated provenance data.
21. The apparatus of claim 20, wherein the two enterprise objects for which the relation is discovered are not directly connected or correlated within the provenance graph.
22. The apparatus of claim 14, wherein the analysis further comprises discovering one or more human resources behind the execution of non-automated tasks and their roles in the enterprise process.
23. The apparatus of claim 14, wherein the analysis further comprises discovering a list of one or more executed tasks, orders of the tasks, durations of the tasks and outputs of the tasks in the enterprise process.
24. The apparatus of claim 14, wherein the analysis further comprises discovering one or more artifacts and their pedigree in the enterprise process.
25. An article of manufacture for extracting information regarding an execution of an enterprise process, the article comprising a computer readable storage medium including program code which when executed by a computer performs the steps of:
generating provenance data, wherein the provenance data is based on collected data associated with an actual end-to-end execution of the enterprise process and is indicative of a lineage of one or more data items;
generating a provenance graph that provides a visual representation of the generated provenance data, wherein nodes of the graph represent records associated with the collected data and edges of the graph represent relations between the records; and
analyzing at least a portion of the generated provenance data from the graph so as to extract information about the execution of the enterprise process based on the analysis.
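
As an illustration only, the steps recited in the claims above (records as graph nodes, relations as graph edges, then analysis of the graph) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation described in the patent: the class name ProvenanceGraph, the record identifiers, the "reads"/"writes" relation labels, and the timestamps are all hypothetical, and the sketch builds the graph in memory without rendering the visual representation. It shows one way to discover a relation between two enterprise objects that are not directly connected (claims 9 and 10) and to list executed tasks with their order, durations, and outputs (claim 12).

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProvenanceGraph:
    """Hypothetical in-memory graph: nodes are records, edges are relations."""
    nodes: dict = field(default_factory=dict)  # record id -> attribute dict
    edges: dict = field(default_factory=dict)  # record id -> set of (relation, neighbor id)

    def add_record(self, record_id, **attrs):
        self.nodes[record_id] = attrs
        self.edges.setdefault(record_id, set())

    def add_relation(self, source, relation, target):
        # Stored in both directions so lineage can be followed either way.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def find_relation(self, start, goal):
        """Breadth-first search; returns the shortest chain of record ids, or None."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for _, neighbor in sorted(self.edges[path[-1]]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(path + [neighbor])
        return None

    def executed_tasks(self):
        """Task records ordered by start time: (id, duration in minutes, output)."""
        tasks = [(rid, a) for rid, a in self.nodes.items() if a.get("kind") == "task"]
        tasks.sort(key=lambda item: item[1]["start"])
        return [
            (rid, (a["end"] - a["start"]).total_seconds() / 60, a.get("output"))
            for rid, a in tasks
        ]


def build_example_graph():
    # Made-up records standing in for data collected from a process execution.
    g = ProvenanceGraph()
    g.add_record("order-17", kind="data")
    g.add_record("review", kind="task", start=datetime(2008, 11, 6, 9, 0),
                 end=datetime(2008, 11, 6, 9, 30), output="approval-3")
    g.add_record("approval-3", kind="data")
    g.add_record("ship", kind="task", start=datetime(2008, 11, 6, 10, 0),
                 end=datetime(2008, 11, 6, 10, 45), output="invoice-9")
    g.add_record("invoice-9", kind="data")
    g.add_relation("review", "reads", "order-17")
    g.add_relation("review", "writes", "approval-3")
    g.add_relation("ship", "reads", "approval-3")
    g.add_relation("ship", "writes", "invoice-9")
    return g


if __name__ == "__main__":
    graph = build_example_graph()
    # Chain linking two data records that share no direct edge.
    print(graph.find_relation("order-17", "invoice-9"))
    # Tasks in execution order with durations and outputs.
    print(graph.executed_tasks())
```

In this sketch the relation between "order-17" and "invoice-9" is recovered as a chain through the intermediate task and data records, which is one plausible reading of a relation between objects that are not directly connected within the graph. Breadth-first search is a swapped-in technique chosen for simplicity; the claims do not name a particular search algorithm.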
US12/265,993 2008-11-06 2008-11-06 Extracting enterprise information through analysis of provenance data Expired - Fee Related US9053437B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/265,993 US9053437B2 (en) 2008-11-06 2008-11-06 Extracting enterprise information through analysis of provenance data

Publications (2)

Publication Number Publication Date
US20100114629A1 (en) 2010-05-06
US9053437B2 (en) 2015-06-09

Family

ID=42132549

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/265,993 Expired - Fee Related US9053437B2 (en) 2008-11-06 2008-11-06 Extracting enterprise information through analysis of provenance data

Country Status (1)

Country Link
US (1) US9053437B2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10275476B2 (en) * 2014-12-22 2019-04-30 Verizon Patent And Licensing Inc. Machine to machine data aggregator
KR101783791B1 (en) * 2016-05-01 2017-10-11 충북대학교 산학협력단 Compression apparatus and method for managing provenance
US10776740B2 (en) 2016-06-07 2020-09-15 International Business Machines Corporation Detecting potential root causes of data quality issues using data lineage graphs
US10489225B2 (en) 2017-08-10 2019-11-26 Bank Of America Corporation Automatic resource dependency tracking and structure for maintenance of resource fault propagation
US11816596B2 (en) 2020-02-25 2023-11-14 Apps Consultants Inc. Process discovery and optimization using time-series databases, graph-analytics, and machine learning
US11367008B2 (en) 2020-05-01 2022-06-21 Cognitive Ops Inc. Artificial intelligence techniques for improving efficiency
US11403120B1 (en) 2021-01-27 2022-08-02 UiPath, Inc. Enterprise process graphs for representing RPA data

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6278977B1 (en) * 1997-08-01 2001-08-21 International Business Machines Corporation Deriving process models for workflow management systems from audit trails
US20030144868A1 (en) * 2001-10-11 2003-07-31 Macintyre James W. System, method, and computer program product for processing and visualization of information
US6763353B2 (en) * 1998-12-07 2004-07-13 Vitria Technology, Inc. Real time business process analysis method and apparatus
US20040174397A1 (en) * 2003-03-05 2004-09-09 Paul Cereghini Integration of visualizations, reports, and data
US6920474B2 (en) * 2002-03-25 2005-07-19 Data Quality Solutions, Inc. Method and system for enterprise business process management
US20050210473A1 (en) * 2004-03-08 2005-09-22 Frank Inchingolo Controlling task execution
US20060149759A1 (en) * 2004-12-30 2006-07-06 Bird Colin L Method and apparatus for managing feedback in a group resource environment
US20060184410A1 (en) * 2003-12-30 2006-08-17 Shankar Ramamurthy System and method for capture of user actions and use of capture data in business processes
US7143392B2 (en) * 2001-09-19 2006-11-28 Hewlett-Packard Development Company, L.P. Hyperbolic tree space display of computer system monitoring and analysis data
US20070288479A1 (en) * 2006-06-09 2007-12-13 Copyright Clearance Center, Inc. Method and apparatus for converting a document universal resource locator to a standard document identifier
US20080103749A1 (en) * 2006-10-26 2008-05-01 Hewlett-Packard Development Company, L.P. Computer network management
US20080126042A1 (en) * 2006-08-23 2008-05-29 Kim Tyler T System And Method For Optimum Phasing Of A Three-Shaft Steering Column
US20090281865A1 (en) * 2008-05-08 2009-11-12 Todor Stoitsev Method and system to manage a business process
US20090292818A1 (en) * 2008-05-22 2009-11-26 Marion Lee Blount Method and Apparatus for Determining and Validating Provenance Data in Data Stream Processing System
US7668726B2 (en) * 1999-06-14 2010-02-23 Bally Technologies, Inc. Data visualisation system and method
US20100082331A1 (en) * 2008-09-30 2010-04-01 Xerox Corporation Semantically-driven extraction of relations between named entities
US20100114627A1 (en) * 2008-11-06 2010-05-06 Adler Sharon C Processing of Provenance Data for Automatic Discovery of Enterprise Process Information
US20100114630A1 (en) * 2008-11-06 2010-05-06 Adler Sharon C Influencing Behavior of Enterprise Operations During Process Enactment Using Provenance Data
US20100114628A1 (en) * 2008-11-06 2010-05-06 Adler Sharon C Validating Compliance in Enterprise Operations Based on Provenance Data

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6236994B1 (en) 1997-10-21 2001-05-22 Xerox Corporation Method and apparatus for the integration of information and knowledge
US6138121A (en) 1998-05-29 2000-10-24 Hewlett-Packard Company Network management event storage and manipulation using relational database technology in a data warehouse
EP1129417A4 (en) 1998-12-04 2004-06-30 Technology Enabling Company Ll Systems and methods for organizing data
US7200563B1 (en) 1999-08-20 2007-04-03 Acl International Inc. Ontology-driven information system
US6922685B2 (en) 2000-05-22 2005-07-26 Mci, Inc. Method and system for managing partitioned data resources
US7039953B2 (en) 2001-08-30 2006-05-02 International Business Machines Corporation Hierarchical correlation of intrusion detection events
EP1444629A4 (en) 2001-10-23 2006-06-14 Electronic Data Syst Corp System and method for managing spending
AU2003901152A0 (en) 2003-03-12 2003-03-27 Intotality Pty Ltd Network service management system and method
US20060242180A1 (en) 2003-07-23 2006-10-26 Graf James A Extracting data from semi-structured text documents
US20040107124A1 (en) 2003-09-24 2004-06-03 James Sharpe Software Method for Regulatory Compliance
US20050071207A1 (en) 2003-09-26 2005-03-31 E2Open Llc Visibility and synchronization in a multi tier supply chain model
US20060149589A1 (en) 2005-01-03 2006-07-06 Cerner Innovation, Inc. System and method for clinical workforce management interface
CA2560277A1 (en) 2004-03-19 2005-09-29 Oversight Technologies, Inc. Methods and systems for transaction compliance monitoring
US7552447B2 (en) 2004-05-26 2009-06-23 International Business Machines Corporation System and method for using root cause analysis to generate a representation of resource dependencies
US20060253477A1 (en) 2004-10-12 2006-11-09 Maranhao Ramiro M Automated data collection and management system
US7610545B2 (en) 2005-06-06 2009-10-27 Bea Systems, Inc. Annotations for tracking provenance
JP2009504026A (en) 2005-07-27 2009-01-29 ダグ カーソン アンド アソシエーツ,インク. Verification history data associated with digital content
US20070156478A1 (en) 2005-09-23 2007-07-05 Accenture Global Services Gmbh High performance business framework and associated analysis and diagnostic tools and processes
US8893111B2 (en) 2006-03-31 2014-11-18 The Invention Science Fund I, Llc Event evaluation using extrinsic state information
US20080040181A1 (en) 2006-04-07 2008-02-14 The University Of Utah Research Foundation Managing provenance for an evolutionary workflow process in a collaborative environment
US20080005194A1 (en) 2006-05-05 2008-01-03 Lockheed Martin Corporation System and method for immutably cataloging and storing electronic assets in a large scale computer system
US20070276711A1 (en) 2006-05-23 2007-11-29 Simon Shiu Method of monitoring procedural compliance of business processes
US20080126399A1 (en) 2006-06-29 2008-05-29 Macgregor Robert M Method and apparatus for optimizing data while preserving provenance information for the data
WO2008039741A2 (en) 2006-09-25 2008-04-03 Mark Business Intelligence Systems, Llc. System and method for project process and workflow optimization
GB0621409D0 (en) 2006-10-27 2006-12-06 Ibm Access control within a publish/subscribe system
US7908281B2 (en) 2006-11-22 2011-03-15 Architecture Technology Corporation Dynamic assembly of information pedigrees

Cited By (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8943084B2 (en) 2011-05-20 2015-01-27 International Business Machines Corporation Method, program, and system for converting part of graph data to data structure as an image of homomorphism
US8166122B2 (en) * 2009-03-24 2012-04-24 Lockheed Martin Corporation Method and apparatus for generating a figure of merit for use in transmission of messages in a multi-level secure environment
US20100250689A1 (en) * 2009-03-24 2010-09-30 Lockheed Martin Corporation Method and apparatus for generating a figure of merit for use in transmission of messages in a multi-level secure environment
US20130226893A1 (en) * 2009-11-19 2013-08-29 21Ct, Inc. System and method for optimizing pattern query searches on a graph database
US9292570B2 (en) * 2009-11-19 2016-03-22 Northrop Grumman Systems Corporation System and method for optimizing pattern query searches on a graph database
US20110153311A1 (en) * 2009-12-17 2011-06-23 Boegl Andreas Method and an apparatus for automatically providing a common modelling pattern
US9305271B2 (en) * 2009-12-17 2016-04-05 Siemens Aktiengesellschaft Method and an apparatus for automatically providing a common modelling pattern
US8589331B2 (en) * 2010-10-22 2013-11-19 International Business Machines Corporation Predicting outcomes of a content driven process instance execution
US20120143867A1 (en) * 2010-12-07 2012-06-07 Sap Ag Facilitating Extraction and Discovery of Enterprise Services
US9189566B2 (en) * 2010-12-07 2015-11-17 Sap Se Facilitating extraction and discovery of enterprise services
US8818932B2 (en) 2011-02-14 2014-08-26 Decisive Analytics Corporation Method and apparatus for creating a predictive model
US20130018702A1 (en) * 2011-04-22 2013-01-17 Progress Software Corporation System and method for responsive process management driven by business visibility and complex event processing
US10528906B2 (en) * 2011-04-22 2020-01-07 Progress Software Corporation System and method for responsive process management driven by business visibility and complex event processing
US20120296924A1 (en) * 2011-05-20 2012-11-22 International Business Machines Corporation Method, program, and system for converting part of graph data to data structure as an image of homomorphism
US8914391B2 (en) * 2011-05-20 2014-12-16 International Business Machines Corporation Method, program, and system for converting part of graph data to data structure as an image of homomorphism
US8928665B2 (en) 2011-06-21 2015-01-06 International Business Machines Corporation Supporting recursive dynamic provenance annotations over data graphs
US8423575B1 (en) 2011-09-29 2013-04-16 International Business Machines Corporation Presenting information from heterogeneous and distributed data sources with real time updates
US8589444B2 (en) 2011-09-29 2013-11-19 International Business Machines Corporation Presenting information from heterogeneous and distributed data sources with real time updates
US10547525B2 (en) * 2011-10-14 2020-01-28 Mimecast Services Ltd. Determining events by analyzing stored electronic communications
US20180013643A1 (en) * 2011-10-14 2018-01-11 Mimecast Services Ltd. Determining events by analyzing stored electronic communications
US20130106860A1 (en) * 2011-10-28 2013-05-02 International Business Machines Corporation Visualization of virtual image relationships and attributes
US8754892B2 (en) * 2011-10-28 2014-06-17 International Business Machines Corporation Visualization of virtual image relationships and attributes
US8749554B2 (en) * 2011-10-28 2014-06-10 International Business Machines Corporation Visualization of virtual image relationships and attributes
US9740754B2 (en) 2011-11-02 2017-08-22 Sap Se Facilitating extraction and discovery of enterprise services
US9069844B2 (en) 2011-11-02 2015-06-30 Sap Se Facilitating extraction and discovery of enterprise services
US8700678B1 (en) * 2011-12-21 2014-04-15 Emc Corporation Data provenance in computing infrastructure
US9710332B1 (en) 2011-12-21 2017-07-18 EMC IP Holding Company LLC Data provenance in computing infrastructure
US20130231978A1 (en) * 2012-03-01 2013-09-05 International Business Machines Corporation Integrated case management history and analytics
US9177289B2 (en) 2012-05-03 2015-11-03 Sap Se Enhancing enterprise service design knowledge using ontology-based clustering
US8892546B2 (en) * 2012-05-31 2014-11-18 International Business Machines Corporation Search quality via query provenance visualization
US20130325831A1 (en) * 2012-05-31 2013-12-05 International Business Machines Corporation Search quality via query provenance visualization
US9195725B2 (en) 2012-07-23 2015-11-24 International Business Machines Corporation Resolving database integration conflicts using data provenance
US8825581B2 (en) 2012-09-10 2014-09-02 International Business Machines Corporation Simplifying a graph of correlation rules while preserving semantic coverage
US9324033B2 (en) * 2012-09-13 2016-04-26 Nokia Technologies Oy Method and apparatus for providing standard data processing model through machine learning
US20140074760A1 (en) * 2012-09-13 2014-03-13 Nokia Corporation Method and apparatus for providing standard data processing model through machine learning
US10438119B2 (en) * 2012-10-12 2019-10-08 International Business Machines Corporation Text-based inference chaining
US11182679B2 (en) * 2012-10-12 2021-11-23 International Business Machines Corporation Text-based inference chaining
US20150234936A1 (en) * 2014-02-20 2015-08-20 Fujitsu Limited Event propagation in graph data
US9372736B2 (en) * 2014-05-06 2016-06-21 International Business Machines Corporation Leveraging path information to generate predictions for parallel business processes
US20150324241A1 (en) * 2014-05-06 2015-11-12 International Business Machines Corporation Leveraging path information to generate predictions for parallel business processes
US9946738B2 (en) 2014-11-05 2018-04-17 Palantir Technologies, Inc. Universal data pipeline
US10853338B2 (en) 2014-11-05 2020-12-01 Palantir Technologies Inc. Universal data pipeline
US10191926B2 (en) 2014-11-05 2019-01-29 Palantir Technologies, Inc. Universal data pipeline
US20160301706A1 (en) * 2015-04-07 2016-10-13 Passport Health Communications, Inc. Enriched system for suspicious interaction record detection
US10187399B2 (en) * 2015-04-07 2019-01-22 Passport Health Communications, Inc. Enriched system for suspicious interaction record detection
US20170039253A1 (en) * 2015-08-03 2017-02-09 Palantir Technologies, Inc. Providing full data provenance visualization for versioned datasets
US9996595B2 (en) * 2015-08-03 2018-06-12 Palantir Technologies, Inc. Providing full data provenance visualization for versioned datasets
US11080296B2 (en) 2015-09-09 2021-08-03 Palantir Technologies Inc. Domain-specific language for dataset transformations
US10235781B2 (en) * 2016-01-15 2019-03-19 Oracle International Corporation Visualization of provenance data
US10580177B2 (en) * 2016-01-15 2020-03-03 Oracle International Corporation Visualization of provenance data
US10007674B2 (en) 2016-06-13 2018-06-26 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
US11841835B2 (en) 2016-06-13 2023-12-12 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
US11106638B2 (en) 2016-06-13 2021-08-31 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
US10540624B2 (en) * 2016-07-20 2020-01-21 International Business Machines Corporation System and method to automate provenance-aware application execution
US10956406B2 (en) 2017-06-12 2021-03-23 Palantir Technologies Inc. Propagated deletion of database records and derived data
US10936988B2 (en) 2017-09-15 2021-03-02 International Business Machines Corporation Cognitive process enactment
US20190087755A1 (en) * 2017-09-15 2019-03-21 International Business Machines Corporation Cognitive process learning
US10846644B2 (en) * 2017-09-15 2020-11-24 International Business Machines Corporation Cognitive process learning
US11488029B2 (en) 2017-09-15 2022-11-01 International Business Machines Corporation Cognitive process code generation
US10664338B2 (en) 2017-12-12 2020-05-26 International Business Machines Corporation System and method for root cause analysis in large scale data curation flows using provenance
US11120005B2 (en) 2018-01-26 2021-09-14 International Business Machines Corporation Reliable workflow system provenance tracking at runtime
US10754822B1 (en) 2018-04-18 2020-08-25 Palantir Technologies Inc. Systems and methods for ontology migration
US11093483B2 (en) 2019-03-22 2021-08-17 Microsoft Technology Licensing, LLC Multilevel data lineage view
WO2020197794A1 (en) * 2019-03-22 2020-10-01 Microsoft Technology Licensing, LLC Multilevel data lineage view
US11687571B2 (en) 2019-04-19 2023-06-27 Tableau Software, LLC Interactive lineage analyzer for data assets
US11651003B2 (en) 2019-09-27 2023-05-16 Tableau Software, LLC Interactive data visualization interface for data and graph models
US11829421B2 (en) * 2019-11-08 2023-11-28 Tableau Software, LLC Dynamic graph generation for interactive data analysis
CN112328839A (en) * 2020-11-05 2021-02-05 航天信息股份有限公司 Enterprise risk identification method and system based on enterprise sales relationship map

Also Published As

Publication number Publication date
US9053437B2 (en) 2015-06-09

Similar Documents

Publication Publication Date Title
US9053437B2 (en) Extracting enterprise information through analysis of provenance data
US8595042B2 (en) Processing of provenance data for automatic discovery of enterprise process information
US8209204B2 (en) Influencing behavior of enterprise operations during process enactment using provenance data
US20100114628A1 (en) Validating Compliance in Enterprise Operations Based on Provenance Data
Curbera et al. Business provenance–a technology to increase traceability of end-to-end operations
Reijers et al. Human and automatic modularizations of process models to enhance their comprehension
Van der Aalst et al. Discovering workflow performance models from timed logs
US8600792B2 (en) Business process visibility at real time
Wickboldt et al. A framework for risk assessment based on analysis of historical information of workflow execution in IT systems
Rozinat et al. Discovering colored Petri nets from event logs
US20060184410A1 (en) System and method for capture of user actions and use of capture data in business processes
US20080065400A1 (en) System and Method for Producing Audit Trails
Koetter et al. A model-driven approach for event-based business process monitoring
Shihab An exploration of challenges limiting pragmatic software defect prediction
Biffl et al. Semantic integration of heterogeneous data sources for monitoring frequent-release software projects
Van der Aalst et al. Getting the data
Verhulst Evaluating quality of event data within event logs: an extensible framework
Kebede et al. Comparative evaluation of process mining tools
Curty et al. Design of blockchain-based applications using model-driven engineering and low-code/no-code platforms: a structured literature review
Oliveira et al. Using REO on ETL conceptual modelling: a first approach
Wetzstein KPI-related monitoring, analysis, and adaptation of business processes
US20140372386A1 (en) Detecting wasteful data collection
Hompes et al. Lifecycle-based process performance analysis
US10938673B2 (en) Automated SLA non-compliance detection and prevention system for batch jobs
Bala et al. Uncovering the hidden co-evolution in the work history of software projects

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ADLER, SHARON C.;CURBERA, FRANCISCO PHELAN;DOGANATA, YURDAER NEZIHI;AND OTHERS;SIGNING DATES FROM 20081211 TO 20081217;REEL/FRAME:022003/0832

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Expired due to failure to pay maintenance fee

Effective date: 20190609