EP1810131A2 - Services oriented architecture for data integration services

Services oriented architecture for data integration services

Info

Publication number
EP1810131A2
Authority
EP
European Patent Office
Prior art keywords
module
data
service
function
services
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05792780A
Other languages
German (de)
English (en)
Other versions
EP1810131A4 (fr)
Inventor
Vinodh Arjun
Hernando Borda
Thomas Cherel
Rajiv Kadayam
Truong Le
Jean-Claude Mamou
Lee J. Scheffler
Rick Stiller
Christian Tawil
Brian Tinnel
Henry Zheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
Ascential Software Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ascential Software Corp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • This invention relates to the field of information technology, and more particularly to the field of data integration systems.
  • EAI efforts encounter many challenges, including the need to handle different protocols, the need to address ever-increasing volumes of data and numbers of transactions, and an ever-increasing appetite for faster integration of data.
  • Various approaches to EAI have been taken, including least-common-denominator approaches, atomic approaches, and bridge-type approaches.
  • EAI is based upon communication between individual applications.
  • the complexity of EAI solutions grows geometrically in response to linear additions of platforms and applications.
  • the methods and systems may include providing one or more modules, tools, facilities, functions, services, processes or the like that perform a data integration function.
  • the methods and systems may also include providing a registry of services that can be accessed by users, such as users identifying, designing, developing, deploying and using data integration jobs or platforms.
  • One or more of the modules, tools, facilities, functions, services, processes or the like can be provided with an input stage, an output stage, or both, such as a binding, that permits the data integration module, tool, function, service, or process to be accessed through the registry, such as for real-time or batch execution of the data integration functionality that is supported by the module, tool, facility, function, service or process.
  • the module, tool, facility, function, service or process can be identified and used as a service in a services oriented architecture.
  • any one of them can be modified without impacting the performance of other related items.
  • a wide range of data integration modules, tools, facilities, functions, services and processes can be deployed as services in a services oriented architecture for a data integration job, method, platform or system.
  • the foregoing may include an extraction function, a data transformation, a loading function, a metadata management function, a data profiling function, a mapping function, a data auditing function, a data quality function, a data cleansing function, a matching function, a probabilistic matching function, a metabroker function, a data migration function, an atomic data repository function, a semantic identification function, a filtering function, a refinement and selection function, a design interface function, or many others.
  • the methods and systems described herein include providing a module for a data extraction function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a data transformation function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a data loading function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a metadata management function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a data profiling function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a data auditing function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a data cleansing function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a data quality function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a data matching function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the data matching function may be a probabilistic matching function.
  • the methods and systems described herein also include providing a module for a metabroker function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the metabroker function maintains the semantics of a data integration function across multiple data integration platforms.
  • the methods and systems described herein also include providing a module for a data migration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for an atomic data repository, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a semantic identification function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a filtering function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the filtering is based on a level of abstraction.
  • the level of abstraction can be at least one of a physical level of abstraction and a logical level of abstraction.
  • the methods and systems described herein also include providing a module for a refinement and selection facility, providing a registry of services, providing an interface for the facility, and identifying the facility in the registry, wherein the facility can be accessed as a service in a services oriented architecture.
  • the refinement and selection facility allows the system to distinguish between a logical level of abstraction and a physical level of abstraction.
  • the methods and systems described herein also include providing a module for analyzing the contents of a database, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for analyzing a table of a database, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for analyzing a row of a database, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for analyzing a data structure, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for recommending a target data facility, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for providing a primary key for a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for providing a foreign key for a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for providing a table normalization for a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for source-to-target mapping for a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for automatically generating a data integration job from a profile for a data integration job, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for defect detection, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for measuring the performance of a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for data de-duplication, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the de-duplication module matches data items based on a probability.
  • the de-duplication module discards duplicate items.
  • the methods and systems described herein also include providing a module for statistical analysis of a plurality of data items, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for reconciling data from a plurality of data facilities, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for accessing a library of transformation functions, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for managing versions of a data integration job, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module allows a user to share a version with another user.
  • the module allows a user to check in and check out a version of a data integration job in order to use the data integration job.
  • the methods and systems described herein also include providing a module for parallel execution of a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for partitioning data, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for partitioning and repartitioning data, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a database interface module, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the interface module facilitates an interface to databases of a plurality of database vendors.
  • the methods and systems described herein also include providing a module for a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for synchronizing data, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module facilitates synchronization of data across a plurality of hierarchical data formats.
  • the module facilitates synchronization of data across a plurality of transactional formats.
  • the module facilitates synchronization of data across a plurality of operating environments.
  • the module facilitates synchronization of Electronic Data Interchange format data.
  • the module facilitates synchronization of HIPAA data.
  • the module facilitates synchronization of SWIFT format data.
  • the methods and systems described herein also include providing a module for supplying a metadata directory, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for graphical depiction of the impact of a change to a data integration function, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for creating a metabroker, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for a hub repository of metadata, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the hub stores semantic models for a plurality of data integration platforms.
  • the methods and systems described herein also include providing a packaged application connectivity kit (PACK), providing a registry of services, providing an interface for the PACK, and identifying the PACK in the registry, wherein the PACK can be accessed as a service in a services oriented architecture.
  • PACK: packaged application connectivity kit
  • the methods and systems described herein also include providing a module for storing an industry-specific data model, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the model may be a manufacturing industry model, a retail industry model, a telecommunications industry model, a healthcare industry model, a financial services industry model or a model from any other industry.
  • the methods and systems described herein also include providing a template for building a data integration function, providing a registry of services, providing an interface for the template, and identifying the template in the registry, wherein the template can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for creating a business rule, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for creating a validation table, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for creating a business metric, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for defining a target database, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for profiling mainframe data, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for batch processing a batch of data, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for cross-table analysis, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for relationship analysis, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for generating data definition language (DDL) code, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems may further include using the module to create a mapping between source and target data facilities.
  • the methods and systems described herein also include providing a design interface module for designing a data integration job, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for developing a data integration job, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems described herein also include providing a module for deploying a data integration job, providing a registry of services, providing an interface for the module, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the services in a services oriented architecture for a data integration platform or process may be services that are useful for a wide range of integration and computing tasks, including modules that perform functions that are required or beneficial for many common tasks.
  • a monitoring service may be deployed as a service with an input stage and/or output stage in a services oriented architecture.
  • the monitoring service may be invoked by a user to monitor some aspect of the performance of a data integration job or task, or to monitor an event or process.
  • a data integration system may have a service called a job execution service, the purpose of which is to run a job, such as a batch job.
  • With a monitoring service, a user can monitor how many times the job execution service has been run, how long it took to run, the minimum execution time, maximum execution time, average execution time and other statistics. The user can accomplish all of those functions without seeing the code of the underlying job execution service.
  • monitoring services are deployed as services.
  • a user can ask, for example, how many databases have been touched or other monitoring items that are specific to the semantics of the job execution service.
  • the job execution service can itself be a client of the monitoring service.
  • the system can tell what is happening inside the implementation of another service.
  • each common service such as the monitoring service and the other services
  • various areas can be established for each service, such as what to monitor, the runtime of the service, and an administration part.
  • the user may be queried as to what to monitor.
  • the monitoring service can be used by services in a services oriented architecture to monitor what the services do or may be used to conduct domain-specific monitoring for other events and conditions.
  • a security module or service may be deployed as a service in a services oriented architecture for providing a security capability, such as in connection with a data integration job or task.
  • a security facility such as password protection, encryption, tracking access, restricting access, or the like
  • the user can invoke a security module as a service in a services oriented architecture, so that the user does not have to create a separate security facility for each data integration job or task.
  • a licensing module may be deployed in a services oriented architecture, for enabling licensing functions when invoked by a user. For example, a job designer may cause a data integration job to invoke the licensing service to determine whether a particular task to be executed at runtime does or does not comply with license restrictions, such as license restrictions related to the number of machines, number of users, or the like. The user avoids the need to prepare separate licensing code for each data integration job or task the user creates.
  • An event management module may be deployed in a services oriented architecture for tracking and managing events when invoked by a user through a services registry.
  • the user may access the event management module for any event management required for a data integration job or task, such as tracking events in order to determine when to execute a process or function.
  • the user avoids the need to create separate event management code for each different data integration task or job.
  • a provisioning capability may be provided as a common service in a services oriented architecture.
  • a transaction module may be deployed in a services oriented architecture with an input stage and/or output stage that allows a user to access the transaction module through a services registry, avoiding the need to create separate transaction management code for each application created by the user, such as for a data integration job or task.
  • An auditing module can be deployed in a services oriented architecture with an input stage and/or output stage that allows a user to access the auditing module through a services registry, avoiding the need to create separate auditing code for each application created by the user, such as for a data integration job or task.
  • the user can audit events, such as auditing what users have accessed a particular database or process, what events have taken place, and the like.
  • An auditing module can allow a user to conveniently audit past events without having to generate separate code.
  • a wide variety of common tasks that are necessary or beneficial for data integration jobs or platforms can be created as modules and deployed as services in a services oriented architecture.
  • AOP: aspect-oriented programming
  • various metadata functions and modules can be implemented as services with AOP.
  • bindings for services such as EJBs (such as EJB 3.0) may use AOP.
  • a policy may be associated with any of the foregoing services and/or modules using aspect oriented programming.
  • bindings such as EJB, JMS, web services and JCA bindings can be used to invoke services in the various embodiments of services oriented architectures described herein.
  • an application programming interface may be provided for assisting access to a service.
  • the API may provide various functions, such as selecting a particular binding for a service, where the selection is based on a condition or event, such as selecting a binding that is appropriate for a particular application. For example, bindings may vary in their flexibility, and an API may apply a tight or loose binding based on the conditions of the application or device that accesses the service.
  • the API may be a Java API or similar facility. In embodiments the same Java API may be used for different kinds of bindings.
  • a smart client may be supplied for a service.
  • the smart client may be another layer on top of the API.
  • the smart client may be stored and accessed through a registry associated with a service.
  • an application may download the appropriate smart client based on the device using the application, the context of the application, or the like.
  • a smart client may be used to buffer certain information that is used by a service and send the information to the service in a package, rather than having an application access the service constantly.
  • a user may wish to log only errors, rather than all events. By holding events until predetermined time periods, the user can reduce the number of calls to the server while still capturing all of the necessary events.
  • the smart client can thus execute various rules that optimize the use of a service by a device or application.
  • the smart client can select a binding, either alone or by interaction with an API, that optimizes the binding of the client-side device or application to the service based on the conditions of access, the capabilities of the device, the context of the access, or the like.
  • the smart client or API can be used to store various access rules. For example, the rules might indicate that if a device or application is inside a firewall, then it can access a service using EJB bindings, while if the device or application is outside the firewall then it will access a service using a web service binding. Any such rules can be embodied in the API or may be included in a smart client, which may optionally be listed in a registry with the service and downloaded by a client device or application that will access the service.
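The firewall example above can be read as an ordered list of access rules with a default binding. The following is a minimal sketch of that idea, not the patent's implementation: `BindingSelectorSketch`, `AccessContext`, `Rule`, and the `Binding` enum are hypothetical names introduced only for illustration, and the single `insideFirewall` flag is an assumed stand-in for whatever access characteristics a real API or smart client would inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical names throughout; none of these types come from the source text.
class BindingSelectorSketch {

    enum Binding { EJB, WEB_SERVICE }

    // The access characteristic considered by the rules (assumed for illustration).
    static final class AccessContext {
        final boolean insideFirewall;
        AccessContext(boolean insideFirewall) { this.insideFirewall = insideFirewall; }
    }

    // One access rule: if the condition holds, use the given binding.
    static final class Rule {
        final Predicate<AccessContext> condition;
        final Binding binding;
        Rule(Predicate<AccessContext> condition, Binding binding) {
            this.condition = condition;
            this.binding = binding;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final Binding fallback;

    BindingSelectorSketch(Binding fallback) { this.fallback = fallback; }

    BindingSelectorSketch addRule(Predicate<AccessContext> condition, Binding binding) {
        rules.add(new Rule(condition, binding));
        return this;
    }

    // Rules are evaluated in insertion order; the first match wins.
    Binding select(AccessContext ctx) {
        for (Rule r : rules) {
            if (r.condition.test(ctx)) return r.binding;
        }
        return fallback;
    }

    // Encodes the firewall rule from the text: EJB inside, web service otherwise.
    static BindingSelectorSketch firewallRules() {
        return new BindingSelectorSketch(Binding.WEB_SERVICE)
                .addRule(ctx -> ctx.insideFirewall, Binding.EJB);
    }

    public static void main(String[] args) {
        BindingSelectorSketch selector = firewallRules();
        System.out.println(selector.select(new AccessContext(true)));
        System.out.println(selector.select(new AccessContext(false)));
    }
}
```

Running `main` prints `EJB` and then `WEB_SERVICE`. Keeping the rules as data rather than hard-coded branches is one way such a rule set could be listed in a registry and downloaded with a smart client, although the text does not prescribe any particular representation.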
  • One of the benefits of a services oriented architecture is that it facilitates loose coupling between a client device or application that accesses a service and the code for the service itself; that is, a client device or application can invoke and use the service without knowing very much about the code for the service, needing to satisfy only certain predetermined inputs, such as what to input to the service (e.g., a file, an answer to a query, or the like).
  • the absence of a tight coupling can result in performance problems, as context-dependent optimizing routines are omitted from the service description in order to make it more generically useful.
  • An API and/or smart client can make up for diminished performance by ensuring that a service is accessed optimally, such as by selecting a correct binding, caching data into batches to avoid constantly invoking services for small jobs, or the like.
  • a smart client provides effective performance in a loose coupling environment.
  • the smart client thus bridges the gap between a tight coupling environment and a loose coupling environment and allows the user, application or device that accesses a service to choose a type of binding along the spectrum between loose coupling and tight coupling (such as EJB) according to the performance expectation or requirements.
  • EJB coupling may perform better than web services, because EJB couplings are by nature more tightly coupled between client applications and the server side.
  • the smart client improves performance of both EJBs and web services by caching or buffering and sending things in appropriate batches. In situations where it is impossible or not desirable to cache or buffer items, a system can use a tight EJB binding to achieve good performance.
  • the API may hide the binding that the client device or application is using. With a smart client, a user can tune the performance of the system by tuning the level of coupling between the client and the server.
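The buffering behavior attributed to the smart client above (filtering events on the client side and sending them to the service in packages) might look roughly like the sketch below. All names here (`BufferingLogClientSketch`, `Level`, `Event`) are hypothetical. The sketch also makes one simplifying assumption: it flushes when a batch size is reached or when `flush()` is called, whereas the text describes holding events until predetermined time periods. The `Consumer` stands in for a remote logging service reached through some binding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical smart-client sketch; not thread-safe, for illustration only.
class BufferingLogClientSketch {

    enum Level { INFO, ERROR }

    static final class Event {
        final Level level;
        final String message;
        Event(Level level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    private final Level minimumLevel;
    private final int batchSize;
    private final Consumer<List<Event>> service; // stands in for the remote service
    private final List<Event> buffer = new ArrayList<>();

    BufferingLogClientSketch(Level minimumLevel, int batchSize,
                             Consumer<List<Event>> service) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.minimumLevel = minimumLevel;
        this.batchSize = batchSize;
        this.service = service;
    }

    // Client-side rule: drop events below the minimum level, buffer the rest,
    // and call the service only when a full batch has accumulated.
    void log(Level level, String message) {
        if (level.compareTo(minimumLevel) < 0) return;
        buffer.add(new Event(level, message));
        if (buffer.size() >= batchSize) flush();
    }

    // Sends all buffered events to the service in a single call.
    void flush() {
        if (buffer.isEmpty()) return;
        service.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BufferingLogClientSketch client = new BufferingLogClientSketch(
                Level.ERROR, 2, batch -> batchSizes.add(batch.size()));
        client.log(Level.INFO, "job started");        // filtered out
        client.log(Level.ERROR, "row 17 rejected");   // buffered
        client.log(Level.ERROR, "row 42 rejected");   // completes a batch of two
        client.log(Level.ERROR, "target unavailable");
        client.flush();                               // sends the remaining event
        System.out.println(batchSizes);
    }
}
```

Here four `log` calls result in only two calls to the service, and `main` prints `[2, 1]`. Tuning `batchSize` is one simple lever of the kind the text alludes to when it speaks of tuning performance through the smart client.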
  • the runtime of a service in a services oriented architecture may be a client itself of another service, such as one or more of the common services described above.
  • the foregoing can be accomplished using AOP.
  • AOP entities known as interceptors can associate a policy to a service. Inside the policy of the service, interceptors can be plugged into the policies, and the interceptors can be clients of the common services.
  • a policy in a service can include an interceptor that invokes a monitoring service.
  • AOP techniques can be used to insert code of interceptors into the code of various services described herein.
  • With AOP, a user can create a piece of code and associate an "aspect" with it, that is, a list of things to insert at runtime into the code as it is being executed.
  • the runtime program calls another piece of code, such as invoking a service, rather than doing what the code would normally do.
  • the code calls another function that is compiled independently.
  • When the programmer looks at the source code for a runtime program, the programmer does not see the source code for the piece that is invoked by the interceptor.
  • the program can compile the source code to create the byte code, which is the runtime of Java, and a Java virtual machine reads the byte code.
  • the program has the Java code and the aspect.
  • the AOP compiler does byte code manipulation and calls other types of code, such as the services in the services oriented architecture.
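The interceptor idea above, in which code that calls a common service is attached to a service without appearing in that service's source, can be approximated in plain Java. The sketch below deliberately swaps in a different technique: it uses JDK dynamic proxies (`java.lang.reflect.Proxy`) instead of an AOP compiler's byte code manipulation, so it only illustrates the effect. `InterceptorSketch`, `CleansingService`, and `MonitoringService` are hypothetical names, not part of the described system.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a dynamic proxy plays the role of an AOP interceptor.
class InterceptorSketch {

    // A business service, written with no knowledge of monitoring.
    interface CleansingService {
        String cleanse(String input);
    }

    // Stand-in for a common monitoring service.
    static final class MonitoringService {
        final List<String> calls = new ArrayList<>();
        void record(String operation) { calls.add(operation); }
    }

    // Wraps a service so that every call is first reported to the monitoring
    // service and then forwarded to the real implementation.
    static CleansingService withMonitoring(CleansingService target,
                                           MonitoringService monitor) {
        InvocationHandler interceptor = (proxy, method, args) -> {
            monitor.record(method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the service's own exception
            }
        };
        return (CleansingService) Proxy.newProxyInstance(
                CleansingService.class.getClassLoader(),
                new Class<?>[] { CleansingService.class },
                interceptor);
    }

    public static void main(String[] args) {
        MonitoringService monitor = new MonitoringService();
        CleansingService service =
                withMonitoring(input -> input.trim().toUpperCase(), monitor);
        System.out.println(service.cleanse("  acme corp "));
        System.out.println(monitor.calls);
    }
}
```

`main` prints `ACME CORP` followed by `[cleanse]`: the cleansing logic contains no monitoring code, yet the monitoring service is invoked on each call. A proxy wraps an object at runtime, whereas the AOP approach described in the text weaves the interceptor into the byte code itself.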
  • the methods and systems described herein also include providing a framework for data integration including a module of code developed for accomplishing a data integration task and the deployment of code as a service that is organized in a services oriented architecture.
  • the module of code may be a J2EE artifact.
  • the methods and systems described herein also include providing a framework for data integration including establishing a services oriented architecture and within the services oriented architecture, deploying common services that can be accessed by any other service, including the services oriented architecture itself.
  • the common service may be a monitoring service, a transaction service, a provisioning service, an event management service, a security service, an auditing service and/or a logging service.
  • the service may be invoked by an interceptor inserted in the code for at least one of an application and another service.
  • the service may further comprise associating an extension with a common service, wherein the extension identifies the common service within the architecture for monitoring by at least one of a module, an application and another service within the architecture.
  • the methods and systems described herein also include providing a method of providing data integration services including establishing a services oriented architecture for deploying a service, wherein the service has more than one binding and automatically generating a binding for the service based on an access characteristic.
  • the access characteristic may be based on at least one of a device, an application and/or a service that invokes the service.
  • the methods and systems described herein include using common services in various ways, such as explicitly from an application or from another service, or from an interceptor inserted in a service policy. That allows the same common service to be used by any service implementer and by the services oriented architecture framework transparently, such as through an AOP sub-system.
  • the modules, facilities, tools, jobs, services, processes and functions described herein may be accessed through various input and output facilities, including bindings and similar facilities, such as EJBs, JMS, web services, SOAP and other bindings.
  • the methods and systems described herein may include a client-side facility for optimizing access of a module, facility, job, service, process, function or the like by a client device.
  • the methods and systems described herein may include a server-side facility for optimizing access of a module, facility, job, service, process, function or the like by a client device.
  • a method disclosed herein may include providing a services oriented architecture for deploying services; deploying a service within the services oriented architecture, the service having a number of bindings available in the services oriented architecture; and automatically selecting one of the number of bindings for the service.
  • a computer program product embodied in a computer readable medium may include: computer executable code for providing a services oriented architecture for deploying services; computer executable code for deploying a service within the services oriented architecture, the service having a number of bindings available in the services oriented architecture; and computer executable code for automatically selecting one of the number of bindings for the service.
  • a system may include a services oriented architecture for deploying services; a service within the services oriented architecture, the service having a number of bindings available in the services oriented architecture; and a software module for automatically selecting one of the number of bindings for the service.
  • automatically selecting may include making a rules-based selection of one of the number of bindings for the service, and/or may include making a selection based upon an access characteristic of an entity invoking the service.
  • the entity may include one or more of a device, an application, and a service.
  • the methods, systems, and computer program products above may include providing a client invocation framework for invoking the service.
  • the client invocation framework may include an interface for dynamically calling any one of the number of bindings for the service.
  • the methods, systems, and computer program products may include generating a plurality of proxies for the interface, which may include, for example, at least one C++ proxy and at least one C# proxy.
  • the client invocation framework may be language independent.
  • the client invocation framework may be proxy-based.
  • the service may include at least one data integration function.
  • the methods, systems, and computer program products described herein may include an Aspect Oriented Programming system for implementing policy management for services in a services oriented architecture wherein a policy manager manages services and binding policies.
  • Aspect Oriented Programming interceptors may associate a policy to a service, and each interceptor can be a client of a common services service that provides policy management functions at runtime.
  • a computer program product may include a computer useable medium including computer readable program code, wherein the computer readable program code when executed on one or more computers causes the one or more computers to perform any one or more of the methods above.
  • “data source” or “data target” are intended to have the broadest possible meaning consistent with these terms, and shall include a database, a plurality of databases, a repository information manager, a queue, a message service, a repository, a data facility, a data storage facility, a data provider, a website, a server, a computer, a computer storage facility, a CD, a DVD, a mobile storage facility, a central storage facility, a hard disk, multiple coordinating data storage facilities, RAM, ROM, flash memory, a memory card, a temporary memory facility, a permanent memory facility, magnetic tape, a locally connected computing facility, a remotely connected computing facility, a wireless facility, a wired facility, a mobile facility, a central facility, a web browser, a client, a laptop, a personal digital assistant ("PDA"), a telephone, a cellular phone, a mobile phone, an information platform, an analysis facility, a processing facility, a business enterprise system or other facility where data is handled.
  • Enterprise Java Bean shall include the server-side component architecture for the J2EE platform.
  • EJBs support rapid and simplified development of distributed, transactional, secure and portable Java applications.
  • EJBs support a container architecture that allows concurrent consumption of messages and provide support for distributed transactions, so that database updates, message processing, and connections to enterprise systems using the J2EE architecture can participate in the same transaction context.
  • JMS: Java Message Service
  • JCA refers to the Java Connector Architecture of the J2EE platform, described more particularly below. It should be appreciated that, while EJB, JMS, and JCA are commonly used software tools in contemporary distributed transaction environments, any platform, system, or architecture providing similar functionality may be employed with the data integration systems described herein.
  • Real time shall include periods of time that approximate the duration of a business transaction or business and shall include processes or services that occur during a business operation or business process, as opposed to occurring off-line, such as in a nightly batch processing operation. Depending on the duration of the business process, real time might include seconds, fractions of seconds, minutes, hours, or even days.
  • Business process shall include any methods, service, operations, processes or transactions that can be performed by a business, including, without limitation, sales, marketing, fulfillment, inventory management, pricing, product design, professional services, financial services, administration, finance, underwriting, analysis, contracting, information technology services, data storage, data mining, delivery of information, routing of goods, scheduling, communications, investments, transactions, offerings, promotions, advertisements, offers, engineering, manufacturing, supply chain management, human resources management, data processing, data integration, work flow administration, software production, hardware production, development of new products, research, development, strategy functions, quality control and assurance, packaging, logistics, customer relationship management, handling rebates and returns, customer support, product maintenance, telemarketing, corporate communications, investor relations, and many others.
  • Service oriented architecture shall include services that form part of the infrastructure of a business enterprise.
  • services can become building blocks for application development and deployment, allowing rapid application development and avoiding redundant code.
  • Each service may embody a set of business logic or business rules that can be bound to the surrounding environment, such as the source of the data inputs for the service or the targets for the data outputs of the service.
  • SOA: service oriented architecture
  • Metadata shall include data that brings context to the data being processed, data about the data, information pertaining to the context of related information, information pertaining to the origin of data, information pertaining to the location of data, information pertaining to the meaning of data, information pertaining to the age of data, information pertaining to the heading of data, information pertaining to the units of data, information pertaining to the field of data and/or information pertaining to any other information relating to the context of the data.
  • WSDL: Web Services Description Language, an XML format for describing network services (often web services) as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information.
  • the operations and messages are described abstractly, and then bound to a concrete network protocol and message format to define an endpoint.
  • Related concrete endpoints are combined into abstract endpoints (services).
  • WSDL is extensible to allow description of endpoints and their messages regardless of what message formats or network protocols are used to communicate.
  • Metabroker, as used herein, shall include systems or methods that may involve a translation engine or other means for performing translation operations or other operations on data or metadata.
  • the translation operations or other operations may involve the translation of data or metadata from one or more formats, languages and/or data models to one or more formats, languages and/or data models.
  • Fig. 1 is a schematic diagram of a business enterprise with a plurality of business processes, each of which may include a plurality of different computer applications and data sources.
  • Fig. 2 is a schematic diagram showing data integration across a plurality of business processes of a business enterprise.
  • Fig. 3 is a schematic diagram showing an architecture for providing data integration for a plurality of data sources for a business enterprise.
  • Fig. 4 shows an item in relation to other items.
  • Fig. 5 shows an item in relation to other items.
  • Fig. 6A shows an item in a certain context.
  • Fig. 6B shows an item in a certain context.
  • Fig. 7 shows certain strings.
  • Fig. 8 shows an item and a corresponding string.
  • Fig. 9 shows a string and certain of its variations.
  • Fig. 10 shows a translation engine acting on certain strings.
  • Fig. 11 shows an item that may exist in multiple forms or instances.
  • Fig. 12 shows an item that may exist in multiple forms or instances in a hub or database.
  • Fig. 13 shows an item in a hub at various levels of abstraction.
  • Fig. 14 shows a translation process in which all items are grabbed at the database or hub.
  • Fig. 15A shows a translation process in which items are filtered at the database or hub.
  • Fig. 15B shows a translation process in which the query is translated.
  • Fig. 16A shows an overview of an architecture for a data integration system that includes a services oriented architecture facility.
  • Fig. 16B shows a high level schematic view of another similar architecture for a data integration system that includes a services oriented architecture.
  • Fig. 16C shows modules for enabling services in a services oriented architecture.
  • Fig. 16D shows additional modules for enabling services in a services oriented architecture.
  • Fig. 16E shows a services oriented architecture with a smart client.
  • Fig. 16F shows a particular embodiment of a services oriented architecture.
  • Fig. 16G shows the development and deployment of a module, service and/or facility as services in a services oriented architecture.
  • Fig. 17 shows the deployment of a module as a service in a services oriented architecture.
  • Fig. 18 shows the development and deployment of a data transformation module as a service in a services oriented architecture.
  • Fig. 19 shows the development and deployment of a data loading module as a service in a services oriented architecture.
  • Fig. 20 shows the development and deployment of a metadata management module as a service in a services oriented architecture.
  • Fig. 21 shows the development and deployment of a data profiling module as a service in a services oriented architecture.
  • Fig. 22 shows the development and deployment of a data auditing module as a service in a services oriented architecture.
  • Fig. 23 shows the development and deployment of a data cleansing module as a service in a services oriented architecture.
  • Fig. 24 shows the development and deployment of a data quality module as a service in a services oriented architecture.
  • Fig. 25 shows the development and deployment of a data matching module as a service in a services oriented architecture.
  • Fig. 26 shows the development and deployment of a metabroker module as a service in a services oriented architecture.
  • Fig. 27 shows the development and deployment of a data migration module as a service in a services oriented architecture.
  • Fig. 28 shows the development and deployment of an atomic data repository module as a service in a services oriented architecture.
  • Fig. 29 shows the development and deployment of a semantic identification module as a service in a services oriented architecture.
  • Fig. 30 shows the development and deployment of a filtering module as a service in a services oriented architecture.
  • Fig. 31 shows the development and deployment of a refinement and selection module as a service in a services oriented architecture.
  • Fig. 32 shows the development and deployment of a database content analysis module as a service in a services oriented architecture.
  • Fig. 33 shows the development and deployment of a database table analysis module as a service in a services oriented architecture.
  • Fig. 34 shows the development and deployment of a database row analysis module as a service in a services oriented architecture.
  • Fig. 35 shows the development and deployment of a database structure analysis module as a service in a services oriented architecture.
  • Fig. 36 shows the development and deployment of a recommendation module as a service in a services oriented architecture.
  • Fig. 37 shows the development and deployment of a primary key module as a service in a services oriented architecture.
  • Fig. 38 shows the development and deployment of a foreign key module as a service in a services oriented architecture.
  • Fig. 39 shows the development and deployment of a table normalization module as a service in a services oriented architecture.
  • Fig. 40 shows the development and deployment of a source-to-target mapping module as a service in a services oriented architecture.
  • Fig. 41 shows the development and deployment of an automatic data integration job generation module as a service in a services oriented architecture.
  • Fig. 42 shows the development and deployment of a defect detection module as a service in a services oriented architecture.
  • Fig. 43 shows the development and deployment of a performance measurement module as a service in a services oriented architecture.
  • Fig. 44 shows the development and deployment of a data de-duplication module as a service in a services oriented architecture.
  • Fig. 45 shows the development and deployment of a statistical analysis module as a service in a services oriented architecture.
  • Fig. 46 shows the development and deployment of a data reconciliation module as a service in a services oriented architecture.
  • Fig. 47 shows the development and deployment of a transformation function library module as a service in a services oriented architecture.
  • Fig. 48 shows the development and deployment of a version management module as a service in a services oriented architecture.
  • Fig. 49 shows the development and deployment of a version management module as a service in a services oriented architecture.
  • Fig. 50 shows the development and deployment of a parallel execution module as a service in a services oriented architecture.
  • Fig. 51 shows the development and deployment of a data partitioning module as a service in a services oriented architecture.
  • Fig. 52 shows the development and deployment of a partitioning and repartitioning module as a service in a services oriented architecture.
  • Fig. 53 shows the development and deployment of a database interface module as a service in a services oriented architecture.
  • Fig. 54 shows the development and deployment of a data integration module as a service in a services oriented architecture.
  • Fig. 55 shows the development and deployment of a synchronization module as a service in a services oriented architecture.
  • Fig. 56 shows the development and deployment of a metadata directory supply module as a service in a services oriented architecture.
  • Fig. 57 shows the development and deployment of a graphical depiction module as a service in a services oriented architecture.
  • Fig. 58 shows the development and deployment of a metabroker module as a service in a services oriented architecture.
  • Fig. 59 shows the development and deployment of a metadata hub repository module as a service in a services oriented architecture.
  • Fig. 60 shows the development and deployment of a packaged application connectivity kit module as a service in a services oriented architecture.
  • Fig. 61 shows the development and deployment of an industry-specific data model storage module as a service in a services oriented architecture.
  • Fig. 62 shows the development and deployment of a template module as a service in a services oriented architecture.
  • Fig. 63 shows the development and deployment of a business rule creation module as a service in a services oriented architecture.
  • Fig. 64 shows the development and deployment of a validation table creation module as a service in a services oriented architecture.
  • Fig. 65 shows the development and deployment of a data integration module as a service in a services oriented architecture.
  • Fig. 66 shows the development and deployment of a business metric creation module as a service in a services oriented architecture.
  • Fig. 67 shows the development and deployment of a target database definition module as a service in a services oriented architecture.
  • Fig. 68 shows the development and deployment of a mainframe data profiling module as a service in a services oriented architecture.
  • Fig. 69 shows the development and deployment of a batch processing module as a service in a services oriented architecture.
  • Fig. 70 shows the development and deployment of a cross-table analysis module as a service in a services oriented architecture.
  • Fig. 71 shows the development and deployment of a relationship analysis module as a service in a services oriented architecture.
  • Fig. 72 shows the development and deployment of a data definition language code generation module as a service in a services oriented architecture.
  • Fig. 73 shows the development and deployment of a design interface module as a service in a services oriented architecture.
  • Fig. 74 shows the development and deployment of a data integration job development module as a service in a services oriented architecture.
  • Fig. 75 shows the development and deployment of a data integration job deployment module as a service in a services oriented architecture.
  • Fig. 76 shows the development and deployment of a logging service module as a service in a services oriented architecture.
  • Fig. 77 shows the development and deployment of a monitoring service module as a service in a services oriented architecture.
  • Fig. 78 shows the development and deployment of a security module as a service in a services oriented architecture.
  • Fig. 79 shows the development and deployment of a licensing module as a service in a services oriented architecture.
  • Fig. 80 shows the development and deployment of an event management module as a service in a services oriented architecture.
  • Fig. 81 shows the development and deployment of a provisioning module as a service in a services oriented architecture.
  • Fig. 82 shows the development and deployment of a transaction module as a service in a services oriented architecture.
  • Fig. 83 shows the development and deployment of an auditing module as a service in a services oriented architecture.
  • Fig. 84 shows a service, API and smart client.
  • the invention(s) disclosed herein can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
  • the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • the invention(s) can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk - read only memory (CD-ROM), compact disk - read/write (CD-R/W) and DVD.
  • a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
  • Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
  • Fig. 1 represents a platform 100 for facilitating integration of various data of a business enterprise.
  • the platform includes a plurality of business processes, each of which may include a plurality of different computer applications and data sources.
  • the platform may include several data sources 102, which may be data sources such as those described above. These data sources may include a wide variety of data types from a wide variety of physical locations.
  • the data sources 102 may include systems from providers such as Sybase, Microsoft, Informix, Oracle, Inlomover, EMC, Trillium, First Logic, Siebel, PeopleSoft, IBM, Apache, or Netscape.
  • the data sources 102 may include systems using database products or standards such as IMS, DB2, ADABAS, VSAM, MD Series, UDB, XML, complex flat files, or FTP files.
  • the data sources 102 may include files created or used by applications such as Microsoft Outlook, Microsoft Word, Microsoft Excel, Microsoft Access, as well as files in standard formats such as ASCII, CSV, GIF, TIF, PNG, PDF, and so forth.
  • the data sources 102 may come from various locations or they may be centrally located.
  • the data supplied from the data sources 102 may come in various forms and have different formats that may or may not be compatible with one another.
  • Data targets are discussed later in this description. In general, these data targets may be any of the data sources 102 noted above. This difference in nomenclature typically denotes whether a data system provides data or receives data in a data integration process. However, it should be appreciated that this distinction is not intended to convey any difference in capability between data sources and data targets (unless specifically stated otherwise), since in a conventional data integration system, data sources may receive data and data targets may provide data.
  • the platform illustrated in Fig. 1 also includes a data integration system 104.
  • the data integration system may, for example, facilitate the collection of data from the data sources 102 as the result of a query or retrieval command the data integration system 104 receives.
  • the data integration system 104 may send commands to one or more of the data sources 102 such that the data source(s) provides data to the data integration system 104. Since the data received may be in multiple formats including varying metadata, the data integration system may reconfigure the received data such that it can be later combined for integrated processing. The functions that may be performed by the data integration system 104 are described in more detail below.
  • the platform 100 also includes several retrieval systems 108.
  • the retrieval systems 108 may include databases or processing platforms used to further manipulate the data communicated from the data integration system 104.
  • the data integration system 104 may cleanse, combine, transform or otherwise manipulate the data it receives from the data sources 102 such that a retrieval system 108 can use the processed data to produce reports 110 useful to the business.
  • the reports 110 may be used to report data associations, answer complex queries, answer simple queries, or form other reports useful to the business or user, and may include raw data, tables, charts, graphs, and any other representations of data from the retrieval systems 108.
  • the platform 100 may also include a database or data base management system 112.
  • the database 112 may be used to store information temporally, temporarily, or for permanent or long-term storage.
  • the data integration system 104 may collect data from one or more data sources 102 and transform the data into forms that are compatible with one another or compatible to be combined with one another. Once the data is transformed, the data integration system 104 may store the data in the database 112 in a decomposed form, combined form or other form for later retrieval.
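  • As a minimal sketch of the collect-and-transform step described above, the following Python fragment renames fields from two differently shaped sources into one common form so that the records can be combined. The source names, field mappings and sample layout are hypothetical and are not taken from the described system.

```python
import csv
import io

# Hypothetical field mappings: source-specific field name -> common field name.
FIELD_MAPS = {
    "crm_csv": {"CustName": "name", "Tel": "phone"},
    "erp_rows": {"customer": "name", "phone_no": "phone"},
}


def to_common(source_id, record):
    """Rename one record's fields to the common form, dropping unmapped fields."""
    mapping = FIELD_MAPS[source_id]
    return {mapping[k]: v.strip() for k, v in record.items() if k in mapping}


def integrate(csv_text, erp_rows):
    """Combine a CSV source and a row-oriented source into one list of records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    combined = [to_common("crm_csv", row) for row in reader]
    combined += [to_common("erp_rows", row) for row in erp_rows]
    return combined
```

  • The combined list could then be stored, for example in a database such as database 112, in either combined or decomposed form.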
  • Fig. 2 is a schematic diagram showing data integration across a plurality of entities and business processes of a business enterprise.
  • the data integration system 104 facilitates the information flowing between user interface systems 202 and data sources 102.
  • the data integration system 104 may receive queries from the interface systems 202, where the queries necessitate the extraction and possibly transformation of data residing in one or more of the data sources 102.
  • the interface systems 202 may include any device or program for communicating with the data integration system 104, such as a web browser operating on a laptop or desktop computer, a cell phone, a personal digital assistant ("PDA"), a networked platform and devices attached thereto, or any other device or system that might interface with the data integration system 104.
  • a user may be operating a PDA and make a request for information to the data integration system 104 over a WiFi or Wireless Application Protocol/Wireless Markup Language ("WAP/WML") interface.
  • the data integration system 104 may receive the request and generate any required queries to access information from a website or other data source 102 such as an FTP file site.
  • the data from the data sources 102 may be extracted and transformed into a format compatible with the requesting interface system 202 (a PDA in this example) and then communicated to the interface system 202 for user viewing and manipulation.
  • the data may have previously been extracted from the data sources and stored in a separate database 112, which may be a data warehouse or other data facility used by the data integration system 104.
  • the data may have been stored in the database 112 in a transformed condition or in its original state.
  • the data may be stored in a transformed condition such that the data from a number of data sources 102 can be combined in another transformation process.
  • a query from the PDA may be transmitted to the data integration system 104 and the data integration system 104 may extract the information from the database 112. Following the extraction, the data integration system 104 may transform the data into a combined format compatible with the PDA before transmission to the PDA.
  • Fig. 3 is a schematic diagram showing an architecture for providing data integration for a plurality of data sources 102 for a business enterprise.
  • An embodiment of a data integration system 104 may include a discover data stage 302 to perform, possibly among other processes, extraction of data from a data source and analysis of column values and table structures for source data.
  • a discover data stage 302 may also generate recommendations about table structure, relationships, and keys for a data target. More sophisticated profiling and auditing functions may include date range validation, accuracy of computations, accuracy of if-then evaluations, and so forth.
  • the discover data stage 302 may normalize data, such as by eliminating redundant dependencies and other anomalies in the source data.
  • the discover data stage 302 may provide additional functions, such as drill down to exceptions within a data source 102 for further analysis, or enabling direct profiling of mainframe data.
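  • The column analysis performed by a discover data stage can be illustrated with a small profiling routine. This is only a sketch under simple assumptions (values arrive as strings, and empty strings count as missing); the function name and the reported statistics are hypothetical and do not describe any commercial product.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, inferred types."""
    non_null = [v for v in values if v not in (None, "")]

    def kind(v):
        # Try the narrowest type first, then fall back to wider ones.
        try:
            int(v)
            return "int"
        except (TypeError, ValueError):
            pass
        try:
            float(v)
            return "float"
        except (TypeError, ValueError):
            return "str"

    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # A column with no nulls and no repeats could be recommended as a key.
        "is_candidate_key": len(non_null) == len(values)
        and len(set(non_null)) == len(values),
        "types": dict(Counter(kind(v) for v in non_null)),
    }
```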
  • a non-limiting example of a commercial embodiment of a discover data stage 302 may be found in IBM's WebSphere ProfileStage product.
  • the data integration system 104 may also include a data preparation stage 304 where the data is prepared, standardized, matched, or otherwise manipulated to produce quality data to be later transformed.
  • the data preparation stage 304 may perform generic data quality functions, such as reconciling inconsistencies or checking for correct matches (including one-to-one matches, one-to-many matches, and deduplication) within data.
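  • One simple way to picture deduplication is to standardize each record into a match key and keep one record per key. The sketch below assumes records are dictionaries with hypothetical "name" and "address" fields and uses exact matching on a normalized key; matching in practice may be considerably more sophisticated.

```python
import re


def match_key(record):
    """Build a normalized key: lowercase, strip punctuation, collapse whitespace."""
    text = f"{record['name']} {record['address']}".lower()
    text = re.sub(r"[^a-z0-9 ]", "", text)
    return " ".join(text.split())


def deduplicate(records):
    """Keep the first record seen for each normalized match key."""
    seen = {}
    for record in records:
        seen.setdefault(match_key(record), record)
    return list(seen.values())
```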
  • the data preparation stage 304 may also provide specific data enhancement functions. For example, the data preparation stage 304 may ensure that addresses conform to multinational postal references for improved international communication.
  • the data preparation stage 304 may conform location data to multinational geocoding standards for spatial information management.
  • the data preparation stage may modify or add to addresses to ensure that address information qualifies for U.S. Postal Service mail rate discounts under Government Certified U.S. Address Correction. Similar analysis and data revision may be provided for Canadian and Australian postal systems, which provide discount rates for properly addressed mail.
  • a non-limiting example of a commercial embodiment of a data preparation stage 304 may be found in IBM's WebSphere QualityStage product.
  • the data integration system may also include a data transformation stage 308 to transform, enrich and deliver transformed data.
  • the data transformation stage 308 may perform transitional services such as reorganization and reformatting of data, and perform calculations based on business rules and algorithms of the system user.
  • the data transformation stage 308 may also organize target data into subsets known as datamarts or cubes for more highly tuned processing of data in certain analytical contexts.
  • the data transformation stage 308 may employ bridges, translators, or other interfaces (as discussed generally below) to span various software and hardware architectures of various data sources and data targets used by the data integration system 104.
  • the data transformation stage 308 may include a graphical user interface, a command line interface, or some combination of these, to design data integration jobs across the platform 100.
  • a non-limiting example of a commercial embodiment of a data transformation stage 308 may be found in IBM's WebSphere DataStage product.
  • the stages 302, 304, 308 of the data integration system 104 may be executed using a parallel execution system 310 or in a serial or combination manner to optimize the performance of the system 104.
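  • The choice between serial and parallel execution can be sketched as follows. The two stage functions are hypothetical placeholders, and partitioning the rows across a thread pool is only one assumed way a parallel execution system might divide the work.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare(rows):
    """Placeholder preparation stage: trim surrounding whitespace."""
    return [r.strip() for r in rows]


def transform(rows):
    """Placeholder transformation stage: upper-case every value."""
    return [r.upper() for r in rows]


STAGES = [prepare, transform]


def run_serial(rows):
    """Run every stage in order over the whole data set."""
    for stage in STAGES:
        rows = stage(rows)
    return rows


def run_parallel(rows, partitions=2):
    """Split rows into contiguous partitions and run the stage chain on each."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(run_serial, chunks))  # map preserves chunk order
    return [row for chunk in results for row in chunk]
```

  • Because the partitions are contiguous and the pool returns results in submission order, both functions produce the same output for stages that process each row independently.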
  • the data integration system 104 may also include a metadata management system 312 for managing metadata associated with data sources 102.
  • the metadata management system 312 may provide for interchange, integration, management, and analysis of metadata across all of the tools in a data integration environment.
  • a metadata management system 312 may provide common, universally accessible views of data in disparate sources, such as IBM WebSphere's ODBC MetaBroker, CA ERwin, IBM WebSphere ProfileStage, IBM WebSphere DataStage, IBM WebSphere QualityStage, IBM DB2 Cube Views, and Cognos Impromptu.
  • the metadata management system 312 may also provide analysis tools for data lineage and impact analysis for changes to data structures.
  • the metadata management system 312 may further be used to prepare a business data glossary of data definitions, algorithms, and business contexts for data within the data integration system 104, which glossary may be published for use throughout an enterprise.
  • a non-limiting example of a commercial embodiment of a metadata management system 312 may be found in IBM's WebSphere MetaStage product.
  • items that are relevant to an enterprise can be described in terms of various contexts or hierarchies, such as to capture the semantic context of the items.
  • Fig. 4 depicts a semantic identifier for an item.
  • the item may be an object, class, attribute, data item, data model, metadata model, model, definition, identity, structure, language, mapping, relationship, instance or other item or concept, including another semantic identifier.
  • the semantic identifier may identify the item based on the item's attributes, the item's physical location, the relationship of the item with one or more other items, such as in a hierarchy, or the like. In some cases a relationship may be defined as the absence of some particular relationship. A relationship may be based on semantics. A relationship may involve the position of the item in a relational hierarchy. For example, in Fig. 4 item 1 5202 may be identified based on its relationship with the other items to which it is related. Item 1 5202 may be identified as being directly related to item 2 5204, item 3 5208 and item 4 5210, indirectly related to item 5 5212 and indirectly related to item 6 5214 through item 5 5212 and item 4 5210.
  • Item 1 may also be identified as being directly related to item 2 5204, item 3 5208 and item 4 5210.
  • the indirect relationships between item 1 5202 and item 5 5212 and item 6 5214 may be captured in the relationship of item 1 5202 to item 4 5210.
  • This concatenation or recursive type of identification may permit dynamic, in addition to static, identifiers. For example, if the relationship between item 4 5210 and item 6 5214 changes, the semantic identifier for item 1 5202, which incorporates item 2 5204, item 3 5208 and item 4 5210, would incorporate this change through its incorporation of item 4 5210. It would not need to be updated to account for the changes in item 6 5214, as it would if item 6 5214 were directly included in the semantic identifier.
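  • The recursive identification described above can be sketched with a small relationship graph. The item names and the chain from item 4 through item 5 to item 6 are assumptions read from the text rather than from the figure. The stored identifier for item 1 records only its direct relationships, and the indirect ones are resolved on demand.

```python
# Hypothetical relationship graph: item -> items it is directly related to.
relations = {
    "item1": ["item2", "item3", "item4"],
    "item4": ["item5"],
    "item5": ["item6"],
}


def direct_identifier(item):
    """Static part of the identifier: only the direct relationships."""
    return tuple(relations.get(item, []))


def expand(item, path=()):
    """Resolve indirect relationships at lookup time by following direct ones."""
    if item in path:  # guard against cycles in the graph
        return {}
    return {rel: expand(rel, path + (item,)) for rel in relations.get(item, [])}
```

  • If the entry for item 5 later changes, `direct_identifier("item1")` stays the same while `expand("item1")` reflects the change, which is the dynamic behaviour the passage describes.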
  • Figure 5 presents a more concrete example of a semantic identifier.
  • Jim may be identified as Jim, residing at 111 Anyroad, Anytown, Anystate USA, with phone number 555-555-5555 and social security number 013-65-8067.
  • Jim may be identified in terms of his relationships with others.
  • Jim may be identified as the son of Betty, brother of Larry and Jeff, father of Jessica and nephew of Frank.
  • the semantic identifier may be a unique identifier for an item.
  • this semantic identifier would be a unique identifier for Jim. It is possible that a unique semantic identifier to an item takes into account fewer than all of the relationships of that item with other items. In the example of Figure 5, if there were only one Jim in the world who was the son of Betty, brother of Larry and father of Jessica, the existence of these relationships alone would be enough to create a unique semantic identifier. Jim's relationships with Jeff and Frank would not need to be considered.
  • it may be advantageous to create a semantic identifier that is based on the minimum number of relationships that ensure uniqueness. For example, if the semantic identifier was to be stored in a database 112 or processed by a data integration system 104, a less complex semantic identifier would require less space and would allow for faster processing.
  • Figure 6A depicts two items of interest: item 1 5402 and item 7 5404.
  • item 1 5402 may be distinguished from item 7 5404 by item 1's 5402 relationship with item 5 5410 and item 6 5412. That is, in context A, the unique semantic identifier for item 1 5402 may be that it is directly related to items 2, 3 and 4, indirectly related to item 5 5410 through item 4 and indirectly related to item 6 5412 through item 5 5410 and item 4.
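  • A brute-force way to find such a minimal identifier is to try subsets of an item's relationships in order of increasing size until one is found that no other item fully shares. The sketch below assumes relationships are represented as hypothetical (relation, person) pairs. It is exponential in the number of relationships and is meant only to illustrate the idea.

```python
from itertools import combinations


def minimal_identifier(target, others):
    """Return a smallest subset of target's relationships that no other item shares in full."""
    ordered = sorted(target)
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            if not any(set(subset) <= other for other in others):
                return subset
    return None  # target cannot be told apart from at least one other item


jim = {
    ("son", "Betty"), ("brother", "Larry"), ("brother", "Jeff"),
    ("father", "Jessica"), ("nephew", "Frank"),
}
# Hypothetical other individuals who share some of Jim's relationships.
others = [
    {("son", "Betty"), ("brother", "Larry"), ("brother", "Jeff"), ("nephew", "Frank")},
    {("father", "Jessica"), ("nephew", "Frank")},
]
identifier = minimal_identifier(jim, others)
```

  • With this sample data no single relationship is unique to Jim, so the result contains two relationships rather than all five.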
  • the unique semantic identifier for item 7 5404 may be that it is directly related to only items 2 and 3.
  • Figure 6B presents item 1 5402 in a different context, context B 5414.
  • any one or more of item 1's 5402 direct relationships with item 4, absence of a direct relationship with item 6 or indirect relationship with item 5 may be taken into account.
  • item 1 5402 may be uniquely semantically identified as directly related to items 2 and 3, but not directly related to item 6.
  • the unique identifier for item 1 differs between context A 5408 and context B 5414.
  • a semantic identifier for an item such as an item related to a data integration job or a data integration platform, may be provided with a context-dependent identifier for the item.
  • a context-dependent identifier may be stored in an atomic format, such as in a data repository.
  • contexts A 5408 and B 5414 may be two different imports, mappings, run versions, models, metabroker models, instances, tools, views, objects, classes, items, relationships, attributes, or any combination of any of the foregoing.
  • a matching or comparison facility may compare the syntax of the identity of an item in different imports, run versions, models, metabroker models, instances, tools and/or items and determine or assist with the determination of what action to take or refrain from taking based on the comparison.
  • a matching engine may compare the model used by import instance A to the model used by metabroker B. Based on this comparison it may be decided that metabroker B can access the data and metadata of import instance A without transformation or modification, and the comparison facility may direct the metabroker B to proceed.
  • tool A 5408 may be compared to tool B 5414, and it may be determined to perform a cross-tool object merge, wherein each tool can access and use the objects of the other tool.
  • the comparison facility may trigger a translation facility to assist the cross-tool object merge, such as establishing a bridge, metabroker, hub or the like for translating any objects that require translation, such as translation that is based on the different syntax for the handling of the identity of particular items in each respective tool, or based on other differences between the tools as determined by the comparison.
  • a semantic identifier may be stored, maintained, recorded, processed and/or interpreted in a syntax that may be stored, maintained, recorded, processed and/or interpreted in a string structure or format.
  • Figure 7 depicts an example of a syntax and a corresponding string composed in that syntax.
  • the syntax 5502 may be column name::table name::database name. This syntax may be related, for example, to a semantic identifier that identifies a column of a table in a database.
  • a string composed in this syntax 5504 may be age::employee::employee database. This string may be related, for example, to a semantic identifier that identifies the age of an employee in a particular employee database.
  • the string corresponding to the semantic identifier for item 1 5402 in context B 5414 may be: direct relation to item 2::direct relation to item 3::direct relationship to item 4.
  • the semantic identifier and corresponding string may also incorporate the lack of a direct relationship between items 1 5402 and item 6.
  • the semantic identifier in string format for item 9 5602 may be: direct to item 2::direct to item 3::direct to item 4::indirect to item 5 5604.
  • a string may be capable of being parsed.
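  • Parsing a string against its syntax can be sketched as splitting both on the "::" separator shown in the example and pairing the pieces. The function names are hypothetical, and the sketch assumes the separator never occurs inside an element.

```python
SEPARATOR = "::"


def parse_identifier(syntax, value):
    """Pair each element name in the syntax with the matching part of the string."""
    names = syntax.split(SEPARATOR)
    parts = value.split(SEPARATOR)
    if len(names) != len(parts):
        raise ValueError("string does not match syntax")
    return dict(zip(names, parts))


def compose_identifier(syntax, fields):
    """Inverse of parse_identifier: build the string in the order the syntax gives."""
    return SEPARATOR.join(fields[name] for name in syntax.split(SEPARATOR))


syntax = "column name::table name::database name"
parsed = parse_identifier(syntax, "age::employee::employee database")
```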
  • a syntax and/or string may be truncated, modified and/or the elements of a syntax and/or string may be re-ordered.
  • string 5702 is a truncation of string 5604
  • string 5704 is a truncation and modification and/or re-ordering of string 5604
  • string 5708 is a modification and/or re-ordering of string 5606.
  • the truncation, modification and/or re-ordering may be performed by a translation engine.
  • Truncating a syntax and/or string may reduce storage requirements and increase processing efficiency. It may also be useful to change the order of the relationships in a syntax and/or string, for example, to reduce processing time for data integration processes.
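  • Truncation and re-ordering of such a string might look like the following sketch. Both helper names are hypothetical, and the "::" separator is carried over from the earlier example.

```python
SEPARATOR = "::"


def truncate(identifier, keep):
    """Keep only the first `keep` elements of the identifier string."""
    return SEPARATOR.join(identifier.split(SEPARATOR)[:keep])


def reorder(identifier, order):
    """Rearrange elements; `order` lists the original positions in their new order."""
    parts = identifier.split(SEPARATOR)
    return SEPARATOR.join(parts[i] for i in order)


full = "age::employee::employee database"
short = truncate(full, 2)
reversed_form = reorder(full, [2, 1, 0])
```

  • A truncated string is only safe to use where the remaining elements still identify the item uniquely in the relevant context.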
  • a translation engine may perform translation operations with respect to one or more semantic identifiers, databases 112, databases 112 including semantic identifiers, systems of information, systems of information including semantic identifiers or other items.
  • Figure 10 depicts a translation engine 5802 acting on a semantic identifier embodied as a string 5804 and on a semantic identifier embodied as a string located in a database 5808.
  • the translation operation may translate or otherwise modify the format, language and/or data model of a semantic identifier.
  • a translation operation may involve a translation or mapping to or from one or more data tools, languages, formats and/or data models to or from at least one other data tool, language, format and/or data model.
  • a translation operation may involve a translation or mapping to, from or between known data integration tools, such as WebSphere DataStage 7 from IBM, WebSphere QualityStage from IBM, Business Objects tools, IBM DB2 Cube Views, UML 1.1, UML 1.3, ERStudio, IBM's WebSphere ProfileStage, PowerDesigner (with added support for Packages and Extended Attributes) and/or MicroStrategy tools.
  • a translation engine and/or translation operation may optionally be embodied in a metabroker.
  • a translation operation may be performed, executed and/or conducted in batch, real-time and/or on a continuous basis.
  • a translation operation may be provided or made available as a service, for example, as part of a service oriented architecture.
  • the SOA can be part of the infrastructure of an enterprise computing system of a business enterprise.
  • services become building blocks for application development and deployment, allowing rapid application development and avoiding redundant code.
  • Each service embodies a set of business logic or business rules that can be blind to the surrounding environment, such as the source of the data inputs for the service or the targets for the data outputs of the service.
  • services can be reused in connection with a variety of applications, provided that appropriate inputs and outputs are established between the service and the applications.
  • the service-oriented architecture allows the service to be protected against environmental changes, so that the architecture functions even if the surrounding computer environment is changed. As a result, services may not need to be recoded as a result of infrastructure changes, which may result in savings of time and effort.
  • An SOA may be for a web service and may involve three entities, a service provider, a service requester and a service registry.
  • the registry may be public or private.
  • the service requester may search a registry for an appropriate service. Once an appropriate service is discovered, the service requester may receive code, such as Web Services Description Language (“WSDL”) code, that is necessary to invoke the service.
  • the service requester may then interface with the service provider, such as through messages in appropriate formats (such as the Simple Object Access Protocol (“SOAP”) format for web service messages), to invoke the service.
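  • The interaction among provider, requester and registry can be sketched with an in-memory registry. The class, its method names and the sample service are all hypothetical. A real deployment would exchange service descriptions and messages over a network rather than pass Python callables.

```python
class ServiceRegistry:
    """In-memory stand-in for a public or private service registry."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, description, provider):
        """A service provider registers a callable under a name and description."""
        self._entries[name] = (description, provider)

    def search(self, keyword):
        """A service requester finds service names whose description mentions keyword."""
        return sorted(
            name for name, (desc, _) in self._entries.items()
            if keyword.lower() in desc.lower()
        )

    def bind(self, name):
        """Return the provider callable so the requester can invoke the service."""
        return self._entries[name][1]


registry = ServiceRegistry()
registry.publish(
    "reverse_identifier",
    "translation: reverse the element order of an identifier",
    lambda s: "::".join(reversed(s.split("::"))),
)

matches = registry.search("translation")   # requester discovers the service
service = registry.bind(matches[0])        # requester obtains what it needs to call it
result = service("age::employee::employee database")
```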
  • SOAP protocol is a preferred protocol for transferring data in web services.
  • the SOAP protocol defines the exchange format for messages between a web services client and a web services server.
  • the SOAP protocol uses an Extensible Markup Language (“XML”) schema, XML being a generic language specification commonly used in web services for tagging data, although other markup languages may be used.
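  • For illustration, a SOAP 1.1 request message can be assembled and read back with Python's standard XML library. The envelope namespace is the standard SOAP 1.1 one. The service namespace, operation name and parameter name are hypothetical and do not correspond to any service defined in this description.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:translation"                  # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope and body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_params(xml_text):
    """Server side: extract parameter names and values from the first body element."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Body")
    call = body[0]
    return {child.tag.split("}", 1)[1]: child.text for child in call}


message = build_request("Translate", {"identifier": "age::employee::employee database"})
```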
  • mapping of a translation operation can, among other things, trace data that is translated in the execution of the operation backward and forward between an original semantic context and a translated semantic context.
  • the appropriate identifier for the data item may vary, such as by varying or truncating a syntax and/or string to enable more efficient storage or faster processing, or by varying the relationships used to form a unique identifier where the semantic context varies.
  • a dynamic identifier may combine the benefits of retraceable translation with the benefits of rapid processing, efficient data processing and effective operation in various contexts in which a data item is used.
  • a given item such as an item that has an identity in a model, may exist in multiple forms or instances, such as a physical instance and a logical modeling instance.
  • Figure 11 depicts an item, namely, a table of employee information 5902.
  • the concept or entity "employees" can exist in a number of different forms within an enterprise.
  • the employee table 5902 may exist as a physical table that stores values related to employees in a physical data storage facility.
  • the entity employee may also be represented as a logical entity, such as an icon or text that represents employees in a logical modeling activity 5908, or in various other forms or instances.
  • Figure 12 depicts the employee table 5902 in one form or a single instance in a database 6002 and/or more than one form or instance in a database 6004 or hub 6008.
  • any differentiating characteristic may be used, such as a level of abstraction, a physical property of an item, a location of the item within a hierarchy, a location of an item in a database, a context in which an item is found, a syntax of an item, a relationship of an item to other items, an attribute of an item, the class of an item, or other characteristic.
  • the items, or individuals in this case may be distinguished based on age, gender, hair color, IQ, political affiliation and/or number of trips to the doctor in the past three months.
  • the employee table may exist in multiple forms or instances in the hub 6102, such as a physical employee table 5904, such as used to store values in a database that relate to data that pertains to employees, and a logical employee model 5908, such as to be used in a view of process that relates to employees.
  • an item such as a table named "employee"
  • a hub collector may have two forms or instances of "employee" in the hub; one corresponding to the physical database instance and another corresponding to the logical modeling activity.
  • a differentiating characteristic such as a property of the item attributed to the item in the hub allows for the differentiation between the physical instances and the logical model instances or forms. In embodiments that differentiating characteristic can be called a level of abstraction, such as to distinguish between logical and physical levels of abstraction.
  • the hub may associate other characteristics with items, such as different forms of identifiers, relationships, classes, attributes, physical locations, logical positions, models and the like.
  • a system such as a translation engine 6204, may grab, load or obtain all of the items from a hub 6208 or database 6210. It may select or filter 6204 the items based on any differentiating characteristic. For example, it may select or filter out those instances or forms that have a physical level of abstraction, that have a particular relationship to other items, that have a logical level of abstraction, that are created prior to a specified date and time, or that have any other distinguishing characteristics.
  • the methods and systems described herein provide for selective handling of instances of the same item or entity based on any differentiating characteristic.
  • a translation engine 6204 may filter or select items, including any data and/or metadata, at the hub 6208 or database 6210 and grab, load or obtain only those items of the relevant level of abstraction. For example, it may filter or select out those instances or forms with a logical level of abstraction, keeping only those with a physical level of abstraction.
  • the filtering or selection may be performed at runtime or design time and may be conducted in batch, real-time or on a continuous basis. In embodiments such a method of filtering or selection may be provided as an RTI service in a services oriented architecture.
  • the filtering or selection may be based on information, such as a mapping of a data model, a mapping of a metadata model, a differentiating characteristic, a relationship of an item to another item, an attribute of an item, or the syntax of an identifier, that is obtained by the translation engine and/or system at development-time, design-time or run-time.
  • the information may be updated in a dynamic fashion in real-time. The closer in the overall process the filtering or selection is to the hub or database the more efficient and faster the operation.
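  • Selection by a differentiating characteristic can be sketched as a filter over the items held in a hub. The item fields and sample contents below are hypothetical. The point is only that several instances of "employee" can coexist and be told apart by level of abstraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubItem:
    name: str
    level_of_abstraction: str  # e.g. "physical" or "logical"
    kind: str


# Hypothetical hub contents: two forms of "employee" plus one other item.
HUB = [
    HubItem("employee", "physical", "table"),
    HubItem("employee", "logical", "entity"),
    HubItem("department", "physical", "table"),
]


def select_items(items, **criteria):
    """Keep only the items whose attributes equal every given criterion."""
    return [
        item for item in items
        if all(getattr(item, field) == wanted for field, wanted in criteria.items())
    ]


physical_employee = select_items(HUB, name="employee", level_of_abstraction="physical")
```

  • Applying the filter where the items are stored, rather than after loading everything, reflects the remark above that filtering closer to the hub or database tends to be more efficient.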
  • the translation engine 6204 may perform a translation operation on the query 6202 itself, resulting in a revised query 6302, which may be sent for further processing, such as directly to the hub 6208 or database 6210.
  • the revised query 6302 may be rendered in a format that is directly compatible with the native format of the hub 6208 or database 6210. For example, by rendering the query in the native format of the database 6210, the system may increase processing efficiency for the query. Similarly, the query 6302 may be filtered or a command such as a select command may be generated to keep a logical modeling entity rather than a physical entity, in which case the query 6302 may be rendered in a format suitable for a logical modeling activity (such as a graphical user interface), rather than for the database.
  • the methods and systems described herein can be used to capture semantic contexts and to handle data integration tasks with respect to a wide range of items related to an enterprise, such as an object, data item, datum, column, row, table, database, instance, attribute, metadata, concept, topic, subject, semantic identifier, other identifier, RFID tag, vendor, supplier, customer, person, team, organization, user, network, system, device, family, store, product, product line, product feature, product specification, product attribute, price, cost, bill of materials, shipping data, tax data, course, educational program, location, map, division, organization, organism, process, rule, law, rating system, good, service and/or service offering.
  • the methods and systems described herein can be used in a variety of semantic contexts, such as a step in an enterprise method, a datum in a database, a datum in a row or column, a row or column in a table, a row or column in a database, a datum in a table, a table in a database, metadata in a database, an item in a hub or repository, an item in a database, an item in a table, an item in a column, an item in a row, a person in an organization, a sender or recipient of a communication, a user on a network, a system on a network, a device on a network, a person in a family, an item in a store, a dish on a menu, a product in a product line, a product in a product offering, a course or step in an educational or training program, a location on a map, a location of an item, a division of an organization, a person on a team,
  • a high level schematic view of an architecture depicts how a plurality of services may be combined to operate as an integrated application that unifies development, deployment, operation, and life-cycle management of a data integration solution.
  • the unification of data integration tasks into a single platform may eliminate the need for separate software products for different phases of design and deployment.
  • the individual modules, processes, services, and functions can each be provided separately, such as by invoking each of them independently as services in a services oriented architecture.
  • the architecture 6430 may include a GUI/tool framework 6432, an intelligent automation layer 6403, one or more clients 6434, APIs 6438, core services 6440, product function services 6442, metadata services 6452, metadata repositories 6454, one or more runtime engines 6444 with component runtimes 6450 and connectors 6448.
  • the architecture 6430 may be deployed on a service-oriented architecture, such as any of the service-oriented architectures described above.
  • Metadata models stored in the metadata repository 6454 provide common internal representations of data throughout the system at every step of the process from design through deployment.
  • the common services may provide for batch processing, concurrent processing, straight through processing, pipelining, modeling, simulation, conceptualization, detail design, testing, debugging, validation, deployment, execution, monitoring, measurement, improvement, upgrade, reporting, system management, and administration.
  • Models may be registered in a directory that is accessible to other system components.
  • the common models may provide a common representation (common to all product function services) of numerous suite-wide items including metadata (data descriptive data including data profile information), data integration process specifications, users, machine and software configurations, etc. These common models may enable common user views of enterprise resources and integration processes no matter what product functions the user is using, and may obviate the need for model translation among integrated product functions.
  • the service oriented architecture is shown as encompassing all of the services and may provide for the coordination of all the services from the GUI 6432 through the run time engine 6444 and the connections 6448 to the computing environment.
  • the common models which may be stored in the metadata repository 6454, may allow the SOA to seamlessly provide interaction between a plurality of services or a plurality of models.
  • the SOA may, for example, expose the GUI 6432 to all aspects of data integration design and deployment by use of common core services 6440, production function services 6442, and metadata services 6452, and may operate through an intelligent automation layer 6403.
  • the common models and services may allow for common representation of objects in the GUI 6432 for various actions during the design and deployment process.
  • the GUI 6432 may have a plurality of clients 6434 interfacing with SOA coordinated services.
  • the clients 5204 may allow users to interface with the data integration design with a plurality of skill levels enabling users to work as a team across organizationally appropriate levels.
  • the SOA 5201 may provide access to common core services 5210 and product function services 5212, as well as providing back end support to APIs 5208, for functions and services in data integration designs. Services may be shared and reused by a plurality of clients 5204 and other services.
  • a GUI 6432 may be the GUI for a client application that is designed specifically to work with a particular RTI service, such as exposing a particular data integration job as a service.
  • the GUI 6432 may be a GUI for a product service 6442, such as a data integration service, such as extraction, transformation, loading, cleansing, profiling, auditing, matching, or the like.
  • the GUI 6432 may be a GUI or client for a common service 6440, such as a logging or event management service.
  • the clients 6434 may allow users to interface with the data integration design with a plurality of skill levels enabling users to work as a team across organizationally appropriate levels.
  • the SOA may provide access to common core services 6440, product function services 6442, and services related to metadata.
  • the SOA may also include one or more APIs 6438 that expose the functions and services in the data integration platform to external applications and devices.
  • Services may be shared and reused by a plurality of clients 6434, APIs, devices, applications and other services.
  • the intelligent automation layer 6403 may employ metadata and services within the architecture to simplify user choices within the GUI 6432, such as by showing only relevant user choices, or automating common, frequent, and/or obvious operations.
  • the intelligent automation layer 6403 may automatically generate certain jobs, diagnose designs and design choices, and tune performance.
  • the intelligent automation layer 6403 may also support higher-level design paradigms, such as workflow management or modeling of business context, and may more generally apply project or other contextual awareness to assist a user in more quickly and efficiently implementing data integration solutions.
  • the common core services 6440 may provide common function services that may be commonly used across all aspects of the design and deployment of the data integration solution, such as directory services for one or more common registries, logging and auditing services, monitoring, event management, transaction services, security, licensing (such as creation and enforcement of licensing policies and communication with external licensing services), and provisioning and management of SOA services.
  • the common core services 6440 may allow a common representation of functions and objects to the common GUI 6432. Any other service, such as the product function services 6442, RTI services, or other services, devices, applications or modules can access and act as a client of any particular common service 6440.
  • product specific function services 6442 may be contained in the product function services 6442 and may provide services to specific appropriate clients 6434 and services. These may include, for example, importing and browsing external metadata, as well as profiling, analyzing, and generating reports. Other functions may be more design-oriented, such as services for designing, compiling, deploying, and running data integration services through the architecture.
  • the product function services 6442 may be accessible to the GUI 6432 when an appropriate task is used and may provide a task oriented GUI 6432.
  • a task oriented GUI may present a user only functions that are appropriate for the actions in the data integration design.
  • the application program interfaces (APIs) 6438 may provide a programming interface for access to the full architecture, including any or all of the services, repositories, engines, and connectors therein.
  • the APIs 6438 may contain a commonly used library of functions used by and/or created from various services, and may be called recursively.
  • Fig. 16A additionally shows metadata and repository services 6452 that may control access to the metadata repository 6454. Each function may keep metadata, represented by its own function-specific models, in a common repository in the metadata repository 6454. Functions may share common models, or use metadata mappings to dynamically translate semantics among their respective models.
  • Metadata and metadata models may be stored in the metadata repository 6454 and the metadata and repository services 6452 may maintain metadata versioning, persistence, check-in and check-out of metadata and metadata models, and repository space for interim metadata created by a user before it is reconciled with other metadata.
  • the metadata and repository services 6452 may provide access to the metadata repository 6454 to a plurality of services, GUI 6432, internal clients 6434 and external clients using a repository hub. Access by other services and clients 6434 to the metadata repository 6454 may allow metadata to be accessed, transformed, combined, cleansed, and queried by the other services in seamless transactions coordinated by the SOA.
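  • By way of non-limiting illustration, the check-out, check-in and versioning behavior described above can be sketched as follows. This is a minimal sketch only; the class `MetadataRepository`, its method names, and the in-memory storage are hypothetical and are not taken from the disclosure.

```python
import copy
from typing import Any, Dict, List, Optional


class MetadataRepository:
    """Hypothetical in-memory stand-in for a versioned metadata repository."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, Any]]] = {}  # model name -> saved versions
        self._checked_out: Dict[str, str] = {}                # model name -> user holding it

    def check_out(self, name: str, user: str) -> Dict[str, Any]:
        """Reserve a model for one user and return a private working copy."""
        holder = self._checked_out.get(name)
        if holder is not None and holder != user:
            raise RuntimeError(f"{name} is already checked out by {holder}")
        self._checked_out[name] = user
        versions = self._versions.get(name, [])
        return copy.deepcopy(versions[-1]) if versions else {}

    def check_in(self, name: str, user: str, model: Dict[str, Any]) -> int:
        """Persist the working copy as a new version and release the reservation."""
        if self._checked_out.get(name) != user:
            raise RuntimeError(f"{name} is not checked out by {user}")
        self._versions.setdefault(name, []).append(copy.deepcopy(model))
        del self._checked_out[name]
        return len(self._versions[name])  # 1-based version number

    def get(self, name: str, version: Optional[int] = None) -> Dict[str, Any]:
        """Return a copy of the requested version (latest when version is None)."""
        versions = self._versions[name]
        chosen = versions[-1] if version is None else versions[version - 1]
        return copy.deepcopy(chosen)
```

In this sketch, interim edits live only in the working copy returned by `check_out` until `check_in` reconciles them into the shared repository.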
  • a runtime engine 6444 may use adapters and connections 6448 to communicate with external sources.
  • the engines 6444 may be exposed to designs created by a user to create compiled and deployed solutions based on the computing environment.
  • the runtime engine 6444 may provide late binding to the computer environment and may provide the user the ability to design data integration solutions independent of computer environment considerations.
  • the run time engine 6444 orchestration with SOA services may allow the user to design without restrictions of run time compilation issues.
  • the runtime engine 6444 may compile the data integration solution and provide an appropriate deployed runtime for high throughput or high concurrency environments automatically. Services may be deployed as J2EE structures from a registry that provides access to interface and usage specifications for various services.
  • the services may support multiple protocols, such as HTTP, CORBA/RMI, JMS, JCA, and the like, for use with heterogeneous hardware and software environments. Bindings to these protocols may be automatically selected by the runtime engine 6444 or manually selected by the user from the GUI 6432 as part of the deployment process.
  • External connectors 6448 may provide access to a network or other external resources, and provide common access points for multiple execution engines and other transformation execution environments, such as Java or stored procedures, to external resources.
  • the runtime engines 6444 may include a transaction engine adapted to parse large transactions of potentially unlimited length, as well as continuous streams of real time transactions.
  • the runtime engines 6444 may also include a parallelism (or concurrency) engine adapted to process small independent transactions.
  • the parallelism engine may try to break up a process into pipeline functionality or some other partitioned flow, and works well with a large volume of similar work units.
  • the parallelism engine may be adapted to receive preprocessed input (and output) that has been divided into a pipelined or otherwise partitioned flow.
  • a compilation and optimization layer may determine how to present processes to these various engines, such as by preprocessing output to the parallelism engine into small chunks.
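  • As a hedged illustration of preprocessing work into small chunks for a parallelism engine, the following sketch partitions a list of records and applies a transformation to each partition concurrently. The function names `partition` and `run_partitioned`, the chunk size, and the use of a thread pool are assumptions made for the example and are not described in the source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(records: Sequence[T], chunk_size: int) -> List[Sequence[T]]:
    """Split the input into small, independent work units."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def run_partitioned(
    records: Sequence[T],
    transform: Callable[[T], R],
    chunk_size: int = 3,
    workers: int = 4,
) -> List[R]:
    """Apply transform to every record, one partition per task."""
    chunks = partition(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the submitted chunks.
        processed = list(pool.map(lambda chunk: [transform(r) for r in chunk], chunks))
    return [item for chunk in processed for item in chunk]
```

Because each partition is independent, the same partitioned flow could in principle be handed to a different execution engine without changing the transformation itself.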
  • By centralizing connectors within the architecture, it is possible to more closely control distribution of processes between various engines, and to provide accessibility to this control at the user interface level.
  • a common intermediate representation of connectivity in a transformation process enables deployment of any automation strategies, and selection of different combinations of execution engines, as well as optimization based on, for example, metadata or profiling.
  • the architecture 6430 described herein provides a high degree of flexibility and customizability to the user's working environment. This may be applied, for example, to configure user environments around existing or planned workflows and design processes. Users may be able to create specific functional services by constructing components and combining them into compositions, which may also serve in turn as components, allowing recursive nesting of modularity in the design of new components.
  • the components and compositions may be stored in the metadata repository 6454 with access provided by the metadata and repository services 6452.
  • Metadata and repository services 6452 may provide common data definitions with a common interface with a plurality of services and may provide support for native data formats and industry standard formats.
  • the modular nature of the architecture described herein enables packaging of any enterprise function(s) or integration process(es) into a package having components selected from the common core services 6440 and other ones of the product function services 6442, as well as other components of the overall architecture.
  • the ability to make packages from system components may be provided as a common core service 6440.
  • any arbitrary function can be constructed, provided it is capable of expression as a combination of atomic services, components, and compositions already within the architecture 6430.
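  • The recursive nesting of components and compositions described above can be sketched, under stated assumptions, as follows. The classes `Component` and `Composition` are hypothetical names chosen for the example; a composition is itself a component, so it can be nested inside another composition.

```python
from typing import Any, Callable, Iterable, List


class Component:
    """An atomic unit of work wrapping a single function."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self._func = func

    def run(self, value: Any) -> Any:
        return self._func(value)


class Composition(Component):
    """A sequence of components that can itself be used as a component."""

    def __init__(self, name: str, parts: Iterable[Component]) -> None:
        self.name = name
        self.parts: List[Component] = list(parts)

    def run(self, value: Any) -> Any:
        # Feed the output of each part into the next one.
        for part in self.parts:
            value = part.run(value)
        return value
```

For example, a "cleanse" composition built from "trim" and "upper" components can be reused as a single part of a larger composition.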
  • the packaging capability of the architecture 6430 may be combined with the task orientation of the user interface to achieve a user interface specifically adapted to any workflow or design methodology that a user wishes.
  • Figure 16B depicts, at a high level, another architecture for a data integration system that includes an SOA 1 which in an embodiment may be the IBM WebSphere Services Backbone from IBM.
  • the architecture may include components similar to those described in connection with Fig. 16A, such as one or more GUIs 6434, which may include specific clients 6480 that are designed to interact with various RTI services, such as described throughout this disclosure.
  • the GUIs 6434 may include various other GUIs, such as GUIs for a variety of data integration tools, such as IBM's WebSphere Datastage, Metastage, RTI, Datastage TX, and other tools, as well as tools from other vendors.
  • GUI such as an RTI client 6480, or a conventional GUI 6434
  • RTI client 6480 may facilitate interaction with the functions, processes, modules and services of the data integration platform.
  • GUIs 6434 may be clients of services that are deployed in a services oriented architecture.
  • the platform may include various other product services 6442, such as services that perform specific data integration functions.
  • product services 6442 can be exposed as services in an SOA to enable access to the functions without requiring them to be separately coded. Many embodiments of such product services 6442 are described in detail below.
  • the architecture may include common services 6440, which include a variety of services that may be useful for a wide variety of applications, modules, processes or functions.
  • GUIs 6434, product services 6442, other common services 6440, and other applications can serve as clients of any of the common services 6440, invoking the common services 6440 as needed to perform common functions, such as logging, event management, monitoring, provisioning, security, and the like.
  • An SOA may also interact with common model and repository data and metadata 6454, including to expose metadata related services in an SOA.
  • the architecture may also include an API, such as to allow an external device or application to access the data integration functions of the platform.
  • An SOA may also interact with and/or invoke metabrokers 6452, engines 6450 and connectivity applications 6448, such as to perform data integration tasks, such as extraction, transformation, and loading of data and metadata.
  • the core of the SOA may be the service binding 6468, SOA infrastructure 6470, and service implementation 6474.
  • Service binding 6468 may permit binding of clients, such as GUI 6464, applications 6460, script orchestration 6458, management framework 6456, and other clients, to services that may be internal or external to the SOA.
  • the bound services may be part of the common core services 5520 and the services binding 6464 may access the service description registry 6466 to instantiate the service.
  • the service binding 6464 may make it possible for clients to use services that may be local or external using the same or different technologies.
  • the binding to external services may expose the external services and they may be invoked in the same manner as internal services.
  • Communication to the services may be synchronous or asynchronous, may use different communication paths, and may be stateful or stateless.
  • the service binding 6464 may provide support for a plurality of protocols such as HTTP, EJB, web services protocols, CORBA/RMI, JMS, or JCA. As described herein, the service binding 6464 may determine the appropriate protocol for the service binding automatically according to the computer environment or the user may select the protocol from the GUI 6464 as part of the design solution 5304.
  • the management framework 6456 client may provide facilities to install, expose, catalog, configure, monitor, and otherwise administer the SOA services.
  • the management framework 6456 may provide access to clients, internal services, external services through connections, or metadata in internal or external metadata.
  • the orchestration client 6458 may make it possible to design a plurality of complex product functions and workflows by composing a plurality of SOA services into a design solution 5304.
  • the services may be composed from the common core services 6476, services external to the internal services 6480, internal processes 6484, or user defined services 6478.
  • the orchestration of the SOA is at the core of the capability to provide unified data integration designs in the enterprise environment.
  • the orchestration between the clients, core services, metadata repository services, deployment engines, and external services and metadata enables designs meeting a wide range of enterprise needs.
  • the unified approach provides an architecture to bind together the entire suite for enterprise design and may allow for a single GUI 6464 capable of the seamless presentation of the entire design process through to a deployed design solution. This architecture also enables common models to be used at design and run time, and common deployment models leveraging the same services as the design GUI 6464.
  • the application client 6460 may programmatically provide additional functionality to SOA coordinated services by allowing services to call common functions as needed.
  • the functions of the application client 6460 may enhance the capability of the services of the SOA by allowing the services to call the functions and apply them as if they were part of the service.
  • the GUI client 6464 may provide the user interface to the SOA services and resources by allowing these services and resources to be graphically displayed and manipulated.
  • the SOA infrastructure 6470 may be J2EE based and may provide the facility to allow services to be developed independent of the deployment environment.
  • the SOA infrastructure 6470 may provide additional functionality in support of the deployment environment such as resource pooling, interception, serializing, load balancing, event listening, and monitoring.
  • the SOA infrastructure 6470 may have access to the computing environment and may influence services available to the GUI 6464 and may support a context-directed GUI 6464.
  • the SOA infrastructure 6470 may provide resource pooling using, for example, enterprise Java bean (EJB) and real time integration (RTI).
  • the resource pooling may permit a plurality of concurrent service instances to share a small number of resources, both internal and external.
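  • As a minimal sketch of resource pooling, assuming a simple blocking pool rather than any EJB container facility, many concurrent callers can share a small fixed set of resources as follows. The class name `ResourcePool` and the string placeholders used as resources are hypothetical.

```python
import queue
from contextlib import contextmanager
from typing import Any, Iterable, Iterator, Optional


class ResourcePool:
    """Share a small, fixed set of resources among many service instances."""

    def __init__(self, resources: Iterable[Any]) -> None:
        self._free: "queue.Queue[Any]" = queue.Queue()
        for resource in resources:
            self._free.put(resource)

    @contextmanager
    def acquire(self, timeout: Optional[float] = None) -> Iterator[Any]:
        # Blocks until a resource is free; raises queue.Empty if the timeout expires.
        resource = self._free.get(timeout=timeout)
        try:
            yield resource
        finally:
            # Always return the resource, even if the caller raised an error.
            self._free.put(resource)
```

A caller that cannot obtain a resource within the timeout receives `queue.Empty`, which a surrounding service could translate into a retry or a rejection.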
  • the SOA infrastructure may provide a number of useful tools and features. Interception may provide for insertion of encryption, compression, tracing, monitoring, and other management tools that may be transparent to the services and provide reporting of these services to clients and other services.
  • Serialization and de-serialization may provide complex service request and data transfer support across a plurality of invocation protocols and across disparate technologies. Load balancing may allow a plurality of service instances to be distributed across a plurality of servers.
  • Load balancing may support high concurrency processing or high throughput processing accessing one or a plurality of processors on a plurality of servers.
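  • One simple way to distribute service requests across a plurality of servers is round-robin selection. The sketch below assumes that policy purely for illustration; the source does not specify a balancing algorithm, and the class name `RoundRobinBalancer` is hypothetical.

```python
import itertools
from typing import Any, Sequence, Tuple


class RoundRobinBalancer:
    """Assign each request to the next server in a fixed rotation."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = itertools.cycle(list(servers))

    def dispatch(self, request: Any) -> Tuple[str, Any]:
        # Return the chosen server together with the request it should handle.
        return next(self._rotation), request
```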
  • Event listening and generation may enable the invocation of a service based on observed external events. This may allow the invocation of a second service based on the function of a first service and if a specified condition may occur.
  • Event listening may also support call back capability specifying that a service may be invoked using the same identifier as when previously invoked.
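  • The conditional invocation of a second service in response to an event can be sketched as follows. This is an illustrative assumption of how such listening might be arranged; the class `EventBus` and its `subscribe` and `publish` methods are hypothetical names.

```python
from typing import Any, Callable, List, Tuple

Condition = Callable[[Any], bool]
Service = Callable[[Any], Any]


class EventBus:
    """Invoke subscribed services when a matching event satisfies a condition."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Condition, Service]] = []

    def subscribe(self, event_type: str, condition: Condition, service: Service) -> None:
        self._subscriptions.append((event_type, condition, service))

    def publish(self, event_type: str, payload: Any) -> List[Any]:
        """Return the results of every service invoked for this event."""
        results = []
        for subscribed_type, condition, service in self._subscriptions:
            if subscribed_type == event_type and condition(payload):
                results.append(service(payload))
        return results
```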
  • the service description registry 6466 may be a service that maintains all interface and usage specifications for all other services.
  • the service description registry 6466 may provide query and selection services to create instances of services, bindings, and protocols to be used with a design solution.
  • instances of services may be requested by a client or other service to the SOA where the SOA will request a query or selection of the called service.
  • the service description registry 6466 may then return the instance of the service for binding by the service binding 6464 and then may be used in the design solution.
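  • The query-then-bind sequence described above can be sketched, by way of example only, as follows. The names `ServiceDescription`, `ServiceDescriptionRegistry` and `bind`, and the protocol strings, are assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    protocols: Tuple[str, ...]   # protocols the service can be bound through
    factory: Callable[[], Any]   # creates an instance of the service


class ServiceDescriptionRegistry:
    """Keeps interface and usage descriptions for registered services."""

    def __init__(self) -> None:
        self._descriptions: Dict[str, ServiceDescription] = {}

    def register(self, description: ServiceDescription) -> None:
        self._descriptions[description.name] = description

    def query(self, name: str) -> ServiceDescription:
        return self._descriptions[name]


def bind(registry: ServiceDescriptionRegistry, name: str,
         preferred: Sequence[str]) -> Tuple[str, Any]:
    """Pick the first preferred protocol the service supports and instantiate it."""
    description = registry.query(name)
    for protocol in preferred:
        if protocol in description.protocols:
            return protocol, description.factory()
    raise LookupError(f"no common protocol for service {name!r}")
```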
  • the common core services 6476 may contain a plurality of services that may be invoked to create design solutions and runtime deployed solutions.
  • the common core services 6476 may contain all of the common services for design solutions therefore freeing other services from having to maintain the capabilities of these services themselves.
  • the services themselves may call other services within the common core services 6476 as required to complete the design solution.
  • a plurality of clients may access the common core services 6476 through the service binding 6464, SOA infrastructure 6470 and service description registry 6466.
  • Common core services may also be accessed by external services through metadata repository services 6452 and the SOA infrastructure 6470.
  • Additional external services may access any of the environments supported by the SOA infrastructure 6470 through the service implementation 6474.
  • the service implementation may provide access to external services through use of adapters and connectors 6448.
  • services 6480 may expose specific product functionality provided by other software products for developing design solutions. These services 6480 may provide investigation, design, development, testing, deployment, operation, monitoring, tuning, or other functions.
  • the services 6480 may perform the data integration jobs and may access the SOA for metadata, meta models, or services.
  • the service implementation 6474 may provide access for the processes 6484 to integration processes created with other tools and exposed as services to the SOA infrastructure 6470. Users of other tools may have created these integration processes and these processes may be exposed as services to the SOA and clients.
  • the service implementation 6474 may also provide access to user defined services 6478 that may allow users to define or create their own custom processes and expose them as SOA services. Exposing the user-defined services 6478 as SOA services allows them to be exposed to all clients and services of the SOA.
  • Fig. 16D depicts the internal architecture of an SOA, such as the IBM WebSphere Services Backbone.
  • SOA may incorporate or be composed of several different managers, such as a client invocation manager 6451 for managing the invocation of a client interface 6434, a policy manager 6453 that may manage service and binding policies, a J2EE manager 6455, a registry manager 6461, a persistence manager 6463, a service manager 6457 for managing the deployment of services, such as to add, modify or delete services, a binding manager 6465, a service deployment manager 6459 for managing deployment of services and a binding deployment manager 6467 for managing deployment of bindings for services.
  • An application server 6486, UDDI registry 6488 and a common repository 6490 may be associated with or part of the SOA.
  • the SOA may provide common services 6440 and product services 6442. Each service may have a description 6477 associated with it.
  • the description 6477 may have certain extensions associated with it.
  • An extension may be used to link a service to other services.
  • An example of an extension would be to attach a "monitoring service extension" to a service.
  • this extension can consist, for example, of an m-bean that the service uses to track some values related to the service behavior.
  • the m-bean can automatically be registered with the monitoring service.
  • an administrator can define "metrics" that are calculated values created on top of the raw attribute values of the m-bean and can also define “monitors” that are monitoring the m-bean to react to changes to the m-bean attribute values or to changes to the calculated values of the metrics.
  • An example of a behavior associated with a monitoring service can be to generate an event (managed by the event management service). In turn that event may call another service, or send an email or an alert to some specific users or administrators.
  • An m-bean associated with a service description can capture values of attributes of the service, such as the number of times a service was invoked, or the like.
  • common services 6440 such as a monitoring service, can monitor the m-bean and calculate various metrics, such as averages, weighted averages, or the like, based on the values and attributes captured in the m-beans.
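  • The relationship among raw m-bean attributes, calculated metrics and monitors can be sketched as follows. This is not JMX code; the plain Python classes `ServiceMBean` and `Monitor`, the `average_latency` metric, and the threshold rule are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List


class ServiceMBean:
    """Holds raw attribute values captured for one service."""

    def __init__(self) -> None:
        self.invocations = 0
        self.total_ms = 0.0

    def record(self, elapsed_ms: float) -> None:
        self.invocations += 1
        self.total_ms += elapsed_ms


def average_latency(bean: ServiceMBean) -> float:
    """A metric: a value calculated on top of the raw attributes."""
    return bean.total_ms / bean.invocations if bean.invocations else 0.0


class Monitor:
    """Reacts when a metric exceeds a threshold, for example by raising an event."""

    def __init__(self, metric: Callable[[ServiceMBean], float], threshold: float,
                 on_breach: Callable[[float], None]) -> None:
        self.metric = metric
        self.threshold = threshold
        self.on_breach = on_breach

    def check(self, bean: ServiceMBean) -> bool:
        value = self.metric(bean)
        if value > self.threshold:
            self.on_breach(value)
            return True
        return False
```

In this sketch the `on_breach` callback plays the role of generating an event, which in turn could call another service or notify an administrator.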
  • the architecture can also include a service packager 6473 and a binding packager 6469.
  • a binding factory 6479 can be used to build bindings 6468, such as bindings that are appropriate for various services.
  • a service may have multiple bindings, which, as described below, may facilitate a variety of types of coupling between the service and various clients of the service.
  • bindings 6404 that allow the service to be accessed, such as through ports 6402.
  • various bindings such as EJB, JMS, web services and JCA bindings can be used to invoke services in the various embodiments of services oriented architectures described herein.
  • an API 13210 may be provided for assisting access to a service 6400.
  • the API may provide various functions, such as selecting a particular binding for a service, where the selection is based on a condition or event, such as selecting a binding that is appropriate for a particular application.
  • bindings may vary in their flexibility, and an API 13210 may apply a tight or loose binding based on the conditions of the application or device that accesses the service.
  • the API 13210 may be a Java API or similar facility. In embodiments the same Java API 13210 may be used for different kinds of bindings.
  • a smart client 13208 may be supplied for a service 6400. The smart client 13208 may be another layer on top of the API 13210 or may substitute for the API 13210. The smart client 13208 may be stored and accessed through a registry associated with a service.
  • an application may download the appropriate smart client 13208 based on the device using the application, the context of the application, or the like.
  • a smart client 13208 may be used to buffer certain information that is used by a service and send the information to the service in a package, rather than having an application access the service constantly.
  • a user may wish to log only errors, rather than all events. By holding events until predetermined time periods, the user can reduce the number of calls to the server while still capturing all of the necessary events.
  • the smart client 13208 can thus execute various rules that optimize the use of a service by a device or application.
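  • A smart client that buffers information and sends it to a service in packages can be sketched as follows. The class `BufferingLogClient`, the batch-size trigger, and the `send_batch` callable standing in for the remote logging service are assumptions of the example; a time-based trigger could be substituted.

```python
from typing import Any, Callable, List


class BufferingLogClient:
    """Collect events locally and forward them to the service in batches."""

    def __init__(self, send_batch: Callable[[List[Any]], None], batch_size: int = 3) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._send_batch = send_batch   # hypothetical call to the remote service
        self._batch_size = batch_size
        self._buffer: List[Any] = []

    def log(self, event: Any) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one package, if anything is pending."""
        if self._buffer:
            self._send_batch(list(self._buffer))
            self._buffer.clear()
```

With a batch size of three, four logged events result in one call to the service, plus a second call when the client is flushed.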
  • the smart client 13208 can select a binding, either alone or by interaction with an API 13210, that optimizes the binding of the client-side device or application to the service 6400 based on the conditions of access, the capabilities of the device, the context of the access, or the like.
  • the smart client 13208 or API 13210 can be used to store various access rules. For example, the rules might indicate that if a device or application is inside a firewall, then it can access a service using EJB bindings, while if the device or application is outside the firewall then it will access a service using a web service binding.
  • any such rules can be embodied in the API 13210 or may be included in a smart client 13208, which may optionally be listed in a registry with the service and downloaded by a client device or application that will access the service.
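  • The firewall-based access rule given as an example above can be sketched as follows. The function name `select_binding` and the binding identifiers are hypothetical, and the fallback to a web service binding when no EJB binding is offered inside the firewall is an added assumption rather than something stated in the source.

```python
from typing import List, Sequence


def select_binding(inside_firewall: bool, available: Sequence[str]) -> str:
    """Choose a binding for a client according to a simple access rule."""
    # Inside the firewall the tighter EJB binding is preferred; outside it,
    # only the web service binding is permitted.
    preferred: List[str] = ["ejb", "web_service"] if inside_firewall else ["web_service"]
    for binding in preferred:
        if binding in available:
            return binding
    raise LookupError("no permitted binding is available")
```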
  • One of the benefits of a services oriented architecture is that it facilitates loose coupling between a client device or application that accesses a service and the code for the service itself; that is, a client device or application can invoke and use the service without knowing very much about the code for the service, needing to satisfy only certain predetermined inputs, such as what to input to the service (e.g., a file, an answer to a query, or the like).
  • An API 13210 and/or smart client 13208 can make up for diminished performance by ensuring that a service is accessed optimally, such as by selecting a correct binding, caching data into batches, to avoid constantly invoking services for small jobs, or the like.
  • a smart client 13208 provides effective performance in a loose coupling environment.
  • the smart client 13208 thus bridges the gap between a tight coupling environment and a loose coupling environment and allows the user, application or device that accesses a service to choose a type of binding along the spectrum between loose coupling and tight coupling (such as EJB) according to the performance expectation or requirements.
  • EJB coupling may perform better than web services, because EJB couplings are by nature more tightly coupled between client applications and the server side.
  • the smart client 13208 improves performance of both EJBs and web services by caching or buffering and sending things in appropriate batches. In situations where it is impossible or not desirable to cache or buffer items, a system can use a tight EJB binding to achieve good performance.
  • the API 13210 may hide the binding that the client device or application is using. With a smart client 13208, a user can tune the performance of the system by tuning the level of coupling between the client and the server.
  • the runtime 13200 of a service in a services oriented architecture may be a client itself of another service, such as one or more of the common services described in connection with Figs. 76 through 83.
  • the foregoing can be accomplished using AOP.
  • entities known as interceptors can associate a policy to a service. Inside the policy of the service, interceptors can be plugged into the policies, and the interceptors can be clients of the common services.
  • a policy in a service can include a plug-in that invokes the monitoring service 12500 of Fig. 77.
  • AOP techniques can be used to insert code of interceptors into the code of various services described herein.
  • a user can create a piece of code and associate an "aspect" - a list of things to insert at runtime to the code as it is being executed.
  • the runtime program calls another piece of code, such as invoking a service, rather than doing what the code would normally do.
  • the code calls another function that is compiled independently.
  • the program can compile the source code to create the byte code, which is the runtime of Java, and a Java virtual machine reads the byte code.
  • the program has the
  • the AOP compiler does byte code manipulation and calls other types of code, such as the services in the services oriented architecture.
  • the methods and systems described herein include using common services either explicitly from an application or another service, or from an interceptor inserted in a service policy. That allows the same common service to be used by any service implementer and by the services oriented architecture framework transparently through the AOP sub-system.
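  • The effect of an interceptor that invokes a common service around a service call can be approximated, for illustration only, with function wrapping. The sketch below does not perform byte code manipulation as an AOP compiler would; it uses a Python decorator instead, and the names `with_interceptors` and the interceptor signature are hypothetical.

```python
import functools
from typing import Any, Callable

# An interceptor receives the phase ("before" or "after"), the service name,
# and either the call arguments or the result.
Interceptor = Callable[[str, str, Any], None]


def with_interceptors(*interceptors: Interceptor) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a service so each interceptor runs before and after every call."""

    def decorate(service: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(service)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for interceptor in interceptors:
                interceptor("before", service.__name__, args)
            result = service(*args, **kwargs)
            for interceptor in interceptors:
                interceptor("after", service.__name__, result)
            return result

        return wrapper

    return decorate
```

The wrapped service contains no monitoring or logging code of its own, so the same common service can be attached to any service implementation through its policy.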
  • Fig. 16F depicts a particular embodiment of an architecture for deploying a service in an SOA.
  • a variety of client-side and system-side components can be provided to enable the SOA.
  • client-side applications 6480 or GUIs 6434 such as clients for RTI services, common services 6440 or product services 6442, can be developed and configured to access specific services.
  • the client applications 6480 or GUIs 6434 can access the services directly through code that is designed to interact with various bindings, such as SOAP, EJB, JMS and web services bindings.
  • a proper binding may be selected and enabled in the client application 6480, 6434, such as a tight EJB binding or a loosely coupled web services binding.
  • the architecture may also include the API 13210, which may be designed to provide an interface to a particular service that is suitable for a particular type of client application, device, communication protocol, or the like.
  • a client invocation framework can automatically generate a proxy, such as a C# or a C++ proxy, for either the
  • a service through the client API 13210 can use any of the defined bindings transparently, according to business rules, without requiring special coding to interface with the bindings;
  • additional smart/rich clients can be created on top of the generated API 13210 to optimize the use of the particular service, and
  • proxies such as C# or C++ proxies, can be generated to provide access to these generated clients or rich/smart clients in environments different from that of the API 13210, such as a non-Java environment in the case of a Java API.
  • the system may include specific clients, such as SOAP clients 6407, EJB clients 6409, JCA clients 6411 and JMS clients 6413.
  • the architecture may also include a WSDL layer 6415. Thus, multiple clients can exist to access a given service through various bindings, with a particular application or device being able to select the appropriate client, API 13210 or binding to access the service.
  • the system also includes various ports 6402 with appropriate bindings
  • the SOA runtime 13200 can enable many services, such as the various common services 6440 (such as logging, monitoring, provisioning, security, event management, administration, auditing and the like), product services 6442 (including metadata services 6452, RTI services, user-defined services, and the like). Services may also include connector access services, job execution services, metadata services, job browsing services, job deployment services, services related to workflow, job compilation services, logging services, security services, auditing services, monitoring services, licensing services, event management services and session management services.
  • a data integration module 6400 which could be any module, tool, facility, function, service, process, client application or other item that can be accessed by one or more pre-defined ports 6402 such as ports accessible through a computer network, a programming interface, or any other hardware or software connection or interface.
  • Each port can have an associated binding 6404, which allows a user to access the module 6400 through the port 6402, as described above in connection with various embodiments of SOA.
  • the module 6400 may include various operations 6408, which can be performed by the module 6400 when accessed through the bindings 6404 and ports 6402.
  • a client interface 6410 may invoke or interact with services.
  • One or more client interfaces 6410 may be invoked by or interact with the data integration service, module or facility 6400.
  • the client interface 6410 may be a C++, C#, Java or any other application.
  • Each module 6400 may include an interface 6414, such as for incoming and outgoing messages and other interactions with the service.
  • the module 6400, possibly through one or more bindings 6404, may invoke or interact with service policies and/or interceptors 6412.
  • the service policy 6412 may be a logging service, event management service, installation service, provisioning service, licensing service, monitoring service or auditing service.
  • An interceptor 6412 may associate a policy to a service. Any one or more of a client interface 6410, port 6402, binding 6404, service policy or interceptor 6412 may form or be part of a services oriented architecture, such as the IBM WebSphere Services Backbone, common Services 6440 or product services 6442. Messages can have various parts, corresponding to the requirements of the definition of the module 6400, such as those described above in connection with various embodiments of services oriented architectures. For example, an incoming message can be in a format suitable for a given binding and can include input triggers for triggering operations of the particular module 6400.
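The relationship between a module, its operations and an interceptor that attaches a policy can be sketched as follows. This is a minimal illustration only: the `Module` class, the `logging_policy` function and the message layout are hypothetical names chosen for the example and are not defined by the specification.

```python
from typing import Callable, Dict, List

Operation = Callable[[dict], dict]
Interceptor = Callable[[str, dict], None]


class Module:
    """Hypothetical stand-in for a module 6400 that exposes named operations."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.operations: Dict[str, Operation] = {}
        self.interceptors: List[Interceptor] = []

    def invoke(self, operation: str, message: dict) -> dict:
        # Run every attached policy before the operation itself.
        for interceptor in self.interceptors:
            interceptor(operation, message)
        return self.operations[operation](message)


audit_log: List[str] = []


def logging_policy(operation: str, message: dict) -> None:
    """Assumed logging policy: record the operation and the message keys."""
    audit_log.append(f"{operation}: {sorted(message)}")


extract = Module("extract")
extract.operations["run"] = lambda msg: {"rows": msg["limit"]}
extract.interceptors.append(logging_policy)
result = extract.invoke("run", {"limit": 10})
```

Here the incoming message acts as the input trigger for the `run` operation, and the interceptor applies the logging policy without the operation code knowing about it.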
  • the module 6400 may include various operations 6408, connected to or creating an abstract interface 6414, which can be performed by the module 6400 when accessed through the bindings 6404 and ports 6402.
  • the module 6400 can be published in a registry, such as a registry for web services, to be identified and accessed by one or more users to accomplish the functions or operations defined in the definition of the module 6400.
  • the code for those operations may be any conventional code for data integration platform functions, or any other code useful in data integration platforms of various vendors, such as IBM and others.
  • modules 6400 can include product services 6442 for providing a wide range of functions, such as an extraction function, a data transformation, a loading function, a metadata management function, a data profiling function, a mapping function, a data auditing function, a data quality function, a data cleansing function, a matching function, a probabilistic matching function, a metabroker function, a data migration function, an atomic data repository function, a semantic identification function, a filtering function, a refinement and selection function, a design interface function, or many others.
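Publishing a module in a registry and reaching it through a port with a particular binding might look like the sketch below. The `ServiceModule` and `Registry` classes, the binding names and the placeholder address are assumptions made for illustration; an actual deployment would rely on a web services registry and real SOAP, EJB, JCA or JMS bindings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ServiceModule:
    name: str
    operations: Dict[str, Callable[..., object]]
    # Maps a binding name (for example "SOAP" or "JMS") to a port address.
    ports: Dict[str, str] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory registry of service modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, ServiceModule] = {}

    def publish(self, module: ServiceModule) -> None:
        self._modules[module.name] = module

    def call(self, name: str, binding: str, operation: str, *args: object) -> object:
        module = self._modules[name]
        if binding not in module.ports:
            raise LookupError(f"{name} has no {binding} binding")
        return module.operations[operation](*args)


registry = Registry()
registry.publish(ServiceModule(
    name="data-extraction",
    operations={"extract": lambda table: f"rows from {table}"},
    ports={"SOAP": "http://example.invalid/extract"},  # placeholder address
))
reply = registry.call("data-extraction", "SOAP", "extract", "customers")
```

A client that asks for a binding the module does not expose receives a `LookupError`, which mirrors the idea that each client selects the binding appropriate to it.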
  • the module 6400 can be a data extraction module 6500.
  • the methods and systems described herein include providing a module for a data extraction function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data transformation module 6600.
  • the data transformation module 6600 may transform data from a form provided from a data facility 112 into a form for storage in a data target, such as any database, data facility, or process, or combinations of these.
  • the data transformation module 6600 may take the form of any of those described herein and may include, for example, one or more hubs or atomic data repositories, bridges, parallel execution engines, metabrokers, pipelining facilities or other facilities for moving data in batch or real-time transformations.
  • the transformation module 6600 may transform data from an XML or similar data format into the native format for a database or process, such as a supply chain database using SAP or Oracle.
  • the data transformation module 6600 may perform additional operations incidental to a data transformation, such as extracting, loading, or cleansing.
  • the methods and systems described herein include providing a module for a data transformation function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data loading module 6700.
  • the data loading module 6700 may load data into one or more databases, processes, or other targets.
  • a loading module 6700 may be a batch loading facility or a real-time loading facility, such as a loading facility that uses pipelining or similar functionality.
  • the loading module 6700 may be used to load data in parallel to more than one data integration process, module, system, data facility or other element.
  • a loading facility may load data that is stored on or associated with a product tracking system simultaneously into a database for tracking the physical location of goods and into a database for tracking metadata associated with the goods, such as metadata entered by users at the time of collection.
  • the methods and systems described herein also include providing a module for a data loading function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a metadata management module 6800.
  • the metadata management module 6800 may allow for storage and manipulation of associated metadata.
  • the metadata management module 6800 may, for example, take the form of any metadata facility described herein.
  • the metadata management module 6800 may include a metabroker, an atomic data repository, a migration engine and/or other metadata facility.
  • the metadata management module 6800 may be constructed to provide a variety of metadata functions that can be specified when the module 6800 is invoked as a service, or the metadata management module 6800 might perform a single, dedicated metadata management function.
  • the metadata management module 6800 may allow a user to store, add, annotate and otherwise manipulate metadata.
  • a marketing manager may modify the metadata associated with a particular product to account for the fact that the product is currently the subject of a marketing campaign in a particular region.
  • an engineer may modify the metadata associated with a part to reflect a change from metric units to English units, or vice versa, or to add a new characteristic for existing inventory such as RFID or UPC identification codes.
  • the methods and systems described herein also include providing a module for a metadata management function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data profiling module 6900.
  • the data profiling module 6900 may be used to profile data that is stored in a data facility or associated with a system. For example, the data profiling module 6900 may determine the content of columns or tables of data or metadata or assess the quality of the data or metadata.
  • the data profiling module 6900 may generate a metadata model for one or more data sources to facilitate automation of subsequent data integration tasks.
  • the data profiling module 6900 may also provide recommendations for constructing a target database from a source being profiled, such as keys and table normalizations.
  • the methods and systems described herein also include providing a module for a data profiling function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data auditing module 7000.
  • the data auditing module 7000 may be used to audit data that is stored in a data facility or associated with a system. For example, the data auditing module 7000 may determine the origin of a column of a table and track the job function of each user who modified the data. The data auditing module 7000 may also perform tasks such as validation of data ranges, calculations, value combinations, and so on.
  • the methods and systems described herein also include providing a module for a data auditing function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data cleansing module 7100.
  • the data cleansing module 7100 may cleanse data or metadata that is received from a database or system.
  • the data cleansing module 7100 may take the form of any data cleansing facility, and may provide any data cleansing operations, such as any of those provided by the WebSphere QualityStage product from IBM.
  • the data cleansing module 7100 may rapidly perform cleansing operations, such as de-duplicating records, so that any processes, systems, functions, modules, or the like that depend on the data have good data, rather than, for example, duplicate or erroneous data.
  • the methods and systems described herein also include providing a module for a data cleansing function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data quality module 7200.
  • the data quality module 7200 may assess the quality of data or metadata.
  • the data quality module 7200 may provide any data quality functionality, such as functions provided by the WebSphere QualityStage product from IBM.
  • the data quality module 7200 may determine the extent of duplication and erroneous data and may correct such errors.
  • the methods and systems described herein also include providing a module for a data quality function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data matching module 7300.
  • the data matching module 7300 may match data or metadata associated with an item to another item, such as a process, identifier, element, business process, business object, subject, data facility, rule, system or the like.
  • a matching module 7300 may match product data with a particular process, so that the product data or metadata is stored in the correct process.
  • the methods and systems described herein also include providing a module for a data matching function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the data matching function may be a probabilistic matching function.
  • the module 6400 can be a metabroker module 7400.
  • a metabroker module 7400 may convert or transform metadata from one format or language to another, or between metadata models even if they use the same database technology.
  • a metabroker module 7400 may convert metadata associated with a particular line of products from SAP format to a format that can be used with an Oracle database.
  • a company using its own metadata model for inventory may acquire another company that uses a different metadata model for inventory.
  • the metabroker module 7400 may be used as a translator for combining or sharing data between inventory databases of the two companies.
  • the methods and systems described herein also include providing a module for a metabroker function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the metabroker function maintains the semantics of a data integration function across multiple data integration platforms.
  • the module 6400 can be a data migration module 7500.
  • a data migration module 7500 may move data from one data facility 112 to another data facility 112 or hub.
  • a data migration module 7500 may move data from a customer database to a hub, where it may be acted upon by a metabroker module 7400, and then migrated or otherwise transferred to a finance database.
  • the methods and systems described herein also include providing a module for a data migration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be an atomic data repository module 7600.
  • An atomic data repository module 7600 may provide one or more fundamental data operations, such as read or write, for communicating with a repository using atomic data structures of the repository.
  • the atomic data repository module 7600 may be employed for simple data transactions with a metadata model or other item stored in a repository, or may be combined with other modules 7600 to provide core repository services such as querying metadata models and the like.
  • the methods and systems described herein also include providing a module for an atomic data repository, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a semantic identification module 7700.
  • a semantic identification module 7700 may identify an object, table, column or other item based on its relationship with other objects, tables, columns and other items. For example, a semantic identification module 7700 may create a string that may be acted upon by a data transformation module 6600.
  • the methods and systems described herein also include providing a module for a semantic identification function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a filtering module 7800.
  • a filtering module 7800 may filter data, metadata, objects, items or instances of an item based on the associated level of abstraction or other properties. For example, a filtering module 7800 may filter the physical instances of the columns of a table in a hub from the logical instances based on the level of abstraction associated with each instance.
  • the methods and systems described herein also include providing a module for a filtering function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the filtering is based on a level of abstraction.
  • the level of abstraction can be at least one of a physical level of abstraction and a logical level of abstraction.
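Filtering by level of abstraction can be sketched as below. The `ColumnInstance` type and the two level labels are illustrative assumptions; the specification does not prescribe how the level is stored on each instance.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ColumnInstance:
    name: str
    level: str  # assumed to be either "physical" or "logical"


def filter_by_level(instances: Iterable[ColumnInstance], level: str) -> List[ColumnInstance]:
    """Keep only the instances recorded at the requested level of abstraction."""
    return [instance for instance in instances if instance.level == level]


columns = [
    ColumnInstance("CUST_NM", "physical"),
    ColumnInstance("Customer Name", "logical"),
    ColumnInstance("CUST_ID", "physical"),
]
physical = filter_by_level(columns, "physical")
```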
  • the module 6400 can be a refinement and selection module 7900.
  • a refinement and selection module 7900 may filter data, metadata, instances or other items at the database, hub, query or other levels or stages of a process.
  • a refinement and selection module 7900 may allow a transformation operation to be performed on a query before it is sent to the relevant database.
  • the methods and systems described herein also include providing a module for a refinement and selection facility, providing a registry of services, and identifying the facility in the registry, wherein the facility can be accessed as a service in a services oriented architecture.
  • the refinement and selection facility allows the system to distinguish between a logical level of abstraction and a physical level of abstraction.
  • the module 6400 can be a database content analysis module 8000.
  • a database content analysis module 8000 may analyze and summarize the content of a database and suggest possible related databases. For example, a database content analysis module may analyze a customer database and summarize salient information regarding the top twenty-five customers. As another example, the database content analysis module 8000 may provide a statistical analysis of numerical data in columns of a database, or report on the frequency of empty records, or report the number and size of tables, and so on. The database content analysis module 8000 may also characterize database structure, and provide metadata relating to, for example, keys, column names, table names, and hierarchical or other relationships among the foregoing.
  • the database content analysis module 8000 may provide any quantitative or qualitative analysis of a database that can be expressed in program code, and may provide corresponding reports or metrics that may be used by other modules 6400 or designers to characterize and apply the database contents.
  • the database content analysis module may also, or instead, combine functions of modules described below for analyzing tables, columns and rows of databases, or employ those modules in analyzing a database.
  • the methods and systems described herein also include providing a module for analyzing the contents of a database, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
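One kind of report such a content analysis could produce, namely the frequency of empty values and a simple statistic per numeric column, is sketched here. The row layout (a list of dictionaries with `None` marking an empty value) and the `summarize` function are assumptions for the example.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def summarize(rows: List[Row]) -> dict:
    """Report the row count, and per column the empty fraction and the mean."""
    columns = sorted({key for row in rows for key in row})
    report: dict = {"row_count": len(rows), "columns": {}}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [value for value in values if value is not None]
        report["columns"][column] = {
            "empty_fraction": (len(values) - len(present)) / len(values),
            "mean": mean(present) if present else None,
        }
    return report


rows = [
    {"qty": 2, "price": 10.0},
    {"qty": None, "price": 30.0},
    {"qty": 4, "price": None},
    {"qty": 6, "price": 20.0},
]
report = summarize(rows)
```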
  • the module 6400 can be a database table analysis module 8100.
  • a database table analysis module 8100 may analyze and summarize the content of a table.
  • a database table analysis module 8100 may provide the hierarchical position of one table of a database with respect to other tables of the database.
  • the methods and systems described herein also include providing a module for analyzing a table of a database, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a database row analysis module 8200.
  • a database row analysis module 8200 may analyze and summarize the content of a row of a table. For example, a database row analysis module may suggest other rows and/or tables that may be related to a row of interest.
  • the database row analysis module 8200 may also, or instead, evaluate the validity of records within a row according to information about database structure.
  • the methods and systems described herein also include providing a module for analyzing a row of a database, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data structure analysis module 8300.
  • a data structure analysis module 8300 may analyze the overall structure of the data or metadata associated with the data relating to a row, column, table or data facility 112, or any combination of these. For example, a data structure analysis module 8300 may generate a report summarizing the number and hierarchical relationship of the rows, columns and tables composing a particular database 112.
  • the methods and systems described herein also include providing a module for analyzing a data structure, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a recommendation module 8400.
  • a recommendation module 8400 may recommend a target data facility for an operation or process.
  • a recommendation module 8400 may locate and recommend an unused hub for a process involving a metabroker module 7400.
  • the recommendation module 8400 may recommend a target database for an ETL operation based upon known characteristics of potential target databases such as access time, fault tolerance, capacity, and so on.
  • the recommendation module 8400 may also, or instead, provide a number of different recommendations for the structure of a target database using techniques analogous to those employed by IBM's WebSphere ProfileStage and AuditStage products.
  • the methods and systems described herein also include providing a module for recommending a target data facility, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a primary key module 8500.
  • a primary key module 8500 may use dependency information from table analysis to identify primary key candidates for a table under analysis. For example, the primary key module 8500 may determine that the customer name column should be a primary key for a customer information table. This information may be used to assist in designing a target database for an ETL operation or other data integration process requiring a data target.
  • the methods and systems described herein also include providing a module for providing a primary key for a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
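A simplified way to shortlist primary key candidates is shown below. Note the substitution: the text describes using dependency information from table analysis, whereas this sketch only checks that a column is fully populated and unique, which is a necessary condition rather than the full analysis. The function name and sample data are hypothetical.

```python
from typing import Dict, List


def primary_key_candidates(rows: List[Dict[str, object]]) -> List[str]:
    """Return columns whose values are all present and all distinct."""
    if not rows:
        return []
    candidates = []
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        if None not in values and len(set(values)) == len(values):
            candidates.append(column)
    return candidates


customers = [
    {"customer_name": "Acme", "region": "EU", "tier": "gold"},
    {"customer_name": "Globex", "region": "EU", "tier": "silver"},
    {"customer_name": "Initech", "region": "US", "tier": "gold"},
]
keys = primary_key_candidates(customers)
```

In this sample only `customer_name` qualifies, matching the customer information table example in the text.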
  • the module 6400 can be a foreign key module 8600.
  • a foreign key module 8600 may analyze a data structure to identify foreign keys. This information may be useful in, for example, preserving the integrity of relationships between tables, and in locating a primary key table with a data structure.
  • the methods and systems described herein also include providing a module for providing a foreign key for a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a table normalization module 8700.
  • a table normalization module 8700 for a data integration function may transform or split a table to eliminate dependencies and/or remove redundant data and anomalies. Normalization may provide significant performance improvements in a database including faster queries and improved data integrity.
  • the methods and systems described herein also include providing a module for providing a table normalization for a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
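Splitting a table to remove redundant data can be illustrated with a small sketch. The `split_table` helper and the order data are assumptions; the caller states which columns depend only on the key, since discovering that dependency is outside the scope of the example.

```python
from typing import Dict, List, Tuple


def split_table(
    rows: List[dict], key: str, dependent: List[str]
) -> Tuple[List[dict], List[dict]]:
    """Move columns that depend only on `key` into their own table."""
    parent: Dict[object, dict] = {}
    child: List[dict] = []
    for row in rows:
        parent[row[key]] = {column: row[column] for column in dependent}
        child.append({c: v for c, v in row.items() if c not in dependent})
    parent_rows = [{key: k, **columns} for k, columns in parent.items()]
    return parent_rows, child


orders = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Boston"},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Boston"},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Paris"},
]
customers, slim_orders = split_table(orders, "customer_id", ["customer_city"])
```

After the split, the city is stored once per customer instead of once per order, and the orders table keeps `customer_id` as the link back to it.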
  • the module 6400 can be a source-to-target mapping module 8800.
  • a source-to-target mapping module 8800 for a data integration function may create a data transformation mapping for mapping data or metadata from the source system to one or more target data facilities.
  • a mapping facility may map product location data collected by a sensor to a new database combining all information about products.
  • a mapping may be between a supply chain database and an inventory database, or more generally from any source to any target.
  • although mapping typically connotes literal transfer between two locations, the source-to-target mapping module may also specify transformations with a mapping, such as combinations, filters, or other conversions or transformations.
  • the mapping may specify a coincident transformation from minutes to hours or days.
  • the methods and systems described herein also include providing source-to-target mapping for a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
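A source-to-target mapping that carries a transformation with it, such as the minutes-to-hours conversion mentioned above, can be sketched as a table of target field, source field and conversion function. The field names and the `apply_mapping` helper are hypothetical.

```python
from typing import Callable, Dict, Tuple

# target field -> (source field, transformation applied during the mapping)
Mapping = Dict[str, Tuple[str, Callable[[object], object]]]


def apply_mapping(record: dict, mapping: Mapping) -> dict:
    """Build a target record by mapping and transforming source fields."""
    return {
        target: transform(record[source])
        for target, (source, transform) in mapping.items()
    }


mapping: Mapping = {
    "product_id": ("sku", lambda value: value),               # literal transfer
    "transit_hours": ("transit_minutes", lambda m: m / 60),  # minutes to hours
}
source_record = {"sku": "A-100", "transit_minutes": 90}
target_record = apply_mapping(source_record, mapping)
```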
  • the module 6400 can be an automatic data integration job generation module 8900.
  • An automatic data integration job generation module 8900 may automate the creation of a data integration job by generating a data integration job using a profile or specification provided to the module 8900.
  • the data integration job may be provided as another module 6400 that may be registered for subsequent use throughout an enterprise, and the automatic data integration job generation module 8900 may return a specification of where and how to access the newly created job module.
  • an automatic data integration job generation module 8900 may generate a commonly used data integration job from a stored profile for that type of data integration job.
  • the commonly used data integration job may be the integration of customer credit information with information regarding the customer's business. This job may need to be performed for each new customer.
  • the methods and systems described herein also include providing a module for automatically generating a data integration job from a profile for a data integration job, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a defect detection module 9000.
  • a defect detection module 9000 may detect defects in a data facility, process or other operation. For example, a defect detection module 9000 may determine that a data integration process was performed incorrectly resulting in a table with mismatched entries.
  • the methods and systems described herein also include providing a module for defect detection, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a performance measurement module 9100.
  • a performance measurement module 9100 may measure the performance of a data integration process.
  • a performance measurement module 9100 may record the time and processor load for a given data integration operation.
  • the performance measurement module 9100 may also assist with the optimization or modification of data integration processes.
  • the methods and systems described herein also include providing a module for measuring the performance of a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
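Recording the elapsed time and processor time of an operation could be done along these lines. The `measure` wrapper is a hypothetical helper; it uses the standard library timers `time.perf_counter` and `time.process_time`, with CPU time standing in for processor load.

```python
import time
from typing import Callable, Dict, Tuple


def measure(operation: Callable[[], object]) -> Tuple[object, Dict[str, float]]:
    """Run an operation and report its wall-clock and CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = operation()
    metrics = {
        "wall_seconds": time.perf_counter() - wall_start,
        "cpu_seconds": time.process_time() - cpu_start,
    }
    return result, metrics


# A stand-in for a data integration operation.
result, metrics = measure(lambda: sum(range(100_000)))
```

Metrics collected this way could then feed the optimization or modification of a data integration process.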
  • the module 6400 can be a data de-duplication module 9200.
  • De-duplication may be an important preliminary quality enhancement step in an ETL operation, or any other data integration process involving an extraction of data from a database.
  • the methods and systems described herein also include providing a module for data de-duplication, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the de-duplication module matches data items based on a probability.
  • the de-duplication module discards duplicate items.
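De-duplication that matches items on a score and discards the duplicates can be sketched as follows. As an assumption, the similarity ratio from the standard library's `difflib.SequenceMatcher` stands in for a match probability, and the 0.85 threshold is arbitrary; a production matcher would use a properly calibrated probabilistic model.

```python
from difflib import SequenceMatcher
from typing import List


def deduplicate(items: List[str], threshold: float = 0.85) -> List[str]:
    """Keep the first of each group of items whose similarity meets the threshold."""
    kept: List[str] = []
    for item in items:
        is_duplicate = any(
            SequenceMatcher(None, item.lower(), other.lower()).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(item)  # first occurrence wins; later matches are discarded
    return kept


names = ["Acme Corporation", "ACME Corporation", "Acme Corporatoin", "Globex Inc"]
unique = deduplicate(names)
```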
  • the module 6400 can be a statistical analysis module 9300.
  • a statistical analysis module 9300 may perform tests and gather statistics relating to data, metadata or processes and operations.
  • a statistical analysis module 9300 may generate a relationship function describing the relationship between the number of units of a product sold and the age of the customer.
  • a statistical analysis module 9300 may also provide process metrics, such as determining the average time it takes to perform a certain data integration operation with a certain processor configuration. More generally, the statistical analysis module 9300 may perform any statistical analysis on data within a data source, metadata for one or more data sources, or processes operating on data or metadata.
  • the methods and systems described herein also include providing a module for statistical analysis of a plurality of data items, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data reconciliation module 9400.
  • a data reconciliation module may reconcile data and metadata from disparate data facilities 112.
  • a data reconciliation module 9400 may join similar product entries from a company's product databases corresponding to two different geographic regions allowing for the creation of master records.
  • a data reconciliation module 9400 may reconcile multiple instances of an identical or nearly identical record.
  • a customer may have two different records with different addresses. These records may be reconciled, such as by using a creation date or a most recent transaction date, into a single record. Other reconciliations may be useful in a data integration system, such as a reconciliation of database backups or a reconciliation of versions of a metadata model, and may be performed using a data reconciliation module 9400.
  • the methods and systems described herein also include providing a module for reconciling data from a plurality of data facilities, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
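Reconciling several records for the same customer into a single record by most recent transaction date can be sketched as below. The record layout and the `reconcile` helper are assumptions for illustration.

```python
from datetime import date
from typing import Dict, List


def reconcile(records: List[dict], key: str, recency_field: str) -> List[dict]:
    """Keep, for each key value, the record with the latest recency field."""
    latest: Dict[object, dict] = {}
    for record in records:
        current = latest.get(record[key])
        if current is None or record[recency_field] > current[recency_field]:
            latest[record[key]] = record
    return list(latest.values())


records = [
    {"customer_id": 7, "address": "1 Old Rd", "last_transaction": date(2004, 3, 1)},
    {"customer_id": 7, "address": "9 New Ave", "last_transaction": date(2005, 1, 15)},
    {"customer_id": 8, "address": "3 Elm St", "last_transaction": date(2004, 11, 2)},
]
master = reconcile(records, "customer_id", "last_transaction")
```

The same shape works with a creation date as the recency field, which is the other option the text mentions.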
  • the module 6400 can be a transformation function library module 9500.
  • a transformation function library module 9500 may provide access to a library of transformation functions. For example, common transformation functions, such as integration of customer credit and purchasing information, or transformation of data between units (e.g., Celsius to Fahrenheit or quarts to liters), or revision of exchanges for telephone numbers, may be maintained in a library so that a user does not need to create the operation from scratch each time the user wishes to perform the operation. Other more fundamental transformations may also be used, such as character strings to numerical values or vice versa, or change of numerical value types (e.g., byte, word, long word).
  • the methods and systems described herein also include providing a module for accessing a library of transformation functions, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
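  • As one possible sketch, a library of transformation functions may be kept as a mapping from names to callables, so that a stored function is looked up rather than rewritten. The names `TRANSFORMS`, `transform`, and `apply` are assumptions made for illustration, and the quart conversion assumes US liquid quarts.

```python
TRANSFORMS = {}


def transform(name):
    """Decorator that stores a function in the library under `name`."""
    def register(func):
        TRANSFORMS[name] = func
        return func
    return register


@transform("celsius_to_fahrenheit")
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


@transform("quarts_to_liters")
def quarts_to_liters(q):
    return q * 0.946352946  # assumes US liquid quarts


@transform("string_to_number")
def string_to_number(s):
    return float(s)


def apply(name, value):
    """Look up a stored transformation by name and apply it."""
    return TRANSFORMS[name](value)
```

A caller would then write, for example, `apply("celsius_to_fahrenheit", 100)` instead of re-implementing the conversion.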
  • the module 6400 can be a version management module 9600.
  • a version management module 9600 may assist in the management of different data integration jobs stored in a library or may assist in the creation and execution of data integration jobs.
  • a version management module may allow a user to maintain multiple versions of the customer credit and purchasing data integration job described above. It may be the case that customers often have two or three accounts that require integration, so a separate version of the data integration job may be maintained for jobs dealing with two or three transactions.
  • the version management module 9600 may be used to select a version of a metadata model, metabroker, or other repository object, or to query a registry or repository about what versions of these objects exist.
  • the module 9600 may also support version-related functions, such as branching and reconciliation of multiple versions.
  • the methods and systems described herein also include providing a module for managing versions of a data integration job, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a version management module 9700 of a different type.
  • the version management module 9700 of Fig. 50 may control versions of data or metadata used in a data integration process.
  • the module 9600 of Fig. 48 may control versions of tools and processes.
  • the methods and systems described herein also include providing a module for managing versions of a data integration job, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module allows a user to share a version with another user.
  • the module allows a user to check in and check out a version of a data integration job in order to use the data integration job.
  • the module 6400 can be a parallel execution module 9800.
  • a parallel execution module 9800 may allow for the dynamic execution of data integration jobs in parallel.
  • the parallel execution module 9800 may analyze processing and data dependencies of portions of an execution task to generate an appropriate parallel execution order, or may receive explicit parallelism instructions along with the identification of a task for execution.
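  • the methods and systems described herein also include providing a module for parallel execution of a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.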
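  • The analysis of processing dependencies to generate a parallel execution order may, under simple assumptions, be sketched as a level-by-level topological ordering. The function `parallel_stages` and the task names are hypothetical; each task is assumed to list the tasks that must finish before it starts, and tasks grouped in the same stage could be run concurrently.

```python
def parallel_stages(dependencies):
    """Group tasks into stages; tasks within a stage do not depend on each other."""
    # Copy the input and add tasks that only appear as prerequisites.
    pending = {task: set(pre) for task, pre in dependencies.items()}
    for prerequisites in list(pending.values()):
        for task in prerequisites:
            pending.setdefault(task, set())

    stages = []
    while pending:
        # A task is ready once all of its prerequisites have completed.
        ready = sorted(t for t, pre in pending.items() if not pre)
        if not ready:
            raise ValueError("cyclic dependency between tasks")
        stages.append(ready)
        for task in ready:
            del pending[task]
        for prerequisites in pending.values():
            prerequisites.difference_update(ready)
    return stages


deps = {
    "load": {"transform_customers", "transform_orders"},
    "transform_customers": {"extract"},
    "transform_orders": {"extract"},
}
stages = parallel_stages(deps)
```

Here the two transformation tasks fall into the same stage because neither depends on the other.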
  • the module 6400 can be a data partitioning module 9900.
  • a data partitioning module 9900 may break up a source record set into several sub-sets. For example, for a data integration job involving a table, the table may be broken into several sub-tables, each having its own data, index, and so forth, and the data integration job performed on each sub-table simultaneously. This process may result in shorter processing times.
  • the methods and systems described herein also include providing a module for partitioning data, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a partitioning and repartitioning module 10000.
  • a partitioning and repartitioning module 10000 may function as a partitioning module 9900 with the added functionality of being able to recombine the original or transformed subsets.
  • a partitioning and repartitioning module 10000 may join the sub-tables to create a transformed table resembling the source table.
  • the methods and systems described herein also include providing a module for partitioning and repartitioning data, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
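  • A minimal sketch of partitioning a record set, transforming each subset simultaneously, and recombining the results is given below. The helper names (`partition`, `transform_subset`, `repartition`), the round-robin split, and the use of a thread pool are illustrative assumptions only; the description does not prescribe a partitioning scheme.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split rows into n round-robin subsets, remembering original positions."""
    indexed = list(enumerate(rows))
    return [indexed[i::n] for i in range(n)]


def transform_subset(subset):
    """Example transformation: upper-case the `name` field of every row."""
    return [(pos, {**row, "name": row["name"].upper()}) for pos, row in subset]


def repartition(subsets):
    """Recombine transformed subsets into a table in the source row order."""
    merged = [item for subset in subsets for item in subset]
    return [row for _, row in sorted(merged, key=lambda item: item[0])]


rows = [{"id": i, "name": n} for i, n in enumerate(["ann", "bob", "cy", "dee", "eve"])]
with ThreadPoolExecutor(max_workers=3) as pool:
    transformed = list(pool.map(transform_subset, partition(rows, 3)))
result = repartition(transformed)
```

Carrying the original position with each row is one way to let the recombined table resemble the source table.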
  • the module 6400 can be a database interface module 10100.
  • a database interface module 10100 may allow a user to interact with a database and/or perform data integration jobs.
  • a database interface module 10100 may allow a user to view certain entries in a database, such as the sales performance history for a certain employee.
  • the database interface module 10100 may provide atomic user interaction, such as an individual query, read, write, or other transaction.
  • the database interface module 10100 may also, or instead, provide more general database connectivity through which a data integration job or other process may operate continuously on a database.
  • the methods and systems described herein also include providing a database interface module, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the interface module facilitates an interface to databases of a plurality of database vendors.
  • the module 6400 can be a data integration module 10200.
  • a data integration module 10200 may allow for the creation or execution of data integration jobs. For example, a user may create and schedule certain transformation jobs using the data integration module 10200, or investigate what data integration processes are available in modules 6400 using the data integration module 10200.
  • the methods and systems described herein also include providing a module for a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a synchronization module 10300.
  • a data synchronization module 10300 may synchronize data from disparate sources.
  • a data synchronization module 10300 may align similar entries in different databases, perform cross-linking analysis and remove any duplicative or erroneous records.
  • the methods and systems described herein also include providing a module for synchronizing data, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module facilitates synchronization of data across a plurality of hierarchical data formats.
  • the module facilitates synchronization of data across a plurality of transactional formats. In embodiments the module facilitates synchronization of data across a plurality of operating environments. In embodiments the module facilitates synchronization of Electronic Data Interchange format data. In embodiments the module facilitates synchronization of HIPAA data. In embodiments the module facilitates synchronization of SWIFT format data.
  • the module 6400 can be a metadata directory supply module 10400.
  • a metadata directory supply module 10400 may serve as a glossary or definitional database that provides insight into the types of information recorded by an enterprise. For example, a user in the sales department can access a metadata directory using the metadata directory supply module 10400 to learn about the types of data recorded by the production department. The user may learn that the production department defines units in lots, while the sales department defines units in hundred-lots. As a result, the user can adjust her supply forecasts accordingly.
  • the methods and systems described herein also include providing a module for supplying a metadata directory, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a graphical depiction module 10500.
  • a graphical depiction module 10500 may depict in graphical format the effects of a modification to a data integration job.
  • a graphical depiction module 10500 may show a user the larger table that may result if the data normalization step is skipped in a data integration process.
  • the graphical depiction module 10500 may be particularly useful, for example, to support a strongly separated user interface for interacting with a data integration system.
  • the methods and systems described herein also include providing a module for graphical depiction of the impact of a change to a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a metabroker module 10600.
  • a metabroker module 10600 may provide metadata concerning metabrokers registered in a system.
  • the metabroker module 10600 may permit queries over available metabrokers to assist in a manual or automated selection of metabrokers for design of a data integration process.
  • the methods and systems described herein also include providing a module for creating a metabroker, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a metadata hub repository module 10700.
  • a metadata hub repository module 10700 may allow for the transient storage of metadata so that operations may be performed on the metadata.
  • the metadata hub repository module 10700 may allow metadata to occupy a hub in such a way as to allow a metabroker to convert the metadata to an SAP compatible format.
  • the methods and systems described herein also include providing a module for a hub repository of metadata, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the hub stores semantic models for a plurality of data integration platforms.
  • the module 6400 can be a packaged application connectivity kit (PACK) module 10800.
  • a PACK module 10800 may allow for the interchange of data and metadata between disparate applications.
  • a PACK module 10800 may allow data and metadata generated and/or stored using Informatica PowerCenter to be accessed and used by SAP BW.
  • a PACK may enable connectivity to or between any database, application, or enterprise running on any operating system and/or hardware.
  • the PACK module 10800 may be particularly useful, for example, when integrating legacy data systems into an enterprise, or when integrating data across previously separated divisions of a business that use different database management technologies.
  • the methods and systems described herein also include providing a PACK, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412 for the PACK, and identifying the PACK in the registry, wherein the PACK can be accessed as a service in a services oriented architecture.
  • the module 6400 can be an industry-specific data model storage module 10900.
  • An industry-specific data model storage module 10900 may allow for the storage of industry-specific data models. For example, companies in the trucking industry may record certain characteristics about shipments.
  • An industry-specific data model storage module 10900 may allow for the storage of a template that can be used by trucking companies.
  • Certain industries employ widely adopted or legally required standards for data storage and communication. For example, HIPAA mandates certain transaction types and privacy standards that must be used by health care providers. SWIFT is commonly used for transactions in financial industries. These and other similar standards may be managed and deployed within a data integration system using the industry-specific data model storage module 10900.
  • the methods and systems described herein also include providing a module for storing an industry-specific data model, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the model may be a manufacturing industry model, a retail industry model, a telecommunications industry model, a healthcare industry model, a financial services industry model or a model from any other industry.
  • the module 6400 can be a template module 11000.
  • a template module 11000 may allow a user to build and store templates for certain types of data integration jobs.
  • a template may combine tasks and functions of other modules 6400 described herein, or any other tasks and functions suitable for a data integration system, to capture a particular design solution for use, reuse, and refinement.
  • a user may build and store a template that integrates customer credit and order information. The user may make this template available to other users through the transformation function library module 9500.
  • the methods and systems described herein also include providing a template for building a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412 for the template, and identifying the template in the registry, wherein the template can be accessed as a service in a services oriented architecture. Referring to Fig.
  • the module 6400 can be a business rule creation module 11100.
  • a business rule creation module 11100 may provide any business rule or business logic capable of formal expression, and may include comparisons, conditional evaluations, mathematical evaluations, statistical analyses, Boolean operations, and any other operations that may be performed in the context of providing a business rule. For example, a company may require a minimum credit score before issuing credit to a customer, and this may be formalized as a business rule. A company may have predetermined programs for salaries and pensions that may be applied to payroll calculations in a human resources department, or a company may maintain different hiring criteria for different departments, or a company may be required to report sales to a local government agency. The scope and complexity of possible business rules is unlimited.
  • any such rule that can be programmatically expressed may be created using the business rule creation module 11100 and subsequently applied in data integration processes.
  • the methods and systems described herein also include providing a module for creating a business rule, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
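  • For illustration only, a business rule capable of formal expression may be represented as a named predicate over a record. The class `BusinessRule`, the function `violations`, and the threshold value of 650 are hypothetical and are not drawn from this description.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BusinessRule:
    name: str
    predicate: Callable[[dict], bool]

    def evaluate(self, record: dict) -> bool:
        return self.predicate(record)


MIN_CREDIT_SCORE = 650  # hypothetical threshold chosen for the example

min_credit = BusinessRule(
    "minimum_credit_score",
    lambda record: record["credit_score"] >= MIN_CREDIT_SCORE,
)


def violations(rules, record):
    """Return the names of all rules that the record fails."""
    return [rule.name for rule in rules if not rule.evaluate(record)]
```

A data integration process could call `violations` on each incoming record and route failing records for review.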
  • the module 6400 can be a validation table creation module 11200.
  • a validation table creation module 11200 may allow for the creation of a validation table for other data integration functions.
  • the methods and systems described herein also include providing a module for creating a validation table, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data integration module 11300. It will be noted that a data integration module 10200 has been described in reference to Fig. 66.
  • the module 11300 described here relates instead to a module that executes a specific data integration job, task, or function.
  • a data integration job created with the data integration module 10200 may be executed as a prepackaged job in the data integration module 11300 described here.
  • the data integration module 11300 may perform any data integration job, task, or process.
  • the data integration module 10200 may also be associated with a control in a graphical user interface labeled to indicate the nature of the data integration function. In this manner, a strongly separated user interface may have access to any user-defined data integration function through a button, drop-down menu item, or other control, which may be conveniently labeled for user identification.
  • the methods and systems described herein also include providing a module for a data integration function, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a business metric creation module 11400.
  • a business metric creation module 11400 may allow for the creation of certain business metrics to be associated with a business or subset of a business.
  • the business may be a consumer products business and the business metric creation module 11400 may help to create a metric measuring increased sales per dollar of advertising.
  • the business metric creation module 11400 may also collect the necessary data for computation of the metrics or work with other modules and systems to this end.
  • the module 11400 may enable creation of a metric using any mathematical, logical, conditional, or other function, or combinations thereof.
  • the methods and systems described herein also include providing a module for creating a business metric, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
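  • The metric of increased sales per dollar of advertising may, under one simple reading, be computed as the change in sales divided by the advertising spend. The function name and its parameters are assumptions made for this sketch.

```python
def sales_lift_per_ad_dollar(baseline_sales, campaign_sales, ad_spend):
    """Increase in sales attributed to a campaign, per dollar of advertising."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return (campaign_sales - baseline_sales) / ad_spend


# Sales rose from 100,000 to 130,000 on 10,000 of advertising spend.
lift = sales_lift_per_ad_dollar(100_000, 130_000, 10_000)
```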
  • the module 6400 can be a target database definition module 11500.
  • a target database definition module 11500 may assist in the definition of a target database, including the type and structure of the database.
  • the target database definition module 11500 may receive recommendations from profiling and auditing modules, and prepare a database definition for a target database suitable for a particular data source and transformation.
  • the module 11500 may allow for interactive control at various decision points, or may function deterministically without user intervention.
  • the methods and systems described herein also include providing a module for defining a target database, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a mainframe data profiling module 11600.
  • a mainframe data profiling module 11600 may allow for the profiling of mainframe data.
  • a computer mainframe may have particular data formats, connectivity requirements, security layers, and so on.
  • the mainframe data profiling module 11600 may be designed to address all of these issues for a particular mainframe or type of mainframe to accelerate design of data integration systems using such a mainframe.
  • the methods and systems described herein also include providing a module for profiling mainframe data, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a batch processing module 11700.
  • a batch processing module 11700 may allow for the processing of data integration jobs in batch. For example, with certain processor configurations it may be desirable to process transactions in batch. As another example, it may be desirable to concentrate processing away from peak computer-use times, such as from 1:00 a.m. to 3:00 a.m. Batch processing may facilitate the execution of large data integration jobs and processes at user-programmable times, or on user-selectable machines. The batch processing module 11700 may facilitate processing in this manner, or any other controlled manner.
  • the methods and systems described herein also include providing a module for batch processing a batch of data, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a cross-table analysis module 11800.
  • a cross-table analysis module 11800 may allow for the analysis of relationships and linkage between tables, which may yield significant benefits in the construction of target databases.
  • a cross-table analysis module 11800 may allow a user to determine the degree of relatedness between two customer data tables. Based on this information a user may decide to integrate the information in the tables.
  • the methods and systems described herein also include providing a module for cross-table analysis, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a relationship analysis module 11900.
  • a relationship analysis module 11900 may analyze the relationship between any two or more rows, columns, tables, databases, or combinations of these and other data source items. For example, a relationship analysis module 11900 may determine the relationship between a column and a table. This information may be used to validate other data in the database, or identify keys or other structural information for a database that has not yet been fully characterized.
  • the methods and systems described herein also include providing a module for relationship analysis, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
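  • Two simple relationship checks of the kind that may help identify keys in a database that has not yet been fully characterized are sketched below: whether a column is a candidate key, and what fraction of one column's values also appear in another. The function names and sample tables are hypothetical.

```python
def is_candidate_key(rows, column):
    """True if the column has no missing values and no duplicates."""
    values = [row[column] for row in rows]
    return None not in values and len(set(values)) == len(values)


def inclusion_ratio(child_rows, child_col, parent_rows, parent_col):
    """Fraction of child values that also occur in the parent column."""
    parent_values = {row[parent_col] for row in parent_rows}
    child_values = [row[child_col] for row in child_rows]
    if not child_values:
        return 0.0
    return sum(v in parent_values for v in child_values) / len(child_values)


customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"order": 10, "cust": 1},
    {"order": 11, "cust": 1},
    {"order": 12, "cust": 3},
    {"order": 13, "cust": 9},  # no matching customer
]
ratio = inclusion_ratio(orders, "cust", customers, "id")
```

A ratio close to 1.0 suggests a foreign-key relationship, while values that fail the check point to records needing validation.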
  • the module 6400 can be a data definition language code generation module 12000.
  • a data definition language (DDL) code generation module 12000 may generate DDL code for a database, either to create a new target database, or modify a source or target database.
  • the data definition language code generation module 12000 may generate DDL code in response to other structural database descriptions provided to the module, or as a parameter accompanying some other data integration process.
  • DDL code may be applied directly to a database, such as an SQL database, to effect structural changes therein.
  • the methods and systems described herein also include providing a module for DDL code, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the methods and systems may further include using the module to create a mapping between source and target data facilities.
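  • A minimal sketch of generating DDL code from a structural description and applying it directly to an SQL database follows. The function `create_table_ddl` and the table definition are hypothetical, an in-memory SQLite database stands in for the target database, and table and column names are assumed to come from a trusted source since they are not escaped.

```python
import sqlite3


def create_table_ddl(table, columns, primary_key=None):
    """Build a CREATE TABLE statement from (name, type) column pairs."""
    parts = [f"{name} {sql_type}" for name, sql_type in columns]
    if primary_key:
        parts.append(f"PRIMARY KEY ({primary_key})")
    body = ",\n  ".join(parts)
    return f"CREATE TABLE {table} (\n  {body}\n);"


ddl = create_table_ddl(
    "customer",
    [("id", "INTEGER"), ("name", "TEXT"), ("credit_score", "INTEGER")],
    primary_key="id",
)

# Apply the generated DDL to an in-memory database and list the tables.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
conn.close()
```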
  • the module 6400 can be a design interface module 12100.
  • a design interface module 12100 may provide a user interface for the creation and design of data integration jobs.
  • a design interface module 12100 may include a graphical user interface.
  • the design interface module 12100 may be strongly separated, providing only the low-level controls and layout for an interface, while being associated with other modules 6400 or code that performs functions within a data integration system.
  • a design interface module 12100 may allow a user to link various operations on a screen to create a data integration job.
  • the design interface module 12100 may provide only functional access to a design, such as a metadata model or data integration job, by providing suitable programmatic control over storage, retrieval, and modification of the design.
  • the design interface module 12100 may in turn connect the programmatic control to a client such as a program or a graphical user interface.
  • the methods and systems described herein also include providing a design interface module for designing a
  • the module 6400 can be a data integration job development module 12200.
  • a data integration job development module 12200 may allow for the development of a data integration job.
  • a user may use the data integration job development module 12200 to build upon pre-existing data integration jobs.
  • the data integration job development module 12200 may provide functional support for development features of a strongly separated graphical user interface.
  • the methods and systems described herein also include providing a module for developing a data integration job, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • the module 6400 can be a data integration job deployment module 12300.
  • a data integration job deployment module 12300 may facilitate the deployment of data integration jobs, and address any implementation issues arising at run time.
  • the data integration job deployment module 12300 may deploy data integration jobs on a scheduled basis, or under control of a client of the module 12300.
  • the module 12300 may also suggest the scheduling of additional data integration jobs.
  • the data integration job deployment module 12300 may deploy multiple data integration jobs simultaneously across disparate data facilities 112.
  • the methods and systems described herein also include providing a module for deploying a data integration job, providing a registry of services, providing one or more client interfaces 6410, service policies and/or interceptors 6412, and identifying the module in the registry, wherein the module can be accessed as a service in a services oriented architecture.
  • modules, facilities, tools, jobs, services, processes and functions described herein may be accessed through various input and output facilities, including bindings and similar facilities, such as EJBs, JMS, web services, SOAP and other bindings.
  • the methods and systems described herein may include a client-side facility for optimizing access of a module, facility, job, service, process, function or the like by a client device.
  • the methods and systems described herein may include a server-side facility for optimizing access of a module, facility, job, service, process, function or the like by a client device.
  • the services in a services oriented architecture for a data integration platform or process may be services that are useful for a wide range of integration and computing tasks, including modules that perform functions that are required or beneficial for many common tasks.
  • a logging service 12400 may be deployed, such as for logging events.
  • a user who wishes to log events (for any reason related to any task, such as in connection with a data integration job or task) may invoke the logging service by accessing it through a services registry in a services oriented architecture.
  • a programmer need not create a new logging service for logging events, but instead may invoke a pre-coded logging service through the services registry.
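  • The invocation of a pre-coded logging service through a services registry may be sketched as follows. The classes `ServiceRegistry` and `LoggingService`, the interceptor signature, and the service name `"logging"` are assumptions made for illustration and do not describe the actual interfaces 6410 or interceptors 6412.

```python
class ServiceRegistry:
    """Maps service names to callables and runs interceptors on each call."""

    def __init__(self):
        self._services = {}
        self._interceptors = []

    def register(self, name, service):
        self._services[name] = service

    def add_interceptor(self, interceptor):
        self._interceptors.append(interceptor)

    def invoke(self, name, *args, **kwargs):
        service = self._services[name]  # KeyError if the service is unknown
        for interceptor in self._interceptors:
            interceptor(name, args, kwargs)
        return service(*args, **kwargs)


class LoggingService:
    """Pre-coded logging service that keeps events in memory."""

    def __init__(self):
        self.events = []

    def __call__(self, source, message):
        self.events.append((source, message))
        return len(self.events)


registry = ServiceRegistry()
log = LoggingService()
registry.register("logging", log)

calls = []  # an interceptor that records which services were invoked
registry.add_interceptor(lambda name, args, kwargs: calls.append(name))

registry.invoke("logging", "job-42", "started")
```

Any job can reuse the same registered service by name, which is the point of not writing a new logging facility per job.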
  • a monitoring service 12500 may be deployed as a service in a services oriented architecture.
  • the monitoring service 12500 may be invoked by a user to monitor some aspect of the performance of a data integration job or task, or to monitor an event or process.
  • a monitoring service 12500 may allow for the generation of specific events and metrics, such as counters, averages and sums, for monitoring purposes.
  • a data integration system may have a service called a job execution service, the purpose of which is to run a job, such as a batch job.
  • a monitoring service 12500 a user can monitor how many times
  • each common service such as the monitoring service 12500 and the other services described in connection with Figs.
  • the monitoring service 12500 can be used by services in a services oriented architecture to monitor what the services do or may be used to conduct domain-specific monitoring for other events and conditions.
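  • A monitoring service that generates counters, sums, and averages may be sketched as shown below. The class `MonitoringService` and the metric name `job_seconds` are hypothetical.

```python
from collections import defaultdict


class MonitoringService:
    """Keeps a counter and a running sum for each named metric."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._sums = defaultdict(float)

    def record(self, metric, value=1.0):
        self._counts[metric] += 1
        self._sums[metric] += value

    def count(self, metric):
        return self._counts.get(metric, 0)

    def total(self, metric):
        return self._sums.get(metric, 0.0)

    def average(self, metric):
        n = self.count(metric)
        return self.total(metric) / n if n else 0.0


monitor = MonitoringService()
monitor.record("job_seconds", 2.0)  # first run of a job took 2 seconds
monitor.record("job_seconds", 4.0)  # second run took 4 seconds
```

A job execution service could call `record` after each run so that a user can later query how often and how long the job ran.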
  • a security module 12600 or service may be deployed as a service in a services oriented architecture for providing a security capability, such as in connection with a data integration job or task.
  • a security facility such as password protection, encryption, tracking access, restricting access, or the like
  • the user can invoke a security module 12600 as a service in a services oriented architecture, so that the user does not have to create a separate security facility for each data integration job or task.
  • a licensing module 12700 may be deployed in a services oriented architecture, for enabling licensing functions when invoked by a user. For example, a job designer may cause a data integration job to invoke the licensing service to determine whether a particular task to be executed at runtime does or does not comply with license restrictions, such as license restrictions related to the number of machines, number of users, or the like. The user avoids the need to prepare separate licensing code for each data integration job or task the user creates.
  • a licensing module may be used in connection with an installation and/or provisioning service.
  • an event management module 12800 may be deployed in a services oriented architecture for tracking and managing events when invoked by a user through a services registry.
  • the user may access the event management module 12800 for any event management required for a data integration job or task, such as tracking events in order to determine when to execute a process or function.
  • An event management module 12800 may allow for event subscription by application and may incorporate a callback mechanism.
  • a provisioning module 12900 may be deployed in a services oriented architecture, allowing a user to enable provisioning functions by accessing the provisioning module 12900 through a services registry.
  • a provisioning module 12900 may allow for the provision of components to multiple machines, may maintain a history of the components and versions installed on different machines, may push or distribute software or patches, may trigger the installation of a security service, may assist with or allow for authorization and/or authentication, may maintain internal and external user directories and may assist with or allow for single sign-on functionality.
  • a transaction module 13000 may be deployed in a services oriented architecture that allows a user to access the transaction module 13000 through a services registry, avoiding the need to create separate transaction management code for each application created by the user, such as for a data integration job or task.
  • an auditing module 13100 can be deployed in a services oriented architecture that allows a user to access the auditing module 13100 through a services registry, avoiding the need to create separate auditing code for each application created by the user, such as for a data integration job or task.
  • the user can audit events, such as auditing what users have accessed a particular database or process, what events have taken place, and the like.
  • An auditing module 13100 can allow a user to conveniently audit past events without having to generate separate code.
  • AOP oriented architecture
  • various metadata functions and modules can be implemented as services with AOP.
  • bindings for services such as EJBs (such as EJB 3.0) may use AOP.
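
The common services listed above (a monitoring service that keeps counters, sums and averages, and an event management module with subscription and callbacks, both reached through a services registry) can be illustrated with a minimal Python sketch. Every class, method and metric name below is hypothetical and chosen only for illustration; none of it is taken from the patent or from any real product API.

```python
from collections import defaultdict


class MonitoringService:
    """Hypothetical monitoring service: named counters plus sampled metrics."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.samples = defaultdict(list)

    def increment(self, name):
        self.counters[name] += 1

    def record(self, name, value):
        self.samples[name].append(value)

    def total(self, name):
        return sum(self.samples[name])

    def average(self, name):
        values = self.samples[name]
        return sum(values) / len(values) if values else 0.0


class EventManagementService:
    """Hypothetical event management module with subscription and callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        # Invoke every callback registered for this event type.
        for callback in list(self._subscribers[event_type]):
            callback(payload)


class ServicesRegistry:
    """Hypothetical registry through which jobs look up shared services."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def lookup(self, name):
        return self._services[name]


def run_batch_job(registry, job_name, rows_processed):
    """Stand-in for a job execution service that reuses the common services
    instead of carrying its own monitoring or event code."""
    monitoring = registry.lookup("monitoring")
    events = registry.lookup("events")
    monitoring.increment(f"{job_name}.runs")
    monitoring.record(f"{job_name}.rows", rows_processed)
    events.publish("job_finished", {"job": job_name, "rows": rows_processed})


registry = ServicesRegistry()
registry.register("monitoring", MonitoringService())
registry.register("events", EventManagementService())

finished = []
registry.lookup("events").subscribe("job_finished", finished.append)

run_batch_job(registry, "nightly_load", rows_processed=100)
run_batch_job(registry, "nightly_load", rows_processed=300)

monitoring = registry.lookup("monitoring")
print(monitoring.counters["nightly_load.runs"])  # 2
print(monitoring.average("nightly_load.rows"))   # 200.0
```

In this sketch the job function contains no monitoring or event code of its own. It only looks the shared services up by name, which is the reuse the bullets describe for the security, licensing, transaction and auditing modules as well.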

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to methods and systems for deploying services, such as product services, real-time services and common services, in a services oriented architecture, including for enterprise data integration functions.
EP05792780A 2004-08-31 2005-08-31 Services oriented architecture for data integration services Withdrawn EP1810131A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60637004P 2004-08-31 2004-08-31
PCT/US2005/030953 WO2006026659A2 (fr) 2004-08-31 2005-08-31 Services oriented architecture for data integration services

Publications (2)

Publication Number Publication Date
EP1810131A2 (fr) 2007-07-25
EP1810131A4 (fr) 2011-05-11

Family

ID=36000707

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05792780A Withdrawn EP1810131A4 (fr) 2004-08-31 2005-08-31 Services oriented architecture for data integration services

Country Status (3)

Country Link
EP (1) EP1810131A4 (fr)
CN (1) CN101048732A (fr)
WO (1) WO2006026659A2 (fr)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8069439B2 (en) * 2006-03-30 2011-11-29 Microsoft Corporation Framework for modeling continuations in workflows
EP2393051A1 (fr) * 2010-06-01 2011-12-07 Alcatel Lucent System for helping a user decide whether or not to accept an offered service or application that involves the communication of certain personal data
EP2645244B1 (fr) * 2012-03-27 2019-09-11 Software AG Method and registry for applying design-time rules at runtime in a service-oriented architecture
US9443229B2 (en) 2013-03-15 2016-09-13 Elemica, Inc. Supply chain message management and shipment constraint optimization
US8904528B2 (en) 2013-03-15 2014-12-02 Elemica, Inc. Method and apparatus for translation of business messages
US9224135B2 (en) 2013-03-15 2015-12-29 Elemica, Inc. Method and apparatus for adaptive configuration for translation of business messages
GB2514136A (en) * 2013-05-14 2014-11-19 Aims Innovation As Integration platform monitoring
GB2522832A (en) * 2013-10-10 2015-08-12 Ibm A method and a system for loading data with complex relationships
US10545917B2 (en) 2014-02-19 2020-01-28 Snowflake Inc. Multi-range and runtime pruning
US9842152B2 (en) 2014-02-19 2017-12-12 Snowflake Computing, Inc. Transparent discovery of semi-structured data schema
CN105094984A (zh) * 2014-11-25 2015-11-25 航天恒星科技有限公司 Method and system for resource scheduling
CN105354238A (zh) * 2015-10-10 2016-02-24 成都博元时代软件有限公司 Distributed big data mining method
CN106027534A (zh) * 2016-05-26 2016-10-12 浪潮(苏州)金融技术服务有限公司 Financial message processing system implemented on Netty
US20170345030A1 (en) * 2016-05-31 2017-11-30 b8ta, inc. Flash retailing
US10437780B2 (en) 2016-07-14 2019-10-08 Snowflake Inc. Data pruning based on metadata
US10261767B2 (en) * 2016-09-15 2019-04-16 Talend, Inc. Data integration job conversion
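CN107122476A (zh) * 2017-05-02 2017-09-01 山东浪潮通软信息科技有限公司 Method and device for processing public data in a network isolation mode
CN108052574A (zh) * 2017-12-08 2018-05-18 南京中新赛克科技有限责任公司 Kafka-based ETL system for importing massive data from an FTP server, and implementation method
CN108363609B (zh) * 2018-02-07 2021-11-30 腾讯科技(深圳)有限公司 Method, device and storage medium for simulating a sensor
CN109656979B (zh) * 2018-12-24 2021-05-04 北京小米移动软件有限公司 Data statistical analysis method, device and storage medium
CN110795422B (zh) * 2019-09-12 2020-10-27 三盟科技股份有限公司 Data service management method and system
CN112905167B (zh) * 2021-03-11 2023-10-27 北京字节跳动网络技术有限公司 Application operation method, device and electronic equipment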

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004054295A1 (fr) * 2002-12-10 2004-06-24 International Business Machines Corporation Dynamic service binding providing transparent switching of information services having defined coverage regions
US6763353B2 (en) * 1998-12-07 2004-07-13 Vitria Technology, Inc. Real time business process analysis method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7343428B2 (en) * 2001-09-19 2008-03-11 International Business Machines Corporation Dynamic, real-time integration of software resources through services of a content framework

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6763353B2 (en) * 1998-12-07 2004-07-13 Vitria Technology, Inc. Real time business process analysis method and apparatus
WO2004054295A1 (fr) * 2002-12-10 2004-06-24 International Business Machines Corporation Dynamic service binding providing transparent switching of information services having defined coverage regions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO2006026659A2 *

Also Published As

Publication number Publication date
EP1810131A4 (fr) 2011-05-11
CN101048732A (zh) 2007-10-03
WO2006026659A3 (fr) 2006-07-06
WO2006026659A2 (fr) 2006-03-09

Similar Documents

Publication Publication Date Title
EP1810131A2 (fr) Services oriented architecture for data integration services
US8060553B2 (en) Service oriented architecture for a transformation function in a data integration platform
US7814470B2 (en) Multiple service bindings for a real time data integration service
US7814142B2 (en) User interface service for a services oriented architecture in a data integration platform
US8041760B2 (en) Service oriented architecture for a loading function in a data integration platform
US20220253298A1 (en) Systems and methods for transformation of reporting schema
US7761406B2 (en) Regenerating data integration functions for transfer from a data integration platform
US20050262193A1 (en) Logging service for a services oriented architecture in a data integration platform
US20050223109A1 (en) Data integration through a services oriented architecture
US20050240592A1 (en) Real time data integration for supply chain management
US20050228808A1 (en) Real time data integration services for health care information data integration
US20060069717A1 (en) Security service for a services oriented architecture in a data integration platform
US20050232046A1 (en) Location-based real time data integration services
US20050235274A1 (en) Real time data integration for inventory management
US20050262189A1 (en) Server-side application programming interface for a real time data integration service
US20050262190A1 (en) Client side interface for real time data integration jobs
US20050222931A1 (en) Real time data integration services for financial information data integration
US20050234969A1 (en) Services oriented architecture for handling metadata in a data integration platform
US20050240354A1 (en) Service oriented architecture for an extract function in a data integration platform
US20060010195A1 (en) Service oriented architecture for a message broker in a data integration platform
US7313575B2 (en) Data services handler
US8489474B2 (en) Systems and/or methods for managing transformations in enterprise application integration and/or business processing management environments
US8954375B2 (en) Method and system for developing data integration applications with reusable semantic types to represent and process application data
US20050251533A1 (en) Migrating data integration processes through use of externalized metadata representations
US20050243604A1 (en) Migrating integration processes among data integration platforms

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20070322

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION

A4 Supplementary search report drawn up and despatched

Effective date: 20110411

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20110706