US6850947B1 - Method and apparatus with data partitioning and parallel processing for transporting data for data warehousing applications - Google Patents

Publication number
US6850947B1
Authority
US
Grant status
Grant
Legal status
Active, expires
Application number
US09637335
Inventor
Dwayne Chung
Gautam H. Mudunuri
Fayyaz Younas
Lillian S. Lim
Renjie Tang
Steve Carlin
Subramanya Madapura
Current Assignee
INFORMATICA LLC
Original Assignee
INFORMATICA CORP

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30557 Details of integrating or interfacing systems involving at least one database management system
    • G06F17/30563 Details for extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation

Abstract

A method and apparatus for transporting data for a data warehouse application is described. The data from an operational data store (the source database) is organized in non-overlapping data partitions. Separate execution threads read the data from the operational data store concurrently. This is followed by concurrent transformation of the data in multiple execution threads. Finally, the data is loaded into the target data warehouse concurrently using multiple execution threads. By using multiple execution threads, data contention is reduced. Thereby, the apparatus and method of the present invention achieve increased throughput.

Description

FIELD OF THE INVENTION

The present invention relates to database systems. More particularly, the present invention pertains to an apparatus and method for transporting data for a data warehousing application that increases throughput.

BACKGROUND OF THE INVENTION

Due to the increased amounts of data being stored and processed today, operational databases are constructed, categorized, and formatted in a manner conducive to maximum throughput, access time, and storage capacity. Unfortunately, the raw data found in these operational databases often exists as rows and columns of numbers and codes that appear bewildering and incomprehensible to business analysts and decision makers. Furthermore, the scope and vastness of the raw data stored in modern databases make it harder to analyze. Hence, applications were developed in an effort to help interpret, analyze, and compile the data so that a business analyst may readily and easily understand it. This is accomplished by mapping, sorting, and summarizing the raw data before it is presented for display. Thereby, individuals can now interpret the data and make key decisions based thereon.

Extracting raw data from one or more operational databases and transforming it into useful information is the function of data "warehouses" and data "marts." In data warehouses and data marts, the data is structured to satisfy decision support roles rather than operational needs. Before the data is loaded into the target data warehouse or data mart, the corresponding source data from an operational database is filtered to remove extraneous and erroneous records; cryptic and conflicting codes are resolved; raw data is translated into something more meaningful; and summary data that is useful for decision support, trend analysis or other end-user needs is pre-calculated. In the end, the data warehouse consists of an analytical database containing data useful for decision support. A data mart is similar to a data warehouse, except that it contains a subset of corporate data for a single aspect of business, such as finance, sales, inventory, or human resources. With data warehouses and data marts, useful information is retained at the disposal of the decision-makers.

One major difficulty associated with implementing data warehouses and data marts is that a significant amount of processing time is required for performing data transport operations. Because transport processes (data extraction, transformation, and loading) consume a significant amount of system resources, the performance of the operational databases is seriously compromised unless transport processes are scheduled to occur during specific time windows in which the operational databases are processing the minimum amount of transactional data. In recent data warehouse implementations, because the process of data transport slows down the operational databases, some organizations leave a small nightly window for the data transport process, such as from one to two in the morning.

Because of increasing demands for after-hours database usage and expanded operational hours, there is a need to further increase the throughput of the data transport process in order to assure that the data transport operation does not interfere with the operation of the operational database. Furthermore, in keeping with the proliferation of data mining software applications that capture the rich data patterns hidden inside the data warehouses, some organizations might even require hourly refreshes. Thus, the approaches for non-invasive data transport now focus on increasing the throughput of data transporting process, whereby the whole data transport process can be completed within the narrow time windows allowed. In other words, the pursuit of optimizing throughput (i.e., speed) has begun.

To improve throughput, recent data warehouse application programs that perform data transport functions have relied on the use of multiple fast microprocessors. However, these recent data warehouse application programs use a single pipeline that includes multiple dependent process threads for performing extraction, transformation and loading operations. The use of multiple processors gives significantly improved processing speed and a corresponding increase in throughput. However, these prior art applications do not fully take advantage of the capabilities of the multiple processor environment. For example, delays in read operations slow down the entire process. Furthermore, because of the interdependencies between process threads within the single pipeline, delays affecting one microprocessor are propagated to all of the other processors, resulting in further delays. Thus, in spite of the use of increasingly powerful computers and the use of multiple microprocessors, data transport operations still consume an excessive amount of processing resources and processing time.

What is needed is a method and apparatus for transporting data for data warehousing applications that increases throughput. In addition, a method and apparatus is required that meets the above need and that takes full advantage of the use of a multiple processor environment. The present invention provides a method and apparatus that meets the above needs.

SUMMARY OF THE INVENTION

The present invention includes a method and apparatus for transporting data for a data warehousing application. More particularly, the present invention introduces a data transport process architecture that uses multiple partitions of source data and multiple pipelines for achieving improved throughput for extraction, transformation, and loading in a data warehousing application.

Source databases are operational databases from which raw data is to be extracted. In one embodiment of the present invention, a transformation server that is coupled to a source database organizes data from a source database into multiple non-overlapping partitions of source data. In the present embodiment, the partitions are user-defined.

Multiple pipelines are then constructed that include transformation components for manipulating the partitioned data. In the present embodiment, transformation components are constructed and are coupled together to form multiple pipelines. The pipeline structure is organized to minimize data sharing between pipelines and to provide a maximum amount of parallel processing.

Target databases are data warehouses and/or data marts into which transformed data is loaded. In the present embodiment, one or more target databases are specified for storing the data generated by the pipelines.

Data transport operations extract data from the source database, transform the data, and load the transformed data into the target database. The terms "data transport" and "data transportation" as used herein include data extraction, transformation, and loading. In the present embodiment, tasks are executed in parallel through the pipelines to extract, transform, and load data.

In the present embodiment, execution threads or processes read the data from the operational data store concurrently, followed by concurrent transformation of the data in multiple execution threads, and concurrent loading of data into the target data warehouse using multiple execution threads or processes.

The use of data partitions gives non-overlapping, independent sets of data from which either a single or multiple pipeline(s) can process data. This allows for definition of pipelines to minimize data sharing between pipelines. By minimizing data sharing between pipelines, independent processing of data is possible, preventing delays due to dependent operations such as, for example, the concurrent read operations of prior art systems.

Each pipeline of transformations constitutes an independent unit for which the transformation server can dedicate one or more threads or processes. Thereby, a computer having multiple microprocessors can realize its full potential of parallelism in optimizing extraction, transformation, and loading throughput. Furthermore, because the data partitioning structure is user-defined, the user can customize the extent of parallelism desired, thus taking full advantage of the parallel processing capabilities of the underlying computer hardware system.

Thereby, the method and apparatus of the present invention provide increased throughput for data transport operations for data warehouse applications. In addition, the method and apparatus of the present invention take full advantage of the multiple processor environment of a parallel hardware platform, thus optimizing throughput.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.

FIG. 1 illustrates an exemplary computer system used as part of a data warehousing system in accordance with one embodiment of the present invention.

FIG. 2 illustrates an exemplary architecture that includes a transformation engine server in accordance with one embodiment of the present invention.

FIG. 3 illustrates the process flow of a data extraction, transformation, and loading process in accordance with one embodiment of the present invention.

FIG. 4 illustrates a method for transporting data in a data warehousing application in accordance with one embodiment of the present invention.

FIG. 5 illustrates an exemplary structure that includes multiple data partitions and multiple independent pipelines of transformation components for transport of data from a source database to a target database in accordance with one embodiment of the present invention.

FIG. 6 shows an exemplary pipeline having two transformation components in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION

An apparatus and method for transporting data for a data warehousing application is described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be obvious, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the present invention.

Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, etc., is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “designating”, “partitioning”, “constructing”, “specifying”, “receiving” or the like, can refer to the actions and processes of a computer system, or similar electronic computing device. The computer system or similar electronic computing device manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.

With reference to FIG. 1, portions of the present invention are comprised of the computer-readable and computer-executable instructions which reside, for example, in computer system 10 used as a part of a data warehousing system in accordance with one embodiment of the present invention. It is appreciated that system 10 of FIG. 1 is exemplary only and that the present invention can operate within a number of different computer systems including general-purpose computer systems, embedded computer systems, and stand-alone computer systems specially adapted for data warehousing applications. Computer system 10 includes an address/data bus 12 for conveying digital information between the various components, a central processing unit (CPU) 14 for processing the digital information and instructions, a main memory 16 comprised of volatile random access memory (RAM) for storing the digital information and instructions, and a non-volatile read only memory (ROM) 18 for storing information and instructions of a more permanent nature. In addition, computer system 10 may also include a data storage unit 20 (e.g., a magnetic, optical, floppy, or tape drive) for storing vast amounts of data, and an I/O interface 22 for interfacing with peripheral devices (e.g., computer network, modem, mass storage devices, etc.). It should be noted that the software program for performing the transport process can be stored either in main memory 16, data storage unit 20, or in an external storage device. Devices which may be coupled to computer system 10 include a display device 28 for displaying information to a computer user, an alphanumeric input device 30 (e.g., a keyboard), and a cursor control device 26 (e.g., mouse, trackball, light pen, etc.) for inputting data, selections, updates, etc.

Furthermore, computer system 10 may be coupled in a network, such as in a client/server environment, whereby a number of clients (e.g., personal computers, workstations, portable computers, minicomputers, terminals, etc.), are used to run processes for performing desired tasks (e.g., inventory control, payroll, billing, etc.).

FIG. 2 illustrates an exemplary computer network upon which an embodiment of the present invention may be practiced. Operational databases 210, 220, and 230 store data resulting from business and financial transactions, and/or from equipment performance logs. These databases can be any conventional relational database management system (RDBMS), such as those from Oracle, Informix, Sybase, or Microsoft, residing within a high-capacity mass storage device (such as hard disk drives, optical drives, tape drives, etc.). Databases 250 and 260 are the data warehouses or data marts that are the targets of the data transportation process.

Data integration engine 270 is a functional element that can be implemented in software and/or hardware for performing data transport operations. In the present embodiment, data integration engine 270 is a software program, operable on transformation engine server 240, that performs data transport operations. That is, in the present embodiment, data from databases 210, 220, and 230 is extracted, transformed, and loaded by transformation engine server 240 into databases 250 and 260. In the present embodiment, transformation engine server 240 includes one or more microprocessors on which an operating system such as, for example, Windows NT, UNIX, or another operating system runs.

FIG. 3 shows a representation of a process flow 300 in accordance with one embodiment of the present invention. Referring now to block 301, a partitioning process is performed on data from source database 310 to form data partitions 304. Transformation processes, shown generally as transformation process 305, perform transformations on the data and couple the output to data marts 306-309. Though the process illustrated in FIG. 3 is shown to utilize a single source database 310, data from multiple source databases could alternatively be processed using partitioning process 301 to obtain data partitions 304. Similarly, transformation process 305 could couple data to a single data mart or to more or fewer data marts.

FIG. 4 shows a method for transporting data in a data warehousing application in accordance with the present invention. First, as shown by step 401 data is partitioned. That is, data is extracted from the source database and is partitioned into data partitions. In the present embodiment, data partitions are non-overlapping. That is, data that is included within a particular data partition is not duplicated in any other data partition. In one embodiment, partitioned data is then stored locally such that the data partitions are readily available. Alternatively, data partitions are only stored to the extent necessary for the staging of data to accomplish the desired transformations on the source data.

Continuing with step 401 of FIG. 4, in the present embodiment, data partitions are user-defined. In one embodiment, the user is prompted, by means of a graphical user interface selection mechanism, to define data partitions. The received user input is then used to structure the extracted source data into partitions. Though any of a number of methods can be used to define data partitions, in one embodiment of the present invention, data partitions are defined by evenly dividing data into a user-defined number of data partitions. The present embodiment also allows for data partitioning using key ranges, referred to hereinafter as "key range partitioning," which partitions data based on key ranges within a source database (e.g., the key ranges of Oracle and Sybase databases).
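The two user-selectable schemes just described, even division into a chosen number of partitions and key range partitioning, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: rows are modeled as dictionaries, and the function names `partition_evenly` and `partition_by_key_range`, as well as the half-open boundary convention, are hypothetical. Both functions yield non-overlapping partitions, so every row lands in exactly one partition.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def partition_evenly(rows: Sequence[Row], num_partitions: int) -> List[List[Row]]:
    """Split rows into num_partitions contiguous, non-overlapping chunks.

    Chunk sizes differ by at most one row; num_partitions is the
    user-defined degree of parallelism (hypothetical parameter).
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    base, extra = divmod(len(rows), num_partitions)
    partitions: List[List[Row]] = []
    start = 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more row
        partitions.append(list(rows[start:start + size]))
        start += size
    return partitions


def partition_by_key_range(rows: Sequence[Row], key: str,
                           boundaries: Sequence[Any]) -> List[List[Row]]:
    """Assign each row to a partition by the range its key falls into.

    `boundaries` must be sorted. Partition i holds keys k with
    boundaries[i-1] <= k < boundaries[i] (open-ended at both ends),
    so len(boundaries) + 1 partitions are produced.
    """
    partitions: List[List[Row]] = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, row[key])].append(row)
    return partitions
```

For ten rows with ids 0 through 9, `partition_evenly(rows, 3)` produces partitions of 4, 3, and 3 rows, while `partition_by_key_range(rows, "id", [3, 7])` produces the ranges 0-2, 3-6, and 7-9.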

In the present embodiment, data is partitioned such that there is affinity of data within each partition. That is, related data is grouped together. For example, all data for a particular geographic region (e.g., by continent, nationality, state, region, city, etc.) could be partitioned into a data partition. Alternatively, data can be related by broad product categories, product categories, functional units of an organization, geographic units of an organization, etc.
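Affinity partitioning, where all rows sharing a value such as a geographic region end up in the same partition, can be sketched as a simple grouping step. This is a minimal Python sketch, assuming one partition per distinct affinity value; the function name `partition_by_affinity` and the `region` field are hypothetical. If there were more distinct values than desired partitions, several values could share a partition, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence

Row = Dict[str, Any]


def partition_by_affinity(rows: Sequence[Row],
                          affinity_key: Callable[[Row], Hashable]) -> Dict[Hashable, List[Row]]:
    """Group related rows so that every row with the same affinity value
    (e.g. the same region or product category) lands in the same partition.

    Each row is placed in exactly one partition, so partitions never overlap.
    """
    partitions: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[affinity_key(row)].append(row)
    return dict(partitions)
```

For example, `partition_by_affinity(orders, lambda r: r["region"])` returns one partition per region, preserving the original row order within each.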

Pipelines for performing data transformations and loading are then constructed as shown by step 402. In the present embodiment, pipelines are constructed so as to minimize data sharing. That is, many pipelines are used, with parallel pipelines performing identical or different processes such that a maximum number of pipelines do not share data at all with other pipelines, and such that the number of pipelines that share any particular data is minimized.

Continuing with step 402 of FIG. 4, in the present embodiment, the pipeline structure is formed using a distributed architecture that packages code such that responsibility is distributed to smaller units (i.e., components) of code. Each one of these software components is responsible for one specific type of transformation. Transformation components can be provided by the developer (e.g., from a monolithic transformation application) or can be user-developed. These transformation components form a base of ready-made elements that are combined to build functionally more sophisticated transformations in the data transportation process.

The transformation components are then coupled together to form the pipeline structure. In the present embodiment, transformation components are combined so as to form multiple pipelines that perform operations on data from data partitions to generate output. Further information regarding the use of coupled transformation components to form pipelines is described in U.S. patent application Ser. No. 09/16,422, titled “METHOD AND ARCHITECTURE FOR AUTOMATED OPTIMIZATION OF ETL THROUGHPUT IN DATA WAREHOUSING APPLICATIONS,” which is incorporated herein by reference.
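The idea of small, single-purpose transformation components coupled into a pipeline can be sketched in a few lines. This is a minimal Python sketch under an assumed interface: each component is a callable that consumes an iterable of rows and returns an iterator of rows. The names `expression`, `filter_rows`, and `build_pipeline` are hypothetical and are not taken from the incorporated application, which describes the actual architecture.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]
# Hypothetical component interface: consume a stream of rows, produce a stream of rows.
Transformation = Callable[[Iterable[Row]], Iterator[Row]]


def expression(field: str, fn: Callable[[Row], Any]) -> Transformation:
    """Component that adds one computed field to every row."""
    def run(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, field: fn(row)}
    return run


def filter_rows(predicate: Callable[[Row], bool]) -> Transformation:
    """Component that passes only the rows satisfying `predicate`."""
    def run(rows: Iterable[Row]) -> Iterator[Row]:
        return (row for row in rows if predicate(row))
    return run


def build_pipeline(*components: Transformation) -> Transformation:
    """Couple components so that each one's output feeds the next."""
    def run(rows: Iterable[Row]) -> Iterator[Row]:
        stream: Iterable[Row] = rows
        for component in components:
            stream = component(stream)
        return iter(stream)
    return run
```

Because a built pipeline has the same interface as a single component, the same pipeline definition can be instantiated once per data partition, which is what makes independent parallel pipelines straightforward to construct.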

In the present embodiment, step 402 is performed manually by a user. That is, the user manually chooses the transformation components necessary to accomplish the desired data transportation process. The user then manually arranges the selected transformation components to form parallel pipelines. The transformation server then automatically selects the independent tasks to be executed in parallel based on the selected transformation components of each pipeline.

Though the present embodiment illustrates the construction of pipelines using transformation components that are coupled together, the present invention is well adapted for construction of pipelines using other mechanisms. In one alternate embodiment, pipelines are constructed using a single block of source code that is responsible for all phases of the extraction, transformation and loading processes and that generates pipelines for performing data transformations according to a specific, rigid set of rules.

Referring back to FIG. 4, tasks are then executed in parallel through the pipelines as shown by step 403. The data generated by the pipelines is then stored in a target database as shown by step 404.

FIG. 5 shows an exemplary structure 500 that is formed according to method 400 of FIG. 4. Data from source database 501 is partitioned (step 401 of FIG. 4) into data partitions 510-514. Pipelines 520-524 are then constructed (step 402 of FIG. 4) for performing operations on data contained within data partitions 510-514.

Continuing with FIG. 5, tasks are executed in parallel through pipelines 520-524 (step 403 of FIG. 4) and the results are stored (step 404 of FIG. 4) in target database 504.

Multiple execution threads within pipelines 520-524 perform data transformation functions concurrently. In the present embodiment, execution threads read the data from source database 501 concurrently, followed by concurrent transformation of the data in multiple execution threads, and concurrent loading of data into target database 504 using multiple execution threads. Thereby, data contention is reduced and throughput is increased.
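The execution model described above, one worker per partition that reads, transforms, and loads its own data while a shared target accepts concurrent loads, can be sketched with Python's standard thread pool. This is a minimal sketch, not the transformation server's scheduler: `TargetTable` is a hypothetical in-memory stand-in for target database 504, and its lock stands in for whatever concurrency control a real database would provide. In CPython, threads mainly overlap I/O such as database reads and writes; for CPU-bound transformations a process pool would be the closer analogue of the "threads or processes" mentioned in the text.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Pipeline = Callable[[Iterable[Row]], Iterator[Row]]


class TargetTable:
    """Hypothetical stand-in for a target table; a lock serializes concurrent inserts."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self._lock = threading.Lock()

    def load(self, rows: List[Row]) -> None:
        with self._lock:
            self.rows.extend(rows)


def run_partitioned(partitions: List[List[Row]], pipeline: Pipeline,
                    target: TargetTable) -> int:
    """Read, transform, and load every partition in its own worker thread.

    Returns the total number of rows loaded. Partitions share no data,
    so workers never wait on each other except briefly inside target.load().
    """
    def task(partition: List[Row]) -> int:
        transformed = list(pipeline(partition))  # read + transform this partition only
        target.load(transformed)                 # load concurrently with other partitions
        return len(transformed)

    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        # Consuming the map results waits for all tasks and re-raises worker exceptions.
        return sum(pool.map(task, partitions))
```

Because each task touches only its own partition, a slow read in one partition delays only that partition's pipeline rather than every pipeline, which is the contrast the text draws with the single-pipeline prior art.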

FIG. 6 shows an exemplary pipeline 600 that includes, for example, a source table 620, an expression transformation 622, an aggregation transformation 624, and a target table 626. In the present embodiment, source table 620 is a table of partitioned data (e.g., within one of partitions 510-514 of FIG. 5). The expression transformation 622 performs a calculation based on values within a single record from source table 620 (e.g., based on the price and quantity of a particular item, one can calculate the total purchase price for that line item in an order). Next, the aggregation transformation 624 is used to perform an aggregate calculation based on all records passed through the transformation (e.g., one can find the total number and average salary of all employees in a particular office using this transformation). The result is then stored as a record in target table 626. Target table 626 is a table within a target database (e.g., target database 504 of FIG. 5).
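Pipeline 600 can be traced end to end with a small worked example. This is a minimal Python sketch, assuming order-line rows with `order_id`, `price`, and `quantity` fields and an aggregation grouped by `order_id`; those field names, the grouping key, and the function names (which borrow the figure's reference numerals) are all hypothetical. The expression step works on one record at a time, while the aggregation step needs every record it receives before it can emit a result.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def expression_622(rows: Iterable[Row]) -> Iterator[Row]:
    """Per-record calculation: line total = price * quantity."""
    for row in rows:
        yield {**row, "line_total": row["price"] * row["quantity"]}


def aggregation_624(rows: Iterable[Row]) -> List[Row]:
    """Aggregate over all incoming records, grouped by order_id (assumed key)."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        groups[row["order_id"]].append(row["line_total"])
    return [
        {"order_id": order_id, "items": len(totals), "order_total": sum(totals)}
        for order_id, totals in groups.items()
    ]


def run_pipeline_600(source_table_620: List[Row]) -> List[Row]:
    """Source table 620 -> expression 622 -> aggregation 624 -> target table 626."""
    target_table_626: List[Row] = []
    target_table_626.extend(aggregation_624(expression_622(source_table_620)))
    return target_table_626
```

With the rows `(order 1, 2.5 x 4)`, `(order 1, 1.0 x 3)`, and `(order 2, 10.0 x 1)`, the target receives one record per order: order 1 with 2 items totaling 13.0 and order 2 with 1 item totaling 10.0.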

Each transformation component obtains data from one or more of the data partitions and can stage (store) incoming data fields as it processes them. In the present embodiment, the degree of requisite staging by each transformation component is automatically determined and implemented, without any human intervention. Depending on the nature of the transformation, each transformation component automatically selects the optimal amount of staging. The staging can range continuously from zero staging (also known as streaming) to full staging. A transformation with zero staging is called a streaming transformation.
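The two ends of the staging range can be contrasted directly. This is a minimal Python sketch, assuming generator-based components; `CountingSource`, `streamed_filter`, and `staged_rank` are hypothetical names. The counting source records how many rows have been read when the first output row appears: a streaming filter emits after reading one row, while a staged rank must buffer the entire input first. The intermediate degrees of staging and the automatic selection mentioned in the text are not modeled.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


class CountingSource:
    """Iterable source that records how many rows have been read so far."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.read = 0

    def __iter__(self) -> Iterator[Row]:
        for row in self._rows:
            self.read += 1
            yield row


def streamed_filter(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Zero staging: each row is emitted (or dropped) as soon as it is read."""
    for row in rows:
        if predicate(row):
            yield row


def staged_rank(rows: Iterable[Row], key: str, top_n: int) -> Iterator[Row]:
    """Full staging: the whole input is buffered before the first row is emitted."""
    staged = list(rows)  # staging area: every input row is read here
    yield from sorted(staged, key=lambda r: r[key], reverse=True)[:top_n]
```

Pulling the first row from `streamed_filter` leaves `read == 1`, whereas pulling the first row from `staged_rank` leaves `read` equal to the full input size, which is why rank appears among the staged transformations listed in the text.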

In the currently preferred embodiment, there are thirteen different transformation components: source, target, expression, aggregation, filter, rank, update strategy, sequence, joiner, lookup, stored procedure, external procedure, and normalizer. The source transformation contains tables, views, synonyms, or flat files that provide data for the data mart/data warehouse. The target transformation maintains database objects or files that receive data from other transformations. These targets then make the data available to data mart users for decision support. Expression transformations calculate a single result, using values from one or more ports. The aggregation transformation calculates an aggregate value, such as a sum or average, using the entire range of data within a port or within a particular group. Filter transformations filter (select) records based on a condition the user has set in the expression. The rank transformation filters the top or bottom range of records, based on a condition set by the user. The update strategy transformation assigns a numeric code to each record indicating whether the server should use the information in the record to insert, delete, or update the target. The sequence generator transformation generates unique ID numbers. The joiner transformation joins records from different databases or file systems. The lookup transformation looks up values. The stored procedure transformation calls a stored procedure. The external procedure transformation calls a procedure in a shared library or in the COM layer of Windows NT. Finally, the normalizer transformation normalizes records, including those read from virtual storage access method (VSAM) sources. In the currently preferred embodiment, the source, target, aggregation, rank, and joiner transformations are all staged transformations. The data generated by these transformations is automatically staged by the software, without human interaction. The expression, filter, update strategy, sequence, lookup, stored procedure, external procedure, and normalizer transformations are all streamed transformations. Other new types of transformations can also be added to this list.
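Two of the streamed components listed above, the sequence generator and the update strategy, can be illustrated briefly. This is a minimal Python sketch, not any product's API: the numeric codes `INSERT`, `UPDATE`, and `DELETE`, the `key`, `deleted`, and `surrogate_id` fields, and both function names are hypothetical. The rule for choosing a code (a deletion flag first, then whether the key already exists in the target) is likewise an assumed example. If one sequence generator were shared across partition threads, it would need to be guarded, for example with a lock; the sketch does not do this.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, Iterator, Set

Row = Dict[str, Any]

# Illustrative numeric codes; the values used by any real product are an assumption here.
INSERT, UPDATE, DELETE = 0, 1, 2


def sequence_generator(start: int = 1) -> Callable[[Row], Row]:
    """Return a function that attaches a unique, increasing surrogate ID to each row."""
    counter = itertools.count(start)

    def assign(row: Row) -> Row:
        return {**row, "surrogate_id": next(counter)}
    return assign


def update_strategy(rows: Iterable[Row], existing_keys: Set[Any]) -> Iterator[Row]:
    """Tag each row with a numeric code telling the loader to insert, update, or delete it."""
    for row in rows:
        if row.get("deleted"):
            code = DELETE
        elif row["key"] in existing_keys:
            code = UPDATE
        else:
            code = INSERT
        yield {**row, "update_code": code}
```

Both functions handle one record at a time and never buffer input, matching their classification as streamed transformations.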

Some transformations require routing to enforce data affinity. For example, aggregation and rank transformations require routing. For these transformations an internal router is created that routes data to a pipeline based on the grouping for the transformation. In one embodiment, individual process threads are assigned to both aggregation and rank transformations. Thus, aggregate transformations run in their own thread boundary and rank transformations run in their own separate thread boundary. This allows for independent processing of aggregation and rank transformations and eliminates the need to re-combine data as is required in prior art processes for performing aggregation and rank transformations, resulting in further improvements in throughput.
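The internal router for aggregation can be sketched as hashing each row's group value to a pipeline index, so that every row of a group reaches the same pipeline. This is a minimal Python sketch; `route_by_group` and `aggregate_sum` are hypothetical names, and a sum stands in for any aggregate. `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, while CRC-32 gives a stable assignment. Because each group lives in exactly one pipeline, the per-pipeline results have disjoint keys and can simply be concatenated; no re-combination of partial aggregates is needed.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def route_by_group(rows: List[Row], group_key: str, num_pipelines: int) -> List[List[Row]]:
    """Internal router: send every row of a group to the same pipeline.

    A stable CRC-32 of the group value picks the pipeline, so the same group
    always maps to the same pipeline index.
    """
    routed: List[List[Row]] = [[] for _ in range(num_pipelines)]
    for row in rows:
        h = zlib.crc32(str(row[group_key]).encode("utf-8"))
        routed[h % num_pipelines].append(row)
    return routed


def aggregate_sum(rows: List[Row], group_key: str, value_key: str) -> Dict[Any, float]:
    """Aggregation run independently inside one pipeline."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)
```

For salaries grouped by office, running `aggregate_sum` separately on each routed list and merging the resulting dictionaries gives exactly the same totals as aggregating the whole input at once.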

Some of the more complex transformations require special processing. For example, joiner and lookup transformations require both synchronization and routing to a cache. In the present embodiment, a cache is created during the partitioning process for both joiner and lookup transformations. In one embodiment, the cache is created on the local memory of the transformation engine server. In the case of a look-up transformation, synchronization is performed and the cache is built serially as data is read into each data partition. Memory within the cache is allocated to each partition. This maintains the non-overlapping structure for data partitions. Processing efficiency is obtained by creating the cache once and using the data in the cache across all data partitions.
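The "build once, use across all partitions" property of the lookup cache can be sketched as a lazily built cache guarded by a lock. This is a simplified interpretation, not the patented mechanism: `SharedLookupCache`, its `load` callback, and `lookup_in_parallel` are hypothetical, and the per-partition allocation of cache memory described in the text is not modeled. The lock ensures that only one thread builds the cache while the others wait, and afterwards all partition threads read the same dictionary without rebuilding it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


class SharedLookupCache:
    """Lookup cache built once under a lock, then shared read-only by partition threads."""

    def __init__(self, load: Callable[[], Dict[Any, Any]]) -> None:
        self._load = load                      # hypothetical loader for the lookup source
        self._lock = threading.Lock()
        self._data: Optional[Dict[Any, Any]] = None
        self.builds = 0                        # number of times the cache was actually built

    def get(self, key: Any, default: Any = None) -> Any:
        if self._data is None:
            with self._lock:                   # synchronization: one builder, others wait
                if self._data is None:         # re-check after acquiring the lock
                    self.builds += 1
                    self._data = self._load()
        return self._data.get(key, default)


def lookup_in_parallel(cache: SharedLookupCache, keys: List[Any], workers: int = 4) -> List[Any]:
    """Simulate several partition threads probing the same cache concurrently ("?" if missing)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: cache.get(k, "?"), keys))
```

However many partition threads call `get`, the loader runs exactly once, which is where the processing efficiency mentioned in the text comes from.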

In the present embodiment, a deadlock retry function is implemented for each failed transaction. The deadlock retry function reinitiates execution of the tasks that are to be executed within a particular pipeline when an initial failure is encountered. By reinitiating execution of the tasks to be executed within a particular pipeline, failures that result from database deadlocks are avoided.
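A deadlock retry wrapper around a pipeline's tasks can be sketched as follows. This is a minimal Python sketch under stated assumptions: `DeadlockError` is a hypothetical exception standing in for whatever a database driver raises when a transaction is chosen as a deadlock victim, and `max_retries` and the linear `backoff_s` delay are illustrative parameters, not values from the text. A real loader would also roll back the failed transaction before retrying, which is not shown.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical error raised when a transaction is aborted as a deadlock victim."""


def run_with_deadlock_retry(task: Callable[[], T], max_retries: int = 3,
                            backoff_s: float = 0.0) -> T:
    """Re-run a pipeline's task after a deadlock failure, up to max_retries extra attempts.

    Any other exception propagates immediately; after the last retry the
    DeadlockError itself is re-raised.
    """
    attempt = 0
    while True:
        try:
            return task()
        except DeadlockError:
            if attempt >= max_retries:
                raise                           # give up and surface the failure
            attempt += 1
            time.sleep(backoff_s * attempt)     # optional linear backoff before retrying
```

Because partitions share little data, a deadlock typically involves only one pipeline's transaction, so retrying that pipeline's tasks alone usually lets it complete without disturbing the others.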

Each partition runs within its own thread boundary, with minimal data shared between process threads, which increases throughput. In addition, each process thread is assigned operations that are independent of those performed by other execution threads. As a result, errors due to timing delays, delays in read and write operations, routing errors, and errors resulting from dependent operations are avoided.

Data transformation time is reduced because multiple execution threads work concurrently at every stage. They read data from the operational data store concurrently, transform it concurrently, and load it concurrently into the target data warehouse. As a result, the apparatus and method of the present invention achieve increased throughput.
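The following Python sketch ties the pieces together under simplifying assumptions. The source is an in-memory list. It is divided evenly into non-overlapping partitions, and each partition runs a read, transform, and load sequence in its own thread. Each thread writes only to its own slot of the target, so no data is shared between threads. The functions `split_evenly`, `run_pipeline`, and `transport` and the sample `amount` column are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]

def split_evenly(rows: Sequence[Row], n: int) -> List[Sequence[Row]]:
    """Divide rows into n contiguous, non-overlapping partitions whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n)
    parts: List[Sequence[Row]] = []
    start = 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts

def run_pipeline(index: int, partition: Sequence[Row], target: List[Optional[List[Row]]]) -> None:
    """One pipeline in its own thread: read its partition, transform, load."""
    loaded: List[Row] = []
    for row in partition:                     # read (only this partition's rows)
        amount = float(row["amount"])
        if amount > 0:                        # transform: filter
            loaded.append({**row, "amount_cents": round(amount * 100)})  # transform: expression
    target[index] = loaded                    # load: each thread writes only its own slot

def transport(source: Sequence[Row], n_partitions: int) -> List[Optional[List[Row]]]:
    partitions = split_evenly(source, n_partitions)
    target: List[Optional[List[Row]]] = [None] * n_partitions
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        futures = [pool.submit(run_pipeline, i, p, target) for i, p in enumerate(partitions)]
        for f in futures:
            f.result()  # surface any exception raised inside a pipeline
    return target

source = [{"id": i, "amount": float(i % 4)} for i in range(10)]
loaded = transport(source, 3)
```

In CPython, threads mainly help when the read and load steps wait on I/O, such as database round trips. For CPU-heavy transformations, `concurrent.futures.ProcessPoolExecutor` would be the more natural choice, at the cost of copying each partition to a worker process.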

The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the Claims appended hereto and their equivalents.

Claims (22)

1. A computer implemented method for transporting data in a data warehousing application, comprising the steps of:
specifying at least one source containing data wherein at least some portion of said data is to be transported;
partitioning formerly un-partitioned data from said source containing data so as to form a plurality of non overlapping data portions;
constructing a plurality of pipelines that include transformation components for manipulating data in said data partitions; and
specifying a target for storing data generated by one or more pipelines.
2. The computer implemented method of claim 1 wherein partitioning formerly un-partitioned data further comprises the steps of:
receiving user input that indicates desired data partitioning; and
partitioning data from said source containing data so as to form a plurality of data partitions conforming to said user input.
3. The computer implemented method of claim 2 wherein said data is partitioned by dividing said data evenly into a user-selected number of data partitions.
4. The computer implemented method of claim 1 wherein constructing a plurality of pipelines further comprises the steps of:
constructing a plurality of transformation components for manipulating data in said data partitions; and
coupling the transformation components to form a plurality of pipelines.
5. The computer implemented method of claim 4 wherein coupling of transformation components to form a plurality of pipelines is performed such that at least some of said plurality of pipelines operate using data from a single data partition.
6. The computer implemented method of claim 4 wherein coupling of transformation components to form a plurality of pipelines allows for multiple pipelines to access each of said data partitions.
7. The computer implemented method of claim 1 further comprising the step of:
executing a plurality of tasks in parallel through said plurality of pipelines.
8. The computer-implemented method of claim 6 wherein said data is partitioned such that there is affinity for data within each data partition.
9. The computer-implemented method of claim 6 wherein at least some of said pipelines are independent execution threads.
10. A computer readable medium having stored therein instructions for causing a computer to implement a method for transporting data in a data warehousing application, said method comprising the steps of:
specifying at least one source containing data wherein at least some portion of said data is to be transported;
partitioning formerly un-partitioned data from said source containing data so as to form a plurality of non overlapping data portions;
constructing a plurality of pipelines that include transformation components for manipulating data in said data partitions; and
specifying a target for storing data generated by one or more pipelines.
11. A computer readable medium as described in claim 10 wherein partitioning formerly un-partitioned data further comprises:
receiving user input that indicates desired data partitioning; and
partitioning data from said source containing data so as to form a plurality of data partitions conforming to said user input.
12. A computer readable medium as recited in claim 10 wherein said data is partitioned by dividing said data evenly into a user-selected number of data partitions.
13. A computer readable medium as recited in claim 10 wherein constructing a plurality of pipelines further comprises the steps of:
constructing a plurality of transformation components for manipulating data in said data partitions; and
coupling the transformation components to form a plurality of pipelines.
14. A computer readable medium as recited in claim 13 wherein data is partitioned such that there is affinity for data contained within each data partition.
15. A computer readable medium as recited in claim 13 wherein coupling the transformation components to form a plurality of pipelines allows for coupling of said transformation components such that multiple pipelines can access each of said data partitions.
16. A computer readable medium as recited in claim 10 wherein said method further comprises the step of:
executing a plurality of tasks in parallel through said plurality of pipelines.
17. A computer readable medium as recited in claim 13 wherein said pipelines are formed so as to minimize data sharing between pipelines.
18. The computer-readable medium of claim 13, wherein said transformation components include a source transformation component, a target transformation component, an aggregation transformation component, a rank transformation component, and a joiner transformation component that stage data.
19. The computer-readable medium of claim 13, wherein said transformation components include an expression transformation component, a filter transformation component, an update strategy transformation component, a sequence transformation component, a lookup transformation component, a stored procedure transformation component, an external procedure transformation component, and a normalizer transformation component for streaming data.
20. A method for transporting data in a data warehousing application comprising:
partitioning formerly un-partitioned data from said source containing data to form a plurality of overlapping data partitions;
storing at least some of said partitioned data;
constructing a plurality of pipelines that include transformation components for manipulating data in said data partitions;
coupling said transformation components to form a plurality of parallel pipelines; and
executing a plurality of tasks in parallel through said plurality of pipelines; and
storing said data generated by one or more of said pipelines in a target database.
21. The method of claim 20 wherein said transformation components are coupled such that said plurality of pipelines correspond to said plurality of data partitions, thereby reducing the amount of sharing of data between individual pipelines.
22. The computer implemented method of claim 21 wherein data affinity factors are used in constructing said plurality of transformation components.
US09637335 2000-08-10 2000-08-10 Method and apparatus with data partitioning and parallel processing for transporting data for data warehousing applications Active 2021-05-31 US6850947B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09637335 US6850947B1 (en) 2000-08-10 2000-08-10 Method and apparatus with data partitioning and parallel processing for transporting data for data warehousing applications

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US09637335 US6850947B1 (en) 2000-08-10 2000-08-10 Method and apparatus with data partitioning and parallel processing for transporting data for data warehousing applications
CA 2418859 CA2418859C (en) 2000-08-10 2001-08-10 Method and apparatus relating to data transport
EP20010962109 EP1322920A2 (en) 2000-08-10 2001-08-10 Method and apparatus relating to data transport
PCT/US2001/025236 WO2002012839A3 (en) 2000-08-10 2001-08-10 Method and apparatus relating to data transport

Publications (1)

Publication Number Publication Date
US6850947B1 true US6850947B1 (en) 2005-02-01

Family

ID=24555490

Family Applications (1)

Application Number Title Priority Date Filing Date
US09637335 Active 2021-05-31 US6850947B1 (en) 2000-08-10 2000-08-10 Method and apparatus with data partitioning and parallel processing for transporting data for data warehousing applications

Country Status (4)

Country Link
US (1) US6850947B1 (en)
EP (1) EP1322920A2 (en)
CA (1) CA2418859C (en)
WO (1) WO2002012839A3 (en)

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050172293A1 (en) * 2004-01-27 2005-08-04 Network Appliance, Inc. Method and apparatus for allocating resources in a shared resource processor
US20050278152A1 (en) * 2004-05-24 2005-12-15 Blaszczak Michael A Systems and methods for distributing a workplan for data flow execution based on an arbitrary graph describing the desired data flow
US7051334B1 (en) * 2001-04-27 2006-05-23 Sprint Communications Company L.P. Distributed extract, transfer, and load (ETL) computer method
US20070088756A1 (en) * 2005-10-19 2007-04-19 Bruun Peter M Data consolidation
US20070094529A1 (en) * 2005-10-20 2007-04-26 Lango Jason A Method and apparatus for increasing throughput in a storage server
US20070106778A1 (en) * 2005-10-27 2007-05-10 Zeldin Paul E Information and status and statistics messaging method and system for inter-process communication
US20070130144A1 (en) * 2005-11-30 2007-06-07 International Business Machines Corporation Method and system for concurrent processing of list items
US20070174829A1 (en) * 2005-07-15 2007-07-26 Erik Brockmeyer Method for mapping applications on a multiprocessor platform/system
US20070174852A1 (en) * 2003-09-12 2007-07-26 Smirnov Dmitry M Application interface including dynamic transform definitions
US7299216B1 (en) * 2002-10-08 2007-11-20 Taiwan Semiconductor Manufacturing Company, Ltd. Method and apparatus for supervising extraction/transformation/loading processes within a database system
US7321939B1 (en) 2003-06-27 2008-01-22 Embarq Holdings Company Llc Enhanced distributed extract, transform and load (ETL) computer method
EP1909198A2 (en) * 2006-10-04 2008-04-09 Sap Ag Semantical partitioning of data
US7376768B1 (en) * 2003-12-19 2008-05-20 Sonic Solutions, Inc. Dynamic memory allocation for multiple targets
US20080140602A1 (en) * 2006-12-11 2008-06-12 International Business Machines Corporation Using a data mining algorithm to discover data rules
US20090006283A1 (en) * 2007-06-27 2009-01-01 International Business Machines Corporation Using a data mining algorithm to generate format rules used to validate data sets
US20090006282A1 (en) * 2007-06-27 2009-01-01 International Business Machines Corporation Using a data mining algorithm to generate rules used to validate a selected region of a predicted column
US20090024551A1 (en) * 2007-07-17 2009-01-22 International Business Machines Corporation Managing validation models and rules to apply to data sets
US20090063504A1 (en) * 2007-08-29 2009-03-05 Richard Banister Bi-directional replication between web services and relational databases
US20090210433A1 (en) * 2008-02-19 2009-08-20 Horst Werner Parallelizing Data Manipulation by Data Set Abstraction
US20090299958A1 (en) * 2008-06-02 2009-12-03 Microsoft Corporation Reordering of data elements in a data parallel system
US20090327208A1 (en) * 2008-06-30 2009-12-31 International Business Machines Corporation Discovering transformations applied to a source table to generate a target table
EP2157518A1 (en) * 2008-08-19 2010-02-24 Tieto Oyj Method for delivering data
US20100082954A1 (en) * 2008-09-30 2010-04-01 International Business Machines Corporation Configuration rule prototyping tool
US20100082706A1 (en) * 2008-09-30 2010-04-01 International Business Machines Corporation Configurable transformation macro
US20100316286A1 (en) * 2009-06-16 2010-12-16 University-Industry Cooperation Group Of Kyung Hee University Media data customization
US20110113031A1 (en) * 2009-11-12 2011-05-12 Oracle International Corporation Continuous aggregation on a data grid
US8131612B1 (en) * 2005-05-09 2012-03-06 Genesis Financial Products, Inc. Program generator for hedging the guaranteed benefits of a set of variable annuity contracts
US20120066207A1 (en) * 2009-05-19 2012-03-15 Ntt Docomo, Inc. Data combination system and data combination method
US8442999B2 (en) 2003-09-10 2013-05-14 International Business Machines Corporation Semantic discovery and mapping between data sources
US20130139166A1 (en) * 2011-11-24 2013-05-30 Alibaba Group Holding Limited Distributed data stream processing method and system
US8533661B2 (en) 2007-04-27 2013-09-10 Dell Products, Lp System and method for automated on-demand creation of a customized software application
US8589207B1 (en) 2012-05-15 2013-11-19 Dell Products, Lp System and method for determining and visually predicting at-risk integrated processes based on age and activity
US8627331B1 (en) 2010-04-30 2014-01-07 Netapp, Inc. Multi-level parallelism of process execution in a mutual exclusion domain of a processing system
US8782103B2 (en) 2012-04-13 2014-07-15 Dell Products, Lp Monitoring system for optimizing integrated business processes to work flow
US8805716B2 (en) 2012-03-19 2014-08-12 Dell Products, Lp Dashboard system and method for identifying and monitoring process errors and throughput of integration software
US8930303B2 (en) 2012-03-30 2015-01-06 International Business Machines Corporation Discovering pivot type relationships between database objects
US8943076B2 (en) 2012-02-06 2015-01-27 Dell Products, Lp System to automate mapping of variables between business process applications and method therefor
US9015106B2 (en) 2012-04-30 2015-04-21 Dell Products, Lp Cloud based master data management system and method therefor
US9069898B2 (en) 2012-05-31 2015-06-30 Dell Products, Lp System for providing regression testing of an integrated process development system and method therefor
US9092244B2 (en) 2012-06-07 2015-07-28 Dell Products, Lp System for developing custom data transformations for system integration application programs
US9158782B2 (en) 2012-04-30 2015-10-13 Dell Products, Lp Cloud based master data management system with configuration advisor and method therefore
US9183074B2 (en) 2013-06-21 2015-11-10 Dell Products, Lp Integration process management console with error resolution interface
US20160055204A1 (en) * 2013-03-18 2016-02-25 Ge Intelligent Platforms, Inc. Apparatus and method for executing parallel time series data analytics
US9372942B1 (en) 2013-03-15 2016-06-21 Dell Software Inc. System and method for facilitating data visualization via a map-reduce framework
EP2504754A4 (en) * 2009-11-24 2016-10-05 Alibaba Group Holding Ltd Efficient data backflow processing for data warehouse
CN103136217B (en) * 2011-11-24 2016-12-14 Alibaba Group Holding Limited Distributed data stream processing method and system
US9606995B2 (en) 2012-04-30 2017-03-28 Dell Products, Lp Cloud based master data management system with remote data store and method therefor
US9665608B2 (en) 2013-06-27 2017-05-30 International Business Machines Corporation Parallelization of data processing
US9710282B2 (en) 2011-12-21 2017-07-18 Dell Products, Lp System to automate development of system integration application programs and method therefor

Citations (7)

Publication number Priority date Publication date Assignee Title
WO2000004466A1 (en) 1998-07-15 2000-01-27 Informatica Corporation Method and architecture for automated optimization of etl throughput in data warehousing applications
US6065007A (en) * 1998-04-28 2000-05-16 Lucent Technologies Inc. Computer method, apparatus and programmed medium for approximating large databases and improving search efficiency
US6122628A (en) * 1997-10-31 2000-09-19 International Business Machines Corporation Multidimensional data clustering and dimension reduction for indexing and searching
US6173310B1 (en) * 1999-03-23 2001-01-09 Microstrategy, Inc. System and method for automatic transmission of on-line analytical processing system report output
US6216125B1 (en) * 1998-07-02 2001-04-10 At&T Corp. Coarse indexes for a data warehouse
US6385604B1 (en) * 1999-08-04 2002-05-07 Hyperroll, Israel Limited Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements
US6408292B1 (en) * 1999-08-04 2002-06-18 Hyperroll, Israel, Ltd. Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
US6122628A (en) * 1997-10-31 2000-09-19 International Business Machines Corporation Multidimensional data clustering and dimension reduction for indexing and searching
US6065007A (en) * 1998-04-28 2000-05-16 Lucent Technologies Inc. Computer method, apparatus and programmed medium for approximating large databases and improving search efficiency
US6216125B1 (en) * 1998-07-02 2001-04-10 At&T Corp. Coarse indexes for a data warehouse
WO2000004466A1 (en) 1998-07-15 2000-01-27 Informatica Corporation Method and architecture for automated optimization of etl throughput in data warehousing applications
US6208990B1 (en) * 1998-07-15 2001-03-27 Informatica Corporation Method and architecture for automated optimization of ETL throughput in data warehousing applications
US6173310B1 (en) * 1999-03-23 2001-01-09 Microstrategy, Inc. System and method for automatic transmission of on-line analytical processing system report output
US6385604B1 (en) * 1999-08-04 2002-05-07 Hyperroll, Israel Limited Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements
US6408292B1 (en) * 1999-08-04 2002-06-18 Hyperroll, Israel, Ltd. Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions

Non-Patent Citations (4)

Title
Ballinger C. et al.: "Born to be parallel, Why parallel origins give Teradata an enduring performance edge" Bulletin of the technical committee on data Engineering vol. 20, No. 2, Jun. 1997, XP002225741 IEEE Comput Soc., Los Alamitos, CA, US p. 8, line 21-line 39.
Bellatreche L. et al.: "OLAP query processing for partitioned data warehouses"; Proceedings 1999 Intl symposium on database Application in Non-Traditional Environments (Dante'99) (Cat. No. PR00496), Kyoto, JP Nov. 28-30, 1999, pp. 35-42, XP01037691 IEEE comput. Soc. Los Alamitos, CA US ISBN 0-7695-0496-5, p. 35, right-hand column, paragraph 2, p. 36, right-hand column, line 14-line 16, p. 38, right-hand column, paragraph 1.
Jones, Katherine, "An Introduction to Data Warehousing: What Are the Implications for the Network", 1998, International Journal of Network Management, vol. 8 pp. 42-56.*
Mohania M. et al: "Advances and research directions in data warehousing technology" AJIS Australian Journal of Information Systems, vol. 7. No. 1. Sep. 1999, pp. 41-59, XP000978044 Wollongong, Au ISSN: 1039-7841 p. 53, paragraph 1, paragraph 2, p. 56, paragraph 2.

Cited By (79)

Publication number Priority date Publication date Assignee Title
US7051334B1 (en) * 2001-04-27 2006-05-23 Sprint Communications Company L.P. Distributed extract, transfer, and load (ETL) computer method
US7299216B1 (en) * 2002-10-08 2007-11-20 Taiwan Semiconductor Manufacturing Company, Ltd. Method and apparatus for supervising extraction/transformation/loading processes within a database system
US7321939B1 (en) 2003-06-27 2008-01-22 Embarq Holdings Company Llc Enhanced distributed extract, transform and load (ETL) computer method
US9336253B2 (en) 2003-09-10 2016-05-10 International Business Machines Corporation Semantic discovery and mapping between data sources
US8442999B2 (en) 2003-09-10 2013-05-14 International Business Machines Corporation Semantic discovery and mapping between data sources
US8874613B2 (en) 2003-09-10 2014-10-28 International Business Machines Corporation Semantic discovery and mapping between data sources
US7970779B2 (en) * 2003-09-12 2011-06-28 Oracle International Corporation Application interface including dynamic transform definitions
US20070174852A1 (en) * 2003-09-12 2007-07-26 Smirnov Dmitry M Application interface including dynamic transform definitions
US7376768B1 (en) * 2003-12-19 2008-05-20 Sonic Solutions, Inc. Dynamic memory allocation for multiple targets
US8171480B2 (en) 2004-01-27 2012-05-01 Network Appliance, Inc. Method and apparatus for allocating shared resources to process domains according to current processor utilization in a shared resource processor
US20050172293A1 (en) * 2004-01-27 2005-08-04 Network Appliance, Inc. Method and apparatus for allocating resources in a shared resource processor
US20050278152A1 (en) * 2004-05-24 2005-12-15 Blaszczak Michael A Systems and methods for distributing a workplan for data flow execution based on an arbitrary graph describing the desired data flow
US7930432B2 (en) * 2004-05-24 2011-04-19 Microsoft Corporation Systems and methods for distributing a workplan for data flow execution based on an arbitrary graph describing the desired data flow
US8131612B1 (en) * 2005-05-09 2012-03-06 Genesis Financial Products, Inc. Program generator for hedging the guaranteed benefits of a set of variable annuity contracts
US20070174829A1 (en) * 2005-07-15 2007-07-26 Erik Brockmeyer Method for mapping applications on a multiprocessor platform/system
US8473934B2 (en) * 2005-07-15 2013-06-25 Imec Method for mapping applications on a multiprocessor platform/system
US20070088756A1 (en) * 2005-10-19 2007-04-19 Bruun Peter M Data consolidation
US7490108B2 (en) * 2005-10-19 2009-02-10 Hewlett-Packard Development Company, L.P. Data consolidation
US20070094529A1 (en) * 2005-10-20 2007-04-26 Lango Jason A Method and apparatus for increasing throughput in a storage server
US8347293B2 (en) * 2005-10-20 2013-01-01 Network Appliance, Inc. Mutual exclusion domains to perform file system processes on stripes
US20070106778A1 (en) * 2005-10-27 2007-05-10 Zeldin Paul E Information and status and statistics messaging method and system for inter-process communication
US20070130144A1 (en) * 2005-11-30 2007-06-07 International Business Machines Corporation Method and system for concurrent processing of list items
EP1909198A3 (en) * 2006-10-04 2008-05-28 Sap Ag Semantical partitioning of data
EP1909198A2 (en) * 2006-10-04 2008-04-09 Sap Ag Semantical partitioning of data
US7836004B2 (en) 2006-12-11 2010-11-16 International Business Machines Corporation Using data mining algorithms including association rules and tree classifications to discover data rules
US20080140602A1 (en) * 2006-12-11 2008-06-12 International Business Machines Corporation Using a data mining algorithm to discover data rules
US9176711B2 (en) 2007-04-27 2015-11-03 Dell Products, Lp System and method for automated on-demand creation of a customized software application
US8533661B2 (en) 2007-04-27 2013-09-10 Dell Products, Lp System and method for automated on-demand creation of a customized software application
US20090006282A1 (en) * 2007-06-27 2009-01-01 International Business Machines Corporation Using a data mining algorithm to generate rules used to validate a selected region of a predicted column
US20090006283A1 (en) * 2007-06-27 2009-01-01 International Business Machines Corporation Using a data mining algorithm to generate format rules used to validate data sets
US8171001B2 (en) 2007-06-27 2012-05-01 International Business Machines Corporation Using a data mining algorithm to generate rules used to validate a selected region of a predicted column
US8166000B2 (en) 2007-06-27 2012-04-24 International Business Machines Corporation Using a data mining algorithm to generate format rules used to validate data sets
US8401987B2 (en) 2007-07-17 2013-03-19 International Business Machines Corporation Managing validation models and rules to apply to data sets
US20090024551A1 (en) * 2007-07-17 2009-01-22 International Business Machines Corporation Managing validation models and rules to apply to data sets
US20090063504A1 (en) * 2007-08-29 2009-03-05 Richard Banister Bi-directional replication between web services and relational databases
US8122040B2 (en) 2007-08-29 2012-02-21 Richard Banister Method of integrating remote databases by automated client scoping of update requests over a communications network
US9928255B2 (en) 2007-08-29 2018-03-27 Sesame Software, Inc. Method for generating indexes for downloading data
US8051091B2 (en) * 2008-02-19 2011-11-01 Sap Ag Parallelizing data manipulation by data set abstraction
US20090210433A1 (en) * 2008-02-19 2009-08-20 Horst Werner Parallelizing Data Manipulation by Data Set Abstraction
US20090299958A1 (en) * 2008-06-02 2009-12-03 Microsoft Corporation Reordering of data elements in a data parallel system
US8290917B2 (en) 2008-06-02 2012-10-16 Microsoft Corporation Reordering of data elements in a data parallel system
US20090327208A1 (en) * 2008-06-30 2009-12-31 International Business Machines Corporation Discovering transformations applied to a source table to generate a target table
US9720971B2 (en) 2008-06-30 2017-08-01 International Business Machines Corporation Discovering transformations applied to a source table to generate a target table
EP2157518A1 (en) * 2008-08-19 2010-02-24 Tieto Oyj Method for delivering data
US8209341B2 (en) * 2008-09-30 2012-06-26 International Business Machines Corporation Configurable transformation macro
US20100082706A1 (en) * 2008-09-30 2010-04-01 International Business Machines Corporation Configurable transformation macro
US20100082954A1 (en) * 2008-09-30 2010-04-01 International Business Machines Corporation Configuration rule prototyping tool
US8756407B2 (en) 2008-09-30 2014-06-17 International Business Machines Corporation Configuration rule prototyping tool
US8386452B2 (en) * 2009-05-19 2013-02-26 Ntt Docomo, Inc. Data combination system and data combination method
US20120066207A1 (en) * 2009-05-19 2012-03-15 Ntt Docomo, Inc. Data combination system and data combination method
US9008464B2 (en) * 2009-06-16 2015-04-14 University-Industry Cooperation Group Of Kyung Hee University Media data customization
US20100316286A1 (en) * 2009-06-16 2010-12-16 University-Industry Cooperation Group Of Kyung Hee University Media data customization
US20110113031A1 (en) * 2009-11-12 2011-05-12 Oracle International Corporation Continuous aggregation on a data grid
US8566341B2 (en) * 2009-11-12 2013-10-22 Oracle International Corporation Continuous aggregation on a data grid
EP2504754A4 (en) * 2009-11-24 2016-10-05 Alibaba Group Holding Ltd Efficient data backflow processing for data warehouse
US8627331B1 (en) 2010-04-30 2014-01-07 Netapp, Inc. Multi-level parallelism of process execution in a mutual exclusion domain of a processing system
US9071622B2 (en) 2010-04-30 2015-06-30 Netapp, Inc. Multi-level parallelism of process execution in a mutual exclusion domain of a processing system
US9250963B2 (en) * 2011-11-24 2016-02-02 Alibaba Group Holding Limited Distributed data stream processing method and system
US9727613B2 (en) * 2011-11-24 2017-08-08 Alibaba Group Holding Limited Distributed data stream processing method and system
US20130139166A1 (en) * 2011-11-24 2013-05-30 Alibaba Group Holding Limited Distributed data stream processing method and system
CN103136217B (en) * 2011-11-24 2016-12-14 Alibaba Group Holding Limited Distributed data stream processing method and system
CN103136217A (en) * 2011-11-24 2013-06-05 Alibaba Group Holding Limited Distributed data flow processing method and system thereof
US20160179898A1 (en) * 2011-11-24 2016-06-23 Alibaba Group Holding Limited Distributed data stream processing method and system
US9710282B2 (en) 2011-12-21 2017-07-18 Dell Products, Lp System to automate development of system integration application programs and method therefor
US8943076B2 (en) 2012-02-06 2015-01-27 Dell Products, Lp System to automate mapping of variables between business process applications and method therefor
US8805716B2 (en) 2012-03-19 2014-08-12 Dell Products, Lp Dashboard system and method for identifying and monitoring process errors and throughput of integration software
US8930303B2 (en) 2012-03-30 2015-01-06 International Business Machines Corporation Discovering pivot type relationships between database objects
US8782103B2 (en) 2012-04-13 2014-07-15 Dell Products, Lp Monitoring system for optimizing integrated business processes to work flow
US9015106B2 (en) 2012-04-30 2015-04-21 Dell Products, Lp Cloud based master data management system and method therefor
US9158782B2 (en) 2012-04-30 2015-10-13 Dell Products, Lp Cloud based master data management system with configuration advisor and method therefore
US9606995B2 (en) 2012-04-30 2017-03-28 Dell Products, Lp Cloud based master data management system with remote data store and method therefor
US8589207B1 (en) 2012-05-15 2013-11-19 Dell Products, Lp System and method for determining and visually predicting at-risk integrated processes based on age and activity
US9069898B2 (en) 2012-05-31 2015-06-30 Dell Products, Lp System for providing regression testing of an integrated process development system and method therefor
US9092244B2 (en) 2012-06-07 2015-07-28 Dell Products, Lp System for developing custom data transformations for system integration application programs
US9372942B1 (en) 2013-03-15 2016-06-21 Dell Software Inc. System and method for facilitating data visualization via a map-reduce framework
US20160055204A1 (en) * 2013-03-18 2016-02-25 Ge Intelligent Platforms, Inc. Apparatus and method for executing parallel time series data analytics
US9864673B2 (en) 2013-06-21 2018-01-09 Dell Products, Lp Integration process management console with error resolution interface
US9183074B2 (en) 2013-06-21 2015-11-10 Dell Products, Lp Integration process management console with error resolution interface
US9665608B2 (en) 2013-06-27 2017-05-30 International Business Machines Corporation Parallelization of data processing

Also Published As

Publication number Publication date Type
CA2418859C (en) 2012-11-27 grant
WO2002012839A3 (en) 2003-04-24 application
CA2418859A1 (en) 2002-02-14 application
EP1322920A2 (en) 2003-07-02 application
WO2002012839A2 (en) 2002-02-14 application

Legal Events

Date Code Title Description
AS Assignment

Owner name: INFORMATICA CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHUNG, DWAYNE;MUDUNURI, GAUTAM H.;YOUNAS, FAYYAZ;AND OTHERS;REEL/FRAME:011010/0097;SIGNING DATES FROM 20000714 TO 20000803

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH

Free format text: SECURITY AGREEMENT;ASSIGNOR:INFORMATICA CORPORATION;REEL/FRAME:036294/0701

Effective date: 20150806

AS Assignment

Owner name: INFORMATICA LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:INFORMATICA CORPORATION;REEL/FRAME:036453/0406

Effective date: 20150806

FPAY Fee payment

Year of fee payment: 12