WO2005020105A1 - Dml statements for densifying data in a relational database system - Google Patents

Dml statements for densifying data in a relational database system Download PDF

Info

Publication number
WO2005020105A1
WO2005020105A1 PCT/US2004/027406 US2004027406W WO2005020105A1 WO 2005020105 A1 WO2005020105 A1 WO 2005020105A1 US 2004027406 W US2004027406 W US 2004027406W WO 2005020105 A1 WO2005020105 A1 WO 2005020105A1
Authority
WO
WIPO (PCT)
Prior art keywords
processors
data
dimension
instructions
machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2004/027406
Other languages
English (en)
French (fr)
Inventor
Abhinav Gupta
Lei Sheng
Sankar Subramanian
Nathan Folkert
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp filed Critical Oracle International Corp
Priority to EP04781985A priority Critical patent/EP1658572A1/en
Priority to JP2006524117A priority patent/JP4747094B2/ja
Priority to CA002534788A priority patent/CA2534788C/en
Priority to AU2004267850A priority patent/AU2004267850B2/en
Publication of WO2005020105A1 publication Critical patent/WO2005020105A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/2445Data retrieval commands; View definitions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24558Binary matching operations
    • G06F16/2456Join operations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99941Database schema or data structure
    • Y10S707/99943Generating database or data structure, e.g. via user interface
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99941Database schema or data structure
    • Y10S707/99944Object-oriented database structure

Definitions

  • the present invention relates to data manipulation. More specifically, the present invention relates to techniques for densifying a set of data relative to one or more dimensions.
  • dimension refers to a related set of distinct values. For example, a
  • TIMES dimension can include all dates from January 1998 to December 2003.
  • PRODUCTS dimension can include values representing all possible products of a company.
  • a set of data items is "dimensional" if each data item in the set is associated with a value from a particular dimension. For example, assume that each row of a table includes data about a particular event, including the date of the event. In this example, the "event data" is "dimensional" relative to the TIMES dimension.
  • a set of data items is "multi-dimensional” if the data items are dimensional relative to more than one dimension. For example, assume that each row of a SALES table includes data for a particular sale, including (1) the date of the sale, (2) the product sold, and (3) the region in which the sale was made. In this example, the "sales data" is multidimensional, since the sales data is dimensional relative to the TIMES, PRODUCTS and REGION dimensions.
  • a table that stores multi-dimensional data is often referred to as a "fact table”.
  • Tables that store the dimension values of a particular dimension are referred to as
  • Each row of a fact table will correspond to a dimensional value combination that includes one value for each of the dimensions.
  • each row will typically correspond to a combination of TIMES, PRODUCTS and REGION dimension values.
  • the set of dimension values associated with a given SALES table row may be represented as (t, p, r), where t is the value for the TIMES dimension, p is the value for the PRODUCTS dimension, and r is the value for the REGION dimension.
  • the set of dimension value combinations that are associated with rows in the fact table is a subset of the cross-product of the dimension values from each of the dimensions.
  • a fact table is referred to as "dense" along a dimension 'D' if the fact table contains all possible values of 'D' for any given combination of the other dimensional values in the fact table.
  • the REGION dimension has only three possible values RGN1, RGN2 and RGN3.
  • the SALES table is dense relative to the REGION dimension if, for every combination (t, p) reflected in the SALES table, the SALES table includes rows for the dimension value combinations (t, p, RGN1), (t, p, RGN2) and (t, p, RGN3).
  • Densification is the process of making a set of data denser than it originally was along a dimension of interest.
  • a set of rows may be densified, for example, by creating dummy rows for missing combinations of dimensional values.
  • Data whose density has been increased will be referred to as having been "densified, " and the dimension with respect to which it has been or is being increased will be referred to as the "densifying" dimension.
  • Densification is useful for a variety of situations. For example, queries in some multidimensional database systems (e.g., On-Line Analytical Processing (OLAP)) require data to be densified along the time dimension.
  • OLAP On-Line Analytical Processing
  • OLAP users are accustomed to seeing the data in a densified format, especially when window functions are computed and presented. For example, if there are no sales for a particular day, some users still want to see a display showing the running total of sales, the day, and a blank space in the sales column (because the running total is a window function in OLAP, which usually displays densified data).
  • densification may be performed by a series of operations that include the DISTINCT, CROSS JOIN and OUTER JOIN operations.
  • SQL Structured Query Language
  • TIMES time_id, calendar_month_desc, calendar_quarter_desc, Calendar_year
  • PRODUCTS prod_id, prod_subcategory, prod_category
  • the SALES fact table stores a measure (amount sold) that is dimensional relative to the TIMES and PRODUCTS dimensions.
  • the "time_id, " column stores the time dimension value for the row
  • the "product_id, " column stores the PRODUCTS dimension value for the row.
  • the amount_sold column stores the measure value "amount sold”.
  • the TIMES dimension table stores all time_id values, and details about the time that corresponds to each time_id value.
  • the PRODUCTS dimension table stores all product_id values, and details about the product that correspond to each product_id value.
  • V2 includes all combinations of (1) prod_id values that are actually in the SALES table, and (2) all time_id values in the TIMES table.
  • the "SELECT DISTINCT prod_id FROM SALES” clause finds all of the unique values of prod d found in the SALES table.
  • the "SELECT time_id FROM TIMES” clause finds all values of time_id in the TIMES table.
  • the "CROSS JOIN” construct causes a cross product to be taken between the unique prod__ids found and all time_ids.
  • FIG. 1 is a flowchart showing method 100 for performing the densification, which is used in Ql .
  • step 102 a sort is performed to obtain all DISTINCT values of ⁇ rod_id in the SALES table.
  • step 104 a CROSS JOIN of all ⁇ rod_id values with time_id values in the TIMES table is performed, thereby obtaining a set of (prod_id, time__id) dimension value combinations that is dense in the Time dimension, but includes only those values of ⁇ rod_id found in the SALES table.
  • step 106 an OUTER JOIN of the SALES fact table with the results of the CROSS JOIN of step 104 is performed, thereby adding blank rows to the SALES table for any (prod_id, time_id) dimension value combination of the CROSS JOIN not found in the original SALES table.
  • YTD year-to-date
  • the "FROM SALES" statement is identical to Ql, and causes the same sequence of operations to occur.
  • the "SELECT V2.prod_id" statement sums the sales values. The sum is returned in the YTD_sales variable. The data returned is partitioned according to product and year by the standard SQL PARTITION BY construct, and then ordered by time_id using the ORDER BY construct.
  • step 102 (or step (1)) is an extra computation that is not needed for the final result, but cannot be avoided in the prior art. Additionally, the present inventors have recognized that Ql is not intuitive. Especially with more complicated densification queries, it can be very difficult to decipher the intent of the user by inspecting the statement, because of the unintuitive nature of the series of operations used to perform the densification. Thus, the structure of the current way of performing densification within query is complex, hard to understand, and inefficient to compute.
  • FIG. 1 is a flowchart showing method 100 for performing densification.
  • FIG. 2A is a block diagram of the syntax of a partitioned table.
  • FIG. 2B is a block diagram of the syntax of the outer join type of FIG. 2C.
  • FIG. 2C is a block diagram of an extended JOIN syntax using the partitioned table of FIG. 2 A, according to an embodiment of the invention.
  • FIG. 3 is a flowchart showing an example of a method for implementing the
  • FIG. 4 is a flowchart showing another example of a method for implementing the PARTITIONED OUTER JOIN of FIG. 2C. '
  • FIG. 5 is a flowchart showing another example of a method for implementing the PARTITIONED OUTER JOIN of FIG. 2C.
  • FIG. 6 is a flowchart of a method of the operations of an optimizer that may be included in a relational database management system capable of running the
  • FIG. 7 is a block diagram that illustrates a computer system 700 upon which an embodiment of the invention may be implemented. DETAILED DESCRIPTION OF THE INVENTION
  • the present methods and apparatuses provide several features that can each be used independently of one another or with any combination of the other features. Although many of the features of the present apparatuses and methods of providing DML Statements for densifying data are motivated by the problems explained above, any individual feature may not address any of the problems discussed above or may only address one of the problems discussed above. Some of the problems discussed above may not be fully addressed by any of the features of the present method of storing and organizing data related to fact tables. Although headings are provided, information related to a particular heading, but not found in the section having that heading, may also be found elsewhere in the specification.
  • an embodiment of the invention densities data without performing a DISTINCT operation.
  • data is densified without performing a sort operation to find the distinct values of a dimension in which the data is not being densified.
  • the database language supported by a database server is extended to support a construct that may be used in DML statements that at least in combination with other constructs causes a densification operation.
  • a new construct for DML statements is provided for partitioning a set of data that is to be used on an operation associated with another construct to density the data.
  • the syntax and semantics of the OUTER JOIN statement is extended to include a partitioning construct.
  • the OUTER JOIN with the partitioning construct will be referred to as a PARTITIONED OUTER JOIN.
  • the PARTITIONED OUTER JOIN returns data that is partitioned with respect to a specified dimension, and each partition is OUTER JOINED to another set of values. [0035] If the partitions are OUTER JOINED to a dense set of dimension values, a dense table may be formed.
  • the resultant table may be denser than the original set.
  • the construct for partitioning data is used to operate on data involved in an OUTER JOIN in order to densify data relative to a dimension.
  • the partitioning construct performs the densification as part of an OUTER JOIN, the specification is not limited to embodiments in which the densification is performed via a partitioning construct or an OUTER JOIN.
  • the construct for partitioning data in DML statements, outside of the window and spreadsheet functions, and the construct for the densification of data are independent contributions to the art.
  • a different construct for densifying data may be used that does not necessarily involve an OUTER JOIN statement.
  • the original set of data being densified may be a table, which may be referred to as a target table.
  • the target table may be a fact table or any other type of table.
  • the target table may be a virtual table formed by a database expression.
  • Database servers are designed to execute database statements that conform to the database language that they support. SQL is a database language supported by many database servers. Database languages typically include (1) constructs for identifying operations, and (2) constructs for identifying the data upon which the operations are to be performed. Since many database operations are designed to be performed on tables, the database statements typically include table identifiers that specify the tables upon which the operations are to be performed. When describing the syntax of the database language, such table identifiers are represented by the label table_reference.
  • FIG. 2A is a block diagram of the syntax of a partitioned table 220. Partitioned table 210 includes table reference 221, comma 224, and expr 210, which will be referenced in the discussion below.
  • the query execution engine of a database server is extended to support database statements that have partitioned_table references (partition table 220) in one or more contexts in which the language previously only supported table_references.
  • partition table 220 partitioned_table references
  • FIG. 2A the metasyntax of a partition able reference, is illustrated in FIG. 2A, and is defined, in Backus-Naur Form (BNF), as follows:
  • table reference 221 is placed to the left of key words "PARTITION BY".
  • Table reference 221 may be any table.
  • expr 222 which may be reference to a column or an expressions that evaluated to a column, such as Coll+Col2, where Coll and Col2 are references to columns.
  • Partitioned table 220 may be SALES PARTITION BY (warehouse_id*deliv_id, time_id). Partitioned_table references, such as partition table 220, may be used in a variety of contexts that previously required table references.
  • the syntax of the PARTITIONED OUTER JOIN is similar to that of a standard OUTER JOIN, except wherever a table_reference is called for in the standard OUTER JOIN, the PARTITIONED OUTER JOIN accepts either a partitioned_table or a table_reference.
  • a partitioned_table reference includes a table_reference and a PARTITION BY clause.
  • the table specified by the tablejreference that precedes a PARTITION BY clause shall be referred to herein as the "target table" of the PARTITION BY clause.
  • the target table may be a fact table or any other type of table.
  • the expressions and columns in the PARTITION BY clause will be referred to as partitioning expressions and columns, respectively.
  • the database server splits the target table specified by table reference 221 into partitions, where each partition corresponds to a dimension value of the column resulting from expr
  • the SALES table is partitioned by prod_id. If there are three prod_ids, PI, P2, and P3, then the first partition of partitioned product table will be those rows having prod_id PI, the second partition will be those rows having prodjd P2, and the third partition will be those rows having prod__id P3.
  • the partitioned table clause "SALES PARTITION BY ⁇ prodjd, regnjd ⁇ " will partition the table into six partitions.
  • the first partition has product ID PI and region ID RI
  • the second partition has product ID PI and region ID R2
  • the third partition has product ID P2 and region ID RI
  • the fourth partition has product ID P2 and region ID R2
  • the fifth partition has product ID P3 and region ID RI
  • the sixth partition has product ID P3 and region ID R2.
  • the hierarchy of the partitions in the above embodiment is: the first partitioning index listed divides the target table into major partitions, and the next partitioning index divides each major partition into smaller partitions.
  • any other partitioning hierarchy may be used.
  • the last partitioning dimension listed may be used to make the major partitions and the next partitioning dimension listed may be used to partition the major partitions into smaller partitions.
  • the partitioned table clause "SALES PARTITION BY ⁇ prodjd, regnjd ⁇ " results in the first partition having product ID PI and region ID RI, the second partition having product ID P2 and region ID RI, the third partition having product ID P3 and region ID RI, the fourth partition has product ID PI and region LD R2, the fifth partition having product ID P2 and region ID R2, and the sixth partition having product ID P3 and region ID R2.
  • FIG. 2B is a block diagram of the syntax of the outer join type 230 of FIG. 2C.
  • Outer join type 230 includes options 242.
  • the syntax of an OUTER JOIN allows either a partitioned table or table reference to occur on either side of an OUTER JOIN.
  • the outer join type may be a FULL, a LEFT, or a RIGHT OUTER JOIN depending on whether the resulting table will contain all rows from both tables specified on both sides, the table specified on the left side,, or the table specified on the right side of the keywords "OUTER JOLN" in the OUTER JOIN clause.
  • the resulting table includes all rows from both tables, the table that appears on the right and the table that appears of the left of the "OUTER JOIN" keywords. If a partitionedjable appears on either side, each partition is separately OUTER JOINED to the table eference or partitionedjable on the other side of the "OUTER JOLN" keywords.
  • table A has rows for (P 1 , SI), (P3, S2), (P3, S3), (P5, -), and (P10, -)
  • the results of a FULL OUTER JOLN is a table having rows for (PI, SI), (P3, S2), (P3, S3), (P5, -), (PIO, -), and (-, S4). If table A is partitioned on p Jd, then the results of a LEFT OUTER JOLN is a table having rows for (PI, SI), (P3, S2), (P3, S3), (P5, -), and (P10, -). The results of a LEFT OUTER JOIN are the same as if table A was not partitioned. Since Table B was not partitioned, no densification of Table B occurs.
  • the results of a RIGHT OUTER JOIN is a table having rows for (PI, SI), (PI, S2), (PI, S3), (PI, S4), (P3, SI), (P3, S2), (P3, S3), (P3, S4), (P5, SI), (P5, S2), (P5, S3), (P5, S4), (P10, SI), (P10, S2), (P10, S3), and (P10, S4).
  • each of the PI, P3, P5, and P10 partitions is densified to include one row for each of value of the second column (SI, S2, S3, and S4).
  • the results of a FULL OUTER JOIN is a table having rows for (PI, SI), (PI, S2), (PI, S3), (PI, S4), (P3, SI), (P3, S2), (P3, S3), (P3, S4), (P5, SI), (P5, S2), (P5, S3), (P5, S4), (P5,-), (P10, SI), (P10, S2), (P10, S3), (P10, S4), and (P10, -).
  • the results of a FULL OUTER JOLN include rows (P5,-) and (P10,-), which are not included in the RIGHT OUTER JOLN.
  • rows (P5, -) and (P10,-) are added since they would be added in the cross product performed in a LEFT OUTER JOIN.
  • the result record is composed of (1) the columns used as the join keys, (2) the columns of the non-join keys from the left of the JOIN operand, and (3) the columns of the non-join keys from the right of the JOLN operand.
  • the result of the clause Tl RIGHT OUTER JOLN T2 USING (cl) is composed of (1) the columns used as the join keys, (2) the columns of the non-join keys from the left of the JOIN operand, and (3) the columns of the non-join keys from the right of the JOLN operand.
  • the results of a named column partitioned outer join include (1) the result of the partitioning expression from the left operand, if any, (2) the result of the partitioning expressions from right operand if any, (3) the join columns, (4) the non- partitioning and non-join columns from left operand, and (5) the non-partitioning and non-join columns from right operand.
  • the result recodes of the above statement is composed of columns (Tl.c2, cl, Tl.c3, T2.c4)
  • the results from a partitioned outer join having the join condition specified through an ON clause includes (1) the partitioning expressions from the left operand if any, (2) the partitioning expressions from right operand, if any, (3) the non-partitioning columns from left operand, and (4) the non-partitioning columns from right operand.
  • table Tl was partitioned by column C2. Then each partition of table Tl was outer joined to table T2 using column Cl as the join key. Since a named join was used, the join column, Cl, appears only once in the results. [0053] Now consider the statement
  • table T2 has 3 rows:
  • FIG. 2C is a block diagram of an extended JOIN syntax having branches 202 and 204.
  • Branch 204 includes partitioned table 206, partitioned table 208, partitioned table 210, outer join type 212, outer join type 214, and condition 216, which are referenced in the discussion below.
  • Branch 202 gives the syntax for the INNER and CROSS JOINs.
  • branch 204 the corresponding syntax of the PARTITIONED OUTER JOIN in BNF format is as follows.
  • RIGHT ⁇ [OUTER] JOLN ⁇ partitionedjable
  • branch 204 starting from the left, the user first specifies either partitioned table 206a or table reference 206b, then the user may specify the keyword "NATURAL" on branch 204b, or the user may use branch 204a and not use the keyword "NATURAL.”
  • Outer join type 230 is an embodiment of outer join types 212 and 214. Following outer join type 212 or outer join type 214 comes the keyword JOIN.
  • partitioned table 208a or table reference 208b follow the JOLN keyword.
  • partitioned table 210a or table reference 210b follows the JOIN keyword. Since in a NATURAL OUTER JOLN no conditions are applied, on branch 204b, the OUTER JOIN clause ends with either the partitioned table 210a or table reference 210b. Following partitioned table 208a or table reference 208b is either keyword "ON" for specifying a join condition or "USING" for specifying a join key. Similar to existing ANSI join operators, the PARTITIONED OUTER JOLN allows complex join conditions in which the join condition 216 (i.e., join_cond, following the "ON" keyword) may be an arbitrarily complex Boolean expression. The join_cond or join condition 216 can apply to columns of tables from either side of the join, including any column in the partitioning expressions of a partitioned table.
  • each expr 222 either is a column or an expression that evaluates to a column.
  • the result of the join is a UNION of the results from applying an OUTER JOLN to each of the partitions.
  • the partitioning expression takes a value that identifies the corresponding partitioned table.
  • PARTITIONED OUTER JOLN reverts to a standard OUTER JOIN.
  • each of the above examples performs the same densification of the SALES table.
  • the first example uses a right outer join
  • the second example uses a left OUTER JOLN.
  • the positions of the SALES fact table and the TLME dimension table are reversed in relation to the "OUTER JOLN" keywords so that the result is the same.
  • the join key is specified by name, which is equivalent to the equijoin condition of the first two examples.
  • the common columns of the TIMES and SALES tables are used as a join keys. If the TIMES and SALES table have only one common column (timejd), for example, the results will be the same as the statements in the three statements above.
  • the query Ql can be rewritten as Qljiew:
  • the Ql_new query just requires a single PARTITIONED OUTER JOLN, thereby simplifying the coding required to perform the densification as compared to the code of
  • a database server executes one or more routines to perform the partitioned outer join operation.
  • routines may be used by the database server to perform the partitioned outer join.
  • a database server includes routines for each of a plurality of the techniques, and then chooses which technique is most appropriate for each statement that contains a partitioned outer join operation.
  • the method chosen for a given query depends upon which method will be more efficient for the performing the query.
  • FIG. 3 is a flowchart showing method 300, which will be referred to a Nested
  • the nested loop (e.g., do-loops) is a code including at least two loops in a nested relationship. For example, the outermost loop iterates across one set of values, and a nested loop iterates across a second set of values for each value in the first set of values.
  • the one or more of the outermost loops of the nested loop may correspond to the partitioning dimensions.
  • Each of the innermost loops may correspond to a different densifying dimension.
  • Each loop has an index. During each iteration of a loop, the loop index is set to the next value.
  • the loops corresponding to the partitioning dimensions will be referred to as the partitioning loops, and the corresponding loop indexes will be referred to as the partitioning indexes.
  • the loops corresponding to the densifying dimensions will be referred to as the densifying loops and the corresponding loop indexes will be referred to as the densifying indexes.
  • the index of a loop that corresponds to a particular dimension will be assigned a new value from that dimension during each iteration of the loop.
  • the index values of a partitioning loop correspond to the distinct dimension values of the partitioning dimension. For example, if the table has three rows, and each of the three rows has the same dimension value for the partitioning dimension, then the partitioning loop executes only once for that particular dimension value.
  • the index values of the densifying loops correspond to the dimension values of the densifying dimensions.
  • the index value of the loop corresponds to a unique dimension value.
  • an "if statement checks if an entry (e.g., a row) exists in the data (e.g., a target table) having the dimension value combination corresponding to the current index value combination of the loops. If the entry exists, it is added to a set of output data. If the entry does not exist for the dimension value, a null valued entry is created having the dimension value combination corresponding to the index value combination.
  • the following algorithm may be used for the underlying code of the Ql jiew query, in which the PRODUCT dimension is used for the partitioning dimension and the TLME dimension is used for the densifying dimension.
  • index I uses distinct values of the Product dimension of the SALES table
  • index II performs a full scan of the Times dimension of the TIMES table or uses an index for the Times dimension that was already setup.
  • the nested loop join uses an available index as its loop index and may be used with a nonequijoin. If an index does not exist for use as the loop index, it is possible to set up indexes for the purpose of being able to perform a nested loop outer join or sort merge outer join. However, then the computing costs associated with setting up the indexes, may make the partitioned nested loop less efficient than the other methods of performing a partitioned outer join.
  • the index (index I) of the partitioning loop is initially set to the first distinct value of the Product dimension.
  • index I is set to a current value, which may be the next distinct dimension value of the Product dimension.
  • a method of obtaining the values of the partitioning dimension for index I is discussed below.
  • the index of the next loop, index TL is set to a current value, which may be the next distinct dimension value of the index with respect to which the fact table is being densified.
  • partitioning and/or densifying loops each corresponding to different partitioning dimension and/or densifying dimension, respectively. If the loop used for partitioning the data corresponds to a dimension with respect to which the data is fully dense, then the corresponding loop will have an index value corresponding to each of the dimension values.
  • step 306 a determination is made (e.g., via an "if statement) whether there is a row having the dimension value combination corresponding to the combination of current index values. Alternatively, for a non-equijoin, a determination is made whether the dimension combination of the present row and the index value combination satisfy the join conditions (e.g., condition 216 (FIG. 2C) or the join_cond). If no such row exists (if the join conditions are not satisfied), method 300 proceeds to step 308 where the row is created having a null value for its measures. After step 308, method 300 proceeds to step 310. Returning to the discussion of step 306, if the row does exist, then method 300 proceeds to step 310 (without performing step 308).
  • the join conditions e.g., condition 216 (FIG. 2C) or the join_cond
  • step 310 the row found in step 306 or created in step 308 is added to the final result (or output table) that will be presented.
  • step 312 a determination is made as to whether there are anymore index values for the loop having index II. If there are other index II values, method 300 returns to step 304 to begin another execution of the loop having index II. If there are no more index LL values, method 300 proceeds to step 314 to check whether there are any more index I values. If there are more index I values, method 300 proceeds to step 302 to begin another execution of the loop having the I index. If there are no more index I values, method 300 terminates. As indicated by the dots between steps 310 and 312, there may be any number of steps similar to steps 312 and 314, in which each step corresponds to a different loop and each loop corresponds to a different densifying or partitioning dimension.
  • the partitioning index, index I cycles through distinct dimension values I. Additionally, for a particular value of index I, a loop is executed for all of the values II of the densifying dimension, while applying the "if statement for each value II, which thereby densities the partition of the fact table having partitioning dimension value I.
  • the "if statement may be replaced with any statement or group of statements that makes a determination whether the row corresponding the current index value combination exists.
  • the outer index may be based on the PRODUCTS dimension.
  • the index of the inner loop is the entire set of timejds.
  • a skip scan is used for the partitioning index (e.g., index I) even though the partitioning dimension is not a leading column of the composite index.
  • the partitioning dimension of the composite index is accessed for each value of the one or more leading columns.
  • other entries in the index having that value are skipped, until the next highest value is found.
  • all index values already found are skipped. Repeating this process, for each distinct value of the partitioning index may be found.
  • the partitioning loop After finding a new distinct value the partitioning index the partitioning loop is executed.
  • the skip scan may be quicker than a full table scan for some sets for data is because the index is organized in a manner that keeps track of the sequence of the dimension values. The organization of the index values may be taken advantage to determine index combinations that can be skipped. Additionally, the size of the disc space that stores the index is smaller than the size of the disc space upon which the full table is stored, and the smaller disc space has a faster access time.
  • a dimension may be used as an index that would otherwise require a table scan (e.g., the DISTINCT construct) to find all rows corresponding to any given distinct value of the dimension.
  • a table scan e.g., the DISTINCT construct
  • the PARTITIONED OUTER JOLN can be computed by getting, via a skip scan, all the distinct prodjd values using the index. Then, the tuple consisting of each prodjd value and timejd, where the timejd value from the TIMES table as index II, is used for scanning the SALES table. For each combination of index values I and II, the statements within the two nested loops determine if there exists a matching row with corresponding prodjd and timejd values, and if a matching row, is found it returned. Otherwise, if no matching is returned, a dummy row is generated. SPLITTING THE FACT TABLE LNTO PARTITIONS
  • FIG. 4 shows a flowchart for a second method, method 400, of performing a PARTITIONED OUTER JOIN, in which the target table is split into partitions and then each partition is OUTER JOINED, as described below.
  • the splitting of the fact table into partitions does not rely on equijoins or indexes and therefore may be more flexible than the other two methods, but may also be less efficient in some situations.
  • a target table is sorted on a dimension other than the dimension with respect to which the data is being densified.
  • the target table may be a fact table or any other table.
  • the target table may be a virtual table in which the rows are produced by an expression of database statements.
  • the sorting of the target table has the effect of splitting the target table into partitions in which each partition corresponds to a different dimension value (e.g., prodjd) upon which the target table was sorted.
  • Method 400 requires support for detecting and handling the ends of the partitions. Therefore, the dimension value of the partitioning dimension of a row may be compared against the current partition value, to determine if the partition boundary has been passed.
  • each partition is OUTER JOLNed with the dense dimension table (e.g. the TIMES table).
  • the dense dimension table e.g. the TIMES table
  • Outer joining the partitions with the dense dimension involves taking a row of the dense dimension table (e.g. a TIMES row), and then, matching it with rows in each partition. If there is no match, a dummy row corresponding to the given dimension value combination (e.g., prodjd, timejd) is created.
  • the densification by splitting the target table of method 400 is flexible, and may be used when there are non-equiconditions (i.e., conditions that are not equality) in the ON clause. In addition to being useable for nonequjoins, method 400 does not require use of an index.
  • method 400 may be used to densify with respect to multiple dimensions.
  • the output of step 404 is also partitioned. Consequently, using the output of the last application of step 404, and repeating the step 404 with respect to a second dimension will densify the data with respect to the second dimension.
  • step 404 may be repeated any number of times to densify with respect to multiple dimensions.
  • FIG. 5 is a flowchart of a third method, method 500, of performing the PARTITIONED OUTER JOLN, which may be referred to as Sort Merge Partitioned Outer Join.
  • a sort merge join within each partition is performed, which is described below.
  • the sort merge join does not require an index, and also uses an equijoin.
  • the target table e.g., the SALES table
  • the target table is sorted with respect to all of its dimensions, (e.g., prodjd, timejd).
  • the densifying dimension is sorted last so that the data is divided into partitions in which in each partition there is no more that one dimension value for each of the dimension values of the densifying dimension. All other dimensions become partitioning dimensions.
  • the sorting of all of the partitioning dimensions has the effect of partitiomng the target table.
  • each partition of target table e.g., SALES table
  • SALES table corresponds to a different combination of dimension values. In other words, there will be one partition for each unique combination of partitioning dimension values. Additionally, each partition will only have one combination of partitioning dimension values.
  • the SALES table has dimensions TIMES having values of timejd, PRODUCTS having values prodjd, and REGION having values regnjd.
  • the densifying dimension table (e.g., the TIMES table) for the densifying dimension is sorted on its dimension values (e.g., timejd).
  • each row within each partition of the target table e.g., of the SALES table
  • sorted densifying dimension table e.g. the TIMES table.
  • the column of the sorted target table that includes the densification dimension may be used as a key when joining the rows of the target table to the dimension table, which may be referred to as a join key. If a match is not found, a dummy row is returned. Since all dimensions have been sorted, all dimension values are ordered.
  • the end of a partition may be detected by detecting a change in a dimension value combination having one dimension value for each of the partitioning dimensions.
  • the loop for the densifying dimension values is repeated for the next partition.
  • the joining of the individual rows of the target table to the dimension being densified may be performed by placing an "if statement" or "if block” in a loop having an index corresponding to the densifying dimension.
  • a cursor may be advanced through the densifying dimension table to the next row of the densifying dimension table after each check of the join condition on a row of the target table (e.g., a SALES table).
  • the cursor for the join key index is moved to the next row every time the equality condition of an equijoin is met, which is every time the dimension key of the densifying dimension table is equal to the join key of the sorted target table.
  • the cursor for the join key is advanced to the next row every time the index is equal to the join key.
  • FIG. 6 is a flowchart of a method 600 of the operations of an optimizer that may be included in a relational database management system capable of running the PARTITIONED OUTER JOLN.
  • an optimizer is included in the relational database management system running the PARTITIONED OUTER JOLN.
  • the optimizer decides between various methods of performing a variety of different operations based on the computing cost of each method for the set of tables and the statements on which the operations are being performed.
  • the optimizer decides upon a method for performing a PARTITIONED OUTER JOLN based on which method is most efficient.
  • step 602 the optimizer may first check if there are indexes available for the dimensions of the target table, if a partitioning or densifying dimension does not have an available index, method 300 cannot be used, and method 600 proceeds to step 604.
  • step 604 a determination is made whether a nonequijoin is present. If a nonequijoin is present, then method 500 cannot be used, so method 600 proceeds to step 606 where method 400 implements the splitting of the table method of FIG. 4.
  • step 608 if there are no nonequijoins, method 600 proceeds to step 608, where a decision is made whether method 400 or method 500 is expected to be more efficient. If method 400 is expected to be more efficient, then method 600 proceed to step 606. If method 500 is expected to be more efficient, then the method proceeds to step 610, where the Sort Merge Outer Join of method 500 is implemented.
  • step 612 a determination is made whether there are any nonequjoins present. If a nonequijoin is present, then method 500 cannot be used, so method 600 proceeds to step 614, where a determination is made whether method 300, 400, or 500 is expected to be more efficient. If method 500 is expected to be more efficient, method 600 proceeds to step 610. If method 400 is expected to be more efficient, method 600 proceeds to step 606. If method 300 is expected to be more efficient, method 600 proceeds to step 616, where the nested loop outer join of method 300 is implemented.
  • step 618 a determination is made whether methods 300 or 400 are expected to more efficient. If method 300 is expected to be more efficient, method 600 proceeds to step 616 where method 300 is implemented. If method 400 is determined to be more efficient, then method 600 proceed to step 620 where method 400 is implemented.
  • the optimizer may choose between other methods of performing the PARTITONED OUTER JOLN in addition to or instead of any one of or any combination of methods 300, 400, and 500.
  • only one of methods 300, 400, or 500 of performing the PARTIONED OUTER JOLN is included or only two of methods 300, 400, or 500 are included in the relational database, and if the one or the two methods included cannot be applied, the PARTIONED OUTER JOLN is not performed.
  • OPTIMIZATIONS AND ENHANCEMENT FOR PARTITIONED OUTER JOLN [0092] Optimizations normally used with JOLNs and OUTER JOLNs can also be used with the PARTITIONED OUTER JOLN. For example, predicate pushing, and partition pruning may also be used with the PARTITIONED OUTER JOLN.
  • predicate pushing only those predicates that are defined over the partitioning expressions should be pushed down or up for a view.
  • the program applies that predicate while performing the initial calculations so that unnecessary calculations related to dimension values that are not of interest are not performed.
  • the predicate is applied at one or more points in the program that minimize or at least reduce the amount of calculations that need to be performed to achieve the same result.
  • the predicate, prodjd in (1, 2, 3), can be pushed into the table scan of the SALES table as a filter so that only products 1, 2, and 3 of the SALES table are joined with the TIMES table. Consequently, computations will not be performed on other products of the SALES table.
  • the predicates of an outer query block may be used for pruning an inner query block that is a partitioned operand of PARTITIONED OUTER JOIN. If a portion or partition of the table is not relevant to performing a portion of the calculations, the calculations are not performed on that partition.
  • the example of predicate pushing is also an example of pruning, because all parts of the table that were not relevant to products 1, 2, or 3 were removed from the data under consideration before doing any manipulations on the data of the target table.
  • the optimizer is also enhanced for estimating the cost and cardinality with PARTITIONED OUTER JOIN. In an embodiment, cost-based optimizations are supported for the PARTITIONED OUTER JOLN.
  • Partitioned outer Join execution scheme described above is scalable in that each PARTITIONED OUTER JOLN computation is delegated to a set of slaves such that they can each perform the join operation independently of other slaves.
  • a slave refers to any entity that is capable of processing instructions in parallel to other such entities.
  • a slave may be another processor, process, or thread.
  • the densifying dimension table of the JOLN is broadcast to all of the slaves.
  • the partitioned target table of the JOLN operation may be hash or range partitioned across the processes and/or slaves.
  • the first processor may receive the partitions of the SALES table corresponding to product LDs 1 and 2
  • the second processor may receive a copy of the partitions corresponding to product LDs 3 and 4
  • the third processor may receive the portion of the product table corresponding to product ID 5
  • the fourth processor may receive the portion of the SALES table corresponding to product LD 6.
  • each of processors 1-6 will receive the entire TIMES table. Consequently, each slave and/or process has access to the entire dimension table and the partitions of the target table that it needs to perform the PARTITIONED OUTER JOLN operation for its slice of the partitioned table.
  • the first processor may first OUTER JOLN the partition of product LD # 1 and then may OUTER JOLN the partition of product LD # 2.
  • the second processor may first OUTER JOIN the partition of product LD # 3 and then may OUTER JOIN the partition of LD # 4.
  • the third processor may OUTER JOLN the product LD # 5 partition.
  • the fourth processor may OUTER JOLN the product LD # 6 partition.
  • the non-partitioned dimension table "TIMES” may be broadcast to all of the slaves, and the partitioned target table "SALES" may be hash or range partitioned based on the partitioning columns (e.g., prodjd).
  • a different slave may work on each prodjd into which the table is partitioned.
  • Each slave has access to the entire TIMES table and some partitions of the SALES table identified by the prodjd.
  • Each slave can therefore perform its portion of the PARTITIONED OUTER JOIN independent of other slaves.
  • the partitioning scheme used to partition the partitioned table on the partitioning columns may be hash or range partitioned.
  • a construct may be used that is tailored for desifying.
  • a construct for desifying may have the following syntax: DENSIFY (table_reference[, table reference]%) BY (densifying_expr[, densifying_expr] %) USLNG (dimension_expr[, dimension_expr] %)
  • the densifying construct may have the following syntax:
  • DENSIFY (tablejreference[, table reference] ..., dimension_expr[, dimension_expr] 10) BY (densifying_expr[, densifying _expr] 10)
  • tablejreference is a reference to the target table
  • dimension_expr is a column or an expression that evaluates to a virtual column that is used as a dimension, where a virtual column is a set of numbers having one number for each row of the table.
  • the densifying sxpr is the dimension or virtual dimension with respect to which the data is densified.
  • the dimension_expr need not actually be used for partitioning the data, but is used as a nondesifying dimension, when densifying.
  • the table produced by the densifying construct will have one row for the cross product of each value of the densifying dimension and each distinct value of the partitioning dimension.
  • any number of tables having the same dimensions may be denisfied by the same statement, the tables may have any number of densifying dimensions, and any number of other dimensions, as long as there is at least one other dimension in addition to the densifying dimension.
  • SALES table includes only tuples (regnjd, prodjd, timejd) for dimension value combinations (1, 1, 1) and (1, 2, 1)
  • the TLME table includes timejds 1 and 2.
  • DENSIFY (SALES) BY (timejd) USLNG (prodjd) will result in a SALES table having rows for dimension value combinations (1, 1, 1), (1, 2, 1), (1, 1, 2) and (1, 2, 2).
  • DENSIFY (SALES, regnjd) BY (timejd) will result in a SALES table having rows for dimension value combinations (1, 1, 1), (1, 2, 1), and (1, 1, 2).
  • FIG. 7 is a block diagram that illustrates a computer system 700 upon which an embodiment of the invention may be implemented.
  • the invention may be implemented on many different types of machines.
  • Computer system 700 is just one example of such a machine.
  • Computer system 700 includes a bus 702 or other communication mechanism for communicating information, and a processor 704 coupled with bus 702 for processing information.
  • Computer system 700 also includes a main memory 706, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 702 for storing information and instructions to be executed by processor 704.
  • Main memory 706 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 704.
  • Computer system 700 further includes a read only memory (ROM) 708 or other static storage device coupled to bus 702 for storing static information and instructions for processor 704.
  • ROM read only memory
  • a storage device 710 such as a magnetic disk or optical disk, is provided and coupled to bus 702 for storing information and instructions.
  • Computer system 700 may be coupled via bus 702 to a display 712, such as a cathode ray tube (CRT), for displaying information to a computer user.
  • An input device 714 is coupled to bus 702 for communicating information and command selections to processor 704.
  • cursor control 716 such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 704 and for controlling cursor movement on display 712.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • the invention is related to the use of computer system 700 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 700 in response to processor 704 executing one or more sequences of one or more instructions contained in main memory 706. Such instructions may be read into main memory 706 from another computer-readable medium, such as storage device 710. Execution of the sequences of instructions contained in main memory 706 causes processor 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
  • Computer-readable medium refers to any medium that participates in providing instructions to processor 704 for execution. Since compute system 700 is just one example of a machine, and computer readable media is just one example of a "machine-readable medium.” Such a medium may take many forms, including but not limited to, non- volatile media, volatile media, and transmission media.
  • Non- volatile media includes, for example, optical or magnetic disks, such as storage device 710.
  • Volatile media includes dynamic memory, such as main memory 706.
  • Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 702. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD- ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 704 for execution.
  • the instructions may initially be carried on a magnetic disk of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to computer system 700 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 702.
  • Bus 702 carries the data to main memory 706, from which processor 704 retrieves and executes the instructions.
  • main memory 706 may optionally be stored on storage device 710 either before or after execution by processor 704.
  • instructions for implementing the construct for a partition or for densifying data may be stored in main memory to 706 and/or carried by any of the computer-readable media described herein.
  • Computer system 700 also includes a communication interface 718 coupled to bus 702.
  • Communication interface 718 provides a two-way data communication coupling to a network link 720 that is connected to a local network 722.
  • communication interface 718 maybe an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
  • ISDN integrated services digital network
  • communication interface 718 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • LAN local area network
  • Wireless links may also be implemented.
  • communication interface 718 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 720 typically provides data communication through one or more networks to other data devices.
  • network link 720 may provide a connection through local network 722 to a host computer 724 or to data equipment operated by an Internet Service Provider (ISP) 726.
  • ISP 726 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 728.
  • Internet 728 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link 720 and through communication interface 718, which carry the digital data to and from computer system 700, are exemplary forms of carrier waves transporting the information.
  • Computer system 700 can send messages and receive data, including program code, through the network(s), network link 720 and communication interface 718.
  • a server 730 might transmit a requested code for an application program through Internet 728, ISP 726, local network 722 and communication interface 718.
  • the received code may be executed by processor 704 as it is received, and/or stored in storage device 710, or other non- volatile storage for later execution. In this manner, computer system 700 may obtain application code in the form of a carrier wave.
  • the specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/US2004/027406 2003-08-22 2004-08-19 Dml statements for densifying data in a relational database system Ceased WO2005020105A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP04781985A EP1658572A1 (en) 2003-08-22 2004-08-19 Dml statements for densifying data in a relational database system
JP2006524117A JP4747094B2 (ja) 2003-08-22 2004-08-19 リレーショナルデータベースシステムでデータを緻密化するためのdmlステートメント
CA002534788A CA2534788C (en) 2003-08-22 2004-08-19 Dml statements for densifying data in a relational database system
AU2004267850A AU2004267850B2 (en) 2003-08-22 2004-08-19 DML statements for densifying data in a relational database system

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US49711103P 2003-08-22 2003-08-22
US60/497,111 2003-08-22
US49907803P 2003-08-28 2003-08-28
US60/499,078 2003-08-28
US10/796,217 US7356542B2 (en) 2003-08-22 2004-03-08 DML statements for densifying data
US10/796,217 2004-03-08

Publications (1)

Publication Number Publication Date
WO2005020105A1 true WO2005020105A1 (en) 2005-03-03

Family

ID=34198989

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/027406 Ceased WO2005020105A1 (en) 2003-08-22 2004-08-19 Dml statements for densifying data in a relational database system

Country Status (6)

Country Link
US (1) US7356542B2 (https=)
EP (1) EP1658572A1 (https=)
JP (1) JP4747094B2 (https=)
AU (1) AU2004267850B2 (https=)
CA (1) CA2534788C (https=)
WO (1) WO2005020105A1 (https=)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356542B2 (en) 2003-08-22 2008-04-08 Oracle International Corporation DML statements for densifying data

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7539667B2 (en) * 2004-11-05 2009-05-26 International Business Machines Corporation Method, system and program for executing a query having a union operator
US8645313B1 (en) * 2005-05-27 2014-02-04 Microstrategy, Inc. Systems and methods for enhanced SQL indices for duplicate row entries
US20070094233A1 (en) * 2005-10-24 2007-04-26 Wolfgang Otter Translating time-independent data using database operations
US8027969B2 (en) * 2005-12-29 2011-09-27 Sap Ag Efficient calculation of sets of distinct results in an information retrieval service
US20090063527A1 (en) * 2007-08-31 2009-03-05 International Business Machines Corporation Processing of database statements with join predicates on range-partitioned tables
IL195956A0 (en) 2008-12-15 2009-09-01 Hyperroll Israel Ltd Automatic data store architecture detection
IL197961A0 (en) * 2009-04-05 2009-12-24 Guy Shaked Methods for effective processing of time series
US8370316B2 (en) 2010-07-12 2013-02-05 Sap Ag Hash-join in parallel computation environments
US8880503B2 (en) * 2011-06-21 2014-11-04 Microsoft Corporation Value-based positioning for outer join queries
US8468150B2 (en) 2011-10-31 2013-06-18 International Business Machines Corporation Accommodating gaps in database index scans
US8996544B2 (en) 2012-09-28 2015-03-31 Oracle International Corporation Pruning disk blocks of a clustered table in a relational database management system
US9430550B2 (en) 2012-09-28 2016-08-30 Oracle International Corporation Clustering a table in a relational database management system
US9507825B2 (en) 2012-09-28 2016-11-29 Oracle International Corporation Techniques for partition pruning based on aggregated zone map information
US9275111B2 (en) 2013-03-15 2016-03-01 International Business Machines Corporation Minimizing result set size when converting from asymmetric to symmetric requests
US10642837B2 (en) 2013-03-15 2020-05-05 Oracle International Corporation Relocating derived cache during data rebalance to maintain application performance
US9892158B2 (en) * 2014-01-31 2018-02-13 International Business Machines Corporation Dynamically adjust duplicate skipping method for increased performance
US10387395B2 (en) 2014-11-03 2019-08-20 Sap Se Parallelized execution of window operator
US9852184B2 (en) * 2014-11-03 2017-12-26 Sap Se Partition-aware distributed execution of window operator
US10157193B2 (en) 2016-03-03 2018-12-18 International Business Machines Corporation Switching between a non-partitioned hash join and a partitioned hash join based on an amount of available memory
US10776349B2 (en) * 2017-01-31 2020-09-15 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing dynamic macros within a multi-tenant aware structured query language
US10803062B2 (en) * 2017-01-31 2020-10-13 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing a by partition command term within a multi-tenant aware structured query language
US11086876B2 (en) 2017-09-29 2021-08-10 Oracle International Corporation Storing derived summaries on persistent memory of a storage device
US11954605B2 (en) * 2020-09-25 2024-04-09 Sap Se Systems and methods for intelligent labeling of instance data clusters based on knowledge graph

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5557791A (en) * 1991-08-23 1996-09-17 International Business Machines Corporation Outer join operations using responsibility regions assigned to inner tables in a relational database
US6112198A (en) * 1997-06-30 2000-08-29 International Business Machines Corporation Optimization of data repartitioning during parallel query optimization
US6397214B1 (en) * 1998-11-03 2002-05-28 Computer Associates Think, Inc. Method and apparatus for instantiating records with missing data

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5423035A (en) * 1992-12-23 1995-06-06 Hughes Aircraft Company Method for evaluating relational database queries with automatic indexing and ordering of join components
US6073134A (en) * 1997-05-29 2000-06-06 Oracle Corporation Method article of manufacture, and apparatus for generating a multi-dimensional record management index
US5905985A (en) * 1997-06-30 1999-05-18 International Business Machines Corporation Relational database modifications based on multi-dimensional database modifications
US6298342B1 (en) * 1998-03-16 2001-10-02 Microsoft Corporation Electronic database operations for perspective transformations on relational tables using pivot and unpivot columns
US6625593B1 (en) * 1998-06-29 2003-09-23 International Business Machines Corporation Parallel query optimization strategies for replicated and partitioned tables
US5991754A (en) * 1998-12-28 1999-11-23 Oracle Corporation Rewriting a query in terms of a summary based on aggregate computability and canonical format, and when a dimension table is on the child side of an outer join
US6282544B1 (en) * 1999-05-24 2001-08-28 Computer Associates Think, Inc. Method and apparatus for populating multiple data marts in a single aggregation process
US6446063B1 (en) * 1999-06-25 2002-09-03 International Business Machines Corporation Method, system, and program for performing a join operation on a multi column table and satellite tables
US6397204B1 (en) * 1999-06-25 2002-05-28 International Business Machines Corporation Method, system, and program for determining the join ordering of tables in a join query
US6345272B1 (en) * 1999-07-27 2002-02-05 Oracle Corporation Rewriting queries to access materialized views that group along an ordered dimension
US6665663B2 (en) * 2001-03-15 2003-12-16 International Business Machines Corporation Outerjoin and antijoin reordering using extended eligibility lists
US6606621B2 (en) * 2001-05-30 2003-08-12 Oracle International Corp. Methods and apparatus for aggregating sparse data
US7356542B2 (en) 2003-08-22 2008-04-08 Oracle International Corporation DML statements for densifying data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5557791A (en) * 1991-08-23 1996-09-17 International Business Machines Corporation Outer join operations using responsibility regions assigned to inner tables in a relational database
US6112198A (en) * 1997-06-30 2000-08-29 International Business Machines Corporation Optimization of data repartitioning during parallel query optimization
US6397214B1 (en) * 1998-11-03 2002-05-28 Computer Associates Think, Inc. Method and apparatus for instantiating records with missing data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CHAUDHURI S ED - ASSOCIATION FOR COMPUTING MACHINERY: "AN OVERVIEW OF QUERY OPTIMIZATION IN RELATIONAL SYSTEMS", PROCEEDINGS OF THE 17TH. ACM SIGACT-SIGMOD-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS. PODS 1998. SEATTLE, WA, JUNE 1 -3, 1998, PROCEEDINGS OF THE ACM SIGACT-SIGMOD-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, NEW YORK, NY : ACM, US, 1 June 1998 (1998-06-01), pages 34 - 43, XP000782631, ISBN: 0-89791-996-3 *
CHAUDHURI S ET AL: "An Overview of Data Warehousing and OLAP Technology", SIGMOD RECORD, SIGMOD, NEW YORK, NY, US, vol. 26, no. 1, March 1997 (1997-03-01), pages 65 - 74, XP002193792, ISSN: 0163-5808 *
GRAEFE G: "Query evaluation techniques for large databases", ACM COMPUTING SURVEYS USA, vol. 25, no. 2, June 1993 (1993-06-01), pages 73 - 170, XP002306867, ISSN: 0360-0300 *
GUPTA, A. ET AL: "Data Densification in a Relational Database System", PROCEEDINGS OF THE ACM SIGMOD 2004 CONFERENCE, 13 June 2004 (2004-06-13), USA, XP002306866 *
JAEDICKE, M. ET AL: "User-Defined table Operators: Enhancing Extensibility in ORDBMS", PROCEEDINGS 25TH VLDB CONFERENCE, EDINBURGH, SCOTLAND, 7 September 1999 (1999-09-07), USA, pages 494 - 505, XP002306868 *
WITKOWSKI, A. ET AL: "Spreadsheets in RDBMS for OLAP", PROCEEDINGS OF THE ACM SIGMOD 2003 CONFERENCE, 9 June 2003 (2003-06-09), USA, pages 52 - 63, XP002306865 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356542B2 (en) 2003-08-22 2008-04-08 Oracle International Corporation DML statements for densifying data

Also Published As

Publication number Publication date
JP4747094B2 (ja) 2011-08-10
US7356542B2 (en) 2008-04-08
AU2004267850B2 (en) 2010-04-08
US20050044102A1 (en) 2005-02-24
CA2534788A1 (en) 2005-03-03
JP2007534035A (ja) 2007-11-22
AU2004267850A1 (en) 2005-03-03
EP1658572A1 (en) 2006-05-24
CA2534788C (en) 2009-12-08

Similar Documents

Publication Publication Date Title
AU2004267850B2 (en) DML statements for densifying data in a relational database system
US10572484B2 (en) Duplicate reduction or elimination with hash join operations
US6792420B2 (en) Method, system, and program for optimizing the processing of queries involving set operators
US7246108B2 (en) Reusing optimized query blocks in query processing
US8620888B2 (en) Partitioning in virtual columns
US6397204B1 (en) Method, system, and program for determining the join ordering of tables in a join query
US6965891B1 (en) Method and mechanism for partition pruning
US7171427B2 (en) Methods of navigating a cube that is implemented as a relational object
US6957225B1 (en) Automatic discovery and use of column correlations in tables
US9390115B2 (en) Tables with unlimited number of sparse columns and techniques for an efficient implementation
US5797136A (en) Optional quantifiers in relational and object-oriented views of database systems
US7814042B2 (en) Selecting candidate queries
US6947934B1 (en) Aggregate predicates and search in a database management system
CA2327167C (en) Method and system for composing a query for a database and traversing the database
US20070061287A1 (en) Method, apparatus and program storage device for optimizing a data warehouse model and operation
WO2015042070A1 (en) Densely grouping dimensional data
Marroquín et al. Pay one, get hundreds for free: Reducing cloud costs through shared query execution
US6253197B1 (en) System and method for hash loops join of data using outer join and early-out join
US9268817B2 (en) Efficient evaluation of hierarchical cubes by non-blocking rollups and skipping levels
US20060161525A1 (en) Method and system for supporting structured aggregation operations on semi-structured data
Wi et al. Towards multi-way join aware optimizer in SAP HANA
CN100547584C (zh) 在关系数据库系统中用于使数据稠密的dml语句
US12271378B2 (en) Ubiquitous search on data stored in the RDBMS
Schneider et al. SimDataMapper: An Architectural Pattern to Integrate Declarative Similarity Matching into Database Applications.
WO2024233051A2 (en) Ubiquitous search on data stored in the rdbms

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200480024005.2

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2534788

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2004267850

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 2004781985

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2006524117

Country of ref document: JP

Ref document number: 620/CHENP/2006

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 2004267850

Country of ref document: AU

Date of ref document: 20040819

Kind code of ref document: A

WWP Wipo information: published in national office

Ref document number: 2004267850

Country of ref document: AU

DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
WWP Wipo information: published in national office

Ref document number: 2004781985

Country of ref document: EP