EP0532727A1 - Verfahren um einen rekursiven suchbefehl auf eine datenbank auszuwerten - Google Patents

Verfahren um einen rekursiven suchbefehl auf eine datenbank auszuwerten

Info

Publication number
EP0532727A1
EP0532727A1 EP92907452A EP92907452A EP0532727A1 EP 0532727 A1 EP0532727 A1 EP 0532727A1 EP 92907452 A EP92907452 A EP 92907452A EP 92907452 A EP92907452 A EP 92907452A EP 0532727 A1 EP0532727 A1 EP 0532727A1
Authority
EP
European Patent Office
Prior art keywords
fixpoint
recursive
query
relation
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP92907452A
Other languages
English (en)
French (fr)
Inventor
Ming-Chien Shan
Marie-Anne Neimat
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HP Inc
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Co filed Critical Hewlett Packard Co
Publication of EP0532727A1 publication Critical patent/EP0532727A1/de
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24537Query rewriting; Transformation of operators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • G06F16/24566Recursive queries

Definitions

  • the present invention relates generally to database systems and more particularly to a method of evaluating a recursive query of a database.
  • Database systems are being used to store and manage more and more different kinds of data. As the use of database systems has expanded and the quantities of data stored in a database have increased, much effort has been devoted to improving existing database systems and developing new systems with new and better capabilities.
  • Each table may be thought of as specifying a "relation" among the data in that table; therefore, each table is referred to as a "relation”.
  • Each row in a relation may be thought of as one data record. The rows are referred to as "tuples”.
  • a database means to request information from it.
  • a query means to obtain the requested information. Sometimes the requested information can be obtained directly by looking it up in one of the relations. If the requested information does not appear in any of the relations, it must be derived, for example by comparing two or more tuples in a relation or by combining tuples from two different relations. The following four examples will help to illustrate these concepts.
  • PNAME is the name of a parent of a person named NAME.
  • PARENT Andrew, William
  • tuple which says that William is a parent of Andrew.
  • FNAME is the name of a friend of a person named NAME.
  • NAME is the name of a person and SEX is the sex of that person.
  • SEX is the sex of that person.
  • Representative PARENT, FRIEND and PERSON relations are depicted in Tables I through III, respectively.
  • a query of the form "FIND PARENTS OF [X]" is an example of a request for information that is directly obtainable by looking up data in this particular database.
  • the response to such a query would be the names of the parents of X.
  • a query of the form "FIND MOTHER OF [X]" is an example of a request for information that cannot be obtained by looking it up.
  • the PARENT relation does not include the sex of the parents and, unlike a human, a computer does not know, for example, that "William” would ordinarily be a father and "Mary" a mother. Therefore, the requested information must be derived from information in the database, for example by (1) finding the parents of X in the PARENT relation, (2) finding the sex of each of those parents in the PERSON relation, and (3) selecting the female parent.
  • the response to such a query would be the name of the mother of X.
  • SUPPLIER contains data arranged in the form
  • SUPPLIER (PART, SUPPLIER, CITY)
  • SUPPLIER represents a name of a source of that part
  • CITY represents the location of that supplier.
  • SUPPLIER Naeedle Valve, Paul's Plumbing Mfg. Co., Pittsburgh
  • SPART represents the name of a part that requires a subpart
  • SSUBPART represents the name of that subpart
  • SQTY represents how many of that subpart are used in that part.
  • a query of the form "FIND SUPPLIERS OF [X]" is an example of a request for information that is directly obtainable by retrieving data from this particular database.
  • the response to such a query would be the names of all suppliers of part X.
  • a query of the form "HOW MANY SUPPLIERS SUPPLY [X]" is an example of a request for information that is not directly obtainable but that can be derived from information in the database, for example by finding all suppliers of X and then counting how many supplier names are found. The response to such a query would be the number of suppliers that supply part X.
  • FLIGHT FLIGHT (FROM, TO, DISTANCE, DTIME, ATIME, COST)
  • FROM is the name of a departure city
  • TO is the name of an arrival city
  • DISTANCE is the mileage between those two cities
  • DTIME is the time of departure
  • ATIME is the time of arrival
  • COST is the cost of a ticket on that flight.
  • a query of the form "FIND CHEAPEST FLIGHT BETWEEN [X] and [Y]" is an example of a request for information that cannot be obtained merely by retrieving data from this database but that can be derived from information in the database, for example by finding the fares of all flights between city X and city Y and then comparing the various fares to find which is lowest.
  • the response to such a query would be the flight number of the cheapest flight between X and Y.
  • VACCINATED is a yes-or-no entry indicating whether the person has been vaccinated against the virus.
  • Representative ENCOUNTER and PERSON relations are depicted in Tables V and VI, respectively.
  • a query that requests the names of all women who had encounters with a certain man is an example of a request for information that can be retrieved directly.
  • a query that seeks the names of all vaccinated women who had encounters with a certain man is an example of a request for information that can be derived by (1) retrieving the names of all women who had encounters with that man according to the ENCOUNTER relation and (2) checking the names of each of those women according to the PERSON relation to find out which of them have been vaccinated.
  • relational algebra A set of relational operators collectively referred to as "relational algebra" has been developed for use in optimizing the evaluation of a complicated query in a large database.
  • a query is translated into a relational algebra expression; the expression is simplified according to certain procedures; query plans for evaluating the simplified expression are generated; and the most efficient of these plans is selected and carried out to provide the desired response. See generally C. J. Date, An Introduction to Database Systems (4th Ed.) Vol. I, Addison-Wesley 1986, chapters 13 and 16, and references cited therein.
  • the relational algebra includes a set of operators. These operators can be compared with arithmetic operators such as " + " and " ⁇ ". Just as an arithmetic operator operates on one or two "input” numbers and provides a new "output” number, so each relational algebra operator takes one or two relations as "inputs” and provides a new relation as "output”.
  • a few examples of these operators are the SELECT, PROJECT, UNION, INTERSECTION, and JOIN operators.
  • the SELECT operator obtains specified rows from a relation. For example, a query seeking the names of Andrew's parents could be phrased as "SELECT entries respecting Andrew from the PARENT relation" (see Example 1 above). This SELECT operation would be represented in the relational algebra as
  • the PROJECT operator obtains specified columns from a relation. For example, a query seeking the names of all persons who have children could be phrased as
  • PROJECT PNAME entries from the PARENT relation This PROJECT operation would be represented in the relational algebra as
  • the UNION operator collects all the rows of each of two relations.
  • the UNION operation is represented in relational algebra as
  • a and B are relations.
  • PARENT U FRIEND in the historical database would provide a new relation containing parent-names and friend-names of everyone who has either a parent or a friend or both.
  • the NAME entries in the PARENT relation would be correlated with the NAME entries in the FRIEND relation; for example, the name Andrew in the PARENT relation would be considered to refer to the same person as the name Andrew in the FRIEND relation.
  • the INTERSECTION operator collects common rows of each of two relations. The INTERSECTION operation is represented in relational algebra as
  • PARENT D FRIEND in the historical database would provide a new relation containing parent-names and friend-names of those persons who have both a parent and a friend.
  • the JOIN operator combines rows from each of two relations according to a specified condition.
  • relational algebra the JOIN operation is represented as
  • a kind of database query which has grown more important in recent years is a recursive query.
  • Such a query can be described as a query which queries itself.
  • a recursive query can be evaluated only by deriving information recursively.
  • a general discussion of the mathematical concept of recursion can be found in J. Bradley,
  • information can be recursively derived from the information in the database, for example by a query of the form "FIND PARENTS OF [FIND PARENTS OF [X]]". If the number of iterations required to evaluate a recursive query is known in advance, then the evaluation process is relatively straight-forward. For example, the request to find the grandparents of X requires exactly two iterations ⁇ one to find the parents of X and one to find the parents of the parents of X. However, if the number of iterations is not known, then the evaluation becomes far more difficult; an example of a request in which the number of iterations is not known is a request to find all ancestors of X.
  • the relational algebra does not have recursion operators and hence cannot support recursive queries.
  • Some relatively simple recursive queries can be expressed in transitive closure form, and transitive closure operators have been proposed for use in translating such queries into relational algebra expressions (R. Agrawal, "Alpha: An Extension of Relational Algebra to Express a Class of Recursive Queries", Proceedings of the Third International Conference on Data Engineering, Los Angeles, California, February 3-5, 1987; S. Ceri et al., "Translation and Optimization of Logic Queries: the Algebraic Approach", Proceedings of the Eleventh International Conference on Very Large Data Bases, Kyoto, Japan, August 1986).
  • not all recursive queries can be expressed in transitive closure form. From the foregoing, it will be apparent that there is a need for a way to optimize recursive queries, especially those which cannot be expressed in transitive closure form, for efficient evaluation in large and complex database systems.
  • the present invention provides a method of evaluating a recursive query in a database system by translating the query into an expression that includes a novel fixpoint operator and using a novel set of transformation procedures to simplify the translated query.
  • a method of evaluating a recursive query includes the steps of translating the query into a relational algebra expression that includes a fixpoint operator, optimizing the expression according to a set of transformation procedures, and evaluating the optimized expression by reference to data in the database.
  • transformation procedures include commuting a projection operation with a fixpoint operation, commuting a selection operation with a fixpoint operation, distributing a join operation over a fixpoint operation, and regrouping a join operation and a fixpoint operation.
  • Regrouping means applying the commutation and association rules, typically to an expression having a fixpoint and several join operators.
  • the selection operation may be a selection predicate on a direct mapping column, a global selection predicate, or a selection predicate that includes a join operation.
  • FIGURE 1 is a data flow diagram depicting a preferred embodiment of a method of optimizing recursive queries according to the invention
  • FIG. 2 is a flow diagram depicting initial and recursive inputs of a fixpoint operator as referenced in the "translate using ⁇ " process of FIG. 1;
  • FIG. 3 is a flow diagram depicting a generalized version of the fixpoint operator shown in FIG. 2.
  • the invention is embodied in a novel method of evaluating recursive queries in a database system.
  • Relational algebra provides a powerful technique for optimizing the evaluation of database queries, but recursive queries have not been amenable to such optimization techniques.
  • the query can be evaluated much more efficiently than would otherwise be possible.
  • a method of evaluating a recursive query of a database 11 comprises translating a recursive query into an expression that includes a fixpoint operator, as indicated by a "translate using ⁇ ” process circle 13; optimizing the expression according to a set of transformation procedures as indicated by an “optimize” process circle 15; and evaluating the optimized expression, as indicated by an "evaluate” process circle 17, by reference to data in the database 11.
  • the recursive query is received from a user as indicated by an input box 19 and an arrow extending from the box 19 to the "translate” circle 13.
  • the "user” may be a person at a computer terminal, but the user could also be, for example, an electronic device, an application program, or the like.
  • the result of evaluating the optimized query is provided to the user as indicated by an output box 21 and an arrow extending from the "evaluate” circle 17 to the box 21.
  • the user who receives the output is usually the same user as the one who generated the query, but this need not be the case; the user that generates a query could specify that the result be sent somewhere else.
  • These new transformation procedures include commuting a projection operation with a fixpoint operation as indicated by a "commute projection” process circle 25, commuting a selection operation with a fixpoint operation as indicated by a “commute selection” process circle 27, distributing a join operation over a fixpoint operation as indicated by a “distribute join” process circle 29, and regrouping join and fixpoint operations as indicated by a "regroup” process circle 31.
  • Commuting a selection operation with a fixpoint operation comprises commuting a selection predicate on a direct mapping column with a fixpoint operation as indicated by a "direct map” process circle 33, commuting a global selection predicate with a fixpoint operation as indicated by a "global” process circle 35, and commuting a selection predicate that includes a join with a fixpoint operation as indicated by a "join” process circle 37.
  • Arrows extend in both directions between the "commute selection” circle 27 and each of the circles 33, 35 and 37 to indicate that one or more than one of the procedures respecting commuting a selection and a fixpoint operator may be used as needed; of course, none of the "commute selection" procedures may be required in some cases.
  • Regrouping a join operation and a fixpoint operation comprises commutation and association as indicated by a "commutation” process circle 39 and an “association” process circle 41, respectively. Arrows extending in both directions between the
  • transformation procedures may also be used to simplify a recursive query as indicated by a "transform procedures" process circle 43.
  • Arrows extending in both directions between the circle 43 and the circle 15 indicate that these previously-known transformation procedures may be used once, several times or not at all in any given optimization.
  • fixpoint operator enhances the declarative power of relational algebra by supporting recursive queries. It is expected that the introduction of the fixpoint operator will benefit many database and computer applications such as computer aided design and manufacture (CAD/CAM), software engineering (CASE), and artificial intelligence (AI) applications.
  • CAD/CAM computer aided design and manufacture
  • CASE software engineering
  • AI artificial intelligence
  • the inputs (operands) and output of the fixpoint operator are relations.
  • the fixpoint operator supports least fixed point semantics.
  • the fixpoint operator can compute both linear and mutually recursive relations. In a simple form the fixpoint operator is defined symbolically by an expression of the form
  • I j represents the j-th initial input
  • R j represents they j-th recursive input.
  • Each initial input I is a relation. There may be one or more such inputs; M initial inputs are shown in FIG. 2.
  • Each recursive input R is also a relation. There may be none, one or more of these recursive inputs; N recursive inputs are shown in FIG. 2.
  • a recursive input may, but need not, be the same as an initial input.
  • the output of the fixpoint operator is fed back as a recursive input R c ; this recursive input R c differs from the N other recursive inputs in that the input R c is derived by the fixpoint operator whereas the other recursive inputs are not.
  • the initial inputs are utilized to provide a first output. This output is fed back as the recursive input R c .
  • the recursive input R c and the N other recursive inputs are utilized to provide a second output, and so on for as many iterations as are required.
  • fixpoint operator simplifies to conventional join and union operations.
  • Each initial input I is a relation. There may be one or more such inputs; M initial inputs are shown in FIG. 3.
  • Each recursive input R is also a relation. There may be none, one or more of these recursive inputs; N recursive inputs are shown in FIG. 2.
  • a recursive input may, but need not, be the same as an initial input.
  • the fixpoint operator provides K recursive relations as outputs and these are fed back as recursive inputs R C1 through R CK ; these K recursive inputs R c differ from the N other recursive inputs in that the inputs R c are derived by the fixpoint operator whereas the other recursive inputs are not.
  • J may but need not be equal to K.
  • the sets of initial or recursive inputs for each recursive relation need not be disjoint.
  • the recursive inputs cannot be disjoint because if they were the relations would not be mutually recursive.
  • Cartesion product of the K mutually recursive output relations can be used to extract individual relations from the output.
  • the historical "people” database as described in Example 1 above includes the base relations PARENT, FRIEND and PERSON as given in Tables I through III.
  • DNAME is the name of a person and ANAME is the name of an ancestor of that person, is defined as
  • PARENT ⁇ name aname: dname, pname (PARENT, ANCESTOR).
  • fixpoint operator may then be used to express various recursive queries of the "people” database. Specifically, a query of the form “Find all ancestors of John” is expressed:
  • the factory database of Example 2 above includes the base relations SUPPLIER and SUBPART.
  • a derived relation COMP of the form
  • fixpoint operator may then be used to express various recursive queries such as a query of the form "Find the location and quantities of any parts that go into making a Locomotive" as follows:
  • FLIGHT ⁇ destination from, catime ⁇ dtime, distance ⁇ 1000: start, to, cdtime, atime, ccost, +cost ( FLIGHT, SHORT_CONNECTION )
  • the fixpoint operator is used, for example, to express a query of the form "Find the minimum cost of all flights between London and San Francisco under the condition that the distance between each pair of connecting points is less than 1,000 miles" as follows:
  • a main task to be performed by a query optimizer is to rearrange the sequence of operations in an expression of a query for more efficient evaluation.
  • the query expression Starting with an initial form generated by a parser, the query expression usually undergoes a sequence of transformations based upon certain heuristic rules or execution cost comparisons.
  • the transformations usually include:
  • Example 4 above (the study of the sexually transmitted virus) will be used to illustrate these rules.
  • EXPSDFML is defined as:
  • EXPSDML is defined as:
  • Tables V and VI above illustrate the ENCOUNTER(E) and PERSON(P) relations.
  • the E XPSDFML(F) AND EXPSDML(M) relations are illustrated in the following tables VII and VIII.
  • a selection predicate is said to be global if it is applied at each iteration during the generation of the recursive relation defined by the fixpoint operation. In other words, once a tuple of an input relation fails to satisfy a global selection predicate, it will be excluded from consideration in any of the subsequent recursive computations. (The detection of global predicates will be discussed later.)
  • a detailed semantic analysis is required to determine the legal transformations that can be applied to an algebraic expression. The present discussion is limited to transformation rules which do not require any semantic query analysis. Formal proofs of the rules are omitted because of their length and instead each rule is motivated either with a sketch of a proof or with detailed examples.
  • the purpose of this translation is to reduce the sizes of the operands for each operation.
  • the fixpoint operator ⁇ is viewed as a generator of a directed graph consisting of all paths leading to all possible answers, an early selection on the initial input and recursive input relations has the effect of eliminating the unqualified paths in the graph before they are generated. See, generally, Ioannidis, Y., and Wong, W., "On the Computation of the Transitive Closure of Relational Operators", Proc. of 12th Int. Conf.
  • direct mapping column is very similar to that of invariant column introduced in Devanbu, P. and Agrawal, R., "Moving Selections Into Fixpoint Queries", Proc. of 4th Int. Conf on Data Engineering, Los Angeles, February, 1988.
  • Direct mapping columns are actually a subset of invariant columns.
  • the detection of the more general invariant columns requires a detailed analysis of the selection predicates.
  • the advantage of concentrating on direct mapping columns is that their detection is trivial and they cover the majority of the cases for which the payoff for performing early selections is substantial.
  • this rule was used to move the selection predicate
  • this rule was used to move the global selection predicates
  • EXPSDFML and EXPSDML This will be detected by the query optimizer and query Q3 will be translated to:
  • join operator need not be converted to a right outer join operator.
  • the distribution of joins is performed over the initial inputs and non-recursive relations in the recursive inputs and only when the join columns are restricted to a single non-recursive relation in the inputs.
  • the invention provides an effective and efficient method of evaluating linear and recursive queries in large databases.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP92907452A 1991-02-26 1992-02-25 Verfahren um einen rekursiven suchbefehl auf eine datenbank auszuwerten Ceased EP0532727A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US66025691A 1991-02-26 1991-02-26
US660256 1991-02-26

Publications (1)

Publication Number Publication Date
EP0532727A1 true EP0532727A1 (de) 1993-03-24

Family

ID=24648750

Family Applications (1)

Application Number Title Priority Date Filing Date
EP92907452A Ceased EP0532727A1 (de) 1991-02-26 1992-02-25 Verfahren um einen rekursiven suchbefehl auf eine datenbank auszuwerten

Country Status (2)

Country Link
EP (1) EP0532727A1 (de)
WO (1) WO1992015066A1 (de)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5548755A (en) * 1995-02-17 1996-08-20 International Business Machines Corporation System for optimizing correlated SQL queries in a relational database using magic decorrelation
US7526468B2 (en) * 1999-01-08 2009-04-28 Computer Associates Think, Inc. System and method for recursive path analysis of DBMS procedures
US7548935B2 (en) * 2002-05-09 2009-06-16 Robert Pecherer Method of recursive objects for representing hierarchies in relational database systems
US7249120B2 (en) * 2003-06-27 2007-07-24 Microsoft Corporation Method and apparatus for selecting candidate statistics to estimate the selectivity value of the conditional selectivity expression in optimize queries based on a set of predicates that each reference a set of relational database tables
US20100088325A1 (en) * 2008-10-07 2010-04-08 Microsoft Corporation Streaming Queries
US9158816B2 (en) 2009-10-21 2015-10-13 Microsoft Technology Licensing, Llc Event processing with XML query based on reusable XML query template
US10262269B2 (en) 2015-08-31 2019-04-16 Semmle Limited Evaluating recursive and exiversal predicates

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9215066A1 *

Also Published As

Publication number Publication date
WO1992015066A1 (en) 1992-09-03

Similar Documents

Publication Publication Date Title
Ooi Efficient query processing in geographic information systems
Sellis Multiple-query optimization
Abiteboul et al. On the representation and querying of sets of possible worlds
US6088524A (en) Method and apparatus for optimizing database queries involving aggregation predicates
Astrahan et al. Implementation of a structured English query language
Silva et al. The similarity join database operator
US7184998B2 (en) System and methodology for generating bushy trees using a left-deep tree join enumeration algorithm
Gudes et al. Discovering frequent graph patterns using disjoint paths
CN107169033A (zh) 基于数据模式转换和并行框架的关系数据查询优化方法
Furtado et al. An algebra of quotient relations
Hassan et al. Grfusion: Graphs as first-class citizens in main-memory relational database systems
Klug Access paths in the" Abe" statistical query facility
EP0532727A1 (de) Verfahren um einen rekursiven suchbefehl auf eine datenbank auszuwerten
van Emde Boas et al. Storing and evaluating Horn-clause rules in a relational database
Dadashzadeh An improved division operator for relational algebra
Han Pattern-based and knowledge-directed query compilation for recursive data bases (expert, systems, reductive, logic)
Rosado et al. Flexible query languages for relational databases: an overview
Brodsky et al. On approximation-based query evaluation, expensive predicates and constraint objects
Gardarin et al. OFL: A functional execution model for object query languages
Malkemus et al. Predicate derivation and monotonicity detection in DB2 UDB
Sadreddini et al. Framework for query optimization in distributed statistical databases
Giannotti et al. Nondeterministic, nonmonotonic logic databases
Ooi et al. Query optimization in an extended DBMS
Scheuermann et al. Multidatabase query processing with uncertainty in global keys and attribute values
Shan et al. Optimization of relational algebra expressions containing recursion operators

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19921125

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE GB

17Q First examination report despatched

Effective date: 19961001

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 19980808